> I do not see why it would be much slower than direct access to the storage.
Implementations of protocols like ODBC/JDBC generally implement their custom on-wire binary protocols that must be marshalled to/from the lib - and the performance would vary a lot from one implementation to another. We are seeing a lot of improvements in this space though, especially with the adoption of Arrow.
There is also the question of computing for ML. Data scientists today use several tools/frameworks ranging from scikit-learn/XGBoost to PyTorch/Keras/TensorFlow - to name a few. Enabling data scientists to use these frameworks against near-realtime data without worrying about provisioning infrastructure or managing dependencies or adding an additional export-to-cloud-storage hop is a game changer IMO.
Implementations of protocols like ODBC/JDBC generally implement their custom on-wire binary protocols that must be marshalled to/from the lib - and the performance would vary a lot from one implementation to another. We are seeing a lot of improvements in this space though, especially with the adoption of Arrow.
There is also the question of computing for ML. Data scientists today use several tools/frameworks ranging from scikit-learn/XGBoost to PyTorch/Keras/TensorFlow - to name a few. Enabling data scientists to use these frameworks against near-realtime data without worrying about provisioning infrastructure or managing dependencies or adding an additional export-to-cloud-storage hop is a game changer IMO.