This is a distributed SQL engine, not a database. We store no data; you store your data in HDFS, S3, POSIX filesystems, NFS, etc., and we let you query those filesystems directly, in the file formats you already have. You can see the file formats cuDF supports here: https://github.com/rapidsai/cudf/tree/branch-0.9/cpp/src/io
Greatly increased processing capacity. The GPUs we use can simply execute orders of magnitude more instructions per second than a CPU.
Decompression and parsing of formats like CSV and Parquet happen on the GPU, orders of magnitude faster than the best CPU alternatives.
You can take the output of your queries, hand it to machine learning jobs via zero-copy IPC, and get the results back the same way. We are all about interoperability with the RAPIDS ecosystem.
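In RAPIDS the zero-copy hand-off happens in GPU memory, but the core idea can be illustrated with a CPU analogue. This is a minimal sketch using only the Python standard library's `multiprocessing.shared_memory`; the `roundtrip` helper is made up for the example and is not part of any RAPIDS API:

```python
from multiprocessing import shared_memory

def roundtrip(payload: bytes) -> bytes:
    """Place payload in a shared segment, then read it through a second handle.

    Both handles map the same OS-level memory segment, so nothing is copied
    between producer and consumer -- the essence of zero-copy IPC.
    """
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        shm.buf[:len(payload)] = payload
        # A consumer process would attach by name; here we attach in-process.
        peer = shared_memory.SharedMemory(name=shm.name)
        try:
            return bytes(peer.buf[:len(payload)])
        finally:
            peer.close()
    finally:
        shm.close()
        shm.unlink()
```

The GPU version works the same way in spirit: a handle to the device buffer is passed to the ML job, which reads the query result in place instead of serializing it.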
Is there any reason why a SQL source isn't in that list? I'm wondering if there's a way to join SQL sources with file-storage sources; an example of this would be filtering or enrichment operations.
When you say SQL format, do you mean being able to read the output of a JDBC or ODBC driver?
If that's the case, then it's mostly just a matter of time. You are not the first person to ask about this, and now that there are Java bindings in cuDF, this might become a reality in the next few months.
Or do you mean being able to read a database's file format natively?
If this is the case, there are many reasons:
1. Many formats are poorly documented or not documented at all.
2. Even if you decide to read some other DB's format natively, those formats change over time.
3. You have little control over how and where the data is laid out.
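The driver-output route above is essentially the join/enrichment pattern the question asked about: pull rows out of the database through its driver, then join them with rows read from files. A minimal plain-Python sketch, using `sqlite3` as a stand-in for a JDBC/ODBC-accessible database and an in-memory CSV as the file-storage source (the `regions` table, column names, and `enrich_from_sql` helper are all hypothetical, and this is not BlazingSQL code):

```python
import csv
import io
import sqlite3

def enrich_from_sql(csv_text: str) -> list:
    """Enrich CSV rows (file source) with a lookup table from a SQL source."""
    # "SQL source": a lookup table reached through a database driver.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE regions (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO regions VALUES (?, ?)",
                   [(1, "us-east"), (2, "eu-west")])

    # "File source": rows read straight from CSV.
    rows = csv.DictReader(io.StringIO(csv_text))

    # Enrichment: join each file row against the SQL table.
    out = []
    for r in rows:
        row = db.execute("SELECT name FROM regions WHERE id = ?",
                         (int(r["region_id"]),)).fetchone()
        out.append((r["event"], row[0]))
    db.close()
    return out
```

For example, `enrich_from_sql("event,region_id\nlogin,1\nclick,2\n")` returns `[("login", "us-east"), ("click", "eu-west")]`. In the engine itself, the driver output would be materialized as a cuDF table and joined on the GPU like any other table.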
I've read the website, but I couldn't find a hint that the engine is distributed. Even the Spark benchmarks compare a single instance against multiple nodes.
Is it distributed? How do I set it up in a distributed mode?
Does it support nested Parquet (something that even Spark itself struggles to support inside SQL)?
You can try it out yourself here: https://colab.research.google.com/drive/1r7S15Ie33yRw8cmET7_...
Or use the image on Docker Hub: https://hub.docker.com/r/blazingdb/blazingsql/