Sharing some thoughts here, since I have recently been developing a similar thing:
1. "Query 1.6B rows in milliseconds, live" is essentially "sum 1.6B numbers from memory in milliseconds".
Without full SQL functionality, a naive SQL query is just a tight loop over arrays (used as partitions for naive data parallelism) on a multi-core processor.
So this kind of claim reduces to a several-line benchmark (ignoring data preparation and thread setup) that measures how fast the sum loop finishes.
In other words, it is just a naive memory-bandwidth benchmark.
Let's count: a current 6-channel Xeon-SP provides ~120 GB/s of memory bandwidth, so summing 1.6B uncompressed 4-byte ints resident in memory should finish in about 1.6 × 4 / 120 s ≈ 53 ms, call it ~50 ms.
So if you measure 200 ms in some database, 75% of the time (150 ms) went to things other than what your own hand-rolled small C program would spend on such a toy analysis.
2. Some readers like to see comparisons to ClickHouse (CH below).
The fact is that CH is a little slow for such naive cases (as seen in the benchmarks at [1], which others have pointed to).
This is because CH is a real-world product: it includes optimizations from ten years of research and production usage in the database industry, and much, much more.
Can you keep that claim in the title once you read from persistent disk? Or when the query does a high-cardinality aggregation (picture a low-cardinality aggregation as the same tight loop plus a hash table small enough to stay in L2 cache)?
1. We do plan to support full SQL functionality, and we already have a pretty good subset [1]. I think what we do is more than a "naive memory bandwidth bench code", but I am happy to listen and, when the time is right, implement the functionality/features you think we are missing.
2. "Can you hold such statement in the title when you enable reading from persistent disk?" We already persist to disk, and the numbers you see imply reading from disk. We do this with memory-mapped files [2].
[1] https://tech.marksblogg.com/benchmarks.html