CREATE INDEX currently has the restriction that the index must fit in memory [1]. As the data is already sorted, creating an index is not necessary anyway. The min/max indexes created automatically by the system are sufficient to complete the query in a few milliseconds.
D CREATE TABLE passwords (hash TEXT, count INT);
D COPY passwords FROM '~/Downloads/pwned-passwords-sha1-ordered-by-hash-v8.txt' (SEPARATOR ':');
D .timer on
D SELECT * FROM passwords WHERE hash=upper('5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8');
┌──────────────────────────────────────────┬─────────┐
│ hash │ count │
│ varchar │ int32 │
├──────────────────────────────────────────┼─────────┤
│ 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8 │ 9545824 │
└──────────────────────────────────────────┴─────────┘
Run Time (s): real 0.005 user 0.007455 sys 0.000584
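The hex literal in the WHERE clause is the SHA-1 of the password being checked. A minimal Python sketch of how to produce it (the candidate string "password" is the well-known test value whose hash appears in the output above):

```python
import hashlib

# SHA-1 of the candidate password, uppercased to match the
# HIBP dump, which stores hex hashes in upper case.
candidate = "password"
digest = hashlib.sha1(candidate.encode("utf-8")).hexdigest().upper()
print(digest)  # → 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
```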
I cannot even ssh into the server after trying to use DuckDB. It is completely dead (with all the ducks, what a misery).
The reason is probably that it's building a full index, in contrast to the sparse index in ClickHouse, and maybe it's trying to build it in memory, going to swap (the server has 32 GB of memory).
Because DuckDB provides ACID guarantees [1], data is loaded in an all-or-nothing manner. As the load was interrupted when the system ran out of memory, the table is expected to be empty.
If you load the data properly (creating the index after insertion, which is definitely preferable in this case), it will load extremely quickly (milliseconds).
You should also disclose your relationship with a competing project. For the record, I use DuckDB in personal projects and love it. You seem to be misusing it. :)
“These days”? A 32-bit Intel box running MySQL back in 2000 would probably have laughed at this query.
Granted, I’d expect one or two disk seeks at ~10 ms each. On modern hardware, I imagine it would be in the hundreds of microseconds (assuming you limited it to 2–4 GB of RAM).
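The one-or-two-seeks estimate falls out of a back-of-the-envelope calculation. A hedged sketch, where the row count (~850 million, roughly the size of the v8 dump) and the ~45-byte entry size are assumptions:

```python
import math

rows = 850_000_000   # assumed row count of the sorted hash list
fanout = 4096 // 45  # ~90 keys per 4 KB page (assumed entry size)

# Depth of a B-tree-style index over the sorted hashes:
# each level multiplies the reachable rows by the page fanout.
depth = math.ceil(math.log(rows, fanout))
print(depth)  # levels from root to leaf
```

The upper levels are only a few thousand pages and stay in the page cache even with a 2–4 GB budget, so only the bottom level or two hit disk: one or two seeks per lookup, as estimated.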
https://github.com/ClickHouse/ClickHouse/issues/42363