Personally I'm more interested in indexing a boatload of data as fast as possible, with modest resource requirements. Had a couple of cases where I'd like to do that on a mid-tier laptop, with no particular success so far. I have to guess, but it seems that ‘data scientists’ either buy big fat boxes with tons of ram and cpu, or offload everything to big fat boxes in datacenters, or twiddle their thumbs for quite a while. You'd think that by now writing indexes on all columns at top sequential drive speed would be a solved problem covered in any ‘Learn data science in two days’ tutorial.
Dunno why it'd be surprising: WAL gives you higher concurrency, but that translates to more overhead. Writes have to hit the WAL and then be flushed into the database, and reads have to check both the database and the WAL.
WAL can have faster writes for small amounts of data, but once you have enough content that the WAL has to be flushed during processing, you're better off skipping the WAL entirely.
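To make the trade-off concrete, here is a minimal sketch (not from the thread; file path and table name are made up) using Python's stdlib sqlite3 to switch a database between the default rollback journal and WAL mode, and to force the WAL checkpoint that flushes WAL contents back into the main database file:

```python
import os
import sqlite3
import tempfile

# A file-backed database is needed: in-memory databases always report
# journal_mode "memory" and never create a -wal file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)

# Default journal mode for a file database is typically "delete"
# (a rollback journal).
print(conn.execute("PRAGMA journal_mode").fetchone()[0])

# Switch to WAL; SQLite echoes the new mode back.
print(conn.execute("PRAGMA journal_mode=WAL").fetchone()[0])

conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()  # the write lands in demo.db-wal, not yet in demo.db

# Checkpoint: flush the WAL contents into the main database file.
# This is the extra flush step described above.
conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
conn.close()
```

Until that checkpoint runs, readers consult both the main file and the WAL, which is where the read-side overhead comes from.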
Sorry, I don’t mean to hijack the original post, but for performant inserts and indexing (which I assume is for analysis), I’d recommend using ClickHouse or QuestDB.
It is certainly feasible to saturate NVMe with just index writes in many niche implementations today. The trick is usually copious amounts of batching, so that each IO does more work.
Last time this was here, I ran this and got the following results:
    /fast-sqlite3-inserts (master)> time make busy-rust
    Sun Jul 18 17:04:59 UTC 2021 [RUST] busy.rs (100_000_000) iterations
    real    0m9.816s
    user    0m9.380s
    sys     0m0.433s
    ________________________________________________________
    Executed in    9.92 secs    fish    external
       usr time    9.43 secs    0.20 millis    9.43 secs
       sys time    0.47 secs    1.07 millis    0.47 secs

    fast-sqlite3-inserts (master)> time make busy-rust-thread
    Sun Jul 18 17:04:48 UTC 2021 [RUST] threaded_busy.rs (100_000_000) iterations
    real    0m2.104s
    user    0m13.640s
    sys     0m0.724s
    ________________________________________________________
    Executed in    2.33 secs    fish    external
       usr time   13.68 secs    0.20 millis   13.68 secs
       sys time    0.78 secs    1.18 millis    0.78 secs
I'm probably doing something wrong. Or I'm getting the pace needed for the billion?
Author of the code here, and yes, that's correct! Also, it is possible that the Rust compiler is optimising this code, removing the random generation calls altogether. (I haven't gotten around to doing the analysis yet.)
I am surprised the M1 Air is taking 20 seconds; on my old machine it's 7 seconds-ish.
https://news.ycombinator.com/item?id=27872575