
How much faster are your benchmarks using WAL mode? https://www.sqlite.org/wal.html
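For anyone who wants to try this, here is a minimal sketch of switching a connection to WAL mode from Python's built-in sqlite3 module. The helper name `open_wal` and the file name `commits.db` are made up for illustration and are not taken from the OP's project. WAL needs a file-backed database, so the sketch uses a temporary directory.

```python
import os
import sqlite3
import tempfile


def open_wal(path):
    """Open a SQLite database and ask it to use write-ahead logging."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode


# Hypothetical database file in a throwaway directory.
db_path = os.path.join(tempfile.mkdtemp(), "commits.db")
conn, mode = open_wal(db_path)
print(mode)
conn.close()
```

The journal mode is persistent for a database file, so it only has to be set once per file rather than on every run.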



I experimented earlier with pushing the limits of SQLite inserts and wrote a blog post [0] about it. Some of what I learned applies here.

I reviewed the OP's code and ran some benchmarks; SQLite is not the bottleneck here. The code first generates the commit info from the git log and prints it to stdout [1], and the Python script reads it one commit at a time in a loop [2]. Each commit's info is then written to SQLite. So, with or without WAL, the time is almost the same.

To confirm my hypothesis, I ran the project without the insert calls. On my machine, with CPython, it took 160 seconds with the SQLite inserts and about 159 seconds without them.

I believe the git log itself will be fast anyway, so other ways to speed this up would be to read a bunch of commits at once and then do batch inserts. We could also run it in parallel, since each commit's info is independent and we don't need to care about ordering while inserting.
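A rough sketch of the "read a bunch, then batch insert" idea, using Python's built-in sqlite3 module. The `commits` table, its two columns, the `batched` and `insert_commits` helpers, and the generated rows are all assumptions for illustration and are not the OP's schema or code. It only covers batching, not the parallel part.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_commits(conn, rows, batch_size=1000):
    """Insert (hash, author) rows, one transaction per batch."""
    # Hypothetical schema, not the one used by the OP's project.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commits (hash TEXT PRIMARY KEY, author TEXT)"
    )
    for chunk in batched(rows, batch_size):
        with conn:  # commits on success, rolls back on error
            conn.executemany("INSERT INTO commits VALUES (?, ?)", chunk)


conn = sqlite3.connect(":memory:")
# Stand-in for parsed git log output: 2500 fake commits.
rows = ((f"{i:040x}", f"author{i % 7}") for i in range(2500))
insert_commits(conn, rows, batch_size=1000)
count = conn.execute("SELECT COUNT(*) FROM commits").fetchone()[0]
print(count)
```

Grouping rows into one transaction per batch avoids paying the commit cost for every single commit record; the right batch size is something to measure, not something this sketch can tell you.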

[0] - https://avi.im/blag/2021/fast-sqlite-inserts/

[1] - https://github.com/jmforsythe/Git-Heat-Map/blob/bd9bc22/git-...

[2] - https://github.com/jmforsythe/Git-Heat-Map/blob/bd9bc22/git-...


Unfortunately the commit info is not independent at the moment, and that is mostly due to wanting to track renaming/deletion of files.


Nice work, thank you for the analysis



