Hacker News new | past | comments | ask | show | jobs | submit login

If someone dumped that project in my lap and said fix it (and it I was more used to low level programming), I’d probably start be refreshing myself on the last 10+ years of GC advances since I stopped reading SIGPLAN. Particularly multithreaded sweep. Because essentially you want to decouple delete from free so you can not do 100% of the bookkeeping work in the middle of time sensitive operations. But not fall as far behind as Postgres can with its vacuuming albatross.

In a way, deletion is a form of eventual consistency. The user loses access to the data but the system still knows about it for a while.

Just off the top of my head, I would think for LSM systems, you would resort to snapshotting as the edit history became much larger than the retained row count, and as you delete old snapshots (two GC roots) you could compare the old and the new and drop everything that didn’t survive. You only have to finish well before the next snapshot interval, and if you maintain a queue you only have to process them on average as fast as the snapshot interval.

And for BTree systems you can amortize the deletes across every insert, the way some realtime systems clean up a few free pointers on every allocation.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: