Sparkey: A simple constant key/value storage library

jitl · 2024-01-04T18:30:22 1704393022

At Airbnb we are implementing translation serving on top of Sparkey; part of our Ruby system open sourced here: https://github.com/airbnb/hammerspace

The way the system worked (iirc it’s been a while): we’d periodically dump all the translation strings from our translation editor app (MySQL backed) into a sparkey S3 path, and use a Kubernetes daemonset to fetch the sparkey file to each host in the cluster. Finally application pods written in any language use a host volume to read translation strings from the local sparkey file. Certainly overkill for smaller apps but with a bunch of services and hundreds of thousands of translation strings, the out of band updates are well worth it and keep the app container small.

esprehn · 2024-01-04T19:24:50 1704396290

IIRC many of Airbnb's systems were implemented on top of sparkey: translations, feature flags, experiments, dynamic config. It stood the test of time really well. Probably all built by Jon Tai.

nurgasemetey · 2024-01-04T19:48:10 1704397690

Very ingenious indeed. How would you deal with data corruptions? How it was detected? Is repair automatic?

esprehn · 2024-01-05T03:35:55 1704425755

I don't believe there was any special handling for corruption, but there was probably hash checking when syncing new data every few minutes. The data stored in there was not user data, so the failure modes are not as severe. Sparkey is also never the source of truth, it's a local cache for each service instance.

mooreds · 2024-01-04T20:40:11 1704400811

Do you know if that system is still in use? The gem looks abandoned.

igor47 · 2024-01-04T18:52:03 1704394323

Ah yes! For the Rails monolith the sparkey integration was accomplished through https://github.com/airbnb/hammerspace

KAdot · 2024-01-04T19:25:43 1704396343

A similar idea implemented in Go https://github.com/akrylysov/pogreb (disclaimer: I'm the author of this package).

teo_zero · 2024-01-05T10:57:24 1704452244

Interesting. I think this is wrong:

> As soon as we reach an entry with higher displacement than the thing we're looking for, we can abort the lookup.

Isn't it the other way out?

Besides, I hope the index size is a power of 2, so to use a bitwise AND instead of an expensive modulo in order to find the optimal placement of every hash found during the lookup, otherwise the integer division would be a bottleneck. (Of course I'm considering an index cached in memory: if you have to access the disk anyway, than that dominates the lookup time.)

Finally, I'm still wondering how they handle deletions (I've only read the text, not the code).

krka · 2024-01-05T14:44:39 1704465879

I haven't worked on this code in a long time but I will try to answer based on what I remember. (Also on phone without easy access to the code)

The idea is that all entries have a natural ideal position in the table and the displacement is how far away from it it is placed in practice. If two entries are competing for the same slot, the logic will prefer displacements of 1,2 over 0,3 so if you observe an entry with displacement 5 after already having visited 5 entries you can already conclude you wont find what you are looking for. However, I am not sure this quick exit is actually implemented.

I think the index size is not a power of two, but that was something I considered but it would greatly affect size. At the time of building it, spinning disks were still common and optimizing for space compactness and minimizing page faults/iops were more important. I think it would be at most one division per lookup and the the memory/disk reads would dominate the work anyway.

Deletions are implemented as a special tombstone entry in the log file which then causes a removal from the index slot as well. There is also a naive implementation of a compaction flow (go through log, build index, regenerate new log by looking at the index, then rewrite index)

teo_zero · 2024-01-06T00:32:21 1704501141

Thank you for the answer.

> if you observe an entry with displacement 5 after already having visited 5 entries you can already conclude you wont find what you are looking for.

Exactly. If I have already visited 5 slots without finding what I was looking for, and I'm examining the 6th one, where my target would then be "off by 6" from its natural place, and I find it filled with something that is off by 5 or less from ITS natural place, then I can abort the search knowing that my target has never been inserted because if it was inserted and arrived at this slot, the algorithm would have evicted the former occupant of the slot (smaller displacement means wealthier in Robin Hood's metaphor, so prone to be robbed of the slot).

To summarize, I can abort the search when I find an entry with a displacement lower than my target would have in the same position. But the docs read higher. Hence my puzzlement.

> I think it would be at most one division per lookup

It's one division per visited slot, because in each slot I need to calculate the displacement of the entry that I find there. And to calculate the displacement I need to know the "natural" slot, which I calculate as hash % size.

> Deletions are implemented as a special tombstone entry in the log file

Agreed.

> which then causes a removal from the index slot as well.

That's the part that I'm missing. If you simply remove it without rearranging all the neighbor slots, the nice properties we discussed before becomes void.

krka · 2024-01-06T09:25:34 1704533134

True, if the lookup actually implemented this, it would be more divisions. This might partly be why it did not get implemented at all. With a sparse index (25% empty capacity) we should quickly run into empty slots anyway in most cases, and if we mostly expect hits the check would not be useful anyway.

For deletions, neighbour entries would bubble up into their correct slots to preserve that property.

krka · 2024-01-14T10:35:25 1705228525

Now when I'm at an actual computer, I see that it was implemented, at least in the Java version. Removing that made for a 30% speed increase: https://github.com/spotify/sparkey-java/pull/65

vitiral · 2024-01-04T18:03:40 1704391420

Hey, this is effectively the same idea I'm going to implement!

https://github.com/civboot/civboot/blob/main/blog/0012-dev-l...

Mine will be slightly more than key/value. My index file will support indexes of arbitrary record fields as well as the index of all items in loam, but it's effectively the same idea.

Note you could keep a live instance running to handle live writes, which uses an in-memory hash for recent writes and updates the hash index as-needed. Because the index can be rebuilt from the data there is no concern about data loss.

unbearded · 2024-01-05T03:11:34 1704424294

I use sparkey / hammerspace for caching static data (tried rocksdb but hammerspace was faster) for many years.

It helps me have a 20ms on a 100r/s on a ruby on rails app (ruunning on a ryzen 5950X). I keep seeing people brag about 100ms on applications and maybe I’m old school or haven’t seen enough, but 100ms breaks personal SLO. will trigger a 5-alarm alert. Also use redis a lot. But when possible, I use hammerspace/sparkey instead of it.

madmountaingoat · 2024-01-05T15:34:05 1704468845

Sparkey is simple and super fast. If you use them like Spotify frequently does, where a pipeline creates a new daily copy, hydrating them to the nodes that need access becomes the bigger issue for using them.

dang · 2024-01-05T05:37:41 1704433061

secwang · 2024-01-05T08:48:21 1704444501

glad they mentioned cdb.