Note: The performance values mentioned on that page (on an EC2 c1.medium instance using spinning-rust disks!) is wildly out of date. I'll get around to updating them some day.
For reference, on my laptop (Dell Latitude 7390 with an i7-8650U CPU):
* Bulk inserts run at ~600,000/second (up from 125,000).
* Bulk extracts run at ~660,000/second while in RAM (up from 30,000) and ~220,000/second from disk (up from 20,000).
* Bulk updates run at ~700,000/second in RAM dropping to ~300,000 from disk (up from 110,000 dropping to 60,000).
* Random reads run at ~800,000/second in RAM dropping to ~20,000 from disk (up from 220,000 dropping to 11,000).
* Random mixed run at ~400,000/second in RAM dropping to ~10,000 from disk (up from 30,000 dropping to 4,000).
* Hot-spot reads run at ~800,000/second in RAM dropping to ~500,000 from disk (up from 220,000 dropping to 60,000).
Note that "in RAM" means "the dataset fits into RAM" -- in all cases data is durably stored to disk.
There is almost certainly room for improvement; I haven't done extensive profiling yet.
> It was designed to satisfy the needs of the Tarsnap online backup service for high-performance key-value storage, although it is not yet being used for that purpose
Does the fact that you're still maintaining Kivaloo this many years later imply it's now being used by Tarsnap?
Tarsnap has recently started using Kivaloo. Over time I intend to use it far more -- but since Tarsnap is a backup service, I'm starting with the least critical parts first.
Out of curiosity, what was the use case/upstream requirement for building Kivaloo in the first place?
The Kivaloo page notes that it was designed for Tarsnap, but I'm curious as to the "why" - eg, an eventual-consistency/consensus building block, a simple local metadata service, server-side housekeeping...?
(I'm also curious what "{indexed ,}sorted files" means, and how UFS is significant.)
1. What are, off top of your head, some design changes or code changes required that'd bring drastic performance improvements?
2. What are some key internals that you think differentiate Kivaloo from other embedded KV stores? I assume you must have gone through a lot of existing literature on the topic before building this. For example, LMDB, BDB, RocksDB, LevelDB, SQLite and the likes come to mind that can double-up as KV stores.
3. Does it store the database in flat files with a WAL in front? Is the file format of the database custom, or based on existing formats?
4. Does the database auto index the fields? Or, use any other such aids to speed up access to data?
No, I mean with a dataset which is too large to fit into the amount of RAM on the system.
1. What are, off top of your head, some design changes or code changes required that'd bring drastic performance improvements?
Nothing immediately comes to mind. Profiling may reveal some improvements, of course.
2. What are some key internals that you think differentiate Kivaloo from other embedded KV stores? I assume you must have gone through a lot of existing literature on the topic before building this. For example, LMDB, BDB, RocksDB, LevelDB, SQLite and the likes come to mind that can double-up as KV stores.
Well... kivaloo isn't an embedded KV store, so that would be a big differentiating factor. It's a network daemon.
3. Does it store the database in flat files with a WAL in front? Is the file format of the database custom, or based on existing formats?
The "on-disk" format is the pages of a append-only B+Tree, with the last page being the tree root.
I put "on-disk" in scare quotes because there are other backends, e.g. using Amazon DynamoDB to store pages.
4. Does the database auto index the fields? Or, use any other such aids to speed up access to data?
There are no fields. Key-value pairs, nothing more.
You are correct, as things are at the present time. I designed kivaloo to be composable with the intention that I could put a replication/sharding layer in front of it later, however.
I am genuinely intrigued about how much is costs to commission those tests. They are so complete and thorough that I can't see myself being able to afford one for some project of mine, but I would absolutely like to to have an estimate
The 255 byte limit is because I store keys and values as a one-byte length followed by the relevant data. For Tarsnap what I typically want is ~40 byte keys and data (hence using those sizes for benchmarking).
Yes, my laptop has a Intel 660p 512 GB NVMe disk.
No documentation per se, although I hope the library interfaces are reasonably understandable. I'd be happy to help though -- this code deserves to be used!
Is the one-byte length central to some logic (or some performance concerns), or could one hack the `uint8_t` to a `uint16_t` in kvldskey and so on to get something that could store a bit more? We have some systems that need only 16-32 byte keys but upwards of around 16k values, we're currently using RocksDB but these performance numbers + no compaction process + mux approach are making me interested...
Interesting question. You would need to adjust kvldskey and all of the places where they are serialized (e.g. in pages and in the network protocol). You would also need a much larger page size -- the maximum key+value pair size has to fit into 1/3 of a page.
But if you're willing to make those adjustments and use 64 kB pages, I imagine it would work just fine. Please stay in touch!
Forgive my complete ignorance/inexperience, but why does the page size need to be increased to 64kB, and why does the key+value length(?) have to fit into 1/3 of a page?
B+Tree leaf nodes contain keys and values. In kvlds (which is the core B+Tree component of kivaloo), the code aims to keep nodes at least 2/3 full, in order to keep the tree approximately balanced. If a key-value pair can take more than 1/3 of a page, you could have a situation where a node is less than 2/3 of a page but adding another key-value pair would take it above the maximum allowed size.
This restriction could be relaxed, e.g. to require only that internal nodes are at least 2/3 full, in which case for the small key / large value case you could have pages which are barely larger than the largest value. I didn't bother doing that since it wasn't relevant to my usage.
I don't think I wrote much documentation about this, sorry. Basically the idea is that there's a pointer which moves through the key-space looking at pages, and if it passes any pages which are "old" it marks them as dirty so that they're rewritten as part of the next batch. The rate at which the cleaning pointer moves through key-space depends on the accumulated "cleaning debt", which is based on the amount of garbage along with the current I/O rate; the aim is to hit a steady-state where the total amount of I/O is constant and the cleaning gets the "left over" I/O after requests are serviced.
FYI, that's a very American English centric way of explaining how to pronounce 'loo'!
British English pronounces the lone word 'lieu' differently (with a y/j in front of the 'ū'), as in 'lieutenant' as 'leff', and is already familiar with 'loo's and pronounces them 'loo'.
'lieu' is of course of French origin, and they pronounce it differently again.
It's obvious, given 'kivaloo', how it's supposed to be. But unless it's a deliberate joke I think repeating 'loo' is a better pronunciation key!
Wow, it's never occurred to me anyone would pronounce that much differently (Am/Can pronunciation of 'r' in 'er's aside)! UK is 'soll-duh', or I suppose 'sohl-duh', but certainly sounding the 'l'.
Wiktionary suggests Canada generally follows the pronunciation your familiar with.
Yes, generally that is true. There are a few words, at least out west, that stand out: "About" "Schedule" "America".
The last one is in jest. The word is pronounced the same--in the US, often the US is called America, seemingly excluding Canada, Central America, South America.
I'm ashamed to admit I didn't get the connection of the name to "key value" until this comment. So do you yourself pronounce it kee va loo or kee va lyoo? As an American, I see no distinction between the sounds of 'loo' and 'lieu'.
Also American, I pronounce loo /lu/ and lieu /lyu/, in IPA.
I do commonly hear the expression "in lieu of" rendered "in loo of" but have always considered that incorrect, much like pronouncing the t in "often".
It is, of course, just dialect, and the entire idea of 'incorrect pronunciation' is suspect! But I can't help but consider /lyu/ to indicate a higher, er, register.
The standard American pronunciation of lieu is "loo". Some people who read the word before ever hearing it (it’s a rare word in ordinary speech, more common in writing), or maybe took a high school French class and want to sound fancy, say something like "lyoo" instead, but this is non-standard and uncommon. (Neither of the two is a correct pronunciation in French.)
I think it really depends on where you're from. I've been hopelessly trying to teach my wife 'perfect' English pronunciation at her request, and the best advice I came up with is to talk with your throat and not your mouth. At least in the Southeast, this works almost perfectly. But we say 'loo' if anything at all, it's a pretty rare word.
From my experience, the more north you go the more you talk with your tongue/mouth. In my examples I was almost able to complete full sentences without moving my lips at all.
Yeah those kind of reading explanations are garbage. There is an alphabet dedicated to that purpose, the International Phonetic Alphabet, which was created exactly for the purpose of writing pronunciation without ambiguity.
I was looking at this yesterday because of the post on "use of a life" post by cperciva. One thing on the benchmarks was I didn't know what was considered "good" numbers, even though I know it's dependent on the type of workload.
Is there a way to compare it to other key-value stores in a fair apples to apples comparison?
Is there somewhere I can nominate kivaloo for some sort of easily understandable code award? Granted I'm a little biased as it looks a lot like code I would write myself, but wow, it's so refreshing to see code that's full of (useful! illuminating!) comments. Thanks cperciva!
It's rather hard to compare. My understanding is that LMDB is a library which accesses a shared memory-mapped file; kivaloo is a collection of daemons which are accessed via the network or a unix socket. I guess you could place a network protocol front-end on top of LMDB and compare that to kivaloo?
And sure, small values and a lack of transactions are intentional limitations which allowed me to improve performance for the use case I cared about. Of course there's nothing stopping you from constructing that functionality in other ways; in fact I'm planning on releasing a daemon which provides key-blob storage by storing large objects in Amazon S3 and using the core kivaloo functionality as an indirection table.
Paul Graham tweeted yesterday[1] an essay that the author had written which references kivaloo. At least, that's how I came to know about it yesterday.
I read the document on it, then tried to imagine what I would use it for, and couldn't really come up with a use case. I had trouble understanding where I would apply the 255 byte key to 255 byte value paradigm. I also didn't know how the benchmarks compared to similar applications.
255 bytes is obviously overkill for a key (a 32 or 64 byte secure hash of anything larger than 255 bytes can be assumed to unique), so it's just a question of what values are usefully in 255 bytes:
One thought that comes to mind is a distributed store with centralized master, 255 bytes is plenty for a hostname and resource ID.
If they're hashes, sure. If you want to answer range requests -- e.g. "give me everything starting with ABCDEF" -- you may have a less densely utilized address space (rather like IP networking has had).
My accounting database has keys of the form "usage/"<256-bit user ID><20-byte timestamp><64-bit machine ID><usage type string>. Not 255 bytes, to be sure, but not trivially short either.
Slighlty offtopic, but it's ironic how a service with the tagline "Online backups for the truly paranoid" serves insecure connections to its website and doesn't employ HSTS. Especially since the user is expected to download the client for the service from that website.
Of course you are right, but some commonly accepted procedure nowadays is to do that automatically with a redirect, as a service for the user. Sure, you may have a reason for not to do so, but instead of dismissing a valid argument you could maybe explain why not ...
Edit: On second thought, I recognize this whole subthread started with the wrong tone ("ironic", "reckless"), kind of dismissive or patronizing and not calling for a constructive discussion, so please disregard this comment of mine, sorry.
Well, if the user actually checks the signatures and remembers from their last visit that their should be one coming along with the software, maybe. That's a big if though.
Considering there's no disadvantage in deploying HSTS and there's no good reason to serve an insecure website, especially since they already have HTTPS set up, I'm baffled by that decision. Maybe it's just an oversight but that's not too reassuring either.
> Well, if the user actually checks the signatures and remembers from their last visit that their should be one coming along with the software, maybe. That's a big if though.
HTTPS+HSTS adds no security here.
The only (reasonable) way to validate a payload end-to-end is an offline signature validation. This is true regardless of transport-level security measures like HTTPS.
Doing this properly with gpg makes even more sense due to the service targeting the paranoid, who is exactly the group of individuals who are certain to validate a signature.
> Considering there's no disadvantage in deploying HSTS and there's no good reason to serve an insecure website, especially since they already have HTTPS set up, I'm baffled by that decision. Maybe it's just an oversight but that's not too reassuring either.
It's easy, but adds nothing in ways of security here. Just privacy.
(Not that privacy does not have value, of course, but that's an entirely different argument.)
Assuming your attacker can count bytes, there's no privacy added by using TLS to download static content from the Tarsnap website. There simply aren't enough similarly-sized files.
But such an attacker would need to know that the content was downloaded from the Tarsnap website, right? If TLS is configured with Encrypted SNI, the handshake no longer reveals server information and thereby TLS does add privacy.
(However, Encrypted SNI is not widely adopted yet.)
That page is not very informative. It doesn't explain why anyone should be interested in this project, and compare with alternatives. What's the novelty, what's the edge, etc.
Hardly surprising that it didn't gain much traction.
For reference, on my laptop (Dell Latitude 7390 with an i7-8650U CPU):
* Bulk inserts run at ~600,000/second (up from 125,000).
* Bulk extracts run at ~660,000/second while in RAM (up from 30,000) and ~220,000/second from disk (up from 20,000).
* Bulk updates run at ~700,000/second in RAM dropping to ~300,000 from disk (up from 110,000 dropping to 60,000).
* Random reads run at ~800,000/second in RAM dropping to ~20,000 from disk (up from 220,000 dropping to 11,000).
* Random mixed run at ~400,000/second in RAM dropping to ~10,000 from disk (up from 30,000 dropping to 4,000).
* Hot-spot reads run at ~800,000/second in RAM dropping to ~500,000 from disk (up from 220,000 dropping to 60,000).
Note that "in RAM" means "the dataset fits into RAM" -- in all cases data is durably stored to disk.
There is almost certainly room for improvement; I haven't done extensive profiling yet.