200,000,000 keys in Redis 2.0.0-rc3 (zawodny.com)
54 points by nochiel on July 24, 2010 | 11 comments



Try out Tokyo Tyrant. It'll use at most half the space.


The key is a long cast to a string, and the value is an int. So 2.23GB of data became 24GB. Quite a large loss factor there.

Calculating the load capacity seems like it should be a trivial exercise and shouldn't require a test-and-see approach.
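
For what it's worth, here's a rough back-of-the-envelope version of that calculation, using only the figures above (the per-entry payload is derived from the quoted 2.23GB, so treat it as an estimate):

    #include <stdio.h>

    int main(void) {
        /* Figures from the post: 200 million entries, ~2.23GB of raw
           key/value payload, ~24GB resident in Redis. */
        double entries  = 200e6;
        double payload  = 2.23e9;
        double resident = 24e9;

        double payload_per_entry  = payload  / entries;  /* ~11 bytes  */
        double resident_per_entry = resident / entries;  /* ~120 bytes */

        printf("payload per entry:  ~%.0f bytes\n", payload_per_entry);
        printf("resident per entry: ~%.0f bytes\n", resident_per_entry);
        printf("overhead per entry: ~%.0f bytes\n",
               resident_per_entry - payload_per_entry);
        return 0;
    }

So the per-entry overhead works out to roughly 110 bytes, consistent with the ~100 bytes/key figure mentioned elsewhere in the thread.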


If you turn a txt file with a list of "common surname -> percentage of population" pairs into a binary tree, it will get more or less an order of magnitude bigger in memory than the raw txt file.

This is a common pattern: when you add a lot of metadata for fast access, memory management, "zero-copy" transmission of the information, expires, and so on, the size is not going to be what you'd get by concatenating all the data into a single string.

If we want, we can save more memory per key, with some tradeoffs. Example: instead of having a hash table where every hash slot is a pointer (holding the key) -> pointer (holding the value) map, we can take all the keys colliding into a slot and store them as a single packed string: <len>key<ptr><len>key<ptr>.... Then tune the hash table so that there are on average 5 or 10 collisions per hash slot.

That is: every entry is a length-prefixed key string followed by a fixed-size pointer to the value.
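
A minimal sketch of what looking up a key inside such a packed slot could look like (the one-byte length prefix and the zero terminator are assumptions for illustration, not an actual Redis layout):

    #include <string.h>

    /* Illustrative packed-slot layout: repeated
       [1-byte key length][key bytes][value pointer] entries,
       terminated by a zero length byte. */
    static void *slot_lookup(const unsigned char *slot,
                             const char *key, size_t keylen) {
        while (*slot != 0) {
            size_t len = *slot++;
            const unsigned char *k = slot;
            slot += len;

            void *val;
            memcpy(&val, slot, sizeof(void *)); /* unaligned pointer read */
            slot += sizeof(void *);

            if (len == keylen && memcmp(k, key, len) == 0)
                return val;                     /* found the colliding key */
        }
        return NULL;                            /* not in this slot */
    }

Each colliding key then costs its own bytes, one length byte, and one pointer, instead of a separate heap-allocated entry with two pointers plus malloc overhead.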

Maybe we'll do this in the future, and in Redis master we are already doing a lot of this on the "value" side of the business. So if you store a list of 10 elements it will use an amount of memory very similar to storing the same bytes as a comma-separated value.

But for now our reasoning is: it's not bad to be able to store 1 million keys in less than 200 MB of memory (100 MB on 32-bit systems) if an entry-level box can serve this data at a rate of 100k requests/second, including the networking overhead. And with hashes we get much better memory performance than with top-level keys. So... with a few GB our users can store tens or hundreds of millions of items in a Redis server.
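
One way for users to take advantage of that today is to bucket many small logical keys into a handful of hashes on the client side; a sketch of the mapping (the bucket size of 1000 is an arbitrary choice for illustration, not something Redis prescribes):

    #include <stdio.h>

    /* Map a numeric key to a hash bucket plus a field inside it, so
       roughly 1000 logical keys share one top-level Redis key:
           HSET bucket:<id> <field> <value>
           HGET bucket:<id> <field>                                   */
    static void key_to_bucket(long key,
                              char *bucket, size_t blen,
                              char *field,  size_t flen) {
        snprintf(bucket, blen, "bucket:%ld", key / 1000);
        snprintf(field,  flen, "%ld",        key % 1000);
    }

    int main(void) {
        char bucket[32], field[32];
        key_to_bucket(123456789L, bucket, sizeof bucket, field, sizeof field);
        printf("HSET %s %s 42\n", bucket, field); /* HSET bucket:123456 789 42 */
        return 0;
    }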

Better to focus on stability and clustering for now... once we have a system that can scale transparently, we'll look at memory overhead again and check whether it's still important to reduce it, or whether, since we want to be a very high performance DB, it's better to just accept the tradeoff.


Interesting post. Thank you.


Are you suggesting it's not worth testing the software if you can calculate what it should be able to handle? I don't think that's a good approach - software often behaves radically differently under extreme load for all manner of unpredictable reasons, so if you're going to use something in production it's a very good idea to run some real load tests first.


No, I'm not suggesting that at all. However in this case the test is so incredibly simplistic that it's virtually useless.

And yes, it's rare to use a system where you can't calculate up front exactly what the load will be, and the test is nothing more than a confirmation. Running it and then looking quizzically at the results is kind of shocking really.


Heh.

Sorry to shock you.

"Trust but verify" is often useful with infrastructure software. But clearly you have a different approach. No big deal.


Uhm, no. If you read that post you'd see that I was mainly interested in the per-key overhead. I'd seen it spoken of as being in the 100 bytes/key range but never saw confirmation of that. So I did some back of the envelope calculations and then decided to try it for real.


He could use binary keys instead (if his client library supports it)


Yep, but the main overhead is actually not the size of the key and value, but the metadata: the hash table, pointers, and the redisObject with fields like refcount and so forth.
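
Roughly where those bytes go, as a simplified sketch (the structs below are stand-ins for the real 2.0-era structures, with approximate field names and 64-bit sizes, not the actual Redis source):

    #include <stdio.h>

    /* Simplified stand-ins for the per-entry bookkeeping: a chained hash
       table entry, an object header for the value, and the length header
       in front of the key string. Sizes assume a 64-bit build. */
    struct entry  { void *key; void *val; struct entry *next; };        /* 24 bytes */
    struct object { unsigned type, encoding; int refcount; void *ptr; };/* 24 bytes */
    struct strhdr { long len, free; char buf[]; };                      /* 16 bytes */

    int main(void) {
        size_t fixed = sizeof(struct entry)
                     + sizeof(struct object)
                     + sizeof(struct strhdr);
        printf("fixed structure overhead: ~%zu bytes per key\n", fixed);
        /* Add the key bytes themselves, malloc overhead on each of these
           allocations, and the hash table's bucket pointer, and the real
           figure ends up past 100 bytes per key. */
        return 0;
    }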


Stress testing is vital to account for the harsh conditions in which most deployments, both hardware and software, operate.



