Hacker News
Instagram: Making the Switch to Cassandra from Redis (planetcassandra.org)
77 points by dikbrouwer on June 8, 2013 | 27 comments



I think the approach here is very wise: whenever a problem involves a large amount of data, Redis may be costly, since it stores data in memory. Sometimes this is unavoidable: for instance, I could see Twitter switching from Redis to another in-memory store to cache timelines, but hardly to something disk-based. So it is perfectly legitimate to say: I'm running a number of hosts because my memory requirements are high, so let's try a cheaper on-disk solution.

That said, memory is also getting seriously cheap, and the real limit may be running on EC2 or similar platforms.

So sometimes the cost of RAM is a virtual cost imposed by the platform you run your services on, rather than the actual cost of that amount of memory.


I'm not entirely sure what I'd search for so I figure I might as well ask here. When a key won't fit in redis, does redis remove the least active key(s) to make room? And is there a way to persist the removed keys to disk to possibly be retrieved at a later date? That way the most active keys remain in memory while the least active keys are still available.


I haven't actually experienced this, so someone correct me if I am wrong: you can choose from different policies (LRU, TTL, random) to evict keys once maxmemory is reached; otherwise, if you leave maxmemory disabled, Redis will start using the OS virtual memory (swap) and slow down.

Redis had a virtual-memory option like you describe at some point, but it was scrapped due to poor performance and other issues.
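
For anyone who wants to see the LRU idea in miniature, here is a toy sketch in Python. It is not how Redis implements eviction: Redis limits memory in bytes via maxmemory and evicts an approximation of the least recently used key by sampling, while this sketch caps the number of keys and tracks exact order. The class name `LRUCache` and the `max_keys` parameter are made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy key-value store that evicts the least recently used key when full."""

    def __init__(self, max_keys):
        self.max_keys = max_keys      # hypothetical cap, counted in keys
        self.data = OrderedDict()     # oldest entry first, newest last
        self.evicted = []             # kept only so the example can show what was dropped

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)    # a read marks the key as recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        while len(self.data) > self.max_keys:
            old_key, _ = self.data.popitem(last=False)  # drop the oldest entry
            self.evicted.append(old_key)


cache = LRUCache(max_keys=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")       # "a" is now more recently used than "b"
cache.set("c", 3)    # over the cap, so "b" is evicted
```

Note that the evicted key is simply gone here, which matches the eviction behavior described above: nothing is written to disk for later retrieval.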


Ah LRU and virtual memory are the keywords I was looking for. I couldn't remember! Thanks!


The cost of VM RAM is bounded below by the cost of the physical RAM used to provide it, unless it's subsidized by costs elsewhere.


Just to be clear -- this doesn't mean we have eliminated Redis. It's still a very useful tool.


I think that's a very important point. It's not saying that X is better than Y, it's just for these particular use cases, one tool is better suited than the other tool because of which problems they were designed to tackle.

Awesome interview and kudos to whoever recommended the switch, that's a ton of savings you guys made!


The article compares Redis to Cassandra, which means comparing apples to oranges for their use case. It would have been more meaningful if they had explained why they chose Cassandra instead of, for example, HBase.


I was thinking the same thing.

"We are very happy with the hammer we are currently using. Previously, we were using pliers to hammer nails and we saw that it wasn't very efficient."


According to the article, the interviewee used to work at DataStax (a leader in Cassandra) before joining Instagram. I guess this is one of the reasons why they first considered Cassandra, and it turned out to be very successful. Biased or not, it is always good to have an engineer with a deep understanding of the database, rather than just an ordinary user.


Exactly the point I was thinking


Interesting that Instagram is still (at least partially) on EC2 rather than in Facebook's datacenters.


I'd tend to think if it ain't broke don't fix it.


I wonder if being part of Facebook (from which Cassandra originated) has anything to do with the choice for Cassandra over other databases.


Apparently Facebook is not using Cassandra much. For their messaging project they ended up with HBase instead of Cassandra [1]. I don't know if the system replaced the earlier Cassandra system, or complemented it.

[1] http://nosql.mypopescu.com/post/1583884165/facebook-the-unde...



I assure you it did not.


I'm interested if @antirez has thought of extending the (admittedly fun) redis api to a disk-backed version for when you don't need the performance of in-memory, but appreciate the simplicity of the redis api.

Maybe be able to assign a specific DB number to disk?

I realize that might be kitchen sinking Redis more than he intends, but it might be worth it.
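
To make the "assign a DB number to disk" idea concrete, here is a hypothetical sketch in Python. Nothing like this exists in Redis; `TinyStore`, `disk_dbs`, and the file naming are invented for illustration, and the standard-library `shelve` module stands in for a real storage engine.

```python
import os
import shelve
import tempfile


class TinyStore:
    """Hypothetical store: numbered DBs, some held in RAM and some on disk."""

    def __init__(self, disk_dbs, directory):
        self.disk_dbs = set(disk_dbs)   # DB numbers that should live on disk
        self.directory = directory
        self.dbs = {}                   # DB number -> dict or open shelf

    def _db(self, number):
        if number not in self.dbs:
            if number in self.disk_dbs:
                path = os.path.join(self.directory, "db%d" % number)
                self.dbs[number] = shelve.open(path)   # persistent mapping
            else:
                self.dbs[number] = {}                  # plain in-memory dict
        return self.dbs[number]

    def set(self, number, key, value):
        self._db(number)[key] = value

    def get(self, number, key):
        return self._db(number).get(key)

    def close(self):
        for db in self.dbs.values():
            if hasattr(db, "close"):    # only the disk-backed shelves need closing
                db.close()


with tempfile.TemporaryDirectory() as tmp:
    store = TinyStore(disk_dbs=[1], directory=tmp)
    store.set(0, "session", "in RAM")
    store.set(1, "archive", "on disk")
    store.close()

    # Simulate a restart: a fresh object pointed at the same directory.
    reopened = TinyStore(disk_dbs=[1], directory=tmp)
    ram_after_restart = reopened.get(0, "session")    # RAM-only data is gone
    disk_after_restart = reopened.get(1, "archive")   # disk-backed data survives
    reopened.close()
```

The interface stays the same for both kinds of DB, which is the appeal of the idea; the trade-off is that the disk-backed DB can no longer promise in-memory latency.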


As others have noted, this capability was intentionally removed. From http://redis.io/topics/virtual-memory

"Redis VM is now deprecated. Redis 2.4 will be the latest Redis version featuring Virtual Memory (but it also warns you that Virtual Memory usage is discouraged). We found that using VM has several disadvantages and problems. In the future of Redis we want to simply provide the best in-memory database (but persistent on disk as usual) ever, without considering at least for now the support for databases bigger than RAM. Our future efforts are focused into providing scripting, cluster, and better persistence."


I assume Instagram was a serious user of Redis. If you can switch from Redis to Cassandra in a matter of days, even after presumably years of Redis-usage being ground in your code base (because no matter how much you try to abstract out your fundamental data store, it always comes poking out in your architecture), it's probably not useful for Redis to try to out-Cassandra Cassandra. Let Redis be the best Redis it can be, and let Cassandra be the best Cassandra it can be.


We are still a very serious user of Redis, and I agree with this. Redis works well as a persisted networked heap, which is very different from a database. IMHO Redis would have to trade off a lot of its value to become a "real database."


The ArDb project might fill this gap: "A redis-protocol compatible persistent storage server, support LevelDB/KyotoCabinet/LMDB as storage engine." https://github.com/yinqiwen/ardb


IIRC, antirez experimented with that for a version or two before deciding it wasn't right for redis, deprecating then removing those added features.


My understanding is that being single threaded and assuming that every operation will complete very quickly is how Redis sidesteps a lot of complexity around atomicity and transactions. This is probably why swap as an option was quickly removed.
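
A rough illustration of that point, as a sketch rather than Redis's actual design (Redis runs an event loop in one thread; here a queue and one worker thread play that role). All names are hypothetical. Because only the worker ever touches `store`, an increment needs no lock, but it also means one slow command, such as one waiting on disk, would hold up every other client.

```python
import queue
import threading

store = {}                  # touched only by the worker thread
commands = queue.Queue()    # clients enqueue commands here


def worker():
    # Commands are applied strictly one at a time, so each is atomic by construction.
    while True:
        cmd = commands.get()
        if cmd is None:     # sentinel: shut down
            break
        op, key = cmd
        if op == "INCR":
            store[key] = store.get(key, 0) + 1


def client(n):
    for _ in range(n):
        commands.put(("INCR", "hits"))


w = threading.Thread(target=worker)
w.start()

clients = [threading.Thread(target=client, args=(1000,)) for _ in range(4)]
for c in clients:
    c.start()
for c in clients:
    c.join()

commands.put(None)          # all commands are queued; tell the worker to stop
w.join()
```

After the worker finishes, `store["hits"]` equals the total number of increments submitted, with no locking anywhere in the command logic.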


I've been working on a server that implements the Redis protocol and stores the data in LevelDB: https://github.com/cupcake/setdb
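
For context on what "implements the Redis protocol" involves: clients send each command as an array of length-prefixed bulk strings, terminated by CRLF. Below is a minimal Python sketch of encoding and decoding one command. It is not taken from setdb; the function names are made up, and it handles only a single complete command with no partial reads or inline commands.

```python
def encode_command(*args):
    """Encode a command as a Redis protocol array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n" % len(data))   # length prefix
        parts.append(data + b"\r\n")           # payload
    return b"".join(parts)


def decode_command(buf):
    """Parse one complete command; returns its arguments as a list of bytes."""
    if not buf.startswith(b"*"):
        raise ValueError("expected an array header")
    end = buf.index(b"\r\n")
    count = int(buf[1:end])
    pos = end + 2
    args = []
    for _ in range(count):
        if buf[pos:pos + 1] != b"$":
            raise ValueError("expected a bulk string header")
        end = buf.index(b"\r\n", pos)
        length = int(buf[pos + 1:end])
        start = end + 2
        args.append(buf[start:start + length])  # read by length, so payloads are binary-safe
        pos = start + length + 2                # skip the payload and its trailing CRLF
    return args


wire = encode_command("SET", "foo", "bar")
parsed = decode_command(wire)
```

A server like the ones linked in this thread would parse commands this way and then dispatch them to its own storage engine instead of an in-memory structure.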



http://github.com/inaka/edis is here now for that purpose



