Actually a lot of people overestimate the amount of data they have. Btw, there are a few solutions:
* Given that RAM is cheap, use a Linux box with 32 or 64 GB of RAM. After all, with the performance of Redis a single box is enough where other solutions would need more than one server. It's hard to saturate 150,000 writes or reads per second even with a lot of users.
* Split data across different servers (using application-level partitioning or consistent hashing).
* Wait for the Redis VM (virtual memory) implementation.
Basically the plan is to implement something like what operating systems already do with memory pages. More information (and why we can't just let the OS do the work for us) is in the Redis FAQ at http://code.google.com/p/redis/wiki/FAQ
Search for "Do you plan to implement Virtual Memory in Redis? Why don't just let the Operating System handle it for you?".
Yes, I think we had this discussion before but I'll repeat my concerns:
Given that RAM is cheap, use a Linux box with 32 or 64 GB of RAM. After all, with the performance of Redis a single box is enough where other solutions would need more than one server. It's hard to saturate 150,000 writes or reads per second even with a lot of users.
For many apps even 64G is small. The problem is less the height of the ceiling than the fact that a ceiling exists - and that redis effectively stops working when it's reached. I agree that there are applications where redis is a perfect fit, but for many others this limit is a serious problem. Also note that many projects, especially those just starting out, simply can't afford to start with 64G servers. 4-8G is a more realistic scale to assume, and that will fill up faster than anyone likes.
FWIW, the EC2 high-memory instance (64G) you cite in your FAQ goes for a swift $1728 USD/month. Translated to English that means: redis is pretty much out of the question for cloud-based apps, because RAM is expensive in the cloud and the normal instance types (<$300 USD/month) top out at 4G.
Split data across different servers (using application-level partitioning or consistent hashing).
Sharding is always worthwhile for scaling, but it's a fairly delicate subject (esp. the rebalancing after adding/removing a shard and queries spanning multiple shards) and I haven't seen a shrink-wrapped solution for redis yet.
So, while an option, most people will probably rather use a competing product (e.g. MongoDB) before opening that can of worms.
Wait for the Redis VM (virtual memory) implementation.
Yup, that would be me. Once the RAM limitation is gone, redis will suddenly become very interesting to me.
There's no reason to store all of your application's data in Redis. If you're dealing with large files (photos / video for example) you can use S3 for the actual bytes, and just store the metadata in Redis. 512 bytes of metadata per item means 10,000,000 items would fit in about 4.8GB of RAM.
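As a quick sanity check on that arithmetic (the 512-byte average is an assumption; measure what your real metadata needs):

```python
# Back-of-the-envelope sizing for the metadata-in-Redis approach.
# per_item_bytes is an assumed average metadata size, not a Redis constant.
per_item_bytes = 512
items = 10_000_000

total_gib = per_item_bytes * items / 2**30
print(f"{total_gib:.1f} GiB")  # → 4.8 GiB
```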
For many apps even 64G is small. The problem is less the height of the ceiling than the fact that a ceiling exists - and that redis effectively stops working when it's reached. I agree that there are applications where redis is a perfect fit, but for many others this limit is a serious problem. Also note that many projects, especially those just starting out, simply can't afford to start with 64G servers. 4-8G is a more realistic scale to assume, and that will fill up faster than anyone likes.
Yes, I understand these concerns, and this is why I'm trying to address them with virtual memory and redis-cluster (a proxy that takes care of fault-tolerant consistent hashing). But my point is that there is also a cultural barrier around these issues. With 8 GB of RAM, especially if you are starting up and you take care to select a data layout that is cheap, there is a lot of data you can put in memory. Another interesting alternative is to put only "hot" data (metadata) in Redis, and use another on-disk DB for the rest.
FWIW, the EC2 high-memory instance (64G) you cite in your FAQ goes for a swift $1728 USD/month.
High-performance DBs and EC2 are IMHO not a great fit. Not only is RAM expensive, but memory bandwidth is also not optimal.
EC2 is just expensive from every angle you look at it. It's not a problem just with Redis, but also with MySQL performance, as memory is crucial to make MySQL work well.
Sharding is always worthwhile for scaling, but it's a fairly delicate subject (esp. the rebalancing after adding/removing a shard and queries spanning multiple shards) and I haven't seen a shrink-wrapped solution for redis yet.
Application-level partitioning is a good option, I think. It's easy to manage: for instance, you put your users in one instance, their blog posts in another, and comments in another one. Are you small and low-traffic at the start and can't buy three hosts? Just run three Redis instances on the same box, and move them to different hosts as you grow.
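The scheme described above boils down to a routing table in your application. A minimal sketch (hosts and ports here are made-up placeholders, and a real client would open a Redis connection to each address):

```python
# Application-level partitioning: each logical dataset lives in its own
# Redis instance. Start with three instances on one box; growing means
# pointing an entry of this table at a new host.
INSTANCES = {
    "users":    ("localhost", 6379),
    "posts":    ("localhost", 6380),
    "comments": ("localhost", 6381),
}

def instance_for(dataset):
    """Return the (host, port) of the Redis instance holding this dataset."""
    return INSTANCES[dataset]

# Moving a dataset to its own host later is a one-line config change, e.g.:
# INSTANCES["posts"] = ("posts1.internal", 6379)
```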
So, while an option, most people will probably rather use a competing product (e.g. MongoDB) before opening that can of worms.
MongoDB and Redis are very different products. If MongoDB is a good fit for your application, use it, it's great. But if you need Redis, MongoDB is not a drop-in replacement in any way IMHO.
Yup, that would be me. Once the RAM limitation is gone, redis will suddenly become very interesting to me.
This will only work well if the data access pattern is biased, btw. And virtual memory will not completely remove the dataset size limitation. For instance, if you have 2 GB of RAM it can make sense to set up VM to hold up to 32 GB of data, given a biased enough access pattern, but it's not like it will work well with 1 TB of data.
So, in short, I think: most of the in-memory barrier is cultural. There are solutions to distribute data among different servers that are not hard to implement and maintain. In every kind of DB the memory should be proportional to the dataset for it to scale, and with some kind of evenly distributed data access pattern you need to keep everything in memory anyway.
Also, the fact that in Redis writes are as cheap as reads is not something to forget. There are many applications where it will be much more viable to buy more RAM than to deal with scaling concerns around writes.
Redis is not for everything, but I think there is a domain of applications where it is a very good fit.
Another interesting alternative is to put only "hot" data (metadata) in Redis, and use another on-disk DB for the rest.
Yup, that's how I (still) see redis at the moment: more a persistent cache than a primary datastore.
EC2 is just expensive from every angle you see it
Not really, but that's a different story (scaling out vs up etc.). In general redis as of now is mostly geared towards a "scale-up" approach whereas cloud deployments naturally need to "scale-out" instead.
Application-level partitioning is a good option, I think. It's easy to manage: for instance, you put your users in one instance, their blog posts in another, and comments in another one. Are you small and low-traffic at the start and can't buy three hosts? Just run three Redis instances on the same box, and move them to different hosts as you grow.
Yes, and that's exactly the can of worms you don't want to get into.
What happens when one of your comment/user/post instances outgrows its host? You have to split it further, either logically again (users A-L on host1, M-Z on host2) or anonymously (even users on host1, odd users on host2).
Since nobody wants to constantly think about their data layout, the latter variant is definitely preferable.
MongoDB has a leg up here by providing a beta of anonymous, maintenance-free sharding already.
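The "anonymous" variant is typically just hashing the user id over the shard count, and a quick sketch also shows why rebalancing hurts: changing the shard count remaps most keys. (The user-id naming is illustrative, not from any real system.)

```python
import hashlib

def shard_for(user_id, shards):
    """Map a user id to a shard by hashing, so nobody has to think about layout."""
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return h % shards

# The rebalancing pain in one number: going from 2 to 3 shards moves
# roughly two thirds of users to a different shard.
moved = sum(
    shard_for(f"user{i}", 2) != shard_for(f"user{i}", 3) for i in range(1000)
)
```

Consistent hashing exists precisely to shrink that "moved" fraction, but wiring it up with failover is the hard part.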
This will only work well if the data access pattern is biased, btw. And virtual memory will not completely remove the dataset size limitation. For instance, if you have 2 GB of RAM it can make sense to set up VM to hold up to 32 GB of data, given a biased enough access pattern, but it's not like it will work well with 1 TB of data.
That sounds bad. Very bad. Not for the people who are happy with redis today, but for those who are not touching it because of that constraint.
There are solutions to distribute data among different servers that are not hard to implement and maintain.
Okay, I'll bite. Where is the turnkey solution that distributes my data over n redis-instances with automatic failover, automatic rebalancing after adding/removing an instance, n-copies for redundancy, reliable failure modes when an instance outgrows the available memory?
I would say without at least some of these features a redis-cluster could become a nightmare to maintain in the long run.
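To be fair, the consistent-hashing piece of such a solution is small on its own; it's the failover, rebalancing, and redundancy around it that are hard. A bare ring sketch (instance names are hypothetical):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring: adding or removing a node only remaps the keys
    that land on that node, not the whole keyspace."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas      # virtual nodes per instance, for balance
        self._ring = []               # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}:{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key):
        points = [p for p, _ in self._ring]
        idx = bisect.bisect(points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Note that none of this handles automatic failover, redundant copies, or the out-of-memory failure mode, which is exactly the point above.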
I would say without at least some of these features a redis-cluster could become a nightmare to maintain in the long run.
Indeed, this is very helpful, and it's exactly what redis-cluster will do. You talk to redis-cluster, and it will talk to the other N Redis instances, handling faults, adding or removing nodes, and so forth.
It's like "mongos" process basically.
But the roadmap is to implement virtual memory first and redis-cluster later, as I think virtual memory is a more promptly available solution to start with, and it works with applications designed to run on a single Redis server (think of SORT BY, set intersections, and so forth).
It's something like two releases away. The next release will implement the Hash type, needed for virtual memory, so that applications can be designed to put data that is accessed with similar patterns in the same hash.
HGET, HSET, HEXISTS, and HDEL are all I will need to implement genhash on top of Redis. Perfect.
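Those four commands are a complete minimal hash API. To illustrate their semantics, here is a toy in-memory model (this is not Redis, just the contract a client like genhash would rely on; the return-value conventions match the real commands):

```python
# Toy model of Redis hash commands HSET/HGET/HEXISTS/HDEL.
# store maps key -> {field: value}, like one Redis hash per key.
store = {}

def hset(key, field, value):
    """Return 1 if the field was created, 0 if an existing field was overwritten."""
    h = store.setdefault(key, {})
    created = 0 if field in h else 1
    h[field] = value
    return created

def hget(key, field):
    """Return the value, or None if the key or field is missing."""
    return store.get(key, {}).get(field)

def hexists(key, field):
    return field in store.get(key, {})

def hdel(key, field):
    """Return 1 if the field existed and was deleted, 0 otherwise."""
    return 1 if store.get(key, {}).pop(field, None) is not None else 0
```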
Tonight we're rolling out yet another Redis application: a backend "session database" for a server farm. It's only 2-3x slower than shared memory, and just about the same speed as message passing.
Oh, now I understand: if I give you this basic API you can implement genhash! Ok, got it.
Very happy to read you are rolling out Redis servers; it is cool to see such great community feedback just 9 months after the first line of code was written :)