I want to share this with you as I think it's a game changer for everybody who looked at Redis, didn't like the snapshotting because their data is important, and decided to discard it from the viable alternatives. I also hope the article is somewhat informative as a side effect.
p.s. version 1.1 is currently in beta, the feature is available on Git, a stable version (rc1) will be released at the end of this year.
Will the client libraries be updated in tandem with the 1.1 release? Especially the PHP extension as it seems to be missing SETNX as far as I can tell.
Hello pierrefar. Yes, the most important client libs will support (many already do) the 1.1 sorted sets, the MSET and MSETNX commands (set multiple keys atomically in a single command), and the other features once 1.1 is released.
But about PHP, I've good news, there are two new fully featured high quality implementations of the Redis protocol for PHP:
Also the PHP C module got two new developers and is now much more stable, supporting the full 1.0 protocol AFAIK: http://github.com/owlient/phpredis
So the client libs arena is getting better and better, fortunately. Other good quality client libs exist for Ruby, Java, and Go. Python is getting better with time too.
I'm actually writing a CMS using Redis and absolutely need SETNX (which one current PHP-only lib supports well). I would like to move to a compiled PHP module so that I get a nice performance boost.
Coupled with this new log-based persistence, I'll be more comfortable with the whole set up.
If you want to talk more, my email is hello at (my username).com.
And the fun continues! Gonna upgrade to latest and start supporting the 1.1 API in cl-redis tomorrow. Win32 builds are greatly appreciated. Contacted the OHM developer yesterday in #ohm on freenode; will have a chat and see if we can make that into a language-independent protocol (I have been making the same thing for Common Lisp for some time now.)
Great, you can find me on #Redis (freenode) as well if I can help. About Win32: Redis compiles without problems on Cygwin but at least two tests are failing (it's about INCR with 64 bit values, strtol() is not working well maybe?).
Actually a lot of people overestimate the amount of their data. Btw there are a few solutions:
* given that RAM is cheap, use a Linux box with 32 or 64 GB of RAM. After all, with the performance of Redis you need a single box where other solutions need more than one server. It's hard to saturate 150,000 writes or reads per second even with a lot of users.
* to split data across different servers (using application-level partitioning or consistent hashing).
* to wait for Redis VM implementation (virtual memory).
Basically the plan is to implement something like what operating systems already do with memory pages. More information (and why we can't just let the OS do the work for us) is in the Redis FAQ at http://code.google.com/p/redis/wiki/FAQ
Search for "Do you plan to implement Virtual Memory in Redis? Why don't just let the Operating System handle it for you?".
Yes, I think we had this discussion before but I'll repeat my concerns:
given that RAM is cheap, use a Linux box with 32 or 64 GB of RAM. After all, with the performance of Redis you need a single box where other solutions need more than one server. It's hard to saturate 150,000 writes or reads per second even with a lot of users.
For many apps even 64G is small. The problem is less about the height of the ceiling than about the fact that a ceiling exists - and that redis effectively stops working when it's reached. I agree that there are applications where redis is a perfect fit, but for many others this limit is a serious problem. Also note that many projects, especially those just starting out, simply can't afford to start with 64G servers. 4-8G is a more realistic scale to assume and that will fill up faster than anyone likes.
FWIW, the EC2 high-memory instance (64G) you cite in your FAQ goes for a swift $1728 USD/month. Translated to English that means: redis is pretty much out of the question for cloud-based apps because RAM is expensive in the cloud and the normal instance types (<$300 USD/month) top out at 4G.
to split data across different servers (using application-level partitioning or consistent hashing).
Sharding is always worthwhile for scaling but it's a fairly delicate subject (esp. the rebalancing after add/remove of a shard and queries spanning multiple shards) and I haven't seen a shrinkwrapped solution for redis, yet.
So, while an option, most people will probably rather use a competing product (e.g. MongoDB) before opening that can of worms.
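To make the rebalancing concern concrete, here is a minimal sketch of a consistent-hash ring in Python. It is only an illustration under stated assumptions: the node names (redis-a and so on), the replica count and the key format are all made up, and this is not an existing Redis tool. It shows the property that makes consistent hashing attractive: adding a node moves only the keys that land on the new node.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable hash so that placement does not change between runs.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; node names are hypothetical labels."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._points = []   # sorted hash positions on the ring
        self._owner = {}    # hash position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node gets several virtual points to even out the load.
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key):
        # First point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["redis-a", "redis-b", "redis-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add("redis-d")
after = {k: ring.node_for(k) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved after adding a fourth node")
```

The sketch only decides where a key lives. Actually copying the moved keys, and answering queries that span several shards, is the part that still has to be built by hand.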
to wait for Redis VM implementation (virtual memory).
Yup, that would be me. Once the RAM-limitation is gone redis will suddenly become very interesting to me.
There's no reason to store all of your application's data in Redis. If you're dealing with large files (photos / video for example) you can use S3 for the actual bytes, and just store the metadata in Redis. 512 bytes of metadata per item means 10,000,000 items would fit in about 4.8GB of RAM.
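A quick way to sanity-check this kind of sizing estimate is to multiply it out. The short Python sketch below does that for two illustrative per-item sizes; the function name is made up, and the result covers only the raw payload, ignoring whatever per-key overhead the server adds.

```python
def dataset_size_gib(items: int, bytes_per_item: int) -> float:
    """Raw payload size in GiB, ignoring any per-key overhead."""
    return items * bytes_per_item / 2**30


for bytes_per_item in (512, 4096):
    size = dataset_size_gib(10_000_000, bytes_per_item)
    print(f"{bytes_per_item} bytes/item -> {size:.1f} GiB")
```

The total scales linearly with the per-item size, so it is worth measuring a real record before deciding whether the metadata fits in RAM.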
For many apps even 64G is small. The problem is less about the height of the ceiling than about the fact that a ceiling exists - and that redis effectively stops working when it's reached. I agree that there are applications where redis is a perfect fit, but for many others this limit is a serious problem. Also note that many projects, especially those just starting out, simply can't afford to start with 64G servers. 4-8G is a more realistic scale to assume and that will fill up faster than anyone likes.
Yes, I understand these concerns, and this is why I'm trying to address them with virtual memory and redis-cluster (a proxy that takes care of handling fault-tolerant consistent hashing). But my point is that there is also a cultural barrier around these issues. With 8 GB of RAM, especially if you are starting up and take care to select a cheap data layout, there is a lot of data you can put in memory. Another interesting alternative is to put only "hot" data (metadata) on Redis, and use another on-disk DB for the rest.
FWIW, the EC2 high-memory instance (64G) you cite in your FAQ goes for a swift $1728 USD/month.
High performance DBs and EC2 IMHO are not a great fit. Not only is RAM expensive, but memory bandwidth is also not optimal.
EC2 is just expensive from every angle you see it. It's not a problem just with Redis, but also with MySQL performance, as memory is crucial to making MySQL work well.
Sharding is always worthwhile for scaling but it's a fairly delicate subject (esp. the rebalancing after add/remove of a shard and queries spanning multiple shards) and I haven't seen a shrinkwrapped solution for redis, yet.
Application level partitioning is a good option I think. It's easy to manage: for instance you put your users in one instance, their blog posts in another, and comments in another one. Are you small and low traffic at the start and can't buy three hosts? Just use three Redis instances on the same box, and move them to different hosts as you grow.
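As a rough sketch of what that looks like in application code (Python, with made-up addresses and a hypothetical "type:id" key naming scheme), the routing can be as small as a lookup table:

```python
# Hypothetical addresses: three instances on one box, told apart by port.
INSTANCES = {
    "user": ("127.0.0.1", 6379),
    "post": ("127.0.0.1", 6380),
    "comment": ("127.0.0.1", 6381),
}


def instance_for(key: str):
    """Pick the instance from the key's type prefix, e.g. 'user:42'."""
    kind = key.split(":", 1)[0]
    try:
        return INSTANCES[kind]
    except KeyError:
        raise ValueError(f"no instance configured for key type {kind!r}") from None


print(instance_for("user:42"))
print(instance_for("comment:9001"))

# Growing later means editing one entry; the keys themselves do not change.
INSTANCES["comment"] = ("10.0.0.3", 6379)
print(instance_for("comment:9001"))
```

The table only redirects clients. The data itself still has to be copied to the new host before the entry is changed.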
So, while an option, most people will probably rather use a competing product (e.g. MongoDB) before opening that can of worms.
MongoDB and Redis are very different products. If MongoDB is a good fit for your application, use it, it's great. But if you need Redis, MongoDB is not a drop-in replacement in any way IMHO.
Yup, that would be me. Once the RAM-limitation is gone redis will suddenly become very interesting to me.
This will only work well if the data access pattern is biased, btw. And virtual memory will not completely remove the dataset size limitation. For instance, if you have 2 GB of RAM it will make sense to set up VM to hold up to 32 GB of data, given a biased enough access pattern, but it's not like it will work well with 1 TB of data.
So in short I think most of the in-memory barrier is cultural. There are solutions to distribute among different servers that are not hard to implement and maintain. In every kind of DB the memory should be proportional to the dataset for it to scale, and with an evenly distributed data access pattern you need to keep everything in memory anyway.
Also the fact that in Redis writes are as cheap as reads is not something to forget. There are many applications where it will be much more viable to have more RAM than to deal with scaling concerns with writes.
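A small simulation shows why the access pattern matters so much. The Python sketch below is only an illustration: it uses a plain LRU cache as a stand-in for "what stays in RAM" (an assumption, not a description of how Redis VM will choose what to swap), keeps 1/16 of the keys in memory to mirror the 2 GB vs 32 GB example, and the 90%/5% hot-set numbers are made up.

```python
import random
from collections import OrderedDict


def hit_rate(accesses, capacity):
    """Fraction of accesses served from an LRU cache of the given size."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for key in accesses:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / total


rng = random.Random(42)
n_keys = 16_000
capacity = n_keys // 16          # "2 GB of RAM for 32 GB of data"
n_accesses = 200_000

uniform = [rng.randrange(n_keys) for _ in range(n_accesses)]

# Biased: 90% of accesses go to a hot set of 5% of the keys (assumed numbers).
hot = n_keys // 20
biased = [
    rng.randrange(hot) if rng.random() < 0.9 else rng.randrange(n_keys)
    for _ in range(n_accesses)
]

print(f"uniform hit rate: {hit_rate(uniform, capacity):.2f}")
print(f"biased hit rate:  {hit_rate(biased, capacity):.2f}")
```

With uniform access almost every request misses memory, while the biased workload is served mostly from RAM.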
Redis is not for everything, but I think there is a domain of applications where it is a very good fit.
Another interesting alternative is to put only "hot" data (metadata) on Redis, and use another on-disk DB for the rest.
Yup that's how I (still) see redis at the moment, more a persistent cache than a primary datastore.
EC2 is just expensive from every angle you see it
Not really, but that's a different story (scaling out vs up etc.). In general redis as of now is mostly geared towards a "scale-up" approach whereas cloud deployments naturally need to "scale-out" instead.
Application level partitioning is a good option I think. It's easy to manage: for instance you put your users in one instance, their blog posts in another, and comments in another one. Are you small and low traffic at the start and can't buy three hosts? Just use three Redis instances on the same box, and move them to different hosts as you grow.
Yes, and that's exactly the can of worms you don't want to get into.
What happens when one of your comment/user/post instances outgrows its host? You have to split it further. Either logically again (users A-L on host1, M-Z on host2) or anonymously (even users on host1, odd users on host2).
Since nobody wants to constantly think about their data-layout the latter variant is definitely preferable.
MongoDB has a leg up here by providing a beta of anonymous, maintenance-free sharding already.
This will only work well if the data access pattern is biased, btw. And virtual memory will not completely remove the dataset size limitation. For instance, if you have 2 GB of RAM it will make sense to set up VM to hold up to 32 GB of data, given a biased enough access pattern, but it's not like it will work well with 1 TB of data.
That sounds bad. Very bad. Not for the people who are happy with redis today, but for those who are not touching it because of that constraint.
There are solutions to distribute among different servers that are not hard to implement and maintain.
Okay, I'll bite. Where is the turnkey solution that distributes my data over n redis-instances with automatic failover, automatic rebalancing after adding/removing an instance, n-copies for redundancy, reliable failure modes when an instance outgrows the available memory?
I would say without at least some of these features a redis-cluster could become a nightmare to maintain in the long run.
I would say without at least some of these features a redis-cluster could become a nightmare to maintain in the long run.
Indeed, this is very helpful, and it's exactly what redis-cluster will do. You talk to redis-cluster, and it will talk to N other Redis instances, handling faults, adding or removing nodes, and so forth.
It's basically like the "mongos" process.
But the roadmap is to implement virtual memory first and redis-cluster later, as I think that virtual memory is a more promptly available solution to start with, and it works with applications designed to run against a single Redis server (think of SORT BY, set intersections and so forth).
It's something like two releases away. The next release will implement the Hash type, needed for virtual memory so that applications can be designed to put data that is accessed with similar patterns in the same hash.
HGET, HSET, HEXISTS, and HDEL are all I will need to implement genhash on top of Redis. Perfect.
Tonight we're rolling out yet another Redis application, as a backend "session-database" for a server farm. It's only 2-3x slower than shared memory, and just about the same speed as message passing.
Oh, now I understand, if I give you this basic api you can implement genhash! Ok got it.
Very happy to read you are rolling out Redis servers. It is cool to see such great community feedback just 9 months after the first line of code was written :)
I am extremely excited about Redis, however its data size limit as the amount of available RAM is a big showstopper for me. Even when the available RAM in entry level servers is getting larger and larger (however in most cases it is <512 MB for entry level VPSes) and 64 GB servers are easily available, the thought that it holds all data in RAM is not very comforting. Why keep all the data (even the pieces which are seldom used) in RAM? Couldn't there be a functionality (like TC) which flushes all old data to disk from RAM and makes room in RAM for new data?
In the meantime, I've found it's not too difficult to work around the RAM limitation. I'm working with pretty small servers, and I just use another database (MySQL usually) as a datastore for infrequently accessed data.
Good point. People may wonder if this is the same as using a caching solution like memcached, or using Redis itself just like a cache. It's very different for a couple of very important reasons: 1) by splitting data by access priority there is no need to handle caching and invalidation; it's just two DBs for two kinds of data, both consistent. 2) It's possible to run complex queries on the Redis side, for instance sorted sets are invaluable for taking elements in order even when there are frequent score updates.
Yep, this is exactly the plan :) But actually it will be a good idea to use such a feature only if data access is not evenly distributed across the dataset, otherwise it's simply better to buy more RAM or more servers IMHO.
IMHO, that kind of plan (flush old data to disk and keep new data) is too much work and likely to hurt both performance and durability. I'd like to see redis adding more features for production readiness such as this one (the append-only log file). The RAM usage really is not a big deal since most people have their own dedicated server and can easily add more RAM.
Another question about the append-only log file rewrite strategy. Is there a possibility of data loss if the filesystem fails to recover after the last power-off? I mean, if you rewrite a file, there is always a possibility.
Hello, actually the rewrite happens against another file, and only when everything is ok does it get renamed atomically and the fd for the next appends switched.
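The write-elsewhere-then-rename pattern described here can be sketched in a few lines of Python. This is a generic illustration of the technique, not Redis's actual implementation: the function name, file names and log format are invented for the example.

```python
import os
import tempfile


def rewrite_log(path, lines):
    """Write a compacted log to a temp file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="rewrite-")
    try:
        with os.fdopen(fd, "w") as tmp:
            for line in lines:
                tmp.write(line + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the new file is on disk first
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        # The old log is untouched if anything fails before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # Subsequent appends go to the new file through a fresh descriptor.
    return open(path, "a")


with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "appendonly.log")
    with open(log_path, "w") as f:
        f.write("SET counter 1\nSET counter 2\nSET counter 3\n")
    with rewrite_log(log_path, ["SET counter 3"]) as log:
        log.write("SET counter 4\n")
    with open(log_path) as f:
        print(f.read())
```

Because the original file is never modified in place, a crash at any point leaves either the complete old log or the complete new one on disk.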
"Very good" compared to normal disks. RAM is still much faster. Btw a setup to try is to use Redis with an SSD disk as swap, and load it with more data than RAM can hold and let the OS swap to see how it performs.
Much of what makes redis differently useful from all the other database applications out there is based on the fact that it keeps all data in RAM. If you want a database that holds some data in memory and some on disk, there are dozens to choose from.
I hope that in adding disk-based storage, redis doesn't become like most of the other databases, where you pay a price for their disk-centric storage, even if you don't need or want it.
Read performance is the same of course, as the append only file is not touched. Write performance in the first tests appears to be about 70% of that with snapshotting (tested with redis-benchmark), but more tests are needed to really understand what the write speed hit is.