If you are just starting with Redis, beware that the entire dataset is held in memory, and memory is much more precious than disk. Don't throw everything you can at it just because it's fast.
ARDB[1] seems like a good choice if you want the convenience of Redis with disk persistence. It is essentially Redis backed by Facebook's RocksDB, and its Redis protocol support is pretty good.
When I've used Redis, I've turned disk persistence off. That makes sense in two cases: (a) consistent performance matters more than being able to recover the data (for example, the data expires or is consumed quickly and isn't needed long term), and for online operations failing over to a replicated slave is more robust than relaunching a single node and having it read data back from disk (which doesn't help if the machine itself dies); or (b) long-term persistence is handled by a separate layer that's written to independently, and the Redis instances are only used for fast online access (again with slaves set up to avoid losing the in-memory data entirely).
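For reference, here is a minimal sketch (Python with redis-py) of switching off both persistence mechanisms at runtime; in a real deployment you would more likely put the equivalent directives (save "" and appendonly no) in redis.conf. The host/port are placeholders.

    import redis

    r = redis.Redis(host="localhost", port=6379)

    r.config_set("save", "")          # disable RDB snapshots
    r.config_set("appendonly", "no")  # disable the append-only file (AOF)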
You can do SQL in pure memory too, but for the use cases I've had for Redis there's no need for a schema or firm transaction guarantees, so the setup and operational simplicity of Redis wins out.
I know many of you will be well aware of how Redis can be used. This is aimed at people like me (and many others) who have just entered the field.
The HyperLogLog command set has been added to Redis since this post was written. HyperLogLog is useful for estimating the number of unique X over a time period without needing to keep track of each individual X: for example, an easy way to estimate your daily active users.
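A hedged sketch of that daily-active-users idea using redis-py; the key names and user IDs are made up for illustration.

    import redis

    r = redis.Redis(decode_responses=True)

    # Record activity: one PFADD per user action, keyed by day.
    r.pfadd("dau:2024-05-01", "user:42", "user:7", "user:42")  # duplicates are fine

    # Estimate unique users for the day (~0.81% standard error, ~12 KB per key).
    print(r.pfcount("dau:2024-05-01"))

    # Merge several days to estimate weekly actives without storing every ID.
    r.pfmerge("wau:2024-w18", "dau:2024-05-01", "dau:2024-05-02")
    print(r.pfcount("wau:2024-w18"))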
Good use case explanation for Redis. But how would you do this for a production system with 100M users? If there are over 10M users logged on at the same time, would your cache be of size 10M * 5,000 IDs? Or would each user get an individual live cache of 5,000 IDs per session?
The leaderboard example makes more sense, since the number of top scorers stays the same throughout and the board can be updated in real time.
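For anyone new to this, a small sketch of that real-time leaderboard pattern with a Redis sorted set (redis-py); the key and player names are illustrative only.

    import redis

    r = redis.Redis(decode_responses=True)

    # Update scores as games finish; ZINCRBY keeps the set ordered automatically.
    r.zincrby("leaderboard", 50, "alice")
    r.zincrby("leaderboard", 30, "bob")
    r.zincrby("leaderboard", 75, "carol")

    # Fetch the top 10 in real time, highest score first.
    print(r.zrevrange("leaderboard", 0, 9, withscores=True))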
Any time you have per-user data you have a clear shard key, i.e. instead of one Redis instance you have 100, where each serves a subset of users. In practice sharding isn't quite so simple, but having a clear boundary makes it fairly easy to manage.
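A rough sketch of the "user ID as shard key" idea: hash the user ID to pick one of N instances. The instance hostnames are hypothetical, and real deployments usually reach for Redis Cluster or consistent hashing rather than a plain modulo, which reshuffles keys whenever N changes.

    import hashlib
    import redis

    # Hypothetical pool of four Redis instances.
    SHARDS = [redis.Redis(host=f"redis-{i}.internal", port=6379) for i in range(4)]

    def shard_for(user_id: str) -> redis.Redis:
        # Stable hash of the user ID picks the shard.
        h = int(hashlib.sha1(user_id.encode()).hexdigest(), 16)
        return SHARDS[h % len(SHARDS)]

    # All keys for a given user land on the same instance.
    shard_for("user:12345").sadd("user:12345:followers", "user:999")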
I don't think that solution is viable within the constraints the parent post describes. Many problems come up with millions of users, and proper load balancing at that scale is not simple; it depends largely on the business logic.
I've used Redis in many high-scale applications and it has been working very well so far.
Primarily we use it as a cache backend, and also as temporary storage for data being processed (the expire feature plays very nicely here).
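Roughly what that cache/temporary-storage pattern looks like with TTLs in redis-py; the key names and timings are placeholders.

    import redis

    r = redis.Redis(decode_responses=True)

    # Cache an expensive result for 5 minutes; Redis evicts it automatically.
    r.set("cache:report:42", "<rendered html>", ex=300)

    # Stash intermediate data for a background job; it's gone after an hour either way.
    r.setex("job:789:payload", 3600, '{"step": 2, "items": [1, 2, 3]}')

    print(r.ttl("cache:report:42"))  # seconds left before expiry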
In a recent project I tried to store more information in Redis (such as view counts, etc.), but ran into the problem of querying the data later, since Redis relies mainly on key lookups and pattern matching.
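One common workaround, sketched here with redis-py, is to maintain a sorted set as a secondary index alongside the per-item counters, so you can query by value instead of scanning keys. The key names and articles are illustrative only.

    import redis

    r = redis.Redis(decode_responses=True)

    def record_view(article_id: str) -> None:
        # Per-article counter plus an index keyed by view count.
        views = r.incr(f"views:{article_id}")
        r.zadd("views:index", {article_id: views})

    record_view("article:1")
    record_view("article:1")
    record_view("article:2")

    # "Which articles have the most views?" answered without KEYS.
    print(r.zrevrange("views:index", 0, 9, withscores=True))

    # SCAN (not KEYS) is the safer fallback when you really must match key patterns.
    for key in r.scan_iter(match="views:article:*"):
        pass  # inspect keys incrementally without blocking the server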