It is very good to see this article and Redis bitmaps exploited, since it is an extremely memory efficient way to store data, and given the encoding, it is extremely fast to also fetch big amount of information this way.
I really suggest to also looking at GETRANGE and SETRANGE operations that allow to access sub-ranges of a large bitmap fetching or setting arbitrary ranges fo bits.
Probably Lua scripting in 2.6 will enable a lot more interesting things combining server side execution and bitmaps. Thanks for sharing this work.
Great point and looking forward to Lua support in 2.6. GETRANGE and SETRANGE operations also allow buffered writes at the collection time because setting bit to 1 is an idempotent operation. They are particularly attractive in sharded environment where one can perform:
It would be really cool if redis supported the ability to count the high bits in a string as a native command. No need to pull a 16k string down to client
Although it is another command in a growing list of them. maybe something for lua.
How does it compare with just using sets? Usually only a small fraction of your total userbase is online any given day, especially as dead accounts accumulate... wouldn't the bitmaps be inefficiently sparse?
You can do clever things with word aligned hybrid bitmaps to heavily compress sparse bitmaps.
Though, if you are storing binary users, 700 million users take less than 90 MB of space (assuming a straight array implementation). Old days wouldn't need to be kept in memory. If you want to think beyond redis, you could keep the active data in redis then flush old data to disk for later querying.
Sounds useful if your users map to a (0, Max] integer representation. Sounds complicated if you use uuids or external vendor IDs for users anywhere (you'd need an intermediate mapping table somewhere).
There is certainly a tipping point between sets and bitsets. Bitsets are far more efficient for storage and for performing unions or intersections for millions of users. For example, for a typical 10% of user base active daily for mobile apps (http://www.avc.com/a_vc/2011/07/301010.html), bit sets will pay off both in storage and performance.
Obviously, each use case is different and for very sparse case, sets would be a better choice.
An honest question: what use are realtime metrics if you can't act on them in realtime? If it takes a day or more to gather enough data to make a decision and react to it with code or otherwise, then anything more than daily metrics seem like a distraction. I suppose it can be useful if you have some automated systems in place or if you're using it for alarms.
If you're pushing changes to production many times a day, you can use realtime metrics to notice if anything goes awry, and know when you should roll back. It won't give you the same nuances as an A/B test over a month, but it's a great way to do a quick double check to keep things moving quickly while avoiding disaster.
The same way the length of a meter is known. By defining it.
They are measuring visitors, they can assign any numbers they like to those.
In my (limited) use cases for redis which interfaced with external information like that I usually had a mapping table anyway, to get something like
users:nameFromExternalSource:myIncrementalId
foo:myId:ThatUsersFoo
bar:myId:ThatUsersBar
Note again that I'm no expert on redis. There might be problems with that approach that I don't know about - but for me this worked out quite well and seemed to save memory vs any foo:externalIdOrUsername:dataHere name scheme.
I really suggest to also looking at GETRANGE and SETRANGE operations that allow to access sub-ranges of a large bitmap fetching or setting arbitrary ranges fo bits.
Probably Lua scripting in 2.6 will enable a lot more interesting things combining server side execution and bitmaps. Thanks for sharing this work.