Realtime metrics using Redis bitmaps

antirez · on Nov 29, 2011

It is very good to see this article and Redis bitmaps exploited, since it is an extremely memory efficient way to store data, and given the encoding, it is extremely fast to also fetch big amount of information this way.

I really suggest to also looking at GETRANGE and SETRANGE operations that allow to access sub-ranges of a large bitmap fetching or setting arbitrary ranges fo bits.

Probably Lua scripting in 2.6 will enable a lot more interesting things combining server side execution and bitmaps. Thanks for sharing this work.

pjscott · on Nov 30, 2011

For example, with Lua scripting, you could add custom commands for working with Bloom filters, to do fast approximate set membership testing:

http://en.wikipedia.org/wiki/Bloom_filter

I'm pretty excited about this.

rubyorchard · on Nov 30, 2011

Great point and looking forward to Lua support in 2.6. GETRANGE and SETRANGE operations also allow buffered writes at the collection time because setting bit to 1 is an idempotent operation. They are particularly attractive in sharded environment where one can perform:

  WATCH
  bits = GETRANGE ...
  bits = bits | buffered_bits
  MULTI
  SETRANGE .... bits
  EXEC

pbrumm · on Nov 30, 2011

It would be really cool if redis supported the ability to count the high bits in a string as a native command. No need to pull a 16k string down to client

Although it is another command in a growing list of them. maybe something for lua.

pjscott · on Nov 30, 2011

This is something that should probably be implemented natively, for speed. Preferably with GCC's __builtin_popcount intrinsic, if it's available.

yesbabyyes · on Nov 29, 2011

Bitmaps surely seem usable in Redis today - how can you do AND, OR, XOR on Redis bitmaps in Lua?

Also, will you create some kind of market for Redis scripts? That seems useful - even if it's just links to Github! ;)

jubos · on Nov 29, 2011

Just curious, how does the Lua runtime perform with bitmap operations?

pshc · on Nov 29, 2011

How does it compare with just using sets? Usually only a small fraction of your total userbase is online any given day, especially as dead accounts accumulate... wouldn't the bitmaps be inefficiently sparse?

seiji · on Nov 30, 2011

You can do clever things with word aligned hybrid bitmaps to heavily compress sparse bitmaps.

Though, if you are storing binary users, 700 million users take less than 90 MB of space (assuming a straight array implementation). Old days wouldn't need to be kept in memory. If you want to think beyond redis, you could keep the active data in redis then flush old data to disk for later querying.

Sounds useful if your users map to a (0, Max] integer representation. Sounds complicated if you use uuids or external vendor IDs for users anywhere (you'd need an intermediate mapping table somewhere).

rubyorchard · on Nov 29, 2011

There is certainly a tipping point between sets and bitsets. Bitsets are far more efficient for storage and for performing unions or intersections for millions of users. For example, for a typical 10% of user base active daily for mobile apps (http://www.avc.com/a_vc/2011/07/301010.html), bit sets will pay off both in storage and performance.

Obviously, each use case is different and for very sparse case, sets would be a better choice.

cmdrkeene · on Nov 30, 2011

Here's a quick ruby implementation:

https://github.com/cmdrkeene/bitmap-counter

rubyorchard · on Nov 30, 2011

Awesome!

wahnfrieden · on Nov 30, 2011

An honest question: what use are realtime metrics if you can't act on them in realtime? If it takes a day or more to gather enough data to make a decision and react to it with code or otherwise, then anything more than daily metrics seem like a distraction. I suppose it can be useful if you have some automated systems in place or if you're using it for alarms.

jphackworth · on Nov 30, 2011

If you're pushing changes to production many times a day, you can use realtime metrics to notice if anything goes awry, and know when you should roll back. It won't give you the same nuances as an A/B test over a month, but it's a great way to do a quick double check to keep things moving quickly while avoiding disaster.

yread · on Nov 30, 2011

How do they know that user ids are contiguous?

darklajid · on Nov 30, 2011

The same way the length of a meter is known. By defining it. They are measuring visitors, they can assign any numbers they like to those.

In my (limited) use cases for redis which interfaced with external information like that I usually had a mapping table anyway, to get something like

users:nameFromExternalSource:myIncrementalId

foo:myId:ThatUsersFoo

bar:myId:ThatUsersBar

Note again that I'm no expert on redis. There might be problems with that approach that I don't know about - but for me this worked out quite well and seemed to save memory vs any foo:externalIdOrUsername:dataHere name scheme.

joshu · on Nov 30, 2011

Compressed bitmaps would be cool too.

est · on Dec 2, 2011

Are there any cool bit level compressors?

You can take advantage of sparse bits and doing logic operations on them in compressed state.

mythz · on Nov 29, 2011

Cool!