Hacker News new | past | comments | ask | show | jobs | submit login
Realtime metrics using Redis bitmaps (getspool.com)
184 points by rubyorchard on Nov 29, 2011 | hide | past | favorite | 19 comments



It is very good to see this article and Redis bitmaps exploited, since it is an extremely memory efficient way to store data, and given the encoding, it is extremely fast to also fetch big amount of information this way.

I really suggest to also looking at GETRANGE and SETRANGE operations that allow to access sub-ranges of a large bitmap fetching or setting arbitrary ranges fo bits.

Probably Lua scripting in 2.6 will enable a lot more interesting things combining server side execution and bitmaps. Thanks for sharing this work.


For example, with Lua scripting, you could add custom commands for working with Bloom filters, to do fast approximate set membership testing:

http://en.wikipedia.org/wiki/Bloom_filter

I'm pretty excited about this.


Great point and looking forward to Lua support in 2.6. GETRANGE and SETRANGE operations also allow buffered writes at the collection time because setting bit to 1 is an idempotent operation. They are particularly attractive in sharded environment where one can perform:

  WATCH
  bits = GETRANGE ...
  bits = bits | buffered_bits
  MULTI
  SETRANGE .... bits
  EXEC


It would be really cool if redis supported the ability to count the high bits in a string as a native command. No need to pull a 16k string down to client

Although it is another command in a growing list of them. maybe something for lua.


This is something that should probably be implemented natively, for speed. Preferably with GCC's __builtin_popcount intrinsic, if it's available.


Bitmaps surely seem usable in Redis today - how can you do AND, OR, XOR on Redis bitmaps in Lua?

Also, will you create some kind of market for Redis scripts? That seems useful - even if it's just links to Github! ;)


Just curious, how does the Lua runtime perform with bitmap operations?


How does it compare with just using sets? Usually only a small fraction of your total userbase is online any given day, especially as dead accounts accumulate... wouldn't the bitmaps be inefficiently sparse?


You can do clever things with word aligned hybrid bitmaps to heavily compress sparse bitmaps.

Though, if you are storing binary users, 700 million users take less than 90 MB of space (assuming a straight array implementation). Old days wouldn't need to be kept in memory. If you want to think beyond redis, you could keep the active data in redis then flush old data to disk for later querying.

Sounds useful if your users map to a (0, Max] integer representation. Sounds complicated if you use uuids or external vendor IDs for users anywhere (you'd need an intermediate mapping table somewhere).


There is certainly a tipping point between sets and bitsets. Bitsets are far more efficient for storage and for performing unions or intersections for millions of users. For example, for a typical 10% of user base active daily for mobile apps (http://www.avc.com/a_vc/2011/07/301010.html), bit sets will pay off both in storage and performance.

Obviously, each use case is different and for very sparse case, sets would be a better choice.


Here's a quick ruby implementation:

https://github.com/cmdrkeene/bitmap-counter


Awesome!


An honest question: what use are realtime metrics if you can't act on them in realtime? If it takes a day or more to gather enough data to make a decision and react to it with code or otherwise, then anything more than daily metrics seem like a distraction. I suppose it can be useful if you have some automated systems in place or if you're using it for alarms.


If you're pushing changes to production many times a day, you can use realtime metrics to notice if anything goes awry, and know when you should roll back. It won't give you the same nuances as an A/B test over a month, but it's a great way to do a quick double check to keep things moving quickly while avoiding disaster.


How do they know that user ids are contiguous?


The same way the length of a meter is known. By defining it. They are measuring visitors, they can assign any numbers they like to those.

In my (limited) use cases for redis which interfaced with external information like that I usually had a mapping table anyway, to get something like

users:nameFromExternalSource:myIncrementalId

foo:myId:ThatUsersFoo

bar:myId:ThatUsersBar

Note again that I'm no expert on redis. There might be problems with that approach that I don't know about - but for me this worked out quite well and seemed to save memory vs any foo:externalIdOrUsername:dataHere name scheme.


Compressed bitmaps would be cool too.


Are there any cool bit level compressors?

You can take advantage of sparse bits and doing logic operations on them in compressed state.


Cool!




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: