Hacker News
Scaling SQL with Redis (cramer.io)
125 points by mclarke on May 13, 2014 | hide | past | favorite | 35 comments



Redis really is a fundamental building block for designing distributed systems these days. I was kind of surprised, but all these examples exist independently in the Zapier codebase as well (all backed by Redis).

I've been meaning to open source our timeseries implementation for a while now; it's very similar to the linked article's but uses a "{key}:YYYY:MM:DD:hh:mm:ss" pattern on hashes, where you pick the stored granularity and TTL for each time unit. For example: store second granularity "{key}:YYYY:MM:DD:hh:mm": {0-60: count} for 8 hours, minute granularity "{key}:YYYY:MM:DD:hh": {0-60: count} for 24 hours, hour granularity "{key}:YYYY:MM:DD": {0-24: count} for 3 days, and the rest forever. Very similar to https://github.com/jimeh/redistat or other implementations.
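A minimal sketch of that bucketed-hash scheme, assuming redis-py; the key formats, granularities, and TTLs here are illustrative stand-ins for whatever you'd actually pick:

```python
import time

# (strftime format for the key suffix, hash-field format, TTL in seconds)
BUCKETS = [
    ("%Y:%m:%d:%H:%M", "%S", 8 * 3600),       # per-second counts, kept 8 hours
    ("%Y:%m:%d:%H",    "%M", 24 * 3600),      # per-minute counts, kept 24 hours
    ("%Y:%m:%d",       "%H", 3 * 24 * 3600),  # per-hour counts, kept 3 days
]

def bucket_fields(key, ts):
    """(bucket_key, hash_field, ttl_seconds) for each granularity at time ts."""
    t = time.gmtime(ts)
    return [("%s:%s" % (key, time.strftime(key_fmt, t)), time.strftime(field_fmt, t), ttl)
            for key_fmt, field_fmt, ttl in BUCKETS]

def record(r, key, ts=None):
    # r is a redis.Redis client; a pipeline keeps this to one round trip
    pipe = r.pipeline()
    for bucket, field, ttl in bucket_fields(key, ts if ts is not None else time.time()):
        pipe.hincrby(bucket, field, 1)  # bump the counter for this time slot
        pipe.expire(bucket, ttl)        # refresh the bucket's TTL
    pipe.execute()
```

Reads are then a single HGETALL per bucket at the coarsest granularity that covers the range you care about.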

Fun!



Oooo, simmetrica looks very nice! IIRC when I was writing our implementation there weren't any solid Python versions yet. Glad to see that changing!

More good implementation info here http://blog.apiaxle.com/post/storing-near-realtime-stats-in-....


You just solved some of my problems. Thank you very much!

Btw is there any nodejs module for voting? I've done it myself for one app but it would be nice to see other solutions.


Hey Bryan,

just curious, are you guys not already using statsd/graphite? I think that at least you used to, since you contributed to my small script[0] to automate the installation of graphite... So I'm curious if/why it wasn't good enough, or whether this has different requirements that graphite wasn't suitable for?

[0]https://github.com/gingerlime/graphite-fabric


(Jumping in for Bryan; also @Zapier)

We installed statsd/graphite early on to experiment with visualizing our task and request logs for Zaps. We've since settled on Elasticsearch and Graylog, which is phenomenal for debugging and support -- but has its growing pains.

The timeseries stuff is used more at the application layer, rather than the pure logging layer. For example, I believe we're using it to track how many tasks an account has done over the last 30 days for pricing/plans.
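One simple way to do that kind of rolling 30-day count with plain Redis string counters; the key naming here is a hypothetical sketch, not Zapier's actual scheme:

```python
import datetime

def day_keys(account_id, today=None, days=30):
    """Keys for the last `days` daily counters, newest first."""
    today = today or datetime.date.today()
    return ["usage:%s:%s" % (account_id, (today - datetime.timedelta(d)).isoformat())
            for d in range(days)]

def record_task(r, account_id):
    # r is a redis.Redis client; INCR the counter for today, with a TTL
    # slightly past the window so stale days garbage-collect themselves
    key = day_keys(account_id, days=1)[0]
    r.incr(key)
    r.expire(key, 31 * 24 * 3600)

def tasks_last_30_days(r, account_id):
    # MGET returns None for days with no activity
    return sum(int(v) for v in r.mget(day_keys(account_id)) if v)
```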


Yep! Redis tends to be more for reading in-flight rate/plan limits, something I've not heard much about in conjunction with graphite (though it might be great!). We might bring back statsd/graphite for alerting/monitoring in general, though; we've been looking for solutions there.


Thanks (also to Mike) for sharing. I can't comment about your specific use-case, but I imagine that unless your latency requirements are very demanding, graphite can play nicely here. Its rather expressive querying allows you to aggregate timeseries data pretty easily. Of course, Graphite is primarily used for monitoring and trending, but it's not limited to this use-case. Alerting isn't something it's particularly great at though, but there are other tools that plug in there.

Not trying to diss redis, I'm using redis as well and love it, but was just curious since you mentioned time-series data.


Last time I used Redis I was surprised to find that it's single threaded. Of course I could have just RTFM, but I assumed incorrectly.

This means that if one part of your application requires fast, consistent GETs, and another application does a slower SORT, UNION, DIFF, etc. on the same db (or even another db on the same Redis server), every other client request has to wait for the slower command to finish. http://redis.io/topics/latency

This is something one really has to engineer around in order to use it in an environment that requires performance and consistent latency. In our case of thousands of req/s, it was just unacceptable to have latency affected, sometimes by 10x, by a slower command.

I do love all the sort, diff, union commands.


If the two datasets with different access speed requirements are disjoint, you can just run two instances of redis. One for the high-latency gruntwork, one for the low-latency GETs.

If the datasets aren't disjoint, then you're trying to do fast and slow ops with the same data, which - if you need accurate values - is going to be mildly hairy even if multithreaded, since you'll need to somehow lock the data while you do the slow op (which will exclude the GETs, causing high latency), or you'll need some kind of transaction-based stable view to operate on (e.g. transactional memory?)


It's very rare that data access is disjoint unless you're only doing key/value put/get. I think the appeal of Redis is that it has many features beyond simple put/get, and all those sorts, diffs, etc. typically operate on a set of data that is actively being written to.

Having multiple instances will certainly help some of this, but it adds more complexity. Do you have your app write to multiple instances, then read low latency from one and high latency from another? Is that data now consistent? Do you set up Redis replication, make sure it works right, and read from different replicas? Or perhaps you engineer some queue that doesn't block writes, groups them together, and writes to Redis in a separate thread. Then you have to maintain all this and make sure it's correct, back it up, and understand the corner cases, failure modes, etc.

In my experience, if you want to engineer things well, you end up essentially building the same subsystems that a larger db engine like InnoDB already has. I'm smart enough to know that I'm not smart enough to build a one-off complex system more correctly than the really smart people who have been iterating on and improving something like InnoDB for many years.

There are very rare, very specific cases where I would use redis over something else if I was building something realtime, large and important.


That may be your experience but...

I suggest you google YouPorn's architecture.

I think it's a domain/scale issue. It isn't an 'everything must become a more complex db engine to be engineered well' issue.


I love Redis so much; it has become like superglue for cases where "just enough" performance is needed to resolve a bottleneck, but you don't have the resources to rewrite the whole thing in something fast.


Another way we have used timeseries with Redis at Leftronic is ZSETs. The "score" is the timestamp and the member is a string like {"value": 42, "timestamp": 123456789}. That way you get automatic sorting/replacement/insertion of timeseries points. Including the timestamp in the member is necessary so you can have duplicate values.
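A sketch of that ZSET approach; the series name and JSON envelope are assumptions, and the zadd mapping form assumes redis-py 3.x:

```python
import json

def make_member(value, timestamp):
    # sort_keys makes the member deterministic, so re-adding the exact
    # same point replaces rather than duplicates it
    return json.dumps({"value": value, "timestamp": timestamp}, sort_keys=True)

def add_point(r, series, value, timestamp):
    # score = timestamp, so range queries come back in time order
    r.zadd(series, {make_member(value, timestamp): timestamp})

def points_between(r, series, start, end):
    # ZRANGEBYSCORE gives all points with start <= timestamp <= end
    return [json.loads(m) for m in r.zrangebyscore(series, start, end)]
```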


Interesting, but why not use redis pub-sub for the job queues instead of forwarding to RabbitMQ?


Durability & high availability?


Redis can persist with its append-only file, but if you're pushing many "jobs" through Redis, you will end up wanting to relax AOF fsyncing because of the lag it introduces.


Durable to me means the chance of losing state on unexpected shutdown is ruled out. It's not the case for Redis.


In the ACID sense, no it doesn't have write ahead logging or recovery. But it's good enough for government work.


Rabbit is durable and highly available as well. It can be clustered, and its confirmable queues allow for a high volume of writes while remaining durable.


This was my takeaway question. Redis can be used extremely effectively as a pool of job queues with failover. Perhaps RabbitMQ provides robust bidirectional messaging? While pooling Redis works well for one-way job submission (with each Redis instance being backed by some set of work consumers), making the process synchronous (whereby the consuming worker communicates back to the producer) is not so clearly handled in a robust way unless the producer is listening on some set of Redis instances for the single reply message. RabbitMQ seems heavyweight just to solve that single problem, though.


Making something synchronous when it involves a job queue sounds like a recipe for disaster, IMO. Better to let both the consumer and producer act in a fire-and-forget manner with the original consumer producing a reply on a second queue which the original producer will eventually handle.


Our use case necessitated a synchronous interaction, but the message back was done exactly as you suggest - just uses a private second queue. The challenge was that while the initial job is fire-and-forget (or wait for a reply), the 2nd private queue was just a placeholder for a reliable messaging backend that needed to be implemented. It doesn't matter what worker gets the job, but it matters to whom the consumer of the job replies. I believe Redis provides enough primitives to make a highly reliable messaging system. We just have not done it.
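A minimal sketch of that private-reply-queue pattern with plain Redis lists; all key names and the JSON envelope are assumptions for illustration:

```python
import json
import uuid

def submit_and_wait(r, job_queue, payload, timeout=30):
    # r is a redis.Redis client; the reply queue is private to this request
    reply_queue = "reply:%s" % uuid.uuid4().hex
    r.lpush(job_queue, json.dumps({"payload": payload, "reply_to": reply_queue}))
    item = r.brpop(reply_queue, timeout=timeout)  # block until the worker replies
    return json.loads(item[1]) if item else None  # None on timeout

def worker_step(r, job_queue):
    # any worker may take the job, but the reply goes to the embedded
    # reply_to queue, so it reaches the right producer
    _, raw = r.brpop(job_queue)
    job = json.loads(raw)
    result = {"ok": True, "echo": job["payload"]}  # stand-in for real work
    r.lpush(job["reply_to"], json.dumps(result))
```

Note this still isn't reliable on its own: a worker crash between BRPOP and the reply LPUSH loses the job, which is where patterns like RPOPLPUSH into a per-worker in-flight list come in.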


I would be curious to compare this PostgreSQL + RabbitMQ + Redis solution with Cassandra. It is very well suited to time series data which is why it is so popular in advertising industries.

Also you would think that rate limiting would be handled at the load balancing layer with Nginx, Apache, Layer7 etc. Way before it gets close to your app.

Not criticising Sentry for doing things a bit different. Redis is a fantastic technology.


We handle rate limiting with iptables, nginx, and Redis. Redis is the final stage, but our goal is to make a sustainable and fast rate limiting solution that we can actually report metrics on. When things get dropped in iptables, for example, we have very little information, and Nginx is almost as low-level as that.
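A common fixed-window limiter along those lines, using INCR plus EXPIRE; the key naming and limits here are assumptions, not Zapier's actual scheme:

```python
import time

def allowed(r, client_id, limit=100, window=60, now=None):
    # r is a redis.Redis client; one counter key per client per window
    now = now if now is not None else time.time()
    bucket = "rl:%s:%d" % (client_id, int(now) // window)
    count = r.incr(bucket)
    if count == 1:
        # set TTL on first hit so old windows garbage-collect themselves
        r.expire(bucket, window * 2)
    return count <= limit
```

Because the counter lives in Redis, you can also read it out for metrics and reporting, which is exactly what a drop rule in iptables can't give you.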


Interesting use case and a great writeup - thanks for sharing :)


Redis is great... except for the fact it's (publicly) not ACID, so adding Redis in the mix and calling it "scaling SQL" is outright misleading, because it loses the very properties SQL exists to provide.

Redis will enter into conflicts (where in this article's example, those locks won't "lock" the thing you're locking), and it'll lose minutes of committed operations on unexpected stops.

Does that make Redis useless? Hell no. Can it help scale your app if carefully considered, with regards to its properties? Sure. Does it "scale SQL"? No.


Almost the entire post assumes you're not using it for durability. It's about making tradeoffs so the use of SQL can scale. If you want to be semantic, ACID will never be performant and scalable. The use of SQL on getsentry.com is already pushing Postgres to the limits of what a locking/transactional database can do.

The locks are a minor bullet point in a much larger picture. Redis is never going to generate "conflicts" in a classical sense, but there are race conditions with the specific lock implementation. I definitely didn't suggest they were strong.


> If you want to be semantic, ACID will never be performant and scalable.

I disagree with this part. We may not have great options for it now but we're largely stuck with the requirement of a hard lock for data consistency - someday someone will figure out how to mitigate the effect here.


> ACID will never be performant and scalable

We manage this quite well at GigaSpaces (http://www.gigaspaces.com). I have some examples up at http://gigaspacesinanger.wordpress.com that show some use cases.


> ACID will never be performant and scalable

I think the chaps over at HyperDex.org may strongly disagree with you.


I don't understand what's so hard about saying the thing being scaled up is "the application domain model" and not "SQL". Not hard, is it?

A "scaling SQL" article that suggests adding Redis is like a "make more beer" article that suggests adding water.

There are performant algorithms for durable operations (as seen in frameworks like LMAX's Disruptor) which are simply not explored by Redis. The Disruptor is not canonical ACID, but it is durable.

They stumbled upon scalable durability because they had no other choice. As a trading platform, they were required to be durable by law, and required to scale by their clients.

A blanket all-or-nothing statement like "it will never scale" stops you before you even try to research the space of possible solutions.


As an aside, I think the persistence approach used in the Disruptor is taken from Prevayler:

http://prevayler.org/

Which perhaps got it from other, even earlier efforts of which I am not aware.


> it'll lose minutes of committed operations on unexpected stops.

Never noticed anything like this and I've been using Redis for 3+ years.


I wrote this three years ago. It helped for a while, but these days I'd probably just do it in PostgreSQL and try to use the native arrays as much as possible.

https://github.com/seivan/redis-friendships

https://github.com/seivan/Rfizzy

https://github.com/seivan/redis-messages



