I really wish SQS had reliably lower latency, like Redis, and also supported priority levels. (Also like Redis these days, with sorted sets and the https://redis.io/commands/bzpopmax command.)
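For what it's worth, a minimal sketch of that pattern with redis-py (the "jobs" key name and the payload format here are just illustrative assumptions, not anything standard):

    # Priority queue on a sorted set: higher score = higher priority,
    # BZPOPMAX pops the highest-score member first.
    import json
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def enqueue(job, priority):
        r.zadd("jobs", {json.dumps(job): priority})

    def dequeue(timeout=5):
        # Blocks for up to `timeout` seconds; returns None if nothing arrived.
        popped = r.bzpopmax("jobs", timeout=timeout)
        if popped is None:
            return None
        _key, member, _score = popped
        return json.loads(member)

    enqueue({"task": "send_email", "to": "user@example.com"}, priority=10)
    print(dequeue())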
Has anyone measured the performance of Redis on large sorted sets, say millions of items? Hoping that it's still in single-digit milliseconds at that size... And can sustain, say, 1,000 QPS...
I have worked with sorted sets with millions of items. The latency really depends on what you execute: commands that touch only a few elements are fine. The ZPOPMAX you mentioned should take well under a millisecond if you pop, say, 10 items, and you should be able to get far more than 1k QPS. The thing with sorted sets, as with most data structures in Redis, is to read the time complexity carefully in the documentation. ZPOPMAX is linearly proportional to the number of items you pop, so if you pop too many at once it will take time.
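If you want to sanity-check that on your own hardware, a rough redis-py sketch like this (key name, set size, and batch size are arbitrary assumptions) measures batched ZPOPMAX latency against a few million members:

    import random
    import time
    import redis

    r = redis.Redis(host="localhost", port=6379)
    r.delete("bench:zset")

    # Populate a sorted set with a few million members, in pipelined batches.
    pipe = r.pipeline()
    for i in range(3_000_000):
        pipe.zadd("bench:zset", {f"item:{i}": random.random()})
        if i % 10_000 == 0:
            pipe.execute()
    pipe.execute()

    # Time popping 10 items at a time; complexity is O(log N * M), so this stays fast.
    start = time.perf_counter()
    for _ in range(1_000):
        r.zpopmax("bench:zset", 10)
    elapsed = time.perf_counter() - start
    print(f"avg ZPOPMAX(10) latency: {elapsed / 1_000 * 1000:.3f} ms")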
But actually ZPOPMAX is also proportional to log(number of elements in the set): the documented complexity is O(log(N)*M), where M is the number of members popped. So when the set grows to millions you may have a performance problem. I have no idea how fast-growing this log function is, so I really have no idea about the performance on sets of millions...
Well, that's the asymptotic time complexity: roughly, as N increases, the cost grows with the log of it. Here is an example of how the log (base 2) function grows.
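For example (plain Python, nothing Redis-specific, just to show the numbers):

    import math

    for n in (250, 1_000, 100_000, 1_000_000, 4_000_000):
        print(f"log2({n:>9,}) ≈ {math.log2(n):5.1f}")
    # 250 -> ~8, 1,000 -> ~10, 100,000 -> ~16.6, 1,000,000 -> ~20, 4,000,000 -> ~22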
Assuming everything else stays the same, that theoretically means the operation on a set of 4 million elements should be only about three times slower than on a set of 250 elements (log2(250) ≈ 8 versus log2(4,000,000) ≈ 22).
We use Redis as a job queue and it's great; the only limitation is sometimes worrying about queue size, since the whole queue has to fit within the memory limits of the Redis server itself.
Also, when you can't afford to lose a message, the Redis persistence story requires careful thought: you end up setting up RDB + AOF + appendfsync=always + backups.
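Roughly that setup, sketched with redis-py for illustration (in practice these settings would normally live in redis.conf rather than be applied at runtime, and the exact RDB save schedule here is just an assumption):

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # RDB snapshots: e.g. snapshot after 900s if >=1 key changed, or 300s if >=10 changed.
    r.config_set("save", "900 1 300 10")
    # AOF with fsync on every write, for maximum durability at a latency cost.
    r.config_set("appendonly", "yes")
    r.config_set("appendfsync", "always")
    # ...plus regular off-box backups of the RDB/AOF files, which Redis won't do for you.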
You don't need to fsync on every write; that's what the replica is for. On lower/mid-tier hardware the network is faster than storage, so your message is on multiple machines before it's even written to disk, and fsyncing once per second (appendfsync everysec) is usually fine.
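A minimal sketch of that approach with redis-py, assuming a primary with at least one attached replica (the "jobs" key is arbitrary); note that WAIT only confirms replica acknowledgement, not an fsync on the replica:

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Relax fsync to once per second; durability comes from the data being on
    # more than one machine rather than from fsyncing every single write.
    r.config_set("appendfsync", "everysec")

    r.lpush("jobs", "some-message")
    # Block up to 100 ms until at least 1 replica has acknowledged the write.
    acked = r.wait(1, 100)
    print(f"replicas that acknowledged: {acked}")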
I always hear about using Redis etc. for job queues; what do these "job queues" entail? I'm an amateur and have used Postgres tables as queues... am I being inefficient?
Well, they are just a mechanism for pushing jobs between different components, with Redis acting as the message queue. In Redis you can implement one using Lists, Sorted Sets, Streams, or Pub/Sub, though they are most commonly implemented with the first two. If you go with the first two, guaranteeing at-least-once semantics means handling acknowledgement yourself, which isn't pleasant.
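As a rough illustration, here's what that acknowledgement dance looks like with Lists, sketched with redis-py and assuming Redis >= 6.2 for LMOVE (the "queue"/"processing" key names are made up):

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def produce(payload):
        r.lpush("queue", payload)

    def consume_one():
        # Atomically move a job from "queue" to "processing" so it isn't lost
        # if this worker dies mid-job; a separate sweep has to re-queue jobs
        # left behind in "processing" (this is the unpleasant part).
        job = r.lmove("queue", "processing", "RIGHT", "LEFT")
        if job is None:
            return
        try:
            print(f"working on {job}")      # ...do the actual work here...
            r.lrem("processing", 1, job)    # acknowledge: drop it from processing
        except Exception:
            # On failure, put the job back on the queue for a retry.
            r.lrem("processing", 1, job)
            r.rpush("queue", job)

    produce("job-1")
    consume_one()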
You can replace Redis with pretty much any message queue, though, like RabbitMQ, which has a better consumption story. The main advantages of any of these are the throughput you can achieve and that the queue is decoupled from your database, which a lot of people prefer.
Does anyone have any rough performance numbers on each approach? I'll only use DB tables when I don't want to deploy Redis (which I usually do, as a cache or whatever), and I'll use RabbitMQ for more "serious" projects. On small projects, I'll even forgo Redis and use the DB as a cache too, which works pretty well.
My job queue requirements have been very modest (fewer than ten jobs per hour) and no more than 2 or 3 workers. I also generally have Redis deployed, but it never occurred to me to use it given its "ephemeral" nature; I've always remained paranoid that any service in the cloud should be assumed capable of going down at any time.
I generally agree, but I have Redis as part of my main stack now, and it's been pretty solid. I'm fairly distrustful of "cloud" services, though, so I run my stuff on Hetzner and manage it myself. At my revenue level, where a day's downtime might cost me $5/mo in lost revenue, I think that makes more sense.
Good luck with durability (for which Redis offers no decent guarantees) and availability (which depends entirely on how good you are at configuring and maintaining the Redis servers and, worse, on how you access them as a client).