I wonder how this compares to streams in Kafka or Kinesis. One of the main advantages of redis is that I see it used in many cases as a replacement for memcache (just a key/value store for bytes/strings) so it already exists in many infrastructures.
I shared my experience sometime back in another HN thread [1]:
"A key difference I observed was that if a Kafka consumer crashes, a rebalance is triggered by Kafka after which the remaining consumers seamlessly start consuming the messages from the last committed offset of the failed consumer.
Whereas with Redis streams I had to write code in my application to periodically poll and claim unacked messages pending for more than some threshold time."
From my experience, Kafka has the best api for handling read-once, distributed streams. Almost every other streaming solution, like Redis in this case, has a non-ideal or non-existent way to coordinate stream consumers in a way that prevents double-reads. And lots of streaming applications need to ensure read-once (think about what a double read ends up as - maybe a twice-sent message, or a duplicate metric), so I'm not sure why they all struggle so much with just copying kafka's pretty simple consumer api
How so? Kafka only accepts offsets which are meant for batches of items or even an entire partition. This means a single item not being processed within that batch requires your own code to compensate. It's the weakest of all the messaging models.
Per-message acknowledgement is an advancement. Redis requires manual lookups for unacknowledged items but you can also use Apache Pulsar for a more scalable distributed disk-based system which itself is a solid evolution over Kafka's design.
Also note that "exactly-once" semantics are actually impossible. Messaging systems are either "at-least-once" or "at-most-once". Kafka has some attempts at using transactions to solve this but that's only when using Kafka streams and only ensures read progress, not the processing result.
I’m not sure what you mean by “only accepts offsets which are meant for batches” but with Kafka the offsets are per-partition and you have to flexibility to control exactly when that offset is marked as processed. In our systems we always used manual offset committing and would only commit an offset once processing of the message has completed successfully to ensure both at-most-once and seamless failover.
Offsets are a marker that says everything before it (in that partition) is processed. This creates 2 issues:
1) Your application must coordinate and make sure that everything up to that offset is indeed processed successfully. 2) You application must stop if it encounters an error (because it can't commit an offset greater than that item) or handle it separately by logging to another topic, database, etc.
Other systems like Redis, Pulsar, Google PubSub provide per-message acknowledgement to allow items to be individually processed without blocking other forward progress.
Ah, I see what you’re saying now. #2 was always a pain to deal with, but I think other systems have similar problems. Other messaging systems deal with this with things like dead letter queues etc, but no matter what you use for message processing you will need some specialized logic to handle records which can’t be processed normally. In Kafka, you can raise an exception for the offset and then move on. When dealing with the exception, you can seek directly to the record offset and take it from there.
For #1, any application which has an in-order requirement would suffer from this problem. I worked with event processing systems so we never really had to worry about this, since each event was independent. However, there were instances where we would need to track state for certain objects getting processed to make sure all of their child objects were also processed. For this we would use an external store with a short TTL since the lifetime of the object during processing would only be a few minutes.
All-in-all it just comes down to what your app’s requirements are. I don’t think Kafka is meant to replace every pub sub service out there, but definitely has some great use cases.
This is a fair thing to point out. For us the “at-most-once” guarantee was based on committing. Luckily our processing was idempotent so in the rare case of the above scenario it wouldn’t cause any duplication