
> one is a message broker, and the other is a distributed streaming platform

I think this is an odd way of putting it. One is smart messaging; dumb clients. The other is dumb messaging; smart clients. It turns out the latter (i.e. Kafka) scales wonderfully so you can send more data, but you add complexity to your clients, who now can't just pluck messages off a queue to process, or have messages retried up to 3 times on failure, as they could with RabbitMQ.

Having said that, Kafka lets you keep all your data, so you don't have to worry about losing messages to unexpected interactions between RabbitMQ rules. Then again, now you have to store all your data.



> who can't just now pluck messages off a queue to process

The problem is that you cannot mark individual messages as read; for a given consumer and partition you can only advance that partition's offset.

If a certain message takes very long to process, all other messages in that partition will have to wait.

Also, with Kafka the max read concurrency equals the number of partitions; for something like RabbitMQ it is much higher. But you do get nice message ordering within any given partition in Kafka, which you do not get in RabbitMQ (AFAIK). You also get some really nice data locality with Kafka: unless the consumers get the partitions reassigned, all messages for the same key are served to the same physical consumer.
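That locality falls out of how keys map to partitions; a rough sketch in Python (CRC32 here is a stand-in -- Kafka's default partitioner actually uses murmur2 over the key bytes, but the principle is the same):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Deterministic hash of the key, modulo the partition count.
    # (Stand-in: real Kafka clients use murmur2 here, not CRC32.)
    return zlib.crc32(key) % num_partitions

# Every message with the same key lands on the same partition, so
# (absent a rebalance) one consumer sees all messages for that key.
p1 = partition_for(b"user-42", 12)
p2 = partition_for(b"user-42", 12)
```

Since one partition is consumed by at most one member of a consumer group, the partition count is also the ceiling on read concurrency mentioned above.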


> The problem is that you cannot mark individual messages as read; for a given consumer and partition you can only advance that partition's offset.

Hence "smart clients". If you MUST process every message at least once, you will be tracking messages individually on the client anyway (e.g. in a DB or file system, plus logic for idempotent message processing), and will thus disable auto offset commits back to the cluster for your consumer.

RabbitMQ says "let me track this for you", Kafka says "you already need to track this so why duplicate the data in the cluster and complicate the protocol".

If you don't have reliable persistent storage available and insist on using the Kafka cluster to track offsets, you can track processed offsets in memory and whenever your lowest processed offset moves forward, you have your consumer commit that offset manually as part of its message loop.

If your service restarts, your downstream commands need to be idempotent, of course, because you will reconsume messages you may have previously processed; but that would be the case with either Kafka or RabbitMQ unless you're using distributed transactions (yuck).
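A minimal sketch of that idempotent-consumer side, with an in-memory set standing in for the durable store (in practice a DB table keyed by message id):

```python
class IdempotentProcessor:
    """Skip messages already processed, so reconsuming after a restart
    is harmless. The 'seen' set stands in for durable storage."""

    def __init__(self):
        self.seen = set()      # ids of messages already handled
        self.results = []      # stands in for real side effects

    def handle(self, msg_id, payload):
        if msg_id in self.seen:
            return False       # duplicate delivery: no-op
        self.results.append(payload)
        self.seen.add(msg_id)
        return True
```

Redelivering the same message id is then a no-op, which is exactly what makes the reconsume-after-restart behavior safe.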

> If a certain message processing takes very long, all other messages in that partition will have to wait.

You can stream messages into a buffer and process them in parallel, and commit the low watermark offset whenever it changes, as described above. I've implemented this in .NET with Channels and saturate the CPUs with no problem.
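The low-watermark idea is language-agnostic; here's a minimal Python sketch of the offset-tracking part (the poster's version was .NET Channels, so this is an illustrative translation, not their code):

```python
class LowWatermark:
    """Track out-of-order completions for one partition and expose the
    next offset that is safe to commit: everything below it is done."""

    def __init__(self, start: int):
        self.committed = start   # next offset safe to commit
        self.done = set()        # completed offsets above the watermark

    def mark_done(self, offset: int) -> int:
        self.done.add(offset)
        # Advance the watermark over any contiguous run of completions.
        while self.committed in self.done:
            self.done.remove(self.committed)
            self.committed += 1
        return self.committed    # hand this to the consumer's commit
```

Workers can finish offsets in any order; the consumer loop only ever commits the watermark, so at worst a restart replays the in-flight window.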


You've made very good points about smart clients, but at some point one has to ponder whether it's worth it, or whether one should just not use Kafka in the first place.

I've seen databases used as message queues and, if it were up to me, I'd never do that. It's usually "but we already have Kafka + a DB, why burden ourselves with another messaging technology?", which is fair.

> You can stream messages into a buffer and process them in parallel, and commit the low watermark offset whenever it changes, as described above. I've implemented this in .NET with Channels and saturate the CPUs with no problem.

That is very nice -- certainly seems better than just batch processing of Kafka messages, but you're still just kicking the can down the road. How large do you allow the buffer to become, and what do you do when it's getting too large?

You probably use a DLQ.

Don't get me wrong, I think the buffer idea probably works most of the time.


Completely agree. Kafka was another team's decision, not mine, so I had to figure it out. RabbitMQ is very convenient in that you don't need to read a couple of books on reliable data integration patterns to get something working simply and intuitively.

I am fond of Kafka now that I understand it, but I was also an assembly language programmer in a past life so my opinion is probably in the minority.

Regarding the buffer size: you need to implement back pressure, especially if you are CPU-bound rather than IO-bound; it's another thing that's easy to get wrong with Kafka.
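One common way to get back pressure almost for free is a bounded in-process buffer between the poll loop and the workers; a minimal Python sketch (illustrative, not tied to any particular Kafka client):

```python
import queue

# Bounded buffer: the Kafka poll loop calls put(), workers call get().
# When workers fall behind, put() blocks, which pauses polling --
# that pause *is* the back pressure.
buf = queue.Queue(maxsize=4)

for offset in range(4):
    buf.put(offset)            # fills the buffer

try:
    buf.put_nowait(4)          # full: a blocking put() would stall here
    overflowed = False
except queue.Full:
    overflowed = True
```

With a blocking `put()` the consumer simply stops polling until a worker drains a slot, which also naturally caps the uncommitted window discussed above.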


> You can stream messages into a buffer and process them in parallel, and commit the low watermark offset whenever it changes, as described above. I've implemented this in .NET with Channels and saturate the CPUs with no problem.

And there are libraries that will manage all this for you e.g. https://github.com/line/decaton


If you have idempotent messages, why can't you use auto offset committing?


You are quite correct - you absolutely can use auto offset commits in that case. In my scenario, though, I have a lot of messages and a low recovery time objective on service restart so I find it cleaner to skip messages I know I won't need. Also reduces noise on the service logs, makes for easier debugging etc.


Worth noting that Kafka is getting queues: https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A...


And also Rabbit has streams[1]. There's a lot of overlap.

[1] https://www.rabbitmq.com/streams.html


Just my 2c but for anyone unaware, you should check out NATS.

It combines the best of both Kafka and RabbitMQ IMO.


I thought NATS didn't actually store messages, am I mistaken?

Looking at Wikipedia (https://en.wikipedia.org/wiki/NATS_Messaging) I see that I'm technically right, it's JetStream that does the storage layer - but it's part of the NATS server.

From memory, I really liked the philosophy of NATS but found the nomenclature confusing.


I think they call that part of NATS “Jetstream” if I’m not mistaken. I haven’t used it, but I believe it has some form of message persistence.

I have used it mostly for message-first services, and found subject-based messaging a breath of fresh air to decouple services. You can do the same thing with RabbitMQ topic exchanges, but it requires quite a bit more hand-waving.


Jetstream does indeed have message persistence: I can issue queries like “get messages on topic since 5 minutes ago” - I do this a ton. However, that seems to be the extent of the storage/query API that it exposes for historical messages. I’m quite a big fan, and would recommend it with the caveat that Jetstream is considerably more complex than simple nats and I get the feeling I’m barely scratching the surface with it.


NATS JetStream also implements subject-based addressing at the stream level (unlike Kafka where 1 stream = 1 topic, and you can only use the message's key for distribution, not for addressing).

So you can for example ask for [the first/the last/all] of the messages on a particular subject, or on a hierarchy of subjects by using wildcards. All the filtering is done at the server level.
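The subject-matching rules are simple enough to sketch. This is an illustrative Python matcher for NATS-style subjects (`*` matches exactly one token, `>` matches one or more trailing tokens), not NATS's actual implementation:

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style subject matching: tokens are dot-separated,
    '*' matches exactly one token, '>' matches one or more
    trailing tokens."""
    p, s = pattern.split("."), subject.split(".")
    for i, tok in enumerate(p):
        if tok == ">":
            return len(s) > i            # needs >= 1 remaining token
        if i >= len(s):
            return False                 # subject ran out of tokens
        if tok != "*" and tok != s[i]:
            return False                 # literal token mismatch
    return len(p) == len(s)              # no trailing subject tokens
```

So a consumer can subscribe to `orders.*.created` or to the whole `orders.>` hierarchy, and the server does the filtering.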


I’m really interested in a Kafka like message broker in the Go ecosystem and look forward to checking it out for whatever my next project ends up being.


It’s pretty cool. I would personally suggest Kafka or RabbitMQ depending on your needs, though, as Jetstream has proven to require a lot of ops engineering to remain stable in production.


You're getting to the key thing:

You don't want to classify them by what they do. You want to classify them by what the clients must do/experience.


It is not odd; it is basically accurate. You are making a fetish of the server/client interaction, but the essential matter is that Kafka is designed to store & distribute logs, whereas Rabbit is designed to route & send messages. The ‘store’ bit is very much a part of Kafka’s mission statement, but not for Rabbit.


> One is smart messaging; dumb clients. The other is dumb messaging; smart clients.

All the smartness of the messaging can be implemented in the smart clients. Then you can expose that as a smart-messaging API to dumb clients.

The most obvious example is Kafka Streams, which exposes a "simple" API rather than dealing directly with Kafka, but obviously you could create a less featureful wrapper than that.


I can't help but think that this just gives you the worst of both worlds. You are now on the hook for managing that non-standard "smart" wrapper, which will quickly become the status quo for the project. Anyone wanting to change how it works needs to understand exactly how "smart" you made it and all the side effects that will come with making a change there.

I pushed back against Knative in our company for exactly that reason. Like, we want to use Kafka because [insert Kafka sales pitch], but we don't want our developers to utilize any of the Kafka features; we're just going to define the Kafka client in some YAML format and have our clients handle an HTTP request per message. It didn't make sense to me.


That's kind of like saying don't use any software libraries, because they all use the standard lib indirectly so you may as well just use that?

It's just an abstraction layer to make things less effort.


> That's kind of like saying don't use any software libraries, because they all use the standard lib indirectly so you may as well just use that?

This is decent advice, IMO. The cost of dependency management is often vastly understated.


That’s Not Invented Here syndrome, and it’s decidedly bad advice.

The cost of dependency management may be understated but it’s always less than the cost of reimplementing everything found in established libraries.


One HTTP connection per message (if this is what the original poster meant) is probably a bad idea whether you implement it yourself or not.

Also, let's be honest: the phalanx of developers who violently argue for importing everything and never implementing anything yourself is way bigger than the crowd arguing for the opposite; I don't think we need to worry about the latter making things worse as much as the former.

We've seen what the world turns into in both scenarios, I would argue, and at least with the first one we got software that ran decently and developers who knew how to actually do things. We have overall much safer languages now so their home grown solutions won't have the same safety issues that they've historically had.

With the importer crowd we've gotten software that feels like molasses and an industry full of people who know only the surface of everything and more importantly become tied to frameworks and specific APIs because they never actually go beyond any surface APIs.

As with most tradeoffs in software there is a good middle-ground, but we won't ever get there if we don't have people who argue for making things yourselves.


Who has the time to count the cost? "Just keep shipping and worry about the costs later" is the reality in a large portion of the tech ecosystem.


Being judicious in which dependencies you take on is not the same as Not Invented Here syndrome. Code is usually a liability, not an asset.


Yeah, don't wrap all calls to a standard lib in another homegrown or non-standard single-digit-user lib that makes changes in all sorts of subtle ways. There are plenty of C++ projects that make their own stdlib or wrap it, and they are always a big WTF.

It's one thing to have an abstraction for Kafka in your code; it's another to wrap the client in a smart client that reimplements something like RabbitMQ, and much worse, a smart service.


> don't wrap all calls to a standard lib

I'm not saying to expose the same primitives -- what would be the point in that? I am saying that EVERY lib you use will be using the standard lib, or some abstraction of it, to perform its own utility.

> It's one thing to have an abstraction for Kafka in your code; it's another to wrap the client in a smart client, and much worse, a smart service.

That abstraction is exactly what I am talking about. Why write 50 lines of boilerplate multiple times throughout your code when you can wrap it up in a single function call and expose THAT as your client? You know that's exactly what you will end up doing on any non-trivial project. Or you could use a lib that already does that, such as the "official" Kafka Streams lib.
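For illustration, the facade being described can be as small as this Python sketch (`run_consumer`, `messages`, and `commit` are hypothetical stand-ins for a real client's poll loop and offset commit):

```python
def run_consumer(messages, handle, commit):
    """A tiny facade over the consume loop: call sites supply only a
    handler, while the poll/commit boilerplate lives here, once.
    'messages' is an iterable of (offset, payload) pairs standing in
    for a Kafka consumer's poll loop; 'commit' stands in for its
    offset-commit call."""
    for offset, payload in messages:
        handle(payload)
        commit(offset + 1)   # Kafka commits the *next* offset to read
```

Callers then never touch offsets at all -- which is the "dumb client exposed over a smart wrapper" point being made upthread.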


Mediocre libraries sink to the bottom over time; home-grown libraries have a different dynamic and don't sink so easily. This is definitely YMMV for different teams.

I can imagine home-grown libraries having inconsistent APIs with wild and wonderful assumptions, and beware the edge cases.


This would be my instinct too.


And reimplement RabbitMQ? Great idea. Let's do it in Rust too.


That's a neat way to put it!

- RabbitMQ: Smart messaging, dumb client
- Kafka: Dumb messaging, smart client

Have you heard of Fluvio? Fluvio: Smart messaging, Smart Client, Stateful Streaming

Kafka + Flink in Rust + WASM. Git repo: https://github.com/infinyon/fluvio


>All the smartness of the messaging can be implemented in the smart clients.

How do you do, for example, a queue with priorities client-side without it being insanity? That's a relatively basic AMQP thing. Or manage the number of redeliveries for a message that's being repeatedly rejected?

You can absolutely try to build some of this with a look-aside shared data store that all clients have to depend on in order to emulate having the capability in the broker, but you just introduced another common point of failure in addition to the messaging infrastructure. Life is too short for this.
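For concreteness, a sketch of what that look-aside redelivery tracking looks like (an in-memory dict stands in for the shared store, which in practice would be exactly the extra DB or cache every consumer has to depend on):

```python
MAX_DELIVERIES = 3

class RetryRouter:
    """Client-side emulation of broker redelivery limits: count
    failures per message id in a shared store (a dict here) and divert
    to a dead-letter queue after MAX_DELIVERIES attempts. In AMQP the
    broker does this for you; here every consumer must share the
    counter store, which is the extra point of failure."""

    def __init__(self):
        self.attempts = {}   # msg_id -> failed delivery count
        self.dlq = []        # stands in for producing to a DLQ topic

    def on_failure(self, msg_id, payload):
        n = self.attempts.get(msg_id, 0) + 1
        self.attempts[msg_id] = n
        if n >= MAX_DELIVERIES:
            self.dlq.append(payload)
            return "dead-lettered"
        return "retry"
```

Which rather makes the point: it is buildable, but the broker-side version is one setting instead of a shared service.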


I totally agree that you can't do a lot of AMQP stuff. As you noted, you can build some of it by managing state via transactional producers, etc., but you definitely can't do everything. The biggest gripe for me is actually dynamic "queue" creation, patterns for topics, etc. So I use an MQ for an MQ ;)

I'm just saying you can "dumb down" the client side on Kafka by creating an abstraction layer (or using one of the many higher-level libs that already do that).


Those requirements would definitely be examples of those that are fulfilled by smart messaging.


Every decision has a consequence. There are a lot more options depending on the use case.




