
Our company is currently looking into Kafka and microservices. The problem we have is that the volume of actions has grown past what a single Rails app with SQL Server can handle. When I look into it, it seems like Kafka would mostly be used as some kind of job queue, where worker microservices churn through the entries in Kafka to do some kind of data processing without needing SQL.

But then there are blog posts saying Kafka is a terrible job queue because you can only have one worker per partition and it's hard to get more partitions dynamically.




Sure, you can only have one consumer per partition within a consumer group. But partitions are cheap, and it's reasonably trivial to add more partitions should you find you need more concurrent consumers.

A very basic rule of thumb is, on an X broker cluster, have N partitions, where N % X = 0 (i.e. N is a multiple of X, so partitions can spread evenly across brokers).
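
To make the rule of thumb concrete (reading it as "N should be a multiple of X"), here's a minimal sketch that rounds a desired partition count up to the next multiple of the broker count. The function name and the example numbers are made up for illustration; nothing here talks to a real cluster.

```python
def round_up_to_multiple(desired_partitions: int, broker_count: int) -> int:
    """Return the smallest multiple of broker_count that is >= desired_partitions."""
    if desired_partitions < 1 or broker_count < 1:
        raise ValueError("both arguments must be positive")
    # Ceiling division without floats, then scale back up to a multiple.
    return -(-desired_partitions // broker_count) * broker_count


# Hypothetical example: you want roughly 25 partitions on a 3 broker cluster.
suggested = round_up_to_multiple(25, 3)
print(suggested, suggested % 3)
```

If the desired count is already a multiple of the broker count, it comes back unchanged.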

There's no harm in choosing something like 20 - 30 partitions for a topic, and increasing that when you need to scale consumers horizontally.

Dropping partitions is harder, but again, they're cheap, so you won't need to for most use cases.

Only caveat to increasing partition count is when you're relying on absolute ordering per partition - key hashing can point to different partitions when you have 10 vs 50. It can still be done, but it requires a careful approach.
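
A quick sketch of why that bites: keyed messages are routed by something like hash(key) % partition_count, so changing the count changes where a given key lands. This uses CRC32 as a stand-in hash purely for illustration (Kafka's default partitioner uses murmur2, so the actual partition numbers will differ), and the `user-N` keys are made up.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash (CRC32), NOT Kafka's murmur2. The modulo step is the
    # part that matters for this illustration.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical keys, e.g. one per user.
keys = [f"user-{i}" for i in range(1000)]

# Keys whose partition changes when the topic grows from 10 to 50 partitions.
moved = [k for k in keys if partition_for(k, 10) != partition_for(k, 50)]
print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

Any key in `moved` can have older messages sitting in one partition and newer ones in another, which is where the per-key ordering guarantee stops holding unless you drain the old data first or handle the switchover some other way.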


There’s also Pulsar to consider, in case you have not.

https://pulsar.apache.org/


What you can do is have a really large number of partitions and scale consumers up only when needed (workers need not be 1:1 with partitions.)
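
Rough illustration of that idea: a simplified round-robin assignment of partitions to however many workers are currently running. This is a toy model, not Kafka's actual assignor code, and the worker names are hypothetical.

```python
def assign_round_robin(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Spread partitions 0..num_partitions-1 across consumers in turn."""
    if not consumers:
        raise ValueError("need at least one consumer")
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# Quiet period: 2 workers share 12 partitions, 6 each.
few = assign_round_robin(12, ["worker-a", "worker-b"])

# Busy period: scale to 6 workers, 2 partitions each.
many = assign_round_robin(12, [f"worker-{i}" for i in range(6)])

# More workers than partitions: the extra workers get nothing and sit idle.
too_many = assign_round_robin(3, [f"worker-{i}" for i in range(5)])
idle = [c for c, parts in too_many.items() if not parts]
print(few, many, idle, sep="\n")
```

The partition count only caps the maximum parallelism; below that cap each worker just owns several partitions.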


Why Kafka over the other messaging options, e.g. RabbitMQ, Amazon SQS, or Azure Queue Storage?



