I only had brief exposure to it, but my impression is that it's sort of a message queue optimized for very large data (TB or more). So, for example, there's no way to easily answer questions like "How many requests did server X generate between 1pm and 2pm and how many of them were served by server Y?" because when your data doesn't fit on a single machine, supporting such queries requires a lot of bookkeeping. If you never need them, you don't want to pay for them.
Of course, when you have a few megabytes of data and you route it through Kafka, then all you get is an opaque message queue where you can't see which message went from where to where. Good luck debugging any issues. But, hey, you got to use Kafka.
> for example, there's no way to easily answer questions like "How many requests did server X generate between 1pm and 2pm and how many of them were served by server Y?"
There are many ways to answer that using data streamed over Kafka: ingest it into your preferred query engine, then query it.
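As a rough sketch of what "ingest it, then query it" can look like: the snippet below uses an in-memory SQLite database as a stand-in for the query engine and a hard-coded list of JSON messages as a stand-in for records read from a topic. The message schema (`ts`, `source`, `served_by`) and the function names are assumptions made up for illustration, not anything Kafka prescribes, and a real pipeline would use a consumer client or a connector rather than a Python list.

```python
import json
import sqlite3

# Hypothetical request-log messages, standing in for records consumed from a topic.
MESSAGES = [
    b'{"ts": "2024-01-01T13:05:00", "source": "X", "served_by": "Y"}',
    b'{"ts": "2024-01-01T13:40:00", "source": "X", "served_by": "Z"}',
    b'{"ts": "2024-01-01T14:10:00", "source": "X", "served_by": "Y"}',
    b'{"ts": "2024-01-01T13:20:00", "source": "W", "served_by": "Y"}',
]


def ingest(messages):
    """Load raw JSON messages into an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE requests (ts TEXT, source TEXT, served_by TEXT)")
    rows = [json.loads(m) for m in messages]
    db.executemany("INSERT INTO requests VALUES (:ts, :source, :served_by)", rows)
    return db


def count_requests(db, source, served_by, start, end):
    """Return (requests from `source` in [start, end), how many `served_by` handled)."""
    total, served = db.execute(
        """
        SELECT COUNT(*), COALESCE(SUM(served_by = ?), 0)
        FROM requests
        WHERE source = ? AND ts >= ? AND ts < ?
        """,
        (served_by, source, start, end),
    ).fetchone()
    return total, served


db = ingest(MESSAGES)
# "How many requests did server X generate between 1pm and 2pm,
#  and how many of them were served by server Y?"
print(count_requests(db, "X", "Y", "2024-01-01T13:00:00", "2024-01-01T14:00:00"))  # (2, 1)
```

The point is only that the bookkeeping lives in whatever you ingest into, not in the queue itself; ISO-8601 timestamps are used so that plain string comparison orders them correctly.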