Can Kafka be thought of as a non-hosted version of Kinesis? I thought Kinesis wo...

justinsb · on March 12, 2014

If you can tolerate some data loss, yes.

I believe Kafka can still lose some data if all the active machines fail. It's a deliberate design decision (it's the right thing to do if you want to remain available and can tolerate some data loss). I believe the Kafka team are working on it, but it's non-trivial to fix.

ryanbrush · on March 12, 2014

I believe the ability to configure this behavior is being tracked at the link below. It seems to be a switch between consistency and availability. By default Kafka prefers availability, and the resulting inconsistency shows up as data loss (because Kafka simply discards inconsistent data it can't resolve). The JIRA should make that behavior optional, so that if a majority of machines fail, the cluster becomes unavailable rather than inconsistent.

https://issues.apache.org/jira/browse/KAFKA-1028
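To make the trade-off in the two comments above concrete, here is a toy Python model of a single partition replicated on three machines. It is a hypothetical simplification, not Kafka's actual code: the class names, the `allow_unclean_election` flag (which loosely mirrors the `unclean.leader.election.enable` setting later Kafka releases expose), and the rule that a replica which drops out of the in-sync set never rejoins it are all assumptions made for illustration.

```python
"""Toy model of consistency vs. availability in leader election (hypothetical)."""
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    log: list = field(default_factory=list)
    alive: bool = True


class Partition:
    """One partition replicated across several machines (toy model, not Kafka)."""

    def __init__(self, names, allow_unclean_election):
        self.replicas = {n: Replica(n) for n in names}
        self.isr = set(names)              # in-sync replica set
        self.leader = names[0]
        self.allow_unclean = allow_unclean_election
        self.acked = []                    # messages acknowledged to producers

    def append(self, msg):
        if self.leader is None:
            raise RuntimeError("partition unavailable: no eligible leader")
        # Every in-sync replica copies the message before it is acknowledged.
        for name in self.isr:
            self.replicas[name].log.append(msg)
        self.acked.append(msg)

    def fail(self, name):
        self.replicas[name].alive = False
        # A dead replica drops out of the ISR, but the last member stays listed
        # so that a clean election can wait for it to come back.
        if name in self.isr and len(self.isr) > 1:
            self.isr.discard(name)
        if name == self.leader:
            self.elect()

    def recover(self, name):
        # Simplification: a replica that left the ISR never catches up or rejoins.
        self.replicas[name].alive = True
        if self.leader is None:
            self.elect()

    def elect(self):
        live_isr = sorted(n for n in self.isr if self.replicas[n].alive)
        if live_isr:
            self.leader = live_isr[0]      # clean election: nothing is lost
            return
        live = sorted(n for n, r in self.replicas.items() if r.alive)
        if self.allow_unclean and live:
            # Unclean election: a stale replica becomes leader, and anything
            # it never received is silently discarded.
            self.leader = live[0]
            self.isr = {self.leader}
            return
        self.leader = None                 # choose unavailability instead

    def readable(self):
        """Messages a consumer could read now, or None if the partition is down."""
        if self.leader is None:
            return None
        return list(self.replicas[self.leader].log)


def scenario(allow_unclean):
    p = Partition(["a", "b", "c"], allow_unclean_election=allow_unclean)
    p.append("m1")   # copied to a, b and c
    p.fail("c")      # c falls out of the ISR
    p.append("m2")   # acknowledged, but only a and b have it
    p.fail("b")
    p.fail("a")      # every in-sync replica is now down
    p.recover("c")   # only the stale replica comes back
    return p


for unclean in (True, False):
    p = scenario(unclean)
    print(f"unclean={unclean}: acked={p.acked} readable={p.readable()}")
# unclean=True: acked=['m1', 'm2'] readable=['m1']
# unclean=False: acked=['m1', 'm2'] readable=None
```

With unclean election allowed, the stale replica `c` takes over and the acknowledged message `m2` is gone (available but inconsistent). With it disallowed, the partition stays unavailable until `a`, the last in-sync replica, recovers, at which point both messages are readable again.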

hatred · on March 12, 2014

Yes, Kinesis can be thought of as a hosted version of Kafka. To me, choosing between them is a cost-versus-benefit trade-off: either you pay for Kinesis to get a hosted solution in which the operational burden is greatly reduced, or you save that cost and take on the operational work of running Kafka yourself.

ihsw · on March 12, 2014

One main advantage is that Kinesis is elastic: it scales automatically based on load. With Kinesis available, managing a Kafka cluster becomes an unnecessary task, and avoiding it saves quite a bit of headache.

nullspace · on March 12, 2014

Ehh, this is just my two cents from working with Kafka: when they say it's high performance, they really, really mean it. I have gotten very high throughput on just two medium machines.

Even if you process that much data, Kafka is one of the last things you'll need to scale out.

namelezz · on March 12, 2014

"It scales automatically based on load." Is it freeee?