BookKeeper: High-availability scalable distributed logging (muratbuffalo.blogspot.com)
17 points by mad44 on Oct 3, 2014 | 2 comments



Kafka can be compared to Hedwig, which uses BookKeeper internally to store data.

Hedwig is closer to the traditional message broker model, where the broker (the Hub, in Hedwig's case) keeps track of the subscriptions and what has been consumed so far. Kafka, on the other hand, uses a stateless broker model in which the consumers themselves maintain the state about what has been consumed.

Hedwig Hubs keep track of all subscriptions, and once all consumers have "consumed" a given message, the Hubs delete it. Kafka doesn't do that. It lets its consumers start all over again even if the messages have already been consumed (as long as the messages are not too old).
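To make the contrast concrete, here is a toy sketch in Python. It is not either system's actual API: the class names (`HubStyleBroker`, `LogStyleBroker`, `OffsetConsumer`) and their methods are made up for illustration, everything lives in memory, and age-based retention on the log side is left out.

```python
class HubStyleBroker:
    """Hypothetical stateful broker: it tracks every subscriber's position
    and deletes a message once all subscribers have consumed it."""

    def __init__(self, subscribers):
        self.messages = {}                             # seq -> payload
        self.next_seq = 0
        self.consumed = {s: -1 for s in subscribers}   # last seq consumed

    def publish(self, payload):
        self.messages[self.next_seq] = payload
        self.next_seq += 1

    def consume(self, subscriber):
        seq = self.consumed[subscriber] + 1
        if seq not in self.messages:
            return None                                # nothing new yet
        payload = self.messages[seq]
        self.consumed[subscriber] = seq
        # Delete everything that every subscriber has already consumed.
        low = min(self.consumed.values())
        for old in [k for k in self.messages if k <= low]:
            del self.messages[old]
        return payload


class LogStyleBroker:
    """Hypothetical stateless broker: an append-only log, no per-consumer state."""

    def __init__(self):
        self.log = []

    def publish(self, payload):
        self.log.append(payload)

    def fetch(self, offset):
        return self.log[offset] if 0 <= offset < len(self.log) else None


class OffsetConsumer:
    """The consumer, not the broker, remembers how far it has read."""

    def __init__(self, broker):
        self.broker = broker
        self.offset = 0

    def poll(self):
        payload = self.broker.fetch(self.offset)
        if payload is not None:
            self.offset += 1
        return payload

    def rewind(self, offset=0):
        # Re-reading is just a matter of resetting the local offset.
        self.offset = offset
```

In the first model a message disappears as soon as the slowest subscriber has read it, so nobody can replay it. In the second the broker never learns who has read what, and a consumer can call `rewind()` to read from the start again.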

Hedwig is also slower because of its focus on high durability. Earlier versions of Kafka didn't care about durability as much, so Kafka was much faster.

Hedwig is also designed to work with a large number of topics and a few consumers for those topics. Kafka can do a better job supporting a large number of consumers (given its stateless broker design).


The post mentions Tango, which has a novel consistency algorithm, but doesn't mention how BookKeeper differs from Kafka (also an Apache project). Can anyone comment on the difference?



