TL;DR: "Kafka’s replication claimed to be CA, but in the presence of a partition, threw away an arbitrarily large volume of committed writes. It claimed tolerance to F-1 failures, but a single node could cause catastrophe."
"we can tolerate N-1 Kafka node failures, but only N/2-1 Zookeeper failures. This actually is sensible, though, as Kafka node count scales with data size but Zookeeper node count doesn’t. So we would commonly have five Zookeeper replicas but only replication factor 3 within Kafka (even if we have many, many Kafka servers—data in Kafka is partitioned so not all nodes are identical)."
I think I see what he's trying to say, but there's a letter that isn't included in the adjective "CA", and that letter is "P". How could it be otherwise? Indeed, from TFA: "Kafka can do this because LinkedIn’s brokers run in a datacenter, where partitions are rare."
You're correct, the system does not advertise partition tolerance, and @Aphyr does indeed confirm that the system is not partition tolerant.
Broadly, I think sacrificing P is pretty deeply misguided. If you run a distributed system at any scale for any period of time, you're going to experience network partitions, even within a single DC. (I speak from experience: I work on a distributed system which runs a whole bunch of machines in a whole bunch of datacentres worldwide. We see a reasonable number of non-trivial within-DC network partitions every year.)
Partitions are not just network partitions. Processes on the same node can become partitioned, threads in the same process can become partitioned, and actors within threads can become partitioned from themselves or other actors.
In my experience network partitions are not the most common kind of partition although they are well represented, especially on some network topologies.
IMO you should never assume that any two things won't be partitioned as part of your expectation of correctness or availability.
Can you describe a partition of two actors within the same thread? I'd guess this would be in some sort of logic failure state, where either the consuming actor doesn't correctly receive its messages, or the producing actor doesn't correctly transmit.
Can Kafka be thought of as a non-hosted version of Kinesis? I thought Kinesis would be a good solution to dump logfile data to for processing and ingestion. Could you explain some technical reasons to use Kafka vs. Kinesis? Thanks.
I believe Kafka can still lose some data if all the active machines fail. It's a deliberate design decision (it's the right thing to do if you want to remain available and can tolerate some data loss). I believe the Kafka team are working on it, but it's non-trivial to fix.
I believe the ability to configure this behavior is being tracked at the link below. It seems like it's a switch between consistency and availability. By default Kafka prefers availability, and the possible inconsistency results in data loss (because Kafka just discards some inconsistent data it can't resolve). But the JIRA linked below should make that behavior optional, so if a majority of machines fail the cluster will become unavailable rather than inconsistent.
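For what it's worth, this knob later surfaced as broker/topic configuration (unclean.leader.election.enable plus min.insync.replicas), combined with the producer's acks setting. Below is a minimal sketch of a producer tuned for consistency over availability; it uses the newer Java client, and the broker address and topic name are placeholders:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ConsistentProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // placeholder address
            props.put("acks", "all");   // wait for all in-sync replicas before acking
            props.put("retries", "3");  // retry transient failures instead of dropping
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            // Broker side, unclean.leader.election.enable=false and
            // min.insync.replicas=2 are the matching "consistency over
            // availability" settings.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("logs", "key", "value")); // placeholder topic
            }
        }
    }

With that combination a write is only acknowledged once every in-sync replica has it, so losing the leader can't silently discard acknowledged data, at the cost of becoming unavailable when too few replicas are in sync.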
Yes, Kinesis can be thought of as a hosted version of Kafka. To me, using either of them is a cost-versus-benefits trade-off, i.e. whether you are willing to pay the cost of Kinesis to get a hosted solution wherein the operational burden is greatly reduced, or vice versa.
One main advantage is that Kinesis is elastic -- it scales automatically based on load. Managing a Kafka cluster is an unnecessary task with Kinesis available, which alleviates quite a bit of headache.
Ehh - this is just my two cents from working with Kafka. When they say it's high performance, they really, really mean it. I have gotten very high throughput on just 2 medium machines.
If you process that much data, Kafka is one of the last things which you'll need to scale out.
Using it from C++ is not really on par with using it from Java. Implementing the system itself in Java also looks very questionable, since performance is supposedly very critical there.
Kafka is mostly in Scala (which runs on the JVM). Scala provides many attractive features for building scalable and type-safe code. And about performance: Kafka is used at LinkedIn, where it processes hundreds of gigabytes of data and close to a billion messages per day (reported in their paper in 2011), and the engineers claim they're processing terabytes of data a day now.
Not sure on what basis you claim the choice of language to be "questionable", but keep in mind that Scala's type-safety and many other features are much more difficult to achieve in C/C++. Cleaner code is sometimes more important than some tiny gain in performance.
Also in terms of scaling, Kafka cleverly takes advantage of many aspects of its design to ensure low latency and high throughput:
* Little random I/O
* Relying heavily on the OS pagecache for data storage
Performance-wise, Kafka can outshine some message queues that store messages in memory.
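To make the append-only, pagecache-friendly write pattern concrete, here's a minimal sketch of an append-only segment writer. It's my own illustration, not Kafka's code, and the class and method names are made up:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Writes only ever go to the end of the file: no seeks, no random I/O.
    // The OS page cache absorbs the writes; flushing to disk is left to the
    // OS (or a periodic flush policy), which is roughly what the design doc describes.
    public class AppendOnlyLog implements AutoCloseable {
        private final FileChannel channel;

        public AppendOnlyLog(Path path) throws IOException {
            this.channel = FileChannel.open(path,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE,
                    StandardOpenOption.APPEND);
        }

        public void append(String message) throws IOException {
            ByteBuffer buf = ByteBuffer.wrap((message + "\n").getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                channel.write(buf); // sequential write, buffered by the page cache
            }
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }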
That document acknowledges inherent JVM limitations:
> Java garbage collection becomes increasingly fiddly and slow as the in-heap data increases.
So they had to work around that. In my view it's not a tiny issue. I'd say that instead of working around such inherent limitations, it's better not to have them to begin with when building high-performance systems. That was my main point above. Time spent dancing around such problems defeats the supposed ease of development.
You didn't even bother to read past the section about the limitations that the engineers were well aware of in advance - they re-emphasize it precisely because it's a common concern for people who question their choice of the JVM.
The next line reads:
"As a result of these factors using the filesystem and relying on pagecache is superior to maintaining an in-memory cache or other structure—we at least double the available cache..."
And if you don't know about the pagecache: it's an in-memory cache managed by the OS and has nothing to do with the JVM's memory at all.
And you forgot that C++ isn't the easiest language when it comes to designing a distributed system. Scala, as I mentioned, offers many other features that suit the needs of the team. Of course, if you're a good engineer you'll know that there are trade-offs, such as compile times, but that's the same for every engineering decision.
Designing around platform flaws can be worth it if there are commensurate benefits. You're very lucky if you've never had to do this, or perhaps just blind to the tradeoffs you were making.
That's like saying "we had to consider memory management" in a C++ system.
When you are designing a high performance system you have to consider everything. Different platforms have different tradeoffs, but the tradeoffs on the JVM have been well proven over time.
Having written a few low-latency systems, a few of which are on the JVM, I will say that in all of those cases allocating/freeing memory is always a slowdown, GC or not (cache coherence is almost always the deal breaker here). So in those systems you simply do not allocate/deallocate along the critical path.
Is this difficult in Java? Yes. It's also difficult in C++. Just because it is difficult doesn't mean it is impossible.
So if I am resorting to managing my own memory anyway, why would I use the JVM? Because typically the code that is on the critical path is a small percentage of the entire code base, and the other advantages of the JVM (tooling, language features, libraries etc.) outweigh the downsides.
That's not always the case, and I don't have any specific knowledge of Kafka, but just because something needs to be low/consistent latency doesn't mean it can't (or shouldn't) be written on the JVM.
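A toy sketch of that "no allocation on the critical path" idea; the names and sizes are made up, and it assumes a message fits in the preallocated buffer:

    import java.nio.ByteBuffer;

    public class HotPath {
        private static final int MSG_SIZE = 1024;
        // Allocated once, off-heap, and reused for every message.
        private final ByteBuffer scratch = ByteBuffer.allocateDirect(MSG_SIZE);

        // Called millions of times; note: no 'new' anywhere in here, so the GC
        // has nothing to collect. Assumes incoming.remaining() <= MSG_SIZE.
        public void process(ByteBuffer incoming) {
            scratch.clear();
            scratch.put(incoming);
            scratch.flip();
            // ... encode / hand off to I/O without allocating ...
        }
    }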
You are right that this can be a major problem with the JVM and working around it can be a lot of work. But you need to consider what kind of system we're talking about in this particular case.
This is a persistent message queue for log messages. Messages are coming in sequentially and subscribers read them sequentially. It makes zero sense to keep tons of messages in memory inside complex data structures as they are not indexed or searched or analyzed.
So in this particular case it's not actually a workaround. It's just sensible design and I wouldn't do it any differently in C++ either.
There are reasons why implementing a system in Java might be a questionable decision, but unless that system involves extremely intensive number-crunching or has hard real-time requirements, performance probably isn't one of them.
A messaging system really has to be attentive to performance. If it is itself a bottleneck, it's not possible to build anything that requires performance on top of it. And by performance I mean time constraints (latency) as well as throughput.
You're right, in principle. However, rewriting it in C++ instead of Java would maybe get you a factor of 2 performance improvement, and that's if it was completely CPU-bound, which isn't a reasonable assumption for a tool designed to manage large disk-resident datasets.
By "hard real-time" I was referring to latency, rather than throughput. Achieving very low latencies is difficult in Java because you don't know exactly when the GC will kick in, but nevertheless it's possible to get very high throughput.
But does Java allow easy management of memory, like .NET? When I built a moderately high-performance packet-processing system in F#, apart from a few specific encoding parts where the machine code was subpar, the biggest hit to CPU seemed to be the GC. Removing a single allocation from a hot path provided a measurable improvement. Most of the gain came from being able to allocate memory directly (a 1GB managed heap on top of 16GB+ unmanaged).
With Java, you don't have the unmanaged memory or struct support, so doesn't that really add up? And if you go "native", isn't there significant overhead, since you can't have pointers in Java (right? the bytecode doesn't support them)?
People pull it off, but it seems that GC overhead would be a killer.
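For what it's worth, Java does have a form of unmanaged memory even without struct support: direct ByteBuffers live outside the GC heap and are addressed by offset, which amounts to a hand-rolled struct. A minimal sketch:

    import java.nio.ByteBuffer;

    public class OffHeapDemo {
        public static void main(String[] args) {
            // 64 MiB of native memory; the GC only ever sees the small
            // ByteBuffer object, not the memory behind it. Larger sizes may
            // need -XX:MaxDirectMemorySize raised.
            ByteBuffer offHeap = ByteBuffer.allocateDirect(64 * 1024 * 1024);
            offHeap.putLong(0, 42L);  // fixed-offset reads/writes, struct-like by hand
            System.out.println(offHeap.getLong(0));
        }
    }

You still lose the type safety that real structs or .NET value types give you, but the data itself stays out of the GC's way.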
Depending on the task Java is actually faster than C++ at times. Garbage Collection, when done right, can for example be faster than manual memory management. And a JIT compiler can do optimizations that a normal C++ compiler could only do with help from the developer.
There is a reason many of these kinds of systems are written in JVM-based languages. Examples: Hadoop and all of its siblings, Cassandra, Storm, Kafka. So either all of those people in successful projects made "questionable decisions", or your knowledge of Java/JVM performance has been outdated by the developments of the past few years...
I don't particularly agree with the reasons you mentioned. Most of the existing systems are JVM-based due to the excellent tooling and supporting libraries around Java and its ease of use. This lets you focus more on building the system rather than on micro-optimizations around C++.
Not sure about the reason for Hadoop being in Java; Google's MapReduce system, which inspired Hadoop, was written in C++ for good reasons as well. The systems you mentioned are written in Java, I assume, because it's easier for those projects to find developers. In the future this will change, with languages like Rust emerging as better C++ alternatives.
Kafka is focused more on throughput than latency: if latency is critical, you should use a database rather than a message queue (maybe with some caveats, e.g., for finance, telcos, or air traffic control; TIBCO et al cover that market well, however).
Kafka's use of the JVM does not really impact throughput. Very little of a message's lifecycle is spent in the JVM heap; Kafka makes aggressive use of the OS page cache and avoids copies (e.g., using sendfile() whenever possible).
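To illustrate (my sketch, not Kafka's code): the JVM-side API for that zero-copy path is FileChannel.transferTo, which the kernel can service with sendfile() so the bytes go straight from the page cache to the socket without entering the heap. The file name, host, and port below are placeholders:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class ZeroCopySend {
        public static void main(String[] args) throws IOException {
            try (FileChannel log = FileChannel.open(Paths.get("segment.log"),
                                                    StandardOpenOption.READ);
                 SocketChannel socket = SocketChannel.open(
                         new InetSocketAddress("consumer-host", 9000))) {
                long position = 0;
                long remaining = log.size();
                while (remaining > 0) {
                    // transferTo() maps to sendfile() on Linux: no copy into user space.
                    long sent = log.transferTo(position, remaining, socket);
                    position += sent;
                    remaining -= sent;
                }
            }
        }
    }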
I agree that, e.g., if Kafka had to make heavy use of in-process memory (this is appropriate for databases) as opposed to OS managed buffers, then a language without a garbage collector (or perhaps a garbage collected language that gave you an option not to generate garbage in the first place...) would help.
That said, there's other advantages to using C++, but they won't bring significant performance improvements -- although I suppose one could experiment with, e.g., using AIO but that might end up causing more harm than good in case of Kafka.
As the "slow compilation times" feature, scalac and sbt do a much better job of it than cmake and g++/clang :-)
Some of the highest performance large scale systems in the world are written in Java. Notably, in the messaging market Oracle's MQSeries message bus and StreamBase (now TIBCO StreamBase) run on the JVM. Both are used in some of the largest messaging implementations in the world.
Garbage collection is a big downside since it introduces an implicit level of uncertainty and unpredictability performance wise. For intense I/O I'd also go with C++ over Java in general.
It depends on memory usage patterns. In some cases Java's GC can even be faster than malloc/free. I could say more, but it's meaningless to discuss such a provocative topic as "performance" without any details.
Of course it depends. My point is, it's too unpredictable, even if you assume certain patterns are better or worse. That's why systems with RAII are much better by design in this aspect.
This is just cargo cult thinking: Java is bad. GC is bad. C++ is not Java, so it is good.
The only thing to do is get on your coconut radio and wait for the planes to bring you some cargo. While you are waiting, Kafka is a useful MQ for event processing and is being used to solve real-world problems today.
RAII semantics can be implemented in most GC languages nowadays with try/using/with/... Or with lambdas if the language allows for it.
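For example, in Java a try-with-resources block ties release to scope much like RAII does. A minimal sketch, with a placeholder file name:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class ScopedResource {
        public static void main(String[] args) throws IOException {
            try (BufferedReader reader = new BufferedReader(new FileReader("config.properties"))) {
                System.out.println(reader.readLine());
            } // reader.close() runs here, whether or not an exception was thrown
        }
    }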
RAII in C++ is a good pattern, but it does not avoid the performance penalty of cascading resource release, the impossibility of using exceptions in destructors, or thread contention over shared resources.
C++ being fast is a matter of almost 30 years of compiler optimization improvements.
Java compilers are lagging a bit behind, but are quite comparable for distributed computing, to the point that I seldom use C++ on the job nowadays, other than when replacing existing systems with JVM/.NET ones.
For Java 9, there are already some features being discussed that will help Java compilers improve code generation, like value types, a better FFI, and making Unsafe an official package for those cases where there is no way around it.
[1]: http://aphyr.com/posts/293-call-me-maybe-kafka