TL;DR: "Kafka’s replication claimed to be CA, but in the presence of a partition, threw away an arbitrarily large volume of committed writes. It claimed tolerance to F-1 failures, but a single node could cause catastrophe."
"we can tolerate N-1 Kafka node failures, but only N/2-1 Zookeeper failures. This actually is sensible, though, as Kafka node count scales with data size but Zookeeper node count doesn’t. So we would commonly have five Zookeeper replicas but only replication factor 3 within Kafka (even if we have many, many Kafka servers—data in Kafka is partitioned so not all nodes are identical)."
I think I see what he's trying to say, but there's a letter that isn't included in the adjective "CA", and that letter is "P". How could it be otherwise? Indeed, from TFA: "Kafka can do this because LinkedIn’s brokers run in a datacenter, where partitions are rare."
You're correct, the system does not advertise partition tolerance, and @Aphyr does indeed confirm that the system is not partition tolerant.
Broadly, I think sacrificing P is pretty deeply misguided. If you run a distributed system at any scale for any period of time, you're going to experience network partitions, even within a single DC. (I speak from experience: I work on a distributed system which runs a whole bunch of machines in a whole bunch of datacentres worldwide. We see a reasonable number of non-trivial within-DC network partitions every year.)
Partitions are not just network partitions. Processes on the same node can become partitioned, threads in the same process can become partitioned, and actors within threads can become partitioned from themselves or other actors.
In my experience network partitions are not the most common kind of partition although they are well represented, especially on some network topologies.
IMO you should never assume that any two things won't be partitioned as part of your expectation of correctness or availability.
Can you describe a partition of two actors within the same thread? I'd guess this would be in some sort of logic failure state, where either the consuming actor doesn't correctly receive its messages, or the producing actor doesn't correctly transmit.
Can Kafka be thought of as a non-hosted version of Kinesis? I thought Kinesis would be a good solution to dump logfile data to for processing and ingestion. Could you explain some technical reasons to use Kafka vs. Kinesis? Thanks.
I believe Kafka can still lose some data if all the active machines fail. It's a deliberate design decision (it's the right thing to do if you want to remain available and can tolerate some data loss). I believe the Kafka team are working on it, but it's non-trivial to fix.
I believe the ability to configure this behavior is being tracked at the link below. It seems like it's a switch between consistency and availability. By default Kafka prefers availability, and the possible inconsistency results in data loss (because Kafka just discards some inconsistent data it can't resolve). But the JIRA linked below should make that behavior optional, so if a majority of machines fail the cluster will become unavailable rather than inconsistent.
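For what it's worth, this knob later surfaced as broker/topic configuration (unclean.leader.election.enable plus min.insync.replicas), combined with the producer's acks setting. Below is a minimal sketch of a producer tuned for consistency over availability; it uses the newer Java client, and the broker address and topic name are placeholders:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ConsistentProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // placeholder address
            props.put("acks", "all");   // wait for all in-sync replicas before acking
            props.put("retries", "3");  // retry transient failures instead of dropping
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            // Broker side, unclean.leader.election.enable=false and
            // min.insync.replicas=2 are the matching "consistency over
            // availability" settings.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("logs", "key", "value")); // placeholder topic
            }
        }
    }

With that combination a write is only acknowledged once every in-sync replica has it, so losing the leader can't silently discard acknowledged data, at the cost of becoming unavailable when too few replicas are in sync.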
Yes, Kinesis can be thought of as a hosted version of Kafka. To me, using either of them is a cost-versus-benefits trade-off, i.e. whether you are willing to pay the cost of Kinesis to get a hosted solution wherein the operational burden is greatly reduced, or vice versa.
One main advantage is that Kinesis is elastic -- it scales automatically based on load. Managing a Kafka cluster is an unnecessary task with Kinesis available, which alleviates quite a bit of headache.
Ehh - this is just my two cents from working with Kafka. When they say it's high performance, they really, really mean it. I have gotten very high throughput on just 2 medium machines.
If you process that much data, Kafka is one of the last things which you'll need to scale out.
Using it from C++ is not really on par with using it from Java. Implementing the system itself in Java also looks very questionable, since performance is supposedly very critical there.
Kafka is mostly in Scala (which runs on the JVM). Scala provides many attractive features for building scalable and type-safe code. And about performance: Kafka is used at LinkedIn, where it processes hundreds of gigabytes of data and close to a billion messages per day (reported in their paper in 2011), and the engineers claim they're processing terabytes of data a day now.
Not sure on what basis you claim the choice of language to be "questionable", but keep in mind that Scala's type-safety and many other features are much more difficult to achieve in C/C++. Cleaner code is sometimes more important than some tiny gain in performance.
Also in terms of scaling, Kafka cleverly takes advantage of many aspects of its design to ensure low latency and high throughput:
* Little random I/O
* Relying heavily on the OS pagecache for data storage
Performance-wise, Kafka can outshine some message queues that store messages in memory.
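To make the append-only, pagecache-friendly write pattern concrete, here's a minimal sketch of an append-only segment writer. It's my own illustration, not Kafka's code, and the class and method names are made up:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Writes only ever go to the end of the file: no seeks, no random I/O.
    // The OS page cache absorbs the writes; flushing to disk is left to the
    // OS (or a periodic flush policy), which is roughly what the design doc describes.
    public class AppendOnlyLog implements AutoCloseable {
        private final FileChannel channel;

        public AppendOnlyLog(Path path) throws IOException {
            this.channel = FileChannel.open(path,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE,
                    StandardOpenOption.APPEND);
        }

        public void append(String message) throws IOException {
            ByteBuffer buf = ByteBuffer.wrap((message + "\n").getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                channel.write(buf); // sequential write, buffered by the page cache
            }
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }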
That document acknowledges inherent JVM limitations:
> Java garbage collection becomes increasingly fiddly and slow as the in-heap data increases.
So they had to work around that. In my view it's not a tiny issue. I'd say that instead of working around such inherent limitations, it's better not to have them to begin with when building high-performance systems. That was my main point above. Time spent dancing around such problems defeats the supposed ease of development.
You didn't even bother to read past the section about the limitations that the engineers were well aware of in advance - they re-emphasize it precisely because it's a common concern for people who question their choice of the JVM.
The next line reads:
"As a result of these factors using the filesystem and relying on pagecache is superior to maintaining an in-memory cache or other structure—we at least double the available cache..."
And if you don't know about the pagecache: it's an in-memory cache managed by the OS and has nothing to do with the JVM's memory at all.
And you forgot that C++ isn't the easiest language when it comes to designing a distributed system. Scala, as I mentioned, offers many other features that suit the needs of the team. Of course, if you're a good engineer you'll know that there are trade-offs, such as compile times, but that's the same for every engineering decision.
Designing around platform flaws can be worth it if there are commensurate benefits. You're very lucky if you've never had to do this, or perhaps just blind to the tradeoffs you were making.
That's like saying "we had to consider memory management" in a C++ system.
When you are designing a high performance system you have to consider everything. Different platforms have different tradeoffs, but the tradeoffs on the JVM have been well proven over time.
Having written a few low-latency systems, a few of which are on the JVM, I will say that in all of those cases allocating/freeing memory is always a slowdown, GC or not (cache coherence is almost always the deal breaker here). So in those systems you simply do not allocate/deallocate along the critical path.
Is this difficult in Java? Yes. It's also difficult in C++. Just because it is difficult doesn't mean it is impossible.
So if I am resorting to managing my own memory anyway, why would I use the JVM? Because typically the code that is on the critical path is a small percentage of the entire code base, and the other advantages of the JVM (tooling, language features, libraries etc.) outweigh the downsides.
That's not always the case, and I don't have any specific knowledge of Kafka, but just because something needs to be low/consistent latency doesn't mean it can't (or shouldn't) be written on the JVM.
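A toy sketch of that "no allocation on the critical path" idea; the names and sizes are made up, and it assumes a message fits in the preallocated buffer:

    import java.nio.ByteBuffer;

    public class HotPath {
        private static final int MSG_SIZE = 1024;
        // Allocated once, off-heap, and reused for every message.
        private final ByteBuffer scratch = ByteBuffer.allocateDirect(MSG_SIZE);

        // Called millions of times; note: no 'new' anywhere in here, so the GC
        // has nothing to collect. Assumes incoming.remaining() <= MSG_SIZE.
        public void process(ByteBuffer incoming) {
            scratch.clear();
            scratch.put(incoming);
            scratch.flip();
            // ... encode / hand off to I/O without allocating ...
        }
    }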
You are right that this can be a major problem with the JVM and working around it can be a lot of work. But you need to consider what kind of system we're talking about in this particular case.
This is a persistent message queue for log messages. Messages are coming in sequentially and subscribers read them sequentially. It makes zero sense to keep tons of messages in memory inside complex data structures as they are not indexed or searched or analyzed.
So in this particular case it's not actually a workaround. It's just sensible design and I wouldn't do it any differently in C++ either.
There are reasons why implementing a system in Java might be a questionable decision, but unless that system involves extremely intensive number-crunching or has hard real-time requirements, performance probably isn't one of them.
A messaging system really has to be attentive to performance. If it is itself a bottleneck, it's not possible to build anything that requires performance on top of it. And by performance I mean time constraints (latency) as well as throughput.
You're right, in principle. However, rewriting it in C++ instead of Java would maybe get you a factor of 2 performance improvement, and that's if it was completely CPU-bound, which isn't a reasonable assumption for a tool designed to manage large disk-resident datasets.
By "hard real-time" I was referring to latency, rather than throughput. Achieving very low latencies is difficult in Java because you don't know exactly when the GC will kick in, but nevertheless it's possible to get very high throughput.
But does Java allow easy management of memory, like .NET? When I built a moderately high-performance packet-processing system in F#, apart from a few specific encoding parts where the machine code was subpar, the biggest hit to CPU seemed to be the GC. Removing a single allocation from a hot path provided a measurable improvement. Most of the gain came from being able to allocate memory directly (a 1GB managed heap on top of 16GB+ unmanaged).
With Java, you don't have the unmanaged memory or struct support, so doesn't that really add up? And if you go "native", isn't there significant overhead, since you can't have pointers in Java (right? the bytecode doesn't support them)?
People pull it off, but it seems that GC overhead would be a killer.
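For what it's worth, Java does have a form of unmanaged memory even without struct support: direct ByteBuffers live outside the GC heap and are addressed by offset, which amounts to a hand-rolled struct. A minimal sketch:

    import java.nio.ByteBuffer;

    public class OffHeapDemo {
        public static void main(String[] args) {
            // 64 MiB of native memory; the GC only ever sees the small
            // ByteBuffer object, not the memory behind it. Larger sizes may
            // need -XX:MaxDirectMemorySize raised.
            ByteBuffer offHeap = ByteBuffer.allocateDirect(64 * 1024 * 1024);
            offHeap.putLong(0, 42L);  // fixed-offset reads/writes, struct-like by hand
            System.out.println(offHeap.getLong(0));
        }
    }

You still lose the type safety that real structs or .NET value types give you, but the data itself stays out of the GC's way.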
Depending on the task Java is actually faster than C++ at times. Garbage Collection, when done right, can for example be faster than manual memory management. And a JIT compiler can do optimizations that a normal C++ compiler could only do with help from the developer.
There is a reason many of these kinds of systems are written in JVM-based languages. Examples: Hadoop and all of its siblings, Cassandra, Storm, Kafka. So either all of those people in successful projects made "questionable decisions", or your knowledge of Java/JVM performance has been outdated by the developments of the past few years...
I don't particularly agree with the reasons you mentioned. Most of the existing systems are JVM-based due to the excellent tooling and supporting libraries around Java and its ease of use. This lets you focus more on building the system rather than on micro-optimizations around C++.
Not sure about the reason for Hadoop being in Java; Google's MapReduce system, which inspired Hadoop, was written in C++ for good reasons as well. The systems you mentioned are written in Java, I assume, because it's easier for those projects to find developers. In the future this will change, with languages like Rust emerging as better C++ alternatives.
Kafka is focused more on throughput than latency: if latency is critical, you should use a database rather than a message queue (maybe with some caveats, e.g., for finance, telcos, or air traffic control; TIBCO et al cover that market well, however).
Kafka's use of the JVM does not really impact throughput. Very little of a message's lifecycle is spent in the JVM heap; Kafka makes aggressive use of the OS page cache and avoids copies (e.g., using sendfile() whenever possible).
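To illustrate (my sketch, not Kafka's code): the JVM-side API for that zero-copy path is FileChannel.transferTo, which the kernel can service with sendfile() so the bytes go straight from the page cache to the socket without entering the heap. The file name, host, and port below are placeholders:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class ZeroCopySend {
        public static void main(String[] args) throws IOException {
            try (FileChannel log = FileChannel.open(Paths.get("segment.log"),
                                                    StandardOpenOption.READ);
                 SocketChannel socket = SocketChannel.open(
                         new InetSocketAddress("consumer-host", 9000))) {
                long position = 0;
                long remaining = log.size();
                while (remaining > 0) {
                    // transferTo() maps to sendfile() on Linux: no copy into user space.
                    long sent = log.transferTo(position, remaining, socket);
                    position += sent;
                    remaining -= sent;
                }
            }
        }
    }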
I agree that, e.g., if Kafka had to make heavy use of in-process memory (this is appropriate for databases) as opposed to OS managed buffers, then a language without a garbage collector (or perhaps a garbage collected language that gave you an option not to generate garbage in the first place...) would help.
That said, there's other advantages to using C++, but they won't bring significant performance improvements -- although I suppose one could experiment with, e.g., using AIO but that might end up causing more harm than good in case of Kafka.
As the "slow compilation times" feature, scalac and sbt do a much better job of it than cmake and g++/clang :-)
Some of the highest performance large scale systems in the world are written in Java. Notably, in the messaging market Oracle's MQSeries message bus and StreamBase (now TIBCO StreamBase) run on the JVM. Both are used in some of the largest messaging implementations in the world.
Garbage collection is a big downside since it introduces an implicit level of uncertainty and unpredictability performance wise. For intense I/O I'd also go with C++ over Java in general.
It depends on memory usage patterns. In some cases Java's GC can even be faster than malloc/free. I could say more, but it's meaningless to discuss such a provocative topic as "performance" without any details.
Of course it depends. My point is, it's too unpredictable, even if you assume certain patterns are better or worse. That's why systems with RAII are much better by design in this aspect.
This is just cargo cult thinking: Java is bad. GC is bad. C++ is not Java, so it is good.
The only thing to do is get on your coconut radio and wait for the planes to bring you some cargo. While you are waiting, Kafka is a useful MQ for event processing and is being used to solve real-world problems today.
RAII semantics can be implemented in most GC languages nowadays with try/using/with/... Or with lambdas if the language allows for it.
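For example, in Java a try-with-resources block ties release to scope much like RAII does. A minimal sketch, with a placeholder file name:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class ScopedResource {
        public static void main(String[] args) throws IOException {
            try (BufferedReader reader = new BufferedReader(new FileReader("config.properties"))) {
                System.out.println(reader.readLine());
            } // reader.close() runs here, whether or not an exception was thrown
        }
    }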
RAII in C++ is a good pattern, but it does not avoid the performance penalty of cascading resource release, the impossibility of using exceptions in destructors, or thread contention over shared resources.
C++ being fast is a matter of almost 30 years of compiler optimization improvements.
Java compilers are lagging a bit behind, but are quite comparable for distributed computing, to the point that I seldom use C++ on the job nowadays, other than when replacing existing systems with JVM/.NET ones.
For Java 9, there are already some features being discussed that will help Java compilers improve code generation, like value types, a better FFI, and making Unsafe an official package for those cases where there is no way around it.
[1]: http://aphyr.com/posts/293-call-me-maybe-kafka