
That document acknowledges inherent JVM limitations:

> Java garbage collection becomes increasingly fiddly and slow as the in-heap data increases.

So they had to work around that. In my view it's not a tiny issue. I'd say that instead of working around such inherent limitations, it's better not to have them to begin with when building high-performance systems. That was my main point above. Time spent dancing around such problems defeats the purpose of the supposed ease of development.



You didn't even bother to read past the section about the limitations - the engineers were well aware of them in advance, and they re-emphasize them precisely because they're a common concern for people who question their choice of the JVM.

The next line reads: "As a result of these factors using the filesystem and relying on pagecache is superior to maintaining an in-memory cache or other structure—we at least double the available cache..."

And if you don't know about the pagecache: it's an in-memory cache managed by the OS and has nothing to do with the JVM's heap at all.
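To make the "rely on the pagecache" idea concrete, here's a minimal, hypothetical Java sketch (class and method names are mine, not Kafka's). It uses `FileChannel.transferTo`, which on Linux can use `sendfile(2)`, so the segment's bytes move from the OS pagecache to the destination channel without ever being copied onto the JVM heap:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: serve a log segment straight from the pagecache.
// The message bytes never become Java objects, so the GC never sees them.
public class SegmentSender {
    public static long sendSegment(Path segment, WritableByteChannel dest)
            throws IOException {
        try (FileChannel ch = FileChannel.open(segment, StandardOpenOption.READ)) {
            long size = ch.size();
            long sent = 0;
            // transferTo may send fewer bytes than requested, so loop.
            while (sent < size) {
                sent += ch.transferTo(sent, size - sent, dest);
            }
            return sent;
        }
    }
}
```

The point of the design: the heap stays small regardless of how much data is served, which is exactly why the GC concern quoted above doesn't bite here.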

And you forgot that C++ isn't the easiest language when it comes to designing a distributed system. Scala, as I mentioned, offers many other features that suit the needs of the team. Of course, if you're a good engineer you'll know there are trade-offs such as compile times, but that's true of every engineering decision.


They were aware and had to design around them. See my point above.


Designing around platform flaws can be worth it if there are commensurate benefits. You're very lucky if you've never had to do this, or perhaps just blind to the tradeoffs you were making.


That's like saying "we had to consider memory management" in a C++ system.

When you are designing a high performance system you have to consider everything. Different platforms have different tradeoffs, but the tradeoffs on the JVM have been well proven over time.


They didn't "work around it". Kafka was designed from the start to avoid memory pressure.

BTW, I've talked to finance people who have JVM applications that haven't had a major GC in months. Really, it's not a big deal.


Having written a few low-latency systems, some of them on the JVM, I will say that in all of those cases allocating/freeing memory is always a slowdown, GC or not (cache coherence is almost always the deal-breaker here). So in those systems you simply do not allocate/deallocate along the critical path.

Is this difficult in Java? Yes. It's also difficult in C++. Just because it is difficult doesn't mean it is impossible.
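For a concrete picture of what "no allocation on the critical path" means in Java, here's a hypothetical sketch (names and structure are mine): every message slot is preallocated at startup, and the hot loop only reuses slots, so nothing is handed to the GC while traffic flows.

```java
// Hypothetical sketch of an allocation-free hot path: a fixed ring of
// preallocated message slots. The 'new' keyword only appears at startup.
public class MessageRing {
    public static final class Slot {
        public final byte[] payload;
        public int length;                 // bytes in use this cycle
        Slot(int capacity) { payload = new byte[capacity]; }
    }

    private final Slot[] slots;
    private long head;                     // next slot to hand out

    public MessageRing(int size, int slotCapacity) {
        slots = new Slot[size];
        for (int i = 0; i < size; i++) slots[i] = new Slot(slotCapacity);
    }

    // Claim a preallocated slot for reuse: no allocation here, ever.
    public Slot claim() {
        Slot s = slots[(int) (head % slots.length)];
        head++;
        return s;
    }
}
```

A real system would add bounds-checking against slow consumers (as disruptor-style ring buffers do), but the essential trick is the same: the working set is fixed, so neither the allocator nor the collector runs on the hot path.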

So if I am resorting to managing my own memory anyway, why would I use the JVM? Because typically the code on the critical path is a small percentage of the entire code base, and the other advantages of the JVM (tooling, language features, libraries, etc.) outweigh the downsides.

That's not always the case, and I don't have any specific knowledge of Kafka, but just because something needs to be low/consistent latency doesn't mean it can't (or shouldn't) be written on the JVM.


You are right that this can be a major problem with the JVM and working around it can be a lot of work. But you need to consider what kind of system we're talking about in this particular case.

This is a persistent message queue for log messages. Messages come in sequentially and subscribers read them sequentially. It makes zero sense to keep tons of messages in memory inside complex data structures, as they are never indexed, searched, or analyzed.

So in this particular case it's not actually a workaround. It's just sensible design and I wouldn't do it any differently in C++ either.
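That sequential-append, sequential-read design is simple enough to sketch in a few lines of Java. This is a hypothetical illustration (not Kafka's actual format): each record is a length-prefixed blob, producers append at the tail, and each consumer reads forward from its own byte offset, so no in-heap structure ever holds the messages.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical append-only log: 4-byte length prefix, then the payload.
public class AppendLog {
    private final FileChannel ch;
    private long tail; // next write position

    public AppendLog(Path file) throws IOException {
        ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        tail = ch.size();
    }

    // Append a record at the tail; returns the byte offset where it begins.
    public long append(byte[] record) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4 + record.length);
        buf.putInt(record.length).put(record).flip();
        long offset = tail;
        while (buf.hasRemaining()) tail += ch.write(buf, tail);
        return offset;
    }

    // Read the record starting at 'offset' (a consumer's position).
    public byte[] read(long offset) throws IOException {
        ByteBuffer len = ByteBuffer.allocate(4);
        while (len.hasRemaining()) ch.read(len, offset + len.position());
        len.flip();
        ByteBuffer rec = ByteBuffer.allocate(len.getInt());
        while (rec.hasRemaining()) ch.read(rec, offset + 4 + rec.position());
        return rec.array();
    }
}
```

Note what's absent: no tree, no hash map, no per-message object. The OS pagecache ends up caching exactly the hot head and tail of the file, which is the "sensible design" point above.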



