
> Nothing was surprising, we got exactly what we expected.

Such a satisfying feeling in the engineering world.

> We noticed Cassandra was running 10 second “stop-the-world” GC constantly but we had no idea why.

This makes me very thankful for the work that the Go team has put into the go GC.

> In the scenario that a user edits a message at the same time as another user deletes the same message, we ended up with a row that was missing all the data except the primary key and the text since all Cassandra writes are upserts.

Does Cassandra not offer a mechanism to do a conditional update? I'd expect to be able to submit an upsert that fails if the row isn't present, or has a `deleted = true` field, or something to that effect.




> This makes me very thankful for the work that the Go team has put into the go GC.

If you read that part of the piece again, the problem wasn't necessarily with Java's GC implementation but with Cassandra's storage architecture combined with a mass delete, which tombstoned millions of records that Cassandra then had to scan and skip over: "Cassandra had to effectively scan millions of message tombstones (generating garbage faster than the JVM could collect it)"

Even if you had an incremental GC, you'd be eating CPU and thrashing the cache like crazy doing that. And you may never catch up.

Actually, even in a language without a GC, you'd likely still have to scan and free all of those records.

Writing databases is hard. And actually the various JVM GCs are top notch and have had hundreds of thousands of engineering hours put into perfecting them for various workloads.

Still, these days I'm glad not to be near it and I work in Rust or C++.


I'm not discounting that the GC had an overwhelming amount of work to do.

My comment wasn't intended to convey that the go gc is somehow more efficient, but rather that I'm thankful for the tradeoffs they've made prioritizing latency and responsiveness, and minimizing the STW phase. I can see how I didn't give enough context for it to be interpreted correctly.

It's been my experience that with larger STW pauses, you wind up with other effects -- to the outside observer it's impossible to tell if the service is even available or not. You could argue that if you're thrashing memory that hard, no, it's not available. In general though, it makes for higher variance in latency distributions.

> Writing databases is hard. And actually the various JVM GCs are top notch and have had hundreds of thousands of engineering hours put into perfecting them for various workloads.

no doubt on both accounts. again, thankful for the design tradeoffs.

> Still, these days I'm glad not to be near it and I work in Rust or C++.

For anything that really needs to be very consistent, or needs full control over execution, I don't blame you. Cassandra is a prime example of what happens when you don't have that full control.


I feel like GP is talking about GC at the database level, not the language level.

I.e. something like RocksDB - written in C++ - still has a compaction phase.


I think they're actually talking about both at once.

Cassandra definitely has compaction (similar LSM storage as RocksDB), and it causes pauses.

It's been over 10 years since I touched Cassandra, but when I did there was often a royal dumpster fire when compaction & JVM GC ganged up to make a big mess.


"Conditional update" requires consensus to be fault tolerant, and regular Cassandra operations don't use consensus. Cassandra has a feature called light-weight transactions which can sort of get you there, and they use Paxos to do it. It unfortunately has a history of bugs that invalidate the guarantees you'd want from this feature, but more modern versions of Cassandra are improving in this regard.


Yes. ScyllaDB also has LWTs based on Paxos. More here: https://www.scylladb.com/2020/07/15/getting-the-most-out-of-...

[Disclosure: I work for ScyllaDB]


> This makes me very thankful for the work that the Go team has put into the go GC.

Has the Go team hit on some GC algorithm that has escaped the notice of the entire compiler-creating world since 1995? Not to disparage Go at all, it's got brilliant people working on it, but you could boil the oceans at this point with the effort from the brilliant folks who have been working on JVM garbage collection for the last 20+ years.


The Go team has members with decades of experience writing garbage collection algorithms, including on the JVM, so it's not starting from scratch.

Go has value types and stack allocation that allow it to create _far_ less garbage than Java, and its heap objects tend to violate the generational hypothesis, because short-lived objects tend to stay on the stack. This means that the Go GC can comfortably operate with far less GC throughput than typical Java programs.

https://go.dev/blog/ismmkeynote
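
To make the value-type point concrete, here's a toy sketch (mine, not from the keynote). Building it with -gcflags=-m prints the compiler's escape analysis decisions:

    // Toy example: a slice of value-type structs is one allocation, not
    // thousands of heap objects for the GC to trace. Build with
    //     go build -gcflags=-m
    // to see what the escape analysis decides.
    package main

    import "fmt"

    type point struct{ x, y int } // plain value type: no object header, no pointers

    func sumOfSquares(ps []point) int {
        total := 0
        for _, p := range ps { // p is a stack copy; nothing here escapes to the heap
            total += p.x*p.x + p.y*p.y
        }
        return total
    }

    func main() {
        ps := make([]point, 1000) // one backing array, not 1000 little objects
        for i := range ps {
            ps[i] = point{x: i, y: 2 * i}
        }
        fmt.Println(sumOfSquares(ps))
    }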


Java has done escape analysis and scalar replacement since Java 6.


> Has the Go team hit on some GC algorithm that has escaped the notice of the entire compiler-creating world since 1995?

It's like the saying about recycling: "1. Reduce 2. Reuse 3. Recycle". Just like in real life, it's far better to never have to go all the way down to #3. If you never generate garbage in the first place, you won't have to waste resources cleaning it up. So it's less about how good the actual Go GC is and more that the language itself greatly discourages wasting memory via aggressive stack allocation, value types, and an allocation-conscious culture.
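
And the "2. Reuse" step has a standard idiom in Go too; a rough sketch (names made up) using sync.Pool so hot-path buffers get recycled instead of becoming garbage:

    // Sketch of the "Reuse" step: recycle buffers across calls with sync.Pool
    // so most of them never reach the collector at all.
    package main

    import (
        "bytes"
        "fmt"
        "sync"
    )

    var bufPool = sync.Pool{
        New: func() any { return new(bytes.Buffer) },
    }

    func render(name string) string {
        buf := bufPool.Get().(*bytes.Buffer)
        defer bufPool.Put(buf) // hand it back instead of letting it become garbage
        buf.Reset()            // pooled buffers keep their previous contents
        fmt.Fprintf(buf, "hello, %s\n", name)
        return buf.String()
    }

    func main() {
        fmt.Print(render("world"))
    }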


It actually has very poor throughput when compared to almost any JVM GC. This is mitigated somewhat by the prevalence of value types.

The major advantage is mostly predictably short pauses. For web services it's a reasonable trade-off, but it can be a problem that the GC can't be tuned for anything else.
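
For what it's worth, the knobs that do exist are pretty coarse; roughly this (Go 1.19+ for the memory limit), trading heap headroom for GC CPU rather than picking a different collector:

    // The two main GC knobs Go exposes: the heap growth target (GOGC) and a
    // soft memory limit (GOMEMLIMIT, Go 1.19+). Unlike the JVM, there's no
    // choosing a different collector or pause/throughput profile.
    package main

    import "runtime/debug"

    func main() {
        debug.SetGCPercent(200)       // collect when the heap grows 200% past the live set (default 100)
        debug.SetMemoryLimit(4 << 30) // soft cap of ~4 GiB on the Go runtime's memory
        // Same thing without recompiling: GOGC=200 GOMEMLIMIT=4GiB ./service
    }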



