This PR attempted to fix the memory model violation I mentioned in the parent comment, but also slipped in an extra change that swapped the sync.Mutex for a sync.RWMutex. The PR description claimed two benefits: "Eliminates the data race, ensuring thread safety" -- correct, at least to some level; and "Improves performance by allowing concurrent ExpandID calls, which is likely a common operation" -- which is totally unsubstantiated, and very likely false, as an RWMutex only outperforms a plain Mutex under narrowly-defined load patterns, roughly read-heavy workloads whose critical sections are long enough to amortize the extra bookkeeping.
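For reference, a minimal sketch of the post-fix shape. Only the names cycleMap, ExpandID, and NextID come from the linked file; the types and bodies here are my assumptions for illustration:

```go
package ids

import "sync"

var (
	mu       sync.RWMutex
	cycleMap = map[uint16]uint64{} // element types assumed
)

// ExpandID now reads cycleMap under a read lock. This is the actual fix:
// previously this read was entirely unsynchronized.
func ExpandID(id uint64) uint64 {
	mu.RLock()
	defer mu.RUnlock()
	return cycleMap[uint16(id>>48)] // lookup logic assumed
}

// NextID writes cycleMap under the write lock, as the original already did.
func NextID(cycle uint16) uint64 {
	mu.Lock()
	defer mu.Unlock()
	cycleMap[cycle]++
	return cycleMap[cycle]
}
```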
In any case, the PR included no test or benchmark to validate either of these claims, so not a great start by the author. But then a maintainer chimed in with a comment expressing concerns about edge-condition performance details, without any data or evidence, and apparently didn't care about (or know about?) the far more important fix the PR made re: data races.
> I tried changing this, but I did not see any benefit in benchmark numbers.
No apparent understanding of the bugs in this code, nor how changes may or may not fix those bugs, nor really how performance is defined or can be meaningfully evaluated.
Again, hobby project or whatever, all good. But the authors and maintainers of this project are clearly, demonstrably, in over their heads on this one.
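On the "benchmark numbers" point: a race like this is demonstrated with a test, not a benchmark. A hedged sketch, reusing the assumed package above -- run against the pre-fix code under `go test -race`, the race detector reports ExpandID's unsynchronized read racing with NextID's locked write:

```go
package ids

import "testing"

// Run with: go test -race
// Correctness is the benefit here; there's no benchmark delta to look for.
func TestExpandIDRacesWithNextID(t *testing.T) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for i := 0; i < 1000; i++ {
			NextID(1)
		}
	}()
	for i := 0; i < 1000; i++ {
		ExpandID(uint64(1) << 48)
	}
	<-done
}
```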
If I'm reading this correctly, they are recommending a lock in this situation. However, they are saying the implementation has two options: either raise an error reporting the race (if the implementation is told to do so), or, because the value being read is not larger than a machine word, reply to the read with a correct value from a previous write. If true, then it cannot reply with corrupted data.
> However, they are saying the implementation has two options: either raise an error reporting the race (if the implementation is told to do so), or, because the value being read is not larger than a machine word, reply to the read with a correct value from a previous write.
The spec says
> A read r of a memory location x holding a value that is not larger than a machine word must observe some write w such that r does not happen before w and there is no write w' such that w happens before w' and w' happens before r. That is, each read must observe a value written by a preceding or concurrent write.
These rules apply only if the value isn't larger than a machine word. Otherwise,
> Reads of memory locations larger than a single machine word ... can lead to inconsistent values not corresponding to a single write.
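To make the multiword case concrete, here's a hedged sketch of a torn read on a two-word struct. This is a deliberate data race (`go run -race` flags it, and the compiler is free to do worse); the writer only ever stores internally-consistent pairs, yet the reader may observe a mix of two writes:

```go
package main

import "fmt"

// pair is two machine words, so a racy read of it can tear: the reader
// may observe half of one write and half of another.
type pair struct{ a, b uint64 }

func main() {
	var p pair
	go func() {
		for {
			p = pair{1, 1} // only ever stores
			p = pair{2, 2} // consistent pairs
		}
	}()
	for i := 0; i < 100000; i++ {
		if v := p; v.a != v.b { // racy read of a multiword value
			fmt.Println("torn read:", v) // e.g. {1 2}: no single write stored this
		}
	}
}
```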
The size of a machine word differs depending on how a program is compiled, so whether or not a value is larger than a machine word isn't knowable from the source code alone.
And even if you can assert that your program will only be built where a machine word is always at least the size of e.g. a uint64, the spec only guarantees that unsynchronized reads of a uint64 will return some previously-written valid value; it doesn't guarantee anything about which value is returned. So `x=1; x=3; x=2;` concurrently with `print(x); print(x); print(x)` can print `1 1 1` or `3 3 3` or `2 1 1` or `3 2 1` and so on. It won't return a corrupted uint64, but it can return any prior uint64, which is still a data race, and almost certainly useless to the application.
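A runnable version of that example, assuming a word-sized value on a 64-bit target:

```go
package main

import "fmt"

var x uint64 // word-sized on the 64-bit targets this thread assumes

func main() {
	go func() {
		x = 1
		x = 3
		x = 2
	}()
	// Each read observes *some* prior write -- possibly the zero value from
	// initialization -- but nothing constrains which one, or in what order
	// across reads. `go run -race` correctly flags every one of these reads.
	fmt.Println(x)
	fmt.Println(x)
	fmt.Println(x)
}
```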
The goalposts have been moved. The claim is that this pattern isn’t suitable for production code. The ground truth is that a compliant Go implementation may elect to: crash; read the first value ever set to the variable for the entire lifetime of the program; or behave completely as you’d expect from a single core interleaved execution order. The first is an opt-in, the latter two are up to the whims of the runtime and an implementation may alternate between them at any point.
Is that the kind of uncertainty you want in your production systems? Or is your only requirement that they don’t serve “corrupt” data?
Yep. And even if you were to lock down the compiler implementation, the version of Go you're using, the specific set of hardware and OS that you build on and deploy to, and so on -- that still doesn't insulate you against arbitrary or unexpected behavior, if your code violates the memory model!
Nothing in Go is thread-safe, unless explicitly documented otherwise. Some examples of explicitly-documented-otherwise stuff are in package sync and package sync/atomic.
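For example, if all you need is a shared word of state, sync/atomic is the documented-safe route -- a sketch:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

func main() {
	var x atomic.Uint64 // sync/atomic types are documented safe for concurrent use

	go func() {
		x.Store(1)
		x.Store(3)
		x.Store(2)
	}()
	// Which value each Load observes is still scheduling-dependent, but the
	// accesses are synchronized: no data race, no torn values, and the race
	// detector stays quiet.
	fmt.Println(x.Load())
}
```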
cycleMap is definitely not thread-safe. The authors knew this, to some extent, because they synchronized writes via an adjacent mutex. But they didn't synchronize reads through the same mutex, which is the issue.
One example among many:
https://github.com/DiceDB/dice/blob/0e241a9ca253f17b4d364cdf... defines func ExpandID, which reads from cycleMap without locking the package-global mutex; and func NextID, which writes to cycleMap under a lock of the package-global mutex. So writes are synchronized, but only between each other, and not with reads, so concurrent calls to ExpandID and NextID would race.
This is all fine as a hobby project or whatever, but very far from any kind of production-capable system.
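Reconstructed for concreteness, this is the shape of the bug as described. Again, only the names cycleMap, ExpandID, and NextID come from the file; the types and bodies are assumed:

```go
package ids

import "sync"

var (
	mu       sync.Mutex
	cycleMap = map[uint16]uint64{} // types assumed
)

// ExpandID reads cycleMap with no lock at all, so this read races with
// NextID's write below. Synchronizing the writers among themselves does
// nothing for an unsynchronized reader.
func ExpandID(id uint64) uint64 {
	return cycleMap[uint16(id>>48)] // unsynchronized read: data race
}

// NextID writes cycleMap under the mutex: writes are ordered only with
// respect to each other.
func NextID(cycle uint16) uint64 {
	mu.Lock()
	defer mu.Unlock()
	cycleMap[cycle]++
	return cycleMap[cycle]
}
```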