This PR attempted to fix the memory model violation I mentioned in the parent comment, but also slipped in an extra change that swapped the sync.Mutex for a sync.RWMutex. The PR description claimed two benefits: "Eliminates the data race, ensuring thread safety" -- correct, at least to some level; and "Improves performance by allowing concurrent ExpandID calls, which is likely a common operation" -- which is totally unsubstantiated, and very likely false, as an RWMutex only outperforms a plain Mutex under narrowly-defined load patterns, roughly read-heavy workloads whose critical sections are long enough to amortize the extra bookkeeping.
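For reference, a minimal sketch of the post-fix shape. Only the names cycleMap, ExpandID, and NextID come from the linked file; the types and bodies here are my assumptions for illustration:

```go
package ids

import "sync"

var (
	mu       sync.RWMutex
	cycleMap = map[uint16]uint64{} // element types assumed
)

// ExpandID now reads cycleMap under a read lock. This is the actual fix:
// previously this read was entirely unsynchronized.
func ExpandID(id uint64) uint64 {
	mu.RLock()
	defer mu.RUnlock()
	return cycleMap[uint16(id>>48)] // lookup logic assumed
}

// NextID writes cycleMap under the write lock, as the original already did.
func NextID(cycle uint16) uint64 {
	mu.Lock()
	defer mu.Unlock()
	cycleMap[cycle]++
	return cycleMap[cycle]
}
```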
In any case, the PR included no test or benchmark to validate either of these claims, so not a great start by the author. But then a maintainer chimed in with a comment expressing concerns about edge-condition performance details, without any data or evidence, and apparently didn't care about (or know about?) the far more important fix the PR made re: data races.
> I tried changing this, but I did not see any benefit in benchmark numbers.
No apparent understanding of the bugs in this code, nor how changes may or may not fix those bugs, nor really how performance is defined or can be meaningfully evaluated.
Again, hobby project or whatever, all good. But the authors and maintainers of this project are clearly, demonstrably, in over their heads on this one.
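On the "benchmark numbers" point: a race like this is demonstrated with a test, not a benchmark. A hedged sketch, reusing the assumed package above -- run against the pre-fix code under `go test -race`, the race detector reports ExpandID's unsynchronized read racing with NextID's locked write:

```go
package ids

import "testing"

// Run with: go test -race
// Correctness is the benefit here; there's no benchmark delta to look for.
func TestExpandIDRacesWithNextID(t *testing.T) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for i := 0; i < 1000; i++ {
			NextID(1)
		}
	}()
	for i := 0; i < 1000; i++ {
		ExpandID(uint64(1) << 48)
	}
	<-done
}
```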
If I'm reading this correctly, they are recommending a lock in this situation. However, they are saying the implementation has two options: either raise an error reporting the race (if the implementation is told to do so), or, because the value being read is not larger than a machine word, reply to the read with a correct value from a previous write. If true, then it cannot reply with corrupted data.
> However, they are saying the implementation has two options: either raise an error reporting the race (if the implementation is told to do so), or, because the value being read is not larger than a machine word, reply to the read with a correct value from a previous write.
The spec says
> A read r of a memory location x holding a value that is not larger than a machine word must observe some write w such that r does not happen before w and there is no write w' such that w happens before w' and w' happens before r. That is, each read must observe a value written by a preceding or concurrent write.
These rules apply only if the value isn't larger than a machine word. Otherwise,
> Reads of memory locations larger than a single machine word ... can lead to inconsistent values not corresponding to a single write.
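To make the multiword case concrete, here's a hedged sketch of a torn read on a two-word struct. This is a deliberate data race (`go run -race` flags it, and the compiler is free to do worse); the writer only ever stores internally-consistent pairs, yet the reader may observe a mix of two writes:

```go
package main

import "fmt"

// pair is two machine words, so a racy read of it can tear: the reader
// may observe half of one write and half of another.
type pair struct{ a, b uint64 }

func main() {
	var p pair
	go func() {
		for {
			p = pair{1, 1} // only ever stores
			p = pair{2, 2} // consistent pairs
		}
	}()
	for i := 0; i < 100000; i++ {
		if v := p; v.a != v.b { // racy read of a multiword value
			fmt.Println("torn read:", v) // e.g. {1 2}: no single write stored this
		}
	}
}
```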
The size of a machine word differs depending on how a program is compiled, so whether or not a value is larger than a machine word isn't knowable from the source code alone.
And even if you can assert that your program will only be built where a machine word is always at least the size of e.g. a uint64, the spec only guarantees that unsynchronized reads of a uint64 will return some previously-written valid value; it doesn't guarantee anything about which value is returned. So `x=1; x=3; x=2;` concurrently with `print(x); print(x); print(x)` can print `1 1 1` or `3 3 3` or `2 1 1` or `3 2 1` and so on. It won't return a corrupted uint64, but it can return any prior uint64, which is still a data race, and almost certainly useless to the application.
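A runnable version of that example, assuming a word-sized value on a 64-bit target:

```go
package main

import "fmt"

var x uint64 // word-sized on the 64-bit targets this thread assumes

func main() {
	go func() {
		x = 1
		x = 3
		x = 2
	}()
	// Each read observes *some* prior write -- possibly the zero value from
	// initialization -- but nothing constrains which one, or in what order
	// across reads. `go run -race` correctly flags every one of these reads.
	fmt.Println(x)
	fmt.Println(x)
	fmt.Println(x)
}
```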
The goalposts have been moved. The claim is that this pattern isn’t suitable for production code. The ground truth is that a compliant Go implementation may elect to: crash; read the first value ever set to the variable for the entire lifetime of the program; or behave completely as you’d expect from a single core interleaved execution order. The first is an opt-in, the latter two are up to the whims of the runtime and an implementation may alternate between them at any point.
Is that the kind of uncertainty you want in your production systems? Or is your only requirement that they don’t serve “corrupt” data?
Yep. And even if you were to lock down the compiler implementation, the version of Go you're using, the specific set of hardware and OS that you build on and deploy to, and so on -- that still doesn't insulate you against arbitrary or unexpected behavior, if your code violates the memory model!
Nothing in Go is thread-safe, unless explicitly documented otherwise. Some examples of explicitly-documented-otherwise stuff are in package sync and package sync/atomic.
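For example, if all you need is a shared word of state, sync/atomic is the documented-safe route -- a sketch:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

func main() {
	var x atomic.Uint64 // sync/atomic types are documented safe for concurrent use

	go func() {
		x.Store(1)
		x.Store(3)
		x.Store(2)
	}()
	// Which value each Load observes is still scheduling-dependent, but the
	// accesses are synchronized: no data race, no torn values, and the race
	// detector stays quiet.
	fmt.Println(x.Load())
}
```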
cycleMap is definitely not thread-safe. The authors knew this, to some extent, because they synchronized writes via an adjacent mutex. But they didn't synchronize reads through the same mutex, which is the issue.
One example among many:
https://github.com/DiceDB/dice/blob/0e241a9ca253f17b4d364cdf... defines func ExpandID, which reads from cycleMap without locking the package-global mutex; and func NextID, which writes to cycleMap under a lock of the package-global mutex. So writes are synchronized, but only between each other, and not with reads, so concurrent calls to ExpandID and NextID would race.
This is all fine as a hobby project or whatever, but very far from any kind of production-capable system.
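Reconstructed for concreteness, this is the shape of the bug as described. Again, only the names cycleMap, ExpandID, and NextID come from the file; the types and bodies are assumed:

```go
package ids

import "sync"

var (
	mu       sync.Mutex
	cycleMap = map[uint16]uint64{} // types assumed
)

// ExpandID reads cycleMap with no lock at all, so this read races with
// NextID's write below. Synchronizing the writers among themselves does
// nothing for an unsynchronized reader.
func ExpandID(id uint64) uint64 {
	return cycleMap[uint16(id>>48)] // unsynchronized read: data race
}

// NextID writes cycleMap under the mutex: writes are ordered only with
// respect to each other.
func NextID(cycle uint16) uint64 {
	mu.Lock()
	defer mu.Unlock()
	cycleMap[cycle]++
	return cycleMap[cycle]
}
```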