> The C definition is that "undefined behavior" can have arbitrary concrete behavior, not that a compiler can assume it does not happen.
What is the difference between those? How does a compiler that assumes UB never happens violate the requirement that UB can have arbitrary concrete behavior? If we look at a simple example like optimizing "x + y > x" (signed arithmetic, y known to be positive) to "true" -- that will lead to some arbitrary concrete behavior of the program, so it seems covered by the definition.
I assume that what the original C authors meant was closer to "on signed integer overflow, non-deterministically pick some result from the following set", but that's not what they wrote in the standard... if you want to specify that something is non-deterministic, you need to spell out exactly what the set of possible choices is. Maybe for signed integer overflow one could infer this (though it really should be made explicit IMO), but C also says that the program has UB "by default" if it runs into a case not described by the standard, and there's just no way to infer a set of choices from that as far as I can see.
"arbitrary concrete behavior" means that at this point anything can happen on the real machine. This implies that everything before this point has to behave according to the specification. "Is impossible" is stronger: the whole program could behave erratically. But partial correctness is important in a lot of scenarios, which is why we want it, and "UB" means the former, not "impossible".
In the ISO C standard, we use "unspecified" for a non-deterministic choice among clearly specified alternatives. So this is well understood.
> "arbitrary concrete behavior" means that at this point anything can happen on the real machine. This implies that everything before this point has to behave according to the specification. "Is impossible" is stronger: the whole program could behave erratically. But partial correctness is important in a lot of scenarios, which is why we want it, and "UB" means the former, not "impossible".
So that rules out "time-traveling UB", but it would still permit optimizing "x+y < x" to "false" for non-negative y, right? I can't tell if you think that that is a legal transformation or not, and I'd be curious to know. :)
FWIW I agree we shouldn't let UB time-travel. We should say that all observable events until the point of UB must be preserved. AFAIK that is e.g. what CompCert does. But I would still describe that as "the compiler may assume that UB does not happen" (and CompCert makes use of that assumption for its optimizations), so I don't understand the distinction you are making.
> In the ISO C standard, we use "unspecified" for a non-deterministic choice among clearly specified alternatives. So this is well understood.
Quite a few places in the standard just say "result/behavior is unspecified", so the set of alternatives is often not very clear IMO. In particular, when it says that under some condition "the result is unspecified", and let's say the result has integer type, does that mean it non-deterministically picks some "normal" integer value, or can it be an "unspecified value" that behaves more like LLVM undef in that it is distinct from every "normal" value and can violate basic properties like "x == x"?
It's impossible in safe Rust (modulo compiler bugs and things like using a debugger to poke in the program's memory from the outside). That's the key difference.
Of course unsafe Rust is not memory safe; that's why it's called that. :) Go has unsafe operations too (https://go.dev/ref/spec#Package_unsafe), and of course if you use those, all bets are off. But as you will notice, my example doesn't use those operations.
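To make concrete what those operations look like (a minimal sketch of my own, not taken from the example under discussion; unsafe.Add needs Go 1.17 or newer): with package unsafe you can do pointer arithmetic past whatever the type system knows about, and at that point all bets are indeed off.

    package main

    import (
        "fmt"
        "unsafe"
    )

    // Illustrative sketch only: deliberately compute an out-of-bounds address.
    func main() {
        x := [2]int32{1, 2}
        p := unsafe.Pointer(&x[0])
        // Step 12 bytes (3 * sizeof(int32)) past the start of a 2-element
        // array: an address the type system can no longer do anything about.
        q := (*int32)(unsafe.Add(p, 12))
        fmt.Println(*q) // reads whatever happens to sit there
    }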
> The rest is your bug; the variable values coming out of sync with each other, not maintaining the invariant among their values.
If the language and its runtime let me break their invariant, then that's their bug, not mine. This is the fundamental promise of type-safe languages: you can't accidentally break the language abstraction.
> It could be the case that a thread-unsafe program breaks a managed run-time, but not an unvarnished truth.
I demonstrated that the Go runtime is such a case, and I think that should be considered a memory safety violation. Not sure which part of that you disagree with...
(EDIT: removed the first part since I realized you were replying to some comment further up, not my example.)
> Rust has, unfortunately, changed the narrative so that people now believe memory safety is a property of the language, when it is one of the implementation.
I am not sure I agree with that (the concept of memory-safe languages looong predates Rust), but you can just define a memory-safe language as one where all conforming implementations are memory-safe -- making it a feature of the language itself, not just a feature of a particular implementation.
See https://www.youtube.com/watch?v=QrrH2lcl9ew for a presentation of Google's study, which found no measurable difference in productivity between teams using Rust vs Go.
> it met one common definition of "memory safety", which was essentially "have a garbage collector"
This is the first time I hear that being suggested as ever having been the definition of memory safety. Do you have a source for this?
Given that except for Go every single language gets this right (to my knowledge), I am kind of doubtful that this is a consequence of the term changing its meaning.
True, "have a garbage collector" was never the formal definition; it was more "automatic memory management". But this predates the work on Rust's ownership system, and while there were theories of static automatic memory management, all practical examples of automatic memory management were some form of garbage collection.
If you go to the original 2009 announcement presentation for Go [1], not only is "memory-safety" listed as a primary goal, but Pike provides the definition of memory-safe that they are using, which is:
"The program should not be able to derive a bad address and just use it"
Which Go mostly achieves with a combination of garbage collection and not allowing pointer arithmetic.
The source of Go's failure is concurrency, which has a knock-on effect that invalidates memory safety. Note that stated goal from 2009 is "good support for concurrency", not "concurrent-safe".
Thanks! I added a reference to that in the blog post.
Interestingly, in 2012 Rob Pike explicitly said that Go is "not purely memory safe" because "sharing is legal": https://go.dev/talks/2012/splash.slide#49. It is not entirely clear what he means by that (I was not able to find a recording of the talk), but it seems likely he's referring to this very issue.
> "The program should not be able to derive a bad address and just use it"
My example does exactly that: the racing reader can observe a half-updated interface value -- the method table of one variant combined with the data of the other -- and from that it derives a bad address and dereferences it. So -- as you say, Go mostly achieves this, but not entirely.
> Note that stated goal from 2009 is "good support for concurrency", not "concurrent-safe".
My argument is that being concurrency-unsafe implies being memory-unsafe, for the reasons laid down in the blog post. I understand that that is a somewhat controversial opinion. :)
Hey! Cards on the table, I'm not in love w/ your post, but mostly I'm curious about what discussion or outcome you were hoping for with it. Doesn't this boil down to "bad things will happen if you have data races in your code, so don't have data races in your code"? Does it really matter what those bad things are?
Can you point me to the Go definition of memory safety? I searched all over their website, and couldn't find any.
(But also, it'd be kind of silly for every language to make up their own definition of memory safety. Then even C is memory safe, they just have to define it the right way. ;)
A data race is defined as a write to a memory location happening concurrently with another read or write to that same location, unless all the accesses involved are atomic data accesses as provided by the sync/atomic package.
Which describes exactly what is happening in the OP's program:
func repeat_get() {
    for {
        x := globalVar // <-- unsynchronized read of globalVar
        x.get()        // <-- unsynchronized call to Thing.get()
    }
}
By itself this isn't a problem: these are just reads, and you don't need synchronization for concurrent reads by themselves. The problem is introduced here:
func repeat_swap() {
    var myval = 0
    for {
        globalVar = &Ptr{val: &myval} // <-- unsynchronized write to globalVar
        globalVar = &Int{val: 42}     // <-- unsynchronized write to globalVar
    }
}

func main() {
    go repeat_get() // <-- one goroutine is doing unsynchronized reads
    repeat_swap()   // <-- another goroutine is doing unsynchronized writes
}
Just a (chef's kiss) textbook example of a data race, and a clearly unsound Go program. I don't know how or why the OP believes "this program ... [is] according to Wikipedia memory-safe" -- it very clearly is not.
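For reference, one way to get rid of the race (a sketch of my own, not the OP's program; the Thing/Int declarations are reconstructed just so it compiles) is to guard every access to globalVar with a mutex, so that none of the accesses are unsynchronized:

    package main

    import "sync"

    // Reconstructed declarations, just enough for this sketch to compile.
    type Thing interface{ get() }

    type Int struct{ val int }

    func (i *Int) get() { _ = i.val }

    var (
        mu        sync.Mutex
        globalVar Thing = &Int{val: 0}
    )

    func repeat_get() {
        for {
            mu.Lock()
            x := globalVar // read happens while holding the lock
            mu.Unlock()
            x.get()
        }
    }

    func repeat_swap() {
        for {
            mu.Lock()
            globalVar = &Int{val: 42} // write happens while holding the lock
            mu.Unlock()
        }
    }

    func main() {
        go repeat_get()
        repeat_swap()
    }

An atomic pointer from sync/atomic would work too; the point is just that, per the definition quoted above, it is the unsynchronized mix of reads and writes that makes the original program racy.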
But, you know, I think everyone here is basically talking past each other.
> There is plenty of undefined behavior that can't lead to violating memory safety. For example, in many languages, argument evaluation order is undefined. If you have some code like:
You are mixing up non-determinism and UB. Sadly that's a common misunderstanding.
Java also sometimes uses "memory safe" to refer to programs that don't have null pointer exceptions. So in that sense, Java isn't memory safe by construction either.
These terms are used slightly differently by different communities, which is why I discuss this point in the article. But you seem adamant that you have the sole authority for defining these terms so :shrug:
When those US government articles about how we should switch to memory safe languages come out, they refer to Java as a “memory safe language”.
They also count data race freedom as part of memory safety, which I think is wrong (and contradicts their inclusion of Java and even Go in the list of memory safe languages).
So no, I’m not an authority. I’m just following the general trend of how the term is used.
And I've never heard "memory safe" used in relation to not having null pointer exceptions. That's a new one and sounds nonsensical, frankly.
> They also count data race freedom as part of memory safety, which I think is wrong (and contradicts their inclusion of Java and even Go in the list of memory safe languages).
For Java, there's no contradiction if you define data race freedom as "data races cannot cause arbitrary memory corruption / UB".
> And ive never heard “memory safe” used in relation to not having null pointer exceptions. That’s a new one and sounds nonsensical, frankly
I was also surprised, but it's what I was told by people working on verification of Java programs. And you can see e.g. at https://link.springer.com/content/pdf/10.1007/978-3-030-1750... that people are proving memory safety of Java programs, which would not make sense at all if all Java programs are memory safe by construction.
> Curiously, Go itself is unclear about its memory safety on go.dev.
Yeah... I was actually surprised by that when I did the research for the article. I had to go to Wikipedia to find a reference for "Go is considered memory-safe".
Maybe they didn't think much about it, or maybe they enjoy the ambiguity. IMO it'd be more honest to just clearly state this. I don't mind Go making different trade-offs than my favorite language, but I do mind them not being upfront about the consequences of their choices.