The same problem frequently bites me in Python too:
functions = []
for i in range(10):
    functions.append(lambda: print(f'Hello {i}'))
for fn in functions:
    fn()
# Hello 9
# Hello 9
# Hello 9
# Hello 9
# Hello 9
# Hello 9
# Hello 9
# Hello 9
# Hello 9
# Hello 9
I find it one of Python's biggest warts because it's silent, hard to troubleshoot (especially the first time!), and, like in Go, the most straightforward fix looks like a mistake (i=i):
for i in range(10):
    functions.append(lambda i=i: print(f'Hello {i}'))
I guess this is a lesson in designing language semantics that match people's intuitions, and learning from previous languages' mistakes.
I've always referenced this as one of the subtle points of genius for how Java did lambdas:
by requiring captured variables to be final it removes a lot of ambiguity around what a variable name refers to. I like that local variables can only be changed locally. If you do want crosstalk between the inner and outer scopes you have to be more explicit and introduce a reference to talk through.
I love Python, but I basically avoid this construction and use a single-element list if I need it. I can never remember exactly how it works.
I found this interesting because I have never run into this before...and I thought that was rare (in relation to how comfortable I am with Python). That said, I personally think this is the right and intuitive outcome.
If we expand out the loop to be manual we would have a script like:
functions = []
# expanded loop
i = 0
functions.append(lambda: print(f"Hello {i}"))
i = 1
functions.append(lambda: print(f"Hello {i}"))
...
i = 9
functions.append(lambda: print(f"Hello {i}"))
Now at this point if we were to:
print(f"Hello {i}")
What would the expected output be? I would posit that anything other than "Hello 9" would be wrong, both logically and intuitively.
So by extension, calling these stored lambdas (effectively print(f"Hello {i}")) 10 times should just print "Hello 9" 10 times, IMO. Anything else is counter-intuitive and definitely surprising.
I highly prefer lexical scoping, where variables are bound to the block they were declared in, like JavaScript's `let` vs the old `var`. This avoids shadowing and general namespace pollution.
I know it's not how Python operates, but I think it's how it should. Though I'd argue the syntactical similarities between functions and loops nudge users towards this second model.
Note that the way you write this is the best way to form closures in Python: form them inside a function with the closure variables as function parameters. This forces the closure to capture the current value of the closure variables when the closure is formed. Note that you could put the calls to the _loop function in a for loop and things would still work:
>>> functions = []
>>> def _loop(i):
...     functions.append(lambda: print(f"Hello {i}"))
...
>>> for i in range(10):
...     _loop(i)
...
>>> for f in functions:
...     f()
...
Hello 0
Hello 1
Hello 2
Hello 3
Hello 4
Hello 5
Hello 6
Hello 7
Hello 8
Hello 9
> lexical scoping, where the variables are bound to the block they were declared in
In Python that would only work if a "block" included comprehensions. For example:
>>> functions = [lambda: print(f"Hello {i}") for i in range(10)]
>>> for f in functions:
...     f()
...
Hello 9
Hello 9
Hello 9
Hello 9
Hello 9
Hello 9
Hello 9
Hello 9
Hello 9
Hello 9
You could fix this by defining a _loop function as above and forming the closure inside it; but changing variable scoping to be lexical in "blocks" wouldn't fix this case unless the list comprehension itself counted as a "block", which is not how Python defines blocks.
Wow. When I saw the original for-loop example I was like "ok not a huge deal, doesn't surprise me too much really". But rewritten in the comprehension form I'm like wtf
Well put. I too prefer the reasonability of lexical scoping. Although I must confess I am not well versed enough to quite grasp how what Python does _isn't_ conforming to lexical scope.
Since I was confused I did some internet searching and found this:
Which I think suggests the thing you are finding confusing isn't lexical scoping in Python, but rather the environment mutability.
Ultimately, I think I better understand what you are highlighting and can't say I entirely disagree. I just personally find how it is today ergonomic, but that could also just be bias as I am fairly comfortable in Python (warts and all).
I would say that Python has scoping, yes, but it does not have lexical scoping in any sense of "lexical scoping" that I am aware of. If it did, the code below would not actually work, as the outside-the-loop print would be trying to access a variable not available in the lexical scope of "the function body", since it is defined and established within the lexical scope of "the loop".
So, at least in my book, no, Python has "global scope", "function scope" and probably one or two more scopes (I think there's a "class scope" as well).
Here's some code in Python, and some equivalent code in Go.
def foo(a_list):
    print(f"list is {len(a_list)} elements")
    for element in a_list:
        print(element)
    print(element)
And here's the equivalent Go code:
func foo(aList []int) { // Let's use ints...
    fmt.Printf("list is %d elements\n", len(aList))
    var element int // Notice this declaration! This is ensuring that element is declared outside the lexical scope of the for loop
    for _, element = range aList {
        fmt.Println(element)
    }
    fmt.Println(element)
}
Which, now that I have done enough reading, I think crystallizes what others here are finding confusing (for me at least). Depending on preference, the lack of block scoping can be surprising. Which also explains my bias: I started with Python, which probably plays a large part in why I find function-level scoping without block scoping ergonomic.
I think it comes down to whether you think it makes sense for variables to be captured by value or by reference in closures.
Whichever makes the most sense depends on the context of the program, which is why in languages that offer both capture semantics you can choose how the variables are captured. When that isn't the case you need to pick for everyone, and it gets weird.
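For illustration, here is a minimal Go sketch of hand-picking either behaviour under Go's current (per-loop) semantics; byRef and byVal are just illustrative names:

package main

import "fmt"

func main() {
    var byRef, byVal []func()
    for i := 0; i < 3; i++ {
        // Capture by reference: the closure shares the single per-loop variable i.
        byRef = append(byRef, func() { fmt.Println(i) })
        // Capture by value: pass i as an argument and close over the copy.
        byVal = append(byVal, func(v int) func() {
            return func() { fmt.Println(v) }
        }(i))
    }
    for _, f := range byRef {
        f() // 3, 3, 3 under the per-loop semantics
    }
    for _, f := range byVal {
        f() // 0, 1, 2
    }
}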
> I think it comes down to if you think it makes sense for variables to be captured by value or by reference in closures.
No, this comes down to “should a loop control variable be scoped to the block—or in python’s case function—the loop is in and updated with each iteration or a fresh variable scoped to each loop iteration that happens to share the same name.”
In the "the variable lives in the function scope" model, the answer is unequivocally "it should be bound once and updated". If the variable only exists in the scope of the loop body, both "it is bound on each iteration" (and thus safe to close over without surprise) and "it is bound once and updated" are valid answers. I have a preference for the first, but many languages actually chose the second.
The example makes sense when you think of `i` getting reassigned, but in other languages with variable shadowing (think Rust, but Go too when you're in a nested scope) you might replace `i = 2` with `let i = 2` (Rust), in which case the closure closes over precisely the `let i = 2` variable and the expected output differs.
I think the reason this seems surprising is Python's value/reference semantics. For example:
def foo(i, j):
    i[0] += 1
    j += 1

a = [0]
b = 0
foo(a, b)
print(a) # [1]
print(b) # 0
So people think of the body of the loop like a function call.
I don't think it's a bad expectation, in fact I think it's quite a natural expectation—in particular if you've programmed functional languages where modifying values is the exception, not the rule—which is why it surprises people. It's just not the one Python chose.
The problem is the intersection of people's intuition about closures from languages without mutable state bumping into the imperative world built of mutable state.
For example, equivalent code in Elixir:
i = 0
list = []
list = list ++ [fn -> i end]
i = i + 1
list = list ++ [fn -> i end]
i = i + 1
halfway = list
list = list ++ [fn -> i end]
i = i + 1
list = list ++ [fn -> i end]
IO.inspect Enum.map(halfway, fn f -> f.() end)
IO.inspect Enum.map(list, fn f -> f.() end)
Would produce the functional-intuitive result of
[0,1]
[0,1,2,3]
Because there is no mutable state. Those repeated assignments to i and list are exactly equivalent to the scenario where each i was actually i1, i2, i3, etc.
I expect that to write the obvious result (the last value assigned) and I also expect a capture in a loop to write the value as it was at the time of iteration.
They are both the behavior that's intuitively obvious, despite not being the same behavior.
As we can see, it's important not to use the simple/obvious implementation, because it's so unintuitive it'll need to be changed even if the change is breaking (as in C#).
I feel that the original Python example is suffering from a little too much lambda.
It seems like we're iterating through something to build up a computation we may execute later. I can conceive of a situation where you might want to do that, or at least consider it, but in general I say just do the work now and build a list of results.
It looks like that because it's a minimal example. Real cases usually involve callbacks or expensive computations. How often it happens depends a lot on what you're doing.
Seems like the non-magic solution here would be block scope. I guess it would still be slightly magic in that each iteration gets its own scope, but at least that's easier to wrap your head around than just a special case.
This is mostly why ECMAScript introduced lexical (block) scoping with "let" back in ES6, because var used the function scope and developers would run into issues whenever they did this:
for (var i = 0; i < 10; i++) {
    someElement.addEventListener('click', function () {
        console.log(i);
    });
}
i would always be equal to 10 (its value once the loop has exited). With let instead of var, i is properly scoped per iteration and the script logs each increment correctly.
The solution before let was to introduce a closure in the for statement body
for (var i = 0; i < 10; i++) {
    (function (i) {
        someElement.addEventListener('click', function () {
            console.log(i);
        });
    })(i);
}
in order to capture the value of i. let can also easily isolate the scope of a variable so that it doesn't pollute the global scope:
{
    let foo = "bar";
    var baz = "qix";
}
// foo is undefined here, while baz is defined.
which removes the need for self-invoking functions.
I guess this is confusing, but I just think of it as capturing the reference to i, not the value of i. It would be nice if Python had a nice way to deal with this, as many people also trip over:
def fun(initial_empty_list=[]):
where initial_empty_list is a reference captured at function definition time, not a new value initialized on each call to the function.
Wow, Python and Go have it too? It was the biggest wart of pre-ES6 JavaScript, I never imagined other languages had it too (and it's honestly very disappointing for Go, since it's much more recent and we had plenty of hindsight when it was created…)
That's the point: the misleading code doesn't compile because the `i` variable's scope is limited to the current execution of the loop (and that's why you can't borrow it for `functions`'s lifetime), which is exactly how ES6 fixed this in JavaScript.
Ah I have "fond" memories about once spending a good week debugging this, because I was also spawning threads/sub-processes in the loop, making this a Heisenbug that caused the program to crash every hour or so...
The statement literally declares or sets a variable named i in that scope. When the loop exits, i still exists in the scope with the value 9. If you call a function that was given a reference to i, the value will be 9 as expected because the function was called after the loop exited.
An equivalent construct works differently (and one would say in a way that is less error-prone) in other languages.
For instance the behaviour of a Python loop varies drastically depending on the size of the iteration:
def loop(n):
    for i in range(n):
        pass
    print(i)

loop(10) # 9
loop(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in loop
UnboundLocalError: local variable 'i' referenced before assignment
That Python works this way is specific to Python. And a language which doesn't have the issues this implies would be "getting it right", in the sense of avoiding sharp corners and edge cases.
The statement can mean whatever the language designers decide it should mean, and one plausible meaning is to introduce a new scope for the body of the loop. It's not even something unique in Python, given that sequence comprehensions do just that; e.g. this prints 0,1,...,9:
for f in (lambda: print(i) for i in range(0, 10)): f()
Unfortunately, Python is simply inconsistent in this regard. For example, list comprehensions leak the variable for back-compat reasons, so if you substitute (lambda: ...) with [lambda: ...] above, you'll get a bunch of 9s.
But, backwards compatibility aside, the language could change to make for-loops behave like sequence comprehensions wrt scoping.
> Unfortunately, Python is simply inconsistent in this regard. For example, list comprehensions leak the variable for back-compat reasons
No, they don’t. They did in Python 2—list comps were introduced in 2.0, genexps in 2.4, and set/dict comps in 3.0 but also included in the later 2.7 release—but that’s been non-current for more than a decade, and completely out of support for two years. Let it go.
> But, backwards compatibility aside, the language could change to make for-loops behave like sequence comprehensions wrt scoping.
Sure in Python 4, but after 2->3, not sure many people are looking forward to that.
Python 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> [f() for f in [lambda: i for i in range(0, 10)]]
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
>>> [f() for f in (lambda: i for i in range(0, 10))]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
I don't know what this is supposed to prove, it just shows that the generator is lazy while the list isn't. It seems unrelated to scoping issues or leaking variables.
It shows that the list comprehension has a single loop variable across all loop iterations that gets reassigned on each iteration, while the sequence comprehension creates a new loop variable bound to the current item on every iteration.
I don't think that's what's happening. In your example with the generator expression, you're calling each lambda as you iterate through the generator, which due to the lazy evaluation of the generator means that the value of the single i variable shared across all the lambdas is still only the latest value reached.
If you instead fully evaluate the generator expression before calling any of the functions (for example, by passing it to the list constructor), you get the same behavior as the list comprehension case:
>>> [f() for f in list(lambda: i for i in range(0, 10))]
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
You're right, but that means that sequence comprehension also leaks the variable, so it's even worse than I thought.
Side note: I think that commenters above didn't quite understand what I meant by "leaking", because there's more than one scope boundary here. Roughly speaking, any comprehension or loop can be desugared into something that looks like a C-style for-loop:
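(That desugaring isn't spelled out above, so here is a rough, runnable reconstruction in Go, whose three-clause for is C-style; the names and values are only illustrative:)

package main

import "fmt"

func main() {
    // scope 1: outside the loop
    elems := []int{10, 20, 30}
    for i := 0; i < len(elems); i++ { // i lives in scope 2: owned by the loop, shared by all iterations
        x := elems[i] // x lives in scope 3: specific to a single iteration
        fmt.Println(x)
    }
    // back in scope 1; whether x behaves as if it lived in scope 2 is the language's choice
}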
Scope 1 is outside relative to the loop. Scope 2 is specific to the loop but shared by all its iterations. Scope 3 is specific to one loop iteration. The "leaking" I referred to above is from scope 3 to scope 2. I think other commenters took it to mean leaking from scope 2 to scope 1 - i.e. the ability to use the variable outside of the comprehension; that is, indeed, something that changed between Python 2 and 3.
If so, it's a bit misleading. The Go bug just requires a reference, rather than a lambda. I bucket lambdas into a category of language features I expect to have more sharp edges around capture semantics than references.
Bug:
var all []*Item
for _, item := range items {
    all = append(all, &item)
}
Fix:
var all []*Item
for _, item := range items {
    item := item
    all = append(all, &item)
}
Yeah, makes sense that it's popular. I just thought you were making a joke by creating a similar looking issue with a different underlying reason.
In the Go example the issue happens because the item variable is per-loop; in your Python example the issue is not related to loops at all, it's just because functions look up the values of global variables at execution time.
And the cherry on top is that the solution is also similar-looking (i=i), but works through a different mechanic underneath (default argument binding).
Anyway, this was my perspective that led me to interpret this as satire. A bit disappointed haha
If I do the same with for instead of foreach in C#, my IDE gives me a warning: "Captured variable is modified in the outer scope". Isn’t that the case for Python?
I'm an experienced Go programmer and still accidentally do this every so often. It certainly bit me and my team several times when we were newish to Go. And particularly for newbies, it's hard to debug and understand when you do run into it. All of which to say I'm really glad they're trying to fix this.
I've been annoyed more than once by Java's “local variables referenced from a lambda expression must be final or effectively final” error. I used to suspect that this was just an excuse to simplify the javac implementation, but I'm not so sure anymore. It is interesting to see the issue from the other side.
The issue is not restricted to variables declared in loop headers, so the proposed loop change for Go might only be the start.
> I used to suspect that this was just an excuse to simplify the javac implementation, but I'm not so sure anymore.
AFAICT it doesn't simplify javac much, if at all. It still needs to synthesize closure objects with fields to store the closed-over values. It's just that those fields can be final.
I think Java did this to avoid programmer confusion. I think it was the right choice.
It simplifies the implementation immensely, because the fields in those synthetic objects can be populated with copies of the variables from the stack frame. This:
void foo() {
    int a = 1;
    Runnable r = () -> System.out.println(a);
    r.run(); // prints 1
}
Can get turned into something like this:
class r_closure implements Runnable {
    final int a;
    r_closure(int a) { this.a = a; }
    @Override
    public void run() { System.out.println(a); }
}

void foo() {
    int a = 1;
    Runnable r = new r_closure(a);
    r.run(); // prints 1
}
The local a is a perfectly normal local, and the field a is a perfectly normal field.
What would have to happen if the variable was mutable? For example, if you wanted to write this:
void foo() {
    int a = 1;
    Runnable r = () -> System.out.println(a);
    a = 2;
    r.run(); // prints 2
}
You have to transform it to something like this:
class r_closure implements Runnable {
    int a;
    r_closure(int a) { this.a = a; }
    @Override
    public void run() { System.out.println(a); }
}

void foo() {
    int _a = 1;
    r_closure r = new r_closure(_a);
    r.a = 2;
    r.run();
}
Where there is no local, and where the method looks like it's accessing a local, it's actually reaching into the closure and mutating its field!
Now think about doing this if you've captured a variable in two closures, or a variable number of closures in a list. The wheels come off this approach.
Instead, you would have to promote the shared mutable variable to its own object, like this:
class int_box {
    int i;
    int_box(int i) { this.i = i; }
}

class q_closure implements Runnable {
    final int_box a;
    q_closure(int_box a) { this.a = a; }
    @Override
    public void run() { System.out.println("q = " + a.i); }
}

class r_closure implements Runnable {
    final int_box a;
    r_closure(int_box a) { this.a = a; }
    @Override
    public void run() { System.out.println("r = " + a.i); }
}

void foo() {
    int_box a = new int_box(1);
    Runnable q = new q_closure(a);
    Runnable r = new r_closure(a);
    a.i = 2;
    q.run(); // prints "q = 2"
    r.run(); // prints "r = 2"
}
Now you've taken a simple local variable which just needed to be copied, and turned it into its own thing on the heap!
It gets even worse. In the general case, you need to use a separate object for each captured mutable variable for space safety reasons (avoiding unwanted object retention not present in the source code). The easiest way to achieve space safety involves a separate object for each captured variable, but of course that is unnecessarily wasteful in many cases. But in order to coalesce captured locals into fewer reference objects, you need to do some sort of lifetime analysis to find cases where locals become unused at the same time.
> The issue is not restricted to variables declared in loop headers, so the proposed loop change for Go might only be the start.
Technically it’s not, but practically it’s by far the most common way for this to arise unexpectedly.
The other cases like closing over a variable and then modifying it before the closure is invoked are a lot less common to hit unexpectedly, and a lot harder to fix nicely (short of Java’s big hammer).
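For concreteness, a minimal Go sketch of that capture-then-modify case:

package main

import "fmt"

func main() {
    x := 1
    f := func() { fmt.Println(x) } // captures the variable x itself, not its current value
    x = 2                          // modified before the closure is invoked
    f()                            // prints 2
}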
The latter is also easier to re-inline accidentally because
go func(x int) { do_work(x) }(x)
seems like a very roundabout way to say
go do_work(x)
The "x := x" solution too suffers from this problem but slightly less: both idioms look like they are no-ops (while they are actually not) but at least "x := x" is weird enough to look like it was a deliberate choice, not some vestige from refactoring.
Huh. Now the closure-capturing in Go makes even less sense to me: instead of capturing the variable's current value it captures the variable itself, i.e. puts &x into the closure instead of x, and no other piece of Go does that. I was sure "go fun(args)" passed args by reference, but apparently not.
What's even the point of capturing the variable itself? To allow for writing inline callbacks that could sneakily mutate loop-local variables?
More generally, it allows the closure to write to its environment. That is the normal behaviour of closures in imperative languages, Java being the major exception because it rejects closing over non-final variables (and obviously that only blocks assignment; if the object is mutable you can do what you want to it).
What would be the use of closing over the value rather than the variable? That would prevent a lot of interesting uses of closed-over variables (like persisting values from call to call).
In fairness, while that would be a lot less convenient in reference-based languages, in Go you could just close over a pointer to the variable, making the relationship explicit.
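For example, a minimal sketch of that explicit-pointer style (illustrative names, nothing beyond the language itself):

package main

import "fmt"

func main() {
    counter := 0
    p := &counter // the closure talks to the outer scope through this explicit pointer
    inc := func() { *p += 1 }
    inc()
    inc()
    fmt.Println(counter) // 2: values persist from call to call, and the sharing is visible
}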
That’s how you’d do it using a [=] lambda in C++ or a move closure in Rust.
It's even worse in Python, where loops do not even create a new scope; they just assign to a function-scoped variable on each iteration. From reading this, Go creates a scope for the entire loop, but assigns rather than initializes on each iteration.
Then again, Python has the same syntax for assignment and initialization.
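For contrast, a trivial Go sketch of that distinction:

package main

import "fmt"

func main() {
    x := 1 // ":=" declares and initializes a new variable
    x = 2  // "=" assigns to the variable that already exists
    fmt.Println(x) // 2
}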
Thrilled to see Jared Parsons of the C# team pitch in and provide some perspective on how things were done for C#5 when a similar change was made. Kudos Jared!
What's interesting is that C# 5 release (which made the breaking change) was back in 2012, and both the change and the reasons for it were very widely discussed at the time. This is right around the time when Go shipped its 1.0, and it's kinda surprising that they either didn't look closely at "near-peer" languages, or if they did, couldn't see how this problem was fully applicable to their PL design, as well.
(Note that C# at least had the excuse of not having closures in the first version, which makes scoping of "foreach" moot - the problem only showed up in C# 2.0. But Go had lambdas from the get-go, so this interaction between loops and closures was always there.)
Same here, DevDiv is now polyglot focused, so you will see regular comments from .NET folks on other languages as well (mainly Java and Go). David Fowler tends to tweet every now and then about them as well.
> we gave the general rule that language redefinitions like what I just described are not permitted
Not just a "general rule": that document also specifically talks about precisely this issue (for loops) and resolves that Go will not fix it.
Change is the only constant, we should design systems with the expectation that they'll need to adapt over time or they will be replaced by something which can. With this mindset, Go should have solved the for loop problem years ago, just as C# did. This could have been a story about how once upon a time Go had these very silly for loop semantics, but that hasn't been true for many years.
This necessity of change is why I think the decision not to take Epochs for C++ 20 was much more consequential than things like rejecting the "Goals and priorities" paper which had immediate effects (in that particular case spurring the Carbon experiment).
I mean, Go had vendoring and vendoring managers not unlike npm and its node_modules.
But a language more comfortable with change implies that change is more likely - not less likely. Given pretty much every modern language has some kind of dependency management utility - I'd be surprised if Go didn't end up with one
Are you of the opinion that the version in go.mod existed many years ago, or that requiring a recent number in it won’t comparatively limit the impact? Both seem obviously false to me.
I think they misinterpreted the word “existed” in this case. They took it as go mod would have never been made instead of go mod didn’t exist at that time.
But that's exactly what that sentence says, isn't it? Otherwise it'd have been "Arguably if Go had done this years ago when go.mod did not exist, this would have had an even bigger impact" or something like that?
> But that's exactly what that sentence says, isn't it?
No?
> Otherwise it'd been "Arguably if Go had done this years ago when go.mod did not exist, this would have had an even bigger impact" or something like this?
That’s a worse take on the same sentiment? I find your version a lot clunkier. The original sets a hypothetical stage and draws its conclusion from it; I think it flows better.
> The original sets a hypothetical stage and from that its conclusion, I think it flows better.
The stage is "years ago", and conclusion is "go.mod would not have existed and this would have an even bigger impact". My version re-arranges the sentence so that "go.mod not existing" bit is a part of the premise, not of the conclusion.
Tense agreement in English subjunctive is hard. Especially for non-natives such as me: I do parse the original statement like that and just can't bring myself to understand it otherwise.
English native, and I parsed it the same way. The OG comment to me reads as if "go.mod never would have existed", not "go.mod didn't exist at this point in time."
Exactly, the comma is what does it, it separates the sentence into three fragments and then the 'and' joins segments two and three.
If the comma had instead been the word "when", as suggested, this would parse the other way. It still would have been a bit awkward but would make sense.
Ironically I have stopped using the word "when" in constructions like this because it confuses non-native English (esp. German) speakers who read it in the conditional rather than temporal sense.
There's an ambiguity in the phrase "then go.mod wouldn't have existed." One way to read it is "then, as a consequence, go.mod wouldn't have existed" and the other is "then go.mod wouldn't have existed at that time." I believe the intention was the latter, whereas you're inferring the former.
go.mod provides a way to introduce the new behaviour incrementally. The key point is the dependencies can (and already do) declare different go versions in their go.mod file. When everything is compiled, each module is compiled with the behaviour that applies to its own declared go version. So even if you update your go.mod go version to take advantage of this (soon to be) new behaviour, you can continue to use existing deps happily without worrying they will break.
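For illustration, a hypothetical go.mod (the module path is made up, and the exact go directive version that gates the new semantics depends on the release that ships it):

module example.com/mymodule

go 1.22 // modules whose go directive names an older release would keep the old loop semantics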
If you're at all interested in how for loops and scope work in Javascript, Jake Archibald and Surma have a great video on the topic: https://www.youtube.com/watch?v=Nzokr6Boeaw
Similar changes were made in newer versions of ES so that the for loop in this article works out of the box, like C#.
Slightly off-topic to this article: I wish "do while" loops had the "while" condition in the inner scope, not the outer scope. So many times I have wished that I could access the inner scope... I end up using a while(true) with an if { break; } at the end instead in 99% of cases where a do while could've been the perfect thing...
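In Go terms (Go has no do-while at all, so this is the idiom anyway), that while(true)-plus-break pattern looks like this sketch; note the exit test sits inside the scope where the body's variables live:

package main

import "fmt"

func main() {
    n := 1664
    for {
        digit := n % 10 // visible to the exit test below, unlike with a do-while's condition
        fmt.Println(digit)
        n /= 10
        if n == 0 { // the "while" test; it could also reference digit if needed
            break
        }
    }
}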
Except the ECMAScript changes weren’t changes but additions. Anything using the var keyword gets the traditional semantics, anything using let and const gets lexical scoping.
As an example:
x = [];
for (var i of [1, 2, 3]) {
    x.push(() => i);
}
x.map(f => f()) // [3, 3, 3]

x = [];
for (let i of [1, 2, 3]) {
    x.push(() => i);
}
x.map(f => f()) // [1, 2, 3]
What’s being proposed for Go is instead a breaking change.
>What’s being proposed for Go is instead a breaking change.
It sounds like you only get the breaking change in modules that require above a certain version of Go. So this should not break old code. It's perhaps more analogous to the way that "use strict" in JavaScript 'breaks' parsing of octal constants.
> What’s being proposed for Go is instead a breaking change.
It’s a breaking change in the same sense that the `"use strict"` semantics of JS were: it’s not actually a breaking change, because you have to opt in.
ECMAScript had the same problem and fixed it without breaking backwards compatibility. In JavaScript,
for (var i=0; i<3; i++) setTimeout(() => console.log(i))
would print 3, 3, 3.
So in ECMAScript 6, block-scoped variables were introduced, but the semantics of old-style "var" declarations were not changed. You can now write
for (let i=0; i<3; i++) setTimeout(() => console.log(i))
which prints 0, 1, 2 as expected.
And you don't have to go open a `go.mod` file to know what the code you are reading does.
Right call for JS, but I think not for C# or Go, since as time goes on the old behavior will be more and more of a relic, but you'll be paying the cost of having two syntaxes forever. (In JS, though, you have no alternative, because you don't have anything like go.mod (I think?) and you really can't break backwards compatibility.)
I'm just happy that it's one of the few weird quirks of Go. In my spheres, JavaScript is the most commonly used language, and the 'wat' presentation is frequently referenced.
I fail to see any downside to changing this semantics. This has honestly always felt like a language design bug more than anything. Having to write foo := foo at the beginning of a loop for it to behave as expected is a strong design smell.
The gradual breakage (or fix, depending on your point of view) with explicit opt-in looks great to me.
> Loop variables being per-loop instead of per-iteration is the only design decision I know of in Go that makes programs incorrect more often than it makes them correct. Since it is the only such design decision, I do not see any plausible candidates for additional exceptions.
(For me, the link you posted does a 302 redirect to https://www.uber.com/pt-BR/blog/data-race-patterns-in-go/ which gives me a 404 error page. It's a bit insane that whether the link you posted works or not depends on your locale, and unfortunately this is not the first time I've seen this kind of baffling redirect misbehavior.)
var all []*Item
for _, item := range items {
    all = append(all, &item)
}
When &item is the same for all iterations, that means that it's pointing to the same memory address. Is each item in items copied to this address prior to each iteration body invocation? This seems strange as this copy could potentially be very expensive. What am I missing?
To add to the sibling, you could iterate by index to avoid the copy:
for i := range items {
    ptr := &items[i]
    ...
}
In this case of course you also get different semantics. The ptr variable is bound to the address of each of the original Item values in the slice, whereas in the code in your comment, &item is the address of a single heap-allocated Item variable.
Most of the time you're iterating a slice of pointers, though, so only the address gets copied. And in those cases, this bug doesn't exist (unless of course you're going from * to ** for some reason).
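For example (Item here is a made-up stand-in type):

package main

import "fmt"

type Item struct{ ID int }

func main() {
    items := []*Item{{ID: 1}, {ID: 2}, {ID: 3}} // a slice of pointers
    var all []*Item
    for _, item := range items {
        all = append(all, item) // copies the pointer value itself; no &item, no aliasing bug
    }
    for _, p := range all {
        fmt.Println(p.ID) // 1, 2, 3
    }
}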
You aren’t missing anything. How expensive the copy is depends on the type of item. This is why you should probably use pointers in slices you’re going to loop over instead of the object itself.
Technically there is no crazy behaviour, it’s a natural consequence of scoping loop variables outside the loop, which historically was common.
It became an issue as lambdas and other lambda-type constructs (which implicitly keep a reference on the loop variable) became more common, and a bunch of languages got caught in it. Later languages switched to the “inner scoping” mechanism to avoid it.
This is not really what's happening. The variable is scoped to the loop. The issue is that even if you take the address of the variable, each iteration of the loop will have the same variable with the same address.
nums := []int{1, 2, 3}
for _, num := range nums {
    fmt.Printf("%p\n", &num)
}
This will print the same address three times. If you add "num := num" as the first line in the loop, it will print three different addresses. The proposal is to make this the default behaviour.
Arguably not, because an 'iteration' is a unit of execution, not a lexical unit. If the variables were really scoped to loop iterations, that would be a form of dynamic scope, which would have a different semantics. So for example, say the loop calls a function foo. This function executes inside every iteration of the loop, but within foo, one cannot access the loop iteration variable (as it is in a different lexical context).
My point is precisely that lexical terms (like loop body) are insufficient to explain the Go behaviour. The loop variable is clearly lexically scoped to the loop body. "Scoping loop variables outside the loop" is commonly understood as the Python gotcha:
for x in range(3):
    foo(x)
print(x) # prints 2
or the pre-C99 style:
int i;
for (i = 0; i < 3; ++i)
    foo(i);
printf("%d", i); // prints 3
This is not the case in Go. I don't think talking about variable scopes accurately describes the issue (because there's nothing special about loop scopes here: the same "escape" can happen from any scope), and changing "loop" to "loop body" doesn't improve this. The term "loop iteration" at least identifies the dependency between different iterations of the loop as the issue.
I don't understand what the thought experiment about non-lexical scopes has to do with this.
It's purely a question of scopes, as indicated by the equivalent code in the article:
for _, elem := range elems {
    elem := elem
    ... &elem ...
}
Nothing beyond regular lexical scoping and Go's ordinary assignment semantics are necessary to see how this works. The second 'elem' has a narrower scope than the first (it is limited to the loop body). Abusing Go syntax, you can think of the current semantics as follows:
{
    var elem Elem
    for _, elem = range elems {
        ... &elem ...
    }
}
Here 'elem' scopes outside the loop body, and so is reassigned on every iteration of the loop (and &elem evaluates to the same address on every iteration).
>the thought experiment about non-lexical scopes
It's not just a thought experiment. There are languages with dynamically scoped variables (e.g. global vars in Common Lisp).
I understood the JS version of this problem, but to see it happen with an address-of operator is just weird. In C if I took the address of a function scoped / temporary variable and kept it around it would be very bad.
I guess taking the address "promoted" the shared variable to enable it to survive past the function?
Said differently: Go puts variables on the stack only when the compiler can prove no references escape the stack frame. Stack is purely an optimization.
The subtle difference between the two ways of communicating the design: There are cases where the reference does not escape, but the compiler doesn't know how to prove that. So saying "if it escapes it's boxed" is subtly wrong.
The default is heap, and only when the compiler can be sure it's safe, things go on the stack.
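A quick way to see that decision is the compiler's escape-analysis output; a minimal sketch (the exact wording of the `go build -gcflags=-m` diagnostics varies by compiler version):

package main

func escapes() *int {
    x := 42 // the compiler reports something like "moved to heap: x" for this variable
    return &x
}

func main() {
    _ = escapes()
}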
It’s just what happens if you interpret the for loop as modifying a single variable binding on every loop rather than rebinding on every loop. That’s how loops have normally been conceived in imperative languages, but it doesn’t make much of a difference until you add closures (which can capture the binding) or garbage collection (allowing you to capture the address of the ‘local’ variable). The same issue existed with ‘var’ for loops in JS and bit me several times: https://stackoverflow.com/questions/750486/javascript-closur...
If you hide pointers and destructively update variables, this happens easily. I'd say the Go problem is a bit worse because there are also exposed pointers via that '&' operator.
But the problem should have been known when the language was designed. In CommonLisp, i.e., quite a mighty but old language, it is the same: the capture is on a loop variable that is destructively updated:
(setf refs (loop for i in '(1 2 3) collect #'(lambda () i)))
-> (#<function> #<function> #<function>)
(mapcar #'(lambda (f) (funcall f)) refs)
-> (3 3 3)
The same done with a fresh function parameter works. And since this is the usual style in Lisp, I suppose the problem is not as obvious there as it is in Go.
I remember learning about this gotcha around the time that Go was coming out. I was very disappointed that they were sticking to this behaviour around the same time that JavaScript was fixing it. As a new language they get less sympathy from me.
It does feel like Go is relearning lessons of other languages quite often. It's good that this and generics could be fixed later on, but it's unfortunate that they couldn't fix the null pointers (and, going further, the way everything defaults to a zero value) and error handling.
Several languages have or have had this behaviour. C# made a similar change to its 'foreach' loops a long time ago (but not 'for' loops for some reason).
I suppose it's the natural way to do it when implementing a language and not thinking about it too much. It makes a for-loop equivalent to a simple while loop with the loop variable initialized outside of the loop.
The C# change is described in detail in the linked GitHub issue - one of C# devs left a comment explaining it.
TL;DR is that both "for" and "foreach" scoping fixes would be breaking changes, but "foreach" was easier to justify because it was already a C#-specific construct syntactically, unlike "for" which uses the same exact syntax as C, Java etc, and they were very sensitive to backwards compatibility at the time (esp. since the tooling didn't have the ability to target various language versions within the same project easily). At the same time, "foreach" represented the vast majority of breakage when they looked at existing code, perhaps because the scoping in classic "for" is more obvious due to the fact that variable mutation is explicit there.
I think it's quite normal, because the common three-part for loop is just sugar for:
initVariable()
while checkVariable():
    doSomething()
    updateVariable()
So the variable is shared by all iterations in the first place. And that wasn't an issue until we had lambdas or similar constructs that can capture a reference to the loop variable.
Later languages found this problematic when used with reference-capturing features and changed to something else (one variable per iteration).
It is a combination of scoping, which other people already wrote about, and also because the language support mutable data by default. Had the language been immutable/persistent by default, this particular semantic problem cannot occur in the first place.
Complexity increases rapidly when you combine constructs in a programming language. You get some feature interactions which are hard to get right, and also to predict.
I gave this post all the upvotes and hearts I've got. It's one of those weird Go language quirks where developer expectations just never match what's going on behind the scenes. I still often find myself looking at loops to check whether they actually modify/use the correct variables.
Shockingly high-quality discussion too—have the comment moderation tools on GitHub gotten better in recent years, or is this just the golang community being awesome?
Strange, I never had a problem with this. If I need a reference to a non-pointer I just use the key (append(foo, items[k])), but I find it weird that the print example doesn't work, since in a loop the value is passed by copy, just like everywhere else in Go, so it should work correctly. As for pointers, again, pass by copy. I see no problem here; either it never bit me or I figured it out early in my Go journey.
I find this same semantic in JavaScript weird albeit practical. Many descriptions of either for-loops or let-variables don't mention that their combination is a special case: each iteration gets its own instance of the variable, but somehow the result of i++ gets copied to the following iterations.
Here you can see that the i from the previous iteration gets copied first and i++ applies to the next iteration:
for (let i = 0; i < 10; i++) {
    setTimeout(() => {
        console.log(i);
    }, 1000);
}
// prints 0...9 (as expected?)
I always thought of the i++ happening at the end of the previous iteration, but that's wrong as it produces a different result if written explicitly:
for (let i = 0; i < 10; ) {
    setTimeout(() => {
        console.log(i);
    }, 1000);
    i++;
}
// prints 1...10 (as expected?)
(Also, these don't change if I remove the curly braces, so the let is not scoped within curly braces as I thought...)
My knowledge of javascript semantics is pretty limited, but you can see what's happening more clearly:
for (let i = 0; i < 10; ) {
    setTimeout(() => {
        console.log(i, foo);
    }, 1000);
    let foo = "baz-" + i;
    i++;
}
// "1 baz-0", "2 baz-1", etc.
It's behaving as each loop iteration is creating its own closure, and the inner function is referring to those variables by reference. So any changes you make to them inside the body of the loop will end up visible to the inner function.
My point is that for the i++ to make the loop advance (whether it is written inside the block or outside it in the header), behind the scenes, the value of the old i is copied to the new i at the beginning of each iteration.
I think to then understand why my two examples produce different results, you have to know that the i++ in the header happens to be executed after that copying has been made (an arbitrary choice?), while the i++ within the body will be (naturally) executed before the copying.
I suppose the way it works can be intuitive, but it can also be confusing if you think of the i++ as the last action of each iteration in both of my example cases.
Actually, there's a third way to write the example loop, but who can guess which result it gives?
for (let i = 0; i++ < 10; ) {
    setTimeout(() => {
        console.log(i);
    }, 1000);
}
... it will print 1...10! So even within the loop header, one part is run before the copying and another part after the copying. How is this intuitive and where is this documented apart from the language spec?
EDIT: My bad, of course the iteration condition has to be checked at the beginning of the iteration, so in this third example i == 1 during all of the first iteration.
There we can finally see that on the first iteration, an environment is created (and iteration variables copied) before the test ("step 2"). After that, a new environment is created (and iteration variables copied) towards the end of each iteration, before the increment step ("step 3.e" and "step 3.f").
Another commenter links to a great video that explains the same thing about the spec and how confusing the environment creations can be: https://news.ycombinator.com/item?id=33160373
WTF. I haven't been interested in Go after looking at it briefly a decade ago. This post brings back the weird feeling about it that I had then. It looks like taking the address of a local variable (item := item; append(..., &item);) and using that outside the scope of the local variable. But apparently that is ok.
Variations of such perceived weirdness exist in many other languages with complicated "object models" as well to be fair. Delphi has some strange adressing stuff going on as well. Python has this weird "default list" thing. Most object languages don't let you take the address of something (like the Go example shows) at all, but have only object references which I find unergonomic.
Yes, taking the address of a local variable is exactly what it's doing. But Go's compiler (and garbage collector) ensures it's safe to do that. The compiler will allocate on the stack where possible, and the heap where necessary, invisible to you. This is all very normal in Go.
I'm not sure prefixing your comment with "WTF" and your 10-years-ago dismissal helps the discussion here. Yes, as we've learned, this was probably the wrong decision, but it's not hard to see why it was done that way originally (C# made the same decision), and now they're having a reasonable technical discussion to try to solve it. And -- even though I've been bitten by this several times myself -- it's not a terribly common occurrence.
I can see why it's a surprise. Most languages that I know of fall into one of two types: (1) garbage collected and assigning a variable or passing to a function actually passes a reference to the object (certainly true of Python, Java, C#); (2) memory is manually managed and you can take the address of an object (C, C++, Rust).
That history makes it feel like "taking the address" is a really trivial operation - returning a numerical value that the compiler had access to at that point anyway. Here it's adding a reference to the object in some sense, and maybe even changing how it's allocated earlier in its lifetime (on the heap rather than the stack). I don't use Go and I agree that using &x for that operation feels a bit wrong as an outsider.
FWIW the issue occurs in languages of category (1), though only in a subset of cases: generally closures, as they implicitly take references to their lexical context.
It also occurs in languages of category (2), specifically C++ lambdas, where I think it can cause UAF/UB. I assume it also happens in C with the blocks extension (is that still Apple-specific?) though I don't know the details of that thing, so maybe not.
The discussion here is specifically about using & to extend the lifetime. In the first case you mention, you don't use the & operator. In the second case, you do (at least with C++ lambdas), but there's no lifetime extension going on.
There's no special 'lifetime extension' operation involved. Semantically, everything is heap allocated† in Go. The garbage collector takes care of deallocating it when nothing references it anymore. In other words, at the level of the language semantics, all tracking of lifetimes is done dynamically at runtime by the garbage collector.
Go does in fact stack allocate variables which it can prove not to outlive their lexical scopes, but this is merely an optimization. Unless you are trying to write optimal code, there is never any reason to think about which values are stack allocated in Go.
There's not really any such thing as a 'local variable' in Go. A variable has whatever scope it has, but there's nothing special, semantically speaking, about variables defined inside functions or inside loops.
If the use of & in the example code is puzzling, it's probably because you're expecting Go to have some C-like concept of an automatic (i.e. stack allocated) variable – but it just doesn't.
>That history makes it feel like "taking the address" is a really trivial operation - returning a numerical value that the compiler had access to at that point anyway.
It is in fact a trivial operation in Go too, as I hope the above has clarified.
---
† Strictly speaking 'semantically heap allocated' is nonsense, but hopefully you know what I mean. There is no way to declare a variable in Go in such a way as to force it to be deallocated at the end of a particular lexical scope. A variable's lexical scope and its lifetime are entirely divorced (as is typical in a GCed language).
> There's no special 'lifetime extension' operation involved. ... The garbage collector takes care of deallocating it when nothing references it anymore.
I never used the word "special". As you say, adding a reference will mean the garbage collector won't deallocate it (until that reference is removed). In other words... its lifetime is extended. That's exactly what I meant.
> If the use of & in the example code is puzzling, it's probably because you're expecting Go to have some C-like concept of an automatic (i.e. stack allocated) variable ...
Not at all. In C++, you can use & on a reference variable and it will return the address of the object being referred to, regardless of whether it is allocated on the stack or the heap (or even statically allocated). Even in C, you can do &*x on a pointer to any object (which is silly by itself, but useful when combined with pointer arithmetic e.g. &x[3] translates to &*(x+3)).
> It [the & operator in Go] is in fact a trivial operation in Go too, as I hope the above has clarified.
Maybe I should have avoided the word "trivial" as its meaning is subjective, but I was careful to define what I meant by it: "returning a numerical value that the compiler had access to at that point anyway". Your comment just confirms that, as I said, it does more than that – it also adds a reference to the object.
---
To be clear, I'm not saying that it's bad or wrong that Go uses the & operator to mean this. Once you're familiar with the language, you probably get used to it very quickly. My point was just that it's a surprise initially if you're not familiar with the language, that's all.
>Your comment just confirms that, as I said, it does more than that – it also adds a reference to the object.
It simply evaluates to the address of the object, just as it does in C. if you think the & operator is doing something in addition to this, I think that must just be based on a misunderstanding.
I am not quite sure what you mean by 'adding a reference' to the object.
Let's take this function:
func foo() *int {
    var x int
    return &x
}
All that happens is the following:
- An integer is allocated (and initialized to zero).
- The address of this integer is returned.
If we dig into the implementation, we'll see that the integer is allocated on the heap. As far as Go's language semantics are concerned, everything is allocated on the heap and left to the GC to clean up.
As an implementation detail, values that provably don't outlive their containing functions are (sometimes) stack allocated. As x outlives its containing function, it won't be stack allocated. That's it. There is no special operation of 'adding a reference' or 'extending a lifetime'. Nor does the compiler even analyze lifetimes except for the purposes of applying an optional optimisation which has no effect on the semantics of the program. If you turned this optimisation off (which you totally could) then there'd be no need for the compiler to worry about x's lifetime at all.
> - An integer is allocated (and initialized to zero).
> - The address of this integer is returned.
That is not all that happens, at least down at the C/assembler level.
Let me illustrate what I mean. Consider this function, which also does both of these things (cobbled together from Google searches so please excuse incorrect syntax):
func foo() uintptr {
    var x int
    return uintptr(unsafe.Pointer(&x))
}
All that function does is allocate an integer (and initialise to zero) and return the address of that integer. Exactly the same as your function, right? Except it's obviously not - it doesn't extend the lifetime of the integer variable.
So why not? The GC somehow knows to ignore the number returned from my function, even though, under the hood, it's still stored in a register or stack location or whatever in exactly the same way as the address returned from your function. So how does the GC know to ignore it? Is that number somehow marked in a way that says "GC, when you're scanning memory looking for address-like numbers, don't pay attention to this one"? No. It doesn't look at the number in the first place because it hasn't been told to look at it.
In contrast, in your example, the memory address is not just returned from the function (in the C sense that it's put in a register for the caller to receive). It, additionally, somehow registers that memory address with the GC to let it know that there's another reference to that variable location. That is the extra thing that your function does that mine doesn't. And that magic happens (or at least starts) at the moment you use the & operator.
Yes, Go has a precise (i.e. non-conservative) garbage collector. It seems odd to me to think about that as some kind of special feature of the & operator. Even if one does, it's certainly not a surprising feature. Knowing that Go has a precise GC, one certainly expects the GC to know that the value of &x references x. If it didn't that would be a major bug.
The Go GC isn't a reference counting implementation. It traces the values of variables on the stack and it knows their types (because it knows which function any given stack frame corresponds to and it knows which variables that function allocates). Thus it knows that if a variable is of type *int and has a non-nil value then its value references an int. (And so on for fields of structs that are stored in stack variables, etc.) The & operator does not need to do anything special. The & operator merely takes the address of the object. When that address is stored in a pointer variable (or array member, or struct field...), that's when it becomes visible to the GC as a reference.
I apologize for being snarky. "WTF" is an expression of surprise though, much more than it is criticism. My point was that these object models achieve a desired level of user-friendliness and safety at the cost of being less orthogonal and less composable (as compared to, say, C) and having weird corner cases and surprises that catch you off-guard.
It can be argued, based on the number of people who at some point write code in C that takes the address of a stack variable and returns it back out of the function scope, that the "less orthogonal" corner case that catches you off guard is the way C forbids that action. Do not mistake internalized concepts from a particular language as some sort of divinely approved dictate of how programming must work.
This was basically Dijkstra's point in his BASIC considered harmful post... I think in 2022 it should be C considered harmful for the same reason. C is not the base truth of computation. It isn't even very good. A language smart enough to analyze taking pointers and notice it can't put something on a stack and simply take care of it is, in my opinion, the one that is not catching you off guard... specifically, the "guard" that one must take in C around what is stack versus heap.
C# made the same decision because it did not have closures initially, and so there was no practical way to observe the difference.
But Go did have closures initially, and worse yet, they already had C# as an example of how closures and loops interact. So they definitely had the opportunity to learn from that mistake, and I don't think it's unreasonable to ask why they did not.
I think the bar to a "WTF" reaction here is lower because this is the sort of thing we've come to expect from Go. I'm in the same boat as jstimpfle where I looked at Go some years ago, found it to be remarkably quirky for a relatively traditionalist language that isn't trying to do lots of new ideas, and haven't yet regretted staying away.
It's not just this bizarre gotcha (the fact that C# had it too doesn't make it OK). It's that Go has so many of these cases where they took very strong positions on things and then later reversed their position only after many, many years:
- Generics
- Only one gc knob
- No backwards incompatible language changes
Also, Go has been around quite a long time now and we've all read quite a few rants about its surprising cases. How come this one never came up before? The thread provides evidence that it bites people regularly. It suggests to outsiders that you can't easily evaluate Go by reading about it, because there will be sharp edges that people aren't talking about simply due to the quantity of things that are even worse.
You have remarkably strong opinions for someone who has not used the language much.
Personally, after using it for 10 years I've been bitten very rarely by weird corners of the language and have enjoyed using it. My complaints are more around things I'd rather see removed (struct tags, panic, nils) and inconsistencies (the built-in generics were quite limited; I quite like the design for generics they came up with, though, so I guess that's resolved once they update the stdlib).
Overall it's still my favourite language compared to others I'm forced to work in, I particularly like the decision to eschew inheritance.
I agree with this take wholeheartedly. Go is a pragmatic language. Some of the design decisions make a lot more sense when you use it, and because Go seems to have a culture of utility and self-reflection, I think you see more openness and constructive criticism than in some other languages I've used.
I mean Javascript does / did the same thing (var loop variables get hoisted to the function scope, so they are available outside of the loop); add to that that `range` creates a pointer and you have a perfect storm of weird, confusing things.
But this is an important point: They were aware of it; it was by design; they had the chance to change it before 1.0 and didn't, and now they are showing willingness to change. So many languages are resistant to change on the one hand (I mean Go is), yet not resistant to keep adding features (e.g. Java / JS).
> Javascript does / did the same thing (var loop variables get hoisted to the function scope, so they are available outside of the loop); add to that that `range` creates a pointer and you have a perfect storm of weird, confusing things.
let is preferred nowadays in JS and doesn’t have the weird hoisting behaviour that var does/did. JS has neither “range” nor pointers though so I’m not sure what you mean by that.
In Rust we know we only gave the append function an immutable reference which lives until the next iteration. If append is OK with that, we're golden. Maybe internally it clones these references -shrug.
If append() needs the thing, not just an immutable reference which expires soon, its signature would demand we move one into it, and we don't have one so we'd need to e.g. make one with Clone.
TBF Rust intentionally used, from the start, the semantics Go is migrating to, because:
- experience with the issue in languages with wider scoping, e.g. it's a common issue in JS, as well as Python (though slightly less so)
- Rust’s iterators were originally internal so that was pretty natural
The first point is also why `for (let ...)` and `for (const ...)` have different scoping than `for (var ...)` in JS: `var` has function scoping, `let` and `const` were introduced with block scoping, and in for loops they were specifically specced with "inner" (per-iteration) scope.
If you take a Rust for loop, de-sugar it to produce the actual loop { } which would run, and then modify that loop to have the Go semantics (a single long-lived variable which is re-assigned each iteration), Rust detects the faulty Go cases, of course.
[Edited: I tried to explain what's going on here, but I don't think my explanation was helpful so I've just left the surface]
Yes Rust’s ownership rules make it rather complicated to reproduce the faulty behaviour, as it’s about sharing mutable state which Rust intensely dislikes. You’d need to wilfully share (and update) internally mutable structures (cells, atomics) which is pretty noticeable and not something you do by mistake.
Aside from Python and the Shell, it never occurred to me that any language could possibly think of other semantics. It's news to me that C++ also assigns rather than initializes on each iteration.
It's simply a very bad idea that provides no use yet creates many bugs.
The thing is, it only creates many bugs when you throw closures into the mix. So historically languages did it because it was easy to implement (create a counter, increment the counter, run the loop body).
Before the early aughts, closures were mostly found in functional languages, which tend towards immutable bindings (and immutability in general), and very closure-focused languages closured everything, so they didn't hit the issue (e.g. you wouldn't hit it in Smalltalk because your counter would be a parameter to a block, so closing over it was no issue).
It's really in the 00s, with the explosion of callback-piled JavaScript (and more generally the functionalisation of imperative languages), that the problem became a serious concern: you loop over a thing, you start some sort of async operation (a network request, for instance), and you find out that despite the request being correct, the entire thing goes wonky (then again, things commonly went wonky, which didn't help).
> The thing is, it only creates many bugs when you throw closures into the mix. So historically languages did it because it was easy to implement (create a counter, increment the counter, run the loop body).
It's a problem with references in general, as this case shows.
I also don't feel it's easier to implement at all.
One can either rewrite:
for $id:var in $exp:iter { $code:body }
to:
{
  let $id:var;
  while(True) {
    let result = $exp:iter.next();
    if(result.is_none()) break;
    $id:var <- result.extract();
    $code:body
  }
}
Or
while(True) {
  let result = $exp:iter.next();
  if(result.is_none()) break;
  let $id:var = result.extract();
  $code:body
}
The latter implementation is, as far as I can see, easier, not more complex. Obviously all the code to create scoping already exists in the compiler, and for-loops over an iterator work via a syntactic rewrite to an infinite loop with a break.
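For what it's worth, here's a hedged Go rendering of the two rewrites; the only difference is where the variable lives, and closures can observe it:

package main

import "fmt"

func main() {
    // Rewrite 1: one variable for the whole loop (the old Go semantics, made explicit).
    var fns1 []func()
    {
        var i int
        for i = 0; i < 3; i++ {
            fns1 = append(fns1, func() { fmt.Println(i) })
        }
    }
    for _, f := range fns1 {
        f() // 3 3 3: every closure shares the single i
    }

    // Rewrite 2: a fresh variable per iteration (the new semantics, simulated).
    var fns2 []func()
    for i := 0; i < 3; i++ {
        i := i // fresh per-iteration variable
        fns2 = append(fns2, func() { fmt.Println(i) })
    }
    for _, f := range fns2 {
        f() // 0 1 2
    }
}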
> Most languages don't have references, and in those that do before the issue was understood, the explicitness made it a much smaller issue.
But Go and C++ do, where this issue arose with or without closures.
> Now try lowering to bytecode or assembly instead of high-level pseudocode.
It doesn't matter, because as I said, all that is already in the compiler.
It would be needlessly complex and error-prone for compilers to hardcode custom code generation for such abstractions; it's transformed to something else the compiler already understands at a far higher level. I know for a fact that in Rust, for-loops already desugar to a simple infinite loop construct with a break at the H.I.R. level and all further optimizations only happen from there.
Which makes it all the more confusing why it was originally the way it was, because it's really not harder to implement: any compiler implements it as desugaring before optimizations even occur, and this form is simpler.
The only explanation I see is that they really gave it no thought at all, and it wasn't a tradeoff but simply not thinking clearly.
I have said this before, but a source->source compiler is a very handy thing to have. A loop construct in chez or guile scheme would probably translate into something tail recursive. You could then macro-expand and optimize the loop to the code that is then compiled into lower level.
I have been in situations where I have had to expand a macro to figure out what is going on. Not having that option in a situation like this (where a for loop is obviously just a goto or a tail call) is usually a pain in the ass. If it were translated into the same language, it would also be easier to define what it should translate to.
I'd rather use, from the beginning, a language that isn't designed to onboard tons of new grads as fast as possible onto a big codebase. That is what Google needs, but most people are better off with more advanced languages. But people follow Google, just like they did with Angular. I predict Go will follow the same destiny; it will just take longer. The tech debt is still piling up.
Yeah… I seem to remember scratching my head because of a variation of this problem. It was kind of driving me nuts. Yet the number of moments like this has been low for me in Go compared to C++.
This seems reasonable. As others state, this is hardly an issue specific to Go.
In CoffeeScript (which I think solves this nicely), it looks something like
for i in [0...3] # 3 3 3
  setTimeout (() -> console.log "#{i}"), 100

for i in [0...3] # 0 1 2
  setTimeout (do (i) -> () -> console.log "#{i}"), 100

for i in [0...3] # 0 1 2
  do (i) ->
    setTimeout (() -> console.log "#{i}"), 100
I used to get compiler warnings for this particular problem (getting a warning tag in Vim, not when compiling). Sadly the warning stopped working after some update and I have been bitten by this several times since.
I'm glad I read this today because I'm starting to do some work in Go. Anyone have any other good sources for "gotchas" in Go (Go-tchas?) that I can read to familiarize myself with how to think better in Go?
The reason for this change is to turn wrong programs into correct programs. The examples will introduce more GC pressure, because the `append(..)` will now append distinct values, as the programmer expected. Thus, correcting the bug adds to GC pressure.
In the cases where the new value isn't appended, the compiler can easily optimize it away, and reuse the same memory location in a register or on the stack. My guess is that the optimizer portion of the compiler is at a place nowadays where this will happen, with no further change needed. When the compiler was new, it might have been a regression in efficiency.
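A hedged sketch of that guess: the extra allocation only matters when the per-iteration variable's address actually escapes.

package main

import "fmt"

func main() {
    sum := 0
    for i := 0; i < 10; i++ {
        sum += i // i's address never escapes: the compiler can reuse one slot,
        // so per-iteration semantics cost nothing here
    }
    fmt.Println(sum) // 45

    var ptrs []*int
    for i := 0; i < 10; i++ {
        ptrs = append(ptrs, &i) // &i escapes: with per-iteration semantics,
        // each iteration's i gets its own heap allocation
    }
    fmt.Println(len(ptrs)) // 10
}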
If I understand correctly the fix requires the new code to add a line in go.mod to use the new behavior. This is about the same as adding x:=x in the loop, and more hidden. Not good.
An alternative would be: whenever the address of the iteration variable is used inside the loop, the variable is per-iteration; otherwise it is per-loop. This way old code isn't broken and new code gets the new semantics.
> This is about the same as adding x:=x in the loop
It's one per module, though. In large enough modules you'll have tens of x:=x. I assume this also opens the option of doing more such changes through the same system in the future.
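For illustration, the opt-in is just the module's `go` directive (the module path is hypothetical and the version number illustrative):

module example.com/mymodule

go 1.22 // modules declaring a new-enough version get per-iteration loop variables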
The “static analysis” section says that it is impossible to catch all cases where the address is used, which is true. However, if the analysis only checks whether the address is TAKEN, then it is trivial. I would like to propose that as an alternative: whenever the address is taken, the variable is per-iteration; otherwise, per-loop.
And assuming new projects get the new go.mod by default, it also makes the language default-safe, as opposed to requiring the cognitive overhead of evaluating individual loops, or requiring this nonsense as an explicit prologue to every loop.
Specifically, Golang 2.0 was reserved for backwards incompatible changes, generally understood to be of some significant size, not twiddly bits like this that are adequately covered by a go.mod flag. Generics was thought to possibly involve backwards incompatible changes, but it was done without backwards incompatibility. There aren't any similarly-sized issues on the horizon right now that would result in a 2.0.
If you track the issues closely on GitHub, there's a constant low-level discussion going on about it, but so far nothing seems to be both important enough and large enough to justify the version change, like, not even close.
Looks like a bug you would expect to find in JS. ;) Seriously, I think the bug is apparent. But I have done time in C.
Closures, btw, are such a horrible pattern. Added in many langs, always banned by corporate guidelines. The fib example on the Go site is a good example of how confusing they can be. Right up there with promises and other trash.
Ah ok, so different name resolution rules apply on the left and right sides of the operator?
On the left side, "item" resolves to the "item" in inner scope (that is declared on the same line). On the right side it resolves to the "item" in outer scope.
Then I understand what's happening though I'm still not convinced it's sane behaviour. I would expect all identical names in a single scope to refer to the same variable.
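For reference, a minimal sketch of the resolution rule in question: on `x := x`, the left side declares a new variable in the inner scope while the right side still resolves to the outer one.

package main

import "fmt"

func main() {
    x := 1
    {
        x := x + 1     // left x: new inner variable; right x: the outer x
        fmt.Println(x) // 2
    }
    fmt.Println(x) // 1: the outer x was never touched
}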
I guess that sane language would be Pascal, where := is the assignment operator (and = tests for equality, like you would expect if your brain hadn't been molded by generations of C-family languages). In Go, I guess having := is a nod to Pascal, but it declares a variable besides assigning it.
One interesting thing several modern languages did was to decide that := is actually two separate tokens where : separates an optional type and = separates an initializer, so that
my_goose : Goose = get_a_goose();
... lets us give my_goose an explicit type, while we can write:
my_goose := get_a_goose();
... and leave the compiler to infer that my_goose is a Goose because that's what get_a_goose() returns and we didn't specify.
The idea here is that we have this flexibility but we didn't burden the language with what feels like an extra feature (like C++ auto) with its own special rules. Given that := starts out as a single token this is in some sense revisionist, but so long as you're trying to learn to program, not studying the history of programming, that's fine.
It reminds me of how eggcorns can become language features. A person incorrectly analyses a word or phrase they've heard, e.g. they think the things which fall off an oak tree must be named "eggcorns" because they look a bit like eggs. They apply this analysis, and, if the result is successful and out-competes the existing correct analysis, it can dominate; next thing you know† your spelling correction tool says "Did you mean eggcorn?" when you type acorn.
† In reality these transitions usually take generations
It is a no-op; it's a matter of scope and is resolved entirely statically.
One shadows the variable by a variable of the same name. The key is that the new variable is scoped only to the inside of the loop, whereas the original variable is scoped to the outside of it, thus assigning to the original variable in a next iteration of the loop updates that variable again.
The major case where this is an issue is if one somehow took a reference to the variable; that reference then sees the variable's new value on the next iteration rather than the old one.
Taking a reference to the new, more closely scoped variable that is local to the iteration of the loop does not have this problem.
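Concretely, a hedged sketch of the pointer case (assuming the pre-change, per-loop semantics):

package main

import "fmt"

func main() {
    items := []string{"a", "b", "c"}
    var out []*string
    for _, item := range items {
        item := item             // new variable, scoped to this iteration
        out = append(out, &item) // each &item is therefore a distinct address
    }
    for _, p := range out {
        fmt.Println(*p) // a b c; without the shadowing line: c c c
    }
}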
`a := b` declares a new variable a and assigns to it. It's sugar for:
var a = b
So read it as:
var item = item
Many languages have scopes and variable shadowing. You can argue this is bad practice or whatever, but it's only confusing if you think := is assignment, and the distinction is covered on the first page of the Go language tutorial.
The thing which surprised me was not the variable shadowing, but rather that the name "item" resolves to different variables on the left and right sides of the :=. This is unusual language design; most languages assign a single meaning to each name in each scope.
It's unusual for the C-style family, but very common in functional languages outside of it, such as the ML family. It's not uncommon to see repeated lines of the form:
let x = ... x ...
and every such line is a declaration of a new variable that shadows the preceding one.
It should still be a no-op imo. If it does some magic like dereferencing, make it explicit. And no, other languages making the same mistake is not a good excuse. Especially not for a language focused on inexperienced new grads.
Exactly; the difference is in scope: the `item` in the range clause is scoped to the enclosing block, while the `item` inside the block is loop-block-scoped, shadowing the parent.
That said, it looks and feels dirty and buggy and it's a known workaround for a Go issue, so I'm glad they're at least talking about fixing it.
I see your point. I think we're just using a different definition of "no-op".
The statement by itself doesn't produce any visible side effect.
However, it creates a new logical variable in the abstract machine, and that can have real consequences in the underlying machine, depending on which statements come next.
In particular, the new variable and the old variable are independent and thus the new variable may require to allocate some storage location (e.g. on the stack, a register, ...) to keep track of further mutations to its value (I say "may" because an optimising compiler may do without that extra location)
The two lines do different things even though they look exactly the same.
I'm a Scala dev and in Scala we have a similar thing, where a new import or definition in the middle of the code can change line semantics between two equal lines.
But the difference is that this is not a mistake but a conscious design to switch "contexts", and it is heavily guarded by the type system, whereas Go's type system is incredibly weak in comparison; that's not an inherent drawback, but here it is one.
I'll stick to my feeling that this design is confusing and should have been avoided from the beginning. Kudos to the Go team for breaking compatibility here. This is necessary for a language to not become another COBOL.
You can't declare the same variable twice in the same scope.
item := item
works only because it declares a new variable in the current scope, initialized with a variable with the same name (but different variable) from the outer scope.
Go distinguishes declarations from assignments. In this example, the second assignment is a no-op indeed.
Why should it be a no-op? Even if you interpret the identifier on the right as referring to the same variable, it just means that you're trying to initialize the variable to an unspecified value. Surely that ought to be an error, not a no-op?
This is the declaration operator, not assignment. It would not be a no-op even if that isn't completely clear. It is creating a variable you control vs one that the loop controls and changes in ways you may not expect.
Am I the only one who thought about currying here? I mean, neither the ticket nor the thread mentions it. I don't think it's such an alien concept to people.
TBH, in all sincerity, this isn't really a way to solve the problem. Currying has been used for decades, for good reasons. A for-loop isn't the only place where you introduce outer-scope variables into an immediate function. You can always have other variables involved, and you'll still have to be careful. Even though this change will reduce the absolute number of bugs, people will still hit the same issue.
Are you sure you mean currying? Currying is where you transform a function taking multiple arguments into a function that only takes one of the arguments and returns another function which takes one of the remaining arguments which (...recursively).
Maybe you meant partial function application? Even that doesn't seem relevant, given that the function is actually being called immediately in the first example.
Hmm, maybe it wasn't a common usage of the name. IIRC some people did call this "currying" in some imperative languages, though never in actual FP nor Lisp. Clearly this technique doesn't really have an established name.
I also think "currying" and "partial application" kinda-sorta make sense. Unbound variables in a "immediate function" are merely hidden parameters of the function if analyzed semantically. The question is whether you pass them by-ref or by-value.
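Whatever we call it, the technique looks like this in Go (a sketch with made-up values): the outer-scope variable becomes an explicit parameter, copied by value at call time.

package main

import "fmt"

func main() {
    var fns []func()
    for _, item := range []string{"a", "b", "c"} {
        func(item string) { // item is now a per-call parameter, not a shared variable
            fns = append(fns, func() { fmt.Println(item) })
        }(item)
    }
    for _, f := range fns {
        f() // a b c, under either loop semantics
    }
}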
Yes, some people misuse the concept of currying in the wild. It really shouldn't be changed from its FP definition, because it is fundamentally different from partial application. Basically currying is only useful in Haskell. If you don't build it into your compiler as a first-class citizen you're going to be forever fighting a proliferation of closures in your generated code and then trying to optimize them back out. This is not a wise approach to language design.
Partial application is the better solution in imperative languages.
It is also not relevant to this discussion in the slightest, because the behavior of allocations and how they are passed into closures is independent of how you spell those closures in code. You could easily build currying or some syntactic partial application into Go and leave this problem in place, and you could easily fix this problem without building in currying or partial application.
This seems like a weird take. Currying was a completely natural thing to do in SML for example, why would it only be "useful" in Haskell ?
I agree that currying isn't likely to be the solution to any problem you have in say, Rust or Java. But "only useful in Haskell" jumped out as a weird claim.
Because SML is effectively dead, and the only living language where it is useful is Haskell. It is not practical to discuss programming languages if one must include every dead language, every academic language ever conceived of in some paper somewhere, every language sitting as a years-dead GitHub repo, etc.
Currying is the observation that a function that takes a pair as an argument, `(a, b) -> c`, is equivalent to a function that returns a function: `a -> (b -> c)`.
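In Go terms (assuming 1.18+ generics), that observation can be written down directly; a minimal sketch:

package main

import "fmt"

// curry turns a two-argument function into a chain of one-argument functions.
func curry[A, B, C any](f func(A, B) C) func(A) func(B) C {
    return func(a A) func(B) C {
        return func(b B) C { return f(a, b) }
    }
}

func main() {
    add := curry(func(a, b int) int { return a + b })
    fmt.Println(add(2)(3)) // 5
}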
I'm absolutely not looking forward to potentially breaking changes. Stability is such an important selling point (*). This is not something that can be fixed by a search-replace or a refactoring operation. And it's just begging for problems in the community.
(*) please don't reply with "then you should keep the version below 1.20"
I knew there would be someone. So if you want a feature in a higher version, update your code base. Yeah, that's not what stability means. And "it's your problem now" isn't great Go advocacy either.
You’re confusing stability with ossification. There must be a way to correct mistakes, lest the language and runtime just keep accumulating crap. I wish the Go team would take a MUCH firmer stance on this.
There is no feature being gated; the go.mod line only sets the static semantics of the language itself.
Think for a second, if the go.mod stricture locked in the entire thing, you’d just be told to not upgrade.
The entire point of the system is that it’s possible to update language-level semantics while allowing for the rest to progress for everybody.
Keeping the old go.mod line means you still get stdlib updates, but the semantics of the for loop do not change. And if the designers decide to add similar breaking changes in the future (change string literals or whatever), you'll also be ignoring those.
So your original complaint was "being able to opt into nBC is bad uwu", and now your complaint is that you want to mix and match if a future nBC change you do like comes around?
Potential is the key word there; RSC has done his homework, actually made the change, ran thousands of test cases and... the impact was minimal. The net effect was that existing implementation bugs were discovered and fixed; only TWO cases were found that caused an issue, and the fix was trivial.
With that in mind, while it's technically a backwards-incompatible change, I'd not go as far as to call it a breaking change, since making it was a net positive.
C# did it 10 years ago and it did not cause any significant problems in the community, mainly gratitude (as the C# guy in the replies mentions).
I'm trying to think of a case where your production code is legitimately dependent upon the following code snippet working with the current semantics, so that `all` contains nothing but hundreds of pointers to the last item ...
var all []*Item
for _, item := range items {
all = append(all, &item) // under the old per-loop semantics, every &item is the same address
}
I can't think of anything that isn't terribly contrived for the sake of argument.
I'm pretty sure my code won't be affected, nor will such trivial examples, but there's a lot of code around, and I imagine the problem can occur by accident. Take the following scenario. A program has an array of buffers, each with a lock. An initial function spins off goroutines, with a buffer for each goroutine, hoping to distribute the buffers evenly to minimize locking. However, because of the range bug, it hands the same buffer to every goroutine. Meanwhile another implementation bug goes unnoticed, because locking the single shared buffer happens to prevent it, or because the buffer happens to contain the correct content since somewhere, someone forgot to copy it, but there's only one buffer, so no one ever noticed. Is that so unlikely?
Another comment says that Russ Cox did the experiment, and did encounter a few problems. Not many, but they do exist.
Er... your scenario (or something of that nature) is literally mentioned in the post:
> Of the failures, 36 (62%) were tests not testing what they looked like they tested because of bad interactions with t.Parallel: the new semantics made the tests actually run correctly, and then the tests failed because they found actual latent bugs in the code under test.
And if you want to keep your old bugs in under-tested programs, you can, that's why the new behaviour is opt-in.
When C# made the equivalent breaking change, they tried very hard to find some code out in the wild that would be broken by it.
So far as I know, they didn't find any. They did, however, find a lot of already-broken code that authors didn't realize was broken, and that would be fixed by the change.