Go: Redefining For Loop Variable Semantics

BoppreH · on Oct 11, 2022

The same problem frequently bites me in Python too:

    functions = []

    for i in range(10):
        functions.append(lambda: print(f'Hello {i}'))
    
    for fn in functions:
        fn()

    # Hello 9
    # Hello 9
    # Hello 9
    # Hello 9
    # Hello 9
    # Hello 9
    # Hello 9
    # Hello 9
    # Hello 9
    # Hello 9

I find it one of Python's biggest warts because it's silent, hard to troubleshoot (especially the first time!), and like in Go the most straightforward fix looks like a mistake (i=i):

    for i in range(10):
        functions.append(lambda i=i: print(f'Hello {i}'))

I guess this is a lesson in designing language semantics that match people's intuitions, and learning from previous languages' mistakes.

kjeetgill · on Oct 11, 2022

I've always referenced this as one of the subtle points of genius for how Java did lambdas:

by requiring captured variables to be final it removes a lot of ambiguity around what a variable name refers to. I like that local variables can only be changed locally. If you do want crosstalk between the inner and outer scopes you have to be more explicit and introduce a reference to talk through.

I love python, but I basically avoid this construction and use a single element list if I need it. I can never remember exactly how it works.

gshulegaard · on Oct 11, 2022

I found this interesting because I have never run into this before...and I thought that was rare (in relation to how comfortable I am with Python). That said, I personally think this is the right and intuitive outcome.

If we expand out the loop to be manual we would have a script like:

   functions = []
   
   # expanded loop
   i = 1
   functions.append(lambda: print(f"Hello {i}"))
   i = 2
   functions.append(lambda: print(f"Hello {i}"))
   ...
   i = 9
   functions.append(lambda: print(f"Hello {i}"))

Now at this point if we were to:

   print(f"Hello {i}")

What would the expected output be? I would posit that anything other than "Hello 9" would be wrong, both logically and intuitively.

So by extension a loop of effectively "print(f"Hello {i}")" 9 times should just print "Hello 9" 9 times IMO. Anything else is counter-intuitive and definitely surprising.

BoppreH · on Oct 11, 2022

That's one way of thinking about it, and it works well for Python.

Here's another one, and the way I think about it:

    def _loop(i):
        functions.append(lambda: print(f"Hello {i}"))

    _loop(0)
    _loop(1)
    _loop(2)

    print(i)
    # Error!

I highly prefer lexical scoping, where the variables are bound to the block they were declared in, like Javascript's `let` vs the old `var`. This avoids shadowing and general namespace pollution.

I know it's not how Python operates, but I think it's how it should. Though I'd argue the syntactical similarities between functions and loops nudge users towards this second model.

pdonis · on Oct 11, 2022

Note that the way you write this is the best way to form closures in Python: form them inside a function with the closure variables as function parameters. This forces the closure to capture the current value of the closure variables when the closure is formed. Note that you could put the calls to the _loop function in a for loop and things would still work:

    >>> functions = []
    >>> def _loop(i):
    ...     functions.append(lambda: print(f"Hello {i}"))
    ...
    >>> for i in range(10):
    ...     _loop(i)
    ...
    >>> for f in functions:
    ...     f()
    ...
    Hello 0
    Hello 1
    Hello 2
    Hello 3
    Hello 4
    Hello 5
    Hello 6
    Hello 7
    Hello 8
    Hello 9

> lexical scoping, where the variables are bound to the block they were declared in

In Python that would only work if a "block" included comprehensions. For example:

    >>> functions = [lambda: print(f"Hello {i}") for i in range(10)]
    >>> for f in functions:
    ...     f()
    ...
    Hello 9
    Hello 9
    Hello 9
    Hello 9
    Hello 9
    Hello 9
    Hello 9
    Hello 9
    Hello 9
    Hello 9

You could fix this by defining a _loop function as above and forming the closure inside it; but changing variable scoping to be lexical in "blocks" wouldn't fix this case unless the list comprehension itself counted as a "block", which is not how Python defines blocks.

dkmar · on Oct 11, 2022

Wow. When I saw the original for-loop example I was like "ok not a huge deal, doesn't surprise me too much really". But rewritten in the comprehension form I'm like wtf

gshulegaard · on Oct 12, 2022

Well put. I too prefer the reasonability of lexical scoping. Although I must confess I am not well versed enough to quite grasp how what Python does _isn't_ conforming to lexical scope.

Since I was confused I did some internet searching and found this:

https://stackoverflow.com/questions/51604346/does-python-sco...

Which I think suggests the thing you are finding confusing isn't lexical scoping in Python, but rather the environment mutability.

Ultimately, I think I better understand what you are highlighting and can't say I entirely disagree. I just personally find how it is today ergonomic, but that could also just be bias as I am fairly comfortable in Python (warts and all).

randomswede · on Oct 12, 2022

I woudl say taht Python has scoping, yes, but it does not have lexical scoping in any sense of "lexical scoping" that I am aware of. If it did, the code below would not actually work ,as the outside-the-loop print would be trying to access a variable not available in the lexical scope of "the function body", as it is defined and established within the lexical scope of "the loop".

So, at least in my book, no, Python has "global scope", "function scope" and probably one or two more scopes (I think there's a "class scope" as well).

Here's some code in Python, and some equivalent code in Go.

    def foo(a_list):
        print(f"list is {len(a_list} elements")

        for element in a_list:
            print(element)
        print(element)

And here's the equivalent Go code:

    func foo(aList []int) { // Let's use ints...
        fmt.Printf("list is %d elements\n", len(aList))
        var element int // Notice this declaration! This is ensuring that element is declared outside the lexical scope of the for loop
        for _, element = range aList {
            fmt.Println(element)
        }
        fmt.Println(element)
    }

gshulegaard · on Oct 12, 2022

I think you are confusing type of scoping (lexical/static, dynamic) with scoping levels.

Python has lexical scoping, but it does _not_ have block level scoping.

https://en.wikipedia.org/wiki/Scope_(computer_science)#Block...

Which, now that I have done enough reading I think crystallizes the confusing thing for others being highlighted here (for me). Depending on preference, the lack of block scoping can be surprising for someone. Which also explains my bias, I started with Python which probably plays a large part in why I find function level scoping without block scoping ergonomic.

duped · on Oct 11, 2022

I think it comes down to if you think it makes sense for variables to be captured by value or by reference in closures.

Which ever makes the most sense depends on the context of a program which is why in languages that have both reference and closure semantics you can choose how the variables are captured. When that isn't the case you need to pick for someone, and it gets weird.

dragonwriter · on Oct 11, 2022

> I think it comes down to if you think it makes sense for variables to be captured by value or by reference in closures.

No, this comes down to “should a loop control variable be scoped to the block—or in python’s case function—the loop is in and updated with each iteration or a fresh variable scoped to each loop iteration that happens to share the same name.”

randomswede · on Oct 12, 2022

In the "the variable lives in the function scope", the answer is unequivocally "it should be bound once and updated", if it only exists in the scope of the loop function, both "it is bound on each iteration" (and thus safe to close over without surprise) and "it is bound once and updated" are valid answers, but I have a preference for the first, but many languages actually choose the second.

tmazeika · on Oct 11, 2022

The example makes sense when you think of `i` getting reassigned, but in other languages with variable shadowing (think Rust, but Go too when you're in a nested scope) you might replace `i = 2` with `let i = 2` (Rust), in which case the closure closes over precisely the `let i = 2` variable and the expected output differs.

_flux · on Oct 11, 2022

I think the reason this seems surprising is the value/reference semantics of values. For example:

    def foo(i, j):
      i[0] += 1
      j += 1
    a = [0]
    b = 0
    foo(a, b)
    print(a) # [1]
    print(b) # [0]

So people think the body of the loop like a function call.

I don't think it's a bad expectation, in fact I think it's quite a natural expectation—in particular if you've programmed functional languages where modifying values is the exception, not the rule—which is why it surprises people. It's just not the one way Python chose.

OkayPhysicist · on Oct 11, 2022

The problem is the intersection of people's intuition about closures from languages without mutable state bumping into the imperative world built of mutable state.

For example, equivalent code in Elixir:

    i = 0
    list = []
    list = list ++ [fn -> i end] 
    i = i + 1
    list = list ++ [fn -> i end]
    i = i + 1
    halfway = list
    list = list ++ [fn -> i end]
    i = i + 1
    list = list ++ [fn -> i end]
    IO.inspect Enum.map(halfway, fn f -> f.() end)
    IO.inspect Enum.map(list, fn f -> f.() end)

Would produce the functional-intuitive result of [0,1] [0,1,2,3]

Because there is no mutable state. Those repeated assignments to i and list are exactly equivalent to the scenario where each i was actually i1, i2, i3, etc.

masklinn · on Oct 11, 2022

> I thought that was rare

It is rare, if only because defining functions (lambdas or otherwise) in a loop is somewhat uncommon.

That's not very helpful when you hit that issue though.

alkonaut · on Oct 11, 2022

I expect that to write the obvious result (the last value assigned) and I also expect a capture in a loop to write the value as it was at the time of iteration.

They are both the behavior that’s intuitively obvious. And that’s despite them both being the same behavior.

As we can see, it’s important to not use the simple/obvious implementation because it’s so unintuitive it’ll need to be changed even if breaking (as in C#)

jrumbut · on Oct 11, 2022

I feel that the original Python example is suffering from a little too much lambda.

It seems like we're iterating through something to build up a computation we may execute later. I can conceive of a situation where you might want to do that, or at least consider it, but in general I say just do the work now and build a list of results.

BoppreH · on Oct 11, 2022

It looks like that because it's a minimal example. Real cases usually involve callbacks or expensive computations. How often it happens depends a lot on what you're doing.

asddubs · on Oct 11, 2022

seems like the non magic solution here would be block scope. I guess it would still be slightly magic in that each iteration gets its own scope, but at least that's easier to wrap your head around than just a special case

throw_m239339 · on Oct 11, 2022

This is mostly why Ecmascript introduced lexical scoping (a bit similar to "block" scoping) with "let" back int ES 5 I believe, because var used the function scope and developers would run into issues whenever they did this:

     for (var i = 0 ; i < 10 ; i ++ ){
         someElement.addEventListener('click',function(){
            console.log(i);
         })
     }

i would always be equal to 9. With let instead of var, i is properly scoped and the executed script display each increment correctly.

The solution before let was to introduce a closure in the for statement body

     for (var i = 0 ; i < 10 ; i ++ ){
         (function(i){
             someElement.addEventListener('click',function(){
                 console.log(i);
         }))(i)
     }

in order to capture i value. let can also easily isolate the scope of a variable so that it doesn't popule the global scope

    {
       let foo = "bar";
       var baz = "qix";
    }
    
    // foo is undefined here, while baz is defined.

which removes the need for self invoking functions.

Simplicitas · on Oct 11, 2022

Yup. "Match [human] intuition" is the operative phrase in all of this. Thanks. Hope Go addresses this soon.

dekhn · on Oct 11, 2022

I guess this is confusing but i just think of it as capturing the reference to i, not the value of i. It would be nice if python had a nice way to deal with this as many people also trip over:

def fun(initial_empty_list=[]): where initial_empty_list is a reference captured at function definition time, not a new value initialized on each call to the function.

littlestymaar · on Oct 11, 2022

Wow, Python and Go have it too? It was the biggest wart of pre-ES6 JavaScript, I never imagined other languages had it too (and it's honestly very disappointing for Go, since it's much more recent and we had plenty of hindsight when it was created…)

Edit: looks like Rust does it right: https://play.rust-lang.org/?version=stable&mode=debug&editio...

kevincox · on Oct 11, 2022

Not quite how you tested it. The `move` is actually copying the variable:

    fn main() {
        let mut functions = vec![];
    
        let mut i = 0;
        functions.push(Box::new(move || println!("Hello {i}")) as Box<dyn Fn()>);
        i = 1;
        functions.push(Box::new(move || println!("Hello {i}")));
        i = 2;
        functions.push(Box::new(move || println!("Hello {i}")));
        
        for func in functions {
            func();
        }
    }

But the most equivalent code would be the following which doesn't compile:

    fn main() {
        let mut functions = vec![];
    
        for i in 0..10 {
            functions.push(Box::new(|| println!("Hello {i}")))
        }
        
        for func in functions {
            func();
        }
    }

littlestymaar · on Oct 11, 2022

That's the point: the misleading code doesn't compile because the `i` variable scope in limited to the current execution of the loop (and that's why you can't borrow it for `functions`'s lifetime), which is exactly how ES6 fixed this in JavaScript.

tomtung · on Oct 12, 2022

Ah I have "fond" memories about once spending a good week debugging this, because I was also spawning threads/sub-processes in the loop, making this a Heisenbug that caused the program to crash every hour or so...

maxloh · on Oct 11, 2022

It is a scoping problem IMO. In JavaScript variables are block-scoped, i will be out of scope after the loop.

It would be better if Python could support block-scoping too. (Maybe it is time for inventing a "strict mode" for Python)

jmconfuzeus · on Oct 11, 2022

Technically, it makes sense for Python to work like this because your functions are being bound to a variable i which is located in the global scope.

I wouldn't call this a design mistake but more of a misunderstanding of how scope works.

remram · on Oct 11, 2022

If other languages get this right then it's not a fundamental misunderstanding but a quirk of that specific language.

jmconfuzeus · on Oct 11, 2022

Dunno what you mean by getting it right tbh.

for i in range(10)

The statement literally declares or sets a variable named i in that scope. When the loop exits, i still exists in the scope with the value 9. If you call a function that was given a reference to i, the value will be 9 as expected because the function was called after the loop exited.

Don't see any quirk or mistake here.

masklinn · on Oct 11, 2022

An equivalent construct works differently (and one would say in a way that is less error-prone) in other languages.

For instance the behaviour of a Python loop varies drastically depending on the size of the iteration:

     def loop(n):
         for i in range(n):
             pass
         print(i)

     loop(10) # 9
     loop(0)
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
       File "<stdin>", line 4, in loop
     UnboundLocalError: local variable 'i' referenced before assignment

That Python works this way is specific to Python. And a language which doesn't have the issues this implies would be "getting it right", in the sense of avoiding sharp corners and edge cases.

jmconfuzeus · on Oct 11, 2022

I see, it makes sense to force people to use different variables for iteration in that case because it can easily trip inexperienced developers up.

int_19h · on Oct 11, 2022

The statement can mean whatever the language designers decide it should mean, and one plausible meaning is to introduce a new scope for the body of the loop. It's not even something unique in Python, given that sequence comprehensions do just that; e.g. this prints 0,1,...,9:

   for f in (lambda: print(i) for i in range(0, 10)): f()

Unfortunately, Python is simply inconsistent in this regard. For example, list comprehensions leak the variable for back-compat reasons, so if you substitute (lambda: ...) with [lambda: ...] above, you'll get a bunch of 9s.

But, backwards compatibility aside, the language could change to make for-loops behave like sequence comprehensions wrt scoping.

dragonwriter · on Oct 11, 2022

> Unfortunately, Python is simply inconsistent in this regard. For example, list comprehensions leak the variable for back-compat reasons

No, they don’t. They did in Python 2—list comps were introduced in 2.0, genexps in 2.4, and set/dict comps in 3.0 but also included in the later 2.7 release—but that’s been non-current for more than a decade, and conpletely out of support for two years. Let it go.

> But, backwards compatibility aside, the language could change to make for-loops behave like sequence comprehensions wrt scoping.

Sure in Python 4, but after 2->3, not sure many people are looking forward to that.

int_19h · on Oct 12, 2022

   Python 3.10.5 (tags/v3.10.5:f377153, Jun  6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)] on win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>> [f() for f in [lambda: i for i in range(0, 10)]]
   [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
   >>> [f() for f in (lambda: i for i in range(0, 10))]
   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

remram · on Oct 12, 2022

I don't know what this is supposed to prove, it just shows that the generator is lazy while the list isn't. It seems unrelated to scoping issues or leaking variables.

int_19h · on Oct 12, 2022

It shows that list comprehension has a single loop variable across all loop iterations that gets reassigned on each iteration, while the sequence comprehension creates a new loop variable bound to current item on every iteration.

ekiru · on Oct 12, 2022

I don't think that's what's happening. In your example with the generator expression, you're calling each lambda as you iterate through the generator, which due to the lazy evaluation of the generator means that the value of the single i variable shared across all each lambda is still only the latest value reached.

If you instead fully evaluate the generator expression before calling any of the functions (for example, by passing it to the list constructor), you get the same behavior as the list comprehension case:

    >>> [f() for f in list(lambda: i for i in range(0, 10))]
    [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

int_19h · on Oct 13, 2022

You're right, but that means that sequence comprehension also leaks the variable, so it's even worse than I thought.

Side note: I think that commenters above didn't quite understand what I meant by "leaking", because there's more than one scope boundary here. Roughly speaking, any comprehension or loop can be desugared into something that looks like a C-style for-loop:

   /* scope 1 */
   for (/* scope 2 */) {
      /* scope 3 */
   }

Scope 1 is outside relative to the loop. Scope 2 is specific to the loop but shared by all its iterations. Scope 3 is specific to one loop iteration. The "leaking" I referred to above is from scope 3 to scope 2. I think other commenters took it to mean leaking from scope 2 to scope 1 - i.e. the ability to use the variable outside of the comprehension; that is, indeed, something that changed between Python 2 and 3.

Epokhe · on Oct 11, 2022

This seems like a cleverly crafted satire reply, rather than an actual complaint about python.

iudqnolq · on Oct 11, 2022

If so, it's a bit misleading. The go bug just requires a reference, rather than a lambda. I bucket lambdas into a category of language features I expect to have more sharp edges around capture semantics than references.

Bug:

    var all []*Item
    for _, item := range items {
     all = append(all, &item)
    }

Fix:

    var all []*Item
    for _, item := range items {
     item := item
     all = append(all, &item)
    }

BoppreH · on Oct 11, 2022

I assure you it's not satire, otherwise it wouldn't be such a popular question on StackOverflow[1] and an item in the Python FAQ[2]

[1]: https://stackoverflow.com/questions/2295290/what-do-lambda-f...

[2]: https://docs.python.org/3/faq/programming.html#why-do-lambda...

Epokhe · on Oct 11, 2022

Yeah, makes sense that it's popular. I just thought you were making a joke by creating a similar looking issue with a different underlying reason.

In Go example the issue happens because item variable is per-loop, in your python example the issue is not related to loops at all, it's just because functions capture the value of global variables at execution time.

And the cherry on top is that the solution is also similar looking(i=i), but working with a different mechanic underneath(default argument assignment).

Anyway, this was my perspective that led me to interpret this as satire. A bit disappointed haha

Semaphor · on Oct 11, 2022

If I do the same with for instead of foreach in C#, my IDE gives me a warning: "Captured variable is modified in the outer scope". Isn’t that the case for Python?

ketzu · on Oct 11, 2022

Copy pasting OPs code to pycharm: No warning.

jstx1 · on Oct 11, 2022

Pylint catches it - https://pylint.pycqa.org/en/latest/user_guide/messages/warni...

Semaphor · on Oct 11, 2022

Probably worthy of a YouTrack report ;)

maxloh · on Oct 11, 2022

I don't think it is a good practice to capture it in lint tooling, instead of by the language itself.

Semaphor · on Oct 12, 2022

It’s what linters are great at, catch things that work the way you don’t expect.

benhoyt · on Oct 11, 2022

I'm an experienced Go programmer and still accidentally do this every so often. It certainly bit me and my team several times when we were newish to Go. And particularly for newbies, it's hard to debug and understand when you do run into it. All of which to say I'm really glad they're trying to fix this.

fweimer · on Oct 11, 2022

I've been annoyed more than once by Java's “local variables referenced from a lambda expression must be final or effectively final” error. I used to suspect that this was just an excuse to simplify the javac implementation, but I'm not so sure anymore. It is interesting to see the issue from the other side.

The issue is not restricted to variables declared in loop headers, so the proposed loop change for Go might only be the start.

titzer · on Oct 11, 2022

> I used to suspect that this was just an excuse to simplify the javac implementation, but I'm not so sure anymore.

AFAICT it doesn't simplify javac much, if at all. It still needs to synthesize closure objects with fields to store the closed-over values. It's just that those fields can be final.

I think Java did this to avoid programmer confusion. I think it was the right choice.

twic · on Oct 11, 2022

It simplifies the implementation immensely, because the fields in those synthetic objects can be populated with copies of the variables from the stack frame. This:

    void foo() {
        int a = 1;
        Runnable r = () -> System.out.println(a);
        r.run(); // prints 1
    }

Can get turned into something like this:

    class r_closure implements Runnable {
        final int a;

        r_closure(int a) { this.a = a; }

        @Override
        public void run() { System.out.println(a); }
    }

    void foo() {
        int a = 1;
        Runnable r = new r_closure(a);
        r.run(); // prints 1
    }

The local a is a perfectly normal local, and the field a is a perfectly normal field.

What would have to happen if the variable was mutable? For example, if you wanted to write this:

    void foo() {
        int a = 1;
        Runnable r = () -> System.out.println(a);
        a = 2;
        r.run(); // prints 2
    }

You have to transform it to something like this:

    class r_closure implements Runnable {
        int a;

        r_closure(int a) { this.a = a; }

        @Override
        public void run() { System.out.println(a); }
    }

    void foo() {
        int _a = 1;
        r_closure r = new r_closure(_a);
        r.a = 2;
        r.run();
    }

Where there is no local, and where the method looks like it's accessing a local, it's actually reaching into the closure and mutating its field!

Now think about doing this if you've captured a variable in two closures, or a variable number of closures in a list. The wheels come off this approach.

Instead, you would have to promote the shared mutable variable to its own object, like this:

    class int_box {
        int i;

        int_box(int i) { this.i = i; }
    }

    class q_closure implements Runnable {
        final int_box a;

        q_closure(int_box a) { this.a = a; }

        @Override
        public void run() { System.out.println("q = " + a.i); }
    }

    class r_closure implements Runnable {
        final int_box a;

        r_closure(int_box a) { this.a = a; }

        @Override
        public void run() { System.out.println("r = " + a.i); }
    }

    void foo() {
        int_box a = new int_box(1);
        Runnable q = new q_closure(a);
        Runnable r = new r_closure(a);
        a.i = 2;
        q.run(); // prints 2
        r.run(); // prints 2
    }

Now you've taken a simple local variable which just need to be copied, and turned into its own thing on the heap!

fweimer · on Oct 11, 2022

It gets even worse. In the general case, you need to use a separate object for each captured mutable variable for space safety reasons (avoiding unwanted object retention not present in the source code). The easiest way to achieve space safety involves a separate object for each captured variable, but of course that is unnecessarily wasteful in many cases. But in order to coalesce captured locals into fewer reference objects, you need to do some sort of lifetime analysis to find cases where locals become unused at the same time.

masklinn · on Oct 11, 2022

> The issue is not restricted to variables declared in loop headers, so the proposed loop change for Go might only be the start.

Technically it’s not but practically it’s by far the most common way for this to unexpectedly arise.

The other cases like closing over a variable and then modifying it before the closure is invoked are a lot less common to hit unexpectedly, and a lot harder to fix nicely (short of Java’s big hammer).

tezza · on Oct 11, 2022

Yep, bitten by this one on repeat myself. definitely a WTF moment each time.

kubb · on Oct 11, 2022

same here, happened several times when launching a goroutine inside the loop

the fix was usually adding x := x

before go func() { do something with x }

masklinn · on Oct 11, 2022

A common alternative fix is to use IIFE, since the expression must be a call.

`go` evaluates everything but the final function call in the context of the caller. So

    go func (x int) { … } (x)

Will do the same, and is easier to extract to a named function.

Joker_vD · on Oct 11, 2022

The latter is also easier to re-inline accidentally because

    go func (x int) { do_work(x) } (x)

seems like a very roundabout way to say

   go do_work(x)

The "x := x" solution too suffers from this problem but slightly less: both idioms look like they are no-ops (while they are actually not) but at least "x := x" is weird enough to look like it was a deliberate choice, not some vestige from refactoring.

masklinn · on Oct 11, 2022

Er… it makes no difference?

Your first version is in fact a worse way to write the second one.

Joker_vD · on Oct 11, 2022

Huh. Now the closure-capturing in Go makes even less to me: instead of capturing the variable's current value it captures the variable itself i.e. puts &x into the closure instead of x — and no other piece of Go does that, although I was sure "go fun(args)" passed args by-ref but apparently not.

What's even the point of capturing the variable itself? To allow for writing inline callbacks that could sneakily mutate loop-local variables?

masklinn · on Oct 11, 2022

More generally allow the closure to write to its environment. That is the normal behaviour of closures in imperative langages, Java being the major exception because it rejects closing over non-final variable (and obviously that only blocks assignment, if the object is mutable you can do what you want to it).

gpderetta · on Oct 11, 2022

In c++ close by value/reference is under programmer control and there is no default

randomswede · on Oct 12, 2022

What would be the use of closing over the value rather than the variable? That would stop a lot of interesting use of closed-over variables (like persisting values from call to call).

masklinn · on Oct 13, 2022

In fairness while that would be a lot less convenient in reference-based langages in Go you could just close over a pointer to the variable, making the relationship explicit.

That’s how you’d do it using a [=] lambda in c++ or a move closure in rust.

Cthulhu_ · on Oct 11, 2022

rsc himself mentioned having been bitten by it on quite a few occasions, too.

Blikkentrekker · on Oct 11, 2022

It's even worse in python where loops do not even create a new scope and assign on each iteration. From reading this Go creates a scope for the entire loop, but assigns rather than initializes on each iteration.

Then again, Python has the same syntax for assignment and initialization.

zendist · on Oct 11, 2022

Thrilled to see Jared Parsons of the C# team pitch in and provide some perspective on how things were done for C#5 when a similar change was made. Kudos Jared!

int_19h · on Oct 11, 2022

What's interesting is that C# 5 release (which made the breaking change) was back in 2012, and both the change and the reasons for it were very widely discussed at the time. This is right around the time when Go shipped its 1.0, and it's kinda surprising that they either didn't look closely at "near-peer" languages, or if they did, couldn't see how this problem was fully applicable to their PL design, as well.

(Note that C# at least had the excuse of not having closures in the first version, which makes scoping of "foreach" moot - the problem only showed up in C# 2.0. But Go had lambdas from the get-go, so this interaction between loops and closures was always there.)

pjmlp · on Oct 11, 2022

Well, this isn't the only language feature where Go designers ignored the path trailed by other languages.

Still looking forward to the day they will discover Pascal enumerations.

pjmlp · on Oct 11, 2022

Same here, DevDiv is now polyglot focused, so you will see regular comments from .NET folks on other languages as well (mainly Java and Go). David Fowler tends to tweet every now and then about them as well.

Cthulhu_ · on Oct 11, 2022

Yeah, it's often a joy to read these posts and the perspectives on it; this is how programming language development should be done.

tialaramex · on Oct 11, 2022

> we gave the general rule that language redefinitions like what I just described are not permitted

Not just a "general rule" that document also specifically talks about precisely this issue (for loops) and resolves that Go will not fix this.

Change is the only constant, we should design systems with the expectation that they'll need to adapt over time or they will be replaced by something which can. With this mindset, Go should have solved the for loop problem years ago, just as C# did. This could have been a story about how once upon a time Go had these very silly for loop semantics, but that hasn't been true for many years.

This necessity of change is why I think the decision not to take Epochs for C++ 20 was much more consequential than things like rejecting the "Goals and priorities" paper which had immediate effects (in that particular case spurring the Carbon experiment).

skywhopper · on Oct 11, 2022

What’s your point here? They changed their mind, given years of data that showed this specific issue was worthy of an exception.

CodesInChaos · on Oct 11, 2022

> Epochs for C++ 20

How did they plan to make Epochs compatible with headers being copy&pasted at the #include site? And what about template instantiation?

tialaramex · on Oct 11, 2022

Epochs is intended to be used with C++ 20 Modules. An epoch can only apply to a whole module.

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p18...

hoseja · on Oct 11, 2022

Modules I suppose.

morelisp · on Oct 11, 2022

Arguably if Go had done this years ago, go.mod would not have existed and this would have an even bigger impact.

Just like we'll rarely pick exactly the right behavior, we'll rarely pick exactly the right time to fix the broken behavior.

apatheticonion · on Oct 11, 2022

I mean, Go had vendoring and vendoring managers not unlike npm and it's node_modules.

But a language more comfortable with change implies that change is more likely - not less likely. Given pretty much every modern language has some kind of dependency management utility - I'd be surprised if Go didn't end up with one

Ensorceled · on Oct 11, 2022

> Arguably if Go had done this years ago, go.mod would not have existed and this would have an even bigger impact.

I'm interested in the logical arguments that would support this. I can't think of any.

morelisp · on Oct 11, 2022

Are you of the opinion that the version in go.mod existed many years ago, or that requiring a recent number in it won’t comparatively limit the impact? Both seem obviously false to me.

Ensorceled · on Oct 11, 2022

I read this as:

    IF loop variable syntax had been redefined years ago
    THEN
        go.mod would not exist
        AND
        impact would have been larger

It seems from this thread that what they meant to say was:

    IF loop variables had been redefined BEFORE go.mod existed
    THEN impact would have been larger

johnnypangs · on Oct 11, 2022

I think they misinterpreted the word “existed” in this case. They took it as go mod would have never been made instead of go mod didn’t exist at that time.

Joker_vD · on Oct 11, 2022

But that's exactly what that sentence says, isn't it? Otherwise it'd been "Arguably if Go had done this years ago when go.mod did not exist, this would have had an even bigger impact" or something like this?

masklinn · on Oct 11, 2022

> But that's exactly what that sentence says, isn't it?

No?

> Otherwise it'd been "Arguably if Go had done this years ago when go.mod did not exist, this would have had an even bigger impact" or something like this?

That’s a worst take on the same sentiment? I find your version a lot clunkier. The original sets a hypothetical stage and from that its conclusion, I think it flows better.

Joker_vD · on Oct 11, 2022

> The original sets a hypothetical stage and from that its conclusion, I think it flows better.

The stage is "years ago", and conclusion is "go.mod would not have existed and this would have an even bigger impact". My version re-arranges the sentence so that "go.mod not existing" bit is a part of the premise, not of the conclusion.

Tense agreement in English subjunctive is hard. Especially for non-natives such as me: I do parse the original statement like that and just can't bring myself to understand it otherwise.

dpcx · on Oct 11, 2022

English native, and I parsed it the same way. The OG comment to me reads as if "go.mod never would have existed", not "go.mod didn't exist at this point in time."

Ensorceled · on Oct 11, 2022

Exactly, the comma is what does it, it separates the sentence into three fragments and then the 'and' joins segments two and three.

If the comma had instead been the word "when", as suggested, this would parse the other way. It still would have been a bit awkward but would make sense.

morelisp · on Oct 11, 2022

Ironically I have stopped using the word "when" in constructions like this because it confuses non-native English (esp. German) speakers who read it in the conditional rather than temporal sense.

varispeed · on Oct 11, 2022

How one depends on another? To me it sounds these things are orthogonal.

heleninboodler · on Oct 11, 2022

There's an ambiguity in the phrase "then go.mod wouldn't have existed." One way to read it is "then, as a consequence, go.mod wouldn't have existed" and the other is "then go.mod wouldn't have existed at that time." I believe the intention was the latter, whereas you're inferring the former.

room271 · on Oct 11, 2022

go.mod provides a way to introduce the new behaviour incrementally. The key point is the dependencies can (and already do) declare different go versions in their go.mod file. When everything is compiled, each module is compiled with the behaviour that applies to its own declared go version. So even if you update your go.mod go version to take advantage of this (soon to be) new behaviour, you can continue to use existing deps happily without worrying they will break.

masklinn · on Oct 11, 2022

Because of time? How would you use go.mod when go mod does not exist?

zebracanevra · on Oct 11, 2022

If you're at all interested in how for loops and scope work in Javascript, Jake Archibald and Surma have a great video on the topic: https://www.youtube.com/watch?v=Nzokr6Boeaw

Similar changes were made in newer versions of ES so that the for loop in this article works out of the box, like C#.

Slightly off-topic to this article: I wish "do while" loops had the "while" condition in the inner scope, not the outer scope. So many times I have wished that I could access the inner scope... I end up using a while(true) with an if { break; } at the end instead in 99% of cases where a do while could've been the perfect thing...

chrismorgan · on Oct 11, 2022

Except the ECMAScript changes weren’t changes but additions. Anything using the var keyword gets the traditional semantics, anything using let and const gets lexical scoping.

As an example:

  x = [];
  for (var i of [1, 2, 3]) {
      x.push(() => i);
  }
  x.map(f => f()) == [3, 3, 3]

  x = [];
  for (let i of [1, 2, 3]) {
      x.push(() => i);
  }
  x.map(f => f()) == [1, 2, 3]

What’s being proposed for Go is instead a breaking change.

foldr · on Oct 11, 2022

>What’s being proposed for Go is instead a breaking change.

It sounds like you only get the breaking change in modules that require above a certain version of Go. So this should not break old code. It's perhaps more analogous to the way that "use strict" in JavaScript 'breaks' parsing of octal constants.

chrismorgan · on Oct 11, 2022

Ah, my apologies; I missed that part. (I last used Go a decade ago, long before go.mod.)

Then this seems fairly reasonable, given also how rarely things will depend on one way or the other.

baq · on Oct 11, 2022

> What’s being proposed for Go is instead a breaking change.

given the context, it might actually be a fixing change...

masklinn · on Oct 11, 2022

> What’s being proposed for Go is instead a breaking change.

It’s a breaking change in the same sense that the `”use strict;”` semantics of JS were: it’s not actually a breaking change, because you have to opt in.

lovasoa · on Oct 11, 2022

Ecmascript had the same problem, and fixed it without breaking backwards compatibility. In javascript,

    for (var i=0; i<3; i++) setTimeout(() => console.log(i))

Would print 3,3,3. So in ecmascript 6, blocked-scope variables were introduced, but the semantics of old-style "var" declarations was not changed. You can now write

    for (let i=0; i<3; i++) setTimeout(() => console.log(i))

Which prints 0,1,2 as expected.

And you don't have to go open a `go.mod` file to know what the code you are reading does.

dilap · on Oct 11, 2022

Right call for js, but I think not for C# or Go, since as time goes on, the old behavior will be more and more of a relic, but you'll be paying the cost for having two syntaxes forever. (In js tho you have no alternative, because you don't have anything like go.mod (I think?) and you really can't break backcompat.)

xmprt · on Oct 11, 2022

I'm all for this change. I've had to explain this weird programming quirk of Go to coworkers on multiple occasions and can't wait for it to be solved.

Cthulhu_ · on Oct 11, 2022

I'm just happy that it's one of the few weird quirks of Go. In my spheres, Javascript is the most commonly used language, and the 'wat' presentation is frequently referenced to.

corsix · on Oct 11, 2022

Lua gets this right - the lowering of loops (e.g. https://www.lua.org/manual/5.1/manual.html#2.4.5) says “var is invisible” and has “local v = var”, the latter akin to Go’s “item := item”

shp0ngle · on Oct 11, 2022

If you never did a mistake connected to this and caused a bug in real prod code, you haven’t been writing go for long enough.

barsonme · on Oct 11, 2022

Ehhhh, I’ve been writing Go for ~10 years and I’ve never had this make it into prod. Many times in local development and testing, though.

ainar-g · on Oct 11, 2022

Or you've been using the looppointer analyser[1] from the start, heh.

[1]: https://github.com/kyoh86/looppointer

Traubenfuchs · on Oct 11, 2022

Ideally at least one of the tests out of your testing pyramid would catch something like that though.

Cthulhu_ · on Oct 11, 2022

Ideally yeah, but unfortunately very few people work in ideal circumstances.

viiralvx · on Oct 11, 2022

Yeahhhhh, I worked at a big company on their CI product and I had this bug bite me and break builds for a little bit.

Luckily caught my mistake and fixed it within 15 minutes, but it's always easy to overlook.

bandrami · on Oct 11, 2022

I internalized early on that the likelihood of this mistake means Go wants me to refactor when I find myself liable to making it.

thiht · on Oct 11, 2022

I fail to see any downside to changing this semantics. This has honestly always felt like a language design bug more than anything. Having to write foo := foo at the beginning of a loop for it to behave as expected is a strong design smell.

The gradual breaking (of fix depending on your point of view) with explicit opt-in looks great to me.

the_gipsy · on Oct 11, 2022

> Loop variables being per-loop instead of per-iteration is the only design decision I know of in Go that makes programs incorrect more often than it makes them correct. Since it is the only such design decision, I do not see any plausible candidates for additional exceptions.

The bar is quite low.

dontlaugh · on Oct 11, 2022

It's also very subjective. Just off the top of my head I can think of:

- zero values instead of sum types

- nil, especially on interfaces

- goroutines sharing memory

- goroutine panics not being raised in the spawner by default

seer-zig · on Oct 11, 2022

golang has quite the suspect decisions that show up when writing large programs https://www.uber.com/blog/data-race-patterns-in-go/

cesarb · on Oct 11, 2022

Working link: https://www.uber.com/en-US/blog/data-race-patterns-in-go/

(For me, the link you posted does a 302 redirect to https://www.uber.com/pt-BR/blog/data-race-patterns-in-go/ which gives me a 404 error page. It's a bit insane that whether the link you posted works or not depends on your locale, and unfortunately this is not the first time I've seen this kind of baffling redirect misbehavior.)

fomine3 · on Oct 11, 2022

It's great that who worked for similar changes on C#, responses immediately.

jupp0r · on Oct 11, 2022

How does this work,

in this code:

    var all []\*Item
    for _, item := range items {
         all = append(all, &item)
    }

When &item is the same for all iterations, that means that it's pointing to the same memory address. Is each item in items copied to this address prior to each iteration body invocation? This seems strange as this copy could potentially be very expensive. What am I missing?

foldr · on Oct 11, 2022

To add to the sibling, you could iterate by index to avoid the copy:

    for i := range items {
        ptr := &items[i]
        ...
    }

In this case of course you also get different semantics. The ptr variable is bound to the address of each of the original Item values in the slice, whereas in the code in your comment, &item is the address of a single heap-allocated Item variable.

ytch · on Oct 11, 2022

Could we assume it is like executing:

    var item Item

    item = items[0]
    all = append(all, &item)

    item = items[1]
    all = append(all, &item)

    ...

silisili · on Oct 11, 2022

Nothing, you got it.

Most of the time you're iterating a slice of pointers, though, so only the address gets copied. And in those cases, this bug doesn't exist(unless of course you're going from * to ** for some reason).

skywhopper · on Oct 11, 2022

You aren’t missing anything. How expensive the copy is depends on the type of item. This is why you should probably use pointers in slices you’re going to loop over instead of the object itself.

hardware2win · on Oct 11, 2022

How did they even end up with such a crazy behaviour?

masklinn · on Oct 11, 2022

Technically there is no crazy behaviour, it’s a natural consequence of scoping loop variables outside the loop, which historically was common.

It became an issue as lambdas and other lambda-type constructs (which implicitly keep a reference on the loop variable) became more common, and a bunch of languages got caught in it. Later languages switched to the “inner scoping” mechanism to avoid it.

Go did not follow the switch because Go.

nemetroid · on Oct 11, 2022

This is not really what's happening. The variable is scoped to the loop. The issue is that even if you take the address of the variable, each iteration of the loop will have the same variable with the same address.

    nums := []int{1, 2, 3}
    for _, num := range nums {
        fmt.Printf("%p\n", &num)
    }

This will print the same address three times. If you add "num := num" as the first line in the loop, it will print three different addresses. The proposal is to make this the default behaviour.

masklinn · on Oct 11, 2022

> This is not really what's happening.

Yes it is.

> The variable is scoped to the loop.

Maybe “loop body” would make the comment clearer?

> The issue is that even if you take the address of the variable, each iteration of the loop will have the same variable with the same address.

Hence the variable not being scoped to the loop (body), one variable is shared between all iterations of the loop.

nemetroid · on Oct 11, 2022

> Maybe “loop body” would make the comment clearer?

I think "loop iteration" might.

foldr · on Oct 11, 2022

Arguably not, because an 'iteration' is a unit of execution, not a lexical unit. If the variables were really scoped to loop iterations, that would be a form of dynamic scope, which would have a different semantics. So for example, say the loop calls a function foo. This function executes inside every iteration of the loop, but within foo, one cannot access the loop iteration variable (as it is in a different lexical context).

nemetroid · on Oct 11, 2022

My point is precisely that lexical terms (like loop body) are insufficient to explain the Go behaviour. The loop variable is clearly lexically scoped to the loop body. "Scoping loop variables outside the loop" is commonly understood as the Python gotcha:

    for x in range(3):
        foo(x)
    print(x)  # prints 2

or the pre-C99 style:

    int i;
    for (i = 0; i < 3; ++i)
        foo(i)
    printf("%d", i);  // prints 3

This is not the case in Go. I don't think talking about variable scopes accurately describes the issue (because there's nothing special about loop scopes here: the same "escape" can happen from any scope), and changing "loop" to "loop body" doesn't improve this. The term "loop iteration" at least identifies the dependency between different iterations of the loop as the issue.

I don't understand what the thought experiment about non-lexical scopes has to do with this.

foldr · on Oct 11, 2022

It's purely a question of scopes, as indicated by the equivalent code in the article:

    for _, elem := range elems {
      elem := elem
      ... &elem ...
    }

Nothing beyond regular lexical scoping and Go's ordinary assignment semantics are necessary to see how this works. The second 'elem' has a narrower scope than the first (it is limited to the loop body). Abusing Go syntax, you can think of the current semantics as follows:

    {
      var elem Elem
      for _, elem = range elems {
        ... &elem ...
      }
    }

Here 'elem' scopes outside the loop body, and so is reassigned on every iteration of the loop (and &elem evaluates to the same address on every iteration).

>the thought experiment about non-lexical scopes

It's not just a thought experiment. There are languages with dynamically scoped variables (e.g. global vars in Common Lisp).

lmz · on Oct 11, 2022

I understood the JS version of this problem, but to see it happen with an address-of operator is just weird. In C if I took the address of a function scoped / temporary variable and kept it around it would be very bad.

I guess taking the address "promoted" the shared variable to enable it to survive past the function?

masklinn · on Oct 11, 2022

Yes, if you take a reference and it “escapes” the variable gets boxed (promoted to a pointer on the heap).

yencabulator · on Oct 11, 2022

Said differently: Go puts variables on the stack only when the compiler can prove no references escape the stack frame. Stack is purely an optimization.

masklinn · on Oct 11, 2022

Except

1. Java is also supposed to do that, and clearly way less successful

2. Escape log is part of the standard baseline tools (it’s just a flag), so it’s not considered anything hidden or arcane

3. Go tends to log escapes, meaning it’s baseline is to stack allocate

So it’s a lot more than, say, common subexpression optimisation.

yencabulator · on Oct 11, 2022

The subtle difference between the two ways of communicating the design: There are cases where the reference does not escape, but the compiler doesn't know how to prove that. So saying "if it escapes it's boxed" is subtly wrong.

The default is heap, and only when the compiler can be sure it's safe, things go on the stack.

foldr · on Oct 11, 2022

It’s just what happens if you interpret the for loop as modifying a single variable binding on every loop rather than rebinding on every loop. That’s how loops have normally been conceived in imperative languages, but it doesn’t make much of a difference until you add closures (which can capture the binding) or garbage collection (allowing you to capture the address of the ‘local’ variable). The same issue existed with ‘var’ for loops in JS and bit me several times: https://stackoverflow.com/questions/750486/javascript-closur...

beeforpork · on Oct 11, 2022

If you hide pointers and destructively update variables, this happens easily. I'd say the Go problem is a bit worse because there are also exposed pointers via that '&' operator.

But the problem should have been known when the language was designed. In CommonLisp, i.e., quite a mighty but old language, it is the same: the capture is on a loop variable that is destructively updated:

    (setf refs (loop for i in '(1 2 3) collect #'(lambda () i)))
    -> (#<function) #<function> #<function>)

    (mapcar #'(lambda (f) (funcall f)) refs)
    -> (3 3 3)

The same done with a fresh function parameter works. And since this is the usual style in Lisp, I suppose the problem will not be that obvious like in Go:

    (setf refs (mapcar #'(lambda (i) #'(lambda () i)) '(1 2 3)))
    -> (#<function> #<function> #<function>)

    (mapcar #'(lambda (f) (funcall f)) refs)
    -> (1 2 3)

the_mitsuhiko · on Oct 11, 2022

If only this was a go specific problem:

    >>> def foo():
    ...  return [lambda: x for x in range(10)]
    ...
    >>> [x() for x in foo()]
    [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

randfur · on Oct 11, 2022

I remember learning about this gotcha around the time that Go was coming out. I was very disappointed that they were sticking to this behaviour around the same time that JavaScript was fixing it. As a new language they get less sympathy from me.

lawn · on Oct 11, 2022

It does feel like Go is relearning lessons of other languages quite often. It's good that this and generics could be fixed later on, but it's unfortunate that they couldn't fix the null pointers (and going further always defaulting to some default value) and error handling.

masklinn · on Oct 11, 2022

> It does feel like Go is relearning lessons of other languages quite often.

A common sentiment when Go was unveiled was that its designers had ignored the 35-odd years of language research and experience since C.

int_19h · on Oct 11, 2022

Python actually does this right for sequence comprehensions:

   >>> [f() for f in (lambda: x for x in range(10))]
   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

List comprehensions have been around for longer, and so behave the way they do for backwards compatibility reasons.

mpartel · on Oct 11, 2022

Several languages have or have had this behaviour. C# made a similar change to its 'foreach' loops a long time ago (but not 'for' loops for some reason).

I suppose it's the natural way to do it when implementing a language and not thinking about it too much. It makes a for-loop equivalent to a simple while loop with the loop vvariable initialized outside of the loop.

int_19h · on Oct 11, 2022

The C# change is described in detail in the linked GitHub issue - one of C# devs left a comment explaining it.

TL;DR is that both "for" and "foreach" scoping fixes would be breaking changes, but "foreach" was easier to justify because it was already a C#-specific construct syntactically, unlike "for" which uses the same exact syntax as C, Java etc, and they were very sensitive to backwards compatibility at the time (esp. since the tooling didn't have the ability to target various language versions within the same project easily). At the same time, "foreach" represented the vast majority of breakage when they looked at existing code, perhaps because the scoping in classic "for" is more obvious due to the fact that variable mutation is explicit there.

Blikkentrekker · on Oct 11, 2022

To be fair it is no more difficult to implement it with the variable initialized inside of it. In fact, it's easier to.

hardware2win · on Oct 11, 2022

In c# you mean closure capture?

masklinn · on Oct 11, 2022

Almost certainly, that is the vehicle for the issue in most languages.

mmis1000 · on Oct 11, 2022

I think it's quite normal. Because the common three part for semantic is just a suger of

    initVariable()
    while checkVariable():
       doSomethong…
       updateVariable()

So the variable is shared by all iterations at first place. And that isn't a issue until we have lambda or something similar that can capture reference of loop variable.

Later languages found it is problematic when use with reference capturing features and changed to something else (one variable per iteration)

jlouis · on Oct 11, 2022

It is a combination of scoping, which other people already wrote about, and also because the language support mutable data by default. Had the language been immutable/persistent by default, this particular semantic problem cannot occur in the first place.

Complexity increases rapidly when you combine constructs in a programming language. You get some feature interactions which are hard to get right, and also to predict.

mftb · on Oct 11, 2022

I'm happy the Go team takes breaking changes seriously. They should go ahead and fix this. The current semantics are lame.

crnkofe · on Oct 11, 2022

I gave this post all the upvotes and hearts I got. It's one of those weird Go language quirks where developer expectations just never match what's going on behind the scene. I still often find myself always looking at the loops if they actually modify/use correct variables.

bborud · on Oct 11, 2022

That was a very well written issue.

mikepurvis · on Oct 11, 2022

Shockingly high quality discussion too— have the comment moderation tools on Github gotten better in recent years, or is this just the golang community being awesome?

hknmtt · on Oct 11, 2022

Strange, never had a problem with this. If I need to reference non-pointer I just use key(append(foo, items[k])) but i find it weird that the print example is not working since in loop the value is passed by copy, just like everywhere else in go, so it should work correctly. as for pointer, again, pass by copy. I see no problem here, it either never bit me in my ass or i figured it out soon in my Go journey.

tuukkah · on Oct 11, 2022

I find this same semantic in JavaScript weird albeit practical. Many descriptions of either for-loops or let-variables don't mention that their combination is a special case: each iteration gets its own instance of the variable, but somehow the result of i++ gets copied to the following iterations.

Here you can see that the i from the previous iteration gets copied first and i++ applies to the next iteration:

    for (let i = 0; i < 10; i++) { 
        setTimeout(() => {
            console.log(i);
        }, 1000);
    }
    // prints 0...9 (as expected?)

I always thought of the i++ happening at the end of the previous iteration, but that's wrong as it produces a different result if written explicitly:

    for (let i = 0; i < 10; ) {
        setTimeout(() => {
            console.log(i);
        }, 1000);
        i++;
    }
    // prints 1...10 (as expected?)

(Also, these don't change if I remove the curly braces, so the let is not scoped within curly braces as I thought...)

roywiggins · on Oct 11, 2022

My knowledge of javascript semantics is pretty limited, but you can see what's happening more clearly:

    for (let i = 0; i < 10; ) {
        setTimeout(() => {
            console.log(i, foo);
        }, 1000);
        let foo = "baz-"+i;
        i++;
    }
    // "1 baz-0", "2 baz-1, etc

It's behaving as each loop iteration is creating its own closure, and the inner function is referring to those variables by reference. So any changes you make to them inside the body of the loop will end up visible to the inner function.

tuukkah · on Oct 11, 2022

My point is that for the i++ to make the loop advance (whether it is written inside the block or outside it in the header), behind the scenes, the value of the old i is copied to the new i at the beginning of each iteration.

I think to then understand why my two examples produce different results, you have to know that the i++ in the header happens to be executed after that copying has been made (an arbitrary choice?), while the i++ within the body will be (naturally) executed before the copying.

I suppose the way it works can be intuitive, but it can also be confusing if you think of the i++ as the last action of each iteration in both of my example cases.

Actually, there's a third way to write the example loop, but who can guess which result it gives?

    for (let i = 0; i++ < 10; ) {
        setTimeout(() => {
            console.log(i);
        }, 1000);
    }

... it will print 1...10! So even within the loop header, one part is run before the copying and another part after the copying. How is this intuitive and where is this documented apart from the language spec?

EDIT: My bad, of course the iteration condition has to be checked at the beginning of the iteration, so in this third example i == 1 during all of the first iteration).

Here's the relevant spec section 14.7.4.3 ForBodyEvaluation: https://tc39.es/ecma262/multipage/ecmascript-language-statem...

There we can finally see that on the first iteration, an environment is created (and iteration variables copied) before the test ("step 2"). After that, a new environment is created (and iteration variables copied) towards the end of each iteration, before the increment step ("step 3.e" and "step 3.f").

tuukkah · on Oct 11, 2022

Another commenter links to a great video that explains the same thing about the spec and how confusing the environment creations can be: https://news.ycombinator.com/item?id=33160373

jstimpfle · on Oct 11, 2022

WTF. I haven't been interested in Go after looking at it briefly a decade ago. This post brings back the weird feeling about it that I had then. It looks like taking the address of a local variable (item := item; append(..., &item);) and using that outside the scope of the local variable. But apparently that is ok.

Variations of such perceived weirdness exist in many other languages with complicated "object models" as well to be fair. Delphi has some strange adressing stuff going on as well. Python has this weird "default list" thing. Most object languages don't let you take the address of something (like the Go example shows) at all, but have only object references which I find unergonomic.

benhoyt · on Oct 11, 2022

Yes, taking the address of a local variable is exactly what it's doing. But Go's compiler (and garbage collector) ensures it's safe to do that. The compiler will allocate on the stack where possible, and the heap where necessary, invisible to you. This is all very normal in Go.

I'm not sure prefixing your comment with "WTF" and your 10-years-ago dismissal helps the discussion here. Yes, as we've learned, this was probably the wrong decision, but it's not hard to see why it was done that way originally (C# made the same decision), and now they're having a reasonable technical discussion to try to solve it. And -- even though I've been bitten by this several times myself -- it's not a terribly common occurrence.

quietbritishjim · on Oct 11, 2022

I can see why it's a surprise. Most languages that I know of fall into one of two types: (1) garbage collected and assigning a variable or passing to a function actually passes a reference to the object (certainly true of Python, Java, C#); (2) memory is manually managed and you can take the address of an object (C, C++, Rust).

That history makes it feel like "taking the address" is a really trivial operation - returning a numerical value that the compiler had access to at that point anyway. Here it's adding a reference to the object in some sense, and maybe even changing how it's allocated earlier in its lifetime (on the heap rather than the stack). I don't use Go and I agree that using &x for that operation feels a bit wrong as an outsider.

masklinn · on Oct 11, 2022

Fwiw the issue occurs in langages of category (1) though only in a subset of the cases. Generally closures as they implicitly take references on their lexical context.

It also occurs in langages of category (2), specifically C++ lambdas where i think it can cause UAF/UB. I assume it also happens in C with the block extension (is that still Apple specific?) though I don’t know the details of that thing so maybe not.

quietbritishjim · on Oct 11, 2022

The discussion here is specifically about using & to extend the lifetime. In the first case you mention, you don't use the & operator. In the second case, you do (at least with C++ lambdas), but there's no lifetime extension going on.

foldr · on Oct 11, 2022

There's no special 'lifetime extension' operation involved. Semantically, everything is heap allocated† in Go. The garbage collector takes care of deallocating it when nothing references it anymore. In other words, at the level of the language semantics, all tracking of lifetimes is done dynamically at runtime by the garbage collector.

Go does in fact stack allocate variables which it can prove not to outlive their lexical scopes, but this is merely an optimization. Unless you are trying to write optimal code, there is never any reason to think about which values are stack allocated in Go.

There's not really any such thing as a 'local variable' in Go. A variable has whatever scope it has, but there's nothing special, semantically speaking, about variables defined inside functions or inside loops.

If the use of & in the example code is puzzling, it's probably because you're expecting Go to have some C-like concept of an automatic (i.e. stack allocated) variable – but it just doesn't.

>That history makes it feel like "taking the address" is a really trivial operation - returning a numerical value that the compiler had access to at that point anyway.

It is in fact a trivial operation in Go too, as I hope the above has clarified.

---

† Strictly speaking 'semantically heap allocated' is nonsense, but hopefully you know what I mean. There is no way to declare a variable in Go in such a way as to force it to be deallocated at the end of a particular lexical scope. A variable's lexical scope and its lifetime are entirely divorced (as is typical in a GCed language).

quietbritishjim · on Oct 12, 2022

> There's no special 'lifetime extension' operation involved. ... The garbage collector takes care of deallocating it when nothing references it anymore.

I never used the word "special". As you say, adding a reference will mean the garbage collector won't deallocate it (until that reference is removed). In other words... its lifetime is extended. That's exactly what I meant.

> If the use of & in the example code is puzzling, it's probably because you're expecting Go to have some C-like concept of an automatic (i.e. stack allocated) variable ...

Not at all. In C++, you can use & on a reference variable and it will return the address of the object being referred to, regardless of whether it is allocated on the stack or the heap (or even statically allocated). Even in C, you can do &*x on a pointer to any object (which is silly by itself, but useful when combined with pointer arithmetic e.g. &x[3] translates to &*(x+3)).

> It [the & operator in Go] is in fact a trivial operation in Go too, as I hope the above has clarified.

Maybe I should have avoided the word "trivial" as its meaning is subjective, but I was careful to define what I meant by it: "returning a numerical value that the compiler had access to at that point anyway". Your comment just confirms that, as I said, it does more than that – it also adds a reference to the object.

---

To be clear, I'm not saying that it's bad or wrong that Go uses the & operator to mean this. Once you're familiar with the language, you probably get used to it very quickly. My point was just that it's a surprise initially if you're not familiar with the language, that's all.

foldr · on Oct 13, 2022

>Your comment just confirms that, as I said, it does more than that – it also adds a reference to the object.

It simply evaluates to the address of the object, just as it does in C. if you think the & operator is doing something in addition to this, I think that must just be based on a misunderstanding.

I am not quite sure what you mean by 'adding a reference' to the object.

Let's take this function:

    func foo() *int {
      var x int
      return &x
    }

All that happens is the following:

- An integer is allocated (and initialized to zero).

- The address of this integer is returned.

If we dig into the implementation, we'll see that the integer is allocated on the heap. As far as Go's language semantics are concerned, everything is allocated on the heap and left to the GC to clean up.

As an implementation detail, values that provably don't outlive their containing functions are (sometimes) stack allocated. As x outlives its containing function, it won't be stack allocated. That's it. There is no special operation of 'adding a reference' or 'extending a lifetime'. Nor does the compiler even analyze lifetimes except for the purposes of applying an optional optimisation which has no effect on the semantics of the program. If you turned this optimisation off (which you totally could) then there'd be no need for the compiler to worry about x's lifetime at all.

quietbritishjim · on Oct 13, 2022

> All that happens is the following:

> - An integer is allocated (and initialized to zero).

> - The address of this integer is returned.

That is not all that happens, at least down at the C/assembler level.

Let me illustrate what I mean. Consider this function, which also does both of these things (cobbled together from Google searches so please excuse incorrect syntax):

    func foo() uintptr{
      var x int
      return uintptr(unsafe.Pointer(&x))
    }

All that function does is allocate an integer (and initialise to zero) and return the address of that integer. Exactly the same as your function, right? Except it's obviously not - it doesn't extend the lifetime of the integer variable.

So why not? The GC somehow knows to ignore the number returned from my function, even though, under the hood, it's still stored in a register or stack location or whatever in exactly the same way as the address returned from your function. So how does the GC know to ignore it? Is that number somehow marked in a way that says "GC, when you're scanning memory looking for address-like numbers, don't pay attention to this one"? No. It doesn't look at the number in the first place because it hasn't been told to look at it.

In contrast, in your example, the memory address is not just returned from the function (in the C sense that it's put in a register for the caller to receive). It, additionally, somehow registers that memory address with the GC to let it know that there's another reference to that variable location. That is the extra thing that your function does that mine doesn't. And that magic happens (or at least starts) at the moment you use the & operator.

foldr · on Oct 14, 2022

Yes, Go has a precise (i.e. non-conservative) garbage collector. It seems odd to me to think about that as some kind of special feature of the & operator. Even if one does, it's certainly not a surprising feature. Knowing that Go has a precise GC, one certainly expects the GC to know that the value of &x references x. If it didn't that would be a major bug.

The Go GC isn't a reference counting implementation. It traces the values of variables on the stack and it knows their types (because it knows which function any given stack frame corresponds to and it knows which variables that function allocates). Thus it knows that if a variable is of type *int and has a non-nil value then its value references an int. (And so on for fields of structs that are stored in stack variables, etc.) The & operator does not need to do anything special. The & operator merely takes the address of the object. When that address is stored in a pointer variable (or array member, or struct field...), that's when it becomes visible to the GC as a reference.

vips7L · on Oct 11, 2022

This cannot occur in Java. Any captured variables by lambdas must be effectively final.

jstimpfle · on Oct 11, 2022

I apologize for being snarky. "WTF" is an expression of surprise though, much more than it is criticism. My point was that these object models achieve a desired level of user-friendliness and safety at the cost of being less orthogonal and less composable (as compared to, say, C) and having weird corner cases and surprises that catch you off-guard.

jerf · on Oct 11, 2022

It can be argued, based on the number of people who at some point write code in C that takes the address of a stack variable and returns it back out of the function scope, that the "less orthogonal" corner case that catches you off guard is the way C forbids that action. Do not mistake internalized concepts from a particular language as some sort of divinely approved dictate of how programming must work.

This was basically Dijkstra's point in his BASIC considered harmful post... I think in 2022 it should be C considered harmful for the same reason. C is not the base truth of computation. It isn't even very good. A language smart enough to analyze taking pointers and notice it can't put something on a stack and simply take care of it is, in my opinion, the one that is not catching you off guard... specifically, the "guard" that one must take in C around what is stack versus heap.

int_19h · on Oct 11, 2022

C# made the same decision because it did not have closures initially, and so there was no practical way to observe the difference.

But Go did have closures initially, and worse yet, they already had C# as an example of how closures and loops interact. So they definitely had the opportunity to learn from that mistake, and I don't think it's unreasonable to ask why they did not.

origin_path · on Oct 11, 2022

I think the bar to a "WTF" reaction here is lower because this is the sort of thing we've come to expect from Go. I'm in the same boat as jstimpfle where I looked at Go some years ago, found it to be remarkably quirky for a relatively traditionalist language that isn't trying to do lots of new ideas, and haven't yet regretted staying away.

It's not just this bizarre gotcha (the fact that C# had it too doesn't make it OK). It's that Go has so many of these cases where they took very strong positions on things and then later reversed their position only after many, many years:

- Generics

- Only one gc knob

- No backwards incompatible language changes

Also, Go has been around quite a long time now and we've all read quite a few rants about its surprising cases. How comes this one never came up before? The thread provides evidence that it bites people regularly. It suggests to outsiders that you can't easily evaluate Go by reading about it because there will be sharp edges that people aren't talking about simply due to the quantity of things that are even worse.

grey-area · on Oct 11, 2022

You have remarkably strong opinions for someone who has not used the language much.

Personally after using it for 10 years I've been bitten very rarely by weird corners of the language and have enjoyed using it. My complaints are more around things I'd rather see removed (struct tags, panic, nils) and inconsistencies (built-in generics were quite limited, I quite like the design for generics they came up with though so I guess that is resolved once they update the stdlib).

Overall it's still my favourite language compared to others I'm forced to work in, I particularly like the decision to eschew inheritance.

doctor_eval · on Oct 11, 2022

I agree with this take wholeheartedly. Go is a pragmatic language. Some of the design decisions make a lot more sense when you use it, and because Go seems to have a culture of utility and self reflection, I think you see more openness and constructive criticism than in some other languages I’ve used.

Cthulhu_ · on Oct 11, 2022

I mean Javascript does / did the same thing (var loop variables get hoisted to the function scope, so they are available outside of the loop; add to that that `range` creates a pointer and you have a perfect storm of weird, confusing things.

But this is an important point: They were aware of it; it was by design; they had the chance to change it before 1.0 and didn't, and now they are showing willingness to change. So many languages are resistant to change on the one hand (I mean Go is), yet not resistant to keep adding features (e.g. Java / JS).

udp · on Oct 11, 2022

> Javascript does / did the same thing (var loop variables get hoisted to the function scope, so they are available outside of the loop; add to that that `range` creates a pointer and you have a perfect storm of weird, confusing things.

let is preferred nowadays in JS and doesn’t have the weird hoisting behaviour that var does/did. JS has neither “range” nor pointers though so I’m not sure what you mean by that.

silon42 · on Oct 11, 2022

I also feel there is a problem in that line, not necessarily just a for loop.

Maybe that's my C/C++/Rust experience.

tialaramex · on Oct 11, 2022

In Rust we know we only gave the append function an immutable reference which lives until the next iteration. If append is OK with that, we're golden. Maybe internally it clones these references -shrug.

If append() needs the thing, not just an immutable reference which expires soon, its signature would demand we move one into it, and we don't have one so we'd need to e.g. make one with Clone.

masklinn · on Oct 11, 2022

TBF Rust also intentionally used the semantics Go is migrating to from the start, because:

- experience with the issue in languages with wider scoping e.g. it’s a common issue in JS, as well as Python (though slightly less so) - Rust’s iterators were originally internal so that was pretty natural

(1) is also why for(let and for(const have different scoping than for(var in JS: `var` has function scoping, `let` and `const` were introduced with block scoping, and for loops they were specifically specced with “inner” (per-iteration) scope.

tialaramex · on Oct 11, 2022

If you take a Rust for loop, de-sugar it to produce the actual loop { } which would run, and then modify that loop to have the Go semantics - with a single long-lived variable which is re-assigned each iteration, Rust detects the faulty Go cases of course.

[Edited: I tried to explain what's going on here, but I don't think my explanation was helpful so I've just left the surface]

masklinn · on Oct 11, 2022

> Rust detects the faulty Go cases of course

Yes Rust’s ownership rules make it rather complicated to reproduce the faulty behaviour, as it’s about sharing mutable state which Rust intensely dislikes. You’d need to wilfully share (and update) internally mutable structures (cells, atomics) which is pretty noticeable and not something you do by mistake.

Blikkentrekker · on Oct 11, 2022

Aside from Python and the Shell, it never occurred to me that any language could possibly think of other semantics. It's news to me that C++ also assigns rather than initializes on each iteration.

It's simply a very bad idea that provides no use yet creates many bugs.

masklinn · on Oct 11, 2022

The thing is it only create many bugs when you throw closures into the mix. So historically languages did it because it was easy to implement (create a counter, increment the counter, run the loop body).

Before the early aughts, closures were mostly really common in functional languages which tend towards immutable bindings (and immutability in general), and very closure-focused languages closured everything so didn’t hit that issue (e.g. you wouldn’t hit it in Smalltalk because your counter would be a parameter to a block, so closing over that was no issue).

It’s really in the 00s with the explosion of callbacks-pile-javascript (and more generally the functionalisation of imperative languages) that the problem became a serious concern: you loop over a thing, you start some sort of async operation (network request for instance), and you find out that despite the request being correct the entire thing goes wonky (then again things commonly went wonky which didn’t help).

Blikkentrekker · on Oct 11, 2022

> The thing is it only create many bugs when you throw closures into the mix. So historically languages did it because it was easy to implement (create a counter, increment the counter, run the loop body).

It's a problem with references in general as this shows.

I also don't feel it's easier to implement at all.

One can either rewrite:

   for $id:var in $exp:iter { $code:body }

to:

   { let $id:var;
     while(True) {
       let result = $exp:iter.next();
       if(result.is_none()) break;
       $id:Var <- result.extract();
       $code:body
     }
   }

Or

   while(True) {
     let result = $exp:iter.next();
     if(result.is_none()) break;
     let $id:var = result.extract();
     $code:body
   }

The latter implementation is as far as I see easier, not more complex. Obviously all the code to create scoping already exists in the compiler and for-loops over an iterator work with a syntactic rewrite to an infinite loop with a break.

masklinn · on Oct 11, 2022

> It's a problem with references in general as this shows.

Most languages don't have references, and in those that do before the issue was understood, the explicitness made it a much smaller issue.

> I also don't feel it's easier to implement at all.

> One can either rewrite:

Now try lowering to bytecode or assembly instead of high-level pseudocode.

Blikkentrekker · on Oct 11, 2022

> Most languages don't have references, and in those that do before the issue was understood, the explicitness made it a much smaller issue.

But Go and C++ do, where this issue arose with or without closures.

> Now try lowering to bytecode or assembly instead of high-level pseudocode.

It doesn't matter, because as I said, all that is already in the compiler.

It would be needlessly complex and error-prone for compilers to hardcode custom code generation for such abstractions; it's transformed to something else the compiler already understands at a far higher level. I know for a fact that in Rust, for-loops already desugar to a simple infinite loop construct with a break at the H.I.R. level and all further optimizations only happen from there.

tialaramex · on Oct 12, 2022

Yup. C# likewise described the change in semantics as just a different de-sugaring of their for each loop when that happened.

Blikkentrekker · on Oct 12, 2022

Which makes it more confusing why it was originally as it was because it's really not harder to implement as any compiler implements it as desugaring before optimizations een occur and this form is simpler.

The only explanation I see is that they really gave it no thought at all whatsoever and it wasn't a tradeoff but simply not thinking clearly.

maccard · on Oct 11, 2022

> Maybe internally it clones these references -shrug.

That's what go does under the hood in 95% of cases. If I do:

    func foo(a int) *int {
        return &a
    }

    func main() {
        fmt.Println(*foo(3))
    }

It does the right thing. It's just that in this case (and in lambdas) they got it wrong.

avl999 · on Oct 11, 2022

`&varName` is not equivalent to C version of the superficially similar looking operation.

bjoli · on Oct 11, 2022

I have said this before, but a source->source compiler is a very handy thing to have. A loop construct in chez or guile scheme would probably translate into something tail recursive. You could then macro-expand and optimize the loop to the code that is then compiled into lower level.

I have been in situations where I have had to expand a macro to figure out what is going on. Not having it in a situation like this (where a for loop is obviously just a goto or a tail call) is usuallly a pain in the ass. If it was translated into the same language it would also be easier to define what it should translate to.

valenterry · on Oct 11, 2022

I'd rather use a language from the beginning that isn't made up accommodate tons of newgrads as fast as possible on a big codebase. That is what Google needs, but most people are better of with more advanced languages. But people follow Google just like they did with Angular. I predict Go will follow the same destiny, will just take longer. The tech debt is still piling up.

socialdemocrat · on Oct 11, 2022

Yeah… I seem to remember scratching my head because of a variation of this problem. It was kind of driving me nuts. Yet the number of moments like this has been low for me in Go compared to C++.

arh68 · on Oct 12, 2022

This seems reasonable. As others state, this is hardly an issue specific to Go.

In CoffeeScript (which I think solves this nicely), it looks something like

  for i in [0...3] # 3 3 3
    setTimeout (() -> console.log "#{i}"), 100

  for i in [0...3] # 0 1 2
    setTimeout (do (i) -> () -> console.log "#{i}"), 100

  for i in [0...3] # 0 1 2
    do (i) ->
      setTimeout (() -> console.log "#{i}"), 100

foldl2022 · on Oct 11, 2022

Glad that Nim had solved this problem with a macro `capture`.

It's up to the developers to capture loop variables or not.

vagab0nd · on Oct 14, 2022

I used to get compiler warnings for this particular problem (getting a warning tag in Vim, not when compiling). Sadly the warning stopped working after some update and I have been bitten by this several times since.

ok_dad · on Oct 11, 2022

I'm glad I read this today because I'm starting to do some work in Go. Anyone have any other good sources for "gotchas" in Go (Go-tchas?) that I can read to familiarize myself with how to think better in Go?

bts · on Oct 11, 2022

Data Race Patterns in Go: https://eng.uber.com/data-race-patterns-in-go/