Go internals: capturing loop variables in closures

The Go wiki has a page titled CommonMistakes. Amusingly, it only lists a
single entry at this time – using goroutines on loop iterator variables,
providing this example:

for _, val := range values {
go func() {
fmt.Println(val)
}()
}

This will print the last value in values, len(values) times. The fix
is very simple:

// assume the type of each value is string
for _, val := range values {
go func(val string) {
fmt.Println(val)
}(val)
}

Being aware of the fix is sufficient to be able to write correct Go programs.
However, if you find the details of Go’s implementation fascinating, this post
should provide a deeper understanding of the problem and its solution.

Basics – passing by value and by reference

Go makes the distinction between passing objects by value and by reference
[1] explicit. Let’s start with example 1 [2]:

func foobyval(n int) {
fmt.Println(n)
}

func main() {
for i := 0; i < 5; i++ {
go foobyval(i)
}

time.Sleep(100 * time.Millisecond)
}

It should come as no surprise that this will print out the values 0 to 4, likely
in some scrambled order. So we’ll move on to example 2:

func foobyref(n *int) {
fmt.Println(*n)
}

func main() {
for i := 0; i < 5; i++ {
go foobyref(&i)
}

time.Sleep(100 * time.Millisecond)
}

This code has a data race and will fail the Go race detector (when executed with
-race). On my machine it prints:

5
5
5
5
5

But it could also print other sequences. Understanding why will take us 80% of
the way to grokking the main topic of this post, so let’s spend some time
exploring this.

It turns out that the answer is right there in the Go spec, which states:

Variables declared by the init statement are re-used in each iteration.

This means that when the program is running, there’s just a single object
representing i, not a new one for each iteration. This object gets assigned
a new value on each iteration.

Let’s study the difference in the generated machine code [3] for the for
loop between examples 1 and 2. Starting with example 1:

0x0021 00033 (go-func-byval.go:13) MOVQ AX, “”.i+24(SP)
0x0026 00038 (go-func-byval.go:14) MOVL $8, (SP)
0x002d 00045 (go-func-byval.go:14) LEAQ “”.foobyval·f(SB), CX
0x0034 00052 (go-func-byval.go:14) MOVQ CX, 8(SP)
0x0039 00057 (go-func-byval.go:14) MOVQ AX, 16(SP)
0x003e 00062 (go-func-byval.go:14) CALL runtime.newproc(SB)
0x0043 00067 (go-func-byval.go:13) MOVQ “”.i+24(SP), AX
0x0048 00072 (go-func-byval.go:13) INCQ AX
0x004b 00075 (go-func-byval.go:13) CMPQ AX, $5
0x004f 00079 (go-func-byval.go:13) JLT 33

The go statement gets translated to a call to runtime.newproc. While
this machinery is interesting, we’ll leave it for a separate article. What’s
interesting for our needs here is how the loop variable i is lowered. It’s
stored in the AX register, which is then passed by value onto the stack
as an argument for the call too foobyval [4]. The by value here manifests
by simply copying the value of AX onto the stack. Changing AX will
henceforth not affect the value passed into foobyval.

On the other hand, the same loop in example 2 looks as follows:

0x0039 00057 (go-func-byref.go:14) MOVL $8, (SP)
0x0040 00064 (go-func-byref.go:14) LEAQ “”.foobyref·f(SB), CX
0x0047 00071 (go-func-byref.go:14) MOVQ CX, 8(SP)
0x004c 00076 (go-func-byref.go:14) MOVQ AX, 16(SP)
0x0051 00081 (go-func-byref.go:14) CALL runtime.newproc(SB)
0x0056 00086 (go-func-byref.go:13) MOVQ “”.&i+24(SP), AX
0x005b 00091 (go-func-byref.go:13) INCQ (AX)
0x005e 00094 (go-func-byref.go:13) CMPQ (AX), $5
0x0062 00098 (go-func-byref.go:13) JLT 57

The code is very similar, with just one critical difference. Now AX holds
the address of i rather than its value. Note how the increment and
comparison for the loop condition are made on (AX) instead of AX.
When we pass AX onto the stack, we’re passing the address of i.
Modifying (AX) will actually affect the value of i observed inside the
goroutine.

None of this should be surprising – after all, we’re passing a pointer to
an integer into foobyref.

What happens at run-time is that, on my machine, the loop finishes running
before any of the goroutines it launches actually start. Once they do, they
still hold a reference to the same i – not a copy of it. What’s the value of
i at this point? It’s 5, exactly where the loop stops iterating. This is why
all the goroutines print out 5. But the goroutines could also launch in the
middle of the loop and a difference sequence would occur – a classical data
race.

Methods with value vs. pointer receivers

A similar artifact can be observed when creating goroutines that invoke methods.
This is even pointed out explicitly on the CommonMistakes page. Consider
example 3:

type MyInt int

func (mi MyInt) Show() {
fmt.Println(mi)
}

func main() {
ms := []MyInt{50, 60, 70, 80, 90}
for _, m := range ms {
go m.Show()
}

time.Sleep(100 * time.Millisecond)
}

This prints the elements of ms, possibly in a scrambled order, as you’d
expect. A very similar example 4 uses a pointer receiver for the Show
method:

type MyInt int

func (mi *MyInt) Show() {
fmt.Println(*mi)
}

func main() {
ms := []MyInt{50, 60, 70, 80, 90}
for _, m := range ms {
go m.Show()
}

time.Sleep(100 * time.Millisecond)
}

Can you guess what the output is going to be? It’s 90 repeated five times. The
reason is exactly the same as in the simpler example 2. Here it’s a bit more
insidious because of the syntactic sugar Go applies to calling methods with
pointer receivers. Whereas between examples 1 and 2 we changed the argument
passed into the function from i to &i, here the method invocation is
exactly the same! It’s go m.Show() in both cases, yet the behavior is very
different.

This is a rather unfortunate confluence of Go features, IMHO. Nothing in the
call site hints that m is passed by reference, and one has to go look at
the documentation of the Show method to see that (the method can be in
a completely different package from the calling code, of course). In most cases
this feature is useful, enabling us to write cleaner code. But here the
passing by reference has unexpected effects.

Closures

Finally we come back to closures, with example 5:

func foobyval(n int) {
fmt.Println(n)
}

func main() {
for i := 0; i < 5; i++ {
go func() {
foobyval(i)
}()
}

time.Sleep(100 * time.Millisecond)
}

This is likely to print not what you’d expect. On my machine it prints:

5
5
5
5
5

Even though i is passed by value to foobyval in the the closure, which
seems like it should be fine based on example 1. Let’s find out why. We’ll start
with the disassembly of the for loop:

0x0039 00057 (go-closure.go:14) MOVL $8, (SP)
0x0040 00064 (go-closure.go:14) LEAQ “”.main.func1·f(SB), CX
0x0047 00071 (go-closure.go:14) MOVQ CX, 8(SP)
0x004c 00076 (go-closure.go:14) MOVQ AX, 16(SP)
0x0051 00081 (go-closure.go:14) CALL runtime.newproc(SB)
0x0056 00086 (go-closure.go:13) MOVQ “”.&i+24(SP), AX
0x005b 00091 (go-closure.go:13) INCQ (AX)
0x005e 00094 (go-closure.go:13) CMPQ (AX), $5
0x0062 00098 (go-closure.go:13) JLT 57

It’s fairly similar to example 2: note that i is represented as an address
in the AX register, meaning that we pass it around by reference, even though
the closure calls foobyval. This loop body invokes a function using
runtime.newproc, but where does this function come from?

func1 is created by the compiler to represent the closure. The compiler
outlines the closure’s code into a standalone function and inserts a call to
it in main. The main challenge in outlining closures like this is how to
treat the variables the closure implicitly uses but weren’t declared in its
argument list.

Here’s the body of func1:

0x0000 00000 (go-closure.go:14) TEXT “”.main.func1(SB), ABIInternal, $16-8
0x0000 00000 (go-closure.go:14) MOVQ (TLS), CX
0x0009 00009 (go-closure.go:14) CMPQ SP, 16(CX)
0x000d 00013 (go-closure.go:14) JLS 56
0x000f 00015 (go-closure.go:14) SUBQ $16, SP
0x0013 00019 (go-closure.go:14) MOVQ BP, 8(SP)
0x0018 00024 (go-closure.go:14) LEAQ 8(SP), BP
0x001d 00029 (go-closure.go:15) MOVQ “”.&i+24(SP), AX
0x0022 00034 (go-closure.go:15) MOVQ (AX), AX
0x0025 00037 (go-closure.go:15) MOVQ AX, (SP)
0x0029 00041 (go-closure.go:15) CALL “”.foobyval(SB)
0x002e 00046 (go-closure.go:16) MOVQ 8(SP), BP
0x0033 00051 (go-closure.go:16) ADDQ $16, SP
0x0037 00055 (go-closure.go:16) RET

The interesting thing to note is that the function has an argument in
24(SP), which is an int pointer – note the MOVQ (AX), AX which
extracts its value before passing it to foobyval. In essence, func1
looks something like this:

func func1(i *int) {
foobyval(*i)
}

And the loop in main is transformed into:

for i := 0; i < 5; i++ {
go func1(&i)
}

This is equivalent to example 2, which explains the output of the program. The
technical jargon for what’s happening here is that i is a free variable
inside the closure, and such variables are captured by reference in Go.

Are they always, though? Surprisingly, the answer is no. In some cases, free
variables are captured by value. Here’s a variation of our example:

for i := 0; i < 5; i++ {
ii := i
go func() {
foobyval(ii)
}()
}

This will print 0, 1, 2, 3, 4 in a scrambled order. Why is the behavior here
different from example 5? After all, ii is a free variable in the closure,
same as i in example 5.

It turns out this behavior is an artifact of the heuristics used by the Go
compiler to handle closures.

Peeking under the hood

If you’re not familiar with the architecture of the Go compiler, I suggest
checking out my earlier article on Go compiler internals –
part 1,
part 2.

The concrete syntax tree created by the parser creates the following note for
the go statement calling the closure:

0: *syntax.CallStmt {
. Tok: go
. Call: *syntax.CallExpr {
. . Fun: *syntax.FuncLit {
. . . Type: *syntax.FuncType {
. . . . ParamList: nil
. . . . ResultList: nil
. . . }
. . . Body: *syntax.BlockStmt {
. . . . List: []syntax.Stmt (1 entries) {
. . . . . 0: *syntax.ExprStmt {
. . . . . . X: *syntax.CallExpr {
. . . . . . . Fun: foobyval @ go-closure.go:15:4
. . . . . . . ArgList: []syntax.Expr (1 entries) {
. . . . . . . . 0: i @ go-closure.go:15:13
. . . . . . . }
. . . . . . . HasDots: false
. . . . . . }
. . . . . }
. . . . }
. . . . Rbrace: syntax.Pos {}
. . . }
. . }
. . ArgList: nil
. . HasDots: false
. }
}

The function called is represented by a FuncLit node – a function literal.
When this tree is converted into the AST, outlining the function literal into
a standalone function is one of the outcomes. This is done in the
noder.funcLit method that lives in gc/closure.go.

The type checker completes the transformation and we get this AST node for the
outlined function representing the closure:

main.func1:
. DCLFUNC l(14) tc(1) FUNC-func()
. DCLFUNC-body
. . CALLFUNC l(15) tc(1)
. . . NAME-main.foobyval a(true) l(8) x(0) class(PFUNC) tc(1) used FUNC-func(int)
. . CALLFUNC-list
. . . NAME-main.i l(15) x(0) class(PAUTOHEAP) tc(1) used int

Note that the value passed into foobyval is NAME-main.i, directly
referencing the variable from the function enclosing the closure.

At this point comes the stage of the compiler most relevant to our exploration –
capturevars. Its goal is to decide how to capture closed variables (i.e.
free variables used in closures). Here’s a comment for the relevant function
in the compiler, which also describes the heuristic:

// capturevars is called in a separate phase after all typechecking is done.
// It decides whether each variable captured by a closure should be captured
// by value or by reference.
// We use value capturing for values <= 128 bytes that are never reassigned
// after capturing (effectively constant).

When it runs on example 5, capturevars marks the loop variable i as
captured by reference, and adds a addrtaken flag to it. This is visible
in the AST:

FOR l(13) tc(1)
. LT l(13) tc(1) bool
. . NAME-main.i a(true) g(1) l(13) x(0) class(PAUTOHEAP) esc(h) tc(1) addrtaken assigned used int

For the loop variable, the heuristic for value capturing doesn’t apply because
the value is reassigned after the call (recall the quote from the spec which
says that loop vars are re-used between iterations). Therefore, i is
captured by reference.

In the variation where we have ii := i before the go statement, ii
is not reassigned after the goroutine is launched, so it’s captured by value
[5].

Therefore, we see a fascinating example of two language features interacting in
unexpected ways. Instead of defining a new variable for each iteration, Go
reuses the same one. This, in turn, leads to the capturevars heuristic to
tag it for reference capturing, which leads to unexpected output. The Go FAQ admits this behavior may
have been a design mistake:

This behavior of the language, not defining a new variable for each iteration,
may have been a mistake in retrospect. It may be addressed in a later version
but, for compatibility, cannot change in Go version 1.

Once you’re aware of the problem, it shouldn’t cause you much problems in
realistic scenarios – just be wary of free variables captured by closures.
Always assume they can be captured by reference, unless you explicitly checked
that the capture is by value. To avoid mistakes, only read-only values should be
left to be implicitly captured by closures invoked in goroutines; this makes
sense from a concurrency standpoint as well.

[1]
Some readers noted that – strictly speaking – there is no concept of
pass by reference in Go, as everything is passed by value, including
possibly pointers. In this article when I say “pass by reference”, I mean
“pass by address”, which in some cases is explicit (as in passing &n
into a function expecting a *int), and in other cases implicit – as
the latter parts of the article demonstrate.

[2]
Here and elsewhere in this post, I’m using time.Sleep as a quick and
hacky way to wait for all the spawned goroutines to finish. Without this,
main will return before the other goroutine start running. The
correct way to wait for goroutines in real code would be something like a
WaitGroup or a done channel.

[3]
The disassembly for all the samples in this post is obtained by calling
the Go compiler directly with go tool compile -l -S. The -l flag
disables function inlining, which makes the resulting assembly more
readable.

[4]

foobyval is not called directly because it’s invoked in a go
statement. Rather, we pass its address as the second argument (at
16(SP)) to runtime.newproc, and the arguments to foobyval
(i in this case) follow higher on the stack.

[5]
As an exercise, add a dummy ii = 10 assignment as the very last
statement in the for body (after the go closure call). What is
the output? Why?

Flatlogic Admin Templates banner

Leave a Reply

Your email address will not be published. Required fields are marked *