Jalex Chang

2020.11.14

The Escape Analysis in Go -

We can use memory much more efficiently than we thought

Jalex Chang

- Gopher.

- Love software engineering, database systems, and distributed systems.

- Backend Engineer @ Umbo Computer Vision

Contact:

- jalex.cpc @ gmail.com

- jalex.chang @ Facebook

- JalexChang @ GitHub

Agenda

  • Introduction

  • The Escape Analysis

  • Programming Tips

  • Discussions

  • Summary

Introduction

  • In this tech talk, we are going to introduce Go's escape analysis (ESC) and its underlying working process.

  • The topics we will cover in the talk:

    • What is ESC?

    • Why does Go need ESC?

    • When does ESC get to work?

    • How does ESC really work? Are there any exceptions?

    • How to utilize ESC to benefit our programs?

Go's memory allocation & management

  • Go's memory allocation and management mechanisms are complicated.

    • Examples include garbage collection (GC), TCMalloc, a multi-layered memory allocator, and so on.

  • Let's abstract it and focus on the variable (object) allocation.

Go's variable declaration in concept

  • Variables declared in Go need to be allocated as objects in memory, either in the heap or on the stack.

  • Heap

    • A global storage space

    • Where stored objects can be shared

    • Where stored objects are managed by the GC

  • Stack frames

    • A local storage space belonging to a function

    • Each stack frame is bound to a goroutine.

    • Where stored objects are used privately

    • Where stored objects are managed by the belonging frame's lifecycle

Heap vs Stack 

  • From a variable declaration perspective

    • Allocating objects on the stack is faster than in the heap.

    • Because goroutines can fully control their stack frames.

    • No locking, no GC, and less overhead.

Let's run some experiments to verify it.

Experiment1 - small objects

package main

import "testing"

type T struct {
    X int32 // 4B
}

// global is package-level, so anything stored in it must live in the heap.
var global interface{}

func BenchmarkAllocOnHeap(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        global = &T{}
    }
}

func BenchmarkAllocOnStack(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        local := T{}
        _ = local
    }
}
$ go test -cpu 1,2,4,8,16 -bench=. malloc_small_object_test.go 
goos: darwin
goarch: amd64
BenchmarkAllocOnHeap       66171081     18.2 ns/op    4 B/op   1 allocs/op
BenchmarkAllocOnHeap-2     93559117     12.4 ns/op    4 B/op   1 allocs/op
BenchmarkAllocOnHeap-4     92098896     13.0 ns/op    4 B/op   1 allocs/op
BenchmarkAllocOnHeap-8     85893501     12.2 ns/op    4 B/op   1 allocs/op
BenchmarkAllocOnHeap-16    86982369     11.9 ns/op    4 B/op   1 allocs/op
BenchmarkAllocOnStack      1000000000   0.294 ns/op   0 B/op   0 allocs/op
BenchmarkAllocOnStack-2    1000000000   0.292 ns/op   0 B/op   0 allocs/op
BenchmarkAllocOnStack-4    1000000000   0.296 ns/op   0 B/op   0 allocs/op
BenchmarkAllocOnStack-8    1000000000   0.299 ns/op   0 B/op   0 allocs/op
BenchmarkAllocOnStack-16   1000000000   0.294 ns/op   0 B/op   0 allocs/op
PASS
ok      command-line-arguments  7.451s
About 40 times faster.

Experiment2 - huge objects

package main

import "testing"

type T struct {
    X [1000]int32 // 4KB
}

// global is package-level, so anything stored in it must live in the heap.
var global interface{}

func BenchmarkAllocOnHeap(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        global = &T{}
    }
}

func BenchmarkAllocOnStack(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        local := T{}
        _ = local
    }
}
$ go test -cpu 1,2,4,8,16 -bench=. malloc_huge_object_test.go 
goos: darwin
goarch: amd64
BenchmarkAllocOnHeap       1626262     784 ns/op    4096 B/op  1 allocs/op
BenchmarkAllocOnHeap-2     1852974     613 ns/op    4096 B/op  1 allocs/op
BenchmarkAllocOnHeap-4     1949342     613 ns/op    4096 B/op  1 allocs/op
BenchmarkAllocOnHeap-8     1902932     629 ns/op    4096 B/op  1 allocs/op
BenchmarkAllocOnHeap-16    1765797     689 ns/op    4096 B/op  1 allocs/op
BenchmarkAllocOnStack      1000000000  0.398 ns/op  0 B/op     0 allocs/op
BenchmarkAllocOnStack-2    1000000000  0.297 ns/op  0 B/op     0 allocs/op
BenchmarkAllocOnStack-4    1000000000  0.301 ns/op  0 B/op     0 allocs/op
BenchmarkAllocOnStack-8    1000000000  0.293 ns/op  0 B/op     0 allocs/op
BenchmarkAllocOnStack-16   1000000000  0.295 ns/op  0 B/op     0 allocs/op
PASS
ok      command-line-arguments  11.450s
About 2000 times faster.

Experiment3 - super large objects

package main

import "testing"

// The maximum size of explicitly
// declared variables on stacks is 10MB
type T struct {
    X [10 * 1000 * 1000]byte // 10MB
}

// global is package-level, so anything stored in it must live in the heap.
var global interface{}

func BenchmarkAllocOnHeap(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        global = &T{}
    }
}

func BenchmarkAllocOnStack(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        local := T{}
        _ = local
    }
}
$ go test -cpu 1,2,4,8,16 -bench=. malloc_max_object_test.go 
goos: darwin
goarch: amd64
BenchmarkAllocOnHeap      1659        687219 ns/op  10008461 B/op  1 allocs/op
BenchmarkAllocOnHeap-2    1568        797501 ns/op  10008811 B/op  1 allocs/op
BenchmarkAllocOnHeap-4    1593        816421 ns/op  10008715 B/op  1 allocs/op
BenchmarkAllocOnHeap-8    1360        782797 ns/op  10009793 B/op  1 allocs/op
BenchmarkAllocOnHeap-16   1424        817991 ns/op  10009466 B/op  1 allocs/op
BenchmarkAllocOnStack     1000000000  0.313 ns/op   0 B/op         0 allocs/op
BenchmarkAllocOnStack-2   1000000000  0.327 ns/op   0 B/op         0 allocs/op
BenchmarkAllocOnStack-4   1000000000  0.302 ns/op   0 B/op         0 allocs/op
BenchmarkAllocOnStack-8   1000000000  0.303 ns/op   0 B/op         0 allocs/op
BenchmarkAllocOnStack-16  1000000000  0.347 ns/op   0 B/op         0 allocs/op
PASS
ok      command-line-arguments  8.197s
About 2.4M times faster.

Now we know that allocating objects on the stack really matters...

But how does Go know where a variable should be allocated?

The Escape Analysis

What is the escape analysis (ESC)?

  • Escape analysis is a mechanism that automatically decides, at compile time, whether a variable should be allocated in the heap or not.

    • It tries to keep variables on the stack as much as possible.

    • If a variable has to be allocated in the heap, we say the variable escapes (from the stack).

When does ESC happen?

ESC - concept

A variable's construction or type doesn’t determine where it lives. Only how the variable is shared does.
  • ESC considers the assignment relationships between declared variables.

  • Generally, a variable escapes if:

    • its address has been captured by the address-of operator (&),

    • and at least one of the related variables has already escaped.

package main

var g *int

func main() {
	// escapes to heap
	v := 0
	g = &v
}
$ go run -gcflags "-m=2 -l" basic_concept.go 
# command-line-arguments
./basic_concept.go:8:2: v escapes to heap:
./basic_concept.go:8:2:   flow: {heap} = &v:
./basic_concept.go:8:2:     from &v (address-of) at ./basic_concept.go:9:6
./basic_concept.go:8:2:     from g = &v (assign) at ./basic_concept.go:9:4
./basic_concept.go:8:2: moved to heap: v

How does ESC work?

  • Basically, ESC determines whether variables escape or not by

    • the data-flow analysis (shortest path analysis)

    • and other additional rules

ESC - data-flow analysis

  • Data-flow is a directed weighted graph

    • Constructed from the abstract syntax tree (AST).
    • It is used to represent relationships between variables.

  • Vertices (locations)

    • Represent all declared variables.

    • Compound types (struct, slice, map, ...) are lowered to the simplest representation.

  • Edges

    • Represent assignments between variables.

    • Each edge has a weight representing the addressing/dereference count (derefs).

Examples of data-flow representation
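
As a rough illustration of how assignments map to edge weights: an address-of contributes -1, a plain assignment contributes 0, and each dereference adds 1. The helper name derefDemo and its variables are made up for this sketch; the comments describe the edge (written as dst <- src) that each assignment would add.

```go
package main

import "fmt"

// derefDemo is a hypothetical helper; each comment shows the edge the
// assignment would add to the data-flow graph and its derefs weight.
func derefDemo() (int, int, int) {
	x := 42
	q := &x   // q <- x,  derefs = -1 (address-of)
	p := q    // p <- q,  derefs =  0 (plain assignment)
	v := *q   // v <- q,  derefs =  1 (one dereference)
	pp := &q  // pp <- q, derefs = -1 (address-of)
	w := **pp // w <- pp, derefs =  2 (two dereferences)
	return *p, v, w
}

func main() {
	fmt.Println(derefDemo())
}
```

Only the -1 edges can make a variable escape, because they are the ones that hand out an address.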

Data-flow analysis - process flow 

Step1. Construct locations

  • Walk through all functions to collect declared variables.

Step2. Construct edges

  • Walk through all functions again to collect assignments.

Step3. Analyze the built graph

  • Iteratively walk through the built graph (based on the Bellman-Ford algorithm).

    • Start from every location.

    • Mark a variable as escaped if the source location has escaped and the relative derefs (shortest path) is -1.

    • Stop expanding a variable's incoming edges if the variable escapes.

Step4. Collect escape notes

  • Walk through locations to collect the escape reasons of marked variables.

Feel dizzy? Let me show you an example.

Construct locations

var p **int

func f1() {
    var x1 *int
    p = &x1

    x2 := x1
    x3 := *p
    x4 := &x3
    _ = x2
    _ = x4
}

func f2() {
    var t **int
    y1 := 1
    y2 := &y1
    y3 := y2

    t = &y2
    p = t
    t = &y3
}

func main() {
    f1()
    f2()
}

Construct edges


Analyze the built graph (1)


Analyze the built graph (2)


Analyze the built graph (3)


Analyze the built graph (4)

x1, y2, and y3 have already been checked, so let's skip them.


Analyze the built graph (5)

The analysis is finished!


Collect escape notes

$ go run -gcflags "-m=2 -l" indirect_primitives.go
command-line-arguments
./indirect_primitives.go:6:6: x1 escapes to heap:
./indirect_primitives.go:6:6:   flow: {heap} = &x1:
./indirect_primitives.go:6:6:     from &x1 (address-of) at ./indirect_primitives.go:7:6
./indirect_primitives.go:6:6:     from p = &x1 (assign) at ./indirect_primitives.go:7:4
./indirect_primitives.go:6:6: moved to heap: x1
./indirect_primitives.go:20:2: y3 escapes to heap:
./indirect_primitives.go:20:2:   flow: t = &y3:
./indirect_primitives.go:20:2:     from &y3 (address-of) at ./indirect_primitives.go:24:6
./indirect_primitives.go:20:2:     from t = &y3 (assign) at ./indirect_primitives.go:24:4
./indirect_primitives.go:20:2:   flow: {heap} = t:
./indirect_primitives.go:20:2:     from p = t (assign) at ./indirect_primitives.go:23:4
./indirect_primitives.go:19:2: y2 escapes to heap:
./indirect_primitives.go:19:2:   flow: t = &y2:
./indirect_primitives.go:19:2:     from &y2 (address-of) at ./indirect_primitives.go:22:6
./indirect_primitives.go:19:2:     from t = &y2 (assign) at ./indirect_primitives.go:22:4
./indirect_primitives.go:19:2:   flow: {heap} = t:
./indirect_primitives.go:19:2:     from p = t (assign) at ./indirect_primitives.go:23:4
./indirect_primitives.go:18:2: y1 escapes to heap:
./indirect_primitives.go:18:2:   flow: y2 = &y1:
./indirect_primitives.go:18:2:     from &y1 (address-of) at ./indirect_primitives.go:19:8
./indirect_primitives.go:18:2:     from y2 := &y1 (assign) at ./indirect_primitives.go:19:5
./indirect_primitives.go:18:2: moved to heap: y1
./indirect_primitives.go:19:2: moved to heap: y2
./indirect_primitives.go:20:2: moved to heap: y3
var p **int

func f1() {
    var x1 *int
    p = &x1

    x2 := x1
    x3 := *p
    x4 := &x3
    _ = x2
    _ = x4
}

func f2() {
    var t **int
    y1 := 1
    y2 := &y1
    y3 := y2

    t = &y2
    p = t
    t = &y3
}

func main() {
    f1()
    f2()
}
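
The walk-through above can be condensed into a toy model. The sketch below is not the compiler's code: the types location and edge and the functions walkOne and analyze are simplified stand-ins written for this illustration, the global p is treated as the heap, and only the assignments from f1 and f2 are modelled.

```go
package main

import "fmt"

// location is a simplified vertex: one per declared variable.
type location struct {
	name    string
	escapes bool
	in      []edge // incoming edges (values flowing into this location)
}

// edge says that src flows into the owning location.
type edge struct {
	src    *location
	derefs int // -1 address-of, 0 assignment, 1 dereference
}

// walkOne follows incoming edges from root, keeps the smallest derefs
// seen per location, and marks locations whose address reaches an
// escaped root.
func walkOne(root *location, enqueue func(*location)) {
	dist := map[*location]int{root: 0}
	todo := []*location{root}
	for len(todo) > 0 {
		l := todo[len(todo)-1]
		todo = todo[:len(todo)-1]
		derefs := dist[l]
		if derefs < 0 { // l's address flows to root
			if root.escapes && !l.escapes {
				l.escapes = true
				enqueue(l) // walk again later with l as an escaped root
				continue
			}
			derefs = 0 // values assigned to l do not expose their own address
		}
		for _, e := range l.in {
			if e.src.escapes {
				continue // already decided, stop expanding
			}
			d := derefs + e.derefs
			if old, seen := dist[e.src]; !seen || d < old {
				dist[e.src] = d
				todo = append(todo, e.src)
			}
		}
	}
}

// analyze models f1 and f2 from the example and returns the escaped names.
func analyze() []string {
	names := []string{"p", "x1", "x2", "x3", "x4", "t", "y1", "y2", "y3"}
	locs := map[string]*location{}
	for _, n := range names {
		locs[n] = &location{name: n}
	}
	locs["p"].escapes = true // assumption: the global p stands in for the heap
	flow := func(dst, src string, derefs int) {
		locs[dst].in = append(locs[dst].in, edge{locs[src], derefs})
	}
	flow("p", "x1", -1)  // p = &x1
	flow("x2", "x1", 0)  // x2 := x1
	flow("x3", "p", 1)   // x3 := *p
	flow("x4", "x3", -1) // x4 := &x3
	flow("y2", "y1", -1) // y2 := &y1
	flow("y3", "y2", 0)  // y3 := y2
	flow("t", "y2", -1)  // t = &y2
	flow("p", "t", 0)    // p = t
	flow("t", "y3", -1)  // t = &y3

	var queue []*location
	for _, n := range names {
		queue = append(queue, locs[n])
	}
	for len(queue) > 0 {
		root := queue[0]
		queue = queue[1:]
		walkOne(root, func(l *location) { queue = append(queue, l) })
	}
	var escaped []string
	for _, n := range names[1:] { // skip p itself
		if locs[n].escapes {
			escaped = append(escaped, n)
		}
	}
	return escaped
}

func main() {
	fmt.Println(analyze())
}
```

Running it prints [x1 y1 y2 y3], which matches the "moved to heap" lines in the compiler output. x3 stays on the stack because its address only flows to x4, which never escapes.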

In addition to the data-flow analysis, ESC applies additional rules. The following slides cover some (but not all) of them.

Huge objects

package main

type smallExplicitT struct {
    a [1000 * 1000]int32 // 4MB
}

func main() {
    dcl3 := smallExplicitT{}
    dcl4 := make([]int32, 0, 15*1000) // 60KB
    _ = dcl3
    _ = dcl4
}
  • For explicit declarations (var or :=)

    • The variables escape if their sizes are over 10MB.

  • For implicit declarations (new or make)

    • The variables escape if their sizes are over 64KB.
package main

type hugeExplicitT struct {
	a [3 * 1000 * 1000]int32 // 12MB
}

func main() {
    // dcl1 escapes to heap: too large for stack
    dcl1 := hugeExplicitT{}
    // dcl2 escapes to heap: too large for stack
    dcl2 := make([]int32, 0, 17*1000) // 68KB
    _ = dcl1
    _ = dcl2
}

Slice

  • A slice escapes if its capacity (or its length, when no capacity is given) is not a constant.

package main

func main() {
	const constSize = 10
	var varSize = 10

	s1 := []int32{}
	// s2 escapes to heap: non-constant size
	s2 := make([]int32, varSize)
	s3 := make([]int32, constSize)
	// s4 escapes to heap: non-constant size
	s4 := make([]int32, varSize, varSize)
	s5 := make([]int32, varSize, constSize)
	// s6 escapes to heap: non-constant size
	s6 := make([]int32, constSize, varSize)
	s7 := make([]int32, constSize, constSize)
	// reference the slices so that the file compiles
	_, _, _, _, _, _, _ = s1, s2, s3, s4, s5, s6, s7
}

Map

  • A variable escapes if it is referenced by a map's key or value.
  • The escape happens whether or not the map itself escapes.
package main

func map1() {
	m1 := make(map[int]int)
	k1 := 0
	v1 := 0
	m1[k1] = v1
}

func map2() {
	m2 := make(map[*int]*int)
	k2 := 0 // escapes to heap: key of map put
	v2 := 0 // escapes to heap
	m2[&k2] = &v2
}

func map3() {
	m3 := make(map[interface{}]interface{})
	k3 := 0       // escapes to heap: key of map put
	v3 := 0       // escapes to heap
	m3[&k3] = &v3 // interface conversion happens
}

Return values

  • Returning values is a backward flow:
    • the referenced variables escape if the return values are pointers,
    • the values themselves escape if they are maps or slices.
func f1() **int {
    // t escapes to heap
    t := 0   
    // x1 escapes to heap
    x1 := &t
    return &x1
}

func f2() *int {
    // t escapes to heap
    t := 0 
    x2 := &t
    return x2
}

func f3() int {
    t := 0
    x3 := t
    return x3
}
func f4() map[string]int {
    // kv escapes to heap
    kv := make(map[string]int) 
    return kv
}

func f5() []int {
    // s escapes to heap
    s := []int{} 
    return s
}

Input parameters

  • Passing arguments is a forward flow:
    • the arguments escape if the input parameters leak (to the heap).
package main

func f1(x1 *int) **int {
    // x1 escapes to heap: parameter leaking
    return &x1 
}

func f2(x2 *int) *int {
    return x2
}

func f3(x3 *int) int {
    return *x3
}

func main() {
    v1 := 1 // v1 escapes to heap
    f1(&v1)

    v2 := 1
    f2(&v2)

    v3 := 1
    f3(&v3)
}

Closure function

  • A variable escapes if
    • the source variable is captured by a closure function,
    • and their relationship is address-of (derefs = -1).
package main

func closure1() {
    var x *int
    func(x1 *int) {
        func(x2 *int) {
            func(x3 *int) {
                y := 1
                x3 = &y
            }(x2)
        }(x1)
    }(x)
    _ = x
}
func closure2() {
    var x *int
    func() {
        func() {
            func() {
                // y escapes to heap
                y := 1 
                // x is captured by a closure
                x = &y
            }()
        }()
    }()
    _ = x
}

How to utilize ESC to benefit our programs?

  • Through understanding the concept of ESC, we can find that
    • variables usually escape
      • when their addresses are captured by other variables,
      • when ESC does not know their object sizes at compile time.
    • Passing arguments to a function is safer than returning values from it.

Observations

So, the first and most important suggestion is:

try to avoid pointers as much as possible

Initialize slice with constants

package main

func foo1(kv1 map[string]int) {
    // a constant cap lets the slice stay on the stack
    const initSize = 1000
    s1 := make([]int, 0, initSize)
    for _, v := range kv1 {
        s1 = append(s1, v)
    }

    // do something else    
}

func main() {
    kv := make(map[string]int)
    kv["a"] = 0
    kv["b"] = 1
    kv["c"] = 2

    foo1(kv)
}
package main

func foo2(kv2 map[string]int) {
    initSize := len(kv2)
    // escapes to heap
    s2 := make([]int, 0, initSize)
    for _, v := range kv2 {
        s2 = append(s2, v)
    }

    // do something else 
}

func main() {
    kv := make(map[string]int)
    kv["a"] = 0
    kv["b"] = 1
    kv["c"] = 2

    foo2(kv)
}

Passing variables to closure functions

  • Pass variables to closures as arguments instead of accessing (capturing) them directly.
func closure1() {
    var x *int
    func(x1 *int) {
        func(x2 *int) {
            func(x3 *int) {
                y := 1
                x3 = &y
            }(x2)
        }(x1)
    }(x)
    _ = x
}
func closure2() {
    var x *int
    func() {
        func() {
            func() {
                // y escapes to heap
                y := 1 
                // x is captured by a closure
                x = &y
            }()
        }()
    }()
    _ = x
}

Argument injection

// Read reads data into p.
// It returns the number of bytes read into p.
// The bytes are taken from at most one Read on the underlying Reader,
// hence n may be less than len(p).
// To read exactly len(p) bytes, use io.ReadFull(b, p).
// At EOF, the count will be zero and err will be io.EOF.
func (b *Reader) Read(p []byte) (n int, err error){
// ....
}
  • Write changes into the passed-in parameters instead of returning new values.
  • For example: Reader.Read in package bufio.
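
A minimal sketch of the same idea, assuming two made-up helpers: fill writes into a buffer owned by the caller (the Reader.Read shape), while fresh returns a newly created slice. Neither function comes from bufio. Whether buf really stays on the stack in a given program should still be checked with the -m flag.

```go
package main

import "fmt"

// fill copies up to len(p) bytes of src into the caller's buffer p and
// reports how many bytes were written, mirroring the shape of Read.
func fill(p []byte, src string) int {
	return copy(p, src)
}

// fresh returns a new slice; because the slice is returned, its backing
// array has to outlive the call.
func fresh(src string) []byte {
	return []byte(src)
}

func main() {
	var buf [16]byte // fixed-size buffer owned by the caller
	n := fill(buf[:], "escape")
	fmt.Println(n, string(buf[:n]))

	fmt.Println(string(fresh("analysis")))
}
```

With fill, the caller decides where the buffer lives and can reuse it across calls. With fresh, every call produces a value that must survive the function's stack frame.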

Discussions

Q: Do I really need to worry about where variables are allocated?

 

In most cases, no.

Actually, Go's garbage collection is super powerful!

 

 

Discussions (cont'd)

Q: When should I start optimizing my programs?

 

Premature optimization is the root of all evil.

Only optimize services when they have performance or cost issues.

Discussions (cont'd)

Q: How can I know if variables in my programs escape or not?

 

Don't guess. Test it!  

go tool compile -l -m=[1-4] <file_path>
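
The compiler flag reports decisions at compile time. As a runtime cross-check, heap allocations can also be counted with testing.AllocsPerRun from the standard library. The sketch below is only an illustration: the names point, sink, heapAllocs, and stackAllocs are invented for it.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x int32 }

// sink is package-level, so anything stored in it must outlive the call.
var sink *point

// heapAllocs counts allocations per run when the address is shared globally.
func heapAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = &point{} // escapes: the pointer is stored in a global
	})
}

// stackAllocs counts allocations per run when the value is never shared.
func stackAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		local := point{} // never shared, so it can stay on the stack
		_ = local
	})
}

func main() {
	fmt.Println("heap:", heapAllocs(), "stack:", stackAllocs())
}
```

A non-zero count for a function that was expected to stay on the stack is a hint to look at the -m output for that function.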

Discussions (cont'd)

Q: I got lost during the talk. What am I supposed to take away?

 

Don't use pointers (?)

Takeaways

In this tech talk, we have introduced Go's escape analysis (ESC) and its underlying working process.

  • The goal of ESC is to keep objects on the stack as much as possible.

    • Because allocating objects on the stack is faster than in the heap.

  • ESC determines whether variables escape by data-flow (graph) analysis and other rules.

Through understanding the ESC, we have learned:

  • Overusing pointers makes variables escape-prone.

  • Maps, slices, and closures should be used carefully.

  • Passing arguments to a function is safer than returning values from it.

Thanks for listening.

The Escape Analysis in Go

By Jalex Chang

Introduce the escape analysis in Go (1.15) and its underlying working process.