Jalex Chang

2020.11.14

The Escape Analysis in Go -

We can use memory much more efficiently than we thought

Jalex Chang

- Gopher.

- Love software engineering, database systems, and distributed systems.

- Backend Engineer @ Umbo Computer Vision

Contact:

- jalex.cpc @ gmail.com

- jalex.chang @ Facebook

- JalexChang @ GitHub

Agenda

  • Introduction

  • The Escape Analysis

  • Programming Tips

  • Discussions

  • Summary

Introduction

  • In this tech talk, we are going to introduce Go's escape analysis (ESC) and its underlying working process.

  • The topics we will cover in the talk:

    • What is ESC?

    • Why does Go need ESC?

    • When does ESC get to work?

    • How does ESC really work? Are there any exceptions?

    • How to utilize ESC to benefit our programs?

Go's memory allocation & management

  • Actually, Go's memory allocation and management mechanisms are complicated.

    • Such as garbage collection (GC), TCMalloc, the multi-layered memory allocator, etc.

  • Let's abstract it and focus on the variable (object) allocation.

Go's variable declaration in concept

  • Variables declared in Go must be allocated as objects in memory, either in the heap or on the stack.

  • Heap

    • A global storage space

    • Where stored objects can be shared

    • Where stored objects are managed by the GC

  • Stack frames

    • A local storage space belonging to a function

    • Each stack frame is bound to a goroutine.

    • Where stored objects are used privately

    • Where stored objects are managed by the belonging frame's lifecycle

Heap vs Stack 

  • From a variable declaration perspective

    • Allocating objects on the stack is faster than in the heap.

    • Because goroutines can fully control their stack frames.

    • No locking, no GC, and less overhead.

Let's do some experiments to prove it~

Experiment1 - small objects

package malloc

import "testing"

type T struct {
    X int32 // 4B
}

var global interface{}

func BenchmarkAllocOnHeap(b *testing.B) {
    b.ReportAllocs()
    // Run exactly b.N iterations.
    for i := 0; i < b.N; i++ {
        global = &T{}
    }
}

func BenchmarkAllocOnStack(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        local := T{}
        _ = local
    }
}

$ go test -cpu 1,2,4,8,16 -bench=. malloc_small_object_test.go
goos: darwin
goarch: amd64
BenchmarkAllocOnHeap       66171081     18.2 ns/op    4 B/op   1 allocs/op
BenchmarkAllocOnHeap-2     93559117     12.4 ns/op    4 B/op   1 allocs/op
BenchmarkAllocOnHeap-4     92098896     13.0 ns/op    4 B/op   1 allocs/op
BenchmarkAllocOnHeap-8     85893501     12.2 ns/op    4 B/op   1 allocs/op
BenchmarkAllocOnHeap-16    86982369     11.9 ns/op    4 B/op   1 allocs/op
BenchmarkAllocOnStack      1000000000   0.294 ns/op   0 B/op   0 allocs/op
BenchmarkAllocOnStack-2    1000000000   0.292 ns/op   0 B/op   0 allocs/op
BenchmarkAllocOnStack-4    1000000000   0.296 ns/op   0 B/op   0 allocs/op
BenchmarkAllocOnStack-8    1000000000   0.299 ns/op   0 B/op   0 allocs/op
BenchmarkAllocOnStack-16   1000000000   0.294 ns/op   0 B/op   0 allocs/op
PASS
ok      command-line-arguments  7.451s
About 40 times faster.

Experiment2 - huge objects

package malloc

import "testing"

type T struct {
    X [1000]int32 // 4KB
}

var global interface{}

func BenchmarkAllocOnHeap(b *testing.B) {
    b.ReportAllocs()
    // Run exactly b.N iterations.
    for i := 0; i < b.N; i++ {
        global = &T{}
    }
}

func BenchmarkAllocOnStack(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        local := T{}
        _ = local
    }
}

$ go test -cpu 1,2,4,8,16 -bench=. malloc_huge_object_test.go
goos: darwin
goarch: amd64
BenchmarkAllocOnHeap       1626262     784 ns/op    4096 B/op  1 allocs/op
BenchmarkAllocOnHeap-2     1852974     613 ns/op    4096 B/op  1 allocs/op
BenchmarkAllocOnHeap-4     1949342     613 ns/op    4096 B/op  1 allocs/op
BenchmarkAllocOnHeap-8     1902932     629 ns/op    4096 B/op  1 allocs/op
BenchmarkAllocOnHeap-16    1765797     689 ns/op    4096 B/op  1 allocs/op
BenchmarkAllocOnStack      1000000000  0.398 ns/op  0 B/op     0 allocs/op
BenchmarkAllocOnStack-2    1000000000  0.297 ns/op  0 B/op     0 allocs/op
BenchmarkAllocOnStack-4    1000000000  0.301 ns/op  0 B/op     0 allocs/op
BenchmarkAllocOnStack-8    1000000000  0.293 ns/op  0 B/op     0 allocs/op
BenchmarkAllocOnStack-16   1000000000  0.295 ns/op  0 B/op     0 allocs/op
PASS
ok      command-line-arguments  11.450s
About 2000 times faster.

Experiment3 - super large objects

package malloc

import "testing"

// The maximum size of explicitly
// declared variables on stacks is 10MB
type T struct {
    X [10 * 1000 * 1000]byte // 10MB
}

var global interface{}

func BenchmarkAllocOnHeap(b *testing.B) {
    b.ReportAllocs()
    // Run exactly b.N iterations.
    for i := 0; i < b.N; i++ {
        global = &T{}
    }
}

func BenchmarkAllocOnStack(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        local := T{}
        _ = local
    }
}

$ go test -cpu 1,2,4,8,16 -bench=. malloc_max_object_test.go
goos: darwin
goarch: amd64
BenchmarkAllocOnHeap      1659        687219 ns/op  10008461 B/op  1 allocs/op
BenchmarkAllocOnHeap-2    1568        797501 ns/op  10008811 B/op  1 allocs/op
BenchmarkAllocOnHeap-4    1593        816421 ns/op  10008715 B/op  1 allocs/op
BenchmarkAllocOnHeap-8    1360        782797 ns/op  10009793 B/op  1 allocs/op
BenchmarkAllocOnHeap-16   1424        817991 ns/op  10009466 B/op  1 allocs/op
BenchmarkAllocOnStack     1000000000  0.313 ns/op   0 B/op         0 allocs/op
BenchmarkAllocOnStack-2   1000000000  0.327 ns/op   0 B/op         0 allocs/op
BenchmarkAllocOnStack-4   1000000000  0.302 ns/op   0 B/op         0 allocs/op
BenchmarkAllocOnStack-8   1000000000  0.303 ns/op   0 B/op         0 allocs/op
BenchmarkAllocOnStack-16  1000000000  0.347 ns/op   0 B/op         0 allocs/op
PASS
ok      command-line-arguments  8.197s
About 2.4M times faster.

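Now we know that allocating objects on the stack really matters.

Note that ~0.3 ns/op for every object size means the compiler removed the unused local entirely, so treat these ratios as upper bounds.

But how does Go know where a variable should be allocated?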

The Escape Analysis

What is the escape analysis (ESC)?

  • Escape analysis is a compile-time mechanism that automatically decides whether a variable should be allocated in the heap or on the stack.

    • It tries to keep variables on the stack as much as possible.

    • If a variable has to be allocated in the heap, we say the variable escapes (from the stack).

When does ESC happen?

ESC - concept

A variable's construction or type doesn't determine where it lives. Only how the variable is shared does.

  • ESC considers the assignment relationships between declared variables.

  • Generally, a variable escapes if:

    • its address has been captured by the address-of operator (&),

    • and at least one of the related variables has already escaped.

package main

var g *int

func main() {
	// v escapes to the heap
	v := 0
	g = &v
}
$ go run -gcflags "-m=2 -l" basic_concept.go 
# command-line-arguments
./basic_concept.go:8:2: v escapes to heap:
./basic_concept.go:8:2:   flow: {heap} = &v:
./basic_concept.go:8:2:     from &v (address-of) at ./basic_concept.go:9:6
./basic_concept.go:8:2:     from g = &v (assign) at ./basic_concept.go:9:4
./basic_concept.go:8:2: moved to heap: v

How does ESC work?

  • Basically, ESC determines whether variables escape or not by

    • the data-flow analysis (shortest path analysis)

    • and other additional rules

ESC - data-flow analysis

  • The data-flow graph is a directed, weighted graph

    • Constructed from the abstract syntax tree (AST).

    • It is used to represent relationships between variables.

  • Vertices (locations)

    • Represent all declared variables.

    • Compound types (struct, slice, map, ...) are lowered to their simplest representation.

  • Edges

    • Represent assignments between variables.

    • Each edge has a weight representing its address-of/dereference count (derefs).

Examples of data-flow representation
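
A minimal sketch of how ordinary assignments map to edge weights, assuming the convention described above: taking an address lowers derefs by one, and each dereference raises it by one. The function and variable names are illustrative and do not come from the talk.

```go
package main

import "fmt"

// derefsDemo performs one assignment of each basic shape. The comment on
// each line gives the edge the analysis would record, written as
// "dst <- src (derefs)".
func derefsDemo() int {
	q := 14
	p := &q   // p  <- q  (derefs = -1): p holds q's address
	pp := &p  // pp <- p  (derefs = -1): pp holds p's address
	a := q    // a  <- q  (derefs =  0): plain value copy
	b := *p   // b  <- p  (derefs =  1): one dereference
	c := **pp // c  <- pp (derefs =  2): two dereferences
	return a + b + c
}

func main() {
	fmt.Println(derefsDemo()) // prints 42
}
```

Only the edges with derefs = -1 can make their source escape, because they are the only ones that hand out an address.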

Data-flow analysis - process flow 

Step1. Construct locations

  • Walk through all functions to collect declared variables.

Step2. Construct edges

  • Walk through all functions again to collect assignments.

Step3. Analyze the built graph

  • Iteratively walk through the built graph (based on the Bellman-Ford algorithm).

    • Start from every location.

    • Mark a variable as escaped if the source location has escaped and the relative derefs (shortest path) is -1.

    • Stop expanding a variable's incoming edges if the variable escapes.

Step4. Collect escape notes

  • Walk through locations to collect the escape reasons of marked variables.

Feeling dizzy? Let me show you an example.

Construct locations

var p **int

func f1() {
    var x1 *int
    p = &x1

    x2 := x1
    x3 := *p
    x4 := &x3
    _ = x2
    _ = x4
}

func f2() {
    var t **int
    y1 := 1
    y2 := &y1
    y3 := y2

    t = &y2
    p = t
    t = &y3
}

func main() {
    f1()
    f2()
}

Construct edges

Analyze the built graph (1)

Analyze the built graph (2)

Analyze the built graph (3)

Analyze the built graph (4)

x1, y2, and y3 have already been checked, so let's skip them.

Analyze the built graph (5)

The analysis is finished!

Collect escape notes

$ go run -gcflags "-m=2 -l" indirect_primitives.go
command-line-arguments
./indirect_primitives.go:6:6: x1 escapes to heap:
./indirect_primitives.go:6:6:   flow: {heap} = &x1:
./indirect_primitives.go:6:6:     from &x1 (address-of) at ./indirect_primitives.go:7:6
./indirect_primitives.go:6:6:     from p = &x1 (assign) at ./indirect_primitives.go:7:4
./indirect_primitives.go:6:6: moved to heap: x1
./indirect_primitives.go:20:2: y3 escapes to heap:
./indirect_primitives.go:20:2:   flow: t = &y3:
./indirect_primitives.go:20:2:     from &y3 (address-of) at ./indirect_primitives.go:24:6
./indirect_primitives.go:20:2:     from t = &y3 (assign) at ./indirect_primitives.go:24:4
./indirect_primitives.go:20:2:   flow: {heap} = t:
./indirect_primitives.go:20:2:     from p = t (assign) at ./indirect_primitives.go:23:4
./indirect_primitives.go:19:2: y2 escapes to heap:
./indirect_primitives.go:19:2:   flow: t = &y2:
./indirect_primitives.go:19:2:     from &y2 (address-of) at ./indirect_primitives.go:22:6
./indirect_primitives.go:19:2:     from t = &y2 (assign) at ./indirect_primitives.go:22:4
./indirect_primitives.go:19:2:   flow: {heap} = t:
./indirect_primitives.go:19:2:     from p = t (assign) at ./indirect_primitives.go:23:4
./indirect_primitives.go:18:2: y1 escapes to heap:
./indirect_primitives.go:18:2:   flow: y2 = &y1:
./indirect_primitives.go:18:2:     from &y1 (address-of) at ./indirect_primitives.go:19:8
./indirect_primitives.go:18:2:     from y2 := &y1 (assign) at ./indirect_primitives.go:19:5
./indirect_primitives.go:18:2: moved to heap: y1
./indirect_primitives.go:19:2: moved to heap: y2
./indirect_primitives.go:20:2: moved to heap: y3
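
The four steps can be sketched as a toy graph walk over the assignments in f1 and f2. This is a simplified model under stated assumptions, not the compiler's implementation: the global p is modeled as a single location named "heap", and the names edge, exampleEdges, and escaped are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// edge records the assignment "dst = <expression of src>" with its weight.
type edge struct {
	dst, src string
	derefs   int
}

// exampleEdges lists the assignments of f1 and f2.
var exampleEdges = []edge{
	{"heap", "x1", -1}, // p = &x1
	{"x2", "x1", 0},    // x2 := x1
	{"x3", "heap", 1},  // x3 := *p
	{"x4", "x3", -1},   // x4 := &x3
	{"y2", "y1", -1},   // y2 := &y1
	{"y3", "y2", 0},    // y3 := y2
	{"t", "y2", -1},    // t = &y2
	{"heap", "t", 0},   // p = t
	{"t", "y3", -1},    // t = &y3
}

// escaped walks the graph backward from every escaped location and returns
// the sorted names of the variables that must move to the heap.
func escaped(edges []edge) []string {
	incoming := map[string][]edge{}
	for _, e := range edges {
		incoming[e.dst] = append(incoming[e.dst], e)
	}
	esc := map[string]bool{"heap": true}
	roots := []string{"heap"}
	for len(roots) > 0 {
		root := roots[0]
		roots = roots[1:]
		dist := map[string]int{root: 0} // smallest derefs seen from root
		todo := []string{root}
		for len(todo) > 0 {
			l := todo[len(todo)-1]
			todo = todo[:len(todo)-1]
			d := dist[l]
			if d < 0 {
				// l's address flows to an escaped root, so l escapes
				// and becomes a new root; its edges are not expanded here.
				if !esc[l] {
					esc[l] = true
					roots = append(roots, l)
				}
				continue
			}
			for _, e := range incoming[l] {
				if esc[e.src] {
					continue // already escaped, nothing new to learn
				}
				nd := d + e.derefs
				if old, seen := dist[e.src]; !seen || nd < old {
					dist[e.src] = nd
					todo = append(todo, e.src)
				}
			}
		}
	}
	var out []string
	for name := range esc {
		if name != "heap" {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(escaped(exampleEdges)) // prints [x1 y1 y2 y3]
}
```

The result matches the "moved to heap" lines in the compiler output: x1 escapes directly through p, y2 and y3 escape through t, and y1 escapes because its address is held by the escaped y2.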

In addition to the data-flow analysis, ESC applies additional rules. Here are some (but not all) of them.

Huge objects

package main

type smallExplicitT struct {
    a [1000 * 1000]int32 // 4MB
}

func main() {
    dcl3 := smallExplicitT{}
    dcl4 := make([]int32, 0, 15*1000) // 60KB
    _ = dcl3
    _ = dcl4
}

  • For explicit declarations (var or :=)

    • The variables escape if their sizes are over 10MB

  • For implicit declarations (new or make)

    • The variables escape if their sizes are over 64KB

package main

type hugeExplicitT struct {
	a [3 * 1000 * 1000]int32 // 12MB
}

func main() {
    // dcl1 escapes to heap: too large for stack
    dcl1 := hugeExplicitT{}
    // dcl2 escapes to heap: too large for stack
    dcl2 := make([]int32, 0, 17*1000) // 68KB
    _ = dcl1
    _ = dcl2
}
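
The size arithmetic behind the two examples can be checked with a short sketch. It assumes the 10MB and 64KB limits are in binary units and simplifies the boundary comparison; the constant and function names are hypothetical, not compiler identifiers.

```go
package main

import "fmt"

// Assumed limits in bytes (binary units).
const (
	maxExplicitStackVar = 10 * 1024 * 1024 // declared with var or :=
	maxImplicitStackVar = 64 * 1024        // created with new or make
)

// fitsExplicit reports whether an explicitly declared object of the given
// size is small enough to stay on the stack under the assumed limit.
func fitsExplicit(size int64) bool { return size <= maxExplicitStackVar }

// fitsImplicit does the same for objects created with new or make.
func fitsImplicit(size int64) bool { return size <= maxImplicitStackVar }

func main() {
	const int32Size = 4 // bytes per int32

	fmt.Println(fitsExplicit(1000 * 1000 * int32Size))     // true:  4,000,000 B (smallExplicitT)
	fmt.Println(fitsImplicit(15 * 1000 * int32Size))       // true:  60,000 B (dcl4)
	fmt.Println(fitsExplicit(3 * 1000 * 1000 * int32Size)) // false: 12,000,000 B (hugeExplicitT)
	fmt.Println(fitsImplicit(17 * 1000 * int32Size))       // false: 68,000 B (dcl2)
}
```

Size is only one of the rules: an object under the limit still escapes if the data-flow analysis says its address outlives the function.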

Slice

  • A slice escapes if its capacity (or its length, when no capacity is given) is not a compile-time constant.

package main

func main() {
	const constSize = 10
	var varSize = 10

	s1 := []int32{}
	// s2 escapes to heap: non-constant size
	s2 := make([]int32, varSize)
	s3 := make([]int32, constSize)
	// s4 escapes to heap: non-constant size
	s4 := make([]int32, varSize, varSize)
	s5 := make([]int32, varSize, constSize)
	// s6 escapes to heap: non-constant size
	s6 := make([]int32, constSize, varSize)
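	s7 := make([]int32, constSize, constSize)

	// Blank assignments keep the compiler from rejecting unused variables.
	_, _, _, _, _, _, _ = s1, s2, s3, s4, s5, s6, s7
}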

Map

  • A variable escapes if a map's key or value references it.

  • The escape happens whether or not the map itself escapes.

package main

func map1() {
	m1 := make(map[int]int)
	k1 := 0
	v1 := 0
	m1[k1] = v1
}

func map2() {
	m2 := make(map[*int]*int)
	k2 := 0 // escapes to heap: key of map put
	v2 := 0 // escapes to heap
	m2[&k2] = &v2
}

func map3() {
	m3 := make(map[interface{}]interface{})
	k3 := 0       // escapes to heap: key of map put
	v3 := 0       // escapes to heap
	m3[&k3] = &v3 // interface conversion happens
}

Return values

  • Returning a value sends data backward to the caller, so

    • the referenced variables escape if the return values are pointers

    • the returned values escape if they are maps or slices

func f1() **int {
    // t escapes to heap
    t := 0   
    // x1 escapes to heap
    x1 := &t
    return &x1
}

func f2() *int {
    // t escapes to heap
    t := 0 
    x2 := &t
    return x2
}

func f3() int {
    t := 0
    x3 := t
    return x3
}

func f4() map[string]int {
    // kv escapes to heap
    kv := make(map[string]int) 
    return kv
}

func f5() []int {
    // s escapes to heap
    s := []int{} 
    return s
}

Input parameters

  • Passing an argument sends data forward into the callee, so

    • the arguments escape if the input parameters leak (to the heap)

package main

func f1(x1 *int) **int {
    // x1 escapes to heap: parameter leaking
    return &x1 
}

func f2(x2 *int) *int {
    return x2
}

func f3(x3 *int) int {
    return *x3
}

func main() {
    v1 := 1 // v1 escapes to heap
    f1(&v1)

    v2 := 1
    f2(&v2)

    v3 := 1
    f3(&v3)
}

Closure function

  • A variable escapes if

    • the source variable is captured by a closure function

    • and their relationship is address-of (derefs = -1)

package main

func closure1() {
    var x *int
    func(x1 *int) {
        func(x2 *int) {
            func(x3 *int) {
                y := 1
                x3 = &y
            }(x2)
        }(x1)
    }(x)
    _ = x
}

func closure2() {
    var x *int
    func() {
        func() {
            func() {
                // y escapes to heap
                y := 1 
                // x is captured by a closure
                x = &y
            }()
        }()
    }()
    _ = x
}

How to utilize ESC to benefit our programs?

  • By understanding the concept of ESC, we can see that
    • variables usually escape
      • when their addresses are captured by other variables.
      • when ESC cannot determine their object sizes at compile time.
    • Passing arguments to a function is safer than returning values from the function.

Observations

So, the first and most important suggestion is:

try to use pointers as little as possible

Initialize slice with constants

package main

func foo1(kv1 map[string]int) {
    // a constant cap lets the slice stay on the stack
    const initSize = 1000
    s1 := make([]int, 0, initSize)
    for _, v := range kv1 {
        s1 = append(s1, v)
    }

    // do something else    
}

func main() {
    kv := make(map[string]int)
    kv["a"] = 0
    kv["b"] = 1
    kv["c"] = 2

    foo1(kv)
}

package main

func foo2(kv2 map[string]int) {
    initSize := len(kv2)
    // escapes to heap
    s2 := make([]int, 0, initSize)
    for _, v := range kv2 {
        s2 = append(s2, v)
    }

    // do something else 
}

func main() {
    kv := make(map[string]int)
    kv["a"] = 0
    kv["b"] = 1
    kv["c"] = 2

    foo2(kv)
}

Passing variables to closure functions

  • Pass variables to closures as arguments instead of capturing them directly.

func closure1() {
    var x *int
    func(x1 *int) {
        func(x2 *int) {
            func(x3 *int) {
                y := 1
                x3 = &y
            }(x2)
        }(x1)
    }(x)
    _ = x
}

func closure2() {
    var x *int
    func() {
        func() {
            func() {
                // y escapes to heap
                y := 1 
                // x is captured by a closure
                x = &y
            }()
        }()
    }()
    _ = x
}

Argument injection

// Read reads data into p.
// It returns the number of bytes read into p.
// The bytes are taken from at most one Read on the underlying Reader,
// hence n may be less than len(p).
// To read exactly len(p) bytes, use io.ReadFull(b, p).
// At EOF, the count will be zero and err will be io.EOF.
func (b *Reader) Read(p []byte) (n int, err error) {
    // ....
}

  • Write results into the parameters passed in instead of returning new values.

  • For example: Reader.Read in package bufio.
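
A minimal sketch of the same pattern outside bufio, contrasting a function that returns a new slice with one that fills a buffer the caller owns. The function names are hypothetical, and whether the caller's buffer really stays on the stack depends on how else the caller uses it, so check with -gcflags "-m".

```go
package main

import "fmt"

// squaresNew returns a freshly built slice. Its size depends on n and the
// result outlives the call, so the backing array is expected to live in
// the heap.
func squaresNew(n int) []int {
	out := make([]int, n)
	for i := range out {
		out[i] = i * i
	}
	return out
}

// squaresInto writes into dst, which the caller owns, and returns how many
// elements it filled. This mirrors the shape of Reader.Read.
func squaresInto(dst []int) int {
	for i := range dst {
		dst[i] = i * i
	}
	return len(dst)
}

func main() {
	var buf [4]int // fixed-size buffer owned by the caller
	n := squaresInto(buf[:])
	fmt.Println(buf[:n])       // prints [0 1 4 9]
	fmt.Println(squaresNew(4)) // prints [0 1 4 9]
}
```

With squaresInto the callee never decides where the memory lives, so the caller can reuse one buffer across many calls.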

Discussions

Q: Do I really need to worry about where variables are allocated?

 

In most cases, no.

Actually, Go's garbage collection is super powerful!

 

 

Discussions (cont'd)

Q: When should I start to optimize my programs?

 

Premature optimization is the root of all evil.

Only optimize services when they have performance or cost issues.

Discussions (cont'd)

Q: How can I know if variables in my programs escape or not?

 

Don't guess. Test it!  

go tool compile -l -m=[1-4] <file_path>
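
Besides reading the compiler's report, allocations can also be counted at run time. The sketch below uses testing.AllocsPerRun from the standard library on the two cases from the first experiment; the names sink, heapAllocs, and stackAllocs are hypothetical, and the exact counts may vary with the compiler version.

```go
package main

import (
	"fmt"
	"testing"
)

type T struct {
	X int32
}

// sink is a package-level variable. Storing a pointer in it forces the
// pointed-to object to escape.
var sink *T

// heapAllocs reports the average number of allocations per call when the
// address of a new T is stored in a global.
func heapAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = &T{X: 1}
	})
}

// stackAllocs reports the average number of allocations per call when the
// T value never leaves the function.
func stackAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		local := T{X: 1}
		_ = local
	})
}

func main() {
	fmt.Println("escaping:", heapAllocs())
	fmt.Println("local:   ", stackAllocs())
}
```

The escaping case is expected to report at least one allocation per call and the local case none, which agrees with the allocs/op columns in the benchmarks.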

Discussions (cont'd)

Q: I got lost during the talk. What am I supposed to take away?

 

Don't use pointers (?)

Takeaways

In this tech talk, we have introduced Go's escape analysis (ESC) and its underlying working process.

  • The goal of ESC is to keep objects on the stack as much as possible.

    • Because allocating objects on the stack is faster than in the heap.

  • ESC determines whether variables escape through data-flow (graph) analysis and other rules.

Through understanding the ESC, we have learned:

  • Overusing pointers makes variables escape-prone.

  • Maps, slices, and closures should be used carefully.

  • Passing arguments to a function is safer than returning values from it.

Thanks for listening.