Jalex Chang
2020.11.14
The Escape Analysis in Go -
We can use memory more efficiently than we thought
Jalex Chang
- Gopher.
- Love software engineering, database systems, and distributed systems.
- Backend Engineer @ Umbo Computer Vision
Contact:
- jalex.cpc @ gmail.com
- jalex.chang @ Facebook
- JalexChang @ GitHub
Agenda
- Introduction
- The Escape Analysis
- Programming Tips
- Discussions
- Summary
References
[1] Source code of Go compiler, https://github.com/golang/go/tree/dev.boringcrypto.go1.15/src/cmd/compile
[2] Source code of Escape Analysis, https://github.com/golang/go/blob/dev.boringcrypto.go1.15/src/cmd/compile/internal/gc/escape.go
[3] Understanding Allocations: the Stack and the Heap - GopherCon SG 2019, https://www.youtube.com/watch?v=ZMZpH4yT7M0
[4] A visual guide to Go Memory Allocator from scratch (Golang), https://medium.com/@ankur_anand/a-visual-guide-to-golang-memory-allocator-from-ground-up-e132258453ed
[5] Go: Overview of the Compiler, https://medium.com/a-journey-with-go/go-overview-of-the-compiler-4e5a153ca889
[6] Escape analysis in the Go compiler, https://talks.cuonglm.xyz/escape-analysis-in-go-compiler.slide
Introduction
- In this tech talk, we are going to introduce Go's escape analysis (ESC) and its underlying working process.
- The topics we will cover in the talk:
  - What is ESC?
  - Why does Go need ESC?
  - When does ESC get to work?
  - How does ESC really work? Are there any exceptions?
  - How to utilize ESC to benefit our programs?
Go's memory allocation & management
- Actually, Go's memory allocation and management mechanisms are complicated,
  - involving garbage collection (GC), TCMalloc, a multi-layered memory allocator, and so on.
- Let's abstract them away and focus on variable (object) allocation.
Go's variable declaration in concept
- Declared variables in Go need to be allocated as objects in memory, either in the heap or on the stack.
- Heap
  - A global storage space
  - Where stored objects can be shared
  - Where stored objects are managed by the GC
- Stack frames
  - A local storage space belonging to a function
  - Each stack frame is bound to a goroutine.
  - Where stored objects are used privately
  - Where stored objects are managed by the owning frame's lifecycle
Heap vs Stack
- From a variable declaration perspective, allocating objects on the stack is faster than in the heap,
  - because goroutines can fully control their stack frames:
  - no locking, no GC, and less overhead.
- Let's do some experiments to prove it~
Experiment 1 - small objects
type T struct {
X int32 // 4B
}
var global interface{}
func BenchmarkAllocOnHeap(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
global = &T{}
}
}
func BenchmarkAllocOnStack(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
local := T{}
_ = local
}
}
$ go test -cpu 1,2,4,8,16 -bench=. malloc_small_object_test.go
goos: darwin
goarch: amd64
BenchmarkAllocOnHeap 66171081 18.2 ns/op 4 B/op 1 allocs/op
BenchmarkAllocOnHeap-2 93559117 12.4 ns/op 4 B/op 1 allocs/op
BenchmarkAllocOnHeap-4 92098896 13.0 ns/op 4 B/op 1 allocs/op
BenchmarkAllocOnHeap-8 85893501 12.2 ns/op 4 B/op 1 allocs/op
BenchmarkAllocOnHeap-16 86982369 11.9 ns/op 4 B/op 1 allocs/op
BenchmarkAllocOnStack 1000000000 0.294 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-2 1000000000 0.292 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-4 1000000000 0.296 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-8 1000000000 0.299 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-16 1000000000 0.294 ns/op 0 B/op 0 allocs/op
PASS
ok command-line-arguments 7.451s
About 40 times faster.
Experiment 2 - huge objects
type T struct {
X [1000]int32 // 4KB
}
var global interface{}
func BenchmarkAllocOnHeap(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
global = &T{}
}
}
func BenchmarkAllocOnStack(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
local := T{}
_ = local
}
}
$ go test -cpu 1,2,4,8,16 -bench=. malloc_huge_object_test.go
goos: darwin
goarch: amd64
BenchmarkAllocOnHeap 1626262 784 ns/op 4096 B/op 1 allocs/op
BenchmarkAllocOnHeap-2 1852974 613 ns/op 4096 B/op 1 allocs/op
BenchmarkAllocOnHeap-4 1949342 613 ns/op 4096 B/op 1 allocs/op
BenchmarkAllocOnHeap-8 1902932 629 ns/op 4096 B/op 1 allocs/op
BenchmarkAllocOnHeap-16 1765797 689 ns/op 4096 B/op 1 allocs/op
BenchmarkAllocOnStack 1000000000 0.398 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-2 1000000000 0.297 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-4 1000000000 0.301 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-8 1000000000 0.293 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-16 1000000000 0.295 ns/op 0 B/op 0 allocs/op
PASS
ok command-line-arguments 11.450s
About 2000 times faster.
Experiment 3 - super large objects
// The maximum size of explicitly
// declared variables on stacks is 10MB
type T struct {
X [10 * 1000 * 1000]byte // 10MB
}
var global interface{}
func BenchmarkAllocOnHeap(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
global = &T{}
}
}
func BenchmarkAllocOnStack(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
local := T{}
_ = local
}
}
$ go test -cpu 1,2,4,8,16 -bench=. malloc_max_object_test.go
goos: darwin
goarch: amd64
BenchmarkAllocOnHeap 1659 687219 ns/op 10008461 B/op 1 allocs/op
BenchmarkAllocOnHeap-2 1568 797501 ns/op 10008811 B/op 1 allocs/op
BenchmarkAllocOnHeap-4 1593 816421 ns/op 10008715 B/op 1 allocs/op
BenchmarkAllocOnHeap-8 1360 782797 ns/op 10009793 B/op 1 allocs/op
BenchmarkAllocOnHeap-16 1424 817991 ns/op 10009466 B/op 1 allocs/op
BenchmarkAllocOnStack 1000000000 0.313 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-2 1000000000 0.327 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-4 1000000000 0.302 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-8 1000000000 0.303 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-16 1000000000 0.347 ns/op 0 B/op 0 allocs/op
PASS
ok command-line-arguments 8.197s
About 2 million times faster.
Now we know that allocating objects on the stack really matters...
But how does Go know where a variable should be allocated?
The Escape Analysis
What is the escape analysis (ESC)?
- The escape analysis is a mechanism that automatically decides, at compile time, whether a variable should be allocated in the heap or not.
- It tries to keep variables on the stack as much as possible.
- If a variable is allocated in the heap, we say the variable escapes (from the stack).
When does ESC happen?
ESC - concept
A variable's construction or type doesn’t determine where it lives. Only how the variable is shared does.
- ESC considers assignment relationships between declared variables.
- Generally, a variable escapes if:
  - its address has been captured by the address-of operator (&), and
  - at least one of the related variables has already escaped.
package main
var g *int
func main() {
// escapes to heap
v := 0
g = &v
}
$ go run -gcflags "-m=2 -l" basic_concept.go
# command-line-arguments
./basic_concept.go:8:2: v escapes to heap:
./basic_concept.go:8:2: flow: {heap} = &v:
./basic_concept.go:8:2: from &v (address-of) at ./basic_concept.go:9:6
./basic_concept.go:8:2: from g = &v (assign) at ./basic_concept.go:9:4
./basic_concept.go:8:2: moved to heap: v
How does ESC work?
- Basically, ESC determines whether variables escape by
  - data-flow analysis (a shortest-path analysis)
  - and some additional rules.
ESC - data-flow analysis
- The data-flow is a directed weighted graph
  - constructed from the abstract syntax tree (AST).
  - It is used to represent relationships between variables.
- Vertices (locations)
  - Represent all declared variables.
  - Compound types (struct, slice, map, ...) are lowered to their simplest representation.
- Edges
  - Represent assignments between variables.
  - Each edge has a weight representing addressing/dereference counts (derefs).
Examples of data-flow representation
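The original slide illustrates this with a diagram. As a rough, illustrative sketch of the same idea (my own example, not the compiler's internal notation), each assignment below becomes an edge whose derefs weight records whether the right-hand side takes an address (-1), is a plain copy (0), or dereferences (+1):

package main

var sink *int

func example() {
	x := 0
	p := &x  // edge p <- x with derefs = -1 (address-of)
	q := p   // edge q <- p with derefs = 0  (plain assignment)
	y := *q  // edge y <- q with derefs = +1 (dereference)
	sink = p // edge {heap} <- p with derefs = 0
	_ = y
}

func main() { example() }

Since there is a path from x to {heap} whose shortest derefs is -1 (through p), x is expected to escape; compiling with -gcflags "-m=2 -l" should report it.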
Data-flow analysis - process flow
Step 1. Construct locations
- Walk through all functions to collect declared variables.
Step 2. Construct edges
- Walk through all functions again to collect assignments.
Step 3. Analyze the built graph
- Iteratively walk through the built graph (based on the Bellman-Ford algorithm).
  - Start from every location.
  - Mark a variable as escaped if the source location has escaped and the relative derefs (shortest path) is -1.
  - Stop expanding a variable's incoming edges once the variable escapes.
Step 4. Collect escape notes
- Walk through the locations to collect the escape reasons of the marked variables.
Feel dizzy? Let me show you an example.
Construct locations
var p **int
func f1() {
var x1 *int
p = &x1
x2 := x1
x3 := *p
x4 := &x3
_ = x2
_ = x4
}
func f2() {
var t **int
y1 := 1
y2 := &y1
y3 := y2
t = &y2
p = t
t = &y3
}
func main() {
f1()
f2()
}
Construct edges
var p **int
func f1() {
var x1 *int
p = &x1
x2 := x1
x3 := *p
x4 := &x3
_ = x2
_ = x4
}
func f2() {
var t **int
y1 := 1
y2 := &y1
y3 := y2
t = &y2
p = t
t = &y3
}
func main() {
f1()
f2()
}
Analyze the built graph (1)
var p **int
func f1() {
var x1 *int
p = &x1
x2 := x1
x3 := *p
x4 := &x3
_ = x2
_ = x4
}
func f2() {
var t **int
y1 := 1
y2 := &y1
y3 := y2
t = &y2
p = t
t = &y3
}
func main() {
f1()
f2()
}
Analyze the built graph (2)
var p **int
func f1() {
var x1 *int
p = &x1
x2 := x1
x3 := *p
x4 := &x3
_ = x2
_ = x4
}
func f2() {
var t **int
y1 := 1
y2 := &y1
y3 := y2
t = &y2
p = t
t = &y3
}
func main() {
f1()
f2()
}
Analyze the built graph (3)
var p **int
func f1() {
var x1 *int
p = &x1
x2 := x1
x3 := *p
x4 := &x3
_ = x2
_ = x4
}
func f2() {
var t **int
y1 := 1
y2 := &y1
y3 := y2
t = &y2
p = t
t = &y3
}
func main() {
f1()
f2()
}
Analyze the built graph (4)
x1, y2, and y3 have already been checked,
so let's skip them.
var p **int
func f1() {
var x1 *int
p = &x1
x2 := x1
x3 := *p
x4 := &x3
_ = x2
_ = x4
}
func f2() {
var t **int
y1 := 1
y2 := &y1
y3 := y2
t = &y2
p = t
t = &y3
}
func main() {
f1()
f2()
}
Analyze the built graph (5)
The analysis is finished!
var p **int
func f1() {
var x1 *int
p = &x1
x2 := x1
x3 := *p
x4 := &x3
_ = x2
_ = x4
}
func f2() {
var t **int
y1 := 1
y2 := &y1
y3 := y2
t = &y2
p = t
t = &y3
}
func main() {
f1()
f2()
}
Collect escape notes
$ go run -gcflags "-m=2 -l" indirect_primitives.go
# command-line-arguments
./indirect_primitives.go:6:6: x1 escapes to heap:
./indirect_primitives.go:6:6: flow: {heap} = &x1:
./indirect_primitives.go:6:6: from &x1 (address-of) at ./indirect_primitives.go:7:6
./indirect_primitives.go:6:6: from p = &x1 (assign) at ./indirect_primitives.go:7:4
./indirect_primitives.go:6:6: moved to heap: x1
./indirect_primitives.go:20:2: y3 escapes to heap:
./indirect_primitives.go:20:2: flow: t = &y3:
./indirect_primitives.go:20:2: from &y3 (address-of) at ./indirect_primitives.go:24:6
./indirect_primitives.go:20:2: from t = &y3 (assign) at ./indirect_primitives.go:24:4
./indirect_primitives.go:20:2: flow: {heap} = t:
./indirect_primitives.go:20:2: from p = t (assign) at ./indirect_primitives.go:23:4
./indirect_primitives.go:19:2: y2 escapes to heap:
./indirect_primitives.go:19:2: flow: t = &y2:
./indirect_primitives.go:19:2: from &y2 (address-of) at ./indirect_primitives.go:22:6
./indirect_primitives.go:19:2: from t = &y2 (assign) at ./indirect_primitives.go:22:4
./indirect_primitives.go:19:2: flow: {heap} = t:
./indirect_primitives.go:19:2: from p = t (assign) at ./indirect_primitives.go:23:4
./indirect_primitives.go:18:2: y1 escapes to heap:
./indirect_primitives.go:18:2: flow: y2 = &y1:
./indirect_primitives.go:18:2: from &y1 (address-of) at ./indirect_primitives.go:19:8
./indirect_primitives.go:18:2: from y2 := &y1 (assign) at ./indirect_primitives.go:19:5
./indirect_primitives.go:18:2: moved to heap: y1
./indirect_primitives.go:19:2: moved to heap: y2
./indirect_primitives.go:20:2: moved to heap: y3
var p **int
func f1() {
var x1 *int
p = &x1
x2 := x1
x3 := *p
x4 := &x3
_ = x2
_ = x4
}
func f2() {
var t **int
y1 := 1
y2 := &y1
y3 := y2
t = &y2
p = t
t = &y3
}
func main() {
f1()
f2()
}
In addition to the data-flow analysis, ESC applies some extra rules. Here are a few of them (not an exhaustive list).
Huge objects
package main
type smallExplicitT struct {
a [1000 * 1000]int32 // 4MB
}
func main() {
dcl3 := smallExplicitT{}
dcl4 := make([]int32, 0, 15*1000) // 60KB
_ = dcl3
_ = dcl4
}
- For explicit declarations (var or :=)
  - The variables escape if their size is over 10MB.
- For implicit declarations (new or make)
  - The variables escape if their size is over 64KB.
package main
type hugeExplicitT struct {
a [3 * 1000 * 1000]int32 // 12MB
}
func main() {
// dcl1 escapes to heap: too large for stack
dcl1 := hugeExplicitT{}
// dcl2 escapes to heap: too large for stack
dcl2 := make([]int32, 0, 17*1000) // 68KB
_ = dcl1
_ = dcl2
}
Slice
- A slice escapes if its capacity is not a compile-time constant.
package main
func main() {
const constSize = 10
var varSize = 10
s1 := []int32{}
// s2 escapes to heap: non-constant size
s2 := make([]int32, varSize)
s3 := make([]int32, constSize)
// s4 escapes to heap: non-constant size
s4 := make([]int32, varSize, varSize)
s5 := make([]int32, varSize, constSize)
// s6 escapes to heap: non-constant size
s6 := make([]int32, constSize, varSize)
s7 := make([]int32, constSize, constSize)
_, _, _, _, _, _, _ = s1, s2, s3, s4, s5, s6, s7 // keep the unused variables compilable
}
Map
- A variable escapes if it is referenced by a map's key or value.
- The escape happens whether or not the map itself escapes.
package main
func map1() {
m1 := make(map[int]int)
k1 := 0
v1 := 0
m1[k1] = v1
}
func map2() {
m2 := make(map[*int]*int)
k2 := 0 // escapes to heap: key of map put
v2 := 0 // escapes to heap
m2[&k2] = &v2
}
func map3() {
m3 := make(map[interface{}]interface{})
k3 := 0 // escapes to heap: key of map put
v3 := 0 // escapes to heap
m3[&k3] = &v3 // interface conversion happens
}
Return values
- Returning values is a backward flow:
  - the referenced variables escape if the returned values are pointers
  - the returned values themselves escape if they are maps or slices
func f1() **int {
// t escapes to heap
t := 0
// x1 escapes to heap
x1 := &t
return &x1
}
func f2() *int {
// t escapes to heap
t := 0
x2 := &t
return x2
}
func f3() int {
t := 0
x3 := t
return x3
}
func f4() map[string]int {
// kv escapes to heap
kv := make(map[string]int)
return kv
}
func f5() []int {
// s escapes to heap
s := []int{}
return s
}
Input parameters
- Passing arguments is a forward flow:
  - the arguments escape if the corresponding input parameters have leaked (to the heap)
package main
func f1(x1 *int) **int {
// x1 escapes to heap: parameter leaking
return &x1
}
func f2(x2 *int) *int {
return x2
}
func f3(x3 *int) int {
return *x3
}
func main() {
v1 := 1 // v1 escapes to heap
f1(&v1)
v2 := 1
f2(&v2)
v3 := 1
f3(&v3)
}
Closure function
- A variable escapes if
  - the source variable is captured by a closure function,
  - and their relationship is address-of (derefs = -1).
package main
func closure1() {
var x *int
func(x1 *int) {
func(x2 *int) {
func(x3 *int) {
y := 1
x3 = &y
}(x2)
}(x1)
}(x)
_ = x
}
func closure2() {
var x *int
func() {
func() {
func() {
// y escapes to heap
y := 1
// x is captured by a closure
x = &y
}()
}()
}()
_ = x
}
How to utilize ESC to benefit our programs?
- Through understanding the concept of ESC, we can find that
  - variables usually escape
    - when their addresses are captured by other variables, or
    - when ESC does not know their object sizes at compile time.
  - Passing arguments to a function is safer than returning values from the function.
Observations
So, the first and most important suggestion is:
try to avoid using pointers as much as possible.
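As a minimal sketch of why (illustrative types and names, following the return-value rules shown earlier): returning a struct by value lets the object stay on the caller's stack, while returning its address forces it to escape.

package main

type point struct{ x, y int }

// Returning a copy: p can stay on the stack (no heap allocation expected).
func newPointByValue() point {
	p := point{x: 1, y: 2}
	return p
}

// Returning an address: p escapes to the heap
// (the same backward rule as f2 in the earlier return-value example).
func newPointByPointer() *point {
	p := point{x: 1, y: 2}
	return &p
}

func main() {
	a := newPointByValue()
	b := newPointByPointer()
	_, _ = a, b
}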
Initialize slice with constants
package main
func foo1(kv1 map[string]int) {
// a constant cap lets the slice stay on the stack
const initSize = 1000
s1 := make([]int, 0, initSize)
for _, v := range kv1 {
s1 = append(s1, v)
}
// do something else
}
func main() {
kv := make(map[string]int)
kv["a"] = 0
kv["b"] = 1
kv["c"] = 2
foo1(kv)
}
package main
func foo2(kv2 map[string]int) {
initSize := len(kv2)
// escapes to heap
s2 := make([]int, 0, initSize)
for _, v := range kv2 {
s2 = append(s2, v)
}
// do something else
}
func main() {
kv := make(map[string]int)
kv["a"] = 0
kv["b"] = 1
kv["c"] = 2
foo2(kv)
}
Passing variables to closure functions
- Pass variables to closures as arguments instead of capturing the variables directly.
func closure1() {
var x *int
func(x1 *int) {
func(x2 *int) {
func(x3 *int) {
y := 1
x3 = &y
}(x2)
}(x1)
}(x)
_ = x
}
func closure2() {
var x *int
func() {
func() {
func() {
// y escapes to heap
y := 1
// x is captured by a closure
x = &y
}()
}()
}()
_ = x
}
Argument injection
// Read reads data into p.
// It returns the number of bytes read into p.
// The bytes are taken from at most one Read on the underlying Reader,
// hence n may be less than len(p).
// To read exactly len(p) bytes, use io.ReadFull(b, p).
// At EOF, the count will be zero and err will be io.EOF.
func (b *Reader) Read(p []byte) (n int, err error) {
// ....
}
- Inject changes into the passed-in parameters instead of returning values back.
- For example: Reader.Read in the bufio package.
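A rough sketch of the same pattern (a hypothetical helper, not part of bufio): the caller owns and can reuse the buffer, so the callee does not need to allocate and return a new slice.

package main

// fillSquares writes up to len(dst) square numbers into the caller-provided
// buffer and reports how many were written, instead of returning a new slice.
func fillSquares(dst []int) int {
	for i := range dst {
		dst[i] = i * i
	}
	return len(dst)
}

func main() {
	buf := make([]int, 8) // constant size, owned (and reusable) by the caller
	n := fillSquares(buf)
	_ = n
}

With a constant capacity and no leaking parameter, buf is expected to stay on the caller's stack under the rules described above.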
Discussions
Q: Do I really need to worry about where variables are allocated?
In most cases, no.
Actually, Go's garbage collection is super powerful!
Discussions (cont'd)
Q: When should I start to optimize my programs?
Premature optimization is the root of all evil.
Only optimize services when they have performance or cost issues.
Discussions (cont'd)
Q: How can I know if variables in my programs escape or not?
Don't guess. Test it!
go tool compile -l -m=[1-4] <file_path>
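The same flags can also be passed through the go command, as in the earlier examples (main.go is just a placeholder file name):

$ go build -gcflags "-m=2 -l" ./...
$ go run -gcflags "-m=2 -l" main.go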
Discussions (cont'd)
Q: I got lost during the talk. What am I supposed to take away?
Don't use pointers(?)
Takeaways
In this tech talk, we have introduced Go's escape analysis (ESC) and its underlying working process.
- The goal of ESC is to keep objects on the stack as much as possible,
  - because allocating objects on the stack is faster than in the heap.
- ESC determines whether variables escape by data-flow (graph) analysis and other rules.
Through understanding ESC, we have learned:
- Abusing pointers makes variables escape-prone.
- We should use maps, slices, and closures carefully.
- Passing arguments to a function is safer than returning values from it.
Thanks for listening.
The Escape Analysis in Go
By Jalex Chang