2020.11.14
Jalex Chang
- Gopher.
- Love software engineering, database systems, and distributed systems.
- Backend Engineer @ Umbo Computer Vision
Contact:
- jalex.cpc @ gmail.com
- jalex.chang @ Facebook
- JalexChang @ GitHub
Introduction
The Escape Analysis
Programming Tips
Discussions
Summary
[1] Source code of Go compiler, https://github.com/golang/go/tree/dev.boringcrypto.go1.15/src/cmd/compile
[2] Source code of Escape Analysis, https://github.com/golang/go/blob/dev.boringcrypto.go1.15/src/cmd/compile/internal/gc/escape.go
[3] Understanding Allocations: the Stack and the Heap - GopherCon SG 2019, https://www.youtube.com/watch?v=ZMZpH4yT7M0
[4] A visual guide to Go Memory Allocator from scratch (Golang), https://medium.com/@ankur_anand/a-visual-guide-to-golang-memory-allocator-from-ground-up-e132258453ed
[5] Go: Overview of the Compiler, https://medium.com/a-journey-with-go/go-overview-of-the-compiler-4e5a153ca889
[6] Escape analysis in the Go compiler, https://talks.cuonglm.xyz/escape-analysis-in-go-compiler.slide
In this tech talk, we are going to introduce Go's escape analysis (ESC) and how it works under the hood.
The topics we will cover in the talk:
What is ESC?
Why does Go need ESC?
When does ESC get to work?
How does ESC really work? Are there any exceptions?
How to utilize ESC to benefit our programs?
Go's memory allocation and management mechanisms are actually complicated,
involving garbage collection (GC), a TCMalloc-like multi-layered memory allocator, and more.
Let's abstract that away and focus on variable (object) allocation.
Every declared variable in Go needs to be allocated as an object in memory, either in the heap or on the stack.
Heap
A global storage space
Where stored objects can be shared
Where stored objects are managed by the GC
Stack frames
A local storage space belonging to a function call
Each stack frame belongs to a goroutine.
Where stored objects are used privately
Where stored objects are managed by the owning frame's lifecycle
From a variable declaration perspective
Allocating objects on the stack is faster than in the heap,
because goroutines fully control their own stack frames:
no locking, no GC, and less overhead.
Let's do some experiments to verify it.
type T struct {
X int32 // 4B
}
var global interface{}
func BenchmarkAllocOnHeap(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
global = &T{}
}
}
func BenchmarkAllocOnStack(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
local := T{}
_ = local
}
}
$ go test -cpu 1,2,4,8,16 -bench=. malloc_small_object_test.go
goos: darwin
goarch: amd64
BenchmarkAllocOnHeap 66171081 18.2 ns/op 4 B/op 1 allocs/op
BenchmarkAllocOnHeap-2 93559117 12.4 ns/op 4 B/op 1 allocs/op
BenchmarkAllocOnHeap-4 92098896 13.0 ns/op 4 B/op 1 allocs/op
BenchmarkAllocOnHeap-8 85893501 12.2 ns/op 4 B/op 1 allocs/op
BenchmarkAllocOnHeap-16 86982369 11.9 ns/op 4 B/op 1 allocs/op
BenchmarkAllocOnStack 1000000000 0.294 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-2 1000000000 0.292 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-4 1000000000 0.296 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-8 1000000000 0.299 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-16 1000000000 0.294 ns/op 0 B/op 0 allocs/op
PASS
ok command-line-arguments 7.451s
About 40 times faster.
type T struct {
X [1000]int32 // 4KB
}
var global interface{}
func BenchmarkAllocOnHeap(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
global = &T{}
}
}
func BenchmarkAllocOnStack(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
local := T{}
_ = local
}
}
$ go test -cpu 1,2,4,8,16 -bench=. malloc_huge_object_test.go
goos: darwin
goarch: amd64
BenchmarkAllocOnHeap 1626262 784 ns/op 4096 B/op 1 allocs/op
BenchmarkAllocOnHeap-2 1852974 613 ns/op 4096 B/op 1 allocs/op
BenchmarkAllocOnHeap-4 1949342 613 ns/op 4096 B/op 1 allocs/op
BenchmarkAllocOnHeap-8 1902932 629 ns/op 4096 B/op 1 allocs/op
BenchmarkAllocOnHeap-16 1765797 689 ns/op 4096 B/op 1 allocs/op
BenchmarkAllocOnStack 1000000000 0.398 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-2 1000000000 0.297 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-4 1000000000 0.301 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-8 1000000000 0.293 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-16 1000000000 0.295 ns/op 0 B/op 0 allocs/op
PASS
ok command-line-arguments 11.450s
About 2000 times faster.
// The maximum size of explicitly
// declared variables on stacks is 10MB
type T struct {
X [10 * 1000 * 1000]byte // 10MB
}
var global interface{}
func BenchmarkAllocOnHeap(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
global = &T{}
}
}
func BenchmarkAllocOnStack(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
local := T{}
_ = local
}
}
$ go test -cpu 1,2,4,8,16 -bench=. malloc_max_object_test.go
goos: darwin
goarch: amd64
BenchmarkAllocOnHeap 1659 687219 ns/op 10008461 B/op 1 allocs/op
BenchmarkAllocOnHeap-2 1568 797501 ns/op 10008811 B/op 1 allocs/op
BenchmarkAllocOnHeap-4 1593 816421 ns/op 10008715 B/op 1 allocs/op
BenchmarkAllocOnHeap-8 1360 782797 ns/op 10009793 B/op 1 allocs/op
BenchmarkAllocOnHeap-16 1424 817991 ns/op 10009466 B/op 1 allocs/op
BenchmarkAllocOnStack 1000000000 0.313 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-2 1000000000 0.327 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-4 1000000000 0.302 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-8 1000000000 0.303 ns/op 0 B/op 0 allocs/op
BenchmarkAllocOnStack-16 1000000000 0.347 ns/op 0 B/op 0 allocs/op
PASS
ok command-line-arguments 8.197s
About 2M times faster.
Escape analysis is a mechanism that automatically decides, at compile time, whether a variable should be allocated in the heap or not.
It tries to keep variables on the stack as much as possible.
If a variable has to be allocated in the heap, we say the variable escapes (from the stack).
A variable's construction or type doesn’t determine where it lives. Only how the variable is shared does.
ESC considers the assignment relationships between declared variables.
Generally, a variable escapes if:
its address has been captured by the address-of operator (&),
and at least one of the related variables has already escaped.
package main
var g *int
func main() {
// escapes to heap
v := 0
g = &v
}
$ go run -gcflags "-m=2 -l" basic_concept.go
# command-line-arguments
./basic_concept.go:8:2: v escapes to heap:
./basic_concept.go:8:2: flow: {heap} = &v:
./basic_concept.go:8:2: from &v (address-of) at ./basic_concept.go:9:6
./basic_concept.go:8:2: from g = &v (assign) at ./basic_concept.go:9:4
./basic_concept.go:8:2: moved to heap: v
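For contrast, a minimal sketch (my own example, not from the slides): taking an address alone is not enough to force an escape as long as the pointer never flows to an escaped location.
package main

func main() {
	v := 0
	l := &v // address taken, but the pointer never leaves main
	_ = l   // so v can stay on the stack
}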
Basically, ESC determines whether variables escape by data-flow analysis (a shortest-path analysis).
The data flow is modeled as a directed weighted graph.
It is used to represent the relationships between variables.
Vertices (locations)
Represent all declared variables.
Compound types (struct, slice, map, ...) are lowered to their simplest representations.
Edges
Represent assignments between variables.
Each edge has a weight representing the address-of/dereference count (derefs); see the sketch below.
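As a rough illustration (my own sketch, not compiler output), each kind of assignment contributes a different derefs weight: an address-of counts as -1, a plain copy as 0, and each dereference as +1.
package main

// assignments sketches how ESC would weight each assignment (derefs).
func assignments() {
	var q int
	var p *int
	var pp **int

	p = &q   // derefs -1 (address-of)
	pp = &p  // derefs -1 (address-of)
	r := p   // derefs 0 (plain copy)
	q = *p   // derefs +1 (one dereference)
	q = **pp // derefs +2 (two dereferences)
	_ = r
}

func main() { assignments() }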
Step1. Construct locations
Walk through all functions to collect declared variables.
Step2. Construct edges
Walk through all functions again to collect assignments.
Step3. Analyze the built graph
Iteratively walk through the built graph (based on the Bellman-Ford algorithm).
Start from every location.
Mark a variable as escaped if the source location has already escaped and the relative derefs (the shortest-path weight) is negative, e.g. -1, meaning its address flows there.
Stop expanding a variable's incoming edges if the variable escapes.
Step4. Collect escape notes
Walk through locations to collect the escape reasons of marked variables.
var p **int
func f1() {
var x1 *int
p = &x1
x2 := x1
x3 := *p
x4 := &x3
_ = x2
_ = x4
}
func f2() {
var t **int
y1 := 1
y2 := &y1
y3 := y2
t = &y2
p = t
t = &y3
}
func main() {
f1()
f2()
}
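As a rough reconstruction of the data-flow graph ESC builds for this code (my own sketch, not compiler output; read a <- b as "b flows into a", with the derefs weight in parentheses, and the global p modeled as {heap}):
// f1: {heap} <- &x1 (-1)    x2 <- x1 (0)
//     x3 <- *p (+1)         x4 <- &x3 (-1)
// f2: y2 <- &y1 (-1)        y3 <- y2 (0)
//     t <- &y2 (-1)         t <- &y3 (-1)
//     {heap} <- t (0)
//
// Walking from {heap}: x1, y2, and y3 are reachable with negative derefs,
// so they escape, and y1 escapes through y2. x2, x3, and x4 are never
// reachable from an escaped location with negative derefs, so they stay
// on the stack.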
Walking the graph from every location, x1, y1, y2, and y3 get marked as escaped (locations that have already escaped are skipped in later walks), while x2, x3, and x4 stay on the stack.
The analysis is finished!
$ go run -gcflags "-m=2 -l" indirect_primitives.go
# command-line-arguments
./indirect_primitives.go:6:6: x1 escapes to heap:
./indirect_primitives.go:6:6: flow: {heap} = &x1:
./indirect_primitives.go:6:6: from &x1 (address-of) at ./indirect_primitives.go:7:6
./indirect_primitives.go:6:6: from p = &x1 (assign) at ./indirect_primitives.go:7:4
./indirect_primitives.go:6:6: moved to heap: x1
./indirect_primitives.go:20:2: y3 escapes to heap:
./indirect_primitives.go:20:2: flow: t = &y3:
./indirect_primitives.go:20:2: from &y3 (address-of) at ./indirect_primitives.go:24:6
./indirect_primitives.go:20:2: from t = &y3 (assign) at ./indirect_primitives.go:24:4
./indirect_primitives.go:20:2: flow: {heap} = t:
./indirect_primitives.go:20:2: from p = t (assign) at ./indirect_primitives.go:23:4
./indirect_primitives.go:19:2: y2 escapes to heap:
./indirect_primitives.go:19:2: flow: t = &y2:
./indirect_primitives.go:19:2: from &y2 (address-of) at ./indirect_primitives.go:22:6
./indirect_primitives.go:19:2: from t = &y2 (assign) at ./indirect_primitives.go:22:4
./indirect_primitives.go:19:2: flow: {heap} = t:
./indirect_primitives.go:19:2: from p = t (assign) at ./indirect_primitives.go:23:4
./indirect_primitives.go:18:2: y1 escapes to heap:
./indirect_primitives.go:18:2: flow: y2 = &y1:
./indirect_primitives.go:18:2: from &y1 (address-of) at ./indirect_primitives.go:19:8
./indirect_primitives.go:18:2: from y2 := &y1 (assign) at ./indirect_primitives.go:19:5
./indirect_primitives.go:18:2: moved to heap: y1
./indirect_primitives.go:19:2: moved to heap: y2
./indirect_primitives.go:20:2: moved to heap: y3
package main
type smallExplicitT struct {
a [1000 * 1000]int32 // 4MB
}
func main() {
dcl3 := smallExplicitT{} // 4MB: stays on the stack
dcl4 := make([]int32, 0, 15*1000) // 60KB: stays on the stack
_ = dcl3
_ = dcl4
}
For explicit declarations (var or :=)
The variables escape if their sizes are over 10MB.
For implicit declarations (new or make)
The variables escape if their sizes are over 64KB, as sketched with new after the next example.
package main
type hugeExplicitT struct {
a [3 * 1000 * 1000]int32 // 12MB
}
func main() {
// dcl1 escapes to heap: too large for stack
dcl1 := hugeExplicitT{}
// dcl2 escapes to heap: too large for stack
dcl2 := make([]int32, 0, 17*1000) // 68KB
_ = dcl1
_ = dcl2
}
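The 64KB implicit-allocation rule appears to apply to new in the same way as to make; a small sketch under that assumption (the type names here are made up):
package main

type under64K struct {
	a [15 * 1000]int32 // 60KB
}

type over64K struct {
	a [17 * 1000]int32 // 68KB
}

func main() {
	n1 := new(under64K) // under the 64KB implicit limit: can stay on the stack
	// assumed to escape: over the 64KB implicit limit, like dcl2 above
	n2 := new(over64K)
	_ = n1
	_ = n2
}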
A slice variable also escapes if its capacity is not a compile-time constant.
package main
func main() {
const constSize = 10
var varSize = 10
s1 := []int32{}
// s2 escapes to heap: non-constant size
s2 := make([]int32, varSize)
s3 := make([]int32, constSize)
// s4 escapes to heap: non-constant size
s4 := make([]int32, varSize, varSize)
s5 := make([]int32, varSize, constSize)
// s6 escapes to heap: non-constant size
s6 := make([]int32, constSize, varSize)
s7 := make([]int32, constSize, constSize)
_, _, _, _, _, _, _ = s1, s2, s3, s4, s5, s6, s7
}
package main
func map1() {
m1 := make(map[int]int)
k1 := 0
v1 := 0
m1[k1] = v1
}
func map2() {
m2 := make(map[*int]*int)
k2 := 0 // escapes to heap: key of map put
v2 := 0 // escapes to heap
m2[&k2] = &v2
}
func map3() {
m3 := make(map[interface{}]interface{})
k3 := 0 // escapes to heap: key of map put
v3 := 0 // escapes to heap
m3[&k3] = &v3 // interface conversion happens here
}
func f1() **int {
// t escapes to heap
t := 0
// x1 escapes to heap
x1 := &t
return &x1
}
func f2() *int {
// t escapes to heap
t := 0
x2 := &t
return x2
}
func f3() int {
t := 0
x3 := t
return x3
}
func f4() map[string]int {
// kv escapes to heap
kv := make(map[string]int)
return kv
}
func f5() []int {
// s escapes to heap
s := []int{}
return s
}
package main
func f1(x1 *int) **int {
// x1 escapes to heap: parameter leaking
return &x1
}
func f2(x2 *int) *int {
return x2
}
func f3(x3 *int) int {
return *x3
}
func main() {
v1 := 1 // v1 escapes to heap
f1(&v1)
v2 := 1
f2(&v2)
v3 := 1
f3(&v3)
}
package main
func closure1() {
var x *int
func(x1 *int) {
func(x2 *int) {
func(x3 *int) {
y := 1
x3 = &y
}(x2)
}(x1)
}(x)
_ = x
}
func closure2() {
var x *int
func() {
func() {
func() {
// y escapes to heap
y := 1
// x is captured by a closure
x = &y
}()
}()
}()
_ = x
}
package main
func foo1(kv1 map[string]int) {
// a constant cap lets the slice stay on the stack
const initSize = 1000
s1 := make([]int, 0, initSize)
for _, v := range kv1 {
s1 = append(s1, v)
}
// do something else
}
func main() {
kv := make(map[string]int)
kv["a"] = 0
kv["b"] = 1
kv["c"] = 2
foo1(kv)
}
package main
func foo2(kv2 map[string]int) {
initSize := len(kv2)
// s2 escapes to heap: non-constant size
s2 := make([]int, 0, initSize)
for _, v := range kv2 {
s2 = append(s2, v)
}
// do something else
}
func main() {
kv := make(map[string]int)
kv["a"] = 0
kv["b"] = 1
kv["c"] = 2
foo2(kv)
}
Prefer passing pointers into closures as parameters (as in closure1 above) instead of capturing them from the enclosing scope (as in closure2), which forces y to escape.
// Read reads data into p.
// It returns the number of bytes read into p.
// The bytes are taken from at most one Read on the underlying Reader,
// hence n may be less than len(p).
// To read exactly len(p) bytes, use io.ReadFull(b, p).
// At EOF, the count will be zero and err will be io.EOF.
func (b *Reader) Read(p []byte) (n int, err error) {
// ....
}
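bufio.Reader.Read illustrates this tip: the caller passes in the buffer p and decides where it is allocated. A minimal sketch of the same idea (fill and build are hypothetical helpers, not from the slides):
package main

// fill writes into a caller-provided buffer, so the caller decides
// where the buffer lives; nothing here forces a heap allocation.
func fill(buf []int) int {
	for i := range buf {
		buf[i] = i
	}
	return len(buf)
}

// build returns a freshly built slice, so its backing array
// escapes to the heap (it must outlive this function).
func build(n int) []int {
	s := make([]int, n) // escapes: non-constant size and returned to the caller
	for i := range s {
		s[i] = i
	}
	return s
}

func main() {
	var buf [8]int
	_ = fill(buf[:]) // buf can stay on main's stack
	_ = build(8)
}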
Q: Do I really need to worry about where variables are allocated?
In most cases, no.
Actually, Go's garbage collector is super powerful!
Q: When should I start optimizing my programs?
Premature optimization is the root of all evil.
Only optimize services when they have performance or cost issues.
Q: How can I know if variables in my programs escape or not?
Don't guess. Test it!
go tool compile -l -m=[1-4] <file_path>
Q: I got lost during the talk. What am I supposed to take away?
Don't use pointers(?)
In this tech talk, we introduced Go's escape analysis (ESC) and how it works under the hood.
The goal of ESC is to keep objects on the stack as much as possible.
Because allocating objects on the stack is faster than in the heap.
ESC determines whether variables escape by data-flow (graph) analysis plus a few additional rules.
Through understanding ESC, we have learned:
Abusing pointers makes variables escape-prone.
Use maps, slices, and closures carefully.
Passing arguments to a function is safer than returning values from it.