Decoding the Black Box: Understanding the Go Compiler
FOSSASIA 2024
@imJenal
Jyotsna Gupta
Hi! I'm Jyotsna
- Source Code Input
- Read Source Code File
- Lexer: Convert to Tokens
- Parser: Create AST
- Type Checking & Semantic Analysis
- Generate Intermediate Representation (IR)
- Optimize IR (SSA)
- Generate Machine Code
- Link Machine Code with Libraries
- Create Executable Files
- Executable Ready
Lexer
- Also known as the scanner
- Breaks down the source code into tokens
- Tokens are the smallest units of meaning
- Identifies keywords, identifiers, literals, operators
- Discards whitespace and comments
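To make the token stream concrete, here is a minimal sketch that uses the standard library's go/scanner package. This is an assumption for illustration: the compiler has its own scanner, and go/scanner is a separate package that performs the same kind of work. The helper name tokenize and the sample input are hypothetical.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// tokenize scans src and returns one string per token: the token kind
// followed by its quoted literal text (empty for operators).
func tokenize(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	// Mode 0 drops comments; scanner.ScanComments would keep them.
	s.Init(file, []byte(src), nil, 0)

	var out []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		out = append(out, fmt.Sprintf("%s %q", tok, lit))
	}
	return out
}

func main() {
	for _, t := range tokenize("x := 40 + 2 // answer") {
		fmt.Println(t)
	}
}
```

Whitespace and the comment do not appear in the output. Go's scanner also inserts semicolon tokens at line ends, so one may show up after the last literal.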
Parser
- Constructs a tree-like structure known as an Abstract Syntax Tree (AST)
- Represents the grammatical structure of the code
- The parser checks the code against the grammatical rules of the programming language to ensure it's structured correctly (e.g. a missing semicolon gets detected)
Semantic Analysis
- This phase checks that the AST adheres to the semantic rules of the language: type checking, scoping rules, and constraints
- Ensures that operations are performed on compatible types, variables are declared before use, and the code respects the language's rules (e.g. not using an integer as a function)
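The "integer used as a function" case can be reproduced with the standard library's go/types package. This is a sketch under the assumption that go/types behaves closely enough to the compiler's own type checker for illustration. The helper name check and both sample programs are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// check parses src and type-checks it, returning the first error found.
// The sample sources import nothing, so no Importer is configured.
func check(src string) error {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return err // syntax error: the parser already rejected it
	}
	conf := types.Config{}
	_, err = conf.Check("p", fset, []*ast.File{f}, nil)
	return err
}

func main() {
	// Syntactically valid, semantically wrong: x is an int, not a function.
	bad := "package p\nfunc f() { x := 1; x() }"
	fmt.Println("bad: ", check(bad))

	good := "package p\nfunc f() int { x := 1; return x }"
	fmt.Println("good:", check(good))
}
```

The first program passes the parser and fails only at this stage, which is the distinction between syntax checking and semantic analysis.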
Intermediate Code Generation
- IR: Lower-level representation of the program
- Translates the AST into a sequence of instructions
- Independent of the target machine's architecture but closer to machine language than high-level code
- Serves as a middle ground that makes it easier to apply optimizations
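As a toy illustration of turning a tree into a flat instruction sequence, the sketch below lowers an expression AST into three-address instructions. This is not the Go compiler's actual IR. The lowerer type, the lowerExpr helper, and the t1, t2 temporary names are all hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
)

// lowerer collects three-address instructions and numbers temporaries.
type lowerer struct {
	code []string
	next int
}

// lower emits instructions for e and returns the name that holds its value.
func (l *lowerer) lower(e ast.Expr) string {
	switch e := e.(type) {
	case *ast.BasicLit:
		return e.Value
	case *ast.Ident:
		return e.Name
	case *ast.ParenExpr:
		return l.lower(e.X)
	case *ast.BinaryExpr:
		x := l.lower(e.X)
		y := l.lower(e.Y)
		l.next++
		t := fmt.Sprintf("t%d", l.next)
		l.code = append(l.code, fmt.Sprintf("%s = %s %s %s", t, x, e.Op, y))
		return t
	}
	panic(fmt.Sprintf("unsupported expression %T", e))
}

// lowerExpr parses a Go expression and returns its instruction sequence.
func lowerExpr(src string) ([]string, error) {
	e, err := parser.ParseExpr(src)
	if err != nil {
		return nil, err
	}
	var l lowerer
	l.lower(e)
	return l.code, nil
}

func main() {
	code, err := lowerExpr("a + b*2")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, line := range code {
		fmt.Println(line)
	}
	// Prints:
	// t1 = b * 2
	// t2 = a + t1
}
```

Each instruction has at most one operator, so the nesting of the tree is replaced by an explicit order of evaluation.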
Optimize IR
- The goal is to make the program run faster and be more resource-efficient
- Techniques can include eliminating unnecessary calculations, reducing the number of instructions, or minimizing memory usage
- Optimizations run on IR in Static Single Assignment (SSA) form, where each variable is assigned exactly once
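The "assigned exactly once" property can be sketched by renaming variables in straight-line code. This is a simplified illustration, not the compiler's SSA construction: it handles no branches or loops, which would need phi functions. The instr type and toSSA helper are hypothetical.

```go
package main

import "fmt"

// instr is one instruction of the form: Dst = A Op B.
type instr struct{ Dst, A, Op, B string }

// toSSA renames variables in straight-line code so that every
// destination name is assigned exactly once.
func toSSA(prog []instr) []instr {
	version := map[string]int{}
	// use returns the current versioned name of an operand.
	use := func(name string) string {
		if v, ok := version[name]; ok {
			return fmt.Sprintf("%s%d", name, v)
		}
		return name // a constant, or an input never assigned here
	}
	out := make([]instr, 0, len(prog))
	for _, in := range prog {
		a, b := use(in.A), use(in.B) // read operands before redefining Dst
		version[in.Dst]++
		dst := fmt.Sprintf("%s%d", in.Dst, version[in.Dst])
		out = append(out, instr{dst, a, in.Op, b})
	}
	return out
}

func main() {
	prog := []instr{
		{"x", "1", "+", "2"},
		{"x", "x", "*", "3"},
		{"y", "x", "+", "x"},
	}
	for _, in := range toSSA(prog) {
		fmt.Printf("%s = %s %s %s\n", in.Dst, in.A, in.Op, in.B)
	}
	// Prints:
	// x1 = 1 + 2
	// x2 = x1 * 3
	// y1 = x2 + x2
}
```

Because each name now has a single definition, an optimizer can tell at a glance which value a use refers to, which simplifies passes such as dead code elimination.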
Machine Code Generation
- This phase converts the optimized IR into machine code, the binary instructions that the processor can execute
- The code generator maps each instruction in the IR to a sequence of machine-level instructions
- It takes into account the specifics of the target processor's architecture (such as the number and type of registers) during this translation
Linking and Execution
- Final step in the compilation
- Different pieces of generated machine code (from the source code files and libraries) are combined into a single executable program
- The linker resolves references to undefined symbols, like functions or variables declared in other files or libraries
- Once linking is completed, you have an executable file that the operating system can load into memory and the CPU can execute, thus running the program
Getting Started
- Add a log statement
- Add a panic()
The compiler itself provides logging, debugging, and visualization capabilities
$ go build -gcflags=-m=2                # print optimization info, including inlining, escape analysis
$ go build -gcflags=-W                  # print internal parse tree after type checking
$ GOSSAFUNC=Foo go build                # generate ssa.html file for func Foo
$ go build -gcflags=-S                  # print assembly
$ go tool compile -bench=out.txt x.go   # print timing of compiler phases
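To try these flags, it helps to have a small program with a function named Foo. The file below is a hypothetical example, not taken from the talk. It returns a pointer to a local variable, so the -m output will typically report that variable moving to the heap, though the exact wording varies between Go versions.

```go
package main

import "fmt"

// Foo returns a pointer to a local variable. Because the pointer
// outlives the call, escape analysis is expected to heap-allocate n.
func Foo(v int) *int {
	n := v * 2
	return &n
}

func main() {
	fmt.Println(*Foo(21)) // prints 42
}
```

Building it with GOSSAFUNC=Foo should produce an ssa.html file showing Foo after each compiler pass, and -gcflags=-S prints the assembly generated for it.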
Additional helpful tools
- compilebench: benchmarks the speed of the compiler
- benchstat: computes statistical summaries and A/B comparisons of Go benchmarks
- perflock: locking wrapper for running benchmarks on shared hosts
- view-annotated-file: view annotated files based on line-spec
cmd/compile contains the main packages that form the Go compiler
- /internal/syntax: lexer, parser
- /internal/types2: type checking
- /internal/types: compiler types
- /internal/ir: compiler AST
- /internal/noder: create compiler AST
- /internal/deadcode: dead code elimination
- /internal/inline: function call inlining
- /internal/devirtualize: devirtualization of known interface method calls
- /internal/escape: escape analysis
- cmd/compile/internal/walk: order of evaluation, desugaring
- cmd/compile/internal/ssa: SSA passes and rules
- cmd/compile/internal/ssagen: converting IR to SSA
Resources
- https://go.dev/src/cmd/compile/README
- https://pkg.go.dev/golang.org/x/tools/cmd/compilebench
- https://cs.opensource.google/go/x/perf
- https://github.com/aclements/perflock
- https://github.com/loov/view-annotated-file
https://slides.com/jenal/fossasia2024-hanoi
FOSSASIA Vietnam 2024 || Apr 8 - Apr 10
By Jyotsna Gupta
FOSSASIA 2024 | 8 Apr -10 Apr | Hanoi, Vietnam