Decoding the Black Box: Understanding the Go Compiler

FOSSASIA 2024 

@imJenal

Jyotsna Gupta


Hi! I'm Jyotsna

Compilation pipeline (each stage feeds the next):

  • Source Code Input
  • Read Source Code File
  • Lexer: Convert to Tokens
  • Parser: Create AST
  • Type Checking & Semantic Analysis
  • Generate Intermediate Representation (IR)
  • Optimize IR (SSA)
  • Generate Machine Code
  • Link Machine Code with Libraries
  • Create Executable Files
  • Executable Ready


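Lexer

  • Also known as the scanner
  • Breaks the source code down into tokens
  • Tokens are the smallest units of meaning
  • Identifies keywords, identifiers, literals, and operators
  • Discards whitespace and comments
  • In Go, it also inserts a semicolon automatically at a line break that follows certain tokens (such as an identifier, a literal, ")" or "}")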


Parser

  • Constructs a tree-like structure known as an Abstract Syntax Tree (AST)
  • Represents the grammatical structure of the code
  • Checks the code against the grammatical rules of the language to ensure it is structured correctly (e.g. a missing closing brace, or two statements on one line with no ";" between them, is reported as a syntax error)


Semantic Analysis

  • Checks that the AST adheres to the semantic rules of the language: type checking, scoping rules, and other constraints
  • Ensures that operations are performed on compatible types, that variables are declared before use, and that the code respects the language's rules (e.g. an integer cannot be called as a function)
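A sketch of semantic checking with the standard library's go/types. cmd/compile/internal/types2 is a port of go/types that works on the compiler's own syntax tree, so the kinds of errors are similar, though exact messages may differ between the two and across Go versions. typeCheck is an illustrative helper; the zero types.Config has no Importer, so the snippets must not import packages. All three snippets parse cleanly, so any error comes from the type checker.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// typeCheck parses one file and type-checks it with go/types.
// The zero Config has no Importer, so src must not import other packages.
func typeCheck(src string) error {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return err // a syntax error: the parser rejected it first
	}
	var conf types.Config
	_, err = conf.Check("example", fset, []*ast.File{f}, nil)
	return err // nil, or the first type error found
}

func main() {
	ok := "package p\nfunc f() int { x := 1; return x + 2 }\n"
	callInt := "package p\nfunc f() { x := 1; x() }\n"   // an int used as a function
	undeclared := "package p\nfunc f() int { return y }\n" // y is never declared

	fmt.Println(typeCheck(ok))
	fmt.Println(typeCheck(callInt))
	fmt.Println(typeCheck(undeclared))
}
```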


Intermediate Code Generation

  • IR: a lower-level representation of the program
  • Translates the AST into a sequence of simpler instructions
  • Independent of the target machine's architecture, but closer to machine language than the high-level source code
  • Serves as a middle ground on which optimizations are easier to apply


Optimize IR

  • The goal is to make the program run faster and use fewer resources
  • Techniques include eliminating unnecessary calculations, reducing the number of instructions, and minimizing memory usage
  • Many of these optimizations run on an IR in Static Single Assignment (SSA) form, in which each variable is assigned exactly once


Machine Code Generation

  • Converts the optimized IR into machine code: the binary instructions that the processor executes
  • The code generator maps each IR instruction to a sequence of machine-level instructions
  • It takes the target processor's architecture (such as the number and type of registers) into account during this translation


Linking and Execution

  • Final step in the compilation
  • The pieces of generated machine code (from the source files and the libraries they use) are combined into a single executable program
  • The linker resolves references to symbols that are not defined locally, such as functions or variables declared in other files or libraries
  • Once linking is complete, the result is an executable file that the operating system can load into memory and the CPU can execute


Getting Started


  • Add a log statement to the compiler's source
  • Add a panic() to get a stack trace showing how execution reached that point

The compiler itself also provides logging, debugging and visualization capabilities:


$ go build -gcflags=-m=2
# print optimization info, including inlining, escape analysis

$ go build -gcflags=-W
# print internal parse tree after type checking

$ GOSSAFUNC=Foo go build
# generate ssa.html file for func Foo

$ go build -gcflags=-S
# print assembly

$ go tool compile -bench=out.txt x.go
# print timing of compiler phases


Additional helpful tools

  • compilebench: benchmarks the speed of the compiler
  • benchstat: computes statistical summaries and A/B comparisons of Go benchmarks
  • perflock: locking wrapper for running benchmarks on shared hosts
  • view-annotated-file: views annotated files based on a line spec


cmd/compile contains the main packages that form the Go compiler:

  • cmd/compile/internal/syntax (lexer, parser)
  • cmd/compile/internal/types2 (type checking)
  • cmd/compile/internal/types (compiler types)
  • cmd/compile/internal/ir (compiler AST)
  • cmd/compile/internal/noder (create compiler AST)
  • cmd/compile/internal/deadcode (dead code elimination)
  • cmd/compile/internal/inline (function call inlining)
  • cmd/compile/internal/devirtualize (devirtualization of known interface method calls)
  • cmd/compile/internal/escape (escape analysis)
  • cmd/compile/internal/walk (order of evaluation, desugaring)
  • cmd/compile/internal/ssa (SSA passes and rules)
  • cmd/compile/internal/ssagen (converting IR to SSA)

Resources

  • https://go.dev/src/cmd/compile/README
     
  • https://pkg.go.dev/golang.org/x/tools/cmd/compilebench
     
  • https://cs.opensource.google/go/x/perf
     
  • https://github.com/aclements/perflock
     
  • https://github.com/loov/view-annotated-file

 


https://slides.com/jenal/fossasia2024-hanoi 

FOSSASIA Vietnam 2024 || Apr 8 - Apr 10

By Jyotsna Gupta


FOSSASIA 2024 | 8 Apr - 10 Apr | Hanoi, Vietnam
