COMP6771

Advanced C++ Programming

Non-assessable Lecture 1

Course Toolchain

What does "non-assessable" mean?

Unless otherwise specified via an official form of communication by Hayden, indicating a course retcon, we will not assess you on any material in this deck of lecture slides in any assignment or exam.

To help you distinguish assessable from non-assessable lecture material, the two kinds of slides will look visibly different.

Legal stuff

  • Christopher works for Google.
  • Nothing Christopher says is representative of Google.
  • Nothing Christopher says is representative of how Google practices software engineering.
  • All opinions are Christopher's own opinions.
  • Christopher's collaboration with UNSW is completely independent of his employment at Google.
  • This is true for the whole duration of the course, from Week 1 and beyond Week 10.
  • Google owns and maintains the following open-source libraries that we use:
    • Abseil
    • Google Benchmark

The Simplest C++ Program: simple.cpp

int main() {}

How do we compile simple.cpp?

Clang (LLVM project)

clang++-11 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple simple.cpp

GCC (GNU)

g++-10 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple simple.cpp

MSVC (Windows only)

cl.exe /std:c++latest /W4 /WX /EHsc /permissive- /Fe"simple.exe" simple.cpp

How compilation works

simple.cpp (int main() {}) -> Compiler -> simple

How compilation works:
Source files and translation units

Source files are program text stored in some file.

int main() {}

simple.cpp

A source file with all of its headers included is called a translation unit (or TU for short).

How compilation works:
Lexer

  1. Scans in characters from source files.
  2. Groups the characters into sequences called tokens.
  3. Passes tokens to parser.

int main() {}

token{kind::int_,   {.line=1,.col=1},  {.line=1,.col=4},  "int"}
token{kind::id_,    {.line=1,.col=5},  {.line=1,.col=9},  "main"}
token{kind::lparen, {.line=1,.col=9},  {.line=1,.col=10}, "("}
token{kind::rparen, {.line=1,.col=10}, {.line=1,.col=11}, ")"}
token{kind::lcurly, {.line=1,.col=12}, {.line=1,.col=13}, "{"}
token{kind::rcurly, {.line=1,.col=13}, {.line=1,.col=14}, "}"}
token{kind::eof,    {.line=1,.col=14}, {.line=1,.col=14}, "$"}

Pipeline: simple.cpp -> Lexer -> Tokens

How compilation works:
Parser

  1. Checks to make sure that tokens are ordered according to the grammar.
  2. Generates syntax errors for tokens out of place (e.g. main int()).
  3. If no errors, generates an intermediate representation (e.g. directed acyclic graph) to represent "special" tokens.
  4. Gives IR to the semantic analyser (Checker).

int main() {}

|translation_unit
-|declaration_seq
--|declaration
---|function_definition
----|return_type:         "int" @ {1,1}..{1,4}
----|identifier:          "main" @ {1,5}..{1,9}
----|parameters:          none
----|function_body
-----|compound_statement: empty
-|eof

Pipeline: simple.cpp -> Lexer -> Tokens -> Parser -> IR
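The ordering check in step 1 can be sketched as follows. This assumes a drastically simplified, hypothetical grammar that accepts exactly one shape (an empty function with no parameters), whereas the real C++ grammar is far larger. The names kind, spelling, and check_empty_function are made up for illustration.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical token kinds; a real parser would receive full tokens from the lexer.
enum class kind { int_, id_, lparen, rparen, lcurly, rcurly, eof };

auto spelling(kind k) -> std::string {
    switch (k) {
    case kind::int_: return "int";
    case kind::id_: return "identifier";
    case kind::lparen: return "(";
    case kind::rparen: return ")";
    case kind::lcurly: return "{";
    case kind::rcurly: return "}";
    case kind::eof: return "end of file";
    }
    return "unknown";
}

// Toy grammar: int identifier ( ) { } end-of-file.
// Returns std::nullopt if the tokens are in order, otherwise a message
// describing the first token that is out of place.
auto check_empty_function(std::vector<kind> const& tokens) -> std::optional<std::string> {
    static constexpr auto expected = std::array{
        kind::int_, kind::id_, kind::lparen, kind::rparen,
        kind::lcurly, kind::rcurly, kind::eof};
    for (auto i = std::size_t{0}; i < expected.size(); ++i) {
        if (i >= tokens.size() || tokens[i] != expected[i]) {
            return "syntax error at token " + std::to_string(i + 1)
                 + ": expected " + spelling(expected[i]);
        }
    }
    return std::nullopt;
}
```

The tokens for int main() {} pass the check, while the tokens for main int() are rejected at the first token, because an identifier appears where int was expected.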

How compilation works:
Checker

  1. Checks to make sure the semantic rules are upheld.
  2. Annotates IR received from parser.
  3. Gives annotated IR to the next stage.
int main() {}
|translation_unit
-|declaration_seq
--|declaration
---|function_definition
----|return_type:         "int" @ {1,1}..{1,4}
----|identifier:          "main" @ {1,5}..{1,9}
----|parameters:          none
----|function_body
-----|compound_statement
------|return_statement
-------|primary_expression: 0
-|eof

Pipeline: simple.cpp -> Lexer -> Tokens -> Parser -> IR -> Checker -> IR'

How compilation works:
Code Generator

Generates a target file equivalent to the source file.

The example below is x86_64 assembly.

int main() {}

main:                                   # @main
        push    rbp
        mov     rbp, rsp
        xor     eax, eax
        pop     rbp
        ret

Pipeline: simple.cpp -> Lexer -> Tokens -> Parser -> IR -> Checker -> IR' -> CodeGen -> Target

How translation works:
Linker

simple.cpp (int main() {}) -> Compiler -> Linker -> Target program

How translation works:
Assembler

hello.cpp

#include <iostream>

int main() {
  std::cout << "Hi\n";
}

hello.cpp -> Compiler -> Target program (x86_64 assembly) -> Assembler -> hello.o
C++ Standard Library code -> Compiler -> Target program (x86_64 assembly) -> Assembler -> libc++.so

How translation works:
Linker

hello.cpp

#include <iostream>

int main() {
  std::cout << "Hi\n";
}

hello.cpp -> Compiler -> Target program (x86_64 assembly) -> Assembler -> hello.o
hello.o + libc++.so (pre-compiled library) -> Linker -> hello program

A flag missing from first year?

Clang (LLVM project)

clang++-11 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple -O3 simple.cpp

GCC (GNU)

g++-10 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple -O3 simple.cpp

MSVC (Windows only)

cl.exe /std:c++latest /W4 /WX /EHsc /permissive- /Fe"simple.exe" /Ox simple.cpp

How compilation works:
Optimiser

  • Transforms the original mapping into something equivalent, but "better".
  • Might optimise for speed or size.
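As an illustration of "equivalent, but better", the two hypothetical functions below compute the same result for every input. An optimiser is free to turn code shaped like the first into something closer to the second, provided the observable behaviour is unchanged. Whether a particular compiler actually does so at a given optimisation level is not guaranteed, so treat this as a sketch of the idea rather than a description of any specific compiler.

```cpp
#include <cstdint>

// What the programmer wrote: add up 1 + 2 + ... + n with a loop.
// The counter is 64-bit so that i <= n always terminates for a 32-bit n.
auto sum_loop(std::uint32_t n) -> std::uint64_t {
    auto total = std::uint64_t{0};
    for (auto i = std::uint64_t{1}; i <= n; ++i) {
        total += i;
    }
    return total;
}

// An equivalent mapping with no loop at all: n * (n + 1) / 2.
// The arithmetic is done in 64 bits, so it cannot overflow for a 32-bit n.
auto sum_closed_form(std::uint32_t n) -> std::uint64_t {
    auto const m = std::uint64_t{n};
    return m * (m + 1) / 2;
}
```

Both functions return 5050 for n = 100. The second does a fixed amount of work regardless of n, which is the sense in which it is "better" for speed.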

hello.cpp

#include <iostream>

int main() {
  std::cout << "Hi\n";
}

Pipeline: hello.cpp -> Lexer -> Tokens -> Parser -> IR -> Checker -> IR' -> Optimiser -> IR'' -> CodeGen -> Target

Example on Compiler Explorer

How translation works:
Link-time optimisation

Clang (LLVM project)

clang++-11 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple -O3 -flto=thin simple.cpp

GCC (GNU)

g++-10 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple -O3 -flto simple.cpp

MSVC (Windows only)

cl.exe /std:c++latest /W4 /WX /EHsc /permissive- /Fe"simple.exe" /Ox /GL simple.cpp

The compiler's optimiser can't make optimisations across different object files.

If you compile first.cpp today, second.cpp tomorrow, and link them three days from now, how can the compiler reasonably optimise on that?

The linker has all the object files at the same time, so it's able to optimise across object files during linking.
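A minimal sketch of the situation, using the two file names from the question above. The contents of first.cpp and second.cpp are hypothetical, and both are shown in a single block only so that the example compiles on its own. In a real project they would be separate source files compiled to separate object files.

```cpp
// ---- first.cpp (hypothetical) ----
// The body of square lives only in first.o.
auto square(int x) -> int {
    return x * x;
}

// ---- second.cpp (hypothetical) ----
// second.cpp sees only this declaration, not the body. When it is compiled
// on its own, the optimiser cannot inline square into sum_of_squares,
// because the definition is in a different object file.
auto square(int x) -> int;

auto sum_of_squares(int a, int b) -> int {
    return square(a) + square(b);
}
```

With link-time optimisation enabled, the linker sees both object files together, so it has the information needed to inline square into sum_of_squares if it judges that worthwhile.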

How translation works:
Link-time optimisation

Clang (LLVM project)

clang++-11 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple -O3 -flto=thin -fuse-ld=lld simple.cpp

GCC (GNU)

g++-10 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple -O3 -flto -fuse-ld=gold simple.cpp

MSVC (Windows only)

cl.exe /std:c++latest /W4 /WX /EHsc /permissive- /Fe"simple.exe" /Ox /GL simple.cpp

What's this got to do with the course toolchain, exactly?

What is a toolchain anyway?

A set of programming tools used to build a project:

  • Compiler (clang++-11)
  • Linker (lld-11)
  • Linter (clang-tidy-11)
  • Package manager (vcpkg)
  • Debugger (lldb-11)
  • Libraries
  • Standard library (libc++-11, libc++abi-11)

Visual example

// word_ladder.cpp
// implements word_ladder::generate

// lexicon.cpp
// implements word_ladder::read_lexicon

// word_ladder_test.cpp
// tests word_ladder::generate

#ifndef COMP6771_WORD_LADDER_HPP
#define COMP6771_WORD_LADDER_HPP

// headers...

namespace word_ladder {
    [[nodiscard]] auto read_lexicon(std::string const& path) -> std::unordered_set<std::string>;

    auto generate(std::string const&, std::string const&, std::unordered_set<std::string> const&)
        -> std::vector<std::vector<std::string>>;
} // namespace word_ladder

#endif // COMP6771_WORD_LADDER_HPP

Visual example

#ifndef COMP6771_WORD_LADDER_HPP
#define COMP6771_WORD_LADDER_HPP

// headers...

namespace word_ladder {
    [[nodiscard]] auto read_lexicon(std::string const& path) -> absl::flat_hash_set<std::string>;

    auto generate(std::string const&, std::string const&, absl::flat_hash_set<std::string> const&)
        -> std::vector<std::vector<std::string>>;
} // namespace word_ladder

#endif // COMP6771_WORD_LADDER_HPP

// word_ladder.cpp (recompiled)
// implements word_ladder::generate

// lexicon.cpp (recompiled)
// implements word_ladder::read_lexicon

// word_ladder_test.cpp (forgot to recompile)
// tests word_ladder::generate
ld.lld: error: undefined symbol: word_ladder::generate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
>>> referenced by word_ladder_test1.cpp
>>>               word_ladder_test1.o:(____C_A_T_C_H____T_E_S_T____0())
ld.lld: error: undefined symbol: word_ladder::generate(std::string const&, std::string const&, std::unordered_set<std::string> const&)
>>> referenced by word_ladder_test1.cpp
>>>               word_ladder_test1.o:(____C_A_T_C_H____T_E_S_T____0())

Linker error!
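The error comes down to the fact that a function's parameter types are part of its identity: two functions with the same name but different parameter types are different functions, and the linker sees them as different symbols. The sketch below shows this with a hypothetical function named describe. std::set stands in for absl::flat_hash_set so that the example needs only the standard library.

```cpp
#include <set>
#include <string>
#include <unordered_set>

// Matches the "old" header: takes a std::unordered_set.
auto describe(std::unordered_set<std::string> const&) -> std::string {
    return "unordered_set overload";
}

// Matches the "new" header: a different parameter type, so a different function.
// (std::set is an assumption standing in for absl::flat_hash_set.)
auto describe(std::set<std::string> const&) -> std::string {
    return "set overload";
}
```

An object file compiled against the old header asks the linker for the first symbol. If the implementation was recompiled against the new header, only the second symbol exists, so the linker reports the first as undefined.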

Feeling dizzy yet?
We're only compiling three source files...

We should automate this process!

Shell scripts aren't enough because they don't understand the notion of a dependency. They'll either compile everything every time (slow) or compile only what you explicitly ask for, leaving you to track what changed (error-prone).

A build system automates the process of compiling and linking the edited parts of a program so that you don't need to worry about the process more than once.
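The core decision a build system makes for each output can be sketched as below. This is a toy model written for these notes, not how make or ninja is implemented: the names rule, timestamps, and needs_rebuild are hypothetical, and modification times are plain integers instead of real file timestamps.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model: file name -> "last modified" time (bigger is newer).
using timestamps = std::map<std::string, int>;

struct rule {
    std::string output;              // e.g. "word_ladder_test.o"
    std::vector<std::string> inputs; // e.g. the .cpp file and the headers it includes
};

// An output needs rebuilding if it does not exist yet,
// or if any of its inputs is newer than it.
// Inputs missing from the map are ignored in this sketch.
auto needs_rebuild(rule const& r, timestamps const& modified) -> bool {
    auto const out = modified.find(r.output);
    if (out == modified.end()) {
        return true; // never built
    }
    for (auto const& input : r.inputs) {
        auto const in = modified.find(input);
        if (in != modified.end() && in->second > out->second) {
            return true; // a dependency changed after the last build
        }
    }
    return false;
}
```

In this model, editing word_ladder.hpp makes it newer than every object file that lists it as an input, so those objects are rebuilt automatically and the stale-object mistake cannot happen.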

Examples: make, ninja, Maven, Apache Ant, Cargo

Problem: build systems are good for a single toolchain.

What if we wanted to build for all three major operating systems?

Need to write three build scripts???

Yuck!

What if we wanted to build for all available major toolchains?

Need to write build scripts per OS, per toolchain???

Double yuck!!

CMake

CMake is a build system generator.

We state what we want; let CMake work out how to write the build script.

We'll now switch over to a live demo where we set up a project.

Toolchain files

A toolchain file is a file that contains all the details about your toolchain.

You tell CMake where it is by defining CMAKE_TOOLCHAIN_FILE.

CMake then uses this toolchain file to generate all the toolchain-specific build rules.

Our toolchain files are located in config/cmake/toolchain.

Course Toolchain

By cs6771
