COMP6771
Advanced C++ Programming
Non-assessable Lecture 1
Course Toolchain
What does "non-assessable" mean?
Unless otherwise specified via an official form of communication by Hayden, indicating a course retcon, we will not assess you on any material in this deck of lecture slides in any assignment or exam.
To help you distinguish between assessable and non-assessable lecture material, they will be visibly different.
Legal stuff
- Christopher works for Google.
- Nothing Christopher says is representative of Google.
- Nothing Christopher says is representative of how Google practices software engineering.
- All opinions are Christopher's own opinions.
- Christopher's collaboration with UNSW is completely independent of his employment at Google.
- This holds for the whole duration of the course, from Week 1 through Week 10 and beyond.
Google owns and maintains the following open-source libraries that we use:
- Abseil
- Google Benchmark
The Simplest C++ Program: simple.cpp
int main() {}
How do we compile simple.cpp?
Clang (LLVM project)
clang++-11 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple simple.cpp
GCC (GNU)
g++-10 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple simple.cpp
MSVC (Windows only)
cl.exe /std:c++latest /W4 /WX /EHsc /permissive- /Fe"simple.exe" simple.cpp
How compilation works
Diagram: simple.cpp (int main() {}) -> Compiler -> simple
How compilation works:
Source files and translation units
Source files are program text stored in some file.
int main() {}
simple.cpp
A source file with all of its headers included is called a translation unit (or TU for short).
How compilation works:
Lexer
- Scans in characters from source files.
- Groups the characters into sequences called tokens.
- Passes tokens to parser.
simple.cpp:
int main() {}
Tokens:
token{kind::int_, {.line=1,.col=1}, {.line=1,.col=4}, "int"}
token{kind::id_, {.line=1,.col=5}, {.line=1,.col=9}, "main"}
token{kind::lparen, {.line=1,.col=9}, {.line=1,.col=10}, "("}
token{kind::rparen, {.line=1,.col=10}, {.line=1,.col=11}, ")"}
token{kind::lcurly, {.line=1,.col=12}, {.line=1,.col=13}, "{"}
token{kind::rcurly, {.line=1,.col=13}, {.line=1,.col=14}, "}"}
token{kind::eof, {.line=1,.col=14}, {.line=1,.col=14}, "$"}
How compilation works:
Parser
- Checks that the tokens are ordered according to the grammar.
- Generates syntax errors for tokens that are out of place.
- If there are no errors, generates an intermediate representation (IR), e.g. a directed acyclic graph, that represents the program's structure.
- Gives the IR to the semantic analyser (Checker).
int main() {}
|translation_unit
-|declaration_seq
--|declaration
---|function_definition
----|return_type: "int" @ {1,1}..{1,4}
----|identifier: "main" @ {1,5}..{1,9}
----|parameters: none
----|function_body
-----|compound_statement: empty
-|eof
Diagram: simple.cpp -> Lexer -> Tokens -> Parser -> IR
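As a rough illustration of "tokens ordered according to the grammar", the sketch below checks a token sequence against a toy grammar with exactly one rule: `int` identifier `(` `)` `{` `}` end-of-file. The `kind` enumeration and `first_syntax_error` are hypothetical names. A real parser is recursive and builds an IR, whereas this one only reports where the first out-of-place token sits.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical token kinds, matching the lexer slide.
enum class kind { int_, id_, lparen, rparen, lcurly, rcurly, eof };

// Returns the index of the first out-of-place token, or std::nullopt if the
// sequence matches the toy grammar exactly.
auto first_syntax_error(std::vector<kind> const& tokens) -> std::optional<std::size_t> {
    constexpr auto expected = std::array{kind::int_,   kind::id_,    kind::lparen, kind::rparen,
                                         kind::lcurly, kind::rcurly, kind::eof};
    for (auto i = std::size_t{0}; i < expected.size(); ++i) {
        // A missing token and a wrong token are both syntax errors.
        if (i >= tokens.size() || tokens[i] != expected[i]) {
            return i;
        }
    }
    // Anything after the end-of-file marker is also out of place.
    if (tokens.size() > expected.size()) {
        return expected.size();
    }
    return std::nullopt;
}
```

For the tokens of `int main() {}` this returns `std::nullopt`. For `int main) {}`, where the `(` is missing, it returns index 2.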
How compilation works:
Checker
- Checks to make sure the semantic rules are upheld.
- Annotates IR received from parser.
- Gives annotated IR to the next stage.
int main() {}
|translation_unit
-|declaration_seq
--|declaration
---|function_definition
----|return_type: "int" @ {1,1}..{1,4}
----|identifier: "main" @ {1,5}..{1,9}
----|parameters: none
----|function_body
-----|compound_statement
------|return_statement
-------|primary_expression: 0
-|eof
Diagram: simple.cpp -> Lexer -> Tokens -> Parser -> IR -> Checker -> IR'
How compilation works:
Code Generator
Generates target file equivalent to source file.
The example below is x86_64 assembly.
int main() {}
main:                                   # @main
        push    rbp
        mov     rbp, rsp
        xor     eax, eax
        pop     rbp
        ret
Diagram: simple.cpp -> Lexer -> Tokens -> Parser -> IR -> Checker -> IR' -> CodeGen -> Target
How translation works:
Linker
Diagram: simple.cpp (int main() {}) -> Compiler -> Linker -> Target program
How translation works:
Assembler
hello.cpp
#include <iostream>
int main() {
    std::cout << "Hi\n";
}
Diagram: hello.cpp -> Compiler -> x86_64 assembly -> Assembler -> hello.o
Diagram: C++ Standard Library code -> Compiler -> x86_64 assembly -> Assembler -> libc++.so
How translation works:
Linker
Diagram: hello.cpp -> Compiler -> x86_64 assembly -> Assembler -> hello.o
Diagram: hello.o + libc++.so (pre-compiled library) -> Linker -> hello program
A flag missing from first year?
Clang (LLVM project)
clang++-11 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple -O3 simple.cpp
GCC (GNU)
g++-10 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple -O3 simple.cpp
MSVC (Windows only)
cl.exe /std:c++latest /W4 /WX /EHsc /permissive- /Fe"simple.exe" /Ox simple.cpp
How compilation works:
Optimiser
- Transforms the original mapping into something equivalent, but "better".
- Might optimise for speed or size.
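As a hedged illustration of "equivalent, but better": the two functions below compute the same result for small inputs, and an optimiser is allowed to turn code shaped like the first into something closer to the second. Whether a given compiler actually does so depends on the compiler, its version, and the flags. The names `sum_to` and `sum_to_closed_form` are made up for this sketch.

```cpp
// What the programmer wrote: add up 1 + 2 + ... + n with a loop.
// Assumes n is small enough that the total fits in an unsigned.
constexpr auto sum_to(unsigned const n) -> unsigned {
    auto total = 0u;
    for (auto i = 1u; i <= n; ++i) {
        total += i;
    }
    return total;
}

// What an optimiser may effectively produce: no loop, a handful of
// arithmetic instructions. Same assumption on n as above.
constexpr auto sum_to_closed_form(unsigned const n) -> unsigned {
    return n * (n + 1) / 2;
}

// Both forms agree; the compiler checks this at compile time.
static_assert(sum_to(100) == 5050);
static_assert(sum_to_closed_form(100) == 5050);
```

Pasting `sum_to` into Compiler Explorer and comparing the output at `-O0` and `-O3` is one way to see what your own compiler does with it.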
Diagram: hello.cpp -> Lexer -> Tokens -> Parser -> IR -> Checker -> IR' -> Optimiser -> IR'' -> CodeGen -> Target
Example on Compiler Explorer
How translation works:
Link-time optimisation
Clang (LLVM project)
clang++-11 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple -O3 -flto=thin simple.cpp
GCC (GNU)
g++-10 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple -O3 -flto simple.cpp
MSVC (Windows only)
cl.exe /std:c++latest /W4 /WX /EHsc /permissive- /Fe"simple.exe" /Ox /GL simple.cpp
The compiler's optimiser can't make optimisations across different object files.
If you compile first.cpp today, second.cpp tomorrow, and link them three days from now, how can the compiler reasonably optimise on that?
The linker has all the object files at the same time, so it's able to optimise across object files during linking.
How translation works:
Link-time optimisation
Clang (LLVM project)
clang++-11 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple -O3 -flto=thin -fuse-ld=lld simple.cpp
GCC (GNU)
g++-10 -std=c++20 -Wall -Wextra -pedantic -Werror -o simple -O3 -flto -fuse-ld=gold simple.cpp
MSVC (Windows only)
cl.exe /std:c++latest /W4 /WX /EHsc /permissive- /Fe"simple.exe" /Ox /GL simple.cpp
What's this got to do with the course toolchain, exactly?
What is a toolchain anyway?
Set of programming tools used to build a project.
Compiler (clang++-11)
Linker (lld-11)
Linter (clang-tidy-11)
Package manager (vcpkg)
Debugger (lldb-11)
Libraries
Standard library (libc++-11, libc++abi-11)
// word_ladder.cpp
// implements word_ladder::generate
// lexicon.cpp
// implements word_ladder::read_lexicon
// word_ladder_test.cpp
// tests word_ladder::generate
Visual example
#ifndef COMP6771_WORD_LADDER_HPP
#define COMP6771_WORD_LADDER_HPP
// headers...
namespace word_ladder {
[[nodiscard]] auto read_lexicon(std::string const& path) -> std::unordered_set<std::string>;
auto generate(std::string const&, std::string const&, std::unordered_set<std::string> const&)
-> std::vector<std::vector<std::string>>;
} // namespace word_ladder
#endif // COMP6771_WORD_LADDER_HPP
Visual example
#ifndef COMP6771_WORD_LADDER_HPP
#define COMP6771_WORD_LADDER_HPP
// headers...
namespace word_ladder {
[[nodiscard]] auto read_lexicon(std::string const& path) -> absl::flat_hash_set<std::string>;
auto generate(std::string const&, std::string const&, absl::flat_hash_set<std::string> const&)
-> std::vector<std::vector<std::string>>;
} // namespace word_ladder
#endif // COMP6771_WORD_LADDER_HPP
Forgot to recompile
Recompiled
Recompiled
// word_ladder.cpp
// implements word_ladder::generate
// lexicon.cpp
// implements word_ladder::read_lexicon
// word_ladder_test.cpp
// tests word_ladder::generate
ld.lld: error: undefined symbol: word_ladder::generate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
>>> referenced by word_ladder_test1.cpp
>>> word_ladder_test1.o:(____C_A_T_C_H____T_E_S_T____0())
ld.lld: error: undefined symbol: word_ladder::generate(std::string const&, std::string const&, std::unordered_set<std::string> const&)
>>> referenced by word_ladder_test1.cpp
>>> word_ladder_test1.o:(____C_A_T_C_H____T_E_S_T____0())
Linker error!
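Why does a stale object file show up as an undefined symbol rather than a type error? A function's parameter types are part of its identity, so `generate` taking one set type and `generate` taking another are two different functions with two different linker symbols. The sketch below shows this with ordinary overloading. The `demo` namespace and `which` are hypothetical, and `std::set` stands in for `absl::flat_hash_set` so that the example builds without Abseil.

```cpp
#include <set>
#include <string>
#include <unordered_set>

namespace demo { // hypothetical namespace, not the course's word_ladder

// Same name, different parameter type: two distinct functions, and therefore
// two distinct symbols as far as the linker is concerned.
auto which(std::unordered_set<std::string> const&) -> std::string {
    return "unordered_set version";
}

auto which(std::set<std::string> const&) -> std::string {
    return "set version";
}

} // namespace demo
```

An object file compiled against the old header still asks the linker for the `unordered_set` version. If the implementation was recompiled against the new header, only the other version exists, so the linker cannot find the symbol it was asked for.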
Feeling dizzy yet?
We're only compiling three source files...
We should automate this process!
Shell scripts aren't enough because they don't understand the notion of a dependency. They'll either compile everything every time (slow) or compile only what you explicitly ask for (easy to get wrong, as the linker error above shows).
A build system automates the process of compiling and linking the edited parts of a program so that you don't need to worry about the process more than once.
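The core idea of dependency tracking can be sketched in a few lines. This is a simplified, make-style model and not the actual logic of any particular build system: the `target` struct and `needs_rebuild` are hypothetical, and timestamps are plain integers where a larger value means "more recent".

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model of one build output (e.g. an object file).
struct target {
    long built_at;            // when the output was last produced
    std::vector<long> inputs; // last-modified times of the files it depends on
};

// An output is stale if any of its inputs changed after it was built.
auto needs_rebuild(target const& t) -> bool {
    return std::any_of(t.inputs.begin(), t.inputs.end(),
                       [&t](long const modified) { return modified > t.built_at; });
}
```

For an object file built at time 10, an input last modified at time 12 (say, an edited header) makes `needs_rebuild` return `true`. If every input is older than 10, it returns `false` and the build system can skip that file.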
Examples: make, ninja, Maven, Apache Ant, Cargo
Problem: build systems are good for a single toolchain.
What if we wanted to build for all three major operating systems?
Need to write three build scripts???
Yuck!
What if we wanted to build for all available major toolchains?
Need to write build scripts per OS, per toolchain???
Double yuck!!
CMake
CMake is a build system generator.
We state what we want; let CMake work out how to write the build script.
We'll now switch over to a live demo where we set up a project.
Toolchain files
A toolchain file is a file that contains all the details about your toolchain.
You tell CMake where it is by defining CMAKE_TOOLCHAIN_FILE.
CMake then uses this toolchain file to generate all the toolchain-specific build rules.
Our toolchain files are located in
config/cmake/toolchain
Course Toolchain
By cs6771