Your Performance Todo List
The most important optimisation opportunities and pitfalls to remember about
by
Jan Bielak
Jan Bielak
Warsaw Staszic High School, Poland
Self-taught C++ Developer
Realtime rendering
Game development
janbielak.com github.com/janekb04 youtube.com/@janbielak
Practically Correct, Just-in-Time Shell Script Parallelization
Konstantinos Kallas, Tammam Mustafa, Jan Bielak, Dimitris Karnikis, Thurston H.Y. Dang, Michael Greenberg, Nikos Vasilakis. 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI22)
Performance
1. No unnecessary work
- No unnecessary copying
- No unnecessary allocations
2. Use all computing power
- Use all cores
- Use SIMD
3. Avoid waits and stalls
- Lockless data structures
- Asynchronous APIs
- Job Systems
4. Use hardware efficiently
- Cache friendliness
- Well predictable code
5. OS-level efficiency
Effective use of C++
Build pipeline modification
Manual hardware oriented optimisations
Build pipeline modification
1. Enable compiler optimisations
Optimize for speed
GCC, LLVM, ICC | -O2 or -O3 |
---|---|
MSVC | /Ox or /O2 |
Optimize for size
GCC, LLVM, ICC | -Os |
---|---|
MSVC | /O1 |
Optimization #1
Longer compile time
Build pipeline modification
2. Set target architecture
For x86
GCC, LLVM, ICC | -march=native -mtune=native (automatic detection of current processor's features) |
---|---|
MSVC | /arch:IA32 or /arch:SSE or /arch:SSE2 or /arch:AVX or /arch:AVX2 or /arch:AVX512 (needs to be specified manually) |
For ARM
GCC, LLVM | -mcpu=native (automatic detection of current processor's features) |
---|---|
MSVC | /arch:ARMv7VE or /arch:VFPv4 or /arch:armv8.0 ... or /arch:armv8.8 (needs to be specified manually) |
Build pipeline modification
3. Use fast math
GCC, LLVM | -ffast-math (included in -Ofast) |
---|---|
MSVC | /fp:fast |
ICC | -fp-model=fast |
Faster computation
Less precise results
Non standard-compliant
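Why "less precise" and "non standard-compliant"? A minimal sketch, assuming IEEE 754 doubles and a build without fast math: floating-point addition is not associative, while fast-math modes allow the compiler to reassociate as if it were. The function names are hypothetical.

```cpp
#include <cassert>

// Sums three values strictly left to right: (a + b) + c.
double sum_left_to_right(double a, double b, double c) {
    return (a + b) + c;
}

// Sums the same values right to left: a + (b + c).
double sum_right_to_left(double a, double b, double c) {
    return a + (b + c);
}
```

With a = 1e20, b = -1e20, c = 1.0 the two orders give different results, so a compiler that is free to pick either order can change the output of the program.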
Build pipeline modification
4. Disable exceptions and RTTI
No exceptions
GCC, LLVM, ICC | -fno-exceptions |
---|---|
MSVC | /EHs-c- /D_HAS_EXCEPTIONS=0 |
No RTTI
GCC, LLVM, ICC | -fno-rtti |
---|---|
MSVC | /GR- |
Limited performance gains
Non standard-compliant
Breaks code using exceptions
Build pipeline modification
5. Enable Link Time Optimization
[Diagram: three separate compiler invocations feeding one linker]
GCC, LLVM | -flto |
---|---|
MSVC | /GL |
ICC | -ipo |
Build pipeline modification
6. Use Unity Builds
[Diagram: compiler and linker stages, with and without a Unity Build]
CMake | -DCMAKE_UNITY_BUILD=ON |
---|
Build pipeline modification
7. Link statically
Static Linking | Dynamic Linking |
---|---|
Better optimisable | More space efficient |
| Can be updated independently of executable |
Build pipeline modification
8. Use Profile Guided Optimisation
Build pipeline (instrumented build)
GCC, LLVM | -fprofile-generate |
---|---|
MSVC | /GENPROFILE |
ICC | -prof-gen |
Execute
Build pipeline (build using the recorded profile)
GCC, LLVM | -fprofile-use |
---|---|
MSVC | /USEPROFILE |
ICC | -prof-use |
Build pipeline modification
9. Try different compilers
Build pipeline modification
10. Try different standard libraries
Build pipeline modification
11. Keep your tools updated
Build pipeline modification
12. Preload with a replacement lib
Linux, BSD | env LD_PRELOAD=/usr/lib/libSUPERmalloc.so ./myprogram |
---|---|
macOS | env DYLD_INSERT_LIBRARIES=/usr/lib/libSUPERmalloc.dylib ./myprogram |
Windows | Requires DLL injection |
Build pipeline modification
13. Use binary post processing tools
LLVM BOLT
perf record
perf2bolt
llvm-bolt
Effective use of C++
Build pipeline modification
Manual hardware oriented optimisations
Your Performance Todo List
- Enable compiler optimisations
- Set target architecture
- Use fast math
- Disable exceptions and RTTI
- Enable Link Time Optimisation
- Use Unity Builds
- Link statically
- Use Profile Guided Optimisation
- Try different compilers
- Try different standard library implementations
- Keep your tools updated
- Preload your program with a replacement library
- Use binary post processing tools
Annotate your code
14. Constexpr all the things
Effective use of C++
Constant expressions:
Literals:
1, 3.0f, nullptr, "Hello"
Arithmetic:
2 + 3, 4.0 / 3.0
Sizes and alignments:
sizeof(int), alignof(std::vector<int>)
...
14. Constexpr all the things
Effective use of C++
Constexpr functions: invocation MAY be a constant expression
constexpr int f(int x) { return 3 * x + 5; }
f(5) // is a constant expression
int x;
std::cin >> x;
f(x); // is NOT a constant expression
Is a given invocation evaluated at compile time? (inside function body)
if (std::is_constant_evaluated()) { ... }
if consteval { ... }
Immediate functions: invocation MUST be a constant expression
consteval int f(int x) { return 3 * x + 5; }
f(5) // is a constant expression
int x;
std::cin >> x;
f(x); // is a COMPILE ERROR
If constexpr:
if constexpr (compile_time_condition) {...}
if constexpr (std::is_constant_evaluated()) {...} // ALWAYS TRUE
14. Constexpr all the things
Effective use of C++
Constexpr variables:
constexpr std::array<int, 5> primes{ 2, 3, 5, 7, 11 };
variable must be initialised at its declaration (by a constant expression)
constexpr int x; // is a COMPILE ERROR
x = 3;
implies const
primes[0] = 1; // is a COMPILE ERROR
accessing it is a constant expression
constexpr int f(int x) { return x + 1; }
int main()
{
int x1 = 3;
constexpr int y1 = f(x1); // is a COMPILE ERROR
constexpr int x2 = 3;
constexpr int y2 = f(x2);
}
Constinit variables:
constinit std::array<int, 5> primes{ 2, 3, 5, 7, 11 };
variable must be initialised at its declaration by a constant expression
14. Constexpr all the things
Effective use of C++
Constexpr variables:
constexpr std::array<int, 5> primes{ 2, 3, 5, 7, 11 };
Constinit variables:
constinit std::array<int, 5> primes{ 2, 3, 5, 7, 11 };
Constexpr functions:
constexpr int f(int x) { return 3 * x + 5; }
Is a given invocation evaluated at compile time?
if (std::is_constant_evaluated()) { ... }
if consteval { ... }
Immediate functions:
consteval int f(int x) { return 3 * x + 5; }
If constexpr:
if constexpr (compile_time_condition) {...}
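A minimal sketch of how a constexpr function and a constexpr variable combine, written against C++17 only (so no consteval or constinit). The names is_prime and first_primes are hypothetical; the table is built by the compiler because it initialises a constexpr variable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Trial-division primality test, usable at compile time and at run time.
constexpr bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Builds a table of the first N primes.
template <std::size_t N>
constexpr std::array<int, N> first_primes() {
    std::array<int, N> result{};
    std::size_t count = 0;
    for (int candidate = 2; count < N; ++candidate)
        if (is_prime(candidate))
            result[count++] = candidate;
    return result;
}

// Initialising a constexpr variable forces compile-time evaluation.
constexpr auto primes = first_primes<5>();
static_assert(primes[4] == 11, "table is computed at compile time");
```

The same is_prime can still be called with a run-time argument; only then is it evaluated at run time.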
15. Make variables const
Effective use of C++
std::vector<float>
get_mean_deltas(std::vector<float> data)
{
float sum = 0;
for (auto&& num : data)
sum += num;
for (auto& num : data)
num -= sum / data.size();
return data;
}
Declare variables const
15. Make variables const
Effective use of C++
std::vector<float>
get_mean_deltas(std::vector<float> data)
{
const float sum = std::accumulate(
data.begin(),
data.end(),
0.0f
);
for (auto& num : data)
num -= sum / data.size();
return data;
}
std::vector<float>
get_mean_deltas(std::vector<float> data)
{
const float sum = std::accumulate(
data.begin(),
data.end(),
0.0f
);
const float __mean = sum / data.size();
for (auto& num : data)
num -= __mean;
return data;
}
sum is const...
...so this expression is loop-invariant and can be hoisted
and no expensive division in loop!
~ compiler's thought process (paraphrased)
Declare variables const
15. Make variables const
Effective use of C++
template <typename T>
class vector {
T* begin;
T* end;
T* capacity;
/* ... */
public:
constexpr size_t size() const noexcept {
return end - begin;
}
};
template <typename T>
class vector {
T* begin;
T* end;
T* capacity;
/* ... */
public:
constexpr size_t size(this const vector& self) noexcept {
return self.end - self.begin;
}
};
Declare member functions const
15. Make variables const
Effective use of C++
Copy globals to const locals
(if copying is cheap)
struct {
/* ... */
bool fill;
} _internal__state;
void set_draw_mode_filled();
void set_draw_mode_wireframe();
void draw_mesh(const mesh* m) {
for (const primitive* prim = m->begin(); prim != m->end(); ++prim) {
if(_internal__is_frontfacing(*prim)) {
if (_internal__state.fill) {
_internal__draw_prim_filled(*prim);
}
else {
_internal__draw_prim_wireframe(*prim);
}
}
}
}
15. Make variables const
Effective use of C++
Copy globals to const locals
(if copying is cheap)
void draw_mesh(const mesh* m) {
for (const primitive* prim = m->begin(); prim != m->end(); ++prim)
if(_internal__is_frontfacing(*prim))
if (_internal__state.fill)
_internal__draw_prim_filled(*prim);
else
_internal__draw_prim_wireframe(*prim);
}
void draw_mesh(const mesh* m) {
if (_internal__state.fill)
for (const primitive* prim = m->begin(); prim != m->end(); ++prim)
if(_internal__is_frontfacing(*prim))
_internal__draw_prim_filled(*prim);
else
for (const primitive* prim = m->begin(); prim != m->end(); ++prim)
if(_internal__is_frontfacing(*prim))
_internal__draw_prim_wireframe(*prim);
}
_internal__is_frontfacing and the _internal__draw_prim_* calls could modify _internal__state.fill, so the compiler cannot hoist the check out of the loop
15. Make variables const
Effective use of C++
Copy globals to const locals
(if copying is cheap)
void draw_mesh(const mesh* m) {
const bool fill = _internal__state.fill;
for (const primitive* prim = m->begin(); prim != m->end(); ++prim)
if(_internal__is_frontfacing(*prim))
if (fill)
_internal__draw_prim_filled(*prim);
else
_internal__draw_prim_wireframe(*prim);
}
void draw_mesh(const mesh* m) {
if (_internal__state.fill)
for (const primitive* prim = m->begin(); prim != m->end(); ++prim)
if(_internal__is_frontfacing(*prim))
_internal__draw_prim_filled(*prim);
else
for (const primitive* prim = m->begin(); prim != m->end(); ++prim)
if(_internal__is_frontfacing(*prim))
_internal__draw_prim_wireframe(*prim);
}
the calls could modify _internal__state.fill, but we don't care: the const local copy cannot change
16. Noexcept all the things
Effective use of C++
void f();
COULD throw an exception
void f() noexcept;
WILL NEVER throw an exception
void f() noexcept(true);
void f() noexcept(false);
template <typename T>
void swap(T& lhs, T& rhs)
noexcept(std::is_nothrow_move_constructible_v<T>
&& std::is_nothrow_move_assignable_v<T>)
{
T tmp = std::move(lhs);
lhs = std::move(rhs);
rhs = std::move(tmp);
}
noexceptness depends on T
16. Noexcept all the things
Effective use of C++
template <typename T>
void swap(T& lhs, T& rhs)
noexcept(noexcept(T(std::move(lhs)))
&& noexcept(lhs = std::move(rhs)))
{
T tmp = std::move(lhs);
lhs = std::move(rhs);
rhs = std::move(tmp);
}
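One place where noexcept pays off in practice is std::vector growth: when it reallocates, it moves elements only if the move constructor cannot throw and otherwise falls back to copying. A small sketch under that assumption; Tracked, Counters and relocate_one_element are hypothetical names used only to make the effect observable.

```cpp
#include <cassert>
#include <vector>

struct Counters {
    int copies = 0;
    int moves = 0;
};

// Element type whose move constructor is noexcept or not, depending on the flag.
template <bool NoexceptMove>
struct Tracked {
    static inline Counters counters{};
    Tracked() = default;
    Tracked(const Tracked&) { ++counters.copies; }
    Tracked(Tracked&&) noexcept(NoexceptMove) { ++counters.moves; }
};

// Puts one element into a vector, forces a reallocation, and reports
// how that single element was transferred to the new storage.
template <bool NoexceptMove>
Counters relocate_one_element() {
    using T = Tracked<NoexceptMove>;
    T::counters = Counters{};
    std::vector<T> v;
    v.emplace_back();             // constructed in place: no copy, no move
    v.reserve(v.capacity() + 1);  // forces a reallocation
    return T::counters;
}
```

With the noexcept move constructor the element is moved; without it the vector copies instead, to keep its strong exception guarantee.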
17. Use static for internal linkage
Effective use of C++
Static variables
int counter() {
static int counter = 0;
return ++counter;
}
Static member functions
namespace fs = std::filesystem;
struct image {
static image from_file(fs::path path);
};
17. Use static for internal linkage
Effective use of C++
static int global_value;
static void global_func();
Internal linkage variables
Internal linkage functions
a.cpp
b.cpp
// Forward declarations
extern int global_value;
void global_func();
extern int global_value2;
void global_func2();
//Use
void example() {
global_value = 42;
global_func();
global_value2 = 42;
global_func2();
}
17. Use static for internal linkage
Effective use of C++
a.cpp
// Internal linkage variables
static int global_value;
// Internal linkage functions
static void global_func();
// External linkage
int global_value2;
void global_func2();
b.cpp
// Forward declarations
extern int global_value;
void global_func();
extern int global_value2;
void global_func2();
//Use
void example() {
global_value = 42; // unresolved external symbol
global_func(); // unresolved external symbol
global_value2 = 42;
global_func2();
}
18. Use [[noreturn]]
Effective use of C++
[[noreturn]] void Log::Error(const String& msg) {
logfile << msg << '\n';
std::cerr << msg << '\n';
throw Engine::RuntimeError(msg);
}
19. Use [[likely]] and [[unlikely]]
Effective use of C++
bool require_init = true;
void init_lib();
void internal_work();
void work()
{
if(require_init) {
init_lib();
require_init = false;
}
internal_work();
}
19. Use [[likely]] and [[unlikely]]
Effective use of C++
bool require_init = true;
void init_lib();
void internal_work();
void work()
{
if(require_init) [[unlikely]] {
init_lib();
require_init = false;
}
internal_work();
}
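A runnable sketch of the lazy-initialisation pattern above with the attribute applied. The attribute is C++20; the counters and the namespace name are hypothetical and exist only so the behaviour can be checked.

```cpp
#include <cassert>

namespace lazy_init_sketch {

bool require_init = true;
int init_calls = 0;  // hypothetical counter: how often init_lib ran
int work_calls = 0;  // hypothetical counter: how often internal_work ran

void init_lib() { ++init_calls; }
void internal_work() { ++work_calls; }

void work() {
    // True only on the very first call, so hint that this branch is cold.
    if (require_init) [[unlikely]] {
        init_lib();
        require_init = false;
    }
    internal_work();
}

}  // namespace lazy_init_sketch
```

The attribute does not change what the program computes; it only gives the optimiser a hint about which path to lay out as the fast one.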
20. Use [[assume(condition)]];
Effective use of C++
C++23 | [[assume(condition)]]; |
---|---|
GCC | if (!condition) __builtin_unreachable(); |
MSVC, ICC | __assume(condition); |
LLVM | __builtin_assume(condition); |
20. Use [[assume(condition)]];
Effective use of C++
[[assume(condition)]]; | assert(condition); |
---|---|
Condition must be true | Condition must be true |
For the optimiser | For the programmer |
If !condition then Undefined Behaviour | If !condition then std::abort() in Debug Mode, no-op in Release Mode |
20. Use [[assume(condition)]];
Effective use of C++
Assume that pointer is non null*
*better use a reference
void implementation(internal_t* obj) {
if (obj) {
internal_work(*obj);
}
}
void interface(public_t* obj) {
if (obj) {
[[assume(obj->internal)]];
implementation(obj->internal);
}
}
Assume pointer alignment*
*or use std::assume_aligned
void limiter(float* samples, size_t count) {
[[assume(reinterpret_cast<std::uintptr_t>(samples) % 32 == 0)]];
[[assume(count > 0)]];
for (size_t i = 0; i < count; ++i) {
samples[i] = std::clamp(samples[i], -1.0f, 1.0f);
}
}
example taken from P1774 (the [[assume]] proposal)
20. Use [[assume(condition)]];
Effective use of C++
Declare a code path unreachable*
*or use std::unreachable
const char* get_name(TextureType type) {
switch(type) {
case TextureType::Texture2D:
return "Texture2D";
case TextureType::Texture3D:
return "Texture3D";
case TextureType::Texture2DArray:
return "Texture2DArray";
case TextureType::Cubemap:
return "Cubemap";
default:
[[assume(false)]];
}
}
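Before C++23 the per-compiler spellings from the table can be wrapped in a macro. A sketch only: SKETCH_ASSUME and round_up_to_32 are hypothetical names, and the fallback branch simply drops the hint.

```cpp
#include <cassert>

// Hypothetical portability macro built from the per-compiler spellings.
#if defined(__clang__)
#  define SKETCH_ASSUME(cond) __builtin_assume(cond)
#elif defined(__GNUC__)
#  define SKETCH_ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#elif defined(_MSC_VER)
#  define SKETCH_ASSUME(cond) __assume(cond)
#else
#  define SKETCH_ASSUME(cond) ((void)0)
#endif

// Rounds n up to a multiple of 32. The caller promises n > 0;
// passing n <= 0 is undefined behaviour, not a checked error.
int round_up_to_32(int n) {
    SKETCH_ASSUME(n > 0);
    return (n + 31) / 32 * 32;
}
```

Unlike assert, the macro never checks anything: it is a promise to the optimiser, so it should only state conditions the caller really guarantees.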
21. Use __restrict
Effective use of C++
float* __restrict buffer0;
float* __restrict buffer1;
UB if overlap
21. Use __restrict
Effective use of C++
pointer provenance
21. Use __restrict
Effective use of C++
GCC, LLVM, ICC | __attribute__((malloc)) |
---|---|
MSVC | __declspec(restrict) |
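A minimal sketch of __restrict on function parameters, where it usually matters most. The function names are hypothetical. The promise is that the three buffers do not overlap; calling add_arrays with overlapping buffers would be undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>

// out, a and b are promised not to overlap, so the compiler may vectorise
// the loop without emitting run-time overlap checks.
void add_arrays(float* __restrict out,
                const float* __restrict a,
                const float* __restrict b,
                std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        out[i] = a[i] + b[i];
}

// Small self-check with three distinct buffers.
bool add_arrays_demo() {
    const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float out[4] = {};
    add_arrays(out, a, b, 4);
    return out[0] == 11.0f && out[3] == 44.0f;
}
```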
22. Make functions pure
Effective use of C++
f
param0
param1
output
GCC, LLVM, ICC | __attribute__((pure)) or [[gnu::pure]] |
---|---|
MSVC | Not Supported |
f
param0
param1
output
GCC, LLVM, ICC | __attribute__((const)) or [[gnu::const]] |
---|---|
MSVC | Not Supported |
global state
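A short sketch of the two GCC-style attributes, assuming GCC or Clang (other compilers are expected to ignore the unknown attributes, possibly with a warning). The functions are hypothetical: a const function depends only on its argument values, while a pure function has no side effects but may read memory or global state.

```cpp
#include <cassert>
#include <cstddef>

// const: the result depends only on the argument value; no memory is read.
[[gnu::const]] int square(int x) {
    return x * x;
}

// pure: no side effects, but the function reads the pointed-to array.
[[gnu::pure]] std::size_t count_zeros(const int* data, std::size_t count) {
    std::size_t zeros = 0;
    for (std::size_t i = 0; i < count; ++i)
        if (data[i] == 0) ++zeros;
    return zeros;
}

// Small self-check for count_zeros.
std::size_t count_zeros_demo() {
    const int values[] = {0, 1, 0, 2};
    return count_zeros(values, 4);
}
```

Both attributes let the compiler merge or drop repeated calls with the same inputs; marking a function that does have side effects this way is a bug the compiler will not catch.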
Effective use of C++
Build pipeline modification
Manual hardware oriented optimisations
Your Performance Todo List
- Enable compiler optimisations
- Set target architecture
- Use fast math
- Disable exceptions and RTTI
- Enable Link Time Optimisation
- Use Unity Builds
- Link statically
- Use Profile Guided Optimisation
- Try different compilers
- Try different standard library implementations
- Keep your tools updated
- Preload your program with a replacement library
- Use binary post processing tools
14. Use constexpr
15. Make variables const
16. Use noexcept
17. Use static for internal linkage
18. Use [[noreturn]]
19. Use [[likely]] and [[unlikely]]
20. Use [[assume]]
21. Mark pointers restrict
22. Mark functions as pure
Annotate your code
No redundant copies
23. Take parameters properly
Effective use of C++
declaration? void func(??? x);
call site: func(x);
[Flowchart, flattened to condition and choice pairs]
- if x is a range:
  - if x needs to be a contiguous array: take std::span
  - else if x can be an arbitrary range: take std::ranges::***
  - else if x needs to be a specific container: take the container
  - else: take iterator pair
- if x can be null: take std::optional of x
- if needing ownership of x: take by unique_ptr, shared_ptr
- if x needs to be perfectly forwarded: take by "universal reference" (type&& x)
- if x is copied: take by value (type x)
- if x is moved from: take by rvalue reference (type&& x)
- if x is modified: take by lvalue reference (type& x)
- otherwise (x is only read from):
  - if x is cheap to copy: take by value (type x)
  - else: take by const lvalue reference (const type& x)
23. Take parameters properly
Effective use of C++
void f(const std::string& s);
f("Hello"); // implicit conversion to string (allocation); safe - lifetime of temporary extended
void f(const char* s);
f(std::string{"Hello"}.c_str()); // verbose (safe)
void f(std::string_view s);
f(std::string{"Hello"});
f("Hello"); // works for both (no copies) (safe)
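A small runnable sketch of the std::string_view variant. The function count_vowels is hypothetical; the point is that one signature accepts a literal, a std::string and a substring without constructing a temporary std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Read-only string parameter taken as std::string_view.
std::size_t count_vowels(std::string_view text) {
    constexpr std::string_view vowels = "aeiouAEIOU";
    std::size_t count = 0;
    for (char c : text)
        if (vowels.find(c) != std::string_view::npos)
            ++count;
    return count;
}
```

The view does not own its characters, so the function must not store it beyond the call unless the caller guarantees the lifetime.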
23. Take parameters properly
Effective use of C++
[Flowchart as before, with one added check]
- if x is a readonly string: take std::string_view
23. Take parameters properly
Effective use of C++
[Flowchart as before, with one more added check]
- if x is an invocable, try in this order:
  - std::invocable<Args...> auto&& x
  - return_t(*x)(Args...)
  - std::move_only_function<return_t(Args...)>&& x
  - std::function<return_t(Args...)> x
23. Take parameters properly
Effective use of C++
[Flowchart as before, with one more added check]
- if x is a raw memory address: use a raw pointer
24. Avoid allocations in loops
Effective use of C++
move objects out of loops
.clear() if necessary
while (true) {
std::string line;
std::getline(std::cin, line);
if (!std::cin)
break;
process_line(line);
}
std::string line;
while (true) {
std::getline(std::cin, line);
if (!std::cin)
break;
process_line(line);
}
reserve() when an upper bound on size is known ahead of time
std::vector<int> shiny;
for (int i = 1; i <= 100; ++i)
if (is_shiny(i))
shiny.push_back(i);
std::vector<int> shiny;
shiny.reserve(100);
for (int i = 1; i <= 100; ++i)
if (is_shiny(i))
shiny.push_back(i);
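To make the effect of reserve() visible, the sketch below counts how often a vector's capacity changes while it grows. The helper count_reallocations is hypothetical, and the exact count without reserve() depends on the standard library's growth factor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pushes `count` ints and returns how many times the capacity changed,
// i.e. how many times the vector had to reallocate while growing.
std::size_t count_reallocations(std::size_t count, bool reserve_first) {
    std::vector<int> values;
    if (reserve_first)
        values.reserve(count);
    std::size_t reallocations = 0;
    std::size_t last_capacity = values.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(static_cast<int>(i));
        if (values.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = values.capacity();
        }
    }
    return reallocations;
}
```

With reserve() the loop itself never allocates; without it, every capacity change is an allocation plus a transfer of all existing elements.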
25. Avoid copying exceptions
Effective use of C++
catch by reference
catch(std::exception e) {
std::cerr << e.what() << '\n';
}
catch(const std::exception& e) {
std::cerr << e.what() << '\n';
}
rethrow current exception
catch(mutable_err& e) {
e.append("Caught in foo");
throw e;
}
catch(mutable_err& e) {
e.append("Caught in foo");
throw;
}
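Besides the extra copy, `throw e;` also slices: the new exception object has the static type of e. A sketch with a hypothetical two-level hierarchy (base_error, derived_error) that shows the difference from a plain `throw;`.

```cpp
#include <cassert>
#include <stdexcept>

struct base_error : std::runtime_error {
    explicit base_error(const char* what) : std::runtime_error(what) {}
};
struct derived_error : base_error {
    explicit derived_error(const char* what) : base_error(what) {}
};

// Catches by base reference, rethrows either with `throw;` or `throw e;`,
// and reports whether the outer handler still sees a derived_error.
bool derived_type_survives(bool rethrow_current) {
    try {
        try {
            throw derived_error("boom");
        } catch (base_error& e) {
            if (rethrow_current)
                throw;  // same exception object, dynamic type preserved
            throw e;    // copy-initialises a new base_error: sliced
        }
    } catch (derived_error&) {
        return true;
    } catch (base_error&) {
        return false;
    }
}
```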
26. Avoid copies in range-for
Effective use of C++
std::vector<std::string> names;
for (auto name : names) {
process(name);
}
std::vector<std::string> names;
for (const auto& name : names) {
process(name);
}
avoid copying the iterated object
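The difference can be counted. In this sketch CopyCounted is a hypothetical element type that records how often it is copy-constructed; the two loops differ only in `auto` versus `const auto&`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts how often it is copied.
struct CopyCounted {
    static inline int copies = 0;
    int value = 0;
    explicit CopyCounted(int v) : value(v) {}
    CopyCounted(const CopyCounted& other) : value(other.value) { ++copies; }
};

// Iterates by value: one copy per element.
int sum_by_value(const std::vector<CopyCounted>& items) {
    int sum = 0;
    for (auto item : items) sum += item.value;
    return sum;
}

// Iterates by const reference: no copies.
int sum_by_reference(const std::vector<CopyCounted>& items) {
    int sum = 0;
    for (const auto& item : items) sum += item.value;
    return sum;
}

struct LoopCopies { int by_value; int by_reference; };

// Runs both loops over three elements and reports the copies each one made.
LoopCopies measure_loop_copies() {
    std::vector<CopyCounted> items;
    items.reserve(3);
    for (int i = 1; i <= 3; ++i) items.emplace_back(i);
    CopyCounted::copies = 0;
    sum_by_value(items);
    const int by_value = CopyCounted::copies;
    CopyCounted::copies = 0;
    sum_by_reference(items);
    return {by_value, CopyCounted::copies};
}
```

For a std::string element each of those copies may also be a heap allocation, which is what the slide's example avoids.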
27. Avoid copies in lambda captures
Effective use of C++
std::flat_set<std::string, std::less<>> deviceLayers;
auto supported = [deviceLayers](std::string_view layer) {
return deviceLayers.contains(layer);
};
std::flat_set<std::string, std::less<>> deviceLayers;
auto supported = [&deviceLayers](std::string_view layer) {
return deviceLayers.contains(layer);
};
capture [&object]
28. Avoid copies in structured bindings
Effective use of C++
auto [first_person, age] = *map.begin();
const auto& [first_person, age] = *map.begin();
bind reference
29. Provide ref qualified methods
Effective use of C++
template <typename T>
class simple_optional {
T data;
bool has_data;
public:
/* *** */
T& value() {
if (!has_data)
throw bad_optional_access();
return data;
}
const T& value() const {
if (!has_data)
throw bad_optional_access();
return data;
}
};
simple_optional<Queue> get_transfer_queue();
try {
Queue q = get_transfer_queue().value();
// ...
Queue gets copied
29. Provide ref qualified methods
Effective use of C++
template <typename T>
class simple_optional {
T data;
bool has_data;
public:
/* *** */
T& value() & {
if (!has_data)
throw bad_optional_access();
return data;
}
const T& value() const& {
if (!has_data)
throw bad_optional_access();
return data;
}
T&& value() && {
if (!has_data)
throw bad_optional_access();
return std::move(data);
}
};
simple_optional<Queue> get_transfer_queue();
try {
Queue q = get_transfer_queue().value();
// ...
Queue gets moved
29. Provide ref qualified methods
Effective use of C++
template <typename T>
class simple_optional {
T data;
bool has_data;
public:
/* *** */
decltype(auto) value(this auto&& self) {
if (!self.has_data)
throw bad_optional_access();
return std::forward_like<decltype(self)>(self.data);
}
};
no code duplication
Effective use of C++
Build pipeline modification
Manual hardware oriented optimisations
Your Performance Todo List
- Enable compiler optimisations
- Set target architecture
- Use fast math
- Disable exceptions and RTTI
- Enable Link Time Optimisation
- Use Unity Builds
- Link statically
- Use Profile Guided Optimisation
- Try different compilers
- Try different standard library implementations
- Keep your tools updated
- Preload your program with a replacement library
- Use binary post processing tools
14. Use constexpr
15. Make variables const
16. Use noexcept
17. Use static for internal linkage
18. Use [[noreturn]]
19. Use [[likely]] and [[unlikely]]
20. Use [[assume]]
21. Mark pointers restrict
22. Mark functions as pure
Annotate your code
No redundant copies
23. Take function parameters properly
24. Avoid allocations in loops
25. Avoid copying exceptions
26. Avoid copies in range-for
27. Avoid copies in lambda captures
28. Avoid copies in structured bindings
29. Provide && method overloads
Cache-friendly code
Memory
Is memory a contiguous sequence of bytes?
C++ Standard: NO
Process address space: YES (logical, virtual address space)
Virtual address space in the Physical address space: NO
Physical address space: YES
Hardware caching: Not even a sequence...
Virtual memory
[Diagram labels: C++ memory model, Process address space, Page table, memory page, Physical address space, Caches]
Access virtual memory address
Translate to physical address
Get data
Swap
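The three steps (access a virtual address, translate it, get the data) can be sketched with a toy model. Everything here is an assumption for illustration: a 4 KiB page size, a flat map standing in for the page table, and the names ToyPageTable and make_example_table. Real page tables are multi-level structures walked by hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Assumed page size for this toy model.
constexpr std::uint64_t kPageSize = 4096;

struct ToyPageTable {
    // virtual page number -> physical frame number
    std::unordered_map<std::uint64_t, std::uint64_t> frames;

    // Translates a virtual address. An empty result models a page that is
    // not resident (for example swapped out) and would cause a page fault.
    std::optional<std::uint64_t> translate(std::uint64_t virtual_address) const {
        const std::uint64_t page = virtual_address / kPageSize;
        const std::uint64_t offset = virtual_address % kPageSize;
        const auto it = frames.find(page);
        if (it == frames.end())
            return std::nullopt;
        return it->second * kPageSize + offset;
    }
};

// Two virtually adjacent pages mapped to non-adjacent physical frames.
ToyPageTable make_example_table() {
    ToyPageTable table;
    table.frames[0] = 7;
    table.frames[1] = 2;
    return table;
}
```

The example table shows why the virtual address space is contiguous while its image in physical memory is not: neighbouring virtual pages 0 and 1 land in frames 7 and 2.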