Self-taught C++ Developer
Realtime rendering
Game development
janbielak.com github.com/janekb04 youtube.com/@janbielak
Practically Correct, Just-in-Time Shell Script Parallelization
Konstantinos Kallas, Tammam Mustafa, Jan Bielak, Dimitris Karnikis, Thurston H.Y. Dang, Michael Greenberg, Nikos Vasilakis. 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI22)
Optimize for speed
GCC, LLVM, ICC | -O2 or -O3 |
---|---|
MSVC | /Ox or /O2 |
Optimize for size
GCC, LLVM, ICC | -Os |
---|---|
MSVC | /O1 |
Optimization #1
Longer compile time
For x86
GCC, LLVM, ICC | -march=native -mtune=native |
---|---|
MSVC | /arch:IA32 or /arch:SSE or /arch:SSE2 or /arch:AVX or /arch:AVX2 or /arch:AVX512 |
For ARM
GCC, LLVM | -mcpu=native |
---|---|
MSVC | /arch:ARMv7VE or /arch:VFPv4 or /arch:armv8.0 ... or /arch:armv8.8 |
native flags: automatic detection of current processor's features
MSVC /arch: needs to be specified manually
GCC, LLVM | -ffast-math (included in -Ofast) |
---|---|
MSVC | /fp:fast |
ICC | -fp-model=fast |
Faster computation
Less precise results
Non standard-compliant
GCC, LLVM, ICC | -fno-exceptions |
---|---|
MSVC | /EHs-c- /D_HAS_EXCEPTIONS=0 |
GCC, LLVM, ICC | -fno-rtti |
---|---|
MSVC | /GR- |
No exceptions
No RTTI
Limited performance gains
Non standard-compliant
Breaks code using exceptions
[Diagram: three Compiler stages feeding one Linker]
GCC, LLVM | -flto |
---|---|
MSVC | /GL |
ICC | -ipo |
[Diagram: Compiler stages feeding one Linker]
Unity Build
CMake | -DCMAKE_UNITY_BUILD=ON |
---|
Static Linking | Dynamic Linking |
---|---|
Better optimisable | More space efficient |
 | Can be updated independently of executable |
[Diagram: Build pipeline, then Execute, then Build pipeline]
GCC, LLVM | -fprofile-generate |
---|---|
MSVC | /GENPROFILE |
ICC | -prof-gen |
GCC, LLVM | -fprofile-use |
---|---|
MSVC | /USEPROFILE |
ICC | -prof-use |
Linux, BSD
env LD_PRELOAD=/usr/lib/libSUPERmalloc.so ./myprogram
macOS
env DYLD_INSERT_LIBRARIES=/usr/lib/libSUPERmalloc.dylib ./myprogram
Windows
Requires DLL injection
LLVM BOLT
perf record
perf2bolt
llvm-bolt
Annotate your code
Constant expressions:
Literals:
1, 3.0f, nullptr, "Hello"
Arithmetic:
2 + 3, 4.0 / 3.0
Sizes and alignments:
sizeof(int), alignof(std::vector<int>)
...
Constexpr functions:
constexpr int f(int x) { return 3 * x + 5; }
invocation MAY be a constant expression
f(5)    // is a constant expression
int x;
std::cin >> x;
f(x);   // is NOT a constant expression
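The two cases above can be sketched as follows. This is a minimal illustration that reuses `f` from the slide; `at_compile_time` and `at_run_time` are hypothetical names introduced only for this example.

```cpp
#include <cassert>

// Same function as on the slide: usable at compile time and at run time.
constexpr int f(int x) { return 3 * x + 5; }

// f(5) is a constant expression, so it can feed a static_assert
// and initialise a constexpr variable.
static_assert(f(5) == 20, "evaluated by the compiler");
constexpr int at_compile_time = f(5);

// Hypothetical helper: the argument is only known at run time,
// so this invocation of f is not a constant expression.
int at_run_time(int runtime_value) {
    return f(runtime_value);
}
```

If the `static_assert` compiles, the compiler has evaluated `f(5)` itself; the run-time call goes through the same function body.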
Is a given invocation evaluated at compile time?
if (std::is_constant_evaluated()) { ... }
if consteval { ... }
Immediate functions:
consteval int f(int x) { return 3 * x + 5; }
invocation MUST be a constant expression
f(5)    // is a constant expression
int x;
std::cin >> x;
f(x);   // is a COMPILE ERROR
(inside function body)
If constexpr:
if constexpr (compile_time_condition) {...}
if constexpr (std::is_constant_evaluated()) {...}    // ALWAYS TRUE
Constexpr variables:
constexpr std::array<int, 5> primes{ 2, 3, 5, 7, 11 };
variable must be initialised at its declaration
constexpr int x;    // is a COMPILE ERROR
x = 3;
constexpr implies const
primes[0] = 1;      // is a COMPILE ERROR
constexpr int f(int x) { return x + 1; }
int main()
{
    int x1 = 3;
    constexpr int y1 = f(x1);    // is a COMPILE ERROR
    constexpr int x2 = 3;        // accessing it is a constant expression
    constexpr int y2 = f(x2);
}
Constinit variables:
constinit std::array<int, 5> primes{ 2, 3, 5, 7, 11 };
variable must be initialised at its declaration by a constant expression
Constexpr variables:
constexpr std::array<int, 5> primes{ 2, 3, 5, 7, 11 };
Constinit variables:
constinit std::array<int, 5> primes{ 2, 3, 5, 7, 11 };
Constexpr functions:
constexpr int f(int x) { return 3 * x + 5; }
Is a given invocation evaluated at compile time?
if (std::is_constant_evaluated()) { ... }
if consteval { ... }
Immediate functions:
consteval int f(int x) { return 3 * x + 5; }
If constexpr:
if constexpr (compile_time_condition) {...}
std::vector<float>
get_mean_deltas(std::vector<float> data)
{
float sum = 0;
for (auto&& num : data)
sum += num;
for (auto& num : data)
num -= sum / data.size();
return data;
}
Declare variables const
std::vector<float>
get_mean_deltas(std::vector<float> data)
{
const float sum = std::accumulate(
data.begin(),
data.end(),
0.0f
);
for (auto& num : data)
num -= sum / data.size();
return data;
}
std::vector<float>
get_mean_deltas(std::vector<float> data)
{
const float sum = std::accumulate(
data.begin(),
data.end(),
0.0f
);
const float __mean = sum / data.size();
for (auto& num : data)
num -= __mean;
return data;
}
sum is const...
...so this expression is loop-invariant and can be hoisted
and no expensive division in loop!
~ compiler's thought process (paraphrased)
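For reference, a self-contained version of the hoisted form, written by hand rather than left to the optimiser. The guard for an empty vector is an addition made for this sketch and is not in the original slides.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

std::vector<float> get_mean_deltas(std::vector<float> data)
{
    if (data.empty())
        return data;  // added guard: avoids dividing by zero

    // Both values are const, so the division happens once, outside the loop.
    const float sum = std::accumulate(data.begin(), data.end(), 0.0f);
    const float mean = sum / static_cast<float>(data.size());

    for (auto& num : data)
        num -= mean;
    return data;
}
```

Calling `get_mean_deltas({1.0f, 2.0f, 3.0f})` subtracts the mean 2 from every element.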
Declare variables const
template <typename T>
class vector {
T* begin;
T* end;
T* capacity;
/* ... */
public:
constexpr size_t size() const noexcept {
return end - begin;
}
};
template <typename T>
class vector {
T* begin;
T* end;
T* capacity;
/* ... */
public:
constexpr size_t size(this const vector& self) noexcept {
return self.end - self.begin;
}
};
Declare member functions const
Copy globals to const locals
(if copying is cheap)
struct {
/* ... */
bool fill;
} _internal__state;
void set_draw_mode_filled();
void set_draw_mode_wireframe();
void draw_mesh(const mesh* m) {
for (const primitive* prim = m->begin(); prim != m->end(); ++prim) {
if(_internal__is_frontfacing(*prim)) {
if (_internal__state.fill) {
_internal__draw_prim_filled(*prim);
}
else {
_internal__draw_prim_wireframe(*prim);
}
}
}
}
Copy globals to const locals
(if copying is cheap)
void draw_mesh(const mesh* m) {
for (const primitive* prim = m->begin(); prim != m->end(); ++prim)
if(_internal__is_frontfacing(*prim))
if (_internal__state.fill)
_internal__draw_prim_filled(*prim);
else
_internal__draw_prim_wireframe(*prim);
}
void draw_mesh(const mesh* m) {
    if (_internal__state.fill) {
        for (const primitive* prim = m->begin(); prim != m->end(); ++prim)
            if (_internal__is_frontfacing(*prim))
                _internal__draw_prim_filled(*prim);
    } else {
        for (const primitive* prim = m->begin(); prim != m->end(); ++prim)
            if (_internal__is_frontfacing(*prim))
                _internal__draw_prim_wireframe(*prim);
    }
}
could modify _internal__state.fill
could modify _internal__state.fill
Copy globals to const locals
(if copying is cheap)
void draw_mesh(const mesh* m) {
const bool fill = _internal__state.fill;
for (const primitive* prim = m->begin(); prim != m->end(); ++prim)
if(_internal__is_frontfacing(*prim))
if (fill)
_internal__draw_prim_filled(*prim);
else
_internal__draw_prim_wireframe(*prim);
}
void draw_mesh(const mesh* m) {
    if (_internal__state.fill) {
        for (const primitive* prim = m->begin(); prim != m->end(); ++prim)
            if (_internal__is_frontfacing(*prim))
                _internal__draw_prim_filled(*prim);
    } else {
        for (const primitive* prim = m->begin(); prim != m->end(); ++prim)
            if (_internal__is_frontfacing(*prim))
                _internal__draw_prim_wireframe(*prim);
    }
}
could modify _internal__state.fill
but we don't care
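A compact, compilable sketch of the same idea. All names here (`render_state`, `g_state`, `draw_filled`, `draw_wireframe`, `draw_all`) are hypothetical stand-ins for the internal state and draw calls on the slides.

```cpp
#include <cassert>
#include <vector>

// Hypothetical global state, standing in for _internal__state.
struct render_state { bool fill = true; };
render_state g_state;

int filled_count = 0;
int wireframe_count = 0;

// In a real program these might live in another translation unit,
// so the compiler would have to assume they can write to g_state.
void draw_filled(int) { ++filled_count; }
void draw_wireframe(int) { ++wireframe_count; }

void draw_all(const std::vector<int>& prims) {
    const bool fill = g_state.fill;  // read the global exactly once
    for (int prim : prims) {
        if (fill)                    // loop-invariant: nothing can change it
            draw_filled(prim);
        else
            draw_wireframe(prim);
    }
}
```

Because `fill` is a const local, no callee can change it, which is what permits moving the branch out of the loop. Note the behavioural difference: a change to the global made during the loop is deliberately ignored.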
void f();
COULD throw an exception
void f() noexcept;
WILL NEVER throw an exception
void f() noexcept(true);
void f() noexcept(false);
template <typename T>
void swap(T& lhs, T& rhs)
    noexcept(std::is_nothrow_move_constructible_v<T>
          && std::is_nothrow_move_assignable_v<T>)
{
    T tmp = std::move(lhs);
    lhs = std::move(rhs);
    rhs = std::move(tmp);
}
noexceptness
depends on T
template <typename T>
void swap(T& lhs, T& rhs)
    noexcept(noexcept(T(std::move(lhs)))
          && noexcept(lhs = std::move(rhs)))
{
    T tmp = std::move(lhs);
    lhs = std::move(rhs);
    rhs = std::move(tmp);
}
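One way to check that a conditional `noexcept` specification evaluates as intended is to query it with the `noexcept` operator. In this sketch the function is renamed `my_swap` to avoid clashing with `std::swap`, and `may_throw` is a hypothetical type whose move operations are declared potentially throwing.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template <typename T>
void my_swap(T& lhs, T& rhs)
    noexcept(std::is_nothrow_move_constructible_v<T>
          && std::is_nothrow_move_assignable_v<T>)
{
    T tmp = std::move(lhs);
    lhs = std::move(rhs);
    rhs = std::move(tmp);
}

// Hypothetical type whose move operations are allowed to throw.
struct may_throw {
    may_throw() = default;
    may_throw(may_throw&&) noexcept(false) {}
    may_throw& operator=(may_throw&&) noexcept(false) { return *this; }
};

// The noexcept operator reports what the specification evaluated to.
static_assert(noexcept(my_swap(std::declval<int&>(), std::declval<int&>())),
              "swapping ints cannot throw");
static_assert(!noexcept(my_swap(std::declval<may_throw&>(), std::declval<may_throw&>())),
              "swapping may_throw objects is potentially throwing");
```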
Static variables
int counter() {
    static int counter = 0;
    return ++counter;
}
Static member functions
namespace fs = std::filesystem;
struct image {
    static image from_file(fs::path path);
};
Internal linkage variables
Internal linkage functions
a.cpp
static int global_value;
static void global_func();
int global_value2;
void global_func2();
b.cpp
// Forward declarations
extern int global_value;
void global_func();
extern int global_value2;
void global_func2();
// Use
void example() {
    global_value = 42;    // unresolved external symbol
    global_func();        // unresolved external symbol
    global_value2 = 42;
    global_func2();
}
?
?
[[noreturn]] void Log::Error(const String& msg) {
logfile << msg << '\n';
std::cerr << msg << '\n';
throw Engine::RuntimeError(msg);
}
bool require_init = true;
void init_lib();
void internal_work();
void work()
{
if(require_init) {
init_lib();
require_init = false;
}
internal_work();
}
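A sketch of how the lazy-initialisation check above might be annotated with the C++20 `[[unlikely]]` attribute. The bodies of `init_lib` and `internal_work` and the two call counters are hypothetical, added only so the example is observable.

```cpp
#include <cassert>

bool require_init = true;
int init_calls = 0;
int work_calls = 0;

void init_lib() { ++init_calls; }       // hypothetical one-time setup
void internal_work() { ++work_calls; }  // hypothetical hot-path work

void work()
{
    // Initialisation happens once; hint that this branch is the cold one.
    if (require_init) [[unlikely]] {
        init_lib();
        require_init = false;
    }
    internal_work();
}
```

The attribute is only a hint: it does not change what the program does, only how the compiler may lay out the two paths.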
C++23 | [[assume(condition)]]; |
---|---|
GCC | if (!condition) __builtin_unreachable(); |
MSVC, ICC | __assume(condition); |
LLVM | __builtin_assume(condition); |
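Before C++23 the table above can be wrapped in a macro. This is one possible sketch: `ASSUME` and `divide_positive` are hypothetical names, and the fallback branch simply drops the hint on unknown compilers.

```cpp
#include <cassert>

// Hypothetical portability macro built from the table above.
// Clang is tested first because it also defines __GNUC__.
#if defined(__clang__)
#  define ASSUME(cond) __builtin_assume(cond)
#elif defined(_MSC_VER) || defined(__INTEL_COMPILER)
#  define ASSUME(cond) __assume(cond)
#elif defined(__GNUC__)
#  define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#  define ASSUME(cond) ((void)0)  // unknown compiler: no hint
#endif

// The caller promises divisor > 0; breaking that promise is undefined behaviour.
int divide_positive(int value, int divisor) {
    ASSUME(divisor > 0);
    return value / divisor;
}
```

The condition should be free of side effects, since the hint is not an ordinary evaluated check.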
[[assume(condition)]]; | assert(condition); |
---|---|
Condition must be true | Condition must be true |
For the optimiser | For the programmer |
If !condition then Undefined Behaviour | If !condition then std::abort() in Debug Mode, no-op in Release Mode |
void implementation(internal_t* obj) {
if (obj) {
internal_work(*obj);
}
}
void interface(public_t* obj) {
if (obj) {
[[assume(obj->internal)]];
implementation(obj->internal);
}
}
Assume that pointer is non null*
*better use a reference
void limiter(float* samples, size_t count) {
    [[assume(reinterpret_cast<std::uintptr_t>(samples) % 32 == 0)]];
    [[assume(count > 0)]];
    for (size_t i = 0; i < count; ++i) {
        samples[i] = std::clamp(samples[i], -1.0f, 1.0f);
    }
}
Assume pointer alignment*
*or use std::assume_aligned
example taken from P1774 (the [[assume]] proposal)
const char* get_name(TextureType type) {
switch(type) {
case TextureType::Texture2D:
return "Texture2D";
case TextureType::Texture3D:
return "Texture3D";
case TextureType::Texture2DArray:
return "Texture2DArray";
case TextureType::Cubemap:
return "Cubemap";
default:
[[assume(false)]];
}
}
Declare a code path unreachable
*or use std::unreachable
float* __restrict buffer0;
float* __restrict buffer1;
UB if overlap
pointer provenance
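A minimal sketch of `__restrict` on function parameters. `__restrict` is a compiler extension (accepted by GCC, Clang and MSVC), not standard C++, and `add_into` is a hypothetical function used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// The caller promises that dst and src never overlap.
// Passing overlapping buffers would be undefined behaviour.
void add_into(float* __restrict dst, const float* __restrict src, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        dst[i] += src[i];  // no need to re-read src after each store to dst
}
```

Without the qualifier the compiler has to allow for a store to `dst[i]` changing a later `src[j]`, which can block vectorisation.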
GCC, LLVM, ICC | __attribute__((malloc)) |
---|---|
MSVC | __declspec(restrict) |
[Diagram: f takes param0 and param1 and produces output]
GCC, LLVM, ICC | __attribute__((pure)) or [[gnu::pure]] |
---|---|
MSVC | Not Supported |
[Diagram: f takes param0 and param1 and produces output]
GCC, LLVM, ICC | __attribute__((const)) or [[gnu::const]] |
---|---|
MSVC | Not Supported |
global state
Annotate your code
14. Use constexpr
15. Make variables const
16. Use noexcept
17. Use static for internal linkage
18. Use [[noreturn]]
19. Use [[likely]] and [[unlikely]]
20. Use [[assume]]
21. Mark pointers restrict
22. Mark functions as pure
No redundant copies
void func(??? x);
call site: func(x);
declaration?
[Decision flowchart, START HERE; each condition is followed by the parameter type to take when it is true]
if x can be null: take std::optional of x
if needing ownership of x: take by unique_ptr, shared_ptr
if x is copied: take by value (type x)
if x is moved from:
    does x need to be perfectly forwarded: take by "universal reference" (type&& x)
    otherwise: take by rvalue reference (type&& x)
if x is modified: take by lvalue reference (type& x)
if x is a range:
    does x need to be a contiguous array: take std::span
    can x be an arbitrary range: take std::ranges::***
    does x need to be a specific container: take the container
    otherwise: take iterator pair
otherwise (x is only read from):
    if x is cheap to copy: take by value (type x)
    otherwise: take by const lvalue reference (const type& x)
void f(const std::string& s);
f("Hello");
implicit conversion to string (allocation)
(safe - the temporary string outlives the call)
void f(const char* s);
f(std::string{"Hello"}.c_str());
verbose (safe)
void f(std::string_view s);
f(std::string{"Hello"});
f("Hello");
works for both (no copies) (safe)
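A short sketch of a `std::string_view` parameter accepting both kinds of argument. `count_spaces` is a hypothetical helper chosen only because it reads the text without needing to own it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: inspects the characters without owning or copying them.
std::size_t count_spaces(std::string_view text) {
    std::size_t spaces = 0;
    for (char c : text)
        if (c == ' ')
            ++spaces;
    return spaces;
}
```

The same function can be called as `count_spaces("a b c")` or with a `std::string`. The view must not be stored beyond the call if the argument was a temporary.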
[Flowchart as above, with a new first decision]
START HERE
if x is a readonly string: take std::string_view
[Flowchart as above, with a further decision]
is x an invocable, try in this order:
    std::invocable<Args...> auto&& x
    return_t(*x)(Args...)
    std::move_only_function<return_t(Args...)>&& x
    std::function<return_t(Args...)> x
[Flowchart as above, with a further decision]
is x a raw memory address: use a raw pointer
while (true) {
std::string line;
std::getline(std::cin, line);
if (!std::cin)
break;
process_line(line);
}
std::string line;
while (true) {
std::getline(std::cin, line);
if (!std::cin)
break;
process_line(line);
}
std::vector<int> shiny;
for (int i = 1; i <= 100; ++i)
if (is_shiny(i))
shiny.push_back(i);
std::vector<int> shiny;
shiny.reserve(100);
for (int i = 1; i <= 100; ++i)
if (is_shiny(i))
shiny.push_back(i);
move objects out of loops
.clear() if necessary
reserve() when an upper bound on size is known ahead of time
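The three notes above can be combined in one sketch. `collect_shiny` is a hypothetical function, and the `is_shiny` body is a placeholder predicate invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder predicate standing in for is_shiny.
bool is_shiny(int i) { return i % 7 == 0; }

// Fills `out` with the shiny numbers in [1, limit].
// The caller owns `out`, so repeated calls can reuse its allocation.
void collect_shiny(int limit, std::vector<int>& out) {
    out.clear();  // removes the elements but keeps the capacity
    if (limit > 0)
        out.reserve(static_cast<std::size_t>(limit));  // upper bound on the result size
    for (int i = 1; i <= limit; ++i)
        if (is_shiny(i))
            out.push_back(i);
}
```

A second call with a smaller limit reuses the buffer from the first call, so no further allocation takes place.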
catch(std::exception e) {
std::cerr << e.what() << '\n';
}
catch(const std::exception& e) {
std::cerr << e.what() << '\n';
}
catch(mutable_err& e) {
e.append("Caught in foo");
throw e;
}
catch(mutable_err& e) {
e.append("Caught in foo");
throw;
}
catch by reference
rethrow current exception
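Besides the extra copy, `throw e;` also slices when the handler catches a base class. The sketch below makes that visible; `io_error` and both function names are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical derived exception type.
struct io_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// `throw e;` copy-initialises a new exception from the static type of e,
// so the io_error is sliced down to std::runtime_error.
bool type_survives_throw_e() {
    bool kept_type = false;
    try {
        try { throw io_error("disk"); }
        catch (std::runtime_error& e) { throw e; }
    }
    catch (const io_error&) { kept_type = true; }
    catch (const std::runtime_error&) { kept_type = false; }
    return kept_type;
}

// `throw;` rethrows the original exception object unchanged.
bool type_survives_rethrow() {
    bool kept_type = false;
    try {
        try { throw io_error("disk"); }
        catch (std::runtime_error&) { throw; }
    }
    catch (const io_error&) { kept_type = true; }
    catch (const std::runtime_error&) { kept_type = false; }
    return kept_type;
}
```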
std::vector<std::string> names;
for (auto name : names) {
process(name);
}
std::vector<std::string> names;
for (const auto& name : names) {
process(name);
}
avoid copying the iterated object
std::flat_set<std::string, std::less<>> deviceLayers;
auto supported = [deviceLayers](std::string_view layer) {
    return deviceLayers.contains(layer);
};
std::flat_set<std::string, std::less<>> deviceLayers;
auto supported = [&deviceLayers](std::string_view layer) {
    return deviceLayers.contains(layer);
};
capture [&object]
auto [first_person, age] = *map.begin();
const auto& [first_person, age] = *map.begin();
bind reference
template <typename T>
class simple_optional {
T data;
bool has_data;
public:
/* *** */
T& value() {
if (!has_data)
throw bad_optional_access();
return data;
}
const T& value() const {
if (!has_data)
throw bad_optional_access();
return data;
}
};
simple_optional<Queue> get_transfer_queue();
try {
Queue q = get_transfer_queue().value();
// ...
Queue gets copied
template <typename T>
class simple_optional {
T data;
bool has_data;
public:
/* *** */
T& value() & {
if (!has_data)
throw bad_optional_access();
return data;
}
const T& value() const& {
if (!has_data)
throw bad_optional_access();
return data;
}
T&& value() && {
if (!has_data)
throw bad_optional_access();
return std::move(data);
}
};
simple_optional<Queue> get_transfer_queue();
try {
Queue q = get_transfer_queue().value();
// ...
Queue gets moved
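The copy versus move difference can be observed with a counting payload. This is a reduced sketch: `tracked`, `holder` and `make_holder` are hypothetical, and the `has_data` check from the slides is left out to keep it short.

```cpp
#include <cassert>
#include <utility>

// Hypothetical payload that counts how it is transferred.
struct tracked {
    static inline int copies = 0;
    static inline int moves = 0;
    tracked() = default;
    tracked(const tracked&) { ++copies; }
    tracked(tracked&&) noexcept { ++moves; }
};

template <typename T>
class holder {
    T data;
public:
    T& value() & { return data; }
    const T& value() const& { return data; }
    T&& value() && { return std::move(data); }  // selected for temporaries
};

holder<tracked> make_holder() { return {}; }
```

`tracked t = make_holder().value();` selects the `&&` overload and moves, while calling `value()` on a named `holder` selects the `&` overload and copies.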
template <typename T>
class simple_optional {
T data;
bool has_data;
public:
/* *** */
decltype(auto) value(this auto&& self) {
if (!self.has_data)
throw bad_optional_access();
return std::forward_like<decltype(self)>(self.data);
}
};
no code duplication
Annotate your code
14. Use constexpr
15. Make variables const
16. Use noexcept
17. Use static for internal linkage
18. Use [[noreturn]]
19. Use [[likely]] and [[unlikely]]
20. Use [[assume]]
21. Mark pointers restrict
22. Mark functions as pure
No redundant copies
23. Take function parameters properly
24. Avoid allocations in loops
25. Avoid copying exceptions
26. Avoid copies in range-for
27. Avoid copies in lambda captures
28. Avoid copies in structured bindings
29. Provide && method overloads
Cache-friendly code
Memory
Is memory a contiguous sequence of bytes?
C++ Standard: NO
Process address space: YES (logical, virtual address space)
Virtual address space in the Physical address space: NO
Physical address space: YES
Hardware caching: Not even a sequence...
Virtual memory
Caches
Physical address space
Process address space
C++ memory model
memory page
Page table
Access virtual memory address
Translate to physical address
Get data
Virtual Memory
Swap
Disk
Working set
Access virtual memory address
Check page table
Swap page in
Fetch from RAM
RAM
Disk
Translate to physical address
Get data
DATA
in the working set
page fault
, thrashing
Caching
Access virtual memory address
Check TLB
Check cache
Check page table
Swap page in
Fetch from RAM
L1
L2
L3
CPU
RAM
Disk
Translate to physical address
Get data
DATA
in the working set
page fault
, thrashing
High latency
Prefetching
you wanted ar[0]?
well, here's the whole ar
Data locality
Cache line
Caching
you wanted ar[0]?
well, here's the whole ar
Temporal locality
CPU cache
Caching
Processor's
Execution Units
μop Cache
Loopback buffer
L1 Instruction Cache
L1 Data Cache
Register renaming and register files
L2 Cache
L3 Cache
Working set
TLB
CPU
RAM
Core
Page table
Memory
Access virtual memory address
Check TLB
Check cache
Check page table
Swap page in
Fetch from RAM
L1
L2
L3
CPU
RAM
Translate to physical address
Get data
DATA
in the working set
page fault
, thrashing
TLB hit
hit
hit
miss
miss
miss
TLB miss
std::array
std::vector
std::deque
std::flat_map
std::flat_set
std::list
std::set
std::unordered_set
std::map
std::unordered_map
int matrix[rows][cols];
for (int row = 0; row < rows; ++row)
for (int col = 0; col < cols; ++col)
process(matrix[row][col]);
int matrix[rows][cols];
for (int col = 0; col < cols; ++col)
for (int row = 0; row < rows; ++row)
process(matrix[row][col]);
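The two traversal orders can be compared on a matrix stored in one contiguous buffer. Both functions compute the same sum; only the memory access pattern differs. The `matrix` struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix in one contiguous buffer (a sketch, not a full class).
struct matrix {
    std::size_t rows, cols;
    std::vector<int> cells;  // element (r, c) lives at cells[r * cols + c]
    matrix(std::size_t r, std::size_t c) : rows(r), cols(c), cells(r * c) {}
    int at(std::size_t r, std::size_t c) const { return cells[r * cols + c]; }
};

// Inner loop walks along a row: consecutive addresses.
long sum_row_major(const matrix& m) {
    long sum = 0;
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            sum += m.at(r, c);
    return sum;
}

// Inner loop walks down a column: each access jumps by `cols` elements.
long sum_column_major(const matrix& m) {
    long sum = 0;
    for (std::size_t c = 0; c < m.cols; ++c)
        for (std::size_t r = 0; r < m.rows; ++r)
            sum += m.at(r, c);
    return sum;
}
```

For matrices larger than the cache, the row-major loop is usually the faster of the two; the actual difference depends on the hardware and should be measured.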
struct DebugInfo {
std::string name;
time_point creation;
size_t use_cnt;
};
class DescriptorSet {
VkDescriptorSet handle;
DebugInfo debug;
// guaranteed to outlive,
// not dangling
const Device& device;
public:
// ...
};
device
some c-string
handle
debug
debug.name
debug.name.m_data
debug.name.m_len
debug.creation
debug.use_cnt
device
...
struct DebugInfo {
std::string name;
time_point creation;
size_t use_cnt;
};
class DescriptorSet {
VkDescriptorSet handle;
VkDevice device_raw;
unique_ptr<DebugInfo> debug;
const Device& device;
public:
// ...
};
handle
debug
device_raw
...
device
Pin thread to a core
Linux | pthread_setaffinity_np |
---|---|
Windows | SetThreadAffinityMask |
macOS | thread_policy_set with thread_affinity_policy_t |
Set priority of the process
Linux, macOS | setpriority |
---|---|
Windows | SetPriorityClass |
Set priority of a thread
Linux | pthread_setschedprio |
---|---|
Windows | SetThreadPriority |
macOS | setThreadPriority (Objective C) |
Contiguous data structures
Data oriented design
SOA vs AOS
Sequential memory access
Entity Component Systems
NUMA architectures
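As an illustration of the "SOA vs AOS" item, here are the two layouts side by side. The `particle` type and its fields are hypothetical and chosen only for this sketch.

```cpp
#include <cassert>
#include <vector>

// Array of structures: the fields of one particle sit next to each other.
struct particle { float x, y, z, mass; };
using particles_aos = std::vector<particle>;

// Structure of arrays: each field is its own contiguous array.
struct particles_soa {
    std::vector<float> x, y, z, mass;
};

// Reads only `mass`; with AoS every loaded cache line also carries x, y and z.
float total_mass(const particles_aos& ps) {
    float total = 0.0f;
    for (const particle& p : ps)
        total += p.mass;
    return total;
}

// Reads only `mass`; with SoA the loop streams through one dense array.
float total_mass(const particles_soa& ps) {
    float total = 0.0f;
    for (float m : ps.mass)
        total += m;
    return total;
}
```

Which layout is preferable depends on the access pattern: SoA tends to suit loops that touch one field of many objects, AoS loops that touch many fields of one object.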
int thread1_data{};
int thread2_data{};
std::thread t1{work, std::ref(thread1_data)};
std::thread t2{work, std::ref(thread2_data)};
likely on the same cache line
false sharing
alignas(std::hardware_destructive_interference_size) int thread1_data{};
alignas(std::hardware_destructive_interference_size) int thread2_data{};
std::thread t1{work, std::ref(thread1_data)};
std::thread t2{work, std::ref(thread2_data)};
on different cache lines
no dependencies
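The same padding can be applied to a type, so that every element of an array lands on its own cache line. This sketch only checks the layout and does not start threads. The 64-byte fallback is an assumption for standard libraries that lack the constant, and `padded_counter` and `per_thread` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Use the standard constant when the library provides it; otherwise fall back
// to 64 bytes, an assumption that matches many (not all) CPUs.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t cache_line = std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t cache_line = 64;
#endif

// Each counter is aligned and padded to a full cache line.
struct alignas(cache_line) padded_counter {
    long value = 0;
};

padded_counter per_thread[2];  // one slot per worker thread

// Distance in bytes between the two slots.
std::size_t distance_between_counters() {
    const auto a = reinterpret_cast<std::uintptr_t>(&per_thread[0]);
    const auto b = reinterpret_cast<std::uintptr_t>(&per_thread[1]);
    return static_cast<std::size_t>(b - a);
}
```

The cost is memory: each `padded_counter` occupies a whole cache line even though it stores a single `long`.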
CPU
RAM
Cache
regular store
non-temporal store
Annotate your code
14. Use constexpr
15. Make variables const
16. Use noexcept
17. Use static for internal linkage
18. Use [[noreturn]]
19. Use [[likely]] and [[unlikely]]
20. Use [[assume]]
21. Mark pointers restrict
22. Mark functions as pure
No redundant copies
23. Take function parameters properly
24. Avoid allocations in loops
25. Avoid copying exceptions
26. Avoid copies in range-for
27. Avoid copies in lambda captures
28. Avoid copies in structured bindings
29. Provide && method overloads
Cache-friendly code
30. Keep the working set small
31. Exploit data locality
32. Exploit temporal locality
33. Avoid false sharing
34. Use non-temporal stores
Branch predictor friendly code
Branch predictor
35. Avoid indirect calls
36. Make branches predictable
37. Use branchless optimisations
38. Use SIMD intrinsics
38. Use SIMD intrinsics
Annotate your code
14. Use constexpr
15. Make variables const
16. Use noexcept
17. Use static for internal linkage
18. Use [[noreturn]]
19. Use [[likely]] and [[unlikely]]
20. Use [[assume]]
21. Mark pointers restrict
22. Mark functions as pure
No redundant copies
23. Take function parameters properly
24. Avoid allocations in loops
25. Avoid copying exceptions
26. Avoid copies in range-for
27. Avoid copies in lambda captures
28. Avoid copies in structured bindings
29. Provide && method overloads
Cache-friendly code
30. Keep the working set small
31. Exploit data locality
32. Exploit temporal locality
33. Avoid false sharing
34. Use non-temporal stores
Branch predictor friendly code
35. Avoid indirect calls
36. Make branches predictable
37. Use branchless optimisations
38. Use SIMD intrinsics
Presentation made using slides.com
Nexa font family by Fontfabric
SVGs made in Pixelmator Pro
“Memory Tape” rendered in Blender