Need For Speed: part 2
Reduce heap allocations
This talk is not about:
- Reducing the amount of RAM your application requires
- Special allocators (memory pools, arena allocators, ...)
CppCon 2017: John Lakos
“Local ('Arena') Memory Allocators”
Problem
Some C++ features are SO convenient that users sometimes forget about the cost of heap allocations
C style manual memory management is NOT the solution
Measure first!
- Valgrind / Callgrind
- Perf ( Hotspot )
- Visual Studio
- Intel VTune
Only if you see that new and delete
take a relevant share of CPU time should you try to reduce heap allocations
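When none of the profilers above is at hand, a rough allocation count can also be obtained in code. The following is only a sketch: it replaces the global operator new with a counting version, and the names g_allocation_count, CountAllocations and MakeCopies are made up for this illustration. It is not thread-safe and it does not replace a real profiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>

// Global counter, incremented by the replacement operator new below.
// Assumption: single-threaded use only.
static std::size_t g_allocation_count = 0;

// Replacement for the global allocation function: count, then defer to malloc.
void* operator new(std::size_t size)
{
    ++g_allocation_count;
    if (void* memory = std::malloc(size == 0 ? 1 : size)) {
        return memory;
    }
    throw std::bad_alloc{};
}

// Matching replacements, so that memory from malloc is released with free.
void operator delete(void* memory) noexcept { std::free(memory); }
void operator delete(void* memory, std::size_t) noexcept { std::free(memory); }

// Runs `work` and returns how many times operator new was called meanwhile.
template <typename Callable>
std::size_t CountAllocations(Callable&& work)
{
    const std::size_t before = g_allocation_count;
    work();
    return g_allocation_count - before;
}

// Hypothetical workload: ten copies of a string that is too long for SSO.
std::vector<std::string> MakeCopies()
{
    return std::vector<std::string>(10, std::string(64, 'x'));
}
```

Wrapping a suspicious function in CountAllocations gives a quick number to compare before and after an optimization.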
Tip #1: use Small Object Optimizations
std::string can be implemented to avoid allocations for strings with size <= 23 characters
There are std::function-like implementations that use only stack memory (no heap allocations)
Many libraries provide a SmallVector<T, SIZE> that does not allocate on the heap while size() <= SIZE
Prefer std::array<T,Size> over std::vector<T> when the size is known at compile time
class String
{
public:
String(const String& s) { ... }
String(const char* s) { ... }
// all the methods of std::string
private:
union Data {
struct NonSSO {
char* ptr; // 8 bytes
std::size_t size; // 8 bytes
std::size_t capacity; // 8 bytes
} non_sso;
struct SSO {
char string[ 23 ]; // 23 bytes
uint8_t size; // 1 byte
} sso;
} m_data;
};
https://github.com/elliotgoodrich/SSO-23
CppCon 2016: Nicholas Ormrod
“The strange details of std::string at Facebook”
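To see whether your own standard library applies the small string optimization, you can check where the characters live. This is a sketch only: IsStoredInline is a hypothetical helper, and the SSO threshold differs between implementations (it is not guaranteed to be 23).

```cpp
#include <cassert>
#include <functional>
#include <string>

// True when the character buffer lives inside the std::string object itself,
// which means that no heap allocation was needed for it.
bool IsStoredInline(const std::string& text)
{
    const char* object_begin = reinterpret_cast<const char*>(&text);
    const char* object_end = object_begin + sizeof(std::string);
    // std::less gives a total order, even for pointers into different objects.
    const std::less<const char*> before;
    return !before(text.data(), object_begin) && before(text.data(), object_end);
}
```

On mainstream implementations a short string such as "abc" is expected to be stored inline, while a 100 character string is not.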
Tip #2: use std::move
- Moves resources from one instance to another.
- Leaves the source object in a valid but unspecified state, so do not rely on its contents afterwards.
- Avoids most of the cost of copying, including heap allocations.
- Particularly useful when inserting into STL containers (and more intuitive than emplace).
void StringPush()
{
std::vector<std::string> buffer;
buffer.reserve(100);
for(int i=0; i<100; i++)
{
std::string created_string("long string that requires memory allocation");
buffer.push_back(created_string);
}
}
void StringMove()
{
std::vector<std::string> buffer;
buffer.reserve(100);
for(int i=0; i<100; i++)
{
std::string created_string("long string that requires memory allocation");
buffer.push_back( std::move(created_string)) ;
}
}
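A quick way to convince yourself that StringMove() does not allocate a second buffer is to compare the data pointers before and after the move. A minimal sketch, assuming the usual std::allocator; MovedStringKeepsItsBuffer is a name invented for this example.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Returns true when the string stored in the vector reuses the heap buffer
// that the source string owned before the move.
bool MovedStringKeepsItsBuffer()
{
    std::vector<std::string> buffer;
    buffer.reserve(1);

    std::string created_string("long string that requires memory allocation");
    const char* storage_before_move = created_string.data();

    // The move constructor takes over the heap buffer instead of copying it.
    buffer.push_back(std::move(created_string));
    return buffer.back().data() == storage_before_move;
}
```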
(real world example)
Recycling in action
struct Point {
float x, y, z;
float red, green, blue;
std::vector<float> features; // this usually contains 3 to 8 elements
};
struct PointCloud {
std::vector<Point> points; // usually 10,000 ± 40% points
std::map<std::string, float> features;
};
std::deque<PointCloud> buffer;
// This function alone takes 38% of the CPU
void DeserializePointCloud(const PointCloudMessage& msg)
{
PointCloud cloud;
while( buffer.size() >= MAX_BUFFER_SIZE) {
buffer.pop_front();
}
cloud.features = // deserialize from message
for (int i=0; i < N_POINTS; i++)
{
Point point = // deserialize from message
cloud.points.push_back( point );
}
buffer.push_back( cloud );
}
We can do better!
// This function is now 2X faster than the previous one
void DeserializePointCloud(const PointCloudMessage& msg)
{
// recycle an already created instance of PointCloud
PointCloud cloud;
if( buffer.size() >= MAX_BUFFER_SIZE) {
cloud = std::move( buffer.front() );
}
while( buffer.size() >= MAX_BUFFER_SIZE) {
buffer.pop_front();
}
cloud.features = // deserialize from message
cloud.points.resize( N_POINTS );
for (int i=0; i < N_POINTS; i++)
{
cloud.points[i] = // deserialize from message
}
buffer.push_back( std::move(cloud) );
}
Recycle an already allocated instance
Resize, don't reserve
Move, do not copy
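The three rules above can be tried out in a reduced, self-contained form. This is a sketch under simplifying assumptions: Scan and PushRecycled are hypothetical stand-ins for PointCloud and DeserializePointCloud, and the "deserialization" just fills the samples with one value.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Simplified stand-in for PointCloud.
struct Scan {
    std::vector<float> samples;
};

// Appends a scan of n_samples values, keeping at most max_size scans.
// Once the buffer is full, the storage of the oldest scan is recycled.
void PushRecycled(std::deque<Scan>& buffer, std::size_t max_size,
                  std::size_t n_samples, float value)
{
    assert(max_size > 0);  // precondition of this sketch

    Scan scan;
    if (buffer.size() >= max_size) {
        scan = std::move(buffer.front());  // recycle: take over the old storage
    }
    while (buffer.size() >= max_size) {
        buffer.pop_front();
    }

    scan.samples.resize(n_samples);  // no reallocation if the capacity suffices
    for (float& sample : scan.samples) {
        sample = value;  // placeholder for "deserialize from message"
    }
    buffer.push_back(std::move(scan));  // move, do not copy
}
```

After the buffer is full, every new scan should end up in memory that was allocated by an earlier one.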
Tip #3: think about your API
std::string IntToString( int num );
int StringToInt( std::string text );
Better...
std::string IntToString( int num );
int StringToInt( const std::string& text );
Best... (?)
bool IntToString( int num, std::string& output);
bool StringToInt( const std::string& text, int& number );
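One possible implementation of the output-parameter signatures, as a sketch only. It assumes C++17 and uses std::to_chars / std::from_chars from <charconv>; the slides do not say how the original functions were implemented. The point is that the caller can reuse the same std::string across calls, so its capacity is recycled.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Writes the decimal representation of num into output, reusing its capacity.
bool IntToString(int num, std::string& output)
{
    char digits[16];  // enough for any 32-bit int, sign included
    const auto result = std::to_chars(digits, digits + sizeof(digits), num);
    if (result.ec != std::errc{}) {
        return false;
    }
    output.assign(digits, result.ptr);
    return true;
}

// Parses the whole text as a decimal int; number is left untouched on failure.
bool StringToInt(const std::string& text, int& number)
{
    int parsed = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto result = std::from_chars(first, last, parsed);
    if (result.ec != std::errc{} || result.ptr != last) {
        return false;  // not a number, out of range, or trailing characters
    }
    number = parsed;
    return true;
}
```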
void TakesCharStar(const char* s); // C convention
void TakesString(const std::string& s); // C++ convention
//----------------------------------------------------
// your code
const char* char_array = "helloWorld";
std::string cpp_string("helloWorld");
TakesCharStar( char_array ); // OK
TakesCharStar( cpp_string.c_str() ); // ugly ?
TakesString( char_array ); // may allocate memory (builds a temporary std::string)
TakesString( cpp_string ); // OK
void TakesStringView( std::string_view s);
//----------------------------------------------------
// your code
const char* char_array = "helloWorld";
std::string cpp_string("helloWorld");
TakesStringView( char_array ); // OK
TakesStringView( cpp_string ); // OK
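A function taking std::string_view might look like the sketch below (C++17 assumed; CountChar is a hypothetical example, not part of the talk). It accepts string literals, std::string and substrings without creating a temporary std::string.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Counts how often `wanted` occurs in `text`, without copying the characters.
std::size_t CountChar(std::string_view text, char wanted)
{
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), wanted));
}
```

Keep in mind that a std::string_view does not own its characters, so it must not outlive the string it refers to.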
Exceptions to tip #3:
Return Value Optimization
Sometimes you do not have to worry, because the compiler elides the copy or moves the return value for you under the hood
Let's play a game...
struct Noisy
{
Noisy() { std::cout << "constructed\n"; }
Noisy(const Noisy&) { std::cout << "copy-constructed\n"; }
Noisy(Noisy&&) { std::cout << "move-constructed\n"; }
~Noisy() { std::cout << "destructed\n"; }
};
std::vector<Noisy> f()
{
std::vector<Noisy> v = std::vector<Noisy>(2);
return v;
}
void PrintSize(std::vector<Noisy> arg)
{
std::cout << "arg.size() = " << arg.size() << '\n';
}
int main()
{
std::vector<Noisy> v = f();
PrintSize(v);
return 0;
}
constructed
constructed
copy-constructed
copy-constructed
arg.size() = 2
destructed
destructed
destructed
destructed
Guess the output...
std::vector<Noisy> f()
{
std::vector<Noisy> v = std::vector<Noisy>(2);
return v;
}
int main()
{
std::vector<Noisy> v = f();
return 0;
}
constructed
constructed
destructed
destructed
std::vector<Noisy> f()
{
std::vector<Noisy> v = std::vector<Noisy>(2);
return v;
}
void PrintSize(std::vector<Noisy> arg)
{
std::cout << "arg.size() = " << arg.size() << '\n';
}
int main()
{
std::vector<Noisy> v = f();
PrintSize( std::move(v) ); // assumed call: moving (not copying) matches the output shown
return 0;
}
constructed
constructed
arg.size() = 2
destructed
destructed
Guess the output...
Reduce heap allocations
By Davide Faconti