Need For Speed: part 2

 

Reduce heap allocations

This talk is not about:

  • Reducing the amount of RAM your application requires

  • Special allocators (memory pools, arena allocators, ...)

CppCon 2017: John Lakos

“Local ('Arena') Memory Allocators”

Problem

Some C++ features are SO convenient that users sometimes forget about the cost of heap allocations.

C-style manual memory management is NOT the solution.

Measure first!

  • Valgrind / Callgrind
  • Perf (Hotspot)
  • Visual Studio
  • Intel VTune

Only if you see that new and delete take a relevant share of the CPU time should you try to reduce heap allocations.
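Besides the profilers listed above, a quick way to check whether a piece of code allocates at all is to count calls to the global operator new. The sketch below is an assumption of this write-up, not one of the listed tools: it replaces the global operator new/delete, so it belongs in a test or benchmark binary only. `g_allocations` and `CountAllocations` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>   // used by the usage checks
#include <vector>   // used by the usage checks

// Counts every call to the global (non-array, non-aligned) operator new.
// The default operator new[] forwards to it, so array allocations count too.
static std::atomic<std::size_t> g_allocations{0};

void* operator new(std::size_t size)
{
    ++g_allocations;
    if (size == 0) size = 1;   // operator new(0) must still return a unique pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Returns how many heap allocations work() performed.
template <typename F>
std::size_t CountAllocations(F&& work)
{
    const std::size_t before = g_allocations.load();
    work();
    return g_allocations.load() - before;
}
```

Keep in mind that optimizers are allowed to remove a new/delete pair whose memory is never observed, so the count reflects the build you measure, exactly like the profilers.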

Tip #1: use Small Object Optimizations

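std::string can be implemented so that strings of up to 23 characters (on a 64-bit platform) are stored inside the object itself, with no heap allocation

std::function implementations typically store small callables inline, and there are std::function-like alternatives that use stack memory only (no heap allocations)

There are many implementations of SmallVector<T, SIZE>, which do not allocate on the heap as long as size() <= SIZE

Prefer std::array<T, Size> over std::vector<T> when the size is known at compile time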
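To make the SmallVector idea concrete, here is a toy version (a hypothetical class, not any particular library's API): the first N elements live in an inline std::array, and only when more than N elements are pushed does it move everything to a std::vector on the heap. It assumes T is default-constructible and copy-assignable; production versions such as LLVM's SmallVector use uninitialized storage instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy SmallVector: no heap allocation while size() <= N.
// Limitation: std::array constructs all N elements up front, so T must be
// default-constructible and copy-assignable.
template <typename T, std::size_t N>
class SmallVector
{
public:
    void push_back(const T& value)
    {
        if (!on_heap_ && size_ < N) {
            local_[size_++] = value;   // stored inline, no heap allocation
            return;
        }
        if (!on_heap_) {
            // First overflow: copy the inline elements to the heap, once.
            heap_.assign(local_.begin(), local_.begin() + size_);
            on_heap_ = true;
        }
        heap_.push_back(value);
        ++size_;
    }

    T& operator[](std::size_t i) { return on_heap_ ? heap_[i] : local_[i]; }
    std::size_t size() const { return size_; }
    bool on_heap() const { return on_heap_; }

private:
    std::array<T, N> local_{};
    std::vector<T> heap_;
    std::size_t size_ = 0;
    bool on_heap_ = false;
};
```

The design choice is the usual trade-off: the object gets bigger (it always carries N elements), in exchange for zero allocations in the common case.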

    #include <cstddef>
    #include <cstdint>

    class String
    {
    public:
        String(const String& s);   // copies into the SSO buffer, or allocates
        String(const char* s);     // uses the SSO buffer if strlen(s) <= 23

        // all the methods of std::string
    private:
        union Data {

            struct NonSSO {
                char* ptr;             // 8 bytes
                std::size_t size;      // 8 bytes
                std::size_t capacity;  // 8 bytes
            } non_sso;

            struct SSO {
                char string[ 23 ];     // 23 bytes
                std::uint8_t size;     // 1 byte
            } sso;

        } m_data;                      // 24 bytes in total on a 64-bit platform
    };

https://github.com/elliotgoodrich/SSO-23

CppCon 2016: Nicholas Ormrod

“The strange details of std::string at Facebook”
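A quick way to observe SSO in your own standard library, assuming the common layout where short strings keep their characters inside the std::string object: check whether data() points into the object itself. `IsStoredInline` is a hypothetical helper, and the standard guarantees none of this.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Heuristic: true if the characters live inside the std::string object,
// i.e. the small buffer is in use and nothing was allocated on the heap.
bool IsStoredInline(const std::string& s)
{
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(s);
    std::less<const char*> less;   // total order, even for unrelated pointers
    return !less(s.data(), begin) && less(s.data(), end);
}
```

On 64-bit platforms the inline capacity is 15 characters in libstdc++ and MSVC, and 22 in libc++; the SSO-23 layout shown above reaches 23.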

Tip #2: use std::move

  • Move resources from one instance to another.
  • The source object is left in a valid but unspecified state (in practice, usually empty).
  • Avoids most of the cost of copying, including heap allocations.
  • Particularly useful to insert into STL containers (and more intuitive than emplace).

    #include <string>
    #include <utility>
    #include <vector>

    void StringPush()
    {
        std::vector<std::string> buffer;
        buffer.reserve(100);
        for (int i = 0; i < 100; i++)
        {
            std::string created_string("long string that requires memory allocation");
            buffer.push_back(created_string);              // copy: allocates a second buffer
        }
    }

    void StringMove()
    {
        std::vector<std::string> buffer;
        buffer.reserve(100);
        for (int i = 0; i < 100; i++)
        {
            std::string created_string("long string that requires memory allocation");
            buffer.push_back(std::move(created_string));   // move: steals the buffer
        }
    }
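To check the claim on your own toolchain, the sketch below pushes a long string into a vector by copy or by move and reports whether the element reuses the original character buffer. `SameBufferAfterPush` is a hypothetical helper; the standard does not require a move to reuse the buffer, but libstdc++, libc++ and MSVC do so for heap-allocated strings.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Returns true if the string stored in the vector uses the same character
// buffer as the original local string (i.e. no new allocation, no copy).
bool SameBufferAfterPush(bool use_move)
{
    std::string created_string(100, 'x');   // too long for SSO: heap buffer
    const char* original = created_string.data();

    std::vector<std::string> buffer;
    buffer.reserve(1);
    if (use_move)
        buffer.push_back(std::move(created_string));   // steals the buffer
    else
        buffer.push_back(created_string);              // allocates a new buffer
    // After the move, created_string is valid but unspecified:
    // do not rely on its contents.
    return buffer.back().data() == original;
}
```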

(real world example)

Recycling in action

 
  struct Point {
        float x, y, z;
        float red, green, blue;
        std::vector<float> features; // this usually contains 3 to 8 elements
  };
    
  struct PointCloud {
        std::vector<Point> points; // usually 10,000 ± 40% points
        std::map<std::string, float> features;
  };
    
  std::deque<PointCloud> buffer; 

  // This function alone takes 38% of the CPU
  void DeserializePointCloud(const PointCloudMessage& msg)
  {
        PointCloud cloud;
        
        while( buffer.size() >= MAX_BUFFER_SIZE) {
            buffer.pop_front();
        }  

        cloud.features = // deserialize from message
        
        for (int i=0; i < N_POINTS; i++)
        {
            Point point = // deserialize from message
            cloud.points.push_back( point );
        }

        buffer.push_back( cloud );
   }
        

We can do better!


  // This function is now 2X faster than the previous one
  void DeserializePointCloud(const PointCloudMessage& msg)
  {
        // Recycle an already allocated instance of PointCloud, if available
        PointCloud cloud;

        if( buffer.size() >= MAX_BUFFER_SIZE) {
            cloud = std::move( buffer.front() );
        }
         
        while( buffer.size() >= MAX_BUFFER_SIZE) {
            buffer.pop_front();
        }  

        cloud.features = // deserialize from message
        cloud.points.resize( N_POINTS );
        
        for (int i=0; i < N_POINTS; i++)
        {
            cloud.points[i] = // deserialize from message
        }

        buffer.push_back( std::move(cloud) );
   }
        

Recycle an already allocated instance

Resize, don't reserve

Move, do not copy
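The three rules above can be reproduced in a small, self-contained sketch. `Frame` and `FrameBuffer` are hypothetical, simplified stand-ins for PointCloud and the global deque: when the buffer is full, the oldest Frame is moved into the new one, resized and overwritten, so its heap memory is reused instead of being freed and allocated again.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for PointCloud.
struct Frame
{
    std::vector<float> data;
};

class FrameBuffer
{
public:
    explicit FrameBuffer(std::size_t max_size)
        : max_size_(max_size > 0 ? max_size : 1) {}

    // Stores a copy of input; returns true if an old Frame was recycled.
    bool Push(const std::vector<float>& input)
    {
        Frame frame;
        bool recycled = false;
        if (buffer_.size() >= max_size_) {
            frame = std::move(buffer_.front());   // recycle, do not free
            recycled = true;
        }
        while (buffer_.size() >= max_size_) {
            buffer_.pop_front();                   // drops the moved-from Frame
        }
        frame.data.resize(input.size());           // resize, don't reserve
        std::copy(input.begin(), input.end(), frame.data.begin());
        buffer_.push_back(std::move(frame));       // move, do not copy
        return recycled;
    }

    const std::deque<Frame>& contents() const { return buffer_; }

private:
    std::deque<Frame> buffer_;
    std::size_t max_size_;
};
```

Once the buffer is full and the input size is stable, every Push reuses memory that was allocated earlier, which is where the speed-up in the example above comes from.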

Tip #3: think about your API

    std::string IntToString( int num );
    int StringToInt( std::string text );          // copies the argument

Better...

    std::string IntToString( int num );
    int StringToInt( const std::string& text );   // no copy of the argument

Best... (?)

    bool IntToString( int num, std::string& output);          // caller can reuse output's memory
    bool StringToInt( const std::string& text, int& number ); // return value reports success
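A sketch of how the output-parameter signatures could be implemented, assuming C++17 `<charconv>` (std::to_chars and std::from_chars). Only the signatures come from the slide; the bodies are this write-up's assumption. A caller converting many numbers in a loop can pass the same std::string every time, so its capacity is reused.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

bool IntToString(int num, std::string& output)
{
    char buf[24];   // enough for any int up to 64 bits, sign included
    auto [ptr, ec] = std::to_chars(buf, buf + sizeof(buf), num);
    if (ec != std::errc()) return false;
    output.assign(buf, ptr);   // reuses output's capacity if large enough
    return true;
}

bool StringToInt(const std::string& text, int& number)
{
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, number);
    return ec == std::errc() && ptr == last;   // reject trailing characters
}
```

The boolean return reports failure (for instance "12a" or an empty string) without throwing.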

    void TakesCharStar(const char* s);         // C convention
    void TakesString(const std::string& s);    // C++ convention


    //----------------------------------------------------
    // your code
    const char* char_array = "helloWorld";
    std::string cpp_string("helloWorld");

    TakesCharStar( char_array );         // OK
    TakesCharStar( cpp_string.c_str() ); // ugly?

    TakesString( char_array );    // builds a temporary std::string: may allocate memory
    TakesString( cpp_string );    // OK

C++17 std::string_view accepts both without copying the characters:

    void TakesStringView( std::string_view s );

    //----------------------------------------------------
    // your code
    const char* char_array = "helloWorld";
    std::string cpp_string("helloWorld");

    TakesStringView( char_array );   // OK
    TakesStringView( cpp_string );   // OK
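A minimal sketch of a function written against std::string_view (C++17); `CountChar` is a hypothetical example. The same signature accepts a string literal, a std::string, or a sub-range of either, without building a temporary std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Counts occurrences of c; works on any contiguous character sequence.
std::size_t CountChar(std::string_view s, char c)
{
    std::size_t count = 0;
    for (char ch : s)
        if (ch == c) ++count;
    return count;
}
```

One caveat: a std::string_view is not guaranteed to be null-terminated, so it cannot be passed directly to a C function that expects a const char*.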

Exceptions to tip #3:

Return Value Optimization

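Sometimes you do not have to worry: when a function returns a local object by value, the compiler constructs it directly in the caller's storage (copy elision) or, if it cannot, moves it automatically instead of copying it.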

Let's play a game...

    #include <iostream>
    #include <utility>
    #include <vector>

    struct Noisy
    {
        Noisy() { std::cout << "constructed\n"; }
        Noisy(const Noisy&) { std::cout << "copy-constructed\n"; }
        Noisy(Noisy&&) { std::cout << "move-constructed\n"; }
        ~Noisy() { std::cout << "destructed\n"; }
    };
    std::vector<Noisy> f()
    {
        std::vector<Noisy> v = std::vector<Noisy>(2); 
        return v; 
    }            
     
    void PrintSize(std::vector<Noisy> arg)
    {
        std::cout << "arg.size() = " << arg.size() << '\n';
    }
    
    int main()
    {
        std::vector<Noisy> v = f();    
        PrintSize(v);                          
        return 0;
    }

    constructed
    constructed
    copy-constructed
    copy-constructed
    arg.size() = 2
    destructed
    destructed
    destructed
    destructed

Guess the output...



    std::vector<Noisy> f()
    {
        std::vector<Noisy> v = std::vector<Noisy>(2); 
    
        return v; 
    }            
     
    int main()
    {
        std::vector<Noisy> v = f(); 
        return 0;
    }

    constructed
    constructed
    destructed
    destructed

Guess the output...

    std::vector<Noisy> f()
    {
        std::vector<Noisy> v = std::vector<Noisy>(2); 
        return v; 
    }            
     
    void PrintSize(std::vector<Noisy> arg)
    {
        std::cout << "arg.size() = " << arg.size() << '\n';
    }
    
    int main()
    {
        std::vector<Noisy> v = f();
        PrintSize( std::move(v) );
        return 0;
    }

    constructed
    constructed
    arg.size() = 2
    destructed
    destructed


By Davide Faconti
