Ways to Improve Program Performance
Outline
- Compiler Optimization
- Library Optimization
- Instruction Set Optimization
- Nodes Optimization
- Thread Affinity
- Tool: VTune
Compiler Optimization
Common Compiler Flags
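- Optimization levels (all optimize for speed)
- -O1: optimize for speed, but disable some optimizations that increase code size for only a small speed benefit
- -O2: optimize for maximum speed (default in the Intel compilers; GCC defaults to -O0)
- -O3: also enable more aggressive optimizations that may not improve performance on some programs
- -Ofast: -O3 plus relaxed floating-point math; might not be safe for all programs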
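Why -Ofast "might not be safe": it enables fast-math flags that let the compiler reassociate floating-point operations, for example splitting a sum across several partial accumulators so it can be vectorized. (GCC, for example, only does this for floating-point sums under -Ofast / -ffast-math, not at plain -O3.) The sketch below writes that reordering out by hand; the function names are hypothetical and only illustrate the effect, not what any particular compiler emits.

```c
#include <assert.h>
#include <stddef.h>

/* Strict left-to-right sum: under strict IEEE semantics the compiler
   must keep exactly this order of additions. */
static float sum_sequential(const float *x, size_t n) {
    assert(x != NULL || n == 0);
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* The same sum split over two partial accumulators, mimicking the
   reordering a vectorizer may apply once reassociation is allowed.
   Mathematically equal to the sequential sum, numerically not always. */
static float sum_two_accumulators(const float *x, size_t n) {
    assert(x != NULL || n == 0);
    float s0 = 0.0f, s1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += x[i];      /* even-indexed elements */
        s1 += x[i + 1];  /* odd-indexed elements */
    }
    if (i < n)           /* odd leftover element */
        s0 += x[i];
    return s0 + s1;
}
```

Compiled without fast-math flags, the input {1.0f, 1e8f, 1.0f, -1e8f} gives 0 from the sequential version (each 1.0f is lost when added to 1e8f) but 2 from the two-accumulator version. Neither is "wrong"; the point is that -Ofast may silently switch between such results.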
Library Optimization
e.g. MPI & numerical libraries
MPI
- Message Passing Interface
- Standard for passing messages between processes on parallel computers
- Implementations
- Intel MPI
- OpenMPI
- MPICH
Numerical Libraries
- FFTW
- for computing discrete Fourier transforms
- MKL (Intel Math Kernel Library)
- a library of optimized math routines
- linear algebra, fast Fourier transforms, vector math...
- Example (GROMACS build option)
- -DGMX_FFT_LIBRARY=xxx
- selects which library provides FFT support, e.g. fftw3 or mkl
Instruction Set Optimization
e.g. SIMD
SIMD
- Single Instruction, Multiple Data
- The same instruction is applied to many data elements at once
- e.g. add 64 pairs of numbers by sending 64 data streams to 64 ALUs, forming 64 sums within a single clock cycle
- Instruction Sets (x86)
- SSE (128-bit registers)
- AVX/AVX2 (256-bit registers)
- AVX-512 (512-bit registers)
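A minimal SIMD sketch, assuming GCC or Clang: it uses their vector extension (`__attribute__((vector_size(16)))`) rather than hand-written intrinsics, so the compiler lowers the element-wise add to SSE instructions on x86 (or NEON on Arm). A 16-byte vector holds 4 floats, matching a 128-bit SSE register; widening it to 32 or 64 bytes would correspond to AVX or AVX-512 registers. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 4 floats in 16 bytes: one 128-bit SIMD register (e.g. an SSE xmm register). */
typedef float v4sf __attribute__((vector_size(16)));

/* out[i] = a[i] + b[i], processing four elements per vector add. */
static void add_arrays_simd(const float *a, const float *b, float *out, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4sf va, vb, vc;
        memcpy(&va, a + i, sizeof va);   /* load 4 floats, no alignment assumed */
        memcpy(&vb, b + i, sizeof vb);
        vc = va + vb;                    /* one element-wise add across 4 lanes */
        memcpy(out + i, &vc, sizeof vc); /* store 4 results */
    }
    for (; i < n; i++)                   /* scalar tail for leftover elements */
        out[i] = a[i] + b[i];
}
```

The `memcpy` loads and stores avoid alignment assumptions; compilers typically turn them into plain unaligned vector loads and stores.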
Nodes Optimization
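More nodes, always better?
Scaling is limited
- Just like teamwork: every added member brings coordination (communication) overhead, and work that cannot be split up gets no faster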
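One standard way to put a number on "scaling is limited" is Amdahl's law (a model we bring in here, not something the slides state): if a fraction p of the single-node runtime parallelizes perfectly across n nodes and the rest stays serial, the speedup is 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p). Real runs also pay communication costs that this model ignores, so measured scaling is usually worse. The function name is hypothetical.

```c
#include <assert.h>

/* Amdahl's law: speedup on n workers when a fraction p (0..1) of the
   serial runtime parallelizes perfectly and the remainder does not. */
static double amdahl_speedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0);
    assert(n >= 1);
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

For example, with p = 0.95, 1000 nodes give only about 19.6x, close to the ceiling of 1 / 0.05 = 20x.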

Thread Affinity
Thread Affinity
- assign (pin) specific threads to particular processors/cores, so a thread keeps its cached data and is not migrated between cores by the OS scheduler
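A minimal Linux-only sketch of pinning the calling thread from code, assuming glibc: `sched_setaffinity` with pid 0 applies to the calling thread, and `sched_getaffinity` reports which CPUs it may currently use. The helper names are hypothetical. In practice, OpenMP programs often set affinity through the `OMP_PROC_BIND` and `OMP_PLACES` environment variables instead of in code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to a single CPU; returns 0 on success, -1 on error. */
static int pin_to_cpu(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof set, &set);
}

/* Return the lowest-numbered CPU the calling thread may run on, or -1. */
static int first_allowed_cpu(void) {
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

Picking a CPU from the current affinity mask, rather than hard-coding CPU 0, keeps the sketch working inside containers or batch jobs that restrict the available cores.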

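Tool: VTune
VTune
- Intel VTune Profiler: a performance analysis tool for locating hotspots, threading inefficiencies, and memory bottlenecks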

Examples


Reference
Program Performance Improvement
By hsutzu