Some of the syntax will be ugly and confusing
*Dealing with this when writing benchmarks is fun
Single Instruction, Multiple Data (SIMD)
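#include <immintrin.h> // AVX intrinsics; compile with AVX enabled (e.g. -mavx)

// Assumes SIZE is defined elsewhere and is a multiple of 8, and that
// a, b and res are 32-byte aligned (required by the aligned load/store below).
void add_arrays(float* a, float* b, float* res){
    for(int i = 0; i < SIZE; i += 8){
        //Load 8 floats from each input into 256-bit SIMD regs
        __m256 mma = _mm256_load_ps(a+i);
        __m256 mmb = _mm256_load_ps(b+i);
        //Calculate the 8 sums with a single instruction
        __m256 mmres = _mm256_add_ps(mma, mmb);
        //Store the result in res for the current 8 values
        _mm256_store_ps(res+i, mmres);
    }
}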
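The loop above relies on two things the snippet does not show: the arrays must be 32-byte aligned for `_mm256_load_ps` / `_mm256_store_ps`, and the element count must be a multiple of 8. A minimal sketch of one way to relax both, assuming a hypothetical variant `add_arrays_n` that takes the length `n` as a parameter (neither name is from the slides). It swaps in the unaligned intrinsics `_mm256_loadu_ps` / `_mm256_storeu_ps` and finishes any leftover elements with a plain scalar loop. The `__AVX__` guard is also an assumption of this sketch, so that it still builds as scalar code when AVX is not enabled.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h> /* AVX intrinsics, only pulled in when AVX is enabled (e.g. -mavx) */
#endif

/* Hypothetical variant of add_arrays: length passed in, no alignment assumed. */
void add_arrays_n(const float *a, const float *b, float *res, size_t n) {
    size_t i = 0;
#ifdef __AVX__
    /* Vector body: 8 floats per iteration using unaligned loads and stores. */
    for (; i + 8 <= n; i += 8) {
        __m256 mma = _mm256_loadu_ps(a + i);
        __m256 mmb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(res + i, _mm256_add_ps(mma, mmb));
    }
#endif
    /* Scalar tail: the remaining n % 8 elements
       (or the whole array when AVX is not enabled). */
    for (; i < n; i++) {
        res[i] = a[i] + b[i];
    }
}
```

With `n = 11`, for example, the AVX path would handle elements 0 to 7 in one vector iteration and the scalar loop would handle elements 8 to 10. Whether the unaligned variants cost anything measurable depends on the CPU and the data, so that is something to benchmark rather than assume.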
(Also threads will be important no matter what)