Data-Oriented Design
or How I Learned to Stop Worrying and Love the Cache
Robert Rouhani
What is it?
A paradigm that focuses on data, not objects
Why?
- Cache misses are SLOW
- Classes have overhead
Massive performance gap
Distance
What Is Cache?
But WHY?
- You need to process a LOT of data
- Code is time-critical
- Examples:
  - Particle systems in games
  - Data analysis
  - Anything embedded
An example
class Particle {
    Vec3 position;
    Vec3 velocity;
    Color color;
public:
    void update() {
        position.x += velocity.x;
        position.y += velocity.y;
        position.z += velocity.z;
    }
    void render(GraphicsContext& ctx) {
        //...
    }
};
What's wrong, Robert?
class Particle {
    Vec3 position;
    Vec3 velocity;
    Color color;                       // <---- Cached but unused in update()
public:
    void update() {                    // <---- Potential i-cache miss per particle
        position.x += velocity.x;      // <---
        position.y += velocity.y;      // <--- Potential data misses
        position.z += velocity.z;      // <---
    }
    void render(GraphicsContext& ctx) {
        //...
    }
};
Worst case
void update() {
    position.x += velocity.x;
    position.y += velocity.y;
    position.z += velocity.z;
}
For (only) 4 particles:
one.update() -> i-cache miss (~600 cycles)
position.x -> data cache miss (~600 cycles)
velocity.x -> data cache miss (~600 cycles)
vector addition -> (~6 cycles)
600 + 600 + 600 + 6 = 1806 cycles per particle
1806 * 4 = 7224 cycles in total
For only about 24 cycles (4 * 6) of meaningful processing
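The arithmetic above can be written down as a tiny cost model. This is only a sketch: the cycle counts are the rough estimates from this slide, and the names (kMissCycles, kAddCycles, worst_case_cycles, useful_cycles, useful_fraction) are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Rough per-event costs taken from the slide; estimates, not measurements.
constexpr std::uint64_t kMissCycles = 600;  // one cache miss (instruction or data)
constexpr std::uint64_t kAddCycles = 6;     // the vector addition itself

// Worst case: each particle pays one i-cache miss plus two data-cache misses.
constexpr std::uint64_t worst_case_cycles(std::uint64_t particles) {
    return particles * (3 * kMissCycles + kAddCycles);
}

// Cycles spent on the additions alone.
constexpr std::uint64_t useful_cycles(std::uint64_t particles) {
    return particles * kAddCycles;
}

// Fraction of the worst-case cycles that does meaningful work.
inline double useful_fraction(std::uint64_t particles) {
    assert(particles > 0);
    return static_cast<double>(useful_cycles(particles)) /
           static_cast<double>(worst_case_cycles(particles));
}

// The slide's numbers for 4 particles, checked at compile time.
static_assert(worst_case_cycles(4) == 7224, "matches the slide's total");
static_assert(useful_cycles(4) == 24, "matches the slide's useful work");
```

Under these assumed costs, useful_fraction(4) is 24 / 7224, or roughly 0.3% of the cycles.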
The Solution (Part 1)
class ParticleManager {
    std::vector<Vec3> positions;    // <--- Data is stored sequentially,
    std::vector<Vec3> velocities;   // <--- not in bits and pieces all
    std::vector<Color> colors;      // <--- over the heap
public:
    void update() {                 // <---- Reduce number of i-cache misses to at most 1
        for (std::size_t i = 0; i < positions.size(); i++) {
            positions[i].x += velocities[i].x;  // <-- Read data sequentially
            positions[i].y += velocities[i].y;  // <-- to minimize the number
            positions[i].z += velocities[i].z;  // <-- of data cache misses
        }
    }
};
This is still suboptimal: a particle's position and velocity now live in separate arrays, very far apart in memory.
The remaining problem
- Position and velocity vectors are stored separately
- Causes 2 cache misses each time a cache line ("row") is used up: one per array
- We can still reduce this!
The Solution (Part 2)
struct ParticleMotionData {
    Vec3 position;
    Vec3 velocity;
};
class ParticleManager {
    std::vector<ParticleMotionData> motion;  // <-- Stored together now
    std::vector<Color> colors;
public:
    void update() {
        for (std::size_t i = 0; i < motion.size(); i++) {
            motion[i].position.x += motion[i].velocity.x;
            motion[i].position.y += motion[i].velocity.y;
            motion[i].position.z += motion[i].velocity.z;
        }
    }
};
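To make "stored together" concrete, here is a standalone sketch of the Part 2 layout. The slides never define Vec3, so the three-float struct below is an assumption, and update() is written as a free function (with a hypothetical example() helper) just to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed definition: the slides never show Vec3.
struct Vec3 {
    float x, y, z;
};

struct ParticleMotionData {
    Vec3 position;
    Vec3 velocity;
};

// With the assumed Vec3, velocity starts right where position ends,
// so one particle's motion data is a single contiguous block.
static_assert(offsetof(ParticleMotionData, velocity) == sizeof(Vec3),
              "velocity directly follows position");
static_assert(sizeof(ParticleMotionData) == 2 * sizeof(Vec3),
              "no padding between or after the two fields");

// Advances every particle by one step, walking the array front to back.
void update(std::vector<ParticleMotionData>& motion) {
    for (ParticleMotionData& m : motion) {
        m.position.x += m.velocity.x;
        m.position.y += m.velocity.y;
        m.position.z += m.velocity.z;
    }
}

// Example use: one particle moving at (1, 2, 3) per step.
inline void example() {
    std::vector<ParticleMotionData> motion;
    motion.push_back(ParticleMotionData{Vec3{0.0f, 0.0f, 0.0f},
                                        Vec3{1.0f, 2.0f, 3.0f}});
    update(motion);
    update(motion);
    assert(motion[0].position.x == 2.0f);
    assert(motion[0].position.z == 6.0f);
}
```

If float is 4 bytes, each ParticleMotionData takes 24 bytes, so a single read stream covers both fields of every particle.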
Performance
- No hard numbers on the example (sorry)
- Other presentations show 2x-4x speedups
- This Sony presentation:
  - Reduction from 19.6 ms to 4.8 ms
  - Only by moving data around in memory!
Sony Presentation
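Since there are no hard numbers for the particle example, one rough way to get your own is a small timing harness like the sketch below. Everything here is an assumption for illustration: the Vec3 and Color definitions, the helper names (make_scattered, make_packed, update_scattered, update_packed, time_ms), and the idea of shuffling the pointers to imitate a fragmented heap. It makes no claim about what results you will see.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <random>
#include <vector>

// Assumed definitions: the slides never show Vec3 or Color.
struct Vec3 { float x, y, z; };
struct Color { float r, g, b, a; };

// Object-per-particle layout, each instance allocated separately on the heap.
struct Particle {
    Vec3 position;
    Vec3 velocity;
    Color color;
    void update() {
        position.x += velocity.x;
        position.y += velocity.y;
        position.z += velocity.z;
    }
};

// Packed layout from "The Solution (Part 2)".
struct ParticleMotionData {
    Vec3 position;
    Vec3 velocity;
};

// Builds n heap-allocated particles, then shuffles the pointers so the
// traversal order no longer matches the allocation order.
std::vector<std::unique_ptr<Particle>> make_scattered(std::size_t n) {
    std::vector<std::unique_ptr<Particle>> particles;
    particles.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        particles.push_back(std::unique_ptr<Particle>(new Particle{
            Vec3{0.0f, 0.0f, 0.0f}, Vec3{1.0f, 2.0f, 3.0f},
            Color{1.0f, 1.0f, 1.0f, 1.0f}}));
    }
    std::mt19937 rng(42);  // fixed seed so runs are repeatable
    std::shuffle(particles.begin(), particles.end(), rng);
    return particles;
}

// Builds n particles in one contiguous array with the same starting values.
std::vector<ParticleMotionData> make_packed(std::size_t n) {
    return std::vector<ParticleMotionData>(
        n, ParticleMotionData{Vec3{0.0f, 0.0f, 0.0f}, Vec3{1.0f, 2.0f, 3.0f}});
}

void update_scattered(std::vector<std::unique_ptr<Particle>>& particles) {
    for (auto& p : particles) p->update();
}

void update_packed(std::vector<ParticleMotionData>& motion) {
    for (ParticleMotionData& m : motion) {
        m.position.x += m.velocity.x;
        m.position.y += m.velocity.y;
        m.position.z += m.velocity.z;
    }
}

// Calls fn() `steps` times and returns the elapsed wall-clock time in ms.
template <typename Fn>
double time_ms(Fn&& fn, int steps) {
    assert(steps > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int s = 0; s < steps; ++s) fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To compare, build both containers with the same particle count (ideally far more data than fits in cache) and time `[&] { update_scattered(particles); }` against `[&] { update_packed(motion); }`. Results will vary with compiler flags, allocator, and hardware, so treat any single run as a hint rather than proof.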
Better design
- Easier to isolate actions
- Easier to serialize
- Easier to send over a network
- Easier to make parallel
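As a sketch of the "easier to serialize" point: when the data is one contiguous array of plain structs, it can be written out as a single block of bytes. The save and load names are hypothetical, Vec3 is an assumed three-float struct, and the format here is deliberately naive (raw bytes, no endianness or version handling), so it only round-trips between builds with the same layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>
#include <vector>

// Assumed definition: the slides never show Vec3.
struct Vec3 { float x, y, z; };

struct ParticleMotionData {
    Vec3 position;
    Vec3 velocity;
};

// Raw byte copies are only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<ParticleMotionData>::value,
              "ParticleMotionData must be safe to copy as raw bytes");

// Writes a 64-bit element count followed by the whole array in one call.
void save(std::ostream& out, const std::vector<ParticleMotionData>& motion) {
    const std::uint64_t count = motion.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    out.write(reinterpret_cast<const char*>(motion.data()),
              static_cast<std::streamsize>(count * sizeof(ParticleMotionData)));
}

// Reads back what save() wrote. A sketch: real code should report errors
// (and validate the count) instead of asserting.
std::vector<ParticleMotionData> load(std::istream& in) {
    std::uint64_t count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof count);
    std::vector<ParticleMotionData> motion(static_cast<std::size_t>(count));
    in.read(reinterpret_cast<char*>(motion.data()),
            static_cast<std::streamsize>(motion.size() *
                                         sizeof(ParticleMotionData)));
    assert(!in.fail());
    return motion;
}

// Example use: round-trip three particles through an in-memory stream.
inline void round_trip_example() {
    std::vector<ParticleMotionData> motion(
        3, ParticleMotionData{Vec3{1.0f, 2.0f, 3.0f}, Vec3{4.0f, 5.0f, 6.0f}});
    std::stringstream buffer;
    save(buffer, motion);
    const std::vector<ParticleMotionData> copy = load(buffer);
    assert(copy.size() == motion.size());
    assert(copy[2].velocity.z == 6.0f);
}
```

The same byte block could be handed to a socket instead of a stream, with the same caveats about layout and byte order on the receiving side.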
Multi-threading
- Create thread pool
- Divide your array of data into chunks
- Assign threads to chunks of data
It's THAT simple!
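A minimal sketch of the chunking idea, under some assumptions: Vec3 is a three-float struct, the function names (update_range, parallel_update) are made up, and threads are spawned on every call for brevity where the slide suggests a reusable thread pool. Each thread gets its own contiguous, non-overlapping range, so no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed definition: the slides never show Vec3.
struct Vec3 { float x, y, z; };

struct ParticleMotionData {
    Vec3 position;
    Vec3 velocity;
};

// Updates the particles in motion[begin, end).
void update_range(std::vector<ParticleMotionData>& motion,
                  std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        motion[i].position.x += motion[i].velocity.x;
        motion[i].position.y += motion[i].velocity.y;
        motion[i].position.z += motion[i].velocity.z;
    }
}

// Splits the array into at most thread_count contiguous chunks and updates
// each chunk on its own thread. Spawning threads per call stands in for a
// real thread pool here.
void parallel_update(std::vector<ParticleMotionData>& motion,
                     unsigned thread_count) {
    assert(thread_count > 0);
    const std::size_t n = motion.size();
    // Chunk size rounded up so every element is covered.
    const std::size_t chunk = (n + thread_count - 1) / thread_count;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        workers.emplace_back(
            [&motion, begin, end] { update_range(motion, begin, end); });
    }
    // Wait for every chunk to finish before returning.
    for (std::thread& w : workers) w.join();
}
```

A typical call would be `parallel_update(motion, std::thread::hardware_concurrency())`, guarding against a returned value of 0. On some toolchains this needs to be linked with the platform's thread library (for example `-pthread`).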
Further reading
- http://www.asawicki.info/news_1422_data-oriented_design_-_links_and_thoughts.html
- http://www.slideshare.net/DICEStudio/introduction-to-data-oriented-design
- http://stackoverflow.com/q/1641580/1122135
- http://gamesfromwithin.com/data-oriented-design
- http://www.youtube.com/watch?v=16ZF9XqkfRY
- http://www.dataorienteddesign.com/dodmain/dodmain.html
Thanks
Professor Goldschmidt
Professor Moorthy
Sean O'Sullivan
RCOS
Questions?