Introduction to SIMD concept
The current state of improving PC computing performance
Ways to improve:
- raise the CPU frequency (at its limit)
- add more CPUs (no single-thread speed-up)
- add more processor cache (little benefit)
- speed up I/O, RAM, etc. (in progress)
- code optimization (needs human effort)
- something else?
The Free Lunch Is Over
Herb Sutter (December 2004)
What is SIMD?
SIMD stands for Single Instruction, Multiple Data
It is a form of data-level parallelism
In other words, SIMD is a way to process a chunk of data by applying one identical instruction, taken from an instruction set, to every element at once.
SIMD implementations on PC
1997 Intel: MMX (Multimedia Extensions), introduced in the Pentium MMX
1998 AMD: 3DNow!, introduced in the AMD K6-2 [deprecated since 2010]
1999 Intel: SSE (Streaming SIMD Extensions) in Pentium III
2000 Intel: SSE2 in Pentium 4
2004 Intel: SSE3 in Pentium 4 (Prescott revision)
2006 Intel: SSSE3 (Supplemental SSE3) in Xeon 5100
2006 Intel: SSE4 announced; SSE4.2 shipped in the Nehalem-based Core i7 (2008)
2006-2008: SSE4a, SSE4.1, SSE4.2, POPCNT, LZCNT
2008 Intel: AVX (Advanced Vector Extensions) announced; first processor in Q1 2011 (Sandy Bridge)
2013 Intel: AVX2 in Haswell
2013 Intel: FMA3 (Fused Multiply-Add) in Haswell; AMD shipped FMA4 in Bulldozer (2011)
2013 Intel: AVX-512 announced; first shipped in Xeon Phi (Knights Landing) in 2016
NEON on ARM architecture
Languages with a SIMD API
- C/C++/D
- C# (CIL): .NET since 2015, Mono since 2008
- JavaScript (2013)
- Dart
Let's write a "hello world":
a program that sums 4 integers
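A minimal sketch of such a "hello world" in C, assuming an x86 CPU with SSE2 and a compiler that ships `<emmintrin.h>`. The function name `add4_i32` is hypothetical and not taken from the talk. It adds two vectors of four 32-bit integers with a single SIMD instruction.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Adds four 32-bit integers from a[] to four from b[] with one SIMD add.
   add4_i32 is a hypothetical name used only for this illustration. */
void add4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    __m128i va  = _mm_loadu_si128((const __m128i *)a);  /* unaligned load */
    __m128i vb  = _mm_loadu_si128((const __m128i *)b);
    __m128i sum = _mm_add_epi32(va, vb);                /* 4 additions at once */
    _mm_storeu_si128((__m128i *)out, sum);              /* unaligned store */
}
```

Calling `add4_i32` with {1, 2, 3, 4} and {10, 20, 30, 40} fills `out` with {11, 22, 33, 44}. The unaligned load and store variants are used so the sketch works with any array address.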
Vectors
| a | b | c | d |
+
| e | f | j | k |
=
| a + e | b + f | c + j | d + k |
Register widths: 64-bit, 128-bit, 256-bit, 512-bit
Contents:
- Integer
- Char
- Float
- Double
- Bitmask
- Complex
- Undefined
- NaN
Is it really faster?
Time to ask questions
Operation pool
- Arithmetic
- Bit Manipulation
- Cast
- Double
- Compare
- Convert
- Cryptography
- Elementary Math Functions
- General Support
- Load
- Logical
- Mask
- Move
- Miscellaneous
- Random
- Probability/Statistics
- OS-Targeted
- Set
- Shift
- Special Math Functions
- Store
- String Compare
- Swizzle
- Trigonometry
Ways to use it
- Let the compiler/interpreter do it
- Use assembly directly
- Use intrinsics
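As a rough illustration of the first and third options, the sketch below writes the same array addition twice: once as a plain loop that an optimizing compiler may vectorize on its own (typically at higher optimization levels), and once with SSE intrinsics. The function names are hypothetical, and an x86 target with SSE2 is assumed.

```c
#include <emmintrin.h>  /* SSE2; also pulls in the SSE float intrinsics */
#include <stddef.h>

/* Option 1: plain loop. The compiler is free to auto-vectorize it. */
void add_arrays_auto(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Option 3: intrinsics. Four floats per iteration, scalar loop for the tail. */
void add_arrays_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i)  /* leftover elements when n is not a multiple of 4 */
        out[i] = a[i] + b[i];
}
```

Both functions produce the same result. The intrinsics version trades portability for explicit control over which instructions are used.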
Let's see something interesting
Time to ask questions
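Logic operations concept
[Diagram: integer vectors compared lane by lane with "<"; each result lane is a mask, 0xFFFFFFFF where the comparison is true and 0x00000000 where it is false; masks are combined with logic operations such as OR]
| 9 | 97 | 6 | 21 | < | 5 | 65 | 35 | 5 | = | 0x00000000 | 0x00000000 | 0xFFFFFFFF | 0x00000000 |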
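The lane-wise comparison from the diagram can be sketched in C with SSE2 intrinsics. The function name `less_than_mask` is hypothetical, and a 128-bit vector of four 32-bit integers is an assumption made for the example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Compares a[i] < b[i] in four lanes and writes one 32-bit mask per lane:
   0xFFFFFFFF where the comparison is true, 0x00000000 where it is false. */
void less_than_mask(const int32_t a[4], const int32_t b[4], uint32_t mask[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i m  = _mm_cmplt_epi32(va, vb);  /* signed 32-bit compare */
    _mm_storeu_si128((__m128i *)mask, m);
}
```

With a = {9, 97, 6, 21} and b = {5, 65, 35, 5}, only the third lane satisfies a < b, so the mask is {0x00000000, 0x00000000, 0xFFFFFFFF, 0x00000000}.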
Logic operations concept example
For each space mission
  If the mission is successful:
    Write a speech of praise
  Else:
    Write a mourning speech

For each space mission
  Write a mourning speech
  Write a speech of praise
  If the mission is successful:
    Take the speech of praise
  Else:
    Take the mourning speech
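The second variant (prepare both results, then pick one) maps onto a branchless SIMD select. Below is a minimal sketch with SSE2 intrinsics, where speeches are stood in for by integers. The name `pick_speech` and the convention that a non-zero flag means success are assumptions for illustration only.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* For four missions at once: out = success ? praise : mourning.
   Both candidate values are already computed; a mask picks one per lane. */
void pick_speech(const int32_t success[4], const int32_t praise[4],
                 const int32_t mourning[4], int32_t out[4])
{
    __m128i ok = _mm_loadu_si128((const __m128i *)success);
    __m128i vp = _mm_loadu_si128((const __m128i *)praise);
    __m128i vm = _mm_loadu_si128((const __m128i *)mourning);

    /* failed: all ones in lanes where the success flag is zero */
    __m128i failed = _mm_cmpeq_epi32(ok, _mm_setzero_si128());

    /* (failed AND mourning) OR (NOT failed AND praise) */
    __m128i res = _mm_or_si128(_mm_and_si128(failed, vm),
                               _mm_andnot_si128(failed, vp));
    _mm_storeu_si128((__m128i *)out, res);
}
```

There is no `if` in the function: the comparison produces a mask, and AND, AND-NOT and OR combine the two prepared results.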
... and one more example
For each space mission
  If the mission is successful:
    Write a speech of praise
  Else:
    If the crew escaped:
      Write a scathing speech
    Else:
      Write a mourning speech

For each space mission
  Write a mourning speech
  Write a speech of praise
  Write a scathing speech
  If the mission is successful:
    Take the speech of praise
  Else:
    Take the scathing speech
  If the crew died:
    Take the mourning speech
  Else:
    Keep the speech chosen before
No lazy evaluation: every branch is computed, then one result is selected
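The nested case can be sketched the same way by chaining two selects, following the second pseudocode variant step by step. All names here (`select_i32`, `pick_speech3`) are hypothetical, speeches are represented by integers, and non-zero flags mean "true".

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Per lane: mask ? a : b. Each mask lane must be all ones or all zeros. */
__m128i select_i32(__m128i mask, __m128i a, __m128i b)
{
    return _mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b));
}

/* Four missions at once; all three speeches are prepared up front. */
void pick_speech3(const int32_t success[4], const int32_t crew_died[4],
                  const int32_t praise[4], const int32_t scathing[4],
                  const int32_t mourning[4], int32_t out[4])
{
    __m128i zero = _mm_setzero_si128();
    __m128i ok   = _mm_loadu_si128((const __m128i *)success);
    __m128i died = _mm_loadu_si128((const __m128i *)crew_died);
    __m128i vp   = _mm_loadu_si128((const __m128i *)praise);
    __m128i vs   = _mm_loadu_si128((const __m128i *)scathing);
    __m128i vm   = _mm_loadu_si128((const __m128i *)mourning);

    /* Step 1: failed missions take the scathing speech, others the praise. */
    __m128i failed = _mm_cmpeq_epi32(ok, zero);
    __m128i res = select_i32(failed, vs, vp);

    /* Step 2: lanes where the crew survived keep the earlier choice,
       the rest take the mourning speech. */
    __m128i alive = _mm_cmpeq_epi32(died, zero);
    res = select_i32(alive, res, vm);

    _mm_storeu_si128((__m128i *)out, res);
}
```

Every lane pays for all three candidate values, which is the price of removing the branches.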
Data preparation cost
RGBA to grey strategy
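One possible reading of this slide, offered as a sketch and not as the speaker's code: pixels arrive interleaved as R, G, B, A bytes, so each channel has to be gathered into its own vector before any SIMD arithmetic can run. The function name `rgba_to_grey4` and the BT.601 luma weights (0.299, 0.587, 0.114) are assumptions; the talk does not state which weights it uses.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Converts 4 interleaved RGBA pixels (16 bytes) to 4 grey values. */
void rgba_to_grey4(const uint8_t rgba[16], uint8_t grey[4])
{
    /* Data preparation: gather each channel of the 4 pixels into one vector.
       _mm_set_ps takes its arguments from the highest lane to the lowest. */
    __m128 r = _mm_set_ps(rgba[12], rgba[8],  rgba[4], rgba[0]);
    __m128 g = _mm_set_ps(rgba[13], rgba[9],  rgba[5], rgba[1]);
    __m128 b = _mm_set_ps(rgba[14], rgba[10], rgba[6], rgba[2]);

    /* Assumed weights (BT.601 luma); the alpha channel is ignored. */
    __m128 y = _mm_add_ps(
        _mm_add_ps(_mm_mul_ps(r, _mm_set1_ps(0.299f)),
                   _mm_mul_ps(g, _mm_set1_ps(0.587f))),
        _mm_mul_ps(b, _mm_set1_ps(0.114f)));

    /* Convert to integers (round to nearest in the default rounding mode). */
    int32_t tmp[4];
    _mm_storeu_si128((__m128i *)tmp, _mm_cvtps_epi32(y));
    for (int i = 0; i < 4; ++i)
        grey[i] = (uint8_t)tmp[i];
}
```

The arithmetic itself is three multiplies and two adds. The gathering before it and the narrowing after it are the data preparation cost, and they can outweigh the arithmetic when the memory layout does not suit SIMD.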
Time to ask questions
Languages with a SIMD API
- C/C++/D
- C# (CIL): .NET since 2015, Mono since 2008
- JavaScript (2013)
- Dart
A few words about JS
A few words about C#
The most important things
- Parallel code != multithreaded code
- SIMD instructions are hardware-dependent, but that is not a problem
- If math computation is the bottleneck, SIMD can be a solution
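One common way to cope with the hardware dependence is to check at run time which extensions the CPU reports and pick a code path accordingly. The sketch below assumes GCC or Clang on x86, since `__builtin_cpu_supports` is a compiler builtin and not standard C. The function name `simd_level` is hypothetical.

```c
/* Returns the name of the widest x86 SIMD extension, out of the few checked
   here, that the running CPU reports. Relies on GCC/Clang builtins; other
   compilers need a different mechanism (an assumption of this sketch). */
const char *simd_level(void)
{
    __builtin_cpu_init();  /* harmless if the CPU info is already set up */
    if (__builtin_cpu_supports("avx2"))   return "avx2";
    if (__builtin_cpu_supports("avx"))    return "avx";
    if (__builtin_cpu_supports("sse4.2")) return "sse4.2";
    if (__builtin_cpu_supports("sse2"))   return "sse2";
    return "scalar";
}
```

A program can call `simd_level()` once at start-up and select a matching implementation, keeping a plain scalar version as the fallback.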
Useful links
- Intel Intrinsics Guide - a list of all available intrinsics
- kernel.org SIMD basics tutorial - a good guide for SIMD beginners
- IDF14 - JS demos repo
- (Auto)Vectorization tutorial with the Intel compiler
- google.com - a nice service for finding SIMD solutions and tutorials
- Coders guild repo - all examples from this talk
Time to ask questions
Introduction to SIMD concept
By demobin