Making the Web Faster with SIMD.js
Sajjad Taheri
A Little About Me
- From Iran
- CS PhD student at UC Irvine
- Working on high-performance communication and computation for the Web platform
  - WebRTC
  - Computer Vision for the Web
  - RGB/Depth image processing
  - CV and SIMD = a perfect match!
SIMD
- Single Instruction, Multiple Data
- A class of parallel computer architecture
  - Does an arithmetic operation on multiple data points in parallel
SIMD
- Many variants: SSE, AVX, FMA
- They differ in:
  - The vector operations they support
    - SSE4 has richer instructions than SSE2
  - Vector width = do more operations in one cycle
    - AVX vectors are twice as wide as SSE's
- SIMD.js considers SSE2 the minimum SIMD support
  - Vectors are 128 bits wide
SIMD.js
- A JavaScript API that exposes SIMD to web apps (see the sketch after this list)
- Browsers provide an efficient implementation for each underlying hardware
- SIMD offers a lot, and many web apps can benefit from it
  - 3D graphics, video processing
- It's already available on most processors
  - E.g. the first x86 CPUs to support SSE2: Pentium 4 (2001), AMD K8 (2003)
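A minimal sketch of the API, assuming a build that exposes the SIMD global (e.g. Firefox Nightly):

  // Two 128-bit vectors, each holding four 32-bit floats
  var a = SIMD.Float32x4(1.0, 2.0, 3.0, 4.0);
  var b = SIMD.Float32x4(5.0, 6.0, 7.0, 8.0);

  // All four lanes are added with a single operation
  var sum = SIMD.Float32x4.add(a, b);               // (6, 8, 10, 12)

  // Individual lanes are read back with extractLane
  console.log(SIMD.Float32x4.extractLane(sum, 2));  // 10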
SIMD.js
Firefox (Nightly) was the first browser to implement it
A SIMD.js Demo
Boolean Vectors
- The latest addition to the SIMD.js API
- New data types
  - Bool8x16, Bool16x8, Bool32x4
- Operators
  - Logical, comparison, selection, ...
- Boolean vectors are mainly used for vector comparisons and conditional assignments (see the sketch after this list)
  - Previously mimicked with integer vectors
- Programs will be cleaner
- Performance will be better
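A small sketch of the comparison/selection idiom, assuming the Bool32x4-returning comparison operators described above:

  var a = SIMD.Float32x4(1.0, 5.0, 3.0, 7.0);
  var b = SIMD.Float32x4(4.0, 2.0, 6.0, 1.0);

  // Comparison now yields a Bool32x4 instead of an integer bit mask
  var mask = SIMD.Float32x4.lessThan(a, b);     // Bool32x4(true, false, true, false)

  // Conditional assignment: take lanes from a where the mask is true, from b elsewhere
  var min = SIMD.Float32x4.select(mask, a, b);  // (1, 2, 3, 1)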
SpiderMonkey Boolean Vectors Implementation
- Add boolean vectors and their operators to the interpreter
- JIT compiler (IonMonkey)
  - Stack/register allocator
  - Generate efficient machine code for the operators (currently only Bool32x4, and only for x86/x64)
GL-Matrix
GL-Matrix (SIMD)
- mat4 operations are being SIMDized using the SIMD.js API (see the sketch after this list)
- Operations include:
  - Multiply, rotate, adjoint, inverse, translate, scale, ...
  - They're the most heavily used
- SIMD vectors can be better utilized to achieve higher parallelism
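A hedged sketch of what SIMDizing one mat4 operation can look like; the function name mat4ScaleSIMD is illustrative, not gl-matrix's actual entry point:

  // A gl-matrix mat4 is a Float32Array of 16 floats in column-major order,
  // so each column fits exactly in one 128-bit Float32x4.
  function mat4ScaleSIMD(out, a, v) {
    var vx = SIMD.Float32x4.splat(v[0]);
    var vy = SIMD.Float32x4.splat(v[1]);
    var vz = SIMD.Float32x4.splat(v[2]);

    // Scale the three basis columns, one whole column per operation
    SIMD.Float32x4.store(out, 0,  SIMD.Float32x4.mul(SIMD.Float32x4.load(a, 0),  vx));
    SIMD.Float32x4.store(out, 4,  SIMD.Float32x4.mul(SIMD.Float32x4.load(a, 4),  vy));
    SIMD.Float32x4.store(out, 8,  SIMD.Float32x4.mul(SIMD.Float32x4.load(a, 8),  vz));

    // Copy the translation column unchanged
    SIMD.Float32x4.store(out, 12, SIMD.Float32x4.load(a, 12));
    return out;
  }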
GL-Matrix SIMD Benchmark
A small benchmark measures each function's performance
GL-Matrix SIMD Performance
- Overhead in interpreter mode
  - Efficient machine code is generated only after JIT compilation
- Less improvement for small functions
- Numbers are extracted from 500k iterations on Nightly (see the benchmark sketch below)
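A rough sketch of the kind of micro-benchmark loop behind these numbers; only the 500k iteration count comes from the slide, and mat4.multiplySIMD is a placeholder name for the SIMDized variant:

  // Assumes gl-matrix is loaded; out, a, b are reused to avoid allocation in the loop
  var out = mat4.create(), a = mat4.create(), b = mat4.create();

  function time(fn, iterations) {
    var start = performance.now();
    for (var i = 0; i < iterations; i++) fn();
    return performance.now() - start;
  }

  var scalarMs = time(function () { mat4.multiply(out, a, b); }, 500000);
  var simdMs   = time(function () { mat4.multiplySIMD(out, a, b); }, 500000);
  console.log('speedup: ' + (scalarMs / simdMs).toFixed(2) + 'x');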
Thanks
Questions?
Intern Presentation
By Sajjad Taheri