Making the Web Faster with SIMD.js

Sajjad Taheri

A Little About Me

  • From Iran
  • CS PhD student at UC Irvine
  • Working on high-performance communication and computation for the Web platform
    • WebRTC

    • Computer Vision for the web

      • RGB/Depth Image processing

      • CV and SIMD = a perfect match!

SIMD

  • Single Instruction Multiple Data
  • A class of parallel computer architecture
  • Performs the same arithmetic operation on multiple data points in parallel (sketched below)
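
For intuition, a minimal sketch in plain JavaScript: the scalar version does one addition per loop iteration, while a SIMD version would process several adjacent elements with a single instruction.

    // Scalar: one addition per loop iteration.
    function addScalar(out, a, b, n) {
      for (var i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
      }
    }
    // With SIMD, one instruction adds several adjacent elements at once
    // (e.g., 4 x 32-bit floats in a 128-bit register), so the loop can
    // advance four elements per iteration.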

SIMD

  • Many variants: SSE, AVX, FMA
    • They differ in:
      • The vector operations they support
        • SSE4 has a richer instruction set than SSE2
      • Vector width: wider vectors do more operations per cycle
        • AVX vectors are twice as wide as SSE's
  • SIMD.js assumes SSE2 as the minimum level of SIMD support
    • Vectors are 128 bits wide (lane layouts sketched below)
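
As an illustration (a sketch, assuming an environment that exposes the SIMD global, e.g., a Firefox Nightly of that era or the ecmascript_simd polyfill), the same 128 bits can be split into lanes of different widths:

    var f4  = SIMD.Float32x4(1.5, 2.5, 3.5, 4.5);            // 4 x 32-bit floats
    var i4  = SIMD.Int32x4(1, 2, 3, 4);                      // 4 x 32-bit integers
    var i8  = SIMD.Int16x8(1, 2, 3, 4, 5, 6, 7, 8);          // 8 x 16-bit integers
    var i16 = SIMD.Int8x16(1, 2, 3, 4, 5, 6, 7, 8,
                           9, 10, 11, 12, 13, 14, 15, 16);   // 16 x 8-bit integers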

SIMD.js

  • A JavaScript API that exposes SIMD to web apps (basic usage sketched below)
  • Browsers provide an efficient implementation for the underlying hardware
    • SIMD has a lot to offer, and many web apps can benefit from it
      • 3D graphics, video processing
    • It's already available on most processors
      • E.g., the first x86 CPUs to support SSE2:
        • Pentium 4 (2001), AMD K8 (2003)
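
A minimal usage sketch (again assuming a build that exposes the SIMD global):

    var a = SIMD.Float32x4(1.0, 2.0, 3.0, 4.0);
    var b = SIMD.Float32x4(5.0, 6.0, 7.0, 8.0);
    var sum = SIMD.Float32x4.add(a, b);                // adds all four lanes at once
    console.log(SIMD.Float32x4.extractLane(sum, 0));   // 6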

SIMD.js

Firefox (Nightly) was the first browser to implement it

A SIMD.js Demo
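
The demo itself isn't part of these notes; the sketch below is only an example of the kind of kernel such a demo exercises (assumptions: Float32Array inputs whose length is a multiple of 4, and an environment exposing the SIMD global).

    // Add two Float32Arrays four lanes at a time.
    function addSIMD(out, a, b, n) {
      for (var i = 0; i < n; i += 4) {
        var va = SIMD.Float32x4.load(a, i);
        var vb = SIMD.Float32x4.load(b, i);
        SIMD.Float32x4.store(out, i, SIMD.Float32x4.add(va, vb));
      }
    }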

Boolean Vectors

  • The latest addition to the SIMD.js API
    • New data types
      • Bool8x16, Bool16x8, Bool32x4
    • Operators
      • logical, comparison, selection, ...
  • Boolean vectors are mainly used for vector comparisons and conditional assignments
  • Previously mimicked with integer vectors (contrast the two styles in the sketch below)

Programs will be cleaner

Performance will be better
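
To make that concrete, a sketch of a per-lane clamp (a minimal example, assuming SIMD.js with the Bool32x4 additions; the helper name clampTo is just for this sketch):

    // New style: the comparison yields a Bool32x4 that feeds straight into
    // a per-lane select.
    function clampTo(v, limit) {
      var mask = SIMD.Float32x4.lessThan(v, limit);   // Bool32x4: one truth value per lane
      return SIMD.Float32x4.select(mask, v, limit);   // per lane: mask ? v : limit
    }

    var v = SIMD.Float32x4(3, 12, 7, 20);
    var clamped = clampTo(v, SIMD.Float32x4.splat(10));  // Float32x4(3, 10, 7, 10)

    // Before boolean vectors, the mask had to be faked with an all-ones /
    // all-zeros integer vector and combined through bit-casts and bitwise
    // and/or/not, which was noisier and harder for the JIT to optimize.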

SpiderMonkey Boolean Vectors Implementation

  • Add boolean vectors and their operators to the interpreter
  • JIT compiler (IonMonkey)
    • Stack/register allocator
    • Generate efficient machine code for the operators (currently only Bool32x4, and only for x86/x86-64)

GL-Matrix

GL-Matrix (SIMD)

  • mat4 operations are being SIMDized using the SIMD.js API
  • Operations include:
    • Multiply, rotate, adjoint, invert, translate, scale, ...
  • They're the most heavily used
  • 4-wide SIMD vectors map well onto mat4 data, so they can be fully utilized for higher parallelism (see the sketch below)
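
For illustration only (not gl-matrix's actual source): a sketch of a SIMDized 4x4 matrix multiply, assuming column-major Float32Array(16) matrices as gl-matrix uses and an environment exposing the SIMD global; the name mat4MultiplySIMD is just for this sketch.

    function mat4MultiplySIMD(out, a, b) {
      // Load the four columns of a once.
      var a0 = SIMD.Float32x4.load(a, 0);
      var a1 = SIMD.Float32x4.load(a, 4);
      var a2 = SIMD.Float32x4.load(a, 8);
      var a3 = SIMD.Float32x4.load(a, 12);
      for (var c = 0; c < 4; c++) {
        // Column c of out = sum over k of (column k of a) * b(k, c),
        // computed four rows at a time.
        var col = SIMD.Float32x4.add(
          SIMD.Float32x4.add(
            SIMD.Float32x4.mul(a0, SIMD.Float32x4.splat(b[c * 4 + 0])),
            SIMD.Float32x4.mul(a1, SIMD.Float32x4.splat(b[c * 4 + 1]))),
          SIMD.Float32x4.add(
            SIMD.Float32x4.mul(a2, SIMD.Float32x4.splat(b[c * 4 + 2])),
            SIMD.Float32x4.mul(a3, SIMD.Float32x4.splat(b[c * 4 + 3]))));
        SIMD.Float32x4.store(out, c * 4, col);
      }
      return out;
    }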

GL-Matrix SIMD Benchmark

A small benchmark measures each function's performance (sketched below)
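
The benchmark code isn't included in these notes; below is a hypothetical sketch of such a microbenchmark. The functions being compared (gl-matrix's scalar mat4.multiply and the mat4MultiplySIMD sketch above) are assumptions about what would be timed.

    function bench(label, multiplyFn, iterations) {
      // Two arbitrary column-major 4x4 matrices and an output buffer.
      var a = new Float32Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]);
      var b = new Float32Array([16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]);
      var out = new Float32Array(16);
      var start = Date.now();
      for (var i = 0; i < iterations; i++) {
        multiplyFn(out, a, b);
      }
      console.log(label + ': ' + (Date.now() - start) + ' ms');
    }

    // Usage (hypothetical): compare the scalar and SIMD paths over 500k iterations.
    // bench('scalar', mat4.multiply, 500000);
    // bench('SIMD', mat4MultiplySIMD, 500000);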

GL-Matrix SIMD Performance

  • Overhead in interpreter mode
    • Efficient machine code is generated only after JIT compilation
  • Less improvement for small functions

Numbers were collected over 500k iterations on Firefox Nightly

Thanks

Questions?
