LLVM and Emscripten

What is LLVM?

  • a compiler system
  • it does a lot! e.g. analysis tools of interest to people who design languages
  • it has an intermediate low-level language to compile to, which can then generate native code for different targets
    • e.g. instead of compiling directly from C++ to x86, the LLVM chain goes C++ to LLVM to x86
    • clang is a C and C++ compiler that makes use of LLVM, for example
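To make the C++ to LLVM to x86 chain a bit more concrete, here is a minimal sketch: a tiny function, with comments showing one way to stop at the intermediate stage. The file name add.cpp is hypothetical, and the IR text you get will vary with the clang version and optimization level.

```cpp
// add.cpp (hypothetical file name)
//
// Stop after the front end and write textual LLVM IR:
//   clang++ -S -emit-llvm add.cpp -o add.ll
// Lower that IR to assembly for the host machine:
//   llc add.ll -o add.s
int add(int a, int b) {
    return a + b;
}
```

The same add.ll could in principle be handed to a different backend, which is the point of having the intermediate language in the middle.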

What is LLVM?

  • it is not a virtual machine
    • LLVM code can be JIT-compiled, but it is not meant to be executed or interpreted the way Java bytecode or CLR bytecode is
  • other developers have built frontends for languages other than C/C++ (e.g. PHP, Lua, and Haskell can all be compiled into LLVM)
  • code generation covers multiple low-level targets, such as x86, x86-64, MIPS...

That's neat and all

(but what can we actually do?)

  • It depends on what languages you have that can compile into LLVM and what targets you can generate assembly for!
  • This brings me to the reason I wanted to talk about this: a subset of JavaScript known as asm.js, and a compiler toolchain known as Emscripten.

asm.js and Emscripten

  • asm.js -- the JavaScript spec used as a compile target
    • "a strict subset of JavaScript that can be used as a low-level, efficient target language for compilers. This sublanguage effectively describes a sandboxed virtual machine for memory-unsafe languages like C or C++."
  • Emscripten -- the full compiler toolchain
    • "Emscripten is an LLVM-based project that compiles C and C++ into highly-optimizable JavaScript in asm.js format."
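One way to picture the "sandboxed virtual machine for memory-unsafe languages" idea: all of the program's memory is one flat array, and a pointer is just a byte offset into it. The sketch below is a conceptual model written in C++, not Emscripten's actual output. The names LinearMemory, store32, load32, and sum are made up for illustration. The shift by 2 mirrors the way asm.js code indexes a 32-bit heap view.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Conceptual model only: one flat array stands in for all of the program's memory.
struct LinearMemory {
    std::vector<std::int32_t> heap;

    explicit LinearMemory(std::size_t words) : heap(words, 0) {}

    // A "pointer" is a byte address; shifting right by 2 turns it into a 32-bit word index.
    // at() is bounds-checked, so a bad address throws instead of touching other memory.
    void store32(std::uint32_t addr, std::int32_t value) { heap.at(addr >> 2) = value; }
    std::int32_t load32(std::uint32_t addr) const { return heap.at(addr >> 2); }
};

// Stand-in for `int sum(const int* p, int n)`, with p expressed as a byte address.
std::int32_t sum(const LinearMemory& mem, std::uint32_t p, int n) {
    std::int32_t total = 0;
    for (int i = 0; i < n; ++i) {
        total += mem.load32(p + 4u * static_cast<std::uint32_t>(i));
    }
    return total;
}
```

The compiled program can scribble anywhere inside the array, but it has no way to name anything outside it, which is roughly where the sandboxing comes from.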

C++ on the web

  • just to be clear, this is compiling C++ to run directly in the browser as JavaScript (or similar)
  • this is separate and distinct from CGI, which is typically native code that runs server-side and emits HTML, etc. to render
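As a rough sketch of what "C++ running in the browser" can look like from the C++ side, here is a function marked so that page script can call it. EMSCRIPTEN_KEEPALIVE comes from emscripten.h. The no-op fallback definition for native builds and the function name fahrenheit_to_celsius are assumptions made for this example.

```cpp
#ifdef __EMSCRIPTEN__
#include <emscripten.h>  // provides EMSCRIPTEN_KEEPALIVE
#else
// Native build: define a no-op stand-in so the same file still compiles (assumption for this sketch).
#define EMSCRIPTEN_KEEPALIVE
#endif

extern "C" {

// extern "C" avoids C++ name mangling; KEEPALIVE keeps the function from being stripped as unused.
EMSCRIPTEN_KEEPALIVE int fahrenheit_to_celsius(int f) {
    return (f - 32) * 5 / 9;
}

}  // extern "C"
```

Once the generated module has loaded, page script can typically reach this as Module._fahrenheit_to_celsius(212), though the exact export setup depends on the Emscripten version and build flags.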

My Experience

  • So far, not much -- environment setup and a few simple "hello world" adjacent applications
    • input to Emscripten is valid C or C++, or already-compiled LLVM bitcode (from any language)
    • output is an HTML shell, a program memory file, and a JavaScript file containing everything else
    • the HTML shell lets you define behavior for things like stdin, stdout, stderr, or other callbacks you might create in your C++ application
    • emscripten includes an OpenGL to WebGL shim, which mostly works -- calls that are valid in OpenGL but not in WebGL will cause clearly stated runtime errors

My Experience

  • I was surprised at how much of it just worked
    • the default keyboard input (if you don't define it otherwise) via std::getline is awkward for terminal programs; I'm currently trying to work out more user-intuitive behavior
    • true concurrency doesn't work, but it at least round-robins runtime between threads
    • OpenGL really just works
  • Porting is a lot easier than I imagined, but I can still see some places where effort may be needed
    • e.g. abstracting away OS-specific libraries (Windows events for keyboard, mouse, frame updates, etc.) and implementing new ones specific to emscripten
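Frame updates are a good example of the abstraction work. A native program usually owns its loop, while a browser wants to call you once per frame. A minimal sketch, assuming the per-frame work lives in a function called tick (a hypothetical name, as are run and g_frames): emscripten_set_main_loop is the Emscripten call for handing the loop to the browser, and the native branch is a plain loop.

```cpp
#ifdef __EMSCRIPTEN__
#include <emscripten.h>  // emscripten_set_main_loop
#endif

static int g_frames = 0;  // hypothetical bit of per-frame state

// One frame of work: identical on every platform.
void tick() {
    ++g_frames;
}

// Platform-specific driver for the frame function.
void run(int native_frame_limit) {
#ifdef __EMSCRIPTEN__
    (void)native_frame_limit;  // unused in the browser build
    // fps = 0 lets the browser schedule frames; the final 1 simulates an
    // infinite loop, so this call does not return to the caller.
    emscripten_set_main_loop(tick, 0, 1);
#else
    for (int i = 0; i < native_frame_limit; ++i) {
        tick();
    }
#endif
}
```

Keyboard and mouse handling would need the same treatment: keep the game-side interface fixed and swap the implementation behind it per platform.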

Use in the Wild

oh also a live demo

"what about webassembly"

  • Good question, and one I don't have an in-depth answer for!
  • All I know is that there is an option to compile to WebAssembly instead of asm.js. WebAssembly strikes me as less horrific than using JavaScript as a compile target, because this is the exact use case WebAssembly was designed for.

LLVM and Emscripten

By tdhoward

nowhere is safe from C++
