LLVM and Emscripten
What is LLVM?
- a compiler infrastructure: a collection of modular, reusable compiler and toolchain components
- it does a lot, e.g. analysis and optimization tools of interest to people who design languages
- it defines a low-level intermediate representation (LLVM IR) that frontends compile to, and from which backends generate native code for different targets
- e.g. instead of compiling directly from C++ to x86, the LLVM chain goes C++ to LLVM IR to x86
- Clang, for example, is a C and C++ compiler frontend built on LLVM
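As a minimal sketch of that two-step chain: the C++ below is ordinary source. Assuming Clang and the LLVM tools are installed, `clang++ -S -emit-llvm square.cpp -o square.ll` should stop after the frontend and write human-readable LLVM IR instead of x86 assembly, and `llc square.ll -o square.s` can then lower that IR to assembly for the host target. The file names are hypothetical.

```cpp
// square.cpp: a tiny translation unit to push through the pipeline.
//   frontend: clang++ -S -emit-llvm square.cpp -o square.ll
//   backend:  llc square.ll -o square.s
#include <cassert>
#include <cstdint>

// extern "C" keeps the symbol name unmangled, so it is easy to spot
// in the generated .ll file (it shows up as @square).
extern "C" std::int32_t square(std::int32_t x) {
    return x * x;
}

// Sums 0^2 + 1^2 + ... + (n-1)^2. The loop gives the optimizer
// something to work on if you compare the -O0 and -O2 IR.
extern "C" std::int64_t sum_of_squares(std::int32_t n) {
    assert(n >= 0);  // precondition for this sketch
    std::int64_t total = 0;
    for (std::int32_t i = 0; i < n; ++i) {
        total += square(i);
    }
    return total;
}
```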
What is LLVM?
- it is not a virtual machine
- LLVM IR can be JIT-compiled, but it is not meant to be executed or interpreted the way Java bytecode or the CLR's intermediate language is
- developers have reportedly written frontends for languages other than C/C++ (e.g. PHP, Lua, and Haskell compiled to LLVM)
- backends generate code for many targets, such as x86, x86-64, and MIPS
That's neat and all
(but what can we actually do?)
- That depends on which languages can compile to LLVM and which targets LLVM can generate code for!
- This brings me to the reason I wanted to talk about this: a subset of JavaScript known as asm.js, and a compiler toolchain known as Emscripten.
asm.js and Emscripten
- asm.js: the JavaScript subset (and its spec) used as a compile target
- "a strict subset of JavaScript that can be used as a low-level, efficient target language for compilers. This sublanguage effectively describes a sandboxed virtual machine for memory-unsafe languages like C or C++."
- Emscripten: the full compiler toolchain
- "Emscripten is an LLVM-based project that compiles C and C++ into highly-optimizable JavaScript in asm.js format."
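The "sandboxed virtual machine" in that quote is easiest to picture as a single flat heap: in asm.js, the compiled program's memory is one JavaScript ArrayBuffer, and a C/C++ pointer becomes an integer offset into it. The C++ below is a conceptual sketch of that model only; the `LinearMemory` class and its methods are hypothetical, not part of Emscripten. It throws on out-of-range access to make the boundary explicit, and does not try to reproduce exactly how asm.js or WebAssembly handle such accesses.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Conceptual model (hypothetical class): one contiguous byte array is the
// program's entire address space, and "pointers" are integer offsets.
class LinearMemory {
public:
    explicit LinearMemory(std::size_t bytes) : heap_(bytes, 0) {}

    // Store a 32-bit integer at byte offset `addr` (host byte order).
    void store32(std::uint32_t addr, std::int32_t value) {
        check(addr, sizeof value);
        std::memcpy(heap_.data() + addr, &value, sizeof value);
    }

    // Load a 32-bit integer from byte offset `addr`.
    std::int32_t load32(std::uint32_t addr) const {
        check(addr, sizeof(std::int32_t));
        std::int32_t value;
        std::memcpy(&value, heap_.data() + addr, sizeof value);
        return value;
    }

    std::size_t size() const { return heap_.size(); }

private:
    // Every access is confined to the heap: a wild pointer can clobber
    // other data inside it, but can never touch memory outside it.
    void check(std::uint32_t addr, std::size_t len) const {
        if (static_cast<std::size_t>(addr) + len > heap_.size()) {
            throw std::out_of_range("access outside linear memory");
        }
    }

    std::vector<std::uint8_t> heap_;
};
```

For example, `mem.store32(8, 42)` followed by `mem.load32(8)` returns 42, while a load that runs past `mem.size()` throws instead of reading host memory. That containment is what lets the browser run memory-unsafe C or C++ without trusting it.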
C++ on the web
- just to be clear, this is compiling C++ to run directly in the browser, as JavaScript (or WebAssembly)
- this is separate and distinct from CGI, where a server-side program (typically native code) emits HTML, etc. for the browser to render
My Experience
- So far, not much: environment setup and a few simple "hello world"-adjacent applications
- input to Emscripten is C or C++ source, or already-compiled LLVM bitcode (from any language)
- output is an HTML shell, a program memory file, and the JavaScript file with everything else
- The HTML shell lets you define behavior for things like stdin, stdout, stderr, or other callbacks you might create in your C++ application
- Emscripten includes an OpenGL-to-WebGL shim, which mostly works; calls that are valid in OpenGL but not in WebGL produce clearly worded runtime errors
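One way the HTML shell and the C++ code talk to each other is through exported functions. A minimal sketch, assuming Emscripten's `EMSCRIPTEN_KEEPALIVE` macro from `<emscripten.h>` to keep the symbol from being stripped; the `#ifdef` lets the same file build natively too. The names `on_key_press`, `last_key`, and `key_press_count` are hypothetical. After compiling with `emcc`, JavaScript in the shell could call the function through its exported symbol (e.g. `Module._on_key_press(65)`), or through `ccall`/`cwrap` if those runtime methods are exported in your build.

```cpp
#include <cstdint>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#else
// Native builds: the macro is Emscripten-specific, so make it a no-op.
#define EMSCRIPTEN_KEEPALIVE
#endif

namespace {
// Hypothetical application state updated by JavaScript callbacks.
std::int32_t g_last_key = -1;
std::int32_t g_key_presses = 0;
}  // namespace

// extern "C" avoids C++ name mangling, so JavaScript can find the
// function under a predictable name (_on_key_press).
extern "C" EMSCRIPTEN_KEEPALIVE void on_key_press(std::int32_t key_code) {
    g_last_key = key_code;
    ++g_key_presses;
}

extern "C" EMSCRIPTEN_KEEPALIVE std::int32_t last_key() {
    return g_last_key;
}

extern "C" EMSCRIPTEN_KEEPALIVE std::int32_t key_press_count() {
    return g_key_presses;
}
```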
My Experience
- I was surprised at how a lot of it just worked
- by default (unless you define otherwise), keyboard input via std::getline on std::cin is awkward for terminal programs; I'm still working out more intuitive behavior
- true concurrency doesn't work, but it at least round-robins runtime between threads
- OpenGL really just works
- Porting is a lot easier than I imagined, but I can still see some places where effort may be needed
- e.g. abstracting away OS-specific libraries (Windows events for keyboard, mouse, frame updates, etc.) and implementing new ones specific to Emscripten
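Much of that abstraction work comes down to the main loop and to where input comes from. A browser page cannot sit in an infinite `while` loop, so under Emscripten the per-frame function is typically handed to `emscripten_set_main_loop(func, fps, simulate_infinite_loop)`, which lets the browser schedule it (an fps of 0 means it uses `requestAnimationFrame`). A minimal sketch, assuming a hypothetical `frame()` that does one frame of work and a plain `while` loop as the native fallback; input is read from a `std::istream&` instead of directly from `std::cin`, so the line source can be swapped per platform. `run_main_loop` and `commands` are also hypothetical names.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for scripted input
#include <string>
#include <vector>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

namespace {
// Hypothetical application state for the sketch.
std::istream* g_input = nullptr;      // where input lines come from
std::vector<std::string> g_commands;  // lines consumed so far
bool g_running = true;

// One frame of work: consume at most one line of input per frame
// instead of blocking until the user types something.
void frame() {
    std::string line;
    if (g_input && std::getline(*g_input, line)) {
        if (line == "quit") {
            g_running = false;
        } else {
            g_commands.push_back(line);
        }
    } else {
        g_running = false;  // input exhausted: stop (sketch behavior)
    }
#ifdef __EMSCRIPTEN__
    if (!g_running) emscripten_cancel_main_loop();
#endif
}
}  // namespace

// Platform-specific driver: the browser schedules frames itself,
// while a native build simply loops until the app asks to stop.
void run_main_loop(std::istream& input) {
    g_input = &input;
    g_running = true;
#ifdef __EMSCRIPTEN__
    // fps = 0: let the browser pick the rate (requestAnimationFrame).
    // simulate_infinite_loop = 1: do not return to the caller.
    emscripten_set_main_loop(frame, 0, 1);
#else
    while (g_running) {
        frame();
    }
#endif
}

const std::vector<std::string>& commands() { return g_commands; }
```

The design choice here is that `frame()` never blocks, so the same function works whether the OS loop or the browser drives it; only `run_main_loop` differs between builds.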
Use in the Wild
- QuakeJS: open-source Quake (the ioquake3 engine) recompiled with Emscripten to run in a browser
- BananaBread: the Cube 2 engine recompiled to WebAssembly
- ports of libraries such as Ogg, FFmpeg, zlib, FreeType, and Qt
- full list here: https://github.com/kripken/emscripten/wiki/Porting-Examples-and-Demos
oh also a live demo
"What about WebAssembly?"
- Good question, and one I don't have an in-depth answer for!
- All I know is that Emscripten has an option to compile to WebAssembly instead of asm.js, and WebAssembly strikes me as less horrific than using JavaScript as a compile target, because being a compile target is exactly the use case WebAssembly was designed for.
By tdhoward
nowhere is safe from C++