C++ as Assembly 2.0 - Hello Nim

by Viktor Kirilov

subtitle: every talk I make is a lightning talk
and this one is going to be intense - sorry :|

Me, myself and I

  • my name is Viktor Kirilov - from Bulgaria
  • used to be into game development, currently into databases
  • creator of doctest - the fastest C++ testing framework
  • implemented hot code reloading in the Nim compiler
    • still needs a bit more work :|
  • I write C++ for a living but I dream of Nim

Nim

  • Started in 2005 by Andreas Rumpf (still the main dev)
  • strongly typed, compiled, multi paradigm, indentation based
  • some of the most powerful metaprogramming
  • "Speed of C, elegance of Python, power of Lisp/Perl"
  • "Nim is to C/C++ as CoffeeScript is to JavaScript"
  • aims to dominate in any space
    • compiled to C / C++ / JavaScript / Objective-C
    • can run on any platform (including the browser)
    • generates high performance code and standalone binaries

Hello

echo "Hello World" # code
nim c hello.nim # compile
./hello         # run
Hello World     # stdout

Some code...

import strformat # can also import specific parts and give aliases

type
  Person = object
    name: string
    age: Natural # a range type (0..high(int)): the age can never be negative

# Types in Nim: arrays, sequences, tuples, sets & objects
let people = [
  Person(name: "John", age: 45),
  Person(name: "Kate", age: 30)
]

# Proc means function
proc doWork() =
  # Indentation based - like Python
  for person in people:
    # Type-safe string interpolation: the format string is parsed at compile time
    echo(fmt"{person.name} is {person.age} years old")

doWork() # and we get some output

Discriminated Unions - "Variants"

# This is an example how an abstract syntax tree could be modelled in Nim
type
  NodeKind = enum # the different node types
    Int,    # a leaf with an integer value
    String, # a leaf with a string value
    Add,    # an addition
    Sub,    # a subtraction
    If      # an if statement

  Node = ref object
    case kind: NodeKind # the ``kind`` field is the discriminator
    of Int: int_val: int
    of String: str_val: string
    of Add, Sub: # '+' or '-'
      left, right: Node
    of If:
      condition, then_part, else_part: Node

var n = Node(kind: Int, int_val: 42)

if n.kind notin {Add, Sub, If}: # easy work with sets/flags/enumerations
  echo "it's a value!"

# raises a `FieldDefect` (named `FieldError` in older Nim versions) at run time, because n.kind is Int and not String
n.str_val = ""
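A variant type pairs naturally with a `case` on its discriminator, because each branch can only touch the fields that exist for that kind. Below is a minimal sketch of an evaluator for the AST above. It re-declares the types with camelCase field names so the block stands alone. Making `If` treat any non-zero condition as true is an assumption of this example.

```nim
type
  NodeKind = enum
    Int, String, Add, Sub, If

  Node = ref object
    case kind: NodeKind
    of Int: intVal: int
    of String: strVal: string
    of Add, Sub:
      left, right: Node
    of If:
      condition, thenPart, elsePart: Node

proc eval(n: Node): int =
  ## each branch only reads the fields that exist for that node kind
  case n.kind
  of Int: n.intVal
  of String: raise newException(ValueError, "cannot evaluate a string as an int")
  of Add: eval(n.left) + eval(n.right)
  of Sub: eval(n.left) - eval(n.right)
  of If:
    if eval(n.condition) != 0: eval(n.thenPart) else: eval(n.elsePart)

# (10 + 32) - 2
let tree = Node(kind: Sub,
  left: Node(kind: Add,
    left: Node(kind: Int, intVal: 10),
    right: Node(kind: Int, intVal: 32)),
  right: Node(kind: Int, intVal: 2))

echo eval(tree) # 40
```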

OOP in Nim

type
  Note* = ref object of RootObj
    text: string
  TaskNote* = ref object of Note
    completed: bool

method render*(note: Note): string {.base.} = # the object is the first argument
  return note.text

method render*(note: TaskNote): string =
  case note.completed
  of true: return "☑ " & note.text
  of false: return "☐ " & note.text

var baseType: Note = TaskNote(text: "Do me soon!", completed: false)
echo baseType.render() # polymorphic call - outputs "☐ Do me soon!"
  • object as first argument & uniform call syntax: f(obj) == obj.f()
  • encapsulation through modules (exports are explicit - with *)
  • subtyping via single inheritance
  • method (not proc): dynamic binding (polymorphism)
  • for dynamic dispatch on multiple parameters: --multimethods:on
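A small sketch of static vs dynamic binding and uniform call syntax. The `Shape`/`Square` types and the `describe`/`area` routines are made up for illustration.

```nim
type
  Shape = ref object of RootObj
  Square = ref object of Shape
    side: float

# procs are bound statically, using the declared type of the argument
proc describe(s: Shape): string = "some shape"
proc describe(s: Square): string = "a square"

# methods are bound dynamically, using the run-time type of the object
method area(s: Shape): float {.base.} = 0.0
method area(s: Square): float = s.side * s.side

let s: Shape = Square(side: 3)
echo describe(s)        # some shape (the static type of `s` is Shape)
echo s.area()           # 9.0 (the dynamic type of `s` is Square)
echo area(s) == s.area  # true: these spellings are the same call
```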

Effect system - transitive

proc writeToConsole() =
  echo "is IO a side effect?"

var glob = 5
proc touchGlobal() =
  glob = 6

proc impossible() {.noSideEffect.} = # checked transitively, through every proc it calls
  writeToConsole() # this won't compile because of the "echo"
  touchGlobal() # this won't compile either - because of the global var access

proc complex() {.raises: [IOError, ArithmeticError].} = # only these can be thrown
  discard # ...
proc simple() {.raises: [].} = # no exceptions can pass through
  discard # ...
  • tracks side effects (e.g. global variable access, IO)
  • tracks exceptions
  • tracks tags - "ReadIOEffect", "WriteIOEffect", "ExecIOEffect"
  • tracks locking levels ==> deadlock prevention at compile-time
  • tracks "GC safety" (no side effects ==> GC safe)
  • user extensible
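Here is a sketch of effects that do compile. `func` is shorthand for `proc ... {.noSideEffect.}`. The `{.raises: [].}` annotation holds only because the proc handles the `ValueError` that `parseInt` can raise. `double` and `parseOrZero` are hypothetical names.

```nim
import std/strutils

# `func` = `proc ... {.noSideEffect.}`: no IO, no global state
func double(x: int): int = 2 * x

# parseInt can raise ValueError; handling it here is what lets
# the empty `raises` list type-check
proc parseOrZero(s: string): int {.raises: [].} =
  try:
    result = parseInt(s)
  except ValueError:
    result = 0

echo double(21)           # 42
echo parseOrZero("17")    # 17
echo parseOrZero("oops")  # 0
```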

Distinct types - "strong typedefs"

# example defining a currency

type
  # same representation as float, but none of float's operations
  Dollars = distinct float

proc `+`(a, b: Dollars): Dollars {.borrow.}

var a = 20.Dollars

a = 25  # Doesn't compile
a = 25.Dollars  # Works fine

a = 20.Dollars * 20.Dollars # Doesn't compile
a = 20.Dollars + 20.Dollars # Works fine
  • This would be a ton of (slow-to-compile) code in C++
  • No more NASA failures costing hundreds of millions of dollars because of unit mix-ups (Mars Climate Orbiter)
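This extends the `Dollars` example a bit: borrow only the operations that make sense and write the rest by hand. Scaling by a plain `float` and the `$` formatting below are illustrative choices, not part of the original slide.

```nim
import std/strformat

type Dollars = distinct float

# borrow selected operations from float, one at a time
proc `+`(a, b: Dollars): Dollars {.borrow.}
proc `<`(a, b: Dollars): bool {.borrow.}
proc `==`(a, b: Dollars): bool {.borrow.}

# scaling by a plain number is meaningful, so define it explicitly;
# Dollars * Dollars stays a compile error
proc `*`(a: Dollars, factor: float): Dollars = Dollars(float(a) * factor)

# how Dollars are printed (echo calls `$`)
proc `$`(d: Dollars): string = fmt"${float(d):.2f}"

let price = 19.99.Dollars
let total = price * 3.0 + 5.0.Dollars
echo total                    # $64.97
echo price < total            # true
echo compiles(price * price)  # false
```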

Meta-programming

  • when a program reads, generates, analyzes or transforms other programs
  • why do it
    • higher levels of abstraction and thinking
    • can enforce better coding patterns
    • can optimize code – by compile-time rewrites
      • think expression templates in C++
    • can increase code readability and maintainability
      • with great power comes great responsibility

What is an AST (Abstract Syntax Tree)

while b ≠ 0
  if a > b
    a := a − b
  else
    b := b − a
return a

Meta-programming in Nim

  • levels of complexity in Nim:
    • normal procs and inline iterators
    • generic procs and closure iterators
    • templates << expand & replace AST nodes
    • macros << can manipulate the AST in any way
  • respects the type system
    • unlike template engines & other mechanisms
      • *cough* C preprocessor *cough*
  • NimVM - responsible for:
    • expansion & execution of templates & macros
    • compile-time evaluation of code/expressions/constants

Templates

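# std/locks ships its own withLock, so import only what this example needs
from std/locks import Lock, initLock, acquire, release

template withLock(lock: Lock, body: untyped) =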
  acquire lock
  try:
    body # <<< this is where the 'block' of code will go
  finally:
    release lock
var ourLock: Lock # create a lock
initLock ourLock  # init the lock

withLock ourLock: # here we use the template and pass a code block
  echo "Do something that requires locking"
  echo "This might throw an exception"
var ourLock: Lock # unchanged: this line is outside the template
initLock ourLock  # unchanged: this line is outside the template

acquire ourLock # the result after the substitution - no "template call"
try:
  echo "Do something that requires locking"
  echo "This might throw an exception"
finally:
  release ourLock

Templates 2

template withFile(f_var: untyped,   # name of the file variable
                  filename: string, # file to open
                  mode: FileMode,   # file open mode
                  body: untyped) =  # the block of code
  let fn = filename # to prevent double evaluation of 'filename'
  var f_var: File
  if open(f_var, fn, mode): # first use of 'fn'
    try:
      body # <<< this is where the 'block' of code will go
    finally:
      close(f_var)
  else:
    quit("cannot open: " & fn) # second use of 'fn'

withFile(txt, "ttempl3.txt", fmWrite): # f_var will be 'txt'
  txt.writeLine("line 1")
  txt.writeLine("line 2")
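This is why `withFile` copies `filename` into `fn` first. Template arguments are pasted in as code, not passed as values, so an argument used twice is evaluated twice. A small sketch (`nextName`, `twiceNaive` and `twiceSafe` are hypothetical):

```nim
var calls = 0

proc nextName(): string =
  inc calls
  "file" & $calls & ".txt"

# `x` is substituted as an expression, so it is evaluated at every use
template twiceNaive(x: string): string = x & x

# binding the argument to a local first evaluates it exactly once
template twiceSafe(x: string): string =
  let tmp = x
  tmp & tmp

echo twiceNaive(nextName())  # file1.txtfile2.txt
echo calls                   # 2
echo twiceSafe(nextName())   # file3.txtfile3.txt
echo calls                   # 3
```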

Templates - pattern matching constructs

# match multiplication of integers and the literal '2'
template optimMultiply{`*`(a, 2)}(a: int): int = a + a

let x = 3
# will use addition instead of multiplication - not
# really useful since the C/C++ optimizer would handle
# this specific case, but this showcases the power of Nim
echo x * 2

# not going to match this because it's '3' and not '2'
echo x * 3
# this definition exists in the System module
template `!=` (a, b: untyped): untyped =
  not (a == b)

assert(5 != 6) # the compiler rewrites that to: assert(not (5 == 6))

Dumping the AST of a code block

import macros # 'macros' module for working with the AST

# inspect the AST hierarchy of any block of code
dumpTree:
  var mt: MyType = MyType(a:123.456, b:"abcdef")
# the output from dumpTree
StmtList               # list of statements
  VarSection           # var statement
    IdentDefs          # identifier definition
      Ident "mt"       # name of var - an identifier
      Ident "MyType"   # type of var - an identifier
      ObjConstr        # constructor
        Ident "MyType" # identifier to call for construction
        ExprColonExpr  # ... you get the picture
          Ident "a"
          FloatLit 123.456
        ExprColonExpr
          Ident "b"
          StrLit "abcdef"

Macros - the explicit interface (low-level)

import macros, strutils # 'macros' module for manipulating the AST

macro toEnum(words: static[string]): untyped =
  # in Nim procs and macros have an implicit "result" variable
  # here we create a new enum type node (nnkEnumTy); its first child is always empty
  result = newTree(nnkEnumTy, newEmptyNode())

  # we split the words string on whitespace and iterate over them
  for w in splitWhitespace(words):
    # and we add identifier nodes as children to the enumeration
    result.add ident(w)

type
  Color = toEnum"Red Green Blue Indigo"

# Indigo is a valid identifier after the NimVM
# has gone through the call to "toEnum"
var color = Indigo

Getting the code to construct a code block

import macros # 'macros' module for working with the AST
dumpAstGen: # this will dump Nim code to create the code we have passed it
  proc hello() =
    echo "hi"
nnkStmtList.newTree( # the procedure definition statement
  nnkProcDef.newTree( # the procedure definition
    newIdentNode("hello"), # identifier - the name
    newEmptyNode(), # no term-rewriting pattern
    newEmptyNode(), # no generic parameters
    nnkFormalParams.newTree( # no parameters
      newEmptyNode()
    ),
    newEmptyNode(), # empty (for pragmas, etc.) - our 'hello' proc is too simple
    newEmptyNode(), # reserved
    nnkStmtList.newTree( # the list of statements in the proc
      nnkCommand.newTree( # a function call
        newIdentNode("echo"), # we call 'echo'
        newLit("hi") # and we pass it 'hi'
      )
    )
  )
)

Executing the output from dumpAstGen

import macros # 'macros' module for working with the AST

macro generate_hello(): untyped =
  result = nnkStmtList.newTree( # the output from dumpAstGen (from last slide)
    nnkProcDef.newTree(
      newIdentNode("hello"),
      newEmptyNode(),
      newEmptyNode(),
      nnkFormalParams.newTree(
        newEmptyNode()
      ),
      newEmptyNode(),
      newEmptyNode(),
      nnkStmtList.newTree(
        nnkCommand.newTree(
          newIdentNode("echo"),
          newLit("hi")
        )
      )
    )
  )

generate_hello() # create the hello() proc from the last slide
hello() # hello() now exists!!! and when called will print "hi"
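Building every node by hand is rarely needed, because `std/macros` also has higher-level constructors. Here is a sketch of the same idea using `newProc`. The `makeGreeter` macro and the `greet` proc it generates are hypothetical names.

```nim
import std/macros

# generates: proc <name>(): string = <msg>
macro makeGreeter(name, msg: static[string]): untyped =
  result = newProc(
    name = ident(name),
    params = [ident("string")],     # params[0] is the return type; no arguments
    body = newStmtList(newLit(msg)))

makeGreeter("greet", "hi")
echo greet()  # hi
```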

"quote do:" expressions - less verbose

import macros # 'macros' module for manipulating the AST

type
  MyType = object # some random type with 2 fields
    a: float
    b: string

macro myMacro(arg: untyped): untyped =
  var mt: MyType = MyType(a:123.456, b:"abcdef") # an arbitrary value
  
  let mtLit = newLit(mt) # convert the value into a NimNode tree
  
  # here we put literally the code we want
  # we inject NimNode symbols with backticks
  result = quote do:
    echo `arg`
    echo `mtLit`

myMacro("Hallo") # call the bad boy

# The call to myMacro will generate the following code:
echo "Hallo"
echo MyType(a: 123.456'f64, b: "abcdef")
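A slightly more practical `quote do:` sketch: a hypothetical `show` macro pairs an expression's source text with its value. The text is captured at compile time via `repr`, and the value is computed at run time. This is handy for debug output.

```nim
import std/macros

macro show(expr: untyped): untyped =
  let text = newLit(expr.repr)   # the argument's source text, as a string literal
  result = quote do:
    `text` & " = " & $(`expr`)   # the argument itself is evaluated at run time

let a = 6
let b = 7
echo show(a * b)  # a * b = 42
```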

HTML DSL (Domain Specific Language)

import html_dsl

# no need for third-party
# templating engines

html page:
  head:
    title("Title")
  body:
    p("Hello")
    p("World")
    dv:
      # we can mix code with
      # the HTML view
      for i in 0..2: # inclusive range: three paragraphs
        p "Example"

echo render(page())
<!DOCTYPE html>
  <html class='has-navbar-fixed-top'>
  <head>
    <meta charset="utf-8">
    <meta name="viewport"
          content="width=device-width">
    <title>Title</title>
  </head>
  <body class='has-navbar-fixed-top'>
    <p >Hello</p>
    <p >World</p>
    <div>
      <p>Example</p>
      <p>Example</p>
      <p>Example</p>
    </div>
  </body>
</html>

Protobuf implementation: no external tool

import protobuf # module can be fetched from the package manager

# Define our Protobuf specification
const protoSpec = """syntax = "proto3";
message ExampleMessage {
  int32 number = 1;
  string text = 2;
  SubMessage nested = 3;
  message SubMessage {
    int32 a_field = 1;
  }
}
"""
# Generate Nim code to use it - invoking a full Protobuf parser
parseProto(protoSpec)

# Create messages using the already constructed Nim types
var msg = new ExampleMessage
msg.number = 10
msg.text = "Hello world"
# We even have helper functions for working with sub-messages
msg.nested = initExampleMessage_SubMessage(aField = 100)

What meta-programming gives us

  • higher level zero-cost abstractions
    • can help enforce patterns
    • can help readability & maintainability
  • can create a DSL (Domain Specific Language)
    • as seen with the HTML example
    • GUI creation & bindings
  • writing serialization & deserialization functions is a thing of the past
    • iterate over the fields of types at compile time - never forget anything!
  • no need for third-party code generators and template engines
  • Nim's high-performance asynchronous IO framework (async/await) is built with macros
  • https://hookrace.net/blog/introduction-to-metaprogramming-in-nim/
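A sketch of the serialization point: `fieldPairs` (from `system`) is unrolled at compile time over an object's fields. One generic proc therefore handles any plain object, and adding a field updates the output automatically. The `Config` type and the `key=value;...` format are made up for illustration.

```nim
import std/strutils

type
  Config = object
    name: string
    port: int
    verbose: bool

proc toKeyValue[T: object](obj: T): string =
  ## the loop body is instantiated once per field, at compile time
  var parts: seq[string]
  for key, value in obj.fieldPairs:
    parts.add key & "=" & $value
  parts.join(";")

echo toKeyValue(Config(name: "db", port: 5432, verbose: true))
# name=db;port=5432;verbose=true
```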

Feature dump (Incomplete)

  • function call parens are optional - echo("hello") OR echo "hello"
  • destructors (`=destroy` hooks)
  • generics - HashSet[string] - can work with other types
  • concepts - constraints for generics
  • pattern matching
  • converters for implicit conversions - need to be explicitly written
  • extensible pragmas
  • defer - basically for "ON_SCOPE_EXIT" cleanup code
  • exceptions (reuses the C++ machinery when compiled to it - can interop!)
  • explicit "discard" keyword for unused results of functions
  • named and default arguments
  • statements are expressions
  • iterators & closures (generators / coroutines / resumable functions)
  • functions are first-class citizens (facilitating functional programming)
  • package manager (Nimble) - decentralized, based on Git & Mercurial
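Here are a few of these features in one place (all names are hypothetical): named & default arguments, an inline iterator, `defer`, `if` as an expression and explicit `discard`.

```nim
# named & default arguments
proc greet(name: string, greeting = "Hello", punct = "!"): string =
  greeting & ", " & name & punct

# an inline iterator: `yield` suspends, the next loop iteration resumes here
iterator countDownFrom(start: int): int =
  var i = start
  while i > 0:
    yield i
    dec i

# defer: runs when the enclosing scope exits (also on exceptions)
proc useResource(log: var seq[string]) =
  log.add "open"
  defer: log.add "close"
  log.add "work"

var log: seq[string]
useResource(log)
echo log                        # @["open", "work", "close"]
echo greet("Nim", punct = "?")  # Hello, Nim?
let parity = if 7 mod 2 == 0: "even" else: "odd"  # `if` as an expression
echo parity                     # odd
discard greet("unused")         # an unused result must be discarded explicitly
for n in countDownFrom(3):
  echo n                        # 3, then 2, then 1
```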

Nim compilation & runtime model

  • nim c -d:release main.nim
    • always compile only the main file, follow the imports
    • whole program analysis
    • a .c file for each .nim file in a "nimcache" (temp) folder (also .obj files)
    • only referenced (imported) modules & symbols are compiled in the end
  • Startup: a DFS traversal, Output: "foo" => "BAR!" => "main"
  • entire project is always "compiled" by Nim (currently no "minimal" rebuild)
    • ~4-5 sec for the entire source of Nim - 150+ files (without the C compiler)
    • the C/C++ compiler rebuilds only changed files (takes a bit more time)
    • "incremental recompilation" (IC) is the next big thing for Nim
# main.nim
import foo

# there is no main() function
echo from_foo()
echo "main" # global scope code
# foo.nim
import bar

proc from_foo*(): string =
  return from_bar

echo "foo" # global scope code
# bar.nim
let local = "BAR!"

let from_bar* = local # '*' means 'exported'

Nim procs to C/C++ (simplified)

# foo.nim
proc foo() =
  echo "hello"
foo()
// == includes section
#include <nimbase.h> // nimbase is always present
// == type definitions section
struct TGenericSeq { int len; int reserved; };
struct NimStringDesc : public TGenericSeq { ... };
// == constants & globals section
STRING_LITERAL(the_string_literal, "hello", 5);
NimStringDesc the_string_constant(the_string_literal);
// == forward declarations section
void foo_iineYNh8S9cE6Ry7dr2Tz2A(); // mangled name
// == definitions section
void foo_iineYNh8S9cE6Ry7dr2Tz2A() {
    echoBinSafe(the_string_constant, 1); // the echo call
}
// == init section
void init_module_foo() {
    foo_iineYNh8S9cE6Ry7dr2Tz2A(); // << call
}
// == other sections - omitted for simplicity
// ...

Nim types to C/C++ (simplified)

type
  MyData = object
    answer: int
    ready: bool
proc newData(): MyData = return MyData(answer: 42, ready: true)
echo newData().answer
// == type definitions section
struct MyData {
    int answer;
    bool ready;
};
// == definitions section
MyData newData() {
    MyData result; // always an implicit "result"
    result.answer = ((int) 42);
    result.ready = true;
    return result;
}
// == init section
void init_module_foo() {
    MyData T2_;
    T2_ = newData(); // << call the construction
    echoBinSafe(T2_.answer, 1); // the echo call
}

Nim compilation to C/C++: a BIG win

  • smaller scope for the compiler
  • all the cutting-edge optimization from C/C++ compilers for free
  • out-of-the-box support for tons of platforms
  • easiest C/C++ interop possible
  • exceptions - reusing those of C++ when using that backend
  • nim to C/C++ code mapping with #line directives for debuggers
  • no generated headers for the exported parts of modules
  • each .c/.cpp file contains everything (and only what) it needs
    • forward declarations for external functions
    • type definitions
  • each .c/.cpp file includes nimbase.h and a few C stdlib headers
  • high level macros & templates => simple structs and functions

Interfacing with C/C++

proc printf(formatstr: cstring)
    {.header: "<stdio.h>", importc: "printf", varargs.}
{.emit: """
using namespace core;
""".} # emits C++ code! can also do inline assembly

{.compile: "logic.c".} # compile & link this .c file

other pragmas - for use in Nim:

We can also call Nim code from C/C++:

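# fib.nim
proc fib(a: cint): cint {.exportc.} = # exportc: do not mangle the name
  result = if a <= 2: 1.cint else: fib(a - 1) + fib(a - 2)
nim c --noMain --noLinking --header:fib.h fib.nim
// user.c
#include <fib.h> // generated header: declares fib() and NimMain()
int main(void) {
  NimMain(); // initialize the Nim runtime before calling any Nim code
  return fib(10) == 55 ? 0 : 1;
}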
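The same `importc`/`header` pragmas bind any C function. Here is a minimal sketch that binds libc's `strlen` next to the `printf` binding from the slide. `libcStrlen` is just the Nim-side name we picked.

```nim
# bind C's strlen under a Nim name of our choosing
proc libcStrlen(s: cstring): csize_t {.importc: "strlen", header: "<string.h>".}

# the printf binding from the slide; `varargs` maps to C's `...`
proc printf(formatstr: cstring) {.header: "<stdio.h>", importc: "printf", varargs.}

let n = libcStrlen("hello")  # a string literal converts to cstring implicitly
printf("strlen(\"hello\") = %d\n", cint(n))  # strlen("hello") = 5
```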

Interfacing with C/C++

  • Anything possible in C++ is possible in Nim
  • The best possible C/C++ interoperability out of any language
  • c2nim tool - translates C/C++ headers into Nim bindings
  • nimterop tool - based on tree-sitter
type # here we define a generic type which maps directly to std::map
  StdMap {.importcpp: "std::map", header: "<map>".} [K, V] = object

proc `[]=`[K, V](this: var StdMap[K, V]; key: K; val: V) {.
  importcpp: "#[#] = #", header: "<map>".} # we import the []= operator (assignment via operator[])

var x: StdMap[cint, cdouble] # and we use it directly...
x[6] = 91.4
std::map<int, double> x;
x[6] = 91.4; // no binding layer - C++ types/functions are called directly

C++ template constructs

Generated C++

Nim has a GC  (⊙_☉)  =>  (┛ಠ_ಠ)┛彡┻━┻

  • Different garbage collectors to choose from
    • Soft real-time deferred RC'ing GC, Mark and Sweep, Boehm, even Go!
  • We have a lot of control - unlike in other languages
    • when the GC should run
    • soft real-time guarantees - for how long it is allowed to run
  • each thread has its own GC - they don't wait for each other!
  • GC can be avoided, and can be turned off even now!
    • value-based types, destructors, stack & manual heap allocation
      • type Foo = object    => on the stack, no GC ever.
      • type Bar = ptr object    => equivalent raw C pointer, no GC ever.
    • stdlib will eventually be free of GC use
    • type Baz = ref object    => GC-managed object
      • deferred reference counting (only if it survives its creation scope)
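A small sketch of the three kinds of memory listed above. `Vec` and `VecRef` are illustrative names. `create` and `dealloc` are the `system` procs for untraced heap memory.

```nim
type
  Vec = object        # value type: stored inline (on the stack here), never GC'ed
    x, y: float
  VecRef = ref Vec    # traced reference: freed automatically by the memory manager

let onStack = Vec(x: 1, y: 2)     # no heap allocation
let managed = VecRef(x: 3, y: 4)  # heap-allocated, GC/RC-managed
let manual = create(Vec)          # zeroed heap memory of type `ptr Vec`, untraced
manual.x = 5                      # ptr field access auto-dereferences, like ref
manual.y = 6
let total = onStack.x + managed.x + manual.x
dealloc(manual)                   # freeing untraced memory is our job, exactly once
echo total                        # 9.0
```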

Nim's GC - time & memory use

Runtime - small binaries

Nim backends - even the web!

  • C code can also be compiled to WASM or asm.js and run in the browser!
    • in addition to the normal JavaScript compilation
  • True isomorphic programming - 1 language for backend & frontend
  • cross compilation: ship .c files with a compile.sh script to any platform

The web

import dom # only for the javascript backend

proc onLoad(event: Event) =
  let p = document.createElement("p")
  p.innerHTML = "Click me!"
  p.style.fontFamily = "Helvetica"
  p.style.color = "red"

  p.addEventListener("click",
    proc (event: Event) =
      window.alert("Hello World!")
  )

  document.body.appendChild(p)

window.onload = onLoad

Jester - a Sinatra-like web framework

import htmlgen, jester, re # html, server & regular expressions

routes:
  get "/hello/@name?":
    # This matches "/hello/fred" and "/hello/bob" => ``@"name"`` will
    # be either "fred" or "bob". It will also match "/hello/".
    if @"name" == "":
      resp "No name received :("
    else:
      resp "Hello " & @"name"
  
  # Matches URLs of the form /15.html => request.matches[0] will be 15.
  get re"^\/([0-9]{2})\.html$":
    resp request.matches[0]
  
  # A greeting for the root
  get "/":
    resp h1("Hello world")

On C++...

  • C++ is a dinosaur - both in terms of age and size
    • have you seen the spec?
  • not Expert-friendly, merely Expert-tolerable
  • easy to shoot yourself in the foot
  • compile & link times are huge
  • no standard package manager & build system
  • multiple languages - the normal one, preprocessor, TMP
    • oh... the preprocessor...
    • and oh... the TMP... the SFINAE... the tricks... the errors... the horror...
  • no static reflection in 2019? really?
  • It takes more than 5k LOC to implement <optional> - a value and a bool...
    • think about that... a value T and a bool...
    • and what about variant (discriminated union)? or safe_int? or units?

More on C++...

  • Compilation model is based on compiling source files in isolation
  • No whole-program view of things without LTO
    • But LTO is only for optimization - what about cross-TU analysis on the original language and not on the low-level LLVM IR? So much context gets lost...
    • The lifetime analysis effort from a few years ago?
      • gradually being dumbed down - it's too hard - will never get to "Rust level"
    • How about enforcing a transitive effect system or checking exception specifications at compile time? Forget about it...
  • Zero-cost abstractions? They ain't free - we pay at build time...
    • Reparsing the same headers - up to the tens of megabytes per translation unit
      • and we reinstantiate the same templates...
      • and the linker has to remove all duplicated weak (inline) symbols...
    • C++20 Ranges: 20 line example - 3 seconds of compile time, Debug build really slow
    • Sometimes we don't entirely remove the abstractions even in Release builds...

My "favourite" aspect of C++

Clean slate - the C++ successor is here

  • Big reasons it's hard to switch away from C++:
    • legacy and maturity - too much software written already - big investment
    • inertia - attachment and lack of interest in learning new languages
  • C++ isn't breaking backwards compatibility anytime soon (which is good IMHO)
    • there is a proposal to introduce "C++ epochs", similar to Rust's editions - I doubt it will happen...
  • C++ is a HUGE time/money cost on the scale of hundreds of millions
    • developer productivity (build times in the hours), bugs & safety
      • In C++ we are constantly solving unimportant problems
        • headers/sources, forward declarations, thinking how not to include X
      • dealing with absurdly complicated TMP
      • 5k LOC for optional - a value T and a boolean !!!
  • C++ is a great and valuable ongoing research - let's learn from it and iterate
  • I know about modules, concepts, constexpr & the reflection TS... not good enough!
  • Nim can reuse all the C/C++ software and is wildly superior...

Rust & Nim

  • shortcomings of Rust
    • because of the borrowing & lifetime checks
      • hard to learn
      • slow to compile & work with
    • meta-programming nowhere close to Nim
    • {} braces? really? :D (nitpicking...)
  • Ownership & borrowing semantics can be done in Nim too!
    • Even without them Nim is much safer than C/C++
  • Cross-language LTO between C++ & Rust - congrats
    • But that comes out-of-the-box when using Nim!
  • Microsoft are looking into Rust - look at Nim as well !!!

Why programming languages matter

  • I recently realized how connected thinking and languages are
    • And that translates to programming as well...
  • Human minds can only handle a finite amount of complexity, so how sophisticated a software system can get depends on how efficiently this complexity budget is used.
    • Browse the source of the Apollo 11 mission - we've gone pretty far...
    • Some people say the amount of bugs per 1k LOC is constant
      • don't quote me on this - I read it on the internet !!!
  • Better abstractions don't necessarily degrade performance
  • Programming languages play a fundamental part in the fastest technological change of our civilization - let's pay respect to that fact
  • Java had a $500 million marketing campaign in 2003 alone...
    • Python rose to the top 10 "slowly and steadily"

Demo - compiling Nim

Nim is written in Nim

 

~150 .nim files => 24 sec

(4-5 sec Nim, 20 sec C compilation)

Popularity based on GitHub stars - Rust

Rust is just being Rust :|

Popularity based on GitHub stars - Nim

A bit more curved than Rust - same 9 year span - more momentum!

https://starcharts.herokuapp.com/nim-lang/Nim

Final thoughts

  • We have only scratched the surface
  • Nim has the potential to be a major programming language
  • Version 1.0 is out - a "promise" of stability
  • Small standalone & performant binaries without a VM
  • Can easily leverage existing C/C++/JavaScript code
  • Systems programming, embedded, AAA games, enterprise, web
    • GC can be avoided, Rust-like safety can be added
  • Currently backed by Status - one of the top 100 cryptocurrencies
    • they are developing an Ethereum 2.0 client in Nim
  • Many cool things incoming
    • "incremental compilation" (module cache), REPL, and more
    • Join the community! You can have a big impact

Final Final thoughts

  • Speed of C
  • Elegance of Python
  • Power of Lisp/Perl

Can you understand my bias now? :)

guilty of not talking about functional languages :|

Q&A
