An intro to the Unison language and compilation via partial evaluation

Paul Chiusano

@pchiusano

@unisonweb

Arya Irani

@aryairani

Part 1: an intro to Unison

Part 2: compilation via partial evaluation

Unison: motivation

Docker

Kubernetes

Terraform

Kafka

DynamoDB

S3

EC2

ElasticSearch

Kibana

Prometheus

Grafana

PagerDuty

etcd

ELB

Route 53

Consul

systemd

Flannel

Weave

Lambda

App Engine

rkt

CoreOS

Zookeeper

Redis

memcached

Protobufs

Thrift

Envoy

Mesos

Nomad

ASGs

←JSON→

Scala

Scala

"Just set up an ASG connected to ELB"

Chef

Puppet

🙁

Tons of work

🙁

Lots of unguessable + arbitrary knowledge

😀 👍

"OMG Wavelet Trees are amazing!!"

😀 👍

"Whoa! These 20 algorithms can be replaced with a few star semiring generic functions!"

🙁

"Configure your ELB to point to your ASG"

program any pool of distributed compute resources ...

... like it's a single computer

Scala

Unison

Unison

Functional language, open source

Lots of R&D

Working toward 1st release

Scala-based runtime

Emphasis: distributed systems

Language basics

Scala

f(x, y + 1)

Unison

f x (y + 1)

Scala

def factorial(n: Int): Int =
  Stream.range(1, n + 1).foldLeft(1)(_ * _)

Unison

factorial : Number -> Number
factorial n =
  Stream.fold-left (*) 1 (Stream.range 1 (n + 1))

factorial-at : Node -> Number -> Remote Number
factorial-at alice n =
  at alice (Remote.pure { factorial n })

-- assuming alice : Node, bob : Node
example : Remote Number
example = do Remote
  a := factorial-at alice 3
  b := factorial-at bob 7
  pure { a + b }

example : Remote Node -> Remote Number
example provision = do Remote
  alice := provision
  bob := provision
  a := factorial-at alice 3
  b := factorial-at bob 7
  pure { a + b }

map-reduce : (a -> Remote b) -> (b -> b -> b) -> b -> Vector a -> Remote b
map-reduce f g z vs = do Remote
  bfs := Remote.traverse (a -> Remote.start (f a)) vs
  Vector.balanced-reduce (Remote.parallel-map2 g) z bfs

word-count : Text -> Number
word-count txt = ...

distributed-word-count : Remote Node -> Vector Text -> Remote Number
distributed-word-count provision docs =
  map-reduce
    (doc -> do Remote { n := provision; Remote.at' n { word-count doc }} )
    (+)
    0
    docs
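
For readers more at home in Scala, here is a rough single-machine analogue of the map-reduce above. It is only a sketch under stated assumptions: `Future` stands in for `Remote`, nothing is actually sent to another node, and the names `MapReduceSketch`, `balancedReduce`, and `block` are made up for illustration rather than taken from Unison.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object MapReduceSketch {
  // Combine elements pairwise in a balanced tree, so the depth of the
  // reduction is logarithmic in the number of elements.
  def balancedReduce[A](vs: Vector[A], z: A)(g: (A, A) => A): A =
    if (vs.isEmpty) z
    else if (vs.size == 1) vs.head
    else {
      val (l, r) = vs.splitAt(vs.size / 2)
      g(balancedReduce(l, z)(g), balancedReduce(r, z)(g))
    }

  // Start all the `f` computations first, then combine their results.
  // Assumption: Future plays the role of Remote here.
  def mapReduce[A, B](f: A => Future[B], g: (B, B) => B, z: B, vs: Vector[A]): Future[B] = {
    val started: Vector[Future[B]] = vs.map(f)
    balancedReduce[Future[B]](started, Future.successful(z))((x, y) => x.zipWith(y)(g))
  }

  // Count whitespace-separated words; the empty string has zero words.
  def wordCount(txt: String): Int = txt.split("\\s+").count(_.nonEmpty)

  def distributedWordCount(docs: Vector[String]): Future[Int] =
    mapReduce[String, Int](doc => Future(wordCount(doc)), _ + _, 0, docs)

  // Hypothetical helper: block until a result is available (for demos only).
  def block[A](fa: Future[A]): A = Await.result(fa, 10.seconds)
}
```

Calling `MapReduceSketch.block(MapReduceSketch.distributedWordCount(docs))` runs the whole pipeline on the local thread pool. The Unison version differs in the important way: each `f` runs on a freshly provisioned node.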

Distributed map-reduce

Sounds great! but...

at someNode hugeComputation

need: runtime deployment + compilation

LLVM, JVM bytecode? custom JIT?

simplest thing that can possibly work:

send ASTs around and interpret

But isn't that slow?

It doesn't have to be

Part 2: compilation via partial evaluation

Big idea: partially evaluating ("specializing") an interpreter for a program IS compilation

Less known: can exploit to build "JIT for free" using plain ol' Scala / <your-lang> code
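
A tiny warm-up before the interpreter example: the classic `power` function, specialized on its first argument. This is a minimal sketch, not part of the original deck; `PowerSketch`, `power`, `powerSpecialized`, and `cube` are illustrative names. The point is that every decision depending on `n` is made once, when the closure is built, and not each time it is applied.

```scala
object PowerSketch {
  // Ordinary version: inspects `n` again on every call.
  def power(n: Int, x: Double): Double =
    if (n == 0) 1.0 else x * power(n - 1, x)

  // Specialized version: all tests on `n` happen while building the closure.
  // The function that comes back only multiplies.
  def powerSpecialized(n: Int): Double => Double =
    if (n == 0) { (x: Double) => 1.0 }
    else {
      val rest = powerSpecialized(n - 1) // resolved before any x is seen
      (x: Double) => x * rest(x)
    }

  // "Compile" once, then apply as often as needed.
  val cube: Double => Double = powerSpecialized(3)
}
```

Here `n` plays the role of the program and `x` the role of the runtime input; `partialEval` later in the deck does the same thing with an `Expr` in place of `n`.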

Why are interpreters slow?

  1. Instruction decoding / dispatch

  2. Unpredictable machine code sequence

  3. Missing optimizations available to statically-compiled code

All overhead can be eliminated via partial evaluation!!

trait Expr // Vector[Double] => Vector[Double]
case class Num(d: Int, n: Double) extends Expr
case class Plus(d: Int, i: Int, j: Int) extends Expr
case class Decr(d: Int) extends Expr
case class Copy(d: Int, i: Int) extends Expr
case class Block(es: List[Expr]) extends Expr
case class Loop(haltIf0: Int, p: Expr) extends Expr
Num(0, 42) [r0, r1, r2, r3] 
       ==> [42, r1, r2, r3]

Plus(0, 1, 2) [r0     , r1, r2, r3] 
          ==> [r1 + r2, r1, r2, r3]

Decr(3) [r0, r1, r2, r3      ] 
    ==> [r0, r1, r2, r3 - 1.0]

Copy(1, 2) [r0, r1, r2, r3] 
       ==> [r0, r2, r2, r3]

Loop(1, Decr(1)) [r0, 12, r2, r3] 
             ==> [r0,  0, r2, r3]

Loop(1, Block(List(p1, p2, ..)))

// expects `n` in register 0, 
// puts result in register 1
val fib = Block(List(   // var n = <fn param>
  Num(1, 0.0),          // var f1 = 0
  Num(2, 1.0),          // var f2 = 1
  Loop(0, Block(List(   // while (n != 0) {
    Plus(3, 1, 2),      //   val tmp = f1 + f2
    Copy(1, 2),         //   f1 = f2
    Copy(2, 3),         //   f2 = tmp
    Decr(0)             //   n -= 1
  )))                   // }
))

Compute nth Fibonacci:

def interpret(e: Expr, m: Array[Double]): Unit = e match {
  case Num(d, n) => m(d) = n
  case Plus(d, i, j) => m(d) = m(i) + m(j)
  ...
  case Loop(haltIf0, p) => loop(haltIf0, p, m)
}
def loop(haltIf0: Int, e: Expr, m: Array[Double]): Unit =
  while (m(haltIf0) != 0) interpret(e, m)
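
For reference, a complete version of the interpreter might look like the sketch below. The `Decr`, `Copy`, and `Block` cases are filled in from the register diagrams earlier in the deck; the wrapper object `InterpreterSketch` and the helper `runFib` are assumptions added to make the example self-contained.

```scala
object InterpreterSketch {
  sealed trait Expr
  case class Num(d: Int, n: Double) extends Expr
  case class Plus(d: Int, i: Int, j: Int) extends Expr
  case class Decr(d: Int) extends Expr
  case class Copy(d: Int, i: Int) extends Expr
  case class Block(es: List[Expr]) extends Expr
  case class Loop(haltIf0: Int, p: Expr) extends Expr

  // Every step pattern-matches on the instruction again: this is the
  // decoding / dispatch overhead listed on the slides.
  def interpret(e: Expr, m: Array[Double]): Unit = e match {
    case Num(d, n)        => m(d) = n
    case Plus(d, i, j)    => m(d) = m(i) + m(j)
    case Decr(d)          => m(d) = m(d) - 1.0
    case Copy(d, i)       => m(d) = m(i)
    case Block(es)        => es.foreach(interpret(_, m))
    case Loop(haltIf0, p) => while (m(haltIf0) != 0.0) interpret(p, m)
  }

  // Expects `n` in register 0, leaves the result in register 1.
  val fib: Expr = Block(List(
    Num(1, 0.0),
    Num(2, 1.0),
    Loop(0, Block(List(
      Plus(3, 1, 2),
      Copy(1, 2),
      Copy(2, 3),
      Decr(0)
    )))
  ))

  // Hypothetical helper: set up four registers, run, read the answer.
  def runFib(n: Int): Double = {
    val m = new Array[Double](4)
    m(0) = n.toDouble
    interpret(fib, m)
    m(1)
  }
}
```

With this encoding `InterpreterSketch.runFib(10)` evaluates to `55.0`.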

  1. Instruction decoding / dispatch

  2. Unpredictable machine code sequence

  3. Missing optimizations available to statically-compiled code

~ 10x - 50x slower

def loop(haltIf0: Expr, e: Expr, m: Array[Double]): Unit =
  while (interpret(haltIf0, m) != 0) interpret(e, m)

case class Plus(..) extends Expr
case class Copy(..) extends Expr
case class PlusThenCopy(..) extends Expr

Non-solution: ad hoc composite instructions

case class IncrThenDotProductMinus42(..) extends Expr

If problem is just ratio of overhead : useful work ...

(aside: tracing JIT a better approach along these lines)

Instead: partial evaluation

def interpret(e: Expr, m: Array[Double]): Unit
def interpret(e: Expr): Array[Double] => Unit
def partialEval(e: Expr): Array[Double] => Unit = e match {
  case Num(d, n) => m => { m(d) = n }
  case Plus(d, i, j) => m => { m(d) = m(i) + m(j) }
  case Loop(haltIf0, p) => 
    val cp = partialEval(p)
    m => while (m(haltIf0) != 0.0) cp(m)

  case Block(es) => es match {
    case e :: es => 
      val ce = partialEval(e)
      val ces = partialEval(Block(es))
      m => { ce(m); ces(m) }
    ...
  }
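
Putting the fragments together, a full `partialEval` could be sketched as follows. The cases elided on the slides (`Decr`, `Copy`, the empty and single-element `Block`) are filled in as assumptions consistent with the instruction semantics shown earlier; `PartialEvalSketch`, `Compiled`, and `runFib` are illustrative names.

```scala
object PartialEvalSketch {
  sealed trait Expr
  case class Num(d: Int, n: Double) extends Expr
  case class Plus(d: Int, i: Int, j: Int) extends Expr
  case class Decr(d: Int) extends Expr
  case class Copy(d: Int, i: Int) extends Expr
  case class Block(es: List[Expr]) extends Expr
  case class Loop(haltIf0: Int, p: Expr) extends Expr

  type Compiled = Array[Double] => Unit

  // All pattern matching on Expr happens here, once. The closures that
  // come back never look at the syntax tree again.
  def partialEval(e: Expr): Compiled = e match {
    case Num(d, n)     => m => { m(d) = n }
    case Plus(d, i, j) => m => { m(d) = m(i) + m(j) }
    case Decr(d)       => m => { m(d) = m(d) - 1.0 }
    case Copy(d, i)    => m => { m(d) = m(i) }
    case Loop(haltIf0, p) =>
      val cp = partialEval(p)
      m => while (m(haltIf0) != 0.0) cp(m)
    case Block(Nil)      => _ => ()
    case Block(e :: Nil) => partialEval(e)
    case Block(e :: es) =>
      val ce = partialEval(e)
      val ces = partialEval(Block(es))
      m => { ce(m); ces(m) }
  }

  val fib: Expr = Block(List(
    Num(1, 0.0),
    Num(2, 1.0),
    Loop(0, Block(List(Plus(3, 1, 2), Copy(1, 2), Copy(2, 3), Decr(0))))
  ))

  // Compile once; the resulting function can be reused for any input.
  val compiledFib: Compiled = partialEval(fib)

  def runFib(n: Int): Double = {
    val m = new Array[Double](4)
    m(0) = n.toDouble
    compiledFib(m)
    m(1)
  }
}
```

The compiled program is just a tree of ordinary Scala closures, so the JVM is free to inline and optimize it like any other code.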

Array[Double] => Unit

def partialEval(e: Expr): Array[Double] => Unit = e match {
  case Num(d, n) => m => { m(d) = n }
  case Plus(d, i, j) => m => { m(d) = m(i) + m(j) }
  case Loop(haltIf0, p) => 
    val cp = partialEval(p)
    m => while (m(haltIf0) != 0.0) cp(m)
  ...

  1. Instruction decoding / dispatch (specialized away)

  2. Unpredictable machine code sequence (devirtualization + inlining)

  3. Missing optimizations available to statically-compiled code (dynamic JIT)

⟹ 1-2x*

   Ratio of runtimes, summing 1 million numbers (lower is better)

1.0             Scala
1.85            partially-evaluated (2)
4.03            partially-evaluated
9.77            interpreted
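
A comparison along these lines can be reproduced with a small harness such as the one below. This is a sketch only: the exact benchmark behind the numbers above is not shown in the deck, so the summing program, the `Block` compiled to an array of closures, and the helper `timeMs` are all assumptions. Timings from a naive harness like this vary a lot with JIT warm-up; a tool such as JMH is the better choice for real measurements.

```scala
object SumBenchSketch {
  sealed trait Expr
  case class Plus(d: Int, i: Int, j: Int) extends Expr
  case class Decr(d: Int) extends Expr
  case class Block(es: List[Expr]) extends Expr
  case class Loop(haltIf0: Int, p: Expr) extends Expr

  def interpret(e: Expr, m: Array[Double]): Unit = e match {
    case Plus(d, i, j)    => m(d) = m(i) + m(j)
    case Decr(d)          => m(d) = m(d) - 1.0
    case Block(es)        => es.foreach(interpret(_, m))
    case Loop(haltIf0, p) => while (m(haltIf0) != 0.0) interpret(p, m)
  }

  def partialEval(e: Expr): Array[Double] => Unit = e match {
    case Plus(d, i, j) => m => { m(d) = m(i) + m(j) }
    case Decr(d)       => m => { m(d) = m(d) - 1.0 }
    case Block(es) =>
      val cs = es.map(partialEval).toArray // compile each step once
      m => { var k = 0; while (k < cs.length) { cs(k)(m); k += 1 } }
    case Loop(haltIf0, p) =>
      val cp = partialEval(p)
      m => while (m(haltIf0) != 0.0) cp(m)
  }

  // Register 0 counts down from n; register 1 accumulates n + (n-1) + ... + 1.
  val sumProgram: Expr = Loop(0, Block(List(Plus(1, 1, 0), Decr(0))))
  val compiledSum: Array[Double] => Unit = partialEval(sumProgram)

  def sumInterpreted(n: Int): Double = { val m = Array(n.toDouble, 0.0); interpret(sumProgram, m); m(1) }
  def sumCompiled(n: Int): Double    = { val m = Array(n.toDouble, 0.0); compiledSum(m); m(1) }
  def sumScala(n: Int): Double = {
    var acc = 0.0; var k = n
    while (k != 0) { acc += k; k -= 1 }
    acc
  }

  // Naive wall-clock timing in milliseconds; no warm-up, no statistics.
  def timeMs[A](body: => A): (A, Double) = {
    val t0 = System.nanoTime()
    val r = body
    (r, (System.nanoTime() - t0) / 1e6)
  }
}
```

For example, `SumBenchSketch.timeMs(SumBenchSketch.sumCompiled(1000000))` returns the sum together with the elapsed time, and the same call with `sumInterpreted` or `sumScala` gives the other two data points.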

GC, JIT

+JVM

+Flexibility

WHOA!!

The Unison runtime

  • Using this approach

  • Caveat: sensitive to representation

  • Flexible: can support proper tail calls

  • JS via Scala.js?

Array[Double] => Unit
Machine => Unit

Connections / related work

unisonweb.org

Other contributors / advisors: Rúnar Bjarnason, Dan Doel, Chris Gibbs, Sam Griffin, Ed Kmett, Mike Pilquist ...

@aryairani

@unisonweb