An intro to the Unison language and compilation via partial evaluation
Paul Chiusano
@pchiusano
@unisonweb
Arya Irani
@aryairani
Part 1: an intro to Unison
Part 2: compilation via partial evaluation
Unison: motivation
Docker
Kubernetes
Terraform
Kafka
DynamoDB
S3
EC2
ElasticSearch
Kibana
Prometheus
Grafana
PagerDuty
etcd
ELB
Route 53
Consul
systemd
Flannel
Weave
Lambda
App Engine
rkt
CoreOS
Zookeeper
Redis
memcached
Protobufs
Thrift
Envoy
Mesos
Nomad
ASGs
Scala ←JSON→ Scala
"Just set up an ASG connected to ELB"
Chef
Puppet
🙁
Tons of work
🙁
Lots of unguessable + arbitrary knowledge
😀 👍
"OMG Wavelet Trees are amazing!!"
😀 👍
"Whoa! These 20 algorithms can be replaced with a few star semiring generic functions!"
🙁
"Configure your ELB to point to your ASG"
program any pool of distributed compute resources ...
... like it's a single computer
Scala
Unison
Unison
Functional language, open source
Lots of R&D
Working toward 1st release
Scala-based runtime
Emphasis: distributed systems
Language basics
Scala
f(x, y + 1)
Unison
f x (y + 1)
Scala
def factorial(n: Int): Int =
Stream.range(1, n + 1).foldLeft(1)(_ * _)
Unison
factorial n =
Stream.fold-left (*) 1 (Stream.range 1 (n + 1))
factorial : Number -> Number
factorial-at : Node -> Number -> Remote Number
factorial-at alice n =
at alice (Remote.pure { factorial n })
-- assuming alice : Node, bob : Node
example : Remote Number
example = do Remote
a := factorial-at alice 3
b := factorial-at bob 7
pure { a + b }
example : Remote Node -> Remote Number
example provision = do Remote
alice := provision
bob := provision
a := factorial-at alice 3
b := factorial-at bob 7
pure { a + b }
map-reduce : (a -> Remote b) -> (b -> b -> b) -> b -> Vector a -> Remote b
map-reduce f g z vs = do Remote
bfs := Remote.traverse (a -> Remote.start (f a)) vs
Vector.balanced-reduce (Remote.parallel-map2 g) (Remote.pure z) bfs
word-count : Text -> Number
word-count txt = ...
distributed-word-count : Remote Node -> Vector Text -> Remote Number
distributed-word-count provision docs =
map-reduce
(doc -> do Remote { n := provision; Remote.at' n { word-count doc }} )
(+)
0
docs
Distributed map-reduce
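For readers more at home in Scala, the shape of `map-reduce` can be sketched with `scala.concurrent.Future` standing in for `Remote`. This is only a local analogue under stated assumptions: everything runs on one machine's thread pool, nothing is provisioned or shipped to another node, and the names `LocalMapReduce`, `balancedReduce`, `parallelWordCount` and the body of `wordCount` are hypothetical (the talk elides `word-count`).

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object LocalMapReduce {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Combine results pairwise in a balanced tree, so the two halves
  // can be reduced independently of each other.
  def balancedReduce[B](z: Future[B], xs: Vector[Future[B]])(
      g: (Future[B], Future[B]) => Future[B]): Future[B] =
    xs.size match {
      case 0 => z
      case 1 => xs.head
      case n =>
        val (left, right) = xs.splitAt(n / 2)
        g(balancedReduce(z, left)(g), balancedReduce(z, right)(g))
    }

  // Same shape as the Unison map-reduce: start every `f a`, then reduce with `g`.
  def mapReduce[A, B](f: A => Future[B], g: (B, B) => B, z: B, vs: Vector[A]): Future[B] = {
    val started = vs.map(f) // Scala Futures begin running as soon as they are created
    balancedReduce(Future.successful(z), started)((x, y) => x.zipWith(y)(g))
  }

  // Hypothetical stand-in for the elided `word-count`.
  def wordCount(txt: String): Int = txt.split("\\s+").count(_.nonEmpty)

  // Runs on local threads only; there is no remote node involved here.
  def parallelWordCount(docs: Vector[String]): Future[Int] =
    mapReduce[String, Int](doc => Future(wordCount(doc)), _ + _, 0, docs)

  def main(args: Array[String]): Unit = {
    val docs = Vector("program any pool", "of distributed compute resources", "")
    println(Await.result(parallelWordCount(docs), 10.seconds))
  }
}
```

What the Scala version cannot express is the part Unison is after: choosing where each `f a` runs.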
Sounds great! but...
at someNode hugeComputation
need: runtime deployment + compilation
LLVM, JVM bytecode? custom JIT?
simplest thing that can possibly work:
send ASTs around and interpret
But isn't that slow?
It doesn't have to be
Part 2: compilation via partial evaluation
Big idea: partially evaluating ("specializing") an interpreter for a program IS compilation
Futamura projections (1983)
Less known: can exploit to build "JIT for free" using plain ol' Scala / <your-lang> code
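A minimal sketch of what "specializing" means, using the textbook power function rather than anything from the talk. The names `power` and `specializePower` are illustrative assumptions. The general version re-inspects `n` on every call; the specialized version inspects `n` once and returns a closure with those decisions already made.

```scala
object PowerSpecialization {
  // General version: the exponent is inspected on every call.
  def power(x: Double, n: Int): Double =
    if (n == 0) 1.0 else x * power(x, n - 1)

  // Specialized version: all decisions about `n` are made once, up front.
  // The returned closure contains no tests on `n` at all.
  def specializePower(n: Int): Double => Double =
    if (n == 0) _ => 1.0
    else {
      val rest = specializePower(n - 1) // built once, not once per call
      x => x * rest(x)
    }

  def main(args: Array[String]): Unit = {
    val cube = specializePower(3) // pay the "compilation" cost once
    println(cube(2.0))            // same result as power(2.0, 3)
    println(power(2.0, 3))
  }
}
```

Replace `n` with a program and `x` with that program's input, and `specializePower` becomes a compiler derived from an interpreter.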
Why are interpreters slow?
- Instruction decoding / dispatch
- Unpredictable machine code sequence
- Missing optimizations available to statically-compiled code
All overhead can be eliminated via partial evaluation!!
trait Expr // Vector[Double] => Vector[Double]
case class Num(d: Int, n: Double) extends Expr
case class Plus(d: Int, i: Int, j: Int) extends Expr
case class Decr(d: Int) extends Expr
case class Copy(d: Int, i: Int) extends Expr
case class Block(es: List[Expr]) extends Expr
case class Loop(haltIf0: Int, p: Expr) extends Expr
Num(0, 42) [r0, r1, r2, r3]
==> [42, r1, r2, r3]
Plus(0, 1, 2) [r0, r1, r2, r3]
==> [r1 + r2, r1, r2, r3]
Decr(3) [r0, r1, r2, r3]
==> [r0, r1, r2, r3 - 1.0]
Copy(1, 2) [r0, r1, r2, r3]
==> [r0, r2, r2, r3]
Loop(1, Decr(1)) [r0, 12, r2, r3]
==> [r0, 0, r2, r3]
Loop(1, Block(p1, p2 ..))
Compute nth Fibonacci:
// expects `n` in register 0,
// puts result in register 1
val fib = Block(       // var n = <fn param>
  Num(1, 0.0),         // var f1 = 0
  Num(2, 1.0),         // var f2 = 1
  Loop(0, Block(       // while (n != 0) {
    Plus(3, 1, 2),     //   val tmp = f1 + f2
    Copy(1, 2),        //   f1 = f2
    Copy(2, 3),        //   f2 = tmp
    Decr(0)))          //   n -= 1
)                      // }
def interpret(e: Expr, m: Array[Double]): Unit = e match {
  case Num(d, n) => m(d) = n
  case Plus(d, i, j) => m(d) = m(i) + m(j)
  ...
  case Loop(haltIf0, p) => loop(haltIf0, p, m)
}
def loop(haltIf0: Int, e: Expr, m: Array[Double]): Unit =
  while (m(haltIf0) != 0) interpret(e, m)
- Instruction decoding / dispatch
- Unpredictable machine code sequence
- Missing optimizations available to statically-compiled code
~ 10x - 50x slower
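The interpreter on the slide elides several cases. The sketch below fills them in from the register semantics shown earlier (`Decr`, `Copy`, `Block`), so it is a reconstruction under that assumption rather than the talk's exact code. `Block` is given an explicit `List`, matching its case class definition, and `runFib` is a hypothetical helper.

```scala
object Interpreter {
  sealed trait Expr
  case class Num(d: Int, n: Double) extends Expr
  case class Plus(d: Int, i: Int, j: Int) extends Expr
  case class Decr(d: Int) extends Expr
  case class Copy(d: Int, i: Int) extends Expr
  case class Block(es: List[Expr]) extends Expr
  case class Loop(haltIf0: Int, p: Expr) extends Expr

  // Every step pattern-matches on the instruction again: this is the
  // decoding / dispatch overhead listed above.
  def interpret(e: Expr, m: Array[Double]): Unit = e match {
    case Num(d, n)        => m(d) = n
    case Plus(d, i, j)    => m(d) = m(i) + m(j)
    case Decr(d)          => m(d) = m(d) - 1.0
    case Copy(d, i)       => m(d) = m(i)
    case Block(es)        => es.foreach(interpret(_, m))
    case Loop(haltIf0, p) => while (m(haltIf0) != 0.0) interpret(p, m)
  }

  // The Fibonacci program: expects n in register 0, leaves the result in register 1.
  val fib: Expr = Block(List(
    Num(1, 0.0),
    Num(2, 1.0),
    Loop(0, Block(List(Plus(3, 1, 2), Copy(1, 2), Copy(2, 3), Decr(0))))))

  // Hypothetical helper: allocate four registers, run, read register 1.
  def runFib(n: Int): Double = {
    val m = Array(n.toDouble, 0.0, 0.0, 0.0)
    interpret(fib, m)
    m(1)
  }

  def main(args: Array[String]): Unit = println(runFib(10))
}
```

Note that the body of the `Loop` is re-decoded on every iteration, even though the program never changes while it runs.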
def loop(haltIf0: Expr, e: Expr, m: Array[Double]): Unit =
while (interpret(haltIf0, m) != 0) interpret(e, m)
case class Plus(..) extends Expr
case class Copy(..) extends Expr
case class PlusThenCopy(..) extends Expr
Non-solution: ad hoc composite instructions
case class IncrThenDotProductMinus42(..) extends Expr
If problem is just ratio of overhead : useful work ...
(aside: tracing JIT a better approach along these lines)
Instead: partial evaluation
def interpret(e: Expr, m: Array[Double]): Unit
def interpret(e: Expr): Array[Double] => Unit
def partialEval(e: Expr): Array[Double] => Unit = e match {
  case Num(d, n) => m => { m(d) = n }
  case Plus(d, i, j) => m => { m(d) = m(i) + m(j) }
  case Loop(haltIf0, p) =>
    val cp = partialEval(p) // specialize the loop body once, outside the returned closure
    m => while (m(haltIf0) != 0.0) cp(m)
  case Block(es) => es match {
    case e :: es =>
      val ce = partialEval(e)
      val ces = partialEval(Block(es)) // recurse on the tail `es`
      m => { ce(m); ces(m) }
    ...
  }
Array[Double] => Unit
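A complete, runnable version of the idea, as a sketch. The cases the slide leaves out (`Decr`, `Copy`, the empty and single-element `Block`) are filled in here as assumptions based on the register semantics shown earlier, and `runFib` is a hypothetical helper. The point to notice is that every `match` on an `Expr` happens inside `partialEval`, before any input exists; the returned closures only touch the register array.

```scala
object PartialEval {
  sealed trait Expr
  case class Num(d: Int, n: Double) extends Expr
  case class Plus(d: Int, i: Int, j: Int) extends Expr
  case class Decr(d: Int) extends Expr
  case class Copy(d: Int, i: Int) extends Expr
  case class Block(es: List[Expr]) extends Expr
  case class Loop(haltIf0: Int, p: Expr) extends Expr

  type Compiled = Array[Double] => Unit

  def partialEval(e: Expr): Compiled = e match {
    case Num(d, n)       => m => { m(d) = n }
    case Plus(d, i, j)   => m => { m(d) = m(i) + m(j) }
    case Decr(d)         => m => { m(d) = m(d) - 1.0 }
    case Copy(d, i)      => m => { m(d) = m(i) }
    case Block(Nil)      => _ => ()
    case Block(e :: Nil) => partialEval(e) // avoid a useless wrapper closure
    case Block(e :: es) =>
      val ce  = partialEval(e)
      val ces = partialEval(Block(es))
      m => { ce(m); ces(m) }
    case Loop(haltIf0, p) =>
      val cp = partialEval(p) // body is specialized once, not once per iteration
      m => while (m(haltIf0) != 0.0) cp(m)
  }

  val fib: Expr = Block(List(
    Num(1, 0.0),
    Num(2, 1.0),
    Loop(0, Block(List(Plus(3, 1, 2), Copy(1, 2), Copy(2, 3), Decr(0))))))

  // "Compile" once; the result can be reused for any number of inputs.
  val compiledFib: Compiled = partialEval(fib)

  def runFib(n: Int): Double = {
    val m = Array(n.toDouble, 0.0, 0.0, 0.0)
    compiledFib(m)
    m(1)
  }

  def main(args: Array[String]): Unit = println(runFib(10))
}
```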
def partialEval(e: Expr): Array[Double] => Unit = e match {
  case Num(d, n) => m => { m(d) = n }
  case Plus(d, i, j) => m => { m(d) = m(i) + m(j) }
  case Loop(haltIf0, p) =>
    val cp = partialEval(p)
    m => while (m(haltIf0) != 0.0) cp(m)
  ...
- Instruction decoding / dispatch (specialized away)
- Unpredictable machine code sequence (devirtualization + inlining)
- Missing optimizations available to statically-compiled code (dynamic JIT)
⟹ 1-2x*
Ratio of runtimes, summing 1 million numbers (lower is better)
1.0 Scala
1.85 partially-evaluated (2)
4.03 partially-evaluated
9.77 interpreted
GC, JIT
+JVM
+Flexibility
WHOA!!
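A rough harness for trying a comparison like this yourself. The talk does not show its benchmark program, so the summing loop below (`sumProgram`), the reduced instruction set, and the `time` helper are all assumptions, and the sketch makes no attempt to reproduce the "(2)" variant. Single-shot timings on the JVM are noisy because of JIT warm-up; a real measurement would use a tool such as JMH.

```scala
object SumBenchmark {
  // Subset of the talk's instruction set, enough for a summing loop.
  sealed trait Expr
  case class Plus(d: Int, i: Int, j: Int) extends Expr
  case class Decr(d: Int) extends Expr
  case class Block(es: List[Expr]) extends Expr
  case class Loop(haltIf0: Int, p: Expr) extends Expr

  def interpret(e: Expr, m: Array[Double]): Unit = e match {
    case Plus(d, i, j)    => m(d) = m(i) + m(j)
    case Decr(d)          => m(d) = m(d) - 1.0
    case Block(es)        => es.foreach(interpret(_, m))
    case Loop(haltIf0, p) => while (m(haltIf0) != 0.0) interpret(p, m)
  }

  def partialEval(e: Expr): Array[Double] => Unit = e match {
    case Plus(d, i, j) => m => { m(d) = m(i) + m(j) }
    case Decr(d)       => m => { m(d) = m(d) - 1.0 }
    case Block(es) =>
      val compiled = es.map(partialEval)
      m => compiled.foreach(c => c(m))
    case Loop(haltIf0, p) =>
      val cp = partialEval(p)
      m => while (m(haltIf0) != 0.0) cp(m)
  }

  // Assumed benchmark: r1 = n + (n - 1) + ... + 1, with the counter in r0.
  val sumProgram: Expr = Loop(0, Block(List(Plus(1, 1, 0), Decr(0))))
  val compiledSum: Array[Double] => Unit = partialEval(sumProgram)

  def native(n: Int): Double = {
    var i = n.toDouble; var acc = 0.0
    while (i != 0.0) { acc += i; i -= 1.0 }
    acc
  }
  def interpreted(n: Int): Double = {
    val m = Array(n.toDouble, 0.0); interpret(sumProgram, m); m(1)
  }
  def partiallyEvaluated(n: Int): Double = {
    val m = Array(n.toDouble, 0.0); compiledSum(m); m(1)
  }

  // Crude wall-clock timer; prints elapsed milliseconds and returns the result.
  def time(label: String)(f: => Double): Double = {
    val t0 = System.nanoTime()
    val r  = f
    val ms = (System.nanoTime() - t0) / 1e6
    println(f"$label%-20s $ms%.2f ms")
    r
  }

  def main(args: Array[String]): Unit = {
    val n = 1000000
    (1 to 5).foreach { _ => native(n); interpreted(n); partiallyEvaluated(n) } // warm-up
    time("Scala")(native(n))
    time("partially-evaluated")(partiallyEvaluated(n))
    time("interpreted")(interpreted(n))
  }
}
```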
The Unison runtime
- Using this approach
- Caveat: sensitive to representation
- Flexible: can support proper tail calls
- JS via Scala.js?
Array[Double] => Unit
Machine => Unit
Connections / related work
- Futamura projections
- Truffle & Graal
- Tracing: eliminate interpreter overhead by compiling common sequences of instructions ("traces")
- RPython
unisonweb.org
Other contributors / advisors: Rúnar Bjarnason, Dan Doel, Chris Gibbs, Sam Griffin, Ed Kmett, Mike Pilquist ...
@aryairani
@unisonweb
An intro to Unison and compilation via partial evaluation
By Paul Chiusano
Presentation at Scala World 2017