SinScheme
A Compiler For No One
Presented By: DAVIS SILVERMAN
What is a Compiler?
A Text Processor
Source Language Code -> ?????? -> Destination Language Code
LISP
LISt Processing
Lots of Irritating, Silly Parens
(define bar 5)
(define (square n) (* n n))
; squares each element, then adds 1: '(2 5 10 17 26)
(define results
  (let ([xs '(1 2 3 4 5)])
    (map (lambda (e) (+ (square e) 1)) xs)))
Lisp (Racket)
(* 2 (+ 3 4))
Compiling Lisp
- Super easy to parse: source code is already a tree of nested lists (S-expressions)
- Only have to worry about the actual code generation
- Lisp is well understood academically
- Lisp is easily understood by beginners
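Here is a minimal illustration of the parsing point. It is a sketch, not SinScheme's front end. Racket's built-in `read` already turns source text into a nested list. A toy evaluator can then walk that list directly. The evaluator, `eval-arith`, is a hypothetical helper that handles only `+` and `*`.

```racket
#lang racket
;; Parsing is nearly free: the built-in `read` turns program text into a
;; nested list (an S-expression) that later phases can walk directly.
(define (parse str)
  (read (open-input-string str)))

;; Hypothetical toy evaluator for + and * over numbers. It shows that the
;; parsed tree is usable as-is, with no separate AST-building step.
(define (eval-arith e)
  (match e
    [(? number?) e]
    [`(+ ,a ,b) (+ (eval-arith a) (eval-arith b))]
    [`(* ,a ,b) (* (eval-arith a) (eval-arith b))]))

(parse "(* 2 (+ 3 4))")              ; => '(* 2 (+ 3 4))
(eval-arith (parse "(* 2 (+ 3 4))")) ; => 14
```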
LLVM
Generating an Executable is HARD
- Register allocation + other academic problems
- Platform dependencies
- Assembly is not a great language
LLVM Makes Generating Executables Easy!
- Typed, assembly-like intermediate representation (LLVM IR)
- Many supported platforms
- Handles all 'real' code generation (register allocation, machine code)
Enter SinScheme
Scheme -> LLVM IR Compiler
https://github.com/sinistersnare/sinscheme
SinScheme
A Tour of the various compilation phases
Functional Compilers
- Much like functional programming: break the problem into many smaller problems and solve each one individually
- Many phases of compilation, each translating one small language into a simpler one
- Lots of PL theory here!!
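The sketch below makes the "many small passes" idea concrete. Each pass is an ordinary Racket function on S-expressions, and the compiler is just their composition. The passes `desugar-when` and `fold-constants` and the driver `run-pipeline` are hypothetical toys, not SinScheme's real phases.

```racket
#lang racket
;; Each pass maps one intermediate language to the next; the compiler is
;; their composition. Both passes here are hypothetical toy examples.

;; Toy pass 1: rewrite (when c e) into (if c e (void)).
(define (desugar-when e)
  (match e
    [`(when ,c ,body) `(if ,(desugar-when c) ,(desugar-when body) (void))]
    [(? list?) (map desugar-when e)]
    [_ e]))

;; Toy pass 2: fold (+ n m) when both operands are literal numbers.
(define (fold-constants e)
  (match e
    [(? list?)
     (let ([e* (map fold-constants e)])
       (match e*
         [`(+ ,(? number? a) ,(? number? b)) (+ a b)]
         [_ e*]))]
    [_ e]))

;; Run the passes in order, feeding each one's output to the next.
(define (run-pipeline passes program)
  (for/fold ([p program]) ([pass (in-list passes)])
    (pass p)))

(run-pipeline (list desugar-when fold-constants)
              '(when (> x 0) (+ 1 2)))
;; => '(if (> x 0) 3 (void))
```

Because each pass only has to understand its own input language, each one can be tested in isolation.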
e ::= (define x e)
| (define (x x ... defaultparam ...) e ...+)
| (define (x x ... . x) e ...+)
| (letrec* ([x e] ...) e ...+)
| (letrec ([x e] ...) e ...+)
| (let* ([x e] ...) e ...+)
| (let ([x e] ...) e ...+)
| (let x ([x e] ...) e ...+)
| (lambda (x ... defaultparam ...) e ...+)
| (lambda x e ...+)
| (lambda (x ...+ . x) e ...+)
| (dynamic-wind e e e)
| (guard (x cond-clause ...) e ...+)
| (raise e)
| (delay e)
| (force e)
| (and e ...)
| (or e ...)
| (match e match-clause ...)
| (cond cond-clause ...)
| (case e case-clause ...)
| (if e e e)
| (when e e ...+)
| (unless e e ...+)
| (set! x e)
| (begin e ...+)
| (call/cc e)
| (apply e e)
| (e e ...)
| x
| op
| (quasiquote qq)
| (quote dat)
| nat | string | #t | #f
cond-clause ::= (e) | (e e e ...) | (else e e ...)
case-clause ::= ((dat ...) e e ...) | (else e e ...)
match-clause ::= (pat e e ...) | (else e e ...)
; in all cases, else clauses must come last
dat is a datum satisfying datum? from utils.rkt
x is a variable (satisfies symbol?)
defaultparam ::= (x e)
op is a symbol satisfying prim? from utils.rkt (if not otherwise in scope)
op ::= promise? | null? | cons | car | + | ... (see utils.rkt)
qq ::= e | dat | (unquote qq) | (unquote e) | (quasiquote qq)
| (qq ...+) | (qq ...+ . qq)
;; (quasiquote has the same semantics as in Racket)
pat ::= nat | string | #t | #f | (quote dat) | x | (? e pat) | (cons pat pat) | (quasiquote qqpat)
qqpat ::= e | dat | (unquote qqpat) | (unquote pat) | (quasiquote qq)
| (qq ...+) | (qq ...+ . qq)
;; (same semantics as Racket match for this subset of patterns)
Top Level Translations
- Removes pattern matching
- Quotes all datums
- Removes all defines
- *creating one big giant expression*
- Reminiscent of the Lambda Calculus
e ::= (letrec* ([x e] ...) e)
| (letrec ([x e] ...) e)
| (let* ([x e] ...) e)
| (let ([x e] ...) e)
| (let x ([x e] ...) e)
| (lambda (x ...) e)
| (lambda x e)
| (lambda (x ...+ . x) e)
| (dynamic-wind e e e)
| (guard (x cond-clause ...) e)
| (raise e)
| (delay e)
| (force e)
| (and e ...)
| (or e ...)
| (cond cond-clause ...)
| (case e case-clause ...)
| (if e e e)
| (when e e)
| (unless e e)
| (set! x e)
| (begin e ...+)
| (call/cc e)
| (apply e e)
| (e e ...)
| x
| op
| (quote dat)
cond-clause ::= (e) | (e e) | (else e) ; else clauses always come last
case-clause ::= ((dat ...) e) | (else e) ; else clauses always come last
dat is a datum satisfying datum? from utils.rkt
x is a variable (satisfies symbol?)
op is a symbol satisfying prim? from utils.rkt (if not otherwise in scope)
op ::= promise? | null? | cons | car | + | ... (see utils.rkt)
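Here is a minimal sketch of the define-removal part of this pass. It assumes all top-level defines come before the program's expressions and that at least one expression follows. The function names are hypothetical, and SinScheme's actual translation may differ, for example in how it handles defines interleaved with expressions.

```racket
#lang racket
;; Hypothetical sketch: collapse top-level defines into one letrec*.
;; Assumes every define precedes the first ordinary expression and that at
;; least one expression follows. Handles (define x e) and (define (f x ...) e ...).
(define (define-form? f)
  (and (pair? f) (eq? (car f) 'define)))

;; Turn one define into a letrec* binding.
(define (define->binding d)
  (match d
    [`(define (,f ,params ...) ,body ...) `[,f (lambda ,params (begin ,@body))]]
    [`(define ,x ,e) `[,x ,e]]))

(define (top-level->letrec* forms)
  (let-values ([(defs exprs) (splitf-at forms define-form?)])
    `(letrec* ,(map define->binding defs) (begin ,@exprs))))

;; (top-level->letrec* '((define bar 5) (define (square n) (* n n)) (square bar)))
;; => '(letrec* ([bar 5] [square (lambda (n) (begin (* n n)))]) (begin (square bar)))
```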
Desugaring
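- Removes the remaining syntactic sugar (`and`, `or`, `cond`, `case`, `when`, `unless`, `begin`, ...)
- Turns all bindings (`let*`, `letrec`, `letrec*`, named `let`) into plain `let` bindings
- Desugars promises (`delay`, `force`) and exception handling (`guard`, `raise`), along with `dynamic-wind`
- Makes primitive operations explicit as `prim` and `apply-prim` forms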
e ::= (let ([x e] ...) e)
| (lambda (x ...) e)
| (lambda x e)
| (apply e e)
| (e e ...)
| (prim op e ...)
| (apply-prim op e)
| (if e e e)
| (set! x e)
| (call/cc e)
| x
| (quote dat)
dat is a datum satisfying datum? from utils.rkt
x is a variable (satisfies symbol?)
op is a symbol satisfying prim? from utils.rkt (if not otherwise in scope)
op ::= promise? | null? | cons | car | + | ... (see utils.rkt)
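Here is a minimal sketch of a few desugaring rules: `let*` becomes nested `let`, `and` becomes nested `if`, and every other form is traversed unchanged. It illustrates the idea over a tiny subset and is not SinScheme's desugarer. The remaining forms (`or`, `cond`, `guard`, `delay`, ...) are omitted.

```racket
#lang racket
;; Hypothetical sketch of three desugaring rules over a tiny subset.
(define (desugar e)
  (match e
    [`(quote ,_) e]                                ; data is left untouched
    [`(lambda ,params ,body) `(lambda ,params ,(desugar body))]
    ;; let* becomes nested single-binding lets
    [`(let* () ,body) (desugar body)]
    [`(let* ([,x ,rhs] ,rest ...) ,body)
     `(let ([,x ,(desugar rhs)]) ,(desugar `(let* ,rest ,body)))]
    [`(let ([,xs ,rhss] ...) ,body)
     `(let ,(map (lambda (x r) `[,x ,(desugar r)]) xs rhss) ,(desugar body))]
    ;; and becomes nested ifs
    [`(and) ''#t]
    [`(and ,e1) (desugar e1)]
    [`(and ,e1 ,es ...) `(if ,(desugar e1) ,(desugar `(and ,@es)) '#f)]
    ;; any other compound form: desugar each subexpression
    [`(,f ,args ...) `(,(desugar f) ,@(map desugar args))]
    [_ e]))

;; (desugar '(let* ([a '1] [b (and a '2)]) b))
;; => '(let ([a '1]) (let ([b (if a '2 '#f)]) b))
```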
Assignment Conversion
- Removes `set!` AKA mutation from the language
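One common way to remove `set!` is to box every mutated variable. The sketch below does this under several assumptions. It uses the hypothetical primitives `box`, `unbox`, and `set-box!`, and SinScheme's runtime may use different ones. It handles only quote, variables, `(lambda (x ...) e)`, `let`, `set!`, and other list forms. Boxing is decided by name alone. That is conservative but consistent, because every binder and every reference of a mutated name is converted.

```racket
#lang racket
;; Hypothetical sketch: assignment conversion by boxing mutated variables.
;; The primitives box, unbox, and set-box! are assumed names.

;; Every variable that is the target of some set!.
(define (mutated-vars e)
  (match e
    [`(quote ,_) (set)]
    [`(set! ,x ,rhs) (set-add (mutated-vars rhs) x)]
    [(? list?) (apply set-union (set) (map mutated-vars e))]
    [_ (set)]))

(define (boxify e mv)
  (define (conv s) (boxify s mv))
  (define (mutated? x) (set-member? mv x))
  (match e
    [`(quote ,_) e]
    [(? symbol? x) (if (mutated? x) `(prim unbox ,x) x)]   ; reads unbox
    [`(set! ,x ,rhs) `(prim set-box! ,x ,(conv rhs))]      ; writes update the box
    [`(lambda (,xs ...) ,body)
     ;; Mutated parameters are re-bound to fresh boxes on entry.
     `(lambda ,xs
        (let ,(for/list ([x (in-list xs)] #:when (mutated? x))
                `[,x (prim box ,x)])
          ,(conv body)))]
    [`(let ([,xs ,rhss] ...) ,body)
     ;; Mutated let-bound variables are boxed at their binding site.
     `(let ,(for/list ([x (in-list xs)] [rhs (in-list rhss)])
              `[,x ,(if (mutated? x) `(prim box ,(conv rhs)) (conv rhs))])
        ,(conv body))]
    [(? list?) (map conv e)]
    [_ e]))

(define (assignment-convert e)
  (boxify e (mutated-vars e)))

;; (assignment-convert '(lambda (x y) (if y (set! x '1) x)))
;; => '(lambda (x y) (let ([x (prim box x)]) (if y (prim set-box! x '1) (prim unbox x))))
```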
Alphatization
- Renames every bound variable to a unique name, so no variable shadows another
- This allows us to de-nest all `let` forms, as there will be no ambiguity
e ::= (let ([x e] ...) e)
| (lambda (x ...) e)
| (lambda x e)
| (apply e e)
| (e e ...)
| (prim op e ...)
| (apply-prim op e)
| (if e e e)
| (call/cc e)
| x
| (quote dat)
dat is a datum satisfying datum? from utils.rkt
x is a variable (satisfies symbol?)
op is a symbol satisfying prim? from utils.rkt (if not otherwise in scope)
op ::= promise? | null? | cons | car | + | ... (see utils.rkt)
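Here is a minimal sketch of alphatization over a subset of the grammar above. Every binder gets a fresh name from a hypothetical `fresh` helper, and the `x_1`, `x_2` naming scheme is arbitrary. An immutable hash maps source names to their new names, and free variables keep their original names.

```racket
#lang racket
;; Hypothetical sketch: give every binder a fresh, unique name.
(define counter 0)
(define (fresh x)                       ; x -> x_1, x_2, ...
  (set! counter (add1 counter))
  (string->symbol (format "~a_~a" x counter)))

;; Extend env (an immutable hash from old names to new names).
(define (extend env xs xs*)
  (foldl (lambda (x x* h) (hash-set h x x*)) env xs xs*))

(define (alphatize e env)
  (define (rec s) (alphatize s env))
  (match e
    [`(quote ,_) e]
    [(? symbol? x) (hash-ref env x x)]  ; free variables keep their names
    [`(prim ,op ,args ...) `(prim ,op ,@(map rec args))]
    [`(lambda (,xs ...) ,body)
     (let ([xs* (map fresh xs)])
       `(lambda ,xs* ,(alphatize body (extend env xs xs*))))]
    [`(let ([,xs ,rhss] ...) ,body)
     (let* ([rhss* (map rec rhss)]      ; right-hand sides see the outer scope
            [xs* (map fresh xs)])
       `(let ,(map list xs* rhss*) ,(alphatize body (extend env xs xs*))))]
    [(? list?) (map rec e)]
    [_ e]))

;; (alphatize '(lambda (x) (let ([x (f x)]) x)) (hash))
;; => '(lambda (x_1) (let ([x_2 (f x_1)]) x_2))
```

Because every name is now unique, a nested `let` can later be flattened without one binding capturing another.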
Administrative Normal Form
- Lifts all 'complex' (non-atomic) expressions into let bindings, so every operand is atomic (`ae`)
- This fixes an explicit evaluation order for all expressions
e ::= (let ([x e]) e)
| (apply ae ae)
| (ae ae ...)
| (prim op ae ...)
| (apply-prim op ae)
| (if ae e e)
| (call/cc ae)
| ae
ae ::= (lambda (x ...) e)
| (lambda x e)
| x
| (quote dat)
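Here is a minimal sketch of ANF conversion using the common continuation-based `normalize` approach. The helper `normalize-name` binds any non-atomic result to a fresh `gensym` temporary. The sketch covers only quote, variables, lambda, single-binding `let`, `if`, `prim`, and application. The names are illustrative, not SinScheme's.

```racket
#lang racket
;; Hypothetical sketch of ANF conversion over a small subset.
(define (atomic? e)
  (match e
    [(? symbol?) #t]
    [`(quote ,_) #t]
    [`(lambda ,_ ,_) #t]
    [_ #f]))

(define (anf e) (normalize e (lambda (v) v)))

;; k receives the expression producing e's value and builds the context.
(define (normalize e k)
  (match e
    [`(lambda ,xs ,body) (k `(lambda ,xs ,(anf body)))]
    [(? atomic?) (k e)]
    [`(let ([,x ,rhs]) ,body)
     (normalize rhs (lambda (v) `(let ([,x ,v]) ,(normalize body k))))]
    [`(if ,c ,t ,f)
     (normalize-name c (lambda (a) (k `(if ,a ,(anf t) ,(anf f)))))]
    [`(prim ,op ,args ...)
     (normalize-names args (lambda (as) (k `(prim ,op ,@as))))]
    [`(,f ,args ...)
     (normalize-names (cons f args) k)]))

;; Like normalize, but k always receives an atomic expression.
(define (normalize-name e k)
  (normalize e (lambda (v)
                 (if (atomic? v)
                     (k v)
                     (let ([t (gensym 't)])
                       `(let ([,t ,v]) ,(k t)))))))

;; Normalize a list of operands left to right, naming each one.
(define (normalize-names es k)
  (if (null? es)
      (k '())
      (normalize-name (car es)
                      (lambda (a)
                        (normalize-names (cdr es)
                                         (lambda (as) (k (cons a as))))))))

;; (anf '(f (g x) y)) => '(let ([t1 (g x)]) (f t1 y)), with t1 a gensym
```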
Continuation Passing Style
- Every call becomes a tail call
- Functions never return; each passes its result to an explicit continuation
- At the end of the chain, the final continuation halts the program
- This lets us apply tail-call elimination (TCE) to every call
e ::= (let ([x (apply-prim op ae)]) e)
| (let ([x (prim op ae ...)]) e)
| (let ([x (lambda (x ...) e)]) e)
| (let ([x (lambda x e)]) e)
| (let ([x (quote dat)]) e)
| (apply ae ae)
| (ae ae ...)
| (if ae e e)
ae ::= (lambda (x ...) e)
| (lambda x e)
| x
| (quote dat)
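Here is a minimal sketch of CPS conversion over a small ANF subset. Each lambda gains a continuation parameter, placed first here as an arbitrary choice. Calls pass their continuation explicitly. A `let` whose right-hand side is a call gets an explicit continuation lambda. The final continuation name `halt` is hypothetical and not necessarily what SinScheme uses.

```racket
#lang racket
;; Hypothetical sketch of CPS conversion over a small ANF subset.
(define (atomic? e)
  (match e
    [(? symbol?) #t]
    [`(quote ,_) #t]
    [`(lambda ,_ ,_) #t]
    [_ #f]))

;; Convert an atomic expression; only lambdas change (they gain a k).
(define (cps-ae ae)
  (match ae
    [`(lambda (,xs ...) ,body)
     (let ([k (gensym 'k)])
       `(lambda (,k ,@xs) ,(cps body k)))]
    [_ ae]))

;; Convert e so that its value is passed to the continuation variable k.
(define (cps e k)
  (match e
    [(? atomic?) `(,k ,(cps-ae e))]
    [`(prim ,op ,aes ...)
     (let ([t (gensym 't)])
       `(let ([,t (prim ,op ,@(map cps-ae aes))]) (,k ,t)))]
    [`(let ([,x ,rhs]) ,body)
     (match rhs
       [(? atomic?) `(let ([,x ,(cps-ae rhs)]) ,(cps body k))]
       [`(prim ,op ,aes ...)
        `(let ([,x (prim ,op ,@(map cps-ae aes))]) ,(cps body k))]
       ;; A non-trivial right-hand side gets an explicit continuation lambda.
       [_ (let ([k2 (gensym 'k)])
            `(let ([,k2 (lambda (,x) ,(cps body k))]) ,(cps rhs k2)))])]
    [`(if ,c ,t ,f) `(if ,(cps-ae c) ,(cps t k) ,(cps f k))]
    [`(,f ,aes ...) `(,(cps-ae f) ,k ,@(map cps-ae aes))]))

;; `halt` is a hypothetical name for the final continuation.
(define (cps-program e) (cps e 'halt))

;; (cps-program '(f '1)) => '(f halt '1)
```

Both branches of an `if` reuse the same continuation variable `k`, so no code is duplicated.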
Closure Conversion
- Turns Lisp into a more imperative, procedure-based language
- Lifts all lambdas to the top level again as `proc`s, with explicit environments
- The main program builds closures (`make-closure`) and calls them (`clo-app`)
p ::= ((proc (x x ...) e) ...)
e ::= (let ([x (apply-prim op x)]) e)
| (let ([x (prim op x ...)]) e)
| (let ([x (make-closure x x ...)]) e)
| (let ([x (env-ref x nat)]) e)
| (let ([x (quote dat)]) e)
| (clo-app x x ...)
| (if x e e)
dat is a datum satisfying datum? from utils.rkt
x is a variable (satisfies symbol?)
op is a symbol satisfying prim? from utils.rkt (if not already removed)
nat is a natural number satisfying natural? or integer?
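Here is a minimal sketch of closure conversion. It assumes the input is in the CPS grammar above, with every lambda bound by a `let`. Each lambda becomes a top-level `proc` that loads its free variables with `env-ref`. Environment slots are numbered from 1, on the assumption that slot 0 holds the code pointer. That numbering may not match SinScheme's layout. The helper names are hypothetical.

```racket
#lang racket
;; Hypothetical sketch of closure conversion over a small CPS subset.

;; Free variables of an expression.
(define (free-vars e)
  (match e
    [(? symbol? x) (set x)]
    [`(quote ,_) (set)]
    [`(lambda (,xs ...) ,body) (set-subtract (free-vars body) (list->set xs))]
    [`(let ([,x ,rhs]) ,body)
     (set-union (free-vars rhs) (set-remove (free-vars body) x))]
    [`(prim ,_ ,args ...) (apply set-union (set) (map free-vars args))]
    [`(if ,c ,t ,f) (set-union (free-vars c) (free-vars t) (free-vars f))]
    [`(,f ,args ...) (apply set-union (set) (map free-vars (cons f args)))]))

(define procs '()) ; lifted (proc ...) forms, collected by side effect

(define (lambda-form? e) (and (pair? e) (eq? (car e) 'lambda)))

;; Lift one lambda to a top-level proc; return the make-closure replacing it.
(define (lift-lambda! lam)
  (match-define `(lambda (,xs ...) ,body) lam)
  (define fvs (sort (set->list (free-vars lam)) symbol<?))
  (define name (gensym 'proc))
  (define env (gensym 'env))
  ;; Inside the proc, bind each free variable from its environment slot.
  (define body*
    (foldr (lambda (fv i acc) `(let ([,fv (env-ref ,env ,i)]) ,acc))
           (closure-convert body)
           fvs
           (range 1 (add1 (length fvs)))))
  (set! procs (cons `(proc (,name ,env ,@xs) ,body*) procs))
  `(make-closure ,name ,@fvs))

(define (closure-convert e)
  (match e
    [`(let ([,x ,(? lambda-form? lam)]) ,body)
     `(let ([,x ,(lift-lambda! lam)]) ,(closure-convert body))]
    [`(let ([,x ,rhs]) ,body) `(let ([,x ,rhs]) ,(closure-convert body))]
    [`(if ,c ,t ,f) `(if ,c ,(closure-convert t) ,(closure-convert f))]
    [`(,f ,args ...) `(clo-app ,f ,@args)]))

;; Returns two values: the lifted procs and the converted main expression.
(define (closure-convert-program e)
  (set! procs '())
  (let ([main (closure-convert e)])
    (values (reverse procs) main)))
```

For `(let ([g (lambda (y) (f y z))]) (g a))`, this yields one proc that loads `f` and `z` from its environment, and a main expression `(let ([g (make-closure <proc> f z)]) (clo-app g a))`.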
LLVM Code Emission
- Proc language -> LLVM IR is easy
- Each form translates via a simple, fixed LLVM IR template
- Also ensures all variables are stack allocated for garbage collection***
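The template-driven emission can be sketched on a much smaller scale. This hypothetical emitter turns integer arithmetic such as `(* 2 (+ 3 4))` into textual LLVM IR, with one fixed instruction per operator. SinScheme's real emitter works on the proc language and runtime values, so this only illustrates the per-form translation idea. The function name `@compute` is an arbitrary choice.

```racket
#lang racket
;; Hypothetical sketch: emit textual LLVM IR for integer arithmetic.
;; Each operator maps to one fixed IR instruction template.
(define counter 0)
(define (fresh-reg)                    ; returns "%r1", "%r2", ...
  (set! counter (add1 counter))
  (format "%r~a" counter))

(define ops (hash '+ "add" '- "sub" '* "mul"))

;; Writes instructions for e to the port out; returns the operand
;; (an integer constant or a register name) holding e's value.
(define (emit e out)
  (match e
    [(? exact-integer? n) (number->string n)]
    [`(,op ,a ,b)
     (let* ([x (emit a out)]
            [y (emit b out)]
            [r (fresh-reg)])
       (fprintf out "  ~a = ~a i64 ~a, ~a\n" r (hash-ref ops op) x y)
       r)]))

(define (compile-arith e)
  (set! counter 0)
  (let* ([out (open-output-string)]
         [result (emit e out)])
    (string-append "define i64 @compute() {\nentry:\n"
                   (get-output-string out)
                   (format "  ret i64 ~a\n}\n" result))))

(display (compile-arith '(* 2 (+ 3 4))))
;; Prints:
;; define i64 @compute() {
;; entry:
;;   %r1 = add i64 3, 4
;;   %r2 = mul i64 2, %r1
;;   ret i64 %r2
;; }
```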
SinScheme Runtime
- C++ code
- Provides code that compiled programs call into
- Primitives written here
- Garbage collection + Object layout
Code Review?!?!?
- Is anyone interested in any part in particular?
- Runtime is super fun!
- Would love to talk about garbage collection if anyone is interested
Thanks!
Final Questions?
Find me @Sinistersnare
CompilersPresentation
By Davis Silverman