Intelligent Compilers

Alexander Tsepkov

11/22/2016

Intelligent Compilers

  • How do compilers work?
  • What's with the static typing trend?
  • What else can compilers do today?
  • What else are we likely to see soon?

What makes the current generation of compilers intelligent is the fact that they can infer information from incomplete data.

There are Many Different Compilers

  • Human-readable code to machine code (traditional)
    • C/C++, Java, Go
  • One human-readable language to another human-readable language (transcompiler)
    • RapydScript, CoffeeScript, Haxe, TypeScript
  • Same-language compiler
    • Babel, UglifyJS, 2to3

What's in a Compiler?

[Diagram: Input → compiler (Lexer, Parser, AST, Output) → Output]

There can be more stuff, but it's not necessary

[Diagram: Input → compiler (Lexer, Parser, AST, Output, plus optional Transformer, Linter, Optimizer, SourceMaps, REPL) → Output]

What's in a Compiler?

  • Lexer: splits text into tokens (words)
  • Parser: combines tokens into nodes (sentences)
  • AST: Abstract Syntax Tree (essay)
  • Output: prints AST back in new format (translator)

Some purists will separate the transformer from the output; I did not.

Sometimes you'll see "tokenizer" instead of "lexer"; they're similar.
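
To make the four stages concrete, here is a minimal sketch of a toy compiler that rewrites infix arithmetic as prefix notation. The grammar, the `Token` and `AstNode` types, and every function name are assumptions made for illustration; real compilers such as Babel or RapydScript are far more elaborate.

```typescript
// Lexer: splits text into tokens (words).
type Token = { type: "number" | "operator"; text: string };

function lex(source: string): Token[] {
    const tokens: Token[] = [];
    // Match runs of digits, or any other single non-space character.
    for (const text of source.match(/\d+|\S/g) ?? []) {
        if (/^\d+$/.test(text)) tokens.push({ type: "number", text });
        else if ("+-*/".includes(text)) tokens.push({ type: "operator", text });
        else throw new Error(`Unexpected character: ${text}`);
    }
    return tokens;
}

// AST: the tree the parser builds (essay).
type AstNode =
    | { kind: "num"; value: number }
    | { kind: "binary"; operator: string; left: AstNode; right: AstNode };

// Parser: combines tokens into nodes (sentences).
function parse(tokens: Token[]): AstNode {
    let position = 0;

    function parseNumber(): AstNode {
        const token = tokens[position++];
        if (!token || token.type !== "number") throw new Error("Expected a number");
        return { kind: "num", value: Number(token.text) };
    }

    // Parses a left-associative chain of the given operators; operands
    // come from the next-higher precedence level.
    function parseBinary(operators: string, parseOperand: () => AstNode): AstNode {
        let left = parseOperand();
        while (position < tokens.length && operators.includes(tokens[position].text)) {
            const operator = tokens[position++].text;
            left = { kind: "binary", operator, left, right: parseOperand() };
        }
        return left;
    }

    const parseTerm = () => parseBinary("*/", parseNumber);
    const tree = parseBinary("+-", parseTerm);
    if (position < tokens.length) throw new Error("Unexpected trailing token");
    return tree;
}

// Output: prints the AST back in a new format (translator).
function emit(node: AstNode): string {
    if (node.kind === "num") return String(node.value);
    return `(${node.operator} ${emit(node.left)} ${emit(node.right)})`;
}

function compile(source: string): string {
    return emit(parse(lex(source)));
}

console.log(compile("1 + 2 * 3")); // (+ 1 (* 2 3))
```

The optional stages (transformer, linter, optimizer) would all sit between `parse` and `emit` and operate on the tree.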

Why use Babel, TypeScript, Flow, RapydScript?

  • ES6 Features Today
  • Static Typing
  • Potential for Additional Optimizations

Why Are We Going Back to Static Typing?

Turns out more freedom is not always good

Humans and computers excel at different things.

Computers are good at ...

Repeating actions

Doing exactly what you ask them to do

Remembering things

Computer Specialties Make Them Good At:

  • Pointing out when we break our promises
  • Being more stubborn than us
  • Not glossing over the details

These are great for QA

Humans are good at ...

Seeing patterns

Finding shortcuts

Being creative

Human Specialties Help Us To ...

  • Meet deadlines
  • Find optimizations
  • Avoid reinventing the wheel
  • Avoid building things from scratch

But They Also Lead to Fragile Solutions

In Software We Have a Number of Tools to Mitigate That

  • Linters
  • Unit Tests
  • Integration Tests
  • Code Reviews
  • Bug Reports
  • Logs
  • Debuggers

But they all require setup, and in our tendency to find shortcuts we sometimes avoid them, especially with smaller projects.

But what happens if the project grows beyond its original scope and you dread refactoring it?

Enter Static Typing

Static typing occupies a niche somewhere between linting and unit tests. It will tell you if you're getting your apples and oranges mixed up, but it won't prevent you from shooting yourself in the foot.

Why Is Static Typing Good?

// Flow example

function foo(num: number) {
    if (num > 10) {
        return 'cool';
    }
}

console.log(foo(100).toString());
// error: `toString` cannot be called on possibly undefined value

It's like a free unit test!
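
TypeScript performs a comparable check when `strictNullChecks` is enabled. The sketch below shows one way the flagged code might be repaired: declare the missing case in the return type and rule it out before calling a string method. The `describeNumber` helper and the fallback string are illustrative assumptions, not part of the original example.

```typescript
// Returns 'cool' for numbers above 10 and nothing otherwise;
// the return type now says so explicitly.
function foo(num: number): string | undefined {
    if (num > 10) {
        return 'cool';
    }
    return undefined;
}

// The checker only allows string methods on `result` after the
// `undefined` case has been handled.
function describeNumber(num: number): string {
    const result = foo(num);
    if (result === undefined) {
        return 'not cool';
    }
    return result.toUpperCase();
}

console.log(describeNumber(100)); // COOL
console.log(describeNumber(5));   // not cool
```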

If It's So Good, Why Aren't We All Using Java?

  • It impedes rapid prototyping by forcing us to think about details that may not yet be relevant (float vs double).
  • It complains about all code, even if it's not yet being used, distracting us from the code of interest.
  • In early stages of implementation, our types may not yet be accurate since we haven't yet considered all use cases and edge cases.

Enter Hybrid-Typing

It lets us start off the project with loose checks and tighten them when we're ready, without the extra setup of other testing solutions.

That doesn't mean we shouldn't be relying on other mechanisms, just that it will be less dreadful to come back to our code when we can ensure at least some safety.
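
As a rough illustration of "loose first, tighten later", the sketch below shows the same function at two stages of a project. The `LineItem` shape and both function names are invented for this example.

```typescript
// Stage 1: prototype. `any` opts out of checking, so nothing here is verified.
function totalLoose(items: any): any {
    return items.reduce((sum: any, item: any) => sum + item.price, 0);
}

// Stage 2: the data shape has settled, so we describe it and let the
// compiler hold us to it from now on.
interface LineItem {
    label: string;
    price: number;
}

function totalStrict(items: LineItem[]): number {
    return items.reduce((sum, item) => sum + item.price, 0);
}

const cart: LineItem[] = [
    { label: "apple", price: 2 },
    { label: "orange", price: 3 },
];

console.log(totalLoose(cart));  // 5
console.log(totalStrict(cart)); // 5
```

Both versions behave the same at run time; only the second lets the compiler reject a call such as `totalStrict("oops")` before the code runs.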

A Case For Hybrid-Typing

Recent Interview:

  • Prefix calculator: + * - 4 2 5 3
  • Stack: operator => push, 2 operands => pop 2
  • In JavaScript, the implementation is easy
  • In Java, we need to define the stack type
    • Stack<Integer> => operators aren't supported
    • Stack<Character> => operands larger than 9 aren't supported
  • Interviewee ended up using 2 stacks
  • Better approach: wrap the token in an object

Hybrid typing drives us to better design
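
The "wrap the token in an object" approach can be sketched as follows. This is one possible implementation under stated assumptions: the `Token` union, the four supported operators, and all function names are illustrative, and the reduction rule is a reading of the "operator => push, 2 operands => pop 2" bullet.

```typescript
type Operator = "+" | "-" | "*" | "/";

// Wrapping each token in an object lets one stack hold both
// operators and multi-digit operands.
type Token =
    | { kind: "operator"; symbol: Operator }
    | { kind: "operand"; value: number };

function tokenize(input: string): Token[] {
    return input.trim().split(/\s+/).map((text): Token => {
        if (text === "+" || text === "-" || text === "*" || text === "/") {
            return { kind: "operator", symbol: text };
        }
        const value = Number(text);
        if (Number.isNaN(value)) throw new Error(`Unexpected token: ${text}`);
        return { kind: "operand", value };
    });
}

function applyOperator(symbol: Operator, left: number, right: number): number {
    switch (symbol) {
        case "+": return left + right;
        case "-": return left - right;
        case "*": return left * right;
        case "/": return left / right;
    }
}

function evaluatePrefix(input: string): number {
    const stack: Token[] = [];
    for (const token of tokenize(input)) {
        stack.push(token);
        // Whenever the stack ends with "operator, operand, operand",
        // pop all three and push the result back as an operand.
        while (stack.length >= 3) {
            const op = stack[stack.length - 3];
            const left = stack[stack.length - 2];
            const right = stack[stack.length - 1];
            if (op.kind !== "operator" || left.kind !== "operand" || right.kind !== "operand") {
                break;
            }
            stack.splice(-3);
            stack.push({ kind: "operand", value: applyOperator(op.symbol, left.value, right.value) });
        }
    }
    const result = stack[0];
    if (stack.length !== 1 || result.kind !== "operand") {
        throw new Error("Malformed expression");
    }
    return result.value;
}

console.log(evaluatePrefix("+ * - 4 2 5 3")); // 13
```

The expression from the slide reads as ((4 - 2) * 5) + 3, and operands above 9 work because each token carries its own kind.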

What else can a compiler do for me?

Interfaces (TypeScript)

interface Vehicle {
    wheels: number;
    color: string;
    passengers?: number;
    move(x: number, y: number): void;
}

class Car implements Vehicle {
    ...
}

function isSafeToDrive(vehicle: Vehicle): boolean {
    return vehicle.wheels > 1;
}

Generics (TypeScript)

function copyFields<T extends U, U>(target: T, source: U): T {
    for (let id in source) {
        target[id] = source[id];
    }
    return target;
}

let x = { a: 1, b: 2, c: 3, d: 4 };

copyFields(x, { b: 10, d: 20 }); // okay
copyFields(x, { Q: 90 });  // error: property 'Q' isn't declared in 'x'.

Type Inference

TypeScript:

window.onmousedown = function(mouseEvent) {
    console.log(mouseEvent.buton); // Error: Window.onmousedown
};                                 // argument (MouseEvent) has
                                   // no .buton

RapydScript:

a = {}
b = "test"

...

b()       # Error: 'b' is not callable
c = a + b # Error: can't concatenate String and Object

Type Inference (RapydScript)

a = { foo: 1, bar: { baz: "qux" } }
b = { "foo": 1, bar: { baz: "qux" } }
c = 'foo'
d = 5

a == b # true, performs deep equality
a == c # false, optimized to ===
a == d # false, optimized to ===

RapydScript uses a number of tricks to achieve proper deep equality, which JavaScript lacks, without the overhead that a deep-equality function would introduce on every comparison.
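
One such trick might look like the sketch below: when the inferred type of either operand is a primitive, deep equality and `===` agree, so the compiler can emit the cheap operator. This is a guess at the general idea, not RapydScript's actual implementation; `StaticType`, `emitEquality`, and the runtime helper name `deepEqual` are all hypothetical.

```typescript
// Types a compiler might have inferred for an expression (hypothetical).
type StaticType = "string" | "number" | "boolean" | "object" | "unknown";

function isPrimitive(type: StaticType): boolean {
    return type === "string" || type === "number" || type === "boolean";
}

// Chooses the JavaScript to emit for `left == right` in the source language.
function emitEquality(
    left: string, leftType: StaticType,
    right: string, rightType: StaticType
): string {
    if (isPrimitive(leftType) || isPrimitive(rightType)) {
        // A primitive can only equal the same primitive, so === is enough.
        return `${left} === ${right}`;
    }
    // Both sides may be containers: fall back to a (hypothetical) runtime helper.
    return `deepEqual(${left}, ${right})`;
}

console.log(emitEquality("a", "object", "b", "object")); // deepEqual(a, b)
console.log(emitEquality("a", "object", "c", "string")); // a === c
console.log(emitEquality("a", "object", "d", "number")); // a === d
```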

What's next?

TypeScript Isn't Perfect

function bar(c: string): string {
    return c;
}

function foo(a) {
    a = bar(a);
    return a + "test";
}

foo([]);

// no error: `a` is implicitly `any`, so the call is not checked

Enter Interstate

def bar(c: String) -> String:
    return c

def foo(a):
    a = bar(a)
    return a + "test"

foo([])

# Error: foo call triggers a callback to foo,
# which in turn triggers a callback to bar and 
# resolves type at the time of the call itself 
# rather than just at the time of declaration

(Work-in-progress state manager for RapydScript)

Enter Interstate

def baz(a, b, c):
    return a + b + c

baz(1, 2, 3)                       # 6
baz(1, 2, 3, 4)                    # error, signature mismatch
baz([], 2, 3)                      # error, array concatenation
baz('quick', c=' fox', b=' brown') # 'quick brown fox' (arg swap)

(Work-in-progress state manager for RapydScript)
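
The signature checks in the `baz` example can be approximated with the sketch below, which matches a call's positional and keyword arguments against a parameter list. It is an illustration only and makes no claim about how Interstate works internally; `resolveArguments` and its parameters are hypothetical names.

```typescript
// Resolves a call's arguments into declaration order, or throws if the
// call does not match the signature.
function resolveArguments(
    parameters: string[],
    positional: unknown[],
    keywords: Record<string, unknown> = {}
): unknown[] {
    if (positional.length > parameters.length) {
        throw new Error(
            `Signature mismatch: expected at most ${parameters.length} arguments, got ${positional.length}`
        );
    }
    for (const key of Object.keys(keywords)) {
        const index = parameters.indexOf(key);
        if (index === -1) throw new Error(`Unknown keyword argument: ${key}`);
        if (index < positional.length) throw new Error(`Argument given twice: ${key}`);
    }
    // Positional arguments fill the first slots; the rest come from keywords.
    const resolved = [...positional];
    for (const parameter of parameters.slice(positional.length)) {
        if (!(parameter in keywords)) throw new Error(`Missing argument: ${parameter}`);
        resolved.push(keywords[parameter]);
    }
    return resolved;
}

const signature = ["a", "b", "c"];

// baz('quick', c=' fox', b=' brown') resolves to declaration order.
console.log(resolveArguments(signature, ["quick"], { c: " fox", b: " brown" }).join(""));
// quick brown fox
```

A call such as `resolveArguments(signature, [1, 2, 3, 4])` throws, mirroring the signature-mismatch error above.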

Interstate

  • Evaluates types at the time of the call rather than declaration
  • Detects dead code
  • Enforces function signature, number of arguments, and resolves argument order
  • Can recommend type based on observed use
  • Thinks in probabilities rather than types (allows conditional types and smarter type inference)

def foo(a: String or Num) -> String:
    return "You passed " + a.toString()

Interstate Is a New Type of Tool

A conventional compiler pipeline (Lexer → Parser → AST → Output) works at two levels:

  • Lexer: finds tokens
  • Parser: finds relationships between tokens (context)

Interstate replaces the later stages with a third level:

  • Interstate: finds relationships between contexts

[Diagram: Lexer → Parser → Interstate, with the three levels labeled y, y', y'']

What Else May Be Possible?

  • Partial compile-time evaluation
  • More powerful linting
  • More powerful compile-time optimizations
  • Logic inlining/unrolling
  • Operator overloading
  • Automatic conversion to WebAssembly
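
Partial compile-time evaluation, the first item above, can be illustrated with constant folding: any subtree whose operands are all known at compile time is replaced by its value. The `Expr` type and `fold` function below are a minimal sketch invented for this example, not part of any existing tool.

```typescript
// A tiny expression AST (illustrative).
type Expr =
    | { kind: "num"; value: number }
    | { kind: "var"; id: string }
    | { kind: "add" | "mul"; left: Expr; right: Expr };

// Evaluates what can be evaluated at compile time, leaves the rest alone.
function fold(expr: Expr): Expr {
    if (expr.kind === "num" || expr.kind === "var") return expr;
    const left = fold(expr.left);
    const right = fold(expr.right);
    if (left.kind === "num" && right.kind === "num") {
        const value = expr.kind === "add"
            ? left.value + right.value
            : left.value * right.value;
        return { kind: "num", value };
    }
    return { kind: expr.kind, left, right };
}

// x * (2 + 3) becomes x * 5; the variable stays symbolic.
const folded = fold({
    kind: "mul",
    left: { kind: "var", id: "x" },
    right: { kind: "add", left: { kind: "num", value: 2 }, right: { kind: "num", value: 3 } },
});

console.log(JSON.stringify(folded));
```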

If you want to help me out with this, reach out:

atsepkov@gmail.com

@atsepkov

github.com/atsepkov
