Alexander Tsepkov
Software developer, entrepreneur, and creator of RapydScript language.
10/20/2016
Beyond TypeScript, Babel, and RapydScript
A compiler takes code written in one language and converts it into code in another language, optionally optimizing it.
Compiler pipeline: Input → [Lexer → Parser → AST → Output] → Output
Additional components: Transformer, Linter, Optimizer, SourceMaps, REPL
Lexer: splits text into tokens (words)
Parser: combines tokens into nodes (sentences)
AST: Abstract Syntax Tree (essay)
Output: prints the AST back in a new format (translator)
Some purists separate the transformer from the output; I did not.
Sometimes you'll see "tokenizer" instead of "lexer"; the two are similar.
Mary had a little lamb
Tokenizer (think regex): token, token, token, token
Lexer (think syntax highlighter): proper noun, verb, adjective, noun
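The difference between the two can be sketched in a few lines. This is only an illustration: the names `tokenizeWords`, `lex`, and `PARTS_OF_SPEECH` are made up for this example, and the "article" category for "a" is an assumption, since the slide lists only four categories.

```javascript
// Tokenizer: a regex is enough to split text into tokens.
function tokenizeWords(text) {
  return text.match(/\S+/g) || [];
}

// Lexer: additionally tags each token with a category, like a syntax highlighter.
// The lookup table is a stand-in for real classification rules (assumption).
const PARTS_OF_SPEECH = new Map([
  ['mary', 'proper noun'],
  ['had', 'verb'],
  ['a', 'article'],
  ['little', 'adjective'],
  ['lamb', 'noun'],
]);

function lex(text) {
  return tokenizeWords(text).map((value) => ({
    value,
    type: PARTS_OF_SPEECH.get(value.toLowerCase()) || 'unknown',
  }));
}

const sentence = 'Mary had a little lamb';
console.log(tokenizeWords(sentence)); // plain strings, no categories
console.log(lex(sentence));           // { value, type } pairs
```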
function Person() {
  this.name = 'Jane';
}
{
  type: 'keyword',
  value: 'function',
  start: [0,0],
  end: [0,7]
}
{
  type: 'string',
  value: 'Jane',
  start: [1,14],
  end: [1,19]
}
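A minimal lexer that produces tokens of this shape might look like the sketch below. The token fields (type, value, start, end) follow the slide; the rule table, the `tokenize` name, and the choice to treat `end` as an inclusive column are assumptions made so the positions match the slide.

```javascript
// Hypothetical rule table: the first rule whose pattern matches wins.
const RULES = [
  ['whitespace', /^[ \t]+/],
  ['keyword', /^(?:function|return|var|this)\b/],
  ['identifier', /^[A-Za-z_$][\w$]*/],
  ['string', /^'[^']*'/],
  ['punctuation', /^[(){};.=]/],
];

function tokenize(source) {
  const tokens = [];
  source.split('\n').forEach((line, row) => {
    let col = 0;
    while (col < line.length) {
      const rest = line.slice(col);
      const rule = RULES.find(([, pattern]) => pattern.test(rest));
      if (!rule) throw new SyntaxError(`Unexpected character at ${row}:${col}`);
      const [type, pattern] = rule;
      const text = rest.match(pattern)[0];
      if (type !== 'whitespace') {
        tokens.push({
          type,
          // Strings drop their quotes in `value`, as on the slide.
          value: type === 'string' ? text.slice(1, -1) : text,
          start: [row, col],
          end: [row, col + text.length - 1], // inclusive end column (assumption)
        });
      }
      col += text.length;
    }
  });
  return tokens;
}

const tokens = tokenize("function Person() {\n  this.name = 'Jane';\n}");
console.log(tokens[0]);
console.log(tokens.find((t) => t.type === 'string'));
```

With the two-space indent shown above, the first token and the string token come out with the same positions as the two objects on the slide.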
function Person() {
  this.name = 'Jane';
}
Node {
  type: 'function',
  name: 'Person',
  arguments: [],
  block: [
    Node {
      type: 'assign',
      left: ...,
      right: ...
    }
  ]
}
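As a rough sketch of how tokens become nodes like the one above, here is a tiny recursive-descent parser that handles only this one shape of function. The node fields (type, name, arguments, block, left, right) follow the slide; the `dot`, `symbol`, and `string` node shapes and every helper name are assumptions for illustration.

```javascript
function parse(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => {
    if (pos >= tokens.length) throw new SyntaxError('Unexpected end of input');
    return tokens[pos++];
  };
  const expect = (value) => {
    const token = next();
    if (token.value !== value) {
      throw new SyntaxError(`Expected '${value}' but got '${token.value}'`);
    }
    return token;
  };

  function parseFunction() {
    expect('function');
    const name = next().value;
    expect('(');
    expect(')'); // this sketch only handles empty argument lists
    expect('{');
    const block = [];
    while (peek() && peek().value !== '}') block.push(parseAssign());
    expect('}');
    return { type: 'function', name, arguments: [], block };
  }

  function parseAssign() {
    const left = parseOperand();
    expect('=');
    const right = parseOperand();
    expect(';');
    return { type: 'assign', left, right };
  }

  function parseOperand() {
    const token = next();
    if (token.type === 'string') return { type: 'string', value: token.value };
    let node = { type: 'symbol', name: token.value };
    // Fold `a.b.c` into nested dot nodes, left to right.
    while (peek() && peek().value === '.') {
      next();
      node = { type: 'dot', object: node, property: next().value };
    }
    return node;
  }

  return parseFunction();
}

// Hand-written tokens for the Person example (positions omitted for brevity).
const t = (type, value) => ({ type, value });
const personTokens = [
  t('keyword', 'function'), t('identifier', 'Person'),
  t('punctuation', '('), t('punctuation', ')'), t('punctuation', '{'),
  t('keyword', 'this'), t('punctuation', '.'), t('identifier', 'name'),
  t('punctuation', '='), t('string', 'Jane'), t('punctuation', ';'),
  t('punctuation', '}'),
];
console.log(JSON.stringify(parse(personTokens), null, 2));
```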
function Person() {
  this.name = 'Jane';
}
AST as a tree:
body
  function
    arguments
    body
      assign
        left
          dot
            object
            property
        right
          string
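Tools such as linters and transformers work by walking a tree like this one. The sketch below is a generic depth-first walk over a hand-built version of the Person tree; the node shapes and the `walk` helper are assumptions that mirror the labels in the diagram, not the structure of any particular compiler.

```javascript
// Hand-built tree for the Person example (shapes are assumptions).
const ast = {
  type: 'function',
  name: 'Person',
  arguments: [],
  body: [
    {
      type: 'assign',
      left: { type: 'dot', object: { type: 'this' }, property: 'name' },
      right: { type: 'string', value: 'Jane' },
    },
  ],
};

// Visit a node, then every child that looks like a node (has a string `type`).
function walk(node, visit, depth = 0) {
  visit(node, depth);
  for (const child of Object.values(node)) {
    const candidates = Array.isArray(child) ? child : [child];
    for (const c of candidates) {
      if (c && typeof c === 'object' && typeof c.type === 'string') {
        walk(c, visit, depth + 1);
      }
    }
  }
}

const lines = [];
walk(ast, (node, depth) => lines.push('  '.repeat(depth) + node.type));
console.log(lines.join('\n'));
// function
//   assign
//     dot
//       this
//     string
```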
function Person() {
  this.name = 'Jane';
}
class Person {
  constructor() {
    this.name = 'Jane';
  }
}
This sounds scary until you realize that, because of how intertwined the logic is, chances are you'll still end up in the same place. You may take an extra hop or two, and you may run into an occasional bug.
Consider this example...
function makeFunc() {
  var name = "Mo" + "zilla";
  function displayName() {
    alert(name);
  }
  return displayName;
}
var myFunc = makeFunc();
myFunc();
Forget the parent function, I'm parsing this one now... until I get distracted by something else
...
case "function":
  return function_(AST_Defun);
case "if":
  return if_();
case "return":
  if (S.in_function == 0 && !options.bare_returns)
    croak("SyntaxError: 'return' outside of function");
  ...
This is actual UglifyJS source; there are about 50 cases in this switch statement.
Now imagine the attendance sheet was printed by a printer that couldn't handle newlines, spaces, or caps:
alexsmithjohndoehomersimpsonadalovelacebillgatesstevejobspaulallenstevewozniak
How would we take attendance then?
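One way to read the analogy, sketched under assumptions: instead of a central reader trying to split the text, each student listens for their own name at the current position and claims it. The roster, the `takeAttendance` name, and the longest-claim tie-break are all made up for this illustration.

```javascript
function takeAttendance(sheet, roster) {
  // Each student knows how their own name looks without spaces or caps.
  const students = roster.map((name) => ({
    name,
    key: name.toLowerCase().replace(/\s+/g, ''),
  }));
  const present = [];
  let pos = 0;
  while (pos < sheet.length) {
    // Every student whose name starts at this position raises a hand.
    const claims = students.filter(
      (s) => s.key.length > 0 && sheet.startsWith(s.key, pos)
    );
    if (claims.length === 0) {
      throw new Error(`Nobody recognizes the text at position ${pos}`);
    }
    // If several names match, prefer the longest one (assumption).
    const winner = claims.reduce((a, b) => (b.key.length > a.key.length ? b : a));
    present.push(winner.name);
    pos += winner.key.length;
  }
  return present;
}

const roster = [
  'Alex Smith', 'John Doe', 'Homer Simpson', 'Ada Lovelace',
  'Bill Gates', 'Steve Jobs', 'Paul Allen', 'Steve Wozniak',
  'Marge Simpson', // on the roster but not on the sheet
];
const sheet =
  'alexsmithjohndoehomersimpsonadalovelacebillgatesstevejobspaulallenstevewozniak';
console.log(takeAttendance(sheet, roster));
```

Note that "Steve Jobs" and "Steve Wozniak" share a prefix, yet neither needs lookahead logic in a central loop: each one simply checks whether its full name is at the current position.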
The first three problems can be solved by reusing the AST spec and having each coroutine pass the baton to another coroutine when it reaches the position of an expected node:
class Class(Scope):
    properties = {
        name: "[SymbolDeclaration?] the name of this class",
        init: "[Function] constructor for the class",
        parent: "[Class?] parent class this class inherits from",
        static: "[string*] list of static methods",
        external: "[boolean] true if class is declared elsewhere, but within current scope at runtime",
        decorators: "[Decorator*] function decorators, if any",
        module_id: "[string] The id of the module this class is defined in",
        statements: "[Node*] list of statements in the class scope (excluding method definitions)",
    }
The fourth problem is more challenging, but recursive generators may be up to the task:
function* doStuff() {
  yield 1;
  yield 2;
  yield* doStuff();
}
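To show the mechanism on something that terminates, here is a small sketch of a generator that delegates to itself with `yield*` whenever it meets a nested list. It illustrates only how recursive generators behave, not how the proposed parser would use them; `parseList` and its token format are hypothetical.

```javascript
// Yields each plain token with its nesting depth, and returns the nested structure.
// `iter` is a shared iterator, so a nested call continues where its parent stopped.
function* parseList(iter, depth = 0) {
  const items = [];
  for (let step = iter.next(); !step.done; step = iter.next()) {
    const token = step.value;
    if (token === '[') {
      // Delegate to a nested generator; yield* evaluates to its return value.
      items.push(yield* parseList(iter, depth + 1));
    } else if (token === ']') {
      if (depth === 0) throw new SyntaxError("Unexpected ']'");
      return items;
    } else {
      yield { token, depth };
      items.push(token);
    }
  }
  if (depth > 0) throw new SyntaxError("Missing ']'");
  return items;
}

const input = '1 [ 2 [ 3 ] ] 4'.split(' ');
const gen = parseList(input[Symbol.iterator]());

const seen = [];
let step = gen.next();
while (!step.done) {
  seen.push(step.value);
  step = gen.next();
}
const tree = step.value; // the generator's return value

console.log(seen.map((s) => s.depth)); // [ 0, 1, 2, 0 ]
console.log(JSON.stringify(tree));     // ["1",["2",["3"]],"4"]
```

The caller sees a single flat stream of yielded values even though three generator instances were involved, which is the property that makes delegation attractive for nested syntax.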
Finally, having addressed all parser problems, we can use templates to make output generation cleaner:
def _print(self):
    return `
        @${self.decorators.join('\n@')}
        function ${self.name}(${self.args.join(', ')}) {
            ${self.body.join(';\n')}
        }
    `
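The same idea can be sketched in plain JavaScript with template literals, one template per node type. The node shapes and the `templates`, `print`, and `printBlock` names are assumptions for illustration, and the string template does no escaping.

```javascript
// One output template per node type (hypothetical node shapes).
const templates = {
  function: (n) =>
    `function ${n.name}(${n.arguments.join(', ')}) {\n${printBlock(n.block)}\n}`,
  assign: (n) => `${print(n.left)} = ${print(n.right)}`,
  dot: (n) => `${print(n.object)}.${n.property}`,
  symbol: (n) => n.name,
  string: (n) => `'${n.value}'`, // no quote escaping in this sketch
};

function print(node) {
  const template = templates[node.type];
  if (!template) throw new Error(`No output template for node type '${node.type}'`);
  return template(node);
}

// Statements are indented by two spaces and terminated with semicolons.
function printBlock(statements) {
  return statements.map((s) => `  ${print(s)};`).join('\n');
}

const personNode = {
  type: 'function',
  name: 'Person',
  arguments: [],
  block: [
    {
      type: 'assign',
      left: { type: 'dot', object: { type: 'symbol', name: 'this' }, property: 'name' },
      right: { type: 'string', value: 'Jane' },
    },
  ],
};

console.log(print(personNode));
// function Person() {
//   this.name = 'Jane';
// }
```

Because each template is a separate entry, emitting a different target form (for example the class version of Person) would mean swapping one template rather than editing a large print routine.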
Developers can include independent transformers/optimizers
Diagram (parts of an AST node): parser coroutine (node factory), AST template, output template, transformer
Diagram (proposed flow): main thread (lexer / AST) → token stream → node factories → Bob's AST node, Jane's AST node, Mary's AST node, Greg's AST node, Array node
atsepkov@gmail.com
@atsepkov
github.com/atsepkov