Manipulating & Verifying Source Code


(...or trees, what are they good for?)
About Me
- Born & raised in Argentina.
- OOP for a long time (mostly C# & Java).
- FP for the last 3+ years (Erlang, Clojure).
- Open-source contributor (e.g. Elvis).
- Love music, singing, asado and soccer.

JUAN FACORRO
Inaka
- Founded in 2010.
- Acquired by Erlang Solutions in 2014.
- End-to-end applications with Erlang, Ruby, Web, iOS and Android components.
- Highly concurrent application servers such as Whisper and TigerText.
- Active contributors to the open-source community.




Ask Questions, Get Answers


GOAL
Good for...?
- Compiling
- Formatting
- (Meta)Programming
- Refactoring
- Verifying
- Life itself

Manipulation
Verification
Source Code

ROADMAP

Source Code


// drunk, fix later
// magic, do not touch
Source Code
(4+3)*7
Tokens
{lparen, "("},
{int, "4"}, {op, "+"},
{int, "3"}, {rparen, ")"},
{op, "*"}, {int, "7"}
Abstract Syntax Tree (AST)

PARSING BASICS
Source Code

Lexer
Parser
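
The lexer and parser steps can be sketched for the `(4+3)*7` example as follows. This is a minimal illustration, not any of the tools covered later: the token kinds follow the slide, while the grammar (only `+`, `*`, integers and parentheses) and the nested-tuple AST shape are assumptions made for the sketch.

```python
import re

# Token kinds mirror the slide's tuples: lparen, rparen, int, op.
TOKEN_SPEC = [
    ("int", r"\d+"),
    ("op", r"[+*]"),
    ("lparen", r"\("),
    ("rparen", r"\)"),
    ("ws", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def lex(source):
    """Lexer: turn a string into a flat list of (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "ws":  # whitespace is skipped, not tokenized
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse(tokens):
    """Parser: turn tokens into a nested-tuple AST ('*' binds tighter than '+')."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        pos += 1

    def atom():
        kind, text = peek()
        if kind == "int":
            advance()
            return ("int", int(text))
        if kind == "lparen":
            advance()
            node = expr()
            if peek()[0] != "rparen":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():
        node = atom()
        while peek() == ("op", "*"):
            advance()
            node = ("op", "*", node, atom())
        return node

    def expr():
        node = term()
        while peek() == ("op", "+"):
            advance()
            node = ("op", "+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree


tokens = lex("(4+3)*7")
tree = parse(tokens)
print(tokens)
print(tree)
```

The parentheses disappear from the AST: their only job was to shape the tree, which is why `+` ends up nested under `*`.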
PARSING TOOLS
Source Code


CLOJURE
Source Code
tools.reader

String:
"(defn hello
   [name]
   (prn \"Hello \" name))"

tools.reader.edn/read-string

Nested Data Structures & Values:
(defn
  hello
  [name]
  (prn "Hello " name))
CLOJURE
Source Code
tools.analyzer

Nested Data Structures & Values:
(defn hello
  [name]
  (prn "Hello " name))

tools.analyzer.jvm/analyze

Nested Maps w/ Lots of Information:
{:op :def,
 :children [:meta :init],
 :meta {:op :const, ...},
 :init {:op :fn, ...},
 ...}
CLOJURE
Source Code
parsley/instaparse

String:
"(defn hello [name] name)"

(parsley/make-parser opts grammar)

Nested Maps w/ Original Source:
{:tag :playground.parsley/root,
 :content [{:tag :list,
            :content ["("
                      {:tag :symbol, ...}
                      {:tag :whitespace, ...}
                      ...
                      {:tag :symbol, ...}
                      ")"]}]}
Erlang
Source Code
erl_scan + erl_parse

String:
"(3 + 4) * 7."

erl_scan:string/1

List of Tokens (Tuples):
[{'(',1}, {integer,1,3},
 {'+',1}, {integer,1,4}, {')',1},
 {'*',1}, {integer,1,7},
 {dot,1}]
Erlang
Source Code
erl_scan + erl_parse

List of Tokens:
[{'(',1}, {integer,1,3},
 {'+',1}, {integer,1,4}, {')',1},
 {'*',1}, {integer,1,7}, {dot,1}]

erl_parse:parse_exprs/1

Abstract Syntax Forms (Nested Tuples):
[{op, 1, '*',
  {op, 1, '+',
   {integer, 1, 3},
   {integer, 1, 4}},
  {integer, 1, 7}}]
Erlang
Source Code
ktn_code (erlang-katana)

String:
"-module(double).
two_times(X) -> 2 * X."

ktn_code:parse_tree/1

Nested Maps:
#{type => root,
  attrs => #{},
  content => [#{type => module,
                attrs => #{...}},
              #{type => function,
                attrs => #{arity => 1,
                           name => two_times},
                content => [#{type => clause,
                              attrs => #{...},
                              content => []}]}]}
Scala
Source Code
scalariform

String:
"val x = 1
x + 2"

ScalaLexer.tokenise

List of Tokens:
tokens: List[scalariform.lexer.Token] =
List(Token(VAL,val,0,val), Token(VARID,x,4,x),
     Token(EQUALS,=,6,=),
     Token(INTEGER_LITERAL,1,8,1),
     Token(NEWLINE,"\n",9, "\n"),
     Token(VARID,x,10,x), Token(PLUS,+,12,+),
     Token(INTEGER_LITERAL,2,14,2),
     Token(EOF,,15,))
Scala
Source Code
scalariform

List of Tokens:
List(Token(VAL,val,0,val),
     Token(VARID,x,4,x), ...)

new ScalaParse().compilationUnitOrScript

CompilationUnit (Top Level AstNode):
CompilationUnit( // "val x = 1\nx + 2"
  StatSeq(
    None, // val x = 1
    Some(FullDefOrDcl(List(), List(), PatDefOrDcl(...))),
    List((Token(NEWLINE,"\n",9,"\n"),
          Some(Expr(List(InfixExpr(...))))) // x + 2
  ),
  Token(EOF,"",15,"")
)
Manipulation



RECURSION
manipulation


Depth/Breadth First Search



DFS

BFS
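
The two visiting orders can be compared on the tree for `(4+3)*7`. A minimal sketch, assuming each node is a `(label, children)` pair; that representation is chosen for illustration and is not the format of any tool in this deck.

```python
from collections import deque

# A node is (label, [children]); this is the AST for (4+3)*7.
TREE = ("*", [("+", [("4", []), ("3", [])]), ("7", [])])


def dfs(node):
    """Depth first: visit a node, then each child subtree in full."""
    label, children = node
    yield label
    for child in children:
        yield from dfs(child)


def bfs(root):
    """Breadth first: visit level by level using a FIFO queue."""
    queue = deque([root])
    while queue:
        label, children = queue.popleft()
        yield label
        queue.extend(children)


print(list(dfs(TREE)))  # ['*', '+', '4', '3', '7']
print(list(bfs(TREE)))  # ['*', '+', '7', '4', '3']
```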
RECURSION
manipulation
PRE/POST walk



PRE

POST
f(node) = Paint node Red
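
Both walks rebuild the same tree and differ only in when `f` is applied to each node. A minimal sketch, assuming `(label, children)` nodes; the helper `paint_order` is hypothetical and exists only to record the order in which nodes get painted.

```python
def prewalk(f, node):
    """Apply f to the node first, then recurse into the children of the result."""
    label, children = f(node)
    return (label, [prewalk(f, child) for child in children])


def postwalk(f, node):
    """Recurse into the children first, then apply f to the rebuilt node."""
    label, children = node
    return f((label, [postwalk(f, child) for child in children]))


def paint_order(walk, tree):
    """Run a walk with f(node) = paint node red, recording the painting order."""
    order = []

    def paint_red(node):
        label, children = node
        order.append(label)
        return (label + ":red", children)

    return walk(paint_red, tree), order


TREE = ("*", [("+", [("4", []), ("3", [])]), ("7", [])])

pre_tree, pre_order = paint_order(prewalk, TREE)
post_tree, post_order = paint_order(postwalk, TREE)
print(pre_order)   # ['*', '+', '4', '3', '7']  parents are painted before children
print(post_order)  # ['4', '3', '+', '7', '*']  children are painted before parents
```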
RECURSION
manipulation
Custom


- Mixing it up (BFS+DFS+PRE+POST)
- Early pruning
- Whatever pattern you can think of
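
Early pruning, for instance, can be sketched as a depth-first walk that refuses to enter some subtrees. The tree and the rule "do not look inside `quote` nodes" are hypothetical and only serve to show the shape of a custom traversal.

```python
def walk_pruned(node, should_prune):
    """Depth-first walk that skips a whole subtree when should_prune(node) is true."""
    if should_prune(node):
        return
    label, children = node
    yield label
    for child in children:
        yield from walk_pruned(child, should_prune)


# Hypothetical tree: the 'quote' subtree holds data, not code to inspect.
TREE = ("do", [("call", [("f", [])]), ("quote", [("call", [("g", [])])])])

visited = list(walk_pruned(TREE, lambda node: node[0] == "quote"))
print(visited)  # ['do', 'call', 'f']
```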
RECURSION
manipulation
Limited
Rigid
Well-known
Simple


Depth/Breadth First Search
Pre/Post Walk
Custom

The Zipper (1)
manipulation

Functional Iterator for Immutable Data Structures
Location:
- Node
- Path to Root
- Siblings L/R

The Zipper (2)
manipulation

Tree: (4+3)*7

Moves                          Node  Path    Siblings
ROOT                           *     []      L[]; R[]
ROOT DOWN                      +     [*]     L[]; R[7]
ROOT DOWN DOWN                 4     [+, *]  L[]; R[3]
ROOT DOWN DOWN RIGHT           3     [+, *]  L[4]; R[]
ROOT DOWN DOWN RIGHT UP        +     [*]     L[]; R[7]
ROOT DOWN DOWN RIGHT UP RIGHT  7     [*]     L[+]; R[]
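
The walk over the `(4+3)*7` tree can be replayed with a small zipper. This is a minimal sketch, not the API of any zipper library: the names `Node`, `Loc`, `down`, `right`, `up`, `replace`, `root` and `path` are assumptions made for illustration, and moves that have nowhere to go simply raise an error.

```python
from typing import NamedTuple, Optional, Tuple


class Node(NamedTuple):
    label: str
    children: Tuple["Node", ...] = ()


class Loc(NamedTuple):
    """A location: the focused node, its siblings, and the way back to the root."""
    node: Node
    lefts: Tuple[Node, ...] = ()
    rights: Tuple[Node, ...] = ()
    parent: Optional["Loc"] = None


def down(loc):
    """Move to the leftmost child."""
    first, *rest = loc.node.children
    return Loc(first, (), tuple(rest), loc)


def right(loc):
    """Move to the next sibling on the right."""
    nxt, *rest = loc.rights
    return Loc(nxt, loc.lefts + (loc.node,), tuple(rest), loc.parent)


def up(loc):
    """Move to the parent, rebuilding it from the (possibly edited) children."""
    children = loc.lefts + (loc.node,) + loc.rights
    parent = loc.parent
    return parent._replace(node=parent.node._replace(children=children))


def replace(loc, node):
    """Return a new location with the focused node swapped out."""
    return loc._replace(node=node)


def root(loc):
    """Zip all the way up and return the (new) root node."""
    while loc.parent is not None:
        loc = up(loc)
    return loc.node


def path(loc):
    """Labels from the parent up to the root."""
    labels = []
    while loc.parent is not None:
        loc = loc.parent
        labels.append(loc.node.label)
    return labels


tree = Node("*", (Node("+", (Node("4"), Node("3"))), Node("7")))

three = right(down(down(Loc(tree))))   # ROOT, DOWN, DOWN, RIGHT
print(three.node.label, path(three))   # 3 ['+', '*']

seven = right(up(three))               # UP, RIGHT
print(seven.node.label, path(seven))   # 7 ['*']

# Editing never mutates: a new tree is rebuilt on the way up.
edited = root(replace(three, Node("5")))
```

The original `tree` is untouched after the edit, and `edited` is the tree for `(4+5)*7`.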

The Zipper (3)
manipulation



clojure.zip
akhudek/fast-zip
inaka/zipper
ferd/zippers
scalaz.TreeLoc
scalaz.Zipper
Domain-Specific Languages
manipulation

zipper + CSS-like selector + pattern matching

($ zloc [(defn ^:% vector? | _)]
   do-something)

(if (and (-> zloc z/prev z/prev z/sexpr (= "defn"))
         (-> zloc z/prev z/sexpr vector?))
  (do-something zloc)
  zloc)
Verification


Verification
Plain Text
- Line length
- No tabs, use spaces
- Trailing whitespace
- Space after commas, plus, etc.
- New line at the EOF
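
Most of these checks need nothing more than the raw lines. A minimal sketch covering four of them; the function name, the default limit of 80 characters and the message strings are assumptions, not taken from any of the tools discussed here.

```python
def check_text(text, max_line_length=80):
    """Return (line_number, message) pairs for plain-text style problems."""
    problems = []
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        if len(line) > max_line_length:
            problems.append((number, "line too long"))
        if "\t" in line:
            problems.append((number, "tab character"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    # The last check looks at the whole text, not at a single line.
    if text and not text.endswith("\n"):
        problems.append((len(lines), "no newline at end of file"))
    return problems


print(check_text("a \n\tb\nlong", max_line_length=3))
```

"Space after commas" is left out on purpose: doing it on plain text is one of the corner cases discussed under Challenges.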


Verification
Abstract Syntax Tree

Basic Algorithm
- Iterate through all nodes
- Filter by some condition(s)
- Optional checks per node
- Report found nodes
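
The four steps can be sketched over a tree of nested maps. The node shape (`type`, `attrs`, `content`) loosely follows the nested maps shown earlier, but the `line` attribute, the function names and the sample tree are assumptions made for illustration.

```python
def nodes(tree):
    """Iterate: yield every node of a nested-dict tree, depth first."""
    yield tree
    for child in tree.get("content", []):
        yield from nodes(child)


def find_problems(tree, is_candidate, check, message):
    """Filter nodes, run an optional per-node check, and report what is left."""
    return [
        {"line": node["attrs"].get("line"), "msg": message}  # report
        for node in nodes(tree)                               # iterate
        if is_candidate(node)                                 # filter
        if check(node)                                        # optional check
    ]


# Hypothetical tree for a function containing two 'if' expressions.
TREE = {
    "type": "function",
    "attrs": {"line": 1, "name": "f"},
    "content": [
        {"type": "if", "attrs": {"line": 2}, "content": []},
        {"type": "case", "attrs": {"line": 5}, "content": [
            {"type": "if", "attrs": {"line": 6}, "content": []},
        ]},
    ],
}

report = find_problems(
    TREE,
    is_candidate=lambda node: node["type"] == "if",
    check=lambda node: True,  # no extra check needed for this rule
    message="avoid 'if' expressions",
)
print(report)
```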

Verification
Abstract Syntax Tree

Basic Algorithm (I F O R)
- Iterate
- Filter
- Opt. checks
- Report
Tools
Verification

Tool        PARSE                                  MANIPULATE
Eastwood    tools.analyzer                         tools.analyzer.ast
Elvis       ktn_code (erlang-katana)               inaka/zipper
Scalastyle  scalariform.lexer, scalariform.parser  scalastyle.VisitorHelper

Deprecations
Verification
EASTWOOD

(defn deprecations [{:keys [asts]} opt]
  ;; Iterate over every node of every AST; Filter the deprecated ones
  (for [ast (map #(ast/postwalk % pass/reflect-validated) asts)
        dexpr (filter deprecated? (ast/nodes ast))
        :let [loc (pass/code-loc (pass/nearest-ast-with-loc dexpr))]]
    ;; Report each match together with its source location
    (util/add-loc-info loc
                       {:linter :deprecations
                        :msg (msg dexpr)})))


ELVIS
Verification

No if
no_if_expression(Config, Target, _RuleConfig) ->
    {Root, _} = elvis_file:parse_tree(Config, Target),
    %% Filter: keep only 'if' expression nodes
    Predicate = fun(Node) -> ktn_code:type(Node) == 'if' end,
    ResultFun = result_node_line_fun(?NO_IF_EXPRESSION_MSG),
    %% Iterate: find the nodes under Root that satisfy the predicate
    case elvis_code:find(Predicate, Root) of
        [] -> [];
        %% Report: one result per offending node
        IfExprs -> lists:map(ResultFun, IfExprs)
    end.
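Scalastyle
Verification

AbstractMethodChecker
final def verify(ast: CompilationUnit): List[ScalastyleError] = {
  val it = for {
    // Iterate over the nodes reachable from the first child of the AST
    t <- localvisit(ast.immediateChildren(0));
    f <- traverse(t);
    // Filter the nodes this checker cares about
    if (matches(f))
  } yield {
    // Report a positioned error for each match
    PositionError(f.position.get, params(f))
  }
  it.toList
}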
Challenges
Verification


Corner Cases


Example: Space After Comma
- Regex (Plain Text)
- Find Node & Check Type
- Use tokens!
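
The difference between the regex and the token approach shows up on inputs such as the list `[1,2, 3]` versus the string `"1,2, 3"`. The sketch below uses Python's own `tokenize` module on Python source purely as a stand-in for any lexer; the function names are assumptions.

```python
import io
import re
import tokenize


def commas_without_space_regex(source):
    """Plain-text approach: offsets of commas followed by a non-space character."""
    return [match.start() for match in re.finditer(r",(?=\S)", source)]


def commas_without_space(source):
    """Token approach: (row, col) of comma tokens glued to the next token."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    line_ends = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)
    problems = []
    for current, following in zip(tokens, tokens[1:]):
        if current.type == tokenize.OP and current.string == ",":
            # No gap between the comma and the next real token on the line.
            if following.start == current.end and following.type not in line_ends:
                problems.append(current.start)
    return problems


print(commas_without_space_regex('"1,2, 3"'))  # [2]  false positive inside a string
print(commas_without_space('"1,2, 3"'))        # []   the string is a single token
print(commas_without_space("[1,2, 3]"))        # [(1, 2)]
```

A comma inside a string literal never becomes a comma token, so the token-based check needs no special cases for it.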
[1,2, 3]
"1,2, 3"
"1, 2, 3,"
"4,5, 6"

Challenges
Verification
UNDER-SPECIFICATION


Example: Spaghetti Code

Spaghetti Code = ?
Challenges
Verification
ESSENTIAL COMPLEXITY


Example: Don't Repeat Yourself
- Compare Every Node with Every Other Node. O(N^2)
- Ignore properties (var names, locations, etc.)
- Compare Subsets of Contiguous Expressions. O(N!?)
Challenges
Verification
AMBIGUITIES


Example: if vs. when

(if (odd? x)
  :do-this)

vs.

(when (odd? x)
  :do-this)


Solutions (?)
Verification
- Add knobs (options)
- Formal specification
- Learn from mistakes
- Patience & passion

- Tools are ready & available
- All ASTs conceptually similar
- Useful applications
- Tip of the iceberg
- Lots of fun

Questions?

@jfacorro

jfacorro

Cool Stuff/References
- Data All the ASTs - Timothy Baldridge.
- The Zipper - Gerard Huet.
- Eastwood - Clojure lint tool.
- Elvis - Erlang style reviewer.
- Scalastyle - Scala style checker.
- Grok - Steve Yegge.
- inaka.github.io - Inaka Open Source.
- jai - Manipulate source code like DOM.

ASK QUESTIONS, GET ANSWERS

Thank You!
Manipulating & Verifying Source Code
By Juan Facorro
Lambda Jam 2015