Manipulating & Verifying Source Code
(...or trees, what are they good for?)
About Me
- Born & raised in Argentina.
- OOP for a long time (mostly C# & Java).
- FP for the last 3+ years (Erlang, Clojure).
- Open-source contributor (e.g. Elvis).
- Love music, singing, asado and soccer.
JUAN FACORRO
Inaka
- Founded in 2010.
- Acquired by Erlang Solutions in 2014.
- End-to-end applications with Erlang, Ruby, Web, iOS and Android components.
- Highly concurrent application servers such as Whisper and TigerText.
- Active contributors to the open-source community.
ASK QUESTIONS, GET ANSWERS
GOAL
Good for...?
- Compiling
- Formatting
- (Meta)Programming
- Refactoring
- Verifying
- Life Itself
Manipulation
Verification
Source Code
ROADMAP
Source Code
//drunk, fix later
//magic, do not touch
Source Code
Tokens
Abstract Syntax Tree (AST)
(4+3)*7
{lparen, "("},
{int, "4"}, {op, "+"},
{int, "3"}, {rparen, ")"},
{op, "*"}, {int, "7"}
PARSING BASICS
Source Code
Lexer
Parser
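The source code, lexer, parser pipeline can be sketched for the `(4+3)*7` example. This is a minimal illustration, not how any of the tools in this talk are implemented: the token kinds follow the earlier token slide, while `lex`, `parse` and the nested-tuple AST shape are names and choices made up for this sketch.

```python
import re

# Token kinds mirror the earlier example: lparen, rparen, int, op.
TOKEN_SPEC = [
    ("int", r"\d+"),
    ("op", r"[+\-*/]"),
    ("lparen", r"\("),
    ("rparen", r"\)"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def lex(source):
    """Lexer: turn a string into a flat list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "skip":  # whitespace is dropped
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse(tokens):
    """Parser: turn tokens into a nested-tuple AST by recursive descent."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def atom():
        kind, text = peek()
        if kind == "int":
            advance()
            return ("int", int(text))
        if kind == "lparen":
            advance()
            node = expr()
            if peek()[0] != "rparen":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():  # '*' and '/' bind tighter than '+' and '-'
        node = atom()
        while peek() in (("op", "*"), ("op", "/")):
            _, op = advance()
            node = ("op", op, node, atom())
        return node

    def expr():
        node = term()
        while peek() in (("op", "+"), ("op", "-")):
            _, op = advance()
            node = ("op", op, node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree


tokens = lex("(4+3)*7")
tree = parse(tokens)
print(tokens)
print(tree)
```

The parentheses appear in the token list but not in the tree: grouping is encoded by nesting.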
PARSING TOOLS
Source Code
CLOJURE
Source Code
tools.reader
String:
"(defn hello
   [name]
   (prn \"Hello \" name))"
tools.reader.edn/read-string
Nested Data Structures & Values:
(defn
 hello
 [name]
 (prn "Hello " name))
CLOJURE
Source Code
tools.analyzer
Nested Data Structures & Values:
(defn hello
  [name]
  (prn "Hello " name))
tools.analyzer.jvm/analyze
Nested Maps w/ Lots of Information:
{:op :def,
 :children [:meta :init],
 :meta {:op :const, ...},
 :init {:op :fn, ...},
 ...}
CLOJURE
Source Code
parsley/instaparse
String:
"(defn hello [name] name)"
(parsley/make-parser opts grammar)
Nested Maps w/ Original Source:
{:tag :playground.parsley/root,
 :content [{:tag :list,
            :content ["("
                      {:tag :symbol, ...}
                      {:tag :whitespace, ...}
                      ...
                      {:tag :symbol, ...}
                      ")"]}]}
ERLANG
Source Code
erl_scan + erl_parse
String:
"(3 + 4) * 7."
erl_scan:string/1
List of Tokens (Tuples):
[{'(',1}, {integer,1,3},
 {'+',1}, {integer,1,4}, {')',1},
 {'*',1}, {integer,1,7},
 {dot,1}]
ERLANG
Source Code
erl_scan + erl_parse
List of Tokens:
[{'(',1}, {integer,1,3},
 {'+',1}, {integer,1,4}, {')',1},
 {'*',1}, {integer,1,7}, {dot,1}]
erl_parse:parse_exprs/1
Abstract Syntax Forms (Nested Tuples):
[{op, 1, '*',
  {op, 1, '+',
   {integer, 1, 3},
   {integer, 1, 4}},
  {integer, 1, 7}}]
ERLANG
Source Code
ktn_code (erlang-katana)
String:
"-module(double).
 two_times(X) -> 2 * X."
ktn_code:parse_tree/1
Nested Maps:
#{type => root,
  attrs => #{},
  content => [#{type => module,
                attrs => #{...}},
              #{type => function,
                attrs => #{arity => 1,
                           name => two_times},
                content => [#{type => clause,
                              attrs => #{...},
                              content => []}]}]}
SCALA
Source Code
scalariform
String:
"val x = 1
x + 2"
ScalaLexer.tokenise
List of Tokens:
tokens: List[scalariform.lexer.Token] =
  List(Token(VAL,val,0,val), Token(VARID,x,4,x),
       Token(EQUALS,=,6,=),
       Token(INTEGER_LITERAL,1,8,1),
       Token(NEWLINE,"\n",9, "\n"),
       Token(VARID,x,10,x), Token(PLUS,+,12,+),
       Token(INTEGER_LITERAL,2,14,2),
       Token(EOF,,15,))
SCALA
Source Code
scalariform
List of Tokens:
List(Token(VAL,val,0,val),
     Token(VARID,x,4,x), ...)
new ScalaParser().compilationUnitOrScript
CompilationUnit (Top Level AstNode):
CompilationUnit( // "val x = 1\nx + 2"
  StatSeq(
    None,
    Some(FullDefOrDcl(List(), List(), PatDefOrDcl(...))), // val x = 1
    List((Token(NEWLINE,"\n",9,"\n"),
          Some(Expr(List(InfixExpr(...)))))) // x + 2
  ),
  Token(EOF,"",15,"")
)
Manipulation
RECURSION
manipulation
Depth/Breadth first search
DFS
BFS
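A minimal sketch of the two visiting orders over the `(4+3)*7` tree. The `Node` class and the `dfs`/`bfs` names are assumptions made for this illustration, not taken from any library mentioned in the talk.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    value: str
    children: list = field(default_factory=list)


def dfs(root):
    """Depth-first: visit a node, then fully explore each child in turn."""
    yield root.value
    for child in root.children:
        yield from dfs(child)


def bfs(root):
    """Breadth-first: visit level by level, using a FIFO queue."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node.value
        queue.extend(node.children)


# The tree for (4+3)*7
tree = Node("*", [Node("+", [Node("4"), Node("3")]), Node("7")])

dfs_order = list(dfs(tree))
bfs_order = list(bfs(tree))
print(dfs_order)  # ['*', '+', '4', '3', '7']
print(bfs_order)  # ['*', '+', '7', '4', '3']
```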
RECURSION
manipulation
PRE/POST walk
PRE
POST
f(node) = Paint node Red
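A sketch of the difference between the two walks, assuming an immutable `Node` with a `color` field (both hypothetical). A pre-walk applies `f` to a node before its children; a post-walk applies it after. With `f = paint_red` the resulting trees are the same, only the order of application differs.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:
    value: str
    color: str = "black"
    children: tuple = ()


def prewalk(f, node):
    """Apply f to the node first, then walk the children of the result."""
    node = f(node)
    return replace(node, children=tuple(prewalk(f, child) for child in node.children))


def postwalk(f, node):
    """Walk the children first, then apply f to the rebuilt node."""
    node = replace(node, children=tuple(postwalk(f, child) for child in node.children))
    return f(node)


visited = []


def paint_red(node):
    visited.append(node.value)  # record the order in which f is applied
    return replace(node, color="red")


# The tree for (4+3)*7
tree = Node("*", children=(Node("+", children=(Node("4"), Node("3"))), Node("7")))

pre = prewalk(paint_red, tree)
pre_order = visited.copy()
visited.clear()

post = postwalk(paint_red, tree)
post_order = visited.copy()

print(pre_order)   # ['*', '+', '4', '3', '7']
print(post_order)  # ['4', '3', '+', '7', '*']
```

Both walks return a new tree; the original `tree` stays black.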
RECURSION
manipulation
Custom
- Mixing it up (BFS+DFS+PRE+POST)
- Early pruning
- Whatever pattern you can think of
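Early pruning, for instance, could look like the sketch below: a pre-order walk that refuses to descend into a subtree when a predicate says so. `Node`, `walk` and the `prune` parameter are hypothetical names for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    value: str
    children: tuple = ()


def walk(node, visit, prune):
    """Pre-order walk that skips any subtree for which prune(node) is true."""
    if prune(node):
        return
    visit(node)
    for child in node.children:
        walk(child, visit, prune)


# The tree for (4+3)*7
tree = Node("*", (Node("+", (Node("4"), Node("3"))), Node("7")))

seen = []
walk(tree, lambda n: seen.append(n.value), prune=lambda n: n.value == "+")
print(seen)  # ['*', '7']
```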
RECURSION
manipulation
- Depth/Breadth First Search
- PRE/POST walk
- Custom
Well-known, simple
Limited, rigid
the Zipper(1)
manipulation
Functional Iterator for Immutable Data Structures
Location
- Node
- Path to Root
- Siblings (Left/Right)
the Zipper(2)
manipulation
Walking the tree for (4+3)*7, one move at a time:
ROOT: node *, path [], L[]; R[]
DOWN: node +, path [*], L[]; R[7]
DOWN, DOWN: node 4, path [+, *], L[]; R[3]
DOWN, DOWN, RIGHT: node 3, path [+, *], L[4]; R[]
DOWN, DOWN, RIGHT, UP: node +, path [*], L[]; R[7]
DOWN, DOWN, RIGHT, UP, RIGHT: node 7, path [*], L[+]; R[]
the Zipper(3)
manipulation
clojure.zip
akhudek/fast-zip
inaka/zipper
ferd/zippers
scalaz.TreeLoc
scalaz.Zipper
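To make the idea of a location concrete, here is a minimal zipper sketch. It is not the API of any of the libraries listed above; `Loc`, `down`, `right`, `up`, `edit`, `root` and `path` are names chosen for this illustration. A location holds the current node, its left and right siblings, and the parent location (the path to the root). Edits produce new locations and the original tree is never mutated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    value: str
    children: tuple = ()


@dataclass(frozen=True)
class Loc:
    node: Node
    lefts: tuple = ()                # siblings to the left of the node
    rights: tuple = ()               # siblings to the right of the node
    parent: Optional["Loc"] = None   # parent location, i.e. the path to the root


def down(loc):
    """Move to the first child (raises ValueError on a leaf)."""
    first, *rest = loc.node.children
    return Loc(first, (), tuple(rest), loc)


def right(loc):
    """Move to the next sibling (raises ValueError if there is none)."""
    nxt, *rest = loc.rights
    return Loc(nxt, loc.lefts + (loc.node,), tuple(rest), loc.parent)


def up(loc):
    """Move to the parent, rebuilding it from the (possibly edited) children."""
    parent = loc.parent
    children = loc.lefts + (loc.node,) + loc.rights
    return Loc(Node(parent.node.value, children), parent.lefts, parent.rights, parent.parent)


def edit(loc, f):
    """Replace the current node with f(node), keeping the same position."""
    return Loc(f(loc.node), loc.lefts, loc.rights, loc.parent)


def root(loc):
    """Zip all the way up and return the whole tree."""
    while loc.parent is not None:
        loc = up(loc)
    return loc.node


def path(loc):
    """Values of the ancestors, nearest first."""
    values = []
    while loc.parent is not None:
        loc = loc.parent
        values.append(loc.node.value)
    return values


# The tree for (4+3)*7
tree = Node("*", (Node("+", (Node("4"), Node("3"))), Node("7")))

loc = right(down(down(Loc(tree))))            # DOWN, DOWN, RIGHT: now at 3
print(loc.node.value, path(loc), loc.lefts)   # 3 ['+', '*'] (Node('4'),)

new_tree = root(edit(loc, lambda n: Node("5")))  # (4+5)*7; tree is unchanged
```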
Domain Specific Languages
manipulation
zipper + CSS-like selector + pattern matching
($ zloc [(defn ^:% vector? | _)]
   do-something)
(if (and (-> zloc z/prev z/prev z/sexpr (= "defn"))
         (-> zloc z/prev z/sexpr vector?))
  (do-something zloc)
  zloc)
Verification
Verification
Plain Text
- Line length
- No tabs, use spaces
- Trailing whitespace
- Space after commas, plus, etc.
- New line at the EOF
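Most of these checks need nothing but the raw text. A minimal sketch, assuming an 80-column limit and a hypothetical `check_text` function that returns `(line, message)` pairs (the space-after-comma check is left out on purpose, see the corner cases later on):

```python
def check_text(text, max_length=80):
    """Return a list of (line_number, message) for plain-text style problems."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_length:
            problems.append((number, "line too long"))
        if "\t" in line:
            problems.append((number, "tab character"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    if text and not text.endswith("\n"):
        # The last line is the one that lacks the terminating newline.
        problems.append((text.count("\n") + 1, "missing new line at EOF"))
    return problems


sample = "ok = 1\n\tindented = 2\ntrailing = 3  \nlast = 4"
problems = check_text(sample)
for problem in problems:
    print(problem)
```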
Verification
Abstract syntax tree
Basic Algorithm
- Iterate through all nodes
- Filter by some condition(s)
- Optional checks per node
- Report found nodes
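The four steps can be sketched with Python's standard `ast` module standing in for the language-specific parsers shown earlier. The set of deprecated names and the `find_deprecated_calls` function are hypothetical, made up for this example.

```python
import ast

# Hypothetical list of functions considered deprecated.
DEPRECATED = {"old_api"}


def find_deprecated_calls(source):
    """Return (line, message) for every call to a deprecated function."""
    tree = ast.parse(source)
    reports = []
    for node in ast.walk(tree):                       # Iterate through all nodes
        if not isinstance(node, ast.Call):            # Filter by node type
            continue
        if not isinstance(node.func, ast.Name):       # Optional check: plain names only
            continue
        if node.func.id in DEPRECATED:
            reports.append((node.lineno, f"call to deprecated function {node.func.id}"))
    return sorted(reports)                            # Report found nodes


source = "x = old_api(1)\ny = new_api(2)\nprint(old_api(x))\n"
reports = find_deprecated_calls(source)
for report in reports:
    print(report)
```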
Verification
Tools
Basic Algorithm: Iterate, Filter, Opt. checks, Report
Eastwood
- Parse: tools.analyzer
- Manipulate: tools.analyzer.ast
Elvis
- Parse: ktn_code (erlang-katana)
- Manipulate: inaka/zipper
Scalastyle
- Parse: scalariform.lexer, scalariform.parser
- Manipulate: scalastyle.VisitorHelper
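EASTWOOD
Verification
Deprecations
(defn deprecations [{:keys [asts]} opt]
  ;; Iterate: post-walk each AST, then visit every node in it
  (for [ast (map #(ast/postwalk % pass/reflect-validated) asts)
        ;; Filter: keep only the nodes that use something deprecated
        dexpr (filter deprecated? (ast/nodes ast))
        :let [loc (pass/code-loc (pass/nearest-ast-with-loc dexpr))]]
    ;; Report: one warning per node, with its source location
    (util/add-loc-info loc
                       {:linter :deprecations
                        :msg (msg dexpr)})))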
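ELVIS
Verification
No if
no_if_expression(Config, Target, _RuleConfig) ->
    %% Parse the target file into a tree.
    {Root, _} = elvis_file:parse_tree(Config, Target),
    %% Filter: select only 'if' expression nodes.
    Predicate = fun(Node) -> ktn_code:type(Node) == 'if' end,
    ResultFun = result_node_line_fun(?NO_IF_EXPRESSION_MSG),
    %% Iterate: collect every node that satisfies the predicate.
    case elvis_code:find(Predicate, Root) of
        [] -> [];
        %% Report: one result per offending node.
        IfExprs -> lists:map(ResultFun, IfExprs)
    end.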
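SCALASTYLE
Verification
AbstractMethodChecker
final def verify(ast: CompilationUnit): List[ScalastyleError] = {
  val it = for {
    // Iterate: visit every node under the first top-level child
    t <- localvisit(ast.immediateChildren(0));
    f <- traverse(t);
    // Filter: keep only the nodes this checker cares about
    if (matches(f))
  } yield {
    // Report: one error per match, at the node's position
    PositionError(f.position.get, params(f))
  }
  it.toList
}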
Challenges
Verification
Corner Cases
Example: Space After Comma
- Regex (Plain Text)
- Find Node & Check Type
- Use tokens!
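Why tokens help can be sketched with Python's standard `tokenize` module: a regex over plain text cannot tell a comma in a list from a comma inside a string literal, while the lexer already has. Both function names are hypothetical, and the token version deliberately ignores a comma directly followed by a closing bracket or a line break.

```python
import io
import re
import tokenize


def missing_space_regex(source):
    """Plain-text approach: flag every comma directly followed by non-space."""
    return [
        (number, match.start())
        for number, line in enumerate(source.splitlines(), start=1)
        for match in re.finditer(r",\S", line)
    ]


def missing_space_tokens(source):
    """Token approach: only real comma tokens are considered."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    problems = []
    for current, following in zip(tokens, tokens[1:]):
        if current.type != tokenize.OP or current.string != ",":
            continue
        if following.type in (tokenize.NL, tokenize.NEWLINE):
            continue  # a comma at the end of a line is fine
        if following.string in (")", "]", "}"):
            continue  # so is a trailing comma such as (1,)
        if following.start == current.end:  # next token starts right after the comma
            problems.append(current.start)
    return problems


source = 'x = [1,2, 3]\ny = "4,5, 6"\n'
by_regex = missing_space_regex(source)
by_tokens = missing_space_tokens(source)
print(by_regex)   # [(1, 6), (2, 6)]  <- the second hit is inside a string
print(by_tokens)  # [(1, 6)]
```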
[1,2, 3]
"1,2, 3"
"1, 2, 3,"
"4,5, 6"
"1,2, 3"
Challenges
Verification
Under-Specification
Example: Spaghetti Code
Spaghetti code = ?
Challenges
Verification
Essential Complexity
Example: Don't Repeat Yourself
- Compare Every Node with Every Other Node. O(N^2)
- Ignore properties (var names, locations, etc.)
- Compare Subsets of Contiguous Expressions. O(N!?)
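The first two bullets can be sketched with Python's standard `ast` module, as a rough illustration only. It compares every pair of top-level statements (the O(N^2) part) after erasing variable names; source locations are already left out by `ast.dump`. The names `shape` and `duplicate_statements` are hypothetical, and nested expressions or runs of contiguous statements are not handled.

```python
import ast
import copy
from itertools import combinations


class EraseNames(ast.NodeTransformer):
    """Replace every variable name with '_' so naming differences are ignored."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)


def shape(node):
    """Structural fingerprint of a node: no names, no line/column info."""
    normalized = EraseNames().visit(copy.deepcopy(node))
    return ast.dump(normalized)  # locations are omitted by default


def duplicate_statements(source):
    """Compare every statement with every other one: O(N^2) comparisons."""
    statements = ast.parse(source).body
    return [
        (a.lineno, b.lineno)
        for a, b in combinations(statements, 2)
        if shape(a) == shape(b)
    ]


source = "a = x + 1\nb = y + 1\nc = x * 2\n"
duplicates = duplicate_statements(source)
print(duplicates)  # [(1, 2)]
```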
Challenges
Verification
Ambiguities
Example: if vs. when
(if (odd? x)
  :do-this)
vs.
(when (odd? x)
  :do-this)
Solutions (?)
Verification
- Add knobs (options)
- Formal specification
- Learn from mistakes
- Patience & passion
- Tools are ready & available
- All ASTs are conceptually similar
- Useful applications
- Tip of the iceberg
- Lots of fun
Questions?
@jfacorro
jfacorro
Cool Stuff/REFERENCES
- Data All the ASTs - Timothy Baldridge.
- The Zipper - Gerard Huet.
- Eastwood - Clojure lint tool.
- Elvis - Erlang style reviewer.
- Scalastyle - Scala style checker.
- Grok - Steve Yegge.
- inaka.github.io - Inaka Open Source.
- jai - Manipulate source code like DOM.
Thank You!
Manipulating & Verifying Source Code
By Juan Facorro
Lambda Jam 2015