Manipulating & Verifying Source Code

(...or trees, what are they good for?)

About Me

  • Born & Raised in Argentina.

  • OOP for a long time (mostly C# & Java).

  • FP for the last 3+ years (Erlang, Clojure).

  • Open-source contributor (e.g. Elvis).

  • Love music, singing, asado and soccer.

JUAN FACORRO

Inaka

  • Founded in 2010.

  • Acquired by Erlang Solutions in 2014.

  • End-to-end applications with Erlang, Ruby, Web, iOS and Android components.

  • Highly concurrent application servers such as Whisper and TigerText.

  • Active contributors to the open-source community.

Ask Questions, Get Answers

GOAL

Good for...?

  • Compiling

  • Formatting

  • (Meta)Programming

  • Refactoring

  • Verifying

  • Life Itself

Manipulation

Verification

Source Code

ROADMAP

Source Code

//drunk, fix later

//magic, do not touch

Source Code

Tokens

Abstract Syntax Tree (AST)

(4+3)*7

{lparen, "("}, {int, "4"}, {op, "+"}, {int, "3"}, {rparen, ")"}, {op, "*"}, {int, "7"}

Parsing BASICS

Source Code

Lexer

Parser
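
The Source Code -> Lexer -> Parser pipeline can be sketched in a few lines of Python. This is only an illustration for the (4+3)*7 example, not how any of the tools below are implemented; the names `lex`, `parse` and `TOKEN_SPEC` are hypothetical, and the grammar covers just integers, parentheses and the four arithmetic operators.

```python
import re

# Token types mirror the slide: int, op, lparen, rparen (whitespace is skipped).
TOKEN_SPEC = [
    ("int", r"\d+"),
    ("op", r"[+\-*/]"),
    ("lparen", r"\("),
    ("rparen", r"\)"),
    ("ws", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def lex(source):
    """Lexer: turn a string into a flat list of (type, text) tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "ws":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse(tokens):
    """Parser: turn the token list into a nested-tuple AST."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def atom():
        kind, text = take()
        if kind == "int":
            return ("int", int(text))
        if kind == "lparen":
            node = expr()
            if take()[0] != "rparen":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def binary(operand, operators):
        # Left-associative chain: operand (op operand)*
        node = operand()
        while peek()[0] == "op" and peek()[1] in operators:
            _, op = take()
            node = ("op", op, node, operand())
        return node

    def term():
        return binary(atom, "*/")

    def expr():
        return binary(term, "+-")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree
```

Calling `parse(lex("(4+3)*7"))` gives a `*` node whose left child is the `+` node: the parentheses disappear from the tree because the nesting already encodes them.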

PARSING TOOLS

Source Code

CLOJURE

Source Code

tools.reader

(defn 
 hello
 [name] 
 (prn "Hello " name))
"(defn hello 
   [name] 
   (prn \"Hello \" name))"
tools.reader.edn/read-string

String

Nested Data Structures & Values

CLOJURE

Source Code

{:op       :def,
 :children [:meta :init],
 :meta     {:op :const, ...},
 :init     {:op :fn, ...},
 ...}
(defn hello
 [name] 
 (prn "Hello " name))
tools.analyzer.jvm/analyze

tools.analyzer

Nested Maps w/ Lots of Information

Nested Data Structures & Values

CLOJURE

Source Code

{:tag     :playground.parsley/root,
 :content [{:tag     :list,
            :content ["("
                      {:tag :symbol, ...}
                      {:tag :whitespace, ...}
                      ...
                      {:tag :symbol, ...}
                      ")"]}]}

"(defn hello [name] name)"
(parsley/make-parser opts grammar)

Nested Maps w/ Original Source

String

parsley/instaparse

Erlang

Source Code

erl_scan + erl_parse

[{'(',1}, {integer,1,3}, 
 {'+',1}, {integer,1,4}, {')',1},
 {'*',1}, {integer,1,7}, 
 {dot,1}]

"(3 + 4) * 7."
erl_scan:string/1

List of Tokens (Tuples)

String

Erlang

Source Code

erl_scan + erl_parse

[{op, 1, '*',
  {op, 1, '+', 
   {integer, 1, 3}, 
   {integer, 1, 4}},
  {integer, 1, 7}}]
[{'(',1}, {integer,1,3}, 
 {'+',1}, {integer,1,4}, {')',1},
 {'*',1}, {integer,1,7}, {dot,1}]
erl_parse:parse_exprs/1

Abstract Syntax Forms (Nested Tuples)

List of Tokens

Erlang

Source Code

ktn_code (erlang-katana)

#{type    => root,
  attrs   => #{},
  content => [#{type  => module,
                attrs => #{...}},
              #{type    => function,
                attrs   => #{arity => 1,
                             name => two_times},
                content => [#{type => clause,
                              attrs => #{...},
                              content => []}]}]}
"-module(double).

two_times(X) -> 2 * X."
ktn_code:parse_tree/1

 Nested Maps

String

Scala

Source Code

scalariform

tokens: List[scalariform.lexer.Token] =
List(Token(VAL,val,0,val), Token(VARID,x,4,x), 
     Token(EQUALS,=,6,=), 
     Token(INTEGER_LITERAL,1,8,1), 
     Token(NEWLINE,"\n",9, "\n"),
     Token(VARID,x,10,x), Token(PLUS,+,12,+), 
     Token(INTEGER_LITERAL,2,14,2),
     Token(EOF,,15,))
"val x = 1
 x + 2"
ScalaLexer.tokenise

List of Tokens

String

Scala

Source Code

scalariform

CompilationUnit( // "val x = 1\nx + 2"
  StatSeq(
    None,
    Some(FullDefOrDcl(List(), List(), PatDefOrDcl(...))), // val x = 1
    List((Token(NEWLINE,"\n",9,"\n"),
          Some(Expr(List(InfixExpr(...)))))) // x + 2
  ),
  Token(EOF,"",15,"")
)
List(Token(VAL,val,0,val), 
     Token(VARID,x,4,x), ...)
new ScalaParser().compilationUnitOrScript

CompilationUnit

(Top Level AstNode)

List of Tokens

Manipulation

RECURSION

manipulation

Depth/Breadth First Search

DFS

BFS
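
As a rough illustration of the two visiting orders, here is a minimal Python sketch over the (4+3)*7 tree. The `(label, children)` tuple representation and the names `TREE`, `dfs` and `bfs` are assumptions made for this example only.

```python
from collections import deque

# (4+3)*7 as (label, children) pairs.
TREE = ("*", [("+", [("4", []), ("3", [])]), ("7", [])])


def dfs(node):
    """Depth first: visit a node, then each child's whole subtree in turn."""
    label, children = node
    yield label
    for child in children:
        yield from dfs(child)


def bfs(root):
    """Breadth first: visit level by level using a FIFO queue."""
    queue = deque([root])
    while queue:
        label, children = queue.popleft()
        yield label
        queue.extend(children)
```

`list(dfs(TREE))` visits `*`, `+`, `4`, `3`, `7`, while `list(bfs(TREE))` visits `*`, `+`, `7`, `4`, `3`.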

RECURSION

manipulation

Pre/Post Walk

PRE

POST

f(node) = Paint node Red
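
The difference between the two walks is only when f is applied relative to the children. A minimal Python sketch, assuming the same hypothetical `(label, children)` tuples and a `make_painter` helper that records the order in which nodes get painted:

```python
# (4+3)*7 as (label, children) pairs.
TREE = ("*", [("+", [("4", []), ("3", [])]), ("7", [])])


def prewalk(f, node):
    """Apply f to the node first, then walk the children of the result."""
    label, children = f(node)
    return (label, [prewalk(f, child) for child in children])


def postwalk(f, node):
    """Walk the children first, then apply f to the rebuilt node."""
    label, children = node
    return f((label, [postwalk(f, child) for child in children]))


def make_painter(order):
    """Build f(node) = paint node red, logging each painted label in `order`."""
    def paint_red(node):
        label, children = node
        order.append(label)
        return (label + ":red", children)
    return paint_red
```

Both walks return the same fully painted tree; only the painting order differs. Neither mutates the input: each call builds a new tree.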

RECURSION

manipulation

Custom

  • Mixing it up (BFS+DFS+PRE+POST)

  • Early pruning

  • Whatever pattern you can think of
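
One possible custom pattern is a depth-first walk with early pruning. The sketch below is an assumption-laden example (hypothetical `walk_pruned`, same `(label, children)` tuples as a stand-in for a real AST), not taken from any of the libraries mentioned in this talk.

```python
# (4+3)*7 as (label, children) pairs.
TREE = ("*", [("+", [("4", []), ("3", [])]), ("7", [])])


def walk_pruned(node, prune):
    """Depth-first walk that skips a whole subtree when prune(node) is true."""
    if prune(node):
        return
    label, children = node
    yield label
    for child in children:
        yield from walk_pruned(child, prune)
```

Pruning every `+` subtree, for example, leaves only `*` and `7` to visit, so the children of `+` are never touched.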

RECURSION

manipulation

Limited

Rigid

Well-Known

Simple

Depth/Breadth First Search

Pre/Post Walk

Custom

the Zipper(1)

manipulation

Functional Iterator for Immutable Data Structures

Location

Node

Path to Root

Siblings L/R
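
A location can be modelled as the focused node plus everything needed to rebuild the rest of the tree: the siblings on each side and the parent location. The Python sketch below is a simplified illustration under that assumption; `Node`, `Loc` and the helper functions are hypothetical names and do not reproduce the API of clojure.zip or any other zipper library.

```python
from typing import NamedTuple, Optional, Tuple


class Node(NamedTuple):
    label: str
    children: Tuple["Node", ...] = ()


class Loc(NamedTuple):
    node: Node                        # the focused node
    lefts: Tuple[Node, ...] = ()      # siblings to the left
    rights: Tuple[Node, ...] = ()     # siblings to the right
    parent: Optional["Loc"] = None    # path to root (None at the root)


def down(loc):
    """Move to the first child (raises ValueError on a leaf)."""
    first, *rest = loc.node.children
    return Loc(first, (), tuple(rest), loc)


def right(loc):
    """Move to the next sibling (raises ValueError if there is none)."""
    nxt, *rest = loc.rights
    return Loc(nxt, loc.lefts + (loc.node,), tuple(rest), loc.parent)


def up(loc):
    """Move to the parent, rebuilding it from the (possibly edited) children."""
    children = loc.lefts + (loc.node,) + loc.rights
    parent = loc.parent
    return parent._replace(node=parent.node._replace(children=children))


def replace(loc, node):
    """Return a new location with the focused node swapped out."""
    return loc._replace(node=node)


def root(loc):
    """Zip all the way up and return the whole (new) tree."""
    while loc.parent is not None:
        loc = up(loc)
    return loc.node


def path(loc):
    """Labels of the ancestors, nearest first."""
    labels = []
    while loc.parent is not None:
        loc = loc.parent
        labels.append(loc.node.label)
    return labels
```

Because every move and edit returns a new `Loc`, the original tree is never modified; an edit deep in the tree only rebuilds the nodes on the path back to the root.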

the Zipper(2)

manipulation

Walking the tree for (4+3)*7, one move at a time:

  Move    Node   Path to Root   Siblings
  ROOT    *      []             L[]; R[]
  DOWN    +      [*]            L[]; R[7]
  DOWN    4      [+, *]         L[]; R[3]
  RIGHT   3      [+, *]         L[4]; R[]
  UP      +      [*]            L[]; R[7]
  RIGHT   7      [*]            L[+]; R[]

the Zipper(3)

manipulation

clojure.zip

akhudek/fast-zip

inaka/zipper

ferd/zippers

scalaz.TreeLoc

scalaz.Zipper

Domain-Specific Languages

manipulation

jai

($ zloc [(defn ^:% vector? | _)] 
        do-something)
(if (and (-> zloc z/prev z/prev z/sexpr (= "defn"))
         (-> zloc z/prev z/sexpr vector?))
    (do-something zloc)
    zloc)

zipper + CSS-like selector + pattern matching  

Verification

Verification

Plain TEXT

  • Line length

  • No tabs, use spaces

  • No trailing whitespace

  • Space after commas, plus signs, etc.

  • Newline at EOF
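
Most of these checks need nothing more than the raw text. A minimal Python sketch, assuming a hypothetical `check_text` function and an arbitrary default limit of 80 characters; the space-after-comma check is left out on purpose because it is harder to get right on plain text.

```python
def check_text(text, max_len=80):
    """Return (line_number, message) pairs; line 0 refers to the whole file."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_len:
            problems.append((number, f"line longer than {max_len} characters"))
        if "\t" in line:
            problems.append((number, "tab character"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    if text and not text.endswith("\n"):
        problems.append((0, "no newline at end of file"))
    return problems
```

The `max_len` parameter is an example of the kind of option a real style checker would expose per rule.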

Verification

Abstract syntax tree

  1. Iterate through all nodes

  2. Filter by some condition(s)

  3. Optional checks per node

  4. Report found nodes

Basic Algorithm
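
The four steps map almost directly onto a single comprehension. The sketch below uses Python's standard `ast` module as a stand-in for the parsers discussed earlier; the `lint` function and the docstring rule are hypothetical examples, not rules taken from Eastwood, Elvis or Scalastyle.

```python
import ast


def lint(source, node_filter, check, message):
    """Return (line, message) for every node that passes the filter and the check."""
    tree = ast.parse(source)
    return [
        (node.lineno, message)        # 4. report found nodes
        for node in ast.walk(tree)    # 1. iterate through all nodes
        if node_filter(node)          # 2. filter by some condition
        if check(node)                # 3. optional check per node
    ]


def is_function(node):
    return isinstance(node, ast.FunctionDef)


def lacks_docstring(node):
    return ast.get_docstring(node) is None


SOURCE = '''def documented():
    """Has docs."""

def bare():
    return 1
'''

REPORT = lint(SOURCE, is_function, lacks_docstring, "function has no docstring")
```

Here `REPORT` contains one entry, pointing at line 4 where `bare` is defined.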

Tools

Verification

  Tool        Parse                                  Manipulate
  Eastwood    tools.analyzer                         tools.analyzer.ast
  Elvis       ktn_code (erlang-katana)               inaka/zipper
  Scalastyle  scalariform.lexer, scalariform.parser  scalastyle.VisitorHelper

Deprecations

Verification

(defn deprecations [{:keys [asts]} opt]
  (for [ast (map #(ast/postwalk % pass/reflect-validated) asts)
        dexpr (filter deprecated? (ast/nodes ast))
        :let [loc (pass/code-loc (pass/nearest-ast-with-loc dexpr))]]
    (util/add-loc-info loc
     {:linter :deprecations
      :msg (msg dexpr)})))

EASTWOOD

ELVIS

Verification

No if

no_if_expression(Config, Target, _RuleConfig) ->
    {Root, _} = elvis_file:parse_tree(Config, Target),

    Predicate = fun(Node) -> ktn_code:type(Node) == 'if' end,
    ResultFun = result_node_line_fun(?NO_IF_EXPRESSION_MSG),

    case elvis_code:find(Predicate, Root) of
        [] -> [];
        IfExprs -> lists:map(ResultFun, IfExprs)
    end.

Scalastyle

Verification

final def verify(ast: CompilationUnit): List[ScalastyleError] = {
  val it = for {
    t <- localvisit(ast.immediateChildren(0));
    f <- traverse(t);
    if (matches(f))
  } yield {
    PositionError(f.position.get, params(f))
  }

  it.toList
}

I F O R (Iterate, Filter, Optional checks, Report)

AbstractMethodChecker

Challenges

Verification

Corner Cases

Example: Space After Comma

- Regex (Plain Text)

- Find Node & Check Type

- Use tokens!

[1,2, 3]
"1,2, 3"
"1, 2, 3,"
"4,5, 6"
"1,2, 3"

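The "use tokens" option can be sketched with Python's standard `tokenize` module, which means the sample input here is Python source rather than Clojure, Erlang or Scala. The function name `commas_without_space` is hypothetical. Commas inside string literals are part of a single string token, so they are never reported.

```python
import io
import tokenize


def commas_without_space(source):
    """Return the (row, col) of each comma token not followed by a space."""
    found = []
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type == tokenize.OP and token.string == ",":
            _, end_col = token.end
            following = token.line[end_col:end_col + 1]
            # A comma at the very end of a line is accepted as well.
            if following not in ("", " ", "\n"):
                found.append(token.start)
    return found


SOURCE = 'x = [1,2, 3]\ny = "4,5, 6"\n'
OFFENDERS = commas_without_space(SOURCE)
```

Only the first comma on line 1 is flagged: the second one is followed by a space, and the commas on line 2 live inside a string.
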
Challenges

Verification

UNDER-SPECIFICATION

EXAMPLE: Spaghetti Code

Spaghetti Code = ?

Challenges

Verification

ESSENTIAL COMPLEXITY

EXAMPLE: Don't REPEAT YOURSELF

- Compare Every Node with Every Other Node. O(N^2)

- Ignore properties (var names, locations, etc.)

- Compare Subsets of Contiguous Expressions. O(N!?)
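
The "ignore properties" step can be illustrated with Python's `ast` module. Note that this sketch swaps the pairwise O(N^2) comparison for grouping by a hashable structural fingerprint, and that it only compares whole functions, not subsets of contiguous expressions. `IGNORED_FIELDS`, `fingerprint` and `duplicated_functions` are hypothetical names.

```python
import ast
from collections import defaultdict

# Fields that hold identifiers; locations are attributes, not fields,
# so ast.iter_fields already leaves them out.
IGNORED_FIELDS = {"name", "id", "arg"}


def fingerprint(value):
    """Structural summary of a subtree: node types and constants only."""
    if isinstance(value, ast.AST):
        fields = tuple(
            fingerprint(child)
            for field, child in ast.iter_fields(value)
            if field not in IGNORED_FIELDS
        )
        return (type(value).__name__, fields)
    if isinstance(value, list):
        return tuple(fingerprint(item) for item in value)
    return value  # plain constants such as numbers, strings or None


def duplicated_functions(source):
    """Group function names whose definitions share the same fingerprint."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            groups[fingerprint(node)].append(node.name)
    return [names for names in groups.values() if len(names) > 1]


SOURCE = '''
def double(x):
    return x * 2

def twice(value):
    return value * 2

def triple(x):
    return x * 3
'''

DUPLICATES = duplicated_functions(SOURCE)
```

`double` and `twice` end up in the same group because they differ only in names; `triple` does not, since the constant differs.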

Challenges

Verification

AMBIGUITIES

EXAMPLE: IF vs. WHEN

(if (odd? x)
  :do-this)

vs.

(when (odd? x)
  :do-this)

Solutions (?)

Verification

  • Add knobs (options)

  • Formal specification

  • Learn from mistakes

  • Patience & passion

  • Tools are ready & available

  • All ASTs are conceptually similar

  • Useful applications

  • Tip of the iceberg

  • Lots of fun

Questions?

@jfacorro

jfacorro

Cool Stuff/REFERENCES

ASK QUESTIONS, GET ANSWERS

Thank You!

Manipulating & Verifying Source Code

By Juan Facorro

Lambda Jam 2015
