Parsing with Derivatives

A general way to parse context-free grammars

Folkert de Vries 

November 27, 2018

The Brzozowski derivative

for regular languages

The derivative \(D_c(L)\) keeps the words of \(L\) that start with \(c\), with that \(c\) removed: \(D_c(L) = \{ w \mid cw \in L \}\). Deriving by a word means deriving by each of its characters in turn.

L = \{ foo, bar, baz \}
\begin{aligned} D_b(L) &= \{ ar, az \} \\ D_f(L) &= \{ oo \} \\ D_q(L) &= \emptyset \\ D_{foo}(L) &= \{ \epsilon \} \end{aligned}
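For a finite language the derivative can be computed directly: keep the words that start with the given character and drop that character. The sketch below assumes the language is stored as a Python set of strings, with the empty string standing in for \(\epsilon\). The names `derive` and `derive_word` are our own and are for illustration only.

```python
from __future__ import annotations


def derive(lang: set[str], c: str) -> set[str]:
    """D_c(L): keep the words of L that start with c, with that c removed."""
    return {w[1:] for w in lang if w[:1] == c}


def derive_word(lang: set[str], word: str) -> set[str]:
    """D_w(L): derive by each character of w in turn."""
    for c in word:
        lang = derive(lang, c)
    return lang


L = {"foo", "bar", "baz"}
print(sorted(derive(L, "b")))         # ['ar', 'az']
print(sorted(derive(L, "f")))         # ['oo']
print(sorted(derive(L, "q")))         # []  (the empty language)
print(sorted(derive_word(L, "foo")))  # ['']  (the empty string plays the role of ε)
```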

The Brzozowski derivative

on regular expressions

S \rightarrow (ab)^{*}
\begin{aligned} D_c(\emptyset) &= \emptyset \\ D_c(\epsilon) &= \emptyset \\ D_c(c) &= \epsilon \\ D_c(c') &= \emptyset \quad (c' \neq c) \\ D_c(P^{*}) &= D_c(P) \circ P^{*} \\ D_c(P \cup S) &= D_c(P) \cup D_c(S) \\ D_c(P \circ S) &= D_c(P) \circ S \cup (\delta(P) \circ D_c(S)) \end{aligned}

\(\delta\) is the nullability function: \(\delta(L) = \{ \epsilon \}\) if \(\epsilon \in L\), and \(\emptyset\) otherwise
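The rules can be sketched in Python, assuming regular expressions are encoded as nested tuples (an encoding chosen only for illustration). In this sketch nullability is a boolean rather than the language \(\{\epsilon\}\) or \(\emptyset\), so the \(\delta(P) \circ D_c(S)\) branch becomes an `if`. A word is accepted when the derivative by that word is nullable.

```python
# Regular expressions as nested tuples (illustrative encoding):
#   ("empty",) = ∅      ("eps",) = ε        ("chr", c) = the character c
#   ("alt", P, S) = P ∪ S   ("cat", P, S) = P ∘ S   ("star", P) = P*
EMPTY, EPS = ("empty",), ("eps",)


def nullable(r) -> bool:
    """δ as a boolean: True iff ε is in the language of r."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return False  # "empty" and "chr"


def derive(r, c: str):
    """D_c(r), one rule per case."""
    tag = r[0]
    if tag == "chr":
        return EPS if r[1] == c else EMPTY        # D_c(c) = ε, D_c(c') = ∅
    if tag == "alt":
        return ("alt", derive(r[1], c), derive(r[2], c))
    if tag == "cat":
        first = ("cat", derive(r[1], c), r[2])    # D_c(P) ∘ S
        if nullable(r[1]):                        # δ(P) ∘ D_c(S) is ∅ unless P is nullable
            return ("alt", first, derive(r[2], c))
        return first
    if tag == "star":
        return ("cat", derive(r[1], c), r)        # D_c(P*) = D_c(P) ∘ P*
    return EMPTY                                  # D_c(∅) = D_c(ε) = ∅


def matches(r, word: str) -> bool:
    """word is in L(r) iff the derivative of r by word is nullable."""
    for c in word:
        r = derive(r, c)
    return nullable(r)


ab_star = ("star", ("cat", ("chr", "a"), ("chr", "b")))   # S → (ab)*
print(matches(ab_star, "ab"))    # True
print(matches(ab_star, "abab"))  # True
print(matches(ab_star, "aba"))   # False
```

No simplification is applied, so the intermediate terms grow with every step. They still denote the right languages.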

\begin{aligned} \delta(\emptyset) &= \emptyset \\ \delta(\epsilon) &= \{ \epsilon \}\\ \delta(c) &= \emptyset\\ \delta(P \cup S) &= \delta(P) \cup \delta(S)\\ \delta(P \circ S) &= \delta(P) \cap \delta(S)\\ \delta(P^*) &= \{ \epsilon \} \end{aligned}

The derivative rules rely on \(\emptyset \circ L = \emptyset\) and \(\epsilon \circ L = L\): the term \(\delta(P) \circ D_c(S)\) vanishes unless \(P\) is nullable.
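These identities can also be applied while terms are being built, which keeps derivatives small. Below is a sketch of such "smart constructors" over tuple-encoded regular expressions (an illustrative encoding). The mirrored rules \(L \circ \emptyset = \emptyset\), \(L \circ \epsilon = L\) and \(\emptyset \cup L = L \cup \emptyset = L\) are additions of ours. They hold for any languages.

```python
# Regexes as nested tuples: ("empty",), ("eps",), ("chr", c),
# ("alt", P, S), ("cat", P, S), ("star", P).  (Illustrative encoding.)
EMPTY, EPS = ("empty",), ("eps",)


def cat(p, s):
    """P ∘ S, simplified on construction."""
    if p == EMPTY or s == EMPTY:
        return EMPTY        # ∅ ∘ L = L ∘ ∅ = ∅
    if p == EPS:
        return s            # ε ∘ L = L
    if s == EPS:
        return p            # L ∘ ε = L
    return ("cat", p, s)


def alt(p, s):
    """P ∪ S, simplified on construction."""
    if p == EMPTY:
        return s            # ∅ ∪ L = L
    if s == EMPTY:
        return p            # L ∪ ∅ = L
    return ("alt", p, s)


a, b = ("chr", "a"), ("chr", "b")
ab_star = ("star", ("cat", a, b))

# D_a((ab)*) with D_a(a) = ε, δ(a) = ∅ and D_a(b) = ∅ substituted by hand:
# (D_a(a) ∘ b ∪ δ(a) ∘ D_a(b)) ∘ (a ∘ b)*
d_a = cat(alt(cat(EPS, b), cat(EMPTY, EMPTY)), ab_star)
print(d_a == ("cat", b, ab_star))   # True: b ∘ (a ∘ b)*
```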

\begin{aligned} D_a(L) &= D_a((a \circ b)^{*}) \\ &= D_a(a \circ b) \circ (a \circ b)^{*} \\ &= (D_a(a) \circ b \cup \delta(a) \circ D_a(b)) \circ (a \circ b)^{*} \\ &= (\epsilon \circ b \cup \emptyset \circ \emptyset) \circ (a \circ b)^{*} \\ &= b \circ (a \circ b)^{*} \end{aligned}

To check that \(ab\) is in \(L = (ab)^*\), derive by \(a\) and then by \(b\), and test whether the result is nullable, i.e. whether \(\epsilon \in \delta(D_{ab}(L))\)

\begin{aligned} D_{ab}(L) &= D_b(D_a(L)) \\ &= D_b(b \circ (a \circ b)^{*}) \\ &= D_b(b) \circ (a \circ b)^{*} \cup \delta(b) \circ D_b((a \circ b)^{*}) \\ &= \epsilon \circ (a \circ b)^{*} \cup \emptyset \\ &= (a \circ b)^{*} \end{aligned}

Kleene star is always nullable (\(\delta(P^*) = \{ \epsilon \}\)), so \(ab\) is accepted

The Brzozowski derivative

on context-free grammars

\(S \rightarrow aSb \ |\ \epsilon\)

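The derivative works the same way as on regular languages, but it is harder to compute because grammars can be recursive: the derivative of a nonterminal can refer to itself. For example, for the left-recursive language \(L = L \circ \{x\} \cup \epsilon\) (that is, \(x^*\)), \(D_x(L)\) is defined in terms of itself: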

\begin{aligned} L = L \circ \{x\} \cup \epsilon \end{aligned}
\begin{aligned} D_x(L) &= D_x(L \circ \{x\}) \cup D_x(\epsilon) \\ &= D_x(L) \circ \{x\} \cup \delta(L) \circ D_x(\{x\}) \cup \emptyset \\ &= D_x(L) \circ \{x\} \cup \epsilon \end{aligned}

Since \(L\) is nullable, \(\delta(L) = \{ \epsilon \}\) and \(D_x(\{x\}) = \epsilon\), so the middle term is just \(\epsilon\).
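To see the problem concretely, here is a deliberately naive sketch. It stores the grammar as a Python dict from nonterminal names to tuple-encoded right-hand sides (an illustrative encoding) and derives eagerly. For \(L = L \circ \{x\} \cup \epsilon\), computing \(D_x(L)\) immediately needs \(D_x(L)\) again, so Python gives up with a `RecursionError`.

```python
# Naive, eager derivatives over a grammar with named nonterminals.
# Encoding (illustrative): ("ref", name) refers to a rule in RULES.
EMPTY, EPS = ("empty",), ("eps",)
RULES = {"L": ("alt", ("cat", ("ref", "L"), ("chr", "x")), EPS)}  # L = L ∘ {x} ∪ ε


def nullable(r) -> bool:
    tag = r[0]
    if tag == "ref":
        return nullable(RULES[r[1]])          # follows the recursion eagerly
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return tag == "eps"


def derive(r, c: str):
    tag = r[0]
    if tag == "ref":
        return derive(RULES[r[1]], c)         # D_x(L) is needed to compute D_x(L)
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        return ("alt", derive(r[1], c), derive(r[2], c))
    if tag == "cat":
        first = ("cat", derive(r[1], c), r[2])
        return ("alt", first, derive(r[2], c)) if nullable(r[1]) else first
    return EMPTY


try:
    derive(("ref", "L"), "x")
    diverged = False
except RecursionError:
    diverged = True
print(diverged)   # True
```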

Step 1: Laziness

\begin{aligned} L = L \circ \{x\} \cup \epsilon \end{aligned}
\begin{aligned} D_x(L) &= D_x(L) \circ \{x\} \cup \epsilon \end{aligned}

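Unfold only when needed: represent \(D_x(L)\) as a lazy node, so that building it does not immediately recurse into \(D_x(L)\) again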
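The following is a minimal sketch of this step in Python. It uses a hand-written `Delay` class (not a library type) that wraps a zero-argument function and evaluates it at most once. The grammar node for \(L\) refers to itself through a `Delay`, and the derivative of a delayed node is again a `Delay`, so building \(D_x(L)\) no longer loops. \(\delta(P)\) is kept as a symbolic `("delta", P)` node, exactly as in the rule \(D_c(P \circ S) = D_c(P) \circ S \cup \delta(P) \circ D_c(S)\). That way no nullability check is needed while the derivative is built. The tuple encoding is illustrative.

```python
# Tuple-encoded grammar nodes (illustrative): ("empty",), ("eps",), ("chr", c),
# ("alt", P, S), ("cat", P, S), ("delta", P), plus Delay for lazy references.
EMPTY, EPS = ("empty",), ("eps",)


class Delay:
    """A lazily evaluated node: the thunk runs on the first force() only."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._thunk()
            self._forced = True
        return self._value


def derive(r, c: str):
    if isinstance(r, Delay):
        return Delay(lambda: derive(r.force(), c))   # unfold only when needed
    tag = r[0]
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        return ("alt", derive(r[1], c), derive(r[2], c))
    if tag == "cat":                                 # D_c(P) ∘ S ∪ δ(P) ∘ D_c(S)
        return ("alt", ("cat", derive(r[1], c), r[2]),
                       ("cat", ("delta", r[1]), derive(r[2], c)))
    return EMPTY                                     # ∅, ε and δ(P) derive to ∅


# L = L ∘ {x} ∪ ε; the lambda looks up the name L only when forced.
L = Delay(lambda: ("alt", ("cat", L, ("chr", "x")), EPS))

d = derive(L, "x")             # returns immediately: nothing is unfolded yet
inner = d.force()[1][1][1]     # the D_x(L) inside D_x(L)
print(isinstance(inner, Delay), inner is d)   # True False
```

The inner `Delay` is a fresh object rather than `d` itself. Forcing deeper would therefore keep creating new nodes for the same derivative.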

Step 2: Memoization of \(\delta\) in \(D_c\)

\begin{aligned} L = L \circ \{x\} \cup \epsilon \end{aligned}
\begin{aligned} D_x(L) &= D_x(L) \circ \{x\} \cup \epsilon \end{aligned}

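Don't repeat work: cache the results of \(D_c\) (and of \(\delta\)), so that the occurrence of \(D_x(L)\) on the right-hand side returns the node that is already being built, turning an infinite unfolding into a finite cyclic graph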
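Here is a sketch of memoization in Python, assuming a hand-written `Delay` thunk class and tuple-encoded nodes (both illustrative). `derive` caches its result per (node, character) pair before anything forces it. The recursive occurrence of \(D_x(L)\) then returns the very node being built, so the derivative becomes a cycle that mirrors \(D_x(L) = D_x(L) \circ \{x\} \cup \epsilon\). The extra \(\emptyset\) and \(\delta\) terms are not simplified away here.

```python
EMPTY, EPS = ("empty",), ("eps",)


class Delay:
    """A lazily evaluated node: the thunk runs on the first force() only."""

    def __init__(self, thunk):
        self._thunk, self._forced, self._value = thunk, False, None

    def force(self):
        if not self._forced:
            self._value, self._forced = self._thunk(), True
        return self._value


_cache: dict = {}   # (node, character) -> derivative


def derive(r, c: str):
    key = (r, c)                   # tuples hash by value, Delay objects by identity
    if key in _cache:
        return _cache[key]
    if isinstance(r, Delay):
        result = Delay(lambda: derive(r.force(), c))   # cached before it is forced
    else:
        tag = r[0]
        if tag == "chr":
            result = EPS if r[1] == c else EMPTY
        elif tag == "alt":
            result = ("alt", derive(r[1], c), derive(r[2], c))
        elif tag == "cat":       # D_c(P) ∘ S ∪ δ(P) ∘ D_c(S)
            result = ("alt", ("cat", derive(r[1], c), r[2]),
                             ("cat", ("delta", r[1]), derive(r[2], c)))
        else:
            result = EMPTY       # ∅, ε and δ(P) derive to ∅
    _cache[key] = result
    return result


L = Delay(lambda: ("alt", ("cat", L, ("chr", "x")), EPS))   # L = L ∘ {x} ∪ ε

d = derive(L, "x")
inner = d.force()[1][1][1]     # the D_x(L) inside D_x(L)
print(inner is d)              # True: the recursion is now a cycle
```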

Step 3: Calculation of \(\delta\) as a least fixed point

\begin{aligned} L = L \circ \{x\} \cup \epsilon \end{aligned}

1. Start by assuming that nothing is nullable except \(\epsilon\) itself.

2. For every nonterminal, check its productions using only the current set, without recursing.

If at least one production's right-hand side consists entirely of nullable symbols, add the nonterminal to the set of nullable values.

3. Repeat step 2 until \(Nullable_{n} = Nullable_{n + 1}\).
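These steps translate almost directly into a loop. The sketch assumes a grammar given as a Python dict from each nonterminal to a list of alternatives. Each alternative is a list of symbols, an empty list stands for \(\epsilon\), and any symbol that is not a key counts as a terminal. Each round consults only the previous round's set, so nothing recurses. `nullable_set` is a name of our own.

```python
from __future__ import annotations


def nullable_set(grammar: dict[str, list[list[str]]]) -> set[str]:
    """Least fixed point of nullability, computed by iteration."""
    nullable: set[str] = set()            # step 1: no nonterminal known to be nullable
    while True:
        # Step 2: a nonterminal is nullable if at least one alternative consists
        # only of symbols already known to be nullable (all([]) is True, so an
        # ε-alternative counts). Only the previous round's set is consulted.
        new = {
            nt
            for nt, alternatives in grammar.items()
            if any(all(sym in nullable for sym in alt) for alt in alternatives)
        }
        if new == nullable:               # step 3: Nullable_n = Nullable_{n+1}
            return nullable
        nullable = new


g1 = {"P": [["S"]], "S": [["T", "S"], ["a"]], "T": [[]]}   # S → TS | a
g2 = {"P": [["S"]], "S": [["T", "S"], []], "T": [[]]}      # S → TS | ε
print(sorted(nullable_set(g1)))   # ['T']
print(sorted(nullable_set(g2)))   # ['P', 'S', 'T']
```

Starting from the empty set is what makes this the *least* fixed point. A rule like \(A \rightarrow A\) on its own never makes \(A\) nullable.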

Example: two grammars that differ only in the second alternative for \(S\)

\begin{aligned} P &\rightarrow S\\ S &\rightarrow TS \ |\ a\\ T &\rightarrow \epsilon \end{aligned}
\begin{aligned} P &\rightarrow S\\ S &\rightarrow TS \ |\ \epsilon\\ T &\rightarrow \epsilon \end{aligned}

In the first grammar only \(T\) is nullable. \(TS\) could only be nullable if \(S\) already were, so the least fixed point leaves \(S\), and therefore \(P\), out. In the second grammar \(P\), \(S\) and \(T\) are all nullable.
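To watch the fixed point being reached on these two grammars, a small variant of the iteration can yield every intermediate set \(Nullable_0, Nullable_1, \dots\). It uses the same assumed encoding: a dict from nonterminal to a list of alternatives, with `[]` for \(\epsilon\). `nullable_rounds` is a hypothetical helper name.

```python
from __future__ import annotations

from typing import Iterator


def nullable_rounds(grammar: dict[str, list[list[str]]]) -> Iterator[frozenset[str]]:
    """Yield Nullable_0, Nullable_1, ... until the set stops changing."""
    current: frozenset[str] = frozenset()
    yield current
    while True:
        nxt = frozenset(
            nt
            for nt, alternatives in grammar.items()
            if any(all(sym in current for sym in alt) for alt in alternatives)
        )
        if nxt == current:
            return
        current = nxt
        yield current


first = {"P": [["S"]], "S": [["T", "S"], ["a"]], "T": [[]]}    # S → TS | a
second = {"P": [["S"]], "S": [["T", "S"], []], "T": [[]]}      # S → TS | ε
for name, g in [("first", first), ("second", second)]:
    print(name, [sorted(s) for s in nullable_rounds(g)])
# first [[], ['T']]
# second [[], ['S', 'T'], ['P', 'S', 'T']]
```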

Building Parse Trees

We have only done validation (recognition) so far; now let's actually parse something and build parse trees

The key insight is that \(D_a(a)\) reduces to \(\epsilon \downarrow \{ a \}\)

Likewise, the epsilons in our grammar can carry a parse tree of their own, e.g.

\(S \rightarrow aSb \ |\ \epsilon \downarrow \{ s \}\)

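This gives enough information to retrace our steps later. When two such nodes are concatenated, their trees are paired up: \(\epsilon \downarrow T_1 \circ \epsilon \downarrow T_2 = \epsilon \downarrow (T_1 \times T_2)\). Below, a reduced \(\epsilon \downarrow T\) is written simply as \(T\).

Note: \(\epsilon \downarrow\) is like monadic return or applicative pure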
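Here is a minimal sketch of the idea in Python. It assumes tuple-encoded nodes and the hypothetical helpers `pure`, `derive_token` and `trees`. The derivative of a matching token is an \(\epsilon\)-node carrying that token as its parse tree. Reading the trees back out of an \(\epsilon\)-node is how the parse result is recovered once the input is consumed.

```python
EMPTY = ("empty",)


def pure(*trees):
    """ε ↓ {trees}: matches the empty string and records its parse trees."""
    return ("eps", frozenset(trees))


def derive_token(token: str, c: str):
    """D_c(token): ε ↓ {c} if the token matches, ∅ otherwise."""
    return pure(c) if token == c else EMPTY


def trees(node) -> frozenset:
    """The parse trees of a node that matches the empty string."""
    return node[1] if node[0] == "eps" else frozenset()


print(derive_token("a", "a"))            # ('eps', frozenset({'a'}))
print(trees(derive_token("a", "b")))     # frozenset()
```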

D_{aabb}(S) = D_a(a) \circ (D_a(a) \circ (\epsilon \downarrow \{ s \} \circ (D_b(b) \circ D_b(b))))
\begin{aligned} & = \{ a \} \times (D_a(a) \circ (\epsilon \downarrow \{ s \} \circ (D_b(b) \circ D_b(b)))) \\ & = \{ a \} \times (\{ a \} \times (\epsilon \downarrow \{ s \} \circ (D_b(b) \circ D_b(b)))) \\ & = \{ a \} \times (\{ a \} \times (\{ s \} \times (D_b(b) \circ D_b(b)))) \\ & = \{ a \} \times (\{ a \} \times (\{ s \} \times (\{ b \} \times D_b(b)))) \\ & = \{ a \} \times (\{ a \} \times (\{ s \} \times (\{ b \} \times \{ b \}))) \\ & = \{ a \} \times (\{ a \} \times (\{ s \} \times \{ (b, b) \})) \\ & = \{ a \} \times (\{ a \} \times \{ (s, (b, b)) \}) \\ & = \{ a \} \times \{ (a, (s, (b, b))) \} \\ & = \{ (a, (a, (s, (b, b)))) \} \end{aligned}
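The last steps of this derivation only use one rule: concatenating \(\epsilon\)-nodes takes the Cartesian product of their trees. The sketch below implements just that rule, using tuple-encoded nodes and the helper names `eps` and `cat` as assumptions for illustration. It is applied to the five reduced pieces of \(D_{aabb}(S)\), associated to the right as in the derivation.

```python
from itertools import product

EMPTY = ("empty",)


def eps(trees):
    """ε ↓ T: an ε-node carrying the set of parse trees T."""
    return ("eps", frozenset(trees))


def cat(p, s):
    """ε↓T1 ∘ ε↓T2 = ε↓(T1 × T2); anything concatenated with ∅ is ∅."""
    if p == EMPTY or s == EMPTY:
        return EMPTY
    return eps(product(p[1], s[1]))


# D_a(a), D_a(a), ε↓{s}, D_b(b), D_b(b) after each has reduced to an ε-node.
pieces = [eps({"a"}), eps({"a"}), eps({"s"}), eps({"b"}), eps({"b"})]

result = pieces[-1]
for piece in reversed(pieces[:-1]):   # right-associated, as in the derivation
    result = cat(piece, result)

print(result[1])   # frozenset({('a', ('a', ('s', ('b', 'b'))))})
```

If either side carried several trees (an ambiguous parse), the product would contain every combination of them.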

Practicality

Pros

  • Simple
  • Short
  • Quite fast (with further optimizations)

Cons

  • Very difficult to get usable parse output
  • Hard to design a good API
  • If you know the specific grammar you need to parse, a parser specialized for it will probably do better

(in strongly-typed languages)

Conclusion

  • Derivatives can be used for parsing
  • It is really elegant
  • but needs substantial work to become practical