Parsing with Derivatives
A general way to parse context-free grammars
Folkert de Vries
November 27, 2018
The Brzozowski derivative
for regular languages
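For reference, the standard definition (stated here from the general literature, not taken from the slide itself): the derivative of a language \(L\) with respect to a character \(c\) keeps the words of \(L\) that start with \(c\) and strips that first character.

\[
D_c(L) = \{\, w \mid cw \in L \,\}
\]

A word \(c_1 c_2 \dots c_n\) is then in \(L\) exactly when \(\epsilon \in D_{c_n}(\dots D_{c_2}(D_{c_1}(L)) \dots)\).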
The Brzozowski derivative
on regular expressions
\(\delta\) is the nullability function: it checks whether \(\epsilon\) is in \(L\)
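The rule for concatenation, \(D_c(L_1 \circ L_2) = D_c(L_1) \circ L_2 \cup \delta(L_1) \circ D_c(L_2)\), uses that \(\emptyset \circ L = \emptyset\) and \(\epsilon \circ L = L\)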
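A minimal sketch of a derivative-based matcher for regular expressions, assuming the usual textbook derivative rules. The class and function names (`Empty`, `Eps`, `Char`, `Cat`, `Alt`, `Star`, `derive`, `matches`) are illustrative and not taken from the talk. The smart constructors `cat` and `alt` apply the identities \(\emptyset \circ L = \emptyset\) and \(\epsilon \circ L = L\) so that derivatives stay small.

```python
from dataclasses import dataclass


class Regex:
    """Base class for regular-expression nodes (hypothetical names)."""


@dataclass(frozen=True)
class Empty(Regex):   # the empty language
    pass


@dataclass(frozen=True)
class Eps(Regex):     # the language containing only the empty string
    pass


@dataclass(frozen=True)
class Char(Regex):
    c: str


@dataclass(frozen=True)
class Cat(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Alt(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Star(Regex):
    inner: Regex


def nullable(r):
    """True when the empty string is in the language of r."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    return False  # Empty and Char


def delta(r):
    """Eps if r is nullable, Empty otherwise."""
    return Eps() if nullable(r) else Empty()


def cat(left, right):
    """Concatenation that simplifies with Empty and Eps."""
    if isinstance(left, Empty) or isinstance(right, Empty):
        return Empty()
    if isinstance(left, Eps):
        return right
    if isinstance(right, Eps):
        return left
    return Cat(left, right)


def alt(left, right):
    """Union that drops Empty branches."""
    if isinstance(left, Empty):
        return right
    if isinstance(right, Empty):
        return left
    return Alt(left, right)


def derive(c, r):
    """Derivative of r with respect to the character c."""
    if isinstance(r, Char):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return alt(derive(c, r.left), derive(c, r.right))
    if isinstance(r, Cat):
        return alt(cat(derive(c, r.left), r.right),
                   cat(delta(r.left), derive(c, r.right)))
    if isinstance(r, Star):
        return cat(derive(c, r.inner), r)
    return Empty()  # Empty and Eps


def matches(r, text):
    """Derive by each character in turn, then test nullability."""
    for c in text:
        r = derive(c, r)
    return nullable(r)


ab_star = Star(Cat(Char("a"), Char("b")))
```

With this sketch, `matches(ab_star, "ab")` derives twice and ends up back at `ab_star`, which is nullable.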
The Brzozowski derivative
on regular expressions
Check whether \(ab\) is in \(L = (ab)^*\)
i.e. whether \(\epsilon \in \delta(D_{ab}(L))\)
Deriving by \(a\) and then by \(b\) gives \(D_{ab}(L) = (ab)^*\) again; a Kleene star is always nullable, so \(ab\) is accepted
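Spelled out with the usual rules for concatenation and Kleene star, \(D_c(L_1 \circ L_2) = D_c(L_1) \circ L_2 \cup \delta(L_1) \circ D_c(L_2)\) and \(D_c(L^*) = D_c(L) \circ L^*\), the two derivative steps are:

\[
\begin{aligned}
D_a((ab)^*) &= D_a(ab) \circ (ab)^* = b \circ (ab)^* \\
D_b(b \circ (ab)^*) &= D_b(b) \circ (ab)^* \cup \delta(b) \circ D_b((ab)^*) = \epsilon \circ (ab)^* \cup \emptyset = (ab)^*
\end{aligned}
\]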
The Brzozowski derivative
on context-free grammars
\(S \rightarrow aSb \ |\ \epsilon\)
Works the same as on regular languages, but is harder to compute: the grammar is recursive, so naively unfolding the derivative may never terminate
The Brzozowski derivative
on context-free grammars
Step 1: Laziness
Unfold only when needed
The Brzozowski derivative
on context-free grammars
Step 2: Memoization of \(\delta\) in \(D_c\)
Don't repeat work
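Steps 1 and 2 can be sketched together as a small recognizer. This is an illustration under stated assumptions, not the implementation from the talk: the `Node` class, its `kind` tags, and the helper names are hypothetical. Children are stored as thunks so that a recursive grammar is only unfolded on demand, and each node caches its derivative per character. Nullability is computed by a simple iteration over the reachable nodes (the subject of Step 3), because a plain recursive check would loop on cyclic grammars.

```python
class Node:
    """Grammar node whose children are thunks, forced only on demand."""

    def __init__(self, kind, *thunks, char=None):
        self.kind = kind        # 'empty', 'eps', 'char', 'alt' or 'cat'
        self.char = char
        self.thunks = thunks    # zero-argument callables producing children
        self.kids = [None] * len(thunks)
        self.memo = {}          # character -> derivative computed earlier

    def kid(self, i):
        if self.kids[i] is None:          # Step 1: unfold only when needed
            self.kids[i] = self.thunks[i]()
        return self.kids[i]


EMPTY, EPS = Node("empty"), Node("eps")


def char(c):
    return Node("char", char=c)


def alt(left, right):       # left and right are thunks
    return Node("alt", left, right)


def cat(left, right):       # left and right are thunks
    return Node("cat", left, right)


def derive(c, n):
    if c not in n.memo:                   # Step 2: don't repeat work
        n.memo[c] = _derive(c, n)
    return n.memo[c]


def _derive(c, n):
    if n.kind in ("empty", "eps"):
        return EMPTY
    if n.kind == "char":
        return EPS if n.char == c else EMPTY
    if n.kind == "alt":
        return alt(lambda: derive(c, n.kid(0)), lambda: derive(c, n.kid(1)))
    # cat: D_c(L1 L2) = D_c(L1) L2, plus D_c(L2) when L1 is nullable
    first = cat(lambda: derive(c, n.kid(0)), lambda: n.kid(1))
    if nullable(n.kid(0)):
        return alt(lambda: first, lambda: derive(c, n.kid(1)))
    return first


def nullable(root):
    # Collect every reachable node, forcing thunks along the way.
    nodes, stack, seen = [], [root], set()
    while stack:
        n = stack.pop()
        if id(n) in seen:
            continue
        seen.add(id(n))
        nodes.append(n)
        stack.extend(n.kid(i) for i in range(len(n.thunks)))
    # Start with only eps nullable and iterate until nothing changes.
    val = {id(n): n.kind == "eps" for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if val[id(n)] or n.kind not in ("alt", "cat"):
                continue
            a, b = val[id(n.kid(0))], val[id(n.kid(1))]
            if (a or b) if n.kind == "alt" else (a and b):
                val[id(n)] = True
                changed = True
    return val[id(root)]


def recognize(grammar, text):
    for c in text:
        grammar = derive(c, grammar)
    return nullable(grammar)


# S -> a S b | epsilon, written with thunks so that S can refer to itself
S = alt(lambda: A, lambda: EPS)
A = cat(lambda: char("a"), lambda: B)
B = cat(lambda: S, lambda: char("b"))
```

The sketch does no compaction of the derived grammar, so it grows with every input character; that is one of the "more tricks" a practical version would need.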
The Brzozowski derivative
on context-free grammars
Step 3: Calculation of \(\delta\) as a least fixed point
1. Start from the assumption that only \(\epsilon\) is nullable
2. For every production, check the right-hand side against the current set of nullable values, without recursing.
If at least one alternative consists only of nullable values, add the left-hand side to the set
3. Repeat step 2 until \(Nullable_{n} = Nullable_{n + 1}\)
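The iteration above can be sketched on a plain table of productions. The representation is an assumption made for the example: a dict from each nonterminal to its list of alternatives, each alternative a list of symbols, with the empty list standing for \(\epsilon\). The function name is hypothetical.

```python
def nullable_nonterminals(productions):
    """Least fixed point of the nullable set for a context-free grammar."""
    nullable = set()
    while True:
        # One round: a head is nullable if some alternative contains only
        # symbols already known to be nullable (an empty alternative counts).
        next_nullable = {
            head
            for head, alternatives in productions.items()
            if any(all(sym in nullable for sym in rhs) for rhs in alternatives)
        }
        if next_nullable == nullable:     # Nullable_n == Nullable_{n+1}
            return nullable
        nullable = next_nullable


grammar = {
    "S": [["a", "S", "b"], []],   # S -> a S b | epsilon
    "T": [["S", "S"]],            # nullable because S is
    "U": [["a"]],                 # never nullable
    "V": [["T", "U"]],            # not nullable because U is not
}
```

Each round only looks at the set from the previous round, so no recursion into the grammar is needed, and the set can only grow until it stabilises.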
Building Parse Trees
So far we have only validated input; let's actually parse something
The key insight is that \(D_a(a)\) reduces to \(\epsilon \downarrow \{ a \}\)
And that we can let epsilons in our grammar reduce similarly, e.g.
\(S \rightarrow aSb \ |\ \epsilon \downarrow \{ s \}\)
This gives enough information to retrace our steps later
Note: this is like monadic return or applicative pure
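A minimal sketch of the idea, restricted to non-recursive expressions so that it stays short (recursive grammars would additionally need the laziness, memoization and fixed-point steps). All names are illustrative. `Eps` carries the set of partial parse results, so deriving `Char("a")` by `a` yields \(\epsilon \downarrow \{ a \}\), and `parse_null` collects the trees once the input is consumed. Representing trees as nested tuples is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:
    pass


@dataclass(frozen=True)
class Eps:
    trees: frozenset      # epsilon annotated with partial parse results


@dataclass(frozen=True)
class Char:
    c: str


@dataclass(frozen=True)
class Cat:
    left: object
    right: object


@dataclass(frozen=True)
class Alt:
    left: object
    right: object


def parse_null(r):
    """Parse trees that r assigns to the empty string."""
    if isinstance(r, Eps):
        return r.trees
    if isinstance(r, Alt):
        return parse_null(r.left) | parse_null(r.right)
    if isinstance(r, Cat):
        return frozenset((l, t)
                         for l in parse_null(r.left)
                         for t in parse_null(r.right))
    return frozenset()    # Empty and Char


def derive(c, r):
    if isinstance(r, Char):
        # D_a(a) reduces to epsilon annotated with {a}
        return Eps(frozenset([c])) if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(derive(c, r.left), derive(c, r.right))
    if isinstance(r, Cat):
        # Keep the trees of a nullable left side instead of a bare epsilon
        return Alt(Cat(derive(c, r.left), r.right),
                   Cat(Eps(parse_null(r.left)), derive(c, r.right)))
    return Empty()        # Empty and Eps


def parse(r, text):
    for c in text:
        r = derive(c, r)
    return parse_null(r)


# One unfolding of S -> a S b, with the inner S replaced by epsilon ↓ {s}
unfolded = Cat(Char("a"), Cat(Eps(frozenset(["s"])), Char("b")))
```

Here `parse(unfolded, "ab")` returns a set of nested tuples in which the annotation `"s"` sits between the two consumed characters, which is the information needed to retrace the steps.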
Practicality
Pros
- Simple
- Short
- Quite Fast (with more tricks)
Practicality
Cons
- Very difficult to get usable parse output
- Hard to design a good API
- You can probably do better if you know the specific grammar that you need to parse
(in strongly-typed languages)
Conclusion
- Derivatives can be used for parsing
- It is really elegant
- but needs substantial work to become practical