Parsing out a Good Parser
Parsing Parsers Outline
- Why Parse and What is a Parser
- Ways to Parse and What we will Parse
- Attempt #1 Regroup and Try Again #2
- Try #3 has got to work
- Never Give Up the Ghost #4
- What Works and What to Watch Out for
Why Parse
- people cannot understand binary data
- computers cannot understand people
- a parser translates from 'people' to 'computer'
- people are less than perfect in writing text
- computers require structural perfection
What is a Parser
- program to convert text into binary data
- but not any text, the text has to follow a grammar
- a grammar is a set of rules (we'll code these up soon)
- the binary data is also called an Abstract Syntax Tree (AST)
Parts of a Parser
- Lexer (optional)
- A parser generator (grammar compiler) OR a runtime library
- Actions: functions that translate to the Abstract Syntax Tree (AST) elements of your choice
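As a rough sketch of how a lexer and actions fit together, here is a minimal regex-based lexer plus two action functions. The token set (identifiers, double-quoted strings, parentheses) and the names `TOKEN_RULES`, `lex` and `actions` are assumptions made for illustration; they are not taken from the demo repo.

```javascript
// Hypothetical token rules: [type, sticky regex]. The "y" flag makes each
// pattern match only at the position stored in lastIndex.
const TOKEN_RULES = [
  ["space", /\s+/y],
  ["lparen", /\(/y],
  ["rparen", /\)/y],
  ["string", /"[^"]*"/y],
  ["identifier", /[A-Za-z_][A-Za-z0-9_]*/y],
];

// Lexer: turn raw text into a flat array of { type, text } tokens.
function lex(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [type, pattern] of TOKEN_RULES) {
      pattern.lastIndex = pos;
      const m = pattern.exec(input);
      if (m) {
        if (type !== "space") tokens.push({ type, text: m[0] }); // drop whitespace
        pos = pattern.lastIndex;
        matched = true;
        break;
      }
    }
    if (!matched) throw new SyntaxError(`Unexpected "${input[pos]}" at ${pos}`);
  }
  return tokens;
}

// Actions: one function per token type that builds the AST element you want.
const actions = {
  identifier: (text) => ({ kind: "Identifier", name: text }),
  string: (text) => ({ kind: "String", value: text.slice(1, -1) }), // strip quotes
};

// Usage: lex first, then apply the matching action to each token.
const tokens = lex('hello "big world"');
const nodes = tokens.map((t) => actions[t.type](t.text));
console.log(nodes);
```

The lexer only knows about characters; the actions decide what the resulting tree elements look like, so either part can change without touching the other.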
All the Ways to Parse
- regular expressions (v)
- parser generators using PEG files
- hand coded (recursive descent) parsers
- parser combinators
- others
(techniques)
What to Parse
- identifiers
- strings "in quotes"
- lists (of identifiers, strings and of course nested lists)
- comments
a modest wish list
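To make the wish list concrete, here is a small hand-written recursive descent sketch that covers all four items. The concrete syntax is an assumption, not the one used in the talk: lists are wrapped in parentheses, items are separated by whitespace, and comments run from `#` to the end of the line. The function name `parse` and the node shapes are likewise illustrative.

```javascript
// Assumed grammar (for illustration only):
//   list := "(" item* ")"      item := identifier | string | list
//   comments run from "#" to the end of the line
function parse(input) {
  let pos = 0;

  // Skip whitespace and "#" comments between items.
  function skip() {
    for (;;) {
      while (pos < input.length && /\s/.test(input[pos])) pos++;
      if (input[pos] !== "#") return;
      while (pos < input.length && input[pos] !== "\n") pos++;
    }
  }

  function parseIdentifier() {
    const m = /^[A-Za-z_][A-Za-z0-9_]*/.exec(input.slice(pos));
    if (!m) throw new SyntaxError(`Expected an item at position ${pos}`);
    pos += m[0].length;
    return { type: "identifier", name: m[0] };
  }

  function parseString() {
    const end = input.indexOf('"', pos + 1); // no escape sequences in this sketch
    if (end === -1) throw new SyntaxError(`Unterminated string at position ${pos}`);
    const value = input.slice(pos + 1, end);
    pos = end + 1;
    return { type: "string", value };
  }

  function parseList() {
    pos++; // consume "("
    const items = [];
    for (;;) {
      skip();
      if (pos >= input.length) throw new SyntaxError("Missing closing parenthesis");
      if (input[pos] === ")") {
        pos++;
        return { type: "list", items };
      }
      items.push(parseItem()); // recursion handles nested lists
    }
  }

  // One function per grammar rule; the first character picks the rule.
  function parseItem() {
    if (input[pos] === "(") return parseList();
    if (input[pos] === '"') return parseString();
    return parseIdentifier();
  }

  skip();
  const ast = parseItem();
  skip();
  if (pos < input.length) throw new SyntaxError(`Unexpected text at position ${pos}`);
  return ast;
}

// Usage
console.log(JSON.stringify(parse('(a "b c" (d) # a comment\n e)'), null, 2));
```

Nested lists are the item that plain regular expressions cannot handle on their own, which is why the list rule calls back into the item rule.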
Test Driven Demo
https://github.com/nmorse/set-parsers-to-stun
canopy
parser generator for languages [python java javascript ruby]
nearley
generates javascript
also generates RR diagrams
a hand coded parser
with the help of xState (a finite state machine lib)
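The state machine idea can be sketched without any library. The example below uses a plain transition table rather than xState's API, and it only tokenizes identifiers and quoted strings; the state names, the `classify` helper and the `tokenize` function are all hypothetical.

```javascript
// Plain-object state machine: each state maps a character class to the next
// state. This mimics the idea only; it is not how xState machines are declared.
const transitions = {
  idle:       { word: "identifier", quote: "string", space: "idle" },
  identifier: { word: "identifier", space: "idle" },
  string:     { word: "string", space: "string", other: "string", quote: "idle" },
};

// Reduce every character to one of four classes (the machine's "events").
function classify(ch) {
  if (ch === '"') return "quote";
  if (/\s/.test(ch)) return "space";
  if (/[A-Za-z0-9_]/.test(ch)) return "word";
  return "other";
}

function tokenize(input) {
  const tokens = [];
  let state = "idle";
  let buffer = "";
  for (const ch of input + " ") { // trailing space flushes a final identifier
    const next = transitions[state][classify(ch)];
    if (next === undefined) {
      throw new SyntaxError(`Unexpected "${ch}" in state "${state}"`);
    }
    if (state === "idle") {
      buffer = next === "identifier" ? ch : ""; // start a new token
    } else if (next === state) {
      buffer += ch; // still inside the same token
    } else {
      tokens.push({ type: state, text: buffer }); // leaving a token state
    }
    state = next;
  }
  if (state !== "idle") throw new SyntaxError("Unterminated string");
  return tokens;
}

// Usage
console.log(tokenize('ab "c, d"'));
```

Keeping the transitions in a table means an illegal input shows up as a missing entry, which gives a natural place to report a friendly error.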
arcsecond
A set of parser "combinators"
(functions that take other functions as arguments and return (yes) new functions)
Compose them (combine them) into a parser
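To show what "functions that return new functions" looks like in practice, here is a home-made set of combinators. These helpers (`regex`, `sequenceOf`, `choice`, `many`, `map`) are written from scratch for illustration and should not be read as arcsecond's implementation or exact API; the parser shape `(input, pos) => { ok, value, pos }` is an assumption of this sketch.

```javascript
// A parser is a function (input, pos) => { ok, value, pos }.
const fail = (pos) => ({ ok: false, pos });

// Match a regular expression at the current position.
const regex = (re) => (input, pos) => {
  const sticky = new RegExp(re.source, "y");
  sticky.lastIndex = pos;
  const hit = sticky.exec(input);
  return hit ? { ok: true, value: hit[0], pos: sticky.lastIndex } : fail(pos);
};

// Run parsers one after another and collect their values.
const sequenceOf = (parsers) => (input, pos) => {
  const values = [];
  for (const p of parsers) {
    const r = p(input, pos);
    if (!r.ok) return r;
    values.push(r.value);
    pos = r.pos;
  }
  return { ok: true, value: values, pos };
};

// Try each alternative in order and return the first success.
const choice = (parsers) => (input, pos) => {
  for (const p of parsers) {
    const r = p(input, pos);
    if (r.ok) return r;
  }
  return fail(pos);
};

// Apply a parser zero or more times (stops if no input is consumed).
const many = (parser) => (input, pos) => {
  const values = [];
  for (;;) {
    const r = parser(input, pos);
    if (!r.ok || r.pos === pos) return { ok: true, value: values, pos };
    values.push(r.value);
    pos = r.pos;
  }
};

// Transform a successful result; this plays the role of an action.
const map = (parser, fn) => (input, pos) => {
  const r = parser(input, pos);
  return r.ok ? { ...r, value: fn(r.value) } : r;
};

// Compose them into a parser for identifiers and quoted strings.
const ws = regex(/\s*/);
const identifier = map(regex(/[A-Za-z_]\w*/), (name) => ({ type: "identifier", name }));
const string = map(regex(/"[^"]*"/), (s) => ({ type: "string", value: s.slice(1, -1) }));
const item = map(sequenceOf([ws, choice([identifier, string])]), ([, node]) => node);
const items = many(item);

// Usage
console.log(items('foo "bar baz" qux', 0).value);
```

Every building block has the same shape, so larger parsers are assembled the same way small ones are.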
| | canopy | nearley | hand code | arcsecond |
|---|---|---|---|---|
| learning curve | +1 | +3 | -2 | -1 |
| features | +1 | +3 | 0 | +3 |
| friendly errors | 0 | 0 | +1 | +1 |
| following | -1 | +3 | 0 | +2 |
| bottom line | +1 | +9 | -1 | +5 |
What Works and What to Watch Out for
Thank You
https://github.com/nmorse/set-parsers-to-stun
https://nearley.js.org/
https://xstate.js.org/viz/
https://github.com/francisrstokes/arcsecond
Parsing all the
By Nate Morse