I made a DSL

and I liked it

Why?

We wanted to give our end users the same tools as our analysts.

Why?

Sometimes our users know their problem area better.

Just give them the tools.

Why make a DSL?

All existing rules were written in Python and simply passed to eval() in production.

Did you consider using?

Authoring Requirements

Prevent injection attacks
Give feedback to users

Injection Attacks

locals()

import

match

block these?

allow these?

Is it safer to....

!=
in

✅
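
The allow-versus-block question above can be sketched as an allowlist check. This is only an illustration built on Python's ast module, not the tokenizer described later in the talk; the node list and the variable names in ALLOWED_NAMES are assumptions.

```python
import ast

# Hypothetical allowlist: the only node types a rule may contain.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or,
    ast.Compare, ast.Eq, ast.NotEq, ast.In, ast.NotIn,
    ast.Name, ast.Load, ast.Constant, ast.List,
)

# Hypothetical variable names; the real set is not given in the slides.
ALLOWED_NAMES = {"asn", "ptr_org", "h_from"}


def is_allowed(rule: str) -> bool:
    """Return True only if every node in the rule is on the allowlist."""
    try:
        tree = ast.parse(rule, mode="eval")
    except SyntaxError:
        # Statements such as `import os` are not expressions at all.
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            return False
    return True
```

With an allowlist, anything nobody thought of (function calls, attribute access, new syntax such as match) is rejected by default, whereas a blocklist has to anticipate every dangerous construct in advance.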

What kind of feedback?

Syntax

Semantics

abc == 123
asn == 1234 and
asn == 123 and ( ptr_org == 'foo.com' or h_from == 'm.foo.com'

asn == '123'  # ASN is always an integer

ptr_org in asn  # ptr_org always string

regex(ptr_org, 123)  # 123 not a valid regex pattern
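
A minimal sketch of how semantic feedback like the examples above might be produced. It assumes a hypothetical VARIABLE_TYPES table and leans on Python's ast module for brevity; the talk's own parser does this checking itself, so treat this as an illustration of the idea rather than the actual implementation.

```python
import ast

# Hypothetical type declarations; the slides only state that asn is
# always an integer and ptr_org is always a string.
VARIABLE_TYPES = {"asn": int, "ptr_org": str, "h_from": str}


def operand_type(node):
    """Best-effort type of a simple operand, or None if unknown."""
    if isinstance(node, ast.Name):
        return VARIABLE_TYPES.get(node.id)
    if isinstance(node, ast.Constant):
        return type(node.value)
    return None


def semantic_feedback(rule: str) -> list:
    """Collect readable messages for type problems in comparisons.

    Only the first operator of a chained comparison is inspected.
    """
    messages = []
    for node in ast.walk(ast.parse(rule, mode="eval")):
        if not isinstance(node, ast.Compare):
            continue
        op, right = node.ops[0], node.comparators[0]
        lt, rt = operand_type(node.left), operand_type(right)
        if isinstance(op, (ast.Eq, ast.NotEq)) and lt and rt and lt is not rt:
            messages.append(
                f"col {node.col_offset}: cannot compare "
                f"{lt.__name__} with {rt.__name__}"
            )
        if isinstance(op, (ast.In, ast.NotIn)) and not isinstance(right, ast.List):
            messages.append(
                f"col {node.col_offset}: right side of 'in' must be a list"
            )
    return messages
```

Returning a list of messages, rather than raising on the first problem, lets an authoring tool show every issue in a rule at once.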

Does this already exist?

Nope

What about...

lex

yacc

bison

pyparsing

ply

parsley
parsimonious

lark

sly

antlr

JetBrains MPS

¯\_(ツ)_/¯

Time to make a DSL

Phases

Plan: Determine the grammar
Implement: Lexer, Parser, Compiler
Deploy: Integrate into Existing Authoring System as beta

Plan

Problem #1: determine what all we use
Problem #2: DSL syntax

Problem #1: determine what all we use

Historically, "just use Python" was the spec for analysts, so we had no idea which language features were actually being used.
We wrote a script using Python's ast module that dumped out every token found across our existing rules.
That's when the surprises rolled in:
- Only the comparison operators == and != were in use, never < or >
- Both lists and tuples were being used
- Certain variables were injected but never used
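
A rough sketch of what such an audit script could look like. It counts AST node types rather than raw tokens, assumes each rule is a single Python expression (since rules were evaluated), and uses made-up rules for illustration.

```python
import ast
from collections import Counter


def count_node_types(rules):
    """Count how often each AST node type appears across rule sources."""
    counts = Counter()
    for source in rules:
        # mode="eval" assumes every rule is one expression.
        for node in ast.walk(ast.parse(source, mode="eval")):
            counts[type(node).__name__] += 1
    return counts


# Made-up rules, for illustration only.
rules = [
    "asn == 123 and ptr_org != 'foo.com'",
    "asn in [1, 2, 3]",
    "h_from in ('m.foo.com', 'foo.com')",
]

counts = count_node_types(rules)
for name, n in counts.most_common():
    print(name, n)
```

A report like this makes the gaps visible: a count of zero for Lt and Gt, and both List and Tuple showing up, are exactly the kind of findings listed above.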

Problem #2: DSL syntax

Since we could make the syntax different, should we?
- using AND/OR instead of and/or
- or using = instead of ==
For the initial pass, we decided to support a subset of Python's language spec, without limiting ourselves to that permanently.
We decided against supporting both lists [,] and tuples (,). We kept a single symbol set and went with lists using [,].
Because the DSL is a subset of Python's syntax, we didn't have to immediately rewrite the production runtime to use our new compiler. That was a win for deployment compatibility.

Implement

Made as an internal library; separate git repo
Tokenizer
- it knows about exactly which variables to expect
- It can reject anything it doesn't expect
Parser
- Hand-written LR parser, working depth first
- The parser does the heavy lifting: syntax checking for boolean logic and parentheses, plus semantic checking
- We even check the argument types for each function we have and their outputs
Compiler
- Targets Python for this initial build and simply unrolls the AST using a map from each symbol to its Python equivalent
~70 tests (pytest + parametrize); GitLab CI: tests, black, flake8
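
The tokenizer idea above (know exactly which variables to expect, reject everything else) can be sketched as follows. The variable names, keyword set, and token kinds are assumptions for illustration; the internal library's actual token set is not shown in the talk.

```python
import re

# Hypothetical variable names and keywords; the talk does not list them.
KNOWN_VARIABLES = {"asn", "ptr_org", "h_from"}
KEYWORDS = {"and", "or", "in"}

TOKEN_PATTERN = re.compile(
    r"""
    (?P<SPACE>\s+)
  | (?P<NUMBER>\d+)
  | (?P<STRING>'[^']*')
  | (?P<OP>==|!=)
  | (?P<PUNCT>[()\[\],])
  | (?P<WORD>[A-Za-z_][A-Za-z_0-9]*)
    """,
    re.VERBOSE,
)


def tokenize(rule):
    """Return (kind, text) pairs, raising ValueError on anything unexpected."""
    tokens = []
    pos = 0
    while pos < len(rule):
        match = TOKEN_PATTERN.match(rule, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {rule[pos]!r} at column {pos}"
            )
        kind, text = match.lastgroup, match.group()
        if kind == "WORD":
            # Words must be a keyword or a known variable; nothing else.
            if text in KEYWORDS:
                kind = "KEYWORD"
            elif text in KNOWN_VARIABLES:
                kind = "VARIABLE"
            else:
                raise ValueError(f"unknown name {text!r} at column {pos}")
        if kind != "SPACE":
            tokens.append((kind, text))
        pos = match.end()
    return tokens
```

Because names like locals or import never become tokens, the parser and compiler downstream never see them, and the column number in the error gives the author something concrete to fix.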

Deployment

Rewrote all existing rules that used tuples to using lists
Integrated it into our internal authoring tool fairly simply. Before committing a rule to the database, we check it. That was deployed after we rewrote all the rules.
- We briefly considered storing the AST instead of the DSL, but decided against it because it would have locked us into a particular AST representation.
Shortly after the runtime was rewritten to support multiple accounts, we switched it over to the new compiler. It was basically a drop-in replacement that added two lines.

Final Thoughts

Things I regretted

Not implementing a type system
Not making the DSL reusable in other domains
Not treating operators such as == and in as functions rather than as distinct syntax. (Their visitor functions are separate.)
Trying a functional-only approach with the parser. The amount of state being passed around turned out to be a bit too much.

Things I liked about the approach

Relatively easy to debug weird rule behavior in production
Authoring for existing users and customers is extremely straightforward
Leaves us open to expansion into new ideas
Enables the syntax and semantic feedback we wanted to give authors

Future Opportunities

Rule overlap checking: as it stands, we don't know if two authors write substantially similar rules
Implementing a client-side syntax and semantic-aware auto-complete/intellisense
Making a library for implementing DSLs so that we can reuse this approach in other areas of the business

Client-Driven WYSIWYG

Example of a client-side builder for conditions. Source: sentry.io

I made a DSL

By Robert Roskam

I made a DSL

Robert Roskam

Engineer Manager at Pantheon

raiderrobert
