I made a DSL

and I liked it

Why?

We wanted to give our end users the same tools as our analysts.

Why?

Sometimes our users know their problem area better.

Just give them the tools.

Why make a DSL?

All existing rules were written in Python and simply eval'd in production.

Did you consider using...?


Authoring Requirements

  • Prevent injection attacks

  • Give feedback to users

Injection Attacks

locals()

import

:=

match

block these?

allow these?

Is it safer to...

==

!=

in
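Because the rules are a subset of Python, one way to weigh blocking against allowing is to parse each rule with Python's standard ast module and accept only what is explicitly permitted. This is a minimal sketch, not the DSL's actual tokenizer; the node allowlist and the names asn, ptr_org, h_from and regex are assumptions drawn from the examples in this talk.

```python
import ast

# Hypothetical allowlist of the syntax a comparison-only rule language needs.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or,
    ast.Compare, ast.Eq, ast.NotEq, ast.In,
    ast.Name, ast.Load, ast.Constant, ast.List, ast.Call,
)
# Hypothetical names a rule may reference: injected variables plus one helper function.
ALLOWED_NAMES = {"asn", "ptr_org", "h_from", "regex"}


def is_allowed(source):
    """Accept a rule only if every node and every name in it is explicitly allowed."""
    try:
        tree = ast.parse(source, mode="eval")  # statements such as import fail here
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False  # e.g. NamedExpr (:=), Attribute, Lambda, Tuple
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            return False  # e.g. locals, __import__, eval
    return True


print(is_allowed("asn == 123 and ptr_org in ['foo.com']"))  # True
print(is_allowed("locals()"))  # False
print(is_allowed("(asn := 1)"))  # False
```

An allowlist fails closed: new Python syntax such as := or match is rejected automatically until someone decides to permit it, whereas a blocklist has to anticipate it.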

What kind of feedback?

Syntax

Semantics

abc == 123  # unknown variable abc
asn == 1234 and  # dangling "and"
asn == 123 and ( ptr_org == 'foo.com' or h_from == 'm.foo.com'  # unclosed parenthesis
asn == '123'  # asn is always an integer

ptr_org in asn  # ptr_org is always a string; asn is an integer, not a list

regex(ptr_org, 123)  # 123 is not a valid regex pattern
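A sketch of how the semantic checks above could be computed, leaning on Python's ast module rather than a custom parser. The variable types (asn as an integer, ptr_org and h_from as strings) and the regex(value, pattern) signature are assumptions read off the examples; the error messages are hypothetical. Syntax errors such as the dangling "and" surface as SyntaxError from ast.parse.

```python
import ast
import re

# Assumed variable types, read off the examples above.
VARIABLE_TYPES = {"asn": int, "ptr_org": str, "h_from": str}
FUNCTIONS = {"regex"}  # hypothetical: regex(value, pattern)


def static_type(node):
    """Best-effort type of an operand; None when it cannot be known statically."""
    if isinstance(node, ast.Constant):
        return type(node.value)
    if isinstance(node, ast.Name):
        return VARIABLE_TYPES.get(node.id)
    if isinstance(node, ast.List):
        return list
    return None


def check_semantics(source):
    """Return user-facing semantic errors; raises SyntaxError on bad syntax."""
    errors = []
    for node in ast.walk(ast.parse(source, mode="eval")):
        if isinstance(node, ast.Name):
            if node.id not in VARIABLE_TYPES and node.id not in FUNCTIONS:
                errors.append(f"unknown variable {node.id!r}")
        elif isinstance(node, ast.Compare):
            left = node.left
            for op, right in zip(node.ops, node.comparators):
                lt, rt = static_type(left), static_type(right)
                if isinstance(op, (ast.Eq, ast.NotEq)) and lt and rt and lt is not rt:
                    errors.append(f"cannot compare {lt.__name__} with {rt.__name__}")
                elif isinstance(op, (ast.In, ast.NotIn)) and rt and rt is not list:
                    errors.append(f"right side of 'in' must be a list, got {rt.__name__}")
                left = right
        elif isinstance(node, ast.Call) and getattr(node.func, "id", None) == "regex":
            pattern = node.args[1] if len(node.args) == 2 else None
            if not (isinstance(pattern, ast.Constant) and isinstance(pattern.value, str)):
                errors.append("regex expects a value and a string-literal pattern")
            else:
                try:
                    re.compile(pattern.value)  # reject patterns that cannot compile
                except re.error as exc:
                    errors.append(f"invalid regex pattern: {exc}")
    return errors


for rule in ["abc == 123", "asn == '123'", "ptr_org in asn", "regex(ptr_org, 123)"]:
    print(rule, "->", check_semantics(rule))
```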

Does this already exist?

Nope

What about...

lex

yacc

bison

pyparsing

ply

parsley

parsimonious

lark

sly

antlr

JetBrains MPS

 

¯\_(ツ)_/¯

Time to make a DSL

Phases

  • Plan: Determine the grammar

  • Implement: Lexer, Parser, Compiler

  • Deploy: Integrate into the existing authoring system as a beta

Plan

  • Problem #1: what we actually use
  • Problem #2: DSL syntax

Problem #1: determine what we actually use

  • Historically, "just use Python" was the spec for analysts, so we had no idea which features were actually being used.
  • We wrote a script using Python's ast module and dumped out every token found across our existing rules.
  • That's when the surprises rolled in:
    • Only the comparison operators == and != were used, never < or >
    • Both lists and tuples were in use
    • Certain variables were injected but never used
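The audit script described above might look roughly like this: walk every rule's syntax tree with Python's ast module and tally node types and variable names. It assumes each rule is a single expression (they were eval'd), and the sample rules are made up for illustration.

```python
import ast
from collections import Counter


def audit_rules(rules):
    """Tally every AST node type and every variable name used across rule expressions."""
    node_types, names = Counter(), Counter()
    for source in rules:
        # Assumes each rule is a single expression, since the rules were eval'd.
        for node in ast.walk(ast.parse(source, mode="eval")):
            node_types[type(node).__name__] += 1
            if isinstance(node, ast.Name):
                names[node.id] += 1
    return node_types, names


# Hypothetical sample rules, for illustration only.
sample = ["asn == 123 and ptr_org == 'foo.com'", "h_from in ('a.com', 'b.com')"]
node_types, names = audit_rules(sample)
print(node_types.most_common())
print(sorted(names))  # ['asn', 'h_from', 'ptr_org']
```

Counts such as Lt or Gt staying at zero, or a Tuple count above zero, are the kind of surprise listed above; comparing names against the set of injected variables is one way to spot the unused ones.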

Problem #2: DSL syntax

  • Since we could make the syntax different, should we?
    • Using AND/OR instead of and/or
    • Or using = instead of ==
  • For the initial pass, we decided to support a subset of Python's language spec, without committing to that subset permanently.
  • We decided against supporting both lists [,] and tuples (,): we kept a single collection syntax and went with lists [,].
  • Because the DSL is a subset of Python's syntax, we didn't have to immediately rewrite the production runtime to use our new compiler, which was a win for deployment compatibility.

Implement

  • Made as an internal library; separate git repo
  • Tokenizer
    • It knows exactly which variables to expect
    • It can reject anything it doesn't expect
  • Parser
    • Hand-written LR parser, working depth-first
    • The parser does the heavy lifting: syntax checking of boolean logic and parentheses, plus semantic checking
    • We even check each function's argument types and return types
  • Compiler
    • Targets Python for this initial build and simply unrolls the AST, using a map from each symbol to its Python equivalent
  • ~70 tests (pytest + parametrize); GitLab CI runs the tests, black, and flake8
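To make the tokenizer, parser and compiler stages concrete, here is a small sketch. It is not the author's implementation: it swaps in a top-down recursive-descent parser (the talk describes a hand-written LR parser), supports only and, or, parentheses, ==, !=, in and list literals, leaves out functions such as regex, and assumes a hypothetical variable set. The tokenizer rejects any identifier it does not expect, and the compiler unrolls the tuple AST through a symbol-to-Python map.

```python
import re

# Hypothetical configuration: the only variables a rule may reference.
VARIABLES = {"asn", "ptr_org", "h_from"}
KEYWORDS = {"and", "or", "in"}
TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<STR>'[^']*')|(?P<OP>==|!=|[()\[\],])"
                      r"|(?P<WORD>[A-Za-z_]\w*)|(?P<WS>\s+)|(?P<BAD>.)")


def tokenize(source):
    """Split a rule into (kind, text) tokens, rejecting anything unexpected."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "BAD" or (kind == "WORD" and text not in KEYWORDS | VARIABLES):
            raise ValueError(f"unexpected {text!r} at position {m.start()}")
        if kind == "WORD":
            kind = "KW" if text in KEYWORDS else "VAR"
        if kind != "WS":
            tokens.append((kind, text))
    return tokens


class Parser:
    """Top-down recursive-descent parser that builds a tuple-based AST."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else ("EOF", "end of rule")

    def take(self, expected=None):
        kind, text = self.peek()
        if kind == "EOF" or (expected is not None and text != expected):
            raise ValueError(f"expected {expected or 'more input'}, got {text!r}")
        self.i += 1
        return kind, text

    def parse(self):
        node = self.expr()
        if self.peek()[0] != "EOF":
            raise ValueError(f"unexpected {self.peek()[1]!r}")
        return node

    def expr(self):
        # 'and' binds tighter than 'or', as in Python.
        return self.chain("or", lambda: self.chain("and", self.comparison))

    def chain(self, keyword, operand):
        node = operand()  # left-associative: operand (keyword operand)*
        while self.peek() == ("KW", keyword):
            self.take()
            node = (keyword, node, operand())
        return node

    def comparison(self):
        if self.peek() == ("OP", "("):
            self.take()
            node = self.expr()
            self.take(")")  # an unclosed parenthesis is reported here
            return node
        left = self.value()
        op = self.take()[1]
        if op not in ("==", "!=", "in"):
            raise ValueError(f"expected ==, != or in, got {op!r}")
        return (op, left, self.value())

    def value(self):
        kind, text = self.take()
        if kind == "VAR":
            return ("var", text)
        if kind in ("NUM", "STR"):
            return ("lit", int(text) if kind == "NUM" else text[1:-1])
        if text == "[":
            items = []
            while self.peek() != ("OP", "]"):
                if items:
                    self.take(",")
                items.append(self.value())
            self.take("]")
            return ("list", items)
        raise ValueError(f"unexpected {text!r}")  # e.g. a tuple's "("


# Symbol -> Python map; an identity today because the DSL is a Python subset.
PYTHON_OPS = {"or": "or", "and": "and", "==": "==", "!=": "!=", "in": "in"}


def to_python(node):
    """Unroll the tuple AST into a Python expression string."""
    if node[0] in PYTHON_OPS:
        return f"({to_python(node[1])} {PYTHON_OPS[node[0]]} {to_python(node[2])})"
    if node[0] == "var":
        return node[1]
    if node[0] == "lit":
        return repr(node[1])
    return "[" + ", ".join(to_python(item) for item in node[1]) + "]"
```

For example, to_python(Parser(tokenize("asn == 1 and ptr_org in ['a.com']")).parse()) yields ((asn == 1) and (ptr_org in ['a.com'])), while "asn == 1234 and", an unclosed parenthesis, an unknown variable, or a tuple all raise ValueError with a message suitable for the author. Because the symbol map is the only place the target language appears, retargeting the compiler would mostly mean changing that table.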

Deployment

  • Rewrote all existing rules that used tuples to use lists
  • Integrating it into our internal authoring tool was fairly simple: before a rule is committed to the db, we check it. That check was deployed after we had rewritten all the existing rules.
    • We briefly considered storing the AST instead of the DSL, but decided against it because it would lock us into a particular AST representation.
  • Once the runtime was rewritten to support multiple accounts, we switched it to the new compiler shortly afterward. It was basically a drop-in replacement that needed only two added lines.
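One way the tuple-to-list rewrite could be automated, sketched with Python's standard ast module (ast.unparse needs Python 3.9+). This is an assumption about how such a migration might be done, not a description of how the rules were actually rewritten; note that ast.unparse normalizes formatting and drops comments.

```python
import ast


class TupleToList(ast.NodeTransformer):
    """Replace every tuple literal in a rule with an equivalent list literal."""

    def visit_Tuple(self, node):
        self.generic_visit(node)  # rewrite any nested tuples first
        return ast.copy_location(ast.List(elts=node.elts, ctx=node.ctx), node)


def rewrite_rule(source):
    """Return the rule's source with tuples turned into lists (requires Python 3.9+)."""
    tree = TupleToList().visit(ast.parse(source, mode="eval"))
    return ast.unparse(tree.body)


print(rewrite_rule("h_from in ('a.com', 'b.com')"))  # h_from in ['a.com', 'b.com']
```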

Final Thoughts

Things I regretted

  • Not implementing a type system
  • Not making the DSL reusable in other domains
  • Not treating all the operators, such as == and in, as functions instead of as separate syntax. (Each has its own visitor function.)
  • Trying a functional-only approach with the parser. I found the amount of state being passed around a bit too much.
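A sketch of the alternative these regrets point toward: every operator (==, !=, in) and every function (regex) lives in one signature table, so a single visitor can compute each operand's static type and type-check a comparison exactly as it would a regex call, learning the call's result type along the way. The table, the T type-variable marker and the check_call name are hypothetical; the types mirror the feedback examples (asn as an integer, ptr_org as a string).

```python
class T:
    """Marker for a type variable: all T-typed arguments of one call must share a type."""


# Hypothetical signature table: operators and functions share one representation.
SIGNATURES = {
    "==": ((T, T), bool),
    "!=": ((T, T), bool),
    "in": ((T, list), bool),
    "regex": ((str, str), bool),
}


def check_call(name, *arg_types):
    """Type-check a call from its static argument types and return the result type."""
    if name not in SIGNATURES:
        raise TypeError(f"unknown function {name!r}")
    params, returns = SIGNATURES[name]
    if len(arg_types) != len(params):
        raise TypeError(f"{name} expects {len(params)} arguments, got {len(arg_types)}")
    bound = None  # the concrete type T is bound to within this call
    for position, (actual, expected) in enumerate(zip(arg_types, params), start=1):
        if expected is T:
            bound = bound or actual
            expected = bound
        if actual is not expected:
            raise TypeError(
                f"{name}: argument {position} must be {expected.__name__}, got {actual.__name__}"
            )
    return returns


print(check_call("==", int, int).__name__)  # bool
```

With this shape, asn == '123' becomes check_call("==", int, str) and ptr_org in asn becomes check_call("in", str, int), and both fail through the same code path instead of separate visitors.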

Things I liked about the approach

  • Relatively easy to debug weird rule behavior in production
  • Authoring for existing users and customers is extremely straightforward
  • Leaves us open to expansion into new ideas
  • Enables the syntax and semantic feedback for authors

Future Opportunities

  • Rule overlap checking: as it stands, we don't know if two authors write substantially similar rules
  • Implementing client-side auto-complete/IntelliSense that is aware of both syntax and semantics
  • Making a library for implementing DSLs so that we can reuse this approach in other areas of the business

Client-Driven WYSIWYG

Example of a client-side condition builder. Source: sentry.io

I made a DSL

By Robert Roskam
