A DSL* in Python

* domain-specific language

Josh Finnie

Senior Software Maven @ TrackMaven

The Problem

  • The monolith was failing us
  • We wanted faster metrics
  • We wanted more complex metrics
    • but we still wanted metrics to be "simple"

The Answer

  • Flask
  • Easy-to-use domain specific language (DSL)

Flask

  • Moved away from Django
    • Lightweight
    • Easy JSON responses with jsonify

  • No need for additional packages

The DSL

  • Allows for complexity
  • Allows for completeness
  • Allows for ease-of-use
  • Allows for ease-of-programming

The DSL

The idea behind the DSL is to let users calculate complex metrics themselves, without a developer having to write code for each one:

Want to calculate "total Facebook likes"?

<URL>?calculate=total.facebook.*.likes

Want to calculate the "average Facebook likes & Twitter retweets per post & tweet"?

<URL>?calculate=div(add(total.facebook.*.likes,total.twitter.*.retweets),add(count.facebook.*,count.twitter.*))
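
The first step on the server is to pull the variable names out of an expression like the ones above. The talk does not show how `VariableParser` does this, so the following is only a minimal sketch under assumptions: `extract_variables` and `is_number` are hypothetical helpers, and any word directly followed by `(` is assumed to be an operator name.

```python
import re


def is_number(word):
    """Return True if the word parses as a numeric literal."""
    try:
        float(word)
    except ValueError:
        return False
    return True


def extract_variables(expression):
    """Return the unique variable names in a DSL expression, in order.

    Hypothetical helper: a word followed by '(' is treated as an
    operator, a numeric literal is skipped, and everything else is
    treated as a variable such as 'total.facebook.*.likes'.
    """
    variables = []
    for match in re.finditer(r"[^(),\s]+", expression):
        word = match.group()
        rest = expression[match.end():].lstrip()
        if rest.startswith("("):
            continue  # operator name, e.g. div or add
        if is_number(word):
            continue  # constant, e.g. 100
        if word not in variables:
            variables.append(word)
    return variables
```

For example, `extract_variables("div(total.facebook.*.likes,count.facebook.*)")` yields the two variable names in the order they appear.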

The DSL (app.py)

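from flask import Flask, jsonify, request

# VariableParser, EquationBuilder, TimeSeriesAggregation, and
# AggregationBuilder are project classes; their imports are omitted here.
app = Flask(__name__)


@app.route('/workspace/<workspace_id>', methods=['GET'])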
def index(workspace_id):
    calc = request.args.get('calculate', None)
    bucket = request.args.get('bucket', 'day')
    start_date = request.args.get('start_date', None)
    end_date = request.args.get('end_date', None)

    # Get variables from user submitted DSL
    es_vars = VariableParser.parse(calc)

    # Build equation from user submitted DSL
    equation = EquationBuilder(calc, [x.string for x in es_vars])

    # Generate ElasticSearch aggregation and call it
    time_series_agg = TimeSeriesAggregation(interval=bucket)
    sub_agg_dict = AggregationBuilder(es_vars=es_vars).as_dict()
    time_series_agg.add_subaggregations(sub_agg_dict)
    total_results = time_series_agg.call(workspace_id, request.args)

    """
    each time bucket in total_results ~= {
        "total.facebook.*.likes": {
            "total.facebook.*.likes": {
                "value": 7
            }
        },
        "count.facebook.*": {
            "doc_count": 9
        },
        ...
    }
    """

    # Compute equation from user submitted DSL with ES aggregation
    final_results = []
    for time_bucket in total_results['data']['main']['buckets']:
        result = equation.calculate_result(time_bucket)
        final_results.append({'x': time_bucket['key_as_string'], 'y': result})

    # Return the data in JSON format
    return jsonify({"data": final_results})
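
The bucket shape in the comment above (a `doc_count` for counts, a nested `value` for totals, and `key_as_string` per bucket) is what an Elasticsearch date histogram with filter sub-aggregations returns. The talk does not show `AggregationBuilder` or `TimeSeriesAggregation`, so this is only a sketch of a request body that could produce that shape. The field names `channel` and `timestamp`, the helper names, and the reading of a variable as `<kind>.<channel>.<type>[.<metric>]` are all assumptions.

```python
def build_sub_aggregation(variable):
    """Build one filter aggregation for a DSL variable (hypothetical mapping)."""
    parts = variable.split(".")
    kind, channel = parts[0], parts[1]
    # Assumed document field: 'channel' holds values such as 'facebook'.
    agg = {"filter": {"term": {"channel": channel}}}
    if kind == "total":
        # Nest a sum under the same name, so the result can be read as
        # bucket[variable][variable]['value'].
        metric = parts[-1]
        agg["aggs"] = {variable: {"sum": {"field": metric}}}
    elif kind != "count":
        raise ValueError("unsupported variable: %s" % variable)
    # For 'count' the filter's own doc_count is the answer.
    return agg


def build_request_body(variables, interval="day"):
    """Wrap the sub-aggregations in a date histogram named 'main'."""
    return {
        "size": 0,  # only aggregations are needed, not the documents
        "aggs": {
            "main": {
                # Assumed timestamp field. Newer Elasticsearch versions
                # use calendar_interval or fixed_interval instead.
                "date_histogram": {"field": "timestamp", "interval": interval},
                "aggs": {v: build_sub_aggregation(v) for v in variables},
            }
        },
    }
```

Naming each sub-aggregation after its DSL variable keeps the lookup in the result trivial: the equation step can index a bucket directly by the variable string.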

The DSL (EquationBuilder)

class EquationBuilder(object):
    ...

    @staticmethod
    def _split_calculation(calculation):
        """
        `_split_calculation` is a private function to separate a formula
        represented as a string into words and ')'s
        `div(add(x,y),z)` =>
            `['div','add','x','y',')','z',')']`
        """
        ...

    def _evaluate(self, operator):
        operand_list = []
        i = self.stack.pop()
        while i != ")":
            operand_list.append(i)
            i = self.stack.pop()
        answer = self._perform_operation(operand_list, operator)
        self.stack.append(answer)
        return answer

    def calculate_result(self, var_dict):
        OPERATORS = ["add", "div", "sub", "mult"]
        self.stack = []

        for item in reversed(self.tokens):
            if item == ')':
                self.stack.append(item)
            elif is_number(item):
                self.stack.append(float(item))
            elif item in self.variables:
                if "count" in item:
                    self.stack.append(var_dict[item]['doc_count'])
                else:
                    self.stack.append(var_dict[item][item]['value'])
            elif item in OPERATORS:
                self._evaluate(item)
        return self.stack[0]
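
The slide elides `_split_calculation`, `is_number`, and `_perform_operation`. The following standalone sketch fills them in under assumptions so the stack-based evaluation can be run end to end. The function names, the regular expression, and the left-to-right folding of operands are illustrative guesses, not the actual TrackMaven code, and `values` is a plain dict rather than an Elasticsearch bucket.

```python
import operator
import re
from functools import reduce

# Assumed mapping from DSL operator names to Python functions.
OPERATIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mult": operator.mul,
    "div": operator.truediv,
}


def split_calculation(calculation):
    """'div(add(x,y),z)' -> ['div', 'add', 'x', 'y', ')', 'z', ')']"""
    return re.findall(r"\)|[^(),\s]+", calculation)


def is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
    except (TypeError, ValueError):
        return False
    return True


def perform_operation(operands, name):
    """Fold the operands left to right with the named operator."""
    return reduce(OPERATIONS[name], operands)


def evaluate(calculation, values):
    """Evaluate a DSL expression against a dict of variable values."""
    stack = []
    # Walk the tokens right to left, so an operator's operands are
    # already on the stack (closed off by ')') when it is reached.
    for token in reversed(split_calculation(calculation)):
        if token == ")":
            stack.append(token)
        elif token in OPERATIONS:
            operands = []
            item = stack.pop()
            while item != ")":
                operands.append(item)
                item = stack.pop()
            stack.append(perform_operation(operands, token))
        elif is_number(token):
            stack.append(float(token))
        else:
            stack.append(values[token])
    return stack[0]
```

Because the tokens are read in reverse, operands pop off the stack in their original order, so `sub(10,4)` evaluates as 10 minus 4. A bucket with a zero count would raise `ZeroDivisionError` in this sketch; a real service would need to decide how to report that.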

Conclusion

  • Super fast
  • Super scalable
  • Easier to program with
    • We're no longer hardcoding metrics
  • Surprisingly easy to write
    • Which surprised me... SURPRISE!

Thanks

Any questions?

 

Come work for TrackMaven. We're pretty awesome!

 

@joshfinnie

http://www.joshfinnie.com