DataScript as a Lingua Franca for declarative domain modeling

The approach (spoiler)

  • Encode the Domain Model (*) of your system in an in-memory data structure (a "meta-DB")
  • ... supported by DataScript, an in-memory graph DB
  • ... from which the "machinery" of the system is generically derived (DB schema / API endpoints / GraphQL schema / data validation / doc pages / etc.)

 

(*) Domain Model ≈ Data Schema ≈ UML diagram

What this approach is NOT

  • A blueprint of your system's infrastructure (à la integrant or schematic)
  • About front-end development
  • A library

An example Domain Model

What is DataScript?

  • An in-memory, immutable database, inspired by Datomic, for the JVM and JS
  • With a graph-like structure ("universal" relation of Entity/Attribute/Value)
  • Powerful read APIs: Datalog (pattern matching), Entity (navigational), Pull (pulling trees of data), raw index access
  • Data-oriented, composable writes (implicit upserts) 
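The features listed above can be tried at a REPL. The following is a minimal sketch, assuming the `datascript.core` namespace is on the classpath and aliased as `dt`; the `:user/...` attributes and the sample data are hypothetical and only serve the illustration.

```clojure
(require '[datascript.core :as dt])

;; The schema only declares what differs from the defaults:
;; identity attributes and references.
(def schema
  {:user/email   {:db/unique :db.unique/identity}
   :user/follows {:db/valueType   :db.type/ref
                  :db/cardinality :db.cardinality/many}})

;; Writes are plain data; negative :db/id values are temporary ids.
(def db1
  (dt/db-with (dt/empty-db schema)
    [{:db/id -1 :user/email "alice@example.com" :user/name "Alice"}
     {:db/id -2 :user/email "bob@example.com" :user/name "Bob"
      :user/follows [-1]}]))

;; Implicit upsert: no :db/id given, the unique email resolves
;; to the existing entity and its name is replaced.
(def db2
  (dt/db-with db1
    [{:user/email "alice@example.com" :user/name "Alice L."}]))

;; Datalog (pattern matching): who follows Alice?
(def followers
  (dt/q '[:find [?follower-name ...]
          :where [?a :user/email "alice@example.com"]
                 [?f :user/follows ?a]
                 [?f :user/name ?follower-name]]
        db2))
;; => ["Bob"]

;; Entity API (navigational), starting from a lookup ref.
(def followed-names
  (->> (dt/entity db2 [:user/email "bob@example.com"])
       :user/follows
       (mapv :user/name)))
;; => ["Alice L."]

;; Pull API (a tree of data).
(def bob-tree
  (dt/pull db2
           [:user/name {:user/follows [:user/name]}]
           [:user/email "bob@example.com"]))
;; => {:user/name "Bob", :user/follows [{:user/name "Alice L."}]}

;; Raw index access: all emails, in index order.
(def emails (mapv :v (dt/datoms db2 :avet :user/email)))
;; => ["alice@example.com" "bob@example.com"]
```

Note that `db1` is left untouched by the second write: each database is a value.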

DataScript Demo

DataScript is a data structure

Data Structure   | Read API                                   | Write API
Clojure Sequence | first, next, rest                          | conj
Clojure Map      | get, contains?, keys, vals                 | assoc, dissoc
DataScript       | Datalog, Entity API, Pull API, raw indexes | (dt/with db write)
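The analogy can be checked directly: like `conj` and `assoc`, a DataScript write returns a new value and leaves the old one unchanged. A small sketch, assuming `datascript.core` is aliased as `dt` as on the slide; the `:user/name` attribute is a hypothetical example.

```clojure
(require '[datascript.core :as dt])

;; Clojure sequence-like and map writes return new values.
(def v1 [1 2])
(def v2 (conj v1 3))       ; v1 is still [1 2]

(def m1 {:a 1})
(def m2 (assoc m1 :b 2))   ; m1 is still {:a 1}

;; A DataScript write does the same: dt/with returns a report
;; whose :db-after is a new database value.
(def db1 (dt/empty-db))
(def report (dt/with db1 [{:db/id -1 :user/name "Alice"}]))
(def db2 (:db-after report))

(def datom-counts
  [(count (dt/datoms db1 :eavt))
   (count (dt/datoms db2 :eavt))])
;; => [0 1]
```

When only the resulting database is needed, `dt/db-with` is a shorthand for `(:db-after (dt/with db tx-data))`.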

The problem

Any given bit of the Domain Model gets used (explicitly or not) in many parts of the system:

  • Database schema
  • Input data validation
  • GraphQL schema
  • HTTP/REST API contracts (Swagger etc.)
  • Security rules
  • Test data generation
  • Documentation
  • ETL

=> we'd like to make that DRY / declarative

Solution

Declare the domain model in one place, and derive the machinery aspects from there
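As a rough illustration of "declare once, derive the rest", the sketch below derives two artifacts (input validation and documentation lines) from a single declaration. Everything in it is hypothetical: the `user-attributes` data, the `scalar-preds` table and both helper functions are assumptions made for the example, not part of any library.

```clojure
;; The single declaration (hypothetical attributes).
(def user-attributes
  [{:attribute/name :user/email
    :attribute.scalar/type :string
    :twitteur.security/private? true}
   {:attribute/name :user/name
    :attribute.scalar/type :string}])

;; Maps a declared scalar type to a predicate.
(def scalar-preds {:string string?})

;; Derived artifact 1: input validation.
;; Returns the attribute names whose value does not match the declared type.
(defn invalid-keys [attrs input]
  (vec
    (for [attr attrs
          :let [k    (:attribute/name attr)
                pred (scalar-preds (:attribute.scalar/type attr))]
          :when (not (pred (get input k)))]
      k)))

;; Derived artifact 2: documentation lines.
(defn doc-lines [attrs]
  (mapv (fn [attr]
          (str (:attribute/name attr) " : "
               (name (:attribute.scalar/type attr))
               (when (:twitteur.security/private? attr) " (private)")))
        attrs))

(invalid-keys user-attributes {:user/email "a@example.com" :user/name 42})
;; => [:user/name]

(doc-lines user-attributes)
;; => [":user/email : string (private)" ":user/name : string"]
```

Adding an attribute to `user-attributes` updates both the validator and the documentation without touching either function.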

The Solution: class annotations!

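// Illustrative only: @UserPrivate, @RefTyped and @Derived are hypothetical annotations
import java.util.List;
import java.util.UUID;

public abstract class User {
  UUID id;
  @UserPrivate
  String email;
  String name;
  @RefTyped(cardinality="many")
  List<User> follows;
  @Derived
  abstract int n_followers();
  @Derived
  @RefTyped(cardinality="many")
  abstract List<Tweet> tweets();
}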

Prior art

  • DB DDL
  • Class annotations (ORM, Frameworks...)
  • API schema (GraphQL, Swagger)
  • Informal documentation (UML)

Limitations:

  • not programmable
  • not portable
  • no/poor query API => not extensible
  • biased, incomplete perspective

We can do better

Domain Model meta-data as data

Naive approach: plain old Clojure data structures

(def domain-model-metadata
  {:types
   [{:entity-type/name :twitteur/User
     :entity-type/attributes
     [{:attribute/name :user/email
       :attribute.scalar/type :string
       :twitteur.security/private? true}
      {:attribute/name :user/tweets
       :attribute/ref-typed? true
       :attribute.ref-typed/type :twitteur/Tweet
       :attribute.ref-typed/many? true}
      ...]}
    {:entity-type/name :twitteur/Tweet
     :entity-type/attributes
     [...]}]})

Limitation: still a poor query API
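To make the limitation concrete, here is a sketch of what querying such nested data looks like. `plain-model` is a self-contained variant of the structure above; the `:tweet/content` attribute is a hypothetical stand-in, and both functions are assumptions written for this example.

```clojure
(def plain-model
  {:types
   [{:entity-type/name :twitteur/User
     :entity-type/attributes
     [{:attribute/name :user/email
       :attribute.scalar/type :string
       :twitteur.security/private? true}
      {:attribute/name :user/tweets
       :attribute/ref-typed? true
       :attribute.ref-typed/type :twitteur/Tweet
       :attribute.ref-typed/many? true}]}
    {:entity-type/name :twitteur/Tweet
     :entity-type/attributes
     [{:attribute/name :tweet/content      ; hypothetical attribute
       :attribute.scalar/type :string}]}]})

;; "Which attributes are private?" needs a hand-written traversal
;; that knows the exact nesting of the data.
(defn private-attributes [model]
  (vec
    (for [t (:types model)
          a (:entity-type/attributes t)
          :when (:twitteur.security/private? a)]
      (:attribute/name a))))

;; "Which entity types reference a given type?" needs another one,
;; walking the same tree in a different way.
(defn types-referencing [model target]
  (vec
    (for [t (:types model)
          :when (some #(= target (:attribute.ref-typed/type %))
                      (:entity-type/attributes t))]
      (:entity-type/name t))))

(private-attributes plain-model)
;; => [:user/email]

(types-referencing plain-model :twitteur/Tweet)
;; => [:twitteur/User]
```

Each new question means another custom traversal, and every traversal breaks if the nesting of the data changes.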

Where DataScript comes in
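One possible way to store the same meta-data in DataScript is sketched below. The `meta-schema` is an assumption made for this example (it is not taken from the talk), `:tweet/content` is a hypothetical attribute, and `dt` is assumed to alias `datascript.core`.

```clojure
(require '[datascript.core :as dt])

;; Assumed schema for the meta-DB: names are identities,
;; and types point to attributes, which may point back to types.
(def meta-schema
  {:entity-type/name         {:db/unique :db.unique/identity}
   :entity-type/attributes   {:db/valueType   :db.type/ref
                              :db/cardinality :db.cardinality/many}
   :attribute/name           {:db/unique :db.unique/identity}
   :attribute.ref-typed/type {:db/valueType :db.type/ref}})

;; Negative :db/id values are temporary ids used to link entities.
(def meta-db
  (dt/db-with (dt/empty-db meta-schema)
    [{:db/id -1 :entity-type/name :twitteur/User
      :entity-type/attributes [-10 -11]}
     {:db/id -2 :entity-type/name :twitteur/Tweet
      :entity-type/attributes [-20]}
     {:db/id -10 :attribute/name :user/email
      :attribute.scalar/type :string
      :twitteur.security/private? true}
     {:db/id -11 :attribute/name :user/tweets
      :attribute/ref-typed? true
      :attribute.ref-typed/type -2
      :attribute.ref-typed/many? true}
     {:db/id -20 :attribute/name :tweet/content   ; hypothetical attribute
      :attribute.scalar/type :string}]))

;; "Which attributes are private?"
(def private-attrs
  (dt/q '[:find [?attr-name ...]
          :where [?a :twitteur.security/private? true]
                 [?a :attribute/name ?attr-name]]
        meta-db))
;; => [:user/email]

;; "Which entity types reference a given type?"
(def referencing-tweet
  (dt/q '[:find [?type-name ...]
          :in $ ?target-name
          :where [?target :entity-type/name ?target-name]
                 [?a :attribute.ref-typed/type ?target]
                 [?t :entity-type/attributes ?a]
                 [?t :entity-type/name ?type-name]]
        meta-db :twitteur/Tweet))
;; => [:twitteur/User]

;; A tree of data for one type, e.g. as input to a doc page generator.
(def user-type
  (dt/pull meta-db
           [:entity-type/name {:entity-type/attributes [:attribute/name]}]
           [:entity-type/name :twitteur/User]))
```

Forward and reverse questions are now both plain queries, and a new aspect (say, a security rule) can be added as a new attribute without restructuring the existing data.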

Is this a good idea?

2 ways of developing

  • plumbing-first: start from the mechanical parts (HTTP routes, DB queries, etc.), shaping them towards business requirements 
    • domain model is scattered, implicit
    • accidental complexity
    • early success
    • adaptable
  • domain-first: start from the language of the domain, generate the mechanical parts from that
    • generic, abstract
    • principled

This approach is domain-first

  • Requires a mature enough understanding of the domain (and its trajectory)
  • Principled, not (very) adaptable
    • leave some escape hatches!
  • Generic, abstract
    • the dev team must be willing to learn!

A toolkit for homemade frameworks

  • Not letting a 3rd-party framework make the assumptions for you
  • /!\ You're in the business of framework-building!
    • test it well
    • document it well
    • think it through

Questions?

By Val Waeselynck
