Datalog

Beyond SQL

Relational DB

Data stored in relations (tables)

Fixed Schema

Foreign-key relationships

Columns or combination of columns as index

Document DB

Data stored as documents

Mostly schema-free

Some foreign-key relationships

Document properties as index

Graph DB

Data as Graph structures with nodes and annotated edges

Fixed Schema (everything is a graph)

Query-by-navigation

Node-based index

And many more...

Datalog

Declarative logic programming language

Subset of Prolog (1972)

Based on first-order logic

Not turing complete

Implementations in C/C++, Clojure, Erlang, Haskell, Java, Lua, OCaml, Prolog, Python, Racket, Ruby, Rust, Tcl

Datalog Example

jedi(luke).
sith_lord(anakin).
sith_lord(ben).

good(X) :- jedi(X).

mother(padme,luke).
mother(padme,leia).
mother(leia,ben).
mother(shmi,anakin).

father(anakin,luke).
father(anakin,leia).

parent(X,Y) :- mother(X,Y).
parent(X,Y) :- father(X,Y).

sibling(X,Y) :- parent(Z,X), parent(Z,Y), X \== Y.

grand_parent(X,Y) :- parent(X,Z), parent(Z,Y).

force_sensitive(X) :- parent(shmi,X).
force_sensitive(X) :- grand_parent(shmi,X).

ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).

% force_sensitive(X) :- ancestor(shmi,X).

*Atoms are lower-case, variables are upper-case, % indicates a comment

Demo-Time!

Datalog DB

(in 6 easy steps)

Represent entities as sets of facts:

[attribute value]

[father "Vader"]

[name "Luke"]

[job "Jedi"]

Combine facts to entity, give it an ID:

[entity-id attribute value]

[12 father "Vader"]

[12 name "Luke"]

[12 job "Jedi"]

Datalog DB

Make it transactional:

[entity-id attribute value trx-id]

 

[11 father "Vader" 4300]

[11 name "Luke"    4300]

[11 job     "Jedi"      4300]

[12 father "Han"    4300]

[12 name "Ben"     4300]

[12 job "Sith Lord" 4300]

[12 alias "Kylo"       4300]

Datalog DB

Add/revoke flag

[entity-id attribute value trx-id add?]

 

[12 job "Sith Lord"               4301 false]

[12 job "Supreme Leader" 4301 true]

(* true is default and can be omitted)

Datalog DB

Refer to other entities via their id.

 

[13 name "Han"       4302]

[13 job "Smuggler"  4302]

[12 father "Han"      4302 false]

[12 father 13            4302]

 

[14 name "Anakin"  4302]

[14 job "Sith Lord"   4302]

[11 father "Vader"   4302 false]

[11 father 14            4302]

Aside: Data Structure Literals

;; <- semicolon introduces a comment

"foo" ;; a string (duh)

:bar  ;; a keyword (i.e. interned string)

:db/id ;; keyword id with a namespace

[ ] ;; vector

{ } ;; map from key->value

{ "key1" "value1"
  :key2 "value2"
}

#{ } ;; set

()   ;; list

What are our entities now?

As of Trx 4302

;; datoms
[11 :father 14]
[11 :name "Luke"]
[11 :job "Jedi"]


[12 :father 13]
[12 :name "Ben"]
[12 :job "Supreme Leader"]
[12 :alias "Kylo"]


[13 :name "Han"]
[13 :job "Smuggler"]

[14 :name "Anakin"]
[14 :job "Sith Lord"]

Let's add a DB system and some data...

;; Map form
{ :db/id 11 
  :father 14
  :name "Luke"
  :job "Jedi" }

{ :db/id 12
  :father 13
  :name "Ben"
  :job "Supreme Leader"
  :alias "Kylo" }

{:db/id 13
 :name "Han" :job "Smuggler"}

{:db/id 14
 :name "Anakin":job "Sith Lord"
}

Datahike: A datalog DB

(def uri "datahike:file:///tmp/tutorial")
(d/create-database uri)
(def conn (d/connect uri))
;; set up schema, which are also datoms. 
;; We use some helper functions to generate them
;; example: (schema/str-attr :test)
;; => #:db{:ident :test, :valueType :db.type/string, :cardinality :db.cardinality/one}
;; 
(d/transact conn [(schema/str-attr :name)
                  (schema/one-to-one :father)
                  (schema/str-attr :job)
                  (schema/many-str-attr :alias)
                  ])
;; add some data, propose Entity IDs via :db/id
(d/transact conn [{:db/id 11 :name "Luke"   :job "Jedi" :father 14}
                  {:db/id 12 :name "Ben"    :job "Supreme Leader" :father 13
                             :alias "Kylo" }
                  {:db/id 13 :name "Han"    :job "Smuggler"}
                  {:db/id 14 :name "Anakin" :job "Sith Lord"}
                  ])

Querying via Datalog

Find entity-ids, attributes, values by unification:

[:find ?name           ;; ?name is returned
 :where
   [?e :job "Jedi"]    ;; ?e is a free variable
   [?e :name ?name]    ;; "Jedi" is a constant
                       ;; unify statements to
] 
=> #{["Luke"]}         ;; find match

Querying

Navigating references

[:find ?name ?father
         :where
         [?e :name ?name]
         [?e :father ?f]
         [?f :name ?father]
]
> #{["Luke" "Anakin"] ["Ben" "Han"]}

Changing Schema

And adding data

#:db{:ident :born, :valueType :db.type/long, 
     :cardinality :db.cardinality/one}

;; birthdate relative to the Battle of Yavin (SW:ANH)
[{:db/id 11 :born  -19}    ;; Luke
 {:db/id 12 :born    5}    ;; Ben Solo
 {:db/id 13 :born  -29}    ;; Han Solo
 {:db/id 14 :born  -41}]   ;; Anakin

Querying with predicates

[:find ?name ?b
 :where
 [?e :name ?name]
 [?e :born ?b]
 [(< 0 ?b)]
]
=> #{["Ben" 5]}

(d/q 
'[:find ?name ?b
  :in $ ?year
  :where
  [?e :name ?name]
  [?e :born ?b]
  [(< ?year ?b)]] 
   (d/db conn) 0)

Aggregates

[:find (count ?e)
 :where
 [?e :name _]]
 => 4
 
[:find (min ?b)
 :where
 [_ :born ?b]]
 => [-41]
 
(min ?xs)
(max ?xs)
(count ?xs)
(count-distinct ?xs)
(sum ?xs)
(avg ?xs)
(median ?xs)
(variance ?xs)
(stddev ?xs)

Aggregates 2

[:find (sample 2 ?name)
 :where
 [_ :name ?name]]
 
[:find (min 2 ?born)
 :where
 [?e :name ?name]
 [?e :born ?born]]
 => [-41 -29]

 
(distinct ?xs)
(min n ?xs)
(max n ?xs)
(rand n ?xs)
(sample n ?xs)

 

Or use your own functions!

Pulling

[*] 11               ;; select all attributes
=> {:db/id 11, 
    :born -19, 
    :job "Jedi",
    :name "Luke",
    :father #:db{:id 14}}
    
[:name :job] 11           ;; one attribute
=> {:name "Luke", :job "Jedi"}


[:name {:father [:name]}] 11  ;; pull referenced entity
=> {:name "Luke", :father {:name "Anakin"}}

Retrieve entities (and referenced entities) via selector and entity-id

Pulling 2

[* {:_father [*]}] 14
=>
{:db/id 14,
 :born -41,
 :job "Sith Lord",
 :name "Anakin",
 :_father [{:db/id 11, 
           :born -19, 
           :father #:db{:id 14}, 
           :job "Jedi", 
           :name "Luke"}]}

References are bi-directional

Querying and Pulling

(d/q '[:find [pull ?e [:name :born]]
         :where
         [?e :job ?j]
         [(not= ?j "Jedi")]
       ] (d/db conn))
=> ([{:name "Han", :born -29}] 
    [{:name "Ben", :born 5}] 
    [{:name "Anakin", :born -41}])

Let's do both!

and much more (rules, computed values etc.)

=> queries themselves are just data

Immutable Datalog DBs

Append-only, mono or bi-temporal

  • Data is only appended, not deleted* or updated
  • Facts can be retracted, which is recorded as fact
  • Transactions are also facts
  • Transaction-metadata like timestamp is stored as well

=> an immutable series of timestamped transactions

 

=> the database can be viewed in different points in time

*Most vendors still provide excision to hard-delete facts

Database as value

Remember (d/db conn)?

Database at the current time!

 

Let's go back in time!

(d/as-of db timestamp!)

(def current-db (d/db conn))
(def db-of-2000 (d/as-of current-db #inst "2000-01-01"))

(d/pull current-db
 '[:name] 11
)
=> {:name "Luke"}

(d/pull db-of-2000
 '[:name] 11
)
=> nil

Transactions as value

Transactions are facts too

;; structure of a fact/datom
[entity attribute value trx]
;; added with every transaction
[trx :db/txInstant timestamp]

;;; we can add
;;; other metadata
[trx :tx/changed-by "beders"]

Transactions in queries

Find when and how facts where added or retracted

;; all transactions
[:find [[pull ?tx [*]] ...]
 :where [?tx :db/txInstant _]]
 =>
 [#:db{:id 536870918, :txInstant #inst"2020-01-09T17:48:25.381-00:00"}
 #:db{:id 536870916, :txInstant #inst"2020-01-09T09:53:50.720-00:00"}
 ....
 ]
 
 ;; all trxs where :born attribute was added
 [:find ?tx ?time
  :where [?tx :db/txInstant ?time]
         [_ :born _ ?tx true]
         ]
 =>
 #{[536870916 #inst"2020-01-09T09:53:50.720-00:00"] 
   [536870917 #inst"2020-01-09T10:00:17.160-00:00"]}
 
 ;; all facts added in a trx
 [:find ?e ?a ?v
  :in $ ?tx
  :where [?e ?a ?v ?tx]
 ] (d/db conn) 536870914
 =>
 {[11 :job "Jedi"] ....}

Implementations

Datomic

(Cognitect - 2012)

Storage backend:

DynamoDB, Cassandra, ORA, PostgreSQL

Used by:

ca, Netflix, FB, TimeInc, NuBank (80TB data, 2100 transactors, 13 shards)

 

 

Crux

(Juxt - 2019)

Bi-temporal, trx-time and valid-time

Storage backends:

RocksDB, Kafka (configurable)

Used by:

Oscaro (eCommerce)

Avisi (AtlasCRM)

 

 

Implementations

Datahike

(@Replikativ - 2019)

Uses hitchhiker trees (fractal, snapshottable, on-disk)

Good for single machines, millions of facts

DataScript

(@Tonsky - 2014)

In-memory, in-browser

Used by:

PrecursorApp (collab editing)

Cognician (coaching platform)

 

Made with Slides.com