Jochen Bedersdorfer
Software Engineer
Data stored in relations (tables)
Fixed Schema
Foreign-key relationships
Columns or combination of columns as index
Data stored as documents
Mostly schema-free
Some foreign-key relationships
Document properties as index
Data as Graph structures with nodes and annotated edges
Fixed Schema (everything is a graph)
Query-by-navigation
Node-based index
Declarative logic programming language
Subset of Prolog (1972)
Based on first-order logic
Not turing complete
Implementations in C/C++, Clojure, Erlang, Haskell, Java, Lua, OCaml, Prolog, Python, Racket, Ruby, Rust, Tcl
jedi(luke).
sith_lord(anakin).
sith_lord(ben).
good(X) :- jedi(X).
mother(padme,luke).
mother(padme,leia).
mother(leia,ben).
mother(shmi,anakin).
father(anakin,luke).
father(anakin,leia).
parent(X,Y) :- mother(X,Y).
parent(X,Y) :- father(X,Y).
sibling(X,Y) :- parent(Z,X), parent(Z,Y), X \== Y.
grand_parent(X,Y) :- parent(X,Z), parent(Z,Y).
force_sensitive(X) :- parent(shmi,X).
force_sensitive(X) :- grand_parent(shmi,X).
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
% force_sensitive(X) :- ancestor(shmi,X).
*Atoms are lower-case, variables are upper-case, % indicates a comment
Demo-Time!
(in 6 easy steps)
Represent entities as sets of facts:
[attribute value]
[father "Vader"]
[name "Luke"]
[job "Jedi"]
Combine facts to entity, give it an ID:
[entity-id attribute value]
[12 father "Vader"]
[12 name "Luke"]
[12 job "Jedi"]
Make it transactional:
[entity-id attribute value trx-id]
[11 father "Vader" 4300]
[11 name "Luke" 4300]
[11 job "Jedi" 4300]
[12 father "Han" 4300]
[12 name "Ben" 4300]
[12 job "Sith Lord" 4300]
[12 alias "Kylo" 4300]
Add/revoke flag
[entity-id attribute value trx-id add?]
[12 job "Sith Lord" 4301 false]
[12 job "Supreme Leader" 4301 true]
(* true is default and can be omitted)
Refer to other entities via their id.
[13 name "Han" 4302]
[13 job "Smuggler" 4302]
[12 father "Han" 4302 false]
[12 father 13 4302]
[14 name "Anakin" 4302]
[14 job "Sith Lord" 4302]
[11 father "Vader" 4302 false]
[11 father 14 4302]
;; <- semicolon introduces a comment
"foo" ;; a string (duh)
:bar ;; a keyword (i.e. interned string)
:db/id ;; keyword id with a namespace
[ ] ;; vector
{ } ;; map from key->value
{ "key1" "value1"
:key2 "value2"
}
#{ } ;; set
() ;; list
As of Trx 4302
;; datoms
[11 :father 14]
[11 :name "Luke"]
[11 :job "Jedi"]
[12 :father 13]
[12 :name "Ben"]
[12 :job "Supreme Leader"]
[12 :alias "Kylo"]
[13 :name "Han"]
[13 :job "Smuggler"]
[14 :name "Anakin"]
[14 :job "Sith Lord"]
Let's add a DB system and some data...
;; Map form
{ :db/id 11
:father 14
:name "Luke"
:job "Jedi" }
{ :db/id 12
:father 13
:name "Ben"
:job "Supreme Leader"
:alias "Kylo" }
{:db/id 13
:name "Han" :job "Smuggler"}
{:db/id 14
:name "Anakin":job "Sith Lord"
}
(def uri "datahike:file:///tmp/tutorial")
(d/create-database uri)
(def conn (d/connect uri))
;; set up schema, which are also datoms.
;; We use some helper functions to generate them
;; example: (schema/str-attr :test)
;; => #:db{:ident :test, :valueType :db.type/string, :cardinality :db.cardinality/one}
;;
(d/transact conn [(schema/str-attr :name)
(schema/one-to-one :father)
(schema/str-attr :job)
(schema/many-str-attr :alias)
])
;; add some data, propose Entity IDs via :db/id
(d/transact conn [{:db/id 11 :name "Luke" :job "Jedi" :father 14}
{:db/id 12 :name "Ben" :job "Supreme Leader" :father 13
:alias "Kylo" }
{:db/id 13 :name "Han" :job "Smuggler"}
{:db/id 14 :name "Anakin" :job "Sith Lord"}
])
Find entity-ids, attributes, values by unification:
[:find ?name ;; ?name is returned
:where
[?e :job "Jedi"] ;; ?e is a free variable
[?e :name ?name] ;; "Jedi" is a constant
;; unify statements to
]
=> #{["Luke"]} ;; find match
Navigating references
[:find ?name ?father
:where
[?e :name ?name]
[?e :father ?f]
[?f :name ?father]
]
> #{["Luke" "Anakin"] ["Ben" "Han"]}
And adding data
#:db{:ident :born, :valueType :db.type/long,
:cardinality :db.cardinality/one}
;; birthdate relative to the Battle of Yavin (SW:ANH)
[{:db/id 11 :born -19} ;; Luke
{:db/id 12 :born 5} ;; Ben Solo
{:db/id 13 :born -29} ;; Han Solo
{:db/id 14 :born -41}] ;; Anakin
[:find ?name ?b
:where
[?e :name ?name]
[?e :born ?b]
[(< 0 ?b)]
]
=> #{["Ben" 5]}
(d/q
'[:find ?name ?b
:in $ ?year
:where
[?e :name ?name]
[?e :born ?b]
[(< ?year ?b)]]
(d/db conn) 0)
[:find (count ?e)
:where
[?e :name _]]
=> 4
[:find (min ?b)
:where
[_ :born ?b]]
=> [-41]
(min ?xs) (max ?xs) (count ?xs) (count-distinct ?xs) (sum ?xs) (avg ?xs) (median ?xs) (variance ?xs) (stddev ?xs)
[:find (sample 2 ?name)
:where
[_ :name ?name]]
[:find (min 2 ?born)
:where
[?e :name ?name]
[?e :born ?born]]
=> [-41 -29]
(distinct ?xs) (min n ?xs) (max n ?xs) (rand n ?xs) (sample n ?xs)
Or use your own functions!
[*] 11 ;; select all attributes
=> {:db/id 11,
:born -19,
:job "Jedi",
:name "Luke",
:father #:db{:id 14}}
[:name :job] 11 ;; one attribute
=> {:name "Luke", :job "Jedi"}
[:name {:father [:name]}] 11 ;; pull referenced entity
=> {:name "Luke", :father {:name "Anakin"}}
Retrieve entities (and referenced entities) via selector and entity-id
[* {:_father [*]}] 14
=>
{:db/id 14,
:born -41,
:job "Sith Lord",
:name "Anakin",
:_father [{:db/id 11,
:born -19,
:father #:db{:id 14},
:job "Jedi",
:name "Luke"}]}
References are bi-directional
(d/q '[:find [pull ?e [:name :born]]
:where
[?e :job ?j]
[(not= ?j "Jedi")]
] (d/db conn))
=> ([{:name "Han", :born -29}]
[{:name "Ben", :born 5}]
[{:name "Anakin", :born -41}])
Let's do both!
and much more (rules, computed values etc.)
=> queries themselves are just data
Append-only, mono or bi-temporal
=> an immutable series of timestamped transactions
=> the database can be viewed in different points in time
*Most vendors still provide excision to hard-delete facts
Remember (d/db conn)?
Database at the current time!
Let's go back in time!
(d/as-of db timestamp!)
(def current-db (d/db conn))
(def db-of-2000 (d/as-of current-db #inst "2000-01-01"))
(d/pull current-db
'[:name] 11
)
=> {:name "Luke"}
(d/pull db-of-2000
'[:name] 11
)
=> nil
Transactions are facts too
;; structure of a fact/datom
[entity attribute value trx]
;; added with every transaction
[trx :db/txInstant timestamp]
;;; we can add
;;; other metadata
[trx :tx/changed-by "beders"]
Find when and how facts where added or retracted
;; all transactions
[:find [[pull ?tx [*]] ...]
:where [?tx :db/txInstant _]]
=>
[#:db{:id 536870918, :txInstant #inst"2020-01-09T17:48:25.381-00:00"}
#:db{:id 536870916, :txInstant #inst"2020-01-09T09:53:50.720-00:00"}
....
]
;; all trxs where :born attribute was added
[:find ?tx ?time
:where [?tx :db/txInstant ?time]
[_ :born _ ?tx true]
]
=>
#{[536870916 #inst"2020-01-09T09:53:50.720-00:00"]
[536870917 #inst"2020-01-09T10:00:17.160-00:00"]}
;; all facts added in a trx
[:find ?e ?a ?v
:in $ ?tx
:where [?e ?a ?v ?tx]
] (d/db conn) 536870914
=>
{[11 :job "Jedi"] ....}
Datomic
(Cognitect - 2012)
Storage backend:
DynamoDB, Cassandra, ORA, PostgreSQL
Used by:
ca, Netflix, FB, TimeInc, NuBank (80TB data, 2100 transactors, 13 shards)
Crux
(Juxt - 2019)
Bi-temporal, trx-time and valid-time
Storage backends:
RocksDB, Kafka (configurable)
Used by:
Oscaro (eCommerce)
Avisi (AtlasCRM)
Datahike
(@Replikativ - 2019)
Uses hitchhiker trees (fractal, snapshottable, on-disk)
Good for single machines, millions of facts
DataScript
(@Tonsky - 2014)
In-memory, in-browser
Used by:
PrecursorApp (collab editing)
Cognician (coaching platform)
By Jochen Bedersdorfer
Overview of Datalog (the prolog subset) and how it's used in databases like Datahike or Datomic