Fission Reactor
Datalog, CRDTs, and a WebNative Database
Welcome 👋
Hi, I'm Quinn!
Applied Researcher @ Fission
@wilton_quinn on Twitter
Agenda
The state of WebNative
Introducing Dialog!
Open discussion
We have lots of time today
Please interrupt if you have questions!
The State of WebNative
WebNative is a TypeScript SDK for decentralized apps
...but @theappcypher recently joined to port it to Rust
Provides...
Identity and authorization, with UCAN
File storage, with the WebNative FileSystem
End-to-end encryption, with WebCrypto
Cross-device syncing, with IPFS
But what about structured data?
Modeling rich application state and relational data
Collaborative apps with concurrent access
Why an Edge Database?
Data is being produced at unprecedented rates, but existing systems are showing their cracks
Walled gardens inhibit interoperability
Users lack agency over the use and sharing of data
Fundamental limits on scaling
A local-first database offers solutions to these issues
Data can be directly shared between applications
E2E encryption enables granular access controls
Data locality eliminates network calls
Smaller datasets alleviate some scaling concerns
If only it were so easy...
A local-first database must embrace uncertainty
Devices may be offline for months at a time
The set of devices is unknown and unbounded
The capabilities of those devices vary
Concurrent updates may span long-lived branches of history
Think conflict resolution of a long-lived Git branch
Normally we ask when a system will converge...
...but here we must instead question what the world looks like when convergence is impossible
Introducing Dialog!
Dialog is an edge database we're building that uses a dialect of Datalog to query data in local-first applications
Its design goals include:
Exposing bitemporal CRDTs that offer resilience to Byzantine faults
Modeling application state as views over these data types, with incremental view maintenance
Enabling data integration over heterogeneous and encrypted data sets
It's also very much a work in progress!
Data- what?
Datalog is a declarative language from the 70s, with roots in logic programming
Programs are rules that operate on a set of facts
It's used in...
Deductive databases (Datomic)
Distributed systems (Bloom)
Static analysis (Soufflé)
Datalog is roughly equivalent to relational algebra (the core of SQL) with recursion
This makes it a powerful language for representing complex queries!
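To make "rules that operate on a set of facts" concrete, here is a minimal sketch of a recursive Datalog query evaluated by hand in Python. The `parent` facts, the names in them, and the `ancestors` helper are all made up for illustration; this is plain naive fixpoint evaluation, not how Dialog or any particular engine works.

```python
# Facts: parent(x, y) means x is a parent of y. All names are illustrative.
parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}


def ancestors(parent_facts):
    """Evaluate two Datalog rules to a fixpoint (naive evaluation).

    ancestor(X, Y) :- parent(X, Y).
    ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    """
    ancestor = set(parent_facts)  # first rule: every parent is an ancestor
    while True:
        # Second rule: join parent and ancestor on the shared variable Y.
        derived = {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in ancestor
            if y == y2
        }
        if derived <= ancestor:  # nothing new was derived: fixpoint reached
            return ancestor
        ancestor |= derived


result = ancestors(parent)
print(sorted(result))
```

The recursive second rule is the part that plain relational algebra cannot express: the loop keeps applying it until no new facts appear.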
Time Traveling CRDTs
Conflict-free Replicated Data Types
Data structures which support coordination-free synchronization of updates, while guaranteeing strong eventual consistency
Roughly, two peers that have received the same events will converge on the same state
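A small sketch of that convergence property, using a grow-only counter (a textbook state-based CRDT, chosen here only as an example; the slides do not say which CRDTs Dialog exposes). The replica names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    counts: dict = field(default_factory=dict)

    def increment(self, replica_id, amount=1):
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def merge(self, other):
        # Taking the max per replica is commutative, associative and
        # idempotent, so merge order and duplicate deliveries do not matter.
        merged = dict(self.counts)
        for replica_id, n in other.counts.items():
            merged[replica_id] = max(merged.get(replica_id, 0), n)
        return GCounter(merged)

    def value(self):
        return sum(self.counts.values())


# Two replicas update independently, with no coordination.
a, b = GCounter(), GCounter()
a.increment("laptop")
a.increment("laptop")
b.increment("phone")

ab = a.merge(b)  # laptop receives the phone's state
ba = b.merge(a)  # phone receives the laptop's state
```

Both replicas end up in the same state regardless of which one merged first, and merging the same state twice changes nothing.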
Bitemporality is a technique for modeling time in databases such that...
The state at any point in history can be recovered
Alternative timelines can be forked
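A rough sketch of the first property, assuming the common two-axis reading of bitemporality: every record carries both the time it became true and the time the database learned about it. The `Record` shape, the `as_of` helper, and the integer timestamps are assumptions for illustration, not Dialog's data model.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    key: str
    value: str
    valid_at: int     # when the fact became true in the modelled world
    recorded_at: int  # when the database learned about it


# Append-only log: nothing is ever overwritten.
log = [
    Record("title", "Draft", valid_at=1, recorded_at=1),
    Record("title", "Final", valid_at=5, recorded_at=6),
    # A late correction: at t=8 we learn the title had changed at t=3.
    Record("title", "Review", valid_at=3, recorded_at=8),
]


def as_of(log, key, valid_time, known_time) -> Optional[str]:
    """Value of `key` at `valid_time`, using only records known by `known_time`."""
    visible = [
        r for r in log
        if r.key == key and r.valid_at <= valid_time and r.recorded_at <= known_time
    ]
    if not visible:
        return None
    # The most recent valid record wins; recorded_at breaks ties.
    return max(visible, key=lambda r: (r.valid_at, r.recorded_at)).value


before_correction = as_of(log, "title", valid_time=4, known_time=7)
after_correction = as_of(log, "title", valid_time=4, known_time=8)
```

Because the log only grows, any earlier state can be recomputed by picking a different pair of timestamps. Forking a timeline would amount to appending to a copy of the log from some chosen point, which this sketch does not show.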
Keeping CALM with Datalog
The CALM Principle proves:
"logically monotonic distributed code is eventually consistent without any need for coordination protocols"
Datalog is logically monotonic...
...what if it could be used to simplify the design and implementation of CRDTs?
Spoiler alert: it can!
Data structures as queries: Expressing CRDTs using Datalog (2018)
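To illustrate what "logically monotonic" means, here is a small sketch contrasting a monotonic query with a non-monotonic one. Both functions and the edge data are invented for this example. The monotonic query only ever gains results as facts arrive, so peers can evaluate it on whatever they have seen so far without retracting anything later.

```python
def reachable(edges):
    """Monotonic query: transitive closure of `edges`."""
    closure = set(edges)
    changed = True
    while changed:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        changed = not new <= closure
        closure |= new
    return closure


def leaves(edges):
    """Non-monotonic query: nodes with no outgoing edge (needs negation)."""
    nodes = {n for edge in edges for n in edge}
    return nodes - {a for (a, _) in edges}


early = {("a", "b")}            # what a peer knows at first
later = early | {("b", "c")}    # the same peer after one more fact arrives

# Monotonic: everything derived early is still true later.
monotonic_holds = reachable(early) <= reachable(later)

# Non-monotonic: "b" was a leaf early on, but the new fact retracts that.
retracted = leaves(early) - leaves(later)
```

Queries like `leaves` are where coordination, or some explicit treatment of time, has to come back in.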
Byzantine Faults and CRDTs!
We're building a trustless database though; CRDTs must converge given invalid events
IDs must be unambiguous
Causal dependencies must be unforgeable
Semantic invariants must be verifiable
These goals can all be achieved by explicitly modeling causality over a content-addressable DAG
Making CRDTs Byzantine Fault Tolerant (2022)
Represent Datalog facts as 4-tuples:
(entity, attribute, value, causality)
The CID doubles as a tie-breaker!
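A minimal sketch of the 4-tuple idea, under loud assumptions: a SHA-256 hex digest of a JSON encoding stands in for a real IPFS CID, and `make_fact` and `current_value` are hypothetical helpers, not Dialog's API. Each fact names the CIDs of the facts it was written after, so causality is part of the hashed content and cannot be altered without changing the identifier.

```python
import hashlib
import json


def cid(fact):
    """Stand-in content identifier: SHA-256 of a canonical JSON encoding."""
    entity, attribute, value, causality = fact
    encoded = json.dumps([entity, attribute, value, sorted(causality)]).encode()
    return hashlib.sha256(encoded).hexdigest()


def make_fact(entity, attribute, value, parents=()):
    """Build an (entity, attribute, value, causality) tuple.

    `causality` is the set of CIDs of the facts this one was written after.
    """
    return (entity, attribute, value, frozenset(cid(p) for p in parents))


def current_value(facts, entity, attribute):
    """Pick a winner among the writes that no other write has superseded."""
    writes = [f for f in facts if f[0] == entity and f[1] == attribute]
    # Simplification: only direct parents within the same attribute count.
    superseded = {c for f in writes for c in f[3]}
    heads = [f for f in writes if cid(f) not in superseded]
    # Concurrent heads are ordered by CID, so every replica picks the same one.
    return max(heads, key=cid)[2] if heads else None


root = make_fact("doc1", "title", "Draft")
left = make_fact("doc1", "title", "Left", parents=[root])    # concurrent edit
right = make_fact("doc1", "title", "Right", parents=[root])  # concurrent edit

winner = current_value([root, left, right], "doc1", "title")
```

Which of the two concurrent edits wins is arbitrary but deterministic: any replica holding the same three facts, in any order, computes the same `winner`.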
Speeding through Time
Time travel and conflict resolution mean recomputing views from arbitrary points
How do we avoid starting from the beginning?
Incremental view maintenance with Datalog!
A class of algorithms for recomputing a view when the inputs change
Current candidate is Delete/Rederive (DRed)
Compute difference in facts between two points
Delete all derivations that rely on deleted facts
Rederive all facts with alternative derivations
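The three steps above can be sketched for a single recursive view, `path`, defined over `edge` facts. This is a simplified illustration rather than a full DRed implementation: the rederive step reuses a naive fixpoint seeded with the surviving facts, where a real engine would restrict the work to the over-deleted facts. The function names and the example graph are assumptions.

```python
def closure(edges, seed=frozenset()):
    """Fixpoint of: path(X,Y) :- edge(X,Y).  path(X,Z) :- edge(X,Y), path(Y,Z)."""
    path = set(edges) | set(seed)
    while True:
        new = {(x, z) for (x, y) in edges for (y2, z) in path if y == y2} - path
        if not new:
            return path
        path |= new


def dred(old_edges, old_path, new_edges):
    """Update the `path` view from `old_path` instead of starting from scratch."""
    # Step 1: difference in facts between the two points.
    removed = old_edges - new_edges

    # Step 2: over-delete every path fact with some derivation that uses a
    # removed edge or an already over-deleted path fact.
    over_deleted = set(removed)
    while True:
        via_removed_edge = {
            (x, z) for (x, y) in removed for (y2, z) in old_path if y == y2
        }
        via_deleted_path = {
            (x, z) for (x, y) in old_edges for (y2, z) in over_deleted if y == y2
        }
        more = (via_removed_edge | via_deleted_path) - over_deleted
        if not more:
            break
        over_deleted |= more
    survivors = old_path - over_deleted

    # Step 3: rederive facts that still have an alternative derivation.
    # Running the rules over the new edges also picks up inserted facts.
    return closure(new_edges, seed=survivors)


old_edges = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}
old_path = closure(old_edges)

# Move to another point in history: one edge removed, one added.
new_edges = (old_edges - {("a", "b")}) | {("d", "e")}
new_path = dred(old_edges, old_path, new_edges)
```

In this example `("a", "c")` is over-deleted because one of its derivations went through the removed edge, then rederived because the direct edge from `a` to `c` still exists.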
Heterogeneous Data Integration
Remember, global convergence isn't a goal for us!
Real-world systems are messy and non-overlapping
Social networks rely on asymmetry and privacy
In science and media we trust different sources
Data relevance is a question of access and intent
Encrypted data is captured by hidden sub-DAGs
Relevant data is filtered for in the view
We trade global convergence for locally deterministic and mutually compatible interpretations of data
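One way to picture "hidden sub-DAGs" is a view that keeps only the nodes a given reader can decrypt. The sketch below is purely illustrative: the node ids, the reader sets, and `visible_view` are invented, and no actual encryption is performed.

```python
# Hypothetical DAG: node id -> (parent ids, principals who hold the key).
dag = {
    "n1": ((), {"alice", "bob", "carol"}),
    "n2": (("n1",), {"alice"}),
    "n3": (("n1",), {"alice", "bob"}),
    "n4": (("n2", "n3"), {"alice", "bob", "carol"}),
}


def visible_view(dag, principal):
    """Return the nodes `principal` can read, each with its readable parents."""
    readable = {n for n, (_, readers) in dag.items() if principal in readers}
    # Links into hidden nodes are dropped here; a real system could instead
    # keep the opaque identifier without revealing the content behind it.
    return {
        n: tuple(p for p in dag[n][0] if p in readable)
        for n in sorted(readable)
    }


alice_view = visible_view(dag, "alice")
bob_view = visible_view(dag, "bob")
carol_view = visible_view(dag, "carol")
```

The three views differ, but each is computed deterministically from the same underlying DAG, and every node that appears in two views has the same identity in both.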
Partially Encrypted DAGs
[Diagram labels: {A, B, C, D}, {A}, {A, B}, {A, C}]
What's next?
Reminder: this is all very early!
We've primarily been defining the problem
Most of our focus has been on Datalog and CRDTs
I've prototyped a Datalog engine to explore some implementation details
An incomplete list of open questions:
What does the DSL for views and queries look like?
How is provenance tracked and stored?
Can indexes be verified and shared?
How does the UX for time travel and forking look?
Bringing it all together...
Dialog is our approach to a local-first database that prioritizes user agency, data privacy, and interoperability
It eschews traditional ideas of convergence in favor of recognizing the inherent complexity of the world
It ties together Datalog, CRDTs, and Content-Addressable Storage using IPFS
...with the goal of integrating into our existing WebNative SDK