Fission Reactor
Datalog, CRDTs, and a WebNative Database
Welcome 👋
- Hi, I'm Quinn!
- Applied Researcher @ Fission
- @wilton_quinn on Twitter
- Agenda
- The state of WebNative
- Introducing Dialog!
- Open discussion
- We have lots of time today
- Please interrupt if you have questions!
The State of WebNative
- WebNative is a TypeScript SDK for decentralized apps
- ...but @theappcypher recently joined to port to Rust
- Provides...
- Identity and authorization, with UCAN
- File storage, with the WebNative FileSystem
- End-to-end encryption, with WebCrypto
- Cross-device syncing, with IPFS
- But what about structured data?
- Modeling rich application state and relational data
- Collaborative apps with concurrent access
Why an Edge Database?
- Data is being produced at unprecedented rates, but existing systems are showing their cracks
- Walled gardens inhibit interoperability
- Users lack agency over the use and sharing of data
- Fundamental limits on scaling
- A local-first database offers solutions to these issues
- Data can be directly shared between applications
- E2E encryption enables granular access controls
- Data locality eliminates network calls
- Smaller datasets alleviate some scaling concerns
If only it were so easy...
- A local-first database must embrace uncertainty
- Devices may be offline for months at a time
- The set of devices is unknown and unbounded
- The capabilities of those devices vary
- Concurrent updates may span long-lived branches of history
- Think conflict resolution of a long-lived Git branch
- Normally we ask when a system will converge...
- ...but here we must instead question what the world looks like when convergence is impossible
Introducing Dialog!
- Dialog is an edge database we're building that uses a dialect of Datalog to query data in local-first applications
- Its design goals include:
- Exposing bitemporal CRDTs that offer resilience to Byzantine faults
- Modeling application state as views over these data types, with incremental view maintenance
- Enabling data integration over heterogeneous and encrypted data sets
- It's also very much a work in progress!
Data- what?
- Datalog is a declarative language from the 70s, with roots in logic programming
- Programs are rules that operate on a set of facts
- It's used in...
- Datalog is equivalent to relational algebra (SQL) with recursion
- This makes it a powerful language for representing complex queries!
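To make "rules that operate on a set of facts" concrete, here is a minimal sketch of naive bottom-up evaluation for one recursive rule. It is not Dialog's engine: the `parent` facts, the `ancestor` relation, and the `ancestors` function are hypothetical names chosen for illustration.

```python
# Facts: parent(x, y) means x is a parent of y (illustrative data only).
parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}


def ancestors(parent_facts):
    """Evaluate two Datalog-style rules until no new facts appear:

        ancestor(X, Y) :- parent(X, Y).
        ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    """
    ancestor = set(parent_facts)  # first rule: every parent is an ancestor
    while True:
        # Second rule: join parent(X, Y) with ancestor(Y, Z) on Y.
        derived = {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in ancestor
            if y == y2
        }
        if derived <= ancestor:  # fixpoint: nothing new was derived
            return ancestor
        ancestor |= derived


result = ancestors(parent)
```

The second rule refers to `ancestor` in its own body, which is the recursion that plain relational algebra lacks.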
Time Traveling CRDTs
- Conflict-free Replicated Data Types
- Data structures which support coordination-free synchronization of updates, while guaranteeing strong eventual consistency
- Roughly, two peers that have received the same events will converge on the same state
- Bitemporality is a technique for modeling time in databases such that...
- The state at any point in history can be recovered
- Alternative timelines can be forked
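The convergence property described for CRDTs above can be illustrated with one of the simplest CRDTs, a grow-only set. This is a sketch for intuition only, not one of Dialog's data types; the `GSet` class and `replay` helper are hypothetical names.

```python
from itertools import permutations


class GSet:
    """A grow-only set: the only update is add, and merge is set union."""

    def __init__(self, items=()):
        self.items = frozenset(items)

    def add(self, item):
        return GSet(self.items | {item})

    def merge(self, other):
        # Union is commutative, associative, and idempotent, so replicas
        # can merge in any order, any number of times, without coordination.
        return GSet(self.items | other.items)


events = ["x", "y", "z"]


def replay(order):
    """Apply the same events to a fresh replica in the given order."""
    replica = GSet()
    for event in order:
        replica = replica.add(event)
    return replica


# Every delivery order of the same events ends in the same state.
states = {replay(order).items for order in permutations(events)}

a = GSet().add("x")
b = GSet().add("y")
```

Two peers that have seen the same events hold the same state regardless of delivery order, which is the "roughly" statement of strong eventual consistency on the slide.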
Keeping CALM with Datalog
- The CALM Principle proves:
- "logically monotonic distributed code is eventually consistent without any need for coordination protocols"
- Datalog is logically monotonic...
- ...what if it could be used to simplify the design and implementation of CRDTs?
- Spoiler alert: it can!
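A small sketch of what monotonicity buys, assuming a negation-free query (Datalog without negation or aggregation is monotonic): as facts arrive the output only grows, and the final answer does not depend on arrival order. The `reachable` and `outputs_over_time` functions and the `edges` data are hypothetical.

```python
from itertools import permutations


def reachable(edges):
    """reach(X, Y) :- edge(X, Y).
    reach(X, Z) :- reach(X, Y), edge(Y, Z).
    """
    reach = set(edges)
    changed = True
    while changed:
        new = {(x, z) for (x, y) in reach for (y2, z) in edges if y == y2}
        changed = not (new <= reach)
        reach |= new
    return reach


edges = [("a", "b"), ("b", "c"), ("c", "d")]


def outputs_over_time(arrival_order):
    """Re-run the query after each fact arrives and keep every snapshot."""
    seen, snapshots = set(), []
    for edge in arrival_order:
        seen.add(edge)
        snapshots.append(reachable(seen))
    return snapshots


finals = set()
monotone = True
for order in permutations(edges):
    snaps = outputs_over_time(order)
    # Monotonic: each snapshot is a superset of the one before it.
    monotone = monotone and all(a <= b for a, b in zip(snaps, snaps[1:]))
    finals.add(frozenset(snaps[-1]))
```

Because no earlier conclusion is ever retracted, peers never need to agree on an order before acting on what they have seen.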
Byzantine Faults and CRDTs!
- We're building a trustless database though; CRDTs must converge even when given invalid events
- IDs must be unambiguous
- Causal dependencies must be unforgeable
- Semantic invariants must be verifiable
- These goals can all be achieved by explicitly modeling causality over a content-addressable DAG
- Represent Datalog facts as 4-tuples:
- (entity, attribute, value, causality)
- The CID doubles as a tie-breaker!
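A rough sketch of the 4-tuple idea, under loud assumptions: `cid` here is a SHA-256 hex digest over canonical JSON, a stand-in for a real IPFS CID, and `make_fact`, `put`, `ancestors`, and `winner` are hypothetical helpers rather than Dialog's API. The causality field is taken to be a list of identifiers of the facts this one depends on.

```python
import hashlib
import json


def cid(fact):
    """Stand-in for a real CID: SHA-256 over a canonical JSON encoding."""
    encoded = json.dumps(fact, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


def make_fact(entity, attribute, value, causality=()):
    # Sorting the causal links makes the encoding independent of the order
    # in which the dependencies were listed.
    return [entity, attribute, value, sorted(causality)]


store = {}


def put(fact):
    """Store a fact under its content-derived identifier."""
    fact_id = cid(fact)
    store[fact_id] = fact
    return fact_id


def ancestors(fact_id):
    """All fact identifiers reachable by following causal links."""
    seen, stack = set(), list(store[fact_id][3])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(store[current][3])
    return seen


def winner(id_a, id_b):
    """A causal successor wins; concurrent facts tie-break on identifier."""
    if id_a in ancestors(id_b):
        return id_b
    if id_b in ancestors(id_a):
        return id_a
    return max(id_a, id_b)


root = put(make_fact("doc1", "title", "Draft"))
left = put(make_fact("doc1", "title", "Dialog", [root]))
right = put(make_fact("doc1", "title", "Datalog", [root]))
```

Since an identifier is a hash of the fact's content, including its causal links, a fact can only reference facts that already exist, and every peer picks the same winner between `left` and `right` without exchanging messages.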
Speeding through Time
- Time travel and conflict resolution mean recomputing views from arbitrary points
- How do we avoid starting from the beginning?
- Incremental view maintenance with Datalog!
- A class of algorithms for recomputing a view when the inputs change
- Current candidate is Delete/Rederive (DRed)
- Compute difference in facts between two points
- Delete all derivations that rely on deleted facts
- Rederive all facts with alternative derivations
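The three DRed steps above can be sketched for a single reachability view. This is a simplified illustration specialised to one pair of rules, not a general DRed implementation and not Dialog's code; `evaluate`, `dred`, and the edge data are hypothetical.

```python
def evaluate(edges, known=frozenset()):
    """Extend `known` to a fixpoint of:
        reach(X, Y) :- edge(X, Y).
        reach(X, Z) :- reach(X, Y), edge(Y, Z).
    """
    reach = set(known) | set(edges)
    while True:
        new = {(x, z) for (x, y) in reach for (y2, z) in edges if y == y2} - reach
        if not new:
            return reach
        reach |= new


def dred(old_edges, new_edges, old_reach):
    # Step 1: compute the difference in base facts between the two points.
    removed = old_edges - new_edges
    kept_edges = old_edges & new_edges

    # Step 2: over-delete every derived fact with some derivation that
    # uses a removed edge, directly or through another deleted fact.
    overdeleted = set(removed)
    overdeleted |= {(x, z) for (x, y) in old_reach for (y2, z) in removed if y == y2}
    frontier = set(overdeleted)
    while frontier:
        step = {(x, z) for (x, y) in frontier for (y2, z) in old_edges if y == y2}
        frontier = step - overdeleted
        overdeleted |= frontier
    survivors = old_reach - overdeleted

    # Step 3: rederive over-deleted facts that still have an alternative
    # derivation from surviving facts and kept edges.
    rederived = overdeleted & kept_edges
    rederived |= {
        (x, z)
        for (x, y) in survivors
        for (y2, z) in kept_edges
        if y == y2 and (x, z) in overdeleted
    }

    # Resuming evaluation propagates the rederived facts and also picks
    # up any newly added edges.
    return evaluate(new_edges, survivors | rederived)


old_edges = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}
old_reach = evaluate(old_edges)
new_edges = (old_edges - {("b", "c")}) | {("d", "e")}
updated = dred(old_edges, new_edges, old_reach)
```

Here `("a", "c")` is over-deleted because one of its derivations went through the removed edge `("b", "c")`, then rederived because the direct edge `("a", "c")` still exists.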
Heterogeneous Data Integration
- Remember, global convergence isn't a goal for us!
- Real-world systems are messy and non-overlapping
- Social networks rely on asymmetry and privacy
- In science and media we trust different sources
- Data relevance is a question of access and intent
- Encrypted data is captured by hidden sub-DAGs
- Relevant data is filtered for in the view
- We trade global convergence for locally deterministic and mutually compatible interpretations of data
Partially Encrypted DAGs
- [Diagram: DAG with nodes labeled {A, B, C, D}, {A}, {A, B}, and {A, C}]
What's next?
- Reminder: this is all very early!
- We've primarily been defining the problem
- Most of our focus has been on Datalog and CRDTs
- I've prototyped a Datalog engine to explore some implementation details
- An incomplete list of open questions:
- What does the DSL for views and queries look like?
- How is provenance tracked and stored?
- Can indexes be verified and shared?
- How does the UX for time travel and forking look?
Bringing it all together...
- Dialog is our approach to a local-first database that prioritizes user agency, data privacy, and interoperability
- It eschews traditional ideas of convergence in favor of recognizing the inherent complexity of the world
- It ties together Datalog, CRDTs, and Content-Addressable Storage using IPFS
- ...with the goal of integrating into our existing WebNative SDK
Fission Reactor: Datalog, CRDTs, and a WebNative Database
By quinnwilton