Fission Reactor

Datalog, CRDTs, and a WebNative Database

Welcome 👋

  • Hi, I'm Quinn!
  • Agenda
    • The state of WebNative
    • Introducing Dialog!
    • Open discussion
       
  • We have lots of time today
    • Please interrupt if you have questions!

The State of WebNative

  • WebNative is a TypeScript SDK for decentralized apps
  • Provides...
  • But what about structured data?
    • Modeling rich application state and relational data
    • Collaborative apps with concurrent access

Why an Edge Database?

  • Data is being produced at unprecedented rates, but existing systems are showing their cracks
    • Walled gardens inhibit interoperability
    • Users lack agency over the use and sharing of data
    • Fundamental limits on scaling
  • A local-first database offers solutions to these issues 
    • Data can be directly shared between applications
    • E2E encryption enables granular access controls
    • Data locality eliminates network calls
    • Smaller datasets alleviate some scaling concerns

If only it were so easy...

  • A local-first database must embrace uncertainty
    • Devices may be offline for months at a time
    • The set of devices is unknown and unbounded
    • The capabilities of those devices vary

  • Concurrent updates may span long-lived branches of history
    • Think conflict resolution of a long-lived Git branch

  • Normally we ask when a system will converge...
    • ...but here we must instead question what the world looks like when convergence is impossible

Introducing Dialog!

  • Dialog is an edge database we're building that uses a dialect of Datalog to query data in local-first applications
     
  • Its design goals include:
    • Exposing bitemporal CRDTs that offer resilience to Byzantine faults
    • Modeling application state as views over these data types, with incremental view maintenance
    • Enabling data integration over heterogeneous and encrypted data sets
       
  • It's also very much a work in progress!

Data- what?

  • Datalog is a declarative language from the 70s, with roots in logic programming
    • Programs are rules that operate on a set of facts
       
  • It's used in...
  • Datalog is roughly equivalent in expressive power to relational algebra (the basis of SQL) extended with recursion
    • This makes it a powerful language for representing complex queries!
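
To make "rules that operate on a set of facts" concrete, here is a minimal sketch of naive bottom-up evaluation in Python. The `parent` and `ancestor` relations and the function names are illustrative assumptions, not part of Dialog; the second rule is the recursive one that plain relational algebra cannot express.

```python
# Facts are tuples: (relation, arg1, arg2). All names are illustrative only.
facts = {
    ("parent", "alice", "bob"),
    ("parent", "bob", "carol"),
    ("parent", "carol", "dave"),
}

def step(known):
    """Apply both rules once and return every fact they derive."""
    derived = set()
    # ancestor(X, Y) :- parent(X, Y).
    for rel, x, y in known:
        if rel == "parent":
            derived.add(("ancestor", x, y))
    # ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    for rel1, x, y in known:
        if rel1 != "parent":
            continue
        for rel2, y2, z in known:
            if rel2 == "ancestor" and y2 == y:
                derived.add(("ancestor", x, z))
    return derived

def evaluate(base):
    """Naive evaluation: re-apply the rules until no new facts appear."""
    known = set(base)
    while True:
        new = step(known) - known
        if not new:
            return known
        known |= new

ancestors = sorted((x, y) for rel, x, y in evaluate(facts) if rel == "ancestor")
```

Real engines typically use semi-naive evaluation to avoid re-deriving the same facts on every pass; the naive loop is used here only because it is short.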

Time Traveling CRDTs

  • Conflict-free Replicated Data Types
    • Data structures which support coordination-free synchronization of updates, while guaranteeing strong eventual consistency
    • Roughly, two peers that have received the same events will converge on the same state
  • Bitemporality is a technique for modeling time in databases such that...
    • The state at any point in history can be recovered
    • Alternative timelines can be forked 
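
As a rough illustration of both ideas, the sketch below models a replica as a grow-only set of events, merges replicas by set union, and derives state by replaying events up to a chosen time. The event shape, the `merge` and `state_at` names, and the use of a single logical clock are assumptions made for brevity; a genuinely bitemporal model tracks two time axes, which this sketch does not.

```python
from typing import Dict, FrozenSet, Tuple

# (logical_time, replica_id, key, value); the shape is a hypothetical choice.
Event = Tuple[int, str, str, str]

def merge(a: FrozenSet[Event], b: FrozenSet[Event]) -> FrozenSet[Event]:
    """Set union is commutative, associative, and idempotent."""
    return a | b

def state_at(events: FrozenSet[Event], time: int) -> Dict[str, str]:
    """Replay events up to `time`; later events overwrite earlier ones.

    Sorting the tuples orders by logical time first, then replica id,
    so every peer replays the same events in the same order.
    """
    state: Dict[str, str] = {}
    for t, _replica, key, value in sorted(events):
        if t <= time:
            state[key] = value
    return state

peer1 = frozenset({(1, "a", "title", "Draft"), (3, "a", "title", "Final")})
peer2 = frozenset({(2, "b", "title", "Review"), (2, "b", "owner", "quinn")})

# Peers that have received the same events hold the same state.
merged = merge(peer1, peer2)
```

Because no event is ever discarded, `state_at(merged, 1)` recovers the earlier state, and a fork is just a new set of events that starts from a prefix of this one.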

Keeping CALM with Datalog

Byzantine Faults and CRDTs!

  • We're building a trustless database, though, so CRDTs must still converge when given invalid events
    • IDs must be unambiguous
    • Causal dependencies must be unforgeable
    • Semantic invariants must be verifiable
  • These goals can all be achieved by explicitly modeling causality over a content-addressable DAG
  • Represent Datalog facts as 4-tuples:
    • (entity, attribute, value, causality)
    • The CID (content identifier) doubles as a tie-breaker!
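
One way to picture the 4-tuple is the sketch below. It is not Dialog's encoding: the `Fact` class, the `resolve` function, and the use of a SHA-256 hex digest in place of a real IPFS CID are all assumptions for illustration. The `causality` field holds the identifiers of the facts this one was written on top of, so changing any field or any dependency changes the identifier.

```python
import hashlib
import json
from typing import List, NamedTuple, Tuple

class Fact(NamedTuple):
    entity: str
    attribute: str
    value: str
    causality: Tuple[str, ...] = ()  # identifiers of causal predecessors

    @property
    def cid(self) -> str:
        """Stand-in for a CID: SHA-256 over a canonical JSON encoding."""
        payload = json.dumps(
            [self.entity, self.attribute, self.value, sorted(self.causality)],
            separators=(",", ":"),
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def resolve(facts: List[Fact]) -> Fact:
    """Pick one winner among facts about the same (entity, attribute).

    Facts named in another fact's causality are superseded. Among the
    remaining concurrent heads, the largest identifier wins, so every
    peer makes the same choice regardless of arrival order.
    """
    superseded = {parent for f in facts for parent in f.causality}
    heads = [f for f in facts if f.cid not in superseded]
    return max(heads, key=lambda f: f.cid)

root = Fact("doc1", "title", "Draft")
a = Fact("doc1", "title", "Final", (root.cid,))   # concurrent with b
b = Fact("doc1", "title", "Review", (root.cid,))  # concurrent with a

winner = resolve([root, a, b])
```

Comparing digests is an arbitrary but deterministic rule; it says nothing about which write is "better", only that all peers agree on the outcome.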

Speeding through Time

  • Time travel and conflict resolution mean recomputing views from arbitrary points
    • How do we avoid starting from the beginning?
  • Incremental view maintenance with Datalog!
    • A class of algorithms for recomputing a view when the inputs change
  • Current candidate is Delete/Rederive (DRed)
    1. Compute difference in facts between two points
    2. Delete all derivations that rely on deleted facts
    3. Rederive all facts with alternative derivations
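
The three steps can be sketched for a single recursive view, graph reachability. This is a simplified illustration under stated assumptions, not Dialog's implementation: the `edge` and `reach` relations and the `derive`, `fixpoint`, and `dred` names are hypothetical, and the rules are hard-coded rather than interpreted.

```python
def derive(edges, reach):
    """Apply both rules once.

    reach(X, Y) :- edge(X, Y).
    reach(X, Z) :- edge(X, Y), reach(Y, Z).
    """
    out = set(edges)
    out |= {(x, z) for (x, y) in edges for (y2, z) in reach if y == y2}
    return out

def fixpoint(edges, reach=frozenset()):
    """Grow `reach` until the rules derive nothing new."""
    reach = set(reach)
    while True:
        new = derive(edges, reach) - reach
        if not new:
            return reach
        reach |= new

def dred(old_edges, new_edges, old_reach):
    # Step 1: compute the difference in base facts between the two points.
    removed = old_edges - new_edges

    # Step 2: over-delete every view fact with a derivation that relies on
    # a removed edge, directly or through another deleted view fact.
    deleted = set(removed)
    deleted |= {(x, z) for (x, y) in removed for (y2, z) in old_reach if y == y2}
    while True:
        more = {(x, z) for (x, y) in old_edges for (y2, z) in deleted if y == y2}
        more -= deleted
        if not more:
            break
        deleted |= more
    reach = set(old_reach) - deleted

    # Step 3: rederive deleted facts that still have an alternative
    # derivation from the surviving edges and view facts.
    kept_edges = old_edges - removed
    while True:
        back = (derive(kept_edges, reach) & deleted) - reach
        if not back:
            break
        reach |= back

    # Insertions are handled by continuing the fixpoint from what is left.
    return fixpoint(new_edges, reach)

old_edges = {("a", "b"), ("b", "c"), ("a", "c")}
new_edges = {("a", "b"), ("a", "c"), ("c", "d")}  # drop b->c, add c->d
old_reach = fixpoint(old_edges)
new_reach = dred(old_edges, new_edges, old_reach)
```

In this example `("a", "c")` is over-deleted in step 2, because one of its derivations went through `b`, and restored in step 3 because the direct edge still supports it.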

Heterogeneous Data Integration

  • Remember, global convergence isn't a goal for us!
     
  • Real-world systems are messy and non-overlapping
    • Social networks rely on asymmetry and privacy
    • In science and media we trust different sources
  • Data relevance is a question of access and intent
    • Encrypted data is captured by hidden sub-DAGs
    • The view filters for relevant data
  • We trade global convergence for locally deterministic and mutually compatible interpretations of data
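
A loose sketch of "locally deterministic and mutually compatible": each peer evaluates the same view over whichever facts it can read. The `Sealed` wrapper, the key labels, and the `follows` view are assumptions made up for this example, and no real encryption is performed; a fact is simply skipped when the peer lacks its key.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute, value)

class Sealed(NamedTuple):
    key: str      # hypothetical label of the key needed to read the fact
    fact: Triple

def view(log: Iterable[Sealed], held_keys: Set[str]) -> List[Triple]:
    """Every readable 'follows' fact, in a stable order."""
    readable = [s.fact for s in log if s.key in held_keys]
    return sorted(f for f in readable if f[1] == "follows")

log = [
    Sealed("A", ("alice", "follows", "bob")),
    Sealed("B", ("alice", "follows", "carol")),
    Sealed("C", ("bob", "name", "Bob")),
]

narrow = view(log, {"A"})      # a peer holding only key A
wide = view(log, {"A", "B"})   # a peer holding keys A and B
```

Because this view only ever adds results as more facts become readable, the narrower peer's answer is a subset of the wider peer's answer: the two never contradict each other, even though they never fully agree.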

Partially Encrypted DAGs

[Diagram: a DAG with nodes labeled {A, B, C, D}, {A}, {A, B}, and {A, C}]

What's next?

  • Reminder: this is all very early!
    • We've primarily been defining the problem
    • Most of our focus has been on Datalog and CRDTs
      • I've prototyped a Datalog engine to explore some implementation details
  • An incomplete list of open questions:
    • What does the DSL for views and queries look like?
    • How is provenance tracked and stored?
    • Can indexes be verified and shared?
    • How does the UX for time travel and forking look?

Bringing it all together...

  • Dialog is our approach to a local-first database that prioritizes user agency, data privacy, and interoperability
     
  • It eschews traditional ideas of convergence in favor of recognizing the inherent complexity of the world
     
  • It ties together Datalog, CRDTs, and Content-Addressable Storage using IPFS
     
  • ...with the goal of integrating into our existing WebNative SDK