Some neat specs for: serialization, canonical hashing, immutable linking, and protocol design
but first,
a little
IPLD originated from the IPFS project group
IPFS is for files, right?
Well, yeah. But what are "files" but a particular case of some tree structures?
Each of the points in this graph is
an IPLD object.
much like git is just a blob store,
IPFS is just an IPLD store.
They both just "happen" to have some porcelain
that makes them good at storing and manipulating files.
And IPLD objects are a bit like a structured format
for making new objects like git's internals, but more generally!
let's be honest...
We needed this in IPFS.
Our needs aren't that rare.
Why not make it reusable?
We want to make building applications easier.
We want to make building *distributed* applications easier.
Immutable links are a powerful primitive.
Content-addressable systems are good.
JSON.
CBOR.
Protobuf.
Git packs.
Chain blocks.
Whatever.
We need tools for data-description that are
decentralization-friendly,
language agnostic,
and grok immutable links.
Most existing systems are very 'cathedral'; do not want
"codecs" in IPLD are pretty familiar
So what if we could standardize this?
So we hoist links into the model
... as `CID`s.
<cid> ::= <cid-version><multicodec><multihash>
CID -- Content Identifiers -- are a simple standard for content-addressable links.
They provide for future-proof versioning at the lowest levels, choice of hash, etc!
We map other systems (e.g. Git, which has its own native hashing scheme) into CID space by assigning them a "multicodec" marker number.
CIDs can be represented as strings or bytes.
Pictured: unpacking the prefixes, you can see that this CID refers to raw bytes, uses a sha-256 hash, etc.
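Since the picture does not survive in text form, here is a minimal Go sketch of the same unpacking. It assumes a binary CIDv1 whose prefixes are unsigned varints, the multicodec code 0x55 for raw bytes, and the multihash code 0x12 for sha2-256. The names `buildCIDv1`, `parseCIDv1`, and `cidParts` are hypothetical helpers for illustration, not part of any IPLD library, and the string (multibase) form is not covered.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	codecRaw   = 0x55 // multicodec table entry for raw bytes
	hashSHA256 = 0x12 // multihash code for sha2-256
)

// cidParts holds the unpacked prefixes of a binary CIDv1 (hypothetical type).
type cidParts struct {
	Version, Codec, HashFn uint64
	Digest                 []byte
}

// buildCIDv1 assembles <cid-version><multicodec><multihash> as bytes,
// hashing data with sha2-256.
func buildCIDv1(codec uint64, data []byte) []byte {
	digest := sha256.Sum256(data)
	out := binary.AppendUvarint(nil, 1)                  // cid-version
	out = binary.AppendUvarint(out, codec)               // multicodec
	out = binary.AppendUvarint(out, hashSHA256)          // multihash: function code
	out = binary.AppendUvarint(out, uint64(len(digest))) // multihash: digest length
	return append(out, digest[:]...)                     // multihash: digest bytes
}

// parseCIDv1 reverses buildCIDv1, reading one varint prefix at a time.
func parseCIDv1(b []byte) (cidParts, error) {
	var p cidParts
	var size uint64
	for _, field := range []*uint64{&p.Version, &p.Codec, &p.HashFn, &size} {
		v, n := binary.Uvarint(b)
		if n <= 0 {
			return p, errors.New("truncated or malformed varint")
		}
		*field, b = v, b[n:]
	}
	if uint64(len(b)) != size {
		return p, errors.New("digest length mismatch")
	}
	p.Digest = b
	return p, nil
}

func main() {
	p, err := parseCIDv1(buildCIDv1(codecRaw, []byte("hello")))
	if err != nil {
		panic(err)
	}
	fmt.Printf("version=%d codec=0x%x hash=0x%x digest=%x\n",
		p.Version, p.Codec, p.HashFn, p.Digest)
}
```

Every prefix is self-describing, which is what lets a reader pick the right decoder and hash function without any out-of-band agreement.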
Some codecs can represent any arrangement of maps and lists and whatever map keys you want, etc. JSON and CBOR and other general serialization formats can do this.
Some codecs have opinions about what they can store. Protobufs, for example, must have pre-enumerated fields. Git, for example, can be seen as IPLD, but only with some known fields.
Both of these are fine.
And so now we have the multicodec table.
https://github.com/multiformats/multicodec/blob/master/table.csv
It contains a lot of things.
And we're ready to move on to the Data Model.
It's basically the JSON you know and love, but specified abstractly so we can use other representations too.
{ "hi": "hello", "list": ["indeed"], "ints": 42, "strings": "so many", "bools": true }
    A5                      # map(5)
       62                   # text(2)
          6869              # "hi"
       65                   # text(5)
          68656C6C6F        # "hello"
       64                   # text(4)
          6C697374          # "list"
       81                   # array(1)
          66                # text(6)
             696E64656564   # "indeed"
       64                   # text(4)
          696E7473          # "ints"
       18 2A                # uint(42)
       67                   # text(7)
          737472696E6773    # "strings"
       67                   # text(7)
          736F206D616E79    # "so many"
       65                   # text(5)
          626F6F6C73        # "bools"
       F5                   # true
A shared model of the data isn't just great for internal library abstractions (though it is!)...
You can choose one format for storage and internal use and canonical hashing (say, CBOR, because it's fast)...
And still use something human-readable (like JSON) for display, web APIs, and other places binary doesn't go.
(adjective) Very comprehensive; pertaining or appropriate to large classes or their characteristics; -- opposed to specific.
(adjective -- computing, of program code) Written so as to operate on any data type
more runtime errors
more shared code
more compile time checks
hypergenericism
the 'Any' type
mechanisms for parameterized types
extremely verbose types everywhere
All of the Data Model kinds (map, list, int, str, etc) are a `Node`.
So a JSON document is just a Node tree.
(Including links, it's a Node Graph -- specifically, a DAG!)
You can operate on Nodes by asking them what type they are at runtime.
This is a lot like the 'reflect' capabilities in strongly-typed programming languages you're already familiar with.
Nodes are immutable!
NodeBuilder is the mutable counterpart for a Node.
Any Node can return a NodeBuilder which can build a new, similar Node in a copy-on-write fashion.
* these statements may vary based on your client library of choice. go-ipld-prime
is based on these immutable/COW designs.
    Kind()               // returns 'map', 'list', 'int', etc
    TraverseField(key)   // steps across a map
    TraverseIndex(idx)   // steps across a list
    MapIterator()        // iterates a map
    ListIterator()       // iterates a list
    AsString()           // unboxes native string (or errors)
    AsInt()              // unboxes native int (or errors)
    AsBool()             // unboxes native bool (or errors)
    AsBytes()            // you get the idea
    AsLink()             // returns a content-ID you can load..!
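As a rough illustration of how such an interface can be used, here is a cut-down Go sketch. It borrows the method names from the listing above, but the signatures, the `base` helper, and the concrete node types are assumptions made for this example; check your client library for its actual API.

```go
package main

import (
	"errors"
	"fmt"
)

type Kind string

const (
	KindMap    Kind = "map"
	KindList   Kind = "list"
	KindInt    Kind = "int"
	KindString Kind = "string"
)

var errWrongKind = errors.New("wrong kind")

// Node is a cut-down version of the interface listed above.
type Node interface {
	Kind() Kind
	TraverseField(key string) (Node, error)
	TraverseIndex(idx int) (Node, error)
	AsString() (string, error)
	AsInt() (int, error)
}

// base rejects every operation; concrete kinds embed it and override what they support.
type base struct{}

func (base) TraverseField(string) (Node, error) { return nil, errWrongKind }
func (base) TraverseIndex(int) (Node, error)    { return nil, errWrongKind }
func (base) AsString() (string, error)          { return "", errWrongKind }
func (base) AsInt() (int, error)                { return 0, errWrongKind }

type strNode struct {
	base
	v string
}

func (strNode) Kind() Kind                  { return KindString }
func (n strNode) AsString() (string, error) { return n.v, nil }

type intNode struct {
	base
	v int
}

func (intNode) Kind() Kind            { return KindInt }
func (n intNode) AsInt() (int, error) { return n.v, nil }

type mapNode struct {
	base
	m map[string]Node
}

func (mapNode) Kind() Kind { return KindMap }
func (n mapNode) TraverseField(key string) (Node, error) {
	child, ok := n.m[key]
	if !ok {
		return nil, fmt.Errorf("no such field %q", key)
	}
	return child, nil
}

// describe inspects a Node purely through the interface, switching on Kind at runtime.
func describe(n Node) string {
	switch n.Kind() {
	case KindString:
		s, _ := n.AsString()
		return "string:" + s
	case KindInt:
		i, _ := n.AsInt()
		return fmt.Sprintf("int:%d", i)
	default:
		return string(n.Kind())
	}
}

func main() {
	doc := mapNode{m: map[string]Node{"hi": strNode{v: "hello"}, "ints": intNode{v: 42}}}
	child, _ := doc.TraverseField("ints")
	fmt.Println(describe(doc), describe(child))
}
```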
tl;dr: choose an in-memory working representation that's good to you.
We can make deep traversals generically.
Even works across link boundaries!
    func (p Progress) Traverse(
        start Node,
        pathSegments []string,
        do func(
            target Node,
            p2 Progress,
        ),
    )
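A minimal sketch of what such a path traversal does, including the step across a link. To stay self-contained it works on plain Go values (`map[string]any`, `[]any`) instead of `Node`, and the `Link` and `Store` types are hypothetical stand-ins for CIDs and block storage.

```go
package main

import (
	"fmt"
	"strconv"
)

// Link stands in for a CID in this sketch.
type Link string

// Store maps each link to the (already decoded) data it addresses.
type Store map[Link]any

// resolve loads links until it reaches a non-link value.
func (s Store) resolve(v any) (any, error) {
	for {
		l, isLink := v.(Link)
		if !isLink {
			return v, nil
		}
		target, found := s[l]
		if !found {
			return nil, fmt.Errorf("link %q not found", l)
		}
		v = target
	}
}

// Traverse follows pathSegments from start, transparently loading any link it lands on.
func (s Store) Traverse(start any, pathSegments []string) (any, error) {
	cur, err := s.resolve(start)
	if err != nil {
		return nil, err
	}
	for _, seg := range pathSegments {
		switch x := cur.(type) {
		case map[string]any:
			child, ok := x[seg]
			if !ok {
				return nil, fmt.Errorf("no field %q", seg)
			}
			cur = child
		case []any:
			i, err := strconv.Atoi(seg)
			if err != nil || i < 0 || i >= len(x) {
				return nil, fmt.Errorf("bad index %q", seg)
			}
			cur = x[i]
		default:
			return nil, fmt.Errorf("cannot step into %T with %q", cur, seg)
		}
		if cur, err = s.resolve(cur); err != nil {
			return nil, err
		}
	}
	return cur, nil
}

func main() {
	store := Store{"cid-b": map[string]any{"name": "leaf"}}
	root := map[string]any{"children": []any{Link("cid-b")}}
	got, err := store.Traverse(root, []string{"children", "0", "name"})
	fmt.Println(got, err)
}
```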
Updates? Why not?
    func (p Progress) Update(
        start Node,
        pathSegments []string,
        doReplace func(
            target Node,
            p2 Progress,
        ) (replacement Node),
    )
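Because Nodes are immutable, an update has to produce a new tree rather than change the old one. The sketch below shows that copy-on-write idea on plain Go values; the free function `Update` here is a simplified stand-in with an assumed signature, not the library method above.

```go
package main

import (
	"fmt"
	"strconv"
)

// Update returns a new tree in which the value at pathSegments is replaced by
// doReplace's result. Containers along the path are copied; start is never modified.
func Update(start any, pathSegments []string, doReplace func(target any) any) (any, error) {
	if len(pathSegments) == 0 {
		return doReplace(start), nil
	}
	seg, rest := pathSegments[0], pathSegments[1:]
	switch x := start.(type) {
	case map[string]any:
		child, ok := x[seg]
		if !ok {
			return nil, fmt.Errorf("no field %q", seg)
		}
		updated, err := Update(child, rest, doReplace)
		if err != nil {
			return nil, err
		}
		copied := make(map[string]any, len(x))
		for k, v := range x {
			copied[k] = v // untouched siblings are shared, not deep-copied
		}
		copied[seg] = updated
		return copied, nil
	case []any:
		i, err := strconv.Atoi(seg)
		if err != nil || i < 0 || i >= len(x) {
			return nil, fmt.Errorf("bad index %q", seg)
		}
		updated, err := Update(x[i], rest, doReplace)
		if err != nil {
			return nil, err
		}
		copied := append([]any(nil), x...)
		copied[i] = updated
		return copied, nil
	default:
		return nil, fmt.Errorf("cannot step into %T with %q", start, seg)
	}
}

func main() {
	old := map[string]any{"list": []any{"indeed"}, "ints": 42}
	updated, err := Update(old, []string{"list", "0"}, func(any) any { return "changed" })
	fmt.Println(old, updated, err)
}
```

Only the containers on the path are copied, so the cost of an update is proportional to the depth of the path, not the size of the tree.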
(basically, we're saying you can build `jq` here... and it would work equally well over JSON, CBOR, or any other codec you can provide.)
What other generic algorithms would you like to write?
Graph transformers?
Toposorters?
Do it.
A query "language"?
Why not.
We built one that's actually an AST that's also represented in IPLD.
(You can bring your own DSLs.)
Bears some vague resemblance to GraphQL queries...
Is regular IPLD.
We're putting this in the core library so you can use it for more traversals!
We can use these selectors to guide visits to subsections of a graph.
This is a useful building block for many other algorithms and applications.
    func (p Progress) Walk(
        start Node,
        selector Selector,
        atEach func(
            target Node,
            p2 Progress,
        ),
    )
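To give a feel for selector-guided walks, here is a deliberately tiny Go sketch. The `Selector` interface and the `Fields`, `All`, and `Match` types are invented for this example and are much simpler than the real selector spec; the walk runs over plain Go values rather than `Node`.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// Selector decides which nodes match and how to continue into children.
// A nil Selector returned from Explore means "do not descend".
type Selector interface {
	Matches() bool
	Explore(segment string) Selector
}

// Fields explores only the named fields, each with its own sub-selector.
type Fields map[string]Selector

func (Fields) Matches() bool                   { return false }
func (f Fields) Explore(seg string) Selector   { return f[seg] }

// All explores every child with the same sub-selector.
type All struct{ Next Selector }

func (All) Matches() bool                 { return false }
func (a All) Explore(string) Selector     { return a.Next }

// Match marks the node it is applied to as a result and stops.
type Match struct{}

func (Match) Matches() bool               { return true }
func (Match) Explore(string) Selector     { return nil }

// Walk visits the parts of the tree the selector asks for, calling atEach on matches.
func Walk(start any, sel Selector, atEach func(path []string, target any)) {
	walk(start, sel, nil, atEach)
}

func walk(n any, sel Selector, path []string, atEach func([]string, any)) {
	if sel == nil {
		return
	}
	if sel.Matches() {
		atEach(path, n)
	}
	// extend copies the path so sibling visits never share a backing array.
	extend := func(seg string) []string { return append(path[:len(path):len(path)], seg) }
	switch x := n.(type) {
	case map[string]any:
		keys := make([]string, 0, len(x))
		for k := range x {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic visit order for the sketch
		for _, k := range keys {
			walk(x[k], sel.Explore(k), extend(k), atEach)
		}
	case []any:
		for i, item := range x {
			seg := strconv.Itoa(i)
			walk(item, sel.Explore(seg), extend(seg), atEach)
		}
	}
}

func main() {
	doc := map[string]any{"posts": []any{
		map[string]any{"title": "a", "body": "x"},
		map[string]any{"title": "b", "body": "y"},
	}}
	sel := Fields{"posts": All{Next: Fields{"title": Match{}}}}
	Walk(doc, sel, func(path []string, target any) { fmt.Println(path, target) })
}
```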
In IPFS, replicating IPLD objects over the network and between storage pools works totally generically.
Soon: using those generic selectors and traversals,
we can easily sync around graphs and subgraphs.
Say what you want with a selector: get it all in one stream, with minimum RTTs.
Neat!
(remember how we said that Node interface was going to have another purpose?)
Imagine we want a map...
But it's backed by a...
It can even be several 'blocks', connected by Links...
schemas are just another (big) example of something that's "generic over the data model"
which is cool and wholesome
All of the existing systems are hard to apply here.
Immutable links are consequential.
We care about migration...
And we care about migration that works for decentralized protocols and distributed development practices.
(Which means strict version numbers are *out* -- requires central coordination.)
    ## MyString is a named type.
    type MyString string

    ## MyInt is another one.
    type MyInt int

    ## and so on
    type MyString string

    ## "String" is the key type;
    ## "MyString" is the value type.
    type MyMap map {String:MyString}

    ## or inline in other things:
    type MyStruct struct {
      aField {String:MyString}
    }
    type MyString string

    ## Looks familiar already, right?
    type MyList list [MyString]

    ## or inline in other things:
    type MyStruct struct {
      aField [MyString]
    }
    ## Without the 'nullable' keyword,
    ## this list can *only* contain
    ## strings!
    type MyList list [String]

    ## This list can contain either
    ## a string or a 'null' at each
    ## entry in the list.
    type HoleyList list [nullable String]
'nullable' can be applied to map values,
list values, and struct fields.
    ## Structs have a known set of fields.
    type MyStruct struct {
      x Int
      y String
      z nullable MyStruct
    }
'optional' is distinct from 'nullable'!
Means the field can be *missing* entirely.
'optional' only applies to struct fields.
    ## Structs have a known set of fields.
    type MyStruct struct {
      x Int
      y optional nullable String
      z optional MyStruct
    }
Schema | Valid Matching Representations | Cardinality |
---|---|---|
`type Foo struct { bar Bool }` | `{"bar": true}` `{"bar": false}` | 2 |
`type Foo struct { bar nullable Bool }` | `{"bar": true}` `{"bar": false}` `{"bar": null}` | 3 = 2+1 |
`type Foo struct { bar optional Bool }` | `{"bar": true}` `{"bar": false}` `{}` | 3 = 2+1 |
`type Foo struct { bar optional nullable Bool }` | `{"bar": true}` `{"bar": false}` `{"bar": null}` `{}` | 4 (!) = 2+1+1 |
`type Foo struct { bar Bool (default "false") }` | `{"bar": true}` `{}` | 2 |
Cardinality-counting is an important design foundation.
If the cardinalities of two parts of a model aren't the same, then one of them is less expressive than the other.
Can use this to reason about compatibility and completeness of models!
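Cardinality counting can be checked mechanically. The Go sketch below enumerates the valid representations of `type Foo struct { bar Bool }` under the `optional` and `nullable` modifiers; the `field` type and `representations` function are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// field describes `bar Bool` with the two modifiers discussed above.
type field struct {
	Optional bool
	Nullable bool
}

// representations enumerates every valid serialized form of
// `type Foo struct { bar Bool }` under the given modifiers.
func representations(f field) []string {
	reps := []string{`{"bar": true}`, `{"bar": false}`}
	if f.Nullable {
		reps = append(reps, `{"bar": null}`) // present, but explicitly null
	}
	if f.Optional {
		reps = append(reps, `{}`) // missing entirely
	}
	return reps
}

func main() {
	for _, f := range []field{{false, false}, {false, true}, {true, false}, {true, true}} {
		reps := representations(f)
		fmt.Printf("optional=%v nullable=%v cardinality=%d %v\n",
			f.Optional, f.Nullable, len(reps), reps)
	}
}
```

The two modifiers add independent states, which is why combining them yields four representations rather than three.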
Defaults are a neat feature for reducing serialized verbosity... without changing cardinality.
(This means serial data that spells out the default value explicitly is rejected! Otherwise, the transform would be lossy.)
    ## 'defaults' can be used to elide
    ## common values when serializing;
    ## they *don't* change cardinality.
    type MyStruct struct {
      y Bool (default false)
      z String (default "word")
    }
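A small Go sketch of the default-elision rule, assuming the struct above with `y` defaulting to false and `z` defaulting to "word". `Encode` and `Decode` are hypothetical helpers working on generic maps; a real implementation would also reject unknown fields.

```go
package main

import (
	"errors"
	"fmt"
)

type MyStruct struct {
	Y bool
	Z string
}

const (
	defaultY = false
	defaultZ = "word"
)

// Encode omits any field that holds its default value.
func Encode(s MyStruct) map[string]any {
	out := map[string]any{}
	if s.Y != defaultY {
		out["y"] = s.Y
	}
	if s.Z != defaultZ {
		out["z"] = s.Z
	}
	return out
}

// Decode fills in defaults for missing fields and rejects data that
// spells a default out explicitly, so each value has exactly one encoding.
func Decode(m map[string]any) (MyStruct, error) {
	s := MyStruct{Y: defaultY, Z: defaultZ}
	if v, present := m["y"]; present {
		y, ok := v.(bool)
		if !ok {
			return s, errors.New("y must be a bool")
		}
		if y == defaultY {
			return s, errors.New("y: explicit default value is rejected")
		}
		s.Y = y
	}
	if v, present := m["z"]; present {
		z, ok := v.(string)
		if !ok {
			return s, errors.New("z must be a string")
		}
		if z == defaultZ {
			return s, errors.New("z: explicit default value is rejected")
		}
		s.Z = z
	}
	return s, nil
}

func main() {
	fmt.Println(Encode(MyStruct{Y: false, Z: "word"})) // both fields elided
	fmt.Println(Decode(map[string]any{"z": "word"}))   // rejected
}
```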
    ## Structs are represented as maps
    ## by default! So you don't need to
    ## say it.
    type MyStruct struct {
      x Int
      y Bool
      z String
    } representation map
Everything we've seen so far has had an implicit "representation" -- instructions for how it maps onto the Data Model kinds.
    ## This type will serialize as a list!
    type MyStruct struct {
      x Int
      y Bool
      z String
    } representation tuple
We can customize these.
    ## This will serialize as a STRING!
    type MyStruct struct {
      x String
      y String
    } representation stringjoin {
      delim ":"
    }
... And it can change the kind of representation entirely.
    ## This will serialize as a STRING!
    type MyStruct struct {
      ...
    } representation stringjoin {
      ...
    }

    ## So this map can use it as a key...!
    type WildMap map {MyStruct:Whatever}
This is an important feature:
with it, we can use structs as map keys, for example.
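A sketch of the stringjoin idea in Go, assuming a two-field struct and the ":" delimiter from the earlier example. `Join` and `Split` are hypothetical helpers; the delimiter check is an added assumption to keep the representation unambiguous.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const delim = ":"

type MyStruct struct{ X, Y string }

// Join produces the string representation, refusing values that would be ambiguous.
func (s MyStruct) Join() (string, error) {
	if strings.Contains(s.X, delim) || strings.Contains(s.Y, delim) {
		return "", errors.New("field contains the delimiter")
	}
	return s.X + delim + s.Y, nil
}

// Split parses the string representation back into the struct.
func Split(repr string) (MyStruct, error) {
	parts := strings.Split(repr, delim)
	if len(parts) != 2 {
		return MyStruct{}, fmt.Errorf("expected 2 fields, got %d", len(parts))
	}
	return MyStruct{X: parts[0], Y: parts[1]}, nil
}

func main() {
	key, err := MyStruct{X: "a", Y: "b"}.Join()
	if err != nil {
		panic(err)
	}
	// Serialized maps need string keys, so the joined form is what gets stored.
	wildMap := map[string]string{key: "whatever"}
	for k, v := range wildMap {
		s, _ := Split(k)
		fmt.Println(s.X, s.Y, v)
	}
}
```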
Unions (also often known as "sum types") can contain data from any one of their member types...
but only one at a time.
    type NeatUnion union {
      | MemberTypeOne "one"
      | MemberTypeTwo "two"
      | MemberTypeThree "three"
    } representation keyed
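One plausible reading of the keyed representation is a map with exactly one entry, whose key names the member type and whose value is the payload. The Go sketch below decodes that shape; `decodeKeyed` and the `members` table are hypothetical, and the member types themselves are left abstract since the slide does not define them.

```go
package main

import (
	"errors"
	"fmt"
)

// members maps each discriminant key to the name of the member type it selects.
var members = map[string]string{
	"one":   "MemberTypeOne",
	"two":   "MemberTypeTwo",
	"three": "MemberTypeThree",
}

// decodeKeyed reads a keyed union: the single map key picks the member type,
// and the value under that key is the member's own data.
func decodeKeyed(m map[string]any) (memberType string, payload any, err error) {
	if len(m) != 1 {
		return "", nil, errors.New("keyed union must be a map with exactly one entry")
	}
	for key, value := range m {
		name, ok := members[key]
		if !ok {
			return "", nil, fmt.Errorf("unknown union member %q", key)
		}
		return name, value, nil
	}
	return "", nil, errors.New("unreachable")
}

func main() {
	fmt.Println(decodeKeyed(map[string]any{"two": "payload"}))
	fmt.Println(decodeKeyed(map[string]any{"one": 1, "two": 2}))
}
```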
    type MyEnum enum {
      | One "one"
      | Two "two"
      | Three "3"
    }
## This map will be sharded.
type MyMap map {String:MyString}<HAMT>
## Additional config here.
advanced HAMT {
implementation "experimental/HAMT/v1"
bitwidth 14
hashalgo "murmur"
}
Gives us the ability to do "structural typing" -- it detects matching data, without the use of explicit version numbers!
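A rough sketch of what version detection by structure could look like: try each candidate schema against the data and take the first one that fits. The `Schema` type, `kindOf`, and `Detect` are hypothetical and far simpler than real IPLD Schemas (flat structs only, exact field match).

```go
package main

import "fmt"

// Schema is a toy struct schema: field name -> expected kind.
type Schema struct {
	Name   string
	Fields map[string]string
}

// kindOf reports the Data Model kind of a plain Go value (subset only).
func kindOf(v any) string {
	switch v.(type) {
	case string:
		return "string"
	case int:
		return "int"
	case bool:
		return "bool"
	default:
		return "unknown"
	}
}

// Matches reports whether data has exactly the schema's fields with the right kinds.
func (s Schema) Matches(data map[string]any) bool {
	if len(data) != len(s.Fields) {
		return false
	}
	for name, kind := range s.Fields {
		v, ok := data[name]
		if !ok || kindOf(v) != kind {
			return false
		}
	}
	return true
}

// Detect returns the name of the first candidate whose structure fits the data.
func Detect(data map[string]any, candidates []Schema) (string, bool) {
	for _, c := range candidates {
		if c.Matches(data) {
			return c.Name, true
		}
	}
	return "", false
}

func main() {
	candidates := []Schema{
		{Name: "UserV2", Fields: map[string]string{"name": "string", "age": "int"}},
		{Name: "UserV1", Fields: map[string]string{"name": "string"}},
	}
	fmt.Println(Detect(map[string]any{"name": "eric"}, candidates))
	fmt.Println(Detect(map[string]any{"name": "eric", "age": 30}, candidates))
}
```

No version number appears anywhere in the data; the shape alone tells the reader which schema applies, so independent parties can evolve formats without coordinating on a numbering scheme.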
Many community implementations already exist; more are wanted!
Advanced layouts are a recent feature; many explorations required!
Codegen is a recent feature in go-ipld.
Other languages could benefit from similar categories of tooling!
What can you dream of building with a decentralized protocol-building toolkit?
Especially the `specs` repo!
freenode: #ipld
twitter: @warpfork
github: warpfork
etc: probably warpfork
sometimes responds to 'eric' when shouted
https://ipld.io/
Thank you!
https://github.com/ipld/