The Timeless Stack
<3 IPLD
What are these projects?
Timeless Stack
- Reproducible Builds / Deterministic Compute project.
- Linux containers meet content-addressable storage.
- Hashes are front and center, not hidden or an afterthought.
- We want content-addressable specs for what to compute: "computation-addressable".
IPLD
- "InterPlanetary Linked Data"
- A set of standards for data representation and modelling, aimed at building distributed systems.
- Hashes are front and center, not hidden or an afterthought.
Some quick context...
Clearly, we've got some aims in common!
Timeless Stack Principles: Reproducible Builds
- We want to attest that when we run the same computation twice (or more), we get the same result.
- Hashes are great for this: equality checks are cheap!
- We also want decentralized "trust" (or verifiability without trust, really): more hashes to the rescue!
- Therefore, we hash our inputs, our outputs, and many of our data structures.
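The payoff can be sketched in a few lines of Go (illustrative only; the real stack uses canonical serializations and multihash-style IDs, not a raw sha256 of bytes):

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
)

// wareID hashes a blob and renders the digest in a compact form.
// (Illustrative: "demo:" is a made-up prefix, standing in for the
// real packtype prefixes like "tar:".)
func wareID(blob []byte) string {
	sum := sha256.Sum256(blob)
	return "demo:" + base32.StdEncoding.EncodeToString(sum[:])[:26]
}

func main() {
	a := wareID([]byte("filesystem snapshot, run 1"))
	b := wareID([]byte("filesystem snapshot, run 1"))
	c := wareID([]byte("filesystem snapshot, run 2"))
	fmt.Println(a == b) // true: same content, same ID -- equality is one string compare
	fmt.Println(a == c) // false: different content, different ID
}
```

That one cheap string comparison is what lets two independent parties check "did we get the same result?" without exchanging the results themselves.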
Content-addressed documents are used in many places
in the Timeless Stack:
- We need to address Immutable filesystems!
- We need to address each computation (e.g. container spec)!
- To build the ecosystem as a whole, we need to associate and relate many computations and many filesystems...
We need to build indexes which do this
(and then address snapshots of those too!).
We'll talk about each of these in detail...
Addressing Computation
("formulas")
This is a formula.
A formula represents a container to run.
The inputs map is {path:filesystemHash}.
The action is a command to run in the container.
The outputs map is filesystems to pack & hash.
{
"formula": {
"inputs": {
"/": "tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5"
},
"action": {
"exec": ["/bin/mkdir", "-p", "/task/out/beep"]
},
"outputs": {
"/task/out": {"packtype": "tar"}
}
}
}
This is how you use it.
`repeatr` is a command in the Timeless Stack suite of tools which evaluates formulas.
When you run this, it'll:
- get all the filesystems,
- launch a container,
- wait for the command,
- pack all the outputs,
- emit JSON describing it.
$ repeatr run theformula.json
This is a RunRecord.
RunRecords are what `repeatr` emits when finished evaluating a Formula.
Check out the 'results': more filesystem hashes.
Check out 'formulaID': a strong link to the formula!
{
"guid": "cyrw3c3f-k9hag7xm-53wcy9b5",
"time": 1544875520,
"formulaID": "9mb9Nixx2M5FoxVJgQtYzn1QvtQdM1TZjZ",
"exitCode": 0,
"results": {
"/task/out": "tar:729LuUdChuu7traKQHNVAoWD9Ajmr"
}
}
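The 'formulaID' link can be illustrated with a sketch: hash a canonical serialization of the formula document. Go's encoding/json happens to write map keys in sorted order, which is enough for a demo (repeatr's real IDs come from its own canonical serialization and hash choices, not this scheme):

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// formulaID derives a content address for a formula document.
// encoding/json sorts map keys, giving a deterministic byte stream
// to hash regardless of how the map was constructed.
func formulaID(formula map[string]interface{}) string {
	canonical, err := json.Marshal(formula)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(canonical)
	return fmt.Sprintf("%x", sum[:12])
}

func main() {
	f1 := map[string]interface{}{
		"inputs":  map[string]string{"/": "tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5"},
		"action":  map[string]interface{}{"exec": []string{"/bin/mkdir", "-p", "/task/out/beep"}},
		"outputs": map[string]interface{}{"/task/out": map[string]string{"packtype": "tar"}},
	}
	// The same formula, built with fields in a different order, converges:
	f2 := map[string]interface{}{
		"outputs": map[string]interface{}{"/task/out": map[string]string{"packtype": "tar"}},
		"action":  map[string]interface{}{"exec": []string{"/bin/mkdir", "-p", "/task/out/beep"}},
		"inputs":  map[string]string{"/": "tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5"},
	}
	fmt.Println(formulaID(f1) == formulaID(f2)) // true
}
```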
This is a RunRecord.
Notice some fields don't converge here: 'time' and 'guid'. This is on purpose!
Separate runs of a formula should produce separate results... even when the outputs are the same!
=> to attest reproducibility.
{
"guid": "cyrw3c3f-k9hag7xm-53wcy9b5",
"time": 1544875520,
"formulaID": "9mb9Nixx2M5FoxVJgQtYzn1QvtQdM1TZjZ",
"exitCode": 0,
"results": {
"/task/out": "tar:729LuUdChuu7traKQHNVAoWD9Ajmr"
}
}
Review: How many things have we hashed already?
Lots.
- Wares (the filesystem IDs -- and we'll talk about those more in a sec)
- Formulas (to address our computations)
- RunRecords (to address and attest our results)
Review: How many things have we hashed already?
And they compose:
RunRecords include the hashes of Formulas,
and
RunRecords and Formulas both include hashes of Wares.
Addressing Filesystems
("wares")
This is a WareID.
"tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5"
This is also a WareID.
"git:48065b8b217aba443965a8fb065646f74a2b5ecf"
The Timeless Stack doesn't particularly care how files are packed.
We have a process called `rio` which abstracts all this.
The word before the ":" is the keyword for selecting which filesystem packing & unpacking plugin to use.
rio unpack tar:qweoiruqwpoeiru
rio pack tar ./path/to/filesystem
rio unpack git:f274ab4c3953b2dd2ef
The Timeless Stack doesn't particularly care how files are packed.
We just have a couple of semantic needs:
- The filesystem has to be immutable and addressable.
- The filesystem has to have a clear mapping onto regular POSIX filesystems: we need to "unpack" it --
so that we can set up our containers.
- The filesystem has to have a clear mapping from regular POSIX filesystems: we need to "pack" it --
so we can export results from our containers.
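That contract can be sketched as a small Go interface (hypothetical names -- rio's real plugin API differs), with a toy in-memory implementation standing in for tar or git:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Packer is the semantic contract rio needs from a packing plugin.
type Packer interface {
	Pack(srcPath string) (wareID string, err error) // filesystem -> immutable, addressable ware
	Unpack(wareID, destPath string) error           // ware -> POSIX filesystem, for container setup
}

// memPacker is a toy implementation: a blob in a map stands in for a
// packed filesystem tree.
type memPacker struct {
	store map[string][]byte // wareID -> packed bytes
	fs    map[string][]byte // path -> fake filesystem content
}

var _ Packer = (*memPacker)(nil)

func (p *memPacker) Pack(srcPath string) (string, error) {
	blob, ok := p.fs[srcPath]
	if !ok {
		return "", fmt.Errorf("no such path: %s", srcPath)
	}
	sum := sha256.Sum256(blob)
	id := fmt.Sprintf("demo:%x", sum[:8])
	p.store[id] = blob
	return id, nil
}

func (p *memPacker) Unpack(wareID, destPath string) error {
	blob, ok := p.store[wareID]
	if !ok {
		return fmt.Errorf("unknown ware: %s", wareID)
	}
	p.fs[destPath] = blob
	return nil
}

func main() {
	p := &memPacker{store: map[string][]byte{}, fs: map[string][]byte{"/out": []byte("result files")}}
	id, _ := p.Pack("/out")
	p.Unpack(id, "/container/input")
	fmt.Println(string(p.fs["/container/input"])) // prints "result files"
}
```

A real plugin has to do the hard part the toy skips: faithfully round-tripping perms, uids/gids, device nodes, and the rest of the POSIX metadata discussed next.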
The Timeless Stack doesn't particularly care how files are packed.
Some of those needs are awfully interesting, though...
What's in a POSIX filesystem, anyway?
The Timeless Stack doesn't particularly care how files are packed... mostly.
What's in a POSIX filesystem, anyway?
Turns out there's a couple things that you'll need --
like it or not, in practice, your container will not run if you can't represent these:
- /dev/null !
- The execute bit on files!
- rwxrwxrwx in general (some important programs look at these and error if they don't like the perms!).
- uids and gids :(
- sticky bits on /tmp! ...etc......etc......etc...
Can we use IPFS for this?
- IPFS seems like a natural fit, because it's immutable and content-addressable, yayy!
- But No :(:(:(
- Can't currently store device nodes, permissions, or other critical bits :(
- Also serious issues around convergence -- too many parameters: chunking, trickledag, etc. all change the hash in non-semantic ways.
Here's hoping for UnixFSv2...?
Currently:
use & hash tar.
enc.Step(&tok.Token{Type: tok.TMapOpen, Length: fieldCount})
// Name
enc.Step(&tok.Token{Type: tok.TString, Str: "n"})
enc.Step(&tok.Token{Type: tok.TString, Str: m.Name.Last()})
// Type
enc.Step(&tok.Token{Type: tok.TString, Str: "t"})
enc.Step(&tok.Token{Type: tok.TString, Str: string(m.Type)})
// Permission mode bits (this is presumed to already be basic perms (0777)
// and setuid/setgid/sticky (07000) only, per fs.Metadata standard).
enc.Step(&tok.Token{Type: tok.TString, Str: "p"})
enc.Step(&tok.Token{Type: tok.TInt, Int: int64(m.Perms)})
// UID (numeric)
enc.Step(&tok.Token{Type: tok.TString, Str: "u"})
enc.Step(&tok.Token{Type: tok.TInt, Int: int64(m.Uid)})
// GID (numeric)
enc.Step(&tok.Token{Type: tok.TString, Str: "g"})
enc.Step(&tok.Token{Type: tok.TInt, Int: int64(m.Gid)})
// Skipped: size -- because that's fairly redundant
// Linkname, if it's a symlink
if m.Linkname != "" {
enc.Step(&tok.Token{Type: tok.TString, Str: "l"})
enc.Step(&tok.Token{Type: tok.TString, Str: m.Linkname})
}
// devMajor and devMinor numbers, if it's a device
if m.Type == fs.Type_Device || m.Type == fs.Type_CharDevice {
enc.Step(&tok.Token{Type: tok.TString, Str: "dM"})
enc.Step(&tok.Token{Type: tok.TInt, Int: m.Devmajor})
enc.Step(&tok.Token{Type: tok.TString, Str: "dm"})
enc.Step(&tok.Token{Type: tok.TInt, Int: m.Devminor})
}
// Modtime
enc.Step(&tok.Token{Type: tok.TString, Str: "m"})
enc.Step(&tok.Token{Type: tok.TInt, Int: m.Mtime.Unix()})
enc.Step(&tok.Token{Type: tok.TString, Str: "mn"})
enc.Step(&tok.Token{Type: tok.TInt, Int: int64(m.Mtime.Nanosecond())})
You can see a snippet of the hashing code we use over to the right.
If IPLD proposes a UnixFSv2 standard for canonical filesystem content hashing which worked over the tar format as well as IPFS, that'd be a m a z i n g.
Addressing (bigger!) Computation
("modules")
"Formulas" already showed us how to represent and thus hash a spec of a single computation.
Now suppose we want to string bigger things together... and feed results of one computation into inputs of another!
This is a Module.
We're using names here!
In the middle you can see a "proto-formula". It will be templated into a real formula with hashes -- which can then be run.
Don't mind the "imports" and "exports" for now... more on that in a minute.
{
"imports": {
"base": "catalog:polydawn.io/busybash:v1:amd64"
},
"steps": {
"step-name": {
"protoformula": {
"inputs": {
"/": "base"
},
"action": {
"exec": [
"/bin/bash", "-c",
"echo hi | tee /task/out/file"
]
},
"outputs": {
"out": "/task/out"
}
}
}
},
"exports": {
"export-label": "step-name.out"
}
}
This is a Module.
We're using names here!
You can see how data is wired through.
This is a very simple example; more "steps" can be added to the map. Sizable dependency graphs can be wired together with these names.
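The name wiring can be sketched as a tiny resolver (hypothetical shapes; not the real module evaluator): an export reference like "step-name.out" splits into a step name and an output label, then gets looked up in that step's results.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveExport maps an export reference like "step-name.out" to the
// WareID recorded for that step's output.
func resolveExport(ref string, stepResults map[string]map[string]string) (string, error) {
	parts := strings.SplitN(ref, ".", 2)
	if len(parts) != 2 {
		return "", fmt.Errorf("bad export ref: %q", ref)
	}
	outputs, ok := stepResults[parts[0]]
	if !ok {
		return "", fmt.Errorf("no such step: %q", parts[0])
	}
	wareID, ok := outputs[parts[1]]
	if !ok {
		return "", fmt.Errorf("step %q has no output %q", parts[0], parts[1])
	}
	return wareID, nil
}

func main() {
	// After a step runs, its outputs are WareIDs; exports just point at them.
	results := map[string]map[string]string{
		"step-name": {"out": "tar:729LuUdChuu7traKQHNVAoWD9Ajmr"},
	}
	id, _ := resolveExport("step-name.out", results)
	fmt.Println(id) // prints "tar:729LuUdChuu7traKQHNVAoWD9Ajmr"
}
```

The same lookup wires one step's output into another step's input, so a dependency graph is just names on both ends.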
Concept:
"neighborhoods"
In other parts of the design, hashes are preferred because they're unambiguous.
Names work here because the scope is limited: only refer to other things in the "neighborhood".
The "neighborhood" is all in one document... covered by one hash.
So this is how we can do pipelines of computations.
But the "neighborhood" here is still pretty smol.
What if we want to coordinate even bigger things?
And share them?
Addressing, Indexing, and Connecting it all
("catalogs")
We have yet more stuff to hash ;)
This is a Lineage, & some Releases.
This document maps human-readable names onto the hashes of filesystems.
When we build stuff (using a module), we can release it by making a document like this.
{
"name": "domain.org/team/project",
"releases": [
{
"name": "v2.0rc1",
"items": {
"docs": "tar:SiUoVi9KiSJoQ0vE29",
"linux-amd64": "tar:Ee0usTSDBLZjgjZ8Nk",
"darwin-amd64": "tar:G9ei3jf9weiq00ijvl",
"src": "tar:KE29VJDJKWlSiUoV9s"
},
"metadata": {
"anything": "goes here",
"semver": "2.0rc1",
"tracks": "nightly,beta,2.x"
},
"hazards": null
},{
"name": "v1.1",
"items": {
"docs": "tar:iSJSiUoVi9KoQ0vE2",
"linux-amd64": "tar:BLZEe0usTSDjgjZ8N",
"darwin-amd64": "tar:weiG9ei3jf9q00ijv",
"src": "tar:KWlKE29VJDJSiUoV9"
},
"metadata": {
"anything": "you get the idea",
"semver": "1.1",
"tracks": "nightly,beta,stable,1.x"
},
"hazards": null
}
]
}
This is a Lineage, & some Releases.
You can see the lineage's name, and the names of each release highlighted.
This is a Lineage, & some Releases.
The keys in the "items" map line up with the "exports" map keys from a module.
And each value in this map is a filesystem hash ("WareID").
This is a Catalog.
When we gather a bunch of Lineages together, each representing a different project and its releases, we call this a Catalog.
It's useful to represent these as files, so we can vendor them into git repos.
But it could be an IPLD tree just as easily.
$ find -name lineage.tl
./catalog/timeless.polydawn.io/stellar/lineage.tl
./catalog/timeless.polydawn.io/heft/lineage.tl
./catalog/timeless.polydawn.io/runc/lineage.tl
./catalog/timeless.polydawn.io/hitch/lineage.tl
./catalog/timeless.polydawn.io/refmt/lineage.tl
./catalog/timeless.polydawn.io/repeatr/lineage.tl
./catalog/timeless.polydawn.io/rio/lineage.tl
./catalog/polydawn.io/monolith/busybash/lineage.tl
./catalog/polydawn.io/monolith/debian-gcc-plus/lineage.tl
./catalog/polydawn.io/monolith/debian/lineage.tl
./catalog/polydawn.io/monolith/minld/lineage.tl
./catalog/hyphae.polydawn.io/rust/lineage.tl
./catalog/hyphae.polydawn.io/debootstrap/lineage.tl
./catalog/hyphae.polydawn.io/go/lineage.tl
./catalog/hyphae.polydawn.io/sources/binutils/lineage.tl
./catalog/hyphae.polydawn.io/sources/gzip/lineage.tl
Okay, cool. Now we can describe "releases" of a ton of stuff.
And we can see how we would hash whole trees of this..!
The root hash of a Catalog pretty much describes the known universe!
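Hashing a catalog down to one root can be sketched in Go (illustrative only: a real catalog would sit in a proper merkle tree, e.g. git or IPLD, so individual lineages stay independently verifiable):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// catalogRoot digests a set of lineage documents (name -> serialized
// lineage) into one root hash, by hashing leaf digests in sorted-name
// order. Deterministic regardless of map iteration order.
func catalogRoot(lineages map[string]string) string {
	names := make([]string, 0, len(lineages))
	for name := range lineages {
		names = append(names, name)
	}
	sort.Strings(names)
	h := sha256.New()
	for _, name := range names {
		leaf := sha256.Sum256([]byte(lineages[name]))
		fmt.Fprintf(h, "%s=%x\n", name, leaf)
	}
	return fmt.Sprintf("%x", h.Sum(nil))[:16]
}

func main() {
	// Lineage bodies elided; any serialized form works for the sketch.
	root := catalogRoot(map[string]string{
		"polydawn.io/monolith/busybash": `{"name":"polydawn.io/monolith/busybash"}`,
		"hyphae.polydawn.io/go":         `{"name":"hyphae.polydawn.io/go"}`,
	})
	fmt.Println(root) // one hash covering every lineage underneath it
}
```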
... are we done yet?
Nope :D What wizardry can we do with this?
Remember Modules?
Module Imports used names.
Do you see a pattern?
These are the Lineage names.
Then the Version name.
Then the Item name.
{
  "imports": {
    "base": "catalog:polydawn.io/busybash:v1:amd64"
  },
  "steps": {
    "step-name": {
      "protoformula": {
        "inputs": {
          "/": "base"
        },
        "action": {
          "exec": [
            "/bin/bash", "-c",
            "echo hi | tee /task/out/file"
          ]
        },
        "outputs": {
          "out": "/task/out"
        }
      }
    }
  },
  "exports": {
    "export-label": "step-name.out"
  }
}
Neighborhoods
pt II
If we put a Catalog together with a Module...
...united under a single merkle-tree (such as git)...
We've built a bigger "neighborhood"!
The Module can resolve names deterministically, and be human-readable.
$ find -name lineage.tl ; find -name module.tl ; find -name .git
./.timeless/catalog/polydawn.io/busybash/lineage.tl
./.timeless/catalog/hyphae.polydawn.io/go/lineage.tl
./module.tl
./.git
$ cat ./module.tl | jq .imports
{
  "base": "catalog:polydawn.io/busybash:v1:linux-amd64",
  "go": "catalog:hyphae.polydawn.io/go:v1.11:linux-amd64",
  "src": "ingest:git:.:HEAD"
}
$ cat ./catalog/polydawn.io/busybash/lineage.tl | jq '.releases[] | select(.name=="v1").items["linux-amd64"]'
"tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5"
Continuous Creation
Because Module Imports and Module Exports line up with Lineage names (and version names and item labels)....
This means we can directly create new content
without ever leaving the merkle forest.
{
  "imports": {
    "base": "catalog:polydawn.io/busybash:v1:amd64"
  },
  "steps": {
    "step-name": {
      "protoformula": {
        "inputs": {
          "/": "base"
        },
        "action": {
          "exec": [
            "/bin/bash", "-c",
            "echo hi | tee /task/out/file"
          ]
        },
        "outputs": {
          "out": "/task/out"
        }
      }
    }
  },
  "exports": {
    "export-label": "step-name.out"
  }
}
Decentralized Publishing
Yes we can!
Pollinating Catalogs
Since Catalogs can just be represented as files...
We can sync them around with push/pull semantics, just like git.
We can even use git to do this (though in the long run, more native tools would be neater).
Pollinating Catalogs
If we use push/pull semantics to pollinate around Catalog data, then we have "TOFU"-style semantics.
Can add signing too. But "TOFU" is pretty good!
Is that enough?
What about discovery?
What about public notaries?
Rules for decentralized changes
Can we make a public, merkle-tree audit log which links to hashes of Lineages?
Absolutely.
Let's do it!
Rules for decentralized changes
Can we make a rule that only additions of new versions are allowed, and other operations are invalid?
Absolutely. And anyone can check it.
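A sketch of that validation rule in Go (simplified: a real checker would compare full release bodies by hash, not just their names):

```go
package main

import "fmt"

// appendOnly checks that a proposed new lineage only adds releases:
// every release in prev must still appear in next.
func appendOnly(prev, next []string) bool {
	seen := map[string]bool{}
	for _, r := range next {
		seen[r] = true
	}
	for _, r := range prev {
		if !seen[r] {
			return false // a release was removed or renamed: invalid
		}
	}
	return true
}

func main() {
	prev := []string{"v1.1", "v2.0rc1"}
	fmt.Println(appendOnly(prev, []string{"v1.1", "v2.0rc1", "v2.0"})) // true: pure addition
	fmt.Println(appendOnly(prev, []string{"v2.0rc1", "v2.0"}))         // false: v1.1 vanished
}
```

Because anyone holding the old catalog hash can rerun this check against a proposed update, no central authority is needed to enforce it.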
Rules for decentralized changes
This gets outside of the scope of this talk...
But here's some further resources:
"Verifiable Log-backed Datastructures"
(esp. see paper by Cutter, Laurie, et al)
The Most <3'd IPLD Features
Standardization &
Library support!
Writing custom hashers for every single class of object that I need to address is possible, but a PITA.
IPLD libraries can save me massive amounts of time!
Language Agnosticism
One of the major goals of this project is to produce API-driven systems that are language agnostic.
IPLD's pluggable serialization formats and language-agnostic schema type system are hugely awesome.
Schemas as Documentation
IPLD Schemas provide us a very satisfying place to attach documentation of our semantics -- and again, in a language-agnostic way.
(Without IPLD Schemas, we would probably be attaching docs to our Golang implementation, but that's no fun for readers who aren't already Gophers...!)
Compact Representations
We want formulas, modules, and catalogs to be printable, and human-readable when printed.
Compact representations are critical to this
(ex: moduleImports represented as single-line strings instead of 5 or more lines of JSON struct).
Thanks, Schema Representation Strategies!
Convergence
Hashes are responsible for *application level semantics* in this project! We *need* them to converge!
IPLD gives us canonical forms, which removes a lot of the bikeshedding from how to hash things.
(Notable issue: we do still have to pick multicodecs, multihashes, and such values concretely -- our application doesn't work if we can't use hashes for cheap equality.)
Codegen
(Technically, more of a "nice to have", but is it ever!)
Writing IPLD Schemas and then having codegen produce matching Golang native types for them saves astronomical amounts of time.
Terse, language-agnostic schemas + well-typed native code (with autocompletion!) = <3
tada
https://repeatr.io | https://ipld.io
---|---
https://github.com/polydawn/timeless | https://github.com/ipld/specs
timeless <3 ipld
By warpfork