Clearly, we've got some aims in common!
We'll talk about each of these in detail...
A formula represents a container to run.
The inputs map is {path:filesystemHash}.
The action is a command to run in the container.
The outputs map lists which filesystems to pack & hash when the action finishes.
{
"formula": {
"inputs": {
"/": "tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5"
},
"action": {
"exec": ["/bin/mkdir", "-p", "/task/out/beep"]
},
"outputs": {
"/task/out": {"packtype": "tar"}
}
}
}
`repeatr` is a command in the Timeless Stack suite of tools which evaluates formulas.
When you run this, it'll fetch the inputs, run the action in a container, then pack and hash the outputs:
$ repeatr run theformula.json
RunRecords are what `repeatr` emits when finished evaluating a Formula.
Check out the 'results': more filesystem hashes.
Check out 'formulaID': a strong link to the formula!
{
"guid": "cyrw3c3f-k9hag7xm-53wcy9b5",
"time": 1544875520,
"formulaID": "9mb9Nixx2M5FoxVJgQtYzn1QvtQdM1TZjZ",
"exitCode": 0,
"results": {
"/task/out": "tar:729LuUdChuu7traKQHNVAoWD9Ajmr"
}
}
Notice some fields don't converge here: the 'time' and the 'guid'. This is on purpose!
Separate runs of a formula should produce separate RunRecords... even when the outputs are the same!
=> that's how we attest reproducibility: independent records that converge on the same results.
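To check this in practice: run the formula twice and compare the records. A minimal sketch in Go (the RunRecord struct is trimmed to the fields shown above; the file names are hypothetical and error handling is elided):

package main

import (
	"encoding/json"
	"fmt"
	"os"
	"reflect"
)

// Trimmed to the RunRecord fields shown above.
type RunRecord struct {
	Guid      string            `json:"guid"`
	Time      int64             `json:"time"`
	FormulaID string            `json:"formulaID"`
	ExitCode  int               `json:"exitCode"`
	Results   map[string]string `json:"results"`
}

func main() {
	var a, b RunRecord
	// Two records from independent runs of the same formula.
	f1, _ := os.ReadFile("runrecord-1.json")
	f2, _ := os.ReadFile("runrecord-2.json")
	json.Unmarshal(f1, &a)
	json.Unmarshal(f2, &b)
	if a.FormulaID == b.FormulaID && reflect.DeepEqual(a.Results, b.Results) {
		fmt.Println("converged: independent runs, same results -- reproducible!")
	} else {
		fmt.Println("diverged: time to investigate.")
	}
}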
RunRecords include the hashes of Formulas,
and
RunRecords and Formulas both include hashes of Wares.
"tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5"
"git:48065b8b217aba443965a8fb065646f74a2b5ecf"
We have a process called `rio` which abstracts all this.
The word before the ":" is the keyword for selecting which filesystem packing & unpacking plugin to use.
rio unpack tar:qweoiruqwpoeiru
rio pack tar ./path/to/filesystem
rio unpack git:f274ab4c3953b2dd2ef
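There's barely any structure to these strings, which is the point. A tiny sketch of pulling a WareID apart (a hypothetical helper, not rio's actual API):

package main

import (
	"fmt"
	"strings"
)

type WareID struct {
	Type string // selects the pack/unpack plugin, e.g. "tar", "git"
	Hash string // content hash of the packed filesystem
}

func ParseWareID(s string) (WareID, error) {
	parts := strings.SplitN(s, ":", 2)
	if len(parts) != 2 {
		return WareID{}, fmt.Errorf("malformed WareID: %q", s)
	}
	return WareID{Type: parts[0], Hash: parts[1]}, nil
}

func main() {
	id, err := ParseWareID("tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5")
	if err != nil {
		panic(err)
	}
	fmt.Printf("plugin=%s hash=%s\n", id.Type, id.Hash)
}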
We just have a couple of semantic needs.
Some of those needs are awfully interesting, though...
What's in a POSIX filesystem, anyway?
Turns out there's a couple of things that you'll need --
like it or not, in practice, your container will not run if you can't represent these:
file types, permission bits, UIDs and GIDs, symlinks, device nodes, and modification times.
Here's hoping for UnixFSv2...?
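// Fragment of the canonical metadata-hashing code (surrounding function
// trimmed): each file's metadata is emitted as a deterministic refmt token
// stream, so identical filesystems always hash identically.
// (`enc` is the token sink; `fieldCount` is computed earlier, not shown.)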
enc.Step(&tok.Token{Type: tok.TMapOpen, Length: fieldCount})
// Name
enc.Step(&tok.Token{Type: tok.TString, Str: "n"})
enc.Step(&tok.Token{Type: tok.TString, Str: m.Name.Last()})
// Type
enc.Step(&tok.Token{Type: tok.TString, Str: "t"})
enc.Step(&tok.Token{Type: tok.TString, Str: string(m.Type)})
// Permission mode bits (this is presumed to already be basic perms (0777)
// and setuid/setgid/sticky (07000) only, per fs.Metadata standard).
enc.Step(&tok.Token{Type: tok.TString, Str: "p"})
enc.Step(&tok.Token{Type: tok.TInt, Int: int64(m.Perms)})
// UID (numeric)
enc.Step(&tok.Token{Type: tok.TString, Str: "u"})
enc.Step(&tok.Token{Type: tok.TInt, Int: int64(m.Uid)})
// GID (numeric)
enc.Step(&tok.Token{Type: tok.TString, Str: "g"})
enc.Step(&tok.Token{Type: tok.TInt, Int: int64(m.Gid)})
// Skipped: size -- because that's fairly redundant
// Linkname, if it's a symlink
if m.Linkname != "" {
enc.Step(&tok.Token{Type: tok.TString, Str: "l"})
enc.Step(&tok.Token{Type: tok.TString, Str: m.Linkname})
}
// devMajor and devMinor numbers, if it's a device
if m.Type == fs.Type_Device || m.Type == fs.Type_CharDevice {
enc.Step(&tok.Token{Type: tok.TString, Str: "dM"})
enc.Step(&tok.Token{Type: tok.TInt, Int: m.Devmajor})
enc.Step(&tok.Token{Type: tok.TString, Str: "dm"})
enc.Step(&tok.Token{Type: tok.TInt, Int: m.Devminor})
}
// Modtime
enc.Step(&tok.Token{Type: tok.TString, Str: "m"})
enc.Step(&tok.Token{Type: tok.TInt, Int: m.Mtime.Unix()})
enc.Step(&tok.Token{Type: tok.TString, Str: "mn"})
enc.Step(&tok.Token{Type: tok.TInt, Int: int64(m.Mtime.Nanosecond())})
You can see a snippet of the hashing code we use above.
If IPLD proposes a UnixFSv2 standard for canonical filesystem content hashing which works over the tar format as well as IPFS, that'd be a m a z i n g.
"Formulas" already showed us how to represent and thus hash a spec of a single computation.
Now suppose we want to string bigger things together... and feed results of one computation into inputs of another!
We're using names here!
In the middle you can see a "proto-formula". It will be templated into a real formula with hashes -- which can then be run.
Don't mind the "imports" and "exports" for now... more on that in a minute.
{
"imports": {
"base": "catalog:polydawn.io/busybash:v1:amd64"
},
"steps": {
"step-name": {
"protoformula": {
"inputs": {
"/": "base"
},
"action": {
"exec": [
"/bin/bash", "-c",
"echo hi | tee /task/out/file"
]
},
"outputs": {
"out": "/task/out"
}
}
}
},
"exports": {
"export-label": "step-name.out"
}
}
You can see how data is wired through.
This is a very simple example; more "steps" can be added to the map, and sizable dependency graphs can be wired together with these names.
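The templating step mentioned above is mechanically simple. A minimal sketch, with hypothetical trimmed types (the real layer-2 tooling handles much more than the inputs map):

package main

import "fmt"

// templateInputs performs the "template a proto-formula into a formula" step
// for the inputs map: every input that names an import is replaced by the
// WareID that import resolved to.
func templateInputs(protoInputs, resolved map[string]string) (map[string]string, error) {
	formulaInputs := make(map[string]string, len(protoInputs))
	for path, name := range protoInputs {
		wareID, ok := resolved[name]
		if !ok {
			return nil, fmt.Errorf("input %q names unknown import %q", path, name)
		}
		formulaInputs[path] = wareID
	}
	return formulaInputs, nil
}

func main() {
	resolved := map[string]string{ // import name -> WareID, from catalog resolution
		"base": "tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5",
	}
	inputs, err := templateInputs(map[string]string{"/": "base"}, resolved)
	if err != nil {
		panic(err)
	}
	fmt.Println(inputs) // map[/:tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5]
}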
{ "imports": { "base": "catalog:polydawn.io/busybash:v1:amd64" }, "steps": { "step-name": { "protoformula": { "inputs": { "/": "base" }, "action": { "exec": [ "/bin/bash", "-c", "echo hi | tee /task/out/file" ] }, "outputs": { "out": "/task/out" } } } }, "exports": { "export-label": "step-name.out" } }
In other parts of the design, hashes are preferred because they're unambiguous.
Names work here because the scope is limited: they only refer to other things in the "neighborhood".
The "neighborhood" is all in one document... covered by one hash.
{ "imports": { "base": "catalog:polydawn.io/busybash:v1:amd64" }, "steps": { "step-name": { "protoformula": { "inputs": { "/": "base" }, "action": { "exec": [ "/bin/bash", "-c", "echo hi | tee /task/out/file" ] }, "outputs": { "out": "/task/out" } } } }, "exports": { "export-label": "step-name.out" } }
So this is how we can do pipelines of computations.
But the "neighborhood" here is still pretty smol.
What if we want to coordinate even bigger things?
And share them?
We have yet more stuff to hash ;)
This document maps human-readable names onto the hashes of filesystems.
When we build stuff (using a module), we can release it by making a document like this.
{
"name": "domain.org/team/project",
"releases": [
{
"name": "v2.0rc1",
"items": {
"docs": "tar:SiUoVi9KiSJoQ0vE29",
"linux-amd64": "tar:Ee0usTSDBLZjgjZ8Nk",
"darwin-amd64": "tar:G9ei3jf9weiq00ijvl",
"src": "tar:KE29VJDJKWlSiUoV9s"
},
"metadata": {
"anything": "goes here",
"semver": "2.0rc1",
"tracks": "nightly,beta,2.x"
},
"hazards": null,
},{
"name": "v1.1",
"items": {
"docs": "tar:iSJSiUoVi9KoQ0vE2",
"linux-amd64": "tar:BLZEe0usTSDjgjZ8N",
"darwin-amd64": "tar:weiG9ei3jf9q00ijv",
"src": "tar:KWlKE29VJDJSiUoV9"
},
"metadata": {
"anything": "you get the idea",
"semver": "1.1",
"tracks": "nightly,beta,stable,1.x"
},
"hazards": null,
}
]
}
You can see the lineage's name, and the names of each release highlighted.
{ "name": "domain.org/team/project", "releases": [ { "name": "v2.0rc1", "items": { "docs": "tar:SiUoVi9KiSJoQ0vE29", "linux-amd64": "tar:Ee0usTSDBLZjgjZ8Nk", "darwin-amd64": "tar:G9ei3jf9weiq00ijvl", "src": "tar:KE29VJDJKWlSiUoV9s" }, "metadata": { "anything": "goes here", "semver": "2.0rc1", "tracks": "nightly,beta,2.x" }, "hazards": null, },{ "name": "v1.1", "items": { "docs": "tar:iSJSiUoVi9KoQ0vE2", "linux-amd64": "tar:BLZEe0usTSDjgjZ8N", "darwin-amd64": "tar:weiG9ei3jf9q00ijv", "src": "tar:KWlKE29VJDJSiUoV9" }, "metadata": { "anything": "you get the idea", "semver": "1.1", "tracks": "nightly,beta,stable,1.x" }, "hazards": null, } ] }
The keys in the "items" map line up with the "exports" map keys from a module.
And each value in this map is a filesystem hash (a "WareID").
{ "name": "domain.org/team/project", "releases": [ { "name": "v2.0rc1", "items": { "docs": "tar:SiUoVi9KiSJoQ0vE29", "linux-amd64": "tar:Ee0usTSDBLZjgjZ8Nk", "darwin-amd64": "tar:G9ei3jf9weiq00ijvl", "src": "tar:KE29VJDJKWlSiUoV9s" }, "metadata": { "anything": "goes here", "semver": "2.0rc1", "tracks": "nightly,beta,2.x" }, "hazards": null, },{ "name": "v1.1", "items": { "docs": "tar:iSJSiUoVi9KoQ0vE2", "linux-amd64": "tar:BLZEe0usTSDjgjZ8N", "darwin-amd64": "tar:weiG9ei3jf9q00ijv", "src": "tar:KWlKE29VJDJSiUoV9" }, "metadata": { "anything": "you get the idea", "semver": "1.1", "tracks": "nightly,beta,stable,1.x" }, "hazards": null, } ] }
When we gather a bunch of Lineages together, each representing a different project and its releases, we call this a Catalog.
It's useful to represent these as files, so we can vendor them into git repos.
But it could be an IPLD tree just as easily.
$ find -name lineage.tl
./catalog/timeless.polydawn.io/stellar/lineage.tl
./catalog/timeless.polydawn.io/heft/lineage.tl
./catalog/timeless.polydawn.io/runc/lineage.tl
./catalog/timeless.polydawn.io/hitch/lineage.tl
./catalog/timeless.polydawn.io/refmt/lineage.tl
./catalog/timeless.polydawn.io/repeatr/lineage.tl
./catalog/timeless.polydawn.io/rio/lineage.tl
./catalog/polydawn.io/monolith/busybash/lineage.tl
./catalog/polydawn.io/monolith/debian-gcc-plus/lineage.tl
./catalog/polydawn.io/monolith/debian/lineage.tl
./catalog/polydawn.io/monolith/minld/lineage.tl
./catalog/hyphae.polydawn.io/rust/lineage.tl
./catalog/hyphae.polydawn.io/debootstrap/lineage.tl
./catalog/hyphae.polydawn.io/go/lineage.tl
./catalog/hyphae.polydawn.io/sources/binutils/lineage.tl
./catalog/hyphae.polydawn.io/sources/gzip/lineage.tl
Okay, cool. Now we can describe "releases" of a ton of stuff.
And we can see how we would hash whole trees of this...!
The root hash of a Catalog pretty much describes the known universe!
... are we done yet?
Nope :D What wizardry can we do with this?
Module Imports used names.
Do you see a pattern?
These are the Lineage names.
Then the Version name.
Then the Item name.
{ "imports": { "base": "catalog:polydawn.io/busybash:v1:amd64" }, "steps": { "step-name": { "protoformula": { "inputs": { "/": "base" }, "action": { "exec": [ "/bin/bash", "-c", "echo hi | tee /task/out/file" ] }, "outputs": { "out": "/task/out" } } } }, "exports": { "export-label": "step-name.out" } }
If we put a Catalog together with a Module...
...united under a single merkle-tree (such as git)...
We've built a bigger "neighborhood"!
The Module can resolve names deterministically, and be human-readable.
$ find -name lineage.tl ; find -name module.tl ; find -name .git
./.timeless/catalog/polydawn.io/busybash/lineage.tl
./.timeless/catalog/hyphae.polydawn.io/go/lineage.tl
./module.tl
./.git
$ cat ./module.tl | jq .imports
{
  "base": "catalog:polydawn.io/busybash:v1:linux-amd64",
  "go": "catalog:hyphae.polydawn.io/go:v1.11:linux-amd64",
  "src": "ingest:git:.:HEAD"
}
$ cat ./catalog/polydawn.io/busybash/lineage.tl | jq '.releases[] | select(.name=="v1").items["linux-amd64"]'
"tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5"
Because Module Imports and Module Exports line up with Lineage names (and version names and item labels)...
This means we can directly create new content
without ever leaving the merkle forest.
{ "imports": { "base": "catalog:polydawn.io/busybash:v1:amd64" }, "steps": { "step-name": { "protoformula": { "inputs": { "/": "base" }, "action": { "exec": [ "/bin/bash", "-c", "echo hi | tee /task/out/file" ] }, "outputs": { "out": "/task/out" } } } }, "exports": { "export-label": "step-name.out" } }
Since Catalogs can just be represented as files...
We can sync them around with push/pull semantics, just like git.
We can even use git to do this (though in the long run, more native tools would be neater).
If we use push/pull semantics to pollinate Catalog data around, then we have "TOFU"-style (trust-on-first-use) semantics.
Can add signing too. But "TOFU" is pretty good!
Is that enough?
What about discovery?
What about public notaries?
Can we make a public, merkle-tree audit log which links to hashes of Lineages?
Absolutely.
Let's do it!
Can we make a rule that only additions of new versions are allowed, and other operations are invalid?
Absolutely. And anyone can check it.
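A sketch of what such a check could look like, reusing the trimmed types from earlier (illustrative only; a real verifier would walk the merkle log):

package main

import (
	"fmt"
	"reflect"
)

// Trimmed mirror of a release entry.
type Release struct {
	Name  string
	Items map[string]string
}

// appendOnly errors if any release known in `prev` was removed or modified
// in `next`; brand-new releases are the only change allowed.
func appendOnly(prev, next []Release) error {
	byName := make(map[string]Release, len(next))
	for _, r := range next {
		byName[r.Name] = r
	}
	for _, r := range prev {
		got, ok := byName[r.Name]
		if !ok {
			return fmt.Errorf("release %q was removed", r.Name)
		}
		if !reflect.DeepEqual(got, r) {
			return fmt.Errorf("release %q was modified", r.Name)
		}
	}
	return nil
}

func main() {
	prev := []Release{
		{Name: "v1.1", Items: map[string]string{"src": "tar:KWlKE29VJDJSiUoV9"}},
	}
	next := []Release{
		{Name: "v2.0rc1", Items: map[string]string{"src": "tar:KE29VJDJKWlSiUoV9s"}},
		{Name: "v1.1", Items: map[string]string{"src": "tar:KWlKE29VJDJSiUoV9"}},
	}
	if err := appendOnly(prev, next); err != nil {
		panic(err)
	}
	fmt.Println("append-only: ok -- anyone can run this check.")
}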
This gets outside of the scope of this talk...
But here's some further resources:
"Verifiable Log-backed Datastructures"
(esp. see paper by Cutter, Laurie, et al)
Writing custom hashers for every single class of object that I need to address is possible, but a PITA.
IPLD libraries can save me massive amounts of time!
One of the major goals of this project is to produce API-driven systems that are language agnostic.
IPLD's pluggable serialization formats and language-agnostic schema type system are hugely awesome.
IPLD Schemas provide us a very satisfying place to attach documentation of our semantics -- and again, in a language-agnostic way.
(Without IPLD Schemas, we would probably be attaching docs to our Golang implementation, but that's no fun for readers who aren't already Gophers...!)
We want formulas, modules, and catalogs to be printable, and human-readable when printed.
Compact representations are critical to this
(ex: moduleImports represented as single-line strings instead of 5 or more lines of JSON struct).
Thanks, Schema Representation Strategies!
Hashes are responsible for *application level semantics* in this project! We *need* them to converge!
IPLD gives us canonical forms, which removes a lot of the bikeshedding from how to hash things.
(Notable issue: we do still have to pick multicodecs, multihashes, and such values concretely -- our application doesn't work if we can't use hashes for cheap equality.)
(Technically, more of a "nice to have", but is it ever!)
Writing IPLD Schemas and then having codegen produce matching Golang native types for them saves astronomical amounts of time.
Terse language-agnostic schemas + well-typed native code (with autocompletion!) = <3
| https://repeatr.io | https://ipld.io |
|---|---|
| https://github.com/polydawn/timeless | https://github.com/ipld/specs |