The Archive and Package (arcp) URI scheme
Stian Soiland-Reyes
eScience lab, The University of Manchester
Workshop for Research Objects (RO2018),
IEEE eScience 2018, Amsterdam
2018-10-29
This work is licensed under a
Creative Commons Attribution 4.0 International License.
Findable
Accessible
Interoperable
Reusable
To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
To be Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards
URI refresher
A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.
Don't forget URI's sibling IRI
Internationalized Resource Identifiers
RFC3987
<scheme://authority/path/to/resource?query#fragment>
www.example.com
(typically DNS hostname)
http https ftp file ...
If this is a URL, the scheme defines a protocol to resolve the resource
Specific for each content type
// means hierarchical
URI structure
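As a rough illustration of the structure above, Python's standard urllib.parse.urlsplit breaks a URI into the same components. The example URI below is made up for this sketch.

```python
from urllib.parse import urlsplit

# Illustrative URI following the scheme://authority/path?query#fragment template
parts = urlsplit("http://www.example.com/path/to/resource?query#fragment")

print(parts.scheme)    # http
print(parts.netloc)    # www.example.com (the authority)
print(parts.path)      # /path/to/resource
print(parts.query)     # query
print(parts.fragment)  # fragment
```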
4.2. Relative Reference
A relative reference takes advantage of the hierarchical syntax to express a URI reference relative to the name space of another hierarchical URI.
</path/to/resource>
<head>
<title>The Archive and Package (arcp) URI scheme</title>
<meta charset="utf-8" />
<link href="css/basic.css" media="all" rel="stylesheet" />
<link href="css/acm.css" media="all" rel="stylesheet alternate" />
<link href="css/do.css" rel="stylesheet" media="all" />
<link href="css/font-awesome.min.css" rel="stylesheet" media="all" />
<script src="scripts/simplerdf.js"></script>
<script src="scripts/medium-editor.min.js"></script>
<script src="scripts/medium-editor-tables.min.js"></script>
<script src="scripts/do.js"></script>
<link href="https://doi.org/10.5281/zenodo.1320264" rel="cite-as" />
<link href="http://s11.no/2018/arcp.html" rel="canonical" type="text/html" />
<link href="https://creativecommons.org/licenses/by/4.0/" rel="license" />
http://s11.no/2018/arcp.html
<head>
<title>The Archive and Package (arcp) URI scheme</title>
<meta charset="utf-8" />
<link href="css/basic.css" media="all" rel="stylesheet" />
<link href="css/font-awesome.min.css" rel="stylesheet" media="all" />
<script src="scripts/do.js"></script>
h1 { font-size:16pt !important; }
h2 { font-size:14pt !important; }
...
(function webpackUniversalModuleDefinition(root, factory) {
if(typeof exports === 'object' && typeof module === 'object')
module.exports = factory(require("fetch"));
else if(typeof define === 'function' && define.amd)
define(["fetch"], factory);
else if(typeof exports === 'object')
exports["DO"] = factory(require("fetch"));
else
/*!
* Font Awesome 4.7.0 by @davegandy - http://fontawesome.io - @fontawesome
* License - http://fontawesome.io/license (Font: SIL OFL 1.1, CSS: MIT License)
*/
@font-face{font-family:'FontAwesome';
Resolving from base URI
Absolute URI
http://s11.no/2018/arcp.html
/*!
* Font Awesome 4.7.0 by @davegandy - http://fontawesome.io - @fontawesome
* License - http://fontawesome.io/license (Font: SIL OFL 1.1, CSS: MIT License)
*/
@font-face {
  font-family: 'FontAwesome';
  font-weight: normal;
  font-style: normal;
  src: url('../fonts/fontawesome-webfont.eot') format('embedded-opentype'),
       url('../fonts/fontawesome-webfont.woff2') format('woff2'),
       url('../fonts/fontawesome-webfont.woff') format('woff'),
       url('../fonts/fontawesome-webfont.ttf') format('truetype'),
       url('../fonts/fontawesome-webfont.svg') format('svg');
}
<svg>
<metadata>
Created by FontForge 20120731 at Mon Oct 24 17:37:40 2016
Copyright Dave Gandy 2016. All rights reserved.
</metadata>
<defs>
<font id="FontAwesome" horiz-adv-x="1536">
<font-face font-family="FontAwesome" font-weight="400" font-stretch="normal" units-per-em="1792" panose-1="0 0 0 0 0 0 0 0 0 0" ascent="1536" descent="-256" bbox="-1.02083 -256.962 2304.6 1537.02" underline-thickness="0" underline-position="0" unicode-range="U+0020-F500"/>
<missing-glyph horiz-adv-x="896" d="M224 112h448v1312h-448v-1312zM112 0v1536h672v-1536h-672z"/><glyph glyph-name=".notdef" horiz-adv-x="896" d="M224 112h448v1312h-448v-1312zM112 0v1536h672v-1536h-672z"/>
<glyph glyph-name=".null" horiz-adv-x="0"/>
<glyph glyph-name="nonmarkingreturn" horiz-adv-x="597"/>
<glyph glyph-name="space" unicode=" " horiz-adv-x="448"/>
<glyph glyph-name="dieresis" unicode="¨" horiz-adv-x="1792"/>
<http://s11.no/2018/arcp.html>
+ <css/font-awesome.css>
= <http://s11.no/2018/css/font-awesome.css>
<http://s11.no/2018/css/font-awesome.css>
+ <../fonts/fontawesome-webfont.svg>
= <http://s11.no/2018/fonts/fontawesome-webfont.svg>
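The two-step resolution above (page to stylesheet, then stylesheet to font) can be reproduced with urllib.parse.urljoin from the Python standard library. This is only a sketch; the variable names are chosen for illustration. Note that the base URI changes at each step: the font reference is resolved against the stylesheet, not the page.

```python
from urllib.parse import urljoin

# The HTML page is the first base URI
page = "http://s11.no/2018/arcp.html"

# The stylesheet reference is resolved against the page
css = urljoin(page, "css/font-awesome.css")
print(css)   # http://s11.no/2018/css/font-awesome.css

# The font reference is resolved against the stylesheet
font = urljoin(css, "../fonts/fontawesome-webfont.svg")
print(font)  # http://s11.no/2018/fonts/fontawesome-webfont.svg
```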
URI resolution as operations
<http://s11.no/2018/arcp.html>
+ <#ro>
= <http://s11.no/2018/arcp.html#ro>
Relative #fragment
<http://s11.no/2018/arcp.html#ro>
+ <#article>
= <http://s11.no/2018/arcp.html#article>
<http://s11.no/2018/arcp.html?t=20181028>
+ <#ro>
= <http://s11.no/2018/arcp.html?t=20181028#ro>
Relative ?query
<http://s11.no/2018/arcp.html>
+ <?t=20181028>
= <http://s11.no/2018/arcp.html?t=20181028>
<http://s11.no/2018/arcp.html>
+ <../fonts/fontawesome-webfont.svg>
= <http://s11.no/fonts/fontawesome-webfont.svg>
Relative /paths
<http://s11.no/2018/arcp.html>
+ </>
= <http://s11.no/>
<http://s11.no/2018/arcp.html>
+ <cwl.html>
= <http://s11.no/2018/cwl.html>
<http://s11.no/2018/arcp.html>
+ </2018/cwl.html>
= <http://s11.no/2018/cwl.html>
Relative to "folder"
Relative to parent
Root
Relative to root
Relative //hosts
<http://s11.no/2018/arcp.html>
+ <//cdn.example.com/fontawesome.css>
= <http://cdn.example.com/fontawesome.css>
<https://s11.no/2018/arcp.html>
+ <//cdn.example.com/fontawesome.css>
= <https://cdn.example.com/fontawesome.css>
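The reference forms from the preceding slides (fragment, query, path and host) can be compared side by side with urllib.parse.urljoin. This is a sketch; the list of references is picked for illustration, and "../" is included to show a bare parent reference.

```python
from urllib.parse import urljoin

base = "http://s11.no/2018/arcp.html"

references = [
    "#ro",                                # fragment only: same document
    "?t=20181028",                        # query only: same path
    "cwl.html",                           # relative to the "folder"
    "../",                                # parent of the "folder"
    "/",                                  # root of the authority
    "/2018/cwl.html",                     # relative to root
    "//cdn.example.com/fontawesome.css",  # other host, scheme inherited
]

resolved = {ref: urljoin(base, ref) for ref in references}
for ref, target in resolved.items():
    print(f"{ref:36} -> {target}")

# The same //host reference follows the scheme of its base URI
secure = urljoin("https://s11.no/2018/arcp.html",
                 "//cdn.example.com/fontawesome.css")
print(secure)
```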
Uncertain destination?
Use relative URI references
<http://з11.ею/2018/arcp.html>
+ <#article>
= <http://з11.ею/2018/arcp.html#article>
IRI!
Case Study
rohub.org
<file:///home/stain/.cache/.fr-ElVun8/.ro/manifest.rdf>
+ <../ce247caa-7fae-4126-af3a-d9008fcc315f.rdf>
= <file:///home/stain/.cache/.fr-ElVun8/ce247caa-7fae-4126-af3a-d9008fcc315f.rdf>
Research Object manifest
(Sorry about the RDF/XML!)
Consuming archives with "relativized" Linked Data
Parsing on command line
stain@biggie:~/.cache/.fr-ElVun8$ riot *rdf .ro/*rdf | \
grep ROToolkit-ES-CR.pdf | \
grep 'rdf-syntax-ns#type' | \
riot --formatted=turtle
<file:///home/stain/.cache/.fr-ElVun8/ROToolkit-ES-CR.pdf>
a <http://purl.org/wf4ever/ro#Resource> ,
<http://www.openarchives.org/ore/terms/AggregatedResource> ,
<http://purl.org/wf4ever/roterms#Paper> ,
<http://purl.org/dc/terms/BibliographicResource> .
<http://localhost:3030/ro/upload>
+ <ROToolkit-ES-CR.pdf>
= <http://localhost:3030/ro/ROToolkit-ES-CR.pdf>
<http://localhost:3030/ro/upload>
+ <../ROToolkit-ES-CR.pdf>
= <http://localhost:3030/ROToolkit-ES-CR.pdf>
Setting the base URI
stain@biggie:~/.cache/.fr-ElVun8$ for r in *rdf .ro/*rdf ; do
base="http://example.com/ro/1337/$r"
echo "## $base"
riot "--base=$base" "$r"
done
## http://example.com/ro/1337/03b9c45b-cc44-4354-a593-8b5f089604d8.rdf
<http://example.com/ro/1337/03b9c45b-cc44-4354-a593-8b5f089604d8.rdf> <http://swrc.ontoware.org/ontology#keywords> " Earth Science" .
## http://example.com/ro/1337/04553fe2-658a-48a4-9ecb-daea4d7976fb.rdf
<http://example.com/ro/1337/04553fe2-658a-48a4-9ecb-daea4d7976fb.rdf> <http://w3id.org/ro/earth-science#distributionCategory> "Preprint" .
## http://example.com/ro/1337/26fb4b59-761c-4675-8c08-464fc7e0db1e.rdf
<http://example.com/ro/1337/ROHub-web-traffic-0318-0718.png> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/wf4ever/wf4ever#Image> .
## http://example.com/ro/1337/2e7d35fa-0eea-405f-9f3a-28c11ee8c5e3.rdf
<http://example.com/ro/1337/ROToolkit-ES-CR.pdf> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/terms/BibliographicResource> .
## http://example.com/ro/1337/39b019cf-8041-47f3-a320-2487447f3ea7.rdf
<http://example.com/ro/1337/ROHub-portal.png> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/wf4ever/wf4ever#Image> .
## http://example.com/ro/1337/4f80a4c6-775b-4c5d-8ab5-524b459b4f87.rdf
<http://example.com/ro/1337/ROToolkit-ES-CR.zip> <http://purl.org/dc/terms/description> "HTML version of the paper" .
## http://example.com/ro/1337/575ef8e6-afe6-4200-ab87-7aed5d7815ec.rdf
...
## http://example.com/ro/1337/.ro/manifest.rdf
<http://example.com/ro/1337/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/wf4ever/ro#ResearchObject> .
<http://example.com/ro/1337/> <http://www.openarchives.org/ore/terms/aggregates>
<http://example.com/ro/1337/ROToolkit-ES-CR.zip> .
<http://example.com/ro/1337/> <http://www.openarchives.org/ore/terms/aggregates>
<http://example.com/ro/1337/ROToolkit-ES-CR.pdf> .
<http://example.com/ro/1337/> <http://www.openarchives.org/ore/terms/aggregates>
<http://example.com/ro/1337/ROHub-web-traffic-0318-0718.png> .
Not that good
Base URLs
<file:///home/stain/.cache/.fr-ElVun8/data/survey.csv>
<file://s11.no/home/stain/ro/1337/data/survey.csv>
<file://1af95613-1163-46e7-ac9a-69a92af70920/data/survey.csv>
<http://example.com/ro/1337/>
<http://rohub.org/download/ro15.zip#data/survey.csv>
<http://1af95613-1163-46e7-ac9a-69a92af70920/data/survey.csv>
<jar:http://example.com/ro.zip!/data/survey.csv>
<arcp://prefix,namespace/path/to/resource>
Structure of arcp URIs
uuid ni name
Path from archive "root"
URI escape as needed
b82b3e69-b6ff-4940-b461-cfb089a13334
Generated from random generator
<arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/>
+ <css/base.css>
= <arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/css/base.css>
>>> uuid.uuid4()
UUID('32a423d6-52ab-47e3-a9cd-54f418a48571')
Always unique (UUID v4)
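A minimal sketch of the random-UUID variant, using only the standard library. The helper names new_arcp_base and arcp_join are hypothetical and not part of any library. Python's urljoin only resolves references for schemes it knows about, so this sketch borrows the http rules by swapping the scheme temporarily; that workaround is an assumption of the example.

```python
import uuid
from urllib.parse import urljoin, urlsplit

def new_arcp_base():
    """Hypothetical helper: fresh arcp base URI from a random (v4) UUID."""
    return "arcp://uuid,%s/" % uuid.uuid4()

def arcp_join(base, ref):
    """Hypothetical helper: resolve ref against an arcp base URI."""
    if not base.startswith("arcp://"):
        raise ValueError("not an arcp URI: %r" % base)
    if urlsplit(ref).scheme:
        return ref  # already an absolute URI, nothing to resolve
    # Swap in http so urljoin applies hierarchical resolution, then swap back
    joined = urljoin("http" + base[len("arcp"):], ref)
    return "arcp" + joined[len("http"):]

base = "arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/"
print(arcp_join(base, "css/base.css"))
# arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/css/base.css
```

Each call to new_arcp_base() yields a different base, so two unpackings of the same archive do not share identifiers under this variant.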
Hashed from archive download URL
arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/pics/flower.jpeg
>>> uuid.uuid5(uuid.NAMESPACE_URL, "http://example.com/data.zip")
UUID('b7749d0b-0e47-5fc4-999d-f154abe68065')
Location-based (UUID v5)
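The location-based variant is deterministic: anyone who knows the download URL can derive the same base. A small sketch of that property follows; arcp_from_location is a hypothetical name used here for illustration, and the path argument is assumed to start with "/".

```python
import uuid

def arcp_from_location(archive_url, path="/"):
    """Hypothetical helper: arcp URI from a name-based (v5) UUID of the URL."""
    archive_id = uuid.uuid5(uuid.NAMESPACE_URL, archive_url)
    return "arcp://uuid,%s%s" % (archive_id, path)

# Two independent consumers of the same archive URL agree on the identifier
first = arcp_from_location("http://example.com/data.zip", "/pics/flower.jpeg")
second = arcp_from_location("http://example.com/data.zip", "/pics/flower.jpeg")
print(first == second)  # True

# A different download location gives a different base
other = arcp_from_location("http://example.org/data.zip", "/pics/flower.jpeg")
print(first == other)   # False
```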
Location-independent archive identifier (BDBag)
>>> uuid.uuid5(uuid.NAMESPACE_URL, "http://identifiers.org/ark/ark:/57799/b91w9r")
UUID('4f11f216-e2dc-57cd-a714-300409a430ce')
stain@biggie:~$ sha256sum archive.zip
7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069
RFC6920 (Naming Things with Hashes) URI
>>> from base64 import urlsafe_b64encode
>>> urlsafe_b64encode(bytes.fromhex(
...     "7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069"))
b'f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk='
Hash checksum of archive
ni:///sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/
arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/src/luhn.c
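The steps above (SHA-256 checksum, URL-safe base64 without the "=" padding, then the ni prefix) can be combined into one function. This is a sketch using only the standard library; arcp_from_bytes is a hypothetical name, and the input b"Hello World!" stands in for the archive content, since its SHA-256 is the checksum shown above.

```python
import hashlib
from base64 import urlsafe_b64encode

def arcp_from_bytes(data, path="/"):
    """Hypothetical helper: hash-based arcp URI for the given archive bytes."""
    digest = hashlib.sha256(data).digest()
    # RFC 6920 style: URL-safe base64 with the trailing "=" padding removed
    b64 = urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return "arcp://ni,sha-256;%s%s" % (b64, path)

print(arcp_from_bytes(b"Hello World!", "/src/luhn.c"))
# arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/src/luhn.c
```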
Resolving NI URIs
ni:///sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/
http://repo.example.com/.well-known/ni/sha-256/f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/
Retrievable
ni://repo.example.com/sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/
Verifiable
arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/src/luhn.c
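A sketch of both properties, retrievable and verifiable, with the standard library. The function names are hypothetical. The mapping follows the .well-known/ni pattern shown above; this sketch emits the URL without a trailing slash and handles sha-256 only.

```python
import hashlib
from base64 import urlsafe_b64encode
from urllib.parse import urlsplit

def ni_to_well_known(ni_uri, authority=None):
    """Hypothetical helper: map an ni URI to a .well-known HTTP URL."""
    parts = urlsplit(ni_uri)
    if parts.scheme != "ni":
        raise ValueError("not an ni URI: %r" % ni_uri)
    host = parts.netloc or authority
    if not host:
        raise ValueError("no authority to retrieve from")
    alg, _, value = parts.path.strip("/").partition(";")
    return "http://%s/.well-known/ni/%s/%s" % (host, alg, value)

def matches_ni(data, ni_uri):
    """Hypothetical helper: check retrieved bytes against the hash in the URI."""
    alg, _, value = urlsplit(ni_uri).path.strip("/").partition(";")
    if alg != "sha-256":
        raise ValueError("only sha-256 is handled in this sketch")
    digest = hashlib.sha256(data).digest()
    return urlsafe_b64encode(digest).decode("ascii").rstrip("=") == value

ni = "ni:///sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk"
print(ni_to_well_known(ni, authority="repo.example.com"))
print(matches_ni(b"Hello World!", ni))  # True
```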
arcp URI Python library
>>> from arcp import *
>>> arcp_random()
'arcp://uuid,dcd6b1e8-b3a2-43c9-930b-0119cf0dc538/'
>>> arcp_random("/foaf.ttl", fragment="me")
'arcp://uuid,dcd6b1e8-b3a2-43c9-930b-0119cf0dc538/foaf.ttl#me'
>>> arcp_hash(b"Hello World!", "/folder/")
'arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/'
>>> arcp_location("http://example.com/data.zip", "/file.txt")
'arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt'
pip install arcp
Parsing arcp URIs
>>> is_arcp_uri("arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt")
True
>>> u = parse_arcp("arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt")
>>> u
ARCPSplitResult(scheme='arcp',prefix='uuid',
name='b7749d0b-0e47-5fc4-999d-f154abe68065',
uuid='b7749d0b-0e47-5fc4-999d-f154abe68065',
path='/file.txt',query='',fragment='')
>>> u.path
'/file.txt'
>>> u.prefix
'uuid'
>>> u.uuid
UUID('b7749d0b-0e47-5fc4-999d-f154abe68065')
>>> u.uuid.version
5
>>> parse_arcp("arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/").hash
('sha-256', '7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069')
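For readers without the library installed, the same decomposition can be approximated with the standard library alone. This is a sketch of the idea, not the arcp library's implementation; split_arcp and ni_hex are hypothetical names.

```python
from base64 import urlsafe_b64decode
from urllib.parse import urlsplit

def split_arcp(uri):
    """Hypothetical helper: return (prefix, name, path) of an arcp URI."""
    parts = urlsplit(uri)
    if parts.scheme != "arcp":
        raise ValueError("not an arcp URI: %r" % uri)
    # The authority is "prefix,name"; split at the first comma only
    prefix, _, name = parts.netloc.partition(",")
    return prefix, name, parts.path

def ni_hex(name):
    """Hypothetical helper: decode 'alg;base64url' into (alg, hex digest)."""
    alg, _, b64 = name.partition(";")
    padded = b64 + "=" * (-len(b64) % 4)  # restore the stripped padding
    return alg, urlsafe_b64decode(padded).hex()

prefix, name, path = split_arcp(
    "arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/")
print(prefix, path)  # ni /folder/
print(ni_hex(name))
```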
What's next?
Collect feedback from community (you!)
Shrink scope?
Complete arcp support in taverna-robundle
Tool for processing linked data in archives?
Mature to RFC status
2018-09-29 arcp URI scheme
By Farah Z Khan
Presented at Workshop for Research Objects (RO2018), IEEE eScience 2018, Amsterdam. Paper: http://s11.no/2018/arcp.html