The Archive and Package (arcp) URI scheme

Stian Soiland-Reyes

eScience lab, The University of Manchester

Workshop for Research Objects (RO2018),
IEEE eScience 2018, Amsterdam
2018-10-29

Findable

Accessible

Interoperable

Reusable

To be Findable:

F1. (meta)data are assigned a globally unique and persistent identifier

F2. data are described with rich metadata (defined by R1 below)

F3. metadata clearly and explicitly include the identifier of the data it describes

F4. (meta)data are registered or indexed in a searchable resource

To be Accessible:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol

A1.1 the protocol is open, free, and universally implementable

A1.2 the protocol allows for an authentication and authorization procedure, where necessary

A2. metadata are accessible, even when the data are no longer available

To be Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

I2. (meta)data use vocabularies that follow FAIR principles

I3. (meta)data include qualified references to other (meta)data

To be Reusable:

R1. (meta)data are richly described with a plurality of accurate and relevant attributes

R1.1. (meta)data are released with a clear and accessible data usage license

R1.2. (meta)data are associated with detailed provenance

R1.3. (meta)data meet domain-relevant community standards

URI refresher

A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.

Don't forget URI's sibling IRI
Internationalized Resource Identifiers
RFC3987

<scheme://authority/path/to/resource?query#fragment>

www.example.com

(typically DNS hostname)

http
https
ftp
file
...

If this is a URL, the scheme defines a protocol to resolve the resource

Specific for each content type

// means hierarchical

URI structure
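
As a rough illustration of the components labelled above, Python's standard `urllib.parse.urlsplit` breaks a URI into the same five parts. The URI used here is assembled from the slide's placeholders and is not a real resource.

```python
from urllib.parse import urlsplit

# Illustrative URI built from the placeholders on the slide (an assumption)
uri = "https://www.example.com/path/to/resource?query#fragment"

parts = urlsplit(uri)
print(parts.scheme)    # scheme, e.g. http, https, ftp, file
print(parts.netloc)    # authority, typically a DNS hostname
print(parts.path)      # hierarchical path
print(parts.query)     # query component
print(parts.fragment)  # fragment identifier
```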

4.2. Relative Reference

A relative reference takes advantage of the hierarchical syntax to express a URI reference relative to the name space of another hierarchical URI.

</path/to/resource>

<head>
  <title>The Archive and Package (arcp) URI scheme</title>
  <meta charset="utf-8" />
    <link href="css/basic.css" media="all" rel="stylesheet" />
    <link href="css/acm.css" media="all" rel="stylesheet alternate" />
    <link href="css/do.css" rel="stylesheet" media="all" />
    <link href="css/font-awesome.min.css" rel="stylesheet" media="all" />

    <script src="scripts/simplerdf.js"></script>
    <script src="scripts/medium-editor.min.js"></script>
    <script src="scripts/medium-editor-tables.min.js"></script>
    <script src="scripts/do.js"></script>

    
    <link href="https://doi.org/10.5281/zenodo.1320264" rel="cite-as" />    
    <link href="http://s11.no/2018/arcp.html" rel="canonical" type="text/html" />
    <link href="https://creativecommons.org/licenses/by/4.0/" rel="license" />

http://s11.no/2018/arcp.html

<head>
  <title>The Archive and Package (arcp) URI scheme</title>
  <meta charset="utf-8" />
    <link href="css/basic.css" media="all" rel="stylesheet" />
    <link href="css/font-awesome.min.css" rel="stylesheet" media="all" />

    <script src="scripts/do.js"></script>

h1 { font-size:16pt !important; }
h2 { font-size:14pt !important; }
...
(function webpackUniversalModuleDefinition(root, factory) {
	if(typeof exports === 'object' && typeof module === 'object')
		module.exports = factory(require("fetch"));
	else if(typeof define === 'function' && define.amd)
		define(["fetch"], factory);
	else if(typeof exports === 'object')
		exports["DO"] = factory(require("fetch"));
	else
/*!
 *  Font Awesome 4.7.0 by @davegandy - http://fontawesome.io - @fontawesome
 *  License - http://fontawesome.io/license (Font: SIL OFL 1.1, CSS: MIT License)
 */
@font-face{font-family:'FontAwesome';

Resolving from base URI

Absolute URI

http://s11.no/2018/arcp.html

<head>
  <title>The Archive and Package (arcp) URI scheme</title>
  <meta charset="utf-8" />
    <link href="css/basic.css" media="all" rel="stylesheet" />
    <link href="css/font-awesome.min.css" rel="stylesheet" media="all" />

    <script src="scripts/do.js"></script>
/*!
 *  Font Awesome 4.7.0 by @davegandy - http://fontawesome.io - @fontawesome
 *  License - http://fontawesome.io/license (Font: SIL OFL 1.1, CSS: MIT License)
 */
@font-face{
  font-family:'FontAwesome';
  font-weight: normal;
  font-style: normal;
  src: url('../fonts/fontawesome-webfont.eot')   format('embedded-opentype'),
       url('../fonts/fontawesome-webfont.woff2') format('woff2'),
       url('../fonts/fontawesome-webfont.woff')  format('woff'),
       url('../fonts/fontawesome-webfont.ttf')   format('truetype'),
       url('../fonts/fontawesome-webfont.svg')   format('svg');
}
<svg>
<metadata>
Created by FontForge 20120731 at Mon Oct 24 17:37:40 2016
Copyright Dave Gandy 2016. All rights reserved.
</metadata>
<defs>
  <font id="FontAwesome" horiz-adv-x="1536">
    <font-face font-family="FontAwesome" font-weight="400" font-stretch="normal" units-per-em="1792" panose-1="0 0 0 0 0 0 0 0 0 0" ascent="1536" descent="-256" bbox="-1.02083 -256.962 2304.6 1537.02" underline-thickness="0" underline-position="0" unicode-range="U+0020-F500"/>
    <missing-glyph horiz-adv-x="896" d="M224 112h448v1312h-448v-1312zM112 0v1536h672v-1536h-672z"/><glyph glyph-name=".notdef" horiz-adv-x="896" d="M224 112h448v1312h-448v-1312zM112 0v1536h672v-1536h-672z"/>
    <glyph glyph-name=".null" horiz-adv-x="0"/>
    <glyph glyph-name="nonmarkingreturn" horiz-adv-x="597"/>
    <glyph glyph-name="space" unicode=" " horiz-adv-x="448"/>
    <glyph glyph-name="dieresis" unicode="¨" horiz-adv-x="1792"/>

<http://s11.no/2018/arcp.html>
  + <css/font-awesome.css>
  = <http://s11.no/2018/css/font-awesome.css>

<http://s11.no/2018/css/font-awesome.css>

  + <../fonts/fontawesome-webfont.svg>
  = <http://s11.no/2018/fonts/fontawesome-webfont.svg>
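
The two-step resolution above can be sketched with Python's standard `urllib.parse.urljoin`, which follows the RFC 3986 reference resolution rules. The point of the sketch is that the font reference is resolved against the stylesheet's own URI, not against the page that loaded it.

```python
from urllib.parse import urljoin

page = "http://s11.no/2018/arcp.html"

# Step 1: the HTML page references its stylesheet with a relative path
stylesheet = urljoin(page, "css/font-awesome.css")

# Step 2: the stylesheet references the font relative to its own location
font = urljoin(stylesheet, "../fonts/fontawesome-webfont.svg")

print(stylesheet)
print(font)
```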

URI resolution as operations

<http://s11.no/2018/arcp.html>
  + <#ro>
  = <http://s11.no/2018/arcp.html#ro>

Relative #fragment

<http://s11.no/2018/arcp.html#ro>
  + <#article>
  = <http://s11.no/2018/arcp.html#article>

<http://s11.no/2018/arcp.html?t=20181028>
  + <#ro>
  = <http://s11.no/2018/arcp.html?t=20181028#ro>

Relative ?query

<http://s11.no/2018/arcp.html>
  + <?t=20181028>
  = <http://s11.no/2018/arcp.html?t=20181028>
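
The fragment and query cases above can be reproduced with `urllib.parse.urljoin`. This is only a sketch that replays the slide's examples: a fragment-only reference keeps the base path and query, while a query-only reference keeps the base path and replaces the query.

```python
from urllib.parse import urljoin

# (base, relative reference) pairs taken from the examples above
cases = [
    ("http://s11.no/2018/arcp.html", "#ro"),
    ("http://s11.no/2018/arcp.html#ro", "#article"),
    ("http://s11.no/2018/arcp.html?t=20181028", "#ro"),
    ("http://s11.no/2018/arcp.html", "?t=20181028"),
]

results = [urljoin(base, ref) for base, ref in cases]
for (base, ref), resolved in zip(cases, results):
    print(f"<{base}>\n  + <{ref}>\n  = <{resolved}>")
```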

<http://s11.no/2018/arcp.html>

  + <../fonts/fontawesome-webfont.svg>
  = <http://s11.no/fonts/fontawesome-webfont.svg>

Relative /paths

<http://s11.no/2018/arcp.html>

  + </>
  = <http://s11.no/>

<http://s11.no/2018/arcp.html>

  + <cwl.html>
  = <http://s11.no/2018/cwl.html>

<http://s11.no/2018/arcp.html>

  + </2018/cwl.html>
  = <http://s11.no/2018/cwl.html>

Relative to "folder"

Relative to parent

Root

Relative to root

Relative //hosts

<http://s11.no/2018/arcp.html>

  + <//cdn.example.com/fontawesome.css>
  = <http://cdn.example.com/fontawesome.css>

<https://s11.no/2018/arcp.html>

  + <//cdn.example.com/fontawesome.css>
  = <https://cdn.example.com/fontawesome.css>
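
A small sketch of how the path and host cases differ, using the RFC 3986 section 4.2 names for the kinds of relative reference. The `kind` helper is hypothetical and only inspects the leading slashes; the resolution itself is done by the standard `urljoin`.

```python
from urllib.parse import urljoin

def kind(ref: str) -> str:
    """Classify a relative reference by its leading slashes (illustrative helper)."""
    if ref.startswith("//"):
        return "network-path"    # replaces the authority, keeps the base scheme
    if ref.startswith("/"):
        return "absolute-path"   # resolved from the root of the base authority
    return "relative-path"       # resolved from the base's "folder"

base = "http://s11.no/2018/arcp.html"
refs = ["/", "cwl.html", "/2018/cwl.html", "//cdn.example.com/fontawesome.css"]

resolved = {ref: urljoin(base, ref) for ref in refs}
for ref, uri in resolved.items():
    print(f"{kind(ref):14} <{ref}> = <{uri}>")

# The same network-path reference follows the scheme of an https base
secure = urljoin("https://s11.no/2018/arcp.html", "//cdn.example.com/fontawesome.css")
print(secure)
```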

Uncertain destination?
Use relative URI references

<http://з11.ею/2018/arcp.html>

  + <#article>
  = <http://з11.ею/2018/arcp.html#article>

IRI!

Case Study

rohub.org

<file:///home/stain/.cache/.fr-ElVun8/.ro/manifest.rdf>

  + <../ce247caa-7fae-4126-af3a-d9008fcc315f.rdf>
  = <file:///home/stain/.cache/.fr-ElVun8/ce247caa-7fae-4126-af3a-d9008fcc315f.rdf>

Research Object manifest

(Sorry about the RDF/XML!)

Consuming archives with "relativized" Linked Data

Parsing on command line

stain@biggie:~/.cache/.fr-ElVun8$ riot *rdf .ro/*rdf | \
                                  grep ROToolkit-ES-CR.pdf | \
                                  grep 'rdf-syntax-ns#type' | \
                                  riot --formatted=turtle
<file:///home/stain/.cache/.fr-ElVun8/ROToolkit-ES-CR.pdf>
        a    <http://purl.org/wf4ever/ro#Resource> , 
             <http://www.openarchives.org/ore/terms/AggregatedResource> , 
             <http://purl.org/wf4ever/roterms#Paper> , 
             <http://purl.org/dc/terms/BibliographicResource> .

<http://localhost:3030/ro/upload>

 + <ROToolkit-ES-CR.pdf>
 = <http://localhost:3030/ro/ROToolkit-ES-CR.pdf>

<http://localhost:3030/ro/upload>

 + <../ROToolkit-ES-CR.pdf>
 = <http://localhost:3030/ROToolkit-ES-CR.pdf>

Setting the base URI

stain@biggie:~/.cache/.fr-ElVun8$ for r in *rdf .ro/*rdf ; do 
  base="http://example.com/ro/1337/$r"
  echo "## $base" 
  riot "--base=$base" "$r"
done 

## http://example.com/ro/1337/03b9c45b-cc44-4354-a593-8b5f089604d8.rdf
<http://example.com/ro/1337/03b9c45b-cc44-4354-a593-8b5f089604d8.rdf> <http://swrc.ontoware.org/ontology#keywords> " Earth Science" .
## http://example.com/ro/1337/04553fe2-658a-48a4-9ecb-daea4d7976fb.rdf
<http://example.com/ro/1337/04553fe2-658a-48a4-9ecb-daea4d7976fb.rdf> <http://w3id.org/ro/earth-science#distributionCategory> "Preprint" .
## http://example.com/ro/1337/26fb4b59-761c-4675-8c08-464fc7e0db1e.rdf
<http://example.com/ro/1337/ROHub-web-traffic-0318-0718.png> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/wf4ever/wf4ever#Image> .
## http://example.com/ro/1337/2e7d35fa-0eea-405f-9f3a-28c11ee8c5e3.rdf
<http://example.com/ro/1337/ROToolkit-ES-CR.pdf> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/terms/BibliographicResource> .
## http://example.com/ro/1337/39b019cf-8041-47f3-a320-2487447f3ea7.rdf
<http://example.com/ro/1337/ROHub-portal.png> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/wf4ever/wf4ever#Image> .
## http://example.com/ro/1337/4f80a4c6-775b-4c5d-8ab5-524b459b4f87.rdf
<http://example.com/ro/1337/ROToolkit-ES-CR.zip> <http://purl.org/dc/terms/description> "HTML version of the paper" .
## http://example.com/ro/1337/575ef8e6-afe6-4200-ab87-7aed5d7815ec.rdf

...
## http://example.com/ro/1337/.ro/manifest.rdf
<http://example.com/ro/1337/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
   <http://purl.org/wf4ever/ro#ResearchObject> .
<http://example.com/ro/1337/> <http://www.openarchives.org/ore/terms/aggregates>
   <http://example.com/ro/1337/ROToolkit-ES-CR.zip> .
<http://example.com/ro/1337/> <http://www.openarchives.org/ore/terms/aggregates>
   <http://example.com/ro/1337/ROToolkit-ES-CR.pdf> .
<http://example.com/ro/1337/> <http://www.openarchives.org/ore/terms/aggregates>
   <http://example.com/ro/1337/ROHub-web-traffic-0318-0718.png> .
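
The effect of the chosen base URI can be sketched in a few lines: the same relative references yield different absolute identifiers depending on where the manifest is assumed to live. The list of references below is an assumption for illustration (in particular `../` pointing at the Research Object root); only the `../` style itself is taken from the manifest shown earlier.

```python
from urllib.parse import urljoin

# Relative references as they might appear in .ro/manifest.rdf (assumed)
refs = ["../", "../ROToolkit-ES-CR.pdf", "../ROToolkit-ES-CR.zip"]

# Two candidate base URIs for the same manifest
bases = [
    "file:///home/stain/.cache/.fr-ElVun8/.ro/manifest.rdf",
    "http://example.com/ro/1337/.ro/manifest.rdf",
]

resolved = {base: [urljoin(base, ref) for ref in refs] for base in bases}
for base, uris in resolved.items():
    print(f"## {base}")
    for uri in uris:
        print(f"  <{uri}>")
```

With the `file:` base the identifiers leak a temporary local path; with the `http:` base they claim a location that may not exist.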

Not that good
Base URLs

<file:///home/stain/.cache/.fr-ElVun8/data/survey.csv>

<file://s11.no/home/stain/ro/1337/data/survey.csv>

<file://1af95613-1163-46e7-ac9a-69a92af70920/data/survey.csv>

 

<http://example.com/ro/1337/>

<http://rohub.org/download/ro15.zip#data/survey.csv>

<http://1af95613-1163-46e7-ac9a-69a92af70920/data/survey.csv>

 

<jar:http://example.com/ro.zip!/data/survey.csv>

 

<arcp://prefix,namespace/path/to/resource>

Structure of arcp URIs

uuid
ni
name

Path from archive "root"
URI escape as needed
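
A minimal sketch of taking an arcp URI apart with the standard library only. The `split_arcp` helper is hypothetical (it is not the `arcp` package's API) and assumes the authority is always `prefix,name`; `gallery.example.org` is a made-up name for the `name` prefix.

```python
from urllib.parse import urlsplit, unquote

def split_arcp(uri: str):
    """Return (prefix, name, path) for an arcp URI (illustrative helper)."""
    parts = urlsplit(uri)
    if parts.scheme != "arcp":
        raise ValueError(f"not an arcp URI: {uri}")
    # The authority holds "prefix,name"; the prefix is uuid, ni or name
    prefix, comma, name = parts.netloc.partition(",")
    if not comma:
        raise ValueError(f"missing prefix in authority: {parts.netloc}")
    # The path is relative to the archive root and may be percent-encoded
    return prefix, name, unquote(parts.path)

by_uuid = split_arcp("arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/css/base.css")
by_name = split_arcp("arcp://name,gallery.example.org/My%20Photos/")
print(by_uuid)
print(by_name)
```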

b82b3e69-b6ff-4940-b461-cfb089a13334

Generated by a random number generator

<arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/>
+ <css/base.css>
= <arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/css/base.css>
>>> uuid.uuid4()
UUID('32a423d6-52ab-47e3-a9cd-54f418a48571')

Always unique (UUID v4)
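
A sketch of minting a fresh base URI for an archive and addressing files inside it. The `arcp_from_uuid` helper is hypothetical and simply concatenates strings; it percent-encodes the path with `urllib.parse.quote` as one way to "URI escape as needed".

```python
import uuid
from urllib.parse import quote

def arcp_from_uuid(identifier: uuid.UUID, path: str = "/") -> str:
    """Build an arcp URI with the uuid prefix (illustrative helper)."""
    if not path.startswith("/"):
        path = "/" + path
    # quote() leaves "/" intact and escapes spaces and other unsafe characters
    return f"arcp://uuid,{identifier}{quote(path)}"

archive_id = uuid.uuid4()           # random, so different on every run
root = arcp_from_uuid(archive_id)
css = arcp_from_uuid(archive_id, "css/base.css")
print(root)
print(css)
```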

Hashed from archive download URL

arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/pics/flower.jpeg
>>> uuid.uuid5(uuid.NAMESPACE_URL, "http://example.com/data.zip")
UUID('b7749d0b-0e47-5fc4-999d-f154abe68065')

Location-based (UUID v5)
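
Unlike the random variant, a name-based UUID (version 5) is reproducible: anyone who knows the download URL derives the same identifier. A minimal sketch, where `location_uuid` is a hypothetical wrapper around the standard `uuid.uuid5` and `other.zip` is a made-up second archive:

```python
import uuid

def location_uuid(download_url: str) -> uuid.UUID:
    """Derive a UUID v5 from the URL an archive was downloaded from."""
    return uuid.uuid5(uuid.NAMESPACE_URL, download_url)

first = location_uuid("http://example.com/data.zip")
again = location_uuid("http://example.com/data.zip")    # same input, same UUID
other = location_uuid("http://example.com/other.zip")   # different archive

print(f"arcp://uuid,{first}/pics/flower.jpeg")
print(first == again, first == other)
```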

Location-independent archive identifier (BDBag)

>>> uuid.uuid5(uuid.NAMESPACE_URL, "http://identifiers.org/ark/ark:/57799/b91w9r")
UUID('4f11f216-e2dc-57cd-a714-300409a430ce')
stain@biggie:~$  sha256sum archive.zip
7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069

RFC6920 (Naming Things with Hashes) URI

>>> from base64 import urlsafe_b64encode
>>> urlsafe_b64encode(bytes.fromhex(
...     "7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069"))
b'f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk='

Hash checksum of archive

ni:///sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk
arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/src/luhn.c
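
The checksum steps can be combined into one sketch that goes from archive bytes to identifiers. The helper names are hypothetical; the bytes `b"Hello World!"` stand in for an archive's content and happen to produce the digest shown on this slide. The base64url padding character `=` is stripped before the value is placed in a URI.

```python
import hashlib
from base64 import urlsafe_b64encode

def sha256_b64url(data: bytes) -> str:
    """SHA-256 digest as unpadded base64url text."""
    digest = hashlib.sha256(data).digest()
    return urlsafe_b64encode(digest).decode("ascii").rstrip("=")

def ni_uri(data: bytes) -> str:
    """Authority-less ni URI for the given bytes (illustrative helper)."""
    return f"ni:///sha-256;{sha256_b64url(data)}"

def arcp_ni_uri(data: bytes, path: str = "/") -> str:
    """arcp URI with the ni prefix for a path inside the archive (illustrative)."""
    return f"arcp://ni,sha-256;{sha256_b64url(data)}{path}"

archive_bytes = b"Hello World!"   # stand-in for the content of archive.zip
print(ni_uri(archive_bytes))
print(arcp_ni_uri(archive_bytes, "/src/luhn.c"))
```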

Resolving NI URIs

ni:///sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk
http://repo.example.com/.well-known/ni/sha-256/f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk

Retrievable

ni://repo.example.com/sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk

Verifiable

arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/src/luhn.c
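
A sketch of the RFC 6920 mapping from an ni URI to its `.well-known` HTTP location. The `ni_to_well_known` function and its parameters are hypothetical; `default_authority` covers the authority-less `ni:///` form, where the caller has to supply a repository to ask, and the scheme defaults to `http` only to match the example above.

```python
from urllib.parse import urlsplit

def ni_to_well_known(ni_uri, default_authority=None, scheme="http"):
    """Map an ni URI to a .well-known/ni URL (illustrative helper)."""
    parts = urlsplit(ni_uri)
    if parts.scheme != "ni":
        raise ValueError(f"not an ni URI: {ni_uri}")
    authority = parts.netloc or default_authority
    if not authority:
        raise ValueError("no authority in URI and no default given")
    # The path holds "alg;value"; tolerate stray leading/trailing slashes
    alg, semicolon, value = parts.path.strip("/").partition(";")
    if not semicolon:
        raise ValueError(f"missing ';' between algorithm and value: {parts.path}")
    return f"{scheme}://{authority}/.well-known/ni/{alg}/{value}"

url = ni_to_well_known(
    "ni://repo.example.com/sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk")
print(url)
```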

arcp URI Python library

>>> from arcp import *

>>> arcp_random()
'arcp://uuid,dcd6b1e8-b3a2-43c9-930b-0119cf0dc538/'

>>> arcp_random("/foaf.ttl", fragment="me")
'arcp://uuid,dcd6b1e8-b3a2-43c9-930b-0119cf0dc538/foaf.ttl#me'

>>> arcp_hash(b"Hello World!", "/folder/")
'arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/'

>>> arcp_location("http://example.com/data.zip", "/file.txt")
'arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt'
pip install arcp

Parsing arcp URIs

>>> is_arcp_uri("arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt")
True
>>> u = parse_arcp("arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt")
>>> u
ARCPSplitResult(scheme='arcp',prefix='uuid',
  name='b7749d0b-0e47-5fc4-999d-f154abe68065',
  uuid='b7749d0b-0e47-5fc4-999d-f154abe68065',
  path='/file.txt',query='',fragment='')

>>> u.path
'/file.txt'
>>> u.prefix
'uuid'
>>> u.uuid
UUID('b7749d0b-0e47-5fc4-999d-f154abe68065')
>>> u.uuid.version
5

>>> parse_arcp("arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/").hash
('sha-256', '7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069')
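
The "Verifiable" property can be sketched without the `arcp` package: recover the digest from the URI and compare it with the hash of the bytes at hand. Both helpers below are hypothetical stdlib-only stand-ins, not the library's implementation, and only handle `sha-256`.

```python
import hashlib
from base64 import urlsafe_b64decode
from urllib.parse import urlsplit

def arcp_hash_parts(uri: str):
    """Return (algorithm, hex digest) from an arcp://ni,... URI (illustrative)."""
    parts = urlsplit(uri)
    prefix, _, name = parts.netloc.partition(",")
    if parts.scheme != "arcp" or prefix != "ni":
        raise ValueError(f"not an arcp ni URI: {uri}")
    alg, _, b64 = name.partition(";")
    # Restore the "=" padding that was stripped when the URI was built
    padded = b64 + "=" * (-len(b64) % 4)
    return alg, urlsafe_b64decode(padded).hex()

def matches(uri: str, data: bytes) -> bool:
    """Check that data hashes to the digest embedded in the URI."""
    alg, hexdigest = arcp_hash_parts(uri)
    if alg != "sha-256":
        raise ValueError(f"unsupported algorithm in this sketch: {alg}")
    return hashlib.sha256(data).hexdigest() == hexdigest

uri = "arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/"
print(arcp_hash_parts(uri))
print(matches(uri, b"Hello World!"), matches(uri, b"tampered"))
```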

What's next?

Collect feedback from community (you!)

Shrink scope?

Complete arcp support in taverna-robundle

Tool for processing linked data in archives?

Mature to RFC status