persistent identifiers (pids)

good morning. i hope you're caffeinated.

mike nason
open scholarship & publishing librarian @ unb libraries
crossref & metadata liaison @ pkp

i'd like for you to keep a thought on the backburner.

pids are in the drinking water of scholarly publishing

i'll come back to this.

let's start small.

the doi

dois

dois are ubiquitous. we see them all over the place:

in references/bibliographies
on article/journal websites
in repositories
on published datasets
links on twitter or researchgate or academia dot edu or, or, or...

and, we probably know one handy thing about them:

if you click on a doi that looks like a link, it will take you to the thing.

dois

dois are the most prominent persistent identifier.

they are also, arguably, the most important persistent identifier.

how a doi works

a doi is made up of two chunks. a prefix, and a suffix.

prefix

10.4324

suffix
9780203051238-5

together, these make the doi 10.4324/9780203051238-5

how a doi works

prefixes and suffixes mean different things.

prefix

10.4324

a prefix is usually associated with a publisher or organization. dois for that organization will usually have the same prefix.

suffix
9780203051238-5

a suffix is meant to be a machine-readable (not human-readable), opaque, unique string that is specific to the singular work to which it is assigned.
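
a minimal sketch of that split, in python. `split_doi` is a made-up helper name, not part of any library, and it assumes the prefix starts with `10.` and ends at the first slash (a suffix can contain slashes of its own).

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a doi into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    # assumption for this sketch: doi prefixes start with "10."
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a doi: {doi!r}")
    return prefix, suffix


prefix, suffix = split_doi("10.4324/9780203051238-5")
print(prefix)  # 10.4324
print(suffix)  # 9780203051238-5
```

the code never looks inside the suffix. it is treated as an opaque string, which is the point.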

how a doi works

if i put https://doi.org/ in front of a doi, it turns into a url.

https://doi.org/10.4324/9780203051238-5

clicking this will redirect me to the publication this doi is associated with. the process of a doi redirecting you to a publication is called resolution.
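
turning a doi into a link is just string work. here is a small sketch in python, where `doi_to_url` is a hypothetical helper name. the percent-encoding step is a precaution for suffixes that contain characters which are awkward in a url.

```python
from urllib.parse import quote

RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolvable link from a bare doi."""
    # percent-encode anything unsafe in a url path, but keep the slashes
    return RESOLVER + quote(doi, safe="/")


print(doi_to_url("10.4324/9780203051238-5"))
# https://doi.org/10.4324/9780203051238-5
```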

what's "resolution"?

how doi resolution works

a doi isn't just a shortened link like the ones you get from bit.ly or tinyurl. if you're not familiar with these services, they swap a long, unwieldy url for a short one that's easier to share.

for example: https://bit.ly/35LaEXj
this is a bit.ly link for this talk

the actual url for this talk is: https://slides.com/ahemnason/persistent-identifiers-pids-and-open-scholarly-infrastructure/

bit.ly and tinyurl are both basic redirects.

how doi resolution works

a doi, though, is a lot more than a redirect. a doi is a reference to an entire publication record. that publication record is full of metadata.

one of these metadata elements is the publication's url.

when you resolve a doi by clicking on it:

the record is accessed
the stored url is retrieved
you are sent to the stored url

the url can be updated by the publisher. the doi stays the same.
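
the three steps above can be sketched with a toy registry in python. everything here is an assumption for illustration: `ToyRegistry` is invented, the example.com urls are placeholders, and a real resolver is a network service rather than a dictionary.

```python
class ToyRegistry:
    """A stand-in for a registration agency: doi -> metadata record."""

    def __init__(self):
        self._records = {}

    def register(self, doi, title, url):
        self._records[doi] = {"title": title, "url": url}

    def update_url(self, doi, new_url):
        # the publisher moves the content and updates the record
        self._records[doi]["url"] = new_url

    def resolve(self, doi):
        record = self._records[doi]  # 1. the record is accessed
        url = record["url"]          # 2. the stored url is retrieved
        return url                   # 3. the caller is sent to that url


registry = ToyRegistry()
doi = "10.4324/9780203051238-5"
registry.register(
    doi,
    "Understanding Metadata and Metadata Schemes",
    "https://old.example.com/chapter-5",  # placeholder url
)
before = registry.resolve(doi)

registry.update_url(doi, "https://new.example.com/chapter-5")  # placeholder url
after = registry.resolve(doi)

print(before)
print(after)
```

the doi string never changes between the two calls. only the url stored in the record does.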

<?xml version="1.0" encoding="UTF-8"?>
<crossref_result version="3.0" xsi:schemaLocation="http://www.crossref.org/qrschema/3.0 http://www.crossref.org/schemas/crossref_query_output3.0.xsd">
  <query_result>
    <head>
      <doi_batch_id>none</doi_batch_id>
    </head>
    <body>
      <query status="resolved">
        <doi type="book_content">10.4324/9780203051238-5</doi>
        <crm-item name="publisher-name" type="string">Informa UK Limited</crm-item>
        <crm-item name="prefix-name" type="string">Informa UK (Routledge)</crm-item>
        <crm-item name="member-id" type="number">301</crm-item>
        <crm-item name="citation-id" type="number">122425695</crm-item>
        <crm-item name="book-id" type="number">1477192</crm-item>
        <crm-item name="deposit-timestamp" type="number">2020122110554080199</crm-item>
        <crm-item name="owner-prefix" type="string">10.4324</crm-item>
        <crm-item name="last-update" type="date">2020-12-21T15:07:00Z</crm-item>
        <crm-item name="created" type="date">2020-12-21T15:06:59Z</crm-item>
        <crm-item name="citedby-count" type="number">0</crm-item>
        <doi_record>
          <crossref xsi:schemaLocation="http://www.crossref.org/xschema/1.1 http://doi.crossref.org/schemas/unixref1.1.xsd">
            <book book_type="other">
              <book_metadata language="en">
                <contributors>
                  <person_name sequence="first" contributor_role="author">
                    <given_name>Richard</given_name>
                    <surname>Smiraglia</surname>
                  </person_name>
                </contributors>
                <titles>
                  <title>Metadata</title>
                  <subtitle>A Cataloger's Primer</subtitle>
                </titles>
                <edition_number>0</edition_number>
                <publication_date media_type="online">
                  <month>11</month>
                  <day>12</day>
                  <year>2012</year>
                </publication_date>
                <isbn media_type="electronic">9780203051238</isbn>
                <publisher>
                  <publisher_name>Routledge</publisher_name>
                </publisher>
                <doi_data>
                  <doi>10.4324/9780203051238</doi>
                  <timestamp>2020122110554078499</timestamp>
                  <resource>https://www.taylorfrancis.com/books/9781136435843</resource>
                </doi_data>
              </book_metadata>
              <content_item component_type="chapter" publication_type="full_text" language="en">
                <titles>
                  <title>Understanding Metadata and Metadata Schemes</title>
                </titles>
                <publication_date>
                  <year>2012</year>
                  <month>11</month>
                  <day>12</day>
                </publication_date>
                <pages>
                  <first_page>25</first_page>
                  <last_page>44</last_page>
                </pages>
                <doi_data>
                  <doi>10.4324/9780203051238-5</doi>
                  <timestamp>2020122110554080199</timestamp>
                  <resource>https://www.taylorfrancis.com/books/9781136435843/chapters/10.4324/9780203051238-5</resource>
                </doi_data>
              </content_item>
            </book>
          </crossref>
        </doi_record>
      </query>
    </body>
  </query_result>
</crossref_result>

i'm sorry to do this to you.

there's a lot of information here:

publisher
deposit and update timestamp
book type
contributors (first, role=author)
title and subtitle
publication date
doi for the book
link for the book
chapter title
doi for the chapter
link for the chapter
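
a hedged sketch of pulling a few of those fields out with python's standard library. the fragment below is trimmed by hand from the record above. the full response may declare xml namespaces, in which case these paths would need namespace prefixes. `summarize` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# hand-trimmed fragment of the record shown above
FRAGMENT = """
<book book_type="other">
  <book_metadata language="en">
    <titles>
      <title>Metadata</title>
      <subtitle>A Cataloger's Primer</subtitle>
    </titles>
    <doi_data>
      <doi>10.4324/9780203051238</doi>
      <resource>https://www.taylorfrancis.com/books/9781136435843</resource>
    </doi_data>
  </book_metadata>
  <content_item component_type="chapter" publication_type="full_text" language="en">
    <titles>
      <title>Understanding Metadata and Metadata Schemes</title>
    </titles>
    <doi_data>
      <doi>10.4324/9780203051238-5</doi>
      <resource>https://www.taylorfrancis.com/books/9781136435843/chapters/10.4324/9780203051238-5</resource>
    </doi_data>
  </content_item>
</book>
"""


def summarize(node):
    """Collect the title, doi, and stored url from one part of the record."""
    return {
        "title": node.findtext("titles/title"),
        "doi": node.findtext("doi_data/doi"),
        "url": node.findtext("doi_data/resource"),
    }


book = ET.fromstring(FRAGMENT.strip())
book_info = summarize(book.find("book_metadata"))
chapter_info = summarize(book.find("content_item"))

print(book_info)
print(chapter_info)
```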

<doi_data>
  <doi>10.4324/9780203051238</doi>
  <timestamp>2020122110554078499</timestamp>
  <resource>https://www.taylorfrancis.com/books/9781136435843</resource>
</doi_data>
</book_metadata>
<content_item component_type="chapter" publication_type="full_text" language="en">
  <titles>
    <title>Understanding Metadata and Metadata Schemes</title>
  </titles>
  <publication_date>
    <year>2012</year>
    <month>11</month>
    <day>12</day>
  </publication_date>
  <pages>
    <first_page>25</first_page>
    <last_page>44</last_page>
  </pages>
  <doi_data>
    <doi>10.4324/9780203051238-5</doi>
    <timestamp>2020122110554080199</timestamp>
    <resource>https://www.taylorfrancis.com/books/9781136435843/chapters/10.4324/9780203051238-5</resource>
  </doi_data>
  

<!-- urls are part of the metadata of a doi. -->
<!-- when you change the location of content, you update your doi with the new location. everyone who uses the doi gets to the content no matter where you put it, so long as that doi is updated. this is what makes the doi persistent. -->

neat!

surely, there is more to it?

🤔

congratulations, you now know more about dois than a frankly surprising number of people.

and, by extension, you now know more about pids than a frankly surprising number of people.

let's step back a bit

what a pid is

an identifier // a label which gives a unique name/label to an entity: a person, place, or thing.

persistent // long-lasting

dois are identifiers

doi is an acronym for:

Digital
Object
Identifier

dois are used for articles, datasets, issues, journals, galleys, preprints, theses, proceedings, monographs, reports, standards... "publications", basically

what an identifier is

an identifier is a unique string of characters assigned to something, someplace, or someone that can be used to identify it.

social insurance number
driver's license number
medicare number
license plate number
student number

we are assigned identifiers all the time. we (ideally) carry "ID".

what an identifier is

identifiers typically refer to physical objects and are often created/managed locally. they're useful for record-keeping and data retrieval/searchability. they're useful for disambiguation, too!

unb student id is only a useful label for unb students.
social insurance numbers are provided federally.
medicare numbers are provided provincially.
a license plate is only a label for a current, registered car.

there is more than one mike nason in canada, but only one of them has my social insurance number (i hope).

pids share the same benefits! a doi is good for disambiguation, data retrieval, searchability in the same way that a social insurance number is. like if, uhh, every article published had its own little tiny registration with a government.

we've scratched the surface a bit on what a doi does, but url storage and redirection is just one benefit for one kind of pid.

identifiers

**ok, so what about persistence?**

**what a persistent identifier is**

persistent identifiers most frequently refer to digital things. traditionally, we share or locate digital things using a link (a url).

url // uniform resource locator
(https://www.example.com/index.html)

we know that urls break all the time, for lots of different reasons.

what a persistent identifier does

a url can tell you where something was when you read it. if you bookmark that url or put it in print, you're assuming that it will still work later. this is not guaranteed!

but, as we discussed earlier, using dois i can update the location if the content moves. i can provide a persistent link to a record that contains a url.

the id is persistent. where it directs me may change.
a doi is persistent. where the doi resolves may change.

what a persistent identifier does

so, imagine finding the citation for this work in a bibliography... which of these two will be more useful if the content moves from the website it is currently on?

Smiraglia, R. (2005). Metadata: A Cataloger's Primer (1st ed.). Routledge. https://www.taylorfrancis.com/books/mono/10.4324/9780203051238/metadata-richard-smiraglia

Smiraglia, R. (2005). Metadata: A Cataloger's Primer (1st ed.). Routledge. https://doi.org/10.4324/9780203051238

what about pids for things that aren't publications?

i was told there would be "personal identifiers"

orcid!

what orcid is

ORCID stands for "open researcher and contributor id"

ORCID is also the name of the not-for-profit organization that provides ORCID IDs, maintains the service, and develops the website and API

what an orcid id is

orcid ids are a kind of pid.

what an orcid is

first and foremost, orcid ids help consistently and properly identify the authors of works no matter what their name is, was, or will be

nearly every publisher can take an orcid id as metadata associated with a publication

and! orcids are included as metadata in dois! this means your identity in publication metadata can be unambiguous

for example

i might write my own name as:

Mike Nason
Michael Nason
Michael Thomas William Nason
M. Nason
mnason
ahemnason

and! there may be more than one of any of these! the more variations and folks with the same name there are, the harder it is to find the stuff i've done. a pid for people would make attribution and discovery easier.

for example

i might write my own name as:

Mike Nason
Michael Nason
Michael Thomas William Nason
M. Nason
mnason
ahemnason

https://orcid.org/0000-0001-5527-8489
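
to make the difference concrete, here is a small python sketch. the publication records are invented for illustration, and the second orcid id is an all-zero placeholder rather than a real one.

```python
MY_ORCID = "https://orcid.org/0000-0001-5527-8489"
SOMEONE_ELSE = "https://orcid.org/0000-0000-0000-0000"  # placeholder, not a real id

# hypothetical publication metadata
records = [
    {"title": "paper a", "author": "Mike Nason", "orcid": MY_ORCID},
    {"title": "paper b", "author": "M. Nason", "orcid": MY_ORCID},
    {"title": "paper c", "author": "Michael Nason", "orcid": MY_ORCID},
    {"title": "paper d", "author": "Mike Nason", "orcid": SOMEONE_ELSE},
]

# searching by name misses variants and picks up a namesake
by_name = [r["title"] for r in records if r["author"] == "Mike Nason"]

# searching by orcid id finds exactly my work, however the name was written
by_orcid = [r["title"] for r in records if r["orcid"] == MY_ORCID]

print(by_name)   # ['paper a', 'paper d']
print(by_orcid)  # ['paper a', 'paper b', 'paper c']
```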

orcid also provides users with what's known as an orcid profile

when you hear someone talk about a scholar profile or a researcher profile, they are probably talking about orcid, scopus id, researcher id, or google scholar (but they might be talking about something else entirely)

i've done a handful of talks on this, links for them are on the next slide

UNB Libraries Research Booster 1

video | deck

(bonus) CRKN PIDs Series: Object Identifiers: Use Cases for Librarians and Data Professionals

video | deck

and so...

what a persistent identifier does

pids make things easier to find, track, share, and access!

if my orcid id is present as metadata in the dois of the work i publish, i can pull my publication record easily and add it to my orcid profile

if my articles have dois, i can provide persistent links to their most recent location, which will ensure ease of access and citation

if a funding agency can pull metadata from my orcid profile, they can acquire all of my publication metadata without me having to fill out a pile of forms

and so, "persistent identifier"

a unique, unchanging identifier representing an object (digital or otherwise) that can direct a user, unambiguously, to that object's current location.

a pid cannot just do this inherently, though...

pids require a third party

often, folks use the phrase "minting a doi" to describe the assignment of a doi to a work. i see this a lot. a journal editor might say to me, "i made all these dois but they don't work! i just get an error!"

a publisher can mint pids and provide them to you, but they need a third party to be at all useful.

to work, a pid needs to be registered with a pid registration agency.
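
the gap between "minting" and "registering" can be sketched in python. this is a toy: `mint`, `register`, and `resolve` are invented names, `10.5555` is used only as an example prefix, and the dictionary stands in for a registration agency's records.

```python
from itertools import count

_suffixes = count(1)
registered = {}  # doi -> url, standing in for the agency's records


def mint(prefix):
    """Make up a new doi string. Nothing is registered yet."""
    return f"{prefix}/example.{next(_suffixes)}"


def register(doi, url):
    """Deposit the doi and its url with the (toy) registration agency."""
    registered[doi] = url


def resolve(doi):
    if doi not in registered:
        raise LookupError(f"doi not found: {doi}")
    return registered[doi]


doi = mint("10.5555")

# a minted-but-unregistered doi is just a string: resolving it fails
try:
    resolve(doi)
    minted_only_resolves = True
except LookupError:
    minted_only_resolves = False

# once registered, the same doi resolves
register(doi, "https://journal.example.com/article/1")  # placeholder url
resolved_url = resolve(doi)

print(minted_only_resolves)  # False
print(resolved_url)          # https://journal.example.com/article/1
```

this is the journal editor's error in miniature: the doi exists as a string, but nobody has told the agency where it should point.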

what's a registration agency?

registration agencies

persistent identifiers are managed by registration agencies (typically international not-for-profits) that store records/metadata, facilitate resolution requests, and may or may not offer other services based on membership. they do much of this through APIs.

there are a lot of registration agencies!

registration agencies

it's important to know that registration agencies differ in mandate, governance, scope, service, supported objects, membership terms, and feature set.

they also, often, work together and share data.

let's review
the field

pids for scholarly works

Crossref (DOI)

most scholarly publishers are crossref members. at the time i wrote this (1pm, april 6th) crossref had 134,294,189 dois registered with their service.

crossref are a big deal.

articles
proceedings
monographs
*datasets
funding agencies
grants
reports
standards
preprints

pids for scholarly works

DataCite (DOI)

while some scholarly publishers use datacite for article dois, it is much more commonly used in data/institutional/disciplinary repositories.

datacite and crossref work together to connect research data to publications.

software
datasets
collections
audio/visual
event
model
*publications

pids for researchers

ORCID (ISNI)
Scopus ID
WoS Researcher ID

orcid are the go-to here, with scopus and wos offerings both restricted to publications present on those platforms. however, these services can share data between them.

researchers

pids for organizations

ROR
GRID
ISNI

there are good odds you'll never need to know what the ror id for unb is. the predominant use-case for organizational ids is in strengthening connections between records using open scholarly infrastructure.

organizations

a quick example re: ror

we write UNB as:

UNB
University of New Brunswick
UNBF / UNBSJ
UNB Fredericton
University of New Brunswick Saint John

https://ror.org/05nkf0n29

registration agencies

registration agencies provide metadata schema through which users can describe the objects they are registering pids for.

as you can imagine, you'd describe a person differently than you'd describe a dataset, or a journal article, or an organization. even when agencies use the same type of pid (like the doi), the schema they use may vary.

<?xml version="1.0" encoding="UTF-8"?>
<crossref_result version="3.0" xsi:schemaLocation="http://www.crossref.org/qrschema/3.0 http://www.crossref.org/schemas/crossref_query_output3.0.xsd">
  <query_result>
    <head>
      <doi_batch_id>none</doi_batch_id>
    </head>
    <body>
      <query status="resolved">
        <doi type="book_content">10.4324/9780203051238-5</doi>
        <crm-item name="publisher-name" type="string">Informa UK Limited</crm-item>
        <crm-item name="prefix-name" type="string">Informa UK (Routledge)</crm-item>
        <crm-item name="member-id" type="number">301</crm-item>
        <crm-item name="citation-id" type="number">122425695</crm-item>
        <crm-item name="book-id" type="number">1477192</crm-item>
        <crm-item name="deposit-timestamp" type="number">2020122110554080199</crm-item>
        <crm-item name="owner-prefix" type="string">10.4324</crm-item>
        <crm-item name="last-update" type="date">2020-12-21T15:07:00Z</crm-item>
        <crm-item name="created" type="date">2020-12-21T15:06:59Z</crm-item>
        <crm-item name="citedby-count" type="number">0</crm-item>
        <doi_record>
          <crossref xsi:schemaLocation="http://www.crossref.org/xschema/1.1 http://doi.crossref.org/schemas/unixref1.1.xsd">
            <book book_type="other">
              <book_metadata language="en">
                <contributors>
                  <person_name sequence="first" contributor_role="author">
                    <given_name>Richard</given_name>
                    <surname>Smiraglia</surname>
                  </person_name>
                </contributors>
                <titles>
                  <title>Metadata</title>
                  <subtitle>A Cataloger's Primer</subtitle>
                </titles>
                <edition_number>0</edition_number>
                <publication_date media_type="online">
                  <month>11</month>
                  <day>12</day>
                  <year>2012</year>
                </publication_date>
                <isbn media_type="electronic">9780203051238</isbn>
                <publisher>
                  <publisher_name>Routledge</publisher_name>
                </publisher>
                <doi_data>
                  <doi>10.4324/9780203051238</doi>
                  <timestamp>2020122110554078499</timestamp>
                  <resource>https://www.taylorfrancis.com/books/9781136435843</resource>
                </doi_data>
              </book_metadata>
              <content_item component_type="chapter" publication_type="full_text" language="en">
                <titles>
                  <title>Understanding Metadata and Metadata Schemes</title>
                </titles>
                <publication_date>
                  <year>2012</year>
                  <month>11</month>
                  <day>12</day>
                </publication_date>
                <pages>
                  <first_page>25</first_page>
                  <last_page>44</last_page>
                </pages>
                <doi_data>
                  <doi>10.4324/9780203051238-5</doi>
                  <timestamp>2020122110554080199</timestamp>
                  <resource>https://www.taylorfrancis.com/books/9781136435843/chapters/10.4324/9780203051238-5</resource>
                </doi_data>
              </content_item>
            </book>
          </crossref>
        </doi_record>
      </query>
    </body>
  </query_result>
</crossref_result>

i'm sorry to do this to you again.

there's a lot of information here.

metadata is hugely useful

unlike publications themselves, metadata is typically free. and, we can learn a lot from it. crossref, for example, can store the following things as publicly accessible metadata:

title
subtitle
authors
orcids
affiliation
copyright license

funder/grant ids
languages
ror
references
resource location
version

publisher
journal
volume/issue
related dois
dates
abstracts

crossref makes this metadata available through a public api.
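
a hedged sketch of asking that api about a doi, in python. it assumes the crossref rest api's `works` route at `https://api.crossref.org/works/` and a json response with the record under a `message` key. the sample message below is written by hand from the record shown earlier in this talk, not copied from a live response, and `works_url`, `fetch_work`, and `summarize` are made-up helper names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_ROOT = "https://api.crossref.org/works/"


def works_url(doi):
    """Build the api url for a single doi."""
    return API_ROOT + quote(doi, safe="/")


def fetch_work(doi):
    """Fetch one record over the network (not called in this sketch)."""
    with urlopen(works_url(doi), timeout=30) as response:
        return json.load(response)["message"]


def summarize(message):
    """Pick a few fields out of a record, tolerating missing ones."""
    titles = message.get("title") or [None]
    return {
        "doi": message.get("DOI"),
        "title": titles[0],
        "publisher": message.get("publisher"),
    }


# hand-written sample in the assumed shape of the api's "message" object
sample_message = {
    "DOI": "10.4324/9780203051238-5",
    "title": ["Understanding Metadata and Metadata Schemes"],
    "publisher": "Informa UK Limited",
}

print(works_url("10.4324/9780203051238-5"))
print(summarize(sample_message))
```

to try it against the live service, call `summarize(fetch_work("10.4324/9780203051238-5"))`. the fields that come back depend on what the publisher deposited.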

is it time for you to tell me what an API is?

what an api is

api stands for

Application
Programming
Interface

an API is, basically, a set of rules for interacting with software

think of it as being a little like a translator working as an intermediary between two people who don't speak the same language

what an api is

APIs are everywhere. when my calendar app tells me today's forecast, it's accessing that information using the AccuWeather API. when my watch vibrates because i got a text message, that's because Garmin's API is communicating with Apple's notifications API.

APIs are how disparate systems, built by different people, using different languages and definitions, find common ground and share information.

this is open scholarly infrastructure

this network of APIs is like a municipal water system (get it?). it is, increasingly, infrastructure relied upon by researchers and institutions whether or not they are really aware of it.

almost all open scholarly infrastructure is based around APIs.

crossref

datacite

orcid

elsevier

t&f

sage

ror

github

dataverse

zenodo

arxiv

mendeley

zotero

cris systems

funders

openaire

google scholar

unpaywall

share your paper

plos

...