Persistent Identifiers Panel Primer

Mike Nason
Open Scholarship & Publishing Librarian | UNB Libraries
Crossref & Metadata Liaison | PKP

Some Resources

I'd love to really spread out and go deep on PIDs, how they work, who stakeholders and providers are, and why they matter so much in modern scholarly publishing. We have five minutes for an intro.

 

So, here are some links! They go pretty deep, so if you are keen to learn about the majesty of PIDs, please!

Getting Found, Staying Found. Persistent Identifiers & Their Value. (deck | video)

 

FAIR Principles & Persistent Identifiers
(deck | video)

What is a PID?

PID is short for Persistent Identifier. Half of this should make complete sense to you. We have identifiers for all kinds of things, all the time.

  • social insurance numbers
  • driver's licenses
  • license plates
  • medicare
  • student ids

Identifiers typically refer to physical objects and are often created/managed locally. They're useful for record-keeping and data retrieval/searchability. They're useful for disambiguation, too!

 

For example, there is more than one Mike Nason in Canada, but only one of them has my social insurance number (I hope).

What is a PID?

Persistent

Persistence (not to be confused with "permanence", please) means "long-lasting"... a function of the life of the service that has stewardship over a kind of identifier.

Identifier

A unique string or pattern referring to an object, person, document, file, website, skyscraper, sunflower seed, yawn, feature film, bicycle, unicorn... whatever.

 

You get it.

In librarianship, we use a dizzying array of persistent identifiers.

  • issn
  • isbn
  • doi
  • handle
  • ark
  • orcid
  • ror
  • scopusid
  • ringgold
  • magnet link
  • urn
  • uri
  • xri
  • purl
  • viaf
  • isni
  • oclc number
  • and so on...

And, in the context of publishing and open scholarly infrastructure, we can narrow this down into some especially vital PIDs.

  • doi
  • orcid
  • ror

This is a little reductive, and we can talk about that. But, also, tick tock ⏰.

DOIs are the big one, so let's pivot to those real quick.

DOIs

DOIs are ubiquitous. We see them all over the place:

  • in references/bibliographies
  • on article/journal websites
  • in repositories
  • on published datasets
  • links

And, we probably know one handy thing about them:

 

If you click on a DOI that looks like a link, it will take you to the thing.

DOIs are the most prominent persistent identifier.

They are also, arguably, the most important persistent identifier.

How a DOI Works

A DOI is made up of two chunks, a prefix and a suffix separated by a slash, and they mean different things.

prefix

10.4324


A prefix is usually associated with a publisher or organization. DOIs for that organization will usually have the same prefix.

suffix

9780203051238-5

 

A suffix is meant to be a machine-readable (not human-readable), opaque, unique string that is specific to the singular work to which it is assigned.

If I prepend a DOI with https://doi.org/, it turns into a URL.

 

https://doi.org/10.4324/9780203051238-5

Clicking this will redirect me to the publication this DOI is associated with.
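The prefix/suffix split and the "prepend https://doi.org/" trick can be sketched in a few lines of Python. This is a minimal illustration, not an official parser: the function names `split_doi` and `doi_to_url` are made up for this example, and it assumes a well-formed DOI with no characters that need URL-encoding.

```python
DOI_RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, slash, suffix = doi.partition("/")
    # Every DOI prefix starts with "10." and must be followed by a suffix.
    if not slash or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable URL (hypothetical helper)."""
    return DOI_RESOLVER + doi


prefix, suffix = split_doi("10.4324/9780203051238-5")
print(prefix)   # the publisher/organization chunk
print(suffix)   # the opaque, work-specific chunk
print(doi_to_url("10.4324/9780203051238-5"))
```

Note that the split happens at the first slash only, since a suffix may itself contain slashes.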

 

The process of a DOI redirecting you to a publication is called resolution.

 

And resolution is facilitated by registration agencies.


Vitally, DOIs aren't just like a bit.ly or TinyURL link.

 

If you're not familiar with these services, they swap a long, unwieldy link for a short one that is easier to share.

For example: https://bit.ly/35LaEXj
This is a bit.ly link for a talk I did on open scholarly infrastructure (OSI).

 

The actual URL for that talk is: https://slides.com/ahemnason/persistent-identifiers-pids-and-open-scholarly-infrastructure/

 

bit.ly and TinyURL are both basic redirects.


Plenty of people treat DOIs this way, or assume it's their only real function. This is complicated by the fact that handles do, essentially, work this way. 

 

Or, swinging wildly in the other direction, they ascribe fresh meaning to DOIs and make some wild presumptions about DOIs as a sign of scholarly legitimacy.


If you click on a DOI that looks like a link, it will take you to the thing.

A DOI is a lot more than a redirect.

Any single DOI is a reference to an entire publication record. That publication record is full of metadata.

 

And, one of these metadata elements is the publication's URL.

When you resolve a DOI by clicking on it:

  • the record is accessed
  • the stored URL is retrieved
  • you are sent to the stored URL

 

The URL can be updated by the publisher. The DOI stays the same.
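Here is a toy, in-memory model of that idea: the identifier points at a record, the record holds the URL, and the URL can change without the DOI changing. Everything in it (`ToyRegistry`, `Record`, the example.org URLs, the metadata values) is hypothetical and only meant to illustrate the three resolution steps; it is not how any registration agency is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A publication record: the URL is just one piece of metadata."""
    doi: str
    url: str
    metadata: dict = field(default_factory=dict)


class ToyRegistry:
    """Hypothetical stand-in for a registration agency's record store."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def register(self, doi: str, url: str, **metadata) -> None:
        self._records[doi] = Record(doi, url, dict(metadata))

    def update_url(self, doi: str, new_url: str) -> None:
        # The publisher moves the content; the DOI itself never changes.
        self._records[doi].url = new_url

    def resolve(self, doi: str) -> str:
        record = self._records[doi]  # 1. the record is accessed
        target = record.url          # 2. the stored URL is retrieved
        return target                # 3. the caller is sent to that URL


registry = ToyRegistry()
registry.register(
    "10.4324/9780203051238-5",
    "https://old-platform.example.org/chapter-5",
    title="Placeholder title",
)
print(registry.resolve("10.4324/9780203051238-5"))

# The publisher migrates platforms and updates the record.
registry.update_url(
    "10.4324/9780203051238-5",
    "https://new-platform.example.org/books/chapter-5",
)
print(registry.resolve("10.4324/9780203051238-5"))
```

A plain link shortener only ever stores the target; here the target is one field in a richer record, which is the whole point.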


And, depending on the registration agency that DOI is registered to, and what kind of work it is, there's a boatload of other metadata you can access.

Publication Metadata

Unlike publications themselves, metadata is typically free. And, we can learn a lot from it. Crossref, for example, can store the following things (not an exhaustive list) as publicly accessible metadata:

 

title
subtitle
authors
orcids
affiliation

copyright license
funder/grant ids
languages
ror
references
resource location
version
publisher
journal/volume/issue
related dois
dates
abstracts

This metadata is, as you might have guessed, hugely useful!

Just within Crossref...

Article-level Metadata
10.4138/atlgeo.2022.008

Title-level Metadata
10.4138


 

 

These are all calls against the Crossref public API for metadata related to publications. This is just a fraction of the sorts of queries you could make against this metadata.
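For a sense of what such a call looks like, here is a rough Python sketch against the Crossref REST API. The exact queries linked from the slide aren't recoverable here, so the two routes used (`/works/{doi}` for a single work and `/prefixes/{prefix}` for a prefix) are my assumption about what they were. The helper names are invented for this example, and the JSON keys read in `summarize_work` (`title`, `author`, `publisher`, `URL`) reflect the shape of a Crossref work record as I understand it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

CROSSREF_API = "https://api.crossref.org"


def work_url(doi: str) -> str:
    """API URL for one work's metadata record."""
    return f"{CROSSREF_API}/works/{quote(doi, safe='/')}"


def prefix_url(prefix: str) -> str:
    """API URL for information about a DOI prefix."""
    return f"{CROSSREF_API}/prefixes/{quote(prefix)}"


def fetch_message(url: str, timeout: float = 10.0) -> dict:
    """Make a live request and return the 'message' part of the response."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)["message"]


def summarize_work(message: dict) -> dict:
    """Pull a few human-friendly fields out of a work record."""
    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in message.get("author", [])
    ]
    return {
        "title": (message.get("title") or [None])[0],
        "publisher": message.get("publisher"),
        "authors": authors,
        "url": message.get("URL"),
    }


print(work_url("10.4138/atlgeo.2022.008"))
print(prefix_url("10.4138"))
```

Running `summarize_work(fetch_message(work_url("10.4138/atlgeo.2022.008")))` would make a real network request; it is left out of the block so the sketch runs offline.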

But when the infrastructure is connected!

  • I could have publications automatically added to my ORCID record.
  • I could find all the works in OpenAIRE written by researchers at my institution.
  • I could get publishing metrics about the journals my faculty publish in via OpenAlex.
  • I could reveal relationships between a funder, a specific grant ID, and all the various products of scholarship created/disseminated as a result.
  • I could evaluate the "completeness" of my published metadata with a variety of tools.
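As one hedged example of the funder/grant bullet, the Crossref REST API lets you combine filters on its `/works` route. The sketch below only builds the query URL. It assumes the `funder` and `award.number` filter names and the `rows` parameter behave as I understand them from the API docs; the funder ID and grant number are obvious placeholders, not real identifiers, and `works_query` is a name made up for this example.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def works_query(filters: dict, rows: int = 20) -> str:
    """Build a Crossref /works query URL from filter name/value pairs."""
    # Crossref expects filters as comma-separated name:value pairs.
    filter_value = ",".join(f"{name}:{value}" for name, value in filters.items())
    return CROSSREF_WORKS + "?" + urlencode({"filter": filter_value, "rows": rows})


# Placeholder funder ID and grant number, for illustration only.
url = works_query(
    {"funder": "10.13039/EXAMPLE", "award.number": "ABC-123"},
    rows=5,
)
print(url)
```

The same pattern would work for other filters (by ORCID iD, for instance), which is what makes connected PIDs so useful: each identifier becomes something you can query on.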

Registration Agencies

Persistent identifiers are managed by registration agencies (typically international not-for-profits) that store records/metadata, facilitate resolution requests, and may or may not offer other services based on membership.

 

They do much of this through APIs.

There are a lot of registration agencies!

 

Each agency may differ in mandate, governance, scope, service, supported objects, membership terms, and feature set.


They also, often, work together and share data.

 

 

PIDs for Scholarly Works

Publishing-flavoured

Crossref (DOI)

 

Most scholarly publishers are Crossref members. At the time I wrote this (Feb 13th) Crossref had 155,639,301 DOIs registered with their service.

 

Crossref are a big deal.

articles
proceedings
monographs
*datasets
funding agencies
grants
reports
standards
preprints

PIDs for Scholarly Works

Repositories-flavoured

DataCite (DOI)

While some scholarly publishers use DataCite for article DOIs, it is much more commonly used in data/institutional/disciplinary repositories. DataCite and Crossref work together to connect research data to publications.

software
datasets
collections
audio/visual
event
model
*publications

PIDs for Researchers

ORCID
Scopus ID
WoS Researcher ID

 

ORCID are the go-to here, with Scopus and WoS offerings both restricted to publications present on those platforms. However, these services can share data between them.

researchers

PIDs for Organizations

ROR
GRID
Ringgold

 

The predominant use-case for organizational IDs is in strengthening connections between records using open scholarly infrastructure.

organizations

Registration Agencies

Registration agencies provide metadata schema through which users can describe the objects they are registering PIDs for.

 

 

As you can imagine, you'd describe a person differently than you'd describe a dataset, or a journal article, or an organization. Even when agencies use the same type of PID (like the DOI), the schema they use may vary.
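To make that concrete, here is a deliberately simplified sketch of three record shapes. None of these are real schemas: the class names and fields are invented for illustration and are far thinner than what ORCID, Crossref, or DataCite actually define.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ResearcherRecord:
    """Hypothetical shape for a person-type PID record."""
    orcid: str
    name: str
    affiliations: list = field(default_factory=list)


@dataclass
class ArticleRecord:
    """Hypothetical shape for a journal article with a DOI."""
    doi: str
    title: str
    journal: str
    authors: list = field(default_factory=list)


@dataclass
class DatasetRecord:
    """Hypothetical shape for a dataset with a DOI."""
    doi: str
    title: str
    creators: list
    resource_type: str = "Dataset"


def field_names(cls) -> set:
    return {f.name for f in fields(cls)}


# Same kind of PID (a DOI), but the two schemas only partly overlap.
shared = field_names(ArticleRecord) & field_names(DatasetRecord)
article_only = field_names(ArticleRecord) - field_names(DatasetRecord)
print(sorted(shared))
print(sorted(article_only))
```

Even in this toy version, the article and the dataset share an identifier type but describe their contributors and context differently, and the researcher record shares almost nothing with either.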

Open Scholarly Infrastructure

This network of APIs connecting these PID registration agencies to other, open services is, increasingly, infrastructure relied upon by researchers and institutions whether or not they are aware of it.


Almost all open scholarly infrastructure is based around APIs and the ability to easily push and pull this metadata around.

Open scholarly infrastructure is a network of scholarly-research-focused open-source platforms, service providers, and APIs that work in concert to share data, illuminate relationships, and make research more discoverable.


https://openscholarlyinfrastructure.org/

The Key is Metadata

The metadata we get out of these systems, and its utility, is very much dependent on its quality.

 

We have a general expression in the metadata universe. That's "garbage in, garbage out."

Metadata is kind of everyone's responsibility. Researchers, librarians, publishers, registration agencies... everyone has a stake in accurate, usable metadata.

 

But, part of the reason these agencies have risen into the positions they're in now is their approach to interchangeable, retrievable, and reusable metadata.

Congratulations, you now know more about DOIs than a frankly surprising number of people.

And, by extension, you now know more about PIDs than a frankly surprising number of people.

I am (almost) always happy to answer questions about open scholarly infrastructure, in general, so please reach out!

 

Panel time!