R2R 2019 Workshop E:
citation by persistent identifier

BMA House in London

Sessions (Murrell room):

  1. 2019-02-25 10:20 — Introduction
  2. 2019-02-25 16:40 — Barriers
  3. 2019-02-26 10:30 — Solutions

abstract (programme)

Workshop E – Citation by Identifier
How can we best minimise laborious bibliographic tasks for authors by using persistent identifiers to automatically create citations in manuscripts?

  • Dr Daniel Himmelstein
    – Postdoctoral Fellow at University of Pennsylvania
  • Rick Anderson
    – Associate Dean for Collections & Scholarly Communication at the University of Utah Library
    – Member of the 2019 Researcher to Reader Advisory Board

Rather than require authors/journals to manually collect bibliographic details and format references, authors can cite persistent identifiers, while automated systems do the rest. While citation-by-identifier is now technically possible, it is not widespread in manuscript authoring and publishing workflows. What barriers stand in the way of wider adoption and what can we do about them? This workshop will explore how to leverage the rise of open bibliographic catalogues — such as Crossref and PubMed — to revolutionize the ease and accuracy of scholarly citation.


The only manual bibliographic step in the publication workflow, from authoring to production, is when an author chooses which work to cite.

Preregistered Workshop Attendees

Name, title, affiliation, one sentence on interest in cite-by-ID

Collaborative Google Doc for notes at tiny.cc/r2r-workshop

What is a persistent identifier?

A long-lasting, standardized reference to a citable work

This is a sentence with 5 citations [

citation by persistent identifier


  1. Reproducibility of computational workflows is automated using continuous analysis
    Brett K Beaulieu-Jones, Casey S Greene
    Nature Biotechnology (2017-03-13) https://doi.org/f9ttx6
    DOI: 10.1038/nbt.3780 · PMID: 28288103 · PMCID: PMC6103790
  2. Sci-Hub provides access to nearly all scholarly literature
    Daniel S Himmelstein, Ariel Rodriguez Romero, Jacob G Levernier, Thomas Anthony Munro, Stephen Reid McLaughlin, Bastian Greshake Tzovaras, Casey S Greene
    eLife (2018-03-01) https://www.ncbi.nlm.nih.gov/pubmed/29424689
    DOI: 10.7554/elife.32822 · PMID: 29424689 · PMCID: PMC5832410
  3. Opportunities and obstacles for deep learning in biology and medicine
    Travers Ching, Daniel S. Himmelstein, Brett K. Beaulieu-Jones, Alexandr A. Kalinin, Brian T. Do, Gregory P. Way, Enrico Ferrero, Paul-Michael Agapow, Michael Zietz, Michael M. Hoffman, … Casey S. Greene
    Journal of the Royal Society Interface (2018-04) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5938574/
    DOI: 10.1098/rsif.2017.0387 · PMID: 29618526 · PMCID: PMC5938574
  4. IPFS - Content Addressed, Versioned, P2P File System
    Juan Benet
    arXiv (2014-07-14) https://arxiv.org/abs/1407.3561v1
  5. Open collaborative writing with Manubot
    Daniel S. Himmelstein, David R. Slochower, Venkat S. Malladi, Casey S. Greene, Anthony Gitter
    (2018-08-03) https://greenelab.github.io/meta-review/
This is a sentence with 5 citations [1,2,3,4,5].
Prefix       Resource
doi          DOI Content Negotiation
pmcid        NCBI Literature Citation Exporter
pmid         NCBI E-utilities
arxiv        arXiv API
isbn         Zotero translation-server
wikidata     Zotero translation-server
url          Zotero translation-server
raw          user must supply CSL JSON metadata
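
The prefix table above can be read as a lookup from a citation's prefix to the service that supplies its metadata. The following is a minimal sketch of that lookup, assuming citations are written as "prefix:identifier"; that syntax and the function names split_citation and resource_for are illustrative assumptions, not taken from the slides.

```python
# Prefix-to-resource mapping copied from the table above.
RESOURCES = {
    "doi": "DOI Content Negotiation",
    "pmcid": "NCBI Literature Citation Exporter",
    "pmid": "NCBI E-utilities",
    "arxiv": "arXiv API",
    "isbn": "Zotero translation-server",
    "wikidata": "Zotero translation-server",
    "url": "Zotero translation-server",
    "raw": "user must supply CSL JSON metadata",
}


def split_citation(citation):
    """Split a hypothetical 'prefix:identifier' string on its first colon.

    Splitting on the first colon only keeps identifiers that contain
    colons themselves (such as URLs) intact.
    """
    prefix, sep, identifier = citation.partition(":")
    prefix = prefix.lower()
    if not sep or not identifier or prefix not in RESOURCES:
        raise ValueError(f"unsupported citation: {citation!r}")
    return prefix, identifier


def resource_for(citation):
    """Return the name of the metadata resource for a citation string."""
    prefix, _ = split_citation(citation)
    return RESOURCES[prefix]


if __name__ == "__main__":
    for key in ["doi:10.1038/nbt.3780", "pmid:29424689", "arxiv:1407.3561v1"]:
        print(key, "->", resource_for(key))
```

Unknown prefixes raise an error rather than falling back to a default, so a mistyped identifier is caught before any metadata lookup is attempted.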

Technical details of citation metadata retrieval
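
As one illustration of metadata retrieval, DOI Content Negotiation works by requesting a DOI's resolver URL with an Accept header that names the desired metadata format. The sketch below assumes the CSL JSON media type and uses only the Python standard library; the function names build_doi_request and fetch_csl_item are hypothetical, and the fetch step needs network access.

```python
import json
import urllib.parse
import urllib.request

# Media type that asks the DOI resolver for CSL JSON metadata.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_doi_request(doi):
    """Build (but do not send) a content-negotiation request for a DOI."""
    url = "https://doi.org/" + urllib.parse.quote(doi)
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


def fetch_csl_item(doi, timeout=10):
    """Send the request and parse the response body as JSON.

    Requires network access; errors from the resolver or the
    registration agency propagate as urllib exceptions.
    """
    request = build_doi_request(doi)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    item = fetch_csl_item("10.1038/nbt.3780")
    print(item.get("title"))
```

Separating request construction from sending makes the negotiation step easy to inspect without touching the network.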


  • 12 articles with 6 identifiers each: DOI, shortDOI, PMCID, PMID, URL, and short texts.
  • 24 assignments, each with 3 identifiers
  • Write the following on the paper:
    – Seconds taken to locate the article (abstract or greater).
    – First two words of the title, e.g. "Evidence that" for "Evidence that the Great Pacific Garbage Patch is rapidly accumulating plastic".
    – If you cannot locate the article within 180 seconds, write 180 seconds for the time and leave the title blank.

Assignments at goo.gl/e86EiE

text_cite    67%
url          75%
pmid         80%
shortdoi     80%
doi          86%
pmcid        86%

Goals of citation by persistent ID

  • unambiguous references
  • lossless publishing workflows
  • easy retrieval of cited works
  • automated metadata generation
  • automated bibliographies
  • machine readability


  • Vague or confusing instructions to authors

  • Lack of awareness of the whole persistent-identifier issue (more prevalent)

  • Lack of interest on the part of authors (less prevalent)

  • Inconsistency of requirements across journals/publishers

  • Historical works will not have persistent identifiers (backlog problem)


  • Journals: accept any reference style in submitted articles, as long as the references include persistent IDs.

  • PhD programs: mint DOIs for PhD theses and include ORCIDs for students.

  • Search engines: detect and resolve persistent IDs, such as shortDOIs.

  • Metadata repositories: allow easy ways to report incorrect metadata via a centralized system.

  • Organisations producing style guidelines: can encourage (or mandate) persistent identifiers in citations.

  • Librarians: educate patrons about persistent identifiers and how they can be convenient in the long term.
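
The "detect and resolve" suggestion for search engines could look roughly like the sketch below. It assumes a commonly used but imperfect regular expression for full DOIs and does not handle shortDOIs, which would need a separate pattern; the names DOI_PATTERN, find_dois, and resolver_url are hypothetical.

```python
import re

# Heuristic pattern for full DOIs: "10.", a 4-9 digit registrant code,
# a slash, then a suffix running up to whitespace, quotes, or angle brackets.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_dois(text):
    """Return the distinct DOIs found in text, in order of appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Trim sentence punctuation that the suffix pattern swallowed.
        # Assumption: real DOIs rarely end in these characters.
        doi = match.group().rstrip(".,;:)]")
        if doi not in found:
            found.append(doi)
    return found


def resolver_url(doi):
    """Turn a DOI into its resolver link."""
    return "https://doi.org/" + doi


if __name__ == "__main__":
    query = "See 10.1038/nbt.3780, and also (doi:10.7554/elife.32822)."
    for doi in find_dois(query):
        print(resolver_url(doi))
```

Trimming trailing punctuation is a trade-off: it avoids broken links for DOIs quoted mid-sentence, at the cost of mangling the rare DOI that really ends in one of those characters.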

R2R 2019 workshop: Citation by persistent identifiers

By Daniel Himmelstein

Introduction slides for the citation-by-identifier workshop at the 2019 Researcher to Reader conference in London. These slides are released under a CC BY 4.0 License.