Credit Lost

Daina Bouquin

Daniel Chivvis

Center for Astrophysics | Harvard & Smithsonian

The History of Software Citation in Astronomy

Software Citation Principles

  • Software should be considered important to science.
  • Software citations should enable normative and legal credit and attribution to be given to software authors.
  • Software citations need to uniquely and persistently identify software.
  • Software citations need to enable access to the software itself and its associated metadata.

Not just astronomy

What makes this hard?

Persistent Identifiers

Identification

Unambiguous way to point at a specific thing in a specific place at a specific time.

Location

Where the thing you are pointing at is at a specific time.

Identifiers for software are new.

But astronomers have been "citing" software for decades.

Software papers

  • Create ambiguity around software authorship
    • authorship lists are static, but software is dynamic with authorship lists that change from one version of code to the next
  • Make locating software more difficult over time
    • links from papers to their associated code bases break if they exist at all
  • Citations to software papers for the purpose of software citation are indistinguishable from citations to those papers for other purposes
  • Can put important information about open source software behind paywalls
  • Arbitrarily create more work for software authors

Registry Records

(e.g., ASCL)

  • List links to websites and documents associated with the software
  • Authors may not get to determine how their software is represented
    • Records can be created for software without software authors' input
  • Present challenges for software curation
    • No process for reconciling registry records with archival records

 

The practice of citing software registry records could become a recommended practice for citing software that otherwise has no clear or direct way to identify it.

We developed a case study to look at past behaviors over the last two decades.

 

Different "types" of software packages developed in whole or in part at the CfA

  • Likely to be cited

  • Cover a long range of years

  • AAS XML (1998-2018)

  • ADS API search (same time frame) 

Need to understand past behaviors and motivations to change norms.

Software Aliases

Search strings that could have been used by article authors mentioning software in their papers. We made distinctions between identifiers and non-identifiers for our study.
 

  • Example: spec2d
    • searching for "spec2d" would miss papers that used the software authors' preferred citation method:
      "The analysis pipeline used to reduce the DEIMOS data was developed at UC Berkeley with support from NSF grant AST-0071048."
    • A "spec2d" search would also miss things like a footnote containing: "keck.hawaii.edu/inst/deimos/pipeline.html"
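A minimal sketch of alias-based searching, assuming each package maps to a list of search strings. The function name and the alias table are hypothetical; only the "spec2d" name, the URL, and the grant acknowledgement are taken from the example above, and treating the grant number as an alias is an assumption made for illustration.

```python
import re

# Hypothetical alias table: the strings are taken from the spec2d example,
# but this is not the study's actual alias list.
ALIASES = {
    "spec2d": [
        "spec2d",
        "keck.hawaii.edu/inst/deimos/pipeline.html",
        "NSF grant AST-0071048",
    ],
}

def find_mentions(text, aliases=ALIASES):
    """Return {package: [aliases found in text]} using case-insensitive matches."""
    found = {}
    for package, strings in aliases.items():
        hits = []
        for alias in strings:
            # The lookarounds stop an alias from matching inside a longer word.
            pattern = r"(?<!\w)" + re.escape(alias) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits.append(alias)
        if hits:
            found[package] = hits
    return found

acknowledgement = (
    "The analysis pipeline used to reduce the DEIMOS data was developed "
    "at UC Berkeley with support from NSF grant AST-0071048."
)
print(find_mentions(acknowledgement))
```

Searching for the package name alone would return nothing for this sentence; the match comes only from the acknowledgement alias.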


We identified 410 aliases for our 9 software packages.

Limitations

Confounding and ambiguous aliases could not always be identified and removed from our results

  • Example: Stingray
    • Needed to weed out results that were mentions of the Stingray Nebula, stingray-shaped objects, actual stingrays (i.e., animals), and the Corvette.
    • "Stingray" is also a name given to multiple instruments and was returned as part of an email domain.
  • Example: "DEIMOS" was a possible alias for spec2d
    • We decided to exclude this alias because Deimos is also the name of the outer Martian moon.
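One way to weed out confounding matches is to reject sentences where the alias appears in a known non-software context. The sketch below is illustrative only: the confounder categories come from the Stingray example, but the patterns and the function name are hypothetical, and a filter like this will still miss cases, which is the limitation described above.

```python
import re

# Hypothetical context patterns for the "Stingray" alias (illustrative, not exhaustive).
CONFOUNDERS = [
    r"stingray\s+nebula",      # the Stingray Nebula
    r"stingray[- ]shaped",     # stingray-shaped objects
    r"corvette",               # the car
    r"@[\w.-]*stingray",       # alias inside an email address
]

def is_probable_software_mention(sentence, alias="stingray"):
    """True if the alias occurs as a whole word and no confounding context is present."""
    alias_pattern = r"(?<!\w)" + re.escape(alias) + r"(?!\w)"
    if not re.search(alias_pattern, sentence, re.IGNORECASE):
        return False
    return not any(re.search(p, sentence, re.IGNORECASE) for p in CONFOUNDERS)

print(is_probable_software_mention("We computed power spectra with Stingray."))
print(is_probable_software_mention("The Stingray Nebula brightened rapidly."))
```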

Analyzing AAS XML

True citations have bibliographic entries

(machine actionable)

Authors use other mechanisms to try to give credit, but these mechanisms are not indexed by platforms like ADS or Google Scholar.

All software packages were mentioned in some way within a year of first release.

Our ADS API search showed that this was true independent of publisher.

The ADS API is not designed for this purpose, so our search likely missed about 40% of the software mentions.
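A hedged sketch of what a full-text alias query against the ADS search API might look like. It only builds the request URL and sends nothing. The `full:` field and the 1998-2018 year range are assumptions chosen to match the time frame above, not necessarily the query used in the study, and the function name is hypothetical.

```python
from urllib.parse import urlencode

ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"

def build_ads_query(alias, start_year=1998, end_year=2018, rows=200):
    """Build the URL for a full-text ADS search for one alias (no request is sent)."""
    params = {
        # Quoting the alias asks for an exact phrase match in the full text.
        "q": f'full:"{alias}" year:{start_year}-{end_year}',
        "fl": "bibcode,title,year",
        "rows": rows,
    }
    return f"{ADS_SEARCH_URL}?{urlencode(params)}"

# Sending the request requires an "Authorization: Bearer <your ADS token>" header.
print(build_ads_query("spec2d"))
```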

All mentioned software packages had multiple aliases.

In total we found 109 aliases.

All "identifiers" mentioned in the literature were associated with journal articles.

 

None of the identifiers found resolved directly to specific software releases.

Many software mentions do not have bibliographic entries

(they are not machine actionable)

Note: bibliographic entries were often (but not always) nested in acknowledgement tags

343 papers included software mentions but did not give any form of recognizable credit beyond mentioning the software.

Example:

"Combining our isochrone data from the Dartmouth Stellar Evolution Database with our most updated star parameters using astropy.io, we created a diagram to illustrate log g versus Teff of our binary stars, as seen in Figure 10."

(Aleo et al. 2017)
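The difference between a mention and a true citation can be sketched as a check on where an alias appears in an article's XML. The tag names below (`body`, `ack`, `ref`) are assumptions borrowed from common journal markup, not the actual AAS XML schema, and the sample article and function name are placeholders.

```python
import xml.etree.ElementTree as ET

def classify_mentions(xml_text, alias):
    """Return 'cited', 'mentioned only', or 'absent' for one alias in one article."""
    root = ET.fromstring(xml_text)
    needle = alias.lower()

    def contains(elem):
        # Join all nested text so aliases inside child tags are also seen.
        return needle in "".join(elem.itertext()).lower()

    # iter() walks the whole tree, so a <ref> nested inside <ack> is still found.
    in_refs = any(contains(ref) for ref in root.iter("ref"))
    in_text = any(contains(e) for tag in ("body", "ack") for e in root.iter(tag))

    if in_refs:
        return "cited"            # has a bibliographic entry (machine actionable)
    if in_text:
        return "mentioned only"   # not visible to citation indexes
    return "absent"

# Placeholder article: the structure and the reference entry are hypothetical.
sample_xml = (
    "<article>"
    "<body><p>Using astropy.io, we created a diagram.</p></body>"
    "<ack><p>We thank the referee.</p></ack>"
    "<ref-list><ref id='r1'>Placeholder reference entry</ref></ref-list>"
    "</article>"
)
print(classify_mentions(sample_xml, "astropy"))
```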

Relying on full-text searching and preferred citations results in software authors losing credit.

We need standards.

We also need to address "preferred citation" practices.

Often people don't follow these instructions.

Registry records often contain complicated or conflicting instructions.

Multiple preferred citations are confusing.

Ineffective software identification and ambiguous citation practices have been pervasive in the past, and addressing those practices will require changes in normative behaviors throughout the scholarly communication ecosystem.

Software Authors

  • Mint a DataCite software DOI (e.g., Zenodo)
  • Create a citation file (CFF or CodeMeta)
  • Update and check your metadata
  • Ensure your preferred citations/any instructions about attribution enable direct software citation
  • If you have many versions of software, decide who the authors are for the "concept" of the software
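As an illustration of the citation file step, the sketch below writes a minimal CITATION.cff using field names from the Citation File Format 1.2.0 schema. The helper name, package name, author, and DOI are placeholders, and the string handling is deliberately simple (it assumes values contain no double quotes).

```python
def make_citation_cff(title, authors, version, doi, date_released):
    """Return minimal CITATION.cff text; authors is a list of (family, given) tuples."""
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it using the metadata below."',
        f'title: "{title}"',
        "type: software",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    lines += [
        f'version: "{version}"',
        f"doi: {doi}",                      # the DOI minted for this release
        f"date-released: {date_released}",  # YYYY-MM-DD
    ]
    return "\n".join(lines) + "\n"

# Placeholder values only: not a real package, author, or DOI.
print(make_citation_cff(
    "examplepkg", [("Doe", "Jane")], "1.0.0",
    "10.5281/zenodo.0000000", "2020-01-01",
))
```

The author list here belongs to one release; the "concept" record for the software as a whole may need a different author list, which is the decision raised in the last bullet.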

Article Authors

  • Do your best at direct software citation
    • Ideally, cite a persistent identifier if one is available
  • Consider the version that you are citing
    • Who are you trying to give credit?
  • Put software citations in the references section
  • Cite your own code in a software paper
    • Tells others how you want it cited
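When software has a DOI, a reference entry can often be retrieved through DOI content negotiation rather than typed by hand. The sketch below only builds the request and sends nothing. The DOI is a placeholder and the function name is hypothetical; whether a given DOI returns BibTeX depends on its registration agency.

```python
import urllib.request

def bibtex_request_for_doi(doi):
    """Build (but do not send) a DOI content-negotiation request asking for BibTeX."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/x-bibtex"},
    )

req = bibtex_request_for_doi("10.5281/zenodo.0000000")  # placeholder DOI, not a real record
print(req.full_url, req.get_header("Accept"))
# To fetch the entry: urllib.request.urlopen(req).read().decode("utf-8")
```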

Publishers

  • Make a software citation policy
    • Provide examples 
  • Make expectations clear as to how much editorial review will be dedicated to checking software citations 
    • People assume you will fix software citations
  • If you accept software papers, recommend that authors create metadata files and mint a DOI
    • Provide examples of these

Presented online at APS-DPP 2020 http://meetings.aps.org/Meeting/DPP20/Session/JM10.2
