Credit Lost
Daina Bouquin
Daniel Chivvis
Center for Astrophysics | Harvard & Smithsonian
The History of Software Citation in Astronomy
Software Citation Principles
- Software should be considered important to science.
- Software citations should enable normative and legal credit and attribution to be given to software authors.
- Software citations need to uniquely and persistently identify software.
- Software citations need to enable access to the software itself and its associated metadata.
not just astronomy
What makes this hard?
Persistent Identifiers
Identification
Unambiguous way to point at a specific thing in a specific place at a specific time.
Location
Where the thing you are pointing at is at a specific time.
Identifiers for software are new.
But astronomers have been "citing" software for decades.
Software papers
- Create ambiguity around software authorship
  - authorship lists are static, but software is dynamic with authorship lists that change from one version of code to the next
- Make locating software more difficult over time
  - links from papers to their associated code bases break if they exist at all
- Citations to software papers for the purpose of software citation are indistinguishable from citations to those papers for other purposes
- Can put important information about open source software behind paywalls
- Arbitrarily create more work for software authors
Registry Records
(e.g., ASCL)
- List links to websites and documents associated with the software
- Authors may not determine how their software is represented
- Records can be created for software without software authors' input
- Presents challenges for software curation
- No process for reconciling registry records with archival records
The practice of citing software registry records could become a recommended practice for citing software that otherwise has no clear or direct way to identify it.
We developed a case study to look at past behaviors over the last two decades.
Different "types" of software packages developed in whole or in part at the CfA
- Likely to be cited
- Cover a long range of years
- AAS XML (1998-2018)
- ADS API search (same time frame)
Need to understand past behaviors and motivations to change norms.
Software Aliases
Search strings that could have been used by article authors mentioning software in their papers. We made distinctions between identifiers and non-identifiers for our study.
- Example: spec2d
  - searching for "spec2d" would miss papers that used the software authors' preferred citation method: "The analysis pipeline used to reduce the DEIMOS data was developed at UC Berkeley with support from NSF grant AST-0071048."
  - A "spec2d" search would also miss things like a footnote containing: "keck.hawaii.edu/inst/deimos/pipeline.html"
We identified 410 aliases for our 9 software packages.
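A minimal sketch of how an alias list might be applied to article full text. This is an illustration, not the study's actual pipeline: the `ALIASES` table and the `find_mentions` helper are hypothetical, and the only alias strings used are the ones that appear in the spec2d example above.

```python
import re

# Hypothetical alias table. The strings come from the spec2d example;
# the table structure itself is an assumption made for illustration.
ALIASES = {
    "spec2d": [
        "spec2d",
        "keck.hawaii.edu/inst/deimos/pipeline.html",
        "AST-0071048",
    ],
}

def find_mentions(text, aliases=ALIASES):
    """Return {package: [aliases found]} for every alias present in text."""
    hits = {}
    for package, strings in aliases.items():
        for alias in strings:
            # Case-insensitive match that refuses to match inside a longer word.
            pattern = r"(?<!\w)" + re.escape(alias) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits.setdefault(package, []).append(alias)
    return hits

quote = (
    "The analysis pipeline used to reduce the DEIMOS data was developed "
    "at UC Berkeley with support from NSF grant AST-0071048."
)
print(find_mentions(quote))
```

Searching only for the package name would return nothing for this quote; the grant number is what links the sentence back to spec2d.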
Limitations
Confounding and ambiguous aliases could not always be identified and removed from our results
- Example: Stingray
- Needed to weed out results that mentioned the Stingray Nebula, stingray-shaped objects, actual stingrays (i.e., animals), and the Corvette.
- "Stingray" is also a name given to multiple instruments and was returned as part of an email domain.
- Example: "DEIMOS" was a possible alias for spec2d
- Decided to exclude this alias because it is also the name of the outermost Martian moon.
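One crude way to screen out confounding matches is to reject sentences that contain known off-topic context terms. The sketch below is an assumption about how such a filter could look, not the method used in the study; the `CONFOUNDING_CONTEXT` terms and the function name are hypothetical, and a real list would need manual curation.

```python
# Hypothetical context terms that suggest a match is not the software.
CONFOUNDING_CONTEXT = {
    "stingray": ["nebula", "corvette", "@stingray"],
}

def is_probable_software_mention(sentence, alias, confounders=CONFOUNDING_CONTEXT):
    """Return True if the alias appears and no confounding term does."""
    lowered = sentence.lower()
    key = alias.lower()
    if key not in lowered:
        return False
    # Any confounding term in the same sentence disqualifies the match.
    return not any(term in lowered for term in confounders.get(key, []))

print(is_probable_software_mention("We used Stingray to compute power spectra.", "Stingray"))
print(is_probable_software_mention("The Stingray Nebula has faded rapidly.", "Stingray"))
```

A keyword filter like this cannot catch every case (for example, an instrument that is also named Stingray), which is consistent with the limitation noted above.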
Analyzing AAS XML
True citations have bibliographic entries (machine actionable).
Authors use other mechanisms to try to give credit, but these mechanisms are not indexed by platforms like ADS or Google Scholar.
All software packages were mentioned in some way within a year of first release.
Our ADS API search showed that this was true independent of publisher
The ADS API is not designed for this purpose, so our search likely missed about 40% of the software mentions
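For illustration, a full-text alias search against the ADS search endpoint might be assembled as below. This is a sketch under assumptions: the returned fields, the row limit, and the helper name are illustrative choices, the request itself is not sent, and a real call would also need an API token in an `Authorization: Bearer` header.

```python
from urllib.parse import urlencode

ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"

def build_ads_query(alias, start_year=1998, end_year=2018, rows=100):
    """Build (but do not send) a full-text search URL for one alias."""
    # full:"..." searches article full text; year:A-B restricts the range.
    query = f'full:"{alias}" year:{start_year}-{end_year}'
    params = {"q": query, "fl": "bibcode,title,pub", "rows": rows}
    return f"{ADS_SEARCH_URL}?{urlencode(params)}"

print(build_ads_query("spec2d"))
```

Because full-text coverage in an index varies by publisher and year, a query like this finds only the mentions the index happens to contain.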
All mentioned software packages had multiple aliases.
In total we found 109 aliases.
All "identifiers" mentioned in the literature were associated with journal articles.
None of the identifiers found resolved directly to specific software releases.
Many software mentions do not have bibliographic entries (they are not machine actionable).
Note: bibliographic entries were often (but not always) nested in acknowledgement tags
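The distinction between a true citation and a bare mention can be sketched as a check on where an alias appears in an article's XML. The tag names below (`ref`, `ack`, `body`) are assumptions modeled loosely on JATS-style markup; the actual AAS XML schema is not described here, and `classify_mention` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def classify_mention(xml_text, alias):
    """Return 'citation', 'mention_only', or 'absent' for one alias."""
    root = ET.fromstring(xml_text)
    needle = alias.lower()

    def contains(elem):
        # Join all text nested under the element and search it.
        return needle in "".join(elem.itertext()).lower()

    # iter("ref") finds bibliographic entries at any depth, including
    # entries nested inside acknowledgement tags.
    if any(contains(ref) for ref in root.iter("ref")):
        return "citation"
    if contains(root):
        return "mention_only"
    return "absent"

sample = (
    "<article><body><p>We used astropy.io for tables.</p></body>"
    "<back><ref-list><ref>Smith 2010</ref></ref-list></back></article>"
)
print(classify_mention(sample, "astropy"))
```

Only the first category produces something a citation index can count; the second is visible to a human reader but not to the index.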
343 papers included software mentions but did not give any form of recognizable credit beyond the mention itself.
Example:
"Combining our isochrone data from the Dartmouth Stellar Evolution Database with our most updated star parameters using astropy.io, we created a diagram to illustrate log g versus Teff of our binary stars, as seen in Figure 10."
(Aleo et al. 2017)
Relying on full-text searching and preferred citations results in software authors losing credit.
We need standards.
We also need to address "preferred citation" practices.
Often people don't follow these instructions.
Registry records often contain complicated or conflicting instructions
Multiple preferred citations are confusing.
Ineffective software identification and ambiguous citation practices have been pervasive in the past, and addressing those practices will require changes in normative behaviors throughout the scholarly communication ecosystem.
Software Authors
- Mint a DataCite software DOI (e.g., Zenodo)
- Create a citation file (CFF or CodeMeta)
- Update and check your metadata
- Ensure your preferred citations/any instructions about attribution enable direct software citation
- If you have many versions of software, decide who the authors are for the "concept" of the software
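As a rough illustration of the metadata step, the sketch below assembles a minimal CodeMeta record as JSON. The package name, version, authors, and DOI are placeholders, not real records, and `build_codemeta` is a hypothetical helper; a real `codemeta.json` would usually carry more fields.

```python
import json

def build_codemeta(name, version, doi, authors):
    """Assemble a minimal CodeMeta 2.0 record as a Python dict."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        # A version-specific DOI lets a citation point at one release.
        "identifier": f"https://doi.org/{doi}",
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }

# Placeholder values for illustration only.
record = build_codemeta(
    name="example-pipeline",
    version="1.2.0",
    doi="10.5281/zenodo.0000000",
    authors=[("Ada", "Example"), ("Grace", "Placeholder")],
)
print(json.dumps(record, indent=2))
```

Keeping the author list in a file that is versioned alongside the code lets it change from release to release, which a static paper author list cannot do.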
Article Authors
- Do your best at direct software citation
- Ideally, cite a persistent identifier if one is available
- Consider the version that you are citing
- Who are you trying to give credit?
- Put software citations in the references section
- Cite your own code in a software paper
  - tells others how you want it cited
Publishers
- Make a software citation policy
- Provide examples
- Make expectations clear as to how much editorial review will be dedicated to checking software citations
- People assume you will fix software citations
- If you accept software papers, recommend that authors create metadata files and mint a DOI
- Provide examples of these
Presented online at APS-DPP 2020 http://meetings.aps.org/Meeting/DPP20/Session/JM10.2