Credit Lost
Software Citation in Astronomy
Daina Bouquin
Harvard-Smithsonian Center for Astrophysics
Software Citation Principles
Software is important to science.
Therefore software citations must:
- enable normative and legal credit and attribution for software authors
- uniquely and persistently identify software
- enable access to software and its associated metadata.
not just astronomy...
What makes this hard?
What if I wanted to cite something unusual?
Humphrey, S.D. (1849). Multiple Exposures of the Moon: Nine Exposures, daguerreotype.
I would not cite Humphrey's handbook
Not the thing I want to cite
Wrong year
Humphrey gets credit, but not for the physical daguerreotype of the moon
Humphrey, S. D. (1858). American Hand Book of the Daguerreotype. (5th ed.)
Humphrey, S.D. (1849). Multiple Exposures of the Moon: Nine Exposures, daguerreotype. http://id.lib.harvard.edu/images/olvwork124646/catalog
Identifiers for software are new,
but astronomers have been "citing" software for decades.
This page doesn't exist there anymore.
URL
Uniform Resource Locator
Citing Something Else
(e.g., "software papers," registry records)
create authorship ambiguity
Different software versions have different authors. How many papers would the software authors need to write?
makes locating software more difficult over time
Links break (if they exist at all)
can put open source documentation behind paywalls
Not all software papers are OA
Software citations become indistinguishable from citations made for other purposes
So we developed a case study.
Different "types" of software packages developed in whole or in part at the CfA
- Likely to be cited
- Cover long year range
- AAS XML (1998-2018)
- ADS API search (same time frame)
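The slides do not give the exact queries used in the case study. As a rough sketch only, a full-text ADS search for one alias over the same time frame could be assembled as below. The function name, the returned fields, and the row limit are assumptions made for illustration; the sketch only builds the request URL and does not send it, since a real request also needs an ADS API token passed in an "Authorization: Bearer" header.

```python
from urllib.parse import urlencode

ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"

def build_ads_query(alias, start_year=1998, end_year=2018, rows=100):
    """Build the URL for a full-text ADS search for one alias in a year range."""
    params = {
        # full: searches article full text; year: restricts the date range.
        "q": f'full:"{alias}" year:{start_year}-{end_year}',
        # Fields to return for each matching record (an illustrative choice).
        "fl": "bibcode,title,year",
        "rows": rows,
    }
    return f"{ADS_SEARCH_URL}?{urlencode(params)}"

print(build_ads_query("spec2d"))
```

Running one such query per alias, rather than per package name, is what lets the search pick up mentions that never use the software's name.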
If we want astronomers to create and cite identifiers we need to understand and respond to existing norms.
Software Aliases
Search strings that could have been used by article authors mentioning software in their papers.
- Example: spec2d
- searching for "spec2d" would miss papers that used the software authors' preferred citation method:
"The analysis pipeline used to reduce the DEIMOS data was developed at UC Berkeley with support from NSF grant AST-0071048."
- A "spec2d" search would also miss things like a footnote containing: "keck.hawaii.edu/inst/deimos/pipeline.html"
We identified 410 aliases for our 9 software packages.
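The spec2d example can be sketched in a few lines of Python. This is only an illustration of the idea of searching by alias list, not the tooling used in the case study; the function name and the way the aliases are phrased are assumptions, and the three strings are taken from the example above.

```python
import re

# Illustrative alias list for one package, drawn from the spec2d example.
SPEC2D_ALIASES = [
    "spec2d",
    "analysis pipeline used to reduce the DEIMOS data",
    "keck.hawaii.edu/inst/deimos/pipeline.html",
]

def find_aliases(text, aliases):
    """Return the aliases that occur in text, ignoring case and line wrapping."""
    # Collapse whitespace so an alias broken across lines still matches.
    normalized = re.sub(r"\s+", " ", text).lower()
    return [alias for alias in aliases if alias.lower() in normalized]

acknowledgment = (
    "The analysis pipeline used to reduce the DEIMOS data was developed at\n"
    "UC Berkeley with support from NSF grant AST-0071048."
)

# Searching for the name alone misses the mention; the alias list finds it.
print(find_aliases(acknowledgment, ["spec2d"]))
print(find_aliases(acknowledgment, SPEC2D_ALIASES))
```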
"Preferred citations"
Always more than one preferred citation.
Often people don't follow these instructions.
Limitations
Confounding and ambiguous aliases could not always be identified and removed from our results
- Example: Stingray
- Needed to weed out results that were mentions of the Stingray Nebula, stingray-shaped objects, actual stingrays (i.e., animals), and the Corvette.
- "Stingray" is also a name given to multiple instruments and was returned as part of an email domain.
- Example: "DEIMOS" was a possible alias for spec2d
- Decided to exclude this alias because it is also the name of the outermost Martian moon.
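One way to weed out some of the Stingray confounders is to blank out known confounding phrases before looking for the alias. The patterns below are hypothetical and cover only the cases that can be spotted from the surrounding words; mentions of instruments or animals named "stingray" would still need manual review, which is why this remains a limitation.

```python
import re

# Hypothetical exclusion patterns based on the confounders listed above.
CONFOUNDERS = [
    re.compile(r"stingray\s+nebula", re.IGNORECASE),
    re.compile(r"stingray-shaped", re.IGNORECASE),
    re.compile(r"corvette\s+stingray", re.IGNORECASE),
    re.compile(r"@[\w.-]*stingray[\w.-]*", re.IGNORECASE),  # email domain
]
ALIAS = re.compile(r"\bstingray\b", re.IGNORECASE)

def is_candidate_mention(sentence):
    """True if the alias still appears after known confounders are removed."""
    remaining = sentence
    for pattern in CONFOUNDERS:
        # Replace each confounding phrase with a space.
        remaining = pattern.sub(" ", remaining)
    return ALIAS.search(remaining) is not None

print(is_candidate_mention("We used Stingray to compute power spectra."))
print(is_candidate_mention("The Stingray Nebula has faded since 1996."))
```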
All software packages were mentioned
in some way within a year of first release.
Drop-off is due to incomplete data in the final year
Our ADS API search showed that this was true independent of publisher
Drop-off is due to incomplete data in the final year
The ADS API is not designed for this purpose, so our search likely missed about 40% of the software mentions
All mentioned software packages had multiple aliases.
In total we found 109 aliases
Hundreds of software mentions
did not have bibliographic entries
(they are not machine actionable - footnotes, acknowledgments, etc.)
343 papers included software mentions, but
did not give any form of credit beyond mentioning it
Relying on full-text search and preferred citations results in
software authors losing credit.
What can you do?
- Mint a DataCite software DOI by archiving a copy of your code (e.g., Zenodo)
- Create a citation file (CFF or CodeMeta)
- Make sure your preferred citations/instructions about attribution enable direct software citation
- Pay close attention to publishers' software citation policies, and encourage publishers to adopt these policies if they haven't already
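As a minimal sketch of the citation-file step, the helper below writes out the text of a small CITATION.cff file using keys from the Citation File Format 1.2.0 schema. The helper name is hypothetical, and the package name, version, DOI, date, and author shown are placeholders, not a real record; values containing double quotes would need escaping, which this sketch does not handle.

```python
def build_citation_cff(title, version, doi, date_released, authors):
    """Return the text of a minimal CITATION.cff file (CFF 1.2.0 keys)."""
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        f'doi: "{doi}"',
        f'date-released: "{date_released}"',
        "authors:",
    ]
    # Each author is a (family name, given name) pair.
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    return "\n".join(lines) + "\n"

# Placeholder values only; substitute the DOI minted when the code is archived.
print(build_citation_cff(
    title="examplepipeline",
    version="1.0.0",
    doi="10.5281/zenodo.0000000",
    date_released="2022-07-25",
    authors=[("Doe", "Jane")],
))
```

Saving this text as CITATION.cff at the top level of the repository gives article authors a direct software citation to copy instead of a paper or a URL.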
Advocate for publishers to:
- Develop (and enforce) a software citation policy that follows the software citation principles
- Give article authors examples
- Make expectations clear as to how much editorial review will be dedicated to checking software citations
- Article authors believe publishers will catch mistakes
A case study - Credit Lost
https://doi.org/10.3847/1538-4365/ab7be6
CodeMeta file generator
https://codemeta.github.io/codemeta-generator/
SSI Guidance for Archiving Software
http://doi.org/10.5281/zenodo.1327325
Archiving software using Zenodo/GitHub
https://guides.github.com/activities/citable-code/
Software Citation Checklist
http://doi.org/10.5281/zenodo.3479199
In-text software citation examples
https://www.astrobetter.com/blog/2019/07/01/citing-astronomy-software-inline-text-examples/
Presented to the HEACIT meeting on July 25, 2022.