Credit Lost

Software Citation in Astronomy

Daina Bouquin

Harvard-Smithsonian Center for Astrophysics

Software Citation Principles


Software is important to science.

 

Therefore software citations must:

  • enable normative and legal credit and attribution for software authors
  • uniquely and persistently identify software
  • enable access to software and its associated metadata.

  

not just astronomy...

What makes this hard?

What if I wanted to cite something unusual?

Humphrey, S.D. (1849). Multiple Exposures of the Moon: Nine Exposures, daguerreotype.

I would not cite

Humphrey's handbook

Not the thing I want to cite

 

Wrong year

 

Humphrey gets credit, but not for the physical daguerreotype of the moon

Humphrey, S. D. (1858). American Hand Book of the Daguerreotype. (5th ed.)

Humphrey, S.D. (1849). Multiple Exposures of the Moon: Nine Exposures, daguerreotype. http://id.lib.harvard.edu/images/olvwork124646/catalog

Identifiers for software are new,

but astronomers have been "citing" software for decades.

This page doesn't exist there anymore.

URL
Uniform Resource Locator
 

Citing Something Else

(e.g., "software papers," registry records)


create authorship ambiguity

Different software versions have different authors. How many papers would the software authors need to write?

 

makes locating software more difficult over time

Links break (if they exist at all)

 

can put open source documentation behind paywalls

Not all software papers are open access (OA)

 

Software citations are made indistinguishable from citations made for other purposes

So we developed a case study

 

Different "types" of software packages developed in whole or in part at the CfA

  • Likely to be cited

  • Cover a long range of years

  • AAS XML (1998-2018)

  • ADS API search (same time frame) 
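
As a rough illustration of what a full-text search over the same time frame might look like, the sketch below only builds a query URL and does not send it. The endpoint, the `full:` field, and the `fl` field names are assumptions based on the public ADS API documentation, not a record of the queries used in this study. The function name `build_ads_query` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: endpoint and field names follow the public ADS API docs;
# verify them against the current documentation before use.
ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_ads_query(alias, first_year=1998, last_year=2018, rows=100):
    """Build a full-text search URL for one alias over a year range."""
    params = {
        # full: searches the article body as well as title and abstract.
        "q": f'full:"{alias}" year:{first_year}-{last_year}',
        # Fields to return for each matching paper.
        "fl": "bibcode,title,year,pub",
        "rows": rows,
    }
    return f"{ADS_SEARCH_URL}?{urlencode(params)}"


print(build_ads_query("spec2d"))
```

Sending the request would also need an API token in an `Authorization: Bearer` header, and one query would be needed per alias.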

If we want astronomers to create and cite identifiers we need to understand and respond to existing norms.

Software Aliases

Search strings that could have been used by article authors mentioning software in their papers.
 

  • Example: spec2d
    • searching for "spec2d" would miss papers that used the software authors' preferred citation method:
      "The analysis pipeline used to reduce the DEIMOS data was developed at UC Berkeley with support from NSF grant AST-0071048."
       
    • A "spec2d" search would also miss things like a footnote containing: "keck.hawaii.edu/inst/deimos/pipeline.html"
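
A minimal sketch of alias-based matching, assuming a hand-built alias table. The alias strings come from the spec2d example above. The `ALIASES` structure and the `find_mentions` function are hypothetical and are not the tooling used in the study.

```python
import re

# Hypothetical alias table: the strings come from the spec2d example,
# but the structure and names here are illustrative only.
ALIASES = {
    "spec2d": [
        "spec2d",
        "keck.hawaii.edu/inst/deimos/pipeline.html",
        "analysis pipeline used to reduce the DEIMOS data",
    ],
}


def find_mentions(text, aliases=ALIASES):
    """Return {package: [aliases found]} for every package mentioned in text."""
    # Collapse whitespace so aliases broken across lines still match.
    normalized = re.sub(r"\s+", " ", text).lower()
    found = {}
    for package, strings in aliases.items():
        hits = [s for s in strings if s.lower() in normalized]
        if hits:
            found[package] = hits
    return found


footnote = "Data were reduced with the pipeline at keck.hawaii.edu/inst/deimos/pipeline.html"
print(find_mentions(footnote))
```

A search for the name "spec2d" alone would return nothing for this footnote. The alias table still attributes the mention to the package.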


We identified 410 aliases for our 9 software packages.

"Preferred citations"

Always more than one preferred citation.

Often people don't follow these instructions.

Limitations

Confounding and ambiguous aliases could not always be identified and removed from our results

  • Example: Stingray
    • Needed to weed out results that were mentions of the Stingray Nebula, stingray-shaped objects, actual stingrays (i.e., animals), and the Corvette.
    • "Stingray" is also a name given to multiple instruments and was returned as part of an email domain.
       
  • Example: "DEIMOS" was a possible alias for spec2d
    • Decided to exclude this alias because it is also the name of the outermost Martian moon.
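
One way to weed out some of these confounders automatically is to blank out known false-positive phrases before looking for the alias. The sketch below assumes a small, hand-written exclusion list. The patterns and the function name are hypothetical, and a real list would have to come from reading the false positives.

```python
import re

# Hypothetical exclusion patterns for the "Stingray" alias.
CONFOUNDERS = [
    re.compile(r"stingray\s+nebula", re.IGNORECASE),
    re.compile(r"corvette\s+stingray", re.IGNORECASE),
    re.compile(r"stingray[- ]shaped", re.IGNORECASE),
    re.compile(r"@[\w.-]*stingray", re.IGNORECASE),  # email domains
]
ALIAS = re.compile(r"stingray", re.IGNORECASE)


def is_plausible_software_mention(sentence):
    """True if "stingray" still appears once known confounders are removed."""
    remaining = sentence
    for pattern in CONFOUNDERS:
        # Blank out each confounding phrase, then test what is left.
        remaining = pattern.sub(" ", remaining)
    return ALIAS.search(remaining) is not None


print(is_plausible_software_mention("Timing analysis was done with Stingray."))
print(is_plausible_software_mention("The Stingray Nebula is a young planetary nebula."))
```

Patterns like these cannot separate the software from an instrument or an animal with the same name, so some manual review remains.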

All software packages were mentioned
in some way within a year of first release.

Drop-off is due to incomplete data in the final year

Our ADS API search showed that this was true independent of publisher

Drop-off is due to incomplete data in the final year

The ADS API is not designed for this purpose, so our search likely missed about 40% of the software mentions

All mentioned software packages had multiple aliases.

In total we found 109 aliases

 

Hundreds of software mentions

did not have bibliographic entries

(they appear in footnotes, acknowledgments, etc., so they are not machine-actionable)

 

343 papers included software mentions, but
did not give any form of credit beyond the mention itself

Relying on full-text search and preferred citations results in
software authors losing credit.

What can you do?

 

  • Mint a DataCite software DOI by archiving a copy of your code  (e.g., Zenodo)
     
  • Create a citation file (CFF or CodeMeta)
     
  • Make sure your preferred citations/instructions about attribution enable direct software citation
     
  • Pay close attention to publishers' software citation policies, and encourage publishers to adopt such policies if they haven't already
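
A citation file can be very small. The sketch below writes a minimal `CITATION.cff` using keys from the Citation File Format 1.2.0 schema. The helper name `make_citation_cff` is hypothetical, and every value passed to it (package name, author, DOI, date) is a placeholder, not a real record.

```python
import json


def make_citation_cff(title, authors, version, doi, date_released):
    """Return the text of a minimal CITATION.cff (CFF 1.2.0) file.

    authors is a list of (family_names, given_names) tuples.
    """
    q = json.dumps  # JSON strings are valid double-quoted YAML scalars
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {q(title)}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f"  - family-names: {q(family)}")
        lines.append(f"    given-names: {q(given)}")
    lines += [
        f"version: {q(version)}",
        f"doi: {q(doi)}",
        f"date-released: {q(date_released)}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values only: not a real package, author, or DOI.
print(make_citation_cff(
    title="examplepipeline",
    authors=[("Doe", "Jane")],
    version="1.0.0",
    doi="10.5281/zenodo.0000000",
    date_released="2019-01-01",
))
```

Saving the output as `CITATION.cff` at the top level of the repository puts the preferred citation, including the DOI, next to the code it describes.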

Advocate for publishers to:

 

  • Develop (and enforce) a software citation policy that follows the software citation principles
     
  • Give article authors examples
     
  • Make expectations clear as to how much editorial review will be dedicated to checking software citations
    • Article authors believe publishers will catch mistakes