Software Citation Implementation in Astronomy

An Update

Daina Bouquin

Daniel Chivvis

Center for Astrophysics | Harvard & Smithsonian

because you want credit for your work

because you care about science

I don't want you to have to think about citation.

I want you to have a scientific legacy.

Your work will be the foundation on which future generations must build an improved understanding of how the Universe works.

Your work will be their heritage.

If we can't point at your work and say you did it, we can't do anything else.

Code is speech

FORCE11:

Software Citation Implementation Working Group

not just astronomy

updated implementation guidance

(different/better ways to cite different things)

old guidance is a starting point for understanding its limits and what new work is needed

  • Publish the software:
    • If the software is on GitHub, follow steps in https://guides.github.com/activities/citable-code/
    • If the software is not on GitHub, submit it to Zenodo directly, or to figshare, or a suitable domain repository, with appropriate metadata (including authors, title, citations, and dependencies)
  • Get a DOI

  • Create a CITATION file and update your README to tell people how to cite

OLD Guidance

  • Check for a CITATION file or README that contains citation information; if such a file says how to cite the software itself, do that

  • If there is no CITATION file or specifications in the README, do your best to follow the principles

    • If the software developers declare who the authors are, list them; otherwise, just name the project as the authors

    • Include a method for identification that is machine actionable, globally unique, interoperable (e.g. URL to a release, a company product number)

    • If there’s a landing page that includes metadata, point to that, not directly to the software (e.g. if software is on GitHub and in Zenodo, point to the Zenodo version and specifically to the landing page)

    • Include specific version/release information
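The checklist above can be sketched as a small helper. This is only an illustration of the fallback logic, not an official FORCE11 format: the function name, the output layout, and the example values are all assumptions.

```python
def format_software_citation(name, version, identifier,
                             authors=None, landing_page=None):
    """Return a plain-text reference for one software release (illustrative layout)."""
    # List declared authors; otherwise name the project as the author.
    author_text = ", ".join(authors) if authors else name
    # Prefer a landing page with metadata over a direct link to the software.
    target = landing_page if landing_page else identifier
    # Always include the specific version or release.
    return f"{author_text}. {name} (version {version}). {target}"


# Hypothetical example values, not a real package or URL.
print(format_software_citation(
    name="ExampleTool",
    version="1.2.0",
    identifier="https://example.org/exampletool/releases/1.2.0",
    landing_page="https://example.org/records/exampletool-1.2.0",
))
```

When no authors are declared, the project name stands in for them, which mirrors the "just name the project as the authors" rule.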

OLD Cont.

software metadata specifications:

Moving Forward

Citation File Format

human- and machine-readable file format that provides citation metadata for software.
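A minimal sketch of what such a file can contain, written as a Python helper that renders a `CITATION.cff`. The key names (`cff-version`, `message`, `title`, `version`, `doi`, `date-released`, `authors`, `family-names`, `given-names`, `orcid`) follow the Citation File Format 1.2.0 schema as I understand it; the helper name, the input dictionary keys, and every value shown are placeholders. It does not escape quotes inside values, so treat it as a starting point only.

```python
def render_citation_cff(title, version, doi, date_released, authors):
    """Render a small CITATION.cff document as text (no YAML library needed)."""
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        f'doi: "{doi}"',
        f"date-released: {date_released}",  # expected as YYYY-MM-DD
        "authors:",
    ]
    for author in authors:
        lines.append(f'  - family-names: "{author["family"]}"')
        lines.append(f'    given-names: "{author["given"]}"')
        if author.get("orcid"):
            lines.append(f'    orcid: "{author["orcid"]}"')
    return "\n".join(lines) + "\n"


# Placeholder metadata, not a real record.
print(render_citation_cff(
    title="ExampleTool",
    version="1.2.0",
    doi="10.0000/example",
    date_released="2020-01-01",
    authors=[{"family": "Doe", "given": "Jane"}],
))
```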

Comprehensive Metadata for Software

  • credit for academic software
    • citation metadata
  • replicate some analysis
    • versions and dependencies
  • discover software you don’t already know
    • keywords and descriptions

The FORCE11 Software Citation Implementation WG update is not meant to answer every question or solve every problem.

"Communities should build further documents... more specific for their communities and use cases."

Definitions

Software is:

  • available
    • open or closed source
  • archived
    • preserved long term by someone other than the software author
  • stewarded
    • "actively" preserved
  • identified
    • has an identifier that is tied to the software separate from the software's location
  • indexed
    • the identifier is one of a list maintained by some indexing service
  • “published” = the software has been permanently archived and given a resolvable identifier (e.g. by Zenodo, figshare, institutional archival repositories)
  • “unpublished” = the software is made available by a hosting organization that does not commit to long term preservation (e.g. GitHub, personal website)

Code that is publicly available online as open source is unpublished unless a specific version of the software has been deposited and made available via an organization that archives and provides identifiers for software deposits.
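One way to read the published/unpublished distinction above is as a simple test on a specific release. The class and field names below are hypothetical, chosen only to mirror the definitions; this is a sketch, not a formal rule from the working group.

```python
from dataclasses import dataclass


@dataclass
class SoftwareRelease:
    name: str
    archived: bool    # preserved long term by someone other than the author
    identified: bool  # has a resolvable identifier separate from its location


def publication_status(release):
    """Classify a release using the working definitions above."""
    if release.archived and release.identified:
        return "published"
    return "unpublished"


# Open source code on a hosting site alone does not count as published.
github_only = SoftwareRelease("ExampleTool", archived=False, identified=False)
deposited = SoftwareRelease("ExampleTool 1.2.0", archived=True, identified=True)
print(publication_status(github_only))
print(publication_status(deposited))
```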

"Published Software"

"published software"

means

"archived software"

oh god

Best thing to cite

Software Types

Case Study

(slightly better than anecdotes)

  • recommend comprehensive reviewer and editor guidelines/inform policy development among publishers
  • develop training and other resources for article authors and software developers to improve software citation practices
  • inform the prioritization and development of tools that could be used to support software citation

Different "types" of software packages developed in whole or in part at the CfA

Likely to be cited

Cover a long range of years

AAS XML (1998-2018)

ADS API search (forthcoming)

Methods

  • Reference = XML tag is a citation tag, or the string associated with the alias contains a BibTeX record indicator
  • Acknowledgement = XML tag is an acknowledgement tag
  • Footnote = XML tag is a footnote tag, or the string associated with the alias contains a known footnote indicator

Define "aliases" for the software packages.

Find out where aliases are found in articles.
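The alias search can be sketched roughly as follows. This is not the code used in the case study: the tag names (`citation`, `ack`, `fn`) are hypothetical stand-ins for whatever the AAS XML actually uses, and the string-based BibTeX and footnote indicators are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from XML tag name to mention category.
CATEGORY_BY_TAG = {
    "citation": "reference",
    "ack": "acknowledgement",
    "fn": "footnote",
}


def classify_mentions(xml_text, aliases):
    """Return the sorted categories in which any alias appears."""
    root = ET.fromstring(xml_text)
    lowered = [alias.lower() for alias in aliases]
    categories = set()
    for element in root.iter():
        category = CATEGORY_BY_TAG.get(element.tag)
        if category is None:
            continue  # only tagged regions count toward a category
        text = "".join(element.itertext()).lower()
        if any(alias in text for alias in lowered):
            categories.add(category)
    return sorted(categories)


# Made-up article fragment for illustration.
SAMPLE = (
    "<article><p>We used ExampleTool.</p>"
    "<fn>ExampleTool website</fn>"
    "<ack>We thank the referee.</ack>"
    "<citation>ExampleTool Collaboration 2018</citation></article>"
)
print(classify_mentions(SAMPLE, ["ExampleTool", "example-tool"]))
```

Matching is case-insensitive substring search, so short or ambiguous aliases will produce false positives, which is the kind of confound a real analysis has to handle separately.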

XML is terrible, but we have preliminary findings and have identified general issues

(Work still needed on footnotes and DS9 confounds)

How many articles contained some kind of "alias"?

AstroBlend: 2

AstroPy: 1168

RADMC-3D: 613

Spec2d: 610

Stingray: 9

TARDIS: 19

WCS Tools: 191

References?

AstroBlend: 0%

AstroPy: 86%

RADMC-3D: 68%

Spec2d: 71%

Stingray: 55%

TARDIS: 84%

WCS Tools: 54%

Nothing?

AstroPy: 2%

RADMC-3D: 4%

Spec2d: 6%

TARDIS: 10.5%

WCS Tools: 19%

How often were identifiers used to give credit?

AstroBlend: 0%

AstroPy: 76%

RADMC-3D: 2.7%

Spec2d: 45%

Stingray: 0%

TARDIS: 21%

WCS Tools: 19.8%

How often was a Zenodo DOI used?

0%

Why?

People have been trying to give credit from the start

(interactive plots)

Authors are specifically requesting people cite something other than the code even when a Zenodo DOI for the code exists.

These things should be cited in addition to the code, rather than as stand-ins for the code

A Zenodo DOI for the software doesn't guarantee a native software citation

ASCL Records are cited instead and often contain complicated or conflicting instructions

Sometimes the ASCL record really is the best answer to the question of what to cite, but people don't understand when to use it on its own.

Spec2d

Metadata is scattered and not formatted uniformly

  • You control your metadata.

  • You are your own cataloger.

software authors

  • You need to cite software correctly.

  • No one else will catch mistakes.

  • You are your own copy editor.

article authors

  • You need policies that can be enforced.

  • You need to provide examples.

Publishers

Things you can do

right now

Software Authors

  • Mint a DataCite software DOI
  • Create a CFF file
  • License your data and code explicitly
  • Update and check your metadata
    • Check it again
  • Link documentation to the source code directly
  • Ensure your preferred citations/any instructions about attribution enable native software citation
  • If you have many versions of software, decide who the authors are for the "concept" of the software*
  • Try out the new AAS/JOSS code review process

* get a freaking ORCID iD
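For the "check your metadata, then check it again" step, a quick self-check of a CFF file might look like the sketch below. It assumes the four top-level keys listed are the required ones in CFF 1.2.0 (my understanding of the schema, worth verifying against the current specification), and it only scans top-level keys rather than parsing YAML properly. The function name is hypothetical.

```python
# Assumed required top-level keys in Citation File Format 1.2.0.
REQUIRED_KEYS = ("cff-version", "message", "title", "authors")


def missing_cff_keys(cff_text):
    """Return required top-level keys that do not appear in the CFF text."""
    present = set()
    for line in cff_text.splitlines():
        # Top-level keys start in column 0; skip blanks, comments, list items.
        if not line or line[0] in " \t#-":
            continue
        key, separator, _ = line.partition(":")
        if separator:
            present.add(key.strip())
    return [key for key in REQUIRED_KEYS if key not in present]


# Deliberately incomplete example file.
draft = "cff-version: 1.2.0\ntitle: ExampleTool\n"
print(missing_cff_keys(draft))
```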

Article Authors

  • Look for preferred citations
    • Look everywhere
  • If you cannot find a preferred citation, follow the FORCE11 guidance and make sure you're doing your best at native software citation
  • Consider the version that you are citing
    • Who are you trying to give credit?
  • Follow publisher policies; if there isn't one, follow the FORCE11 principles
  • Put software citations in the references section
  • Cite your own code in a software paper
    • tells others how you want it cited

Publishers

  • Make a software citation policy
  • Provide examples
    • What to do
    • What not to do
  • Make expectations clear as to how much editorial review will be dedicated to checking software citations 
    • Everyone assumes you will fix it
  • If you accept software papers, recommend that authors create metadata files and mint a DOI
    • Provide examples of these

Thank You

ADS/AAS

Asclepias Update
