Software Citation in Astronomy
Daina Bouquin
Harvard-Smithsonian Center for Astrophysics
daina.bouquin@cfa.harvard.edu
Harvard University
Smithsonian Institution
Some things that I work on:
The relationships between signifiers
and what they stand for in reality.
How we understand what something means.
Vocabulary of a person, language, or branch of knowledge.
(contains the signifiers)
Copernicus, N. (1543). Nicolai Copernici Torinensis De revolutionibus orbium cœlestium libri vi. Norimbergae: Apud Ioh. Petreium.
Galilei, G. (1610). Osservazioni e calcoli relativi ai Pianeti Medicei.
Galileo (67 years later)
Threatened with torture
Imprisoned for life
Burned his books
(Largely seen as the birth of observational astronomy and the scientific method)
(It was easy to dismiss)
Meaning is collective agreement about a specific thing at a specific time.
Humphrey, S.D. (1849). Multiple Exposures of the Moon: Nine Exposures, daguerreotype. http://id.lib.harvard.edu/images/olvwork124646/catalog
Sometimes it's more about privilege.
Earliest image of the moon extant.
earliest surviving image
Now it's art.
means context
Daguerreotype "Recipe book"
Matters because of its relationship to astro daguerreotypes.
Provenance guides prioritization for curation.
Curation is work.
Mechanisms for modeling relationships between the information gathered from provenancial sources.
The creators of these objects did not need to care about the historic meaning of their work.
Provenance could be determined, so we gave these things meaning and prioritized them for curation.
We know what to call these things and
we know how to take care of them.
These items are part of your astronomical heritage.
Knowledge is more than books and articles.
I can describe this thing but give it little meaning.
Cultural norms prevent me from throwing this away.
(I would feel bad)
A paper could provide some provenance.
Our record should definitely have a field
where we can identify a relevant paper.
Remember though:
Who didn't?
Is the author of the paper identical to
the creator of this thing?
Who gets credit?
We need to be able to directly identify the object to distinguish between the object and our sources of provenance.
What does this have to do with citations?
give credit to Samuel Dwight Humphrey
(the photographer)
help someone else find the daguerreotype
(it's a physical thing)
expand the object's semantic network
(allow its meaning to change)
I would not cite
Humphrey's handbook
Not the thing I want to cite
Wrong year
Humphrey gets credit, but
not for his daguerreotype of the moon
Humphrey, S. D. (1858). American Hand Book of the Daguerreotype. (5th ed.)
Humphrey, S.D. (1849). Multiple Exposures of the Moon: Nine Exposures, daguerreotype. http://id.lib.harvard.edu/images/olvwork124646/catalog
Software was not valued the way that papers and data are
(and it still is not)
but people wanted to give
software credit and software authors wanted credit, so they hacked the system.
(and that's never going to happen)
Remember why you wouldn't do this for the daguerreotype?
authorship ambiguity
Different software versions have different authors. How many papers would the software authors need to write?
makes locating software more difficult over time
Links break (if they exist at all)
can put open source documentation behind paywalls
Remember privilege? Not all software papers are OA
Software citations are made indistinguishable
from citations made for other purposes
(acknowledgements)
Machines can't find these types of "citations"
(humans can just read them)
Titles are ambiguous
(software also has many "aliases")
ADS search for software called "Stingray" returns papers about:
just means it's in a place right now
What about pointing to the software's location?
(the repo)
This page doesn't exist there anymore.
URL
Uniform Resource Locator
Locations change.
Provenance changes.
Metadata changes.
Meaning changes.
https://github.com/dfm/corner.py
was
Changes over time.
The meaning you are trying to express now will be different from what will be located at this URL later.
This is not what you cite because
it is fragile and has no unambiguous meaning.
https://github.com/dfm/triangle.py
This page has a URL: https://zenodo.org/record/53155
This page is an interface where metadata is displayed.
The metadata is stored
with the identifier (DOI).
The URL is just another piece of metadata.
Unambiguous way to point at a specific thing in a specific place at a specific time.
(DOI, URI, Bibcode, arXiv ID, etc.)
Where the thing you are pointing at is at a specific time.
(URL)
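The distinction between an identifier and a location can be sketched as a toy registry. This is only an illustration, not how any real resolver is implemented: the identifier "10.0000/example.1" and both example.org URLs are placeholders invented for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Metadata stored with an identifier; the URL is just one field."""
    title: str
    url: str


# Hypothetical in-memory registry keyed by a placeholder identifier.
registry = {
    "10.0000/example.1": Record(
        title="example-tool v1.0.0",
        url="https://old-host.example.org/example-tool/v1",
    )
}


def resolve(identifier: str) -> str:
    """Return the location currently recorded for an identifier."""
    return registry[identifier].url


cited = "10.0000/example.1"  # this is what goes in the reference list
before = resolve(cited)

# The archive moves the object: only the metadata is updated,
# the identifier that was cited stays exactly the same.
registry[cited].url = "https://new-host.example.org/example-tool/v1"
after = resolve(cited)

print(before)
print(after)
```

A citation that recorded only the first URL would now be broken, while a citation that recorded the identifier still leads to the object.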
Software will be the foundation on which future generations must build new knowledge.
Machine-actionable metadata about your software.
https://codemeta.github.io/index.html
Creating a CodeMeta file gives your software provenance so when you deposit your software in an archive, that archive understands how to take care of and understand your software.
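As a rough idea of what such a file contains, here is a minimal sketch that builds one with the Python standard library. The property names are CodeMeta 2.0 terms, but every value (package name, version, author, repository URL, license) is a placeholder for a hypothetical package, and a real file would usually carry more fields.

```python
import json

# All values below are placeholders for a hypothetical package.
codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-tool",
    "version": "1.0.0",
    "description": "Placeholder description of what the software does.",
    "codeRepository": "https://example.org/example-tool",
    "license": "https://spdx.org/licenses/MIT",
    "author": [
        {"@type": "Person", "givenName": "Ada", "familyName": "Example"}
    ],
}

# Serialize to the JSON text that would be saved as codemeta.json.
text = json.dumps(codemeta, indent=2)
print(text)
```

Saving the printed text as codemeta.json at the top level of the repository is the usual convention; the CodeMeta generator linked below produces the same kind of file from a web form.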
Lets us translate our lexicon from one schema to another.
Enables interoperability and further contextualization.
Identifiers can be mapped to other identifiers.
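Translating between schemas can be pictured as a lookup table of field names. The sketch below is deliberately simplified: the target field names are invented for illustration and this is not the official CodeMeta crosswalk table, and the record values are placeholders.

```python
# Illustrative crosswalk: CodeMeta term -> field in a hypothetical
# target schema. Not the official CodeMeta crosswalk.
CROSSWALK = {
    "name": "title",
    "version": "version",
    "description": "abstract",
    "codeRepository": "related_url",
    "identifier": "doi",
}


def translate(record: dict, crosswalk: dict) -> dict:
    """Rename the keys of `record` that the crosswalk knows about."""
    return {crosswalk[k]: v for k, v in record.items() if k in crosswalk}


source = {
    "name": "example-tool",
    "version": "1.0.0",
    "identifier": "10.0000/example.1",
    "programmingLanguage": "Python",  # no mapping, so it is dropped
}

target = translate(source, CROSSWALK)
print(target)
```

Fields with no entry in the table are lost in translation, which is one reason a shared vocabulary with broad coverage matters.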
https://doi.org/10.3847/1538-4365/ab7be6
https://codemeta.github.io/codemeta-generator/
http://doi.org/10.5281/zenodo.1327325
https://guides.github.com/activities/citable-code/
http://doi.org/10.5281/zenodo.3479199
https://www.astrobetter.com/blog/2019/07/01/citing-astronomy-software-inline-text-examples/
We have a complete history of nothing.
Some things get a legacy and some things don't.
Your work matters.