Daina Bouquin
Head Librarian, Center for Astrophysics | Harvard & Smithsonian
daina.bouquin@cfa.harvard.edu
Your work will be the foundation on which the next generation must build an improved understanding of how the Universe works
Take the perspective of an institution of memory
An example:
Software is inseparable from "the data"
ML frameworks trade off exact numeric determinism for performance and often require remote computing resources
Even if you copy the development steps, there will be tiny differences in the end results
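One source of those tiny differences can be sketched in a few lines: floating-point addition is not associative, so the same numbers summed in a different order (as can happen when work is split across parallel hardware) need not give bit-identical results. This is only an illustration in plain Python, not a description of how any particular ML framework behaves; the helper `running_sum` and the random test values are assumptions made for the example.

```python
import random


def running_sum(values):
    """Add values one at a time so the accumulation order is explicit."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers, grouped two different ways.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # False

# Hypothetical data: the same values summed in two different orders,
# standing in for work that is scheduled differently from run to run.
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) for _ in range(100_000)]
shuffled = values[:]
rng.shuffle(shuffled)

diff = abs(running_sum(values) - running_sum(shuffled))
print(diff)  # typically tiny, and not guaranteed to be zero
```

The discrepancy is far below anything scientifically meaningful in a single sum, but it shows why "rerun the same steps" does not by itself guarantee identical end results.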
This is the future (emergent reality) of scientific research
Stabilizing and recovering data from digital media
Whole fields are being born and augmented in response
Best practices are still developing and will need to incorporate discipline-specific culture and values
Native data and software citation are vitally important, but what should be cited to properly give author(s) credit?
There is a proposed CITATION.cff file standard
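As a rough sketch of what such a file might contain, the snippet below assembles a minimal CITATION.cff as text. The keys used (`cff-version`, `message`, `title`, `version`, `date-released`, `authors` with `family-names` and `given-names`) are taken from the Citation File Format, but the schema version string and every value are placeholders chosen for illustration, and `build_citation_cff` is a hypothetical helper rather than part of any tool. Check the current specification before relying on it.

```python
def build_citation_cff(title, authors, version, date_released):
    """Return the text of a minimal CITATION.cff file.

    `authors` is a list of (family_name, given_name) tuples.
    All values passed in below are hypothetical placeholders.
    """
    lines = [
        "cff-version: 1.2.0",  # assumed schema version; confirm against the spec
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        f"date-released: {date_released}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    return "\n".join(lines) + "\n"


cff_text = build_citation_cff(
    title="example-spectra-tools",  # hypothetical package name
    authors=[("Doe", "Jane")],      # hypothetical author
    version="0.1.0",
    date_released="2018-06-13",
)
print(cff_text)
```

Saving the output as `CITATION.cff` at the top level of a repository gives readers a single place to find the authors' preferred citation, though it does not settle who counts as an author.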
The astronomy community doesn't agree on how much someone should contribute to a code before that person is considered an author.
Complicates authorship issues and issues pertaining to dependencies and documentation (metadata)
How do we deal with multiple forks?
How should citations be calculated across different types of digital objects and versions of those objects?
e.g. IDL, IRAF, MATLAB, etc.
Restrictive licenses and no ongoing support
We're still defining what "Fair Use" is in this landscape
Proceedings underway regarding filings with the US Copyright Office for Anti-Circumvention Exemption
NSF does not require software management in the same way it requires data management
NASA does not specify software requirements (yet) but explicitly requires "data management"
Code requirements "are governed by guidance at the directorate, division, and program levels"
Investigators are "encouraged to consult with the cognizant program officer"
Long-term persistence and development/improvement of metadata standards are essential
Reality hits Re: reproducibility
If the cost of replicability was 1x (or more) the cost of the original work...
How do we balance this cost vs. the lost opportunity of doing new research?
Advocacy and community culture/values need to be incorporated into goals
Needs:
Scalable institutional support
Consideration for long-term curatorial needs
Must be developed in collaboration with institutions of memory and stakeholders throughout the scientific "lifecycle"
A well-funded archive and a team of researchers help make all needed information artifacts accessible and usable. Many projects, though, do not have these resources, and many have even more data or more dynamic software.
Who takes responsibility for managing data and code over the long term?
NASA/CXC/PSU/L.Townsley et al
FAIR Software too
Findable
Accessible
Interoperable
Reusable
Software Preservation Network
+
Software Sustainability Institute
"...as much about getting consensus on the best practices and educating the community as it will be about the tools we come up with"
Learn to problem-solve in this landscape and advocate for resources and infrastructure to support your goals
(Libraries can and should help)
By Daina Bouquin
Special Invited Talk at the 15th International HITRAN (high-resolution transmission molecular absorption database) Conference, 13-15 June 2018. Abstract: http://hitran.org/media/hitran-conferences/hitran-15-2018/res/hitran_abs_book_2018.pdf#page=12