Welcome
Please take a name tag and add your pronouns
WiFi - https://wifi.harvard.edu/
Mission
Move the needle on the most persistent software citation challenges.
A bit of background
Citing software is fundamentally more challenging than citing a paper.
Relevant information is often not obvious, norms are still being established, and there are few tools to support you.
You usually don't have to ask things like this about a paper:
- What is the title?
- What label should I use to identify the version?
- Who should be identified as authors?
Software Citation Principles (2016)
Software is important.
Therefore software citations must:
- enable normative, legal credit and attribution for authors
- uniquely and persistently identify software
- enable access to software and its associated metadata.
Archived software with a persistent identifier is the most citable software.
Software needs an identifier to be citable.
Archives mint identifiers.
But even archived software can still be difficult to cite.
FORCE11
Software Citation Implementation Working Group
2018/2019 - compiled a white paper outlining challenges to software citation implementation
Many stakeholders
- Disciplinary communities
- Publishers
- Repositories
- Indexers
- Funders
- Institutions
2019
we came up with the idea of hosting a workshop on how software citation challenges could be addressed.
The IMLS provided us funding to host the workshop in June 2020.
What can repositories, libraries, archives, museums, and institutions do to enable software citation?
Focusing generally on issues impacting the availability and format of software citation metadata
Develop mutually supporting, actionable ideas to be shared widely.
2020
2021
2022
Mission
Move the needle on the most persistent software citation challenges.
Online Focus Groups
- Understand how people view the problem of software citation from various vantage points
- Identify barriers to implementing software citation metadata standards.
In-Person Workshop
- Brainstorm interventions
- Prioritize problems and lay out mutually supporting approaches to address them
- Synthesize ideas and other deliverables to be shared widely
A two-part approach
We know we will at least synthesize a white paper summarizing what we discuss during this workshop.
Report Leaders will be tasked with helping write the white paper, disseminating it for community feedback, and incorporating community feedback into a final draft.
If you volunteer to be a Report Leader you will receive a $750 honorarium (looking for at least three volunteers).
Start with introductions
Who are you?
Where are you coming from?
Why were you willing to come?
2 truths + 1 lie
Themes from the online focus groups
(things that came up repeatedly and can serve as inspiration for brainstorming)
Most preservation platforms were designed for data and papers, not software
Shoehorning software into systems meant for different digital objects makes enabling software citation difficult.
Workflows, staff practices, metadata fields, and tools were all developed for data.
Awareness of software citation metadata standards (CFF and CodeMeta) is low.
Adoption is limited.
Two standards with different strengths can cause confusion.
Any citation file is good, but two standards can make things less straightforward.
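To make the contrast concrete, here is a minimal sketch (not part of the workshop materials) that writes the same hypothetical record in both standards, assuming Python with PyYAML; the project name, authors, date, and DOI are all placeholders.

```python
# Minimal sketch: one hypothetical software record expressed in both standards.
# All values below are placeholders, not a real project.
import json
import yaml  # PyYAML

software = {
    "title": "example-tool",
    "version": "1.2.0",
    "release_date": "2022-06-01",
    "doi": "10.5281/zenodo.0000000",  # placeholder DOI
    "authors": [{"given": "Ada", "family": "Lovelace"}],
}

# Citation File Format (CFF): a YAML file (CITATION.cff) kept in the source
# repository, aimed at telling people and tools how to cite the software.
cff = {
    "cff-version": "1.2.0",
    "message": "If you use this software, please cite it as below.",
    "title": software["title"],
    "version": software["version"],
    "date-released": software["release_date"],
    "doi": software["doi"],
    "authors": [
        {"given-names": a["given"], "family-names": a["family"]}
        for a in software["authors"]
    ],
}

# CodeMeta: JSON-LD built on schema.org terms, aimed at machine-readable
# metadata exchange between repositories, archives, and indexers.
codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": software["title"],
    "version": software["version"],
    "datePublished": software["release_date"],
    "identifier": software["doi"],
    "author": [
        {"@type": "Person", "givenName": a["given"], "familyName": a["family"]}
        for a in software["authors"]
    ],
}

with open("CITATION.cff", "w") as f:
    yaml.safe_dump(cff, f, sort_keys=False)
with open("codemeta.json", "w") as f:
    json.dump(codemeta, f, indent=2)
```

The overlap between the two records is large, which is part of why two standards with different strengths can confuse depositors.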
There are very few immediate incentives to encourage software authors to generate and curate software citation metadata.
Mostly we're talking about long-term benefits.
Automatically generated software citation metadata is very messy.
We need tools that make citations easier to generate
AND we need people to clean up the metadata from one version to the next.
Tooling can be buggy.
e.g., Zenodo/GitHub workflow - if it doesn't work on the first try, people won't try it again.
There is already software in archives with missing metadata.
How do we enable software citation when the software and the metadata are no longer maintained?
It is not clear when to create separate records for software and its related digital objects.
When is it better to create multiple identifiers?
How do you get people to create multiple archival deposits?
Who does the work?
Metadata curation needs to continue as long as the software is actively developed.
In-Person Workshop
What's next?
Issues Round Table
Open discussion to define the questions we'll be brainstorming about.
1 hour
Lunch!
Lightning Talks
~10 mins per person
Brainstorming
Universal principles
After a break for lunch and lightning talks...
The more the better
- Quantity of ideas is more important than quality
- Analysis comes later
No criticism
- Creativity and criticism don't mix
- Criticism is for later
Follow the steps
- Goal-oriented and time-bound activity can be productive
- Uncontrolled bursts of creativity tend to fail
- Structured activities are designed to tap creative potential you haven't used yet in a specific context
Bring in new people
- Not everyone here has been deeply focused on software citation for a long time - that's a good thing!
- Different perspectives from different stakeholders result in new questions and interesting ideas
"Disney" method
- Questionable whether it has anything to do with Walt Disney
- Brainstorming method based on separation and cooperation of three roles
- Dreamer
- Realist
- Critic
"Dreamer"
- Free to imagine and suggest anything
- Generates a lot of ideas
- Doesn't think about constraints
"Realist"
- Thinks practically
- Finds ways to put ideas into practice
"Critic"
- Analyses risk
- Finds weak points
- Wants to prevent failure
How do we make archived software more citable?
Make sure all repositories have metadata fields for software version and release date
When software is deposited with a paper or a dataset, make sure there's a citation file for just the software
Dreamer
generate ideas
20 minutes
Make sure all repositories have metadata fields for software version and release date
When software is deposited with a paper or a dataset, make sure there's a citation file for just the software
Realist
How can we make these things happen?
Who can support these ideas?
Survey archives
- how many have these fields already?
- what would be needed to add them?
Outreach campaign to get institutions to add CFF file generation to deposit instructions
Add CFF file generation to existing resources about open source development
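As a rough illustration of what "add CFF file generation to deposit instructions" could mean in practice, here is a hypothetical pre-deposit check, assuming a Python-based workflow with PyYAML; the function, key lists, and messages are illustrative, not an existing repository feature.

```python
# Hypothetical sketch of a pre-deposit check that deposit instructions could
# point authors to: verify a CITATION.cff file exists and has the core fields.
# Required/recommended key sets follow CFF 1.2.0; everything else is illustrative.
import sys
import yaml  # PyYAML

REQUIRED_CFF_KEYS = {"cff-version", "message", "title", "authors"}
RECOMMENDED_CFF_KEYS = {"version", "date-released", "doi"}

def check_citation_file(path="CITATION.cff"):
    """Report missing required and recommended CFF fields before deposit."""
    with open(path) as f:
        record = yaml.safe_load(f) or {}
    missing_required = REQUIRED_CFF_KEYS - record.keys()
    missing_recommended = RECOMMENDED_CFF_KEYS - record.keys()
    if missing_required:
        print(f"Missing required CFF fields: {sorted(missing_required)}")
        return False
    if missing_recommended:
        print(f"Consider adding recommended fields: {sorted(missing_recommended)}")
    return True

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "CITATION.cff"
    sys.exit(0 if check_citation_file(path) else 1)
```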
20 minutes
How do we make software in repositories more citable?
Make sure all repositories have metadata fields for software version and release date
When software is deposited with a paper or a dataset, make sure there's a citation file for just the software
Critic
Survey archives
- how many have these fields already?
- what would be needed to add them?
Outreach campaign to get institutions to add CFF file generation to deposit instructions
Add CFF file generation to existing resources about open source development
Low response rate
Who would act on the results?
Outreach to whom?
Good survey design is hard
What form would the outreach take?
Might not get traction - small portion of OSS is for research
20 minutes
Don't need a file if you have good metadata for the record
How do we make software in repositories more citable?
Make sure all repositories have metadata fields for software version and release date
When software is deposited with a paper or a dataset, make sure there's a citation file for just the software
Vote
find the top two
Survey archives
- how many have these fields already?
- what would be needed to add them?
Outreach campaign to get institutions to add CFF file generation to deposit instructions
Add CFF file generation to existing resources about open source development
Low response rate
Who would act on the results?
Outreach to whom?
Good survey design is hard
What form would the outreach take?
Might not get traction - small portion of OSS is for research
5 minutes
Don't need a file if you have good metadata for the record
Break
Report out from each group
~30-45 mins
Tomorrow we'll work on refining the top ideas,
identifying barriers, and road mapping
Issues Round Table
Open discussion to define the questions we'll be brainstorming about.
1.5 hours
- Most preservation platforms were designed for data, not software
- Awareness of software citation metadata standards is low
- Two standards with different strengths lead to confusion
- There are few immediate incentives to encourage software authors to generate and curate metadata
- It is not clear when to create separate records for software and its related digital objects
- Automatically generated metadata is very messy
- Tooling can be buggy
- There is already software in archives with missing metadata
Lunch!
12:00 – 1:30pm
Lightning Talks
~10 mins per person
1:30 – 2:00pm
20 minutes for each step
Big Idea
(Dreamer)
Actionable idea
(Realist)
Criticism
(Critic)
Vote
Software Citation Workshop
Day 2
Starbursting
focuses on generating questions rather than answers
Who?
What?
Where?
When?
Why?
How?
Idea: Survey repositories about availability of metadata fields
spend at least 10 minutes on each
Who?
What?
Where?
When?
Why?
How?
Idea: Survey repositories about availability of metadata fields
Who are we trying to contact?
Who will develop the survey?
Who will review the survey?
Who will distribute the survey?
Who?
What?
Where?
When?
Why?
How?
Idea: Survey repositories about availability of metadata fields
What are we trying to find out?
What will we do with the results?
What will respondents need to know to answer the survey?
What resources do we need?
Who?
What?
Where?
When?
Why?
How?
Idea: Survey repositories about availability of metadata fields
How do we obtain the resources we need?
...
...
...
Gap filling
Try to answer the questions you generated
1 hour
Report back
What are the questions we can't answer?
What resources are needed for each idea?
How do you think we could obtain those resources?
Lunch!
Lightning Talks
~10 mins per person
Road mapping
30 minutes
What ideas can be done now?
What ideas can be done in a month?
What ideas can be done in six months?
What ideas can be done in a year?
What ideas will take more than a year?
Now
1 month
6 months
1 year
> 1 year
Define Deliverables and Outcomes
What would the impact of each idea look like?
30 mins
Now
1 month
6 months
1 year
> 1 year
Outcomes and deliverables
Break
15 mins
Describe next steps
What are the next logical steps for each idea?
1 hour
Now
1 month
6 months
1 year
> 1 year
Next steps
Report Back
Combine road maps
30 mins
Now
1 month
6 months
1 year
> 1 year
Refractor Tour
Last Day
Working Session
Discuss white paper summarizing the workshop
(report leaders work together)
Dissemination planning
Moving ideas forward
2-3 hours
Software Citation Workshop 2022
By Daina Bouquin
Slides associated with a multiday IMLS-funded workshop focused on advancing software citation implementation.