Welcome

Please take a name tag and add your pronouns

WiFi - https://wifi.harvard.edu/

Mission


Move the needle on the most persistent software citation challenges.

A bit of background

Citing software is fundamentally more challenging than citing a paper.
 

Relevant information is often not obvious, norms are still being established, and there are few tools to support you.

 

You usually don't have to ask things like this about a paper:

  • What is the title?

  • What label should I use to identify the version?

  • Who should be identified as authors?

Software Citation Principles (2016)


Software is important.

 

Therefore software citations must:

  • enable normative, legal credit and attribution for authors
  • uniquely and persistently identify software 
  • enable access to software and its associated metadata.

  

Archived software with a persistent identifier is the most citable software.

Software needs an identifier to be citable.

Archives mint identifiers.

But even archived software can still be difficult to cite.

FORCE11 

Software Citation Implementation Working Group

 

2018/2019 - compiled a white paper outlining
challenges to software citation implementation

 

https://arxiv.org/abs/1905.08674

Many stakeholders

 

  • Disciplinary communities
  • Publishers
  • Repositories
  • Indexers
  • Funders
  • Institutions

2019

We came up with the idea of hosting a workshop on how software citation challenges could be addressed.

 

The IMLS provided us funding to host the workshop in June 2020.

What can repositories, libraries, archives, museums, and institutions do to enable software citation?

Focusing generally on issues impacting the availability and format of software citation metadata

 

  

Develop mutually supporting, actionable ideas to be shared widely.

2020

2021

2022

Mission


Move the needle on the most persistent software citation challenges.

Online Focus Groups

  • Understand how people view the problem of software citation from various vantage points
  • Identify barriers to implementing software citation metadata standards.
     

 

In-Person Workshop

  • Brainstorm interventions 
  • Prioritize problems and lay out mutually supporting approaches to address them 
  • Synthesize ideas and other deliverables to be shared widely

 

A two-part approach

 

We know we will at least synthesize a white paper summarizing what we discuss during this workshop.

 

Report Leaders will be tasked with helping write the white paper, disseminating it for community feedback, and incorporating community feedback into a final draft.

 

If you volunteer to be a Report Leader you will receive a $750 honorarium (looking for at least three volunteers).

Start with introductions

 

Who are you?

Where are you coming from?

Why were you willing to be here?

 

2 truths + 1 lie

Themes from the online focus groups


(things that came up repeatedly and
can serve as inspiration for brainstorming)

Most preservation platforms were designed for data and papers, not software

 

Shoehorning software into systems meant for different digital objects makes enabling software citation difficult.

 

Workflows, staff practices, metadata fields, and
tools were all developed for data.

Awareness of software citation metadata standards
(CFF and CodeMeta) is low.

 

Adoption is limited.

Two standards with different strengths can cause confusion.

 

Any citation file is good, but two standards can make things less straightforward.
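
To make the "two standards" point concrete, here is a minimal sketch that renders the same software record as a CITATION.cff file (CFF, YAML) and as a codemeta.json file (CodeMeta, JSON-LD). The record, its values, and the helper names `to_cff` and `to_codemeta` are assumptions made up for illustration; only a handful of fields from each standard are shown, and the YAML is assembled by hand without escaping special characters.

```python
import json

# Hypothetical record; every value is a placeholder for illustration only.
record = {
    "title": "Example Tool",
    "version": "1.0.0",
    "date_released": "2022-01-01",
    "authors": [{"given": "Ada", "family": "Example"}],
}


def to_cff(rec):
    """Render a minimal CITATION.cff document (YAML) as a string."""
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{rec["title"]}"',
        f'version: "{rec["version"]}"',
        f'date-released: {rec["date_released"]}',
        "authors:",
    ]
    for author in rec["authors"]:
        lines.append(f'  - family-names: "{author["family"]}"')
        lines.append(f'    given-names: "{author["given"]}"')
    return "\n".join(lines) + "\n"


def to_codemeta(rec):
    """Render a minimal codemeta.json document (JSON-LD) as a string."""
    doc = {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": rec["title"],
        "version": rec["version"],
        "datePublished": rec["date_released"],
        "author": [
            {
                "@type": "Person",
                "givenName": author["given"],
                "familyName": author["family"],
            }
            for author in rec["authors"]
        ],
    }
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    print(to_cff(record))
    print(to_codemeta(record))
```

The same four facts (title, version, release date, authors) appear under different key names in each format, which is one reason maintaining both by hand can feel less than straightforward.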

There are very few immediate incentives to encourage software authors to generate and curate software citation metadata.

 

Mostly we're talking about long term benefits.

Automatically generated software citation metadata is very messy.

 

We need tools that make citations easier to generate
AND we need people to clean up the metadata from one
version to the next.
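
One way tooling could support that human cleanup is by flagging which fields deserve a look when a new version is released. The sketch below is an assumption for illustration, not an existing tool: the function name `fields_needing_review`, the field names, and the example records are all hypothetical.

```python
# Hypothetical field names; a real repository would use its own schema.
REQUIRED_FIELDS = ("title", "version", "date_released", "authors")


def fields_needing_review(previous, current, required=REQUIRED_FIELDS):
    """Map each field a person should check to the reason it was flagged."""
    flagged = {}
    # Flag required fields that are absent or empty in the new record.
    for field in required:
        if not current.get(field):
            flagged[field] = "missing"
    # A new release would normally carry a new version label and release date.
    for field in ("version", "date_released"):
        if field not in flagged and current.get(field) == previous.get(field):
            flagged[field] = "unchanged since previous release"
    return flagged


if __name__ == "__main__":
    # Placeholder records for two consecutive releases.
    old = {
        "title": "Example Tool",
        "version": "1.0.0",
        "date_released": "2022-01-01",
        "authors": ["Ada Example"],
    }
    new = {
        "title": "Example Tool",
        "version": "1.1.0",
        "date_released": "2022-01-01",
        "authors": [],
    }
    for field, reason in fields_needing_review(old, new).items():
        print(f"{field}: {reason}")
```

A check like this only narrows down where to look; deciding what the correct value should be is still the curator's job.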

Tooling can be buggy.

 

e.g., Zenodo/GitHub workflow - if it doesn't work on the first try, people won't try it again.

There is already software in archives with missing metadata.

 

How do we enable software citation when the software
and the metadata are no longer maintained?

 

It is not clear when to create separate records for software and its related digital objects.

 

When is it better to create multiple identifiers?

 

How do you get people to create multiple archival deposits?

Who does the work?

 

Metadata curation needs to continue as long as the
software is actively developed.

 

In-Person
Workshop

 

What's next?

Issues Round Table

Open discussion to define the questions we'll be brainstorming about.

 

 

1 hour

Lunch!

Lightning Talks

 

~10 mins per person

Brainstorming

Universal principles

After a break for lunch and lightning talks...

The more the better

  • Quantity of ideas is more important than quality
  • Analysis comes later

No criticism

  • Creativity and criticism don't mix
  • Criticism is for later

Follow the steps

  • Goal-oriented and time-bound activity can be productive
  • Uncontrolled bursts of creativity tend to fail
  • Structured activities are designed to tap creative potential you haven't used yet in a specific context

Bring in new people

  • Not everyone here has been deeply focused on software citation for a long time - that's a good thing!
  • Different perspectives from different stakeholders result in new questions and interesting ideas

"Disney" method

 

  • It is questionable whether it has anything to do with Walt Disney
  • Brainstorming method based on separation and cooperation of three roles
    • Dreamer
    • Realist
    • Critic

"Dreamer"

  • Free to imagine and suggest anything
  • Generates a lot of ideas
  • Doesn't think about constraints

"Realist"

  • Thinks practically
  • Finds ways to put ideas into practice

"Critic"

  • Analyses risk
  • Finds weak points
  • Wants to prevent failure

How do we make archived software more citable?

Make sure all repositories have metadata fields for software version and release date

When software is deposited with a paper or a dataset, make sure there's a citation file for just the software

Dreamer
generate ideas

20 minutes

Make sure all repositories have metadata fields for software version and release date

When software is deposited with a paper or a dataset, make sure there's a citation file for just the software

Realist
How can we make these things happen?
Who can support these ideas?

Survey archives
-how many have these fields already?
-what would be needed to add them?

Outreach campaign to get institutions to add CFF file generation to deposit instructions

Add CFF file generation to existing resources about open source development

20 minutes

How do we make software in repositories more citable?

Make sure all repositories have metadata fields for software version and release date

When software is deposited with a paper or a dataset, make sure there's a citation file for just the software

Critic

Survey archives
-how many have these fields already?
-what would be needed to add them?

Outreach campaign to get institutions to add CFF file generation to deposit instructions

Add CFF file generation to existing resources about open source development

Low response rate

Who would act on the results?

Outreach to whom?

Good survey design is hard

What form would the outreach take?

Might not get traction - small portion of OSS is for research

20 minutes

Don't need a citation file if you have good metadata for the record

How do we make software in repositories more citable?

Make sure all repositories have metadata fields for software version and release date

When software is deposited with a paper or a dataset, make sure there's a citation file for just the software

Vote

find the top two

Survey archives
-how many have these fields already?
-what would be needed to add them?

Outreach campaign to get institutions to add CFF file generation to deposit instructions

Add CFF file generation to existing resources about open source development

Low response rate

Who would act on the results?

Outreach to whom?

Good survey design is hard

What form would the outreach take?

Might not get traction - small portion of OSS is for research

5 minutes

Don't need a citation file if you have good metadata for the record

Break

Report out from each group

 

 

~30-45 mins

 

 

Tomorrow we'll work on refining the top ideas,
identifying barriers, and road mapping

Issues Round Table

Open discussion to define the questions we'll be brainstorming about.

 

 

1.5 hours

  • Most preservation platforms were designed for data, not software
     
  • Awareness of software citation metadata standards is low
     
  • Two standards with different strengths lead to confusion
     
  • There are few immediate incentives to encourage software authors to generate and curate metadata
     
  • It is not clear when to create separate records for software and its related digital objects
     
  • Automatically generated metadata is very messy
     
  • Tooling can be buggy
     
  • There is already software in archives with missing metadata

Lunch!

 

12:00 – 1:30pm

Lightning Talks

 

~10 mins per person

 

 

1:30 – 2:00pm

20 minutes for each step

Big Idea

(Dreamer)

Actionable
idea

(Realist)

Criticism

(Critic)

Vote

Software Citation Workshop 

 

Day 2

Starbursting

 

focuses on generating questions rather than answers

Who?

What?

Where?

When?

Why?

How?

Idea: Survey repositories about availability of metadata fields

spend at least 10 minutes on each

Who?

What?

Where?

When?

Why?

How?

Idea: Survey repositories about availability of metadata fields

Who are we trying to contact?

Who will develop the survey?

Who will review the survey?

Who will distribute the survey?

Who?

What?

Where?

When?

Why?

How?

Idea: Survey repositories about availability of metadata fields

What are we trying to find out?

What will we do with the results?

What will respondents need to know to answer the survey?

What resources do we need?

Who?

What?

Where?

When?

Why?

How?

Idea: Survey repositories about availability of metadata fields

How do we obtain the resources we need?

...

...

...

Gap filling

 

Try to answer the questions you generated

 

1 hour

Report back

 

What are the questions we can't answer?

 

What resources are needed for each idea?

 

How do you think we could obtain those resources?

Lunch!

Lightning Talks

 

~10 mins per person

Road mapping

 

30 minutes

What ideas can be done now?

What ideas can be done in a month?

What ideas can be done in six months?

What ideas can be done in a year?

What ideas will take more than a year?

Now

1 month

6 months

1 year

> 1 year

Define Deliverables and Outcomes

 

 

What would the impact of each idea be?

 

30 mins

Now

1 month

6 months

1 year

> 1 year

Outcomes and deliverables

Break

 

15 mins

Describe next steps

 

What are the next logical steps for each idea?

 

1 hour

Now

1 month

6 months

1 year

> 1 year

Next steps

Report Back

 

Combine road maps

 

30 mins

Now

1 month

6 months

1 year

> 1 year

Refractor Tour

Last Day

Working Session

 

Discuss white paper summarizing the workshop

(report leaders work together)

Dissemination planning

Moving ideas forward

 

2-3 hours

Software Citation Workshop 2022

By Daina Bouquin

Slides associated with a multiday IMLS-funded workshop focused on advancing software citation implementation.