The REUSE Initiative
Andrea Janes, Roberto Confalonieri, Riccardo Felluga, Max Elia Schweigkofler
Free University of Bozen-Bolzano

Topics
This is how we will spend our/your time:
The REUSE Initiative
The study about REUSE adoption
Reuse-checker: a tool to check REUSE compliance
Conclusion
Introduction
Reusing existing software...
- ...is productive: one does not need to develop everything from scratch
- If it is Free Software you can even:
- evaluate the quality of the reused component
- assess how well maintained it is by evaluating the community behind it and looking at how often it is updated
- ...can become a nightmare because of the license!
Risk of license conflicts
- It is not easy for a developer to understand the legal consequences of reusing an existing Free Software component
- Possible reasons:
- Many licenses exist, e.g., the Software Package Data Exchange (SPDX) considers 348 licenses with 32 possible exceptions.
- Domain-specific language: the rights and obligations of each type of license are difficult to understand.
- Complexity: it is complex to understand all licenses in use in a project and how they can be combined.
- Uncertainty: the consequences of violating a license are unclear, e.g., whether a component has to be removed.
The REUSE Initiative
- Initiated by the Free Software Foundation Europe
- Aims to provide clear guidelines on how to publish Free Software so that license and copyright information is clearly defined and machine-readable.
- The desired effects:
- Make it easier to understand the license
- Allow the creation of tools that find reusable components with compatible licenses
- First release October 11/2017, second release 12/2017

The REUSE Initiative
- The initiative defines 3 main practices:
- Provide the exact text of each license used
- Include a copyright notice and license in each file
- Provide an inventory for included software
- It also defines how these practices have to be implemented (pay attention to the next 3 slides)
Provide the exact text of each license used
- One license for all files: provide a file called "LICENSE", "LICENCE", "COPYING", or "COPYRIGHT"
- The header of the file must be: "Valid-License-Identifier: ", followed by one of the 348 SPDX license identifiers
- Multiple licenses: place each license in a separate text file in a LICENSES folder and either
- name each license file using the SPDX identifier of the license it contains, or
- include a "Valid-License-Identifier" tag in the license file, or
- include a "Valid-License-Identifier" tag in a file separate from the license text, with the same file name but with a ".license" extension.
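As a rough illustration of the rules on this slide, the following Python sketch classifies how a repository provides its license texts. It is not the reuse-checker implementation: the function names, the 10-line header window, the choice to append ".license" to the full file name, and the `known_ids` parameter (a set of SPDX identifiers supplied by the caller) are all assumptions made for the example.

```python
from pathlib import Path

# File names the slide lists for the single-license case.
SINGLE_LICENSE_NAMES = ("LICENSE", "LICENCE", "COPYING", "COPYRIGHT")
TAG = "Valid-License-Identifier:"


def has_tag(path: Path, max_lines: int = 10) -> bool:
    """Return True if one of the first lines carries the tag.

    The 10-line window is an assumption; the slide only says "header".
    """
    try:
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return False  # missing or unreadable file
    return any(TAG in line for line in text.splitlines()[:max_lines])


def license_text_status(repo: Path, known_ids: set[str]) -> str:
    """Return "single", "folder", or "missing" for a repository root."""
    # Case 1: one license file with the tag in its header.
    for name in SINGLE_LICENSE_NAMES:
        candidate = repo / name
        if candidate.is_file() and has_tag(candidate):
            return "single"

    # Case 2: a LICENSES folder where every license text is identified
    # by its file name, by a tag inside it, or by a ".license" sidecar.
    folder = repo / "LICENSES"
    if folder.is_dir():
        texts = [p for p in folder.iterdir()
                 if p.is_file() and p.suffix != ".license"]
        if texts and all(
            p.name in known_ids
            or p.stem in known_ids
            or has_tag(p)
            or has_tag(p.with_name(p.name + ".license"))
            for p in texts
        ):
            return "folder"

    return "missing"
```

Calling `license_text_status(Path("path/to/repo"), {"MIT", "Apache-2.0"})` on a checkout returns one of the three labels; the identifier set here is only an example.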
Include a copyright notice and license in each file
- Include a copyright notice and license in the header of each file, using an SPDX identifier to refer to the license.
- Or: add the header in a separate file with the same name but with a “.license” extension.
- Or: list the copyright and license information for all files in a separate file using a “machine-readable debian/copyright” file* or using the SPDX specification to provide a bill of materials**, stored in a file called “LICENSE.spdx”.
- Or (for the copyright): rely on the underlying version control system, which should indicate the author in its metadata.
* https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
** https://spdx.org/specifications
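To make the first two options concrete, here is a minimal Python sketch that looks for a copyright notice and an `SPDX-License-Identifier:` tag in a file header and falls back to a ".license" sidecar file. The function names, the 20-line header window, and the regular expressions are assumptions for illustration, not the rules used in the study.

```python
import re
from pathlib import Path

# "SPDX-License-Identifier: <expression>" is the usual SPDX short form.
LICENSE_RE = re.compile(r"SPDX-License-Identifier:\s*(.+)")
# Very loose copyright pattern; real headers vary a lot more than this.
COPYRIGHT_RE = re.compile(r"Copyright\s+(?:\(c\)\s*|©\s*)?(.+)", re.IGNORECASE)


def parse_header(text: str, max_lines: int = 20) -> dict:
    """Extract license and copyright from the first lines of a text."""
    info = {"license": None, "copyright": None}
    for line in text.splitlines()[:max_lines]:
        if info["license"] is None:
            match = LICENSE_RE.search(line)
            if match:
                info["license"] = match.group(1).strip()
                continue
        if info["copyright"] is None:
            match = COPYRIGHT_RE.search(line)
            if match:
                info["copyright"] = match.group(1).strip()
    return info


def file_license_info(path: Path) -> dict:
    """Read the header of a file, then fill gaps from "<name>.license"."""
    info = parse_header(path.read_text(encoding="utf-8", errors="replace"))
    sidecar = path.with_name(path.name + ".license")
    if sidecar.is_file():
        extra = parse_header(
            sidecar.read_text(encoding="utf-8", errors="replace"))
        for key, value in extra.items():
            if info[key] is None:
                info[key] = value
    return info
```

The sketch does not strip comment terminators such as `*/`, and it ignores the Debian and LICENSE.spdx options, which would need their own parsers.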
Provide an inventory for included software
- Optionally, include a bill of materials, generated automatically, conforming to the SPDX specification and included in a file in the top level directory of your repository called LICENSE.spdx.
The study
- If we look at GitHub repositories, how well do they respect the REUSE initiative rules?

How we conducted it
- GitHub hosts 85 million repositories
- If we had written a tool that can scan 1 repo/second, it would have taken us 2.7 years (SFSCon 2022)
- We took a sample of 1000 repositories from all repositories that were released in the month before the study (from July to August 2017)
- 416,776 repos
- 282,232 files
- 70 repos did not exist anymore (deleted between sampling and actual download)
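The numbers on this slide can be re-derived with a few lines of Python. The variable and function names are ours, and a year is assumed to have 365.25 days.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_scan(total_repos: int, repos_per_second: float = 1.0) -> float:
    """Time needed to scan every repository at a constant rate, in years."""
    return total_repos / repos_per_second / SECONDS_PER_YEAR


total_repos = 85_000_000   # repositories hosted on GitHub (from the slide)
sample_size = 1000         # repositories sampled
deleted = 70               # gone between sampling and download
analysed = sample_size - deleted

print(f"full scan: {years_to_scan(total_repos):.1f} years")  # full scan: 2.7 years
print(f"repositories analysed: {analysed}")                  # repositories analysed: 930
```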
Results (1/2)
- Of 930 repositories:
- a single license file exists: 571 (61.4%)
- the SPDX license file exists: 0 (0%)
- the license folder exists: 2 (0.2%)
- the Debian license folder exists: 2 (0.2%)
- all used licenses are present: 568 (61.1%)
- the readme file exists: 822 (88.4%)
- the authors file exists: 20 (2.2%)

Results (2/2)
- Of 282,232 files:
- with copyright information: 89,328 (31.7%)
- with license found in file: 264,790 (93.8%)
- with license found in .license: 1 (0.0%)
- with license found in Debian format: 642 (0.2%)
- with license found in .spdx: 0 (0%)
- where an SPDX license expression exists: 14,661 (5.2%)
- with a valid SPDX license expression: 14,019 (5.0%)
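The difference between the last two rows is whether an expression that is present is also valid. A simplified validity check could look like the Python sketch below. It is an assumption-laden illustration, not the check used in the study: the license and exception sets are a tiny illustrative subset of the SPDX lists, and the grammar covers only identifiers, a trailing "+", parentheses, and the AND, OR and WITH operators.

```python
import re

# Illustrative subsets only; the real SPDX lists are much longer.
KNOWN_LICENSES = {"MIT", "Apache-2.0", "GPL-2.0", "GPL-3.0", "BSD-3-Clause"}
KNOWN_EXCEPTIONS = {"Classpath-exception-2.0"}
OPERATORS = {"AND", "OR", "WITH"}

TOKEN_RE = re.compile(r"\(|\)|[A-Za-z0-9.+-]+")


def is_valid_expression(expr: str) -> bool:
    """Check a license expression against the simplified grammar above."""
    tokens = TOKEN_RE.findall(expr)
    # Reject empty input and any character the tokenizer skipped.
    if not tokens or "".join(tokens) != re.sub(r"\s+", "", expr):
        return False

    depth = 0              # current parenthesis nesting
    expect_operand = True  # True when a license or "(" must come next
    previous = None
    for token in tokens:
        if token == "(":
            if not expect_operand:
                return False
            depth += 1
        elif token == ")":
            if expect_operand or depth == 0:
                return False
            depth -= 1
        elif token in OPERATORS:
            if expect_operand:
                return False
            expect_operand = True
        else:
            if not expect_operand:
                return False
            allowed = KNOWN_EXCEPTIONS if previous == "WITH" else KNOWN_LICENSES
            if token.rstrip("+") not in allowed:  # "+" means "or later"
                return False
            expect_operand = False
        previous = token
    return depth == 0 and not expect_operand
```

For example, `is_valid_expression("MIT OR Apache-2.0")` is accepted, while `is_valid_expression("MIT OR")` and an unknown identifier are rejected.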

Reuse checker
- Tool to verify whether repositories comply with the REUSE rules
- Not complete, due to the complexity of the REUSE practices
- Available at https://github.com/riccardofelluga/reuse-checker
- Written in Elixir :)
Reuse checker in action




Conclusion
- The REUSE initiative goes in the right direction
- Unfortunately (in our opinion), there are too many ways to be compliant
- Have a look at the "Flight Rules" of IDM-Südtirol at https://github.com/idm-suedtirol/reuse, which contain simple recipes to be REUSE compliant. They might be easier to digest.
- The full specification is at https://reuse.software/
That's it!
Andrea Janes, Researcher @ Free University of Bozen-Bolzano
ajanes@unibz.it
SFSCon: the REUSE Initiative
By Andrea Janes