The REUSE Initiative

Andrea Janes, Roberto Confalonieri, Riccardo Felluga, Max Elia Schweigkofler

Free University of Bozen-Bolzano

Topics

This is how we will spend our/your time:

  • Introduction
  • The REUSE Initiative
  • The study about REUSE adoption
  • Reuse-checker: a tool to check REUSE compliance
  • Conclusion

Reusing existing software...

  • ...is productive: one does not need to develop everything from scratch
    • If it is Free Software you can even:
      • evaluate the quality of the reused component
      • understand how well it is maintained, by evaluating the community behind it and how often it is updated
  • ...can become a nightmare because of the license!

Risk of license conflicts

  • It is not easy for a developer to understand the legal consequences of reusing an existing Free Software component
  • Possible reasons:
    • Many licenses exist, e.g., the Software Package Data Exchange (SPDX) considers 348 licenses with 32 possible exceptions.
    • Domain-specific language: the rights and obligations of each type of license are difficult to understand.
    • Complexity: it is complex to understand all licenses in use in a project and how they can be combined.
    • Uncertainty: the consequences of violating a license are unclear, such as having to remove a component.

The REUSE Initiative

  • Initiated by the Free Software Foundation Europe
  • Aims to provide clear guidelines on how to publish Free Software so that license and copyright are clearly defined and machine readable.
  • The desired effects:
    • Make it easier to understand the license
    • Allow the creation of tools that find reusable components with compatible licenses
  • First release October 11/2017, second release 12/2017

The REUSE Initiative

  • The initiative defines 3 main practices:
    1. Provide the exact text of each license used
    2. Include a copyright notice and license in each file
    3. Provide an inventory for included software
  • It also defines how these practices have to be implemented (pay attention to the next 3 slides)

Provide the exact text of each license used

  • One license for all files: provide a file called "LICENSE", "LICENCE", "COPYING", or "COPYRIGHT"
    • The header of the file must be: "Valid-License-Identifier: ", followed by one of the 348 SPDX license identifiers
  • Multiple licenses: licenses are in separate text files in a LICENSES folder and
    • Name each license file either using the SPDX identifier of the license it contains or
    • include a "Valid-License-Identifier" tag in the license file or
    • include a "Valid-License-Identifier" tag in a file separate from the license text, with the same file name but with a ".license" extension.
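
The rules on this slide can be sketched as a small lookup routine. This is only an illustration, not the reuse-checker tool itself: the function names are hypothetical, and it assumes that the ".license" companion file is named by appending ".license" to the full name of the license file, and that a license file in LICENSES without a tag is named after its SPDX identifier (optionally with a ".txt" extension).

```python
from pathlib import Path
from typing import Dict, Optional

# File names the slide lists for the single-license case.
SINGLE_LICENSE_NAMES = ("LICENSE", "LICENCE", "COPYING", "COPYRIGHT")
TAG = "Valid-License-Identifier:"


def read_license_tag(path: Path) -> Optional[str]:
    """Return the identifier that follows the tag, or None if the tag is absent."""
    for line in path.read_text(encoding="utf-8", errors="replace").splitlines():
        if TAG in line:
            return line.split(TAG, 1)[1].strip() or None
    return None


def find_license_texts(repo: Path) -> Dict[str, Path]:
    """Map each license identifier to the file that provides its text."""
    found: Dict[str, Path] = {}

    # Case 1: one license for all files, in a well-known file in the root.
    for name in SINGLE_LICENSE_NAMES:
        candidate = repo / name
        if candidate.is_file():
            identifier = read_license_tag(candidate)
            if identifier:
                found[identifier] = candidate

    # Case 2: multiple licenses, each in its own file in a LICENSES folder.
    folder = repo / "LICENSES"
    if folder.is_dir():
        for path in sorted(folder.iterdir()):
            if not path.is_file() or path.name.endswith(".license"):
                continue
            # Assumption: the companion file is "<file name>.license".
            sidecar = path.with_name(path.name + ".license")
            identifier = read_license_tag(path)
            if identifier is None and sidecar.is_file():
                identifier = read_license_tag(sidecar)
            if identifier is None:
                # Assumption: the file is named after its SPDX identifier.
                identifier = path.stem if path.suffix == ".txt" else path.name
            found[identifier] = path
    return found
```

Calling find_license_texts(Path(".")) in a repository root returns a dictionary such as {"MIT": Path("LICENSE")}. The sketch does not verify that an identifier is actually on the SPDX list.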

Include a copyright notice and license in each file

  • Include a copyright notice and license in the header of each file, using an SPDX identifier to refer to the license.
  • Or: add the header in a separate file with the same name but with a “.license” extension. 
  • Or: list the copyright and license information for all files in a separate file using a “machine-readable debian/copyright” file* or using the SPDX specification to provide a bill of materials**, stored in a file called “LICENSE.spdx”. 
  • Or (for the copyright): rely on the underlying version control system, which should indicate the author in its metadata.

* https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/

** https://spdx.org/specifications
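
A minimal sketch of the first two options (header in the file, or header in a ".license" companion file) could look as follows. It assumes the license is declared with the "SPDX-License-Identifier:" tag, that the header sits within the first 20 lines, and that the companion file is named by appending ".license" to the file name. The function names and the 20-line cut-off are hypothetical and not taken from the specification.

```python
import re
from itertools import islice
from pathlib import Path
from typing import Dict, Optional

# Assumption: licenses are declared with the "SPDX-License-Identifier:" tag.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([\w.+\-:() ]+)")
# Very rough copyright detection: the word "copyright" or the (c) sign.
COPYRIGHT = re.compile(r"copyright|©", re.IGNORECASE)
HEADER_LINES = 20  # hypothetical cut-off for what counts as "the header"


def read_header(path: Path) -> str:
    """Read only the first HEADER_LINES lines of a file."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return "".join(islice(handle, HEADER_LINES))


def license_info(path: Path) -> Dict[str, Optional[object]]:
    """Look in the file header first, then in the '<name>.license' file."""
    sidecar = path.with_name(path.name + ".license")
    for source in (path, sidecar):
        if not source.is_file():
            continue
        header = read_header(source)
        match = SPDX_TAG.search(header)
        if match:
            return {
                "license": match.group(1).strip(),
                "has_copyright": bool(COPYRIGHT.search(header)),
                "source": source.name,
            }
    return {"license": None, "has_copyright": False, "source": None}
```

The companion file is mainly useful for files that cannot carry a comment header, such as images. The debian/copyright and LICENSE.spdx options are not covered by this sketch.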

Provide an inventory for included software

  • Optionally, include a bill of materials, generated automatically, conforming to the SPDX specification and included in a file in the top level directory of your repository called LICENSE.spdx.

The study

  • If we look at GitHub repositories, how well do they respect the REUSE initiative rules?

How we conducted it

  • GitHub hosts 85 million repositories
  • If we wrote a tool that scans 1 repo/second, scanning all of them would take 2.7 years (SFSCon 2022)
  • We took a sample of 1000 repositories from all repositories that were released in the month before the study (from July to August 2017)
    • 416,776 repos
    • 282,232 files
    • 70 repos did not exist anymore (deleted between sampling and actual download)
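
The numbers on this slide can be reproduced with a few lines. The sketch assumes that 416,776 is the size of the population the sample was drawn from, and it uses integer indices in place of real repository names. The helper names and the seed are hypothetical.

```python
import random
from typing import List, Optional, Sequence

GITHUB_REPOS = 85_000_000          # figure from the slide
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def scan_years(repos: int, repos_per_second: float = 1.0) -> float:
    """Time needed to scan all repositories, in years."""
    return repos / repos_per_second / SECONDS_PER_YEAR


def draw_sample(population: Sequence[int], size: int,
                seed: Optional[int] = None) -> List[int]:
    """Draw a simple random sample without replacement."""
    return random.Random(seed).sample(population, size)


print(f"{scan_years(GITHUB_REPOS):.1f} years")   # prints "2.7 years"

# Assumption: 416,776 repositories were candidates for the sample.
candidates = range(416_776)
sample = draw_sample(candidates, 1000, seed=2017)
analysed = len(sample) - 70        # 70 repositories were deleted before download
print(analysed)                    # prints 930
```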

Results (1/2)

  • Of 930 repositories:
    • a single license file exists: 571 (61.4%)
    • the SPDX license file exists: 0 (0%)
    • the license folder exists: 2 (0.2%)
    • the Debian license folder exists: 2 (0.2%)
    • all used licenses are present: 568 (61.1%)
    • the readme file exists: 822 (88.4%)
    • the authors file exists: 20 (2.2%)

Results (2/2)

  • Of 282,232 files:
    • with copyright information: 89,328 (31.7%)
    • with license found in file: 264,790 (93.8%)
    • with license found in .license: 1 (0.0%)
    • with license found in Debian format: 642 (0.2%)
    • with license found in .spdx: 0 (0%)
    • where an SPDX license expression exists: 14,661 (5.2%)
    • with a valid SPDX license expression: 14,019 (5.0%)
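
The percentages on the two results slides are plain shares of the totals (930 repositories, 282,232 files). As a quick check, a sketch that recomputes a few of them from the counts above, with the helper name share being hypothetical:

```python
def share(count: int, total: int) -> str:
    """Format a count together with its share of the total, e.g. '571 (61.4%)'."""
    return f"{count:,} ({count / total:.1%})"


REPOS = 930
FILES = 282_232

# Counts copied from the slides.
print(share(571, REPOS))       # a single license file exists
print(share(89_328, FILES))    # files with copyright information
print(share(264_790, FILES))   # files with license found in file
print(share(14_019, FILES))    # files with a valid SPDX license expression
```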

Reuse checker

Reuse checker in action

Conclusion

  • The REUSE initiative goes in the right direction
  • Unfortunately (in our opinion), there are too many ways to be compliant
  • Have a look at the "Flight Rules" of IDM-Südtirol at https://github.com/idm-suedtirol/reuse, which contain simple recipes to be REUSE compliant. They might be easier to digest.
  • The full specification is at https://reuse.software/

That's it!

Andrea Janes, Researcher @ Free University of Bozen-Bolzano

         ajanes@unibz.it

SFSCon: the REUSE Initiative

By Andrea Janes

Presentation of the REUSE initiative during https://www.sfscon.it/
