How to make SPDX the industry standard for AI/ML

Hello, I am Cheuk

  • Open-Source contributor

  • Organiser of community events

  • PSF director and fellow
  • Community manager at OpenSSF

Who here has looked at an ingredient list?

Think of the last time you opened a pack of snacks

There is a need to know what your food is made of

The same goes for other consumer goods we use every day

including the software we use every day

Software Bill of Materials (SBOMs)

  • lists all the open-source and third-party components present in a codebase
  • lists the licenses that govern those components
  • records the versions of those components and their patch status
  • like the ingredient list of a food product
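To make the "ingredient list" idea concrete, here is a minimal Python sketch of what an SBOM records for each component. The `Component` and `SBOM` classes, their field names and the package names in the example are hypothetical, chosen only for illustration; they are not the SPDX schema.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One 'ingredient' of a codebase (illustrative fields, not a standard schema)."""
    name: str
    version: str
    license: str          # e.g. an SPDX license identifier such as "MIT"
    origin: str           # "open source" or "third-party"
    patched: bool = True  # whether known security patches have been applied


@dataclass
class SBOM:
    """A bill of materials: the components that make up one piece of software."""
    software: str
    components: list[Component] = field(default_factory=list)

    def licenses(self) -> set[str]:
        # Every license that governs at least one component in the codebase
        return {c.license for c in self.components}

    def unpatched(self) -> list[str]:
        # Components whose patch status needs attention, as "name==version"
        return [f"{c.name}=={c.version}" for c in self.components if not c.patched]
```

For example, with two hypothetical components, `SBOM("my-app", [Component("example-lib", "2.0.1", "MIT", "open source"), Component("acme-sdk", "1.4.2", "LicenseRef-Acme", "third-party", patched=False)]).unpatched()` returns `['acme-sdk==1.4.2']`.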

Did you know there is a standard format for listing ingredients?

According to Food labelling and packaging in

  • If your food or drink product has 2 or more ingredients (including any additives), you must list them all
  • Ingredients must be listed in order of weight, with the main ingredient first
  • You must highlight allergens on the label using a different font, style or background colour.

So there should be a standard format for SBOMs too, right?

Software Package Data Exchange (SPDX)

  • an open standard for describing SBOMs
  • a common format that reduces redundant work when sharing important release data
  • a freely available international open standard
    (ISO/IEC 5962:2021)
  • formats that are both machine- and human-readable
  • efficient exchange of metadata in the supply chain
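As a rough illustration of the machine-readable side, the sketch below builds a small SPDX 2.3 document in its JSON serialisation using only the Python standard library. The helper `minimal_spdx_document`, the tool name and the example namespace are hypothetical; the field names follow our reading of the SPDX 2.3 specification, so check the spec (or run a validator) before relying on the output.

```python
import json
import uuid
from datetime import datetime, timezone


def minimal_spdx_document(name: str, version: str, license_id: str, namespace: str) -> dict:
    """Build a small SPDX 2.3 JSON document describing a single package (sketch)."""
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # SPDX identifiers may only contain letters, digits, "." and "-"
    package_id = "SPDXRef-Package-" + name
    return {
        "spdxVersion": "SPDX-2.3",
        "dataLicense": "CC0-1.0",            # SPDX documents are licensed CC0-1.0
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": f"{name}-{version}",
        "documentNamespace": namespace,      # should be a unique URI per document
        "creationInfo": {
            "created": created,
            "creators": ["Tool: example-sbom-script"],  # hypothetical tool name
        },
        "packages": [{
            "name": name,
            "SPDXID": package_id,
            "versionInfo": version,
            "downloadLocation": "NOASSERTION",
            "filesAnalyzed": False,          # describe the package, not its files
            "licenseConcluded": license_id,
            "licenseDeclared": license_id,
            "copyrightText": "NOASSERTION",
        }],
        "relationships": [{
            "spdxElementId": "SPDXRef-DOCUMENT",
            "relationshipType": "DESCRIBES",
            "relatedSpdxElement": package_id,
        }],
    }


if __name__ == "__main__":
    ns = f"https://example.com/spdx/example-lib-1.0.0-{uuid.uuid4()}"  # placeholder URI
    print(json.dumps(minimal_spdx_document("example-lib", "1.0.0", "MIT", ns), indent=2))
```

Setting `filesAnalyzed` to `False` keeps the sketch short: it lists the package as a single ingredient instead of enumerating and hashing every file inside it.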

SPDX 2.3 (current release)

  • external references to security information

  • references to Common Vulnerabilities and Exposures (CVE) advisories

  • meets the Minimum Elements for an SBOM defined under US Executive Order 14028

  • helps verify the provenance and integrity of the software

  • SPDX is an ISO standard: ISO/IEC 5962:2021 (based on SPDX 2.2.1)

  • SPDX SBOMs can be signed with Sigstore’s Cosign
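The security features above are expressed as external references on a package. Below is a minimal sketch, assuming the `SECURITY` reference category with the `cpe23Type` and `advisory` reference types as we read them in the SPDX 2.3 external-reference annex. The helper `add_security_refs`, the CPE string and any advisory URLs you pass in are placeholders, not real vulnerability data.

```python
from typing import Optional


def add_security_refs(package: dict, cpe23: Optional[str] = None,
                      advisory_urls: tuple[str, ...] = ()) -> dict:
    """Attach SPDX 2.3 SECURITY external references to a package entry, in place."""
    refs = package.setdefault("externalRefs", [])
    if cpe23:
        # A CPE 2.3 name lets scanners match the package against vulnerability databases
        refs.append({"referenceCategory": "SECURITY",
                     "referenceType": "cpe23Type",
                     "referenceLocator": cpe23})
    for url in advisory_urls:
        # "advisory" points at a published security advisory, such as a CVE record
        refs.append({"referenceCategory": "SECURITY",
                     "referenceType": "advisory",
                     "referenceLocator": url})
    return package
```

For example, `add_security_refs(pkg, cpe23="cpe:2.3:a:example:example-lib:1.0.0:*:*:*:*:*:*:*")` appends one `cpe23Type` entry to `pkg["externalRefs"]`, creating the list if the package has none yet.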

SPDX 2.3 is pretty good for software

But we can make it better

SPDX 3.0 (release candidate)

  • new Security, Build, Data and AI profiles
  • better support for datasets
  • capture domain-specific information
  • capture the provenance of AI/ML models and datasets
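To show what capturing model and dataset provenance buys us, here is a Python sketch of querying linked AI and dataset elements. It is written in the spirit of the SPDX 3.0 AI and Dataset profiles, but the element types, property names and the `trainedOn` relationship type are assumptions on our part and may not match the release candidate exactly; the identifiers and dataset names are made up.

```python
# ASSUMPTION: types, property names and "trainedOn" are illustrative, not the
# normative SPDX 3.0 release-candidate schema. All identifiers are made up.
elements = [
    {"spdxId": "urn:example:model-1", "type": "AIPackage",
     "name": "sentiment-classifier"},
    {"spdxId": "urn:example:data-1", "type": "DatasetPackage",
     "name": "support-tickets-2023", "hasSensitivePersonalInformation": "yes"},
    {"spdxId": "urn:example:data-2", "type": "DatasetPackage",
     "name": "public-reviews", "hasSensitivePersonalInformation": "no"},
]

relationships = [
    {"from": "urn:example:model-1", "relationshipType": "trainedOn",
     "to": ["urn:example:data-1", "urn:example:data-2"]},
]


def training_data(model_id: str) -> list[dict]:
    """Return the dataset elements a model was trained on, following trainedOn links."""
    by_id = {e["spdxId"]: e for e in elements}
    return [by_id[target]
            for rel in relationships
            if rel["from"] == model_id and rel["relationshipType"] == "trainedOn"
            for target in rel["to"]]


def sensitive_training_data(model_id: str) -> list[str]:
    """Names of training datasets flagged as containing personal information."""
    return [d["name"] for d in training_data(model_id)
            if d.get("hasSensitivePersonalInformation") == "yes"]
```

Here `sensitive_training_data("urn:example:model-1")` returns `['support-tickets-2023']`: because the model and its datasets are linked, a privacy question about the model can be answered from the bill of materials alone.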

How do we get the AI/ML and data communities to adopt SPDX 3.0?

This is great! But...

AI/ML risks

  • data breach and privacy risks: models rely on data
  • more complex systems: lots of black boxes
  • the AI boom: people move fast and are less careful
  • new vulnerabilities: prompt injection

Quick adoption

  • Thorough profile
  • Low barrier to entry
  • Universal standard
  • Satisfies policy requirements

Good tooling


  • Examples
  • Use cases
  • Education


Where to start

  • The SPDX 3.0 profile is quite thorough
  • Communicate with policymakers

  • Create a universal standard
  • More outreach
  • Go to where the AI/ML community is
  • Understand their needs

Call to action

  • Adopt SPDX 2.3 right now
  • Contribute to the SPDX 3.0 model: try the 3.0 release candidate
  • Engage in outreach activities
  • Keep communicating with policymakers and users

Let's make SPDX the industry standard for AI/ML

Thank you!

