Let's shadow a Python release: insights from packaging, distribution, and supply-chain security

Agriya Khetarpal

Agenda

  1. The release process for the Python programming language
  2. The software supply chain in the Python ecosystem

CPython 🐍

  • The most common (and the reference) implementation of Python
  • Most likely the one you use on a daily basis
  • Other implementations: PyPy, RustPython, Jython, GraalPy, IronPython, Brython, MicroPython, Pyodide, etc.

Python 3.13 🪩

  • Will be released in October 2024

  • Shining new features

    • New, updated REPL

    • A new JIT compiler based on LLVM (experimental)

    • Free-threaded (no-GIL) builds

How does the Python release process work?

  • Alphas, betas, and release candidates
  • Maintenance and security updates
  • Coordinated through Python Enhancement Proposals (PEPs) by core developers
    • PEP 719: Python 3.13
    • PEP 745: Python 3.14
  • Yes, Python 3.14 is already in the feature development stage while Python 3.13 is getting ready for release!

Support cycles and backports

  • 1.5 years of bugfix backports, followed by 3.5 years of security updates
  • This becomes 2 + 3 years for Python 3.13 and later
  • End-of-life: five years after the first release
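
The 2 + 3 arithmetic above can be sketched in a few lines. This is only an illustration: the helper names are made up for this example, and the release date is a placeholder, since the slides only say "October 2024".

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def support_windows(first_release: date, bugfix_years: int = 2, security_years: int = 3) -> dict:
    """Compute when bugfix backports stop and when the release reaches end-of-life."""
    bugfix_until = add_years(first_release, bugfix_years)
    end_of_life = add_years(bugfix_until, security_years)
    return {"bugfix_until": bugfix_until, "end_of_life": end_of_life}


# Placeholder date (assumption): any day in October 2024 gives the same years.
windows = support_windows(date(2024, 10, 1))
print(windows["bugfix_until"], windows["end_of_life"])
```

Either way the total stays at five years; only the split between bugfix and security-only support changes.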

Stakeholders involved

  • Python core developers
  • Release Manager
  • The Python Steering Council
  • The Python Software Foundation
  • Community contributors, (potentially) like you

Visualising the CPython release process

 

Seth Larson (Security Developer-in-Residence, PSF)

How releases are made

  • https://github.com/python/release-tools repository
  • Important to note: everything runs in CI (GitHub Actions and Azure Pipelines)!
  • Several scripts and YAML workflows facilitate every aspect of the release: compilation, signing, binary uploads, documentation, certificates, and more

How do you usually install Python?

Windows

  • Package managers (Chocolatey, NuGet, Scoop)
  • Microsoft Store
  • Official https://python.org/download/ installation wizards offered in .msi or .exe format

macOS

  • Package managers (Homebrew, MacPorts, Fink)
  • Official .pkg or .dmg installers

GNU/Linux

  • Package managers (apt, yum, dpkg, dnf, Linuxbrew, Spack)
  • Official source tarballs

Downstream packaging and distribution

  • OS-specific package maintainers (Red Hat, Gentoo, etc.)
  • Maintainers of Docker images
  • Cloud provider distributions (AWS, Azure, GCP)
  • Embedded systems and IoT devices

Build provenance

Provenance

Let's compare two photos. Are they the same photo?

They are not :P

How about these two?

They are the same, but don't be too quick to judge!

Images contain metadata

FileSize: 3.5 MiB
FileModifyDate: 2024-07-19T18:50:31.000+00:00
FileAccessDate: 2024-07-19T18:50:31.000+00:00
FileInodeChangeDate: 2024-07-19T18:50:31.000+00:00
FileType: JPEG
FileTypeExtension: jpg
MIMEType: image/jpeg
JFIFVersion: 1.02
ResolutionUnit: inches
XResolution: 72
YResolution: 72
ProfileCMMType: Linotronic
ProfileVersion: 2.1.0
ProfileClass: Display Device Profile
ColorSpaceData: RGB
ProfileConnectionSpace: XYZ
ProfileDateTime: 1998-02-09T06:49:00.000+00:00
ProfileFileSignature: acsp
PrimaryPlatform: Microsoft Corporation
CMMFlags: Not Embedded, Independent
DeviceManufacturer: Hewlett-Packard
DeviceModel: sRGB
DeviceAttributes: Reflective, Glossy, Positive, Color
RenderingIntent: Perceptual
ConnectionSpaceIlluminant: 0.9642 1 0.82491
ProfileCreator: Hewlett-Packard
ProfileID: 0
ProfileCopyright: Copyright (c) 1998 Hewlett-Packard Company
ProfileDescription: sRGB IEC61966-2.1
MediaWhitePoint: 0.95045 1 1.08905
MediaBlackPoint: 0 0 0

 

RedMatrixColumn: 0.43607 0.22249 0.01392
GreenMatrixColumn: 0.38515 0.71687 0.09708
BlueMatrixColumn: 0.14307 0.06061 0.7141
DeviceMfgDesc: IEC http://www.iec.ch
DeviceModelDesc: IEC 61966-2.1 Default RGB colour space - sRGB
ViewingCondDesc: Reference Viewing Condition in IEC61966-2.1
ViewingCondIlluminant: 19.6445 20.3718 16.8089
ViewingCondSurround: 3.92889 4.07439 3.36179
ViewingCondIlluminantType: D50
Luminance: 76.03647 80 87.12462
MeasurementObserver: CIE 1931
MeasurementBacking: 0 0 0
MeasurementGeometry: Unknown
MeasurementFlare: 0.999%
MeasurementIlluminant: D65
Technology: Cathode Ray Tube Display
RedTRC: (Binary data 2060 bytes, use -b option to extract)
GreenTRC: (Binary data 2060 bytes, use -b option to extract)
BlueTRC: (Binary data 2060 bytes, use -b option to extract)
ImageWidth: 4160
ImageHeight: 6240
EncodingProcess: Progressive DCT, Huffman coding
BitsPerSample: 8
ColorComponents: 3
YCbCrSubSampling: YCbCr4:2:0 (2 2)
ImageSize: 4160x6240
Megapixels: 26
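
A toy sketch of why metadata matters when asking "are these the same?". The byte strings below are stand-ins invented for this example, not real JPEG data: both "photos" carry identical pixel payloads but different metadata, so a checksum over the whole file differs while a checksum over the pixels alone matches.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for two image files (not real JPEG structure).
pixels = b"\x00\x01\x02\x03" * 4
photo_a = b"FileModifyDate=2024-07-19;" + pixels
photo_b = b"FileModifyDate=2024-07-20;" + pixels

# Comparing whole files: the metadata difference changes the digest.
same_file = sha256_hex(photo_a) == sha256_hex(photo_b)

# Comparing only the pixel payload: the digests agree.
same_pixels = sha256_hex(photo_a[-len(pixels):]) == sha256_hex(photo_b[-len(pixels):])

print(same_file, same_pixels)
```

The same question applies to software artifacts: "identical" depends on what exactly was hashed, and on who recorded where it came from.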

Provenance is: security, metadata, reproducibility, verifiability, evidence, identifiability, confidence, origin, trust

SLSA: Supply-chain Levels for Software Artifacts

The software supply chain

  • Source code
  • Build systems
  • Package registries
  • Distribution channels
  • End users

Exhibit: dependency confusion

  • PyTorch (nightly release) was compromised between December 26–31, 2022

  • Attackers uploaded a malicious package named torchtriton to PyPI, reusing the name of a dependency served from the PyTorch nightly index; pip gave the PyPI copy priority over the one from the other index

  • Downloaded about 2717 times in total, with 2500 of those on 26 December 2022 alone
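
One cheap check suggested by the torchtriton incident: look for names that exist on both a private index and the public one. The sketch below assumes you already have both name lists (the lists here are hypothetical) and uses the name normalisation rule from PEP 503; it does not query any real index.

```python
import re


def normalize(name: str) -> str:
    """Normalise a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def confusable_names(private_names, public_names):
    """Return normalised names that appear on both indices."""
    public = {normalize(n) for n in public_names}
    private = {normalize(n) for n in private_names}
    return sorted(private & public)


# Hypothetical index listings, for illustration only.
private_index = ["torchtriton", "Internal_Utils"]
public_index = ["requests", "torchtriton", "numpy"]

print(confusable_names(private_index, public_index))
```

Any name reported here is a candidate for dependency confusion and deserves a closer look, for example by reserving the name publicly or installing from a single index.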

Similar attacks in the wild

Typosquatting

pip install requetss?

pip install beautifilsoup4

tensotflow

playwrgiht

matplptlib

requirementstxt

asynciio
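
A minimal sketch of how a typosquat check might work, using the standard library's difflib to compare a requested name against a short allow-list. The allow-list, the function name, and the 0.8 cutoff are assumptions chosen for this example; real registries use more elaborate heuristics.

```python
import difflib
from typing import Optional

# Hypothetical allow-list of well-known project names.
POPULAR = ("requests", "beautifulsoup4", "tensorflow", "playwright", "matplotlib", "asyncio")


def likely_typosquat(name: str, known=POPULAR, cutoff: float = 0.8) -> Optional[str]:
    """Return the known name that `name` closely resembles, or None.

    An exact match is treated as legitimate and returns None.
    """
    name = name.lower()
    if name in known:
        return None
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for candidate in ("requetss", "beautifilsoup4", "tensotflow", "requests"):
    print(candidate, "->", likely_typosquat(candidate))
```

A similarity check like this only flags near misses; it says nothing about whether the flagged package is actually malicious.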

This isn't unique to the Python ecosystem

SLSA provenance

Level 1: Provides supply chain visibility

  • Automated version control.
  • Automated build process.
  • Generate Provenance.
  • Provenance contains information

Level 2: Protect against tampering, provide integrity of builds

  • Version control.
  • Hosted Build.
  • Signed provenance

Level 3: Harden build infrastructure, integrate trust

  • Source/build platform meet standards.
  • Auditability of source/build.
  • Guaranteed integrity of provenance

Level 4: Assurance of build integrity + dependency management

  • 2 person review of all changes.
  • Hermetic/Reproducible builds.
  • Hardened build service.
  • Signed & Non-falsifiable provenance
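
To make "provenance" concrete, here is a simplified sketch of a provenance document shaped like an in-toto statement with a SLSA predicate, plus a check that an artifact's digest matches a listed subject. The artifact, file name, and builder URL are made up for illustration, and the check deliberately skips signature verification, which a real verifier must perform first.

```python
import hashlib


def subject_matches(statement: dict, artifact_name: str, artifact_bytes: bytes) -> bool:
    """Check that the artifact's SHA-256 digest appears among the statement's subjects."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    for subject in statement.get("subject", []):
        if subject.get("name") == artifact_name and subject.get("digest", {}).get("sha256") == digest:
            return True
    return False


# Hypothetical artifact and provenance statement (heavily trimmed).
artifact = b"pretend this is a wheel"
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
        {
            "name": "demo-1.0-py3-none-any.whl",
            "digest": {"sha256": hashlib.sha256(artifact).hexdigest()},
        }
    ],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {"runDetails": {"builder": {"id": "https://example.com/hypothetical-builder"}}},
}

print(subject_matches(statement, "demo-1.0-py3-none-any.whl", artifact))
print(subject_matches(statement, "demo-1.0-py3-none-any.whl", artifact + b"tampered"))
```

The digest ties the statement to one exact artifact; the signature (not shown) is what ties the statement to a trusted builder.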

SBOM: Software Bill-of-Materials 🧾

  • Comprehensive inventory of software components
  • Includes dependencies, versions, and licensing info
  • Crucial for vulnerability management and compliance

The Python SBOM 📜

  • Included in CPython 3.12.2 and later releases
  • JSON file containing SHA-256 checksums for all files
  • Names and versions of all software components
  • Dependency relationships between software components
  • Software identifiers (like CPE and Package URLs)
  • Download URLs for source code with checksums
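
A rough sketch of reading such a JSON SBOM. It assumes an SPDX-style layout with "packages" and "relationships" arrays; the sample document, its identifiers, versions, and checksum are invented for this example and are not taken from a real CPython release.

```python
import json

# Hypothetical, heavily trimmed SPDX-style document.
SAMPLE = """
{
  "spdxVersion": "SPDX-2.3",
  "packages": [
    {"SPDXID": "SPDXRef-PACKAGE-cpython", "name": "CPython", "versionInfo": "0.0.0"},
    {
      "SPDXID": "SPDXRef-PACKAGE-expat",
      "name": "expat",
      "versionInfo": "0.0.0",
      "checksums": [{"algorithm": "SHA256", "checksumValue": "00aa"}]
    }
  ],
  "relationships": [
    {
      "spdxElementId": "SPDXRef-PACKAGE-cpython",
      "relationshipType": "DEPENDS_ON",
      "relatedSpdxElement": "SPDXRef-PACKAGE-expat"
    }
  ]
}
"""


def components(sbom: dict) -> list:
    """List (name, version) pairs for every package in the SBOM."""
    return [(p["name"], p.get("versionInfo", "unknown")) for p in sbom.get("packages", [])]


def depends_on(sbom: dict, spdx_id: str) -> list:
    """List the SPDX IDs that the given element declares a DEPENDS_ON relationship to."""
    return [
        r["relatedSpdxElement"]
        for r in sbom.get("relationships", [])
        if r["spdxElementId"] == spdx_id and r["relationshipType"] == "DEPENDS_ON"
    ]


sbom = json.loads(SAMPLE)
print(components(sbom))
print(depends_on(sbom, "SPDXRef-PACKAGE-cpython"))
```

With this structure in hand, a vulnerability scanner only needs the names, versions, and identifiers to match components against advisories.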

  • Since OpenSSL is built differently across Windows and Linux/macOS, the Python Windows installers need different SBOMs
  • i.e., different SBOMs correspond to different sources

How are the Scientific Python and the Python packaging ecosystems faring?

  • Sigstore: provides signing and verification for binaries
  • Coupled with GitHub: Artifact Attestations
  • Scientific Python Ecosystem Coordination (SPEC)-8: "Securing the Release Process" is underway

What you can do ✅

"The locus of control" maxim

Internal

  • Learn to write secure code (don't store secrets in plaintext, cultivate an adversarial mindset, and more)
  • Vet your dependencies
  • Package your code properly
  • Prefer wheels over sdists to help mitigate(!) remote code execution (RCE): building an sdist can run arbitrary code at install time

  • Sign your binaries and releases
  • In some cases, even limit such processes to dedicated release managers
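
The wheels-versus-sdists advice can be checked mechanically before installing. The sketch below only looks at file extensions from a list of distribution file names; the function name and the file names are hypothetical, and it does not talk to any package index.

```python
def split_distributions(filenames):
    """Partition distribution file names into wheels and source distributions."""
    wheels = [f for f in filenames if f.endswith(".whl")]
    sdists = [f for f in filenames if f.endswith((".tar.gz", ".zip"))]
    return wheels, sdists


# Hypothetical release files for two made-up projects.
with_wheel = ["demo-1.0-py3-none-any.whl", "demo-1.0.tar.gz"]
sdist_only = ["legacy-0.3.tar.gz"]

for files in (with_wheel, sdist_only):
    wheels, sdists = split_distributions(files)
    # No wheel means installing would require building from source.
    print(files, "-> wheel available:", bool(wheels))
```

A project that ships only an sdist is not necessarily unsafe, but installing it means running its build step, so it deserves extra scrutiny.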

External

  • Always build your binaries on ephemeral systems (CI providers) – never do so on your own system
  • Aim for higher levels of SLSA provenance

  • Use tooling that your code hosting solution provides (GitHub/GitLab/BitBucket/etc.) for provenance

  • Aim for a good OpenSSF Scorecard score and follow the OpenSSF's best practices

About me 😁

Agriya Khetarpal

  • Software engineer at Quansight
  • Privileged to contribute to the open source Scientific Python and Pyodide ecosystems
  • Interested in
    • Python packaging 📦🐍
    • Scientific computing ➗🧪
    • Documentation 📝🌉
    • ...and more 👾

Thank you for your time!

Please feel free to say hello!

Need these slides?

in/agriyakhetarpal

agriyakhetarpal

agriyakhetarpal

agriyakhetarpal [at] outlook [dot] com

Content licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0)

Further readings