importlib.resources

If you can import it, you can read it*

PyCon 2018, Cleveland, Ohio

May 2018

Barry Warsaw

Python Foundation @ LinkedIn

My code needs some static files.  How hard can it be to read them at run time?

Types of static files

  • Templates
  • Sample data
  • Certificates
  • gettext translation catalogs

File system layout

thepkg/
    __init__.py
    a.py
    b.py
    data/
        sample.dat

Naive approach

import thepkg
from pathlib import Path

pkg = Path(thepkg.__file__).parent
path = pkg / 'data' / 'sample.dat'

with open(path, 'rb') as fp:
    contents = fp.read()

Done!

Right?

What's the problem?

Things get complicated

thepkg/
    __init__.py
    a.py
    b.py
    data/
        sample.dat

Zip files and zipapps

import thepkg
from pathlib import Path

pkg = Path(thepkg.__file__).parent
path = pkg / 'data' / 'sample.dat'

with open(path, 'rb') as fp:
    contents = fp.read()

Traceback (most recent call last):
  File "run.py", line 7, in <module>
    with open(path, 'rb') as fp:
NotADirectoryError: [Errno 20] Not a directory: '.../thepkg.zip/thepkg/data/sample.dat'
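The failure is easy to reproduce. The sketch below builds a throwaway zip containing a package and a data file, imports the package from the archive, and then tries the naive open(). The package name demo_zip_pkg and the file contents are made up for illustration.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

# Build a throwaway zip holding a package plus a data file.
# demo_zip_pkg is a hypothetical name used only in this sketch.
tmp = Path(tempfile.mkdtemp())
archive = tmp / "demo.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_zip_pkg/__init__.py", "")
    zf.writestr("demo_zip_pkg/data/sample.dat", b"hello")

# Make the archive importable and import the package from it.
sys.path.insert(0, str(archive))
importlib.invalidate_caches()
pkg_module = importlib.import_module("demo_zip_pkg")

# __file__ points *inside* the archive, so the derived path is not
# something the operating system can open.
path = Path(pkg_module.__file__).parent / "data" / "sample.dat"
try:
    with open(path, "rb") as fp:
        fp.read()
    naive_failed = False
except OSError as exc:
    # NotADirectoryError on POSIX; other platforms may raise a
    # different OSError subclass.
    naive_failed = True
    print(type(exc).__name__, "for", path)
```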

pkg_resources

Basic Resource Access

from pkg_resources import \
    resource_string as resource_bytes

contents = resource_bytes(
    'thepkg', 'data/sample.dat')

Works for both file system paths and zip file paths

Done!

Right?

What's the problem?

pkg_resources

  • has import-time side-effects
  • is slow
  • tries to do too much
  • has funky APIs
  • is everywhere
  • still supports Python 2

We can do better!

Because we have Python's import machinery to help us

importlib.resources

from importlib.resources import read_binary

# Name the package with a dotted string...
contents = read_binary(
    'thepkg.data', 'sample.dat')

# ...or pass the imported package itself
import thepkg.data
contents = read_binary(
    thepkg.data, 'sample.dat')
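To see that the same call works wherever the package lives, here is a minimal sketch that creates one package on the file system and another inside a zip, then reads a resource from each. The package names fs_demo_pkg and zip_demo_pkg and the file contents are assumptions made for the demo.

```python
import importlib
import sys
import tempfile
import zipfile
from importlib.resources import read_binary
from pathlib import Path

tmp = Path(tempfile.mkdtemp())

# 1. A package on the file system (fs_demo_pkg is a made-up name).
pkg_dir = tmp / "src" / "fs_demo_pkg"
data_dir = pkg_dir / "data"
data_dir.mkdir(parents=True)
(pkg_dir / "__init__.py").write_text("")
(data_dir / "__init__.py").write_text("")
(data_dir / "sample.dat").write_bytes(b"from disk")

# 2. A package inside a zip archive (zip_demo_pkg is a made-up name).
archive = tmp / "bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zip_demo_pkg/__init__.py", "")
    zf.writestr("zip_demo_pkg/data/__init__.py", "")
    zf.writestr("zip_demo_pkg/data/sample.dat", b"from zip")

# Make both importable.
sys.path[:0] = [str(tmp / "src"), str(archive)]
importlib.invalidate_caches()

# Identical call shape for both; no file system paths involved.
from_disk = read_binary("fs_demo_pkg.data", "sample.dat")
from_zip = read_binary("zip_demo_pkg.data", "sample.dat")
print(from_disk, from_zip)
```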

File system layout

thepkg/
    __init__.py
    a.py
    b.py
    data/
        sample.dat

File system layout (data/ is now a package)

thepkg/
    __init__.py
    a.py
    b.py
    data/
        __init__.py
        sample.dat

Terminology

Access a "resource" in a "package"

Q: What's a "package"?

A: Any importable module with a __path__ attribute

E.g. a directory containing an __init__.py

Q: What's a "resource"?

A: Any readable object contained in a package

E.g. a file inside a package

  • Subdirectories/subpackages are not resources!
  • Namespace packages cannot contain resources
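The "package" definition can be checked directly. This small sketch uses the standard library's json package and its json.decoder submodule; the helper name is_package is an invention for this example.

```python
import json
import json.decoder


def is_package(module):
    """Return True if the module has a __path__ attribute."""
    return hasattr(module, "__path__")


# json is a directory with an __init__.py, so it is a package.
json_is_package = is_package(json)

# json.decoder is a single .py file: importable, but not a package.
decoder_is_package = is_package(json.decoder)

print(json_is_package, decoder_is_package)
```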

Packages and resources

thepkg/
    __init__.py
    a.py
    b.py
    data/
        __init__.py
        sample.dat

Package: thepkg

Packages and resources

thepkg/
    __init__.py
    a.py
    b.py
    data/
        __init__.py
        sample.dat

Package: thepkg.data

importlib.resources API

Types

Package = Union[str, ModuleType]

Resource = Union[str, os.PathLike]

importlib.resources API

Get the contents of a resource

read_binary(
    package: Package,
    resource: Resource) -> bytes

read_text(
    package: Package,
    resource: Resource,
    encoding: str = 'utf-8',
    errors: str = 'strict') -> str
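A short sketch of how the encoding and errors arguments behave, using a throwaway package. The package name text_demo_pkg and the file greeting.txt are hypothetical. The file is deliberately written as Latin-1 so that the default UTF-8 decoding does not fit.

```python
import importlib
import sys
import tempfile
from importlib.resources import read_binary, read_text
from pathlib import Path

# Throwaway package; text_demo_pkg is a made-up name for this sketch.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "text_demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "greeting.txt").write_bytes("café".encode("latin-1"))

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Raw bytes, exactly as stored.
raw = read_binary("text_demo_pkg", "greeting.txt")

# Decode with an explicit encoding that matches the file.
text = read_text("text_demo_pkg", "greeting.txt", encoding="latin-1")

# With the default utf-8, the stray byte is invalid; errors="replace"
# substitutes U+FFFD instead of raising UnicodeDecodeError.
lenient = read_text("text_demo_pkg", "greeting.txt", errors="replace")

print(raw, text, lenient)
```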

importlib.resources API

Get a file-like object open for reading

open_text(
    package: Package,
    resource: Resource,
    encoding: str = 'utf-8',
    errors: str = 'strict') -> TextIO

open_binary(
    package: Package,
    resource: Resource) -> BinaryIO
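The open_* functions are handy when a resource should be streamed rather than loaded in one go. A minimal sketch, assuming a throwaway package named stream_demo_pkg with a words.txt file (both made up here):

```python
import importlib
import sys
import tempfile
from importlib.resources import open_binary, open_text
from pathlib import Path

# Throwaway package; stream_demo_pkg is a made-up name for this sketch.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "stream_demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "words.txt").write_text("alpha\nbeta\ngamma\n", encoding="utf-8")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Text mode: iterate line by line, as with any open text file.
with open_text("stream_demo_pkg", "words.txt") as fp:
    words = [line.strip() for line in fp]

# Binary mode: read only as many bytes as needed.
with open_binary("stream_demo_pkg", "words.txt") as fp:
    header = fp.read(5)

print(words, header)
```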

importlib.resources API

Get a concrete file system path

with path(
        thepkg,
        'foo.cpython-37m-darwin.so'
        ) as lib:
    import_shared_library(lib)

path(
    package: Package,
    resource: Resource) -> Iterator[Path]
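path() matters most when the resource does not live on the file system. In this sketch the package sits inside a zip, yet the context manager still hands back a real pathlib.Path that can be opened normally while the with block is active. The package name zip_path_demo and the payload are assumptions for the demo.

```python
import importlib
import sys
import tempfile
import zipfile
from importlib import resources
from pathlib import Path

# Throwaway zip with a package inside; zip_path_demo is a made-up name.
tmp = Path(tempfile.mkdtemp())
archive = tmp / "bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zip_path_demo/__init__.py", "")
    zf.writestr("zip_path_demo/data/__init__.py", "")
    zf.writestr("zip_path_demo/data/sample.dat", b"binary payload")

sys.path.insert(0, str(archive))
importlib.invalidate_caches()

# Inside the with block, `concrete` is a path on the real file system,
# suitable for APIs that insist on a file name.
with resources.path("zip_path_demo.data", "sample.dat") as concrete:
    is_real_path = isinstance(concrete, Path)
    existed = concrete.exists()
    payload = concrete.read_bytes()

print(is_real_path, existed, payload)
```

Treat the path as valid only inside the with block; for a zipped package it may be a temporary copy.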

importlib.resources API

List what's in a package *

contents(
    package: Package) -> Iterable[str]

* Items are not guaranteed to be resources!

>>> print(sorted(contents(
...     'thepkg.data')))
['__init__.py', '__pycache__', 'sample.dat']

importlib.resources API

Is a thing a resource?

is_resource(
    package: Package,
    name: str) -> bool

* Use this with contents() to iterate over resources in a package
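A sketch of that filtering pattern, using a throwaway package that holds a data file and a plain subdirectory. The names listing_demo_pkg and templates are invented for the example. Note that newer Python releases deprecate contents() in favor of files(), so this may emit a DeprecationWarning there.

```python
import importlib
import sys
import tempfile
from importlib.resources import contents, is_resource
from pathlib import Path

# Throwaway package; listing_demo_pkg is a made-up name for this sketch.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "listing_demo_pkg"
(pkg_dir / "templates").mkdir(parents=True)  # a subdirectory, not a resource
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "sample.dat").write_bytes(b"data")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# contents() lists every name in the package, resources or not.
names = sorted(contents("listing_demo_pkg"))

# is_resource() keeps only the entries that can actually be read.
resources_only = [
    name for name in names
    if is_resource("listing_demo_pkg", name)
]

print(names)
print(resources_only)
```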

API for loaders

  • Low level API for custom loaders

  • Built-in support for file system and zips

loader.get_resource_reader(
    package_name: str
    ) -> importlib.abc.ResourceReader

importlib.abc.ResourceReader

  • open_resource(resource: str) -> BytesIO
  • resource_path(resource: str) -> str
  • is_resource(name: str) -> bool
  • contents() -> Iterable[str]

  • FileNotFoundError raised when resource doesn't exist
  • resource_path() requires a concrete file system path
  • contents() can return non-resources
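Rather than writing a custom loader, this sketch pokes at the reader that the built-in file system loader provides, calling the low-level methods listed above directly. The package name reader_demo_pkg is hypothetical, and application code would normally use the importlib.resources functions instead of going through the loader.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway package; reader_demo_pkg is a made-up name for this sketch.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "reader_demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "sample.dat").write_bytes(b"low level")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
module = importlib.import_module("reader_demo_pkg")

# Ask the package's own loader for its resource reader.
reader = module.__spec__.loader.get_resource_reader("reader_demo_pkg")

# open_resource() returns a binary file-like object.
with reader.open_resource("sample.dat") as fp:
    data = fp.read()

found = reader.is_resource("sample.dat")
missing = reader.is_resource("nope.dat")

# On the file system a concrete path is available.
location = reader.resource_path("sample.dat")

print(data, found, missing, location)
```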

Performance

  • CLIs start up 25-50% faster
  • importlib.resources
  • shiv (new open source replacement for pex)
  • http://shiv.readthedocs.io/en/latest/

importlib_resources

Backport of resource reading for Python 2.7, 3.4-3.6 (works as a shim for 3.7)

importlib-resources.rtfd.org

Give it up for

Brett Cannon

First of hopefully many great collaborations between the LinkedIn and Microsoft Python teams

Barry Warsaw

barry@python.org

bwarsaw@linkedin.com

@pumpichank

github.com/warsaw

gitlab.com/warsaw

importlib-resources.rtfd.org
