Pydantic

  • Less boilerplate

  • More semantics

  • Composable models

Alexandre René

PyMoTW – 29 May 2020

Less boilerplate

Normal python

class A:
  def __init__(self, x:int):
    self.x = x
class B(A):
  def __init__(self, x:int, y:int):
    # y is not part of A's signature, so only x is passed on
    super().__init__(x)
    self.y = y

Pydantized python

class A(BaseModel):
  x: int
class B(A):
  y: int

More semantics

Normal python

class Vector:
  def __init__(self, r:float, θ:float):
    if r <= 0:
      raise ValueError(
        "Negative radius")
    if θ < 0 or 2*np.pi < θ:
      raise ValueError(
        "Angle outside [0,2π]")
    self.r = r
    self.θ = θ

Pydantized python

class Vector(BaseModel):
  r: PositiveFloat
  θ: confloat(gt=0, lt=2*np.pi)

Vector.schema()
{'title': 'Vector',
 'type': 'object',
 'properties': {
  'r': {'title': 'R', 'type': 'number',
        'exclusiveMinimum': 0},
  'θ': {'title': 'Θ',
   'type': 'number',
   'exclusiveMinimum': 0,
   'exclusiveMaximum': 6.283185307179586}},
 'required': ['r', 'θ']}

Composable models

via inheritance

Normal python

class Complex(Vector):
    def __init__(self, r:float, θ:float):
        θ = θ % (2*np.pi)
        super().__init__(r, θ)
    def conj(self):
        return Complex(self.r, -self.θ)

z = Complex(r=1, θ=0.75)
z.conj()
<__main__.Complex at 0x7fbf7c562580>

Pydantized python

class Complex(Vector):
    @validator('θ', pre=True)
    def standardize_θ(cls, θ):
        return θ % (2*np.pi)
    def conj(self):
        return Complex(r=self.r, θ=-self.θ)

z = Complex(r=1, θ=0.75)
z.conj()
Complex(r=1.0, θ=5.533185307179586)

Composable models

via composition

Normal python

class VectorBasis:
    def __init__(self, e):
        if not isinstance(e, Iterable):
            raise TypeError(
              "e is not iterable")
        if not isinstance(e, list):
            e = list(e)
        for ei in e:
            if not isinstance(ei, Vector):
                raise TypeError(
                  "e must be composed of Vector objects")
        self.e = e

basis = VectorBasis([z, z.conj()])
basis
<__main__.VectorBasis at 0x7fbf7c5263d0>

Pydantized python

class VectorBasis(BaseModel):
    e: List[Vector]

basis = VectorBasis(e=[z, z.conj()])
basis
VectorBasis(e=[Complex(r=1.0, θ=0.75),
               Complex(r=1.0, θ=5.533185307179586)])
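
Composition also works when the nested values arrive as plain dicts, e.g. from a parsed parameter file. The sketch below is an illustration, not part of the original slides: it uses the ASCII name `theta` instead of `θ`, and the try/except import assumes either pydantic 1.x or a pydantic 2.x install that ships the 1.x API under `pydantic.v1`.

```python
from typing import List

try:  # pydantic 2.x keeps the 1.x API used in these slides under pydantic.v1
    from pydantic.v1 import BaseModel, PositiveFloat
except ImportError:  # pydantic 1.x
    from pydantic import BaseModel, PositiveFloat


class Vector(BaseModel):
    r: PositiveFloat
    theta: float  # ASCII stand-in for θ (illustrative choice)


class VectorBasis(BaseModel):
    e: List[Vector]


# A dict and a ready-made Vector are both accepted for the nested field;
# the dict is parsed into a Vector, with the usual coercion of its values.
basis = VectorBasis(e=[{"r": 1, "theta": 0.75}, Vector(r=2, theta=1.5)])
print(basis)
```

Each element of `basis.e` ends up as a `Vector` instance, so the nested constraints (here, a positive radius) are enforced without any code in `VectorBasis`.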

What it's for

TODO: Example real-world application

What it's for

  • Defining parameter classes
  • Parsing/coercing inputs (≠ validation)
  • Declarative (not imperative) definition of class parameters
  • Automatic input validation/coercion
  • Export/import model parameters (customizable)
    • → dict
    • → JSON
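
A minimal sketch of the export/import round trip, assuming the pydantic 1.x method names `.dict()`, `.json()` and `.parse_raw()`. The `CellParams` model is hypothetical and only serves the illustration; the import shim assumes pydantic 1.x or a 2.x install exposing `pydantic.v1`.

```python
import json

try:  # pydantic 2.x keeps the 1.x API used in these slides under pydantic.v1
    from pydantic.v1 import BaseModel
except ImportError:  # pydantic 1.x
    from pydantic import BaseModel


class CellParams(BaseModel):  # hypothetical parameter class
    tau_m: float
    cm: float = 0.2


params = CellParams(tau_m="10")   # the string is coerced to a float on input
as_dict = params.dict()           # → plain dict
as_json = params.json()           # → JSON string
restored = CellParams.parse_raw(as_json)  # JSON string → model

print(as_dict)
print(json.loads(as_json) == as_dict, restored == params)
```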

Existing alternatives

  • Extends dict with
    • Hierarchical parameters
    • Save/load to JSON-ish file
    • ParameterSpace
    • ParameterReference
  • + Drop-in replacement for dict
  • - Subclassing dict is somewhat hackish
  • - Stale project

Ecell = ParameterSet({'tau_m': 10.0, 'cm': 0.2})
Icell = ParameterSet({'tau_m': 15.0, 'cm': 0.5})
network = ParameterSet({'Ecells': Ecell, 'Icells': Icell})
network.Icells.cm = 0.7
network.save("network.param")
network2 = ParameterSet("network.param")
print(network2.pretty())
{
  "Ecells": {
    "tau_m": 10.0,
    "cm": 0.2,
  },
  "Icells": {
    "tau_m": 15.0,
    "cm": 0.7,
  },
}

Existing alternatives

  • Clean, compact model specification w/ types
  • Proper class
    • → Attach methods
  • Default repr()
  • + Built-in to Python 3.7+
  • - Need to write your own import/export
  • - No type casting

from dataclasses import dataclass

@dataclass
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
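
To make the two drawbacks concrete, here is a small standard-library sketch (the specific values are made up for illustration). The annotation on `unit_price` is not enforced, and a JSON round trip has to be written by hand.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0


# No type casting: the string is stored as-is despite the float annotation.
item = InventoryItem("widget", "3.50", 2)
print(type(item.unit_price).__name__)

# Export is one call to asdict(); import means calling the constructor yourself.
payload = json.dumps(asdict(item))
restored = InventoryItem(**json.loads(payload))
print(restored == item)
```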

Key advantages

  • (sometimes) less boilerplate
  • Type checking
  • Separation of logic — One function per operation/check
  • Clean subclassing — Interface not reproduced in subclass
  • Self-documenting constraints
  • Validation on both initialization and attribute assignment (latter is optional)
  • Entirely cythonized
  • Extremely fast development — 70 releases in < 3 years
  • 95%? mature — New core features added, but few breaking changes.
    • My impression: v2 will more or less freeze the API
  • Separation of parsing/validation logic from task logic encourages more composable (⇒ reusable) code
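
A sketch of the optional validation on attribute assignment, assuming the pydantic 1.x `Config` option `validate_assignment` (off by default). The import shim assumes pydantic 1.x or a 2.x install exposing `pydantic.v1`.

```python
try:  # pydantic 2.x keeps the 1.x API used in these slides under pydantic.v1
    from pydantic.v1 import BaseModel, PositiveFloat, ValidationError
except ImportError:  # pydantic 1.x
    from pydantic import BaseModel, PositiveFloat, ValidationError


class Vector(BaseModel):
    r: PositiveFloat

    class Config:
        validate_assignment = True  # also validate `obj.attr = value`


v = Vector(r=1)
v.r = "2.5"  # coerced to a float, as at initialization

rejected = False
try:
    v.r = -1  # violates the PositiveFloat constraint
except ValidationError:
    rejected = True

print(v.r, rejected)
```

The rejected assignment leaves the previous value in place.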

Biggest disadvantage (that I found)

Debugging validators is somewhat different; the stack trace is less informative. This is because:

  • Validators are all executed, and a summary of all errors is printed ⇒ the stack trace doesn't contain the validator;
  • Validators are called from Cython code ⇒ inspection with pdb is more limited.

Adaptations:

  • Place debugging statements inside validators
  • Learn to read Pydantic error output

ValidationError: 2 validation errors for Vector
r
  ensure this value is greater than 0
  (type=value_error.number.not_gt; limit_value=0)
θ
  ensure this value is less than 6.283185307179586
  (type=value_error.number.not_lt; limit_value=6.283185307179586)
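
The same information is available programmatically. The sketch below assumes the pydantic 1.x `ValidationError.errors()` method, which returns one dict per failing field; `theta` is an ASCII stand-in for `θ`, and the import shim assumes pydantic 1.x or a 2.x install exposing `pydantic.v1`.

```python
import math

try:  # pydantic 2.x keeps the 1.x API used in these slides under pydantic.v1
    from pydantic.v1 import BaseModel, PositiveFloat, ValidationError, confloat
except ImportError:  # pydantic 1.x
    from pydantic import BaseModel, PositiveFloat, ValidationError, confloat


class Vector(BaseModel):
    r: PositiveFloat
    theta: confloat(gt=0, lt=2 * math.pi)


problems = []
try:
    Vector(r=-1, theta=7)
except ValidationError as err:
    problems = err.errors()  # list of dicts with 'loc', 'msg' and 'type'

for p in problems:
    print(p["loc"], p["type"], p["msg"])
```

Each entry names the field (`loc`), the kind of failure (`type`) and the human-readable message (`msg`), which is usually enough to locate the responsible constraint or validator.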

How it works

Type annotations

from typing import List
Vector = List[float]

def scale(scalar: float, vector: Vector) -> Vector:
    return [scalar * num for num in vector]

Common types:

  • Any
  • List
  • Tuple
  • Optional

I've also added a few:

  • NPType
  • Array
  • DType

Pydantic recognizes e.g. both `list` and `List`, with different meanings. It also adds new types.
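
One difference between `list` and `List[...]` can be seen in a small sketch (the `Sample` model is hypothetical; the behaviour shown is that of pydantic 1.x, and the import shim assumes 1.x or a 2.x install exposing `pydantic.v1`). A bare `list` only checks the container, while `List[int]` also parses every item.

```python
from typing import List

try:  # pydantic 2.x keeps the 1.x API used in these slides under pydantic.v1
    from pydantic.v1 import BaseModel
except ImportError:  # pydantic 1.x
    from pydantic import BaseModel


class Sample(BaseModel):  # hypothetical model
    plain: list       # any list; items are left untouched
    typed: List[int]  # each item is parsed as an int


s = Sample(plain=["1", 2.5], typed=["1", 2])
print(s.plain, s.typed)
```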

How it works

Validator arguments

class UserModel(BaseModel):
    name: str
    username: str
    password1: str
    password2: str

    @validator('name')
    def name_must_contain_space(cls, v):
        if ' ' not in v:
            raise ValueError('must contain a space')
        return v.title()

    @validator('password2')
    def passwords_match(cls, v, values, **kwargs):
        if 'password1' in values and v != values['password1']:
            raise ValueError('passwords do not match')
        return v

    @validator('username')
    def username_alphanumeric(cls, v):
        assert v.isalnum(), 'must be alphanumeric'
        return v

How it works

Validator arguments

  • validators are "class methods", so the first argument value they receive is the UserModel class, not an instance of UserModel.
  • the second argument is always the field value to validate; it can be named as you please
  • you can also add any subset of the following arguments to the signature (the names must match):
    • values: a dict containing the name-to-value mapping of any previously-validated fields
    • config: the model config
    • field: the field being validated
    • **kwargs: if provided, this will include the arguments above not explicitly listed in the signature
  • validators should either return the parsed value or raise a ValueError, TypeError, or AssertionError (assert statements may be used).
  • where validators rely on other values, you should be aware that:

    • Validation is done in the order fields are defined. E.g. in the example above, password2 has access to password1 (and name), but password1 does not have access to password2.

    • If validation fails on another field (or that field is missing) it will not be included in values, hence `if 'password1' in values and ...` in this example.
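
A small sketch of the optional `values` and `field` arguments, using a hypothetical `Range` model (pydantic 1.x API; the import shim assumes 1.x or a 2.x install exposing `pydantic.v1`). When `low` fails to parse, it is absent from `values`, so the check on `high` is skipped rather than crashing.

```python
try:  # pydantic 2.x keeps the 1.x API used in these slides under pydantic.v1
    from pydantic.v1 import BaseModel, ValidationError, validator
except ImportError:  # pydantic 1.x
    from pydantic import BaseModel, ValidationError, validator


class Range(BaseModel):  # hypothetical model
    low: float
    high: float

    @validator("high")
    def high_above_low(cls, v, values, field):
        # `values` only holds fields that were declared earlier AND validated
        if "low" in values and v <= values["low"]:
            raise ValueError(f"{field.name} must be greater than low")
        return v


ok = Range(low=0, high=1)

failed_fields = []
try:
    Range(low="abc", high=1)  # `low` cannot be parsed as a float
except ValidationError as err:
    failed_fields = [e["loc"][0] for e in err.errors()]

print(ok, failed_fields)
```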

How it works

Default values

class Vector(BaseModel):
  r: PositiveFloat = 1
  θ: confloat(gt=0, lt=2*np.pi) = 0

  • Default values are not validated
    • You can set sentinel values of a different type, without `Optional[]`
    • You are responsible for the types of your own defaults
  • Use a `Field` to also specify an alias, title, constraints and more.
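
A sketch of what "not validated" means in practice, under the pydantic 1.x default configuration (`theta` is an ASCII stand-in for `θ`; the import shim assumes 1.x or a 2.x install exposing `pydantic.v1`). The default `0` violates `gt=0` and is an `int`, yet it is stored unchanged; the same value passed explicitly is rejected.

```python
import math

try:  # pydantic 2.x keeps the 1.x API used in these slides under pydantic.v1
    from pydantic.v1 import BaseModel, PositiveFloat, ValidationError, confloat
except ImportError:  # pydantic 1.x
    from pydantic import BaseModel, PositiveFloat, ValidationError, confloat


class Vector(BaseModel):
    r: PositiveFloat = 1
    theta: confloat(gt=0, lt=2 * math.pi) = 0  # default violates gt=0


v = Vector()  # defaults are used as-is: no coercion, no constraint check

try:
    Vector(theta=0)  # an explicit value does go through validation
    explicit_zero_ok = True
except ValidationError:
    explicit_zero_ok = False

print(repr(v.r), repr(v.theta), explicit_zero_ok)
```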

How it works

class A(BaseModel):
  x: int
  s: str
  @validator('x', pre=True)
  def check_x(cls, v):
    if v < 0:   # See PositiveInt
      raise ValueError(
        "`x` must be positive")
    return v
  @validator('s')
  def check_s(cls, v):
    if len(v) > 10:   # See max_length
      raise ValueError(
        "`s` must not be longer than 10")
    return v
  def __new__(cls, **kwargs):
    return super().__new__(cls)
  def __init__(self, **kwargs):
    super().__init__(**kwargs)

Execution order

  1. __new__()
  2. __init__() (before super())
  3. super().__init__() → BaseModel.__init__()
    Where (most) pydantic magic starts
    1. @root_validator(pre=True)
    2. @validator(pre=True)
      → Use this to set e.g. defaults.
    3. Automatic validation (coercion)
    4. @validator
    5. @root_validator()
    Steps 2 to 4 repeat for each parameter, in declaration order
  4. __init__() (after super())

Execution order

How it works

class Foo(BaseModel):
    a: int
    b: int
    @root_validator
    def root_val_post(cls, values):
        print("root val post")
        return values
    @root_validator(pre=True)
    def root_val_pre(cls, values):
        print("root val pre")
        return values
    @validator('a')
    def val_post_a(cls, a):
        print("val post a")
        return a
    @validator('b', pre=True)
    def val_pre_b(cls, b):
        print("val pre b")
        return b
    @validator('a', pre=True)
    def val_pre_a(cls, a):
        print("val pre a")
        return a
    @validator('b')
    def val_post_b(cls, b):
        print("val post b")
        return b
Foo(a=1, b=1)
root val pre
val pre a
val post a
val pre b
val post b
root val post

Execution order

How it works

Foo()
root val pre
root val post

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-29-fdea65d60c59> in <module>
----> 1 Foo()

~/usr/local/miniconda3/envs/sinn/lib/python3.8/site-packages/pydantic/main.cpython-38-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()

ValidationError: 2 validation errors for Foo
a
  field required (type=value_error.missing)
b
  field required (type=value_error.missing)

Execution order

How it works

Foo(a=[1,2], b=1)
root val pre
val pre a
val pre b
val post b
root val post

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-30-e67f50384fc4> in <module>
----> 1 Foo(a=[1,2], b=1)

~/usr/local/miniconda3/envs/sinn/lib/python3.8/site-packages/pydantic/main.cpython-38-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for Foo
a
  value is not a valid integer (type=type_error.integer)

Execution order

How it works

Foo(a=1, b=[1,2])
root val pre
val pre a
val post a
val pre b
root val post

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-31-ca8e22466f32> in <module>
----> 1 Foo(a=1, b=[1,2])

~/usr/local/miniconda3/envs/sinn/lib/python3.8/site-packages/pydantic/main.cpython-38-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for Foo
b
  value is not a valid integer (type=type_error.integer)

Execution order

How it works

Foo(a=[1,2], b=[1,2])
root val pre
val pre a
val pre b
root val post

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-32-fe07bec27756> in <module>
----> 1 Foo(a=[1,2], b=[1,2])

~/usr/local/miniconda3/envs/sinn/lib/python3.8/site-packages/pydantic/main.cpython-38-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()

ValidationError: 2 validation errors for Foo
a
  value is not a valid integer (type=type_error.integer)
b
  value is not a valid integer (type=type_error.integer)

Execution order

How it works

Execution order – Summary

  • Validators are all executed;
    errors are compiled at the end.
  • @root_validator(pre=True) is executed before everything
  • @root_validator(pre=False) is executed after everything
  • pre/post validators for a given variable are executed in immediate succession
  • Validator order is determined by annotation order

Consequences:

  • You learn about all of the failing inputs, not just the first
  • BUT: Don't rely on earlier validators to sanitize inputs for later ones
    • Failing inputs are not added to the `values` dictionary
  • Recommendation: Write methods in the order they will be executed

How it works

Model configuration

class A(BaseModel):
    x: int
    data: CustomDataType
    @validator('x', pre=True)
    def check_x(cls, v):
        if v < 0:   # See PositiveInt
            raise ValueError("`x` must be positive")
        return v

    class Config:
        allow = True
        fields = {"data": {"description": "Recording in time-mV"}}
        json_encoders = {CustomDataType: CustomDataType.json_encoder}

  • Docs still a bit disorganized wrt Config options, especially the pros & cons of each.
  • ⇒ Use the search bar. Search also Github issues.
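
A runnable sketch of a `Config` with a custom type, assuming the pydantic 1.x options `arbitrary_types_allowed` and `json_encoders`. `Recording` and `Experiment` are hypothetical names made up for this illustration, and the import shim assumes pydantic 1.x or a 2.x install exposing `pydantic.v1`.

```python
import json

try:  # pydantic 2.x keeps the 1.x API used in these slides under pydantic.v1
    from pydantic.v1 import BaseModel
except ImportError:  # pydantic 1.x
    from pydantic import BaseModel


class Recording:  # hypothetical custom data type, unknown to pydantic
    def __init__(self, samples):
        self.samples = list(samples)


class Experiment(BaseModel):  # hypothetical model
    x: int
    data: Recording

    class Config:
        # Accept `Recording` with a plain isinstance() check
        arbitrary_types_allowed = True
        # Tell .json() how to serialize the custom type
        json_encoders = {Recording: lambda rec: rec.samples}


exp = Experiment(x=1, data=Recording([0.1, 0.2]))
encoded = json.loads(exp.json())
print(encoded)
```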

Patterns

Optional argument

class Model(BaseModel):
  opt: int = 0

Optional argument, stays None

class Model(BaseModel):
  opt: Optional[int]

Optional argument, computed default

class Model(BaseModel):
  opt: float = Field(default_factory=time.time)

Optional argument, default depends on other params

class Model(BaseModel):
  req: float
  opt: str = None

  @validator('opt', always=True, pre=True)
  def set_opt(cls, opt, values):
    if opt is not None: return opt   # keep an explicitly passed value
    req = values.get('req', None)
    if req is None: return opt
    return str(req)
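
A self-contained sketch of the dependent-default pattern, so its behaviour can be checked (pydantic 1.x API; the validator name is made up, and the import shim assumes 1.x or a 2.x install exposing `pydantic.v1`). `always=True` makes the validator run even when `opt` is omitted, and `pre=True` lets it replace `None` before type validation.

```python
from typing import Optional

try:  # pydantic 2.x keeps the 1.x API used in these slides under pydantic.v1
    from pydantic.v1 import BaseModel, validator
except ImportError:  # pydantic 1.x
    from pydantic import BaseModel, validator


class Model(BaseModel):
    req: float
    opt: Optional[str] = None

    @validator("opt", always=True, pre=True)
    def default_opt_from_req(cls, opt, values):
        # Only fill in a value when none was passed and `req` validated
        if opt is None and "req" in values:
            return str(values["req"])
        return opt


derived = Model(req=1.5)
explicit = Model(req=1.5, opt="custom")
print(derived.opt, explicit.opt)
```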

Patterns

Extracting multiple attributes

@validator('user')
def check_user(cls, user, values):
  a, b, c = (values.get(x, None)
             for x in ('a', 'b', 'c'))
  ...

Internal variables which aren't part of the model

class TestExtra(BaseModel):
    __slots__ = ('processed_at',)
    a: int

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        object.__setattr__(self, 'processed_at',
                           datetime.utcnow())

Overriding validators w/ __init__

  • Almost never the best solution, but sometimes the quickest.

class Point(BaseModel):
  x: float
  y: float
  def __init__(self, desc=None, **kwargs):
    if isinstance(desc, Point):
      desc = {'x': desc.x, 'y': desc.y}
    if isinstance(desc, dict):
      # Explicit keyword arguments take precedence over `desc`
      kwargs.setdefault('x', desc['x'])
      kwargs.setdefault('y', desc['y'])
    super().__init__(**kwargs)

Finer points

Numpy types

class _ArrayType(np.ndarray):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate_type

    @classmethod
    def validate_type(cls, value, field):
        if isinstance(value, typing.NotCastableToArray):
            raise TypeError(f"Values of type {type(value)} cannot be cast "
                             "to a numpy array.")
        if isinstance(value, np.ndarray):
            # Don't create a new array unless necessary
            if cls._ndim  is not None and value.ndim != cls._ndim:
                raise TypeError(f"{field.name} expects a variable with "
                                f"{cls._ndim} dimensions.")
            # Issubdtype allows specifying abstract dtypes like 'number', 'floating'
            if np.issubdtype(value.dtype, cls.dtype):
                result = value
            elif np.can_cast(value, cls.dtype):
                result = value.astype(cls.dtype)
            else:
                raise TypeError(f"Cannot safely cast '{field.name}' type  "
                                f"({value.dtype}) to type {cls.dtype}.")
        else:
            result = np.array(value)
            # Check dimensions first, as in the ndarray branch above
            if cls._ndim is not None and result.ndim != cls._ndim:
                raise TypeError(
                    f"The shape of the data ({result.shape}) does not "
                    "correspond to the expected number of dimensions "
                    f"({cls._ndim} for '{field.name}').")
            # Issubdtype allows specifying abstract dtypes like 'number', 'floating'
            if np.issubdtype(result.dtype, cls.dtype):
                pass
            elif np.can_cast(result, cls.dtype):
                result = result.astype(cls.dtype)
            else:
                raise TypeError(f"Cannot safely cast '{field.name}' (type  "
                                f"{result.dtype}) to type {cls.dtype}.")
        return result

    @classmethod
    def __modify_schema__(cls, field_schema):
        # FIXME: Figure out how to get the schema of the subfield
        field_schema.update(type ='array',
                            items={'type': 'number'})
    @classmethod
    def json_encoder(cls, v):
        """See typing.json_encoders."""
        return v.tolist()

class _ArrayMeta(type):
    def __getitem__(self, args):
        if isinstance(args, tuple):
            T = args[0]
            ndim = args[1] if len(args) > 1 else None
            extraargs = args[2:]  # For catching errors only
        else:
            T = args
            ndim = None
            extraargs = []
        if (not isinstance(T, type) or len(extraargs) > 0
            or not isinstance(ndim, (int, type(None)))):
            raise TypeError(
                "`Array` must be specified as either `Array[T]` "
                "or `Array[T, n]`, where `T` is a type and `n` is an int. "
                f"(received: [{', '.join((str(a) for a in args))}]).")
        dtype=typing.convert_dtype(T)
        specifier = str(dtype)
        if ndim is not None:
            specifier += f",{ndim}"
        return type(f'Array[{specifier}]', (_ArrayType,),
                    {'dtype': dtype, '_ndim': ndim})

class Array(np.ndarray, metaclass=_ArrayMeta):
    """
    Use this to specify a NumPy array type annotation; `pydantic` will
    recognize the type and execute appropriate validation/parsing.

    This may become obsolete, or need to be updated, when NumPy officially
    supports type hints (see https://github.com/numpy/numpy-stubs).

    - `Array[T]` specifies an array with dtype `T`. Any expression for which
      `np.dtype(T)` is valid is accepted.
    - `Array[T,n]` specifies an array with dtype `T`, that must have exactly
      `n` dimensions.

    Example
    -------
    >>> import numpy as np
    >>> from pydantic.dataclasses import dataclass
    >>> from mackelab_toolbox.typing import Array
    >>>
    >>> @dataclass
    ... class Model:
    ...     x: Array[np.float64]      # Array of 64-bit floats, any number of dimensions
    ...     v: Array['float64', 1]    # 1-D array of 64-bit floats
    """
    pass
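
The `Array` type above builds on pydantic's custom-type hook. A much smaller sketch of the same `__get_validators__` protocol, without NumPy, looks as follows; `UnitInterval` and `Mix` are hypothetical names for illustration only, and the import shim assumes pydantic 1.x or a 2.x install exposing `pydantic.v1`.

```python
try:  # pydantic 2.x keeps the 1.x API used in these slides under pydantic.v1
    from pydantic.v1 import BaseModel, ValidationError
except ImportError:  # pydantic 1.x
    from pydantic import BaseModel, ValidationError


class UnitInterval(float):
    """Hypothetical custom type: a float restricted to [0, 1]."""

    @classmethod
    def __get_validators__(cls):
        # pydantic calls each yielded function in turn on the input value
        yield cls.validate

    @classmethod
    def validate(cls, value):
        value = float(value)  # coercion step; raises ValueError on bad input
        if not 0.0 <= value <= 1.0:
            raise ValueError("must lie in [0, 1]")
        return cls(value)


class Mix(BaseModel):  # hypothetical model using the custom type
    fraction: UnitInterval


m = Mix(fraction="0.25")

out_of_range_rejected = False
try:
    Mix(fraction=2)
except ValidationError:
    out_of_range_rejected = True

print(m.fraction, out_of_range_rejected)
```

The `Array` implementation follows the same shape: `__get_validators__` yields `validate_type`, which either returns the parsed array or raises `TypeError`.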

Summary – advantages

  • (sometimes) less boilerplate
  • Type checking
  • Separation of logic — One function per operation/check
  • Clean subclassing — Interface not reproduced in subclass
  • Self-documenting constraints
  • Validation on both initialization and attribute assignment (latter is optional)
  • Entirely cythonized
  • Extremely fast development — 70 releases in < 3 years
  • 95%? mature — New core features added, but few breaking changes.
    • My impression: v2 will more or less freeze the API
  • Separation of parsing/validation logic from task logic encourages more composable (⇒ reusable) code
