Pydantic
- Less boilerplate
- More semantics
- Composable models

Alexandre René
PyMoTW – 29 May 2020
Less boilerplate

Normal python:

class A:
    def __init__(self, x: int):
        self.x = x

class B(A):
    def __init__(self, x: int, y: int):
        super().__init__(x)
        self.y = y

Pydantized python:

class A(BaseModel):
    x: int

class B(A):
    y: int
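A quick usage sketch (pydantic v1, using the pydantized classes above); note that inputs are also coerced to the annotated types:

from pydantic import BaseModel

b = B(x=1, y="2")   # "2" is coerced to int
print(b)            # B(x=1, y=2)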
More semantics

Normal python:

class Vector:
    def __init__(self, r: float, θ: float):
        if r <= 0:
            raise ValueError(
                "Radius must be positive")
        if θ < 0 or 2*np.pi < θ:
            raise ValueError(
                "Angle outside [0,2π]")
        self.r = r
        self.θ = θ

Pydantized python:

class Vector(BaseModel):
    r: PositiveFloat
    θ: confloat(gt=0, lt=2*np.pi)

Vector.schema()

{'title': 'Vector',
 'type': 'object',
 'properties': {
     'r': {'title': 'R', 'type': 'number',
           'exclusiveMinimum': 0},
     'θ': {'title': 'Θ', 'type': 'number',
           'exclusiveMinimum': 0,
           'exclusiveMaximum': 6.283185307179586}},
 'required': ['r', 'θ']}
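Out-of-range inputs raise a single ValidationError that collects every failed check; a sketch with the Vector model above (pydantic v1):

from pydantic import ValidationError

try:
    Vector(r=-1, θ=7.0)
except ValidationError as e:
    print(e)   # reports both the r and θ failures (see the error output on a later slide)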
Composable models (via inheritance)

Normal python:

class Complex(Vector):
    def __init__(self, r: float, θ: float):
        θ = θ % (2*np.pi)
        super().__init__(r, θ)
    def conj(self):
        return Complex(self.r, -self.θ)

z = Complex(r=1, θ=0.75)
z.conj()
<__main__.Complex at 0x7fbf7c562580>

Pydantized python:

class Complex(Vector):
    @validator('θ', pre=True)
    def standardize_θ(cls, θ):
        return θ % (2*np.pi)
    def conj(self):
        return Complex(r=self.r, θ=-self.θ)

z = Complex(r=1, θ=0.75)
z.conj()
Complex(r=1.0, θ=5.533185307179586)
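Since the pre-validator runs on every instantiation, out-of-range angles are standardized automatically; a small sketch:

z2 = Complex(r=1, θ=7.0)   # 7 rad is outside (0, 2π)
print(z2.θ)                # ≈ 0.7168 (= 7 % 2π)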
Composable models (via composition)

Normal python:

class VectorBasis:
    def __init__(self, e):
        if not isinstance(e, Iterable):
            raise TypeError(
                "e is not iterable")
        if not isinstance(e, list):
            e = list(e)
        for ei in e:
            if not isinstance(ei, Vector):
                raise TypeError(
                    "e must be composed of Vector objects")
        self.e = e

basis = VectorBasis([z, z.conj()])
basis
<__main__.VectorBasis at 0x7fbf7c5263d0>

Pydantized python:

class VectorBasis(BaseModel):
    e: List[Vector]

basis = VectorBasis(e=[z, z.conj()])
basis
VectorBasis(e=[Complex(r=1.0, θ=0.75),
               Complex(r=1.0, θ=5.533185307179586)])
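Nested models also benefit from parsing: plain dicts are coerced into the nested model type (a sketch of pydantic v1 behaviour, with the models above):

basis2 = VectorBasis(e=[{'r': 1.0, 'θ': 0.5}, z])
print(basis2.e[0])   # Vector(r=1.0, θ=0.5): the dict was parsed into a Vector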
What is it for
TODO: Example real-world application
What is it for
- Defining parameter classes
- Parsing/coercing inputs (≠ validation)
- Declarative (not imperative) definition of class parameters
- Automatic input validation/coercion
- Export/import of model parameters → dict / JSON (customizable; see the sketch below)
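A sketch of the export/import round trip with the Vector model from earlier (pydantic v1 API):

v = Vector(r=1.0, θ=0.5)
v.dict()                    # {'r': 1.0, 'θ': 0.5}
s = v.json()                # serialize to a JSON string
v2 = Vector.parse_raw(s)    # reconstruct the model from JSON
assert v2 == v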
Existing alternatives: ParameterSet
- Extends dict with:
  - Hierarchical parameters
  - Save/load to a JSON-ish file
  - ParameterSpace
  - ParameterReference
- + Drop-in replacement for dict
- − Subclassing dict is somewhat hackish
- − Stale project

Ecell = ParameterSet({'tau_m': 10.0, 'cm': 0.2})
Icell = ParameterSet({'tau_m': 15.0, 'cm': 0.5})
network = ParameterSet({'Ecells': Ecell, 'Icells': Icell})
network.Icells.cm = 0.7
network.save("network.param")

network2 = ParameterSet("network.param")
print(network2.pretty())
{
    "Ecells": {
        "tau_m": 10.0,
        "cm": 0.2,
    },
    "Icells": {
        "tau_m": 15.0,
        "cm": 0.7,
    },
}
Existing alternatives: dataclasses
- Clean, compact model specification w/ types
- Proper class → attach methods
- Default repr()
- + Built-in to Python 3.7+
- − Need to write your own import/export
- − No type casting

from dataclasses import dataclass

@dataclass
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
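For comparison, pydantic also ships a drop-in dataclass decorator that adds validation/coercion to the very same definition (a sketch, pydantic v1):

from pydantic.dataclasses import dataclass

@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

item = InventoryItem(name="widget", unit_price="1.5")   # "1.5" is coerced to float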
Key advantages
- (sometimes) less boilerplate
- Type checking
- Separation of logic — One function per operation/check
- Clean subclassing — Interface not reproduced in subclass
- Self-documenting constraints
- Validation on both initialization and attribute assignment (latter is optional)
- Entirely cythonized
- Extremely fast development — 70 releases in < 3 years
- 95%? mature — New core features added, but few breaking changes.
- My impression: v2 will more or less freeze the API
- Separation of parsing/validation logic from task logic encourages more composable (⇒ reusable) code
Biggest disadvantage (that I found)
Debugging validators is somewhat different: the stack trace is less informative. This is because:
- Validators are all executed, and a summary of all errors is printed ⇒ the stack trace doesn't contain the validator;
- Validators are called from Cython code ⇒ inspection with pdb is more limited.
Adaptations:
- Learn to read Pydantic's error output
- Place debugging statements inside validators
ValidationError: 2 validation errors for Vector
r
  ensure this value is greater than 0
  (type=value_error.number.not_gt; limit_value=0)
θ
  ensure this value is less than 6.283185307179586
  (type=value_error.number.not_lt; limit_value=6.283185307179586)
How it works
Type annotations

from typing import List

Vector = List[float]

def scale(scalar: float, vector: Vector) -> Vector:
    return [scalar * num for num in vector]

Common types:
- Any
- List
- Tuple
- Optional

Pydantic recognizes e.g. both `list` and `List[…]`, with different meanings: a bare `list` only checks/coerces the value to a list, while `List[T]` also validates and coerces each element.
It also adds new types (e.g. PositiveFloat, confloat).
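A sketch of the difference (pydantic v1 semantics):

from typing import List
from pydantic import BaseModel

class M(BaseModel):
    a: list          # any list; items left untouched
    b: List[int]     # items are coerced to int

m = M(a=["1", 2.5], b=["1", 2.5])
m.a   # ['1', 2.5]
m.b   # [1, 2]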
How it works
Validator arguments

class UserModel(BaseModel):
    name: str
    username: str
    password1: str
    password2: str

    @validator('name')
    def name_must_contain_space(cls, v):
        if ' ' not in v:
            raise ValueError('must contain a space')
        return v.title()

    @validator('password2')
    def passwords_match(cls, v, values, **kwargs):
        if 'password1' in values and v != values['password1']:
            raise ValueError('passwords do not match')
        return v

    @validator('username')
    def username_alphanumeric(cls, v):
        assert v.isalnum(), 'must be alphanumeric'
        return v
How it works
Validator arguments
- Validators are "class methods", so the first argument value they receive is the UserModel class, not an instance of UserModel.
- The second argument is always the field value to validate; it can be named as you please.
- You can also add any subset of the following arguments to the signature (the names must match):
  - values: a dict containing the name-to-value mapping of any previously-validated fields
  - config: the model config
  - field: the field being validated
  - **kwargs: if provided, this will include the arguments above not explicitly listed in the signature
- Validators should either return the parsed value or raise a ValueError, TypeError, or AssertionError (assert statements may be used).
- Where validators rely on other values, you should be aware that:
  - Validation is done in the order fields are defined. E.g. in the example above, password2 has access to password1 (and name), but password1 does not have access to password2.
  - If validation fails on another field (or that field is missing), it will not be included in values; hence the if 'password1' in values and ... in this example (see the sketch below).
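A sketch of that last point: if password1 itself fails validation, passwords_match silently skips its comparison, because password1 never makes it into values:

from pydantic import ValidationError

try:
    UserModel(name="Jane Doe", username="janedoe",
              password1=["not", "a", "str"], password2="zebra")
except ValidationError as e:
    print(e)
# 1 validation error for UserModel: password1 → str type expected.
# No "passwords do not match" error is reported.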
How it works
Default values

class Vector(BaseModel):
    r: PositiveFloat = 1
    θ: confloat(gt=0, lt=2*np.pi) = 0

- Default values are not validated
  → You are responsible for the types of your own defaults
- Can set sentinel values of a different type, without `Optional[]`
- Use a `Field` to also specify alias, title, constraints and more.
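A sketch of the pitfall: the default θ = 0 violates gt=0, yet no error is raised, because defaults skip validation (pydantic v1; setting validate_all = True in Config validates defaults too):

Vector()      # OK, even though θ=0 violates gt=0: defaults are not validated
Vector(θ=0)   # ValidationError: explicitly passed values ARE validated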
How it works

class A(BaseModel):
    x: int
    s: str

    @validator('x', pre=True)
    def check_x(cls, v):
        if v < 0:   # See PositiveInt
            raise ValueError(
                "`x` must be positive")
        return v

    @validator('s')
    def check_s(cls, v):
        if len(v) > 10:   # See min_length
            raise ValueError(
                "`s` must not be longer than 10")
        return v

    def __new__(cls, **kwargs):
        return super().__new__(cls)

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

Execution order
- __new__()
- __init__() (before super())
- super().__init__() → BaseModel.__init__() (where (most) pydantic magic starts)
  - @root_validator(pre=True) (use this to e.g. set defaults)
  - For each parameter, in declaration order:
    - @validator(pre=True)
    - Automatic validation (coercion)
    - @validator
  - @root_validator()
- __init__() (after super())
How it works
Execution order

class Foo(BaseModel):
    a: int
    b: int

    @root_validator
    def root_val_post(cls, values):
        print("root val post")
        return values

    @root_validator(pre=True)
    def root_val_pre(cls, values):
        print("root val pre")
        return values

    @validator('a')
    def val_post_a(cls, a):
        print("val post a")
        return a

    @validator('b', pre=True)
    def val_pre_b(cls, b):
        print("val pre b")
        return b

    @validator('a', pre=True)
    def val_pre_a(cls, a):
        print("val pre a")
        return a

    @validator('b')
    def val_post_b(cls, b):
        print("val post b")
        return b

Foo(a=1, b=1)

root val pre
val pre a
val post a
val pre b
val post b
root val post
How it works
Execution order

# (same Foo definition as above)
Foo()
root val pre
root val post
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-29-fdea65d60c59> in <module>
----> 1 Foo()
~/usr/local/miniconda3/envs/sinn/lib/python3.8/site-packages/pydantic/main.cpython-38-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()
ValidationError: 2 validation errors for Foo
a
  field required (type=value_error.missing)
b
  field required (type=value_error.missing)
How it works
Execution order

# (same Foo definition as above)
Foo(a=[1,2], b=1)
root val pre
val pre a
val pre b
val post b
root val post
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-30-e67f50384fc4> in <module>
----> 1 Foo(a=[1,2], b=1)
~/usr/local/miniconda3/envs/sinn/lib/python3.8/site-packages/pydantic/main.cpython-38-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()
ValidationError: 1 validation error for Foo
a
  value is not a valid integer (type=type_error.integer)
How it works
Execution order

# (same Foo definition as above)
Foo(a=1, b=[1,2])
root val pre
val pre a
val post a
val pre b
root val post
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-31-ca8e22466f32> in <module>
----> 1 Foo(a=1, b=[1,2])
~/usr/local/miniconda3/envs/sinn/lib/python3.8/site-packages/pydantic/main.cpython-38-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()
ValidationError: 1 validation error for Foo
b
  value is not a valid integer (type=type_error.integer)
How it works
Execution order

# (same Foo definition as above)
Foo(a=[1,2], b=[1,2])
root val pre
val pre a
val pre b
root val post
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-32-fe07bec27756> in <module>
----> 1 Foo(a=[1,2], b=[1,2])
~/usr/local/miniconda3/envs/sinn/lib/python3.8/site-packages/pydantic/main.cpython-38-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()
ValidationError: 2 validation errors for Foo
a
  value is not a valid integer (type=type_error.integer)
b
  value is not a valid integer (type=type_error.integer)
How it works
Execution order – Summary

- Validators are all executed; errors are compiled at the end.
- @root_validator(pre=True) is executed before everything.
- @root_validator(pre=False) is executed after everything.
- pre/post validators for a given field are executed in immediate succession.
- Validator order is determined by annotation order.

Consequences:
- You know all of the failing inputs, not just the first.
- BUT: don't rely on earlier validators to sanitize inputs for later ones.
- Failing inputs are not added to the `values` dictionary.
- Recommendation: write validators in the order they will be executed.
How it works
Model configuration

class A(BaseModel):
    x: int
    data: CustomDataType

    @validator('x', pre=True)
    def check_x(cls, v):
        if v < 0:   # See PositiveInt
            raise ValueError("`x` must be positive")
        return v

    class Config:
        arbitrary_types_allowed = True
        fields = {"data": {"description": "Recording in time-mV"}}
        json_encoders = {CustomDataType: CustomDataType.json_encoder}

- Docs are still a bit disorganized w.r.t. Config options, especially the pros & cons of each.
- ⇒ Use the search bar. Also search GitHub issues.
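A minimal runnable sketch of Config in action, here using the fields option to attach a description that shows up in the generated schema (pydantic v1):

from pydantic import BaseModel

class A(BaseModel):
    x: int

    class Config:
        fields = {"x": {"description": "A positive integer"}}

A.schema()['properties']['x']['description']   # 'A positive integer'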
Patterns

Optional argument:

class Model(BaseModel):
    opt: int = 0

Optional argument, stays None:

class Model(BaseModel):
    opt: Optional[int]

Optional argument, computed default:

class Model(BaseModel):
    opt: float = Field(default_factory=time.time)

Optional argument, default depends on other params:

class Model(BaseModel):
    req: float
    opt: str = None

    @validator('opt', always=True, pre=True)
    def set_opt(cls, opt, values):
        if opt is not None:   # Explicitly passed value: keep it
            return opt
        req = values.get('req', None)
        return opt if req is None else str(req)
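Usage sketch for the last pattern:

Model(req=1.5)             # Model(req=1.5, opt='1.5'): default computed from req
Model(req=1.5, opt="a")    # Model(req=1.5, opt='a'):   explicit value is kept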
Patterns

Extracting multiple attributes:

@validator('user')
def check_user(cls, user, values):
    a, b, c = (values.get(x, None)
               for x in ('a', 'b', 'c'))
    ...

Internal variables which aren't part of the model:

class TestExtra(BaseModel):
    __slots__ = ('processed_at',)
    a: int

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        object.__setattr__(self, 'processed_at',
                           datetime.utcnow())

Overriding validators w/ __init__:

class Point(BaseModel):
    x: float
    y: float

    def __init__(self, desc=None, **kwargs):
        # Fill x & y from `desc` only if they weren't passed explicitly
        if isinstance(desc, Point):
            kwargs.setdefault('x', desc.x)
            kwargs.setdefault('y', desc.y)
        elif isinstance(desc, dict):
            kwargs.setdefault('x', desc['x'])
            kwargs.setdefault('y', desc['y'])
        super().__init__(**kwargs)

- Almost never the best solution, but sometimes the quickest.
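Usage sketch for the Point pattern above:

p = Point(x=1.0, y=2.0)
Point(p)                   # copy coordinates from another Point
Point({'x': 1, 'y': 2})    # ... or from a dict (values still coerced to float)
Point(x=3.0, y=4.0)        # the plain keyword form still works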
Finer points
Numpy types

class _ArrayType(np.ndarray):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate_type

    @classmethod
    def validate_type(cls, value, field):
        if isinstance(value, typing.NotCastableToArray):
            raise TypeError(f"Values of type {type(value)} cannot be casted "
                            "to a numpy array.")
        if isinstance(value, np.ndarray):
            # Don't create a new array unless necessary
            if cls._ndim is not None and value.ndim != cls._ndim:
                raise TypeError(f"{field.name} expects a variable with "
                                f"{cls._ndim} dimensions.")
            # issubdtype allows specifying abstract dtypes like 'number', 'floating'
            if np.issubdtype(value.dtype, cls.dtype):
                result = value
            elif np.can_cast(value, cls.dtype):
                result = value.astype(cls.dtype)
            else:
                raise TypeError(f"Cannot safely cast '{field.name}' type "
                                f"({value.dtype}) to type {cls.dtype}.")
        else:
            result = np.array(value)
            # issubdtype allows specifying abstract dtypes like 'number', 'floating'
            if np.issubdtype(result.dtype, cls.dtype):
                pass
            elif np.can_cast(result, cls.dtype):
                if cls._ndim is not None and result.ndim != cls._ndim:
                    raise TypeError(
                        f"The shape of the data ({result.shape}) does not "
                        "correspond to the expected number of dimensions "
                        f"({cls._ndim} for '{field.name}').")
                elif result.dtype != cls.dtype:
                    result = result.astype(cls.dtype)
            else:
                raise TypeError(f"Cannot safely cast '{field.name}' (type "
                                f"{result.dtype}) to type {cls.dtype}.")
        return result

    @classmethod
    def __modify_schema__(cls, field_schema):
        # FIXME: Figure out how to use the schema of the subfield
        field_schema.update(type='array',
                            items={'type': 'number'})

    @classmethod
    def json_encoder(cls, v):
        """See typing.json_encoders."""
        return v.tolist()

class _ArrayMeta(type):
    def __getitem__(self, args):
        if isinstance(args, tuple):
            T = args[0]
            ndim = args[1] if len(args) > 1 else None
            extraargs = args[2:]   # For catching errors only
        else:
            T = args
            ndim = None
            extraargs = []
        if (not isinstance(T, type) or len(extraargs) > 0
                or not isinstance(ndim, (int, type(None)))):
            raise TypeError(
                "`Array` must be specified as either `Array[T]` "
                "or `Array[T, n]`, where `T` is a type and `n` is an int. "
                f"(received: {', '.join(str(a) for a in args)}).")
        dtype = typing.convert_dtype(T)
        specifier = str(dtype)
        if ndim is not None:
            specifier += f",{ndim}"
        return type(f'Array[{specifier}]', (_ArrayType,),
                    {'dtype': dtype, '_ndim': ndim})

class Array(np.ndarray, metaclass=_ArrayMeta):
    """
    Use this to specify a NumPy array type annotation; `pydantic` will
    recognize the type and execute appropriate validation/parsing.

    This may become obsolete, or need to be updated, when NumPy officially
    supports type hints (see https://github.com/numpy/numpy-stubs).

    - `Array[T]` specifies an array with dtype `T`. Any expression for which
      `np.dtype(T)` is valid is accepted.
    - `Array[T,n]` specifies an array with dtype `T`, that must have exactly
      `n` dimensions.

    Example
    -------
    >>> from pydantic.dataclasses import dataclass
    >>> from mackelab_toolbox.typing import Array
    >>>
    >>> @dataclass
    ... class Model:
    ...     x: Array[np.float64]    # Array of 64-bit floats, any number of dimensions
    ...     v: Array['float64', 1]  # 1-D array of 64-bit floats
    """
    pass
Summary – advantages
- (sometimes) less boilerplate
- Type checking
- Separation of logic — One function per operation/check
- Clean subclassing — Interface not reproduced in subclass
- Self-documenting constraints
- Validation on both initialization and attribute assignment (latter is optional)
- Entirely cythonized
- Extremely fast development — 70 releases in < 3 years
- 95%? mature — New core features added, but few breaking changes.
- My impression: v2 will more or less freeze the API
- Separation of parsing/validation logic from task logic encourages more composable (⇒ reusable) code