Pydantic
- Less boilerplate
- More semantics
- Composable models
Alexandre René
PyMoTW – 29 May 2020
Less boilerplate
Normal python

class A:
    def __init__(self, x: int):
        self.x = x

class B(A):
    def __init__(self, x: int, y: int):
        super().__init__(x)  # y is removed from the arguments passed on to A
        self.y = y

Pydantized python

class A(BaseModel):
    x: int

class B(A):
    y: int
More semantics
Normal python

class Vector:
    def __init__(self, r: float, θ: float):
        if r <= 0:
            raise ValueError("Non-positive radius")
        if θ < 0 or 2*np.pi < θ:
            raise ValueError("Angle outside [0,2π]")
        self.r = r
        self.θ = θ

Pydantized python

class Vector(BaseModel):
    r: PositiveFloat
    θ: confloat(gt=0, lt=2*np.pi)

Vector.schema()

{'title': 'Vector',
 'type': 'object',
 'properties': {
     'r': {'title': 'R', 'type': 'number',
           'exclusiveMinimum': 0},
     'θ': {'title': 'Θ',
           'type': 'number',
           'exclusiveMinimum': 0,
           'exclusiveMaximum': 6.283185307179586}},
 'required': ['r', 'θ']}
Composable models
via inheritance

Normal python

class Complex(Vector):
    def __init__(self, r: float, θ: float):
        θ = θ % (2*np.pi)
        super().__init__(r, θ)
    def conj(self):
        return Complex(self.r, -self.θ)

z = Complex(r=1, θ=0.75)
z.conj()

<__main__.Complex at 0x7fbf7c562580>

Pydantized python

class Complex(Vector):
    @validator('θ', pre=True)
    def standardize_θ(cls, θ):
        return θ % (2*np.pi)
    def conj(self):
        return Complex(r=self.r, θ=-self.θ)

z = Complex(r=1, θ=0.75)
z.conj()

Complex(r=1.0, θ=5.533185307179586)
Composable models
via composition

Normal python

class VectorBasis:
    def __init__(self, e):
        if not isinstance(e, Iterable):
            raise TypeError("e is not iterable")
        if not isinstance(e, list):
            e = list(e)
        for ei in e:
            if not isinstance(ei, Vector):
                raise TypeError(
                    "e must be composed of Vector objects")
        self.e = e

basis = VectorBasis([z, z.conj()])
basis

<__main__.VectorBasis at 0x7fbf7c5263d0>

Pydantized python

class VectorBasis(BaseModel):
    e: List[Vector]

basis = VectorBasis(e=[z, z.conj()])
basis

VectorBasis(e=[Complex(r=1.0, θ=0.75),
               Complex(r=1.0, θ=5.533185307179586)])
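Composition also applies to parsing: nested plain data can be turned into nested models. A minimal sketch, assuming an ASCII field name `theta` in place of θ and omitting the angle constraint for brevity (both are simplifications for this example, not the models from the slides):

```python
from typing import List
from pydantic import BaseModel, PositiveFloat

class Vector(BaseModel):
    r: PositiveFloat
    theta: float            # ASCII stand-in for θ; constraint omitted for brevity

class VectorBasis(BaseModel):
    e: List[Vector]

# Plain dicts (e.g. loaded from a JSON file) are parsed into Vector instances
basis = VectorBasis(e=[{"r": 1, "theta": 0.75}, {"r": "2", "theta": 0}])
print(basis)
```

Each element of `basis.e` should come out as a `Vector`, with `"2"` coerced to a float.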
What it's for
TODO: Example real-world application
What it's for
- Defining parameter classes
- Parsing/coercing inputs (≠ validation)
- Declarative (not imperative) definition of class parameters
- Automatic input validation/coercion
- Export/import model parameters (customizable)
  → dict
  → JSON
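The export/import point can be sketched as a round trip. `CellParams` is a hypothetical parameter class made up for this example. The method names are those of the Pydantic 1.x API current at the time of this talk; later major versions keep them as deprecated aliases of `model_dump()`, `model_dump_json()` and `model_validate_json()`.

```python
from pydantic import BaseModel

class CellParams(BaseModel):      # hypothetical parameter class
    tau_m: float
    cm: float = 0.2

params = CellParams(tau_m="10")   # the string "10" is coerced to the float 10.0

as_dict = params.dict()           # export to a plain dict
as_json = params.json()           # export to a JSON string

restored = CellParams.parse_raw(as_json)   # import from JSON
print(as_dict)
print(restored == params)
```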
Existing alternatives
- Extends dict with
  - Hierarchical parameters
  - Save/load to JSON-ish file
  - ParameterSpace
  - ParameterReference
- (+) Drop-in replacement for dict
- (−) Subclassing dict is somewhat hackish
- (−) Stale project

Ecell = ParameterSet({'tau_m': 10.0, 'cm': 0.2})
Icell = ParameterSet({'tau_m': 15.0, 'cm': 0.5})
network = ParameterSet({'Ecells': Ecell, 'Icells': Icell})
network.Icells.cm = 0.7
network.save("network.param")
network2 = ParameterSet("network.param")
print(network2.pretty())

{
  "Ecells": {
    "tau_m": 10.0,
    "cm": 0.2,
  },
  "Icells": {
    "tau_m": 15.0,
    "cm": 0.7,
  },
}
Existing alternatives
- Clean, compact model specification w/ types
- Proper class
  → Attach methods
- Default repr()
- (+) Built-in to Python 3.7+
- (−) Need to write your own import/export
- (−) No type casting

from dataclasses import dataclass

@dataclass
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
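What "no type casting" means in practice, as a small sketch using only the standard library: a dataclass stores whatever it is given, so a wrongly typed argument goes unnoticed until it is used.

```python
from dataclasses import dataclass

@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

item = InventoryItem("widget", "3.5", 2)   # unit_price passed as a string by mistake
print(type(item.unit_price).__name__)      # str: the annotation is not enforced
print(item.total_cost())                   # 3.53.5 (string repetition, not 7.0)
```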
Key advantages
- (sometimes) less boilerplate
- Type checking
- Separation of logic — One function per operation/check
- Clean subclassing — Interface not reproduced in subclass
- Self-documenting constraints
- Validation on both initialization and attribute assignment (latter is optional)
- Entirely cythonized
- Extremely fast development — 70 releases in < 3 years
- 95%? mature — New core features added, but few breaking changes.
- My impression: v2 will more or less freeze the API
- Separation of parsing/validation logic from task logic encourages more composable (⇒ reusable) code
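Validation on attribute assignment is opt-in. A minimal sketch, assuming the class-based `Config` of Pydantic 1.x and a hypothetical `Radius` model (newer versions prefer a `model_config` dict but, as far as I know, still accept this form with a deprecation warning):

```python
from pydantic import BaseModel, PositiveFloat, ValidationError

class Radius(BaseModel):            # hypothetical model
    r: PositiveFloat

    class Config:
        validate_assignment = True  # also validate on `obj.r = ...`

c = Radius(r=1)
try:
    c.r = -1                        # invalid value: should be rejected
    rejected = False
except ValidationError:
    rejected = True
print(rejected, c.r)
```

Without `validate_assignment`, the assignment would go through unchecked.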
Biggest disadvantage (that I found)
Debugging validators is somewhat different; stack trace is less informative. This is because:
- Validators are all executed, and a summary of all errors is printed ⇒ stack trace doesn't contain the validator;
- Validators are called from Cython code ⇒ inspection with pdb more limited.
Adaptations:
- Learn to read Pydantic error output
- Place debugging statements inside validators
ValidationError: 2 validation errors for Vector
r
  ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)
θ
  ensure this value is less than 6.283185307179586 (type=value_error.number.not_lt; limit_value=6.283185307179586)
How it works
Type annotations
from typing import List

Vector = List[float]

def scale(scalar: float, vector: Vector) -> Vector:
    return [scalar * num for num in vector]

Common types:
- Any
- List
- Tuple
- Optional
Pydantic recognizes e.g. both `list` and `List`, with different meaning.
It also adds new types.
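The difference between `list` and `List`, as I understand it: the bare `list` only checks the container, while `List[int]` also parses every item. A small sketch with a hypothetical model:

```python
from typing import List
from pydantic import BaseModel

class Model(BaseModel):    # hypothetical model
    raw: list              # any list; items are left untouched
    typed: List[int]       # each item is parsed as an int

m = Model(raw=["1", 2], typed=["1", 2])
print(m.raw)
print(m.typed)
```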
How it works
Validator arguments
class UserModel(BaseModel):
    name: str
    username: str
    password1: str
    password2: str

    @validator('name')
    def name_must_contain_space(cls, v):
        if ' ' not in v:
            raise ValueError('must contain a space')
        return v.title()

    @validator('password2')
    def passwords_match(cls, v, values, **kwargs):
        if 'password1' in values and v != values['password1']:
            raise ValueError('passwords do not match')
        return v

    @validator('username')
    def username_alphanumeric(cls, v):
        assert v.isalnum(), 'must be alphanumeric'
        return v
How it works
Validator arguments
- validators are "class methods", so the first argument value they receive is the UserModel class, not an instance of UserModel.
- the second argument is always the field value to validate; it can be named as you please
- you can also add any subset of the following arguments to the signature (the names must match):
- values: a dict containing the name-to-value mapping of any previously-validated fields
- config: the model config
- field: the field being validated
- **kwargs: if provided, this will include the arguments above not explicitly listed in the signature
- validators should either return the parsed value or raise a ValueError, TypeError, or AssertionError (assert statements may be used).
- where validators rely on other values, you should be aware that:
  - Validation is done in the order fields are defined. E.g. in the example above, password2 has access to password1 (and name), but password1 does not have access to password2.
  - If validation fails on another field (or that field is missing) it will not be included in values, hence `if 'password1' in values and ...` in this example.
How it works
Default values
class Vector(BaseModel):
    r: PositiveFloat = 1
    θ: confloat(gt=0, lt=2*np.pi) = 0
- Default values are not validated
  You are responsible for the types of your own defaults
- Can set sentinel values of different type, without `Optional[]`
- Use a Field to specify also alias, title, constraints and more.
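That defaults are not validated can be checked with a deliberately invalid default. A minimal sketch with a hypothetical model, relying on Pydantic's default behaviour (no special configuration):

```python
from pydantic import BaseModel, PositiveFloat, ValidationError

class Model(BaseModel):       # hypothetical model
    x: PositiveFloat = -1     # invalid default, on purpose

print(Model().x)              # the default is used as-is, without validation

try:
    Model(x=-1)               # the same value passed explicitly is validated
    explicit_ok = True
except ValidationError:
    explicit_ok = False
print(explicit_ok)
```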
How it works
class A(BaseModel):
    x: int
    s: str

    @validator('x', pre=True)
    def check_x(cls, v):
        if v < 0:  # See PositiveInt
            raise ValueError("`x` must be positive")
        return v

    @validator('s')
    def check_s(cls, v):
        if len(v) > 10:  # See max_length
            raise ValueError("`s` must not be longer than 10")
        return v

    def __new__(cls, **kwargs):
        return super().__new__(cls)

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

Execution order
- __new__()
- __init__() (before super())
- super().__init__() → BaseModel.__init__(): where (most) pydantic magic starts
  - @root_validator(pre=True)
  - Repeats for each parameter, in declaration order:
    - @validator(pre=True) → Use this to set e.g. defaults.
    - Automatic validation (coercion)
    - @validator
  - @root_validator()
- __init__() (after super())
Execution order
How it works
class Foo(BaseModel):
    a: int
    b: int

    @root_validator
    def root_val_post(cls, values):
        print("root val post")
        return values

    @root_validator(pre=True)
    def root_val_pre(cls, values):
        print("root val pre")
        return values

    @validator('a')
    def val_post_a(cls, a):
        print("val post a")
        return a

    @validator('b', pre=True)
    def val_pre_b(cls, b):
        print("val pre b")
        return b

    @validator('a', pre=True)
    def val_pre_a(cls, a):
        print("val pre a")
        return a

    @validator('b')
    def val_post_b(cls, b):
        print("val post b")
        return b

Foo(a=1, b=1)
root val pre
val pre a
val post a
val pre b
val post b
root val post
Execution order
Foo()
root val pre
root val post
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-29-fdea65d60c59> in <module>
----> 1 Foo()
~/usr/local/miniconda3/envs/sinn/lib/python3.8/site-packages/pydantic/main.cpython-38-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()
ValidationError: 2 validation errors for Foo
a
  field required (type=value_error.missing)
b
  field required (type=value_error.missing)
Execution order
Foo(a=[1,2], b=1)
root val pre
val pre a
val pre b
val post b
root val post
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-30-e67f50384fc4> in <module>
----> 1 Foo(a=[1,2], b=1)
~/usr/local/miniconda3/envs/sinn/lib/python3.8/site-packages/pydantic/main.cpython-38-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()
ValidationError: 1 validation error for Foo
a
  value is not a valid integer (type=type_error.integer)
Execution order
Foo(a=1, b=[1,2])
root val pre
val pre a
val post a
val pre b
root val post
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-31-ca8e22466f32> in <module>
----> 1 Foo(a=1, b=[1,2])
~/usr/local/miniconda3/envs/sinn/lib/python3.8/site-packages/pydantic/main.cpython-38-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()
ValidationError: 1 validation error for Foo
b
  value is not a valid integer (type=type_error.integer)
Execution order
Foo(a=[1,2], b=[1,2])
root val pre
val pre a
val pre b
root val post
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-32-fe07bec27756> in <module>
----> 1 Foo(a=[1,2], b=[1,2])
~/usr/local/miniconda3/envs/sinn/lib/python3.8/site-packages/pydantic/main.cpython-38-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()
ValidationError: 2 validation errors for Foo
a
  value is not a valid integer (type=type_error.integer)
b
  value is not a valid integer (type=type_error.integer)
Execution order
How it works
Execution order – Summary
- Validators are all executed; errors are compiled at the end.
- @root_validator(pre=True) executed before everything
- @root_validator(pre=False) executed after everything
- pre/post for a given variable are executed in immediate succession
- validator order determined by annotation order
Consequences:
- Know all of the failing inputs, not just first
- BUT: Don't rely on earlier validators to sanitize inputs for later ones
  - Failing inputs are not added to `values` dictionary
- Recommendation: Write methods in order they will be executed
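The point about the `values` dictionary can be made concrete. A sketch with a hypothetical `Pair` model and a module-level `seen` list (both invented for this example), which records the earlier fields visible to the validator of `b`:

```python
from pydantic import BaseModel, ValidationError, validator

seen = []   # which earlier fields each call to the validator could see

class Pair(BaseModel):              # hypothetical model
    a: int
    b: int

    @validator('b')
    def record_values(cls, b, values):
        seen.append(sorted(values))
        return b

Pair(a=1, b=2)                      # `a` is valid, so it is visible when `b` is checked
try:
    Pair(a="not a number", b=2)     # `a` fails, yet the validator for `b` still runs
except ValidationError:
    pass
print(seen)
```

In the second call `values` is empty, which is why validators that depend on other fields should use `values.get(...)` or an `in` check.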
How it works
Model configuration
class A(BaseModel):
    x: int
    data: CustomDataType

    @validator('x', pre=True)
    def check_x(cls, v):
        if v < 0:  # See PositiveInt
            raise ValueError("`x` must be positive")
        return v

    class Config:
        allow = True
        fields = {"data": {"description": "Recording in time-mV"}}
        json_encoders = {CustomDataType: CustomDataType.json_encoder}
- Docs still a bit disorganized wrt Config options, especially the pros & cons of each.
- ⇒ Use the search bar. Search also Github issues.
Patterns
Optional argument

class Model(BaseModel):
    opt: int = 0

Optional argument, stays None

class Model(BaseModel):
    opt: Optional[int]

Optional argument, computed default

class Model(BaseModel):
    # Pass the function itself (not its result), and no `...` default
    opt: float = Field(default_factory=time.time)

Optional argument, default depends on other params

class Model(BaseModel):
    req: float
    opt: str = None

    @validator('opt', always=True, pre=True)
    def set_opt(cls, opt, values):
        req = values.get('req', None)
        if req is None: return opt
        return str(req)
Patterns
Extracting multiple attributes

@validator('user')
def check_user(cls, user, values):
    a, b, c = (values.get(x, None)
               for x in ('a', 'b', 'c'))
    ...

Internal variables which aren't part of the model

class TestExtra(BaseModel):
    __slots__ = ('processed_at',)
    a: int

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        object.__setattr__(self, 'processed_at',
                           datetime.utcnow())

Overriding validators w/ __init__

class Point(BaseModel):
    x: float
    y: float

    def __init__(self, desc=None, **kwargs):
        # `desc` may be a Point or a dict; explicit keywords take precedence
        if isinstance(desc, Point):
            desc = {'x': desc.x, 'y': desc.y}
        if isinstance(desc, dict):
            kwargs.setdefault('x', desc['x'])
            kwargs.setdefault('y', desc['y'])
        super().__init__(**kwargs)

- Almost never the best solution, but sometimes the quickest.
Finer points
Numpy types
# Note: `typing.NotCastableToArray` and `typing.convert_dtype` are not part of
# the standard library `typing` module; they appear to come from
# mackelab_toolbox.typing (see the docstring of `Array` below).

class _ArrayType(np.ndarray):

    @classmethod
    def __get_validators__(cls):
        yield cls.validate_type

    @classmethod
    def validate_type(cls, value, field):
        if isinstance(value, typing.NotCastableToArray):
            raise TypeError(f"Values of type {type(value)} cannot be cast "
                            "to a numpy array.")
        if isinstance(value, np.ndarray):
            # Don't create a new array unless necessary
            if cls._ndim is not None and value.ndim != cls._ndim:
                raise TypeError(f"{field.name} expects a variable with "
                                f"{cls._ndim} dimensions.")
            # issubdtype allows specifying abstract dtypes like 'number', 'floating'
            if np.issubdtype(value.dtype, cls.dtype):
                result = value
            elif np.can_cast(value, cls.dtype):
                result = value.astype(cls.dtype)
            else:
                raise TypeError(f"Cannot safely cast '{field.name}' type "
                                f"({value.dtype}) to type {cls.dtype}.")
        else:
            result = np.array(value)
            # issubdtype allows specifying abstract dtypes like 'number', 'floating'
            if np.issubdtype(result.dtype, cls.dtype):
                pass
            elif np.can_cast(result, cls.dtype):
                if cls._ndim is not None and result.ndim != cls._ndim:
                    raise TypeError(
                        f"The shape of the data ({result.shape}) does not "
                        "correspond to the expected number of dimensions "
                        f"({cls._ndim} for '{field.name}').")
                elif result.dtype != cls.dtype:
                    result = result.astype(cls.dtype)
            else:
                raise TypeError(f"Cannot safely cast '{field.name}' (type "
                                f"{result.dtype}) to type {cls.dtype}.")
        return result

    @classmethod
    def __modify_schema__(cls, field_schema):
        # FIXME: Figure out how to use get schema of subfield
        field_schema.update(type='array', items={'type': 'number'})

    @classmethod
    def json_encoder(cls, v):
        """See typing.json_encoders."""
        return v.tolist()

class _ArrayMeta(type):
    def __getitem__(self, args):
        if isinstance(args, tuple):
            T = args[0]
            ndim = args[1] if len(args) > 1 else None
            extraargs = args[2:]  # For catching errors only
        else:
            T = args
            ndim = None
            extraargs = []
        if (not isinstance(T, type) or len(extraargs) > 0
                or not isinstance(ndim, (int, type(None)))):
            raise TypeError(
                "`Array` must be specified as either `Array[T]` "
                "or `Array[T, n]`, where `T` is a type and `n` is an int. "
                f"(received: [{', '.join((str(a) for a in args))}]).")
        dtype = typing.convert_dtype(T)
        specifier = str(dtype)
        if ndim is not None:
            specifier += f",{ndim}"
        return type(f'Array[{specifier}]', (_ArrayType,),
                    {'dtype': dtype, '_ndim': ndim})

class Array(np.ndarray, metaclass=_ArrayMeta):
    """
    Use this to specify a NumPy array type annotation; `pydantic` will
    recognize the type and execute appropriate validation/parsing.

    This may become obsolete, or need to be updated, when NumPy officially
    supports type hints (see https://github.com/numpy/numpy-stubs).

    - `Array[T]` specifies an array with dtype `T`. Any expression for which
      `np.dtype(T)` is valid is accepted.
    - `Array[T,n]` specifies an array with dtype `T`, that must have exactly
      `n` dimensions.

    Example
    -------
    >>> from pydantic.dataclasses import dataclass
    >>> from mackelab_toolbox.typing import Array
    >>>
    >>> @dataclass
    >>> class Model:
    >>>     x: Array[np.float64]      # Array of 64-bit floats, any number of dimensions
    >>>     v: Array['float64', 1]    # 1-D array of 64-bit floats
    """
    pass
Summary – advantages
- (sometimes) less boilerplate
- Type checking
- Separation of logic — One function per operation/check
- Clean subclassing — Interface not reproduced in subclass
- Self-documenting constraints
- Validation on both initialization and attribute assignment (latter is optional)
- Entirely cythonized
- Extremely fast development — 70 releases in < 3 years
- 95%? mature — New core features added, but few breaking changes.
- My impression: v2 will more or less freeze the API
- Separation of parsing/validation logic from task logic encourages more composable (⇒ reusable) code
Pydantic intro
By alexrene
An introduction to Pydantic for scientific applications