Pydantic
What's coming in V2
&
A review of building Python extensions with Rust
Please ask questions as I go along. I've written this in a hurry, so if you're confused, you're probably not alone.
Some Background
- Pydantic does data validation using type hints
- It was first released in 2017 as a tiny experiment
- The library has since seen massive growth ... particularly since Sebastián used pydantic in FastAPI. Thanks Sebastián! 🙏
- Pydantic hasn't been significantly rewritten since v0.0.1
- The internals are creaking
- V2 is an opportunity to fix some of the footguns, and also to rewrite the internals
What's changing in V2
The good bits
Everything! (and hopefully nothing for you):
- all validation is offloaded to another library, pydantic-core, which I'm building now
- Ugly hangovers from mistakes I made while building v0.0.1 can be killed
- We can use a little of the speed premium provided by pydantic-core to do some things more correctly, e.g. smart unions
- I've given up on the "validation" vs. "parsing" vs. "coercion" debate, I'm just using "validate" - if you don't like it 🖕 (or just use strict=True)
What's changing in V2
The bad bits
"You can't make a Tomlette without breaking a few Gregs!" (Succession S2E9) - best TV pun ever?
I'm going to have to upset some people to get V2 out:
- a bunch of PRs will have to be closed - users can come back and restart them once the meat of the V2 changes is in main
- Some functionality will change, I think for the better, but some people will disagree
- Some hacks and workarounds will no longer be possible, as the logic is in Rust where you can't mess with it
- no more subclasses of basic types (str, bytes) - except via wrap validators
- (initially) no pure Python implementation of pydantic-core
New Features
Wrap Validators
(Implemented, but not with this nice syntactic sugar)
AKA "The onion" - like middleware
from datetime import datetime

from pydantic import BaseModel, ValidationError, validator

class MyModel(BaseModel):
    appointment_time: datetime | None

    @validator('appointment_time', mode='wrap')
    def validate_appointment_time(cls, v, handler):
        if v == 'now':
            return datetime.now()
        try:
            return handler(v)
        except ValidationError:
            # we don't want to fail, so just use None
            return None
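A wrap validator receives the raw value plus a `handler` that runs the next layer in, which is why it's "the onion". The pure-Python sketch below shows that layering without assuming anything about pydantic's internals: `wrap`, `validate_datetime`, `strip_whitespace` and the local `ValidationError` are hypothetical stand-ins for illustration only.

```python
from datetime import datetime
from typing import Any, Callable

Validator = Callable[[Any], Any]
WrapFunc = Callable[[Any, Validator], Any]


class ValidationError(ValueError):
    """Hypothetical stand-in for pydantic's ValidationError (illustration only)."""


def validate_datetime(v: Any) -> datetime:
    # innermost layer: accept datetime instances or ISO 8601 strings
    if isinstance(v, datetime):
        return v
    if isinstance(v, str):
        try:
            return datetime.fromisoformat(v)
        except ValueError:
            raise ValidationError(f'invalid datetime string: {v!r}') from None
    raise ValidationError(f'invalid datetime: {v!r}')


def wrap(outer: WrapFunc, inner: Validator) -> Validator:
    # build a new layer: `outer` sees the raw value and decides if/when to call `inner`
    return lambda v: outer(v, inner)


def appointment_time(v: Any, handler: Validator) -> Any:
    # mirrors validate_appointment_time from the slide
    if v == 'now':
        return datetime.now()
    try:
        return handler(v)
    except ValidationError:
        # we don't want to fail, so just use None
        return None


def strip_whitespace(v: Any, handler: Validator) -> Any:
    # an extra outer layer: tidy strings before passing them inwards
    if isinstance(v, str):
        v = v.strip()
    return handler(v)


# layers from outside in: strip_whitespace -> appointment_time -> validate_datetime
validate = wrap(strip_whitespace, wrap(appointment_time, validate_datetime))
```

Each layer can short-circuit (returning early for `'now'`), transform the input before calling inwards, or catch errors raised by inner layers, which is what makes it behave like middleware.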
New Features
Strict Mode
(Implemented, but not with this nice syntactic sugar)
(Does not apply when "downcasting" from JSON)
from typing import Set

from pydantic import BaseModel, Field, StrictStr

class StrictModel(BaseModel, strict=True):
    a_string: str
    an_int: int
    set_of_ints: Set[int]

class LaxModel(BaseModel):
    lax_string: str
    strict_string1: str = Field(..., strict=True)
    strict_string2: StrictStr
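Strict mode changes which inputs each type accepts rather than introducing new types. A rough pure-Python sketch of the idea, with a model-level default that individual fields can override (like `strict=True` on `StrictModel` vs. `Field(..., strict=True)` on `LaxModel`); `validate_int`, `validate_str`, `validate_fields` and the specific coercion rules are hypothetical and are not pydantic-core's exact behaviour.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class ValidationError(ValueError):
    """Hypothetical stand-in for pydantic's ValidationError."""


def validate_int(v: Any, strict: bool) -> int:
    # `type(v) is int` rather than isinstance: bool is a subclass of int
    if type(v) is int:
        return v
    if strict:
        raise ValidationError(f'expected int, got {type(v).__name__}')
    # lax mode: coerce a few unambiguous inputs
    if isinstance(v, str):
        try:
            return int(v.strip())
        except ValueError:
            raise ValidationError(f'invalid int: {v!r}') from None
    if isinstance(v, float) and v.is_integer():
        return int(v)
    raise ValidationError(f'invalid int: {v!r}')


def validate_str(v: Any, strict: bool) -> str:
    if type(v) is str:
        return v
    if strict:
        raise ValidationError(f'expected str, got {type(v).__name__}')
    # lax mode: decode bytes as UTF-8
    if isinstance(v, bytes):
        try:
            return v.decode()
        except UnicodeDecodeError:
            raise ValidationError(f'invalid UTF-8: {v!r}') from None
    raise ValidationError(f'invalid str: {v!r}')


# field name -> (validator, per-field strict flag; None means "use the model default")
FieldSpec = Tuple[Callable[[Any, bool], Any], Optional[bool]]


def validate_fields(
    fields: Dict[str, FieldSpec], data: Dict[str, Any], strict: bool = False
) -> Dict[str, Any]:
    return {
        name: validator(data[name], strict if field_strict is None else field_strict)
        for name, (validator, field_strict) in fields.items()
    }


lax_model_fields: Dict[str, FieldSpec] = {
    'lax_string': (validate_str, None),
    'strict_string': (validate_str, True),
    'an_int': (validate_int, None),
}
```

Calling `validate_fields(lax_model_fields, data)` behaves like `LaxModel`; passing `strict=True` behaves like `StrictModel` for every field without its own flag.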
New Features
Smart Union
(Implemented, but not with this nice syntactic sugar)
(Using strict mode)
(Also works with models and model instances)
from pydantic import BaseModel

class MyModel(BaseModel):
    int_or_bool: int | bool
    bool_or_int: bool | int

print(MyModel(int_or_bool=1, bool_or_int=1))  #> 1, 1
print(MyModel(int_or_bool=True, bool_or_int=True))  #> True, True
print(MyModel(int_or_bool='1', bool_or_int='1'))  #> 1, True :-( ?
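A smart union tries every member in strict mode first and only falls back to lax mode, in declaration order, if nothing matches exactly. A minimal sketch of that two-pass strategy which reproduces the three outputs on the slide; the strict/lax validator pairs and `smart_union` are hypothetical simplifications, not pydantic-core's code.

```python
from typing import Any, Callable, List, Tuple


class ValidationError(ValueError):
    """Hypothetical stand-in for pydantic's ValidationError."""


def strict_int(v: Any) -> int:
    if type(v) is int:  # rejects bool, which subclasses int
        return v
    raise ValidationError('not an int')


def lax_int(v: Any) -> int:
    if isinstance(v, (int, str)):
        try:
            return int(v)
        except ValueError:
            pass
    raise ValidationError('cannot coerce to int')


def strict_bool(v: Any) -> bool:
    if type(v) is bool:
        return v
    raise ValidationError('not a bool')


def lax_bool(v: Any) -> bool:
    if v in (1, '1', 'true', 'yes', 'on'):
        return True
    if v in (0, '0', 'false', 'no', 'off'):
        return False
    raise ValidationError('cannot coerce to bool')


Choice = Tuple[Callable[[Any], Any], Callable[[Any], Any]]  # (strict, lax)
INT: Choice = (strict_int, lax_int)
BOOL: Choice = (strict_bool, lax_bool)


def smart_union(choices: List[Choice], v: Any) -> Any:
    # pass 1: the first member that matches exactly (strict mode) wins
    for strict, _ in choices:
        try:
            return strict(v)
        except ValidationError:
            pass
    # pass 2: fall back to lax mode, in declaration order
    for _, lax in choices:
        try:
            return lax(v)
        except ValidationError:
            pass
    raise ValidationError(f'{v!r} does not match any union member')
```

The last case is the ":-( ?" on the slide: `'1'` is an exact match for neither member, so declaration order decides.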
New Features
Optional vs. Nullable
(Implemented, but not with this nice syntactic sugar)
(I'm no longer scared of the word "optional")
from pydantic import BaseModel

class MyModel1(BaseModel):
    none_allowed_required: str | None
    none_allowed_not_required: str | None = None

from typing_extensions import Required, NotRequired

class MyModel2(BaseModel):  # ... maybe, do you want this?
    none_allowed_required: Required[str]
    none_allowed_not_required: NotRequired[str]
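The distinction is between a key being absent (required vs. not required) and the value being None (nullable or not), which `MyModel1` expresses as "no default" vs. "default None". A minimal sketch, assuming a hypothetical `validate_field` helper and a `MISSING` sentinel so that "not supplied" and "supplied as None" can be told apart.

```python
from typing import Any, Dict


class ValidationError(ValueError):
    """Hypothetical stand-in for pydantic's ValidationError."""


MISSING = object()  # sentinel meaning "no value supplied", distinct from None


def validate_field(
    data: Dict[str, Any], name: str, *, nullable: bool, default: Any = MISSING
) -> Any:
    value = data.get(name, MISSING)
    if value is MISSING:
        # required-ness: only fields with a default may be omitted
        if default is MISSING:
            raise ValidationError(f'{name}: field required')
        return default
    if value is None:
        # nullability: only nullable fields accept an explicit None
        if not nullable:
            raise ValidationError(f'{name}: None is not allowed')
        return None
    if not isinstance(value, str):
        raise ValidationError(f'{name}: expected str')
    return value
```

`none_allowed_required` corresponds to `nullable=True` with no default; `none_allowed_not_required` to `nullable=True, default=None`.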
New Features
Validation without a model
(Implemented, but not with this nice syntactic sugar)
No intermediate model required (unlike parse_obj_as in V1)
from typing import List

from pydantic import validate

validate(List[int], [1, 2, '3'])  #> [1, 2, 3]
validate(List[int], [1, 2, 3], strict=True)  #> [1, 2, 3]
validate(List[int], [1, 2, '3'], strict=True)
#> raises ValidationError
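Validating directly against a type hint means walking the hint itself rather than a model's fields. A minimal pure-Python sketch using `typing.get_origin`/`typing.get_args` (Python 3.8+); `validate_as` is a hypothetical name chosen so it isn't mistaken for pydantic's `validate`, and only `int`, `str`, `List[...]` and `Dict[...]` are handled.

```python
from typing import Any, Dict, List, get_args, get_origin


class ValidationError(ValueError):
    """Hypothetical stand-in for pydantic's ValidationError."""


def validate_as(tp: Any, value: Any, strict: bool = False) -> Any:
    origin = get_origin(tp)
    if origin is list:
        (item_tp,) = get_args(tp)
        if not isinstance(value, list):
            raise ValidationError(f'expected list, got {type(value).__name__}')
        return [validate_as(item_tp, item, strict) for item in value]
    if origin is dict:
        key_tp, value_tp = get_args(tp)
        if not isinstance(value, dict):
            raise ValidationError(f'expected dict, got {type(value).__name__}')
        return {
            validate_as(key_tp, k, strict): validate_as(value_tp, v, strict)
            for k, v in value.items()
        }
    if tp is int:
        if type(value) is int:
            return value
        # lax mode only: accept numeric strings
        if not strict and isinstance(value, str):
            try:
                return int(value)
            except ValueError:
                pass
        raise ValidationError(f'invalid int: {value!r}')
    if tp is str:
        if type(value) is str:
            return value
        raise ValidationError(f'invalid str: {value!r}')
    raise TypeError(f'unsupported type hint: {tp!r}')
```

Nested hints work for free because each branch recurses on its argument types.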
New Features
Parsing JSON directly
(Implemented, but not with this nice syntactic sugar)
No json.loads - just Rust JSON parsing straight into validation
- We could add support for other formats (e.g. YAML, TOML); the only side effect would be bigger binaries
- Not yet possible to get the line number :-(
from typing import Dict, List

from pydantic import BaseModel

class MyModel(BaseModel):
    name: str
    age: int
    friends: List[int]
    settings: Dict[str, float]

MyModel.validate_json('{...}')
New Features
Speed
| Benchmark | Speed-up |
|---|---|
| Simple model (str, int, List[int], Dict[str, float]) | 15.97x |
| Simple model - JSON | 11.56x |
| A bool (single value) | 3.46x |
| Recursive model, 50 deep | 3.99x |
| List of typed dicts, length 100 | 12.14x |
| List of ints, length 1000 | 25.49x |
New Features
And more...
- hopefully 🤞 less "performance guilt"™
- context kwarg to validator functions
- input value is available in errors
- cleaning up the namespace, so you can use fields like "json", "dict" and "fields" and, more importantly, so we can add more methods in future, either:
    - all methods will have a prefix, e.g. ".model_dict()", ".model_json()", ".model_schema()"
    - or, a namespace object: ".m.dict()", ".m.json()"
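The namespace-object option can be sketched with a plain descriptor: only one attribute name (`m`) is reserved on the model, and every method lives on the object it returns. `Model`, `_ModelNamespace` and `_NamespaceDescriptor` are hypothetical names for illustration; this is not how pydantic implements models.

```python
import json
from typing import Any, Dict


class _ModelNamespace:
    """Holds the model's methods, bound to one instance."""

    def __init__(self, instance: Any) -> None:
        self._instance = instance

    def dict(self) -> Dict[str, Any]:
        # vars() returns the instance __dict__, i.e. the field values
        return dict(vars(self._instance))

    def json(self) -> str:
        return json.dumps(self.dict())


class _NamespaceDescriptor:
    def __get__(self, instance: Any, owner: type) -> Any:
        if instance is None:
            return self  # accessed on the class itself
        return _ModelNamespace(instance)


class Model:
    m = _NamespaceDescriptor()  # the only reserved attribute name

    def __init__(self, **fields: Any) -> None:
        for name, value in fields.items():
            setattr(self, name, value)


obj = Model(json='x', dict=1, fields=[1])
```

Fields called `json`, `dict` or `fields` no longer collide with methods; the trade-off is that a field named `m` would shadow the namespace, whereas the prefix option reserves a whole family of names instead.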
Python extensions in Rust
pydantic-core internals
Let's start simple
(This code actually runs now!)
from pydantic_core import SchemaValidator

schema_validator = SchemaValidator({'type': 'bool'})
print(repr(schema_validator))
#> SchemaValidator(name="bool", validator=BoolValidator)
print(schema_validator.validate_python(True))  #> True
print(schema_validator.validate_python(1))  #> True
print(schema_validator.validate_json('true'))  #> True
Python extensions in Rust
pydantic-core internals
Let's get a bit more complicated
from pydantic_core import SchemaValidator

# Equivalent to: Dict[str, Optional[int]]
schema_validator = SchemaValidator({
    'type': 'dict',
    'keys': {'type': 'str'},
    'values': {'type': 'optional', 'schema': {'type': 'int'}},
})
print(repr(schema_validator))
SchemaValidator(name="dict", validator=DictValidator {
    strict: false,
    key_validator: Some(StrValidator),
    value_validator: Some(
        OptionalValidator {validator: IntValidator},
    ),
    min_items: None,
    max_items: None,
    try_instance_as_dict: false,
})
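The repr shows that each schema dict is compiled once into a tree of validators, and validation then just walks that tree. A pure-Python sketch of the same idea, assuming only the schema keys shown above (`type`, `keys`, `values`, `schema`); this `build_validator` is a hypothetical Python stand-in, not pydantic-core's Rust function, and it skips strict mode and structured errors.

```python
from typing import Any, Callable, Dict

Validator = Callable[[Any], Any]


class ValidationError(ValueError):
    """Hypothetical stand-in for pydantic's ValidationError."""


def build_validator(schema: Dict[str, Any]) -> Validator:
    # compile the schema once; nested schemas become nested closures
    type_ = schema['type']
    if type_ == 'str':
        def validate_str(v: Any) -> str:
            if isinstance(v, str):
                return v
            raise ValidationError(f'expected str, got {v!r}')
        return validate_str
    if type_ == 'int':
        def validate_int(v: Any) -> int:
            if type(v) is int:
                return v
            if isinstance(v, str):
                try:
                    return int(v)
                except ValueError:
                    pass
            raise ValidationError(f'expected int, got {v!r}')
        return validate_int
    if type_ == 'optional':
        inner = build_validator(schema['schema'])
        return lambda v: None if v is None else inner(v)
    if type_ == 'dict':
        # like Option<...> in the repr: missing 'keys'/'values' means "accept anything"
        key_validator = build_validator(schema['keys']) if 'keys' in schema else (lambda k: k)
        value_validator = build_validator(schema['values']) if 'values' in schema else (lambda x: x)

        def validate_dict(v: Any) -> Dict[Any, Any]:
            if not isinstance(v, dict):
                raise ValidationError(f'expected dict, got {v!r}')
            return {key_validator(k): value_validator(x) for k, x in v.items()}
        return validate_dict
    raise ValueError(f'unknown schema type: {type_!r}')


validator = build_validator({
    'type': 'dict',
    'keys': {'type': 'str'},
    'values': {'type': 'optional', 'schema': {'type': 'int'}},
})
```

All the schema interpretation happens in `build_validator`; calling `validator(...)` never looks at the schema dict again.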
Python extensions in Rust
pydantic-core internals
And finally...
(you don't need to read all this)
class MyModel(BaseModel):
    name: str
    age: int | None = 42
    settings: dict[str, float]
    friends: list[int | str]
SchemaValidator(name="MyCoreModel",
    validator=ModelClassValidator {
        strict: false,
        class: Py(0x12fe7e7c0), (MyCoreModel)
        new_method: Py(0x00101054130), (MyCoreModel.__new__)
        validator: ModelValidator {
            name: "Model",
            fields: [
                ModelField {
                    name: "name",
                    default: None,
                    validator: StrValidator,
                },
                ModelField {
                    name: "age",
                    default: 42,
                    validator: OptionalValidator {
                        validator: IntValidator
                    },
                },
                ModelField {
                    name: "settings",
                    default: None,
                    validator: DictValidator { ... },
                },
                ModelField {
                    name: "friends",
                    default: None,
                    validator: ListValidator {
                        strict: false,
                        item_validator: Some(
                            UnionValidator {
                                choices: [
                                    IntValidator,
                                    StrValidator,
                                ],
                            },
                        ),
                        min_items: None,
                        max_items: None,
                    },
                },
            ],
            extra_behavior: Ignore,
            extra_validator: None,
        },
    })
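Stripped of the Rust, the `ModelValidator` above is a list of (name, default, validator) entries plus a rule for extra keys. A hypothetical pure-Python sketch of that loop, assuming `extra_behavior: Ignore` means unknown keys are dropped and that a field with no default is required; the helper validators and `MY_MODEL_FIELDS` are simplified stand-ins for the `MyModel` fields.

```python
from typing import Any, Callable, Dict, List, NamedTuple

Validator = Callable[[Any], Any]
MISSING = object()  # sentinel: the field has no default, so it is required


class ValidationError(ValueError):
    """Hypothetical stand-in for pydantic's ValidationError."""


class ModelField(NamedTuple):
    name: str
    validator: Validator
    default: Any = MISSING


def validate_str(v: Any) -> str:
    if isinstance(v, str):
        return v
    raise ValidationError(f'expected str, got {v!r}')


def validate_int(v: Any) -> int:
    if type(v) is int:
        return v
    raise ValidationError(f'expected int, got {v!r}')


def validate_float(v: Any) -> float:
    if isinstance(v, (int, float)) and not isinstance(v, bool):
        return float(v)
    raise ValidationError(f'expected float, got {v!r}')


def optional(inner: Validator) -> Validator:
    return lambda v: None if v is None else inner(v)


def dict_of(key: Validator, value: Validator) -> Validator:
    def validate(v: Any) -> Dict[Any, Any]:
        if not isinstance(v, dict):
            raise ValidationError(f'expected dict, got {v!r}')
        return {key(k): value(x) for k, x in v.items()}
    return validate


def list_of(item: Validator) -> Validator:
    def validate(v: Any) -> List[Any]:
        if not isinstance(v, list):
            raise ValidationError(f'expected list, got {v!r}')
        return [item(x) for x in v]
    return validate


def union(*choices: Validator) -> Validator:
    # simple left-to-right union (not the "smart" strict-then-lax version)
    def validate(v: Any) -> Any:
        for choice in choices:
            try:
                return choice(v)
            except ValidationError:
                pass
        raise ValidationError(f'{v!r} matched no union member')
    return validate


def validate_model(fields: List[ModelField], data: Dict[str, Any]) -> Dict[str, Any]:
    result: Dict[str, Any] = {}
    errors: List[str] = []
    for field in fields:
        if field.name in data:
            try:
                result[field.name] = field.validator(data[field.name])
            except ValidationError as e:
                errors.append(f'{field.name}: {e}')
        elif field.default is not MISSING:
            result[field.name] = field.default
        else:
            errors.append(f'{field.name}: field required')
    if errors:
        # report every failing field at once rather than stopping at the first
        raise ValidationError('; '.join(errors))
    # keys in `data` that aren't fields are never copied: "extra_behavior: Ignore"
    return result


MY_MODEL_FIELDS = [
    ModelField('name', validate_str),
    ModelField('age', optional(validate_int), default=42),
    ModelField('settings', dict_of(validate_str, validate_float)),
    ModelField('friends', list_of(union(validate_int, validate_str))),
]
```

Collecting all field errors before raising, instead of failing on the first one, is the behaviour pydantic users are used to seeing in a `ValidationError`.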
Writing Python extensions in Rust
The obvious things
Many other people have said all this, and there are many (much better) talks about it.
But for completeness, the good:
- Speed is the biggest win
- It provides a way to hook into existing great libraries written in Rust, e.g. rtoml and watchfiles - when is someone going to do this for ASGI?
- Rust's error handling makes it easy to catch and deal with errors
- Rust provides excellent primitives for threading
- PyO3 is amazing; getting started is very easy
The bad:
- Writing Rust will always be slower than writing Python
- there's a big learning curve
The ugly:
- Fighting the borrow checker is boring
- There will be boilerplate; macros to avoid boilerplate can be even worse...
Writing Python extensions in Rust
The less obvious things
- "Performance guilt":
    - With Rust there's no real penalty for recursion
    - and no penalty for small functions
    - so ... you can build more modular code without paying a performance penalty
- The theory is that someone can come along in 5 years' time and add another type to pydantic-core, and:
    - The type checker and linter will stop them doing dumb things
    - There will be zero runtime penalty if you don't use it
    - Their change can be small, since there's no performance penalty for calling out to existing code
- Contributions ... ?
    - Will there be fewer? Will they be "better", or worse?
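To make "performance guilt" concrete on the Python side: in CPython every extra function call has a measurable cost, so splitting logic into small functions isn't free. A rough timing sketch with made-up function names; absolute numbers vary by machine and Python version, so none are shown. An optimising compiler such as rustc will typically inline small calls like these, which is the contrast the slide draws.

```python
import timeit


def is_positive(x: int) -> bool:
    return x > 0


def is_small(x: int) -> bool:
    return x < 1_000


def validate_modular(x: int) -> bool:
    # "modular" style: each check is its own small function
    return is_positive(x) and is_small(x)


def validate_inlined(x: int) -> bool:
    # the same logic written inline, avoiding two extra function calls
    return 0 < x < 1_000


if __name__ == '__main__':
    for fn in (validate_modular, validate_inlined):
        seconds = timeit.timeit(lambda: fn(500), number=1_000_000)
        print(f'{fn.__name__}: {seconds:.3f}s per million calls')
```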
Writing Python extensions in Rust, example... (please don't cry)
#[derive(Debug, Clone)]
pub struct OptionalValidator {
    validator: Box<dyn Validator>,
}

impl OptionalValidator {
    pub const EXPECTED_TYPE: &'static str = "optional";
}

impl Validator for OptionalValidator {
    fn build(schema: &PyDict, config: Option<&PyDict>) -> PyResult<Box<dyn Validator>> {
        let schema: &PyAny = schema.get_as_req("schema")?;
        Ok(Box::new(Self {
            validator: build_validator(schema, config)?.0,
        }))
    }

    fn validate<'s, 'data>(
        &'s self,
        py: Python<'data>,
        input: &'data dyn Input,
        extra: &Extra,
    ) -> ValResult<'data, PyObject> {
        match input.is_none() {
            true => Ok(py.None()),
            false => self.validator.validate(py, input, extra),
        }
    }

    fn validate_strict<'s, 'data>(
        &'s self,
        py: Python<'data>,
        input: &'data dyn Input,
        extra: &Extra,
    ) -> ValResult<'data, PyObject> {
        match input.is_none() {
            true => Ok(py.None()),
            false => self.validator.validate_strict(py, input, extra),
        }
    }

    fn set_ref(&mut self, name: &str, validator_arc: &ValidatorArc) -> PyResult<()> {
        self.validator.set_ref(name, validator_arc)
    }

    validator_boilerplate!(Self::EXPECTED_TYPE);
}
Writing Python extensions in Rust, example...
using my validator
mod optional;
...
// lots of code...
...
validator_match!(
    type_,
    dict,
    config,
    // ... all the other validators
    // unions
    self::union::UnionValidator,
    self::optional::OptionalValidator,
    ...
)
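`validator_match!` presumably expands to a dispatch from the schema's `type` string to the matching validator's `build` (its expansion isn't shown here, so that's an assumption). The same registration pattern in Python: one class per validator carrying `EXPECTED_TYPE`, `build`, `validate` and `validate_strict` like the Rust trait; `VALIDATORS`, `build_validator` and the classes are hypothetical.

```python
from typing import Any, Dict, Type


class ValidationError(ValueError):
    """Hypothetical stand-in for pydantic's ValidationError."""


class Validator:
    EXPECTED_TYPE = ''

    @classmethod
    def build(cls, schema: Dict[str, Any]) -> 'Validator':
        return cls()

    def validate(self, v: Any) -> Any:
        raise NotImplementedError

    def validate_strict(self, v: Any) -> Any:
        raise NotImplementedError


class IntValidator(Validator):
    EXPECTED_TYPE = 'int'

    def validate_strict(self, v: Any) -> int:
        if type(v) is int:
            return v
        raise ValidationError(f'expected int, got {v!r}')

    def validate(self, v: Any) -> int:
        # lax mode additionally accepts numeric strings
        if isinstance(v, str):
            try:
                return int(v)
            except ValueError:
                raise ValidationError(f'invalid int: {v!r}') from None
        return self.validate_strict(v)


class OptionalValidator(Validator):
    EXPECTED_TYPE = 'optional'

    def __init__(self, validator: Validator) -> None:
        self.validator = validator

    @classmethod
    def build(cls, schema: Dict[str, Any]) -> 'Validator':
        # like the Rust version: the inner schema is required, then built recursively
        return cls(build_validator(schema['schema']))

    def validate(self, v: Any) -> Any:
        return None if v is None else self.validator.validate(v)

    def validate_strict(self, v: Any) -> Any:
        return None if v is None else self.validator.validate_strict(v)


# the Python equivalent of listing validators in validator_match!
VALIDATORS: Dict[str, Type[Validator]] = {
    cls.EXPECTED_TYPE: cls for cls in (IntValidator, OptionalValidator)
}


def build_validator(schema: Dict[str, Any]) -> Validator:
    try:
        validator_cls = VALIDATORS[schema['type']]
    except KeyError:
        raise ValueError(f'unknown schema type: {schema.get("type")!r}') from None
    return validator_cls.build(schema)


optional_int = build_validator({'type': 'optional', 'schema': {'type': 'int'}})
```

Adding a type means writing one class and registering it, which is the "small change, no cost if unused" argument; in Python, of course, the dynamic dispatch itself isn't free.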
Questions?
Check out: github.com/samuelcolvin/pydantic-core
And: github.com/samuelcolvin/pydantic
Follow me on twitter: @samuel_colvin
Pydantic V2
By Samuel Colvin