Talk by Samuel Colvin at FOSDEM 2023:
https://fosdem.org/2023/schedule/event/rust_how_pydantic_v2_leverages_rusts_superpowers/
I won't do the "hello_world() Python extension in Rust" thing (there are lots of other good resources for that).
Pydantic: yet another data validation library for Python.
Side project, nothing special, maintained in my spare time.
Pydantic Today
[Chart: Pydantic downloads]
Until this happened:
from datetime import datetime, timedelta
from pydantic import BaseModel

class Talk(BaseModel):
    title: str
    attendance: int
    when: datetime | None = None
    mistakes: list[tuple[timedelta, str]]

some_data = {
    'title': 'Pydantic & Rust',
    'attendance': '10',
    'when': '2023-02-04T16:45:00',
    'mistakes': [
        ('00:00:00', 'Screen mirroring confusion'),
        ('00:00:30', 'Forgot to turn on the mic'),
        ('00:25:00', 'Too short'),
        ('00:50:00', 'Too long!'),
    ],
}
talk = Talk(**some_data)
Just type hints!
Spoiler, I don't know ... but we can guess.
Default to coercion (e.g. the string '10' becomes the int 10 for the attendance field):
(Not pedantic! = 😡 people)
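To make "default to coercion" concrete, here is a minimal stdlib sketch of a lax integer check that turns the string '10' from the example data into 10, next to a strict check that refuses it. The names `lax_int`, `strict_int` and `ValidationError`, and the exact set of accepted inputs, are assumptions for illustration; Pydantic's real coercion rules cover more cases.

```python
import re


class ValidationError(ValueError):
    """Hypothetical error type for this sketch."""


_INT_STR = re.compile(r"[+-]?[0-9]+")  # ASCII digits with an optional sign


def strict_int(value: object) -> int:
    # Strict: only real ints (bool is excluded even though it subclasses int).
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    raise ValidationError(f"expected int, got {type(value).__name__}")


def lax_int(value: object) -> int:
    # Lax (the default): accept what strict accepts, then try coercions.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if isinstance(value, str) and _INT_STR.fullmatch(value.strip()):
        return int(value)  # int() tolerates surrounding whitespace
    if isinstance(value, float) and value.is_integer():
        return int(value)
    raise ValidationError(f"cannot coerce {value!r} to int")


print(lax_int("10"))  # 10
print(lax_int(10.0))  # 10
try:
    strict_int("10")
except ValidationError as exc:
    print("strict:", exc)  # strict: expected int, got str
```

Keeping lax and strict as separate paths means the "pedantic" behaviour stays available as an explicit opt-in rather than the default.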
Also, we're:
The problem is, Pydantic V1 stinks on the inside. Time to rethink!
Pydantic V2
Priorities for V2:
Sad penguin, no snow
So, I decided to rebuild Pydantic from the ground up in Rust...
A year later, I'm nearly done.
The principle:
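from datetime import datetime, timedelta
from typing import Annotated
from annotated_types import Ge, MaxLen
from pydantic import BaseModel

PosInt = Annotated[int, Ge(0)]  # non-negative int, i.e. IntValidator { min: 0 } below

class Talk(BaseModel):
    title: Annotated[str, MaxLen(100)]
    attendance: PosInt
    when: datetime | None = None
    mistakes: list[tuple[timedelta, str]]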
ModelValidator {
    cls: Talk,
    validator: TypedDictValidator [
        Field {
            key: "title",
            validator: StrValidator { max_len: 100 },
        },
        Field {
            key: "attendance",
            validator: IntValidator { min: 0 },
        },
        Field {
            key: "when",
            validator: UnionValidator [
                DateTimeValidator {},
                NoneValidator {},
            ],
            default: None,
        },
        Field {
            key: "mistakes",
            validator: ListValidator {
                item_validator: TupleValidator [
                    TimedeltaValidator {},
                    StrValidator {},
                ],
            },
        },
    ],
}
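The tree above can be mimicked in plain Python to show the principle: each validator checks its own piece and delegates to its children. This is a simplified sketch under several assumptions: the class names echo the tree but are hypothetical, the "mistakes" field is omitted, `NullableValidator` stands in for the tree's union of a datetime validator and a None validator, and building the instance via `__new__` in `ModelValidator` is an assumption about how a model could be populated.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional

class ValidationError(ValueError):
    pass

@dataclass
class StrValidator:
    max_len: Optional[int] = None

    def validate(self, value: Any) -> str:
        if not isinstance(value, str):
            raise ValidationError(f"expected str, got {value!r}")
        if self.max_len is not None and len(value) > self.max_len:
            raise ValidationError(f"longer than {self.max_len} characters")
        return value

@dataclass
class IntValidator:
    min: Optional[int] = None

    def validate(self, value: Any) -> int:
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValidationError(f"expected int, got {value!r}")
        if self.min is not None and value < self.min:
            raise ValidationError(f"less than {self.min}")
        return value

@dataclass
class DateTimeValidator:
    def validate(self, value: Any) -> datetime:
        if isinstance(value, datetime):
            return value
        try:  # lax mode: also accept ISO 8601 strings
            return datetime.fromisoformat(value)
        except (TypeError, ValueError):
            raise ValidationError(f"expected datetime, got {value!r}") from None

@dataclass
class NullableValidator:
    inner: Any  # validator used when the value is not None

    def validate(self, value: Any) -> Any:
        return None if value is None else self.inner.validate(value)

MISSING = object()  # sentinel meaning "this field has no default"

@dataclass
class Field:
    key: str
    validator: Any
    default: Any = MISSING

@dataclass
class TypedDictValidator:
    fields: list

    def validate(self, value: Any) -> dict:
        if not isinstance(value, dict):
            raise ValidationError(f"expected dict, got {value!r}")
        result = {}
        for field in self.fields:
            if field.key in value:  # delegate to the child validator
                result[field.key] = field.validator.validate(value[field.key])
            elif field.default is not MISSING:
                result[field.key] = field.default
            else:
                raise ValidationError(f"{field.key}: field required")
        return result

@dataclass
class ModelValidator:
    cls: type
    validator: Any

    def validate(self, value: Any) -> Any:
        # Assumption: create the instance without calling __init__,
        # then attach the validated fields as attributes.
        instance = self.cls.__new__(self.cls)
        instance.__dict__.update(self.validator.validate(value))
        return instance

class Talk:
    pass

talk_validator = ModelValidator(Talk, TypedDictValidator([
    Field("title", StrValidator(max_len=100)),
    Field("attendance", IntValidator(min=0)),
    Field("when", NullableValidator(DateTimeValidator()), default=None),
]))

talk = talk_validator.validate({"title": "Pydantic & Rust", "attendance": 10})
print(talk.title, talk.attendance, talk.when)  # Pydantic & Rust 10 None
```

Because every node exposes the same `validate` method, a parent never needs to know what kind of child it holds, and the whole tree is built once and reused for every input.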
#[enum_dispatch(CombinedValidator)]
trait Validator {
    const EXPECTED_TYPE: &'static str;
    fn build(schema: &PyDict, config: Option<&PyDict>) -> PyResult<CombinedValidator>;
    fn validate(&self, input: &impl Input, extra: &Extra) -> ValResult<PyObject>;
}

#[enum_dispatch]
enum CombinedValidator {
    Int(IntValidator),
    Str(StrValidator),
    TypedDict(TypedDictValidator),
    Union(UnionValidator),
    TaggedUnion(TaggedUnionValidator),
    Nullable(NullableValidator),
    // ... and 43 more
}

fn build_validator(schema: &PyDict, config: Option<&PyDict>) -> PyResult<CombinedValidator> {
    let schema_type: &str = schema.get_as_req("type")?;
    // really this is a clever macro to avoid the duplication
    match schema_type {
        IntValidator::EXPECTED_TYPE => IntValidator::build(schema, config),
        StrValidator::EXPECTED_TYPE => StrValidator::build(schema, config),
        TypedDictValidator::EXPECTED_TYPE => TypedDictValidator::build(schema, config),
        UnionValidator::EXPECTED_TYPE => UnionValidator::build(schema, config),
        TaggedUnionValidator::EXPECTED_TYPE => TaggedUnionValidator::build(schema, config),
        NullableValidator::EXPECTED_TYPE => NullableValidator::build(schema, config),
        // ... and 43 more
    }
}
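The `match` on `EXPECTED_TYPE` in `build_validator` can be approximated in Python with a dictionary from type name to validator class, where container validators recurse into `build_validator` for their children. This is a sketch, not pydantic-core's code: only three schema types are covered, the `ge`, `max_length` and `items_schema` keys are borrowed from the schema dict used later in the talk, and `config` is ignored. Note that a dict lookup is dynamic dispatch; the point of `enum_dispatch` in the Rust version is that calling `validate` on a `CombinedValidator` compiles to a `match` over the enum rather than a virtual call through a trait object.

```python
from typing import Any, Optional


class ValidationError(ValueError):
    pass


class IntValidator:
    EXPECTED_TYPE = "int"

    def __init__(self, ge: Optional[int] = None) -> None:
        self.ge = ge

    @classmethod
    def build(cls, schema: dict) -> "IntValidator":
        return cls(ge=schema.get("ge"))

    def validate(self, value: Any) -> int:
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValidationError(f"expected int, got {value!r}")
        if self.ge is not None and value < self.ge:
            raise ValidationError(f"expected >= {self.ge}, got {value}")
        return value


class StrValidator:
    EXPECTED_TYPE = "str"

    def __init__(self, max_length: Optional[int] = None) -> None:
        self.max_length = max_length

    @classmethod
    def build(cls, schema: dict) -> "StrValidator":
        return cls(max_length=schema.get("max_length"))

    def validate(self, value: Any) -> str:
        if not isinstance(value, str):
            raise ValidationError(f"expected str, got {value!r}")
        if self.max_length is not None and len(value) > self.max_length:
            raise ValidationError(f"longer than {self.max_length} characters")
        return value


class ListValidator:
    EXPECTED_TYPE = "list"

    def __init__(self, item_validator: Any) -> None:
        self.item_validator = item_validator

    @classmethod
    def build(cls, schema: dict) -> "ListValidator":
        # Recurse: the nested items schema becomes a child validator.
        return cls(build_validator(schema["items_schema"]))

    def validate(self, value: Any) -> list:
        if not isinstance(value, list):
            raise ValidationError(f"expected list, got {value!r}")
        return [self.item_validator.validate(item) for item in value]


# Python stand-in for `match schema_type { X::EXPECTED_TYPE => X::build(...), ... }`.
VALIDATOR_CLASSES = {
    v.EXPECTED_TYPE: v for v in (IntValidator, StrValidator, ListValidator)
}


def build_validator(schema: dict) -> Any:
    schema_type = schema.get("type")
    if schema_type not in VALIDATOR_CLASSES:
        raise ValueError(f"unknown schema type: {schema_type!r}")
    return VALIDATOR_CLASSES[schema_type].build(schema)


validator = build_validator({"type": "list", "items_schema": {"type": "int", "ge": 0}})
print(validator.validate([1, 2, 3]))  # [1, 2, 3]
```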
trait Input<'a> {
    fn is_none(&self) -> bool;
    fn strict_str(&'a self) -> ValResult<&'a str>;
    fn lax_str(&'a self) -> ValResult<&'a str>;
    fn validate_date(&self, strict: bool) -> ValResult<PyDatetime>;
    fn strict_date(&self) -> ValResult<PyDatetime>;
    // ... and 53 more
}

impl<'a> Input<'a> for PyAny {
    // ...
}

impl<'a> Input<'a> for JsonInput {
    // ...
}
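The `Input` trait lets each validator be written once and run against either Python objects or parsed JSON. Below is a rough Python sketch of that idea; the class names `PyInput` and `JsonInput`, the `strict_datetime` method (standing in for the trait's `strict_date`), and the specific rules are assumptions for illustration: lax `str` from Python also accepts UTF-8 bytes, and strict datetime from JSON accepts ISO strings because JSON has no datetime type.

```python
import json
from datetime import datetime
from typing import Any, Protocol


class ValidationError(ValueError):
    pass


class Input(Protocol):
    """What a validator may ask of its input (hypothetical subset)."""
    def strict_str(self) -> str: ...
    def lax_str(self) -> str: ...
    def strict_datetime(self) -> datetime: ...


class PyInput:
    """Input backed by an arbitrary Python object."""

    def __init__(self, obj: Any) -> None:
        self.obj = obj

    def strict_str(self) -> str:
        if isinstance(self.obj, str):
            return self.obj
        raise ValidationError(f"expected str, got {self.obj!r}")

    def lax_str(self) -> str:
        # Lax mode: additionally accept UTF-8 bytes (assumption for this sketch).
        if isinstance(self.obj, bytes):
            try:
                return self.obj.decode()
            except UnicodeDecodeError:
                raise ValidationError("bytes are not valid UTF-8") from None
        return self.strict_str()

    def strict_datetime(self) -> datetime:
        # Strict mode from Python: only a real datetime object is accepted.
        if isinstance(self.obj, datetime):
            return self.obj
        raise ValidationError(f"expected datetime, got {self.obj!r}")


class JsonInput:
    """Input backed by a value already parsed from JSON."""

    def __init__(self, value: Any) -> None:
        self.value = value

    def strict_str(self) -> str:
        if isinstance(self.value, str):
            return self.value
        raise ValidationError(f"expected str, got {self.value!r}")

    def lax_str(self) -> str:
        return self.strict_str()  # JSON has no bytes type, nothing extra to coerce

    def strict_datetime(self) -> datetime:
        # JSON has no datetime type, so even strict mode parses ISO 8601 strings.
        text = self.strict_str()
        try:
            return datetime.fromisoformat(text)
        except ValueError:
            raise ValidationError(f"expected ISO datetime, got {text!r}") from None


def validate_str(inp: Input, strict: bool) -> str:
    # Written once against the Input interface; works for both input types.
    return inp.strict_str() if strict else inp.lax_str()


def validate_datetime(inp: Input, strict: bool) -> datetime:
    try:
        return inp.strict_datetime()
    except ValidationError:
        if strict:
            raise
    # Lax fallback: parse the input as an ISO 8601 string.
    text = inp.lax_str()
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        raise ValidationError(f"invalid datetime: {text!r}") from None


data = json.loads('{"when": "2023-02-04T16:45:00"}')
print(validate_datetime(JsonInput(data["when"]), strict=True))  # 2023-02-04 16:45:00
print(validate_datetime(PyInput(data["when"]), strict=False))   # 2023-02-04 16:45:00
try:
    validate_datetime(PyInput(data["when"]), strict=True)
except ValidationError as exc:
    print("strict Python input rejected:", exc)
```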
#[pyclass]
struct SchemaValidator {
    validator: CombinedValidator,
}

#[pymethods]
impl SchemaValidator {
    #[new]
    fn py_new(schema: &PyDict, config: Option<&PyDict>) -> PyResult<Self> {
        // We also do magic/evil schema validation using pydantic-core itself
        let validator = build_validator(schema, config)?;
        Ok(SchemaValidator { validator })
    }

    fn validate_python(&self, input: &PyAny, strict: Option<bool>) -> PyResult<PyObject> {
        self.validator.validate(input, &Extra::new(strict))
    }

    fn validate_json(
        &self,
        input_string: &PyString,
        strict: Option<bool>,
    ) -> PyResult<PyObject> {
        let input = parse_string(input_string)?;
        self.validator.validate(&input, &Extra::new(strict))
    }
}
from pydantic_core import SchemaValidator

class Talk:
    ...

talk_validator = SchemaValidator({
    'type': 'model',
    'cls': Talk,
    'schema': {
        'type': 'typed-dict',
        'fields': {
            'title': {'schema': {'type': 'str', 'max_length': 100}},
            'attendance': {'schema': {'type': 'int', 'ge': 0}},
            'when': {
                'schema': {
                    'type': 'default',
                    'schema': {'type': 'nullable', 'schema': {'type': 'datetime'}},
                    'default': None,
                }
            },
            'mistakes': {
                'schema': {
                    'type': 'list',
                    'items_schema': {
                        'type': 'tuple',
                        'mode': 'positional',
                        'items_schema': [{'type': 'timedelta'}, {'type': 'str'}]
                    }
                }
            },
        },
    }
})
some_data = {
    'title': 'Pydantic & Rust',
    'attendance': '100',
    'when': '2023-02-04T16:45:00',
    'mistakes': [
        ('00:00:00', 'Screen mirroring confusion'),
        ('00:00:30', 'Forgot to turn on the mic'),
        ('00:25:00', 'Too short'),
        ('00:40:00', 'Too long!'),
    ],
}
talk = talk_validator.validate_python(some_data)
print(talk.mistakes)
"""
[
(datetime.timedelta(0), 'Screen mirroring confusion'),
(datetime.timedelta(seconds=30), 'Forgot to turn on the mic'),
(datetime.timedelta(seconds=1500), 'Too short'),
(datetime.timedelta(seconds=2400), 'Too long!')
]
"""
When building Python libraries...
Obviously:
(maybe) Less obviously:
Rust
But rather: Python as the user* interface for Rust.
(* by user, I mean "application developer")
I'd love to see a generation of libraries for Python (and other high level languages) built in Rust.
HTTPS request lifecycle:
TLS, Routing, HTTP parsing, Validation, DB query, Serializing: Rust/C
Application Logic: Python (100% of developer time = 1% of CPU cycles)
...
Questions?
Twitter: @samuel_colvin & @pydantic
GitHub: /samuelcolvin & /pydantic
Docs: docs.pydantic.dev
Massive thanks to PyO3 (Rust bindings for Python), which made this possible.
If you'd like a laugh, please see github.com/pydantic/pydantic/issues/4790 for a very different (angry) opinion.