How Pydantic V2 leverages Rust's Superpowers
Using Rust to build Python extensions
by
Samuel Colvin
FOSDEM, February 2023
https://fosdem.org/2023/schedule/event/rust_how_pydantic_v2_leverages_rusts_superpowers/
Me
- Software developer for 10 years
- I've spent the last 5 years doing lots of Open Source
- I created Pydantic in 2017 as an experiment; it has since taken over my life
- Worked full time on Pydantic for the last year
Today
- Introduce you to Pydantic, including some hype numbers
- Explain why I decided to rebuild Pydantic
- Try to show how it's built in Rust
- Try to explain why Rust is great for this
I won't:
- Do the "hello_world() Python extension in Rust thing" (there are lots of other good resources for that)
What is it?
Yet another data validation library for Python.
Side project, nothing special, maintained in my spare time.
Pydantic Today
Downloads
Until this happened:
What's so great?
from datetime import datetime, timedelta

from pydantic import BaseModel


class Talk(BaseModel):
    title: str
    attendance: int
    when: datetime | None = None
    mistakes: list[tuple[timedelta, str]]


some_data = {
    'title': 'Pydantic & Rust',
    'attendance': '10',
    'when': '2023-02-04T16:45:00',
    'mistakes': [
        ('00:00:00', 'Screen mirroring confusion'),
        ('00:00:30', 'Forgot to turn on the mic'),
        ('00:25:00', 'Too short'),
        ('00:50:00', 'Too long!'),
    ],
}
talk = Talk(**some_data)
Just type hints!
- Less to learn
- compatible with static type analysis, IDEs and your brain
Spoiler, I don't know ... but we can guess.
Default to coercion:
- Fault tolerant
- More formats supported without faff
(Not pedantic! = 😡 people)
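To make the coercion-by-default idea concrete, here is a minimal stdlib-only sketch. The function `validate_int` and its rules are hypothetical and chosen for illustration; they are not Pydantic's actual coercion rules. It shows a lax path that accepts common representations and a strict path that only accepts real integers.

```python
def validate_int(value: object, *, strict: bool = False) -> int:
    """Hypothetical validator: return an int, coercing unless strict is set."""
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool):
        raise ValueError("booleans are not accepted as integers")
    if isinstance(value, int):
        return value
    if strict:
        # strict mode: no coercion at all
        raise ValueError(f"strict mode: expected int, got {type(value).__name__}")
    if isinstance(value, str):
        try:
            return int(value.strip())
        except ValueError:
            raise ValueError(f"cannot parse {value!r} as an integer") from None
    if isinstance(value, float) and value.is_integer():
        # lax mode accepts floats only when they carry no fractional part
        return int(value)
    raise ValueError(f"cannot interpret {value!r} as an integer")


attendance = validate_int('10')  # lax mode coerces the string
```

Calling `validate_int('10', strict=True)` raises `ValueError` in this sketch, which is the behaviour a strict mode is meant to offer.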
Also, we're:
- fast (ish)
- friendly (ish)
- complete (ish).
Pydantic Today
What's wrong?
The problem is, Pydantic V1 stinks on the inside, time to rethink!
Pydantic V2
Priorities for V2:
- Performance - it was good, but it could be better - think of the penguins!
- Strict Mode - live up to the name
- Onion - wrap validators 🧅
- Composability - models aren't always super
- Maintainability - I maintain Pydantic, so I want maintaining Pydantic to be fun
Sad penguin, no snow
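The "onion" idea behind wrap validators can be sketched in plain Python. The names `wrap`, `to_int` and `tidy_and_default` are hypothetical and are not Pydantic's API; the point is only that an outer function receives the value plus a handler for the inner validation, so it can run code before, after, or instead of it.

```python
from typing import Any, Callable

Validator = Callable[[Any], Any]
WrapFn = Callable[[Any, Validator], Any]


def wrap(inner: Validator, outer: WrapFn) -> Validator:
    """Return a validator in which `outer` surrounds `inner` like an onion layer."""
    def wrapped(value: Any) -> Any:
        return outer(value, inner)
    return wrapped


def to_int(value: Any) -> int:
    # the inner validator: plain integer parsing
    return int(value)


def tidy_and_default(value: Any, handler: Validator) -> int:
    # "before" step: clean up the raw input
    if isinstance(value, str):
        value = value.strip()
    try:
        result = handler(value)  # call the inner layer
    except ValueError:
        return 0  # recover from an inner failure with a default
    # "after" step: post-process the validated value
    return abs(result)


validate = wrap(to_int, tidy_and_default)
```

Because `wrap` returns an ordinary validator, layers can be nested as deeply as needed.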
So, I decided to rebuild Pydantic from the ground up in Rust ...
a year later, I'm nearly done
Where's the Rust!?
The principle:
class Talk(BaseModel):
    title: Annotated[
        str,
        MaxLen(100)
    ]
    attendance: PosInt
    when: datetime | None = None
    mistakes: list[
        tuple[timedelta, str]
    ]
ModelValidator {
    cls: Talk,
    validator: TypedDictValidator [
        Field {
            key: "title",
            validator: StrValidator { max_len: 100 },
        },
        Field {
            key: "attendance",
            validator: IntValidator { min: 0 },
        },
        Field {
            key: "when",
            validator: UnionValidator [
                DateTimeValidator {},
                NoneValidator {},
            ],
            default: None,
        },
        Field {
            key: "mistakes",
            validator: ListValidator {
                item_validator: TupleValidator [
                    TimedeltaValidator {},
                    StrValidator {},
                ],
            },
        },
    ],
}
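The principle above (type hints are compiled once into a tree of small validators, which is then walked for every input) can be sketched in plain Python. This is a trimmed-down illustration, not pydantic-core's implementation: the classes mirror the names in the tree, but datetime and timedelta handling is left out and `when` is reduced to an optional string for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

_MISSING = object()  # sentinel meaning "this field has no default"


@dataclass
class StrValidator:
    max_len: int | None = None

    def validate(self, value: Any) -> str:
        if not isinstance(value, str):
            raise ValueError("expected a string")
        if self.max_len is not None and len(value) > self.max_len:
            raise ValueError(f"string longer than {self.max_len}")
        return value


@dataclass
class IntValidator:
    min: int | None = None

    def validate(self, value: Any) -> int:
        try:
            result = int(value)  # lax: anything int() accepts
        except (TypeError, ValueError):
            raise ValueError("expected an integer") from None
        if self.min is not None and result < self.min:
            raise ValueError(f"integer smaller than {self.min}")
        return result


@dataclass
class TupleValidator:
    items: list

    def validate(self, value: Any) -> tuple:
        if not isinstance(value, (list, tuple)) or len(value) != len(self.items):
            raise ValueError(f"expected a sequence of {len(self.items)} items")
        return tuple(v.validate(x) for v, x in zip(self.items, value))


@dataclass
class ListValidator:
    item_validator: Any

    def validate(self, value: Any) -> list:
        if not isinstance(value, (list, tuple)):
            raise ValueError("expected a list")
        return [self.item_validator.validate(x) for x in value]


@dataclass
class Field:
    key: str
    validator: Any
    default: Any = _MISSING


@dataclass
class TypedDictValidator:
    fields: list

    def validate(self, value: Any) -> dict:
        if not isinstance(value, dict):
            raise ValueError("expected a dict")
        output = {}
        for field in self.fields:
            if field.key in value:
                output[field.key] = field.validator.validate(value[field.key])
            elif field.default is not _MISSING:
                output[field.key] = field.default
            else:
                raise ValueError(f"missing field {field.key!r}")
        return output


# Build the tree once ...
talk_validator = TypedDictValidator([
    Field("title", StrValidator(max_len=100)),
    Field("attendance", IntValidator(min=0)),
    Field("when", StrValidator(), default=None),
    Field("mistakes", ListValidator(TupleValidator([IntValidator(), StrValidator()]))),
])

# ... then reuse it for every input.
result = talk_validator.validate({
    "title": "Pydantic & Rust",
    "attendance": "10",
    "mistakes": [("30", "Forgot to turn on the mic")],
})
```

Each node only knows about its own children, which is what keeps the components small and composable.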
Ok, some actual Rust...
#[enum_dispatch(CombinedValidator)]
trait Validator {
    const EXPECTED_TYPE: &'static str;

    fn build(schema: &PyDict, config: Option<&PyDict>) -> PyResult<CombinedValidator>;

    fn validate(&self, input: &impl Input, extra: &Extra) -> ValResult<PyObject>;
}

#[enum_dispatch]
enum CombinedValidator {
    Int(IntValidator),
    Str(StrValidator),
    TypedDict(TypedDictValidator),
    Union(UnionValidator),
    TaggedUnion(TaggedUnionValidator),
    Nullable(NullableValidator),
    // ... and 43 more
}

fn build_validator(schema: &PyDict, config: Option<&PyDict>) -> PyResult<CombinedValidator> {
    let schema_type: &str = schema.get_as_req("type")?;
    // really this is a clever macro to avoid the duplication
    match schema_type {
        IntValidator::EXPECTED_TYPE => IntValidator::build(schema, config),
        StrValidator::EXPECTED_TYPE => StrValidator::build(schema, config),
        TypedDictValidator::EXPECTED_TYPE => TypedDictValidator::build(schema, config),
        UnionValidator::EXPECTED_TYPE => UnionValidator::build(schema, config),
        TaggedUnionValidator::EXPECTED_TYPE => TaggedUnionValidator::build(schema, config),
        NullableValidator::EXPECTED_TYPE => NullableValidator::build(schema, config),
        // ... and 43 more
    }
}
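For readers more at home in Python, the dispatch in `build_validator` can be approximated as follows. This is a sketch of the pattern only: the `register` decorator and the `BUILDERS` registry are hypothetical stand-ins for the Rust `match` on `EXPECTED_TYPE`, and only three schema types are covered.

```python
from typing import Any, Callable

# Maps a schema "type" string to the function that builds its validator.
BUILDERS = {}


def register(type_name: str) -> Callable:
    """Record a build function under the schema type it handles."""
    def decorator(build: Callable) -> Callable:
        BUILDERS[type_name] = build
        return build
    return decorator


def build_validator(schema: dict) -> Callable[[Any], Any]:
    """Look up the builder for schema['type'] and let it build the validator."""
    schema_type = schema["type"]
    try:
        build = BUILDERS[schema_type]
    except KeyError:
        raise ValueError(f"unknown schema type {schema_type!r}") from None
    return build(schema)


@register("int")
def build_int(schema: dict) -> Callable[[Any], int]:
    ge = schema.get("ge")

    def validate(value: Any) -> int:
        result = int(value)
        if ge is not None and result < ge:
            raise ValueError(f"must be >= {ge}")
        return result
    return validate


@register("str")
def build_str(schema: dict) -> Callable[[Any], str]:
    max_length = schema.get("max_length")

    def validate(value: Any) -> str:
        if not isinstance(value, str):
            raise ValueError("expected a string")
        if max_length is not None and len(value) > max_length:
            raise ValueError(f"longer than {max_length} characters")
        return value
    return validate


@register("list")
def build_list(schema: dict) -> Callable[[Any], list]:
    # Recursion: the item validator is built from the nested schema.
    item = build_validator(schema["items_schema"])

    def validate(value: Any) -> list:
        return [item(v) for v in value]
    return validate


ints = build_validator({"type": "list", "items_schema": {"type": "int", "ge": 0}})
```

The schema is inspected once at build time; the returned closure does no further lookups on the schema when validating.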

trait Input<'a> {
    fn is_none(&self) -> bool;

    fn strict_str(&'a self) -> ValResult<&'a str>;

    fn lax_str(&'a self) -> ValResult<&'a str>;

    fn validate_date(&self, strict: bool) -> ValResult<PyDatetime>;

    fn strict_date(&self) -> ValResult<PyDatetime>;

    // ... and 53 more
}

impl<'a> Input<'a> for PyAny {
    // ...
}

impl<'a> Input<'a> for JsonInput {
    // ...
}
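The role of the `Input` trait (one validator, several input sources) can be sketched with a Python `Protocol`. The classes `PythonInput` and `JsonInput` and the function `validate_str` below are hypothetical illustrations with only three methods; the coercion rules are assumptions, not pydantic-core's.

```python
import json
from typing import Any, Protocol


class Input(Protocol):
    """What a validator needs from any input source."""
    def is_none(self) -> bool: ...
    def strict_str(self) -> str: ...
    def lax_str(self) -> str: ...


class PythonInput:
    """Wraps an arbitrary Python object."""
    def __init__(self, obj: Any) -> None:
        self.obj = obj

    def is_none(self) -> bool:
        return self.obj is None

    def strict_str(self) -> str:
        if isinstance(self.obj, str):
            return self.obj
        raise ValueError("expected a string")

    def lax_str(self) -> str:
        if isinstance(self.obj, str):
            return self.obj
        if isinstance(self.obj, (bytes, bytearray)):
            # UnicodeDecodeError is a subclass of ValueError
            return bytes(self.obj).decode("utf-8")
        raise ValueError("expected a string or UTF-8 bytes")


class JsonInput:
    """Wraps a value parsed from a JSON document."""
    def __init__(self, raw: str) -> None:
        self.value = json.loads(raw)

    def is_none(self) -> bool:
        return self.value is None

    def strict_str(self) -> str:
        if isinstance(self.value, str):
            return self.value
        raise ValueError("expected a JSON string")

    # JSON has no bytes type, so lax and strict behave the same here
    lax_str = strict_str


def validate_str(data: Input, strict: bool) -> str:
    """The validator is written once against the Input protocol."""
    return data.strict_str() if strict else data.lax_str()
```

With this split, `validate_str` never needs to know whether the data came from Python objects or from a JSON string.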

#[pyclass]
struct SchemaValidator {
    validator: CombinedValidator,
}

#[pymethods]
impl SchemaValidator {
    #[new]
    fn py_new(schema: &PyDict, config: Option<&PyDict>) -> PyResult<Self> {
        // We also do magic/evil schema validation using pydantic-core itself
        let validator = build_validator(schema, config)?;
        Ok(SchemaValidator { validator })
    }

    fn validate_python(&self, input: &PyAny, strict: Option<bool>) -> PyResult<PyObject> {
        self.validator.validate(input, &Extra::new(strict))
    }

    fn validate_json(
        &self,
        input_string: &PyString,
        strict: Option<bool>,
    ) -> PyResult<PyObject> {
        let input = parse_string(input_string)?;
        self.validator.validate(&input, &Extra::new(strict))
    }
}
Python Interface
from pydantic_core import SchemaValidator


class Talk:
    ...


talk_validator = SchemaValidator({
    'type': 'model',
    'cls': Talk,
    'schema': {
        'type': 'typed-dict',
        'fields': {
            'title': {'schema': {'type': 'str', 'max_length': 100}},
            'attendance': {'schema': {'type': 'int', 'ge': 0}},
            'when': {
                'schema': {
                    'type': 'default',
                    'schema': {'type': 'nullable', 'schema': {'type': 'datetime'}},
                    'default': None,
                }
            },
            'mistakes': {
                'schema': {
                    'type': 'list',
                    'items_schema': {
                        'type': 'tuple',
                        'mode': 'positional',
                        'items_schema': [{'type': 'timedelta'}, {'type': 'str'}]
                    }
                }
            },
        },
    }
})

some_data = {
    'title': 'Pydantic & Rust',
    'attendance': '100',
    'when': '2023-02-04T16:45:00',
    'mistakes': [
        ('00:00:00', 'Screen mirroring confusion'),
        ('00:00:30', 'Forgot to turn on the mic'),
        ('00:25:00', 'Too short'),
        ('00:40:00', 'Too long!'),
    ],
}
talk = talk_validator.validate_python(some_data)
print(talk.mistakes)
"""
[
    (datetime.timedelta(0), 'Screen mirroring confusion'),
    (datetime.timedelta(seconds=30), 'Forgot to turn on the mic'),
    (datetime.timedelta(seconds=1500), 'Too short'),
    (datetime.timedelta(seconds=2400), 'Too long!')
]
"""
class Talk(BaseModel):
    title: Annotated[
        str,
        MaxLen(100)
    ]
    attendance: PosInt
    when: datetime | None = None
    mistakes: list[
        tuple[timedelta, str]
    ]
Where Rust Excels
When building Python libraries...
Obviously:
- Performance
- Multithreading - no GIL
- Reusing high-quality Rust libraries (I also maintain two libraries that do this: watchfiles and rtoml)
(maybe) Less obviously:
- Deeply recursive code - no Python stack frames, so no recursion penalty, but be careful!
- Small modular components - (almost) no function call penalty
- Complex error handling
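As a rough illustration of what "complex error handling" involves in recursive validation, the sketch below collects every failure with its location instead of stopping at the first one. The exception class and function names are hypothetical; Rust expresses the same idea with result types rather than Python exceptions.

```python
from typing import Any, Callable


class ValidationErrors(Exception):
    """Carries a list of (location, message) pairs."""
    def __init__(self, errors: list) -> None:
        super().__init__(f"{len(errors)} validation error(s)")
        self.errors = errors


def validate_int(value: Any, loc: tuple) -> int:
    try:
        return int(value)
    except (TypeError, ValueError):
        raise ValidationErrors([(loc, "not a valid integer")]) from None


def validate_list(value: list, item_validator: Callable, loc: tuple = ()) -> list:
    output, errors = [], []
    for index, item in enumerate(value):
        try:
            # extend the location with the current index on the way down
            output.append(item_validator(item, loc + (index,)))
        except ValidationErrors as exc:
            # keep going so that all errors are reported together
            errors.extend(exc.errors)
    if errors:
        raise ValidationErrors(errors)
    return output


def validate_matrix(value: list) -> list:
    """Validate a list of lists of integers."""
    return validate_list(value, lambda row, loc: validate_list(row, validate_int, loc))


try:
    validate_matrix([["1", "x"], [None, 4]])
except ValidationErrors as exc:
    collected = exc.errors
```

Here `collected` holds one entry per bad cell, each tagged with its `(row, column)` location.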
Rust
Not Rust vs. Python
But rather: Python as the user* interface for Rust.
(* by user, I mean "application developer")
I'd love to see a generation of libraries for Python (and other high level languages) built in Rust.
Rust
HTTPS request lifecycle:
- Rust/C: TLS, Routing, HTTP parsing, Validation, DB query, Serializing
- Python: Application Logic
100% of Developer time = 1% of CPU cycles
...
Thank you
Questions?
Twitter: @samuel_colvin & @pydantic
GitHub: /samuelcolvin & /pydantic
Docs: docs.pydantic.dev
Massive thanks to PyO3 - Rust bindings for Python, which made this possible
If you'd like a laugh, please see
github.com/pydantic/pydantic/issues/4790
for a very different (angry) opinion