Pydantic & Rust
Samuel Colvin
What is Pydantic?
- Data validation & more using Python type hints
- Top 50 package on PyPI; just reached 1B downloads 🎉
- The validation magic behind FastAPI - FastAPI is ~25% of Pydantic's usage
from datetime import datetime
from pydantic import BaseModel
class Delivery(BaseModel):
    timestamp: datetime
    dimensions: tuple[int, int]
m = Delivery(timestamp='2020-01-02T03:04:05Z', dimensions=['10', '20'])
print(repr(m.timestamp))
#> datetime.datetime(2020, 1, 2, 3, 4, 5, tzinfo=TzInfo(UTC))
print(m.dimensions)
#> (10, 20)
Pydantic V2
- Complete rewrite of Pydantic, with the core written in Rust
- Released in June
- 5 - 50x faster than Pydantic V1
- More correct, more extensible
Why Rust?
The obvious advantages...
- Performance
- Reusing high-quality Rust libraries
- More explicit error handling
(Maybe) less obvious advantages:
- Virtually zero cost customisation, even in hot code
- Arguably easier to maintain - the compiler picks up more mistakes
- Private means private
Disadvantages:
- Slower to develop
- Fewer people can help you
- Have to distribute binaries, or leave users to compile it
Pydantic V2 Architecture
pydantic (pure python):
- Read type hints
- Construct a "core schema"
pydantic-core (binary + stubs + core-schema):
- Process core schema
- Return SchemaValidator
pydantic (pure python):
- Receive data
- Call schema_validator(data)
pydantic-core (binary + stubs + core-schema):
- Run validator
- Return the result of validation
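The two-phase flow above can be sketched in plain Python. This is only an illustration using the standard library: `build_core_schema`, `make_schema_validator` and the dict layout are hypothetical stand-ins, not the real pydantic or pydantic-core API.

```python
from typing import Any, Callable, get_type_hints


# Hypothetical stand-in for the pure-Python layer: read the type hints and
# describe them as a plain-dict "core schema" (layout invented for this sketch).
def build_core_schema(cls: type) -> dict[str, Any]:
    return {
        'type': 'model',
        'fields': {name: {'type': tp.__name__} for name, tp in get_type_hints(cls).items()},
    }


# Hypothetical stand-in for the compiled layer: process the schema once and
# return a validator that can be called many times.
def make_schema_validator(schema: dict[str, Any]) -> Callable[[dict[str, Any]], dict[str, Any]]:
    coercers = {'int': int, 'str': str, 'float': float}
    plan = [(name, coercers[field['type']]) for name, field in schema['fields'].items()]

    def schema_validator(data: dict[str, Any]) -> dict[str, Any]:
        return {name: coerce(data[name]) for name, coerce in plan}

    return schema_validator


class User:
    name: str
    age: int


schema_validator = make_schema_validator(build_core_schema(User))
print(schema_validator({'name': 'John', 'age': '42'}))
#> {'name': 'John', 'age': 42}
```

The point of the split is that the schema is processed once, at class definition time, so the per-call work is only the validation itself.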
Pydantic V2
Examples
Performance
import timeit
from pydantic import BaseModel, __version__
class Model(BaseModel):
    name: str
    age: int
    friends: list[int]
    settings: dict[str, float]

data = {
    'name': 'John',
    'age': 42,
    'friends': list(range(200)),
    'settings': {f'v_{i}': i / 2.0 for i in range(50)}
}
t = timeit.timeit(
    'Model(**data)',
    globals={'data': data, 'Model': Model},
    number=10_000,
)
print(f'version={__version__} time taken {t * 100:.2f}us')
version=1.10.4 time taken 179.81us
version=2.30 time taken 7.99us
22.5x speedup
Strict Mode
from pydantic import BaseModel, ConfigDict, ValidationError
class Model(BaseModel):
    model_config = ConfigDict(strict=True)
    age: int
    friends: tuple[int, int]

try:
    Model(age='42', friends=[1, 2])
except ValidationError as e:
    print(e)
    """
    2 validation errors for Model
    age
      Input should be a valid integer ... input_value='42'
    friends
      Input should be a valid tuple ... input_value=[1, 2]
    """
print(Model(age=42, friends=(1, 2)))
#> age=42 friends=(1, 2)
AKA Pedant mode.
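The difference between lax and strict validation can be shown with a toy function. `validate_int` is a hypothetical helper written for this sketch, not how Pydantic implements strict mode.

```python
# Hypothetical helper, not Pydantic's implementation: lax mode coerces
# compatible input, strict mode accepts only real int instances.
def validate_int(value: object, *, strict: bool) -> int:
    # bool is a subclass of int, so it is excluded explicitly here
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if strict:
        raise ValueError(f'Input should be a valid integer, got {value!r}')
    return int(value)


print(validate_int('42', strict=False))
#> 42
try:
    validate_int('42', strict=True)
except ValueError as e:
    print(e)
    #> Input should be a valid integer, got '42'
```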
Builtin JSON parsing
from pydantic import BaseModel, ConfigDict
class Model(BaseModel):
    model_config = ConfigDict(strict=True)
    age: int
    friends: tuple[int, int]

print(Model.model_validate_json('{"age": 1, "friends": [1, 2]}'))
#> age=1 friends=(1, 2)
If you're going to be a pedant, you better be right.
Also gives us:
- Big performance improvement without a third-party parsing library
- Custom Errors (WIP)
- Line numbers in errors (in future)
Wrap Validators
from pydantic import BaseModel, field_validator
class Model(BaseModel):
    x: int

    @field_validator('x', mode='wrap')
    def validate_x(cls, v, handler):
        if v == 'one':
            return 1
        try:
            x = handler(v)
        except ValueError:
            return -999
        else:
            return x + 1

print(Model(x='one'))
#> x=1
print(Model(x=2))
#> x=3
print(Model(x='three'))
#> x=-999
AKA "The Onion"
Before
On Error
After
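The onion layers can be seen without Pydantic at all: a wrap validator is a function that receives the inner validator and decides what happens before it, after it, and when it fails. In this sketch `inner_int` and `wrap` are hypothetical names, and `inner_int` simply stands in for the standard `int` validation.

```python
from typing import Any, Callable

Handler = Callable[[Any], Any]


# Stand-in for the inner (default) validation of an int field.
def inner_int(v: Any) -> int:
    return int(v)


# Builds the outer layer of the onion around any inner handler.
def wrap(inner: Handler) -> Handler:
    def outer(v: Any) -> Any:
        if v == 'one':       # "Before": short-circuit, inner is never called
            return 1
        try:
            x = inner(v)
        except ValueError:   # "On Error": recover from the inner failure
            return -999
        return x + 1         # "After": post-process the inner result

    return outer


validate_x = wrap(inner_int)
print(validate_x('one'), validate_x(2), validate_x('three'))
#> 1 3 -999
```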
Alias Paths
from pydantic import BaseModel, Field, AliasPath, AliasChoices
class MyModel(BaseModel):
    a: int = Field(validation_alias=AliasPath('foo', 1, 'bar'))
    b: str = Field(validation_alias=AliasChoices('x', 'y'))

m = MyModel.model_validate(
    {
        'foo': [{'bar': 0}, {'bar': 1}],
        'y': 'Y',
    }
)
print(m)
#> a=1 b='Y'
Somewhat similar to serde's "flatten".
Coming Soon...
PyO3 Speedups
PyO3 provides the magic that allows Pydantic (and many other libraries) to call Rust from Python.
While Rust is very fast:
- calling Rust from Python, and
- interacting with Python objects (e.g. dicts) from Rust
are slower than they could be.
But much of this is fixable...
PyO3 Speedups
Let's look at an example...
from typing import Any

def dict_not_none(**kwargs: Any) -> Any:
    return {k: v for k, v in kwargs.items() if v is not None}
Implementation | Measurement |
---|---|
Python | 281ns |
PyO3 Today | 350ns |
Baremetal FFI | 54ns |
PyO3 Next | 235ns |
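The pure-Python row can be reproduced roughly with `timeit`. The slide does not say which arguments were used for the measurement, so the call below (`a=1, b=None, c=3`) is an assumption, and the number you get will depend on your machine and Python version.

```python
import timeit
from typing import Any


def dict_not_none(**kwargs: Any) -> Any:
    return {k: v for k, v in kwargs.items() if v is not None}


number = 100_000
# Assumed workload: three keyword arguments, one of them None.
t = timeit.timeit(
    'dict_not_none(a=1, b=None, c=3)',
    globals={'dict_not_none': dict_not_none},
    number=number,
)
# timeit returns total seconds, so convert to nanoseconds per call
print(f'{t / number * 1e9:.0f}ns per call')
```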
Faster JSON validation
We currently parse JSON completely, store it in a heap of Maps and Arrays, then validate.
We can do much better...
Faster JSON validation
What we have now:
def validate_json_today(model_type: ModelType, json_data: str):
    # step 1: parse the whole document into an intermediate object
    json_object = JsonObject()
    for chunk in JsonParser(json_data).chunks():
        json_chunk = chunk.to_object()
        json_object.add(json_chunk)

    # step 2: validate each field against the intermediate object
    model_data = {}
    errors = []
    for f in model_type.fields:
        try:
            model_data[f.name] = f.validate(json_object[f.key])
        except Error:
            errors.append(f)
    if errors:
        raise ValidationError(errors)
    else:
        return model_type(model_data)
Warning: Python as pseudo code for Rust 😱
Faster JSON validation
What we might have in future:
def validate_json_future(model_type: ModelType, json_data: str):
    tmp: list[Any | None] = [None for _ in range(len(model_type.fields))]
    errors = []
    for key, chunk in iter_json_parser(json_data):
        field = model_type.fields.get(key)
        if field:
            try:
                tmp[field.index] = field.validate(chunk.parse())
            except Error:
                errors.append(field)
    model_data = []
    for index, field in enumerate(tmp):
        if field is None:
            errors.append(model_type.get_by_index(index))
        else:
            model_data.append(field)
...
Warning: Python as pseudo code for Rust 😱
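A runnable approximation of the single-pass idea, using only the standard library. Everything here is a sketch under assumptions: `FieldSpec`, `FIELDS` and `iter_json_pairs` are hypothetical, the input is assumed to be a flat JSON object, and `json.loads` with `object_pairs_hook=list` still parses the whole document, so it only stands in for a real iterative parser.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Iterator


@dataclass
class FieldSpec:
    name: str
    index: int  # position of the field in the output slots
    validate: Callable[[Any], Any]


FIELDS = {
    'age': FieldSpec('age', 0, int),
    'name': FieldSpec('name', 1, str),
}
_UNSET = object()  # sentinel, so a JSON null is not mistaken for "missing"


# Stand-in for a streaming parser: yields top-level (key, value) pairs in
# document order without building a dict.
def iter_json_pairs(json_data: str) -> Iterator[tuple[str, Any]]:
    return iter(json.loads(json_data, object_pairs_hook=list))


def validate_json_single_pass(json_data: str) -> list[Any]:
    slots: list[Any] = [_UNSET] * len(FIELDS)
    errors: dict[str, str] = {}
    for key, value in iter_json_pairs(json_data):
        field = FIELDS.get(key)
        if field is None:
            continue  # unknown keys are ignored in this sketch
        try:
            slots[field.index] = field.validate(value)
        except (TypeError, ValueError):
            errors[field.name] = 'invalid value'
    # anything still unset and not already reported was missing from the input
    for field in FIELDS.values():
        if slots[field.index] is _UNSET:
            errors.setdefault(field.name, 'missing')
    if errors:
        raise ValueError(errors)
    return slots


print(validate_json_single_pass('{"name": "John", "age": "42", "extra": true}'))
#> [42, 'John']
```

Each value goes straight from the parser into its slot, so no intermediate map keyed by field name is built.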
Pydantic V3 more...
- Lazy Python objects => even no python objects
- More direct validation & serialization formats - msgpack, avro, parquet?
Thank you
Alert!
Pydantic will start a closed beta of our Observability tool later this year!
Come and find me for beta access, or scan:
FastAPI Berlin | Pydantic & Rust
By Samuel Colvin