Pydantic & Rust

Samuel Colvin

What is Pydantic?

Data validation & more using Python type hints
Top 50 package on PyPI - just reached 1B downloads 🎉
The validation magic behind FastAPI - FastAPI is ~25% of Pydantic's usage

from datetime import datetime
from pydantic import BaseModel

class Delivery(BaseModel):
    timestamp: datetime
    dimensions: tuple[int, int]

m = Delivery(timestamp='2020-01-02T03:04:05Z', dimensions=['10', '20'])
print(repr(m.timestamp))
#> datetime.datetime(2020, 1, 2, 3, 4, 5, tzinfo=TzInfo(UTC))
print(m.dimensions)
#> (10, 20)

Pydantic V2

Complete rewrite of Pydantic, with the core written in Rust
Released in June
5 - 50x faster than Pydantic V1
More correct, more extensible

Why Rust?

The obvious advantages...

Performance
Reusing high-quality Rust libraries
More explicit error handling

(Maybe) less obvious advantages:

Virtually zero cost customisation, even in hot code
Arguably easier to maintain - the compiler picks up more mistakes
Private means private

Disadvantages:

Slower to develop
Fewer people can help you
Have to distribute binaries, or leave users to compile it

Pydantic V2 Architecture

pydantic (pure Python): read type hints -> construct a "core schema"

pydantic-core (binary + stubs + core-schema): process core schema -> return SchemaValidator

pydantic (pure Python): receive data -> call schema_validator(data)

pydantic-core: run validator -> return the result of validation
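
The two-phase flow above (build a schema once, then run a compiled validator on each input) can be sketched in pure Python. This is a toy illustration only, not how pydantic or pydantic-core is implemented: `build_core_schema`, `compile_validator` and `COERCERS` are hypothetical names, and the "schema" is a plain dict.

```python
from typing import Any, Callable, get_type_hints

# Hypothetical lookup from a type name to a coercion function.
COERCERS: dict[str, Callable[[Any], Any]] = {'int': int, 'float': float, 'str': str}


def build_core_schema(cls: type) -> dict[str, Any]:
    """Phase 1 (Python side): read type hints and emit a plain-data schema."""
    fields = {name: {'type': hint.__name__} for name, hint in get_type_hints(cls).items()}
    return {'type': 'model', 'fields': fields}


def compile_validator(schema: dict[str, Any]) -> Callable[[dict[str, Any]], dict[str, Any]]:
    """Phase 2 (the compiled side in the real library): turn the schema into a validator."""
    steps = [(name, COERCERS[spec['type']]) for name, spec in schema['fields'].items()]

    def validate(data: dict[str, Any]) -> dict[str, Any]:
        # Runs once per input; the schema is never re-read here.
        return {name: coerce(data[name]) for name, coerce in steps}

    return validate


class Point:
    x: int
    y: float


schema = build_core_schema(Point)
validator = compile_validator(schema)        # built once per model
result = validator({'x': '1', 'y': '2.5'})   # called for every input
print(result)
```

The point of the split is that all inspection of type hints happens up front, so the per-input path only walks a prepared list of steps.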

Pydantic V2

Examples

Performance

import timeit
from pydantic import BaseModel, __version__

class Model(BaseModel):
    name: str
    age: int
    friends: list[int]
    settings: dict[str, float]

data = {
    'name': 'John',
    'age': 42,
    'friends': list(range(200)),
    'settings': {f'v_{i}': i / 2.0 for i in range(50)}
}
t = timeit.timeit(
    'Model(**data)',
    globals={'data': data, 'Model': Model},
    number=10_000,
)
print(f'version={__version__} time taken {t * 100:.2f}us')

version=1.10.4 time taken 179.81us
version=2.30   time taken   7.99us

22.5x speedup

Strict Mode

from pydantic import BaseModel, ConfigDict, ValidationError

class Model(BaseModel):
    model_config = ConfigDict(strict=True)
    
    age: int
    friends: tuple[int, int]

try:
    Model(age='42', friends=[1, 2])
except ValidationError as e:
    print(e)
    """
    2 validation errors for Model
    age
      Input should be a valid integer ... input_value='42'
    friends
      Input should be a valid tuple ... input_value=[1, 2]
    """

print(Model(age=42, friends=(1, 2)))
#> age=42 friends=(1, 2)

AKA Pedant mode.
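
The difference between lax and strict validation can be sketched without Pydantic. `validate_int` below is a hypothetical helper, and its rules (including rejecting `bool`) are a simplification chosen for this sketch rather than Pydantic's actual coercion table.

```python
def validate_int(value: object, *, strict: bool) -> int:
    """Toy integer validator: lax mode coerces numeric strings, strict mode does not."""
    # bool is a subclass of int in Python; this toy rejects it in both modes.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if not strict and isinstance(value, str):
        return int(value)  # raises ValueError if the string is not numeric
    raise TypeError(f'expected int, got {type(value).__name__}')


print(validate_int('42', strict=False))
print(validate_int(42, strict=True))
```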

Builtin JSON parsing

from pydantic import BaseModel, ConfigDict

class Model(BaseModel):
    model_config = ConfigDict(strict=True)

    age: int
    friends: tuple[int, int]

print(Model.model_validate_json('{"age": 1, "friends": [1, 2]}'))
#> age=1 friends=(1, 2)

If you're going to be a pedant, you better be right.
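
Why the JSON call above succeeds in strict mode: JSON has no tuple type, so a strict validator has to know where its input came from. A minimal sketch of that idea, assuming a hypothetical `strict_int_pair` helper with a `from_json` flag (not a Pydantic API):

```python
import json
from typing import Any


def strict_int_pair(value: Any, *, from_json: bool) -> tuple:
    """Strictly validate a pair of ints, allowing for JSON's smaller type system."""
    # Strict Python input must already be a tuple; parsed JSON can only give a list.
    expected = list if from_json else tuple
    if not isinstance(value, expected) or len(value) != 2:
        raise TypeError(f'expected a {expected.__name__} of length 2')
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in value):
        raise TypeError('expected two integers')
    return tuple(value)


print(strict_int_pair(json.loads('[1, 2]'), from_json=True))
print(strict_int_pair((1, 2), from_json=False))
```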

Also gives us:

Big performance improvement without 3rd party parsing library
Custom Errors (WIP)
Line numbers in errors (in future)

Wrap Validators

from pydantic import BaseModel, field_validator

class Model(BaseModel):
    x: int

    @field_validator('x', mode='wrap')
    @classmethod
    def validate_x(cls, v, handler):
        if v == 'one':
            return 1

        try:
            x = handler(v)
        except ValueError:
            return -999
        else:
            return x + 1

print(Model(x='one'))
#> x=1
print(Model(x=2))
#> x=3
print(Model(x='three'))
#> x=-999

AKA "The Onion"

Before

On Error

After
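
The three layers of the onion (before, on error, after) can be shown with plain functions. This is a sketch of the pattern only: `inner_int`, `wrap` and `layer` are hypothetical names, and `inner_int` stands in for the inner validator that the handler would call.

```python
from typing import Any, Callable

Handler = Callable[[Any], Any]


def inner_int(v: Any) -> int:
    """Stand-in for the inner validator."""
    if isinstance(v, bool) or not isinstance(v, (int, str)):
        raise ValueError(f'not an integer: {v!r}')
    return int(v)  # raises ValueError for non-numeric strings


def wrap(v: Any, handler: Handler) -> int:
    # Before: runs ahead of the inner validator and may short-circuit it.
    if v == 'one':
        return 1
    try:
        x = handler(v)
    except ValueError:
        # On error: the inner validator failed, so substitute a fallback.
        return -999
    # After: post-process the validated value.
    return x + 1


def layer(outer: Callable[[Any, Handler], Any], handler: Handler) -> Handler:
    """Wrap one more layer around an existing handler."""
    return lambda v: outer(v, handler)


validate = layer(wrap, inner_int)
print(validate('one'), validate(2), validate('three'))
```

Because `layer` returns another handler, layers can be stacked, which is where the onion name comes from.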

Alias Paths

from pydantic import BaseModel, Field, AliasPath, AliasChoices


class MyModel(BaseModel):
    a: int = Field(validation_alias=AliasPath('foo', 1, 'bar'))
    b: str = Field(validation_alias=AliasChoices('x', 'y'))


m = MyModel.model_validate(
    {
        'foo': [{'bar': 0}, {'bar': 1}],
        'y': 'Y',
    }
)
print(m)
#> a=1 b='Y'

Somewhat similar to serde's "flatten".

Coming Soon...

PyO3 Speedups

PyO3 provides the magic that allows Pydantic (and many other libraries) to call Rust from Python.

While Rust is very fast:

Calling Rust from Python
and interacting with Python objects (e.g. dicts) from Rust

are slower than they could be.

But much of this is fixable...

PyO3 Speedups

Let's look at an example...

from typing import Any

def dict_not_none(**kwargs: Any) -> Any:
    return {k: v for k, v in kwargs.items() if v is not None}

Implementation   Measurement
Python           281ns
PyO3 Today       350ns
Baremetal FFI    54ns
PyO3 Next        235ns

github.com/davidhewitt/dict-not-none
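
A rough way to take the pure-Python measurement yourself is `timeit`. This is a sketch under assumptions: the arguments and call count used for the figures above are not given in the slides, so the ones below are made up, and the `lambda` adds a little call overhead of its own.

```python
import timeit
from typing import Any


def dict_not_none(**kwargs: Any) -> Any:
    return {k: v for k, v in kwargs.items() if v is not None}


number = 100_000  # assumed call count, chosen to keep the run short
total = timeit.timeit(lambda: dict_not_none(a=1, b=None, c='x'), number=number)
per_call_ns = total / number * 1e9  # seconds for all calls -> nanoseconds per call
print(f'{per_call_ns:.0f}ns per call')
```

Numbers from a harness like this depend on the machine and Python version, so they are only comparable with other runs on the same setup.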

Faster JSON validation

We currently parse JSON completely, store it in a heap of Maps and Arrays, then validate.

We can do much better...

Faster JSON validation

What we have now:

def validate_json_today(model_type: ModelType, json_data: str):
    json_object = JsonObject()
    for chunk in JsonParser(json_data).chunks():
        json_chunk = chunk.to_object()
        json_object.add(json_chunk)

    model_data = {}
    errors = []
    for f in model_type.fields:
        try:
            model_data[f.name] = f.validate(json_object[f.key])
        except Error:
            errors.append(f)

    if errors:
        raise ValidationError(errors)
    else:
        return model_type(model_data)

Warning: Python as pseudo code for Rust 😱

Faster JSON validation

What we might have in future:

def validate_json_future(model_type: ModelType, json_data: str):
    tmp: list[Any | None] = [None for _ in range(len(model_type.fields))]
    errors = []
    for key, chunk in iter_json_parser(json_data):
        field = model_type.fields.get(key)
        if field:
            try:
                tmp[field.index] = field.validate(chunk.parse())
            except Error:
                errors.append(field)

    model_data = []
    for index, value in enumerate(tmp):
        if value is None:
            # slot never filled: the field was missing or failed validation
            errors.append(model_type.get_by_index(index))
        else:
            model_data.append(value)

    ...

Warning: Python as pseudo code for Rust 😱
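
The slot-per-field idea can be made runnable with the standard library. This is a sketch only: `FieldSpec`, `iter_json_items` and `validate_json_slots` are hypothetical names, and because the stdlib `json` module has no streaming parser, `iter_json_items` parses everything up front and merely imitates the iteration a real streaming parser would provide.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Iterator

_MISSING = object()  # sentinel, so a field validated to None is not "missing"


@dataclass
class FieldSpec:
    name: str
    index: int
    validate: Callable[[Any], Any]


def iter_json_items(json_data: str) -> Iterator[tuple[str, Any]]:
    # Stand-in for a streaming parser: yields (key, value) pairs in document order.
    yield from json.loads(json_data).items()


def validate_json_slots(fields: dict[str, FieldSpec], json_data: str) -> list[Any]:
    slots: list[Any] = [_MISSING] * len(fields)
    failed: set[int] = set()
    errors: list[str] = []
    for key, raw in iter_json_items(json_data):
        spec = fields.get(key)
        if spec is None:
            continue  # unknown key: never validated or stored
        try:
            slots[spec.index] = spec.validate(raw)
        except (TypeError, ValueError) as exc:
            failed.add(spec.index)
            errors.append(f'{key}: {exc}')
    for spec in fields.values():
        # Report a field as missing only if it never appeared at all.
        if slots[spec.index] is _MISSING and spec.index not in failed:
            errors.append(f'{spec.name}: missing')
    if errors:
        raise ValueError('; '.join(errors))
    return slots


fields = {'age': FieldSpec('age', 0, int), 'name': FieldSpec('name', 1, str)}
values = validate_json_slots(fields, '{"name": "John", "extra": true, "age": "42"}')
print(values)
```

Tracking failed indexes separately avoids reporting a field twice, once as invalid and once as missing.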

Pydantic V3 more...

Lazy Python objects => even no Python objects
More direct validation & serialization formats - msgpack, avro, parquet?

Thank you

docs.pydantic.dev / @github.com/pydantic

Alert!

Pydantic will start a closed beta of our Observability tool later this year!

Come and find me for beta access, or scan: