How Pydantic V2 leverages Rust's Superpowers

Using Rust to build Python extensions

by

Samuel Colvin

FOSDEM, February 2023

https://fosdem.org/2023/schedule/event/rust_how_pydantic_v2_leverages_rusts_superpowers/

Me

  • Software developer for 10 years
  • I've spent the last 5 years doing lots of open source
  • I created Pydantic in 2017 as an experiment; it has since taken over my life
  • Worked full time on Pydantic for the last year

Today

  • Introduce you to Pydantic, including some hype numbers
  • Explain why I decided to rebuild Pydantic
  • Try to show how it's built in Rust
  • Try to explain why Rust is great for this

 

I won't:

  • Do the "hello_world() Python extension in Rust" thing (there are plenty of other good resources for that)

What is it?

Yet another data validation library for Python.

 

Side project, nothing special, maintained in my spare time.

Pydantic Today

Downloads

Until this happened:

What's so great?

Spoiler: I don't know... but we can guess.

from datetime import datetime, timedelta
from pydantic import BaseModel


class Talk(BaseModel):
    title: str
    attendance: int
    when: datetime | None = None
    mistakes: list[tuple[timedelta, str]]


some_data = {
    'title': 'Pydantic & Rust',
    'attendance': '10',
    'when': '2023-02-04T16:45:00',
    'mistakes': [
        ('00:00:00', 'Screen mirroring confusion'),
        ('00:00:30', 'Forgot to turn on the mic'),
        ('00:25:00', 'Too short'),
        ('00:50:00', 'Too long!'),
    ],
}
talk = Talk(**some_data)

Just type hints!

  • Less to learn
  • Compatible with static type analysis, IDEs and your brain

Default to coercion:

  • Fault tolerant
  • More formats supported without faff

(Not pedantic! This makes some people 😡)
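To make "default to coercion" concrete, here is a minimal pure-Python sketch of lax versus strict integer validation. `validate_int` and its rules (numeric strings and whole floats are accepted in lax mode, only real ints in strict mode) are hypothetical choices for illustration, not Pydantic's actual conversion table.

```python
def validate_int(value: object, strict: bool = False) -> int:
    """Hypothetical int validator: lax mode coerces, strict mode does not."""
    if isinstance(value, int):
        return value
    if strict:
        # Strict mode: only genuine ints are accepted
        raise ValueError(f"strict mode: expected int, got {type(value).__name__}")
    if isinstance(value, str):
        try:
            return int(value.strip())  # lax mode: '10' -> 10
        except ValueError:
            raise ValueError(f"could not parse {value!r} as int") from None
    if isinstance(value, float) and value.is_integer():
        return int(value)  # lax mode: 10.0 -> 10
    raise ValueError(f"cannot coerce {value!r} to int")


print(validate_int("10"))  # 10
print(validate_int(10.0))  # 10
try:
    validate_int("10", strict=True)
except ValueError as exc:
    print(exc)  # strict mode: expected int, got str
```

Lax mode is what makes `'attendance': '10'` in the example above acceptable; strict mode would reject it.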

Also, we're:

  • fast (ish)
  • friendly (ish)
  • complete (ish)


What's wrong?

The problem is that Pydantic V1 stinks on the inside: time to rethink!

Pydantic V2

Priorities for V2:

  • Performance - it was good, but it could be better - think of the penguins!
  • Strict Mode - live up to the name
  • Onion - wrap validators 🧅
  • Composability - models aren't always super
  • Maintainability - I maintain Pydantic, so I want maintaining it to be fun
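The "onion" priority refers to validators that wrap other validators, running code before and/or after the inner one. A minimal sketch of that layering, assuming validators are plain Python callables; `wrap`, the `(value, handler)` signature and the two example layers are hypothetical and only loosely modelled on the idea, not Pydantic's API.

```python
from typing import Any, Callable

Validator = Callable[[Any], Any]


def int_validator(value: Any) -> int:
    # Innermost layer: lax int coercion
    return int(value)


def wrap(outer: Callable[[Any, Validator], Any], inner: Validator) -> Validator:
    """Build a validator whose outer layer decides when (and whether) to call the inner one."""
    def validator(value: Any) -> Any:
        return outer(value, inner)
    return validator


def default_on_empty(value: Any, handler: Validator) -> Any:
    # Runs before the inner validator and may short-circuit it
    if value in ("", None):
        return 0
    return handler(value)


def clamp_to_100(value: Any, handler: Validator) -> Any:
    # Runs after the inner validator and post-processes its result
    return min(handler(value), 100)


# Layers, outside in: clamp_to_100 -> default_on_empty -> int_validator
attendance = wrap(clamp_to_100, wrap(default_on_empty, int_validator))
print(attendance("42"))   # 42
print(attendance(""))     # 0
print(attendance("250"))  # 100
```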

Sad penguin, no snow

So, I decided to rebuild Pydantic from the ground up in Rust...

A year later, I'm nearly done.

Where's the Rust?!

The principle:


from datetime import datetime, timedelta
from typing import Annotated

from annotated_types import MaxLen
from pydantic import BaseModel, NonNegativeInt


class Talk(BaseModel):
    title: Annotated[str, MaxLen(100)]
    attendance: NonNegativeInt
    when: datetime | None = None
    mistakes: list[tuple[timedelta, str]]

ModelValidator {
    cls: Talk,
    validator: TypedDictValidator [
        Field {
            key: "title",
            validator: StrValidator { max_len: 100 },
        },
        Field {
            key: "attendance",
            validator: IntValidator { min: 0 },
        },
        Field {
            key: "when",
            validator: UnionValidator [
                DateTimeValidator {},
                NoneValidator {},
            ],
            default: None,
        },
        Field {
            key: "mistakes",
            validator: ListValidator {
                item_validator: TupleValidator [
                    TimedeltaValidator {},
                    StrValidator {},
                ],
            },
        },
    ],
}
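The principle can be mimicked in pure Python: each validator is a small object, composite validators hold child validators, and validating a model is a walk down the tree. This sketch hand-builds the tree for three of the `Talk` fields; the class names echo the tree above but are hypothetical Python stand-ins for pydantic-core's Rust structs, and `when` uses a nullable wrapper instead of a union.

```python
from datetime import datetime
from typing import Any, Optional

_MISSING = object()  # sentinel meaning "this field has no default"


class StrValidator:
    def __init__(self, max_len: Optional[int] = None):
        self.max_len = max_len

    def validate(self, value: Any) -> str:
        if not isinstance(value, str):
            raise ValueError(f"expected str, got {type(value).__name__}")
        if self.max_len is not None and len(value) > self.max_len:
            raise ValueError(f"longer than {self.max_len} characters")
        return value


class IntValidator:
    def __init__(self, min_value: Optional[int] = None):
        self.min_value = min_value

    def validate(self, value: Any) -> int:
        result = int(value)  # lax: '10' -> 10
        if self.min_value is not None and result < self.min_value:
            raise ValueError(f"must be >= {self.min_value}")
        return result


class DateTimeValidator:
    def validate(self, value: Any) -> datetime:
        if isinstance(value, datetime):
            return value
        return datetime.fromisoformat(value)  # e.g. '2023-02-04T16:45:00'


class NullableValidator:
    def __init__(self, inner: Any):
        self.inner = inner

    def validate(self, value: Any) -> Any:
        # None short-circuits; anything else is delegated to the wrapped validator
        return None if value is None else self.inner.validate(value)


class TypedDictValidator:
    def __init__(self, fields: list):
        self.fields = fields  # list of (key, validator, default) triples

    def validate(self, data: dict) -> dict:
        output = {}
        for key, validator, default in self.fields:
            if key in data:
                output[key] = validator.validate(data[key])
            elif default is not _MISSING:
                output[key] = default
            else:
                raise ValueError(f"{key}: field required")
        return output


# Hand-built tree for three of the Talk fields
talk_validator = TypedDictValidator([
    ("title", StrValidator(max_len=100), _MISSING),
    ("attendance", IntValidator(min_value=0), _MISSING),
    ("when", NullableValidator(DateTimeValidator()), None),
])
print(talk_validator.validate({"title": "Pydantic & Rust", "attendance": "10"}))
# {'title': 'Pydantic & Rust', 'attendance': 10, 'when': None}
```

Because each node only knows about its children, new types are added by writing another small validator rather than touching the others.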

OK, some actual Rust...


#[enum_dispatch(CombinedValidator)]
trait Validator {
    const EXPECTED_TYPE: &'static str;

    fn build(schema: &PyDict, config: Option<&PyDict>) -> PyResult<CombinedValidator>;

    fn validate(&self, input: &impl Input, extra: &Extra) -> ValResult<PyObject>;
}

#[enum_dispatch]
enum CombinedValidator {
    Int(IntValidator),
    Str(StrValidator),
    TypedDict(TypedDictValidator),
    Union(UnionValidator),
    TaggedUnion(TaggedUnionValidator),
    Nullable(NullableValidator),
    // ... and 43 more
}

fn build_validator(schema: &PyDict, config: Option<&PyDict>) -> PyResult<CombinedValidator> {
    let schema_type: &str = schema.get_as_req("type")?;
    // really this is a clever macro to avoid the duplication
    match schema_type {
        IntValidator::EXPECTED_TYPE => IntValidator::build(schema, config),
        StrValidator::EXPECTED_TYPE => StrValidator::build(schema, config),
        TypedDictValidator::EXPECTED_TYPE => TypedDictValidator::build(schema, config),
        UnionValidator::EXPECTED_TYPE => UnionValidator::build(schema, config),
        TaggedUnionValidator::EXPECTED_TYPE => TaggedUnionValidator::build(schema, config),
        NullableValidator::EXPECTED_TYPE => NullableValidator::build(schema, config),
        // ... and 43 more
        _ => Err(PyValueError::new_err(format!("unknown schema type: '{}'", schema_type))),
    }
}
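In Python terms, `build_validator` is a dispatch on the schema's `"type"` string, with each validator class advertising the string it handles. A minimal sketch, assuming a registry dict in place of the macro-generated `match`; `REGISTRY`, `register` and both validator classes are hypothetical, while `EXPECTED_TYPE`, `ge` and `items_schema` are borrowed from the code and schema in this talk.

```python
from typing import Any, Optional

REGISTRY: dict = {}  # maps EXPECTED_TYPE strings to validator classes


def register(cls: type) -> type:
    # Class decorator: record the class under the type string it handles
    REGISTRY[cls.EXPECTED_TYPE] = cls
    return cls


def build_validator(schema: dict, config: Optional[dict] = None):
    schema_type = schema["type"]  # required key
    try:
        cls = REGISTRY[schema_type]
    except KeyError:
        raise ValueError(f"unknown schema type: {schema_type!r}") from None
    return cls.build(schema, config)


@register
class IntValidator:
    EXPECTED_TYPE = "int"

    def __init__(self, ge: Optional[int]):
        self.ge = ge

    @classmethod
    def build(cls, schema: dict, config: Optional[dict]) -> "IntValidator":
        return cls(schema.get("ge"))

    def validate(self, value: Any) -> int:
        result = int(value)
        if self.ge is not None and result < self.ge:
            raise ValueError(f"must be >= {self.ge}")
        return result


@register
class ListValidator:
    EXPECTED_TYPE = "list"

    def __init__(self, item_validator: Any):
        self.item_validator = item_validator

    @classmethod
    def build(cls, schema: dict, config: Optional[dict]) -> "ListValidator":
        # Child schemas are built recursively with the same dispatcher
        return cls(build_validator(schema["items_schema"], config))

    def validate(self, value: Any) -> list:
        return [self.item_validator.validate(item) for item in value]


validator = build_validator({"type": "list", "items_schema": {"type": "int", "ge": 0}})
print(validator.validate(["1", 2, "3"]))  # [1, 2, 3]
```

The Rust version pays the dispatch cost once, at build time; afterwards `validate` calls go straight through the `CombinedValidator` enum.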

trait Input<'a> {
    fn is_none(&self) -> bool;

    fn strict_str(&'a self) -> ValResult<&'a str>;

    fn lax_str(&'a self) -> ValResult<&'a str>;

    fn validate_date(&self, strict: bool) -> ValResult<PyDatetime>;

    fn strict_date(&self) -> ValResult<PyDatetime>;

    // ... and 53 more
}

impl<'a> Input<'a> for PyAny {
    // ...
}

impl<'a> Input<'a> for JsonInput {
    // ...
}
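The `Input` trait lets each validator be written once while the Python and JSON input types decide for themselves what strict and lax mean. A rough Python analogue, assuming `typing.Protocol` in place of the trait; the class names and the lax rule (Python `bytes` are decoded as UTF-8, while JSON has no bytes type so lax equals strict) are hypothetical.

```python
import json
from typing import Any, Protocol


class Input(Protocol):
    def is_none(self) -> bool: ...
    def strict_str(self) -> str: ...
    def lax_str(self) -> str: ...


class PythonInput:
    def __init__(self, value: Any):
        self.value = value

    def is_none(self) -> bool:
        return self.value is None

    def strict_str(self) -> str:
        if isinstance(self.value, str):
            return self.value
        raise ValueError("expected str")

    def lax_str(self) -> str:
        # Hypothetical lax rule: Python bytes are decoded as UTF-8
        if isinstance(self.value, bytes):
            return self.value.decode("utf-8")
        return self.strict_str()


class JsonInput:
    def __init__(self, value: Any):
        self.value = value  # an already-parsed JSON value

    def is_none(self) -> bool:
        return self.value is None  # JSON null

    def strict_str(self) -> str:
        if isinstance(self.value, str):
            return self.value
        raise ValueError("expected JSON string")

    def lax_str(self) -> str:
        # JSON has no bytes type, so lax and strict coincide here
        return self.strict_str()


def validate_str(inp: Input, strict: bool) -> str:
    # Written once; works for any Input implementation
    return inp.strict_str() if strict else inp.lax_str()


print(validate_str(PythonInput(b"Pydantic"), strict=False))        # Pydantic
print(validate_str(JsonInput(json.loads('"Rust"')), strict=True))  # Rust
```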

#[pyclass]
struct SchemaValidator {
    validator: CombinedValidator,
}

#[pymethods]
impl SchemaValidator {
    #[new]
    fn py_new(schema: &PyDict, config: Option<&PyDict>) -> PyResult<Self> {
        // We also do magic/evil schema validation using pydantic-core itself
        let validator = build_validator(schema, config)?;
        Ok(SchemaValidator { validator })
    }

    fn validate_python(&self, input: &PyAny, strict: Option<bool>) -> PyResult<PyObject> {
        self.validator.validate(input, &Extra::new(strict))
    }

    fn validate_json(
        &self,
        input_string: &PyString,
        strict: Option<bool>,
    ) -> PyResult<PyObject> {
        let input = parse_string(input_string)?;
        self.validator.validate(&input, &Extra::new(strict))
    }
}
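Seen from Python, the class offers two entry points over one validator tree: `validate_python` takes an object as-is, `validate_json` parses a string first, and both pass an optional `strict` flag down via `Extra`. A pure-Python mock of that shape; `MockSchemaValidator`, this `Extra` dataclass and the single-int validator are hypothetical, and unlike the Rust version it parses JSON into ordinary Python objects with the `json` module.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Extra:
    # Per-call state passed down the validator tree
    strict: bool = False


class IntValidator:
    def validate(self, value: Any, extra: Extra) -> int:
        if isinstance(value, int):
            return value
        if extra.strict:
            raise ValueError(f"strict mode: expected int, got {type(value).__name__}")
        return int(value)  # lax mode: coerce, e.g. '42' -> 42


class MockSchemaValidator:
    def __init__(self, validator: IntValidator):
        self.validator = validator

    def validate_python(self, value: Any, strict: Optional[bool] = None) -> Any:
        # The input object is validated directly
        return self.validator.validate(value, Extra(strict=bool(strict)))

    def validate_json(self, data: str, strict: Optional[bool] = None) -> Any:
        # Parse first, then run the same validator tree
        return self.validator.validate(json.loads(data), Extra(strict=bool(strict)))


v = MockSchemaValidator(IntValidator())
print(v.validate_python("42"))             # 42, coerced in lax mode
print(v.validate_json("42", strict=True))  # 42, the JSON number is already an int
```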

Python Interface


from pydantic_core import SchemaValidator


class Talk:
    ...

talk_validator = SchemaValidator({
    'type': 'model',
    'cls': Talk,
    'schema': {
        'type': 'typed-dict',
        'fields': {
            'title': {'schema': {'type': 'str', 'max_length': 100}},
            'attendance': {'schema': {'type': 'int', 'ge': 0}},
            'when': {
                'schema': {
                    'type': 'default',
                    'schema': {'type': 'nullable', 'schema': {'type': 'datetime'}},
                    'default': None,
                }
            },
            'mistakes': {
                'schema': {
                    'type': 'list',
                    'items_schema': {
                        'type': 'tuple',
                        'mode': 'positional',
                        'items_schema': [{'type': 'timedelta'}, {'type': 'str'}]
                    }
                }
            },
        },
    }
})

some_data = {
    'title': 'Pydantic & Rust',
    'attendance': '100',
    'when': '2023-02-04T16:45:00',
    'mistakes': [
        ('00:00:00', 'Screen mirroring confusion'),
        ('00:00:30', 'Forgot to turn on the mic'),
        ('00:25:00', 'Too short'),
        ('00:40:00', 'Too long!'),
    ],
}
talk = talk_validator.validate_python(some_data)
print(talk.mistakes)
"""
[
    (datetime.timedelta(0), 'Screen mirroring confusion'), 
    (datetime.timedelta(seconds=30), 'Forgot to turn on the mic'), 
    (datetime.timedelta(seconds=1500), 'Too short'), 
    (datetime.timedelta(seconds=2400), 'Too long!')
]
"""
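The printed durations are just hours, minutes and seconds converted to seconds: `'00:25:00'` is 25 × 60 = 1500 seconds and `'00:40:00'` is 40 × 60 = 2400. A minimal sketch of that conversion, handling only the plain `HH:MM:SS` form used here (Pydantic accepts other formats too); `parse_hms` is a hypothetical helper.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Hypothetical helper: parse 'HH:MM:SS' into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


for text in ("00:00:00", "00:00:30", "00:25:00", "00:40:00"):
    print(repr(parse_hms(text)))
```

This prints the same four `datetime.timedelta(...)` values as the output above.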

Where Rust Excels

When building Python libraries...

 

Obviously:

  • Performance
  • Multithreading - no GIL
  • Reusing high-quality Rust libraries (I also maintain two libraries that do this: watchfiles and rtoml)

 

(maybe) Less obviously:

  • Deeply recursive code - no Python stack frames, no recursion penalty, but be careful!
  • Small modular components - (almost) no function-call penalty
  • Complex error handling
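The recursion point is easiest to see from the Python side: a recursive validator in pure Python costs an interpreter frame per call and stops at the recursion limit (1000 by default in CPython). A minimal sketch with hypothetical `nesting_depth` and `make_nested` helpers; Rust recursion avoids the interpreter overhead, but the native stack is still finite, hence "be careful".

```python
import sys


def nesting_depth(value: object) -> int:
    # Recursive walk, as a pure-Python validator for nested lists would do
    if isinstance(value, list) and value:
        return 1 + nesting_depth(value[0])
    return 0


def make_nested(depth: int) -> list:
    # Build [[[...]]] iteratively so construction itself doesn't recurse
    data: list = []
    for _ in range(depth):
        data = [data]
    return data


print(nesting_depth(make_nested(100)))  # 100
try:
    nesting_depth(make_nested(sys.getrecursionlimit() + 100))
except RecursionError:
    print("RecursionError: the interpreter's recursion limit was hit")
```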

Rust

Not Rust vs. Python

But rather: Python as the user* interface for Rust.

(* by user, I mean "application developer")

 

I'd love to see a generation of libraries for Python (and other high-level languages) built in Rust.

HTTPS request lifecycle:

  • Rust/C: TLS, routing, HTTP parsing, validation, DB query, serializing
  • Python: application logic (100% of developer time = 1% of CPU cycles)

Thank you

Questions?

Twitter: @samuel_colvin & @pydantic

GitHub: /samuelcolvin & /pydantic

Docs: docs.pydantic.dev


Massive thanks to PyO3 - Rust bindings for Python, which made this possible

pyo3.rs


If you'd like a laugh, please see

github.com/pydantic/pydantic/issues/4790

for a very different (angry) opinion
