Pydantic
What's coming in V2
&
A review of building Python extensions with Rust
Please ask questions as I go along ... I've written this in a hurry so if you're confused, you're probably not alone
Some Background
- Pydantic does data validation using type hints
 - It was first released in 2017 as a tiny experiment
 - The library has since seen massive growth ... particularly since Sebastián used pydantic in FastAPI. Thanks Sebastián! 🙏
 - Pydantic hasn't been significantly rewritten since v0.0.1
 - The internals are creaking
 - V2 is an opportunity to fix some of the footguns but also rewrite the internals
 
The good bits
Everything! (and hopefully nothing for you):
- All validation is offloaded to another library, pydantic-core, which I'm building now
 - Ugly hangovers from mistakes I made while building v0.0.1 can be killed
 - We can use a little of the speed premium provided by pydantic-core to do some stuff more correctly - e.g. smart unions
 - I've given up on the "validation" vs. "parsing" vs. "coercion" debate, I'm just using "validate" - if you don't like it 🖕 (or just use strict=True)
 
What's changing in V2
"You can't make a Tomlette without breaking a few Gregs!" (Succession S2E9) - best TV pun ever?
I'm going to have to upset some people to get V2 out:
- a bunch of PRs will have to be closed - users can come back and restart them once the meat of the V2 changes are in main
 - Some functionality will change, I think for the better, but some people will disagree
 - Some hacks and workarounds will no longer be possible as the logic is in rust where you can't mess with it
 - no more subclasses of basic types (str, bytes) - except with wrap decorators
 - (initially) no pure python implementation of pydantic-core
 
The bad bits
What's changing in V2
Wrap Validators
class MyModel(BaseModel):
    appointment_time: datetime | None
    
    @validator('appointment_time', mode='wrap')
    def validate_appointment_time(cls, v, handler):
        if v == 'now':
            return datetime.now()
        
        try:
            return handler(v)
        except ValidationError:
            # we don't want to fail, so just use None
            return None
New Features
(Implemented, but not with this nice syntactic sugar)
AKA "The onion" - like middleware
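The "onion" idea can be sketched in plain Python without pydantic at all. This is only an illustration of the handler pattern, not how pydantic-core implements it: `ValidationError`, `parse_datetime` and `wrap` are hypothetical stand-ins written for this sketch.

```python
from datetime import datetime


class ValidationError(ValueError):
    """Stand-in for pydantic's ValidationError (hypothetical, sketch only)."""


def parse_datetime(value):
    """Inner validator: accept datetime objects or ISO 8601 strings."""
    if isinstance(value, datetime):
        return value
    if isinstance(value, str):
        try:
            return datetime.fromisoformat(value)
        except ValueError as exc:
            raise ValidationError(str(exc)) from exc
    raise ValidationError(f'cannot parse {value!r} as a datetime')


def wrap(outer, handler):
    """Add one onion layer: `outer` decides if and when to call `handler`."""
    def validator(value):
        return outer(value, handler)
    return validator


def appointment_time(value, handler):
    # Runs before the inner validator: short-circuit a special value.
    if value == 'now':
        return datetime.now()
    # Runs around the inner validator: swallow its errors.
    try:
        return handler(value)
    except ValidationError:
        return None


validate_appointment_time = wrap(appointment_time, parse_datetime)
```

The outer function sees the raw input before the inner validator does and sees the result (or the error) afterwards, which is exactly what middleware does with a request and response.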
Strict Mode
class StrictModel(BaseModel, strict=True):
    a_string: str
    an_int: int
    set_of_ints: Set[int]
class LaxModel(BaseModel):
    lax_string: str
    strict_string1: str = Field(..., strict=True)
    strict_string2: StrictStr
New Features
(Implemented, but not with this nice syntactic sugar)
(Does not apply when "downcasting" from JSON)
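To make the strict vs. lax distinction concrete, here is a minimal sketch of one integer validator with a `strict` flag. The coercion rules below are assumptions picked for illustration, not pydantic's actual conversion table, and `validate_int` is a hypothetical name.

```python
def validate_int(value, strict=False):
    """Validate an int, optionally refusing any coercion."""
    # Exact ints pass in both modes; bool is excluded even though it subclasses int.
    if type(value) is int:
        return value
    if strict:
        raise ValueError(f'strict mode: expected int, got {type(value).__name__}')
    # Lax mode (illustrative rules): numeric strings and integral floats are coerced.
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            pass
    if isinstance(value, float) and value.is_integer():
        return int(value)
    raise ValueError(f'{value!r} is not a valid int')
```

With this sketch, `validate_int('3')` returns `3`, while `validate_int('3', strict=True)` raises.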
Smart Union
class MyModel(BaseModel):
    int_or_bool: int | bool
    bool_or_int: bool | int
print(MyModel(int_or_bool=1, bool_or_int=1))  #> 1, 1
print(MyModel(int_or_bool=True, bool_or_int=True))  #> True, True
print(MyModel(int_or_bool='1', bool_or_int='1'))  #> 1, True :-( ?
New Features
(Implemented, but not with this nice syntactic sugar)
(Using strict mode)
(Also works with models and model instances)
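One way to read "using strict mode" is: try every union member strictly first, and only fall back to lax coercion, in declaration order, if nothing matches exactly. The sketch below assumes that reading. All function names and the coercion rules are hypothetical, written only to reproduce the behaviour in the example above.

```python
def strict_int(value):
    if type(value) is int:
        return value
    raise ValueError('not an int')


def strict_bool(value):
    if type(value) is bool:
        return value
    raise ValueError('not a bool')


def lax_int(value):
    if type(value) is int:
        return value
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            pass
    raise ValueError('not coercible to int')


def lax_bool(value):
    if type(value) is bool:
        return value
    if type(value) is int and value in (0, 1):
        return bool(value)
    if isinstance(value, str) and value.lower() in ('1', 'true'):
        return True
    if isinstance(value, str) and value.lower() in ('0', 'false'):
        return False
    raise ValueError('not coercible to bool')


INT = (strict_int, lax_int)
BOOL = (strict_bool, lax_bool)


def smart_union(value, members):
    """Pass 1: strict match on any member. Pass 2: lax coercion, in order."""
    for strict, _ in members:
        try:
            return strict(value)
        except ValueError:
            pass
    for _, lax in members:
        try:
            return lax(value)
        except ValueError:
            pass
    raise ValueError(f'{value!r} matches no union member')
```

The strict pass is why `1` stays `1` and `True` stays `True` regardless of member order. The string `'1'` matches nothing strictly, so order decides: `int | bool` gives `1`, `bool | int` gives `True`.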
Optional vs. Nullable
class MyModel1(BaseModel):
    none_allowed_required: str | None
    none_allowed_not_required: str | None = None
      
from typing_extensions import Required, NotRequired
class MyModel2(BaseModel):  # ... maybe, do you want this?
    none_allowed_required: Required[str]
    none_allowed_not_required: NotRequired[str]
New Features
(Implemented, but not with this nice syntactic sugar)
(I'm no longer scared of the word "optional")
Validation without a model
from pydantic import validate
validate(List[int], [1, 2, '3'])  #> [1, 2, 3]
validate(List[int], [1, 2, 3], strict=True)  #> [1, 2, 3]
validate(List[int], [1, 2, '3'], strict=True)
#> raises ValidationError
New Features
(Implemented, but not with this nice syntactic sugar)
No intermediate model required (unlike parse_obj in V1)
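For a feel of what validating against a bare type hint involves, here is a toy version using only the standard library. It is a sketch under heavy assumptions: this `validate` is a local stand-in for the proposed function, it supports only `int` and `List[...]`, and it raises a plain `ValueError` rather than a real `ValidationError`.

```python
from typing import List, get_args, get_origin


def validate(tp, value, strict=False):
    """Toy validator driven directly by a type hint (int and List[...] only)."""
    if tp is int:
        if type(value) is int:
            return value
        # Lax mode only: coerce numeric strings.
        if not strict and isinstance(value, str):
            try:
                return int(value)
            except ValueError:
                pass
        raise ValueError(f'{value!r} is not a valid int')
    if get_origin(tp) is list:
        if not isinstance(value, list):
            raise ValueError(f'{value!r} is not a list')
        (item_type,) = get_args(tp)
        # Recurse into the item type, passing the strict flag down.
        return [validate(item_type, item, strict) for item in value]
    raise NotImplementedError(f'unsupported type: {tp!r}')
```

No model class is created anywhere: the type hint itself is walked and each value is checked against the matching branch.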
Parsing JSON directly
class MyModel(BaseModel):
    name: str
    age: int
    friends: List[int]
    settings: Dict[str, float]
MyModel.validate_json('{...}')
New Features
(Implemented, but not with this nice syntactic sugar)
No json.loads - just rust JSON parsing straight into validation
- We could add support for other formats (e.g. yaml, toml); the only side effect would be bigger binaries
 - Not yet possible to get the line number :-(
 
Speed
New Features
| Benchmark | Speed up | 
|---|---|
| Simple model (str, int, List[int], Dict[str, float]) | 15.97x | 
| Simple model - JSON | 11.56x | 
| A bool (single value) | 3.46x | 
| Recursive model, 50 deep | 3.99x | 
| list of typed dicts, length 100 | 12.14x | 
| list of ints, length 1000 | 25.49x | 
And more...
New Features
- hopefully 🤞 less "performance guilt"™
 - context kwarg to validator functions
 - input value is available in errors
 - cleaning up the namespace, so you can use fields like "json", "dict" and "fields" and, more importantly, so we can add more methods in future; either:
   - all methods will have a prefix, e.g. ".model_dict()", ".model_json()", ".model_schema()"
   - or a namespace object: ".m.dict()", ".m.json()"
 
 
pydantic-core internals
from pydantic_core import SchemaValidator
schema_validator = SchemaValidator({'type': 'bool'})
print(repr(schema_validator))
Python extensions in Rust
(This code actually runs now!)
Let's start simple
SchemaValidator(name="bool", validator=BoolValidator)
print(schema_validator.validate_python(True))  #> True
print(schema_validator.validate_python(1))  #> True
print(schema_validator.validate_json('true'))  #> True
pydantic-core internals
from pydantic_core import SchemaValidator
# Equivalent to: Dict[str, Optional[int]]
schema_validator = SchemaValidator({
    'type': 'dict',
    'keys': {'type': 'str'},
    'values': {'type': 'optional', 'schema': {'type': 'int'}}
})
Python extensions in Rust
Let's get a bit more complicated
SchemaValidator(name="dict", validator=DictValidator {
    strict: false,
    key_validator: Some(StrValidator),
    value_validator: Some(
        OptionalValidator {validator: IntValidator},
    ),
    min_items: None,
    max_items: None,
    try_instance_as_dict: false,
})
pydantic-core internals
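The nested repr above suggests the core design: a schema dict is compiled once into a tree of validator objects, and validation just walks that tree. Here is a pure-Python sketch of that idea. It mirrors the validator names from the repr, but the classes, the `build_validator` function and the (strict-only) rules are assumptions for illustration, not pydantic-core's real code.

```python
class IntValidator:
    def validate(self, value):
        if type(value) is int:
            return value
        raise ValueError(f'{value!r} is not an int')


class StrValidator:
    def validate(self, value):
        if isinstance(value, str):
            return value
        raise ValueError(f'{value!r} is not a str')


class OptionalValidator:
    def __init__(self, validator):
        self.validator = validator

    def validate(self, value):
        # None short-circuits; anything else is delegated to the inner validator.
        return None if value is None else self.validator.validate(value)


class DictValidator:
    def __init__(self, key_validator, value_validator):
        self.key_validator = key_validator
        self.value_validator = value_validator

    def validate(self, value):
        if not isinstance(value, dict):
            raise ValueError(f'{value!r} is not a dict')
        return {
            self.key_validator.validate(k): self.value_validator.validate(v)
            for k, v in value.items()
        }


def build_validator(schema):
    """Recursively turn a schema dict into a tree of validator objects."""
    type_ = schema['type']
    if type_ == 'int':
        return IntValidator()
    if type_ == 'str':
        return StrValidator()
    if type_ == 'optional':
        return OptionalValidator(build_validator(schema['schema']))
    if type_ == 'dict':
        return DictValidator(
            build_validator(schema['keys']),
            build_validator(schema['values']),
        )
    raise ValueError(f'unknown schema type: {type_!r}')
```

All the dispatching on `'type'` happens once at build time, so validating a value afterwards only calls methods on objects that already exist.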
class MyModel(BaseModel):
    name: str
    age: int | None = 42
    settings: dict[str, float]
    friends: list[int | str]
Python extensions in Rust
And finally...
(you don't need to read all this)
SchemaValidator(name="MyCoreModel", 
  validator=ModelClassValidator {
    strict: false,
    class: Py(0x12fe7e7c0), (MyCoreModel)
    new_method: Py(0x00101054130), (MyCoreModel.__new__)
    validator: ModelValidator {
        name: "Model",
        fields: [
            ModelField {
                name: "name",
                default: None,
                validator: StrValidator,
            },
            ModelField {
                name: "age",
                default: 42,
                validator: OptionalValidator { 
                  validator: IntValidator 
                },
            },
            ModelField {
                name: "settings",
                default: None,
                validator: DictValidator { ... },
            },
            ModelField {
                name: "friends",
                default: None,
                validator: ListValidator {
                    strict: false,
                    item_validator: Some(
                        UnionValidator {
                            choices: [
                                IntValidator,
                                StrValidator,
                            ],
                        },
                    ),
                    min_items: None,
                    max_items: None,
                },
            },
        ],
        extra_behavior: Ignore,
        extra_validator: None,
    },
})
Many other people have said all this, there are many (much better) talks about it.
But for completeness, the good:
- Speed is the biggest win
 - Provides a way to hook into existing great libraries written in rust - rtoml and watchfiles - when is someone going to do this for ASGI?
 - Rust's error handling makes it easy to catch and deal with errors
 - Rust provides excellent primitives for threading
 - pyo3 is amazing, getting started is very easy
 
The bad:
- Writing rust will always be slower than writing python
 - there's a big learning curve
 
The ugly:
- Fighting the borrow checker is boring
 - There will be boilerplate, macros to avoid boilerplate can be even worse...
 
The obvious things
Writing Python extensions in Rust
- "Performance guilt":
  - With rust there's no penalty for recursion
  - and no penalty for small functions
  - so ... you can build more modular code without paying a performance penalty
  - The theory is that someone can come along in 5 years' time and add another type to pydantic-core, and:
    - The type checker and linter will stop them doing dumb things
    - There will be zero runtime penalty if you don't use it
    - Their change can be small since there's no performance penalty for calling out to existing code
- Contributions ... ?
  - Will there be fewer, will they be "better"? or worse?
 
The less obvious things
Writing Python extensions in Rust
#[derive(Debug, Clone)]
pub struct OptionalValidator {
    validator: Box<dyn Validator>,
}
impl OptionalValidator {
    pub const EXPECTED_TYPE: &'static str = "optional";
}
impl Validator for OptionalValidator {
    fn build(schema: &PyDict, config: Option<&PyDict>) -> PyResult<Box<dyn Validator>> {
        let schema: &PyAny = schema.get_as_req("schema")?;
        Ok(Box::new(Self {
            validator: build_validator(schema, config)?.0,
        }))
    }
    fn validate<'s, 'data>(
        &'s self,
        py: Python<'data>,
        input: &'data dyn Input,
        extra: &Extra,
    ) -> ValResult<'data, PyObject> {
        match input.is_none() {
            true => Ok(py.None()),
            false => self.validator.validate(py, input, extra),
        }
    }
    fn validate_strict<'s, 'data>(
        &'s self,
        py: Python<'data>,
        input: &'data dyn Input,
        extra: &Extra,
    ) -> ValResult<'data, PyObject> {
        match input.is_none() {
            true => Ok(py.None()),
            false => self.validator.validate_strict(py, input, extra),
        }
    }
    fn set_ref(&mut self, name: &str, validator_arc: &ValidatorArc) -> PyResult<()> {
        self.validator.set_ref(name, validator_arc)
    }
    validator_boilerplate!(Self::EXPECTED_TYPE);
}
Writing Python extensions in Rust, example... (please don't cry)
mod optional;
...
lots of code...
...
    validator_match!(
        type_,
        dict,
        config,
        ... all the other validators
        // unions
        self::union::UnionValidator,
        self::optional::OptionalValidator,
        ...
    )
Writing Python extensions in Rust, example...
using my validator
Questions?
Check out: github.com/samuelcolvin/pydantic-core
And: github.com/samuelcolvin/pydantic
Follow me on twitter: @samuel_colvin
Pydantic V2
By Samuel Colvin