Pydantic is a Python library for data validation and settings management using Python type hints. In this post, we will look at some Pydantic tips and tricks.
Recently, I got introduced to Pydantic. I was using FastAPI heavily and absolutely love how it pushes you to use Pydantic for data serialization and validation. Life before Pydantic was mostly Flask and Django. Both are great frameworks, but it took something like FastAPI to show where they fall short. Enough about FastAPI. This post is not about that.
Let's go back to Pydantic.
What is Pydantic?
It is a Python library. You do pip install pydantic and come into some powerful stuff. It is primarily used to validate data coming into your application and serialize data going out of your application.
To add custom validation to a field, you can use the validator decorator. (This is the Pydantic 1.x name; in Pydantic 2.x the equivalent decorator is field_validator.)
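import re

from pydantic import BaseModel, validator


class BookModel(BaseModel):
    ...

    @validator("isbn")
    def isbn_must_be_valid(cls, v):
        # 10 or 13 digits in total, with only digits and hyphens allowed
        regex = r"^(?=(?:\D*\d){10}(?:(?:\D*\d){3})?$)[\d-]+$"
        if not re.search(regex, v):
            raise ValueError("Invalid ISBN")
        return v  # a validator must return the value it accepts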
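To try the ISBN check on its own, here is a minimal sketch. The one-field model IsbnOnlyBook and the helper is_accepted are made up for illustration, and the calls were picked so that they should exist in both Pydantic 1.x and 2.x (2.x emits a deprecation warning for validator).

```python
import re

from pydantic import BaseModel, validator

ISBN_PATTERN = re.compile(r"^(?=(?:\D*\d){10}(?:(?:\D*\d){3})?$)[\d-]+$")


class IsbnOnlyBook(BaseModel):
    # Hypothetical one-field model, just enough to exercise the validator
    isbn: str

    @validator("isbn")
    def isbn_must_be_valid(cls, v):
        if not ISBN_PATTERN.search(v):
            raise ValueError("Invalid ISBN")
        return v  # hand the accepted value back to pydantic


def is_accepted(isbn: str) -> bool:
    """Return True if the model accepts this ISBN, False otherwise."""
    try:
        IsbnOnlyBook(isbn=isbn)
    except ValueError:  # pydantic's ValidationError subclasses ValueError
        return False
    return True


print(is_accepted("0-306-40615-2"))
print(is_accepted("abcdef"))
```

Note that the pattern only counts digits and hyphens. It does not verify the ISBN check digit.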
Validate all fields
There will be cases where you want to validate one field based on another field. You can use the root_validator decorator, which receives all the fields at once.
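from pydantic import BaseModel, root_validator


class Book(BaseModel):
    ...

    @root_validator
    def check_published_date(cls, values):
        published = values.get("published_date")
        publisher = values.get("publisher")  # read with .get(), so treated as a dict here
        # A value is missing when its own field validation has already failed
        if published is not None and publisher is not None:
            if published < publisher.get("started_date"):
                raise ValueError("Book cannot be published before publisher started")
        return values  # a root validator must return the values dict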
So how does it work?
After adding all the above validators, run the following code. You will encounter a series of validation errors. Now imagine this class hooked to an HTTP request body. You no longer have to handle independent validations.
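Here is a rough sketch of what such a run could look like. The field list, the Publisher model, and the sample dates are assumptions made up for illustration, not the original model. skip_on_failure=True is used because it is accepted by both Pydantic 1.x and 2.x; it also means the cross-field check only runs once every individual field is valid.

```python
import datetime
import re

from pydantic import BaseModel, ValidationError, root_validator, validator


class Publisher(BaseModel):
    name: str
    started_date: datetime.date  # hypothetical field, used by the cross-field check


class Book(BaseModel):
    name: str
    price: float
    isbn: str
    published_date: datetime.date
    publisher: Publisher

    @validator("isbn")
    def isbn_must_be_valid(cls, v):
        if not re.search(r"^(?=(?:\D*\d){10}(?:(?:\D*\d){3})?$)[\d-]+$", v):
            raise ValueError("Invalid ISBN")
        return v

    @root_validator(skip_on_failure=True)
    def check_published_date(cls, values):
        if values["published_date"] < values["publisher"].started_date:
            raise ValueError("Book cannot be published before publisher started")
        return values


def collect_errors(**data):
    """Return pydantic's list of error dicts, or an empty list if the data is valid."""
    try:
        Book(**data)
    except ValidationError as exc:
        return exc.errors()
    return []


# Illustrative values only
publisher = {"name": "HarperCollins", "started_date": "1817-01-01"}

# Two bad fields are reported together in a single ValidationError
field_errors = collect_errors(
    name="The Alchemist", price="abcd", isbn="abcdef",
    published_date="1799-05-01", publisher=publisher,
)
failed_fields = sorted(e["loc"][0] for e in field_errors)

# With valid fields, the cross-field rule is what fails
date_errors = collect_errors(
    name="The Alchemist", price=9.99, isbn="0-306-40615-2",
    published_date="1799-05-01", publisher=publisher,
)

for error in field_errors + date_errors:
    print(error["loc"], error["msg"])
```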
We can add a Config class to a model to tweak its behavior. For example, setting extra to forbid rejects any field that is not declared on the model. Setting allow_population_by_field_name to True lets the model be populated by field name as well as by alias. We can also set fields to a dictionary that maps field names to their configuration, for example to give a field an alias.
from pydantic import BaseModel


class BookModel(BaseModel):
    ...

    class Config:
        extra = "forbid"
        allow_population_by_field_name = True
        fields = {
            "name": {"alias": "book_name"},
            "published_date": {"alias": "published"},
        }
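A small sketch of how extra = "forbid" and an alias behave together. AliasedBook and the genre argument are hypothetical, and the alias is declared with Field(alias=...) instead of Config.fields because, as far as I know, that form is accepted by both Pydantic 1.x and 2.x.

```python
from pydantic import BaseModel, Field


class AliasedBook(BaseModel):
    # Hypothetical model; "book_name" is the alias used in the snippet above
    name: str = Field(alias="book_name")

    class Config:
        extra = "forbid"


# Input uses the alias, attribute access uses the field name
book = AliasedBook(book_name="The Alchemist")


def rejects_extra_field() -> bool:
    """Return True if an undeclared field is rejected."""
    try:
        AliasedBook(book_name="The Alchemist", genre="fiction")
    except ValueError:  # pydantic's ValidationError subclasses ValueError
        return True
    return False


print(book.name)
print(book.dict(by_alias=True))
print(rejects_extra_field())
```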
Setting common alias
from pydantic import BaseModel


def to_camel(string: str) -> str:
    # Capitalizes every word: "published_date" becomes "PublishedDate"
    return "".join(word.capitalize() for word in string.split("_"))


class BookModel(BaseModel):
    ...

    class Config:
        extra = "forbid"
        allow_population_by_field_name = True
        alias_generator = to_camel
There are a lot more configs which you can explore in the docs.
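It helps to check what the alias generator produces before wiring it into a model. The to_camel helper capitalizes every word, so the aliases come out in PascalCase. The to_lower_camel variant below is a hypothetical alternative, in case the aliases should start with a lowercase letter.

```python
def to_camel(string: str) -> str:
    # Same helper as above: every word is capitalized
    return "".join(word.capitalize() for word in string.split("_"))


def to_lower_camel(string: str) -> str:
    # Hypothetical variant: the first word stays as it is
    first, *rest = string.split("_")
    return first + "".join(word.capitalize() for word in rest)


for field_name in ("name", "published_date"):
    print(field_name, to_camel(field_name), to_lower_camel(field_name))
```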
Inheritance woohoo!
Inheritance with pydantic becomes even more powerful since we are also inheriting the config from the parent class.
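As a minimal sketch of that idea, the hypothetical base class StrictModel below carries the config, and the child model picks it up without declaring a Config of its own. The class and field names are assumptions for illustration.

```python
from pydantic import BaseModel


class StrictModel(BaseModel):
    # Hypothetical shared base: every subclass rejects undeclared fields
    class Config:
        extra = "forbid"


class Publisher(StrictModel):
    name: str
    location: str


publisher = Publisher(name="HarperCollins", location="New York")


def child_rejects_extra() -> bool:
    """Return True if the inherited config rejects an undeclared field."""
    try:
        Publisher(name="HarperCollins", location="New York", country="US")
    except ValueError:  # pydantic's ValidationError subclasses ValueError
        return True
    return False


print(child_rejects_extra())
```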
book_model.dict()  # all fields
book_model.dict(exclude={"isbn"})  # exclude isbn
book_model.dict(exclude={"isbn"}, by_alias=True)  # use aliases as keys
book_model.dict(include={"name": True, "price": True, "publisher": {"name"}})  # only these fields
book_model.dict(exclude_unset=True)  # omit fields that were never set explicitly
book_model.dict(exclude_none=True)  # omit fields whose value is None
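The difference between exclude_unset and exclude_none is easy to mix up, so here is a small sketch. The Note model is hypothetical. One optional field is set to None explicitly and the other is left at its default.

```python
from typing import Optional

from pydantic import BaseModel


class Note(BaseModel):
    # Hypothetical model with two optional fields
    title: str
    subtitle: Optional[str] = None
    summary: Optional[str] = None


# subtitle is passed explicitly (as None); summary is never set
note = Note(title="Pydantic tips", subtitle=None)

only_set = note.dict(exclude_unset=True)   # keeps subtitle, drops summary
no_nones = note.dict(exclude_none=True)    # drops both None values

print(only_set)
print(no_nones)
```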
Serialize to JSON
book_model.json()  # all fields
book_model.json(exclude={"isbn"})  # exclude isbn
Now let's see how we can use this dataclass to validate data.
book = Book(
    name="The Alchemist",
    publisher=Publisher(name="HarperCollins", location="New York"),
    price="abcd",
    isbn="abcdef",
    published_date="1799-05-01",
)
This gives an error: TypeError: __init__() got an unexpected keyword argument 'publisher'. Let's fix that and retry.
book = Book(name="The Alchemist", price="abcd", isbn="abcdef", published_date="1799-05-01")
That passed. But notice how we didn't get any error for the price field. That's because dataclasses don't validate data.
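A tiny standalone sketch of the same behavior, using a hypothetical two-field PlainBook so it runs on its own. The float annotation is only a hint, and the string is stored untouched.

```python
from dataclasses import dataclass


@dataclass
class PlainBook:
    # Hypothetical cut-down dataclass; annotations are not enforced at runtime
    name: str
    price: float


book = PlainBook(name="The Alchemist", price="abcd")
print(type(book.price).__name__)
```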
Let's see how we can add validation to dataclasses.
import datetime
import re
from dataclasses import dataclass, fields


@dataclass
class Book:
    ...

    def __post_init__(self):
        for f in fields(self):
            # Field objects carry no value; read it from the instance
            value = getattr(self, f.name)
            if f.name == "price":
                if not isinstance(value, float):
                    raise ValueError("Price must be a float")
            elif f.name == "isbn":
                regex = r"^(?=(?:\D*\d){10}(?:(?:\D*\d){3})?$)[\d-]+$"
                if not re.search(regex, value):
                    raise ValueError("Invalid ISBN")
            elif f.name == "published_date":
                if value < datetime.date(1800, 1, 1):
                    raise ValueError("Published date must not be before 1800")
That's a lot of hoops to jump through, and the code doesn't look clean. We cannot blame dataclasses completely for this. They were not designed to validate data; they were designed to create classes with less boilerplate. Staying that generic means giving up powerful features like the ones pydantic offers.
Want validation, but with dataclasses?
Pydantic has you covered there: use from pydantic.dataclasses import dataclass, and you can use it just like you would use the standard dataclass.
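A minimal sketch of the drop-in replacement, with a hypothetical two-field PricedBook. Only the import changes compared to a standard dataclass, and the price is now validated and coerced on construction.

```python
from pydantic.dataclasses import dataclass


@dataclass
class PricedBook:
    # Hypothetical cut-down dataclass; pydantic validates the annotations
    name: str
    price: float


# A numeric string is coerced to a float
book = PricedBook(name="The Alchemist", price="9.99")


def rejects_bad_price() -> bool:
    """Return True if a non-numeric price is rejected."""
    try:
        PricedBook(name="The Alchemist", price="abcd")
    except ValueError:  # pydantic's ValidationError subclasses ValueError
        return True
    return False


print(book.price)
print(rejects_bad_price())
```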
Attrs
Attrs is another library that is similar to dataclasses, and it is closer to pydantic than dataclasses are. Let's see how we can use attrs to validate data. Its authors have a pretty good argument on why you should use attrs over dataclasses. You can read it here.
import datetime

from attrs import Factory, define, field


@define
class Publisher:
    name: str
    location: str


@define
class Book:
    name: str
    price: float
    isbn: str
    published_date: datetime.date = field()
    # Factory takes a zero-argument callable, so the default publisher is built in a lambda
    publisher: Publisher = Factory(
        lambda: Publisher(name="HarperCollins", location="New York")
    )

    # published date should be >= 1800; attrs validators signal failure by raising
    @published_date.validator
    def _check_published_date(self, attribute, value):
        if value < datetime.date(1800, 1, 1):
            raise ValueError("Published date must not be before 1800")
book = Book(name="The Alchemist", price="abcd", isbn="abcdef", published_date="1799-05-01")