While I agree that there are ways to write a faster validation library in python, there are also benefits to moving the logic to native code.
msgspec[1] is another parsing/validation library, written in C. It's on average 50-80x faster than pydantic for parsing and validating JSON [2]. This speedup is only possible because we make use of native code, letting us parse JSON directly and efficiently into the proper python types, removing any unnecessary allocations.
It's my understanding that pydantic V2 currently doesn't do this (they still have some unnecessary intermediate allocations during parsing), but having the validation logic already in compiled code makes integrating this with the parser theoretically possible later on. With the logic in python this efficiency gain wouldn't be possible.
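To make the intermediate-allocation point concrete: a pure-Python validator typically decodes JSON into generic dicts and lists first, then walks that structure to check types and build the target objects, while a C parser like msgspec's can decode straight into the target types in one pass. A minimal stdlib-only sketch of the two-pass pattern being described (the `User` type and `validate_user` helper here are hypothetical illustrations, not anyone's actual API):

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int


def validate_user(data: bytes) -> User:
    # Pass 1: parse JSON into a generic dict -- this is the
    # intermediate allocation a single-pass C parser avoids.
    obj = json.loads(data)
    # Pass 2: walk the dict, check types, and build the final object,
    # allocating a second time.
    if not isinstance(obj.get("name"), str) or not isinstance(obj.get("age"), int):
        raise TypeError("invalid User payload")
    return User(name=obj["name"], age=obj["age"])


user = validate_user(b'{"name": "alice", "age": 30}')
```

The generic dict and the strings/ints it holds are built only to be inspected and discarded, which is the overhead being described.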
Definitely true. I've just soured on the view that native code is the first thing one should reach for. I was surprised that it took only a few days of optimization to make my validation library significantly faster than pydantic, when pydantic was already largely compiled via Cython.
If you're interested in both efficiency and maintainability, I think you need to start by optimizing in the original language. It seems to me that with pydantic, the choice has consistently been to jump to compilation (Cython, now Rust) without much attempt at optimizing within Python.
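One common flavor of the pure-Python speedups being alluded to is doing schema work once, up front, rather than re-interpreting the schema on every call. A hypothetical sketch (not pydantic's actual internals): build a specialized validator closure per schema so the hot path is just a tight loop over precomputed pairs.

```python
# Hypothetical pure-Python optimization: precompile a per-schema validator
# instead of walking the schema definition on every validation call.
def make_validator(fields):
    # fields: mapping of field name -> expected type, e.g. {"age": int}.
    # Precompute the (name, type) pairs once, outside the hot path.
    pairs = list(fields.items())

    def validate(obj):
        for name, typ in pairs:
            if not isinstance(obj.get(name), typ):
                raise TypeError(f"field {name!r} is not {typ.__name__}")
        return obj

    return validate


validate_user = make_validator({"name": str, "age": int})
validate_user({"name": "alice", "age": 30})
```

Libraries that take this further often generate the validator's source with `exec` to eliminate the loop entirely; the point is that such wins stay in Python and remain debuggable.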
I'm not super-familiar with how things are being done on an issue-by-issue / line-by-line basis, but I see this Rust effort taking something like a year or more, when my intuition is that some simpler speedups in Python could have been done in a matter of days or weeks (which is not to say they would yield the same magnitude of performance gains).
Two things may preclude optimization in pure Python when producing a library for the general public: keeping a nice, ergonomic interface is one; maintaining backwards compatibility is another.
[1]: https://github.com/jcrist/msgspec
[2]: https://jcristharif.com/msgspec/benchmarks.html#benchmark-sc...