With PyO3, I built a library that parses datetimes 10x faster than `datetime.strptime`, in just a few lines of code: https://github.com/gukoff/dtparse
It just calls Rust's chrono library to do the parsing and wraps the result in a Python object. You can do this for any Rust library; it's very, very easy!
Feel free to use this repo as a reference if you want to build a similar thing. The code is commented, and there's a working GitHub action that builds the wheels for all platforms and uploads them to PyPI: https://github.com/gukoff/dtparse/tree/master/.github/workfl...
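A quick sketch of the idea (the `parse` name here is illustrative; see the README for the exact API):

    from datetime import datetime
    import dtparse  # the Rust-backed parser from the repo above

    s, fmt = "2021-02-15 18:30:00", "%Y-%m-%d %H:%M:%S"
    # Exact function name may differ from the published API; see the README.
    assert dtparse.parse(s, fmt) == datetime.strptime(s, fmt)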
I was surprised to find out how slow strptime() can be. I was working on a data-focused project that was finally starting to slow down from the growing volume of data. I was looking at river heights over time, and once I hit about 140,000 data points the project got slow enough to make some profiling and optimization worthwhile. I was quite surprised to find it was spending more than two full seconds just running strptime(), out of a total execution time of around 15 seconds.
I ended up looking at a bunch of different ways of processing timestamps in Python: strptime(), string parsing, regex, datetime.fromisoformat(), NumPy, Pandas, and more. I got a 46x speedup using datetime.fromisoformat(). Other approaches got anywhere from 4x to 40x speedups, and a couple of approaches were an order of magnitude slower than strptime().
My takeaway was that there's no substitute for profiling the actual code you're running and focusing on the specific bottlenecks in your own project. I wrote this up in a blog post if anyone's interested, "What's faster than strptime()?"
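For a flavor of the measurement, a minimal timing sketch (absolute numbers will vary by machine):

    from datetime import datetime
    import timeit

    ts = "2021-02-15 18:30:00"  # fromisoformat() accepts this form

    t_strptime = timeit.timeit(
        lambda: datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), number=100_000)
    t_fromiso = timeit.timeit(
        lambda: datetime.fromisoformat(ts), number=100_000)

    print(f"strptime:      {t_strptime:.2f}s")
    print(f"fromisoformat: {t_fromiso:.2f}s")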
I'm very curious to hear the use case for which datetime parsing was the bottleneck! Also, I'm surprised that the overhead of calling across the language boundary didn't dwarf the gains from parsing...
One of the components in our project was churning through thousands of JSONs per second - deserializing, transforming and serializing them.
These JSONs represented flight information. They included multiple datetimes, such as the scheduled and actual departure/arrival times of a flight.
The first bottleneck was JSON deserialization/serialization. At the time we solved it with ujson, and now there's the even more performant orjson.
The second bottleneck happened to be datetime deserialization. We solved it with ciso8601 - luckily, these datetimes were in ISO 8601. But this bottleneck later repeatedly occurred in other components and became the inspiration for dtparse :)
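For reference, ciso8601 is essentially a drop-in one-liner for this case:

    import ciso8601  # pip install ciso8601

    # Parses ISO 8601 timestamps natively, much faster than strptime().
    dt = ciso8601.parse_datetime("2021-02-15T18:30:00+00:00")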
I've had this situation a few times. Most recently, transforming large (1-50 GB) CSV files into a format that can be digested by a proprietary bulk DB loader.
Because our problem was just about reformatting, we ended up reading the CSVs in binary mode and using struct to extract the relevant values from the datetime fields. But if we needed to do actual date logic, something like this would perhaps be useful (though there are other fast datetime libraries out there; I've been a fan of pendulum for some tasks).
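A rough sketch of that struct trick, assuming each row starts with a fixed-width "YYYY-MM-DD HH:MM:SS" timestamp (our real layout differed):

    import struct

    # "4s" = 4-byte year, "1x" = skip a separator byte, "2s" = 2-byte field.
    TS = struct.Struct("4s1x2s1x2s1x2s1x2s1x2s")  # 19 bytes total

    def extract_ts(row: bytes):
        y, mo, d, h, mi, s = TS.unpack(row[:19])
        return int(y), int(mo), int(d), int(h), int(mi), int(s)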
That makes sense, but I have a hard time believing that calling into a datetime parser O(n) times is going to yield a significant performance gain no matter how much faster the parser is. However, I'm being downvoted, so perhaps I'm mistaken?
Sometimes it's about optimizing wall time not algorithmic complexity.
If you have a batch SLA of 1 hour, you're currently spending 50-70 minutes to complete the batch, and 20 minutes of that time is spent parsing dates, then cutting that to 5 minutes is a big win.
No doubt, but if your date parsing saves you 1 second per date parsed but each call into the faster library costs 2 seconds, then your performance actually suffers. The only way around this is to make a batch call such that the overhead is O(1).
I’m not going to install it to check, but when someone writes “Fast datetime parser for Python written in Rust. Parses 10x-15x faster than datetime.strptime.” it seems reasonable to assume that this is not the case.
In a language like Java where you mostly spend time in the VM and only occasionally jump into native code, that might be true. But in python a huge part of the runtime is this kind of native call. So I would not expect that this approach adds any new overhead.
Your conclusion might be right, but your reasoning is certainly wrong. Calling native functions in Python is often quite expensive because you need to marshal between PyObjects and the native types (probably allocating memory as well). This doesn’t “feel” so slow in Python because, well, everything in Python is slow. But you really start to notice it when you’re optimizing.
Of course "It depends", but in my experience that kind of thing is rare. Either you're passing in str and can just grab the char* out of the existing PyObject, or you have some more complicated thing that was wrapped once in a PyObject and doesn't need to be converted, etc. But sure, if you have some dict with a lot of data and need to convert it into an std::map you'll have a bad time.
My instinct is that the overhead is small. You need to add a few C stack frames and do some string conversion on each call, maybe an allocation to store the result. It's not going to be as quick as doing it in pure Rust, but the Python-to-native code layer can be pretty lightweight, I think!
Right, and that makes sense, but the context here is a date parsing library for Python--unless said library has a batch interface, I'm not sure how that would improve performance, but maybe I'm misestimating something.
I've certainly never been bottlenecked on date parsing :) However, many/most of the high-performance Python libraries are built in C code and compiled down into something the Python interpreter can use directly. There are lots of Python bindings written in C++ to native C libraries as well; I know I have used ZeroMQ pretty recently. Rust is done the same way: the code is compiled down into objects that Python can use directly. It's not like running a JavaScript interpreter in your code.
I have seen it in many cases, especially working on financial data. My most recent example was working with real-time feeds of trades, which we used ML models on top of. Inference was based on accumulated volume per fixed amount of time (say 30 sec, 1 min), and the code doing this in real time was Python.
I don't remember the numbers, but caching + using ciso8601 was essential to manage the peak load (maybe 50k trades per sec?).
I was looking at PyO3 a few months ago, after discovering the orjson Python library (with Rust inside) and radically speeding up an auto-ML app for work.
I really enjoyed starting to learn Rust, but found the process to embed in Python to be rather intimidating. Looking forward to using your repo as a reference, and love the dtparse work you've done.
Another cheap trick, if the time column is sequential, is to split the string into date and time components, cache the date part, and compute the time part with just some multiplication.
The major caveat is timezone handling, but that only applies in a subset of situations.
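A minimal sketch of the idea, assuming naive "YYYY-MM-DD HH:MM:SS" strings and deliberately ignoring timezones, per the caveat above:

    from datetime import datetime

    _midnights = {}  # date string -> epoch seconds at midnight

    def fast_parse(ts):  # ts like "2021-02-15 18:30:00"
        date_part, time_part = ts[:10], ts[11:]
        midnight = _midnights.get(date_part)
        if midnight is None:
            midnight = _midnights[date_part] = \
                datetime.fromisoformat(date_part).timestamp()
        h, m, s = time_part.split(":")
        return midnight + int(h) * 3600 + int(m) * 60 + float(s)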
If you've got to the point of modifying the storage format, then you might as well just use an integer (microseconds since the epoch) and be done with it. That seems cleaner than using a string (or two strings) anyway.
I've been playing with PyO3 for prototyping, and wrapped some Rust code to see if it's faster than Python. The experience was very much like using Boost.Python (which these days has an alternative in https://github.com/pybind/pybind11). It's _really_ easy to wrap code for Python, and it has nice APIs to ensure the GIL is held. Being Rust, I'm much more confident I won't suffer from the memory unsafety issues my C++ at the time did.
Now I'm starting to use it as part of the Python memory profiler I'm working on (https://pythonspeed.com/fil), in this case to call into the low-level Python C API, which PyO3 includes bindings for in addition to its high-level API. This kind of usage is more like writing C, except with the benefit of having high-level APIs (for GIL holding, but also object conversion) available when I need them.
So basically you get safe, high-level, easy-to-use APIs, with fallback to low-level unsafe APIs if you need them.
There's definitely a conversion cost. For strings, Python apparently caches the UTF-8 encoded string, so if you _repeatedly_ transfer it to Rust I suspect (but haven't checked) that the cost is much lower.
In general I suspect it's the usual "NumPy arrays are fast, everything else you better be getting a sufficiently large boost from the low-level code to justify conversion".
For the thing I prototyped in Rust, it was wrapping the `ahocorasick` crate, which was in fact faster than `pyahocorasick` (written in C or Cython or something). Both have similar conversion costs, probably, so it came down to "for lots of data, the Rust version was faster".
Nice! Reach out if there are any problems or if you need something exposed in the API. Looking at the pyahocorasick issue tracker, there are a number of features/bugs that your wrapper package would resolve. :)
NumPy also supports conversions without copying. One thing I haven't found a good way to bridge with Python is pandas.DataFrame; it seems to be a quite Python-focused object, and iterating through a DataFrame is particularly slow.
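The usual workaround is to hand the native code a contiguous NumPy array rather than the DataFrame itself, e.g.:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
    # One copy here; the resulting ndarray can then cross the
    # native boundary without further copying.
    arr = np.ascontiguousarray(df.to_numpy())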
Been using Maturin for a little while professionally, and it's surprisingly good. There are a few bugbears here and there - I haven't found a way to have `cargo test` and a PyO3 library working at the same time - but overall it's a lot more pleasant than working with Rust and R was.
Between pyodide, PyO3, rust-cpython, and RustPython, I think PyO3 is the best way to drop Rust into a Python project for a speedup, if that is your goal. Some of the demos show using Python from Rust, but to me the biggest feature is without a doubt compiling Rust code to native Python modules. I'm using it to speed up image manipulation backed by NumPy arrays.
There's a setuptools-rust [0] extension package that can be used to hook the Rust compilation into wheel building or install-from-source. Maturin [1] seems to be regarded as the new and improved solution for this, but I found that it's angled toward using Python from Rust.
There's also the rust-numpy [2] package by the same org, which is fantastic in that it lets you pass a NumPy matrix to a native method written in Rust, convert it to the equivalent Rust data structure, perform whatever transformation you want (in parallel using rayon [3]), and return the array. When building for release, I was seeing speedups of 100x over NumPy on the most matrix-mathable function imaginable, and NumPy is no joke.
I think there is a lot of potential for these two ecosystems together. If there’s not a python package for something, there’s probably a rust crate.
If anyone is interested, the Python package I'm building with a Rust backend is called pyrogis [4], for making custom image manipulations through NumPy arrays.
> When building for release, I was seeing speedups of 100x over NumPy on the most matrix-mathable function imaginable, and NumPy is no joke.
What sort of algorithm was that? Generally, getting a 100x speedup on vectorized code is highly unusual, even using hand-coded C++. So I suspect it was quite loop-heavy? In those cases I have also seen very significant speedups.
I have been using pythran [1] for speeding up my Python code. It generally achieves extremely good performance. I have blogged about it here [2], and recently a community member used pythran to speed up some n-body benchmarks [3], which were used in an article to argue for using compiled languages.
A matrix of shape (rows, columns, 3). Average the last dim for each point and change it to [0,0,0] if the average is less than a value, [255,255,255] if greater. A brightness threshold. I may be remembering the speedup factor wrong, so take it with a grain of salt - the fact of the matter is it was very impressive.
I'm checking out that post later. I'm trying to make my package easy to build on, so being able to write extensions with Pythran would be another great option for speedups. Thanks!
Just for the fun of it, I tested what speedup I could get with a naive algorithm and pythran. Based on your description, it looks like I should do the following:
    import numpy as np

    def threshold_pixel(img, thr):
        out = np.zeros_like(img)
        o = np.mean(img, axis=-1)
        out[o > thr] = 255
        return out
This runs in ~30 ms for a (1024, 1024, 3) array using NumPy on my machine. Using pythran (note I had to explicitly write out the loop for `out[o > thr] = 255`, due to a bug that I found and just reported), I get ~6 ms with OpenMP and ~9 ms without (I did not tune the OpenMP settings; doing so should yield a much higher speedup).
P.S.: Just had a look at your project, very cool, I have to try that
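For reference, the written-out loop looks roughly like this (the export comment is pythran's annotation syntax; the uint8 signature is an assumption about the image dtype):

    # pythran export threshold_pixel_loop(uint8[:,:,:], float)
    import numpy as np

    def threshold_pixel_loop(img, thr):
        out = np.zeros_like(img)
        for i in range(img.shape[0]):
            for j in range(img.shape[1]):
                # Average the three channels for this pixel.
                avg = (int(img[i, j, 0]) + int(img[i, j, 1])
                       + int(img[i, j, 2])) / 3.0
                if avg > thr:
                    out[i, j, 0] = 255
                    out[i, j, 1] = 255
                    out[i, j, 2] = 255
        return out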
I needed Blender integration a while back and wasn't sure what I could write it in. PyO3 worked great with Blender with no configuration. I was quite concerned that something about the Python-embedded-Blender behavior would limit PyO3... but nope, so far it's worked flawlessly.
At work, I'm using PyO3 for a project that churns through a lot of data (step 1) and does some pattern mining (step 2). This is the second generation of the project and is on-demand, compared with the large batch Spark project it is replacing. The Rust+Python project has really good performance, and using Rust for the core logic is such a joy compared with the Scala or Python that a lot of the other pieces are written in.
Learning PyO3, I cobbled together a sample project[0] to demonstrate how some functionality works. It's a little outdated (uses PyO3 0.11.0 compared with the current 0.13.1) and doesn't show everything, but I think it's reasonably clear.
One thing I noticed is that passing very large data from Rust into Python's memory space is a bit of a challenge. I haven't quite grokked who owns what when, and how memory gets correctly dropped, but I think the issues I've had are with the amount of RAM used at any moment and not with any memory leaks.
Huggingface Tokenizers (https://github.com/huggingface/tokenizers), which are now used by default in their Transformers Python library, use PyO3 and became popular on the pitch that they encode text an order of magnitude faster with zero config changes.
It lives up to that claim. (I had issues with return-object typing when going between Python/Rust at first, but those are more consistent now.)
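Opting in from Python is just a flag (and the default in recent versions):

    from transformers import AutoTokenizer

    # use_fast=True selects the Rust-backed tokenizer from the tokenizers library.
    tok = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)
    ids = tok("PyO3 under the hood.")["input_ids"]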
I'm interested in running Python inside wasmtime. I think PyO3 would be useful. We could build a small Rust wasm binary that exports an "execute_python_script" function. This would finally be a way to run Python in a strong sandbox with memory [0] and CPU [1] restrictions. (In 1999, I asked Guido for sandboxing support in Python, but he refused.)
That's a really great name you came up with! Embodies both parts of your focus, stays pronounceable. Does the 3 relate to the Python version or are you mimicking some specific molecule that I can't think of?
Oh, I know, I wasn't trying to correct you or anything. I was just adding on to the correct answer to point out that PyO3's naming scheme is part of a popular trend in Rust libraries.
I have! Used FastAPI as a frontend to do some minor data modification, and passed the data for model inference in Rust.
Works really nicely, although given how little work I'm doing on the Python side, I honestly prefer using Rocket instead of FastAPI and then using PyO3 to call the Python library from Rust, rather than the other way around.
I think the idea is that they move their business logic to the Rust code, since Rust's type system is more powerful and more sound, instead of trying to make do with MyPy
Wouldn't it be more of a priority to move it for lower memory use and higher request speed? A better type system is good, but those are often the bigger struggles when scaling interpreted languages compared to lower-level ones.
It would minimize the Python surface required to be covered with type hints and mypy. If possible, one could simply point Django to the modules generated from Rust.
I'll give it a shot tonight and see how it goes. Now I'm curious.
The only slightly complicated part is the distribution. You need to use https://github.com/PyO3/maturin or https://github.com/PyO3/setuptools-rust, and of course, you need to have Rust installed on the wheel-building machine.