"slowness" is not the problem here - it's "slow program startup". If your process only lasts for tens of milliseconds, you're not going to make it any faster by embedding C code because 99% of time is still going to be spent initializing the python VM.