The issue is that the Python code uses arbitrary-precision integers while the Julia code uses 64-bit integers, and Python may be slower because of that. One way to remove the discrepancy is to compile the Python with typed Cython or with Numba, which is what I did. The other would be to benchmark Julia with BigInt. Either way would be fair IMHO.
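To make the discrepancy concrete, here is a minimal sketch of what happens at the 64-bit boundary. It is not the benchmark code; `wrap_int64` is a hypothetical helper that emulates the two's-complement wraparound a native signed 64-bit integer (such as Julia's default `Int64`) performs on overflow.

```python
# Hypothetical helper: emulate signed 64-bit wraparound on top of Python's int.
MASK64 = (1 << 64) - 1


def wrap_int64(x: int) -> int:
    """Reduce x to the value a signed 64-bit two's-complement integer would hold."""
    x &= MASK64
    return x - (1 << 64) if x >= (1 << 63) else x


INT64_MAX = 2**63 - 1

# Python's int simply grows past 64 bits.
python_result = INT64_MAX + 1

# A wrapping 64-bit integer lands on the most negative value instead.
wrapped_result = wrap_int64(INT64_MAX + 1)

print(python_result)   # 9223372036854775808
print(wrapped_result)  # -9223372036854775808
```

So the two programs are not doing the same work: the Python version has to handle integers of any size, while the 64-bit version gets to use plain machine arithmetic.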
I was unaware that Python 3 removed the distinction between int (which used to be a fixed-width machine integer, backed by a C long) and long (which was arbitrary precision).
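A quick way to see the unified type in Python 3, as a small sketch (the variable names are just for illustration):

```python
import sys

small = 1
large = 10**30  # far beyond the 64-bit range

# Both are the same built-in type in Python 3; there is no separate long.
same_type = type(small) is type(large)

# bit_length() reports how many bits the magnitude needs.
bits_needed = large.bit_length()

# sys.maxsize is the largest container index, not an upper bound on int values.
exceeds_maxsize = large > sys.maxsize

print(same_type, bits_needed, exceeds_maxsize)  # True 100 True
```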
If it's really idiomatic Python 3 to always use arbitrary precision integers for everything, then it's not really Julia's fault that Python 3 makes it more difficult to use performant arithmetic.