I tested both 64-bit variants a while ago, using the C++ and C implementations respectively, and found no throughput difference worth mentioning. (I settled on xxHash because it has been around longer and its code is portable; it also spared me a C++-to-C translation.)
If there is a large difference, perhaps one of the implementations is vectorized or botched.
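To check a throughput claim like this yourself, the harness below is a rough sketch. It assumes both hashes can be called through one signature, `(const void *data, size_t len, uint64_t seed)`. XXH64 from `xxhash.h` has roughly this shape; the other implementation may need a thin wrapper. To keep the sketch compilable without external headers, `fnv1a64` (64-bit FNV-1a) stands in for the real hashes. It is not xxHash. The names `hash64_fn`, `fnv1a64` and `measure_throughput` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Assumed common signature for the hashes under comparison
 * (hypothetical typedef; wrap each real implementation to match it). */
typedef uint64_t (*hash64_fn)(const void *data, size_t len, uint64_t seed);

/* Placeholder hash: 64-bit FNV-1a, NOT xxHash. It only stands in so the
 * harness compiles on its own. With seed 0 it is standard FNV-1a. */
static uint64_t fnv1a64(const void *data, size_t len, uint64_t seed)
{
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL ^ seed; /* FNV offset basis, mixed with seed */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;                 /* FNV 64-bit prime */
    }
    return h;
}

/* Hashes `buf` `reps` times and returns bytes hashed per second of CPU
 * time, as measured by clock(). The XOR of all results is stored in *sink
 * so the compiler cannot drop the hash calls as dead code. */
static double measure_throughput(hash64_fn fn, const void *buf, size_t len,
                                 int reps, uint64_t *sink)
{
    uint64_t acc = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        acc ^= fn(buf, len, (uint64_t)r);      /* vary the seed per repetition */
    clock_t end = clock();
    *sink = acc;

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;                            /* too fast for clock() to resolve; raise reps */
    return (double)len * (double)reps / secs;
}
```

Call `measure_throughput` once per implementation with the same buffer size, repetition count and compiler flags, then compare the two figures. If one result is several times the other, inspect the generated code before drawing conclusions. A wide gap usually points to SIMD in one implementation or to a broken build, not to a real algorithmic difference.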