I think the author called _mm_load_ps() on a pointer that isn't 16-byte aligned. Either change that to _mm_loadu_ps() (which also works on unaligned pointers, though it may cost some performance, mostly on older CPUs), or make sure your object is 16-byte aligned when doing the heap allocation. In C++ that's std::aligned_alloc() (C++17, inherited from C11, not really part of the STL); in Rust you can put #[repr(align(16))] on the type or allocate through std::alloc::alloc() with a Layout that asks for 16-byte alignment.
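To make the aligned-allocation suggestion concrete, here is a rough Rust sketch. The names Aligned4, is_aligned and raw_alloc_is_aligned are made up for illustration and have nothing to do with the author's code; it only shows two standard ways of getting a 16-byte aligned heap object, and it doesn't touch the SIMD intrinsics at all.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Hypothetical wrapper type: `align(16)` makes every instance,
/// including boxed (heap-allocated) ones, start on a 16-byte boundary.
#[repr(C, align(16))]
pub struct Aligned4(pub [f32; 4]);

/// Returns true if the address of `ptr` is a multiple of `align` bytes.
pub fn is_aligned<T>(ptr: *const T, align: usize) -> bool {
    (ptr as usize) % align == 0
}

/// Allocates room for `len` floats with an explicit 16-byte alignment
/// request, checks the returned address, then frees the memory again.
pub fn raw_alloc_is_aligned(len: usize) -> bool {
    assert!(len > 0, "alloc() must not be called with a zero-sized layout");
    let layout = Layout::from_size_align(len * std::mem::size_of::<f32>(), 16)
        .expect("size/alignment combination is invalid");
    unsafe {
        let ptr = alloc(layout);
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        let ok = is_aligned(ptr as *const u8, 16);
        // Memory must be released with the same layout it was allocated with.
        dealloc(ptr, layout);
        ok
    }
}
```

With the first approach, Box::new(Aligned4([1.0, 2.0, 3.0, 4.0])) gives you a heap object whose address is always a multiple of 16, so no manual allocation is needed. The Layout route is more work but is what you'd want for a buffer whose length is only known at runtime.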
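Debug and release builds probably make slightly different sets of calls and allocations, so they might end up with slightly different heap layouts. That might be enough to shift the object's address between the builds, so the pointer happens to land on a 16-byte boundary in one build and not in the other.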
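If you want to check that theory, you could print the pointer's offset from the previous 16-byte boundary in both builds and compare. A small sketch of the arithmetic, with made-up names (misalignment_16, Buf, element_misalignment), using a buffer that is forced to be aligned so the result is predictable:

```rust
/// Number of bytes by which `ptr` overshoots the previous 16-byte
/// boundary. Zero means the pointer is fine for an aligned load.
pub fn misalignment_16<T>(ptr: *const T) -> usize {
    (ptr as usize) % 16
}

/// Hypothetical buffer type, forced onto a 16-byte boundary for the demo.
#[repr(C, align(16))]
pub struct Buf(pub [f32; 8]);

/// Misalignment of element `index` inside a 16-byte aligned buffer.
/// Each f32 is 4 bytes, so only every fourth element is aligned.
pub fn element_misalignment(index: usize) -> usize {
    let buf = Buf([0.0; 8]);
    misalignment_16(&buf.0[index] as *const f32)
}
```

In the real code you would call something like misalignment_16 on the exact pointer that gets passed to the load. If it reports 0 in one build and 4, 8 or 12 in the other, the heap-layout explanation fits.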