Note the (2023) in the title: the article is already a little outdated. Since Rust 1.78, the compiler unrolls loops more aggressively (and uses a little bit of SIMD): https://godbolt.org/z/zhbobW7rr
OP wrote "Looking at the assembly, it's doing some loop unrolling" and linked to https://godbolt.org/z/Kv77abW6c, which uses "Rust Nightly", a moving target. By now, there's more loop unrolling.
I ran the benchmarks locally on the same hardware with the latest nightly, Rust 1.81 (aggressive loop unrolling): no difference, same speed as 1.5 years ago.
Loop unrolling started with Rust 1.59: https://godbolt.org/z/5PTnWrWf7
Update: The code on GitHub shows they were using Rust 1.67.0-nightly (2022-11-27).
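For anyone unsure what "loop unrolling" refers to here, a rough sketch of the idea written out by hand. This is not the article's code and not what rustc literally emits; `sum_simple` and `sum_unrolled` are made-up names for illustration, and the unroll factor of 4 is an arbitrary assumption. The compiler performs a transformation along these lines on the simple loop by itself, and how aggressively it does so depends on the Rust/LLVM version.

```rust
/// Plain loop: one addition and one loop-condition check per element.
pub fn sum_simple(xs: &[u64]) -> u64 {
    let mut total = 0u64;
    for &x in xs {
        total = total.wrapping_add(x);
    }
    total
}

/// Hand-unrolled by 4 (illustrative only): each iteration handles four
/// elements, so there are fewer branches per element, and the four
/// independent accumulators give the CPU (or SIMD) more to do in parallel.
pub fn sum_unrolled(xs: &[u64]) -> u64 {
    let mut acc = [0u64; 4];
    let chunks = xs.chunks_exact(4);
    // Elements left over when the length is not a multiple of 4.
    let tail = chunks.remainder();
    for c in chunks {
        acc[0] = acc[0].wrapping_add(c[0]);
        acc[1] = acc[1].wrapping_add(c[1]);
        acc[2] = acc[2].wrapping_add(c[2]);
        acc[3] = acc[3].wrapping_add(c[3]);
    }
    let mut total = acc[0]
        .wrapping_add(acc[1])
        .wrapping_add(acc[2])
        .wrapping_add(acc[3]);
    for &x in tail {
        total = total.wrapping_add(x);
    }
    total
}
```

Both functions return the same result for any input; only the shape of the generated loop differs. Whether that shape changes actual run time depends on the workload, which fits the benchmark result above.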