As the person who wrote that: the speedup here is primarily due to using a smarter algorithm, though Nimrod helped in keeping the constant factor low.
I expect that one could get the same performance in C, though unrolling the top levels of the recursive search would be a bit more cumbersome there, and Nimrod's efficient bounds checks for arrays helped when debugging some variants of the algorithm.
In short, the benefits of Nimrod here were expressiveness and safety rather than raw performance (though having optimized code whose performance is competitive with C hardly hurts, either).