Nice! In case it isn't obvious, randn! takes an existing vector and fills it up with new random normal variates. This allows computing a bunch of random variates at once, which is a little more efficient, without having to allocate a new vector every time through the loop.
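For anyone reading along outside Julia, the same preallocate-and-refill pattern can be sketched in C++, with `std::normal_distribution` standing in for the generator; the function names and the random-walk update below are made up for illustration:

```cpp
#include <cassert>
#include <random>
#include <vector>

// Analogue of Julia's randn!: fill an existing buffer with standard-normal
// variates, reusing its allocation instead of making a new vector each call.
void fill_randn(std::mt19937& rng, std::vector<double>& buf) {
    std::normal_distribution<double> dist(0.0, 1.0);
    for (double& x : buf) x = dist(rng);
}

// Hypothetical driver: one buffer of n_paths variates, refilled once per
// timestep, so the allocation happens outside the innermost loop.
std::vector<double> run_paths(std::size_t n_paths, int n_steps) {
    std::mt19937 rng(42);
    std::vector<double> paths(n_paths, 0.0);
    std::vector<double> z(n_paths);        // allocated once, up front
    for (int step = 1; step < n_steps; ++step) {
        fill_randn(rng, z);                // refilled in place each step
        for (std::size_t i = 0; i < n_paths; ++i)
            paths[i] += z[i];              // e.g. a random-walk update
    }
    return paths;
}
```

The buffer `z` holds one timestep's worth of variates for all paths at once, which is the "compute a bunch at once" part of the trick.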
Another option would be to generate all (M-1)×I random numbers ahead of time, but that would be less memory efficient.
I tried that too, but it wasn't any faster – you've already gotten most of the gains to be had by lifting the random number generation out of the innermost loop. So I went with the option that uses less memory.
I tried the approach of generating them ahead of time and then loading them from disk. It only uses 80 MB of memory for this example.
The difference I see between generating numbers on the fly using Boost and reading them from disk is 6.41 s vs 0.32 s, i.e. a 20x speedup, for 10 repetitions of 100,000 paths over 100 timesteps.
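A minimal sketch of that precompute-then-stream approach, with `std::normal_distribution` standing in for the Boost generator (the file name and helper names are hypothetical, not from the original code):

```cpp
#include <cassert>
#include <cstdio>
#include <random>
#include <vector>

// One-time cost: dump (M-1)*I standard-normal variates to a binary file.
void write_normals(const char* path, std::size_t count, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> dist(0.0, 1.0);
    std::vector<double> buf(count);
    for (double& x : buf) x = dist(rng);
    std::FILE* f = std::fopen(path, "wb");
    std::fwrite(buf.data(), sizeof(double), buf.size(), f);
    std::fclose(f);
}

// Per-timestep cost: sequentially read I doubles from the open file.
// Repeated runs hit the OS page cache, which is plausibly where most of
// the 20x over regenerating comes from.
std::vector<double> read_step(std::FILE* f, std::size_t n) {
    std::vector<double> z(n);
    std::size_t got = std::fread(z.data(), sizeof(double), n, f);
    z.resize(got);
    return z;
}
```

Note that this trades RNG time for I/O: it only wins when the variates are reused across repetitions, as in the 10-repetition benchmark quoted above.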