Does Windows really keep a pool of zero pages? With COW, a single zero page is sufficient -- any copying would be done when a write is attempted over the page.
A Windows-style ahead-of-time zeroing thread will typically incur the cache misses and memory bandwidth for each page twice: once when the page is zeroed, and again when your code writes to it. (This assumes your code subsequently puts its own data on the pages, instead of just ordering up zeroed pages to sit on.)
Are you certain it's not bypassing cache on the writes? It's been relatively straightforward to do this on x86 for ages now, especially if you're using SSE to fill many bytes at once. I would be shocked if the page zeroing thread did cached reads or writes.
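For what it's worth, here is a minimal sketch of what cache-bypassing zeroing could look like with SSE2 non-temporal stores. This is not the actual Windows zeroing code. The names `zero_page_nocache`, `page_is_zero` and `ZERO_PAGE_BYTES` (assumed to be 4096) are made up for illustration, and the code falls back to plain `memset` when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_setzero_si128, _mm_stream_si128, _mm_sfence */
#endif

#define ZERO_PAGE_BYTES 4096  /* assumed page size; hypothetical constant */

/* Zero one page. The pointer must be 16-byte aligned, which the
 * non-temporal store requires. Hypothetical helper, not an OS API. */
static void zero_page_nocache(void *page)
{
    assert(((uintptr_t)page % 16) == 0);
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;
    for (size_t i = 0; i < ZERO_PAGE_BYTES / sizeof(__m128i); i++) {
        /* Non-temporal store: the write is hinted to go to memory
         * without allocating a cache line for it. */
        _mm_stream_si128(&p[i], zero);
    }
    /* Order the streaming stores before any later stores. */
    _mm_sfence();
#else
    /* Fallback for non-x86 targets: ordinary cached writes. */
    memset(page, 0, ZERO_PAGE_BYTES);
#endif
}

/* Return 1 if every byte of the page is zero, 0 otherwise. */
static int page_is_zero(const void *page)
{
    const unsigned char *b = (const unsigned char *)page;
    for (size_t i = 0; i < ZERO_PAGE_BYTES; i++) {
        if (b[i] != 0)
            return 0;
    }
    return 1;
}
```

Note that this only keeps the zeroing from evicting useful data from the cache. Each page still costs a full page of write traffic to memory.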
The page-zeroing code may be able to minimize its effect on the cache, but it will still necessarily consume memory bandwidth: 4 KB of write traffic for every page it zeroes. So it still affects overall performance.
And bypassing the cache guarantees that when a process goes to use the pages, they will not be in the cache. Ah, tradeoffs.