I've used a lot of SSE2 and prefetch intrinsics in Visual C++ for math- and data-heavy high-performance computing applications. They can make a fairly big difference, sometimes more than a 50% performance improvement for key parts of the code.
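To show what this looks like in practice, here is a minimal sketch, not the code I actually used: a dot product over two `double` arrays. It uses SSE2 to process two doubles per instruction and `_mm_prefetch` to request data a fixed distance ahead of the loop. `PREFETCH_AHEAD` is a hypothetical tuning value, and the code assumes an x86/x64 target and 64-byte cache lines. The same intrinsics (`<emmintrin.h>`, `<xmmintrin.h>`) compile under Visual C++, GCC and Clang.

```c
#include <stddef.h>
#include <emmintrin.h>  /* SSE2: __m128d, _mm_loadu_pd, _mm_mul_pd, _mm_add_pd */
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */

/* Hypothetical tuning value: how many doubles ahead to prefetch
   (64 doubles = 512 bytes). The best value depends on the machine. */
#define PREFETCH_AHEAD 64

/* Dot product of two double arrays using SSE2 (two doubles per register)
   plus software prefetch hints (_MM_HINT_T0 = into all cache levels). */
double dot_sse2(const double *a, const double *b, size_t n)
{
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        /* One hint per 8 doubles (one 64-byte line, assuming 64-byte lines),
           and only while the prefetched address stays inside the arrays. */
        if ((i % 8) == 0 && i + PREFETCH_AHEAD < n) {
            _mm_prefetch((const char *)&a[i + PREFETCH_AHEAD], _MM_HINT_T0);
            _mm_prefetch((const char *)&b[i + PREFETCH_AHEAD], _MM_HINT_T0);
        }
        __m128d va = _mm_loadu_pd(&a[i]);  /* unaligned load: no 16-byte alignment assumed */
        __m128d vb = _mm_loadu_pd(&b[i]);
        acc = _mm_add_pd(acc, _mm_mul_pd(va, vb));
    }
    /* Add the two lanes together, then handle an odd trailing element. */
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double sum = lanes[0] + lanes[1];
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Unaligned loads keep the sketch simple. If you control allocation (e.g. `_aligned_malloc` in Visual C++), you can switch to aligned `_mm_load_pd`.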
How big the gain is depends heavily on which combination of CPU and memory subsystem you're running on. You need to know how far ahead to prefetch and how to lay out your data in memory.
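Here is a sketch of both points, under stated assumptions. The `Particle` type and function names are hypothetical. The prefetch distance is a runtime parameter, so you can measure several values on the target machine instead of hard-coding one. The layout comparison assumes 64-byte cache lines. A 32-byte `Particle` puts two records in each line, so a loop that reads only `x` from an array of structs uses 16 of every 64 bytes fetched. A separate `x` array (struct-of-arrays) uses all of them and can be loaded two at a time with SSE2.

```c
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 */
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */

/* Hypothetical record type for illustration: 4 doubles = 32 bytes. */
typedef struct {
    double x, y, z, mass;
} Particle;

/* Array-of-structs: reading only x still drags y, z and mass into cache. */
double sum_x_aos(const Particle *p, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += p[i].x;
    return s;
}

/* Struct-of-arrays: x values are contiguous. `dist` is the prefetch
   distance in elements (0 disables prefetching); it is a tuning knob
   to be measured per machine, not a known-good constant. */
double sum_x_soa(const double *x, size_t n, size_t dist)
{
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        /* One hint per 8 doubles, only for addresses inside the array. */
        if (dist > 0 && (i % 8) == 0 && i + dist < n)
            _mm_prefetch((const char *)&x[i + dist], _MM_HINT_T0);
        acc = _mm_add_pd(acc, _mm_loadu_pd(&x[i]));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double s = lanes[0] + lanes[1];
    for (; i < n; ++i)  /* odd trailing element */
        s += x[i];
    return s;
}
```

In practice you would time `sum_x_soa` over a sweep of `dist` values (including 0) on large arrays on each target machine. Too short a distance doesn't hide memory latency, and too long a distance can evict lines before they're used.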