
The last time I got my hands dirty at this level was almost 20 years ago, optimizing a software renderer's texture-mapping code for the original Pentium, so I don't know much about the details these days. But it sounds like the MOV would touch the data item, which might keep it from getting aged out of the cache-like thing in question (ETA: the "bypass network"), so it would still be available when it was actually needed a few instructions later.

A stall can last several clock cycles, potentially quite a bit longer than a NOP.

In the code I worked on, rearranging the order of MOV instructions made the inner loop execute twice as fast, just by avoiding stalls. I think at one point I added a NOP to avoid a stall, but I don't remember if it survived into the final version of the code.
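
To make the reordering effect concrete, here is a toy model of an in-order, single-issue pipeline. It is only a sketch under made-up assumptions: the register names, the 3-cycle load latency, and the `total_cycles` helper are all hypothetical, and none of it reflects real Pentium timings or the renderer code described above. It just counts how long a sequence takes when each instruction has to wait for its inputs.

```python
# Toy in-order, single-issue pipeline model.
# All names and latencies are illustrative assumptions, not real Pentium timings.

def total_cycles(program):
    """Return the cycle at which every result of `program` is available.

    program: list of (dest, sources, latency) tuples, issued in order.
    An instruction issues no earlier than one cycle after the previous one,
    and it stalls until every source it reads is ready.
    """
    ready_at = {}  # register -> cycle at which its value becomes available
    cycle = 0      # earliest cycle at which the next instruction may issue
    for dest, sources, latency in program:
        # Inputs that were never written (e.g. memory operands) are ready at 0.
        issue = max([cycle] + [ready_at.get(src, 0) for src in sources])
        ready_at[dest] = issue + latency
        cycle = issue + 1
    return max([cycle] + list(ready_at.values()))


LOAD_LATENCY = 3  # assumed latency of a load, in cycles
MOVE_LATENCY = 1  # assumed latency of a register-to-register move

# Each load is followed immediately by the instruction that needs its result,
# so the pipeline stalls twice.
stalling = [
    ("eax", ["mem_a"], LOAD_LATENCY),
    ("ebx", ["eax"], MOVE_LATENCY),
    ("ecx", ["mem_b"], LOAD_LATENCY),
    ("edx", ["ecx"], MOVE_LATENCY),
]

# Same four instructions, but both loads are issued first, so the second load
# overlaps with the wait for the first.
reordered = [
    ("eax", ["mem_a"], LOAD_LATENCY),
    ("ecx", ["mem_b"], LOAD_LATENCY),
    ("ebx", ["eax"], MOVE_LATENCY),
    ("edx", ["ecx"], MOVE_LATENCY),
]

if __name__ == "__main__":
    print("stalling: ", total_cycles(stalling))
    print("reordered:", total_cycles(reordered))
```

Under these assumed latencies the first ordering takes 8 cycles and the reordered one takes 5, with exactly the same instructions doing the same work. Real hardware is far messier than this, but the general shape of the win is the same: put independent work between a load and its first use.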


