Windows NT has been ported to PowerPC, DEC Alpha and Itanium in the past, so an ARM port shouldn't be especially hard. Longhorn was probably a much more ambitious change to WinNT than a CPU port. MS would probably want to introduce some kind of universal binary format as well, but that shouldn't be out of reach either.
To ensure portability, NT was originally developed on non-x86 CPUs (first the Intel i860, then MIPS) and only later ported to x86. (It was also originally created with a Pig-Latin UI and later localized to English.) Also, the Xbox 360 runs a stripped-down branch of NT on big-endian PowerPC. A native ARM port should be relatively easy.
That's not a universal binary. It's a bytecode package. It still needs to be run through an interpreter or a JIT compiler, just like a JAR file for Java.
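To make "run through an interpreter" concrete, here is a minimal sketch: the package holds portable instructions, and a per-architecture program decides what to do with each one at run time. The opcodes are hypothetical and are not CIL or Java bytecode; a JIT would instead translate the same instructions into native code before running them.

```python
from typing import List, Tuple

# Hypothetical (opcode, operand) instructions; not CIL or JVM bytecode.
Instr = Tuple[str, int]

def run(program: List[Instr]) -> int:
    """Interpret a list of stack-machine instructions and return the top of the stack."""
    stack: List[int] = []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack[-1]

# (2 + 3) * 4, encoded once; any host with this interpreter can run it.
program = [("push", 2), ("push", 3), ("add", 0), ("push", 4), ("mul", 0)]
print(run(program))  # prints 20
```

The same `program` list works unchanged on any CPU, but every instruction pays the dispatch cost of the `if`/`elif` chain, which is exactly the overhead a JIT or an ahead-of-time compiler tries to remove.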
A universal binary contains actual machine code for multiple architectures. Universal binary support requires deeper changes to the OS and can complicate testing, but it's basically a requirement for shipping high-performance cross-platform code, because a JIT compiler can't spend as much time optimizing code as an ahead-of-time compiler.
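For a concrete picture of "actual machine code for multiple architectures", here is a sketch modeled on the fat-binary container Mac OS X used for its PowerPC/Intel universal binaries: a big-endian header (magic `0xCAFEBABE`, then an entry count) followed by one 20-byte record per architecture giving its CPU type, subtype, offset, size and alignment. The loader simply picks the slice that matches the host CPU. The "code" slices here are placeholder bytes, the helper names are hypothetical, and this is not a format Microsoft has announced; it only illustrates the idea.

```python
import struct
from typing import Dict, List, Tuple

FAT_MAGIC = 0xCAFEBABE   # Mach-O fat header magic, stored big-endian
CPU_TYPE_X86 = 7         # CPU type constants as in Mach-O's <mach/machine.h>
CPU_TYPE_ARM = 12
CPU_TYPE_POWERPC = 18

def build_fat(slices: Dict[int, bytes], align_log2: int = 12) -> bytes:
    """Pack per-CPU code slices behind a fat header (cpusubtype left as 0)."""
    alignment = 1 << align_log2
    offset = 8 + 20 * len(slices)  # first byte after the header and arch table
    placed = []
    for cputype, code in slices.items():
        start = -(-offset // alignment) * alignment  # round up to the alignment
        placed.append((cputype, start, code))
        offset = start + len(code)
    image = bytearray(struct.pack(">II", FAT_MAGIC, len(slices)))
    for cputype, start, code in placed:
        image += struct.pack(">iiIII", cputype, 0, start, len(code), align_log2)
    for _, start, code in placed:
        image += b"\0" * (start - len(image))  # zero padding up to the slice start
        image += code
    return bytes(image)

def parse_fat(image: bytes) -> List[Tuple[int, int, int]]:
    """Return (cputype, offset, size) for each architecture in the fat header."""
    magic, nfat = struct.unpack_from(">II", image, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat binary")
    archs = []
    for i in range(nfat):
        cputype, _sub, offset, size, _align = struct.unpack_from(">iiIII", image, 8 + 20 * i)
        archs.append((cputype, offset, size))
    return archs

def slice_for(image: bytes, host_cputype: int) -> bytes:
    """What a loader does: choose the machine code built for the running CPU."""
    for cputype, offset, size in parse_fat(image):
        if cputype == host_cputype:
            return image[offset:offset + size]
    raise LookupError("no slice for this CPU")

image = build_fat({CPU_TYPE_X86: b"x86 code", CPU_TYPE_ARM: b"arm code"})
print(slice_for(image, CPU_TYPE_ARM))  # prints b'arm code'
```

The file carries every slice, so it grows with each supported architecture, but nothing has to be compiled or interpreted at run time; the OS only needs to understand the container.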
Machine-independent bytecode paired with a high-quality VM allows you to ship a cross-platform executable, but it's not going to be enough when ARM PCs are facing an uphill battle to prove their performance is acceptable to a market that isn't particularly satisfied with Intel's Atom.
That depends on the market, I think. AFAIK most of the dissatisfaction with Atom is due to its power consumption rather than its performance, so an ARM with comparable computing power and lower power consumption would be quite satisfactory for most users when teamed up with a solid GPU.
It's a smart, easy optimization, so I'd be surprised if .NET weren't using it, but ultimately it has the same effect as reducing the frequency of GC pauses. It doesn't lead to faster execution. It doesn't change the fact that the code didn't pass through a more thorough analyzer/optimizer. How good are JIT compilers at automatic vectorization, for example? Opportunities for automatic vectorization could be encoded into bytecode such that the SIMD capabilities of different architectures could be used, but I don't think .NET does that.
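To illustrate what "encoding vectorization opportunities into bytecode" might look like, suppose the bytecode carried a single width-agnostic "add these N elements" instruction (hypothetical; the mnemonics below are made up and not real CIL opcodes). Each JIT could then strip-mine it to its own SIMD width, e.g. 4 lanes for 128-bit registers holding 32-bit floats or 8 lanes for 256-bit registers, with scalar adds for the leftover elements. A real backend would emit SSE/AVX or NEON instructions instead of the tuples used here.

```python
from typing import List, Tuple

NativeOp = Tuple[str, int]  # (hypothetical mnemonic, starting element index)

def lower_vadd(length: int, lanes: int) -> List[NativeOp]:
    """Lower a width-agnostic 'vadd length' op for a target with `lanes` SIMD lanes."""
    ops: List[NativeOp] = []
    main = length - length % lanes
    for i in range(0, main, lanes):    # full chunks: one SIMD add each
        ops.append((f"simd_add_x{lanes}", i))
    for i in range(main, length):      # leftover elements: scalar adds
        ops.append(("scalar_add", i))
    return ops

def execute(ops: List[NativeOp], lanes: int, a: List[float], b: List[float]) -> List[float]:
    """Simulate the lowered ops to check they compute a + b element-wise."""
    out = [0.0] * len(a)
    for op, start in ops:
        width = lanes if op.startswith("simd") else 1
        for j in range(start, start + width):
            out[j] = a[j] + b[j]
    return out

print(lower_vadd(10, 4))  # two 4-wide SIMD adds, then scalar adds for elements 8 and 9
print(lower_vadd(10, 8))  # one 8-wide SIMD add, then scalar adds for elements 8 and 9
```

The bytecode stays the same on every machine; only the lowering step knows the lane count, which is the kind of information a plain scalar loop in bytecode leaves the JIT to rediscover on its own.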