Regarding faulting mechanisms: if you've got a discrete GPU on a PCI bus, then a separate piece of silicon handles the network protocol. The important point is that I don't believe the GPU cores need to be able to halt execution and save their state at an arbitrary instruction.
It's certainly true that SIMD instructions on CPUs support predication, which saves you a lot of trouble. The difference is that if you have two instructions predicated on disjoint sets of lanes, a SIMT machine can execute both in the same cycle, whereas a SIMD machine has to spend one cycle on each. Dylan16807's link has all the details.
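To make the cycle-counting claim concrete, here's a toy model in Python. It's a sketch under loudly stated assumptions: each instruction's active lanes are encoded as an integer bitmask, instructions issue strictly in order, data dependencies are ignored, and the SIMT side simply follows the claim above (disjointly predicated instructions can share a cycle). It is not a model of any real GPU's issue logic, and the function names are made up for illustration.

```python
from typing import List

Mask = int  # toy encoding: bit i set => lane i is active for this instruction


def simd_cycles(masks: List[Mask]) -> int:
    """Toy SIMD model: every predicated instruction takes its own cycle,
    no matter how many lanes are masked off."""
    return len(masks)


def simt_cycles(masks: List[Mask]) -> int:
    """Toy SIMT model following the comment's claim: consecutive instructions
    whose active-lane masks don't overlap share one cycle.
    Greedy, in-order packing; no reordering, no dependency tracking."""
    cycles = 0
    used: Mask = 0  # lanes already claimed in the current cycle
    for m in masks:
        if cycles > 0 and (used & m) == 0:
            used |= m  # disjoint with the current cycle: issue alongside it
        else:
            cycles += 1  # overlap (or first instruction): open a new cycle
            used = m
    return cycles


# if/else over 8 lanes: "then" runs on lanes 0-3, "else" on lanes 4-7
branch = [0x0F, 0xF0]
print(simd_cycles(branch), simt_cycles(branch))  # prints "2 1"

# same branch followed by an unconditional instruction on all 8 lanes
with_tail = branch + [0xFF]
print(simd_cycles(with_tail), simt_cycles(with_tail))  # prints "3 2"
```

The in-order greedy packing is deliberately simple: it only shows where the one-cycle-per-instruction cost of the SIMD model goes away under the stated assumption, not how a real scheduler would find such pairs.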
GPUs don't support precise exceptions. For example, you can't take a GPU program that segfaults, run it normally (that is, not in a debug mode), and be told the exact instruction that caused the fault.