This is rather like the bluescreen problem: if your application tries to open a file from the normal filesystem and it's corrupt, the user blames the application. If the user opens a file in the Dropbox folder, they blame Dropbox. So they end up engaging in heroics to not be blamed for it.
(Windows has gone to increasing lengths to accommodate and contain badly written drivers, since most bluescreens are caused by drivers. There is now a subsystem that lets the video driver crash and restart entirely without bluescreening.)
That same video driver recovery mechanism prevents any CUDA kernel from running for more than a few seconds. Which means you can't have a master kernel on the GPU spawn its own child kernels, with no PCIe/driver latency in between, because that master has to finish before Windows kills it.
Last I checked, providing callback function pointers to binary vendor libraries was only possible on Linux, and only by statically linking said vendor library into the software (incidentally breaking binary distributability for GPL code). Read/write adapters for FFT come to mind: they allow on-the-fly metric computation, or skipping an intermediate storage step in FFT convolution.
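For anyone who hasn't seen the read-adapter idea, here is a rough host-side sketch in plain C++. It is not the vendor API: it uses a naive O(n^2) DFT instead of a real FFT, and every name in it (`LoadCallback`, `dft_with_load`, `FilterInfo`, `circular_convolve`) is made up for illustration. The point is only that the transform reads its input through a function pointer, so the pointwise multiply of an FFT convolution can happen during the read instead of being written to an intermediate buffer first.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cpx = std::complex<double>;

// Hypothetical load-callback type: the transform calls this for every input
// element instead of reading the array directly.
using LoadCallback = cpx (*)(const void* data, std::size_t index, const void* user);

// Naive O(n^2) DFT (a stand-in for a real FFT) that only touches its input
// through `load`. sign = -1 is the forward transform, +1 the unnormalized inverse.
std::vector<cpx> dft_with_load(const void* data, std::size_t n, int sign,
                               LoadCallback load, const void* user) {
    const double pi = std::acos(-1.0);
    std::vector<cpx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cpx acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * 2.0 * pi * double((k * j) % n) / double(n);
            acc += load(data, j, user) * std::polar(1.0, angle);
        }
        out[k] = acc;
    }
    return out;
}

// Plain adapter: just read the element.
cpx load_plain(const void* data, std::size_t i, const void*) {
    return static_cast<const cpx*>(data)[i];
}

// Fused adapter: multiply by the filter spectrum and apply the 1/n inverse
// scaling while reading, so the product is never stored anywhere.
struct FilterInfo {
    const cpx* filter;
    std::size_t n;
};

cpx load_times_filter(const void* data, std::size_t i, const void* user) {
    const FilterInfo* info = static_cast<const FilterInfo*>(user);
    return static_cast<const cpx*>(data)[i] * info->filter[i] / double(info->n);
}

// Circular convolution of two equal-length signals: two forward transforms,
// then one inverse transform whose load callback does the pointwise multiply.
std::vector<cpx> circular_convolve(const std::vector<cpx>& a, const std::vector<cpx>& b) {
    const std::size_t n = a.size();
    std::vector<cpx> A = dft_with_load(a.data(), n, -1, load_plain, nullptr);
    std::vector<cpx> B = dft_with_load(b.data(), n, -1, load_plain, nullptr);
    FilterInfo info{B.data(), n};
    return dft_with_load(A.data(), n, +1, load_times_filter, &info);
}
```

Without the callback you would need a separate pass that writes the product of the two spectra to memory and then reads it back for the inverse transform. On a GPU that extra round trip through global memory is what the vendor callbacks let you avoid.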