It’s some effort but I bet they added a classical serial CPU to run the existing code. In fact, [1] suggests that’s exactly what they did. I suspect they had other reasons to add the GSP so the amortized cost of moving the driver code to firmware was actually not that large all things considered and in the long term reduces their costs (eg they reduce the burden further of supporting multiple OSes, they can improve performance further theoretically, etc etc)
That's exactly what happened - Turing microarchitecture brought in new[1] "GSP" which is capable enough to run the task. Similar architecture happens AFAIK on Apple M-series where the GPU runs its own instance of RTOS talking with "application OS" over RPC.
[1] Turing GSP is not the first "classical serial CPU" in nvidia chips, it's just first that has enough juice to do the task. Unfortunately without recalling the name of the component it seems impossible to find it again thanks to search results being full of nvidia ARM and GSP pages...
here's[1] a presentation from nvidia regarding (unsure if done or not) plan for replacing Falcon with RISC-V, [2] suggests the GSP is in fact the "NV-RISC" mentioned in [1]. Some work on reversing Falcon was apparently done for Switch hacking[3]?
[1] https://download.nvidia.com/XFree86/Linux-x86_64/525.78.01/R...