I might be wrong, but my understanding is that we've been on a decelerating slope of perf/transistor for quite a while. I just looked up OpenCL benchmark results for the 1080 Ti vs the 4090: perf/W went up by only 2.8x despite going from 16nm to 5nm, while with perfect scaling we would've seen more than a 10x increase.
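For what it's worth, here's the back-of-the-envelope arithmetic behind that 10x figure as a quick Python sketch. It assumes "perfect scaling" means perf/W tracks ideal transistor density, i.e. the square of the node shrink; the 2.8x is just the OpenCL ratio quoted above, not something I've re-measured:

    # Back-of-the-envelope check, assuming "perfect scaling" means perf/W
    # grows with ideal transistor density, i.e. (old node / new node)^2.
    old_node_nm = 16.0   # GTX 1080 Ti (TSMC 16nm)
    new_node_nm = 5.0    # RTX 4090 (TSMC 5nm-class 4N)
    measured_gain = 2.8  # OpenCL perf/W ratio mentioned above

    ideal_gain = (old_node_nm / new_node_nm) ** 2   # ~10.2x
    print(f"ideal perf/W gain:    {ideal_gain:.1f}x")
    print(f"measured perf/W gain: {measured_gain:.1f}x")
    print(f"gap vs ideal:         {ideal_gain / measured_gain:.1f}x")  # ~3.7x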
Probably not. There will be better GPUs. We used all those Kepler K10s and K80s a decade or so ago; they were OK for models with a few million parameters. Then Pascal and Volta arrived a few years later with a massive speedup and larger memory, letting you train same-size models 2-4 times faster, so you simply had to replace all the Keplers. Then Turing came along and made the P100s and V100s obsolete. Then A100, and now H100. The next L100 or whatever, with just more on-board memory, will make the H100 obsolete quickly.
One thing that is missing is that algorithms have improved massively lately, requiring less compute for the same results, so an H100 will still be perfectly usable several years from now. The problem is that it will consume more power and physical space than a future card that outperforms it, and so will eventually need to be scrapped anyway.
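To make the power/space argument concrete, here's a toy cost sketch. Every number in it (perf, wattage, electricity and rack prices) is a made-up placeholder, not a real H100 spec, but it shows why an older card gets retired even while it still works:

    # Toy illustration: cost per unit of work for an old card vs a
    # hypothetical newer card with ~3x the perf at the same power and
    # rack footprint. All numbers are placeholders, not real specs/prices.
    def cost_per_unit_of_work(perf, watts, rack_units,
                              power_price_per_kwh=0.12,
                              rack_unit_price_per_hour=0.05):
        """Hourly power + space cost divided by (arbitrary) perf units."""
        power_cost = (watts / 1000.0) * power_price_per_kwh
        space_cost = rack_units * rack_unit_price_per_hour
        return (power_cost + space_cost) / perf

    old_card = cost_per_unit_of_work(perf=1.0, watts=700, rack_units=1)
    new_card = cost_per_unit_of_work(perf=3.0, watts=700, rack_units=1)
    print(f"old card: ${old_card:.3f} per unit of work")
    print(f"new card: ${new_card:.3f} per unit of work")  # ~3x cheaper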