Traditional CS may help improve performance slightly by allowing more training for the same compute, but the gain won't be an order of magnitude or more. The bigger improvements will come from statistics rather than from CS per se.
I'm not sure. Methods like Chinchilla-style compute-optimal scaling and quantization have been able to reduce compute by more than an order of magnitude. There may well be a few more levels of optimization left within the same statistical paradigm.