Pretty much everything at the high end of large language models is off-limits to people without access to a supercomputer (we're talking hundreds of A100s, or several hundred thousand dollars in cloud-compute equivalent). Open-source efforts like BigScience may open up downstream tasks for normal people, but the forefront of this research is no longer accessible to individuals.
You might be surprised to hear that the KenLM language models used for speech recognition are actually trained on disk using the CPU. With a €149/month bare-metal server, I could train my own LM on OSCAR DE and EN.
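For the curious, a minimal sketch of what that looks like, assuming you've built KenLM's lmplz binary and installed the kenlm Python bindings (corpus.txt and model.arpa are placeholder paths): lmplz streams the corpus once and spills sorted n-gram counts to scratch files, which is why a single CPU box can handle corpora that would never fit in RAM.

    import subprocess
    import kenlm  # Python bindings from https://github.com/kpu/kenlm

    # Train a 5-gram model. -S caps RAM at 80% of the machine and -T names
    # the scratch directory where lmplz spills its sorted counts, so the
    # working set lives on disk rather than in memory.
    with open("corpus.txt") as src, open("model.arpa", "w") as dst:
        subprocess.run(["lmplz", "-o", "5", "-S", "80%", "-T", "/tmp"],
                       stdin=src, stdout=dst, check=True)

    model = kenlm.Model("model.arpa")
    print(model.score("this is a test", bos=True, eos=True))  # log10 probability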
Where I do agree with you is that transformer-style text-generation models in the billion-parameter range are off-limits for hobbyists. But that's only a tiny part of the useful applications of AI. And you can still train them with gradient checkpointing; it's just ~100x slower than what Google can do.
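For anyone who hasn't seen it, here's a minimal PyTorch sketch of gradient checkpointing (the model dimensions are illustrative, not any particular architecture): torch.utils.checkpoint drops intermediate activations after the forward pass and recomputes them during backward, so activation memory scales with one layer instead of the whole stack, at the cost of an extra forward pass per layer.

    import torch
    from torch.utils.checkpoint import checkpoint

    class CheckpointedEncoder(torch.nn.Module):
        def __init__(self, d_model=1024, n_heads=16, n_layers=24):
            super().__init__()
            self.layers = torch.nn.ModuleList(
                torch.nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
                for _ in range(n_layers)
            )

        def forward(self, x):
            for layer in self.layers:
                # Activations inside `layer` are recomputed on the backward
                # pass instead of being stored: that's the "slower but fits
                # in memory" trade-off.
                x = checkpoint(layer, x, use_reentrant=False)
            return x

    model = CheckpointedEncoder()
    out = model(torch.randn(2, 128, 1024))  # (batch, seq_len, d_model)
    out.mean().backward()

(use_reentrant=False needs a reasonably recent PyTorch; older versions only have the reentrant variant.)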
KenLM is not a neural network but a purely statistical n-gram model, so it's no surprise that it's faster on a CPU in many cases. However, as soon as you have to deal with noisy data, KenLM gets blown out of the water by DL architectures like LSTMs and, more recently, Transformers. There's a reason purely statistical models have seen very little progress in the last 10 years (KenLM was published 11 years ago): this "noise" is basically a consequence of the central limit theorem applied to data with a huge amount of nuance, far more than any human-coded feature vector could ever account for.
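To make "purely statistical" concrete, here's a toy maximum-likelihood bigram model, a miniature of what KenLM does at scale (minus the Kneser-Ney smoothing and on-disk data structures). It also shows exactly where noise hurts: any n-gram not seen verbatim in training gets probability zero.

    from collections import Counter

    def train_bigram(corpus):
        """Maximum-likelihood bigram model: P(w2 | w1) = count(w1 w2) / count(w1)."""
        unigrams, bigrams = Counter(), Counter()
        for sentence in corpus:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            unigrams.update(tokens[:-1])             # count each context word
            bigrams.update(zip(tokens, tokens[1:]))  # count adjacent pairs
        return lambda w1, w2: bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

    p = train_bigram(["the cat sat", "the dog sat"])
    print(p("the", "cat"))   # 0.5 -- counting is all a CPU needs to do
    print(p("the", "cats"))  # 0.0 -- unseen surface forms fall off a cliff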