You know, there is more to deep learning research than meta-learning/architecture exploration. Sure, you can explore the hyper-parameter space faster with 500 GPUs and get yet another 0.05% better test score on ImageNet (or more, I don't actually know), but there are other ways to do something meaningful in DL without that kind of compute power.
That's a fair point, and I agree. It's just that automated exploration is sometimes hard to beat: as a typical company you probably don't have access to top-end researchers/practitioners, just average ones, and those can get a significant boost by trading smartness for brute force, running many models in parallel in an evolutionary fashion.
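To make the "brute force in evolutionary fashion" idea concrete, here's a toy sketch of evolutionary hyper-parameter search. Everything in it is an illustrative assumption, not anyone's actual system: the `evaluate` function stands in for a full training-plus-validation run, and the mutation scheme and population sizes are arbitrary.

```python
import random

def evaluate(cfg):
    # Stand-in for a real validation run; in practice this trains a model.
    # The made-up optimum is at lr=0.1, width=64.
    return -abs(cfg["lr"] - 0.1) - abs(cfg["width"] - 64) / 100.0

def mutate(cfg):
    # Perturb each hyper-parameter slightly (the scheme here is arbitrary).
    return {
        "lr": max(1e-4, cfg["lr"] * random.uniform(0.5, 2.0)),
        "width": max(8, cfg["width"] + random.choice([-16, 0, 16])),
    }

def evolve(pop_size=20, generations=30, seed=0):
    random.seed(seed)
    population = [{"lr": random.uniform(1e-4, 1.0),
                   "width": random.choice([16, 32, 64, 128])}
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the top half unchanged, refill with mutated survivors.
        population.sort(key=evaluate, reverse=True)
        survivors = population[:pop_size // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return max(population, key=evaluate)

best = evolve()
```

In a real system each `evaluate` call would be a full training job farmed out to a GPU in parallel, which is exactly where the 500-GPU brute force pays off.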
When you look at, e.g., how Google's internal system constructs loss functions and how many combinations it tries, you need a genuinely unexpected idea to beat their results, and that idea can usually be incorporated into their platform quickly, raising the bar for individual researchers. At Facebook they basically press a few buttons, tick a few checkboxes, then wait until the best model is selected, which leads to frustration among researchers.
It's just an indicator that extensive improvement is possible. But it's like adding more and more shader cores: you can get your FPS gains up to the point where ray tracing appears around the corner, and now you need to add a different kind of core.
Same process for research. You're supposed to find insights into how to do one thing or another and identify the direction of search; eventually there will be hardware to fully explore that direction. Then you move on to a different direction. Rinse, repeat.