Exploring even a tiny, tiny, tiny part of the hyperparam space takes thousands of GPUs. And that is for a single dataset and model---change anything and you have to redo the entire thing.
I mean, maybe some day, but right now, we're poking at like 0.00000000001% of the space, and that is state-of-the-art progress.
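To see why the explored fraction is so tiny, here's a back-of-the-envelope sketch with made-up but modest numbers (the grid sizes and run budget below are entirely hypothetical, not from any real sweep):

```python
from math import prod

# Hypothetical grid sizes for a handful of common hyperparameters.
# Each value is how many settings we'd try for that knob.
grid = {
    "learning_rate": 8,   # e.g. log-spaced values
    "batch_size": 6,
    "weight_decay": 8,
    "warmup_steps": 6,
    "num_layers": 10,
    "hidden_dim": 10,
    "dropout": 5,
    "optimizer": 3,
}

configs = prod(grid.values())  # size of a full grid search
runs = 1000                    # a generous GPU-run budget

print(f"{configs:,} configurations in the grid")
print(f"{runs / configs:.4%} of the grid covered by {runs} runs")
```

Even that coarse discrete grid has millions of configurations, and real hyperparameters are continuous, so the true fraction explored is smaller by many more orders of magnitude.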