> The real value methinks is actually over the control of proprietary data used for training which is the single most important factor for model output quality.

Maybe. But we've barely scratched the surface of being more economical with data.

I remember back in the old days, there was lots of work on e.g. dropout and data augmentation. We haven't seen too much of that with the likes of ChatGPT yet.

I'm also curious to see what the future of multimodal models holds: you can create almost arbitrary amounts of extra data by pointing a webcam at the world, especially when combined with a robot, or by letting your models play StarCraft or Diplomacy against each other.

