I suspect you are right. We may be stuck at the gpt4 sizes for a bit just because of hardware costs though. As they get bigger it costs too much to run them until our hardware becomes more optimal for these large models at 4 bits or so.
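To make the 4-bit point concrete, here's a minimal NumPy sketch of why low-bit weights cut serving cost. The per-tensor symmetric scaling scheme here is a simplification I made up for illustration; real quantizers (GPTQ, etc.) are more sophisticated, but the memory arithmetic is the same:

```python
import numpy as np

# Quantize fp32 weights to 4-bit integers with one shared scale factor.
def quantize_4bit(w: np.ndarray):
    scale = np.abs(w).max() / 7.0  # map weights into the int4 range [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)

# Packed 4-bit storage holds two weights per byte: 1/8 the fp32 footprint.
print(w.nbytes, q.size // 2)        # prints: 4096 512
print(np.abs(w - w_hat).max() <= s) # rounding error bounded by one scale step
```

The 8x memory reduction is what lets a model that needed eight accelerators fit on one, which is the hardware-cost argument in a nutshell.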
I think YouTube videos are going to be the next big training set. A transformer trained on all text and all of YouTube will be killer amazing at so much. I bet it can understand locomotion, balance, and body control from YouTube.
I wonder if TPUs, like Google's Tensor chip, will beat out GPUs when it comes to image/video based training?
NewPipe/Freetube/Invidious instances + SponsorBlock API support are really great at filtering out all of these useless advertising memes. Somehow ($$$) TV advert culture seeped into YouTube. I've not watched broadcast/cable TV in years, but when I interact with people that do, inevitably, they make similar references. It's super weird.
Yeah, SponsorBlock (https://sponsor.ajay.app/) is crowdsourced data for the in-video ads. Weirdly enough, I've never contributed timestamps, but the vast majority of the content I watch already has timestamps submitted. I support a bunch of creators on Patreon, but very much disagree with their double dipping.
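For anyone curious how clients consume that crowdsourced data: a hedged sketch of hitting SponsorBlock's public segments endpoint. The endpoint path and response shape below are my best understanding of the API, not a verified spec, so treat the field names as assumptions:

```python
import json
from urllib.request import urlopen
from urllib.parse import urlencode

def fetch_segments(video_id: str, categories=("sponsor",)):
    # Query the (assumed) public skipSegments endpoint for a video.
    qs = urlencode({"videoID": video_id,
                    "categories": json.dumps(list(categories))})
    with urlopen(f"https://sponsor.ajay.app/api/skipSegments?{qs}") as r:
        # Assumed shape: a list of {"segment": [start, end], "category": ...}
        return json.load(r)

def total_skipped(segments) -> float:
    """Seconds of playback a client would skip given returned segments."""
    return sum(end - start
               for seg in segments
               for start, end in [seg["segment"]])
```

A client like NewPipe just seeks past each `[start, end]` window during playback, which is how the in-video ads vanish.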
> I wonder if TPUs, like Google's Tensor chip, will beat out GPUs when it comes to image/video based training?
One of the OpenAI guys was talking about this. He said the specific technology does not matter; it is just a cost line item. They don't need to have the best chip tech available as long as they have enough money.
That said, I am curious whether anyone else can really comment on this. It seems like, as we get to very large and expensive models, we will produce more and more specialized technology.
Whether or not cost matters much depends on your perspective.
If you’re OpenAI and GPT4 is just a step on the way to AGI, and you can amortize that huge cost over the hundreds of millions in revenue you’re gonna pull in from subscriptions and API use… then sure you’re probably not very cost sensitive. It could be 20% cheaper or 50% more expensive, whatever, it’s so good your customers will use it at a wide range of costs. And you have truckloads of money from Microsoft anyways.
If you’re a company or a developer trying to build a feature, whole new product, or an entire company on top of GPT then that cost matters a whole lot. The difference between $0.06 and $0.006 per turn could be infeasible vs. shippable.
If you’re trying to compete with OpenAI then you’re probably doing everything possible to reduce that training cost.
So, whether or not it matters - it really depends.
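The $0.06 vs. $0.006 point is easy to see with back-of-envelope numbers (the usage figures below are made up for illustration):

```python
# Hypothetical chat product: 10,000 users, 30 API turns per user per day.
turns_per_day = 30
users = 10_000
days_per_month = 30

for price_per_turn in (0.06, 0.006):
    monthly = price_per_turn * turns_per_day * users * days_per_month
    print(f"${price_per_turn:.3f}/turn -> ${monthly:,.0f}/month")
# prints:
# $0.060/turn -> $540,000/month
# $0.006/turn -> $54,000/month
```

$540k/month of inference for a 10k-user product is dead on arrival for most startups; $54k/month is plausibly coverable by subscriptions. Same product, one order of magnitude in unit cost.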
> They don't need to have the best chip tech available as long as they have enough money.
That sounds like someone who is "Blitzscaling." Costs do not matter in those cases, just acquiring customers and market share. But for the rest of us, who will see benefits but are not trying to win a $100B market, we will cost optimize.
Maybe it's just a line item to them, but it's pretty relevant to anyone operating with a less-than-gargantuan budget. If a superior/affordable chip is widely available, OpenAI's competitive advantage recedes rapidly because suddenly everyone else can do what they can. To some extent that's exactly what happened with DALL-E/StableDiffusion.
That's assuming it's not horizontally scalable, because otherwise they would just out-spend everyone else anyway, like they've already done. That's a big "if", though.