
It's looking like pretty much anything can be extracted from language. Some things are harder than others, for sure, but with enough scale it does look like everything eventually falls. Text-only GPT-4 has a pretty solid understanding of space that 3.5 definitely lacks. You can see more thorough experiments in the Microsoft AGI paper, where they test its ability to track the visual space of a maze.


There is no such thing as a text-only GPT-4, unless you mean text-only at inference time.


There is such a thing as a text-only GPT-4, lol. It wasn't trained to be multimodal from scratch: first a text-only version was trained, and then it was made multimodal somehow (the details are unknown, but making a text-only LLM multimodal isn't new, e.g. PaLM, Flamingo, BLIP-2, Fromage). The text-only version exists and is what the Microsoft researchers had access to.
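To make the "bolt vision onto a frozen text-only LLM" idea concrete, here's a minimal sketch of the approach used by adapters like Fromage (BLIP-2 trains a small Q-Former instead of a plain linear map). All dimensions and function names here are hypothetical, purely for illustration; nothing is known about how GPT-4 actually does this. The key point is that the frozen LM just receives a sequence of embeddings, and the only new trainable piece is a projection from image-feature space into the LM's token-embedding space:

```python
import random

# Hypothetical sizes, purely illustrative -- not any real model's dimensions.
D_IMG, D_LM, N_VIS = 8, 12, 4
random.seed(0)

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.02) for _ in range(cols)] for _ in range(rows)]

# Stand-in for a frozen vision encoder: image -> N_VIS feature vectors.
def vision_encoder(image):
    return rand_matrix(N_VIS, D_IMG)

# The only *new* trainable piece: a linear projection from image-feature
# space into the frozen LM's token-embedding space.
W_proj = rand_matrix(D_IMG, D_LM)

def visual_tokens(image):
    feats = vision_encoder(image)
    # matrix multiply: (N_VIS x D_IMG) @ (D_IMG x D_LM) -> (N_VIS x D_LM)
    return [[sum(f[k] * W_proj[k][j] for k in range(D_IMG))
             for j in range(D_LM)] for f in feats]

# The frozen LM sees one sequence of embeddings; the projected visual
# tokens are prepended to the text tokens and look just like them.
text_embeds = rand_matrix(10, D_LM)           # pretend-tokenized prompt
lm_input = visual_tokens(None) + text_embeds  # 4 + 10 = 14 "tokens"
print(len(lm_input), len(lm_input[0]))        # 14 12
```

Because only `W_proj` (or a small adapter network) is trained, the original text-only weights are untouched, which is consistent with a text-only checkpoint existing alongside the multimodal one.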


That would make sense to me, but AFAIK the existence of a text-only-trained GPT-4 hasn't been publicly reported? Or did I miss this?


It has been; it was in the Microsoft research paper "Sparks of AGI". You can watch the lead author of the paper, Sebastien Bubeck, present it here: https://youtu.be/qbIk7-JPB2c

It's a good video for understanding GPT-4 as a "What are we sure that LLMs are technically capable of?" exercise. As he notes right at the start of the video, the model was made safe before release, which significantly lowered its performance, so the examples he shows aren't replicable in the different model the public has access to.


I see that you are probably referring to the claim at 4:30... but I'm not sure whether he is actually saying that the early model had no image capability, or whether it merely wasn't something they were given access to.


But only things extractable from language. A large part of robotics isn't linguistic. The specific weights of the model aren't expressed in language either.



