
Have we really hit the wall?

Do they use GPS-based data?

Feels like there’s data all around us.

Sure, they’ve hit the wall with the obvious conversations and blog articles that humans produced, but data is a byproduct of our environment. Surely there’s more. Tons more.



We could also just measure the background noise of the universe and produce unlimited data.

But, just like GPS data, it isn't suited to LLMs, given that, you know, it has no relevance whatsoever to language.


Ignoring the confusion about 'GPS' for a moment: there's lots and lots of other data that could be used for training AI systems.

But you need to go multi-modal for that, and you need to find data that's somewhat useful, not just random fluctuations like the CMB. So, e.g., you could use YouTube videos, or even just point webcams at the real world. That might be able to give your AI a grounding in everyday physics?

There's also lots of program code you can train your AI on. Not so much the code itself, because compared to the world's total text (which we are running out of), the world's total human-written code is relatively small.

But you can generate new code and make it useful for training, by also having the AI predict what happens when you (compile and) run the code. A bit like the self-play used to improve AlphaGo.
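
A minimal sketch of that idea, under loud assumptions: the generator here only emits one-line arithmetic programs, and `random_program`, `run_program`, and `make_examples` are hypothetical names made up for illustration, not anyone's actual pipeline. It produces (program, observed output) pairs that a model could be trained to predict. A real setup would need a proper sandbox rather than a bare `exec`.

```python
import contextlib
import io
import random


def random_program(rng):
    """Emit a tiny program as source text (toy, hypothetical generator)."""
    a = rng.randint(0, 99)
    b = rng.randint(1, 99)  # never zero, so // and % cannot fail
    op = rng.choice(["+", "-", "*", "//", "%"])
    return f"print({a} {op} {b})\n"


def run_program(source):
    """Execute the source and capture whatever it prints to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # Assumption: the code is trusted because we generated it ourselves.
        exec(source, {"__name__": "__generated__"})
    return buffer.getvalue()


def make_examples(n, seed=0):
    """Build n training pairs: the program text and its real output."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        source = random_program(rng)
        examples.append({"prompt": source, "target": run_program(source)})
    return examples


if __name__ == "__main__":
    for example in make_examples(3):
        print(repr(example["prompt"]), "->", repr(example["target"]))
```

The point is that the targets cost nothing to label: the interpreter is the ground truth, so you can make as many pairs as you have compute for.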


You’re thinking of language in the strictest sense.

Think of GPS data as it relates to place names, people, cultures, and pathfinding.


What do culture, names, and people have to do with the Global Positioning System?

You are right that we can have lots more data, if you are willing to consider other modalities. But that's not 'GPS'. Unless you are using an idiosyncratic definition of GPS?



