At scale, it becomes a wonderful tool. Are the people in this thread so threatened, or so invested in the current business models of the internet, that they can't see how amazing this sort of thing could be for our abilities as a species? Not just in its current iteration; it will only get better.
This could be an excellent brain augmentation. Trying to hamper it because we want to force people to drag themselves through underlying sources, just so those sources can try to steal their attention with ads for revenue, is asinine.
It is a wonderful tool but I still feel that the creators of the training data are getting shafted. I'm both amazed and horrified at our creation and what it portends.
Yeah, there will probably have to be some adjustment. In the future, maybe an ML agent will hire people to go find answers to questions it has, using us as researchers/mechanical Turks :-) Quality matters more than quantity for something that's trying to understand the world well, not just build a statistical language model. I imagine it will be worth paying for quality when training heavily used models, to avoid garbage info. You don't need 30 superficial product reviews padded with SEO text if you have one that's very thoroughly researched.
And in the meantime, with ads no longer working, maybe crypto is actually useful for something here: Lightning makes very small transactions possible with basically no fees, and makes it easy to pay for things programmatically. People hate being nickel-and-dimed, but a professional building an ML model could reasonably budget for usage fees for fast, unhindered access to quality training data. An agent could even weigh its likelihood of learning something new and accurate against the cost proposed by the server, and choose which subsets to pull.
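That last step, weighing expected information gain against a quoted price, could be a simple expected-value check. A minimal sketch, assuming hypothetical source names, made-up prices in sats, and an invented `sats_per_info_unit` conversion factor (none of this is a real API):

```python
# Hypothetical sketch: an agent deciding which paid data sources are worth buying.
# All names, probabilities, and prices below are illustrative assumptions.

def worth_buying(expected_info_gain, price_sats, sats_per_info_unit=100):
    """Buy only if the estimated value of new information exceeds the asking price."""
    return expected_info_gain * sats_per_info_unit > price_sats

# Candidate sources: (name, estimated chance of learning something new, quoted price in sats)
sources = [
    ("thorough-review.example", 0.9, 50),
    ("seo-spam.example", 0.05, 40),
    ("niche-forum.example", 0.6, 30),
]

chosen = [name for name, gain, price in sources if worth_buying(gain, price)]
print(chosen)  # the SEO spam gets skipped; the quality sources get paid for
```

In practice the hard part is estimating the information gain before seeing the data, but even a rough prior would let the agent skip obvious junk.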
Just a random idea, but I hope we don’t fight tooth and nail to preserve the trash heap of the internet’s current state.
The internet has always been a trash heap. We've just been creating new heaps with parts of the old heaps every few years or so. Sure, it's nice to imagine a future in which this isn't the case, but your imagination is not going to be the future reality.
People are already being paid to curate data for models. I'm mostly suggesting that this might become a major revenue source, and that ads might be less relevant in a world where people don't need to sift through the trash heap to get info (and that's a good thing overall!)
Wouldn't an AI-driven search engine be even better than a language model for that purpose though? It could even snippet highlight the most relevant parts of various web pages to save on the sifting.
Maybe, and arguably, that's what Google has been doing. But one thing I really like about the idea of using a model directly is that there's one interface to learn, whereas with web search, I'm constantly adapting to a grab bag of page types and UX conventions.