At scale, it becomes a wonderful tool. Are the people in this thread so threatened, or so invested in the current business models of the internet, that they can't see how amazing this sort of thing could be for our abilities as a species? Not just in its current iteration; it will only get better.
This could be an excellent brain augmentation. Trying to hamper it because we want to force people to drag themselves through underlying sources, just so those sources can try to steal their attention with ads for revenue, is asinine.
It is a wonderful tool but I still feel that the creators of the training data are getting shafted. I'm both amazed and horrified at our creation and what it portends.
Yeah, there will probably have to be some adjustment. In the future, maybe an ML agent will hire people to go find answers to questions it has, using us as researchers/mechanical Turks :-) Quality matters more than quantity for something that's trying to understand the world well, not just build a statistical language model. I imagine it will be worth paying for quality when training heavily used models, to avoid garbage info. You don't need 30 superficial product reviews padded with SEO text if you have one that's very thoroughly researched.
And in the meantime, with ads no longer working, maybe crypto is actually useful for something here: Lightning makes very small transactions possible with basically no fees, and makes it easy to pay for things programmatically. People hate being nickel-and-dimed, but a professional building an ML model could reasonably budget for usage fees for fast, unhindered access to quality training data. An agent could even weigh its likelihood of learning something new and accurate against the cost proposed by the server, and choose which subsets to pull.
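That last step, weighing expected information gain against a quoted price, could be a simple expected-value check. A minimal sketch, assuming hypothetical source names, made-up prices in sats, and an invented `sats_per_info_unit` conversion factor (none of this is a real API):

```python
# Hypothetical sketch: an agent deciding which paid data sources are worth buying.
# All names, probabilities, and prices below are illustrative assumptions.

def worth_buying(expected_info_gain, price_sats, sats_per_info_unit=100):
    """Buy only if the estimated value of new information exceeds the asking price."""
    return expected_info_gain * sats_per_info_unit > price_sats

# Candidate sources: (name, estimated chance of learning something new, quoted price in sats)
sources = [
    ("thorough-review.example", 0.9, 50),
    ("seo-spam.example", 0.05, 40),
    ("niche-forum.example", 0.6, 30),
]

chosen = [name for name, gain, price in sources if worth_buying(gain, price)]
print(chosen)  # the SEO spam gets skipped; the quality sources get paid for
```

In practice the hard part is estimating the information gain before seeing the data, but even a rough prior would let the agent skip obvious junk.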
Just a random idea, but I hope we don’t fight tooth and nail to preserve the trash heap of the internet’s current state.
The internet has always been a trash heap. We've just been creating new heaps with parts of the old heaps every few years or so. Sure, it's nice to imagine a future in which this isn't the case, but your imagination is not going to be the future reality.
People are already being paid to curate data for models. I'm mostly suggesting that this might become a major revenue source, and that ads might be less relevant in a world where people don't need to sift through the trash heap to get info (and that's a good thing overall!)
Wouldn't an AI-driven search engine be even better than a language model for that purpose though? It could even snippet highlight the most relevant parts of various web pages to save on the sifting.
Maybe, and arguably, that's what Google has been doing. But one thing I really like about the idea of using a model directly is that there's one interface to learn, whereas with web search, I'm constantly adapting to a grab bag of page types and UX conventions.