
All content will be consumed by AI and regurgitated by them, and content creators won’t get anything from it. Content creation is a complete waste of time at this point unless AI companies are prevented from consuming content for free, and regurgitating the answers.





Distribution is a part of the business. A question-answer based AI is an inferior distribution method to a human-written email that gets sent to my inbox once a week.

Is it really that hard to make the utility of a crawler's visit to one's page negative? Hidden links leading to boatloads of garbage, generated nonsense, and so on... Seems like the only viable way to fight back to me.

Maybe a link that effectively works as a crawler trap, where it links to a bunch of Markov chain garbage, and the Markov chains just keep linking to more that are randomly generated.

Markov chains are almost free to run compared to LLM generation, and from a machine's point of view the content couldn't simply be ruled out as non-human unless the crawler had a smaller model in the background pruning out that sort of thing.
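A minimal sketch of what such a trap might look like, assuming a word-level Markov chain seeded from some local text; the names (`build_chain`, `babble`, `make_page`) and the `/trap/` URL scheme are illustrative, not from any real library:

```python
import random

def build_chain(text):
    """Map each word in the seed text to the words that follow it."""
    words = text.split()
    chain = {}
    for a, b in zip(words, words[1:]):
        chain.setdefault(a, []).append(b)
    return chain

def babble(chain, length=80):
    """Generate plausible-looking nonsense by walking the chain."""
    word = random.choice(list(chain))
    out = [word]
    for _ in range(length - 1):
        # Fall back to a random restart if we hit a dead-end word.
        word = random.choice(chain.get(word) or list(chain))
        out.append(word)
    return " ".join(out)

def make_page(chain, n_links=5):
    """One trap page: Markov text plus links to more randomly named trap pages."""
    links = "".join(
        f'<a href="/trap/{random.getrandbits(32):08x}">more</a>\n'
        for _ in range(n_links)
    )
    return f"<html><body><p>{babble(chain)}</p>\n{links}</body></html>"
```

Each generated page links only to freshly invented `/trap/` URLs, so a crawler that follows them wanders an effectively infinite graph of near-free garbage while a human who lands there by accident can just hit back.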


That won’t matter because AI changes the market from a “speculative, broad-appeal” content model to an “on-demand individually tailored” one. Consumers will stop searching for something sorta like they wanted and start asking for exactly what they want. Pushing static content onto a web site won’t be able to compete most of the time.

What wouldn't matter? LLMs feed on content and data scraped from web pages, and writers don't want their work stolen. That's it. It doesn't matter whether AI makes search engines obsolete and changes the market or whatever; it will still need to scrape content in some way, and the comment above is exploring counter-measures.

They don't need to fight "AI" and prevent it, they only need to make sure their work isn't stolen at the individual level, and there are tons of options they can adopt.


> LLMs feed on content and data scraped from web pages...

Not exclusively, no. As we march forward I'd expect:

- the number of web pages generated by AI will outpace the number generated by humans, probably by a few orders of magnitude, meaning that human-generated content will drown in a sea of machine-generated content. Furthermore, ad-supported web sites will all be AI-generated, cutting off advertising funding for human-generated sites

- this will make web pages a very poor source of information to scrape overall, since it'll mostly be LLM output

- sophisticated AI builders will start signing licensing deals with content creators that give them exclusive access for their AI. Think: medical and legal journals, large archives of historical works, stock market historical data, technical manuals, etc. This content isn't very consumer-friendly as-is, but could be used to generate consumer-friendly content that is technically more accurate.

> It's weird how you paint that it's inevitable, just to tell them "tough luck"? They don't need to fight "AI" they only need to make sure their work isn't stolen, and there are many options they can adapt to.

I think you're suggesting that people can continue publishing their content on publicly accessible web sites, somehow detect when a site is being visited by an AI web scraper, and in those cases feed "garbage" to the scraper. I'm suggesting that this will be a losing battle: trying to detect bots is already a forever-war that can't be won, while at the same time consumer behavior will change such that publishing pages on public web sites won't be an effective way to reach an audience because of the mountains of AI-generated content.


Which raises the question: what would be the point of LLMs when the majority of their training data is just fluff generated by other LLMs? When the output of an LLM becomes indistinguishable from its input, why go to the trouble of training it at all?

I don't believe LLMs can't be made to distinguish AI from non-AI content. On the contrary: there are ways to compare similar-looking content and dig up the version that provides the most insight, new ideas, or actually measured data. There are ways to tell how 'old' content is by tracking when it is updated and in what way (the actual data, or just metadata about it, or just its interpretation or presentation).

The crawlers/bots spend time and energy, and hence will be optimized not to blindly eat rehashed content. Unlike Google's current search engine, which feeds on the revenues of rehashed, ad-stuffed content.


The obvious answer is to do more with it. This will actually be a positive as soon as science is done by AI. As soon as the economy is done by AI. As soon as wars are fought by AI.

I also wonder who is going to post on Stack Overflow if questions can be answered by an LLM. At some point new consumable content drops off a cliff?

LLMs will still be able to consume official documentation, but there won't be much "I got this exception..." content.



