
This is an astoundingly bad take. Surely you aren't trying to suggest that original, factual, human-authored content has no more inherent value than randomly generated nonsense?


That's Wittgenstein's argument.


No not at all, I'm not sure why you would even think that.


As I read it, your parent comment suggests that the distinction in quality and utility between human-authored and AI-generated content is merely "a matter of perspective", i.e. that there is no real distinction, and that they're both equally valuable.

If you actually meant something else, you should probably clarify.


I am not the person to whom you replied. I understood their comment to be about paradigms shifting through social awareness of the limits and opportunities of new technology.

It can be true both that right now predominantly low-quality content emanates from LLMs and that at some future time the highest-quality material will come from those sources. Or perhaps even right now (the future is already here, just unevenly distributed).

If that was their reasoning, I tend to agree. The equivalent of the Catholic Church in this metaphor is the presumption of human-generated content's inherent superiority.


LLMs are inherently approximations of collective knowledge. They will never be better than their training sets. It's a statistical impossibility.


Suggesting clarification to suit your imaginary inferences seems puzzling. The parent post pointed out that perspectives on authorship have a historical precedent; I didn't see the value judgement your reading suggested.


The discussion here is that we're not able to distinguish them.

If we cannot distinguish, I'd argue they have similar value.

They must have. Otherwise, how can we objectively demonstrate the higher value of the human output?


They can be distinguished; it's just becoming more difficult to do so. It's only slightly more difficult, but the amount of garbage is overwhelming. AI can spit out entire books in moments that would take an individual months or years to write.

There are lots of fake recipe books on Amazon, for instance. But how can you really be sure without trying the recipes? Something might look like a recipe at first glance, but if it tells you to use the right ingredients in a subtly wrong way, it's hard to tell that you won't actually end up with edible food. Some examples are easy to point at, like the case of the recipe book that lists Zelda food items as ingredients, but they aren't always that obvious.

I saw someone giving programming advice on Discord a few weeks ago, advice that was blatantly copy/pasted from ChatGPT in response to a very specific technical question. It looked like an answer at first glance, but the file type of the config file ChatGPT provided wasn't correct, and on top of that it was just making up config options in an attempt to solve the problem. When I told the user this, they deleted their response and admitted it was from ChatGPT. The user asking the question, however, didn't know the intricacies of which config options are available or which file types are valid configuration files. This could have wasted so much of their time, dealing with further errors about invalid config files or options that did not exist.


> Some examples are easy to point at, like the case of the recipe book that lists Zelda food items as ingredients

As an aside, the case you're thinking of was a novel, not a recipe book. Still embarrassing, but at least it was just a bit of set dressing, not instructions to the reader.

https://www.cnet.com/culture/zelda-breath-of-the-wild-recipe...

> I saw someone giving programming advice on Discord a few weeks ago, advice that was blatantly copy/pasted from ChatGPT in response to a very specific technical question.

This, on the other hand, is a very real and very serious problem. I've also seen users try to get ChatGPT to teach them a new programming language or environment (e.g. learning to use a game development framework) and end up with some seriously incorrect ideas. Several patterns of failure I've seen are:

1) As you describe, language models will frequently hallucinate features. In some cases, they'll even fabricate excuses for why those features fail to work, or will apologize when called out on their error, then make up a different nonexistent feature.

2) Language models often confuse syntax or features from different programming languages, libraries, or paradigms. One example I've heard of recently is language models trying to use features from the C++ standard library or Boost when writing code targeted at Unreal Engine; this doesn't work, as UE has its own standard library (see the short sketch after this list).

3) The language model's body of "knowledge" tends to fall off outside of functionality commonly covered in tutorials. Writing a "hello world" program is no problem; proposing a design for (or, worse, an addition to) a large application is hopeless.
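
To make pattern 2 concrete, here's a minimal sketch (my own illustration, not something from an actual model transcript): the plain-C++ version compiles on its own, while the Unreal-idiomatic version, shown only in comments because it needs the engine's headers, is what a UE codebase would actually expect.

    // Sketch of failure pattern 2: generic C++ that an LLM tends to emit,
    // versus what an Unreal Engine codebase expects.
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        // Typical LLM output, leaning on the C++ standard library:
        std::vector<std::string> playerNames = {"Alice", "Bob"};
        for (const std::string& name : playerNames) {
            std::cout << name << "\n";
        }

        // What idiomatic UE code would use instead (requires engine headers,
        // so shown here only as comments):
        //   TArray<FString> PlayerNames = { TEXT("Alice"), TEXT("Bob") };
        //   for (const FString& Name : PlayerNames) {
        //       UE_LOG(LogTemp, Log, TEXT("%s"), *Name);
        //   }
        return 0;
    }

Both versions look plausible in isolation, which is the problem: a reader who doesn't know the UE conventions has no easy way to tell that the model has drifted into the wrong ecosystem.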


> The language model's body of "knowledge" tends to fall off outside of functionality commonly covered in tutorials. Writing a "hello world" program is no problem; proposing a design for (or, worse, an addition to) a large application is hopeless.

Hard disagree. I've used GPT-4 to write full optimizers from papers published long after the cutoff date, using concepts that simply didn't exist in the training corpus. Trivial modifications were done after to help with memory usage and whatnot, but more often than not, if I provide it the appropriate text from a paper it'll spit something out that more or less works. I have enough knowledge in the field to verify the correctness.

Most recently I used GPT-4 to implement the paper Bayesian Flow Networks, a completely new concept that, as I recall from the HN comment section, people said was "way too complicated for people who don't intimately know the field" to make any use of.

I don't mind it when people don't find LLMs useful for their particular problems, but I simply don't run into the vast majority of the uselessness that people report, and it really makes me wonder how they are prompting to run into such difficulty.


They can indeed be distinguished, I agree. So why the fuss?

I think the concern is that bad authors would game the reviews and lure audiences into bad books.

But aren't they already able to do so? Is it sustainable long term? If you spit out programming books with code that doesn't even run, people will post bad reviews, ask for refunds. These authors will burn their names.

It's not sustainable.


It doesn't need to be sustainable as one author or one book. These aren't real authors. It's people using AI to make a quick buck. By the time the fraud is found out, they've already made a profit.

They make up an author's name. Publish a bunch of books on a subject. Publish a bunch of fake reviews. Dominate the results for a specific popular search. They get people to buy their book.

It's not even book-specific; it's been happening with actual products all over Amazon for years. People make up a company, sell cheap garbage, and make a profit. But with books, they can now make the cheap garbage look slightly convincing. And the garbage is so cheap to produce in bulk that nobody can really sort through it and easily figure out which of the 10k books published today are real and which are made up by AI.

It takes time and money to produce cheap products at a factory. But once these scammers have the AI generation set up, they can just publish books on a loop until someone ends up buying one. They might get found out eventually, but then they just pretend to be a different author and repeat the process.


What’s the fuss about spam? You can distinguish it from useful mail? What’s the fuss about traffic jams? You’ll get there eventually.

LLMs enable a kind of DDoS attack by raising the effort required to check the books for gibberish.

It's not like this stream of low-quality content did not exist before, but the topic is hot and many grifters are trying LLMs to make a quick buck at the same time.


It's sustainable if you can automate the creation of Amazon seller accounts. Based on the number of fraudulent Chinese seller accounts, I'd say it's very likely automated or otherwise near zero cost.


A piece of human-written content and a piece of AI-written content may have similar value if we cannot distinguish between them. But if you can add to the comparison the information that the human-written content was in fact written by a human, it becomes significantly more valuable, because that knowledge allows for a much deeper reading of the text. The reader can trust that there was an actual intent to convey some specific set of ideas, and can therefore take a leap of faith and put in the work required to examine the author's point of view, knowing that it is based on the desires and hopes of an actual living person with a lifetime of experience behind them, rather than being essentially random noise from a distribution.


I'm not a native English speaker, but ChatGPT's answers in every interaction I've had with it sound bland, and I dislike their bite-sized format. I'm reading "Amusing Ourselves to Death" by Neil Postman, and while you may agree or disagree with his take, he developed it in a very coherent way, exploring several aspects. ChatGPT's output falls into the same uncanny valley as the robotic voice from text-to-speech software: understandable, but no human writes that way.

ChatGPT as an autocompletion tool is fine, IMO, as is using it to generate alternative sentences. But anything longer than a paragraph falls back into the uncanny valley.


I totally agree. So why are people so worried about books being written by ChatGPT?

These pseudo-authors will get bad reviews, lose money in refunds, and burn their names.

It's not sustainable. Some will try, for sure, but they won't last long.


There are too many names, and it's too cheap to do this.

The equilibrium shifts to making it much harder to find good books, and that was already hard enough.


If you ask an LLM about something you know, you can distinguish noise from good output. If you ask an LLM about something you don't know, then how do you know whether the output is correct? There are cases where checking is easier than producing the result, e.g. when you ask for a reference.


Book buyers should guide themselves primarily by who the author is, I think.

Choose a book from someone who has a hard-earned reputation to protect.


There is a bootstrapping process of learning which authors in a field have a good reputation before you know anything about the field. That is being disrupted by LLMs as well, though.


I can't distinguish between pills that contain the medicine I was prescribed and those that contain something else entirely. Therefore taking either should be just as good.


Really? Are you comparing the complex chemical analysis required to verify the contents of a pill with reading a text?


It depends: is the text of a technical nature? How exactly is one to know they're being deceived if, to take one of the examples linked in this discussion, they receive a mushroom foraging guide but the information is actually AI-generated?


You first check who published it. Is the author an expert in the matter, with years, perhaps decades, in the industry?

Heck, we always did that since before GPT.

Good authors will continue to publish good content because they have a reputation to protect. They might use ChatGPT to increase productivity, but will surely and carefully review it before signing off.


"We" certainly did not "always" do that before.


Really? You buy books without searching anything about who wrote it?

If yes, well, there's the problem then. It's not AI; it's the lack of guidance and research skills in the process of choosing a book.


Is it about "me" specifically? Anyway, how do I know the biography of the author I find isn't also AI-generated at this rate? Or that the purported author actually wrote the book? Your solution still ultimately depends on there being non-generated information somewhere down the line.


I thought it would be pretty obvious that one should look for biographical data from external, independent sources. If the person has earned a reputation in any industry, they'll probably have articles in respectable publications, have presented at conferences, maybe even have patents, etc. Just Google their name, then Google what's associated with them. If one doesn't find anything solid, discard the book. It takes no more than 5-10 minutes to recognize a solid reputation like that.


If they were of similar value, would there be a problem with the deluge?


Couldn't the deluge be a delusion, or at best an overreaction?



