I like the author's take: it isn't a value judgement on the individual using ChatGPT (or Gemini or whichever LLM you like this week), it's that the thought that went into making the prompt is, inevitably, more interesting/original/human than the output the LLM generates afterwards.
In my experiments with LLMs for writing code, I find that the code is objectively garbage if my prompt is garbage. If I don't know what I want, if I don't have any ideas, and I don't have a structure or plan, that's the sort of code I get out.
I'd love to hear any counterpoints from folks who have used LLMs lately to get academic or creative writing done, as I haven't tried using any models lately for anything beyond helping me punch through boilerplate/scaffolding on personal programming projects.
This is the CRUX of the issue. Even with SOTA models (Sonnet 3.5, etc.), the more open-ended your prompt, the more banal and generic the response. It's GIGO turtles all the way down.
I pointed this out a few weeks ago with respect to why the current state of LLMs will never make great campaign creators in Dungeons and Dragons.
We as humans don't need to be "constrained" - ask any competent writer to sit quietly and come up with a novel story plot and they can just do it.
That being said - they can still make AMAZING sounding boards.
And if you still need some proof, crank the temperature up to 1.0 and pose the following prompt to ANY LLM:
Come up with a self-contained single room of a dungeon that involves an
unusual puzzle for use with a DND campaign. Be specific in terms of the
puzzle, the solution, layout of the dungeon room, etc. It should be totally
different from anything that already exists. Be imaginative.
I guarantee 99% of the responses will be a very formulaic physics-based puzzle like "The Resonant Hourglass", or "The Mirror of Acoustic Symmetry", etc.
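If you want to reproduce the experiment programmatically, here's a minimal sketch, assuming the `openai` Python package and an API key in `OPENAI_API_KEY`; the model name is just a placeholder, and any chat-completion API that exposes a temperature parameter works the same way:

```python
# Minimal sketch: run the "unique dungeon puzzle" prompt at temperature 1.0.
# Assumes the `openai` Python package and OPENAI_API_KEY in the environment;
# the model name is a placeholder -- point it at whichever LLM you're testing.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Come up with a self-contained single room of a dungeon that involves an "
    "unusual puzzle for use with a DND campaign. Be specific in terms of the "
    "puzzle, the solution, layout of the dungeon room, etc. It should be totally "
    "different from anything that already exists. Be imaginative."
)

response = client.chat.completions.create(
    model="gpt-4o",      # placeholder; swap in the model you want to test
    temperature=1.0,     # cranked up as suggested above
    messages=[{"role": "user", "content": PROMPT}],
)

print(response.choices[0].message.content)
```

Run it a handful of times and compare how often the "unique" puzzles converge on the same few motifs.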
When using Claude Sonnet 3.7 for coding, I often find that constraints I add to the prompt end up producing unintended side effects.
Some examples:
- "Don't include pointless comments." - The model doesn't keep track of what it's doing as well, I generally just do another pass after it writes the code to simplify things.
- "Keep things simple" - The model cuts corners(often unnecessarily) on things like type safety.
- "Allow exceptions to bubble up" - Claude deletes existing error handling logic. I found that Claude seems to prefer just swallowing errors and adding some logging, instead of fixing the underlying cause of the error, but adding this to the prompt just caused it to remove the error handling that I had added myself.
> I guarantee 99% of the responses will be a very formulaic physics-based puzzle like "The Resonant Hourglass"
Haha, I was suspicious, so I tried this, and I indeed got an hourglass-themed puzzle! Though it wasn't physics-based - characters were supposed to share memories to evoke emotions, and different emotions would ring different bells, and then you were supposed to evoke a certain type of story. Honestly, I don't know what the hourglass had to do with it.
It is totally different from anything that exists. It fulfils the prompt, I suppose! It has to be crazy so you can be more certain it's unique. The prompt didn't say anything about it being good.
Yeah this was the part I found a little silly, mostly because I just couldn't visualize what that mesh looked like or how I would describe how to operate it.
The room is a simple 30-foot square with a single exit door that's currently sealed. In the center sits a large stone cube (roughly 5 feet on each side) covered in various textured surfaces - some rough like sandpaper, others smooth as glass, some with ridged patterns, and others with soft fabric-like textures.
Around the room, six distinct scent emitters are positioned, each releasing a different aroma (pine, cinnamon, ocean breeze, smoke, floral, and citrus). The room is otherwise empty except for a small stone pedestal near the entrance with a simple lever.
## The Puzzle Concept
This puzzle operates on "synesthetic translation" - converting sensory experiences across different senses. The core concept is entirely verbal and tactile, making it fully accessible without visual components.
## How It Works
When players pull the lever, one of the scent emitters activates strongly, filling the room with that particular aroma. Players must then approach the central cube and touch the texture that corresponds to that smell according to a hidden synesthetic logic.
The connection between smells and textures follows this pattern:
- Pine scent → ridged texture (like tree bark)
- Cinnamon → rough, granular texture (like spice)
- Ocean → smooth, undulating surface (like waves)
- Smoke → soft, cloudy texture (like mist)
- Floral → velvet-like texture (like petals)
- Citrus → bumpy, pitted texture (like orange peel)
After correctly matching three smell-texture pairs in sequence, the door unlocks. However, an incorrect match causes the lever to reset and a new random smell to emerge.
## Communication & Accessibility
The DM describes the smells verbally when they're activated and can describe the various textures when players explore the cube by touch. The entire puzzle can be solved through verbal description, touch, and smell without requiring sight.
For extra accessibility, the DM can add:
- Distinct sounds that play when each scent is released
- Textured surfaces that have subtle temperature differences
- Verbal clues discovered through successful matches
## What Makes This Unique
This puzzle uniquely relies on cross-sensory associations that aren't commonly used in dungeons. It:
- Doesn't rely on visuals at all
- Uses smell as a primary puzzle component (rare in D&D)
- Creates unusual connections between different senses
- Has no mathematical, musical, or traditional riddle elements
- Can be experienced fully regardless of vision status
- Creates interesting roleplaying opportunities as players discuss how different scents "feel" texturally
For the DM, it's easy to describe and implement while still being conceptually unique. Players solve it through discussion, exploration, and experimentation rather than recalling common puzzle patterns.
In my experience Gemini can be really good at creative writing, but yes you have to prompt and edit it very carefully (feeding ideas, deleting ideas, setting tone, conciseness, multiple drafts, etc).
I use Gemini pretty much exclusively for creative writing largely because the long context lets you fit an entire manuscript plus ancillary materials, so it can serve as a solid beta reader, and when you ask it to outline a chapter it is very good at taking the events preceding and following into account. It's hard to overstate the value of having a decent beta reader that can iteratively review your entire work in seconds.
As a side note, I find the way that you interact with an LLM when doing creative writing is generally more important than the model. I have been having great results with LLMs for creative writing since ChatGPT 3.5, in part because I approach the model with a nucleus of a chapter and a concise summary of relevant details, then have it ask me a long list of questions to flesh out details; then, when the questions stop being relevant, I have it create a narrative outline or rough draft which I can finish.
Interesting. I think I'm a better editor, so I use it as a writer, but it makes sense that it works the other way too for strong writers. Your way might even be better, since evaluating a text is likely easier than constructing a good one (which is why your process worked even back with 3.5).
I have a horrible time editing my own work, decision paralysis and what not, but I did have the idea that a good way to practice would be editing the content of LLM-generated fictional narratives. I think the point that many are making is that LLMs are useful as cognitive aids that augment thinking rather than replacements for thinking. They can be used to train your mind by inspiring thoughts you wouldn't have come up with on your own.
I have mixed feelings. Generally I don’t think that LLM output should be used to create anything that a human is supposed to read, but I do carve out a big exception for people using LLMs for translation/writing in a second language.
At the same time, however, the people who need to use an LLM for this are going to be the worst at identifying the output’s weaknesses, eg just as I couldn’t write Spanish text, I also couldn’t evaluate the quality of a Spanish translation that an LLM produced. Taken to an extreme, then, students today could rely on LLMs, trust them without knowing any better, and grow to trust them for everything without knowing anything, never even able to evaluate their quality or performance.
The one area where I do disagree with the author, though, is coding. As much as I like algorithms, code is written to be read by computers and I see nothing wrong with computers writing it. LLMs have saved me tons of time writing simple functions, so I can speed through a lot of the boring legwork in projects and focus on the interesting stuff.
I think Miyazaki said it best: “I feel… humans have lost confidence”. I believe that LLMs can be a great tool for automating a lot of boring and repetitive work that people do every day, but thinking that they can replace the unique perspectives of people is sad.
I actually feel very strongly that code is very much written for us humans. Sure, it's a set of instructions that is intended to be machine read and executed but so much of _how_ code is written is very much focused on the human element that's been a part of software development. OOP, design patterns, etc. don't exist because there is some great benefit to the machines running the code. We humans benefit as the ones maintaining and extending the functionality of the application.
I'm not making a judgement about the use of LLMs for writing code, just that I do think that code serves the purpose of expressing meaning to machines as well as humans.
> As much as I like algorithms, code is written to be read by computers and I see nothing wrong with computers writing it.
unless you're the sole contributor, code is a collaborative effort and will be reviewed by peers to make sure you don't hit any landmines at best, or ruin the codebase at worst. unless you're writing codegen itself I very much would consider writing code as if a human is going to read it.
> “I feel… humans have lost confidence”
Confidence in their fellow man? Yes. As the author said, a lot of this reliance on AI without proper QA comes down to "nobody cares", or at least that mentality. And apathy is just as contagious in an environment as passion. If we lose that passion and are simply doing a task to get by and clock out, we're doomed as a species.
For creative and professional writing, I found them useful for grammar and syntax review, or finding words from a fuzzy description.
For the structure, they are barely useful: writing is about having such a clear understanding that the meaning remains when reduced to words, so that others may grasp it. The LLM won't help much with that, as you say yourself.
> I'd love to hear any counterpoints from folks who have used LLMs lately to get academic or creative writing done
They’re great at proofreading. They’re also good at writing conclusions and abstracts for articles, which is basically synthesising the results of the article and making it sexy (a task most scientists are hopelessly terrible at). With caveats:
- all the information needs to be in the prompt, or they will hallucinate;
- the result is not good enough to submit without some re-writing, but more than enough to get started and iterate instead of staring at a blank screen.
I want to use them to write methods sections, because that is basically the exact same information repeated in every article, but the actual sentences need to be different each time. But so far I don’t trust them to be accurate with technical details. They’re language models, they have no knowledge or understanding.
Point two is critical. I have found that the best way for me is to avoid using copy-and-paste. Instead, I put the browser on the right corner of the screen and my text editor on the left, then transcribe the text word by word by typing it using the keyboard. In this way, my natural laziness is less likely to accept words, expressions, and sentences that are perhaps okay-ish but not 100% following my taste.
LLMs may seem like magic but they aren't. They operate within the confines of the context they're given. The more abstract the context, the more abstract the results.
I expect to need to give a model at least as much context as a decent intern would require.
Often asking the model "what information could I provide to help you produce better code" and then providing said information leads to vastly improved responses. Claude 3.7 Sonnet in Cline is fairly decent at asking for this itself in plan mode.
More and more I find that context engineering is the most important aspect of prompt engineering.
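The "what would help you?" loop is easy to reproduce outside Cline too. Here's a rough sketch, assuming the `anthropic` Python SDK with an API key in the environment; the model id, example task, and context string are all placeholders:

```python
# Sketch of the two-step "ask the model what it needs first" pattern described above.
# Assumes the `anthropic` Python SDK and ANTHROPIC_API_KEY in the environment;
# the model id and task below are placeholders.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-7-sonnet-latest"  # placeholder model id

task = "Add retry logic with exponential backoff to our HTTP client wrapper."

# Step 1: ask the model what context would improve its answer.
ask = client.messages.create(
    model=MODEL,
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            f"I want you to do this task:\n{task}\n\n"
            "Before writing any code, list the information I could provide "
            "that would help you produce better code."
        ),
    }],
)
print(ask.content[0].text)  # e.g. questions about the existing wrapper, error types, conventions

# Step 2: answer those questions yourself (relevant code, conventions, constraints)
# and resend the task with that context attached.
gathered_context = "...your answers, pasted code, and constraints go here..."
answer = client.messages.create(
    model=MODEL,
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": f"Task:\n{task}\n\nRelevant context:\n{gathered_context}\n\nNow write the code.",
    }],
)
print(answer.content[0].text)
```

The point isn't the specific SDK; it's that the second response is only as good as the context you gather for it.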
I use an LLM to brainstorm for a creative writing project. Mostly I ignore its suggestions, but somehow having the chatter helps me see what I am trying to say.
Sometimes, good writing is like an NP-complete problem, hard to create, but easy to verify. If you have enough skill to distinguish good output from garbage, you can produce reasonably good results.
> Sometimes, good writing is like an NP-complete problem, hard to create, but easy to verify.
Doesn’t this match pretty much all human creation? It’s easier to judge a book than to write it, it’s easier to watch a rocket go up into space than to build it, it’s easier to appreciate some Renaissance painting or sculpture than to actually make it.
I think the author has a fair take on the types of LLM output he has experience with, but may be overgeneralizing his conclusion. As shown by his example, he seems to be narrowly focusing on the use case of giving the AI some small snippet of text and asking it to stretch that into something less information-dense — like the stereotypical "write a response to this email that says X", and sending that output instead of just directly saying X.
I personally tend not to use AI this way. When it comes to writing, that's actually the exact inverse of how I most often use AI, which is to throw a ton of information at it in a large prompt, and/or use a preexisting chat with substantial relevant context, possibly have it perform some relevant searches and/or calculations, and then iterate on that over successive prompts before landing on a version that's close enough to what I want for me to touch up by hand. Of course the end result is clearly shaped by my original thoughts, with the writing being a mix of my own words and a reasonable approximation of what I might have written by hand anyway given more time allocated to the task, and not clearly identifiable as AI-assisted. When working with AI this way, asking to "read the prompt" instead of my final output is obviously a little ridiculous; you might as well also ask to read my browser history, some sort of transcript of my mental stream of consciousness, and whatever notes I might have scribbled down at any point.
> the exact inverse of how I most often use AI, which is to throw a ton of information at it in a large prompt
It sounds to me that you don't make the effort to absorb the information. You cherry-pick stuff that pops in your head or that you find online, throw that into an LLM and let it convince you that it created something sound.
To me it confirms what the article says: it's not worth reading what you produce this way. I am not interested in that eloquent text that your LLM produced (and that you modify just enough to feel good saying it's your work); it won't bring me anything I couldn't get by quickly thinking about it or quickly doing a web search. I don't need to talk to you; you are not interesting.
But if you spend the time to actually absorb that information, realise that you need to read even more, actually make your own opinion and get to a point where we could have an actual discussion about that topic, then I'm interested. An LLM will not get you there, and getting there is not done in 2 minutes. That's precisely why it is interesting.
You're making a weirdly uncharitable assumption. I'm referring to information which I largely or entirely wrote myself, or which I otherwise have proprietary access to, not which I randomly cherry-picked from scattershot Google results.
Synthesizing large amounts of information into smaller more focused outputs is something LLMs happen to excel at. Doing the exact same work more slowly by hand just to prove a point to someone on HN isn't a productive way to deliver business value.
> Doing the exact same work more slowly by hand just to prove a point to someone on HN isn't a productive way to deliver business value.
You prove my point again: it's not "just to prove a point". It's about internalising the information, improving your ability to synthesise and be critical.
Sure, if your only objective is to "deliver business value", maybe you make more money by being uninteresting with an LLM. My point is that if you get good at doing all that without an LLM, then you become a more interesting person. You will be able to have an actual discussion with a real human and be interesting.
Understanding or being interesting has nothing to do with it. We use calculators and computers for a reason. No one hires people to respond to API requests by hand; we run the code on servers. Using the right tool for the job is just doing my job well.
> We use calculators and computers for a reason. No one hires people to respond to API requests by hand; we run the code on servers
We were talking about writing, not about vibe coding. We don't use calculators for writing. We don't use API requests for writing (except when we make an LLM write for us).
> Using the right tool for the job is just doing my job well.
I don't know what your job is. But if your job is to produce text that is meant to be read by humans, then it feels like not being able to synthesise your ideas yourself doesn't make you excellent at doing your job.
Again, maybe it makes you productive. Many developers, for instance, get paid for writing bad code (either because those who pay don't care about quality, can't tell the difference, or something else). Vibe coding obviously makes those developers more productive. But I don't believe it will make them learn how to produce good code. Good for them if they make money like this, of course.
> We were talking about writing, not about vibe coding. We don't use calculators for writing. We don't use API requests for writing (except when we make an LLM write for us).
We do however use them to summarize and transform data all the time. Consider the ever present spreadsheet. Huge amounts of data are thrown into spreadsheets and formulas are applied to that data to present us with graphs and statistics. You could do all of that by hand, and you'd probably have a much better "internalization" about what the data is. But most of the time, hand crafting graphs from raw data and internalizing it isn't useful or necessary to accomplish what you actually want to accomplish with the data.
You don't seem to see the difference between maths and, say, literature or history.
Do you actually think that an LLM can take, say, a Harry Potter book as an input, and give it a grade in such a way that everybody will always agree on?
And to go further, do you actually use LLMs to generate graphs and statistics from spreadsheet? Because that is probably a bad idea given that there are tools that actually do it right.
> Do you actually think that an LLM can take, say, a Harry Potter book as an input, and give it a grade in such a way that everybody will always agree on?
No, but I also don't think a human can do that either. Subjective things are subjective. I'm not sure I understand how this connects to the idea you expressed that doing various tasks with automation tools like LLMs prevents you from "internalizing" the data, or why not "internalizing" data is necessarily a bad thing. Am I just misunderstanding your concern?
Many of the posts I find here defending the use of LLMs focus on "profitability". "You ask me to give you 3 pages about X? I'll give you 3 pages about X and you may not even realise that I did not write them". I completely agree that it can happen and that LLMs, right now, are useful to hack the system. But if you specialise in being efficient at getting an LLM to generate 3 pages, you may become useless faster than you think. Still, I don't think that this is the point of the article, and it is most definitely not my point.
My point is that while you specialise in hacking the system with an LLM, you don't learn about the material that goes into those 3 pages.
* If you are a student, it means that you are losing your time. Your role as a student is to learn, not to hack.
* More generally as a person, "I am a professional in summarising stuff I don't understand in a way that convinces me and other people who don't understand it either" is not exactly very sexy to me.
If you want to get actual knowledge about something, you have to actually work on getting that knowledge. Moving it from an LLM to a word document is not it. Being knowledgeable requires "internalising" it. Such that you can talk about it at dinner. And have an opinion about it that is worth something to others. If your opinion is "ChatGPT says this, but with my expertise in prompting I can get it to say that", it's pretty much worthless IMHO. Except for tricking the system, in a way similar to "oh my salary depends on the number of bugs I fix? Let me introduce tons of easy-to-fix bugs then".
> We were talking about writing, not about vibe coding.
No one said anything about vibe coding. Using tools appropriately to accomplish tasks more quickly is just common sense. Deliberately choosing to pay 10x the cost for the same or equivalent output isn't a rational business decision, regardless of whether the task happens to be writing, long division, or anything else.
Just to be clear, I'm not arguing against doing things manually as a learning exercise or creative outlet. Sometimes the journey is the point; sometimes the destination is the point. Both are valid.
> I don't know what your job is.
Here's one: prepping first drafts of legal docs with AI assistance before handing them off to lawyers for revision has objectively saved significant amounts of time and money. Without AI this would have been too time-consuming to be worthwhile, but with AI I've saved not only my own time but the costs of billable hours on phone calls to discuss requirements, lawyers writing first drafts on their own, and additional Q&A and revisions over email. Using AI makes it practical to skip the first two parts and cut down on the third significantly.
Here's another one: doing security audits of customer code bases for a company that currently advertises its use of AI as a cost-saving/productivity-enhancing mechanism. Before they'd integrated AI into their platform, I would frequently get rave reviews for the quality and professionalism of my issue reports. After they added AI writing assistance, nothing changed other than my ability to generate a greater number of reports in the same number of billable hours. What you're suggesting effectively amounts to choosing to deliver less value out of ego. I still have to understand my own work product, or I wouldn't be able to produce it even with AI assistance. If someone thinks that somehow makes the product less "interesting", well then I guess it's a good thing my job isn't entertainment.
Don't get me wrong: I don't deny that LLMs can help trick other humans into believing that the text is more professional than it actually is. LLMs are engineered exactly for that.
I'd be curious to know whether your legal documents are as good as they would be without LLMs. I wouldn't be surprised at all if they were worse, but cheaper. Talking about security audits, that's actually a problem I've seen: LLMs make it harder to detect bad audits, and in my experience I have been confronted with bad security audits more often than good ones.
For both examples, you say "LLMs are useful to make more money". I say "I believe that LLMs lower the quality of the work". It's not incompatible.
There's no "tricking" involved, and no basis for your assumption that LLMs lower the quality of work. I would suggest that what you and the author are observing is actually the opposite effect: LLMs broadly help improve the quality of work, all else being equal. The caveat is that when all else is not equal, this manifests in bad work being improved to a level that's still bad. The issue here is students using advanced tooling as an excuse to be lazy and undercut their own learning process, not the tool itself. LLMs are just this generation's version of Wikipedia and spell check.
As much as the author rightfully complains about the example in the post, a version that only said "explain the downsides of Euler angles in robotics and suggest some alternatives" would obviously be far worse. In this case, the AI helped elevate clear F-level work to maybe a C. That's not an indictment of AI; it's an indictment of low-quality work. LLMs lower the bar to produce passable-looking bad work, but they also lower the bar to produce excellent work. The confirmation bias here is that we don't know how many cases of B-level work became A papers with AI assistance, because those instances don't stand out in the same way.
In the audit example, LLMs aren't doing the audit. They synthesize my notes into a useful starting point to nullify writer's block, and let me focus more of my time on the hard or unique aspects of a given report. It's like having an intern write the first draft for me, typically with some mistakes or oversights, occasionally with a valuable additional insight thrown in, and often with links to a few helpful references for the customer that I wouldn't necessarily have found and included on my own. That doesn't lower the quality; it improves it.
As far as the legal example, it really depends on the complexity of a given instance and the guidance you've provided to your lawyers. A good lawyer won't sign off on something that fails to meet the requested quality bar (if anything, the financial incentive would be for them to err on the side of conservatism and toss out the draft you'd provided). But of course this all depends on you having a clear enough understanding of what you're trying to accomplish, and enough familiarity with legal documents and proficiency with language to shape everything into a passable first draft. AI speeds this up, but if you don't know what you're doing then the AI won't solve that for you. It's a tool like any other, and can be used properly or improperly.
I think that mindset directly correlates with the kind of AI that prompted this article: "It doesn't matter" in your eyes. You don't see the task as important, only the output and that it makes you money. The craft is less important than what you can sell it for.
Yes, a learning exercise has a goal to extend your own knowledge. Business is, especially as of late, figuring out how to get the cheapest but still acceptable work to the recipient for the highest value. I suppose it's on the recipient for not checking the quality of their commission.
If you present your AI-powered work to me, and I suspect you employed AI to do any of the heavy lifting, I will automatically discount any role you claim to have had in that work.
Fairly or unfairly, people (including you) will inexorably come to see anything done with AI as ONLY done with AI, and automatically assume that anyone could have done it.
In such a world, someone could write the next Harry Potter and it would be lost in a sea of one million mediocre works that are roughly similar. Hidden in plain sight forever. There would be no point in reading it, because it is probably the same slop I could get by writing a one-paragraph prompt. It would be too expensive to discover otherwise.
To be clear, I'm not a student, nor do I disagree with academic honor codes that forbid LLM assistance. For anything that I apply AI assistance to, the extent to which I could personally "claim credit" is essentially immaterial; my goal is to get a task done at the highest quality and lowest cost possible, not to cheat on my homework. AI performs busywork that would cost me time or cost money to delegate to another human, and that makes it valuable.
I'm expanding on the author's point that the hard part is the input, not the output. Sure someone else could produce the same output as an LLM given the same input and sufficient time, but they don't have the same input. The author is saying "well then just show me the input"; my counterpoint is that the input can often be vastly longer and less organized or cohesive than the output, and thus less useful to share.
> In such a world, someone could write the next Harry Potter and it would be lost in a sea of one million mediocre works that are roughly similar. Hidden in plain sight forever. There would be no point in reading it, because it is probably the same slop I could get by writing a one-paragraph prompt. It would be too expensive to discover otherwise.
This has already been the case for decades. There are probably brilliant works sitting out there on AO3 or whatnot. But you'll never find them because it's not worth wading through the junk. AI merely accelerates what was already happening.
>AI merely accelerates what was already happening.
I think "merely" is underselling the magnitude of effect this can have. Asset stores overnight went form "okay I need to dig hard to find something good" to outright useless as it's flooded with unusable slop. Google somehow got worse overnight for technical searches that aren't heavily quieried.
I didn't really desire such an acceleration of slop, thanks. At least I could feel good knowing human-made slop was sometimes learned from.
> someone could write the next Harry Potter and it would be lost in a sea of one million mediocre works that are roughly similar.
To be fair, the first Harry Potter is a kinda average British boarding school story. Rowling is barely an adequate writer (and it shows badly in some of the later books). There was a reason she got rejected by so many publishers.
However, Netscape was going nuts and the Internet was taking off. Anime was going nuts and produced some of the all time best anime. MTV animation went from Beavis and Butthead to Daria in this time frame. Authors were engaging with audiences on Usenet (see: Wheel of Time and Babylon 5). Fantasy had moved from counterculture for hardcore nerd boys to something that the bookish female nerds would engage with.
Harry Potter dropped onto that tinder and absolutely caught fire.
I don't really associate Harry Potter's rise with that of the internet. It didn't light the internet ablaze until the 2000s, after the first few movies aired.
It certainly wasn't the writing that elevated it. I think it was as simple as tapping into an audience who, for once, wasn't raised in some nuclear family: a Cinderella-esque tale of being whisked away from abuse, mixed with a hero's journey toward his inevitable clash with the very evil that set this in motion.
The movies definitely helped too. The first few were very well done, with excellent child actors. Watching many other fantasy adaptations try to replicate that really shows just how the stars aligned to make HP a success.
I was surprised to find how not true that is when I eventually read the books for myself, long after they became a phenomenon. The books are well-crafted mystery stories that don't cheat the reader. All the clues are there, more or less, for you to figure out what's happening, yet she still surprises.
The world-building is meh at best. The magic system is perfunctory. But the characters are strong and the plot is interesting from beginning to end.