Here are the cases where it helps me (I promise this isn't AI generated even though I'm using a list...)
- Formulaic code. It basically obviates the need for macros / code gen. The downside is that it's slower and you can't just update the macro and re-generate. The upside is that it works for code that is mostly formulaic but has slight differences across implementations that make macros impossible to use.
- Using APIs I am familiar with but don't have memorized. It saves me the effort of doing the Google search and scouring the docs. I use typed languages, so if it hallucinates, the type checker will catch it, and I'll need to manually test and set up automated tests anyway, so there are plenty of steps where I can catch it if it's doing something really wrong.
- Planning: I think this is actually a very underrated use of LLMs. If I need to make changes across 10+ files, it really helps to have the LLM go through all the files and plan out the changes I'll need to make in a markdown doc. Sometimes the plan is good enough that, with a few small tweaks, I can tell the LLM to just do it; even when it gets some things wrong, it's useful for me to follow it partially while fixing what it got wrong.
Edit: Also, one thing I really like about LLM-generated code is that it maintains the style / naming conventions of the code in the project. When I'm tired I often stop caring about that kind of thing.
> Using APIs I am familiar with but don't have memorized
I think you have to be careful here even with a typed language. For example, I recently generated some Go code that exec'ed a shell command and captured the output. The generated code used CombinedOutput, which is easier to use but doesn't allow proper error handling. Everything ran fine until I tested a few error cases and realized the problem. Another time I asked the agent to write test cases too, and while it scaffolded code to handle error cases, it didn't actually write any test cases to exercise them; if you were only doing a cursory review, you would think it was properly tested when in reality it wasn't.
This applies to AI, too, albeit in different ways:
1. You can iteratively improve the rules and prompts you give to the AI when coding. I do this a lot. My process is constantly improving, and the AI makes fewer mistakes as a result.
2. AI models get smarter. Just in the past few months, the LLMs I use for coding have started making significantly fewer mistakes than they were.
There are definitely dumb errors that are hard for human reviewers to find because nobody expects them.
One concrete example is confusing value and pointer types in C. I've seen people try to cast a `uuid` variable into a `char` buffer, for example to memset it, by writing `(const char *)&uuid`. It turned out, however, that `uuid` was not a value type but a pointer, so this ended up just blasting the stack: instead of taking the address of the uuid's storage, it takes the address of the pointer to that storage. If you're hundreds of lines deep and looking for more complex functional issues, this is very easy to overlook.
But my gripe with your first point is that by the time I write an exact detailed step-by-step prompt for them, I could have written the code by hand. Like, there is a reason we are not using fuzzy human language in math/coding: it is ambiguous. I always feel like I'm in one of those funny videos where you have to write exact instructions on how to make a peanut butter sandwich and get deliberately misinterpreted. Except it is not fun at all when you are the one writing the instructions.
2. It's very questionable that they will get any smarter; we have hit the plateau of diminishing returns. They will get more optimized, and we can run them more times with more context (e.g. chain of thought), but they fundamentally won't get better at reasoning.
> by the time I write an exact detailed step-by-step prompt for them, I could have written the code by hand
The improved prompt or project documentation guides every future line of code written, whether by a human or an AI. It pays dividends for any long term project.
> Like there is a reason we are not using fuzzy human language in math/coding
The downside for formulaic code kind of makes the whole thing useless from my perspective; I can't imagine a case where that works.
Maybe a good case, that I've used a lot, is using "spreadsheet inputs" and teaching the LLM to produce test cases/code based on the spreadsheet data (that I received from elsewhere). The data doesn't change and the tests won't change either, so the LLM definitely helps, but this isn't code I'll ever touch again.
> Maybe a good case, that I've used a lot, is using "spreadsheet inputs" and teaching the LLM to produce test cases/code based on the spreadsheet data (that I received from elsewhere)
This seems weird to me instead of just including the spreadsheet as a test fixture.
The spreadsheet in this case is human-made and full of "human-like things" like weird formatting and other fluffiness that make it hard to use directly. It is also not standardized, so every time we get it, it is slightly different.
There is a lot of formulaic code that LLMs get right 90% of the time and that is impossible to build macros for. One example I've had to deal with is language-bridge code for an embedded scripting language. Every function I want available in the scripting environment requires what is essentially a boilerplate function to be written, and I had to write a lot of them.
There's also fuzzy datatype mapping in general, where they're like 90%+ identical but the remaining fields need minor special handling.
Building a generator capable of handling all the variations you might need is extremely hard[1], and it still won't be good enough. An LLM will get it almost perfect almost every time, and will likely reuse your existing utility funcs. It can save you from typing out hundreds of lines, and it's pretty easy to verify and fix the things it got wrong. It's exactly the sort of slightly-custom-pattern-detecting-and-following that they're good at.
1: Probably impossible, for practical purposes. It almost certainly means an API larger than the Moon, which you'll never fully know and whose sheer size keeps you from quickly figuring out which part you need.
I get that reference! Having done this with Lua and C++, it's easy to do, just tedious repetition. It's something SWIG could handle, but SWIG adds so much extra code, plumbing, and overall surface area for what amounts to a few lines of glue code per function that it feels like overkill. I can definitely see the use for a bespoke code generator for something like that.
To be pedantic, OP wasn't referencing anything in the usual sense we use the word (movie, comic, game references). They were speaking from personal experience. In that sense, there's nothing to "reference" as such.
One of my most productive uses of LLMs was when designing a pipeline from server-side data to the user-facing UI that displays it.
I was able to define the JSON structure and content, the parsing, the internal representation, and the UI that the user sees, all simultaneously. It was very powerful to tweak something at either end and watch that change propagate forwards and backwards. I was able to home in on a good solution much faster than I would have otherwise.
As a personal anecdote, I've tried to create shell scripts for testing a public HTTP API that had pretty good documentation, and both times the requests did not work. In one case it even hallucinated an endpoint.
+1 for using agents for API refreshers and discovery. I also use regular search to find possible alternatives, and about 3-4 times out of 10 normal search wins.
Discovering a private API using an agent is super useful.