Here are the cases where it helps me (I promise this isn't AI generated even though I'm using a list...)
- Formulaic code. It basically obviates the need for macros / code gen. The downside is that it's slower and you can't just update the macro and re-generate. The upside is that it works for code that is mostly formulaic but has slight differences across implementations that make macros impossible to use.
- Using APIs I am familiar with but don't have memorized. It saves me the effort of doing the Google search and scouring the docs. I use typed languages, so if it hallucinates, the type checker will catch it, and I'll need to manually test and set up automated tests anyway, so there are plenty of steps where I can catch it if it's doing something really wrong.
- Planning: I think this is actually a very underrated use of LLMs. If I need to make changes across 10+ files, it really helps to have the LLM go through all the files and plan out the changes I'll need to make in a markdown doc. Sometimes the plan is good enough that, with a few small tweaks, I can tell the LLM to just do it; even when it gets some things wrong, it's useful for me to follow it partially while fixing what it got wrong.
Edit: Also, one thing I really like about LLM-generated code is that it maintains the style / naming conventions of the code in the project. When I'm tired I often stop caring about that kind of thing.
> Using APIs I am familiar with but don't have memorized
I think you have to be careful here even with a typed language. For example, I recently generated some Go code that exec'ed a shell command and captured the output. The generated code used CombinedOutput, which is easier to use but doesn't allow proper error handling. Everything ran fine until I tested a few error cases and realized the problem. Another time I asked the agent to write test cases too, and while it scaffolded code to handle error cases, it didn't actually write any test cases to exercise them; if you were only doing a cursory review, you would think it was properly tested when in reality it wasn't.
This applies to AI, too, albeit in different ways:
1. You can iteratively improve the rules and prompts you give to the AI when coding. I do this a lot. My process is constantly improving, and the AI makes fewer mistakes as a result.
2. AI models get smarter. Just in the past few months, the LLMs I use for coding have started making significantly fewer mistakes than they were.
There are definitely dumb errors that are hard for human reviewers to find because nobody expects them.
One concrete example is confusing value and pointer types in C. I've seen people try to cast a `uuid` variable into a `char` buffer, for example to memset it, by writing `(const char *)&uuid`. It turned out, however, that `uuid` was not a value type but a pointer, so this ended up just blasting the stack: instead of taking the address of the uuid's storage, it takes the address of the pointer to that storage. If you're hundreds of lines deep and looking for more complex functional issues, this is very easy to overlook.
But my gripe with your first point is that by the time I write an exact detailed step-by-step prompt for them, I could have written the code by hand. Like, there is a reason we are not using fuzzy human language in math/coding: it is ambiguous. I always feel like I'm in one of those funny videos where you have to write exact instructions on how to make a peanut butter sandwich and get deliberately misinterpreted. Except it is not fun at all when you are the one writing the instructions.
2. It's very questionable that they will get any smarter; we have hit the plateau of diminishing returns. They will get more optimized, and we can run them more times with more context (e.g. chain of thought), but they fundamentally won't get better at reasoning.
> by the time I write an exact detailed step-by-step prompt for them, I could have written the code by hand
The improved prompt or project documentation guides every future line of code written, whether by a human or an AI. It pays dividends for any long term project.
> Like there is a reason we are not using fuzzy human language in math/coding
The downside for formulaic code kind of makes the whole thing useless from my perspective; I can't imagine a case where that works.
Maybe a good case, that I've used a lot, is using "spreadsheet inputs" and teaching the LLM to produce test cases/code based on the spreadsheet data (that I received from elsewhere). The data doesn't change and the tests won't change either, so the LLM definitely helps, but this isn't code I'll ever touch again.
> Maybe a good case, that I've used a lot, is using "spreadsheet inputs" and teaching the LLM to produce test cases/code based on the spreadsheet data (that I received from elsewhere)
This seems weird to me instead of just including the spreadsheet as a test fixture.
The spreadsheet in this case is human-made and full of "human-like things" like weird formatting and other fluffiness that make it hard to use directly. It is also not standardized, so every time we get it, it is slightly different.
There is a lot of formulaic code that LLMs get right 90% of the time and that is impossible to build macros for. One example I've had to deal with is language-bridge code for an embedded scripting language. Every function I want available in the scripting environment requires what is essentially a boilerplate function to be written, and I had to write a lot of them.
There's also fuzzy datatype mapping in general, where they're like 90%+ identical but the remaining fields need minor special handling.
Building a generator capable of handling all the variations you might need is extremely hard[1], and it still won't be good enough. An LLM will get it almost perfect almost every time, and will likely reuse your existing utility funcs. It can save you from typing out hundreds of lines, and it's pretty easy to verify and fix the things it got wrong. It's exactly the sort of slightly-custom-pattern-detecting-and-following that they're good at.
1: Probably impossible, for practical purposes. It almost certainly means an API larger than the Moon, which you'll never fully know and whose sheer size keeps you from quickly figuring out which part you need.
I get that reference! Having done this with Lua and C++, it's easy to do, just tedious repetition. It's something SWIG could handle, but SWIG adds so much extra code, plumbing, and overall surface area for what amounts to a few lines of glue code per function that it feels like overkill. I can definitely see the use for a bespoke code generator for something like that.
To be pedantic, OP wasn't referencing anything in the usual sense we use the word (movie, comic, game references). They were speaking from personal experience. In that sense, there's nothing to "reference" as such.
One of my most productive uses of LLMs was when designing a pipeline from server-side data to the user-facing UI that displays it.
I was able to define the JSON structure and content, the parsing, the internal representation, and the UI that the user sees, all simultaneously. It was very powerful to tweak something at either end and watch that change propagate forwards and backwards. I was able to home in on a good solution much faster than I would have otherwise.
As a personal anecdote, I've tried to create shell scripts for testing a public HTTP API that had pretty good documentation, and both times the requests did not work. In one case it even hallucinated an endpoint.
+1 for using agents for API refreshers and discovery. I also use regular search to find possible alternatives, and about 3-4 times out of 10 normal search wins.
Discovering a private API using an agent is super useful.