I use LLMs for coding, and I like the way I use them. I do not outsource my thinking, and I do not expect the model to know what I want without giving it context about the project and my thoughts on it. I have written a 1000 LOC program in C using an LLM, and it was a success. I did review it "line by line", though; I do not know why I would skip that. Of course it did not spit out 1000 LOC from the get-go: we started small and built on our foundations. It has a sense of my thinking and my preferences regarding C and the project because our interactions gave it that context.
> I have written a 1000 LOC program in C using an LLM.
> I have reviewed it "line by line" though, I do not know why I would not do this.
1k LOC is not that much. That's easily a single day's project for me.
But it's pretty rare that you're going to be able to review every line in a mature project, even if you're the one developing it. Such projects can contain hundreds or even thousands of files, each with hundreds (hopefully not thousands) of LOC. While it's possible to review every line, it's costly in time, and it's harder still because the code keeps changing while you're reviewing it...
Think of it this way: did you also review all the lines of code in all the libraries you used? Why not? The reasoning will be pretty similar. This isn't to say we shouldn't spend more time exploring the code we depend on, or that we wouldn't benefit from doing so, but time is a scarce resource. So the problem arises when the LLM is churning out code faster than you can review it.
While coding you are hopefully also debugging and thinking. By handing the coding over to an LLM you decouple these: you reduce the time spent writing lines of code but increase the time spent debugging and analyzing. There will be times when this is a net gain, but IME that doesn't happen in serious code. My quick and dirty scripts, sure, those can be churned out a lot faster. That saves time, but not 10x. At least not for me.
So when people talk about safety, it does matter in Rust, right? Because apparently "1k LOC is not that much" and you can easily do it in a day. Why should we bother choosing Rust for anything below 100k LOC if that much code is nothing?
I am just asking. Everyone says 1k LOC is nothing, yet people want to replace 1k LOC of C with 1k LOC of Rust. You can do it in a day. You are a professional!
Or what is your point? That 1k LOC projects are useless or pointless? Because if so, I seriously beg to differ.
> So the problem is when the LLM is churning out code faster than you can review.
> when people talk about safety, it does matter in Rust, right?
I'm not sure I would make this about languages. Different languages have different advantages, but there's always a trade-off, right? For example, this `cp` issue is a bit of a problem for the coreutils rewrite[0]. I think you gotta ask the question: what benefit does rewriting in Rust provide? Potentially more safety, but something like coreutils has also been heavily scrutinized for the past few decades, and rewriting comes with the chance of introducing new bugs. So is it safer? Hard to say, right? Especially since Rust is still young and there isn't a lot of major software written in it.
> Everyone says 1k LOC is nothing
> Or what is your point?
The point we're trying to make is that lines of code are not the bottleneck. One of the big problems with our industry right now is probably an over-reliance on metrics (KPIs). But what can you measure in coding? Lines? Commits? Tickets? Is any of that meaningful?
I said in another comment[1] that I've spent hours or days writing /one line/ of code, or even part of one. Does that mean I was doing a bad job? Was I just slacking off? I think this is something many developers have experienced. Were we all lazy? Dumb?
I'd argue that you can't answer that question from that information alone. Sometimes a single line of code is crazy hard to figure out. If you haven't seen this before, allow me to introduce you to some old coding lore[2]:
//When I wrote this, only God and I understood what I was doing
//Now, God only knows
The thread has other examples of people wasting time trying to understand some "magic". Or maybe you know the Fast Inverse Square Root from Quake III, commonly attributed to Carmack[3]. Look at that one. It's 7 LOC (5?), yet those lines are so powerful. That is not the type of code someone writes in a flow state, off the top of their head. That is the type of code you write because you used a profiler[4], found the bottleneck, and optimized the crap out of it. Typing 7 lines takes no time, but I'm sure that code took at least a week to write.
The point here is that it is really hard to measure the quality and effectiveness of a programmer. The context of the problem cannot be abstracted away when evaluating them. Unfortunately, this means that to evaluate programmers you also need to be an expert programmer AND have enough context to understand the specific problems they are working on. It's not something you can do from a spreadsheet. The truth is that if you optimize from the spreadsheet, you'll only introduce more Jira tickets.

There's a joke that there are two types of 10x programmers: the one who does 10x the work, and the one who completes tickets 10x as fast while introducing 100x the Jira tickets. The problem is that the second programmer doesn't see the bigger scope and makes mistakes that lead to new tickets. This might be your new rockstar junior dev. They close tickets, but they solve problems in isolation, not in the context of the codebase. That leads to more complexity and bugs later on, but the lag in that effect is hard to measure or identify, so it is easy to think they're a rockstar when they're actually a problem.
> I start small, I can review just fine.
Yes, and this is how you should do it. I mentioned the Unix Philosophy[5] previously. But the thing is that projects continue, and scope expands. If you want to keep writing small programs and integrating them, then you actually need to think quite carefully about their design and implementation (again, see Unix Philosophy).
So the point is that everything is highly context-driven. That's what matters. You need nuance and care. It is not easy to say what makes good code, or even to identify it. So... LGTM
In that case, I agree with you on everything, and I do actually try to do it the way you described.
And I am an expert programmer (I would like to believe), and I use LLMs just to get a refresher on my options and whatnot; I choose where the project goes, with my own knowledge. All my prompts are very specific, which itself requires knowledge.
nobody in this or any meaningful software engineering discussion is talking about software projects that are 1000, or even 10000, SLoC. these are trivial and uninteresting sizes. the discussion is about 100k+ SLoC projects.
I do not see how this is always necessarily implied. And should I seriously always assume this is the case? Where are you getting this from? None of the projects people claim to have (successfully or not) written with the help of an LLM have 10k LOC, let alone >100k. Should they just be ignored because their LOC count is not >100k?
Additionally, why is it that whenever I mention success stories accomplished with the help of LLMs, people rush to say "doesn't count because it is not >100k LOC"? Why does it not count; why should it not count? I would have written those projects by hand, but I finished much faster with the help of an LLM. These are genuine projects that solve real problems. Not every significant project has to have >100k LOC. I think we have a misunderstanding of the term "significant".
> nobody in this or any meaningful software engineering discussion is talking about software projects that are 1000, or even 10000, SLoC.
Because small programs are really quick and easy to write, there was never a bottleneck in making them, and the demand for people to write small programs is very small.
The difficulty of writing a program scales super-linearly with size: an experienced programmer in a familiar environment easily writes a 500 line program in a day, but writing 500 meaningful lines into an existing 100k line codebase in a day is not easy at all. So almost all developer time in the world is spent on large programs; small programs are a drop in the ocean, and automating them doesn't make a big difference overall.
Small programs can help you a lot, but that doesn't replace programmers, since almost no programmers are hired to write small programs. Instead, automatically generating such small programs mostly helps automate other jobs, like those of regular white-collar workers, which are now easier to automate.
> but writing 500 meaningful lines to an existing 100k line codebase in a day is not easy at all.
I've had plenty of instances where it's taken more than a day to write /one line/ of code! I suspect most experienced devs have also had these types of experiences.
Not because the single line was hard to type, but because of the context in which it needed to be written.
Typing was never the bottleneck, and I'm not sure why this is the main argument for LLMs (e.g. "LLMs save me from the boilerplate"). When typing is a bottleneck, it seems more likely that the procedure is wrong. Things like libraries, scripts, and skeletons tend to be far better solutions for those problems. In tough cases abstraction can be extremely powerful, but abstraction is a difficult tool to wield.
> Things like libraries, scripts, and skeletons tend to be far better solutions for those problems.
My feelings exactly.
LLM code generation (at least, the sort where people claim they're being 10X-ed) feels like it competes with frameworks. "An agent built this generic CRUD webapp on its own with only 30 minutes of input from me!"—well, I built an equivalent webapp in 30 minutes with Django. These are off-the-shelf solutions to solved problems. Yes, a framework like Django requires up-front learning, but in the end it leaves you with fewer lines of code to maintain, as opposed to custom-generated LLM code.
There's an argument to be made that this gap is actually highlighting design issues rather than AI limitations.
It's entirely possible for a 100k LOC system to be made up of effectively a couple hundred 500 line programs that are composed together to great effect.
That's incredibly rare, but I did once work for a company that had such a system, and it was a dream to work in. I have to think AIs are making a massive impact there.
> It's entirely possible for a 100k LOC system to be made up of effectively a couple hundred 500 line programs that are composed together to great effect.
I'm confused. Are you imagining that a program with 100k LoC is contained in a single file? You'd be insane to do such a thing. It's normally a lot of files with not that many LoC each, which de facto meets this criterion.
You may also wish to look at UNIX Philosophy. The idea that programs should be small and focused. A program should do one thing and do it well. But there's a generalization to this philosophy when you realize a function is a program.
I do agree there's a lot of issues with design these days but I think you've vastly oversimplified the problem.
> It's entirely possible for a 100k LOC system to be made up of effectively a couple hundred 500 line programs that are composed together to great effect.
To me, this sounds like a nightmare; I'm sure anyone who's worked at a shop with way too many microservices would agree. It's trivial to right-click a function call and jump to its definition; much harder to trace through your service mesh and find out what, exactly, is running at `load-balancer.kube.internal:8080/api`.