Don't forget the audits and compliance reports. No company with a C-suite with more than 3 brain cells combined will be going down that route. People forget that hobby-projects do not have the same legal and business requirements as ... enterprise projects.
The increase in technical debt over the past few years is mind-boggling to me.
First the microservices, then the fuckton of CI/CD dependencies, and now add the AI slop on top with MCPs running in the back. Every day is a field day for security researchers.
And where are all the new incredible products we were promised? Just goes to show that tools are just tools. No matter how much you throw at your product, if it sucks, it'll suck afterwards as well. Focus on the products, not the tools.
CFAA. A grand jury indicted me because of claims that I created Bitcoin and BitTorrent, which were sort of presented as the same thing! Also maybe something something Comcast because he works there?
Firstly, BitTorrent and Bitcoin are two completely different technologies, neither of which I created or have committed code to. Nothing related to Comcast was involved, and I have never worked there or for any other company like that.
I had a great legal team. Very dedicated. Some areas offer great public defenders because the caseload is much lighter than in others.
Also it's kind of fun to see MD5 hashes used as evidence! I didn't know that was still a thing, and I was curious about the legal grounds, given that any forensics expert could walk in and provide an equivalent of md5.gif proving it could be fabricated. I explained to my legal team that I could go home and create a version of my hard drive that is filled with repeating copies of the Bible and matches all of those MD5 hashes. In legal battles you have to pick the things that fit best with your overall strategy.
I can say it was an interesting ride. I watched people overdose daily during COVID in Federal Prison, among all kinds of other insanity. I never had any kind of security issues, as I was in a lower-security facility and many of the inmates, even the "gang banger" ones, were relatively business-minded and interested in how the darknet was changing the global drug trade, and genuinely happy to have someone they viewed as talented willing to drop knowledge on it. They also loved the fact that I could "street check" other claimed hackers and fraudsters. If they couldn't answer a few basic questions from me, they likely were never in any kind of "game" more complex than physically stealing people's cards. Combine that with the infinite amount of Android-rooting work that needs doing to keep the behind-the-bars mobile network functioning LMAO, and I was doing fine.
My first cell-mate type person thing was a doctor. His name was Kumar. His friend came over, who was also a doctor, and someone said "Oh hey Kumar" to him as well. At this point I was starting to assume they just call all Indian people Kumar, because I had witnessed similar things. However, they both happened to be named Kumar. Also, they misdelivered his mail to me one time by accident. No envelope, etc., just a piece of paper. It was a bill for tens of millions of dollars, and most of it had been paid. At this point I had become conditioned to strange paperwork that attempted to tell me I was screwed, so at first I was like "Oh, yeah? The computers I hacked were $20+ million?" before realizing it was his mail. IIRC he was there because he was scamming Medicare for cosmetic surgery like boob jobs and stuff.
Normally in Federal Prison even if you are a terrible baby murdering criminal, you get a little time outside, etc. During COVID things were fucked and it was 24-0 in the same 4 rooms for 6 months for me. Thankfully I had a good DnD campaign to run and worked up to being able to do some physical challenges like 100 push-ups in a single set, which when starting at 30 is impressive.
Ah yes, and I had you guys! I forgot, as I have mentioned this a few times in passing, but I set up a Puppeteer script that would scrape a few sites like HN that I enjoy and put them into a PDF. It ran the same thing that would happen if you clicked reader mode so that the page was easy to read, put 4 on a single page, and sent it to a friend who would print it and mail it to me weekly. I could have used an API to do this, but the mailing rules are specific and I didn't want to risk it. My friend helped out and mailed that stuff for me, and I received the front page of HN along with the articles in a weekly digest format. Originally I didn't know what the mail rules were, so it had all kinds of weird search/replace regexes to avoid OCR or something weird, so articles about "HACKER FINDS BLAH" would turn into "WACKER FINDS BLAH" to avoid my mail getting turned away. (This would not have mattered AFAIK)
About those TRULINCS computers. I had decided not to fuck with them much in terms of hacking them. I did get curious a few times and navigate through their boot menus and check a few things out and there were some demons there. Some of the boot was locked down, but PXE boot attacks would work. And before you think "how the fuck would you...", just know you can pay two fine-running Hispanic gentlemen to bring anything inside at 2X cost and mostly it's by weight because they run outside and grab it and run back inside! It can be done with a Raspberry Pi to simply spoof the PXE. Why would someone hack the TRULINCS computer if they have an Rpi? Well, you can basically sit at those computers and use them without concern from a staff member, whereas using a contraband device requires stashing it, hiding, etc.
The guards didn't go inside the building during COVID. Those guys came in a few times dressed head to toe in bullshit-ass-made Amazon hazmat suits for a few days, then stopped coming altogether, then eventually would come through with gas masks on a bit here and there. My time was very short; my case took a long time to reach a plea because there were very low damages, and various statutes need damages to trigger, IIRC.
One time when they came to check my living quarters while waiting for trial they found a bin of old parts. One of them was a power supply unit (PSU). They spent a considerable amount of time trying to determine "how much data it had" and asked me many questions about it. I was not allowed to have a phone at the time, so I could not simply take the product code on the side of it and show them the online retailer specs, etc. Those guys were genuinely trying to decipher the fucking mystery of how many gigabytes were inside that power supply. I will never forget that, and neither should you!
The FBI got tired of talking to me pretty quickly. Most of my answers created more questions that were of no value to anyone. "Do you remember any passwords used?" - "No" - "You don't remember one of them?" - "I'm not sure I remember which passwords I remember, do you remember which ones you've forgotten?" - "Try to think of one" - Closes eyes "I have thought of passwords" - "Can you remember one now?" - "Remember which ones I forgot?"
I remember when the FBI accidentally started taking me to the wrong building because they missed their turn. I specifically said I didn't mind, because we could stop at McDonald's before getting back on the highway to the US Marshals.
Maybe it's because I only code for my own tools, but I still don't understand the benefit of relying on someone/something else to write your code and then reading it, understanding it, fixing it, etc. Although asking an LLM to extract and find the thing I'm looking for in an API doc is super useful and time-saving. To me, it's not even about how good these LLMs get in the future. I just don't like reading other people's code lol.
Here are the cases where it helps me (I promise this isn't AI generated even though I'm using a list...)
- Formulaic code. It basically obviates the need for macros / code gen. The downside is that they are slower and you can't just update the macro and re-generate. The upside is that it works for code that is mostly formulaic but has small differences across implementations that make macros impossible to use.
- Using APIs I am familiar with but don't have memorized. It saves me the effort of doing the Google search and scouring the docs. I use typed languages, so if it hallucinates the type checker will catch it, and I'll need to manually test and set up automated tests anyway, so there are plenty of steps where I can catch it if it's doing something really wrong.
- Planning: I think this is actually a very underrated use of LLMs. If I need to make changes across 10+ files, it really helps to have the LLM go through all the files and plan out the changes I'll need to make in a markdown doc. Sometimes the plan is good enough that with a few small tweaks I can tell the LLM to just do it, but even when it gets some things wrong it's useful for me to follow it partially while tweaking what it got wrong.
Edit: Also, one thing I really like about LLM-generated code is that it maintains the style / naming conventions of the code in the project. When I'm tired I often stop caring about that kind of thing.
> Using APIs I am familiar with but don't have memorized
I think you have to be careful here even with a typed language. For example, I generated some Go code recently which execed a shell command and got the output. The generated code used CombinedOutput, which is easier to use but doesn't do proper error handling. Everything ran fine until I tested a few error cases and then realized the problem. Other times I asked the agent to write test cases too, and while it scaffolded code to handle error cases, it didn't actually write any test cases to exercise them - so if you were only doing a cursory review, you would think it was properly tested when in reality it wasn't.
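To make the CombinedOutput pitfall concrete, here's a minimal Go sketch (the `git` invocation is just a stand-in, not the code from the comment above): `CombinedOutput` folds stderr into the same bytes you were planning to parse, while `Output` keeps the streams separate and attaches stderr to the `*exec.ExitError` for error reporting.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

func main() {
	// CombinedOutput interleaves stdout and stderr into one byte slice.
	// It does return an error, but anything the command printed to stderr
	// is mixed into the output you were planning to parse.
	out, err := exec.Command("git", "rev-parse", "HEAD").CombinedOutput()
	if err != nil {
		fmt.Printf("combined: %v; output has stdout and stderr mixed: %q\n", err, out)
	}

	// Output keeps the streams separate: stdout is the return value, and
	// stderr is captured on *exec.ExitError for error reporting.
	stdout, err := exec.Command("git", "rev-parse", "HEAD").Output()
	if err != nil {
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			fmt.Printf("command failed: %v; stderr: %q\n", err, exitErr.Stderr)
		}
		return
	}
	fmt.Printf("clean stdout: %q\n", stdout)
}
```

The difference only shows up when the command fails, which is exactly the path a cursory review (or an unexercised test scaffold) tends to skip.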
This applies to AI, too, albeit in different ways:
1. You can iteratively improve the rules and prompts you give to the AI when coding. I do this a lot. My process is constantly improving, and the AI makes fewer mistakes as a result.
2. AI models get smarter. Just in the past few months, the LLMs I use to code are making significantly fewer mistakes than they were.
There are definitely dumb errors that are hard for human reviewers to find because nobody expects them.
One concrete example is confusing value and pointer types in C. I've seen people try to cast a `uuid` variable into a `char` buffer to, for example, memset it, by doing `(const char *)&uuid`. It turned out, however, that `uuid` was not a value type but rather a pointer, so this ended up just blasting the stack: instead of taking the address of the uuid storage, it takes the address of the pointer to the storage. If you're hundreds of lines deep and are looking for more complex functional issues, it's very easy to overlook.
But my gripe with your first point is that by the time I write an exact detailed step-by-step prompt for them, I could have written the code by hand. Like there is a reason we are not using fuzzy human language in math/coding: it is ambiguous. I always feel like I'm in one of those funny videos where you have to write exact instructions on how to make a peanut butter sandwich and they get deliberately misinterpreted. Except it is not fun at all when you are the one writing the instructions.
2. It's very questionable that they will get any smarter; we have hit the plateau of diminishing returns. They will get more optimized, and we can run them more times with more context (e.g. chain of thought), but they fundamentally won't get better at reasoning.
> by the time I write an exact detailed step-by-step prompt for them, I could have written the code by hand
The improved prompt or project documentation guides every future line of code written, whether by a human or an AI. It pays dividends for any long term project.
> Like there is a reason we are not using fuzzy human language in math/coding
The downside for formulaic code kind of makes the whole thing useless from my perspective; I can't imagine a case where that works.
Maybe a good case, that I've used a lot, is using "spreadsheet inputs" and teaching the LLM to produce test cases/code based on the spreadsheet data (that I received from elsewhere). The data doesn't change and the tests won't change either, so the LLM definitely helps, but this isn't code I'll ever touch again.
> Maybe a good case, that I've used a lot, is using "spreadsheet inputs" and teaching the LLM to produce test cases/code based on the spreadsheet data (that I received from elsewhere)
This seems weird to me instead of just including the spreadsheet as a test fixture.
The spreadsheet in this case is human made and full of "human-like things" like weird formatting and other fluffiness that makes it hard to use directly. It is also not standardized, so every time we get it it is slightly different.
There is a lot of formulaic code that LLMs get right 90% of the time and that is impossible to build macros for. One example I've had to deal with is language bridge code for an embedded scripting language. Every function I want available in the scripting environment requires what is essentially a boilerplate function to be written, and I had to write a lot of them.
There's also fuzzy datatype mapping in general, where the types are like 90%+ identical but the remaining fields need minor special handling (see the sketch after this comment).
Building a generator capable of handling all the variations you might need is extremely hard[1], and it still won't be good enough. An LLM will both get it almost perfect almost every time and likely reuse your existing utility funcs. It can save you from typing out hundreds of lines, and it's pretty easy to verify and fix the things it got wrong. It's the exact sort of slightly-custom-pattern-detecting-and-following that they're good at.
1: Probably impossible, for practical purposes. It almost certainly ends up with an API larger than the Moon, which you won't be able to fully know, or quickly figure out which parts you need, due to the sheer size.
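As a hedged illustration of that fuzzy datatype mapping, here's a small Go sketch; both struct types and the field names are invented, but the shape - mostly field-for-field copying with a couple of bespoke conversions - is the part LLMs tend to get right.

```go
package mapping

import (
	"strconv"
	"time"
)

// Two invented types that are ~90% identical; only a couple of fields need
// special handling.
type apiOrder struct {
	ID         string
	Customer   string
	TotalCents int64
	CreatedAt  string // RFC 3339 timestamp on the wire
}

type dbOrder struct {
	ID         int64
	Customer   string
	TotalCents int64
	CreatedAt  time.Time
}

// toDBOrder is the slightly-custom mapping code that's tedious to write by
// hand and awkward to fully generate: most fields copy straight across, but
// ID and CreatedAt each need their own small conversion.
func toDBOrder(in apiOrder) (dbOrder, error) {
	id, err := strconv.ParseInt(in.ID, 10, 64)
	if err != nil {
		return dbOrder{}, err
	}
	created, err := time.Parse(time.RFC3339, in.CreatedAt)
	if err != nil {
		return dbOrder{}, err
	}
	return dbOrder{
		ID:         id,
		Customer:   in.Customer,
		TotalCents: in.TotalCents,
		CreatedAt:  created,
	}, nil
}
```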
I get that reference! Having done this with Lua and C++, it's easy to do, but just tedious repetition. It's something that SWIG could handle, but that adds so much extra code, plumbing, and overall surface area for what amounts to just a few lines of glue code per function that it feels like overkill. I can definitely see the use for a bespoke code generator for something like that.
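For a concrete flavor of that per-function glue, here's a hedged Go sketch. The `VM`/`Value` API is made up (a real Lua or other embedded runtime will look different), but it shows why each exported function is formulaic yet just different enough to defeat a macro.

```go
package bridge

import "fmt"

// Value and VM are hypothetical stand-ins for an embedded scripting runtime;
// a real binding API will differ, but the glue has this shape.
type Value interface{}

type VM struct{ funcs map[string]func([]Value) (Value, error) }

func (vm *VM) Register(name string, fn func([]Value) (Value, error)) {
	if vm.funcs == nil {
		vm.funcs = map[string]func([]Value) (Value, error){}
	}
	vm.funcs[name] = fn
}

// Stand-ins for the real Go functions being exposed to scripts.
func addTodo(title string, done bool) string { return title }
func setVolume(level int) error              { return nil }

// Each bridge function repeats the same dance -- check arity, unwrap the
// dynamically typed args, call through, wrap the result -- but argument
// counts and types vary per function.
func RegisterAll(vm *VM) {
	vm.Register("add_todo", func(args []Value) (Value, error) {
		if len(args) != 2 {
			return nil, fmt.Errorf("add_todo: want 2 args, got %d", len(args))
		}
		title, ok1 := args[0].(string)
		done, ok2 := args[1].(bool)
		if !ok1 || !ok2 {
			return nil, fmt.Errorf("add_todo: expected (string, bool)")
		}
		return addTodo(title, done), nil
	})
	vm.Register("set_volume", func(args []Value) (Value, error) {
		if len(args) != 1 {
			return nil, fmt.Errorf("set_volume: want 1 arg, got %d", len(args))
		}
		level, ok := args[0].(int)
		if !ok {
			return nil, fmt.Errorf("set_volume: expected (int)")
		}
		return nil, setVolume(level)
	})
}
```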
To be pedantic, OP wasn't referencing anything in the usual sense that we use it in (movie, comic, games references). They were more speaking from personal experience. In that sense, there's nothing to "reference" as such.
One of my most productive uses of LLMs was when designing a pipeline from server-side data to the user-facing UI that displays it.
I was able to define the JSON structure and content, the parsing, the internal representation, and the UI that the user sees, simultaneously. It was very powerful to tweak something at either end and see that change propagate forwards and backwards. I was able to home in on a good solution much faster than would have been the case otherwise.
As a personal anecdote, I've tried to create shell scripts for testing a public HTTP API that had pretty good documentation, and in both cases the requests did not work. In one case it even hallucinated an endpoint.
+1 for using agents for API refreshers and discovery. I also use regular search to find possible alternatives, and about 3-4 times out of 10 normal search wins.
Discovering a private API using an agent is super useful.
I am beginning to love working like this. Plan a design for the code. Explain to the LLM the steps to arrive at a solution. Work on reading, understanding, fixing, planning, etc. while the LLM is working on the next section of code. We are working in parallel.
Think of it like being a cook in a restaurant. The order comes in. The cook plans the steps to complete the task of preparing all the elements for a dish. The cook sears the steak and puts it in the broiler. The cook doesn't stop and wait for the steak to finish before continuing. Rather the cook works on other problems and tasks before returning to observe the steak. If the steak isn't finished the cook will return it to the broiler for more cooking. Otherwise the cook will finish the process of plating the steak with sides and garnishes.
The LLM is like the oven, a tool. Maybe grating cheese with a food processor is a better analogy. You could grate the cheese by hand, or put the cheese into the food processor chute and, while it runs, clean up, grab other items from the refrigerator, and plan the steps for the next food item to prepare. This is the better analogy because grating cheese could be done by hand and maybe does give better quality, but if it is going into a sauce the grain quality doesn't matter, so several minutes are saved by using a food processor, which frees up the cook's time.
Professional cooks multitask using tools in parallel. Maybe coding will move away from being a linear task writing one line of code at a time.
I like your take and the metaphors are good at helping demonstrate by example.
One caveat I wonder about is how this kind of constant context switching combines with the need to think deeply (and defensively with non humans). My gut says I'd struggle at also being the brain at the end of the day instead of just the director/conductor.
I've actively paired with multiple people at once before because of a time crunch (and with a really solid team). It was, to this day, the most fun AND productive "I" have ever been and what you're pitching aligns somewhat with that. HOWEVER, the two people who were driving the keyboards were substantially better engineers than me (and faster thinkers) so the burden of "is this right" was not on me in the way it is when using LLMs.
I don't have any answers here - I see the vision you're pitching and it's a very very powerful one I hope is or becomes possible for me without it just becoming a way to burn out faster by being responsible for the deep understanding without the time to grok it.
> I've actively paired with multiple people at once
That was my favorite part of being a professional cook, working closely on a team.
Humans are social animals who haven't -- including how our brains are wired -- changed much physiologically in the past 25,000 years. Smart people today are not much smarter than smart people in Greece 3,000 years ago, except for the sample size of 8B people being larger. We are wired to work in groups like hunters taking down a wooly mammoth.[0]
Being wired to work in groups is different than being wired to clean up the mess left by a bunch of LLM agents.
I do this "let it go do the crap while I think about what to do next" somewhat frequently. But it's mostly for easy crap around the edges (making tools to futz with logs or metrics, writing queries, moving things around). The failure rate for my actual day-job code is just too high, even for non-rocket-science stuff. It's usually more frustrating to spend 5 minutes chatting with the agent and then fixing its stuff than to just spend 5 minutes writing the code.
Because the bot has all the worst bits of human interaction - like ambiguous, incomplete understanding - without the reward of building a long-term social relationship. That latter thing is what I'm wired for.
I have always found this idea of not being smarter somewhat baffling. Education makes people smarter, does it not? At least that is one of the claims it makes. Do you mean that a baby hunter-gatherer from 25,000 years ago would on average be just as capable of learning stuff when integrated into society as someone born nowadays? For human beings, 25,000 years is something like 1,000 generations. There will be subtle genetic variations and evolution on that scale of generations. But the real gains in "smartness" will be on a societal level. Remember: humans without society are not very different from "dumber" animals like apes and dogs.
You can see this very well with cases of heavy neglect. Feral children are very animal-like and quite incapable of learning very effectively...
i think the premise is if we plucked the average baby from 25,000 years and transported them magically into the present day, into a loving and nurturing environment, they would be just as “smart” as you and i.
there's intelligence and there's wisdom. I may know how, eg Docker works and an ancient Greek man may not, but I can't remember a 12 digit number I've only seen once, or multiply two three digit numbers in my head without difficulty.
I mean, how docker works (which is mostly a human construct with its own peculiarities) is not what I would use as an example - this is more like a board game that has its own rules and you just learnt them. Ancient people had their own "games" with rulesets. It's not a "fundamental truth".
Societal smartness might be something like an average student knowing that we are made from cells, knowing some germ theory over bodily-fluid imbalances causing diseases, etc., plus a very crude understanding of more elements of physics (electronics). Though unfortunately intellectualism is in decline, and people come out dumber and dumber from schools all over the world.
What if we actually get dumber? There are multiple cases of people in the past who were way smarter than the current thought leaders and inventors. There is a higher percentage of smart people nowadays, but are they smarter than Leonardo da Vinci?
> Neuroplasticity is the brain’s remarkable ability to adapt its structure and function by rewiring neural connections in response to learning, experience, or injury.
The invention and innovation of language, agriculture, writing, and mathematics have driven change through neuroplastic remodeling, but the overall structure of the brain hasn't changed.
Often in modern societal structures there has been pruning of intellectuals, i.e. the intelligent members of a society are removed from the gene pool, sent to Siberia. However, that doesn't stop the progeneration of humans capable of immense intelligence with training and development, it only removes the culture being passed down.
And, I say, with strong emphasis, not only has the brain of humans been similar for 25,000 years, the potential for sharpening our abilities in abstract reasoning, memory, symbolic thought, and executive control is *equal* across all sexes and races in humans today. Defending that statement is a hill I'm willing to die on.
You are just looking at the wrong people to compare.
Leonardo da Vinci would be a PhD student working on some obscure sub-sub-sub field of something, with only 6 other people in the world understanding how marvelously genius he is. The reason they don't get to such a status is that human knowledge is like a circle. A single person can work on the circumference of this circle, but they are limited by how much of the circle they can learn. As society improved, we expanded the radius of the circle greatly, and now an expert can only be an expert in a tiny tiny blob on the circumference, while Leonardo could "see" a good chunk of the whole circle.
---
"Thought leader and inventor" are VC terms of no substance and are 100% not who I would consider smart people on average. Luck is a much more common attribute among them.
Well, you might not have got my point. Those "smart" PhD students would be considered quite dumb in other ages, because working on the circumference of the circle doesn't make one smart but it might get you a big salary in a VC project
On one codebase I work with, there are often tasks that involve changing multiple files in a relatively predictable way. There is little creativity/challenge, but a lot of typing in multiple parts/files. Tasks like these used to take 3-4 hours to complete just because I had to physically open all those files, find the right places to modify, type the code, etc. With an AI agent I just describe the task, and it does the job 99% correctly, reducing the time from 3-4 hours to 3-4 minutes.
You mean to future proof the code so requirements changes are easy to implement? Yeah, I've seen lots of code like that (some of it written by myself). Usually the envisioned future never materializes unfortunately.
There’s always a balance to be struck when avoiding premature consolidation of repeated code. We all face the same issue as osigurdson at some point and the productive responses fall in a range.
If you have some idea of what future changes may be seen, it is fine to design for that. However, it is impossible to design a codebase to handle any change. Realistically, just doing the absolute bare minimum is probably the best defence in that situation.
It's a monorepo with backend/frontend/database migrations/protobufs. Could you suggest how exactly should I refactor it so I don't need to make changes in all these parts of the codebase?
I wouldn't try to automate the DB part, but much like the protobuf code is generated from a spec, you can generate other parts from a spec. My current company has a schema repo used for both API and Kafka type generation.
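As a hedged sketch of that spec-driven approach, here's a tiny Go generator built on `text/template`; the `Spec` shape and the emitted struct are invented for illustration, and a real setup would parse the shared schema repo instead of hard-coding one.

```go
// gen.go: a minimal sketch of spec-driven code generation, in the spirit of
// "generate the other layers from the same schema".
package main

import (
	"os"
	"text/template"
)

type Field struct {
	Name string
	Type string
}

type Spec struct {
	Message string
	Fields  []Field
}

var goType = template.Must(template.New("goType").Parse(
	`// Code generated from the shared schema. DO NOT EDIT.
type {{.Message}} struct {
{{- range .Fields}}
	{{.Name}} {{.Type}}
{{- end}}
}
`))

func main() {
	// In a real setup the spec would be parsed out of the schema repo
	// (protobuf, JSON Schema, etc.) and several templates would emit API
	// types, Kafka payloads, and so on.
	spec := Spec{
		Message: "UserCreated",
		Fields: []Field{
			{Name: "ID", Type: "int64"},
			{Name: "Email", Type: "string"},
		},
	}
	if err := goType.Execute(os.Stdout, spec); err != nil {
		panic(err)
	}
}
```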
This is a case where a monorepo should be a big advantage, as you can update everything with a single change.
It's funny, but originally I had written a code generator that just reads the protobuf and generates/modifies code in other parts. It was an OK experience until you hit another corner case (especially in the UI part) and need to spend more hours improving the code generator. But after AI coding tools became better I started delegating this part to AI more and more, and now with agentic AI tools it has become way more efficient than maintaining the code generator. And you're right about the DB part - again, with a task description it's a no-brainer to tell it which parts shouldn't be touched.
A lot of that is inherent in the framework. eg Java and Go spew boilerplate. LLMs are actually pretty good at generating boilerplate.
See, also, testing. There's a lot of similar boilerplate for testing, so I give LLMs a list of "Test these specific items, with this specific setup, and these edge cases." I've been pretty happy writing a bulleted outline of tests and getting ... 85% complete code back? You can see a pretty stark line in a codebase I work on where I started doing this vs the comprehensiveness of testing.
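For a flavor of what that outline-to-boilerplate step looks like, here's a hedged Go sketch; `clamp` and the test cases are invented, but the table-driven shape is the kind of thing a bulleted outline expands into.

```go
package clamp

import "testing"

// clamp is a stand-in target function; the interesting part is the
// table-driven boilerplate that a bulleted outline expands into.
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// Outline given to the model: "test a value below the range, above the
// range, inside the range, and exactly on each bound."
func TestClamp(t *testing.T) {
	cases := []struct {
		name            string
		v, lo, hi, want int
	}{
		{"below range", -5, 0, 10, 0},
		{"above range", 99, 0, 10, 10},
		{"inside range", 7, 0, 10, 7},
		{"on lower bound", 0, 0, 10, 0},
		{"on upper bound", 10, 0, 10, 10},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := clamp(tc.v, tc.lo, tc.hi); got != tc.want {
				t.Errorf("clamp(%d, %d, %d) = %d, want %d", tc.v, tc.lo, tc.hi, got, tc.want)
			}
		})
	}
}
```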
Amusingly, Cursor took 5 minutes trying to figure out how to do what a simple global find/replace did for me in 30 seconds after I got tired of waiting for its attempt just last night, on a simple, predictable lots-of-files change.
A 60x speedup is way more than I've seen even in its best case for things like that.
In my experience, two things make a big difference for AI agents: quality of code (naming and structure mostly) and AI-friendly documentation and task planning. For example, in some repos I have legacy naming that evolved after some refactoring, and while devs know that "X means Y", it's not easy for AI to figure that out unless it's explicitly documented. I'm still learning how to organize AI-oriented codebase documentation and planning tools (like claude task master), but they do make a big difference indeed.
This was "I want to update all the imports to the new version of the library, where they changed a bit in the fully qualified package name." Should be a super-trivial change for the AI agent to understand.
Like I mentioned, it's literally just global find and replace.
Slightly embarrassing thing to have even asked Cursor to do for me, in retrospect. But, you know, you get used to the tool and to being lazy.
So you went from being able to handle at most 10 or so of these tasks you often get per week, to >500/week. Did you reap any workplace benefits from this insane boost in productivity?
My house has never been cleaner. I have time to catch up on chores that I normally do during the weekend. Dishes, laundry, walk the dog more.
It seems silly but it’s opened up a lot of extra time for some of this stuff. Heck, I even play my guitar more, something I’ve neglected for years. Noodle around while I wait for Claude to finish something and then I review it.
All in all, I dig this new world. But I also code JS web apps for a living, so just about the easiest code for an LLM to tackle.
EDIT: Though I think you are asking about work specifically. i.e., does management recognize your contributions and reward you?
For me, no. But like I said, I get more done at work and more done at home. It’s weird. And awesome.
That doesn't sound like a situation that will last. If management figures out you are using this extra time to do all your chores, they aren't gonna reward you. They might decide to get someone who would use the extra time to do more work...
So much of what people hyping AI write in these forums boils down to "this vendor will keep making this tool better forever and management will let me keep the productivity gains".
Experience shows otherwise. Urging me to embrace a new way of building software that is predicated on benevolent vendors and management seems hostile to me.
I think it is very simple to draw the line at "something that tries to write for you", you know, an agent by definition. I am beginning to realize people simply would prefer to manage, even if the things they end up managing aren't actually humans. So it creates a nice live action role-play situation.
A better name for vibecoding would be larpcoding, because you are doing a live action role-play of managing a staff of engineers.
Now not only even a junior engineer can become a manager, they will start off their careers managing instead of doing. Terrifying.
It’s not a clear line though. Compilers have been writing programs for us. The plaintext programming language code that we talk about is but a spec for the actual program.
From this perspective, English-as-spec is a natural progression in the direction we’ve been going all along.
I felt the same way until recently (like last-Friday recently). While tools like Windsurf / Cursor have some utility, most of the time I am just waiting around for them while I get to read and correct the output. Essentially, I'm helping out with the training while paying to use the tool. However, now that Codex is available in ChatGPT Plus, I appreciate that asynchronous flow very much. Especially for making small improvements, fixing minor bugs, etc. This has obvious value imo. What I like to do is queue up 5-10 tasks and then focus on hard problems while it is working away. Then when I need a break I review / merge those PRs.
I kinda consider it a P != NP type thing. If I need to write a simple function, it will almost always take me more time to implement it than it will to verify whether an implementation of it suits my needs. There are exceptions, but overall when coding with LLMs this seems to hold true. Asking the LLM to write the function then checking its work is a time saver.
I think this perspective is kinda key. Shifting attention towards more and better ways to verify code can probably lead to improved quality instead of degraded.
I see it as basically Cunningham's Law. It's easier to see the LLM's attempt a solution and how it's wrong than to write a perfectly correct solution first time.
> I still don't understand the benefit of relying on someone/something else to write your code and then reading it, understanding it, fixing it, etc.
Friction.
A lot of people are bad at getting started (like writer's block, just with code), whereas if you're given a solution for a problem, you can tweak it, refactor it and alter it in other ways for your needs, without getting too caught up in your head about how to write the thing in the first place. Same with how many of my colleagues have expressed that getting started on a new project from 0 is difficult, because you also need to set up the toolchain and bootstrap a whole app/service/project, very similar to introducing a new abstraction/mechanism in an existing codebase.
Plus, with LLMs being able to process a lot of data quickly, assuming you have enough context size and money/resources to use that, it can run through your codebase in more detail and notice things that you might not, like: "Oh hey, there are already two audit mechanisms in the codebase in classes Foo and Bar, we might extract the common logic and..." that you'd miss on your own.
As a senior developer you already spend a significant amount of time planning new feature implementations and reviewing other people's code (PRs). I find that this skill transitions quite nicely to working with coding agents.
I don't disagree but... wouldn't you rather be working with actual people?
Spending the whole day chatting with AI agents sounds like a worst-of-both-worlds scenario. I have to bring all of my complex, subtle soft skills into play, which are difficult and tiring to use, and in the end none of that goes towards actually fostering real relationships with real people.
At the end of the day, are you gonna have a beer with your agents and tell them, "Wow, we really knocked it out of the park today?"
Spending all day talking to virtual coworkers is literally the loneliest experience I can imagine, infinitely worse than actually coding in solitude the entire day.
It's a double-edged sword. AI agents don't have a long-term context window that gets better over time. People who employ AI agents today instead of juniors are going to find themselves in another local maximum: yes, the AI agent will make you more productive today compared to a junior, but (as the tech stands today) you will never be able to promote an AI agent to senior or staff, and you will not get to hire out an army of thousands of engineers that lets you deliver the sheer throughput that FAANG / Fortune 500 are capable of. You will be stuck at some shorter level of feature-delivery capacity.
Right. So many of these agentic UX stories describe it like, "I do a bunch of code reviews for my junior engineer minions."
But when I do code reviews, I don't enjoy reviewing the code itself at all. The enjoyment I get out of the process comes from feeling like I'm mentoring an engineer who will remember what I say in the code review.
If I had to spend a month doing code reviews where every single day I have to tell them the exact same corrections, knowing they will never ever learn, I would quit my job.
Being a lead over an army of enthusiastic interns with amnesia is like the worst software engineering job I can imagine.
Unless the underlying AI agent models continue to improve over time.
Isn’t that the mantra of all AI CEOs, that we are simply riding the wave of technological progress?
My employer can't go out and get me three actual people to work under me for $30 a month.
EDIT: You can quibble on the exact rate of people's worth of work versus the cost of these tools, but look at what a single seat on Copilot or Cursor or Windsurf gets you, and you can see that if they are only barely more productive than you working without them, the economics are it's cheaper to "hire" virtual juniors than real juniors. And the virtual juniors are getting better by the month, go look at the Aider leaderboards and compare recent models to older ones.
That's fair but your experience at the job is also part of the compensation.
If my employer said, "Hey, you're going to keep making software, but also once a day, we have to slap you in the face." I might choose to keep the job, but they'd probably have to pay me more. They're making the work experience worse and that lowers my total compensation package.
Shepherding an army of artificial minions might be cheaper for the corporation, but it sounds like an absolutely miserable work experience so if they were offering me that job, they'd have to pay me more to take.
You will hit a few problems in this "only hire virtual juniors" thing:
* the wall of how much you can review in one day without your quality slipping now that there's far less variation in your day
* the long-term planning difficulties around future changes when you are now the only human responsible for 5-20x more code surface area
* the operational burden of keeping all that running
The tools might get good enough that you only need 5 engineers to do what used to be 10-20. But the product folks aren't gonna stop wanting you to keep churning out the changes, and the last 2 years of evolution of these models doesn't seem like it's on a trajectory to cut that down to 1 (or 0) without unforeseen breakthroughs.
I'm categorizing my expenses. I asked the code AI to do 20 at a time, and suggest categories for all of them in an 800-line file. I then walked the diff by hand, correcting things. I then asked it to double-check my work. It did this with a two-column CSV mapping.
It could do this in code. I didn't have to type anywhere near as much and 1.5 sets of eyes were on it. It did a pretty accurate job and the followup pass was better.
This is just an example I had time to type before my morning shower
You’re clinging to an old model of work. Today an LLM converted my docker compose infrastructure to Kubernetes, using operators and helm charts as needed. It did in 10 minutes what would take me several days to learn and cobble together a bad solution. I review every small update and correct it when needed. It is so much more productive. I’m driving a tractor while you are pulling an ox cart.
> It did in 10 minutes what would take me several days to learn and cobble together a bad solution.
Another way to look at this is you’re outsourcing your understanding to something that ultimately doesn’t think.
This means two things: one, your solution could be severely suboptimal in areas such as security; and two, because you didn’t bother understanding it yourself, you’ll never be able to identify that.
You might think “that’s fine, the LLM can fix it”. The issue with that is when you don’t know enough to know something needs to be fixed.
So maybe instead of carts and oxen this is more akin to grandpa taking his computer to Best Buy to have them fix it for him?
Senior engineers delegate to junior engineers, which have all the same downsides you described, all the time. This pattern seems to work fine for virtually every software company in existence.
> Another way to look at this is you’re outsourcing your understanding to something that ultimately doesn’t think.
You read this quote wrong. Senior devs outsource _work_ to junior engineers, not _understanding_. The way they became senior in the first place is by not outsourcing work so they could develop their understanding.
I read the quote just fine. I don't understand 100% of what my junior engineers do. I understand a good chunk, like 90-95% of it, but am I really going to spend 30 minutes trying to understand why that particular CSS hack only works with `rem` and not `px`? Of course not - if I did that for every line of code, I'd never get anything done.
My take from this comment is that maybe you do not understand it as well as you think you do.
Claiming that "other modern infrastructure" is easier to understand than CSS is wild to me. Infrastructure includes networking and several protocols, authentication and security in many forms, physical or virtual resources and their respective capabilities, etc. etc.
In what world is all of that easier than understanding CSS?
When did I say I was blindly allowing an AI to set up my docker infrastructure? Obviously I wouldn't delegate that to a junior. My goalposts have always been in the same place - perhaps you're confusing them with someone else's goalposts.
Comparing apples to oranges in your response but I’ll address it anyway.
I see this take brought up quite a bit and it’s honestly just plain wrong.
For starters Junior engineers can be held accountable. What we see currently is people leaving gaping holes in software and then pointing at the LLM which is an unthinking tool. Not the same.
Juniors can and should be taught as that is what causes them to progress not only in SD but also gets them familiar with your code base. Unless your company is a CRUD printer you need that.
More closely to the issue at hand this is assuming the “senior” dev isn’t just using an LLM as well and doesn’t know enough to critique the output. I can tell you that juniors aren’t the ones making glaring mistakes in terms of security when I get a call.
So, no, not the same. The argument is that you need enough knowledge of the subject to call BS and use these tools effectively.
> For starters Junior engineers can be held accountable. What we see currently is people leaving gaping holes in software and then pointing at the LLM which is an unthinking tool. Not the same.
This is no different than, say, the typical anecdote of a junior engineer dropping the database. Should the junior be held accountable? Of course not - it's the senior's fault for allowing that to happen in the first place. If the junior is held accountable, that is more an indication of poor software engineering practices.
> More closely to the issue at hand this is assuming the “senior” dev isn’t just using an LLM as well and doesn’t know enough to critique the output.
This seems to miss the point of the analogy. A senior delegating to a junior is akin to me delegating to an LLM. Seniors have delegated to juniors long before LLMs were a twinkle in Karpathy's eye.
> This is no different than, say, the typical anecdote of a junior engineer dropping the database. Should the junior be held accountable? Of course not - it's the senior's fault for allowing that to happen in the first place. If the junior is held accountable, that is more an indication of poor software engineering practices.
Of course the junior should be held accountable, along with the senior. Without accountability, what incentive do they have to not continue to fuck up?
Dropping the database is an extreme example because it's pretty easy to put in checks that should make that impossible. But plenty of times I've seen juniors introduce avoidable bugs simply because they did not bother to test their code -- that is where teaching accountability is a vital part of growth as an engineer.
No one is an expert on all the things. I use libraries and tools to take care of things that are less important. I use my brain for things that are important. LLMs are another tool, more flexible and capable than any other. So yes, grandpa goes to Best Buy because he’s running his legal practice and doesn’t need to be an expert on computers.
I am pretty confident that working together with LLMs has massively sped up my learning. I can build so much more and learn through what they are putting out. This extends to so many domains in my life now; it is like I have this super mentor. It's DIY house things, smart home things, hardware, things I never would have been confident to work with otherwise. I feel like I have been massively empowered, and all of this is so exciting. Maybe I missed that kind of mentor guidance when I was younger to be able to do all the DIY stuff, but it is definitely sufficient now. Life feels amazing thanks to it, honestly.
If there's something that you don't understand, ask the LLM to explain it to you. Drill into the parts that don't make sense to you. Ask for references. One of the big advantages of LLMs over, say, reading a tutorial on the web is that you can have this conversation.
But you would have learned something if you invested the time. Now when your infra blows up you have no idea what to fix and will go fishing into the LLM lake to find how to fix it
If it would have taken you days to learn about the topic well enough to write a bad implementation, how can you have any confidence you can evaluate, let alone "correct", one written by an LLM?
I think this fits squarely with the idea that the LLM today is a great learning tool; learning through practice has always been a proven way to learn, but it's a difficult method to apply with fixed material like books.
LLM is a teacher that can help you learn by doing the work you want to be doing and not some fake exercise.
The more you learn, though, the more you review the code produced by the LLM, and the more you'll notice that you can still reason better than an LLM. Once your familiarity with an area exceeds the capabilities of the LLM, the interaction brings diminishing returns, and the cost of babysitting that eager junior-developer assistant may become larger than the benefits.
But that's not a problem, for all areas you master there will be hundreds of other areas you haven't mastered yet or ever will and for those things the LLM we have already today are of immediate help.
All this without even having to enter the topic of how coding assistants will improve in the future.
TL;DR
Use a tool when it helps. Don't use it when it doesn't. It pays to learn to use a tool so you know when it helps and when it doesn't. Just like every other tool
if you work on a team most code you see isn’t yours.. ai code review is really no different than reviewing a pr… except you can edit the output easier and maybe get the author to fix it immediately
Reviewing code is harder than writing code. I know staff engineers that can’t review code. I don’t know where this confidence that you’ll be able to catch all the AI mistakes comes from.
I was about to say exactly this—it's not really that different from managing a bunch of junior programmers. You outline, they implement, and then you need to review certain things carefully to make sure they didn't do crazy things.
But yes, these juniors take minutes versus days or weeks to turn stuff around.
> if you work on a team most code you see isn’t yours.. ai code review is really no different than reviewing a pr… except you can edit the output easier and maybe get the author to fix it immediately
And you can't ask "why" about a decision you don't understand (or at least, not with the expectation that the answer holds any particular causal relationship with the actual reason)... so it's like reviewing a PR with no trust possible, no opportunity to learn or to teach, and no possibility for insight that will lead to a better code base in the future. So, the exact opposite of reviewing a PR.
Are you using the same tools as everyone else here? You absolutely can ask "why" and it does a better job of explaining with the appropriate context than most developers I know. If you realize it's using a design pattern that doesn't fit, add it to your rules file.
You can ask it "why", and it gives a probable English string that could reasonably explain why, had a developer written that code, they made certain choices; but there's no causal link between that and the actual code generation process that was previously used, is there? As a corollary, if Model A generates code, Model A is no better able to explain it than Model B.
I think that's right, and not a problem in practice. It's like asking a human why: "because it avoids an allocation" is a more useful response than "because Bob told me I should", even if the latter is the actual cause.
> I think that's right, and not a problem in practice. It's like asking a human why: "because it avoids an allocation" is a more useful response than "because Bob told me I should", even if the latter is the actual cause.
Maybe this is the source of the confusion between us? If I see someone writing overly convoluted code to avoid an allocation, and I ask why, I will take different actions based on those two answers! If I get the answer "because it avoids an allocation," then my role as a reviewer is to educate the code author about the trade-off space, make sure that the trade-offs they're choosing are aligned with the team's value assessments, and help them make more-aligned choices in the future. If I get the answer "because Bob told me I should," then I need to both address the command chain issues here, and educate /Bob/. An answer is "useful" in that it allows me to take the correct action to get the PR to the point that it can be submitted, and prevents me from having to make the same repeated effort on future PRs... and truth actually /matters/ for that.
Similarly, if an LLM gives an answer about "why" it made a decision that I don't want in my code base that has no causal link to the actual process of generating the code, it doesn't give me anything to work with to prevent it happening next time. I can spend as much effort as I want explaining (and adding to future prompts) the amount of code complexity we're willing to trade off to avoid an allocation in different cases (on the main event loop, etc)... but if that's not part of what fed in to actually making that trade-off, it's a waste of my time, no?
Right. I don't treat the LLM like a colleague at all, it's just a text generator, so I partially agree with your earlier statement:
> it's like reviewing a PR with no trust possible, no opportunity to learn or to teach, and no possibility for insight that will lead to a better code base in the future
The first part is 100% true. There is no trust. I treat any LLM code as toxic waste and its explanations as lies until proven otherwise.
The second part I disagree somewhat. I've learned plenty of things from AI output and analysis. You can't teach it to analyze allocations or code complexity, but you can feed it guidelines or samples of code in a certain style and that can be quite effective at nudging it towards similar output. Sometimes that doesn't work, and that's fine, it can still be a big time saver to have the LLM output as a starting point and tweak it (manually, or by giving the agent additional instructions).
Oh, it can infer quite a bit. I've seen many times in reasoning traces "The user is frustrated, understandably, and I should explain what I have done" after an exasperated "why???"
> And you can't ask "why" about a decision you don't understand (or at least, not with the expectation that the answer holds any particular causal relationship with the actual reason).
To be fair, humans are also very capable of post-hoc rationalization (particularly when they're in a hurry to churn out working code).
Just to draw a parallel (not to insult this line of thinking in any way): “ Maybe it's because I only code for my own tools, but I still don't understand the benefit of relying on someone/something else to _compile_ your code and then reading it, understand it, fixing it, etc”
At a certain point you won’t have to read and understand every line of code it writes, you can trust that a “module” you ask it to build works exactly like you’d think it would, with a clearly defined interface to the rest of your handwritten code.
> At a certain point you won’t have to read and understand every line of code it writes, you can trust that a “module” you ask it to build works exactly like you’d think it would, with a clearly defined interface to the rest of your handwritten code.
"A certain point" is bearing a lot of load in this sentence... you're speculating about super-human capabilities (given that even human code can't be trusted, and we have code review processes, and other processes, to partially mitigate that risk). My impression was that the post you were replying to was discussing the current state of the art, not some dimly-sensed future.
I think there are 2 types of software engineering jobs: the ones where you work on a single large product for a long time, maintaining it and adding features, and the ones that spit out small projects that they never care for again.
The latter category is totally enamored with LLMs, and I can see the appeal: they don't care at all about the quality or maintainability of the project after it's signed off on. As long as it satisfies most of the requirements, the llm slop / spaghetti is the client's problem now.
The former category (like me, and maybe you) see less value from the LLMs. Although I've started seeing PRs from more junior members that are very obviously written by AI (usually huge chunks of changes that appear well structured but as soon as you take a closer look you realize the "cheerleader effect"... it's all AI slop, duplicated code, flat-out wrong with tests modified to pass and so on) I still fail to get any value from them in my own work. But we're slowly getting there, and I presume in the future we'll have much more componentized code precisely for AIs to better digest the individual pieces.
> I just don't like reading other people's code lol.
Do you work for yourself, or for a (larger than 1 developer) company? You mention you only code for your own tools, so I am guessing yourself?
I don't necessarily like reading other people's code either, but across a distributed team, it's necessary - and sometimes I'm also inspired when I learn something new from someone else. I'm just curious if you've run into any roadblocks with this mindset, or if it's just preference?
Fast prototyping for code I'll throw away anyway. Sometimes I just want to get something to work as a proof of concept then I'll figure out how to productionize it later.
It is just faster and less effort. I can't write code as quickly as the LLM can. It is all in my head, but I can't spit it out as quickly. I just see LLMs as quickly getting what is in my head out there. I have learned to prompt it in such a way that I know what to expect; I know its weak spots and strengths. I could predict what it is going to output, so it is not that difficult to understand.
Yes, the eureka moment with LLMs is when they started outputting the things I was beginning to type. Not just words but sentences, whole functions and even unit tests. The result is the same as I would have typed it, just a lot faster.
> I still don't understand the benefit of relying on someone/something else to write your code and then reading it
Maybe the key is this: our brains are great at spotting patterns, but not so great at remembering every little detail. And a lot of coding involves boilerplate—stuff that’s hard to describe precisely but can be generated anyway. Even if we like to think our work is all unique and creative, the truth is, a lot of it is repetitive and statistically has a limited number of sound variations. It’s like code that could be part of a library, but hasn’t been abstracted yet. That’s where AI comes in: it’s really good at generating that kind of code.
It’s kind of like NP problems: finding a solution may take exponentially longer, but checking one takes only polynomial time. Similarly, AI gives us a fast draft that may take a human much longer to write, and we review it quickly. The result? We get more done, faster.
Copy and paste gives us a fast draft of repetitive code. That’s never been the bottleneck.
The bottleneck is in the architecture and the details. Which is exactly what AI gets wrong, and which is why any engineer who respects his craft sees this snake oil for what it is.
It's an intentional (hopefully) tradeoff between development speed and deep understanding. By hiring someone or using an agent, you are getting increased speed for decreased understanding. Part of choosing whether or not to use an agent should include an analysis of how much benefit you get from a deep understanding of the subsystem you're currently working on. If it's something that can afford defects, you bet I'll get an agent to do a quick-n-dirty job.
> I just don't like reading other people's code lol.
I agree entirely and generally avoided LLMs because they couldn't be trusted. However, a few days ago I said screw it and purchased Claude Max just to try and learn how I can use LLMs to my advantage.
So far I avoid it for things that are vague, complex, etc. The effort I have to go through to explain it exceeds my own in writing it.
However, for a bunch of things that are small, stupid wastes of time - I find it has been very helpful. Old projects that need to migrate API versions, helper tools I've wanted but have been too lazy to write, etc. Low-risk things that I'm too tired to do at the end of the day.
I have also found it a nice way to get movement on projects where I'm too tired to make progress after work. It's mostly decision fatigue, but blank spaces seem to be the most difficult for me when I'm already tired. Planning through the work with the LLM has been a pretty interesting way to work around my mental blocks, even if I don't let it do the work.
This planning model is something i had already done with other LLMs, but Claude Code specifically has helped a lot in making it easier to just talk about my code, rather than having to supply details to the LLM/etc.
It's been far from perfect, of course, but I'm using this mostly to learn the bounds and try to find ways to have it be useful. Tricks and tools especially; e.g. for Claude, adding the right "memory" adjustments for my preferred style and behaviors (testing, formatting, etc.) has helped a lot.
I'm a skeptic here, but so far I've been quite happy. Though I'm mostly going after low-hanging fruit atm, I'm curious whether 20 days from now I'll still want to renew the $100/m subscription.
I use it almost like an RSI mitigation device, for tasks I can do (and do well) but don't want to do anymore. I don't want to write another little 20-line script to format some data, so I'll have the machine do it for me.
I'll also use it to create basic DAOs from schemas, things like that.
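For flavor, here's a hedged Go sketch of the sort of DAO that's easy to describe from a schema; the `users` table, the Postgres-style placeholders, and the method set are all assumptions for illustration, not anyone's actual code.

```go
package userdao

import (
	"context"
	"database/sql"
)

// User mirrors a hypothetical "users" table (id, email); the schema and the
// Postgres-style $n placeholders below are assumptions.
type User struct {
	ID    int64
	Email string
}

// UserDAO is the mechanical CRUD wrapper that's easy to describe from a
// schema ("a DAO for this table") and tedious to type out by hand.
type UserDAO struct{ db *sql.DB }

func NewUserDAO(db *sql.DB) *UserDAO { return &UserDAO{db: db} }

func (d *UserDAO) GetByID(ctx context.Context, id int64) (*User, error) {
	var u User
	err := d.db.QueryRowContext(ctx,
		"SELECT id, email FROM users WHERE id = $1", id,
	).Scan(&u.ID, &u.Email)
	if err != nil {
		return nil, err
	}
	return &u, nil
}

func (d *UserDAO) Insert(ctx context.Context, email string) (int64, error) {
	var id int64
	err := d.db.QueryRowContext(ctx,
		"INSERT INTO users (email) VALUES ($1) RETURNING id", email,
	).Scan(&id)
	return id, err
}
```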
If you give a precise enough spec, it's effectively your code, with the remaining difference being inconsequential. And in my experience, it is often better, drawing from a wider pool of idioms.
No different than most practices now. A PM writes a ticket, a dev codes it, PRs it, then someone else reviews it. Not a bad practice. Sometimes a fresh set of eyes really helps.
I am not too familiar with software development inside large organizations as I work for myself - are there any of those steps the AI cannot do well? I mean it seems to me that if the AI is as good as humans at text based tasks you could have an entire software development process with no humans. I.e. user feedback or error messages go to a first LLM that writes a ticket. That ticket goes to a second LLM that writes code. That code goes to a 3rd LLM that reviews the code. That code goes through various automated tests in a CI / CD pipeline to catch issues. If no tests fail the updated software is deployed.
You could insert sanity checks by humans at various points but are any of these tasks outside the capabilities of an LLM?
Well, this tipped the scale, and I just subscribed. Honestly, so refreshing to have a normal search engine after 2 years of nonstop AI crap thrown at my face.