The LLM Curve of Impact on Software Engineers (serce.me)
64 points by todsacerdoti 3 months ago | 24 comments



Perhaps the meaningful axis is not around seniority, per se, but instead around experimentation as OP alludes to.

By virtue of being a junior, as you're just picking things up, you have to be flexible, malleable and open to new things. Anything that seems to work is amazing. Then you start to form the "it works" vs. "it's built well" intuition. Then your skills and opinions start to calcify. Your beliefs become firmer, prescriptive, pedantic, maybe petty. Good code becomes a fixed definition in your head. And yes, you can be very productive within your niche, but there's a quick fall-off outside of it. But then, as you grow even more, you realize the beliefs you held dear are actually fluid. You start to foster an experimental side, trying things, dabbling, hacking, building little proofs-of-concept and many half-finished seedlings of ideas. This, which the OP identifies as 'Staff level', is for me just a kind of blossoming of the programming and hacking mindset.

With LLMs, you have to be open to learning their language and their biases. You have to figure out how to get the best out of them, as they will almost never slot easily into how you currently work. You have to grow strong intuitions about context, linguistics, even a kind of theory of mind. The recent "Wait" paper from Stanford shows how tiny inflections or insertions can lead to drastically superior results. These little learnings are born of experimentation. Trying new tools and workflows is just as vital, but as every emacs/vim or tabs/spaces debate shows, people rarely branch outside their workflows. They have their chosen way, and don't want to give others the time of day.


It might be referring to job role rather than "age." A Staff Engineer is supposed to be inspiring (both up and down), so doing quick tests fits the bill, and that's something LLMs are great at supporting. Mid-level is mostly about delivering reliably.

The question is whether LLMs/tools could drag mid-level engineers into senior-level work earlier, or whether it's a phase one has to go through. Ultimately, the tangible promise of LLMs is to push the entire timeline up, so that junior tasks are automated and you step in to direct the LLMs. Expectations of what it means to be a software engineer are sure to change at some point. (I like the software craft as-is, but the fact is most of our lumber is straighter now that we have automatic saws. And I've never heard a carpenter pleading to split a log manually.)


> I've never heard a carpenter pleading to split a log manually

Log splitting is a lumber mill's job, and mills have been around for nearly 2000 years. Maybe a better analogy is a nail gun? Which is actually interesting, because some of the old timers I've met are as fast or faster with a hammer and nail as with a nail gun. And hammers are still used daily by woodworkers and carpenters everywhere.

But the challenges of a carpenter are more about problem solving than brute force. How you join and the order you join in can make your life a pita if you don't have the experience. So a junior carpenter might be doing repetitive tasks or following directions, but you need experience to know how to implement a unique solution correctly on the first try and not waste hundreds of dollars of material or the client's time. After all, "measure twice, cut once" has to be learned.


Thanks. I had not seen the "Wait" paper yet. That is crazy.

s1: Simple test-time scaling https://arxiv.org/pdf/2501.19393
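
For context, the core trick in the paper is what they call budget forcing: when the model tries to end its reasoning, you suppress the end-of-thinking delimiter and append "Wait" so it keeps going. A rough sketch of the idea in Python follows (call_llm is a stand-in for whatever completion API you use, and the <think> markers are illustrative; this is my sketch, not the paper's code):

    def call_llm(prompt: str) -> str:
        # Stand-in for a real completion call; returns the model's continuation.
        raise NotImplementedError

    def reason_with_budget(question: str, min_continuations: int = 2) -> str:
        # Budget forcing: each time the model tries to close its reasoning,
        # strip the end-of-thinking marker and append "Wait" so it keeps going.
        trace = question + "\n<think>\n"
        for _ in range(min_continuations):
            chunk = call_llm(trace)
            if "</think>" in chunk:
                chunk = chunk.split("</think>")[0] + "\nWait,"
            trace += chunk
        # The final pass is allowed to finish its reasoning and answer.
        return trace + call_llm(trace)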


This has been exactly my experience. In my work as a senior SWE in my day-to-day areas of responsibility, if I’m stumped, then an LLM won’t even get close.

But when I experiment with new libraries and toolchains as a beginner, LLMs are like a private tutor. They can bring one up to a lower-mid level of experience pretty quickly.

The knowledge gradient between the user and the LLM is important.

I’m not sure if I’d say LLMs become useful again at a higher level of expertise/mastery though.


TFA is presenting one dimension (junior to staff), but actually appears to be talking about three dimensions:

1. Title - which comes with different activities and quality standards to some degree. (I don't fully agree with the definitions, but to be fair, they're quite different from company to company.)

2. Familiarity with the code base you're working on.

3. Familiarity with the tech stack currently used.

I think the strongest correlation between these and LLM usefulness is with (1) and (3).

For (1), they can be useful for churning out something quickly (when quality is not a big issue, e.g. experimentation), which I personally rarely do, because good tooling makes it easy enough to build an MVP in a stack you're familiar with. If there's a stack I'm not familiar with but have to learn, I quite often use this kind of task as an opportunity to do just that. And in many cases, I look for ways to experiment without writing any code at all. I can see how this is controversial.

I personally find them most useful for learning a new stack (3), in combination with a search engine (and ideally someone with experience I can talk to). This seems comparatively uncontroversial.

For (2), understanding a large code base quickly, I'm pretty bearish. There won't be a lot of SO posts and the like about a company's internal code base, so the results are very mixed in my experience. I'd call this part highly controversial, and the kind of thing where those who don't know the code base find it magical, while those who do know it find the results terrible.


I've asked Windsurf to analyze some medium-sized GitHub projects for me and generate architecture documents, including Mermaid diagrams. It mostly did a pretty good job of helping me understand the code bases.


what's medium size? how many files (lines)? what's the limit for Windsurf?


I tried it on Benthos (Redpanda Connect), for which it seemed to create correct technical diagrams. This article discusses some limitations of the search agents in both Windsurf and Cursor: https://www.pixelstech.net/article/1734832711-understanding-... I would expect this is an area where both will see improvements, so the article may be outdated already.


This hits the nail on the head for me. Low-complexity prototypes and modules are something o1 or R1 can reason well about. For more complex coding tasks, we'd need, IMO, a feedback loop with a linter and a compiler, and maybe even a debugger and CI workflows. Basically an agent that works exactly like I do.

The reasoning process is nice, because the CoT can fact-check itself. But it can't catch compile errors across multiple files/modules, because an LLM is not a compiler.
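
The loop I have in mind is simple enough to sketch, something like the below (ask_llm is a placeholder for any chat-completion call, and the single build_cmd is an assumption; this is an illustration, not how any existing tool implements it):

    import subprocess

    def ask_llm(prompt: str) -> str:
        # Placeholder: call whatever model/API you prefer and return its reply.
        raise NotImplementedError

    def fix_until_it_compiles(build_cmd: list[str], source_path: str,
                              max_rounds: int = 5) -> bool:
        # Feed compiler errors back to the model until the build passes,
        # or give up and hand control back to the human.
        for _ in range(max_rounds):
            build = subprocess.run(build_cmd, capture_output=True, text=True)
            if build.returncode == 0:
                return True
            with open(source_path) as f:
                source = f.read()
            patched = ask_llm(
                "Fix this file so the project compiles.\n\n"
                f"Compiler output:\n{build.stderr}\n\nCurrent source:\n{source}"
            )
            with open(source_path, "w") as f:
                f.write(patched)
        return False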


This exists, though. Cline, Aider, Cursor, etc. all hook into the IDE's diagnostics and can run arbitrary commands (including compilers) and feed the output back to the LLM. Take Cline: it has a planning mode that you can plug a reasoning model into, which will outline the best path to implementation. Then you switch to action mode, where a (usually) completion model takes over, implements in multiple steps, and validates diagnostics (or compilation output) as it works through the plan.


Correct, same with Windsurf: you can ask it to first generate unit tests, then the code, and it will run the tests, analyze the results, comment on where and why it made mistakes, improve its own code, and repeat the tests until they pass.

What I've also found very useful in Windsurf is having it analyze existing code bases: you can ask it to generate Markdown documentation for specific functionality, including Mermaid diagrams.

It also helps to have it first generate design documents, improve the designs until they're satisfactory, and then have it generate the tests and code accordingly.



Doesn’t work for me for Swift development yet. Should have been more specific.

I haven’t seen a tool that can debug btw


I've had Cursor suggest changes for debugging (like print statements), though you do have to copy-paste the results back.

I've also had Cursor generate a curl request to test an endpoint and see what responses it returned, so it could generate a struct to store the result.
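
Roughly, the flow was: hit the endpoint, look at the shape of the JSON, then emit a type that matches it. In Python terms it amounts to something like this (the response body and field names are made up, purely to illustrate the shape-to-struct step):

    import json
    from dataclasses import dataclass

    # Pretend this is the body that came back from the curl request
    # (completely made-up response, just to show the shape-to-struct step).
    observed = json.loads('{"id": 42, "name": "Ada", "email": "ada@example.com"}')

    @dataclass
    class User:
        # Fields inferred from the observed response shape (illustrative only).
        id: int
        name: str
        email: str

    user = User(**observed)
    print(user)  # User(id=42, name='Ada', email='ada@example.com')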

So I'd say they are getting there.


OpenHands can do printf debugging.


I agree current LLM tech is most useful for people who see the possibility but don't have the ability, in terms of time or competence, to get there. Things like Excel formulas and regex patterns that would sometimes take me hours to write are usually just a question away.
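
To give an idea of what I mean, this is the kind of one-liner that comes back from a single question (a made-up example, not one of mine): pulling ISO-style dates out of free text.

    import re

    # The sort of pattern that would take me hours but an LLM writes instantly:
    # find ISO-style dates (YYYY-MM-DD) in a blob of text. Illustrative only.
    text = "Invoices due 2024-03-15 and 2024-04-01, paid on 2024-03-20."
    dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)
    print(dates)  # ['2024-03-15', '2024-04-01', '2024-03-20']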

Though I still sometimes have a hard time writing out my thoughts and logic completely, as I don't have a programming background, nor was I very good at maths, so my code would probably look like crap to most programmers.


Good article, very thought-provoking. Here are some other dimensions that affect the usefulness/value of adding LLMs to a workflow. LLMs can be amazing or worse than useless depending on where you land on these, regardless of seniority.

- are you working in a domain and language adequately represented in the training data? For example, it’s a lot easier to prompt an LLM to do what you want in a React CRUD website than in a Swift app with obscure APIs from after the knowledge cutoff. No doubt LLMs will eventually generalize so this stops being troublesome, but today with o3-mini, R1 et al. you need a LOT of docs in the context to get good results outside the mainstream.

- greenfield codebase or big hairy codebase full of non-industry-standard architecture & conventions?

- how closely does your code formatting align with industry standards? (needing bespoke formatting, line breaks at column N, extra lint rules, etc. is a distraction for LLMs)

Codebases should be moved toward industry standards so LLMs can be leveraged more effectively. Could be a good (if indirect) way for mid or senior engs to help their junior counterparts.


I'd very much hope everyone at all skill levels is experimenting. Aside from that, it seems like a plausible take from the individual's PoV.

There is another aspect, though: senior people tend to be the most set in their ways and also the most at risk of their hard-won experience losing value. That dramatically influences how open people are to accepting change.


> Almost two years ago now, I was exploring

If a story from two years ago had to be dug up as an illustration of usefulness... I don't know where the author positions himself on that graph, but that's like the direct opposite of high impact on the day-to-day


What is the impact on software architects? AI architecture guidance can be very ivory-tower with regard to transformer model internals and ML pipelines, but when it comes to CodeGenAI refinement, you really need to touch grass. On the plus side, system design skills mean knowing what questions to ask, and good MermaidJS markdown designs are solid agent inputs. I guess the role will shift to researching and enabling devs to best leverage CodeGenAI infra for full SDLC velocity with quality.


In OP's examples, what about privacy/IP concerns around your existing code base? When people mention, for example, Copilot/Cursor or any other autocompletion tools, or just ChatGPT, do you happily let the models access your existing "company internal" code? Sure, no problem when you self-host, but I assume all these use cases involve some external API? Are you even allowed to do that in most companies, since your IP basically leaves your machine at some point?


The author here. Yes, what you're saying is 100% on point. Putting a company's code into a random chatbot online would be a horrendous violation of any company's policy out there. I'm in a fortunate position where we've had a clearly defined policy for a long time now, outlining which tools can be used and which categories of data we are allowed to use with them.


Ha, of course he thinks they are only useful for him. The ego...



