[flagged] Ask HN: If LLMs are so useful, why haven't we seen any spike in productivity?
54 points by pera 4 months ago | 55 comments
If LLMs actually do help engineers become significantly more productive, what could explain that, for instance, in the open source community:

- We are not fixing bugs faster

- We are not developing features faster

- We haven't seen an explosion of new projects

- We haven't seen an explosion of vulnerabilities being discovered

Maybe I am missing something, but to me everything looks the same (except for an increasing number of useless customer service chatbots and garbage LLM-generated books on Amazon).

Edit: Unfortunately this submission was demoted for some reason, but thanks for all the comments.




Writing code is the easiest part of the process (relatively speaking). Figuring out the requirements, working with stakeholders to drive consensus, and understanding user needs are the bulk of the work.

LLMs will certainly lower the entry barriers for new programmers, and might also create a new solopreneur economy as a result. Non-technical people with ideas can now start prototyping and raise money, but they will soon need engineers to grow the product.


That doesn't completely explain it. There are tons of open source projects with countless feature requests just begging for someone to implement them. There's not really any research to do. Often the requested feature is ridiculously simple to specify but difficult to implement.


I think this is a good point to consider.

Let's imagine an inexperienced developer comes across a problem in an open source library that already has an issue raised on GitHub.

Are tools like Copilot and ChatGPT good enough to walk them through setting up the dev environment, fixing the code, and testing the fix? Maybe, but not without many prompts from the developer.

But how is that different from someone StackOverflowing their way through the problem?


Often it's not about "just" adding the feature, but about evaluating whether it makes sense, how it _should_ be implemented, and whether it should be part of some different feature. Just hacking something in is usually not the complicated part.


I think we also need to consider that sometimes a feature is not implemented, or a PR is not merged, because any code added needs to be maintained.

So even if there are a lot of feature requests, that does not mean the maintainer wants to implement them in whatever fast way possible, because that code/feature will need maintenance further down the line.


Technology spreads slowly. Google Docs is an instant 50x productivity increase for any legal process, and yet a few years ago I saw an advocate's mind blown by a simple demonstration of two people simultaneously editing the same affidavit.

For him, the norm is still to redline a document on paper, and have his secretary add those changes to the original digital document and have that sent over to the opposing team for the same treatment.

I don't have strong opinions about LLMs' coding ability (though compared to the other comments so far I am more on the "LLMs are pretty good at creating software from natural language descriptions" side) but even assuming that LLMs can give programmers a 50x productivity increase, I'd assume it would take 10-50 years for industry and processes to evolve to take advantage of that increase.


My lawyer says his office runs on Microsoft 365. Is Google Docs really 50x over that? I don't even see how it's 50x over LibreOffice and a shared drive.


I think the "50x" is a little bit of a random number.

If you are already writing good code, it might be hard to get any great improvement. If you are a beginner without much training/experience, it might not be hard to see orders-of-magnitude improvement.

It might take some time, though. When I have spoken to non-coding people, they look at me like I am talking about flying to the moon. If computers are ever considered general tools and the general public ever moves towards more DIY and small business, there might be more uptake.


Sure, if it's online and multiplayer, it's similar. My point is not Google over Microsoft, but shared/multiplayer docs for tracking changes and reaching agreement in minutes on something that used to take (and for many still does take) weeks or months of manual reconciliation of printed paper documents.


Your comment was "instant 50x productivity increase for any legal process".


Sue me ;)


Frank or George or Bill or Tom, anything but Sue.


Kudos for raising an empirical point rather than looking at the aspirations of the tech. It's hard to take that kind of view.

The jury's still out. It will take time until we have enough post-mortems to tell whether it is doing the job and how it's affecting things.

I do agree that if it were so good, we'd see practical applications in more meaningful ways than just anecdotal tricks and lots of low-quality content.


Turns out the bottleneck of engineering isn't related to what goes on in the editor.


I agree with your comment to a certain degree, especially in a commercial enterprise environment, but programming is still very time-consuming, and if a new invention made us faster it would be noticeable, no?


Just speaking from my experience here: when I sit down to code, actually typing out the logic of what I want is not what I spend my time doing. I research optimization options, prep old code for new features, and refactor cruft. I have an AI-enabled editor, but besides generating boilerplate, the AI-based features are mostly useless. My job doesn't rely on endlessly generating buggy code; it depends on the existence of endless buggy code that needs correcting.


You don’t think bespoke emojis are a boost to productivity?


What statistics are you referring to when making these claims?

Only about 20% of the repositories GitHub hosts are public. Perhaps open source developers are less likely to pay for GitHub Copilot out of their own pockets?

Why do you expect "an explosion of new projects" from perhaps a 20% productivity increase? What percentage of open source developers are using LLMs when working on open source? If it's merely 20%, we'd see roughly a 0.2 × 0.2 = 4% aggregate increase, something that's hardly noticeable.


My employer pays GitHub ~£10/month for me to use GitHub's Copilot. This is tiny compared to what they pay me.

It unlocks a small amount of extra productivity, but not that much. Yet still enough to be worth it.

My position is that they are useful but not massively useful, yet.


LLMs have been getting better. They were all pretty poor for my programming purposes a year or so ago; recently Perplexity (even the non-Pro version) and GPT-4 have been helpful, and 4o is even better. I have been pasting Leetcode hard problems into 4o and getting sensible outputs, something I didn't even try previously. Sometimes I do have to take it through a few iterations, and I give it various qualifications (like keeping to such-and-such time and space complexity or better). My usual instruction is to make the class or function more and more compact while keeping the same functionality and time/space complexity.

I got 4o to give me a 33-line, relatively simple and understandable bidirectional BFS Kotlin function for this Leetcode problem, which Perplexity (non-Pro) and GPT-4 could solve, but not as well as 4o: https://leetcode.com/problems/word-ladder
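
For reference, here is roughly the shape of that approach, as a minimal Python sketch of the bidirectional BFS idea (my own hypothetical version, not the Kotlin function 4o produced):

    # Bidirectional BFS for the word-ladder problem: expand the smaller
    # frontier each round; when the two frontiers meet, the ladder length
    # is known.
    def ladder_length(begin, end, words):
        word_set = set(words)
        if end not in word_set:
            return 0
        front, back = {begin}, {end}
        steps = 1  # the ladder length counts the starting word
        while front and back:
            if len(front) > len(back):
                front, back = back, front  # always expand the smaller side
            steps += 1
            next_front = set()
            for word in front:
                for i in range(len(word)):
                    for c in "abcdefghijklmnopqrstuvwxyz":
                        cand = word[:i] + c + word[i+1:]
                        if cand in back:
                            return steps  # frontiers met
                        if cand in word_set:
                            next_front.add(cand)
                            word_set.remove(cand)  # mark visited
            front = next_front
        return 0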

Of course, even though these are Leetcode hard-level problems, they are well-defined and relatively self-contained. I work at a Fortune 100 company, and 99% of the time I can pound out the CRUD I do in my sleep. The difficulties I encounter are distractions: the CI server having some problem; the ticket/story I am working on not being fully specified while the PM is MIA that day; all teams working on the feature at the same time, so I need to find out what feature flags to have set and which test headers have been agreed on; the PM asking me to work on something where some of what he says does not make sense in context, so I have to ask for clarification; etc. Then there's the meta-game of knowing what to prioritize, with one important component being what will make my manager happy so I get a good yearly review. What I need to prioritize may differ from what my PM says to prioritize, or, even more complexly, from what my manager says to prioritize but doesn't really mean.


I actually think all of the things you listed are absolutely happening.


> for instance, in the open source community

They're definitely wrong on that point; there are countless projects that exist that otherwise wouldn't have been started at all. Anecdotally, I would never have put in the initial effort to set up a project that has 100+ stars now without the initial kick from early GPT-4 last year.

Lots of these new repos are also disproportionately in the LLM-related space specifically, since that's where people use them the most for code, so it's probably not as noticeable at large yet.


YouTube and Instagram are flooded with AI-generated content.

I believe AI will be useful in game dev: AI voice acting, AI face generation (this way all the NPCs will be unique), possibly AI layout generation.

I don't think using AI to generate the script is a great use case. It can be used to generate ideas, but we still need human creativity to make great games.


> Youtube, Instagram are flooded with AI generated content

Are we supposed to be happy about that?


We don't need LLMs to generate background NPCs; we have procgen for that already. I think AI-generated voices are still a ways from being as good as a real human; whenever I watch a YouTube video with an AI voiceover, I can tell immediately.


GPT-4o got a whole lot closer to realism, though. I'd say we're better than Skyrim's pre-recorded speech system at this point.


> significantly more productive

Try a couple of percent. More if you type slowly (magic autocomplete). More if you're doing something where you need to search Q&A fora a lot.


Apart from a few things already mentioned, I don't believe developers are really trying to engage with LLM development. Some are completely opposed to it ethically, some think it's not good enough and don't try, some don't care, and some tried it before and weren't satisfied. There are quite a few people using Copilot, but that's the most basic and simple approach.

I don't personally know anyone trying to use fancier tools like agents or IDE-integrated helpers. They're not perfect by any means, and you actually need to learn how to use them well, but the difference is massive. I've definitely saved some hours when developing smaller-scope tools. It's not a time save that would drastically change my total productivity, but... it exists and it's going to increase in the future. And it requires an upfront investment in tooling and learning that few people seem to be interested in.

But even given current issues, how can you tell there hasn't been an improvement? How would you be able to tell across all the open source in the world?


Many big companies are investing in AI to replace people, not fix problems. Try getting anything internal from a human anymore. It's all internal bots. It's easier to sell when you can say "We can replace 90% of your HR department" vs "It will help you find bugs and develop features faster". I'm a bit cynical, but I see it happening everyday.


What data are you using for those metrics? A 5% improvement in the time to fix bugs or develop features might not be immediately obvious.


All proponents of AI claim they develop 30% faster with AI.

So yes, where is that 30%?


As a developer, I do try to use LLMs in my daily work, and I get a slight productivity boost, but I wouldn't say it's a spike. It helps me write tests faster, it helps me write pure functions faster, and it helps me autocomplete some documentation faster. It also helps me go faster with one-time fixes, for DB data migration queries and stuff like that.

That said, LLMs in the code editor come with a kind of "hyperactivity" which I find really unpleasant. They're too "in your face", they make the code move a lot, and they sometimes make it a bit harder to focus than working without an LLM. They can also be extremely frustrating and result in a productivity loss, for example when they generate code that's slightly wrong and you need to take time to fix it. That's harder than just writing the code.


Bottlenecks most often stem from lack of clarity.

The business doesn't have clarity on what they are trying to achieve. Or they don't have clarity on what's important, and constantly change priorities (and both of these can cause the most talented engineer to spin their wheels).

LLMs can help gain clarity, the same way a coach, consultant, or therapist can help you work through a scenario. But it’s only as effective as the work you’re willing to put into that endeavor.

So it comes down to:

* Nothing has changed regarding the nature of human work ethic

* Most people don’t want to be a programmer. The idea that ”everyone’s a programmer now” is no different than saying ”everyone’s a carpenter now” because power tools exist. Most people don’t want to do that kind of work and are happy to pay someone else to do it.


Profits? I mean, I'm guessing the trend can't be wholly disconnected from the massive layoffs in the tech sector over the last year or so. Probably not the primary driver, but companies are always looking to maximize profits, and, in the short term, what faster way is there than by making cuts?

If a business sees a 15% productivity boost coming, especially with no easy plan in place to utilize it fully for equivalent profit, someone near the top is already thinking that quick cuts could mean an immediate 15% increase in reported profits next quarter (in a 1:1 scenario).

I'm being a bit simplistic, but I think the general idea of businesses maximizing profits over output stands (or easy short-term thinking over more difficult long-term planning).


- Some early studies have shown modest gains (can try and link later)

- It’s still very early. LLMs have only been publicly available for 2 years, copilots a little less than that.

- It's mostly anchored on cold starts, i.e. creating something from scratch. Leveraging LLMs in existing and mature codebases is definitely going to pick up.

- The majority of devs aren’t really using these tools or using them to their full ability. It takes a lot of fiddling to understand the limits and strengths, but when you do, you basically stop writing code and write more prose.

I will be surprised if in ten years even a quarter of your keyboard inputs will be towards code directly vs directing your friendly coding robot.


How do you expect to "see those"? I have, e.g., started using an LLM (in a limited manner) to help me write TS definitions for my OSS projects. I've also used gen AI to create art for several of my projects. But those haven't "unlocked" 10x productivity; they have just allowed me a bit more free time and fewer headaches.

But I still love programming and will mostly continue to do so when it's for fun, which is most of my OSS. For me it's like asking "why do you do woodworking when you can outsource it to some Chinese shop?" when that defeats the point.


The technology is still in its infancy, and people's ability to harness it even more so.

Patience.


Two years ago, people on this forum were trying to convince me the world would be a completely different place in no time.

This is one of those "in two years" technologies, like self-driving cars, asteroid mining, &c.


I think people underestimate the rate of progress of technology, and also underestimate the time it takes the world to adapt to it. AI is progressing rapidly, but it won't change the world overnight.


There are indeed productivity gains, but they are too scattered to quantify easily.

Here are some significant productivity gains I get daily from Mistral/Phind/ChatGPT/our office-internal LLM:

- throw a messy shell script at it and ask it to refactor it (works 80% of the time)

- paste a sample XML/JSON/YAML and ask it to generate the class/struct (code generation; see the sketch after this list)

- ask questions and get an immediate response with examples better suited to my needs (previously it took time to go to SO/Reddit/SE etc. and scroll through several posts and docs, or even waste time reading blogspam)

- ask questions about a specific topic and get an immediate response with citations (this is the in-house trained model) instead of fighting with broken search or an ocean of messy documents in Confluence/Notion/GitLab Pages and what not

- rubber-duck when brainstorming a problem (it can sometimes lead to interesting outcomes)

- have it prepare a bash script to do something, then simply modify/correct/refine it to fit my needs

- ask questions about trivial stuff

- generate boilerplate

- generate a throwaway project to try something fast

- convert from one language to another (I need to work with different teams using different languages such as TS/Java/C++/Scala/Python/Shell/Rust/Erlang etc.)

- write a polite email (or a response to one) which I can copy, paste, and send when I am too occupied with something else

- get documentation of a specific feature of something which would otherwise take a lot of digging in the original docs

- generate a pure self-contained HTML/CSS prototype to send to our UI/UX team to give them an idea of a particular concept

- summarize large blocks of text into bullet form (useful for presentations)

- get summaries of popular books (because ChatGPT has indeed trained on a lot of them somehow!)

- translate a text into another language (works well when it does, but still needs some corrections)

Most of these activities save me a lot of time that previously required big time investments.
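
To make the class/struct bullet above concrete: given a made-up sample payload like {"name": "pump-1", "rpm": 3600, "tags": ["industrial"]}, the kind of Python an LLM typically emits is something along these lines (hypothetical names, of course):

    # Hypothetical dataclass generated from the sample JSON above.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Device:
        name: str
        rpm: int
        tags: List[str] = field(default_factory=list)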


According to Gartner, there is increased speed on tasks. However, they claim that workers will use that time for leisure instead of working harder, which they term "productivity leakage". So much for AI making our lives easier, right?

Source:

https://www.theregister.com/2024/09/09/gartner_synmposium_ai...


My personal experience suggests that

* we are not developing features faster, but we have time for asking questions we had no time to ask before. More ambitious architectures and designs.

* we are fixing bugs faster and we are producing fewer bugs (because of better designs).

* not everybody is happy about fewer bugs.

* we discover more vulnerabilities. Again, not everybody is happy about that; they just want new features, not new knowledge of vulnerabilities and technical debt.


On the B2C side, there's a disconnect because the app developer is typically on the hook for inference costs. This incentivizes them to cut corners to minimize the cost of that AI, rather than use it creatively with high-quality models. The apps that let users bring their own keys are much more innovative in this space, but the friction involved in transferring keys keeps most apps from adopting that approach.


They are useful in niches. I got mad at Copilot suggesting the wrong thing and switched to an IDE without it the other day. AST-based code completion works much better; Copilot gave answers that looked good but were wrong. Worse, I needed to hit tab in one place and it kept completing something wrong instead of just adding the tab needed there.


Speak for yourself: I'm coding at least 30% faster than before because of LLMs. I don't use them for generating code, but as a reference and for brainstorming ideas. Your expectations are perhaps too high in wanting to see an "explosion", but the productivity increase was very clear in my case.


I think you're correct. The positives from LLMs are an illusion, while the drawbacks are real and insidious.


The productivity studies that most people cite were done by GitHub, which has an obvious agenda to promote Copilot. The productivity gains are very marginal and much less than claimed.


Writing code has never been the barrier to productivity for me. It's all the other business and development processes, plus distractions.


Outside the company walls, as the context for empirical measurement:

1. is the web becoming more [accessible](https://abilitynet.org.uk/news-blogs/inaccessible-websites-k... http://useragentman.com/wcag-wishlist/)?

2. are the web pages getting [faster](https://www.nngroup.com/articles/the-need-for-speed/) and lighter?

3. is it righting wrongs about existing non-performant [code](https://www.webperf.tips/tip/cached-js-misconceptions/)?

4. is it encouraging [smaller](https://dyf-tfh.github.io/)?

5. is it promoting historical [insights](https://qntm.org/clean)?

6. is it popping [bubbles](https://www.youtube.com/watch?v=Y7YAXUWG820)?

7. is it encouraging the correct interpretations of actual [innovators](https://mamund.site44.com/articles/objects-v-messages/index....)?

8. is it minimizing or eliminating [traps](https://www.gnu.org/philosophy/javascript-trap.html)? (also see the W3C's Web Sustainability Guidelines on JavaScript fallbacks)

9. is it avoiding the "[wars](https://tanzu.vmware.com/content/blog/framework-wars-now-all...)"?

10. is it shedding the object-form? https://dreamsongs.com/ObjectsHaveFailedNarrative.html


Just wait until we release the next model!


Do you work at OpenAI?


Everyone is a decent programmer now, able to solve nearly any problem with help from an LLM.


Everyone? Most people are incapable of expressing a problem in reasonably clear terms. They often don't even know the right questions to ask.

LLMs are pretty good at giving you what you ask for. Not so good at telling you that you're asking for the wrong thing.


> LLMs are pretty good at giving you what you ask for. Not so good at telling you that you're asking for the wrong thing.

So they're comparable to rubber ducks. I would like to see data from a comparative study with rubber ducks, LLMs, and a control group.


Here is a problem I've been noodling with. If you are a decent programmer, how does your LLM help you solve this problem?

Given a cheminformatics fingerprint definition based on SMARTS substructure patterns, come up with a screening filter, likely using a decision tree, which uses intermediate feature tests to prune the search space faster than simply testing each pattern one by one.

For example, the Klekota-Roth patterns defined in their supplemental data (and also available from CDK at https://github.com/cdk/cdk/blob/main/descriptor/fingerprint/...) contain patterns like:

    "CC(=NNC=O)C",
    "CC(=NNC=O)C(=O)O",
    "CC(=NNC=O)C=C",
    "CC(=NNC=O)C=Cc1ccccc1",
Clearly, if 'CC(=NNC=O)C' does not exist in the molecule to fingerprint, then there is no reason to test for the subsequent three patterns.
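
In RDKit terms, that pruning is just a guard around the shared core (a hypothetical sketch; the real fingerprinter would be structured differently):

    # Skip the three longer patterns whenever the shared core
    # 'CC(=NNC=O)C' is absent, since they all contain it as a subgraph.
    from rdkit import Chem

    core = Chem.MolFromSmarts("CC(=NNC=O)C")
    extensions = [Chem.MolFromSmarts(s) for s in (
        "CC(=NNC=O)C(=O)O", "CC(=NNC=O)C=C", "CC(=NNC=O)C=Cc1ccccc1")]

    def match_group(mol):
        # One substructure test prunes all four patterns at once.
        if not mol.HasSubstructMatch(core):
            return [False] * (1 + len(extensions))
        return [True] + [mol.HasSubstructMatch(p) for p in extensions]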

Similarly, there are patterns like:

    "FC(F)(C=O)C1(F)OC(F)(F)C(F)(F)C1(F)F",
    "FC(F)(F)C(F)(F)C(F)(F)OC(F)(C=O)C(F)(F)F",
    "FC(F)(F)C(F)(F)C(F)(F)S",
which could be improved by an element-count test: count the number of fluorines, and only run the pattern test if there are enough such atoms in the molecule to fingerprint.

So one stage might be to construct a list of element counts:

   ele_counts = [0] * 200  # indexed by atomic number
   seen = set()            # atomic numbers present in the molecule
   for atom in mol.GetAtoms():
      ele_counts[eleno := atom.GetAtomicNum()] += 1
      seen.add(eleno)
then have a lookup table for each element, based on the patterns which have at least that count of the given element type:

   ele_patterns = [
     # (max known count, list of sets of matching patterns)
     (0, [set()]),  # element 0
     (0, [set()]),  # hydrogen
     ...
     (20, [{all patterns which contain no carbon},
           {all patterns which require at most 1 carbon}, ...
           {all patterns which require at most 19 carbons}]),
     (10, [{all patterns which contain no fluorine}, ...
           {all patterns which contain at most 9 fluorines}]),
     ...]
so one reduction can be

   def get_possible_patterns(seen, ele_counts):
     for eleno in seen:
        max_count, match_list = ele_patterns[eleno]
        count = min(ele_counts[eleno], max_count)
        yield match_list[count]
   patterns = set.intersection(*get_possible_patterns(seen, ele_counts))
and only test that subset of patterns.

However, this is not sophisticated enough to identify other tests, like the "CC(=NNC=O)C" example I gave before, or "S(=O)(=O)", which might be good tests at a higher level than the element counts.

And clearly, if there isn't a sulphur, there aren't two oxygens, and there aren't two double bonds, then there's no need to test "S(=O)(=O)", suggesting a tree structure would be useful.
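
Something like this is the shape I have in mind (a hypothetical sketch; choosing which feature each node should test is the actual hard part, and the feature keys here are made up):

    # Each node tests one cheap feature (an element count, a bond count,
    # a cheap substructure) and, when the test fails, prunes every
    # pattern that depends on that feature without descending further.
    class FeatureNode:
        def __init__(self, test, dependent_patterns, children=()):
            self.test = test                          # feats dict -> bool
            self.dependent_patterns = dependent_patterns
            self.children = children

        def prune(self, feats, alive):
            if not self.test(feats):
                return alive - self.dependent_patterns
            for child in self.children:
                alive = child.prune(feats, alive)
            return alive

    # e.g. drop "S(=O)(=O)" (and anything containing it) when the
    # molecule lacks a sulphur, two oxygens, or two double bonds:
    sulfone_node = FeatureNode(
        test=lambda f: f.get("S", 0) >= 1 and f.get("O", 0) >= 2
                       and f.get("=", 0) >= 2,
        dependent_patterns={"S(=O)(=O)"})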



