I wanted to find the actual change performed by these agents, so I watched the embedded video. I cannot believe what I saw.
The video shows a private fork of a public repository. The bug is real, but it was resolved in February 2023, and it doesn't seem like the solution was automated [1].
The bug has a stack trace attached with a big arrow pointing to line 223 of a backend_compat.py file. A quick glance at this stack trace tells you what happened, why, and how to fix it, but…
not for the agent. It seems to analyze the repository in multiple steps and tries to locate the class. Why did they even release this video?
Mgmt at every company is asked: what are you doing to be agentic?
so, they organize hackathons where devs build a hypothetical agentic framework nobody will dare use. Then mgmt can claim: look at what I have done to be agentic.
you should ask: would you dogfood your agent? and the answer is no way. these are meant purely for marketing purposes, as they don't meet an end user need.
The term "co-pilot" implies a company has to hire a software engineer to guide the AI.
The term "agent" implies you can give the AI full access to your repos and fire the software engineers you're grudgingly paying six figures to.
The second is much more valuable to executives not wanting to pay the software people that demand higher salaries than virtually everyone else in the organization.
There was no rebrand. They're different concepts. Copilot and similar solutions give hints as you do the development. Agents are systems that receive a goal and will iterate actions and queries for more information until they achieve the goal.
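That distinction can be made concrete with a minimal sketch of the loop described above: the system is handed a goal and keeps choosing and executing actions until it judges the goal met. Every name here (`choose_action`, `execute`, `is_done`) is a hypothetical placeholder, not any real agent framework's API.

```python
# Minimal agent loop sketch: act, observe, repeat until done.
# The callables are stand-ins for an LLM call, a tool runner,
# and a stopping check respectively.

def run_agent(goal, choose_action, execute, is_done, max_steps=10):
    """Iterate action -> observation until the goal is judged complete."""
    history = []
    for _ in range(max_steps):
        action = choose_action(goal, history)   # e.g. an LLM picks a step
        observation = execute(action)           # e.g. run a search or a test
        history.append((action, observation))
        if is_done(goal, history):              # goal achieved? stop.
            break
    return history
```

A copilot, by contrast, would be the `choose_action` call alone: a single suggestion surfaced to a human, with no loop around it.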
I'm explaining what words mean. Agentic approach has been a thing for years https://en.wikipedia.org/wiki/Intelligent_agent You can just say you don't like AI in programming, without saying incorrect things on top of that.
That’s true, but this repo has thousands of bugs. They could at least find one that was in the training set, but also did not contain the location in the bug description.
every hype cycle runs through a predictable course.
we are at a phase where the early adopters have seen the writing on the wall, i.e. that llms are useful for a limited set of use cases. but there are lots of late adopters who are still awestruck and not disillusioned yet.
I think the process could be better, but if you want good quality you really shouldn't expect it to just jump at the "obvious" thing. Just like you wouldn't want the developer to just make the error go away in the quickest way. Getting more context is always going to be a good idea, even if it wastes some time in the "trivial" cases.
> But with the SWE localization agent, a [ibm-swe-agent-2.0] could open a bug report they’ve received on GitHub, tag it with “ibm-swe-agent-1.0” and the agent will quickly work in the background to find the troublesome code. Once it’s found the location, it’ll suggest a fix that [ibm-swe-agent-2.0] could implement to resolve the issue. [ibm-swe-agent-2.0] could then review the proposed fix using other agents.
I made a few minor edits, but I think we all know this is coming. This calls itself "for developers" for now, but really also it's "instead of developers", and at some point the mask will come off.
It will suck to babysit LLMs as a job. In one sense perhaps it will be nice to have models do the chores. But I fear we'll be 90% babysitting. Today I was in an hour-long chat with ChatGPT about a problem when it circled back to its initial (wrong) solution.
I have very little fear for my own job no matter how good models get. What happens is that software gets cheaper and more of it is bought. It’s what happened in every industry with automation.
Those who can’t operate a machine though (in this case an AI) should maybe worry. But chances are their jobs weren’t very secure to begin with.
Babysitting LLMs is already my job and has been for a year. It's kind of boring, but honestly, after nearly 20 years in the game I felt like I was approaching endgame for programming anyway.
All the project/product managers that think they are the ones responsible for team success are going to get a rude awakening. When they try to do the job of an entire team, it's going to come apart pretty quickly. LLMs are a tool, nothing more, they don't magically imbue the user with competency.
They're not going to try to do the job, they're going to hire cheaper, worse SWEs to manipulate AI... and then things will come apart pretty quickly :) But they'll still have someone else to blame.
> LLMs are a tool, nothing more, they don't magically imbue the user with competency.
Not a good take though, IMO. They're literally a tool that can teach you how to use them, or anything else.
> > LLMs are a tool, nothing more, they don't magically imbue the user with competency.
> Not a good take though, IMO. They're literally a tool that can teach you how to use them, or anything else.
I disagree. In their current incarnation, LLMs require a human subject matter expert to determine if the output is valid. In the project manager team lead example, the LLM won't tell you if the database is sized correctly, or if you even need a database.
They will cover their bases to make sure that never happens; I'm sure of it. AI is great for PM/Product/C-Suite types (i.e. the decision makers). Bad for the doers/builders long term, IMO.
I don't care. I swore to myself that if the time comes my skills will no longer be needed, I'd gracefully ride into the sunset and do some other thing.
Sounds nice until you actually have to find some other thing, especially with the bar for entry being high for most interesting and well compensated jobs. It will be even worse when you have huge numbers of other devs also looking for a new job.
That’s their goal, no doubt. And I’m sure a lot of zombie projects will be blindly turned over to this type of agent and left to rot. But in practice, these agents will never replace humans, because someone will have to oversee them, and that human will probably just be the “developer” that was “replaced” by them. The work will suffer, the quality will suffer, the enjoyment of the human will suffer, the costs will increase, but some salesperson and some mid level exec will be able to claim they sold and deployed AI and get a bonus.
Developers are not going to go away, but the cushy high salaries likely will. Skill development follows a logarithmic curve, where an AI boost to junior devs will be much bigger than the boost given to senior devs. This discrepancy will pull down the value of devs, as you will get "more bang for your buck" from lower-tier devs, since the AI is comparatively free.
Although I also wonder about the development of new languages that may be optimized for transformers, as it seems clumsy and wasteful to have transformers juggle all the tokens needed to make code readable by humans. It would be really interesting to have a model that outputs code that functions incredibly well but is indecipherable by humans.
Junior devs don't always understand enough to know why something should or shouldn't be done.
I don't think junior devs are going to benefit; if anything, the whole role of 'junior' has been made obsolete. The rote/repetitive work a junior would traditionally do can now be delegated wholesale to an LLM.
I figure productivity is going to increase a lot. We'll need fewer developers as a result. The duties associated with developers are going to morph and become more solutions/architecture oriented.
I run a startup accelerator with a law firm partner (but not a legal accelerator) - and some of the stuff I hear in the lunchroom is wild. No doubt the firm is going to do extremely well un-fucking gen AI legal mess.
Wondering if anyone running a Python mainly application(s) is willing to be a Design Partner for a true AI Agent that can fix a few issues but with very high accuracy and no oversight.
I am from LogicStar.ai, and we are trying to cut through the BS and show true value in real-world applications with deep program analysis, reproduction, verification, etc., while (like everyone else) sourcing fix suggestions from LLMs.
Follow our page at https://www.linkedin.com/company/logicstar-ai/ as I plan to post updates and frank comments there on what other agents are or are NOT.
Your input will drive the direction in which we focus the technology and use cases.
>That score places the IBM SWE agent high up the SWE-bench leaderboard, well above many other agents relying on massive frontier models, like GPT-4o and Claude 3.
They're not even in the top half of the leaderboard. Almost half the score of the first place agent.
"It made sense for IBM to build agentic tools like these, argues Ruchir Puri, chief scientist at IBM Research, not just for its own developers, but for all the enterprise developers IBM strives to assist."
What a weird sentence. Mx. Puri does not argue anything, this is just an unfounded claim. So far it just looks like snake oil that is to be sold to other companies.
This would actually be a good business strategy: Sell software that diminishes productivity to your competition and watch them disintegrate.
I would have liked to see a giant ppt of an agentic framework or architecture.
Call it Enterprise Agentic Framework or something like that.
The architecture diagram would fill an entire ppt slide and bedazzle its customers.
I wonder what kinds of errors it can actually detect. I’d love to throw it at my support queue: find the reason this thing got stuck in the interaction between three state machines which are not defined as state machines.
A combination of static analysis, reproducing the problem first, then trying various fix suggestions and verifying that they work and do not cause other regressions: those are a few of the steps that will need to be involved before an experienced developer can review this "very junior dev"/AI work and push it to main in any commercial application.
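The reproduce-then-verify loop described above can be sketched in a few lines. This is a hedged illustration only: `reproduce_failure`, `apply_patch`, and `run_tests` are hypothetical placeholders for whatever harness an agent actually uses, not a real tool's API.

```python
# Sketch of a reproduce -> patch -> verify loop: accept the first
# candidate patch that makes the reported failure go away while
# keeping the rest of the test suite green.

def validate_fix(repo, bug_report, candidate_patches,
                 reproduce_failure, apply_patch, run_tests):
    """Return the first patch that fixes the bug without regressions."""
    # 1. Confirm the bug is reproducible before trying anything.
    if not reproduce_failure(repo, bug_report):
        return None
    for patch in candidate_patches:          # e.g. LLM-sourced suggestions
        patched = apply_patch(repo, patch)
        # 2. The original failure must no longer reproduce...
        if reproduce_failure(patched, bug_report):
            continue
        # 3. ...and the full test suite must pass (no new regressions).
        if run_tests(patched):
            return patch
    return None
```

Even with a loop like this, the returned patch is only a candidate; the human review step it feeds into is the part that can't be skipped.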
The design partnership offer of https://www.linkedin.com/company/logicstar-ai/ stands if you have core Python for now. More languages will come once we get real-world feedback as a proof of concept.
What worries me most is that because there is no way to prove the negative value of these agentic scams, and because swe teams are (sadly) compressible to some extent, some companies will simply let go 10% of their workforce while the remaining 90% will have no choice but to keep trudging along, with the additional "benefit" of having to show the positive value of this scam to their hierarchy (unless they want to join the 10%). So much waste and sadness all around.
I said this before I left IBM, and I will say it again.
These and other models IBM is working on can do basic tasks that anyone else could. But it will all fall apart the moment you add complexity to it.
It's hilarious to see how IBM struggles to stay relevant. What did that lead to? A bot that summarizes a stack trace. Why is this even on the front page of HN?
I do see the /s but I do find that an interesting thought experiment since (a) I'd guess the number of humans who can actually debug RPG is pretty small and (b) so where are these magical agents that are "gonna take muh job" going to get training data for the code or the fixes to any such bugs
[1] https://github.com/Qiskit/qiskit/issues/9562