I wanted to find the actual change performed by these agents, so I watched the embedded video. I cannot believe what I saw.
The video shows a private fork of a public repository. The bug is real, but it was resolved in February 2023, and it doesn't seem like the solution was automated [1].
The bug has a stack trace attached with a big arrow pointing to line 223 of a backend_compat.py file. A quick glance at this stack trace tells you what happened, why, and how to fix it, but…
not for the agent. It seems to analyze the repository in multiple steps and tries to locate the class. Why did they even release this video?
Mgmt at every company is asked: what are you doing to be agentic?
so, they organize hackathons where devs build a hypothetical agentic framework nobody will dare use. Then mgmt can claim: look at what I have done to be agentic.
you should ask: would you dogfood your agent? and the answer is no way. these are meant purely for marketing purposes, as they don't meet an end user need.
The term "co-pilot" implies a company has to hire a software engineer to guide the AI.
The term "agent" implies you can give the AI full access to your repos and fire the software engineers you're grudgingly paying six figures to.
The second is much more valuable to executives not wanting to pay the software people that demand higher salaries than virtually everyone else in the organization.
There was no rebrand. They're different concepts. Copilot and similar solutions give hints as you do the development. Agents are systems that receive a goal and will iterate actions and queries for more information until they achieve the goal.
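That distinction can be made concrete with a minimal sketch of the loop described above: the system is handed a goal and keeps choosing and executing actions until it judges the goal met. Every name here (`choose_action`, `execute`, `is_done`) is a hypothetical placeholder, not any real agent framework's API.

```python
# Minimal agent loop sketch: act, observe, repeat until done.
# The callables are stand-ins for an LLM call, a tool runner,
# and a stopping check respectively.

def run_agent(goal, choose_action, execute, is_done, max_steps=10):
    """Iterate action -> observation until the goal is judged complete."""
    history = []
    for _ in range(max_steps):
        action = choose_action(goal, history)   # e.g. an LLM picks a step
        observation = execute(action)           # e.g. run a search or a test
        history.append((action, observation))
        if is_done(goal, history):              # goal achieved? stop.
            break
    return history
```

A copilot, by contrast, would be the `choose_action` call alone: a single suggestion surfaced to a human, with no loop around it.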
I'm explaining what words mean. Agentic approach has been a thing for years https://en.wikipedia.org/wiki/Intelligent_agent You can just say you don't like AI in programming, without saying incorrect things on top of that.
That’s true, but this repo has thousands of bugs. They could at least find one that was in the training set, but also did not contain the location in the bug description.
every hype cycle runs through a predictable course.
we are at a phase where the early adopters have seen the writing on the wall, i.e. that llms are useful for a limited set of use cases. but there are lots of late adopters who are still awestruck and not disillusioned yet.
I think the process could be better, but if you want good quality you really shouldn't expect it to just jump at the "obvious" thing. Just like you wouldn't want the developer to just make the error go away in the quickest way. Getting more context is always going to be a good idea, even if it wastes some time in the "trivial" cases.
> But with the SWE localization agent, a [ibm-swe-agent-2.0] could open a bug report they’ve received on GitHub, tag it with “ibm-swe-agent-1.0” and the agent will quickly work in the background to find the troublesome code. Once it’s found the location, it’ll suggest a fix that [ibm-swe-agent-2.0] could implement to resolve the issue. [ibm-swe-agent-2.0] could then review the proposed fix using other agents.
I made a few minor edits, but I think we all know this is coming. This calls itself "for developers" for now, but really also it's "instead of developers", and at some point the mask will come off.
It will suck to babysit LLMs as a job. In one sense perhaps it will be nice to have models do the chores. But I fear we'll be 90% babysitting. Today I was in an hour-long chat with ChatGPT about a problem when it circled back to its initial (wrong) solution.
I have very little fear for my own job no matter how good models get. What happens is that software gets cheaper and more of it is bought. It’s what happened in every industry with automation.
Those who can’t operate a machine though (in this case an AI) should maybe worry. But chances are their jobs weren’t very secure to begin with.
Babysitting LLMs is already my job and has been for a year. It's kind of boring, but honestly, after nearly 20 years in the game I felt like I was approaching endgame for programming anyway.
All the project/product managers that think they are the ones responsible for team success are going to get a rude awakening. When they try to do the job of an entire team, it's going to come apart pretty quickly. LLMs are a tool, nothing more, they don't magically imbue the user with competency.
They're not going to try to do the job, they're going to hire cheaper, worse SWEs to manipulate AI... and then things will come apart pretty quickly :) But they'll still have someone else to blame.
> LLMs are a tool, nothing more, they don't magically imbue the user with competency.
Not a good take though, IMO. They're literally a tool that can teach you how to use them, or anything else.
> > LLMs are a tool, nothing more, they don't magically imbue the user with competency.
> Not a good take though, IMO. They're literally a tool that can teach you how to use them, or anything else.
I disagree. In their current incarnation, LLMs require a human subject matter expert to determine if the output is valid. In the project manager team lead example, the LLM won't tell you if the database is sized correctly, or if you even need a database.
They will cover their bases to make sure that never happens; I'm sure of it. AI is great for PM/Product/C-Suite types (i.e. the decision makers). Bad for the doers/builders long term, IMO.
I don't care. I swore to myself that if the time comes my skills will no longer be needed, I'd gracefully ride into the sunset and do some other thing.
Sounds nice until you actually have to find some other thing, especially with the bar for entry being high for most interesting and well compensated jobs. It will be even worse when you have huge numbers of other devs also looking for a new job.
That’s their goal, no doubt. And I’m sure a lot of zombie projects will be blindly turned over to this type of agent and left to rot. But in practice, these agents will never replace humans, because someone will have to oversee them, and that human will probably just be the “developer” that was “replaced” by them. The work will suffer, the quality will suffer, the enjoyment of the human will suffer, the costs will increase, but some salesperson and some mid level exec will be able to claim they sold and deployed AI and get a bonus.
Developers are not going to go away, but the cushy high salaries likely will. Skill development follows a logarithmic curve, where an AI boost to junior devs will be much bigger than the boost given to senior devs. This discrepancy will pull down the value of devs, as you will get "more bang for your buck" from lower-tier devs, since the AI is comparatively free.
Although I also wonder about the development of new languages that may be optimized for transformers, as it seems clumsy and wasteful to have transformers juggle all the tokens needed to make code readable by humans. It would be really interesting to have a model that outputs code that functions incredibly well but is indecipherable by humans.
Junior devs don't always understand enough to know why something should or shouldn't be done.
I don't think junior devs are going to benefit; if anything, the whole role of 'junior' has been made obsolete. The rote/repetitive work a junior would traditionally do can now be delegated wholesale to an LLM.
I figure productivity is going to increase a lot. We'll need fewer developers as a result. The duties associated with developers are going to morph and become more solutions/architecture oriented.
I run a startup accelerator with a law firm partner (but not a legal accelerator) - and some of the stuff I hear in the lunchroom is wild. No doubt the firm is going to do extremely well un-fucking gen AI legal mess.
Wondering if anyone running a Python mainly application(s) is willing to be a Design Partner for a true AI Agent that can fix a few issues but with very high accuracy and no oversight.
I am from LogicStar.ai, and we are trying to cut through the BS and show true value in real-world applications with deep program analysis, reproduction, verification, etc., while (like everyone else) sourcing fix suggestions from LLMs.
Follow our page at https://www.linkedin.com/company/logicstar-ai/ as I plan to post updates and frank comments there on what other agents are or are NOT.
Your input will drive the direction in which we focus the technology and use cases.
>That score places the IBM SWE agent high up the SWE-bench leaderboard, well above many other agents relying on massive frontier models, like GPT-4o and Claude 3.
They're not even in the top half of the leaderboard. Almost half the score of the first place agent.
"It made sense for IBM to build agentic tools like these, argues Ruchir Puri, chief scientist at IBM Research, not just for its own developers, but for all the enterprise developers IBM strives to assist."
What a weird sentence. Mx. Puri does not argue anything, this is just an unfounded claim. So far it just looks like snake oil that is to be sold to other companies.
This would actually be a good business strategy: Sell software that diminishes productivity to your competition and watch them disintegrate.
I would have liked to see a giant ppt of an agentic framework or architecture.
Call it Enterprise Agentic Framework or something like that.
The architecture diagram would fill an entire ppt slide and bedazzle its customers.
I wonder what kinds of errors it can actually detect. I’d love to throw it at my support queue: find the reason this thing got stuck in the interaction between three state machines which are not defined as state machines.
A combination of static analysis, reproducing the problem first, then trying various fix suggestions and verifying that they work and do not cause other regressions: those are a few of the steps that will need to be involved before an experienced developer can review this "very junior dev"/AI work and push it to main in any commercial application.
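The reproduce-then-verify loop described above can be sketched in a few lines. This is a hedged illustration only: `reproduce_failure`, `apply_patch`, and `run_tests` are hypothetical placeholders for whatever harness an agent actually uses, not a real tool's API.

```python
# Sketch of a reproduce -> patch -> verify loop: accept the first
# candidate patch that makes the reported failure go away while
# keeping the rest of the test suite green.

def validate_fix(repo, bug_report, candidate_patches,
                 reproduce_failure, apply_patch, run_tests):
    """Return the first patch that fixes the bug without regressions."""
    # 1. Confirm the bug is reproducible before trying anything.
    if not reproduce_failure(repo, bug_report):
        return None
    for patch in candidate_patches:          # e.g. LLM-sourced suggestions
        patched = apply_patch(repo, patch)
        # 2. The original failure must no longer reproduce...
        if reproduce_failure(patched, bug_report):
            continue
        # 3. ...and the full test suite must pass (no new regressions).
        if run_tests(patched):
            return patch
    return None
```

Even with a loop like this, the returned patch is only a candidate; the human review step it feeds into is the part that can't be skipped.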
The design partnership offer of https://www.linkedin.com/company/logicstar-ai/ stands if you have core Python for now. More languages will come once we get real-world feedback as a proof of concept.
What worries me most is that because there is no way to prove the negative value of these agentic scams, and because swe teams are (sadly) compressible to some extent, some companies will simply let go 10% of their workforce while the remaining 90% will have no choice but to keep trudging along, with the additional "benefit" of having to show the positive value of this scam to their hierarchy (unless they want to join the 10%). So much waste and sadness all around.
I said this before I left IBM, and I will say it again.
These and other models IBM is working on can do basic tasks that anyone else could. But it will all fall apart the moment you add complexity to it.
It's hilarious to see how IBM struggles to stay relevant. What did that lead to? A bot that summarizes a stack trace. Why is this even on the front page of HN?
I do see the /s but I do find that an interesting thought experiment since (a) I'd guess the number of humans who can actually debug RPG is pretty small and (b) so where are these magical agents that are "gonna take muh job" going to get training data for the code or the fixes to any such bugs
[1] https://github.com/Qiskit/qiskit/issues/9562