
We already have agentic systems; they're not particularly impressive (1).

There's no specific reason to expect them to get better.

Things that will shift the status quo are: MCST-LLMs (like with ARC-AGI) and Much Bigger LLMs (like GPT-5, if they ever turn up) or some completely novel architecture.

[1] - It's provable: if chaining LLMs of a particular size into agentic systems could scale indefinitely, then you could use a 1-param LLM and get AGI. You can't. QED. Chaining LLMs into agentic systems has a capped maximum level of function, which we basically already see with current LLMs.

i.e. Adding 'agentic' to your system has a finite, probably already reached, upper bound of value.



> It's provable: if chaining LLMs of a particular size into agentic systems could scale indefinitely, then you could use a 1-param LLM and get AGI. You can't. QED.

Perhaps I misunderstand your reply, but that has not been my experience at all.

There are 3 types of "agentic" behaviour that have worked for a while for me, and I don't know how else they would work without "agents":

1. Task decomposition - this was my manual flow since pre-chatgpt models: a) provide an overview of topic x with chapter names; b) expand on chapter 1 ... n; c) make a summary of each chapter; d) make an introduction based on the summaries. I now have an "agent" that does that with minimal scripting and no "libraries": just a pure Python control loop.

This gets me pretty reasonable documents for my daily needs.

2. tool use (search, db queries, API hits). I don't know how you'd use an LLM without this functionality. And chaining them into flows absolutely works.

3. coding. I use the following "flow" -> input a paragraph or 2 about what I want, send that + some embedding-based context from the codebase to an LLM (3.5 or 4o, recently o1 or gemini) -> get code -> run code -> /terminal if error -> paste results -> reiterate if needed. This flow really works today, especially with 3.5. In my testing it needs somewhere under 3 "iterations" to "get" what's needed in more than 80% of the cases. I intervene in the remaining 20%.
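The task-decomposition loop in point 1 can be sketched in a few lines. This is a minimal sketch, not the commenter's actual script; `call_llm` is a hypothetical placeholder for whatever API client you use, stubbed here so the control flow itself runs:

```python
# Sketch of the a) outline -> b) expand -> c) summarize -> d) intro flow.
# call_llm is a stand-in for a real completion call (openai, anthropic, ...).

def call_llm(prompt: str) -> str:
    # Stub: echoes the prompt so the loop is runnable without an API key.
    return f"[LLM output for: {prompt[:40]}]"

def write_document(topic: str, n_chapters: int = 3) -> str:
    # a) overview of the topic with chapter names
    outline = call_llm(f"Provide an overview of {topic} with {n_chapters} chapter names.")
    # b) expand on chapter 1 ... n
    chapters = [call_llm(f"Expand on chapter {i} of this outline:\n{outline}")
                for i in range(1, n_chapters + 1)]
    # c) summary of each chapter
    summaries = [call_llm(f"Summarize this chapter:\n{ch}") for ch in chapters]
    # d) introduction based on the summaries
    intro = call_llm("Write an introduction based on these summaries:\n" + "\n".join(summaries))
    return "\n\n".join([intro] + chapters)

print(write_document("garden irrigation", n_chapters=2))
```

Swap the stub for a real client and the same loop structure holds; no agent framework is required.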
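The code -> run -> paste-error -> reiterate flow in point 3 is likewise a plain loop. Again a sketch under assumptions: `generate_code` is a hypothetical stand-in for the LLM call (here it fakes a buggy first draft and a corrected retry), while the run-and-capture-stderr step is real:

```python
# Sketch of the run-code / feed-traceback-back / retry loop.
import subprocess
import sys
from typing import Optional

def generate_code(prompt: str, error: Optional[str] = None) -> str:
    # Stub imitating an LLM that fixes its code after seeing the traceback.
    if error is None:
        return "print(1 / 0)"  # buggy first draft
    return "print('fixed after seeing the traceback')"

def run(code: str) -> tuple:
    # Execute the generated code in a subprocess and capture stderr.
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, timeout=30)
    return proc.returncode == 0, proc.stderr

def code_loop(prompt: str, max_iters: int = 3) -> str:
    error = None
    for _ in range(max_iters):
        code = generate_code(prompt, error)
        ok, error = run(code)
        if ok:
            return code  # success: keep this version
    raise RuntimeError(f"still failing after {max_iters} iterations:\n{error}")

print(code_loop("plot the dataset"))
```

The cap on iterations matches the observation above: if it isn't converging within ~3 rounds, a human steps in.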


A Zed user? Love that editor and the dev flow with it.


Haha, yes! I'm trying it out and have been loving it so far. I found that I go there for most of my EDA scripts these days. I do a lot of dataset collection and exploration, and it's amazing that I can now type one paragraph and get pretty much what it would have taken me ~30 min to code myself. Claude 3.5 is great for most exploration tasks, and the flow of "this doesn't work /terminal" + Claude using prints to debug is really starting to come together.

I use zed for this, cursor for my more involved sessions and aider + vscode + continue for local stuff when I want to see how far along local models have come. Haven't tried cline yet, but heard great stuff.


I didn’t say they don’t work, I said there is an upper bound on the function they provide.

If a discrete system is composed of multiple LLMs, the upper bound on the function it provides is set by the function of the underlying LLM, not the number of agents.

I.e. we already have agentic systems.

Saying “wait till you see those agentic systems!” is like saying “wait til you see those c++ programs!”

Yes. I see them. Mmm. Ok. I don’t think I’m going to be surprised by seeing them doing exactly the same things in a year.

The impressive part in a year will be the non-agentic part of things.

I.e., explicitly: if the underlying LLMs don't get any better, there is no reason to expect a system built out of them to get any better.

If that was untrue, you would expect to be able to build agentic systems out of much smaller LLMs, but that overwhelmingly doesn’t work.


> if the underlying LLMs dont get any better, there is no reason to expect the system built out of them to get any better.

Actually o1 and o3 are doing exactly this, and very well. I.e., explicitly: with proper orchestration, the same LLM can do a much better job. There is a price, but...

> you would expect to be able to build agentic systems out of much smaller LLMs

Good point, it should be possible to do it on a high-end pc or even embedded.


> but that overwhelmingly doesn’t work.

MCTS will be the next big “thing”; not agents.


They are not mutually exclusive. Likely we'll get a clearer separation of architecture and underlying technology. In that case agents (i.e. the architecture) can use different technologies or a mix of them, including 'AI' and classical algorithms. The trick is to make them work together.





