Seriously, LLMs are remarkable tools, but they are horribly unreliable. What tasks could such an autonomous agent do (beyond what a chatbot, perhaps extended with web access, already does)? I mean, which task is so complex that it can't be automated with simple scripting, yet so non-critical that it's fine to let an LLM loose on it? BTW, running those models is rather expensive, so the task also has to be fairly expensive today, perhaps one currently completed by a human.
> I mean, which task is so complex that it can't be automated with simple scripting, yet so non-critical that it's fine to let an LLM loose on it?
Many tasks that deal with unstructured human input: translation, summarization, data extraction, etc. The choice isn't between "reliable code" and "unreliable LLM", but between "unreliable LLM" and nothing at all.
But I think the more important aspect is that LLMs are getting quite good at coding, meaning they can be used to automate all the tasks you could in principle automate with scripting, but where the script wouldn't be simple, or you don't know the relevant tooling well enough, or you just can't be arsed to do it yourself.
For example, I've recently been using GPT-4 to automate little things in my Emacs. GPT-4 actually sucks at writing Emacs Lisp, but even having to correct it myself is much less effort than writing those bits from scratch, as the LLM saves me the step of figuring out how to do it. This is enough to make a difference between automating and not automating those little things.
First we had outsourcing to Indians, now we have ChatGPT. There's almost a rule of thumb: the less you pay, the bigger the pile of shit you get. At least with ChatGPT you can vet the output first, but with the market flush with devs who have 1-2 years of experience, globally the vetting will be shit too. I honestly wonder what will happen to all these LLMs when the training set gets over-represented with cheap, fast, crappy code written by LLMs themselves. I bet "content inbreeding" will become a hot topic in the future.
>when the training set will get over-represented with cheap, fast, crappy code written by LLMs themselves
It's already happening. An MIT study that came out last week found that Amazon Mechanical Turk workers hired to do RLHF-style training of models were using ChatGPT to select the best answers. And the web being polluted by AI-generated content, which then gets scraped into Common Crawl and other training datasets, has been an issue for a couple of years now.
It’s just a new tool; it isn’t outsourcing. It sounds like the person you’re replying to uses it the same way I do, which is basically as a way to quickly brainstorm solutions. It acts as a rubber duck that forces you to explain the problem clearly, but with the added benefit of suggesting mostly correct code. It’s a bit like pair programming where you tell it the high-level things to do and it hammers out the boilerplate while you review in real time and point out mistakes as they happen.
I think you’re completely wrong about new devs producing worse code with this new tool. On the contrary, they’re going to be able to learn things it took you years to master in a matter of months since they now have a private tutor/mentor/reviewer/domain expert/consultant on call 24/7 for $20 a month.
As for using generated content as input: Microsoft has published a few papers showing that curated generated content can be used to train specialized models that are more competent in their domain than the original models, and the kicker is they didn’t even use humans to curate the content; they just used existing language models!
IMO, if you look at how much better GPT-4 is at coding compared to GPT-3.5, and at the advances in letting GPT test and debug its own code, it’s not going to be “cheaper and worse” in the future.
GPT and LLMs will allow good and seasoned programmers to produce better code in time-constrained environments.
At a quick glance, the idea that a market flush with newer developers would automatically lead to a dip in technical advancement in the near future seems completely made up. Frankly, I'd believe the opposite. I'd expect a dip in the near future only if a large portion of the more senior engineers suddenly died and took their knowledge with them. That's not going to happen.
It all seems to be correlated with growth: the fastest-growing areas of the industry, the ones that are hot and considered a career gateway and therefore attract the largest number of outsiders, are the biggest circuses on wheels.
It might be that the moat for large LLM providers will be their ability to pay good developers to write good code solely for the purpose of feeding the training corpus of the Coding-LLM-as-a-Service.
Here's an example task that you cannot automate with a basic script: for each e-mail you receive, convert it into JSON of a given format depending on the e-mail type (e.g. delivery notification, newsletter, meeting request, and so on). Also tag it if it relates to a specific area of interest.
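A rough sketch of how that extraction pipeline could look. Everything here is made up for illustration: the schemas, `build_prompt`, and especially `call_llm`, which is stubbed out so the example runs offline; a real version would send the prompt to whatever LLM API you use and validate whatever comes back.

```python
import json

# One field list per e-mail type; the LLM is asked to pick a type
# and fill in the matching fields.
SCHEMAS = {
    "delivery_notification": ["carrier", "tracking_number", "eta"],
    "meeting_request": ["organizer", "proposed_time", "location"],
    "newsletter": ["publication", "headline"],
}

def build_prompt(email_body: str) -> str:
    """Assemble an extraction prompt that pins the model to known schemas."""
    schema_desc = json.dumps(SCHEMAS, indent=2)
    return (
        "Classify the e-mail below as one of these types and extract the "
        f"listed fields, answering with JSON only:\n{schema_desc}\n\n"
        f"E-mail:\n{email_body}"
    )

def call_llm(prompt: str) -> str:
    # Stub standing in for a real model call; a real implementation would
    # send `prompt` to an LLM endpoint and return its text response.
    return ('{"type": "delivery_notification", "carrier": "UPS", '
            '"tracking_number": "1Z999", "eta": "2023-07-01"}')

def extract(email_body: str) -> dict:
    raw = call_llm(build_prompt(email_body))
    parsed = json.loads(raw)            # LLM output is untrusted: parse...
    expected = SCHEMAS[parsed["type"]]  # ...then validate against the schema.
    missing = [f for f in expected if f not in parsed]
    if missing:
        raise ValueError(f"LLM response missing fields: {missing}")
    return parsed

result = extract("Your UPS package 1Z999 arrives July 1.")
```

The validation step is the point: you don't get reliability from the model, you get it from checking the model's output against a schema and retrying or escalating when it doesn't conform.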
Another case, which a friend of mine built: summarise a stream of news from Twitter/Telegram so that no news is lost, duplicates are removed, and bad writing is reformatted.
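The duplicate-removal half of such a pipeline doesn't even need the LLM. A crude sketch using plain string similarity as a cheap stand-in for embedding-based comparison; the threshold and the feed contents are arbitrary, and the summarise/reformat step would be the LLM call:

```python
from difflib import SequenceMatcher

def is_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two items as duplicates if their texts are mostly identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def dedupe(items: list[str]) -> list[str]:
    """Keep only the first occurrence of each group of near-duplicates."""
    kept: list[str] = []
    for item in items:
        if not any(is_duplicate(item, k) for k in kept):
            kept.append(item)
    return kept

feed = [
    "Central bank raises rates by 0.5%",
    "Central bank raises rates by 0.5% today",
    "New phone model announced at conference",
]
unique = dedupe(feed)   # the second item is dropped as a near-duplicate
```

In practice you'd replace `SequenceMatcher` with embedding similarity so that two differently-worded reports of the same event also collapse into one; character-level similarity only catches near-verbatim repeats.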
LLMs can be used to propose candidates in chemistry, medicine, coding, and basically any domain where you have enough data to build a model but searching the solution space is exponentially hard. They can be the core module in an agent that learns by RL or evolutionary methods, so the quality depends only on how much data the agent can generate. We know from AlphaGo Zero that when an agent can do proper exploration it can reach superhuman levels: it builds its own specialised dataset as it runs.
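The loop structure being described is the classic propose-evaluate-select one. A toy (1+1) evolutionary sketch, where `propose` is a stand-in for an LLM suggesting candidate variations and `score` plays the role of the domain evaluator; the string-matching task and all names are invented so the loop is runnable:

```python
import random

TARGET = "hello"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def score(candidate: str) -> int:
    """Domain evaluator: number of positions matching the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def propose(parent: str) -> str:
    # Stand-in for an LLM proposal: mutate one random position of the parent.
    i = random.randrange(len(parent))
    return parent[:i] + random.choice(ALPHABET) + parent[i + 1:]

random.seed(0)
best = "aaaaa"
for _ in range(5000):
    child = propose(best)
    if score(child) >= score(best):   # (1+1) selection: keep non-worse candidates
        best = child
```

The point of the sketch is the division of labour: the proposer only has to be creative, not correct, because the evaluator filters its output, and every evaluated candidate becomes training data the agent generated for itself.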
Imagine an agent that can automatically create and post ads on LinkedIn, AdSense, etc. Imagine you can also feed analytics back to it, making each successive ad better optimized. These are the types of mundane activities I dream of offloading to GPTs until the entire concept of online ads becomes irrelevant.
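The analytics-feedback part can be framed as a bandit problem. A minimal epsilon-greedy sketch in which ad variants (which an LLM might have drafted) are chosen by observed click-through rate; the click simulation and all the numbers are invented, and a real system would read metrics from the ad platform's reporting API instead:

```python
import random

random.seed(1)
variants = {"ad_a": 0.02, "ad_b": 0.05, "ad_c": 0.03}  # true (hidden) click rates
shows = {v: 0 for v in variants}
clicks = {v: 0 for v in variants}

def ctr(v: str) -> float:
    """Observed click-through rate so far for a variant."""
    return clicks[v] / shows[v] if shows[v] else 0.0

def pick(epsilon: float = 0.1) -> str:
    # With probability epsilon explore a random variant; otherwise
    # exploit the variant with the best observed CTR.
    if random.random() < epsilon:
        return random.choice(list(variants))
    return max(variants, key=ctr)

for _ in range(20_000):
    ad = pick()
    shows[ad] += 1
    if random.random() < variants[ad]:   # simulated click feedback
        clicks[ad] += 1
```

Over enough rounds the loop tends to concentrate impressions on the highest-CTR variant while still exploring; the LLM's job in this picture is only generating new variants to feed into the pool.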
I agree. The whole LLM-agent stuff is totally overhyped. LLMs cannot really plan, and all the workarounds people are currently using are not very reliable. Nice for demos, but unusable for a "product".