
Every post that claimed to use ChatGPT for non-trivial tasks turned out to involve non-trivial human intervention.

> (from the original article) In fact, I found it better to let ChatGPT generate a toy-ish version of the code first, then let it add things to it step-by-step. This resulted in much better output than, say, asking ChatGPT to generate production-quality code with all features in the first go. This also gave me a way to break down my requirements and feed them one at a time - as I was also acting as a code-reviewer for the generated output, and so this method was also easier for me to work with.

It takes a human who really knows the area to instruct ChatGPT, review the output, point out the silly mistakes in the generated nonsense, and start the next iteration. These curated posts always cut out most of the conversation and the failed attempts, then stitch together the successful attempts and the high-quality outputs. Sure, it will be helpful as a super-IntelliSense. But not as helpful as the post suggests.

I've tried to do something like what's described in the post, but I quickly got bored with waiting for output, reviewing it, and all the iterations. One important aspect of programming is that reading code is not necessarily easier than writing it. In my case, it's more painful.



ChatGPT is a junior developer whose knowledge is broad but shallow.


IMO this leaves out some salient details. For example, I'd say ChatGPT is a very, very good junior developer. The kind of junior developer that loves computer science, has been screwing around with miscellaneous algorithms and data structures its whole life, has a near-perfect memory, and is awake 24/7/365, but has never had to architect a data-intensive system, write future-proof code, or write code for other developers. Of course, these last three things are a big deal, but the rest of the list makes for a ridiculously useful teammate.


It also has a very broad knowledge of programming languages and frameworks. It's able to onboard you with ease and answer most of your questions. The trick is to recognize when it's confidently incorrect and hallucinating API calls.


But that's the secret: it's always hallucinating.


What do you mean when you say this? Most people use hallucinate to mean "writes things that aren't true". It clearly and demonstrably is able to write at least some code that is valid and write some things that are true.


These models don't have a frame of reference to ground themselves in reality with, so they don't really have a base "truth". Everything is equally valid if it is likely.

A human in a hallucinogenic state could hallucinate a lot of things that are true. The hallucination can feature real characters and places, and could happen to follow the normal rules of physics, but they are not guaranteed to do so. And since the individual has essentially become detached from reality, they have no way of knowing which is which.

It's not a perfect analogy, but it helps with understanding that the model "writing things that aren't true" is not some statistical quirk or bug that can be solved with a bandaid, but rather is fundamental to the models themselves. In fact, it might be more truthful to say that the models are always making things up, but that often the things they are making up happen to be true and/or useful.


Precisely. The model is just regurgitating and pattern-matching over a large enough training set that the outputs happen to look factual or logical. We're just anthropomorphizing these concepts onto statistical models, so it's not much different from Jesus Toast.


I think this is a great way to think about it. Hallucinations are the default and an LLM app is one that channels hallucinations rather than avoids them.


Yeah, the junior dev analogy misses on the core capabilities. The ability to spit out syntactically correct blocks of code in a second or two is a massive win, even if it requires careful review.


Yup, it's been a help for me. A buddy asked me if I could automate a WordPress workflow for post submissions he had to deal with. With a little prodding, I got ChatGPT to create a small workflow for me. I cleaned it up a bit and threw it into AWS Lambda for literally $0. He was super thankful and hooked me up with a bunch of meat (his dad is a butcher), and I spent maybe an hour on it.
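The glue code for that kind of thing is tiny - roughly this shape, though the endpoint and field names here are made up and the real thing depended on his WordPress setup:

    # Hypothetical sketch: a Lambda handler behind API Gateway that receives a
    # WordPress post-submission webhook, tidies the payload, and forwards it on.
    import json
    import urllib.request

    DOWNSTREAM_URL = "https://example.com/intake"  # made-up destination

    def lambda_handler(event, context):
        # API Gateway hands the webhook body over as a JSON string
        post = json.loads(event.get("body") or "{}")

        cleaned = {
            "title": (post.get("title") or "").strip(),
            "author": post.get("author", "unknown"),
            "content": post.get("content", ""),
        }

        req = urllib.request.Request(
            DOWNSTREAM_URL,
            data=json.dumps(cleaned).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            forwarded = resp.status == 200

        return {"statusCode": 200, "body": json.dumps({"forwarded": forwarded})}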


This is getting a bit heavy on the anthropomorphizing; it's an LLM with certain capabilities.

For example, I would not expect the same junior engineer to have such variance, given the same inputs.


I completely agree. IMO anthropomorphisms of LLMs leave out extremely important details.


> and is awake 24/7/365

The whole thing is a really accurate expansion on the analogy. It even extends further to explain how it tends to forget certain requirements it was just told and tends to hallucinate at times.


The worst part is that it doesn't know when it doesn't know, so it makes up garbage.


> The worst part is that it doesn't know when it doesn't know, so it makes up garbage.

Hehe, that's true for humans too.


I've seen this in juniors as well.


And unfortunately in just as many seniors.


Well, besides the prose, ChatGPT generates a perfectly valid-looking code mashup of, say, Qt, wxWidgets, and its own hallucinations on top of that. Humans don't do that :)


Now, who is going to mentor real human junior developers? Because they won't progress by themselves (or not many will).

What's the incentive for companies to invest in junior developers now?


Maybe the seniors will get replaced by juniors with ChatGPT?

It is cheaper…


Doubt it will go beyond that either. It is equivalent to a spell checker and a calculator having a baby.

It will take the world by storm and change the way we do things. But it won't change the work that needs to be done. Just the way we consume it.


Disagree. GPT is a senior who knows it all but doesn't know where to start unless you tell it precisely what to do.


"Developer who needs precise instructions to accomplish every task" is the exact opposite of a senior developer


I'm actually not so sure about that statement. For example, knowing whether the code will be executed on a Raspberry Pi, an HPC node with 10TB of RAM and 512 CPUs, or a home desktop with 128GB of RAM and an 8-core CPU greatly affects how the task should be done. The same goes for whether code aesthetics matter and the dependencies allow for them, whether dependencies should be kept to a minimum, whether performance is more important, or whether saving disk space is paramount. All of these considerations (or the need to run easily on any of these targets) heavily change the direction of what should be written, even after the language and such have been chosen. So yes - you effectively do need to specify quite a bit to a senior dev if you want specific properties in the output, so it's no surprise that the same needs to be specified to a linguistic interface to coding like these LLMs.


I guess it depends on how you'd define "senior" in this context: someone who knows lots of tech stacks, or someone who has ideas of their own. Of course, that doesn't map directly onto people's skills, because most people develop skills along several dimensions at once.


> Every post that claimed to use ChatGPT for non-trivial tasks turned out to involve non-trivial human intervention.

That means full autonomy has been reached in 0% of applications. How do we go from 0 to 1? And until we remove the human from the loop, iteration speed is still human speed, and the number of AI agents is <= the number of human assistants.

The productivity boost from current-level AI is just around 15%, as reported in some papers. The percentage of code written by Copilot is about 50%, but it mostly helps write out the easy parts and does little for debugging, designing, releasing, etc., which take the bulk of the time - so it probably comes back to roughly a 15% boost.
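Back-of-envelope version of that argument (both numbers are assumptions for illustration): if writing code is ~30% of total engineering time and the assistant halves that part, the overall gain is modest.

    # Amdahl-style estimate: only the "typing code" slice gets faster
    coding_fraction = 0.30   # assumed share of total time spent writing code
    coding_speedup = 2.0     # assume the assistant halves that part

    new_total = (1 - coding_fraction) + coding_fraction / coding_speedup
    print(f"overall speedup: {1 / new_total:.2f}x")  # ~1.18x, i.e. roughly a 15-20% boost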


OK, but this is extremely new tech. All of that will get better over time, and the AI will require less and less intervention.


I don't think so. Ultimately there's not enough information in prompts to produce "correct" code. And any attempt to deliver more information will result in a worse programming language, or as it is now, more iterations.


Many high quality human programmers could go off and make a very good program from a simple description/prompt. I see no reason an LLM couldn’t do the same.

On top of that, there’s no reason an AI couldn’t ask additional questions to clarify certain details, just like a human would. Also as this tech gets faster, the iteration process will get more rapid too, where a human can give small bits of feedback to modify the “finished product” and get the results in seconds.


English is a programming language now. That is what is being demonstrated here. Code is still being written; it just looks more like instructions given to a human programmer.

Eventually, human languages will be the only high-level programming languages. Everything else will be thought of the way we currently think of assembly code: a tool of last resort, used only in unusual circumstances when nothing else will do.

And it looks like "Eventually" means "In a year or two."


English is a programming language once you stop looking at or storing the output of the LLM - like a binary. I'm not seeing anybody store their prompts in a source repo and hook them directly up to their build pipeline.
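To be concrete, the thing nobody seems to be doing would look roughly like this: a checked-in prompt file and a build step that regenerates the code from it. The LLM client here is a made-up placeholder.

    # Hypothetical build step: prompts/parser.prompt lives in the repo and the
    # generated module is never reviewed, only rebuilt. some_llm_client is invented.
    from pathlib import Path
    import some_llm_client  # placeholder for whatever LLM SDK you'd use

    prompt = Path("prompts/parser.prompt").read_text()
    code = some_llm_client.complete(prompt)  # a non-deterministic "compiler"
    Path("src/parser.py").write_text(code)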


We'll be adding flaky code gen to our flaky tests, because someone will do this.


What programming language do your stakeholders use to communicate their ideas during planning meetings? Unfortunately, mine can only speak English…


The issue is that they speak English, think in English, and ask for X in English - but what they actually need is Y.

ChatGPT will not help with that.


The point is that the roles are reversed not that you give ChatGPT to the stakeholders. ChatGPT is a programmer you hire for $30/month and you act as its manager or tech lead.

This is pointless to argue about, though, since it's apparent there are people for whom this just doesn't fit into their workflow for whatever reason. It's like arguing over whether to use an IDE.


Seems like if it can eventually test that the output meets the criteria then it will excel.


But when the code doesn't meet the requirements, the AI needs to know what's incorrect and what changes it needs to make, and that still requires a human. Unless you just put it into a loop and hope that it produces a working result eventually.


So what if you don't "just put it into a loop and hope" but actually build a complex AI agent with static code analysis capabilities, a graph DB, a working memory, etc.?

I'm doing just that and it works surprisingly well. Currently it's as good as people with 2-3 years of experience. Do you really believe it's not going to improve?

Now I'm making a virtual webcam so it has a face and you can talk to it on a Zoom meeting...
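In the simplest possible terms, the core loop looks something like this - heavily simplified, with the LLM call stubbed out and an arbitrary linter standing in for the real analysis, graph DB, and memory:

    # Generate code, run it through static analysis, and feed the findings back
    # into the next generation round until the checker is happy or we give up.
    import os
    import pathlib
    import subprocess
    import tempfile

    def generate(prompt: str) -> str:
        raise NotImplementedError("call your LLM of choice here")  # placeholder

    def lint(code: str) -> str:
        fd, path = tempfile.mkstemp(suffix=".py")
        os.close(fd)
        pathlib.Path(path).write_text(code)
        result = subprocess.run(["ruff", "check", path], capture_output=True, text=True)
        return "" if result.returncode == 0 else result.stdout

    def solve(task: str, max_rounds: int = 5) -> str:
        code, findings = "", ""
        for _ in range(max_rounds):
            prompt = task if not findings else f"{task}\n\nFix these findings:\n{findings}"
            code = generate(prompt)
            findings = lint(code)
            if not findings:
                break
        return code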


Do you have a presentable demo? An LLM augmented by static code analysis sounds very interesting.


I don't have GPT-4 API access yet... Using my ChatGPT Plus subscription so far. Will make a release once I get the API.



