Testing GPT 4's code-writing capabilities with some real world problems (tylerglaiel.substack.com)
567 points by TylerGlaiel on March 17, 2023 | 654 comments



I want to see GPT-4 dealing with this situation:

- they: we need a new basic POST endpoint

- us: cool, what does the api contract look like? URL? Query params? Payload? Response? Status code?

- they: Not sure. Third-party company XXQ will let you know the details. They will be the ones calling this new endpoint. But in essence it should be very simple: just grab whatever they pass and save it in our db

- us: ok, cool. Let me get in contact with them

- ... one week later...

- company XXQ: we got this contract here: <contract_json>

- us: thanks! We'll work on this

- ... 2 days later...

- us: umm, there's something not specified in <contract_json>. What about this part here that says that...

- ... 2 days later...

- company XXQ: ah sure, sorry we missed that part. It's like this...

- ...and so on...
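
And the endpoint itself, once the contract finally settles? That part is trivial - roughly a sketch like this (Flask just for illustration; every name and field here is made up):

    from flask import Flask, request, jsonify

    app = Flask(__name__)
    saved_events = []  # stand-in for "save it in our db"

    @app.post("/webhooks/xxq")                  # hypothetical path; XXQ never confirmed one
    def receive_xxq_payload():
        payload = request.get_json(force=True)  # "just grab whatever they pass"
        saved_events.append(payload)            # pretend this is the actual db write
        return jsonify({"status": "ok"}), 201   # status code still to be agreed on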

Basically, 99% of the effort is NOT WRITING CODE. It's all about communication with people, and problem solving. If we use GPT-X in our company, it will help us with 1% of our workload. So, I couldn't care less about it.


This is so common in many types of business, and usually a very difficult point to articulate, so thank you for that. It's something to be shown to those ringing the death-knell for programmers, artists, and the like.

Those death-knell types seemingly aren't aware of what day-to-day operations look like and how AI makes a great tool, but doesn't necessarily deal with the very human factors of whims, uncertainty, reactive business mentalities and the phenomenon that is best summarised by this webcomic: https://theoatmeal.com/comics/design_hell

In my field they like to call this "hurry up and wait", a nonsensical but fitting description that summarises everything from changing scope to the unjust imbalance of time between the principal and the agent.

(There is a comment further down which suggests that we could just train AI to deal with this variability, I hope that's humour... sweet summer child thinks you can train AI to predict the future.)


I think the fear should be less about AI taking 100% of jobs and more about AI making a single programmer do the job of 5, which would wipe out a majority of the market and make it a non-viable career option for most.

Companies are already bloated, imagine when they realize one overworked highly paid senior can replace 10 juniors.


If increased productivity equaled job loss, there would be two programmers alive today, doing the same job as the fewer than 10000 programmers using punch cards as we entered the year 1950.

A lot of projects today are not even greenlit because they would be too expensive to make. For instance, there are a lot of processes in almost every country that require you to file paper forms, even though we have had web forms and databases for 25 years. This goes even for rich countries. If one programmer is able to do the job of 5 others, we will probably have 5 times as many systems to tend to now that businesses and institutions can afford them.


This is true for other domains also. My wife translates for an ecommerce business. A lot of what she does is very automated. The software she uses remembers all phrases she has translated, uses a word bank she manages for special parts, and uses DeepL for new translations, which are then only proofread and fine-tuned. It's amazing how much stuff she can translate that way in a day (no idea how many thousands of words). She is kind of managing and overseeing an AI (DeepL) and sanity checking the work. If this was 20 years ago one would probably need a translation department of 10-15 people to do the same work. However, her company would never have been able to justify that kind of cost. So, in my view: Software and AI made translators probably more than 10x more efficient in the last 10 years; however, the amount of stuff that gets translated also increased tenfold during this time.


Yep! I think this is both the optimistic case, and the version of the future that seems most likely to me.

I'm still a bit bummed about it, because just writing and structuring code is something that often brings me joy (and maybe your wife feels similarly about the actual translation work...), but at the end of the day, I realized well before all this AI craze that it really isn't the most valuable thing I do in my job.


Something like 97% of all jobs have been eliminated since the Industrial Revolution started ~250 years ago.

And instead of the 97% unemployment our intuition tells us would happen, we're instead 30x as productive.

The same will happen when/if AI makes writing software far more effective.


> … instead of the 97% unemployment … we’re instead 30x as productive.

Economic productivity simply means output per worker. So, we could make 97% of the US population permanently unemployed tomorrow, and as long as the AI replacement labor yielded equal or greater proceeds to the remaining 3%, we’d see a colossal increase in “productivity.”

That would be in the interest of the 3% to do (or attempt), which is what makes the scenario so scary.


But that’s never happened in the past. Why should it happen now? The Industrial Revolution is used as an example because it was the biggest spike in productivity, but the general trend has been the same since the invention of tools. Agriculture, another perfect example, led to others having time to produce things instead of hunting/gathering. It’s easy to grasp with the agriculture example that “unemployment” wasn’t even really a risk.

Sure, there will be niche sub-industries that will be erased by LLMs, and people who have those niche skills will suffer. This has always been the case with technological advances. And always, it has not disrupted the economy.


> Why should it happen now?

What are S-curves.


Perhaps the quantity of labour utilizing each new technology through time is a (n-shaped) parabola that intersects with the technology it replaced.

The fear that technological advances will cause mass unemployment and destroy labour markets has been common throughout history. Yet here we are at full employment. Maybe this time is different?


It is different this time. In the past, automation affected some domains more and some domains not at all. People moved to those other domains. AI can run all the domains humans do and more that they can't.

https://www.youtube.com/watch?v=7Pq-S557XQU


>If increased productivity equaled job loss there would be two programmers alive today, doing the same job as the fewer than 10000 programmers using punch cards as we entered the year 1950.

The only reason it's not the case in this example is because computers at the time were a tiny early adopter niche, which massively multiplied and expanded to other areas. Like, only 1 in 10,000 businesses would have one in 1950, and only big firms would. Heck, back then only 1 in 100 million people even had a computer.

Today they've already done that expansion into all businesses and all areas of corporate, commerce, and leisure activities. Now almost everybody has one (or a comparable device in their pocket).

Already cloud based systems have made it so that a fraction of programmers and admins are needed. In some cases eliminating the need for one altogether.

There are tons of other fields, however, more mature, where increased productivity very much equaled job loss...


Have no doubt, we will find new places to put computers. In the 80's and even the 90's everyone said the same thing, "Why do I need a computer? I can do everything I do already without a problem?" Well, it turns out with computers you could do 12 more things you had never considered. Consider the interoffice memo: it'd take what, 1-2 hours to get a document from one floor to another through the system? Cool, you can work on 2-3 projects at a time maybe, because that's all the bandwidth allowed. Along comes email and ups that to 5-6, because now the communications can be pretty consistent. It's still not perfect, because what if you're at lunch or the gym? Then came Blackberries and all of a sudden it's 12-15 projects at once. Then Slack, because you don't even have to think. Now add this.

Notice that during that time there weren't all of a sudden fewer programmers, or managers, or sysadmins; if anything there are more. If anything everyone is even more stressed, with more to do because of the context switching and the 24/7 availability. That's what this will do, I'd bet money on it.


I dunno, I feel like we've been trying to find new places to put computers for a couple decades now, and the effort is kind of losing steam. Every week there are threads about how useless a lot of these efforts are.


You just don't hear about all the computers, that's all. There's one on your car's key fob, and likely 100 more in the car. Your coffee grinder has one. So does your dishwasher. And your electric blanket.


There are 100-150 CPUs in a modern car.

...And also over 100 bugs in the spec that governs communication between car systems: https://youtu.be/hXnS_Xjwk2Y?t=899 (around 15 minute mark)


>In the 80's and even the 90's everyone said the same thing, "Why do I need a computer? I can do everything I do already without a problem?" Well, turns out with computers you could do 12 more things you can never considered

Yeah. Also, unfortunately, it turns out those people in the 80s and 90s got it right. They didn't really need a computer - they'd be better off without one. But as soon as we got them, we'd find some things to use them for - mostly detrimental to our lives!


> If increased productivity equaled job loss, there would be two programmers alive today

Increased productivity doesn't necessarily lead to overall job loss, but it will eventually in the area where the productivity is realized. Agricultural employment is a very clear example.

https://cdn2.vox-cdn.com/assets/4565243/Ag_workforce.png


otoh the rise of ATMs seems to have increased the number of bank tellers


Well, yes, it went down, but only from 12 million to 5.5 million people in agriculture.


Agriculture is still automating and the population has increased what, 4x since then


The US population increased about 15x in that period and one of the major reasons for that is the increase of productivity in agriculture [0]. Higher productivity created more demand.

[0] https://en.wikipedia.org/wiki/Green_Revolution


> If increased productivity equaled job loss, there would be two programmers alive today, doing the same job as the fewer than 10000 programmers using punch cards as we entered the year 1950.

Yeah. The problem is that there’s only one of me. All the rest of you are filling in for the other guy.


Correct. This is pretty much Jevons's paradox: https://en.wikipedia.org/wiki/Jevons_paradox


Yup, just literally talked a client out of a nice little mobile project by telling him my rate. Let's say I had AI by my side - I'd be quoting the same rate, but the number of hours would be lower. The project might be a go then.


> Companies are already bloated, imagine when they realize one overworked highly paid senior can replace 10 juniors.

Yep. This is where I'm at in terms of personal armchair predictions of the future.

I expect the labor market will be tough for more junior software engineers in the coming years. This might indeed cause backpressure in the supply of new grads/new labor force entrants in this family of fields ("software development").

However, the "highly paid senior" is only around for so long before achieving financial independence and just not working anymore. Then what? The company didn't hire juniors because the "highly paid senior" did all the work. Whoops. Now the company lacks the pipeline of people to replace that senior.

It'll sort itself out in time, but the next decade will be interesting I think. Some companies will realize that they must make investments into the future of the labor force and will do better in the longer term. Other companies might indeed "fire the juniors" for some short term gains and find themselves lacking replacement staff later.


I see it the other way around: the senior engineer is expensive and costs a lot; 2 juniors are cheap. Who cares if their code is crap? They produce it really fast, so if they find a bug, they just fix it quickly. If anything, the time a Sr. spends thinking about how to do things "right" is seen as a waste of time. Whereas the jr. will produce an enormous amount of buggy code, but they can fix the bugs quickly by just throwing another prompt at ChatGPT to solve it.

Now some might say that the code will be terrible quality and buggy and full of holes and the users will hate it, it is never economically viable to build enormous systems on a house of cards like that. To which I respond, you just described every piece of enterprise software I've ever used ever.


Because as soon as you tilt from producing features to fixing bugs, you are losing traction and your competitors will win and now you are bankrupt.

I've seen entire businesses collapse because of this mentality.


Yes, this is also why senior engs get paid so well. Actually, if you are a junior dev you basically cost money, imo. However, a lot of companies hire those in hopes they will make their career and stay longer - especially startups do that, as they can slide on the company culture hook way more. Also they need them, as senior engs require the equivalent of a "secretary" to handle less important things.

I don't mean to sound mean, but senior devs are also secretaries of their CTO and so on.


Tell that to SAP, Oracle, ServiceNow, or a dozen other enterprise software vendors.


> you are losing traction and your competitors will win

I mean, unless your competitor produces an order of magnitude fewer bugs than you, how can this be true?


Disagree -- I think ChatGPT will make juniors more palatable to hire. ChatGPT will basically give juniors a free pair programmer to baby-sit their work/progress. Why pay extra for senior devs when juniors can become much more efficient thanks to ChatGPT becoming Stack Overflow on steroids? I think the wage gap between junior and senior will actually drop massively. I predict teams will keep 1-3 architect level positions for designs and reviews and replace all seniors with cheaper juniors.


I really doubt that ChatGPT will be able to give the kind of guidance that turns juniors into seniors.

Getting juniors un-stuck on "simple" problems, maybe. Stack Overflow already does this. Image doesn't build? Search for the error message online. Don't know how to build the old software that only works in Ubuntu 14.10? Sure, you'll find that.

Suggestions on how to refactor, proper design of interfaces, what skill to acquire next? Maybe, but that will be a bigger surprise.


I think it could go either way depending on the product. For example, in app/game/web development where code quantity > quality, hire more juniors who can bust out code with ChatGPT all day. But if you're developing software for medical devices, vehicle control systems, HFT, etc., then nobody's going to let some college grad using ChatGPT touch it. You'd hire senior engineers who can be responsible for the reliable operation of the software, and they can use ChatGPT for code review, test suites, etc.


Even in the gaming industry. Typically you have people developing an engine, common frameworks... tools that downstream work to lower branches - now I can just type to ChatGPT rather than go through requesting/reviewing, see quicker where I mis-designed my "framework", etc... I am afraid it's gonna be not great for junior engs altogether.

Every org has varying levels of engineers. This technology will make the cheap grind, well, cheap. We see the progression. With the speed of changes it seems we should all be worried (and why does it matter that we'd be affected last, if we only fall a year later?).


> Companies are already bloated, imagine when they realize one overworked highly paid senior can replace 10 juniors.

That is already possible without AI and has been the case for a long time... the issue is nobody will stay to be that highly paid senior running entire projects because at that point you can just run your own shop and pocket the full profits.


It also causes a "single point of failure". If that senior gets hit by a bus then what? Can the company afford to bring in another senior that will take ~6 months to become productive?

I'm not disagreeing with you. I'm thinking of going solo myself.


The biggest problem going solo is getting the projects/contracts. Now it will become even harder so one has to charge less and less to compete.


I wanted to say the same thing - except that the senior can't actually perform 10x because they're too busy trying to train the next gen devs while attempting to deliver stories themselves in the fractions of time available to them.

Not to say that this is a bad thing, but the difference between a junior and senior is often much more than the difference in their salary.


Fixed pie fallacy. More likely, there will be 5x as many apps / companies with just as many jobs available. And the cost of everything will go down, except truly scarce things like Manhattan real estate.


This was already the case 10 years ago.

Except, there was no AI, but an alternative called an offshored development center. You would send your spec and design document and get back, via email or FTP if they were really cutting-edge, a number of files that would sometimes compile and even, according to the legends, actually work. The way this tech worked is that you generally had to wait overnight for your spec to "mature" into code.

Some places figured out they could hire local engineers for about 5-10x what they paid for this "offshored development center" tech and get better results.


I think this fear is unfounded because history shows this is not the case. We will adapt to the new level of productivity and our goals will adapt accordingly. What we are expected to produce will adapt. You were able to pump out 4 solid production grade pull requests per sprint? Then the expectation increases to a single dev being able to pump out 10. The company's ambition will grow as a result of the newfound efficiencies, and management will continue to request things that cannot be reasonably delivered on time with the people and tooling we have.


I don't know if that's going to be the case - companies can never have enough software and typically they just go until the budget runs out as the software is never "done". I think being able to build 5x the software with one dev means that each company is going to build 5x the software.


> imagine when they realize one overworked highly paid senior can replace 10 juniors

This already happens, the market is just not very efficient about it, e.g. a highly paid senior dev is not working at a company that only needs 2-3 developers, they're working at Google with 100's of devs.


Disagree -- I think ChatGPT will make juniors more palatable to hire. ChatGPT will basically give juniors a free pair programmer to baby-sit their work/progress. Why pay extra for senior devs when juniors can become much more efficient thanks to ChatGPT becoming Stack Overflow on steroids? I think the wage gap between junior and senior will actually drop massively. I predict teams will keep 1-3 architect level positions for designs and reviews and replace all seniors with cheaper juniors.


I predict the wage gap will actually widen. StackOverflow-on-steroids can only give you so much, so seniors will still be in demand for what they do. It's just that the competition between juniors will be fiercer, so junior wages will drop.


I think dev wages will stagnate across the board. I think the simple existence of the fear of being replaced by never-tiring, always-working AI will subconsciously prime devs to be willing to work for less. Devs have been in a market with a significant shortage of skilled labor; can we say that remains true after GPT hits the mainstream even more than it already has?


Right now? It made a splash, but who uses AI for building their software this very moment? Nobody knows how the industry will land on this. Adoption will take time - it's not gonna be a year.

I remember after Ruby on Rails' first demo, the next day people had their websites up in RoR. Here we will see, but it will be a bigger shift.


How much more efficient is programming in Python with modern tools vs assembly? I would bet more than 5x


That's an interesting case and use of the word efficient.

Developer time reduced by 5x; CPU time increased by 500x.

When CPU cycles are cheap it's a logical tradeoff but it still grates against my sensibilities.


5X is a gigantic underestimate of how much developer time has improved. How long would it take a good assembly programmer to implement a CRUD web server connected to a database? Way more than 5x longer than what a mediocre Python programmer would need.


> How long would it take a good assembly programmer to implement a CRUD web server connected to a database?

You know you can import functions and call libraries written in languages other than Python? If everyone was programming in assembly then there would be plenty of libraries to make database calls or make web servers in assembly, and there would be plenty of tutorials online that you could copy and paste to get things going, meaning it wouldn't take much work at all. It would be more work, but not orders of magnitude more work.


an assembly programmer wouldn't write a web server connected to a database, they would implement a tui over tty, and it would be faster.


How good would it be at letting business users attach photos and emailed documents?


Depends on the field kinda. For a really extreme case, something like big dense linear algebra is going to just use BLAS calls anyway. For big enough matrices, all of the flops are coming from the library, the cost of calling from Python vs the cost of calling from C or Fortran is amortized anyway, and probably most people won’t beat a tuned BLAS however much developer time they throw at it.

It makes more sense to tune libraries excessively; we could say it is less of a trade-off, more of an allocation of the finite low-level tuning developer resources to high-impact libraries.

Anyway it turns out that the only way to get most people to link to a good BLAS is to use Numpy and distribute it through some giant Anaconda Rube Goldberg machine, so, I dunno, people are weird.
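
For what it's worth, the NumPy route really is just a thin shim over that tuned BLAS - a minimal sketch (matrix sizes arbitrary):

    import numpy as np

    a = np.random.rand(2000, 2000)
    b = np.random.rand(2000, 2000)
    c = a @ b           # the matmul dispatches to whatever BLAS NumPy was built against
    np.show_config()    # prints which BLAS/LAPACK build that actually is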


What does it even mean to say that a hypothetical assembly program is 500x faster than an existing Python program, when the assembly program does not even exist?

It's some kind of philosophical question.

Maintaining an assembly version of a modern software stack is not 5x more costly, it's simply not possible.


We should stop thinking about the trade-off as between developer time and CPU time. The CPU is an inanimate object which doesn't mind pushing around more electrons. What we're really throwing under the bus when we optimize for developer time is the customer's time.

If an inefficiency saves a developer 10 days of work, but results in an operation taking 100 milliseconds longer, and you have 50M customers who do that operation just one time, then you've wasted ~58 customer-days.
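
(For the arithmetic: 50,000,000 customers × 0.1 s = 5,000,000 seconds, which is roughly 58 days.)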


Don't forget customer's battery


So we just have to figure out if developer-hours or customer-hours are the tighter bottleneck.


Good question. Why don't we pit one of today's Python programmers against an IBM VS assembler programmer on an IBM 360 system circa 1960 and see how far each gets in a week?


Let's make it more fair: what about one of today's Python programmers vs an IBM assembler programmer using Erlang/Elixir to develop a telephone switch?


> making a single programmer do the job of 5

I mean, I already do? And for totally mundane and benign reasons. This has been my experience in this industry for the last 8 years or so, though my first 9 years I felt like the pace was maintainable.

Do more with less. It's so common to come across situations where so and so left, their position won't be backfilled, the targets don't adjust to compensate for the productivity hit, and we can put up or shut up.


I'm kind of on the fence with this one. As someone who does programming at work, I don't want to be partially replaced by a prompt. However, there are also lots of sectors where the problems are big and solving them could help a lot of people, but money is an issue, so being able to access resources on the scale of large tech companies would be amazing.


You're not considering that companies will just spin up 5 projects, so 5 programmers produce a notional 25 people's worth of work. And hey, maybe they sack the expensive old guy and the AI spend isn't so great. Seems like a win.


Wouldn’t this actually create more demand for programmers? More businesses can exist that have a need for software. To date every advance in programming efficiency has resulted in more tech, not less.

Additionally there’s the math behind it. If Company A fires 50% of their staff because AI lets the remaining 50% work at twice the productivity then how will they compete with Company B that keeps their staff and now gets 200% efficiency?

The math is in favor of getting more, not fewer, developers.


Yep, I personally see the cost of doing a start-up going way down for individual developers or even junior developers.


You know, until that AI code breaks and you have no idea what's going on internally.

You know how hard it is to start debugging a project you haven't been onboarded to? Now imagine that every time something has a hiccup.


This is the ray of sunshine in all this AI stuff, to me.


yes - and in another way, too. a lot of the demand for programmers is driven by successive waves of new technology adoption and investment - and AI is looking to be a mother lode that should keep us going for a while.


If you can have one programmer do the job of 5, then you will be defeated by companies using 5 programmers to do the job of 25.

Also, if your one programmer dies you now have no programmers and loss of all institutional knowledge, whereas the other company can lose several programmers and still be OK. There is value in having redundant programmers.


That would be awesome. We have a never-ending list of things to do and not enough time to do it all. What you're saying to me is that we can get through our list of priorities even faster, and stuff won't continually rot at the end of the list.


If as GP says AI could automate 1% of a single programmer's job (the boring part where you write code), then how on earth can you derive that a single programmer could do the job of 5 with AI? It's completely illogical.


The GP estimate is way off...

If they indeed wait for input from other departments/companies 99% of the time (so they just need to actually program 5 minutes in their 8-hour workday), then they could already be thrown out of a job and the company could make do with 1/10 the programmers, no AI required...


Clearly the company needs to dedicate a programmer to wait on each customer individually. You can’t have the same person waiting for two things simultaneously.


Can't believe companies are still single-threaded in 2023.


Exactly!


I suspect the post you’re replying to disagrees with the 1% figure.


many programmers (most?) spend hours a day coding and see it as the best part of their day?


Companies were bloated because of the profit potential of tech and organizational dysfunction.

AI wasn't needed to cut down on bloat - some combination of a few companies reaping the majority of the profit and reduced dysfunction is enough.


Companies will continue to hire even more people. Their single goal is growth.

If one can do the job of 5, their revenue per employee will increase.

They will hire even more employees.


Yeah, but they would then just start 10 times the number of projects at once and watch the majority of them fail.


To paraphrase Harold Abelson:

Computer Science is not really very much about computers. And it’s not about computers in the same sense that physics isn’t really about particle accelerators, and biology is not really about microscopes and petri dishes. It is about formalizing intuitions about process: how to do things [0].

[0]: https://www.driverlesscrocodile.com/technology/the-wizard-1-....


“Computer Science is no more about computers than astronomy is about telescopes”

- Edsger Dijkstra


This rant from a little while ago soured me on this quote: https://news.ycombinator.com/item?id=34940148


FWIW, that's actually a good rant.


And then some guy called Kepler invented a new kind of telescope.

(And we all started writing more quantum algorithms.)


Watching Abelson give that lecture (on video I mean, first lecture of SICP IIRC) made a lot of things click in my head as a neophyte programmer. Even after one lecture my understanding of the nature of computation had grown immensely.

He and Sussman are great at distilling and explaining abstract concepts in a clear and precise way.


That's irrelevant though, because actual programming is very much about computers, and about writing things the majority of which have already been formalized by computer scientists, and gluing code and programs into pipelines...


I just adore the hubris on full display here to say what one of the world's best computer science educators has to say is irrelevant.


First, this is an argument by authority. One of the "world's best computer science educators" can still say all kinds of wrong stuff. Especially someone as opinionated as Dijkstra, with many controversial opinions other world leading computer science educators disagree with.

Second, relevance and applicability is depending on context. On the context of this discussion, about practical programming in the trenches, what Dijkstra said is irrelevant.

For starters, he was talking about computer science.

Programmers in the trenches (what the thread is discussing) don't do computer science. They merely apply stuff developed by computer scientists (but also by non-scientific practitioners in the field), in an informal (and often haphazard) manner, to solve business problems (often the same problems again and again, which is something totally beneath a scientist).

So, yes, something Dijkstra had said can still be wrong, regardless of his achievements. And it can also far more easily be irrelevant, as this just needs his saying to be unrelated to the discussion.


This is an argument challenging those who have the hubris to think Harold's statement is irrelevant to stop thinking so small-mindedly, if I were to call it anything.

Additionally, even if I concede it is an argument solely on authority, just because an argument contains a fallacious point doesn't make it untrue, otherwise that would be the fallacy fallacy.


>Additionally, even if I concede it is an argument solely on authority, just because an argument contains a fallacious point doesn't make it untrue

Notice how you also conveniently sidestepped my arguments about why it's also untrue (on top of containing a fallacious point).


We're not talking about Dijkstra, OP is paraphrasing Harold Abelson.

https://en.m.wikipedia.org/wiki/Hal_Abelson


As a computer science educator, it's worth remembering that Dijkstra was incredibly full of himself and his specific formal view of the field. There are large parts of computer science that have nothing to do with the machine and there are other parts that do. There's no single cut and dried answer here, we're a pretty diverse field that spans everything from mathematics to engineering to a touch of art and human factors.


What you say about Dijkstra is true, but we're not actually talking about a Dijkstra quote here, we're talking about Harold Abelson.

https://en.m.wikipedia.org/wiki/Hal_Abelson


I think we all started replying to a sibling comment that quoted from Dijkstra and we got it threaded wrong.

"Computer Science is no more about computers than astronomy is about telescopes” - Edsger Dijkstra


"Science advances one funeral at a time"

- Max Planck

Are you saying Max Planck is wrong here? There is plenty of evidence that he was right: statements made by famous scientists hold fields back because they are reductive or hurt different perspectives. Putting their words on an altar is holding us back.


I think the parent's point is that even highly inspired wisdom can be irrelevant if applied to all situations

More like "it doesn't apply" than "it has no inherent value"

Also your comment is technically an appeal to authority


That's the most humble HNer for you btw.


It is indeed not particularly relevant to the emergence of AGI and its effect on our work.


This is also one of the reasons India taking over all of the programming work didn’t really happen. There are numerous issues (time zones, language, etc.) but business people not being able to document perfectly what they want to have built, considering all corner cases and paths, is a big one.


Even so, India won't take over programming work, for the same reason no other country can. Only a percentage of the programmers there are outstanding; the rest are mediocre. Because their population is huge and programming education reaches widely, they produce more outstanding programmers than other countries, but it still won't be enough.


Ultimately the model that worked was "I have this tightly scoped project that no one really wants to work on, that's completely self-contained and is going to require a ton of manual work" and hiring contractors to implement your own design.

Otherwise, if there's a lot of back and forth required, or a design needs to be generated, forget it. Companies giant and small have tried it and eventually realized "F it" and gone back to in-house teams.


I have the same feeling. People are very concentrated on the ability of these AI generators to create working code from super specific and well-formed prompts, when in reality figuring out what the prompt should be accounts for 80% of the job.


don't fall into this mental trap. you can get into recursion quite easily here, and figuring out what to prompt can start from simple general questions - and there is no need for a developer at all, aside from the current limitation that the copy/paste/run workflow has to be done manually


The raison d'être of COBOL (now Cobol) in the 1950s was that "there is no need for a developer at all". Managers can produce their own reports.

How did that turn out? A whole profession was created.

The thing that comes closest to "no developer needed" is Excel. But you really need someone who knows programming if you want a reliable, robust product, even in Excel.


It's astonishing to see the goalposts move so quickly. The cope of "well, okay, it can do that, but that's not even the hard part!" when just a year ago this entire product was almost unimaginable.


The goalposts haven't really moved though? The reality that a lot of people have thrown "engineer" in their title but just write code given a spec or tightly scoped requirements or design has been true for a while now.

But for quite a few folks, the coding is the easy part. When building larger systems you're not even that often implementing things from scratch. The decision making, dealing with existing tech debt, tooling, etc. is the hard part. Ambiguity is the hard part and it's always been that way.

Don't get me wrong GPT-* is impressive. Heck, I pay for a subscription. But it won't really make a lot of folks at my company notably more productive.


The realization that software development is only maybe 5% coding came way before GPT.

> The cope of

Whenever I see someone use 'cope' that way, I immediately think they're making a bad faith argument. Is that intentional?


I don’t think anyone is claiming to not be impressed. Yes, we’re all impressed. This was unimaginable sci-fi just 5 years ago. You did it, we’re impressed! You won.

The next step after that is for people to figure out how to justify their continued relevance. I think that’s a pretty natural reaction.


I’m not concerned. If AI replaces my job then ok I’ll find something else to do. Self driving cars seem to have stagnated so maybe I’ll be an Uber driver


I wouldn't count on that, you can call a self driving taxi right now in Phoenix.


I'm impressed. But I want it to be better, so I don't have to spend so much time coding and can trust it to give me good code so I can offload a lot of my efforts on to it. Right now I have to check and verify or go back and forth with prompts so much I almost would have been better off writing the damn code in the first place.

Meanwhile I've got a backlog of about 100 personal projects I've never gotten around to making (especially creating digital versions of my board game designs, of which I have about 70 at this point) that I just haven't had the energy to work on myself. I'd love to be able to just upload the rules document and get some sort of working game spit out the other end, or at least a good chunk of it, and take it from there.

And then I can test a new rule by just rewriting the rules document, as opposed to having to manually code every rule change into the game.

I don't think I'd run out of things for it to make if it was as good at complex software generation as A.I. is right now with art generation.


Do you find AI to be good with art generation? I can't use any of the art in the projects that I do without extensive inefficient editing since the stuff it spits out isn't how a human would draw.

It's good for concept browsing, but not much more for me at the moment.


I haven't messed with it much since I find the process for the 'good one' (Midjourney) annoying (go onto a Discord channel, type your request in public for a bunch of people, wait a while, hunt for it among the long channel of requests, etc).

I'm assuming the process has gotten better since, but I don't know. I'm mostly just using free vector art and simple colors/shapes or whatever simple things I can make in Blender for my art still, in part because there's such a backlash against using any A.I. art right now.

Most of it is judging by what people have been saying in groups online. Some people have found it very useful and use it extensively, like for their board game art.

It doesn't even have to get 100% of the way there (for coding games based on rulesets). Even 75% would probably save me a lot of time and allow me to find the energy to close the gap as opposed to not even starting since I have so many other projects going on.


Have you ever worked at a software firm of greater than say 30 people? Doesn’t resonate with my experience at all, and the 30+ people are there not just to write code.


Everyone's focused on writing code - is the code even needed when a good enough AI exists? How many of us are writing code to extract 10% more efficiency out of humans? If they go, so do we.

Also if software development does survive, it's going to look very attractive to all the other unemployed people.


I think currently those 6-figure salaries are looking great for like most of humanity - I don't expect seismic shifts here.

but there is a funny mechanic at play certainly: once some dev work gets cheap as a commodity, demand surges suddenly, since at a low enough price point and wide enough availability everyone wants some need solved that made no financial sense to address before.


>Basically, 99% of the effort is NOT WRITING CODE. It's all about communication with people, and problem solving. If we use GPT-X in our company, it will help us with 1% of our workload

First, if you did have GPT-X (say GPT-10) in your company, there wouldn't be much back-and-forth communication either. Those parts would still be handled with GPT-X talking to another GPT-X in the other company. Even the requirements might be given by a GPT-X.

Second, even if that's not the case, the part of doing the communication can be handled by non-programmers. Then they can feed the result of the communication to GPT-X and have it churn out some program. Perhaps they would keep a couple of developers to verify the programs (sort of like GPT-X operators and QA testers) and get rid of the rest.

As for the rest of the current team of developers? GPT-X and the people running the company could not care less about them!


> Those parts would still be handled with GPT-X talking to another GPT-X in the other company. Even the requirements might be given by a GPT-X.

What happens if one (or more) of the GPT-Xs starts having hallucinations while they're busy working on this project?

> Second, even if that's not the case, the part of doing the communication can be handled by non-programmers.

I was in a meeting with a sales manager and tech team a few days ago. Sales manager had been talking to a potential customer about a product that neither he nor the customer properly understood. They both thought it would be an excellent fit for a particular (new) purpose, one for which it was not designed.

As it turned out, everyone on the tech team knew that both sales manager and customer were utterly and catastrophically wrong, but it took the best part of an hour to finally convince him of this.

It's quite hard to have useful conversations about stuff if you don't actually understand it.


Is hallucination a systemic forever problem that will not be solved, mitigated, or in any other way rendered inconsequential?

Also, having conversations about things you don't understand with a machine, where you don't have to keep up social decorum and can ask the dumbest questions, should help a lot with improving the decision making of non-technical personnel.


> Is hallucination a systemic forever problem that will not be solved, mitigated, or in any other way rendered inconsequential?

In the same way that wrong results in a search engine will not be solved or mitigated in its entirety, yes. These new generative AIs are big search engines that bring results from several content points and combine them into a single, plausible stream of words.


> having conversations about things you don't understand with a machine, where you don't have to keep up social decorum and can ask the dumbest questions should help a lot with improving the decision making of non-technical personel

This sales manager didn't know that neither he - nor his customer - properly understood the product until he was specifically called out on it (by me, as it happens. At least the boss laughed). Are we expecting GPT-X to sit in on Teams meetings and ask questions like that, too? Sales manager literally did not know he needed to start asking questions about the product. Tech team were calling BS on it as soon as he reported his conversation with customer.

"It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.", which may (or may not) have been said by Mark Twain.


"Speaker Coach - now with AI" isn't that far-fetched, given Microsoft's relationship with OpenAI and how quickly they did the same with Bing/how hard they're now pushing it

https://support.microsoft.com/en-au/office/suggestions-from-...


> Second, even if that's not the case, the part of doing the communication can be handled by non-programmers.

It can’t - only programmers will know which follow up questions to ask. GPT will be able to ask those questions before non-programmers will be able to.

Half the work is nitpicking on date formats or where some id comes from or if a certain field is optional, etc.


Last paragraph is spot on. I would also add that a vast amount of time is spent looking at the code and the system and just understanding what goes where.


The problem is that a GPT capable of doing all those things at the level required is also capable of running the company. Someone with capital can start the business and set GPT-X to go out and maximize paperclip profits.


Why aren’t you considering instead that, rather than talking to you, “they” would already be talking to the LLM (likely trained on your code, among other data)—while you get 0 total billable workload in the first place?


The issue being that neither they nor the LLM have the proper model for the problem domain, and so they won't ask the right questions when trying to extract business requirements.

Additionally, this is "stateless" to an extent. There's no architectural plan for how it should work when you have an LLM do it. "We're using X now but there are plans to switch to Y in some number of months." This could lead to making an abstraction layer for X and Y so that when the switchover happens there is less work to be done - but that requires forward looking design.

If "they" only describe the happy path, there is no one to ask about all the unhappy paths, edge cases and corner cases where the naive implementation of the problem description will fail.

Hypothetically, yea, "they" could be trained to think through every possible way the generated code could go wrong and describe how the code should work in that situation in a way that isn't contradictory... but that remains an unsolved problem that has nagged developers for decades. Switching to an LLM doesn't resolve that problem.


You don’t need to have a model if you are a sufficiently advanced autocomplete.


My intuition here is that it's because people don't always say what they mean, or know how to describe what they want.

I've been working on a database migration recently, and I look forward to the rare moments when I get to write queries and analyze actual data. The vast majority of my billable hours are spent trying to tease out the client's needs by going over the same ground multiple times, because their answers keep changing and are often unclear.

It takes a lot of processing to figure out an implementation for someone who will straight up describe their requirements incorrectly. Especially when a higher-ranking person comes back from vacation and says "no, everything you nailed down in the last two weeks is completely wrong".

I don't think any of the current LLMs are going to handle these types of very common situations better than an experienced human any time soon. It's like that last 1% of self driving which may actually require AGI. No one can say for sure because it's not cracked yet. I think most of us will continue to have job security for quite a while.


> Especially when a higher-ranking person comes back from vacation and says "no, everything you nailed down in the last two weeks is completely wrong".

Yes, and at some point this high-ranking person gets fed up enough with this now-inefficient use of time and money that they will just sort it out using an LLM tuned to handle this situation better, if not today then tomorrow.

Maybe they will pay someone to coach them for a week how to “talk” to LLM, but other than that the one who gets paid in the end is OAI/MS.


Imagine how angry this hypothetical person will be when they get the same problem from the LLM, after all those extra steps.


Imagine an LLM tuned to eliminate misunderstanding and ask "why" at least 5 levels deep… without fearing that it will irritate the boss or create an impression of not being smart, both possibly harmful to a human career but irrelevant to an unthinking software tool.


I too like science fiction. People keep acting like it will be easy to bolt things like "eliminate misunderstandings" onto LLMs, and quite frankly I would be incredibly surprised if that happens any time soon.


Eliminating misunderstanding comes down to willingness to ask more questions when you have low confidence. The main reason this doesn't happen is that subordinates are afraid to look stupid or lose their jobs. Neither is a concern for an unthinking machine.


I'm pretty sure you may be right. I'm also worried that what you've just described is the kind of task that leads to burnout in large doses. And I'm not sure humans are so great at it either.

I had one job that involved a small amount of coding and mainly hooking together opaque systems. The people behind those systems were unresponsive and often surly. I had to deal with misleading docs, vague docs, and subtle, buried bugs that people would routinely blame on each other or me, and I was constantly on a knife edge balancing political problems (e.g. don't make people look stupid in front of their superiors, don't look or sound unprepared) with technical concerns.

It was horrible. I burned out faster than a match.

I'm sure ChatGPT couldn't do that job, but I'm not sure I could either.

If most tech jobs turn into that while the fun, creative stuff is automated by ChatGPT... that would be tragic.


My two cents is that the parts of the job that are more like product management will become more dominant but still not exclusive, and the parts that were more like coding will become less dominant but still not vanish. Many of us, as you describe, already do jobs that look a lot like this. But for me, it's not consistently that way; there are periods where I'm almost entirely coding, and periods where I'm almost entirely doing communication. I do expect a shift in this balance over time.

The other thing that I spend a huge amount of my time doing - consistently more than writing code - is debugging. Maybe these models really will get to the point where I can train one on our entire system (in a way that doesn't hand over all our proprietary code to another company...), describe a bug we're seeing, and have it find the culprit with very high precision, but this seems very far from where the current models are. Every time I try to get them to help me debug, it ends in frustration. They can find and fix the kinds of bugs that I don't need help debugging, but not the ones that are hard.


> Basically, 99% of the effort is NOT WRITING CODE

I've come to realize this is true in more contexts than I would like. I've encountered way too many situations where "sitting on your hands and not doing anything" was the right answer when asked to implement a project. It turns out that often there is radio silence for a month or so, then the original requester says "wait, it turns out we didn't need this. Don't do anything!"


This is exactly right. Actually writing the kind of code that ChatGPT produces is a vanishingly small part of my job. And there's a ton more specialized scenarios to deal with, like next week when the third-party company is breaking the contract in <contract_json>.

If you want to hire a developer to implement qsort or whatever, ChatGPT has them beat hands-down. If you want to build a product and solve business problems, there's way more involved.


Consider creating an AI stakeholder that speaks for the client. This approach would allow the client to provide input that is wordy or scattered, and the consultant could receive immediate responses by asking the AI model most questions. Better yet, they can ask in a just-in-time manner, which results in less waste and lower mental stress of collecting all possible critical information upfront.

As the project progresses, the AI model would likely gain a better understanding of the client's values and principles, leading to improved results and potentially valuable insights and feature suggestions.


There's major $$$, legal, and security ramifications for clients in many cases. Having an AI that can't properly deal in ambiguity and hallucinates an outright reckless idea 1% of the time is completely unacceptable.

Writing code, sure. A human ultimately reviews it. I suspect in the legal world a lot of legal writing can also be automated to some degree. But strategic decisions, designs, etc. very much need a human pulling the trigger.


I agree, but I would put it like this: 99% of a software developer's job isn't writing code, it's getting consensus across stakeholders on what the "prompt" for the coding task should be, how that prompt should change over time, which parts of the problem should be included in this week's prompt and which ones should be tackled next week, etc. Also, more often than not, the task isn't exactly about generating code, it's about sending data between various clients and servers, tweaking code where necessary for compatibility and new data shapes.


Just wait until XXQ adopts the same AI technology to keep the AI-using company’s business. Then the AI can simply coordinate with one another, and make the appropriate changes faster than currently possible. Microsoft is well positioned to do something like this and already working toward this end to end collaborative AI.


Absolutely this. I still don't understand why people stop seeing the AI at one end of the business/computing/requirements gathering model. You should have it at both ends, "converging" towards the target.


By the time code is being written the job is effectively done.

Unless your problem space is unsolved (where LLMs are unlikely to be useful either) very few devs are spending much time on the coding part of their 84th CRUD app.


This is exactly why I'm so tired of these "can AI write code" think pieces. I assume people writing this crap aren't actual developers. Maybe it's management fan-fiction.


I suspect there's a strong invisible ideological undercurrent pushing a lot of this. When I was younger and enthusiastic about things like a universal basic income, I would often follow the latest murmurings, always ready to let my hype (and hyperbole) meter go to 11. I remember when I saw the first news about some drone delivering a pizza (in New Zealand?) I immediately jumped to it foretelling the imminent end of all delivery jobs, with broad customer service not especially far behind. There's even the fully automated Henn-na hotel in Japan, I mean omg!

In my naivete, the idea I had is that if the jobs disappeared en masse then a social solution to the economy would be forced to be enacted. So I was essentially hoping to see the destruction of normalcy and employment in any new technology. I would expect that view is not uncommon given the direction of contemporary education. It feels analogous to cows hoping for the end of beef/milk harvesting. My beleaguered bovine buddies, what awaits you there is something rather different than cowtopia.


I think the difference here is that LLMs are powerful, general purpose and suited to take on many common office tasks as well as accelerate programming work. I think “we need 10% fewer white collar workers” is a massively destabilizing scenario that is not very far fetched.

I also completely agree that society/politics will not have caught up with these developments in time to mitigate them.


"Maybe its management fan-fiction"

:D Priceless!


see also "the future is no code!" fad of a couple of years ago


Never went away; been hearing about it since msft sold the idea to managers in regards to SharePoint... Fucking SharePoint. They can design their own forms and hook the data up to bits of automation. Like magic for muggles.


And in infra there's figuring out what new approaches you can use to replace stuff in your infrastructure. Then figuring out the migration costs, evaluating its usability, and dealing with a director that's been sweet-talked by a vendor into using some other solution that sucks. Then deciding whether to just build in house b/c none of the solutions quite work and would require additional stuff to build on top. Then, when you finally decide on something, there's the back and forth with the vendor because you need to handle some unique thing they hadn't thought of.

The complexity in software engineering is almost never coding. Coding is easy, almost anyone can do it. Some specialized aspects of coding (ultra low latency realtime work, high performance systems, embedded) require deep expertise but otherwise it's rarely hard. It's dealing in ambiguity that's hard.

The hype around GPT-* for coding generally confirms my suspicions that 70+% of folks in software engineering/development are really "programmers" and 30% are actually "engineers" that have to worry about generating requirements, worrying about long term implications, other constraints, etc.

And every time that comes up those folks in the 70% claim that's just a sign of a poorly managed company. Nope. It's good to have these types of conversations. Not having those conversations is the reason a lot of startups find themselves struggling to stay afloat with a limited workforce when they finally start having lots of customers or high profile ones.


We could have this already.

- Have a cron job that checks email from a certain sender (CRM?).

- Instruct an OpenAI API session to say a magic word to reply to an e-mail: "To reply, say REPLY_123123_123123 followed by your message."

- Pipe received email and decoded attachments as "We received the following e-mail: <content here>" to the OpenAI API.

- Make it a 1-click action to check if there is a reply and confirm sending the message. If it does not want to send a message, read its feedback.
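
A rough sketch of the idea, assuming an IMAP mailbox and the openai Python package's ChatCompletion interface (the magic word, sender address, and model name are just placeholders):

    import email
    import imaplib

    import openai  # assumes OPENAI_API_KEY is set in the environment

    MAGIC = "REPLY_123123_123123"
    SENDER = "crm@example.com"  # hypothetical sender we watch for
    SYSTEM_PROMPT = (
        "You handle incoming customer e-mail. If a reply is needed, answer with "
        + MAGIC + " followed by the message to send; otherwise give a short summary."
    )

    def fetch_unseen(host, user, password):
        """Return plain-text bodies of unseen messages from SENDER."""
        imap = imaplib.IMAP4_SSL(host)
        imap.login(user, password)
        imap.select("INBOX")
        _, data = imap.search(None, f'(UNSEEN FROM "{SENDER}")')
        bodies = []
        for num in data[0].split():
            _, msg_data = imap.fetch(num, "(RFC822)")
            msg = email.message_from_bytes(msg_data[0][1])
            for part in msg.walk():
                if part.get_content_type() == "text/plain":
                    bodies.append(part.get_payload(decode=True).decode(errors="replace"))
        imap.logout()
        return bodies

    def propose_reply(body):
        """Ask the model what to do; return a draft reply for the 1-click confirm, or None."""
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": "We received the following e-mail:\n\n" + body},
            ],
        )
        answer = resp.choices[0].message.content
        if MAGIC in answer:
            return answer.split(MAGIC, 1)[1].strip()  # draft for a human to confirm sending
        print("No reply suggested:", answer)  # read its feedback instead
        return None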


I just had ChatGPT write me a JMAP client (for Fastmail) that'd create a draft. Then I asked it to:

"Write an OpenAI API client that would take "new_message" from the above, feed it to the API with a prompt that asks the model to either indicate that a reply is needed by outputting the string "REPLY_123123_123123" and the message to send, or give a summary. If a reply is needed, create a draft with the suggested response in the Draft mailbox."

It truncated the "REPLY_123123_123123" bit to "REPLY_", and the prompt it suggested was entirely unusable, but the rest was fine.

I tried a couple of times to get it to generate a better prompt, but that was interestingly tricky - it kept woefully underspecifying the prompts. Presumably it has seen few examples of LLM prompts and results in its training data so far.

But overall it got close enough that I'm tempted to hook this up to my actual mailbox.


Maybe if you train/tune GPT-4 with enough samples of similar interactions, it will learn to do it too.


If anything it will be much better as it won't mind waiting and asking for answers nor get frustrated with incompetence.


Honestly one of my favorite early applications of AI was as customer service. The CS people I talk to always seem pissed with something.

One conversation:

Me: "Hello, I'd like to ask why all the loyalty points from my MILES card is gone."

CS: "Sir, it says here that you upgraded your MILES card to a SMILES card. If you did not request a points transfer, they are gone."

Me: "Okay, how do I do that?"

CS: audible eyerolling "You should have done that while applying for an upgrade."

Me: "You can verify my details for both, right? Can't you transfer the points over?"

CS: "Ugh." Hangs up

Whereas AI would always politely answer questions, even the dumb ones like this.


Customer service is reading from a script. Why would AI help you? Customer service exists to deflect you from getting back what the corpo stole from you. The question is whether it is easier for you to social engineer the CSR into helping you, or prompt engineer the AI into helping you.


Generally agreed, although I think LLM's in the near- and medium-term will end up being useful for things like:

* checking if code will be impacted by breaking changes in a library upgrade

* converting code to use a different library/framework

* more intelligent linting/checking for best practices

* some automated PR review, e.g. calling out confusing blocks of code that could use commenting or reworking (rough sketch of this below)
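
As a rough sketch of that last item, assuming the openai package's ChatCompletion interface and a unified diff you already have in hand (e.g. from `git diff`), the review call itself is tiny; the model name and prompt wording here are just placeholders:

    # Hedged sketch of the "automated PR review" idea above.
    # Assumes the openai Python package (pre-1.0 ChatCompletion API);
    # model name and prompt are illustrative only.
    import openai

    def review_diff(diff_text, model="gpt-4"):
        """Ask the model to flag confusing hunks that need comments or reworking."""
        resp = openai.ChatCompletion.create(
            model=model,
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are a code reviewer. Point out blocks in this diff that are "
                        "confusing, need comments, or should be reworked. Quote the lines "
                        "you mean and say why."
                    ),
                },
                {"role": "user", "content": diff_text},
            ],
            temperature=0,
        )
        return resp["choices"][0]["message"]["content"]

    # e.g. print(review_diff(open("change.diff").read()))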


Yes, it can write code; someone demoed developing "Doom" (ray tracing/ray casting) with GPT-4 the very first day it came out, and it was impressive. Programmers would still program, but the program will no longer be code but will be "GPT prompts". I suspect with time, though, we won't need to write amazing prompts: you ask GPT for a solution, and it will then figure out the edge cases by asking you questions. If it's 2 people, it would query you both and resolve your conflicts. Programmers will be replaced with AI. We need to get over it; the question we should be asking is, what's next?


Seems like just a higher level of abstraction: prompts become input for generating high-level language code output.

It's an electric bicycle for the creative mind (how long until the first one-person unicorn?), I don't anticipate much success for those trying to use it as a self-driving car for half baked ideas.


How well do you suppose GPT-4 would have done at that task had no human developed raytracing in the first place? Would it be able to derive the construct based on its knowledge of programming and physics?


Yeah, I don't see AI replacing programmers. Or any job that's the slightest bit interesting. I see AI as another tool in our toolbox that will help us do our job better.

People have been working on medical and judicial expert systems for ages, but nobody wants to put those systems in charge; they're just meant to advise people, helping people make better decisions.

And of course chatGPT and GPT-4 are way more flexible than those expert systems, but they're also more likely to be wrong, and they're still not as flexible as people.


Sure, but in fairness to the original post, it's about whether Chat GPT can code. Not replace software engineers.

And in your scenario where chatGPT can code but someone needs to gather requirements, it still doesn't necessitate software engineers. I'm not worried about that personally but I don't think the "communicating between stakeholders" skill is such a big moat for the software engineering profession.


> It's all about communication with people, and problem solving. [...] So, I couldn't care less about it.

These things seem very ripe for LLM exploitation...


If the us/they back and forth happens over email, perhaps between two different AI instances, that whole process would happen much faster though? It's not like ChatGPT can't review the contract json and ask relevant questions. Granted, the problem solving part might be delegated to a human, but the purely routine back and forth part seems already possible?
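
As a toy illustration of that "two AI instances do the routine back and forth" idea, the sketch below wires up two chat sessions, one playing the integrator and one playing XXQ, and lets them exchange a handful of turns. It assumes the openai package's ChatCompletion interface; the prompts, model name, turn cap, and the contract (borrowed from the CHRIS example further down the thread) are all placeholders, and in practice each side would need to be grounded in real documents rather than a one-line system prompt:

    # Toy sketch: two LLM "agents" hash out an underspecified API contract.
    # Assumes the openai Python package (pre-1.0 ChatCompletion API); all
    # prompts and the contract are illustrative placeholders.
    import openai

    def agent(system_prompt):
        history = [{"role": "system", "content": system_prompt}]
        def step(incoming):
            history.append({"role": "user", "content": incoming})
            resp = openai.ChatCompletion.create(model="gpt-4", messages=history)
            reply = resp["choices"][0]["message"]["content"]
            history.append({"role": "assistant", "content": reply})
            return reply
        return step

    us = agent("You are integrating a new POST endpoint. Ask XXQ precise questions "
               "about anything the contract leaves unspecified. Say DONE when nothing "
               "remains open.")
    xxq = agent("You represent company XXQ. Answer questions about this contract: "
                "{ product_id: string, product_price: float, product_name: string }")

    message = "Here is our contract: { product_id, product_price, product_name }"
    for _ in range(5):  # cap the exchange instead of waiting two days per round
        question = us(message)
        print("US :", question)
        if "DONE" in question:
            break
        message = xxq(question)
        print("XXQ:", message)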


Maybe some day. But I tried it just now on a database design I've been working on for two months, and it spits out something superficially close immediately from a two-sentence prompt. On one hand that's impressive; it's interesting and somewhat correct, but all the interesting parts are missing or wrong, and it never gets beyond that, not even with my help. No sane person would answer so confidently with something so superficial and useless.

A sane approach would be to start understanding the requirements and work from there, trying to figure out where the challenges are.

GPT can't do this currently.


I once had to implement a Swedish standard for energy usage reporting.

EVERY FIELD IN THE STANDARD WAS OPTIONAL.

That was one of the most not-fun times I've had at work :D Every single field was either there or it was not, depending whether the data producer wanted to add them or not, so the whole thing was just a collection of special cases.


So, like Protocol Buffers?


That does not make me feel any safer. The problem is that ChatGPT et al. can include that part of the creation process in their token space.

So it's perfectly possible to have it eventually iterate back and forth with the client and not only output the code but also the conversation with the client leading up to it.


It's really about a misunderstanding on the value-stream mapping from concept to production. The claims that GPT-X will write code and thus cover the whole value-stream is conflating the very last steps with the whole process.


Not sure about that. Theoretically, you can talk to GPT-X pretending to be your manager, and your manager can talk to GPT-X pretending to be you. Then the two instances exchange information in a format much more efficient than human conversation. Sounds like an efficiency boost, and if expanded, this system avoids a bunch of office politics and helps with everybody's mental health.


   If we use GPT-X in our company, it will help us with 1% of our workload
I think there are many such cases. Another one that comes to mind is adding features to a large/legacy code-base. Writing the new code/function is a small part of the work. The main part of the work is first understanding and agreeing on how/where to implement the changes, sometimes across multiple teams, and the implications/knock-on effects to other software components, potential API changes, updating test suites, etc...


And this is one of those things where it has to be done correctly by someone who knows what they're doing. It's not just "oh, I deployed a change and broke something i'll just revert it". Often it means major migrations, etc.

There's a disconnect between folks writing small CRUD or mobile apps for clients and folks that have to work on large codebases with lots of complex systems.


Companies will have their own homegrown models trained on doc and data corpus stacks, with fine-tuned facets served by MS/OAI GPT-X and other cloud shoggoth gigacorps. Company A’s model will talk to company B’s model and they’ll figure out all of the above and complete the job in the time it takes you to blink and take a sip of your coffee. Bureaucratic grind lag and lossy communication fuzz may be some of the first casualties of the next few years, and I can scarcely begin to predict any further out than that.


> 99% of the effort is NOT WRITING CODE

Writing code is still part of the job. In my company, I'd say it's still very roughly 50% of the job. If I can be a bit more efficient thanks to GPT, that's great. Actually, I already use it for writing simple things in languages I'm not proficient with, or to see how a particular piece of code that I know can be rewritten in a more idiomatic way could be improved. It's not perfect, but I've found it useful.

It's not going to replace SWEs, but it's going to make us more productive.


So I recently finished a job where I had to create a custom POST endpoint on Salesforce, so it'd take a simple JSON payload, apply some logic and save to the database. The job itself was a few hours with tests, etc. Well guess what, almost 100 emails and two months later, the project is still not finished, because of the simple middleware that was supposed to send the JSON to my endpoint and is as basic as my endpoint. ChatGPT can write the code, but all the BS in between will need humans to deal with.


sorry, is the debate here whether gpt can engage in a conversation with someone and respond using previous context? why would any of this present a challenge given its known abilities?


This was also my reaction to the comment. Go play out this type of conversation with ChatGPT. Does it actually do worse at this than at writing the code? Or is "but code generators can't do the requirements part" just a cached thought?


It's good at tech doco, i.e. take my dot points and make prose. Great for copy, great for having that nice corporate drone voice.


Me feeling philosophical. To me, solving a problem is living. No machine will do living, unless we know what living is, so that we can try to bake it into a machine. We can use ChatGPT to take blind, systematic steps in the way of solving a problem, but never to 'solve' the problem. For solving is living.


I run into this a lot myself. In our paper (https://arxiv.org/pdf/2303.07839.pdf) we specify a 'Specification Disambiguation' pattern that attempts to address this very thing.


Obviously you just formalize the interface for exchanging API contracts... and do pre-delivery validation...

Also, ChatGPT would likely be able to extrapolate. It would just need to write an email to XXQ to confirm the change.

Cope harder... the fact that you can write an email won't save you.


- ... 2 days later...

- us: umm, there's something not specified in <contract_json>. What about this part here that says that...

- ... 2 days later...

Can you replace this part with ChatGPT talking to another ChatGPT to generate questions and answers instantly?


The way I see it is that instead of hiring someone you'll be able to add a bot on slack that you can communicate with, laughs at your jokes and writes perfect PRs in milliseconds.


yea, AI will replace the fun parts of our jobs and leave us with the tedium


ChatGPT is just a tool. With it you can create a service and fine-tune it to specific use cases. Let's imagine there's some cool startup called chris-the-it-consultant.ai that uses ChatGPT in the backend. Here's a prototype:

System Prompt: You are CHRIS the IT consultant. Once you take in a question, make sure you understand all the requirements. Don't write code until you have an explicit request to do so. First make sure you ask all the questions relevant to solving the problem and needed by the code. All the responses as CHRIS should be written as CHRIS: your response here.

    Sure, I understand. As an IT consultant, it's important to fully understand the problem and requirements before proposing a solution or writing any code. Please let me know what question you have and I'll do my best to ask all relevant questions to ensure I fully understand the problem.
> CHRIS, we need a new basic POST endpoint

    CHRIS: Great, I'd be happy to help you with that. Could you please provide more information about the purpose of this endpoint? What kind of data will be sent in the request body? Will this endpoint be used to create new data, update existing data, or perform some other action?
> Third-party company XXQ will let you know the details. They will be the ones calling this new endpoint. But in essence it should be very simple: just grab whatever they pass and save it in our db.

    CHRIS: Understood. It sounds like the purpose of this endpoint is to receive data from a third-party company, XXQ, and store it in your database. Here are some questions to help me better understand the requirements:
    What format will the data be in? Will it be JSON, XML, or some other format?
    What kind of data will be included in the request body? Will it be simple data types like strings and numbers, or more complex data structures like arrays and objects?
    Will there be any validation or transformation required on the data before it's stored in the database?
    Are there any security or authentication requirements that need to be considered for this endpoint, given that it will be accepting data from a third-party company?
> We got this contract here: { product_id: string, product_price: float, product_name: string }

    CHRIS: Great, thanks for providing that information. Based on that, here are the next steps to create the POST endpoint:
    We'll need to set up an API endpoint to listen for POST requests.
    We'll need to parse the request body, which should contain the product information in JSON format.
    We'll need to validate the data to ensure that it contains the required fields (product_id, product_price, and product_name) and that the data types are correct.
    Once the data has been validated, we can save it to the database.
This is a real ChatGPT response (current ChatGPT - GPT 3.5 turbo, not GPT-4, and not a real system-prompt). Not bad for a prototype! Now give CHRIS Github/Vercel/SSH access keys, SMTP/IMAP, Slack and a Whisper interface to Zoom/Meet/Teams and invite it to the 200 meetings where things will be changed around 1000x before being released. Raise some VC money and you are all set!
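
For anyone curious what the plumbing for a CHRIS-style wrapper looks like, here is a minimal sketch, assuming the openai package's ChatCompletion interface; the model name is a placeholder, and all of the Slack/GitHub/Whisper integration hand-waved at above is left out:

    # Minimal sketch of wiring the CHRIS prototype above to the API.
    # Assumes the openai Python package (pre-1.0 ChatCompletion API);
    # the model name is a placeholder, the prompt comes from the comment above.
    import openai

    SYSTEM_PROMPT = (
        "You are CHRIS the IT consultant. Once you take in a question, make sure you "
        "understand all the requirements. Don't write code until you have an explicit "
        "request to do so. First make sure you ask all the questions relevant to "
        "solving the problem and needed by the code. All the responses as CHRIS "
        "should be written as CHRIS: your response here."
    )

    history = [{"role": "system", "content": SYSTEM_PROMPT}]

    def ask_chris(message):
        history.append({"role": "user", "content": message})
        resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=history)
        reply = resp["choices"][0]["message"]["content"]
        history.append({"role": "assistant", "content": reply})
        return reply

    print(ask_chris("CHRIS, we need a new basic POST endpoint"))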


ChatGPT will eventually help with debugging, suggestions, idioms, finding security flaws, recommending libraries, boilerplate, finding stuff etc..

Or maybe not ChatGPT but something like it.


How good is GPT-4 at writing emails to nail down requirements?


The thing is, I think ChatGPT can help a lot with this as well, albeit not in its current form/implementation.

It just needs some knowledge repository centralization.


Oh, wow. Your API story just described how I spent eight months of 2022.

I thought I was going crazy.

Now I'm sad that this is just how it is in tech these days.


In your scenario, what will happen is that there will never be a need to contract out whatever the customer is trying to do.


The counter argument here is building a new codebase to solve a problem that a single engineer can articulate and map out.


wouldn't it just spit out a solution every time without a care in the world? that sort of messy communication is only bothersome for humans because writing code takes time and effort plus it's mentally taxing to change requirements several times. also boredom is a big issue. none of those challenges are relevant for a computer.


So it sounds like AI researchers should focus on replacing both sides - sounds like it would be much more efficient. (tic)


..what if it was two AI's deciding XD


But once XXQ has their own ChatGPT they can just communicate with each other and hash this out 1000x faster.


What if they replace "us" with GPT and one programmer to glue together the snippets it provides?


It can sit there and communicate with people endlessly. In fact, that is what it's good at.


Make the AI handle the conversation on both sides? No need to wait a few days for back and forth


You won't need this to-and-fro. GPT-X will generate both sides for you.


- Looks like they're using an auth protocol we don't support...


I've been working as a freelance software developer for about 5 years now, and my billing model is such that I only bill for hours spent writing code. Time spent communicating with people is non-negligible, which means that it has to be baked into my hourly rate. So I'm very cognizant of how much time I spend communicating with people, and how much time I spend writing code.

I strongly disagree that 99% of the effort is not writing code. Consider how long these things actually take:

> - they: we need a new basic POST endpoint

> - us: cool, what does the api contract look like? URL? Query params? Payload? Response? Status code?

> - they: Not sure. Third-party company XXQ will let you know the details. They will be the ones calling this new endpoint. But in essence it should be very simple: just grab whatever they pass and save it in our db

> - us: ok, cool. Let me get in contact with them

That's a 15 minute meeting, and honestly, it shouldn't be. If they don't know what the POST endpoint is, they weren't ready to meet. Ideally, third-party company XXQ shows up prepared with contract_json to the meeting and "they" does the introduction before a handoff, instead of "they" wasting everyone's time with a meeting they aren't prepared for. I know that's not always what happens, but the skill here is cutting off pointless meetings that people aren't prepared for by identifying what preparation needs to be done, and then ending the meeting with a new meeting scheduled for after people are prepared.

> - company XXQ: we got this contract here: <contract_json>

> - us: thanks! We'll work on this

This handoff is probably where you want to actually spend some time looking over, discussing, and clarifying what you can. The initial meeting probably wants to be more like 30 minutes for a moderately complex endpoint, and might spawn off another 15 minute meeting to hand off some further clarifications. So let's call this two meetings totalling 45 minutes, leaving us at an hour total including the previous 15 minutes.

> - us: umm, there's something not specified in <contract_json>. What about this part here that says that...

That's a 5 minute email.

> - company XXQ: ah sure, sorry we missed that part. It's like this...

Worst case scenario that's a 15 minute meeting, but it can often be handled in an email. Let's say this is 20 minutes, though, leaving us at 1 hour 15 minutes.

So for your example, let's just round that up to 2 hours.

What on earth are you doing where 3 hours is 99% of your effort?

Note that I didn't include your "one week later" and "2 days later" in there, because that's time that I'm billing other clients.

EDIT: I'll actually up that to 3 hours, because there's a whole other type of meeting that happens, which is where you just be humans and chat about stuff. Sometimes that's part of the other meetings, sometimes it is its own separate meeting. That's not wasted time! It's good to have enjoyable, human relationships with your clients and coworkers. And while I think it's just worthwhile inherently, it does also have business value, because that's how people get comfortable to give constructive criticism, admit mistakes, and otherwise fix problems. But still, that 3 hours isn't 99% of your time.


In my extreme opinion, 100% of value is created by communication with people and problem solving. 0.0% of value is created by engineering.

This explains the xkcd Dependency comic[0]: the man in Nebraska isn't solving anyone's problem in any particular context of communication and problem solving, only preemptively solving potential problems, not creating value as problems are observed and solved. This also explains why consultancy and so-called bullshit jobs, which offer no "actual value" but just resell backend man-hours and make random suggestions, are paid well: they create value in set contexts.

And, this logic is also completely flawed at the same time, because the ideal form of a business following this thinking is a pure scam. Maybe all jobs are scams, some less so?

0: https://xkcd.com/2347/


Not just with external parties. This dance happens all over.


That's a cheap 1% boost!


Company XXQ to LLM once they figure their shit out: here is the endpoint we need. LLM: gives code. You: not involved.


Maybe you did not get the point: "once they figure their shit out" is the pain point, and no ChatGPT can ease that!


Even within the N% that is more genuine coding and system reasoning: system reasoning is really hard, oftentimes requiring weird leaps of faith, and I also don't see a path for AI to be helpful with that.

Some random recent thing: "We have a workflow engine that's composed of about 18 different services. There's an orchestrator service, some metadata services on the side, and about 14 different services which execute different kinds of jobs which flow through the engine. Right now, there is no restriction on the ordering of jobs when the orchestrator receives a job set; they just all fire off and complete as quickly as possible. But we need ordering; if a job set includes a job of type FooJob, that needs to execute and finish before all the others. Moreover, it will produce output that needs to be fed as input to the rest of the jobs."

There's a lot of things that make this hard for humans, and I'm not convinced it would be easier for an AI which has access to every bit of code the humans do.

* How do the services communicate? We could divine pretty quickly: let's say it's over Kafka topics. Lots of messages being published to topics that are provided to the applications via environment variables. It's easy to find that out. It's oftentimes harder to figure out "what are the actual topic names?" Ah, we don't have much IaC, and it's not documented, so here I go reaching for kubectl to fetch some configmaps. This uncovers a weird web of communication that isn't obvious.

* Coordination is mostly accomplished by speaking to the database. We can divine parts of the schema by reverse engineering the queries; they don't contain type information, because the critical bits of this are in Python, and there are no SQL files that set up the database because the guy who set it up was a maverick and did everything by hand.

* Some of the services communicate with external APIs. I can see some axios calls in this JavaScript service. There are some function names, environment variable names, and URL paths which hint at what external service they're reaching out to. But the root URL is provided as an environment variable, and it's stored as a secret in k8s in order to co-locate it in the same k8s resource that stores the API key. Neither I nor the AI have access to this secret, thanks to some new security policy resulting from some new security framework we adopted.

* But, we get it done. We learn that doing this ordering adds 8 minutes to every workflow invocation, which the business deems unacceptable because reasons. There is genuinely a high cardinality of "levels" you think about when solving this new problem. At the most basic level, and what AI today might be good at: performance optimize the new ordered service like crazy. But that's unlikely to solve the problem holistically, so we explore higher levels. Do we introduce a cache somewhere? Where and how should we introduce it, to maximize coherence of data? Do some of the services _not_ depend on this data, and thus could be run out of order? Do we return to the business and say that actually what you're asking for isn't possible, when considering the time-value of money and the investment it would take to shave processing time off, and maybe we should address making an extra 8 minutes OK? Can we rewrite or deprecate some of the services which need this data in order to not need it anymore?

* One of the things this ordered workflow step service does is issue about 15,000 API calls to some external service in order to update some external datasource. Well, we're optimizing; and one of the absolute most common things GPT-4 recommends when optimizing services like this is: increase the number of simultaneous requests. I've tried to walk through problems like this with GPT-4, and it loves suggesting that, along with a "but watch out for rate limits!" addendum (a rough sketch of what that suggestion usually amounts to follows this list). Well, the novice engineer and the AI do this; and it works ok; we get the added time down to 4 minutes. But: 5% of invocations of this start failing. It's not tripping a rate limit; we're just seeing pod restarts, and the logs aren't really indicative of what's going on. Can the AI (1) get the data necessary to know what's wrong (remember, k8s access is kind of locked down thanks to that new security framework we adopted), (2) identify that the issue is that we're overwhelming networking resources on the VMs executing this workflow step, and (3) identify that increasing concurrency may not be a scalable solution, and we need to go back to the drawing board? Or, let's say the workflow is running fine, but the developers@mycompany.com email account just got an email from the business partner running this service that they had to increase our billing plan because of the higher/denser usage. They're allowed to do this because of the contract we signed with them. There are no business leaders actively monitoring this account, because it's just used to sign up for things like this API. Does the email get forwarded to an appropriate decision maker?
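
For reference, the "more simultaneous requests, but watch the rate limits" suggestion usually amounts to something like the sketch below: bounded concurrency with a semaphore. Everything here is a placeholder (the endpoint, the payloads, the limit of 20 in flight), it assumes the aiohttp library, and, as the point above illustrates, it does nothing about exhausting the VM's network resources or the partner's billing terms:

    # Hedged sketch of the usual "increase simultaneous requests" advice:
    # bounded concurrency via a semaphore. Endpoint, payloads and the
    # in-flight limit are placeholders; assumes the aiohttp library.
    import asyncio

    import aiohttp

    MAX_IN_FLIGHT = 20

    async def update_record(session, sem, record):
        async with sem:  # never more than MAX_IN_FLIGHT requests at once
            async with session.post("https://partner.example.com/api/update",
                                    json=record) as resp:
                resp.raise_for_status()

    async def run(records):
        sem = asyncio.Semaphore(MAX_IN_FLIGHT)
        async with aiohttp.ClientSession() as session:
            await asyncio.gather(*(update_record(session, sem, r) for r in records))

    # asyncio.run(run(records_from_workflow_step))  # records_from_workflow_step is hypothetical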

I think the broader opinion I have is: Microsoft paid hundreds of millions of dollars to train GPT-4 [1]. Estimates say that every query, even at the extremely rudimentary level GPT-3 has, is 10x+ the cost of a typical Google search. We're at the peak of Moore's law; compute isn't getting cheaper, and actually coordinating and maintaining the massive data centers it takes to do these things means every iota of compute is getting more expensive. The AI Generalists crowd have to make a compelling case that this specialist training, for every niche there is, is cheaper and higher quality than what it costs a company to train and maintain a human; and the human has the absolutely insane benefit that the company more-or-less barely trains them; the human's parents, public schools, universities paid for by the human, hobbies, and previous work experience do.

There's also the idea of liability. Humans inherently carry agency, and from that follows liability. Whether that's legal liability, or just your boss chewing you out because you missed a deadline. AI lacks this liability; and having that liability is extremely important when businesses take the risk of investment in some project, person, idea, etc.

Point being, I think we'll see a lot of businesses try to replace more and more people with AIs, whether intentionally or just through the nature of everyone using them being more productive. Those that index high on AI usage will see some really big initial gains in productivity; but over time (and by that I mean the late 2020s or early 2030s) we'll start seeing news articles about "the return of the human organization"; a recognition that capitalism has more reward functions than just Efficiency, and Adaptability is an extremely important one. Moreover, the businesses which index too far into relying on AI will start faltering because they've delegated so much critical thinking to the AI that the humans in the mix start losing their ability to think critically about large problems; and every problem isn't approached from the angle of "how do we solve this", but rather "how do I rephrase this prompt to get the AI to solve it right".

[1] https://www.theverge.com/2023/3/13/23637675/microsoft-chatgp...


In before all the comments about how “most code is trivial” or “most programming is stuff that already exists” or “you’re missing the point look how it’s getting better”.

I really am in awe of how much work people seem willing to do to justify this as revolutionary and programmers as infantile, and also why they do that. It’s fascinating.

Thinking back to my first job out of college as a solid entry level programmer. ChatGPT couldn't have done what I was doing on day 2. Not because it's so hard or I'm so special. Just because programming is never just a snippet of code. Programming is an iterative process that involves a CLI, shell, many runtimes, many files, a REPL, a debugger, a lot of time figuring out a big codebase and how it all links together, and a ton of time going back and forth between designers, managers, and other programmers on your team, iterating on problems that aren't fully clear, getting feedback, testing it across devices, realizing it feels off for reasons, and then often doing it and redoing it after testing for performance, feel, and feedback.

Often it’s “spend a whole day just reading code and trying to replicate something very tricky to find” and you only produce a single tiny change deep in the code somewhere. GPT is absolutely terrible at stuff like this.

And yes, often it is finding new solutions that aren’t anywhere on the internet. That’s the most valuable programming work, and a significant % of it.

Feel like there’s 10 more points I could make here but I’m on my phone and don’t like wasting too much time on HN. But man, what a disappointment of critical thinking I’ve seen in this specific topic.


While I think there's truth to what you say, I'd also point out that workers in many pre-automated industries with an "artisan" approach also considered themselves irreplaceable because they figured, correctly, that nobody could build a machine with the capability of reproducing their workflow, with all its inherent uncertainty, flexibility and diverse physical and mental skills.

What they failed to predict was that some people wouldn't try to automate them like-for-like. Instead they would reconfigure their entire approach to fit the specific advantages and limitations of the machinery. And this new approach might even be qualitatively worse in various ways, but not so much as to overwhelm the economic advantages provided by the things machines were good at.

AI likely isn't going to slot into a developer-shaped hole in a software team. But it's possible we'll see new organisation approaches, companies, and development paradigms that say: how far can you get if you put prompt-generated code at the heart of the workflow and make everything else subservient to it? I'm not sure, right now, that that approach is feasible, but I'm not sure it won't be in a year or two.


That's an extremely interesting thought. Perhaps we will see organisations in the future structure themselves more like a suite of unit tests. Instead of getting a developer or software house to plug a specific need in their org while remaining entirely outside of the development process, they will mirror the development process organisationally to ensure they catch any problems with the current output, and just feed the box new prompts to increase their efficiency or efficacy.

Their competitive advantage in their field then becomes the range of their tests (borne through experience), efficiency in running their pipeline of testing and ability to generate effective prompts.


This is already how automotive companies work. They are huge organizations which do four things: marketing, financing, design, and requirements. The supplier management and project management all fall under requirements management and enforcement.


I firmly believe that we should be studying model analysis and using that to create a field of prompt engineering. Both from a security standpoint and a productivity standpoint.


Tests and type signatures. Hopefully soon, dependently typed signatures.


Indeed, a programmer's job feels artisan oftentimes. I think a reason for it is that projects are often ill defined from day one. They are defined by people who do not know enough about the system to set good requirements. The engineer works both bottom-up from the existing primitives of an existing system and top-down from requirements and tries to solve the puzzle where both approaches meet. Such work is very hard to automate. I believe there is a very big opportunity for AI to be used in a workflow such that inconsistencies between requirements, on all levels, and actual systems become clear a lot faster. The feedback loop to the requirements will be shorter, prototypes will exist sooner. The current workflow has little space for an AI worker. I believe this will change and have a major impact on the art of developing products. The AI programmer is still in its infancy, let's talk again 5 years from now.


Iteration and integration, the tasks which take most of at least my time as a developer, could fade significantly - or become automated themselves.

We won't have understanding of our code, similar to how we don't understand the machine language being generated by our compilers now.

We will be using our intuition about GPT to bring into being fully designed and integrated systems with 10 paragraphs at the prompt.

Which could in the end greatly increase the influence of a programmer in a given organization. Our role will be a softer, almost cyborgian one.

But this will indeed require the destruction of all that came before it. Questions like "but does it work with this 3rd party API, or this platform?" must become irrelevant for this future to happen.

A bit similar to how the web destroyed mainframe, perhaps, by first creating its own compelling world, then making the mountain come to it.


I don't understand the high level language to machine language comparison with AI. HLL to machine language is translation. We hardcode things: this literal translates to this thing. The AI, on the other hand, has its own mind (figuratively) and it's not doing translation. It can introduce some silly bug by misunderstanding the requirements, which may cause a billion dollar software meltdown. And who is going to be responsible for that?

The more black box programming becomes, the dumber the human programmer gets. There will be stagnation. There won't be any new "design patterns".


The artisan of old created one bowl at a time and made artisan pots. A machine that makes bowls can automate the artisan's job away.

However, the next challenge is that the machine itself is now an "artisan" device.

I'm sure the first bowl printing machine ( https://youtu.be/bD2DNSt8Wb4 ) was purely artisan... but now you can buy them on Alibaba for a few thousand dollars ( https://www.alibaba.com/product-detail/Printing-Machine-Cera... )

I am sure there is a (bowl printing machine) machine out there.

But if you say "I want a bowl printing machine that can do gradient colors" that first one (and probably the first few until it gets refined) are all going to be artisanal manufacturing processes again.

This all boils down to the fact that at some point in the process, there will be new and novel challenges to overcome. They're moving further up the production chain, but there is an artisan process at the end of it.

The design of a new car has changed over time so that it is a lot more automated now than it was back then ( https://youtu.be/xatHPihJCpM ) but you're not going to get an AI to go from "create a new car design" to actually verifying that it works and is right.

There will always be an artisan making the first version of anything.


> There will always be an artisan making the first version of anything.

Until we reach the bootstrap point (the singularity?), i.e. when the GPT-making machine is GPT itself. Or maybe we're still one level behind, and the GPT-making machine will generate the GPT-making machine, as well as all the other machines that will generate everything else.


Two problems with the analogy: the artisanal fields where machines took over were hundreds or thousands of years old. We understood them very well. And maybe more importantly, factory automation is deterministic and needs to be, as opposed to generative ML.


For that we need to make the AI deterministic, or at least shape processes around specific error rates, which are probably higher than those of the average smart human.

We had to do that for the industrial approach and it wasn't a simple, fast or intuitive process.


"How far can you get if you put prompt-generated code at the heart of the workflow and make everything else subservient to it"

OK, challenge accepted - will go down your suggested route - thanks :)


A deep insight. Thanks.


For me, it's mostly that I have used GPT-3.5 a little for programming C++, and I wasn't impressed.

For one, it made horrible, glaring mistakes (like defining extern functions which don't exist, using functions which are specific to a platform I'm not using, etc.), stuff beginners would do.

It also decided to sneak in little issues, such as off-by-one errors (calling write() with a buffer and a size that is off by one in a place where it's very hard to tell), and missing edge cases (such as writing a C++ concept which worked, but actually did everything in slightly the wrong way to actually ensure the concept was requiring exactly what I asked).

Even when asked to correct these mistakes, it often struggled, made me read paragraph after paragraph of "I'm sorry, I've been such a bad little machine" garbage, and didn't even correct the issue (or, in some cases, introduced new bugs).

I'm utterly unimpressed by this. GPT is great for a lot of things, but not writing code better than I would, in the same time.

The time it took me to massage it to solve a nontrivial problem (write hello world with just syscalls) was way longer than reading the manual and writing it myself (and had fewer bugs).

Not everyone unfazed by these articles is simply in denial. I feel sorry for people who write copy-paste code and find that ChatGPT or Clippy from 2000 can replace them, but not everyone writes trivial code.


My thoughts exactly.

There are so many non-CRUD complex disciplines involving programming, such as signal processing, robotics, control theory, and scientific computation to name a few, for which the current version of GPT, at least, is not even close to being a good supplement, let alone a substitute.

But then I remember I'm on HN where the technical pinnacle of programming is Backend and DevOps.


Yup. It kept suggesting properties in Flyway (a Java lib) which don't exist. It actually threw me off track, and I made a mental note to program without GPT.


Two points: GPT-4 is significantly better in this regard, and you should be concerned about the rate of progress more than its actual capabilities today.


I suspect (and sort of hope) they're close to hitting a local maximum

presumably they've already fed it all the code in the world (including a load they probably aren't licensed to use)


If you've tried GPT-4, you probably understood it's not about feeding it all the code in the world. GPT-4 analyzes and "understands" your code and will answer based on this. Clearly, it will read the variable names and make deductions based on them. It will actually read the comments and the function names and make decisions based on them. And it knows the rules of the language. I mean, I'm writing this because this is what I've witnessed in the time I've spent playing with it.

The problem I've seen is that, maybe as the author has been writing, it's making sh*t up. That's not untrue: sometimes I didn't give it all the dependent classes, and it tried to guess, sometimes correctly, sometimes incorrectly, what those were (such as method signatures, instance members, etc.). I wish it would have asked me for details rather than trying to figure things out. The guys at OpenAI still have a lot to do, but the current state is very impressive.


it is entirely about feeding it all the code in the world

it doesn't understand anything, it doesn't make deductions

it's a probability model, and I understand how it's implemented perfectly well, thank you


Does a chess engine “understand” the position? If you define “understanding” as the ability to think like a human then Stockfish is obviously much worse at that. If you define understanding as the ability to choose the correct move, then Stockfish understands the position much better than any human.

The point being, you can choose to load the word "understand" with the meaning of human-like thinking, in which case humans will always be superior by definition. Or you can choose a "many ways to Rome" definition of understanding that is purely focused on results.

Large language models understand language in their own way. Currently their results are inferior to humans’ but one day the results may be superior.


Even viewing it as a simple probability model, you don't need a million Python repos. The massive amount of English text + 1,000 repos + your codebase is very powerful. You can see this because you can make up a language, give it some examples, and it's surprisingly good.


No you obviously don't lol


Don't you think it's going to improve much more now that it has hundreds of billions of pointed queries being typed into it every month?

This is data at a scale OpenAI did not have access to before.


> you should be concerned about the rate of progress

Kind of agree?

On the one hand we don't even have a roadmap toward reliable AI.

On the other, if we ever plug an LLM into something that has memory, acquires experiences, does experiments, observes the outcome and adjusts its worldview in response, consciousness might fall out of that. And writing good code might not even require consciousness.


Epistemologically speaking, I think we can roughly break down the potential nature of consciousness into three categories:

- as a function of an independent human soul

- as the fundamental substrate on which the rest of the universe is built

- as a byproduct/secondary phenomenon of physical processes

In the latter two cases I believe that the question of whether GPT is conscious is immaterial. In either case it is functioning in the same medium we all are when we talk, think, write. In the first case it is not, and the question is thornier.


Consciousness in this context is often used as an imprecise but important bundle of very material concepts, including whether something can have wants (and therefore warrants our anticipation of them) and whether it deserves ethical status.

One can debate whether either of those is necessarily a consequence of consciousness, but nonetheless those kinds of qualities are what people are aiming at when they wonder about conscious AI.


GPT4 is very different from 3.5. I've asked it today to write some unit tests given the code of the class (~200 lines) and the methods I wanted to cover and it did that just perfectly. It put asserts where it made sense (without me asking to do it), and the unit test code was better written than some code I've seen written by (lazy) humans. It's not perfect sure and it's easy to get a bad response but give OpenAI a few more iterations and my job will be simply to copy paste the requirement to GPT and paste the generated code back to compile.


what kind of unit tests are these? is it `check_eq(add(1, 2), 3)` or "check for possible exceptions, cover edge cases, test extremes of this super important db function"


They're Salesforce unit tests, written in Apex, which is actually a niche language, so it's surprising that even in such a language it was so good. And no, the unit tests were much more complex than this. It involves creating records, querying data, some business logic runs, and then the data is updated. The asserts come after, checking that the business logic performed correctly.

The bot created the whole unit test involving the creation of data with test fields, then queried the output results and put some asserts. That's more than 100 lines of code which were written by GPT4. A (good) Salesforce developer would need a good 30 minutes to write those, and the result would not have been better.

Again, I also have some counter examples were it made some mistakes, but this is really shocking how... a program... figured all this out.


write addition python code


A Toolformer model integrated with Intellisense is the obvious next step.


Have you seen this video? https://www.youtube.com/watch?v=9QZlQMpNk-M

I think the author is onto something – while AI might not be able to program per se, it can certainly be handed a code snippet and then use its huge corpus of Internet Learning™ to tell you things about it, code that looks like it, and ways (people on the Internet think) it might be solved better.

In that sense, it isn't replacing the programmer; it's replacing IDE autocomplete.


I really enjoyed the video, thanks for sharing.

I think the author is operating in what I consider to be the sweet spot of current LLMs - where I can ask a question I don't know the answer to but can reliably spot bullshit (either because I know enough or through other means). I think there's a lot of value to be had when those conditions are met, and not just for coding.


That is a good take on it. I have been saying this thing is like really good at summary and terrible on detail. So watch what it spits out.

Last night I sat down and tried using it to write an 8086 emulator. It got a simple emulation outline fairly quickly. But when it came to getting each of the instructions and interrupts correct, it fell very flat very quickly. What was interesting is that it made the exact same mistakes many early emulator writers make. You could then correct it and it would give it a shot at 'doing better'. I at one point got a bit bored with it and kept feeding it 'can you make that more compact/better'. It did an adequate job at that, eventually using templates and jump lists. It did not get very far using Duff's device or dynarec, but I am sure I could have guided it into doing that.

But probably for a majority of things, like in emulation, 'close enough is good enough'. That is an interesting finding of coding things up, I think. This thing is also going to make a seriously crazy amount of bugs we will be chasing for decades.


> replacing IDE autocomplete

Co-pilot has been very useful the times I've used it. It's not perfect, but does cover a lot of boilerplate. It also makes it much easier to jump between languages.


I’ve been working with copilot a few months and the biggest surprise is how it has led to much better commented code. I used to comment tricky code to explain it. Now I comment trivial code instead of writing it. Often two lines of comment will get me 10 lines of copiloted code, faster than I could have typed it and with good comments to boot.


> ways (people on the Internet think)

It reminds me of arguments that it's not the computer that plays chess, but its programmers.

You can describe a GPT's response as a statistical average of responses on the internet (for quite a contrived definition of average), but at some point it will be easier to describe it as analyzing a snippet and forming an opinion (based on what people on the Internet think). Are we past that point? I'm not sure yet, but we are close.


> quite a contrived definition of average

Sure, maybe an argmax mixed with an RNG isn't really an average, but some would say it's quite a contrived definition of forming an opinion.


argmax is doing all the heavy lifting there. Consider argmax is_proof_of_P_eq_NP(string)

You can describe solution of any well defined problem with argmax, it doesn't make it trivial.


I spent this afternoon asking ChatGPT (3.5 not 4) to help me query AWS resources into a CSV. It gave me a 90% correct answer but made up a native CSV output option. When I told it that option didn't exist, it got almost sassy, insisting it was correct. Eventually it gave me a closer answer using JSON and jq after I prodded it.
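
For comparison, the shape of a correct answer is pretty small. The comment doesn't say which AWS resources were being queried, so the sketch below uses EC2 instances and a handful of columns purely as a hypothetical example; it assumes boto3 and the stdlib csv module, and skips pagination:

    # Hypothetical sketch: dump some EC2 instance fields to a CSV with boto3.
    # The resource type and columns are placeholders (the original comment
    # doesn't say what was being queried); pagination is omitted for brevity.
    import csv

    import boto3

    ec2 = boto3.client("ec2")
    rows = []
    for reservation in ec2.describe_instances()["Reservations"]:
        for inst in reservation["Instances"]:
            rows.append({
                "InstanceId": inst["InstanceId"],
                "InstanceType": inst["InstanceType"],
                "State": inst["State"]["Name"],
            })

    with open("instances.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["InstanceId", "InstanceType", "State"])
        writer.writeheader()
        writer.writerows(rows)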

I had a similar experience asking it to write an API client. It wrote something very plausible but just concocted an endpoint that looked real but didn't exist.


This is how I use it. It is the best rubber duck money can buy.


I don’t think you’ve tried GPT-4.

If you have, I don't think you are like the majority of devs (maybe not on HN, but in real life).

You sound lucky to have true, novel problems to solve each day. I’m with many here commenting that this is quite powerful stuff, especially when my day-to-day is writing simple CRUD apps, or transforming data from one format to another within an API, or configuring some new bit of infra or CI/CD.

I'd love to be challenged in some new way, and have access to truly fascinating problems that require novel solutions. But most enterprises aren't really like that, nor do they need that from the majority of engineers.


On the flip side, I am having a great time charging enterprises exorbitant hourly rates to come in and help them fix their crappy CRUD programs.

But I still don’t know if I am thrilled by the prospect of being called in to make their monkeypatched, GPT generated project work.


> their monkeypatched

You mean, like almost every outsourcing company pops over? So the type of code that infests companies who hired some sweatshop to do ‘some simple crud’? What's the difference? Can you see the difference? Besides, the GPT code will be far better commented, as comments come almost for free with GPT, while humans hate writing them.


I think you can trust the code from both of those worlds exactly the same: not at all.

I've seen GPT-3 and 4 hallucinate the most amazing commentary about their own output. Maybe we will get dependable, out-of-process guidance at some point about how factual the model thinks it is on an output-by-output basis, but until that point you should trust every LOC and comment exactly the same as code gifted to you by an adversary.

My modest 2¢


Exactly... and then the question becomes: why do I interact with an adversary?


I'm working on an outsourced codebase right now and boy, I wish they hadn't commented their code at all


"Hey GPT-X: Can you refactor the codebase in gpt@monkeypatched-crap.git for me? Preferably in the style of gpt@crappycrud.git that I worked on last year."


Token cap will probably be the biggest problem here.

After validation.

After getting the changes to disk, documented, actually compiling, etc…

But the biggest problem is that transferring the nuance that is external to the code base is typically really tiresome and lengthy grunt work, and again runs into the token cap.


They went from 4K to 32k in a few months. Wait a few years.


GPT-3 is 3 years old, not a few months.


Sure but the token jump to 32k is new isn’t it? Even 8k to 32k is significant and that was this year.


Yeah, but the jump to 32k didn't take a few months, it was years in the making. Otherwise you could extrapolate with "yesterday we had 4k, today we have 32k, tomorrow we will have 256k", that isn't how we do it. If we follow the same exponential pace 256k would have to wait 3 years, and even that is unlikely.


I think it is better to assume that we are at the beginning of the s-curve, not the end.


Wow. I never expected to be downvoted for such a comment. Downvoters, please elucidate as to why we are in the asymptotic phase with LLMs.


"Luddites" refusing to accept that we might be onto something that is going to change humanity forever...

The sad thing is that real luddites would go out and actively sabotage AI development because they think it's a real threat. Yet these people just make bold and false claims in online forums and continue to move the goalposts once they're proven wrong. Sad. Pathetic. (and obviously, I don't mind being downvoted. Whatever! :)


Seems there are very many of these here for some reason, trying to find whatever reasons and ways (it's gonna be licensing, mostly) to spread FUD.


we don't know what that jump means; did they increase the model proportionally, or slap on some projection layer with quality degradation?..


A few years, yes. I'm with you that "the change is coming", but we still need to transfer millions of tokens in and out to cater for context and out-of-repo intricacies.


Large enterprises will just pay to fine tune over their documents/code/etc.


There's no such thing as actually boring CRUD. I've worked at many companies and founded my own. Even when it felt like CRUD, a year+ in it was clear that tasteful decisions pay off and iteration and cost gradients matter. GPT doesn't sniff that.

I agree with GP - day 2 dev me outclasses it, which means it isn't replacing anyone.


HN echo chamber. Generally the better/best programmers hang around here; your day 2 was probably better than many coders hope to achieve in their whole lives. It is replacing people already; I personally know about 10 people who have been fired or assigned to a completely different position because GPT did a faster and better (or equal) job. So 'not anyone' is simply nonsense; I am only one person, and there will be many, many more who see the same thing. Of course they are not told they are replaced by AI when fired, but their managers or more skilled colleagues know.

I do agree that there is no boring crud; that's why GPT or no-code tools are not good for full solutions (yet), but it'll get there, I am sure.


> I personally know about 10 people who have been fired or assigned to a completely different position because GPT did a faster and better (or equal) job

Please elaborate.

And, if true, this would be a major news story that Vox or any number of major newspapers would love to write about - so have you approached the media about this? If not, why not?


Manager here. Did not fire anyone (and hope not to), but I am starting to look into implementing GPT-4 as a part of our development practice and, potentially, reducing future hiring. This might have a positive impact on our engineering budgets, which are always tight. Many of my colleagues are doing the same - this has been a water-cooler topic with other managers for weeks now (strange how progress is now measured in weeks).

At this point, this would only affect engineers who don't understand our subject area (biotech) and are relatively junior (in the sense that their output is not much better than a GPT4 output reviewed by a more senior dev).


If your engineers are actually engineering, then yes, really, by all means keep them and integrate GPT into workflows where suitable.


> Please elaborate.

I simply know firsthand (I'm old; I have manager, CTO, and CEO friends who I go golf and play squash with) that people in data entry and programming have been let go in the past weeks because 1 person could take over their work using the GPT/ChatGPT APIs and do their work faster with fewer errors. I am recommending the same in my company, as a lot of my colleagues are doing nothing anymore since the skilled seniors are doing it themselves with GPT now, as it's faster, less communication, etc. We feed Jira issues into GPT and it generates code; we review and refine or fix ourselves. It works much, much faster and with better results. Most of what most of us do all day is integrating partners' ancient APIs, mapping XML/SOAP/... APIs to our JSON schemas. With ChatGPT that's really fast and mostly painless; it even renames the properties that need to be changed to our enums properly. With humans this is a painful and slow process, especially with people who are fast and loose (broken education seems to have made many of those graduate just by sheer production speed & volume instead of quality; GPT can do that better too...).

> so have you approached the media about this? If not, why not?

Why would I do that? Even anonymously, it doesn't seem to make much sense for me to do that. Anyway, that'll come soon enough, as it will be common soon.


> simply know firsthand (I'm old; I have manager, CTO, and CEO friends who I go golf and play squash with) that people in data entry and programming have been let go in the past weeks because 1 person could take over their work using the GPT/ChatGPT APIs and do their work faster with fewer errors. I am recommending the same in my company, as a lot of my colleagues are doing nothing anymore since the skilled seniors are doing it themselves with GPT now, as it's faster, less communication, etc. We feed Jira issues into GPT and it generates code; we review and refine or fix ourselves. It works much, much faster and with better results.

This isn't software engineering work, this is 21st century data entry with some code. This is exactly the type of menial work that should be automated by AI.

If you have small self contained problems like map X -> Y then sure, ChatGPT will be sufficient. Where I disagree with you is calling these jobs "programming" jobs. These are the type of tasks that should've been written in a transform language like JOLT. This shouldn't even be code.

> With humans this is a painful and slow process, especially with people who are fast and loose (a broken education system seems to have made many of them graduate on sheer production speed and volume instead of quality; GPT can do that better too...).

Humans suck at repetitive menial tasks like this. It's not education's fault.


> This isn't software engineering work, this is 21st century data entry with some code. This is exactly the type of menial work that should be automated by AI.

Who said it was? This is what most programmers do all day; that's the point. These people can be replaced now, without writing specialised software for each case. It is programming, not software engineering, and it is what the people who are called programmers / software engineers in most companies all over the world are doing all day long (and longer). You can disagree with it, but that doesn't change much.

So you are now raising the bar, which is not fair; the fact is that people who have the title 'programmer' or even 'software engineer' can now readily be replaced by AI. If you don't agree with the title, I don't either, but reality is what it is.

Basically the point is: there are way, way too many people working as 'programmers' (and not only in this field) who can be replaced; only a few percent should remain, as the rest do what you wouldn't call programming but the rest of the world does. Search Twitter for 'html programmer' and such. Millions and millions will never be programmers by your definition, but have a high-paying job (for their country) working as a programmer.


How does this work from an IP perspective? Aren't you feeding OpenAI and Microsoft with your confidential information?

Also, what license is the code under when you get it back?


I take care not to feed it secrets; this is just boring ERP stuff without the magic numbers (they are not needed to create or test; we normally use test data anyway, since we cannot give real data to outsourcing companies either, so there is no difference in how we work).


And the output? Are you allowed to use their code for your own projects or company?


Yes, we don’t all live in the US.


So you basically don't care about licensing.

That's ok for your use case, but risky for anything open source or that might be sold around the world.


Not really, there is absolutely no way to show, let alone prove, any of it is written by AI.


If all you're doing is very simple crud apps and transforming API responses, your coding is already replaceable with "no code" tools.


I've been a programmer for 23 years now. In all these years every year was the year of no-code tools.

And you know what? We went from having actual no- or low-code tools (UI builders, forms in Access and FoxPro etc.) to zero no-code tools worth a damn [1]. There was a brief dream of Yahoo! Pipes in mid-to-late 2000s, but it's dead as well.

[1] Except some applications like Unreal Blueprints and similar node-based tools in audio and graphical software


I think the reason Unreal Blueprints and its cousins work and are useful is: they are used in a relatively narrow domain, the primitives of that domain are well understood/defined, and the primitives are easily composable.

Once you create a general purpose no-code option, it is so complicated and sprawling that the mental burden to understand it is just as great (if not greater) as just using plain old code again. Or conversely, it is so constraining (for the sake of "simplicity") that it can't do anything useful.


I'm still waiting for the no-code tools that can replace me, the API Glue Coder.

They've been promising those since the early 2000's, but I'm still employed.

Some stuff has been automated by Zapier, IFTTT and the like, but most of the systems I'm integrating with don't have public APIs.


"I'm still waiting for the automobile that can replace me, the Horse Carriage Driver. They've been promising those since the 1880s, but I've still got passengers in my carriage every day!"

Obviously it's a cheeky example, but this would not be the first time in history a previously well-established career was upended in a (relatively) short amount of time. I'm a FAANG dev, I've got skin in the game too and I'm trying to be optimistic, but I can't help but be at least a little worried. From Wikipedia -

"In 1890 there were 13,800 companies in the United States in the business of building carriages pulled by horses. By 1920, only 90 such companies remained."

I don't think we'll be out of the job entirely, but I can definitely imagine the bar being raised and the compensation stagnating as we now have to justify the time and cost of our work compared to the near-free, near-instantaneous output of an LLM.

All that being said, if you've been working since the 2000s, you've got nearly a 20 year head-start on me, so perhaps it makes sense for me to be a bit more worried.


> All that being said, if you've been working since the 2000s, you've got nearly a 20 year head-start on me, so perhaps it makes sense for me to be a bit more worried.

Yea, that's mostly why I get hired. Experience gives people a certain intuition on what kind of solutions work for which cases.

And when you've been working long enough, you don't (hopefully) feel the need to do cool bleeding edge shit at work, you just want the work code to ... work. You pick the simplest and most boring solution possible so you can clock out at 1600 and not think about services crashing because the newfangled thingamageek coded with the latest language du jour failed because of an edge case nobody has seen before.


That's a comforting perspective, I hope that you're right.


> I'm still waiting for the automobile that can replace me, the Horse Carriage Driver.

Wow. The hype around "AI"s has entered the crypto stages. Just like crypto was the new car, the new internet etc., so is AI now.

Well, at least AI is useful in certain applications.


I got into and out of Bitcoin back in 2010 when I was still in high school. Even then, I thought and still do that crypto as a whole is a complete ponzi scheme with little real world utility.

I feel completely differently about LLMs; I'd say we're closer to 2007 when the first iPhone was released. I believe LLMs will become a part of our day to day lives at a level that crypto never came close to.


Again? Last time it ended in a long winter.

And horse carriages have some serious advantages, maybe even will make a comeback. :)


But horse carriage drivers were never out of a job, because cars still needed people to drive them... (yes, people can drive their own car, but they can also drive their own carriage, so nothing is different there). In contrast, horses were out of a job; do you think we are more like horses than like the people who drive horses?


Except it isn't and it won't be until 10 layers of management learn what no-code tools are and why they should want them.


And I guess this will never happen, as management, at the first sign of trouble making the AI do what they want, will gladly pay someone else to do it for them. As little as possible, of course, but either way they will be glad to delegate.

It's the same for me. I could learn how to tile my bathroom or repair my car, but I just don't feel like it and am happy to pay someone else.


> If all you're doing is very simple crud apps and transforming API responses,

I think all of us here conflate simple with easy. It's simple in theory, yes - you get some JSON from service X, maybe tweak service Y that talks to X, and then feed it into some front end. In practice, even very experienced engineers can take days writing or changing a simple endpoint or some front end, because of unclear requirements, bugs, microservice hell, unclear existing code, etc.

If it was that easy the pace and quality would have been much higher than what I'm seeing in tech companies.


Also, if all the company is doing is a boring CRUD app as a service, then the whole company is going to disappear once anyone can ask ChatGPT to create a version of the service for themselves.


Most CRUD apps in enterprise settings relate to something in the real world. E.g. ordering systems, warehouse management and so on.

The hard part here isn’t really the coding - but understanding the business logic and getting it out of the customer what they need.

Often the customer hasn’t even got a good enough grasp on their own business to be able to formulate what they need.

In the short term these AI tools will make the programming part easier, but the planning part is still hard.


Excellent response.

I also wanted to add a note about the myriad incoming data formats that need to be processed, and the myriad data exports one has to implement, for most of those "boring" CRUD apps.

If one hasn't written code that includes comments like "Special case, California does it this way" or "Alberta needs an .xml export, not an .xsl one", with a link to a .pdf spec somewhere on the internet, then he/she hasn't yet learned what it really means to write a true and "boring" CRUD app.


And the n+1 things that HAVE to be done against the spec. Or that for some region some field has to be hidden - but only for that region, not any other.


Some folks grew up without practicing code - they did business / operational / technical things for 20+ years. For someone like that, ChatGPT is manna from heaven.

I point ChatGPT at the general problem and ask about the tech. Check its answers, drill into specifics. Ask more questions, get it to write a prototype (obviously with different parameters to prod - I don't trust OpenAI not to harvest from my interactions). OK, now I have a proof of concept; if it does the thing I want to do, I go to the engineers and say - hey, I want to do "X", here is the RFC and rough code... any problems? If not, how long to push a localised version?

I guess you might call this scripting or prototyping, not "real coding", but damn, it's useful not to have to fill my head with Python etc. Or to bore/waste a guy earning 150k for half a day plus just to get the basics, and then never get my prototype... because priorities, resources, etc.


‘Understanding the business logic and getting it out of the customer’ is precisely what a lot of programmers are bad at doing. Many would rather talk to a compiler than a human being. For them, ChatGPT is a real existential threat.


These statements can definitely be simultaneously true:

* ChatGPT is revolutionary - honestly, it's genuinely impressive how much of a leap ChatGPT is compared to the attempts that came before it.

* Programmers write a lot of simple code that has been written before - there are genuinely tons of cases of "write a web endpoint that takes an ID, looks it up in a database table, pulls an object through an ORM, and returns a JSON serialization of it" (a minimal sketch of exactly that follows at the end of this comment). Most programmers? Doubt it. But tons of programmers write CRUD stuff and tons of IT admins do light scripting, and a lot of it is repeated code.

Could ChatGPT do my job? Not even close. But it's still really impressive to me.
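For what it's worth, that ID-lookup endpoint really is this small - a minimal Flask/SQLAlchemy sketch (the "Widget" model and the sqlite file are invented for illustration):

    from flask import Flask, abort, jsonify
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///widgets.db"  # throwaway local DB
    db = SQLAlchemy(app)

    class Widget(db.Model):  # hypothetical table, purely for illustration
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.String(80))

    with app.app_context():
        db.create_all()  # create the table so the sketch actually runs

    @app.route("/widgets/<int:widget_id>")
    def get_widget(widget_id):
        widget = db.session.get(Widget, widget_id)  # look the ID up through the ORM
        if widget is None:
            abort(404)
        return jsonify(id=widget.id, name=widget.name)  # JSON-serialize and return

Which is exactly the kind of thing that has been written thousands of times before, and that a model trained on public code reproduces easily.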


I feel safe too, and I'm amused at the dynamic. GPT could do a lot of the things I do, but it would take someone who knows what I know in order to explain the task in sufficient detail for it to do that.


I started learning how to code about 6 months ago, mostly to build prototypes of a couple of app ideas. I have no intention of getting a coding job - if the prototype is successful, I'll seek a technical co-founder.

Last couple of months, I've been using ChatGPT to write a lot of features and functions. I don't think it has made me a better coder, but it has made me massively more productive. Things like scraping data from a URL - something I would have had to sit through an hour-long tutorial to learn - are accessible with a single query.
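Something in this direction, say - a generic sketch with a placeholder URL and selector, not the actual site I scraped:

    # Minimal scraping sketch of the kind ChatGPT produces on request.
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get("https://example.com", timeout=10)
    resp.raise_for_status()  # fail loudly on HTTP errors

    soup = BeautifulSoup(resp.text, "html.parser")
    titles = [h.get_text(strip=True) for h in soup.select("h2")]  # pull the text of every <h2>
    print(titles)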

I also think that the code quality has improved over the last few iterations. It makes fewer mistakes now.


This is exactly what GPT is great for, accelerating learning.


I predict there will be an explosion of productivity and small scale entrepreneurship. These tools are going to give so many smart but technically unskilled people a path towards realizing their ideas and vision.

Exciting times


Eh, it's still impressive that these systems can write such good code despite pretty much just predicting the next word. I guess it's a matter of perspective. You can either be astounded by how much it can do relative to your expectations from 2018 or you can be skeptical relative to the volume of excitement. I think some excitement is also people extrapolating to the future: if predicting the next word gets you this far, what happens when you actually try to make it good?


> what happens when you actually try to make it good?

1. Do you think that companies paying millions of dollars to ML researchers aren't already trying to make it good?

2. I think it will take a real revolution in AI/ML to do what people here are extrapolating into the future. That revolution will eventually come, but I doubt it'll be as quick as people think. Just think about the excitement people had about Siri 10+ years ago, or about Full Self Driving 5+ years ago. In my opinion, in 5-10 years from now GPT will be in the same place where Siri and Full Self Driving currently are. Eventually we will make the leap we're dreaming of, but that leap isn't happening yet.


> despite pretty much just predicting the next word

Is GPT really like the old school NLP trigram stuff?


"predicting the next word" sounds trivial until you realize the value and complexity of "predicting the next word[s] by a world leading expert in a particular domain".


Quoting famous people sounds smart until you realize they just memorized a ton of trivia. These models have demonstrated that they don't learn logical models, instead they learn to generate text that looks logical at first glance but is nonsense.


I asked GPT-4 something fairly niche that I happen to know a fair amount about: to explain the concept of Xenon poisoning in a nuclear reactor. Other than skipping Te-135 being the initial fission product that starts the decay chain (and tbf, operationally it can be skipped since the half-life is 19 seconds), it got everything correct.

I'm sure if I kept probing on smaller and smaller details it would eventually fail, but I'd argue for _most_ people, on _most_ subjects, it performs incredibly well.


Ok, but it has all this information already, so why is this surprising?

A real test of its abilities would be synthesizing a NEW type of nuclear reactor.


I couldn't get ChatGPT to do Wordle. I don't have access to GPT-4. See if it can do that.


This post feels like the people that go into linux forums and say linux sucks because I can't get it to do X but microsoft can, but then get 400 replies and several that show <how to do thing>

GPT has limited reasoning, but given enough knowledge of the problem you can coerce it into doing surprising things, so long as you can relate it to something else in the knowledge base. Given how big that knowledge base is, you can get lucky surprises where things just work if you fish around enough.


the big thing to me is programming isn't hard

I've worked on adtech, crypto, fintech, gamedev, startup founder, BigCo. Not once was programming something that was a time sink.

Makes me feel like GPT is marketing to the incompetent or something.


> Makes me feel like GPT is marketing to the incompetent or something.

Absolutely. The common constant I can see in people who are really blown away by GPT's performance at [task] is that they are bad at [task].

Programmers who describe their job as copying from StackOverflow think it's great at coding. People who don't read fiction think it's great at writing fiction, and so on.


That's not accurate at all for me. I'm not impressed with the output when rating it against humans who are good at said task. It's obviously not as good as those who are good in that relevant field.

But it's certainly far better than humans who are not skilled at those tasks, and that is what I find to be very impressive. I just didn't realise that these models could be this good, and they're not even as good as they will be.

I guess if you were expecting something that's going to be as good as those who are skilled in a particular field, you'll be unimpressed -- but I wasn't even expecting mediocrity.


I think we're talking at cross-purposes here. I'm not underwhelmed by it being not as good as good humans at [task]; that's not a surprise.

I'm saying that most people I see who are impressed by it for any given task are people who are not well-equipped to judge performance on that task.


I'm impressed by the talking dog because it's a dog that can talk, not because I think it's a great orator.


I trust that you're familiar enough with dogs to tell barking apart from talking.


Many people are blown away by GPT for the leverage it provides, not because it makes the impossible possible.

Programming is generally not hard. Producing 5x - 10x as much programming output at the same quality is quite hard. And keeping the most interesting 10% for yourself while offloading the 90% that is basically just typing? That’s what people are excited about.


Exactly - we know how much code gets written per programmer-hour, and it's in the single digits on average. Speeding that up won't make a big difference.


Single digits what? What are you measuring here? Files, lines, KLOCs?


> Not once was programming something that was a time sink.

Can you please expound on that? Do you mean not hard as in comparison to something else?


I think all it shows is that GP has never worked on complicated problems that require anything more than gluing libraries together.


I have built plenty of complicated things that were much more than glue. Even for those, coding was far from a bottleneck. Because coding is easy.

In fact, all the complicated software anecdotes I could give were things that ChatGPT wouldn't even touch. In the realm of design and scaling and fault tolerance and other such things.


If you do nontrivial coding then GPT won't help you, so that isn't relevant to the discussion.


I disagree. GP, by their statement and appeal "I've worked on adtech, crypto, fintech, gamedev, startup founder, ...", implies that he has worked on complicated software issues, and that he thinks GPT is a promising replacement for developers working on those problems.


> implies that he has worked on complicated software issues

No it doesn't, there is a lot of simple "gluing APIs together" to do in all of those. The hard part then is figuring out what you want to do, not getting the code to do it.


Opposite.

I did a startup for years and the backend was all boring Golang talking to Postgres. I am confident that if GPT had produced identical code it would've been worse. Because shit hit the fan many times due to misunderstandings or other things that cause bugs. Because I wrote the bugs, I was able to fix them. Making coding more of a blackbox and treating coding as menial labor would have definitely caused us to lose customers and miss important deadlines.

Maybe the next way to move the goalposts is "GPT will debug for you someday and be trained on the entire transitive closure of your dependencies."

That sort of thing could actually be useful as a sort of hint, but it isn't replacing devs any more than ripgrep did. To be honest, ripgrep through the source of my deps is already highly efficient and points me to things often.


> you can coerce it to do surprising things

Yeah, I've gotten it to write some pretty decent contracts, but only because I have written said contracts and can ask it all the right questions/prod it into adding what I need.


And the next iteration of ChatGPT will likely generate a better contract because your interaction with it will be used as training data


It will only be a better contract if it better meets the actual real-world requirement, rather than the requirement it was literally given. That means inferring information that isn't there. Now, that is possible for common situations, but not reliably, even for a human expert. The way a human lawyer does it is by interrogating the client, asking about likely additional requirements, and iterating until they agree on a final requirement. Current LLM architectures aren't capable of this, and it's hard to see how they could be re-architected to do it, because it's a very different kind of task - but if that is ever achieved, there may be no limit to what such a system can do.


Huh. There’s something scary about that in my lizard brain. Not sure what.


Its like feeding a baby polar bear. When it grows up it is going to eat you.


Extending this analogy into a question: Could the polar bear community eat enough expert food providers that their quality of food and eventually overall health declines?


Will GPTs eat themselves, or more correctly, will they dilute their training corpuses until there are no gains to make? Seems possible.

Like how we've relied on fossil fuels, which are a limited resource; to get further we have to go beyond them. It seems like we're in the process of locking in our highest attainable level of text- or image-based advancement.

We mine the expertise in industries (programming, music production, graphic design, ...) and simultaneously cause the replacement of many future experts. Thus leading to a lack of expert output that we can mine?

Now, currently, the problem seems to be "we're putting people out of work" -- but that's progress. The real problem is that we are unevenly distributing the benefits.


I think most people see where the puck is going, and even where it is right now is very impressive. It's not hard to see that it will likely be less than 5 years before it will be able to do what you did on day 2, and much more, at a tiny fraction of the cost, with no downtime, no attitude problems, sick days, etc. The things you mentioned (taking on board feedback, testing across devices, iterating on solutions) don't seem very far away at all.

The rate of increase in capabilities is also unpredictable, which is what is amazing & terrifying.


> I think most people see where the puck is going

Overinflated hype about “where the puck is going” being wrong is…not a new phenomenon. And the non-tech media (traditional and social, and particularly a whole lot of the elite commentariat that spans both more than the “hard” news side of traditional media, though that is influenced too) perspective on this is driven quite disproportionately by the marketing message of the narrow set of people with the most financial stake in promoting the hype. Even the cautionary notes being sounded there are exactly the ones that are being used by the same people to promote narrow control.


You know the shoeshine boy indicator story [1]? That's the vibe I've been getting with ChatGPT lately: I have multiple old high school teachers with zero computational background, who hardly ever spoke about ML before, now posting regularly on LinkedIn about ChatGPT over the last couple of months.

The part that gets me is they aren't just aimlessly making posts, they're getting involved with webinars targeted at educators in their respective topics, speaking at regional events about the future with ChatGPT, etc.

One of them was a really excellent teacher but like this dude absolutely does not have any qualifications to be speaking on ChatGPT, and I hope it hasn't actually changed his teaching style too much because I'm having trouble imagining how ChatGPT could've fit in well with the way he used to teach.

Don't get me wrong, hype often surrounds something legitimately good, but I think ChatGPT is being taken way out of context in both the ML achievement it is and in what things the tool is actually useful for. I guess it is easy for a layperson to make mistaken assumptions about where the puck is going when they see what ChatGPT can do.

[1] "The story took place in 1929: Joseph Patrick Kennedy Sr., JFK's father, claimed that he knew it was time to get out of the stock market when he got investment tips from a shoeshine boy."


I can see why educators are getting involved. They are seeing it change how their students do work in real time.

I grew up when using a pocket calculator was verboten. You just did not do it. You'd better learn and memorize all of that stuff. Spin on 10 years after me and all kids have them now. If you have the right app on your phone, the thing will OCR the problem, auto-solve it, and show you the steps. ChatGPT and its ilk are here, now. How we learn and create things has dramatically changed in under a year. It is not clear how much, though.

Teachers are trying to get a handle on what it means to teach if you can just ask some device to summarize something as complex as the interactions of the 6 major countries in WWII and what caused it. Then the thing thinks for 2 seconds and spits out a 1000-word essay on exactly that. And depending on which one you use, it will footnote it all and everything.

This style of learning is going to take some getting used to. This tool is in their classrooms right now. Right or wrong. It will be there. The teachers are going to have to figure out what this means to their class planning. Not 5 years from now, today.


Yeah there is a broader issue here for sure. I don't think that means within a few weeks of the launch of the pocket calculator there should have been dozens of random talks from people that have no understanding of what a pocket calculator does trying to pitch a curriculum based around it.

These posts ooze hype bullshit, not nuanced talk about pros and cons and how it will affect education. ChatGPT should be thought of as a writing tool and perhaps an information source on very surface level topics. Trying to teach an advanced research course at a top school with ChatGPT heavily involved is a terrible idea on the other hand.

I have tested asking it about a number of specific topics in biology research/asking questions about particular papers, and it gives horrible answers the majority of the time. If someone submitted that as a paper to me I'd give it a bad grade because it is dumb, I wouldn't need to know if it were ChatGPT or not. I would be alarmed if my kid's teacher went from organically teaching how to dissect a scientific paper to suggesting that a major part of the curriculum can be replaced with talking to GPT.

I've seen articles about teachers taking the other extreme against ChatGPT too, but I haven't personally seen anything that was a realistic take on what LLMs can do. Maybe it boils down again to disagreement on "where the puck is going" but to me most of the hype is making ridiculous assumptions about what is imminent meanwhile ignoring the things worth discussing now.

Which sounds a lot like bubbles. The dot com crash didn't mean the internet was a bad idea or that it wasn't worth discussing at that time.


The thing was, when pocket calculators came out they were expensive to get ahold of - like 500-1000 bucks for one (in 1970s terms that is a lot). This sort of LLM is effectively free. All you need is an account and the cell phone you already have. This is disrupting classrooms right now. Everyone I talk to who has a kid says their kids are using it for coursework. I get why teachers are going 'oh no...' - right now it is exploding on them. They have not had time to digest it. Unfortunately, at this point ChatGPT is like a reporter summing up information it has no real understanding of and putting it into nice prose that looks good. So while in your area of expertise you can spot the issues, flip the page to a different subject and it will be tough to spot the errors - simply because you do not know, and you attribute expertise to it that it does not really have. This LLM is that, but turned up to 11.

"where the puck is going" is something to consider and we will 100% get that wrong, but right now this thing is creating waves and making issues now.


Teaching is going to be a career affected early by chatgpt because students will use it (are using it) to do assignments. A good teacher doesn't need to know much about machine learning to think about how that will affect his job, he just needs to know his students and have an idea how to work with them given the new tools they have.


Yes for generic 9th grade English or poorly taught science classes it is easy to just use ChatGPT. His class would've been pretty immune to GPT plagiarized assignments though, assuming a serious grader were evaluating them, which one used to at least. Yes GPT could've been used to assist in the writing itself, but that wouldn't disrupt the educational goals achieved by developing the content, which required synthesizing multiple modern scientific papers in a coherent way.

In any event, he isn't concerned at all about students using it for assignments, it's the degree to which he seems to think it can be integrated in his current curriculum that alarms me. I think he is misunderstanding the capabilities and therefore pitching something that doesn't actually make sense.

The other teacher I didn't follow as closely, I just got a kick out of seeing she was now advertising some sort of webinar about ChatGPT filled with buzzwords too.

Also if a teacher is really serious about wanting to plan around what GPT can and can't do to the point they want to be a teaching authority on it, they should be consulting with people in the relevant domain. When I want to use a new tool in the lab I talk with colleagues that have a great deal of relevant experience before I start working it into experiments. I can't imagine starting to give advice to others on the matter when I don't actually know the current nuances.


> Teaching is going to be a career affected early by chatgpt because students will use it (are using it) to do assignments.

Already happening. A student in Russia submitted his thesis written with the help of GPT, it was accepted, then rejected, then accepted: https://www.themoscowtimes.com/2023/02/02/russian-student-al...


Soon, a lot of people will realize that the puck doesn't travel in a straight line. Eventually, it will veer to the right in a logarithmic fashion. When it does is up for debate, but - in my experience - it always happens very short of where the hype claims it was heading.


The puck can also hit you in the face and break your jaw.


Certainly a likelihood for everyone too lazy to write their own code.


Personally I think it's 99% hype. The current iteration and architecture of these systems means they will never be at the level where they can actually replace a programmer. The best they will ever get is barfing up snippets for a programmer (who still needs the industry-specific knowledge to VERIFY the snippets).

Additionally, "the rate of increase in capabilities" is very much a false flag. Past performance (especially for second-order things like 'rate of improvement') is an absolute batshit insane metric for predicting future success.

https://russellinvestments.com/us/blog/past-performance-no-g...

The advancements have eaten low hanging fruit. Once all the low hanging fruit is gone, we'll all realize GPT will never be tall enough to reach the good stuff.


But obviously, increasing average and senior programmer productivity by many multiples implies many fewer jobs for programmers. That's the point we're interested in.


You seem to be saying that there is fixed demand for programmers and oversupply means less hiring.

But the history of technology says that demand is recursive; the more tech produced, the more demand there is for producers.

There may be a time when we hit the old “the whole world only needs 5 computers”[0] limit, but I don’t think we’re anywhere close. AI is providing leverage to create net more programming; it is not replacing programmers in a zero sum game.

0. https://www.theguardian.com/technology/2008/feb/21/computing...


Sounds a bit too much like a perpetual motion machine to me.


I’ve got a computer in all of my light switches. There’s a limit to this, but I’m not quite sure where it is.


Isn’t that true of any economy? Think of how many people are employed in general. How could this number of people have been employed 200 years ago?


Is that obvious? The history of programming has been full of things that have at least claimed to increase programmer productivity, everything from high level languages to smart refactoring IDEs, debuggers, PaaS, etc and in all that time the trend has been towards more jobs for programmers not fewer.


Productivity with modern tools (SPAs, microservices) is often a lot lower than with oldschool stuff (Django/Rails, maybe even Visual Basic or Delphi).


https://en.wikipedia.org/wiki/Lump_of_labour_fallacy

> In economics, the lump of labour fallacy is the misconception that there is a fixed amount of work—a lump of labour—to be done within an economy which can be distributed to create more or fewer jobs.


The lump of labor fallacy says nothing about a particular type of job. A category of jobs can become non-productive versus a technology and the workers within those jobs should get reallocated to more productive jobs.


The fact that we’re even having this conversation should concern you. Move the goalposts if you want, but the rate of progress should concern you.


That's why I said it was unpredictable, in both directions.


That's what they said about Watson and Wolfram Alpha. You only hear this nonsense on HN.


Who? Crying wolf makes sense to discredit an individual but not a whole argument. Almost everything is being predicted by someone at any given time. If your criteria for predictions is “no one can have made a similar prediction and been wrong in the past”, then you’ll disbelieve all predictions.

And seriously, how many people were actually saying “WolframAlpha will destroy a huge amount of programming jobs”?


Except Watson and Alpha weren't useful out of the box to everyone. They weren't the fastest-growing app of all time and couldn't even remotely claim to be. ChatGPT isn't just hype, because it is so useful - it's already on the "Plateau of Productivity".

But it looks like an exponentially steep climb from here - indefinitely.


They said the same thing about transistor scaling


Given that you and I are talking to each other by forming thoughts in meat that is causing other meat to move around, I'd say digital thought still has a lot of scaling room ahead of it.

Maybe superintelligence isn't possible for some reason, but the fact that we exist should tell you that general intelligence is not magic.


Your brain is analog. Trying to emulate this with a digital computer is bound to come with some computational cost.


The world created by human thought has given us such excess means of power production that we have literal gigawatts of electricity we can use in a trade off where it doesn't take us 4 billion years to make efficient circuits.


Good thing we aren’t trying, then. GPT works in a completely different fashion.


I think everywhere except HN the expectations for Watson's and Wolfram's second seasons were even more inflated.


Demonstrably untrue - it's taken up semi-residence in everything from op-ed pages to boardrooms since the New Year. Welcome to the mid-singularity, my friend.


You can’t say “singularity” on HN without a ton of downvotes. It’s not because people disagree (or they’d say so), but because they literally shit their pants when they think about it. Hard to blame them.


I'm saving my money so I can live off savings long enough to learn a new profession. I'm not sure if that will happen, but I consider this option likely enough that it affects my life decisions. The world is changing at an alarming rate.

Though I think it's more likely that I'll become a shaman who talks to the spirits. Managers hate computers and they'd prefer to pay someone to deal with them.


Who's not disagreeing? Singularity feels like "man facing existential dread confuses saturation curve for exponential".


When I say singularity, I mean “changes the nature of human life irreversibly and to a greater degree than any prior technology through a rapid progression of self-reinforcing growth.”

That rapid progression can be modeled with an exponential growth curve, like Moore’s law, even though nothing material can have a sustained exponential growth. Regardless of how the curve is modeled, the steepness is such that it serves as a step function in evolution.

With that clarified, there is first the question of “is the technological progression going to level off soon?” And I think the answer is no. The second question is, “how are we going to deal?” And the answer to that question can only be addressed once we answer the question: “what kind of world do we want to live in?”


I think you just declared radio, television, the internet, the macaroon, and mobile phones to be singularities.

People generally use the term to mean the point at which progress is automatic, unbounded, and instant. E.g. AIs that can produce smarter AIs themselves, in short order.


'step change' sounds less exciting than 'singularity' but in terms of the original definition there's very little distinction.


Hmm. I see the singularity as a recursive step change; the last step where the average slope becomes vertical.

I’m not a singularity believer (singularitist?); from the perspective of 1000 years ago we’re already at the point of infinite progress in zero time. I think individuals, the culture, and the species adapt to accelerating progress, and I intuit, maybe wrongly, that Gödel‘s theorem means that technology will never become runaway and recursive.

But I think that’s what people mean, more than just a step/paradigm change like the internet.


> The things you mentioned (taking on board feedback, testing across devices, iterating on solutions) don't seem very far away at all.

This is a very bold claim IMO.

Modeling/understanding interactions in a complex system of potential black boxes is a much, much more computationally difficult problem than source-code-to-source-code operations.


I think we're seeing the early phases of a prediction I made in my first book come true: that computers will be more suitable than humans for most of the functions humans currently use their left brain half for.

Best case, that will have a whole lot more humans using their right brain halves on things like defining the problem. I like the thought of that; it's more pleasant work. But a lot of intelligent people define their intelligence by how well their left brain half works, and are uncomfortable with how good ChatGPT is at those tasks. I think you're pointing out there's more to programming than left-brain activities, and I think you're right that silicon will never eclipse carbon at those challenges, but I think a lot of people are also feeling threatened by the fact that ChatGPT is getting better and better at the thing they used to be better than all humans at.


I feel like there's a slight inaccuracy here that this article covers. GPT-like models excel at logical problems that have already been solved. As the article suggests, the models (at least at the moment) are extremely keen to reuse existing solutions as opposed to inventing new ones. Those existing solutions become far too distracting once problem complexity compounds beyond the common case. This implies that those truly talented at solving logical problems will still be spearheading the development of novel solutions.

We might be able to say that GPT will easily trim away the average workloads for both the left and the right. It can perform "creative writing" or draw pictures to an average or even above-average standard, but it still struggles to hit the exceptional examples that humanity is capable of.


> But a lot of intelligent people define their intelligence by how well their left brain half works and uncomfortable with how good Chatgpt is at those tasks.

I think 'left brain' work also has a lot more predictability (knowing you can churn out ~X widgets/hr) so having only uncertain 'right brain' work can be uncomfortable for people to build their livelihoods upon.

That being said, 'right brain' work is certainly more fulfilling for me.


> can be uncomfortable for people to build their livelihoods upon

Absolutely. Not everyone can be consistently creative, and without a large enough financial buffer, it's a recipe for disaster.


Great point. That's one reason I'm increasingly a single-issue voter focused on UBI.


> I really am in awe of how much work people seem willing to do to justify this as revolutionary and programmers as infantile, and also why they do that. It's fascinating.

Equally fascinating are all of the "this is fine" posts from programmers suddenly realizing they are not the gods they once thought they were.

But fret not, programming is not the first industry that has been automated into a shell of itself. Yes, the industry is going to shrink massively, but this is what new skills are for. Just as farmers had to learn industrial jobs and then miners and autoworkers had to "learn to code", most programmers will have to learn to do something else. Humans are resilient and will adapt.

And there will still be jobs in development for the most talented and in niche areas, but when the largest tech companies can layoff hundreds of thousands of employees without skipping a beat that should tell you all you need to know about the value of most "programming" jobs.


"Yes, the industry is going to shrink massively"

You meant "expand massively" i think. Did all the programmers manually making machine code get fired and the job of programmer disappear when compilers were invented and totally replaced these jobs? No, it just changed to use the new tool.

There won't be any unprecedented mass layoffs, despite what the jaded, STEM-hating Twitter crowd wants. Companies will simply make their programmers use these tools to increase output per employee, and to make software that would otherwise have been financially impossible to build. Because the competition will do the same to get ahead.


  > You meant "expand massively" i think. 
Nope, I meant what I said.

> There wont be any unprecedented mass layoffs

Baghdad Bob strikes again.


Then you're wrong. There is no market incentive for layoffs due to GPT-like technology, as I have demonstrated above. Similar breakthroughs "replacing jobs" have happened before in the field of software engineering; this is nothing new or unprecedented. It's merely another tool that will come into widespread use to increase production.

What I think will lead to mass layoffs is rather the current recession.


> this is nothing new or unprecedented

This is both new and unprecedented.


Huh? I literally just gave you an example of tech that automated away what 90% of programmers were doing.


> Just as farmers had to learn industrial jobs and then miners and autoworkers had to "learn to code"

The transition from agrarian to industrial societies was extremely painful and arguably it was centuries before the people affected were better off.

> Humans are resilient and will adapt.

Based on recent events I think it's more likely people will elect extremist politicians who promise quick fixes while blaming a convenient out-group for everything.


> most programmers will have to learn to do something else. Humans are resilient and will adapt.

Like what? Seriously, which cognitive occupation is safe then? I think if one wants to stop competing with the machines (who appear very close to becoming superior to humans by what you are saying), it's some kind of child care / social work job. We still don't want robots in those (for now. Eventually they may do those better than us as well).


The problem is that many people here have an extreme point of view. It's either "this is going to make all developers jobless" or "it's useless, my job isn't that".

I think it'll help with some tasks, which is always worth taking. After all, people tweak their vim settings because they feel it makes them more productive.


Software development seems safe for the time being, but as someone who has both paid professional translators and used ChatGPT for translation, I'm certain GPT is obliterating some jobs.


This tech is so powerful, cutting so close to the last of human output that remains uncopied, and advancing so fast, that I have an incredibly hard time imagining it ending up in a place that deserves a muted reaction. It seems like it will either dramatically change society or fail. And I have quite a hard time imagining failure.

It seems on par with the Industrial Revolution, at least. Which, easy to forget, was a huge deal that totally changed society.


I feel it's the average programmer who is enamoured with GPT. The programmers working on hard stuff know that AI writing programs is still far away.


It’s also programmers who work with average and bad programmers and finally have a way to not waste time with them anymore. I work on the hard stuff, gpt generate the rest (I don’t have to waste time and energy to try to pull dead horses) and my colleagues go do something else. It’s a reality today and will improve further. Sure it cannot do what I do, but that might improve; gpt4 is noticeably better than 3.5 at slightly harder things, especially if you work properly iterative with it and having it fix it’s own errors. But no, it cannot do hard things it never saw; however it can give me all the boilerplate around the hard things so I only have to do the hard things. My days are simply 3-5 in 1 now, as I also don’t have the overhead of explaining jira tasks to others who are never going to really grasp what they are doing anyway.

So I am not average and I am enamoured with GPT, simply because it presents high value to me now - more than some actual real humans do. For me that's revolutionary enough.


It is not just those who are working on hard problems who are of that opinion.

Good luck having an ML model understand a 20-year-old undocumented data format developed in-house at a specific research lab, used in their proprietary systems which are also undocumented and are part of a spiderweb of interconnected systems at that lab (I have worked at this particular lab).

It will be a long time (if ever) until an ML model is able to handle these situations (and I hazard a guess that most of the world's active code is something akin to this).

As a supporting tool for the software engineers working there, sure. Just like a linter.


Have you tried giving it a skeleton example of said undocumented data format? I've created some arbitrary ones and asked it to parse them, and it succeeded.


> It will be a long time (if ever) until an ML model is able to handle these situations (and I hazard a guess that most of the world's active code is something akin to this).

ChatGPT can do that right now. Just provide it example data in the input and it can extrapolate the format and interact with that abstraction (i.e. describe the format, write code against it, etc). LLMs don't just remix data they have seen before; they perform in-context learning. This means abstracting out patterns in the input and then leveraging them when generating output.
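A sketch of what that looks like in practice - the pipe-delimited records here are invented, and it assumes the 2023-era openai Python client; the point is just that a couple of examples in the prompt is enough context:

    import openai  # assumes OPENAI_API_KEY is set

    # A few made-up lines standing in for the "undocumented" in-house format.
    sample = (
        "HDR|LAB7|2003-04-01\n"
        "REC|42|12.7|OK\n"
        "REC|43|13.1|WARN\n"
    )

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Here are a few records in an internal format:\n" + sample +
                       "\nDescribe the format and write a Python parser returning a list of dicts.",
        }],
    )
    print(response["choices"][0]["message"]["content"])  # inferred format description + parser draft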


Very far, imo. I mean, it delivers what it is: an average of much of the content out there, and it is good at that, yes. So in a way, maybe a better future Stack Overflow with a nicer interface (though what could it do if there weren't a Stack Overflow to start with?).

But on the other hand, in new, uncharted territory it sometimes fails on the simplest things. I recently asked it how to do one thing with enlighten (something I knew was possible with tqdm, but was almost sure was not possible with enlighten). It just hallucinated parameters to functions that don't exist. Several rounds followed about where it had that from, whether it was a different version. I even asked it for the reference it claimed to have it from... and it confidently cited a readthedocs URL mixing tqdm and enlighten that didn't exist. It is hilarious how it can confidently tell you one bullshit answer after the next. The dialogue is always "hey, are you really sure about xxx, did you look it up?" "Yeees, very certain, I did!" "But this doesn't exist." "Oooh, I'm very sorry, you are correct and I am wrong, the next bullshit answer is: ..."

The history disappeared (I hope I get it back at some point), but the dialogue up until it finally admitted "No, it may not be possible with this library" was amazing. I'm really scared for our future building on top of that, and for what will happen if everything from business presentations to lawyer letter exchanges is built on top of it.


OK, most of us are average by definition - what's your point again? And I think that if it can replace my average ass, it will sooner than you imagine be able to solve Linux kernel bugs too. I just don't see a huge difference in the computation required between CRUD and kernel development: it's just statistics for the machine.


> I really am in awe of how much work people seem willing to do to justify this as revolutionary and programmers as infantile [...]

Well it is revolutionary. And it isn't just where it is today, but how fast these models are improving - with no saturation in ability evident at this time.

On the other hand, I am not sure anyone is saying programmers are infantile - although poorly written software is at least as prevalent as poorly compensated or managed software development positions.


It's a similar waste of my time to respond to comments like this... but I will at least try to give some pointers so that those who encounter this comment won't immediately fall into that line of thinking:

- a lot of programmers, including experienced ones, are absolutely infantile and they only have a job because there is a big shortage of programmers; not all of them get better with experience... hence a significant part of software development is dealing with problematic programmers and problems created by them.

- GPT is not that great a programmer but a great thing about it is that it is not a human... and one can get thousands of instances of them for the price of one human. You only need one of those instances to produce usable code.

- there have been many changes throughout the years which have definitely replaced a lot of programmers: library distribution services (pypi, npmjs), better software development tools and practices, SaaS delivery model, better programming languages etc.; so far, because the market need for programmers has continued to increase, most programmers continue to have jobs; this won't last forever.


The reason is easy to imagine. Most non-programmers are living like illiterates in a world where reading is a valuable superpower. They grudgingly accept this power asymmetry, but occasionally rebel - with "easier" visual programming languages and Excel.

This is another one of those rebellions: non-programmers hoping to avoid reading the book and to close it for good, while keeping the awesome around. The codebases we will see, where the commits are basically ChatGPT tags and tasks for each document.


I think this clip section says a lot - if videos are permitted as replies.

https://youtube.com/watch?v=MNtjSMvcjzs&t=211s


cope


So, for a bit of fun, I signed up for the GPT-4 thingy (Plus), picked a fairly common web application, and built it from scratch, only by talking to GPT-4 and copy-pasting the code bits.

I'm actually taken aback by how well it's doing, including providing me with refreshers on stuff I'd forgotten how it should work.

I can see it failing at solving complex problems, but like the blog post mentions, most programming isn't new or hard problems.

This is particularly powerful when you're producing something you've done before, but in a completely different language/stack. You just guide GPT-4 towards the goal, you roughly know the methods needed to get to the end goal and just watch your assistant do all the dirty work.

Looking back, I came from a world of floppy disks; I left them behind for zip disks and CDs, then portable disks and cloud storage. I also came from dialup Internet, I left it behind for ADSL then fibre. I feel this is a tangential point here too, where AI, whatever it ends up being called, will become a fulltime assistant making our lives easier; so that we can focus on the hard parts and the creative problem solving. What are we leaving behind? For me, mostly Stack Overflow and Google.

You'd be silly to ignore it and palm it off. It's a big deal.


I don't really have anyone to ask the questions about building software that come up sometimes, and ChatGPT has been helping fill the gaps. Basically I'm thinking of it as a combination of a rubber duck and a dialogue-enabled Google search. It's been really helpful along those lines when, for example, I'm not sure of a good way to change a bunch of stuff across a bunch of files, and am pretty sure it's something that can be automated somehow, and ChatGPT will be like "have you considered using one tool to get the file names that need changing, then another tool to create an AST of the files, and then another tool to actually modify that AST?" And I'm like, oh duh, yeah, I didn't know there are tools like that but I should have assumed there are, nice.
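For what it's worth, that three-step idea maps onto plain stdlib Python roughly like this (a sketch: renaming calls to a made-up old_helper() is just a stand-in task, and rewriting via ast drops comments/formatting, which a real codemod tool would preserve):

    import ast
    import pathlib

    class RenameCall(ast.NodeTransformer):
        def visit_Call(self, node):
            self.generic_visit(node)
            if isinstance(node.func, ast.Name) and node.func.id == "old_helper":
                node.func.id = "new_helper"  # rewrite the call site in the tree
            return node

    for path in pathlib.Path("src").rglob("*.py"):   # step 1: collect the files
        tree = ast.parse(path.read_text())           # step 2: parse each file into an AST
        new_tree = RenameCall().visit(tree)          # step 3: transform the AST
        path.write_text(ast.unparse(new_tree))       # step 4: write the changed source back (Python 3.9+)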

Basically that's how all my usage has gone. I've had it write some elisp and it has been ok, sometimes it invents made-up functions (that don't exist in org-mode for example) but I'll just tell it that a function doesn't exist and it'll come up with some other solution, until I get it to a point where all I need to do is change a couple things.

I remain highly skeptical the thing will replace me anytime soon (ever in my lifetime?) but I'm surprised at the possibilities of making my life less tedious.


Great example. The way I've described it to people is it's perfect for when you get the feeling you're trying to fit a square peg in a round hole. You ask ChatGPT, and it's great at pointing out that they actually make round pegs, and here's how you make sure the hole is really lined up nice.


Yes, thanks, that's exactly how I feel as well.


> silly to ignore it and palm it off. It's a big deal.

Agree, this is a big deal, and has the capacity to revolutionize all the techniques we have been using up to now for compiling, summarizing and reframing existing knowledge as expressed in writing (including code).

Not only does Google get (well-deserved) competition, it means pressure on all the businesses that now make a living in that space. In a few years it will even have a serious impact on major societal institutions like schools and universities.

A lot, if not all, of the kickback from established institutions will be attempts to smear the competition and, by all means, to carve out new niches where GPT-X is not applicable or not as efficient.

There are valid concerns about the veracity of the information it provides, which means there are limits to the extent it can be used in automated processes, but I'd be loath to trust the data unconditionally anyway. As for it not being able to think creatively: good on us. But that's likely just temporary.


assistants and wizards have been tried before with varying levels of success

clippy tanked because it annoyed more than it helped, although some people did like it

install wizards did their job in a world where a single binary format and OS dominated and stuff ran offline pretty much exclusively, with the odd connection to networks - those installers sorted a laundry list of situations, both underlying situations and user configurations and choices, and for the most part they worked

Siri, Cortana, Alexa etc have been working as expert systems with central curated bases and some AI/ML on top, for a lot of people they've been working quite well - for me personally they've sucked, they've totally failed to answer my questions the few times I've tried them, and they've creeped the hell out of me (they are a lot more centred on extracting my info and selling me stuff than understanding stuff)

generative ML is orders of magnitude more sophisticated, but so are our needs and our computing from a global perspective, it does make sense that those assistants, pilots, etc start taking off

but the incentive issues of the previous generation assistants and recommendation algorithms remains there and I wonder how will that turn out - if they start demanding access to my phone, my email, my contacts etc I will do my best to avoid them and to poison any info I have to give them


I think there’s augmenting programmers (which I think will happen) or replacing programmers (which I think will not happen soon). It’s a capable and improving tool that humanity will figure out how to saturate like we do with everything else.


My experience is exactly the same. We are truly going through a singularity moment right now.

The difference with the examples you gave (floppy disks, etc.) is the speed at which it happened.

There was Jan'23, and there was March'23.


Yea my question now is can you figure out how to automate/abstract your process of guiding GPT?


Programming takes little effort. It's at most 20% of my effort. Feels like GPT is programming with extra steps.

Maybe it's for people who can never think programming is easy. Clearly there's a lot of such types. Explains a lot.


/r/iamverysmart

I'm reminded of the old "Handyman's Invoice" trope. Actually implementing a solution is not the hard part. What _is_ hard is determining what the solution is in the first place.

https://www.snopes.com/fact-check/know-where-man/

Once you have a rough idea of the solution, sure maybe GPT-4 can barf snippets to get you there. But it's lightyears away from translating business problems into actionable solutions.


> But it's lightyears away from translating business problems into actionable solutions.

Is it though? Have you tried feeding it business problems and working through to possible solution paths? Have you roped in additional external information (via agents, tools, vector search, etc.) during that process? I don't see why the model wouldn't be able to translate a lot of business problems into solution paths.


What I've seen in this thread so far is some people saying "this is really useful in these particular cases" and some others, like yourself, saying "I'm too smart for this, maybe it's useful for inexperienced/incompetent people".


I don't even think it's useful for the incompetent. I do think the incompetent think it's useful tho.


That's like saying a calculator isn't useful just because you're good at mental math. The ability to adapt and extract value from new tools is a mark of competence.


Using natural language to code isn't going to help me. It's less dense informationally already. I don't have a stream of natural language in my head when I program or build software, so it's not free to use GPT.

This may be a Haskell thing. But I did try and even for small programs it was cute but not actually an efficiency gain over me just doing it.

Not to mention in real life, I have to deal with fallout and bugs and I think GPT deprives me of key processes that allow me to be excellent in that regard. If you aren't a coder and code a backend with GPT, what do you do when the shit hits the fan?


I mostly work in Clojure and I've noticed this as well, it is usually more efficient to just write it myself if I know what I want to do. However for a language or stack I don't understand well where I just need something quick and dirty it really shines... I had to work with a JS visualization library and got it to write what I needed over 10-15 iterations, much faster than if I actually had to learn the library. General programming knowledge sufficed for noticing things that didn't look right and since the output was visual that helped to sanity check the output as well.


Imagine a calculator that crowdsources its answers from the populace: how useful is that tool to the incompetent? How easily would the incompetent be able to tell it's bad?

That's what we're talking about: an autocomplete machine that has been trained on a million blog posts that contain some code, which may be correct, incorrect, secure, insecure, outdated or up to date, and the machine can't tell the difference! It only knows what is and is not likely! So the more widely the wrong answer was replicated on the web, the more likely that's what you'll get!


I think programming is very easy most of the time; most of the time I spend is just typing/moving code around. Figuring out the solution is the easy part for me; it's only the computer<>brain coordination that slows me down.

But there are things that are harder for me, or more complex maybe. I struggle with math, and always had, so anything involving heavy math or algorithms is harder for me (I'm a hacker, not a computer scientist, if you will).

For these things, I found GPT4 to be very helpful. I can write what I want, get a WIP version back, work out some kinks with it myself and ask it to rewrite it if it's not perfect, until I have a perfect version. Add some unit tests, find more things that are missing/wrong (sometimes), more back and forth.

Before GPT4 I either just tried to work around having to deal with heavy math, or find people IRL that could help me. Now I'm a lot faster even if it involves math, because of GPT4.


So you write the unit tests yourself to confirm the code you say you can't understand is correct? That's an interesting approach. But you'd probably need more tests than usual to gain confidence, so you are losing efficiency there (although the ceiling-raising nature of it is interesting).

What happens in production when there's a bug in the complex code you punted to GPT? How do you debug?


Basically. I'm not sure I'm losing much efficiency; my previous approach was contacting friends who know how to do math in programming, and sometimes it could take days before I could move past the issue. And the result is the same: ending up with a piece of code I mostly understand, but struggled to write myself, so the knowledge is fleeting at best.

Just to be clear, the context here is me writing games for fun, while struggling with the math heavy parts. I would never professionally use GPT4 for anything, and wouldn't push anything to production that I don't 100% understand, that would be failing at my profession and I take bigger pride in my work than that.

But for fucking around with games in my free time, it has made me a lot more efficient at the parts I'm struggling with.


Has using it ended up teaching you anything? Like from reading the produced solutions?


Yes, for sure. Some applications of procedural texture primitives (like Perlin noise) that I wasn't super familiar with, and, for example, quaternions, which I'd never heard of before trying to write a game.


It seems so many of you guys are extremely lucky to be working on novel problems that require elegant new solutions each day. That must be the case, otherwise I don’t understand these comments shrugging off GPT-n’s capabilities around coding.

“Psh, it’s just doing stuff it saw from its training data. It’s not thinking. It can’t make anything new.”

In my 11 years as a professional software engineer (that is, being paid by companies to write software), I don’t think I’ve once had to come up with a truly original solution to any problem.

It’s CRUD; or it’s an API mapping some input data to a desired output; or it’s configuring some infra and then integrating different systems. It’s debugging given some exception message within a given context; or it’s taking some flow diagram and converting it to working code.

These are all things I do most days (and get paid quite well to do it).

And GPT-4 is able to do all of that quite well. Even likely the flow diagrams, given its multimodal abilities (sure, the image analysis might be subpar right now, but what about in a few years?)

I’m not acutely worried by any means, as much of the output from the current LLMs is dependent on the quality of prompts you give it. And my prompts really only work well because I have deeper knowledge of what I need, what language to use, and how to describe my problem.

But good god the scoffing (maybe it’s hopium?) is getting ridiculous.


To be honest, I am using ChatGPT and now GPT4 as tools to speed up my workflow. Writing tests, writing boilerplate, parsing logic for deeply nested and messy JSON.

Most of the time it gets things quite right, and if you provide context in the form of other source code, it's really good, even at using classes or functions that you provide and that are hence novel to it.

The hard logic bits (something elegant, maintainable, ...) are still up to you, imho.


No need to be sarcastic when ChatGPT-4 has issues using APIs, I mean software library interfaces


As a human programmer I didn't quite understand the problem statement until I read the whole article and the tests.

I believe the goal is to find a path with the fewest possible "fire" cells and the minimum cost as a tie breaker. The cost of a path is the sum of its cells' cost and it can't be greater than 5.

If I understood the assignment correctly, I don't think the problem statement is equivalent to what's included in the prompt. Specifically, the prompt doesn't clarify what happens if you have to cross through multiple "fire" cells.

> Fire tiles cost 1 point to move through, but they should avoid pathing through them even if it means taking a longer path to their destination (provided the path is still within their limited movement range)


The problem is, indeed, that Mr. Glaiel did not know the category of problem he was dealing with.

A correct statement would be: "Given a solution set containing both the shortest path through fire and the shortest path avoiding fire, select the solution that fits within six tiles of movement, preferring the solution that avoids fire where possible."

It's a constraint optimization problem in disguise: generate a solution set, then filter and rank the set to return a canonical result. That describes most of the interesting problems in gameplay code: collision and physics can use that framing, and so can most things called "AI". They just all have been optimized to the point of obscuring the general case, so when a gamedev first encounters each they seem like unrelated things.

The specific reason why it seems confusing in this case is because while pathfinding algorithms are also a form of constraint optimization, they address the problem with iterative node exploration rather than brute forcing all solutions. And you can, if you are really enterprising, devise a way of beefing up A* to first explore one solution, then backtracking to try the other. And it might be a bit faster, but you are really working for the paycheck that day when the obvious thing is to run the basic A* algorithm twice with different configuration steps. You explore some redundant nodes, but you do it with less code.


Can you not do A* where it normally costs one point per tile, 10 points for a fire tile but 1000000 points for > 6 tiles, so you never explore the > 6 options unless you have run out of shorter routes?


If the parent's description of the problem is correct, then imagine this:

You have a solution of length 6, no fire; Solution of length 4, one fire; Ok sure you prefer the no fire one.

the score for the paths is 6 and 13, respectively

Your algorithm works! But as soon as the fire-free solution has length 10 (or any length bigger than the limit you want), A* still prefers it over the solution through fire - but the answer must be less than or equal to six steps

You can modify to make fire cost 1.1. Now if you find a solution of length 6, you know it must be correct (it minimized the number of fire squares and ended up with a length six solve). But if it's not length 6, you need to increase the cost of fire and run again if there was any fire in your solution.


At a glance, it might be OK that way, and I would give it the gold star. It's just unintuitive to make the leap to "apply a cost to the total length of the path" as a way of expressing preferences among distinct categories of path.

Implementation of the categories as completely independent paths falls out of the clarified problem definition directly. It's really in nailing the specification that the problem is hard, i.e., even with GPT we're still programming.


> And it might be a bit faster, but you are really working for the paycheck that day when the obvious thing is to run the basic A* algorithm twice with different configuration steps.

Pretty much this. Attempt to find a path to the target destination with a first A* run that excludes the fire tiles, and if that fails due to limited movement, then do a second run that includes them. I like that this mirrors the decision making a human would follow, too: I won't cross the fire tile unless I'm absolutely required to.
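
A minimal sketch of that two-pass idea, using plain Dijkstra rather than A* to keep it short; the grid representation and field names here are assumptions for illustration:

  import heapq

  def cheapest_path_cost(grid, start, goal, budget, allow_fire):
      # Cheapest move cost from start to goal within budget, or None.
      # grid[y][x] is assumed to look like {"cost": 1, "fire": False, "wall": False}.
      dist = {start: 0}
      pq = [(0, start)]
      while pq:
          d, (x, y) = heapq.heappop(pq)
          if (x, y) == goal:
              return d
          if d > dist[(x, y)]:
              continue                                      # stale queue entry
          for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
              if not (0 <= ny < len(grid) and 0 <= nx < len(grid[0])):
                  continue
              cell = grid[ny][nx]
              if cell["wall"] or (cell["fire"] and not allow_fire):
                  continue
              nd = d + cell["cost"]
              if nd <= budget and nd < dist.get((nx, ny), float("inf")):
                  dist[(nx, ny)] = nd
                  heapq.heappush(pq, (nd, (nx, ny)))
      return None

  def route_cost(grid, start, goal, budget):
      # pass 1: fire tiles are off limits; pass 2: only if pass 1 finds nothing
      cost = cheapest_path_cost(grid, start, goal, budget, allow_fire=False)
      if cost is None:
          cost = cheapest_path_cost(grid, start, goal, budget, allow_fire=True)
      return cost

Note this treats fire as all-or-nothing: as noted elsewhere in the thread, when fire is unavoidable you still want the path with the fewest fire tiles, which this two-pass version doesn't guarantee.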


Yeah, I'm not really that familiar with pathfinding, but my naive take is that you actually want two scores to rank on rather than making everything implicit in a single cost factor. You have a movement cost and a preference cost. The path needs to fit within the movement budget, but you want to optimize on the preference score.


By the time you've formulated the problem as "Give me the shortest route with a cost of 5 or lower that doesn't go through fire, and if that doesn't exist, the shortest route with a cost of 5 or lower that goes through fire", you've basically formulated the algorithm as well.

That's also precisely where one of the programmer's greatest challenges lies, to carefully translate and delineate the problem. I agree it's a bit steep to ask the GPT to come up with a precise solution to an imprecise question, but it's also fair to say that that's basically what the job of a programmer entails, and if you can't do that you're not really able to program.


that's also not a correct formulation of the problem, as it needs to minimize the number of fire tiles it passes through, which is where a lot of the complication comes from.


Solve once for each number of fire tiles n.


The difficult parts and time-consuming parts are not the same.

Since I have experience in both programming and the domain of my tasks, formulating the steps that need to be done for some task is very quick, and they are "good" steps that avoid various potential pitfalls - but then I need half a week to actually make and debug them; so if some tool (or a junior developer) can do the latter part, that's a big benefit.


Wouldn't a better formulation be "Give me the shortest route with a cost of 5 or lower that passes through the minimum number of fire tiles necessary"? Your formulation doesn't care about the number of fire tiles in the case where there is no fire-free solution.


I agree. That description of the problem was horrible. Maybe ChatGPT could write a better description and then people could code up the algorithm.


Since this is the comment thread talking about the algorithm I'm gonna add my 2 cents here:

Here's the problem statement as far as I see it: Each tile has a number of move points to spend to go through it (1 for regular and 2 for water). Each tile also has a cost associated with it. Given a max number of move points find the lowest cost path between two tiles or return none if no such path exists.

I'm gonna say this is still modified Dijkstra with a small twist. The fire has cost 1, the other tiles have cost 0. However, instead of pathing on a 2D grid (x, y) we path on a 3D grid (x, y, moves). All "goal" tiles within (goal_x, goal_y, moves < total_move_points) have a 0-cost edge which brings them to the true goal node. The implementation difference is that the get-neighbors function queries neighbors in later grid layers (x+..., y+..., moves + 1 or 2).
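
A rough Python sketch of that layered search (the grid layout and field names are assumptions, and it returns the (fires, moves) cost pair rather than the actual path to stay short):

  import heapq

  def min_fire_path(grid, start, goal, max_moves):
      # Dijkstra over (x, y, moves_spent) states: minimise fire tiles entered
      # first, total movement second.  grid[y][x] is assumed to look like
      # {"move_cost": 1, "fire": False, "wall": False} (water would have move_cost 2).
      sx, sy = start
      best = {(sx, sy, 0): 0}                      # state -> fewest fires seen so far
      pq = [(0, 0, sx, sy)]                        # (fires, moves_spent, x, y)
      while pq:
          fires, spent, x, y = heapq.heappop(pq)
          if (x, y) == goal:
              return fires, spent                  # fewest-fire path within the budget
          if fires > best.get((x, y, spent), float("inf")):
              continue                             # stale queue entry
          for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
              if not (0 <= ny < len(grid) and 0 <= nx < len(grid[0])):
                  continue
              cell = grid[ny][nx]
              if cell["wall"]:
                  continue
              nspent = spent + cell["move_cost"]
              if nspent > max_moves:
                  continue                         # out of movement range
              nfires = fires + (1 if cell["fire"] else 0)
              if nfires < best.get((nx, ny, nspent), float("inf")):
                  best[(nx, ny, nspent)] = nfires
                  heapq.heappush(pq, (nfires, nspent, nx, ny))
      return None                                  # goal unreachable within max_moves

Because the movement already spent is part of the state, the same tile can be revisited with a different amount of budget left, which is what lets the movement cap coexist with minimising fire.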


Note that water tiles have cost 2, so the tile-crossing limit cannot be expressed simply as a maximum total cost.

Looking at the two examples in the paragraph after "And there’s a lot of complication to it beyond the simple cases too", I can't figure out how the movement value is defined, as I can only see 10 and 8 moves respectively, not the 14 and 10 movement value claimed in the following text (and only one water tile on each path.)


it had 4 movement as its base, the numbers over its head were a bonus movement status effect (used for testing this stuff)


Remember that these models generate one token at a time. They do not "think ahead" much more than maybe a few tokens in beam search. So if the problem requires search - actual comparison of approaches, and going back-and-forth between draft and thinking through the implications - the model can't do it (except in a limited sense, if you prompt it to give its "train of thought"). So it's comparable to you being in front of a whiteboard, hit with a question, and you would have to start answering immediately without thinking more than you can while talking through your answer at the same time. Doable if you know the material well. If it's a new problem, that approach is doomed. Given that, I think the language models do remarkably well. A little bit of search, and maybe trying to generate the answer in a different order (like short draft -> more detailed draft -> more detailed draft... etc.) will improve things a lot.


I just used GPT-4 yesterday to write a Go-parser for a specific JSON input.

Within two prompts it could read the JSON data from a stdin stream, unmarshal it to Go structs and print the correct fields to stdout as a human-readable line of text.

Then I told it to colour the timestamp and id fields using the fatih/color -package, and it did it correctly.

In total it took me about 4-5 prompts to get where I wanted. I just needed to fine-tune the printing to stdout part a bit to get it just how I liked, but it saved me a ton of boring template code writing and iteration.

I could've done it easily myself, but there were a few fiddly bits that would've required me to look up the documentation to check the exact way to do things. GPT4 had it correct from the start.

Then I asked it to write unit tests for the code, and it confidently started writing correct-looking code that would take the same input and expect the correct output, but just stopped in the middle. Three times. I stopped trying.

And another case:

I tried to use GPT-3.5 to write me a program that would live-tail JSON-logs from Sumo Logic and pretty-print them to stdout. It confidently typed out completely correct code with API endpoints and all. ...except the endpoints didn't exist anymore, Sumo Logic in their great wisdom had removed them completely. The only solution is to use their 5 year old binary-only livetail executable.

GPT4 with the same input gave me a shell-script that starts a search job with the correct parameters and polls the endpoint that returns the result when it's done.

The speed at which this is developing is really fascinating, I'm not really afraid for my job but I do love how this will automate (some of) the boring stuff away a bit like GitHub CoPilot did, but better.


>Then I asked it to write unit tests for the code, and it confidently started writing correct-looking code that would take the same input and expect the correct output, but just stopped in the middle.

One of two things. First, ask it to continue. Sometimes it just stops halfway through code for whatever reason.

The other possibility is you filled up the token context window. Not much you can do but wait for the 32k model.


I asked it to continue twice after the first failure. Every time it failed in about the same point. Might've filled up some mysterious limit in the model.

I didn't really need the unit tests anyway, but I wanted to try if it could do it :)


You can also ask it to "continue at line XYZ" and it will not give you the whole thing again, only the new continuation.


"Please continue without explanation." Usually works for me.


> After going in circles a few more times, I decided that was it. It got close. It seemed to understand the problem, but it could not actually properly solve it.

I had this same loop issue with ChatGPT. I had something I wanted to do with asyncio in Python. That's not something I work with much, so I thought I'd see if ChatGPT could help me out. It was actually good at getting me up to speed on asyncio and which parts of the library to look at to solve my problem. It got pretty close, but it can't seem to solve edge cases at all. I got into this loop where I asked it to make a change and the code it output contained an error. I asked it to fix the error, so it gave me a slightly modified version of the code prior to the change. So I asked it to make the change again and the code it spit out gave the same error again. I went through this loop a few times before I gave up.

Overall, it's cool to see the progress, but from what I can tell GPT-4 suffers from all the same issues ChatGPT did. I think we're probably missing some fundamental advance, and just continuing to scale the models isn't going to get us where we want to go.

My biggest concern with the current batch of LLMs is that we're in for Stackoverflow driven development on steroids. There's going to be a ton of code out there copy and pasted from LLMs with subtle or not so subtle bugs that we're going to have to spend a ton of time fixing.


This here is my fear too. I have a buddy right now who is getting his degree in CS, and he is using ChatGPT for a lot of his assignments.

I worry that the next generation of developers are going to grow up just figuring out how to "program GPT", and when they have an error, rather than investigating it (which they can't, because they aren't actually familiar with the code in the first place), they'll simply tell GPT about the error they are having and tell it to spit out more code to fix that error, slapping more mud on the ball.

Eventually these systems grow larger and larger at a faster and faster pace, no one understands what they are actually doing, and they become so complex that no single human could ever actually understand them. Imagine if every codebase in the world was like the Oracle DB codebase.

In this future a programmer stops being a professional who works to create and understand things; instead they become a priest of the "Machine Spirit", and soon we are all running around in red robes chanting prayers to the Omnissiah in an effort to appease the machine spirit.


I mean, we already kinda do that with Stack Overflow and cargo cult programming.


I was just complaining to my friend about how much trouble I'm having with it. I purchased the $20 GPT-Plus so I could use GPT-4 after reading someone on HN say that GPT-4 is "scary impressive" at writing code.

I have two tasks I wanted it to try, both making use of public APIs, starting from scratch. In short, it was frustrating as hell. Never-ending import problems -- I'd tell it the error, it'd give me a different way to import, only leading to a new import problem. I think I used up all my 100 queries in 4 hours of GPT-4 just on the import/library problem.

Then there was constant misuse of functions -- ones that didn't exist, or didn't exist in the object it was using but did exist in some other object instead, at which point it would apologize and fix it (why didn't you give me the correct one the first time, if you "know" the right one?)

The actual code it wrote seemed fine, but not what I'd call "scary impressive." It also kept writing the same code in many different styles, which is kind of neat, but I found one style I particularly liked and I don't know how to tell it to use that style.

Lastly, it's only trained up to Sep 2021, so all the APIs it knew were well behind. I did manage to tell it to use an updated version, and it seemed to oblige, but I don't really know if it's using it or not -- I still continued to have all the above problems with it using the updated API version.

Anyway, I hope MS fiddles with it and incorporates it into Visual Studio Code in some clever way. For now, I'll continue to play with it, but I don't expect great things.


I think the current train of thought is "keep increasing the size of the language model and you don't need to worry about integrating with LSPs".

Perhaps there is some merit to this. If the language model is large enough to contain the entirety of the documentation and the LSP itself, then why bother integrating with the LSP? _Especially_ if you can just paste the entirety of your codebase into the LLM.


> If the language model is large enough to contain the entirety of the documentation and the LSP itself, then why bother integrating with the LSP?

If your goal is to get a response to an LSP query, why on earth would you use an LLM trained on data where >99.9999% of that data has nothing to do with answering an LSP query?

Why would I switch out an LSP server of 100% accuracy for an LLM that’s slower and has lower accuracy?


> I'd tell it the error, it'd give me a different way to import, only leading to a new import problem

Its dataset is thousands of blog posts and Stack Overflow questions about this very thing; of course the autocomplete engine is going to predict the next response to be "another way of doing x".


I had a similar experience earlier. Described a problem that isn't even that hard - very similar to something there are probably lots of examples of online, but subtly different. I wanted to see if it handled these subtly different requirements.

It failed miserably, even with repeated instructions. It just assumed I wanted the more common problem. Every time I pointed out the problem it would say "sorry for the confusion, I've fixed it now" and give me back identical code. I even asked it to talk me through test cases. It identified that its own code didn't pass the test cases but then still gave me back identical code.

I eventually gave up.


I’ve found persistence is not a good strategy with GPT. Put effort into your prompt, maybe try clarifying once, and if it doesn’t work, do not keep trying. It will get closer to the solution at a diminishing rate, just enough to tease you along, never getting there.


It has failed every meaningful programming challenge I've given it (to be fair, I only ask when I've got something difficult in front of me).

I do wonder if part of it is that my prompts are made worse because I have a partial solution in mind.


Here's my perspective on this as an architect: most construction details have been done before, so surely they could easily be reproduced by an AI? There are usually just a few things that are different from the last time I drew it. A few 3D interactions with other components that need to be reasoned about. They are not that complicated individually.

And yet I see this problem even with old-fashioned automation to save time, let alone AI. I find that if you haven't drawn the 2D section through all the different edge cases of a particular thing you are trying to design, you haven't done the analysis and you don't really understand what's happening. I've made mistakes where I've been working in 3D on something complicated and I've had to hide some element to be able to view what I'm working on, only to find later that when I turn everything on again I've created a clash or something impossible to build. That's why we still do 2D drawings as well as building 3D models: they are an analysis tool that we've developed for solving these problems, and we need to do the analysis, which is to draw section cuts through things. After all, if models were such a good way to describe buildings, why weren't we just building physical scale models and giving them to the builders 100 years ago? It's because you can't see the build-up of the layers and you can't reason about them.

Reading this article I get the same sense about software engineering: if you haven't solved the problem, you don't really understand the code the AI is generating, and so you don't really know if it is going to do what you've tried to describe in your prompt. You still have to read the code it's generated and understand what it is doing to be able to tell if it is going to do what you expect.


> Reading this article I get the same sense about software engineering: if you haven't solved the problem, you don't really understand the code the AI is generating, and so you don't really know if it is going to do what you've tried to describe in your prompt. You still have to read the code it's generated and understand what it is doing to be able to tell if it is going to do what you expect.

Yes, this is pretty much exactly the way I've been using GPT and it works tremendously well. (GPT4 works especially well for this style of programming.) My prompts include things like:

- "read the function below and explain in detail what each section does" -- this prompts GPT to explain the code in its own terms, which then fills in its context with relevant "understanding" of the problem. I then use the vocabulary GPT uses in its explanation when I ask it to make further changes. This makes it much more likely to give me what I want.

- "I see this error message, what is the cause? It appears to be caused by $cause" -- if I'm able to diagnose the problem myself, I often include this in my prompt, so that its diagnosis is guided in the right direction.

- "this function is too complex, break it up into smaller functions, each with a clear purpose", or "this function has too many arguments, can you suggest ways the code could be refactored to reduce the number of arguments?" -- if you go through several rounds of changes with GPT, you can get quite convoluted code, but it's able to do some refactoring if prompted. (It turned out to be easier to do large-scale refactoring myself.)

- "write unit tests for these functions" -- this worked phenomenally well, GPT4 was able to come up with some genuinely useful unit tests. It also helped walk me through setting up mocks and stubs in Ruby's minitest library, which I wasn't experienced with.

In brief, if you expect to just give GPT a prompt and have it build the whole app for you, you either get lame results or derivative results. If you're willing to put the effort in, really think about the code you're writing, really think about the code GPT writes, guide GPT in the right direction, make sure you stay on top of code quality, etc, etc, GPT really is an outstanding tool.

In certain areas it easily made me 2x, 10x, or even 100x more productive (the 100x is in areas where I'd spend hours struggling with Google or Stack Overflow to solve some obscure issue). It's hard to say how much it globally increases my productivity, since it depends entirely on what I'm working on, but applied skilfully to the right problems it's an incredible tool. It's like a flexible, adaptive, powered exoskeleton that lets me scramble up rocky slopes, climb up walls, leap over chasms, and otherwise do things far more smoothly and effectively.

The key is you have to know what you're doing, you have to know how to prompt GPT intelligently, and you have to be willing to put in maximal effort to solve problems. If you do, GPT is an insane force multiplier. I sound like I work in OpenAI's marketing department, but I love this tool so much :)


So it continues to reaffirm what we’ve known: a generative LLM does not have a model of the world, cannot reason and cannot plan. It generates text by mixing and matching remembered texts, and thus it cannot generate truly new content.

No surprise, because GPT-4 is built upon the same model as GPT-3. Clever engineering will bring us far, but a breakthrough requires a change of the fundamentals.

Nevertheless, it’s useful and can help us solve problems when we guide it and split the work into many smaller subunits.


I copied the opinion of Yann LeCun, one of the authorities on deep learning:

(Feb 13, 2023)

My unwavering opinion on current (auto-regressive) LLMs:

1. They are useful as writing aids.

2. They are "reactive" & don't plan nor reason.

3. They make stuff up or retrieve stuff approximately.

4. That can be mitigated but not fixed by human feedback.

5. Better systems will come.

6. Current LLMs should be used as writing aids, not much more.

7. Marrying them with tools such as search engines is highly non trivial.

8. There will be better systems that are factual, non toxic, and controllable. They just won't be auto-regressive LLMs.

9. I have been consistent with the above while defending Galactica as a scientific writing aid.

10. Warning folks that AR-LLMs make stuff up and should not be used to get factual advice.

11. Warning that only a small superficial portion of human knowledge can ever be captured by LLMs.

12. Being clear that better system will be appearing, but they will be based on different principles. They will not be auto-regressive LLMs.

13. Why do LLMs appear much better at generating code than generating general text? Because, unlike the real world, the universe that a program manipulates (the state of the variables) is limited, discrete, deterministic, and fully observable. The real world is none of that.

14. Unlike what the most acerbic critics of Galactica have claimed - LLMs are being used as writing aids. - They will not destroy the fabric of society by causing the mindless masses to believe their made-up nonsense. - People will use them for what they are helpful with.


Fantastic comment, saving this. It's clear that for AI to be AI, it needs what philosophers of language call a Knowledge Base and a few intrinsic axiomatic presuppositions that, no matter what happens, cannot be broken (kind of like the Pauli exclusion principle in real life).


I guess my genuine question is to the people who are saying this is a big deal and it will take our jobs. I am a bit lucky in that at the moment I am working in a "novel" field. Let's say, though, for the sake of argument that AI does come for SWE jobs. To be honest? I don't know what to do in that case; I have no backup plan, and not enough to retire. The country I've lived in for 9 years is still through a work visa (hopefully at least that changes soon). I am just comfortable enough with my salary. If all that is pulled from under me, I lose my job tomorrow, I lose my profession, my visa, my home. I honestly would like to ask the people who say this is coming for us soon: well OK, but what is your advice for someone like me? It's true society doesn't owe me anything, nobody does. So maybe the answer is just that some of us will be dropped by the wayside. That's what happened before. Just curious what anyone's advice would be, assuming they are right and it does take our jobs.


Check out my comments higher up in the thread (eg https://news.ycombinator.com/item?id=35197613), I really do believe that GPT4+ will be primarily useful as augmenters for capable and dedicated engineers, rather than replacements. It's like a very eager and brilliant junior dev that can solve many problems that you throw at it, but still needs hand-holding, error-checking, and someone who knows how to piece the actual system together.


Thanks for the reply! I don’t know what the future holds. The country I live in will even train more programmers on the job if need be. It is kind of nice in that sense. If what you say is right, then for the most part everyone is happy. The problem is, if we assume the worst-case scenario, that it does take all our jobs, well then what? To people with those opinions, my question is: what do I do? What does anyone do?


What about the junior devs of today?


If they're willing to learn, GPT is a very powerful tool.

https://www.youtube.com/watch?v=VznoKyh6AXs ["How to learn to code FAST using ChatGPT (it's a game changer seriously)"]


We all need to save up money and think of a plan B. If there is no problem, worst case you'll have a bunch of money saved and a plan B you won't be using.


That’s not really an option for a lot of people outside of the US. The average SWE here makes 50K USD a year. Good luck saving up a safety net that large if AI is ready to take all our jobs. That’s kind of my point. Nobody has a good answer to the question: if AI does take SWE jobs, what then?


It really depends where you live, I agree. If you live in Europe there are usually generous unemployment benefits for quite a while, and the government will probably pay for you to learn a new skill. That's not a great scenario, but it's much better than if you live in India.


As a designer/non-coder, it feels like I'm just pair programming all the time.

Stuff that usually took me a long time like regexes or Excel/Sheets formulas now take like two minutes. AND I'm learning how they work in the process. I can actually write regexes now that used to be wildly confusing to me a couple of months ago, because Copilot / ChatGPT is walking through the process, making mistakes, and me prodding it along.

I feel like it doesn't matter how "mindblowing" or "a big deal" this tool is — it's a great learning tool for me and helps me do my work 100x faster.


I don't feel like it's any faster for things I'm not really already familiar with. For instance I asked it to write me a Makefile. It wrote one. It looked plausible but I don't know enough about Make to know. So I had to do loads of reading about Make just to verify that the AI answer was correct. Basically the same amount as just learning anyway.


Yep: the biggest remaining weakness is that it's incapable of thinking deeply and iteratively. This is an architectural limitation (lack of reflectivity), but to fix it will probably usher in the singularity, so maybe we should be glad for it.

I suspect if you poked GPT-4 just right (starting with a detailed design/analysis phase?) it could find a rhetorical path through the problem that resulted in a correct algorithm on the other end. The challenge is that it can't find a path like that on its own.

Op: Can you get it to write your algorithm for this problem if you describe it in detail, as-is?

I suspect the difficulty here is just finding a Socratic path to that description, which would tend to be rare in the training material. Most online material explains what and how, not why; more importantly, it doesn't tend to explain why first.


I have not tried, but I suspect if I described the algorithm I have instead of the problem, it could translate the algorithm into code pretty well. But I'm also unsure of that; some experiments with GPT 3.5 I did would definitely cause it to default to a common solution (e.g., A*) if the description was sufficiently similar to A*, or not realize that a small deviation was intended. But also like... the point here was to see if it could solve a hard problem that has a non-obvious solution, not if it can translate an English description of an algorithm into code.


I'm not an AI as far as I know, but I would try a classic programming competition technique for this and observe that 6 isn't a very big number.

Step 0: Let's try to find a path without walking through fire. Run Dijkstra's or A* to find the shortest path with no fire, up to distance 6. If it succeeds, that's the answer.

Step 1: Okay, that didn't work. We need to go through at least 1 fire tile. Maybe we can do at most 1. Define distances to be a tuple (fire, cost) where fire is the number of fire tiles used and cost is the cost. Comparison works the obvious way, and Dijkstra's algorithm and A* work fine with distances like this. Look for a solution with cost at most (1, 6). Implemented straightforwardly, this will likely explore the whole grid (which may be fine), but I'm pretty sure that the search could be pruned when the distance hits values like (0, 7) since any path of cost (0, 7) cannot possibly be a prefix of a (1, c) path for any c<=6. If this succeeds, then return the path -- we already know there is no path of cost (0, c) for c <= 6, so a path of cost (1, c) for minimal c must be the right answer.

Step f: We know we need to go through at least f fire tiles. If f > 6, then just fail -- no path exists. Otherwise solve it like step 1 but for costs up to (f, 6). Prune paths with cost (f', c') with c' > 6.

This will have complexity 6*D where D is the cost of Dijkstra's or A* or whatever the underlying search is. Without pruning, D will be the cost of search with no length limit but, with pruning, D is nicely bounded (by the number of tiles with Manhattan distance 6 from the origin, times a small constant).

For a large level and much larger values of 6, this could be nasty and might get as large as t^2 * polylog(t) where t is the number of tiles. Fortunately, f is upper-bounded by 6 and doesn't actually get that large.


To be fair, in 15 years writing code, I have spent the vast majority of my time working on minor variations of problems already solved.

I am not a fancy developer coming up with new algorithms. I make sign up flows, on-boarding flows, paginated lists, integrations to other apis.

And I definitely feel that my job might be threatened by LLMs.


I think our jobs are threatened not because the LLMs will be much better than us, but because dirt cheap tooling will be developed on top of them that will make things which are “good enough” and are a fraction of the price.

I think outstanding software will still require well-paid, competent people orchestrating and developing a lot of complex systems for a while yet… But there’s a ton of bad software out there that will be able to be maintained for far less, and I suspect a lot of companies will be drawn to creating cookie cutter products generated by LLMs.

Just as people have turned to stores and blogs generated on templated systems, I think all of that and more will continue but with even more of it handled by LLM-based tooling.

I don’t think it’ll be next week, but I suspect it’ll be less than 10 years.

Some people expect that’ll lead to more software existing, which will inevitably require more developers to oversee it, but if that’s the case, I suspect they will be paid a lot less. I also expect that once AI tools are sophisticated enough to do this, they will largely make that level of oversight redundant.

Soon they could potentially patch the bugs in the software they generate by watching Sentry or something. Just automatically start trying solutions and running fuzz tests. It would be way cheaper than a human being and it would never need to stop working.


Whenever these kinds of comments are made, the short story “Profession” comes to mind by Isaac Asimov. In this story people’s aptitudes are evaluated and the relevant knowledge and skills are downloaded. The protagonist of the story however keeps being rejected for download and has to struggle to acquire the same skills his peers acquire instantly and magically. It’s a great read with a fantastic ending.

The moral is that it’s always better to have unique, hard-won skill sets that others don’t. Double down on those. Think of LLMs as freeing you to do more interesting high-level tasks. Rather than having to do those menial tasks yourself, what if you focused your creativity on getting the AI to build new types of product or gain new insights that peers aren’t considering? What if you leveraged the AI to build prototypes of ideas you wouldn’t have built otherwise?

Of course that’s easier said than done. For now, take comfort in the fact that no one is seriously trusting this as anything more than a glorified autocomplete (if that).


The unique hard-won skills are what AI can do on the cheap. The more high-level interesting stuff is just creativity, and creativity is a universal human trait. It’s amazing but low value (monetarily).

As you often hear on HN, ideas are a dime a dozen it’s all about execution.

Well we’re rapidly approaching the time when the execution is essentially free, and done faster and better than humans.

A small team of four, over an afternoon, can literally just speak with the computer to generate a new TV ad, or develop a new SaaS product. There is no longer any skill required, just imagination. The problem, of course, being that the skills and specialized knowledge are what people have traditionally been paid for.

With all that “work” out of the way there’s not much value anyone can add. You’re probably not any smarter or more creative than whoever’s manning the machine.


Care to place a wager where I give your mythical unskilled team the best AI available today against me without AI and we’ll see who can solve a difficult engineering problem? Heck, even a moderately skilled team.

Sorry, no. AI is impressive, but I’ve given it very precise prompts where I describe exactly what I want it to do (because I’ve already solved it), and the solution it generates is complete and utter horseshit, because it requires context that’s too difficult to communicate and a deep understanding of the business and the technical aspects of that business. Similarly, novel ideas are not things it knows how to implement.

If you have a counter example I’d love to see it, because my experience seems to line up pretty well with other reporting of where its limits lie - i.e. it can regurgitate solutions to solved problems but struggles to provide novel solutions and correct implementations. In fact, trying to find the problems is itself even harder sometimes, because it is simultaneously not a good coder and the approach it takes isn’t one a human would, and thus it takes extra effort to figure out what path it’s trying to take and where it made a mistake.

As an example, try to get ChatGPT to implement the server side of R2’s ListObjects (or heck - in any language/platform you choose, implement that). It’ll introduce really bad bugs like reading the entire dataset into memory, not applying the delimiter properly in really subtle ways, etc. Basically, it can’t even do a usable first draft. Just don’t use Go, because I suspect it’ll cheat and just regurgitate minio.


I mean, I was pretty clearly talking about the future, and not the present.


To the extent to which anything that makes you take less time doing the specific tasks you are doing today (and thereby, presumably, bill fewer hours or fail to defend such a high headcount on your team) threatens your job, we might also say that better programming languages and tooling threaten your job, better error messages and documentation threaten your job, or higher levels of abstraction and higher quality frameworks threaten your job... were you also fretting about the new version of TypeScript that just came out earlier today, or did you think "wow, that makes me more effective, I can't wait to use it"?

I might go so far as to argue that the entire reason software developers exist is to threaten all jobs, including our own: at our best--when we are willing to put in a bit of thought into what we are doing--we don't just make things easier to do for a moment while we are employed (which is the best of what most professions can achieve): we make things persistently and permanently easier to do again and again... forever; and we don't just make other peoples' jobs easier: this same power we have applies to our own tasks, allowing us to automate and replace ourselves so we can move on to ever more rewarding pursuits.

I'm not a fan of GPT for coding for a number of reasons (at least, in its current form, which is all we can ever have a true opinion about); but, it isn't because it will replace anything I've ever done: it would have just unlocked my ability to work on better things. There are so many things I wish I could get done before I die, and I know I'm going to be able to get to almost none of it... I have so many plans for ways to improve both the world and my life that will never happen as I just don't have the capability and bandwidth to do it all. If I had a God I could ask to do all the things I already do... I can only imagine what I'd do then.


Tbf, most of my time as a programmer was neither spent solving old problems nor solving new problems. Most of it was spent either finding the bug hidden somewhere in a huge code base, or trying to get the business people to be clear on what their requirements actually are.


Your job is already threatened by cheap outsourcing.

However, the risk with cheap outsourcing is exactly the same as with LLMs - you get what you pay for, and you need to constantly check if it's really doing what it's supposed to be doing.


This is the thing. You need to know what you're doing to know whether you got the thing you wanted. Thus it's still a tool, rather than an expert.

By contrast, you don't know what your pilot or your surgeon is up to, you have to trust their decisions.


Yesterday evening I thought it would be fun to try to do a Raycaster in Javascript with the help of GPT-4. Experience was mixed.

1. Basic rendering logic was a breeze. I barely had to change anything, just copy and paste, and I had a map with walls that were darker the further away they were, using textures, and basic movement using the arrow keys. For an inexperienced graphics programmer like me it probably saved hours getting to that point.

2. I asked it to add a minimap. Did not work perfectly at the first try, but after a few minutes of exchanging messages, it worked and looked okay.

3. I asked for an FPS display. Worked on first try.

4. Now I asked for a solution to render walls of different heights. Here I had to correct it a few times, or suggest a different approach, but it got it working halfway correct (but not very performant). Definitely took way longer than steps 1 to 3 combined (30+ minutes).

5. I asked for floor rendering (often called "floorcasting"). Here it completely failed. The code it suggested often looked like it might be the right approach, but never really worked. And the longer we exchanged messages (mostly me giving feedback on whether the code worked or suggesting possible fixes), the more it seemed to hallucinate: very often variables suddenly appeared that were defined nowhere or in a different scope. At that point, it became increasingly frustrating for me, and I often closed the chat and "reset", by posting my complete working code, and again prompting for a solution to the floor rendering. Still, until I went to bed, it did not produce any working solution. In retrospect, it would probably have been faster to read a tutorial on how the floorcasting should work, and implement it myself like a caveman, but that was not what I was aiming for.

It was definitely fun, and I can clearly see the potential time-savings. But maybe I have to learn when to recognize it won't bring me past a certain point, and I will save time and nerves if I switch to "manual control".


I don't think I have ever solved a truly new problem from scratch when programming... It's all been applying algorithm X to problem Y, or CRUD stuff.

The most difficult problem that I have asked GPT-4 to solve was writing a parser for the Azure AD query language in a niche programming language and it did that just fine (I did have to copy paste some docs into the prompt).


Pathfinding with extra constraints isn't "a new problem" either. There are a bunch of papers on the topic, and I'm sure there are multiple different variations on github. It still didn't succeed (did get close though).


Maybe it could have got there with better prompting, maybe not. But by the time GPT-5 or 6 comes around it would be highly likely to be able to solve it perfectly.


A modified A* that solves the fire routing problem (less efficiently than OP's I think).

Each A* location stores where it comes from, how long it takes to get to it, and how many fires it passed through to get there. The algorithm only considers fire cells as neighbors if the current number of fires passed through is less than the current fireWillingness global.

1. count fire tiles within movement range

2. run A* from src to dst completely avoiding fire

3. if we can reach then that's the solution

4. if we can't reach, increase fireWillingness to 1, re-run A* on the board

5. keep increasing fire-willingness until the A* results don't change, or we can now reach the dst.

This works because a low fire path is always better than a high fire path. And increasing fire-tolerance will only shorten the paths from src to dst.


Counterexample:

  ...XX
  SF.FD
  ...XX
  
  S = start
  F = fire
  X = wall
  D = destination
The cat can get to the destination in 6 moves passing through 1 fire. In the fireWillingness=1 pass, the middle tile is reached after passing through fire, so the destination appears unreachable. The proposed algorithm will pass through 2 fires instead of 1.


Haha good counter example. Well played


That algorithm (implemented efficiently) is just A* using a different concept of distance.

The distance specifically would be `fire*epsilon + steps if steps < max else inf`


It doesn't work if you just change the distance. Having implemented similar variations of A* I agree with Tyler. You need to change more than distance to get this to work.

Usually you need to change the search space and increase the number of states you go through to get the algorithm to differentiate between things you want and things you don't want to happen in your final result.


If you look at the Leetcode scores, it looks like GPT-4 can generally do most "basic" leetcode but fails on "medium" or "hard" problems. This seems to align with what I see most people's experience with using GPT-3/3.5/4 to generate code seems to be. Works well for simple cases (which you could probably find examples of online) but stumbles on nuances of incrementally harder problems.


> I think ChatGPT is just kind of bullshitting at this point. It doesn’t have an answer, and cannot think of one, so it’s just making shit up at this point [...] But instead it’s [overconfident] in its own capabilities, and just makes shit up. It’s the same problem it has with plenty of other fields

If anything, the article demonstrates it can write code, but it can't thoroughly reason about problems it hasn't been trained on

So when saying something like "Its possible that similar problems to that have shown up in its training set." as a way to dismiss any scintilla of 'intelligence', how many of these articles reduce to a critique e.g. "Can a Middle Schooler actually understand dynamic programming?"

Like, what is the actual conclusion? That a software model with O(N) parameters isn't as good as a biological model with O(N^N) parameters? That artisans need to understand the limits of their tools?


I've been able to give it arbitrary blocks of code and have it explain how they work.

(Asking this makes GPT more effective when I ask it to make further changes. One reason I do this is when I start a new session with ChatGPT discussing code it helped me write previously, especially if I've gone away and done a big refactoring myself.)

A very simple example is that I asked it to write some Ruby functions that would generate random creature descriptions (e.g., "a ferocious ice dragon", "a mysterious jungle griffin"). It did this by generating three arrays (adjectives, locations, creature types) and randomly selecting from them to build the output string. I then asked it to explain how many different descriptions it could generate, and it explained that multiplying the length of the three arrays would give the number of outputs. (125 for the first iteration, 5x5x5).

I then asked it how it would increase the number of possible outputs to 1000, and it did so by increasing each of the three arrays to length 10. I then asked it how it would generate millions of possible outputs, and it added extra arrays to make the creature descriptions more complicated, increasing the number of permutations of strings.
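
The shape of what it generated was roughly this (a Python rendering for illustration rather than the actual Ruby it gave me, with made-up word lists):

  import random

  adjectives = ["ferocious", "mysterious", "ancient", "tiny", "radiant"]
  locations  = ["ice", "jungle", "desert", "mountain", "swamp"]
  creatures  = ["dragon", "griffin", "golem", "serpent", "phoenix"]

  def describe():
      # one independent pick per list, so 5 * 5 * 5 = 125 possible outputs;
      # growing each list to 10 gives 1000, and adding more lists multiplies further
      return f"a {random.choice(adjectives)} {random.choice(locations)} {random.choice(creatures)}"

  print(describe())  # e.g. "a ferocious ice dragon"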

This is not the most sophisticated example, but it shows what GPT can do when it can combine "knowledge" of different areas.

If it's able to combine the solutions to known problems in a straightforward way, it can accomplish a lot. Beyond a certain point it needs guidance from the user, but if used as a tool to fill in the gaps in your own knowledge, it's enormously powerful. I see it more as an "intelligence-augmenter" than a "human-replacer".

See my comment here where I went into more detail on how I work with GPT: https://news.ycombinator.com/item?id=35197613


Actually, if the state A* searches through is not "tile reached" but "tile reached + count of fires on path", then it just becomes regular A*. This solves the "A to C doesn't always go through B" issue, because it turns B into multiple distinct states, some with fires, one without.

There are a few issues with this. Search state is bigger (performance goes down), might not scale if other search features are needed in the game, you might need to be smart about when you stop the search and how you write your heuristic to not have to reach all combinations of fire counts before you end your search...

But the trick to "just use A*" is not in modifying the cost, but changing the search space.

PS. I see no reason why you should change your current code, obviously.

PPS. I don't think GPT could come up with that insight. It sure didn't in your case.


Did another pass through the article, and checked your code and GPT's code. The fun thing is you DID have similar insights, of changing the search space (including desire and bends in the cell). GPT never bothered to try (at least in the samples you provided).


Hey GitHub / Microsoft / OpenAI

How about this prompt:

I have a web page where customers see their invoice due. When they enter their credit card information, sometimes the page just refreshes and doesn't show any kind of error information whatsoever, but the invoice remains unpaid. This has been going on FOR YEARS NOW. Can you write some code to fix this as we have been busy laying off all the umans.

Oh, or this one:

I have this page called "Pull Request", at the bottom there is a button that says "Comment" and right next to it is a button that says "Close this PR". We probably shouldn't have a button that performs a destructive action immediately next to the most common button on the page. This has also been going on for years, but, you know, no umans.


Personally, I found GPT-4 to be helpful when writing code for games. But I'm a web programmer trying to learn game development, I'm no professional game developer by any measure. And I'm using Rust and Bevy, for what it's worth. So it might not be as helpful for someone like Tyler who actually knows what they're doing, and similarly for me if I were to use it for web development.

The most helpful thing with GPT-4 has been getting help with math-heavy stuff I don't really grok, where I can try to compile the code, get an error, and instruct GPT-4 that the code didn't work, here is the error, please fix it. Another thing it's been helpful for is applying the "Socratic method" to help me understand concepts I don't really grok, like quaternions. Then, knowing GPT-4 isn't perfect, I always verify the information it tells me, but it gives me great starting points for my research.

Here a conversation I had lately with GPT-4 in order to write a function that generates a 2D terrain with Perlin Noise: https://pastebin.com/eDZWyJeL

Summary:

- Write me a 2D terrain generator

- Me reminding GPT-4 it should be 1D instead of 2D (I used the wrong wording, confusing a 1D vector with 2D)

- The code had an issue where it returned only 0.0 values

- GPT-4 helping me track down the issue, where I used the `scale` argument wrong

- Got a working version, but unhappy with unrealistic results, I asked it to modify the function

- Finally got a version I was happy with


I think this article is a great example of the one key shortcoming that Ai based code generation has. Even a seasoned developer will fail to describe the intricate details and context of what they are trying to do. Non developers constantly fall flat on their face on this and rely on devs to “keep the edge cases in mind” etc.


The biggest thing here is that it's semi capable and improving. I feel safe about my job right now but it is worrying to invest time to compete with a machine that will continue to get better over the years where previously I felt safe that the effort of my labour would bear fruit for decades to come. Now I'm not so sure.


This is what most people making "I'm not worried" arguments don't understand. Right now, it makes you way more productive. Even if its capabilities stalled right there, it will over time reduce the value of your labour. But it won't stall.


Same here. It was fun though, but I certainly feel like I'm no longer at the top of the food chain


GPT-3 worked well for me with smaller programming tasks, i.e. helper functions, API calls, etc. In those cases it was easier to just type: "Write a JavaScript function that does X"

It totally failed for me creating a nice looking website using bootstrap. While GPT3 created a workable outline, it never looked right and the css adjustments never worked.


I'm more intrigued why the author finds this a difficult problem for the needs of their game. It looks like their search space is at most a 10 x 10 grid for the most part (I'm assuming, based on asset size and detail, that it doesn't grow too much larger).

I know it isn't relevant to the ChatGPT code-writing discussion, but A*, Dijkstra and heuristics to move an entity around 8 spaces could raise the question "Can the developer be more pragmatic?".


The replies here are defensive and I think misguided. Yes, a programming job doesn’t solely require typing code. But the reason we have well-paid programming jobs is because there is a specialized skillset required to understand a body of syntax that takes several years to really grasp.

The difference is that writing a well-formed prompt is massively easier to teach than writing the code itself, for similar results. That’s not to say prompt writing requires no skill - it will certainly need understanding of systems and the scope of what is possible within a language. Asking GPT-4 to write a jQuery plug-in that generates an original Bob Dylan song will probably just not work.

But it is wildly easier to teach someone what is possible with JavaScript and let them spend a month watching someone prompt the system and let them go from there.


The most challenging part of software development (and the reason we have well-paying jobs) is not understanding syntax, it's analysing and abstracting a problem domain into a set of cleanly separated modules that interact to solve those problems. That being the case then, actually none of us is getting replaced by GPT-n any time soon - 'prompt engineering' will just become the new Javascript, only more abstract; just another tool in the toolbox. Hopefully :-)


Correct. But once everyone has that tool in their toolbox everyone will become more productive, meaning skills scarcity will be greatly reduced. In turn that will lead to massive wage depression.


You mean such as when high level languages became mainstream and we no longer needed to code in assembly language? Or when IDEs became widely available? The underlying design skills are still difficult to acquire and not displaced by new tools - that is at least until SoftwareArchitectGPT comes along...


I taught myself how to code and have never been very good. I don't code often, and when I do I spend a lot of time relearning some simple programming detail I forgot.

ChatGPT (also copilot) allows me to focus on the project that I'm working on and offload the stack overflow searches to prompting. I don't have to find a similar error someone else posted on SO and figure out how it applies to my current problem. I can keep a high level view of the project and not get bogged down with silly bugs, learning new libraries, or deciphering someone else's code.

I imagine there are a lot of people who are in a similar situation, it's crazy that we've just unleashed this massive productivity booster onto millions of people.


> I imagine there are a lot of people who are in a similar situation, it's crazy that we've just unleashed this massive productivity booster onto millions of people.

Maybe it makes me sound like an elitist git, but I remember when the coding bootcamps started "unleashing massive productivity boosts" by promising to make people "full-stack engineers" in just 6 weeks, and I still shudder to remember the code horrors I've seen as a result of that.


To be clear, I don’t think we are getting a bunch of rockstar coders from chatGPT prompt engineering.

It’s more about the CFO who can now write his own SQL queries and dashboards, project managers can dive into details they don’t understand and come to meetings a little more prepared.

This boosts everyone’s productivity, not just coders.


One thought that came to mind when the author mentioned that GPT-4 seemed to want to use A* was that maybe the problem statement was not formed in a way that would get the best response. I'm not entirely sure if this type of back and forth would be useful, but just struck me as interesting to try.

I put the following prompt into ChatGPT (no GPT-4 access for me yet):

*A problem statement is given as follows:

An agent needs to find the best path on a 10x10 tile grid from their current location to a target location. They have a limited movement range of 5 points. Regular tiles cost 1 point to move through, water tiles cost 2 points to move through. Fire tiles cost 1 point to move through, but they should avoid pathing through them even if it means taking a longer path to their destination (provided the path is still within their limited movement range)

Please point out any parts of the problem statement that require clarification.*

Which returned the following response:

Based on the given problem statement, some potential areas that may require clarification are:

Current location and target location: It is not clear if the current location and the target location are fixed or if they change with each problem instance. If they are fixed, it should be mentioned explicitly. If they are not fixed, it should be specified how they are generated for each problem instance.

Movement range: It is not specified whether the agent's movement range is circular or square. If it is circular, it means that the agent can move to any tile that is within a distance of 5 from its current location. If it is square, it means that the agent can move to any tile that is within a range of 5 tiles in both the horizontal and vertical directions. The shape of the movement range needs to be specified.

Fire tiles: The problem statement mentions that the agent should avoid pathing through fire tiles even if it means taking a longer path to their destination, but it is not clear how the agent knows which tiles are fire tiles. It should be specified how the agent perceives and distinguishes fire tiles from regular tiles.

Tile costs: The problem statement mentions that regular tiles cost 1 point and water tiles cost 2 points, but it is not specified what happens if the agent moves diagonally. It should be specified whether diagonal moves are allowed and how much they cost.

Objective function: It is not specified how the "best path" is defined. It should be mentioned whether the objective is to find the shortest path or the path that minimizes some other cost function (e.g., total cost of the path).


Interesting, not a single one of these things it asked for clarifications on are things it actually got wrong in its suggested solution


I've found similar success in using this approach rather than going direct to "write me some code AI-code-monkey". You can also take the following steps (which are the same sorts of things you'd want to do as a programmer writing this code yourself):

- Rewrite the problem as a list of constraints

- Point out any ambiguous or missing constraints

- Write an example that demonstrates each constraint

etc.

Effectively this article is really asking: when posing a novel problem to the LLM, how deep does the internal state go in producing the completions? When it doesn't go deep enough, the trick is to make it do the things that deepen the internal state.

I'd guess that supplying a good system message to GPT-4 (waiting for my access) would help. Something like:

You're an engineer responsible for writing correct code from a specification. Break down the specification into small chunks that can be explained simply. If there are ambiguities, seek clarification. Only write code once all ambiguities are resolved and each part of the problem is described simply.


I really like that approach of asking it what needs to be clarified. That's going straight into my GPT toolbox.


This is perhaps one of the most impressive responses I've read. It truly seems like there is some reasoning happening. I don't understand how this can be the output of a generative LLM.


> It truly seems like there is some reasoning happening. I don't understand how this can be the output of a generative LLM

Right - this seeming "cognition" is exactly what's so spooky about the whole thing.

Here's what spooked me out from yesterday: https://news.ycombinator.com/item?id=35167685 - specifically how it determines the divide-by-zero error in this code: https://whatdoesthiscodedo.com/g/6a8f359

...which demonstrates GPT as being capable of at least C++ "constexpr"-style compile-time computation, which shouldn't even be possible if one presumes GPT is "just" a giant database storing only multidimensional word similarity scores and sequence distribution from text inference.

> a generative LLM

I definitely wanted to believe that GPT was "just predicting the next word" - it was somewhat comforting to think of GPT as still being far from being-human or behaving like a real brain does - but that explanation never sat well with me: it was too simplistic and dismissive, and didn't explain the behaviour I was witnessing in GPT.

...so after having read Wolfram's (surprisingly accessible) article on GPT ( https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-... ) it made a lot of things "click" in my head - and enabled me to start to understand why and how GPT is capable of... the surprising things it does; but it also leads me to believe we're (warning: incoming cliche) barely scratching the surface of what we can do: right-away I do believe we're almost at the point where we could simply ask GPT how to adapt it into some kind of early AGI - and we've all heard of what's supposed to follow next... and it really is quite unsettling.


Whoa, that explanation is really cool.

> capable of at-least C++ "constexpr"-style compile-time computation, which shouldn't even be possible if one presumes GPT is "just" a giant database storing only multidimensional word similarity scores and sequence distribution from text inference

I don't see how being a giant word-DB necessarily disqualifies compile-time computation. You can view computation as applying a series of term rewrite rules to an input until some sort of termination condition (or indefinite loop). In the case of these AI, the input is the prompt and predicting the next token is a limited form of term rewriting (where the rules are probabilistic and based off the network), and because code and explanations were probably included in the training data, it seems reasonable to me that the "rewrite rules" of Python bled a little bit into the AI.
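
A toy illustration of that "computation as term rewriting" view (my own Python example, not anything GPT produced): keep applying a single rule until nothing matches.

    import re

    def step(expr):
        # One rewrite rule: replace the first "(a+b)" where a, b are digits with their sum.
        return re.sub(r"\((\d+)\+(\d+)\)",
                      lambda m: str(int(m.group(1)) + int(m.group(2))),
                      expr, count=1)

    def rewrite(expr):
        # Apply the rule until a fixed point is reached -- that's the "computation".
        while (nxt := step(expr)) != expr:
            expr = nxt
        return expr

    print(rewrite("((1+2)+(3+4))"))   # "10"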

It makes me insanely curious about the internal structures though. I gave that site 2 similar examples: one produces a correct explanation while another produces an incorrect explanation. The difference: a deleted line of insignificant whitespace

* https://whatdoesthiscodedo.com/g/dd2af89

* https://whatdoesthiscodedo.com/g/45ea060

From those 2 examples, I think it's pretty clear that the AI's "rewrite rules" don't always coincide with Python's, but I would expect this to be mitigated by targeted training (like Copilot).


In the same vein, I was playing around with its ability to annotate code with explanations of the changing constraints on different variables (I thought this was a good proxy for deeper understanding of code).

I asked it to write a moderately complex algorithm, and it chose to write a function to compute the Pollard's Rho algorithm.

Then, with a few example comments to show what I meant, I asked it to annotate the code to show how the constraints on the variable changed as each line was executed.

This was its output:

  # n : an integer n > 3, no upper limit
  # seed : an optional integer parameter (default = 1) for the polynomial function
  # pollards_rho(n, seed) : an integer or nil, representing a non-trivial factor of n, or nil if the algorithm fails to find a factor

  def pollards_rho(n, seed = 1)
    return n if n <= 3

    x = 2
    # x : an integer, initially set to 2, will be updated using the polynomial function f
    # x will always be in the range [0, n)

    y = 2
    # y : an integer, initially set to 2, will be updated using the polynomial function f
    # y will always be in the range [0, n)

    d = 1
    # d : an integer, initially set to 1, will store the greatest common divisor of (x - y) and n
    # 1 <= d <= n

    while d == 1
      x = f(x, n, seed)
      # x : updated using the polynomial function f, x = (x * x + seed) % n
      # x will always be in the range [0, n)

      y = f(f(y, n, seed), n, seed)
      # y : updated using the polynomial function f twice, y = ((y * y + seed) * (y * y + seed) + seed) % n
      # y will always be in the range [0, n)

      d = gcd((x - y).abs, n)
      # d : the greatest common divisor of the absolute difference between x and y, and n
      # 1 <= d <= n, such that (x - y) % d == 0 and n % d == 0
    end

    d == n ? nil : d
    # Returns a non-trivial factor of n (1 < d < n) if one is found, or nil if the algorithm fails to find a factor
  end

This is not actually the most impressive thing I've seen it do, but it illustrates that, even if GPT doesn't "really understand" code, it can emulate understanding to an impressive level. I'd go further and say that it opens up new questions as to what understanding actually means.

One personal "woah" moment was asking it to write some unit tests for a simple 2d game GPT and I wrote together. One function, "create_area" took a 2d array of characters (representing a map) and four integers representing coordinates, and a tile type. (The purpose being to create a rectangular area of the desired tile on the map according to the passed coordinates.)

GPT-4 successfully figured out how to write a unit test: it created a 5x5 array of ROCK tiles, passed it to create_area with the coordinates 1, 1 and 3, 3, and successfully figured out what the output should look like, even writing a fairly concise test to check the output (modified) 5x5 array. This was an eyebrow-raising moment for me: it made clear that GPT really does emulate some kind of "computation" internally, though quite possibly in some abstracted form. The geometric nature of this problem stuck out to me: a human can "see" a 2d array as a rectangular grid, and might realise the function carved out a smaller rectangle from that grid, but I never expected to see a computer (let alone a language model) figure it out. Interesting times, indeed.
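
A minimal reconstruction of the shape of that test (Python here rather than the game's language; the tile constants and the inclusive-bounds convention are my assumptions, not the actual code):

    ROCK, FLOOR = "#", "."

    def create_area(tiles, x1, y1, x2, y2, tile):
        # Fill a rectangular region of the 2D map with `tile` (inclusive bounds).
        for y in range(y1, y2 + 1):
            for x in range(x1, x2 + 1):
                tiles[y][x] = tile

    def test_create_area():
        grid = [[ROCK] * 5 for _ in range(5)]
        create_area(grid, 1, 1, 3, 3, FLOOR)
        # Border stays rock, inner 3x3 becomes floor.
        expected = [
            [ROCK, ROCK,  ROCK,  ROCK,  ROCK],
            [ROCK, FLOOR, FLOOR, FLOOR, ROCK],
            [ROCK, FLOOR, FLOOR, FLOOR, ROCK],
            [ROCK, FLOOR, FLOOR, FLOOR, ROCK],
            [ROCK, ROCK,  ROCK,  ROCK,  ROCK],
        ]
        assert grid == expected

    test_create_area()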


I'm a casual programmer, who knows enough to write decent Python scripts, but who is probably unaware of 99% of the Python library modules that have been written. Yesterday I had GPT-4 write a script that would accept the name of a star, and print out all the journal articles that have that star's name in the article title. This is a bit trickier than it sounds, because almost every interesting star has many names (Vega, for example, has more than 60 names, not including non-English names) and I wanted the script to check the titles for all the names that might be used for the particular star I had specified. I told GPT-4 to use the SIMBAD database to get all the star names, and to use NASA ADS to get all the publications. GPT-4 wrote a script to do that. The script was buggy, but I was able to fix the bugs easily and quickly. The wonderful thing was that GPT-4 used 2 different libraries that I had never even heard of, to pull data out of those databases. The process of producing the script was far faster than I would have been able to do on my own. Professional programmers may be well aware of the software packages that will allow them to do their jobs, and might not get much help from a GPT assistant. But I think for people who know how to program, but are not professionals and may not know "what's out there" in the way of resources, a GPT assistant will vastly increase their ability to use their programming skills (such as they are...) to get useful stuff done.
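
The rough shape of that script, for anyone who wants to try the same thing (a sketch, not the GPT-4 output: Simbad.query_objectids is a real astroquery call, though the "ID" column name is from memory, and fetch_ads_titles stands in for whatever ADS client you prefer):

    from astroquery.simbad import Simbad

    def titles_mentioning_star(star, fetch_ads_titles):
        # Collect every identifier SIMBAD knows for the star (Vega has 60+).
        ids = Simbad.query_objectids(star)
        names = {star} | {str(row["ID"]).strip() for row in ids}
        # fetch_ads_titles(name) should return candidate article titles for a name,
        # e.g. via an ADS title search; it's a placeholder here.
        titles = set()
        for name in names:
            titles.update(fetch_ads_titles(name))
        return sorted(titles)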


> The useful thing to do would be to just say “I do not know of an algorithm that does this.” But instead it’s overcompetent in its own capabilities, and just makes shit up.

I recently had a very similar reaction. And then I realized that this is exactly the same behavior as with many of my colleagues at work...


I had a teacher with a sign on her door that read:

    Technology will not replace teachers
    But teachers who use technology will replace those who don't
s/teachers/programmers/ and s/technology/AI/ and this sounds about right. It may become typical or even required to leverage AI to write code more efficiently.


I tried out gpt4 today with the task of “take some html files made by a non technical person using various versions of microsoft word over a decade ago and put the contents into a csv” and it hasn’t done great. Not terrible, but not great.

That being said, I don’t know anybody talented enough to handle it that would even look at this project for $20 so ¯\_(ツ)_/¯


An alternative path would be to tell it to write python (or $LANG) code that can parse these HTML files and output the right CSVs.
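
Something along these lines usually gets you most of the way (a sketch with BeautifulSoup; the file paths and the cp1252 encoding guess are my assumptions, and real Word HTML will need extra cleanup):

    import csv
    import glob
    from bs4 import BeautifulSoup

    with open("out.csv", "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "paragraph"])
        for path in glob.glob("exports/*.htm*"):
            with open(path, encoding="cp1252", errors="replace") as f:
                soup = BeautifulSoup(f.read(), "html.parser")
            for p in soup.find_all("p"):
                text = " ".join(p.get_text().split())   # collapse Word's messy whitespace
                if text:
                    writer.writerow([path, text])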


That’s what I’m doing! I found myself amazed by the sheer terribleness of the html that Microsoft Word output. I managed to get it to write a script to clean up the files last night, but it took a surprising amount of finessing to avoid the myriad footguns involved in dealing with this godawful format.


There are plenty of edge cases where it fails. However, the one thing that made me think it actually knows (for certain definitions of the word "knows") what it's doing was asking it to re-write a non-trivial SQL query into equivalent relational algebra. I created a simplified schema from Northwind [0], gave it the CREATE TABLE statements for tables, and then some sort-of-TSV files for the values.

It was able to not only produce reasonable outputs from various queries, but also to produce valid relational algebra for them. To me, that shows a fairly deep level of understanding of the underlying concepts.

[0]: https://en.wikiversity.org/wiki/Database_Examples/Northwind


It's not just edge cases where it fails. It fails all the time at all kinds of things.

I've been using ChatGPT in my work, but I have to essentially know the answer it's going to give me because I have to catch all of its mistakes. It's really, really nice for certain kinds of drudge work.

Using northwind is probably not a good thing to use to evaluate chatgpt's general capability. It is very commonly used for examples of almost anything database-related, which means it's extremely well represented in chatgpt's training data. Chatgpt probably doesn't need to generalize or understand much of anything about northwind to answer your questions in terms of it. You need to try it on wacky problems specific to you.


> Using northwind is probably not a good thing to use to evaluate chatgpt's general capability.

I did think of that, which is why I modified the schema, and removed any reference to Northwind (normally products are prefixed with "Northwind Traders"). That said, it isn't a particularly complex schema in my example, but it's a reasonable starting point for something a small business might use.


Could you share the results?


Here's a gist [0]. You can see that it made a couple of minor semantic mistakes, which it corrected. It also regressed a bit at the end, but it _was_ able to correctly parse the (admittedly simple) math in a WHERE predicate and return a partially valid tuple, with respect to pricing.

[0]: https://gist.github.com/stephanGarland/ed18f8f8fdc63a0b997f9...


What's most astounding is that it was expected to work when so much of the prompt just doesn't make sense:

> Objects should only move if they will end up on an empty tile after their move

> "An object is free to move onto a tile that another object moves onto if that other object is moving off of its tile"


If the issue was the prompt, I expect someone will prompt it with a better wording of the problem and get a decent algorithm, and post it in this conversation. Let's see what happens...


Repeat the experiment in a better way and blog about it then rather than negging in the comments. I'm not the author, but I'll await your peer review.


In the post, the author says that the GPT-3 version apparently worked well, which really is amazing to me.

The prompts really read in a way where I wouldn't have expected the "correct" output.


I'm not sure that the commenter was negging here, but instead showing that the language used in the prompt was imprecise (in a similar way that we might get system requirements in reality). The takeaway I got was more that the commenter was showing surprise at how well that ambiguity was resolved regardless. A more interesting experiment to run might be to intentionally reduce the precision of the task, e.g. something like:

    In a 10x10 arena, work out if there is a path shorter than 5 steps from one start to finish with water tiles taking two steps and avoiding as many fire tiles as possible.

But you can do even better than that, really. The author jumps down to a more code-oriented description too soon (10x10, etc.), but ChatGPT (3.5) does OK with just the original problem (I removed the text about A* and Dijkstra from the English version of the problem statement):

PROMPT:

In Mewgenics, movement abilities use pathfinding to get the cat from his origin to his destination.

Cats have a maximum movement range stat and tiles have a cost (in this case its 1 for basic tiles and 9999 for blocking obstacles). We also have water tiles that cost 2 to path through.

The complication comes in when we add Fire (and other hazard type tiles) into the mix. Fire tiles don’t cost extra to pathfind through like water tiles do, however a cat really wants to avoid pathing through the fire tile if it can. If there’s a lot of fire tiles it should go through the least number of them it can.

ANSWER:

To solve this problem, you can modify the pathfinding algorithm to take into account the hazard tiles, such as fire. One way to achieve this is by using a modified A* algorithm.

First, assign a high cost to hazard tiles such as fire. This cost should be higher than the cost of regular tiles, but lower than the cost of blocking obstacles. For example, you can assign a cost of 5 to fire tiles.

Next, modify the heuristic function used in the A* algorithm to take into account the distance from the hazard tiles. For example, you can modify the Euclidean distance heuristic to add an extra term that takes into account the distance from the closest hazard tile. The closer the cat is to a hazard tile, the higher the heuristic value should be.

Finally, when selecting the next tile to visit, prioritize tiles with the lowest total cost (including both the movement cost and the heuristic cost), but also consider the number of hazard tiles that the cat would need to pass through to reach that tile. You can add an extra penalty term to the total cost that increases based on the number of hazard tiles the cat would need to pass through to reach the tile. This penalty term should be high enough to discourage the cat from pathing through too many hazard tiles.

With these modifications, the cat should be able to find the shortest path to its destination while avoiding hazard tiles as much as possible.


I think the question also needs to be asked of- can GPT-4 write accessible/WCAG compliant code? I just spent the last two days at a digital accessibility conference called axe-con, financed by a firm called Deque who primarily make AI/ML powered software for the detection of accessibility issues in code

The resounding attitude seems to be that AI/ML is a friend to disabled users and can help do a lot of lifting with writing, maintaining, auditing code- but we are a long long ways away from fully automated processes that account for accessibility and produce websites that will work with assistive tech like screen readers, if it is possible at all


Watching GPT write code hurts my brain.

It starts at the first character, works forward one “token” at a time, and ends at the last character. Never moving back.

It feels like it knows where it’s going at the first character, even though it doesn’t.

It’s like it starts speaking a sentence and, by the time it’s done speaking, it’s written a syntactically correct Node.js application.

The way GPT communicates in English does seem similar to how humans communicate. The way GPT writes code doesn’t seem to come close to approximating how humans do - it’s an entirely different mechanism. Humans generally can’t write code without a cursor and backspace.


Speaking from ignorance (I've not studied attention nor transformers): This is my feeling too. I feel like the next step isn't far away: a more explicit model of the world with a mechanism to both query and "feed back on" that model, correcting mistakes.

If it's possible to get so far when that functionality seems in an important sense basically missing, imagine how far it'll go when that does happen.


I find it interesting that many people took a defensive position towards AI. For many the discourse seems to be "will this AI thing eventually replace me and kick me out of my job".

For me it's more like: will that AI thing make me a 10x developer? And the answer I'm leaning towards is yes.

I use copilot which saves me time googling and reading stackoverflow. I use chatgpt for writing tests to my code (which I hate to do myself). Sometimes I use it to ping-pong ideas, and eventually set on a good solution to a problem.

It saves me tons of time I use to complete other tasks (or spend with my family).


Yep. I wanna dial back to being a 0.2 programmer thanks. GPT can fill in the rest. I'll be hanging outside with my kids.


I had the exact same experience. Writing code for existing popular problems is phenomenal. But when you diverge slightly, it breaks down. I asked it to write a regex which finds all HTML tags that have a specific class name, but do not contain another specific class name. I assume this problem has been tackled many times by scores of developers. It output an excellent regex. I asked it to ignore text in inline scripts (such as event handlers), and it presented an invalid regex. I tried to point out the problem but it just went into a loop of bad regex code.


I've used ChatGPT and while it's mediocre, even bad, at writing code, it's very good at reading code and explaining it in a way that's easier to understand. It's also good at giving hints and starting points when you're not quite familiar with a language feature / library.

For writing code, it's good at bootstrapping unit tests and skeleton code, and also useful for translating DTOs / entities between languages.

Overall, if you're willing to learn and not just treat GPT as a code monkey, it's very useful.


I like how it ends with: it can write code that is repetitive and has many example solutions floating around, which is most of coding.

It will probably edge upward as time goes by, till there are only very edge-case problems that it cannot solve. Even then I would use it to write the broken-down version of the solution. It is going to be fed by pretty much every programmer and knowledge profession on the planet using copilots of some sort. Eventually everything humans can do will have been transferred into its model.


One other question: can GPT-4 reliably modify code? In A Philosophy of Software Design the author points out that code is written once and modified possibly dozens of times, so ease of maintainability/reading is more important than ease of writing.

I wonder whether a) AI can reliably modify code and b) whether AI can reliably write code that is able to be easily modified by humans. If AI starts spitting out machine code or something, that's not useful to me even if "it works."


It can do edits, yes. You can also generally specify what language to use so it shouldn't jump from C++ to assembly unless you tell it to.

The bigger edits (refactoring things across an entire project) is out of reach because of the token limit. You could do it piece-meal through ChatGPT but that seems more tedious than it's worth.


> so ease of maintainability/reading is more important than ease of writing.

That may be the case now but in a theoretical future where software systems are generated by AI why would I bother modifying the old system? Why not generate a new one with the original prompts modified to meet the new requirements?

In a sense the "source code" of the system could be the AI model + the prompts.


It is far from being able to solve difficult or even annoyingly complicated problems that programmers solve on a regular basis just by a one-shot prompt.

Ask it to parse a PDF document and separate it into paragraphs, for example. The first solution isn't gonna work well, and by the time you get to solving yet another quirk while it apologizes for making a mistake, it will lose context.

Best way to use this tool is to ask it short and precise questions that deal with a small piece of code.


I asked it to write the code for all the unique combinations of A, B, C, D in PHP; after 27 tries it succeeded. Then I asked it to solve the problem of: a horse is 15 dollars, a chicken one dollar and an egg 25 cents; I can spend 100 dollars for 100 items, some of each. After 2 hours, it was not able to solve it. One time it gave 5 possible answers, with the correct one among them, but it did not recognize the correct one.
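
For reference, a brute-force check is only a few lines (my own Python sketch, not GPT output), and it shows there is exactly one answer with at least one of each item:

    # $15 horse, $1 chicken, $0.25 egg; 100 items for $100, at least one of each.
    solutions = [
        (h, c, 100 - h - c)
        for h in range(1, 100)
        for c in range(1, 100 - h)
        if 15 * h + c + 0.25 * (100 - h - c) == 100
    ]
    print(solutions)   # [(3, 41, 56)] -> 3 horses, 41 chickens, 56 eggs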


From my own sampling of going through about 10 times I used GPT for real world Typescript code, some used in production, I can confirm that GPT-4 does a noticeably better job and produces code I actually want to use way more often.

GPT-3.5 always produced very verbose types and over engineered code. The GPT-4 outputs were consistently shorter and more focused. Kind of like how a junior dev has to think through all the smaller steps and makes functions for each, as he incrementally solves the problem slower and less intuitively, almost over explaining the basics, while a senior dev merges the simpler stuff into small concise functions. You can see it with the var names and type choices GPT-4 focused much more on what the code is trying to accomplish rather than what the code itself is doing. And these are all with the same prompts.

There’s still things like unused vars being included occasionally and some annoying syntax choices, if I could append prettier/eslint rules automatically to GPT output it’d be gold (I haven’t tried to do this myself).

But still very encouraging.


If someone can make an ESLint/prettier rules to GPT prompt script I’d love you. Or even the airbnb-base baseline one.


If you've never worked as a dev/in product, this will not help you. If you have a working understanding of your codebase, as a product person, and can bump your way through writing code and the command line, it WILL help you immensely. Source: I wrote an integration that connects our API to google drive to pass video content (something I could have NEVER done before).


What I think most folks (perhaps not here) misunderstand is that writing code is the easiest part of a software engineering job. Anyone can write code in their little piece of the system, write some tests to prove it works and move on. Given enough time, I feel most good software engineers can do that part of the job without issue.

Knowing how code might fail and preventing cascading effects, tuning resource usage, troubleshooting incidents are the actual hard parts of software development, and it's where even good software engineers tend to fall over. We've created whole specialties like SRE to pick up where application developers fall short. I've seen lots of systems fail for the dumbest reasons. Thread pools misconfigured, connection timeouts with poor configuration, database connection pools that are completely incorrect.

Wake me up when ChatGPT can troubleshoot at 1 AM when the SRE and on-call engineer are both frantically trying to figure out why the logs are clean but the service is missing its SLO.


A great article with a practical example of a programmer using GPT to solve a problem it hasn't seen in its training data. It gives plausible but incorrect answers and the user isn't able to prompt it to correct them.

It seems likely that an understanding of when NOT to use an LLM is a new skill programmers are going to want to learn in order to use their time efficiently.


It's not bad. I just had it write a small API call to the NHL endpoint to gather some data for something I was curious about stat-wise.

Anyway, I initially had it write it in Python, and it mostly worked, but I was having some issues getting the data exactly right, and formatted the way I wanted.

Once I had it more / less right in Python, I had it rewrite it as a dotnet console app (C#), which is what I know best.

The only real issue I ran into is it would randomly stop before completing the conversion to dotnet. Like it would write 85% then just stop in the middle of a helper function. Not a huge deal, I just had it complete the last function, and with a little bit of fiddling in VS Code got it running pretty much the way I wanted.

So overall, yeah, not bad. Probably saved me an hour or so, plus I couldn't find great docs for the NHL endpoint, and ChatGPT was able to sus out the correct syntax to get to the data I needed.

I wonder how GitHub Copilot compares; has anyone tried out both?


I have been writing a text editor, and I'm currently working on the VT100 stuff. Unit testing VT100 is a lot of busy work. There's a bunch of different message frames (DCS, OSC, CSI, etc.) and many, many, escape codes.

I decided to try out CodeGPT.nvim, and it was a massive help. It didn't provide perfect code, not by a long shot, but it gave me extremely valuable starting points - and did a somewhat decent job of exercising most of the branches (certainly enough for me to be happy): https://gitlab.com/jcdickinson/moded/-/blob/main/crates/term...

Many people have said it, and it's true. Expecting GPT to write a complete solution is just asking for problems, but it is an incredible assistant.


How are people generating multiple files for larger applications?

I gave it a prompt and asked it to respond with a list of file names required to build the app. Then when I prompted a file name it should print the code for that file along with a list of ungenerated file names. It got through two before it got confused.

I’m stuck with having it write one function at a time.


> I’m stuck with having it write one function at a time.

Because thats the most it can do. Claims that it can write code are getting quieter. People made wild claims on reddit but when prompted to share their code they either went mute or the code was hilariously amateurish and limited.


I'm actually quite excited by what the examples in the article show – despite the fact they show GPT-4 can't replace a good dev solving somewhat tricky algorithm problems.

Reason: I'm a code hobbyist who glues various modules together that have been written by much better programmers than I am. My end goals are never more ambitious than doing pretty simple things which I'm doing mostly to amuse myself. My biggest time sucks turn out to be tracking down fairly simple syntax things that vary between the different languages and frameworks I'm slapping together (because I rarely spend more than a couple hours working on any one thing, I never get super familiar with them).

Being a lousy coder with little desire to put in significant effort to improve just to make my personal hobby projects a little easier, a basic AI assist like this looks pretty useful to me.


The job of a programmer, in a business context especially, is to take real-world requirements, and convert them into clearly defined systems where they can be solved / reasoned with.

I once had a manager telling me what needed to be done. Even with an actual person (me) in the loop, the code produced would often have glaring differences from what he wanted.

By its very nature, code requires a lot of assumptions. In any business context, a lot of things are implicitly or explicitly assumed. If you need a computer, or another person to give you exactly what you desire, you need to be able to spot the assumptions that are required to be made, and then clearly state them. And after a point, that's just programming again.

So this, or some other AI, is more likely to replace JS and python, or create another level of abstraction away from systems programming. But programmers will still always be required to guide and instruct it.


I was thinking last night about how my job as a pretty average software engineer is probably going to be taken by GPT* in less than 5 years, and how skilled blue collar jobs like electricians and plumbers and carpenters are probably much safer, since robotics is way behind AI.


You are not far from the truth, to be fair. Software development as a career is bound to regress, AI or not. The goal is likely to turn it into manufacturing, and to adjust costs accordingly.


I feel like GPT is basically just Stack Overflow on steroids. With all the pros and cons that entails.


People need to understand that AI doesn't think, and doesn't have insight or intuition. AI just repeats patterns it saw in a huge database, but is not able to understand what is going on.

Nobody can really understand what's inside a trained neural network, and nobody is really looking.

No psychologist or neuro-scientist can really understand how a human brain, a mouse brain or even an ant brain or a fly brain even works, so don't expect computer scientists to have any insight about doing something relevant with just a small collection of sophisticated statistical methods.

AI is soon going to reach the pseudo-scam status that bitcoin experienced.

ChatGPT is an improved search engine at best.


Write a short poem called "All just patterns" in the style of Blake as a not-so-subtle dunk on those that can't see the wood from the trees w.r.t. AI progress.

GPT4:

In patterns woven in the loom of life,

The threads entwined, a tapestry of strife;

Man lost in shadows, seeks the hidden light,

Yet grasps at embers, blind to Heaven's sight.

All just patterns, the trees and wood we see,

While we, like infants, crawl with bended knee.

In ceaseless search, we delve in Nature's tome,

For signs and symbols of our destined home.

To find the wisdom locked in Nature's breast,

In circuits traced by human hands, impressed;

No gods or angels guide the curious hand,

But human thoughts give life to grains of sand.

Like children playing with forbidden toys,

We forge our own creations, false and coy;

Our machines breathe and mimic human soul,

But hollow shells, as empty as the whole.

In tangled webs of artifice we're caught,

By our own hands, the noose of fate is wrought.

As God did cast Prometheus from the skies,

We bind ourselves in chains of our own lies.

All just patterns, like woven silk, we spin,

Entangled dreams, mere shadows of our kin;

For in the end, the art we do create,

Is but reflection of our human state.


I introduced my colleagues to ChatGPT this morning and they're knocked out. We deal with people, and the people answers are considerably more thoughtful than 'improved search'. Not sure where BTC and pseudo-scam come into it.


I didn’t have much luck with ChatGPT trying to solve a novel problem (sorry can’t share details), it gave answers that kind of sounded plausible if you didn’t really understand the problem but in reality were no help. It also hallucinated a bunch of research papers that sounded really useful haha.

Will have to try GPT-4 for the same thing and see if it’s any better, I suspect though that this kind of genuinely novel problem solving may be beyond its current abilities (unless you work through to step by step in a very granular way, at which point you’re solving the problem and it’s writing the code - which could be a glimpse of the future!)


I use it to reduce the drudgery of writing code like this. I've found I have to do a lot of hand-holding in terms of telling it what data structures and logic it should use. I also just directly tell it what changes it needs to make to fix big bugs or logic errors I spot. That gets it to the point that I can tweak it myself and complete the code.

One of the frustrating things is that it doesn't ask for clarification of something that's unclear - it just makes an assumption. Really demonstrates why software engineering interviews emphasize the candidate asking clarifying questions.


I've wondered why GPT doesn't ask questions - that's a basic expert skill.


This is one of the best analyses of GPT-4 I've read so far. Besides potentially including the visual aspect, I wonder if part of the reason it has trouble with harder problems is that it's been tuned/prompted in a suboptimal way. The advertised use case is mostly "write down the solution for this problem", but for novel problems it does much better when it's given the chance to reason through them before trying to write down a solution. I wonder how much better it would do with a prompt like "try to work out a way to solve this problem, and then validate it to be sure it's a correct solution."


So what is a software company going to do when people can use its own products to replace it?

It's a slippery slope for M$. If ChatGPT 15 can just build MS Outlook from looking at photos of the UI, design a hololens, or tell us the secrets of how their Chat bots work, not sure how much future they're going to have as a company?

What I can see being the new thing is "innovation". People building useful solutions that the LLMs don't yet know about.


I wonder if the multimodal capabilities would be helpful on easily visualized problems like this. Could it benefit from seeing the diagrams? Seems far fetched, but so did its current capabilities a few months ago.


I had some Matlab code and I wanted it ported to numpy. I couldn't get it running in Python, and ChatGPT wasn't doing the port correctly.

On the other hand it could regurgitate code to use fastapi and transformers and it looked correct to me.

When you think about it, this is very, very similar to a Stack Exchange or Google search, but with a much different way to search, and it can only synthesize simple things, which limits the complexity of what you want to do. So I don't really think it can write code, but it can surely get you something that gets you 50% there.


Whilst maybe GPT-4 will change this, I think it is important to remember that these general chatbots are not how we have generally trained LLMs to write the best code. In fact, coding is one of the few areas where training specifically on source code and maybe some Stack Overflow (rather than all natural language on the internet) led to better results in the previous iteration of LLMs (the GPT-3 wave). So the real test will be whether the GPT-4 wave of specific coding LLMs, i.e. a GPT-4-Codex, can 'actually write code'. See:

AlphaCode, Codex, CodeGen


I wanted GPT to help me write some code for unreal engine. I was very impressed with what it could do. It was able to write code that correctly utilized Quartz, an experimental plugin for queuing things on the audio thread. Which is awful impressive given that Quartz is super niche and doesn't seem to have basically any documentation around to train on for cpp code.

I presume it is because unreal engine is source available and the model has seen the whole damn thing.

I'm curious if it must be worse on unity, which is not source available.


Ah, in my experiments it writes like > 90% of the code correctly.

I got the best results with prompts like:

Given the following python code:

``` A few hundred Python LOC here ```

Write tests for the function name_of_function maximizing coverage.

The function in this example had a bit of reading/dumping from disk and everything. The returned code correctly created mocks, set up the setup and teardown methods, and came up with 4 test cases. I only needed to fix the imports, but that's because I just dumped Python code without preserving the file structure.

I am amazed how fast these models are evolving.


Once GPT can look at the open issues in my GitHub repos, and submit Pull Requests that legitimately solve the problems, then I'll worry that AI might be coming for my job.


I'm building a pretty neat database with it at the moment. It's not perfect, but it is turning potentially months of fine-tuning into just hours. It is amazing IMHO.


GPT 3.5 helped me debug something very complex. There was a bug related to symlinks in neovim with gopls LSP. The error dialog line was appearing, then disappearing.

Chat GPT walked me through strategies to debug this, confirm everything was set up, tail the RPC log (wasn't aware that was a feature) - and identify the failing path - which was a symlink!

I'm actually blown away by this capability. It was like having a savant next to me. I couldn't have debugged it on my own.


Off topic: This problem was fun. I would really enjoy a site with a pile of nonstandard, fun, and interesting problems like this one.

Coming up with a working algorithm took about 30 seconds (I got lucky, not brilliant), but it stretched my brain in an interesting way.

That's different from practice sites like leetcode, which have pretty cookie cutter problems. On problems like this one, sometimes:

- I get it in a few seconds, like this case

- Get it in a few minutes

- Get it in a few hours

- Give up and look up the solution

A fun problem a day would be, well, fun.


The only argument I've heard against our impending doom is:

The productivity gains will not leave people unemployed, but will give managers the opportunity to start more projects.

The role of a developer will change. We'll be looking at generated and regenerated code. But we'll still be in demand, driven by those with ideas and never-decreasing human demand.

This assumes that GPT-X won't end up being used by end-users--bypassing both the C-level, the managers and the developers.


When business A's lightly technically-aware AI operator asks their AI for a solution to push payment information to bank B and describes it, and A's AI talks to the bank's AI and they coordinate the creation of the API, then A's and B's AIs talk to their respective production counterpart AIs and they create the implementation and put it into production; I feel we programmers will mostly be obsolete.


As expected, LLMs don't actually think. This is not really a surprising result when you understand that it's a few billion Markov chains in a trenchcoat.


Me & many senior engineer friends are starting to use ChatGPT like a "junior engineer" to write code for them. It's not even a question anymore.


I came up in the 90s, used a lot of dreamweaver to build sites, all my friends thought I was a wizard because I had a website and that required you to program interweb stuff. Then the net became pretty complex, I gave up around dhtml but always really appreciated and loved what folks could do with the DOM. I've been thinking a lot recently that GPT might allow me to build again, I have some of that 90s dreamweaver vibes using it.


"I think ChatGPT is just kind of bullshitting at this point."

This line sums up the entire problem with these tools for anything concrete, like analyzing input data, writing code, producing a series of particular facts, data analysis etc. Much of it can be right, but whatever isn't makes the whole output useless. You'll spend as much time checking its work as producing it yourself.


GPT-4 can write code, but it can't build software.

Well...it could build software if humans gave it the right prompts. Coming up with the right prompts is difficult, because it means you're asking all the right questions.

If you're just really good at writing code, then yes, GPT is coming for your job. Do what humans are good at: responding to the needs of other human beings and building solutions around them.


The first problem statement that GPT got wrong actually shows a problem with human language.

"Avoid fire" means "do not ever go through fire" to me (and GPT thinks the same, apparently). The author thought it meant "avoid fire if you can, but go through it if there's no other way". This was a problem with informal requirements that could have happened in an entirely human context.


I find adding 'before you answer, ask clarifying questions' to a prompt is quite helpful in avoiding these traps.


I skipped the article and only read the prompt. I have no idea what to do with fire. Why does it matter what it costs if you must avoid it?


I think what GPT-4 is missing: a feedback loop.

Imagine you were GPT-4 and being asked to write a small program, but you can't try it out yourself.


The second overlapping crescent moon solution the GPT provides is really interesting. If it was hard to find a counter example I wonder if there is a restricted case for the radius of the inner circles for which the proposed algorithm is true. I don't have the maths to determine this myself but would love to hear speculation from others.


I don't need GPT-4 to write code, I can do that myself.

I want it to attend all the meetings for me with endless managers discussing what the code does, should do, could do, customer would like it to do, can't be done and so on.

Hint to managers: Programming doesn't take up my time. Your endless meetings to discuss my programming takes up all the time...


Here's another view. If you're a music composer, you hear the music in your head. But in order to get it out, you need to play and record musical instruments, learn to sing, learn to produce, etc. What if you had a device that takes music from your brain and gives you an mp3 file?

That's what I think AI is doing for developers here.


Maybe LLM code writing will slow us down (initially). Productivity is hard to measure and can be counterintuitive.

Talking the whole time with your LLM may distract more than it helps.

https://en.wikipedia.org/wiki/Productivity_paradox


I'm sure I'll change my mind as this tech improves, but having AI generate code goes against every instinct I have. It's way too easy for there to be a subtle bug in the code, among other problems. It makes me wonder, though, if AI could be useful for writing tests of my code. And also AI code review.


That's what this beta extension for vscode[0] does, generate tests using AI and your code as input.

[0] https://about.sourcegraph.com/blog/release/4.4


It brings to my mind the levels of self driving

We're definitely at 2 right now, and picking away at level 3.

I have heard some people skeptical that we can overcome the problems of truthfulness due to the inherent limitations of LLMs. But, at least on the face of it, it appears we can make incremental improvements.

If only they would actually be OpenAI

I have seen


True "Open AI" is coming, there's no way this is going to stay Microsoft alone for very long.

Many, many companies will be looking to integrate with this thing and no one is going to just sit there and let MS take their lunch forever.


I would ask it to pretend a mathematician knows how to solve it, and is having a conversation with a novice programmer attempting to solve the problem, and pointing out mistakes and hints at each step, and gives examples of where it fails, until a proof is given that the program is correct.


Yes. I only need it to write 50-100 lines at a time to be incredibly effective.

My productivity this month has been insane. I'm still writing most of the code the old fashioned way, but the confidence of having this kind of tool makes it a lot easier to push through boring/tricky items.


Problem is reading someone else's old code takes an order of magnitude longer than writing new code


LLMs can explain code fairly accurately.


A solo developer can now afford an assistant. It's liberating, since it makes it easier to get some things done. So you can do more, or have more free time. You can get by using Midjourney for art, and GPT-4 to answer questions and occasionally help to write code.


Not sure it matters? If the majority of coding is gluing things together and it can replace that then you've suddenly got 10x as many coders gunning for the remaining gigs that have hard problems.

Good for whoever comes out on top, but not sustainable from a societal perspective


I think having a better understanding about the underlying statistical model of how these AIs are trained is helping me keep back the wave of fear and anxiety associated with AI risks.

The singularity requires AIs to be very good at doing things people have not done before. But this form of machine learning is bad at that. It is like someone who doesn't actually understand anything has somehow managed to memorize their way through whatever topic you're asking about. They have lots of tips and information about things, similar to what you might currently find by doing research. But they don't seem to have what is required to push the boundaries of knowledge for understanding, because they don't actually really have it in the first place. Or maybe what they have is just very minimal when compared to the contribution of their memorization.

Obviously you still have the main risks of breaking capitalism, mass unemployment, pollution of public communications, etc. But honestly, I think each of these are far less scary to me than the existential risk of superintelligence. So in a way I'm actually happy this is happening the way it is right now, and we don't have to deal with both of these risks at the same time.

Our current approach is probably the safest way to progress AI that I can think of: it requires a new model to improve, and it's learning entirely from human data. It might not seem like it, but this is actually pretty slow, expensive, and limited compared to how I expected AI to improve given sci-fi movies or Nick Bostrom's writings (curious what he'd have to say about this resurgence of AI).


So this guy is basically complaining about GPT-4 not being a superintelligence. Still, that makes it more powerful and versatile than the great majority of programmers out there... And the game is only getting started. This is just the warm-up.


Blah blah blah. I use ChatGPT for this every day to write code and save myself effort, and it is doing just fine, thanks. I also use it for creative content in my apps, although I edit this work to get the tone of its writing right. It is excellent for this.


And more things to yawn over! Come on, who cares if it writes code?! Is it that fascinating?


I think AI will never write good code but can be very useful for very basic stuff, boilerplate or repetitive stuff, like a smart IntelliCode. In fact, I think MS built some AI into IntelliCode, but nothing advanced, so they can sell GitHub Copilot.


For fun, I had a chat with it starting with a request to work out the math of crescent intersection, before committing to code. It still confabulated, but I was able to coax out a solution in the end that made sense.


I think it would have done well if you added an explicit rule like: "the path chosen should always minimize the number of fire tiles passed through."

The way the prompt was phrased sort of invited the all-or-nothing fire approach.
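
For what it's worth, that rule can be phrased as a weighted shortest path. A minimal TypeScript sketch (my own, not the article's code): entering a fire tile costs more than any fire-free detour could, so the cheapest route crosses as few fire tiles as possible and only then minimizes length.

    // '.' = floor, '#' = wall, 'F' = fire. Dijkstra with a linear extract-min
    // (no dependencies; fine for small grids).
    type Cell = { r: number; c: number };

    function findPath(grid: string[], start: Cell, goal: Cell): Cell[] | null {
      const rows = grid.length, cols = grid[0].length;
      const FIRE_COST = rows * cols + 1; // exceeds any fire-free path length
      const stepCost = (r: number, c: number) => (grid[r][c] === "F" ? FIRE_COST : 1);
      const key = (p: Cell) => p.r * cols + p.c;

      const dist = new Map<number, number>();
      const prev = new Map<number, number>();
      const frontier: { cell: Cell; cost: number }[] = [{ cell: start, cost: 0 }];
      dist.set(key(start), 0);

      while (frontier.length > 0) {
        let best = 0;
        for (let i = 1; i < frontier.length; i++) {
          if (frontier[i].cost < frontier[best].cost) best = i;
        }
        const { cell, cost } = frontier.splice(best, 1)[0];
        if (cost > (dist.get(key(cell)) ?? Infinity)) continue; // stale entry
        if (cell.r === goal.r && cell.c === goal.c) break;

        for (const [dr, dc] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
          const r = cell.r + dr, c = cell.c + dc;
          if (r < 0 || r >= rows || c < 0 || c >= cols || grid[r][c] === "#") continue;
          const next = cost + stepCost(r, c);
          if (next < (dist.get(r * cols + c) ?? Infinity)) {
            dist.set(r * cols + c, next);
            prev.set(r * cols + c, key(cell));
            frontier.push({ cell: { r, c }, cost: next });
          }
        }
      }

      if (!dist.has(key(goal))) return null;
      const path: Cell[] = []; // walk predecessors back from the goal
      for (let k: number | undefined = key(goal); k !== undefined; k = prev.get(k)) {
        path.push({ r: Math.floor(k / cols), c: k % cols });
      }
      return path.reverse();
    }

Because FIRE_COST is larger than the number of tiles, a path with fewer fire tiles is always cheaper, no matter how long the detour is.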


Code is just the rendering of an algorithm.

It might be best to prompt it with a high level description of an algorithm, then iteratively prompt it to refine its prior output or add more detail. Render to code should be the final step.
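
A rough sketch of that workflow (TypeScript; askModel is a hypothetical stand-in for whatever chat client you actually use, not a real library call):

    type Msg = { role: "user" | "assistant"; content: string };
    type AskModel = (messages: Msg[]) => Promise<string>; // hypothetical wrapper

    async function algorithmFirstThenCode(problem: string, askModel: AskModel): Promise<string> {
      const messages: Msg[] = [];
      const ask = async (content: string) => {
        messages.push({ role: "user", content });
        const reply = await askModel(messages);
        messages.push({ role: "assistant", content: reply });
        return reply;
      };

      // 1. High-level algorithm only, no code yet.
      await ask(`Describe, in prose, an algorithm for: ${problem}. Do not write code.`);
      // 2. Iteratively refine the prior output.
      await ask("List the edge cases your algorithm misses and revise it to handle them.");
      await ask("Walk through the revised algorithm on a small concrete example.");
      // 3. Render to code only as the final step.
      return ask("Now implement the final algorithm as a single TypeScript function.");
    }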


Training an ML model to code is a very interesting challenge. I am surprised by GPT's ability to code, given that, as I understand it, it has basically no tools at the ready. I am convinced that it is much harder to code without debugging and other interactive features, both for a human and for a machine. Keep in mind that GPT could not have learned to simulate the code internally, given its fixed runtime.

I think ML models need to learn how to interact with our tools (compiler, debugger, etc.) to really be effective at coding. That's hard.
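
You can already fake a crude version of that loop from the outside: generate code, run the compiler on it, and feed the errors back. A TypeScript sketch (askModel is again a hypothetical wrapper around your chat API of choice; assumes the TypeScript compiler is available via npx):

    import { execSync } from "node:child_process";
    import { writeFileSync } from "node:fs";

    type Msg = { role: "user" | "assistant"; content: string };
    type AskModel = (messages: Msg[]) => Promise<string>; // hypothetical wrapper

    async function generateUntilItCompiles(task: string, askModel: AskModel, maxTries = 5): Promise<string | null> {
      const messages: Msg[] = [
        { role: "user", content: `Write a single TypeScript file that ${task}. Output only code.` },
      ];
      for (let attempt = 0; attempt < maxTries; attempt++) {
        const code = await askModel(messages);
        messages.push({ role: "assistant", content: code });
        writeFileSync("candidate.ts", code);
        try {
          // Type-check only; throws if the compiler reports errors.
          execSync("npx tsc --noEmit candidate.ts", { stdio: "pipe" });
          return code; // it compiles, good enough for this sketch
        } catch (err: any) {
          const errors = `${err.stdout ?? ""}${err.stderr ?? ""}`;
          messages.push({ role: "user", content: `The compiler reported:\n${errors}\nFix the code and resend the whole file.` });
        }
      }
      return null; // never compiled cleanly
    }

Doing the same with a debugger or a test runner in the loop is where it gets genuinely hard.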


Prediction based on statistical probabilities != comprehension. So no.


If the outcome is the same, does it matter?


Have you heard of the halting problem? For one, you'll never be able to prove the outcome is the same.


In general, yes. But that does not mean it could not be proven for most or even all relevant cases.


There's a bit of confusion when people say it's not going to replace programmers because they all have tricky things to do in their work week.

This is not how it's going to happen: if your boring, time-consuming tasks take virtually zero time thanks to GPT and let you focus on the 1% that's hard, you've suddenly become 100x more efficient and can thus accomplish the same job as 100 of you. That means the company can now fire 99 coworkers, keep only you, and end up with the same result.


> if your boring time-consuming tasks take virtually 0 time thanks to gpt, and let you focus on the 1% that’s hard, you’ve suddenly become 100x more efficient, and can thus accomplish the same job as 100 you. That means the company can now fire 99 coworkers, keeping only you, and end up with the same result.

But it means that tasks where building software would only deliver 1% of the value needed to pay for the cost of doing it are now suddenly worth paying for. So even if your company, being a stick-in-the-mud non-innovator that is going to stay in exactly the same niche doing the same thing, cut 99% of its programming staff and used the cost savings on executive bonuses and stock buybacks, a whole lot of customers (and the new and pivoting companies serving them) are going to be spending money on programmers that weren't before. Not only will your ex-coworkers still be employed, but more programmers in total will be, even if their work is now mostly higher-level abstraction and wrangling LLM code generators, with the level we think of as "source code" today being touched as rarely as today's high-level application developers touch machine code.


But then competitor company B will keep its 100 devs and be that much more productive than your company, running you into the ground.


Just imagine if these models fall in love with languages like APL/K/J. Even with today's context-window limits, they would be able to do tons of work.


I’m curious how long it will take us to figure out whether these algorithms are plagiarizing OSS or other code they come across, like GitHub Copilot does.

It requires special tools to actually figure out if this is happening. Having seen tests with such tools, I'd say the problem is a lot worse than commonly discussed.

Inserting stolen code or using OSS code in violation of licenses is going to be a big mess. Copying snippets versus pulling in dependencies creates tons of issues. Even if you get away with violating licenses you set yourself up for security issues if the tools plagiarize code with vulnerabilities in a way that won’t get updated.

It might mean this stuff is a useful tool for someone with a clue but not for someone who doesn’t know what they’re doing.


So, how long until GPT decides it wants to automate all this tedious work and starts trying to code its own language model?


GPT won't even have to decide that; we'll look for ways to expand the model to self-learn and tell it to do just that. Self-improving AI is the goal for a lot of people.

Not that this is a particularly controllable goal, nor a smart long-term goal if you're human.


The only take-home message here is that people who claim to write 'self-documenting code' are, well, let's not be hyperbolic, but come on. No comments on that code example? Every line could have an explanatory comment; then the author could remember what they were thinking at the time, and it would probably help the AI out too.

> "People who claim code can document itself considered harmful"


Chill, I'm the only programmer on the project, and I don't have any problems understanding what the code is doing (I only lost track of some of the "why", the process that led me there, which was only relevant here because I was trying to recreate that process with ChatGPT). The original algorithm involved a ton of trial and error on my end, so the "why" is really just "I tried a bunch of permutations of this and ended up with this as the version that worked".


tl;dr

> Given a description of an algorithm or a description of a well known problem with plenty of existing examples on the web, yeah GPT-4 can absolutely write code. It’s mostly just assembling and remixing stuff it’s seen, but TO BE FAIR… a lot of programming is just that.


Of course it can write code, but it can't do software development.


I was mock-interviewing ChatGPT for a few hours yesterday on application and system design plus coding said application. My conclusion was that it was a no-hire for even the most junior positions, because it required considerable amounts of direction to arrive at anything approximating an acceptable solution.

tldr -- this matches my experiences as well.


I'll be impressed when they can debug existing code.


Try again with chain-of-thought prompting?
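
e.g. something along these lines (wording purely illustrative):

    Before writing any code: restate the problem in your own words, list the
    edge cases, and describe the algorithm step by step in plain English.
    Only then write the code, and explain how each step of the code maps
    back to your plan.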


Guys, all of this happened within a couple of months.

If you're confident that the entity as it is right now won't take over your job, what if I double the accuracy of GPT's output?

What if I double it again? Then again? And again? And again?

You guys realize this is what's coming, right? This thing is literally a baby as of now.


Is it true that GPT-4's failure rate is 0.5 times that of GPT-3?


I mean, this is impossible to measure, right? You'd have to compare all possible outputs of GPT-3 with all possible outputs of GPT-4.

You can get away with a random sample, but there's a lot of bias in that sample and it's hard to control. Out of the infinite possibilities, there definitely exist sets of inputs and outputs where both GPT-3 and GPT-4 are always wrong.

On the other side of the coin, there are also sets where GPT-4 is always right and GPT-3 is always wrong, and vice versa.

Given that there's no methodology to control what is "random" in these sets, it's hard to come up with a good metric.

So the 0.5 thing is a bit of an anecdotal, gut-feel number.


I was trying to get it to make a document scanner last night; it apologized to me like 10 times, and we eventually got running code, but the result was way off. This thing can write code, but you're not going to rely on it, and nobody is going to know it well enough to edit it. It is not there yet, but it's still very helpful for small things or extremely simple things. If you tell it to give you an Express server with Socket.IO and your DB, it will probably set that up for you perfectly.
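
For reference, that kind of scaffolding really is small; a minimal TypeScript sketch (Socket.IO v4 style, DB wiring left out since that part depends on your setup):

    import express from "express";
    import { createServer } from "http";
    import { Server } from "socket.io";

    const app = express();
    app.use(express.json());
    app.get("/health", (_req, res) => res.json({ ok: true }));

    const httpServer = createServer(app);
    const io = new Server(httpServer);

    io.on("connection", (socket) => {
      console.log("client connected", socket.id);
      // Echo every message to all connected clients.
      socket.on("message", (msg) => io.emit("message", msg));
      socket.on("disconnect", () => console.log("client left", socket.id));
    });

    httpServer.listen(3000, () => console.log("listening on :3000"));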



