LLMs and Programming in the first days of 2024 (antirez.com)
461 points by nalgeon 9 months ago | 288 comments



Salient point:

> Would I have been able to do it without ChatGPT? Certainly yes, but the most interesting thing is not the fact that it would have taken me longer: the truth is that I wouldn't even have tried, because it wouldn't have been worth it.

This is the true enabling power of LLMs for code assistance -- reducing the activation energy of new tasks enough that they are tackled (and finished) when they otherwise would have been left on the pile of future projects indefinitely.

I think the internet and the open source movement had a similar effect, in that if you did not attempt a project you had some small interest in, it was only a matter of time before someone else solved enough of a similar problem for you to reuse or repurpose their work, and this led to an explosion of (often useful, or at least usable) applications and libraries.

I agree with the author that LLMs are not by themselves very capable but provide a force multiplier for those with the basic skills and motivation.


I find that even beyond "activation energy", a lot of my exploration with ChatGPT is around things I don't necessarily even intend to take forward initially, but am just curious about, and then realise I can do with much less effort than I expected.

You can often get much of the same effect by bouncing ideas off someone, who doesn't necessarily need to know the problem space well enough to solve things, just well enough to give meaningful input. But people with the right skills aren't available at the click of a button 24/7.


I feel very left out of all this LLM hype. It's helped me with a couple of things, but usually by the time I'm at a point where I don't know what I'm doing, the model doesn't know any better than I do. Otherwise, I have a hard time formulating prompts faster than I can just write the damn code myself.

Am I just bad at using these tools?


I'll give you an example. I took advantage of some free time these days to finally implement some small services on my home server. ChatGPT (3.5 in my case) has read the documentation of every language, framework and API out there. I asked it to start with Python3 http.server (because I know it's already on my little server) and write some code that would respond to a couple of HTTP calls and do this and that. It created an example that customized the do_GET and do_POST methods of http.server, which I didn't even know existed. It also did well when I asked it to write a simple web form. It did less well when things got more complicated, but at that point I already knew how to proceed. I finished everything in three hours.
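To give an idea, the skeleton it produced looked roughly like this (the routes and responses here are my own illustration, not the exact code it wrote):

    # Minimal sketch of a handler overriding do_GET / do_POST on the stdlib server.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    import json

    class MyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/status":
                body = json.dumps({"ok": True}).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_error(404)

        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            payload = self.rfile.read(length)  # raw request body
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"received %d bytes" % len(payload))

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), MyHandler).serve_forever()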

What did it save me?

First of all, the time to discover the do_GET and do_POST methods. I know I should have read the docs, but it's like asking a colleague "how do I do that in Python?" and getting the correct answer. It happens all the time; sometimes I'm the one asking, sometimes the one answering.

Second, the time to write the first working code. It was by no means complete but it worked and it was good enough to be the first prototype. It's easier to build on that code.

What didn't it save me? All the years spent learning to recognize what the code written by ChatGPT does and how to go on from there. Without those years on my own I would have been lost anyway, and maybe I wouldn't have been able to ask it the right questions to get the code.


Regarding your last point about what it didn't save you, it reminded me of this blog post: https://overbring.com/articles/2023-06-23-on-using-llm-to-ge...


I've been learning boring old SQL over the last few months, and I've found the AIs quite helpful at pointing out some things that are perhaps too obvious for the tutorials to call out.

I don't mind taking suggestions about code from an AI because I can immediately verify the AI's suggestion by running the code, making small edits, and testing it.


But this is my pet peeve when people claim this: the only reason it works is that the AI code is small and constrained in scope. Otherwise the claim that humans can easily and quickly verify AI code would, like, run into Rice's Theorem.


That's underusing complexity theory. Since not all verification problems are about arbitrary Turing-complete programs, there are better theorems to apply than Rice's Theorem, such as:

* IP = PSPACE (a polynomial-time verifier can check the correctness of any PSPACE computation via an interactive proof)

* MIP = NEXPTIME (you can verify the correctness of any NEXPTIME computation with two provers who cannot communicate with each other)

* NP = PCP(O(log n), O(1)) (you can verify the correctness of any NP statement with O(log n) bits of randomness by sampling just O(1) bits from a proof)

What these mean is that a human is indeed able to verify the correctness of output from a machine with stronger computational abilities than the human's own.


I'd reframe that slightly: it's not that you are bad at using these tools, it's that these tools are deceptively difficult to use effectively and you haven't yet achieved the level of mastery required to get great results out of them.

The only way to get there is to spend a ton of time playing with them, trying out new things and building an intuition for what they can do and how best to prompt them.

Here's my most recent example of how I use them for code: https://til.simonwillison.net/github-actions/daily-planner - specifically this transcript: https://gist.github.com/simonw/d189b737911317c2b9f970342e9fa...


I've developed a workflow that's working pretty well for me. I treat the LLM as a junior developer that I'm pair programming with and mentoring. I explain to it what I plan to work on, run ideas by it, show it code snippets I'm working on and ask it to explain what I'm doing, and whether it sees any bugs, flaws, or limitations. When I ask it to generate code, I read carefully and try to correct its mistakes. Sometimes it has good advice and helps me figure out something that's better than what I would have done on my own. What I end up with is like a living lab notebook that documents the thought processes I go through as I develop something. Like you, for individual tasks, a lot of times I could do it faster if I just wrote the damn code myself, and sometimes I fall back to that. In the longer term I feel like this pair programming approach gives me a higher average velocity. Like others are sharing, it also lowers the activation energy needed for me to get started on something, and has generally been a pretty fun way to work.


Here's what I find extremely useful:

1 - very hit or miss -- I need to fiddle with the AWS API in some way. I do this roughly every other month and never remember anything about it between sessions. ChatGPT is very confused by the multiple versions of the APIs that exist, but I can normally talk it into giving me a basic working example that is much easier to modify into exactly what I want than starting from scratch. Because of the multiple API versions, it is extremely prone to hallucinating endpoints. But if you persist, it will eventually get it right enough.

2 - I have a ton of bash automations to do various things. Just like with AWS, I touch these infrequently enough that I can never remember the syntax. ChatGPT is amazing and replaces piles of time spent googling and swearing.

3 - snippets of utility python to do various tasks. I could write these, but chatgpt just speeds this up.

4 - working first draft examples of various js libs, rails gems, etc.

What I've found has extremely poor coverage in ChatGPT is stuff where there are basically no Stack Overflow articles explaining it or GitHub code using it. There, you're likely to be disappointed by the results.


As the article says, it helps to develop an intuition for what the models are good or bad at answering. I can often copy-paste some logs, tracebacks, and images of the issue and demand a solution without a long manual prompt - but it takes some time to learn when it will likely work and when it's doomed to fail.


This is likely the biggest disconnect between people that enjoy using them and those that don’t. Recognizing when GPT-4’s about to output nonsense and stopping it in the first few sentences before it wastes your time is a skill that won’t develop until you stop using them as if they’re intended to be infallible.

At least for now, you have to treat them like cheap metal detectors and not heat-seeking missiles.


I had good results by writing my requirements like they were very high-level code. I told it specifically what to do, like formal specifications but with no math or logic. I usually defined the classes or data structures, too. I'd also tell it which libraries to use after getting their names from a previous, exploratory question.

From there, I'd ask it to do one modification at a time to the code. I'd be very precise. I'd give it only my definitions and just the function I wanted it to modify. It would screw things up, and I'd tell it so. It would fix its errors, break working code with hallucinations, and so on. You need to be able to spot these problems to know when to stop asking it about a given function.
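To make that concrete, here is a made-up example of the kind of definitions I'd hand it up front, before asking for one change at a time against exactly these types (not my actual project):

    # Hypothetical definitions given to the model verbatim; every request then
    # references these names so it has no room to invent its own structures.
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Story:
        id: int
        title: str
        url: Optional[str] = None
        points: int = 0

    @dataclass
    class Page:
        stories: list[Story] = field(default_factory=list)

        def top(self, n: int) -> list[Story]:
            """Return the n highest-scoring stories."""
            return sorted(self.stories, key=lambda s: s.points, reverse=True)[:n]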

I was able to use ChatGPT 3.5 for most development. GPT-4 was better for work that needed high creativity or fewer hallucinations. I wrote whole programs with it that were immensely useful, including a HN proxy for mobile. Eventually, ChatGPT got really dumb while outputting less and less code. It even told me to hire someone several times (?!). That GPT-3-Davinci helped a lot suggests it's their fine-tuning and system prompt causing problems (e.g. for safety).

The original methods I suggested should work, though. You want to use a huge, code-optimized model for creativity or hard stuff, though. Those for iteration, review, etc can be cheaper.


It's useful for writing generic code/templates/boilerplate, then customizing the result by inserting your own code. For something you already know better, there isn't a magic prompt to express it, since the code is not generic enough for an LLM to produce from a prompt.

Its best use case is when you're not a domain expert and need to quickly run some unknown API/library inside your program, inserting code like "write a function for loading X with Y in language Z" when you barely have an idea what X, Y, and Z are. It's possible in theory to break everything down into "write me a function for N", but the quality of such functions is not worth the prompting in most situations, and you're better off asking it to explain how to write that function step by step.


This is exactly where I get the most value. For example, I now write a bunch of custom Chrome extensions. I'm not much of a JavaScript guy, but I can get by and validate the code. Usually what I want to do is simple, but requires making an API call and basic parsing. All stuff I could probably figure out myself in an hour or two. But instead I can get an initial version done in 2 minutes and spend 5-10 minutes debugging. I probably wouldn't even try otherwise.


I felt the same every time I tried to get some help in a subject matter where my knowledge was quite advanced and/or when the subject matter was obscure/niche.

Whenever I tried it on something more common, or on stuff I had absolutely zero familiarity with, it did help me bootstrap quicker than reading some documentation would have.

That says a lot about how hard it is to write or find documentation that is tailored exactly to you and your needs.


You can use LLMs as documentation lookups for widely used libraries, eg: the Python stdlib. Just place a one line comment of what you want the AI to do and let it autocomplete the next line. It’s much better than previous documentation tools because it will interpolate your variables and match your function’s return type.
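For example, the pattern looks something like this, where the comment is the prompt and the next line is the completion (a small stdlib illustration):

    import datetime

    release_date = datetime.date(2024, 6, 1)

    # number of whole days from today until release_date
    days_until_release = (release_date - datetime.date.today()).days

The completion reuses the release_date variable already in scope and produces an int, which is what makes it feel like documentation that already knows your code.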


Yeah, I’m not sure how often these tools will really help me with the things that end up destroying my time when programming, which are stuff like:

1) Shit’s broken. Officially supported thing kinda isn’t and should be regarded as alpha-quality, bugs in libraries, server responses not conforming to spec and I can’t change it, major programming language tooling and/or whatever CI we’re using is simply bad. About the only thing here I can think of that it might help with is generating tooling config files for the bog standard simplest use case, which can sometimes be weirdly hard to track down.

2) Our codebase is bad and trying to do things the “right” way will actually break it.


One thing you might find useful is using it to write unit tests for legacy code that more easily allow you to refactor the crappy codebase. See the sketch below.
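For instance, a minimal sketch of that kind of generated test: a characterization test that pins down current behaviour before any refactoring (the function here is a made-up stand-in for real legacy code):

    import unittest

    def quote(items: int, vip: bool) -> int:
        """Made-up stand-in for an untested legacy pricing function."""
        return items * 9 if vip else items * 10

    class QuoteCharacterization(unittest.TestCase):
        # Expected values are captured from current behaviour, not from a spec,
        # so a later refactor can be checked against them.
        def test_known_inputs_keep_known_outputs(self):
            self.assertEqual(quote(3, vip=False), 30)
            self.assertEqual(quote(3, vip=True), 27)

    if __name__ == "__main__":
        unittest.main()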


No, you're likely just a better programmer than those relying on these tools.


That could be the case, and likely is in the areas where they are strongest, just like the article's example of how it's not as useful for systems programming because he is an expert.

If you ask it about things you don't know, things it was likely trained on high-quality data for, and still get bad answers, you likely need to improve your writing/prompting.


Except it's most dangerous when used in areas that you're weakest because it will confidently spit out subtly wrong answers to everything. It is not a fact engine.


That only matters if you use it for things where failure actually causes damage.


However, could it be that the 'entertainment effect' of using a new (and trendy) technology like LLMs provides the activation energy for otherwise mundane tasks?


In my experience, it genuinely lowers the activation (and total) energy for certain tasks, since LLMs are great at writing repetitive code that would otherwise be tedious to write by hand. For instance, writing a bunch of similar test cases.
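Something like this table of near-identical cases, which an LLM will happily enumerate and which is tedious to type by hand (the helper and the cases are made up for illustration):

    import pytest

    def slugify(title: str) -> str:
        """Made-up helper: lowercase, collapse whitespace to dashes."""
        return "-".join(title.lower().split())

    @pytest.mark.parametrize("title,expected", [
        ("Hello World", "hello-world"),
        ("  Leading and trailing  ", "leading-and-trailing"),
        ("Already-slugged", "already-slugged"),
        ("MiXeD CaSe Words", "mixed-case-words"),
    ])
    def test_slugify(title, expected):
        assert slugify(title) == expected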


This is true. I write way more documentation now since the LLM does all the formatting, structure, diagrams, etc. I just guide it at a high level.


Do you use a specialized LLM for diagrams?


I'm not OP, but I just ask GPT to turn code or process or whatever else into a mermaid diagram. Most of the time I don't even need to few-shot prompt it with examples. Then you dump the resulting text into something like https://mermaid.live/ and voilà.


I just ask ChatGPT to generate text diagrams that I can embed in my markdown.


No. I've completed projects that have been on the back burner for years in hours. They weren't on the back burner for lack of interest but mainly for lack of expertise in a specific stupid area.

It's not an exaggeration to say that I now do two weeks of programming in a night. Of course, a lot of the time the result gets thrown away because the fundamental idea was flawed in a non-obvious way. But learning that is also worthwhile.


It's revolutionized our in house tooling at work.

No longer do I need to give PR feedback more than a couple times, because we can just ask chatgpt to come up with a lint rule that detects and sometimes auto-fixes the issue. I use it to write or change Jenkins jobs, scaffold out tests, diagram ideas from a monologue brain dump, write alerting and monitoring code, write and clean up documentation.

Most recently I wanted to get some "end of year" stats for the teams, normally it would never have happened because I don't have half a day to dedicate to relearning the git commands, and how to count the lines of code and attribute changes to teams and script the whole process to work across 20 repos.

20 minutes later with chatgpt I had results I could share within the company.
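For the curious, it was in the spirit of this sketch (simplified, with hypothetical repo paths; the real script also mapped author emails to teams):

    # Lines added/removed per author across several repos via `git log --numstat`.
    import subprocess
    from collections import Counter
    from pathlib import Path

    REPOS = [Path("~/src/repo-a").expanduser(), Path("~/src/repo-b").expanduser()]

    def lines_changed(repo: Path, since: str = "2023-01-01") -> Counter:
        out = subprocess.run(
            ["git", "log", f"--since={since}", "--numstat", "--format=@%ae"],
            cwd=repo, capture_output=True, text=True, check=True,
        ).stdout
        totals, author = Counter(), None
        for line in out.splitlines():
            if line.startswith("@"):
                author = line[1:]          # commit author email
            elif line.strip():
                added, removed, _path = line.split("\t", 2)
                if added != "-":           # binary files report "-"
                    totals[author] += int(added) + int(removed)
        return totals

    if __name__ == "__main__":
        grand = Counter()
        for repo in REPOS:
            grand.update(lines_changed(repo))
        for author, n in grand.most_common(10):
            print(f"{author:40} {n:>10} lines changed")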

It's just allowed me to skip almost all of the boring and time-consuming parts of handling small things like that, and instead turns me into a code reviewer who makes a few changes to get it good enough and then pushes it out.


Does your company have any concerns about feeding ChatGPT source code? Would it not be safer to use a local LLM?


Not any more than feeding source code to Github. (I personally feel "source code" is very rarely the "secret sauce" of a company anyway). But where I work we not only have the blessing of the company, we are encouraged to use it because they've clearly seen the benefits it brings.

A local LLM would be preferable, all things equal, but in my experience for this kind of stuff, GPT-4 is just so much better than anything else available, let alone any local LLMs.


It's safe if you use the OpenAI API, or you have an enterprise chatgpt account - they have the same privacy guarantees as other cloud providers there.


Wait what?

ChatGPT does diagrams for you? Writes documentation?


Absolutely!

I found a plugin a while back called "AI Diagrams" that generates whimsical.com diagrams for me. Combined with the speech-to-text system in ChatGPT, that means I can just start babbling about some topic and let it write it all down, collect it into documentation, and even spit out a few diagrams from it.

I generally have to spend like 10 minutes cleaning them up and rearranging them to look a bit more sane, but it's been a godsend!

Similarly I sometimes paste a bunch of code in and tell it to write some starter docs for this code, then I can go from there and clean it up manually, or just tell it what changes need to be made. (technically I tend to use editor plugins these days not copy+paste, but the idea is the same)

Other times I'll paste in docs and have it reformat them into something better. Like I recently took our ~3 year old README in a monorepo that goes over all the build and lint commands and had it rearrange everything into sets of markdown tables which displayed the data in a much easier to understand format.


Same here - I feel as though it's turned me into a super programmer. One of my favorite uses is converting a C# model with a bunch of properties into a SQL table along with their corresponding stored procedures. I used to have boilerplate code and would have to copy/paste every property, along with their SQL datatypes. One model might take me 10 to 20 minutes to get translated to a table, and the stored procedures would take me another 20 minutes. Now it's all done in 5 minutes tops, and I'm not having to nitpick datatype issues I may have screwed up in the manual process.


I had a project recently: building an advanced Java class file analyzer. I knew a lot about ow2 asm libraries, but it saved me a lot of digging time remembering exact descriptor formats. Also it helped me understand why other static analysis libraries weren't good enough for me for stateful reasons.

For me, ChatGPT is doing two things: 1) saving trivial Stack Overflow searches and library code-walking to answer specific questions, and 2) helping in the initial project research stage to grasp the feasibility of approaches I may take before starting.


To add to that, I've heard a lot of people with things like ADHD saying LLMs are life-changing, more so than neurotypical people do. For those of us with bad ADHD, doing simple things is the hardest; just taking out the trash or opening the mail is nearly impossible. Your internal dialog is screaming at you for hours to just do the thing, but your body won't listen. They call it executive dysfunction, but it's the bane of my life.

LLMs helping to just start the thing is actually a huge deal.


Also, just throwing a bunch of disorganised thoughts into it, and letting it organise everything.


Agreed, I've now got a sizable project going that I would probably procrastinate on attempting for years without GPT 4 laying the initial groundwork, along with informing me of a few libraries I had never heard of that reduced wheel reinventing quite a bit.


First automation acts as a powerful lever and enabler, then it replaces you.

In 5 years' time you may well be more productive than ever. In 15 years I doubt there'll be many programming jobs in a form recognisable today.


Historically, automation has always made society richer and better off as a result.

There are two guys outside of my window at this very moment getting rid of a huge pile of dirt. One is in an excavator, the other in a truck. There's a bunch of piles that they've taken care of today. Two centuries ago this work would've taken a dozen people, a few beasts of burden, and a lot more time.

Where are the other ten hypothetical people? I don’t know, but chances are they’ve been absorbed by the rest of the economy doing something else worthwhile that they are getting paid for.


There are two problems with this argument. The first, and easier to accept, is that while society might be better off in the long run as a result, the affected individuals probably will not be. We tend to generalize from a single historical example, the industrial revolution and, more specifically, the automatic loom, and in that case the displaced workers ended up doing worse. Better jobs and opportunities only got created later.

The other problem, of course, is that the historical examples (the data) are too few to generalize from, while we can see how these examples differ from each other. As technological evolution progresses, automation gets more and more sophisticated; it can replace jobs that require more and more skill and talent. In other words, jobs that fewer and fewer people were able to do in the first place. This means that the bar for successfully competing in the labor market gets higher and higher, and it will get to a point where a substantial number of people will just be plain uncompetitive for any job.

Or at least that was one of the models until LLMs were invented. (Mostly everyone thought that automation would take over the opportunities from the bottom up.) Now it seems that white collar jobs are actually more in danger for now. But I digress.

The point here is that past examples are false analogies because AI (and I mostly mean future AI) is fundamentally different from past inventions. Its capabilities seem to improve quickly, but we're mostly stuck with what evolution gave us. (We, as a species, are evolving, but it's very slow compared to the rate of technological evolution, and also we, as individuals, are stuck with whatever we were born with.)


I don't speak for the parent, but what they said seems true - Henry Hazlitt covers this phenomenon pretty well if you are ever interested. Your two points I think are also true. It won't be nice to everyone... Such is life. That being said, my practical mind is telling me to get ahead of it, whether that be learning new skills or whatever it takes to stay competitive. If that means picking another industry/profession entirely, so be it. You do what you gotta do.


The other problem with the excavator argument is that all that amazing productivity isn't being pulled from thin air by cleverness -- it's burning (limited, polluting) fossilized solar power from millions of years ago. We haven't avoided the work of moving the dirt around, but only transferred the burden from ourselves and our horses to our unsustainable energy sector. AI (and all software, really) is similar in that it tries to do the same to intellectual activities: instead of requiring a cheese sandwich and an afternoon, some tasks can instead be done with a few cents' worth of electricity and a few hundred milliseconds, but I still eat the cheese sandwich for lunch, so in the final analysis we've saved time but no other resource.

We of course wouldn't have to make this explicit calculation if we could incorporate all this knowledge about fossil fuels directly into the prices of goods and services, but this is a very difficult thing to do, and so far nobody has managed to do it in a global way.

It may still make material sense to use ChatGPT to create slides for middle management meetings, but that is not at all certain in a world with a significant price on emissions (though, to be fair, almost no human activity from the past fifty years stands up to this test either).


IIRC, Microsoft, which is hosting OpenAI, already has its server farms on carbon-neutral electricity, and is heading towards full carbon neutrality over the lifecycle of those server farms.

So doing stuff yourself is less carbon efficient than letting ChatGPT do its job.

As for calculating CO2 pollution into prices - we're slowly doing that; e.g., the EU is setting up its carbon tax that applies to companies abroad.

The issue is that if we were to instantly include the true costs of carbon removal in everything, the economy might collapse at worst, and at the least poor people might no longer be able to afford food, heating, or other basic necessities. It will take time to do it sensibly.


I'm skeptical that industrial civilization can exist at all (at least on anything like a billions-of-people scale) with priced-in emissions, but I guess we'll see. Widespread famine and suffering are in store one way or the other (either because of climate change, or because of what we will need to do to fight climate change).


So far we've been exceeding the predictions for switching to a carbon-neutral economy, so I wouldn't be too sure of that.


I think the other thing people miss is the impact of our existing infrastructure on the speed at which these new technologies can be deployed.

Society had a lot of time to get used to the printing press, the advances of the Industrial Revolution, and the internet. This is because the knowledge had to spread, and also because a ton of equipment had to be manufactured and/or shipped all over the world. We had to make printing presses, design and build factories, and get internet-capable computers to a critical mass of people.

AI is fundamentally different, in that the knowledge of AI can spread instantly because of the internet and because the vast majority of the world already has access to all of the hardware they need to access the most powerful AI models.

Soon we'll have humanoid robots coming, and while they obviously have to be built, we are much more capable now than we were 50 years ago at building giant factories. We also have an efficient system of capital allocation that means that as soon as someone demonstrates a generally useful humanoid robot and only needs to scale production, they'll have access to basically infinite investor money.


Those ten hypothetical people would likely have actually been zero because the task wouldn't have been worth hiring ten people.


Important, and often ignored! Technological improvements change the cost structure of activities. Consider texting. We send billions of texts every day. This does not replace billions of couriers. Rather, more people work to maintain our phone supply and telecom than we ever had running around with letters. It's not a promise but it is a pattern.


I agree, on a societal level it's a great benefit.

On an individual level, however, I'd suggest keeping an eye out for opportunities to retrain.

As an analogy, I'd rather be like the coal miners in the 80s who could read the writing on the wall and quietly retrained into something else rather than those who spent their better years striking over cuts to little avail.

It's a very daunting prospect seeing a path to unemployability, though.

Depending how quickly the change happens, it could be a gentle transition, or it could upset a lot of people.


Yeah, but what do you retrain to? I can't think of any industry that isn't being threatened with massive automation.


Well, the AI job market is pretty hot, so that's an option. And I expect as things mature it'll only create even more opportunity. Projects that nobody previously would have considered, because they'd have taken too long, required too much training for too many people? Now they can happen! And for each of those things, jobs are created, not taken away.

Imagine yourself as CEO. What do you think is your most likely train of thought? A/ "I can fully replace my labor force with bots" or B/ "My employees now have superpowers to do things we couldn't even conceive of two years ago". While there are certainly some scenarios where the first choice is appropriate, the latter sounds far far more likely in most scenarios to me. Why would you contract when there's suddenly so much opportunity to expand?


What AI jobs?


This comment and the grandparent can both be right. Society gets richer because automation replaces jobs, leaving us more time to do other jobs.


also, two centuries ago, we didn't have trucks and the concomitant infrastructure, jobs, wealth, etc. that machines brought along.


> Where are the other ten hypothetical people?

Zoned out somewhere in the backstreets of SF


Or dead because Trump cut off their welfare.


Same as looking back 15-20 years ago from now, though.

Very few jobs now where you put together basic webpages with HTML and CSS (Frontpage, then Wix/Wordpress etc replaced you). Very few jobs where you spend 100% of your time dealing with database backups and replication and managing the hardware (the cloud replaced you). Very few jobs where you spend all your time planning hardware capacity and physically inserting hard disks into racks (the cloud replaced you, too).


I share a similar intuition, but am skeptical that my imagination is in the right ballpark of what it will look like 15 years hence.

What do you think programming will be like in 15 years and where is the high-value work by human programmers?


I think AI will shift the value from those who know how to program toward those who know which programs are most useful to write. In other words, we’ll all become product engineers.


>I think the internet and the open source movement had a similar effect, in that if you did not attempt a project that you had some small interest in, it would only be a matter of time before someone else did enough of a similar problem for you to reuse or repurpose their work, and this led to an explosion of (often useful, or at least usable) applications and libraries.

I've been developing since back when the Mac had a b&w screen (System 6?). I had so many good ideas I just didn't work on for whatever reason, and eventually someone else did them. Maybe it's just that tech people run into the same problems, or a collective tech unconscious; if you've been around long enough, eventually people in the industry will try to solve the same problems. These occurrences go to show me that there really aren't many truly original ideas out there; a lot of it comes down to implementation, PR, funding, geopolitics, your users, and luck.


When it comes to programming, I agree completely. The sweet spot for any use of LLMs is that you already know enough about the subject to verify the work - at least the output - and know how to describe in detail (ideally only the salient details) what you want. Huge +1 to it helping me do things faster, do things that I wouldn't have otherwise done, or using it for throwaway, mostly inconsequential yet valuable programs.

But another area I have found it extremely helpful in is exploring a new topic entirely, programming or otherwise. Telling it that I don't know what I'm talking about and don't necessarily need specifics, but here is what I want to talk about, and I want it to help me think it through.

Especially if you are that person who is willing to take what you hear and do more research or ask more questions. The entrance to so many fields and subjects is just understanding the basic jargon, listening for the distinctions being made and understanding why, and knowing who the authorities are on the subject.


And it's equally and inversely harmful to junior developers who keep prodding it until it generates an abomination they don't understand that manages to pass the build. People who are learning need help, but the kind of help that LLMs in copilot form provide aren't the right fit.

It would be interesting to train a copilot model that is specifically intended to ask clarifying questions and be a partner in determining a solution, rather than doing its best to generate code for a vague or incorrectly specified question from a junior.


Have you tried prompting it to ask clarifying questions and be that partner? Perhaps no (extra) training required.


In my opinion the junior developers are not equipped to guide their teacher. If they knew they were asking to turn an incorrect assumption into code in the first place, they already wouldn’t believe the confident hallucination they get in reply.


> And it's equally and inversely harmful to junior developers who keep prodding it until it generates an abomination they don't understand that manages to pass the build.

This sounds like shotgun debugging.


While on LSD in this case.


It might not be good for juniors, but, as Antirez points out, is great for adapting to new frameworks or APIs, especially with search results having become so much more noisy lately


Last month I tried to use LLMs for things I didn’t know and couldn’t easily find. Every time they were either subtly wrong or outright hallucinated premises which led me to waste time until I realized they were wrong.

If not for the unwarranted confidence in incorrect responses, I could say they were at least not much worse than what I could piece together from what I knew. As it stands, they are OK filling in for a rubber duck and as autocomplete.


Surely a bad LLM, not chatGPT 4


I can't tell if you are joking or not.


No


I'll add another thought here - what I really want many times is a custom LLM like GPT, but trained on a particular language or framework or topic. I would love to go to a website for a new language and be able to talk about its documentation and ask questions of an LLM to help me understand. Huge bonus points if it was trained on real-world code examples of that language or framework and I could have it help me write a new program or function right there. More bonus points if it's tied in with an online REPL where it can help me right inline.


Ease of retraining/refinement is something I'm really hoping for.

There are an endless number of projects to make a "cleaner, revised X", where the coding itself is rote and has already been done at some point; it's just shoved into slightly different semantics that will be a bit more optimal or secure or configurable. It's something that an LLM feels like it's "tip of the tongue" capable of, and in more trivial cases you really can tell GPT to "rewrite this from JS to Python" and it works. But it's limited by just interpolating what's in the training set, when what you want is "port all these standard libraries to my experimental language, and also make a build system for them".


The One and Six Pagers I have it write based on loose criteria help me refine the criteria and in some cases, uncover methods that would not have been evident otherwise.


I think the most under-appreciated aspect of LLMs, one the article touched on but didn't directly address, is the "developer that knows everything" aspect.

No matter how senior of a programmer you are, you're eventually going to encounter a technology you know very little about. You're always going to be a junior at something. Maybe you're the God of Win32, C++ and COM, but you get stuck on obscure NSIS scripts when packaging your software. Maybe you've been writing web apps for the last 25 years and sit on the PHP language committee, but then you're asked to implement some obscure ISO standard for communicating with credit card networks, and you've never communicated with credit card networks on that level before. Maybe you've been writing iOS apps since the first iPhone and Mac apps before that, spent a few years at Apple, know most iOS APIs by heart and designed quite a few yourself, but then you're asked to implement CalDAV support in your app and you don't know what CalDAV is, much less how to use it. An LLM can help you out in these situations. Maybe it won't write all the code for you, but it'll at least put you on the right track.


>"No matter how senior of a programmer you are, you're eventually going to encounter a technology you know very little about"

Or worse, you've filled your head with different tech and now you need to rehash and brush up on stuff you learned before but swept under the rug for the new stuff. It's a strange sensation. Naturally you just go with the median of whatever the company you work for is doing - then find yourself in this situation where it's "been a while" since you worked on CSS. Or it might take you a weekend of study to bring back those Python dataclass skills.


I've found LLMs great for vague questions about functions and APIs whose details I've long forgotten. Recognising the right answer when it appears is often faster than digging through random results on google.


At its heart GPT is the world's best googler. As long as you can find it on Google, an LLM can probably do a better and faster job of finding and curating the information for you.


errr, I don't know about that. I like to ask the AIs about products I built and know everything about as a test, and while they often get the top level facts correct, there is always some crazy hallucination that you won't find googling.


Today it won't, tomorrow it certainly will, and it is time to look for something else as a job.

Thankfully I will most likely already be retired.


The code was written mostly by doing cut & paste on ChatGPT…

I am constantly shocked by how many people put up with such a painful workflow. OP is clearly an experienced engineer, not a novice using GPT to code above their knowledge. I assume OP usually cares about ergonomics and efficiency in their coding workflow and tools. But so many folks put up with cutting and pasting code back and forth between GPT and their local files.

This frustrating workflow was what initially led me to create aider. It lets you share your local git repo with GPT, so that new code and edits are applied directly into your files. Aider also shares related code context with GPT, so that it can write code that is integrated with your project. This lets GPT make more sophisticated contributions, not just isolated code that is easy to copy & paste.

The result is a seamless “pair programming” workflow, where you and GPT are editing the files together as you chat.

https://github.com/paul-gauthier/aider


I like aider. But is there a way to use it to just chat about the code?

I use LLMs to chat about pros and cons of various approaches or rubber duck out problems. I need to copy code over for that, but I've not found aider good for these kinds of things, because it's all about applying changes.

I usually have several back and forths about the right way to do things and then maybe apply some change.


Glad to hear you're finding aider useful!

Sure, there's a few things you could keep in mind if you just want to chat about code (not modify it):

1. You can tell GPT that at the start of the chat. "I don't want you to change the code, just answer my questions during this conversation."

2. You can run `aider --dry-run` which will prevent any modification of your files. Even if GPT specifies edits, they will just be displayed in the chat and not applied to your files.

3. It's safe to interrupt GPT with CONTROL-C during the chat in aider. If you see GPT is going down a wrong path, or starting to specify an edit that you don't like... just stop it. The conversation history will reflect that you interrupted GPT with ^C, so it will get the implication that you stopped it.

4. You can use the `/undo` command inside the chat to revert the last changes that GPT made to your files. So if you decide it did something wrong, it's easy to undo.

5. You can work on a new git branch, allow GPT to muck with your files during the conversation and then simply discard the branch afterwards.


Can I recommend an additional option? I would enjoy being able to enable a “confirm required” mode which presents the patch that will be applied, and offers me the chance to accept/reject it, possibly with a comment explaining the rejection.


Awesome, these might help.

What I feel like I want is /chat where it still sends the context, but the prompt is maybe changed a little, to be closer to a chatgpt experience.

I haven't dug into the prompt aider is using though, so I could be wrong.

Great tool for refactoring changes though! Keep up the good work.


> OP is clearly an experienced engineer

You think? He's the creator of Redis.


For one thing the ChatGPT web interface is useful for much more than just programming. If you're already paying for a sub, it makes sense to cut and paste instead of making additional payments for the API. On top of that people have different thresholds for the efficiency gains that warrant becoming dependent on someone else's project, which is liable to become paid or abandoned.


Yeah, I can ask ChatGPT to "do some web research, and validate the approach/library/interface/whatever", which is a useful feature to me.


I really like the idea of aider but when I tried it, it didn't work. The first real life file I tried it on was too big and it just blew up. The second real life file I tried was still too big. I was surprised that aider doesn't seem to have the ability to break down a large file to fit into the token limit. GPT's token limit isn't a very big source file. If I have to both choose the files to operate on and do surgery on them so GPT doesn't barf, am I saving time vs. using Copilot in my IDE? Going into it, I had thought that coping with the "code size ≫ token limit" problem was aider's main contribution to the solution space but I seem to have been wrong about that.

I hope to try aider again but it's in the unfortunate category of "I have to find a problem and a codebase simple enough that aider can handle it" whereas Copilot and ChatGPT come to me where I am. Copilot and ChatGPT help me with my actual job on my real life codebase, warts and all, every day.


I'm sorry to hear you had a rough experience trying aider. Have you tried it since GPT-4 Turbo came out with the 128k context window? Running `aider --4-turbo` will use that and be able to handle larger individual source code files.

Aider helps a lot when your codebase is larger than the GPT context window, but the files that need to be edited do have to fit into the window. This is a fairly common situation, where your whole git repo is quite large but most/all of the individual files are reasonably sized.

Aider summarizes the relevant context of the whole repo [0] and shares it along with the files that need to be edited.

The plan is absolutely to solve the problem you describe, and allow GPT to work with individual files which won't fit into the context window. This is less pressing with 128k context now available in GPT 4 Turbo, but there are other benefits to not "over sharing" with GPT. Selective sharing will decrease token costs and likely help GPT focus on the task at hand and not become distracted/confused by a mountain of irrelevant code. Aider already does this sort of contextually aware selective sharing with the "repo map" [0], so the needed work is to extend that concept to a sub-file granularity.

[0] https://aider.chat/docs/repomap.html#using-a-repo-map-to-pro...


Try again since the token limit increased in November by a factor of 16 (128000 now for GPT-4 Turbo 1106 preview instead of 8000 for GPT 4).


Mind the cost though! A single request with a fully loaded 128k token context window to GPT-4 Turbo costs $1.28.
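(That's the input side alone: at GPT-4 Turbo's $0.01 per 1K input tokens, 128,000 tokens × $0.01 / 1,000 = $1.28 per request, before paying for any output tokens.)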


Like others, wanted to say thank you for writing Aider.

I think you've done a fantastic job of covering chat and confirmation use cases with the current features. Comments on here may not reflect the high satisfaction levels of most of your software users :)

Aider helps put into practice the use cases that antirez refers to in their article, especially as someone gets better at "asking LLMs the right questions", as antirez puts it.


I've given Aider and Mentat a go multiple times and for existing projects I've found those tools to easily make a mess of my code base (especially larger projects). Checkpoints aren't so useful if you have to keep rolling back and re-prompting, especially once it starts making massive (slow token output) changes. I'm always using `gpt-4` so I feel like there will need to be an upgrade to the model capabilities before it can be reliably useful. I have tried Bloop, Copilot, Cody, and Cursor (w/ a preference towards the latter two), but inevitably, I end up with a chat window open a fair amount - while I know things will get better, I also find that LLM code generation for me is currently most useful on very specific bounded tasks, and that the pain of giving `gpt-4` free-reign on my codebase is in practice, worse atm.


There is a bit of learning curve to figuring out the most effective ways to collaboratively code with GPT, either through aider or other UXs. My best piece of advice is taken from aider's tips list and applies broadly to coding with LLMs or solo:

Large changes are best performed as a sequence of thoughtful bite sized steps, where you plan out the approach and overall design. Walk GPT through changes like you might with a junior dev. Ask for a refactor to prepare, then ask for the actual change. Spend the time to ask for code quality/structure improvements.

https://github.com/paul-gauthier/aider#tips


Shameless plug: https://github.com/codespin-ai/codespin-cli

It's similar to aider (which is a great tool btw) in goals, but with a different recipe.


I currently use gptel which inserts into my buffer directly and has less friction than copy paste.

Aider seems super cool, will check it out. What kind of context from the git repo does it share?


Glad to hear you'll give aider a try. Here's some background on the "repo map" context that aider sends to GPT:

https://aider.chat/docs/repomap.html


If I was doing it all the time, I might care. As it is I don't really find that workflow painful. It reminds me of the arguments about how much it helps to be able to touch type or type very fast. Actually inputting code is a minor part of development IME.


I edit cell-sized code that uses Colab GPUs, asking a lot of questions to ChatGPT, so copying and pasting is not a problem for me.


Thanks for making aider. I use it all the time. It's amazing.


For the past few days, I have been trying to fix a bug in a closed-source Mac app. I otherwise love the app, but this bug has been driving me crazy for years.

I was pretty sure I knew which Objective-C method was broadly responsible for the bug, but I didn't know what that method did, and the decompiled version was a nonsensical mess. I felt like I'd hit a wall.

Then I thought to feed the decompiler babble to GPT-4 and ask for a clean version. The result wasn't perfect, but I was able to clean it up. I swizzled the result into the app, and I'm pretty sure the bug is gone. (I never found reproduction steps, but the problem would usually have occurred by now.)

I never could have done this without GPT-4.


This sounds rather like the junior/bad developer who makes a bug disappear (at least for time being) by changing the order of functions in a source file or some such.

Admittedly a complete rewrite of a piece of code, even without understanding what you are doing (e.g by using an LLM), is unlikely to have the same bugs as the original implementation (but may have different bugs), but hopefully no-one is doing this for code where bugs have any significant consequence (e.g. system downtime, cost to customers).


Just to be clear, I do understand the new version of the method. I don't entirely know how it fits into the larger system, but that's to be expected when I literally don't have the source code.

When I cleaned it up, I took out some complexity which I believe was responsible for the bug, at the cost of some performance. According to GPT-4, the original version was checking file descriptors to decide when to do work. My version just does the work every 5ms.


So the parent commenter tried to tell you that they (and I too) have heard this story from junior and bad programmers all the time over the past 20 years, and they didn't use LLMs. It doesn't matter whether you use generative AIs or not; it's a bad way of thinking, and long term it's not beneficial to anybody. The real problem is that you didn't dig deeper when you figured out that that code change fixed the problem.


But I actually think this is a reasonable way to fix a hard-to-pin-down bug in any context, at least temporarily. (In my case, I don't intend to go back because it's not my app and mostly for personal use, but that's beside the point.)

There was a tradeoff between performance and complexity. The high-performance, high-complexity version was buggy, so I switched to a simpler option at the cost of some performance.

This isn't where the LLM was significant. The LLM was able to make sense of unreadable decompiled code, similar to how the author had ChatGPT translate from compiled assembly code back to C. (Giving GPT-4 the actual assembly never occurred to me, in hindsight I should have tried that first.)


My job is exactly to fix code which was not understood by its creators. And this “I have no idea why, but it works” (until it doesn’t) is the main cause of most of the problems which I work on.

For example, at my current company the developer who introduced a "clever" navigation system didn't know how HTML forms should be used, or why servers still allowed what they did. It worked. Now, 20+ years later, that sole developer's stupid decision and lack of HTML best practices will cost my company a few million dollars (and has, by the way, probably already cost a few million). A missing day of learning (and, by the way, a clear sign that that developer should never have been trusted with this task).

Senior developers learn this, and I've never seen better developers be satisfied and say "yeah, I fixed it" when they don't completely understand the what and the how, even when it's not strictly necessary. They've burned themselves enough times.


You've got me curious, how did the developer's misunderstanding of forms cost millions? Was it submitting duplicate orders? Blocking submission of valid orders causing lost business?

I think it's an error to bring that up here though, where we're talking about someone patching a closed source app for their personal use. Is it worth the cost/benefit of decompiling and studying the app's code sufficiently long to be highly confident of the fix?

Sloppiness and "good enough" has its place. So does full effort correctness.


Maybe you’re right. I hardly disagree, but those are just opinions.

The main culprit is that the backend is Java EE, and they used a single form for multiple things. This is 20+ year old software, so they had to use request parameters and attributes everywhere. It's impossible to locate where a given parameter is used or where attributes are created and used exactly. That's the first problem. Second is this single-form-per-page thing. They use that HTML form for everything, even page navigation; they just discard unnecessary fields on the server side. This means they change that form from JavaScript all the time. Sometimes the generated action is not used at all; it's overwritten with every event. And a single input can be used for multiple things. Third, they used a very flexible framework which was outdated 10 years ago, so it definitely needs to be replaced.

Add these together and you have terrible spaghetti code across 350+ pages with 1000+ endpoints. There is no separation of code even at the HTML level. I know where this "use only one form" came from. I found the ancient doc of that framework where it's mentioned. The problem is that they meant one form object on the server side per page (and not per endpoint), not on the client side. And they fucked up completely, because they started to use multiple form objects per page and single client-side HTML forms.

So now, replacing that old framework starts with a hefty refactoring, creating individual HTML forms for example. The refactoring alone will take at least half a year for a team, because it's very difficult to split those forms, since request attributes and parameters are used everywhere.


This web dev? If so, it must be some special stuff for it to still be in prod after all these years.


This post is absolutely devastating to me. Salvatore is surely one of the most capable software engineers working today. He can lucidly see that this supposed tool is completely useless to him within his area of expertise. Then, rather than cast it off as the ill-fitting, bent screwdriver that it is, he accepts the boosters' premise that he must find some use for it.

Just as any introductory macroeconomics class teaches, if one island has superior skill in producing widget A, it doesn't matter how terrible the other island's skill at producing widget B is; we'll still see specialization where island A leverages island B. So of course antirez's relative ability in systems programming would relegate the LLM to other programming tasks.

However! We do not exist in isolation. There is a multitude of human beings around us, hungry for technical challenges and food. Many of them have or could obtain skills complementary to our own. In working together, our cooperative efforts could be more than the sum of their parts.

Perhaps the LLM is better at writing PyTorch code than antirez. Just because we have an old bent screwdriver in the garage doesn't mean we should try to use it. Perhaps we'd be better off heading to the hardware store today.


If the LLM is better than me at writing Torch code, it is a great idea for me to use an LLM to write my model definition, since the exact syntax or the reshaping of the tensors is not so important to me. If I want to create a convnet and train it on my images, for my own usage, I don't need to bother some Torch expert to do it for me. I can do it myself, if I understand enough about convnets themselves and not enough about Torch syntax / methods. The alternative would be to study the details of Torch in a manual, and the end result would be the same: the important thing in this task is to master the ML concepts, not the details of MLX, Keras or PyTorch.
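For instance, the kind of model definition I mean, which the LLM can write and I only need to sanity-check (a generic small convnet as an illustration, not any specific model of mine):

    # A small image classifier of the sort an LLM will write on request;
    # the layer sizes, input resolution and number of classes are placeholders.
    import torch
    import torch.nn as nn

    class SmallConvNet(nn.Module):
        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),   # 64x64 -> 32x32
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),   # 32x32 -> 16x16
            )
            self.head = nn.Linear(64 * 16 * 16, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.features(x)              # (N, 64, 16, 16)
            return self.head(x.flatten(1))    # (N, num_classes)

    # The sanity check that matters here: do the shapes work out?
    model = SmallConvNet()
    print(model(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 10])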


> I don't need to bother some Torch expert to do it for me.

This is, indeed, the core of our disagreement. You seem confident it would be a bother to others to ask for help. I'm confident that there are many who would value the opportunity to collaborate with you. I feel sure that whatever analysis you're doing could benefit from the sounding board of a domain expert, and you'd both benefit from the exchange.

EDIT: to clarify, being "famous" has nothing to do with it. Each of us has worth and we all would gain by working with others.


Hi Couchand.

I'm a mediocre programmer who uses GPT for a ton.

Are you volunteering to answer my questions on all the obscure stuff I ask it? Because I don't really know anybody else who will.

Anyway, my email is in my profile, write to me if this is something you're up for!

Edit: Here's a list of the stuff I asked it over the weekend:

- Discuss the pros and cons of using Standardized-audio-context instead of just relying on browser defaults. Consider build size and other issues.

- How to get github actions to cache node_modules (not the npm cache built in to the actions/setup-node action).

- How to get the current git hash in a GitHub action?

- Rewrite a react class-based component in functional style

- How to test that certain JSX elements were generated without directly-comparing React elements for my ANSI color to HTML parser?

- Does it make more sense to keep a copy of the original text in an editor, or just hold on to something like a crc32 to mark a document dirty?

- Can you set focus to a window you create with window.open? (You sure can!)

- Rewrite the rollup.config.js for a library of mine to produce a separate rollup config per audio worklet bundle

- Turn this tutorial for backing up a Mastodon instance into a script

- Refactor a standalone class to split it in half so each class manages precisely one thing.

- I have some code I'm writing for turn-by-turn directions. I have the data structures already, let's write code to narrate them.

- What's with this weird type error around custom CSS properties with React?


If only there was a website that allowed people to ask an absurd amount of questions, for free, and get people's insight within a semi-reasonable amount of time.


That's a pretty big exception to the general case though if your argument is "you are a famous person and people would love to collab" - at 2am your time, exactly when you are motivated? And what if you are not a world famous programmer, what then?

To me it's like saying you shouldn't play solitaire because you are a world class poker player, and there's plenty of people who want to game with you. They are orthogonal concepts - just communicating with people can be more work than just reading on your own.


You're right, but not always. Some people work differently than others and would rather headbutt against a wall on their own for hours than work with other people.

In the long term, it is beneficial to have experts as your collaborators. From my experience though, true collaboration is unlocked once you have established a personal relationship with someone, which takes time and repeated effort. Until then, the collaboration is no better than searching the internet or asking chatGPT.

Establishing relationships with people is hard and takes a lot of work, and frequently doesn't work out like you hope. ChatGPT is a close enough approximation for smaller tasks like the OP describes.


It's hard, and many of us (myself included) are not great at it. That makes it seem easier to reach for the simulacrum. But at what cost?

Homo sapiens's superpower is social cooperation. My concern is that these systems will abet the existing social forces which seem to be causing unprecedented levels of isolation of adults, which will continue to drive smart people away from collaboration and towards solitude, at a level far beyond what simple preferences would suggest.

We already have enough trouble hearing each other through the noise, and understanding what each other has to say. I don't have the answers but I'm looking for them and I do hope other humans will, too.


You said it yourself: they are "social forces". In other words, problems that people created, not technology problems.

I think there is an incorrect worldview that tries to blame human problems on technology.

It's quite true that isolation is an increasing problem. But the idea that instead of using an AI that can spit out a comprehensive answer in seconds, we should all pretend that such tools don't exist, and start constantly asking for help with every idea or request instead, while waiting 3-100 times longer for a less thorough response, is ludicrous.

It's a great idea to collaborate more and try to avoid isolation. But those are societal problems. They are not caused by the latest tools.

Also, as for humanity's "superpower" being collaboration, this is quite a shortsighted comment. I believe that well before AI achieves "super" level IQ, it will vastly outperform humans due to other advantages. One of those advantages is speed. Another is the ability to communicate and collaborate much, much more rapidly and effectively than humans.

One type of digital life that may take over control of the planet (possibly within decades rather than centuries) would be a type of swarm intelligence with the ability to actually "rsync" mental models to directly transfer knowledge.


I think people tend to cooperate only when there are tangible benefits such as:

- survival

- making money

- sex

- enjoyment via social interactions (like parties, hangouts, etc)

It just so happens that for the majority of our civilization, to get those things, we've had to cooperate, but as we develop technology, our ability to get those things increases and our reliance on others decreases (though in a weird way it increases, since technology is complexity, so society becomes larger, more complex, and more interdependent).

We are still in an unprecedented technological boom of computing so we are adjusting on the fly to it. Like the OP says, AI can greatly accelerate independent learning, but eventually that learning plateaus. Once it does, we have to go back to collaboration, but until we find that limit, I think it's human nature to push on.


No - predatory behavior in groups is completely missing from this list. To extend that idea, IMHO much of "business", and some government, falls into this category easily. This is true for all large language groups worldwide, irrespective of political details.


Those fall under making money and survival (and honestly sex and enjoyment as well).


The argument here seems to be that there is a free supply of software developers available for the taking. Software developers are quite well paid, which suggests this is not true.


It's more about the scale at which you do your thing. For a quick and dirty task, I'm going to write a Torch model within the hour - where would I find a collaborator who is willing to start immediately within that time?


Local inference of LLMs is essentially free. There doesn’t exist a sufficiently deep pool of experts of all knowledge spaces who are also willing to work for free, who can be trivially identified, contacted, and scheduled.


I'm not going to ask a human - even a coworker - for every intellisense suggestion. Neither of us would get value from that exchange.


At least the LLM is in the same time zone...


If the pytorch piece is for something low priority, it's probably not worth reaching out to someone else.


For small tasks this is fine, but for larger projects you have to be careful. The key to being productive with an LLM for coding is to be able to understand the code being generated, to avoid rather nasty bugs that may crop up if the model hallucinates. The worst part is that an LLM will create these bugs in very subtle ways, since it excels at writing convincing code (whether or not it actually works).


The funny thing about LLMs is that there's no rush in adopting them. They're semi-useful, but not hugely so right now, and you're not gonna be 'left behind' if you don't make use of them. Everyone involved is working their hardest to make them more capable, so when that day comes, you'll just use them to prompt for what you want. But there's no rush to try to squeeze anything out of the current generation, which mostly lowers productivity rather than increasing it.


My thinking exactly! There's FOMO going on (and being fueled by people most of which seem to hope to somehow make money off it), but the barriers to entry for using LLMs are just not that high. When the tools are good enough, I'll happily use them. Today, for the work I do, I found that not to be the case yet. I wouldn't advocate for not trying them, but I see no reason to force yourself to use them.


Yea. TBH the tooling is the bigger issue for me, personally. I come across a ton of times where an LLM might work, or could - potentially. Usually refactors. But it requires access to basically all my files, both for context and to find where references to that class/struct/etc are.

Furthermore, i'd vastly prefer a workflow most of the time where i don't even have to ask. Or so i imagine. Ie i think i'd prefer a Clippy style "Want me to write a test for this? Want me to write some documentation for this?" etc helpers. I don't want to have to ask, i want it to intuit my needs - just like any programmer could if pair programming with you.

And most of all i want it to have access to all files. To know everything about the code possible. Don't just look at the func name in isolation, attempt to understand how it's used in the project.

If i have to baby sit an LLM for a simple function refactor to give it all files where the function is used or w/e, i'd rather do it myself with tools like AST Grep or even my LSP in many cases.

I'm very interested in LLMs for simple tasks today, but the tooling feels like my primary blocker. Also possibly context length, but i think there's lots of ways around that.


Just to throw my 2c here, since I also want the models to access the whole codebase of (at least) the current project.

I had a great impression of Sourcegraph's Cody (https://sourcegraph.com/cody) a few months ago. At least with the enterprise version of Sourcegraph that had indexed most of the org's private repos.

The web UI (the VS Code extension was somehow worse, not sure why) was providing damn good responses, be it code generation or Q&A/explanation about code spanning multiple repos (e.g. Terraform modules living in different repos).

Afaiu it was using the Sourcegraph index under the hood. But I never really dived deep into Cody's design internals (not even sure if they are actually public).

That being said, I’ve departed from the org months ago and haven’t used cody since then, so take this with a grain of salt, since the whole comment could be outdated a lot.


I think this is an extremely unfavorable interpretation of the article. I wonder if we even read the same thing?

He sees a new tool that others have found interesting, and he identifies ways to use that tool that are useful for him, while also acknowledging where it's not useful. He backs it up with plenty of examples of where he found it not-useless. This is not a revolutionary insight, especially for a developer. We constantly use a variety of tools, such as programming languages, that have strengths and weaknesses. Why are LLMs so different? It seems foolish to claim they have zero strengths.


You might be surprised at the number of large companies who think that "GenAI" can be used to replace programmers due to non-technical executives having got the impression that a competency of LLMs is writing code ...

Of course they do have uses, but more related to discovery of APIs and documentation than actually writing code (esp. where bugs matter) for the most part.

I also have to wonder how long until open source code (e.g. GPL'd) regurgitated by LLMs and incorporated into corporate code bases becomes an issue. The C suite dudes seem concerned about employees using LLMs that may be publicly exposing their own company's code base, but illogically unworried about the reverse happening - maybe just due to not fully understanding the tech.


So you're suggesting that instead of asking an LLM, we should spend time on Fiverr/Upwork to find someone to do random coding tasks that might not fall under our expertise? Can you do that for less than $20/month?


I agree there are difficult coordination problems that our society has failed to grapple with, let alone solve.


I don't understand which part was devastating.


A respectable person offering nuance can be shattering if you identify as pro- or anti something.

I wish we had an easier time talking about ideas with a little more detachment.


In a frictionless market it would make sense to do that. But as Coase pointed out nearly a century ago[1] forming and monitoring the relationships that allow specialization involves a certain amount of overhead. At a certain point going through the hiring or vetting process to utilize another person's skills makes sense, but it looks like the author is very far from that point.

[1]https://en.wikipedia.org/wiki/The_Nature_of_the_Firm


Bad metaphor, and worse that you use it to inform your conclusion. If you must have one, then use training wheels: experts don't need training wheels, beginners do, simple as that. Though the utility of LLMs goes much further, as the author points out; it can often make the boring or tedious parts easier, so you can add assistive pedaling to the metaphor. To carry it to the end, once you have four wheels and a motor, it's not long before someone invents the car.


Like… hiring people? I think it's a little ridiculous to say "no don't use that screwdriver, go hire a workman instead." To say the least, there are some sizeable economic differences between those two options.


> He can lucidly see that this supposed tool is completely useless to him within his area of expertise.

Did you read the article? Throughout the entire post he clearly says LLMs have a lot of value in his workflow.


Parent seems to think that Antirez has been begrudgingly pulled along by the tide of LLM hype, not as if Antirez is an accomplished developer whose judgement on tools can be trusted.


This seems like a very impractical perspective to me. If there really was a "hardware store" that we could head off to to get what we need, it might be different. But in general, that's not the case.

There can also be significant overhead in looking elsewhere for a solution. That's a big part of why so many developers reinvent things. This is often dismissed as NIH syndrome, but there's more to it than that.

You raised "introductory macroeconomics". The economic effect that will most strongly apply in the case of LLMs is that of the technology treadmill (Cochrane 1958): when there's a tool that can improve productivity, it will be used competitively so that those who don't use it effectively, to improve their productivity, will be outcompeted.

This seems like an unavoidable result in typical capitalist economies.

Your point about leveraging hungry humans would require strong incentives to overcome the treadmill effect. Most Western countries don't have many ways to implement anything like that. The closest thing might be unions, but of course most software development is not unionized.


There is an impedance problem when working on a new project.

At the beginning, when there's 0% of the task done, and you need to start _somewhere_, with a hello world or a CMakeLists file or a Python script or whatever, it takes effort. Before ChatGPT/LLM, I had to pull that effort out from within myself, with my fingertips. Now, I can farm it out to ChatGPT.

It's less efficient, not as powerful as if I truly "sat down and did it myself," but it removes the cost of "deciding to sit down and do it myself." And even then, I'm cribbing and mashing together copy-pasted fragments from GitHub code search, Stackoverflow, random blog posts, reading docs, Discord, etc. After several attempts and retries, I have a "5% beginning" of a project when it finally takes form and I can truly work on it.

I sort of transition from copy-pasting ChatGPT crap to quickly create a bunch of shallow, bullshit proofs-of-concept, eventually gathering enough momentum to dive into it myself.

So, yes, it's slower, and more inefficient, and ChatGPT can't do it better than I can. But it's easier and I don't have to dig as deep. The end result is I have much more endurance in the actual important parts of the project (the middle and end), versus burning myself out on the beginning.


Was I digging too deeply before?

Was I asking the right questions from the beginning, and if not, can I effectively salvage my work?

Sunk costs disappear into a $20 subscription


> I have a problem, I need to quickly know something that I can verify if the LLM is feeding me nonsense. Well, in such cases, I use the LLM to speed up my need for knowledge.

This is the key insight from using LLMs in my opinion. One thing that makes programming especially well suited for LLMs is that it's often trivial to verify the correctness.

I've been toying around this concept for evaluating whether a LLM is the right tool for the job. Graph out "how important is it that the output is correct" vs "how easy is it to verify the output is correct". Using ChatGPT to make a list of songs featuring female artists who have won an Emmy is time consuming to verify it's correct, but it's also not very important and it's okay if it contains some errors.


> One thing that makes programming especially well suited for LLMs is that it's often trivial to verify the correctness.

is this why software never has any bugs?


Yeah, exactly.

Problems where coming up with a solution is hard but verifying a possible solution is easy.

And we all know what that class of problems is called.


If something is time consuming, not very important, and accuracy doesn’t matter, perhaps the correct answer is to not do it. The world is already full of irrelevant inaccurate drivel and we’d do well to have less, not accelerate its production.

This is not a comment on your specific example, but on the idea as a whole.


I use ChatGPT as my thinking partner when writing code. I chat with it all day every day to finish work.

My company has approved Copilot, but Copilot autocomplete has been an awful experience. The company hasn't approved Copilot Chat (which is what I need).

But I would love something similar that can run on my laptop, for my code, to generate unit tests, code comments, etc. (of course with my input and guidance).


> My company has approved copilot but Copilot autocomplete has been an awful experience

I had the same experience, I feel like I must be crazy because so many of my colleagues have been singing its praises. I found it immensely distracting and disabled it again after a couple of days.

It was like having someone trying to finish my sentence while I was still speaking; even when they were right, it was still annoying and knocked me out of my flow (and very often, it wasn’t right).


I actually find copilot quite useful but I use it from emacs and it only provides suggestions when I intentionally hit my defined shortcut for it, so it never gets in the way. It may be worth for you to try setting it up in a similar way in the tool you’re using it from, as I agree I’d find the experience awful if it was always trying to autocomplete my sentences.


If you use VS Code or a JetBrains IDE, Continue works well with Ollama and it’s really easy to get going.

[0] https://continue.dev/

[1] https://ollama.ai/
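
If you'd rather script against the local model directly instead of going through an editor plugin, Ollama also serves a small HTTP API on localhost (port 11434 by default). A minimal sketch, assuming the server is running and `ollama pull deepseek-coder` has been done - check the Ollama docs for the current API details:

  import json
  import urllib.request

  payload = {
      "model": "deepseek-coder",
      "prompt": "Write a Python function that parses an ISO 8601 date string.",
      "stream": False,  # ask for a single JSON reply instead of a token stream
  }
  req = urllib.request.Request(
      "http://localhost:11434/api/generate",
      data=json.dumps(payload).encode("utf-8"),
      headers={"Content-Type": "application/json"},
  )
  with urllib.request.urlopen(req) as resp:
      print(json.loads(resp.read())["response"])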


Any opinion on what the best experience is, currently?


What do you mean by experience?

If you mean the model, I’ve been happy with Deepseek Coder. Mistral is a popular alternative as well.


By experience, i mean the whole UX - which is as much the interface to the LLM as it is the LLM itself, imo. Ie i'm not terribly interested in constantly telling an LLM what i want, prodding it to do the right thing, etc. I'm more interested in ways it can be easy and correct - even if that's limited in functionality. Ie perhaps automatically suggesting docs, knowing what style to use, etc.


Right. Continue is a bit more hands on, conversational. I've used Llama Coder for code completions and generation right on the editor to some success.

A sibling comment suggested Wingman but I haven't tried it.


yes i'd second deepseek-coder and check out wingman. it's a relatively new extension, snappy and with good ux.



yes, it's the nvms one.


Fwiw, there are now some local models that rival 3.5-turbo in code chat, like the Codeninja I tried out the other day. Not nearly as good as 4 which iirc runs the copilot backend, but for sensitive data that can't leave the premises it's the only real option. Or getting a dedicated instance from OpenAI I guess.


Perhaps the most important point in the piece, and one that can't be repeated enough or understood enough as we head into what 2024 has in store:

> And then, do LLMs have some reasoning abilities, or is it all a bluff? Perhaps at times, they seem to reason only because, as semioticians would say, the "signifier" gives the impression of a meaning that actually does not exist. Those who have worked enough with LLMs, while accepting their limits, know for sure that it cannot be so: their ability to blend what they have seen before goes well beyond randomly regurgitating words. As much as their training was mostly carried out during pre-training, in predicting the next token, this goal forces the model to create some form of abstract model. This model is weak, patchy, and imperfect, but it must exist if we observe what we observe. If our mathematical certainties are doubtful and the greatest experts are often on opposing positions, believing what one sees with their own eyes seems a wise approach.


> Instead, many have deeply underestimated LLMs, saying that after all they were nothing more than somewhat advanced Markov chains, capable, at most, of regurgitating extremely limited variations of what they had seen in the training set. Then this notion of the parrot, in the face of evidence, was almost universally retracted.

I'd like to see this evidence, and by that I don't mean someone just writing a blog post or tweeting "hey I asked an LLM to do this, and wow". Is there a numerical measurement, like training loss or perplexity, that quantifies "outside the training set"? Otherwise, I find it difficult to take statements like the above seriously.

LLMs can do some interesting things with text, no doubt. But these models are trained on terabytes of data. Can you really guarantee "there is no part of my query that is in the training set, not even reworded"? Perhaps we can grep through the training set every time one of these claims is made.


Exactly. I think that it’s very hard for us to comprehend just how much is out there on the internet.

The perfect example of that is the tikz unicorn in the Sparks paper. Seemed like a unique task, until someone found a tikz unicorn on an obscure website.

There is plenty of evidence that LLMs struggle as you move out of distribution. Which makes perfect sense as long as you stop trying to attribute what they’re doing to magic.

This doesn’t mean they’re not useful, of course. But it means that we should should be skeptical about wild capability claims until we have better evidence than a tweet, as you put it.


They didn't actually find a unicorn; they found other tikz animals. It still generalized to the unicorn.

This was the package: https://ctan.org/pkg/tikzlings?lang=en


>Can you really guarantee "there is no part of my query that is in the training set, not even reworded"?

I mean..yes?

Multi digit arithmetic, translation, summarization. There are many tasks where this is trivial.


The most useful feature of LLMs is how much output you get from so little signal. Just yesterday I created a fairly advanced script with ChatGPT from my phone on the bus ride home, which was an absolute pleasure. I think multi-prompt conversations don't get nearly as much attention as they should in LLM evaluations.


I suppose multi-prompt conversations are just a variation on few-shot prompting. I do agree, though, that they don't play a big enough role in evals, or in the heads of many people. So many capable engineers I know nope out of GPT because the first answer isn't satisfactory, instead of continuing the dialog.


> These are all things I do not want to do, especially now, with Google having become a sea of spam in which to hunt for a few useful things.

Seriously, just don't use Google for search. Google search is just a way to get you to look at their ads.

Use a search engine that is aligned with your best interests, suppresses spammy sites, and lets you customise what you want it to surface.

I've used chatgpt as a coding assistant, with varying results. But my experience is that better search is orders of magnitude more useful.


> Use a search engine that suppresses spammy sites and lets you customise what you want it to surface.

Can you give an example of such a search engine? Which one(s) do you use and why?


I see a lot of recommendations for kagi, but no mention of brave search - specifically the (beta) feature called “goggles”. Afaiu it’s a blend of kagi’s “lenses” and the site ranking in search results.

https://search.brave.com/help/goggles

There is a list (search) of public goggles: https://search.brave.com/goggles

The goggles themselves are just text files with basic syntax and can be hosted on e.g. a GitHub gist (though you have to publish them to Brave).

https://github.com/brave/goggles-quickstart/blob/main/goggle...

Tbh, I can’t really compare brave search to kagi, since I never used kagi (though I’m using Orion - a webkit based browser from the same dev - and love it). Afaik, brave search is using its own index, thus making the results somewhat limited and inferior to Kagi's. Just wanted to throw out a (free) alternative here that works for me. :)

* Note that Brave Search, despite being privacy oriented, is still ad funded, and there were a few controversies about Brave's (browser) privacy in the past (if that's relevant for you).

* I’m not affiliated with Brave in any way.


Kagi. You have to pay - but it prioritises based on content, not ads, and it lets you pin / emphasise / deemphasise / block sites according to your needs.

(No connection with kagi.com except being a very satisfied user)


Pay for Kagi. It's a tool.

Pay for it, so search results are the product, instead of an ad platform sold to advertisers with you as the product.


Kagi does that, switched to it full time after years of giving alternatives like DDG a go and failing. Can recommend!


Can you give an example where kagi is better than google?

I've tried a couple of searches on the free tier and they gave pretty much the same results. I only have so many free searches to check too.


It allows me to remove websites from the results. That’s already one of the main selling points for me.


You can do this within google as well, by the way.


Sure, but can you please give me an example since I only have so many searches and I've switched over to chatgpt for most of my former googling tasks.


Google has been inundated with SEO spam, and sometimes I want current things so LLMs don't work that well. One example is I was buying a ... wait, actually I was putting together some examples for you to compare Kagi (I am an unlimited subscriber) to Google directly, and none of them work now. My Google results for things like "best running shoes 2024" or things like that returned basically the same results as Kagi, pushing sites like Reddit and Wirecutter and REI blog and other known-good blogs to the top. Tried this in Private Browsing as well.

This is definitely a departure because when I subscribed to Kagi a couple months ago, all of my Google results for similar searches were SEO spam blogs filled with Amazon affiliate links that look like they had just sucked some Amazon reviews automatically into some poor facade to generate affiliate revenue.

These results were a surprise to me. Not sure what changed.


Yes, that's what I was getting too.

I imagine what changed is that Kagi started getting traction on sites like here and some managers at google actually did something about it.

My own test "voynich illuminated manuscript" which used to give nothing but pintrest spam on google. Now there is just one result from pintrest in google and pretty much every result in Kagi is from pintrest.

There is an academic tab which seems interesting. I will give it a try later.


Above, I suggested pay for Kagi. A search engine is more than just serps:

https://blog.kagi.com/kagi-features

If you prefer LLMs to Googling, then at least consider "phind":

https://www.phind.com/search?home=true


I use phind.com, but perplexity.ai also works well


I like to use ChatGPT for the easy stuff: things I forgot, do too rarely to remember, or code in another language very similar to the one I already know.

I do quickly run into bumps, where search is necessary (a lot of times it's some variant of a breaking change in a dependent library). Once I find a good enough issue description, I just slap that back into chatgpt. It handles it very well and sticks for the rest of the conversation. Somehow chatgpt is aware that the context information takes precedence over trained data.

I also have the Kagi subscription, which I'm using for the above. I'm very happy with both tools working in tandem, and genuinely happy spending my time that way.


antirez, thank you for talking some sense. I’ve seen skilled devs discard LLMs entirely based on seeing one (too many) hallucinations, then proclaiming they are inferior and throwing the baby out with the bathwater. There is still plenty of use to be had from them even if they are imperfect.


What an ending...

> I have never loved learning the details of an obscure communication protocol or the convoluted methods of a library written by someone who wants to show how good they are. It seems like "junk knowledge" to me. LLMs save me from all this more and more every day.

This is depressing or tongue-in-cheek considering who he is -- the Redis creator -- and that he has an older post titled 'In defense of linked lists'; talking about linked lists in Rust is not "junk knowledge", nor something an LLM can run circles around any human on.

It's the best coding nihilism as a profession post I have read though.


There is a misunderstanding going on here. A linked list is a pure form of knowledge. What we see today is an explosion of arbitrary complexity that is the fruit, mostly, of bad design. If I learn the internals of React, I'm not really understanding anything fundamental. If I get to know the subtleties of Rust semantics and then Rust goes away, I'm left with nothing: it's not like learning Lisp. Think of all the folks that used to master M4 macros in Sendmail, 30 years ago. I was saying the same, back then: this is garbage knowledge.

Today we have a great example in Kubernetes, and all the other synthetic complexity out there. I'm in, instead, to learn important ML concepts, new data structures, new abstractions. Not the result of some poor design activity. LLMs allow you to offload this memorization out of your mind, to make space for distilled ideas.


Spot on - it is one of the main reasons I haven't enjoyed programming in recent years, so much of it is learning what you call "garbage knowledge". Yet another API, yet another DSL, yet another standard library. Endless reading of internal wiki pages to learn the byzantine deployment system of my current company. Even worse, when I know exactly what I want, but some little dependency or piece of tooling is bad and I spend hours, or days, trying to debug it.

I, too, find LLMs a balm for this pain. They have kind-of-basic level of knowledge, but about everything.

In short, it allows for a more efficient expenditure of mental and emotional energy!


To rephrase it a little bit.

Much of programming, coding and developing is done by a person who is a knowledge worker and writes code. A good proportion of the code to be written will be written just once and never again. The one-off code snippet will stay in a file collecting dust forever. There is no point in trying to remember it in the first place, because without the constant repetition of using it, it will be forgotten.

LLMs can help us focus our knowledge where it really matters, and discard a lot of the ephemeral stuff. That means we can be more knowledge workers and less coders. I will push it even further and state that we will become more knowledge workers and less coders until we are, eventually and gradually, just knowledge workers. We will need to know about algorithms, algorithmic complexity, abstractions and stuff like that.

We will need to know subjects like that Rust book [1] writes about.

[1]https://github.com/QMHTMY/RustBook/tree/main/books


> this erudite fool is at our disposal and answers all the questions asked of them,

Yes, but I have to double-check every answer. And that, for me, greatly mitigates or entirely negates their utility. Of what value is a pocket calculator that only gets the right answer 75% of the time, and you don't know ex ante which 75%?


- I can read the code and reading code is faster than writing it.

- I can also tell the llm to write tests for the code it wrote and i can validate that the tests are valid.

- LLMs are also valuable in introducing me to concepts and techniques I would never have had exposure to. For example, if I have a problem and explain it, it will bring up technologies or terms I never considered because I just didn't know about them. I can then do research into those technologies to decide if they are actually the right approach.


> I can also tell the llm to write tests for the code it wrote and i can validate that the tests are valid.

If I don't trust the generated code, why should I trust the generated code that tests the generated code?


Do you trust your ability to read?


As long as P != NP, verification should be much easier than producing a solution.

Or, from a different angle - all models are wrong, some are useful.

As it happens, LLMs are useful even if they're sometimes wrong.


> As long as P != NP, verification should be much easier than producing a solution.

Perhaps so. I guess it depends on how long it takes to code up property-based tests.

https://hypothesis.readthedocs.io/en/latest/
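
For a simple function, not very long. A minimal sketch with Hypothesis, assuming the hypothesis and pytest packages are installed; sort_numbers stands in for whatever the LLM generated, and the names are purely illustrative:

  from collections import Counter
  from hypothesis import given, strategies as st

  def sort_numbers(xs):
      # Stand-in for an LLM-generated implementation under test.
      return sorted(xs)

  @given(st.lists(st.integers()))
  def test_sort_numbers_properties(xs):
      result = sort_numbers(xs)
      # Property 1: the output is in non-decreasing order.
      assert all(a <= b for a, b in zip(result, result[1:]))
      # Property 2: the output is a permutation of the input.
      assert Counter(result) == Counter(xs)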


Programming is special because 99% of the time you can tell immediately if something works or not, so the risk of misinformation is very narrow.


Perhaps you've omitted some important context here, or you're using an extremely restricted definition of "works"? The interesting and hard question with software is not "did it compile" but rather "did it meet the never-clearly-articulated needs of the user"...

I would agree that it is a primary goal of software engineering to move as much as possible into the category of automatic verification, but we're a long, long way from 99%.


I agree with your point here.

I think that antirez is technically correct in that there is a vast amount of code that will not compile compared to the amount of code that will compile. So saying '99%' sort of makes sense.

But that doesn't capture the fact that of the code that compiles there is a vast amount of code that doesn't do what we want to happen at runtime compared to the code that does do what we want to happen.

And after that there is a vast amount of code that doesn't do what we want to happen 100% of the time at runtime compared to the code that only most of the times does what we want to happen at runtime.

The interesting thought experiment that came to me when thinking about this was that I would be more likely to trust LLM code in C# or Rust than I would be to trust LLM code in assembly or Ruby.

Which makes me wonder ... can LLMs write working Idris or ATS code?


I've seen people put untested AI hallucinations under review, with non-existent function names, passing CI just because they were under debug defines.

I've seen some refer to non-existent APIs while discussing migration to a new library major version. "Sure that's easy, we should just replace this function with this new one".

Imagine all those more subtle bugs that are harder to spot.


Hi! I'm a big fan of Redis and also the little Kilo editor you wrote.

But I have to disagree on this point, since many programs written in, e.g., C have security issues that take a long time to discover.


Doesn't seem very secure if it's entirely dependent on humans checking it manually. Humans are famously fallible.


I am glad you agree with my point that 99% of the time you can not immediately tell if code works or not.


I have found only a few cases where ChatGPT has been very useful to me, e.g. writing long SQL queries and certain mathematical functions like finding the area of intersection of two rectangles. And it hallucinates enough that a lot of the time I can't use it, because I know it would take more time to check it for correctness and edge cases than it would to just write it in the first place. Maybe I am using it wrong, but so far the results for me have been extremely impressive, yet not very useful.


I'm surprised it can do long SQL queries; I wouldn't have thought it could. I've been looking into PRQL or other solutions to cover that ground. Can it do reasonably complex things like window functions?
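
(By window functions I mean things like the query below - a toy sketch using Python's bundled sqlite3 module, which supports them as of SQLite 3.25; the table and data are made up purely for illustration:)

  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
  con.executemany("INSERT INTO orders VALUES (?, ?)",
                  [("alice", 10), ("alice", 30), ("bob", 20), ("bob", 5)])

  # Rank each customer's orders by amount, largest first.
  query = """
      SELECT customer, amount,
             RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
      FROM orders
  """
  for row in con.execute(query):
      print(row)  # e.g. ('alice', 30.0, 1)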


Never tried it with something like that. I just mean long as in a lot of text like selecting a lot of stuff from a lot of tables and grouping, etc.


> High levels of reasoning are not required. LLMs are quite good at doing this, although they remain strongly limited by the maximum size of their context. This should really make programmers think. Is it worth writing programs of this kind? Sure, you get paid, and quite handsomely, but if an LLM can do part of it, maybe it's not the best place to be in five or ten years

I appreciate the author writing this article. Whenever I read about the future of the field, I get anxious and confused, but then again the other options that were available to me interested me less.

I am now at a place where I still have the opportunity to pivot and focus on pure/applied mathematics rather than staying in the software field.

Honestly, I wanted to make money through this career, but I don't know what career to choose now.

I keep working on myself and don't compare myself to others, but if the argument is that only the top 1% of programmers will be needed in the future, then I doubt myself, because I still have a lot to learn, and then there is competing with people who are both experienced and knowledgeable.

I was thinking about pinpointing a target and then becoming an expert at it (by the 10,000-hour rule).

I'm sorry to ask, but today, and in general, I am very confused about which path/career to target related to computing and mathematics. Please give me your valuable advice. Thank you.


I’m not sure you’re focused on the important question. For example: Who you marry, if you choose to go that route, may very well be the most important decision you’ll ever make.

Putting that aside, based on your question and willingness to put it out there… I would say this: just surrender to what charms you right now. Do you feel drawn toward programming? Follow that. Or math? Follow that. They may not be mutually exclusive.

As you go, stay tuned in to how you feel about the activity in the moment. Not your anxiety about what you think about the future prospects, but just how it feels right now to be doing the thing. That feeling may change over time, and it will guide you if you stay tuned in.


I'd say study deep learning (nice mix of maths and CS) or do software but learn to use the AI tools in the process.

If you're looking at a 5-10 year timeline then even pure or applied mathematics may well heavily use AI models.

We're always going to need architects that build the scaffolding together with LLMs. Programmers + LLMs will be able to outcompete programmers without. If one programmer can do more it just means projects will become more ambitious not less programming needed.

I've never worked for a company that had too little work for their software engineers. Rather many projects are on long timelines because there are only so many hours available per month.

Another analogy: with a high level programming language you can do what previously needed 10x the lines of code in assembly. I don't think they caused job losses for software engineers.


I wouldn’t worry too much about the demand for programmers. Jevons' paradox has played out enough times that I’m sure as the cost of code goes to zero the demand will continue to increase. Look forward to the day that your toilet paper comes with an API.


But I don't want to write code to display ads on toilet paper :(


Nobody’s going to make you :)

Write code for the wise cracking door AI instead.


One of the areas that has sped up the most for me while using ChatGPT to code is having it write test cases. Paste it a class and it can write a pretty good set of specs if you iterate with it. Literally 10x faster than doing it myself. This speed-up can also occur with languages/frameworks I'm not familiar with.
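
To give a feel for it (an invented toy example, not actual ChatGPT output): paste in a small class, ask for pytest cases, and you get back something along these lines to review and iterate on:

  import pytest

  # The kind of small class you might paste in:
  class Stack:
      def __init__(self):
          self._items = []

      def push(self, item):
          self._items.append(item)

      def pop(self):
          if not self._items:
              raise IndexError("pop from empty stack")
          return self._items.pop()

  # The kind of specs the model drafts back:
  def test_push_then_pop_returns_last_item():
      s = Stack()
      s.push(1)
      s.push(2)
      assert s.pop() == 2

  def test_pop_on_empty_stack_raises():
      with pytest.raises(IndexError):
          Stack().pop()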


> This should really make programmers think. Is it worth writing programs of this kind? Sure, you get paid, and quite handsomely, but if an LLM can do part of it, maybe it's not the best place to be in five or ten years.

Someone, a person with a sense of responsibility, has to sign off on changes to the code. LLMs have been shown to come up with answers that make no sense or contain bugs. A person (for now) needs to decide whether the LLM's suggestion is acceptable, whether we need more tests, and whether we want to maintain it.

I think programmers will be needed for that, they will just be made more productive (as happened with the introduction of garbage collection, strongly typed languages, powerful IDEs, StackExchange, ...).


The deep coder example doesn't appear to actually be doing what the comments or the article say it does.

It appears no better than the mixtral example that it's supposedly an improvement on.


This is my cut & paste failure (I didn't re-check the GPT-4 output that fixed the grammar). Fixing...


Do you know of an article that covers LLMs from the point of view of a tutor/study partner/reading group?

Yours is the first blog which matches my experiences with the code side of things, but I've found them even more useful in the learning side of things.


I wrote an article subtitled "llms flatten steep learning curves": https://earthly.dev/blog/future-is-rusty/


> At the same time, however, my experience over the past few months suggests that for system programming, LLMs almost never provide acceptable solutions if you are already an experienced programmer.

Hmm, this suggests to me that in a better world, the systems problems would have been solved with code, and the sorts of one-off problems which current LLMs do handle well would have been solved with formulae in a (shell-like? not necessarily turing-complete?) DSL.


> LLMs almost never provide acceptable solutions if you are already an experienced programmer.

or, the other stuff GPT was producing is just as bad, but he's not experienced enough in the domain to see it, whereas the stuff he is experienced with looks immediately sus or subpar.



Yes, but this is working. I literally didn't look at the code... but at the result. In other fields GPT can't deliver working code at all.


I find that I am very unproductive if I am connected to the internet. The only way for me to get any real work done is to turn off the router.

At the same time, GPT apparently doubles programming productivity. (Though obviously this depends on the task.)

I've long wished to have the best of both worlds. It seems I may soon get my wish: local LLMs will probably catch up with GPT-4 this year, or even outpace it!


For me LLMs revealed how easy it is to manipulate the masses with properly done marketing. Despite these tools being obviously unreliable, tens of people on here report how well they work and how much they changed their lives. Shows that with sufficient propaganda you can make people see and feel things which are not there - not a new concept. But what’s new to me is just how easy it is.


I'm ok with a degree of unreliability when experimenting with new ideas.

That's a tradeoff I already make when using relatively new third-party libraries/services to accelerate experimentation.


That is ok, I do it too - that's why I use tools such as chatgpt. But from that to calling it life changing there's a wide gap. The tool is nowhere near what's advertised, far from it.


> At the same time, however, my experience over the past few months suggests that for system programming, LLMs almost never provide acceptable solutions if you are already an experienced programmer.

In one off tasks where someone is not enough of an expert to know its flaws, and such expertise is not required, "the marvel is not that the bear dances well, but that the bear dances at all".


LLMs are going to have to get much cheaper to train to be useful in corporations, where the questions you want to ask are going to depend on proprietary code. You can't ask "What does subsystem FooBar do, and where does it fit in the overall architecture?" You'd want to be able to continuously retrain the model, as the code base evolves.


> Since the advent of ChatGPT, and later by using LLMs that operate locally

Does HN have any favorite local LLMs for coding-related tasks?


Phind-CodeLlama-34B-v2 seems to work well for our team.


deepseek-coder has been decent for my purposes


Currently, what I get out of it is a good quick overview with some hallucinations. You have to actually know what you're doing to check the code. However, this is a fast-moving target and it will in no time be doing that part as well. I think it's worth stepping back and thinking: maybe this thing is just giving us more and more agency, and what can we do with that? We need to adapt and not constrain ourselves to just being programmers. We are humans with agency, and if we can adapt to this, we can be more and more powerful, using the technical insight we've gained over the years to do some really cool things. I have a startup, and with ChatGPT I've managed to do all parts of the stack with confidence, and used it for all sorts of business-related things outside of coding that have really helped move the business forward quickly.


> I regret to say it, but it's true: most of today's programming consists of regurgitating the same things in slightly different forms. High levels of reasoning are not required.

But that's basically what engineering (and medicine, and law, and all sorts of professions out there) has always been about. Engineers build railways and bridges based on the same proven principles in slightly different forms, adapting to the specific needs of each project. Their job is not to come up with groundbreaking inventions every day.


This quote in particular struck me as relevant:

> And now Google is unusable: using LLMs even just as a compressed form of documentation is a good idea.

Beyond all the hype, it's undeniable that LLMs are good at matching your query about a programming problem to an answer without inundating you with ads and blog spam. LLMs are, at the very least, just better at answering your questions than putting your question into Google and searching Stack Overflow.

About two years ago I got so sick of how awful Google was for any serious technical questions that I started building up a collection of reference books again just because it was quickly becoming the only way to get answers about many topics I cared about. I still find these are helpful since even GPT-4 struggles with more nuanced topics, but at least I have a fantastic solution for all those mundane problems that come up.

Thinking about it, it's not surprising that Google completely dropped the ball on AI since their business model has become bad search (i.e. they derive all their profit from adding things you don't want to your search experience). At their most basic, LLMs are just really powerful search engines, it would take some cleverness to make them bad in the way Google benefits from.


How many of us remember that at the beginning of last year the fear was that programming by programmers will get obsolete by 2024 and LLMs will be doing all the job?

How much has changed?


I remember some people were saying things vaguely, but not explicitly, in that direction, but given OpenAI's stance was "we're not trying to make bigger models for now, we're trying to learn more about the ones we've already got and how to make sure they're safe", I dismissed them as fantasists.

What has happened is GPT-4 came out (which is certainly better in some domains but not everywhere), but mainly the models have become much cheaper and slightly easier to run, and people are pairing LLMs with other things rather than using them as a single solution for all possible tasks — which they probably could do in principle if scaled up sufficiently, but there may well not be enough training data and there certainly aren't computers with enough RAM.

And, like with the self-driving cars, we've learned a lot of surprising failure modes.

(As I'm currently job-hunting, I hope what I wrote here is true and not just… is "copium" the appropriate neologism?)


Conclusion in the blog post says it all:

> I regret to say it, but it's true: most of today's programming consists of regurgitating the same things in slightly different forms. High levels of reasoning are not required. LLMs are quite good at doing this, although they remain strongly limited by the maximum size of their context. This should really make programmers think. Is it worth writing programs of this kind? Sure, you get paid, and quite handsomely, but if an LLM can do part of it, maybe it's not the best place to be in five or ten years.


>> I regret to say it, but it's true: most of today's programming consists of regurgitating the same things in slightly different forms.

I wonder how different this would be if software was not hindered by "intellectual property" laws.


I wouldn't call myself an expert, but my gut tells me we're close to a local maximum when it comes to the core capabilities of LLMs. I might be wrong of course. If I'm right, I don't know when or if we'll get out of that. But it seems the work of putting LLMs to good use is gonna continue for the next years regardless. I imagine hybrid systems between traditional deterministic IDE features and LLMs could become way more powerful than what we have today. I think for the foreseeable future, any system that's supposed to be reliable and well understood (most software, I hope) will require people willing and capable of understanding it; that's, in my mind, the core thing programmers are and will continue to be needed for. But anyway: I do expect fewer programmers will be needed if demand remains constant.

As for demand, that's difficult to predict. I'd argue a lot of software being written today doesn't really need to be written. Lots of weird ideas were being tried because the money was there, pursuing ever new hypes, with an entire sub industry building ever more specialised tools fueling all this. And with all that growth, ever more programmers have been thrown at dysfunctional organisations to get a little more work done. My gut tells me that we'll see less of that in the next years, but I feel even less competent to predict where the market will go than where the tech will go.

So long story short, I guess we'll still need programmers until there's a major leap towards GAI, but fewer than today.


The compiler is still not part of the picture, when LLMs start being able to produce binaries straight out of prompts, then programmers will indeed be obsolete.

This is the holy grail of low-code products.


Why is an unauditable result the holy grail? Is the goal to blindly trust the code generated by an LLM, with at best a suite of tests that can only validate the surface of the black box?


Money. Low-code is the holy grail where businesses no longer need IT folks, or at the very least can reduce the number of FTEs they need to care about.

See all the SaaS products, without any access to their implementation, programmable via graphical tooling, or orchestrated via Web API integration tools, e.g. Boomi.


Is it no different to you when the black box is created by an LLM rather than a company with guarantees of service and a legal entity you can go after in case of breach of contract?

Where does the trust in a binary spit out by an LLM come from? The binary is likely unique and therefore your trust can't be based on other users' experience, there likely isn't any financial incentive or risk on the part of the LLM should the binary have bugs or vulnerabilities, and you can't audit it if you wanted to.


As usual, this kind of thing will get sorted out, and developers will have to search for something else.

QA, acceptance testing, whatever - no different from buying closed-source software.

Only those that never observed the replacement of factory workers by completely robot-based assembly chains can think this will never happen to them.

Here is a taste of the future,

https://www.microsoft.com/en-us/power-platform/products/powe...


Assembly line robots are still a bit different from LLMs directly generating binaries though, right?

An assembly line robot is programmed with a very specific repeatable task that can easily be quality tested to ensure that there aren't manufacturing defects. An LLM generating binaries is doing this one off, meaning it isn't repeatable, and the logic of the binary isn't human auditable meaning we have to trust that it does what was asked of it and nothing more.


The same line of argument came from assembly language developers against FORTRAN compilers and the machine code they could generate.

There are ACM papers about it.

It didn't hold up.

Do you really inspect the machine code generated by your AOT or JIT compilers, in every single execution of the compiler?

Do you manually inspect every single binary installed into the computer?


There's a fundamental difference between a compiler and a generative LLM algorithm, though. One is predictable, repeatable, and testable. The other will answer the same question slightly differently every time it's asked.

Would you trust a compiler's byte code if it spit out slightly different instructions every time you gave it the same input? Would you feel confident in the reliability and performance of the output? How can you meaningfully debug or performance profile your program when you don't know what the LLM did and can't reproduce the issue locally short of running the exact copy of the deployed binary?

Comparing compilers and LLMs really is apples and oranges. That doesn't mean LLMs aren't sometimes helpful or that they should never be used in any situation, but LLMs fundamentally are a bad fit for the requirements of a compiler.


So who is instructing the LLMs on what sort of binaries to produce? Who is testing the binaries? Who is deploying them? Who is instructing the LLMs to perform maintenance and upgrades? You think the managers are up for all that? Or the customers who don’t know what they want?


Just like offshoring nowadays, you take the developers out of the loop, and keep PO, architects and QA.

Instead of warm bodies somewhere on the other side of the planet, it is a LLM.


Nobody of any interest said this. This is something you are saying now using a thin rhetorical strategy meant to make you look correct over an opponent that doesn't exist.


That's like the people saying, "they said the ice caps would melt, ha, hasn't happened, all fake". Meanwhile, nobody said that.


Can't say I saw anyone thinking programmers would be obsolete by 2024...


I remember 10 years ago when the fear was that cheaper programmers in developing countries (India mostly) would be doing all the programming.

It's just a scam to keep you scared and stop you from empathizing with your fellow workers.


Outsourcing was more of a threat than AI. And a lot of jobs really did move. It is still a real thing, not that many programming jobs moved back to the states.


That's legit. I've managed to dodge it but many jobs have moved overseas. Many of my coworkers the past years have been contractors living in other countries.

This is what happened to America's manufacturing industry. Shouldn't empathizing with fellow workers mean recognizing the pattern instead of dismissing it as FUD?


I don't think it's a scam or a conspiracy. It's human nature to worry, and when given a reasonable-sounding but scary idea, we tend to spread it to others.


> It's just a scam to keep you scared and stop you from empathizing with your fellow workers.

I am quite unconvinced this is the reason. Seems rather conspiratorial.


So far I've seen bad programmers create more (and possibly worse) bad code and good ones use LLM to their advantage.


I really like the argument about misinformation vs testing. I'm not totally sold on "you can just see it", but I do think something like TDD could suddenly be really productive in this world.

I've found autocomplete via these systems to be improving rapidly. For some work, it's already a big boost, and it's close to a difference in kind from the original IntelliSense. Amusingly though, I primarily write in an editor without any autocomplete, so I don't experience this often. But I do, precisely for the throwaway code and lower-value changes.

Finally, it's not clear to me that the distinction is between systems programming and scripting. My sense is that ChatGPT and similar models are (a) heavily influenced by the large corpus of Python, so they're better at it than at C, and (b) the examples here involved more clever bit manipulation than most software engineers ever interact with.


> the examples here involved more clever bit manipulation than most software engineers ever interact with

Perlis once quoth:

> 18. A program without a loop and a structured variable isn't worth writing.

After 5 minutes of thought, I'd update that, for my hacking, to:

"A program without some convergence reasoning and a non-injective change of representation isn't worth writing."

(iow, I'd be happy to let LLMs, or at least other people, wrangle glue and parsley code, according to the taxonomy of: https://news.ycombinator.com/item?id=32498382 )


I like the term parsley code!

I do suspect though that both the hashing and 6-bit weight examples are just extremely rare in the corpus. It wasn't confused about loops, or hashing generally, but just didn't do as well as antirez would have liked. The description of the 6-bit to "why don't I just cast this to 8-bits" thing is definitely a problem. And worse, it's a problem a more junior engineer might not understand. But I suspect that a model trained on a corpus with lots more bit manipulation would have been fine, as it wasn't complex.

Clearly we just need a fine tuned one :).


Besides using ChatGPT for certain pieces of code that use a third-party library, I successfully used it as a "code reviewer". I recently copied the functions of a Symfony PHP controller and asked for a code review and suggestions for refactoring, with code and reasons. Surprisingly, it worked very well and I was able to refactor a good amount of code.


I feel that I'm being too conservative with how I use AI. Currently I use Copilot Autocomplete with a bit of Copilot Chat, which is great and almost always gets small snippets correct, but I sometimes worry that I'm not using it to its full potential -- so that I could be faster with my side projects -- for example, by generating entire classes.


In general Copilot is much weaker than bigger/slower models, so if you have the feeling you are not using AI enough, the first thing to try IMHO is to chat with powerful models like GPT-4 to see what the current state of the art in code generation is.


Thank you for the suggestion, antirez! Unfortunately that option does cost money (Copilot is free for students), although Bing might be an alright alternative.


Indeed, you are right. If you have an M[1,2,3] MacBook with enough RAM, you may want to run a model like DeepSeek-coder 34B locally. Or the smaller one, but it is going to be weaker.


This is one of the best pieces I’ve read that articulates what it’s like to work closely with LLMs as creative partners.


I definitely mostly use it in the same way - generating discrete snippets.

Haven’t had much luck with code completion thus far.


Here it is said that the earlier assumption of LLMs being parrots has been universally retracted in the face of the evidence. But if you look at the NYT case against OpenAI, being uncontrolled parrots is exactly what ChatGPT is being accused of. Which is the truth?


  LLMs are like stupid savants who know a lot of things.
Leaving the requisite “no, that’s not what language models are, you’re misunderstanding what’s important here, the best knowledge model already exists and it’s called Wikipedia”


While there clearly was a lot of hype, retrieval augmented generation (RAG) proved to be an effective technique with LLMs. Using RAG with project documentation and/or code can be useful.
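
As a minimal sketch of the idea (plain Python, with naive word-overlap scoring standing in for a real embedding index, and ask_llm() as a placeholder for whatever model you actually call):

  # Naive RAG: rank documentation chunks by word overlap with the question,
  # then prepend the best ones to the prompt before asking the model.

  def overlap(chunk, question):
      return len(set(chunk.lower().split()) & set(question.lower().split()))

  def build_prompt(question, doc_chunks, top_k=3):
      best = sorted(doc_chunks, key=lambda c: overlap(c, question), reverse=True)[:top_k]
      context = "\n---\n".join(best)
      return f"Using only this documentation:\n{context}\n\nQuestion: {question}"

  # prompt = build_prompt("How do I configure retries?", chunks)
  # answer = ask_llm(prompt)   # hypothetical call to your model of choice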


Honestly speaking, code generation is a form of augmented retrieval. And going further back, I would say human memory is generated from context rather than retrieved (which is why it's often fallible - we hallucinate details).

LLMs today are, for me, the equivalent of a large-scale human memory for code, or faster augmented retrieval. Do they hallucinate details? Quite often. But do I find them more utilitarian than dragging myself through documentation details? More often than not.


I think LLMs are good for quick prototyping first drafts of small functions or simple systems.

For me they help when time is short and when I want to maximize creative exploration.


@dang something is wrong with the ranking of this post.


I enjoy antirez's work, and I enjoyed this essay, but I disagree with many of its conclusions. In particular:

> this goal forces the model to create some form of abstract model. This model is weak, patchy, and imperfect, but it must exist if we observe what we observe.

Is a completely fallacious line of reasoning, and I'm surprised that he draws this conclusion. The whole reason the "problem of other minds" is still a problem in philosophy is precisely because we cannot be certain that some "abstract model" exists in someone's head (man or machine; if you argue it does, show it to me) simply because an output meeting certain constraints exists. This is exactly the problem of education. A student who studies to answer questions correctly on a test may not have an abstract model of the subject area at all. They may not even be conducting what we call reasoning. If a student aces a test, can you confidently say they actually understand the domain? Or did they simply ace a test?

Furthermore, LLMs' lack of consistency, their inability to answer basic mathematical questions, and their limitation to purely text-based areas of concern and representation are all much stronger arguments for siding with the notion that they really are just sophisticated, stochastic machines, incapable of what we'd normally call reason in a human context. If LLMs "reason", it is a much different form of reasoning than that which human beings are capable of, and I'm highly skeptical that any such network will achieve parity with human reason until it can "grow up" and learn embodied in a rich, multi-sensory environment, just like human beings. For machines to achieve reason, they will need to break out of the text-only/digital-only box first.


> is still a problem in philosophy

Exactly! This is why I removed these fundamental questions from my post: at the moment they don't have any clear answer and would basically make an already complex landscape even more complex. I believe that right now, whatever is happening inside LLMs, we need to focus on investigating the practical level of their "reasoning" abilities. They are very different objects from human brains, but they can do certain limited tasks that before LLMs we thought to be completely in the domain of humans.

We know that LLMs are just very complex functions interpolating their inputs, but these functions are so convoluted that, in practical terms, they can solve problems that were, before LLMs, completely outside the reach of automatic systems. Whatever is happening inside those systems is not really important for the way they can or can't reshape our society.


TLS cert has expired on the antirez site.


> Homo sapiens invented neural networks

Is it just me, or did anyone else smile at this sentence? The first paragraph sounds like the academic way of saying "we invented huge neural networks but we can't understand them".


This might be a good article, I wouldn't know because I can't read this monospace atrocity.

Reader view in Safari preserves the monospace font... /facepalm



  In order to protect your delicate sensibilities
  I would further suggest to avoid consulting most
  research output from before the mid 1980s.
eg https://www.rand.org/content/dam/rand/pubs/research_memorand...


I also found it looks awful, which made it hard for me to read, but Firefox reader mode did at least change the font.


bro it's 2024, get SSL



