GitHub cuts AI deals with Google, Anthropic

altbdoor · 2024-10-29T16:27:30 1730219250

thih9 · 2024-10-29T19:59:46 1730231986

I use Cursor and its tab completion; while what it can do is mind-blowing, in practice I’m not noticing a productivity boost.

I find that AI can help significantly with doing plumbing, but it has no problems with connecting the pipes wrong. I need to double and triple check the updated code - or fix the resulting errors when I don’t do that. So: boilerplate and outer app layers, yes; architecture and core libraries, no.

Curious, is that a property of all AI-assisted tools for now? Or would Copilot, perhaps with its new models, offer a different experience?

ianbutler · 2024-10-29T21:49:00 1730238540

I'm actually very curious why AI use is such a bimodal experience. I've used AI to move multi-thousand-line codebases between languages. I've created new apps from scratch with it.

My theory is that it comes down to the willingness to babysit and the modality. I'm perfectly fine telling the tool I use about its errors and working side by side with it like it was another person. At the end of the day it can belt out lines of code faster than I, or any human, can, and I can review code very quickly, so the overall productivity boost has been great.

It does fundamentally alter my workflow. I'm very hands off keyboard when I'm working with AI in a way that is much more like working with someone or coaching someone to make something instead of doing the making myself. Which I'm fine with but recognize many developers aren't.

I use AI autocomplete 0% of the time as I found that workflow was not as effective as me just writing code, but most of my most successful work using AI is a chat dialogue where I'm letting it build large swaths of the project a file or parts of a file at a time, with me reviewing and coaching.

0xFACEFEED · 2024-10-29T22:08:26 1730239706

As a programmer of over 20 years - this is terrifying.

I'm willing to accept that I just have "get off my lawn" syndrome or something.

But the idea of letting an LLM write/move large swaths of code seems so incredibly irresponsible. Whenever I sit down to write some code, be it a large implementation or a small function, I think about what other people (or future versions of myself) will struggle with when interacting with the code. Is it clear and concise? Is it too clever? Is it too easy to write a subtle bug when making changes? Have I made it totally clear that X is relying on Y dangerous behavior by adding a comment or intentionally making it visible in some other way?

It goes the other way too. If I know someone well (or their style) then it makes evaluating their code easier. The more time I spend in a codebase the better idea I have of what the writer was trying to do. I remember spending a lot of time reading the early Redis codebase and got a pretty good sense of how Salvatore thinks. Or altering my approaches to code reviews depending on which coworker was submitting it. These weren't things I was doing out of desire but because all non-trivial code has so much subtlety; it's just the nature of the beast.

So the thought of opening up a codebase that was cobbled together by an AI is just scary to me. Subtle bugs and errors would be equally distributed across the whole thing instead of where the writer was less competent (as is often the case). The whole thing just sounds like a gargantuan mess.

Change my mind.

hansonkd · 2024-10-29T22:55:48 1730242548

> The more time I spend in a codebase the better idea I have of what the writer was trying to do.

This whole thing of using LLMs to code reminds me a bit of when Google Translate came out and became popular, right around the time I started studying Russian.

Yes, copying and pasting a block of Russian text produced a block of English text from which you could get a general idea of what was happening. But translating from English to Russian rarely worked well enough to fool the professor because of all the idioms, style, etc. Russian has a lot of ways you can write "compactly" with fewer words than English and have a much more precise meaning of the sentence. (I always likened Russian to type-safe Haskell and English to dynamic Python)

If you actually understood Russian and read the text, you could uncover much deeper and subtle meaning and connections that get lost in translation.

If you went to Russia today you could get around with Google Translate and people would understand you. But you aren't going to be having anything other than surface-level requests and responses.

Coding with LLMs reminds me a lot of this. Yes, they produce something that the computer understands and runs, but the meaning and intention of what you wanted to communicate gets lost through this translation layer.

Coding is even worse because I feel like the intention of coding should never be to belt out as many lines as possible. Coding has powerful abstractions that you can use to minimize the lines you write and crystallize meaning and intent.

balder1991 · 2024-10-29T23:40:53 1730245253

> the intention of coding should never be to belt out as many lines as possible

That’s such an underrated statement. Especially when you consider the amount of code as a liability that you’ll have to take care of later.

abadpoli · 2024-10-30T02:50:13 1730256613

This presumes that it will be real humans that have to “take care” of the code later.

A lot of the people that are hawking AI, especially in management, are chasing a future where there are no humans, because AI writes the code and maintains the code, no pesky expensive humans needed. And AI won’t object to things like bad code style or low quality code.

tivert · 2024-10-30T05:34:57 1730266497

Well that will work great if you let the AI decide if the code is working or not.

User: This is calculating the result wrong.

AI: CLOSED WONTFIX: WORKING AS DESIGNED.

DeathArrow · 2024-10-30T07:16:21 1730272581

>AI writes the code

AI will never write proper code unless guided by someone who knows how to properly code and how to properly translate business needs into code.

TheNewsIsHere · 2024-10-30T11:32:16 1730287936

> [...] business needs into code.

I think this is where we lose a lot of developers. My experience has been this is a skill set that isn’t as common as you’d hope for, and requires some experience outside developing software as its own act. In other words, this doesn’t seem to be a skill that is natural to developers who haven’t had (or availed themselves of) the opportunity to do the business analyst and requirements gathering style work that goes into translating business needs to software outcomes. Many developers are isolated (or isolate themselves) from the business side of the business. That makes it very difficult to be able to translate those needs themselves. They may be unaware of and not understand, for example, why you’d want to design an accounting feature in a SaaS application in a certain way to meet a financial accounting need.

On the flip side, non-technical management tends to underestimate and undervalue technical expertise either by ego or by naïveté. One of my grandmothers used to wonder how I could “make any money by playing on the computer all day,” when what she saw as play was actually work. Not video games, mind you. She saw computers as a means to an end, in her case entertainment or socializing. Highly skilled clients of mine, like physicians, while curious, are often bewildered that there are sometimes technical or policy limitations that don’t match their expectations and make their request untenable.

When we talk about something like an LLM, it simply doesn’t possess the ability to reason, which is precisely what is needed for that kind of work.

friendzis · 2024-10-31T06:56:45 1730357805

> > [...] business needs into code.

> I think this is where we lose a lot of developers. My experience has been this is a skill set that isn’t as common as you’d hope for

I know this is a highly unpopular take, but I believe agile, scrum and similar have led the field directly in this direction.

Look at this magnificent blog post (https://www.scrum.org/resources/blog/5-worst-agile-mistakes-...) published recently right on scrum.org and especially this item, listed as one of the worst mistakes:

> 2. Letting the Definition of Done Include Customer Approval

In the olden days we used to model user workflows. Task A requires doing this and that, we do this and that and transition to workflow B. Acceptance testing was an integral step of the development workflow, and while it did include some persuasion and deception, actual user feedback was part of the process.

As much as scrum tries to position itself as delivering value to the customer, the actual practices of modeling the customer, writing imagined user stories for them and pulling acceptance criteria out of llama's ass ensure that the actual business and the development team interact as little as possible. Yes, this does allow reducing the number of implemented features by quite a bunch, but only by turning the tables and fitting the problem inside the solution.

Think about it: it is totally common for website layouts to shift, for element focus to jump around as the website is loading or, more recently, for two-step login pages to break password managers. No sane user would sign these off, but no actual user has participated in the development process, neither at the design phase nor at the testing phase.

emptiestplace · 2024-10-31T16:29:14 1730392154

Are you familiar with the idea of consciousness as an emergent property?

netdevnet · 2024-11-01T11:19:41 1730459981

You know this future isn't happening anytime soon. Certainly not in the next 100 years. Until then, humans will be taking care of it and no one will want to work at a place working on some Frankensteinian codebase made via an LLM. And even when humans are only working on 5% of the codebase, that will likely be the most critical bit and will have the same problems regarding staff recruitment and retention.

jkestner · 2024-10-30T05:49:06 1730267346

All you got to do is write the unit tests and let the AI evolve the code, right??

mreid · 2024-10-30T00:48:24 1730249304

I've heard a similar sentiment: "It's not lines of code written, it's lines of code spent."

It also reminds me of this analogy for data, especially sensitive data: "it's not oil, it's nuclear waste."

grbsh · 2024-10-30T04:04:42 1730261082

I think this is a bit short sighted, but I’m not sure how short. I suspect in the future, code will be something in between what it is today, and a build artifact. Do you have to maintain bytecode?

kevingadd · 2024-10-30T05:43:47 1730267027

People working on VMs have to maintain compatibility with old bytecode and evolve the bytecode format forward, does that count?

azangru · 2024-10-30T08:06:48 1730275608

> Russian has a lot of ways you can write "compactly" with fewer words than English and have a much more precise meaning of the sentence. (I always likened Russian to type-safe Haskell and English to dynamic Python)

Funny; my experience has been completely the opposite. I've always envied the English language for how compactly and precisely it can express meaning compared to Russian, both because of an immensely rich vocabulary, and because of the very flexible grammar.

I suspect this difference in perception may be due to comparing original texts, especially ones produced by excellent writers or ones that have been polished by generations that use them, to translations, which are almost invariably stylistically inferior to the original: less creative, less playful, less punchy, less succinct. So, if you translate a good Russian writer who is a master of his craft into English, you may feel the inadequacy of the language. Likewise, whenever I try to read translations of English prose into Russian, it reads clumsy and depressingly weak.

pastage · 2024-10-30T08:52:36 1730278356

Translating is an interpretation of the original text. A translated book can be better than the original. But you often need mastery of the language you translate to.

crazygringo · 2024-10-30T16:48:11 1730306891

> A translated book can be better than the original.

Can you give some examples?

> But you often need mastery of the language you translate to.

Professional written translation is virtually always done into your native language, not into a language you've learned later. So that mastery should be taken for granted; it's a prerequisite.

DeathArrow · 2024-10-30T07:14:16 1730272456

Coding isn't the hard part. The hard part is translating the business needs into code.

You can tell a junior programmer "Make a DB with tables book, author, hasWritten, customer, stock, hasBought, with the following rules between them. Write a repository for that DB. Use the repository in BooksService and BasketService. Use those services in Books controller and Basket controller." and he will do a fine job.

Ask the junior to write an API for a book store and he will have a harder time.
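To make the contrast concrete, here is a rough sketch of what the fully specified version of the task hands over. It covers only a slice of the spec (book, author, stock, a repository, and BooksService). The field names and the "can't sell what isn't in stock" rule are assumptions for illustration, since the rules are never spelled out above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Author:
    author_id: int
    name: str


@dataclass(frozen=True)
class Book:
    book_id: int
    title: str
    # Stands in for the hasWritten relation; one author per book is an assumption.
    author_id: int


@dataclass
class BookRepository:
    """In-memory stand-in for the DB repository named in the spec."""

    books: dict[int, Book] = field(default_factory=dict)
    stock: dict[int, int] = field(default_factory=dict)  # book_id -> copies

    def add(self, book: Book, copies: int) -> None:
        self.books[book.book_id] = book
        self.stock[book.book_id] = self.stock.get(book.book_id, 0) + copies

    def copies_in_stock(self, book_id: int) -> int:
        return self.stock.get(book_id, 0)

    def remove_copy(self, book_id: int) -> None:
        self.stock[book_id] -= 1


class BooksService:
    def __init__(self, repo: BookRepository) -> None:
        self.repo = repo

    def sell(self, book_id: int) -> bool:
        # Hypothetical business rule: refuse the sale when nothing is in stock.
        if self.repo.copies_in_stock(book_id) == 0:
            return False
        self.repo.remove_copy(book_id)
        return True
```

Every decision in that sketch was made by whoever wrote the spec. "Write an API for a book store" leaves all of them open, and that is the hard part.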

etherealG · 2024-10-30T11:37:29 1730288249

Ask a modern LLM to write an API for a book store... pretty sure it'll be close to, if not better than, the quoted part of your comment.

It's hard to argue with the drastic jump in LLM capabilities.

jwr · 2024-10-30T13:27:48 1730294868

This.

I write in Clojure and "coding" is perhaps 10% of what I do? There is very little "plumbing", or boilerplate, most of the code directly addresses the business domain. "AI" won't help me with that.

atrettel · 2024-10-30T01:53:06 1730253186

This is a great analogy. I find myself thinking that by abstracting the entire design process when coding something using generative AI tools, you tend to lose track of fine details by only concentrating on the overall function.

Maybe the code works, but does it integrate well with the rest of the codebase? Do the data structures that it created follow the overall design principles for your application? For example, does it make the right tradeoffs between time and space complexity for this application? For certain applications, memory may be an issue, and while the code may work, it uses too much memory to be useful in practice.

These are the kind of problems that I think about, and it aligns with your analogy. There is in fact something "lost through this translation layer".

Etherlord87 · 2024-10-30T08:14:55 1730276095

I think translating to Russian wasn't worse than translating to English because "Russian is more compact".

Probably it was worse because the people in charge at Google speak English. It was embarrassing to watch Google conferences where they proposed Google Translate to translate professional products. It's similarly embarrassing watching people proposing ChatGPT lightly, because they lack the ability, or probably just don't care, to analyze the problem thoroughly.

eru · 2024-10-30T02:47:51 1730256471

I had the opposite experience lately:

I was helping translate some UI text for a website from English to German, my mother tongue. I found that usually the machine came up with better translations than me.

patrickk · 2024-10-30T06:17:44 1730269064

English and German are EU languages. Russian is not.

The EU maintains a large translation service to translate most EU official texts into all EU languages. So Google Translate is using that to train on. Google gets a free gift from a multinational bureaucracy and gets to look like a smart company in the process.

This is also why English-Mandarin is often poorly translated, in my opinion.

arthur_sav · 2024-10-30T07:45:10 1730274310

Doesn't that mean it's just inevitable what will happen?

The question is no longer IF machines are capable, the question is WHEN. And the when is no longer decades away.

71bw · 2024-10-30T07:40:38 1730274038

>This is also why English-Mandarin is often poorly translated, in my opinion.

Shockingly, this is something that Yandex Translate absolutely excels at.

tripzilch · 2024-11-05T08:48:31 1730796511

> Google gets a free gift from a multinational bureaucracy and gets to look like a smart company in the process

it would have cost you exactly nothing to not make an unnecessary dig at Europeans, here

eru · 2024-11-04T05:52:30 1730699550

> Google gets a free gift from a multinational bureaucracy and gets to look like a smart company in the process.

Luckily, that gift is available for free to everyone. So it seems like a half-decent thing to do with tax payer money.

amake · 2024-10-30T04:28:32 1730262512

Perhaps you are not a translator. Translating is a skill that is more than simply being bilingual.

tkgally · 2024-10-30T05:02:23 1730264543

I am a professional translator, and I have been using LLMs to speed up and, yes, improve my translations for a year and a half.

When properly prompted, the LLMs produce reasonably accurate and natural translations, but sometimes there are mistakes (often the result of ambiguities in the source text) or the sentences don’t flow together as smoothly as I would like. So I check and polish the translations sentence by sentence. While I’m doing that, I sometimes encounter a word or phrase that just doesn’t sound right to me but that I can’t think how to fix. In those cases, I give the LLMs the original and draft translation and ask for ten variations of the problematic sentence. Most of the suggestions wouldn’t work well, but there are usually two or three that I like and that are better than what I could come up with on my own.

Lately I have also been using LLMs as editors: I feed one the entire source text and the draft translation, and I ask for suggestions for corrections and improvements to the translation. I adopt the suggestions I like, and then I run the revised translation through another LLM with the same prompt. After five or six iterations, I do a final read-through of the translation to make sure everything is okay.

My guess is that using LLMs like this cuts my total translation time by close to half while raising the quality of the finished product by some significant but difficult-to-quantify amount.

This process became feasible only after ChatGPT, Claude, and Gemini got longer context windows. Each new model release has performed better than the previous one, too. I’ve also tried open-weight models, but they were significantly worse for Japanese to English, the direction I translate.

Although I am not a software developer, I’ve been following the debates on HN about whether or not LLMs are useful as coding assistants with much interest. My guess is that the disagreements are due partly to the different work situations of the people on both sides of the issue. But I also wonder if some of those who reject AI assistance just haven’t been able to find a suitable interactive workflow for using it.

eru · 2024-10-30T07:21:40 1730272900

> While I’m doing that, I sometimes encounter a word or phrase that just doesn’t sound right to me but that I can’t think how to fix. In those cases, I give the LLMs the original and draft translation and ask for ten variations of the problematic sentence. Most of the suggestions wouldn’t work well, but there are usually two or three that I like and that are better than what I could come up with on my own.

Yes, coming up with variations that work better (and hit the right connotations) is what I used the machine for, too.

Tainnor · 2024-10-30T10:05:19 1730282719

You don't even have to go back in time or use a comparatively rare language pair such as English/Russian.

Google Reviews insists on auto-translating reviews to your native language (thankfully, the original review can be recovered by clicking a link). Even for English->German (probably one of the most common language pairs), the translations are usually so poor that you can't always tell what the review is trying to say. To be fair, I think that state-of-the-art machine translation is better than whatever the hell Google is using here (Google Translate probably), but apparently product people at Google don't care enough to make translations better (or better, allow you to turn off this feature).

friendzis · 2024-10-31T06:35:05 1730356505

> Russian has a lot of ways you can write "compactly" with fewer words than English and have a much more precise meaning of the sentence. (I always likened Russian to type-safe Haskell and English to dynamic Python)

The difference is in fusionality. English does not use inflection and relies heavily on auxiliaries (pre-, post-positions, particles, other modifiers) while Russian (and other Slavic, Baltic languages) rely rather heavily on inflection.

For English speakers, probably the closest is the gerund. A simple suffix transforms a verb into a noun-compatible form, denoting process. In highly fusional languages a root can be combined with multiple modifying pre-, a-, suf-fixes, and inflected on top. This does unlock some subtlety.

dools · 2024-10-30T02:40:48 1730256048

> But the idea of letting an LLM write/move large swaths of code seems so incredibly irresponsible

I heard a similar thing from a dude when I said I use it for bash scripts instead of copying and pasting things off StackOverflow.

He was a bit "get off my lawny" about the idea of running any code you didn't write, especially bash scripts in a terminal.

It is obviously the case that I didn't write most of the code in the world by a very large margin, but even without taking it to extremes, if I'm working on a team and people are writing code, how is it any different? Everyone makes mistakes, I make mistakes.

I think it's a bad idea to run things when you don't at least understand what they're going to do, but the speed with which ChatGPT can produce, for example, gcloud shell commands to manage resources is lightning fast (all of which is very readable, it just takes a while if you want to look it up and compose the commands yourself).

If your quality control method is "making sure there are no mistakes" then it's already broken regardless of where the code comes from. Me reviewing AI code is no different from me reviewing anyone else's code.

Me testing AI code using unit or integration tests is no different from testing anyone else's code, or my own code for that matter.

scubbo · 2024-10-30T03:36:50 1730259410

> Me reviewing AI code is no different from me reviewing anyone else's code.

I take your point, and on the whole I agree with your post, but this point is fundamentally _not_ correct, in that if I have a question about someone else's code I can ask them about their intention, state-of-mind, and understanding at the time they wrote it, and (subjectively, sure; but I think this is a reasonable claim) can _usually_ detect pretty well if they are bullshitting me when they respond. Asking AI for explanations tends to lead to extremely convincing and confident false justifications rather than an admission of error or doubt.

However:

> Me testing AI code using unit or integration tests is no different from testing anyone else's code, or my own code for that matter.

This is totally fair

Cthulhu_ · 2024-10-30T12:04:29 1730289869

> Asking AI for explanations tends to lead to extremely convincing and confident false justifications rather than an admission of error or doubt.

Not always true, AI can realise its own mistakes and it can learn. It's a feedback loop system, but as it stands this feedback of what is good and bad is provided by end-users and fed back into e.g. Copilot.

fragmede · 2024-10-30T12:22:21 1730290941

That loop is not a short one though. An LLM doesn't actively incorporate new information into its model while you're chatting with it. That goes into its context window/short-term memory. That the inputs and outputs can be used when training the next model, or for fine-tuning the current one, doesn't change the fact that training and inference are distinct steps.
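A toy sketch of that distinction, with a made-up `FrozenModel` class standing in for an LLM (it is not any real library's API): the weights are fixed once training is done, and a chat session only appends to a context list that disappears when the session ends.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FrozenModel:
    """Hypothetical stand-in for a trained model; weights never change at inference."""

    weights: tuple[float, ...]

    def reply(self, context: list[str]) -> str:
        # A real model would condition on the whole context; this toy only reports its size.
        return f"(reply conditioned on {len(context)} messages in context)"


@dataclass
class ChatSession:
    model: FrozenModel
    context: list[str] = field(default_factory=list)  # the "short-term memory"

    def send(self, message: str) -> str:
        # Corrections from the user land here, not in the model's weights.
        self.context.append(message)
        answer = self.model.reply(self.context)
        self.context.append(answer)
        return answer
```

Telling a session "you made a mistake" grows `context`, and the next reply can take that into account, but a fresh `ChatSession` on the same `FrozenModel` starts from an empty context with identical weights.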

tripzilch · 2024-11-05T09:03:42 1730797422

> AI can realise

wait did AGI happen? which AI is this?

stop anthropomorphizing them

No they can't. They can generate text that indicates they hallucinated, you can tell them to stop, and they won't.

They can generate text that appears to admit they are incapable of doing a certain task, and you can ask them to do it again, and they will happily try and fail again.

Sorry but give us some examples of an AI "realizing" its own mistakes, learning, and then not making the mistake again.

Also, if this were even remotely possible (which it is not), then we should be able to just get AIs with all of the mistakes pre-made, so that they have learned and won't make them again, right? So it has already "realized" and "learned" which tasks it's incapable of, so it will actually refuse or find a different way.

Or is there something special about the way that _you_ show the AI its mistakes, that is somehow more capable of making it "learn" from those mistakes than actually training it?

mikeshi42 · 2024-10-30T05:58:06 1730267886

I'm assuming by bullshitting you mean differentiating between LLM hallucinations and a human with low confidence in their code.

I've found that LLMs do sometimes acknowledge hallucinations. But really the check is much easier than a PR/questioning an author - just run the code given by the copilot and check that it works, just as if you typed it yourself.

scubbo · 2024-11-03T20:09:03 1730664543

> just run the code given by the copilot and check that it works

You've misunderstood my point. I'm not discussing the ability to check whether the code works as _I_ believe it should (as you say, that's easy to verify directly, by execution and/or testing); I'm referring to asking about intention or motivation of design choices by an author. Why this data structure rather than that one? Is this unusual or unidiomatic construction necessary in order to work around a quirk of the problem domain, or simply because the author had a brainfart or didn't know about the usual style? Are we introducing a queue here to allow for easy retries, or to decouple scaling of producers and consumers, or...? I can't evaluate the correctness of a choice without either knowing the motivation for it, or by learning the problem domain well enough to identify and make the choice myself - at which point the convenience of the AI solution is abnegated because I may as well have written it myself.

(ref: "Code only says what it does" - https://brooker.co.za/blog/2020/06/23/code.html)

And, yes, you can ask an LLM to clarify or explain its choices, but, like I said, the core problem is that they will confidently and convincingly lie to you. I'm not claiming that humans never lie - but a) I think (I hope!) they do it less often than LLMs do, and b) I believe (subjectively) that it tends to be easier to identify when a human is unsure of themself than when an LLM is.

mikeshi42 · 2024-11-04T05:01:50 1730696510

> I can't evaluate the correctness of a choice without either knowing the motivation for it, or by learning the problem domain well enough to identify and make the choice myself - at which point the convenience of the AI solution is abnegated because I may as well have written it myself.

I think I usually accept code that falls into the latter category - the convenience is that I did not need to spend any real energy implementing the solution or thinking too deeply about it. Sometimes the LLM will produce a more interesting approach that I did not consider initially but is actually nicer than what I wanted to do (afaik). Often it does what I want or something similar enough to what I would've written - just that it can do it instantly instead of me manually typing, doc searching, adding types, and correcting the code. If it does something weird that I don't agree with, I instead modify the prompt to align closer to the solution I had in mind. Much like Google, sometimes the first query does not do the trick and a query reformulation is required.

I wouldn't trust an LLM to write large chunks of code that I wouldn't have been able to write/figure out myself - it's more of a coding accelerant than an autonomous engineer for me (maybe that's where our PoVs diverged initially).

I suspect the similarity with PRs is that when I'm assigned a PR, I generally have enough knowledge about the proposed modification to have an opinion on how it should be done and the benefits/drawbacks of each implementation. The divergence from a PR is that I can ask the LLM for a modification of approach with just a few seconds and continue to ask for changes until I'm satisfied (so it doesn't matter if the LLM chose an approach I don't understand - I can just ask it to align with the approach I believe is optimal).

AdieuToLogic · 2024-10-30T03:42:08 1730259728

Multiple times in my s/w development career, I've had supervisors ask me why I am not typing code throughout the work day.

My response each time was along the lines of:

  When I write code, it is to reify the part of a solution which
  I understand.  This includes writing tests to certify same.

  There is no reason to do so before then.

ElFitz · 2024-10-30T10:05:44 1730282744

> He was a bit "get off my lawny" about the idea of running any code you didn't write, especially bash scripts in a terminal.

I hacked together a CLI tool that provides an LLM a CRUD interface to my local file system, letting it read, write, and execute code and tests, and feeds it back the commands' outputs.

And it was bootstrapped with me playing the role of CLI tool.

Mostly useless, a bit irresponsible, but fun.
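A minimal sketch of what such a loop could look like, not the actual tool: the `(verb, target, payload)` action format, the `execute` and `agent_loop` names, and the `next_action` callback that stands in for the LLM are all assumptions made up for this example. There is no sandboxing, so the "irresponsible" part carries over.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Optional

# An action is (verb, target, payload); a hypothetical protocol for this sketch.
Action = tuple[str, str, str]


def execute(action: Action, workdir: Path) -> str:
    """Apply one action inside workdir and return the text fed back to the model."""
    verb, target, payload = action
    path = workdir / target
    if verb == "write":
        path.write_text(payload)
        return f"wrote {len(payload)} chars to {target}"
    if verb == "read":
        return path.read_text()
    if verb == "delete":
        path.unlink()
        return f"deleted {target}"
    if verb == "run":
        # Runs the file with the current Python interpreter; no sandboxing at all.
        done = subprocess.run(
            [sys.executable, str(path)], capture_output=True, text=True, timeout=30
        )
        return done.stdout + done.stderr
    return f"unknown verb: {verb}"


def agent_loop(
    next_action: Callable[[list[str]], Optional[Action]], workdir: Path
) -> list[str]:
    """Ask the model for actions until it returns None, feeding each output back."""
    transcript: list[str] = []
    while (action := next_action(transcript)) is not None:
        transcript.append(execute(action, workdir))
    return transcript
```

Replacing `execute` with a human who carries out each action by hand and pastes back the output is roughly the bootstrapping step described above.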

fragmede · 2024-10-30T12:04:06 1730289846

If that idea engages you, you might take a look at the openinterpreter GitHub.

tripzilch · 2024-11-05T08:53:08 1730796788

> if I'm working on a team and people are writing code how is it any different? Everyone makes mistakes, I make mistakes.

because your colleagues know how to count

and they're not hallucinating while on the job

and if they try to slip an unrelated and subtle bug past you for the fifth time after asking them to do a very basic task, there are actual consequences instead of "we just need to check this colleague's code better"

geysersam · 2024-10-29T22:53:49 1730242429

I'll take a stab at changing your mind.

AIs are not able to write Redis. That's not their job. AIs should not write complex high performance code that millions of users rely on. If the code does something valuable for a large number of people you can afford humans to write it.

AIs should write low value code that just repeats what's been done before but with some variations. Generic parts of CRUD apps, some fraction of typical frontends, common CI setups. That's what they're good at because they've seen it a million times already. That category constitutes most code written.

This relieves human developers of ballpark 20% of their workload and that's already worth a lot of money.

photonthug · 2024-10-30T00:20:53 1730247653

> I'll take a stab at changing your mind.

Not the parent but this doesn’t seem mind changing, because what you describe is the normal/boring route to slightly better productivity using new tools without the breathless hype. And the 20% increase you mention of course depends a lot on what you’re doing, so for many types of work you’d be much closer to zero.

I’m curious about the claims of “power users” that are talking very excitedly about a brave new world. Are they fooling themselves, or trying to fool others, or working at jobs where 90% of their work is boilerplate drudgery, or what exactly? Inevitably it’s all of the above.. and some small percentage of real power users that could probably teach the rest of us cool stuff about their unique workflows. Not sure how to find the signal in all the noise though.

So personally, if I were to write “change my mind”, what I’d really mean is something like “convince me there are real power users already out there in the wild, using tools that are open to the public today”.

GP mentioned machine-assisted translation of a huge code base being almost completely hands-off. If that were true and as easy as advertised, then one might expect, for example, that it would be trivial to just rewrite MediaWiki or WordPress in Rails or Django with a few people in a week. This is on the easier side of what I’d confidently label as a game-changingly huge productivity boost btw, and is a soft problem chosen because of the availability of existing code examples, mere translation over original work, etc. Not sure we’re there yet.

lucianbr · 2024-10-30T09:28:43 1730280523

> Are they fooling themselves, or trying to fool others, or working at jobs where 90% of their work is boilerplate drudgery, or what exactly?

I wonder about this also. Maybe it's just some of each? Clearly some people fool themselves, the AI companies are doing marketing, some people do have boring jobs...

skywhopper · 2024-10-30T00:57:45 1730249865

I have to disagree. If there’s that much boilerplate floating around then the tooling should be improved. Pasting over inefficiency with sloppier inefficiency is just a pure waste.

geysersam · 2024-10-30T06:18:35 1730269115

It's not boilerplate, it's just uncomplicated but necessary specifications.

AI is the improved tooling.

takeda · 2024-10-30T07:21:06 1730272866

So maybe that's the issue I'm having.

I spent my entire career trying to eliminate such code as much as I can, so now I have Copilot writing code that I have to fix at almost every step. I frequently have to look for subtle issues, and a few times they sneaked through; when it does produce correct code, it is often more verbose than mine.

hitradostava · 2024-10-30T07:26:15 1730273175

In a couple of years time I don't see why AI based tooling couldn't write Redis? Would you get a complete Redis produced with a single prompt? Of course not. but if extreme speed is what you want to optimize for, then the tooling needs to be given the right feedback loop to optimize for that.

I think the question to ask is what do I do as a software engineer that couldn't be done by an AI based tool in a few years time? The answer is scary, but exciting.

0xFACEFEED · 2024-10-29T23:18:43 1730243923

I can definitely see the value in letting AI generate low stakes code. I'm a daily CoPilot user and, while I don't let it generate implementations, the suggestions it gives for boilerplate-y things are top notch. Love it as a tool.

My major issue with your position is that, at least in my experience, good software is the sum of even the seemingly low risk parts. When I think of real world software that people rely on (the only type I care about in this context) then it's hard to point a finger at some part of it and go "eh, this part doesn't matter". It all matters.

The alternative, I fear, is 90% of the software we use exhibiting subtle goofy behavior and just being overall unpleasant to use.

I guess an analogy for my concern is what it would look like if 60% of every film was AI generated using the models we have today. Some might argue that 60% of all films are low stakes scenes with simple exposition or whatever. And then remaining 40% are the climax or other important moments. But many people believe that 100% of the film matters - even the opening credits.

And even if none of that were an issue: in my experience it's very difficult to assess what part of an application will/won't be low/high stakes. Imagine being a tech startup that needs to pivot your focus toward the low stakes part of the application that the LLM wrote.

Aeolun · 2024-10-30T00:04:32 1730246672

I think your concept of ‘what the AI wrote’ is too large. There is zero chance my one line copilot or three line cursor tab completions are going to have an effect on the overall quality of my codebase.

What it is useful for is doing exactly the things I already know need to happen, but don’t want to spend the effort to write out (at least, not having to do it is great).

Since my brain and focus aren’t killed by writing crud, I get to spend that on more useful stuff. If it doesn’t make me more effective, at least it makes my job more enjoyable.

0xFACEFEED · 2024-10-30T01:12:01 1730250721

I'm with you. I use Copilot every day in the way you're describing and I love it. The person I was responding to is claiming to code "hands off" and let the AI write the majority of the software.

SoftTalker · 2024-10-30T02:54:36 1730256876

> The alternative, I fear, is 90% of the software we use exhibiting subtle goofy behavior and just being overall unpleasant to use.

This sounds like most software honestly.

0xFACEFEED · 2024-10-30T04:48:53 1730263733

And that's what LLMs are trained on.

Hahaha

sdesol · 2024-10-30T00:21:23 1730247683

> But the idea of letting an LLM write/move large swaths of code seems so incredibly irresponsible.

I do think it is kind of crazy based on what I've seen. I'm convinced LLM is a game changer but I couldn't believe how stupid it can be. Take the following example, which is a spelling and grammar checker that I wrote:

https://app.gitsense.com/?doc=f7419bfb27c8968bae&samples=5

If you click on the sentence, you can see that Claude-3.5 and GPT-4o cannot tell that GitHub is spelled correctly most of the time. It was this example that made me realize how dangerous LLM can be. The sentence is short but Claude-3.5 and GPT-4o just can't process it properly.

Having a LLM rewrite large swaths of code is crazy but I believe with proper tooling to verify and challenge changes, we can mitigate the risk.

I'm just speculating, but I believe GitHub has come to the same conclusion that I have, which is, all models can be stupid, but it is unlikely that all will be stupid at the same time.
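One way to picture the "unlikely that all will be stupid at the same time" idea is a plain majority vote across several models. This is only a sketch under assumptions: the model names and their answers below are made-up placeholders, not real API calls or measured results, and `majority_verdict` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Optional

def majority_verdict(verdicts: Dict[str, bool], quorum: float = 0.5) -> Optional[bool]:
    """Return the verdict held by more than `quorum` of the models, else None."""
    if not verdicts:
        return None
    verdict, votes = Counter(verdicts.values()).most_common(1)[0]
    return verdict if votes / len(verdicts) > quorum else None

# Hypothetical answers to "is 'GitHub' spelled correctly?" (placeholder data)
answers = {"model_a": True, "model_b": False, "model_c": True}
print(majority_verdict(answers))  # True: one model being wrong is outvoted
```

A tie (or an empty set of answers) returns None, which is the case where a human would still have to look.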

sbarre · 2024-10-29T22:46:07 1730241967

I think it depends on the stakes of what you're building.

A lot of the concerns you describe make me think you work in a larger company or team and so both the organizational stakes (maintenance, future changes, tech debt, other people taking it over) and the functional stakes (bug free, performant, secure, etc) are high?

If the person you're responding to is cranking out a personal SaaS project or something they won't ever want to maintain much, then they can do different math on risks.

And probably also the language you're using, and the actual code itself.

Porting a multi-thousand line web SaaS product in Typescript that's just CRUD operations and cranking out web views? Sure why not.

Porting a multi-thousand line game codebase that's performance-critical and written in C++? Probably not.

That said, I am super fascinated by the approach of "let the LLM write the code and coach it when it gets it wrong" and I feel like I want to try that.. But probably not on a work project, and maybe just on a personal project.

tetha · 2024-10-29T23:27:20 1730244440

> Porting a multi-thousand line web SaaS product in Typescript that's just CRUD operations and cranking out web views? Sure why not.
>
> Porting a multi-thousand line game codebase that's performance-critical and written in C++? Probably not.

From my own experience:

I really enjoy CoPilot to support me writing a terraform provider. I think this works well because we have hundreds of existing terraform providers with the same boilerplate and the same REST-handling already. Here, the LLM can crank out oodles and oodles of identical boilerplate that's easy to review and deal with. Huge productivity boost. Maybe we should have better frameworks and languages for this, but alas...

I've also tried using CoPilot on a personal Godot project. I turned it off after a day, because it was so distracting with nonsense. Thinking about it along these lines, I would not be surprised if this occurred because the high-level code of games (think what AAA games do in Lua, and well what Godot does in GDScript) tends to be small-volume and rather erratic within there. Here there is no real pattern to follow.

This could also be a cause for the huge difference in LLM productivity boosts people report. If you need Spring Boot code to put query params into an ORM and turn that into JSON, it can probably do that. If you need embedded C code for an obscure micro controller.. yeah, good luck.

nottorp · 2024-10-29T23:33:23 1730244803

> If you need embedded C code for an obscure micro controller.. yeah, good luck.

... or even information in the embedded world. LLMs need to generate something, so they'll generate code even when the answer is "no dude, your chip doesn't support that".

scubbo · 2024-10-30T03:38:12 1730259492

> they'll generate code even when the answer is "no dude, your chip doesn't support that".

This is precisely the problem. As I point out elsewhere[0], reviewing AI-generated code is _not_ the same thing as reviewing code written by someone else, because you can ask a human author what they were thinking and get a moderately-honest response; whereas an AI will confidently and convincingly lie to you.

[0] https://news.ycombinator.com/item?id=41991750

ehnto · 2024-10-30T01:51:48 1730253108

I am quite interested in how LLMs would handle game development. Coming to game development from a long career in boutique applications and also enterprise software, game development is a whole different level of "boutique".

I think both because of the coupled, convoluted complexity of much game logic, and because there are fewer open source examples of novel game code available to train on, they may struggle to be as useful.

shadowmnifold · 2024-11-01T10:13:33 1730456013

It is a good example of how we are underestimating the human in the loop.

I know nothing about making a game. I am sure LLMs could help me try to make a game but surely they would help someone who has tried to make a game before more. On the other hand, the expert game developer is probably not helped as much either by the LLM as the person in the middle.

Scale that to basically all subjects. Then we get different opinions on the value of LLMs.

Tiktaalik · 2024-10-30T06:02:07 1730268127

Yeah I think the lack of game code available to train on could be a problem. There's a fair amount of "black art" type problems in games too that a LLM may struggle with just because there's not a lot to go on.

Additionally the problems of custom engines and game specific patterns.

That being said there's parts of games with boilerplate code like any application. In a past game as I was finishing it up some of this AI stuff was first becoming useable and I experimented with generating some boilerplate classes with high level descriptions of what I wanted and it did a pretty decent job.

I think some of the most significant productivity gains for games is going to be less on the code side and more in the technical art space.

0xFACEFEED · 2024-10-29T23:01:09 1730242869

> A lot of the concerns you describe make me think you work in a larger company or team and so both the organizational stakes (maintenance, future changes, tech debt, other people taking it over) and the functional stakes (bug free, performant, secure, etc) are high?

The most financially rewarding project I worked on started out as an early stage startup with small ambitions. It ended up growing and succeeding far beyond expectations.

It was a small codebase but the stakes were still very high. We were all pretty experienced going into it so we each had preferences for which footguns to avoid. For example we shied away from ORMs because they're the kind of dependency that could get you stuck in mud. Pick a "bad" ORM, spend months piling code on top of it, and then find out that you're spending more time fighting it than being productive. But now you don't have the time to untangle yourself from that dependency. Worst of all, at least in our experience, it's impossible to really predict how likely you are to get "stuck" this way with a large dependency. So the judgement call was to avoid major dependencies like this unless we absolutely had to.

I attribute the success of our project to literally thousands of minor and major decisions like that one.

To me almost all software is high stakes. Unless it's so trivial that nothing about it matters at all; but that's not what these AI tools are marketing toward, are they?

Something might start out as a small useful library and grow into a dependency that hundreds of thousands of people use.

So that's why it terrifies me. I'm terrified of one day joining a team or wanting to contribute to an OSS project - only to be faced with thousands of lines of nonsensical autogenerated LLM code. If nothing else it takes all the joy out of programming computers (although I think there's a more existential risk here). If it was a team I'd probably just quit on the spot but I have that luxury and probably would have caught it during due diligence. If it's an OSS project I'd nope out and not contribute.

mistrial9 · 2024-10-30T01:22:29 1730251349

adding here due to some resonance with the point of view.. this exchange lacks crucial axes.. what kind of programming ?

I assume the parent-post is saying "I ported thousands of lines of <some C family executing on a server> to <python on standard cloud environments>". I could be very wrong but that is my guess. Like any data-driven software machinery, there is massive inherent bias and extra resources for <current in-demand thing>; in this guess-case it is python that runs on a standard cloud environment, with the loaders and credentials parts too perhaps.

Those who learned programming in the theoretic ways know that many, many software systems are possible in various compute contexts. And those working on hardware teams know that there are a lot of kinds of computing hardware. And to add another off-the-cuff idea, so much web interface ala 2004 code to bring to newer, cleaner setups.

I am not <emotional state descriptor> about this sea change in code generation, but actually code generation is not at all new. It is the blatant stealing and LICENSE washing of a generation of OSS that gets me, actually. Those code generation machines are repeating their inputs. No authors agreed and no one asked them, either.

mithametacs · 2024-10-29T23:14:47 1730243687

You still use type systems, tests, and code review.

For a lot of use cases it's powerful.

If you ask it to build out a brand new system with a complex algorithm or to perform a more complex refactoring, it'll be more work correcting it than doing it yourself.

But that malformed JSON document with the weird missing quotation marks (so the usual formatters break), and spaces before commas, and the indentation is wild... Give it to an LLM.

Or when you're writing content impls for a game based on a list of text descriptions, copy the text into a block comment. Then impl 1 example. Then just sit back and press tab and watch your profits.

girvo · 2024-10-30T00:15:28 1730247328

The (mostly useless boilerplate “I’m basically just testing my mocks”) tests are being written by AI too these days.

Which is mildly annoying as a lot of those tests are basically just noise rather than useful tools. Humans have the same problem, but current models are especially prone to it from what I’ve observed

And not enough devs are babysitting the AI to make sure the test cases are useful, even if they’re doing so for the original code it produced
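To make the "testing my mocks" complaint concrete, here is a small sketch contrasting a test that only exercises the mock with one that checks real logic. `total_price` and the `repo` object are hypothetical names invented for this illustration.

```python
from unittest.mock import Mock

def total_price(repo, order_id):
    """Sum quantity * unit price over the order's line items (hypothetical example)."""
    return sum(qty * price for qty, price in repo.get_items(order_id))

def test_only_the_mock():
    # Noise: this asserts that the mock returns what we told it to return.
    repo = Mock()
    repo.get_items.return_value = [(2, 5.0)]
    assert repo.get_items(42) == [(2, 5.0)]

def test_real_behaviour():
    # Useful: the mock only supplies input; the assertion checks our own code.
    repo = Mock()
    repo.get_items.return_value = [(2, 5.0), (1, 3.5)]
    assert total_price(repo, 42) == 13.5
    repo.get_items.assert_called_once_with(42)

test_only_the_mock()
test_real_behaviour()
```

Both tests pass, but only the second one would fail if `total_price` were broken.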

chillfox · 2024-10-30T01:27:10 1730251630

There are very few tutorials on how to do testing and I don't think I have ever seen one that was great. Compared to general coding stuff where there's great tutorials available for all the most common things.

So I think quality testing is just not in the training data at anywhere close to the quantity needed.

fullstackchris · 2024-10-30T08:00:54 1730275254

Testing well is both an art and a science, and I mean, just look at the dev community on the topic, some are religious about TDD, some say unit tests only, some say the whole range to e2e etc. etc. hard to have good training data when there is no definition of what is "right" in the first place!

KronisLV · 2024-10-30T07:40:20 1730274020

> I think about what other people (or future versions of myself) will struggle with when interacting with the code.

This feels like the sign of a good developer!

On the other hand, sometimes you just need executable line noise that gets the job done by Thursday so you can ship it and think about refactoring later.

As far as AI code goes, more often than not, it will read as something very generic, which is not necessarily a bad thing. When opening yet another Java CRUD project, I’d be more happy to see someone copy and pasting working code from tutorials or resources online (while it still works correctly), as opposed to seeing people develop bespoke systems on top of what a framework provides for every project.

Etherlord87 · 2024-10-30T08:17:49 1730276269

> On the other hand, sometimes you just need executable line noise that gets the job done by Thursday so you can ship it and think about refactoring later.

This is a problem too. ChatGPT enables you to write bad code.

It's like Adobe Flash: flash using websites didn't have to be slow, but it was easy to make a slow website with it.

JoeyJoJoJr · 2024-10-30T09:01:12 1730278872

It’s also like Macromedia Flash in that it is a highly creative force, but people who don’t get it or can’t make it work for them will complain.

Cthulhu_ · 2024-10-30T12:02:24 1730289744

> Change my mind.

Imagine the LLM is another developer and you're responsible for reviewing their code. Would you think the same thing of them?

While I don't like AI either, I feel a lot of the fear around it is just that - fear and distrust that there will be bugs, some more subtle than others. But that's true for code written by anyone, isn't it?

It's just that you're responsible for the code written by something or someone else, and that can be scary.

But remember that no code should make it to production without human and automated review. Trust the system.

DeathArrow · 2024-10-30T07:01:38 1730271698

> Whenever I sit down to write some code, be it a large implementation or a small function, I think about what other people (or future versions of myself) will struggle with when interacting with the code. Is it clear and concise? Is it too clever? Is it too easy to write a subtle bug when making changes? Have I made it totally clear that X is relying on Y dangerous behavior by adding a comment or intentionally making it visible in some other way?

Over 20 years of experience, too, but I quit doing that for work. Nobody really really cares, all they care about is time to market and having features they've sold yesterday to customers being done today.

As long as I follow some mental models and some rules, the code is reasonably well written and there is no need to procrastinate and think too much.

When I write code for myself, or I am contributing to a small project with a small number of contributors, then things change. If I can afford it and I like it, I am not only willing to ensure things are carefully thought out, but I am also willing to experiment and test until I am sure that I use the best variant I can come up with. Like going from 99% to 99.9%, even if it wouldn't matter in practice. Just for fun.

As a manager, I wouldn't ask people to write perfect code, nor would I like them to ship buggy code very fast, but to ship reasonably good code as fast as they can write reasonably good code.

munksbeer · 2024-10-30T12:49:42 1730292582

> Over 20 years of experience, too, but I quit doing that for work. Nobody really really cares, all they care about is time to market and having features they've sold yesterday to customers being done today.

I don't recognise this.

Or at least, I recognise that it can be that way but not always. In places I've worked, I tend to have worked with teams that care deeply about this. But we're not writing CRUD apps or web systems, or inventory management, or whatever. We're writing trading systems. I absolutely want to be working with code that we can understand in a hurry (and I mean, a real hurry) when things go wrong, and that we can change and/or fix in a hurry.

So some of us really do care.

DeathArrow · 2024-10-31T07:29:54 1730359794

> We're writing trading systems

If you write critical systems executing trades, managing traffic lights, landing a rover on the moon, then you should take your time and write the best possible version.

Our code is both easy to read and easy to modify because that allows us to add features fast. It is not the very best possible version of what we can do, because that would cost us much more time.

The code has few bugs, which are mostly caught by the QA teams, and is reasonably fast. Maybe it's not the most elegant or engineered to take future use cases into account, and we push to eliminate from the AC some very rare use cases that would take too much time to implement. Maybe the code is not the most resource efficient.

But the key aspect is we focus on delivering the most features possible from what customers need in the limited amount of time we have and with the limited manpower we have.

Company is owned by some private equity group and their focus is solely growing the customer base while paying as little as possible. Last year they fired 25% of personnel because their ARR was missing a few millions.

Nevertheless, most companies I worked at before were in a hurry. With the exception of a very small company where I could work however I saw fit.

eru · 2024-10-30T02:37:03 1730255823

> But the idea of letting an LLM write/move large swaths of code seems so incredibly irresponsible.

Why? Presumably you let your coworkers move code around, too, and then you review it? (And vice versa.)

outside1234 · 2024-10-30T01:36:11 1730252171

The saying "You can delegate tasks but not responsibility" comes to mind.

You are still responsible for the code AI is writing. It is just that writing code with AI is more like reviewing a PR now.

nonethewiser · 2024-10-30T17:03:51 1730307831

> But the idea of letting an LLM write/move large swaths of code seems so incredibly irresponsible. Whenever I sit down to write some code, be it a large implementation or a small function, I think about what other people (or future versions of myself) will struggle with when interacting with the code. Is it clear and concise? Is it too clever?

I agree with your conclusion but not your supporting evidence. Not only can you read it to answer all these questions, but you can BETTER answer these questions from reading it. Because you are already looking at it from the perspective you are trying to optimize for (future reader).

What is less clear is if it handles all the edge cases correctly. In theory these should all be tested, but many of them cannot even be identified without thinking through the idiosyncrasies of the code which is a natural byproduct of writing code.

lljk_kennedy · 2024-10-30T09:56:47 1730282207

It's like worrying about moving bits on a hard drive, or writing nice machine code. Eventually you just won't care. AI / LLMs interacting with code bases in future won't care about structure, clearness, conciseness etc. They'll tokenize it all the same.

epolanski · 2024-10-30T02:37:23 1730255843

> Change my mind.

Unit, integration, e2e, types and linters would catch most of the things you mention.

Not every piece of software is mission critical, often the most important thing is to go as fast as possible and iterate very quickly. Good enough is better than very good in many cases.

almostdeadguy · 2024-10-30T02:44:30 1730256270

> Unit, integration, e2e, types and linters would catch most of the things you mention.

Who’s writing those?

fullstackchris · 2024-10-30T08:02:59 1730275379

Lots of people. For certain types of software (ISO) they are required.

But I'm in the boat (and also experienced many times first hand) all those tests you write will by definition, never test against that first production bug you get :)

almostdeadguy · 2024-11-08T13:31:47 1731072707

My point was not to question that people would write tests, the point I'm making is that it's tempting to generate both code and tests once you start using an LLM to generate large swaths of code, and then the assurance that tests give you goes out the window.

I'm not convinced that using AI as more than auto-complete is really a viable solution, because you can't shortcut an understanding of the problem domain to be assured of the correctness of code (and at that point the AI has mostly saved you some typing). The theory-crafting process of building software is probably the most important aspect of it. It not only provides assurance of the correctness of what you're building, it provides feedback into product development (restrictions, pushback that suggests alternate ways of doing things, etc.).

afro88 · 2024-10-30T03:19:32 1730258372

> But the idea of letting an LLM write/move large swaths of code seems so incredibly irresponsible.

I think this is where the bimodality comes from. When someone says "I used AI to refactor 3000 loc" some take it to mean they used AI in small steps as an accelerator, and others take it to mean a direct copy/paste, fix compile errors and move on.

Treat AI like a mid level engineer that you are pair programming with, who can type insanely fast. Move in small steps. Read through its code after each small iteration. Ask it to fix things (or fix them yourself if quick and easy). Brainstorm ideas with it etc etc.

nwienert · 2024-10-30T04:01:29 1730260889

It’s really far from mid level. It’s a weird mix of expert at things it trained on, and complete misleading idiot at anything outside.

For a bash script or the first steps of something simple it’s great.

For anything complex at all it’s worse than nothing.

anonzzzies · 2024-10-30T07:24:31 1730273071

Works well for us nonetheless, also on more complex things. It's not worse than most (including seniors) humans I worked with in the past 40 years, but it is faster and cheaper. On HN it is sometimes forgotten that by far most programmers do not like it; they need money. If you see what comes out of them, you have to puke; yet it's running billion$ businesses and works surprisingly well considering the bad code quality.

nwienert · 2024-11-03T02:23:58 1730600638

It's quite literally incapable of solving many very mid-level things, no matter how much you help it. It's not a reasoning machine, it's basically a different way to search existing answers.

afro88 · 2024-10-30T04:18:29 1730261909

For anything complex, move in small steps.

For anything truly novel, or on a codebase with a very bespoke in house architecture or DSL, yeah you won't get much out of it.

nwienert · 2024-10-30T15:22:00 1730301720

Even in small steps, it fails. I have two cases I test with, nothing special, just some TS generics in one instance and a schema-to-schema mapping tool in another. Both things that Junior devs could do given a couple days, even though they'd need to study and figure out various pieces.

o1 can't get either, no matter how much I break it down, no matter how much prodding. In fact the more you try the worse it gets. And yes I do try starting new conversations and splitting it out. Simply does not help, at all.

It's not to say it isn't really helpful for really simple things. Or even complex things but that are directly in the training set. But the second you go outside that, it's terrible.

DeathArrow · 2024-10-30T07:04:57 1730271897

>Change my mind.

Nobody pays for splendid code that isn't in production. They will gladly pay for buggy code that is in production and solves their needs as long as the marketing team does a good job.

ianbutler · 2024-10-29T22:46:29 1730241989

I have 10 years professional experience and I've been writing code for 20 years, really with this workflow I just read and review significantly more code and I coach it when it structures or styles something in a way I don't like.

I'm fully in control and nothing gets committed that I haven't read; it's an extension of me at that point.

Edit: I think the issues you've mentioned typically apply to people too and the answer is largely the same. Talk, coach, put hard fixes in like linting and review approvals.

Aeolun · 2024-10-30T00:06:21 1730246781

> Talk, coach, put hard fixes in like linting and review approvals.

And sometimes, when all that doesn’t work? Just do it yourself :)

frankdenbow · 2024-10-30T11:06:52 1730286412

"cobbled together by an AI"

It will be as cobbled together as the thoughtfulness of the person in charge of the code. Same as if they wrote it themselves.

HarHarVeryFunny · 2024-10-30T13:01:56 1730293316

Maybe in the future when we have AGI, but not at the moment.

Did you read yesterday's "How I code using Cursor" thread:

https://news.ycombinator.com/item?id=41979203

The "Changes to my workflow" part is most relevant, and would be more accurately titled "How Cursor writes code differently to me [a senior developer]".

For example:

1) Cursor/AI is more likely to reinvent the wheel and write code from scratch rather than use support libraries. Good to avoid dependencies I suppose, but widely used specialized libraries are likely to be debugged and mature - able to handle corner cases gracefully, etc. AI "writes code" by regenerating stuff from its training data - akin to cutting and pasting from Stack Overflow, etc. If you're using this for a throwaway prototype or personal project then maybe you don't care as long as it works most of the time, but for corporate production use this is a liability.

2) AI more likely to generate repetitive code rather than write reusable functions (which he spins as avoiding abstractions) means code that is harder to read, debug and maintain. It's like avoiding global symbolic constants and defining them multiple times throughout your code instead. This wouldn't pass typical human code review. When future you, or a co-worker, maybe using a different editor/IDE, fixes a bug, they may not realize that the same bug has been repeated multiple times throughout the code, rather than fixing it once in a function.
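Point 2 can be sketched with a tiny example. Everything here is hypothetical (function names and the email rule are invented for illustration): the pasted copies have drifted apart, while the shared helper keeps the rule in one place.

```python
# Repetitive style: the same rule is pasted into each call site and drifts.
def register_user(email: str) -> str:
    return email.strip().lower()

def invite_user(email: str) -> str:
    return email.lower()  # pasted copy that lost the strip(); easy to miss in review

# Reusable style: one function owns the rule, so a fix lands everywhere at once.
def normalize_email(email: str) -> str:
    return email.strip().lower()

def register_user_v2(email: str) -> str:
    return normalize_email(email)

def invite_user_v2(email: str) -> str:
    return normalize_email(email)

raw = "  Alice@Example.com "
print(register_user(raw) == invite_user(raw))        # False: the copies disagree
print(register_user_v2(raw) == invite_user_v2(raw))  # True
```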

We don't have human level AGI yet, far from it, and the code that today's AI generates reflects that. This isn't code that an experienced developer would write - this is LLM generated code, which means it's either regurgitated as-is from some unknown internet source, or worse yet (and probably more typical?) is a mashup of multiple sources, where the LLM may well have introduced its own bugs in addition to those present in the original sources.

thesz · 2024-10-30T21:20:44 1730323244

You are absolutely right.

AI is used to generate new code, not to reduce its size through rewrite [1].

[1] https://www.folklore.org/Negative_2000_Lines_Of_Code.html

The code is always a burden, a legacy. To support it, one needs to look after it, take care of it, bear its weight.

I have 35+ years of experience on my shoulders. I always celebrate when I can reduce code - and, recently, I do that more and more often, being a support person for a large code base with a 20+ years of history.

PS You were answering a comment that brags about migrating multi-thousand LOC code bases from one language to another. Recently I had to review a file that is more than a (metric) megabyte (10^6 bytes) and more than 32 thousand LOC in size. I need to find a way to apply a fix for a problem triggered by a single statement in several thousand test cases (more than 1M LOC).

shinycode · 2024-10-30T12:50:28 1730292628

I have a colleague that did it also, moving parts of code and « writing » code quickly with copilot. Because it’s easier to overlook LLM updates he riddled the code with bugs. Subtle things that we uncovered later, now that he’s gone. When you write everything yourself you are more keen to think deeply about changes. I read today that Google has 25% of their code written by AI. They have a history of trashing huge projects and the quality of their services is getting worse over time. Maybe the industry is going to move to « let’s trash the codebase and ask cGPT 8 to write everything with this new framework »… OP said he’s talking to AI like guiding another dev. Isn’t he afraid that he will lose the ability to think about solutions for himself? That’s a trained part of the brain that we can « lose », no?

peab · 2024-10-30T14:35:35 1730298935

It really depends what you're building.

If you're building code that's going to go in some medical system, or a space shuttle, then yeah, you probably want to write every small function with great detail.

If you're creating some silly consumer app like a "what will your baby look like in 5 years", then code quality doesn't matter; you just need to ship fast. Most startups just need to ship fast to validate some ideas, and 99% of your code will be deprecated within a few months.

onion2k · 2024-10-30T11:01:54 1730286114

> The whole thing just sounds like a gargantuan mess.

Most apps are a gargantuan mess. It's just a mess that mostly works. In a typical large scale web app written in something like Node or PHP, I wouldn't be at all surprised if 95% of the code is brought in from libraries that the dev team don't review. They have no idea about the quality of the code they're running. I don't see why adding AI to the mix makes much of a difference.

AdieuToLogic · 2024-10-30T03:25:39 1730258739

> Whenever I sit down to write some code, be it a large implementation or a small function, I think about what other people (or future versions of myself) will struggle with when interacting with the code. Is it clear and concise? Is it too clever? Is it too easy to write a subtle bug when making changes? Have I made it totally clear that X is relying on Y dangerous behavior by adding a comment or intentionally making it visible in some other way?

> It goes the other way too. If I know someone well (or their style) then it makes evaluating their code easier. The more time I spend in a codebase the better idea I have of what the writer was trying to do.

What I believe you are describing is a general definition of "understanding", of which I am sure you are aware. And given your 20+ years of experience, your summary of:

> So the thought of opening up a codebase that was cobbled together by an AI is just scary to me. Subtle bugs and errors would be equally distributed across the whole thing instead of where the writer was less competent (as is often the case).

Is not only entirely understandable (pardon the pun), but to be expected, as the algorithms employed lack the crucial bit which you identify - understanding.

> The whole thing just sounds like a gargantuan mess.

As it does to most who envision having to live with artifacts produced by a statistical predictive text algorithm.

> Change my mind.

One cannot because understanding, as people know it, is intrinsic to each person by definition. It exists as a concept within the person who possesses it and is defined entirely by said person.

otikik · 2024-10-30T09:11:36 1730279496

Agree that it is a mess.

If I know that someone is using an LLM to produce code, I think it is only fair that I use an LLM to review the code too.

If you want me to put in the work as a reviewer, you'd better put in the work as a writer.

Cthulhu_ · 2024-10-30T12:06:06 1730289966

I don't know if you jest, but this is likely the next stage, to be released within the next months: AIs doing the first rounds of code reviews. It'll likely come from GitHub / Microsoft, as they have one of the biggest code review datasets around.

svieira · 2024-10-30T13:59:11 1730296751

This is already happening - I recently saw a resume which included "Added AI-driven code reviews" as an accomplishment bullet point (the person was working for a large consulting firm).

TheNewsIsHere · 2024-10-30T11:23:09 1730287389

I’m with you on this, at least in spirit.

I’ve tried AI coding assistance tools multiple times. Notably with Ansible and then with some AWS stuff using Amazon Q. I decided I wanted to be curious and see how it went.

With Ansible, it was an unusable mess. Ansible is a bit of an odd bird of a DSL though. You really need to be on top of versions and modules. It’s not well suited for AI because it requires a lot of nuanced understanding. There’s no one resource that will work (effectively) forever, like C or something.

With AWS’s Amazon Q, I started out simple by having it write an IAM policy that I was having a hard time wrapping my head around. It wasn’t very helpful because the policies it provided used condition keys that weren’t supported in the service that was addressed by the policy.

I’ve found I can typically work with higher quality and less fixing by just writing it myself. I could babysit and teach an AI, but at that point what’s the point?

I’m also unconvinced it’s worth the environmental impact, especially if I need to tutor it or mark up its output anyway.

In any event, it’s easy enough to outsmart/out-clever oneself or a colleague (and vice versa). Adding AI to that just seems like adding a chaotic junior developer to that equation.

OJFord · 2024-10-30T02:17:02 1730254622

> As a programmer of over 20 years - this is terrifying.

> I'm willing to accept that I just have "get off my lawn" syndrome or something.

> But the idea of letting an LLM write/move large swaths of code seems so incredibly irresponsible.

My first thought was that I disagree (though I don't use or like this in-IDE AI stuff) because of version control. But then the way people use (or can't use) a VCS 'terrifies' me anyway, so maybe I agree? It would be fine if handled correctly, but it won't be, sort of thing.

com2kid · 2024-10-30T02:26:26 1730255186

> But the idea of letting an LLM write/move large swaths of code seems so incredibly irresponsible.

People felt the same about compilers for a long time. And justifiably so: the idea that compilers are reliable is quite a new one, and finding compiler bugs used to be pretty common. (Those experimenting with newer languages still get to enjoy the fun of this!)

How about other code generation tools? Presumably you don't take much umbrage with schema generators? Or code generators that take a schema and output library code (OpenAPI, Protocol buffers, or even COM)? Those can easily take a few dozen lines of input and output many thousands of LoC, and because they are part of an automated pipeline, even if you do want to fix the code up, any fixes you make will be destroyed on the next pipeline run!

But there is also a LOT of boring boilerplate code that can be automated.

For example, the necessary code to create a new server, attach a JSON schema to a POST endpoint, validate a bearer token, and enable a given CORS config is pretty cut and dry.

If I am ramping up on a new backend framework, I can either spend hours learning the above and then copy and paste it forever more into each new project I start up, or I can use an AI to crap the code out for me.

(Actually once I was setting up a new server and I decided to not just copy and paste and to do it myself, I flipped the order of two `use` directives and it cost me at least 4 hours to figure out WTF was wrong....)

> As a programmer of over 20 years

I'm almost up there, and my view is that I have two modes of working:

1. Super low level, where my intimate knowledge of algorithms, the language and framework I'm using, of CPU and memory constraints, all come together to let me write code that is damn near magical.

2. Super high level, where I am architecting a solution using design patterns and the individual pieces of code are functionally very simple, and it is how they are connected together that really matters.

For #1, eh, for some popular problems AI can help (popular optimizations on Stack Overflow).

For #2, AI is the most useful, because I have already broken the problem down into individual bite size testable nuggets. I can have the AI write a lot of the boilerplate, and then integrate the code within the larger, human architected, system.

> So the thought of opening up a codebase that was cobbled together by an AI is just scary to me.

The AI didn't cobble together the system. The AI did stuff like "go through this array and check the ID field of each object and if more than 3 of them are null log an error, increment the ExcessNullsEncountered metric counter, and return an HTTP 400 error to the caller"
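For a sense of scale, here is a minimal Python sketch of that kind of instruction. Everything in it is an assumption for illustration: the `id` key, the in-memory `METRICS` dict standing in for a real metrics client, and returning a `(status, message)` pair instead of using any particular web framework.

```python
import logging

logger = logging.getLogger("example")

# Hypothetical in-memory counter standing in for a real metrics client.
METRICS = {"ExcessNullsEncountered": 0}

MAX_NULL_IDS = 3  # "more than 3 of them are null" from the description


def check_ids(items):
    """Return an (http_status, message) pair for a list of dict objects."""
    # Count objects whose ID field is missing or explicitly None.
    null_count = sum(1 for item in items if item.get("id") is None)
    if null_count > MAX_NULL_IDS:
        logger.error("Too many null IDs: %d of %d", null_count, len(items))
        METRICS["ExcessNullsEncountered"] += 1
        return 400, "too many objects without an ID"
    return 200, "ok"
```

The point is that the unit of work is small and easy to review and test in isolation, whoever typed it.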

Edit: This just happened

I am writing a small Canvas game renderer, and I am having an issue where text above a character's head renders off the canvas. So I had Cursor fix the function up to move text under a character if it would have been rendered above the canvas area.

I was able to write the instructions out to Cursor faster than I could have found a pencil and paper to sketch out what I needed to do.
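The logic of that fix is roughly the following, sketched here in Python rather than the actual Canvas code. The function name, the parameters and the `padding` default are hypothetical; the only thing assumed about the canvas is that y = 0 is the top edge and y grows downward.

```python
def label_y(char_top, char_bottom, text_height, padding=4):
    """Pick the y coordinate for the top edge of a character's label.

    Assumes canvas-style coordinates: y = 0 is the top edge, y grows downward.
    """
    above = char_top - padding - text_height
    if above >= 0:
        # The label fits between the top of the canvas and the character.
        return above
    # Otherwise place the label just under the character instead.
    return char_bottom + padding
```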

bongodongobob · 2024-10-29T22:17:31 1730240251

You. Can. Write. Tests.

0xFACEFEED · 2024-10-29T22:39:49 1730241589

How do tests account for cases where I'm looking at a 100 line function that could have easily been written in 20 lines with just as much, if not more, clarity?

It reminds me of a time (long ago) when the trend/fad was building applications visually. You would drag and drop UI elements and define logic using GUIs. Behind the scenes the IDE would generate code that linked everything together. One of the selling points was that underneath the hood it's just code so if someone didn't have access to the IDE (or whatever) then they could just open the source and make edits themselves.

It obviously didn't work out. But not because of the scope/scale (something AI code generation solves) but because, it turns out, writing maintainable secure software takes a lot of careful thought.

I'm not talking about asking an AI to vomit out a CRUD UI. For that I'm sure it's well suited and the risk is pretty low. But as soon as you introduce domain specific logic or non-trivial things connected to the real world - it requires thought. Often times you need to spend more time thinking about the problem than writing the code.

I just don't see how "guidance" of an LLM gets anywhere near writing good software outside of trivial stuff.

Aeolun · 2024-10-30T00:08:07 1730246887

> How do tests account for cases where I'm looking at a 100 line function that could have easily been written in 20 lines with just as much, if not more, clarity?

That’s not a failure of the AI writing that 100 line monstrosity, it’s a failure of you deciding to actually use the thing.

If you know what 20 lines are necessary and the AI doesn’t output that, why would you use it?

pama · 2024-10-30T01:05:45 1730250345

> How do tests account for cases where I'm looking at a 100 line function that could have easily been written in 20 lines with just as much, if not more, clarity?

If the function is fast to evaluate and you have thorough coverage by tests, you could iterate with an LLM that aims to compress it down to a simpler / shorter version that behaves identically to the original function. Of course brevity for the sake of brevity can lead to less code that is not always clearer or simpler to understand than the original. LLMs are very good at mimicking code style, so show them a lot of your own code and ask them to mimic it and you may be surprised.
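One way to sketch the "behaves identically" check in Python is to run the original and the shorter candidate on many random inputs and compare results. Both functions below are made-up stand-ins, and random comparison is only evidence of equivalence, not a proof.

```python
import random


def original(values):
    # Deliberately verbose stand-in for the long version.
    kept = []
    for v in values:
        if v is not None:
            if v >= 0:
                kept.append(v)
    total = 0
    for v in kept:
        total = total + v
    return total


def candidate(values):
    # Shorter rewrite, e.g. one proposed by an LLM.
    return sum(v for v in values if v is not None and v >= 0)


def behaves_identically(f, g, trials=1000, seed=0):
    """Compare f and g on random lists of ints and None values."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 20)
        data = [rng.choice([None, rng.randint(-50, 50)]) for _ in range(length)]
        if f(data) != g(data):
            return False
    return True
```

If `behaves_identically(original, candidate)` ever returns False, the rewrite gets rejected and you ask for another iteration.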

fullstackchris · 2024-10-30T02:51:36 1730256696

Finally found a comment down here that I like. I'm also with the notion of tests and also iterating until you get to a solution you like. I also don't see anything particularly "terrifying" that many other comments suggest.

At the end of the day, we're engineers that write complex symbols on a 2d canvas, for something that is (ultimately, even if the code being written is machine to machine or something) used for some human purpose.

Now, if those complex symbols are readable, fully covered in tests, and meet requirements / specifications, I don't see why I should care if a human, an AI, or a monkey generated those symbols. If it meets the spec, it meets the spec.

Seems like most people in these threads are making arguments against others who are describing usage of these tools in a grossly incorrect manner from the get go.

I've said it before in other AI threads that I think (at least half?) of the noise and disagreement around AI generated code is like a bunch of people trying to use a hammer when they needed a screwdriver and then complaining that the hammer didn't work like a screwdriver!!! I just don't get it. When you're dealing with complex systems, i.e., reality, these tools (or any tool for that matter) will never work like a magic wand.

nprateem · 2024-10-30T06:27:25 1730269645

I'm sure people who are "terrified" either haven't really tried AI or are so attached to their intellect their egos won't allow them to admit that there's little value now in a lot of the stuff they've memorised over the last few years.

I think this egoic threat is the biggest influence on this kind of thinking tbh.

scubbo · 2024-10-30T03:41:16 1730259676

> a bunch of people trying to use a hammer when they needed a screwdriver and then complaining that the hammer didn't work like a screwdriver

When it's being sold as a screwdriver, that's hardly their fault.

fullstackchris · 2024-10-30T08:05:05 1730275505

I'll give you that! Too much of this stuff is sold as the "magic wand" solution... I guess marketing for many products has been like that for a long time...

nprateem · 2024-10-30T06:23:27 1730269407

Sure but you go fast on the simple parts it's good at and slow on the novel/critical part. It's not that hard to understand. You don't drive at top speed everywhere either. You go slower depending on the context.

The real problem with AI coding is not knowing in advance the cases where it's going to spin its wheels and go in a cycle of stupid answers. I lose 20 minutes at a time that way because it seems like it needs just one more prompt, but in the end I have to step in, either with code or telling it specifically where the bug is.

hakunin · 2024-10-29T22:39:43 1730241583

How do you write a test for code clarity / readability / maintainability?

pnut · 2024-10-30T13:18:58 1730294338

If the code is a generated artefact and conforms to the test cases, who cares?

What maintenance is required when the codebase is regenerated from scratch on each build?

lanternfish · 2024-10-29T22:35:43 1730241343

Tests aren't a full solution for all the considerations of the above post.

TeMPOraL · 2024-10-29T23:36:37 1730244997

More importantly, you can read diffs.

Depending on whether I'm using LLMs from my Emacs or via a tool like Aider, I either review and manually merge offered modifications as diffs (in editor), or review the automatically generated commits (Aider). Either way, I end up reading a lot of diffs and massaging the LLM output on the fly, and nothing that I haven't reviewed gets pushed to upstream.

I mean, people aren't seriously pushing unreviewed LLM-generated code to production? Current models aren't good enough for that.

fzeroracer · 2024-10-30T02:47:06 1730256426

The most common failure of TDD is assuming that just bolting on more tests will fix the problem of a poorly designed codebase.

ok_dad · 2024-10-29T22:35:02 1730241302

Tests haven’t saved us so far, humans have been writing tests that passed for software with bugs for decades.

the_real_cher · 2024-10-29T22:36:20 1730241380

Even better you can let the AI write tests.

blitzar · 2024-10-29T22:35:45 1730241345

Just let the LLM do that too.

bongodongobob · 2024-10-29T22:27:54 1730240874

My theory is grammatical correctness and specificity. I see a lot of people prompt like this:

"use python to write me a prog that does some dice rolls and makes a graph"

Vs

"Create a Python program that generates random numbers to simulate a series of dice rolls. Export a graph of the results in PNG format."

Information theory requires that you provide enough actual information. There is a minimum amount of work to supply the input. Otherwise, the gaps will get filled in with noise, which may or may not work and may or may not be what you want.

For example, maybe someday you could say "write me an OS" and it would work. However, to get exactly what you want, you still have to specify it. You can only compress so far.

foobiekr · 2024-10-30T01:12:02 1730250722

The most likely explanation is that the code you are writing has low information density and is stringing things together the same way many existing apps have already done.

That isn’t a judgement, but trying to use the AI code completion tools for complex systems tasks is almost always a disaster.

fullstackchris · 2024-10-30T08:10:05 1730275805

Not sure what you mean by "complex systems tasks" but most of the leading models have helped me with writing concurrent Go code just fine. Not sure if that counts as "complex" enough. However this was prompting, not completion. Obviously I expect something like Copilot to pick the normie non-concurrent implementation.

foobiekr · 2024-10-30T16:32:25 1730305945

It doesn’t. You’re kind of illustrating what I’m getting at.

__float · 2024-10-29T21:59:19 1730239159

I'm not sure how many people are like me, but my attempts to use Copilot have largely been in the context of writing code as usual, occasionally getting end-of-line or handful-of-lines completions from it. I suspect there's probably a bigger shift needed, but I haven't seen anyone (besides AI "influencers" I don't trust..?) showing what their day-to-day workflows look like.

Is there a Vimcasts equivalent for learning the AI editor tips and tricks?

sbarre · 2024-10-29T22:49:54 1730242194

Have you tried the chat mode?

The autocomplete is somewhere between annoying and underwhelming for me, but the chat is super useful. Being able to just describe what you're thinking or what you're trying to do and having a bespoke code sample just show up (based on the code in your editor) that you can then either copy/paste in, cherry-pick from or just get inspired by, has been a great productivity booster..

Treat it like a pair programmer or a rubber duck and you might have a better experience. I did!

whimsicalism · 2024-10-29T23:07:02 1730243222

Yeah using a chat interface

hitradostava · 2024-10-30T07:17:59 1730272679

I agree with you and it's confusing to me. I do think there is a lot of emotion at play here - rather than cold rationality.

Using LLM based tools effectively requires a change in workflow that a lot of people aren't ready to try. Everyone can share their anecdote of how an LLM has produced stupid or buggy code, but there is way too much focus on where we are now, rather than the direction of travel.

I think existing models are already sufficient, it's just that we need to improve the feedback loop. A lot of the corrections / direction I make to LLM produced code could 100% be done by a better LLM agent. In the next year I can imagine tooling that:

- lets me interact fully via voice
- has a separate "architecture" agent that ensures any produced code is in line with the patterns in a particular repo
- automatically feeds compile and runtime errors back in and fixes them
- has a refactoring workflow mode, where the aim is to first get tests written, then get the code working, and then get the code efficient, clean and in line with repo patterns

I'm excited by this direction of travel, but I do think it will fundamentally change software engineering in a way that is scary.

RheingoldRiver · 2024-10-30T07:19:46 1730272786

> Using LLM based tools effectively requires a change in workflow that a lot of people aren't ready to try

This is a REALLY good summary of it I think. If you lose your patience with people, you'll lose your patience with AI tooling, because AI interaction is fundamentally so similar to interacting with other people

hitradostava · 2024-10-30T07:29:38 1730273378

Exactly, and LLM based tools can be very frustrating right now - but if you view the tooling as a very fast junior developer with very broad but shallow knowledge then you can develop a workflow which for many (but not all) tasks is much, much faster than writing code by hand.

zeroonetwothree · 2024-10-29T22:01:29 1730239289

I guess for me it actually takes longer to review code than to write it. So maybe that’s some of the difference.

2024user · 2024-10-29T23:20:14 1730244014

If you're doing something that appears in its training data a lot, like building a Twitter clone, then it is great. If you're using something brand new like React Router 7 then it makes mistakes.

Myrmornis · 2024-10-30T06:53:42 1730271222

I think it's bimodal because there's a roughly bimodal distribution of high level attitudes among programmers. There's one clump that are willing to be humble and interact with the AI in a thoughtful, careful manner, acknowledging that it might be smarter than them (e.g. see Terry Tao's comments regarding mathematics usage about how in order to get good results he takes care with what he puts in (and imagine what "care" means for a professional mathematician!)) and there's another clump who aren't.

DeathArrow · 2024-10-30T06:48:46 1730270926

>My theory is the willingness to baby sit and the modality. I'm perfectly fine telling the tool I use its errors and working side by side with it like it was another person.

In my experience, babysitting the AI takes too much time and effort. I'd rather do it myself and use AI for tasks I don't have to babysit.

tartoran · 2024-10-30T16:43:03 1730306583

> I'm actually very curious why AI use is such a bi-modal experience. I've used AI to move multi thousand line codebases between languages. I've created new apps from scratch with it.

I think this depends on the nature of your work. I've been successful with using LLMs for creating things from scratch, for myself, especially in a domain I was not familiar with, and am quite happy with that. Things like proofs of concept or exploring a library or a framework. But in my current work setting, relying on LLMs to do production work is only somewhat helpful here and there, and nowhere near as helpful as in the first case. In some cases it hallucinated so close to what it was supposed to do that it introduced a bug I would never have created had I not used LLMs, and that took a lot of effort to spot.

Tainnor · 2024-10-30T09:59:49 1730282389

> I've created new apps from scratch with it.

I don't think that "creating new apps from scratch" should be the benchmark. Unless you're doing something very novel, creating a new app/service is rather formulaic. Many frameworks even have templates / generators for that sort of thing. LLMs are maybe just better generators - which is not useless, but it's not where the real complexity of software development lies.

The success stories I am looking for are things like "I migrated a Java 6 codebase with a legacy app server to Java 21", "I ripped out this unsupported library across the project and replaced it with a better one" or "I refactored the codebase so that the database access is in its own layer". If LLMs can do those tasks reliably, then I'll have another look.

ta12653421 · 2024-10-30T13:54:21 1730296461

Well, for the latter parts they already do this with 90%+ accuracy, I'd say - in the end you are gluing together the pieces you get from the LLM.

ClaudeAI Pro rocks, since you can create Projects/Workspaces and put documents into them.

Let me ask: How much have you already worked with LLMs like Claude?

dools · 2024-10-30T02:36:31 1730255791

> I'm actually very curious why AI use is such a bi-modal experience

I think it's just that it's better at some things than others. Lucky for people who happen to be working in python/node/php/bash/sql/java probably unlucky for people writing Go and Rust (I'm hypothesising because I don't know Go or Rust nor have I ever used them but when the AI doesn't know something it REALLY doesn't know it, like it goes from being insanely useful to utterly useless).

> I use AI autocomplete 0% of the time as I found that workflow was not as effective as me just writing code, but most of my most successful work using AI is a chat dialogue where I'm letting it build large swaths of the project a file or parts of a file at a time, with me reviewing and coaching.

Me too, the way I use it is more like pair programming.

alaaalawi · 2024-10-31T16:26:41 1730392001

YMMV, but for me at least, I tend to use it for brainstorming, i.e. an initial sail through a subject/topic/task to get an initial idea. The idea is to use it as an admin who is guided by you through chatting. For example, I'm given a task to translate a user description/requirement into pulling something from the database, like (simplistic example) what are the top grossing films by category within each rating. So I give the AI the database table schemas and give it the user requirement literally, see what it gives back, and compare it with how I'd do it. Then I ask it for optimizations, what else can be done, etc., and keep chatting with the AI until I'm bored ;)

SergeyHack · 2024-10-31T22:11:36 1730412696

The same here. It helps my slow autistic context switching a lot.

It's easier for me to recognize the ideas, concepts, keywords it has written than to recall them from memory when my "memory cache" is empty.

zkry · 2024-10-30T06:19:38 1730269178

I'm curious coming from the other end. I guess I can totally understand certain use cases where I'm generating fairly simple, self contained code in a language I'm unfamiliar with being good.

But surely you must have experienced something where you're literally fighting with the model, where it continuously repeats its mistakes, and fixing a mistake in one place breaks something else, and you can't seem to escape this loop. You then get desperate, invoking magic phrases like "you think through your problems step by step", or "you are a senior developer", only for it to lose the entire thread of the conversation.

Then the worst part is when you finally give up, your mental state of the problem is no better than when you first started off.

Vegenoid · 2024-10-30T07:23:46 1730273026

This is my experience. I’d love to see some full streams of people building whole useful apps from scratch with an LLM, does anyone have any good examples?

IgorPartola · 2024-10-30T11:48:41 1730288921

> I've used AI to move multi thousand line codebases between languages.

And are you certain you’ve reviewed all use cases to make sure no errors were introduced?

I recently tried using Google’s AI assistant for some basic things like creating a function that could parse a storage size in the format of 12KB or 34TB into an actual number of bytes. It confidently gave me `amount, units = s.split()`, which is just not correct. It even added a comment explaining what that line is meant to do.

This was an obvious case that just didn’t work. But imagine it did work but flew into an infinite loop on an input like “12KB7” or some such.
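For reference, `"12KB".split()` returns the single-element list `['12KB']`, so unpacking it into two names raises a `ValueError`. A minimal sketch of a stricter parser is below. It assumes decimal units (1 KB = 1000 bytes), which the description above does not specify, and the name `parse_size` is made up; it rejects trailing garbage like "12KB7" instead of looping or guessing.

```python
import re

# Assumption: decimal (power-of-1000) units; swap in powers of 1024 if needed.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}

# A number, optional whitespace, then letters only; nothing may follow the unit.
_SIZE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*")


def parse_size(text):
    """Convert a string such as '12KB' or '34 TB' into a number of bytes."""
    match = _SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a storage size: {text!r}")
    amount, unit = match.groups()
    unit = unit.upper()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit in {text!r}")
    return int(float(amount) * _UNITS[unit])
```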

DirkH · 2024-10-30T15:32:51 1730302371

I'm convinced what we are witnessing is that there are genius level engineers (lots of them) who are, and in many cases always have been, sub-par communicators. I think being a good communicator tracks really well with how much someone can get out of LLMs (as does engineering competency. You need both).

Great but not genius engineers who are also great communicators may broadly outperform people with only technical genius soon, but that's speculation on my part.

tivert · 2024-10-30T05:30:50 1730266250

> ...and I can review code very quickly so the overall productivity boost has been great.

Color me skeptical. After a certain point, greater speed is achieved by sacrificing accuracy and comprehension. So, "I can review code very quickly" starts to sound like "I don't read, I skim."

IMHO, reviewing code is one of the parts of the job that sucks, so I see "AI" as a wonderful technology to improve our lives by replacing fun with chores.

ta12653421 · 2024-10-30T13:52:14 1730296334

+++1

THIS!

Exactly my style of working and how I think about it: I've also not enabled/installed Copilot or similar, just AutoSuggestion by default in VS.NET. But I use LLMs heavily to get rid of all the exhausting tasks, and to generate ideas for what to improve in some larger code blocks so I don't have to rewrite/refactor them on my own.

It boosts my productivity by 10x at least.

Bjartr · 2024-10-30T02:50:57 1730256657

Interesting that you find the conversational approach effective. For me, I'd say 9 out of 10 code conversations get stuck in a loop with me telling the AI the next suggested iteration didn't actually change anything or changed it back to something that was already broken. Do you not experience that so often, or do you have a way to escape that?

bequanna · 2024-10-30T03:04:44 1730257484

I encounter that issue when the chat becomes too long.

Starting a new chat with context and asking your question again typically works for me.

andsoitis · 2024-10-30T03:42:36 1730259756

> I'm perfectly fine telling the tool I use its errors and working side by side with it like it was another person.

This is key. Traditional computing systems are deterministic machines, but AI is a probabilistic machine. So the way you interact and the range, precision, and perspective of the output stretches over a different problem/solution space.

torginus · 2024-10-30T11:09:57 1730286597

I've tried using AI (Claude) to do refactors/move code between languages, and in my experience, it has a tendency to go off the rails and just start making up code that does something similar, essentially doing a rewrite that never works.

disgruntledphd2 · 2024-10-30T13:18:56 1730294336

I like to believe (it may not be true though), that the AI has learned what code actually exists in the wild, and is doing what all of us end up doing when trying to refactor a system we don't understand, writing new, similarish code as writing code is more fun than reading it.

apwell23 · 2024-10-30T03:04:32 1730257472

> move multi thousand line codebases between languages

I am more curious about why someone would do this.

rqmedes · 2024-10-29T22:23:01 1730240581

I agree. I am in a very senior role and find that working with AI the same way you do I am many times more productive. Months of work becomes days or even hours of work

mhuffman · 2024-10-30T15:06:55 1730300815

>I've created new apps from scratch with it.

Would you mind sharing those apps to view? Not the code, just the apps. I have a suspicion about the bi-modal experience.

xk_id · 2024-10-29T23:25:07 1730244307

> I'm actually very curious why AI use is such a bi-modal experience.

My conspiracy theory is that the positive experiences are exaggerated and come from investors in the Nvidia stock.

eyegor · 2024-10-30T04:48:08 1730263688

Have you tried using chatgpt/etc as a starting point when you're unfamiliar with something? That's where it really excels for me, I can go crazy fast from 0 to ~30 (if we call 60 mvp). For example, the other day I was trying to stream some pcm audio using webaudio and it spit out a mostly functional prototype for me in a few minutes of trying. For me to read through msdn and get to that point would've taken an hour or two, and going from the crappy prototype as a starting point to read up on webaudio let me get an mvp in ~15 mins. I rarely touch frontend web code so for me these tools are super helpful.

On the other hand, I find it just wastes my time in more typical tasks like implementing business logic in a familiar language cause it makes up stdlib apis too often.

silisili · 2024-10-30T05:22:35 1730265755

This is about the only use case I found it helpful for - saving me time in research, not in coding.

I needed to compare compression ratios of a certain text in a language, and it actually came up with something nice and almost workable. It didn't compile but I forgot why now, I just remember it needing a small tweak. That saved me having to track down the libraries, their APIs, etc.

However, when it comes to actually doing data structures or logic, I find it quicker to just do it myself than to type out what I want to do, and double check its work.

xk_id · 2024-10-30T11:06:57 1730286417

I don’t really care about broken clocks even if to someone they are useful twice a day.

Nullabillity · 2024-10-29T23:46:47 1730245607

That's a very important caveat. In our modern economy it's difficult to not be a shill in some way, shape, or form, even if you don't quite realize it consciously. It's honestly one of the most depressing things about the stock market.

malux85 · 2024-10-30T02:29:00 1730255340

There's a big difference between being a happy customer and being a shill.

Nullabillity · 2024-10-30T02:43:30 1730256210

Holding stock is not being a "happy customer". I may be happy with the headset that I bought, but the difference is that I don't make money if you buy an identical one.

malux85 · 2024-10-30T03:02:46 1730257366

I wasn't talking about holding stock, I was responding to this comment you made:

> In our modern economy it's difficult to not be a shill in some way, shape, or form, even if you don't quite realize it consciously.

Oxford dictionary defines a shill as "an accomplice of a confidence trickster or swindler who poses as a genuine customer to entice or encourage others."

So the difference between someone shilling and being a satisfied customer is an intent to deceive. How is it "difficult to not pose as a genuine customer to entice or encourage others"?

xk_id · 2024-10-30T00:03:04 1730246584

I credit my past interest in cryptocurrencies for educating me about the essence of the stock market in its purest form. And in fact there are painful parallels with the AI bubble.

bob1029 · 2024-10-29T20:28:46 1730233726

It's the subtle errors that are really difficult to navigate. I got burned for about 40 hours on a conditional being backward in the middle of an otherwise flawless method.

The apparent speed up is mostly a deception. It definitely helps with rough outlines and approaches. But, the faster you go, the less you will notice the fine details, and the more assumptions you will accumulate before realizing the fundamental error.

I'd rather find out I was wrong within the same day. I'd probably have written some unit tests and played around with that function a lot more if I had handcrafted it.

tanseydavid · 2024-10-29T20:58:15 1730235495

>> The apparent speed up is mostly a deception.

When I am able to ask an LLM a very simple question, which saves me from having to context-switch to answer the same question myself, that is a big time saver for me, though a hard-to-quantify one.

Anything that reduces my cognitive load when the pressure is on is a blessing on some level.

bongodongobob · 2024-10-29T22:12:16 1730239936

Cognitive load is something people always leave out. I can fuckin code drunk with these things. Or just increase stamina to push farther than I would writing every single line.

oogetyboogety · 2024-10-29T21:05:08 1730235908

This might be the measurable "some" non-deceptive time saving, whereas most of the rest is still deceptive in terms of time saved.

0xFACEFEED · 2024-10-29T21:56:38 1730238998

You could make the same argument for any non-AI driven productivity tool/technique. If we can't trust the user to determine what is and is not time-saving then time-saving isn't a useful thing to discuss outside of an academic setting.

My issue with most AI discussions is they seem to completely change the dimensions we use to evaluate basic things. I believe if we replaced "AI" with "new useful tool" then people would be much more eager to adopt it.

What clicked for me is when I started treating it more like a tool and less like some sort of nebulous pandora's box.

Now to me it's no different than auto completing code, fuzzy finding files, regular expressions, garbage collection, unit testing, UI frameworks, design patterns, etc. It's just a tool. It has weaknesses and it has strengths. Use it for the strengths and account for the weaknesses.

Like any tool it can be destructive in the hands of an inexperienced person or a person who's asking it to do too much. But in the hands of someone who knows what they're doing and knows what they want out of it - it's so freakin' awesome.

Sorry for the digression. All that to say that if someone believes it's a productivity boost for them then I don't think they're being misled.

tensor · 2024-10-29T21:47:34 1730238454

Except actual studies objectively show efficiency gains, more with junior devs, which makes sense. So no, it's not a "deception" but it is often overstated in popular media.

zeroonetwothree · 2024-10-29T22:00:40 1730239240

Studies have limitations, in particular they test artificial and narrowly-scoped problems that are quite different from real world work.

tensor · 2024-10-29T23:45:16 1730245516

And anecdotes are useless. If you want to show me improved studies justifying your claim great, but no I don't value random anecdotes. There are countless conflicting anecdotes (including my own).

rqmedes · 2024-10-29T22:18:21 1730240301

I find the opposite: the more senior the developer, the more value these tools offer, since you know how to ask the right questions, how to vary the questions and try different tacks, and how to spot errors or mistakes.

enneff · 2024-10-29T20:47:14 1730234834

That’s the thing, isn’t it? The craft of programming in the small is one of being intimate with the details, thinking things through conscientiously. LLMs don’t do that.

__MatrixMan__ · 2024-10-29T22:07:40 1730239660

I find that it depends very heavily on what you're up to. When I ask it to write nix code it'll just flat out forget how the syntax works halfway through. But if I want it to troubleshoot an emacs config or wield matplotlib it's downright wizardly, often including the kind of thing that does indicate an intimacy with the details. I get distracted because I'm then asking it:

> I un-did your change which made no sense to me and now everything is broken, why is what you did necessary?

I think we just have to ask ourselves what we want it to be good at, and then be diligent about generating decades worth of high quality training material in that domain. At some point, it'll start getting the details right.

esafak · 2024-10-30T01:53:10 1730253190

That doesn't work in the tech industry, because almost nothing is decades old, for obvious reasons.

__MatrixMan__ · 2024-10-30T04:59:35 1730264375

What languages/toolkits are you working with that are less than 10 years old?

Anyhow, it seems to me like it is working. It's just working better for the really old stuff because:

- there has been more time for training data to accumulate

- some of it predates the trend of monetizing data, so there was less hoarding and more sharing

It may be that the hard slow way is the only way to get good results. If the modern trends re: products don't have the longevity/community to benefit from it, maybe we should fix that.

Nevermark · 2024-10-29T20:58:32 1730235512

Perhaps it should be prompted to then?

Ask it to review its own code for any problems?

Also identify typical and corner cases and generate tests?

Question marks here because I have not used the tool.

The size & depth of each accepted code step is still up to the developer slash prompter

nrclark · 2024-10-29T21:16:41 1730236601

I use ChatGPT for coding / API questions pretty frequently. It's bad at writing code with any kind of non-trivial design complexity.

There have been a bunch of times where I've asked it to write me a snippet of code, and it cheerfully gave me back something that doesn't work for one reason or another. Hallucinated methods are common. Then I ask it to check its code, and it'll find the error and give me back code with a different error. I'll repeat the process a few times before it eventually gets back to code that resembles its first attempt. Then I'll give up and write it myself.

As an example of a task that it failed to do: I asked it to write me an example Python function that runs a subprocess, prints its stdout transparently (so that I can use it for running interactive applications), but also records the process's stdout so that I can use it later. I wanted something that used non-blocking I/O methods, so that I didn't have to explicitly poll every N milliseconds or something.
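
For reference, here is a minimal sketch of one way to approach the task described above, assuming Python 3.9+ and `asyncio`. The names `run_and_capture` and `_echo` and the 4096-byte read size are illustrative choices, not anything from this exchange. It awaits reads on the child's stdout pipe instead of polling on a timer, echoes each chunk as it arrives, and keeps a copy for later.

```python
import asyncio
import sys


def _echo(chunk: bytes) -> None:
    """Write a chunk to our own stdout right away (hypothetical helper)."""
    out = getattr(sys.stdout, "buffer", None)
    if out is not None:
        out.write(chunk)
        out.flush()
    else:
        # stdout was replaced by a text-only stream; fall back to decoded text.
        sys.stdout.write(chunk.decode(errors="replace"))
        sys.stdout.flush()


async def run_and_capture(*cmd: str) -> tuple[int, bytes]:
    """Run cmd, echo its stdout as it arrives, return (exit code, captured stdout)."""
    # stdin and stderr are inherited from the parent, so the user can still type.
    proc = await asyncio.create_subprocess_exec(*cmd, stdout=asyncio.subprocess.PIPE)
    assert proc.stdout is not None
    captured = bytearray()
    while True:
        # Suspends until data is available or the pipe closes; no timed polling.
        chunk = await proc.stdout.read(4096)
        if not chunk:
            break
        _echo(chunk)
        captured.extend(chunk)
    returncode = await proc.wait()
    return returncode, bytes(captured)


if __name__ == "__main__":
    code, output = asyncio.run(
        run_and_capture(sys.executable, "-c", "print('hello from child')")
    )
    print(f"exit={code}, captured {len(output)} bytes")
```

One caveat with this approach: many programs switch to block buffering or change their behavior when stdout is a pipe rather than a terminal, so a truly interactive application would likely need a pseudo-terminal instead, which this sketch does not attempt.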

bongodongobob · 2024-10-29T22:15:21 1730240121

Honestly I find that when GPT starts to lose the plot it's a good time to refactor and then keep on moving. "Break this into separate headers or modules and give me some YAML-like markup with function names, return type, etc for each file." Or just use stubs instead of dumping every line of code in.

tomrod · 2024-10-29T21:32:02 1730237522

How long are you willing to iterate to get things right?

bongodongobob · 2024-10-29T22:16:31 1730240191

If it takes almost no cognitive energy, quite a while. Even if it's a little slower than what I can do, I don't care because I didn't have to focus deeply on it and have plenty of energy left to keep on pushing.

Nevermark · 2024-10-30T00:42:26 1730248946

As my mother used to say, "I love work. I could watch it all day!"

I can see where you are coming from.

Maintaining a better creative + technical balance, instead of see-sawing. More continuous conscious planning, less drilling.

Plus the unwavering, tireless help of these AIs seems psychologically conducive to maintaining one's own motivation. Even if I end up designing an elaborate garden estate or a simpler better six-axis camera stabilizer/tracker, or refactoring how I think of primes before attempting a theorem, ... when that was not my agenda for the day. Or any day.

Bjartr · 2024-10-30T02:55:24 1730256924

I'm constantly having to go back and tell the AI about every mistake it makes and remind it not to reintroduce mistakes that were previously fixed. "no cognitive energy" is definitely not how I would describe that experience.

bongodongobob · 2024-10-30T20:49:53 1730321393

Sounds like the context window is getting pruned. Start a new chat fresh after you make significant changes.

EVa5I7bHFq9mnYK · 2024-10-29T22:10:05 1730239805

That's presumably what o1-preview does? Iterates and checks the result. It takes much longer, but does indeed write slightly better code.

pawelduda · 2024-10-29T22:04:41 1730239481

Exactly, 1 step forward, 1 step backward. Avoiding edge cases is something that can't be glossed over, and for that I need to carefully review the code. Since I'm accountable for it, and can't skip this part anyway, I'd rather review my own than some chatbot's.

tensor · 2024-10-29T21:46:10 1730238370

Why aren't you writing unit tests just because AI wrote the function? Unit tests should be written regardless of the skill of the developer. Ironically, unit tests are also one area where AI really does help move faster.

High-level design, rough outlines and approaches, is the worst place to use AI. The other place AI is pretty good is surfacing API calls or function calls you might not know about if you're new to the language. Basically, it can save you a lot of time by avoiding the need for tons of internet searching in some cases.

chairhairair · 2024-10-29T21:57:34 1730239054

I have completely the opposite perspective.

Unit tests actually need to be correct, down to individual characters. Same goes with API calls. The API needs to actually exist.

Contrast that with "high level design, rough outlines". Those can be quite vague and hand-wavy. That's where these fuzzy LLMs shine.

That said, these LLM-based systems are great at writing "change detection" unit tests that offer ~zero value (or negative).

Aeolun · 2024-10-29T22:09:32 1730239772

> That said, these LLM-based systems are great at writing "change detection" unit tests that offer ~zero value (or negative).

That’s not at all true in my experience. With minimal guidance they put out pretty sensible tests.

filoleg · 2024-10-29T23:27:45 1730244465

> With minimal guidance[, LLM-based systems] put out pretty sensible tests.

Yes and no. They get out all the initial annoying boilerplate of writing tests out of the way, and the tests end up being mostly decent on the surface, but I have to manually tweak the behavior and write most of the important parts myself, especially for non-trivial tricky scenarios.

However, I am not saying this as a point against LLMs. The fact that they are able to get a good chunk of the boring boilerplate parts of writing unit tests out of the way and let me focus on the actual logic of individual tests has been noticeably helpful to me, personally.

I only use LLMs for the very first initial phase of writing unit tests, with most of the work still being done by me. But that initial phase is the most annoying and boring part of the process for me. So even if I still spend 90% of the time writing code manually, I still am very glad for being able to get that initial boring part out of the way quickly, without wasting my mental effort cycles on it.

tensor · 2024-10-29T23:18:38 1730243918

The fact that you think "change detection" tests offer zero value speaks volumes. Those may well be the most important use of unit tests. Getting the function correct in the first place isn't that hard for a senior developer, which is often why it's tempting to skip unit tests. But then you go refactor something and oops you broke it without realizing it, some boring obvious edge case, or the like.

These tests are also very time consuming to write, with lots of boilerplate that AI is very good at writing.
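
To make the term concrete, here is a rough sketch of what a "change detection" test can look like in Python. The `paginate` helper and the cases are hypothetical, invented purely for illustration. The test pins whatever the function currently returns for a few boring edge cases, so a later refactor that changes any of them fails loudly.

```python
def paginate(items, page_size):
    """Hypothetical helper: split items into pages of at most page_size."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


def test_paginate_pins_current_behavior():
    # Empty input yields no pages at all (not one empty page).
    assert paginate([], 3) == []
    # An exact multiple of the page size yields no trailing empty page.
    assert paginate([1, 2, 3], 3) == [[1, 2, 3]]
    # A remainder ends up in a short final page.
    assert paginate([1, 2, 3, 4], 3) == [[1, 2, 3], [4]]
    # A non-positive page size is rejected instead of looping or returning junk.
    try:
        paginate([1], 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for page_size=0")


if __name__ == "__main__":
    test_paginate_pins_current_behavior()
    print("ok")
```

Whether tests like this are valuable or just noise is exactly what is being argued here; the sketch only shows the shape of the thing.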

DeathArrow · 2024-10-30T07:37:49 1730273869

> The fact that you think "change detection" tests offer zero value speaks volumes.

But code should change. What shouldn't change, if business rules don't change, is APIs and contracts. And for that we have integration tests and end to end tests.