When AI is used properly, it’s doing one of two things:
1) Making non-controversial fixes that save time and take cognitive load off the developer. The best example is when code completion is working well.
2) It’s making you smarter and more knowledgeable by virtue of the suggestions it makes. You may discard them but you still learn something new, and having an assistant brainstorm for you enables a different mode of thinking - idea triage - that can be fun, productive, useful and relaxing. People sometimes want completion to do this also, but it’s not well suited to it beyond teaching new language features by example.
The article makes an interesting assertion that AI tools “fail to scale” when the user has to remember to trigger the feature.
So how can AI usefully suggest design-level and conceptual ideas in a way that doesn’t require a user “trigger”? Within the IDE, I’m not sure. The example given of automated comment resolution is interesting and “automatic”, but not likely to be particularly high level in nature. And it also occurs in the “outer flow” of code review. It’s the “inner flow” that’s the most interesting to me because it’s when the real creativity is happening.
> So how can AI usefully suggest design-level and conceptual ideas in a way that doesn’t require a user “trigger”?
I'm just guessing here. But maybe make it part of some other (already natural and learned) trigger made by the user.
I'm thinking as part of refactoring, where your AI is not only looking at the code, but also at the LSP, recent git changes (both commit messages and branch names), and which code files you've been browsing.
And if you want to make it even more powerful, I guess part of your browser history would also be relevant (even if there are privacy concerns).
Clippy was on the right path at the wrong time. The issue wasn't the concept of an assistant that watches your work and provides help; the issue was that it could detect you were writing a letter and offer help, but the help it gave you was just a bunch of shallow formatting suggestions without context for the actual work.
An actual assistant that can preempt what you need and create it before you get there with a 95% success rate will not feel like Clippy.
Clippy from MSFT? This is where the techies really lose perspective... you see, it's not just a computer, a computer company, and a user. Real life includes social systems with social contracts, and the relationship of the user's logs, records and autonomy to the "master" of the economic relationship. Microsoft has made it clear that surveilling the user and restricting autonomy is as valuable or more valuable from a business perspective than the actual actions performed, AND at the same time, entire industry product lines are designed to trivialize and factor out personal skill and make job performance more like replaceable piece-work.
Lastly, people are more or less aware of these other dynamics at work. Yet, people are people and respond to other social cues. The sheer popularity of something being novel, cute or especially "cool" does move the needle on adoption and implementation. Clippy was a direct marketing response to Apple getting "cool" ratings with innovative GUI elements.
While I agree with some of your points about corporate data practices and the potential for technology to deskill labor, I'm not sure I see the direct connection with Clippy. Clippy, as limited as it was, seems like a poor example of these larger concerns.
Furthermore, saying "techies really lose perspective" comes across as dismissive and judgmental. It's important to remember that people are drawn to technology for various reasons, and many are deeply concerned about the ethical implications of what they build. Painting an entire group as lacking awareness isn't helpful or accurate.
If we want to have a productive conversation about the impact of technology, we need to avoid generalizations and engage with each other respectfully, even when we disagree.
>Clippy was a direct marketing response to Apple getting "cool" ratings with innovative GUI elements
Honestly, I think Clippy predated that; it came out in 1996 with Office 97. Macs were only on System 7.5 back then, while I think you're thinking of the early Mac OS X era, which was 6-7 years later.
Not an AI expert but this sounds a lot like code review.
Most people are using systems to facilitate code review these days (we aren't sitting around boardrooms with printouts), so I wonder if there is a way to use those code review data streams combined with diffs to train AI to "do code reviews"?
As long as LLMs are seen as patterned knowledge aggregators, they work as intended. They hallucinate answers mostly because they try to interpolate data, or because there is some pattern they didn't catch due to lack of context or training.
They're really good for type/name finding and boilerplate generation.
For larger suggestions, as you pointed out, they're too wrong to be used as is. They can give good ideas, especially if guided with comments, but usually at this stage I just use Phind.
"When AI is used properly" is a very very loaded statement.
Especially since evidence shows code is one of the worst things it's good at; it's a lot better at other tasks.
Put another way: "LLMs are great when I don't have lots of experience in the domain but want to do something in that domain. Otherwise, my own brain works better"
I'd argue it's the opposite. If you have extremely deep knowledge you can constrain the problem and get a great answer, i.e. boilerplate for X. Boilerplate is only boilerplate because you have a certain level of knowledge.
At this point, asking an LLM to "implement feature X please" is not going to give you great results. However, unless you can type at 600 wpm, an LLM doing extremely trivial boilerplate code completion is a godsend.
From the blog post:
> We observe that with AI-based suggestions, the code author increasingly becomes a reviewer, and it is important to find a balance between the cost of review and added value. We typically address the tradeoff with acceptance rate targets.
In the past year since GPT-4 came out, I've also found this to be the case. I'm an ML/backend engineer with little experience in frontend development. Yet, I've been able to generate React UIs and Python UIs with GPT-4 in a matter of minutes and simply review the code to understand how it works. I find this to be very useful!
IMHO, review is a misnomer for where software engineering is going. I'm not sure where we are going, but review implies less responsibility for the outcome.
But I do think that we will have less depth of knowledge of the underlying processes. That's the point of having a machine do it. I expect this, however, to be a good trend: the systems will need to be up to a task before it makes sense to rely on them.
This is how progress (in developer productivity) has always been made. We coded in assembler, then used macros, then a language like C or Fortran, then more of Java/Go/Python/Rust/Ruby et al. A developer writing a for loop over a list in Python doesn't necessarily need to know about linked lists and memory patterns, because Python takes care of it. This frees that developer from the abstracted details and lets them think one level closer to the problem at a higher speed.
LLMs _can_ be a good tool in the right hands. They certainly have some way to go before they become a reliable assistant. I suppose, in the way of LLMs, they need better training before they can get there.
Frankly, it's fine more often than we may care to admit.
As the parent comment suggested, UI elements are a great candidate for this. Often very similar (how many apps have a menu bar, side bar, etc) and full of boilerplate. And at the rate things change on the front-end, it's often a candidate for frequent re-writes, so code quality and health don't need to be as strict.
It'd be nice if every piece of software ever written was done so by wise experts with hand-crafted libraries, but sometimes it's just a job and just needs to be done.
UI is a terrible example to make your point. Tell me you don’t know frontend development…
Accessibility, cross-browser and cross-platform support, design systems, SEO, consistency and polish, you name it. You are most certainly not getting that from an LLM, and most engineers don't know how, or don't have a good enough eye, to catch when the agent has gone astray or suggested a common mistake.
You definitely have a point, but the reality is that LLMs are about as good as an "average" UI developer in some cases -- lots of people who work on UI every day think very little about accessibility and don't know whether their code actually runs in a non-Chromium browser.
Does everything ever written need to be crafted by an artisan? An awful lot of useful stuff written "good enough" is good enough. Depth of knowledge or understanding is irrelevant to a lot of front-end UI development, where the key is the design itself and that the behavior of the design is accurate and reliable, not that the engineer -really- understands, at the core of their soul, GraphQL and React with the passion of a Japanese craftsman when they're building a user interface for the ML backend that internal users use for non-critical tasks. There does exist a hierarchy of when depth matters, and it's not homogeneously "literally everything you do."
When someone is using an LLM they are still the author.
Think about it like someone who is searching through record crates for a good sample. They're going to "review" certain drum breaks and then decide if it should be included in an artwork.
The reviewing that you're alluding to is like a book reviewer who has nothing to do with the finished product.
Yup, that's an old reviewer/author problem. The reviewer has a huge blind spot because they don't even know what they don't know. The author knows what he knows, but more importantly also has a bigger grasp on what he doesn't know. So he understands what's safe to do and what's not.
Well, it is not actually review in the sense where you have a PR. It is more like you are guiding and reviewing a very fast typist, in the order you decide, who in any simple case handles it 99 percent of the time.
Not every developer knows exactly how his modern CPU or memory layers work, or how electromagnetic waves build up a signal.
People use tools to make things. It's okay. Some "hardcore folks" advance the "lower level" tooling, other creative folks build actually useful things for daily life, and mostly these two groups have very little overlap IMO.
I know of no review process that produces the same level of understanding as does authorship, because the author must build the model from scratch and so must see all the details, while the reviewer is able to do less work because they're fundamentally riding on the author's understanding.
In fact, in a high-trust system, e.g. a good engineering culture in a tech company, the reviewer will learn even less, because they won't be worried about the author making serious mistakes so they'll make less effort to understand.
I've experienced this from both sides of the transaction for code, scientific papers, and general discussion. There's no shortcut to the level of understanding given by synthesizing the ideas yourself.
> "the author must build the model from scratch and so must see all the details"
This is not true. With any complex framework, the author first learns how to use it, then when they build the model they are drawing on their learned knowledge. And when they are experienced, they don't see all the details, they just write them out without thinking about it (chunking). This is essentially what an LLM does, it short-circuits the learning process so you can write more "natural", short thoughts and have the LLM translate them into working code without learning and chunking the irrelevant details associated with the framework.
I would say that whether it is good or not depends on how clunky the framework is. If it is a clunky framework, then using an LLM is very reasonable, like how using IDEs with templating etc. for Java is almost a necessity. If it is a "fluent", natural framework, then maybe an LLM is not necessary, but I would argue no framework is at this level currently and using an LLM is still warranted. Probably the only way to achieve true fluency is to integrate the LLM - there have been various experiments posted here on HN. But absent this level of natural-language-style programming, there will be a mismatch between thoughts and code, and an LLM reduces this mismatch.
I believe pretty much anyone who has observed a few cycles can tell as much.
Often the major trigger for a rewrite is that the knowledge has mostly left the building.
But then there's the cognitive dissonance, because we like pretending that the system is the knowledge and thus has economic value in itself, and that people are interchangeable. None of which is true.
It is similar to how much a student learns from working hard to solve a problem versus from being given the final solution. The effort to solve it yourself tends to give a deeper understanding and make it easier to remember.
Maybe that's not the case in all fields, but it is in my experience at least, including in software. Code I've written I know on a personal level, while I can much more easily forget code I've only reviewed.
Also, people shouldn't be allowed to use computers unless they understand how transistors work. If you don't have the depth of knowledge you get nothing.
The person I'm responding to was gatekeeping. I responded by sarcastically doing the same to an extreme degree. A lot of people will have agreed with the person I'm responding to: "Oh yeah, of course you should understand these things, the things that I already understand," genuinely not realizing that there's no basis for that. When they read my response they realize what they were doing, and are left feeling embarrassed for their senseless (and pretentious!) gatekeeping.
As a noob I copied code from Railscasts or Stack Overflow or docs or IRC without understanding it just to get things working. And then at some point I was doing less and less of it, and then rarely at all.
But what if the code I copied isn't correct?! Didn't the sky fall down? Well, things would break and I would have to figure out why or steal a better solution, and then I could observe the delta between what didn't work vs what worked. And boom, learning happened.
LLMs just speed that cycle up tremendously. The concern trolling over LLMs basically imagines a hypothetical person who can't learn anything and doesn't care. More power to them imo if they can build what they want without understanding it. That's a cracked lazy person we all should fear.
In our startup we are short on frontend software engineers.
Our project manager started helping with the UI using an IDE (Cursor, a VS Code fork) with native ChatGPT integration. In the span of six months, they have become very proficient at React.
They had wanted to learn basic frontend coding for multiple years but never managed to pass the initial hurdles.
Initially, they were only accepting suggestions made by ChatGPT and making frequent errors, but over time, they started understanding the code better and actively telling the LLM how to improve/fix it.
Now, I believe they would have the knowledge to build simple functional React frontends without assistance, but the question is why? As a team with an LLM-augmented workflow, we are very productive.
With the number of anti-patterns in React, are you sure everything's OK? The thing about declarative is that it's more like writing equations. Imperative has a tighter feedback loop, and ultimately only code organization is an issue. But so many things can go wrong with declarative, as it's one level higher in the abstraction stack.
I’m not saying that React is hard to learn. But I believe buying a good book would have them get there quicker.
Yes totally. They already tried doing a few online courses and had a few books.
I'm a senior React Dev and the code is totally fine. No more anti-patterns than juniors who I've worked with who learned without AI. The contrary actually.
Not gonna fault people for learning; I think the FUD is more so in the vein of being ignorant while working.
Yeah, you don't really need to know how transistors work to code, but you haven't needed that for two generations. Personally I think (and hope) LLM code tools replace Google and SO, more so than writing SW itself.
I got my start on a no-code visual editor. Hated it because the 5% of issues it couldn't handle took 80% of my time (with no way to actually do it in many cases). I see LLM auto-generation as the same: the problems that the tool doesn't just solve will be your job, and you still need to know things for that.
Oh come on, the user is talking about building UIs. I don't know how else you learn. Your attitude just reeks of high-horse. As if it was better to learn things from stackoverflow.
Who learned stuff from Stack Overflow? In my own case, it was all books, plus a few videos. Stack Overflow was mostly for figuring out why things had gone wrong (errors not explicit enough) or very specific patterns. And there was a peer review system which lent credibility to answers.
“In my own case...” something worked for you. So what?
Are you sure that would work for others? And that other approaches might not be more effective?
I’ve learned lots of things from SO. The top voted answers usually provide quite a bit of “why” content which have general utility or pointers to more general content.
Yes, there are insufferable people on there, but there are gatekeepers and self-centered people everywhere.
Maybe I am being snarky, but saying "I don't like that" or "that's not how I did it" just isn't that interesting. I'd love to hear why books are so much more effective, for instance, or which books, or what YT channels were useful.
Because they're consistent and they follow (the good ones) a clear path to learn what you want to learn. The explanations may not be obvious at first glance, and that's when you may need someone to present them to you from another perspective (your teacher) or provide the required foundational knowledge that you may lack. You pair it with some practice or cross-reference with other books and you can get very far. Also, they can be pretty dense in terms of information.
> which books, or what YT channels were useful.
I mostly read manuals nowadays, instead of tutorials. But I remember starting with the "Site du Zéro" books (a French platform) for C and Python. As tech is moving rapidly, for tutorial-like books it's important to get the latest one and know which software versions it's referring to.
Now I keep books like "Programming Clojure", "The Go Programming Language", the "Write Great Code" series, "Mastering Emacs", "The TCP/IP Guide", "Absolute FreeBSD", "Using SQLite", etc. They're mostly for reference purposes and deep dives into one subject.
The videos I'm talking about were courses from Coursera and MIT: algorithms, Android programming, theory of computation. There are great videos on YouTube, but they're hidden under all the worthless ones.
Having a similar experience but in the other direction. I have a lot of backend and frontend development experience but no ML background. Being able to ask stupid questions to get further faster has been making a difference for me.
I would suggest we have flexible RAM. Also, we have an awful lot of it. The analogy breaks down as soon as you look at it too seriously!
In IT we largely deal with compute, persistent storage and non-persistent storage. Roughly speaking: CPU, HDD, RAM. In humans we might be considered to have similar "abilities", but unlike IT there is mostly a single thing that performs all of those functions - the brain. That organ is both compute and storage.
LLMs can be surprisingly useful but they are a tool. As with all tools they can be abused and no doubt you have spotted all those tech blogs that spout the same old thing and often with subtle failings (hallucinations).
Keep your tools sharp and know how to safely use sharp tools.
Just because the brain is a single “thing” doesn’t mean it doesn’t have distinct types of memory. Consider looking up “working memory” as it’s probably the best analogue to RAM here.
Not really, it's more like CPU registers. Very limited, and stuff has to be in there to be computed on (consciously) (lots of unconscious computation as well, of course; a lot of efficiency to be found in moving computation from conscious to unconscious).
These kinds of similes make less and less sense nowadays because we've got NVMe storage, and that can be as fast as 7 GByte/s. That's a lot faster than the RAM in most devices today. And with less latency too.
RAM's differentiating factor is increasingly just that it can handle a lot of read/write cycles, not its speed. And that doesn't map to anything in biology.
> These kinds of similes make less and less sense nowadays because we've got NVMe storage, and that can be as fast as 7 GByte/s. That's a lot faster than the RAM in most devices today. And with less latency too.
7 GB/s is the low end of the DDR3 performance range; DDR3 is 17 years old. Meanwhile, DDR5 performance ranges from about 33.5 GB/s to about 69 GB/s. RAM latency, even on DDR3, is measured in nanoseconds; NVMe latency is measured in microseconds, making it about three orders of magnitude higher.
By quantity, most devices are low-end and old Androids. But I admit I might have phrased my comment poorly. I didn't mean to imply that high-end RAM has the same performance profile as high-end NVMe storage.
For example, I asked it for a database schema given some parameters. What it gave me wasn't quite what I wanted, but seeing it written out helped me realize some flaws in the original design I had in mind. It wasn't that what it gave me was better. The act of evaluating its suggestions helped me clarify my own ideas.
It's sort of like Cunningham's Law with a party of one. Giving me the wrong answer helps me clarify what the correct answer should look like.
Or, perhaps a better way to put it:
It's easier to criticize than to create. It gives me something to criticize and tinker with. Doing so helps me hone in on the solution I want. (Provided its suggestion was at least in the right universe, of course.)
> I've been finding AI's suggestions -- even when rather wrong -- help me do that initial step faster.
I have no idea how I could even integrate AI into my workflow so that it's useful. It's even less reliable than search is for basic research and can't even cite its sources....
This argument held a lot more weight when it was a search engine playing the role of our memory.
That comment was solely about AI code suggestions.
Generative AI still has a ways to go for other forms of research, and it will never fully replace the utility of a search engine. They're two different tools for different but overlapping tasks.
> That comment was solely about AI code suggestions.
Even AI code suggestions seem to be only a minor improvement over basic LSP integration. One major exception is tedious formatting of text—say you want to copy over a table by hand into a domain value; Copilot is really good at recognizing values and situating them appropriately in the parent l-value.
If chatbots could serve as my RAM, surely they'd be able to generate code relevant to the rest of the codebase or at the very least not require deep scrutiny to ensure their RAM matches mine (it most often does not).
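To make the table-transcription point concrete, here's a minimal C++ sketch (all the names are hypothetical, purely for illustration). Once the first row or two establish the shape, completion tools tend to be good at filling in the remaining rows:

```cpp
#include <array>
#include <string_view>

// Hypothetical domain value: the kind of hand-transcribed table described
// above, where each row of a reference table becomes one initializer.
struct CountryCode {
  std::string_view name;
  std::string_view iso2;
  int calling_code;
};

constexpr std::array<CountryCode, 3> kCountryCodes = {{
    {"France", "FR", 33},
    {"Germany", "DE", 49},
    {"Japan", "JP", 81},
}};

static_assert(kCountryCodes[0].calling_code == 33);
```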
LLMs are undeniably useful for programming. The core challenge in making them more useful is the right UX for making this more seamless. I use intellij and things like codegpt.
A few weeks ago they enabled auto complete. I disabled it after a day. Reason: most of the suggestions weren't great and it drowned out the traditional auto complete, which I depend on. I just found the whole thing too distracting.
I've also had the ChatGPT desktop app installed for a few weeks. It adds a key binding (option + space) to ask it questions. I've found myself using it more because of that: copy some code, option+space, ask it a question, and there it goes.
The main issue is that it is a bit Groundhog Day, in the sense that I have to explain in detail what I want every time. I can't just ask it to generate a unit test (which it does very well). Instead I have to specify that I want it to generate a unit test, use kotlin-test and kotest-assertions, and not use backticks for the function names (doesn't work with kotlin-js). Every time. If I don't, it will just default to the wrong things because it doesn't remember your preferences for frameworks, style, habits, etc., and it doesn't look at the whole code base to infer any of that.
Mostly, progress here is going to come in the form of better IDE support and UX. The right key bindings would help. A bigger context so it can just grab your whole git project and be able to suggest appropriate code using all the right idioms, frameworks, etc. would then be possible. Additionally, it would be able to generate files and directories as needed and fill them with all the right stuff. I don't think that's going to take very long. Gpt-4o already has a quite large context and with the progress in OSS models, we might be doing some of this stuff locally pretty soon.
> Instead I have to specify that I want it to generate a unit test, use kotlin-test and kotest-assertions, and not use backticks for the function names (doesn't work with kotlin-js). Every time.
Have you experimented with GPTs much? I'd solve this problem by creating my own private "write a unit test" GPT that has my preferences configured in the system prompt.
I ask many things in a typical day. Creating custom GPTs for everything that pops into my head is not very practical. It's a workaround for the fact that it doesn't remember anything about previous chats and conversations.
In an IDE context a lot of this stuff should be solved in the UI, not by the end user.
IntelliJ also has "full line completion" built in that uses a local model. I've found it rather good and it's the only in-IDE code generation I use. It does not get in the way and usually gives me exactly what I want.
I don't know. There's an idealized vision that people have in their heads of what LLMs should be in which they're "undeniably useful". It's unclear if that already exists and is just constrained by the right UX concept, or if it doesn't exist and never will.
Seems they need to compare against "dumb" code completion. Even when they are error-free, "large" AI code completions are just boilerplate that should be abstracted away in some functions rather than inserted into your code base.
On a related note, maybe they should measure number of code characters that can be REMOVED by AI rather than inserted!
> boilerplate that should be abstracted away in some functions rather than inserted into your code base
Boilerplate is often tedious to write and just as often easy to read. Abstraction puts more cognitive load on the developer and sometimes this is not worth the impact on legibility.
Totally agree - LLMs remove the tedium of writing boilerplate code, which is often a better practice than abstracted code. But it takes years of experience to know when that's the case.
Say I'm faced with a choice right now -- repeat the same line a couple more times with minor differences, which gets checked by the IDE immediately, or create a code generator that generates all 3 lines but may not work well with the IDE and the build system, which leads to more mistakes?
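To make that tradeoff concrete, a small sketch (the Config struct and its field names are made up for illustration):

```cpp
#include <iostream>

// Hypothetical config type, just to make the tradeoff concrete.
struct Config {
  int read_timeout_ms = 0;
  int write_timeout_ms = 0;
  int connect_timeout_ms = 0;
};

int main() {
  Config config;

  // Option A: spell out the near-duplicate lines. Tedious to type, easy to
  // read, and the compiler/IDE checks every field name as written.
  config.read_timeout_ms = 500;
  config.write_timeout_ms = 500;
  config.connect_timeout_ms = 250;

  // Option B: hide the repetition behind a macro (or an external generator).
  // Shorter, but tooling like go-to-definition and rename refactors now has
  // to see through the indirection.
#define SET_TIMEOUT(field, ms) config.field##_timeout_ms = (ms)
  SET_TIMEOUT(read, 500);
  SET_TIMEOUT(write, 500);
  SET_TIMEOUT(connect, 250);
#undef SET_TIMEOUT

  std::cout << config.read_timeout_ms << "\n";
}
```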
Agree with the first sentence. I used to tell people I never type anything more than periods and two or three letters because we already had good autocomplete.
Yes. Although I’d like to see a deeper investigation. Of course, quality of completions have improved. But there could be a confounding phenomenon where newer folks might just be accepting a lot of suggestions without scrutiny.
I accept a lot of wrong suggestions because they look close enough that it's quicker for me to fix than it is to write the whole thing out from scratch. Which, IIUC, their metric captures:
> Continued increase of the fraction of code created with AI assistance via code completion, defined as the number of accepted characters from AI-based suggestions divided by the sum of manually typed characters and accepted characters from AI-based suggestions. Notably, characters from copy-pastes are not included in the denominator.
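For what it's worth, the quoted metric boils down to a simple ratio; here's a tiny sketch with made-up numbers, not Google's actual implementation:

```cpp
#include <iostream>

// Minimal sketch of the quoted metric: accepted AI characters divided by
// (manually typed characters + accepted AI characters). Copy-pasted
// characters are excluded from the denominator, per the blog post.
double AiCompletionFraction(long long accepted_ai_chars,
                            long long manually_typed_chars) {
  return static_cast<double>(accepted_ai_chars) /
         static_cast<double>(manually_typed_chars + accepted_ai_chars);
}

int main() {
  // Hypothetical numbers, for illustration only.
  std::cout << AiCompletionFraction(5'000, 5'000) << "\n";  // 0.5
}
```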
As someone who is generally pro-AI coding tools, I do the same. However sometimes it winds up taking me so much effort to fix the mistakes that it'd have been faster to do it myself. I often wonder if I'm seeing a net positive effort gain over time, I *think* so, but it's hard to measure.
Question for any Googlers in the thread - do folks speak up if they see flaws in the methodology or approach of this research or is the pressure from the top so strong on this initiative that people hush up?
There were already teams building ML-based code completion, code suggestions, and code repair before LLMs blew up a couple years ago. So the principle of it isn't driven by AI hype.
Yes, there are oodles of people complaining about AI overuse and there is a massive diversity of opinion about these tools being used for coding, testing, LSCs, etc. I've seen every opinion from "this is absolute garbage" to "this is utter magic" and everything in between. I personally think that the AI suggestions in code review are pretty uniformly awful and a lot of people disable that feature. The team that owns the feature tracks metrics on disabling rates. I also have found the AI code completion while actually writing code to be pretty good.
I also think that the "% of characters written by AI" is a pretty bad metric to chase (and I'm stunned it is so high). Plenty of people, including fairly senior people, have expressed concern with this metric. I also know that relevant teams are tracking other stuff like rollback rates to establish metrics around quality.
There is definitely pressure to use AI as much as reasonably possible, and I think that at the VP and SVP level it is getting unreasonable, but at the director level and below I've found that people are largely reasonable about where to deploy AI, where to experiment with AI, and where to tell AI to fuck off.
The code completion is quite smart and one of the bigger advantages Google has now is the monorepo and the knowhow to put together a pipeline of continuous tuning of models to keep them up to date.
The pressure, such as it is, is killing funding for the custom extension for IntelliJ that made it possible to use it with the internal repo.
Cider doesn't have the code manipulation featureset that IntelliJ has, but it's making up for that with deeper AI integration.
Fwiw, AI code completion initiatives at Google well predate the current hype (i.e. the overall research path, etc., was started a long time ago, though obviously modified as capability has changed).
So this particular thing was a very well established program, path, and methodology by the time AI hype came.
Whether that is good or bad I won't express an opinion, but it might mean you get a different answer to your question for this particular thing.
There are plenty in this post, and I'm one of them.
It may or may not be great for job prospects, but as someone who isn't one of you fancy startup/FAANG super programmers it's great for me to be able to ask it design problems, what the 'best' way to do a certain thing is, or tell it "I need a function given x,y, and z that does this and returns this."
There are plenty of instances where it doesn't make sense to use it, but I always have a tab with ChatGPT open when coding now. Always.
It's good if you have some domain knowledge and can kind of detect the bullshit there, and also just in cases where there is large amounts of training data. I am in big tech and I use it pretty much every day, mostly in cases where I would have spent a lot of time googling before.
This fails to recognize that this is a bad feature that the Abseil library would explicitly reject (hence the existence of absl::CivilDay) [0], and instead perpetuates the oversimplification that 1 day is exactly 24 hours (which breaks at least twice every year due to DST).
Which is to say: it'll tell you how to do the thing you ask it to do, but will not tell you that it's a bad idea.
And, of course, that assumes that it even makes the change correctly in the first place (which is nowhere near guaranteed, in my experience). I have seen (and bug-reported!) cases where it incorrectly inverts conditionals, introduces inefficient or outright unsafe code, causes unintended side effects, perpetuates legacy (discouraged) patterns, and more.
It turns out that ML-generated code is only as good as its training data, and a lot of google3 does not adhere to current best practices (in part due to new library developments and adoption of new language versions, but there are also many corners of the codebase with, um, looser standards for code quality).
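For concreteness, here's a minimal sketch of the distinction behind the absl::CivilDay point above (the date is chosen only to illustrate a DST transition; this is not code from any actual bug report):

```cpp
#include <iostream>

#include "absl/time/civil_time.h"
#include "absl/time/time.h"

int main() {
  // Duration arithmetic: "one day" hard-coded as exactly 24 hours. Across a
  // DST transition, adding this to an absolute time lands an hour off the
  // intended wall-clock time.
  absl::Duration naive_day = absl::Hours(24);
  std::cout << absl::FormatDuration(naive_day) << "\n";  // "24h"

  // Civil-time arithmetic: "+ 1" means the next calendar day, regardless of
  // how many hours that day actually has in any particular time zone.
  absl::CivilDay day(2024, 3, 10);  // a US spring-forward date (23-hour day)
  absl::CivilDay next = day + 1;
  std::cout << absl::FormatCivilTime(next) << "\n";  // "2024-03-11"
}
```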
So how long till AI is fully replacing SWEs at Google? That is where the drive for productivity at organizations like Google is leading.
Assuming Google has tens of thousands of software engineers (for a lower bound of 10,000) and artificial intelligence increases productivity by at least 0.01%, the first engineer has already been replaced.
I'm guessing it's 30k+, depending on how liberal you are with the job title (e.g. data engineering, SREs, etc.).
> the first engineer has already been replaced
Realistically, many engineers never got hired because of this already. Also they've had a few rounds of layoffs, so probably plenty are fired by now.
Google already was known for their use of automated code-gen long before they invented LLMs so I wouldn't be surprised if they're on the verge of major systems being gen'ed or templated with LLMs.
It's like asking "How long until a bulldozer [1] fully replaces a human construction worker?" A bulldozer is not a full replacement for a human construction worker. However, 1 worker with a bulldozer can do the work of 10 workers with shovels. And 10 workers with bulldozers can build things no amount of workers with shovels can.
[1] I'm using bulldozer as shorthand for all automated construction equipment - front end loaders, backhoes, cranes, etc.
New tech drives job changes. Always has, always will. No guarantee that the changes are good for every individual. And there’s no guarantee that they are good in the aggregate, either.
Never? Google will probably reduce their SWE headcount more and more, assuming we get exponentially better at LLMs or something better comes along, but SWEs won't ever be fully replaced.
Yes, I'm a paying customer of all three. Sometimes, for fun, I start a task with the same prompt across all three. Within three interactions Gemini starts giving woke lectures and provides worse or unusable results, while the other two shine.
The gradual, eventual, ultimate killer app of AI is to run the systems, debug apps like an SRE, manage warm datastores, and eventually write app code based on the desires, feedback, and requirements descriptions of users, design new programming languages/formats for itself, and later design silicon. The curious bit is how product managers will fit into this picture, perhaps to supervise what is allowed, only to eventually be removed. There's no conceivable job that can't be eaten partially or mostly by AI, if not nearly entirely eaten in the future.
At some point on the technological "singularity" timeline, there can and will be fully-autonomous corporations. The question is: with corporate personhood, can a corporation exist, pay (some) taxes, reinvest in itself, and legally function without any human owners? This may also be contingent on how a corporation conducts legal and business activities, i.e., whether they are performed by a human agent or through some automated means, even if directed by AI management. IANAL, but I guess an autonomous corporation could be sued, and perhaps even the creators of the software used to create the AI that runs it could be potentially at risk.
In Go it’s not uncommon to use code generation to recreate boilerplate code, especially before the introduction of generics. No human looks at this usually. And if they do, they find the code they’re looking for contained in a few files. I personally found this pattern pretty good and easy to reason about.
Well, someone (a human being) still maintains it, and ultimately someone will likely find the code unmaintainable even if LLMs help. If you use ChatGPT enough you would know it has its standards as well, actually pretty high ones. At some point the code will likely still need to be refactored, by a human or not.
It’s really not a problem at a certain point. Also, we’ll probably have “remove and replace” but for software in the next couple years with this stuff.
After reading this I'm wondering how the indy code autocomplete tools are going to be able to compete longterm with this giant feedback rich data machine Google has built...do engineering orgs of sufficient scale ultimately hoard their tooling for competitive advantage, thereby leaving independent players to cater to developers outside of Google? Feels like yes...but plenty of inventions trickle out in various forms.
I’m not seeing any evidence that any of this is actually good for Google’s business. Their observations that half of code checked in is from suggestions by an LLM is not really surprising in a regimented dev platform with tons of boilerplate. That stat tells us nothing about actual code quality, development velocity, or skill curves over time, much less business impact.
What product of Google’s has been improved by this feedback loop? The trajectory of Google search itself in the past year has been steadily downhill. And what other products of theirs I use don’t really change much, at least not in positive ways. Gmail is just gradually being crushed from every side by other app widgets scrounging for attention. Chrome has added… genAI features and more spyware? Great.
Yeah I kind of agree that when LLMs work REALLY well for autocomplete of your codebase -- that might be an indication that the language and library abstractions you use don't fit the problem very well.
Code is read more than it's written. And it should be written to be read.
If you are barfing out a lot of auto-completed stuff, it's probably not very easy to read.
You have to read code to maintain it, modify it, analyze its performance, handle production incidents, etc.
> If you are barfing out a lot of auto-completed stuff, it's probably not very easy to read.
From my experience using LLMs, I'd guess the opposite. LLMs aren't great at code-golf style, but they're great at the "statistically likely boilerplate". They max out at a few dozen lines at the extreme end, so you won't get much more than class structures or a method at a time, which is plenty for human-in-loop to guide it in the right direction.
I'm guessing the LLM code at Google is nearly indistinguishable from the rest of it for a verbose language with a strong style expectation like Java. Google must have millions of lines of Java, and a formatter that already maintains standards. An LLM spitting out helper methods and basic ORM queries will look just like any other engineer's code (after tweaking to ensure it compiles).
If you already apply a code-formatter or a style guide in your organization, I'm guessing you'd find that LLM code looks and reads a lot like the rest of your code.
Yes, it can make stuff that fits in the rest of the codebase
But I am saying it's not going to ever make the code significantly better
In my experience, code naturally gets worse over time as you add features, and make the codebase bigger. You have to apply some effort and ingenuity to keep it manageable.
So if everyone is using LLMs to barf out status quo code, eventually you will end up with something not very readable. It might look OK locally, but the readability you care about is a global concern.
What you say tracks - but I'm wondering what happens if they manage to unlock some meaningful velocity increase to the point where they can begin tackling other domains and shuttling out products at a higher rate...Agree with your thoughts on search - it's f'ing unusable now and frustrating to look at.
An extrapolation of these makes me think of programmers becoming like Tony Stark telling Jarvis what to do, and it gets done. Or Minority Report-style interfaces, where people state their intent (prompt?) and get the answers.
Happy for them to experiment with the tools, but I am afraid it will have a more negative effect through others blindly copying without reading between the lines.
I am still struggling to find where the main selling point of using LLMs for code lies. For boilerplate this is a very inefficient, although interesting, approach, and it does not help me do the thinking. Well-written, language-specific boilerplate generation in most IDEs achieves the same effects just fine.
I'm looking forward to the day that some spicy autocomplete regurgitates an obvious chunk of AGPL code that it's stolen without permission or attribution - and it ends up in some critical part of Google's money-printing machine, and the outside world finds out about it.
I wouldn't be surprised if the sole training data for autocompletes was google3. It's an absolutely massive codebase, using the libraries and patterns Googlers use, and more or less entirely safe to train on. Any training data beyond that would be whitelisted by legal.
That was my assumption also. I worked at Google as a contractor in 2013 on an AI-related project, and their entire internal development environment was so much fun to use, and massive. Their internal coding world is an ideal and probably sufficient source of end-stage training data, given early training on the usual data sources to build language understanding first. They wouldn't train on just their internal data exclusively.
People who work at your bank right now are pasting your personal details into language models and asking if you deserve a loan. People will figure out how to get this data back out.
"But the model only uses old training data" there are myriad lawsuits in flight where these companies took information they shouldn't have, in all forms. Prompt engineers have already got the engines to spew things that are legally damning.
A real hack, which might archive user inputs as well as exfiltrate training data. We're only beginning to imagine the nightmare.
Can people use these language models to get private Google info? Only if Google was dumb enough to include that in the training. Hint: Yes, it's much, much worse than anyone imagines.
And? For big companies, lawsuits are mostly about being forced to pay money. Rarely does losing a case drive real behavioral change, much less fundamental change.
IP getting coughed out of an AI will keep some lawyers busy and result in fig leaf changes.
The outcome of any IP lawsuit is going to include an order to cease use of the IP. Even a DMCA notice requires cessation.
If the infringing code powered a widely used service, replacing the code with a clean room refactor could be arduous and time-consuming and might require a court-appointed monitor to sign off.
If there's any risk of this happening, it's probably because someone imported AGPL code into the codebase, outside of //third_party, after removing the copyright notice / license. (We have an allowlist of acceptable third-party licenses; I would not expect it to include copyleft licenses like AGPL.)
We nominally have controls against this sort of thing, but if it's only weakly-enforced policy, then unfortunately it's easy enough to bypass.
You'd be surprised at how many "Make money teaching AIs to code!" job postings there are out there now. Taking "training your replacement" to whole new levels.
I work at Google, this stuff is available, entirely optional, and in fact the most recent stuff is still something you have to sign up for and get on an opt-in basis in return for providing feedback/answering surveys.
I have AI suggested edits turned off. It's the 4th setting down in the settings menu. AI suggestions in Cider can be disabled. And all of these things are web apps for which googlers have a rich history of creating Chrome extensions to change things they don't like or disagree with.
I don't feel it's obvious at all that this will become forced. What would that even mean – you don't get to edit code anymore but can only interact via chat with an AI which edits the code for you? I think it's obvious that's not going to happen.
Will there be a day when you can't turn off AI suggestions for something? Maybe, but suggestions aren't requirements, no one is forced to use them so at worst they're a minor UI annoyance.
I'm interested to hear why you have such a negative take on it? I don't particularly enjoy most AI implementations in products (not Google specific), but they're almost entirely optional everywhere, and ignoring them really isn't much of a problem.
You can turn off giving AI suggestions, not receiving them.
Not all AI affordances in cider can be disabled. There are plenty of people complaining about this on yaqs and buganizer.
> no one is forced to use them so at worst they're a minor UI annoyance.
This attitude is exactly the problem with Google and is why Google’s AI rollout has been terrible thus far. Luckily the stock is still going up, so whatever I guess.
You don't receive AI suggestions in code review, you receive suggestions from your reviewers. If they are providing bad recommendations, regardless of the source, I suggest providing them feedback.
I agree this may not be how they are used, but honestly, code review is a skill and many people are bad at it. Blindly suggesting things because an AI (or a presubmit) suggests it is bad form regardless of the source. People slam that "please fix" button without looking at what it's asking for, or without reading my detailed explanations of why I'm ignoring a recommendation, etc. This was already a problem and AI isn't changing that.
We can make code review better, but that comes through training reviewers better, not by ruling out AI tools just because they're AI tools.
> You don't receive AI suggestions in code review, you receive suggestions from your reviewers
If a reviewer makes a comment, Critique will create an AI suggestion based on their comment, even if they didn't explicitly ask for one (unless they turn off giving AI suggestions on their end, but there's no way to stop seeing them on the receiving end).
> not by ruling out AI tools just because they're AI tools.
This was not my point at all.
The point is that Google force feeds AI tools, and makes it difficult to opt out. In many cases it is not possible as shown with GenAI search.
Word of advice from someone who left: if you've reached the point you're trashing work for a root-level OKR that is considered existential, taking an absolutist exaggerated stance against it, and discussing fine-grained details of internal tools, while claiming they go against PR/"research papers"...you're well past the right time to leave.
We all have unique circumstances, but I can almost guarantee you that you'll be absurdly happier elsewhere.
Life's short, and no matter how much you save, something can take it away.
Better to start pursuing it now, than after being the sacrificial MI, or after the call from HR asking you to chill because a VP got upset and had lackeys reverse engineer your identity.
> We all have unique circumstances, but I can almost guarantee you that you'll be absurdly happier elsewhere.
Thanks for your insightful contribution - I read the post again and found this right after what you found, like, right after. I hope this helps clarify
Guarantee with what? Personal money? I know people who have more than 3+ years of experience having trouble with getting an offer after months of searching these days. What can you offer to guarantee the "happiness" "elsewhere"?
What would it mean for code suggestions to be non-optional? Like, you can't edit the code file yourself but have to talk to a chatbot to ask it to make the edits for you? I think that's fairly obviously a ridiculous notion.
I could envision some point in the future where the tooling is good enough that if you reject the suggestion, you had better have a very good reason for doing so. We're already there on the formatting front, as well as the widely-enabled clang-tidy checks (e.g. pessimizing moves). Once a tool consistently meets that high bar, it's irrelevant whether its suggestions are derived from static analysis or a LLM.
As far as being non-optional: a code owner could very much refuse to approve your change until it conforms to their standards for their code base.
ML code suggestions at Google do not inspire that high level of confidence in me today, but I have no reason for thinking that will always be the case.
I see your point, and maybe you're right, but I do think formatting tools are materially different. You can define correctness for a formatting tool and they're understood to be about enforcing a consistent style. That doesn't apply to code more generally where there's more open to interpretation, other "style" to consider, program behaviour, etc.
Also, I would say that automatic formatters weren't popular until the mid-2010s from what I've experienced, despite being technically possible since pretty much the advent of programming. I even remember having to push hard for adoption of them in ~2018. Even if the AI tools were at that level (and they're definitely not yet, any of them), it could easily take a decade for it to become the norm or for it to be mandated.
That's not true, I'm a xoogler as of October, and at least 2 of my ex-colleagues continue to generally wonder if AI can write code or not, and if it can, they haven't tried it. Last update 60 days ago.
It does look like there's an auto-installed cider extension, which is fine, the worst case for this stuff is "it's in my autocomplete list" -- that's fine!
The AI suggestions they added to our editor and review tool are pretty uninvasive. The review suggestions are right 60% of the time, but the reviewer can just turn off the bad suggestions. The autocomplete is 90% good, but it isn't very ambitious. There's also a free text LLM chat thingy which can act on selected code, but I only ever tell it "fix" or "extract function" and for those use cases it's quite good.
Google has been really lost here over the last several years. When LaMDA and Bard were already killer at producing reasonable code and OpenAI had scarcely released anything, there was an explicit embargo against using them internally.
When Blake Lemoine was talking nonsense about LaMDA being conscious in June '22, these internal models were already killer at producing code. ChatGPT wouldn't be released for another 5-6 months.
As others have mentioned, unless you have a strong conscience and really know what you're doing, it's far too tempting to just accept AI-generated suggestions without really thinking, and IMHO losing that understanding is a dangerous path to go down. AI can only increase quantity, not quality. The industry desperately needs far more of the latter.
I felt this myself as well when I tried to use Copilot, especially later in the day. I still use it for some boilerplate code / building visualizations that feel like boilerplate, but turned it off for any really important code. At the moment I find most value in AI in discussing design decisions and evaluating alternative approaches to mine. There it really has had a huge positive impact on my workflow.
It's a concern I have too, when I get tired, I start to just delegate to co-pilot suggestions as I get desperate, if I didn't have co-pilot, I'd probably just log off for the day.
I actually don't really use copilot as I didn't find it that helpful, so I don't really have the problem anymore, but I could see it was a danger. Bit like driving when tired.
I look at it as asking an intern to do some work that I don't have time for. Do I have to check their work? Yes. Might I have to correct and guide the outcome? Again, yes. Am I going to ask them to implement something novel and groundbreaking? Not really, that'd be a disaster unless they are a prodigy. None of that removes my capacity, or introduces any kind of danger.
I find this absolutely nothing like delegating to an intern personally. Interns usually do their best because they will be held accountable if they don't.
I don't know how to articulate it, but whenever I hear the financial analysts talking about how much work AI is going to do for us, I just have this spidey-sense that they're severely underestimating the social aspect of why anyone tries to achieve a good outcome.
They think they can just spend $100,000 on GPUs and get 10x the output of someone buying a house and raising kids while getting paid a six-figure salary.
At work I already had teammates who would grant LGTM to code they barely read, but which looked fine after reading the description and hopefully also skimming over the code.
I feel that this path of least resistance/effort will also apply to things that an LLM spits out, as those are highly likely to look correct at the surface level.
I disagree. I don't want to go into the docs to understand specific syntax or options of a library. Just let me write in natural language what I want and give me the result
If I can't tell based on the code what it's supposed to do, then it's a shitty library or api.
Not to be rude but I have a hard time relating. Code is its own meaning, and to read it is to understand it. The only way you can use code you don't "understand" is to lack understanding of the language you are using.
In my almost 30 years in the industry I've run into plenty of people who claimed they knew what code did by reading it, but in practice every single one of them turned out to be only another human flummoxed by unexpected runtime behavior.
I never found my experience that simple, even at times when I was paying attention, which is always in short supply when you need it most.
Libraries are the biggest pain point. You don't know what a function is really doing unless you have used it before yourself. Docs are not always helpful even when you read them.
Lots of assumptions that may not be totally wrong, but not right either. In C++, using [] on a map to access an element is really, really dangerous if you haven't read the docs carefully and assume that it does what you believe it should do.
> using [] on a map to access an element is really, really dangerous if you haven't read the docs carefully and assume that it does what you believe it should do
It's not that bad, as it just inserts a default-constructed element if it's not present. What would you expect it to do, such that it returns a reference of the appropriate type (such that you can write `map[key] = value;`)? Throw an exception? That's what .at() is for. I totally agree that the C++ standard library is full of weird unintuitive behavior, and it's hard to know which methods will throw exceptions vs have side effects vs result in undefined behavior without reading documentation, but map::operator[] is fairly tame.
Meanwhile, operator[] to access an element of a vector will result in a buffer overflow if not appropriately bounds-checked.
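A tiny example of the behaviors being described here (standard C++, nothing exotic):

```cpp
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

int main() {
  std::map<std::string, int> counts;

  // map::operator[] default-constructs a value for a missing key and returns
  // a reference to it. After this line, counts contains {"missing": 0}.
  int n = counts["missing"];
  std::cout << n << " size=" << counts.size() << "\n";  // prints: 0 size=1

  // .at() throws instead of inserting.
  try {
    counts.at("also-missing");
  } catch (const std::out_of_range&) {
    std::cout << "no such key\n";
  }

  // By contrast, vector::operator[] does no bounds checking at all:
  // out-of-range access is undefined behavior, not an exception.
  std::vector<int> v{1, 2, 3};
  // v[10];     // undefined behavior if uncommented
  // v.at(10);  // throws std::out_of_range instead
}
```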
Something akin to `std::optional` would have been great. These days, folks don't come to C++ from C and are not used to such under-the-hood tricks.
The committee decided that since the reference can't be null, we'll insert something into the map when someone "queries" for a key. Perhaps two mistakes can make a right!! It is hard to make peace with it when you are used to a better alternative.
> These days, folks don't come to C++ from C and are not used to such under-the-hood tricks.
It's funny to see that, since I'm used to folks coming to C++ from C and bemoaning that C is a much more straightforward language which doesn't engage in such under-the-hood tricks.
> Something akin to `std::optional` would have been great.
So, map.find(), which returns an iterator, which you can use to check for "presence" by comparing against map.end()? (Or if you actually want to insert an element in your map, map.insert() or, as of C++17, map.try_emplace().)
Again, the standard library is weird and confusing unless you've spent years staring into the abyss, and there are many ways of doing subtly different things.
C++ does not have distinct operators for map[key] vs map[key] = val. Languages like JS and Python do, which is why they can offer different behavior in those scenarios (in the former case, JS returns a placeholder value; Python will raise an exception). But, that's only really relevant in the context of single-dimensional maps.
Autovivification is rather rare today, but back in the early 90s when the STL first came about (prior to C++'s design-by-committee process, fwiw), what language was around for inspiration? Perl. If you don't have autovivification (or something like it), then map[i][j][k] = val becomes much more painful to write. (Maybe this is intended.) Workarounds in, say, Python are either to use a defaultdict (which has this exact same behavior of inserting elements upon referencing a key that isn't present!) or to use a tuple as your key (which doesn't allow you to readily extract slices of your dict).
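A short sketch of the non-inserting alternatives mentioned above, plus the nested-map convenience that operator[] buys you:

```cpp
#include <iostream>
#include <map>
#include <string>

int main() {
  std::map<std::string, std::map<std::string, int>> nested;

  // The "autovivification" convenience: operator[] creates the intermediate
  // map on the fly, which is what makes nested[i][j] = val easy to write.
  nested["outer"]["inner"] = 42;

  // Non-inserting lookup: find() returns end() when the key is absent, and
  // nothing gets added to the map.
  if (nested.find("absent") == nested.end()) {
    std::cout << "not present, nothing inserted\n";
  }

  // C++17 try_emplace(): insert only if the key is missing; reports whether
  // an insertion actually happened.
  auto [it, inserted] = nested.try_emplace("outer");
  std::cout << it->first << " inserted=" << std::boolalpha << inserted
            << "\n";  // "outer inserted=false": the key was already present
}
```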
I have seen with my own two eyes programmers push code they didn't read because it produced the expected output, whether via ChatGPT or Stack Overflow: copy, paste, run, tests passed.
If there's anything in this press release to justify how "characters added by AI" is any truer a reflection of quality than commit count is of productivity, I didn't see it.
It's a short release and I read it twice, so if it was there I feel like I'd have noticed.
I haven't found it particularly smart, at least in its GitHub Copilot incarnation.
Per the metrics I added to the integration when I began to trial it, I accepted about 27% of suggestions.
I didn't track how many suggestions I accepted unmodified, because that would have been orders of magnitude more difficult; I would be fascinated to see Google's solution to the same problem documented, but doubt strongly that I will. I'm sure it's entirely sound, though, and like all behavioral science in no sense a matter of projection, conjecture, interpretation, or assumption.
I turned off the Copilot integration months ago, when I realized that the effort of understanding and/or dismissing its constant kibitzing was adding more friction on net to my process than its occasions of usefulness alleviated.
I do still use LLMs as part of my work, but in a "peer consultant" role, via a chat model and a separate terminal. In that role, I find it useful. In my actual editor, conversely, it was far more than anything else a constant, nagging annoyance; the things it suggested that I'd been about to write were trivial enough that the suggestion itself broke my flow, and the things it suggested that I hadn't been about to write were either flagrantly misguided for the context or - much worse! - subtly wrong, in a way that took more time to recognize than simply going ahead and writing it by hand in the first place would have.
I've been programming for 36 years, and it's been well more than two decades since I did any other kind of paying work. The idea that these tools are becoming commonplace, especially among more junior devs without the kind of confidence and discernment such tenure can confer, worries me - both on behalf of the field, and on theirs, because I believe this latest hype bubble ill serves them in a way that will make them much more vulnerable than they should need to be to other, later, attacks by capital on labor in this industry.
On a related note, Microsoft published a press release last year [1] where they seemed to suggest that the 30% acceptance rate of Copilot suggestions was a 30% productivity boost for devs.
> users accept nearly 30% of code suggestions from GitHub Copilot
> Using 30% productivity enhancement, with a projected number of 45 million professional developers in 2030, generative AI developer tools could add productivity gains of an additional 15 million “effective developers” to worldwide capacity by 2030. This could boost global GDP by over $1.5 trillion
They were probably just being disingenuous to drum up hype but if not they'd have to believe that:
1) All lines of code take the same amount of time to produce
2) 100% of a developer's job is writing code
This is kind of like answering “which software engineers are excited about Stack Overflow?” with “mediocre to bad ones”. It’s A) insulting, but more importantly B) wrong — if I’m, say, a web developer who is trying to learn graphics programming, of course I’ll need help as I build a mental model of the domain.