I think the biggest problem copilot will have in practice gaining traction is that verifying correctness isn’t any faster than writing the code yourself in many cases. The Easter(y) function is a classic example - it would be way faster to write that than to try and verify that there are no subtle bugs.
Copilot is by design trying to give you something that _looks_ correct without caring whether it actually is - so it optimises for real-looking but subtly buggy code, which is the worst kind of broken code.
Years ago, I ran into a similar problem working on a program that was doing named entity recognition to assist humans with data entry. We found that, for our purposes, there seemed to be no (realistic) accuracy threshold beyond which the tool would save clients money, because double-checking the machine-generated output was inherently more work than doing it by hand.
So we pivoted the product to being something you would run on full auto, for situations where you didn't need a high level of quality. I'm not sure if that option is available to programmers, though.
Maybe Copilot could be turned into a context-aware search engine? That is, invoking it would return a list of examples that it thinks do the same thing as what you're trying to do, based on your work-in-progress code.
I honestly think this is where things are going. Man-machine partnership on creative tasks. Not only is it more amenable to current models, it’s higher leverage and less likely to be completely automated away.
> I think the biggest problem copilot will have in practice gaining traction is that verifying correctness isn’t any faster than writing the code yourself in many cases.
Humorously, this is a similar problem to the one autonomous driving has. Being alert when something goes wrong randomly is more difficult than being alert all of the time.
However in the real world people don't always write bugless code and aren't always alert when driving. Therefore these AI assistants can still have a net positive result as long as they are better than the average performance of a human. Of course three quarters of us probably believe that "I'm not an average programmer so Copilot would only make me worse."
Personally I think the more interesting angle is the trolley problem this creates. People will die in self-driving car accidents and bugs will exist in AI generated code. Those people and bugs are different than the people who will die in human caused accidents and the bugs in human written code. If the number and severity of the results are lessened by the computer, are we willing to forgive the damage directly caused by the AI that falls short of perfection?
I’m a very mediocre developer and if Copilot is any better than me at writing code, then I will have a hard time understanding whatever Copilot throws at me. I cannot just save, commit and push whatever Copilot suggests... so it’s faster if I write the code myself than to review Copilot’s code.
> I cannot just save, commit and push whatever Copilot suggests
I don't think that is the goal just like the goal of the current generation of self-driving cars isn't for you to be able to take a nap in the driver's seat.
Imagine you need some code that would have traditionally taken you an hour to write. I believe the goal of Copilot is to generate the code for you as a starting point. Maybe you don't understand that code immediately and it takes you 20 minutes to figure out what is going on. Then you spend another 20 minutes tweaking it for your exact purpose. If that results in code of similar quality to what you would have written alone, then Copilot makes you more efficient by saving you 20 minutes.
I haven't had a chance to try it yet, but I'm skeptical of the time savings claim of copilot in its current form. At least working on a large code base, the things that take time are:
1) Understanding the data model and logic of the code that interacts with the component I'm working on
2) Refactoring existing code to accommodate my change gracefully
3) Writing and fixing tests
4) Working through the code review process
For a major new piece of functionality, add
5) Put together a design document and review it with relevant stakeholders
The part that is fast is actually writing the code, as once I've done steps 1 and 2 (and sometimes 5), writing the new code itself is nearly trivial. I don't see how copilot could possibly help me in a meaningful way on these kinds of tasks.
The work that seems most amenable to copilot help is things like utility functions for transforming data/calculating things from it, as in the "Easter" example from the article. But here I would rather use a well-tested library, or if one doesn't exist (or I can't use it), write well documented code that I understand thoroughly.
Put another way, the work that copilot seems most adept at is "junior developer" work performed by people operating at a junior level. But if they delegate "figuring things out" to copilot, they're just going to spend way more time in code review. Or worse, they're not going to spend that time, and will learn nothing/stagnate in their professional progression.
Ever since the advent of satellite nav I've become terrible at learning my way around cities. I'm okay with the loss, since I can generally rely on having nav when I need it, and navigating cities isn't one of my core responsibilities. Copilot is not reliable (it won't answer your question every time), and it automates something that is your actual job. A junior dev might be better served by spending the extra 20 minutes muddling through and building their skillset.
> I don't think that is the goal just like the goal of the current generation of self-driving cars isn't for you to be able to take a nap in the driver's seat.
I think the issue is that the MVP from a customer perspective is, effectively, being able to take a nap in the driver's seat. From a research perspective there are obviously intermediate milestones, but that doesn't make it fit for what people would want to use it for. Same goes for Copilot.
>I think the issue is that the MVP from a customer perspective is, effectively, being able to take a nap in the driver's seat.
Maybe that is a requirement for some users, but it isn't a universal one. Plenty of people see a benefit in assistive technology that isn't complete such as adaptive cruise control or boilerplate/scaffolding dev tools.
It also raises the ethical question of whether these creators are responsible for the misuse of their products. Is it enough for them to say "This is how this product should be used. You are on your own if you use it outside these settings."? Holding developers responsible for the misuse of their software could create an actual slippery slope. Where is the line drawn? Do we start punishing people who create encryption algorithms because someone used the encryption to hide evidence of a crime?
> It also raises the ethical question of whether these creators are responsible for the misuse of their products. Is it enough for them to say "This is how this product should be used. You are on your own if you use it outside these settings."?
I don't think you have to answer the ethical question to address the level of readiness that Copilot or self-driving cars are at. It definitely raises the question, but you don't have to answer it to talk about suitability for use cases.
As you say, it might address the requirements of some specific people. My argument is that Copilot is not good enough yet for the bulk of imagined use cases, whether or not you call that MVP, and I think the post makes a good argument about why.
So, you can at least make a theoretical argument for why self-driving cars can do a better job than humans: they are always alert and paying attention, and the set of things they're trying to accomplish is concrete, can reasonably be presumed a priori and baked into the model, and is reasonably well specified, so that we hopefully don't need hard AI to be successful.
By contrast, Copilot doesn't necessarily have any idea what you're trying to do. So it can, to an approximation, pattern match on what you've already written, and spit out valid code that is "inspired" by things it's seen in the past. But it doesn't actually know what you're trying to do. It doesn't know what your acceptance criteria are, or what invariants you're trying to maintain, or anything like that. And, at least in the places I've worked, most of the interesting bugs (by which I mean, the ones that managed to cause trouble in production) happen when the programmer writing the code didn't have a firm idea of what they were trying to do. So, that's what worries me - I would fear that the spots where Copilot can't even theoretically be expected to do a good job happen to be exactly the kinds of things for which people would tend to rely on it the most.
Maybe I'm being overly pessimistic? But that's kind of my job - I work in an area where "move fast and break things" is pretty antithetical. But it would still be a lot more compelling to me if I could see a paper demonstrating that a team using Copilot has fewer production defects than a team that's doing exactly the same work but without Copilot. Or alternatively, if it were repackaged as something that's a bit like a smarter version of IDE refactorings. "Hey, it looks like you're about to spit out a big old mess of boilerplate. Let us get that for you." Or, "Hey, some functions you called can fail, how about I go ahead and suggest a catch block so you don't forget to write one?" Basically, give me something that's a bit more smart cruise control and a bit less Autopilot.
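For instance, here's the kind of narrow, checkable suggestion I have in mind (a hypothetical sketch in Python, not an actual Copilot feature):

    import json

    def load_config(path):
        # A fallible call chain: open() can raise OSError, json.load() can raise
        # json.JSONDecodeError. The assistant I want notices the fallible calls
        # and offers to wrap them, instead of inventing new logic on its own.
        try:
            with open(path) as f:
                return json.load(f)
        except (OSError, json.JSONDecodeError):
            # Suggested handler; the fallback policy is mine to decide, not the tool's.
            return {}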
> However in the real world people don't always write bugless code and aren't always alert when driving. Therefore these AI assistants can still have a net positive result as long as they are better than the average performance of a human.
This is an extremely good analogy -- in both situations, the human will become lazy and stop paying attention (regardless of whether they're supposed to keep their hands on the wheel, literally or metaphorically), and it will be possible to have a net result worse than either human or AI acting alone.
Are we also willing to just start accepting lower quality code (on average) because we have AI guiding us towards it?
What's the point of striving to write better, more correct code, being a safer driver, if all we ever do is rely on the status quo to train models to be average?
I've been using it and that's 100% correct. If it suggests more than a few lines, I might as well do it myself. However, it has been an awesome Intellisense tool for one-liners... it can write out the rest of a comment or a simple map/filter method just fine. I don't think it will ever go further than that, but nor does it need to.
Be sure it doesn't generate any GPL licensed code down to the size of a single letter, to be safe. The reaction to GPL inspired snippet generation has been more fierce than I could have imagined, even though usable snippets are so short.
You can't license a code snippet. You can only copyright/license a complete work, as in a complete app or library. Software patterns might work differently though!?
You cannot take the source code of an app, change a few lines, and then call it your own. You can, however, make your own app by stealing a few lines of code here and there... funnily enough, many companies write code so that all functions depend on all other functions, making it impossible to reuse any parts of the code in another app (but I think that is not deliberate).
Yea, that's the major problem. I'd prefer just some sort of inline helper that could point directly to documentation, topics, or Stack Overflow answers that might be helpful for whatever I'm developing. An enhanced "Intellisense" or something. That, to me, is better, because ultimately it's up to the developer to place scrutiny on the solution. You basically can't blindly accept the implementation, which makes this just a constant code review... of yourself? I dunno. This just seems half-baked.
Agree. It’s way easier to write a function from scratch than to read/evaluate/fix whatever snippet Copilot throws at me. Replace Copilot with “junior engineer” or “senior engineer that knows more than me” and the result is the same (the junior engineer will probably introduce a couple of subtle bugs that are hard to find; the senior engineer would write code in such a way that my mediocre brain won’t understand it).
It looks to you like it should work, but it doesn't, and you can't figure out why.
That's not "mostly working," that's a frustrating waste of time. It's hard enough to notice when you accidentally swap `i` and `j` -- why would you want to make your life even more miserable by spending your time finding all of the instances where a pattern matching robot has done something similar in an unfamiliar block?
And if you do happen to get "mostly working" code, but only want it to stay together long enough for you to fundraise, you're basically stating that you plan on foisting this technical debt onto the poor sod you happen to hire.
Attitudes like yours are the reason this dogpile scares me.
If I understand your argument correctly, it's that GitHub Copilot does not produce functioning code, yes?
If that's the case, I agree with your assessment, that GitHub Copilot isn't delivering on its promise and I would not be using it.
My understanding, however, was that GitHub Copilot does produce functioning code. If you're saying, as I think you are, "No, GitHub is lying about Copilot." I find that claim fascinating (How the hell did we get to the point where a software company could release a product that literally does not do even the most basic version of what it says it does, and only a few people notice?), but I'd need more specific information from you before I'd believe it.
Ouch, but I'm not interested in writing software as much as I'm interested in making enough money to spend the rest of my days sipping piña coladas on a beach in a foreign country.
All I really need is for the product to work well enough that I can fundraise and hire someone who's better at programming than I am, someone who hopefully doesn't write comments about how unimportant other people's work is on HN.
I am interested in the engineering side of things, for sure, but only insofar as their actual outcomes and justification as part of a value-add to a business.
GitHub Copilot, if it can create "mostly working" code, is a huge value-add to an early business trying to find product/market fit because it cuts down on the time it takes to prototype and generate early versions.
Not every piece of software has to work perfectly every time, and I don't think that's the standard to which we should hold any automated coding tool.
I feel like a mix of hand written test cases and copilot generated code might go somewhere, but I think you've got the basic problem sorted out. I'd much rather type an algorithm in from scratch than wrap my head around whatever copilot spits out.
I had an idea long ago that you basically write unit tests (nowadays I would add property-based tests to the mix too) and a genetic algorithm (best I could come up with at the time, nowadays we obviously have much fancier techniques, as evidenced by Copilot) would come up with code to try and make the tests pass.
I could see Copilot used in such a way. I think the interaction would have to change, though: force the user to give it the tests as input, rather than giving it some basic instruction, having it generate code, and then trying to write tests after. The tests should be the spec that Copilot uses to generate its output.
Right now, I'm not excited about Copilot. Like you say, understanding what Copilot spits out is difficult and I suspect more error prone than just writing it yourself (since we often see what we want to see and can overlook even glaring mistakes). I'm also not excited about them ignoring the licenses of the code they trained on. But I can imagine a future iteration that generated code to pass some tests that I could get excited about.
It seems to me that "generate the code that makes these unit tests pass" is actually a much saner engineering task than "go from a comment to an implementation".
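To make that concrete, the input to such a tool would be a spec like this (a hypothetical example; the slugify implementation here is just a stand-in for whatever the generator would have to produce):

    import re
    import unittest

    # Stand-in for the generated code; the tests below are the spec.
    def slugify(text):
        words = re.findall(r"[a-z0-9]+", text.lower())
        return "-".join(words)

    class TestSlugify(unittest.TestCase):
        def test_lowercases_and_joins_with_hyphens(self):
            self.assertEqual(slugify("Hello World"), "hello-world")

        def test_strips_punctuation(self):
            self.assertEqual(slugify("Rock & Roll!"), "rock-roll")

        def test_empty_string(self):
            self.assertEqual(slugify(""), "")

    if __name__ == "__main__":
        unittest.main()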
I can foresee one niche where this doesn't matter: exploratory ad-hoc data science.
In this exploration stage, total correctness doesn't matter since you're just getting a feel for the data. Copilot might help a lot with the associated boilerplate.
I think Copilot-like tools could be excellent for the exploration phase. Marvin Minsky mused on this usage back in 1967:
> The programmer does not even have to be exact in his own ideas‑he may have a range of acceptable computer answers in mind and may be content if the computer's answers do not step out of this range. The programmer does not have to fixate the computer with particular processes. In a range of uncertainty he may ask the computer to generate new procedures, or he may recommend rules of selection and give the computer advice about which choices to make. Thus, computers do not have to be programmed with extremely clear and precise formulations of what is to be executed, or how to do it.
I can't wait for coders who used Copilot for every coding project they did: copying and pasting snippets until it works. No proofs or real exams at bootcamps!
Having offline coding interviews to find Software Engineers will become even more important.
It also isn't giving you any information on the source(s) of the generated code. Which might help determine how much to trust it, whether it could have licensing issues, etc.
It's probably the only way it would work - to show top matching snippets from training data on request, with links to the source and ideally licensing information, if it can be gleaned automatically. This would also clearly show how much it is copying verbatim and what exactly is its contribution.
The funny part will be when all the human programmers who steal code get doxed as a side effect. It would shine a light on lots of skeletons in the closet.
I think my best guess is that this is actually meant to produce broken code, so that Microsoft can sell you additional services (cloud fuzzing?) to find and fix the bugs.
See, when I saw the words 'risk assessment' I figured the presuppositional framework of the author's argument wasn't that Copilot was legally sound. In other words, I didn't expect to jump straight to the technical validity of the product.
Do not ignore the elephant in the room: Copilot is stealing code from projects with open licenses.
I would expect an information security expert to comment on the risks they have a professional background in assessing. More broadly, your comment almost seems to suggest that one should preclude all avenues of criticism beyond whichever singular issue is most "obvious" / "problematic". That strikes me as less than optimal.
Producing code that kinda-mostly works, very quickly, is the behaviour the software industry optimises for. This tool will help do more of that, so it will be very widely adopted.
Developers who do not use this (or similar tools) will not be hired, or only in particular niche domains where correctness matters.
I'm surprised that so much of the discussion around Copilot has centered around licensing rather than this.
You're basically asking a robot that stayed up all night reading a billion lines of questionable source code to go on a massive LSD trip and then use the resulting fever dream to fill in your for loops.
Coming from a hardware background where you often spend 2-8x of your time and money on verification vs. on the actual design, it seems obvious to me that Copilot as implemented today will either not provide any value (best case), will be a net negative (middling case), or will be a net negative, but you won't realize that you've surrounded yourself with a minefield for a few years (worst case).
Having an "autocomplete" that can suggest more lines of code isn't better, it's worse. You still have to read the result, figure out what it's doing, and figure out why it will or will not work. Figuring out that it won't work could be relatively straightforward, as it is today with normal "here's a list of methods" autocomplete. Or it could be spectacularly difficult, as it would be when Copilot decides to regurgitate "fast inverse square root" but with different constants. Do you really think you're going to be able to decipher and debug code like that repeatedly when you're tired? When it's a subtly broken block of code rather than a famous example?
That Easter example looks horrific, but I can absolutely see a tired developer saying "fuck it" and committing it at the end of the day, fully intending to check it later, and then either forgetting or hoping that it won't be a problem rather than ruining the next morning by attempting to look at it again.
I can't imagine ever using it, but I worry about new grads and junior developers thinking that they need to use crap like this because some thought leader praises it as the newest best practice. We already have too much modern development methodology bullshit that takes endless effort to stomp out, but this has the potential to be exceptionally disastrous.
I can't help but think that the product itself must be a PSYOP-like attempt to gaslight the entire industry. It seems so obvious to me that people are going to commit more broken code via Copilot than ever before.
IMHO they built the opposite of what's actually useful for real-world use. Copilot should have been trained to describe what a selected block of code does, not write a block of code from a description. It could be extremely useful when looking at new or under-documented codebases to have an AI that gives you a rough hint as to what some code might be doing. For example if you select some heinous spaghetti code function, press a button, and get a prompt back that says "This code looks like it's parsing HTML using regex (74.2% confidence)" it could be much easier for folks to be productive on big codebases.
No, presumably copilot skirted that need by just analyzing the AST of code they host and using the nearby comments to identify what a section of code is meant to do. This would use the same dataset but solve the opposite problem: generate a description from a block of code AST as input.
> copilot skirted that need by just analyzing the AST of code they host and using the nearby comments to identify what a section of code is meant to do.
I'm curious what it spills out for things like "Todo", or "this is probably broken", etc.
I'm not sure I understand how you envision this working, given the underlying technology. You'd have to have a pretty large cache of such analyses to train on, right?
Github has a huge amount of source code and likely for copilot they already had to transform it into an AST to look at comments and nearby code. This would use the same dataset but build the opposite model--input a block of code AST and get a guess as to what the description (i.e. comment) should be for it.
This is the thing that made no sense to me about it as a premise. Doing correct program synthesis is really hard even when you have really opinionated and well-defined models of the domain (e.g. the Termite project for generating Linux device drivers). The domain model for Copilot is somewhere between non-existent to so open-ended (i.e. all the diverse code on Github, et al.) as to be functionally non-existent.
A bare minimum baseline validation check for Copilot would be to see if it provides you code which won't compile in-context. If it will, then that means it's not even taking into account well-specified domain model of your chosen programming language's semantics. Which, upon satisfaction, is still miles away from taking into account the domain of your actual problem that you're using software to solve.
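The weakest version of that check is easy to sketch for a dynamic language (Python here; it only catches syntax errors, not type or in-context semantic errors, so it's purely a lower bound):

    def compiles(snippet: str) -> bool:
        # Return True if the suggested snippet is at least syntactically valid.
        try:
            compile(snippet, "<suggestion>", "exec")
            return True
        except SyntaxError:
            return False

    # A suggestion that fails even this bar should never be surfaced to the user.
    assert compiles("for i in range(10):\n    print(i)")
    assert not compiles("for i in range(10)\n    print(i)")  # missing colon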
The only place where the approach taken, as-is, makes sense to me is for truly rote boilerplate code. However, that then begs the question... how is this machine learning approach more effective than a targeted heuristic approach already taken by existing IDE tooling, etc.?
FWIW, I don't think any of this is lost on GitHub. I think Copilot is more likely a tremendously marketable half-step and small piece of a larger longer-term strategy unfolding at Microsoft/GitHub to leverage an incredible asset they're holding, i.e... practically everybody's source code. The combination of detailed changelogs, CI results (e.g. GitHub actions), Copilot, and a couple other key pieces makes for a pretty incredible basis for reinforcement learning to multiple ends.
I would hire copilot to write tests for me, that’s about it. Writing tests can be a drag. It’s really a low-risk proposition to have generated code attempt it. If it’s a usable test, maybe it will catch a bug. If not, then kill it and let it generate a few more.
The expectation is entirely different than producing code. Code needs to be correct, secure, performant, and readable. Failure on any of those fronts can be expensive to disastrous. Nobody can reasonably expect a test suite to catch every bug, even if created by the smartest humans. If a copilot-created test does prevent a bug from shipping it provides immediate value. I could see it coming up with some whacky-but-useful test cases that a sane person might not consider. From a training perspective I would think that assertion descriptions contain more consistent lexical value than the average function signature.
It seems like the ambitious data scientists, product marketers, and managers fell in love with a revolutionary idea about AI writing code, and neglected to consult the engineers they are trying to ‘augment’.
Nope. That's like saying, "I might let a machine write the docs for me."
Good tests are documentation that a computer can verify. Because they explain the meaning of parts of the system, they contain information not available in the code. If you try using ML for test generation, you'll have the same problem you do with GPT-3 prose: it might look plausible at first glance, but lacks coherent meaning.
You'd also end up with one of the problems common in big test suites: poorly factored tests that end up being the sort of expressive duplication that is a giant drag on improving existing code. ML is nowhere near advanced enough to say, "Gosh, we're doing the same sort of test setup a bunch; let's extract that into a fixture, and then let's unify some fixtures into an ObjectMother."
For people looking to get the computer to do the work of catching more things with less burdensome test writing, I suggest taking a look at things like Hypothesis: https://hypothesis.readthedocs.io/en/latest/
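For example, a minimal Hypothesis property test looks like this (the dedupe function is a made-up stand-in; swap in whatever invariant you actually care about):

    from hypothesis import given, strategies as st

    # Hypothetical function under test.
    def dedupe_keep_order(items):
        seen = set()
        return [x for x in items if not (x in seen or seen.add(x))]

    # Hypothesis generates many inputs, including nasty edge cases,
    # and shrinks any failure to a minimal counterexample.
    @given(st.lists(st.integers()))
    def test_dedupe(items):
        result = dedupe_keep_order(items)
        assert len(result) == len(set(items))  # no duplicates survive
        assert set(result) == set(items)       # nothing was lost

    if __name__ == "__main__":
        test_dedupe()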
> If you try using ML for test generation, you'll have the same problem you do with GPT-3 prose: it might look plausible at first glance, but lacks coherent meaning.
There is a company in this space of generating "plausible tests" for legacy code bases at very large enterprises (think Goldman Sachs, telcos etc) called Diffblue [0].
They raised funding back in 2017 [1] and it seems their biggest value-add is in creating unit tests for legacy Java code bases that often have little to no unit tests.
Essentially, these AI-generated unit tests help a team "document" all the known behaviors of a legacy code base, such that when a change is introduced that violates the behaviors covered by the generated unit tests, the tool can alert the team to the potential presence of a regression.
Anyway, they offer a fairly basic browser-based demo of their AI product called Diffblue Cover [2].
Is Diffblue AI-based or is it just property-based testing? I assume that since it's limited to Java, they just decompile the opcodes, find which branches each method has, and write a test that calls each method with all possible permutations that go down each branch.
I haven't looked at it. But there are plenty of "magic beans" product targeted at Enterprise companies with legacy code. It's perfectly plausible to me that many of the companies using something generating bad tests wouldn't know the difference, because that's what their code base has already.
> one of the problems common in big test suites: poorly factored tests that end up being the sort of expressive duplication that is a giant drag on improving existing code.
I feel like you just described every developer/codebase where mock testing is stupidly enforced. Where every single unit test mocks every single indirect object. 98% of the testing code is just exhaustive setup and teardown of objects not being tested by each test, and then a bunch of conditional checks to ensure that every deeper/indirect method is being called exactly the right number of times with exactly the right arguments and returning exactly the right value. Almost all of the test code is just hacking mock objects. The actual purpose of each test is buried so deep that it's impossible to even understand the business logic being applied.
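A caricature of the shape, inlined so it's self-contained (the names are made up, but you've seen this test):

    from unittest.mock import MagicMock

    # Hypothetical system under test.
    def place_order(inventory, payment, audit, order_id, amount):
        if inventory.reserve(order_id).ok and payment.charge(order_id, amount).ok:
            audit.record(order_id)
            return True
        return False

    def test_place_order():
        inventory, payment, audit = MagicMock(), MagicMock(), MagicMock()
        inventory.reserve.return_value = MagicMock(ok=True)
        payment.charge.return_value = MagicMock(ok=True)

        assert place_order(inventory, payment, audit, order_id=42, amount=10)

        # Nearly every line below checks plumbing (call counts and arguments),
        # not the business rule the test is nominally about.
        inventory.reserve.assert_called_once_with(42)
        payment.charge.assert_called_once_with(42, 10)
        audit.record.assert_called_once()

    test_place_order()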
Yes. I have been doing unit testing for a long time, and I think this is a clear antipattern. It's cargo-cult testing, not actually a serious effort to improve quality and developer productivity.
These are absolutely the worst tests I have ever seen. They make iterating on the implementation almost impossible. Why people do this I will never understand.
It would be a mistake to say that the output from GPT3 lacks coherent meaning. It's not that the output is gibberish, it's that it's too easy to mistake it for a human's work. This means that it's easy to mistake it for something that was created with understanding and intention, when in fact the author was nothing more than a random number generator. The same risk exists for copilot. [--GPT3]
Well, take it up with GPT3 since it wrote that reply. :P
Though I don't fully disagree with it - 'nothing more' is a bit too strong. The author of a GPT3-written comment like the one here, where the prompt was pretty much just the thread, really is pretty much just the RNG. The language model makes the random choice a draw from the distribution of plausible texts, and the RNG picks the output.
GPT3 could have written your comment-- if only it drew the right random numbers.
What RNG? It definitely doesn't randomly pick words. If the comment I responded to was written by a bot (is that legal? Can I report that?) then it's indistinguishable from a human written comment.
GPT3 works on a compressed representation, with symbols that are (sometimes) smaller than complete words but larger than letters. It takes a set of symbols as context and generates a probability distribution for the next symbol. Then a random number generator is used to sample from that distribution, and the process is repeated with the selected output added to the context. So its output is random, but not uniformly random.
Exclusively selecting the most likely symbol produces pathological behavior outside of extremely short output.
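A toy sketch of that loop (made-up tokens and probabilities, obviously nothing like GPT3's real vocabulary):

    import random

    # Toy next-token distribution the model might emit for some context.
    candidates = ["the", "a", "banana", "</s>"]
    probs = [0.55, 0.30, 0.10, 0.05]

    def sample_next_token(rng: random.Random) -> str:
        # The RNG, not the model, makes the final choice; the model only
        # shapes the odds. Different seeds give different continuations.
        return rng.choices(candidates, weights=probs, k=1)[0]

    def greedy_next_token() -> str:
        # "Always pick the most likely token" -- the strategy that degenerates
        # into repetitive output over longer spans.
        return max(zip(probs, candidates))[1]

    print(sample_next_token(random.Random(0)), greedy_next_token())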
What caused GPT3 to output its comment rather than yours is a product of its random choices. There is a set of choices it could have made which would have caused it to output your comment. You can see this property employed by the GPT2 text compressor: https://bellard.org/libnc/gpt2tc.html to compress text it just writes down the choices, using an entropy coder to represent likely choices with fewer bits.
I assume copilot is the same general structure as GPT-- just trained on different data.
And yes, the comment you responded to was written entirely by GPT3 (with some number of retries and trims). As it said-- it's "easy to mistake it for a human's work". :) There is nothing illegal about it, but I suppose HN would prefer that there be enough human supervision of bot comments such that they're limited to contexts where they are funny/insightful. :P
They didn't say they were going to have Copilot write _all_ the tests. Writing tests for cases you can think of and trying Copilot for the extras doesn't seem like that bad of an idea.
It would be awful to write every test using Copilot, but there is potential there for a certain kind of test. If I'm writing an API, I want fresh eyes on it, not just tests written by the person who understands it most (me). For example, a fresh user might try to apply a common pattern that my API breaks. Copilot might be able to act like such a tester. By writing generic tests, it could mimic developers who haven't understood the API before they start using it (most of them).
If you can find an example of Copilot coming up with a test you wouldn't have thought of, I'd be very interested to see it.
Even if that happened, which I am not expecting, I think the need is much more easily solved via means that are simpler and more effective. E.g., a good tester writing up a list of things they test about APIs: https://www.sisense.com/blog/rest-api-testing-strategy-what-...
copilot as defined would not be "fresh eyes"... it would be "old tired eyes of every code writer who uploaded stuff to github, not knowing if they made one off errors or mistakes in their code"
I mean fresh eyes with respect to my new API. Having seen a lot of other code is a benefit. I expect most tests that Copilot writes to fail, but I would hope some would fail in interesting ways. For example, off-by-one errors might encourage me to document my indexing convention, or to use a generator rather than indexing.
My time is mostly spent not on writing code, but on thinking about what the program has to do, testing (including writing unit tests), and understanding errors. I always tell people, half joking, that programming is not about writing code but about the ability to debug it; understanding requirements, errors, and bugs is hard, while writing code and fixing bugs is relatively easy, in general.
Maybe Copilot 2 will do exactly this; it will generate tests based on half-working code, run them, and suggest improvements. That would increase productivity by something like 100%, but to me this sounds too good to be true.
> Writing tests can be a drag. It’s really a low-risk proposition to have generated code attempt it.
If Copilot can't write the correct code in the first place, you really shouldn't expect a proper test to be written by Copilot.
> Code needs to be correct, secure, performant, and readable.
Most tests should also have at least three of those attributes. Nobody actually wants their tests to be incorrect, slow, or impossible to understand or modify.
On the contrary, it might be interesting writing tests by hand and using the AI to produce code. If the tests are good enough for humans, they should be good enough for AI, given that the AI doesn't try to be actively malicious.
I think the use case for Copilot is a bit misunderstood. The way I see it you have two types of code:
1. Smart Code: Code that you honestly have to think about while you're writing. You write this code slowly and carefully, to make sure that it does what you need it to
2. Dumb Code: This is trivial code, like adding a button to a screen. This is code you really don't have to think about, because you already know exactly how to implement it. At this point your biggest obstacle is how fast can your fingers type on a keyboard.
For me Github Copilot is useless for "Smart Code" but a godsend when writing "Dumb Code". I want to focus more on writing and figuring out the "Smart Code", if I need to throw a form together in HTML or make a trivial helper function, I will gladly let AI take over and do that work for me.
> This is trivial code, like adding a button to a screen.
UX is probably the most important aspect of most software products. Every software product is either "smart code" or "smart ux". No one pays much for "dumb code with bad UX" except in dysfunctional markets.
Adding a button to a screen should be trivial, and if it's not you need better tools. (As in "a not-horribly-misdesigned language and framework", not as in "giant transformer".)
Deciding where to add the button, its shape, its size, what happens when it's clicked, the text on the button, ... is anything but trivial.
But everyone does pay for dumb code. Even in the best-written and most efficient codebases, there's still going to be some amount of tedious glue code and boilerplate that you have to write in order to create a functioning product. It definitely would be better to have better languages and frameworks instead of a giant transformer, but the better languages and frameworks don't exist yet while the giant transformer does.
> there's still going to be some amount of tedious glue code and boilerplate that you have to write in order to create a functioning product.
This is true, but actually tedious glue code is often non-trivial. For example, in one of my hobby projects I have a repo where I have to write a lot of glue code to schlepp data from a CSV file format into an existing database. Doing this correctly requires reading through the (lengthy) documentation for the format of both the CSV data and also the system that ingests the database, since there are a bunch of invariants about the key tables and columns that aren't enforceable in SQL (and obviously not enforceable in the CSV).
This is the sort of glue code/boilerplate where a synthesizer that can understand natural language would be actually helpful.
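A trimmed-down sketch of the kind of glue I mean (the column names and invariants here are invented for illustration):

    import csv
    import sqlite3

    def load_measurements(csv_path: str, conn: sqlite3.Connection) -> None:
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                # Invariants gleaned from two sets of docs, not expressible in SQL:
                # station ids are zero-padded to 5 chars, readings are non-negative,
                # and the ingesting system silently drops rows with empty timestamps.
                station = row["station_id"].zfill(5)
                value = float(row["value"])
                if value < 0 or not row["timestamp"]:
                    raise ValueError(f"bad row: {row!r}")
                conn.execute(
                    "INSERT INTO measurements (station_id, ts, value) VALUES (?, ?, ?)",
                    (station, row["timestamp"], value),
                )
        conn.commit()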
> but the better languages and frameworks don't exist yet
There are certainly some languages that are less verbose than others.
Java and Go are very boilerplate-y languages. Python is also pretty verbose and inexpressive for certain types of code.
The typescript example on the copilot page right now is a perfect example of "ugh, just use a better language".
The examples of boilerplate where copilot shines seem like situations where really simply using "snippets" would work better. E.g., everything on the copilot homepage right now.
And because of the nature of boilerplate, Java has IDEs that will both generate and modify this boilerplate without thought, no AI required.
I don't remember the last time I typed the text 'class' for example. Instead I type "new Foo(someStringVar)", and then hit Alt-Enter and my IDE creates the file, the `class Foo` along with a ctor that takes a String.
Even in different versions of the same project, the glue may change. E.g. my boilerplate functions for making ajax calls have changed half a dozen times since 2005, been gutted and rewritten to be promisified, upgraded to websockets, and all sorts of other options, but I still have projects deployed that use several previous versions. And my typical PHP or Node executor has evolved, too. I myself find it confusing when working new bits into older projects, where I'll occasionally match the wrong gateway code with the wrong frontend.
In other words, a machine looking at only my own glue would be more likely to mismatch or use the wrong version in any given situation.
I'm mostly curious how much overlap there is in use cases for Snippets with parameters, possibly even snippets with conditional parameters vs Copilot.
I imagine there are some situations where Snippets do better and ones where Copilot does better, but the more complex the situation, the less I trust Snippets... but _also_ the less I trust Copilot.
It seems my trust in Copilot is very similar to that of use cases for Snippets. To throw out fake numbers, it feels like Snippets (and tools like them) cover ~70% of Copilot's use case. So I'm really curious to know what that other 30% is, and whether it is ever useful.
Yes I agree, I don't use a snippets feature/plugin, but that's what I thought of when it launched - 'why wouldn't I just'.
I suppose there's a (supposed) advantage that it's automatically finding and suggesting the snippet for you, rather than relying on you to think of it and recall the key binding, or know that it's there to search for.
This is basically the same argument that was made against required boilerplate in Java: “Your IDE can just generate that for you!” (And in sufficiently advanced cases, also keep it up to date.)
Imho, it is just an argument for making better languages and libraries. (These libraries will also make it easier to use with copilot.)
Exactly. The reason we aren't all sitting around hand-writing assembler is that programmers look at tedious processes and find higher-level abstractions that allow us to do more work in less time.
Once we spot a tedious common pattern, we should be finding ways to DRY it up. Configs, libraries, frameworks, DSL, tools, and languages are all great ways to do that. Copy-pasting and machine-generating code are short-term thinking in two ways: they focus on the initial creation of the code at the expense of maintenance, and they give up on increasing abstraction, lock the system into a productivity plateau.
You still have to describe to co-pilot what you want. So that doesn't make much sense. You should work on a higher level of abstraction then. If you aren't, why not spend a few minutes writing some functions instead of generating tons of unmaintainable boilerplate with co-pilot?
The code for a button is trivial. Most of what we used to call wizards handled that bit.
It's what the action of that button does where the real fun comes in.
I once had a project that was a yes/no dialog. Two buttons and some text. I had the dialog up and running in under an hour. The action that happened when you pressed yes took 3 months to finish.
I don't know, we somehow managed to replace point-and-click GUIs for placing buttons (Windows Forms etc.) with frontend developers writing elaborate code to achieve the same result in HTML/CSS. Productivity is far from the first priority for frontend development.
Idk I think it's not as easy to achieve a good front-end with point-and-click as you describe. When you have to do things like adaptive layouts, it seems like code actually manages a bit better than a WYSIWYG. Or there is some point of complexity where you need so many configuration options in a point-and-click editor that the code becomes easier to manage.
It's not that, it's that when you eventually reach the point where you need to do something that can't be handled by the WYSIWYG editor, you're left with 50,000 lines of shitty machine generated code that's almost impossible to work with.
The risk with WYSIWYG editors isn't that there's some tipping point where it becomes 10% more efficient to write code and you lose a bit of productivity or something. It's that something comes up half way through development and the WYSIWYG doesn't have the feature you need[1], and the entire project slams into a brick wall and dies instantly.
You can prevent this by running into the exact same problem CoPilot has, which is that reading code is harder than writing it. If you try to avoid the brick wall by having devs familiarise themselves with the code as the WYSIWYG generates it, those devs would have just been able to build it themselves in less time and with cleaner code.
[1] which will always happen eventually, because they're balancing the feature set for the exact reasons you mentioned. If they can do everything code can then the UX is going to be so bloated and horrible that it'd be trivially worse to use than just writing code.
Or you're describing how you could put copilot in a box and make a really good low code gui programming solution where the complex stuff is good old complex code.
The problem is you still have to go back and read through the "dumb code" to make sure it was written correctly. At a certain point, is that actually faster than just writing it yourself? Maybe a little bit, for some people and for some usecases, but it becomes a much narrower value-proposition.
Personally, I'd rather use snippets or some form of "dumb" code generation over an AI to generate the "dumb code". Sure, I'll probably still have to do some typing using those methods, but it's still less than if I were doing it all by hand.
It's not clear to me how that's better than the traditional solution to generate "Dumb Code", copy-pasting something. And we all know the problems with copy-pasting as a lifestyle.
Yes! Why is everyone so negative about copilot? I think it's a great name for the product. It helps you write, it doesn't write for you. You're still in charge and it can't write the "smart code".
Generally, a copilot is someone you can trust. The whole point of having a copilot is to reduce my cognitive load. If I am a pilot and have my copilot fly the plane while I do something else, I may be in charge, but I trust him to fly safely and alert me if things go wrong. A copilot is also a licensed pilot, able to do almost everything the pilot does, he is just not in charge.
The article shows that I can't trust GitHub copilot. So I don't think it is a representative name. Here, it would be more like a servant.
> These three example pieces of flawed code did not require any cajoling; Copilot was happy to write them from straightforward requests for functional code. The inevitable conclusion is that Copilot can and will write security vulnerabilities on a regular basis, especially in memory-unsafe languages.
If people can copy-paste the most insecure code from Stack Overflow or random tutorials, they will absolutely use Copilot to "write" code and it will become the default, especially since it's so incredibly easy to use. Also, it's just the first-generation tool of its kind; imagine what similar products will accomplish in 20 years.
With the pace of technological innovation, I'm honestly not sure what a similar product will be able to accomplish in 20 years. It'll be crazy for sure. But I'm worried about today.
This is a product by a well-known company (GitHub) which is owned by an even more well-known company (Microsoft). GitHub is going to be trusted a lot more than a random poster on Stack Overflow or someone's blog online. And GitHub is explicitly telling new coders to use Copilot to learn a new language:
> Whether you’re working in a new language or framework, or just learning to code, GitHub Copilot can help you find your way. Tackle a bug, or learn how to use a new framework without spending most of your time spelunking through the docs or searching the web.
This is what differentiates Copilot from Stack Overflow or random tutorials. GitHub has a brand that's trusted more than random content creators on the internet. And it's telling new coders to use Copilot to learn things and not check elsewhere.
That's a problem. Doesn't matter what generation of the program it is. It creates unsafe code after using its brand reputation and recognition to convince new coders to not check elsewhere.
Consider Google Translate, right? Google is a well-known brand that is trusted (outside of a relatively small group of people that doesn't trust Google on principle). Yet every professional translator knows that the text produced by Google Translate is a result of machine translation, Google or no Google. They may marvel at the occasional accuracy, yet expect serious blunders in the text, and would therefore not just trust that translation before submitting it to their clients. They will check. Or at least they should.
Sure. Is Google Translate used only by serious professional translators who have a rigorous translation-checking process? Not at all.
And as you say, it will be the same with programmers. Who's this being targeted at? People "working in a new language or framework, or just learning to code". The whole value prop is, "You don't have to know what's going on!"
The important difference is that the target readers can usually spot an egregiously bad translation. But the target users for software cannot easily spot gaping security holes and other serious issues until something bad happens.
> The important difference is that the target readers can usually spot an egregiously bad translation.
What, no, that's not true at all; that's like the second biggest problem. Google Translate routinely does stuff like invert the meaning of clauses, or drop information, or hallucinate absent context. Target readers can't reasonably be expected to catch any of that.
I think this is an important way we need to frame the use of these tools for junior developers. I'd advise that anyone who is recommending this product to their team also take the time to give this analogy - maybe even going so far as to require explicit comments that notifies reviewers when code was provided by Copilot and similar services.
The difference here is that professional translators often have professional training.
The bar is substantially lower for a 'programmer', especially with an incredibly large bootcamp market that churns out 'professional' 'programmers' in 6-8 weeks.
Been to a bootcamp? Know some leetcode? Someone will hire you. And then you got Copilot advertising its services to you as a way to learn how to code. The implication of 'learn to code' being 'learn to code correctly'.
Google Translate has no similar relationship with professional translators.
> The difference here is that professional translators often have professional training.
You'd be surprised. What you described here for programmers is true for translators as well, and probably for many other specialities in which the ability to deliver the result is more important than any documents certifying that you've had a formal training for how to deliver those results. In case of translators — found an agency? Check. Passed an interview with a test? Check. You are good to go.
Maybe you are right, but where UML created busy work, Copilot will literally do your work for you. I can even imagine a future where management makes it policy to use Copilot first to save time and money.
Most of my work isn't copy-pasting snippets. It isn't even typing code. It's understanding user needs and the existing system, and then figuring out how to make things better for the user while also improving the system. So this does not do my work for me.
I can also imagine clueless bosses mandating Copilot use and that's what scares me. The real costs of most code aren't in the first writing. They're in the long-term maintenance. Copilot does not and cannot understand the whole system, or what makes for maintainability down the road. So it can't make that better, and will likely make worse. In the same way that code generation tools and code wizards made things worse.
I think there is some difference. You don't come across some piece of code by chance; you were actively looking for it, there were probably multiple blogs or SO entries with the needed information, and one of those sources had to be chosen. You know that this is some random blog post or an SO answer given by someone fresh.
Copilot is something different. Code is suggested automatically and, most importantly, suggested by an authority - hey, this is GitHub, a huge project, the largest code repo on the planet, owned by Microsoft, one of the most successful companies ever. Why should you not trust the code they are suggesting to you?
And that's just for starters, before malicious parties start creating intentionally broken code only to hack systems built with it, greedy lawyers chase some innocent code snippet and ask you to pay for using it, etc.
I had not considered the proliferation of terrible open-source code on GitHub. I'd wager that the amount of code in public repositories from students learning to code may outweigh the quality code on GitHub.
I wonder if there was any sort of filter for Copilot's input — only repositories with more than a certain number of stars/forks, only repositories committed to recently etc.
> Ultimately, a human being must take responsibility for every line of code that is committed. AI should not be used for "responsibility washing."
That's the whole point, and the rest is moot because of it. If I choose to let Copilot write code for me, I am responsible for its output, full stop. This is the same as if I let a more junior engineer submit code to prod, but there aren't blog posts about not letting them work or not trusting them with code.
Copilot doesn't seem any better than Tab Nine. Tab Nine is GPT-2 based, works offline, and can produce high quality boilerplate code based on previous lines. It can also generate whole methods, which, when they work, seems mind-blowing, but they are not always correct. Most suggestions are usually mind-blowing anyway, because previously we never had this kind of code completion.
It feels like it wrote the whole line which you were going to write exactly as it should have. But that's all it does. And it seems like Copilot is the same but on much larger scale and online.
This articulates some of the concerns I had trying copilot.
I noticed that I ended up assuming the code reviewer role when I was trying to write code. Context switching between writing and reviewing felt unnatural.
I also think I am less likely to spot a bug than I am to avoid writing it in the first place. Take the off-by-one error in the last example: I don't think I would have made that mistake, but if copilot had presented that code block, I probably wouldn't have noticed the error either.
The moon phase example is illustrative in another way.
It's not technically possible to precisely calculate the moon's phase based on time alone. It's an optical effect that is influenced by parallax, so you have to pay attention to location as well. This is, for example, why Eid al-Adha falls on different days in different parts of the world. So the function signature itself is potentially wrong, depending on my needs. I might find that out if I had to do some Googling to finish the function, but (assuming I didn't already know) I'm not sure if that possibility would ever have occurred to me if I were using Copilot.
Copilot can spit out code that's influenced by what others have written. But can it clue you into design considerations like this? Or should we be worried that it is helping us to write code that does the wrong thing with a higher degree of confidence?
The function signature was determined by Copilot. The author just wrote the comment above the function and the word function, then let Copilot determine what the signature was and how it would be implemented.
These use cases where people are running into trouble are increasingly sounding a lot like "What happens if I engage Autopilot and then take a nap?" That's definitely not how the tool was intended to be used, but it's hard to see how you can reasonably expect that nobody would ever do that.
This is such curious behavior to me. Does someone really @ a corporation hundreds of times about anything? Does this have any effect? Should it?
It makes me doubt the rationality of the author’s post if ve truly did this. Although I suppose maybe their use of Twitter is just completely different from anything I understand.
It's a way to feel better about yourself without changing any comfortable behaviors. The author still uses GitHub, and the ICE tweets are public Hail Marys.
Mostly, this just seemed like a non sequitur to me. Right away, I get the sense that "oh, this author has a bone to pick with GitHub.". Even if it's unrelated to the crux of the piece, I already feel like I'm going to have to take what the author says with a grain of salt.
I'm pretty sure that this conclusion isn't new, but I've come to think that Copilot shouldn't be thought of as a better developer, but merely a quicker one. Obviously its code will be somewhat average, considering that it's been trained on code whose only unifying characteristic is that it's public.
Something like Copilot, but trained explicitly to analyse the code instead of writing it, could be much more useful, imo. Basically a real-time code review tool. There are similar tools already, but I'm talking about something that is able to learn from the actual codebase being worked on, perhaps including the documentation, and give on-the-go feedback.
If you interviewed two developers, one who produces reasonably correct code in a given amount of time, and another one who produces code which is subtly incorrect most of the time, but much faster, which one would you hire?
The problem with your proposal is that it's relatively easy to do what Copilot does at the moment using AI, i.e. guess what code you are looking for and find something that does (or says it does) more or less that. However, which codebase would you use to check against if the generated code is really correct? The same codebase that produced the more-or-less-correct code in the first place?
I like this idea, given that it takes advantage of how git repos are made of bug-fixes. How many git diffs are out there that update a '=' in an if statement to '=='?
So an AI copilot should be watching out for code I write that looks similar to code that was updated in another repo. It could even use the text from issues to synthesize a suggestion of why your code might cause problems!
Copilot doesn't seem like the right word. Maybe first year college student with no previous programming experience? Then it would be clear what level of help you are actually getting.
Impressive, for sure. Unclear whether it's a net-positive tool, though.
Maybe “autopilot” then (as in the Tesla marketing term, not as in a real autopilot)? /s
Both lull you with a false sense of security which will suddenly and unexpectedly cause you to pay dearly for using it. Interestingly both seem to also get a vaguely similar balance of critics, apologists, and advocates on here.
Sample size of two here, but what is it with companies using %pilot to describe product features which are nothing like the actual appropriate use of the term?
People complaining about Copilot should just try it. All the concerns people are bringing up are correct but missing the point. Just think of it as a context-aware autocomplete that can complete more than just properties - it can finish out that one-line comment or map function too! It's really not that intrusive, nor is it going to replace anyone's coding job or even require fewer programmers. It'll just speed you up a bit, similar to auto-completion. I think at least 70% of the time I take its suggestions, which is good enough to keep using it.
This is a fundamental problem with ML and current generation "AI" : it only works in scenarios where a statistical win is acceptable (e.g. ad targeting). It is useless and often worse than useless where you want absolute correctness and false positives are highly problematic (e.g. spam filtering, writing code, not running over pedestrians).
It's interesting how the community that used to fervently argue that, say, the sampling of a few seconds of a musical composition is so obviously fair use doesn't extend the same attitude when it comes to snippets of source code.
Indeed I don't remember any of these complaints ever being made about, for example, AI-generated music or images, even though they work exactly the same and were trained on datasets of copyrighted works, both commercial and CC-licensed.
Compared to the manual sampling that DJs and other musicians do, the AI process almost certainly produces only fairly generic code snippets, since the model always needs a multitude of examples. Some loop-through-file Python snippet is a legal risk, but five seconds of a Disney song doesn't reach the level of creativity needed for copyright to matter? That seems strange to me...
The problem is, we don't know what Copilot is going to do. Sometimes it reproduces entire files verbatim. That is certainly a copyright violation. Sometimes it produces whole functions which are more or less taken verbatim from a copyrighted file. The user cannot see to what degree they are "verbatim", and there are also no clear legal guidelines on what is considered a copyright violation and what is not.
While I think there is some room for retrospective humility, it seems perfectly consistent to believe that, morally, de minimis use of copyrighted material should be considered fair use, while acknowledging that there are practical legal and professional risks to doing so (especially blindly), and that encouraging people to engage in that behaviour in an uncritical manner (especially in the context of engineering) is sociologically reckless.
The Easter Date algorithm was probably someone implementing an algorithm from the Wikipedia ( https://en.wikipedia.org/wiki/Date_of_Easter#Anonymous_Grego... ) without bothering to understand it (because honestly it's not a very interesting problem). No wonder it's uncommented.
As long as the AI just regurgitates lines from repositories like a bad undergrad cheating on his homework, CS jobs should be safe.
The fact that it has picked up the GPL might not mean that much -- it might appear in dual-licensed projects.
> The fact that it has picked up the GPL might not mean that much -- it might appear in dual-licensed projects.
Github have stated that Copilot is trained on all public code on Github, regardless of license. It very trivially follows that it has been trained on a lot of code that is explicitly GPL single licensed. We don't need to do any guessing here.
Chess and Go are closed systems, the whole knowledge about how to play them well can in theory be derived just from a few basic rules. The same is not true for programming.
Plus, in theory Chess can be solved by exploring the whole solution space (which is finite, even though insanely large) and heuristics can make this practical by reasoning about which branches can probably be cut off. At that point, having more and more processing power and memory helps make the task feasible.
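As a rough sketch of what cutting off branches means in code (evaluate, legalMoves, and apply are hypothetical stand-ins for a real game implementation, not actual chess code):
// Minimal alpha-beta search sketch. Whole subtrees are skipped ("pruned") as soon as
// they provably cannot change the final result, which is what keeps the search tractable.
function alphaBeta(state, depth, alpha, beta, maximizing) {
  const moves = legalMoves(state);                  // hypothetical helper
  if (depth === 0 || moves.length === 0) {
    return evaluate(state);                         // hypothetical static heuristic score
  }
  let best = maximizing ? -Infinity : Infinity;
  for (const move of moves) {
    const score = alphaBeta(apply(state, move), depth - 1, alpha, beta, !maximizing);
    if (maximizing) {
      best = Math.max(best, score);
      alpha = Math.max(alpha, best);
    } else {
      best = Math.min(best, score);
      beta = Math.min(beta, best);
    }
    if (beta <= alpha) break;                       // prune: the opponent would never allow this line
  }
  return best;
}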
Not that I want to downplay these achievements, they were certainly very significant, but it's still entirely different from "solving programming" (whatever that means).
You have made a distinction between programming and chess and go. There was a period when people thought chess could be solved by AI, but go couldn't, because go was too highly dimensional. That distinction has been proven not to be meaningful.
Let's maybe start with the fact that in Chess and Go it is straightforward to find out if the game is over and if so, who has won, whereas with programming, it's not even clear what it means for a problem to have been "solved", unless you formalise it to an extent that is basically never done (and which is also usually more expensive than just programming the solution).
Solving Chess/Go and programming are really not much alike.
Watching people play chess has entertainment value. Coding is lucrative because there's a lot of demand and not that many people who can do it well. What if a computer can do it better than almost anyone?
This is something I don't get. You're supposed to be able to integrate BSD-licensed (or even Public Domain) code into GPL works, right? The fact that something shows up in GPL code means what exactly?
This is like: there are scholarly books that quote extensively from original philosophers -- long, third-of-a-page quotations. Still, I should be able to quote something in its original language (translations may be copyrighted) by copying from the derived work. A copyrighted work is not supposed to be able to poison the non-copyrighted work it originates from.
>As a code reviewer, I would want clear indications about which code is Copilot-generated.
I would like to see this tracked behind the scenes. At any time I should be able to get Copilot to spit out a list of suggestions I've accepted. I should be able to generate this report for the lifetime of a project.
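Nothing like this exists today as far as I know, but as a minimal sketch of what I mean, assuming a hypothetical per-project log that the editor plugin appends to:
// Hypothetical sketch only: record accepted suggestions and report on them later.
const fs = require("fs");
const LOG_FILE = ".copilot-accepted.jsonl";          // one JSON record per accepted suggestion
function recordAcceptedSuggestion(file, startLine, endLine, text) {
  const entry = { file, startLine, endLine, text, acceptedAt: new Date().toISOString() };
  fs.appendFileSync(LOG_FILE, JSON.stringify(entry) + "\n");
}
function reportAcceptedSuggestions() {
  if (!fs.existsSync(LOG_FILE)) return [];
  return fs.readFileSync(LOG_FILE, "utf8").split("\n").filter(Boolean).map((line) => JSON.parse(line));
}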
It's kind of funny that this whole thing is built on top of git but uses none of the features, such as git blame. Instead of being an auto-complete, I would want co-pilot to be a git contributor, making pull requests that improve my code.
> This is well-formed and even commented C that sure looks like it parses HTML, and the main function has some useful boilerplate around opening the file. However, the parsing is loaded with issues.
This sounds like a great example of an interview question (where the person is asked to find and fix all of the issues in a chunk of bad code). An unintended use case for Copilot?
If Copilot isn't showing exact copies of code it's seen, how is it able to produce code that mostly works? E.g. Copilot code from the article:
function getPhase() {
var phase = Math.floor((new Date().getTime() - new Date().setHours(0,0,0,0)) / 86400000) % 28;
if (phase == 0) {
return "New Moon";
} else if (phase == 1) {
return "Waxing Crescent";
}
// etc.
It feels like small incorrect modifications to any of the code here would completely break the function.
I've seen stories and articles written by GPT-3 where it loses the plot and context along the way - in comparison, Copilot doesn't seem to suffer from this as much? How?
There are parts of the internet that google doesn't search, it seems. I've just tried searching for a couple of strings from my github repository and I can't find them with google or duckduckgo (I get tons of results but none that's actually my repository or contains my code). To be sure the Copilot generated code is not copied verbatim from somewhere you'd have to search its training corpus. I don't know if that is available publicly.
This is also something that kind of breaks my brain. It's always impressive to see Copilot-"authored" code which seems complex, coherent and (most of the time) functional, then Google it and get zero results.
How is it that a glorified statistical machine is able to put blocks of code so well together?
Code is much more structured than English. Code is built around well-defined ideas, processes, and data structures. Human language is context-dependent and loose, and things like tonality of speech can change the meaning.
It's easier to guess a multiple-choice question when all the choices can be generated by IntelliSense instead of having to look at a dictionary.
Copilot will be interesting in 10-20 years. Right now it's an early-stage ML-driven experiment in a field that hasn't advanced in forever - the strides it makes will be very gradual, incremental, and filled with mistakes along the way.
Huh, so Copilot was trained on a mix of correct and incorrect code and it learned to reproduce both correct and incorrect code. It's a language model, after all. It predicts the most likely next sequence of tokens. It doesn't know anything about correct and incorrect code. If the most likely next sequence of tokens is incorrect, it doesn't care; it's still the most likely next sequence.
And I guess it makes sense that Copilot was trained this way, even if it weren't just a language model. How do you even begin to separate correct from incorrect code on the entire freaking GitHub?
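A toy illustration of that point (this is not how Codex works internally, just the general pick-the-most-likely-continuation principle, with made-up counts):
// Toy "language model": it returns whichever continuation it has seen most often
// for a prefix, with no notion of which continuation is correct for your purpose.
const seenContinuations = {
  "const DAYS_IN_YEAR = ": [
    { text: "365;", count: 500 },      // most common in the (made-up) training data
    { text: "365.2425;", count: 40 },  // sometimes the one you actually need, but rarer
  ],
};
function predictNext(prefix) {
  const options = seenContinuations[prefix] || [];
  let best = null;
  for (const option of options) {
    if (best === null || option.count > best.count) best = option;
  }
  return best ? best.text : null;
}
console.log(predictNext("const DAYS_IN_YEAR = ")); // "365;" regardless of what you meant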
But I think TFA serves best to show the worst way to use Copilot. I haven't tried it, but I suspect it would do a lot better if it were asked to generate small snippets of code rather than entire functions. Not "returns the current phase of the moon" but "concatenates two strings". That would also make it much easier to eyeball the generated code quickly without too many mistakes making it through.
Of course, you could do the same thing with a bunch of macros, rather than a large language model that probably cost a few million to train...
It's been fascinating reading all the responses to Copilot. It's pretty clear to me now that developer habits are far more diverse than I thought.
For me, Copilot seems like it will be useful in the exact same way that StackOverflow is useful: as a means of pointing me in the right direction for code snippets, APIs, or techniques that I haven't memorized and don't really want to.
For example, on my current side project I wanted to know how to create a "unique enough" UUID in pure JS.
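For instance, the answers I'd expect it to surface are the built-in crypto.randomUUID() (available in recent browsers in secure contexts and in recent Node versions) or the well-known Math.random-based v4 fallback:
// Where available: const id = crypto.randomUUID();
// Classic "unique enough" v4 fallback (fine for non-security purposes, not cryptographically strong):
function uuidv4() {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c) => {
    const r = (Math.random() * 16) | 0;
    const v = c === "x" ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}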
Copilot would hopefully save me a couple of google/stackoverflow searches as I can very quickly test what it suggests.
I already rarely take SO answers as gospel, so it's unlikely that I'd take Copilot's suggestions as gospel either, but I think it significantly increases the speed with which I achieve the same results.
Am I the only one thinking that GPT3 and CoPilot can actually work once trained on properly licensed and properly audited code?
Well, it will not be as ubiquitous as having all of GitHub under your fingers, but perhaps it's better anyway not to blindly cite the world's source code.
> Am I the only one thinking that GPT3 and CoPilot can actually work once trained on properly licensed and properly audited code?
Sadly you aren't. The truth is, however, that models like GPT3 and its derivatives like Codex/CoPilot are by design incapable of ever achieving this.
The only way to generate both correct and secure code is to use a combination of proper specs and theorem provers.
Even then this won't help with non-functional requirements, such as performance or platform-dependent resource constraints.
Generative models will always have the potential to yield broken code that doesn't do what you want or contains security flaws even if trained on "proper" code.
If I have to audit the code that CoPilot generates and if the code is as obfuscated as the Easter example, it's probably less useful than it says on the label...
I'm glad the author points out that there are aspects to Copilot that are usable. A lot of other critics haven't been so kind.
Many complaints about Copilot remind me of the old Louis CK sketch where people complain about flying: YOU'RE SOARING THROUGH THE HEAVENS IN AN ALUMINUM TUBE. YOUR ANCESTORS WOULD'VE DIED OF DYSENTERY DOING THIS TRANSCONTINENTAL JOURNEY. Let's have some context here!
Sure, it's not remotely close to perfect and it's going to take a long, long, long time for it to get there. But still, there's something Engelbart-ian about seeing the demo when it works perfectly.
My biggest thing with Copilot is that it was trained on all public code on Github, which includes a lot of bad code that people just put up there (like my own code that I wrote a decade ago).
As long as it is keeping track of when people do or do not accept its suggestions, it should get better over time. But in the meantime the best bet is to treat it like a smart autocomplete, where you still have to at least check that it got it right.
In the future maybe it will be smart enough to be treated like an intern -- trust that the code is right but still verify it yourself if the code is of any importance.
I see copilot as something that could be useful for generating boilerplate or starting code. Often when using a framework or library there's some boilerplate code that has to be there for things to work fine, so instead of having to go to the documentation and find the relevant snippets, maybe copilot can do this for you (although they often come with CLIs that do this for you). Case in point is machine learning code, where you often find lots of boilerplate setting up the models, the training, etc. Maybe copilot was developed by machine learning devs for machine learning devs?
I just read one of the comments below and thought: we should write SQL as an AST and not as a string. That would solve many problems, not only SQL injection but also the lousy understanding of joins etc.
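A tiny, purely illustrative sketch of what that could look like from the application side (closer to a query builder than a full AST):
// Build the query as data and compile it to a parameterised statement,
// so user-supplied values never get spliced into the SQL string.
function select(table, columns, where) {
  const params = [];
  const clauses = Object.entries(where).map(([column, value]) => {
    params.push(value);                       // values travel separately as parameters
    return column + " = $" + params.length;
  });
  const sql = "SELECT " + columns.join(", ") + " FROM " + table +
    (clauses.length ? " WHERE " + clauses.join(" AND ") : "");
  return { sql, params };
}
// select("users", ["id", "name"], { email: "a@example.com" })
// -> { sql: "SELECT id, name FROM users WHERE email = $1", params: ["a@example.com"] }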
If anyone wants to write software to detect the original source of Copilot code, I can explain how to detect variable-length, unaligned matches using fast hash-code lookups (over an unlimited-size corpus).
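One common way to do this (not necessarily what is meant here) is Rabin-Karp-style fingerprinting: hash every fixed-length window of the corpus into a lookup table, then slide the same window over the generated snippet. A rough sketch, with arbitrary window length and hash:
// Index every K-character window of the corpus by a cheap hash, then look up each window
// of the suspect snippet. Candidate hits still need byte-for-byte verification to rule out collisions.
const K = 32;
function hashWindow(text, start) {
  let h = 0;
  for (let i = start; i < start + K; i++) {
    h = (h * 31 + text.charCodeAt(i)) >>> 0;  // simple polynomial hash mod 2^32
  }
  return h;                                   // a real implementation would use a rolling hash
}
function buildIndex(corpus) {
  const index = new Map();                    // hash -> list of corpus offsets
  for (let i = 0; i + K <= corpus.length; i++) {
    const h = hashWindow(corpus, i);
    if (!index.has(h)) index.set(h, []);
    index.get(h).push(i);
  }
  return index;
}
function findCandidateMatches(index, snippet) {
  const hits = [];
  for (let i = 0; i + K <= snippet.length; i++) {
    const h = hashWindow(snippet, i);
    if (index.has(h)) hits.push({ snippetOffset: i, corpusOffsets: index.get(h) });
  }
  return hits;
}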
Has anyone tried to generate private keys from copilot? GitHub suppressed these from search a while ago, but it would be an easy mistake to have included them in their training data.
This seems to imply a need for GPL4: its purpose would be to infect code that was written by a bot that learned from GPL4 code. Otherwise this can be used to circumvent all except the most lenient free software licences.
I would say that GPL3 already probably does that, which means that nobody should be using this for actual code (except if it's GPL3). But it might be helpful to be explicit about this.
I haven't used Co-Pilot (though I have apparently contributed to it) but just based on what I know, it doesn't seem like a finished product ready to be launched, rather an experimental feature that you might want to get some more focus group testing on. The idea that anyone would be actually "using" it in production is puzzling, to say the least.
Are we seriously going to criticize a tab completion engine because it doesn't perfectly calculate what the phases of the moon are? Can you? I'm amazed it even knows what that means and has a vague idea of what such a calculation should look like, even if it fails.
Seems like a programming language that optimizes for "codes well with Copilot" could be fairly successful. Things like memory safety, human readability, verifiability, etc.
You could also have some kind of AI-driven testing/verification program -- Copilot and <other program> could go back and forth multiple times until the program is deemed correct and returned to the user.
People look at the beta release of the software and interpret flaws in an early version as critical, fundamental problems.
I am pretty sure the new releases will contain features like better software license handling (e.g. 3 tiers of license types - permissive, copy-left, hardcore copy-left), a trust score for snippets, and possibly some validation of the code for some languages.
> People look at the beta release of the software and interpret flaws in an early version as critical, fundamental problems.
Maybe because they realize the flaws are fundamentally inherent in the very core of the product. They're using a GPT-3 derivative here. DNN models are not the right tool for this job.
Why wouldn't the following work for licenses? Train 3 models:
1. Only permissive licenses - Only include in the training set repos with permissive licenses - MIT, Apache.
2. Copy-left - Step 1 + GPL, excluding AGPL and other "hardcore copy-left licenses".
3. All - Include all code, even unlicensed and AGPL.
Users could choose which version they prefer based on the profile of their project and their company? The majority of GitHub repos have a LICENSE file, so it doesn't seem implausible? (A rough sketch of the filtering step is below.)
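// Map a repo's license identifier to the lowest-numbered model allowed to train on it.
// The license lists and tier assignments here are illustrative, not a legal claim.
const PERMISSIVE = new Set(["mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause"]);
const COPYLEFT = new Set(["gpl-2.0", "gpl-3.0", "lgpl-3.0"]);
function trainingTier(licenseId) {
  const id = (licenseId || "").toLowerCase();
  if (PERMISSIVE.has(id)) return 1;   // model 1: permissive only
  if (COPYLEFT.has(id)) return 2;     // model 2: permissive + copyleft
  return 3;                           // model 3: everything else (AGPL, unlicensed, unknown)
}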
Almost all permissively licensed code still requires preserving copyright notices or other attribution. So where Copilot is creating copyright violations, restricting its training to MIT- or Apache-licensed code will not resolve the issue.
I'd be more optimistic if the beta were crafted with the idea that it might have issues. If so, it would likely have some way of gathering feedback on suggestions that was a little more nuanced than just accepted/rejected.
This could actually be interesting. If it turns out that copy-left-based code completion is better than other options, it will create a strong incentive to spread it.
Can we all just finally realize that Copilot existing doesn't force anyone to use it? You can still write your 'famously better' code. Go on, show us why humans are better.
It seems the more pressing issue is that I will almost definitely question a future pull request, only to have the author tell me, "well, that's what Copilot wrote and it looked OK." If I catch it.
I'm all for AI coding assistance, but there's an abstraction layer in between copilot and myself (humans) that scares me a bit. At least an obvious failure prompts discussion and learning.