In Defense of Copy and Paste (zacharyvoase.com)
100 points by zacharyvoase on Feb 8, 2013 | 58 comments



I love copy & paste! I also defended it in this scholarly article: http://harmonia.cs.berkeley.edu/papers/toomim-linked-editing... with a video: http://youtu.be/1wo_7MTdWWI


No idea why you were downvoted. That paper looks like it has far more thoughtful things to say than most comments here. :-)


I didn't downvote it, but I was just going to skip that comment and continue reading.

The comment is too short, apparently slightly off-topic, looks like self-promotion, and gives no reason to look at the linked content. But as you said, this is a good comment!

How I would rewrite this comment (with some parts stolen from the abstract and some parts just invented):

I love copy & paste! I think that the use of programming abstractions like functions and macros has inherent cognitive costs. A few years ago, as part of my research, we proposed "Linked Editing": a technique for managing duplicated source code with the help of the text editor. We implemented it in a prototype editor as an XEmacs extension. More details are in this article: http://harmonia.cs.berkeley.edu/papers/toomim-linked-editing... and we made a video of the editor: http://youtu.be/1wo_7MTdWWI

P.S.1 I still prefer refactoring to this kind of multiple editing. I use Racket, so I love functions and macros. But sometimes they are very complex to edit, so perhaps this kind of multiple editing can be a good idea.

P.S.2 Another possibility is that some users are too eager to downvote. I've seen many good comments downvoted, but usually they bounce back later.


That is amazing.


> This may come across as a straw man argument

Big time. The refactoring in this case was ill advised. When things started getting hairy, it should've been backed out.

Piling too much flexibility in one function is a common mistake. A justification for copy/paste it does not make.

I worked at a shop with this rule: don't try DRY until you've seen at least three repetitions. I think this saves one from premature refactoring.

Another way to put it: Refactor when the code speaks to you, that is, when the need is evident. Keep the result only if it's a significant improvement. Avoid refactoring only because you are enamored of refactoring. (Or enamored of a rule.) Goes for any programming technique/tool, really.


My exact point was that the refactoring was ill-advised—but in my experience as a contractor (which means I've seen a lot of horrid things done in code) it happens a lot.


So your point is that refactoring is so often done badly that everyone would be better off cutting & pasting?

In one sense, this doesn't surprise me. I remember doing lots of ill-advised refactorings when I was younger. Refactoring is a nifty idea, and so it's easy to be enamored of it and eager to apply it like it's a new toy. I'm sure lots of people act this way with new power tools and often the resulting "oops" gets thrown in the scrap bin.

The problem with refactoring code is that recognizing the "oops" is harder. Probably, the thing still runs and all the tests still pass. It requires putting yourself in the shoes of someone who hasn't seen the thing before. It's not like seeing that you gouged a sanded surface. It's more like realizing that the technique you were so fond of conflicts with the composition. Not only is it subtle, you often also have to be mature enough to swallow your pride.

That said, a properly done poor refactoring is still superior to copy & paste. Why? Because, done correctly, you can pretty much guarantee finding all the places you need to change. With cut & paste, in a large codebase, you need to do some inspired searching and second-guessing -- and this is where the major risk comes in when you have to change something. You're still going to need some of that in a large codebase, even when it's very DRY. The smart move is to minimize that as much as possible, because again, that is where the risk comes in.

Ultimately, refactoring and DRY should be thought of, at their core, as clerical tools. You need to take steps to ensure that ideas don't get lost in the codebase -- lost ideas give rise to bugs. However, you should only take those steps when the downside risk of ideas getting lost definitely outweighs the risk of making a mistake and introducing bugs or decreasing code clarity and flexibility.


The article talks about when to refactor: when you are making the code more closely resemble the product specification.

I'm disappointed that the key point of my entire article, which is the difference between accidental sameness and essential sameness, was apparently misconstrued as an attack on all refactoring.


My apologies, then. However, this is the risk that the attention-grabbing pinch of sensationalism afforded by "copy & paste" brings with it.


> don't try DRY until you've seen at least three repetitions

A significant number of bugs I've had to fix over the years were the result of just two repetitions. Invariably someone updates one of them and not the other one. Or copies the first one, then updates, then forgets.

This happens over and over and over and over. I'm so sick of it that I want to punch my coworkers (I don't usually get angry quickly but this has been wearing on me for a few years now). So I will continue to refactor at the second copy.
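
A minimal sketch of that failure mode (hypothetical names, not from any particular codebase):

    # Copy one: later updated to enforce a new minimum length.
    def validate_username(name):
        return name.isalnum() and 3 <= len(name) <= 20

    # Copy two: pasted before the rule changed and never updated,
    # so the two checks now silently disagree.
    def validate_handle(name):
        return name.isalnum() and len(name) <= 20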


The number of repetitions is a red herring. It's more a question of which business flows the code adheres to, and what's the likelihood that a change in one should also affect a change in the others. Often this is a gray distinction, and it amounts to a judgement call about what types of changes a future maintainer will make and what mindset they will come in with. I'd like to think that with experience I've gotten much better at this, but it's exceedingly hard to get anything besides infrequent anecdotal evidence. Working on the same code base for 5 years, every once in a while I get the joy that comes with discovering that I made a good decision years ago, beating the YAGNI odds or leaving a particularly relevant comment that saved me significant re-investigation of a difficult issue.

But how could studies on this type of long-term maintainability draw meaningful conclusions? It just seems incredibly intractable from a scientific perspective—a result of experience and thoughtful consideration outweighing formal, documented and trainable techniques.


As Dijkstra said, "two or more, use a for."


That's not entirely relevant in this context. We're talking about two similar bits of code in entirely different parts of the codebase.


Personally I tend to follow the "1-2-N" rule. I'll live with one copy if it's justified. Two or more copies and it's time to refactor.


That example under the When Tools Make It Worse section - uggghhhhh. Why would anyone actually do that? That isn't DRY refactoring, that's cargo cult refactoring.

DRY is not, was never, and should never be about unnecessarily replacing clean, well-factored code with @$2!% shared mutable state. The goal is to normalize your code, not to micro-optimize for keystroke count. No. Nonononononono. Just no.


Oh man, I've been fighting a similar battle this month. I have to deal with a codebase which has to talk to a handful of proprietary protocols, which don't have the best interoperability between implementations. However, all the protocols look fairly similar at a macro level (general concepts, data structure layout, etc). So somehow this mutated into a terrible inheritance tree 3-9 classes deep, with absurd placement of special cases 4 classes upstream, or special case handling for a specific device in each class all the way up the inheritance stack, and worse.

So I am trying to convince the cargo cult inheritance people that patterns like strategy and a "library of useful functions" would handle a lot of this code better. But they keep sticking to cargo cult DRY. No matter how many times I explain that we need to organize around some repeated code, because new versions of "interoperable" implementations will diverge in their special-case handling and whatnot, the answer is still that we can't repeat ourselves.

/rant

So yeah... Cargo-cult DRY is just as bad for readability and maintainability as spaghetti and big balls of mud.


Sometimes I really wish there were a way to make violating the Liskov Substitution Principle throw a compiler error.
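
For what it's worth, a hypothetical Python sketch of why this is hard to catch mechanically: the override below is signature-compatible, so a type checker accepts it, even though it strengthens the precondition.

    class Feed(object):
        def fetch(self, limit):
            return []

    class CachedFeed(Feed):
        # Same signature, so this type-checks fine, but it rejects
        # inputs the base class accepts: an LSP violation.
        def fetch(self, limit):
            if limit > 100:
                raise ValueError("cache only holds 100 items")
            return []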


Learn Haskell.


/picks-up-rant-baton

Ha -- I've seen this too.

This is why I've started hating inheritance. I see people who think 3-9 class deep inheritance trees are ok, and they then go out and design new 3-9 class deep taxonomies, and become very proud of their abstractions. Of course, most times a simple function (or lambda) will do just fine to patch over the inconsistencies. By patch I mean something like middleware.


Knowing when to refactor is obviously an art, not a science. And something as trivial as this isn't really a very "real-world" example.

A lot of the time, you don't even know if two swaths of code are "coincidentally" identical (don't refactor) or identical in a "deep" way (refactor), even when the program is yours -- you just don't know how the program will evolve.

In the absence of additional information, I usually refactor only when I see three similar code paths, since by that point a project rarely goes back. Over the years, it's turned out to be a surprisingly good rule of thumb.


Same here. Doing the same thing three times is always my rule of thumb for when to take a look at refactoring, automating, etc.


Also, at two copies, the repetition probably reduces complexity compared with handling both use cases in the same code. At three, assuming the use cases haven't drifted any further apart, I think they start to simplify when refactored.


As I ponder this more, I think it's useful to consider the concepts of simplicity and complecting as articulated by Rich Hickey in his talk "Simple Made Easy". As he explains it, to complect is to braid multiple things together, whereas in a simple system, multiple things are composed. He has often pointed out that simplicity does not necessarily mean fewer things; as I understand it, it's not about how many things there are, but how they interact.

In that light, the single flexible tweet list function presented in this post is indeed problematic because it has a few things braided together: a tweet list, a profanity filter, and pagination.

So we should be suspicious of repetition, but at the same time avoid complecting.
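
A toy sketch of the distinction (hypothetical code, not from the post): the first version braids two concerns together behind flags; the second composes them from independent pieces.

    # Complected: censoring and pagination braided into one function.
    def report(lines, censor=False, page=None, size=10):
        if censor:
            lines = [l for l in lines if not l.startswith('#')]
        if page is not None:
            lines = lines[(page - 1) * size:page * size]
        return '\n'.join(lines)

    # Composed: each concern stands alone, and callers combine them.
    def censored(lines):
        return [l for l in lines if not l.startswith('#')]

    def page_of(lines, page, size=10):
        return lines[(page - 1) * size:page * size]

    # Same capability, nothing braided together:
    print('\n'.join(page_of(censored(['a', '#b', 'c']), 1)))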


One rule I try to follow is to avoid refactoring when the shared code is "coincidental." Perhaps this is another way of expressing what the author says about business logic.

I've definitely worked on projects where developers created large, unwieldy, hard-to-grok, buggy abstractions in the name of DRYing code. I'm pretty aggressive about making code DRY, but simplicity and readability are more important.

The effort I'll tolerate in pursuit of DRY also varies by language. I've been doing some Android work lately, and I'm finding that things I would have done DRY in Ruby require too much added complexity to make DRY in Java.


I think that's exactly what the author is trying to say. I've also jokingly expressed it as "you don't refactor your twins."

I'm curious (and I think it can be a useful sub-topic) what you think made Java worse for DRYing. The rigid type system? Added verbosity?


One easy example: no higher-order functions like `map`, `select`, and `inject`, because there are no lambdas, so I wind up repeating the same looping code all over the place.

I took a look at a few "functional programming in Java" libraries, but their solutions were still pretty verbose, and it didn't seem worth adding a dependency for a tiny smartphone app.


EDIT: After thinking about it some more, there are several places where two methods are almost the same except for one line stuck in the middle. I don't know how to turn them inside-out in Java without spawning a bunch of tiny classes, and it's just not worth it. But with Ruby I could turn them into a single method, throw a yield in the middle, and then call it with a block. So again it comes down to no lambdas.
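
A sketch of that "one line in the middle" pattern, in Python to match the rest of the thread (in Ruby the differing line would be a `yield`, and callers would pass a block):

    def process(items, transform):
        # Shared prologue, identical in both original methods.
        items = [i for i in items if i is not None]
        # The one line that differed, now passed in as a function.
        results = [transform(i) for i in items]
        # Shared epilogue, also identical.
        return sorted(results)

    # The two near-duplicate methods become one-line calls:
    shouted = process(['b', None, 'a'], lambda s: s.upper())
    lengths = process(['bb', None, 'a'], len)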

(Hmm, this was supposed to be an edit, but somehow I did a reply instead.)


I'd been thinking about these things over the last few years and was calling it "accidental duplication", analogously to the term "accidental complexity".


Straw man thinks DRY applies to a two-line function. Straw man is a straw man. Also, less code > DRY. In fact, less code -> DRY. If refactoring makes for more code, it's not really DRY. More like taking a principle to its illogical conclusion. Compression is a process of diminishing returns.

Although there are certainly times when factoring two lines into one is better. Like when it's self-documenting, or when those lines otherwise add noise to part of another function.

Sometimes a new function is not the right approach to avoiding repetition. If you can't write a function to adhere to DRY, use a macro or equivalent. In C/etc, macros are wonderful if used well.


I feel like people in this thread (not just you) are particularly caught up with an oversimplified example that the author even acknowledged may not have been the best example within the article itself.

This article sheds light on something I also encountered frequently when I was doing contracting, and also have to put the brakes on myself when I see I'm going down a bad road: creating more generalized code is not always better than creating code that repeats trivial pieces of functionality but accomplishes distinct tasks.

Part of the difference between "conscious competence" and "unconscious competence" is innate awareness of places where refactoring or normalization will actually create technical debt. I found myself thinking "no duh" when I read the article, but that's only because it was explaining things I was unconsciously very familiar with.

I think this article would be a great read for less experienced programmers. I think the examples may have been lacking, but it would be hard to simplify any application to a point that would make sense for illustrating this issue in a blog post, so attacking it as a "straw man" is actually a "straw man" in and of itself if you fail to account for the author's intended purpose in including the examples. lol


I don't think the article serves any purpose. It uses an obvious strawman to try and argue; this is not going to convince anyone of anything, novice programmer or no.


One of the things that people don't get about refactoring is that it is not just a matter of extracting things or removing duplication. Sometimes you merge things or re-introduce duplication to get someplace better.

When you look at refactoring examples online, they often make that mistake. There's a straight arrow toward a "better solution" but without any backtracking. It's a hobbled view of refactoring.

To bring it home: in the blog example, I think it is perfectly fine to remove duplication in the way listed as "bad", as long as you reintroduce the duplication when it starts giving you trouble. Much of the time, you're lucky and you don't have to.


I really really like this take. Refactoring is usually pitched as something that is completely orthogonal to solving the actual problem you were given. I think too many of us (clearly, I'm projecting) are wary of anyone else going on a refactoring spree, because we have seen it break apart things that were just fine separate. Often with only "warm fuzzies" being the actual gain. The progression shown in this post is really really good.


Agreed. This is -- or should be -- called "premature refactoring."


I don't even think it's always premature refactoring. Sometimes it is just stupid refactoring.

A related result of stupid refactoring is the existence of over-generalized functions that try to do so much that they need an absolute crap-ton of parameters passed in and still end up locking you down to a limited set of functionality. Adobe's ColdFusion scripting language (anyone remember that?) used to have functions that would automatically generate huge and specific pieces of client-side JavaScript functionality. Stuff to the effect of:

  cfCreateShoppingCartWithPopupSummaryWhenUserHoversOverLink({
    supportsPaypal: true,
    dontShowLinkOnCategoryPage: true,
    doShowLinkOnProducePage: false, ...},
    'myShoppingCartElement',
    ...)
Okay maybe I'm embellishing a little bit. But the end result was loading hundreds of kilobytes of proprietary JavaScript libraries to support these weird built-in functions that would create very specific bits of client-side functionality that would then need a billion parameters passed in to allow remotely useful customization. Maybe it would have been better to just learn JavaScript instead of being locked in this way.


I think the first step the author took on the refactoring path was one I wouldn't take. It breaks the "do one thing" rule and the rest of the post is the pain that naturally follows from having an over-generalized method that tries to do too much.


These articles are a dime a dozen: "this popular philosophy isn't right, because look at my poorly coded example of it."

Your DRY code is only DRY in the laziest of senses, and represents a lousy programmer cluttering the system. A really lousy implementation of any of these programming paradigms would make that paradigm look wrong.

In your example, the refactored code would look excellent if it implemented OOP and the Strategy pattern. The two different feeds can inherit their similarities from the same place and have their differences implemented in separate places. Which feed to produce can be chosen dynamically, rather than by one crappy grab-all function.
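
A rough sketch of what that might look like (class names are hypothetical; `Tweet` and `render` are from the post's Django code, which is Python 2, hence `ifilter`):

    import itertools
    from django.shortcuts import render

    class GlobalFeed(object):
        template = 'global_feed.html'

        def tweets(self, request):
            qs = Tweet.objects.all()  # Tweet is the post's model
            return itertools.ifilter(lambda t: not t.is_profane(), qs)

    class UserTimeline(object):
        template = 'user_timeline.html'
        per_page = 20

        def __init__(self, username):
            self.username = username

        def tweets(self, request):
            qs = Tweet.objects.filter(user__username=self.username)
            page = int(request.GET.get('page', 1))
            offset = (page - 1) * self.per_page
            return qs[offset:offset + self.per_page]

    def show_feed(request, feed):
        # The shared part: one renderer, any feed strategy.
        return render(request, feed.template,
                      {'tweets': feed.tweets(request)})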


Not directly related - but it's something I come across so often when mentoring newbie devs that I thought I'd mention it in passing just in case anybody has this problem.

A pattern I sometimes see with newbies who understand the value of DRY is - as soon as they get to the point when they're about to repeat something or about to copy and paste - they stop themselves and start refactoring to remove the duplication they haven't typed into existence yet. They see adding the code that will produce the duplication as bad / waste.

Don't do that.

It's hard - because the code that they've not typed or copy/pasted doesn't exist or work yet. It's still in their head.

Make the duplication explicit first.

Type it out. Copy and paste. Change those two branches so they have exactly the same structure.

When you've done that - and everything is working and all tests pass - then refactor the heck out of it.

Much simpler, faster and less error prone.


I don't know if I agree 100%. Sometimes the extra abstraction helps you solve the problem in the first place.

In my experience the OP is right. The worse problem is when you eliminate duplication in the wrong place or with the wrong abstraction, leading to brittle abstractions that break in the future.


Yeah - there are probably exceptions ;-)

However what I've seen happen on multiple occasions is somebody merrily driving along churning out code, then suddenly hitting the "ohhh - duplication is bad" wall and halting as they feel their way around the duplication and the abstraction they may need (or may not need - since they've not written the code yet).

Duplication is not a mortal sin. Having it sit there for a few hours while you work out the meat of the problem isn't going to kill any kittens.

And often the easiest way to fully grok the abstraction that you need is to make the duplication really, really obvious.


It depends. Sometimes it is obvious what refactoring will need to be done, sometimes it isn't.

I do agree that for newbie devs, your approach is a good one, but I think that as folks get more experience, shortcuts often are appropriate.


I expect most of us would agree that a single function should ideally have exactly one main job and do it well.

Two functions are really doing the same job, and should probably therefore be combined into a single function, not when their behaviour happens to be the same but when it should always be the same. As the article suggests, that determination is generally more about the software design or domain model than the mechanics of the current implementations.

Having said that, there is also a middle ground: create some sort of utility/helper function(s) to contain the code that is the same, coincidentally or otherwise, and then rewrite the two higher-level functions in terms of common helpers for now. If those higher-level functions need to diverge for good reasons later, at least it will be an active decision to separate the behaviours.

IME that sort of breakdown is unlikely to be beneficial with very short functions such as the examples here. There’s not enough commonality to justify the overheads of breaking everything up. However, in more realistic code, if you’ve got, say, 80% common operations between multiple cases, there are often some underlying concepts that can be extracted into their own functions. Those then become informatively named building blocks for the original functions.

Put another way, you might not want to consolidate the functions’ interfaces if they serve logically distinct purposes, but you can still consolidate some of their implementation details.
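
On the post's example, that middle ground might look roughly like this (helper names are hypothetical; `Tweet` and `render` are from the post, in its Python 2 style): the two views keep their distinct interfaces but share the genuinely common steps.

    import itertools
    from django.shortcuts import render

    def without_profanity(tweets):
        return itertools.ifilter(lambda t: not t.is_profane(), tweets)

    def page_of(tweets, page, per_page=20):
        offset = (page - 1) * per_page
        return tweets[offset:offset + per_page]

    def global_feed(request):
        tweets = without_profanity(Tweet.objects.all())
        return render(request, 'global_feed.html', {'tweets': tweets})

    def user_timeline(request, username):
        tweets = Tweet.objects.filter(user__username=username)
        tweets = page_of(tweets, int(request.GET.get('page', 1)))
        return render(request, 'user_timeline.html', {'tweets': tweets})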


I see a lot of comments here of the type "You have to know when to refactor". I don't do it this way. Instead, I rely on a willingness to undo a refactoring when I see that something else might work better -- and even to undo that when I decide that I've got that wrong.

I have no problem extracting as in "WHEN REFACTORING GOES BAD" -- although I might wait for a third copy before removing the duplication, because I want to see whether a useful abstraction would emerge. On the other hand, as soon as I recognise that one of those copies wants to change in a way that the other does not, I'd simply inline the method and let them diverge. I don't consider this a problem.

It seems as though some programmers believe that, once they extract something, it needs to remain extracted. No. It's only "cargo cult refactoring" if you stop thinking.

Most importantly, refactoring is experimentation. It's a kind of Mechanical Turk-based genetic programming-oriented style of designing, except that you have heuristics you can follow. That means that you'll go down the wrong path. THAT'S OK! as long as you allow yourself to backtrack. Remember: refactorings are small, reversible design changes. That means not just that one can undo them, but that one is willing to undo them.


OK, I'm obviously missing something, and part of the problem is that I'm not a Python programmer, so my brain is in "skim-mode".

Couldn't the problematic DRY pattern be alleviated by refactoring the following code:

    filter_profanity = kwargs.pop('filter_profanity')
    tweets = Tweet.objects.filter(**kwargs)
    if filter_profanity:
        tweets = itertools.ifilter(lambda t: not t.is_profane(), tweets)
    return render(request, template, {'tweets': tweets})

Into something like:

    def tweet_list(request, **kwargs):
        ...

        tweets = get_filtered_tweets(kwargs)
        ...

    def get_filtered_tweets(kwargs):
        filter_profanity = kwargs.pop('filter_profanity')
        tweets = Tweet.objects.filter(**kwargs)
        if filter_profanity:
            tweets = itertools.ifilter(lambda t: not t.is_profane(), tweets)
        return tweets

Why does the logic for the Tweet filtering have to be encapsulated in the rendering function?

// edit:

What might help is if the OP showed how the non-refactored code would look with the profanity_filter and pagination features. I agree that his refactored proposal is confusing...I'm just having a hard time imagining how the non-refactored version would be less so.


That still seems complicated to me. You're still passing around an object representing a collection of arguments/options, and there are functions whose outputs vary based on this object's state—it's not immediately inferable from the call to `get_filtered_tweets` whether the result will be filtered for profanity or not. So you might as well have that bit of code right next to the code which uses its result, rather than splitting it up into two separate (but still complex) parts.

EDIT: I added a bit here on how I would do it better without needless refactoring: http://localhost:3000/2013/02/08/copypasta/#a-better-solutio...


I believe they intended the 'proper' code to look like:

    def global_feed(request):
        tweets = Tweet.objects.all()
        tweets = itertools.ifilter(lambda t: not t.is_profane(), tweets)
        return render(request, 'global_feed.html', {'tweets': tweets})

    tweets_per_page = 20
    def user_timeline(request, username):
        tweets = Tweet.objects.filter(user__username=username)
        page = int(request.GET.get('page', 1))  # GET params arrive as strings
        offset = (page - 1) * tweets_per_page
        tweets = tweets[offset:offset + tweets_per_page]
        return render(request, 'user_timeline.html', {'tweets': tweets})


So the problem with that seems to be: what if there were another view that required pagination?

So the two unpleasant scenarios seem to be this:

1) The OP's assertion that DRYing the code may unintentionally break functionality in all the places that use it.

2) The DRY assertion: copy-pasting functionality, such as pagination, makes it more likely that the pagination functionality won't be properly updated across all the modules that use it.

I guess it's a case of YMMV...because in this hypothetical app, it doesn't seem likely that the number of views will multiply, which keeps the copy-pasted code easy to update. But that seems like a mindset as prone to future problems as one that is more DRY-minded.


My only real concern with this essay is that the OP bothered to refactor out the duplication, but didn't bother to refactor his internal refactoring when it got too complicated. Instead he claimed "look, now it got messy", threw up his hands, and said there's nothing more to be done, blaming DRY as the culprit.

Except we CAN do something about it.

It would have been just as easy to continue refactoring the tweet_list() method to pull filtering, pagination, and profanity checking out into sub-methods -- at which point you've built a strong reusable component that can support many more combinations of those extra requirements. So by the time you get more feedback saying, "we need a new page that only shows 5 tweets per page and hides profanity, but does not filter", you can now easily take that reusable component, pass in those options, and be done, rather than starting from the top because you refused to clean up your internals. That's why we strive for reusable components in the first place.

In other words, if the argument is that refactored code is messy, it really means you aren't done refactoring.
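
For instance, continuing the refactoring might look roughly like this (helper names are hypothetical; `Tweet` and `render` are from the post): `tweet_list` survives, but each concern becomes its own small function, so the new "5 per page, hide profanity" page is just another call.

    import itertools
    from django.shortcuts import render

    def _without_profanity(tweets):
        return itertools.ifilter(lambda t: not t.is_profane(), tweets)

    def _page_of(tweets, page, per_page):
        offset = (page - 1) * per_page
        return itertools.islice(tweets, offset, offset + per_page)

    def tweet_list(request, template, page=None, per_page=20,
                   filter_profanity=False, **lookups):
        tweets = Tweet.objects.filter(**lookups)  # no lookups == all
        if filter_profanity:
            tweets = _without_profanity(tweets)
        if page is not None:
            tweets = _page_of(tweets, page, per_page)
        return render(request, template, {'tweets': tweets})

    # The new requirement becomes one call instead of a rewrite:
    # tweet_list(request, 'global_feed.html', page=1, per_page=5,
    #            filter_profanity=True)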


Refactoring / DRY to me is not about creating monolithic, generic do-anything functions. It's about decomposing code into layers of abstraction, somewhat like mini-"DSL"s: the top-level functions tie together next-level-down helper functions, which may themselves be higher-level tools over something like a DB API. More than 3-4 layers is probably a smell.
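
A tiny hypothetical sketch of that layering (`db.query` is a stand-in for the real DB API):

    # Layer 3: the top-level function reads like a mini-DSL.
    def monthly_report(db, month):
        return format_rows(orders_in(db, month))

    # Layer 2: domain-level helpers.
    def orders_in(db, month):
        return fetch(db, 'orders', month=month)

    def format_rows(rows):
        return '\n'.join(str(r) for r in rows)

    # Layer 1: a thin tool over something like the DB API.
    def fetch(db, table, **filters):
        return db.query(table, filters)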


In defense of the single flexible function, I think the hypothetical business requirements are pathological. Or perhaps the hypothetical developer is taking a pathologically literal interpretation of them. Who would want pagination in one view but not another? As for the profanity filter, that should probably be a preference of the currently logged-in user which is applied to all feeds which that user views. (It should probably be enabled when an anonymous user is viewing any feed.)

I suppose some developers don't have the freedom to suggest alternative behavior that is nicer to implement. In some cases I have not had that freedom. But in this hypothetical case, when pressed, the person setting the requirements ought to value consistency.

My own experience has been that I tend to do copy-and-paste because it's easier, but then regret it later. I don't think I've yet erred too far on the side of trying to follow the DRY principle.


I did a self-study on a project that lasted several months. I wrote down everything, including mis-typed characters, grammar errors, syntax errors, and semantic errors. I did a root cause analysis of the result. One general result is that I have a 3% error rate, regardless of activity. In fact I find that the delete key is by far the most important key on the keyboard, representing 3% of all of the characters I type.

One observation is that fully half of ALL the programming errors I made were due to copy/paste. Your mileage may vary, but I doubt it.

Copy/paste is evil but it is so "low level", like the delete key, that you probably don't even think about it.

You may find it worthwhile to do a deep analysis of your personal error rate on some project. It is very enlightening. In fact, we ought to fund studies so we can get industry wide statistics.


The thing is:

"If you are using copy and paste while coding you are probably committing a design error" doesn't conflict at all with what he says. The fact is that copy and paste is the point when one looks and says "is refactoring appropriate here?"

One thing I would point out is that premature optimization is the root of all evil. You can get a pretty good sense that if your refactor adds more lines than it deletes while functionality remains the same, you have added complexity, which very likely means you are doing it wrong. This is particularly true if you can't say it is reducing the number of lines of code generally, or compartmentalizing state changes.

(This leaves aside the fact that the most pernicious use of copy and paste in the world is "sample code.")


If your code is starting to look ugly, refactor it. If your refactoring is looking ugly, stop, you're doing it wrong. If a test breaks because of your refactoring, stop, you're doing it wrong. I call this the Don't be Stupid principal.


principle


Lol. Do I look less stupid if I say it's my principal principle?


Isn't this what functional programming is for?

You have several pieces of code that follow a very similar structure and logic but perform very different purposes for the program. So you try and generalise the structure of the code?


Refactor if doing so will give you an advantage.

Copy paste when you aren't sure if the requirements will change. Nothing is worse than building an abstraction only to find out that a new project requirement makes it useless and that the two cases should really be separate.


I don't remember where I read it, but it's good advice:

Copy the first time, only start refactoring if you need the code a third time.



