The Wrong Abstraction (2016) (sandimetz.com)
654 points by mkchoi212 on July 5, 2020 | 240 comments



I feel some people here are misunderstanding the blog post.

Sandi Metz IMHO doesn't claim that the problem occurs at step 2 or 3. She doesn't claim that it's wrong to introduce abstraction when there is duplication.

What she is saying instead is that the problem occurs from step 6 onwards: when you find yourself wanting to reuse an abstraction that, regardless of whether it made sense in the first place or not, has outlived its usefulness.

I think this is in agreement with other points that she often makes, about being bold, but methodical about refactorings.

The whole discussion about "you should never abstract away code before you see the third duplication" has little to do with the article, and I'm also really not sure it's good advice.


> What she is saying instead is that the problem occurs from step 6 onwards: when you find yourself wanting to reuse an abstraction that, regardless of whether it made sense in the first place or not, has outlived its usefulness.

You're 100% correct in this. And what's even more amazing to me is that even after you explicitly called this out, the majority of people replying to you (who have presumably read the article) still think the problem is between steps 2 and 3.

The argument she is making is not "don't make abstractions until you're 100% certain they are correct". She is essentially saying: make abstractions where appropriate. Some of these abstractions will be wrong. When you catch yourself falling into certain behaviors (like piling on parameters and conditionals), it's probably because it's the wrong abstraction, so back it out and refactor.

Ultimately, that abstraction seemed right based on the information known at the time it was created. Now that you know more, don't cling to it just because it was already made. Be OK with backing it out and refactoring.


If you see that an abstraction does not fit, you can consider it either incomplete or unsuitable. If incomplete, you can fix it (assuming write access). If unsuitable, you should "back it out", as you say.

In my opinion this distinction is actionable and thus useful, in contrast to whining about leaky abstractions: http://beza1e1.tuxen.de/leaky_abstractions.html


It seems to me that a straightforward fixing of an incomplete abstraction is exactly what Sandi Metz warns against (i.e. steps 5+6). The abstraction is "almost perfect", so it should not fall into the "unsuitable" category.

It just so happens that complecting several slightly different uses in one abstraction comes at a significant cost. Backing out (inlining the abstraction, eliminating unused code) is a simple recipe to let you see the true amount of overlap, which may or may not itself be a suitable candidate for a smaller abstraction.


I rather see Sandi's post as a criterion for when an abstraction should be considered unsuitable: when you are using only a small fraction of it because of conditionals.


I think that's probably correct in describing where you end up, but not any particular step along the way. It's one of those "the road to hell is paved with good intentions" situations.

When you first modify the abstraction, it's nearly perfect. Just one tiny conditional and it's a perfectly suitable abstraction again. The problem is, when this process repeats itself, you slowly get to the point where any one client of the abstraction is only using a small fraction of it, but there was never a singular point where someone made a decision to use the abstraction when it was anything less than "almost perfect".
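
A rough sketch of that drift (TypeScript, invented names; each step looked like "just one tiny conditional" at the time):

    // Originally just: function renderTicket(ticket: Ticket): string
    interface Ticket {
      id: string;
      legacyId: string;
      title: string;
      owner?: string;
    }

    function renderTicket(
      ticket: Ticket,
      forEmail = false,      // caller 2 wanted plain-text output
      includeOwner = false,  // caller 3 wanted the assignee shown
      legacyFormat = false,  // caller 4 predates the new ID scheme
    ): string {
      const id = legacyFormat ? ticket.legacyId : ticket.id;
      let line = forEmail ? `${id} - ${ticket.title}` : `${id}: ${ticket.title}`;
      if (includeOwner && ticket.owner !== undefined) {
        line += ` (${ticket.owner})`;
      }
      return line;
    }

Each caller only ever cares about its own flag, which is part of why nobody notices the abstraction degrading.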


yep.

Causes of this I've seen:

- sunk cost fallacy

- not spending enough time to understand the code's behaviour (and its side effects)

- code being written in a way that makes its behaviour and side effects hard to understand

- trying to "win the PR game" by avoiding refactoring

- giving up on refactoring because thinking is too hard


Sandi Metz is not the only person with opinions on abstraction, nor is the article a definitive opus on abstraction, so it's not surprising people also bring in other ideas.


I think it's fair to say that abstractions should have to prove themselves as a necessity and that we make things abstract way too early. Most really good abstractions in an app fall out of well-written code to solve a specific problem.

In day to day life as an engineer, I find that we have very few _enduring_ abstractions - there are very deep ones, like the concept of streams, things like filesystems and related ideas, the concept of a virtual machine in the process sense, and so on - and a lot of faddish abstractions that have a pretty wide blast radius when they start to go wrong. A lot of the good ones (networking has a _lot_ of these, such as the abstractions above and below the model of an interface in professionally-written network device code) are focused on layering.


I disagree, and this is one of the things where it's really hard to get to a shared understanding because I don't know what kind of problems you've worked on and what kind of code you've seen and vice versa.

But in my daily work, I routinely see abstractions just come up very naturally all the time. Sometimes they turn out to be slightly wrong, but often also not. I need to perform some calculation (e.g. for billing)? That can be abstracted away. I need to parse some unstructured user request into something structured? That's an abstraction. And so on. A lot of these things are clear to me even before I start writing code.

I also tend to use (at least some amount of) DDD, to write small, composable functions with few side effects and to be as declarative as possible. All of this might help with coming up with lasting abstractions.

But I'm not denying that I run into lots of situations in my daily work where it turns out some abstraction was wrong. Just that I find many more of them actually turn out to be correct, or at least correct for the most part (it might be that something needs to receive an additional parameter or to return a slightly different structure to account for error conditions, but it's still basically the same abstraction).


Does it make sense to characterise abstractions as “right” and “wrong” in the first place? This feels too absolute to me.

Abstraction is just hiding some complexity in implementation details behind a simpler interface. It offers benefits from reducing the need to deal with the full complexity everywhere else. It also has costs. The interface establishes a new concept, albeit a simpler one, that must also be understood and maintained wherever client code uses the abstraction. Moreover, if you need to understand or modify the detailed implementation later, there is now a barrier to doing so.

When we define an abstraction, hopefully we do so because the benefits outweigh the costs at that time. The simpler the interface relative to the complexity of the implementation it hides, the more likely this is to be true. However, that balance is inevitably subject to change as a program evolves and the relevance of the hidden details to different parts of the system changes.

So it feels like abstractions might be better characterised by whether they represent good value under the current circumstances. It is perfectly reasonable for an abstraction to be cost-effective at the time it is added, but to become more or less so as the context evolves. If it reaches a point where it is no longer cost-effective, it should be removed. Either the relevant parts of its implementation can then be inlined at each place that previously used it, or some new abstraction(s) can be defined that better reflect the relevance of different implementation details at that time.


I sometimes review PRs where people are encouraged (by other reviewers) to create new methods because there's a single line of code duplicated between two other methods. I don't think that rate of abstraction construction - and every method is another abstraction - is helpful for the health of the code, or its readability.


If that line of code deals with a particular piece of business logic that should be consistent within the application, it's a good thing.


I can't judge that without more context. If this is just accidental duplication, it's pointless to abstract it away. But if it's a line of code that is necessary to deal with some gotcha of a particular library etc., it's probably good to extract and add an explanatory comment.


“Accidental duplication” well said. Not enough people even consider this before deciding that they must remove the duplicate code. Will each duplicate section change for the same reason? If not then it’s not really duplicate code, it just looks the same.


Why are your two examples in the second paragraph necessarily "abstractions"?


Because "calculate_vat" is more abstract than the exact sequence of calculations performed?


Are we talking about that kind of general function labelling abstraction?

I was under the impression this was about generalizing the abstraction to something reusable.

I.e. calculate_vat will always be used by the same thing and there will only ever be one of them.

The generalization would be:

    interface SalesTaxCalculator {
        calculate_sales_tax(price: number): number;
    }

This kind of abstraction is more uncommon.

What I'm guessing the article is struggling with is that making abstractions like the sales tax calculator is usually kind of useless: there are only 1-3 types of sales tax calculation, and it's best defined inline next to the price calculation or put in a function like calculate_vat.

A lot of developers use abstractions to remove code duplication.

It's more fun working in this kind of code than in code where devs don't try to remove duplication, but I just wish that instead of seeing duplication as evil, they saw it as a sign that their design is wrong.

If you're seeing duplication, there is an abstraction there; whether it is worth implementing usually has more to do with whether it's just hiding code duplication or creating a new concept that hides implementation details.
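
To make the contrast concrete, a hedged sketch (TypeScript, invented names and rates): the plain function next to the price code is usually enough, and the interface generalization only pays off once there are genuinely several tax schemes behind it.

    // Usually sufficient: a named function living next to the price calculation.
    function calculate_vat(netPrice: number, rate: number = 0.21): number {
      return netPrice * rate;
    }

    // The heavier generalization (mirroring the interface sketched above).
    interface SalesTaxCalculator {
      calculate_sales_tax(netPrice: number): number;
    }

    const euVat: SalesTaxCalculator = {
      calculate_sales_tax: (netPrice) => calculate_vat(netPrice, 0.21),
    };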


Yes, if you restrict your definition of abstraction to "polymorphism", then it becomes a different discussion (Go developers would like to have a word).

But I think there are lots of abstractions often missing from code that I have seen that don't need any sort of polymorphism. A common example is performing input validation all over the place instead of at the boundary with some sort of form object that only accepts valid params.
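
As a minimal sketch of what such a boundary object can look like (TypeScript, invented fields; assumes throwing on invalid input is acceptable):

    // Raw, untrusted input as it arrives at the boundary.
    interface RawSignupParams {
      email?: unknown;
      age?: unknown;
    }

    // A form object that can only ever hold valid data.
    class SignupForm {
      private constructor(readonly email: string, readonly age: number) {}

      static parse(raw: RawSignupParams): SignupForm {
        if (typeof raw.email !== "string" || !raw.email.includes("@")) {
          throw new Error("invalid email");
        }
        if (typeof raw.age !== "number" || raw.age < 18) {
          throw new Error("invalid age");
        }
        return new SignupForm(raw.email, raw.age);
      }
    }

    // Everything past the boundary works with SignupForm and never re-validates.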


I'm not restricting the definition to that at all.

Input validation, and anything that smells like a framework or library, make great candidates for folding away behind an abstraction.

The question is all about cost. Does a single layer abstraction, at the boundaries, with clear inputs and outputs have much of a cost?

I don't think so.

Does forcing VAT calculation into a new class far away from the widget price class have a cost? I think so: the cost is having the same concern separated for no material gain, other than perhaps reducing one form of duplication at the cost of creating another.


You may not think that input validation at the boundary has a cost, but the fact is that the "typical" Ruby on Rails application is happy to just pass controller params along to the model layer unchanged and unvalidated (there is a security layer that allows you to whitelist which parameters you want to forward, but nothing more than that). Validation is supposed to happen in the model layer. I think this is horrible design, but it's unfortunately a strong convention and breaking conventions also has associated costs. By contrast, if you write your application in e.g. Spring Boot you typically have validation at the controller level built in, so yes, in that case, it's easy to do the right thing.

I think you're right that it may not be necessary to extract "calculate_vat" into another class. It just really depends: if the VAT code is so complex that it should be properly unit tested, I think it deserves its own class. If it's fairly straightforward and only used inside of a single other price calculation class, then yeah, it can remain a private method or so.

But I don't think this is a question of whether to abstract or not, rather about what the right level of cohesion and coupling is for a given situation.


I also (want to) understand the article this way, but I think the article is a bit vague about this point. It focuses on Programmer X, who is coming after many iterations of trying to retrofit features into a very wrong abstraction.

The article agrees that the abstraction was useful when Programmer A wrote it, but it remains vague about whether Programmer A's judgement was correct at step 2 or 3!

I'm not quite sure what Sandi Metz is thinking right now, but the original 2014 talk[1] linked in the article is pretty clear about _keeping the duplication_, and _waiting for the right abstraction_. This statement sounds more in line with the Rule of Three, or the Go proverb 'a little copying is better than a little dependency', which is what people are arguing here.

Personally, I believe that any non-trivial duplication should be eliminated at the very first chance, to avoid unintended business logic divergence. I find the latter case riskier than having to deal with the wrong abstraction, since - as Sandi eloquently explains here - wrong abstractions can (and should!) be refactored.

[1] https://youtu.be/8bZh5LMaSmE?t=893


> but it remains vague about whether Programmer A's judgement was correct at step 2 or 3!

Which emphasizes the OP’s point that the article isn’t about what happens at steps 2 or 3 at all. So it’s not concerned with whether the judgment there was right or wrong.


Not to take this on a huge tangent, but I really _do_ think it's good advice. Unrolling complicated abstractions is a lot of work. Keeping two pieces of nearly identical code in sync is work too, but I've never found it all that onerous. But there's obviously a continuum; on one side it's obvious that it's a shared concept, and on the other, code just happens to be similar almost by chance, and not for much longer. But lately duplication has been turned into a code smell to be linted out, causing a lot of people to get rid of all of it, at all cost.


If you only have 2 instances but are spending effort keeping them both in sync for changes then that might be a good time to abstract anyway. The fact that you’re keeping them in sync means they aren’t just coincidentally the same. But this is very situational.


The cost of maintaining two (or three ...) copies might well be less than creating an abstraction. The danger rather lies in situations where not everyone potentially modifying that code (now or later) is aware that there are N copies (and where they are) which need to be maintained.


To be fair, Sandi suggests adding dup tags (essentially comments with links to other duplicates, I guess) in the linked video.

I can see how this could work in some cases, but I remain skeptical about this technique. I have a strong gut feeling that over 20% of the junior engineers touching this code will not understand the dup tag and will end up modifying one case without keeping the other case in sync. This number is already high enough to be worrisome for me.


That's a big part of it for me. The abstraction would end up being the best way to document the duplication, imo. Far better than the likes of /* See also... */.

It depends of course, but I personally feel the work of 'simplifying an abstraction', is easier than the problem of 'tracking down anything that might need to be edited'.


Then such a person needs a better IDE, because finding callers is one shortcut away.


That's an amazingly mechanical view of code.

If two blocks of code refer to exactly the same thing (event, process, object, rule) in the application domain, then the duplication can be eliminated.

You can't eliminate duplication without asking, "what does this code mean in this context?".


I think there are a couple of things at play here.

One is the use of code quality tools like CodeClimate. It's true that those can sometimes be extremely aggressive when it comes to duplicate code, to the point that I find their complaints to be uselessly beside the point. This is especially true if you have typical "structural" duplication like "many controllers start with the same sequence of steps", and it's even worse when you have to use configuration DSLs, etc.

OTOH, it has been my personal experience that many people, if they use CodeClimate etc., routinely just ignore it for the most part, so I'm not always sure what the point of these tools is. But maybe other people have different experiences, and some people really are routinely over-abstracting the most coincidental of duplication, in which case I agree that that is not a very useful thing to do.

As for the advice itself: it is definitely problematic if it is used as some sort of hard "rule". If it is taken as a heuristic / "rule of thumb", then it might be ok as long as you make sure people don't overemphasise it where other/better rules of thumb would be appropriate.

For example, if I were to write some billing code and somebody else just duplicated that code somewhere instead of using a shared abstraction, I would probably find that to be a serious code health issue as you really shouldn't perform billing calculations in two separate places: this is something that needs to be kept in sync across the code base; one sibling (nephew?) comment is right in pointing out that here you have to consider the cost of things that need to stay in sync accidentally going out of sync.

There are many more examples which is why I think that if you use "refactor on 3" as _one_ heuristic, it's fine, but if it's the sole one, then less so.


> you should never abstract away code before you see the third duplication" has little to do with the article, and I'm also really not sure it's good advice

Absolutes like that are rarely good advice.


Abstractions have purposes other than deduplication. They make it easier to reason about your code as well. It might sometimes be the smartest thing to abstract away the first occurrence in a method, imho.


Sure, but some comments on here are literally saying that. Not as a rule of thumb (although such comments can be found here too, which is ok), but as a "as an engineer I always enforce this rule" thing.


I think you generally shouldn't create an abstraction until you have at least three uses for it.

That's speaking very generally. You might want to create abstractions before then, but be prepared that they will be wrong, and don't invest in e.g. lots of unit tests, because when you break the abstraction you'll throw away that work. Some unit tests, yes, but invest more in semi-integration tests that verify the stack sandwiching the under-proven abstraction.


We humans just can't help ourselves, but to invent mental shortcuts. Making a judgment "is this really a good abstraction or am I just mindlessly deduplicating code" is context-dependent, nuanced and requires some mental effort - much more work than "do I have it repeating 2 or 3 times already" which is mindless and mechanical.


You are correct that the problem lies at step 6. However, a problem only exists if programmer B decides that it is acceptable to keep adding conditional logic to the method. Wrong. This is just a case of programmer B not knowing how to refactor properly. There really is nothing else to it. Yes, remove duplication. Then if, later on, that shared code requires conditional logic, refactor to replace the conditional logic with polymorphism. Both of these steps are clearly described in Martin Fowler's Refactoring book and just need to be applied when the time comes.
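
For anyone who hasn't read that chapter, a small hedged sketch of "replace conditional with polymorphism" (TypeScript, invented domain):

    // Before: the shared method keeps accumulating per-case conditionals.
    function shippingCostBefore(order: { kind: string; weightKg: number }): number {
      if (order.kind === "digital") return 0;
      if (order.kind === "oversize") return 50 + order.weightKg * 2;
      return 5 + order.weightKg;
    }

    // After: each variant owns its behaviour behind a common interface.
    interface Shippable {
      shippingCost(): number;
    }

    class DigitalOrder implements Shippable {
      shippingCost(): number { return 0; }
    }

    class OversizeOrder implements Shippable {
      constructor(private weightKg: number) {}
      shippingCost(): number { return 50 + this.weightKg * 2; }
    }

    class StandardOrder implements Shippable {
      constructor(private weightKg: number) {}
      shippingCost(): number { return 5 + this.weightKg; }
    }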

I think the article has the premise that once you remove duplication you have committed to using that abstraction and need to shoehorn all future changes into that function. That’s a ridiculous notion. You can simply refactor it again if the abstraction no longer makes sense.


> I think the article has the premise that once you remove duplication you have committed to using that abstraction and need to shoehorn all future changes into that function. That’s a ridiculous notion. You can simply refactor it again if the abstraction no longer makes sense.

I don’t understand how you read this. The premise is exactly the opposite...


I'm not sure it matters about applying "prefer duplication over the wrong abstraction" at step 3 or step 6 nearly as much as applying that advice at some point.

I often consider "is this abstraction going to be prone to misuse" (regardless of whether it's the second, third, fourth... copy) and try to head it off with strict typing, comments, or internal visibility - to try to do step 3 without opening as big a door to step 6. But the important thing is less when to do stuff like this than just trying to avoid things reaching step 7.


> Sandi Metz IMHO doesn't claim that the problem occurs at step 2 or 3.

But the headline does.

I had to read quite a long way down the page to discover that all she is advocating is what I have always done: deduplicate when practical, undo the deduplication when new requirements make it incorrect, and push the unique parts back into the callers.


> But the headline does.

That’s not really fair, it repeatedly says “wrong abstraction”, in the title and in the article. At steps 2 and 3 it is still the right abstraction, duplication only becomes better when it is the wrong abstraction.


Sandi is a he.


I'm pretty sure that Sandi has identified as female in the past, but in case this has changed or I'm simply wrong, would you mind pointing me to a reference?


That’s not what I get from the article. The problem does indeed occurs at step 2 and 3: leave duplication alone and don’t introduce abstraction if you are not sure about future requirements.


Taken to its logical conclusion, doesn’t that argument mean we would almost never introduce any abstractions at all? That doesn’t seem very practical compared to the alternative of introducing abstractions if they are useful at the time but remaining willing to change or remove them again later if the situation changes.


Yes, my comment was poorly worded; it's missing something like "unless you have a strong case for it". Also, there was a title change on HN: previously it was "Duplication is far cheaper than the wrong abstraction", and she also says "prefer duplication over the wrong abstraction" at the beginning, so with that emphasis I might have misinterpreted the rest of the point.


You can’t plan for what you don’t know.

This is why I like the "Rule of three"[1]. Only once you've done it three times will you truly begin to understand what the abstraction might need to look like.

1. https://wade.be/2019/12/10/rule-of-three.html


The rule of three helped me get over my initial abstraction issues, but I leaned much more towards a rule of 5 or 6. Around three you finally find an abstraction, but around six uses there is a good chance it breaks down. Making an abstraction saves you from having to make the same change to the code you copied multiple times. But the cost of repeating yourself is so low. With good keyboard mechanics, repeating a change in four or five places takes just a bit longer than doing it once, since most of the upfront cost is in deciding on the correct change. It does feel a bit like drudgery, but it's also very freeing to not think about abstractions and just make progress at all costs. This strategy can bite you if you don't take the time to look back and refactor later, but I find the approach of churning out code, letting the patterns emerge, and then restructuring with hindsight much more fruitful than pausing frequently to think about abstractions. They are really two different mindsets and best left for different sessions of work.


Any advice on teaching this to junior engineers? Seems like folks with 3-5 years of experience keep trying to not only over-abstract but also keep re-inventing the wheel with abstractions (vs looking for existing libraries).


It's largely because they're dealing with an area with no theoretical tools. Any time you hit an area that is full of people "Designing" solutions/abstractions rather than "Calculating" an optimal solution/abstraction, you know you've hit an area where there's very little theoretical knowledge, and most people are just sort of wandering chaotically in circles trying to find an "optimal" solution/abstraction without even a formal definition of what "optimal" is. I mean, what is the exact definition of the "perfect abstraction"? What is bad about duplication, what is a bad over-abstraction, and what is this "cheaper" cost that the title is talking about? It's all a bunch of words with fuzzy meanings injected with people's biased opinions.

That being said, theories of abstraction do exist. If you learn them you'll be at the top of your game, but they're really, really hard to master. If you do master them, you'll be part of a select group of unrecognized elites in a world of programmers who largely turn to "design" while eschewing theory.

Here are two resources to get you started:

The Algebra of Programming: https://themattchan.com/docs/algprog.pdf

Program Design by Calculation: http://www4.di.uminho.pt/~jno/ps/pdbc.pdf

You will note that both of these resources talk about functional programming at its core which should indicate to you that the path to the most optimal abstraction lies with the functional style.


My favorite example of really bad abstraction is add/edit crammed into a single popup/modal. You know edit is basically a copy-paste of add, so "ding ding ding, here goes DRY!" in a junior mind. But quickly enough it turns out that some properties can be set in add, whereas in edit they have to be read-only. Quite often you also get other business rules that can be applied only on edit or that make sense only when adding a new entity. But when you create the first version, they look a lot like the same code that should be reused.

For me this is a really good example of how similar-looking code is not the same code, because it has a different use case.


> But quickly enough it shows up that some properties can be set in add, whereas in edit they have to be read only.

So? Just put in some conditionals.

What is the alternative? Duplicate most of the code with minor, non-explicit differences? What's the benefit? You just moved complexity around, you didn't get rid of it.

The drawback is that now anything you have to add, you have to add and maintain it in two places. And since your "add" and "edit" are probably 90% the same, it's going to happen 90% of the time. It's very annoying during development and you're likely to fuck it up at some point.
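
A rough sketch of the two options being argued about (TypeScript, invented fields):

    interface ProductForm {
      sku: string; // settable on add, read-only on edit
      name: string;
      price: number;
    }

    // Option A: one handler with a couple of explicit conditionals.
    function saveProduct(form: ProductForm, mode: "add" | "edit", existing?: ProductForm): ProductForm {
      const sku = mode === "edit" && existing ? existing.sku : form.sku;
      return { sku, name: form.name, price: form.price };
    }

    // Option B: two handlers that start out ~90% identical and must be kept in sync.
    function addProduct(form: ProductForm): ProductForm {
      return { sku: form.sku, name: form.name, price: form.price };
    }

    function editProduct(form: ProductForm, existing: ProductForm): ProductForm {
      return { sku: existing.sku, name: form.name, price: form.price };
    }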


This is a good example of how this overall topic gets reduced to "How much abstraction?" instead of "In what ways should something be abstracted?"

Obviously an Add/Edit field are operating on the same record in a hypothetical database, so it makes little sense to duplicate the model.

On the other hand, if the conditionals within the abstracted version become too complex or keep referencing some notion of a mode of operation (like, ` if type(self) == EditType && last_name != null` lines of thinking), that is sometimes another type of smell.

But say you make some kind of abstract base class that validates all fields in memory before committing to the database, and then place all of your checking logic in a validate() method. That sounds like pretty clean abstractions to me.

And moreover, this is probably provided by an ORM system and documented by that system anyway--so that's a publicly documented and likely very common abstraction that you see even between different ORMs. That, I think, is the very best kind of abstraction, at least assuming you are already working in such an environment as a high-level language and ORM. Making raw SQL queries from C programs still involves its own levels of abstraction, of course, without buying wholesale into the many-layered abstraction that is a web framework or something.

This question becomes more important when you aren't just updating a database though. If you're writing some novel method with a very detailed algorithm, over abstraction through OOP can really obscure the algorithm. In such a case, I try to identify logical tangents within the algorithm, and prune/abstract them away into some property or function call, but retain a single function for the main algorithm itself.

The main algorithm gets its definition moved to the base class, and the logical tangents get some kind of stub/virtual method thingy in the base class so that they have to be defined by subclasses. The more nested tangents are frequently where detailed differences between use cases emerge, which makes logical sense. It's not just that it's abstract, but the logic is categorically separated.
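
A small sketch of that shape (TypeScript, hypothetical names): the main algorithm reads top to bottom in the base class, and only the "tangents" vary per subclass.

    abstract class ReportExporter {
      // The main algorithm stays in one readable place.
      render(rows: string[][]): string {
        const kept = rows.filter((row) => this.includeRow(row));
        const body = kept.map((row) => this.formatRow(row)).join("\n");
        return this.header() + "\n" + body;
      }

      // The logical tangents are stubs each use case must fill in.
      protected abstract header(): string;
      protected abstract includeRow(row: string[]): boolean;
      protected abstract formatRow(row: string[]): string;
    }

    class CsvExporter extends ReportExporter {
      protected header(): string { return "id,name"; }
      protected includeRow(row: string[]): boolean { return row.length === 2; }
      protected formatRow(row: string[]): string { return row.join(","); }
    }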

It's a very general pattern supported by many languages, so you see it all over the place. That organization and consistency in itself helps you to understand new code. In that way, it also becomes a kind of "idiom" which in a sense is one more layer of abstraction, helping you to manage complexity.

As a counterexample, you see code where `a + x * y - b` becomes self.minus(self.xy_add(a), b). More abstract, but not more logical; not categorically separating; not conforming to common idioms; obscuring the algorithm; and so on...

And then there is performance! Let's not talk about the performance of runtime abstractions.


I mean, aren’t we just bikeshedding inheritance at this point?


Each to his own. If I found that a junior had created two separate popups, one for add and one for edit, I'd want to look into the code with them to understand whether that was a good choice, because usually it wouldn't be for anything with more than one or two properties.


I just had a case of this last week in a web-app I’m writing.

In the frontend code I decided to use an abstraction and parametrization; in the backend code I kept the logic separated.

It really depends on context. Specifically on the layer you are operating on.


I think there are two parts to it. First, you want to push them to get into the habit of solving problems by expressing the question clearly enough that the answer falls naturally from it. That's so fundamental that every aspect of engineering benefits from it, but it's particularly important as a first step in writing code.

The second part is building the intuition for the abstractions themselves. That's tricky, as they have to teach themselves. They need to build coherency in their internal mental language of abstraction, and the only way to do that is to work directly on real code and work through the consequences of doing it one way vs. another.

That means you have to let them commit code you don't like. By all means, explain what your concerns are, but then let them see how it evolves and as it becomes more untenable, that's when you go back to rethinking it and trying to state the problem clearly.

Likewise, when they do it well, you can highlight that, especially drawing attention to changes to their code that worked nicely.


Bring the idea that abstraction has a cost, like technical debt. It’s not something to be proud of, on the contrary, it must be justified and serve a true purpose and not be only an intellectual satisfaction.


Teach them about cyclomatic complexity and then review their work in these terms. It gives them something concrete to target rather than trying to accomplish some ethereal notion of "proper abstraction".


I dislike any programming rule which includes a number.

The issue is whether sections of similar code implement the same idea or just happen to be accidentally similar. The number of instances does not really matter. If you have 100 lines of code which are almost the same in two places in the program, then you should unify sooner rather than later, before they are allowed to diverge.


Rules are great because they can be broken, if you know when to do so.


Exactly. With experience, you learn not to abstract too soon.


Seems so counterintuitive, but it’s one of the most important lessons I’ve learned in 15 years of development experience.


Every Line Of Business codebase I've worked on has been the worst "there I fixed it" copypasta spaghetti, and has never made it to the point where "maybe we shouldn't add a parameter to this existing, cleanly abstracted method to handle this new similar-but-distinct use-case" was anywhere near my radar for abstraction.

I would love to have developers where my problem was "maybe you piggybacked on existing code too much, in this case you should've split out your own function".


In contrast, every junior developer I've ever worked with has wanted to abstract too early and often, and been slow to recognise that abstraction has costs too (often far higher over time than is initially obvious).

There are costs to copying code, and costs to abstraction, and there's a balance somewhere in between where the most resilient and flexible code lives. The costs of both are paid later, which makes it very hard to judge when starting out where that balance lies, and hard to assign blame later on when problems manifest. Was it too little abstraction, or too much, or the wrong abstraction?

Note that the article claims that duplication is cheaper than the wrong abstraction. The problem is not abstraction in itself, but that abstraction is very hard to get right and is better done after code has been written and used.


What I run into with juniors is that yes, they want to abstract the new problem, and that's good... But they show disinterest in learning the existing abstractions and the existing problems and how their new code would fit into that. Given that approach, you end up with a million individual "frameworks", each only solving a single specific case of a series of overlapping similar problems.

Because reading code is harder than writing it. And the only thing worse than "there, I fixed it code" is "there, I fixed it with this massive cool new framework I've built".


> yes, they want to abstract the new problem, and that's good...

I'm not sure that is good. I started off this way too, but now I like to think carefully about abstractions and avoid introducing them till I'm sure it will not hinder understanding, hide changes/bugs, bury the actual behaviour several layers deep, or worst of all make things hard that should be easy later (the problem in the article).

Building abstractions is world-building; it's adding to the complicated structure other developers (including your future self) have to navigate and keep in their head before they can understand the code. So perhaps because of your second point (that people rarely like other people's abstractions), it's better to keep abstractions simple and limited.


Every failed IT project that I have worked on in the last 20 years (except those where the cause was non-technical such as bad planning/ bad requirements), failed because it used too many layers of abstraction.


Counter: Every failed IT project that I have worked on in the last 20 years had too much code. Code is bad. Delete code mercilessly.

Seriously though, the problem is bad abstractions, not just abstractions. A total lack of abstractions is typically a spaghetti you need to read fully to understand.


When I read these threads I feel like I must be working on another planet to the people commenting in them.

In almost every front-end project I join, there's a positive correlation between number of abstractions and code size. Everything is so "best practice one size fits all"-y that you can usually start with halving the size of the code base by removing 50% of the dependencies.

Even once you've done that, you can usually speed up development by cutting 50% of the remaining dependencies. All the ones where the API surface area is more complex than the bloody implementation. Code is bad, sure, but at least it speaks for itself. A lot of the time the choice is between 3 lines of code that are well written and self explanatory, or 1 line of code that is incomprehensible without trawling through 800 lines of documentation first.

I agree that code size is probably the most important metric for measuring complexity, but it's not an absolute thing. If you're too merciless with culling code, you can easily code yourself into a shitstorm of required context that makes hiring and onboarding impossible. Having said that, I think the problem is mostly contained to using other people's abstractions. I can't think of many times I've walked onto a project and gotten lost in the mud because someone there had coded up something that was too convoluted. Only one springs to mind and it was a back-end system.

And that probably gives some clue as to why nobody can agree on this stuff. I'm guessing different ecosystems lie on different points on the scale, and so different approaches are going to be more successful. I've seen probably 20 projects grind to a halt because of overabstraction, and none because of code duplication. But I'm sure there's other programmers involved in other communities and industries that have seen the opposite. So if we're talking to a faceless crowd on a forum, we're going to give vastly different advice, under the assumption that the people we're talking to are somewhere around the average of everyone we've ever worked with.


I think that it depends on what you define as an abstraction. I think we're often just counting wrong or plainly heavy-handed abstractions here.

There are many abstractions which you can cut and reduce code size, e.g.:

* Complex frameworks which do not fit your case

* Overused GoF-style design patterns which have no place in this day and age

* Magic ORMs generated with annotation processors

* Universal Tool Factory Factory Factories[1]

The thing is, you're usually not replacing these abstractions with plain old code duplication (let alone the dreaded "fixed A here, fixed B there" copypasta). You usually replace dependencies (frameworks, ill-fitted libraries and factory-factory-factories) with your own implementation which is a better fit for your needs, and that can be viewed as duplication - sure. But you'd usually still only have ONE implementation in the code base.

In short, in most cases I've seen where we eliminated bad abstractions and saved on code, we replaced them with good abstractions, not (a lot of) duplication.

[1] https://medium.com/@johnfliu/why-i-hate-frameworks-6af8cbadb...


> there's a positive correlation between number of abstractions and code size

What really matters to me is how much code I need to read to understand how it works. I prefer to work in codebases where I need to read 100 lines out of 1200 instead of 1000 of 1000.

Good abstractions are not about code duplication - they are about simplifying conceptual models - so I don't need to understand these details of everything, but I can still understand the system as a whole. And this is the hard part to get it right.

Many developers often mistake indirection for abstraction. The worst codebases I worked on had plenty of indirection, many components and many dependencies (often cyclic!), but they actually lacked abstraction or mixed different abstraction levels. A component that is so universal that it can render webpages, solve differential equations and wash the dishes all in one function is not a good example of abstraction.


There is truth in this! I've also seen that some of the most successful projects with the highest performers are the most full of duplicate code.

The operating theory is to be first to market in order to capture the largest market share and be the market leader. Programs are just tools that can be rewritten later. That's similar to any large tech company today that "innovates" then apologizes later.


If a company has been running long enough to be making money, chances are their codebase will be crap.


I just had to debug code that had seven layers of classes on top of dapper to call a stored procedure in SQL server.


So much this. I've encountered many codebases (in science and in tech) where the coder did not even use basic abstractions. In one case there was a lot of

    plot('graph1')
    plot('graph2')
    ....
    plot('graph100')
because somebody didn't know how to create strings at runtime in C++. Another codebase did complex vector calculations in components; I was able to reduce a 500-line function to 50 lines (including comments, and with bugs fixed).
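
For reference, the whole block collapses to a loop once you can build the string at runtime; sketched here in TypeScript rather than the original C++, with plot as a stand-in:

    // Stand-in for the real plotting call.
    function plot(name: string): void {
      console.log(`plotting ${name}`);
    }

    for (let i = 1; i <= 100; i++) {
      plot(`graph${i}`);
    }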

I can sympathize with this a bit, I started programming with BASIC - you could not return structs, you could not use indirect variables (no pointers/references)... but at least you had the FOR loop :-P

People often get called out for over-abstracting (rightly so), but I've rarely seen somebody criticized for copypasta or for overly stupid code. Probably because we're too afraid of accidentally implying somebody can't code.


This comes up very often and is probably a big part of the distaste many people have for jQuery. You see so much copypasta $(selector) that queries the entire DOM over and over again instead of storing the initial query result in a variable, querying children based on a ParentNode, etc. This duplication is wasteful at best, and can hurt performance at worst.
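
For example (a hedged sketch assuming jQuery and made-up selectors):

    // Copypasta: every line re-queries the whole document.
    $("#signup-form .error").hide();
    $("#signup-form .error").text("");
    $("#signup-form input").val("");

    // Query once, then work from the cached result.
    const $form = $("#signup-form");
    $form.find(".error").hide().text("");
    $form.find("input").val("");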

But as others noted, this is usually the sign that the creator is either green, or puts little focus in furthering their programming because they normally do other things--not malice or carelessness.


I saw a post on here recently about the “proportionality of code” (I think this was the term used) - as in, how much one line of code translates to in terms of work for the machine. Python was used as an example, in contrast with Go (list comprehensions vs Go’s verbose syntax).

I think a similar line of thinking is applicable here. $ hides a lot of work behind short syntax. The syntax isn’t “proportional” to the work. Not only that, but the amount of work depends on the argument. Perhaps it’s better that we’re forced to put the effort in and type out “document.getElementById” - it makes us think about what we’re doing.


> I've rarely seen somebody criticized for copypasta or for overly stupid code.

Do you think that is in the realm of what the article is concerned with?


Code like you describe is often the result when a program is written by someone who does not have programming as their main profession. I have seen code like that written by scientists (in disciplines other than computer science).

They may have very deep knowledge in their field, and have written a program to solve some problem they have, but are unfortunately not very good programmers. This often results in quite naive code that still tries to solve an advanced problem.

In code written by professional programmers, I have seen the pattern described in the article far more often than the naive style you describe. After all, programmers are trained to avoid duplication and finding abstractions, and will often add one abstraction too much rather than one too little.


> but I've rarely seen somebody criticized for copypasta or for overly stupid code. Probably because we're too afraid of accidentally implying somebody can't code.

It's because it's a far more benign problem than too much abstraction.

Sure it's easy to poke fun at that code and lol at how the programmer can't even use the most basic kind of abstraction, but that code is still clear and easy to read. More importantly, it is trivial to fix that kind of error.

I would take code like this any day over code written by an experienced programmer too keen on abstraction.


    plot('graph1')
    plot('graph2')
    ....
    plot('graph100')

I've done a lot of that myself. What you might not be seeing is the for loop in a scripting language that was used to generate that text. It probably took less effort than looking up and implementing it the "right" way. It might make your eyes bleed but if you need to change "plot" to another function, that's just a find-and-replace-all away. Most importantly, the code works fine and doesn't actually need abstraction.


Yes, writing a for loop in another language to generate code instead of just writing the same loop in the language you're already using? Common technique, nothing wrong with it whatsoever.


Yes, a lot of scientists use their computers in ways that horrify software developers. For example, learning exactly enough of a compiled language to do some wicked fast integer / floating point arithmetic, and not bothering to waste time on the mundane crap you find obvious. And that might mean falling back to a familiar language that makes string formatting easy.

If it ain't broke, don't fix it.


> If it ain't broke, don't fix it.

But scientific programming is deeply broken. Code presented along with publications often doesn't work, or is an incomplete subpart/toy example that's supposed to be invoked within some larger framework. That sounds great until you realize that "some larger framework" doesn't refer to a standardized tool, but some deeply customized setup (a la the one you're responding to, that uses e.g. ad hoc code generators across two--or sometimes more--languages because the original authors didn't know how to format a string in one of them).

Even if you do get lucky enough to find a paper with all requisite code included, in many cases it was only ever invoked on extensively customized, hand-configured environments. And that configuration was done by non tech folks with a "just get it to where I can run the damn simulation" attitude, so configs are neither documented nor automated. And when I say configs, I'm talking about vital stuff--e.g. env vars that control whether real arithmetic or floating point is used.

Often as not, you hack your way to try to get something--anything--running, and it either fails catastrophically or produces the wrong result. Now you have to figure out which of several situations you're in: is the research bad? Were the authors just so non-technical they accidentally omitted a vital piece of code? Was the omission deliberate and profit-motivated (e.g. the PI behind the paper plans on patenting some of the software at some point, so didn't want to publish a special sauce)? Was the omission deliberate and shame-motivated (i.e. researchers didn't want to publish their insane pile of hacks written to backfill an incomplete understanding of the tools being used)? Is it an environment-dependent thing?

And all of that is just as pertains to code in published work--usually the higher-quality stuff. Assuming ownership of in-house code from other scientific programmers is much, much worse.

This isn't abstract moaning about best practices. The failure of labs, companies, publications, and universities to combat this phenomenon has direct, significant, and negative effects on the quality of research and scientific advancement in many fields.

TL;dr it is "broke". When programmers complain about reproducibility crises in soft-science fields, they're throwing rocks from glass houses.


You're bringing in a whole host of issues inapplicable to the snippet OC found questionable. Don't disagree with ya, but "lack of obvious abstraction" isn't one of these "extreme sensitivity to environment vars" cases.

In fact, vociferously complaining about such cases is a great way to turn scientists away from code review as a concept. Fold the code away in your head (or edit your local copy), and dig for subtle issues like numerical sensitivity, environment, etc. That's the way to bring actual value to the process.

For the code in question, "oh by the way this can be done simpler", with the simplified snippet, is an appropriate approach to the review. But in my experience it's best to save your breath for actual problems.


> the code works fine and doesn't actually need abstraction

Well, maybe it works fine. We didn't see the other 97 lines to verify that they actually include all the integers from 3-99 without skipping or duplicating any. (NB with a loop this verification would be trivial.)


Maybe they deleted 57 because it triggers an edge case. Put it back if you dare. ;)

(no, that's the bad kind of tech debt that's unfortunately common and I actually hate)


This is fine for code that belongs in the trash, ie. just testing stuff, prototypes, debugging, learning the language/framework, etc.


The business codebase I'm working on now was written by OOP crazy people who thought inheritance was the solution to every line of duplicated code. When they hit roadblocks, they filled the base class with things like if(this.GetType() == typeof(DerivedClass1)){...

I would do anything to have the duplication instead.


If you're truly OOP crazy you will always find ways to avoid resorting to branching on types or even avoid branching altogether (just on the language level of course). "There's a design pattern for that" :-)


Once you ask what the class is you're no longer even "OOP crazy".

You've just capitulated to the complexity and do whatever it takes.

I don't want to sound (too) condescending. I know how easily the best intentions can lead a project there. This job is hard.


Checking for the type is the exact opposite of OO.

The correct OO approach would be to think about what the check represents, maybe abstract it into a base interface with pure abstract methods, and derive from that interface.

What you describe is what people without understanding of OO do when they come from a language without OO.


Very relatable. And they even have the guts to call this code "SOLID"


Then the very same people learn that inheritance bad, composition good, and they'll create abstractions with no meaning on their own, which call 10 vague other abstractions (but hey, no inheritance!). Figuring out what happens there is even worse than with inheritance. Some people grow out of it, fortunately (mostly after having to deal with shit like that once or twice).


> ...they'll create abstractions with no meaning on their own...

As if that doesn't happen with inheritance!

The dark pattern is using inheritance as an alternative way of implementing composition. Anyone who thinks that "inheritance bad, composition good" is the proper response to this is probably as confused about the issue as those making the mistake in the first place.

To be clear, you are clearly not making that claim yourself, but you are invoking it to make a straw man argument.


> they filled the base class with things like if(this.GetType() == typeof(DerivedClass1)){

That defeats the purpose of polymorphism.


wow, just reading that term "line of business" makes me anxious. I used to work on a global payments platform that supported "multiple LOBs", and it was a nightmare of ifs and switch statements all the way down. The situation was made more difficult by the fact that our org couldn't standardize the LOBs into a common enum.


Nothing I hate more than seeing two files or more, sharing 90% of the same code. No matter what justification one attempts to use, there's a mistake somewhere in the design / development process.

I can see a case for what the OP is saying, but I feel it should always be seen as a temporary measure.


It’s been the exact opposite for me. The spaghetti code has always come from poorly conceived abstractions and the massive problem of inverting an API to reimplement functionality through the API that should be extensible within the API (but fails to be because of poor choices in abstraction or abstracting prematurely).

Later on that spaghetti code gets labeled as lacking abstraction, similar to what you are saying, despite the actual problem being too much abstraction and poorly designed abstraction that became load bearing in a way where everyone decides that living with API inversion as a reality is the lesser evil and figures they’ll probably quit the company and move on to greener pastures before it becomes their headache to deal with.

https://en.m.wikipedia.org/wiki/Abstraction_inversion


Absolutely this. I'd rather look at 200 lines of linear, inline-documented code than a spaghetti mess of "helper" functions that do nothing better than obfuscate everything going on.

I’ve had a strict rule with my team of “1, 2, N”. I don’t want to see an abstraction until we’ve solved a problem similarly at least two times, and even then an abstraction may still be a poor idea.

Abstraction is an especially poor idea early in a project because often you only half know what you’re making (I’m in games). Requirements change, or a special case needs to be added, and all of a sudden you are trying to jam new behavior into “generic” helpers without breaking the house of cards built around them.


I agree that over-engineered helper function hell can be a real problem.

I disagree strongly with strictly enforcing the 3x rule. The right abstraction can be helpful even if it is used only once. The right abstraction will communicate its purpose clearly and make it easier to reason about the program, not harder. Obfuscating implementation details is a feature not a bug, as long as the boundaries of the abstraction are obvious. Another benefit is it makes it easier to test the logical units of your codebase.

"It’s nice to pretend that a four word function name can entirely communicate the state transformation it does, but in reality you need to know what it does." Are you suggesting you are cognizant of every line of code of every library you use in your work?


Actually yes, you should know to at least depth=1 what your magic incantations are doing when you call them.

And that’s part of my point, if you go that one level of depth and find an excessive amount of DRY, you’ll find it that much harder to know what the hell is going on.


Yes, you should understand what a function does when you call it. Not everyone who looks at a codebase is modifying the codebase or adding new function calls. The person referencing the code may already be 1-level deep in parsing the implementation.

Not all abstractions will seem like a magic incantations when you use them. Something like "convertToCamelCase" conveys its purpose clearly enough that the reader can assume what the low-level operations are. They don't need to look at these operations every time they need to reference the code.


200 lines of code means that you have to comprehend all 200 lines simultaneously since any line could potentially interact with any other line in that code block. Using functions where the state is passed as parameters limits the potential for code interactions through functional boundaries. The point of abstractions are to limit complexity by limiting potential interactions. Helper methods do a fine job of this.


That’s a gross over-generalization to assume that 200 lines is always a self-referential mess. Functions fundamentally transform data, and often that transformation is a linear process. If it’s not, sure, break it up in a more sensible manner.

Regardless, helper methods have a significant cognitive cost as well. It’s nice to pretend that a four word function name can entirely communicate the state transformation it does, but in reality you need to know what it does and mentally substitute that when reading the function using it. No free lunch.


I worked on a webapp that our team inherited which had 400-800 line controllers (and one that was a little over 1200 lines). When I first started looking at the code I was horrified but then I realized that everything was self contained and due to the linear flow, pretty easy to understand. You just had to get used to scrolling a lot!

The issue that we started having is that pull requests, code reviews, and anything that involved looking at diffs was a lot of work. There were two main issues:

1) Inadvertently creating a giant diff with a minor change that affected indenting, such as adding or removing an `if' statement.

2) Creating diffs that had insufficient context to understand: if your function is large enough, changes can be separated with enough other lines of code to make the diff not be standalone. You end up having to read a lot of unchanged code to understand the significance of the change (it would be an ideal way for a malicious developer to sneak in security problems).


>That’s a gross over-generalization to assume that 200 lines is always a self-referential mess.

The point is that you don't know this until you look. You have to look at all 200 lines to understand the function of even one line. When you leverage functional boundaries you generally can ignore the complexity behind the abstraction.


You're fooling yourself, in a mature codebase, if you think you can modify code and not look past function boundaries.

That assertion would be more credible in a language that captures side effects in the type system, but that's not what most people use.


I'm not sure what point you're making. If you are just assuming that functional boundaries tend to not be maintained in practice then you're not contradicting anything I have said. Whether or not functional boundaries are easy/hard to observe depends on the language and coding conventions.


I have had exactly and overwhelmingly the opposite experience. I wonder if it's a function of our fields, or what...


As I gain more and more experience (I would now call myself more or less a mid-level developer), I find that the distinction that matters is not abstraction vs duplication, but the one between developer mindsets.

I have many times met/worked with people who think the main task of the developer is to 'get shit done'. Regardless of their level of experience, these developers will churn out code and close tickets quite fast, with very little regard for abstraction, design, code reuse etc.

Conversely, the approach that I feel more and more is the correct one is to treat development as primarily a mental task. Something that you first think about for a while and try to design a little. The actual typing will in this case be a secondary activity. Of course, this doesn't mean you shouldn't iterate on your design if during execution problems come up. Just that the 'thinking' part should come before the 'doing'.

My feeling is that with this second approach the abstraction/duplication trade-off will not matter so much anymore. With enough experience you will figure out what you can duplicate and what you can design. And when you design you will develop an understanding of how far you should go.

Approaching development as a task of simple execution I think inevitably leads to illegible spaghetti down the line.


I agree that many issues with bad code could really be avoided by first thinking about the solution a bit, of which the code is just an expression.

I'm not advocating weeks of architecture astronauting without code feedback - because practical considerations (e.g. the compiler can't deal with this kind of code due to some limitations) matter - but some people seem overeager to just start writing some code "and see what happens".


When considering whether some abstraction is "right" or "wrong", another important thing to consider is how cleanly the abstraction fits into a mental model of how the program works. Good abstractions provide value outside of removing duplication. They help us reason about a program by providing compression of logical concepts.

Consider some helper function: "convertSnakeToCamelCase." This abstraction would take a string, do some operations on it, and return another string. It is easy to understand what the input and output are without having to think about these operations. This abstraction provides a benefit for anyone having to think about the program because it reduces the number of concepts the reader has to parse from N (where N is the number of operations) to 1. This is helpful because people have limited mental bandwidth and can only reason with a finite number of concepts at any given time.

Consider a different helper function: “processDataPayload.” This function takes data in some arbitrary complex shape and returns data in some arbitrary complex shape. The abstraction effectively communicates nothing to the reader, and it is actively unhelpful because it forces that person to follow a reference, remember all the details of what that function does, and substitute those details into the original function.
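
To make the contrast concrete, here is a rough sketch (Python, with snake_case versions of the hypothetical names above):

    def convert_snake_to_camel_case(text: str) -> str:
        # Clear contract: "first_name" -> "firstName". Callers can treat this
        # as one concept and never need to read the body.
        head, *rest = text.split("_")
        return head + "".join(word.capitalize() for word in rest)

    def process_data_payload(payload: dict) -> dict:
        # Opaque contract: the name says nothing about the shape of the input
        # or the output, so every reader has to open it up and carry the
        # details around in their head. (Body elided.)
        ...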

Trying to find the conceptual boundaries that make the program easiest to reason about IMO is more of an art than a science and difficult to govern with hard and fast rules.


Agreed. I also think it's important to create abstractions that provide guarantees and/or maintain invariants. That way, your abstractions actually help you be more confident that your code is correct.

The point of abstraction isn't per se to reduce duplication--it's to make your code more straightforward and to make errors more obvious.


Counter: Refactoring is far, far, far cheaper than duplication or wrong abstraction.

Duplication means you lose the wisdom that was gained when the abstraction was written. It means that any bug or weird cases will now only be fixed in one place and stay incorrect for all the places you duplicated the code.

About the rule of three: I personally extract functions for single-use cases all the time. The goal is to make the caller be as close to pseudo-code as possible. Then if a slightly different case comes up, I will write the slightly different case as another function right next to the original one. Otherwise, the fact that you have multiple similar cases will be lost.
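
Roughly what that looks like in practice (a made-up Python example):

    def greeting_line(user: dict) -> str:
        return f"Hello, {user['name']}!"

    def trial_notice(user: dict) -> str:
        return f"Your trial ends on {user['trial_end']}."

    def render_welcome_email(user: dict) -> str:
        # The caller reads like pseudo-code: each line is a named step.
        return "\n".join([greeting_line(user), "Welcome aboard."])

    def render_trial_welcome_email(user: dict) -> str:
        # The slightly different case is a sibling function rather than a flag
        # on the original, so the similarity stays visible side by side.
        return "\n".join([greeting_line(user), "Welcome aboard.", trial_notice(user)])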


Yeah, the rule of three is misleading: having a name for three lines of code that do “one thing” is almost always a win and nothing prevents a future developer from either inlining that function, if it was a bad idea, or duplicating and modifying the function.


Counter-counter:

Refactoring is by far the most expensive and error prone activity in programming. It can also be one of the most valuable. But unless it's trivial, it's the most mentally arduous and time-consuming work you do as a programmer.


I disagree. I think debugging is the most expensive activity a programmer ever does. Refactoring is a luxury that you have when you don't have bugs or time pressure to ship/fix something. Debugging potentially requires you to load the entire context of the (incorrect) program into your head, including irrelevant parts, as you grope around to figure out a.) what actually went wrong, b.) why it went wrong, and c.) how to modify the existing system in a way that doesn't make it worse.

Debugging is reverse-engineering under the gun. It has huge cognitive load. Especially debugging a production system with a difficult to reproduce bug in a deep dark part of the code. It's a nightmare scenario.

Refactoring, on the other hand, often happens with incomplete knowledge, and can be quite local. I've seen zillions of refactorings that are done with incomplete knowledge that are local improvements (and many that were not global improvements).


I don't think we're disagreeing.

When I say non-trivial, I don't mean local refactoring. I mean the kind of refactoring that requires you to load the entire system (or a large part of it) into your head, and figure out how to clarify and simplify it.

It is not a luxury. When done successfully, it is the only way to lower the cost of that expensive debugging. The slow debugging and the expensive refactoring are two sides of the same coin. They are both the cost of a system that is too difficult to understand and safely change. But the cost of a good refactoring need only be paid once. Whereas the cost of debugging a system you refuse to fix is levied again and again.


Refactoring is only error prone if you don't have integration tests. The advantage of extensive integration testing is that you can relentlessly refactor without fear of breaking things.


I'd much rather have them than not, but don't fool yourself into thinking you can refactor without any fear because you have integration tests.

No matter how many you have, they'll only be testing a tiny fraction of your possible code paths.


Tests are only sufficient if they cover the failure modes of the new abstraction. That is very often not the case.

Tests still help a lot, but they don't reduce the risk to zero.


This quote from John Carmack speaks very succinctly to the problems that many abstractions in a code base can cause, and it's a constant reminder for me when building out business logic.

> "A large fraction of the flaws in software development are due to programmers not fully understanding all the possible states their code may execute in."

https://www.gamasutra.com/view/news/169296/Indepth_Functiona...


This is one reason I love working in the Unity ECS framework. Your data is public and state can’t hide. Your systems are still free to contain a plethora of bugs, but they are easier to track down due to the functional nature of a system.

In the regular Unity OOP land, developers inevitably sprinkle state everywhere. Coroutines are by far one of the worst offenders. Good luck seeing the current executing state of your game when it’s hidden in local variables inside a persistent function body...


But abstractions reduce possible state and allow you to specify that state in obvious ways, e.g. in function parameters. Do not underestimate the power of functional boundaries.


They also tend to impose a degree of discipline. I've often found myself wanting to shove a parameter in somewhere and realized I didn't need the damned thing.


Reading that article and the context of the quote, it appears that Carmack is using that statement to extol the benefits of functional programming styles, not commenting on abstraction.


To me the quote speaks to the general problem of juggling state in your head when writing code. If an abstraction is an attempt to funnel a bunch of code through common logic, it can be hard to know what the state of your app will look like when someone else modifies that common logic.


I can't help but wonder if we're sometimes using the wrong words for things. In this discussion we keep talking about "code duplication" and "abstraction" hand-in-hand, but I think they're almost orthogonal concepts, at least as I think of them.

Seeing the same code almost copy-and-pasted in a few places might call for some code-deduplication. But that's not necessarily a new "abstraction" in my eyes. It may be, but it also may not be.

I'm struggling to think of a specific example because I fully intended to go to bed before arriving here... But as a really stupid example, let's say you have `val a = x + x + x` and `val b = y + y + y` and `val c = z + z + z` in your code. If you write a new function like `fun addThreeTimes(i) = i + i + i`, I don't see that as a new abstraction at all. If, however, you invent multiplication, now you're at a new abstraction! `val a = x * 3; val b = y * 3`, etc.

"Abstraction" to me is about thinking at a different semantic level, not about avoiding copy and paste.

Does this resonate with anyone else? Am I missing the point?


They're theoretically orthogonal but practically not. You can deduplicate code without abstraction per se, but the result is generally unreadable and unmaintainable. As such, all reasonable code deduplication relies upon abstractions. However, not all abstractions involve code deduplication, and may instead have other goals (such as making it easier to reason about local state, invariants, etc.)

> If you write a new function like `fun addThreeTimes(i) = i + i + i`, I don't see that as a new abstraction at all.

If you only call it once, it's not code deduplication either.

What differentiates addThreeTimes(i) from sqrt(x) or average(x,y) or pow(x,y) or multiply(x,y)? Not how many call sites it has, nor the presence of a dedicated operator to the function in the language. Instead, I'd say: the function's reusability, composability, commonness, ... or to put it another way: addThreeTimes is an "abstraction" - it's just a poor garbage unreusable unremarkable unrememberable abstraction with no expressive power.

However, poor abstractions aren't the only result of overeager code deduplication. Sometimes you end up with "good" abstractions misapplied to the wrong situations - e.g. they solve issues your current problem doesn't actually have. As an example, turning your list of game entities into a list of (id, aabb_f32) tuples might be exactly what you want for a renderer culling or broad phase physics pass - but completely counterproductive for implementing the gameplay logic of a turn based game! If you've already got a list of tuples, you've a few choices:

1. Modify the tuple (add tile position information that's useless to the renderer/physics, muddying the abstraction)

2. De-abstract (e.g. perhaps change several function signatures to pass in the original entity list instead of the AABB list)

3. Re-abstract (perhaps your gameplay logic should take something else that accounts for things like the fog of war instead of a raw list of entities?)

4. ???


> What differentiates addThreeTimes(i) from sqrt(x) or average(x,y) or pow(x,y) or multiply(x,y)? Not how many call sites it has, nor the presence of a dedicated operator to the function in the language. Instead, I'd say: the function's reusability, composability, commonness, ... or to put it another way: addThreeTimes is an "abstraction" - it's just a poor garbage unreusable unremarkable unrememberable abstraction with no expressive power.

I agree that call sites or presence of language operators is not the defining distinction here. But I disagree that reusability, composability, or commonness (is that not "call sites"?) are somehow defining features of an abstraction, either. Obviously, those are good qualities for code to have, but that's not related to what I'm thinking about.

The difference in my example is specific to the ladder of abstraction from addition to multiplication. When I was taught multiplication in early grade school, I was taught it as basically just being another way to write addition. When I first learned it, I would do exercises that involved taking an expression like "3 * 5" and translating it to "3 + 3 + 3 + 3 + 3" and then evaluating that. However, over time, I've stopped thinking about multiplication as addition. In my mind, I just think of multiplication as its own thing. I've fully internalized the "abstraction" because I don't even think about addition anymore when I see multiplication.

So, when we take a Year, Make, Model, and Color and group them together and call it "Car", we're making an abstraction and it has little to do with code duplication. It has much more to do with wanting to think about higher-order constructs. You and I agree here, as per your first paragraph.

If I have some kind of rendering engine and I find myself often rotating, then shifting a shape, I can write a `rotateThenShift(Shape, angle, distance) -> Shape` function and not feel like I've abstracted anything. I'm still "talking" about a shape and manually moving it around. Even if I just rename that function to `foobinate(Shape, angle, distance)`, I feel like I'm closer (but not quite) to a new level of abstraction because now I'm talking about some higher-order concept in my domain (assuming "foobinate" would be some kind of term from geometry that a domain expert might know).

All other points about good or bad abstractions apply. I just don't think every single function we write is a new abstraction.


> commonness (is that not "call sites"?)

I realize it's been 8 days, but I've mulled over the distinction and figured out the point I'm trying to make - and it's a matter of concept reuse vs code reuse. I might write a once-off, project specific, completely nonreusable function, with exactly one call site, but it still might be named after and based off of reusable concepts.

A concrete example that comes to mind: I often write a "main" function, even in scripting languages that don't require it. This lets me place the core logic at the start of the script for ease of reading/browsing without having all of its dependency functions defined yet. I then invoke this main function exactly once, at the bottom of the script.

This is clearly not code reuse nor code deduplication - but it is concept reuse, the concept being "the main entry point of an executable process."

I might write a mathematical function like "abs" or "distance" as a quick local lambda function without intending to reuse it as well. I might later refactor to reuse/deduplicate that code by moving it into a common shared library of some sort. I might then later undo that refactoring to make a script nice and self-contained / standalone / decoupled / to shield it from upstream version churn / to improve build times / ???

> multiplication

If you'd only used multiplication exactly once, it wouldn't have had much staying power as a useful abstraction. That it's a repeating, common, reusable pattern that can be useful in your day to day life is part of what makes it a useful abstraction worth internalizing.


I would say it becomes an abstraction when it needs parameters in addition to the nominal arguments. So in the case of "addThreeTimes(i)", sure, that's just abbreviation, but "addNtimes(i,N)" is a bona-fide abstraction.

Edit: Reminds me of the distinction between inheritance and subtyping: https://www.cs.utexas.edu/users/wcook/papers/InheritanceSubt...


I agree with your point and your edit. The inheritance thing is very similar to what I'm talking about!


In my experience, code dedupping without shared requirements hurts you over time. Nothing is keeping the code focused, so as requirements change, you either fork it or add "one more flag".

Focusing on abstractions is what helps me in dedupping with a focus on requirements.


Two questions (genuine, not rhetorical):

(1) How much of this is because it's actually hard to back out of the wrong abstraction and pivot to the correct one, and how much of it is other causes?

The article hints at this with, "Programmer B feels honor-bound to retain the existing abstraction." Why do they feel this way, and is the feeling legitimate? Do they lack the deep understanding to make the change, or are they not rewarded for it, or are they unwilling to take ownership, or is it some other reason? I could see it going either way, but the point is to understand whether you're really stuck with that abstraction or not.

(2) How much of the wrong abstraction is because people lack up front information to be able to know what the right abstraction is, and how much of it is because choosing good abstractions (in general and specifically ones that are resilient in the face of changing requirements) is a skill that takes work/time/experience/etc. to develop?

If it's due to being unable to predict the future, then it makes sense to avoid abstractions. If it's due to not being as good as you could be at creating abstractions, then maybe improving your ability to do so would allow a third option: instead of choosing between duplication and a bad abstraction, maybe you can choose a good abstraction.


> Why do they feel this way, and is the feeling legitimate?

In my experience, it's because the amount of diff (red or green) in a change request is--consciously or subconsciously--correlated with risk.

Even though we killed SLoC as a productivity metric years ago, the idea that "change/risk is proportional to diff size" is still pervasive.

I'm totally into YAGNI/"code volume is liability" school of thought. But equating change volume with liability is a subtly different and very harmful pattern.

Adding a single conditional inside your typical 1200 line mixed-concern business-critical horrorshow function may assume a much greater liability (liability as in bug risk and liability as in risk/difficulty of future changes) than e.g. deleting a bunch of unused branches, or doing a function-extraction refactor pass. Standard "change one thing at a time" good engineering practices still apply of course.


1.) I think political and interpersonal issues can play a role here. People are often hesitant to suggest other people's code needs to be rewritten. This is especially true if an abstraction is heavily-used by the organization. If there are many stakeholders using the abstraction, the motivation behind the refactor (ie the perceived defects), would likely need to be communicated widely to justify the effort the refactor requires.


For something that argues against bad abstractions, the article sure is lacking in concrete examples and makes its point in the abstract. A lot of people will likely misinterpret it or get the idea that abstraction is only done for duplicated code (DRY, as some people would call it). I think the wrong/bad "abstractions" here mostly refer to abstractions made over common code that is very specific to a context and is very susceptible to domain changes.

But there are a lot of other kinds of abstraction aside from DRY. There are abstractions made to reduce clutter and hide implementation detail that will likely be used only once. There are also abstractions that are more general and aren't coupled to the domain. These abstractions are more reusable and composable, and are immune to domain changes such as step 6 in the article. Some people would find these kinds of abstractions harder to digest, but I personally consider them extensions to the standard library, or even additions to the vocabulary of the programming language.

Note that I don't claim that general abstractions are necessarily better, since the generality can be made to the extreme and we'd have monads for breakfast.

All in all, I agree with the article, except that it is only referring to one kind of abstraction, although I hesitate to call it as such.


I'm skeptical because it is really easy to un-share code by copying it into multiple places but it is very hard to unify duplicated code. So I prefer to err on the side of sharing.

But yes, you should be ready to change sharing into duplication if you realize the code is just "accidentally similar" and need to evolve in separate directions.

In practice I have seen a lot more pain due to duplicate code compared to the issue of over-abstracting code, because the latter is much easier to fix.


On the other hand, it's really difficult to know who is using that shared code. If you make an innocuous change in a shared method, it could affect someone else you don't know.


It's a million times easier than figuring out if those minor differences in duplicate code are accidental or on purpose.

As bad as a flag-laden method might be, you know the intent of all callers.


It doesn't have to be that way, it's just because our existing language facilities don't have support for what we need them to do:

"Names are regularly repurposed to point to new definitions, even when the old definitions are still perfectly valid. For instance, a library author might make a function a bit more generic by adding an extra parameter, or decide to switch the parameter order. That's fine, but the old definition was not wrong, so should users be forced to upgrade? No.

Repurposing names is fine; it's hard to come up with good names for definitions, so using an old name for a related new definition often makes sense. The trouble is that existing tooling doesn't distinguish between repurposing a name and upgrading a definition. Unison changes that..." https://www.unisonweb.org/2020/04/10/reducing-churn/#incompa...


It's very easy with proper tooling.


Outside of publishing a public API, almost any modern language and environment should make this easy.


I find it much easier to find the call sites for a function than to find code that’s duplicating or a variant of the code I just fixed a bug in so we can figure out if the same bug is latent in the duplicates too.


Not in any modern language or IDE. Not to mention that would indicate a hole in the test suite


Depends on the specific codebase? I found the exact opposite to be true - very hard to reuse code that was abstracted too soon, and abstracting copy&paste the right way is actually easier if you have it in multiple cases and can see how it was used.


How is it harder to copy/paste the helper method and modify as needed, vs tracking down and unifying multiple instances of the same code written slightly differently?


Because the multiple instances are concrete while the unified code is abstract.

In general it is more difficult to read abstract code than concrete code.

Also, code written using the wrong abstraction can get hairy very quickly (lots of "if" statements for various cases).
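
A made-up Python example of how those "if" statements pile up:

    import csv
    import io
    import json

    def export_report(rows, as_csv=False, for_admin=False, redact=True):
        # Each caller only cares about one path through this, but every reader
        # has to consider all eight flag combinations to know what it does.
        if redact:
            rows = [{k: v for k, v in row.items() if k != "ssn"} for row in rows]
        if for_admin:
            rows = [dict(row, internal_id=i) for i, row in enumerate(rows)]
        if as_csv:
            out = io.StringIO()
            writer = csv.DictWriter(out, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)
            return out.getvalue()
        return json.dumps(rows)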


In Java, when I hit a bad abstraction, I hit the inline shortcut (command-alt-n) and then evaluate the resulting code with git diff. Other languages may be more manual, but, at worst, you just use ripgrep or similar to find all the relevant use sites and then manually expand the abstraction. This is only really a problem if the function is used hundreds of times; but, in that case, you can always duplicate the abstraction and rename.


My experience lines up with yours. Working in overly and poorly abstracted codebases dramatically hurts productivity. Poorly duplicated code increases the chance for missed patches, but poor duplication has, in my experience, been vastly easier to fix. One codebase comes to mind. Twisted Python. Multiple layers of inheritance, multiple mixins, and major overloading of methods. Just navigating the code was pain.


> it is really easy to un-share code by copying it into multiple places but it is very hard to unify duplicated code

Code that already exists has a gravity, a presumption of correctness. That presumption is very difficult to overcome, especially for programmers new to the codebase. An abstraction you think of as temporary will be, to those who come after you, simply the way things are done; breaking it apart and re-forming it is, for them, fraught with risk. It's good to keep this in mind as you make commits.


Then the same would be the case for code duplication which really ought to be unified.


I don't think I agree. Identifying an abstraction as leaky and breaking it apart is substantially more difficult and riskier than identifying duplication and creating an abstraction for it.


Removing an abstraction layer can usually be done mechanically by inlining the calls. This is a trivial operation. Identifying duplication is not trivial, since there might be various differences and you have to investigate whether they are inconsequential or not.
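
For example (a hypothetical Python sketch; any IDE's "inline" refactoring does the mechanical part for you):

    # Before: the caller goes through a shared helper that has grown a flag.
    def prepare(text, shout=False):
        text = text.strip()
        return text.upper() if shout else text

    def greet(name):
        return "Hello, " + prepare(name)

    # After mechanically inlining the helper into this caller, the unused
    # `shout` branch disappears and the actual behaviour is plain to see.
    def greet_inlined(name):
        return "Hello, " + name.strip()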


The mistake is creating an abstraction because of seeing duplication.

DRY is not a good guiding principle. It is an anti-principle.

Abstractions should only be created when they have a clear purpose and create a simpler architecture by encapsulating a single concern.

The reality is that all code is duplication. The reason we write code is because it is the most concise language to specify the intended goal in the current context.

What is unique is not the code that we are writing. The unique part is the code in the current context and each level of abstraction separates the context from the implementation - so that abstraction must be beneficial in organizing the overall solution into individual logical components of singular concern.


This is so true, but so shallow too. I think the big mistake is to treat the code as "the main thing" when in reality it's just a model (a golem) mimicking some "other thing"

We're missing an entire set of code characterizations. Yes we have a "pattern language" but there's not much to characterize it structurally wrt "code distance" from one part of the code to the other (e.g. in call stack depth as well as in breadth).

And again all of this needs to happen wrt the "abstraction", not the code itself. Having 10 methods 90% duplicated in a single file with a 10% difference is many times better than trying to abstract them.

Having the same "unit conversion" function duplicated in 3 parts of the code can be disastrous.

These two examples are very easy to see and understand, but in reality you're always in a continuous state in between. And "code smells" like passing too many parameters or doing "blast radius" for certain code changes are only watching for side-effects of a missing "code theory". An interesting book on the topic is "Your code as a crime scene".

The bottom line is we're trying to fix these problems over and over again without having a good understanding of what the real problem is and this leads to too many rules too easy to misinterpret unless you are already a "senior artist"


> Having the same "unit conversion" function duplicated in 3 parts of the code can be disastrous.

This.

I feel like it's really about cognitive load to remember and recognize the differences.

Duplication in 3 distant files, places a heavy load on the developer to:

1. Discover the duplication

2. Grasp the reason for the differences in the 3 different locations

3. Remember these things

Whereas when the duplication is in the SAME file, #1, #2, and #3 can become very manageable cognitively.

Now the question changes to..

Is the cognitive load of dealing with the different special cases in a single de-duplicated method GREATER than simply leaving them in separate methods?

Often the answer is duplication WITHIN a file is less of a cognitive load.

Whereas duplication ACROSS files is a heavy cognitive load.

Minimizing cognitive load minimizes mistakes. And minimizes developer fatigue. Thus boosting productivity.

At least, that's my development philosophy, even though I've never seen it in a design pattern or a book.

It just seems to make sense.


This whole thing exists on a normalized/de-normalized spectrum. The problem is that both ends have pros/cons.

On the normalized side, you have the benefit of single-point-of-touch and enforcement of a standard implementation. This can make code maintenance easier if used in the correct places. It can make code maintenance a living nightmare if you try to normalize too many contexts into one method. If you find yourself 10 layers deep in a conditional statement trying to determine specific context, you may be better off with some degree of de-normalization (duplication).

On the de-normalized side, you have the benefit of specific, scoped implementations. Models and logic pertain more specifically to a particular domain or function. This can make reasoning with complex logic much easier as you are able to deal with specific business processes in isolation. You will likely see fewer conditionals in de-normalized code sites. Obvious downsides are that if you need to fix a bug with some piece of logic and 100 different features implement that separately, you can wind up with a nasty code maintenance session.

I find that a careful combination of both of these ideas results in the most ideal application. Stateless common code abstractions which cross-cut stateful, feature-specific code abstractions seems to be the Goldilocks for our most complicated software.


Junior programmers duplicate everything.

Intermediate programmers try to abstract away absolutely every line that occurs more than once.

Expert programmers know when to abstract and when to just let it be and duplicate.


If there is one single article about programming that I hate, it is this one. It is completely the wrong message. One should instead be very eager to eliminate duplication.

To avoid the pitfalls that the article notes, one should create abstractions that are the minimal ones required to remove the duplication, to avoid over-engineering. Also, one should keep improving the abstractions. That way one can turn the abstraction that turned out to be wrong into the right one.

It is the attitude of constant improvement that will make one succeed, as opposed to the attitude of fear of changing something that this article seems to encourage. When one does things, one learns. When one is afraid to try things, everything will just calcify until it is no longer possible to add any new features. What one does need to make the refactoring work is automated tests.


In 30 years, I can count on the fingers of one hand the number of times I've encountered projects that were in trouble because there was copy/pasted code everywhere and the team was not abstracting out of fear of breaking the existing code.

What I have encountered is dozens of projects that had essentially ground to a halt because of numerous deeply, and incorrectly, abstracted systems, modules and libraries.

Correcting projects in this state has almost always been refactoring into fewer abstractions; less complex, more cohesive and less coupling.


Actually, I have in fact seen this. I worked at a place where this copy-and-paste programming actually led to functions that are many thousands of lines long and are full of duplication and very deeply nested. At some point a file was split because the compiler would not handle such a large file (!). Very difficult to change anything.

And also, refactoring by removing abstraction is fine as well. The thing that is not fine is having problems and doing nothing about them. To me it seems that is what the article ultimately encourages you to do.


> In 30 years, I can count on the fingers of one hand the number of times I've encountered projects that were in trouble because there was copy/pasted code everywhere and the team was not abstracting out of fear of breaking the existing code.

I think the level of experience where underabstraction is common as opposed to overabstraction is so low that it's uncommon to find a team where that gets through, because even if someone junior is at the level where it's common, they’ll get corrected before it becomes a widespread problem.


I don't disagree and have seen the same thing.

However, I've also noticed in those cases that it's very hard to get people to agree on what the problem actually is. One person's incorrect over-abstraction is another person's incompletely-DRYed-up code.


DRY gets abused regularly in my experience. It doesn't stop at method/class abstractions either; I've seen entire microservices & plugins developed to ensure each app doesn't have that one chunk of auth code, for instance, even though they each may have subtly different requirements (those extra params again). The logical end to this sort of thing is infinitely flexible/generic multipurpose code, when the solution is really, probably increased specificity. DRY is probably the lowest-hanging fruit for practices/patterns, and I think this leads to a disproportionate focus on it.


It’s also easy compared to solving new problems, so it can be an emotionally safe way of feeling productive. Failure is difficult to measure until the abstraction falls flat on its face months later, at which point it can be chalked up to the demons of “changing requirements”.


That is a very, very important point; well put.

The "of course it sucks: changing requirements!" boogeyman means one of two things: "the code was written to do the wrong thing because requirements changed/weren't communicated" or "the code was hard to change when it needed to do a new thing".

Figuring out which of those two is in play is very important.


I would say that if developers are hacking on an abstraction that is ill-suited to the task until the code base is a nightmare, they will take this advice and duplicate code until it's a nightmare.

The fact of the matter is every line of code that is written has an associated cost. Developers all too often pay that cost by incurring technical debt.


That's mostly how I matured as a developer: I find myself abstracting less and writing less code today than I did 10 years ago, but I'm more productive today, my code is cheaper to maintain and has fewer bugs. Sometimes, I will literally copy paste a small amount of logic just to avoid making a future reader of this code hunt around for where the business logic is actually implemented. "It's right here, my dear future reader!"

Or maybe I was just a really bad programmer 10 years ago :)


I find it interesting that comments on these articles mainly discuss 1 aspect about it. But rarely this part:

> Don't get trapped by the sunk cost fallacy.

In my experience, yes, programmers are hesitant to throw out an abstraction. Why not work to change this, rather than telling people not to abstract?


I don't think it's a sunk cost fallacy. I think the hesitation is more for social reasons, often not wanting to do a big pull request that's going to be scrutinized.


"Big pull requests" that are unannounced are always problematic because who wants to be the person saying "all of this work you've done is wrong"?

In such situations, it's good to get buy-in from other people before attempting to do such a thing. Make a proposal for a big change and discuss it. There's still a chance that, in the implementation it doesn't work as nicely as believed initially, but at least now it's less likely that the idea will be rejected wholesale during code review.


This advice just _feels_ very wrong. After thinking about it and seeing the other comments, some remarks:

1) It's fine to go back and duplicate code after you correct the abstraction. But it should be the _first_ phase in doing a larger pass to refactor code to fit the current business requirements. If you forgo the _second_ step, which should be to search for suitable abstractions again, you are absolutely guaranteed to be left with shit code that breaks in this situation, but not that other one, and no one knows why. I would absolutely only duplicate code as the prequel to deduplicating it again with updated abstractions.

2) If you do any of this without thorough unit tests you're insane. Keep the wrongly-abstracted code unless you have time to thoroughly fix the mess you will have made when you duplicate code again and introduce bugs (you're human, after all).

2a) If you are going to do this and there are no unit tests, create those unit tests before you touch the code initially (before the duplication).

3) Some of the comments saying you should wait until you implement something two or three times before creating an abstraction seem like comp sci 101 rules of thumb. It's way too simplistic a rule, way too general. Prematurely abstracted (haha!). The type of project and the type of company/industry will tell you what the right tradeoff is.

That is all.


The article already agrees with you on point 1:

> Once you completely remove the old abstraction you can start anew, re-isolating duplication and re-extracting abstractions.


You are assuming that the code is a moving target. Not every software project behaves that way. Sometimes, the software gets done as is.


In that case, then the original problem (incorrect abstraction) does not exist, or at least does not get worse over time, and thus does not need fixing.


I strongly dislike this article because the title is much broader than most of the substance of the article.

Advising not to overextend an abstraction is inarguable.

The actual title "Duplication is far cheaper than the wrong abstraction", and the thing that people will really discuss, is a loaded statement that's going to need a lot of caveats.


I use DRY in two ways. The first is that I'm happy to make 2 or 3 copies of a snippet before promoting that to a new function.

The second is when I find a bug in a duplicated snippet. I'll mend the snippet and its duplicates, once or twice before promoting it to a function.

In the rarer (in my line of work) instance that a common snippet gets used with several intrusive variations, I usually document the pattern. It's tempting to use templates, lambda functions, closures, coroutines, etc but far simpler to duplicate the code. But again, if a bug (or refactor) crops up and I need to fix it in many places, then I'll spend some time thinking about abstraction and weigh the options with the benefit of hindsight.


Another tip is: if you're duplicating, and they're not lines of code that are visually obviously next to each other, then leave a comment next to both instances mentioning the existence of the other.

There's nothing inherently wrong with duplication, except that if you change or fix a bug in one, you need to not forget about the other. Creating a single function solves this... but at the potential cost of creating the wrong abstraction.

When you're at only 1 or 2 extra instances of the code, just maintaining a "pointer" to the other case(s) with a comment serves the same purpose.

(Of course, this requires discipline to always include the comments, and to always follow them when making a change.)
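
Something like this (hypothetical file names and code, Python):

    # orders/pricing.py
    def order_total(items):
        # NOTE: near-duplicate of invoice_total in invoices/pricing.py.
        # If you change the rounding rules here, check whether that copy
        # needs the same change.
        return round(sum(item["price"] * item["qty"] for item in items), 2)

    # invoices/pricing.py
    def invoice_total(lines):
        # NOTE: near-duplicate of order_total in orders/pricing.py
        # (see the comment there).
        return round(sum(line["price"] * line["qty"] for line in lines), 2)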


Would the risk of forgetting to update the comments not be a reason for creating a wrapper method that handled calls to both and contained the relevant advice?


Brilliant insight. Always remember: (1) make it work, (2) make it right, (3) make it fast. 80% of projects get scrapped in between (1) and (2) because you end up realizing you wanted something completely different anyway.


On my projects code doesn't make it into the main branch until it gets to at least (2).


> (1) make it work, (2) make it right, (3) make it fast.

I've always disagreed with this. In my view you should make it a habit to write optimized code. This isn't agonizing over minor implementation details, but keeping in mind the time complexity of whatever you are writing and working towards an optimal solution from the start. You should know what abstractions in your language are expensive and avoid them. You should know roughly the purpose of a database table you create and add the indexes that make sense, even if you don't intend to use them right away. You should know that thousands of method lookups in a tight loop will be slow. You should have a feel for "this is a problem someone else probably solved, is there an optimal implementation I can find somewhere?". You should know when you use a value often and cache it from the start. Over time the gap between writing unoptimized and mostly optimized code gets smaller and smaller, just like practice improves any skill.


> In my view you should make it a habit to write optimized code.

It depends on your domain.

If you're writing for embedded, or games, or other things where performance is table stakes, then sure.

If you're writing code to meet (always changing) business requirements in a team with other people, writing optimized code first is actively harmful. It inhibits understandability and maintainability, which are the most important virtues of this type of programming. And this is true even if performance is important: optimizations, i.e. any implementation other than the most obvious and idiomatic, must always be justified with profiling.


You're mostly right, but even in typical LOB applications, there are some low-hanging fruits you should really pay attention to. One common example is N+1 queries.
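
For anyone who hasn't hit it before, a minimal sketch of the N+1 pattern (Python with sqlite3, made-up schema):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    """)

    # N+1: one query for the posts, then one extra query per post for its author.
    posts = db.execute("SELECT id, author_id, title FROM posts").fetchall()
    for _post_id, author_id, _title in posts:
        db.execute("SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()

    # Fix: fetch everything in a single joined query.
    rows = db.execute("""
        SELECT posts.title, authors.name
        FROM posts JOIN authors ON authors.id = posts.author_id
    """).fetchall()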

And if you do find yourself writing an algorithm (something which happens more rarely in LOB applications, but can still happen occasionally), it's probably still good to create algorithms that are of a lower complexity class, provided they are not that much harder to understand or don't have other significant drawbacks. I remember that I once accidentally created an algorithm with a complexity of O(n!).


> You should know that thousands of method lookups in a tight loop will be slow.

That's not always the case. Modern compilers do a lot of things like inlining and unrolling. These days I mostly try to write code that is easy to understand.


> Modern compilers do a lot of things like inlining and unrolling

Smart ones do. I've been writing Java lately and that behavior tends to be unpredictable and rare[0]. I'd use an inline keyword if I had one, or a preprocessor directive of some kind if I had that, but I don't. I agree it's harder to read, but I feel like changing a JVM flag to get a behavior that I want is more inscrutable than having a long method with a comment noting that this is inlined for performance reasons. With modern machines and the price of memory I tend to lean hard to the memory side of the time-memory tradeoff.

[0]"First, it uses counters to keep track of how many times we invoke the method. When the method is called more than a specific number of times, it becomes “hot”. This threshold is set to 10,000 by default, but we can configure it via the JVM flag during Java startup. We definitely don't want to inline everything since it would be time-consuming and would produce a huge bytecode." https://www.baeldung.com/jvm-method-inlining


A related problem: duplication is not equality. If two things happen to be the same right now, it doesn't mean they are intrinsically the same thing. If you have multiple products selling for $59.99, they shouldn't share a function to generate the "duplicate" price. Abstractions need to be driven by conceptual equivalence, not value equivalence, where duplication is a good hint for a potential candidate for abstraction, but not the complete answer alone.
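
A toy illustration (made-up constants and helper, Python):

    # Value equivalence: these happen to be equal today, but they are
    # different concepts, so merging them into one SOME_FEE constant
    # would be the wrong abstraction.
    STANDARD_SHIPPING_FEE = 5.99
    GIFT_WRAP_FEE = 5.99

    # Conceptual equivalence: every price in the catalog is formatted the
    # same way because it is the same concept, so a shared helper is safe.
    def format_price(amount: float) -> str:
        return f"${amount:,.2f}"

    print(format_price(59.99))  # -> $59.99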


I think there's a big cultural challenge with adopting duplication. It goes against most people's career growth objectives.

Being able to effectively create clean, re-usable abstractions is a measure of being a "senior" engineer at many places. In other words, to be viewed as senior, you need to be able to effectively write abstractions frequently. It's hard to measure an abstraction in the moment, so a lot of people assume that the senior simply knows better.

I find this extends to a lot of programming. Seniors will often use unnecessary tricks or paradigms simply because they can. It can make it extremely difficult for junior developers to grok code. Often this reinforces seniority. "If only the seniors can work on a section of code, then they are senior". Likewise, there are so many books on crazy architectures and patterns. It's really neat to understand, but I've determined those books are pretty much self-serving.

----

I've found that my work is often far more limited by the domain/business logic than any sort of programming logic. I'll happily write code that looks really basic - because I know ANYBODY can come in and work with that code. If I write code that a junior needs to ask me questions like "what is this pattern?" or "what does this mean?", I've written bad code.

-----

With all that being said, every single job interview I've ever had expects me to write code at the level of complexity that my title will be at. They'd much rather see me build some sort of abstract/brittle concept than use some constants and switch statements. The former looks cool, the latter looks normal.


> I think there's a big cultural challenge with adopting duplication. It goes against most people's career growth objectives.

My experience is the complete opposite :D. What I've noticed is that the people who 'deliver' quickly (without much regard for what might be called code quality) and fulfill business requirements without much questioning are perceived as more valuable.

> I've found that my work is often far more limited by the domain/business logic than any sort of programming logic.

I broadly agree with this statement. However, just like a good carpenter knows how to properly build a bookcase, a table, a roof etc. a good developer should understand the programming logic and know how to apply it. Business requirements need to be fulfilled, but it's up to us to decide how to do that. More so, I think it's up to us to push back when we feel business requirements don't make sense from a technical point of view, or even from a business point of view.


I find statements such as this to be profoundly anti-intellectual. It suggests that we can't become better at what we do and need to be stuck at the level of "a beginner can understand that".

Now, I agree that simplicity is a virtue and that some people go overboard with crazy stuff just because they find it cool. But, as Rich Hickey says, there is a difference between simple and easy. If a junior dev doesn't understand "map", then we should explain to them what "map" is, instead of going back to writing everything with for loops.


In a large organization, the other thing you notice with trying to fix duplicated code is, if you take on refactoring it all, you are now responsible to make sure everything still works AND that you do not inhibit any future work. You are now responsible for more than you may have bargained for.

Coming up with the right abstraction takes some predicting of future use-cases. It's more than just refactoring work to put it all in one place.


I’ve seen this “hot take” a few times before and even seen developers that I would have considered very good agree with it. Consider that all code is computation; this is the point of a computer: to compute. Consider that abstraction doesn’t seem valuable -to you- for a multitude of reasons. Perhaps you’re using a flawed paradigm that emphasizes objects over computation. This would obviously mean abstraction -increases- the difficulty of reasoning about your code. Perhaps you don’t have a mental map of appropriate abstractions due to a lack of education or a knowledge gap; this could lead you down the path of creating abstractions which reduce duplicate characters or lines of text but are not logically sound (“leaky abstractions”).

All of these things come together in a modern “enterprise” software environment in just the right way such that abstraction starts to seem like a bad idea. Do not fall into this line of thinking. Study functional programming. Study algebraic structures. Eventually the computer science will start to make sense.


> prefer duplication over the wrong abstraction

Such strange advice.

If you're able to recognize the wrong abstraction right away, surely you would not use it, right?


I think the intent was to communicate that abstractions aren't always right.

Some people might think that because there's duplicate code, and the abstracted code maps to the duplicated code 1 to 1 and leads to fewer lines in total, it's a good abstraction, not realizing that there are costs to doing this that they may not be aware of.


The reason is that you won't know it's the wrong abstraction until it's time to modify it or add new features.


The main takeaway from the article is that abstractions which have become inadequate should be corrected (removed and/or replaced by adequate ones) as soon as possible. A corollary is that abstractions should be designed such that they can be replaced or removed without too much difficulty. A common problem in legacy code bases is not just that they contain many inadequate abstractions, but that the abstractions are entangled with each other such that changing one requires changing a dozen others. You start pulling at one end and eventually realize that it’s all one large Gordian knot. One thing that I learned the hard way over the years is to design abstractions as loosely coupled and as independent from each other as possible. Then it becomes more practical to replace them when needed.


I couldn't disagree more. There is no such thing as abstracting too early (this does not go for structural abstractions like factories, singletons, etc). The best code is code you don't have to read because of strong, well-named functional boundaries.


sometimes it's better to copy and paste some code only to make each copy diverge more and more over time (somewhat like a starting template) as opposed to introducing an abstraction to generalize some slightly different behaviors only to use said abstraction twice.

this makes even more sense when the code will live on in different programs

there's a point when incurring the cognitive overhead costs of the abstraction becomes worthwhile, probably after the 3rd time. but my point is that it's also important to consider that the abstraction introduces some coupling between the parts of the code.


I find it easier to read long functions of code than jumping around in helper functions or abstractions. Especially if I am not familiar with the code base and don't know common functions by heart.


> Re-introduce duplication by inlining the abstracted code back into every caller.

Ideally this type of workflow would be supported by the code editor. I've done this manually a few times and it's not fun.


Why not simply duplicate the abstraction, refactor as needed, and adjust the necessary caller(s)?

Having to know, find and maintain the individual duplications feels dirty and its own way wrong.

Choose your wrongs wisely?


Relevant post from earlier today https://news.ycombinator.com/item?id=23735991



I find that first comment particularly insightful.

However, I am not sure about the order of state and coupling. To me it seems to depend on the language, as for functional languages, avoiding state is king and in object oriented environments, coupling could be a more important factor.


One of the reasons deduplication gets used badly is that duplication is one of the easiest abstraction opportunities to recognize.

One of the ways I've seen DRY go horribly wrong involves reusable code units evolving into shared dependencies that often interdepend in complex ways. Unfortunately, the problems of such a system are observed much later than the original code duplication and fewer people have the experience to see it coming.


Sandi mentions this during a talk she gave on refactoring a few years ago. [0]

It’s a great little video for showing junior developers how a messy bit of code can be cleaned up with a few well chosen OOP patterns (and a set of unit tests to cover your ass).

[0] https://youtu.be/8bZh5LMaSmE


I want to thank everyone here, I’ve been stuck for about a week now on an issue that is entirely germane to this topic and the whole conversation here really helped me flesh out what was wrong and allowed me to understand a path forward. I’m honestly holding myself back from popping onto my computer right now to start working on it.


"With C you can shoot your own foot. With C++ you can blow your own leg off". I feel the same is true here.

The abstraction may be right at the time of writing, yet further on it often becomes not only wrong, but a massive hindrance.

With time and effort, hacky code can be worked into shape. An eventual wrong abstraction normally means a rewrite.


I wish this article was available two years ago when I tried to explain this to a bunch of juniors working for me...


“ Posted on January 20, 2016 by Sandi Metz.”


Damn, I wish I saw it back then :)


This has been one of the hardest-fought lessons I've learned in my programming career, but also one of the most fruitful. I aim to make my abstractions too late rather than too early. My rule of thumb tends to be: copy things six to seven times before you try to build an abstraction for them.


I think one of the cool things about pattern matching, or a language (in my case, it's Elixir) that supports it, is that we can have the same method with different argument signatures. So we don't have to duplicate or inherit anything and can still share some common methods.


Really this is stating the obvious.

The problem at steps 6, 7, and 8 is a social and economic one. Having the time, resources, and skill to do a job properly is very important. But there are social and economic pressures to "just get it done".

This is a specific formulation of a general problem.


Rob Pike discusses similar points in this section of his talk on Go Proverbs https://www.youtube.com/watch?v=PAAkCSZUG1c&t=9m28s.


I'd rather ctrl-f and change code in multiple places than deal with abstraction hell.


This again?? ;)

I love this post. A lot of wasted hours were spent in the past trying to use abstractions that no longer made sense, but Sandi encouraged me to go back and rethink a lot of that and now my code is way easier to read. Thanks Sandi!


Programmer B in Step 6 should have used SOLID and refactored to extend the module (or something similar).

This is a strawman argument which has little to do with the "wrong" abstraction and everything to do with poor design choices.


Reminds me of this discussion: https://news.ycombinator.com/item?id=12120752 (John Carmack on inlined code).



What are people's recommendations on books on how and when to create the right abstractions?

Last year I read Zach Tellman's _Elements of Clojure_ and really loved the parts that touched on the subject of abstraction.


Early deduplication is the equivalent of early optimization: a bad idea that boxes you in.

Duplicate code is a sign that there could be a generalization missing.



I think the term "wrong" causes all the misunderstandings.

It sounds like the abstraction was wrong in the first place.

Can it be called "rotten" abstraction?


> they alter the code to take a parameter, and then add logic to conditionally do the right thing based on the value of that parameter

But that's a textbook example of bad code, competent coders don't do this.

Update: for example see Thinking Forth chapter "Factoring Techniques", around the tip "Don’t pass control flags downward.". Page 174 in the onscreen PDF downloadable from sourceforge.

And there is no need for duplication: the bigger function can be split into several parts so that, instead of one call with a flag, each caller calls the set of smaller functions it needs.
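A minimal sketch of that split (hypothetical names, not from the book): expose the smaller steps and let each caller compose the ones it needs, rather than passing a flag downward.

    # Flag-driven version: control flow is decided deep inside the callee.
    def close_account_flag(account, do_notify):
        account["open"] = False
        if do_notify:
            print("notifying", account["owner"])

    # Split version: smaller functions, callers pick the set they need.
    def close_account(account):
        account["open"] = False

    def notify_owner(account):
        print("notifying", account["owner"])

    acct = {"owner": "alice", "open": True}

    # A caller that needs both steps composes them explicitly:
    close_account(acct)
    notify_owner(acct)

    # A caller that only needs the state change simply omits the second call.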


> that's a textbook example of bad code, competent coders don't do this.

That's reductive and dismissive.

There's a ton of subtlety in even defining the terms for that "best practice". What counts as a control flag versus a necessary choice that must be made by callers? Are you still passing control flags if you combine them into a settings object? What if you use a builder pattern to configure flags before invoking the business logic--is that better/worse/the same? What if you capture settings inside a closure and pass that around as a callback? How far "downward" is too far? How far is not far enough (e.g. all callers are inlining every decision point)?

The answer to all of those is, of course, "it depends on a lot of things".

And that's before you even get into the reality (which a sibling comment pointed out) that even if we grant that this is inherently bad code, that doesn't imply anything about the competence of the coder--some folks aren't put in positions where they can do a good job.

Unrelated aside: Thinking Forth is an excellent book! Easy to jump into/out of in a "bite size" way, applicable to all sorts of programming, not just Forth programming.


Competent coders do suboptimal things all the time, especially when there is delivery pressure; competent doesn't mean infallible or perfect.

There's also not a clear boundary between what is a single appropriate abstraction and two (or N) distinct but superficially related concepts.


There should be a tool to re-inline code from an abstraction.


Mods, this article is old; it should be labeled 2016.


“Premature optimization is the root of all evil”


A manager once asked me: please reuse as much code as you possibly can.

This reminded me of that.


I’m not sure why this is #1... but since it is, both of these - duplication and wrong abstractions - are otherwise known as technical debt.


Not necessarily. Technical debt is when you do something quick and dirty to get a feature out in the short term, knowing that it won't be maintainable, scalable, etc., but you do it anyway with the expectation that you'll fix it later. Some duplication and wrong abstractions are caused by this, but definitely not all.


No, technical debt is a very general category that includes deliberate hacks, structural flaws, and bugs from small mistakes. It's anything that, over time, will damage the code base, with duplication and wrong abstractions very much included.


You're welcome to your own definitions, but personally I keep bitrot, deferred maintenance, and "structural flaws" (which can be subjective and dependent on use cases and scale) out of the bucket of technical debt, since including them robs the metaphor of a defining aspect: intentionality. Debt is not something that happens passively as the world changes around you; it's something you sign up for.


If you unintentionally destroy property and have to pay for it, you’re in debt.

We even have a concept of life debt.

Some debt is intentional, some incidental.

Most technical debt I’ve seen was not intentional, just a well-meaning design that was created to serve a purpose that eventually outgrew it, and that’s when the interest started to pile up.

And happening passively is exactly what debt does: interest rates change, and your ability to make down payments changes. All part of the very well-functioning metaphor in this context.



