> Instead of our code being architected around the concept of how pizzas are made in the abstract, its architecture is tightly coupled to the specific needs of these two pizzas that we happened to be dealing with. The chance that we will be putting this code back the way it was is extremely high.
Mistake 1: Switch from DRY to premature optimization.
> You might think that legit reasonable developers would not actually do something like this and would instead go back to the existing invocations and modify them to get a nice solution, but I've seen this happen all over the place.
Mistake 2: Assumption of incompetence to support your argument.
> As soon as we start the thought process of thinking how to avoid a copy paste and refactor instead, we are losing the complexity battle.
Mistake 3: Strawman argument. DRY does NOT lead to over-complicating things.
Overcomplicating things leads to overcomplicating things.
Now, I wasted 5 minutes, so you can waste some more to reply to this comment, instead of completely ignoring this dumb random blog post.
I don't read coding opinion articles like OP but I like to check out comments.
> DRY does NOT lead to over-complicating things.
That is not true. I dive around foreign code bases a lot and dry-ness is actually a significant complicating factor in understanding code, because you're jumping around a lot (as in physically to different files or just a few screens away in the same file). As in, inherently every time it's used, not just in situations where it's used in a complicated way.
This sounds dumb but it just simply is much harder to keep context about what's going on around if you can't refer back to it because it's on the same screen or one short mouse scroll above or below your current screen.
That obviously doesn't mean you should leave copy pasted versions of the same code over your code base. But it's important to treat refactoring that code into something common that gets called from multiple places not as something you get for free, but as an active trade-off which you usually have to make to prevent bugs (changing one code location and not the other) or simple code bloat. In practice this is very relevant when you suspect something might be repeated in the future, but you're not sure. Imo: Just don't factor it out into anything, leave it there, in place, in the code.
What does `make_pizza()` do? It could be a lot or it could be a little. It could have side-effects or not. Now I have to read another function to understand it, rather than easily skimming the ~four lines of code that I would have to repeat.
I think the article fails to show particularly problematic examples of DRY. E.g. merging two ~similar functions and adding a conditional for the non-shared codepaths. shudders
> What does `make_pizza()` do? It could be a lot or it could be a little. It could have side-effects or not. Now I have to read another function to understand it, rather than easily skimming the ~four lines of code that I would have to repeat.
This is not a problem of DRY. This is a problem of wrong abstraction and naming. If the function is just four lines, it could easily be named `make_and_cook_pizza`. In the alternative scenario where those four lines are copy pasted all over the place, one is never sure if they are exactly the same or have little tweaks in one instance or the other. Therefore, one has to be careful of the details, which is much harder than navigating to the function definition, because in this case you cannot navigate to the other instances of the code.
Exactly this. I fixed a problem like this a week ago. I found some duplicated code, factored it out into one place by introducing an abstract base class (Python) and in the process discovered one of the duplicated methods had a logic error leading to returning a slightly smaller integer result.
The code had test coverage, but the test confirmed that it produced the wrong result. I had to fix the test too.
in a sense yes, in a sense no. if you see a function and know its sort of black box properties and its inputs and outputs are well defined, you really don't need to care. however, that applies whether the code is in an external function/module or physically inlined into your code. the sectioning off into separate code is then there to forcefully tell the reader "don't even try to care about the implementation details of this", so in practice your point still applies.
however... real software doesn't work like this. the abstractions that work that way exist for a select few very well understood problems where a consensus has developed long before you're looking at any code.
math libraries would be a typical example. you really don't need to know how two matrices are multiplied if you know the sort of black box properties of a matrix.
but the minute functions, classes, and other ways of abstracting code in a DRY way that you encounter constantly in everyday code, even if they are functionally actually well abstracted (meaning it does an isolated job and its inputs and outputs are well defined), even for simple problems, are typically complex enough that learning their abstract properties can be the same level of difficulty and time investment as learning the implementation itself. on top of practical factors like lack of documentation.
this is also why DRYness as a complicating factor really doesn't factor in once the abstracted code does something so complex that there is no way you could even attempt to understand it in a reasonable amount of time. like implementing a complex algorithm, or simply just doing something that touches too many lines of code. in this case you are left to study the abstract properties of that function or module anyways.
which is much more understandable than eight lines of regex magic where you don't even know what the regex is doing.
The problem of not knowing what it does or whether it has side effects or not is more a problem of naming and documentation than DRY. Even then, it's still better than repeating the code all over, simply because when you read and understand the function once, you don't need to go back. On the other hand, if the code is all over, you need to read it again to recognize it's the same piece of code.
if those 8 lines of regex have been unit tested and the function is commented to describe "what" the code does, it is entirely the point that you don't need to understand how it works
additionally, the function should be stateless and have no side effects ;)
How do you test 8 lines of regex inside a function that does more things? And what's easier, to write and read the function name or copy-paste the lines with the comment (if the comment explaining what that piece does is even written, that is)?
Except that there should be only one way to make a filename from a string. Maybe some options like "allow_spaces" if needed but the point of DRY is not only to share code but to share algorithms.
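A minimal sketch of what "one way to make a filename from a string, plus an option" could look like. The name `make_filename`, the allowed character set, and the `"untitled"` fallback are assumptions made up for illustration; only the `allow_spaces` option comes from the comment above.

```python
import re


def make_filename(text: str, allow_spaces: bool = False) -> str:
    """Hypothetical helper: turn an arbitrary string into a safe file name."""
    # Drop anything that is not a letter, digit, space, dot, dash or underscore.
    cleaned = re.sub(r"[^A-Za-z0-9 ._-]", "", text).strip()
    if not allow_spaces:
        # Collapse each run of whitespace into a single underscore.
        cleaned = re.sub(r"\s+", "_", cleaned)
    # Fall back to a placeholder when nothing usable is left.
    return cleaned or "untitled"
```

Every caller that needs a file name goes through the same rule, so a change to the allowed characters happens in exactly one place.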
Yup. But I guess that typically happens in steps. So next DRY-programmer that comes along will add a cheezeFilledCrust boolean to that make_pizza function and so on. Every time it will seem more reasonable to add another boolean, because otherwise you have to remove the make_pizza function, and there would be SO MUCH CODE DUPLICATION.
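A rough sketch of where that tends to end up, a few booleans later. The flags `thin_crust` and `gluten_free` and the step strings are invented for illustration; `cheese_filled_crust` stands in for the boolean mentioned above.

```python
def make_pizza(toppings, thin_crust=False, cheese_filled_crust=False, gluten_free=False):
    """Hypothetical shared function after several rounds of 'just one more flag'."""
    steps = []
    if gluten_free:
        steps.append("mix gluten-free dough")
    else:
        steps.append("mix dough")
    if cheese_filled_crust:
        # A stuffed crust cannot be thin, so thin_crust is silently ignored here.
        steps.append("roll thick and stuff crust with cheese")
    elif thin_crust:
        steps.append("roll thin")
    else:
        steps.append("roll regular")
    steps.extend(f"add {t}" for t in toppings)
    steps.append("bake")
    return steps
```

Three booleans already mean eight possible combinations, and at least one of them (`thin_crust` together with `cheese_filled_crust`) quietly does something the caller did not ask for.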
I’ve seen this again and again in the field and I wholeheartedly agree with the sentiment in the OP. IMHO different code paths should only share code if there is good reason to believe that the code will be identical forever.
Now the next genius turns up and says that make pizza is at its core always an n-step domain process.
So now you've dumbed it down to an interface with a default implementation which calls the create_dough, add_toppings, bake_pizza interfaces in order, each of which are either passed in callbacks or discovered through reflection.
We can even sprinkle in some custom DSL to "abstract away" common steps like putting the product into the oven correctly!
Jr's will never understand when, why and what is effectively executed at runtime. Honestly, at this point I enjoy working with this kind of code. It's always such a high entertainment value and I get paid by the hour, so whatever
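For illustration, a minimal sketch of the callback-driven version described above. The `Step` alias, the `log` list and the lambdas are made up for the example; only the three step names come from the comment.

```python
from typing import Callable, List

# Each step receives the running log and appends whatever it did.
Step = Callable[[List[str]], None]


def make_pizza(create_dough: Step, add_toppings: Step, bake_pizza: Step) -> List[str]:
    """Hypothetical 'n-step process': runs whatever callbacks it is handed, in order."""
    log: List[str] = []
    for step in (create_dough, add_toppings, bake_pizza):
        step(log)
    return log


# The caller now supplies all of the actual behaviour.
margherita = make_pizza(
    lambda log: log.append("plain dough"),
    lambda log: log.append("tomato, mozzarella"),
    lambda log: log.append("bake 8 min"),
)
```

Reading `make_pizza` alone says nothing about what ends up in the oven; the behaviour lives at the call sites, which is roughly the complaint.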
The strength of the reaction made me realize just how widespread and intractable the "wrong abstraction" problem is. I started asking questions and came to see the following pattern:
1. Programmer A sees duplication.
2. Programmer A extracts duplication and gives it a name.
This creates a new abstraction. It could be a new method, or perhaps even a new class.
3. Programmer A replaces the duplication with the new abstraction.
Ah, the code is perfect. Programmer A trots happily away.
4. Time passes.
5. A new requirement appears for which the current abstraction is almost perfect.
6. Programmer B gets tasked to implement this requirement.
Programmer B feels honor-bound to retain the existing abstraction, but since it isn't exactly the same for every case, they alter the code to take a parameter, and then add logic to conditionally do the right thing based on the value of that parameter. What was once a universal abstraction now behaves differently for different cases.
7. Another new requirement arrives.
Programmer X.
Another additional parameter.
Another new conditional.
Loop until code becomes incomprehensible.
8. You appear in the story about here, and your life takes a dramatic turn for the worse.
Existing code exerts a powerful influence. Its very presence argues that it is both correct and necessary. We know that code represents effort expended, and we are very motivated to preserve the value of this effort. And, unfortunately, the sad truth is that the more complicated and incomprehensible the code, i.e. the deeper the investment in creating it, the more we feel pressure to retain it (the "sunk cost fallacy"). It's as if our unconscious tells us "Goodness, that's so confusing, it must have taken ages to get right. Surely it's really, really important. It would be a sin to let all that effort go to waste."
Not really a sunk cost fallacy. Existing code needs to be maintained. Some of it should be deleted since it costs more to maintain. Some of it shouldn’t be deleted since it might bite you in the behind when you realize that all of that code was correct (although gnarly) and now you’ve introduced regressions. And which code is which? Hard to say.
Sunk cost (fallacy) is about making decisions based on things that you have already lost. But you haven’t lost or expended the code—the code is right there, and it’s hard to know if it’s more of an asset or a burden.
Some languages handle massive parameter lists better than others (e.g. with defaults). There are also design patterns for this type of problem (e.g. a PizzaBuilder).
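A small sketch of the builder idea, assuming a hypothetical `Pizza` value type; the method names and the default `"regular"` crust are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pizza:
    crust: str
    toppings: Tuple[str, ...]


class PizzaBuilder:
    """Hypothetical builder: callers name only the options they care about."""

    def __init__(self) -> None:
        self._crust = "regular"
        self._toppings: List[str] = []

    def crust(self, kind: str) -> "PizzaBuilder":
        self._crust = kind
        return self  # returning self allows chaining

    def topping(self, name: str) -> "PizzaBuilder":
        self._toppings.append(name)
        return self

    def build(self) -> Pizza:
        return Pizza(self._crust, tuple(self._toppings))
```

Usage would read as `PizzaBuilder().crust("thin").topping("ham").build()`. It keeps call sites readable, though it does not by itself stop the option list from growing.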
But that is usually a problem with abstraction, rather than a problem with a method call. If I can trust what make_pizza does, that is much faster to read than any four lines of code.
A functional style certainly helps. I get the pizza in my hand and don’t have to worry that anyone left the oven on.
You can't, unless it's in a standard library or a core dependency used by millions of people.
That's one of the reasons why functional code is generally easier to read. A lambda defined a few lines above whatever you're reading gives you the implementation details right there while still abstracting away duplicate code. It's the best of both worlds. People whose idea of "functional programming" is to import 30 external functions into a file and compose them into an abstract algorithm somewhere other than where they're defined write code that's just as shitty and unreadable as most Java code.
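A tiny sketch of the "lambda a few lines above" idea. The function `order_summary` and the order dictionaries are hypothetical, and the sketch assumes a non-empty list of orders.

```python
def order_summary(orders):
    # Local helper: defined right next to its only two call sites,
    # so the implementation is visible without leaving the function.
    label = lambda o: f"{o['size']} {o['name']}".title()

    first, *rest = orders
    if rest:
        return f"{label(first)} (+{len(rest)} more)"
    return label(first)
```

The formatting rule is written once, yet a reader never has to jump to another file to see what it does.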
>> You can't, unless it's in a standard library or a core dependency used by millions of people.
You can if you have reasonably competent colleagues. And if you do make some wrong assumptions about what a certain method does, it should be caught by your tests.
I feel that people that insist on reading and understanding all the code, and write code that has to be read fully to be possible to understand what it does, have missed something quite fundamental about software development.
Thanks - I like this point. I think it's probably a better illustration of what I'm trying to say in my third point. Devs are biased towards adapting existing shared code so we end up with shared libraries picking up little implementation details from each of their consumers and ultimately becoming very messy.
> I think the article fails to show particularly problematic examples of DRY. E.g. merging two ~similar functions and adding a conditional for the non-shared codepaths. shudders
Not a problem of DRY, but bad code structure.
Just keep the two functions and pull the shared code-path out
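Roughly what that looks like, as a sketch with made-up pizza functions: the shared path moves into a helper (here the hypothetical `_base`), while each caller keeps its own non-shared steps instead of hiding them behind a conditional.

```python
def _base(toppings):
    """Shared code path: identical for both pizzas."""
    return ["mix dough", "roll out", *[f"add {t}" for t in toppings]]


def make_margherita():
    return _base(["tomato", "mozzarella"]) + ["bake 8 min"]


def make_calzone():
    # The calzone-only step stays here rather than becoming an `if` in _base.
    return _base(["ham", "ricotta"]) + ["fold in half", "bake 12 min"]
```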
Not all the time. When the similar code mixes types and the common codepaths are sprinkled multiple times over it you can either have the code there twice, or have an overcomplicated templated common function.
In these cases factorizing may or may not be a good idea.
I think it's just that for every complex topic, any general rule will break down at some point. That doesn't tell you that the rule is bad, but to learn how to tell when you're dealing with such an exception.
DRY makes it harder to actually understand the system as a whole in some sense, since it usually means some indirection has been added to the program. However, it avoids the one thing that actually makes me pull hair out: code that looks the same because it was duplicated but is just different enough to trip you up because each area it was used required minor syntax changes that had major implications for the result.
Repetition also makes it harder to understand a system: not only do you have to read more, you also need to remember and compare repeating fragments that may be identical or just similar.
What makes it easier to understand a system is simplicity. I'd argue that DRY, deployed with a right strategic plan, usually does more to simplify things than does copy-paste.
But, as any tool, DRY is but a tool; to be useful it requires some skill.
Indeed. In any practical optimisation problem, which is fundamentally what all engineering is, there's a sweet spot.
You can't just slam the DRYness knob to 11 and expect it to always be better, any more than you can turn a reflow oven up to 900°C and expect it to be better, just because 380°C is better, for the specific PCB in question, than 250°C.
It also doesn't mean you can turn it off entirely, just as if you look at your charred results at 900°C you don't conclude that "heaters considered harmful".
Also, the problem is strongly multivariate and the many variables are not independent so the "right" setting for the DRYness knob is not necessarily the same depending on all sorts of things, technical and not, up to and including "what are we even trying to achieve?"
> I dive around foreign code bases a lot and dry-ness is actually a significant complicating factor in understanding code, because you're jumping around a lot (as in physically to different files or just a few screens away in the same file).
I can't agree more. Also, "code reuse" makes debugging significantly harder when trying to reverse engineer some code base. The breakpoints or printf:s get triggered by other code paths etc. And you need to traverse stack frames to get a clue what is going on.
Extra bonus points for fancy reflection so that you have no clue what is going on.
I can't disagree more. DRY forces you to create pure reusable code, and split your code into small pieces. When I read such code I need to understand just a few pieces.
You need multiple cases of duplication (repeating yourself) before you can infer a reusable piece of code.
If you make everything as generic and reusable as possible from the beginning, you'll end up with messy code that has way too many options to set for every simple operation.
> This sounds dumb but it just simply is much harder to keep context about what's going on around if you can't refer back to it because it's on the same screen or one short mouse scroll above or below your current screen.
To note, a common effect of not DRYing functions is an increase in local code length.
In many code bases that lived long enough, that means screens and screens of functions inside the module/class files. It is still easier to navigate than between many files, but not by that much in practice (back/forth keyboard shortcuts go a long way to alleviate this type of pain)
That's a fair point. However, I think this is more of a "verbose vs elegant" argument. Yes, DRY should not be a religion - I will write more code, possibly duplicated, if I deem it necessary for the code to be more readable that way. It's a judgement call, but I think the basic concept of DRY should still stand. If you find yourself cutting and pasting too much, stop, go get a coffee, take a walk, come back, and see how you can do it better.
> Mistake 1: Switch from DRY to premature optimization.
Mistake 1a: Conflating the term "premature optimization" - it doesn't apply here. Premature optimization is about runtime performance, DRY is about optimising maintenance overhead.
Mistake 1b: (good) DRY can't be done early (it's a continuous process throughout project development).
> Mistake 2: Assumption of incompetence to support your argument.
Mistake 2: Assuming you're never working in teams leveraging peers of varying experience and technical focus.
The presumption of re-usability is absolutely the most common red flag I've seen with DRY: I've seen it with a lot of very senior / experienced devs. You can call them incompetent, but there's plenty of them and we have to work with them. Articles like this help.
> Mistake 3: Strawman argument. DRY does NOT lead to over-complicating things. Overcomplicating things leads to overcomplicating things.
This statement concerns me. DRY very obviously and demonstrably leads to over-complicating things (excessive / ballooning parametrisation is just one of many very simple examples of this). If you can't see this I would have my own concerns about competence...
Thanks for having my back. #3 is an overwhelming real world phenomenon. In fact, I posted my article on reddit and someone wrote back a comment with a huge OOP solution that would mitigate all my problems. Not sure that reader got to point #3.
> Mistake 1: Switch from DRY to premature optimization.
"Premature optimization" is largely a bogus concept, because the meaning of "optimization" has shifted a lot since the concept was first created.
People now use optimization to mean "sensible design that does not needlessly waste resources".
In this meaning of optimization, "premature optimization" is a bogus concept.
You should absolutely ALWAYS write non-pessimized code by default.
What the original concept referred to is what people now call "micro optimizations". Sure, premature micro optimizations are often a waste of time. But this is irrelevant to the context of this discussion.
> In this meaning of optimization, "premature optimization" is a bogus concept.
The idea is that you can end up optimizing before you know the entire use-case, because software engineering isn't like building bridges or skyscrapers.
I'm a performance geek, but I love code I can easily change rather than code that is fast until some customers have touched it. Mostly out of experience with PMs with selection bias on who they get feedback from ("faster horses" or "wires we can hook phones to").
The first thing to optimize is how fast you can solve a new problem that you didn't think about - or as my guru said "the biggest performance improvement is when code goes from not working to working properly".
The other problem with highly optimized code is that it is often checked-in after all the optimizations, so the evolution of thinking is lost entirely. I'd love to see a working bit + 25 commits to optimize it rather than 1 squashed commit.
Optimized code that works usually doesn't suffer from this commentary so the biggest opponents I have with this are the most skilled people who write code with barely any bugs - I don't bother fighting them much, but the "fun" people to work with understand my point even if they write great code first time around.
These two are mostly why I talk to people about not prematurely optimizing things, because I end up "fixing" code written by 15 or more people which has performance issues after integration (or on first contact with customer).
The reasoning behind discouraging premature optimization makes no distinction between "micro optimizations" and any other kind, the purpose of this guidance is to minimize wasting time building unnecessarily complex solutions based on untested performance assumptions.
If you're writing an enterprise app and lean back in your chair and start to think about speeding things up with loop unrolling and AVX instruction sets then you're doing the premature optimization thing.
But trying to limit large nested loops is easy fruit that doesn't take much effort to pick.
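One common instance of that low-hanging fruit, sketched with invented function names: replacing a hidden nested scan with a set lookup. It assumes the items are hashable.

```python
def shared_toppings_nested(a, b):
    # O(len(a) * len(b)): `x in b` rescans the list b for every element of a.
    return [x for x in a if x in b]


def shared_toppings_set(a, b):
    # O(len(a) + len(b)) on average: build the lookup once, then probe it.
    lookup = set(b)
    return [x for x in a if x in lookup]
```

Both return the same result in the same order; the second just avoids the quadratic behaviour without making the code any harder to read.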
the typo is "non-pessimized code", which should be "non-optimized code".
I see humor in thinking if my code is pessimistic enough. Have I assumed that the edge cases will happen and worked around them? Do I expect (and handle) crashes, i/o failures, network timeouts, etc?
"code pessimism" could be an interesting metric.
The typo in the other post was "superb owl" which should have been "super bowl". Several people on that thread enjoyed the typo, including a comment from CostalCoder saying "Please, please do not correct that typo!"
I think they used that term on purpose. Non-pessimized in this case is the same as optimized, and I believe it's a reference to this video https://youtu.be/7YpFGkG-u1w
> Assumption of incompetence to support your argument.
Okay, but it kind of is about incompetence. And by “it” I mean everything. Look, we all remember that first time we all realized that adults are just winging it most of the time. Almost nobody knows what they are doing. Half the people who “know” actually know the least.
> DRY does NOT lead to over-complicating things.
Don’t Repeat Yourself is a terrible acronym because what it stands for is exactly the opposite of what people do. Not doing something is avoidance, opting out, like “don’t push your sister” vs “be nice to your sister”.
What most people do is they realize they have already repeated themselves, or someone else, and they rip it out. They deduplicate their code. Avoidance definitely can “lead” somewhere, but deduplication is active, and that can often be headed the wrong way, either directly or obliquely.
The Rule of Three is much clearer on this. You get one. There’s nothing to do when you see you’ve duplicated code - except to check if you’re the first or not.
> Mistake 3: Strawman argument. DRY does NOT lead to over-complicating things. Overcomplicating things leads to overcomplicating things.
Sure, I agree, except DRY is probably the second greatest gateway drug to overcomplicating things, after OOP. Actually, they really go hand-in-hand since OOP features are often used to DRY things.
DRY can easily go too far because fundamentally it's about centralizing ideas with the premise that different operations can and should share units, even though a "writeSomeFileToDisk" function doesn't necessarily have to do the exact same thing between different higher-level operations. Because so many engineers emphasize "elegance", if a set of functions seem similar enough, they pressure themselves to write code that is shareable, hence more abstract. Abstractions are inherently more complicated and hard to understand, not the other way around. Rather than having very simple "molecules" of code that can be understood on their own, there is instead a much larger molecule of nodes that are connected by abstract dependencies, and those nodes may only have dependencies in common.
DRY should be done sensibly, but teaching DRY is a problem in our industry because we don't teach engineering discipline. We teach principles like DRY and OOP, and even YAGNI as if they are tenets of a religion.
> Overcomplicating things leads to overcomplicating things.
This would be the most efficient title, subtitle, and entire contents of most posts about programming principles.
However, each reader has to have a similar enough perspective, background, and experience to understand and apply it. In that sense, the trend line measuring the value of commenting about comments about random blog posts indeed indicates wasted time, but hopefully it's a local minimum.
My pithy corollary to your helpful tautology is a quote from Tommy Angelo that's stuck with me since my poker days: "The decisions that trouble us most are the ones that matter least."
Decisions are necessarily difficult to make when the expected values of the outcomes are similar. We waste an awful lot of time on choices that could have been made just as well with a coin flip.
So there you go world: two quotes that are generally useful about generalities that are locked, loaded, and ready to shoot you in the foot when misapplied.
What's funny is that DRY was first popularised in the Pragmatic Programmer[0] book, and "coincidental" duplication is explicitly addressed right there on page 34, "not all code duplication is knowledge duplication... the code is the same but the knowledge is different... that's a coincidence, not a duplication."
Article tl;dr: Design is hard and can't be boiled down into applying pithy rules mindlessly.
For what it's worth, I agree with your points and disagree with the various counterpoints that were posted; "optimization" can mean a lot of things, and I for one understand what you mean.
No really, you're absolutely on point. The post is not worth the time, the case against DRY is too weak.
Sounds like a kid complaining about pushing DRY in a direction that overcomplicated things for him because of himself and instead of improving himself he chose to attack "an uncomfortable principle".