DRY is an over-rated programming principle? (gordonc.bearblog.dev)
599 points by gcassie on July 7, 2022 | 486 comments



A better formulation of DRY is SPOT (Single Point Of Truth). Definitions (code, data) that represent the same "truth" (i.e., when one changes, all have to change to stay consistent) should be reduced to a single definition. For example, if there is a rule that pizzas need at least one topping, there should be only a single place where that condition is expressed, so that when the rule changes, it can't end up changed in one place but not the others. Another example is when fixing a bug: you don't want to have to fix it in multiple places (or, more likely, neglect to fix it in the other places).
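
A minimal TypeScript sketch of that idea (the rule and all names are just for illustration):

  // The topping rule lives in exactly one place; every caller
  // (order form, kitchen, API) goes through this predicate.
  const MIN_TOPPINGS = 1;

  function hasEnoughToppings(toppings: string[]): boolean {
    return toppings.length >= MIN_TOPPINGS;
  }

  const order = { toppings: ["mushroom"] };
  if (!hasEnoughToppings(order.toppings)) {
    console.log("a pizza needs at least one topping");
  }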


I agree with this, as it puts the emphasis on semantics rather than syntax and encourages focusing on intentionally similar code rather than unintentionally similar code.

A related principle is what I call code locality. Instruction locality is the grouping of related instructions so they can fit in the CPU's cache (can an inner loop fit entirely in cache?). Similarly for data locality. Code locality is for humans: being able to discover and remember related code. An example is those times you make an internal function because you have to, but it has a terrible abstraction (say, a one-off comparator for a sort); for comprehending the caller, it's best kept near where it's needed, within the same file, rather than in a separate file or in a dependency in another repo.

Applying code locality to SPOT, when you do need multiple sources of truth, keep them as close together as possible in the code.
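
For example (a TypeScript sketch with invented types), the one-off comparator lives right beside its only caller:

  type Order = { priority: number; createdAt: number };

  function nextOrders(orders: Order[]): Order[] {
    // A single-use comparator with no abstraction worth naming
    // globally; keeping it beside its only caller aids comprehension.
    const byUrgency = (a: Order, b: Order) =>
      b.priority - a.priority || a.createdAt - b.createdAt;
    return [...orders].sort(byUrgency);
  }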


I usually say: if things tend to change together, then they should be closer together.

This is also why using layers as your primary method of organization is usually a bad decision. You're taking things that change together and strewing them about in different top level directories.


On a similar note, tree/graph structures should be avoided versus lists unless there is a good reason. Flat is better than nested. A linear block of code is far easier to reason about than a network of function calls, or (heaven forbid) a class hierarchy.

Not that such tools don't have their place, but I've seen too much convoluted code that has broken simple things into little interconnected bits for no reason other than someone thinking "this is how you're supposed to write code" and having fun adding abstractions they Ain't Gonna Need.


I think this can be generalized even further to the "principle of least power": https://www.lihaoyi.com/post/StrategicScalaStylePrincipleofL...

The graph data structure, I believe, is the most generic of data structures, and it can be used to represent any other data structure. This arguably makes it both the most powerful data structure and, IMO, the one of last resort.


Yes, that's kind of a "fewer moving parts" way of looking at it.

To be clear, I (like the parent post) was using data structures as a metaphor for the organization of source code: "trees" being code that is conceptually like a branching flowchart with many separate nodes, and "lists" being lines of code kept together in one function/class/file (which the parent post points out has the benefit of "locality").


While it is a little easier to deal with flat data than tree data, the real test of this comes when you process the data.

Trees invite recursion, and as the logic grows, telling what is going on gets more and more difficult. While it's easier to process trees recursively, it is possible to iterate over them. It's easier to process lists iteratively, but it is possible to handle them recursively. Some people do, much to the detriment of all of their coworkers.

I'm trying to unwind some filthy recursive code on a side project at work. There were two bugs I knew about going in, and two more have been found since, while removing mutually recursive code in favor of self recursion, and then modifying that to iteration.

Iteration is much, much easier to scale to add additional concerns. Dramatically so when in a CD environment where a change or addition needs to be launched darkly. Lists can help, especially with how carefully you have to watch the code. They're not the only solution, they're just the easiest one to explain without offering a demonstration.


I'm not sure I follow... Can you provide an example? (junior dev here)

If I understand some of it correctly, I was contemplating this when I started writing functions for "single functional concepts" like, "check for X; return true or false", then called each of those functions sequentially in a single "run" function. Is that what you mean?

I found that approach much easier to test the functions and catch bugs, but your comment seems to go against that.


Disclaimer: it's mostly a personal thing.

Dividing things into methods mostly gives the benefit of naming and reuse. If you can skip the method's content and reasonably assume it works in some way, it can make reading easier.

If the reader instead feels inclined to read it, they now have to keep context of what they read, jump to the function (which is at a different position), then read the content of the function, then jump back while keeping the information of the function in mind. In bad cases, this can mean the reader has to keep flipping from A to B and back multiple times as they gain more understanding or forget an important detail.

The same thing happens with dividing things up in multiple classes. Don't need to read other classes? Great. Do need to read other classes? Now you have to change navigation to not only vertical, but horizontal as well (multiple files). Some people struggle with it, some people hate it, other people prefer it.

You're basically making a guess about what's best in this scenario, and there isn't a silver bullet. The extreme examples are easy to rationalize as bad. The cases closer to the middle, not so much.


My own understanding of the comment:

They're making a connection between non-jumping code and a list, and between jumping code (classes, functions, etc.) and a graph. A list can be read in sequence from beginning to end, or jumped into at any arbitrary point. Similarly, if I open a file and want to read code, I can jump to any arbitrary line and go forwards or backwards to see what the sequence of code is like. I can guarantee that line n happens before line n+1.

With functions, classes, modules, whatever, the actual code is located somewhere else. So for me to understand what the program is doing I have to trust that the function is doing what it says it's doing, or "jump" in the code to that section. My sequence is broken. Furthermore, because I am now physically in a new part of the code, my own internal "cache" is insufficient. It takes me some effort to understand this new piece of code and re-initialize the cache in my mind.

The overall thrust of "DRY is overrated" is that we often take DRY too literally: if I have repeated myself, I MUST find a way to remove that repetition, which typically means writing a function. Whereas previously the code could be read top to bottom, now I must jump somewhere else to understand it. The question isn't whether this is valuable, rather, whether it is overused.


Specifics might depend on the language, domain, and team (and individual preference), so it's hard to avoid being general.

I would say some junior devs can get too fixated on hierarchies and patterns and potential areas of code re-use, though; instead, they should try to write code that addresses core problems, and worry about creating more correct abstractions later. Just like premature optimization is the "root of all evil", the same goes for premature refactoring. This is the rule of YAGNI: You Ain't Gonna Need It. Don't write code for problems you think you might have at some indeterminate point in the future.

When it comes to testing, TDD adherents will disagree, but if you ask me it's overkill to test small private subroutines (or even going so far as to test individual lines of code). For example, if I have a hashing class, I'm just going to feed the hash's test vector into the class and call it done. I'm not going to split some bit of bit-shift rotation code off into a separate method and test only that; if there's a bug in that part, I'll find it fast enough without needing to give it its own unit test. That's what debuggers are for. All the unit test should tell me is whether I can be confident that the hashing part of my code is working and won't be the cause of any bugs up the line.

Obviously I'm not in the "tests first" camp; instead I write tests once a class is complete enough to have a clearly defined responsibility and I can test that those responsibilities are being fulfilled correctly.


I'll start out by saying I have some pretty strong positions opposing what it sounds to me like yours are.

>I would say some junior devs can get too fixated on hierarchies and patterns and potential areas of code re-use, though;

Agreed.

> instead, they should try to write code that addresses core problems,

Still with you, agreed.

> and worry about creating more correct abstractions later.

I don't quite agree. I agree you shouldn't spend too much time, but I think "don't worry about it" gets you a hodgepodge of spaghetti code and half-baked ideas that no one can maintain in the future.

One of the most important things someone can do when adding a feature, for instance, is to understand the surrounding code, its intentions, and how any abstractions it may have work, and then add their feature in a way that complements that and doesn't break backwards compatibility.

I'd go as far as arguing that only ever MVP'ing every story without a thought to design or abstraction is one of the major problems in industry alongside cargo-culting code maintenance rather than ever trying to form deep understanding of any meaningful part of the software.

> Just like premature optimization is the "root of all evil", the same goes for premature refactoring. This is the rule of YAGNI: You Ain't Gonna Need It. Don't write code for problems you think you might have at some indeterminate point in the future.

YAGNI is far too prescriptive and misses the point of programming that literate programming gets right:

Programming (like writing) is about communicating intent to other human beings in an understandable way and shuffling complexity around in the way it makes the most sense for:

- The typical reader, skimming to understand something else (needs to know: what does it do?)

- The feature adder (needs to know how it works, so they need a high level, then an easy way to understand the low level as needed)

- The deep reader (needs to be able to take in all of the code to deeply understand it; needs a straightforward path to get there)

What you describe sounds like no abstraction, just throwing all of the complexity in front of everyone's faces all at once. I can appreciate the habitability advantage of that, but I think that discarding the context of acquired domain knowledge as you work on issues is too great a cost.

In case it's not obvious, the words came to me for the point I'm trying to make: Domain knowledge you acquire while working on something should be encoded in sensible abstractions that others can uncover later on, peeling away more and more complex layers as needed.

> When it comes to testing, TDD adherents will disagree, but if you ask me it's overkill to test small private subroutines (or even going so far as to test individual lines of code). For example, if I have a hashing class, I'm just going to feed the hash's test vector into the class and call it done. I'm not going to split some bit of bit-shift rotation code off into a separate method and test only that; if there's a bug in that part, I'll find it fast enough without needing to give it its own unit test. That's what debuggers are for. All the unit test should tell me is whether I can be confident that the hashing part of my code is working and won't be the cause of any bugs up the line.

This view sounds like it may be a direct result of a YAGNI/avoid abstraction style to me actually. If you avoid abstracting or code-reuse quite a lot (or even don't spend enough energy on it), you lose one of the largest benefits of TDD:

regression testing

If nearly all of your functions are single use or don't cross module boundaries... then the value add of TDD's regression testing never really has a chance to multiply.

For OOP I feel like this would be reflected in terms of testing base objects the most or static methods. For functional code, it would just be shared functions.

> Obviously I'm not in the "tests first" camp; instead I write tests once a class is complete enough to have a clearly defined responsibility and I can test that those responsibilities are being fulfilled correctly

I'd argue you are testing in your head anyway. Visualizing, conceptualizing, and trying to shape the essence of the problem into something that makes sense.

The problem is sometimes our mental compilers/interpreters aren't perfect and the mistakes are reflected as kludges or tech debt in our code.


> then called each of those functions sequentially in a single "run" function

If a lot of conditions are checked sequentially, why not write down exactly that? The sequence itself may well be what a reader would like to know.

The one benefit of the added function calls would be that the definition order does not need to change even if there are future changes to the execution order. But that is also exactly what adds mental overhead for the reader.

If the names add context, then that's a perfect use for comments.


I like your terminology of code locality. Personally, I've always thought of it as the "ctrl+f" principle. If I'm reviewing a PR or looking at the source on GitHub, it's a lot easier if I can find definitions via "ctrl+f" without resorting to an IDE. Sure, it's probably best practice to check out code changes I'm reviewing and open them in an IDE, but that often isn't what happens in practice, and if I'm reading third-party code, configuring the IDE to understand the project might take a lot of effort. The stronger rule is that it should not be necessary to use a stronger tool than a project-wide find to look up references to some function/data structure.


A closely related rule of thumb for organising code is: "Things which change together, belong together."


Right, “point” can mean any notion of “close vicinity” whenever it’s not practical to reduce something down to literally the same single syntactic expression.


This is a pretty good description of the motivation for OOP.


Right, if the "truth" is encapsulated inside an Object, it is the single location for that truth.

If you want a "different truth" create another object. And never assume that the truths of different objects must be the same.

That is a difficult condition to achieve in practice; we often code based on what we know a method-result "must be, therefore it is ok to divide by it etc."


Especially for an object that has to have some consistency between its members, all the code that has to maintain that consistency (or that can break the consistency, same thing) should be in the same place - functions of that class. So if you ever see a class that has the data in an inconsistent state, you have a very small set of places to look for the culprit.
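
A tiny TypeScript sketch of that kind of encapsulation (the invariant here is invented for illustration):

  // Invariant: the balance never goes negative. Only these methods
  // can touch it, so an inconsistent state has few places to hide.
  class Account {
    private balance = 0;

    deposit(amount: number): void {
      if (amount <= 0) throw new Error("deposit must be positive");
      this.balance += amount;
    }

    withdraw(amount: number): void {
      if (amount <= 0 || amount > this.balance) {
        throw new Error("invalid withdrawal");
      }
      this.balance -= amount;
    }
  }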


Occasionally, it's not clear if a single point of truth is entirely appropriate, or even if it is it can lead to tiresome extra levels of abstraction.

In this case, I sometimes prefer a slightly different approach; let's call it CRAP (Cross Reference Against Protocol): instead of definitions effectively referring physically to the same point of truth, they are designed such that they simply cross-reference against a protocol and warn if there has been a deviation, giving the developer the opportunity to go back and correct as necessary, or otherwise to allow it and manually sever the link to the protocol if one is no longer desired.

This guards against SPOT's weakness (at the cost of some extra manual work / due diligence by the developer): accidentally enforcing future convergence in situations where the previous convergence was incidental/accidental and divergence should have been allowed to take place instead.


The CRAP that you’re referring to is really only solving for a very minor downside of SPOT (and DRY), and that’s the inconvenient connection of technically unrelated code. In my own experience it’s far simpler to disconnect two code paths (copy a function, change its signature, etc) than it is to connect them to achieve SPOT.

In your CRAP model it seems like you’d be relying on tests or assertions to verify the “protocol”, if I’m understanding that correctly. Which means finding all the places where something needs to be true and then testing for it. Seems like a lot of work and wouldn’t necessarily catch all those places.


I have a feeling the acronym won't help to push your idea for wide adoption.


It's always a good idea to use an acronym that people aren't embarrassed to say out loud, especially for extremely successful widely adopted award winning open source projects.

Just watch what happened when the OpenVDB project won the 2014 Academy's Scientific & Technical Achievement Award hosted by Margot Robbie and Miles Teller on February 7, 2015 at the Beverly Wilshire.

https://www.youtube.com/watch?v=5FwOc4OSOR0

https://www.openvdb.org/


Can you elaborate on what the "protocol" looks like? Or can you give us an example?


Not sure if it fits within the CRAP idea, but sometimes I just write a test asserting that all duplicative definitions of X are equal.
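
For instance, something like this Jest-style check (the duplicated constant and module paths are hypothetical):

  // client/limits.ts and server/limits.ts deliberately duplicate a
  // value; this test is the cross-reference that flags divergence.
  import { MAX_TOPPINGS as clientMax } from "../client/limits";
  import { MAX_TOPPINGS as serverMax } from "../server/limits";

  test("duplicated topping limits stay in sync", () => {
    expect(clientMax).toBe(serverMax);
  });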


> Occasionally, it's not clear if a single point of truth is entirely appropriate

Feels similar to the monolith/microservice discussion in that it's mostly context sensitive. I think the term "programming principle" is misleading, these are tools with specific applications.


If you have a client side and a server side and conditions may be checked in only one place, this means that you can't check anything on the client side and must do a server call, or else implement both server side and client side in a single codebase. Not sure if this is always feasible. Add a DB to that, and it means you always have to check constraints at the DB level.


You could consider codegen from the single point of truth. Not always practical, but great when it's set up with good ergonomics.


Codegen is great for lots of sync between client and server (I just wish there were more standardized tools), but what if it's a complex algorithm that you need to be able to perform both client and server side, assuming they use different languages?


You've answered one of the easiest ways to do it: just use the same language. Node (and Deno/Bun) is a good way to maximize code sharing and single points of truth.

But I definitely understand not everyone wants to write all of their backend in JS (or even TS). (Node ORMs sometimes don't have the polish of their cousins in other languages, for instance.)

There are great opportunities here for language mixing, however: write that complex algorithm in the same language for both frontend and backend, even if the rest of the codebases aren't in the same language.

1. V8/JavaScriptCore/SpiderMonkey are all very easy to rehost as "scripting languages" inside many languages people prefer to use on the backend. It's often easy to find wrappers to call JS scripts and marshal back out their results inside the backend language of your choice these days. You pay for transitions to/from JS to your primary backend language, of course, so it takes careful management to make sure these "business rules in JS scripts" aren't in any hotpaths, but a backend may have the cycles for that anyway.

2. WASM is offering more opportunities to bring some of your preferred backend language to the frontend. You probably don't want to write your whole frontend in WASM still today, as WASM still doesn't have the same DOM access, but using WASM for single point of truth business rules has gotten quite viable. You still pay for the startup costs and marshalling costs between JS and WASM, but in many cases that's all still less than network call. (I'm still skeptical of tools like Blazor for .NET, especially those building full frontends with it, but I definitely appreciate that reach of "write once" business logic for both client and server.)


You're assuming a web client. The example I had in mind was a mobile client, where we actually used a cross compiler to generate java byte code from swift to allow code sharing between iOS/Android clients, but that wasn't usable for the backends (which were C# mostly). Which is why I went with a server-precomputed lookup table in the cases the number of possible inputs made that feasible. If not, reg-exes for validation (with tooling to enable sharing between codebases) are a decent alternative in many cases. But I was curious what other options HN readers might have tried.


Option 1 applies to mobile frontends, too. In Swift you can always call JavaScriptCore to run a script. Android can use system-wide V8 or even bundle a smaller interpreter for JS.

(In fact, JS is the only "universal" scripting language in mobile due to Apple's fun restrictions that JavaScriptCore is the only JIT engine allowed on iOS. It really is the strongest option today for language to write things in if they absolutely need to be shared across all possible operating environments.)


That's where you need to consider practicality, and it degrades quickly. If it's fairly simple and compartmentalised code without external dependencies like services running on server, you could consider transpiling to target runtimes. But most likely at that stage you'll call a remote function through an API when you need that business logic on the client.


Sure, if performance isn't an issue (e.g. it might need to be done per keystroke). Small amounts of logic duplication like that can be annoying though, esp. when inevitably they end up not agreeing and users don't understand why they're being told their input is invalid. One option I've used is to have the server precompute all possible outputs for all possible inputs, it can work pretty well even with 10s of 1000s of entries.


One way to do this is to have a formal data schema as a separate artifact, which you then have as a dependency in your server and client projects, and generate the checks from the schema, as well as the SQL DDL.

But yes, if you have truly separate codebases it becomes more difficult, and protocols need to be quite stable and changes to them carefully managed.


> If you have client and server side and you have to check only in one place if conditions are met, this means that you cant check in client side anything and must do a server call, or implement both server side and client side in single codebase

Checks don't each have to embed their own definition of the knowledge, so multiple checks against the same rule aren't a violation. As a simple example, if you have a JSON Schema, that is the single source of truth for validation, and you can validate against it in 16 different places at different stages of processing without violating the principle that each piece of knowledge should be represented once in a system.
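
As a sketch of that, using the Ajv JSON Schema validator (the schema itself is illustrative):

  import Ajv from "ajv";

  // The schema is the single representation of the rule; every
  // validation site references it rather than restating it.
  const pizzaSchema = {
    type: "object",
    properties: { toppings: { type: "array", minItems: 1 } },
    required: ["toppings"],
  };

  const validate = new Ajv().compile(pizzaSchema);
  console.log(validate({ toppings: ["basil"] })); // true
  console.log(validate({ toppings: [] }));        // false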


Had not thought about this. Good point. Constraints as data, not as code, aka schema.


This is a huge advantage to using Node.js (or Deno) on the server. In a lot of my projects, I have a shared library that is used for data validation and constraints that is used on the frontend AND the backend. Makes validating data on both sides incredibly easy, and changing the library forces changes on the frontend and backend to match (enforced by automated tests)
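
Roughly like this (a sketch; the module layout and the particular rule are hypothetical):

  // shared/validation.ts -- imported by both the browser bundle and
  // the Node server, so the rule exists exactly once.
  export function isValidEmail(value: string): boolean {
    return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(value);
  }

  // client: reject the form before making the network call
  // server: reject the request before touching the database
  // both do: import { isValidEmail } from "./shared/validation";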


> this means that you cant check in client side anything and must do a server call, or implement both server side and client side in single codebase

Or you derive one of the implementations from the other, or both from a third one.

But yeah, heterogeneous environments have a tendency of creating unnecessary code duplication.


Partially agree. I think SPOT (I'd always heard it called single source of truth) is a more universally applicable paradigm than DRY. Having said that, the cost of creating dependency chains is often underestimated. Overly dogmatic adherence to SPOT/SST can lead you to make the wrong tradeoff on coupling two unrelated areas of your codebase to unify some trivial truth.

I'd also say there is a lot of nuance about what "truth" is (i.e. is a pizza crust/sauce/cheese an essential truth that should have a single source).

Some DRY definitions I read actually tie in SST but I think many devs don't bring that nuance to it.


> I think SPOT (I'd always heard it called single source of truth) is a more universally applicable paradigm than DRY.

“Every piece of knowledge must have a single, unambiguous, authoritative representation within a system” is the verbatim definition of DRY from when the DRY principle was first articulated.


Yeah there are a lot of definitions out there that are along these lines and they do hollow out my argument.

But why call it "Don't Repeat Yourself" if it actually means something somewhat more subtle than that? I firmly believe many junior developers don't grasp the nuance, and based on the comments I'm not the only one who thinks this. So if DRY is widely understood by developers to mean literally "don't repeat yourself" and nothing more, does it really matter how the formal definition phrases it?

In any event, if SPOT / SST and DRY do mean the exact same thing, I like SPOT / SST better because the names encode the essential concepts of the principle.


Unfortunately, words mean things and people will take names to mean what they say.


I think the rule should be "Try not to repeat yourself"

Rules are like alarms: they draw our attention to some peculiar condition, which gives us pause to think about whether it's kosher and, if not, why not.


The art of programming is finding the fit and exceptions to the rules. It's just, frankly, a lot easier to be dogmatic.

Someone says "never do this" or "always do that" and you can apply those rules with abandon (often leaving a maintenance nightmare in your wake).

There are no rules to programming.


I find it useful to think of it as forces pulling on the design, similar to physical forces acting on an object. There are forces that try to keep individual truths/knowledge and responsibilities in a singular place, there are forces that try to minimize abstractions, coupling, dependencies, and indirections, there are forces that try to maximize coherence and separation of concerns, and so on. It’s an essential part of the job of a software engineer to balance those adequately in the design of the software.


Right. There are also "forces" like management who want the project to be finished yesterday.

Another metaphor I like for programming is Chess. Any line you add to the program constrains its future development, becomes "weight" or "force" that pulls your development into some direction. Sometimes you have to sacrifice features like pawns. Sometimes you may sacrifice security, you may think it is secure enough. The outcome of this game is often a draw, or stalemate. And the same game can continue for years.


I think it's an easier pointer to the concept for people who aren't already familiar with it. It actually came up the other day when I was talking to a junior dev who'd written a benchmark by copy-pasting the entire test harness; they couldn't get their head around my explanations of why centralized responsibility is important (although maybe I was doing a bad job) but once I mentioned DRY the pieces seemed to click into place.


There's nothing wrong with "Don't Repeat Yourself" except if it applies to code rather than knowledge.

Any principle like this that is applied to code is wrong.


And "knowledge" is a far more applicable word than "truth."

Truth implies facts, knowledge implies understanding the meaning and associated course of actions.

This is the second time in as many days that I've read something purporting to go beyond some original. The one yesterday was "we need more than the four types of documentation." All the examples fit into the four types as originally defined.

In HR training, there was even an entire segment on the "Platinum rule" because the Golden rule isn't good enough. Yet anyone who works to understand the Golden rule to any depth knows that it encompasses every "enhancement" the Platinum rule intends without any of the side effects.

What kind of failure is occurring such that definitions don't function any more, I wonder?


Maybe, but SPOT is a more memorable acronym than SPOK (or whatever). It allows you to talk about the “SPOTs” where stuff is defined.

One could also use “SPOTify X” to mean “reducing X to a SPOT”. :)


How about "reduce coupling" in that function, "increase cohesion" in this other function. The DRY principle is intending to get you thinking of coupling and cohesion, which, when gotten backward dramatically increase complexity.

The mechanism to "SPOT that code out, bro!" can be applying varying techniques for reducing coupling, and tightening cohesion. A proper review on a merge request should be making more specific comments about how, not a hand-wave to say "DRY that sucker up."

One final comment: DRY is three characters, therefore obviously more efficient :p


I really like your SPOT better than SSoT, the acronym is so much more on point, really hits the "spot"


I like this.

It is very hard to find out whether a definition already exists in the codebase. This can lead to multiple definitions of the same thing, or of the same truth.

Anyone have a good way to deal with this?


Unfortunately the issue of lacking a single point of truth is exacerbated the more people who work on a project. I believe the issue in spreading around logic comes from not knowing the original intention, and asking the original authors is, IMO, the best way to fix something or add new features. Obviously knowing the original authors is not always possible, so I try to follow existing patterns.


If the codebase isn’t a total mess, one should be able to guess which components or code paths have to deal with a given truth by virtue of their purpose/function. Then one can investigate the code paths in question to find out where exactly the existing code is dealing with the respective thing.

It should be an automatic thought when implementing some logic to think about which other parts of the system need to be consistent with that logic, and then try to couple them in a way that will prevent them from inadvertently diverging and becoming inconsistent in the future.

In terms of software design, a more general way to think about this is that stuff that (necessarily) changes together (is strongly coupled) should be placed together (have high cohesion).


Some IDEs will warn you about similar blocks of code.


It's interesting to note that this principle doesn't just apply at a low level, e.g. code. It continues to add value when designing application architecture, or can be used to help refine features.


It also tends to shift the focus, at least in mentally framing the issue, from DRYing out implementations to DRYing out information. If you're thinking less about implementations (man, I wrote a fold by hand for this thing; maybe I should put all the folds into the same function?), it guards against the "overcomplications" people seem to dislike about DRY. You're thinking more about replacing repeated materializations of the same information with references to a source of truth.

That's why I'm pretty dogmatic in applying DRY to infrastructure as code, for example, as opposed to generically for every code base, because finding or depending on repeated identifiers that have to be the same is such a source of error here.


...we've just hit a DRY SPOT

</inevitable_joke>


One underappreciated hard bit of SPOT is knowing for sure that something does in fact represent the same truth. One of the pain points of the DRY/SPOT model occurs when a new use case arrives that breaks existing truths for certain subcomponents. It can be real painful to decouple things.

This is not a reason to avoid SPOT altogether, but one should think through that situation as part of the mental calculus on pros & cons.


It’s difficult to discuss this in the abstract, but one benefit of SPOT is that it tells you which code (the users of the SPOT) you have to consider when decoupling/refactoring. In contrast, when it’s decoupled in the first place, but actually represents the same truth, you may have no idea that the other instances exist.

Writing code such that it’s reasonably easy to decouple or recombine existing uses is mostly orthogonal, I think. Usually you can just duplicate whatever is at the SPOT when the need for more than one truth arises.


Also, these are principles, not compiler errors. These principles will compete, and it sometimes helps to have an order of precedence. The number one rule is KISS, and it would have trumped all the examples in the article.

Nit: the article’s problems also have an obvious solution—named parameters with default values when not provided.
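
In TypeScript terms, that would be an options object with defaults (a sketch; the pizza fields are invented, not the article's actual code):

  interface PizzaOptions {
    size?: "small" | "medium" | "large";
    cheese?: boolean;
    toppings?: string[];
  }

  function makePizza({
    size = "medium",
    cheese = true,
    toppings = [],
  }: PizzaOptions = {}) {
    return { size, cheese, toppings };
  }

  makePizza({ size: "large" });          // everything else defaulted
  makePizza({ toppings: ["mushroom"] }); // no duplicated boilerplate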


In total agreement. I might have a couple of the same snippets of code calculating something in a program and generally would not lose sleep over it (I'll still fix it when I have nothing else to do). But replicating something like the source of truth is a crime in my book.


What are some patterns that can be used to implement this?

I can think of using a rule engine, but I'm not sure there are any performant ones, and they don't seem to be used much.


Usually it just means that you have a function like `isValidPizzaToppings(ToppingsList)` somewhere that you call in multiple places, instead of having a condition `myToppings.count() >= 1` in multiple places. So, just normal functional abstraction.
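
Spelled out as a TypeScript sketch:

  // The rule lives here and only here:
  function isValidPizzaToppings(toppings: string[]): boolean {
    return toppings.length >= 1;
  }

  // Call sites never restate the rule:
  const toppings: string[] = [];
  if (!isValidPizzaToppings(toppings)) {
    console.log("a pizza needs at least one topping");
  }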


This is much better although I'd argue even it can be taken too far.


Any rule in engineering "can be taken too far" in the sense that there are times when other considerations will win. SPOT (which actually seems equivalent to the intent of DRY, although I recognize that that use might not be well represented in the memepool) is something that is probably always a win on its own terms, but in some cases introduces other costs that exceed the actual benefit. I think that's meaningfully different than the overly syntactic (mis?)interpretation of DRY, where it's true that things looking similar might hint that there's something to factor out, but sometimes factoring that thing out is a bad idea simply because it is actually two separate "pieces of knowledge" that just happen to be the same at the moment.


Every time I read an article like this, "why <often cited best practice> is overrated", I think: yeah, you are right in theory. But in most places I have worked, these best practices were not overused but underused. If you have the problem that your coworkers create unnecessary abstractions, I envy you, because I have so often had the opposite problem. Maybe this is not the case if you work in a great software development team. But if you work somewhere where they do software development on the side (science, hardware, etc.), it is the main issue.

People not able to factor out functions or structure their code in a readable way. Variables are called v1, v2, v3. Unit testing seen as a waste of time. CI seen as a fun toy. They lack the experience to even notice the difference.

Maybe I'm becoming a curmudgeon, but I think many people would be well served by just googling "<my programming language> best practices", learning the acronyms like DRY, and just following them. And when you have gained some experience, sure, then you should question the wisdom and not follow it blindly.


> People not able to factor out functions or structure their code in a readable way. Variables are called v1, v2, v3. Unit testing seen as a waste of time. CI seen as a fun toy. They lack the experience to even notice the difference.

Had a colleague work under a 'team lead'. Needed to take a form with a variable number of rows of input data - max 50 - and take the data, parse it, and store it. Took 20-30 lines of code. Next day: "I don't trust loops, these need to be unlooped". Really? This was all in writing and stated out loud in a meeting with witnesses, and everyone agreed. "Loops can be tricky - they don't always work like you think" (something like that). So a 30-line block of code with a loop around it became 1200+ lines with 'v1, v2, v3... v50', with 'ifs' around each one to check if that row number was also submitted.

The code to generate the form was, of course, a loop that spat out holders for 50 rows. THAT was OK, because someone else's team wrote that a while back (really??) and ... it was already done and in production. The lead could not put their stamp on it.

Very very very weird. Having half a dozen other people all nod their head suggesting that a 30 line loop is fraught with danger, and the correct answer is copy/paste 50 times. Felt like gaslighting, to my recollection. Worked in same dept, just not on same project together, but enough of this was heard/pickedup across the dept.

And... my colleague and I aren't there any more, and to my knowledge, that team lead is still there.


This is what I call "preloopsarian": a state of coding innocence in which one has discovered assignment and alternation, but not iteration.


Looks like straight out of https://thedailywtf.com/


It's been a while since I'd visited! Always amusing.


>Felt like gaslighting, to my recollection.

Sounds like a real-life example of the Asch experiments.

https://en.wikipedia.org/wiki/Asch_conformity_experiments


That's terrible.

I was in a situation in the early 2000s where the team that would maintain our application after we were gone (to another project or product or company) were not skilled enough to follow certain things, and we were asked to change a number of things to make it easier for them. In that case, the leader of their team was self-aware and honest and communicative, which is the rare and exotic thing, but we did have to re-architect some things and even change the programming language in one area to suit their capabilities. Sometimes that's a business need, and it matters.


At the very least it's a signal to hurriedly look for a new job, if not resign on the spot.


One time I needed to sort some data arbitrarily — the resulting order did not matter, it only mattered that it was the same for the same data in different orders.

My senior engineer advised me against using Java .sort() because “we didn’t write it so we couldn’t be sure it would do the same thing every time.”


To play the fun game of charitability, that engineer could have been talking about sort stability. Which could technically violate the property you want.

A quick search however does say that Java's .sort() is stable.


Sounds like something out of https://blog.codinghorror.com ! :)


I regularly see code of the form `if(x == false)` as the author has a distrust of `if(!x)`.

I guess the author just distrusts smaller things, leaving me to distrust the author’s larger things.


This is a completely different level of "issue" than the other problems in this thread. During code review I'd only mention it as a nit. The longer form is correct, and the only downside is that it's a bit longer. It doesn't mess up code modularity or affect maintainability in a noticeable way.

Well, assuming that the language doesn't have any quirks in this area - e.g. in Java your statements aren't equivalent for a Boolean x.


Nitpick (as in general you are right regarding java as well): I’m fairly sure they are the same for java in this instance. Both will convert Boolean to boolean, throwing an NPE if it was a null.


Ha, that's true, thanks! I guess my Java-fu is weak these days :)


They're not equivalent in many languages (JS, C++/swift with operator overloads, if x is nullable etc. etc.).
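
For example, in TypeScript/JavaScript:

  function report(x: boolean | undefined) {
    console.log(!x);          // true for both false and undefined
    console.log(x === false); // true only for the literal value false
  }

  report(false);     // !x -> true,  x === false -> true
  report(undefined); // !x -> true,  x === false -> false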


It's been a while since I absorbed the weird programming norm that "real programmers use the !x form!", but even after 10+ years of !x, I still find == false more readable.


I agree. To me, it's simpler to understand. Suppose x is a bool; reading the code, I say to myself "if not true..." or "if not false..." and my ape brain gets confused about what happens if it's not true or not false.

Reading "if true == false" or "if false == false", it becomes much clearer what we're testing here and I understand it instantly.


If the statement is "if (!isGreen)", it's much clearer to say "if is not green" than it is to say "if is green is false". Putting == true or == false makes you convert a clear statement "is green" into "true" or "false" instead of just being a natural English statement. It would be like saying in conversation, "I want to go to the store is false" instead of "I don't want to go to the store".


> If the statement is "if (!isGreen)", it's much clearer to say "if is not green" than it is to say "if is green is false"

I agree that when you read it, it's clearer. And yet I still prefer "if(isGreen == false)" for reasons of clarity in another sense.

The "!" being right next to the "(" makes it easier to miss the "!" when scanning quickly through the code, hence reading the logic the wrong way round and seeing "(isGreen" instead of "(!isGreen". And that's enough of a risk to ignore the readability advantage of "(!".

(Edit: To be clear, I don't suggest "== true" for the opposite cases, as the lack of a "!" in those means the risk is gone)


It also helps readability if the ! is before a function name that doesn't follow the right naming convention for it. One of my pet peeves in C is "if (!strcmp(a, b))". "!strcmp" I read as "not string compare", and I would expect it to mean that the strings don't compare, when it means the exact opposite. This is true of anything following the "0 means success, anything else is an error condition" error handling scheme. So I use "if (strcmp(a, b) == 0)" instead, because the "==" makes me look specifically at what value it's being compared to, and I make fewer assumptions.


Even if the not operator in the language you're writing in happens to be the actual word 'not' ?


My whole career has been C++ and shader languages, so this really hasn't come up for me. I imagine it being a real word would improve readability greatly.


Even more common seems to be

  if (x == true)
which always seems to come with some argument about how it is "more clear".

I have started to ask people straight away to change to

  if ((x == true) == true)
which following the same argument should be even more clear.


I write code like that sometimes. Comparing against false is more specific, depending on the language. There are many falsey things that are not false themselves.


In some languages these do different things, right? (Or if someone did something horrendous with operator overloading)


As much as people push the more succinct if(x)/if(!x) style of expression, I don't know that it is better. Now, if your example was if(is_a_thing), then maybe it reads better. Add to that the possibility of three-valued logic and I could lean more toward the if(x == false) style.


The worst one is "if x=true", which to me says the writer doesn't know what the if statement does...

That said, if x is data read from elsewhere that just happens to be boolean, I can write code like that in Python.


The UW intro CS courses call the `if (x)` form "Boolean Zen", which I've always enjoyed.


I've seen this in Ruby and Elixir that drives me a little nuts:

  if !is_nil(foo)


While it may not matter in many circumstances, this is not the same as "if foo", because false is not nil.


It's true but in 10+ years of writing Ruby it hasn't mattered in _any_ circumstances I've come across. I also assert that having a boolean where `nil` is meaningfully different than `false` is a smell and should be avoided.


Some of that might be Python habits, which are terrifying for completely different reasons.


!x is not equal to === false.


It's not usually needed (especially these days), but there are times when it is better to repeat every possible iteration by hand and not have a loop.

This is a technique called loop unrolling. It is done for performance reasons. This is something we used to do at a company working on games for the old feature phones (think Nokia 30/40/60 series stuff). The devices were very limited, there is no direct control over J2ME garbage collection, etc... so loops could very noticeably slow down games.

We initially wrote code with loops, then would performance test and manually unroll when it was necessary. Eventually this became very burdensome and we eventually... wrote code to unroll the loops for us and that code of course had loops in it because it was build code that wouldn't ship.

There are other performance situations where this technique applies.
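
For what it's worth, a toy TypeScript sketch of the shape of hand-unrolling (as noted, modern compilers and JITs mostly make this unnecessary):

  function sum(a: number[]): number {
    let s = 0;
    let i = 0;
    const n = a.length - (a.length % 4);
    // Unrolled body: four elements per iteration means fewer loop
    // counter updates and branches per element processed.
    for (; i < n; i += 4) {
      s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    }
    for (; i < a.length; i++) s += a[i]; // remainder
    return s;
  }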

This may not have been the situation there, but I think it's important that rather than assume stupidity from the outside that we try to ask why.


Loop unrolling seems like something that should be done by a compiler when you turn on aggressive optimization flags, and not something you need to code explicitly.


Now, that's probably true. However, in the days of feature phones, compilers' auto-optimizations were still inferior to a person unrolling a loop in ASM.


That is why I said it's usually not needed, "especially these days". This was not true at the time. Relying on primitive compilers, especially the likes of the J2ME garbage collection would lead to total freezes in a game from something as simple as a loop. As the garbage collector decides previous passes in the loop are no longer needed it can trigger a garbage collection sweep. With such a slow and limited device a garbage collection sweep would literally cause a game to freeze until the sweep completed which could take on the order of several seconds.

The loop unrolling was just one example of how we would go about preventing an undesirable sweep.

As another example a large global game object array would be created when a game was started. As objects were created and deleted they would really just update pre-created objects in that array.

This allowed us to prevent garbage collection, while simultaneously making sure we didn't run out of memory.

The Nokia 1618 (a series 40 device) has a heap limit of 1024KB. Many S20 and S30 devices were even further memory constrained.


> my colleague and I aren't there any more

The only right choice in that situation.


At one of my earliest jobs, in a previous century, if statements were introduced to the language (RPG-3).

I was delighted, but the old-timers were quite suspicious of this experimental technology.


That’s some did-I-wake-up-in-another-dimension shit. :)


Makes me think I should consider myself lucky that so much time has passed since last time when something reminded me of https://thedailywtf.com...


This is terrifying.


with people like that roaming around the landscape maybe there is a point to all that leetcoding nonsense, since they'll certainly spout off some truly insane stuff in interviews...


Which country was this in?


Elbonia perhaps?


US of A.

FWIW this was... 2005-ish.


In rails shops, "dry" is _always_ overused. It was part of the red->green cycle in TDD culture, and got mentioned in every context, and as a result nearly all legacy rails apps are filled with weird abstractions introduced in a commit with a message like "dry it up". The problem is that it's phrased as a rule rather than a smell.

DRY deserves to be listed with other code smells that may indicate a missing abstraction (and there are many), but there is never a reason to blindly extract chunks of code based purely on repetition. That's not a "first step toward good programming", it's a step in the wrong direction.


I've experienced this first-hand in a Rails app, last week in fact. The result was a messy, hard-to-understand hierarchy of classes and abstractions--just because two workflows shared some similarities. Usually I find this happening with hardcore OO programmers or bored programmers who feel the need to start creating and don't know when to stop. I prefer boring code at this stage in my career.


> just because two workflows shared some similarities. Usually I find this happening with hardcore OO programmers or bored programmers who feel the need to start creating and don't know when to stop. I prefer boring code at this stage in my career.

As a Haskeller, when I do this it's about getting certain guarantees about the semantics of related workflows and knowing they must behave the same in X, Y, and Z.

This aids in reasoning about inevitable production issues.

I find boring code easy to modify but hard to reason about from a higher level and that it typically requires nastier solutions to maintain backwards compatibility.

That last point is contradicted by this posts example though, so it has me reflecting on things.


Thanks; this is about what I was going to say but you said it better.

DRY is a tool, not a design goal. I think part of the Rails issue is a combination of the early Rails hype/philosophy, and the fact that DRY as a concept is so easy to "get", that everyone gets it but often fail the next step of "why". Without the why, you often can't figure out the right when/where so it gets applied everywhere.

It feels to me like another oblique angle of Goodhart's Law; eg: "A rule of thumb that becomes a required practice ceases to be a good rule of thumb".


But "early Rails hype/philosophy" is the only thing you've said specifically about Rails. I think I know what you mean though. There are strong philosophies in the Rails (and more generally ruby) communities. Opinionated coding, and software craftsmanship is lauded, more than in other language communities I think.

As a result, rules of thumb like DRY are drummed into new developers. The overall effect of that is probably better code quality in general, but yes probably more instances of them being over-used.

But if Rails has the problem of folks over-using DRY, what are other languages communities doing? Just not doing so well at telling people about DRY in the first place?


I sometimes use the term "mid-level engineer syndrome" for "too many levels of abstraction in the codebase". It is very common in my experience. And untangling it is usually harder than extracting common stuff from "dumb" code. I usually don't DRY things up until three repetitions. And in test code, I try not to DRY at all; copy-paste is a friend of readable and maintainable specs.


I have colleagues that take DRY to an extreme when it comes to tests. There are so many levels of abstractions that it's incomprehensible. Tests should be clear and readable, you shouldn't have to dig code to understand _what_ a test is doing.


Tests in particular SHOULDN'T be DRY, IMO. They need to be independently and quickly modifiable; they're meant to be quirky end-runs around your bespoke, artisanal architecture to get at all the interesting bits. Repetition there is fine.


I dunno. I think a few helper methods helps a lot in tests - like if you have a common operation in the setup portion. I don't think the code needs to be super abstract, but it shouldn't all be hand typed out.


A downside of this approach is that sometimes when you make a small implementation change, you need to rewrite 50 unit tests.


> copypaste is a friend of readable and mantainable specs

Another generic statement: copypaste (as I understand you mean the opposite of extracting common code) between specs goes against single responsibility. Rather than `setupUser()` you open a connection, create a user fixture, write it to the db, and then paste that across all the specs. Doing quite a lot.

I can imagine a spec with let's say 20 cases. Arrangement of each takes about 6 lines to load something, change some state the test subject depends on, the usual stuff, like in the above example.

A week from now, 10 cases need an extra line of setup, which you dutifully paste across the specs which require them. You put it somewhere in the middle, as it needs an id from the first step of 6.

This happens once or twice. The commonality of the original 6 copy pasted all over the place is hashed up, interspersed with calls specific to each test. The linking factor between those 6 lines is now obscured and requiring careful analysis if only those 6 need to change.

This can be avoided if you extract the common bits out early on. Rule of three is your friend if you don't want to rush it.
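
Concretely, something like this (a sketch; the fixture shape and DB interface are invented):

  interface User { name: string; active: boolean }
  interface Db { insert(table: string, row: unknown): Promise<void> }

  // The shared setup, extracted once the same lines show up in a
  // third spec; each test overrides only what it cares about.
  async function setupUser(db: Db, overrides: Partial<User> = {}): Promise<User> {
    const user: User = { name: "test-user", active: true, ...overrides };
    await db.insert("users", user);
    return user;
  }

  // in a spec: const user = await setupUser(db, { active: false });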


You're almost there, IMO. This internet stranger encourages you to ditch the rule of DRYing based on the number of repetitions at all, and instead think of DRYing based on what deserves to change to together. Sometimes a single repetition in code deserves to be DRY. Sometimes 10 repetitions don't deserve to be DRY.


Isn't that playing a little loose with definitions? if you have 10 "repetitions" that don't need to change simultaneously they aren't really repetitions in the first place. Just because a dumb analyzer says you have a 10 line block of identical code in 10 places doesn't mean it's actually identical.


You're restating the article's point. Naive DRY says make the dumb analyzer happy by abstracting out the coincidental similarities. If you work in a place that recognizes that this can be a red herring, then great, but a lot of developers and teams don't make room for that nuance. That is what the article is arguing against.


I'm a big proponent for the "rule of [at least] three" for building DRY abstractions: once is YAGNI (you aren't going to need it), twice is coincidence, three is finally a pattern emerging.


In my workplace, the issue isn't that junior developers consistently DRY too much or too little; instead, they make dramatic mistakes in both directions. However, code that repeats itself unnecessarily is way, way easier to fix than code that tangles itself up like the left pineapple example.


> But if you work somewhere where they do software development on the side (science, hardware, etc..) it is the main issue

I have seen this working in finance. I worked at a startup with a quant, and my job became waiting for him to go home at 6 so I could clean up his code and make it maintainable the next day. Now I don't blame him- his background was operations research and he was a professor prior, without any real software background. But, he would get legit mad at me for messing with his code. We had a really tense relationship for awhile. I would occasionally break things, but he didn't really have tests until much later, so it was hard to detect and there were often subtle side effects.

But anyway, I think for awhile he thought I was just a pain in the ass, until one day about 9 months after we started the project, and he wanted to run some experiments using a specific universe of securities, and just apply a few constraints to them, and I set this up for him in about 5 lines of code, and all of a sudden, I could just see the light bulb finally turn on for him as to why I was doing all of these things. After that day we became a lot more friendly.


Your problem isn’t DRY, your problem is that you don’t have people advocating for very basic best practices.

You can’t adopt DRY or SPOT or anything else if you aren’t free to refactor and you aren’t free to refactor without some tests.


In places I've worked, DRY is overused, or used poorly. I can't count the number of times you get "DRY" on a code review just because something appears twice. It's probably because it's the easiest one to spot.

> learning the acronyms like DRY, and just following them

Following DRY is the hard part. As the author points out, to DRY something up you need to pick the right level of abstraction. This requires experience to do well.


It seems like your team should check for DRYness with a static analyzer like PMD. Your reviewers have better things to look at.


This article is acting like DRY means "never repeat yourself in any circumstance". And it's easy to come up with counter-examples if that's your starting point.

While there are certainly some DRY proponents that treat it like that, most discussions I've seen advocating for DRY treat it as a rule of thumb. There are always exceptions, but DRY will steer you in the right direction more often than not.


> If you have the problem that your coworkers create unnecessary abstractions, I envy you

I'm curious what languages and frameworks you work with. I suspect this is something that varies from one programming subculture to another. In my experience, the only codebases where I've consistently seen underuse of abstraction have been PHP and C, and in those cases it has been because the code was written by entrepreneurs or engineers who were not software professionals. In Java and Scala I've seen plenty of code that used too much abstraction but almost no code that used too little. In C++ and Python I've seen it go both ways.

In my book, what programmers need first of all is the willingness to rewrite their code. I've been trying to hit the right balance for twenty years, but even now I frequently have to backtrack because the decision to add or not add a layer of abstraction turns out to be wrong. Biasing somebody towards or away from abstraction just changes the kinds of mistakes they make without making them better at cleaning them up.


I half agree here. Especially in Python applications, you so often see a class that is instantiated only once. I'm not sure if this is overuse of OOP or of DRY, but defining a whole class just to instantiate it once seems like a very verbose abstraction to me. There's nothing wrong with a few functions to group code together, leaving it at that.
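
For instance (a sketch, all names made up): the single-use class mostly adds ceremony, while the module-level function says the same thing.

    # Verbose: a class that only ever gets instantiated once.
    class OvenPreheater:
        def __init__(self, temperature):
            self.temperature = temperature

        def preheat(self):
            print(f"Preheating oven to {self.temperature} C")

    OvenPreheater(250).preheat()

    # Often sufficient: a plain function.
    def preheat_oven(temperature):
        print(f"Preheating oven to {temperature} C")

    preheat_oven(250)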


Every best practice can be overused - used where not needed, used where it doesn't fit, used in trivial cases. And if you've got people who haven't developed sound engineering judgment yet, they'll start by not using those practices. But once you convince them that they should, they'll use them everywhere, even the wrong places.

This is why best practices are needed, and why they are over-rated. Where they are used, they are often over-used.


> But most places I have worked, these best practices were not overused, but underused

For what it's worth, 90% of the code bases I've worked on overused DRY. Two pieces of code that do the same thing but that shouldn't be coupled should be repeated, otherwise you end up with a million "if" statements for all the separate cases this code will need to handle as it grows, as well as uncertainty about whether changes will have unintended consequences.


I am a CS graduate working at a top HW company and you could not have stated it more perfectly. The code I have to work with, handed to me by HW engineers, is pretty brutal. We're talking 1k+ lines of code with 5+ nested conditions on the regular. Of course, no unit tests. Blows my mind.


True, but it's still good and fun to discuss whether the best practices could be improved even further. And usually they can, because they are context-dependent. There are no absolute rules, except those imposed by the compiler.


Yep, it’s a general gripe with thought-pieces; advice is context dependent.

Or, put differently, my HN motto: don’t give uni-directional advice when optimizing a U-shaped error function.


> Instead of our code being architected around the concept of how pizzas are made in the abstract, its architecture is tightly coupled to the specific needs of these two pizzas that we happened to be dealing with. The chance that we will be putting this code back the way it was is extremely high.

Mistake 1: Switch from DRY to premature optimization.

> You might think that legit reasonable developers but would not actually do something like this and would instead go back to the existing invocations and modify them to get a nice solution, but I've seen this happen all over the place.

Mistake 2: Assumption of incompetence to support your argument.

> As soon as we start the thought process of thinking how to avoid a copy paste and refactor instead, we are losing the complexity battle.

Mistake 3: Strawman argument. DRY does NOT lead to over-complicating things. Overcomplicating things leads to overcomplicating things.

Now, I wasted 5 minutes, so you can waste some more replying to this comment, instead of completely ignoring this dumb random blog post.


I don't read coding opinion articles like OP but I like to check out comments.

> DRY does NOT lead to over-complicating things.

That is not true. I dive around foreign code bases a lot, and DRY-ness is actually a significant complicating factor in understanding code, because you're jumping around a lot (as in physically, to different files or just a few screens away in the same file). As in, inherently, every time it's used, not just in situations where it's used in a complicated way.

This sounds dumb, but it simply is much harder to keep context about what's going on if you can't refer back to the code by glancing at it on the same screen or one short mouse scroll above or below your current position.

That obviously doesn't mean you should leave copy-pasted versions of the same code all over your code base. But it's important to treat the refactoring of that code into something common, called from multiple places, as something you don't get for free; it's an active trade-off, one you usually accept to prevent bugs (changing one code location and not the other) or plain code bloat. In practice this is very relevant when you suspect something might be repeated in the future, but you're not sure. IMO: just don't factor it out into anything; leave it there, in place, in the code.


Agreed. To use the example from the article

`make_pizza(["pepperoni"])`

What does `make_pizza()` do? It could be a lot or it could be a little. It could have side-effects or not. Now I have to read another function to understand it, rather than easily skimming the ~four lines of code that I would have to repeat.

I think the article fails to show particularly problematic examples of DRY. E.g. merging two ~similar functions and adding a conditional for the non-shared codepaths. shudders


> What does `make_pizza()` do? It could be a lot or it could be a little. It could have side-effects or not. Now I have to read another function to understand it, rather than easily skimming the ~four lines of code that I would have to repeat.

This is not a problem of DRY. This is a problem of wrong abstraction and naming. If the function is just four lines, it could easily be named `make_and_cook_pizza`. In the alternative scenario where those four lines are copy pasted all over the place, one is never sure if they are exactly the same or have little tweaks in one instance or the other. Therefore, one has to be careful of the details, which is much harder than navigating to function definition, because in this case you cannot navigate to other instances of the code.


Exactly this. I fixed a problem like this a week ago. I found some duplicated code, factored it out into one place by introducing an abstract base class (Python) and in the process discovered one of the duplicated methods had a logic error leading to returning a slightly smaller integer result.

The code had test coverage, but the test confirmed that it produced the wrong result. I had to fix the test too.


So your refactor broke the tests, so you assumed the tests must be wrong.


So his refactor fixed a bug and broke a test which he fixed, so you assume he must have assumed instead of verified.


in a sense yes, in a sense no. if you see a function and know its sort of black box properties and its inputs and outputs are well defined, you really don't need to care. however, that applies whether the code is in an external function/module or physically inlined into your code. the sectioning off into separate code is then there to forcefully tell the reader "don't even try to care about the implementation details of this", so in practice your point still applies.

however... real software doesn't work like this. the abstractions that work that way exist for a select few very well understood problems where a consensus has developed long before you're looking at any code.

math libraries would be a typical example. you really don't need to know how two matrices are multiplied if you know the sort of black box properties of a matrix.

but the minute functions, classes, and other ways of abstracting code in a DRY way that you encounter constantly in everyday code, even when they are functionally well abstracted (meaning each does an isolated job and its inputs and outputs are well defined), and even for simple problems, are typically complex enough that learning their abstract properties can be the same level of difficulty and time investment as learning the implementation itself. on top of practical factors like lack of documentation.

this is also why DRYness as a complicating factor really doesn't factor in once the abstracted code does something so complex that there is no way you could even attempt to understand it in a reasonable amount of time. like implementing a complex algorithm, or simply just doing something that touches too many lines of code. in this case you are left to study the abstract properties of that function or module anyways.


I think that drawing conclusions from these examples is not productive at all. In the wild we're going to see functions such as

    def make_string_filename(s):
        # four lines of regex and replace magic
so that we have code like

    file_src = make_string_filename(object_name)
    file_dst = make_string_filename(object_name_2)
which is much more understandable than eight lines of regex magic where you don't even know what the regex is doing.

The problem of not knowing what it does, or whether it has side effects, is more a problem of naming and documentation than of DRY. Even then, it's still better than repeating the code all over, simply because once you read and understand the function, you don't need to go back. On the other hand, if the code is repeated all over, you need to read it again each time to recognize that it's the same piece of code.
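
For illustration, one hedged guess at those "four lines of magic" (the exact rules here are invented; the point is that the name spares every call site from re-deriving them):

    import re

    def make_string_filename(s):
        # Lowercase, collapse runs of non-alphanumerics into "_",
        # then trim leading/trailing underscores.
        s = s.lower()
        s = re.sub(r"[^a-z0-9]+", "_", s)
        return s.strip("_")

    file_src = make_string_filename("My Report (Final).txt")  # "my_report_final_txt"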


if those 8 lines of regex have been unit tested and the function is commented to describe "what" the code does, it is entirely the point that you don't need to understand how it works

additionally, the function should be stateless and have no side effects ;)


How do you test 8 lines of regex inside a function that does more things? And which is easier: writing and reading the function name, or copy-pasting the lines along with the comment (assuming the comment explaining what that piece does was even written)?


i'm a bit confused, do we need to know the nitty gritty details of how Math.random() is written, or that it will reliably give us a random double?


you don't: you mock the function that has the regex with a fixed behavior, and check the actual logic inside the wrapper function


That's fair. But maybe someone wants to reuse this in another place so they do this:

    def make_string_filename(s, style="new"):
        # 2 lines of shared magic
        if style == "old":
            ...  # 2 lines of original magic
        elif style == "new":
            ...  # different 2 lines of magic

When you get here, two totally separate `make_string_filename()` functions, each private to the area of code it's relevant to, would be better.


The ideal is having 3 functions I think:

- make_string_filename_style1

- make_string_filename_style2

- make_string_filename

Then make_string_filename consists of logic to use the right style.

Or one function and a Sum type to be called like:

    makeStringFilename Style1 "somestring"
Given sum type:

    data FilenameStringStyles = Style1 | Style2


Except that there should be only one way to make a filename from a string. Maybe some options like "allow_spaces" if needed but the point of DRY is not only to share code but to share algorithms.


Yup. But I guess that typically happens in steps. So the next DRY programmer that comes along will add a cheezeFilledCrust boolean to that make_pizza function, and so on. Every time it will seem more reasonable to add another boolean, because otherwise you'd have to remove the make_pizza function, and there would be SO MUCH CODE DUPLICATION.

I’ve seen this again and again in the field and I wholeheartedly agree with the sentiment in the OP. IMHO different code paths should only share code if there is good reason to believe that the code will be identical forever.


Now the next genius turns up and says that making pizza is, at its core, always an n-step domain process.

So now you've boiled it down to an interface with a default implementation which calls the create_dough, add_toppings, and bake_pizza interfaces in order, each of which is either passed in as a callback or discovered through reflection.

We can even sprinkle in some custom DSL to "abstract away" common steps like putting the product into the oven correctly!

Juniors will never understand when, why, and what is effectively executed at runtime. Honestly, at this point I enjoy working with this kind of code. It has high entertainment value, and I get paid by the hour, so whatever.


This is discussed in detail in "The Wrong Abstraction" by Sandi Metz

https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction

Quote follows:

----

The strength of the reaction made me realize just how widespread and intractable the "wrong abstraction" problem is. I started asking questions and came to see the following pattern:

1. Programmer A sees duplication.

2. Programmer A extracts duplication and gives it a name. This creates a new abstraction. It could be a new method, or perhaps even a new class.

3. Programmer A replaces the duplication with the new abstraction. Ah, the code is perfect. Programmer A trots happily away.

4. Time passes.

5. A new requirement appears for which the current abstraction is almost perfect.

6. Programmer B gets tasked to implement this requirement. Programmer B feels honor-bound to retain the existing abstraction, but since it isn't exactly the same for every case, they alter the code to take a parameter, and then add logic to conditionally do the right thing based on the value of that parameter. What was once a universal abstraction now behaves differently for different cases.

7. Another new requirement arrives. Programmer X. Another additional parameter. Another new conditional. Loop until code becomes incomprehensible.

8. You appear in the story about here, and your life takes a dramatic turn for the worse.

Existing code exerts a powerful influence. Its very presence argues that it is both correct and necessary. We know that code represents effort expended, and we are very motivated to preserve the value of this effort. And, unfortunately, the sad truth is that the more complicated and incomprehensible the code, i.e. the deeper the investment in creating it, the more we feel pressure to retain it (the "sunk cost fallacy"). It's as if our unconscious tells us "Goodness, that's so confusing, it must have taken ages to get right. Surely it's really, really important. It would be a sin to let all that effort go to waste."


Not really a sunk cost fallacy. Existing code needs to be maintained. Some of it should be deleted since it costs more to maintain. Some of it shouldn’t be deleted since it might bite you in the behind when you realize that all of that code was correct (although gnarly) and now you’ve introduced regressions. And which code is which? Hard to say.

Sunk cost (fallacy) is about making decisions based on things that you have already lost. But you haven’t lost or expended the code—the code is right there, and it’s hard to know if it’s more of an asset or a burden.


Some languages handle massive parameter lists better than others (e.g., with defaults). There are also design patterns for this type of problem (e.g., a PizzaBuilder).
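
For instance, a hedged sketch of a PizzaBuilder (all names hypothetical): defaults live in one place instead of a long parameter list, setters return self so calls chain, and callers only mention what deviates.

    class PizzaBuilder:
        def __init__(self):
            # Defaults collected in one place rather than a long signature.
            self.crust = "thin"
            self.sauce = "tomato"
            self.cheese = "regular"
            self.toppings = []

        def with_crust(self, crust):
            self.crust = crust
            return self  # returning self lets calls chain

        def add_topping(self, topping):
            self.toppings.append(topping)
            return self

        def build(self):
            return {"crust": self.crust, "sauce": self.sauce,
                    "cheese": self.cheese, "toppings": self.toppings}

    pepperoni = PizzaBuilder().add_topping("pepperoni").build()
    deep_dish = PizzaBuilder().with_crust("thick").add_topping("sausage").build()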


100% agree and I just wrote this response and then saw you said it better!


But that is usually a problem with abstraction, rather than a problem with a method call. If I can trust what make_pizza does, that is much faster to read than any four lines of code.

A functional style certainly helps. I get the pizza in my hand and don’t have to worry that anyone left the oven on.


> If I can trust what make_pizza does

You can't, unless it's in a standard library or a core dependency used by millions of people.

That's one of the reasons why functional code is generally easier to read. A lambda defined a few lines above whatever you're reading gives you the implementation details right there while still abstracting away duplicate code. It's the best of both worlds. People whose idea of "functional programming" is to import 30 external functions into a file and compose them into an abstract algorithm somewhere other than where they're defined write code that's just as shitty and unreadable as most Java code.


> If I can trust what make_pizza does

>> You can't, unless it's in a standard library or a core dependency used by millions of people.

You can if you have reasonably competent colleagues. And if you do make some wrong assumptions about what a certain method does, it should be caught by your tests.

I feel that people who insist on reading and understanding all the code, and who write code that has to be read in full before it's possible to understand what it does, have missed something quite fundamental about software development.


Thanks - I like this point. I think it's probably a better illustration of what I'm trying to say in my third point. Devs are biased towards adapting existing shared code so we end up with shared libraries picking up little implementation details from each of their consumers and ultimately becoming very messy.


Arguably one could say that this is a typing (as in type system) problem

`makePizza :: PizzaType -> [Topping] -> IO Pizza`

Seems to carry all that information by just accepting a PizzaType symbol and a list of toppings, with `IO` communicating the side effect.


> I think the article fails to show particularly problematic examples of DRY. E.g. merging two ~similar functions and adding a conditional for the non-shared codepaths. shudders

Not a problem of DRY, but bad code structure.

Just keep the two functions and pull the shared code path out.
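
Roughly like this (a sketch with hypothetical helpers): the two entry points stay separate, and only the genuinely shared path is factored out.

    def _assemble_and_bake(pizza):
        # The genuinely shared code path, extracted once.
        print(f"Assembling and baking: {pizza}")

    def make_pepperoni_pizza():
        _assemble_and_bake({"toppings": ["pepperoni"]})

    def make_hawaiian_pizza():
        _assemble_and_bake({"toppings": ["ham", "pineapple"]})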


Not all the time. When the similar code mixes types, and the common code paths are sprinkled multiple times through it, you can either have the code there twice or have an overcomplicated templated common function.

In these cases, factoring out may or may not be a good idea.


I think it's just that for every complex topic, any general rule will break down at some point. That doesn't tell you that the rule is bad, but to learn how to tell when you're dealing with such an exception.


There is no way I'm leaving fragments of code that I would have to _manually sync_ every time I make a change to either of them.


DRY makes it harder to actually understand the system as a whole in some sense, since it usually means some indirection has been added to the program. However, it avoids the one thing that actually makes me pull hair out: code that looks the same because it was duplicated but is just different enough to trip you up because each area it was used required minor syntax changes that had major implications for the result.


Repetition also makes it harder to understand a system: not only do you have to read more, you also need to remember and compare repeating fragments that may be identical or just similar.

What makes it easier to understand a system is simplicity. I'd argue that DRY, deployed with a right strategic plan, usually does more to simplify things than does copy-paste.

But DRY is just a tool, like any other; to be useful it requires some skill.


On a big codebase, I much prefer to learn a function once than see its body repeated frequently. It's not a small thing.


> In practice this is very relevant when you suspect something might be repeated in the future, but you're not sure.

DRY only hits when you indeed repeat something.

If you predict potential reuse that you can't know for certain will materialize, that's premature optimization.


aka

Abstractions have non-zero complexity costs.

And

Repeated code has non-zero complexity costs.

Why is this a hard concept?

It doesn't make DRY any less valid.

Generally you can invoke both reasons to do something but the underlying reasoning is always complexity.


Indeed. In any practical optimisation problem, which is fundamentally what all engineering is, there's a sweet spot.

You can't just slam the DRYness knob to 11 and expect it to always be better, any more than you can turn a reflow oven up to 900°C and expect it to be better, just because 380°C is better, for the specific PCB in question, than 250°C.

It also doesn't mean you can turn it off entirely, just as if you look at your charred results at 900°C you don't conclude that "heaters considered harmful".

Also, the problem is strongly multivariate and the many variables are not independent so the "right" setting for the DRYness knob is not necessarily the same depending on all sorts of things, technical and not, up to and including "what are we even trying to achieve?"


> I dive around foreign code bases a lot and dry-ness is actually a significant complicating factor in understanding code, because you're jumping around a lot (as in physically to different files or just a few screens away in the same file).

I can't agree more. Also, "code reuse" makes debugging significantly harder when trying to reverse engineer a code base. The breakpoints or printfs get triggered by other code paths etc., and you need to traverse stack frames to get a clue what is going on.

Extra bonus points for fancy reflection so that you have no clue what is going on.


I can't disagree more. DRY forces you to create pure reusable code, and split your code into small pieces. When I read such code I need to understand just a few pieces.


You need multiple cases of duplication (repeating yourself) before you can infer a reusable piece of code.

If you make everything as generic and reusable as possible from the beginning, you'll end up with messy code that has way too many options to set for every simple operation.


It also leads to overengineered framework code that only exists to support the glue code that is now required to pull distant code together.

Increasing the distance between inputs and outputs increases complexity.

Reusable code isn't all that reusable when nobody understands it or things are so fragmented that people can't figure out how to operate the code base.

This isn't a rule. It's a moderation thing.


> This sounds dumb, but it simply is much harder to keep context about what's going on if you can't refer back to the code by glancing at it on the same screen or one short mouse scroll above or below your current position.

To note, a common effect of not DRYing functions is an increase in local code length.

In many code bases that have lived long enough, that means screens and screens of functions inside the module/class files. It is still easier to navigate than between many files, but not by that much in practice (back/forth keyboard shortcuts go a long way to alleviate this type of pain).


That's the point. However, I think this is more of a "verbose vs elegant" argument. Yes, DRY should not be a religion: I will write more code, possibly duplicated, if I deem it necessary for the code to be more readable that way. It's a judgement call, but I think the basic concept of DRY should still stand. If you find yourself cutting and pasting too much, stop, go get a coffee, take a walk, come back, and see how you can do it better.


> This sounds dumb, but it simply is much harder to keep context about what's going on if you can't refer back to the code by glancing at it on the same screen or one short mouse scroll above or below your current position.

Are you still using a VT100?


> Mistake 1: Switch from DRY to premature optimization.

Mistake 1a: Misapplying the term "premature optimization": it doesn't apply here. Premature optimization is about runtime performance, while DRY is about optimising maintenance overhead.

Mistake 1b: (good) DRY can't be done early (it's a continuous process throughout project development).

> Mistake 2: Assumption of incompetence to support your argument.

Mistake 2: Assuming you're never working in teams of peers with varying experience and technical focus.

The presumption of re-usability is absolutely the most common red flag I've seen with DRY: I've seen it from a lot of very senior / experienced devs. You can call them incompetent, but there are plenty of them and we have to work with them. Articles like this help.

> Mistake 3: Strawman argument. DRY does NOT lead to over-complicating things. Overcomplicating things leads to overcomplicating things.

This statement concerns me. DRY very obviously and demonstrably leads to over-complicating things (excessive / ballooning parametrisation is just one of many very simple examples of this). If you can't see this I would have my own concerns about competence...


Thanks for having my back. #3 is an overwhelming real world phenomenon. In fact, I posted my article on reddit and someone wrote back a comment with a huge OOP solution that would mitigate all my problems. Not sure that reader got to point #3.


> Mistake 1: Switch from DRY to premature optimization.

"Premature optimization" is largely a bogus concept, because the meaning of "optimization" has shifted a lot since the concept was first created.

People now use optimization to mean "sensible design that does not needlessly waste resources".

In this meaning of optimization, "premature optimization" is a bogus concept.

You should absolutely ALWAYS write non-pessimized code by default.

What the original concept referred to is what people now call "micro optimizations". Sure, premature micro optimization is often a waste of time. But this is irrelevant to the context of this discussion.


> In this meaning of optimization, "premature optimization" is a bogus concept.

The idea is that you can end up optimizing before you know the entire use-case, because software engineering isn't like building bridges or skyscrapers.

I'm a performance geek, but I love code I can easily change rather than code that is fast until some customers have touched it. Mostly out of experience with PMs with selection bias on who they get feedback from ("faster horses" or "wires we can hook phones to").

The first thing to optimize is how fast you can solve a new problem that you didn't think about - or as my guru said "the biggest performance improvement is when code goes from not working to working properly".

The other problem with highly optimized code is that it is often checked in after all the optimizations, so the evolution of thinking is lost entirely. I'd love to see a working version plus 25 commits to optimize it rather than 1 squashed commit.

Optimized code that works usually doesn't suffer from this commentary, so the biggest opponents I have on this are the most skilled people, who write code with barely any bugs. I don't bother fighting them much, but the "fun" people I work with understand my point even if they write great code the first time around.

These two are mostly why I talk to people about not prematurely optimizing things, because I end up "fixing" code written by 15 or more people which has performance issues after integration (or on first contact with customer).


Code that is sensibly written in a non-pessimized manner is not hard to read or modify.

That's the whole point of my comment.

The word "optimization" as currently used conflates two separate concepts:

- Non-pessimization (new meaning of "optimization")

- Micro-optimization (original meaning of "optimization")

You're talking about micro optimized code, and I'm talking about simple non-pessimized code.


The reasoning behind discouraging premature optimization makes no distinction between "micro optimizations" and any other kind, the purpose of this guidance is to minimize wasting time building unnecessarily complex solutions based on untested performance assumptions.


It does not generalize from low level micro optimizations to high level sensible system design.

At the low levels, you really have no idea where the performance bottlenecks are without profiling and getting actual numbers to work with.

At the high level you pretty much have a clear idea of what roughly the system is supposed to do and what performance characteristics you want.


I think this comes down to a level of experience.

If you're writing an enterprise app and lean back in your chair and start to think about speeding things up with loop unrolling and AVX instruction sets, then you're doing the premature optimization thing.

But trying to limit large nested loops is low-hanging fruit that doesn't take much effort to pick.


Let's optimise for "years of programmers life spent worrying about it".


> You should absolutely ALWAYS write non-pessimized code by default.

Some days I come here just for the typos. :)

Today I've seen two good ones, number zero was

"Costco had to stop returns on TVs because people were “renting” them for free for the superb owl."


Where is the typo? And the usefulness of your comment?


the typo is "non-pessimized code", which should be "non-optimized code".

I see humor in thinking about whether my code is pessimistic enough. Have I assumed that the edge cases will happen and worked around them? Do I expect (and handle) crashes, I/O failures, network timeouts, etc.?

"code pessimism" could be an interesting metric.

The typo in the other post was "superb owl" which should have been "super bowl". Several people on that thread enjoyed the typo, including a comment from CostalCoder saying "Please, please do not correct that typo!"

https://news.ycombinator.com/item?id=31999048


It's a term of art used by Herb Sutter, among others: https://stackoverflow.com/q/15875252


I think they used that term on purpose. Non-pessimized in this case is the same as optimized, and I believe it's a reference to this video https://youtu.be/7YpFGkG-u1w


That's not a typo. But it doesn't seem that you are engaging in a good-faith discussion, so I don't think further elaboration is worth anything.


> Assumption of incompetence to support your argument.

Okay, but it kind of is about incompetence. And by “it” I mean everything. Look, we all remember that first time we all realized that adults are just winging it most of the time. Almost nobody knows what they are doing. Half the people who “know” actually know the least.

> DRY does NOT lead to over-complicating things.

Don’t Repeat Yourself is a terrible acronym because what it stands for is exactly the opposite of what people do. Not doing something is avoidance, opting out, like “don’t push your sister” vs “be nice to your sister”.

What most people do is they realize they have already repeated themselves, or someone else, and they rip it out. They deduplicate their code. Avoidance definitely can “lead” somewhere, but deduplication is active, and that can often be headed the wrong way, either directly or obliquely.

The Rule of Three is much clearer on this. You get one. There’s nothing to do when you see you’ve duplicated code - except to check if you’re the first or not.


> Mistake 3: Strawman argument. DRY does NOT lead to over-complicating things. Overcomplicating things leads to overcomplicating things.

Sure, I agree, except DRY is probably the second greatest gateway drug to overcomplicating things, after OOP. Actually, they really go hand-in-hand, since OOP features are often used to DRY things.

DRY can easily go too far because fundamentally it's about centralizing ideas with the premise that different operations can and should share units, even though a "writeSomeFileToDisk" function doesn't necessarily have to do the exact same thing between different higher-level operations. Because so many engineers emphasize "elegance", if a set of functions seem similar enough, they pressure themselves to write code that is shareable, hence more abstract. Abstractions are inherently more complicated and hard to understand, not the other way around. Rather than having very simple "molecules" of code that can be understood on their own, there is instead a much larger molecule of nodes that are connected by abstract dependencies, and those nodes may only have dependencies in common.

DRY should be done sensibly, but teaching DRY is a problem in our industry because we don't teach engineering discipline. We teach principles like DRY and OOP, and even YAGNI as if they are tenets of a religion.


> Mistake 1: Switch from DRY to premature optimization.

Fallacy: False Dichotomy and No True Scotsman.

"Things are either DRY or premature optimization and can't be both"

"No TRUE application of DRY would ever be a premature optimization"


> Overcomplicating things leads to overcomplicating things.

This would be the most efficient title, subtitle, and entire contents of most posts about programming principles.

However, each reader has to have a similar enough perspective, background, and experience to understand and apply it. In that sense, the trend line measuring the value of commenting about comments about random blog posts indeed indicates wasted time, but hopefully it's a local minimum.

My pithy corollary to your helpful tautology is a quote from Tommy Angelo that's stuck with me since my poker days: "The decisions that trouble us most are the ones that matter least."

Decisions are necessarily difficult to make when the expected values of the outcomes are similar. We waste an awful lot of time on choices that could have been made just as well with a coin flip.

So there you go world: two quotes that are generally useful about generalities that are locked, loaded, and ready to shoot you in the foot when misapplied.

Edit: formatting improvement.


The article is claiming that slavish devotion to DRY results in issues. You are giving names to the issues resulting and calling the article bad.


TLDR: If you do DRY in moderation it’s great (as the OP explicitly says).


What's funny is that DRY was first popularised in the Pragmatic Programmer[0] book, and "coincidental" duplication is explicitly addressed right there on page 34, "not all code duplication is knowledge duplication... the code is the same but the knowledge is different... that's a coincidence, not a duplication."

[0] https://www.amazon.co.uk/Pragmatic-Programmer-journey-master...


I believe this was added in the 20th anniversary edition to address the overuse of DRY following the original edition.


> Mistake 1: Switch from DRY to premature optimization.

Though note that DRY can itself be premature optimisation of the codebase.


Article tl;dr: Design is hard and can't be boiled down to applying pithy maxims mindlessly.

For what it's worth, I agree with your points and disagree with the various counterpoints that were posted; "optimization" can mean a lot of things, and I for one understand what you mean.


I wannabe your friend hahaha

No really, you're absolutely on point. The post is not worth the time, the case against DRY is too weak.

Sounds like a kid complaining about having pushed DRY in a direction that overcomplicated things for him because of himself; instead of improving himself, he chose to attack "an uncomfortable principle".


Wtf.

If your use-case is that the user can select the crust, sauce, cheese and toppings for a pizza, just pass that shit to the make_pizza function with the help of enums and arrays. If you want predefined pizzas, you'd simply make a dictionary of pizza templates with all the options that the make_pizza function needs, and/or, if you wanna be fancy, you'd make a separate make_pizza_from_template function. But definitely not a make_pepperoni_pizza function, because that's just encoding data as code in a silly way that's arguably not even a factory pattern.

No solution will be able to cater to requirements that don't exist at the time of developing this pizza application. You build it according to the requirements that exist, and that is enough. It's not your fault if nobody cared to mention that the user should be able to arbitrarily subdivide the pizza and select options separately for each subdivision; that's a feature update, and it's OK that the original program hadn't thought of it. Just like you wouldn't scaffold ecommerce capabilities into a webpage "just in case" if there had been zero mention of such a need.
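
A rough sketch of what I mean (identifiers made up): options as enums, predefined pizzas as a dictionary of templates, and one make_pizza that serves both paths.

    from enum import Enum

    class Crust(Enum):
        THIN = "thin"
        THICK = "thick"

    # Predefined pizzas are data, not functions.
    PIZZA_TEMPLATES = {
        "pepperoni": {"crust": Crust.THIN, "toppings": ["pepperoni"]},
        "hawaiian": {"crust": Crust.THIN, "toppings": ["ham", "pineapple"]},
    }

    def make_pizza(crust, toppings):
        print(f"Making a {crust.value}-crust pizza with {toppings}")

    def make_pizza_from_template(name):
        make_pizza(**PIZZA_TEMPLATES[name])

    make_pizza(Crust.THICK, ["mushroom"])   # fully custom
    make_pizza_from_template("hawaiian")    # predefined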


It's a confused abstraction level: there must be a database of pizza "templates" (consisting of named menu items and, at a lower level, of pricing rules and admissible choices of crust, topping, etc.); it must be separate from generic pizza processing because it is subject to change over time; and conversely, pizza processing must work for any configuration of that database, without special cases.

Mixing pizza database identifiers into generic pizza processing (e.g. make_ham_pizza) is wrong even without repetitions.


> It's not your fault if nobody cared to mention that the user should be able to arbitrarily subdivide the pizza and select options sepatately for each subdivision - that's a feature update and it's OK if the original program hadn't though of that.

I would argue it’s part of your job most of the time to challenge whatever needs are presented and ask questions about the long-term vision to find a good middle ground of future proofing vs over-engineering. That is of course one of the hardest things to get right.


Thought someone might say this!

You're right. At the risk of sounding kind of hypocritical, after a decently long career in software engineering, I've learned that some carefully chosen future proofing is one of the things that makes a great developer and it's also something where one learns to eventually "see" where it is needed.

My "if nobody cared to mention..." part should have probably said "if nobody cared to mention, even after several specification meetings, that the software should be able to do X..." as I agree it's definitely part of your job to assess the needs.

This post is a bit weird though. It's as if someone is ranting about how hard it is to hammer nails into wood with a shoe or 15 other things, when you could just use a hammer.


100% agree, both on your points and the article lacking a point.


> No solution will be able to cater to requirements that don't exist at the time of developing this pizza-application.

I think I know the title of Gordon's next blog post: "Why YAGNI is the second most over-rated programming principle."


Not sure why this is on the front page. Not only are there a bunch of typos, but a bunch of the code doesn't actually work the way they said it does. Also, gotta love hating on the 10x developer or whatever for saying you are wrong.

EVERYTHING HAS TRADEOFFS. Every single thing has tradeoffs. Obviously you should not write terrible, brittle code. The reason DRY is important is that when you start duplicating code, having 30 different serialization methods littered throughout your code, 5 different ways of calculating the same value, etc., you see why it really matters. It's a GUIDELINE, used as a general rule. And as guidelines and general rules go, it's useful for juniors and people who don't have the experience to see the best way to write the code.

It's a good default, and like YAGNI and 100 other programmer acronyms, it has its ups and downs. Your pizza example is not "coincidental repetition"; it is actual repetition. You just abstracted it in a really poor way to make a strawman.


Yeah, this reads like a junior programmer that got told off for having very repetitive code and they're trying to get the internet to agree that they're in the right and DRY isn't all it's cracked up to be. From the about page it doesn't seem like this is accurate, but that's how it reads.

The problem is that he made his case poorly and I definitely don't agree.


This. The clean code principles should be considered within the specific context of the situation. They are guidelines that are good to keep in mind, but no more than that.

The article gets this wrong by treating DRY as some kind of dogma and then discovering some situations where it doesn't work well. And then of course some commenters here get it wrong by only looking at situations where it does work well. It's the same religious discussion again as FP vs OOP, static vs dynamic typing, no code vs full code, etc. The real answer to each of these is always 'it depends'.


> They are guidelines that are good to keep in mind, but no more than that.

How great dev life could be if everyone saw it like that.


>EVERYTHING HAS TRADEOFFS. Every single thing has tradeoffs

This should be the lede, IMHO.


> Not sure why this is on the frontpage.

Not enough people are flagging the post.


Click bait. Please avoid and don't engage the post.


Completely agree. This is a low quality blog post.


> EVERYTHING HAS TRADEOFFS

Bullshit. What's the tradeoff on using `gets` vs any other function?

Nothing. Absolutely nothing. `gets` is wrong 100% of the time, period.

If you're wrong, it's not a tradeoff. And a lot of things in this article, and about DRY, are wrong.


In this case, the problem is with a bug creeping in:

            crust: "thyn",
DRY is about avoiding this class of cut-and-paste bugs too. The same goes for changing a string to a token, as it should have been:

            crust: THIN
The code isn't even correct. It's mixing JavaScript and Python. I'm also not sure why you'd declare functions for each type of pizza; that's data. I'm not sure about the context, but the right way is:

    def make_pizza(crust=THIN, toppings=[], cheese=REGULAR, sauce=TOMATO)
and then override the defaults in each call.

    make_pepperoni_pizza()
is bad code compared to

    make_pizza(toppings=[PEPPERONI])
All of the code in this post is horrible, and has easy solutions.

I feel dumber for having read this post, and even dumber for having responded.


Subtle bug in that toppings has a mutable default argument [1].

[1] https://docs.python-guide.org/writing/gotchas/


There is no bug... yet. Unless you modify the default argument.

Sometimes I just want to monkeypatch the list to be immutable in my app.


I'd consider this a bug and not expected behavior. The default list is created once, at function definition, so every call that falls back on it (and mutates it) sees leftovers from earlier calls:

    def make_pizza(crust="thin", toppings=[], cheese="regular", sauce="tomato"):
        toppings.append("cheese")  # mutates the shared default list
        return toppings

    make_pizza(toppings=["pepperoni"])  # ["pepperoni", "cheese"]: fine, a fresh list was passed in
    make_pizza()                        # ["cheese"]
    make_pizza()                        # ["cheese", "cheese"]: the default persisted
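
The usual fix is the None-sentinel idiom, so each call gets its own fresh list:

    def make_pizza(crust="thin", toppings=None, cheese="regular", sauce="tomato"):
        if toppings is None:
            toppings = []  # fresh list on every call, nothing shared
        toppings.append("cheese")
        return toppings

    make_pizza()  # ["cheese"]
    make_pizza()  # ["cheese"] again; no state leaks between calls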


Speaking about bugs, can you spot how he introduced a bug when going DRY? :)

In case it gets fixed: https://i.imgur.com/ZR2XKA7.png


Or deliberate, as everybody knows that pineapple doesn't belong on pizzas :-)


> Copying and pasting a few lines of code takes almost zero thought and no time.

;)


> I'm also not sure why you'd declare functions for each type of pizza; that's data.

Yep, had the same thoughts reading the code. What you suggest even seems a purer implementation of the DRY principle than what is proposed in the article, which would result in copy-pasting the make_pepperoni_pizza() function as soon as you decide to sell a third type of pizza.

Of course, the DRY principle used without considering other factors could produce bad results, but all the code in the article is bad for reasons unrelated to the principle it attempts to criticize.


    make_pepperoni_pizza()
is bad code compared to

    make_pizza(toppings=[PEPPERONI])
How would you make Hawaiian pizza? I forget, does it include ham, or just pineapple? You're forced to remember that nuance in your suggested implementation, but not with "make_hawaiian_pizza()".


It's data.

   make_pizza(toppings=HAWAIIAN_TOPPINGS)
or

   make_pizza(HAWAIIAN)
or similar. Data should generally not be hard-coded, both because it changes and because it wants to be validated. Starting with:

   HAWAIIAN = { TOPPINGS: [ PINEAPPLE  ...
is okay. That can later be loaded from a config file, a database, or otherwise, as the system expands.


There's an interesting architectural decision here: what form of the pizza recipes database strikes the right balance between too hardcoded and too complex. I'd use some kind of configuration file or RDBMS, constants are more readable but still out of place as part of code.


The nice thing about constants is that you can't make typos. A string like "peperoni" isn't caught, but toppings.PEPERONI will fail immediately.

I use Python. It's easy enough to, for example, make an `enum` from entries in a config file or even a database:

https://docs.python.org/3/library/enum.html

Scroll down to the functional API. There are many similar design patterns. The nice thing about these is that you get automatic type checking. If your config file is:

    toppings: ['pepperoni', 'ham', 'pineapple'],
    pizzas: {
       'Hawaiian': ['pinapple']   # note the deliberate typo
    }
If you load this in as strings, it will load and later fail silently. If you add validation code, you'll only validate what you remember to. If you make an enum or similar custom type, it will necessarily fail on load, and probably with a reasonable error.

The major downside, which is really incidental (due to poor library design), is that most JSON/YAML libraries won't reasonably serialize/deserialize non-builtin Python types. So there's a bit of (unnecessary) overhead there.
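
Concretely, the functional API mentioned above looks roughly like this (config contents assumed):

    from enum import Enum

    config = {"toppings": ["pepperoni", "ham", "pineapple"]}

    # Build the enum from whatever the config file declares.
    Topping = Enum("Topping", {name.upper(): name for name in config["toppings"]})

    Topping["PEPPERONI"]  # fine
    Topping["PINAPPLE"]   # KeyError right away, instead of a silent failure later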


Surely you'd remove repetition by doing this instead:

    import requests

    hawaiian_pizza = {
        "crust": "thin",
        "sauce": "tomato",
        "cheese": "regular",
        "toppings": ["ham", "pineapple"]
    }

    pepperoni_pizza = {
        "crust": "thin",
        "sauce": "tomato",
        "cheese": "regular",
        "toppings": ["pepperoni"]
    }

    def make_pizza(pizza):
        # PIZZA_URL defined elsewhere
        requests.post(PIZZA_URL, json=pizza)
This isn't better just because it's DRY, it also keeps the data separate from code, which makes it usable elsewhere. Defining fifty different types of pizza inline inside functions is a strange choice, because it tightly couples your pizza definitions to your pizza-making. What if you want to answer a question like "how many thin crust pizzas do we sell?"
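
And with the definitions as plain data, that last question becomes a one-liner (menu being an assumed list of such dicts):

    menu = [hawaiian_pizza, pepperoni_pizza]
    thin_crust_count = sum(1 for p in menu if p["crust"] == "thin")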


Came here for this. Why would your pizza definitions be in code at all? Python is my favourite language, but boy does it make people crazy. Imagine recompiling your app because restaurant A wanted to add a new pizza definition?

To speak to the wider point about DRY: it's a guiding principle for abstraction. If you have two candidate abstractions for your method, and one leads to code repetition while the other does not, generally favour the one that does not.

The fact that you also need additional rules, like not requiring rabbit-hole debugging (jumping through a million files/objects) to understand core behaviour, is not a failure of a useful guiding principle.


Even better IMO (turning the pseudo code into real Python with dict unpacking):

    pizza_base = {
        "crust": "thin",
        "sauce": "tomato",
        "cheese": "regular",
        "toppings": []
    }

    hawaiian_pizza = {**pizza_base, "toppings": ["ham", "pineapple"]}

    pepperoni_pizza = {**pizza_base, "toppings": ["pepperoni"]}

    def make_pizza(pizza):
        requests.post(PIZZA_URL, json=pizza)


Don't do this! OP's version is better!

It might be "fine", but you don't gain anything here while introducing both indirection and coupling. DRY is _not_ about data repetition. Data repetition is fine.

Alice and Bob having the same birthday is coincidental. And even if they are actually twins, you'd rather say that they are twins explicitly, instead of encoding it by making them share a definition.

In your example you are just preserving keystrokes, but you don't say anything of value with 'pizza_base'. You haven't shown that 'pizza_base' is worth keeping track of or even mentioning.

A pepperoni_pizza with thick crust or extra cheese is still a pepperoni_pizza. A hawaiian_pizza's sauce being tomato doesn't relate to a pepperoni_pizza's sauce.

When coding data, just be explicit, verbose and keep it simple. Our text editors, IDEs and database APIs have affordances to change data in bulk. Those things are orders of magnitude easier if your data is simple, dumb and not complected.


With only 2 types of pizza with the same base, I'm inclined to agree. But if there are more, or a strong potential for more (as there is with pizza), I'd argue that this factoring allows simpler implementation of new pizzas as well as easier comparison of existing pizzas.


I think volume is almost deceptive here. It seems like we're doing useful compression, but if we're not careful we introduce complexity and indirection that is not based on actual needs, and we essentially code ourselves into a corner.

There are circumstances when your version is actually better: when pizza_base has a real meaning in your domain, outside of the code representing it. The chefs, accountants and so on use the term day to day, and might even have specific pricing or techniques around it. Then pizza_base is an actual thing that you want to represent in some way.

However, it could also just be a function that derives the base from your raw data, like is_pizza_base(). I would prefer that in general.

There is nothing wrong with dumb, raw, data. It gives you leverage and frees you from coupling.


Agreed. I'm scared of working with programmers that aren't able to see this. There's no point in capturing the "base pizza" concept; nobody is impressed, and my eyes have to flick around so much more. Additionally, the programmer is making it harder for the codebase to adapt to market demand, due to their likely overconfidence in their understanding of the market the product is being built for.


I disagree because his abstraction matches the domain. You will literally see menus designed around the assumption that a not-folded thin (or thick, for some venues I guess) pizza with cheese and tomato sauce is the default, that some pizzas deviate from.


this could(!) be coincidental code duplication


Not to mention it lets you do things like:

    mixed_pizza = {
        "crust": [["thin"], ["thick"]],
        "sauce": [["tomato"]],
        "cheese": [["regular"]],
        "toppings": [["beef"], ["pineapple", "pepperoni"]]
    }
Where you offload the logic of making multi-topping pizzas into the data.


I see DRY as a smell, not a principle. If you see clones (the same code in multiple places), it likely indicates that something can be factored out. Now, the question you should ask yourself before factoring is whether the duplication is coincidental (as the author shows) or the result of copy-pasted logic. Most of the time it's the second case, and duplicate code does make maintenance harder and riskier.

One of my pet-peeves is clones in unit-tests. People tend to care less about code quality when it comes to unit-tests, and code gets copy-pasted all over the place. The result is usually an unmaintainable ball of mess, where the most subtle variation in the unit being tested requires you to apply the same change in 15 different places. In this situation, DRY is a very useful indicator that something is going wrong.

Now, the opposite of DRY is YSHRY: You Should Have Repeated Yourself. When you start adding 5 boolean parameters to a function to adapt it to all its call sites, it's a smell that you went DRY when YSHRY.
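
The YSHRY smell in code form (names hypothetical): once the flags pile up, two honest functions beat one "shared" one.

    # Smell: one "shared" function accreting a flag per caller.
    def render_report(data, as_html=False, include_footer=True,
                      compact=False, legacy_dates=False):
        pass  # a thicket of conditionals lives here

    # YSHRY: two plain functions, each owned by its caller.
    def render_web_report(data):
        pass  # straightforward HTML rendering

    def render_email_report(data):
        pass  # straightforward plain-text rendering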


In my case, when I was a junior I tried to be very smart and DRY a lot, but I found that most of the time it's better to write "dumb" code and repeat yourself if the complexity isn't worth it. Also, as you stated, if your function is used in a lot of places for slightly different things, it's just so easy to break something without noticing, and it's also harder to test.

So I agree with you: as a developer you should know when to duplicate code and when to DRY, but overall try to keep your code as simple as possible, which also makes it easier to maintain.


This works until someone updates code in one place, but not the other, and subtle bugs are introduced.

DRY / single source of truth offers a certain protection against such bugs.


Only experience teaches you where to apply DRY and where not to.

Sometimes, just because something looks the same or similar does not mean it is the same. Applying DRY just because code looks the same can have the unwanted consequence that changing it in one place changes it in another when that's not the desired effect. Then you add another parameter and conditional logic, just because you don't want 2 similar-looking things.


Out of all the comments on this post, this is the one I agree with the most. I have been bitten by "not enough DRY" as well as "too much DRY".

I think experience as well as the maturity of the code are most relevant when deciding things like this. If you do this too early you will back yourself into a corner or eventually end up with a "god function" (usually the outcome of what OP mentions about conditional logic and more params).

If you wait too late you will inevitably have a giant codebase with lots of duplicates everywhere.

In my experience it's easier to clean up duplicated code than it is to break apart a "master function" that nobody has touched in 3 years out of fear.


WET -> Write Everything Twice (before premature factorization).


DRY and Once And Only Once isn’t about slavishly identifying similar code blocks. It’s about trying to arrange your code so that a single idea is expressed in a single place.

The initial API here is actually quite nice - there’s a good separation of abstraction and specification, and I can see all the information about an individual pizza in one place. The idea of making a pizza and the recipe for each pizza exist once in their respective places.

It’s true that a common pitfall is to prematurely create abstractions before having concrete examples of how they’d be used. But DRY is a _refactoring_. It’s something you do to an existing codebase to better clarify its design, not necessarily something to strive for ahead of time. Much better to extract abstractions from existing examples.

I always remember the tale Ron Jeffries tells of Kent Beck actually _introducing_ duplication to allow both pieces of code to be refactored. Duplication can be an opportunity to refactor towards a clearer design, but it’s not a mechanistic thing to do without thinking.


I am working in a code base right now that was literally ruined because of #3. It's full of completely unnecessary higher-order functions that are extremely difficult to follow and test. A feature request did come in for a "half/half" pizza, and we're spending our days trying to disentangle the higher-order functions.

The developer who wrote this thought himself a programmer genius and wanted to make a pattern out of everything. He did not accept criticism, because "DRY is a holy principle".

And that is why a post like this is important. Next time I have someone like him on my team, I can point him to this post. Argument from authority may be a fallacy, but it is significantly more persuasive than other arguments.

And yes, you can respond to this with "why did you hire this guy in the first place, and why did you not fire him?". Well, I do not make all the decisions. Not every teammate is perfect. Such is reality, particularly in an industry as young as software development, which (compared to, say, electrical engineering) is still searching for a common understanding of ubiquitous best practices.


>Argument by Authority may be a fallacy, but it is significantly more persuasive than other arguments.

I looked at https://gordonc.bearblog.dev/ - I don't know why I would think this guy was any more of an authority on what was important than I am. So I'm not sure if anyone who thinks they're a programming genius would even care.


Maybe it’s rather an “argument by effort” — someone bothered to elaborate this into a blog post, and the recipient didn’t.

It’s like finding a third person on the internet to agree with you in a one-on-one, without exposing the person you disagree with to judgement of an actual third person.


Good point. In my experience the chances of persuading the culprit are limited. However an article written by a third person is effective in helping persuade other team mates, POs, BAs and line managers. Ah yes, the terrifying politics of a team with internal disagreement.


I have to agree with you there - I'm just a random guy who wrote a blog post.


>"why did you hire this guy in the first place, or why did you not fire him?"

Because many companies are actually looking for these guys who follow the principles to a T, at least initially. It's only midway through that they complain about them lacking flexibility, if ever. Then later everyone complains about the incomprehensible mess, while a few go "that's the way things are, we just need smarter people to understand our solutions".

And of course, it takes them 10 years to do something with a huge team which only took a small team a few years.


While one can certainly over-engineer everything into N levels of abstraction, whether with OOP constructs or with FP constructs like higher-order functions (or higher-higher-...-order functions, for that matter), I could well imagine that a person who knows FP would not have much of a problem with a specific code base, and that when that person leaves and only OOP-only people remain, they scratch their heads and call it an "unreadable mess".

So I would not rule out that possibility without having seen the actual code. Documentation is a fair point, though. One should always document for at least a slightly stupider version of oneself; one day that future self will come back and "not get it".

That said, it is at least possible that the company in question needs to hire a smarter person, or simply a more FP-informed one. It is also entirely possible that the code is over-engineered and way too complicated for what it achieves. Without seeing an example ...


Honestly, I'm done trying to flip the script and approach it as a "well maybe they truly need smarter people to read the code".

Companies are openly looking for people giving them a spoonfed answer on 'good code concepts'. Most of these concepts have no academic basis, can be argued rationally both for and against, are shown to be damaging on a daily basis, and seem to have nothing going for them but 'preference' and 'context matters'. At best they form a way to talk about some things, often buzzwordy.

If companies are filtering based on whether you can regurgitate SOLID, DRY, etc., it becomes a self-fulfilling prophecy. At least some fanatics will treat those things as the solution to every problem and create something illegible to the majority of the population. You don't solve that by pushing the burden on the majority to adapt, you solve that by being a smarter strategist and stop letting fanatics do as they please. It's exactly as you say, write code with the commoner in mind, not the genius.

Writing code is part literature. Don't blame others if your writing is obtuse prose that completely misses its target audience. Most people aren't born with great writing skills; start cultivating those instead of reaching for instant 'good code concept' solutions.


In my mind, building software is like designing and building a machine or a physical tool in mechanical engineering: a CNC machine, a laser cutter with an optical system for processing, etc.

There are no rules in designing a machine except for: These are the requirements. First drafts will be analyzed and iterated on by simulation and prototyping (e.g. for mechanical resonances, thermal deformation) - an inexperienced engineering team will have to do a lot of testing and prototyping and failing, while a team who has some experience in building a machine in a specific domain will know what to look out for and what can be compromised on.

There are best practices, but I feel they are always domain-specific (e.g. everyone wants to put their precision machine on a big block of mass to isolate it from the environment on a shop floor, while the best practice for a similar application working on a plane is totally different).


Oh I definitely agree with writing code being a bit similar to writing prose. Along with that comes "naming is hard". It can take several iterations of renaming and restructuring things, taking a step back and thinking about how the program reads and how close that is to being easy to understand, without looking too much at the actual implementation and so on. And it does take practice, no question.

I would add, though, that the opposites of DRY and SOLID are feared for good reason. Just imagine if you had duplication everywhere because the previous developer did not understand, or did not take the time to introduce, fitting abstractions or indirections. So these principles are definitely worth something. I guess learning to apply them at the right moments is also something that needs practice.


> I suspect any developer reading this is aware of the DRY principle because it is just so ubiquitous. If not though, you just need to know that it stands for "Don't Repeat Yourself" and is generally invoked when advising people to not copy and paste snippets of code all over the place and instead consolidate logic into a central place.

Well, no.

What you actually need to know is the next layer out: DRY stands for Don't Repeat Yourself, sure, but Don’t Repeat Yourself isn't the rule, it's a short phrase that is supposed to be a memory cue for the principle “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system”.

If all you know is “Don’t Repeat Yourself”, you don't know the principle, and you can neither apply nor critique it.

#1 is simply applying the memory cue as if it were the principle. Yeah, don't do that.

#2 is, well, no, you only refactor to extract a bit of knowledge to a common place where it is immediately reused: there is no presumption of reusability, it is demonstrated.

The specific example they use of how this might be done wrong is...so bad.

Starting with: the example code that they present as "bad design, but works" does not actually work.

  make_pizza([left_topping, right_topping])
gives args a length of 1, not 2, but their function definition relies on args having length 2, using that to distinguish the split case from the simple one.
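A quick way to see it, assuming the post's *args-style signature:

  def make_pizza(*args):
      return len(args)

  make_pizza("ham", "pineapple")    # args == ("ham", "pineapple"), length 2
  make_pizza(["ham", "pineapple"])  # args == (["ham", "pineapple"],), length 1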


Thanks - I fixed this mistake. I originally was "solving" the problem by checking if the arg passed was a list of lists or a list of strings but I thought I'd get flamed even more for being a terrible developer with that solution.

When you go read proper definitions of DRY, they have lots of nuance that speaks to many of my criticisms. But the reality is most developers are not encoding that nuance and are using it as a fairly blunt instrument. I can't really prove it, but at least some people in the comments seem to agree. So I guess I could say "DRY is misunderstood" - but if it's so easily misunderstood, then maybe that's a shortcoming in and of itself?


Personal anecdata. I have been increasingly aware of my own mental patterns during development, and I've noticed that I often sit mulling over refactoring towards some sort of universal solution instead of getting on with the work and getting things done. There are instances where I could've finished the task twice as fast if I had just gone ahead and done it with repeated code instead of thinking of clever ways to DRY it.

Therefore, for my personal projects, I'm now a firm believer in quick iterative building. Just get the first iteration done, get it working, and save improvements for later. It may create a bit more work for the future me, but it decreases the mental load quite significantly. I'll take less mental load with a clear objective (refactor this because of this) over more mental load with unclear objectives (make universal solutions accounting for things that may or may not happen in the future) any day.


My principle is "be as stupid as possible": write it for someone stupider, comment it for someone even stupider, and then maybe you'll have something maintainable.

(Important note: stupid is not incompetent - it's a proxy for clarity, composability and rational structure without becoming formal, rigid, overly orthodox or academic about it)


Can it be that you're using "stupid" for exactly what is meant by "simple" in the KISS principle? ;-)


Sure. Related. It's an art.

Generally, the less code, the cleaner the conceptual execution. I always strive to remove and reduce conceptual deceits.

Here's some code I wrote earlier, probably a good example

https://github.com/kristopolous/music-explorer/blob/master/w...

It's self contained, not very big, not trying to be fancy, as direct as possible.

It's worth noting a few things:

Some things are repeated where there's no reasonable way to refactor them that would actually simplify things.

No framework. No view/model/controller/provider/orm separation. It's not doing much and it does it fine

Stuff is composed but intentionally not abstracted

Here's a frontend

https://github.com/kristopolous/music-explorer/blob/master/w...

Again, no react or angular or other framework. Just direct modern code.

As far as what it looks like, it's a music player frontend to some sprawling project. Example: https://9ol.es/pl/


I can relate to this. I think Casey Muratori coined the maxim "write usable code first". I often mutter that to myself when I'm trying to design some crazy system rather than just solving the problem at hand.


  left_toppings = ["beef"]
  right_toppings = [] 
  make_pizza([left_toppings, 
  right_toppings])  # this will be a very funny pizza
Holy cow I was not expecting a none pizza with left beef reference in code form


For those wondering, it's a reference to this beauty: https://i.kym-cdn.com/photos/images/facebook/000/838/967/e39...


thank you


For anyone curious, here is the background of this, via Hank Green: https://youtu.be/5yWTPtPYukg

Be prepared for laughter


Something I have come to feel myself but haven't found a good way to articulate is asking "is what I am doing favoring authorship over maintenance?". I find that the way DRY and other programming principles get used often optimizes for authorship, and this optimization sometimes happens at the expense of maintainability.

Anticipating maintenance is tricky; I had a scenario where a developer on my team created a utility function to abstract away some code that was being repeated multiple times in the same file. As an author that made sense because he was writing the same code over and over again, but down the line when we wanted a specific instance of this copied code to work in a slightly different way we ended up making the utility function handle the edge case.

Over time this utility became extremely hard to work with, because you could never be sure a change wouldn't create a regression in the other places it was used.

When we sat down and asked ourselves "Is this utility assisting in authorship at the expense of maintenance", the answer was clear. We removed it and put back the repetitive code. We felt good about it because in reality, 90% of the time we were interacting with this code we were doing it in maintenance mode, tweaks and small updates. When in maintenance mode I don't feel the strain of a specific part of my code being repeated, I'm only looking at a small subset of the code. Sure, if I need to author a new case in this code it might be a bit more wordy, but I think the tradeoff is worth it.

I am sure there are perhaps better ways to abstract things, or that we were doing DRY wrong, and our utility function could have been smarter, but I've seen this same thing play out over and over again and usually trying to make my abstraction better hasn't helped.
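For what it's worth, the failure mode usually looks something like this (a hypothetical sketch, not our actual code): one boolean parameter per call site, until no branch can be changed safely.

    def format_row(row, truncate=False, skip_empty=False, legacy_dates=False):
        # Each flag was added for exactly one caller; legacy_dates is dead
        # weight left over from a call site that no longer exists.
        if skip_empty and not row:
            return ""
        text = ", ".join(str(v) for v in row)
        if truncate:
            text = text[:80]
        return text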


I tend to agree with your premise, though I wouldn't say the two are mutually exclusive. In fact, I imagine favoring authorship would more often trend toward maintainable code than not, depending on what is being optimized for (writing less code, etc.)

I think in the case you described, instead of handling edge cases within that function, it might have been better to create an entirely new function to be called in those cases. You could then go a step further, identify the shared logic, extract it, and call it separately. At least that's what I tend to do when I find myself having to branch logic, especially established logic. Obviously I'm assuming a lot of the details here, and most likely what y'all ended up doing was the right thing for your project/team.


I have reached a similar conclusion. DRY projects tend to snowball over the years into a mess where every minor change is insanely difficult and breaks everything, the code is hard to read, bugs are difficult to solve, and diffs are difficult to review.

WET code (the opposite of DRY, often starting as copy-paste) means more code and more typing, but the diffs are simple and the bugs are simple (often you simply forgot to copy a piece of code into 7 different places, which is easy to solve). After many iterations, what started as very similar "classes" is now completely different.

One look at a WET class and you know what it does; you change one line and you're done, and maybe you need to copy the change to 2-3 other files, maybe you don't. In comparison, you'll stare at a DRY class for 2 hours and realize you need to refactor absolutely everything, it will break half of the codebase, and the diffs are insanely complicated.

I recently wrote 2 similar projects, one WET and one DRY, and the WET one is simpler, easier to maintain, and more enjoyable to work on. DRY is the root of all evil.


> but the diffs are simple, bugs are simple (often you simply forgot to copy piece of code into 7 different places which is easy to solve)

I am sorry, but that sounds like an absolute nightmare. 7 different places means 7 different chances for bugs to crop up because you forgot one, especially if it isn't documented that you need to copy the code to the other places. It also seems a nightmare to maintain documentation for that, as comments might get lost or not updated in all the copies. Of course, unit tests are out of the question: are you writing 7 slightly different unit tests and keeping them all updated? And that's assuming simple bugs, not pervasive, hard-to-reproduce, indirect bugs that take days just to find the root cause.

> In comparison, you'll stare at DRY class for 2 hours and realize you need to refactor absolutely everything, it will break half of the codebase and diffs are insanely complicated.

Sounds like a problem of overcomplicated, bad coding and bad documentation. It's not a problem with DRY.


Why would you want to maintain basically the same code in multiple places? That sounds more error prone to me. If the DRY code becomes too complicated then refactor it as needed.


Until you end up with multiple similar pieces of code, that seem to do exactly the same thing except for a few small changes... and you wonder if maybe all the other copies would benefit from those changes too.


Oooh, I spent a lot of time doing this! Good times... then I told the lead about svn:externals and we gradually mitigated some of the pain.

To the lead's credit he was humble and open to proposals while I was an asshole about it.


If you have the same code copied to several places it becomes much harder to maintain. If you find a bug in that code block, you have to fix it in the other places too, and if you forget some of them, a bug that you thought you had already fixed might resurface.


How often do you really write the same piece of business logic code in multiple places?

I think the point of the discussion is that mostly it really is different code that only superficially looks the same.

I don't like the pizza example.

But I have seen more issues caused by people trying to cram code that merely looks the same into one function than by bugs that needed to be fixed multiple times because code was duplicated.

You also have layers of code, and DRY applies best to things like "SaveStuffToDatabase" or framework code, whereas a lot of business code can still be better off duplicated, because usage will evolve in different ways. Take CreateNew vs EditExisting: there are a lot of business cases where, when creating a new entity, you want to set values that should never be available when editing; but saving to the database should be just saving to the database...


> How often do you really write the same piece of business logic code in multiple places?

well, depends if you avoid DRY or not. If you apply DRY, zero times.


Think again.

Read with understanding.

What is the question that is really being asked?


I agree with you in principle but your example sounds like a nightmare to me. Aren't you tired of having to make the same change 7 times?


Especially if there is a very common refactoring bug and you get a pepperoni pizza instead of a pizza with one of the most natural toppings like pineapple.

But I tend to agree with the author. Sometimes verbosity is the lesser evil. No suggestion should become dogma, and whoever has played a few rounds of code golf knows that short code doesn't mean code that is easy to read. An extreme example, of course. But I believe many start to optimize this way just to reduce the line count.

Still, there is room for some kind of factories or function templates (not in the C++ sense). I think a user is allowed to repeat himself, but then again "a user" is just another arbitrary layer. If such helpers are to be implemented, I tend to like them at the level where the user invokes them, and not on the level below that, if that makes sense.


> ...Especially if there is a very common refactoring bug and you get a pepperoni pizza instead of a pizza with one of the most natural toppings like pineapple.

Exactly! The DRY'ed example in the first section should rather read:

  def make_hawaiian_pizza():
      make_pizza(["ham", "pineapple"])
This demonstrates the omnipresent dangers of untested copy-paste.

"Copy-paste, copy-paste, Will Robinson!!"


My favorite pizza is pepperoni with pineapple and red beans.


What forced me most to use DRY in inappropriate ways was typing out blocks of the same code again and again. Once I realized that, I began to maintain and use easily expandable snippets with fillable placeholders. It turned out that my mind had no objection to repetitive code at all, and its clarity only increased, due to the lack of context switches and parametric entanglement.


Genuinely curious, what do you do for refactors? Create a new snippet and go replace all the required instances?


Yes, e.g. when I need to add a new line/block of code, I just search for a pattern and edit there. When refactoring demands heavy structural cross-module changes, I honestly just don't do it. What's dead is dead, but I may do a "guided" side-by-side rewrite.

I don’t touch snippets unless there is a good reason to do that. They are my general templates, not per-project tools.

In most of my code, the need for refactoring was mostly a consequence of building a too-rigid high-tech structure which with time turned out not to fit the job anyway. I figured out I can avoid that by not building it, and anti-DRY also plays a role in this (albeit a mostly psychological one).


I have taken what you might consider a low-DRY approach as well, so it's interesting to see what mechanisms people use to manage reuse without introducing complexity into the system itself.

I think project-agnostic generators are worth another look too. They automate some of the duplication without the rigidity of deep abstractions and dependencies. I am still exploring that, though.


This is like a compiler inlining code for faster performance, for exactly the same reasons.


We do this because, unlike developers, CPUs can’t learn anything. No reusable abstractions are possible without extra instructions (and cycles) that tell the CPU exactly when and how to reuse them, or new microcode firmware that ordinary users aren’t empowered to write.


Why not turn your snippets into actual templates?


What do you mean? I’m currently using vim-snipmate.


I mean, if you have repetitive pieces of code that can be parametrized, why not parametrize them directly in the code instead of using a tool to produce the same code over and over?


Quoting myself,

due to context switches and parametric entanglement


Can you give an example where abstraction would be too complex, but a snippet wouldn't?


E.g. I rarely abstract HTTP calls or endpoints, because APIs tend to have nuances, and to account for them I would have to add more parameters than it's worth. So I have a full-blown snippet from which I delete the irrelevant parts.

Another example is little utilities like a promise used as a semaphore (by lifting resolve() to the enclosing scope). It could be a utility function without parameters, but then I'd have to maintain that utility module across projects, which makes it extremely fragile, and forces other people to deal with it.


I once worked on a project that was basically a simple My Account application/area for a train ticket retailer.

The backend itself held no data, but whoever built the backend had gone full service layer, with models and adapters to the upstream services that hold the data.

The result was a backend that was a pain in the ass to change, necessitating whole trees of file changes to build features.

So we started inlining everything. We just took it back to the request handlers. We started fetching, mutating and returning the data in the request handlers. Suddenly a change became modifying one function. Every endpoint was unique and didn't depend on anything else. Things became easy.

Halfway through the migration, someone got our effort reviewed by a principal engineer who told me "it wasn't SOLID", and my contract wasn't renewed. It didn't dishearten me.

Software design is meant to make change easier and proudly adding abstractions can be a bad thing.


> So we started inlining everything. We just took it back to the request handlers. We started fetching, mutating and returning the data in the request handlers. Suddenly a change became modifying one function. Every endpoint was unique and didn't depend on anything else. Things became easy.

What is the big benefit you gained from doing that compared to calling, say, a service method call in the controller?

    costService.generateCost(newPrice);
Is it really that difficult to go to the service method definition? With inlining at the controller level, in order to unit test generateCost, you'll now have to deal with authentication/authorization/request handling related infrastructure which has nothing to do with cost calculation.


The most complicated code we had was for generating receipts, which used some functions which we kept separate, because it made sense.

Auth was handled by middleware. And once we'd stripped out all the layers, 95% of the handlers looked roughly like this:

  result = fetch(...);
  /* maybe more fetches, maps or filters */
  response.send(result);

They didn't really _do_ anything. It was all just small tweaks to data someone else owned. The biggest challenges were upstream endpoints changing on us and making sure we were logging and passing things like correlationIds consistently. Moving to fat handlers, we unified those by having those things already set up and passed into the handlers. The focus was on devex, so a junior could easily modify/create an endpoint and not have to think about how to get it right. We made the pit of success as easy to fall into as possible by breaking the rules that weren't serving the project very well.

It was a glorified proxy layer. There were benefits in treating it as such, rather than deluding ourselves into thinking we needed services, repositories, models and such. Just transform data from someone else's endpoints and focus on the frontend.


IMO, DRY is the second principle; the first is KISS. It is preferable to repeat ourselves if that contributes to simplicity and ease of maintenance. My third principle is that there are only three principles.


"There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other is to make it so complicated that there are no obvious deficiencies.”

Simplicity is hard.


As with any such article, it comes down to a lack of competence being mistaken for a flaw in the principle. Principles are never, here, to blame. A cultish (inexperienced) belief in the sanctity of a principle is to blame.

Excellence in programming is trading principles off each other based on the design constraints and expected changes.

DRY trades off against everything else in different ways depending on the language and the problem. DRY in Python should often stop when you ask "should this be a metaclass?" but before "should this be a decorator?"; that's different than in C.


> "Had we started out with two pizza types that have different crust/sauce/cheese, we never would have made this refactor. Instead of our code being architected around the concept of how pizzas are made in the abstract, its architecture is tightly coupled to the specific needs of these two pizzas that we happened to be dealing with. The chance that we will be putting this code back the way it was is extremely high."

Maybe.

But maybe you're falling prey to the other programmer trap - catering for conceivable situations that are just never going to happen, and making your codebase unnecessarily accommodating as a result. This is another great source of complexity, and quite often the source of unnecessary abstractions (which add to cognitive load) too.

In my experience it's better to cope with half-and-half pizza toppings when they arise, rather than coding as if they're already needed. Because when they are needed, you'll probably find the requirement is actually to put them on a 3-tier wedding cake, or a car.


I often find that DRY is in conflict with Conway's law[1]. It's almost always better to let Conway's law win. If I help write a build script for one team, I often copy and paste it into another team's git repo instead of trying to share it. That's often way better than factoring it poorly[2] or getting the two teams to coordinate on the changes they need to make to it. Best to let the two copies diverge in that case.

1: https://www.wingolog.org/archives/2015/11/09/embracing-conwa...

2: https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction


I think there are three levels of understanding this topic:

1) Repeat yourself everywhere because you're a noob and don't know how to DRY

2) DRY everything because it's easier to maintain, learn all sorts of clever tricks to make DRY work

3) Realise that sometimes it's easier to DRY and sometimes it's easier to write repetitive code.

The problem is that if someone at level 3 (like the author) talks to someone at level 2 then the level 2 developer thinks that they're talking to a level 1.


The general approach is to refactor/generalize when creating a third version of something. With the first thing, you don't know if it needs any common functionality or what an appropriate abstraction will look like. With the second thing, you have some similarities but not enough information to know where the abstractions should be -- here, repeating yourself is OK.

With the third thing, you should have enough information to work out where generalizations should be. Even then, only generalize what you need to at the time. Going overboard can add unnecessary complexity, so it is generally a good idea to be conservative in what you generalize.

As you add more things, you can refine and evolve the system as needed. At this time you should have a better understanding of the system and what parts can be shared and generalized.
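In code-shaped terms, the heuristic might look like this (a hypothetical sketch): tolerate the first two copies, and let the third one pay for the abstraction.

    cart_a = [{"price": 10.0}]
    cart_b = [{"price": 12.5}]

    # First and second use: live with the repetition.
    total_a = sum(item["price"] for item in cart_a) * 1.08
    total_b = sum(item["price"] for item in cart_b) * 1.08

    # Third use: the shape is now stable enough to be worth naming.
    def total_with_tax(cart, rate=0.08):
        return sum(item["price"] for item in cart) * (1 + rate)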


I've tended to approach refactoring common functionality based on whether two pieces of code are either "coincidentally" the same or "intrinsically" the same.

If code is coincidentally the same, then you should leave it alone - the two pieces of code are likely to evolve independently and trying to make a common function/class handle two separate usecases is likely to lead to complex, ugly code.

Conversely, if the two pieces of code are intrinsically the same then you SHOULD pull them out into something common. If you don't, you risk the implementations drifting and getting inconsistent behaviour over time.

Determining which is which is a matter of interrogating your domain and business logic, which is the essential function of our job as developers/engineers.


Two instances isn't worth consolidating because it's hard to know whether the similarities are real or coincidental. Check out the rule of three: https://en.wikipedia.org/wiki/Rule_of_three_(computer_progra...


> "All these ideas are great. But remember that the fundamental goal here, is to send a POST request with a single JSON object."

This. A million times this.

IMO, the single most important principle is still "Keep It simple when you can, make it complex when you have to."

A system that is simple can be grokked quickly, meaning it can be debugged quickly, modified quickly, new developers can be onboarded quickly,...

Yes complex systems have to exist. Some tasks are complex, and require complex solutions. BUT: Complexity should come into play when it is necessary.

It is perfectly okay to design simple solutions for simple tasks. Yes, sometimes this means ignoring things like DRY.


I disagree with the author's example as given.

The example discusses a code boundary that is internal to a single atomic "module" - the preparation of a data structure that describes a pizza. Then the author says that bad things will happen if said code boundary is used from other modules.

However, why would an external module developer do that? It is common wisdom to recognize and avoid module-internal utility functions.

Conversely, as long as the presented shortcut is internal to a module (=used only for a specific set of use cases well understood by anyone touching the code), and saves toil, it might actually be justified.


> However, why would an external user do that? Potential external users typically recognize and avoid module-internal utility functions.

External users will go read the implementation of MSVC's standard library and reverse engineer the Windows API to make things faster, lol. No internal module function is ever safe.


I agree that no internal module function is safe, but MSVCRT is used by literally millions of developers, some of whom have very uncommon functional requirements, such as making their product work on a rare version of Windows. My empirical observation is that most developers are prone to the other extreme of not considering internals when they should.


Have you ever seen the complete opposite? Non-DRY all over the place? No variables or constants defined, the same values manually inserted all over the code? Or the same code, with slight differences, duplicated all over the place? When you see that, you'll realize how useful the DRY principle is.


"I figured I'd kick off my new blog with the most click baity thing I could think of."

Then write "Why I prefer tabs over spaces!" and let the flamewars begin.


"When to use tabs, and when to use spaces."


The article is actually good. The criticisms of DRY here are valid! For criticism 1, I had a coworker once say something that resonated with me:

"Just because two things are the same right now doesn't mean they _should_ be the same."

So that criticism is totally valid - DRY has to be applied only when things _should_ be the same, and that can actually be hard to identify.

That being said, of course the title of the article is bad and not accurate. DRY is essential. I don't think there are many people who actually argue against it. If you have a piece of business logic that's essential to the business and it influences other pieces of logic, they all have to refer to the same definition. Repeating it is bad for everyone - users will see inconsistent behavior, and devs will have to "remember" (read: never actually remember) to update important logic in multiple places. Important things should have a single source of truth. That seems inarguable to me.

It can be hard to find a design that actually achieves that. That's not DRY's fault.


No, this article isn't good, because it discusses alternative options within the boundaries of seriously wrong premises (writing nonsensical hardcoded recipes "right"), and unsurprisingly all the options are bad.


The code in the second example is horrible and for some reason used incorrectly by the author.

My usual approach would be something like:

    def make_pizza(left_toppings, right_toppings=None):
        if right_toppings is None:
            right_toppings = left_toppings

        ...
though really it should probably be

    def make_pizza(toppings=None, **kwargs):
        if toppings is not None:
            kwargs['toppings_left']  = toppings
            kwargs['toppings_right'] = toppings

        return requests.post(PIZZA_URL, kwargs)
If this logic should even be handled in the application at all (I'm not sure why you'd choose to make a breaking change to the API rather than extending it, though changing the make_pizza function to keep the code working after an API change is the correct response).

I'm also not sure why the author chose to make the function capable of handling an arbitrary number of arguments, or why after doing so he chose to incorrectly invoke it on a list.


The big deal is that when a boundary exists, DRY should be "ignored" (I can't word this decently).

For example, say there is an application where a user can purchase items and also review those purchased items. The reviewing user and the buying user have only the ID in common: in the review boundary the relevant information is probably the user's nickname, while in the purchase boundary it is the payment system, or the state of the checkout ("purchase" is probably not even a single boundary).

In that case, data could be duplicated to ensure the boundaries are decoupled.

This is of course at the data level, but usually it translates to "there is a user model that has many orders and many reviews"; because of DRY, no two user models may exist, and there you have the boundary violation.
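A sketch of the de-DRYed version (hypothetical names): two "user" representations that share only the ID, one per boundary.

    from dataclasses import dataclass

    @dataclass
    class ReviewAuthor:        # the "user" as the review boundary sees them
        user_id: int
        nickname: str

    @dataclass
    class Purchaser:           # the "user" as the checkout boundary sees them
        user_id: int
        payment_method: str
        checkout_state: str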

Sorry, this is a bit of a ramble, it's a long discussion.


> Now we are talking about all kinds of fancy programming stuff to try to solve problems that only exist because we don't want to repeat the same 6 line snippet in a handful of different places because DRY tells us that's bad.

Oooh yeaah ... I had that exact argument in a recent code review, where the patch literally modified _hundreds_ of LoC across many different files just to avoid duplicating a simple 10-liner in a single place. Yeah, you read that right.

A developer basically rewrote half of the existing code architecture and applied the "best" OOP practices. The intention wasn't a bad one, but it goes without saying how incomprehensible the code would have become if that patch had gone in in its original form. It was hard to argue against it, and what would have been 10 minutes of work became a 2-3 week long discussion. And that is just ... bad.


The biggest issue with DRY is commitment to a bad abstraction just for the sake of not copy-pasting some code. Abstractions should be liquid while you're figuring out the best way to model your problem, and DRY can often be a culprit in having a model that's a bit too rigid.

Obviously YMMV.


Despite the backlash in the comments, I have to say I agree with the article. I eventually realized that it's our job to produce solutions, not to write code. Engineering (at least for software) is about making a computer do something novel, or failing that, making it easy to adapt. Creating code that is nice, elegant, and DRY is engineering code, not engineering a solution. If it helps make things easier then sure, but despite the simplicity of the examples, I think they demonstrate what everyone has seen: the one nice-to-have function turning really ugly from trying to handle all of the edge cases. At that point it becomes nice to not have, but it's too late.

Also, I liked the comment from ihateolives somewhere in the thread a lot.


"Duplication is far better than the wrong abstraction" - Sandi Metz

https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction


<whatever-done-correctly> is better than <wrong>


The very first "refactor" is broken, the last line should be `make_pizza(["ham", "pineapple"])`. Which is actually very easy to notice if you actively watch out for repetitions.

Ironically the author just demonstrated why DRY is a great principle.


If I can be a little too honest here, I will admit that I just don't like making programs that are too simple. I can make an extremely simple program... But it doesn't feel good. I know there will be limitations to that simplicity, and I want to make functions that do things, and combine those functions, and let them override things, and pass state, transform state, etc. Making it complex just feels better.

I actually stand outside my body and watch myself make it more complex, and think, "Ugh, this is more complex than it needs to be, I should make this simpler. But I don't want to." I continue and hope that a refactor will make it less embarrassingly complicated.


A Philosophy of Software Design by John Ousterhout goes into some of these ideas. To me, the book takes an approach that is somewhat contradictory to Clean Code (a bible to many), but in a rational and well-explained way. There's lots of talk about over-abstraction, which can end up complicating code reading in the long run.

An idea I've seen a lot here on HN is that DRY is good past a baseline amount of reuse. If we see the same pattern twice, maybe it's not a good abstraction yet, since we haven't seen it grow. If we see that same pattern 15 times, I think we know an abstraction is handy here.


I believe that’s called WET (Write Everything Twice). It’s a useful reminder to not be dogmatic about DRY.


Perhaps the problem is that we've stopped teaching people about coupling and cohesion, along with the mechanism of stepwise refinement.

We abstract to functions to reduce cognitive load and to let the language's scope rules and information hiding prevent local variables from becoming pseudo-globals.

The whole premise of the 'goto considered harmful' structured programming movement was to let us replace control structures with a single black box consisting of input-process-output, which aided reasoning.

The premise was to construct the program from cohesive functions that are lightly coupled.

When did we move away from that?


Notably, this already starts DRY.

A less DRY version would look like:

  def make_hawaiian_pizza():
      payload = {
          hwaiianCrust: "thin",
          redSauce: "tomato",
          cheese: "regular",
          ham: true,
          pineapple: true,
          toppings: ["ham"]
      }
      requests.post(PIZZA_URL, payload)

  def make_pepperoni_pizza():
      payload = {
          pepperoniCrust: "thin",
          crust: "thick",
          sauce: "tomato",
          cheese: "regular",
          toppings: ["pepperoni"]
      }
      requests.post(PIZZA_URL, payload)


We could move away from prescriptive "programming principles" and towards ideas that empower people to use their own judgment. Instead of "Don't Repeat Yourself", it could be "You Don't Have to Repeat Yourself".

Now I know what I'm getting myself into here. Most people hate making their own choices and love to blindly follow simple prescriptive rules which are known by Experts to produce Good Results. But when the religious approach isn't working for you, maybe it's time to stop making your occupation a religion.


People should reconsider writing articles like these. It's just a list of (3!) criticisms. You can still have your click-baity title, but why not write about when and how to use principle XYZ instead?


Like anything, it is just a tool in our toolbelt and should be used carefully. If our system contains the same user validation in 2 places, changing it in 1 place may lead to issues which are difficult to discover. However, forcefully applying DRY everywhere can lead to coupling and a lack of separation between modules, and can affect deployments and the work of different teams. It's also more difficult to build context about the implementation if one needs to jump from file to file. There's a balance to when to use it or not.


> same user validation in 2 places, changing it in 1 place may lead to issues, which are difficult to discover.

Disagree. Discovering hard-to-discover issues more quickly is actually a good thing.

> Its more difficult to build context of the implementation, if one needs to jump from file to file.

Agree.

In general, I personally look at DRY as "it takes time to implement and/or understand it, but once it's done, it works and it will last." "It takes time" is something your boss won't like, but that's (mostly) not an issue when working on open source SW. No boss -> no pressure -> higher quality.


What about something like this (in JS)?

  // Option 1
  const Pizzas = {
    Hawaiian: {
        type: 'hawaiian',
        crust: 'thin',
        sauce: 'tomato',
        cheese: 'regular',
        toppings: ['ham', 'pineapple'],
    },
    Pepperoni: {
        type: 'pepperoni',
        crust: 'thin',
        sauce: 'tomato',
        cheese: 'regular',
        toppings: ['pepperoni'],
    },
  };

  // Option 2
  // Pizzas could be returned from an API, so that the pizza
  // types are configurable outside of the code
  const response = [
    {
        type: 'hawaiian',
        crust: 'thin',
        sauce: 'tomato',
        cheese: 'regular',
        toppings: ['ham', 'pineapple'],
    },
    {
        type: 'pepperoni',
        crust: 'thin',
        sauce: 'tomato',
        cheese: 'regular',
        toppings: ['pepperoni'],
    },
  ];

  const makePizza = (pizza) => requests.post(PIZZA_URL, pizza);

  // Then, for Option 1
  makePizza(Pizzas.Hawaiian);
  makePizza(Pizzas.Pepperoni);

  // Or, for Option 2 (e.g. user selected via a menu)
  makePizza(selectedPizza);


That's very clever, but the OP's point is that you shouldn't try to be clever. Your life will be easier if the code is simple.

When I read the initial example I can understand it in the time taken to read the code.

With your example I had to think for about 1-2 min before it made sense. If the codebase is full of clever stuff then I have to spend hours understanding all of the clever things before I can make changes. If everything is simple then it's easy to change.

If you want to see where overengineering leads you then take a look at this project. https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...

It is satire but I have absolutely worked in places that write code like that.

Good programmers know that it's 10x harder to read code than to write it, so they deliberately keep it simple so that they can read it later.


Thank you for your response. Would you mind clarifying what is clever about it? Thank you very much.


> All these ideas are great. But remember that the fundamental goal here, is to send a POST request with a single JSON object. That is a very, very simple thing to do. Now we are talking about all kinds of fancy programming stuff to try to solve problems that only exist because we don't want to repeat the same 6 line snippet in a handful of different places because DRY tells us that's bad.

Yes, that is a very common beginner mistake. Sometimes a little copy-pasta is needed to avoid over-complicating what needs to be simple.

Where DRY is really important are things like:

- Hey, you seem to be using "foo" and "bar" all over the place. Put those strings in constants.

- Hey, you're using magic numbers all over the place. Use an enum (or constants depending on your language / situation.)

- Wow, you copied and pasted that logic all over the place. Now when we need to make a change we have to make it in 20 spots. That should be encapsulated in a function / method / object

- (And to get closer to home) Even though you just want to "send a POST request with a single JSON object," we have a common session management pattern and error handling pattern in our application to deal with this API. That particular pattern should be encapsulated so you aren't repeating it for every #%$#@ API request (one possible wrapper is sketched below).
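A minimal sketch of such a wrapper (the helper name and endpoint are invented for illustration):

    import requests

    API_BASE = "https://pizza.example/api"  # placeholder base URL

    def api_post(path, payload):
        # The one home for the shared pattern: JSON encoding, a timeout,
        # and turning HTTP errors into exceptions, so no caller repeats it.
        resp = requests.post(f"{API_BASE}/{path}", json=payload, timeout=10)
        resp.raise_for_status()
        return resp.json()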


DRY is about code, not data. If you see two methods doing the same thing somewhere in their code blocks, separate the duplicated logic into its own method, then call that method when you need to execute that logic.

The reason for this is maintainability. Say that logic needs to be changed. If you didn't break it out into a callable method, you'd have to find all the places you use that logic and change it. If it is a callable method, you only have to change the logic in one place, thus DRY.

https://en.wikipedia.org/wiki/Don%27t_repeat_yourself

Here is an overly simplistic example:

If you see this scattered around your code:

  var formattedName = firstName + " " + lastName;
Create this method and call it when you need it:

  string FormatName(string firstName, string lastName)
  {
    return firstName + " " + lastName;
  }

  var formattedName = FormatName(firstName, lastName);
When business decides it wants to change name formatting from "firstName lastName" to "lastName, firstName", you only have to change the logic in the FormatName method because you "didn't repeat yourself."


> DRY is about code, not data.

No, it's about the representation of information in a system, as your own Wikipedia link states right up at the top of the second paragraph. Code and data are both forms in which information may be represented in a system.


>No, it's about representation of information in a system, as your own Wikipedia link states right up at the top of the second paragraph.

That's a little too broad to be useful, IMO. I learned the DRY principle before that book came out, and it was strictly about not repeating yourself in code. You can apply it to other things, but it was originally code. Normal form is a good example of applying DRY to RDBMS systems/schemas. "Single source of truth," is a good example of applying DRY to separate database systems. None of those were considered part of the DRY principle when I was starting out.

>They apply it quite broadly to include "database schemas, test plans, the build system, even documentation".

The article even hints that those particular authors expanded it beyond its original intent. They certainly didn't invent it.


This article's conclusion and headline are shallow. The example is a good one about where (depending on context) adding abstraction to reduce repetition might add complexity. Great point! Repetition is fine in cases like that.

It does not invalidate DRY concerns in general, though! For example, an important reason to avoid repetition is that it adds maintenance inertia. Where a setup with a single location is easy to experiment with and improve, one that's repeated in several places becomes tougher to change. I.e., friction. I would argue this is also adding complexity - the very thing the author seeks to avoid.

You could also imagine contexts for the pizza example where the repetition doesn't make sense and a refactor could make things easier. You can't tell from the snippet alone.

From the headline and the concluding paragraph, it feels like a straw man. E.g. "overrated", and:

> "Well, obviously I'm not saying we should throw DRY completely out the window. I'm not sure it would actually be possible to write code that "never doesn't repeat itself". But I do think we should tone down knee jerk reactions to PRs that contain several repetitions of a block of code. There are at least a few cases where that might be the exact right thing to do."


> but I would assert that any change that doesn't modify the existing calls of make_pizza or make a totally separate function for split topping pizzas (not DRY) will be some level of bad.

You make a make_pizza function that supports split toppings, and you gut the original make_pizza so that it just calls the new function with left_toppings=toppings, right_toppings=toppings. You don't need to ruin your function signature with *args.
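Concretely, that restructuring might look like this sketch (payload keys assumed from the article's examples):

    import requests

    PIZZA_URL = "https://pizza.example/order"  # placeholder

    def make_split_pizza(left_toppings, right_toppings):
        requests.post(PIZZA_URL, {
            "crust": "thin",
            "sauce": "tomato",
            "cheese": "regular",
            "left_toppings": left_toppings,
            "right_toppings": right_toppings,
        })

    def make_pizza(toppings):
        # A whole pizza is just a split pizza with two identical halves.
        make_split_pizza(toppings, toppings)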

The fundamental assertion is that you should structure your code so that it reflects reality, but reality is really bloody messy. The immediate response to this example is that pizza is in fact toast[1], and so you should actually have a make_toast function that handles all forms of toast. This is clearly ridiculous: if you're building a system to make pizzas and you build your function in a way that extends as far as building nigiri sushi, you're an idiot. You have to make a reasonable judgement about the underlying structure you want to reflect. It's not a coincidence that Hawaiian and pepperoni pizzas are structured the same.

[1]:https://cuberule.com/


This is like saying "database normal forms are overrated - they make my SQL more complex and harder to read!"

Well, yes, they do. They also make your database slower. This doesn't mean they are overrated - this means they are tradeoffs. Like everything engineering.

With DB normal forms you buy integrity (i.e. keeping the data consistent as it changes) and you pay with performance and schema complexity. Usually this is a sensible tradeoff because integrity is more important. But, for example, if your data never changes - you will be paying for nothing. Or, perhaps, you can't afford the performance price and you have some other way to ensure integrity. Then you de-normalize.

DRY is similar. As mentioned in the sibling thread, it's not about mechanically avoiding repeated code - just like normal forms are not about never having the same value in two different rows. It's about maintaining the logical integrity of your code as it changes. PI=3.14159265359? Probably safe to copy around. An implementation of some use case? Probably not.

I'd say following the DRY/SPOT principle is a sensible default. If a reviewer asks you why your code is not DRY, you should be able to articulate a reason.


Thanks for this article; I was just talking with my colleagues about this and didn't have something simple to share with them, so it was just what I needed.

I think DRY is a good thing in some cases, but you should carefully consider when something is worth DRYing and when WET gives you the better tradeoff for isolation.

My metric for deciding is to favor the Single Responsibility Principle: if DRY means compromising it, it's most likely not worth it.


Probably the most common comment from me in code reviews is about code being prematurely DRY’d out. Fortunately, I’ve found that mentioning the Rule of Three is usually enough to correct it, and it tends to stick in the mind. Waiting that little bit longer, more often than not there’ll be no third instance of some pattern and no need to abstract it. When a pattern does emerge, it’s a clearer one. Either way it’s less work.


Except, once you are done, you probably never have to touch that code again, and creating a class does not really take long.

Depending on the given problem, this is sometimes more time consuming, but still gets easier and faster with practice, and the benefit down the road can be tremendous.

Associative arrays are bad even for such simple things, IMO, because they break autocompletion and code inspections, and your functions become black boxes that are hard to understand without reading the implementation. Sometimes this is evident even when it's just you working on the code: leave the code for a few months, return to it, and you waste time relearning how to use your own code because it is not self-documenting, and you can also forget or misspell an array key. Etc. This is not the case if you define data types as objects instead of using arrays.

I learned this through trial and error myself; I used to use associative arrays a lot, and now I find myself using/creating objects more often, and I just love returning to this code later and having it work without too much crapping around.
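A small sketch of the difference (with a hypothetical Pizza type):

    from dataclasses import dataclass, field

    @dataclass
    class Pizza:
        crust: str = "thin"
        sauce: str = "tomato"
        cheese: str = "regular"
        toppings: list = field(default_factory=list)

    hawaiian = Pizza(toppings=["ham", "pineapple"])
    # Pizza(topings=[...]) fails immediately with a TypeError, whereas a
    # misspelled dict key like payload["topings"] is only a runtime surprise.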


There was rarely a point where I didn't regret using an associative array instead of a class. Not only does a class add semantic meaning, you can also add constraints and methods to it.


Uh, no. Like, you're-about-to-endure-a-discussion-at-the-lead-developer's-desk no.

"Copying and pasting a few lines of code takes almost zero thought and no time"...and you're fucked. Pardon my French but it's warranted. Been there, done that, now know better.

Firstly, software development is a thinky sport, and the moment you're cut-and-pasting code while not thinking, you're exhibiting risky behavior. Here come the bugs.

Guess what happens next: senior devs are busy, so the simple bugfix is assigned to a junior or farmed out to a contractor. They fix just one of the cut-and-pasted routines and call it a day. Then you play a few iterations of the PR-to-test-failure game, or you ship a bug. I see this all the time. I saw it yesterday. After it fails the tests enough times and I get the PR, I have them DRY that code up.

This is especially important if you've inherited a crappy code base with lots of duplicate code. We have a rule that if you touch it, you DRY it. That rule has never failed to serve us well. Getting people outside the core team to stick to it is work, but that's a different problem.


API/framework complexity can often be minimized within reason, but given the code-template nature of pattern components it is unreasonable to expect optimization without incurring tightly coupled code/structures.

For example, a project using a framework may only require a developer look at 4 small files to understand the functionality of a resource, and it acts as inline documentation to others on how to quickly contribute new features. In a way, through explicit separation of resources the “similar” code tends to differentiate rather quickly as use-cases rarely share the exact same context throughout the entire life-cycle of a program.

The worst maintenance teams of popular projects permute an API definition every 6 months and break existing production code in downstream works. You know, ironically still building that bug-infested Ivory Tower everyone assumed they could avoid with grossly oversimplified acronyms (https://en.wikipedia.org/wiki/Ivory_tower). ;-)


No, it's a pretty good principle, but just like everything in programming, there are no hard rules. Every principle has times when it is appropriate and times when it isn't, and it's our job to find what is most appropriate for the situation at hand. It's still worth striving towards DRY, but that doesn't mean there aren't many cases where it doesn't improve the code.

The examples are also pretty contrived, there’s hardly any duplication there and the duplication is very simple and unlikely to change much. DRY is beneficial when the repeated code is complex and will likely need to be changed in the future (eg to fix bugs or to be extended), where repeating the code will be a source of error since every change would then need to be applied to each instance and forgetting one is a problem. The example is trivial enough that I wouldn’t bother refactoring it until it became complex enough to be a problem.

A good principle is to not apply principles too quickly/soon but only when not doing so would introduce complexity or cognitive overhead. YAGNI, basically.


As usual when someone is trying to show how certain code idioms are bad (or good, in some cases), the entire example is bad, which makes it hard to even care about the point of the article.

As some have pointed out, make_pizza shouldn't be anywhere in the code. There should be a Mongo collection or relational table that holds a bunch of typical pizzas, and a way to make custom-ordered pizzas, typically through a UI.

The more complicated thing is the data structure that represents any pizza (like 50/50, 10/90, 10/80/10, toppings per section, etc...)

And a mongo collection for "typical" pizzas would fit that pretty well. And beyond that custom ordered pizzas.

All that being said, even the above is over-engineered. Typically this over-engineering is a result of allowing end users to create a pizza. I'm super old school and still call in my orders on the phone. The difference is that end-user UIs that let you make a pizza need to be over-engineered, while call-in orders are just a bunch of notes and a total "additional toppings" count.


People forget that DRY means "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system".

The principle concerns the duplication of knowledge, not code.

The author's "accidental duplication" is two pieces of code that merely look similar while representing different knowledge; when they are joined, the result is code that is ambiguous in meaning.


There's no silver bullet. My opinion is that you should weigh the alternatives, with and without repetition, and choose the most appropriate one. And if in the future you feel that this common code is becoming more complex, with options and switches, feel free to remove it by inlining, either completely or in a few places. Often that allows for better code, or reveals another way to extract common code.

Basically I like the refactoring approach. You have a set of refactorings. Like extract method / inline method. The point is that every refactoring is two-way. And both ways are useful in different situations.

To support this approach, a sane IDE is a must and a strictly typed language is preferable. You should be able to refactor your code without fear of breaking unrelated code.

What I definitely think is overrated is the "if it works, don't touch it" principle. It's lazy, and in the end it creates much more work than gradually improving something that works.


Here's how I apply DRY.

As I build a thing, I just sling duplicate code like I'm getting paid by the line. Once I've created all or most instances of the duplicate code, and they're working, only then do I circle back and refactor to extract common concepts and abstractions.

I've learned over the years that I don't really understand what I'm building until it's built. I need to step back and look at the patterns that have formed to spot the difference between firm concepts and trivial duplication.

Doing this well takes experience. It involves predicting how your code is likely to change. One lesson from experience is to favor repetition over bad abstractions. I formed this opinion by living through the pain of both types of anti-pattern. Duplication creates the need to find, change, and test every instance; that sucks, and it leaves you open to bugs. But bad abstractions can require ripping the whole thing apart.


DRY absolutely can get this ugly and messy (I've seen it many times), but I believe that this is largely an experience problem. You have many tiers of DRY knowledge:

- Have heard of it and it sounds like a good idea

- Let's DRY everywhere!

- OK, maybe don't DRY everywhere...

- There are multiple ways to implement DRY, and it all depends on circumstance

Taking the article example, a better approach would be to DRY the data first (after discovering that in your organization the most common pizza is thin crust with tomato sauce and regular cheese):

    import requests  # assumed to be available, as in the article

    PIZZA_URL = "https://example.com/api/pizzas"  # hypothetical endpoint

    STANDARD_PIZZA = {
        "crust": "thin",
        "sauce": "tomato",
        "cheese": "regular",
    }

    TOPPINGS_PEPPERONI = ["pepperoni"]
    TOPPINGS_HAWAIIAN = ["pepperoni", "pineapple"]

    def make_pizza(design):
        requests.post(PIZZA_URL, json=design)

    def make_standard_pizza(toppings):
        make_pizza({**STANDARD_PIZZA, "toppings": toppings})

Now it's easy to use with no repetition:

    make_standard_pizza(TOPPINGS_PEPPERONI)
    make_standard_pizza(TOPPINGS_HAWAIIAN)
    make_standard_pizza(["pepperoni", "ground beef", "olives", "feta cheese"])

You can easily add to it:

    TOPPINGS_VEGETARIAN = ["green peppers", "tomato", "spinach"]

Then when you need to expand for half-and-half:

    def make_standard_half_and_half(left_toppings, right_toppings):
        make_pizza({**STANDARD_PIZZA,
                    "left_toppings": left_toppings,
                    "right_toppings": right_toppings})

    make_standard_half_and_half(TOPPINGS_HAWAIIAN, TOPPINGS_VEGETARIAN)

This gives you both low level and high level (convenience) interfaces to pizza generation, with none of the silly class complexity or function explosion.


That's quite ugly.


He doesn't address the problem that, in a large project, it's sometimes better to copy and paste a function than to take on an extra library dependency. DRY and over-coupling are two forces that must be balanced by the engineer. DRY matters more at small scales; over-coupling matters more at large scales.


Generally, I never think in terms of writing "DRY" code. I think the presumption of re-usability is a primary reason.

I architected a React.js framework that needed to exist in our existing portal environment and play well with all of the other frameworks and scripts. My solution was to tightly couple bundles of code for widgets deployed on my platform. I have conventions all devs need to follow, and it does result in not-very-DRY code.

The benefit is that everyone can work independently and not affect each other. Testing is easier to do, as there are fewer logic paths. Performance is still optimized with code-splitting, so the extra code really doesn't affect performance.

Whenever people try to create a DRY one-size-fits-all solution, I find them very inflexible and prone to breaking. Add to that, they are generally poorly documented, so making changes can be very stressful.


It looks to me that the author has not faced a problem with DRY, but a junior developer and/or YOLO approach to architecture design.

In many cases, designing around possible future features/extensions turns out to be limiting, complicating, and in the end irrelevant, because the future usually turns out different from how it was imagined. On the other hand, taking into account just what you know at the time of writing leads to easier adjustments, because the code is much clearer to understand.

DRY is a great technique for organizing your code; you just need to think about the structure first. In this article, the problem was that the author wanted to bend 2 pizza methods into many other different types of pizzas, so his pizza architecture changed so vastly that it could no longer be described by the initial idea of a pizza.


I think you could make a decent case for DRY being the only principle.

If there's something getting in the way of DRY, it's probably the biggest problem you have. That doesn't mean you have the ability or control to fix it completely in the short term, but the arc of history bends towards DRY, I think.

In this specific article, the first example seems better the DRY way to me. The author seems to suggest that rewriting it later is worse than repeating the logic everywhere, which sounds very fragile. If you can't confidently rewrite an API later, then you're doomed to repeat yourself until it all collapses. You could make a reasonable argument that the developer who fulfils the short-term requirements and has found a new job by the time it all collapses will have a more lucrative career, but I doubt you could argue it's better software.


Only part of this I would quibble with:

> Copying and pasting a few lines of code takes almost zero thought and no time. Find and replace are very good at finding repeating things later if we start to care

Find-and-replace is unfortunately not really adequate for finding repeated code, in my experience. There must be much better tools out there.


The first code example doesn't make sense. There is no good reason to write code like that, hardcoding payloads for different types of pizza. That's the realm of data. Realistically, those payloads will either be constructed by the user in the UI, or they will be provided in a JSON file as predefined variants.


I actually think that the example provided in the article can be solved easily, without adding the OOP complexity the author describes in the third point.

Yes, I actually think OOP is not bad :)

But I disagree with their conclusion:

They say the goal is to send a POST with a single JSON object, and they mark this task as:

> That is a very, very simple thing to do

That is a very simple thing to do if you think that you will write this code once and never have to change it to fit some new requirements.

But probably there will be changes, either from your own business or because the API will change, and thus the task becomes:

<< How can I implement sending a POST request that follows the required body format, with code that is simple to understand and _easy to change_? >>

And since, at the time you write this code, you cannot know what kind of change will come in the future, the best way forward is to write small functions with very few conditions, open to extension.

Thus repeating that code is not a good solution. What if the API requires adding a new key to the payload? And what if those two methods (make_hawaiian_pizza and make_pepperoni_pizza) are not in the same file, and the person doing the change is not the original author who would remember, "ahh, the code is duplicated, so I have to change it in multiple places"?

Anyhow there are cases when duplication is good, but when composing the payload for a request is not one of them :) IMHO.

Let me add one more thought: the structure of code tends to be duplicated in the future. So choose not to DRY keeping in mind that the people who write code after you will tend to make the same choice: they will look at what you wrote and follow a similar structure.

So don't DRY, but make sure you do this in a place where you will be OK with other people increasing the amount of duplicated code.


This post is not very good.

DRY is a really solid principle (pun intended). If you have to define the payload schema for an API yourself (i.e. the API provider doesn't supply a library for you), then you really should define it in one and only one place. It doesn't matter that today you only want a "handful" of hard-coded JSON dicts; that path quickly leads to many headaches and runtime errors.

Implementing the API schema in one and only one place means you limit the sources of bugs for all code that deals with it. You only have to test that functionality in the one place it is defined; if it changes, you don't have to chase down hard-coded JSON all over your codebase; etc.
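
For example, a minimal sketch of the one-place definition in Python (the dataclass and endpoint are invented for illustration):

    from dataclasses import asdict, dataclass, field

    import requests

    PIZZA_URL = "https://example.com/api/pizzas"  # hypothetical endpoint

    @dataclass
    class PizzaPayload:
        # The single, authoritative definition of the payload's shape.
        crust: str = "thin"
        sauce: str = "tomato"
        cheese: str = "regular"
        toppings: list = field(default_factory=list)

    def submit(payload):
        requests.post(PIZZA_URL, json=asdict(payload))

    submit(PizzaPayload(toppings=["pepperoni"]))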

This post basically amounts to "I want to write lazy bad code and DRY tells me not to."


I'm interested to know more about what you mean in this context by "define the payload schema for the API yourself". Can you provide an example?



If you have the same code copied to several places, it becomes much harder to maintain. If you find a bug in that code block, then you have to fix it in several other places, and if you forget some, a bug that you thought you had already fixed might arise again.


I think DRY is one of the first pieces of advice that many programmers come across. Taking it to heart as a beginner has the advantage of encouraging you to sometimes stop and give a bit more thought to what you are actually doing. Is this essentially the same as what I'm doing over there? How is it different? Why is it different? Etc...

As you gather experience you can recognise these patterns more easily and develop a stronger intuition for when you should pursue a DRY approach or just leave things as they are. You might even choose to make two things even more similar so that they feel more familiar (i.e. choosing to be even less DRY!)


Translation: I took DRY in isolation, never learned larger architectural concepts like SOLID, things that change together live together, loose coupling. Didn't really study up on code smells and their solutions. Didn't learn design patterns and the problems they are designed to solve. Then I ran into trouble and now I blame DRY.

The point of DRY is that when you need to change something, you should only have to change it once (a common code smell that comes about when not employing DRY is "Shotgun Surgery"). If the rest of your architecture is broken, DRY is not going to magically save it. That should be obvious.


Now. I am not a 10x developer.

But this concern here with writing a make_pizza() function:

> The problem is that these two pizzas just happen to have the same crust, sauce and cheese. Had we started out with two pizza types that have different crust/sauce/cheese, we never would have made this refactor.

You can solve for this by adding parameters with default values for all of those things. Use the defaults for the most common use cases, but of course for pizzas with different crusts and such, you can override the default.

Not all languages support default parameter values, it's true. And of course, there is a level of complexity at which this breaks down.
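
A sketch of that suggestion in Python, which does support default parameter values (the endpoint is invented):

    import requests

    PIZZA_URL = "https://example.com/api/pizzas"  # hypothetical endpoint

    def make_pizza(toppings, crust="thin", sauce="tomato", cheese="regular"):
        requests.post(PIZZA_URL, json={
            "crust": crust,
            "sauce": sauce,
            "cheese": cheese,
            "toppings": toppings,
        })

    make_pizza(["pepperoni"])                          # common case: defaults
    make_pizza(["ham", "pineapple"], crust="stuffed")  # override as needed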


Like all principles, if you don't understand it, then you're going to misuse it, and then some people will wrongly blame it for being "over-rated" rather than blaming their own understanding of it.


My personal rule of thumb for this type of situation is to use the counting scheme “one, two, many” and to try to defer commonizing before you get to “many” instances of the repeating pattern.

It’s really easy to make assumptions about what you are going to need later that turn out to be completely unfounded (or even - years later there is no “many”, just the one or two usages you already have).

And I think folks shouldn’t freak out over a little bit of duplication, as long as it doesn’t get out of hand in the codebase, and you make sure to come back to refactor later when you do have many common usecases.


But wouldn't rejecting DRY (in a way) follow another over-rated programming principle called KISS? /s

In all fairness, the best programming principle is: just like with software licenses, know what to use and when.


I agree with the principle of this blogpost, though IMHO this was covered better and with better examples here: https://lbrito1.github.io/blog/2017/03/dont-obsess-over-code...

I also think the title of "don't obsess" covers the intent better. In other words, it's perfectly OK to write DRY code, but don't obsess over making all code DRY all the time at the expense of readability.


> Now we are talking about all kinds of fancy programming stuff to try to solve problems that only exist because we don't want to repeat the same 6 line snippet in a handful of different places because DRY tells us that's bad.

Introducing unnecessary complexity is, by definition, unnecessary. But we shouldn't be introducing complexity because DRY tells us. We should introduce complexity - some, but not more than needed - because some day a developer will know to update 2 of these 6 line snippets, but won't know about the third.


The best principle is KISS https://en.wikipedia.org/wiki/KISS_principle

> "Keep it simple, silly", "keep it short and simple", "keep it short and sweet", "keep it simple and straightforward", "keep it small and simple", "keep it simple, soldier", "keep it simple, sailor", or "keep it sweet and simple".

It's true that often, complexity is praised...


Do what's best for the situation; both WET and DRY have valid use cases. Apply whichever gives the most advantage for your usage.


I dunno, the only thing I came away with here is that the author either doesn’t know how to write decent code, or deliberately obfuscates the problem by writing obtuse stuff.


To help figure out when to DRY or leave WET, I like to use the rule of 3, where a piece of code needs to be repeated 3 times before it is DRY'd up. You can adjust this number from 3 to whatever you like.

What I really like about the rule of 3, vs. the rule of 2, is that it allows more time to go by, which may lead to the two pieces of code no longer being identical as requirements change. That would either remove the need for the abstraction or allow for a more accurate abstraction.


> Which would either remove the need for the abstraction or allow for a more accurate abstraction.

Or it may indicate a bug, or nothing at all, because the differences aren't in the logic but, for example, in variable names.

That can happen deliberately, or because someone didn't find the duplication, or never searched for it in the first place.

The more time passes, the higher the risk.


Speaking from many painful experiences, DRY is underrated. Duplicate code is a major liability, and due to the natural entropy of code, duplicate sections will slowly drift apart over time.

Yes, sometimes you may discover that you prematurely DRY'ed the code, and it was just accidentally similar. Easy, you just un-DRY the code. This is a trivial operation. In an IDE it might be a single keyboard shortcut. Going the other way is a difficult and error prone process.


I'll claim the only universal truth in programming, as in anything, is "moderation", and its corollary, "there is no silver bullet". These are all rules of thumb, and knowing when to apply them is the most important aspect of rules.

In this case, unit tests are a very reasonable place to "repeat yourself", so you don't have to figure out what the code under test is actually doing. Seeing it all in one place makes life easier.


DRY has the same idea as modules. Write once and re-use it.

This article is the result of someone exploring a new idea and cussing it out because he has to change his ways. I've been there countless times.

DRY is not overrated. DRY is a time saver in the long run. Why is this even on top of the 1st page, how new are you to development anyway?

Sorry but I'm really getting pissed off by people wasting my time with useless articles lately. It's getting out of hand.


Valid points, but it doesn't make DRY overrated. Just abused / taken too far if one is overeager. That's true of any principle if you're being dogmatic about it and start treating it as a goal in and of itself (rather than as means towards a goal). This, to me, fits the typical "X considered harmful" (headline) / "X considered harmful when done badly" (actual content) template.


I'm at most a 2x developer and definitely have solutions in mind.

For one thing, pizza recipes are either dynamically built by the customer or they are just fixed recipes.

Having individual functions for different predefined pizzas is not really DRY enough for me.

I would have a pizzas.yaml file and a get_pizza_recipe("id") function. Maybe I'd even read pizza data from an Excel spreadsheet directly, for ease of editing and sharing by management if needed.
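
Something like this minimal sketch (the file layout and keys are invented; uses PyYAML):

    import yaml  # PyYAML

    # pizzas.yaml might look like:
    #   pepperoni:
    #     crust: thin
    #     toppings: [pepperoni]
    #   hawaiian:
    #     crust: thin
    #     toppings: [ham, pineapple]

    def get_pizza_recipe(recipe_id):
        with open("pizzas.yaml") as f:
            recipes = yaml.safe_load(f)
        return recipes[recipe_id]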


It's worth knowing when it's useful to apply. Needlessly duplicating code also creates another kind of complexity: large surface areas of change that need to be updated in tandem. If you get changes where you have to change multiple places at once in the same way it's a good sign you need to do some refactoring before someone accidentally introduces an error into the program.


That's just a justification for unnecessarily using microservices, which basically leads to 75% of your code being redundant plumbing. Microservices (correction: distributed systems) purposefully throw away DRY, small cohesive teams, developer ergonomics, and streamlined debugging, in favor of solving hard problems at scale, problems which most companies simply don't have.


DRY often leads to too much abstraction which, while correct in software-engineering theory, increases complexity and maintenance costs.

That being said, I always allow myself to repeat code until a good-enough pattern emerges from the repetition, and then I refactor. Having the same code twice is not always sufficient to reveal the right refactor, if any.


The one and only important programming principle: use your brain.

Not every problem is the same, and not every pattern can be used to solve every problem.


This is true, but it is not very helpful advice for a novice programmer. DRY is popular because it is easy to follow. There are even tools that look for similar chunks of code in multiple places (and some companies put such checks in CI, which IMO is a bad idea).


DRY requires some judgment.

If you have two features that have N parts in common, and you are certain that they will never diverge for feature-specific customization or special cases, then DRY is probably a good idea

If they may diverge at some point, then structuring the code in anticipation of that divergence is a good idea, else you end up with a messy DRY implementation that inevitably has to fork


The key skill is to distinguish between harmful repetition, harmless repetition and beneficial repetition.

Say you have two templates (web pages). They are conceptually independent and serve two different business purposes. Yet in terms of their structure/content/whichever, they have about 20% in common.

Somebody obsessed with DRY would now elevate that 20% into some reusable module, after which both templates use it and the repetition is gone. Feels clean.

In reality, you didn't solve a real problem, while you created a new one. Now individuals/teams cannot independently edit these templates, as they need to understand and check the dependency tree. It's no longer simple; there's no peace of mind.

Next, inevitably somebody is going to request changes impacting that 20% and before you know it and after cutting lots of corners, you end up with this freak component that changes output based on some flag.

It's taken me 20 years to come to this conclusion: the negative effects of (too much) DRY (it increases complexity) show up in every single project and make code harder to understand and change. Meanwhile, the negative effects of allowing (some) repetition are mostly theoretical, and more often than not the repetition is a benefit, not a negative.

I mean it. This is coming from an ex-DRY fan boy. The DRY principle makes us eager to connect dots that really aren't connected and shouldn't be connected.


For an excellent code base to see DRY in action, look at tinygrad: https://github.com/geohot/tinygrad

I believe it has the potential to be a great alternative to PyTorch.

I love watching GeoHot's Twitch streams as he goes to the extreme to simplify the codebase, and the end result is amazing.


If I’m not mistaken, the original point of DRY is to not repeat data/state, rather than to not repeat code.

Being DRY about state is actually really useful (essential!): where state can be derived, it should be derived rather than stored in a new variable. Being DRY about code is often less useful in my day-to-day coding, but I’m not a library designer, I just make apps. /2c
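
A small sketch of DRY state (the Cart class is invented): store the items once and derive the total, instead of keeping a second total variable that can go stale:

    class Cart:
        def __init__(self):
            self.items = []  # the only stored state: (name, price) pairs

        @property
        def total(self):
            # Derived on demand; there is no duplicate copy to keep in sync.
            return sum(price for _, price in self.items)

    cart = Cart()
    cart.items.append(("margherita", 9.50))
    print(cart.total)  # 9.5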


Looks like a perfect case for the builder pattern; that way you can support sensible defaults in just about every language.

    default_pizza()
        .with_crust(Crust::Cheesy)
        .with_sauce(Sauce::Garlic)
        .add_topping(Topping::ExtraCheese)
        .cook()
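
A runnable Python version of the same sketch (all names invented):

    class PizzaBuilder:
        def __init__(self):
            # Sensible defaults that individual calls can override.
            self.crust, self.sauce, self.toppings = "thin", "tomato", []

        def with_crust(self, crust):
            self.crust = crust
            return self

        def with_sauce(self, sauce):
            self.sauce = sauce
            return self

        def add_topping(self, topping):
            self.toppings.append(topping)
            return self

        def cook(self):
            return {"crust": self.crust, "sauce": self.sauce,
                    "toppings": self.toppings}

    pizza = (PizzaBuilder()
             .with_crust("cheesy")
             .with_sauce("garlic")
             .add_topping("extra cheese")
             .cook())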

I'm not going to weigh in on the DRY stuff because it's being discussed to death. I just liked thinking about how I would approach this problem.


There's nothing wrong with DRY. It's a concept, not a law. I once saw some VB6 code that was 30k lines of copy-pasted if/then statements; DRY would have reduced it to about 500 lines of highly readable code. Are there cases where you by design don't want to follow DRY? Yeah, there are. But it's not a useless principle.


> It's also probably one of the simplest principles to understand.

Turns out: no, it isn't!

I think DRY and KISS are probably the most important and most misunderstood principles by far. Why? Because they seem trivial at first sight, but really are not.

Not every repetition should be DRYed (DRY those which pose a risk to integrity), and "simple" is not the same as "easy" or "familiar"!


The problem with DRY is that it doesn't tell you how much repetition is too much.

Here's my advice: don't refactor when you repeat yourself twice; refactor when you repeat yourself _three_ times.

Having one chance to copy-paste before DRY'ing your code has been one of my most treasured coding tricks; it'll save you so much time and premature refactoring.


Agreed. Sometimes even three times isn’t enough. You’re being paid obscene amounts to make these judgement calls anyway.

If all your API does is return two pizza description jsons then keep it that way. If your client is a pizza delivery company and your api is supposed to allow definition and customization of pizza recipes, then you better take your ass to DRY town. Don’t blame the principle when you can’t understand what it is that you’re abstracting.

I have time and again gained enormous benefit from pursuing the DRY principle to its absolute core: 20x speed and code-complexity optimizations, making entire teams obsolete, etc. The most important point is to make sure that your abstractions absolutely match the fundamental principles of the concept you're trying to represent. No matter how verbose you think it's getting, it's totally worth it if this is your bread and butter.


"Sometimes duplicating things, either code or data, can significantly simplifies a system. DRY isn't absolute." - John Carmack

ref: https://twitter.com/id_aa_carmack/status/753745532619665408


DRY is mostly a good thing. What complicates stuff at companies is often coupling and dependencies, though. Sometimes it is way faster and better to do a bit of copying just for the sake of removing coupling. This is largely due to the shortcomings of programming languages, packaging tools, CI/CD, and source control.


> Having the meaning of the first argument change because you passed an optional second argument is very odd.

Is it, though?

And yes, I would totally just make it so the function detects whether I am passing in an object or an array of objects and respond accordingly.

I feel like I got this pattern from jQuery or something. Seems very normal for a good library.
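
In Python the same overloading trick might look like this sketch (the endpoint is invented):

    import requests

    PIZZA_URL = "https://example.com/api/pizzas"  # hypothetical endpoint

    def make_pizza(design):
        # Accept either one design dict or a list of them, jQuery-style.
        designs = design if isinstance(design, list) else [design]
        for d in designs:
            requests.post(PIZZA_URL, json=d)

    make_pizza({"toppings": ["pepperoni"]})                 # single object
    make_pizza([{"toppings": ["ham"]}, {"toppings": []}])   # array of objects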


No, the correct way is: write down some configuration, then use a little code to turn the config into a real pizza!


I think the underlying meta principle here is: Don't be dogmatic in your following of principles. Make sensible choices for the use case at hand which might be informed by the spirit of principles, but don't treat them like some biblical commandment that has to be applied at all time.


  The apprentice doesn't know about it. 
  The journeyman uses it dogmatically.
  The master uses it thoughtfully.


Misses the most important reason.

Repetition creates symmetrical cases. As they say in German, 'einmal ist keinmal' -- once is never. Overly DRY code is incredibly non-educational.

Why?

Because you learn by comparing and contrasting -- if you can't compare, you can't contrast, and therefore, you cannot learn.


One article advocating against unnecessary abstractions that I really appreciate and highly recommend is Dan Abramov's: https://overreacted.io/goodbye-clean-code/


To me, DRY, like Single Responsibility, is practical to a point, after which it doesn't support its weight. It strikes me as being similar to normalization in relational databases and summed up by "Normalize till it hurts, denormalize till it works".


> The problem is that these two pizzas just happen to have the same crust, sauce and cheese.

The problem with analogies is that they are often bad. You don't specify that you want tomato sauce and regular cheese when you order your pizza, because that's the default.


New devs start out doing DRY everywhere. Over time they learn to be more thoughtful. Sometimes duplication is good. In my experience the priority should always be dev readability: if duplication helps you read the code better (as it's not hidden away), that's fine.


On the other hand, I find most frameworks don't allow for DRY. The typical case is making a 40-char DB field, and then having to code a check in the payload to ensure the field is 40 chars. I've wondered if any system has achieved such enlightenment.
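
One workaround, if the framework won't derive the check for you, is to keep the limits in a single structure that both the schema definition and the validator read. A sketch (all names invented):

    # Single source of truth for column widths, read by both the DDL
    # generator and the request validator in this hypothetical setup.
    FIELD_LIMITS = {"customer_name": 40}

    def validate_payload(payload):
        for field_name, limit in FIELD_LIMITS.items():
            if len(payload.get(field_name, "")) > limit:
                raise ValueError(f"{field_name} exceeds {limit} chars")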


I’ve had success with WET - write everything twice. If you copy paste something three times it’s a good candidate for abstraction. Ultimately these are just rules of thumb and none will fit all cases. I just found DRY to be too aggressive in practice.



In UI work I prefer copy-paste when there is no clear abstraction. So far this has paid off.


For me, a good way to evaluate whether you should combine them is to ask this question: if I need to change it in one location, do I need to do the same in all the locations? If yes, then it's a good candidate for abstraction.


One of the dumbest blog posts I've ever read. Who is upvoting this stuff? If anything I see newer devs _not_ embracing DRY. Some of the codebases I've seen lately are hilariously bad, especially in frontend development.


Guy got upset during a code review and wrote a blog post about how he's right.


At our first startup, we always used to say "RY before you DRY": you have to repeat yourself first, consciously, before you can start with abstractions, because bad abstractions are worse than no abstraction.


I think Kent C. Dodds gets to the goal behind the DRY principle fairly well here:

https://kentcdodds.com/blog/aha-programming


I tend to start DRYing up my code after five usages. I think it's a balance that has kept me away from unnecessary complexity, although the gateway drug is real: I catch myself wanting to DRY early now and then.


A simple lesson on functional intent, variants and invariants, doesn’t need to be some overarching development management principle like DRY. Let’s just teach people how to write clean, readable, loosely-coupled code.


I view strict (and exclusive) DRY adherence as a sign of a less experienced dev. I was like this in my early days, almost religiously following it and creating wrong abstractions. Thanks a lot for those examples.


Premature generalisation is the second cousin of all evil.

The problem with DRY is that the cost of it being wrong in the future often isn’t accounted for.

Copy, paste, search, replace is underrated.

That said, it’s a balancing act. The right generalisations are great.


Search for the repetition and replace it with a single call to a unified method once there are more than enough repetitions in the code.

Probably people just call it 'Refactor'.

The requirement changes and you refactor it to meet requirement.

And that's all you need to do to avoid premature generalisation.


Ironically, the second example has a copy-and-paste bug:

  def make_pepperoni_pizza():
      make_pizza(["pepperoni"])
  
  def make_hawaiian_pizza():
      make_pizza(["pepperoni"])


Sometimes having repeated sections makes it easier to understand the code, and later figure out how to merge the logic to remove them.

Separating it out into a function can obfuscate things making it harder to do that.


I had a general guideline at my previous company which I follow to this day, to great results.

If a piece of code is duplicated thrice, it's OK. If a piece of code is duplicated four times, then you must extract it.


Solve every problem once.

Two times repeated can be much worse than ten times repeated if one is a dizzying mess (e.g. to handhold some tragic third-party API) and the other is a trivial sequence of instructions you'd understand even if dementia forced you to read with a finger on the current line. The trivial code wouldn't qualify as a problem, so it isn't affected by the rule.

(but you might still de-duplify if you know that if it changes it should change uniformly)


I must say, that mediocre article led to very interesting comments.


Might be worth it to redo the examples here in Lisp (with and without point free) and see what happens.

Partial application might make the first example a little less ridiculous, for instance.
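
The same idea works in Python too, as a sketch with functools.partial (the base function is invented):

    from functools import partial

    def make_pizza(toppings, crust="thin"):
        print(f"{crust} crust with {toppings}")

    # The per-recipe "functions" fall out of partial application:
    make_pepperoni_pizza = partial(make_pizza, ["pepperoni"])
    make_hawaiian_pizza = partial(make_pizza, ["pepperoni", "pineapple"])

    make_hawaiian_pizza(crust="pan")  # remaining parameters stay free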


This article is fine, though the headline is misleading. In the end, the author still seems to believe that DRY should be the rule, not the exception, and I agree with that.


When I hear “DRY”, I think of Django with its Models, ModelForms, etc., not enterprise Java where every piece of functionality is hidden behind 15 levels of indirection.


A little copying is better than a little dependency - Rob Pike

DRY code is a good value, but it is not an all important value. It's one of many values, and must be kept in balance.


It's not the end-all, be-all but it is a generally good principle.

But yeah, 100% dry code that is also practical to maintain is also a fucking myth and not grounded in reality.


DRY is better for performance (cache efficiency). It’s also less work for the compiler. Those might not be concerns of someone writing pizza CRUD in python.


DRY is not necessarily better for performance. Loop unrolling is extremely un-DRY and often provides better performance. DRY can also lead to more branches, which can lead to branch-predictor misses, which can impact performance. For example, the author's update to make_pizza to handle split pizzas introduces a branch where previously the code would have been branchless.
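
For illustration, a sketch of a manually unrolled loop (the payoff only materializes in compiled languages, not interpreted Python, but it shows how deliberately repetitive the technique is):

    def sum_unrolled(xs):
        total, i, n = 0, 0, len(xs)
        # Four copies of the loop body: un-DRY on purpose, to cut the
        # per-element loop overhead.
        while i + 4 <= n:
            total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
            i += 4
        while i < n:  # handle the remainder
            total += xs[i]
            i += 1
        return total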


The “branch” looks like an easy cmov target. Moreover, while the function itself might not have a branch, you would have to have one somewhere higher in the control flow anyway.

As for loop unrolling, I bet a loop with unrolled calls to 2 different un-DRY functions will be slower (and not only because of the almost certainly present extra branches selecting between them).


This article is just a rant against what the author sees as popular-but-wrong justifications for DRY, while failing to mention any of the good reasons. -1


Here is my take: if you run your code through gzip (or similar) and it gets any smaller, you've repeated yourself somewhere.

I'd rather read the gunzipped code.


Surely I'm not the only one to notice that the first example's refactoring is wrong, in a strangely ironic fashion too... (but I'm too lazy to contact the author).


Uhhhh... anyone who says DRY is overrated has never worked on someone else's codebase that didn't attempt to reduce repetition.

What a waste of a click.



I feel like most of these articles would lose their click bait appeal if the title always included "when done wrong" at the end.


What does everyone think about this argument for functional languages especially with regards to parametricity in languages like Haskell?


Two is company. Three is a crowd.

If code is duplicated twice, I'm ok with that. If it's duplicated three times, then it's time to refactor.


Just set intelligent defaults with the ability to override with customizations. Bam. It's DRY and you don't lose flexibility.


DRY is mostly fine in code, but does not hold for data. There, inmutability and audit trails win over DRY, in my opinion.


A single source of truth is even more critical for data. It is the principle behind normalization.


The only programming principle you should be using is getting things done as fast as possible and shipping to production.


The "single responsibility principle" has caused a lot of damage as well. SOLID in general is somewhat dubious.


I like DRY. Do Repeat Yourself. You can always refactor later, once the requirement/change is fully implemented.


Repeating yourself 3 or 4 times is okay. After that, it is probably worth cleaning up.


Hm, I don’t think it’s about the number of repetitions. Even two can be too much if it’s the same logic, not just coincidentally. The same logic may not always be a code clone either. Maybe you need to generalize the code to remove the repetition.

The reverse is also true: If it’s not the same logic, it should not be deduplicated. Even if it is a code clone now. It will probably lead to unintended bugs down the line when that code changes.


The problem with DRY is that no one tells you when not to use it. It's great when you have 5 instances of the same string, much like OOP is great when all your objects are animals that fit into a neat little category. It's not so great when you're DRYing code across feature boundaries. Features tend to diverge over time rather than converge, so what's DRY one day is garbage the next.

You refactor the code to be DRY, so 3 features are now one function, and someone comes along asking for an amendment to one of those 3 features, which adds an edge case to your function. Now you've broken another feature, because you didn't check whether your new edge case would change the other features; quality control didn't check them either, because no one asked for those features to be changed, and how would they know it's all one function under the hood? Now you have bugs in production and no one to cover your ass.

Time goes on, you add more edge cases, and now you have one function that does many things, with special edge cases throughout. You can't separate one feature without interacting with another, almost like you're handling strands of spaghetti and can't help but pick up a bunch when you only really wanted one noodle.

TL;DR: if you insist on making code DRY across feature boundaries, then for the love of god write unit tests for those functions. Or keep your functions DRY and your features WET.


> DRY creates a presumption of reusability

I stopped reading there. DRY creates maintainability, not necessarily reusability.


I fondly remember the Ember.js docs.

They were awesome, but all the code examples were DRY to the max; it was quite funny.


TL/DR: It's a naive "sour grapes" type argument that doesn't take any kind of costs/tradeoffs into consideration...

Proper DRY at scale requires types that make sense and are easy to think about (you have to invent and document them even in dynamic langs... that's why TypeScript is a thing and so successful). You can't have DRY that doesn't slow you down and cause bugs without proper f types!

E.g. a sane solution to the author's problem, when the requirement for split pizzas came, would be:

- rename make_pizza(toppings: dict) to make_pizza_part(topings: dict)

- implement a new make_pizza(toppings: list[dict]) calling make_pizza_part(toppings: dict). Here you've changed the type (important!), so you won't miss any unchanged calls: your tools will yell at you (ideally at build/compile/commit time, or at worst at runtime, but with an error that's easy to interpret even from logs). See the sketch below.
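
In code, that rename might look like this sketch (the bodies and endpoint are invented):

    import requests

    PIZZA_PART_URL = "https://example.com/api/pizza-parts"  # hypothetical

    def make_pizza_part(toppings: dict) -> None:
        requests.post(PIZZA_PART_URL, json=toppings)

    def make_pizza(parts: list[dict]) -> None:
        # The parameter type changed from dict to list[dict], so any old
        # call site passing a bare dict now fails the type check instead
        # of silently mis-handling the split-pizza case.
        for part in parts:
            make_pizza_part(part)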

DRY is fine if done in the context of proper software engineering practices and tools.

Now being non-sloppy and following solid practices has a cost, and you might want to avoid it sometimes - in those cases do less DRY rather than crappy DRY!

(Whole languages are built around the OP's philosophy, e.g. Go, but they are explicitly engineered to lower cost and defect rates in large corporate orgs! Randomly choosing to follow this philosophy in a project with 1-3 people of adequate skill and limited scope will just unnecessarily give that project 4x the code, 4x the bugs, and 4x the cost for zero benefit.)


You should have called the blog post "DRY considered harmful" and go down in history.


I'd argue that overusing inheritance has more dire effects than going whole hog on DRY.


DRY is about readability. If DRY makes the system more readable, do it. If not, don't.


Reading the code, I get the feeling the author doesn't quite know how to write code.


> To solve my sauce issue, maybe I could use an OOP style and have a PizzaOrderer class that can be subclassed for each pizza type, allowing each type to override sensible sauce/crust defaults.

No, DRY doesn't mean that you should create classes just to prove your (invalid) point.


just treat things as suggestions. stop turning things into immovable laws. dry: if you are constantly writing 90% of something multiple times, maybe look to see if you can genericize it.


I have wondered if we sometimes sacrifice readability for DRY.


there isn't a one-size-fits-all DRY. If there's much to gain by DRY, then refactor it with a factory or something. Otherwise, keep it intuitive and stupid.


Not as overrated as this discussion is bike shedding.


By the title alone I would have to say I disagree


TL;DR'd the comments, but I didn't see “debugging” or “maintenance” showing up. Collecting things, which I assume is an aspect of DRY, makes change less risky: the change or fix can be applied once, rather than hoping all of the instances in the code were addressed (correctly). Developer time is precious no matter how many developers you have. Given a sensible design, you can always tune hotspots. You can't speed up debugging, and brittle or unclear code means you'll be doing more of it.

And if it’s “just going to be used once”, who really cares how it’s written, other than “quickly and correctly”? And sadly, too many things aren’t just used once.


All things in proportion.


Pizza Cost Optimization Dark Pattern Programming Example:

Here is the pizza cost optimizer from Pizzatool, written in object oriented NeWS PostScript, which checks all of the pre-defined base pizza styles and selects the "best" combination of style + extra toppings, ostensibly to save the user some money.

It's actually a dark pattern, because it's biased towards selecting higher level pizzas instead of the least expensive pizza. But at least the dark pattern is documented:

"Figure out the cost of the pizza, were we to order it as this style, and remember the style as the best match if it pleases us. The definition of pleasing us is biased towards matching higher level complex pizza styles, rather than economical lower level pizzas with extra toppings. This is the kick-back to Tony&Alba's for all that free beer."

The Story of Sun Microsystems PizzaTool, or how I accidentally ordered my first pizza over the internet:

https://medium.com/@donhopkins/the-story-of-sun-microsystems...

Tony and Alba's Pizza and Pasta, Mountain View:

https://www.yelp.com/biz/tony-and-albas-pizza-and-pasta-moun...

PizzaTool Source Code:

https://www.donhopkins.com/home/archive/NeWS/pizzatool.txt

    % Calculate the cost of this pizza.
    %
    /updatecost { % - => -
      10 dict begin % localdict
        /TheBest /defaultstyle ClassStyle send def
        /TheStyle null def
        /TheTopping null def
        /TheBestCost 99 def
        /TheBestExtras 0 def

        % For each and every pizza style in the universe:
        /styles ClassStyle send { % forall:               % style
          /TheStyle exch def                              %

          % Ask this style for its list of standard toppings.
          /TheToppings /toppings TheStyle send def

          % Is every topping from this style on our pizza?
          true                                            % true
          TheToppings { % forall:                         % true topping
            Toppings exch arraycontains? not { % if:      % true
              % Oops, this topping's not on the pizza. No dice.
              pop false exit                              % false
            } if                                          % true
          } forall                                        % true|false

          { % if: all the toppings of the style were on our pizza:
                                                          %
            % Make an array of our pizza toppings that aren't in the style.
            /ExtraToppings [
              Toppings {                                  % ... topping
                % Is this topping included in the style? Then toss it.
                TheToppings 1 index arraycontains? {      % ... topping
                  pop                                     % ...
                } if
              } forall
            ] store                                       %

            % Figure out the cost of the pizza,
            % were we to order it as this style,
            % and remember the style as the best match if it pleases us.
            % The definition of pleasing us is biased towards matching
            % higher level complex pizza styles, rather than economical
            % lower level pizzas with extra toppings.
            % This is the kick-back to Tony&Alba's for all that free beer. 
            PizzaSize /pizzasizeindex self send           % sizeindex
            ExtraToppings length                          % sizeindex extras
            /extraprice TheStyle send                     % $
            dup                                           % $ $
            ExtraToppings length                          % $ extras
            /extras TheStyle send sub                     % $ $ extras'
            1 le { .9 mul } if                            % $ biased$
            TheBestCost le { % ifelse:                    % $
              % Hey this is the best match so far, let's not forget it!
              /TheBestCost exch store                     %
              /TheBest TheStyle store
              /TheBestExtras
                ExtraToppings length /extras TheBest send sub
              store
            } { pop } ifelse                              %
          } if                                            %
        } forall                                          %

        % Set the window footers of the pizza topping panel.
        % The left footer displays the name of the pizza style,
        % and the right footer displays a message
        % telling the user to choose more toppings,
        % or the number of extra toppings,
        % or nothing at all.
        TheBestExtras dup 0 lt { % ifelse:                % extras
          neg dup 1 eq { () } { (s) } ifelse              % extras (plural?)
          exch (Choose % more topping%!) sprintf          % (message)
        } { % else:                                       % extras
          dup 0 ne { % ifelse:
            dup 1 eq { () } { (s) } ifelse                % extras (plural?)
            exch (With % extra topping%.) sprintf         % (message)
          } { % else:                                     % extras
            pop nullstring                                % ()
          } ifelse
        } ifelse                                          % (left footer)
        /name TheBest send exch                           % (left) (right)
        /setfooter ToppingWindow send                     %

        % Remember the price of this pizza in dollars rounded to cents,
        % and calculate its string value. 
        TheBestCost                                       % $
        Fraction mul
        100 mul round 100 div
        /Price 1 index store
        dup 100 mul round cvi 100 mod                     % $ cents
        exch floor cvi                                    % cents dollars
        1 index 10 lt { (%.0%) } { (%.%) } ifelse         % cents dollars fmt
        sprintf                                           % (price)

        % Set the value of the costfield and totalfield labels to
        % the price string.
        dup /setvalue costfield send
        /setvalue totalfield send                         %

        % Set the value of the stylevalue label to the name of the best style,
        % and set the stylemenu value to the index of that name in the list of
        % pizza styles. (The stylemenu is an exclusive settings menu.)
        /name TheBest send                                % name
        dup /setvalue stylevalue send
        PizzaStyleNames exch arrayindex {                 % index
            [exch] /setvalue stylemenu send               %
        } if                                              %

        % Remember the best match pizza style.
        /Style TheBest store

      end % localdict
    } def


what about early returns?


WRONG. OOP is the most overrated.


just build it


The DRY example is better, though. The payload is an object, and when you have multiple objects of the same shape, you have a class of objects. Menu items could be loaded from a JSON source; concerns are separated, and duplication is removed from the code.


I wouldn't recommend naming your function with a suffix that relates to a model. You are better off doing:

    def generate_payload(crust, sauce, cheese, toppings):



