Someone else’s comment [1] that I saved from an older post, also about DRY:
I’ve usually heard this phenomenon called “incidental duplication,” and it’s something I find myself teaching junior engineers about quite often.
There are a lot of situations where 3-5 lines of many methods follow basically the same pattern, and it can be aggravating to look at. “Don’t repeat yourself!” Right?
So you try to extract that boilerplate into a method, and it’s fine until the very next change. Then you need to start passing options and configuration into your helper method... and before long your helper method is extremely difficult to reason about, because it’s actually handling a dozen cases that are superficially similar but full of important differences in the details.
I encourage my devs to follow a rule of thumb: don’t extract repetitive code right away; try to build the feature you’re working on with the duplication in place first. Let the code go through a few evolutions and waves of change. Then one of two things is likely to happen:
(1) you find that the code doesn’t look so repetitive anymore,
or, (2) you hit a bug where you needed to make the same change to the boilerplate in six places and you missed one.
In scenario 1, you can sigh and say “yeah it turned out to be incidental duplication, it’s not bothering me anymore.” In scenario 2, it’s probably time for a careful refactoring to pull out the bits that have proven to be identical (and, importantly, must be identical across all of the instances of the code).
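To make this concrete, here is a made-up sketch (domain and names invented) of the kind of incidental duplication I mean: both functions share the same validate-then-sum shape today, but the refund logic is about to diverge, so extracting a shared helper now would just grow a config-flag monster later.

    def monthly_invoice_total(line_items):
        # Same three-line shape as the refund version, for now.
        if not line_items:
            raise ValueError("no line items")
        total = sum(item["price"] * item["quantity"] for item in line_items)
        return round(total, 2)

    def monthly_refund_total(line_items):
        if not line_items:
            raise ValueError("no line items")
        total = sum(item["price"] * item["quantity"] for item in line_items)
        # Soon: exclude non-refundable items, restocking fees, etc.
        return round(total, 2)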
Rule of thumb: If you can't give the thing a name then maybe don't extract it. What you extract becomes an abstraction. No abstraction is better than a bad abstraction.
My friend told me at their company they'd commonly convene the "variable naming committee" for such occasions, and I can't help but think of it every time I find myself in the same place.
A variable naming committee might seem exaggerated, but I've seen far too many variable/method/class names already that are wrong, misleading or at best misspelled, so some more thoughtfulness is definitely warranted...
Unfortunately, a lot of people who believe themselves to be "thoughtful" have produced a lot of 20+ character names, which still need other names compounded on top, when a 10 character name would have explained things just fine.
The developers I met generally weren't that great at narrating themselves, regardless of seniority. Narrative skills are woefully undervalued, and they aren't solved by a set of hard, scientific rules just yet. I pray I'm a unique example in experiencing this, but I doubt it.
The bigger tragedy is the illogical need for programmers to come up with "elegant" names. A 20 character name doesn't do any damage if it communicates the correct point. Neither does a 10 character name that also communicates the same point.
Why does a developer favor the 10 character name over the 20 character name when both do the exact same thing? Is the goal to save memory? What is the point? There is no point.
It is a subconscious bias that makes programmers want to give things elegant names over clear names. There is no harm in creating a 40 character name that is ugly.
def find_xy_coordinate_of_dogs_cats_and_baboons_in_picture(picture: Picture) -> List[XYCoordinates]:
# there is NOTHING wrong with this function name.
It baffles me to no end why humans have a tendency to turn the above for no clear reason into:
def imgrecFindAnimal(p: Pict) -> List[vectxy]:
Beauty and elegance in code belong in structure, not naming. Clarity belongs in naming, not structure (Golang is the antithesis of this). When the two are unified perfectly you get elegant code that does not sacrifice clarity.
A really good example of this is a function that encapsulates a complex regular expression. That regex is all but unreadable, but you can embed an entire comment/description into the function name. Seriously, write a grammatically correct sentence and make it a function name; there is no reason why this is bad... was there a more elegant name that you could have come up with??? Who cares. No harm done with your huge name other than burning the eyes of your inner OCD.
Except of course if you don't have auto complete. Then I can see how it's annoying for you to type out a whole sentence when you just want to call a function.
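For instance (a throwaway sketch; the regex and the name are invented):

    import re

    # The regex is all but write-only; the sentence-length name carries the
    # documentation.
    def string_looks_like_a_us_phone_number_with_optional_area_code(text: str) -> bool:
        return re.fullmatch(r"(\(\d{3}\)\s?|\d{3}[-.\s]?)?\d{3}[-.\s]?\d{4}", text) is not None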
> A 20 character name doesn't do any damage if it communicates the correct point.
That depends on the context. If the function is the highest level task called infrequently (as in your example), then long highly descriptive names are completely fine.
If this occurs at every level, all the way down to the building blocks it absolutely, severely affects legibility of the entire code base - it is literally a multiplier of code size, in the worst case of "all the way down" it's some kind of power function.
This barely fits on a line:
(a leftmultipliedbyright (a leftsubtractedbyright 1)) leftdividedbyright 2
Yet it's a simple polynomial that should just be a(a-1)/2. It's probably part of a larger expression, so now the other parts will end up on other lines (because no one writes 500-character-wide code). The effect is code spread artificially thin, which destroys locality and legibility.
You would be right to point out that my example is extreme and absurd; however, operators are functions, they just use different, implicit syntax. Many intrinsically complex pieces of code must create their own domain-specific building blocks at a slightly higher level of abstraction, much like operators, and this is the place for extremely short function names (think vector libraries). For such a commonly used building block it is unreasonable to expect each reference to fully and explicitly express the function's purpose.
As with all of these types of things, there is a balance. I am arguing _for_ balance, not suggesting all names should be single letters or single words, but that those have their place. However, in my experience very long names are far more commonly due to thoughtlessness: they include excessive redundant context and, at worst, even grammatical filler words.
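To put that in runnable form (helper names invented to match the pseudo-syntax above):

    def left_multiplied_by_right(left, right):
        return left * right

    def left_subtracted_by_right(left, right):
        return left - right

    def left_divided_by_right(left, right):
        return left / right

    a = 10
    # The fully spelled-out rendering of the simple polynomial a(a-1)/2:
    verbose = left_divided_by_right(left_multiplied_by_right(a, left_subtracted_by_right(a, 1)), 2)
    concise = a * (a - 1) / 2
    assert verbose == concise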
Bit late to the party but I really enjoyed this comment and the subsequent conversation. I was given some advice when I was junior and I've been repeating it for years.
Code is read 100 times more than it is written/edited. Writing in a high-level language is writing for a human first and a computer second.
As a general rule of thumb a variable name should be as big as the scope of that variable.
* If its scope is one line, it's okay to use a single letter.
deletedDocuments = documents.filter(d => d.deleted)
* If it's within a block normally one or two words will be fine.
* If it's one file, 3 or 4 words.
* If it's global it should read like the opening paragraph to War and Peace.
The last two are generally indications that something has gone wrong with how you are encapsulating your code and you should consider a refactor. However you will often have no other option in which case always lean towards more descriptive not less.
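A quick sketch of that rule of thumb (names and values invented):

    documents = [{"id": 1, "deleted": True}, {"id": 2, "deleted": False}]

    # Scope of one line: a single letter is fine.
    deleted_documents = [d for d in documents if d["deleted"]]

    # Scope of a block: one or two words.
    for doc in documents:
        print(doc["id"])

    # Scope of a file: three or four words.
    maximum_documents_per_page = 50

    # Global scope: spell the whole thing out.
    DEFAULT_ARCHIVED_DOCUMENT_RETENTION_DAYS = 365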
I think it may boil down to highly respecting the readers' time. If something conveys the same information to them but is shorter, it takes up less of the precious, limited time of their lives. Notably, this then becomes a subtle balancing act of estimating their knowledge and intelligence: make it too short and some implied context may be lost, requiring extra research effort from them. An extreme example is scientific papers: the same paper can be clear and concise for you if you are an expert in the domain (the usually assumed audience), or an overwhelming effort if you're not.
For programming, specifically, though, I feel the typical style used in programming straddles the line where the brevity hits a point of obscurity that actually leads to more time spent trying to decipher meaning.
I would say the time that is lost to deciphering meaning is much, much more detrimental than the time lost to parsing over-verbose words, by a very large margin. Thus it's better to err on the side of longer names in programming, until the verbosity equals that of the English language. I mean, nobody complains about the English language being way too verbose, so why not bring programming up to the same level of clarity and verbosity?
That's an interesting argument. Personally, I don't think I agree, i.e. I feel very differently (though e.g. typical Haskell is an example of being too dense for me too). But I can't currently capture the feeling in more concrete words. That said, if we're both now speaking about what we feel, did I manage to at least successfully counter your argument about this being illogical? ;)
One question came to my mind that I'm curious to hear your take on, from the point of view you present: what's your opinion on notations such as numbers (i.e. 123 vs. the English "one hundred twenty-three") and chemical formulas (e.g. H2O vs. the English "molecule of water")?
>But I can't currently capture the feeling in more concrete words.
I mean, if you want something more quantitative: count the number of posts in this thread that were communicated with a programming language as the primary mode of information transfer versus the number of posts that used English instead.
Because the usage of verbose and wordy English exceeds the usage of code, one can conclude that people prefer the general wordy nature of English over the conciseness of code.
Due to this, it makes sense to make your code as close to the verbosity of English as possible. I can read a novel or magazine almost passively; the same cannot be said of code.
>One question came to my mind that I'm curious to hear your take on, from the point of view you present: what's your opinion on notations such as numbers (i.e. 123 vs. the English "one hundred twenty-three") and chemical formulas (e.g. H2O vs. the English "molecule of water")?
Whatever makes sense. 123 and "one hundred and twenty three" basically communicate the same thing. Humans as a whole prefer 123. There's no lapse in communication using either method. "One hundred and twenty three" takes a bit longer to read, but no crime was committed. 123 doesn't lack any clarity for a typical human being either.
Perhaps I would prefer 123, as Arabic numerals are more universal globally than English.
For chemical formulas, H2O is used for balancing equations. It's specific notation for an algebra that uses symbols to derive exact conclusions; it is less a form of communication and more a system of symbols used for solving problems.
H2O, specifically, however, has culturally entered English nomenclature as a well-known concept, so it can be used in naming. There is, however, a slight potential for confusion, so ultimate clarity makes more sense to me here. There is zero ambiguity with "molecule of water" or even "H2O molecule", and thus my preference is to use English if the goal is communication.
The thing that you have trouble putting into words here is that H2O is an elegant symbol that communicates the exact same concept as the uglier "molecule of water." Humans are instinctively reacting to an aesthetic issue, not a practical one. Again, no crime is committed when someone uses the variable name "molecule_of_water" in a programming language over "H2O."
I didn't go into it in other arguments but I will get into it here because you've noted to me that you have identified this "feeling" and you acknowledge the contradiction between English and coding.
The reason why the contradiction exists is purely an aesthetic issue. It's a response to what we consider categorically to be "ugly" and "beautiful" and has nothing to do with practicality. When viewed this way the contradiction between English and programming languages makes sense:
We define bad grammar in English as "ugly", but we also have a separate aesthetic sense for poetically naming things. This goes beyond just programming. For example, humans generally find "the White House" to be a better and more poetic name than "the place where the president of the United States resides."
There are actually two separate modules in your brain that are at odds here. Bad English grammar is clearly triggering the language module in your brain, but at the same time your brain has a poetic naming module that prefers "the White House" over "the president's place of residence", and this module is being triggered when you program or write poetry. You can only begin to see this when I point out the logical contradiction.
Both of these modules in your brain are bypassing the neocortex, where people conduct higher-order logic. It's actually very hard to realize this if it's not pointed out, as people often mistake these feelings for something that arose from their own internal higher-order logic. The whole point of my writing is to take all of these modules in your brain and place them at odds so you can identify the origin of each and give your neocortex executive control. Anyway, here are the three modules that are getting triggered:
- Bad English grammar triggers the language module in your brain.
- Verbose naming in code triggers the poetry module in your brain.
- The logical contradiction between the two modules above when identified through meta analysis triggers your neocortex.
Now that you know, you can step above it all. You can override your instinctual emotions and use your higher-order logic to come to the correct conclusion.
To go off on a bit of a tangent here: there is in fact a morality module in your brain as well! What most people consider to be good and evil is actually instinctual emotion triggered by this module! Again, similar to the main topic, most people don't realize this and believe their morality is built around logic, when in actuality people are all just building a logical scaffold to justify a pre-existing instinct. Think about it... same with the whole poetic naming instinct we have: morality actually starts out as a feeling before we begin to logically justify it.
Believe it or not, using the exact same method of pointing out logical contradictions I can actually prove to you that morality is in fact an instinctive module within your brain. It's off topic, though, and I've digressed too much, so I won't get into that here.
No there's a difference. :) I made a point AND proved my point by showing a contradiction in human logic.
While I agree that the length of my argument made you not actually read it at all and miss the entire point, proving a point with illustrative examples does necessitate such verbosity.
What you're doing is stating a point without proving it and claiming that my own argument is self defeating with no explanation.
Why don't you try proving your point and also countering my proof while being concise at the same time? Because right now you've just stated a point with nothing behind it. You're stating in a single sentence that the world is flat after I proved it's round. OK... so what? Prove it.
They proved their point already. You took much longer and much fancier wording to say what could be said in a few lines, without losing any meaning. Verbosity for the sake of verbosity.
This is what I mean. Using long names compounds not only on itself; it compounds onto the entire codebase. Unless you black-box the code, make it 100% bug-free, and ensure it doesn't require changes for future features, that code will be read. Reading takes a lot of time, but worse: it takes far longer for someone to process a much larger cognitive load. This is especially dangerous in huge codebases that need to be changed on a regular basis, a danger usually waved away with "it takes time to get in the swing of things".
We have better things to do in life. Respect the person who will read the code. Be concise.
In all my arguments I'm saying that this logic you present doesn't apply to English. Nothing was proven because my points weren't addressed.
You're not respecting my time with your grammatically correct comment above. You can shorten your comment by mangling the grammar and preserving the meaning.
>We have better things to do in life. Respect the person who will read the code. Be concise
See that sentence? It's wasting my precious time. You can get rid of a lot of unnecessary info and preserve meaning.
>We have better thing do. Respect reader. Be concise.
There. Same point, more concise, but now you sound as if you have brain damage. My point is we use programming languages and English to communicate a point, but clearly in English nobody takes any effort to respect anyone's time. It's full of wordy, unnecessary stuff, and the entire population of English speakers actually prefers reading this very verbose English to reading obscure code.
I am saying because of this contradiction all your logic flies out the door.
Bring the level of verbosity of code to the level of verbosity in English. We don't complain about English; we actually prefer it over code. So clearly nobody actually cares about 'saving' those precious seconds spent reading long, grammatically correct sentences. Who cares if someone uses one as a function name.
You spent so many words, yet you still side-step the main argument, and then further ridicule my argument by taking it out of context.
These are different contexts. I do not wish to read a 300-page manual when it can be described in 2 pages, similarly to not wanting to scan 10 pages' worth of code in a codebase of hundreds of thousands of lines that I may be expected to look at. I require both enough energy and insight to solve the problem after reading.
This is a discussion forum. We write differently here. Information sharing is not our prime objective, unlike writing code.
>I am saying because of this contradiction all your logic flies out the door.
Contradiction solved. Now, will you answer or continue to side-step?
> You spent so many words, yet you still side-step the main argument, and then further ridicule my argument by taking it out of context.
Nothing is being ridiculed here nor taken out of context. This is simply a misunderstanding by you.
As for the side-stepping... it's a matter of perspective. From my perspective you are the one side-stepping, because you didn't even bring up the contradiction. So I'll regurgitate your words right back: you were side-stepping the contradiction. Thank you for finally addressing the issue.
>These are different contexts. I do not wish to read a 300-page manual when it can be described in 2 pages, similarly to not wanting to scan 10 pages' worth of code in a codebase of hundreds of thousands of lines that I may be expected to look at. I require both enough energy and insight to solve the problem after reading.
Let's frame the context here so that we both agree. The context is to communicate a concept to a reader WITHOUT being verbose. Code and English both live within this context because you use code to communicate to other programmers and you're using English to communicate to me Right Now.
We can also agree that BOTH code and English can be over-verbose.
Context Established.
>This is a discussion forum. We write differently here. Information sharing is not our prime objective, unlike writing code.
Information sharing is the prime objective of all written and verbal forms of communication. You need to understand this. There is zero point in writing anything if it is not communicating a point. Any form of communication is a form of information sharing, and English, being a medium of communication, is therefore a medium for information sharing.
Have you ever heard of "documentation"? Documentation communicates the EXACT same information that code does, but better, because it's in English and more verbose. That's why documentation often exists alongside code. One can derive code from documentation and documentation from code.
>Contradiction solved. Now, will you answer or continue to side-step?
Contradiction not solved. You still need to address it. English and programming occupy the same context, with English actually being used within programming. Think about it: programming is basically a shitty version of English that's only used because a computer understands it. Nobody would be programming otherwise. If you could program a computer efficiently using English, I guarantee you traditional programming languages would be thrown out the door within a day; nobody would use them anymore.
So the context that programming occupies is twofold. It occupies the same context as English, to communicate with other people, and at the same time it also has to communicate with a computer. That is the prime difference. So why do we have to reduce naming to some poetry contest when you can write a fully grammatically correct and clear sentence as a function name and call it a day? We don't do it in English, so why can't we stop doing it in programming? The contradiction is still there and still invalidates all your points.
There is no logical reason why we shouldn't bring programming up to the same level of clarity and verbosity as English. The only thing stopping us is the technical limitations of the computer, so we should get as close as possible with what we currently have.
It is redundant. It doesn't need the "xy_coordinate" because that is the return type. Furthermore it is wrong, it should be "xy_coordinates", with an "s". Or it shouldn't return a list.
The "in_picture" part is also redundant, it takes a Picture argument so why mention it?
Note: these arguments depend on whether your language supports polymorphism or not. In C, for instance, you often have no choice.
The "dogs_cats_and_baboons" part is fine as long as it really is what you are looking for. If your intent is to find any animal and you implementation only finds dogs cats and baboons now, then you should call it "animals" with maybe a comment clarifying that point.
The problem with long function names is that they produce long lines. Long lines are terrible. Not as bad as they used to be, thanks to large, wide screens, but still, I hate having my editor window unnecessarily large or having a horizontal scroll bar.
For maximum readability you want function names which are descriptive, but concise. My personal pet hate is when people make it concise by using acronyms and I’m just left wondering what the hell it stands for.
Personally, I find the ‘in_picture’ suffix superfluous as it’s clear from the input parameter what you’re finding the animal in, but otherwise find it a good name.
> My personal pet hate is when people make it concise by using acronyms and I’m just left wondering what the hell it stands for.
I agree. I can find and understand reducePermissionLevel, but reducePermLvl is unguessable, and not searchable, because abbreviations are arbitrary. Never abbreviating provides a predictable scheme.
I like clear and concise naming, and generally I don't care too much about how long a name is if it helps you understand what the thing does without being extraneously verbose. However, I think there is an argument to be made about how long a single line of code should be before it becomes too hard to read. The example given would be too long for me and I would try to shorten it, in this case probably by introducing some abstraction ;)
You claim there is an argument. But you don't actually state your argument.
My claim is that you think there is an argument, but there really isn't. That function name can do no real damage to the clarity or structure of the code. You only wish to shorten it because of OCD.
The logic here is easily illustrated if I rewrote that function name in English:
If my goal was to communicate this to you:
A function that finds the x y coordinates of dogs cats and baboons in a picture.
This is perfectly OK, but only because it's English. Now suppose I tried to write the English as if it were a function name:
func xy_baboon_cat_dog_detector
The above doesn't fly in the English language, but it works in programming. This is contradictory logic.
The question is, if both programming and English are both mediums used for communication why do they both have contradictory styles?
The reason is because there is no reason behind it. It's the same reason why people in Japan still use fax machines. Habit and typical human irrationality.
When you peel away the layers of your bias you will realize that this contradiction exists because the level of verbosity of my function doesn't actually matter. It doesn't matter in English, and therefore it doesn't really matter in programming.
Seriously, didn't you find it strange that you made your point without even stating what your argument was? Typically if you had a clear reason you would give it, if you had examples you would show it, instead you just said an argument existed probably never realizing what that argument actually was.
It's not just you. The other commenter just reiterated some points without trying to prove any argument. English has a preferred communication style that is contradictory to the preferred programming communication style, even though both mediums can go back and forth between either style without any clear difference.
If you self-reflect about it, your desire to make my function concise arises more from a feeling, an "emotion." You did not logically deduce your point using evidence... rather, you just felt that it needed shortening and that it looked "ugly."
Then, when I questioned that logic, your mind, without realizing it, began building a logical scaffold around the feeling to support the desire with some rational framework. Such is human nature, and this type of thing happens with all kinds of strange human biases that we possess. Religion, no doubt, is a similar bias... when questioned, the religious person's brain will go through the exact same process that your brain did upon seeing that ugly, wordy function name.
The question is, by going meta and describing the situation in this way would that help you take a step up above that bias? Or will you continue to build that logical scaffold and try to justify your strange desire to make the function more concise for no reason?
Think about this before you reply... did you already honestly have an argument that justified your point? Or are you building one right now to respond to me?
This is literally as close as I can get to what I'm talking about. There's this strange bias that every human (including me) has when they first learn programming to write concise "elegant" names for no real purpose. It's so strong that sometimes a normal argument can't help the other party reach an epiphany. Hopefully by going meta I can help better illustrate what I'm referring to.
>A function is a "thing", you should rather compare it to an entity in the English language.
It's more a verb that does an action on a noun. You can't express the concept with just a word.
add one to number
>Why do we say washing machine instead of a home appliance powered by electricity used to wash laundry through the use of centrifugal force?
Obviously there's no need to describe the plumbing behind the machine just the purpose of the machine is good enough. Everyone understands what a washing machine does as it's culturally a part of our language. If I lived in a civilization without washing machines and I had to implement it as a variable name in code I would call it "clothes_washing_machine" because of added clarity and no actual harm done with the extra word.
>When naming things you need to balance descriptiveness with conciseness, a name is not a definition.
Sure and I'm saying most people are wrong about where this "balance" actually lies. People place too much emphasis on conciseness.
>As a rule of thumb, avoid specifying things that can be easily guessed, especially if they are right there in your function signature (!):
I don't like readers to do any guessing at all. It's wrong to assume that a guess that comes easily to me will come easily to my audience; such is the nature of documentation, and of documentation as code. I want people to read my code like they read English. But that's just my opinion.
I mean, what's the benefit of shortening this? I'm looking at this function and I'm feeling nothing happened. You just did extra work.
If I want to be nitpicky, XYCoordinates should not be plural; the List should contain many elements, each of a type called XYCoordinate. XYCoordinates is better used as an alias for the entire list.
Additionally my function only operates on cats, dogs and baboons. It does not operate on all animals. Someone can mistakenly use this function to try to find lions and tigers and bears Oh my!
But that's beside my point. You attempted to shorten my function name while preserving meaning, and I totally understand the point you were trying to convey. What I'm saying is that your changes are logically negligible. They do nothing to improve understanding of the program while trivially shortening things. Nothing practical occurred here. It's simply the scratching of an OCD itch for more poetic names.
NB: I'm answering to the claim that we behave differently when naming things in programming languages vs natural languages. Otherwise I think it's mostly a matter of preferences.
I understand you prefer to be more detailed in your naming, that's fine, but in natural languages your names would sound unusual/verbose as much as they do sound unusual/verbose in Python.
You say everybody understands what a "washing machine" is therefore a short name.
Are you saying that when washing machines were still a novelty they should have been called "clothes-washing machines" instead? Unusual naming right? People do seem to have a distaste for long and overly-detailed names in spoken languages as well, don't you think?
And what's the point of a dictionary if names embed a full definition that leaves nothing to be guessed?
> Sure and I'm saying most people are wrong about where this "balance" actually lies. People place too much emphasis on conciseness.
Where to draw the line can be a matter of preferences, no intention of debating that, but people do tend to draw the line the same way whether they speak English or Python. No incoherent behaviour there.
> NB: I'm answering to the claim that we behave differently when naming things in programming languages vs natural languages. Otherwise I think it's mostly a matter of preferences.
And I am saying this behavior is attributable to an irrational instinct. There is no practical logic to it, even though our instincts push us to behave this way.
>I understand you prefer to be more detailed in your naming, that's fine, but in natural languages your names would sound unusual/verbose as much as they do sound unusual/verbose in Python.
The purpose of an action is to serve a practical purpose. Something sounding unusual has nothing to do with whether the associated action was practical or impractical.
A name may sound unusual in Python and suddenly sound perfectly fine in English. How a name sounds has nothing to do with its actual practical significance.
If the name is informative then it is practical. Place that name in Python, place it in English. How you feel about the name and how you think it sounds is irrelevant to your purpose of practicality.
The practical goal here is maximum clarity with zero ambiguity.
Your instincts and feelings are lying to you. You are subconsciously reacting to a purely aesthetic attribute. A poetic and elegant name does not serve an actual purpose. Only an informative name serves an actual practical purpose of being informative.
We program to make things work, not to come up with function names that are poetic/brief/unreadable. An aesthetically pleasing name does not assist us in achieving the actual goal of our program, but an informative name does.
>You say everybody understands what a "washing machine" is therefore a short name.
I'm saying anyone in our culture who speaks English.
>Are you saying that when washing machines were still a novelty they should have been called "clothes-washing machines" instead? Unusual naming right? People do seem to have a distaste for long and overly-detailed names in spoken languages as well, don't you think?
I'm saying that in a hypothetical culture where we didn't have context on what a "washing machine" was, "clothes-washing-machine" would properly communicate the intent and meaning of what that machine actually does. I am able to throw away any preconceived biases I have and not assume that a machine that washes things only washes clothes. Keep in mind I prefixed my entire point with a hypothetical culture that didn't know about "washing machines"... you seem to have missed the fact that I did that.
>And what's the point of a dictionary if names embed a full definition that leaves nothing to be guessed?
There would be no point to a dictionary. But clearly the things we define in most functions aren't defined in the dictionary, so rather than make up names no one can understand, you can combine English words that everyone understands into sentences and phrases and use those to name your functions.
>Where to draw the line can be a matter of preferences, no intention of debating that, but people do tend to draw the line the same way whether they speak English or Python. No incoherent behavior there.
Everything is a matter of preferences. Even believing that 1+1=1 is a preference you can choose to hold.
I am saying that, in terms of the set of attributes people use to qualitatively describe whether or not something is practical, most people mistakenly believe that poetic and terse function names possess the very attributes they consider "practical." TL;DR: I am saying that once people understand my point, most people's preferences are in full alignment with mine.
I am saying that when you ignore your inner OCD, you will see that aesthetic/poetic/elegant naming serves zero practical purpose, and short, brief names have negligible practical advantage over long names.
Thus a slightly longer function name that is ugly but very, very informative and similar in grammar to the English language is the most practical and logical way to name your functions. It doesn't matter how "unusual/verbose" you feel it looks/sounds, as that feeling is orthogonal to the logical purpose of naming in programming: to communicate and inform.
See past your bias and ignore pointless aesthetics.
You're mostly arguing on why your naming preferences are better. You're missing the point, I'm not addressing that.
Instead you seem to agree people name things in the same (irrational according to you) way both in English and Python.
Which is exactly my point and what you previously claimed not to be the case.
Once again I am addressing this comment:
A function that finds the x y coordinates of dogs cats and baboons in a picture.
> This is perfectly OK, but only because it's English.
No, it's not ok only because it's English. It's ok because it's a definition, it is not a name.
In English as in Python we tend to prefer more concise names.
A function's name is...a name (duh), the comparison to prose makes no sense. Once you actually compare English and Python names you'll see they both tend to be more concise.
> See past your bias and ignore pointless aesthetics.
Ironically I find your style more poetic (we really have opposite tastes :P). But as you saw we both keep the same preferences independently of the language. No incoherence/bias there. That's the only point I'm making.
> You're mostly arguing on why your naming preferences are better. You're missing the point, I'm not addressing that.
No I am not arguing for that. I am saying your style is objectively LESS practical and harder to read. You missed the point repeatedly.
>Instead you seem to agree people name things in the same (irrational according to you) way both in English and Python.
When did I claim this? You seem to be misreading everything. Functions in "Python", or in programming in general, are named by most people by trying to find some word, or some hybrid mangling of the English language, that yields an elegant but less informative name. Similar to how poetry is a mangling of English grammar.
I am saying that it is more practical to NAME a function in programming with a longer phrase or sentence. Whether you "feel" that's ok or not is irrelevant to the actual practicality of that name. You didn't even read my post.
>No, it's not ok only because it's English. It's ok because it's a definition, it is not a name. In English as in Python we tend to prefer more concise names.
You're just getting semantics mixed up. We call a function's name a "name", but it could also be called a function "sentence" or "function phrase".
Here's a more clear way to get it through your head:
"Function phrases" are more practical and informative then "function names" aka "function abbreviations/poetic words"
I am saying that what you "prefer" or how you feel about the above statement is completely irrelevant to the fact that, logically, "function phrases" are more informative, at the cost of being slightly longer.
My claim is your preferences are totally irrational and you are missing the point. Read it more carefully.
>A function's name is...a name (duh), the comparison to prose makes no sense.
It makes no sense to you because your brain is limited by the words "function name." Because we call the naming of a function a "function name", your brain is unable to wrap itself around the fact that you can use a collection of words in the naming of a function. That's why I renamed "function name" to "function phrase": to help kick-start your brain into gear and get on my level.
>Once you actually compare English and Python names you'll see they both tend to be more concise.
Again, missing the point repeatedly. Stop letting the word "name" block your creativity. Whatever we call something in the universe (a title, a name, or whatever), I can choose to be concise or not concise, put an entire sentence into the title, or put an entire novel into the title. There are no actual rules on what we want to do and can do. The argument is simply about what is practically better to put into this "title" for programming.
This is the playing field we're operating in. You are letting your personal vocabulary and definition of "name" delude your thinking.
So I am saying I want to use longer phrases/sentences, or "function phrases", as the title of a function, and you are saying that you want to use shorter/briefer names and that this is a "preference." Then you say that my "preference" is not a "name."
My counter is that whatever you want to call "naming" is completely irrelevant; my "preference" is categorically, objectively, and logically more practical and informative than your "naming" style. It is better because it communicates more information, get it?
The thing that's missing here is that you haven't objectively told me why your style is better. It conveys less information and is thus logically worse. I'm betting that you have no reason. You just irrationally "feel" it's better to use short "function names" over "function phrases".
>Ironically I find your style more poetic (we really have opposite tastes :P).
This is where your thinking is cloudy. First off this has nothing to do with taste. I am literally saying your "preference" is objectively less informative and therefore worse.
The other part of your thinking that is cloudy is your misinterpretation of the word "poetic." Poetry is a mangling of English vocabulary and grammar that is more concise. My proposal is to move away from mangled English and grammatically incorrect "function names" and write "function phrases" that are grammatically closer to correct English. Your proposal is to make names shorter and more like poetry, as per the exact definition given above.
People who can't think straight tend to think anything goes in poetry and that either naming scheme (yours vs. mine) can be poetry as a matter of taste. Wrong. There are hard rules that separate poetry from written English. Again, my style is to make programming closer to grammatically correct English; your style is to create poetry, as per the definition of poetry above.
>But as you saw we both keep the same preferences independently of the language.
What same preferences? Our preferences are objectively different. And my preference for "function phrases" is objectively better and more informative than your "preference." We never reached an agreement; I don't know how your imagination is cooking this up.
>No incoherence/bias there. That's the only point I'm making.
Again, I am saying my "preference" is logically BETTER than your preference because it saves the reader time guessing context and is more informative.
Look man, stop. This is a typical argumentative strategy to turn hard facts into muddy "opinions" and "preferences." In your world nothing is better or worse and everything is just a preference and anything goes. This is weak.
I have a "preference" that is different from your preference and I am stating my preference is better then your preference and your response is that everyone can have their own opinion? Come on.
> We call a function's name a "name", but it could also be called a function "sentence" or "function phrase".
I mean...we can also say that pigs are birds if you wish, everything is possible XD
Who in English would ever name anything as "A function that finds the x y coordinates of dogs cats and baboons in a picture". Does that sound like a name to you?
Do you actually speak like this in your daily life? "Please can you turn on the device for remotely visualising entertainment shows, news and sport events?"
Ah, I forgot, names don't exist: a name is a phrase, a definition is a phrase, a question is a phrase, an assertion is a phrase... they can all be used interchangeably, they all serve the same purpose... sure.
I think at this point you're lost in your own sophism. Good luck getting out of it!
>I mean...we can also say that pigs are birds if you wish, everything is possible XD
That's right, we can. What you're failing to see here is that this naming is arbitrary. It truly is separate from structure.
>Who in English would ever name anything as "A function that finds the x y coordinates of dogs cats and baboons in a picture". Does that sound like a name to you?
That's why your thinking is limited. Whatever we call a function or a thing doesn't have to be limited by your definition of what a "name" is; the limit is placed by you, not reality.
>Do you actually speak like this in your daily life?
Can you get it through your head? You speak in sentences, do you not? You don't ONLY use names to describe things; you use sentences. So we CAN use a sentence to DESCRIBE a function definition. Just because we call this description a "function name" doesn't mean we should be limited by the concept of what you define as a "name".
>"Please can you turn on the device for remotely visualising entertainment shows, news and sport events?"
Fortunately, unlike when speaking, our editors assist us with auto complete.
If such a primitive existed in your program and you just called it "remote" and left the reader to guess what the hell it does by looking at context... you'd be a really bad coder.
Call it:
"controller_that_changes_television_channels"
And auto complete assists us with the length of the name. So really length isn't even that big of a factor here.
>Ah, I forgot, names don't exist: a name is a phrase, a definition is a phrase, a question is a phrase, an assertion is a phrase... they can all be used interchangeably, they all serve the same purpose... sure.
They actually do serve the same purpose. The purpose is communication. The problem with you is that you think the only forms of communication in programming are names and context. I am saying you can use English sentences as well. It's that simple.
>I think at this point you're lost in your own sophism. Good luck getting out of it!
My argument? This isn't my argument. I'm not smart enough to invent this concept.
You ever heard of a guy named Donald Knuth? The guy who basically turned programming and algorithms into a science? Wrote the books "The Art of Computer Programming"? Well he invented something called literate programming:
Take a look, because in literate programming people create "macros" and name those macros with entire essays or paragraphs. Knuth does not restrict the "naming" of macros to pathetic little snippets of poetic words. His mind does not restrict what you can "name" something, unlike your mind.
Don't let the word "macro" confuse your brain... macros are the same thing as functions, just a bit more advanced. The primary difference is that functions are resolved at run time, while macros are resolved before compile time.
This is my point.
You're not trying to debate my argument. You're debating an entire style of programming created by Donald Knuth.
I deliberately hid the official name of this technique because dropping the name Donald Knuth would just get people to agree based on his reputation rather than the reason and logic behind his thinking. Given that reason and logic don't work with you, I think a name drop is relevant here.
It's not my sophism. It's Donald Knuth. Good luck trying to resolve your sophism with a concept invented by Donald Knuth.
>You just said that a name, definition, question and assertion all serve the same purpose and can be used interchangeably...I mean what else can I say?
They can be used interchangeably similar to how people can choose to be wrong, right, stupid or smart. The ability to interchange techniques is irrelevant to my point. I am simply responding to a misguided concept stated by you.
A longer phrase can be used interchangeably with a shorter name. The longer phrase is superior to the shorter and, as a result, uninformative names that your misguided opinion favors. Donald Knuth agrees.
I am saying your way is categorically the wrong and worse way. You can't respond to this because you've got nothing left to say. You're tongue-tied.
>Exactly
Yeah you're exactly wrong.
>It was a nice exchange nonetheless but at this point it really looks like you hit a dead end.
Literally I have reams and reams of evidence, deconstructing each of your points and tearing apart every statement you made. You only have the capacity to respond to one teeny tiny snippet of what I wrote and your response is still misguided.
The argument is that at some point the length of the name is detrimental to readability, in the same way that a run-on sentence is detrimental to the readability of prose. I thought it was quite clear and your 16 paragraph response didn't touch on that at all. Or if you did, it was lost in the noise.
Oh, I thought it was obvious that you don't want to name a function after a 300-word paragraph. I didn't realize we needed to get into that kind of semantics. My argument is that a sentence-long function name of roughly 10 words is still a good function name. There's no hard rule here; judgement is still qualitative. The main point that I was arguing for is that this:
* 'xy_coordinate_of_' taken out because XY coordinates are already in the return type.
* '_in_picture' taken out because the information is already in the 'picture: Picture' parameter.
* The return type 'List[XYCoordinates]' changed to 'List[XYCoord]' because Coord is well known as an abbreviation for a coordinate, and having XY in front of it makes it completely obvious.
* Removed the 's' from the end of the return type because it is already contained in a list and shouldn't be pluralised. It would be pluralised if you were returning a list of lists of coordinates.
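Put together, the suggested rewrite comes out as something like:

    def find_dogs_cats_and_baboons(picture: Picture) -> List[XYCoord]: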
There are problems with having huge function names, especially if other programmers need to use the functions you write in their code. One is the amount of screen space needed to use your function; this bites when many functions with long names are used in a single expression, where long names can be harder to understand. There is a reason that mathematical operators in programming languages are usually one character, things like '+ * - /': it should be obvious what problems people writing in a language would have if the operators were replaced with names like 'multiply'. Now imagine mathematics written in a verbose English style. It is not done that way because of all the extra effort it would take to write, and all the extra effort it would take to understand.
While I do like me some good meta-reasoning, at least I have an argument for shorter function names:
Short names are read in O(1) while longer ones are read in O(n) (unless the name can be simplified to “the function with the long name”).
Then you need to shorten your comment! Not as verbose as my replies but it can definitely use some improvement on the big Oh! Try this out for size:
> me like metareason, me have arg for short func name: Short name O(1), long are O(n) (some name too long: “the function with the long name”)
That's where your logic breaks down. If you truly followed the O(n) argument in all your communication mediums outside of programming, you'd sound not so intelligent. Likely you built this argument as a response rather than actually following it in both your English and programming communication styles.
It's probably not worth continuing here, seeing that you're already pretty hostile.
But I definitely try to minimize the amount of reading necessary to understand a concept. Ideally, the concept should be such that it's easy to describe in few words (note "words", not "letters". Also, using many words doesn't make you sound smarter, often it's the opposite). Things don't live in a vacuum, there's always a context, and that's important to consider when communicating...
My point is that I think there should be a soft limit on how long a single line of code should be, because I feel it becomes too hard to read if lines get too long. That is an emotion, as you said, and it might be kinda arbitrary, but I don't think there is anything fundamentally wrong with it.
Sure there is a logical reason for why you don't want your function to be 4000 lines long. It gets unwieldy and becomes mechanically hard to manipulate or even understand the full meaning behind all 4000 lines. That's a logical argument. I completely agree with the soft limit.
There is no logical reasoning that can justify shortening this function yet the feeling is strong. That is the bias I am trying to illustrate. There is, in fact, nothing wrong with not shortening this function and keeping it as it is.
>This sounds like Hungarian notation taken to the extreme.
No it's sanity taken to the extreme.
Have you ever noticed that everything written to communicate with people in the United States outside of programming is written in a very verbose manner, using a language called English? It's used for technical manuals, textbooks, and stories.
Is English "hungarian notation taken to the extreme?" No dude. People actually find verbose English stuff easier to read. You don't have English writers abbreviating words and coming up with elegant acronyms in a physics text book.
>Why not write a comment if a paragraph is required to understand what you are doing.
I didn't say to name your function after a paragraph. A function's English-language analog is a word, or at most a sentence. A paragraph would be several functions chained together. If you name your functions well, composed procedures will read close to an actual English paragraph.
That being said, there's nothing wrong with comments; comment away, but don't call your function doXYZ and put the entire description in the comment. Your comment doesn't follow the function call around.
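For example, a rough sketch (the domain and the exact helpers here are invented):

    from dataclasses import dataclass
    from typing import List, Optional, Tuple

    @dataclass
    class Profile:
        name: str
        spouse_name: Optional[str] = None

    def get_list_of_profiles_from_file(path_to_file: str) -> List[Profile]:
        # One "name,spouse_name" record per line; the spouse may be blank.
        list_of_profiles = []
        with open(path_to_file) as profile_file:
            for line in profile_file:
                name, _, spouse_name = line.strip().partition(",")
                list_of_profiles.append(Profile(name, spouse_name or None))
        return list_of_profiles

    def get_profile_by_name(profiles: List[Profile], name: str) -> Optional[Profile]:
        return next((profile for profile in profiles if profile.name == name), None)

    def concatenate_profile_lists(first_list: List[Profile], second_list: List[Profile]) -> List[Profile]:
        return first_list + second_list

    def merge_married_profiles_into_list_of_pairs(profiles: List[Profile]) -> List[Tuple[Profile, Profile]]:
        list_of_pairs = []
        for profile in profiles:
            # The name comparison keeps each couple from appearing twice.
            if profile.spouse_name and profile.name < profile.spouse_name:
                spouse = get_profile_by_name(profiles, profile.spouse_name)
                if spouse is not None:
                    list_of_pairs.append((profile, spouse))
        return list_of_pairs

    all_profiles = concatenate_profile_lists(
        get_list_of_profiles_from_file("employees.csv"),
        get_list_of_profiles_from_file("contractors.csv"),
    )
    married_pairs = merge_married_profiles_into_list_of_pairs(all_profiles)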
Trust me, you may think your eyes are bleeding, but they are not. The above is actually closer to the English language than 90% of code out there. What you don't realize is that there wasn't a need for a single comment, and there wasn't a need to dive into any of these functions to read the definition. You just read the variable and function names and you know exactly what's going on. If you recompose these functions to do something else, it's like recomposing sentences and words in the English language. The end result is still readable without the need for new comments.
When you read a recipe or follow directions to build something, does the writer give you those directions in some coded nomenclature? No, the writer writes verbose English with clear grammar. The point of naming in these texts is clarity; it makes zero sense that we don't do the same in programming.
>Writing a paragraph kills any formatting in vi/vim.
You know this might be a really different chain of thought that goes against the grain...
But maybe this lack of formatting makes vi/vim an extremely bad editor for programming? Seriously, humans are weird (Japan, for example, still uses fax machines). If it's so bad, why do so many people use it? Maybe to look smart, or maybe for the same nonsensical irrationality that makes us come up with some unreadable but "elegant" name for every programming primitive while at the same time being extremely verbose in ALL other forms of written communication in English.
Programmers like to think they're smart and original. Most aren't... they follow the same tropes as every other programmer trying to come up with elegant names for no reason whatsoever and strangely unable to see the purposelessness behind this whole naming thing. If you can't come up with a good "name" for it, make the name an entire sentence, it's that simple... it's the reason why sentences exist.
"from_file", "by_name", etc. are fairly needless here. Most people are apt enough to grasp the first argument relates to the last word of the function name. Use properly named variables so a summary can take care of it, or only omit the noun, "get list of profiles from".
Get itself is a terrible prefix and could be omitted, especially if you keep "from": "list_of_profiles_from(X)" isn't any less clear. Alternatives like "read_list_of_profiles(file)" exist.
"List_of_profiles" can be changed to "profile list". You already did it with "concatenate_profile_lists", showing inconsistency. Alternatively, depending on the context, "list" itself can be omitted entirely, and just state "profiles".
"merge_married_profiles_into_list_of_pairs" can arguably be shorted to "merge_married_profiles" depending on IDE and language: that last bit should be clear from the returned value. Even without, it can still be shortened to "pair_married_profiles", as the context should make it obvious if we look for more than 1 married couple, there will be some kind of collection. Additionally, your naming has one problem: "merge_married_profiles". With all the verbose naming, it is still not clear what "married profile" is. I'll assume it means "pair profiles of married couples", where you might as well say "pair_profiles_of_married_couples".
>Programmers like to think they're smart and original.
It is because they think they are smart and original that they fall into the trap of overly verbose naming, not from a lack of it. Have you looked at a legacy Java codebase? Many of them can be slashed in half just by renaming variables to something that keeps the meaning or draws meaning from the context very obviously. These guys are going against their own mind's natural ability to read context, or worse, have conditioned themselves into learned helplessness.
You mean "being more conciseness is better if you don't have to sacrifice clarity for it"? Or just - "conciseness is a good thing"?
How would you rewrite that example, specifically, to make it more concise without sacrificing clarity, then? Do you mean to change the names without omitting type or relational information somehow? Or to omit some variables entirely in favor of nesting function calls?
If the latter, I don't see the relevance to that commenter's actual point about short vs long names; reducing the number of names is entirely tangential.
I would omit most information that can be inferred from context so long as context locality is good.
Single-letter names are OK for context smaller than 1 line (like lambdas), type information can be omitted for context up to 10 lines (lists etc) and generic functions (filter, concat, inner_join) are better over custom domain functions when the audience consists of other programmers.
Abbreviations are best avoided unless they are extremely common and familiar in the specific domain / industry. If they are only ever used within that team and company, that's probably bad.
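A quick illustration of those rules (data invented):

    # Context of one line: single letters inside the lambda are fine.
    evens = list(filter(lambda n: n % 2 == 0, range(20)))

    # Context of a few lines: short names, no type noise, generic operations.
    orders = [("alice", 30), ("bob", 15), ("alice", 5)]
    totals = {}
    for customer, amount in orders:
        totals[customer] = totals.get(customer, 0) + amount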
>I would omit most information that can be inferred from context so long as context locality is good.
context also changes with time as other people edit your code. Your single variable name with simple context can balloon in complexity and move around. Don't assume locality is fixed.
Additionally, relying on locality wastes precious time. It is preferable to read a variable name and not even need to touch the context.
>generic functions (filter, concat, inner_join) are better over custom domain functions when the audience consists of other programmers.
Not true. For filter and inner_join especially, the predicate or inner lambda can be intricately complicated and hard to decipher; better to wrap all that complexity in an English name. You save programmer time by having the programmer read English instead of deciphering even simple code.
Rule of thumb: It is far easier to read one line of English than it is to read one line of code. So it is better to allow readers to ascertain meaning from naming over context. You are wasting the programmer's time otherwise.
> context also changes with time as other people edit your code
And you change the code accordingly, IF it does. Typically though, it's best to change the structure of the code, not to increase its local complexity. If the context gets complicated, think about what makes sense to be split out. Local complexity should be kept minimal.
> can be intricately complicated and hard to decipher
And you change the code accordingly, IF they are. But you name the lambda, not the generic function. There is a huge value in using a standardized vocabulary. Even in English, we get value out of "baby" versus "small human between the ages of 0 and 2 years".
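A minimal sketch of what "name the lambda, not the generic function" might look like (the Profile shape is an assumption):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Profile:
    name: str
    spouse_id: Optional[int] = None

profiles = [Profile("Ann", spouse_id=2), Profile("Bob")]

# The intricate predicate gets the English name; filter() stays part of
# the standardized vocabulary every reader already knows.
def is_married(profile: Profile) -> bool:
    return profile.spouse_id is not None

married = list(filter(is_married, profiles))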
More generally, I've seen this kind of thinking before. A programmer discovers some "universal truth" which applied well in some context and they get so obsessed with it that they start applying it everywhere. Please, stop and think before you overcommit to such ideas. They are not nearly as universal as they seem - caveats apply. If you ignore those caveats, your fellow team mates will suffer - so check with them frequently too. At the end of the day it doesn't matter what you think about your own code's readability, but what others think.
>And you change the code accordingly, IF it does. Typically though, it's best to change the structure of the code, not to increase its local complexity. If the context gets complicated, think about what makes sense to be split out. Local complexity should be kept minimal.
My method requires no changing, ever. With better "functional phrasing" I make the title of a function independent of context. Similar to a module.
You, however, are saying that the name and context are tied together, so to handle change you alter the name, the context, and the structure all together. This is objectively worse.
>And you change the code accordingly, IF they are. But you name the lambda, not the generic function. There is a huge value in using a standardized vocabulary. Even in English, we get value out of "baby" versus "small human between the ages of 0 and 2 years".
And there it is, more changes. Every change and edit to working code is a potential for a new bug. A change to structure of the code should be made independently to naming. Modularity is important.
>More generally, I've seen this kind of thinking before. A programmer discovers some "universal truth" which applied well in some context and they get so obsessed with it that they start applying it everywhere.
First off, stop commenting on my character. Second, of course caveats apply. I never said to disregard caveats. It's also perfectly fine to take on a bit of technical debt and use a one-letter name to save time when needed and according to context. My argument is for knowing what is technical debt and what's not. Don't make random assumptions here, get your head straight and focus on the topic at hand.
We can get rid of the "caveat" distraction by just examining an example without caveats: take a function phrase like "merge_married_profiles_into_list_of_pairs".
My claim is that this is an informative and perfectly good "function phrase". Your claim is that it is worse. Caveats do not apply in this example.
>Clear writing is about structure, not verbosity or repetition. Concise-and-clear is preferred over verbose-and-clear.
Except this is where the contradiction lies. In both well-written literature and great textbooks... clarity trumps all, even when conciseness is sacrificed. English is one of the most verbose languages out there. I can take your sentence and make it concise:
>My eye bleed.
>Writing about struct. No verbose or repeat. Short n' Clear beter den wordy n' Clear.
Is that better? Your misguided logic paints my concise version of your comment as "preferred" even though it has the exact same clarity. The stark reality is, for purely human reasons, people prefer the former example over the latter, and there is no real rationality behind it.
Try to think a bit outside of the box here. You share the biased and delusional opinion of a typical average programmer.
The real logic is that conciseness or verbosity is inconsequential. Our human nature allows us to prefer contradictory approaches in code vs. English because verbosity and conciseness don't actually matter that much. Clarity is king by a long shot, hence the reason most humans prefer reading literature over code.
My code displays the ultimate clarity. You insultingly claim that your eyes may be bleeding, but I guarantee you that unless you're mentally deficient, no part of my code was unclear. It was 100% obvious and crystal clear what my intentions are. The best part is, you only need to read it one time.
When is the last time in your life you've read a similar snippet of unfamiliar production code at first glance and left with the exact same level of clarity? Most similar production code needs a good number of guesses and hypotheses and a couple of reads and code-following to develop the same level of clarity and confidence of understanding that my code can produce on a SINGLE reading.
I would wager we can agree on that point, and if you claim otherwise I would wager that you are lying.
If it can be found. But more often than not it can't be found. Do you always search for the most elegant single word to describe a point in English? Sometimes. But more often than not you have to resort to sentences.
There's no reason why this logic can't be applied to programming.
Yeah, I don't need conciseness or brevity. The processor or the network connection needs conciseness. Programmers need specificity and clarity. Verbosity is fine too, IMO :)
Does it matter whether profiles is a list or not? If the typical data structure / convention you use for plural variables is a list, you don't have to say it.
Do I have to say that the variable bobs is a list of people named bob? Not if it's obvious from the context on the right hand side that I'm filtering by name.
Do I have to use a more verbose argument name in the lambda passed to `filter`? Not really - it's short and there is plenty of context around to deduce that it's a profile, especially if the reader is familiar with a commonly used standard library function.
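A hypothetical snippet of the kind being discussed (names and the Profile shape are assumptions):

from dataclasses import dataclass

@dataclass
class Profile:
    name: str

profiles = [Profile("Bob"), Profile("Jane")]

# Plural "bobs" with no type suffix: the right-hand side already shows
# it's a filter by name over profiles.
bobs = [p for p in profiles if p.name == "Bob"]

# The same with standard-library vocabulary; the one-letter `p` never
# escapes the lambda.
bobs = list(filter(lambda p: p.name == "Bob", profiles))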
The last one is tricky, and it depends who you're communicating with. Do you expect your readers to be familiar with the standard library of the language, even less commonly used functions? If so, then it's fine. If not, again it depends. Is the reader familiar with SQL or relational algebra? If so, then yes, they probably have no problem with this.
As long as your context and conventions are clear, you can leave a lot of details out and still get the message across. It's better to err on the side of caution, yes, but it doesn't mean that unlimited verbosity leads to unlimited clarity. As with writing - your audience is what matters.
> My comment was already concise to begin with. Your change reduced clarity. Here I will exaggerate your example:
Your comment was concise because it was just a statement. You didn't attempt to prove a point.
My comment was a proof against your point, hence why it's longer. Now you're trying to disprove my proof, which also explains why your subsequent reply is also significantly longer.
> There is a point where verbosity decreases clarity by overwhelming the reader with irrelevant detail which can already be inferred from context.
I agree with the above completely, but I also think the point is obvious. I never stated there was a level of verbosity that is excessive because I thought that notion is actually completely clear to all readers.
Here's a good rule of thumb to follow. We clearly don't think the English language is excessively verbose. So all I'm saying is: bring programming to the level of verbosity of English and don't go past that.
Obviously your first example is excessive and past typical English verbosity. But your second example is below English verbosity and has several problems.
> Does it matter whether profiles is a list or not? If the typical data structure / convention you use for plural variables is a list, you don't have to say it.
It doesn't hurt if I put "list" or "profiles" in the name; it's just some additional letters and adds more information. It doesn't matter at all. Also, your assumption is wrong. Many containers can be plural, including linked lists, hash maps, trees and graphs.
>Do I have to say that the variable bobs is a list of people named bob? Not if its obvious from the context on the right hand side that I'm filtering by name
You don't have to, but you don't not have to, either. The bobs variable can be used in a section very far away from the context... then what is a bobs? What is a janes? How do you even know it's a list of profiles? You're literally making me follow and decipher code to figure it out. That is the point. Give a function an English name where I don't need to decipher anything. I read the function name and I don't have to dive in to decode anything.
>Do I have to use a more verbose argument name in the lambda passed to `filter`? Not really - its short and there is plenty of context around to deduce that its a profile, especially if the reader is familiar with a commonly used standard library function.
There is nothing you "have to do" here. You can do whatever you want. I am saying what you're doing is actually worse for communication and that my way is better for communication, with the incredibly negligible downside of being more verbose. That being said, context can balloon in complexity, and reading code is harder than reading English, so make the reader read English when he can rather than code.
>The last one is tricky, and it depends who you're communicating with. Do you expect your readers to be familiar with the standard library of the language, even less commonly used functions? If so, then it's fine. If not, again it depends. Is the reader familiar with SQL or relational algebra? If so, then yes, they probably have no problem with this.
All your variables can be used far away from the context where they are created. You can't rely on the fact that the creation of bobs is right next to its usage in marriages. Oftentimes your style of coding will result in people having to follow code and dive into definitions to figure stuff out.
First off, marriages. Marriages of what? Sam and Bob? George and Shirley? Second, the expression itself. Again, what is a bob and a jane? What is a partner? Partners in crime? Also, seriously:
I think most people will agree that mine is clearer in communicating what's going on. Your version, despite the brevity, needs some deciphering.
Also, you can't expect that the profile data structure is so simple that it can be handled in a one-liner. You assumed the data structure to be very simple. What if the data structure is an incredibly complex graph structure of profiles, where marriages can only be found by a complex graph algorithm? I don't want people to decipher a graph algorithm to decode what I'm trying to do here.
Write your function names so people can avoid deciphering meaning from context. The point is that people can get the meaning from English, because English is ten times easier.
> Marriages of what? Sam and Bob? George and Shirley? Second, the expression itself. Again, what is a bob and a jane? What is a partner? Partners in crime? Also, seriously:
And we get to the key point you are missing. It's clear from the context. The code we had wasn't some imaginary code where the variable was far away and had a ton of context. It was that particular code. Different code might be better written in a different way. If you have a different code context in mind with higher complexity, show that one.
Additionally, "merge_married_profiles_into_list_of_pairs" is not necessarily better. When debugging the code, we don't know what that part really does. An implementation using a more generic standard library function lets us glance over that bit since we already have understanding of it. (And again, it might depend on the audience - are we talking to a language expert, or a domain expert? Do we have a well tested and well defined library of domain functions that everyone has a clear understanding of?)
Context and audience matter. Verbosity can be a lazy cop out for bad structure. (That's applicable to writing English as well.)
>And we get to the key point you are missing. It's clear from the context.
I understood this point utterly and completely; you have misunderstood the point I was making. I am saying you can't rely on context because context can grow in complexity and can actually live far away from where you are using a variable or a function. Relying on context leads to code that will inevitably become less and less readable as complexity grows. Read my post. I literally addressed "context" and you literally missed my point.
Let me spell it out for you. If I have a 500 line piece of code, Bobs is created on line 1 then reused again on line 500, and I'm currently looking at line 500, you're expecting the reader to scroll all the way back to line 1 to decipher context. Couple that with multitudes of other concepts littered throughout your code with context strung throughout the page and located in different files.... This is my point that I demonstrated to you earlier to COUNTER your point. Once you realize this, you'll know that you're the one who missed the point.
Your function name should be so clear that a reader should never have to read context. He reads the name and he can move on with life without decoding everything you did.
If I called the variable list_of_profiles_named_bob, no context is needed. Critical information lives and moves with the concept.
Let me reiterate my point: Context used in place of naming is done by programmers who are bad at writing readable code.
>Additionally, "merge_married_profiles_into_list_of_pairs" is not necessarily better. When debugging the code, we don't know what that part really does.
This is 100% better. Nobody needs to know what a function actually does; this is how abstraction works. The point is that you only need to dive in when there is a bug, but before there's a bug, complexity should be abstracted away so we can make sense of the bigger picture.
>And again, it might depend on the audience - are we talking to a language expert, or a domain expert? Do we have a well tested and well defined library of domain functions that everyone has a clear understanding of?
I assume the audience can understand English. No need to use "inner join" when the person who knows SQL also knows English. I chose the methodology that everyone can understand. What is the cost of doing this? Nothing. Just a longer function name that does zero harm to the structure of a program.
>Context and audience matter.
Audience matters; assume the audience can read English and can generally program, that's it. Context as a communication medium is a crutch used by bad programmers to avoid abstracting concepts and giving things clear names.
> Verbosity can be a lazy cop out for bad structure.
Verbosity and naming have nothing to do with structure; this claim is categorically wrong, and also obvious, but whatever, I'll show you:
def add_two_nums(x, y):
    return x + y

def add(x, y):
    return x + y
Literally, 2 functions that do the exact same thing. You may claim the bottom function is better because it's shorter. And I claim it's shorter by a measly two words; who cares, both functions convey equal meaning and equal structure.
>are we talking to a language expert, or a domain expert?
Literally, domain expert code is a synonym for bad code. All code that is bad, when studied long enough, will produce a domain expert who knows that shitty code inside and out. A domain expert is someone who mastered (or wrote) code only readable by other masters of reading that same bad code. Think about it this way: if you posted your code on GitHub and people started reading the code, all domain expert code would be regarded as shitty code. This is the colloquial definition of bad code. The best code is code on GitHub that is readable by non-domain experts on a single pass.
Now I admit that there are some cases where it's just too hard to do this. You can't program a simulation in relativity that's so readable that someone who doesn't understand relativity can read the code. Of course that's just too much to ask. What I'm saying is that "inner_join" is utterly unnecessary and that "merge_married_profiles_into_list_of_pairs" is way better than what you came up with.
Programming is more like mathematical notation than English prose.
In mathematics we write a² + b² = c² to describe the relationship between the lengths of the sides of a right triangle. We don't write it out longhand in English words, because the notation is brief, packs a lot of meaning into a small amount of space, and lets our minds focus on larger concepts rather than parsing long phrases and keeping their meanings organized.
>In mathematics we write a² + b² = c² to describe the relationship between the lengths of the sides of a right triangle.
Therein lies the problem. Can you explain to me the meaning of a² + b² = c² without English? Can you just write down an equation and expect me to know what you're talking about?
Can you explain to me the concept of entropy by just showing me all the equations?
Can you explain to me the meaning of your program with only one letter variable names as shown in your Pythagorean equation above?
You can't. That's why math texts consist of equations AND English, and there's no reason programming shouldn't either.
Anyway a side note, have you ever heard of literate programming?
Why is mathematical notation full of one-letter variables?
That was cool when paper and ink were expensive and you were trying to send your proof to another mathematician, in a letter in the mail, in the 16th century, but now? Why?!
Some of the highest value-to-effort feedback I've both given and received in a PR is about naming. Whenever I see something where my first impulse is to react with "WTF?!?" I now try to ask myself "does something here just have a bad name?" and much of the time that's all it is.
I was working on a financial trading program at one point, specifically a function to filter orders into bids and asks. They named the order value "total" and the order size "sum".
It made a really simple function incredibly difficult to read.
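A hypothetical reconstruction of the kind of thing described (the field names and order shape are guesses, not from the original):

from dataclasses import dataclass

@dataclass
class Order:
    side: str     # "bid" or "ask"
    value: float  # price * quantity
    size: int     # quantity

def split_orders(orders):
    # The anecdote's names: "total" held a single order's value and "sum"
    # its size -- both wrongly suggest aggregates over many orders.
    bids, asks = [], []
    for order in orders:
        total = order.value
        sum_ = order.size  # in Python, plain "sum" would even shadow the builtin
        (bids if order.side == "bid" else asks).append((total, sum_))
    return bids, asks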
> My friend told me at their company they'd commonly convene the "variable naming committee" for such occasions, and I can't help but think of it every time I find myself in the same place.
You'd better call it the bike-shedding committee. I don't see how that saves time over, say, just letting anyone working on that code who really dislikes a name propose their change in the next merge request.
Joking aside. I agree that it doesn't seem like the kind of thing you want to convene a committee for. But it happens where the database is the contract shared by a bunch of applications, in which case it's important to get it right, maybe important enough to spend a meeting on. It's not my architectural style of choice... but it happens.
Agreed. My problem isn't even naming, but overly long names, which become a problem in C# and Java, where you have the namespace, the method, a service inside the method, a long type (because "var" can be an issue), and so on.
In principle I agree, but in practice don't think this leads to great results. Naming things well is really cognitively challenging. The human brain is naturally lazy and easily makes excuses to avoid thinking hard.
When something like this is adopted, the average person will look at something then quickly throw up their hands and declare that they can't name it without really trying. Many things can be named well, but it takes 60 seconds of hard thought and focus to realize it.
Yeah, I agree with GP's sentiment, but would emphasize good abstraction over a good name. It's hard to define a good abstraction, but I think a good starting place would be to make it feel single-purpose. Once an abstraction becomes too multi-purpose is when configuration and case-switching spaghetti begins.
For this concern to make sense, people would have to go around easily inventing good abstractions all the time, but having great trouble naming them. That seems so implausible to me. If you can't name it, that's almost certainly because it sucks.
> No abstraction is better than a bad abstraction.
I misread that at first. At first I thought you were saying that there is no better abstraction than a bad abstraction. Gotta love the ambiguities in English.
On the one hand, yes, or as Sandi Metz put it, “duplication is better than the wrong abstraction.”
On the other hand it is often possible to refactor and restructure code such that you no longer have to name something. Returning a closure rather than a value, or folding over a list, are two of my favourites approaches, since in many languages you then deal with the object symbolically, or implicitly through syntax, rather than explicitly or by named reference. Coroutines, continuations, and generators too.
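A small sketch of two of those tricks (illustrative only; nothing beyond the standard library is assumed):

from functools import reduce

# Folding over a list: reduce() threads the accumulator implicitly, so no
# intermediate running-total variable ever needs a name.
total = reduce(lambda acc, n: acc + n, [1, 2, 3, 4], 0)

# Returning a closure rather than a value: the captured rate is dealt with
# symbolically by the caller, never by named reference.
def make_discount(rate):
    return lambda price: price * (1 - rate)

ten_percent_off = make_discount(0.10)
print(ten_percent_off(100.0))  # 90.0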
This is absolutely the rule. The problem I see a lot is that someone else will come in and want to add an edge case to my generalized code or will try to include edge cases in their "generalized" code. Personally, I have 0 problems with DRY code. I don't even treat it like a requirement but sometimes, I'm trying to code something and I KNOW the code needs to be generalized or at least part of it and that's when I'll spend an hour on a 15 min fix. Effective DRY comes with experience.
You forgot the third most common case - 90% of those repeated methods will never have a bug and now you've got a code base that will start smelling bad and no one is going to want to work on it.
I agree with some of the things here, but allowing repeated code in multiple places until a bug forces the issue sounds like a terrible idea. Lots of these small methods will never have a bug, and they'll continue to rot the codebase.
Can you get the trade off right 100% of the time? Because I can tell you, every time I've worked on a codebase that repeated itself, it has been a freakin' -delight-, compared to the times where DRY was taken as a commandment from on high.
In the former, when something broke, I could just... fix it. And it would be fixed. Would other, similar situations still be broken? Sure! And when those were raised up we'd fix them too, and compare them with the other changes, and possibly refactor. Fixing one bug = one bug less.
The latter? Oh God. Something is broken? We'd fix it. Aaaand, now there'd be two bugs. Fixing one bug = more bugs.
Perfectly balanced code, yes, fixing one bug = multiple bugs fixed. That's the goal. But you won't get it right if you do it pre-emptively. Which of those other two options would you prefer?
I feel like I'm taking crazy pills, because you describe exactly what I have to deal with on a daily basis, but I'm the only one on my team that feels this way.
I don't even know how to comment to something like this. I guess with an example - I spent some weekend time fixing 30 file paths in a Cloud function, because the junior developer couldn't abstract the root to a single method.
After you do something like that, tell me how much you love fixing bugs everywhere.
Maybe you misunderstood; I am not saying avoid DRY. I am responding to a thread wherein the parent said, essentially, "confirm that it's really repeating".
My whole point was that we reach for DRY too early, when we don't know if it's actually repeating ourselves. The logic -right now- looks the same. Will it be in the future? Most of the time, we don't know. Even when we think it will, we're often wrong. Your example sounds like there's literally a string literal that was not abstracted out, but hardcoded in 30 places. That isn't logic. It's a literal. It has one meaning, and you can probably ascertain whether or not the meaning is the same across everything. And, as mentioned, it seems weird you couldn't grep for it.
Either way though, as a counter example - I and a coworker spent three months playing whack a mole extending and supporting a rather small desktop application written by one, extremely senior developer, who had assumed 7-8 different things were basically the same, and so had written it to share a lot of the same code. They were not the same, and so a fix for one thing invariably broke two others. I was thinking of it in particular with my original post; we fixed one thing, and two more things would be broken.
I got permission to rewrite the application over the course of a couple of days from the team lead, and proceeded to basically do a lot of copy paste to separate out each thing into its own control flow, where a fix to one would not affect the others. Our failure rate plunged, and within a month it was basically stable.
I think that's where SRP and naming comes in like mentioned elsewhere in the thread. If methods really do one simple thing and are properly named, it's easy to tell if they should be re-used or not.
A lot of the time, side-effects are not mentioned in method names or params (or the params are optional, that's where the issue comes in).
E.g.
SaveCandidate(Candidate candidate);
In reality, SaveCandidate does two things, saves the candidate and updates their "ProfileCompletionStatus".
If someone sees SaveCandidate and not SaveCandidateAndUpdateCompletionStatus, they will re-use SaveCandidate in places where completion status shouldn't be updated (contrived example - data migrations).
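A minimal sketch of the explicit split (hypothetical names, rendered in Python with a dict standing in for the database):

def compute_completion(candidate: dict) -> int:
    # placeholder rule: completeness = number of filled-in fields
    return sum(1 for v in candidate.values() if v)

def save_candidate(candidate: dict, db: dict) -> None:
    # does exactly what the name says, and nothing else
    db[candidate["id"]] = candidate

def save_candidate_and_update_completion_status(candidate: dict, db: dict) -> None:
    # the side effect is now in the name; a data migration can call
    # save_candidate() alone without triggering the status update
    candidate["completion"] = compute_completion(candidate)
    save_candidate(candidate, db)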
The problem, of course, is that these overly-clear, overly-detailed methods are a pain to read. The common advice is "keep it separate", but then we are repeating the same steps everywhere, despite them being generally one process. I guess this is all pretty pedantic. When you spend a ton of time on this, you are missing out on getting stories done, but if you don't, the code eventually becomes unmanageable.
Right, but then someone sees that, and goes "Ah-hah. This is saving Candidates. Over here I'm also saving Foos. I'll refactor this so I have a generic 'Save' method/function, and then both Candidate and Foo can call it. DRY!"
And that works, because initially saving is just taking a serializable object and dumping it to JSON. But then things get moved to going to a DB or something, and now it has different expectations, and, oh, crap, we have to do some special logic in the transaction because it turns out saving a foo requires us holding a lock on the bar table as well, but not for candidates, and etc etc etc, and saving was -really just two lines of code in the first place-, and the correct solution is isolating it entirely (with maybe a Saver interface to say 'yeah, this thing knows how to save itself'. Though that likely breaks SRP since now Foos know how to be Foos, and how to save Foos, which involves knowledge of the DB, which seems suboptimal, but which is still cleaner than what you had before)
Etc. My only point is that if the area you're tempted to repeat is not provably the same, contextually, keep 'em separate until it is (even if it's like...they implement the same interface). The pain of getting that wrong is almost always less than combining them, building more assumptions on top of that, and then finding out they should have been separate.
Reminds me of the time I wrote 100 similar sites separately with slight changes. A quick global search/replace will introduce so many unnecessary replacements unless you are careful, and will miss so many unless you take linebreaks into account.
A global search/replace is a hotfix hammer that should only get pulled out rarely and carefully by manually reviewing all changes.
Bad example, it was more complicated than that. Azure functions have a different file system than App Services, and the method tried to determine where it was hosted 30 times, or something like that. Genuinely don't remember.
Refactoring repeated code is toil, but cognitively trivial. Refactoring a bad or leaky abstraction can be fiendishly difficult. I'll take the toil any day.
These kinds of abstract code discussions almost become immediately absurd because, for it to work, we have to be imagining the same hypothetical codebase. Yet we never bust out concrete code. It's funny.
But the outcome that you're lambasting so rudely (really? "terrible idea" when we aren't even looking at code?) is still often the best outcome.
Some rotting, bugless, duplicated code is some of the easiest code to work with. It's the code you wish you had when you're debugging the complicated failing abstraction that GP wanted to avoid. The most damning thing you yourself could say about it is that it was taking up space.
In fact, you seem to be making the exact reverse argument of GP: that some duplicated, unabstracted, bugless code poses such a risk ("rot") that it's a "terrible idea" to not immediately merge it into one frankenabstraction.
When this happens in an argument, usually you both are imagining an absurd extreme that's opposite of the other's chosen extreme. And you actually are in agreement, as you'd both go "oh well yeah, if you go to that extreme, then I'd definitely agree with you."
I want to frame your first sentence. Abstract code discussions are absurd. I’ve learnt to mostly ignore programming blogs if they don’t include some concrete code.
Most code maintenance issues don't really appear until a relatively large amount of code and code evolution is involved. It's hard to include that in a blog post while keeping it digestible for the average reader.
In a small example, excessive duplication is not a problem, and excessive DRYness is not a problem.
Therefore, it makes some sense to me to leave out code, and just hope the reader has personal experience to draw relevant examples from.
Uncle Bob covers this by talking about how real duplicate code should change for the same reason. If the two pieces of code look the same, but from a business perspective can change for very different reasons, then you have incidental duplication and they should be treated separately.
I found this so often when I was doing simple back-office crud coding. There would be a new business use case that was very similar to something we already had coded. Code would be copied, since that's the easiest thing to do and you at least started with something that you knew was working.
Later, the new use case would evolve and have some new requirements. Had we abstracted the functionality originally, we'd have to go back and make the abstraction handle both cases. As it was, we could just change the copy that needed to change, and know that we weren't going to break the other case.
This was also before the practice of automated unit testing was well understood and supported by development tools, so the motivation to "not risk breaking working code" was much stronger than maybe it is today.
So, this take is sort of fashionable now, but it’s never really convinced me: what I’d suggest is that, when you discover that the extracted method is a bad abstraction, you do one of two things: (1) inline the method (IntelliJ is great at this for Java) or (2) duplicate the method and rename, adjusting the new version for the new use case. As hard as naming may be, adding the level of abstraction often ends up helping keep each method working at a single level of abstraction.
In my experience, the biggest messes I’ve inherited were a result of not being DRY enough: the cases of incidental duplication I’ve run across have been comparatively easy to unwind.
Couldn’t agree more. I can agree that premature abstraction is bad. But unnecessarily duplicated code outweighs this by orders of magnitude, in my experience. So stay DRY, and later when you hit that 1 case in 100 where the code needs to diverge, it’s easy enough to copy/paste.
> before long your helper method is extremely difficult to reason about, because it’s actually handling a dozen cases that are superficially similar but full of important differences
This reminds me of Mike Acton's Three Big Lies of C++, specifically Lie 2: Code should model the world. [0]
> A chair is a chair, in real life. But in terms of data-transformations, in terms of what we do, these classes are really only superficially similar. In the context of a game, we have a Chair, a PhysicsChair, a StaticChair, a BreakableChair. These things are not at all similar. There's almost nothing that's the same between these contexts. How they're handled, how the data is managed, how the data is transformed, there's virtually nothing that's the same here, and yet the tendency would be, because they share some world-modelling similarities, their similarities in the real world, they ought to be connected somehow in the code hierarchy, which is nonsensical. World-modelling leads to monolithic, unrelated data-structures and transforms. [...] You can't make a problem simpler than it is.
> A chair is a chair, in real life. But in terms of data-transformations, in terms of what we do, these classes are really only superficially similar. In the context of a game, we have a Chair, a PhysicsChair, a StaticChair, a BreakableChair. These things are not at all similar.
I feel like that comment either applies a very literal and naive analysis to the problem or fails to identify the objects being used.
Just because the word "chair" pops up in a few objects that does not mean they are supposed to be the same thing, and thus modeled as specialization of a common Chair class.
For example, PhysicsChair makes sense as a specialization of a physics-related class, not a Chair-related class. BreakableChair would also make sense as a specialization of a physics object, which might be comprised of multiple discrete elements or track damage to generate new particles when a threshold is reached. If we take the single responsibility principle seriously, it makes sense to have specialized physics and graphics classes that implement the functional requirements of handling a chair.
This by no means implies that class Chair should be a superclass of all these other cases, or even that it makes sense to even consider them to be related at all. A failure to identify the models and their functional relationships doesn't mean that your domain has to include relationships that don't really exist nor make sense.
> Just because the word "chair" pops up in a few objects that does not mean they are supposed to be the same thing, and thus modeled as specialization of a common Chair class.
You agree with his point that class hierarchies shouldn't necessarily model reality.
> For example, PhysicsChair makes sense as a specialization of a physics-related class, not a Chair-related class.
Right, that's his point.
> A failure to identify the models and their functional relationships doesn't mean that your domain has to include relationships that don't really exist nor make sense.
Again, that's the point he's making. In terms of modelling the world, they're all just different kinds of chair, but that doesn't mean you should model it that way in code, with object-oriented types.
> You agree with his point that class hierarchies shouldn't necessarily model reality.
I'm pointing out that this assertion is meaningless, because it completely misrepresents what is actually done with class hierarchies.
The class hierarchies in the domain model (i.e., Chair) are never the class hierarchies used by components in the service layer (i.e., PhysicsChair). Functional requirements of specific services never seep into the domain model. Call it bounded context, the single responsibility principle, or encapsulation, but even if we play little semantics games, a home is not the same as a mobile home. Pinning together unrelated concepts based on the naive belief that sharing a keyword is enough to bundle concepts together just misrepresents the whole issue as a straw man.
> Right, that's his point.
No, the point is that it's absurd to talk about class hierarchies with this example, based alone on the simplistic idea that having the word "chair" appear in the identifier is all you need to logically bundle unrelated concepts. The example is poorly thought through even from the perspective of a straw man.
> Again, that's the point he's making. In terms of modelling the world (...)
That's what you are failing to understand: they are not modeling the world. At all. And from the start.
Only the domain model (chair) models the world. The rest is not the domain model, but data types used in the service layer to meet functional requirements (i.e., PhysicsChair).
Let's put it bluntly: the straw man you are trying to defend is something that no one in the world ever mixes up. Ever.
My boundary on DRYing has settled on "when it's semantically the same code".
Code that looks the same but is used for different purposes / has different meaning? Keep it separate. They may diverge in the future.
Code that truly means the same thing, but may or may not look the same? DRY it up, any variation is probably unintended and will surprise people later when A doesn't behave like B.
DRY is a good principle to default to, and juniors need to learn to pick this up - I constantly need to fix this kind of thing in reviews; they haven't learned to recognise when stuff should be factored out, and they also write buggy code, so it's easy to show them that writing the code once, isolating dependencies and testing it is a better approach than coming up with similar code over and over in ad-hoc fashion.
Once you start getting it you tend to go too far (everything is a nail when all you have is a hammer), and experience helps you learn to judge when to use it - but it's an invaluable tool and you need to use it to get the experience - so I would suggest extracting shared code by default, and inlining when it gets too complicated.
It took me a while to realize this truth, because you might learn it only by seeing your own code after forgetting it, or spending enough time in other people's code to understand how it works through and through.
One good example of this is paths in makefiles. It's so much easier if you just make all the paths relative to the root of the project, but people have to either over-abstract or under-abstract everything or both.
This goes for all sorts of things. Yes, you might build a directory from somewhere else, or might do this, or might do that. But why not make it clear what you are doing, instead of unclear depending on what you might do?
I try to suggest this exact strategy, but it's hard to find acceptance in my current job. I call this "letting the code tell me what to do." I find that religious adherence to DRY and all kinds of other rules ends up in just...shitty, shitty code.
> So you try to extract that boilerplate into a method, and it’s fine until the very next change. Then you need to start passing options and configuration into your helper method... and before long your helper method is extremely difficult to reason about, because it’s actually handling a dozen cases that are superficially similar but full of important differences in the details.
What if the flaw is not the initial deduplication, but the flaw was to continue to use it by adding additional responsibilities?
When I think of the incidental type, I think of things that just happen to be alike; a contrived example would be an ENUM value that happens to be 3 used in different places, and that works because both happen to use 3.
"Let the code go through a few evolutions and waves of change. Then one of two things are likely to happen...".
That sounds great in theory, but how long is your evolution time, and do you keep track of the changes in practice? I would assume a good evolution time is 3 to 6 months. I would say half the time I work on a story I face that decision. If you close 1 to 3 tickets a week, you have to track a lot of these decisions [1]. Now add that you work on a team of 5 and you have 5 engineering teams overall... it seems to me that this approach does not scale.
[1] granted, not all tickets will be new decisions in new places of the code base, and you do come across some of your old changes and do the actual evolution.
Good observation. I seem to recall in one of Linus Torvalds' old rants about C vs C++, he mentioned locality, and how abstracting everything possible in a codebase resulted in loss of locality.
Locality encompasses multiple concepts, one of which is what you mention. Another is the ability to look at any section of code and understand exactly what it does without having to dig through chains of abstractions.
The implication is that locality results in more LOC, but a more tractable codebase, especially as it gets larger and more complex and more difficult for engineers to hold in their head.
I use a similar approach: first code, then take a refactoring session and look if some repetitive code can be "condensed" in a common method.
By the way, copy-pasta code is very resilient: a bug introduced in one place cannot spread, because the other code is duplicated and old.
Instead, if you have too much condensed code, you end up with complex methods which are difficult to maintain, because you have a huge "coverage land of code", and you get scared when you need to change it.
I like to say that the opposite of DRY is PYIAC: Painting Yourself Into A Corner -- because you often only notice a situation like this once it becomes incredibly difficult to get out of.
It's a trade-off - DRY vs. an abstract dependency. Are they superficially similar (then don't create an unnecessary dependency), or are they functionally similar? Then DRY.
I think this misses the point of DRY a little bit. DRY isn't about not copy pasting code, it's about ensuring that knowledge isn't repeated. If two parts of the system need to know the same thing (for example, who the currently logged in user is, or what elasticsearch instance to send queries to, etc.), then there should be a single way to "know" that fact. Put that way, DRY violations are repetitions of knowledge and make the system more complex because different parts know the same fact but in different ways and you need to maintain all of them, understand all of them, etc. etc.
Code blocks that look to be syntactically the same are the lowest expression of "this might be the same piece of knowledge" insofar as they express knowledge about "how to do X", but the key is identifying the knowledge that is duplicated and working from there. Sometimes it comes out that the "duplication" is something like "this is a for loop iterating over the elements of this list in this field in this object" and that is the kind of code block that contains very little knowledge in terms of our system. But supposing that that list had a special structure (ie, maybe we've parsed text into tokens and have information about whitespace, punctuation, etc in that list) and we start to notice we're repeating code to iterate over elements of the list and ignore the whitespace, punctuation elements in it, then we've got a piece of knowledge worth DRYing out given that all the clients now need to know what whitespace & punctuation look like even when they'd like to filter them out.
It's worth pointing out that DRYing out something isn't necessarily "abstracting", it is more like consolidating knowledge into one place.
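A sketch of what consolidating that piece of knowledge might look like (the token shapes and names are assumptions):

import string

# One home for the knowledge "what counts as whitespace/punctuation",
# so clients never re-state that definition themselves.
SKIPPABLE = set(string.whitespace) | set(string.punctuation)

def content_tokens(tokens):
    # Return only the meaningful tokens, hiding what "skippable" means.
    return [t for t in tokens if t not in SKIPPABLE]

print(content_tokens(["Hello", ",", " ", "world", "!"]))  # ['Hello', 'world']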
The most fun bug I've encountered as a web developer is of this category. Two pages, both check for a logged-in user and redirect to the other if found or not found, respectively. The bug was a subtle difference in how these were calculated, the details of which are unfortunately lost to the sands of time. The end result was that if you sat on one of the pages and waited for your user session to time out, you'd get stuck in a redirect loop between the "logged in" and "please log in" versions of the page.
Anyhow, the point of this is that when you calculate the same fact two different ways, you will occasionally build something that makes an unwarranted assumption that because it's the "same fact" you wind up with the same answer. This is an entire category of easily missed and often subtle bugs.
And in both cases, it was a sign of mismanaged design. I have encountered THAT EXACT bug, and the reason we supported both was because both were released and users began to expect both pages for different reasons. What we needed was a designer to sit down and say, hey, this design seems replicated, how do we mitigate this? This version of DRY becomes a business and resource problem, above the developer, and unfortunately, this means that you or do not have the resources to adequately deal with it.
I haven't seen this "don't repeat knowledge" take before, it's pretty interesting. I see why you don't want mutated various versions of the same information all over the place, but you still have dangers.
Especially if you "overly reduce" your knowledge. If your common recipe is "do A, B, C, D, E" and you reduce that to just "do X," for instance.
I've seen this often turn into "now, instead of the knowledge being repeated in several places, it's hidden in one place and only one person knows it." Everybody else just relies on the library doing its magic, and when someone needs to do something differently, they have this huge mountain to climb to figure out how to modify the code to also do "J" for certain cases without breaking everyone else.
As someone who deals with 15 million lines of code (and many readers of this have bigger systems) i need to trust that do X does X without me having to know how. When I have to learn it slows me down from the part of the code I need to know well. If do J is needed, that needs to be someone else's problem who knows the rest of do X. Unless do X is my responsibility of course. But nobody has responsibility for more than a small fraction of the code.
This is a great point often forgotten in this kind of discussion.
Size matters, and depending on the system size we're dealing with, it will have a significant impact on what approach we take. Or how we handle documentation, for instance.
There is definitely a spectrum of "knowledge" at play when it comes to these considerations. The most obvious DRY violations are those kinds of things where you go "oh, I need to test for this case", because that is usually an indication of some knowledge you need to know when interacting with a piece of code. E.g., if you ever use -1 as a sentinel value, then the knowledge of what -1 means should be consolidated together; otherwise all clients will have to know that -1 is a sentinel and what it means, and at best you'll have duplicate code, at worst those interpretations won't align and you might have a subtle bug where that -1 is doing something somewhere (i.e. it is supposed to mean "no information provided" but somewhere something is keeping an arithmetic mean of this field and those -1s are now screwing up your metrics and you don't really notice).
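For instance, a minimal sketch of consolidating that sentinel (the names are illustrative):

# One named home for the sentinel and its interpretation.
NO_INFORMATION = -1

def has_age(record: dict) -> bool:
    return record["age"] != NO_INFORMATION

def mean_age(records: list) -> float:
    # Sentinels can no longer sneak into the arithmetic mean.
    ages = [r["age"] for r in records if has_age(r)]
    return sum(ages) / len(ages)

print(mean_age([{"age": 30}, {"age": NO_INFORMATION}, {"age": 40}]))  # 35.0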
When we think about the knowledge of "how to do something", that's where things can get confusing. 9/10 times I'd say the right move is to look for common assumptions or facts. I.e., it isn't just "doing something" that is important, but the assumptions made in the process of doing it:
As an example, consider finding the average word length in some piece of text. We might start writing that feature like:
def count_words(text: str) -> int:
    return len(text.split(' '))

def average_word_length(text: str) -> float:
    num_words = count_words(text)
    word_lengths = []
    for word in text.split(' '):
        word_lengths.append(len(word))
    return sum(word_lengths) / num_words
then the piece of knowledge they share is "what a word is" and the DRY refactoring would pull out that piece of knowledge into its own function
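One way that refactoring might look (a sketch, keeping the original's naive space-splitting):

def words(text: str) -> list:
    # The single statement of "what a word is" in this system.
    return text.split(' ')

def count_words(text: str) -> int:
    return len(words(text))

def average_word_length(text: str) -> float:
    lengths = [len(w) for w in words(text)]
    return sum(lengths) / len(lengths)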
that might be code you write when starting to write a feature, and that's the kind of "ding ding ding, there's common knowledge here" that should guide refactoring. The system has a concept of a "word" that we've introduced, and it's important that knowledge about "what a word is" lives in one place. For DRY things it frequently doesn't make any sense for there to be multiple statements of "what a word is" where the system wants to use the same concept.
Kind of orthogonal to this is abstraction, where the focus is on "usefulness", and that is where you can 100% abstract incorrectly, prematurely, get screwed over by requirement changes, or write a library that hides everything and makes people angry. The example you provide seems more like an error in abstraction, where things that should be close together are too far apart in the system (i.e., some "fact" is hidden away and another part of the system wants to know it), but the consolidation and DRYing of those facts, I'd argue, is a lot easier once we've figured out how to identify them.
Yeah, I like this approach, because the "what is a word" knowledge is a nice piece of common functionality that doesn't make sense to repeat. It's unlikely to change for just one of those two functions.
In my example, it's less a "core piece of knowledge" that people are trying to DRY, and more just a "common sequence." Someone sees a bunch of different places where we have a sequence of calls like A, B, C, D... and says "oh, this is a shared method I can extract", even if there's plenty of ways that in the future you might want to do A, B, C, E without D. And so then you pass in a bool, then another one, and you have a centralized mess...
I think the distinction is that if those two pieces of code having a different idea of what a word is would constitute a bug, then you definitely need to deduplicate the "how to find words" logic. But if it doesn't really matter whether two different pieces of code are using the exact same way to do something, then that's likely "coincidental" replication. If you need to do word splitting, and someone else has written a word splitter, by all means copy-paste their code to get you started, but definitely don't assume the best plan is to pull their code in as a dependency.
These things need to be balanced. I live in an ecosystem of DRY gone amok and it's not pleasant.
There's a standard library to connect to databases. There's a huge hierarchy setup just to start an app running.
All of these super dry infrastructure changes have, unfortunately, come with a huge cost. We are still stuck on Ubuntu 14.04 because the super dry Puppet framework we invented can't be ported to Puppet 6.
We are stuck talking to MS-SQL, because our super dry database connection management library can't handle establishing other database interactions.
We are still stuck on Tomcat 7 because our super dry Jersey libraries don't work with newer versions of Jersey (which has locked us into older versions of tomcat!).
Consolidation is a decent goal, but it really needs to be measured. For me, it is FAR more important to consolidate on how to do things and not what does things. In other words, rather than making an "elasticsearch connection library", specify "this environment variable is the elasticsearch host/credentials" and let the apps move on from there.
That's because, when it comes right down to it, configuration code is super easy to write and it really doesn't matter if it's duplicated. You want your libraries consolidating knowledge to be for things that are easy to get wrong (such as checking who is currently logged in or how to authenticate).
> Consolidation is a decent goal, but it really needs to be measured. For me, it is FAR more important to consolidate on how to do things and not what does things. In other words, rather than making an "elasticsearch connection library", specify "this environment variable is the elasticsearch host/credentials" and let the apps move on from there.
I think we're in agreement here. Config is the most basic kind of knowledge, because when something wants to know about the elastic credentials, it almost never makes sense to have it in two places if those two places are supposed to be the same thing.
How to actually connect to elastic -- that's the part that is more iffy. If there is some knowledge we've added there, then it makes sense to DRY it up, but the knowledge of "this is how you pass credentials to this elasticsearch client" isn't the kind of system knowledge we care about. If, for example, there were some kind of parameters that we had to set on each connection, and we claimed it as a piece of knowledge that all of our connections to this service are of this specific TYPE and have these specific parameters, then we've started to add some additional systemic knowledge that might need to get consolidated. If someone were to start working on a piece of code and I feel the need to tell them "don't forget about X", then that is the kind of situation where DRY comes into play. If it's just a vanilla connection to a database and we don't care about the connections made, then I don't think we have a violation of DRY, given that there isn't an important piece of knowledge that's repeated.
At some point, especially when we pay too much attention to copy-pasted code, we end up abstracting. Abstracting is hard, more general, very difficult to do right, and almost always done too early. DRYing out knowledge is easier and almost always improves things.
IMHO it is not the author who misses the point of DRY, but countless developers who make code less readable only to reduce visible repetition or to avoid copy-n-paste. Maybe DRY is just a bad name.
I agree that the name took over. The intention sounds synonymous with bounded contexts of DDD.
I find the vocabulary of DDD to have more explanatory power. Especially with people who don’t grok the difference between removing repetition and consolidating models.
I think repetition is a symptom that a code base may be afflicted with interwoven domains, but the existence of repetition is not sufficient for the diagnosis, IMO.
Bounded Contexts is an idea that helps you draw the boundaries between domains. It asks you to be disciplined in your abstractions, and in return it allows you to feel comfortable changing implementations within a domain without fear of cascading second order effects to other domains.
For example, your service/library for managing customers shouldn’t return data about the books they’ve purchased. That comes from the order context, which composes the customer and book contexts.
If your boundaries are well defined, you can change the order process without fear of breaking the book and customer models, and vice versa.
It marries well with service oriented architecture, because you can use the network to help enforce a boundary. You still need some skill to enforce the correct boundary, of course.
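A toy sketch of such boundaries (hypothetical services; the real thing would live behind separate modules or network boundaries):

class CustomerService:
    # The customer context knows nothing about books or orders.
    def get_customer(self, customer_id: int) -> dict:
        return {"id": customer_id, "name": "Ada"}

class BookService:
    def get_book(self, book_id: int) -> dict:
        return {"id": book_id, "title": "SICP"}

class OrderService:
    # The order context composes the other two across their boundaries.
    def __init__(self, customers: CustomerService, books: BookService):
        self.customers = customers
        self.books = books

    def get_order(self, customer_id: int, book_id: int) -> dict:
        return {
            "customer": self.customers.get_customer(customer_id),
            "book": self.books.get_book(book_id),
        }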
Yes, I've dealt with systems that had bad abstractions.
And I've also dealt with systems where knowledge of highly nameable things - like how to authenticate a user, or how to connect to a database, how to obtain a token to the same API server - wasn't centralized.
Systems of the first kind are certainly bad. It takes a lot of time to understand before you can get ahead and start refactoring. If your organization had low code review discipline at any point, abstractions often become hard to refactor with time, since some developers don't understand the abstractions, and instead of fixing them, just work around them with thread locals or lots of branches.
But systems of the second kind are much worse. Here what happens is that duplicated knowledge invariably diverges with time. It can be developers fixing a bug in one place and forgetting the other, or adding a certain feature in one place and a different version of it in the other. Over time, each implementation of the knowledge has its own unique behavior and bugs, and some parts of the sprawling code base grow to depend on a certain behavior. Or perhaps your code doesn't, but you have other services in another part of the company consuming your API that do, and you just have no idea if they rely upon the implementation difference or not.
If you write it once, you eliminate the chance of a small fix not propagating properly. This is particularly common when handling files and network connections, as those tend to develop edge cases over time.
DRY reduces the number of potential loose ends when you update your code.
Not all code is knowledge, in this sense. And sometimes repeating knowledge is better, on balance, than unifying it somewhere, when you consider the added costs of coupling, of reification, and of abstraction liability.
A better formulation of DRY is SPOT — Single Point Of Truth. In the event that the logic is changed in one copy, should the other copy always be updated accordingly? If the answer is yes, combine them into a single copy, so that they don’t diverge and create ambiguous “sources of truth” in the future. Conversely, if it is likely that the logic in the two copies will need to diverge in the future, due to having a different context, then do not combine them, because they represent different “truths” that just currently happen to have the same form.
Of course, the answer to that question can change over time, and one has to combine or duplicate accordingly. This also serves to document the intent that “yes, these two occurrences are expected to evolve identically”, or “no, these two occurrences are expected to evolve independently, even though they currently happen to look the same”.
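A tiny illustration of applying that test (the rules here are invented): one truth gets one definition, while a coincidence stays duplicated.

# Single point of truth: billing and reminder emails must always agree
# on the grace period, so the rule is defined exactly once.
GRACE_PERIOD_DAYS = 14

def is_account_delinquent(days_overdue: int) -> bool:
    return days_overdue > GRACE_PERIOD_DAYS

# Coincidence, not truth: these are equal today, but one is a security
# policy and the other a network setting, so they stay separate.
MAX_LOGIN_ATTEMPTS = 3
MAX_UPLOAD_RETRIES = 3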
The article is correct though that there is a trade-off in terms of the complexity created by the abstraction, and in how important the “common truth” is. Sometimes a source comment pointing out the dependency is better than introducing a nontrivial abstraction.
The book “A Philosophy of Software Design” argues that there are two sources of complexity in software: dependencies and obscurity. Combining two similar pieces of logic into one can reduce dependencies (of one having to be changed when the other is changed), but can increase obscurity due to the added abstraction. If the combining was done for the wrong reasons (the two occurrences actually need to evolve independently), then the dependencies are increased instead of reduced.
Love this, yeah. I've heard it phrased like "how do we answer questions" so like, "how do we answer questions about a job's status", "how do we answer questions about a user's bank account balance". Either way, once you have those kind of product requirements in place you can build to that spec, and then start iterating as you gain more knowledge.
Duplication is usually a safe default course of action because you are not locking yourself in to any particular consensus of the problem domain. Obviously, too much of this will render a codebase a nightmare to maintain as bugfixes and feature enhancements have to be applied in multiple places.
I have found that starting with duplication is by far the easiest and most flexible way to work through problem domains that are complex. Once you have a really good grasp of the modeling, then you can iterate and decide on normalization where appropriate.
Thinking about this from an analytical perspective - If you build your application with duplication by default (i.e. define a domain model for each logical use case/scenario), then you will have an excellent analysis already in front of you regarding which business types should be normalized and which ones might be a little bit trickier to make common. Many times it is impossible to fully explore a problem domain until you have already written software against its entire extent.
And often the process of de-duping repeated code later on isn't as bad as DRY purists make it out to be, especially since it can be done incrementally. Example: If you have similar functions in 7 places, you can consolidate them one by one. If you have 1 function used in 7 places, you have to consider all implications for all of those code paths.
But when you set out to do that, you first have to make sure they really are identical, and since everything from indentation to names often differs, it might not be so automatic. And then you do find differences, and you have to figure out whether they are there by accident, because someone forgot to implement a fix or a change in some places, or whether the differences are intentional.
On the other hand, if you realise your abstraction was bad, duplicating a function is always trivial.
I've seen code copy-pasted and slightly modified over a dozen times, sometimes without even eliminating dead code! Copying a function or even a whole file is fine if you actually take a moment to consider whether it needs refactoring or not, but more often than not people will just copy and bash at the code until things work without actually making a conscious choice to duplicate code over refactoring.
This seems like an equivalent duality to me but in the case of seven independent functions you're much more likely to miss considering a case. If behavior changes for one you likely should be considering if that change applies to all the others.
Yes, yes, yes! I wish devs could relax just a little about duplication. It’s far less harmful than bad abstractions built with the only purpose of checking the DRY box.
Predictably, another article that doesn't know what DRY is, leading to slaying of strawmen.
I believe DRY was coined in The Pragmatic Programmer, and probably none of the examples in the article are instances of DRY.
DRY is about knowledge/requirements, not similar code. It is about ensuring that a given requirement is not duplicated in multiple places in the code. It is not about similar looking code, which often involves differing requirements but just happen to be coded similarly. The latter leads to coupling if you unify it into one piece of code.
I sort of agree with you. A better title/approach would have been "These aren't DRY, they're silly 'no common code' bigotry". I find the article resonates because I've seen all of these anti-patterns defended with "because DRY". I agree, it's not DRY; but so many people get stuck on the "no code duplication" part. I'm not sure if the "This isn't DRY" is the best fight or "Sometimes DRY is not the best".
My favourite block of "DRY" code was a method that had a triple nested loop (for all objects A, for all objects A.B, for all objects B.C) with a bunch of flags (like 15 different bools, ints, dates and arrays) that changed the ORM filters for A, for A->B, for B->C, and then changed what operations were done on A, B, C. Basically, at the end, the only similarity was the foreach part. The comment on the block of code had "Keep this loop together for DRY", as if they knew this was going wrong but weren't sure why. It ended up being 3 or 4 much simpler methods, based, as you say, on requirements NOT code "shape".
> I'm not sure if the "This isn't DRY" is the best fight or "Sometimes DRY is not the best".
The problem is that when people believe this is DRY, they then tend to oppose the "real" DRY as well.
Likely we'll have to give another name to the "real" DRY principle. In general, I've always felt that catchy names/acronyms are a bad idea for anything (e.g. free software, open source, pro-life/choice, etc). Almost all of them end up being used in ways that differ from the original intent.
I guess I have to agree, in the example I gave, I actually had to fight with members of the team because they were sure the new multiple function approach (one for user, company and analytics) wasn't "DRY" in their eyes.
Yes, thank you! Exactly what I was trying to express in another comment. And if you do it the right way, none of the problems with DRY usually brought up will be relevant. The whole notion of looking at code and trying to spot similarities to find abstractions is completely backwards.
I've referred to this in the past as "semantic duplication" (code that is the same by definition/requirement) vs. "syntactic duplication" (code that just happens to do the same thing today, but there is no requirement that requires both copies to remain the same).
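A small, made-up example of the two kinds: the first pair shares a requirement and must share a definition; the second pair merely coincides today.

# Semantic duplication: both totals encode the same legal VAT rate, so
# they must reference a single definition.
VAT_RATE = 0.20

def invoice_total(net: float) -> float:
    return net * (1 + VAT_RATE)

def refund_total(net: float) -> float:
    return net * (1 + VAT_RATE)

# Syntactic duplication: identical formulas, unrelated requirements.
# Unifying these would couple a tipping norm to a sales contract.
def tip(amount: float) -> float:
    return amount * 0.15

def sales_commission(amount: float) -> float:
    return amount * 0.15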
The problem isn't DRY, the problem is "helpers". Helpers are an anti-pattern, they don't fit in your architecture, they have no mental model, they're difficult (impossible) to name and organize, and they're extremely resistant to refactoring. Effectively they're spaghetti code.
The example I always come back to is auth. If you're doing the same thing like "parse a cookie header, get the session, make a DB connection, look up the session info, etc. etc.", consider how you could architect the layers of your application using a mental model that people would find easy to reason about. That might be some OO, middleware, or even a macro, but the point is that it's thought about, designed, engineered, and documented.
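For instance, a sketch of what the middleware option might look like (all names and the session store are invented; real session handling is more involved):

from dataclasses import dataclass

@dataclass
class Session:
    user_id: str

# Stand-in session store; in a real system this is a database lookup.
_SESSION_STORE = {"tok123": Session(user_id="u1")}

def load_session(cookie_header: str):
    """The one documented place where cookie -> session resolution lives."""
    token = cookie_header.removeprefix("session=")
    return _SESSION_STORE.get(token)

class AuthMiddleware:
    """A layer every request passes through, so handlers below it can
    simply assume request.session has already been resolved."""
    def __init__(self, app):
        self.app = app

    def __call__(self, request):
        request.session = load_session(request.headers.get("Cookie", ""))
        return self.app(request)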
The reason helpers are more prevalent than thoughtful architecture is that humans are a lot better at prioritizing the short term "I improved it" fix from factoring into helpers over doing the long term work of architecture. If you want to change this, it starts with cultural values that prioritize long term sustainability.
Do you have an example you’re thinking of? I don’t think standard libraries automatically have good architecture (see: PHP), but they do have a big influence on culture, which is interesting.
No, I meant that if you think helpers are an anti-pattern that results in spaghetti code, then you must also think that standard libraries are an anti-pattern mess. I don't think there's a standard library that doesn't have helpers, unless we don't agree on what a helper is?
Hmm that could be it. Like, Python has json and it works like pickle before it. I don’t think that’s a helper and I like that it reinforces that serialization pattern.
What I think of as helpers are like base64_auth_to_username_password. That’s factoring out like 2-3 lines of code that may be duplicated in a half dozen places, but in truth represents an incomplete abstraction, layer, or subsystem.
The way I like to work is first to write out all the code I need to make something work correctly, then I go back over the code to see if there's anything that could be simplified or split into separate functions, etc.
I really like to see DRY code, but if you have to make a helper function that takes a bunch of parameters with a bunch of conditionals to something slightly different, you might be better off just sticking the specific logic you need in each place.
The worst case of copy-pasta I saw in a codebase I came into was a function that was 1000 lines long, duplicated 3 times with < 10 lines of it different for each copy. That's a classic case for DRY to be applied.
I feel like I post a link to this comment [1] every time the abstraction vs. DRY topic comes up, but it’s just such good advice. I consciously try to remember it whenever I’m programming.
> Dependencies (coupling) is an important concern to address, but it's only 1 of 4 criteria that I consider and it's not the most important one. I try to optimize my code around reducing state, coupling, complexity and code, in that order. I'm willing to add increased coupling if it makes my code more stateless. I'm willing to make it more complex if it reduces coupling. And I'm willing to duplicate code if it makes the code less complex. Only if it doesn't increase state, coupling or complexity do I dedup code.
> The reason I put stateless code as the highest priority is it's the easiest to reason about. Stateless logic functions the same whether run normally, in parallel or distributed. It's the easiest to test, since it requires very little setup code. And it's the easiest to scale up, since you just run another copy of it. Once you introduce state, your life gets significantly harder.
> I think the reason that novice programmers optimize around code reduction is that it's the easiest of the 4 to spot. The other 3 are much more subtle and subjective and so will require greater experience to spot. But learning those priorities, in that order, has made me a significantly better developer.
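To make that ordering concrete, a tiny invented sketch: the stateful version is shorter, but by the comment's priorities we accept more code for less state.

# Stateful version: hidden state means call order matters, and tests
# need setup.
class RunningTotal:
    def __init__(self):
        self.total = 0

    def add(self, x: int) -> int:
        self.total += x
        return self.total

# Stateless version: slightly more code at each call site (the caller
# threads the total through), but trivially testable and parallelizable.
def add_to_total(total: int, x: int) -> int:
    return total + x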
That’s a really insightful comment, and also the reason why I think my code quality improved considerably after learning Clojure and F#.
I discovered that immutability in functional programming made it far easier to reason about state - something goes in, and you know exactly what comes out. No verbose checking logic, no bugs because something wasn’t set that should have been.
I’d personally switch coupling and complexity (though it really depends on your definition of the latter). Coupling is usually much less of a problem than you expect it to be, and is easier to identify/rectify in retrospect.
Well, there's certainly a lot of overlap in the Venn diagram of coupling and complexity! The really insidious thing about coupling is that it can transcend module barriers — you can make changes that cause problems in places that seem to be entirely unrelated. Whereas e.g. cyclomatic complexity can be tough to tease apart, but it's at least limited to a single module that you can work on in isolation.
It's not just DRY. Every axiomatic or semi-axiomatic principle of software development ends up being a trade-off. Good code lies at a local minimum where multiple competing concerns are all balanced against one another.
I agree about the importance of tradeoffs. Looking at the historical perspective, though, the reason that every principle is a tradeoff is that the principles that are uniformly worse get discarded.
For instance, structured programming (building code out of blocks with structured control flow rather than a pile of gotos) was victorious in the 1970s. Nowadays nobody considers the tradeoffs of using if-then-else versus gotos; structured code is the automatic choice.
Self-modifying code was very popular in the 1950s (since it was the only way to get many things done), but essentially nobody uses it now.
Modularity is another victorious principle of software development, winning out over big blobs of code with global variables.
Using a stack for subroutine calls used to have tradeoffs, but now nobody would consider an alternative.
Looking at the long perspective, there is real progress in software development (although slower than I'd hope).
> Self-modifying code was very popular in the 1950s (since it was the only way to get many things done), but essentially nobody uses it now.
...which proves that it is a tradeoff. Just one where the local minimum currently is at one end of the domain due to context (e.g. ROM size limitations) having evolved.
Abstractions are addicting for many developers, including myself. I switch between Go and Java. Go is the language I want my coworkers to use. I'd rather read "bad" Go code than "bad" Java code all day long. Bad Java can be truly excruciating to read and review, particularly due to the poor choice of abstractions. Whereas, Go mostly gets out of the way and may be written poorly but is straightforwardly written poorly.
Still, there's a certain sense of aesthetic beauty that I just can't derive from Go, and why I kind of hate working in it. There's lots of things about Java and OO that I don't love, but reading a perfectly factored Java code base can be just beautiful. Mostly due to good choice of interfaces.
Now, those code bases might be rare and not worth the lift of a million bad abstractions. I'd probably agree at this point, but still, I find it odd that most Go code bases just feel dirty and thrown together to me. Hacking stuff together in a mostly procedural language with good deployment story is probably the right way to write for-profit code, but I'm not sure I love it.
A lot of programming maxims like "stay DRY" are rules of thumb, and often are dangerous, or at least lead to unexpected results, when treated like laws of nature. I had a developer who drank functional-flavored Koolaid and refactored any single line of code that appeared more than once in an application. It was about 38K lines of code. When he was done, it was still about 38K lines of code. Was it functionally pure? Yes. Was it very difficult to debug? Yes... you sometimes had to step into five or six functions to get to a single line of logic.
I agree. I also think that when building a prototype, flexibility is more important than stability. I allow myself to repeat code when I think two similar things will be different by the time requirements are better known. Later I'll usually re-write in a more stable fashion.
The whole DRY concept irritates my inner curmudgeon because it is itself a lousy repetition of the (formal mathematical) concept of refactoring. When you refactor code it's just like factoring an algebraic equation. If you're just removing duplication without understanding how it affects the structure of the system then it's a kind of "cargo cult" programming (IMO.)
Even when you know what you're doing, there's good refactoring[1] and bad refactoring[2].
[2] https://github.com/calroc/HulloWurld/blob/master/Hullo.html#... This is a terrible function that, while it abstracts the core of the two following functions well, makes the system harder to understand. In other words, the three factored functions are less desirable than the original pair of functions despite their redundancy, because the original functions were easy to understand and the new factored helper function is inscrutable. Context counts.
"...putting common lines into functions, without careful thought about abstractions, is never a good idea..."
(emphasis mine)
I think this is the crucial part. DRY works fine and in fact arises naturally if the code is well factored to isolate areas of commonality - as the author points out though, this is very difficult to do and I think that's the core problem.
Not sure if anyone else shares this anecdote, but I've noticed the most DRY-hard programmers tend to be the most resistant to things like functional programming, monads, and other generic approaches which are the ultimate realization of DRY. And often the most inscrutable.
Another anecdote on DRY: I'm currently refactoring two interrelated systems that share a single function, and very horrible bad things happen if the systems disagree on the return value of that function. However sharing the same code is more complex than duplicating it and this has a nontrivial impact on how the systems are distributed. So today I'm undoing the original work I did to make it DRY - turns out that sometimes, you need to copy/paste.
I will humbly suggest "DOBA": Duplication Over Bad Abstraction. Not nearly as good as "DRY", but it is fun to say.
I wonder how much of DRY's popularity is due to the catchy name. One can say "this needs to be DRYed up" (verb), or "this code is really DRY" (adjective), or "I appreciate the DRYness of this code" (noun). The opposite of DRY: "wet", of course. Being a homophone of an existing word that is both a noun and adjective, and that has an obvious antonym really lends itself to usage.
I see similar arguments come up over and over at HN, and I say they stem from a fundamental misunderstanding of DRY. What we must not repeat is not lines of code, but how we do certain things in the code. That is, it’s fine to repeat syntactically identical sections of code if their semantic meaning is different. But if the semantic meaning is the same, they must never be repeated because we must never have several definitions of the same thing in the same program. This is similar to the concept of normalisations in DBs.
For example, y = ax + b can refer to the distance a car has traveled over time at a fixed speed with a starting point, as well as the cost of ordering a certain quantity of an item with a fixed shipping cost.
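Kept as code, that might look like the following (illustrative functions): the shape is identical, the semantics are not, so merging them into one generic linear(a, x, b) would be repetition of form, not of knowledge.

def distance_traveled(speed: float, hours: float, start: float) -> float:
    return speed * hours + start        # kinematics

def order_cost(unit_price: float, qty: int, shipping: float) -> float:
    return unit_price * qty + shipping  # pricing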
Hey, you mean "DRY IA TO"! Don't repeat common phrases like "is a" and "trade-off", damn it! Define an acronym and use that instead.
DRY is literally impossible. If something has to be performed or evaluated two or more times, and you factor that out under a definition, you still have to invoke the definition multiple times. I.e. you are still repeating yourself, just using an abbreviation.
What you are doing is called "compression". Classic data compression algorithms like LZ77 work by abbreviating.
"LZ77 algorithms achieve compression by replacing repeated occurrences of data with references to a single copy of that data existing earlier in the uncompressed data stream. " - Wikipedia
Outside of alcoholic drinks, that's the ultimate DRY.
Thus, the argument against DRY is obvious: it's a form of compression, and excessive compression destroys readability; otherwise we would all be able to read source code that has been put through LZ77.
Only mild compression improves readability. Mild compression improves readability largely because it's easier to see that two brief invocations of a definition are exactly the same, than to see that two repetitions of a code block are exactly the same. When we see that two code blocks are exactly the same, we don't have to understand them separately.
Basically, brainless repetition and verbosity hinder readability, as does dense, thorough compression. One extreme might be represented by reams of Java boilerplate; the other by IOCCC entries.
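The analogy can be made literal with the standard library (zlib implements DEFLATE, which uses LZ77): repeated blocks shrink dramatically, just as extracting a repeated block of code does, and the compressed output is every bit as unreadable as the argument predicts.

import zlib

block = b"connect; authenticate; fetch rows; close. "
repeated = block * 50                 # heavy "code duplication"

compressed = zlib.compress(repeated)  # references replace repetitions
print(len(repeated), "->", len(compressed))  # e.g. 2100 -> a few dozen bytes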
I recently had an epiphany while reading the venerable SICP book. The best abstractions are not about wrapping everything in functions and calling it DRY. You create layers.
If you have 2-3 problems that kinda look similar, you could wrap them up in a function, with some arguments for variation, and pat yourself on the back for DRYing it up. But after some time you get a 4th or a 5th occurrence of the same problem, all slightly different.
Obviously you could create a mess of things by abstracting away what are different problems, but I don’t think “just don’t do abstractions until it’s really obvious” is a solution; to me it sounds more like giving up. And worse, you can end up with piles of code that look kinda similar but with small differences here and there, which are really hard for newcomers to parse and reason about.
What I think is a better approach is to try to come up with abstraction “layers” - smaller building blocks that would make the same problem trivial to solve.
You got your 2-3 pieces of code that look kinda similar. OK, do they have smaller parts that are dead-obviously similar? Abstract away those; now you are left with parts you might be happy to leave as is, and you don’t need one big function for all of it.
And sure, this has pitfalls too, but a good way to test any abstraction is to see how leaky it is. Can I successfully understand and reason about what’s going on by just reading the function names / descriptions / types? Does it compose? Yes? Awesome, now you have a layer you can build your business logic on. No? Well, either change it or get rid of the abstraction until you can come up with a better one.
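A compressed illustration of that layering idea (the domain is invented): the obviously-shared parts become small named blocks, and each caller composes them, keeping its genuinely different logic inline.

# Layer 1: small blocks whose similarity is beyond doubt.
def parse_record(line: str) -> dict:
    fields = line.strip().split(",")
    return {"id": fields[0], "amount": float(fields[1])}

def valid_record(rec: dict) -> bool:
    return rec["amount"] >= 0

# Layer 2: each caller composes the blocks; no flags threaded through
# one shared mega-function.
def monthly_total(lines: list) -> float:
    records = [parse_record(l) for l in lines]
    return sum(r["amount"] for r in records if valid_record(r))

def invalid_ids(lines: list) -> list:
    records = [parse_record(l) for l in lines]
    return [r["id"] for r in records if not valid_record(r)]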
I checked out the author: he seems to be an expert and has written some books. But this article is full of rookie mistakes, and I can't figure out the purpose of writing it except clickbait.
> The main problem with repeating a code chunk is that if a bug is found, there is more than one place where it needs to be fixed.
No, the main problem is that reading repeated code adds cognitive load for the reader. Also, when you copy-paste the code you should copy-paste its tests too, and that leads to wasted CI/CD resources.
> The alternative to copy-pasting the code is usually to put it in a function (or procedure, or a subroutine, depending on the language), and call it. This means that when reading through the original caller, it is less clear what the code does.
If the function and local variables all have clear names, and the function does not mutate the variables passed to it (or, if it does, that is clearly implied by its name), you never need to step into it while debugging, and you save a bit of time while reading the code.
So, the takeaway here is: the code is written for humans, not computers. The idea of clean code is ease of reading and understanding. That's what DRY is for.
DRY is just another way of saying the best practice of "don't reinvent the wheel". The difference is that reinventing the wheel is applied to third-party packages and DRY to your own code. The friction from blindly applying both is the same: dependency. This is especially apparent in JavaScript, where literally thousands of dependent libraries can be pulled in and any two version incompatibilities can break the build. Pulling in any dependency usually means pulling in orders of magnitude more lines of code than will be used. The inert code, relative to one's use, can be subject to version incompatibilities and security patches no different than any other code. It can also be subject to build differences, such as numpy and pandas being non-native Python while native Python libraries exist for many limited use cases. Of course, none of this is an endorsement of cut/copy/paste development. However, it is an indictment of cut/copy/paste enforcement of DRY and of the more general "don't reinvent the wheel".
Want to see what happens when someone falls too much in love with DRY? He invents an "automatic refactoring system" that automatically DRY-ifies a whole program: http://strlen.com/restructor/ (not that great an idea in hindsight, as the page explains).
Interesting to see all the comments here that make it seem obvious that "fanatic DRY" is bad. Historically, there was a wave of OO love where a lot of people went crazy with wrapping every program element in deep layered class hierarchies. DRY (and OAOO, YAGNI etc) was maybe a backlash to that, and some go too far with it. Now we all seem in love with "leave the redundancy in the code", which is another backlash. What's the next backlash? Maybe we'll finally focus on how to balance these factors rather than seeing some extreme as the silver bullet?
I agree strongly with the main thrust of this article, but it doesn't support the "Loss of Locality" prong of the argument very well.
> when reading through the original caller, it is less clear what the code does
In my experience it is usually clearer, if the function has an even remotely appropriate name. And unless you are coding with notepad.exe, toggling between caller and callee is pretty trivial.
> When you move this code to a function as a part of a straightforward DRY refactoring, this means that now a function is mutating its parameters.
Whether that matters depends a lot on the language.
And anyway it seems like the main problem in code like that is that the function is inappropriately mutating state outside of itself in a surprising way, which might be a problem separate from dryness.
"Loss of Locality" does strike me as reasonable argument against code that is too dehydrated though. Maybe somebody else can make some better supporting claims for it?
Well, yes, it's true that there's a trade-off, but the "DRY principle" is justified by the other side of that trade-off. And that side becomes stronger with every repetition of your code.
Also, some of the negatives of not-repeating-yourself are not general:
* Often your repetitions themselves are localized.
* Sometimes, you can generalize your code even _more_, to factor out the non-general parts, so that you end up with a generalization of the scheme/mechanism, applied with specific details.
* You can limit your DRY to within your scope of ownership, i.e. have module-local / unit-local utility constructs. Then, only if you agree with others on the generalization, do you surrender ownership.
It helps a lot to only extract common code that has a clear purpose. If I find myself naming a method "doThisThingThisOtherThingAndLogSomeStuff" then it's not a good refactor. On the other hand, "markStaleRows" or "enforceTreeInvariants" have clear purposes that help future engineers make intelligent decisions about how to use these functions and how they should evolve. I want to build in to the common code answers to the questions "should I call markStaleRows or roll my own code?" and "should I make this code change in the calling function or in markStaleRows?".
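As a sketch, a function like that might look as follows (the body and the staleness rule are invented; only the shape of the contract matters):

from datetime import datetime, timedelta, timezone

def mark_stale_rows(rows: list, max_age_days: int = 30) -> list:
    """Flag rows older than the retention window.

    This is the only place the staleness rule lives, which answers
    "should I roll my own?" with a clear no.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    for row in rows:
        row["stale"] = row["updated_at"] < cutoff
    return rows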
And if you have a function that is called markStaleRows, everyone better use it, because when I need to change how stale rows are marked, I will change this function and nothing else. I won’t go through the whole codebase trying to find if someone might have written some inline code somewhere else trying to do the same thing.
There are always trade-offs in development, and the older you get, the more you value clarity of code over other potential concerns; locality of behaviour is a big part of that.
I can't find the original source (it was somewhere on medium) but this graph struck me:
DRY and SRP are the air I breathe, so I need to defend it.
"This means that when reading through the original caller, it is less clear what the code does."
This is false - if you have a clear service and method name, with clear, properly-typed params, it SAVES time, because unless it's broken, it's a SHORT invocation that abstracts the details.
"the function might have some surprising semantics. For example, mutating contents of local variables is sensible in code."
Pure functions are the only "apply everywhere" part of the functional programming paradigm. Side effects are just as painful in .NET and Java as they are elsewhere.
"Overgeneralized code"
SRP lets you identify this fairly easily. As an example, I just tried to abstract away "Edit" and "SuggestEdits" in the same controller to stay DRY, but then realized it violates SRP.
"Each modification of the "common" function now requires testing all of its callers. In some situations, this can be subtly non-trivial."
Much less painful than knowing you have to change it in multiple places AND THEN checking the code. Just doesn't apply.
"When each of those code segments were repeated, ownership and responsibility were trivial. Whoever owned the surrounding code also owned the repeated segment."
The team owns the code. At worst, the lead owns the code. Bad excuse.
Feels like a dramatic contrarian piece to me. As mentioned, SRP and DRY are the only two things that allow our team to keep churning without significant tech debt. That, and avoidance of magic strings.
These 3 things are probably 80% of clean code.
Edit: Oh, and keeping methods to 5 lines or less, except in genuinely extreme, rare cases. Of course that's theoretically under "SRP and abstraction", but you know..
Edit 2: The single source of truth comment is on point as well. There was a guiding rule when Python was designed: "There should be one -- and preferably only one -- obvious way to do it." This is extremely important to writing good code and on-boarding people onto your project.
> Oh, and keeping methods to 5 lines or less, except in genuinely extreme, rare cases.
This confuses me. Extreme, rare cases? Don't get me wrong, I code Lisp as much as anyone and love keeping my functions short and simple, but not going above 5 lines (esp. in Java or .NET) is masochistic.
By that metric, an if in Java takes up 60% of a function's real-estate, a local variable 20%, and a try-catch is your maximum complexity.
This way, the code is extremely readable, especially if you can easily "Go to definition" in a service, and the file this try-catch lives in isn't 200 lines long.
Also, try-catch is generally an anti-pattern. It should only occur when you genuinely have no way whatsoever to prevent the exception; as such, it will be pretty rare.
Again, it does happen, for example we have an unstable legacy db we hit that we don't have access to, but it should be fixed asap.
Tbh, this is a constant argument between our juniors and me. They want to write 15-line methods, but I have found that they don't want to debug them.
It's constantly them building quickly on top of my code and then me having to fix theirs because they get frustrated debugging their own code.
Whatever is inside the catch block is by definition the exception handler, so I find the extra function definition to not bring much value. It also doesn't annoy me too much since I can just go to definition, but if it's only 3-4 lines long I would rather have it in the block itself so I can read it at a glance.
> try-catch is generally an anti-pattern.
Exceptions are a very useful part of the language if you use them right (to divert code flow in predictable ways in unpredictable scenarios), although I agree it can be easy to go wrong.
Regarding your juniors, that seems more of a problem of skill or determination than anything. Debugging in Java is relaxing: it has stack traces, a debugger, a nice IDE, no segfaults... a 15-line function should not be an obstacle. I would even prefer to have more of the code on the screen sometimes so I can get the implementation details in my head.
DRY is yet another tool. Skilled hands will be able to judge when to use it and when to keep a little duplication to preserve simplicity and legibility.
I frequently see developers going to great lengths to avoid duplicating a couple of lines of code. In the process, they create much more new code, use half a dozen patterns, and make the code more difficult to understand. That's not the ultimate goal.
The ultimate goal is always for the code to produce more value, and this is usually best served by simpler and more maintainable code.
In Data Engineering I only use DRY when it absolutely makes sense. Code connecting to a database tends to be boilerplate, so it can be abstracted away. However, I never abstract away ETL transformations, even if the code is duplicated. ETL logic often starts out very repetitive in its early days, only to be customised further and further over the lifetime of the code. Business stakeholders each tend to ask for modifications, e.g. can you please ignore sales in Australia for now, as it is handled by a different team, etc...
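A made-up example of why that duplication earns its keep: two transformations that started out identical and then diverged on exactly such a stakeholder request.

def transform_sales_emea(rows: list) -> list:
    return [r for r in rows if r["amount"] > 0]

def transform_sales_apac(rows: list) -> list:
    # Diverged after a stakeholder request: Australia is handled by a
    # different team for now, so it is excluded here only.
    return [r for r in rows if r["amount"] > 0 and r["country"] != "AU"]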
I'm known as a strong proponent for keeping code DRY (see: https://www.youtube.com/watch?v=S4LbUv5FsGQ), but even I sometimes advocate just copying and modifying code. Sometimes two simple pieces of code are less work/maintenance/mental-burden than one overly generalized/parameterized/complicated piece of code. As the number of copies grows, then the balance of the trade off shifts.
You definitely shouldn't strive for DRY in everything. The tricky part is to understand when duplicated code can be abstracted away and when it is duplicated "by chance", i.e. the code is the same but there is no real underlying abstraction.
The latter tends to happen with business logic: suddenly requirements change and your beautiful "abstraction" collapses like a house of cards.
I discuss this in my book "Street Coder" for beginner and mid-level programmers, in a section titled "Do Repeat Yourself". Being hardcore about not repeating yourself might create unnecessary dependencies and complicate your code structure. You can even cause unrelated concerns to depend on each other, making the code harder to maintain.
I like the piece, but the author might want to update it for distributed repositories, where 'DRY' might mean you inherit a CVE from some other developer, and may also mean that you've got code in your system that nobody who works for you knows how it works or why it was written the way it was. Both situations are sub-optimal.
I’ve got one criticism: the over-generalisation remark shouldn’t really apply to well-designed DRY code. If two call sites share a function and one of them doesn’t even use most of the parameters, then they aren’t covering the same concern and should be separate functions.
[1] https://news.ycombinator.com/reply?id=22022603&goto=item%3Fi...