Someone else’s comment [1] that I saved from an older post, also about DRY:
I’ve usually heard this phenomenon called “incidental duplication,” and it’s something I find myself teaching junior engineers about quite often.
There are a lot of situations where 3-5 lines of many methods follow basically the same pattern, and it can be aggravating to look at. “Don’t repeat yourself!” Right?
So you try to extract that boilerplate into a method, and it’s fine until the very next change. Then you need to start passing options and configuration into your helper method... and before long your helper method is extremely difficult to reason about, because it’s actually handling a dozen cases that are superficially similar but full of important differences in the details.
I encourage my devs to follow a rule of thumb: don’t extract repetitive code right away; try to build the feature you’re working on with the duplication in place first. Let the code go through a few evolutions and waves of change. Then one of two things is likely to happen:
(1) you find that the code doesn’t look so repetitive anymore,
or, (2) you hit a bug where you needed to make the same change to the boilerplate in six places and you missed one.
In scenario 1, you can sigh and say “yeah it turned out to be incidental duplication, it’s not bothering me anymore.” In scenario 2, it’s probably time for a careful refactoring to pull out the bits that have proven to be identical (and, importantly, must be identical across all of the instances of the code).
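To make this concrete, here is a made-up sketch (domain and names invented) of the kind of incidental duplication I mean: both functions share the same validate-then-sum shape today, but the refund logic is about to diverge, so extracting a shared helper now would just grow a config-flag monster later.

    def monthly_invoice_total(line_items):
        # Same three-line shape as the refund version, for now.
        if not line_items:
            raise ValueError("no line items")
        total = sum(item["price"] * item["quantity"] for item in line_items)
        return round(total, 2)

    def monthly_refund_total(line_items):
        if not line_items:
            raise ValueError("no line items")
        total = sum(item["price"] * item["quantity"] for item in line_items)
        # Soon: exclude non-refundable items, restocking fees, etc.
        return round(total, 2)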
Rule of thumb: If you can't give the thing a name then maybe don't extract it. What you extract becomes an abstraction. No abstraction is better than a bad abstraction.
My friend told me at their company they'd commonly convene the "variable naming committee" for such occasions, and I can't help but think of it every time I find myself in the same place.
A variable naming committee might seem exaggerated, but I've seen far too many variable/method/class names already that are wrong, misleading or at best misspelled, so some more thoughtfulness is definitely warranted...
Unfortunately, a lot of people who believe themselves to be "thoughtful" have produced a lot of 20+ character names, which still need other names compounded on top, when a 10 character name would have explained things just fine.
The developers I met generally weren't that great at narrating themselves, regardless of seniority. Narrative skills are woefully undervalued, and they aren't solved by a set of hard, scientific rules just yet. I pray I'm a unique example in experiencing this, but I doubt it.
The bigger tragedy is the illogical need for programmers to come up with "elegant" names. A 20 character name doesn't do any damage if it communicates the correct point. Neither does a 10 character name that also communicates the same point.
Why does a developer favor the 10 character name over the 20 character name when both do the exact same thing? Is the goal to save memory? What is the point? There is no point.
It is a subconscious bias that makes programmers want to give things elegant names over clear names. There is no harm in creating a 40 character name that is ugly.
def find_xy_coordinate_of_dogs_cats_and_baboons_in_picture(picture: Picture) -> List[XYCoordinates]:
# there is NOTHING wrong with this function name.
It baffles me to no end why humans have a tendency to turn the above for no clear reason into:
def imgrecFindAnimal(p: Pict) -> List[vectxy]:
Beauty and elegance in code belong in structure, not naming. Clarity belongs in naming, not structure (Golang is the antithesis of this). When the two are unified perfectly you get elegant code that does not sacrifice clarity.
A really good example of this is a function that encapsulates a complex regular expression. That regex is all but unreadable, but you can embed an entire comment/description into the function name. Seriously, write a grammatically correct sentence and make it a function name; there is no reason why this is bad... was there a more elegant name that you could have come up with??? Who cares. No harm done with your huge name other than burning the eyes of your inner OCD.
Except of course if you don't have auto complete. Then I can see how it's annoying for you to type out a whole sentence when you just want to call a function.
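For instance (a throwaway sketch; the regex and the name are invented):

    import re

    # The regex is all but write-only; the sentence-length name carries the
    # documentation.
    def string_looks_like_a_us_phone_number_with_optional_area_code(text: str) -> bool:
        return re.fullmatch(r"(\(\d{3}\)\s?|\d{3}[-.\s]?)?\d{3}[-.\s]?\d{4}", text) is not None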
> A 20 character name doesn't do any damage if it communicates the correct point.
That depends on the context. If the function is the highest level task called infrequently (as in your example), then long highly descriptive names are completely fine.
If this occurs at every level, all the way down to the building blocks it absolutely, severely affects legibility of the entire code base - it is literally a multiplier of code size, in the worst case of "all the way down" it's some kind of power function.
This barely fits on a line:
(a leftmultipliedbyright (a leftsubtractedbyright 1)) leftdividedbyright 2
Yet it's a simple polynomial that should just be a(a-1)/2. It's probably part of a larger expression, so now the other parts will end up on other lines (because no one writes 500-character-wide code). The effect is code spread artificially thin, which destroys locality and legibility.
You would be right to point out that my example is extreme and absurd; however, operators are functions, they just use different, implicit syntax. Many intrinsically complex pieces of code must create their own domain-specific building blocks at a slightly higher level of abstraction, much like operators, and this is the place for extremely short function names (think vector libraries). For such a commonly used building block it is unreasonable to expect each reference to fully and explicitly express the function's purpose.
As with all of these types of things, there is a balance. I am arguing _for_ balance, not suggesting all names should be single letters or single words, but that those have their place. However, in my experience very long names are far more commonly due to thoughtlessness: they include excessive redundant context and, at worst, even grammatical filler words.
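To put that in runnable form (helper names invented to match the pseudo-syntax above):

    def left_multiplied_by_right(left, right):
        return left * right

    def left_subtracted_by_right(left, right):
        return left - right

    def left_divided_by_right(left, right):
        return left / right

    a = 10
    # The fully spelled-out rendering of the simple polynomial a(a-1)/2:
    verbose = left_divided_by_right(left_multiplied_by_right(a, left_subtracted_by_right(a, 1)), 2)
    concise = a * (a - 1) / 2
    assert verbose == concise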
Bit late to the party but I really enjoyed this comment and the subsequent conversation. I was given some advice when I was junior and I've been repeating it for years.
Code is read 100 times more than it is written/edited. Writing in a high-level language is writing for a human first and a computer second.
As a general rule of thumb a variable name should be as big as the scope of that variable.
* If its scope is one line, it's okay to use a single letter.
deletedDocuments = documents.filter(d => d.deleted)
* If it's within a block normally one or two words will be fine.
* If it's one file, 3 or 4 words.
* If it's global it should read like the opening paragraph to War and Peace.
The last two are generally indications that something has gone wrong with how you are encapsulating your code and you should consider a refactor. However you will often have no other option in which case always lean towards more descriptive not less.
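A quick sketch of that rule of thumb (names and values invented):

    documents = [{"id": 1, "deleted": True}, {"id": 2, "deleted": False}]

    # Scope of one line: a single letter is fine.
    deleted_documents = [d for d in documents if d["deleted"]]

    # Scope of a block: one or two words.
    for doc in documents:
        print(doc["id"])

    # Scope of a file: three or four words.
    maximum_documents_per_page = 50

    # Global scope: spell the whole thing out.
    DEFAULT_ARCHIVED_DOCUMENT_RETENTION_DAYS = 365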
I think it may boil down to highly respecting the readers' time. If something conveys the same information to them but is shorter, it takes up less of the precious, limited time of their lives. Notably, this then becomes a subtle balancing act of estimating their knowledge and intelligence: make it too short and some implied context may be lost, requiring extra research effort from them. An extreme example is scientific papers: the same paper can be clear and concise for you if you are an expert in the domain (the usually assumed audience), or an overwhelming effort if you're not.
For programming, specifically, though, I feel the typical style used in programming straddles the line where the brevity hits a point of obscurity that actually leads to more time spent trying to decipher meaning.
I would say the time that is lost to deciphering meaning is much, much more detrimental than the time lost to parsing over-verbose words, by a very large margin. Thus it's better to err on the side of longer names in programming, until the verbosity equals that of the English language. I mean, nobody complains about the English language being way too verbose, so why not bring programming up to the same level of clarity and verbosity?
That's an interesting argument. Personally, I don't think I agree, i.e. I feel very differently (though e.g. typical Haskell is an example of being too dense for me too). But I can't currently capture the feeling in more concrete words. That said, if we're both now speaking about what we feel, did I manage to at least successfully counter your argument about this being illogical? ;)
One question came to my mind that I'm curious to hear your take on, from the point of view you present: what's your opinion on notations such as numbers (i.e. 123 vs. the English "one hundred twenty-three") and chemical formulas (e.g. H2O vs. the English "molecule of water")?
>But I can't currently capture the feeling in more concrete words.
I mean, if you want something more quantitative: count the number of posts in this thread that were communicated with a programming language as the primary mode of information transfer versus the number of posts that used English instead.
Because the usage of verbose and wordy English exceeds the usage of code, one can conclude that people prefer the general wordy nature of English over the conciseness of code.
Due to this, it makes sense to make your code as close to the verbosity of English as possible. I can read a novel or magazine almost passively; the same cannot be said of code.
>One question came to my mind that I'm curious to hear your take on, from the point of view you present: what's your opinion on notations such as numbers (i.e. 123 vs. the English "one hundred twenty-three") and chemical formulas (e.g. H2O vs. the English "molecule of water")?
Whatever makes sense. 123 and "one hundred and twenty three" basically communicate the same thing. Humans as a whole prefer 123. There's no lapse in communication using either method. "One hundred and twenty three" takes a bit longer to read, but no crime was committed. 123 doesn't lack any clarity for a typical human being either.
Perhaps I would prefer 123, as Arabic numerals are more universal globally than English.
For chemical formulas, H2O is used for balancing equations. It's specific notation for an algebra that uses symbols to derive exact conclusions; it is less a form of communication and more a system of symbols used for solving problems.
H2O, specifically, however, has culturally entered English nomenclature as a well-known concept, so it can be used in naming. There is, however, a slight potential for confusion, so ultimate clarity makes more sense to me here. There is zero ambiguity with "molecule of water" or even "H2O molecule", and thus my preference is to use English if the goal is communication.
The thing that you have trouble putting into words here is that H2O is an elegant symbol that communicates the exact same concept as the uglier "molecule of water." Humans are instinctively reacting to an aesthetic issue, not a practical one. Again, no crime is committed when someone uses the variable name "molecule_of_water" in a programming language over "H2O."
I didn't go into it in other arguments but I will get into it here because you've noted to me that you have identified this "feeling" and you acknowledge the contradiction between English and coding.
The reason why the contradiction exists is purely an aesthetic issue. It's a response to what we consider categorically to be "ugly" and "beautiful" and has nothing to do with practicality. When viewed this way the contradiction between English and programming languages makes sense:
We define bad grammar in English as "ugly", but we also have a separate aesthetic sense for poetically naming things. This goes beyond just programming. For example, humans generally find "the White House" to be a better and more poetic name than "the place where the president of the United States resides."
There are actually two separate modules in your brain that are at odds here. Bad English grammar is clearly triggering the language module in your brain, but at the same time your brain has a poetic naming module that prefers "the White House" over "the president's place of residence", and this module is being triggered when you program or write poetry. You can only begin to see this when I point out the logical contradiction.
Both of these modules in your brain are bypassing the neocortex, where people conduct higher-order logic. It's actually very hard to realize this if it's not pointed out, as people often mistake these feelings for something that arose from their own internal higher-order logic. The whole point of my writing is to take all of these modules in your brain and place them at odds so you can identify the origin of each and give your neocortex executive control. Anyway, here are the three modules that are getting triggered:
- Bad English grammar triggers the language module in your brain.
- Verbose naming in code triggers the poetry module in your brain.
- The logical contradiction between the two modules above when identified through meta analysis triggers your neocortex.
Now that you know, you can step above it all. You can override your instinctual emotions and use your higher-order logic to come to the correct conclusion.
To go off on a bit of a tangent here: there is in fact a morality module in your brain as well! What most people consider to be good and evil is actually instinctual emotion triggered by this module! Again, similar to the main topic, most people don't realize this and believe their morality is built around logic, when in actuality people are all just building a logical scaffold to justify a pre-existing instinct. Think about it... same with the whole poetic naming instinct we have: morality actually starts out as a feeling before we begin to logically justify it.
Believe it or not, using the exact same method of pointing out logical contradictions I can actually prove to you that morality is in fact an instinctive module within your brain. It's off topic, though, and I've digressed too much, so I won't get into that here.
No there's a difference. :) I made a point AND proved my point by showing a contradiction in human logic.
While I agree that the length of my argument made you not actually read it at all and miss the entire point, proving a point with illustrative examples does necessitate such verbosity.
What you're doing is stating a point without proving it and claiming that my own argument is self defeating with no explanation.
Why don't you try proving your point and also countering my proof while being concise at the same time? Because right now you've just stated a point with nothing behind it. You're stating in a single sentence that the world is flat after I proved it's round. OK... so what? Prove it.
They proved their point already. You took much longer and much fancier wording to say what could be said in a few lines, without losing any meaning. Verbosity for the sake of verbosity.
This is what I mean. Using long names compounds not only on itself; it compounds onto the entire codebase. Unless you black-box the code, make it 100% bug-free, and ensure it doesn't require changes for future features, that code will be read. Reading takes a lot of time, but worse: it takes far longer for someone to process a much larger cognitive load. This is especially dangerous in huge codebases that need to be changed on a regular basis, a danger usually waved away with "it takes time to get in the swing of things".
We have better things to do in life. Respect the person who will read the code. Be concise.
In all my arguments I'm saying that this logic you present doesn't apply to English. Nothing was proven because my points weren't addressed.
You're not respecting my time with your grammatically correct comment above. You can shorten your comment by mangling the grammar and preserving the meaning.
>We have better things to do in life. Respect the person who will read the code. Be concise
See that sentence? It's wasting my precious time. You can get rid of a lot of unnecessary info and preserve meaning.
>We have better thing do. Respect reader. Be concise.
There. Same point, more concise, but now you sound as if you have brain damage. My point is we use programming languages and English to communicate a point, but clearly in English nobody takes any effort to respect anyone's time. It's full of wordy, unnecessary stuff, and the entire population of English speakers actually prefers reading this very verbose English to reading obscure code.
I am saying because of this contradiction all your logic flies out the door.
Bring the level of verbosity of code to the level of verbosity in English. We don't complain about English; we actually prefer it over code. So clearly nobody actually cares about 'saving' those precious seconds spent reading long, grammatically correct sentences. Who cares if someone uses one as a function name.
You spent so many words, yet you still side-step the main argument, and then further ridicule my argument by taking it out of context.
These are different contexts. I do not wish to read a 300-page manual when it can be described in 2 pages, similarly to not wanting to scan 10 pages' worth of code in a codebase of hundreds of thousands of lines that I may be expected to look at. I require both enough energy and insight to solve the problem after reading.
This is a discussion forum. We write differently here. Information sharing is not our prime objective, unlike writing code.
>I am saying because of this contradiction all your logic flies out the door.
Contradiction solved. Now, will you answer or continue to side-step?
> You spent so many words, yet you still side-step the main argument, and then further ridicule my argument by taking it out of context.
Nothing is being ridiculed here nor taken out of context. This is simply a misunderstanding by you.
As for the side-stepping... it's a matter of perspective. From my perspective you are the one side-stepping, because you didn't even bring up the contradiction. So I'll regurgitate your words right back: you were side-stepping the contradiction. Thank you for finally addressing the issue.
>These are different contexts. I do not wish to read a 300-page manual when it can be described in 2 pages, similarly to not wanting to scan 10 pages' worth of code in a codebase of hundreds of thousands of lines that I may be expected to look at. I require both enough energy and insight to solve the problem after reading.
Let's frame the context here so that we both agree. The context is to communicate a concept to a reader WITHOUT being verbose. Code and English both live within this context because you use code to communicate to other programmers and you're using English to communicate to me Right Now.
We can also agree that BOTH code and English can be over-verbose.
Context Established.
>This is a discussion forum. We write differently here. Information sharing is not our prime objective, unlike writing code.
Information sharing is the prime objective of all written and verbal forms of communication. You need to understand this. There is zero point in writing anything if it is not communicating a point. Any form of communication is a form of information sharing, and English, being a medium of communication, is therefore a medium for information sharing.
Have you ever heard of "documentation"? Documentation communicates the EXACT same information that code does, but better, because it's in English and more verbose. That's why documentation often exists alongside code. One can derive code from documentation and documentation from code.
>Contradiction solved. Now, will you answer or continue to side-step?
Contradiction not solved. You still need to address it. English and programming occupy the same context, with English actually being used within programming. Think about it: programming is basically a shitty version of English that's only used because a computer understands it. Nobody would be programming otherwise. If you could program a computer efficiently using English, I guarantee you traditional programming languages would be thrown out the door within a day; nobody would use them anymore.
So the context that programming occupies is twofold. It occupies the same context as English, to communicate with other people, and at the same time it also has to communicate with a computer. That is the prime difference. So why do we have to reduce naming to some poetry contest when you can write a fully grammatically correct and clear sentence as a function name and call it a day? We don't do it in English, so why can't we stop doing it in programming? The contradiction is still there and still invalidates all your points.
There is no logical reason why we shouldn't bring programming up to the same level of clarity and verbosity as English. The only thing stopping us is the technical limitations of the computer, so we should get as close as possible with what we currently have.
It is redundant. It doesn't need the "xy_coordinate" because that is the return type. Furthermore it is wrong, it should be "xy_coordinates", with an "s". Or it shouldn't return a list.
The "in_picture" part is also redundant, it takes a Picture argument so why mention it?
Note: these arguments depend on whether your language supports polymorphism or not. In C, for instance, you often have no choice.
The "dogs_cats_and_baboons" part is fine as long as it really is what you are looking for. If your intent is to find any animal and you implementation only finds dogs cats and baboons now, then you should call it "animals" with maybe a comment clarifying that point.
The problem with long function names is that they produce long lines. Long lines are terrible. Not as bad as they used to be, thanks to large, wide screens, but still, I hate having my editor window unnecessarily large or having a horizontal scroll bar.
For maximum readability you want function names which are descriptive, but concise. My personal pet hate is when people make it concise by using acronyms and I’m just left wondering what the hell it stands for.
Personally, I find the ‘in_picture’ suffix superfluous as it’s clear from the input parameter what you’re finding the animal in, but otherwise find it a good name.
> My personal pet hate is when people make it concise by using acronyms and I’m just left wondering what the hell it stands for.
I agree. I can find and understand reducePermissionLevel, but reducePermLvl is unguessable, and not searchable, because abbreviations are arbitrary. Never abbreviating provides a predictable scheme.
I like clear and concise naming, and generally I don't care too much about how long a name is if it helps you understand what the thing does without being extraneously verbose. However, I think there is an argument to be made about how long a single line of code should be before it becomes too hard to read. The example given would be too long for me and I would try to shorten it, in this case probably by introducing some abstraction ;)
You claim there is an argument. But you don't actually state your argument.
My claim is that you think there is an argument, but there really isn't. That function name can do no real damage to the clarity or structure of the code. You only wish to shorten it because of OCD.
The logic here is easily illustrated if I rewrote that function name in English:
If my goal was to communicate this to you:
A function that finds the x y coordinates of dogs cats and baboons in a picture.
This is perfectly OK, but only because it's English. Now suppose I tried to write the English as if it were a function name:
func xy_baboon_cat_dog_detector
The above doesn't fly in the English language, but it works in programming. This is contradictory logic.
The question is, if both programming and English are both mediums used for communication why do they both have contradictory styles?
The reason is because there is no reason behind it. It's the same reason why people in Japan still use fax machines. Habit and typical human irrationality.
When you peel away the layers of your bias you will realize that this contradiction exists because the level of verbosity of my function doesn't actually matter. It doesn't matter in English, and therefore it doesn't really matter in programming.
Seriously, didn't you find it strange that you made your point without even stating what your argument was? Typically if you had a clear reason you would give it, if you had examples you would show it, instead you just said an argument existed probably never realizing what that argument actually was.
It's not just you. The other commenter just reiterated some points without trying to prove any argument. English has a preferred communication style that is contradictory to the preferred programming communication style, even though both mediums can go back and forth between either style without any clear difference.
If you self-reflect about it, your desire to make my function concise arises more from a feeling, an "emotion." You did not logically deduce your point using evidence... rather, you just felt that it needed shortening and that it looked "ugly."
Then, when I questioned that logic, your mind, without realizing it, began building a logical scaffold around the feeling to support the desire with some rational framework. Such is human nature, and this type of thing happens with all kinds of strange human biases that we possess. Religion, no doubt, is a similar bias... when questioned, the religious person's brain will go through the exact same process that your brain did upon seeing that ugly, wordy function name.
The question is, by going meta and describing the situation in this way would that help you take a step up above that bias? Or will you continue to build that logical scaffold and try to justify your strange desire to make the function more concise for no reason?
Think about this before you reply... did you already honestly have an argument that justified your point? Or are you building one right now to respond to me?
This is literally as close as I can get to what I'm talking about. There's this strange bias that every human (including me) has when they first learn programming to write concise "elegant" names for no real purpose. It's so strong that sometimes a normal argument can't help the other party reach an epiphany. Hopefully by going meta I can help better illustrate what I'm referring to.
>A function is a "thing", you should rather compare it to an entity in the English language.
It's more a verb that does an action on a noun. You can't express the concept with just a word.
add one to number
>Why do we say washing machine instead of a home appliance powered by electricity used to wash laundry through the use of centrifugal force?
Obviously there's no need to describe the plumbing behind the machine just the purpose of the machine is good enough. Everyone understands what a washing machine does as it's culturally a part of our language. If I lived in a civilization without washing machines and I had to implement it as a variable name in code I would call it "clothes_washing_machine" because of added clarity and no actual harm done with the extra word.
>When naming things you need to balance descriptiveness with conciseness, a name is not a definition.
Sure and I'm saying most people are wrong about where this "balance" actually lies. People place too much emphasis on conciseness.
>As a rule of thumb, avoid specifying things that can be easily guessed, especially if they are right there in your function signature (!):
I don't like readers to do any guessing at all. It's wrong to assume that a guess that comes easily to me will come easily to my audience; such is the nature of documentation, and of documentation as code. I want people to read my code like they read English. But that's just my opinion.
I mean, what's the benefit of shortening this? I'm looking at this function and I'm feeling nothing happened. You just did extra work.
If I want to be nitpicky, XYCoordinates should not be plural; the List should contain many elements, each of a type called XYCoordinate. XYCoordinates is better used as an alias for the entire list.
Additionally my function only operates on cats, dogs and baboons. It does not operate on all animals. Someone can mistakenly use this function to try to find lions and tigers and bears Oh my!
But that's beside my point. You attempted to shorten my function name while preserving meaning, and I totally understand the point you were trying to convey. What I'm saying is that your changes are logically negligible. They do nothing to improve understanding of the program while trivially shortening things. Nothing practical occurred here. It's simply the scratching of an OCD itch for more poetic names.
NB: I'm answering to the claim that we behave differently when naming things in programming languages vs natural languages. Otherwise I think it's mostly a matter of preferences.
I understand you prefer to be more detailed in your naming, that's fine, but in natural languages your names would sound unusual/verbose as much as they do sound unusual/verbose in Python.
You say everybody understands what a "washing machine" is therefore a short name.
Are you saying that when washing machines were still a novelty they should have been called "clothes-washing machines" instead? Unusual naming right? People do seem to have a distaste for long and overly-detailed names in spoken languages as well, don't you think?
And what's the point of a dictionary if names embed a full definition that leaves nothing to be guessed?
> Sure and I'm saying most people are wrong about where this "balance" actually lies. People place too much emphasis on conciseness.
Where to draw the line can be a matter of preferences, no intention of debating that, but people do tend to draw the line the same way whether they speak English or Python. No incoherent behaviour there.
> NB: I'm answering to the claim that we behave differently when naming things in programming languages vs natural languages. Otherwise I think it's mostly a matter of preferences.
And I am saying this behavior is attributable to an irrational instinct. There is no practical logic to it, even though our instincts push us to behave this way.
>I understand you prefer to be more detailed in your naming, that's fine, but in natural languages your names would sound unusual/verbose as much as they do sound unusual/verbose in Python.
The purpose of an action is to serve a practical purpose. Something sounding unusual has nothing to do with whether the associated action was practical or impractical.
A name may sound unusual in Python and suddenly sound perfectly fine in English. How a name sounds has nothing to do with its actual practical significance.
If the name is informative then it is practical. Place that name in Python, place it in English. How you feel about the name and how you think it sounds is irrelevant to your purpose of practicality.
The practical goal here is maximum clarity with zero ambiguity.
Your instincts and feelings are lying to you. You are subconsciously reacting to a purely aesthetic attribute. A poetic and elegant name does not serve an actual purpose. Only an informative name serves an actual practical purpose of being informative.
We program to make things work, not to come up with function names that are poetic/brief/unreadable. An aesthetically pleasing name does not assist us in achieving the actual goal of our program, but an informative name does.
>You say everybody understands what a "washing machine" is therefore a short name.
I'm saying anyone in our culture who speaks English.
>Are you saying that when washing machines were still a novelty they should have been called "clothes-washing machines" instead? Unusual naming right? People do seem to have a distaste for long and overly-detailed names in spoken languages as well, don't you think?
I'm saying that in a hypothetical culture where we didn't have context on what a "washing machine" was, "clothes-washing-machine" would properly communicate the intent and meaning of what that machine actually does. I am able to throw away any preconceived biases I have and not assume that a machine that washes things only washes clothes. Keep in mind I prefixed my entire point with a hypothetical culture that didn't know about "washing machines"... you seem to have missed the fact that I did that.
>And what's the point of a dictionary if names embed a full definition that leaves nothing to be guessed?
There would be no point to a dictionary. But clearly the things we define in most functions aren't defined in the dictionary, so rather than make up names no one can understand, you can combine English words that everyone understands into sentences and phrases and use those to name your functions.
>Where to draw the line can be a matter of preferences, no intention of debating that, but people do tend to draw the line the same way whether they speak English or Python. No incoherent behavior there.
Everything is a matter of preferences. Even believing that 1+1=1 is a preference you can choose to hold.
I am saying that, in terms of the set of attributes people use to qualitatively describe whether or not something is practical, most people mistakenly believe that poetic and terse function names possess the very attributes they consider "practical." TL;DR: I am saying that once people understand my point, most people's preferences are in full alignment with mine.
I am saying that when you ignore your inner OCD, you will see that aesthetic/poetic/elegant naming serves zero practical purpose, and short, brief names have negligible practical advantage over long names.
Thus a slightly longer function name that is ugly but very, very informative and similar in grammar to the English language is the most practical and logical way to name your functions. It doesn't matter how "unusual/verbose" you feel it looks/sounds, as that feeling is orthogonal to the logical purpose of naming in programming: to communicate and inform.
See past your bias and ignore pointless aesthetics.
You're mostly arguing on why your naming preferences are better. You're missing the point, I'm not addressing that.
Instead you seem to agree people name things in the same (irrational according to you) way both in English and Python.
Which is exactly my point and what you previously claimed not to be the case.
Once again I am addressing this comment:
A function that finds the x y coordinates of dogs cats and baboons in a picture.
> This is perfectly OK, but only because it's English.
No, it's not ok only because it's English. It's ok because it's a definition, it is not a name.
In English as in Python we tend to prefer more concise names.
A function's name is...a name (duh), the comparison to prose makes no sense. Once you actually compare English and Python names you'll see they both tend to be more concise.
> See past your bias and ignore pointless aesthetics.
Ironically I find your style more poetic (we really have opposite tastes :P). But as you saw we both keep the same preferences independently of the language. No incoherence/bias there. That's the only point I'm making.
> You're mostly arguing on why your naming preferences are better. You're missing the point, I'm not addressing that.
No I am not arguing for that. I am saying your style is objectively LESS practical and harder to read. You missed the point repeatedly.
>Instead you seem to agree people name things in the same (irrational according to you) way both in English and Python.
When did I claim this? You seem to be misreading everything. Functions in "Python", or in programming in general, are named by most people by trying to find some word, or some hybrid mangling of the English language, that yields an elegant but less informative name. Similar to how poetry is a mangling of English grammar.
I am saying that it is more practical to NAME a function in programming with a longer phrase or sentence. Whether you "feel" that's ok or not is irrelevant to the actual practicality of that name. You didn't even read my post.
>No, it's not ok only because it's English. It's ok because it's a definition, it is not a name. In English as in Python we tend to prefer more concise names.
You're just getting semantics mixed up. We call a function's name a "name", but it could also be called a function "sentence" or "function phrase".
Here's a more clear way to get it through your head:
"Function phrases" are more practical and informative then "function names" aka "function abbreviations/poetic words"
I am saying that what you "prefer" or how you feel about the above statement is completely irrelevant to the fact that, logically, "function phrases" are more informative, at the cost of being slightly longer.
My claim is your preferences are totally irrational and you are missing the point. Read it more carefully.
>A function's name is...a name (duh), the comparison to prose makes no sense.
It makes no sense to you because your brain is limited by the words "function name." Because we call the naming of a function a "function name", your brain is unable to wrap itself around the fact that you can use a collection of words in the naming of a function. That's why I renamed "function name" to "function phrase": to help kick-start your brain into gear and get on my level.
>Once you actually compare English and Python names you'll see they both tend to be more concise.
Again, missing the point repeatedly. Stop letting the word "name" block your creativity. Whatever we call something in the universe (a title, a name, or whatever), I can choose to be concise or not concise, put an entire sentence into the title, or put an entire novel into the title. There are no actual rules on what we want to do and can do. The argument is simply about what is practically better to put into this "title" for programming.
This is the playing field we're operating in. You are letting your personal vocabulary and definition of "name" delude your thinking.
So I am saying I want to use longer phrases/sentences, or "function phrases", as the title of a function, and you are saying that you want to use shorter/briefer names and that this is a "preference." Then you say that my "preference" is not a "name."
My counter is that whatever you want to call "naming" is completely irrelevant; my "preference" is categorically, objectively, and logically more practical and informative than your "naming" style. It is better because it communicates more information, get it?
The thing that's missing here is that you haven't objectively told me why your style is better. It conveys less information and is thus logically worse. I'm betting that you have no reason. You just irrationally "feel" it's better to use short "function names" over "function phrases".
>Ironically I find your style more poetic (we really have opposite tastes :P).
This is where your thinking is cloudy. First off this has nothing to do with taste. I am literally saying your "preference" is objectively less informative and therefore worse.
The other part of your thinking that is cloudy is your misinterpretation of the word "poetic." Poetry is a mangling of English vocabulary and grammar that is more concise. My proposal is to move away from mangled English and grammatically incorrect "function names" and write "function phrases" that are grammatically closer to correct English. Your proposal is to make names shorter and more like poetry, as per the exact definition given above.
People who can't think straight tend to think anything goes in poetry and that either naming scheme (yours vs. mine) can be poetry as a matter of taste. Wrong. There are hard rules that separate poetry from written English. Again, my style is to make programming closer to grammatically correct English; your style is to create poetry, as per the definition of poetry above.
>But as you saw we both keep the same preferences independently of the language.
What same preferences? Our preferences are objectively different. And my preference for "function phrases" is objectively better and more informative than your "preference." We never reached an agreement; I don't know how your imagination is cooking this up.
>No incoherence/bias there. That's the only point I'm making.
Again, I am saying my "preference" is logically BETTER than your preference because it saves the reader time guessing context and is more informative.
Look man, stop. This is a typical argumentative strategy to turn hard facts into muddy "opinions" and "preferences." In your world nothing is better or worse and everything is just a preference and anything goes. This is weak.
I have a "preference" that is different from your preference and I am stating my preference is better then your preference and your response is that everyone can have their own opinion? Come on.
> We call a function's name a "name", but it could also be called a function "sentence" or "function phrase".
I mean...we can also say that pigs are birds if you wish, everything is possible XD
Who in English would ever name anything as "A function that finds the x y coordinates of dogs cats and baboons in a picture". Does that sound like a name to you?
Do you actually speak like this in your daily life? "Please can you turn on the device for remotely visualising entertainment shows, news and sport events?"
Ah, I forgot, names don't exist: a name is a phrase, a definition is a phrase, a question is a phrase, an assertion is a phrase... they can all be used interchangeably, they all serve the same purpose... sure.
I think at this point you're lost in your own sophism. Good luck getting out of it!
>I mean...we can also say that pigs are birds if you wish, everything is possible XD
That's right, we can. What you're failing to see here is that this naming is arbitrary. It truly is separate from structure.
>Who in English would ever name anything as "A function that finds the x y coordinates of dogs cats and baboons in a picture". Does that sound like a name to you?
That's why your thinking is limited. Whatever we call a function or a thing doesn't have to be limited by your definition of what a "name" is; the limit is placed by you, not reality.
>Do you actually speak like this in your daily life?
Can you get it through your head? You speak in sentences, do you not? You don't ONLY use names to describe things; you use sentences. So we CAN use a sentence to DESCRIBE a function definition. Just because we call this description a "function name" doesn't mean we should be limited by the concept of what you define as a "name".
>"Please can you turn on the device for remotely visualising entertainment shows, news and sport events?"
Fortunately, unlike when speaking, our editors assist us with auto complete.
If such a primitive existed in your program and you just called it "remote" and left the reader to guess what the hell it does by looking at context... you'd be a really bad coder.
Call it:
"controller_that_changes_television_channels"
And auto complete assists us with the length of the name. So really length isn't even that big of a factor here.
>Ah, I forgot, names don't exist: a name is a phrase, a definition is a phrase, a question is a phrase, an assertion is a phrase... they can all be used interchangeably, they all serve the same purpose... sure.
They actually do serve the same purpose. The purpose is communication. The problem with you is that you think the only forms of communication in programming are names and context. I am saying you can use English sentences as well. It's that simple.
>I think at this point you're lost in your own sophism. Good luck getting out of it!
My argument? This isn't my argument. I'm not smart enough to invent this concept.
You ever heard of a guy named Donald Knuth? The guy who basically turned programming and algorithms into a science? Wrote the books "The Art of Computer Programming"? Well he invented something called literate programming:
Take a look, because in literate programming people create "macros" and name those macros with entire essays or paragraphs. Knuth does not restrict the "naming" of macros to pathetic little snippets of poetic words. His mind does not restrict what you can "name" something, unlike your mind.
Don't let the word "macro" confuse your brain... macros are the same thing as functions, just a bit more advanced. The primary difference is that functions are resolved at run time, while macros are resolved before compile time.
This is my point.
You're not trying to debate my argument. You're debating an entire style of programming created by Donald Knuth.
I deliberately hid the official name of this technique because dropping the name Donald Knuth would just get people to agree based on his reputation rather than the reason and logic behind his thinking. Given that reason and logic don't work with you, I think a name drop is relevant here.
It's not my sophism. It's Donald Knuth. Good luck trying to resolve your sophism with a concept invented by Donald Knuth.
>You just said that a name, definition, question and assertion all serve the same purpose and can be used interchangeably...I mean what else can I say?
They can be used interchangeably similar to how people can choose to be wrong, right, stupid or smart. The ability to interchange techniques is irrelevant to my point. I am simply responding to a misguided concept stated by you.
A longer phrase can be used interchangeably with a shorter name. The longer phrase is superior to the shorter and, as a result, uninformative names that your misguided opinion favors. Donald Knuth agrees.
I am saying your way is categorically the wrong and worse way. You can't respond to this because you've got nothing left to say. You're tongue-tied.
>Exactly
Yeah you're exactly wrong.
>It was a nice exchange nonetheless but at this point it really looks like you hit a dead end.
Literally I have reams and reams of evidence, deconstructing each of your points and tearing apart every statement you made. You only have the capacity to respond to one teeny tiny snippet of what I wrote and your response is still misguided.
The argument is that at some point the length of the name is detrimental to readability, in the same way that a run-on sentence is detrimental to the readability of prose. I thought it was quite clear and your 16 paragraph response didn't touch on that at all. Or if you did, it was lost in the noise.
Oh, I thought it was obvious that you don't want to name a function after a 300-word paragraph. I didn't realize we needed to get into that kind of semantics. My argument is that a sentence-long function name of roughly 10 words is still a good function name. There's no hard rule here; judgement is still qualitative. The main point that I was arguing for is that this:
* 'xy_coordinate_of_' taken out because XY coordinates are already in the return type.
* '_in_picture' taken out because the information is already in the 'picture: Picture' parameter.
* The return type 'List[XYCoordinates]' changed to 'List[XYCoord]' because Coord is well known as an abbreviation for a coordinate, and having XY in front of it makes it completely obvious.
* Removed the 's' from the end of the return type because it is already contained in a list and shouldn't be pluralised. It would be pluralised if you were returning a list of lists of coordinates.
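Put together, the suggested rewrite comes out as something like:

    def find_dogs_cats_and_baboons(picture: Picture) -> List[XYCoord]: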
There are problems with having huge function names, especially if other programmers need to use the functions you write in their code. One is the amount of screen space needed to use your function; this bites when many functions with long names are used in a single expression, where long names can be harder to understand. There is a reason that mathematical operators in programming languages are usually one character, things like '+ * - /': it should be obvious what problems people writing in a language would have if the operators were replaced with names like 'multiply'. Now imagine mathematics written in a verbose English style. It is not done that way because of all the extra effort it would take to write, and all the extra effort it would take to understand.
While I do like me some good meta-reasoning, at least I have an argument for shorter function names:
Short names are read in O(1) while longer ones are read in O(n) (unless the name can be simplified to “the function with the long name”).
Then you need to shorten your comment! Not as verbose as my replies but it can definitely use some improvement on the big Oh! Try this out for size:
> me like metareason, me have arg for short func name: Short name O(1), long are O(n) (some name too long: “the function with the long name”)
That's where your logic breaks down. If you truly followed the O(n) argument in all your communication mediums outside of programming, you'd sound not so intelligent. Likely you built this argument as a response rather than actually following it in both your English and programming communication styles.
It's probably not worth continuing here, seeing that you're already pretty hostile.
But I definitely try to minimize the amount of reading necessary to understand a concept. Ideally, the concept should be such that it's easy to describe in few words (note "words", not "letters". Also, using many words doesn't make you sound smarter, often it's the opposite). Things don't live in a vacuum, there's always a context, and that's important to consider when communicating...
My point is that I think there should be a soft limit on how long a single line of code should be, because I feel it becomes too hard to read if lines get too long. That is an emotion, as you said, and it might be kinda arbitrary, but I don't think there is anything fundamentally wrong with it.
Sure there is a logical reason for why you don't want your function to be 4000 lines long. It gets unwieldy and becomes mechanically hard to manipulate or even understand the full meaning behind all 4000 lines. That's a logical argument. I completely agree with the soft limit.
There is no logical reasoning that can justify shortening this function yet the feeling is strong. That is the bias I am trying to illustrate. There is, in fact, nothing wrong with not shortening this function and keeping it as it is.
>This sounds like Hungarian notation taken to the extreme.
No it's sanity taken to the extreme.
Have you ever noticed that everything written to communicate with people in the United States outside of programming is written in a very verbose manner, using a language called English? It's used for technical manuals, textbooks, and stories.
Is English "hungarian notation taken to the extreme?" No dude. People actually find verbose English stuff easier to read. You don't have English writers abbreviating words and coming up with elegant acronyms in a physics text book.
>Why not write a comment if a paragraph is required to understand what you are doing.
I didn't say to name your function after a paragraph. A function's English-language analog is a word, or at most a sentence. A paragraph would be several functions chained together. If you name your functions well, composed procedures will read close to an actual English paragraph.
That being said, there's nothing wrong with comments; comment away, but don't call your function doXYZ and put the entire description in the comment. Your comment doesn't follow the function call around.
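For example, a rough sketch (the domain and the exact helpers here are invented):

    from dataclasses import dataclass
    from typing import List, Optional, Tuple

    @dataclass
    class Profile:
        name: str
        spouse_name: Optional[str] = None

    def get_list_of_profiles_from_file(path_to_file: str) -> List[Profile]:
        # One "name,spouse_name" record per line; the spouse may be blank.
        list_of_profiles = []
        with open(path_to_file) as profile_file:
            for line in profile_file:
                name, _, spouse_name = line.strip().partition(",")
                list_of_profiles.append(Profile(name, spouse_name or None))
        return list_of_profiles

    def get_profile_by_name(profiles: List[Profile], name: str) -> Optional[Profile]:
        return next((profile for profile in profiles if profile.name == name), None)

    def concatenate_profile_lists(first_list: List[Profile], second_list: List[Profile]) -> List[Profile]:
        return first_list + second_list

    def merge_married_profiles_into_list_of_pairs(profiles: List[Profile]) -> List[Tuple[Profile, Profile]]:
        list_of_pairs = []
        for profile in profiles:
            # The name comparison keeps each couple from appearing twice.
            if profile.spouse_name and profile.name < profile.spouse_name:
                spouse = get_profile_by_name(profiles, profile.spouse_name)
                if spouse is not None:
                    list_of_pairs.append((profile, spouse))
        return list_of_pairs

    all_profiles = concatenate_profile_lists(
        get_list_of_profiles_from_file("employees.csv"),
        get_list_of_profiles_from_file("contractors.csv"),
    )
    married_pairs = merge_married_profiles_into_list_of_pairs(all_profiles)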
Trust me, you may think your eyes are bleeding, but they are not. The above is actually closer to the English language than 90% of code out there. What you don't realize is that there wasn't a need for a single comment, and there wasn't a need to dive into any of these functions to read the definition. You just read the variable and function names and you know exactly what's going on. If you recompose these functions to do something else, it's like recomposing sentences and words in the English language. The end result is still readable without the need for new comments.
When you read a recipe or follow directions to build something, does the writer give you those directions in some coded nomenclature? No, the writer writes verbose English with clear grammar. The point of naming in these texts is clarity; it makes zero sense that we don't do the same in programming.
>Writing a paragraph kills any formatting in vi/vim.
You know this might be a really different chain of thought that goes against the grain...
But maybe this lack of formatting makes vi/vim an extremely bad editor for programming? Seriously, humans are weird (Japan, for example, still uses fax machines). If it's so bad, why do so many people use it? Maybe to look smart, or maybe for the same nonsensical irrationality that makes us come up with some unreadable but "elegant" name for every programming primitive while at the same time being extremely verbose in ALL other forms of written communication in English.
Programmers like to think they're smart and original. Most aren't... they follow the same tropes as every other programmer trying to come up with elegant names for no reason whatsoever and strangely unable to see the purposelessness behind this whole naming thing. If you can't come up with a good "name" for it, make the name an entire sentence, it's that simple... it's the reason why sentences exist.
"from_file", "by_name", etc. are fairly needless here. Most people are apt enough to grasp the first argument relates to the last word of the function name. Use properly named variables so a summary can take care of it, or only omit the noun, "get list of profiles from".
Get itself is a terrible prefix and could be omitted, especially if you keep "from": "list_of_profiles_from(X)" isn't any less clear. Alternatives like "read_list_of_profiles(file)" exist.
"List_of_profiles" can be changed to "profile list". You already did it with "concatenate_profile_lists", showing inconsistency. Alternatively, depending on the context, "list" itself can be omitted entirely, and just state "profiles".
"merge_married_profiles_into_list_of_pairs" can arguably be shorted to "merge_married_profiles" depending on IDE and language: that last bit should be clear from the returned value. Even without, it can still be shortened to "pair_married_profiles", as the context should make it obvious if we look for more than 1 married couple, there will be some kind of collection. Additionally, your naming has one problem: "merge_married_profiles". With all the verbose naming, it is still not clear what "married profile" is. I'll assume it means "pair profiles of married couples", where you might as well say "pair_profiles_of_married_couples".
>Programmers like to think they're smart and original.
It is because they think they are smart and original that they fall into the trap of overly verbose naming, not from a lack of it. Have you looked at a legacy Java codebase? Many of them can be slashed in half just by renaming variables to something that keeps the meaning or draws meaning from the context very obviously. These guys are going against their own mind's natural ability to read context, or worse, have conditioned themselves into learned helplessness.
You mean "being more conciseness is better if you don't have to sacrifice clarity for it"? Or just - "conciseness is a good thing"?
How would you rewrite that example, specifically, to make it more concise without sacrificing clarity, then? Do you mean to change the names without omitting type or relational information somehow? Or to omit some variables entirely in favor of nesting function calls?
If the latter, I don't see the relevance to that commenter's actual point about short vs long names; reducing the number of names is entirely tangential.
I would omit most information that can be inferred from context so long as context locality is good.
Single-letter names are OK for context smaller than 1 line (like lambdas), type information can be omitted for context up to 10 lines (lists etc) and generic functions (filter, concat, inner_join) are better over custom domain functions when the audience consists of other programmers.
Abbreviations are best avoided unless they are extremely common and familiar in the specific domain / industry. If they are only ever used within that team and company, that's probably bad.
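A quick illustration of those rules (data invented):

    # Context of one line: single letters inside the lambda are fine.
    evens = list(filter(lambda n: n % 2 == 0, range(20)))

    # Context of a few lines: short names, no type noise, generic operations.
    orders = [("alice", 30), ("bob", 15), ("alice", 5)]
    totals = {}
    for customer, amount in orders:
        totals[customer] = totals.get(customer, 0) + amount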
>I would omit most information that can be inferred from context so long as context locality is good.
context also changes with time as other people edit your code. Your single variable name with simple context can balloon in complexity and move around. Don't assume locality is fixed.
Additionally, relying on locality wastes precious time. It is preferable to read a variable name and not even need to touch the context.
>generic functions (filter, concat, inner_join) are better over custom domain functions when the audience consists of other programmers.
Not true. For filter and inner_join especially, the predicate or inner lambda can be intricately complicated and hard to decipher; better to wrap all that complexity in an English name. You save programmer time by having the programmer read English instead of deciphering even simple code.
Rule of thumb: It is far easier to read one line of English than it is to read one line of code. So it is better to allow readers to ascertain meaning from naming over context. You are wasting the programmer's time otherwise.
> context also changes with time as other people edit your code
And you change the code accordingly, IF it does. Typically though, it's best to change the structure of the code, not to increase its local complexity. If the context gets complicated, think about what makes sense to be split out. Local complexity should be kept minimal.
> can be intricately complicated and hard to decipher
And you change the code accordingly, IF they are. But you name the lambda, not the generic function. There is a huge value in using a standardized vocabulary. Even in English, we get value out of "baby" versus "small human between the ages of 0 and 2 years".
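A minimal sketch of what "name the lambda, not the generic function" might look like (the Profile shape is an assumption):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Profile:
    name: str
    spouse_id: Optional[int] = None

profiles = [Profile("Ann", spouse_id=2), Profile("Bob")]

# The intricate predicate gets the English name; filter() stays part of
# the standardized vocabulary every reader already knows.
def is_married(profile: Profile) -> bool:
    return profile.spouse_id is not None

married = list(filter(is_married, profiles))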
More generally, I've seen this kind of thinking before. A programmer discovers some "universal truth" which applied well in some context and they get so obsessed with it that they start applying it everywhere. Please, stop and think before you overcommit to such ideas. They are not nearly as universal as they seem - caveats apply. If you ignore those caveats, your fellow team mates will suffer - so check with them frequently too. At the end of the day it doesn't matter what you think about your own code's readability, but what others think.
>And you change the code accordingly, IF it does. Typically though, it's best to change the structure of the code, not to increase its local complexity. If the context gets complicated, think about what makes sense to be split out. Local complexity should be kept minimal.
My method requires no changing, ever. With better "functional phrasing" I make the title of a function independent of context. Similar to a module.
You, however, are saying that the name and context are tied together, so to handle change you alter the name, the context, and the structure all together. This is objectively worse.
>And you change the code accordingly, IF they are. But you name the lambda, not the generic function. There is a huge value in using a standardized vocabulary. Even in English, we get value out of "baby" versus "small human between the ages of 0 and 2 years".
And there it is, more changes. Every change and edit to working code is a potential for a new bug. A change to structure of the code should be made independently to naming. Modularity is important.
>More generally, I've seen this kind of thinking before. A programmer discovers some "universal truth" which applied well in some context and they get so obsessed with it that they start applying it everywhere.
First off, stop commenting on my character. Second, of course caveats apply. I never said to disregard caveats. It's also perfectly fine to take on a bit of technical debt and use a one-letter name to save time when needed and according to context. My argument is for knowing what is technical debt and what's not. Don't make random assumptions here, get your head straight and focus on the topic at hand.
We can get rid of the "caveat" distraction by just examining an example without caveats: take a function phrase like "merge_married_profiles_into_list_of_pairs".
My claim is that this is an informative and perfectly good "function phrase". Your claim is that it is worse. Caveats do not apply in this example.
>Clear writing is about structure, not verbosity or repetition. Concise-and-clear is preferred over verbose-and-clear.
Except this is where the contradiction lies. In both well-written literature and great textbooks... clarity trumps all, even when conciseness is sacrificed. English is one of the most verbose languages out there. I can take your sentence and make it concise:
>My eye bleed.
>Writing about struct. No verbose or repeat. Short n' Clear beter den wordy n' Clear.
Is that better? Your misguided logic paints my concise version of your comment as "preferred" even though it has the exact same clarity. The stark reality is, for purely human reasons, people prefer the former example over the latter, and there is no real rationality behind it.
Try to think a bit outside of the box here. You share the biased and delusional opinion of a typical average programmer.
The real logic is that conciseness or verbosity is inconsequential. Our human nature allows us to prefer contradictory approaches in code vs. English because verbosity and conciseness don't actually matter that much. Clarity is king by a long shot, hence the reason most humans prefer reading literature over code.
My code displays the ultimate clarity. You insultingly claim that your eyes may be bleeding, but I guarantee you that unless you're mentally deficient, no part of my code was unclear. It was 100% obvious and crystal clear what my intentions are. The best part is, you only need to read it one time.
When is the last time in your life you've read a similar snippet of unfamiliar production code at first glance and left with the exact same level of clarity? Most similar production code needs a good number of guesses and hypotheses and a couple of reads and code-following to develop the same level of clarity and confidence of understanding that my code can produce on a SINGLE reading.
I would wager we can agree on that point, and if you claim otherwise I would wager that you are lying.
If it can be found. But more often than not it can't be found. Do you always search for the most elegant single word to describe a point in English? Sometimes. But more often than not you have to resort to sentences.
There's no reason why this logic can't be applied to programming.
Yeah, I don't need conciseness or brevity. The processor or the network connection needs conciseness. Programmers need specificity and clarity. Verbosity is fine too, IMO :)
Does it matter whether profiles is a list or not? If the typical data structure / convention you use for plural variables is a list, you don't have to say it.
Do I have to say that the variable bobs is a list of people named bob? Not if it's obvious from the context on the right hand side that I'm filtering by name.
Do I have to use a more verbose argument name in the lambda passed to `filter`? Not really - it's short and there is plenty of context around to deduce that it's a profile, especially if the reader is familiar with a commonly used standard library function.
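A hypothetical snippet of the kind being discussed (names and the Profile shape are assumptions):

from dataclasses import dataclass

@dataclass
class Profile:
    name: str

profiles = [Profile("Bob"), Profile("Jane")]

# Plural "bobs" with no type suffix: the right-hand side already shows
# it's a filter by name over profiles.
bobs = [p for p in profiles if p.name == "Bob"]

# The same with standard-library vocabulary; the one-letter `p` never
# escapes the lambda.
bobs = list(filter(lambda p: p.name == "Bob", profiles))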
The last one is tricky, and it depends who you're communicating with. Do you expect your readers to be familiar with the standard library of the language, even less commonly used functions? If so, then it's fine. If not, again it depends. Is the reader familiar with SQL or relational algebra? If so, then yes, they probably have no problem with this.
As long as your context and conventions are clear, you can leave a lot of details out and still get the message across. It's better to err on the side of caution, yes, but it doesn't mean that unlimited verbosity leads to unlimited clarity. As with writing - your audience is what matters.
> My comment was already concise to begin with. Your change reduced clarity. Here I will exaggerate your example:
Your comment was concise because it was just a statement. You didn't attempt to prove a point.
My comment was a proof against your point, hence why it's longer. Now you're trying to disprove my proof, which also explains why your subsequent reply is also significantly longer.
> There is a point where verbosity decreases clarity by overwhelming the reader with irrelevant detail which can already be inferred from context.
I agree with the above completely, but I also think the point is obvious. I never stated there was a level of verbosity that is excessive because I thought that notion is actually completely clear to all readers.
Here's a good rule of thumb to follow. We clearly don't think the English language is excessively verbose. So all I'm saying is: bring programming to the level of verbosity of English and don't go past that.
Obviously your first example is excessive and past typical English verbosity. But your second example is below English verbosity and has several problems.
> Does it matter whether profiles is a list or not? If the typical data structure / convention you use for plural variables is a list, you don't have to say it.
It doesn't hurt if I put "list" or "profiles" in the name; it's just some additional letters and adds more information. It doesn't matter at all. Also, your assumption is wrong. Many containers can be plural, including linked lists, hash maps, trees and graphs.
>Do I have to say that the variable bobs is a list of people named bob? Not if its obvious from the context on the right hand side that I'm filtering by name
You don't have to, but you don't not have to, either. The bobs variable can be used in a section very far away from the context... then what is a bobs? What is a janes? How do you even know it's a list of profiles? You're literally making me follow and decipher code to figure it out. That is the point. Give a function an English name where I don't need to decipher anything. I read the function name and I don't have to dive in to decode anything.
>Do I have to use a more verbose argument name in the lambda passed to `filter`? Not really - its short and there is plenty of context around to deduce that its a profile, especially if the reader is familiar with a commonly used standard library function.
There is nothing you "have to do" here. You can do whatever you want. I am saying what you're doing is actually worse for communication and that my way is better for communication, with the incredibly negligible downside of being more verbose. That being said, context can balloon in complexity, and reading code is harder than reading English, so make the reader read English when he can rather than code.
>The last one is tricky, and it depends who you're communicating with. Do you expect your readers to be familiar with the standard library of the language, even less commonly used functions? If so, then it's fine. If not, again it depends. Is the reader familiar with SQL or relational algebra? If so, then yes, they probably have no problem with this.
All your variables can be used far away from the context where they are created. You can't rely on the fact that the creation of bobs is right next to its usage in marriages. Oftentimes your style of coding will result in people having to follow code and dive into definitions to figure stuff out.
First off, marriages. Marriages of what? Sam and Bob? George and Shirley? Second, the expression itself. Again, what is a bob and a jane? What is a partner? Partners in crime? Also, seriously:
I think most people will agree that mine is clearer in communicating what's going on. Your version, despite the brevity, needs some deciphering.
Also, you can't expect that the profile data structure is so simple that it can be handled in a one-liner. You assumed the data structure to be very simple. What if the data structure is an incredibly complex graph structure of profiles, where marriages can only be found by a complex graph algorithm? I don't want people to decipher a graph algorithm to decode what I'm trying to do here.
Write your function names so people can avoid deciphering meaning from context. The point is that people can get the meaning from English, because English is ten times easier.
> Marriages of what? Sam and Bob? George and Shirley? Second, the expression itself. Again, what is a bob and a jane? What is a partner? Partners in crime? Also, seriously:
And we get to the key point you are missing. It's clear from the context. The code we had wasn't some imaginary code where the variable was far away and had a ton of context. It was that particular code. Different code might be better written in a different way. If you have a different code context in mind with higher complexity, show that one.
Additionally, "merge_married_profiles_into_list_of_pairs" is not necessarily better. When debugging the code, we don't know what that part really does. An implementation using a more generic standard library function lets us glance over that bit since we already have understanding of it. (And again, it might depend on the audience - are we talking to a language expert, or a domain expert? Do we have a well tested and well defined library of domain functions that everyone has a clear understanding of?)
Context and audience matter. Verbosity can be a lazy cop out for bad structure. (That's applicable to writing English as well.)
>And we get to the key point you are missing. It's clear from the context.
I understood this point utterly and completely; you have misunderstood the point I was making. I am saying you can't rely on context because context can grow in complexity and can actually live far away from where you are using a variable or a function. Relying on context leads to code that will inevitably become less and less readable as complexity grows. Read my post. I literally addressed "context" and you literally missed my point.
Let me spell it out for you. If I have a 500 line piece of code, Bobs is created on line 1 then reused again on line 500, and I'm currently looking at line 500, you're expecting the reader to scroll all the way back to line 1 to decipher context. Couple that with multitudes of other concepts littered throughout your code with context strung throughout the page and located in different files.... This is my point that I demonstrated to you earlier to COUNTER your point. Once you realize this, you'll know that you're the one who missed the point.
Your function name should be so clear that a reader should never have to read context. He reads the name and he can move on with life without decoding everything you did.
If I called the variable list_of_profiles_named_bob, no context is needed. Critical information lives and moves with the concept.
Let me reiterate my point: Context used in place of naming is done by programmers who are bad at writing readable code.
>Additionally, "merge_married_profiles_into_list_of_pairs" is not necessarily better. When debugging the code, we don't know what that part really does.
This is 100% better. Nobody needs to know what a function actually does; this is how abstraction works. The point is that you only need to dive in when there is a bug, but before there's a bug, complexity should be abstracted away so we can make sense of the bigger picture.
>And again, it might depend on the audience - are we talking to a language expert, or a domain expert? Do we have a well tested and well defined library of domain functions that everyone has a clear understanding of?
I assume the audience can understand English. No need to use "inner join" when the person who knows SQL also knows English. I chose the methodology that everyone can understand. What is the cost of doing this? Nothing. Just a longer function name that does zero harm to the structure of a program.
>Context and audience matter.
Audience matters; assume the audience can read English and can generally program, that's it. Context as a communication medium is a crutch used by bad programmers to avoid abstracting concepts and giving things clear names.
> Verbosity can be a lazy cop out for bad structure.
Verbosity and naming have nothing to do with structure; this claim is categorically wrong, and also obvious, but whatever, I'll show you:
def add_two_nums(x, y):
    return x + y

def add(x, y):
    return x + y
Literally, 2 functions that do the exact same thing. You may claim the bottom function is better because it's shorter. And I claim it's shorter by a measly two words; who cares, both functions convey equal meaning and equal structure.
>are we talking to a language expert, or a domain expert?
Literally, domain expert code is a synonym for bad code. All code that is bad, when studied long enough, will produce a domain expert who knows that shitty code inside and out. A domain expert is someone who mastered (or wrote) code only readable by other masters of reading that same bad code. Think about it this way: if you posted your code on GitHub and people started reading the code, all domain expert code would be regarded as shitty code. This is the colloquial definition of bad code. The best code is code on GitHub that is readable by non-domain experts on a single pass.
Now I admit that there are some cases where it's just too hard to do this. You can't program a simulation in relativity that's so readable that someone who doesn't understand relativity can read the code. Of course that's just too much to ask. What I'm saying is that "inner_join" is utterly unnecessary and that "merge_married_profiles_into_list_of_pairs" is way better than what you came up with.
Programming is more like mathematical notation than English prose.
In mathematics we write a² + b² = c² to describe the relationship between the lengths of the sides of a right triangle. We don't write it out longhand in English words, because the notation is brief, packs a lot of meaning into a small amount of space, and lets our minds focus on larger concepts rather than parsing long phrases and keeping their meanings organized.
>In mathematics we write a² + b² = c² to describe the relationship between the lengths of the sides of a right triangle.
Therein lies the problem. Can you explain to me the meaning of a² + b² = c² without English? Can you just write down an equation and expect me to know what you're talking about?
Can you explain to me the concept of entropy by just showing me all the equations?
Can you explain to me the meaning of your program with only one letter variable names as shown in your Pythagorean equation above?
You can't. That's why math texts consist of equations AND English, and there's no reason programming shouldn't either.
Anyway a side note, have you ever heard of literate programming?
Why is mathematical notation full of one-letter variables?
That was cool when paper and ink were expensive and you were trying to send your proof to another mathematician, in a letter in the mail, in the 16th century, but now? Why?!
Some of the highest value-to-effort feedback I've both given and received in a PR is about naming. Whenever I see something where my first impulse is to react with "WTF?!?" I now try to ask myself "does something here just have a bad name?" and much of the time that's all it is.
I was working on a financial trading program at one point, specifically a function to filter orders into bids and asks. They named the order value "total" and the order size "sum".
It made a really simple function incredibly difficult to read.
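A hypothetical reconstruction of the kind of thing described (the field names and order shape are guesses, not from the original):

from dataclasses import dataclass

@dataclass
class Order:
    side: str     # "bid" or "ask"
    value: float  # price * quantity
    size: int     # quantity

def split_orders(orders):
    # The anecdote's names: "total" held a single order's value and "sum"
    # its size -- both wrongly suggest aggregates over many orders.
    bids, asks = [], []
    for order in orders:
        total = order.value
        sum_ = order.size  # in Python, plain "sum" would even shadow the builtin
        (bids if order.side == "bid" else asks).append((total, sum_))
    return bids, asks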
> My friend told me at their company they'd commonly convene the "variable naming committee" for such occasions, and I can't help but think of it every time I find myself in the same place.
You'd better call it the bike-shedding committee. I don't see how that saves time over, say, just letting anyone working on that code who really dislikes a name propose their change in the next merge request.
Joking aside. I agree that it doesn't seem like the kind of thing you want to convene a committee for. But it happens where the database is the contract shared by a bunch of applications, in which case it's important to get it right, maybe important enough to spend a meeting on. It's not my architectural style of choice... but it happens.
Agreed. My problem isn't even naming, but overly long names, which become a problem in C# and Java, where you have the namespace, the method, a service inside the method, a long type (because "var" can be an issue), and so on.
In principle I agree, but in practice don't think this leads to great results. Naming things well is really cognitively challenging. The human brain is naturally lazy and easily makes excuses to avoid thinking hard.
When something like this is adopted, the average person will look at something then quickly throw up their hands and declare that they can't name it without really trying. Many things can be named well, but it takes 60 seconds of hard thought and focus to realize it.
Yeah, I agree with GP's sentiment, but would emphasize good abstraction over a good name. It's hard to define a good abstraction, but I think a good starting place would be to make it feel single-purpose. Once an abstraction becomes too multi-purpose is when configuration and case-switching spaghetti begins.
For this concern to make sense, people would have to go around easily inventing good abstractions all the time, but having great trouble naming them. That seems so implausible to me. If you can't name it, that's almost certainly because it sucks.
> No abstraction is better than a bad abstraction.
I misread that at first. At first I thought you were saying that there is no better abstraction than a bad abstraction. Gotta love the ambiguities in English.
On the one hand, yes, or as Sandi Metz put it, “duplication is better than the wrong abstraction.”
On the other hand it is often possible to refactor and restructure code such that you no longer have to name something. Returning a closure rather than a value, or folding over a list, are two of my favourites approaches, since in many languages you then deal with the object symbolically, or implicitly through syntax, rather than explicitly or by named reference. Coroutines, continuations, and generators too.
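A small sketch of two of those tricks (illustrative only; nothing beyond the standard library is assumed):

from functools import reduce

# Folding over a list: reduce() threads the accumulator implicitly, so no
# intermediate running-total variable ever needs a name.
total = reduce(lambda acc, n: acc + n, [1, 2, 3, 4], 0)

# Returning a closure rather than a value: the captured rate is dealt with
# symbolically by the caller, never by named reference.
def make_discount(rate):
    return lambda price: price * (1 - rate)

ten_percent_off = make_discount(0.10)
print(ten_percent_off(100.0))  # 90.0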
This is absolutely the rule. The problem I see a lot is that someone else will come in and want to add an edge case to my generalized code or will try to include edge cases in their "generalized" code. Personally, I have 0 problems with DRY code. I don't even treat it like a requirement but sometimes, I'm trying to code something and I KNOW the code needs to be generalized or at least part of it and that's when I'll spend an hour on a 15 min fix. Effective DRY comes with experience.
You forgot the third most common case - 90% of those repeated methods will never have a bug and now you've got a code base that will start smelling bad and no one is going to want to work on it.
I agree with some of the things here, but allowing repeated code in multiple places until a bug forces the issue sounds like a terrible idea. Lots of these small methods will never have a bug, and they'll continue to rot the codebase.
Can you get the trade off right 100% of the time? Because I can tell you, every time I've worked on a codebase that repeated itself, it has been a freakin' -delight-, compared to the times where DRY was taken as a commandment from on high.
In the former, when something broke, I could just... fix it. And it would be fixed. Would other, similar situations still be broken? Sure! And when those were raised up we'd fix them too, and compare them with the other changes, and possibly refactor. Fixing one bug = one bug less.
The latter? Oh God. Something is broken? We'd fix it. Aaaand, now there'd be two bugs. Fixing one bug = more bugs.
Perfectly balanced code, yes, fixing one bug = multiple bugs fixed. That's the goal. But you won't get it right if you do it pre-emptively. Which of those other two options would you prefer?
I feel like I'm taking crazy pills, because you describe exactly what I have to deal with on a daily basis, but I'm the only one on my team that feels this way.
I don't even know how to comment to something like this. I guess with an example - I spent some weekend time fixing 30 file paths in a Cloud function, because the junior developer couldn't abstract the root to a single method.
After you do something like that, tell me how much you love fixing bugs everywhere.
Maybe you misunderstood; I am not saying avoid DRY. I am responding to a thread wherein the parent said, essentially, "confirm that it's really repeating".
My whole point was that we reach for DRY too early, when we don't know if it's actually repeating ourselves. The logic -right now- looks the same. Will it be in the future? Most of the time, we don't know. Even when we think it will, we're often wrong. Your example sounds like there's literally a string literal that was not abstracted out, but hardcoded in 30 places. That isn't logic. It's a literal. It has one meaning, and you can probably ascertain whether or not the meaning is the same across everything. And, as mentioned, it seems weird you couldn't grep for it.
Either way though, as a counter example - I and a coworker spent three months playing whack a mole extending and supporting a rather small desktop application written by one, extremely senior developer, who had assumed 7-8 different things were basically the same, and so had written it to share a lot of the same code. They were not the same, and so a fix for one thing invariably broke two others. I was thinking of it in particular with my original post; we fixed one thing, and two more things would be broken.
I got permission to rewrite the application over the course of a couple of days from the team lead, and proceeded to basically do a lot of copy paste to separate out each thing into its own control flow, where a fix to one would not affect the others. Our failure rate plunged, and within a month it was basically stable.
I think that's where SRP and naming comes in like mentioned elsewhere in the thread. If methods really do one simple thing and are properly named, it's easy to tell if they should be re-used or not.
A lot of the time, side-effects are not mentioned in method names or params (or the params are optional, that's where the issue comes in).
E.g.
SaveCandidate(Candidate candidate);
In reality, SaveCandidate does two things, saves the candidate and updates their "ProfileCompletionStatus".
If someone sees SaveCandidate and not SaveCandidateAndUpdateCompletionStatus, they will re-use SaveCandidate in places where completion status shouldn't be updated (contrived example - data migrations).
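A minimal sketch of the explicit split (hypothetical names, rendered in Python with a dict standing in for the database):

def compute_completion(candidate: dict) -> int:
    # placeholder rule: completeness = number of filled-in fields
    return sum(1 for v in candidate.values() if v)

def save_candidate(candidate: dict, db: dict) -> None:
    # does exactly what the name says, and nothing else
    db[candidate["id"]] = candidate

def save_candidate_and_update_completion_status(candidate: dict, db: dict) -> None:
    # the side effect is now in the name; a data migration can call
    # save_candidate() alone without triggering the status update
    candidate["completion"] = compute_completion(candidate)
    save_candidate(candidate, db)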
The problem, of course, is that these overly-clear, overly-detailed methods are a pain to read. The common advice is "keep it separate", but then we are repeating the same steps everywhere, despite them being generally one process. I guess this is all pretty pedantic. When you spend a ton of time on this, you are missing out on getting stories done, but if you don't, the code eventually becomes unmanageable.
Right, but then someone sees that, and goes "Ah-hah. This is saving Candidates. Over here I'm also saving Foos. I'll refactor this so I have a generic 'Save' method/function, and then both Candidate and Foo can call it. DRY!"
And that works, because initially saving is just taking a serializable object and dumping it to JSON. But then things get moved to going to a DB or something, and now it has different expectations, and, oh, crap, we have to do some special logic in the transaction because it turns out saving a foo requires us holding a lock on the bar table as well, but not for candidates, and etc etc etc, and saving was -really just two lines of code in the first place-, and the correct solution is isolating it entirely (with maybe a Saver interface to say 'yeah, this thing knows how to save itself'. Though that likely breaks SRP since now Foos know how to be Foos, and how to save Foos, which involves knowledge of the DB, which seems suboptimal, but which is still cleaner than what you had before)
Etc. My only point is that if the area you're tempted to repeat is not provably the same, contextually, keep 'em separate until it is (even if it's like...they implement the same interface). The pain of getting that wrong is almost always less than combining them, building more assumptions on top of that, and then finding out they should have been separate.
Reminds me of the time I wrote 100 similar sites separately with slight changes. A quick global search/replace will introduce so many unnecessary replacements unless you are careful, and will miss so many unless you take linebreaks into account.
A global search/replace is a hotfix hammer that should only get pulled out rarely and carefully by manually reviewing all changes.
Bad example, it was more complicated than that. Azure functions have a different file system than App Services, and the method tried to determine where it was hosted 30 times, or something like that. Genuinely don't remember.
Refactoring repeated code is toil, but cognitively trivial. Refactoring a bad or leaky abstraction can be fiendishly difficult. I'll take the toil any day.
These kinds of abstract code discussions almost become immediately absurd because, for it to work, we have to be imagining the same hypothetical codebase. Yet we never bust out concrete code. It's funny.
But the outcome that you're lambasting so rudely (really? "terrible idea" when we aren't even looking at code?) is still often the best outcome.
Some rotting, bugless, duplicated code is some of the easiest code to work with. It's the code you wish you had when you're debugging the complicated failing abstraction that GP wanted to avoid. The most damning thing you yourself could say about it is that it was taking up space.
In fact, you seem to be making the exact reverse argument of GP: that some duplicated, unabstracted, bugless code poses such a risk ("rot") that it's a "terrible idea" to not immediately merge it into one frankenabstraction.
When this happens in an argument, usually you both are imagining an absurd extreme that's opposite of the other's chosen extreme. And you actually are in agreement, as you'd both go "oh well yeah, if you go to that extreme, then I'd definitely agree with you."
I want to frame your first sentence. Abstract code discussions are absurd. I’ve learnt to mostly ignore programming blogs if they don’t include some concrete code.
Most code maintenance issues don't really appear until a relatively large amount of code and code evolution is involved. It's hard to include that in a blog post while keeping it digestible for the average reader.
In a small example, excessive duplication is not a problem, and excessive DRYness is not a problem.
Therefore, it makes some sense to me to leave out code, and just hope the reader has personal experience to draw relevant examples from.
Uncle Bob covers this by talking about how real duplicate code should change for the same reason. If the two pieces of code look the same, but from a business perspective can change for very different reasons, then you have incidental duplication and they should be treated separately.
I found this so often when I was doing simple back-office crud coding. There would be a new business use case that was very similar to something we already had coded. Code would be copied, since that's the easiest thing to do and you at least started with something that you knew was working.
Later, the new use case would evolve and have some new requirements. Had we abstracted the functionality originally, we'd have to go back and make the abstraction handle both cases. As it was, we could just change the copy that needed to change, and know that we weren't going to break the other case.
This was also before the practice of automated unit testing was well understood and supported by development tools, so the motivation to "not risk breaking working code" was much stronger than maybe it is today.
So, this take is sort of fashionable now, but it’s never really convinced me: what I’d suggest is that, when you discover that the extracted method is a bad abstraction, you do one of two things: (1) inline the method (IntelliJ is great at this for Java) or (2) duplicate the method and rename, adjusting the new version for the new use case. As hard as naming may be, adding the level of abstraction often ends up helping keep each method working at a single level of abstraction.
In my experience, the biggest messes I’ve inherited were a result of not being DRY enough: the cases of incidental duplication I’ve run across have been comparatively easy to unwind.
Couldn’t agree more. I can agree that premature abstraction is bad. But unnecessarily duplicated code outweighs this by orders of magnitude, in my experience. So stay DRY, and later when you hit that 1 case in 100 where the code needs to diverge, it’s easy enough to copy/paste.
> before long your helper method is extremely difficult to reason about, because it’s actually handling a dozen cases that are superficially similar but full of important differences
This reminds me of Mike Acton's Three Big Lies of C++, specifically Lie 2: Code should model the world. [0]
> A chair is a chair, in real life. But in terms of data-transformations, in terms of what we do, these classes are really only superficially similar. In the context of a game, we have a Chair, a PhysicsChair, a StaticChair, a BreakableChair. These things are not at all similar. There's almost nothing that's the same between these contexts. How they're handled, how the data is managed, how the data is transformed, there's virtually nothing that's the same here, and yet the tendency would be, because they share some world-modelling similarities, their similarities in the real world, they ought to be connected somehow in the code hierarchy, which is nonsensical. World-modelling leads to monolithic, unrelated data-structures and transforms. [...] You can't make a problem simpler than it is.
> A chair is a chair, in real life. But in terms of data-transformations, in terms of what we do, these classes are really only superficially similar. In the context of a game, we have a Chair, a PhysicsChair, a StaticChair, a BreakableChair. These things are not at all similar.
I feel like that comment either applies a very literal and naive analysis to the problem or fails to identify the objects being used.
Just because the word "chair" pops up in a few objects that does not mean they are supposed to be the same thing, and thus modeled as specialization of a common Chair class.
For example, PhysicsChair makes sense as a specialization of a physics-related class, not a Chair-related class. BreakableChair would also make sense as a specialization of a physics object, which might be comprised of multiple discrete elements or track damage to generate new particles when a threshold is reached. If we take the single responsibility principle seriously, it makes sense to have specialized physics and graphics classes that implement the functional requirements of handling a chair.
This by no means implies that class Chair should be a superclass of all these other cases, or even that it makes sense to even consider them to be related at all. A failure to identify the models and their functional relationships doesn't mean that your domain has to include relationships that don't really exist nor make sense.
> Just because the word "chair" pops up in a few objects that does not mean they are supposed to be the same thing, and thus modeled as specialization of a common Chair class.
You agree with his point that class hierarchies shouldn't necessarily model reality.
> For example, PhysicsChair makes sense as a specialization of a physics-related class, not a Chair-related class.
Right, that's his point.
> A failure to identify the models and their functional relationships doesn't mean that your domain has to include relationships that don't really exist nor make sense.
Again, that's the point he's making. In terms of modelling the world, they're all just different kinds of chair, but that doesn't mean you should model it that way in code, with object-oriented types.
> You agree with his point that class hierarchies shouldn't necessarily model reality.
I'm pointing out that this assertion is meaningless, because it completely misrepresents what is actually done with class hierarchies.
The class hierarchies in the domain model (i.e., Chair) are never the class hierarchies used by components in the service layer (i.e., PhysicsChair). Functional requirements of specific services never seep into the domain model. Call it bounded context, the single responsibility principle, or encapsulation, but even if we play little semantics games, a home is not the same as a mobile home. Pinning together unrelated concepts based on the naive belief that sharing a keyword is enough to bundle concepts together just misrepresents the whole issue as a straw man.
> Right, that's his point.
No, the point is that it's absurd to talk about class hierarchies with this example, based alone on the simplistic idea that having the word "chair" appear in the identifier is all you need to logically bundle unrelated concepts. The example is poorly thought through even from the perspective of a straw man.
> Again, that's the point he's making. In terms of modelling the world (...)
That's what you are failing to understand: they are not modeling the world. At all. And from the start.
Only the domain model (chair) models the world. The rest is not the domain model, but data types used in the service layer to meet functional requirements (i.e., PhysicsChair).
Let's put it bluntly: the straw man you are trying to defend is something that no one in the world ever mixes up. Ever.
My boundary on DRYing has settled on "when it's semantically the same code".
Code that looks the same but is used for different purposes / has different meaning? Keep it separate. They may diverge in the future.
Code that truly means the same thing, but may or may not look the same? DRY it up, any variation is probably unintended and will surprise people later when A doesn't behave like B.
DRY is a good principle to default to, and juniors need to learn to pick this up - I constantly need to fix this kind of thing in reviews; they haven't learned to recognise when stuff should be factored out, and they also write buggy code, so it's easy to show them that writing the code once, isolating dependencies and testing it is a better approach than coming up with similar code over and over in ad-hoc fashion.
Once you start getting it you tend to go too far (everything is a nail when all you have is a hammer), and experience helps you learn to judge when to use it - but it's an invaluable tool and you need to use it to get the experience - so I would suggest extracting shared code by default, and inlining when it gets too complicated.
It took me a while to realize this truth, because you might learn it only by seeing your own code after forgetting it, or spending enough time in other people's code to understand how it works through and through.
One good example of this is paths in makefiles. It's so much easier if you just make all the paths relative to the root of the project, but people have to either over-abstract or under-abstract everything or both.
This goes for all sorts of things. Yes, you might build a directory from somewhere else, or might do this, or might do that. But why not make it clear what you are doing, instead of unclear depending on what you might do?
I try to suggest this exact strategy, but it's hard to find acceptance in my current job. I call this "letting the code tell me what to do." I find that religious adherence to DRY and all kinds of other rules ends up in just...shitty, shitty code.
> So you try to extract that boilerplate into a method, and it’s fine until the very next change. Then you need to start passing options and configuration into your helper method... and before long your helper method is extremely difficult to reason about, because it’s actually handling a dozen cases that are superficially similar but full of important differences in the details.
What if the flaw is not the initial deduplication, but the flaw was to continue to use it by adding additional responsibilities?
When I think of the incidental type, I think of things that just happen to be alike; a contrived example would be an ENUM value that happens to be 3 used in different places, and that works because both happen to use 3.
"Let the code go through a few evolutions and waves of change. Then one of two things are likely to happen...".
That sounds great in theory, but how long is your evolution time, and do you keep track of the changes in practice? I would assume a good evolution time is 3 to 6 months. I would say half the time I work on a story I face that decision. If you close 1 to 3 tickets a week, you have to track a lot of these decisions [1]. Now add that you work on a team of 5 and you have 5 engineering teams overall... it seems to me that this approach does not scale.
[1] granted, not all tickets will be new decisions in new places of the code base, and you do come across some of your old changes and do the actual evolution.
Good observation. I seem to recall in one of Linus Torvalds' old rants about C vs C++, he mentioned locality, and how abstracting everything possible in a codebase resulted in loss of locality.
Locality encompasses multiple concepts, one of which is what you mention. Another is the ability to look at any section of code and understand exactly what it does without having to dig through chains of abstractions.
The implication is that locality results in more LOC, but a more tractable codebase, especially as it gets larger and more complex and more difficult for engineers to hold in their head.
I use a similar approach: first code, then take a refactoring session and look if some repetitive code can be "condensed" in a common method.
By the way, copy-pasta code is very resilient: a bug introduced in one place cannot spread, because the other code is duplicated and old.
Instead, if you have too much condensed code, you end up with complex methods which are difficult to maintain, because you have a huge "coverage land of code", and you get scared when you need to change it.
I like to say that the opposite of DRY is PYIAC: Painting Yourself Into A Corner -- because you often only notice a situation like this once it becomes incredibly difficult to get out of.
It's a trade-off - DRY vs. an abstract dependency. Are they superficially similar (then don't create an unnecessary dependency), or are they functionally similar? Then DRY.
I think this misses the point of DRY a little bit. DRY isn't about not copy pasting code, it's about ensuring that knowledge isn't repeated. If two parts of the system need to know the same thing (for example, who the currently logged in user is, or what elasticsearch instance to send queries to, etc.), then there should be a single way to "know" that fact. Put that way, DRY violations are repetitions of knowledge and make the system more complex because different parts know the same fact but in different ways and you need to maintain all of them, understand all of them, etc. etc.
Code blocks that look to be syntactically the same are the lowest expression of "this might be the same piece of knowledge" insofar as they express knowledge about "how to do X", but the key is identifying the knowledge that is duplicated and working from there. Sometimes it comes out that the "duplication" is something like "this is a for loop iterating over the elements of this list in this field in this object" and that is the kind of code block that contains very little knowledge in terms of our system. But supposing that that list had a special structure (ie, maybe we've parsed text into tokens and have information about whitespace, punctuation, etc in that list) and we start to notice we're repeating code to iterate over elements of the list and ignore the whitespace, punctuation elements in it, then we've got a piece of knowledge worth DRYing out given that all the clients now need to know what whitespace & punctuation look like even when they'd like to filter them out.
It's worth pointing out that DRYing out something isn't necessarily "abstracting", it is more like consolidating knowledge into one place.
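A sketch of what consolidating that piece of knowledge might look like (the token shapes and names are assumptions):

import string

# One home for the knowledge "what counts as whitespace/punctuation",
# so clients never re-state that definition themselves.
SKIPPABLE = set(string.whitespace) | set(string.punctuation)

def content_tokens(tokens):
    # Return only the meaningful tokens, hiding what "skippable" means.
    return [t for t in tokens if t not in SKIPPABLE]

print(content_tokens(["Hello", ",", " ", "world", "!"]))  # ['Hello', 'world']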
The most fun bug I've encountered as a web developer is of this category. Two pages, both check for a logged-in user and redirect to the other if found or not found, respectively. The bug was a subtle difference in how these were calculated, the details of which are unfortunately lost to the sands of time. The end result was that if you sat on one of the pages and waited for your user session to time out, you'd get stuck in a redirect loop between the "logged in" and "please log in" versions of the page.
Anyhow, the point of this is that when you calculate the same fact two different ways, you will occasionally build something that makes an unwarranted assumption that because it's the "same fact" you wind up with the same answer. This is an entire category of easily missed and often subtle bugs.
And in both cases, it was a sign of mismanaged design. I have encountered THAT EXACT bug, and the reason we supported both was because both were released and users began to expect both pages for different reasons. What we needed was a designer to sit down and say, hey, this design seems replicated, how do we mitigate this? This version of DRY becomes a business and resource problem, above the developer, and unfortunately, this means that you or do not have the resources to adequately deal with it.
I haven't seen this "don't repeat knowledge" take before, it's pretty interesting. I see why you don't want mutated various versions of the same information all over the place, but you still have dangers.
Especially if you "overly reduce" your knowledge. If your common recipe is "do A, B, C, D, E" and you reduce that to just "do X," for instance.
I've seen this often turn into "now, instead of the knowledge being repeated in several places, it's hidden in one place and only one person knows it." Everybody else just relies on the library doing its magic, and when someone needs to do something differently, they have this huge mountain to climb to figure out how to modify the code to also do "J" for certain cases without breaking everyone else.
As someone who deals with 15 million lines of code (and many readers of this have bigger systems) i need to trust that do X does X without me having to know how. When I have to learn it slows me down from the part of the code I need to know well. If do J is needed, that needs to be someone else's problem who knows the rest of do X. Unless do X is my responsibility of course. But nobody has responsibility for more than a small fraction of the code.
This is a great point often forgotten in this kind of discussion.
Size matters, and depending on the system size we're dealing with, it will have a significant impact on what approach we take. Or how we handle documentation, for instance.
There is definitely a spectrum of "knowledge" at play when it comes to these considerations. The most obvious DRY violations are those kinds of things where you go "oh, I need to test for this case", because that is usually an indication of some knowledge you need to know when interacting with a piece of code. E.g., if you ever use -1 as a sentinel value, then the knowledge of what -1 means should be consolidated together; otherwise all clients will have to know that -1 is a sentinel and what it means, and at best you'll have duplicate code, at worst those interpretations won't align and you might have a subtle bug where that -1 is doing something somewhere (i.e. it is supposed to mean "no information provided" but somewhere something is keeping an arithmetic mean of this field and those -1s are now screwing up your metrics and you don't really notice).
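For instance, a minimal sketch of consolidating that sentinel (the names are illustrative):

# One named home for the sentinel and its interpretation.
NO_INFORMATION = -1

def has_age(record: dict) -> bool:
    return record["age"] != NO_INFORMATION

def mean_age(records: list) -> float:
    # Sentinels can no longer sneak into the arithmetic mean.
    ages = [r["age"] for r in records if has_age(r)]
    return sum(ages) / len(ages)

print(mean_age([{"age": 30}, {"age": NO_INFORMATION}, {"age": 40}]))  # 35.0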
When we think about the knowledge of "how to do something", that's where things can get confusing. 9/10 times I'd say the right move is to look for common assumptions or facts. I.e., it isn't just "doing something" that is important, but the assumptions made in the process of doing it:
As an example, consider finding the average word length in some piece of text. We might start writing that feature like:
def count_words(text: str) -> int:
    return len(text.split(' '))

def average_word_length(text: str) -> float:
    num_words = count_words(text)
    word_lengths = []
    for word in text.split(' '):
        word_lengths.append(len(word))
    return sum(word_lengths) / num_words
then the piece of knowledge they share is "what a word is" and the DRY refactoring would pull out that piece of knowledge into its own function
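One way that refactoring might look (a sketch, keeping the original's naive space-splitting):

def words(text: str) -> list:
    # The single statement of "what a word is" in this system.
    return text.split(' ')

def count_words(text: str) -> int:
    return len(words(text))

def average_word_length(text: str) -> float:
    lengths = [len(w) for w in words(text)]
    return sum(lengths) / len(lengths)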
that might be code you write when starting to write a feature, and that's the kind of "ding ding ding, there's common knowledge here" that should guide refactoring. The system has a concept of a "word" that we've introduced, and it's important that knowledge about "what a word is" lives in one place. For DRY things it frequently doesn't make any sense for there to be multiple statements of "what a word is" where the system wants to use the same concept.
Kind of orthogonal to this is abstraction, where the focus is on "usefulness", and that is where you can 100% abstract incorrectly, prematurely, get screwed over by requirement changes, or write a library that hides everything and makes people angry. The example you provide seems more like an error in abstraction, where things that should be close together are too far apart in the system (i.e., some "fact" is hidden away and another part of the system wants to know it), but the consolidation and DRYing of those facts, I'd argue, is a lot easier once we've figured out how to identify them.
Yeah, I like this approach, because the "what is a word" knowledge is a nice piece of common functionality that doesn't make sense to repeat. It's unlikely to change for just one of those two functions.
In my example, it's less a "core piece of knowledge" that people are trying to DRY, and more just a "common sequence." Someone sees a bunch of different places where we have a sequence of calls like A, B, C, D... and says "oh, this is a shared method I can extract", even if there's plenty of ways that in the future you might want to do A, B, C, E without D. And so then you pass in a bool, then another one, and you have a centralized mess...
I think the distinction is that if those two pieces of code having a different idea of what a word is would constitute a bug, then you definitely need to deduplicate the "how to find words" logic. But if it doesn't really matter whether two different pieces of code are using the exact same way to do something, then that's likely "coincidental" replication. If you need to do word splitting, and someone else has written a word splitter, by all means copy-paste their code to get you started, but definitely don't assume the best plan is to pull their code in as a dependency.
These things need to be balanced. I live in an ecosystem of DRY gone amok and it's not pleasant.
There's a standard library to connect to databases. There's a huge hierarchy setup just to start an app running.
All of these super dry infrastructure changes have, unfortunately, come with a huge cost. We are still stuck on Ubuntu 14.04 because the super dry Puppet framework we invented can't be ported to Puppet 6.
We are stuck talking to MS-SQL, because our super dry database connection management library can't handle establishing other database interactions.
We are still stuck on Tomcat 7 because our super dry Jersey libraries don't work with newer versions of Jersey (which has locked us into older versions of tomcat!).
Consolidation is a decent goal, but it really needs to be measured. For me, it is FAR more important to consolidate on how to do things and not what does things. In other words, rather than making an "elasticsearch connection library", specify "this environment variable is the elasticsearch host/credentials" and let the apps move on from there.
That's because, when it comes right down to it, configuration code is super easy to write and it really doesn't matter if it's duplicated. You want your libraries consolidating knowledge to be for things that are easy to get wrong (such as checking who is currently logged in or how to authenticate).
> Consolidation is a decent goal, but it really needs to be measured. For me, it is FAR more important to consolidate on how to do things and not what does things. In other words, rather than making an "elasticsearch connection library", specify "this environment variable is the elasticsearch host/credentials" and let the apps move on from there.
I think we're in agreement here. Config is the most basic kind of knowledge, because when something wants to know about the elastic credentials, it almost never makes sense to have it in two places if those two places are supposed to be the same thing.
How to actually connect to elastic -- that's the part that is more iffy. If there is some knowledge we've added there, then it makes sense to DRY it up, but the knowledge of "this is how you pass credentials to this elasticsearch client" isn't the kind of system knowledge we care about. If, for example, there were some kind of parameters that we had to set on each connection, and we claimed it as a piece of knowledge that all of our connections to this service are of this specific TYPE and have these specific parameters, then we've started to add some additional systemic knowledge that might need to get consolidated. If someone were to start working on a piece of code and I feel the need to tell them "don't forget about X", then that is the kind of situation where DRY comes into play. If it's just a vanilla connection to a database and we don't care about the connections made, then I don't think we have a violation of DRY, given that there isn't an important piece of knowledge that's repeated.
At some point, especially when we pay too much attention to copy-pasted code, we end up abstracting. Abstracting is hard, more general, very difficult to do right, and almost always done too early. DRYing out knowledge is easier and almost always improves things.
IMHO it is not the author who misses the point of DRY, but countless developers who make code less readable only to reduce visible repetition or to avoid copy-n-paste. Maybe DRY is just a bad name.
I agree that the name took over. The intention sounds synonymous with bounded contexts of DDD.
I find the vocabulary of DDD to have more explanatory power. Especially with people who don’t grok the difference between removing repetition and consolidating models.
I think repetition is a symptom that a code base may be afflicted with interwoven domains, but the existence of repetition is not sufficient for the diagnosis, IMO.
Bounded Contexts is an idea that helps you draw the boundaries between domains. It asks you to be disciplined in your abstractions, and in return it allows you to feel comfortable changing implementations within a domain without fear of cascading second order effects to other domains.
For example, your service/library for managing customers shouldn’t return data about the books they’ve purchased. That comes from the order context, which composes the customer and book contexts.
If your boundaries are well defined, you can change the order process without fear of breaking the book and customer models, and vice versa.
It marries well with service oriented architecture, because you can use the network to help enforce a boundary. You still need some skill to enforce the correct boundary, of course.
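A toy sketch of such boundaries (hypothetical services; the real thing would live behind separate modules or network boundaries):

class CustomerService:
    # The customer context knows nothing about books or orders.
    def get_customer(self, customer_id: int) -> dict:
        return {"id": customer_id, "name": "Ada"}

class BookService:
    def get_book(self, book_id: int) -> dict:
        return {"id": book_id, "title": "SICP"}

class OrderService:
    # The order context composes the other two across their boundaries.
    def __init__(self, customers: CustomerService, books: BookService):
        self.customers = customers
        self.books = books

    def get_order(self, customer_id: int, book_id: int) -> dict:
        return {
            "customer": self.customers.get_customer(customer_id),
            "book": self.books.get_book(book_id),
        }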
Yes, I've dealt with systems that had bad abstractions.
And I've also dealt with systems where knowledge of highly nameable things - like how to authenticate a user, or how to connect to a database, how to obtain a token to the same API server - wasn't centralized.
Systems of the first kind are certainly bad. It takes a lot of time to understand before you can get ahead and start refactoring. If your organization had low code review discipline at any point, abstractions often become hard to refactor with time, since some developers don't understand the abstractions, and instead of fixing them, just work around them with thread locals or lots of branches.
But systems of the second kind are much worse. Here what happens is that duplicated knowledge invariably diverges with time. It can be developers fixing a bug in one place and forgetting the other, or adding a certain feature in one place and a different version of it in the other. Over time, each implementation of the knowledge has its own unique behavior and bugs, and some parts of the sprawling code base grow to depend on a certain behavior. Or perhaps your code doesn't, but you have other services in another part of the company consuming your API that do, and you just have no idea if they rely upon the implementation difference or not.
If you write it once, you eliminate the chance of a small fix not propagating properly. This is particularly common when handling files and network connections, as those tend to develop edge cases over time.
DRY reduces the number of potential loose ends when you update your code.
Not all code is knowledge, in this sense. And sometimes repeating knowledge is better, on balance, than unifying it somewhere, when you consider the added costs of coupling, of reification, and of abstraction liability.
A better formulation of DRY is SPOT — Single Point Of Truth. In the event that the logic is changed in one copy, should the other copy always be updated accordingly? If the answer is yes, combine them into a single copy, so that they don’t diverge and create ambiguous “sources of truth” in the future. Conversely, if it is likely that the logic in the two copies will need to diverge in the future, due to having a different context, then do not combine them, because they represent different “truths” that just currently happen to have the same form.
Of course, the answer to that question can change over time, and one has to combine or duplicate accordingly. This also serves to document the intent that “yes, these two occurrences are expected to evolve identically”, or “no, these two occurrences are expected to evolve independently, even though they currently happen to look the same”.
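A tiny illustration of applying that test (the rules here are invented): one truth gets one definition, while a coincidence stays duplicated.

# Single point of truth: billing and reminder emails must always agree
# on the grace period, so the rule is defined exactly once.
GRACE_PERIOD_DAYS = 14

def is_account_delinquent(days_overdue: int) -> bool:
    return days_overdue > GRACE_PERIOD_DAYS

# Coincidence, not truth: these are equal today, but one is a security
# policy and the other a network setting, so they stay separate.
MAX_LOGIN_ATTEMPTS = 3
MAX_UPLOAD_RETRIES = 3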
The article is correct though that there is a trade-off in terms of the complexity created by the abstraction, and in how important the “common truth” is. Sometimes a source comment pointing out the dependency is better than introducing a nontrivial abstraction.
The book “A Philosophy of Software Design” argues that there are two sources of complexity in software: dependencies and obscurity. Combining two similar pieces of logic into one can reduce dependencies (of one having to be changed when the other is changed), but can increase obscurity due to the added abstraction. If the combining was done for the wrong reasons (the two occurrences actually need to evolve independently), then the dependencies are increased instead of reduced.
Love this, yeah. I've heard it phrased like "how do we answer questions" so like, "how do we answer questions about a job's status", "how do we answer questions about a user's bank account balance". Either way, once you have those kind of product requirements in place you can build to that spec, and then start iterating as you gain more knowledge.
Duplication is usually a safe default course of action because you are not locking yourself in to any particular consensus of the problem domain. Obviously, too much of this will render a codebase a nightmare to maintain as bugfixes and feature enhancements have to be applied in multiple places.
I have found that starting with duplication is by far the easiest and most flexible way to work through problem domains that are complex. Once you have a really good grasp of the modeling, then you can iterate and decide on normalization where appropriate.
Thinking about this from an analytical perspective - If you build your application with duplication by default (i.e. define a domain model for each logical use case/scenario), then you will have an excellent analysis already in front of you regarding which business types should be normalized and which ones might be a little bit trickier to make common. Many times it is impossible to fully explore a problem domain until you have already written software against its entire extent.
And often the process of de-duping repeated code later on isn't as bad as DRY purists make it out to be, especially since it can be done incrementally. Example: If you have similar functions in 7 places, you can consolidate them one by one. If you have 1 function used in 7 places, you have to consider all implications for all of those code paths.
But when you set out to do that, you first have to make sure they really are identical, and since everything from indentation to names often differs, it might not be so automatic. And then you do find differences, and you have to figure out whether they are there by accident, because someone forgot to implement a fix or a change in some places, or whether the differences are intentional.
On the other hand, if you realise your abstraction was bad, duplicating a function is always trivial.
I've seen code copy-pasted and slightly modified over a dozen times, sometimes without even eliminating dead code! Copying a function or even a whole file is fine if you actually take a moment to consider whether it needs refactoring or not, but more often than not people will just copy and bash at the code until things work without actually making a conscious choice to duplicate code over refactoring.
This seems like an equivalent duality to me but in the case of seven independent functions you're much more likely to miss considering a case. If behavior changes for one you likely should be considering if that change applies to all the others.
Yes, yes, yes! I wish devs could relax just a little about duplication. It’s far less harmful than bad abstractions built with the only purpose of checking the DRY box.
Predictably, another article that doesn't know what DRY is, leading to slaying of strawmen.
I believe DRY was coined in The Pragmatic Programmer, and probably none of the examples in the article are instances of DRY.
DRY is about knowledge/requirements, not similar code. It is about ensuring that a given requirement is not duplicated in multiple places in the code. It is not about similar looking code, which often involves differing requirements but just happen to be coded similarly. The latter leads to coupling if you unify it into one piece of code.
I sort of agree with you. A better title/approach would have been "These aren't DRY, they're silly 'no common code' bigotry". I find the article resonates because I've seen all of these anti-patterns defended with "because DRY". I agree, it's not DRY; but so many people get stuck on the "no code duplication" part. I'm not sure if the "This isn't DRY" is the best fight or "Sometimes DRY is not the best".
My favourite block of "DRY" code was a method that had a triple nested loop (for all objects A, for all objects A.B, for all objects B.C) with a bunch of flags (like 15 different bools, ints, dates and arrays) that changed the ORM filters for A, for A->B, for B->C, and then changed what operations were done on A, B, C. Basically, at the end, the only similarity was the foreach part. The comment on the block of code had "Keep this loop together for DRY", as if they knew this was going wrong but weren't sure why. It ended up being 3 or 4 much simpler methods, based, as you say, on requirements NOT code "shape".
> I'm not sure if the "This isn't DRY" is the best fight or "Sometimes DRY is not the best".
The problem is that when people believe this is DRY, they then tend to oppose the "real" DRY as well.
Likely we'll have to give another name to the "real" DRY principle. In general, I've always felt that catchy names/acronyms are a bad idea for anything (e.g. free software, open source, pro-life/choice, etc). Almost all of them end up being used in ways that differ from the original intent.
I guess I have to agree, in the example I gave, I actually had to fight with members of the team because they were sure the new multiple function approach (one for user, company and analytics) wasn't "DRY" in their eyes.
Yes, thank you! Exactly what I was trying to express in another comment. And if you do it the right way, none of the problems with DRY usually brought up will be relevant. The whole notion of looking at code and trying to spot similarities to find abstractions is completely backwards.
I've referred to this in the past as "semantic duplication" (code that is the same by definition/requirement) vs. "syntactic duplication" (code that just happens to do the same thing today, but there is no requirement that requires both copies to remain the same).
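A small, made-up example of the two kinds: the first pair shares a requirement and must share a definition; the second pair merely coincides today.

# Semantic duplication: both totals encode the same legal VAT rate, so
# they must reference a single definition.
VAT_RATE = 0.20

def invoice_total(net: float) -> float:
    return net * (1 + VAT_RATE)

def refund_total(net: float) -> float:
    return net * (1 + VAT_RATE)

# Syntactic duplication: identical formulas, unrelated requirements.
# Unifying these would couple a tipping norm to a sales contract.
def tip(amount: float) -> float:
    return amount * 0.15

def sales_commission(amount: float) -> float:
    return amount * 0.15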
The problem isn't DRY, the problem is "helpers". Helpers are an anti-pattern, they don't fit in your architecture, they have no mental model, they're difficult (impossible) to name and organize, and they're extremely resistant to refactoring. Effectively they're spaghetti code.
The example I always come back to is auth. If you're doing the same thing like "parse a cookie header, get the session, make a DB connection, look up the session info, etc. etc.", consider how you could architect the layers of your application using a mental model that people would find easy to reason about. That might be some OO, middleware, or even a macro, but the point is that it's thought about, designed, engineered, and documented.
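For instance, a sketch of what the middleware option might look like (all names and the session store are invented; real session handling is more involved):

from dataclasses import dataclass

@dataclass
class Session:
    user_id: str

# Stand-in session store; in a real system this is a database lookup.
_SESSION_STORE = {"tok123": Session(user_id="u1")}

def load_session(cookie_header: str):
    """The one documented place where cookie -> session resolution lives."""
    token = cookie_header.removeprefix("session=")
    return _SESSION_STORE.get(token)

class AuthMiddleware:
    """A layer every request passes through, so handlers below it can
    simply assume request.session has already been resolved."""
    def __init__(self, app):
        self.app = app

    def __call__(self, request):
        request.session = load_session(request.headers.get("Cookie", ""))
        return self.app(request)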
The reason helpers are more prevalent than thoughtful architecture is that humans are a lot better at prioritizing the short term "I improved it" fix from factoring into helpers over doing the long term work of architecture. If you want to change this, it starts with cultural values that prioritize long term sustainability.
Do you have an example you’re thinking of? I don’t think standard libraries automatically have good architecture (see: PHP), but they do have a big influence on culture, which is interesting.
No, I meant that if you think helpers are an anti-pattern that results in spaghetti code, then you must also think that standard libraries are an anti-pattern mess. I don't think there's a standard library that doesn't have helpers, unless we don't agree on what a helper is?
Hmm that could be it. Like, Python has json and it works like pickle before it. I don’t think that’s a helper and I like that it reinforces that serialization pattern.
What I think of as helpers are like base64_auth_to_username_password. That’s factoring out like 2-3 lines of code that may be duplicated in a half dozen places, but in truth represents an incomplete abstraction, layer, or subsystem.
The way I like to work is first to write out all the code I need to make something work correctly, then I go back over the code to see if there's anything that could be simplified or split into separate functions, etc.
I really like to see DRY code, but if you have to make a helper function that takes a bunch of parameters with a bunch of conditionals to something slightly different, you might be better off just sticking the specific logic you need in each place.
The worst case of copy-pasta I saw in a codebase I came into was a function that was 1000 lines long, duplicated 3 times with < 10 lines of it different for each copy. That's a classic case for DRY to be applied.
I feel like I post a link to this comment [1] every time the abstraction vs. DRY topic comes up, but it’s just such good advice. I consciously try to remember it whenever I’m programming.
> Dependencies (coupling) is an important concern to address, but it's only 1 of 4 criteria that I consider and it's not the most important one. I try to optimize my code around reducing state, coupling, complexity and code, in that order. I'm willing to add increased coupling if it makes my code more stateless. I'm willing to make it more complex if it reduces coupling. And I'm willing to duplicate code if it makes the code less complex. Only if it doesn't increase state, coupling or complexity do I dedup code.
> The reason I put stateless code as the highest priority is it's the easiest to reason about. Stateless logic functions the same whether run normally, in parallel or distributed. It's the easiest to test, since it requires very little setup code. And it's the easiest to scale up, since you just run another copy of it. Once you introduce state, your life gets significantly harder.
> I think the reason that novice programmers optimize around code reduction is that it's the easiest of the 4 to spot. The other 3 are much more subtle and subjective and so will require greater experience to spot. But learning those priorities, in that order, has made me a significantly better developer.
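To make that ordering concrete, a tiny invented sketch: the stateful version is shorter, but by the comment's priorities we accept more code for less state.

# Stateful version: hidden state means call order matters, and tests
# need setup.
class RunningTotal:
    def __init__(self):
        self.total = 0

    def add(self, x: int) -> int:
        self.total += x
        return self.total

# Stateless version: slightly more code at each call site (the caller
# threads the total through), but trivially testable and parallelizable.
def add_to_total(total: int, x: int) -> int:
    return total + x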
That’s a really insightful comment, and also the reason why I think my code quality improved considerably after learning Clojure and F#.
I discovered that immutability in functional programming made it far easier to reason about state - something goes in, and you know exactly what comes out. No verbose checking logic, no bugs because something wasn’t set that should have been.
I’d personally switch coupling and complexity (though it really depends on your definition of the latter). Coupling is usually much less of a problem than you expect it to be, and is easier to identify/rectify in retrospect.
Well, there's certainly a lot of overlap in the Venn diagram of coupling and complexity! The really insidious thing about coupling is that it can transcend module barriers — you can make changes that cause problems in places that seem to be entirely unrelated. Whereas e.g. cyclomatic complexity can be tough to tease apart, but it's at least limited to a single module that you can work on in isolation.
It's not just DRY. Every axiomatic or semi-axiomatic principle of software development ends up being a trade-off. Good code lies at a local minimum where multiple competing concerns are all balanced against one another.
I agree about the importance of tradeoffs. Looking at the historical perspective, though, the reason that every principle is a tradeoff is that the principles that are uniformly worse get discarded.
For instance, structured programming (building code out of blocks with structured control flow rather than a pile of gotos) was victorious in the 1970s. Nowadays nobody considers the tradeoffs of using if-then-else versus gotos; structured code is the automatic choice.
Self-modifying code was very popular in the 1950s (since it was the only way to get many things done), but essentially nobody uses it now.
Modularity is another victorious principle of software development, winning out over big blobs of code with global variables.
Using a stack for subroutine calls used to have tradeoffs, but now nobody would consider an alternative.
Looking at the long perspective, there is real progress in software development (although slower than I'd hope).
> Self-modifying code was very popular in the 1950s (since it was the only way to get many things done), but essentially nobody uses it now.
...which proves that it is a tradeoff. Just one where the local minimum currently is at one end of the domain due to context (e.g. ROM size limitations) having evolved.
Abstractions are addicting for many developers, including myself. I switch between Go and Java. Go is the language I want my coworkers to use. I'd rather read "bad" Go code than "bad" Java code all day long. Bad Java can be truly excruciating to read and review, particularly due to the poor choice of abstractions. Whereas, Go mostly gets out of the way and may be written poorly but is straightforwardly written poorly.
Still, there's a certain sense of aesthetic beauty that I just can't derive from Go, and why I kind of hate working in it. There's lots of things about Java and OO that I don't love, but reading a perfectly factored Java code base can be just beautiful. Mostly due to good choice of interfaces.
Now, those code bases might be rare and not worth the lift of a million bad abstractions. I'd probably agree at this point, but still, I find it odd that most Go code bases just feel dirty and thrown together to me. Hacking stuff together in a mostly procedural language with good deployment story is probably the right way to write for-profit code, but I'm not sure I love it.
A lot of programming maxims like "stay DRY" are rules of thumb, and often are dangerous, or at least lead to unexpected results, when treated like laws of nature. I had a developer who drank functional-flavored Koolaid and refactored any single line of code that appeared more than once in an application. It was about 38K lines of code. When he was done, it was still about 38K lines of code. Was it functionally pure? Yes. Was it very difficult to debug? Yes... you sometimes had to step into five or six functions to get to a single line of logic.
I agree. I also think that when building a prototype, flexibility is more important than stability. I allow myself to repeat code when I think two similar things will be different by the time requirements are better known. Later I'll usually re-write in a more stable fashion.
The whole DRY concept irritates my inner curmudgeon because it is itself a lousy repetition of the (formal mathematical) concept of refactoring. When you refactor code it's just like factoring an algebraic equation. If you're just removing duplication without understanding how it affects the structure of the system then it's a kind of "cargo cult" programming (IMO.)
Even when you know what you're doing, there's good refactoring[1] and bad refactoring[2].
[2] https://github.com/calroc/HulloWurld/blob/master/Hullo.html#... This is a terrible function that, while it abstracts the core of the two following functions well, makes the system harder to understand. In other words, the three factored functions are less desirable than the original pair of functions despite their redundancy, because the original functions were easy to understand and the new factored helper function is inscrutable. Context counts.
"...putting common lines into functions, without careful thought about abstractions, is never a good idea..."
(emphasis mine)
I think this is the crucial part. DRY works fine and in fact arises naturally if the code is well factored to isolate areas of commonality - as the author points out though, this is very difficult to do and I think that's the core problem.
Not sure if anyone else shares this anecdote, but I've noticed the most DRY-hard programmers tend to be the most resistant to things like functional programming, monads, and other generic approaches which are the ultimate realization of DRY. And often the most inscrutable.
Another anecdote on DRY: I'm currently refactoring two interrelated systems that share a single function, and very horrible bad things happen if the systems disagree on the return value of that function. However sharing the same code is more complex than duplicating it and this has a nontrivial impact on how the systems are distributed. So today I'm undoing the original work I did to make it DRY - turns out that sometimes, you need to copy/paste.
I will humbly suggest "DOBA": Duplication Over Bad Abstraction. Not nearly as good as "DRY", but it is fun to say.
I wonder how much of DRY's popularity is due to the catchy name. One can say "this needs to be DRYed up" (verb), or "this code is really DRY" (adjective), or "I appreciate the DRYness of this code" (noun). The opposite of DRY: "wet", of course. Being a homophone of an existing word that is both a noun and adjective, and that has an obvious antonym really lends itself to usage.
I see similar arguments come up over and over at HN, and I say they stem from a fundamental misunderstanding of DRY. What we must not repeat is not lines of code, but how we do certain things in the code. That is, it’s fine to repeat syntactically identical sections of code if their semantic meaning is different. But if the semantic meaning is the same, they must never be repeated because we must never have several definitions of the same thing in the same program. This is similar to the concept of normalisations in DBs.
For example, y = ax + b can refer to the distance a car has traveled over time at a fixed speed with a starting point, as well as the cost of ordering a certain quantity of an item with a fixed shipping cost.
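Kept as code, that might look like the following (illustrative functions): the shape is identical, the semantics are not, so merging them into one generic linear(a, x, b) would be repetition of form, not of knowledge.

def distance_traveled(speed: float, hours: float, start: float) -> float:
    return speed * hours + start        # kinematics

def order_cost(unit_price: float, qty: int, shipping: float) -> float:
    return unit_price * qty + shipping  # pricing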
Hey, you mean "DRY IA TO"! Don't repeat common phrases like "is a" and "trade-off", damn it! Define an acronym and use that instead.
DRY is literally impossible. If something has to be performed or evaluated two or more times, and you factor that out under a definition, you still have to invoke the definition multiple times. I.e. you are still repeating yourself, just using an abbreviation.
What you are doing is called "compression". Classic data compression algorithms like LZ77 work by abbreviating.
"LZ77 algorithms achieve compression by replacing repeated occurrences of data with references to a single copy of that data existing earlier in the uncompressed data stream. " - Wikipedia
Outside of alcoholic drinks, that's the ultimate DRY.
Thus, the argument against DRY is obvious: it's a form of compression, and excessive compression destroys readability; otherwise we would all be able to read source code that has been put through LZ77.
Only mild compression improves readability. Mild compression improves readability largely because it's easier to see that two brief invocations of a definition are exactly the same, than to see that two repetitions of a code block are exactly the same. When we see that two code blocks are exactly the same, we don't have to understand them separately.
Basically, brainless repetition and verbosity hinder readability, as does dense, thorough compression. One extreme might be represented by reams of Java boilerplate; the other by IOCCC entries.
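The analogy can be made literal with the standard library (zlib implements DEFLATE, which uses LZ77): repeated blocks shrink dramatically, just as extracting a repeated block of code does, and the compressed output is every bit as unreadable as the argument predicts.

import zlib

block = b"connect; authenticate; fetch rows; close. "
repeated = block * 50                 # heavy "code duplication"

compressed = zlib.compress(repeated)  # references replace repetitions
print(len(repeated), "->", len(compressed))  # e.g. 2100 -> a few dozen bytes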
I recently had an epiphany while reading the venerable SICP book. The best abstractions are not about wrapping everything in functions and calling it DRY. You create layers.
If you have 2-3 problems that kinda look similar, you could wrap them up in a function, with some arguments for variation, and pat yourself on the back for DRYing it up. But after some time you get a 4th or a 5th occurrence of the same problem, all slightly different.
Obviously you could create a mess of things by abstracting away what are different problems, but I don’t think “just don’t do abstractions until it’s really obvious” is a solution; to me it sounds more like giving up. And worse, you can end up with piles of code that look kinda similar but with small differences here and there, which are really hard for newcomers to parse and reason about.
What I think is a better approach is to try to come up with abstraction “layers” - smaller building blocks that would make the same problem trivial to solve.
You got your 2-3 pieces of code that look kinda similar. OK, do they have smaller parts that are dead-obviously similar? Abstract away those; now you are left with parts you might be happy to leave as is, and you don’t need one big function for all of it.
And sure, this has pitfalls too, but a good way to test any abstraction is to see how leaky it is. Can I successfully understand and reason about what’s going on by just reading the function names / descriptions / types? Does it compose? Yes? Awesome, now you have a layer you can build your business logic on. No? Well, either change it or get rid of the abstraction until you can come up with a better one.
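A compressed illustration of that layering idea (the domain is invented): the obviously-shared parts become small named blocks, and each caller composes them, keeping its genuinely different logic inline.

# Layer 1: small blocks whose similarity is beyond doubt.
def parse_record(line: str) -> dict:
    fields = line.strip().split(",")
    return {"id": fields[0], "amount": float(fields[1])}

def valid_record(rec: dict) -> bool:
    return rec["amount"] >= 0

# Layer 2: each caller composes the blocks; no flags threaded through
# one shared mega-function.
def monthly_total(lines: list) -> float:
    records = [parse_record(l) for l in lines]
    return sum(r["amount"] for r in records if valid_record(r))

def invalid_ids(lines: list) -> list:
    records = [parse_record(l) for l in lines]
    return [r["id"] for r in records if not valid_record(r)]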
I checked out the author: he seems to be an expert and has written some books. But this article is full of rookie mistakes, and I can't figure out the purpose of writing it except clickbait.
> The main problem with repeating a code chunk is that if a bug is found, there is more than one place where it needs to be fixed.
No, the main problem is that reading repeated code adds cognitive load for the reader. Also, when you copy-paste the code you should copy-paste its tests too, and that leads to wasted CI/CD resources.
> The alternative to copy-pasting the code is usually to put it in a function (or procedure, or a subroutine, depending on the language), and call it. This means that when reading through the original caller, it is less clear what the code does.
If the function and local variables all have clear names, and the function does not mutate the variables passed to it (or, if it does, that is clearly implied by its name), you never need to step into it while debugging, and you save a bit of time while reading the code.
So, the takeaway here is: the code is written for humans, not computers. The idea of clean code is ease of reading and understanding. That's what DRY is for.
DRY is just another way of saying the best practice of "don't reinvent the wheel". The difference is that reinventing the wheel is applied to third-party packages and DRY to your own code. The friction from blindly applying both is the same: dependency. This is especially apparent in JavaScript, where literally thousands of dependent libraries can be pulled in and any two version incompatibilities can break the build. Pulling in any dependency usually means pulling in orders of magnitude more lines of code than will be used. The inert code, relative to one's use, can be subject to version incompatibilities and security patches no different than any other code. It can also be subject to build differences, such as numpy and pandas being non-native Python while native Python libraries exist for many limited use cases. Of course, none of this is an endorsement of cut/copy/paste development. However, it is an indictment of cut/copy/paste enforcement of DRY and of the more general "don't reinvent the wheel".
Want to see what happens when someone falls too much in love with DRY? He invents an "automatic refactoring system" that automatically DRY-ifies a whole program: http://strlen.com/restructor/ (not that great an idea in hindsight, as the page explains).
Interesting to see all the comments here that make it seem obvious that "fanatic DRY" is bad. Historically, there was a wave of OO love where a lot of people went crazy with wrapping every program element in deep layered class hierarchies. DRY (and OAOO, YAGNI etc) was maybe a backlash to that, and some go too far with it. Now we all seem in love with "leave the redundancy in the code", which is another backlash. What's the next backlash? Maybe we'll finally focus on how to balance these factors rather than seeing some extreme as the silver bullet?
I agree strongly with the main thrust of this article, but it doesn't support the "Loss of Locality" prong of the argument very well.
> when reading through the original caller, it is less clear what the code does
In my experience it is usually clearer, if the function has an even remotely appropriate name. And unless you are coding with notepad.exe, toggling between caller and callee is pretty trivial.
> When you move this code to a function as a part of a straightforward DRY refactoring, this means that now a function is mutating its parameters.
Whether that matters depends a lot on the language.
And anyway it seems like the main problem in code like that is that the function is inappropriately mutating state outside of itself in a surprising way, which might be a problem separate from dryness.
"Loss of Locality" does strike me as reasonable argument against code that is too dehydrated though. Maybe somebody else can make some better supporting claims for it?
Well, yes, it's true that there's a trade-off, but the "DRY principle" is justified by the other side of that trade-off. And that side becomes stronger with every repetition of your code.
Also, some of the negatives of not-repeating-yourself are not general:
* Often your repetitions themselves are localized.
* Sometimes, you can generalize your code even _more_, to factor out the non-general parts, so that you end up with a generalization of the scheme/mechanism, applied with specific details.
* You can limit your DRY to within your scope of ownership, i.e. have module-local / unit-local utility constructs. Then, only if you agree with others on the generalization, do you surrender ownership.
It helps a lot to only extract common code that has a clear purpose. If I find myself naming a method "doThisThingThisOtherThingAndLogSomeStuff" then it's not a good refactor. On the other hand, "markStaleRows" or "enforceTreeInvariants" have clear purposes that help future engineers make intelligent decisions about how to use these functions and how they should evolve. I want to build in to the common code answers to the questions "should I call markStaleRows or roll my own code?" and "should I make this code change in the calling function or in markStaleRows?".
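As a sketch, a function like that might look as follows (the body and the staleness rule are invented; only the shape of the contract matters):

from datetime import datetime, timedelta, timezone

def mark_stale_rows(rows: list, max_age_days: int = 30) -> list:
    """Flag rows older than the retention window.

    This is the only place the staleness rule lives, which answers
    "should I roll my own?" with a clear no.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    for row in rows:
        row["stale"] = row["updated_at"] < cutoff
    return rows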
And if you have a function that is called markStaleRows, everyone better use it, because when I need to change how stale rows are marked, I will change this function and nothing else. I won’t go through the whole codebase trying to find if someone might have written some inline code somewhere else trying to do the same thing.
There are always trade-offs in development, and the older you get, the more you value clarity of code over other potential concerns; locality of behaviour is a big part of that.
I can't find the original source (it was somewhere on medium) but this graph struck me:
DRY and SRP are the air I breathe, so I need to defend it.
"This means that when reading through the original caller, it is less clear what the code does."
This is false - if you have a clear service and method name, with clear, properly-typed params, it SAVES time, because unless it's broken, it's a SHORT invocation that abstracts the details.
"the function might have some surprising semantics. For example, mutating contents of local variables is sensible in code."
Pure functions are the only "apply everywhere" part of the functional programming paradigm. Side effects are just as painful in .NET and Java as they are elsewhere.
"Overgeneralized code"
SRP lets you identify this fairly easily. As an example, I just tried to abstract away "Edit" and "SuggestEdits" in the same controller to stay DRY, but then realized it violates SRP.
"Each modification of the "common" function now requires testing all of its callers. In some situations, this can be subtly non-trivial."
Much less painful than knowing you have to change it in multiple places AND THEN checking the code. Just doesn't apply.
"When each of those code segments were repeated, ownership and responsibility were trivial. Whoever owned the surrounding code also owned the repeated segment."
The team owns the code. At worst, the lead owns the code. Bad excuse.
Feels like a dramatic contrarian piece to me. As mentioned, SRP and DRY are the only two things that allow our team to keep churning without significant tech debt. That, and avoidance of magic strings.
These 3 things are probably 80% of clean code.
Edit: Oh, and keeping methods to 5 lines or less, except in genuinely extreme, rare cases. Of course that's theoretically under "SRP and abstraction", but you know..
Edit 2: The single source of truth comment is on point as well. There was a guiding rule when Python was designed: "There should be one -- and preferably only one -- obvious way to do it." This is extremely important to writing good code and on-boarding people onto your project.
> Oh, and keeping methods to 5 lines or less, except in genuinely extreme, rare cases.
This confuses me. Extreme, rare cases? Don't get me wrong, I code Lisp as much as anyone and love keeping my functions short and simple, but not going above 5 lines (esp. in Java or .NET) is masochistic.
By that metric, an if in Java takes up 60% of a function's real-estate, a local variable 20%, and a try-catch is your maximum complexity.
This way, the code is extremely readable, especially if you can easily "Go to definition" in a service, and the file this try-catch lives in isn't 200 lines long.
Also, try-catch is generally an anti-pattern. It should only occur when you genuinely have no way whatsoever to prevent the exception; as such, it will be pretty rare.
Again, it does happen, for example we have an unstable legacy db we hit that we don't have access to, but it should be fixed asap.
Tbh, this is a constant argument between our juniors and me. They want to write 15-line methods, but I have found that they don't want to debug them.
It's constantly them building quickly on top of my code and then me having to fix theirs because they get frustrated debugging their own code.
Whatever is inside the catch block is by definition the exception handler, so I find the extra function definition to not bring much value. It also doesn't annoy me too much since I can just go to definition, but if it's only 3-4 lines long I would rather have it in the block itself so I can read it at a glance.
> try-catch is generally an anti-pattern.
Exceptions are a very useful part of the language if you use them right (to divert code flow in predictable ways in unpredictable scenarios), although I agree it can be easy to go wrong.
Regarding your juniors, that seems more of a problem of skill or determination than anything. Debugging in Java is relaxing: it has stack traces, a debugger, a nice IDE, no segfaults... a 15-line function should not be an obstacle. I would even prefer to have more of the code on the screen sometimes so I can get the implementation details in my head.
DRY is yet another tool. Skilled hands will be able to judge when to use it and when to keep a little duplication to preserve simplicity and legibility.
I frequently see developers going to great lengths to avoid duplicating a couple of lines of code. In the process, they create much more new code, use half a dozen patterns, and make the code more difficult to understand. That's not the ultimate goal.
The ultimate goal is always for the code to produce more value, and this is usually best served by simpler and more maintainable code.
In Data Engineering I only use DRY when it absolutely makes sense. Code connecting to a database tends to be boilerplate, so it can be abstracted away. However, I never abstract away ETL transformations, even if the code is duplicated. ETL logic often starts out very repetitive in its early days, only to be customised further and further over the lifetime of the code. Business stakeholders each tend to ask for modifications, e.g. can you please ignore sales in Australia for now, as it is handled by a different team, etc...
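A made-up example of why that duplication earns its keep: two transformations that started out identical and then diverged on exactly such a stakeholder request.

def transform_sales_emea(rows: list) -> list:
    return [r for r in rows if r["amount"] > 0]

def transform_sales_apac(rows: list) -> list:
    # Diverged after a stakeholder request: Australia is handled by a
    # different team for now, so it is excluded here only.
    return [r for r in rows if r["amount"] > 0 and r["country"] != "AU"]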
I'm known as a strong proponent for keeping code DRY (see: https://www.youtube.com/watch?v=S4LbUv5FsGQ), but even I sometimes advocate just copying and modifying code. Sometimes two simple pieces of code are less work/maintenance/mental-burden than one overly generalized/parameterized/complicated piece of code. As the number of copies grows, then the balance of the trade off shifts.
You definitely shouldn't strive for DRY in everything. The tricky part is to understand when duplicated code can be abstracted away and when it is duplicated "by chance", i.e. the code is the same but there is no real underlying abstraction.
The latter tends to happen with business logic: suddenly requirements change and your beautiful "abstraction" collapses like a house of cards.
I discuss this in my book "Street Coder" for beginner and mid-level programmers, in a section titled "Do Repeat Yourself". Being hardcore about not repeating yourself might create unnecessary dependencies and complicate your code structure. You can even cause unrelated concerns to depend on each other, making the code harder to maintain.
I like the piece, but the author might want to update it for distributed repositories, where 'DRY' might mean you inherit a CVE from some other developer, and may also mean that you've got code in your system that nobody who works for you knows how it works or why it was written the way it was. Both situations are sub-optimal.
I’ve got one criticism: the over-generalisation remark shouldn’t really apply to well-designed DRY code. If two call sites share a function and one of them doesn’t even use most of the parameters, then they aren’t covering the same concern and should be separate functions.
[1] https://news.ycombinator.com/reply?id=22022603&goto=item%3Fi...