I'm skeptical because it is really easy to un-share code by copying it into multiple places but it is very hard to unify duplicated code. So I prefer to err on the side of sharing.
But yes, you should be ready to change sharing into duplication if you realize the code is just "accidentally similar" and need to evolve in separate directions.
In practice I have seen a lot more pain due to duplicate code compared to the issue of over-abstracting code, because the latter is much easier to fix.
On the other hand, it's really difficult to know who is using that shared code. If you make an innocuous change in a shared method, it could affect someone else you don't know.
It doesn't have to be that way, it's just because our existing language facilities don't have support for what we need them to do:
"Names are regularly repurposed to point to new definitions, even when the old definitions are still perfectly valid. For instance, a library author might make a function a bit more generic by adding an extra parameter, or decide to switch the parameter order. That's fine, but the old definition was not wrong, so should users be forced to upgrade? No.
Repurposing names is fine; it's hard to come up with good names for definitions, so using an old name for a related new definition often makes sense. The trouble is that existing tooling doesn't distinguish between repurposing a name and upgrading a definition. Unison changes that..."
https://www.unisonweb.org/2020/04/10/reducing-churn/#incompa...
I find it much easier to find the call sites for a function than to find code that’s duplicating or a variant of the code I just fixed a bug in so we can figure out if the same bug is latent in the duplicates too.
Depends on a specific codebase? I found exact opposite to be true - very hard to reuse code that was abstracted too soon, and abstracting copy&paste the right way is actually easier if you have it in multiple cases and can see how it was used.
How is it harder to copy/paste the helper method and modify as needed, vs tracking down and unifying multiple instances of the same code written slightly differently?
In Java, when I hit a bad abstraction, I hit the inline shortcut (command-alt-n) and then evaluate the resulting code with git diff. Other languages may be more manual, but, at worst, you just use ripgrep or similar to find all the relevant use sites and then manually expand the abstraction: this is only really a problem it the function is used hundreds of time: but, in that case, you can always duplicate the abstraction and rename.
My experience lines up with yours. Working in overly and poorly abstracted codebases dramatically hurts productivity. Poorly duplicated code increases the chance for missed patches, but poor duplication has, in my experience, been vastly easier to fix. One codebase comes to mind. Twisted Python. Multiple layers of inheritance, multiple mixins, and major overloading of methods. Just navigating the code was pain.
> it is really easy to un-share code by copying it into multiple places but it is very hard to unify duplicated code
Code that already exists has a gravity, a presumption of correctness. That presumption is very difficult to overcome, especially for programmers new to the codebase. An abstraction you think of as temporary will be, to those who come after you, simply the way things are done; breaking it apart and re-forming it is, for them, fraught with risk. It's good to keep this in mind as you make commits.
I don't think I agree. Identifying an abstraction as leaky and breaking it apart is substantially more difficult and riskier than identifying duplication and creating an abstraction for it.
Removing an abstraction layer can usually be done mechanically by inlining the calls. This is a trivial operation. Identifying duplication not trivial since there might be various differences and you have to investigate if they are inconsequential or not.
But yes, you should be ready to change sharing into duplication if you realize the code is just "accidentally similar" and need to evolve in separate directions.
In practice I have seen a lot more pain due to duplicate code compared to the issue of over-abstracting code, because the latter is much easier to fix.