Hacker News new | past | comments | ask | show | jobs | submit login

Not to take this on a huge tangent, but I really _do_ think it's good advice. Unrolling complicated abstractions is a lot of work. Keeping two pieces of nearly identical code in sync is work too, but I've never found it all that onerous. But there's obviously a continuum; on one side it's obvious that it's a shared concept, and on the other, code just happens to be similar almost by chance, and not for much longer. But lately duplication has been turned into a code smell to be linted out, causing a lot of people to get rid of all of it, at all cost.



If you only have 2 instances but are spending effort keeping them both in sync for changes then that might be a good time to abstract anyway. The fact that you’re keeping them in sync means they aren’t just coincidentally the same. But this is very situational.


The cost of maintaining two (or three ...) copies might well be less than creating an abstraction. The danger rather lies in situations where not everyone potentially modifying that code (now or later) is aware that there are N copies (and where they are) which need to be maintained.


To be fair, Sandi suggests adding dup tags (essentially comments with links to other duplicates, I guess) in the linked video.

I can see how this could work in some cases, but I remain skeptical about this technique. I have a strong gut feeling that over 20% of the junior engineers touching this code will not understand the dup tag and will end up modifying one case without keeping the other case in sync. This number is already high enough to be worrisome for me.


That's a big part of it for me. The abstraction would end up being the best way to document the duplication, imo. Far better than the likes of /* See also... */.

It depends of course, but I personally feel the work of 'simplifying an abstraction', is easier than the problem of 'tracking down anything that might need to be edited'.


Then such person needs better IDE, because finding callers is one shortcut away.


That's an amazingly mechanical view of code.

If two blocks of code refer to exactly the same thing (event, process, object, rule) in the application domain, then the duplication can be eliminated.

You can't eliminate duplication without asking, "what does this code mean in this context?".


I think there are a couple of things at play here.

One is the use of code quality tools like CodeClimate. It's true that those can sometimes be extremely aggressive when it comes to duplicate code to the point that I find their complaints to be uselessely beside the point. This is especially true if you have typical "structural" duplication like "many controllers start with the same sequence of steps" etc. or is even worse when you have to use configuration DSLs etc.

OTOH, it has been my personal experience that many people, if they use CodeClimate etc., routinely just ignore them for the most part, so I'm not always sure what the point of them is. But maybe other people have different experiences and some people really are routinely overabstracting the most coincidental of duplication in which case I agree that that is not a very useful thing to do.

As for the advice itself: it is definitely problematic if it is used as some sort of hard "rule". If it is taken as a heuristic / "rule of thumb", then it might be ok as long as you make sure people don't overemphasise it where other/better rules of thumb would be appropriate.

For example, if I were to write some billing code and somebody else just duplicated that code somewhere instead of using a shared abstraction, I would probably find that to be a serious code health issue as you really shouldn't perform billing calculations in two separate places: this is something that needs to be kept in sync across the code base; one sibling (nephew?) comment is right in pointing out that here you have to consider the cost of things that need to stay in sync accidentally going out of sync.

There are many more examples which is why I think that if you use "refactor on 3" as _one_ heuristic, it's fine, but if it's the sole one, then less so.




The deadline for YC's W25 batch is 8pm PT tonight. Go for it!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: