Any advice on teaching this to junior engineers? Seems like folks with 3-5 years of experience keep not only over-abstracting but also re-inventing the wheel with abstractions (vs looking for existing libraries).
It's largely because they're dealing with an area with no theoretical tools. Any time you hit an area that is full of people "Designing" solutions/abstractions rather than "Calculating" an optimal solution/abstraction, you know you've hit an area where there's very little theoretical knowledge and most people are just sort of wandering chaotically in circles trying to find an "optimal" solution/abstraction without even a formal definition of what "optimal" is. I mean, what is the exact definition of the "perfect abstraction"? What is bad about duplication, what is a bad over-abstraction, and what is this "cheaper" cost that the title is talking about? It's all a bunch of words with fuzzy meanings injected with people's biased opinions.
That being said, theories on abstractions do exist. If you learn them you'll be at the top of your game, but they're really, really hard to master. If you do master them, you'll be part of a select group of unrecognized elites in a world of programmers that largely turn to "design" while eschewing theory.
You will note that both of these resources talk about functional programming at their core, which should indicate to you that the path to the optimal abstraction lies with the functional style.
My favorite example of really bad abstraction is add/edit crammed into a single popup/model. You know edit is basically a copy paste of add, so "ding ding ding here goes DRY!" in a junior mind. But quickly enough it shows up that some properties can be set in add, whereas in edit they have to be read only. Quite often you also get other business rules that apply only on edit, or that make sense only when adding a new entity. But when you create the first version they look a lot like the same code that should be reused.
For me this is a really good example of how similar-looking code is not the same, because it has a different use case.
> But quickly enough it shows up that some properties can be set in add, whereas in edit they have to be read only.
So? Just put in some conditionals.
What is the alternative? Duplicate most of the code with minor, non-explicit differences? What's the benefit? You just moved complexity around, you didn't get rid of it.
The drawback is that now anything you have to add, you have to add and maintain it in two places. And since your "add" and "edit" are probably 90% the same, it's going to happen 90% of the time. It's very annoying during development and you're likely to fuck it up at some point.
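For what it's worth, here is a rough sketch of what the "just put in some conditionals" side could look like if the per-mode differences are kept as data rather than scattered `if` statements. The field names, the `READ_ONLY` table, and both functions are made up for illustration; nothing here comes from a real codebase.

```python
# Hypothetical field list shared by both the "add" and "edit" modes.
FIELDS = ["id", "last_name", "email"]

# The only place where the two modes differ: which fields are locked.
READ_ONLY = {
    "add": set(),
    "edit": {"id"},
}


def editable_fields(mode):
    """Return the fields a user may change in the given mode."""
    return [f for f in FIELDS if f not in READ_ONLY[mode]]


def apply_changes(record, changes, mode):
    """Return a new record with changes applied, rejecting locked fields."""
    rejected = sorted(k for k in changes if k in READ_ONLY[mode])
    if rejected:
        raise ValueError(f"read-only in {mode} mode: {', '.join(rejected)}")
    return {**record, **changes}
```

A new shared field gets added once to `FIELDS`. Whether this stays readable once the edit-only business rules pile up is exactly what the thread is arguing about.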
This is a good example of how this overall topic gets reduced to "How much abstraction?" instead of "In what ways should something be abstracted?"
Obviously Add and Edit forms are operating on the same record in a hypothetical database, so it makes little sense to duplicate the model.
On the other hand, if the conditionals within the abstracted version become too complex or keep referencing some notion of a mode of operation (like, `if type(self) == EditType && last_name != null` lines of thinking), that is sometimes another type of smell.
But say you make some kind of abstract base class that validates all fields in memory before committing to the database, and then place all of your checking logic in a validate() method. That sounds like a pretty clean abstraction to me.
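A minimal sketch of that idea, assuming a dict stands in for the database and using invented names (`RecordForm`, `AddForm`, `EditForm`). It uses a plain base class rather than a strictly abstract one, so the shared checks live in one place, and it is not modeled on any particular ORM's API.

```python
class RecordForm:
    """Hypothetical base: validate everything in memory, then commit."""

    def __init__(self, data):
        self.data = dict(data)

    def validate(self):
        """Checks shared by add and edit. Returns a list of error strings."""
        errors = []
        if "id" not in self.data:
            errors.append("id is required")
        if not self.data.get("last_name"):
            errors.append("last_name is required")
        return errors

    def save(self, store):
        """Commit to the store (a dict standing in for a database)."""
        errors = self.validate()
        if errors:
            raise ValueError("; ".join(errors))
        store[self.data["id"]] = self.data


class AddForm(RecordForm):
    def validate(self):
        errors = super().validate()
        # Rule that only makes sense when creating a record.
        if not self.data.get("email"):
            errors.append("email is required on add")
        return errors


class EditForm(RecordForm):
    def __init__(self, data, original):
        super().__init__(data)
        self.original = original

    def validate(self):
        errors = super().validate()
        # Rule that only makes sense when editing: id is read-only.
        if self.data.get("id") != self.original.get("id"):
            errors.append("id is read-only on edit")
        return errors
```

The mode-specific rules end up in the subclass that owns them, so nothing has to ask "am I in edit mode?" at runtime.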
And moreover, this is probably provided by an ORM system and documented by that system anyway, so that's a publicly documented and likely very common abstraction that you see even between different ORMs. That, I think, is the very best kind of abstraction, at least assuming you are already working in an environment with a high-level language and an ORM. Making raw SQL queries from C programs still involves its own levels of abstraction, of course, without buying wholesale into the many-layered abstraction that is a web framework or something.
This question becomes more important when you aren't just updating a database, though. If you're writing some novel method with a very detailed algorithm, over-abstraction through OOP can really obscure the algorithm. In such a case, I try to identify logical tangents within the algorithm, and prune/abstract them away into some property or function call, but retain a single function for the main algorithm itself.
The main algorithm gets its definition moved to the base class, and the logical tangents get some kind of stub/virtual method thingy in the base class so that they have to be defined by subclasses. The more nested tangents are frequently where detailed differences between use cases emerge, which makes logical sense. It's not just that it's abstract, but the logic is categorically separated.
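This is more or less the template method pattern. A small sketch, with a made-up `Report` class and made-up hook names, just to show the shape: the main algorithm reads top to bottom in one method, and the tangents are abstract methods that subclasses must fill in.

```python
from abc import ABC, abstractmethod


class Report(ABC):
    """Hypothetical base class holding the main algorithm."""

    def build(self, rows):
        # The whole algorithm is visible here in three steps.
        kept = [r for r in rows if self.include(r)]
        total = sum(self.weight(r) for r in kept)
        return f"{len(kept)} rows, total {total}"

    @abstractmethod
    def include(self, row):
        """Tangent 1: which rows count. Defined by subclasses."""

    @abstractmethod
    def weight(self, row):
        """Tangent 2: how much each row contributes."""


class PositiveSum(Report):
    def include(self, row):
        return row > 0

    def weight(self, row):
        return row


class AbsoluteSum(Report):
    def include(self, row):
        return True

    def weight(self, row):
        return abs(row)


print(PositiveSum().build([1, -2, 3]))  # 2 rows, total 4
print(AbsoluteSum().build([1, -2, 3]))  # 3 rows, total 6
```

The use-case differences live only in `include` and `weight`; `build` never changes.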
It's a very general pattern supported by many languages, so you see it all over the place. That organization and consistency in itself helps you to understand new code. In that way, it also becomes a kind of "idiom" which in a sense is one more layer of abstraction, helping you to manage complexity.
As a counterexample, you see code where `a + x * y - b` becomes `self.minus(self.xy_add(a), b)`. More abstract, but not more logical; not categorically separating; not conforming to common idioms; obscuring the algorithm; and so on...
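Spelled out as a sketch (the class name `Obscured` and the assumption that `x` and `y` live on `self` are mine, not from any real code), the two versions side by side:

```python
class Obscured:
    """Hypothetical over-abstracted version: three methods for one expression."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def xy_add(self, a):
        return a + self.x * self.y

    def minus(self, left, right):
        return left - right

    def compute(self, a, b):
        return self.minus(self.xy_add(a), b)


def compute(a, x, y, b):
    """Direct version: the formula is the code."""
    return a + x * y - b
```

Both give the same number, but only the second one lets you read the formula without jumping between methods.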
And then there is performance! Let's not talk about the performance of runtime abstractions.
Each to his own. If I found that a junior had created two separate popups, one for add and one for edit, I'd want to look into the code with them to understand if that was a good choice, because usually it wouldn't be for anything with more than one or two properties.
I think there are two parts to it. First, you want to push them to get into the habit of solving problems by expressing the question clearly enough that the answer falls naturally from it. That's so fundamental that every aspect of engineering benefits from it, but it's particularly important as a first step in writing code.
The second part is building the intuition for the abstractions themselves. That's tricky as they have to teach themselves. They need to build coherency in their internal mental language of abstraction, and the only way to do that is to work directly on real code, and work through the consequences of doing it one way vs. another.
That means you have to let them commit code you don't like. By all means, explain what your concerns are, but then let them see how it evolves and as it becomes more untenable, that's when you go back to rethinking it and trying to state the problem clearly.
Likewise, when they do it well, you can highlight that, especially drawing attention to changes to their code that worked nicely.
Bring up the idea that abstraction has a cost, like technical debt. It’s not something to be proud of; on the contrary, it must be justified and serve a true purpose, not merely provide intellectual satisfaction.
Teach them about cyclomatic complexity and then review their work in these terms. It gives them something concrete to target rather than trying to accomplish some ethereal notion of "proper abstraction".
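If you want a number to point at during review, here is a rough approximation using Python's standard `ast` module. It starts at 1 and adds one per decision point; the set of node types counted is my simplification (it ignores comprehensions, `match` cases, and so on), so treat it as a sketch and reach for a dedicated tool for real measurements.

```python
import ast

# Node types treated as one decision point each (a simplifying assumption).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity of a piece of Python source."""
    score = 1  # one path through straight-line code
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra short-circuit paths.
            score += len(node.values) - 1
    return score
```

A function full of mode checks like `if mode == 'edit' and name:` will score noticeably higher than a flat one, which gives the review conversation something concrete to start from.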