There's a lot of bad advice being tossed around in this thread. If you are worried about having to jump through multiple files to understand what some code is doing, you should consider that your naming conventions are the problem, not the fact that code is hidden behind functional boundaries.
Coding at scale is about managing complexity. The best code is code you don't have to read because of well named functional boundaries. Without these functional boundaries, you have to understand how every line of a function works, and then mentally model the entire graph of interactions at once, because of the potential for interactions between lines within a functional boundary. The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows. The cognitive load to understand code grows as the number of possible interactions grows. Keeping methods short and hiding behavior behind well named functional boundaries is how you manage complexity in code.
The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects. If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.
This is the problem right here. I don't just read code I've written, and I don't only read perfectly abstracted code. When I'm reading code by someone who loves the book and tries their best to follow its conventions, I find it far more difficult, because I'm usually reading their code to fully understand it myself (i.e., in a review) or to fix a bug. It's infuriating to jump through dozens of files just so everything looks nice on a slide. Names are great, and I fully appreciate good naming, but pretending that using a ton of extra files just to improve naming slightly isn't a hindrance is wild.
I will take the naming hit in return for locality. I'd like to be able to hold more than 5 lines of code in my head but leaping all over the filesystem just to see 3 line or 5 line classes that delegate to yet another class is too much.
Carmack once suggested that people in-line their functions more often, in part so they could “see clearly the full horror of what they have done” (paraphrased from memory) as code gets more complicated. Many helper functions can be replaced by comments and the code inlined. I tried this last year and it led to overall more readable code, imho.
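As a made-up illustration (not Carmack's code; names invented), instead of three tiny helpers you end up with one function whose steps are introduced by comments:

    #include <cmath>
    #include <vector>

    // One function, steps inlined, comments standing in for helper names.
    double RootMeanSquare(const std::vector<double>& samples) {
        if (samples.empty()) return 0.0;

        // Sum the squares.
        double sumOfSquares = 0.0;
        for (double s : samples) sumOfSquares += s * s;

        // Average, then take the square root.
        return std::sqrt(sumOfSquares / samples.size());
    }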
> The real enemy addressed by inlining is unexpected dependency and mutation of state, which functional programming solves more directly and completely. However, if you are going to make a lot of state changes, having them all happen inline does have advantages; you should be made constantly aware of the full horror of what you are doing. When it gets to be too much to take, figure out how to factor blocks out into pure functions (and don't let them slide back into impurity!).
The idea is that without proper boundaries, finding the line that needs to be changed may be a lot harder than clicking through files with an IDE. Smaller components also help with code reviews, since it's a lot easier to understand a line within the context of a component (or method name) without having to understand what the huge globs of code before it are doing. Also, like you said, a lot of the time a developer has to read code they didn't write, so there are other factors to consider, like how easy it is for someone from another team to make a change or whether a new employee could easily digest the code base.
The problem being solved here is just scope, not re-usability. Functions are a bad solution because they force non-locality. A better way to solve this would be local scope blocks that define their dependencies.
This is also a more tailored solution to the problem than a function, it allows finer-grained control over scope restriction.
It's frustrating that most existing languages don't have this kind of feature. Regular scope blocks suck because they don't allow you to define the specific ways in which they are permeable, so they only restrict scope in one direction (things inside the scope block are restricted) - but the outer scope is what you really want to restrict.
You could also introduce this functionality to IDEs, without modifying existing languages. Highlight a few lines, and it could show you a pop-up explaining which variables that section reads, mutates and defines. I think that would make reading long pieces of code significantly easier.
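As a purely hypothetical illustration (no IDE I know of does this today), the pop-up for a highlighted block might report something like this:

    #include <vector>

    int main() {
        std::vector<double> prices{9.99, 4.50, 2.25};
        double taxRate = 0.08;

        // Highlighting the two lines below, the pop-up could report:
        //   reads: prices, taxRate   mutates: total   defines: p
        double total = 0.0;
        for (double p : prices) total += p * (1.0 + taxRate);

        return total > 0 ? 0 : 1;
    }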
This is one of the few comments in this entire thread that I think is interesting and born out of a lot of experience and not cargo culting.
In C++ you can make a macro function that takes any number of arguments but does nothing. I end up using that to label a scope because that scope block will then collapse in the IDE. I usually declare any variables that are going to be 'output' by that scope block just above it.
This creates the ability to break down isolated parts of a long function that don't need to be repeated. Variables being used also don't need to be declared as function inputs which also simplifies things significantly compared to a function.
This doesn't address making the compiler enforce much, though it does show that anything declared in the scope doesn't pollute the large function it is in.
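Roughly like this (the macro name and the example are invented, just to show the pattern):

    #include <vector>

    // A do-nothing variadic macro used purely as a label; the braces after it
    // collapse like any other block in most IDEs.
    #define BLOCK(...)

    int main() {
        std::vector<int> values{3, -1, 4, -1, 5};

        int sumOfPositives = 0;  // the block's "output", declared just above it
        BLOCK("sum the positive values")
        {
            // Anything declared in here stays local to this scope.
            for (int v : values) {
                if (v > 0) sumOfPositives += v;
            }
        }
        return sumOfPositives;
    }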
Thank you. Your macro idea is interesting, but I definitely want to be able to defer to the compiler on things like this. I want my scope restrictions to also be a form of embedded test. Similar to typing.
I wish more IDEs had the ability to chunk code like this on-the-fly. I think it's technically possible, and maybe even possible to insert artificial blocks automatically, showing you how your code layout chunks automatically... Hmm.
You know, once I'm less busy I might try implementing something like this.
C++ lambda captures work exactly like this. You state which variables should be part of the closure, and whether each is captured by reference or by copy (with `mutable` if the copies need to be modified).
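A minimal sketch of what that looks like (made-up example):

    #include <vector>

    int main() {
        std::vector<int> values{1, 2, 3};
        int total = 0;

        // The capture list spells out the block's dependencies: `values` is
        // captured by copy (read-only here), `total` by reference (mutated).
        // Nothing else from the enclosing scope is visible inside.
        [values, &total]() {
            for (int v : values) total += v;
        }();

        return total;
    }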
This is a big assumption. Many engineers prefer to grep through code without an IDE, the "clean code" style breaks grep/github code search and forces someone to install an IDE with go to declaration/find usages. On balance I prefer the clean code style and bought the jetbrains ultimate pack, however I do understand that some folks are working with grep/vim/code search and would rather not download a project to figure out how it works.
I've done both on a "Clean Code", lots-of-tiny-functions C++ codebase. Due to various reasons[0], I spent a year using Emacs with no IDE features to work on that codebase, after which I managed to get a language server to work in our specific context, and continued to use Emacs with all the bells and whistles LSP provides.
My conclusion? Small functions are still annoying. Sure, with IDE features in a highly-productive environment like Emacs is, I can jump around the codebase at the speed of thought. But it doesn't solve the critical problem: to understand a piece of code that does something useful, I have to keep all these tiny functions in my working memory. And it ain't big enough for that.
I've long been dreaming about an IDE/editor feature that would let you inline code for viewing, without actually changing it. That is, I could mark a block of code, and my editor would replace all function calls[1] with their bodies, with the names of their parameters replaced by the names of the arguments passed[2].
This way, I could reap the benefits of both approaches - small functions that compose and have meaningful names, and long sequential blocks of code that don't tax my working memory.
--
[0] - C++ is notoriously hard to get reliable code intelligence (autocomplete, xref) to work. Even commercial IDEs get confused if the codebase is large enough, or built in an atypical fashion. Visual Studio in particular would happily crash for me every other day...
[1] - With some sane default filters, like "don't inline functions from the standard library and third-party libraries".
[2] - Or autogenerated ones when the argument is an expression. Think Lisp gensym. E.g. when I have `auto foo(F f);` and call it like `foo(2+2);`, the inlined code would start with `F f_1 = 2+2;`. Like with expanding Lisp macros, the goal of this exercise is that I should be able to replace my original code with generated expansion, and it should work.
You wrote: "I've long been dreaming about an IDE/editor feature that would let you inline code for viewing, without actually changing it." That sounds like a great idea! It might be useful for both the writer and the reader. It might be possible to build something like that using the libraries that Clang provides, but it would be a huge feat -- like a master's thesis or part of a PhD.
You also wrote: "Visual Studio in particular would happily crash for me every other day..." Have you tried CLion by JetBrains? (Usually, they have a free 30-day trial.) I have not used it for enterprise-large projects, but I have used it for a few personal projects. It is excellent. The pace of progress (new features in "code sensing") is impressive. If you find bugs, you can report them and they usually fix them. (They have fixed about 50% of the bugs I have raised about their products over the last 5 years. An impressive clearance rate!)
> It might be possible to build something like that using the libraries that Clang provides, but it would be a huge feat -- like a master's thesis or part of a PhD.
Yeah, that's how I feel about it too. A useful MVP would probably be less work, though, even if it sometimes couldn't do the inlining, or misidentified the called function. I mean, this is C++; I haven't seen any product with completely reliable type hints & autocompletion, and yet even buggy ones are still very useful.
> Have you tried CLion by JetBrains?
Not yet. Going by my experience with IntelliJ, I expect it to be a very good product. But right now, I'm sticking to Emacs.
In my experience, professional IDEs (particularly the JetBrains ones) are the best for working with a particular programming language, but they aren't so good for polyglot work and all the secondary tasks surrounding programming - version control, file management, log inspection, and even generalized text editing. My Emacs setup, on the other hand, delivers superior ergonomics for all these secondary tasks, and - as long as I can find an appropriate language server - is within an order of magnitude on programming itself. So it feels like a better deal overall.
I agree about the "professional IDEs" point. Are you aware that IntelliJ has language plug-ins that let you mix HTML/JavaScript/CSS/Java/Python in the same project? I guess CLion can at least mix C/C++/HTML/JavaScript/CSS/Python. This is great when you work with research scientists who like to use different languages in the same project due to external dependencies. I can vouch that /certain/ polyglot projects work fine in IntelliJ. (That said, you might have a very unusual polyglot project.)
As for tooling, I might read/write/compile/debug code in the IDE, but do all the secondary tasks in a Linux/Bash/Cygwin terminal. Don't feel guilty/ashamed of this style! "Use the right tool for the job." I am forced to work in Windows, but Cygwin helps me "Get that Linux feeling - on Windows". I cringe when I watch people using Git from a GUI (for the most part) instead of using the command line, which is normally superior. However, I also cringe when I watch people hunt and peck (badly!) in vim to resolve a merge conflict. (Have you seen the latest merge tool in IntelliJ? For 90% of users, it is a superior user experience.) To be fair, I have also watched real pros resolve merge conflicts in vim/emacs equally fast.
One thing you will find "disappointing" is the CPU & memory footprint of any modern IDE requires 1990s supercomputer resources. It is normal to see (multiple) enterprise-large projects take 1-10GB of RAM and 8-16 cores (for a few mins) to get fired up. (I am not resource constrained on my dev box, so I am willing to pay this tax.) However, after init, you can navigate the code quickly, and get good real-time static analysis feedback ("code sensing").
With Vim you can get decent results with a plugin that consumes the output from ctags.
It's not perfect though, and depending on how you have it set up you may have to manually trigger tag regeneration, which can take a while depending on how deep into package files you set it to go.
I would extend this one level higher to say managing complexity is about managing risk. Risk is usually what we really care about.
From the article:
>any one person's opinions about another person's opinions about "clean code" are necessarily highly subjective.
At some point CS as a profession has to find the right balance of art and science. There's room for both. Codifying certain standards is the domain of professions (in the truest sense of the word) and not art.
Software often likens itself to traditional engineering disciplines. Those traditional engineering disciplines manage risk through codified standards built through industry consensus. Somebody may build a pressure system that doesn't conform to standards. They don't get to say "well your idea of 'good' is just an opinion so it's subjective". By "professional" standards they have built something outside the acceptable risk envelope and, if it's a regulated engineering domain, they can't use it.
This isn't to say a coder would have to follow rigid rules constantly or that it needs a regulatory body, but that the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.
A lot of "best practices" in engineering were established empirically, after root cause analysis of failures and successes. Software is more or less evolving along the same path (structured programming, OOP, higher-than-assembly languages, version control, documented ISAs).
Go back to earlier machines and each version had its own assembly language and instruction set. Nobody would ever go back to that era.
OOP was pitched as a one-size-fits-all solution to all problems, and as a checklist of items that would turn a cheap offshored programmer into a real software engineer thanks to design patterns and abstractions dictated by a "Software Architect". We all know it to be false, and bordering on snake oil, but it still had some good ideas. Having a class encapsulate complexity and defining interfaces is neat. It forces you to think in terms of abstractions and helps readability.
> This isn't to say a coder would have to follow rigid rules constantly or that it needs a regulatory body, but that the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.
As more and more years pass, I'm less and less against a regulatory body. It would help with getting rid of snake oil salesmen in the industry and limit offshoring to barely qualified coders. And it would simplify hiring too, by having a known certification that tells you someone at least meets a certain bar.
Software is to alchemy what software engineering is to chemistry. Software engineering hasn't been invented yet. You need a systematizing scientific revolution (Kuhn style) before you can or should create a regulatory body to enforce it. Otherwise you're just enforcing your particular brand of alchemy.
> OOP was pitched as a one-size-fits-all solution to all problems, and as a checklist of items that would turn a cheap offshored programmer into a real software engineer.
Not initially. Eventually, everything that reaches a certain minimal level of popularity in software development gets pitched by snake-oil salesmen to enterprise management as a solution to that problem, including things developed specifically to deal with the problem of other solutions being cargo-culted and repackaged that way, whether it's a programming paradigm or a development methodology or metamethodology.
>having a known certification that tells you someone at least meets a certain bar.
This was tried a few years back by creating a Professional Engineer licensure for software, but it went away due to lack of demand. It could make sense to artificially create a demand by the government requiring it for, say, safety critical software but I have a feeling companies wouldn't want this of their own accord because that license gives the employee a bit more bargaining power. It also creates a large risk for the SWEs due to the lack of codified standards and the inherent difficulty in software testing. It's not like a mechanical engineer who can confidently claim a system is safe because it was built to ASME standards.
> It could make sense to artificially create a demand by the government requiring it for, say, safety critical software but I have a feeling companies wouldn't want this of their own accord because that license gives the employee a bit more bargaining power.
For any software purchase above a certain amount, the government should be forced to have someone with some kind of license sign off on the request. So many projects have doubled or tripled in price after it was discovered the initial spec didn't make any sense.
I think that at this point, for the software made/maintained for the government, they should just hire and train software devs themselves.
From what I've seen, with a few exceptions, government software development always ends up with a bunch of subcontractors delivering bad software on purpose, because that's how they can ensure repeat business. E.g., the reason the Open Data movement didn't achieve much, and why most public systems are barely integrated with each other, is that every vendor does its best to prevent that from happening.
It's a scam, but like other government procurement scams, it obeys the letter of the law, so nobody goes to jail for this.
The development of mass transit (train lines) has a similar issue when comparing the United States to Western Europe, Korea, Japan, Taiwan, Singapore, or Hong Kong. In the US, as much as possible is subcontracted. In the others, a bit less, and there is more engineering expertise on the gov't payroll. There is a transit blogger who writes about this extensively... but his name eludes me. (Does anyone know it?)
Regarding contractors vs in-house software engineering talent, I have seen (in the media) that the UK gov't (including the NHS) has hired more and more talent to develop software in-house. No idea if UK folks think they are doing a good job, but it is a worthy experiment (versus all contractors).
>should just hire and train software devs themselves
There are lots of people who advocate this, but it's hard to bring to fruition. One large hurdle is the legacy costs, particularly because it's so hard to fire underperforming government employees. Another issue is that government salaries tend to not be very competitive by software industry standards, so you'll only get the best candidates if they happen to be intrinsically motivated by the mission. Third, software is almost always an enabling function that is often competing for resources with core functions. For example, if you run a government hospital and you can hire one person, you're much more likely to prefer a healthcare worker hire than a software developer. One last, and maybe unfair, point is that the security of government positions tends to breed complacency. This often creates a lack of incentive to improve systems, which results in a lot of legacy systems hobbling along past their usefulness.
I don't think subcontractors build bad systems on purpose, but rather they build systems to bad requirements. A lot of times you have non-software people acting as program managers who are completely fine with software being a black box. They don't particularly care about software as much as their domain of expertise and are unlikely to spend much time creating good software requirements. What I do think occurs is that contractors will deliberately underbid on bad requirements, knowing they will make their profits on change orders. IMO, much of the cost overruns can be fixed by having well-written requirement specs.
Do you mean sign as in qualify that the software is "good"?
In general, they already have people who are supposed to be responsible for those estimates and decisions (project managers, contracting officers etc.) but whether or not they're actually held accountable is another matter. Having a license "might" ensure some modicum of domain expertise to prevent what you talk about but I have my doubts
> Do you mean sign as in qualify that the software is "good"?
We're not there yet. Just someone to review the final spec and see if it makes any sense at all.
Canonical example is the Canadian Phenix Payroll System. The spec described payroll rules that didn't make any sense. The project tripled in cost because they had to rewrite it almost completely.
> In general, they already have people who are supposed to be responsible for those estimates and decisions (project managers, contracting officers etc.) but whether or not they're actually held accountable is another matter.
For other projects, they must have an engineer's signature else nothing gets built. So someone does the final sanity check for the project managers-contracting officers-humanities-diploma bureaucrat. For software, none of that is required, despite the final bill being often as expensive as a bridge.
> Having a license "might" ensure some modicum of domain expertise to prevent what you talk about but I have my doubts
Annoyingly, the government already sorta does this: many federal jobs, as well as the patent bar, require an ABET-accredited degree.
The catch is that many prominent CS programs don’t care about ABET: DeVry is certified, but CMU and Stanford are not, so it’s not clear to me that this really captures “top talent.”
I suspect this is because HR and probably even some hiring managers cannot distinguish between the quality of curricula. One of the problems with CS is the wide variance in programs... some require calculus through differential equations and some don't require any calculus whatsoever. So it's easier to just require an ABET degree. Something similar occurs with Engineering Technology degrees, even if they are ABET accredited.
To your point, it unfortunately and ironically locks out many CS majors for computer science positions.
In what sense do you think they haven't been exposed? As in, they've never seen their resumes? Or they've never worked with them?
I think it's a misalignment of incentives in most cases. HR seems to care very little once someone is past the hiring gate. So they would have to spend the time to understand the curriculum distinctions, probably change their grading processes, etc. It's just much easier for them to apply a lazy heuristic like "must have an ABET accredited degree" because they really don't have to deal much with the consequences months and years after the hire. In some cases, they even overrule the hiring manager's initial selection.
>the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.
The problem I see with this is that programming could be described as a kind of general problem solving. Other engineering disciplines standardize methods that are far more specific, e.g. how to tighten screws.
It's hard to come up with specific rules for general problems though. Algorithms are just solution descriptions in a language the computer and your colleagues can understand.
When we look at specific domains, e.g. finance and accounting software, we see industry standards have already emerged, like dealing with fixed point numbers instead of floating point to make calculation errors predictable.
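The core of that idea is tiny. Something like this (a minimal sketch, not any particular standard's representation):

    #include <cstdint>
    #include <iostream>

    // Store money as integer cents so rounding is explicit and predictable.
    struct Money {
        std::int64_t cents;
    };

    Money operator+(Money a, Money b) { return {a.cents + b.cents}; }

    int main() {
        Money price{1999};    // $19.99
        Money shipping{499};  // $4.99
        Money total = price + shipping;
        std::cout << total.cents / 100 << "." << total.cents % 100 << "\n";  // 24.98
        return 0;
    }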
If we now start codifying general software engineering, I'm worried we will just codify subjective opinions about general problem solving. And that will stop any kind of improvement.
Instead we have to accept that our discipline is different from the others, and more of a design or craft discipline.
Could you elaborate on this distinction? At the superficial level, "general problem solving" is exactly how I describe engineering in general. The example of tightening screws is just a specific example of a fastening problem. In that context, codified standards are an industry consensus on how to solve a specific problem. Most people wrenching on their cars are not following ASME torque guidelines but somebody building a spacecraft should be. It helps define the distinction of a professional build for a specific system. Fastening is the "general problem"; fastening certain materials for certain components in certain environments is the specific problem that the standards uniquely address.
For software, there are quantifiable measures. As an example, some sorting algorithms are objectively faster than others. For those systems where it matters in terms of risk, it probably shouldn't be left up to the subjective eye of an individual programmer, just like the spacecraft shouldn't rely on a technician's subjective opinion that a bolt is "meh, tight enough."
>I'm worried we will just codify subjective opinions about general problem solving.
Ironically, this is the same attitude in many circles of traditional engineering. People who don't want to adhere to industry standards have their own subjective ideas about how to solve the problem. Standards aren't always right, but they create a starting point to 1) identify a risk and 2) find an acceptable way to mitigate it.
>Instead we have to accept that our discipline is different from the others
I strongly disagree with this and I've seen this sentiment used (along with "it's just software") to justify all kinds of bad design choices.
>For software, there are quantifiable measures. As an example, some sorting algorithms are objectively faster than others. For those systems where it matters in terms of risk, it probably shouldn't be left up to the subjective eye of an individual programmer, just like the spacecraft shouldn't rely on a technician's subjective opinion that a bolt is "meh, tight enough."
Then you start having discussions about every algorithm being used on collections of 10 or 100 elements, where it doesn't really matter to the problem being solved. Instead, the language's built-in sort functionality will probably do here and increase readability, because you know what's meant.
Profiling and replacing the algorithms that matter is much more efficient than looking at each usage.
Which again brings us back to the general vs specific issue. In general this won't matter, but if you're in a real-time embedded system you will need algorithms that don't allocate with known worst case execution times. But here again, at least for the systems that matter, we have specific rules.
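In that setting, the kind of thing you end up writing looks more like this (a rough sketch, names invented): a sort over a fixed-capacity buffer that never allocates and whose worst case is easy to bound:

    #include <array>
    #include <cstddef>

    // Sorts the first `count` entries in place; no allocation, O(count^2) worst case.
    template <std::size_t N>
    void InsertionSort(std::array<int, N>& buffer, std::size_t count) {
        for (std::size_t i = 1; i < count && i < N; ++i) {
            int key = buffer[i];
            std::size_t j = i;
            while (j > 0 && buffer[j - 1] > key) {
                buffer[j] = buffer[j - 1];
                --j;
            }
            buffer[j] = key;
        }
    }

    int main() {
        std::array<int, 8> readings{5, 2, 9, 1};
        InsertionSort(readings, 4);  // no heap, bounded work
        return readings[0];          // 1
    }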
>Profiling and replacing the algorithms that matter is much more efficient than looking at each usage.
I think this speaks to my point. If you are deciding which algorithms suffice, you are creating standards to be followed just as with other engineering disciplines.
>Then you start having discussions about every algorithm being used on collections of 10 or 100 elements, where it doesn't really matter to the problem being solved
If you're claiming it doesn't matter for the specific problem, then you're essentially saying it's not risk-based. The problem here is you will tend to over-constrain design alternatives regardless of whether it decreases risk. My experience is that people will strongly resist this strategy, as it gets interpreted as mindlessly draconian.
FWIW, examining specific use cases is exactly what’s done in critical applications (software as well as other domains). Hazard analysis, fault-tree analysis, and failure-modes effect analysis are all tools to examine specific use cases in a risk-specific context.
>But here again, at least for the systems that matter, we have specific rules.
I think we're making the same point. Standards do exactly this. That's why in other disciplines there are required standards in some use cases and not others (see my previous comment contrasting aerospace to less risky applications).
I didn’t downvote but I’ll weigh in on why I disagree.
The glib answer is “because it’s worth it.” As software interfaces with more and more of our lives, managing the risks becomes increasingly important.
Imagine if I transported you back 150 years to when the industrial revolution and steam power were just starting to take hold. At that time there were no consensus standards about what makes a mechanical system “good”; it was much more art than science. The numbers of mishaps and the reliability reflected this. However, as our knowledge grew we not only learned about what latent risks were posed by, say, a boiler in your home but we also began to define what is an acceptable design risk. There’s still art involved, but the science we learned (and continue to learn) provides the guardrails. The Wild West of design practice is no longer acceptable due to the risk it incurs.
I imagine that's part of why different programming languages exist -- i.e., you have slightly fewer footguns with Java than with C++.
The problem is, the nature of writing software intrinsically requires a balance of art and science no matter what language it is. That is because solving business problems is a blend of art and science.
It's a noble aim to try to avoid solving unnecessarily hard problems, but when it comes to the customer, a certain amount of that hardness is incompressible. So you can't avoid it.
Yes, coding at scale is about managing complexity. No, "Keeping methods short" is not a good way to manage complexity, because...
> then mentally model the entire graph of interactions at once
...partially applies even if you have well-named functional boundaries. You said it yourself:
> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows. The cognitive load to understand code grows as the number of possible interactions grows.
Programs have a certain essential complexity. Making a function "simpler" means making it less complex, which means that that complexity has to go somewhere else. If you make all of your functions simple, then you simply need more functions to represent the same program, which increases the total number of possible interactions between nodes and therefore the cognitive load of understanding the whole graph/program.
Allowing more complexity in your functions makes them individually harder to understand, but reduces the total number of functions needed and therefore makes the entire program more comprehensible.
Also note that just because a function's implementation is complex doesn't mean that its interface also has to be complex.
And, functions with complex implementations are only themselves difficult to understand - functions with complex interfaces make the whole system more difficult to understand.
This is where Occam's Razor applies - do not multiply entities unnecessarily.
Having hundreds or thousands of simple functions is the opposite of this advice.
You can also consider this in more scientific terms.
Code is a mental model of a set of operations. The best possible model has as few moving parts as possible, there are as few connections between the parts as possible, each part is as simple as possible, and both the parts and the connections between them are as intuitively obvious as possible.
Making parts as simple as possible is just one design goal, and not a very satisfactory or useful one in its own terms.
All of this turns out to be incredibly hard, and is a literal IQ test. Mediocre developers will always, always create overcomplicated solutions. Top developers have a magical ability to combine a 10,000 foot overview with ground level detail, and will tear through complex problems and reduce them to elegant simplicity.
IMO we should spend less time teaching algorithms and testing algorithmic specifics, and more on analysing complex systems and implementing them with minimal, elegant, intuitive models.
Lately I’ve found decoupling to be helpful in this regard.
This is the auth layer; its primary charge is to ensure that those receiving and modifying resources have the permissions to do so.
This is the data storage layer. It's focused on clean, relatively generic data storage abstractions and models that are unopinionated and flexible.
This is the contract layer. It's more concerned with combining the APIs of the data and auth layers than it is with data transformation or business logic.
This is the business logic layer. It takes relatively abstract data from our API and performs transformations to massage it into shapes that fit the needs of our customers and the mental models we’ve created around those requirements.
Etc. Etc.
Of course this pragmatic decoupling is easier said than done, but the logical grouping of like concerns allows for discoverability, flexibility, and a generally clear demarcation of concerns.
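In very rough code, the shape I'm describing is something like this (all names invented, heavily simplified):

    // Each layer exposes a narrow interface; the contract layer just combines
    // auth and storage, and business logic lives a layer above.
    struct Record { int id; int value; };

    struct AuthLayer {
        bool canRead(int userId) const { return userId > 0; }
    };

    struct StorageLayer {
        Record load(int recordId) const { return Record{recordId, 42}; }
    };

    struct ContractLayer {
        AuthLayer auth;
        StorageLayer storage;
        bool fetch(int userId, int recordId, Record& out) const {
            if (!auth.canRead(userId)) return false;
            out = storage.load(recordId);
            return true;
        }
    };

    int main() {
        ContractLayer contract{};
        Record r{};
        return contract.fetch(/*userId=*/1, /*recordId=*/7, r) ? 0 : 1;
    }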
I've also been gravitating towards this kind of component categorization, but then there's the ugly problem of "cross-cutting concerns". For instance:
- The auth layer may have an opinion on how half of the other modules should work. Security is notoriously hard to isolate into a module that can be composed with others.
- Diagnostics layer - logging, profiling, error reporting, debugging - wants to have free access to everything, and is constantly trying to pollute all the clean interfaces and beautiful abstractions you design in other layers.
- User interface - UI design is fundamentally about creating a completely separate mental model of the problem being solved. To make a full program, you have to map the UI conceptualization to the "backend" conceptualization. That process has a nasty tendency of screwing with every single module of the program.
I'm starting to think about software as a much higher-dimensional problem. In Liu Cixin's "The Three Body Problem" trilogy, there's a part[0] where a deadly device encased in impenetrable unobtanium[1] is neutered by an attack from a higher dimension. While the unobtanium shell completely protects the fragile internals in 3D space, in 4D space, both the shell and the internals lie bare, unwound, every point visible and accessible simultaneously[2].
This is how I feel about building software systems. Our abstractions are too flat. I'd like to have a couple more dimensions available, to compose them together. Couple more angles from which to view the source code. But our tooling is not there. Aspect-oriented programming moved in that direction a bit, but last I checked, it wasn't good enough.
--
[0] - IIRC it's in the second book, "The Dark Forest".
[1] - It makes more sense in the book, but I'm trying to spoiler-proof my description.
[2] - Or, going down a dimension, for flat people living on a piece of paper, a circle is an impenetrable barrier. But when we look at that piece of paper, we can see what's inside the circle.
Neat, that's some heady shit. I'll have to check Aspect oriented programming out.
It's a bit of work, but I've been thinking the concept of interchange logic is a neat idea for cross-layer concerns.
So for instance, I design my UI to exist in some fashion (I've been thinking contexts are actually a decent way to implement this model, because then you can swap them in and out in order to use them in different manners...)
So say, I've got some component which exists in the ForumContext, and it needs all the data to display for the forum.
So I build a ForumContext provider which is an interchange layer between my ForumApi and my ForumUI.
Then if it turns out I want to swap out the Api with another, all I have to do is create a new ForumContext provider which provides the same shape of data, and the User Interface doesn't need to change.
Alternatively if I need to shape the data in a new fashion, all I need to do is update my ForumContext provider to reshape the API data and I don't need to muss with the API at all (unless of course, I need new data in which case, yea of course).
It's not perfect, and React's docs seem to warn against use of contexts, but I think you could make a decent architecture out of them potentially. And they can be a lot less boilerplate than a similar redux store by using the state hooks React provides.
I still have to build out some sort of proof of concept of my idea, it's essentially connected component trees again. But when half the components in my library are connected to the API directly you just end up with such a mess any time you need to either repurpose a component for any other use or switch a section of your app over to a new data store or api.
At the end of the day, it seems like no matter how hard you try, it's really just about finding the best worst solution ;-).
And yea, security is a doozy in general. I've been working on decoupling our permissions logic a bit lately, since it's coupled between records, permissions, and other shit at the moment. Leaves a lot of room for holes.
>If you make all of your functions simple, then you simply need more functions to represent the same program
The semantics of the language and the structure of the code help hide irrelevant functional units from the global namespace. Methods attached to an object only need to be considered when operating on some object, for example. Private methods do not pollute the global namespace nor do they need to be present in any mental model of the application unless it is relevant to the context.
While I do think you can go too far with adding functions for its own sake, I don't see that they add to the cognitive load in the same way that possible interactions within a functional unit does. If you're just polluting a global namespace with functions and tiny objects, then that does similarly increase cognitive load and should be avoided.
> No, "Keeping methods short" is not a good way to manage complexity
Agreed
> Allowing more complexity in your functions makes them individually harder to understand
I think that can mostly be avoided by sometimes creating local scopes {..} to limit how much state is visible inside a function, combined with whitespace and some section "header" comments (instead of what would have been sub-function names).
Can be quite readable, I think. And it's nice to not have to jump back and forth between myriads of files and functions.
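Something like this, roughly (names invented):

    #include <string>
    #include <vector>

    // One function; local scopes with "header" comments instead of sub-function names.
    std::string BuildReport(const std::vector<int>& measurements) {
        int minValue = 0, maxValue = 0;
        { // Find the range.
            bool first = true;
            for (int m : measurements) {
                if (first || m < minValue) minValue = m;
                if (first || m > maxValue) maxValue = m;
                first = false;
            }
        }

        std::string report;
        { // Format the result; the loop's locals are already out of scope here.
            report = "min=" + std::to_string(minValue) +
                     " max=" + std::to_string(maxValue);
        }
        return report;
    }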
I have found this to be one of those A or B developer personas that are hard for someone to change, and causes much disagreement. I personally agree 100%, but have known other people who couldn't disagree more, it is what it is.
I've always felt it had a strong correlation to top-down vs bottom-up thinkers in terms of software design. The top-down folks tend to agree with your stance and the bottom-up group do not. If you're naturally going to want to understand all of the nitty gritty details you want to be able to wrap your head around those as quickly as possible. If you're willing to think in terms of the abstractions you want to remove as many of those details from sight as possible to reduce visual noise.
I wish there was an "auto-flattener"/"auto-inliner" tool that would allow you to automagically turn code that was written top-down, with lots of nicely high-level abstractions, into an equivalent code with all the actions mushed together and with infrastructure layers peeled away as much as possible.
Have you ever seen a codebase with infrastructure and piping taking up about 70% of the code, with tiny pieces of business logic thrown here and there? It's impossible to figure out where the actual job is being done (and what it actually is): all you can see is an endless chain of methods that mostly just delegate the responsibility further and further. What could've been a 100-line loop of the "foreach item in worklist, do A, B, C" kind is instead split over seven tightly cooperating classes that devote 45% of their code to multiplexing/load-balancing/messaging/job-spooling/etc, another 45% to building trivial auxiliary structures and instantiating each other, and only 10% to the actual data processing. But good luck finding those 10%, because there is a never-ending chain of methods calling each other: A.do_work() calls B.process_item() which calls A.on_item_processing() which calls B.on_processed()... wait, shouldn't there have been some work done between "on_item_processing" and "on_processed"? Yes, it was done by an inconspicuously named "prepare_next_worklist_item" function.
Ah, and the icing on the cake: looping is actually done from the very bottom of this call chain by doing a recursive call to the top-most method which at this point is about 20 layers above the current stack frame. Just so you can walk down this path again, now with the feeling.
Your comment gives me emotional flashbacks. Years ago I took Java off my resume, because I don’t want to ever interact with this sort of thing again. (I’m sure it exists in other languages, but I’ve never seen it quite as bad as in Java.)
I think the best “clean code” programming advice is the advice writers have been saying for centuries. Find your voice. Be direct and be brief. But not too brief. Programming is a form of expression. Step 1 is to figure out what you’re trying to say (eg the business logic). Then say it in its most natural form (switch statements? If-else chain? Whatever). Then write the simplest scaffold around it you can so it gets called with the data it needs.
The 0th step is stepping away from your computer and naming what you want your program to express in the first place. I like to go for walks. Clear code is an expression of clear thoughts. You’ll usually know when you’ve found it because it will seem obvious. “Oh yeah, this code is just X. Now I just have to type it up.”
>I wish there was an "auto-flattener"/"auto-inliner" tool
I'm as big an advocate of "top-down" design as anyone, and I have also wished for such a tool. When you just want to know "what behavior comes next", all the abstractions do get in the way. The IDE should be able to "flatten" the execution path from current context and give you a linear view of the code. Sort of like a trace of a debug session, but generated on-the-fly. But still, I don't think this is the best way to write code.
Most editors have code folding. I've noticed this helps when there are comments or it's easy to figure out the branching or what not.
However, what you're asking for is a design style that's hard to implement I think without language tooling (for example identifying effectful methods).
GP is asking for the opposite. They're asking for code unfolding.
That is, given a "clean code like":
auto DoTheThing(Stuff stuff) -> Result {
const auto foo = ProcessSth(stuff);
const auto bar = ValidateSthElse(stuff);
return DoSth(foo, bar);
}
The tool would inline all the function calls. That is, for each of ProcessSth(), ValidateSthElse() and DoSth(), it would automatically perform the task of "copy the function body, paste it at the call site, and massage the caller to make it work". It's sometimes called the "inline function" refactoring - the inverse of "extract function"/"extract method" refactoring.
I'd really, really want such a tool. Particularly one where the changes were transient - not modifying the source code, just overlaying it with a read-only replacement. Also interactive. My example session is:
- Take the "clean code" function that just calls a bunch of other functions. With one key combination, inline all these functions.
- In the read-only inlined overlay, mark some other function calls and inline them too.
- Rinse, repeat, until I can read the overlay top-to-bottom and understand what the code is actually doing.
Signed up just to say that I've also really, really wanted such a tool since forever. While for example the Jetbrains IntelliJ family of editors has the automatic "inline function" refactoring, they do it by permanently modifying the source code, which is not quite what we want. Like you say, it should be transient!
So I recently made a quick&dirty interactive mock-up of how such an editor feature could look. The mockup is just a web page with javascript and html canvas, so it's easy to try here: https://emilprogviz.com/expand-calls/?inline,substitution
(Not mobile friendly, best seen on desktop)
There are 2 different ways to show the inlining. You can choose between them if you click the cogwheel icon.
Then I learned that the Smalltalk editor Pharo already has a similar feature, demonstrated at https://youtu.be/baxtyeFVn3w?t=1803
I wish other editors would steal this idea. Microsoft, are you listening?
My mock-up also shows an idea to improve code folding. When folding / collapsing a large block of code, the editor could show a quick summary of the block. The summary could be similar to a function signature, with arguments and return values.
Thank you! I'm favoriting this comment. This is exactly what I was thinking about (+/- some polish)!
In particular, the SieveOfErastothenes() call, which I can inline, and inside the overlay, I can inline the call to MarkMultiples(), and the top-level variable name `limit` is threaded all the way down.
Please don't take that demo site down, or publish it somewhere persistent - I'd love to show it around to people as the demonstration of the tool I'm looking for.
> When folding / collapsing a large block of code, the editor could show a quick summary of the block.
I love how you did this! It hasn't even occurred to me, but now that I saw it, I want to have this too! I also like how you're trying to guess which branches in a conditional won't be taken, and diminish them visually.
> Please don't take that demo site down, or publish it somewhere persistent
Feel free to spread the URL around, I plan to keep it online for the rest of my life, or until the feature is available in popular editors - whichever comes first. And if someone wants to mirror the demo elsewhere, it should be easy to do so, since it's client-side only and MIT licensed.
> Also, welcome to HN! :)
Thanks! Been lurking here in read-only mode for years, but today I finally had something to contribute.
I just finished binge-watching all five of your videos on better programming tools, and I must say, it just blew my mind. Thank you for making them.
I've been maintaining my own notes on the kind of tools I'd like to have, with hopes to maybe implement them one day, and your videos covered more than half of my list, while also showing tons of brilliant ideas that never occurred to me. I'm very happy to see that the pain points I identified in my programming work aren't just my imagination.
Also, on a more abstract level, I love your approach to programming dilemmas, and it's the first time I saw it articulated explicitly: when there are two strong, conflicting views, you do a pros/cons analysis on both, and try to find a new approach that captures all the benefits, while addressing all the drawbacks.
I've sent you an e-mail a while ago, let me know if it got through :). I'll be happy to provide all kinds of feedback on the ideas you described in your videos, and I'd love to bounce the remaining part of my list off you, if you're interested :).
> today I finally had something to contribute
That's a first-class contribution. I think you should post the link to your site as a HN submission, using title "Show HN: Ideas for better programming tools" ("Show HN" being a marker that you're submitting your own work).
Wow, thanks, I'm really happy you liked my videos so much! I wonder how many of us have great tool ideas in private notes sitting on our hard drives, not really sharing them with others. I'm glad your ideas overlap with mine, because the more people have the same idea, the more likely it is to be a good one, I think.
> when there are two strong, conflicting views, you do a pros/cons analysis on both
Yeah, it's not easy... I've participated in endless, looping debates as much as anyone - I guess it's just human psychology. But with enough conscious effort, I find that it's sometimes possible to take a step back, take a fair look at both sides, and design a best-of-both-worlds solution. I'll apply this method again in future videos, and if I can inspire a few more people to use it, that's great. Making the programming world a tiny bit less "tribal" and a bit more constructive.
> I've sent you an e-mail a while ago
Yeah, let's continue our discussion over email. I replied to your email from my private address, let me know if it got through.
> That's a first-class contribution. I think you should post the link to your site as a HN submission
"First-class contribution" gave me tears of joy :) I'd like to "Show HN" in a few months. Once I post there, I might get a lot of comments, and I want to be available to answer the comments and make follow-up videos quickly, but currently my personal life is too busy.
> I wonder how many of us have great tool ideas in private notes sitting on our hard drives, not really sharing them with others.
From talking to others, as well as spending way too much time on HN, I think the answer is, "quite a lot". Perhaps not relative to the number of programmers, but in absolute terms, I'm pretty sure there's a hundred strong ideas to be found among just the people who comment here.
I do feel that our industry has an implicit bias against those ideas - I think it's a combination of, if you complain you get labeled as whiny, and working on speculative tooling is considered time spent not providing business value.
> let me know if it got through.
Yeah, I got it, thanks! I'm desperately trying to trim down my draft reply, because I somehow managed to write a short article when describing two of my most recent ideas :).
> I'd like to "Show HN" in a few months.
Sure, take your time :). But I think people will love what you already have. It's not just the ideas you're presenting, but also a kind of "impression of quality" your videos give.
I'm curious to understand your use-case; would you be open to explaining more?
Do you actually want to overlay the code directly into the parent method, or would a tooltip (similar to hyperlink previews) work? I'm wondering how expanding the screen real estate would help with readability and how the user flow would work.
For example, code folding made a lot more sense because the window would have those little boxes to fold/unfold (which is basically similar to the act of inlining and un-inlining).
Yes, I want to overlay the code directly into the parent method, preferably with appropriate syntax highlighting and whatever other goodies the IDE/editor provides normally. It would be read-only to indicate that it's just a transient overlay, and not an actual code change.
So, if I have code like:
auto Foo(Bar b) {
return b.knob();
}
auto Frob(Frobbable f) {
auto q = Foo(f.thing());
return q.quux(f.otherthing());
}
auto DoSth(Frobbable frobbie) {
auto a = Frob(frobbie);
return a.magic();
}
Then I want to mark the last function, and automatically turn it into:
auto DoSth(Frobbable frobbie) {
auto foo_1 = frobbie.thing();
auto q_1 = foo_1.knob();
auto frob_1 = frobbie.otherthing();
auto a = q_1.quux(frob_1);
return a.magic();
}
Or something equivalent, possibly with highlights/synthetic comments telling me which bits of code came from where. I want to be able to keep inlining function calls like this, until I hit a boundary layer like the standard library, or a third-party library. I might want to expand past that, but I don't think I'd do that much. I'd also like to be able to re-fold code I'm not interested in, to reduce noise.
What such tool would do is automating the process I'm currently doing manually - jumping around the tiny functions calling other tiny functions, in order to reassemble the actual sequence of lower-level operations.
I don't want this to be a tooltip, because I want to keep expanding past the first level, and have the overlay stay in place until I'm done with it.
EDIT: languages in the Lisp family - like Common Lisp or Emacs Lisp - feature a tool called "macroexpander". Emacs/SLIME wraps it into an interactive "macrostepper" that behaves pretty much exactly like the tool I described in this discussion thread.
> I wish there was an "auto-flattener"/"auto-inliner" tool that would allow you to automagically turn code that was written top-down, with lots of nicely high-level abstractions, into an equivalent code with all the actions mushed together and with infrastructure layers peeled away as much as possible.
On today's HN, alongside this thread, is "the hole in mathematics".
It is directly germane to what you are talking about.
In the process of formalizing axiomatic math, 1+1=2 took 700 pages in a book to formally prove.
The point about assembly is more or less correct. The process of de-abstracting is going to be long and probably not that clear in the end.
I understand what you mean: the assembly commenter is correct, you'll need to actually execute the program and reduce it to a series of instructions it actually performed.
Which is either actual assembly, or a pseudo-assembly instruction stream for the underlying Turing machine: your computer.
I really need to introduce you to Jester, my toy functional programming language. It compiles down to pure lambda calculus (data structures are implemented with Scott-Mogensen encoding) and then down to C that uses nothing but function calls and assignments of pointers to struct fields. The logic and arithmetic are all implemented in the standard library: a Bool is a function that takes 2 continuations, a Byte is 8 Bools, an Int is 4 Bytes, addition uses the good old ripple-carry algorithm, etc.
Reading the disassembly of the resulting program is pretty unhelpful: any function consists entirely of putting values from the fields of the passed-in structures into the fields of new structures and (tail)calling another function and passing it some mix of old/new structures.
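Not Jester itself, but the flavor of the encoding looks roughly like this as a toy C++ sketch:

    #include <functional>
    #include <iostream>

    // A boolean is a function that takes two continuations and calls one of them.
    using Thunk = std::function<int()>;
    using Bool  = std::function<int(Thunk, Thunk)>;

    const Bool True  = [](Thunk onTrue, Thunk /*onFalse*/) { return onTrue(); };
    const Bool False = [](Thunk /*onTrue*/, Thunk onFalse) { return onFalse(); };

    int main() {
        // "if True then 1 else 0"
        std::cout << True([] { return 1; }, [] { return 0; }) << "\n";   // prints 1
        std::cout << False([] { return 1; }, [] { return 0; }) << "\n";  // prints 0
        return 0;
    }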
While I think you are onto something about top-down vs. bottom-up thinkers, one of the issues with a large codebase is that literally nobody can do the whole thing bottom-up. So you need some reasonable conventions and abstraction, or the whole thing falls apart under its own weight.
That's another aspect of my grand unifying theory of developers. Those same personas seem to have correlations in other ways: dynamic vs static typing, languages, monolith vs micro service. How one perceives complexity, what causes one to complain about complexity, etc all vary based on these things. It's easy to arrive in circumstances where people are arguing past each other.
If you need to be able to keep all the details in your head you're going to need smaller codebases. Similar, if you're already keeping track of everything, things like static typing become less important to you. And the opposite is true.
> Those same personas seem to have correlations in other ways: dynamic vs static typing, languages, monolith vs micro service.
Your theory needs to account for progression over time. For example, the first programming languages I've learned were C++ and Java, so I believed in static typing. Then I worked a lot in PHP, Erlang and Lisp, and became a dynamic typing proponent. Later on, with much more experience behind me, I became a static typing fan again - to the point that my Common Lisp code is thoroughly typed (to the point of being non-idiomatic), and I wish C++ type system was more expressive.
Curiously, at every point of this journey, I was really sure I had it all figured out, and that the kind of typing I liked was the best way to manage complexity.
--
EDIT: your hypothesis about correlated "frames of mind" reminds me of a discussion I had with 'adnzzzzZ here, who also claimed something similar, but broader: https://news.ycombinator.com/item?id=26076639. The topic started as, roughly, whether people designing addictive games using gambling mechanics are devil incarnate (my view) or good people servicing a different target audience than me (their view), but the overarching theory 'adnzzzzZ presented in https://github.com/a327ex/blog/issues/66 also touched on static/dynamic typing debate.
My programming path is similar to yours! Started with C++ then moved into Perl. Then realised that uber-dynamic-typing in Perl was a death-trap in enterprise software. Then oddly found satisfaction in Excel/VBA because you can write more strictly-typed code in VBA (there is also a dynamic side) and even safely call the Win32 API directly. Finally, I came back to C++ and Java which are "good enough" for expressing the static types that I need. The tooling and open-source ecosystem in Java makes it very hard to be more productive in other languages (except maybe C#, but they are in the same language family). I'm in a role now that also has some Python. While the syntactical sugar is like written prose, the weaker typing (than C++/Java) is brutal in larger projects. Unless people are fastidious about type annotations, I constantly struggle to reason about the code while (second-)guessing about types.
You wrote: <<I wish C++ type system was more expressive.>> Can you share an idea? For example: Java 17 (due for release in the fall) will feature sealed classes. This looks very cool. For years, I (accidentally) simulated this behaviour using enums tied to instances or types.
I've often wondered why certain people feel so attached to static typing when in my experience it's rarely the primary source of bugs in any of the codebases I work with.
But it's true, I do generally feel like a codebase that's so complex or fractured that no one can understand any sizable chunk of it is just already going to be a disaster regardless of what kind of typing it uses. I don't hate microservices, they're often the right decision, but I feel they're almost always more complicated than a monolith would be. And I do regularly end up just reading implementation code, even in 3rd-party libraries that I use. In fact in some libraries, sometimes reading the source is quicker and more reliable than trying to find the relevant documentation.
I wouldn't extrapolate too much based on that, but it's interesting to hear someone make those connections.
Statically typed languages and languages that force you to be explicit are awesome for going into a codebase you have never seen and understanding things. You can literally just let your IDE show you everything. All questions you have are just one Ctrl-click away and if proper abstraction (ala Clean Code) has been used you can ignore large swaths of code entirely and just look at what you need. Naming is awesome and my current and previous code bases were both really good in this (both were/are mixes of monolith and microservices). I never really care where a file is located. I know quite a few coders that will want to find things via the folder tree. I just use the keyboard shortcut to open by name and start guessing. Usually first or second guess finds what I need because things are named well and consistently.
Because we use proper abstractions I can usually see at first glance what the overall logic is. If I need to know how a specific part works in detail I can easily drill down via Ctrl-click. With a large inlined blob of code I would have a really hard time. Do I skip from line 1356 to 1781 or is that too far? Oh this is JavaScript and I don't even know if this variable here is a string or a number or both depending on where in the code we are or maybe it's an object that's used as a map?
The whole thing is too big to keep in my head all the time and I will probably not need to touch the same piece of code over and over and instead I will move from one corner to the next and again to another corner over the course of a few weeks to months.
That's why our Frontend code is being converted to TypeScript and our naming (and other) conventions make even our javascript code bearable.
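As a tiny illustration of the ambiguity I mean (the variable here is made up, not from our codebase): in plain JavaScript you have to trace the code to learn whether a value is a string, a number, or an object used as a map, while a TypeScript annotation answers it at the declaration site and lets the IDE enforce it everywhere the value flows.

    // let retries = ...        // number? numeric string? object used as a map?
    let retriesByHost: Record<string, number> = {};
    retriesByHost["api.example.com"] = 3;

    function totalRetries(counts: Record<string, number>): number {
      return Object.values(counts).reduce((sum, n) => sum + n, 0);
    }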
Is your backend Java or C#? Your IDE description feels like Java w/ Eclipse or IntelliJ, or C# w/ Visual Studio. I have a similar experience to you. The "discoverability" of a large codebase is greatly increased by combining language with tooling. If you use Java with Maven-like dependency management (you can use Gradle these days if 'allergic' to Maven's pom.xml), the IDE will usually automatically download and "hook up" source code. It is ridiculous how fast you can move between layers of (a) project code, (b) in-house libraries, (c) open source libraries, and (d) commercial closed-source libraries (decompile on the fly in 2021!). (I assume all the same can be done for C# w/ Visual Studio.)
To be fair, when I started my career, I worked on a massive C project that was pretty easy to navigate because it was a mono-repo with everything in one place. CTags could index 99% of what you needed, and the macros weren't out of control. (Part of the project was also C++, but written in the style of career C programmers who only wanted namespaces and trivial generics like vector and map! Again, very simple to navigate huge codebase.)
I'm still surprised in 2021 when someone asks me to move a Java class to a different package during a code review. My inner monologue says: "Really... do they still use a file browser? Just use the IDE to find it!"
> I've often wondered why certain people feel so attached to static typing when in my experience it's rarely the primary source of bugs in any of the codebases I work with.
That's precisely why people are attached to it; because it's rarely a source of bugs. :-)
Yeah, monoliths are frequently easier to reason about, simply because you have fewer entities. The big win of microservices (IMHO) isn't "reason about", it is that they are a good way of getting more performance out of your total system IFF various parts of the system have different scaling characteristics.
If your monolith is composed of a bunch of parts, where most parts require resources (CPU/RAM/time) that scale as O(n) (for n being the number of active requests), but one or a few parts scale as O(n log n), or as O(n) with a higher constant...
Then those resource-hungry parts set the scaling limit for each instance of the monolith, and you need to deploy more monoliths to cope with a larger load.
On the other hand, in a microservice architecture, you can deploy more instances of just the microservices that need it. In total, this can get more things done with fewer resources.
But, that also requires you to have your microservices cut out in suitable sizes, which requires you to at one point have understood the system well enough to cut them apart.
And that, in turn, may lead to better barriers between microservices, meaning that each microservice MAY be easier to understand in isolation.
> But, that also requires you to have your microservices cut out in suitable sizes, which requires you to at one point have understood the system well enough to cut them apart.
Sure, but that's not particularly hard; it's been basic system analysis since before "microservices" or even "service-oriented architecture" was a thing. Basic 70s-era Yourdon-style structured analysis (which, while it's not the 1970s approach, can be applied incrementally in a story-by-story agile fashion to build up a system, as well as doing either big upfront design or working from the physical design to the logical requirements of an existing system) produces pretty much exactly what you need to determine service boundaries.
(It’s also a process that very heavily leverages locality of knowledge within processes and flows, so its quite straightforward to carry out without ever having to hold the whole system in your head.)
Yep, there's no real magic here. There's some understanding forced by a (successful) transition to microservices, but a transition to microservices is not a requirement for said gained insight.
And if all parts of your system scale identically, it may be better to scale it by replicating monoliths.
Another POSSIBLE win is if you start having multiple systems, sharing the same component (say, authentication and/or authorization), at which point there's something to be said for breaking at least that bit out of every monolith and putting them in a single place.
I mean, in some respect, "dynamic typing" is "type the data" and "static typing" is "type the variable".
In both cases, there's the possibility for doing type propagation. But, if you somehow manage to pass in two floats to an addition that a C compiler thinks is an integer addition, you WILL have a bad day. Whereas in Common Lisp, the actual passed-in values are typed (for floats, usually boxed, for integers, if they're fixnums, usually tagged and having a few bits less than you would expect).
I’m reminded of an earlier HN discussion about an article called The Wrong Abstraction, where I argued¹ that abstractions have both a benefit and a cost and that their ratio may change as a program evolves and which of those “nitty gritty details” are immediately relevant and which can helpfully be hidden behind abstractions changes.
The point is that bottom-up code is a siren song. It never scales. It makes it a lot easier to get started, but given enough complexity it inevitably breaks down.
Once your codebase gets to somewhere around the 10,000 line mark, it becomes impossible for a single mind to hold the entire program in their head at a single time. The only way to survive past that point is with carefully thought-out, watertight layers of abstraction. That almost never happens with bottom-up. Bottom-up is a lot like natural selection. You get a lot of kludges that work great to solve their immediate problem, but behave in undefined and unpredictable ways when you extend them outside their original environment.
Bottom-up can work when you're inside well-encapsulated modular components with bounded scope and size. But there's no way to keep those modules loosely coupled unless you have an elegant top-down architecture imposing order on the large-scale structure.
But the reverse is also true. Top-down programming doesn't really work well for smaller programs, and it definitely doesn't work well when you're dealing with small, highly performance-critical or complex tasks.
So sure, I'll grant that when your program reaches the 10,000 line mark, you need to have some serious abstractions. I'll even give you that you might need to start abstracting things when a file reaches 1,000 lines.
But when we start talking about the rule of 30 -- that's not managing complexity, that's alphabetizing a sock drawer and sewing little permanent labels on each sock. That approach also doesn't scale to large programs because it makes rewrites and refactors into hell, and it makes new features extremely cumbersome to quickly iterate on. Your 10,000 line program becomes 20,000 lines because you're throwing interfaces and boilerplate all over the place.
Note that this isn't theoretical, I have worked in programs that did everything from building an abstraction layer over the database in case we wanted to use Mongo and SQL at the same time (we didn't), to having a dependency management system in place that meant we had to edit 5 files every time we wanted to add a new class, to having a page lifecycle framework that was so complicated that half of our internal support requests were trying to figure out when it was safe to start adding customer data to the page.
The benefit of a good, long, single-purpose function that contains all of its logic in one place is that you know exactly what the dependencies are, you know exactly what the function is doing, you know that no one else is calling into the inlined logic that you're editing, and you can easily move that code around and change it without worrying about updating names or changing interfaces.
Abstract your code, but abstract your code when or shortly before you hit complexity barriers and after you have enough knowledge to make informed decisions about which abstractions will be helpful -- don't create a brand new interface every time you write a single function. It's fine to have a function that's longer than a couple hundred lines. If you're building something like a rendering or update loop, in many cases I would say it's preferable.
It's funny how these things are literally what the Clean Code book advocates for. Sure, there is mention of a lot of stuff that's no longer needed and was a band-aid over the deficiencies of a particular language. But the ideas are timeless, and I used them before I even knew the book - and I used them in Perl.
> these things are literally what the Clean Code book advocates for
I'm not sure I understand what you're saying, I might be missing your point. The Clean Code book advocates that the ideal function is a single digit number of lines, double digits at the absolute most.
In my mind, the entire process of writing functions that short involves abstracting almost everything your code does. It involves passing data around all over the place and attaching state to objects that get constructed over multiple methods.
How do you create a low-abstraction, bottom-up codebase when every coroutine you need to write is getting turned into dozens of separate functions? I think this is showcased in the code examples that the article author critiques from Clean Code. They're littered with side effects and state mutations. This stuff looks like it would be a nightmare to maintain, because it's over-abstracted.
Martin is writing one-line functions whose entire purpose is to call exactly one other function passing in a boolean. I don't even know if I would call that top-down programming, it feels like critiquing that kind of code or calling it characteristic of their writing style is almost unfair to top-down programmers.
I'm not saying the entire book taken literally is how everything must be done. I was trying to say that the general ideas make sense such as keeping a function at the same level of abstraction and keeping them small.
I agree with you that having all functions be one-liners is not useful. Keeping all functions to within just a few lines, or double digits at most, makes sense however. Single digit could be 9. That's a whole algorithm right there! For example, quicksort (quoted from the Wikipedia article):
    algorithm quicksort(A, lo, hi) is
      if lo < hi then
        p := partition(A, lo, hi)
        quicksort(A, lo, p - 1)
        quicksort(A, p + 1, hi)
This totally fits the single digit of lines rule and it describes the algorithm on a high enough level of abstraction that you get the idea of the whole algorithm easily. Do you think that inlining the partition function would make this easier or harder to read?
    algorithm quicksort(A, lo, hi) is
      if lo < hi then
        pivot := A[hi]
        i := lo
        for j := lo to hi do
          if A[j] < pivot then
            swap A[i] with A[j]
            i := i + 1
        swap A[i] with A[hi]
        quicksort(A, lo, i - 1)
        quicksort(A, i + 1, hi)
(I hope I didn't mix up the indentation - on the phone here and it's hard to see lol)
Now some stuff might require 11 or 21 lines. But as we get closer to 100 lines I doubt that it's more understandable and readable to have it all in one big blob of code.
> But as we get closer to 100 lines I doubt that it's more understandable and readable to have it all in one big blob of code.
Well, but that's exactly what I'm pushing back against. I think the rule of 30 is often a mistake. I think if you're going out of your way to avoid long functions, then you are probably over-abstracting your code.
I don't necessarily know that I would inline a quicksort function, because that's genuinely something that I might want to use in multiple places. It's an already-existing, well-understood abstraction. But I would inline a dedicated custom sorting method that's only being used in one place. I would inline something like collision detection, nobody else should be calling that outside of a single update loop. In general, it's a code smell to me if I see a lot of helper functions that only exist to be called once. Those are prime candidates for inlining.
This is kind of a subtle argument. I would recommend http://number-none.com/blow/john_carmack_on_inlined_code.htm... as a starting point for why inlined code makes sense in some situations, although I no longer agree with literally everything in this article, and I think the underlying idea I'm getting at is a bit more general and foundational.
> Do you think that inlining the partition function would make this easier or harder to read?
Undoubtedly easier, although you should label that section with a comment and use a different variable name than `i`. Your secondary function is just a comment around inline logic, it's not doing anything else.[0]
But by separating it out, you've introduced the possibility for someone else in the same class or file to call that function without your knowledge. You've also introduced the possibility for that method to contain a bug that won't be visible unless you step through code. You've also created a function with an unlabeled side effect that's only visible by looking at the implementation, which I thought we were trying to avoid.
You've added a leaky abstraction to your code, a function that isn't just only called in one place, but should only be called in one place. It's a function that will produce unexpected results if anyone other than the `quickSort` method calls it, that lacks any error checking; it's not really a self-contained unit of code at all.
And for what benefit? Is the word `partition` really fully descriptive of what's going on in that method? Does it indicate that the method is going to manipulate part of the array? And is anyone ever going to need to debug or read a quicksort method without looking at the partition method? I think that's very unlikely.
----
Maybe you disagree with everything I'm saying above, but regardless, I don't think that Clean Code is actually advocating for the same ideas as I am:
> Abstract your code, but abstract your code when or shortly before you hit complexity barriers and after you have enough knowledge to make informed decisions about which abstractions will be helpful -- don't create a brand new interface every time you write a single function.
I don't think that claim is one that Martin would agree with. Or if it is, I don't think it's a statement he's giving actionable advice about inside of his book.
----
[0]: In a language like Javascript (or anything that supports inline functions), we might still use a function or a new context as a descriptive boundary, particularly if we didn't want `j` and `pivot` to leak:
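Something along these lines (a TypeScript sketch mirroring the quicksort pseudocode above; not meant as a definitive implementation):

    function quickSort(a: number[], lo: number, hi: number): void {
      if (lo >= hi) return;

      // Inline, named boundary: `partition` exists only inside this call frame,
      // so nothing outside quickSort can call it, and `pivot`/`j` don't leak out.
      const partition = (): number => {
        const pivot = a[hi];
        let i = lo;
        for (let j = lo; j < hi; j++) {
          if (a[j] < pivot) {
            [a[i], a[j]] = [a[j], a[i]];
            i++;
          }
        }
        [a[i], a[hi]] = [a[hi], a[i]];
        return i;
      };

      const p = partition();
      quickSort(a, lo, p - 1);
      quickSort(a, p + 1, hi);
    }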
Remember that your variable and function names can go out of date at the same speed as any of your comments. But the real benefit of inlining this partition function (besides readability, which I'll admit is a bit subjective), is that we've eliminated a potential source of bugs and gotten rid of a leaky abstraction that other functions might be tempted to call into.
> Remember that your variable and function names can go out of date at the same speed as any of your comments.
A very good point, thank you for voicing it!
As luck would have it, two days ago I was writing comments about this at work during code review - there was a case where a bunch of functions taking a "connection" object had it replaced with a "context" object (which encapsulated the connection and some other stuff), but the parameter naming wasn't updated. I was briefly confused by this when studying the code.
Ha :) This is something that's also been drilled into me mostly just because I've gotten bitten by it in jobs/projects. The most recent instance I ran into was a `findAllDependents` method turning into `findPartialDependentsList`, but the name never getting updated.
Led to a non-obvious bug because from a high level, the code all looked fine, and it was only digging into the dependents code that revealed that not everything was getting returned anymore.
Absolutely agree that all naming can go out of date. With at least the tools I use nowadays it's even easier for comments to go out of date than it was previously, because of all the automatic folding away in the IDE.
But one of the best reminders that comments don't do sh+t came early in my career, when my co-worker asked me a question about a line of code (and it was literally just the two of us working on that code base). I probably had a very weird look on my face. I simply pointed to the line above the one he asked about. He read the comment and said "thank you".
I guess my point is that all you can do is incorporate the "extra information" as closely as possible to the actual code, so that it's less likely to be ignored or not seen. Thus incorporating it into the variable and function naming itself is the closest you will get, and as per your example (and my own experience as well) it can still go stale. Nothing but rigorous code review practices and co-workers who care will help with this.
But I think we can all agree (or I hope so at least) that it's better to have your function called `findAllDependents` and be slightly out of date than to have it called `function137` with a big comment on top that explains in 5 lines that it finds the list of all dependents.
Glad you admitted subjectivity. I will too and I am on the other side of that subjectivity. For the quicksort example, that was the pseudo code from the Wikipedia article.
I personally think that the algorithm is easier to grasp conceptually if I just need to know 'it partitions the data and then runs quicksort on both of those partitions. Divide and conquer. Awesome'.
I don't care at that level of abstraction _how_ the partitioning works. In fact there are multiple different partition functions people have created that have various characteristics. The fact that it mutates its parameters is generally bad if you ask me, but in this specific case of a general-purpose, high-performance sorting function it's totally acceptable for the sake of speed and memory-use considerations. In other 'real world' scenarios of 'simple business software' I would totally forsake that speed and memory efficiency for better abstractions. This is also why Carmack is basically not a good example: his world is that of high-performance graphics and game engine programming, where he's literally the one dude that has it all in his head. I can totally see why he would have a different view from someone like me, who has to go look at a piece of code I've never seen before multiple times a day.
You mention various problems with this code, such as the in-place nature and bad naming and such. Most of that is simply the copy from Wikipedia, and yes, I agree I would also rename these in real code. I do not agree, however, with the parts about 'someone else could call this now'. To stick with Clean Code's language of choice, the partition function would actually be a private method of the quicksort class. Thus nobody outside can call it, but the algorithm itself, as a self-contained unit, is not just a blob of code.
Same with your inlining of collision detection and such. I don't think I would do that. I think it has value to know that the overall loop is something like
do_X()
do_Y()
detect_collisions()
do_Z()
Overall "game loop" easily visible straight away. The collision detection function might be a private method to that class you're in though. Will depend on real world scenario I would say.
You also mention you could use a comment. Your comment only does half the job though. It only tells me where the partitioning starts, not where it ends. In this example it's sort of easy to see. As the code we are talking about gets larger it's not as easy any more. So you have to make sure to make a new comment above every 'section'. Problem is that this can be forgotten. Now I need to actually read and fully understand the code to figure out these boundaries. I can no longer just tell my editor to jump over something. I can no longer have the compiler ensure that the boundaries are set (it will ensure proper function definition and calls).
> The collision detection function might be a private method to that class you're in though.
Definitely making things private helps a lot, although it's worth noting that classes often aren't maintained by only one person, and they often encapsulate multiple public methods and behaviors. It's still possible to clutter a class with private methods and to have other people working on that class calling them incorrectly. This is especially true for methods that mutate private state (at least, in my experience), because those state mutations and state assumptions are often not obvious and are undocumented unless you read the implementation code (and private methods tend to be less documented than public methods in my experience).
Writing in a more functional style (even inside of a class) can help mitigate that problem quite a bit since you get rid of a lot of the problematic hidden state, but I don't want to give the impression that if you make a method private that's always safe and it'll never get misused.
> You also mention you could use a comment. Your comment only does half the job though. It only tells me where the partitioning starts, not where it ends.
In this example, I felt like it was overkill to include a closing comment, since the whole thing is like 20 lines of code. But you could definitely add a closing comment here. If you use an editor that supports regions, they're pretty handy for collapsing logic as well. That's a bit language dependent though. If you're using something like C# -- C# has fantastic region support in Visual Studio. Other languages may vary.
Of course, people who don't use an IDE can't collapse your regions, but in my experience people who don't use an IDE also often hate jumping between function definitions since they need to manually find the files or grep for the function name, so I'm somewhat doubtful they'll be too upset in either case.
> I can no longer have the compiler ensure that the boundaries are set
You may already know this, but heads up that if you're worried about scope leaking and boundaries, check if your language of choice supports block expressions or an equivalent. Languages like Rust and Go can allow you to scope arbitrary blocks of code, C (when compiled with gcc) supports statement expressions, and many other languages like Javascript support anonymous/inline functions. Even if you are separating a lot of your code into different functions, it's still nice to be able to occasionally take advantage of those features. I often like to avoid the extra indentation in my code if I can help it, but that's just my own visual preference.
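For instance, in JavaScript/TypeScript a bare block with let/const keeps temporaries from leaking, and an inline function does the same while also yielding a value (hypothetical snippets, just to show the shape):

    let total = 0;

    // Bare block: `partial` and `i` are scoped to the braces and can't be
    // referenced (or misused) after the block ends.
    {
      let partial = 0;
      for (let i = 1; i <= 10; i++) {
        partial += i;
      }
      total += partial;
    }

    // Inline function used as a "block expression" that yields a value.
    const squares = (() => {
      const out: number[] = [];
      for (let i = 1; i <= 5; i++) {
        out.push(i * i);
      }
      return out;
    })();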
As mainly a bottom-up person, I completely agree with your analysis but I wonder if you might be using "top-down architecture" here in an overloaded way?
My personal style is bottom up, maximally direct code, aiming for monolithic modules under 10kloc, combined with module coupling over very narrow interfaces. Generally the narrow interfaces emerge from finding the "natural grain" of the module after writing it, not from some a priori top-down idea of how the communication pathways should be shaped.
Edit: an example of a narrow interface might be having a 10kloc quantitative trading strategy module that communicates with some larger system only by reading off a queue of things that might need to be traded, and writing to a queue of desired actions.
> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.
That's only 1 part of the complexity equation.
When you have 100 lines in 1 function you know exactly the order in which each line will happen and under which conditions by just looking at it.
If you split it into 10 functions 10-lines-long each, now you have 10! possible orderings of calling these functions (ignoring loops and branches). And since this ordering is separated into multiple places - you have to keep it in your mind. Good luck inventing naming that will make it obvious which of the 3,628,800 possible orderings is happening without reading through them.
Short functions are good when they fit the problem. Often they don't.
I feel like this is only a problem if the small functions share a lot of global state. If each one acts upon its arguments and returns values without side effects, ordering is much less of an issue IMO.
Well, if they were one function before they probably share some state.
Clean code recommends turning that function into a class and promoting the shared state from local variables into fields. After such a "refactoring" you get a nice puzzle trying to understand what exactly happens.
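A toy contrast of what I mean (a made-up report example, not one from the book):

    // State promoted to fields: each small method reads and mutates `this`,
    // so the reader has to know the call order to know what the fields hold.
    class ReportBuilder {
      private header = "";
      private rows: string[] = [];
      buildHeader(title: string): void { this.header = `== ${title} ==`; }
      addRow(row: string): void { this.rows.push(row); }
      render(): string { return [this.header, ...this.rows].join("\n"); }
    }

    // The same work with explicit arguments and a return value: an ordering
    // mistake shows up as a missing parameter instead of silently empty state.
    function renderReport(title: string, rows: string[]): string {
      const header = `== ${title} ==`;
      return [header, ...rows].join("\n");
    }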
A lot depends on what your language and its ecosystem can support. For instance, the kind of monadic stuff people do with Haskell and Scala can compress programs tremendously, but then I've worked in a codebase that tried the same things in C++ - and there, the line count expands, because the language just can't express some of the necessary concepts in a concise way.
In javascript I sometimes break up the behaviour of a large function by putting small internal functions inside it. Those internal functions often have side effects, mutating the state of the outer function which contains them.
I find this approach a decent balance between having lots of small functions and having one big function. The result is self contained (like a function). It has the API of a function, and it can be read top to bottom. But you still get many of the readability benefits of small functions - like each of the internal methods can be named, and they’re simple and each one captures a specific thought / action.
If you're calling those functions once each in a particular order then I can't possibly figure out what that does for you that whitespace and a few comments wouldn't. How does turning 100 lines of code into 120 and shuffling it out of execution order possibly make it easier to read?
I coded this way for a while and found it makes the code easier to read and easier to reason about. Instead of your function being
    func foo() {
      // do A.1
      // do A.2
      // do B.1
      // do B.2
      // etc...
    }
It becomes
    func foo() {
      // do A
      // do B
      // etc...
      // func A()...
      // func B()...
    }
When the func is doing something fairly complicated the savings can really add up. It also makes expressing some concurrency patterns easier (parallel, series etc...), I used to do this a lot back in the async.js days. The main downside seems to be less elegant automated testing from all the internal state.
No; I wouldn't do it if I was just calling them once each in a particular order. And I don't often use this trick for simple functions. But sometimes long functions have repeated behaviour.
For example, in this case I needed to do two recursive tree walks inside this function, so each walk was expressed as an inner function which recursively called itself, and each is called once from the top level method:
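Roughly this shape (an illustrative TypeScript sketch, not the actual code):

    interface TreeNode {
      value: number;
      children: TreeNode[];
    }

    function summarize(root: TreeNode): { count: number; max: number } {
      // First walk: count every node in the tree.
      const countNodes = (node: TreeNode): number =>
        1 + node.children.reduce((sum, child) => sum + countNodes(child), 0);

      // Second walk: find the largest value anywhere in the tree.
      const maxValue = (node: TreeNode): number =>
        node.children.reduce((best, child) => Math.max(best, maxValue(child)), node.value);

      // Each walk is a recursive inner function, called exactly once from here.
      return { count: countNodes(root), max: maxValue(root) };
    }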
I don't do it like this often though. The code in this file is easily the most complex code I've written in nearly 3 decades of programming. Here my ability to read and edit the code is easily the most important factor. I think this form makes the core algorithm more clear than any other way I could factor this code. I considered smearing this internal logic out over several top level methods, but I doubt that would be any easier to read.
Now this is usually, in my opinion, not good advice (it is like reintroducing global variables), as unnecessary state certainly makes things more difficult to reason about.
I have read the book (not very recently) and I do not recall this but perhaps I am just immune to such advice.
I like his book about refactoring more than Clean Code but it introduced me to some good principles like SOLID (a good mnemonic), so I found it somewhat useful.
What I find is that function boundaries have a bunch of hidden assumptions we don't think about.
Especially things like exceptions.
For all these utility functions, are you going to check input variables, which means doing it over and over again? Catch exceptions everywhere, etc.?
A function can be written for a 'narrow use case' - but when it's actually made available to other parts of the system, it needs to be more generalized.
This is the problem.
Is it possible that 'nested functions' could provide a solution? As in, you only call the function once, in the context of some other function, so why not physically put it there?
It can have its own stack and be tested separately if needed, but it remains exclusive to the context it sits in from a readability perspective - and you don't risk having it used for 'other things'.
You could even have an editor 'collapse' the function into a single line of code, to make the longer algorithm more readable.
The problem is abstraction isn't free. Sometimes it frees up your brain from unnecessary details and sometimes the implementation matters or the abstraction leaks.
Take something as simple as Substring, a method we use all the time and one that is far clearer than most helper functions I've seen in codebases.
Is it Substring(string, index, length) or Substring(string, indexStart, indexEnd)
What happens when you pass in "abc".Substring(0,4) do you get an exception or "abc"?
What does Substring(0,-1) do? or Substring (-2,-3).
What happens when you call it on null? Sometimes this matters, sometimes it doesn't.
- Does it destructively modify the argument, or return a substring? Or both?
- If it returns a substring, is it a view over the original string, or a fresh substring that doesn't share memory with the original?
- If it returns a fresh substring, how does it do it? Is it smart or dumb about allocations? This almost never matters, except when it does.
- How does it handle multibyte characters? Do locales impact it in any way?
With the languages we have today, a big part of the function contract cannot be explicitly expressed in function signatures. And it only gets worse with more complicated tools of abstraction.
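For instance, JavaScript's own answers to a few of the Substring questions above show how much of the contract lives outside the signature (behaviour of String.prototype.substring vs. slice, from memory, so worth double-checking against MDN):

    const s = "abc";

    // Out-of-range end index: JS substring clamps instead of throwing
    // (Java's "abc".substring(0, 4) would throw here).
    console.log(s.substring(0, 4));   // "abc"

    // Negative arguments: substring treats them as 0...
    console.log(s.substring(0, -1));  // ""  (same as substring(0, 0))
    console.log(s.substring(-2, -3)); // ""

    // ...while slice counts them back from the end of the string.
    console.log(s.slice(0, -1));      // "ab"

    // Calling it on null or undefined is a TypeError at runtime.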
I posted this elsewhere in the thread, but local blocks that define which variables they read, mutate and export would IMO be a very good solution to this problem:
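I don't know of a mainstream language with this as first-class syntax; the closest approximation I can sketch is an inline function whose parameter list documents the reads and whose return value is the export (TypeScript, purely illustrative):

    const orders = [12.5, 7.25, 30.0];
    const taxRate = 0.2;
    let grandTotal = 0;

    // Approximated "block": the parameter list documents what it reads (orders,
    // taxRate) and the return value is its single export. Note this is only a
    // convention - a closure could still reach outward - not the enforced
    // language feature I'd actually want.
    const orderTotal = ((items: readonly number[], rate: number): number => {
      const subtotal = items.reduce((sum, x) => sum + x, 0);
      return subtotal * (1 + rate);
    })(orders, taxRate);

    grandTotal += orderTotal; // the one mutation happens visibly, out here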
This is a fascinating idea. In some languages like C or Java or C#, the IDE could probably do this "for free" -- generate it, then the programmer can spot-check for surprises. Or the reverse: highlight a block of code and ask the IDE to tell you about read/mutate/export. In some sense, when you use automatic refactoring tools (like IntelliJ), extracting a few lines of code as a new method needs to perform similar static analysis.
In the latest IntelliJ, the IDE will visually hint about mutable, primitive-typed local variables (including method parameters). A good example is a for loop variable (i/j/k). The IDE makes it stand out. When I write Java, I try to use final everywhere for primitive-typed local variables. (I borrowed this idea from functional programming styles.) The IDE gives me a hint if I accidentally forget to mark something as final.
It's similar, but lambdas don't specify the behaviour as precisely, and they're not as readable since the use of a lambda implies a different intention, and the syntax that transforms them into a scope block is very subtle. They may also have performance overhead depending on the environment, which is (arguably) additional information the programmer has to consider on usage.
>If you split it into 10 functions 10-lines-long each now you have 10! possible orderings of calling these functions (ignoring loops and branches). And since this ordering is separated into multiple places - you have to keep it in your mind. Good luck inventing naming that will make obvious which of the 3628800 possible orderings is happening without reading through them.
It's easy to make this argument in the abstract, but harder to demonstrate with a concrete example. Do you happen to have any 100 lines of code that you could provide that would show this as a challenge to compare to the refactored code?
You're likely missing one or more techniques that make this work well:
1. Depth first function ordering, so the execution order of the lines in the function is fairly similar to that of the expanded 100 lines. This makes top to bottom readability reasonable.
2. Explicit naming of the functions to make it clear what they do, not just part1(); part2() etc.
3. Similar levels of abstraction in each function (e.g. not having both a for loop, several if statements based on variables defined in the function, and 3 method calls, but instead having 4-5 method calls doing the same thing).
4. Explicit pre/post conditions in each method are called out due to the passing in of parameters and the return values. This more effectively helps a reader understand the lifecycle of relevant variables etc.
In your example of 100 lines, the counterpoint is that now I have a method that has at least 100 ways it could work / fail. By breaking that up, I have the ability to reason about each use case / failure mode.
> It's easy to make this argument in the abstract, but harder to demonstrate with a concrete example.
One of the codebases I'm currently working on is a big example of that. I obviously can't share parts of it, but I'll say that I agree with GP. Lots of tiny functions kill readability.
> 1. Depth first function ordering, so the execution order of the lines in the function is fairly similar to that of the expanded 100 lines. This makes top to bottom readability reasonable.
Assuming your language supports this. C++ notably doesn't, especially in the cases where you'd produce such small functions - inside a single translation unit, in an anonymous namespace, where enforcing "caller before callee" order would require you to forward-declare everything up front. Which is work, and more lines of code.
> 2. Explicit naming of the functions to make it clear what they do, not just part1(); part2() etc.
That's table stakes. Unfortunately, quite often a properly descriptive name would be 100+ characters long, which obviously nobody does.
> 3. Similar levels of abstraction in each function
That's a given, but in a way, each "layer" of such functions introduces its own sublevel of abstraction, so this leads to abstraction proliferation. Sometimes those abstractions are necessary, but I found it easier when I can handle them through few "deep" (as Ousterhout calls it) functions than a lot of "shallow" ones.
> 4. Explicit pre/post conditions in each method
These introduce a lot of redundant code, just so that the function can ensure a consistent state for itself. It's such a big overhead that, in practice, people skip those checks, and rely on everyone remembering that these functions are "internal" and had their preconditions already checked. Meanwhile, a bigger, multi-step function can check those preconditions once.
I've heard this argument a lot, and I've generally found there's another problem causing the lack of readability rather than the small functions themselves.
>Assuming your language supports this. C++ notably doesn't, especially in the cases where you'd produce such small functions - inside a single translation unit, in an anonymous namespace, where enforcing "caller before callee" order would require you to forward-declare everything up front. Which is work, and more lines of code.
Here, though, you're kind of used to reading code upwards, so flip the depth-first ordering and make it depth-last (or take the hit on the forward declarations). If you've got more of these than you can handle, your classes are probably too complex regardless (i.e. doing input, parsing, transformation, and output in the same method).
> quite often a properly descriptive name would be 100+ characters long
Generally if this is the case, then the containing class / module / block / ? is too big. Not a problem of small methods, problem is at a higher level.
> Explicit pre/post conditions in each method
I should have been more explicit here - what I meant is that you know that in the first method, that only the first 3 variables matter, and those variables / parameters are not modified / relevant to the rest of the method. Even without specifically coding pre/post-cons, you get a better feel for the intended isolation of each block. You fall into a pattern of writing code that is simple to reason about. Paired with pure methods / immutable variables, this tends to (IMO) generate easily scannable code. Code that looks like it does what it does, rather than code that requires reading every line to understand.
I have seen so much GUI code like this in my career! Real world sophisticated GUIs can have tens or hundreds of attributes to setup. Especially ancient Xlib stuff, this was the norm. You have a few functions with maybe hundreds of lines doing pure GUI setup. No problem -- easy to mentally compartmentalise.
Your deeper point (if I may theorise): Stop following hard-and-fast rules. Instead, do what makes sense and is easy to read and maintain.
> I know how to do it, I just don't always think it's worth it.
Agreed:)
Generally no problem with this method, other than it being difficult to see at a glance what each item will get set to. Something like the following might be an easy first step:
For someone looking at this for the first time, the rationale for each random function choice is obtuse so you might consider pulling out each type of random function into something descriptive like randomIntUpto(65536), randomDensity(20, 5), randomIntRange(30, 70).
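Sketches of those helpers might look something like this (the semantics of randomDensity are a guess, since the original code isn't shown here):

    // Uniform integer in [0, max).
    function randomIntUpto(max: number): number {
      return Math.floor(Math.random() * max);
    }

    // Uniform integer in [min, max], inclusive.
    function randomIntRange(min: number, max: number): number {
      return min + Math.floor(Math.random() * (max - min + 1));
    }

    // Hypothetical: a density around `base` with up to +/- `spread` of jitter.
    function randomDensity(base: number, spread: number): number {
      return base + (Math.random() * 2 - 1) * spread;
    }

    // const treeCount = randomIntUpto(65536);
    // const grassDensity = randomDensity(20, 5);
    // const treeHeight = randomIntRange(30, 70);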
Does it add value? Maybe - ask a junior to review the two and see which they prefer maintaining. Regardless, this code mostly exists at a single level of abstraction, which tends to imply simple refactorings rather than complex.
My guess is if this extended to multiple (levels / maps / ?) you'd probably split the settings into multiple functions, one per map right...?
Basically I wanted to redraw as little as possible, so I built a dependency graph.
But then I wanted to add more parameters and to group them, so I can have many different kinds of trees without hardcoding their parameters. It was mostly a UI problem, not a refactoring problem. So I'm rewriting it like this:
The graph editor keeps my dependencies for me, and the user can copy-paste 20 different kinds of trees and play with their parameters independently. And I don't need to write any code - a library handles it for me :)
Also now I can add an interpolate node which takes 2 configurations and a number and interpolates the result between them. So I can have high grass go smoothly to low grass while trees go from one kind to another.
I am surprised that this is the top answer (Edit: at the moment, was)
How does splitting code into multiple functions suddenly change the order of the code?
I would expect that these functions would be still called in a very specific order.
And sometimes it does not even make sense to keep this order.
But here is a little example (in a made up pseudo code):
    function positiveInt calcMeaningOfLife(positiveInt[] values)
      positiveInt total = 0
      positiveInt max = 0
      for (positiveInt i=0; i < values.length; i++)
        total = total + values[i]
        max = values[i] > max ? values[i] : max
      return total - max
===>
    function positiveInt max(positiveInt[] values)
      positiveInt max = 0
      for (positiveInt i=0; i < values.length; i++)
        max = values[i] > max ? values[i] : max
      return max

    function positiveInt total(positiveInt[] values)
      positiveInt total = 0
      for (positiveInt i=0; i < values.length; i++)
        total = total + values[i]
      return total

    function positiveInt calcMeaningOfLife(positiveInt[] values)
      return total(values) - max(values)
> How does splitting code into multiple functions suddenly change the order of the code?
Regardless of how smart your compiler is and all the tricks it pulls to execute the code in much the same order, the order in which humans read the pseudo code is changed:
01. function positiveInt max(positiveInt[] values)
02. positiveInt max = 0
03. for (positiveInt i=0; i < values.length; i++)
04. max = values[i] > max ? values[i] : max
05. return max
07. function positiveInt total(positiveInt[] values)
08. positiveInt total = 0
09. for (positiveInt i=0; i < values.length; i++)
10. total = total + values[i]
11. return total
12. function positiveInt calcMeaningOfLife(positiveInt[] values)
13. return total(values) - max(values)
Your modern compiler will take care of order in which the code is executed, but as humans need to trace the code line-by-line as [13, 12, 01, 02, 03, 04, 05, 07, 08, 09, 10, 11]. By comparison, the inline case can be understood sequentially by reading lines 01 to 07 in order.
01. function positiveInt calcMeaningOfLife(positiveInt[] values)
02. positiveInt total = 0
03. positiveInt max = 0
04. for (positiveInt i=0; i < values.length; i++)
05. total = total + values[i]
06. max = values[i] > max ? values[i] : max
07. return total - max
> Better? No?
In most cases, yeah, you're probably better off with the two helper functions. max() and total() are common enough operations, and they are named well enough that we can easily guess their intent without having to read the function body.
However, depending on the size of the codebase, the complexity of the surrounding functions and the location of the two helper functions it's easy to see that this might not always be the case.
If you want to try and understand the code for the first time, or if you are trying to trace down some complex bug there's a chance having all the code inline would help you.
Further, splitting up a large inline function is easier than reassembling many small functions (hope you've got your unit tests!).
> And sometimes it does not even make sense to keep this order.
Agreed. But naming and abstractions are not trivial problems. Oftentimes it's the larger/more complex codebases where you see these practices applied more dogmatically.
Well, inlining by the compiler would be expected, but we do not only write the code for the machine but also for another human being (which could be yourself at another moment in time, of course).
Splitting the code into smaller functions does not automatically warrant a better design, it is just one heuristic.
A naive implementation of the principle could perhaps have found a less optimal solution:
    function positiveInt max(positiveInt value1, positiveInt value2)
      return value1 > value2 ? value1 : value2

    function positiveInt total(positiveInt value1, positiveInt value2)
      return value1 + value2

    function positiveInt calcMeaningOfLife(positiveInt[] values)
      positiveInt total = 0
      positiveInt max = 0
      for (positiveInt i=0; i < values.length; i++)
        total = total(total, values[i])
        max = max(max, values[i])
      return total - max
Now this is a trivial example but we can imagine that instead of max and total we have some more complex calculations or even calls to some external system (a database, API etc).
When faced with a bug, I would certainly prefer the refactoring in the GP comment than one here (or the initial implementation).
I think that when inlining feels strictly necessary then there has been a problem with boundary definition, but I agree that being able to view one single execution path inlined can help to understand the implementation.
I completely agree that naming and abstractions are perhaps two most complicated problems.
> but we do not only write the code for the machine but also for another human being (that could be yourself at another moment of time of course).
That's the thing, isn't it? Various arguments have been raised all across this thread, so I just want to put a spotlight on this principle, and say:
I myself, based on my prior experience, find code with a few larger functions much more readable than code with lots of small functions. In fact, I'd like a tool that could perform the inlining described by the GP for me, whenever I'm working in a codebase that follows the "lots of tiny functions" pattern.
Perhaps this is how my brain is wired, but when I try to understand unfamiliar code, the first thing I want to know is what it actually does, step by step, at low level, and only then, how these actions are structured into helpful abstractions. I need to see the lower levels before I'm comfortable with the higher ones. That's probably why I sometimes use step-by-step debugging as an aid to understanding the code...
>the first thing I want to know is what it actually does, step by step, at low level
I feel like we might be touching on some core differences between the top-down guys and the bottom-up guys. When I read low level code, what I'm trying to do is figure out what this code accomplishes, distinct from "what it's doing". Once I figure it out and can sum up its purpose in a short slogan, I mentally paper over that section with the slogan. Essentially I am reconstructing the higher level narrative from the low level code.
And this is precisely why I advocate for more abstractions with names that describe its behavior; if the structure and the naming of the code provide me with these purposeful slogans for units of work, that's a massive win in terms of effort to comprehend the code. I wonder if how the bottom-up guys understand code is substantially different? Does your mental model of code resolve to "purposeful slogans" as stand-ins for low level code, or does your mental model mostly hold on to the low level detail even when reasoning about the high level?
> Does your mental model of code resolve to "purposeful slogans" as stand-ins for low level code,
It does!
> or does your mental model mostly hold on to the low level detail even when reasoning about the high level?
It does too!
What I mean is, I do what you described in your first paragraph - trying to see what happens at the low level, and build up some abstractions/narrative to paper it over. However, I still keep the low-level details in the back of my mind, and they inform my reasoning when working at higher levels.
> if the structure and the naming of the code provide me with these purposeful slogans for units of work, that's a massive win in terms of effort to comprehend the code
I feel the same way. I'm really grateful for good abstractions, clean structure and proper naming. But I naturally tend to not take them at face value. That is, I'll provisionally accept the code is what it says it is, but I feel much more comfortable when I can look under the hood and confirm it. This practice of spot-checking implementation saved me plenty of times from bad naming/bad abstractions, so I feel it's necessary.
Beyond that, I generally feel uncomfortable about code if I can't translate it to low-level in my head. That's the inverse of your first paragraph. When I look at high-level code, my brain naturally tries to "anchor it to reality" - translate it into something at the level of sequential C, step through it, and see if it makes sense. So for example, when I see:
foo = reduce(map(bar, fn), fn2)
My mind reads it as both:
- "Convert items in 'bar' via 'fn' and then aggregate via 'fn2'", and
- "Loop over 'bar', applying 'fn' to each element, then make an accumulator, initialize it to first element of result, loop over results, setting the accumulator to 'fn2(accumulator, element)', and return that - or equivalent but more optimized version".
To be able to construct the second implementation, I need to know how 'map' and 'reduce' actually work, at least on the "sequential C pseudocode" level. If I don't know that, if I can't construct that interpretation, then I feel very uncomfortable about the code. Like floating above the cloud cover, not knowing where I am. I can still work like this, I just feel very insecure.
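Concretely, for the map/reduce line above, the two readings might look like this in TypeScript (bar, fn and fn2 are placeholders, as in the original line):

    const bar = [1, 2, 3, 4];
    const fn = (x: number) => x * 2;
    const fn2 = (acc: number, x: number) => acc + x;

    // High-level reading: convert, then aggregate.
    const foo = bar.map(fn).reduce(fn2);

    // Low-level reading: roughly what the line above expands to.
    const mapped: number[] = [];
    for (const x of bar) {
      mapped.push(fn(x));
    }
    let acc = mapped[0];                      // accumulator starts at the first result
    for (let i = 1; i < mapped.length; i++) {
      acc = fn2(acc, mapped[i]);
    }
    const fooExpanded = acc;                  // same value as foo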
One particular example I remember: I was very uncomfortable with Prolog when I was learning it in university, until one day I read a chapter about implementing some of its core features in Lisp. When I saw how Prolog's magic works internally, it all immediately clicked, and I could suddenly reason about Prolog code quite comfortably, and express ideas at its level of abstraction.
One side benefit of having a simultaneous high- and low-level view is, I have a good feel for the lower bound of performance of any code I write. Like in the map/reduce example above: I know how map and reduce are implemented, so I know that the base complexity will be at least O(n), how the complexity of `fn` and `fn2` influences it, what the data access pattern will look like, what memory allocation will look like, etc.
Perhaps performance is where my way of looking at things comes from - I started programming because I wanted to write games, so I was performance-conscious from the start.
>If I don't know that, if I can't construct that interpretation, then I feel very uncomfortable about the code.
This is probably the biggest difference with myself. If I have a clear concept of how the abstractions operate in the context of the related abstractions and the big picture, I feel perfectly comfortable not knowing the details of how it gets done at a lower level. To me, the details just get in the way of comprehending the big picture.
A common problem with code written like that is checking the same preconditions repeatedly (or worse - never) and transforming data one way and back for no reason. I remember a bug I helped a fresh graduate who had joined our project fix. It crashed with an NPE when a list was empty. That was weird, because an empty list should cause an IndexOutOfBounds if anything, and the poor guy was stumped.
I looked at the call stack: we got a list as input, then it was changed to null if it was empty, then it was checked for size, and in yet another function it was dereferenced and indexed.
The guy was trying to fix it by adding yet another if-then-else 5 levels down the call stack from the first place it was checked for size. No doubt the next intern would have added even more checks ;)
If you don't know what happens to your data in your program you're doing voodoo programming.
There's certainly some difference in priorities between massive 1000-programmer projects where complexity must be aggressively managed and, say, a 3-person team making a simple web app. Different projects will have a different sweet spot in terms of structural complexity versus function complexity. I've seen code that, IMO, misses the sweet spot in either direction.
Sometimes there is too much code in mega-functions, poor separation of concerns and so on. These are easy mistakes to make, especially for beginners, so there are a lot of warnings against them.
Other times you have too many abstractions and too much indirection to serve any useful purpose. The ratio of named things, functional boundaries, and interface definitions to actual instructions can easily get out of hand when people dogmatically apply complexity-managing patterns to things that aren't very complex. Such over-abstraction can fall under YAGNI and waste time/$ as the code becomes slower to navigate, slower to understand in depth, and possibly slower to modify.
I think in software engineering we suffer more from the former problem than the latter problem, but the latter problem is often more frustrating because it's easier to argue for applying nifty patterns and levels of indirection than omitting them.
Just for a tangible example: If I have to iterate over a 3D data structure with an X Y and Z dimension, and use 3 nested loops to do so, is that too complex a function? I'd say no. It's at least as clear without introducing more functional boundaries, which is effort with no benefit.
Well named functions are only half (or maybe a quarter) of the battle. Function documentation is paramount in complex codebases, since documentation should describe various parameters in detail and outline any known issues, side-effects, or general points about calling the function. It's also a good idea to document when a parameter is passed to another function/method.
Yeah, it's a lot of work, but working on recent projects has really taught me the value of good documentation. Naming a function send_records_to_database is fine, but it can't tell you how it determines which database to send the records to, or how it deals with failed records (if at all), or various alternative use cases for the function. All of that must come from documentation (or from reading the source of that function).
Plus, I've found that forcing myself to write function documentation, and justify my decisions, has resulted in me putting more consideration into design. When you have to say, "this function reads <some value> from <environment variable>", then you have to spend some time considering whether future users will find that to be a sound decision.
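For example, the kind of doc comment I mean (the parameters and the environment-variable detail are made up for illustration):

    /**
     * Sends records to the orders database.
     *
     * The target database is taken from the DB_URL environment variable
     * (hypothetical). Records that fail validation are skipped and returned
     * to the caller rather than retried; nothing is written transactionally.
     *
     * @param records  Raw records; each is passed unchanged to the driver.
     * @returns        The subset of records that could not be written.
     */
    async function sendRecordsToDatabase(records: string[]): Promise<string[]> {
      const failed: string[] = [];
      for (const record of records) {
        const ok = record.length > 0; // stand-in for real validation + insert
        if (!ok) failed.push(record);
      }
      return failed;
    }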
> documentation should describe various parameters in detail and outline any known issues, side-effects, or general points about calling the function. It's also a good idea to document when a parameter is passed to another function/method.
I'd argue that writing that much documentation about a single function suggests that the function is a problem and the "send_records_to_database" example is a bad name. It's almost inevitable that the function doing so much and having so much behavior that needs documentation will, at some point, be changed and make the documentation subtly wrong, or at least incomplete.
What's the alternative? Small functions get used in other functions. Eventually you end up with a function everyone's calling that's doing the same logic, just itself calling into smaller functions to do it.
You can argue that there should be separate functions for `send_to_database` and `lock_database` and `format_data_for_database` and `handle_db_error`. But you're still going to have to document the same stuff. You're still going to have to remind people to lock the database in some situations. You're still going to have to worry about people forgetting to call one of those functions.
And eventually you're going to expose a single endpoint/interface that handles an entire database transaction including stuff like data sanitation and error handling, and then you're going to need to document that endpoint/interface in the same way that you would have needed to document the original function.
> Small functions get used in other functions. Eventually you end up with a function everyone's calling that's doing the same logic, just itself calling into smaller functions to do it.
Invert the dependencies. After many years of programming I started deliberately asking myself "hmm, what if, instead of A calling B, B were to call A?" and now it's become part of my regular design and refactoring thinking. See also Resource Acquisition Is Initialization.
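A tiny Go sketch of what that inversion can look like (names invented): instead of the exporter (A) reaching out to a specific formatter (B), the formatter is handed in behind a one-method interface.

    // Before inversion, Export would call a concrete formatter directly.
    // After inversion, Export depends only on this tiny interface and the
    // caller decides which implementation to pass in.
    type Formatter interface {
        Format(record string) string
    }

    func Export(records []string, f Formatter) []string {
        out := make([]string, 0, len(records))
        for _, r := range records {
            out = append(out, f.Format(r))
        }
        return out
    }

The dependency arrow now points at a small, stable interface rather than at a concrete implementation.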
> See also Resource Acquisition Is Initialization.
I'm not sure I follow. RAII removes the ability to accidentally forget to call destruction/initialization code and allows managing resource lifecycle. It doesn't remove the need to document how that code works, it just means you're now documenting it as part of the class/block. Freeing a resource during a destructor, locking the database during a constructor -- that stuff still has to be documented the same way it would have been documented if you put it into a single function instead of a single class.
Even with dependency inversion, you still end up eventually with the same problem I brought up:
> And eventually you're going to expose a single endpoint/interface that handles an entire database transaction including stuff like data sanitation and error handling, and then you're going to need to document that endpoint/interface in the same way that you would have needed to document the original function.
Maybe you call your functions in a different order or way, maybe you invert the dependency chain so your smaller functions are getting passed references to the bigger ones. You're still running the same amount of code, you haven't gotten rid of your documentation requirements.
Unless I'm misunderstanding what you mean by inversion of dependencies. Most of the dependency inversion systems I've seen in the wild increase the number of interfaces in code because they're trying to reduce coupling, which in turn increases the need to document those interfaces. But it's possible I've only seen a subset, or that you're doing something different.
> increase the number of interfaces in code because they're trying to reduce coupling
Yes, exactly! You want lots of interfaces. You want very small interfaces.
> which in turn increases the need to document those interfaces.
Not if the interfaces are small. For example, in the Go standard library we find two interfaces, io.Reader and io.Writer, each defining a single method. For io.Reader that method is Read(p []byte) (n int, err error), and correspondingly io.Writer has Write(p []byte) (n int, err error).
These interfaces are so small they barely need documentation.
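For reference, the definitions really are that small (doc comments trimmed), and anything implementing them composes with the rest of the io package:

    // From Go's io package:
    type Reader interface {
        Read(p []byte) (n int, err error)
    }

    type Writer interface {
        Write(p []byte) (n int, err error)
    }

    // e.g. io.Copy(dst, src) works with any Writer and any Reader.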
> These interfaces are so small they barely need documentation.
Sort of.
On the other end of the dependency inversion chain, there is some code that implements those interfaces. That code comes with various caveats that need to be documented.
Then there's the glue code, the orchestration - the part that picks a concrete thing, makes it conform to a desired interface, and passes it to the component which needs it. In order to do its job correctly, this orchestrating code needs to know all the various caveats of the concrete implementation, and all the idiosyncratic demands of the desired interface. When writing this part you may suddenly discover that your glue code is buggy, because the "trivial" interface was thoroughly undocumented.
I take a similar approach with tiny interfaces. My usual style in Java is an interface with a single method and a nested POJO (struct) called Result. Then I have a single implementation in production and another implementation for testing (mocking from the 2010s onward). Some of my longer-lived projects have hundreds of these after a few years.
Please enjoy this silly, but illustrative example!
    public interface HerdingCatsService {

        /*public static*/ final class Result {
            ...
        }

        Result herdThemCats(ABunchOfCats soMuchFun) throws Exception;
    }
Yikes, I hope I don't have to read documentation to understand how the code deals with failed records or other use cases. Good code would keep the use cases separate from send_records_to_database itself, so it would be obvious what the records were and how failure conditions are handled.
How else are you going to understand how a library works besides RTFM or RTFC? I guess the third option is copy pasta from stack overflow and hope your use case doesn't require any significant deviation?
You seriously never have to read documentation?
Must be nice, I've been balls-deep in GCP libraries and even simple things like pulling from a PubSub topic have footguns and undocumented features in certain library calls. Like subscriber.subscribe returns a future that triggers a callback function for each polled message, while subscriber.pull returns an array of messages.
That's a pretty damn obvious case where the functions should have been named "obviously" (pull_async, pull_sync), yet they weren't. And that's in a very widely used service from one of the biggest tech companies out there, written by a person who presumably passed one of the hardest interviews in the industry and gets paid in the top 1% or so of developers.
Without documentation, I would have never figured those out.
"Plus, I've found that forcing myself to write function documentation, and justify my decisions, has resulted in me putting more consideration into design."
This, this, and... this.
Sometimes, I step back after writing documentation and realise, this is a bunch of baloney. It could be much simpler, or this is a terrible decision! My point: Writing documentation is about expressing the function a second time -- the first time was code, the second time was natural language. Yeah, it's not a perfect 1:1 (see: the law in any developed country!), but it is a good heuristic.
Documentation is only useful if it is up to date and correct. I ignore documentation because I've rarely found either to be true.
There are contract/proof systems that seem like they might help; at least the tool ensures the stated contract stays correct. However, I'm not sure such systems are readable. (I've never used one in the real world.)
Oh I agree, but a person who won't take the time to update documentation after a significant change, certainly isn't going to refactor the code such that the method name matches the updated functionality. Assuming they can even update the name if they wanted to.
After all, documentation is cheap. If you're going to write a commit message, why not also update the function docs with pretty much the same thing? "Filename parameter will now use S3 if an appropriate URI is passed (i.e., filename='s3://bucket/object/path.txt'). Note: doesn't work with path-style URLs."
> The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects.
Code telling a story is a fallacy that programmers keep telling themselves and which fails to die. Code doesn't tell stories, programmers do. Code can't explain why it exists; it can't tell you about the buggy API it relies on and which makes its implementation weird and not straight-forward; it can't say when it's no longer needed.
Good names are important, but it's false that having well-chosen function and arguments names will tell a programmer everything they need to know.
Code can't tell every relevant story, but it can tell a story about how it does what it does. Code is primarily written for other programmers. Writing code in such a way that other people with some familiarity with the problem space can understand easily should be the goal. But this means telling a story to the next reader, the story of how the inputs to some functional unit are translated into its outputs or changes in state. The best way to explain this to another human is almost never the best way to explain it to a computer. But since we have to communicate with other humans and to the computer from the same code, it takes some effort to bridge the two paradigms. Having the code tell a story at the high level by way of the modules, objects and methods being called is how we bridge this gap. But there are better and worse ways to do this.
Software development is a process of translating the natural language-spec of the system into a code-spec. But you can have the natural language-spec embedded in the structure of the code to a large degree. The more, the better.
Code is not primarily written for other programmers. It's written for the computer, the primary purpose is to tell the computer what to do. Readability is desirable, but inherently secondary to that concern, and abstraction often interferes with your ability to understand and express what is actually happening on the silicon - even if it improves your ability to communicate the abstract problem. Is that worth it? It's not straightforward.
An overemphasis on readability is how you get problems like "Twitter crashing not just the tab but people's entire browser for multiple years". Silicon is hard to understand, but hiding it behind abstractions also hides the fundamental territory you're operating in. By introducing abstractions, you may make high-level problems easier to tackle, but you make it much harder to tackle low-level problems that inevitably bubble up.
A good symptom of this is that the vast majority of JS developers don't even know what a cache miss is, or how expensive it is. They don't know that linearly traversing an array is thousands of times faster than linearly traversing a (fragmented) linked list. They operate in such an abstract land that they've never had to grapple with the actual nature of the hardware they're operating on. Performance issues that arise as a result of that are a great example of readability obscuring the fundamental problem.
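If you've never measured it, a rough Go sketch of the comparison (the actual ratio depends heavily on data size and how scattered the list nodes are, so treat the exact figure above as a claim, not a constant):

    import "container/list"

    // Contiguous traversal: the prefetcher can keep upcoming elements in cache.
    func sumSlice(xs []int) int {
        total := 0
        for _, x := range xs {
            total += x
        }
        return total
    }

    // Pointer chasing: each Next() may be a cache miss if the nodes are
    // scattered across the heap.
    func sumList(l *list.List) int {
        total := 0
        for e := l.Front(); e != nil; e = e.Next() {
            total += e.Value.(int)
        }
        return total
    }

Time both over the same values (for example with testing.Benchmark) and the gap is obvious even though both are "just a linear scan".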
>Code is not primarily written for other programmers.
I should have said code should be written primarily for other programmers. There are an infinite number of ways to express the same program, and the computer is indifferent to which one it is given. But only a select few are easily understood by another human. Code should be optimized for human readability barring overriding constraints. Granted, in some contexts efficiency is more important than readability down the line. But such contexts are few and far between. Most code does not need to consider the state of the CPU cache, for example.
Joel Spolsky opened my eyes to this issue: code is read more than it is written. In theory, code is written once (then touched up for bugs). For 99.9% of its life, it is read-only. That is a strong case for writing readable code. I try to write my code so that a junior hire can read and maintain it -- from a technical view. (They might be clueless about the business logic, but that is fine.) Granted, I am not always successful in this goal!
Code should be written for debugability, not readability. I don't care if it takes someone 20 minutes to understand my algorithm, if when they understand it bugs become immediately obvious.
Most simplification added to your code obscures the underlying operations on the silicon. It's like writing a novel so a 5-year-old can read it, versus writing a novel for a 20-year-old. You want to communicate the same ideas? The kid's version is going to be hundreds of times longer. It's going to take longer to write, longer to read, and you're much more likely to make mistakes related to non-local dependencies. In fact, you're going to turn a lot of local dependencies into non-local dependencies.
Someone who's competent can digest much more complex input, so you can communicate a lot more in one go. Training wheels may make it so anyone can ride your bike but they also limit your ability to compete in, say, the Tour de France.
Also, this is a side note, but "code is read by programmers" is a bit of a platitude IMO - it's wordplay. Your code is also read by the computer a lot more than it's read by other programmers. Keep your secondary audience in mind, but write for your primary audience.
My point was not just about performance - a lot of bugs come from the introduction of abstractions to increase readability, because the underlying algorithms are obscured. Humans are just not that good at reading algorithms. Transforming operations on silicon into a form we can easily digest requires misrepresenting the problem. Every time you add an abstraction, you increase the degree of misrepresentation. You can argue that's worth it because code is read a lot, but it's still a tradeoff.
But another point worth considering is that a lot of things that make code easier to read make it much harder to rewrite, and they can also make it harder to debug.
>Transforming operations on silicon into a form we can easily digest requires misrepresenting the problem.
Do you have an example? This is entirely counter to my experience. Of course, you can misrepresent the behavior in words, but then you've just used the wrong words to describe what's going on. That's not an indictment of abstraction generally. Abstractions necessarily leave something out, but what is left out is not an assertion of absence. This is not a misrepresentation.
You don't need to assert absence, the abstraction inherently ignores that which is left out, and the reader remains ignorant of it (that's the point, in fact). The abstraction asserts that the information it captures is the most useful information, and arguably it asserts that it is the only relevant information. This may be correct, but it may also be wrong. If it's wrong, any bugs that result will be hard to solve, because the information necessary to understand how A links to B is deliberately removed in the path from A to B.
2. ---
An abstraction is a conceptual reformulation of the problem. Each layer of abstraction reformulates the problem. It's lossy compression. Each layer of abstraction is a lossy compression of a lossy compression. You want to minimise the layers because running the problem through multiple compressors loses a lot information and obscures the constraints of the fundamental problem.
3. ---
You don't know a-priori if the information your abstraction leaves out is important.
I would go further and argue: leaving out the wrong information is usually a disaster, and very hard to reverse. One way to avoid this is to avoid abstractions (not that I'd recommend it, but it's part of the tradeoff).
4. ---
Abstractions misrepresent by simplifying. For example, the fundamental problem you're solving is moving electrons through wires. There are specific problems that occur at that level of specificity which you aren't worried about once you introduce the abstraction of the CPU's ISA. For example, bit instability.
Do those problems disappear at the level of the ISA? No, you've just introduced an abstraction which hides them, and hopefully they don't bubble up. The introduction of that abstraction also added overhead, partly in order to ensure the lower-level problems don't bubble up.
Ok, let's go up a few levels. You're now using a programming language. One of your fundamental problems here is cache locality. Does your code trigger cache misses? Well, it's not always clear, and it becomes less clear the more layers of abstraction you add.
"But cache locality rarely matters," ok, but sometimes it does, and if you have 10 layers of abstraction, good luck solving that. Can you properly manage cache locality in Clojure? Not a chance. It's too abstract. What happens when your Clojure code is eventually too slow? You're fucked. The abstraction not only makes the problem hard to identify, it makes it impossible to solve.
Abstractions are about carving up the problem space into conceptual units to aid comprehension. But these abstractions do not suggest lower level details don't exist. What they do is provide sign posts from which one can navigate to the low level concern of interest. If I need to edit the code that reads from a file, ideally how the problem space is carved up allows me to zero-in on the right code by allowing me to eliminate irrelevant code from my search. It's a semantic b-tree search. Without this tower of abstractions, you have to read the entire codebase linearly to find the necessary points to edit. There's no way you can tell me this is more efficient.
Of course, not all problems are suited to this kind of conceptual division. Cross-cutting concerns are inherently the sort that cannot be isolated in a codebase. Your example of cache locality is case in point. You simply have to scan the entire codebase to find instances where your code is violating cache locality. Abstractions inherently can't help, and do hurt somewhat in the sense that there's more code to read. But the benefits overall are worth it in most contexts.
I feel like you didn't really engage with most of what I said. It sounds like you're repeating what you were taught as an undergraduate (I hope that doesn't come across as crass).
I understand the standard justifications for abstraction - I'm saying: I have found that those justifications do not take into account or accurately describe the problems that result, and they definitely underestimate the severity. Repeatedly changing the shape of a problem until it is unrecognisable results in a monster, and it's not as easy to tame as our CS professors make out.
To reiterate: Twitter, with a development budget of billions was crashing people's entire browsers for multiple years. That's not even server-side, where the real complexity is - that's the client. That kind of issue simply should not exist, and it wouldn't if it were running on a (much) shallower stack.
This is a side note, but you keep referencing the necessity of the tower. Bear in mind what happens when you increase the branching factor on a tree. You don't need a tower to segment the problem effectively. 100-item units allow segmenting one million items with three layers, and 10 billion items with five. Larger units mean much, much fewer layers.
>I feel like you didn't really engage with most of what I said.
I didn't engage point-by-point because I strongly disagree with how you characterize abstractions and going point-by-point seemed like overkill. They don't misrepresent--they carve up. If you take the carving at a given layer as all there is to know, the mistake is yours. And this isn't something I was taught in school, rather I converged to this style of programming independently. My CS program taught CS concepts, we were responsible for discovering how to construct programs on our own. Most of the students struggled to complete moderately large assignments. I found them trivial, and I attribute this to being able to find the right set of abstractions for the problem. Find the right abstractions, and the mental load of the problem is never bigger than one moderately sized functional unit. This style of development has served me very well in my career. You will be hard-pressed to talk me out of it.
>Repeatedly changing the shape of a problem until it is unrecognisable results in a monster
I can accept some truth to this in low-level/embedded contexts where the "shape" of the physical machine is a relevant factor and so hiding this shape behind a domain-specific abstraction can cause problems. But most software projects can ignore the physical machine and program to a generic Turing-machine.
>You don't need a tower to segment the problem effectively
Agreed. Finding the right size of the functional units is critical. 100 interacting units is usually way too much. The right size for a functional unit is one where you can easily inspect it for correctness and be confident there are no bugs. As the functional unit gets larger, your ability to even be confident (let alone correct) falls off a cliff. A good set of abstractions is one where (1) the state being manipulated is made obvious at all times, (2) each functional unit is sized such that it can easily be inspected for correctness, and (3) each layer provides a non-trivial increase in resolution of the solution. I am as much against useless abstractions and endless indirection as anyone.
I don't think we're going to agree on this, so I'll just say that I do grok the approach you're advocating, I used to think like you, and I've deliberately migrated away from it. I used to chunk everything into 5ish-line functions that were very clean and very carefully named, being careful to encapsulate with clean objects with clearly-defined boundaries, etc. I moved away from that consciously.
I don't work in low-level or embedded (although I descend when necessary). My current project is a desktop accessibility application.
Like, I can boil a lot of our disagreement down to this:
> 100 interacting units is usually way too much.
I don't think this is true. It's dogma.
First, they aren't all interacting. Lines in a function don't interact with every other line (although you do want to bear in mind the potential combinatorial complexity for the reader). But more specifically: 100-line functions are absolutely readable most of the time, provided they were written by someone talented. The idea that they aren't is... Wrong, in my opinion. And they give you way more implementation flexibility because they don't force you into a structure defined by clean barriers. They allow you to instead write the most natural operation given the underlying datastructure.
Granted, you often won't be able to unit-test that function as easily, but unit tests are not the panacea everyone makes out, in my opinion. Functional/integration tests are usually significantly more informative and they target relevant bugs a lot more effectively - partly because the surface you need to cover is much smaller with larger units, so you can focus your attacks.
> 100-line functions are absolutely readable most of the time, provided they were written by someone talented.
Readable, sure. Easily inspected for correctness, not in most cases. The 100 lines won't all interact, but you don't know that until you look. So much mental effort is spent navigating the 100 lines to match braces, find where variables are defined, where they are in scope, and whether they are mutated elsewhere within the function, comprehend how state changes as the lines progress, find where errors can occur and ensure they are handled within the right block and that control flow continues or exits appropriately, and so on. So little of this is actually about understanding the code's function; it's about comprehending the incidental complexity due to its linear representation. This is bad. All of this incidental complexity makes it harder to reason about the code's correctness. Most of these incidental concerns can be eliminated through the proper use of abstractions.
The fact is, code is not written linearly nor is it executed linearly. Why should it be read linearly? There is a strong conceptual mismatch between how code is represented as linear files and its intrinsic structure as a DAG. Well structured abstractions help us move the needle of representation towards the intrinsic DAG structure. This is a win for comprehension.
>Functional/integration tests are usually significantly more informative and they target relevant bugs a lot more effectively - partly because the surface you need to cover is much smaller with larger units, so you can focus your attacks.
Honestly, this characterisation doesn't ring true to me at all. I find long functions much easier to read, inspect and think about than dutifully decomposed lasagne that forces me to jump around the codebase. But also, like... Scanning for matching braces? Who is writing your code? Indentation makes that extremely clear. And your IDE should have a number of tools for quickly establishing uses of a name, and scope.
The older I get, the more I think the vast majority of issues along the lines of "long code is hard to reason about" are just incompetent programmers being let loose on the codebase. Comment rot is another one - who on earth edits code without checking and modifying the surrounding comments? That's not an inherent feature of programming to me, it's crazy. However, I absolutely see comment rot in lasagne code - because the comments aren't proximate to the algorithm.
With regards to the idea that abstractions inherently misrepresent, I'll defer to Joel Spolsky for another point:
It is, but GP's point is pretty clear. Perhaps a better way to express it would be: unlike natural languages, programming languages are insufficiently expressive for the code to tell the full story. That's why books tell stories, and code is - at best - Cliff's Notes.
Literate programming is for programs that are static and don't ever change much. It works great for those cases, though.
No, what works is the same thing that worked 20 years ago. Nothing has truly changed. You still have layers upon layers that sometimes pass something along and sometimes don't, and you sometimes wish they passed something when they don't, and vice versa.
Your argument falls apart once you need to actually debug one of these monstrosities, as often the bug itself also gets spread out over half a dozen classes and functions, and it's not obvious where to fix it.
More code, more bugs. More hidden code, more hidden bugs. There's a reason those who have worked in software development longer tend to prefer less abstraction: most of them are those who have learned from their experiences, and those who aren't are "architects" optimising for job security.
If a function is only called once, it should just be inlined; the IDE can collapse it. A descriptive comment can replace the function name. If the concern is not knowing which local variables it interacts with as it grows, it can be a lambda with an immediate call and explicit captures; if the concern is others using its leftover variables, its body can go into a plain scope. Making me jump to a different area of the code just to read it breaks up the linear flow for no gain, especially since you often have to read it anyway to make sure it doesn't have global side effects; might as well read it in the single place it is used.
If it is going to be used more than once, then make it a function (unless it is so trivial that the explicit inline version is more readable). If you are designing a public API where it may need to be overridden, count that as more than once.
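Roughly what that looks like in practice (Go here, where closures capture implicitly, so the explicit-capture point applies more to languages like C++; names invented): the comment plays the role of the function name and a bare block keeps the temporaries contained.

    func process(records []string) []string {
        // Deduplicate the records. (The comment stands in for a helper's name;
        // the bare block keeps its temporaries out of the rest of the function.)
        {
            seen := make(map[string]bool)
            kept := records[:0]
            for _, r := range records {
                if !seen[r] {
                    seen[r] = true
                    kept = append(kept, r)
                }
            }
            records = kept
        }

        // ...the rest of the linear flow continues here, still in one place...
        return records
    }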
I don't get this. This is literally what the 'one level of abstraction' rule is for.
If you can find a good name for a piece of code I don't need to read in detail, why do you want to make me skip from line 1458 to line 2345 to skip over the details of how you do that thing? And why would you add a comment on it instead of making it a function that is appropriately named and I don't have to break my reading flow to skip over a horrendously huge piece of code?
> why do you want to make me skip from line 1458 to line 2345 to skip over the
You should be using an editor that can jump to the matching brace if it is all in its own scope or lambda. There are other tools like #pragma region depending on the language. For a big function made of multiple large steps, if I only wanted to look at part of it I'd fold it at the first indent level for an overview and unfold the parts I want to look at. But when I'm reading through the whole thing or stepping through in the debugger, it is terrible to be made to jump around, and it takes much more sophisticated tooling to jump to the right places consistently in complicated languages like C++.
If there is a big long linear sequence of steps that you need to read, you just want to read it, not jump around because someone wanted to put a descriptive label over the steps. Just comment it, that's the label, not the function name, since it's only ever used once.
You would rarely want it in something like a class overview since it is only called once, but if you can make a case for needing that, or your profiling tools only work at function granularity, etc., those could be reasons.
My editor can fold/jump, no issues there. Though to be fair, vi can easily do it for languages that use {} for blocks, which Python, for example, is not. But it breaks my flow nonetheless. Instead, if I had a function, my reading flow is not broken: I can skip over the details of how "calculateX()" is achieved. All I need to know at the higher level of abstraction is that, in order for this (hypothetical) piece of code to do its thing, it needs to calculate X in that step, and I can move on and see what it does with X. It is not important how X is calculated. If calculateX() were, say, a UUIDv4 calculation, would you want to inline that, or just call "uuid.v4()" and move on to the next line that does something interesting with that UUID?
You mention debuggers too. There I can't jump as easily. I can still jump in various ways depending on the tooling, but again it is harder. With proper levels of abstraction I can either step over the function call, because I don't care _how_ calculateX() is done, or step into it and debug through it because I've narrowed the problem down to something being wrong in that function.
Maybe you've just never had a properly abstracted code base (none are perfect, of course, but there are definitely good and bad ones). Code can either make good use of these different levels of abstraction as per Clean Code, or it can throw in functions willy-nilly, badly named, with global state manipulations all over the place for good measure, side effects from functions that don't seem like they would have any, etc. If those are the only code bases you've worked with, I would understand your frustration. Still, I'd rather move towards a properly structured and abstracted code base than inline everything and land in code duplication hell.
> The best code is code you don't have to read because of well named functional boundaries.
I don't know which is harder. Explaining this about code, or about tests.
The people with no sense of DevX see nothing wrong with writing tests that fail as:
Expected undefined to be "foo"
If you make me read the tests to modify your code, I'm probably going to modify the tests. Once I modify the tests, you have no idea if the new tests still cover all of the same concerns (especially if you wrote tests like the above).
Make the test red before you make it green, so you know what the errors look like.
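The thread's examples are JavaScript, but the same point in Go terms: the difference is between an assertion whose failure says nothing and one that explains itself. A sketch using the standard testing package (ParsePort and the values are invented):

    import "testing"

    func TestParsePort(t *testing.T) {
        got, err := ParsePort("localhost:8080") // ParsePort is hypothetical
        if err != nil {
            t.Fatalf("ParsePort(%q) returned error: %v", "localhost:8080", err)
        }
        // A red run reads: ParsePort("localhost:8080") = 80, want 8080
        // instead of the useless: expected undefined to be 8080
        if got != 8080 {
            t.Errorf("ParsePort(%q) = %d, want %d", "localhost:8080", got, 8080)
        }
    }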
I have a couple of people who use a wall of boilerplate to do something that three lines of mocks could handle without coupling the tests to each other in the process.
Every time I have to add a feature I end up rewriting the tests. But you know, code coverage, so yay.
I see this with basically any Javascript test. Yes, mocking any random import is really cool and powerful, but for fuck's sake, can we just use a DI container so that the tests don't look like satan's invocation.
“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”
― C. A. R. Hoare
This quote does not scale. Software contains essential complexity because it was built to fulfill a need. You can make all of the beautiful, feature-impoverished designs you want - they won't make it to production, and I won't use them, because they don't do the thing.
If your software does not do the thing, then it's not useful, it's a piece of art - not an artifact of software engineering that is meant to fulfill a purpose.
But not everybody codes “at scale”. If you have a small, stable team, there is a lot less to worry about.
Secondly, it is often better to start with fewer abstractions and boundaries and add them when the need becomes apparent, rather than trying to remove ill-conceived boundaries and abstractions that were added earlier.
Coding at scale is not dependent on the number of people, but on the essential complexity of the problem. One can fail at a one-man project due to lack of proper abstraction with a sufficiently complex problem. Like, try to write a compiler.
> The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects. If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.
That's fine in theory and I still sort-of believe that, but in practice, I came to believe most programming languages are insufficiently expressive for this vision to be true.
Take, as a random example, this bit of C++:
//...
const auto foo = Frobnicate(bar, Quuxify);
Ok, I know what Frobnification is. I know what Quuxify does, it's defined a few lines above. From that single line, I can guess it Frobs every member of bar via Quuxify. But is bar modified? Gotta check the signature of Frobnicate! That means either getting an IDE help popup, or finding the declaration.
    template<typename Stuffs, typename Fn>
    auto Frobnicate(const std::vector<Stuffs>&, Fn)
        -> std::vector<Stuffs>;
From the signature, I can see that bar full of Bars isn't going to be modified. But then I think, is foo.size() going to be equal to bar.size()? What if bar is empty? Can Frobnicate throw an exception? Are there any special constraints on the function Fn passed to it? Does Fn have to be a funcallable thing? Can't tell that until I pop into definition of Frobnicate.
I'll omit the definition here. But now that I see it, I realize that Fn has to be a function of a very particular signature, that Fn is applied to every other element of the input vector (and not all of them, as I assumed), that the code has a bug and will crash if the input vector has less than 2 elements, and it calls three other functions that may or may not have their own restrictions on arguments, and may or may not throw an exception.
If I don't have a fully-configured IDE, I'll likely just ignore it and bear the risk. If I have, I'll routinely jump-to-definition into all these functions, quickly eye them for any potential issues... and, if I have the time, I'll put a comment on top of Frobnicate declaration, documenting everything I just learned - because holy hell, I don't want to waste my time doing the same thing next week. I would rename the function itself to include extra details, but then the name would be 100+ characters long...
Some languages are better at this than others, but my point is, until we have programming languages that can (and force you to) express the entire function contract in its signature and enforce this at compile-time, it's unsafe to assume a given function does what you think it does. Comments would be a decent workaround, if most programmers could be arsed to write them. As it is, you have to dig into the implementation of your dependencies, at least one level deep, if you want to avoid subtle bugs creeping in.
This is a good point and I agree. In fact, I think this really touches on why I always had a hard time understanding C++ code. I first learned to program with C/C++ so I have no problem writing C++, but understanding other people's code has always been much more difficult than other languages. Its facilities for abstraction were (historically) subpar, and even things like aliased variables where you have to jump to the function definition just to see if the parameter will be modified really get in the way of easy comprehension. And then the nested template definitions. You're right that how well relying on well named functional boundaries works depends on the language, and languages aren't at the point where it can be completely relied on.
This is true but having good function names will at least help you avoid going two levels deep. Or N levels. Having a vague understanding of a function call’s purpose from its name helps because you have to trim the search tree somewhere.
Though, if you’re in a nest of tiny forwarding functions, who knows how deep you’ll have to go?
> having good function names will at least help you avoid going two levels deep. Or N levels.
I agree. You have to trim your search space, or you'll never be able to do anything. What I was trying to say is, I don't know of the language that would allow you to only ever rely on function names/signatures. None that I worked could do that in practice.
> if you’re in a nest of tiny forwarding functions, who knows how deep you’ll have to go?
That's the reason I hate the "Clean Code"-ish pattern of lots of very tiny functions. I worked in a codebase written in this style, and doing anything with it felt like it was 90% jumping around function definitions, desperately trying to keep them all in my working memory.
I think part of the problem is imitating having abstraction boundaries without actually doing the work to make a clean abstraction. If you’re reading the source code of a function, the abstraction is failing.
The function calls you write will often “know too much,” depending on implementation details in a way that makes the implementation harder to change. It’s okay if you can fix all the usages when needed.
Real abstraction boundaries are expensive and tend only to be done properly out of necessity. (browser API’s, Linux kernel interface.) If you’re reading a browser implementation instead of standards docs to write code then you’re doing it wrong since other browsers, or a new version of the same browser, may be different.
Having lots of fake abstraction boundaries adds obfuscation via indirection.
One more angle: reliable & internalized abstraction vs unfamiliar one.
Java string is abstraction over bytes.
I feel I understand it intimately even though I have not read the implementation.
When I try to understand code fully (searching for root cause), and there is String.format(..), I don't dig deeper into string - I already am confident that I understand what that line does.
Browser and linux api I guess would fall into same category (for others).
An unfamiliar abstraction, even with seemingly good naming and documentation, will not give me the same level of confidence. (I trust an unfamiliar abstraction's naming & docs about the same way I trust a weather forecast.)
I think it may be harder still: typically, when writing against a third-party API, I usually consult that API's documentation. The documentation thus becomes a part of the abstraction boundary, a part that isn't expressed in code.
Oh definitely. And then there are performance considerations, where there are few guarantees and nobody even knows how to create an implementation-independent abstraction boundary.
> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.
Which is often unavoidable; many functions are insufficiently explained by those alone unless you want four-word camel-case monstrosities for names. The code of the function should be right-sized. Size and complexity need to be balanced there: simpler and easier-to-follow is sometimes larger. I work on compilers, query processors and compute engines; the cognitive load from the subject domains is bad enough without making the code arbitrarily shaped.
[edit] oh yes, what jzoch says below. Locality helps with taming the network of complexity between functions and data.
There's quite a bit of sentiment around against long names. I personally am fine with them up to about 30-35 chars or so; beyond that they start to really intrude. Glad you’re not put off by choosing function over form!
Stretch it to 36, and that is four not-all-that short words, at 4× 9 = 36 letters. Form and function! :-)
So it gets monstrous only from five words upwards or so... But still, I think I may by sheer coincidence have come up with a heuristic (that I'm somewhat proud of): The more convoluted the logic = the longer the code needed to express it ==> the longer a name it "deserves".
I think we need to recognize the limits of this concept. To reach for an analogy, both Dr. Seuss and Tolstoy wrote well but I'd much rather inherit source code that reads like 10 pages of the former over 10 pages of the latter. You could be a genuine code-naming artist but at the end of the day all I want to do is render the damn HTML.
> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.
This isn't always true in my experience. Often when I need to dig into the details of a function it's because how it works is more important than what it says it's doing. There are implementation concerns you can't fit into a function name.
Additionally, I have found that function names become outdated at about the same rate as comments do. If the common criticism of code commenting is that "comments are code you don't run", function names also fall into that category.
I don't have a universal rule on this, I think that managing code complexity is highly application-dependent, and dependent on the size of the team looking at the code, and dependent on the age of the code, and dependent on how fast the code is being iterated on and rewritten. However, in many cases I've started to find that it makes sense to inline certain logic, because you get rid of the risk of names going out of date just like code comments, and you remove any ambiguity over what the code actually does. There are some other benefits as well, but they're beyond the scope of the current conversation.
Perfect abstractions are relatively rare, so in instances where abstractions are likely to be very leaky (which happens more often than people suspect), it is better to be extremely transparent about what the code is doing, rather than hiding it behind a function name.
> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.
I'll also push back against this line of thought. The sum total of possible interactions does not decrease when you move code out into a separate function. The same number of lines of code still gets run, and each line carries the same potential to have a bug. In fact, in many cases, adding additional interfaces between components and generalizing them can increase the number of code paths and potential failure points.
If you define complexity by the sum total of possible interactions (which is itself a problematic definition, but I'll talk about that below), then complexity always increases when you factor out functions, because the interfaces, error-handling, and boilerplate code around those functions increases the number of possible interactions happening during your function call.
> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.
What I've come to understand is that complexity is relative. A solution that makes a codebase less complex for one person in an organization may make a codebase more complex for someone else in the organization who has different responsibilities over the codebase.
If you are building an application with a large team, and there are clear divisions of responsibilities, then functional boundaries are very helpful because they hide the messy details about how low-level parts of the code work.
However, if you are responsible for maintaining both the high-level and low-level parts of the same codebase, then separating that logic can sometimes make the program harder to manage, because you still have to understand how both parts of the codebase work, but now you also have to understand how the interfaces and abstractions between them fit together and what their limitations are.
In single-person projects where I'm the only person touching the codebase I do still use abstractions, but I often opt to limit the number of abstractions, and I inline code more often than I would in a larger project. This is because if I'm the only person working on the code, I need to be able to hold almost the entire codebase in my head at the same time in order to make informed architecture decisions, and managing a large number of abstractions on top of their implementations makes the code harder to reason about and increases the number of things I need to remember. This was a hard-learned lesson for me, but has made (I think) an observable difference in the quality and stability of the code I write.
>> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.
> This isn't always true in my experience. Often when I need to dig into the details of a function it's because how it works is more important than what it says it's doing. There are implementation concerns you can't fit into a function name.
Both of these things are not quite right. Yes, if you have to dig into the details of a function to understand what it does, it hasn't been explained well enough. No, the prototype cannot contain enough information to explain it. No, you shouldn't look at the implementation either - that leads to brittle code where you start to rely on the implementation behavior of a function that isn't part of the interface.
The interface and implementation of a function are separate. The former should be clearly-documented - a descriptive name is good, but you'll almost always also need docstrings/comments/other documentation - while you should rarely rely on details of the latter, because if you are, that usually means that the interface isn't defined clearly enough and/or the abstraction boundaries are in the wrong places (modulo things like looking under the hood to refactor, improve performance, etc - all abstractions are somewhat leaky, but you shouldn't be piercing them regularly).
> If you define complexity by the sum total of possible interactions (which is itself a problematic definition, but I'll talk about that below), then complexity always increases when you factor out functions, because the interfaces, error-handling, and boilerplate code around those functions increases the number of possible interactions happening during your function call.
This - this is what everyone who advocates for "small functions" doesn't understand.
> all abstractions are somewhat leaky, but you shouldn't be piercing them regularly).
I think this gets back to the old problem of "documentation is code that doesn't run." I'm not saying get rid of documentation -- I comment my code to an almost excessive degree, because I need to be able to remember in the future why I made certain decisions, I need to know what the list of tradeoffs were that went into a decision, I need to know if there are any potential bugs or edge-cases that I haven't tested for yet.
But what I am saying is that it is uncommon for an interface to be perfectly documented -- not just in code I write, but especially in 3rd-party libraries. It's not super-rare for me to need to dip into library source code to figure out behaviors that they haven't documented, or interfaces that changed between versions and aren't described anywhere. People struggle with good documentation.
Sometimes that's performance: if a 3rd-party library is slow, sometimes it's because of how it's implemented. I've run into that with d3 addons in the past, where changing how my data is formatted results in large performance gains, and only the implementation logic revealed that to me. Is that a leaky abstraction? Sure, I suppose, but it doesn't seem to be uncommon. Is it fragile? Sure, a bit, but I can't release charts that drop frames whenever they zoom just because I refuse to pay attention to the implementation code.
So I get what you're saying, but to me "abstractions shouldn't be leaking" is a bit like saying "code shouldn't have bugs", or "minor semver increases should have no breaking changes." I completely agree, but... it does, and they do. Relying on undocumented behavior is a problem, but sometimes documented behavior diverges from implementation. Sometimes the abstractions are so leaky that you don't have a choice.
And that's not just a problem with 3rd-party code, because I'm also not a perfect programmer, and sometimes my own documentation on internal methods diverges from my implementation. I try very hard not to have that happen, but I also try hard to compensate for the fact that I'm a human being who makes mistakes. I try to build systems that are less work to maintain and less prone to having their documentation decay over time. I've found that in code that I'm personally writing, it can be useful to sidestep the entire problem and inline the entire abstraction. Then I don't have to worry about fragility at all.
If you're not introducing a 3rd-party library or a separate interface for every measly 50 lines of code, and instead you just embed your single-use chunk of logic into the original function you want to call it in, then you never have to worry about whether the abstraction is leaky. That can have a tangible effect on the maintainability of your program, because it reduces the number of opportunities you have to mess up an interface or its documentation.
For perfect abstractions, I agree with you. I'm not saying get rid of all abstractions. I just think that perfect abstractions are more difficult and rarer than people suppose, and sometimes for some kinds of logic, a perfect abstraction might not exist at all.
Finally! I'm glad to hear I'm not the only one. I've gone against 'Clean Code' zealots that end up writing painfully warped abstractions in the effort to adhere to what is in this book. It's OK to duplicate code in places where the abstractions are far enough apart that the alternative is worse. I've had developers use the 'partial' feature in C# to meet Martin's length restrictions to the point where I have to look through 10-15 files to see the full class.
The examples in this post are excellent examples of the flaws in Martin's absolutism.
"How do you develop good software? First, be a good software developer. Then develop some software."
The problem with all these lists is that they require a sense of judgement that can only be learnt from experience, never from checklists. That's why Uncle Bob's advice is simultaneously so correct, and yet so dangerous with the wrong fingers on the keyboard.
That's why my advice to junior programmers is, pay attention to how you feel while working on your project - especially, when you're getting irritated. In particular:
- When you feel you're just mindlessly repeating the same thing over and over, with minor changes - there's probably a structure to your code that you're manually unrolling.
- As you spend time figuring out what a piece of code does, try to make note of what made gaining understanding hard, and what could be done to make it easier. Similarly, when modifying/extending code, or writing tests, make note of what took most effort.
- When you fix a bug, spend some time thinking what caused it, and how could the code be rewritten to make similar bugs impossible to happen (or at least very hard to introduce accidentally).
Not everything that annoys you is a problem with the code (especially when one's unfamiliar with the codebase or the tooling, the annoyance tends to come from lack of understanding). Not everything should be fixed, even if it's obviously a code smell. But I found that, when I paid attention to those negative feelings, I eventually learned ways to avoid them up front -- various heuristics that yield code which is easier to understand and has fewer bugs.
As for following advice from books, I think the best way is to test the advice given by just applying it (whether in a real or purpose-built toy project) and, again, observing whether sticking to it makes you more or less angry over time (and why). Code is an incredibly flexible medium of expression - but it's not infinitely flexible. It will push back on you when you're doing things the wrong way.
> When you feel you're just mindlessly repeating the same thing over and over, with minor changes - there's probably a structure to your code that you're manually unrolling.
Casey has a good blog post about this where he explains his compression-oriented programming, which is a progressive approach, instead of designing things up front.
I read that a while ago. It's a great article, I second the recommendation! I also love the term, "compression-oriented programming", it clicked in my mind pretty much the moment I saw it.
I like the idea of trying to over-apply a rule on a toy project, so you can get a sense of where it helps and where it doesn't. For example, "build Conway's Game of Life without any conditional branches" or "build FizzBuzz where each function can have only one line of code".
Yeah to some degree. I am in that 25 years of experience range. The software I write today looks much more like year 1 than year 12. The questions I ask in meetings I would have considered "silly questions" 10 years ago. Turns out there was a lot of common sense I was talked out of along the way.
Most people already know what makes sense. It's the absurdity of office culture, JR/SR titles, and perverse incentive that convinces them to walk in the exact opposite direction. Uncle Bob is the culmination of that absurdity. Codified instructions that are easily digested by the lemmings on their way to the cliff's edge.
The profession needs a stronger culture of apprenticeship.
In between learning the principles incorrectly from books, and learning them inefficiently at the school of hard knocks, there's a middle path of learning them from a decent mentor.
The problem is that there is a huge number of “senior” devs who only got that title for having been around (and been useless) for a long time. It is best for everyone not to have them mentoring anyone.
But otherwise I agree, it’s just hard to recognize good programmers.
Also, good programmers don't necessarily make good mentors.
But I imagine these problems aren't unique to the software industry. It can't be the case that every blacksmith was both a good blacksmith and a good mentor, and yet the system of apprenticeship successfully passed down knowledge from generation to generation for a long time. Maybe the problem is our old friend social media, and how it turns all of us into snobs with imposter syndrome, so few of us feel willing and able to be a mentor.
The right advice to give new hires, especially junior ones, is to explain to them that in order to have a good first PR they should read this Wikipedia page first:
Then when their first PR opens up, you can just point to the places where they didn't quite get it right and everyone learns faster. Mentorship helps too, but much of software is self-learning and an hour a week with a mentor doesn't change that.
I've also never agreed completely with Uncle Bob. I was an OOP zealot for maybe a decade, and I'm now I'm a Rust convert. The biggest "feature" of Rust is that is probably brought semi-functional concepts to the "OOP masses." I found that, with Rust, I spent far more time solving the problem at hand...
Instead of solving how I am going to solve the problem at hand ("Clean Coding"). What a fucking waste of time, my brain power, and my lifetime keystrokes[1].
I'm starting to see that OOP is more suited to programming literal business logic. The best use for the tool is when you actually have a "Person", "Customer" and "Employee" entities that have to follow some form of business rules.
In contradiction to your "Uncle Sassy's" rules, I'm starting to understand where "Uncle Beck" was coming from:
1. Make it work.
2. Make it right.
3. Make it fast.
The amount of understanding that you can garner from making something work leads very strongly into figuring out the best way to make it right. And you shouldn't be making anything fast unless you have a profiler and other measurements telling you to do so.
"Clean Coding" just perpetuates all the broken promises of OOP.
Simula, arguably the first (or at least one of the earliest) OOP languages, was written to simulate industrial processes, where each object was one machine (or station, or similar) in a manufacturing chain.
So, yes, it was very much designed for when you have entities interacting, each entity modeled by a class, and then having one (or more) object instantiations of that class interacting with each other.
These two fall under requirements gathering. It's so often forgotten that software has a specific purpose, a specific set of things it needs to do, and that it should be crafted with those in mind.
> 3. Understand not just where you are but where you are headed
And this is the part that breaks down so often. Because software is simultaneously so easy and so hard to change, people fall into traps both left and right, assuming some dimension of extensibility that never turns out to be important, or assuming something is totally constant when it is not.
I think the best advice here is that YAGNI, don't add functionality for extension unless your requirements gathering suggests you are going to need it. If you have experience building a thing, your spider senses will perk up. If you don't have experience building the thing, can you get some people on your team that do? Or at least ask them? If that is not possible, you want to prototype and fail fast. Be prepared to junk some code along the way.
If you start out not knowing any of these things, and also never junking any code along the way, what are the actual odds you got it right?
>These two fall under requirements gathering. It's so often forgotten that software has a specific purpose, a specific set of things it needs to do, and that it should be crafted with those in mind.
I wish more developers would actually gather requirements and check if the proposed solution actually solves whatever they are trying to do.
I think part of the problem is that often we don't use what we work on, so we focus too much in the technical details, but we forget what the user actually needs and what workflow would be better.
In my previous job, clients were always asking for changes or new features (they paid dev hours for it) and would come with a solution. But I always asked what the actual problem was, and many times there was a different solution that solved it better.
>Write code that takes the above 3 into account and make sensible decisions. When something feels wrong ... don't do it.
The problem is that people often need specifics to guide them when they're less experienced. Something that "feels wrong" is usually due to vast experience being incorporated into your subconscious aesthetic judgement. But you can't rely on your subconscious until you've had enough trials to hone your senses. Hard rules can be and often are overapplied, but that's usually better than the opposite case of someone without good judgement attempting to make unguided judgement calls.
You are right, but also I think the problem discussed in the article is that some of these hard rules are questionable. DRY for example: as a hard rule it leads to overly complex and hard to maintain code because of bad and/or useless abstractions everywhere (as illustrated in TFA). It needs either good experience to sense if they "feel good" like you say, or otherwise proven repetitions to reveal a relevant abstraction.
My last company was very into Clean Code, to the point where all new hires were expected to do a book club on it.
My personal takeaway was that there were a few good ideas, all horribly mangled. The most painful one I remember was his treatment of the Law of Demeter, which, as I recall, was so shallow that he didn't even really explain what the law was trying to accomplish. (Long story short, bounded contexts don't mean much if you're allowed to ignore the boundaries.) So most everyone who read the book came to earnestly believe that the Law of Demeter is about period-to-semicolon ratios, and proceeded to convert something like
val frobnitz = Frobnitz.builder()
  .withPixieDust()
  .withMayonnaise()
  .withTarget(worstTweetEver)
  .build();
into
var frobnitzBuilder = Frobnitz.builder();
frobnitzBuilder = frobnitzBuilder.withPixieDust();
frobnitzBuilder = frobnitzBuilder.withMayonnaise();
frobnitzBuilder = frobnitzBuilder.withTarget(worstTweetEver);
val frobnitz = frobnitzBuilder.build();
and somehow convince themselves that doing this was producing tangible business value, and congratulate themselves for substantially improving the long-term maintainability of the code.
Meanwhile, violations of the actual Law of Demeter ran rampant. They just had more semicolons.
On that note, I've never seen an explanation of Law of Demeter that made any kind of sense to me. Both the descriptions I read and the actual uses I've seen boiled down to the type of transformation you just described, which is very much pointless.
> Long story short, bounded contexts don't mean much if you're allowed to ignore the boundaries.
I'd like to read more. Do you know of a source that covers this properly?
Talk to directly linked objects and tell them what you need done, and let them deal with their linked objects. Don't assume that you know what is and always will be involved in doing something on dependent objects of the one you're interacting with.
E.g. let's say you have a Shipment object that contains information about something that is to be shipped somewhere. If you want to change the delivery address, you should consider telling the shipment to do that rather than exposing an Address and letting clients muck around with it directly, because the latter means that if you later need to add extra logic when the delivery address changes, there's a chance the change leaks all over the place (e.g. you decide to automate your customs declarations, and they need to change if the destination country changes; or delivery costs need to be updated).
You'll of course, as always, find people who take this way too far. But the general principle is pretty much just to consider where it makes sense to hide linked objects behind a special-purpose interface vs. exposing them to clients.
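A minimal TypeScript sketch of that idea (the class shapes and the customs/cost hooks are assumptions standing in for the extra logic mentioned above):

class Address {
  constructor(public street: string, public country: string) {}
}

class Shipment {
  constructor(private address: Address) {}

  // Tell the directly linked object what you need done...
  changeDeliveryAddress(newAddress: Address) {
    const countryChanged = newAddress.country !== this.address.country;
    this.address = newAddress;
    if (countryChanged) {
      this.updateCustomsDeclaration(); // ...so the extra logic stays in one place
      this.recalculateDeliveryCost();
    }
  }

  // Assumed hooks standing in for the customs/cost logic.
  private updateCustomsDeclaration() {}
  private recalculateDeliveryCost() {}
}

// Compare with exposing the Address and letting callers mutate it directly:
//   shipment.getAddress().country = "DE"; // customs/cost logic silently skipped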
If objects are allowed to talk to friends of friends, that greatly increases the level of interdependency among objects, which, in turn, increases the number of ancillary changes you might need to make in order to ensure all the code talking to some object remains compatible with its interface.
More subtly, it's also a code smell that suggests that, regardless of the presence of objects and classes, the actual structure and behavior of the code is more procedural/imperative than object-oriented. Which may or may not be a big deal - the importance of adhering to a paradigm is a decision only you can make for yourself.
> Talk to directly linked objects and tell them what you need done, and let them deal with their linked objects. Don't assume that you know what is and always will be involved in doing something on dependent objects of the one you're interacting with.
IMHO, this is one of those ideas you have to consider on its merits for each project.
My own starting point is usually that I probably don’t want to drill into the internals of an entity that are implementation details at a lower level of abstraction than the entity’s own interface. That’s breaking through the abstraction and defeating the point of having a defined interface.
However, there can also be relationships between entities on the same level, for example if we’re dealing with domain concepts that have some sort of hierarchical relationship, and then each entity might expose a link to parent or child entities as part of its interface. In that case, I find it clearer to write things like
if (entity.parent.interesting_property !== REQUIRED_VALUE) {
  abort_invalid_operation();
}
instead of
let parent_entity = entities.find(entity.parent_id);
if (parent_entity.interesting_property !== REQUIRED_VALUE) {
  abort_invalid_operation();
}
and this kind of pattern might arise often when we’re navigating the entity relationships, perhaps finding something that needs to be changed and then checking several different constraints on the overall system before allowing that change.
The “downside” of this is that we can no longer test the original entity’s interface in isolation with unit tests. However, if the logic requires navigating the relationships like this, the reality is that individual entities aren’t independent in that sense anyway, so have we really lost anything of value here?
I find that writing a test suite at the level of the overall set of entities and their relationships — which is evidently the smallest semantically meaningful data set if we need logic like the example above — works fine as an alternative to dogmatically trying to test the interface for a single entity entirely in isolation. The code for each test just sets up the store of entities and adds the specific instances and relationships I want for each test, which makes each test scenario nicely transparent. This style also ensures the tests only depend on real code, not stand-ins like mocks or stubs.
I don't think the two versions are relevant to Law of Demeter. One example has pointers/references in a strong tree and another has indexed ones, but neither is embracing LoD more or less than the other.
This would be a more relevant example:
parent_entity.children.remove(this)
vs
parent_entity.remove_child(this)
...where remove_child() would handle removing the entity from `children` directly, and also perhaps busting a cache, or notifying the other children that the hierarchy has changed, etc. etc.
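A minimal sketch of that version (hypothetical Entity class; the cache and the sibling notification are assumptions standing in for the side effects mentioned above):

class Entity {
  private children: Entity[] = [];
  private childLookupCache: Map<string, Entity> | null = null;

  constructor(public id: string) {}

  // remove_child() owns the side effects, so callers can't forget them.
  remove_child(child: Entity): void {
    this.children = this.children.filter(c => c !== child);
    this.childLookupCache = null;            // bust the (assumed) cache
    for (const c of this.children) {
      c.on_hierarchy_changed(this);          // notify the (assumed) siblings
    }
  }

  on_hierarchy_changed(_parent: Entity): void {}
}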
Going back to your original case, you _could_ argue that LoD would advise you to create a method on entity which returns the parent, but I think that would fall under encapsulation. If you did that though, you could hide the implementation detail of whether `parent` is a reference or an ID on the actual object, which is what most ORMs will do for you.
Ah, but what if children is some kind of List or Collection which can be data-bound? By Liskov's substitution principle, you ought to be able to pass it to a Collection-modifying routine and have it function correctly. If the parent must be called, the children member should be private, or else the collection should implement eventing and the two methods should have the same effect (and ideally you'd remove one).
That takes us back up to viardh's concluding remark from earlier in the thread:
> You'll of course, as always, find people who take this way too far. But the general principle is pretty much just to consider where it makes sense to hide linked objects behind a special-purpose interface vs. exposing them to clients.
I would say that if you're using a ViewModel object that will be data-bound, then you're sort of outside the realm of the Law of Demeter. It's really more meant to concern objects that implement business logic, not ones that are meant to be fairly dumb data containers.
On the other hand, if it is one that is allowed to implement business logic, then I'd say, yeah, making the children member public in the first place is violating the law. You want to keep that hidden and supply a remove_child() method instead, so that you can retain the ability to change the rules around when and how children are removed, without violating LSP in the process.
In the other branch I touched on this: iterating the children is still a likely use case, after all, so you have the option of exposing the original or making a copy, which could have perf impacts.
But honestly best to not preoptimize, I would probably do
private _children: Entity[];

get children() {
  return this._children.slice();
}
And reconsider the potential mutation risk later if the profiler says it matters.
It can be, though there are some interesting philosophical issues there.
The example that I always keep coming back to is Smalltalk, which is the only language I know of that represents pure object-oriented programming. Similar to how, for the longest time, Haskell was more-or-less the only purely functional language. Anyway, in Smalltalk you generally would not do that. You'd tell the object to iterate its own children, and give it some block (Smalltalk equivalent of an anonymous function) that tells it what to do with them.
Looping over a data structure from the outside is, if you want to get really zealous about it, more of a procedural and/or functional way of doing things.
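As a rough sketch of that distinction in TypeScript terms (Entity and eachChild are assumed names, not anything from the earlier examples):

class Entity {
  private children: Entity[] = [];

  // Internal iteration: tell the object what to do with its own children,
  // roughly what passing a block to a Smalltalk collection looks like.
  eachChild(action: (child: Entity) => void): void {
    for (const child of this.children) {
      action(child);
    }
  }
}

// External (procedural/functional) style loops from outside and needs the
// collection exposed:
//   for (const child of entity.children) { ... }
// Internal style keeps the collection private:
//   entity.eachChild(child => console.log(child));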
FWIW, I was writing JavaScript in that example, so `entity.parent` might have been implemented internally as a function anyway:
get parent() {
  return this.entities.find(this.parent_id);
}
I don’t think whether we write `entity.parent` or `entity.parent()` really matters to the argument, though.
In any case, I see what you’re getting at. Perhaps a better way of expressing the distinction I was trying to make is whether the nested object that is being accessed in a chain can be freely used without violating any invariants of the immediate object. If not, as in your example where removing a child has additional consequences, it is probably unwise to expose it directly through the immediate object’s interface.
Yes, it's a great case for making the actual `children` collection private so that mutation must go through the interface methods instead. But still, iteration over the children is a likely use case, so you are left with either exposing the original object or returning a copy of the array (potentially slower, though this might not matter depending on the situation).
That problem could potentially be solved if the programming language supports returning some form of immutable reference/proxy/cursor that allows a caller to examine the container but without being able to change anything. Unfortunately, many popular languages don’t enforce transitive immutability in that situation, so even returning an “immutable” version of the container doesn’t prevent mutation of its contained values in those languages. Score a few more points for the languages with immutability-by-default or with robust ownership semantics and support for transitive immutability…
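TypeScript is a convenient illustration of the non-transitive case: a ReadonlyArray return type stops callers from mutating the container, but not the objects inside it (Item and Container are made-up names for the sketch):

interface Item { name: string }

class Container {
  private items: Item[] = [{ name: "a" }];

  // ReadonlyArray blocks push/splice on the container itself...
  get contents(): ReadonlyArray<Item> {
    return this.items;
  }
}

const c = new Container();
// c.contents.push({ name: "b" });  // compile error: no 'push' on ReadonlyArray
c.contents[0].name = "mutated";     // ...but the contained values are still mutable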
Very true. JS has object freezing but that would affect the class's own mutations. On the other hand you could make a single copy upon mutation, freeze it, and then return the frozen one for iteration if you wanted to. Kind of going a bit far though imho.
If you really want to dig into it, perhaps a book on domain-driven design? That's where I pulled the term "bounded context" from.
My own personal oversimplification, probably grossly misleading, is that DDD is what you get when you take the Law of Demeter and run as far with it as you can.
I have Evans' book on my bookshelf. I understand it's the book on DDD, right? I tried to read it a while ago, I got through about one third of it before putting it away. Maybe I should revisit it.
Agree that the transformation described is pointless.
A more interesting statement, but I am not sure it is exactly equivalent to the law of Demeter:
Distinguish first between immutable data structures (and I'd group lambdas with them), and objects. An Object is something more than just a mutable data structure, one wants to also fold in the idea that some of these objects exist in a global namespace providing a named mutable state to the entire rest of the program. And to the extent that one thinks about threads one thinks about objects as providing a shared-state multithread story that requires synchronization, and all of that.
Given that distinction, one has a model of an application as kind of a factory floor, there are widgets (data structures and side-effect-free functions) flowing between Machines (big-o Objects) which process them, translate them, and perform side-effecting I/O and such.
Quasi-law-of-Demeter: in computing you have the power to also send a Machine down a conveyor belt, and build other Machines which can respond to that.[1] This is a tremendous power and it comes with tremendous responsibility. Think a system which has something like "Hey, we're gonna have an Application store a StorageMechanism and then in the live application we can swap out, without rebooting, a MySQL StorageMechanism for a local SQLite Storage Mechanism, or a MeticulouslyLoggedMySQL Storage Mechanism which is exactly like the MySQL one except it also logs every little thing it does to stdout. So when our application is misbehaving we can turn on logging while it's misbehaving and if those logs aren't enough we can at least sever the connection with the live database and start up a new node while we debug the old one and it thinks it's still doing some useful work."
The signature of this is being identified by this quasi-Law-of-Demeter as this "myApplication.getStorageMechanism().saveRecord(myRecord)" chaining. The problem is not the chaining itself; the idea would be just as wrong with the more verbose "StorageMechanism s = myApp.getStorageMechanism(); s.saveRecord(myRecord)" type of flow. The problem is just that this superpower is really quite powerful and YAGNI principles apply here: you probably don't need the ability for an application to hot-swap storage mechanisms this way.[2]
Bounded contexts[3] are kind of a red herring here, they are extremely handy but I would not apply the concept in this context.
1. FWIW this idea is being shamelessly stolen from Haskell where the conveyor belt model is an "Arrow" approach to computing and the idea that a machine can flow down a conveyor belt requires some extra structure, "ArrowApply", which is precisely equivalent to a Monad. So the quasi-law-of-Demeter actually says "avoid monads when possible", hah.
2. And of course you may run into an exception to it and that's fine, if you are aware of what you are doing.
3. Put simply a bounded context is the programming idea of "namespace" -- a space in which the same terms/words/names have a different meaning than in some other space -- applied to business-level speech. Domain-driven design is basically saying "the words that users use to describe the application, should also be the words that programmers use to describe it." So like in original-Twitter the posts to twitter were not called tweets, but now that this is the common name for them, DDD says "you should absolutely create a migration from the StatusUpdate table to the Tweet table, this will save you incalculable time in the long-run because a programmer may start to think of StatusUpdates as having some attributes which users don't associate with Tweets while users might think of Tweets as having other properties like 'the tweet I am replying to' which programmers don't think the StatusUpdates should have... and if you're not talking the same language then every single interaction consists of friction between you both." The reason we need bounded contexts is that your larger userbase might consist both of Accountants for whom a "Project" means ABC, and Managers for whom a "Project" means DEF, and if you try to jam those both into the Project table because they both have the same name you're gonna get hurt real bad. In turn, DDD suggests that once you can identify where those namespace boundaries seem to exist in your domain, those make good module boundaries, since modules are the namespaces of our software world. And then if say you're doing microservices, instead of pursuing say the "strong entity" level of ultra-fine granularity, "every term used in my domain deserves its own microservice," try coarser-graining it by module boundaries and bounded contexts, create a "mini-lith" rather than these "nanoservices" that each manage one term of the domain... so the wisdom goes.
I love how this is clearly a contextual recommendation.
I'm not a software developer, but a data scientist. In pandas, writing your manipulations in this chained-methods fashion is highly encouraged IMO. It's even called "pandorable" code.
The latter example (without the redundant assignments) is preferred by people who do a lot of line-by-line debugging. While most IDEs allow you to set a breakpoint in the middle of an expression, that's still more complicated and error prone than setting one for a line.
I've been on a team that outlawed method chaining specifically because it was more difficult to debug. Even though I'm more of a method-chainer myself, I have taken to writing unchained code when I am working on a larger team.
var frobnitzBuilder = Frobnitz.builder();
frobnitzBuilder.withPixieDust();
frobnitzBuilder.withMayonnaise();
frobnitzBuilder.withTarget(worstTweetEver);
val frobnitz = frobnitzBuilder.build();
...is undeniably easier to step-through debug than the chained version.
TBH, the only context where I've seen people express a strong preference for the non-chained option is under the charismatic influence of Uncle Bob's baleful stare.
Otherwise, it seems opinions typically vary between a strong preference for chaining, and rather aggressive feelings of ¯\_(ツ)_/¯
I think following some ideas in the book but ignoring others, like the ones applicable to the Law of Demeter, can be a recipe for a mess. The book is very opinionated, but if followed well I think it can produce pretty dead-simple code. But at the same time, just like with any coding, experience plays massively into how well code is written. Code can be written well when using his methods or when ignoring them, and it can be written badly when trying to follow some of his methods or when not using them at all.
>his treatment of the Law of Demeter, which, as I recall, was so shallow that he didn't even really even thoroughly explain what the law was trying to accomplish.
oof. I mean, yeah, at least explain what the main thing you’re talking about is about, right? This is a pet peeve.
> It's OK to duplicate code in places where the abstractions are far enough apart that the alternative is worse.
I don't recall where I picked it up, but the best advice I've heard on this is a "Rule of 3". You don't have a "pattern" to abstract until you reach (at least) three duplicates. ("Two is a coincidence, three is a pattern. Coincidences happen all the time.") I've found it can be a useful rule of thumb to prevent "premature abstraction" (an understandable relative of "premature optimization"). It is surprising sometimes the things you find out about the abstraction only when you reach that third duplicate (variables or control flow decisions that seemed constant in the first two places, for instance; or a higher-level idea of why the code is duplicated that isn't clear from two very far apart points but is clearer when you can "triangulate" what their center is).
I don't hate the rule of 3. But i think it's missing the point.
You want to extract common code if it's the same now, and will always be the same in the future. If it's not going to be the same and you extract it, you now have the pain of making it do two things, or splitting. But if it is going to be the same and you don't extract it, you have the risk of only updating one copy, and then having the other copy do the wrong thing.
For example, i have a program where one component gets data and writes it to files of a certain format in a certain directory, and another component reads those files and processes the data. The code for deciding where the directory is, and what the columns in the files are, must be the same, otherwise the programs cannot do their job. Even though there are only two uses of that code, it makes sense to extract them.
Once you think about it this way, you see that extraction also serves a documentation function. It says that the two call sites of the shared code are related to each other in some fundamental way.
Taking this approach, i might even extract code that is only used once! In my example, if the files contain dates, or other structured data, then it makes sense to have the matching formatting and parsing functions extracted and placed right next to each other, to highlight the fact that they are intimately related.
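A small sketch of that kind of extraction (the directory, column names, and timestamp format are made up for illustration):

// shared/archive-format.ts -- the one place that knows where the files live
// and how values are written, imported by both the writer and the reader.
export const ARCHIVE_DIR = "/var/data/archive";
export const COLUMNS = ["timestamp", "sensor_id", "value"] as const;

// The matching formatter and parser sit next to each other on purpose,
// to make it obvious that they must change together.
export function formatTimestamp(d: Date): string {
  return d.toISOString();
}

export function parseTimestamp(s: string): Date {
  return new Date(s);
}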
> You want to extract common code if it's the same now, and will always be the same in the future.
I suppose I take that as a presumption before the Rule of 3 applies. I generally assume/take for granted that all "exact duplicates" that "will always be the same in future" are going to be a single shared function anyway. The duplication I'm concerned about when I think the Rule of 3 comes into play is the duplicated but diverging. ("I need to do this thing like X does it, but…")
If it's a simple divergence, you can add a flag sometimes, but the Rule of 3 suggests that sometimes duplicating it and diverging it that second time "is just fine" (avoiding potentially "flag soup") until you have a better handle on the pattern for why you are diverging it, what abstraction you might be missing in this code.
The rule of three is a guideline or principle, not a strict rule. There's nothing about it that misses the point. If, from your experience and judgement, the code can be reused, reuse it. Don't duplicate it (copy/paste or write it a second time). If, from your experience and judgement, it oughtn't be reused, but you later see that you were wrong, refactor.
In your example, it's about modularity. The directory logic makes sense as its own module. If you wrote the code that way from the start, and had already decoupled it from the writer, then reuse is obvious. But if the code were tightly coupled (embedded in some fashion) within the writer, then rewriting it would be the obvious step, because reuse wouldn't be practical without refactoring. And unless you can see how to refactor it already, writing it the second time (or third) can help you discover the actual structure you want/need.
As people become more experienced programmers, the good ones at least already tend to use modular designs and keep things decoupled, which promotes reuse versus copy/paste. In that case, the rule of three gets used less often by them because they have fewer occurrences of real duplication.
I think the point you and a lot of other commenters make is that applying hard and fast rules without referring to context is simply wrong. Surely if all we had to do was apply the rules, somebody would have long ago written a program to write programs. ;-)
You have a point for extracting exact duplicates that you know will remain the same.
But the point of the rule of 3 remains. Humans do a horrible job of abstracting from one or two examples, and the act of encountering an abstraction makes code harder to understand.
Also known as, “AbstractMagicObjectFactoryBuilderImpl” that builds exactly one (1) factory type that generates exactly (1) object type with no more than 2 options passed into the builder and 0 options passed into the factory. :-)
The Go proverb is "A little copying is better than a little dependency." Also, don't deduplicate text just because it looks the same; deduplicate implementations when they match in both mechanism (what they do) and semantic usage (why they do it). Sometimes the same thing is done with different intents, which can naturally diverge, and the premature deduplication becomes debt.
I'm coming to think that the rule of three is important within a fairly constrained context, but that other principle is worthwhile when you're working across contexts.
For example, when I did work at a microservices shop, I was deeply dissatisfied with the way the shared utility library influenced our code. A lot of what was in there was fairly throw-away and would not have been difficult to copy/paste, even to four or more different locations. And the shared nature of the library meant that any change to it was quite expensive. Technically, maybe, but, more importantly, socially. Any change to some corner of the library needed to be negotiated with every other team that was using that part of the library. The risk of the discussion spiraling away into an interminable series of bikesheddy meetings was always hiding in the shadows. So, if it was possible to leave the library function unchanged and get what you needed with a hack, teams tended to choose the hack. The effects of this phenomenon accumulated, over time, to create quite a mess.
An old senior colleague of mine used to insist that if i added a script to the project, i had to document it on the wiki. So i just didn't add my scripts to the project.
I'd argue if the code was "fairly throw-away" it probably did not meet the "Rule of 3" by the time it was included in the shared library in the first place.
At a previous company, there was a Clean Code OOP zealot. I heard him discussing with another colleague about the need to split up a function because it was too long (it was 10 lines). I said, from the sidelines, "yes, because nothing enhances readability like splitting a 10 line function into 10, 1-line functions". He didn't realize I was being sarcastic and nodded in agreement that it would be much better that way.
There seems to be a lot of overlap between the Clean Coders and the Neo Coders [0]. I wish we could get rid of both.
[0] People who strive for "The One" architecture that will allow any change no matter what. Seriously, abstraction out the wazoo!
Honestly. If you're getting data from a bar code scanner and you think, "we should handle the case where we get data from a hot air balloon!" because ... what if?, you should retire.
The problem is that `partial` in C# should never even have been considered as a "solution" to write small, maintainable classes. AFAIK partial was introduced for code-behind files, not to structure human written code.
Anyways, you are not alone with that experience - a common mistake I see, no matter what language or framework, is that people fall for the fallacy that "separation into files" is the same as "separation of concerns".
Seriously? That's an abuse of partial and just a way of following the rules without actually following them. That code must have been fun to navigate...
Many years ago I worked on a project that had a hard “no hard coded values” rule, as requested by the customer. The team routinely wrote the equivalent to
const char_a = "a";
And I couldn’t get my manager to understand why this was a problem.
Clearly it is still a hardcoded value! It fails the requirement. Instead there should be a factory that loads a file that reads in the "a" to the variable, nested down under 6 layers of abstraction spread across a dozen files.
True, but we can both agree that this is a better constant than "a". Much better job security in that code.. unless you get fired for writing it, that is
What, in the code base, does char_a mean? Is it append_command? Is it used in a..z for all lowercase? Maybe an all_lowercase is needed instead.
I know that it's an "a". I don't know why that matters to the codebase. Now, there are cases where it is obvious even to beginners, and I'm fine with magic characters there, but I've seen cases where people were confused as to why 1000 was in a `for(int i = 0; i<1000; i++)`. Is 1000 arbitrary in that case, or is it based on a defunct business requirement from 2008? Will it break if we change it to 5000 because our computers are faster now?
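A tiny sketch of the difference (the constant name and its stated rationale are invented for illustration):

declare function processNextRecord(): void; // assumed to exist elsewhere

// What does 1000 mean here? Arbitrary? A 2008 business rule? Safe to raise?
for (let i = 0; i < 1000; i++) {
  processNextRecord();
}

// The name carries the "why"; the comment carries the constraint.
const MAX_RECORDS_PER_BATCH = 1000; // downstream API rejects larger batches
for (let i = 0; i < MAX_RECORDS_PER_BATCH; i++) {
  processNextRecord();
}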
> where I have to look through 10-15 files to see the full class
The Magento 2 codebase is a good example of this. It's both well written and horrible at the same time. Everything is so spread out into constituent technical components, that the code loses the "narrative" of what's going on.
I started OOP in '96 and I was never able to wrap my head around the code these "Clean Code" zealots produced.
Case in point: Bob Martin's "Video Store" example.
My best guess is that clean code, to them, was as little code on the screen as possible, not necessarily "intention revealing" code either; instead everything is abstracted until it looks like it does nothing.
I have had the experience of trying to understand how a feature in a C++ project worked (both Audacity and Aegisub I think) only to find that I actually could not find where anything was implemented, because everything was just a glue that called another piece of glue.
Also sat in their IRC channel for months and the lead developer was constantly discussing how he'd refactor it to be cleaner but never seemed to add code that did something.
SOLID code is a very misleading name for a technique that seems to shred the code into confetti.
I personally don't feel all that productive spending like half my time just navigating the code rather than actually reading it, but maybe it's just me.
> I personally don't feel all that productive spending like half my time just navigating the code rather than actually reading it
Had this problem at a previous job - main dev before I got there was extremely into the whole "small methods" cult and you'd regularly need to open 5-10 files just to track down what something did. Made life - and the code - a nightmare.
What I find most surprising is that most developers do try to obey the "rules": code containing even minuscule duplication must be DRYed, and everyone agrees that code must be clean and professional.
Yet it is never enough; bugs keep showing up, and the stuff that was written by others is always "bad".
I'm starting to think that 'Uncle Bob' and 'Clean Code' zealotry is actually harmful, because it prevents people from taking two steps back and thinking about what they are doing: making microservices/components/classes/functions that end up never being reused, and treating DRY as the holy grail.
Personally I am YAGNI > DRY and a lot of times you are not going to need small functions or magic abstractions.
I think the problem is not the book itself, but people thinking that all the rules apply to all the code, all the time. A length restriction is interesting because it makes you consider whether you should split your function into more than one, as you might be doing too much in one place. Now, if splitting will make things worse, then just don't.
In C# and .NET specifically, we find ourselves having a plethora of services when they are "human-readable" and short.
A service has 3 "helper" services it calls, which may, in turn have helper services, or worse, depend on a shared repo project.
The only solution I have found is to move these helpers into their own project, and mark the helpers as internal. This achieves 2 things:
1. The "sub-services" are not confused as stand-alone and only the "main/parent" service can be called.
2. The "module" can now be deployed independently if micro-services ever become a necessity.
I would like feedback on this approach. I do honestly think files over 100 lines long are unreadable trash, and we have achieved a lot by re-using modular services.
We are 1.5 years into a project and our code re-use is sky-rocketing, which allows us to keep errors low.
Of course, a lot of dependencies also make testing difficult, but allow easier mocks if there are no globals.
>I would like feedback on this approach. I do honestly think files over 100 lines long are unreadable trash
Dunno if this is the feedback you are after, but I would try to not be such an absolutist. There is no reason that a great 100 line long file becomes unreadable trash if you add one line.
I mean feedback on a way to abstract away helper services. As far as file length goes, I realize that this is subjective, and the 100-line number is pulled out of thin air, but extremely long files are generally difficult to read and context gets lost.
Putting shared context in another file makes it harder to read, though. Files should be big enough to represent some reasonable context: more complicated things that necessarily create a big shared context want bigger files, and simpler things want smaller files.
A thing that can be perfectly encapsulated in a 1000 line file with a small clean interface is much much better than splitting that up into 20 files 100 lines each calling each other.
What do you say to convince someone? It’s tricky to review a large carefully abstracted PR that introduces a bunch of new logic and config with something like: “just copy paste lol”
Yes, it's the first working implementation, written before good boundaries are known. After a while it becomes familiar, and natural conceptual boundaries arise that lead to 'factoring'; it shouldn't require 'refactoring' because you prematurely guessed the wrong boundaries.
I'm all for the 100-200 line working version--can't say I've had a 500. I did once have a single SQL query that was about 2 full pages pushing the limits of DB2 (needed multiple PTFs just to execute it)--the size was largely from heuristic scope reductions. In the end, it did something in about 3 minutes that had no previous solution.
> WET (Write everything twice), figure out the abstraction after you need something a 3rd time
so much this. it is _much_ easier to refactor copy-pasta code than to untangle a mess of "clean code abstractions" for things that aren't even needed _once_. Premature Abstraction is the biggest problem in my eyes.
I think where DRY trips people up is when you have what I call "incidental repetition". Basically, two bits of logic seem to do exactly the same thing, but the contexts are slightly different. So you make a nice abstraction that works well until you need to modify the functionality in one context and not the other...
If you mostly deduplicate by writing functions, fixing this problem is never very hard: duplicate the function, rename it and change the call-site.
The interesting thing about DRY is that opinions about it seem to depend on what project you’ve worked on most recently: I inherited a codebase written by people skeptical of DRY, and we had a lot of bugs that resulted from “essential duplication”. Other people inherit code written by “architecture astronauts”, and assume that the grass is greener on the WET side of the fence.
Personally, having been in both situations, I’d almost always prefer to untangle a bad abstraction rather than maintain a WET codebase.
Conversely, fixing duplication is never hard. Just move the duplicated code into a function. Going in reverse can be much tougher if the function has become an abstraction, where you have to figure out what path each function call was actually taking.
Or, put another way: I'd much rather deal with duplication than with coupling problems.
The problem with duplication is that it is hard to spot and fix. Converting liters to ml or quarts isn't hard, but the factors are different, and there are also other units. If you only do a few of these it isn't a big deal, but if you suddenly realize that you have tons of different conversions scattered around and you really need to implement a good unit conversion system, it will be really hard to retrofit everything. Note that even if you have literToMl, literToQuart, and mileToKm functions, retrofitting the right system will be hard. (Where I work we have gone through four different designs of an uber unit-system module before we actually got all the requirements right, and each transition was a major problem.)
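To illustrate, a sketch of the difference between scattered conversions and one place that owns the factors (the module shape and names are assumptions; the factors themselves are the standard ones):

// Scattered one-off conversions: every call site bakes in its own factor,
// and adding a new unit (or fixing a factor) means hunting them all down.
const literToMl = (l: number) => l * 1000;
const literToQuart = (l: number) => l * 1.05669;

// One module that owns the knowledge: every unit maps to a common base unit.
const ML_PER_UNIT: Record<string, number> = {
  ml: 1,
  liter: 1000,
  quart: 946.353,
};

function convertVolume(value: number, from: string, to: string): number {
  return (value * ML_PER_UNIT[from]) / ML_PER_UNIT[to];
}

// convertVolume(2, "liter", "quart") ~= 2.11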
> Conversely, fixing duplication is never hard. Just move the duplicated code into a function.
I think the single biggest factor determining the difficulty of a code change is the size of the codebase. Codebases with a lot of duplication are larger, and the scale itself makes them harder to deal with. You may not even realize when duplication exists because it may be spread throughout the code. It may be difficult to tell which code is a duplicate, which is different for arbitrary reasons, and which is different in subtle but important ways.
Once you get to a huge sprawl of code that has a ton of mostly-pointless repetition, it is a nightmare to tame it. I would much rather be handed a smaller but more intertwined codebase that only says something once.
I think the opposite is true. Bad abstractions can be automatically removed with copy/paste and constant propagation. N pieces of code that are mostly the same but have subtle differences have no automatic way to combine them into a single function.
N pieces of code that are mostly the same but have subtle differences isn't repetition and probably shouldn't be combined into a single function, especially if the process of doing so it non-obvious.
> N pieces of code that are mostly the same but have subtle differences isn’t repetition
Often, they are.
IME, a very common pattern is divergent copypasta where, because there is no connection once the copying occurs, a bug is noticed and fixed in one place and not in others, later noticed separately and fixed a slightly different way in some of the others; in still others, a different thing that needs doing in the same function gets inserted in between parts of the copypasta, and so on. It's still essentially duplication; moreover, it's still the same logical function being performed in different places, just in slightly different ways, creating a significant multiplier on maintenance cost. That, and not literal code duplication in and of itself, is the actual problem addressed by DRY, which is explicitly not about duplication of code but about a single source of truth for knowledge: "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system." Divergent implementations of the same logical function are different representations of the same knowledge.
Often it is. Earlier in this thread I used the example of a unit system - one example of where there can be a ton of repetition to remove, but there are fundamental differences between liters and meters that make removing the duplication hard if you didn't realize upfront that you had a problem. Once you get it right, converting meters to chains isn't that hard (I wonder how many reading this even know chain was a unit of measure - much less have any idea how big it is), but there are a ton of choices to make it work.
I think they mean if the code does the same thing but has syntax differences. Variables are named differently, one uses a for loop while the other uses functional list operations, etc.
You never know if these subtle differences were intentional to begin with. It might have been the same once upon a time, but then during an emergency outage someone makes a quick fix in one place but forgets to update all other copies of this code.
Repeating what others already mentioned, often it can be the same thing but written in a slightly different way. Even basic stuff like string formatting vs string concatenation can make it non-obvious that two pieces of code are copies.
The issue I have is that duplication is a coupling problem, but there’s no evidence in the coupled parts of the code that the coupling exists. It can be ok on a single-developer project or if confined to a single file, but my experience is that it’s a major contributor to unreliability, especially when all the original authors of the code have left.
If you find a bug in the duplicated part and have no idea that it was actually duplicated (or even if you do, where are the copies?), you still have multiple lurking bugs around.
Fixing duplication is rarely that easy, because by nature duplicated code will drift over time even when it shouldn't have. So it's technically not "duplicate" anymore, even if the copies are supposed to do the exact same thing.
Fixing a bad abstraction is only hard because there's some weird culture about not tearing down abstractions. Rip them apart and toss them on the heap. It's a million times easier than finding duplicate code that has inadvertently drifted apart over time.
There are certain abstractions that can cause real problems: XML-driven DI, bad ORMs, code generators, etc. But, in general, I agree: people are generally too unwilling to refactor aggressively.
Yep. Re-use is difficult. When you overdo it you cause a whole new set of problems. I once watched a person write a Python decorator that tried to unify HTTP header-based caching with ad hoc application caching done using memcached.
When I asked exactly what they were trying to accomplish, they kept saying "they didn't want caching in two places". I think anyone with experience can see that these items are unrelated beyond both having the word "cache" in them.
This is what was actually hanging them up ... the language. Particularly the word "cache". I've seen this trap walked into over and over. Where a person tries to "unify" logic simply because two sets of logic contain some of the same words.
So much this. Especially early in project's life when you aren't sure what the contexts/data model/etc really need to be, so much logic looks the same. It becomes so hard to untangle later.
Then you copy the function and modify one place? I don't get what is so hard about it. The IDE will even tell you about all the places where the function is called; there is no way to miss one.
So long as it remains identical. Refactoring almost identical code requires lots of extremely detailed staring to determine whether or not two things are subtly different. Especially if you don't have good test coverage to start with.
I personally love playing the game called "reconcile these very-important-but-utterly-without-tests sequences of gnarly regexes that were years ago copy-pasted in seven places and since then refactored, but only in three of the seven places, and in separate efforts each time".
There's a problem with being overly zealous. It's entirely possible to write bad code, either being overly dry or copy paste galore. I think we are prone to these zealous rules because they are concrete. We want an "objective" measure to judge whether something is good or not.
DRY and WET are terms often used as objective measures of implementations, but that doesn't mean that they are rock solid foundations. What does it mean for something to be "repeated"? Without claiming to have TheGreatAnswer™, some things come to mind.
Chaining methods can be very expressive, easy to follow and maintain. They also lead to a lot of repetition. In an effort to be "DRY", some might embark on a misguided effort to combine them. Maybe start replacing
`map(x => x).reduce((y, z) => v)`
with
`mapReduce(x => x, (y,z) => v)`
This would be a bad idea, also known as Suck™.
But there may equally be situations where consolidation makes sense. For example, if we're in an ORM helper class and we're always querying the database for an object like so
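Something along these lines, perhaps (a hypothetical sketch only; the Database interface, the soft-delete column, and the findActiveById helper name are all assumptions):

interface Database {
  query(sql: string, params: unknown[]): Promise<unknown[]>;
}

// Repeated at many call sites:
//   db.query("SELECT * FROM widgets WHERE id = ? AND deleted_at IS NULL", [id]);

// Consolidated once, because every caller really does mean the same thing:
async function findActiveById(db: Database, table: string, id: string) {
  return db.query(
    `SELECT * FROM ${table} WHERE id = ? AND deleted_at IS NULL`,
    [id],
  );
}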
I totally agree, assuming that there will be time to get to the second pass of the "write everything twice" approach... Some of my least favorite refactoring work has been on older code that was liberally copy-pasted by well-intentioned developers expecting a chance to come back through later but who never got the chance. All too often the winds of corporate decision making will change and send attention elsewhere at the wrong moment, and all those copy-pasted bits will slowly but surely drift apart as unfamiliar new developers come through making small tweaks.
I worked on a small team with a very "code bro" culture. Not toxic, but definitely making non-PC jokes. We would often say "Ask your doctor about Premature Abstractuation" or "Bad news, dr. says this code has abstractual dysfunction" in code reviews when someone would build an AbstractFactoryFactoryTemplateConstructor for a one-off item.
When we got absorbed by a larger team and were going to have to "merge" our code review / git history into a larger org's repos, we learned that a sister team had gotten in big trouble with the language cops in HR when they discovered similar language in their git commit history. This brings back memories of my team panicked over trying to rewrite a huge amount of git history and code review stuff to sanitize our language before we were caught too.
Those aren't particularly bad, but dick jokes have no place in a professional workplace imo. I'm so very weary of it. And yes, in the past I was part of the problem. Then I grew up.
It wasn't just these phrases, but we took them out to be safe (thinking that them being jokes about male... "performance", like from pharma ads on TV with an old man throwing a football through a tire swing).
The biggest thing is that we used non-PC language like "retarded" very casually, not to mention casually swearing in commit messages e.g. "un-fucking up my prior commit". Our sister team got in trouble for "swears in the git commit history", so we wanted to get ahead of that if possible.
In a healthy company culture, we'd just say "okay we'll stop using these terms", but the effort was made to erase their existence because this was a company where non-engineering people (e.g. how well managers and HR liked you) was a big factor in getting promoted.
Once I realized how messed up that whole situation was, I left as fast as I could.
> it is _much_ easier to refactor copy pasta code,
It's easy to refactor if it's nondivergent copypasta and you do it everywhere it is used, not later than the third iteration.
If the refactoring gets delayed, the code diverges because different bugs are noticed and fixed (or the same bug is noticed and fixed in different ways) in different iterations; there may be dozens of instances across the code base (possibly in different projects, because it was copy-pasted across projects rather than refactored into a reusable library); and the code has in many cases gotten intermixed with code addressing other concerns...
OMG. This is exactly my experience after trying to write code first for 10+ years. (Yes, I am a terrible [car] driver, and a totally average programmer!)
"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." - Linus Torvalds
He wasn't kidding!
And the bit about "immutable structures". I doubted it for infinity-number-of-years ("oh, waste of memory/malloc!"). Then suddenly your code needs to be multi-threaded. Now, immutable structures look genius!
I think this ties in to something I've been thinking, though it might be project specific.
Good code should be written to be easy to delete.
'Clever' abstractions work against this. We should be less precious about our code and realise it will probably need to change beyond all recognition multiple times. Code should do things simply so the consequences of deleting it are immediately obvious. I think your recommendations fit with this.
Aligns with my current meta-principle, which is that good code is malleable (easily modified, which includes deletion). A lot of design principles simply describe this principle from different angles. Readable code is easy to modify because you can understand it. Terse code is more easily modified because there’s less of it (unless you’ve sacrificed readability for terseness). SRP limits the scope of changes and thus enhances modifiability. Code with tests is easier to modify because you can refactor with less fear. Immutability makes code easier to modify because you don’t have to worry about state changes affecting disparate parts of the program.
Etc... etc...
(Not saying that this is the only quality of good code or that you won’t have to trade some of the above for performance or whatnot at times).
The unpleasant implication of this is that code has a tendency towards becoming worse over time. Because the code that is good enough to be easy to delete or change is, and the code that is too bad to be touched remains.
> * WET (Write everything twice), figure out the abstraction after you need something a 3rd time
There are two opposite situations. One is when several things are viewed as one thing while they're actually different (too much abstraction), and another, where a thing is viewed as different things, when it's actually a single one (when code is just copied over).
In my experience, the best way to solve this is to better analyse and understand the requirements. Do these two pieces of code look the same because they actually mean the same thing in terms of the product? Or do they just happen to look the same at this particular moment in time, and can continue to develop in completely different directions as the product grows?
I read Clean Code in 2010 and trying out and applying some of the principles really helped to make my code more maintainable.
Now over 10 years later I have come to realize that you cannot set up too many rules on how to structure and write code. It is like forcing all authors to apply the same writing style or all artists to draw their paintings with the exact same technique.
With that analogy in mind, I think that one of the biggest contributors to messy code is having a lot of developers, all with different preferences, working in the same code base. Just imagine having 100 different writers trying to write a book, this is the challenge we are trying to address.
I'm not sure that's really true. Any publication with 100 different writers almost certainly has some kind of style guide that they all have to follow.
If it's really abstractable it shouldn't be difficult to refactor. It should literally be a substitution. If it's not, then you have varied cases that you'd have to go back and tinker with the abstraction to support.
It's a similar design and planning principle to building sidewalks. You have buildings but you don't know exactly the best paths between everything and how to correctly path things out. You can come up with your own design but people will end up ignoring them if they don't fit their needs. Ultimately, you put some obvious direct connection side walks and then wait to see the paths people take. You've now established where you need connections and how they need to be formed.
I do a lot of prototyping work, and if I had to sit down and think out a clean abstraction every time I wanted to get to a functional prototype, I'd never have a functional prototype. Plus, I'd waste a lot of cognitive capacity on an abstraction instead of on solving the problem my code is addressing. It's best, from my experience, to save that time and write messy code but tuck in budget to refactor later (the key is you have to actually refactor later, not just say you will).
Once you've built your prototype, iterated on it several times, and had people break designs forcing hacked-out solutions, and you now have something you don't touch often, you usually know what most of the product/service needs to look like. You then abstract that out and get 80-90% of what you need, if there's real demand.
The expanded features beyond that can be costly if they require significant redesign but at that point, you hopefully have a stable enough product it can warrant continued investment to refactor. If it doesn't, you saved yourself a lot of time and energy worrying trying to create a good abstract design that tends to fail multiple times at early stages. There's a balance point of knowing when to build technical debt, when to pay it off, and when to nullify it.
Again, the critical trick is that you have to actually pay off the tech debt if that time comes. The product investor can't look bright-eyed and linearly extrapolate progress so far, thinking they saved a boatload of capital; they have to understand shortcuts were taken, and that the rationale was to fix them if serious money came along, or chuck them in the bin if not.
> If it's really abstractable it shouldn't be difficult to refactor. It should literally be a substitution.
This is overstated. Not all abstractions are obvious substitutions. To elaborate: languages vary in their syntax, typing, and scope. So what might be an 'easy' substitution in one language might not be easy in another.
WET is great until JSON token parsing breaks and a junior dev fixes it in one place and then I am fixing the same exact problem somewhere else and moving it into a shared file. If it's the exact same functionality, move it into a service/helper.
How do you deal with other colleagues that have all the energy and time to push for these practices and I feel makes things worse than the current state?
Explain that the wrong abstraction makes code more complicated than copy-paste and that before you can start factoring out common code you need to be sure the relationship is fundamental and not coincidental.
> figure out the abstraction after you need something a 3rd time
That's still too much of a "rule".
Whenever I feel (or know) two functions are similar, the factors that determine if I should merge them:
- I see significant benefit to doing so, usually the benefit of a utility that saves writing the same thing in the future, or debugging the same/similar code repeatedly.
- How likely the code is to diverge. Sometimes I just mark things for de-duping, but leave it around a while to see if one of the functions change.
- The function is big enough it cannot just be in-lined where it is called, and the benefit of de-duplication is not outweighed by added complexity to the call stack.
Documentation is rarely adequately maintained, and nothing enforces that it stay accurate and maintained.
Comments in code can lie (they're not functional); can be misplaced (in most languages, they're not attached to the code they document in any enforced way); are most-frequently used to describe things that wouldn't require documenting if they were just named properly; are often little more than noise. Code comments should be exceedingly-rare, and only used to describe exception situations or logic that can't be made more clear through the use of better identifiers or better-composed functions.
External documentation is usually out-of-sight, out-of-mind. Over time, it diverges from reality, to the point that it's usually misleading or wrong. It's not visible in the code (and this isn't an argument in favor of in-code comments). Maintaining it is a burden. There's no agreed-upon standard for how to present or navigate it.
The best way to document things is to name identifiers well, write functions that are well-composed and small enough to understand, stick to single-responsibility principles.
API documentation is important and valuable, especially when your IDE can provide it readily at the point of use. Whenever possible, it should be part of the source code in a formal way, using annotations or other mechanisms tied to the code it describes. I wish more languages would formally include annotation mechanisms for this specific use case.
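JSDoc/TSDoc is a decent approximation of this in the JavaScript/TypeScript world - the doc comment lives on the declaration and the IDE surfaces it at the call site. The function below is purely hypothetical, just to show the shape:

    /**
     * Charges the customer's default payment method and records the invoice.
     *
     * @param customerId - internal customer id, not the payment provider's id
     * @param amountCents - charge amount in cents; must be positive
     * @returns the id of the created invoice
     * @throws if the customer has no payment method on file
     */
    export async function chargeCustomer(customerId: string, amountCents: number): Promise<string> {
      // ... implementation elided ...
      return "inv_123";
    }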
> wouldn't require documenting if they were just named properly
I mean not really. Having decent names for things tells us what you are doing, but not why.
> only used to describe exception situations
Again, just imagine if that wasn't the case. Imagine people had the empathy to actually do a professional job?
> The best way to document things is to name identifiers well, write functions that are well-composed and small enough to understand, stick to single-responsibility principles.
No, that's the best way to code.
The best way to document is to imagine you are a new developer on the codebase, and to put any information they should know where they need it. Like your code, you need to test your documentation.
I know you don't _like_ documenting, but that's not the point. It's about being professional. Just imagine if an engineer didn't _like_ doing certification of their work - they'd lose their license. Sure, you could work it out later, but that's not the point. You are paid to be a professional, not a magician.
> I know you don't _like_ documenting, but thats not the point.
On the contrary, I like writing documentation. What I can't stand is unnecessary documentation; documentation of obvious things; documentation as a replacement for self-documenting code; documentation for a vague ill-defined future use-case that probably won't exist (YAGNI); and many of the other documentation failures I've already mentioned. It has nothing to do with my professionalism. It has to do with wasting time and energy and using it as a crutch in place of more valuable knowledge transfer mechanisms.
It's not. Good quality codebases do exist, and generally are fairly well self-documented.
Typings, well-named functions and variables, consistent APIs and once in a while, a comment to summarize rationales and point out edge cases, and a README file to explain the setup and entry points.
That is generally all you need in terms of documentation. I've dealt with plenty of codebases like these and they're a delight to learn.
Boto3 is massive. The only reason it's tolerable is that it has a large amount of accurate documentation. On the face of it, it's an utter arse, because the "client" section of the library is deeply unpythonic (not in the "oh, it's not pretty" sense; it's barely OO and uses capitals for argument names).
If that was "self documenting" I wouldn't be using it.
Take Flask, for example: it's usable because it has good documentation and it's well thought out.
But, as I keep having to tell people, good function names and typing are not documentation. They tell you what, but not why. You don't need documentation if you are working on the same codebase day in, day out. But if you have to pick it up again after six months, you'll wish you had properly documented it.
I get that you are proud that you don't need documentation. I don't need documentation either - I want it because it makes my life 500% easier.
What are you talking about? Boto3 is extremely well documented, and is absolutely not self-documenting. It's a mostly autogenerated API, and a horrible example to use for this.
I don't think you understand what my post is trying to convey. Documentation isn't a sin, it's great when necessary. In the case of boto, it's obviously necessary. In the case of Flask (a framework where you're not going to read the code), it's obviously necessary.
You'll notice, if you read through the Flask codebase, that it's only documented in terms of user-facing functions, which end up in the official end-user documentation. In that sense, the codebase itself is in fact mostly self-documenting.
There's a difference between end-user documentation and developer-facing documentation.
One of the most difficult to argue comments in code reviews: “let’s make it generic in case we need it some place else”.
First of all, the chances that we need it some place else aren't exactly high, unless you are writing a library or code explicitly designed to be shared. And even if such a need arises, the chances of getting it right generalizing from one example are slim.
Regarding the book though, I have participated in one of the workshops with the author, and he seemed to be in favor of WET and against "architecting" levels of abstraction before having concrete examples.
You can disagree over what exactly is clean code. But you will learn to distinguish what dirty code is when you try to maintain it.
As a person who has had to maintain dirty code over the years, hearing someone say dirty code doesn't exist is really frustrating. No one wants to clean up your code, but doing it is better than allowing the code to become unmaintainable; that's why people bring up that book. If you do not care about what clean code is, stop making life difficult for the people who do.
> I think the point is that following that book does not really lead to Clean Code.
And that is why I started saying "You can disagree over what exactly is clean code". Different projects have different requirements. Working on some casual website is not the same as working on a pacemaker firmware.
Yeah, I don't know that we're disagreeing here. For me, there's definitely good code and bad code. I read OP more as saying perfect is the enemy of good, not that all code is good code.
I'd say if you want good code in your working environment, set up a process for requiring it. Automated testing and deployment, and code reviews would be the way to go. Reading Clean Code won't get you there.
I think it's more that clean code doesn't exist because there's no objective measure of it (and the services that claim there is are just as dangerous as Clean Code, the book); anyone can come along and find something about the code that could be tidied up. And legacy is legacy - it's a different problem space from the one a greenfield project exists in.
> As a person that has to maintain dirty code
This is a strange credential to present and then use as a basis to be offended. Are you saying that you have dirty code and have to keep it dirty?
The counterintuitive aspect of this problem, which acts as a trap for less pragmatic people, is that an objective measure is not always necessary.
Let's say you are feeding a dog. You can estimate what amount of food is too little and what amount is too much... but now some jerk comes around and tells you they're going to feed the dog next time. You agree.
You check the plate, and there's very little food in it. So you say: "hey, you should add more dog food".
Then, the jerk reacts by putting an excessive amount of food in it, just to fuck with you. Then you say "that's too much food!"... So then the jerk reacts saying "you should tell me exactly how many dog pellets I should put on the plate".
Have you ever had to count dog pellets individually to feed a dog? no. You haven't, you have never done it, yet you have fed dogs for years without problems just using a good enough estimate of how much a dog can eat.
Just to please the fucking jerk, you take the approximate amount of dog food you regularly feed the dog every day, count the pellets, and say: "there, you fuck, 153 dog pellets".
But the jerk is not happy yet. Just to really fuck with you, the guy will challenge you and say: "so what happens if I feed the dog 152 pellets, or 154... see? you are full of shit". Then you have to explain to the jerk that 153 was never important; what's important is the approximate amount of dog food. But the jerk doesn't think that way - the jerk wants a fucking exact number so they can keep fighting you...
Then the jerk will probably say that a dog pellet is not a proper unit of mass, and then the jerk will say that nutrients are not equally distributed in dog pellets, and the bullshit will go on and on and on.
And if you are ever done discussing the optimal number of pellets then there will be another discussion about the frequency in which you feed the dog, and you will probably end up talking about spacetime and atomic clocks and the NTP protocol and relativity and inertial frames just to please the jerk whose objective is just to waste your time until you give up trying to enforce good practices.
And this is how the anti-maintainability jerks operate, by getting into endless debates about how an objective measure of things is required, when in reality, it's not fucking needed and it never was. Estimation is fine.
Just like you won't feed a dog a fucking barrel of dog food you won't create a function that is 5 thousand lines of code long because it's equally nonsensical.
So in the end, what do you do? you say: this many lines of code in a function is too fucking much, don't write functions this long. Why? because I say so, end of debate, fuck it.
There's a formal concept for this in philosophy, but that's up to you to figure out.
This is why I never talk software architecture with anyone on the internet anymore. It’s always an intellectual race to the bottom with someone who insists on a rigid process for things that are absolutely obvious to anyone with experience and genuine concern for the state of the codebase.
In this case the cost of counting food pellets versus the benefit of precision; but in software generally it is the high cost of (accurate) estimating versus the benefit of just getting on with it.
It is also, to some degree, the impossibility of estimating, given that the task is handed to someone who is an expert at writing code, not at estimating. Coding expertise gives insight into how long coding tasks take, but that is the least critical component of estimating a task. Writing the code and seeing if it works is the best estimate; sometimes the prototype works the first time, but never always.
Despite approaching this from completely different angles I think we're roughly pointing in the same direction.
A pragmatist would allow for a strong baseline of good practices and automated rules, with some room for discretion where it counts. That way, no one gets bogged down in exhausting debates, and people who work better with strict constraints can avoid ambiguity, but you can still bend the rules with some justification. I don't like all of the rules, but it's easier to fall back on the tooling than it is to hash out another method-length debate.
As far as maintaining code goes, I've had the (mis)fortune of dealing with overly abstracted messes and un-architected spaghetti. I'm not sure which is worse, but what I rarely have to deal with now is all the crap that is auto-formatted away.
If I'm involved in any debate, it's around architecture and figuring out how to write better code by properly understanding what we're trying to achieve, rather than trying to lay down Clean Code style design patterns as if they were Rubocop rules.
Well, if there is a problem, you can identify it, discuss it and address it, instead of ignoring it entirely which is what some people do. Do what works for your team.
Excessive abstractions, leaky abstractions and such are indeed a problem. The principle of least power applies there (given a choice of solutions, use the least flexible/powerful one that solves your problem).
This is what I'm doing even while creating new code. There are a few instances, for example, where the "execution" is down to a single argument - one of "activate", "reactivate" and "deactivate". But I've made them into three distinct, separate code paths so that I can work error and feedback messages into everything without adding complexity via arguments.
I mean yes it's more verbose, BUT it's also super clear and obvious what things do, and they do not leak the underlying implementation.
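Roughly like this (illustrative names, not my real code) - one entry point per operation instead of a single run(mode) function:

    function activateAccount(id: string): string {
      // activation-specific checks and feedback live here, and only here
      return `Account ${id} activated. A confirmation email is on its way.`;
    }

    function reactivateAccount(id: string): string {
      return `Welcome back! Account ${id} has been reactivated.`;
    }

    function deactivateAccount(id: string): string {
      return `Account ${id} deactivated. You can reactivate it at any time.`;
    }

Each path can grow its own error handling and messages without threading extra flags through a shared implementation.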
I’ve never heard the term WET before but that’s exactly what I do.
The other key thing, I think, is not to over-engineer abstractions you don't need yet, but to try to leave 'seams' where it's obvious how to tease the code apart if you need to start building abstractions.
My experience recently interviewing a number of consultants with only a few years of experience was that the more they mumbled about clean code, the less they knew what they were doing.
That's not what WET means. The GP is saying you shouldn't isolate logic in a function until you've cut-and-pasted the logic in at least two places and plan to do so in a third.
> Put logic closest to where it needs to live (feature folders)
Can you say more about this?
I think I may have stumbled on a similar insight myself. In a side project (a roguelike game), I've been experimenting with a design that treats features as first-class, composable design units. Here is a list of the subfolder called game-features in the source tree:
actions
collision
control
death
destructibility
game-feature.lisp
hearing
kinematics
log
lore
package.lisp
rendering
sight
simulation
transform
An extract from the docstring of the entire game-feature package:
"A Game Feature is responsible for providing components, events,
event handlers, queries and other utilities implementing a given
aspect of the game. It's primarily a organization tool for gameplay code.
Each individual Game Feature is represented by a class inheriting
from `SAAT/GF:GAME-FEATURE'. To make use of a Game Feature,
an object of such class should be created, preferably in a
system description (see `SAAT/DI').
This way, all rules of the game are determined by a collection of
Game Features loaded in a given game.
Game Features may depend on other Game Features; this is represented
through dependencies of their classes."
The project is still very much work-in-progress (procrastinating on HN doesn't leave me much time to work on it), and most of the above features are nowhere near completion, but I found the design to be mostly sound. Each game feature provides code that implements its own concerns, and exports various functions and data structures for other game features to use. This is an inversion of traditional design, and is more similar to the ECS pattern, except I bucket all conceptually related things in one place. ECS Components and Systems, utility code, event definitions, etc. that implement a single conceptual game aspect live in the same folder. Inter-feature dependencies are made explicit, and game "superstructure" is designed to allow GFs to wire themselves into appropriate places in the event loop, datastore, etc. - so in game startup code, I just declare which features I want to have enabled.
(Each feature also gets its set of integration tests that use synthetic scenarios to verify a particular aspect of the game works as I want it to.)
One negative side effect of this design is that the execution order of handlers for any given event is hard to determine from code. That's because, to have game features easily compose, GFs can request particular ordering themselves (e.g. "death" can demand its event handler to be executed after "destructibility" but before "log") - so at startup, I get an ordering preference graph that I reconcile and linearize (via topological sorting). I work around this and related issues by adding debug utilities - e.g. some extra code that can, after game startup, generate a PlantUML/GraphViz picture of all events, event handlers, and their ordering.
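The reconciliation step itself is nothing exotic - roughly the following, sketched in TypeScript rather than the Common Lisp the project actually uses, with handler names invented:

    type Ordering = { before: string; after: string };  // "before" must run before "after"

    function linearizeHandlers(handlers: string[], prefs: Ordering[]): string[] {
      const indegree = new Map<string, number>();
      const edges = new Map<string, string[]>();
      for (const h of handlers) { indegree.set(h, 0); edges.set(h, []); }
      for (const { before, after } of prefs) {
        edges.get(before)!.push(after);
        indegree.set(after, indegree.get(after)! + 1);
      }
      // Kahn's algorithm: repeatedly emit a handler with no remaining predecessors
      const ready = handlers.filter(h => indegree.get(h) === 0);
      const order: string[] = [];
      while (ready.length > 0) {
        const h = ready.shift()!;
        order.push(h);
        for (const next of edges.get(h)!) {
          indegree.set(next, indegree.get(next)! - 1);
          if (indegree.get(next) === 0) ready.push(next);
        }
      }
      if (order.length !== handlers.length) throw new Error("cyclic ordering preferences");
      return order;
    }

    // e.g. "death" wants to run after "destructibility" but before "log":
    // linearizeHandlers(["log", "death", "destructibility"],
    //   [{ before: "destructibility", after: "death" }, { before: "death", after: "log" }]);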
(I apologize for a long comment, it's a bit of work I always wanted to talk about with someone, but never got around to. The source of the game isn't public right now because I'm afraid of airing my hot garbage code.)
I'd be interested in how you attempt this. Is it all in lisp?
It might be hard to integrate related things, e.g. physical simulation/kinematics <- related to collisions, and maybe sight/hearing <- related to rendering; Which is all great if information flows one way, as a tree, but maybe complicated if it's a graph with intercommunication.
I thought about this before, and figured maybe the design could be initially very loose (and inefficient), but then a constraint-solver could wire things up as needed, i.e. pre-calculate concerns/dependencies.
Another idea, since you mention "logs" as a GF: AOP - using "join points" to declaratively annotate code. This better handles code that is less of a "module" (appropriate for functions and libraries) and more of a cross-cutting "aspect", like logging. This can also get hairy though: could you treat "(bad-path) exception handling" as an aspect? What about "security"?
> I'd be interested in how you attempt this. Is it all in lisp?
Yes, it's all in Common Lisp.
> It might be hard to integrate related things, e.g. physical simulation/kinematics <- related to collisions, and maybe sight/hearing <- related to rendering;
It is, and I'm cheating a bit here. One simplification is that I'm writing a primarily text-based roguelike, so I don't have to bother with a lot of issues common to real-time 3D games. I can pick and choose the level of details I want to go (e.g. whether to handle collisions at a tile granularity, or to introduce sub-tile coordinates and maybe even some kind of object shape representation).
> Which is all great if information flows one way, as a tree, but maybe complicated if it's a graph with intercommunication.
The overall simulation architecture I'm exploring in this project is strongly event-based. The "game frame" is basically pulling events from a queue and executing them until the queue is empty, at which point the frame is considered concluded, and simulation time advances forward. It doesn't use a fixed time step - instead, when a simulation frame starts, the code looks through "actions" scheduled for game "actors" to find the one that will complete soonest, and moves the simulation clock to the completion time of that action. Then the action completion handler fires, which is the proper start of frame - completion handler will queue some events, and handlers of those events will queue those events, and the code just keeps processing events until the queue empties again, completing the simulation frame.
Structure-wise, simulation GF defines the concept of "start frame" and "end frame" (as events), "game clock" (as query) and ability to shift it (as event handler), but it's the actions GF that contains the computation of next action time. So, simulation GF knows how to tell and move time, but actions GF tells it where to move it to.
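In sketch form, the frame structure is roughly this (TypeScript here for brevity, with made-up types; the real thing is Common Lisp):

    interface ScheduledAction { actor: string; completesAt: number; onComplete: () => void; }

    let clock = 0;
    const scheduled: ScheduledAction[] = [];
    const eventQueue: Array<() => void> = [];

    function runFrame(): void {
      if (scheduled.length === 0) return;
      // advance the clock straight to the soonest-completing action, not by a fixed step
      const next = scheduled.reduce((a, b) => (a.completesAt <= b.completesAt ? a : b));
      scheduled.splice(scheduled.indexOf(next), 1);
      clock = next.completesAt;
      next.onComplete();                // proper start of frame: this enqueues events...
      while (eventQueue.length > 0) {   // ...and the frame ends once the queue drains
        eventQueue.shift()!();
      }
    }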
This is all supported by an overcomplicated event loop that lets GFs provide hints for handler ordering, but also separates each event handling process into four chains: pre-commit, commit, post-commit and abort. Pre-commit handlers fire first, filling event structure with data and performing validation. Then, commit handlers apply direct consequences of event to the real world - they alter the gameplay state. Then, post-commit handlers process further consequences of an event "actually happening". Alternatively, abort handlers process situations when an event was rejected during earlier chains. All of them can enqueue further events to be processed this frame.
So, for example, when you fire a gun, pre-commit handlers will ensure you're able to do it, and reject the event if you can't. If the event is rejected, abort chain will handle informing you that you failed to fire. Otherwise, the commit handlers will instantiate an appropriate projectile. Post-commit handlers may spawn events related to the weapon being fired, such as informing nearby enemies about the discharge.
This means that e.g. if I want to implement "ammunition" feature, I can make an appropriate GF that attaches a pre-commit handler to fire event - checking if you have bullets left and rejecting the event if you don't (events rejected in pre-commit stage are considered to "have never happened"), and a post-commit handler on the same event to decrease your ammo count. The GF is also responsible for defining appropriate components that store ammo count, so that (in classical ECS style) your "gun" entity can use it to keep track of ammunition. It also provides code for querying the current count, for other GFs that may care about it for some reason (and the UI rendering code).
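A stripped-down sketch of the four chains and the ammunition hook (again TypeScript with invented names; the real implementation is Common Lisp and considerably richer):

    type Handler<E> = (ev: E) => void;

    class EventChains<E> {
      preCommit: Array<(ev: E) => boolean> = [];   // validate/fill; return false to reject
      commit: Array<Handler<E>> = [];              // direct consequences: alter gameplay state
      postCommit: Array<Handler<E>> = [];          // further consequences of it "actually happening"
      abort: Array<Handler<E>> = [];               // runs only if the event was rejected

      dispatch(ev: E): void {
        for (const check of this.preCommit) {
          if (!check(ev)) {                        // rejected: the event "never happened"
            for (const h of this.abort) h(ev);
            return;
          }
        }
        for (const h of this.commit) h(ev);
        for (const h of this.postCommit) h(ev);
      }
    }

    // the "ammunition" feature only hooks into the existing "fire weapon" event:
    interface FireEvent { shooter: { ammo: number } }
    const fireWeapon = new EventChains<FireEvent>();
    fireWeapon.preCommit.push(ev => ev.shooter.ammo > 0);        // out of bullets -> reject
    fireWeapon.postCommit.push(ev => { ev.shooter.ammo -= 1; }); // spend a round once it actually fired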
> I thought about this before, and figured maybe the design could be initially very loose (and inefficient), but then a constraint-solver could wire things up as needed, i.e. pre-calculate concerns/dependencies.
I'm halfway there and I could easily do this, but for now opted against it, on the "premature optimization" grounds. That is, since all event handlers are registered when the actual game starts, I "resolve constraints" (read: convert sorting preferences into a DAG and toposort it; it's dumb and very much incorrect, but works well enough for now) and linearize handlers - so that during gameplay, each event handler chain (e.g. "fire weapon", pre-commit) is just a sequence of callbacks executed in order. It would be trivial to take such sequence, generate a function that executes them one by one (with the loop unrolled), and compile it down to metal - Common Lisp lets you do stuff like that - but I don't need it right now.
> Another idea, since you mention "logs" as a GF
FWIW, logs GF is implementing the user-facing logs typical for roguelike games - i.e. the bit that says "You open the big steel doors". Diagnostic logs I do classically.
> AOP - using "join points" to declaratively annotate code
In a way, my weird multi-chain event loop is a reimplementation of AOP. Method combinations in Common Lisp are conceptually similar too, but I'm not making big use of them in game feature-related code.
> This can also get hairy though: could you treat "(bad-path) exception handling" as an aspect? What about "security"?
Yeah, I'm not sure if this pattern would work for these - particularly in full-AOP, "inject anything anywhere" mode. I haven't tried it. Perhaps, with adequate tooling support, it's workable? Common Lisp is definitely not a language to try this in, though - it's too dynamic, so tooling would not be able to reliably tell you about arbitrary pointcuts.
In my case, I restricted the "feature-oriented design" to just game features - I feel it has a chance of working out, because in my mind, quite a lot of gameplay mechanics are conceptually additive. This project is my attempt at experimentally verifying if one could actually make a working game this way.
I've gone down roads similar to this. Long story short - the architecture solves for a lower priority class of problem, w/r to games, so it doesn't pay a great dividend, and you add a combination of boilerplate and dynamism that slows down development.
Your top issue in the runtime game loop is always with concurrency and synchronization logic - e.g. A spawns before B, if A's hitbox overlaps with B, is the first frame that a collision event occurs the frame of spawning or one frame after? That's the kind of issue that is hard to catch, occurs not often, and often has some kind of catastrophic impact if handled wrongly. But the actual effect of the event is usually a one-liner like "set a stun timer" - there is nothing to test with respect to the event itself! The perceived behavior is intimately coupled to when its processing occurs and when the effects are "felt" elsewhere in the loop - everything's tied to some kind of clock, whether it's the CPU clock, the rendered frame, turn-taking, or an abstracted timer. These kinds of bugs are a matter of bad specification, rather than bad implementation, so they resist automated testing mightily.
The most straightforward solution is, failing pure functions, to write more inline code (there is a John Carmack posting on inline code that I often use as a reference point). Enforce a static order of events as often as possible. Then debugging is always a matter of "does A happen before B?" It's there in the source code, and you don't need tooling to spot the issue.
The other part of this is, how do you load and initialize the scene? And that's a data problem that does call for more complex dependency management - but again, most games will aim to solve it statically in the build process of the game's assets, and reduce the amount of game state being serialized to save games, reducing the complexity surface of everything related to saves(versioning, corruption, etc). With a roguelike there is more of an impetus to build a lot of dynamic assets(dungeon maps, item placements etc.) which leads to a larger serialization footprint. But ultimately the focus of all of this is on getting the data to a place where you can bring it back up and run queries on it, and that's the kind of thing where you could theoretically use SQLite and have a very flexible runtime data model with a robust query system - but fully exploiting it wouldn't have the level of performance that's expected for a game.
Now, where can your system make sense? Where the game loop is actually dynamic in its function - i.e. modding APIs. But this tends to be a thing you approach gradually and grudgingly, because modders aren't any better at solving concurrency bugs and they are less incentivized to play nice with other mods, so they will always default to hacking in something that stomps the state, creating intermittent race conditions. So in practice you are likely to just have specific feature points where an API can exist(e.g. add a new "on hit" behavior that conditionally changes the one-liner), and those might impose some generalized concurrency logic.
The other thing that might help is to have a language that actually understands that you want to do this decoupling and has the tooling built in to do constraint logic programming and enforce the "musts" and "cannots" at source level. I don't know of a language that really addresses this well for the use case of game loops - it entails having a whole general-purpose language already and then also this other feature. Big project.
I've been taking the approach instead of aiming to develop "little languages" that compose well for certain kinds of features - e.g. instead of programming a finite state machine by hand for each type of NPC, devise a subcategory of state machines that I could describe as a one-liner, with chunks of fixed-function behavior and a bit of programmability. Instead of a universal graphics system, have various programmable painter systems that can manipulate cursors or selections to describe an image. The concurrency stays mostly static, but the little languages drive the dynamic behavior, and because they are small, they are easy to provide some tooling for.
Thanks for the detailed evaluation. I'll start by reiterating that the project is a typical tile-based roguelike, so some of the concerns you mention in the second paragraph don't apply. Everything runs sequentially and deterministically - though the actual order of execution may not be apparent from the code itself. I mitigate it to an extent by adding introspection features, like e.g. code that dumps PlantUML graphs showing the actual order of execution of event handlers, or their relationship with events (e.g. which handlers can send what subsequent events).
I'll also add that this is an experimental hobby project, used to explore various programming techniques and architecture ideas, so I don't care about most constraints under which commercial game studios operate.
> The perceived behavior is intimately coupled to when its processing occurs and when the effects are "felt" elsewhere in the loop - everything's tied to some kind of clock, whether it's the CPU clock, the rendered frame, turn-taking, or an abstracted timer. These kinds of bugs are a matter of bad specification, rather than bad implementation, so they resist automated testing mightily.
Since day one of the project, the core feature was to be able to run headless automated gameplay tests. That is, input and output are isolated by design. Every "game feature" (GF) I develop comes with automated tests; each such test starts up a minimal game core with fake (or null) input and output, the GF under test, and all GFs on which it depends, and then executes faked scenarios. So far, at least for minor things, it works out OK. I expect I might hit a wall when there are enough interacting GFs that I won't be able to correctly map desired scenarios to actual event execution orders. We'll see what happens when I reach that point.
> that's the kind of thing where you could theoretically use SQLite and have a very flexible runtime data model with a robust query system - but fully exploiting it wouldn't have the level of performance that's expected for a game.
Funny you should mention that.
The other big weird thing about this project is that it uses SQLite for runtime game state. That is, entities are database rows, components are database tables, and the canonical gameplay state at any given point is stored in an in-memory SQLite database. This makes saving/loading a non-issue - I just use SQLite's Backup API to dump the game state to disk, and then read it back.
Performance-wise, I tested this approach extensively up front by timing artificial reads and writes in expected patterns, including simulating a situation in which I pull map and entity data in a given range to render them on screen. SQLite turned out to be much faster than I expected. On my machine, I could easily get 60 FPS out of that with minimal optimization work - but it did consume most of the frame time. Given that I'm writing an ASCII-style, turn(ish)-based roguelike, I don't actually need to query all that data 60 times per second, so this is quite acceptable performance - but I wouldn't try it with a real-time game.
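Mechanically, "entities are rows, components are tables" boils down to something like the sketch below - here with better-sqlite3 in TypeScript, with an invented schema; the actual project drives SQLite from Common Lisp:

    import Database from "better-sqlite3";

    const db = new Database(":memory:");  // the canonical game state lives in memory
    db.exec(`
      CREATE TABLE entity   (id INTEGER PRIMARY KEY);
      CREATE TABLE position (entity_id INTEGER PRIMARY KEY, x INTEGER, y INTEGER);
      CREATE TABLE health   (entity_id INTEGER PRIMARY KEY, hp INTEGER);
    `);

    const id = db.prepare("INSERT INTO entity DEFAULT VALUES").run().lastInsertRowid;
    db.prepare("INSERT INTO position VALUES (?, ?, ?)").run(id, 3, 7);
    db.prepare("INSERT INTO health VALUES (?, ?)").run(id, 10);

    // "pull map and entity data in a given range to render on screen"
    const visible = db
      .prepare("SELECT entity_id, x, y FROM position WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ?")
      .all(0, 20, 0, 20);

    // saving is just a backup of the live database (better-sqlite3 wraps SQLite's Backup API)
    db.backup("savegame.db").then(() => console.log("saved;", visible.length, "entities visible"));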
> The other thing that might help is to have a language that actually understands that you want to do this decoupling and has the tooling built in to do constraint logic programming and enforce the "musts" and "cannots" at source level. I don't know of a language that really addresses this well for the use case of game loops - it entails having a whole general-purpose language already and then also this other feature. Big project.
Or a Lisp project. While I currently do constraint resolution at runtime, it's not hard to move it to compile time. I just didn't bother with it yet. Nice thing about Common Lisp is that the distinction between "compilation/loading" and "runtime" is somewhat arbitrary - any code I can execute in the latter, I can execute in the former. If I have a function that resolves constraints on some data structure and returns a sequence, and that data structure can be completely known at compile time, it's trivial to have the function execute during compilation instead.
> I've been taking the approach instead of aiming to develop "little languages" that compose well for certain kinds of features
I'm interested in learning more about the languages you developed - e.g. how your FSMs are encoded, and what that "programmable painter system" looks like. In my project, I do little languages too (in fact, the aforementioned "game features" are a DSL themselves) - Lisp makes it very easy to just create new DSLs on the fly, and to some extent they inherit the tooling used to power the "host" language.
Sounds like you may be getting close to an ideal result, at least for this project! :) Nice on the use of SQLite - I agree that it's right in the ballpark of usability if you're just occasionally editing or doing simple turn-taking.
When you create gameplay tests, one of the major limitations is in testing data. Many games end up with "playground" levels that validate the major game mechanics because they have no easier way of specifying what is, in essence, a data bug like "jump height is too short to cross gap". Now, of course you can engineer some kind of test, but it starts to become either a reiteration of the data (useless) or an AI programming problem that could be inverted into "give me the set of values that have solutions fitting these constraints" (which then isn't really a "test" but a redefinition of the medium, in the same way that a procedural level is a "solution" for a valid level).
It's this latter point that forms the basis of many of the "little languages". If you hardcode the constraints, then more of the data resides in a sweet spot by default and the runtime is dealing with less generality, so it also becomes easier to validate. One of my favorite examples of this is the light style language in Quake 1: https://quakewiki.org/wiki/lightstyle
It's just a short character string that sequences some brightness changes in a linear scale at a fixed rate. So it's "data," but it's not data encoded in something bulky like a bunch of floating point values. It's of precisely the granularity demanded by the problem, and much easier to edit as a result.
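To make the "precisely the granularity demanded" point concrete, interpreting such a string takes only a few lines. A TypeScript sketch, assuming Quake's conventions of roughly 10 steps per second with 'a' dark, 'm' normal, and 'z' brightest:

    function lightstyleBrightness(style: string, timeSeconds: number): number {
      const index = Math.floor(timeSeconds * 10) % style.length;   // styles advance ~10 steps/second
      const level = style.charCodeAt(index) - "a".charCodeAt(0);   // 'a' = 0 ... 'z' = 25
      return level / 12;                                           // normalized so 'm' (12) ~= 1.0
    }

    // e.g. the classic flicker style:
    // lightstyleBrightness("mmnmmommommnonmmonqnmmo", t)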
A short step up from that is something like MML: https://en.wikipedia.org/wiki/Music_Macro_Language - now there is a mostly-trivial parsing step involved, but again, it's "to the point" - it assumes features around scale and rhythm that allow it to be compact. You can actually do better than MML by encoding an assumption of "playing in key" and "key change" - then you can barf nearly any sequence of scale degrees into the keyboard and it'll be inoffensive, if not great music. Likewise, you could define rhythm in terms of rhythmic textures over time - sparse, held, arpeggiated, etc. - and so not really have to define the music note by note, making it easy to add new arrangements.
With AI, a similar thing can apply - define a tighter structure and the simpler thing falls out. A lot of game AI FSMs will follow a general pattern of "run this sequenced behavior unless something of a higher priority interrupts it". So encode the sequence, then hardcode the interruption modes, then figure out if they need to be parameterized into e.g. multiple sequences, if they need to retain a memory scratchpad and resume, etc. A lot of the headache of generalizing AI is in discovering needs for new scratchpads, if just to do something like a cooldown timer on a behavior or to retain a target destination. It means that your memory allocation per entity is dependent on how smart they have to be, which depends on the AI's program. It's not so bad if you are in something as dynamic as a Lisp, but problematic in the typical usages of ECS where part of the point is to systematize memory allocation.
With painting what you're looking for is a structuring metaphor for classes of images. Most systems of illustration have structuring metaphors of some kind specifically for defining proportions - they start with simple ratios and primitive shapes, and then use those as the construction lines for more detailed elements which subdivide the shapes again with another set of ratios. This is the conceptual basis of the common "6-8 heads of height" tip used in figure drawing - and there are systems of figure drawing which get really specific about what shapes to draw and how.
If I encode such a system, I therefore have a method of automatic illustration that starts not with the actual "drawing" of anything, but with a proportion specification creating construction lines, which are then an input to a styling system that defines how to connect the lines or superimpose other shapes.
Something I've been experimenting with to get those lines is a system that works by interpolation of coordinate transforms that aggregate a Cartesian and polar system together - e.g. I want to say "interpolate along this Cartesian grid, after it's been rotated 45 degrees". It can also perform interpolation between two entirely different coordinate systems (e.g. the same grid at two different scales). I haven't touched it in a while, but it generates interesting abstract animations, and I have a vision for turning that into a system for specifying character mannequins, textures, etc. Right now it's too complex to be a good one-liner system, but I could get there by building tighter abstractions on it in the same way as the music system.
My main thing this year has been a binary format that lets me break away from text encodings as the base medium, and instead have more precise, richer data types as the base cell type. This has gone through a lot of iteration to test various things I might want to encode and forms I could encode them in. The key thing I've hit on is to encode with a lot of "slack" in the system - each "cell" of data is 16 bytes; half of that is a header that contains information about how to render it, its ID in a listing of user-named types, bitflags defined by the type, a "feature" value (an enumeration defined by the type), and a version field which could be used for various editing features. The other half is a value, which could be a 64-bit value, 8 bytes, a string fragment, etc. - the rendering-information field indicates what it is in those general terms, but the definite meaning is named by the user type.
The goal is to use this as a groundwork to define the little languages further - rather than relying on "just text" and sophisticated parsing, the parse is trivialized by being able to define richer symbols - and then I can provide more sophisticated editing and visualization more easily. Of course, I'm placing a bet on either having a general-purpose editor for it that's worthwhile, or being able to define custom editors that trivialize editing, neither of which might pan out; there's a case for either "just text" or "just spreadsheets" still beating my system. But I'd like to try it, since I think this way of structuring the bits is likely to be more long-run sustainable.
I don't think I can respond properly and ask the many more questions I have in a HN thread that's already this aged, so if you'd like to talk more, feel free to hit me up (my contact details are in my profile).
I think one should always be careful not to throw out the baby with the bathwater[0].
Do I force myself to follow every single crazy rule in Clean Code? Heck no. Some of them I don't agree with. But do I find myself to be a better coder because of what I learned from Bob Martin? Heck yes. Most of the points he makes are insightful and I apply them daily in my job.
Being a professional means learning from many sources and knowing that there's something to learn from each of them- and some things to ignore from each of them. It means trying the things the book recommends, and judging the results yourself.
So I'm going to keep recommending Clean Code to new developers, in the hopes that they can learn the good bits, and learn to ignore the bad bits. Because so far, I haven't found a book with more good bits (from my perspective) and fewer bad bits (from my perspective).
I'm completely with you here. Until I read Clean Code, I could never really figure out why my personal projects were so unreadable a year later but the code I read at work was so much better even though it was 8 years old. Sure, I probably took things too far for a while and made my functions too small, or my classes too small, or was too nitpicky on code reviews. But I started to actually think about where I should break a function. I realized that a good name could eliminate almost all the comments I had been writing before, leaving only comments that were actually needed. And as I learned how to break down my code, I was forced to learn how to use my IDE to navigate around. All of a sudden new files weren't a big deal, and that opened up a whole new set of changes that I could start making.
I see a lot of people in here acting like all the advice in Clean Code is obviously true or obviously false, and they claim to know how to write a better book. But, like you, I will continue to recommend Clean Code to new developers on my team. It's the fastest way (that I've found so far, though I see other suggestions in the comments here) to get someone to transition from writing "homework code" (that never has to be revisited) to writing maintainable code. Obviously, there are bad parts of Clean Code, but if that new dev is on my team, I'll talk through why certain parts are less useful than others.
Perfect. It's definitely my personal impression, but while reading the post it looks like the author was looking for a "one size fits all" book and was disappointed they did not find it.
And to be honest, that book will never exist. Every bit of knowledge contributes to growing as a professional; just make sure to understand it, discuss it, and use it (or not) for a real reason, not just because it's in book A or B.
It's not like people need to choose one book and follow it blindly for the rest of their lives. Read more books :D
In my opinion the problem is not that the rules are not one size fits all, but that they are so misguided that Martin himself couldn't come up with a piece of code where they would lead to a good result.
One mistake I think people like the author make is treating these books as some sort of bible that you must follow to the letter. People who evangelised TDD were the worst offenders of this: "You HAVE to do it like this, it's what the book says!"
You're not supposed to take it literally for every project, these are concepts that you need to adapt to your needs. In that sense I think the book still holds up.
For me this maps so clearly to the Dreyfus model of skill acquisition. Novices need strict rules to guide their behavior. Experts are able to use intuition they have developed. When something new comes along, everyone seems like a novice for a little while.
The Dreyfus model identifies 5 skill levels:

Novice
- Wants to achieve a goal, and is not particularly interested in learning.
- Requires context-free rules to follow.
- Will get stuck when something unexpected happens.

Advanced Beginner
- Beginning to break away from fixed rules.
- Can accomplish tasks on their own, but still has difficulty troubleshooting.
- Wants information fast.

Competent
- Has developed a conceptual model of the task environment.
- Able to troubleshoot.
- Beginning to solve novel problems.
- Seeks out and solves problems.
- Shows initiative and resourcefulness.
- May still have trouble determining which details to focus on when solving a problem.

Proficient
- Needs the big picture.
- Able to reflect on their approach in order to perform better next time.
- Learns from the experience of others.
- Applies maxims and patterns.

Expert
- Primary source of knowledge and information in a field.
- Constantly looks for better ways of doing things.
- Writes books and articles and does the lecture circuit.
- Works from intuition.
- Knows the difference between irrelevant and important details.
> Primary source of knowledge and information in a field. Constantly look for better ways of doing things. Write books and articles and does the lecture circuit.
Meh. I'm probably being picky, but it doesn't surprise me that a Thought Leader would put themselves and what they do as Thought Leader in the Expert category. I see them more as running along a parallel track. They write books and run consulting companies and speak at conferences and create a brand, and then there are those of us who get good at writing code because we do it every day, year after year. Kind of exactly the difference between sports commentators and athletes.
I don’t think that’s picky at all. GP’s characterization appears to come from a book by Andy Hunt[1]. The two creators (brothers, so both are Dreyfus) of the model don’t say anything of the sort[2].
The problem is that the book presents things that are at best 60/40 issues as hard rules, which leads novices++ to follow them to the detriment of everything else.
Uncle Bob himself acts like it is a bible, so if you buy into the rest of his crap then you'll likely buy into that too.
If treated as guidelines, you are correct that Clean Code is merely "eh" instead of garbage. But taken in the full context of how it is presented and intended to be taken by the author, it is damaging to the industry.
I've read his blog and watched his videos. While his attitude comes off as evangelical, his actual advice is very often "Do it when it makes sense", "There are exceptions - use engineering judgment", etc.
Yup. I see the book as a guide toward a general goal, not a specific objective that can be defined. Actually reaching that goal is sometimes completely impossible, and in many other cases it introduces too much complexity.
However, in most cases heading towards that goal is a beneficial thing--you just have to recognize when you're getting too close and bogging down in complying with every detail.
I still consider it the best programming book I've ever read.
I understand that the people who follow Clean Code religiously are annoying, but the author seems to be doing the same thing in reverse: because some advice is nuanced or doesn't apply all the time, we should stop recommending the book and forget it altogether.
It's not just that it doesn't always apply. It's that the abstracted rules are not useful as standalone guides to developing code, although they are presented as such - that's the entire purpose of the book, isn't it? The argument against this book isn't that books about code style and rules are bad; it's that this one is bad. And it's often recommended as core reading material to new developers (several examples of that in this thread). I've read several code style/guide books over the last decade. This is one of the few I put down fairly early on because it just didn't seem very good.
I agree with the sentiment that you don't want to over abstract, but Bob doesn't suggest that (as far as I know). He suggests extract till you drop, meaning simplify your functions down to doing one thing and one thing only and then compose them together.
Hands down, one of the best bits I learned from Bob was that "your code should read like well-written prose." That has enabled me to write some seriously easy-to-maintain code.
That strikes me as being too vague to be of practical use. I suspect the worst programmers can convince themselves their code is akin to poetry, as bad programmers are almost by definition unable to tell the difference. (Thinking back to when I was learning programming, I'm sure that was true of me.) To be valuable, advice needs to be specific.
If you see a pattern of a junior developer committing unacceptably poor quality code, I doubt it would be productive to tell them Try to make it read more like prose. Instead you'd give more concrete advice, such as choosing good variable names, or the SOLID principles, or judicious use of comments, or sensible indentation.
Perhaps I'm missing something though. In what way was the code should read like well-written prose advice helpful to you?
Specifically in relation to naming. I was caught up in the dogma of "things need to be short" (e.g., using silly variable names like getConf instead of getWebpackConfig). The difference is subtle, but that combined with reading my code aloud to see if it reads like a sentence ("prose") is helpful.
"This module is going to generate a password reset token. First, it's going to make sure we have an emailAddress as an input, then it's going to generate a random string which I'll refer to as token, and then I want to set that token on the user with this email address."
So you're interpreting it to mean use identifiers which are as descriptive as they can practically be, and which are meaningful and self-explanatory when used in combination.
I agree that's generally good advice; the mathematical style of using extremely short identifiers generally just confuses matters. (Exception: very short-lived variables whose purpose is immediately clear from the context.) It's only one possible interpretation of "code should read like prose", though. It's you who deserves the credit there, not Bob.
> silly variable names like getConf instead of getWebpackConfig
To nitpick this particular example: I'd say that, if it's a method of an object which is itself specific to WebPack, then the shorter identifier is fine.
I'm in the "code should read like poetry" camp. Poetry is the act of conveying meaning that isn't completely semantic - meter and rhyme being the primary examples. In code, that can mean maintaining a cadence of variable names, use of whitespace that helps illuminate structure, or writing blocks or classes where the appearance of the code itself has some mapping to what it does. You can kludge a solution together, or craft a context in which the suchness of what you are trying to convey becomes clear in a narrative climax.
> use of whitespace that helps illuminate structure,
Good luck with that now that everybody uses automatic linters & formatters that mess up your whitespace because of some stupid rule that there should be only one empty line and no spaces after a function name or something.
Code should be simple and tight and small. It should also, however, strive for an eighth grade reading level.
You shouldn't try to make your classes so small that you're abusing something like nested ternary operators which are difficult to read. You shouldn't try to break up your concepts so much that while the sentences are easy, the meaning of the whole class becomes muddled. You should stick with concepts everyone knows and not try to invent your own domain specific language in every class.
Less code is always more, right up until it becomes difficult to read - then you've gone too far. On the other hand, if you extract a helper method from a method that read fine to begin with, then you've made the code harder to read, not easier, because it's now bigger, with an extra concept. But if that was a horrible conditional with four clauses which you can express with a "NeedsFrobbing" method and a comment about it, then carry on (generating four methods from that conditional to "simplify" it is usually worse, though, due to the introduction of four concepts that could often be better addressed with just some judicious whitespace to separate them).
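i.e. something along these lines (the condition is entirely made up):

    const MAX_RETRIES = 3;
    interface Job { retries: number; isCancelled: boolean; notBefore: number }

    // before: four clauses inline at the call site
    //   if (cache.isStale && job.retries < MAX_RETRIES && !job.isCancelled && now > job.notBefore) { ... }

    // after: one well-named predicate, with the "why" sitting next to it
    function needsFrobbing(cache: { isStale: boolean }, job: Job, now: number): boolean {
      // frob only when the cached copy is stale, we still have retry budget,
      // nobody cancelled the job, and its earliest start time has passed
      return cache.isStale && job.retries < MAX_RETRIES && !job.isCancelled && now > job.notBefore;
    }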
And I need to learn how to write in English more like Hemingway, particularly before I've digested coffee. That last paragraph got away from me a bit.
Absolutely this. Code should tell a story, the functions and objects you use are defined by the context of the story at that level of description. If you have to translate between low-level operations to reconstruct the high level behavior of some unit of work, you are missing some opportunities for useful abstractions.
Coding at scale is about managing complexity. The best code is code you don't have to read because of well named functional boundaries. Natural language is our facility for managing complexity generally. It shouldn't be surprising that the two are mutually beneficial.
I tried to write code with small functions and was dissuaded from doing that at both my teams over the past few years. The reason is that it can be hard to follow the logic if it's spread out among several functions. Jumping back and forth breaks your flow of thought.
I think the best compromise is small summary comments at various points of functions that "hold the entire thought".
The point of isolating abstractions is that you don't have to jump back and forth. You look at a function, and from its contract and calling convention you immediately know what it does. The specific details aren't relevant for the layer of abstraction you're looking at.
Because of well structured abstractions, thoughtful naming conventions, documentation where required, and extensive testing you trust that the function does what it says. If I'm looking at a function like commitPending(), I simply see writeToDisk() and move on. I'm in the object representation layer, and jumping down into the details of the I/O layers breaks flow by moving to a different level of abstraction. The point is I trust writeToDisk() behaves reasonably and safely, and I don't need to inspect its contents, and definitely don't want to inline its code.
If you find that you frequently need to jump down the tree from sub-routine to sub-routine to understand the high level code, then that's a definite code smell. Most likely something is fundamentally broken in your abstraction model.
Check out the try/catch and logging pattern I use in the linked post. I added that specifically so I could identify where errors were occurring without having to guess.
When I get the error in the console/browser, the path to the error is included for me like "[generatePasswordResetToken.setTokenOnUser] Must pass value to $set to perform an update."
With that, I know exactly where the error is occurring and can jump straight into debugging it.
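The shape of the pattern is roughly this (a sketch, not the linked post's actual code; the helpers are stand-ins):

    declare function randomToken(): string;                                       // stand-ins for
    declare function setTokenOnUser(email: string, token: string): Promise<void>; // the real helpers

    async function generatePasswordResetToken(emailAddress: string): Promise<string> {
      const token = randomToken();
      try {
        await setTokenOnUser(emailAddress, token);
      } catch (error) {
        // prefix the error with its path so the console tells you where it happened
        throw new Error(`[generatePasswordResetToken.setTokenOnUser] ${(error as Error).message}`);
      }
      return token;
    }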
Nice! However, none of this is required for this endpoint. Here's why:
1. The connect action could be replaced by doing the connection once on app startup.
2. The validation could be replaced with middleware like express-joi.
3. The stripe/email steps should be asynchronous (ex: simple crons). This way, you create the user and that's it. If Stripe is down, or the email provider is down, you still create the user. If the server restarts while someone calls the endpoint, you don't end up with a user with invalid Stripe config. You just create a user with stripeSetup=false and welcomeEmailSent=false and have some crons that every 5 seconds query for these users and do their work. Also, ensure you query for false and not "not equal to true" here as it's not efficient.
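Sketching point 3 with the Node mongodb driver (the flag names come from above; everything else is invented for illustration):

    import { randomUUID } from "crypto";
    import { Collection } from "mongodb";

    interface User { _id: string; email: string; stripeSetup: boolean; welcomeEmailSent: boolean }

    // the endpoint only creates the user; Stripe/email happen out of band
    async function createUser(users: Collection<User>, email: string): Promise<void> {
      await users.insertOne({ _id: randomUUID(), email, stripeSetup: false, welcomeEmailSent: false });
    }

    // a simple worker ("cron") that picks up unfinished users every 5 seconds
    function startStripeSetupWorker(users: Collection<User>, setupStripe: (u: User) => Promise<void>) {
      setInterval(async () => {
        // query for `false` explicitly, not for "not equal to true"
        const pending = await users.find({ stripeSetup: false }).toArray();
        for (const user of pending) {
          await setupStripe(user);
          await users.updateOne({ _id: user._id }, { $set: { stripeSetup: true } });
        }
      }, 5000);
    }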
Off topic but is connecting to Mongo on every API hit best practice? I abstract my connection to a module and keep that open for the life of the application.
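i.e. something like this module (a sketch with the Node mongodb driver; the database name is arbitrary):

    import { MongoClient, Db } from "mongodb";

    let db: Db | undefined;

    // called once at startup; the driver keeps a connection pool alive for the process lifetime
    export async function initDb(uri: string): Promise<void> {
      const client = new MongoClient(uri);
      await client.connect();
      db = client.db("app");
    }

    // every request handler calls this instead of reconnecting
    export function getDb(): Db {
      if (!db) throw new Error("initDb() must be called before getDb()");
      return db;
    }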
Yes, that one did a lot for me too. Especially when business logic gets complicated, I want to be able to skip parts by roughly reading the meaning of a section without seeing the details.
One long stream of commands is OK to read if you are the author or already know what it should do. But otherwise it forces you to read too many irrelevant details on the way toward what you need.
Robert Martin and his aura always struck me as odd, in part because of how revered he always was at organizations where I worked. Senior developers would use his work to end arguments, and many code review discussions would be judged by how closely they adhered to Clean Code.
Of course, reading Clean Code left me more confused than enlightened, due precisely to what he presents as good examples of code. The author of the article really does hit the nail on the head about Martin's code style - it's borderline unreadable a lot of the time.
Who the f. even is Robert Martin?! What has he built? As far as I am able to see he is famous and revered because he is famous and revered.
He ran a consultancy and knew how to pump out books into a world of programmers that wanted books
I was around in the early 90s through to the early 2000s when a lot of these ideas came about and slowly got morphed by consultants who were selling this stuff to companies as essentially "religion". The nuanced thinking of a lot of the people who had most of the original core ideas has mostly been lost.
It's a tricky situation. At the core of things there are some really good ideas, but the messaging by people like "uncle bob" seems to fail to communicate the mindset in a way that develops thinking programmers - mainly because he, and people like Ron Jeffries, really didn't build anything serious once they became consultants and started giving out all these edicts. If you watched them on forums/blogs at the time, they were really not that good. There were lots of people who were building real things and had great perspectives, but their nuanced perspectives were never really captured in books - and it would be hard to, as it is more about the mentality of using ideas and principles, making good pragmatic choices, adapting things, and not being limited by "rules" but incorporating the essence of the ideas into your thinking processes.
So many of those people walked away from a lot of those communities when it morphed into "Agile" and started being dominated by the consultants.
10 or so years ago, when I first got into development, I looked to people like Martin for how I should write code.
But I had more and more difficulty reconciling bizarrely optimistic patterns with reality. This from the article perfectly sums it up:
>Martin says that functions should not be large enough to hold nested control structures (conditionals and loops); equivalently, they should not be indented to more than two levels.
Back then, as now, I could not understand how one person could make such confident and unambiguous statements about business logic across the whole spectrum of use cases and applications.
It's one thing to say how something should be written in ideal circumstances, it's another to essentially say code is spaghetti garbage because it doesn't precisely align to a very specific dogma.
This is the point that I have the most trouble understanding in critiques of Fowler, Bob, and all writers who write about coding: in my reading, I had always assumed that they were writing about the perfect-world ideal that needs to be balanced with real-world situations. There's a certain level of bluster and over-confidence required in that type of technical writing that I understood to be a necessary evil in order to get points across. After all, a book full of qualifications will fail to inspire confidence in its own advice.
This is true only for people first coming to development. If you're just starting your journey, you are likely looking for quantifiable absolutes as to what is good and what isn't.
After you're a bit more seasoned, I think qualified comments are probably far more welcome than absolutes.
> After all, a book full of qualifications will fail to inspire confidence in its own advice.
I don't think that's true at all. One of the old 'erlang bibles' is "Learn You Some Erlang", and it is full of qualifications titled "don't drink the kool-aid" (notably not there in the Haskell book that inspired it). It does not fail to inspire confidence to have qualifications scattered throughout; to me it actually gives MORE confidence that the content is applicable and the tradeoffs are worth it.
The article is not written by Robert Martin, so that doesn’t necessarily establish he said that. You also implied Fowler said it. Thanks for clarifying.
I believe Robert Martin did say this, but there was probably a preface from the book that didn’t make it into the article, so the quote in the article may be a bit out of context.
Now, some people will claim that having 8-character indentations makes the code move too far to the right, and makes it hard to read on a 80-character terminal screen. The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program.
The big problem that I have with Clean Code -- and with its sequel, Clean Architecture -- is that for its most zealous proponents, it has ceased to be a means to an end and has instead become an end in itself. So they'll justify their approach by citing one or other of the SOLID principles, but they won't explain what benefit that particular SOLID principle is going to offer them in that particular case.
The point that I make about patterns and practices in programming is that they need to justify their existence in terms of value that they provide to the end user, to the customer, or to the business. If they can't provide clear evidence that they actually provide those benefits, or if they only provide benefits that the business isn't asking for, then they're just wasting time and money.
One example that Uncle Bob Martin hammers home a lot is separation of concerns. Separation of concerns can make your code a lot easier to read and maintain if it's done right -- unit testing is one good example here. But when it ceases to be a means to an end and becomes an end in itself, or when it tries to solve problems that the business isn't asking for, it degenerates into speculative generality. That's why you'll find project after project after project after project after project with cumbersome and obstructive data access layers just because you "might" want to swap out your database for some unknown mystery alternative some day.
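A caricature of that shape, with entirely invented names: an interface and a pass-through implementation that exist only in case of a database swap nobody has asked for:

    record Order(long id, String item) {}

    interface OrderRepository {
        Order findById(long id);
    }

    // The only implementation that will ever exist: it adds a layer, but no behaviour.
    final class SqlOrderRepository implements OrderRepository {
        @Override
        public Order findById(long id) {
            // imagine the actual JDBC query here; the indirection above buys nothing today
            return new Order(id, "placeholder");
        }
    }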
I don’t disagree with the overall message or choice of examples behind this post, but one paragraph stuck out to me:
> Martin says that it should be possible to read a single source file from top to bottom as narrative, with the level of abstraction in each function descending as we read on, each function calling out to others further down. This is far from universally relevant. Many source files, I would even say most source files, cannot be neatly hierarchised in this way.
The relevance is a fair criticism, but most programs in most languages can in fact be hierarchized this way, with the small amount of mutually interdependent code safely separated out. Many functional languages actually enforce this.
As an F# developer it can be very painful to read C# programs even though I often find C# files very elegant and readable: it just seems like a book, presented out of order, and without page numbers. Whereas an .fsproj file provides a robust reading order.
> "with the level of abstraction in each function descending as we read on, each function calling out to others further down." ...
> Many functional languages actually enforce this.
Don't they enforce the opposite? In ML languages (I don't know F# but I thought it was an ML dialect), you can generally only call functions that were defined previously.
Of course, having a clear hierarchy is nice whether it goes from most to least abstract, or the other way around, but I think Martin is recommending the opposite from what you are used to.
Hmm, perhaps I am misreading this? Your understanding of ML languages is correct. I have always found “Uncle Bob” condescending and obnoxious so I can’t speak to the actual source material.
I am putting more emphasis on the “reading top-to-bottom” aspect and less on the level of abstraction itself (might be why I’m misreading it). My understanding was that Bob sez a function shouldn’t call any “helper” functions until the helpers have been defined - if it did, you wouldn’t be able to “read” it. But with your comment, maybe he meant that you should define your lower-level functions as prototypes, implement the higher-level functions completely, then fill in the details for the lower functions at the bottom. Which is situationally useful but yeah, overkill as a hard rule.
In ML and F# you can certainly call interfaces before providing an implementation, as long as you define the interface first. Whereas in C# you can define the interface last and call it all you want beforehand. This is what I find confusing, to the point of being bad practice in most cases.
So even if I misread specifically what (the post said) Bob was saying, I think the overall idea is what Bob had in mind.
It seems your idea is precisely the opposite of Robert Martin's. What he is advocating is starting the file with the high-level abstractions, without any of the messy details. So, at the top level you'd have a function that says `DoTheThing() { doFirstPart(); doSecondPart(); }`, then reading along you'd find out what doFirstPart() and doSecondPart() mean (note that I've used imperative-style names, but that was a random choice on my part).
Personally I prefer this style, even though I dislike many other ideas in Clean Code.
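For concreteness, a minimal Java sketch (invented names) of that layout: the high-level method reads first, and the helpers it calls sit further down the file:

    public class ReportGenerator {
        public String generate() {
            String header = buildHeader();
            String body = buildBody();
            return header + body;
        }

        // the details only matter if you choose to keep reading
        private String buildHeader() {
            return "Report\n======\n";
        }

        private String buildBody() {
            return "...\n";
        }
    }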
The requirement to declare a function name before you can call it is specific to a few languages, typically older ones. I don't think it's very relevant in this context, and there are usually ways around it - such as putting all declarations in header files in C and C++, so that the source file technically begins with the declarations from the compiler's perspective, but not from a programmer's perspective (it just begins with #include "header").
> I am putting more emphasis on the “reading top-to-bottom” aspect and less on the level of abstraction itself (might be why I’m misreading it). My understanding was that Bob sez a function shouldn’t call any “helper” functions until the helpers have been defined - if it did, you wouldn’t be able to “read” it. But with your comment, maybe he meant that you should define your lower-level functions as prototypes, implement the higher-level functions completely, then fill in the details for the lower functions at the bottom.
Neither really, he's saying the higher-level abstractions go at the top of the file, meaning the helpers go at the bottom and get used before they're defined. No mention of prototypes that I remember.
Personally, I've never liked that advice either - I always put the helpers at the top to build up the grammar, then the work is actually done at the bottom.
> In ML languages, you can generally only call functions that were defined previously.
Hum... At least not in Haskell.
Starting with the most dependent code makes a large difference in readability. It's much better to open a file and see the overall functions first. The alternative is browsing around to find them, even when they're at the bottom. And since you read a function from top to bottom, jumping to the bottom of the file isn't much help for actually reading it.
1 - The dependency order does not imply any ordering of abstraction. The two can vary in opposite directions just as easily as in the same direction.
As someone who almost exclusively uses functional languages: don’t do this. This kind of pedantic gatekeeping is not only obnoxious... it’s totally inaccurate! Which makes it 100x as obnoxious.
“Functional” means “functions are first-class citizens in the language”, and typically a lot of core language features are designed around easily creating and manipulating functions as ordinary objects (so C#, C++, and Python don't really count, even with more recent bells and whistles). Typically there is a strong emphasis on recursive definitions. But trying to pretend “functional programming languages” are anything particularly specific is just a recipe for dumb arguments. And of course, outside of the entry point itself, it is quite possible (even desirable) to write side-effect-free idiomatic ISO C.
The “original” functional language was LISP, which is as impure as C and not even statically typed - and for a long time certain Lisp/Scheme folks would gatekeep about how a language wasn't a Real Functional Language if it wasn't homoiconic and didn't have fancy macros. (And what's with those weird ivory-tower Miranda programmers?) In fact, I think the “gate has swung,” so to speak, to the point that people downplay certain advantages of Lisps over ML/Haskell/etc.
> for a long time certain Lisp/Scheme folks would gatekeep about how a language wasn’t a Real Functional Language if it wasn’t homoiconic and didn’t have fancy macros
This never happened. You are confusing functional with the 2000s "X is a good enough Lisp" controversy, which had nothing to do with functional programming.
> “Functional” means “functions are first-class citizens in the language”
No, the word function has a clearly defined meaning. I don't know where you get your strange ideas from - you need to look at original sources. The word "functional" did not become part of the jargon until the 1990s. Even into the 1980s most people referred to this paradigm as "applicative" (as in procedure application), which is a lot more appropriate. The big problem with the Lisp community is that early on, when everyone used the words "procedures" or "subroutines," they decided to start calling them "functions," even though they could have side effects. This is probably the reason why people started trying to appropriate "functional" as an adjective from the ML and Haskell communities into their own languages. A lot of people assume that if you can write a for loop as a map, it makes it "functional." What you end up with is a bunch of inappropriate cargo-culting by people who do not understand the basics of functional programming.
Term "functional" has been watered down and has different meanings now. However, the original meaning comes from mathematics and is still in use. You might not like it, but that's how it is - hence why I deliberately disambiguated it in my response to make it clear.
This has also nothing to do with gatekeeping.
If you disagree with me, maybe you should go to Wikipedia first and change it there, because by what you say, Wikipedia does it wrong too.
> In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions.
> The subroutine may return a computed value to its caller (its return value), or provide various result values or output parameters. Indeed, a common use of subroutines is to implement mathematical functions, in which the purpose of the subroutine is purely to compute one or more results whose values are entirely determined by the arguments passed to the subroutine. (Examples might include computing the logarithm of a number or the determinant of a matrix.) This type is called a function.
> In programming languages such as C, C++, and C#, subroutines may also simply be called functions (not to be confused with mathematical functions or functional programming, which are different concepts).
We follow this approach closely - the problem is that people confuse helper services for first-order services and call them directly leading to confusion. I don't know how to avoid this without moving the "main" service to a separate project and having `internal` helper services. DI for class libraries in .NET Core is also hacky if you don't want to import every single service explicitly.
Is there a reason why private/internal qualifiers aren’t sufficient? Possibly within the same namespace / partial class if you want to break it up?
As I type this out, I suppose “people don’t use access modifiers when they should” is a defensible reason.... I also think the InternalsVisibleTo attribute should be used more widely for testing.
You can't make a helper private if you move it out to a separate service. The risk of making it public is if people think the helper should be used directly, and not through the parent service.
Internal REQUIRES that the service be moved out to a class library project, which seems like overkill in a lot of cases.
> But mixed into the chapter there are more questionable assertions. Martin says that Boolean flag arguments are bad practice, which I agree with, because an unadorned true or false in source code is opaque and unclear versus an explicit IS_SUITE or IS_NOT_SUITE... but Martin's reasoning is rather that a Boolean argument means that a function does more than one thing, which it shouldn't.
I see how this can be polemic because most code is littered w/ flags, but I tend to agree that boolean flags can be an anti-pattern (even though it's apparently idiomatic in some languages).
Usually the flag is there to introduce a branching condition (effectively breaking "a function should do one thing") but doesn't carry any semantics of its own. I find the same can be achieved with polymorphism and/or pattern-matching, the benefit being that the behaviour is now part of the data model (the first argument), which is easier to reason about, document, and extend to new cases (you don't need to keep passing flags down the call chain).
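A rough Java sketch of that substitution, with invented names: instead of render(text, true), the caller passes something whose type carries the meaning, and the branch disappears from the function:

    interface Formatter {
        String format(String text);
    }

    final class PlainFormatter implements Formatter {
        public String format(String text) { return text; }
    }

    final class HtmlFormatter implements Formatter {
        public String format(String text) { return "<p>" + text + "</p>"; }
    }

    class Renderer {
        // was: render(String text, boolean asHtml) with an if/else inside
        String render(String text, Formatter formatter) {
            return formatter.format(text);
        }
    }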
As anything, I don't think we can say "I recommend / don't recommend X book", all knowledge and experience is useful. Just use your judgment and don't treat programming books as a holy book.
> Usually the flag is there to introduce a branching condition (effectively breaking "a function should do one thing")...
But if you don't let the function branch, then the parent function is going to have to decide which of two different functions to call. Which is going to require the parent function to branch. Sooner or later, someone has to branch. Put the branch where it makes the most sense, that is, where the logical "one-ness" of the function is preserved even with the branch.
> I find the same can be achieved w/ polymorphism and/or pattern-matching, the benefit being now your behaviour is part of the data model (the first argument) which is easier to reason about, document, and extend to new cases (don't need to keep passing flags down the call chain).
You just moved the branch. Polymorphism means that you moved the branch to the point of construction of the object. (And that's a perfectly fine way to do it, in some cases. It's a horrible way to try to deal with all branches, though.) Pattern-matching means that you moved the branch to when you created the data. (Again, that can be a perfectly fine way to do it, in some cases.)
> As anything, I don't think we can say "I recommend / don't recommend X book", all knowledge and experience is useful. Just use your judgment and don't treat programming books as a holy book.
People don't want to go through the trouble of reading several opposing points of view and synthesize that using their own personal experience. They want to have a book tell them everything they need to do and follow that blindly, and if that ever bites them back then that book was clearly trash. This is the POV the article seems to be written from IMHO.
As far as the boolean flag argument goes, I've seen it justified in terms of data-oriented design, where you want to lift your data dependencies to the top level as much as possible. If a function branches on some argument, and further up the stack that argument is constant, maybe you didn't need that branch at all if only you could invoke the right logic directly.
Notably, this argument has very little to do with readability. I do prefer consolidating data and extracting data dependencies - I think it makes it easier to get a big-picture view, as in Brooks's "show me your spreadsheets" - but this argument is rooted specifically in not making the machine do redundant work.
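A small illustration of that, with invented names: when the flag is constant for the whole call, it can be resolved to behaviour once near the top instead of being re-tested inside the loop:

    import java.util.List;
    import java.util.function.DoubleUnaryOperator;

    class Pipeline {
        double process(List<Double> samples, boolean normalize) {
            // the flag is constant for this call, so decide once...
            DoubleUnaryOperator step = normalize ? v -> v / 100.0 : DoubleUnaryOperator.identity();
            double total = 0;
            for (double s : samples) {
                total += step.applyAsDouble(s); // ...and the hot loop never re-tests it
            }
            return total;
        }
    }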
> This is done as part of an overall lesson in the virtue of inventing a new domain-specific testing language for your tests. I was left so confused by this suggestion. I would use exactly the same code to demonstrate exactly the opposite lesson. Don't do this!
This example (code is in the article) was very telling of the book author's core philosophy.
Best I can tell, the OOP movement of the 2000s (I wasn't a professional in 2008, though I was learning Java at the time) was at its heart rooted in the idea that abstractions are nearly always a win; the very idealistic perspective that anything you can possibly give a name to should be given a name, and that programmers down the line will thank you for handing them a named entity instead of perhaps even a single line of underlying code.
This philosophy greatly over-estimates the value, and greatly under-estimates the cost, of idea-creation. I don't just write some code, I create an idea, and then I write a bit of code as implementation details for it. This is a very tantalizing vision of development: all messy details are hidden away, what we're left with is a beautiful constellation of ideas in their purest form.
The problem is that when someone else has to try and make sense of your code, they first have to internalize all of your ideas, instead of just reading the code itself which may be calling out to something they already understand. It is the opposite of self-documenting code: it's code that requires its own glossary in addition to the usual documentation. "wayTooCold()" may read more naturally to the person who wrote it, but there's a fallacy where they assume that that also applies to other minds that come along and read it later.
Establishing a whole new concept with its own terminology in your code is costly. It has to be done with great care and only when absolutely necessary, and then documented thoroughly. I think as an industry we have more awareness of this nowadays. We don't just make UML diagrams and kick them across the fence for all those mundane "implementation details" to be written.
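To make the trade-off concrete, here's a loose imitation (not the book's actual code; the names, the coded string, and the assertions are all invented, JUnit-style) of a one-off test vocabulary next to the plain assertions it replaces:

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertFalse;
    import static org.junit.jupiter.api.Assertions.assertTrue;

    class ThermostatTest {
        interface Thermostat {
            String state();      // e.g. "HBchL": upper case means on, lower case means off
            boolean heaterOn();
            boolean blowerOn();
            boolean coolerOn();
        }

        // DSL style: the whole hardware state is packed into one coded string the reader must decode.
        void dslStyle(Thermostat hw) {
            assertEquals("HBchL", hw.state());
        }

        // Plain style: more lines, but each one reads without a private glossary.
        void plainStyle(Thermostat hw) {
            assertTrue(hw.heaterOn());
            assertTrue(hw.blowerOn());
            assertFalse(hw.coolerOn());
        }
    }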
This thread is full of people saying what's wrong with the book without posing alternatives. I get that it's dogmatic, but do people seriously take it as gospel? I'd read it along with other things. Parts are great and others are not. It's not terrible.
I agree. Trying to apply the lessons in there leads to code that is more difficult to read and reason about. Making it "read like a book" and keeping functions short sound good on the surface but they lead to lines getting taken up entirely by function names and a nightmare of tracking call after call after call.
It's been years since I've read the book and I'm still having trouble with the bad ideas from it, because they've stuck with me so well that I feel like I'm doing things wrong if I don't follow its guidelines. Sometimes I'll actually write something in a sensible way, change it to the Clean Code way, and then revert it back to where it was when I realize my own code is confusing me when written like that.
This isn't just a Robert C Martin issue. It's a cultural issue. People need to stop shaming others if their code doesn't align with Clean Code. People need to stop preaching from the book.
I make my code "read like a book" with a line comment for each algorithmic step inside a function, and adding line-ending comments to clarify. So functions are just containers of steps designed to reduce repetition, increase visibility, and minimize data passing and globals.
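A tiny sketch of that style, with invented names: the function stays one container of labelled steps rather than a dispatcher to one-line helpers:

    import java.util.List;

    class OrderStats {
        record Order(double total, boolean cancelled) {}

        static double averageOrderValue(List<Order> orders) {
            // 1. drop cancelled orders
            List<Order> valid = orders.stream().filter(o -> !o.cancelled()).toList();
            // 2. sum the remaining totals
            double sum = valid.stream().mapToDouble(Order::total).sum();
            // 3. divide by the count, guarding against an empty list
            return valid.isEmpty() ? 0.0 : sum / valid.size();
        }
    }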
I recently read this cover to cover and left a negative review on Amazon. I'm happy to see I'm not the only one, and this goes into it in a whole lot more detail.
The author seems like someone who took a set of rules that are good for breaking beginning programmers' bad habits and then applied them to the extreme. There are a whole lot of rules in there which aren't bad, right up until you try to apply them like laws of gravity that must always be followed. Breaking up big clunky methods that do way too much is great for readability, right up until you're spraying one-line helper methods all over your classes and making them harder to read, because now you're inventing your own domain-specific language everyone has to learn (often with the wrong abstractions, which get extended through the years and wind up needing a massive refactoring down the road that would have been simpler with fewer methods and abstractions involved at the start).
A whole lot of my job is taking classes, un-DRY'ing them completely so there's duplication all over the place, then extracting the right (or at least more correct) abstractions to make the whole thing simple and readable and tight.
My biggest gripe: Functions shouldn't be short, they should be of appropriate size. They should contain all the logic that isn't supposed to be exposed to the outside for someone else to call. If that means your function is 3000 lines long, so be it.
Realize that your whole program is effectively one big function and you achieve nothing by scattering its guts out into individual sub-functions just to make the pieces smaller.
If something is too convoluted and does too much, or has too much redundancy, you'll know, because it'll cause problems. It'll bother you. You shouldn't pre-empt this case by just writing small functions by default. That'll just cause its own problems.
This is an interesting article because as I was reading Martin's suggestions I agreed with every single one of them. 5 lines of code per function is ideal. Non-nested whenever possible. Don't mix query/pure and commands/impure. Then I got to the code examples and they were dreadful. Those member variables should be readonly.
Using Martin's suggestion with Functional Hexagonal Architecture would lead to beautiful code. I know because that's what I've been writing for the past 3 years.
Great! While we're on it, can we retire the gang of four as well? I mean, the authors are obviously great software engineers, and the Patterns have helped to design, build, and most importantly read, a lot of software. But as we move forward, more and more of these goals can be achieved much more elegantly and sustainably with new languages and more functional approaches. Personally, I find re-teaching junior programmers, who are still trying to make everything into a class, very tiring.
I don’t understand the amount of hate that Clean Code gets these days…it’s a relatively straightforward set of principles that can help you create a software system maintainable by humans for a very long time. Of course it’s not an engineering utopia, there’s no such thing.
I get the impression that it’s about the messengers and not the messages, and that people have had horrible learning experiences that have calcified into resistance to do with anything clean. But valuable insights are being lost, and they will have to be re-learned in a new guise at a later date.
Development trends are cyclical and even the most sound principle has an exception. Even if something is good advice 99% of the time, it will eventually be criticized with that 1% of the time being used as a counter.
For me Clean Code is not about slavishly adhering to the rules therein, but about guidelines to help make your code better if you follow them, in most circumstances. On his blog Bob Martin himself says about livable code vs pristinely clean code: "Does this rule apply to code? It absolutely does! When I write code I fight very hard to keep it clean. But there are also little places where I break the rules specifically because those breakages create affordances for transient issues."
I've found the Clean Code guidelines very useful. Your team's mileage may vary. As always: use what works, toss the rest, give back where you can.
I never recommended Clean Code, but I've become a strong advocate against it on teams that I lead after reading opinions by Bob Martin such as this one: https://blog.cleancoder.com/uncle-bob/2017/01/11/TheDarkPath.... That whole article reads like someone who is stuck in their old ways and inflexible, and who, given their large soapbox, tries to justify their discomfort and frustration. I consider Swift, Kotlin (and Rust) to be among the most important language evolutions, ones that dramatically improved software quality on the projects I've worked on.
I've seen so many real world counter-examples to arguments made in that article and his other blog posts that I'm puzzled why this guy has such a large and devoted following.
Actually, I found the post you linked to fairly logical. He’s saying that humans are inherently lazy, and that a language that gives us the option between being diligent (strong types) or being reckless (opt-out of strong types) will lead to the worst form of recklessness: opting out while not writing tests, giving the misimpression of safety.
His point is that you can’t practically force programmers to be diligent through safety features of a language itself, since edge-cases require escape hatches from those safety features, and those safety hatches will be exploited by our natural tendency to avoid “punishment”.
I’m not sure I agree with his point, but I don’t find it an unreasonable position. I’d be curious if Rust has escape hatches that are easily and often abused.
My favorite example here, and a counterpoint to Bob, is React's dangerously-unsafe-html attribute. I haven't seen it in years (to the point where I can't recall the exact naming), and perhaps it was removed at some point. But it made the escape hatch really painful to use. And so the pain of using the escape hatch made it less painful to actually write React in the right manner. Coming from Angular, I think I struggled at first with thinking I had to write some dangerous html, but over time I forgot the choice of writing poor React code even existed.
So I guess I disagree with Bob’s post here. It is possible to have safety features in languages that are less painful than the escape-hatches from those safety features. And no suite of tests will ever be as powerful as those built-in safety features.
He actually misunderstands and mischaracterizes the features of the languages he complains about. These features remove the need for a developer to keep track of invariants in their code, so should be embraced and welcomed by lazy developers who don't have to simulate the boring parts of code in their head to make sure it works. "If it type-checks then it works" philosophy really goes a long way toward relieving developer's stress.
For example, if I'm using C or Java I have to take into account that every pointer or reference can be null, at every place where they are used. I should write null checks (or error checks, say, from opening a file handle), but I usually don't because I'm lazy, or I forget, or it's hard to keep track of all possible error conditions. So I'm stressed during a release because I can't predict the input that may crash my code.
In a language like Swift I am forced to do a null or an error check once in the code, and for that small effort the compiler will guarantee I will never have to worry about these error conditions again. This type system means I can refactor code drastically and with confidence, and I don't have to spend time worrying about all code paths to see if one of them would result in an unexpected null reference. On a professional development team, it should be a no-brainer to adopt a new technology to eliminate all null-reference exceptions at runtime, or use a language to setup guarantees that will hold under all conditions and in the future evolution of the code.
Worse than that, he sets up a patronizing and misguided mental image of a common developer who he imagines will use a language with type safety just to disable and abuse all the safety features. Nobody does that, in my experience of professional Swift, Kotlin or Rust development.
He advocates for unit tests only and above all else. That is also painfully misguided: a test tells you it passes for one given example of input. In comparison a good type system guarantees that your code will work for ALL values of a given type. Of course type systems can't express all invariants, so there is a need for both approaches. But that lack of nuance and plain bad advice turned me into an anti-UncleBob advocate.
I find that these two books are in many recommended lists, but I found them entirely forgettable, entirely too long, and without any "meat."
So much of the advice given is trivial things you'll just figure out in the first few months of coding professionally.
Code for more than a week and you'll figure out how to name classes, how to use variables, how scope works, etc.
The code examples are only in C++, Java, and Visual Basic (ha!). Completely ignoring non-OO and dynamic languages.
Some of the advice is just bad (like prefixing global variables with g_) or incredibly outdated (like, "avoid goto"? Thanks 1968!).
Work on a single software project, or any problem ever, and you'll know that you need to define the problem first. It's not exactly sage advice.
These are cherry-picked examples, but overall Code Complete manages to be too large, go into too specific detail in some areas, while giving vague advice in others.
All books are written in a time and a few become timeless. Software books have an especially short half-life. I think Code Complete was a book Software Engineering needed in 2004, but has since dropped in value.
I will say, that Code Complete does have utility as a way to prop up your monitor for increased ergonomics, which is something you should never ignore.
I have similar issues with Clean Code. One is better off just googling "SOLID principles" and then programming with small interfaces more often and fewer subclasses.
A better alternative is (from above) The Pragmatic Programmer (2019), a good style guide, and/or get your code reviewed by literally anyone.
Another thing Martin advocates for is not putting your name in comments, e.g. "Fixed a bug here; there could still be problematic interactions with subsystem foo -- ericb". He says, "Source control systems are very good at remembering who added what, when." (p. 68, 2009 edition)
Rubbish! Multiple times I've had to track down the original author of code that was auto-refactored, reformatted, changed locations, changed source control, etc. "git blame" and such are useless in these cases; it ends up being a game of Whodunit that involves hours of search, Slack pings, and what not. Just put your name in the comment, if it's documenting something substantial and is the result of your own research and struggle. And if you're in such a position, allow and encourage your colleagues to do this too.
Better to put such a thorough explanation there that your name isn't needed any more. Because if it's your name that makes the difference, chances are you will have left the company by the time someone comes across that comment and needs access to your brain.
Sometimes what is interesting is that you have found that another engineer—whom you might not know in a large enough organization—has put time and thought into the code you are working on, and you can be a lot more efficient and less likely to break things if you can talk to that engineer first. It's not always the comment itself.
Sure, but in IT the rule of assumption should be that the code will outlive the coder. If being able to talk to other engineers is going to make the difference (instead of just being an optimization) then you already have problems.
Talking to other engineers is necessary when your codebase, organization, and architecture is large enough that changes can have far-reaching effects. I would say, in fact, that it's the most important distinction between a junior and senior engineer.
Let me give you a real-world example. Facebook used to have a service called BCF‡; it was basically a "lookup" service for a given host, where you could do forward- or reverse-lookups of a hostname, or a class of hosts, and get information about their physical location, network properties, hardware configuration, and so forth.
This code was old. It had survived the transition from SVN to Git, and I'm sure it has in some form survived the transition from Git to Mercurial, though that was after my time. It had also been moved several times as no team formally owned the service. It was originally slapped together under extreme pressure by a few engineers, basically a "hackathon." Despite that, it was so useful that it had been adopted by pretty much every team that touched infrastructure, which at the time I became involved was ~300 engineers.
I was working on a service that made extensive use of the data in BCF. There was a problem with one of its Thrift RPCs which required a bugfix. This bug had plagued users of the service for several years, but because the code was so old and hoary—it didn't even have auto-generation of its Thrift bindings—nobody had bothered to fix it. Instead, every team who used this RPC had coded around it, or (worse still) skipped the binding and queried the backing database directly.
Well, I was determined to fix the bug. "git blame" showed a bot. No problem, let's go back before that commit...another bot. Before that, a human! Cool, let's reach out—no, turns out he'd done some code formatting on it. Before that—whoops, the beginning of the code history! OK, so check the old SVN repo. Five contributors over its history. I pinged each and every one who was still at the company—they hadn't written it. Finally got ahold of somebody who said, "Oh yeah, Samir‡ wrote that, ask him." I looked up and realized I could see the back of Samir's head, because he sat 10m away. It took two hours to get to that point, and 5 minutes to sort out what I needed from him without literally bringing down the site. I fixed the bug.
Every single one of those things, of course, defied the "rule of assumption." But every single one of those things was done under a specific kind of duress: keeping the company running and the features rolling. The real world is messy, and putting a little extra in your comments and leaving threads for future engineers is an enormously powerful lubricant.
Every company has these problems, my friend. Like I said, good comments are lubrication against the problems that will arise in sufficiently large, churning code.
Yes, that is exactly the thesis of the classic paper "Programming as Theory Building"[0], that it is the theory in the human's head that we pay for and that the complete history of the code is often not enough to effectively modify a program.
Or have gotten busy since. I worked at Coinbase in 2019 and saw a comment at the top of a file saying that something should probably be changed. I git-blamed and saw it was written six years earlier by Brian Armstrong.
I think most of what Martin says is rubbish, but this is not. I have never had `git blame` fail...ever. I know which user is responsible for every line of code. Putting your name in a comment is only accurate at the moment you write it. It's right up there with commenting out blocks of code so you don't lose them.
I don't know what to say, this is a real problem I have encountered in actual production code multiple times. Any code that lives longer than your company's source control of choice, style of choice, or code structure of choice is vulnerable. Moreover, what's the harm? It's more information, not less.
>Moreover, what's the harm? It's more information, not less
If your code is outliving your source control system, you've got bigger problems than whatever is in your comments. I can confidently say that in 25 years in this industry this isn't something I've encountered, so it's probably a low enough probability to safely ignore.
>Moreover, what's the harm? It's more information, not less.
Too much information is every bit as bad as too little.
> I can confidently say that in 25 years in this industry this isn't something I've encountered.
I've been in the industry less than half that time and it's happened to me. blame will tell you the last person/commit to touch that line of code. To find out when it was originally written, I may have to (in some cases) manually binary search the history of the file.
If you're a CLI user, you should check out tig, which is a terminal UI for git. In the blame view, you can select a line and press "," to go to the blame view for the parent of the commit blamed for that line.
This lets you quickly traverse through the history of a file. Another good tool for digging into the history is the -S flag to "git log", which finds commits that added or removed a certain string (there's -G that searches for a regex, too).
If you prefer a GUI tool, DeepGit[0] is a very nice tool that allows you to do some pretty amazing code archeology. I use this all the time for figuring out how legacy code evolved over time.
why? Just go to the commit hash in the blame and run blame again.
All in all it has a negligible impact on the readability of the code; it's mostly aesthetic for me. It's ugly and only solves far-fetched problems.
Do you scratch your name and SSN into the side of your car? What if your title blows away in the wind on the same day that city hall burns down destroying all ownership records?
Was going to post the exact same thing. I make use of this repeated git blame method all the time, and for everyone who is just learning this for the first time, you'll actually want to write `git blame <commit>~` to go back one commit from the commit hash in the blame, because otherwise you'll still get the same results on the line you're looking at.
Also, if you're using GitHub, their Blame view has a button that looks like a stack of rectangles, between the commit information and the code. Clicking that will essentially do the same thing as the command-line git operation above.
> If your code is outliving your source control system you got bigger problems than whatever is in your comments.
You've never been at a company that finds it needs to upgrade from $old_version_control_sytem to $newer_version_control_system?
Because I've never seen that done as a big-bang rollout, it's always been "oh that code lives in the new system now, and we only kept 6 months of history"
Even just having an architect manually move folders around in the version control system has broken the history.
>You've never been at a company that finds it needs to upgrade from $old_version_control_sytem to $newer_version_control_system?
I've done this maybe a dozen times. It's always been big bang and I've never lost any history I didn't choose to lose. In fact I just did this last week for a 10-year-old project that moved from SVN to Git.
If you work somewhere where someone tells you this isn't possible, consider finding a new job, or alternatively become their new source control lord with your newfound powers. Moving things between SCM systems is about the easiest thing you can do. It's all sequentially structured and verifiable. The tools just work every time.
So then you check out at the refactor commit and look through the blame to continue searching. If you have to repeat this more than a few times, then the person has probably left the company or hasn't touched the code in years, so it's better to understand it yourself before modifying it.
No worries, after typing it out I definitely feel like there should be an easier way to say "show me the stack of commits that touched this line(s) of code", and I'm sure some git wizard has a fancy one liner that could more easily do that.
If it gets moved then the blame will tell me who moved it, It will also tell me what the hash was before it was moved. That hash will have all the original information. Same for the refactor case.
The parent's comment holds when reformatting, especially in languages with suspect formatting practices like golang, where visibility rules are dictated by the case of the first letter (wat?), or where the formatter aligns groups of constants or struct fields depending on the length of the longest field name. That ends up producing completely unnecessary changes that divert attention from the main diff.
What's the downside of adding a few extra characters!?
Of course, this view is already available to people: `git blame` - and it's the same for comments, so there is no need.
The exception is "notes to future self" during the development of a feature (to be removed before review), in which case the most useful place for them to appear is at the _start_ of the comment with a marker:
// TODO(jen20): also implement function X for type Y
What you've shown is code clutter and reductio ad absurdum to what I wrote in the top-level comment. I am speaking of architectural comments, bug-fixes, and especially in service architecture where unusual and catastrophic interactions might happen (or have happened) with code that's not under your control.
I think your comment is controversial for a number of reasons. One, I think nobody should own code. Code should be obvious, tested, documented and reviewed (bringing the number of people involved to at least two), and the story behind it should be either in the commit messages or referenced in e.g. a task management system. Code ownership just creates islands.
I mean by all means assign a "domain expert" to a PART of your code, but no individual segment of code should belong to anyone.
Second: There's something to be said about avoiding churn. Everybody loves refactoring and rewriting code, present company included, but it muddles the version control waters. I've seen a few github projects where the guidelines stated not to create PRs for minor refactorings, because they create churn and version control noise.
Anyway, that's all "ideal world" thinking, I know in practice it doesn't work like that.
Either the code is recent (in which case 'git blame' works better since someone changing a few characters may or may not decide to add their name to the file) or it's old and the author has either left the company or has forgotten practically everything about the code.
But sometimes it is bad, and not fixable within the author's control. I occasionally leave author notes as a shortcut. If I'm no longer here, yeah, you've got to figure it all out the hard way. But if I am, I can probably save you a week, maybe a month. And obviously if it's something you can succinctly describe, you'd just leave a normal comment. This is the domain of "based on being here a few years, on a few teams, and across three services between this one, a few migrations, etc." Some business problems have a lot of baggage that isn't easily documented or described; that's the hard thing about professional development, especially in a changing business. There are also cases where I _didn't_ author the code, but did purposefully not change something that looks like it should be changed. In those cases, without my name in a comment, git blame wouldn't point you to me. YMMV.
A 1000 times this. We never use git blame - who cares? The code should be self-explanatory, and if it's not, the author doesn't remember why they did it 5 years down the line either.
> First, the class name, SetupTeardownIncluder, is dreadful. It is, at least, a noun phrase, as all class names should be. But it's a nouned verb phrase, the strangled kind of class name you invariably get when you're working in strictly object-oriented code, where everything has to be a class, but sometimes the thing you really need is just one simple gosh-danged function.
Moving from Java as my only language to JavaScript and Rust, this point was driven home in spades. A programming language can be dysfunctional, causing its users to implement harmful practices.
SetupTeardownIncluder is a good example of the kind of code you get when there are no free-standing functions. It's also one path on the slippery slope to FactoryFactoryManager code.
The main problem is that the intent of the code isn't even clear. Compare it with something you might write in Rust - say, a free-standing render function living in render.rs.
If you saw that function at the top of the file, or if you saw render.rs in a directory listing, you'd have a pretty good idea of what's going on before you even dug into the code.
Just randomly searching the FitNesse repo, there's this:
// Copyright (C) 2003,2004,2005 by Object Mentor, Inc. All rights reserved.
// Released under the terms of the GNU General Public License version 2 or later.
package fitnesse.html;

public class HtmlPageFactory
{
    public HtmlPage newPage()
    {
        return new HtmlPage();
    }

    public String toString()
    {
        return getClass().getName();
    }
}
For me all this kind of stuff is only to sell books, conferences and consulting services, and a big headache when working in teams whose architects have bought too much into it.
The problem is not really with this book IMHO. Most of its advice and guidelines are perfectly sensible, at least for its intended domain.
The problem is people applying principles dogmatically without seeing the larger picture or considering the context and purpose of the rules in the first place.
This book, or any book, cannot be blamed for people applying its advice blindly. But it is a pervasive problem in the industry, and it runs much deeper than any particular book. I suspect it has something to do with how CS education typically happens, but I'm not sure.
Why is it that software engineering is so against comments?
I know nothing of Clean Code. When I read the link, I assumed that clean code meant very simple and well-commented code. I hit cmd+f "#" and nothing came up. Not one comment saying "this function is an example of this" or "note the use of this line here, it does this", etc., on a blog no less, where you'd expect to see these things. That's the type of stuff I put in my own code, even code that only I read, because in two weeks I'm going to forget everything unless I write full-sentence to paragraph-length comments, and I'd otherwise spend way more time trying to get back in the zone than it took to write those super descriptive comments in the first place.
I hate looking at other people's scripts because, once again, they never write comments. Practically ever. What they do write is often entirely useless, to the point where they shouldn't have even bothered writing those two words or whatever. Most people's code is just keyboard diarrhea of syntax and regex and patterns that you can't exactly google for, written as if whoever is looking at the code has the exact same knowledge base as the author and knows everything they've put down into the script. Maybe it's a side effect of CS major training, where you don't write comments on your homework because the grader is going to know what is happening. Stop doing that with your code and actually make a write-up to save others (and often yourself) mountains of time and effort.
> Why is it that software engineering is so against comments?
Good question. Funny thing is, I worked for a company that mandated that every method be documented, which gets you a whole bunch of "The method GetThing(name) gets a Thing, and the argument 'name' is the name of the Thing". Plus 4 lines of Doxygen boilerplate. Oof.
Of course, I've seen my share of uncommented, unreadable code. And also code with some scattered comments that have become so mangled over 10 years of careless maintenance and haphazard copy-pasting that their existence is detrimental. Many of the comments I come across that might be useful are incoherent ungrammatical ramblings. In large projects, often some of the engineers aren't native English speakers.
My point being that writing consistently useful comments (and readable, properly organized code) is hard. Very, very hard. It requires written communication skills that only a small percentage of engineers (or even humans in general) are capable of. And the demand for software engineers is too high to filter out people who aren't great at writing. So I guess many people just try to work around the problem instead.
There's something bad about going over-the-top halfway. Those sort of strict rules that everyone follows half assed are so common on software teams (and the rest of the business and society, but whatevs). It seems like they have all the downsides of both strictness and laxness. It would work better if you just let devs do their things. It would also work better if you went all the way insane. Like the first time you write some garbage docstring like that the CTO comes to your desk and tells you off. I'm not saying that would be the right move in this case, but at least it's something.
One reason is that comments get stale. People need to maintain them but probably won't. Second reason is that they think the code should be self-documenting. If it's not then you just need better names and code structure. Many books like clean code advocate this approach, and that's where I first learned the idea of don't write comments as well.
Personally now I've held both sides of the argument at different times. I think in the end it's a trade-off. There's no hard and fast rule, you need to use your best judgement about what's going to be easiest for the reader and also for the people maintaining the code. Usually I just try to strike a balance that I think my coworkers won't complain about. The other thing I've realized that makes this tricky is that people will almost always overestimate their (or others) commitment to maintaining comments, and/or overestimate how "self-documenting" their code is.
It's also probably time to stop recommending TDD, object-oriented programming, and a host of other anti-patterns in software development, and get serious about treating it like a real engineering profession instead of a circus of personality- and company-driven paradigms.
It is interesting that he uses a fitnesse example.
Years ago we started using fitnesse at a place I was working, and we needed something that was not included, I think it was being able to make a table of http basic auth tests/requests.
The code base seems large and complex at first, but I was able to very quickly add this feature with minimal changes and was pretty confident it was correct. Also, I had little experience in Java at the time. All in all it was a pretty big success.
Interesting is probably the wrong word. I should say interesting to me, because I had a different experience with it. And it was not any sort of theoretical analysis; it was a feature I needed to get done.
Like everything else: it's fine in moderation. Newbies should practice clean code, everybody else gets to make their own decisions. Treating anything as dogma whether it is Clean Code, TDD, Agile or whatever is the flavor of madness of the day is going to lead to suboptimal outcomes. And they're also a great way to get rid of your most productive and knowledgeable team members.
So apply with caution and care and you'll be fine.
There's a word, in other comments, that I expected to find: zealots. Zealots aren't sufficiently critical, and they don't want to think for themselves; a reasonable person should be able to, and a professional should be constantly itching to, step back, look at code, and decide whether some refactoring or rewriting is an improvement, taking a book like Clean Code as a source of general principles and good examples, not of rules.
All the "bad" examples discussed in the article are rather context dependent, representing uninspired examples or extreme tastes in the book rather than bad or obsolete ideas.
Shredding medium-length, meaningful operations into many very small and quite ad hoc functions can reduce redundancy at the expense of readability, which might or might not be an improvement; a little DSL that looks silly if used in a couple of test cases can be readable and efficient if more extensive usage makes it familiar; a function with boolean arguments can be an accretion of special cases ripe for refactoring, or a respectable way to organize otherwise repetitive code.
Most of these types of books approach things from the wrong direction. Any recommendation should look at the way well designed, maintainable systems are actually written and draw their conclusions from there. Otherwise you allow too much theorizing to sneak in. Lots of good options to choose from and everyone will have their own pet projects, but something like SQLite is probably exemplary of what a small development team could aim for, Postgres or some sort of game engine would maybe be good for a larger example (maybe some of the big open source projects from major web companies would be better, I don't know).
There are books that have done something like this[0], but they are a bit high level. There is room for something at a lower level.
"Promote I/O to management (where it can't do any damage)" is the actionably good thing i've taken from Brandon Rhoades' talk based on this: https://www.youtube.com/watch?v=DJtef410XaM
Living in a world where people regularly write single functions that 1) load data from a hardcoded file path, 2) do all the analysis inside the same loop that iterates over the file contents, and 3) plot the results ... that cleavage plane is a meaningfully good one.
The rest of the ideas fall into "all things in moderation, including moderation", and can and should be special-cased judiciously as long as you know what you're doing. But oh god please can we stop writing _that_ function already.
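A minimal sketch of that split, in Java and with invented names: I/O is "promoted to management" at the edges of the program, and the analysis in the middle is a pure function that never touches a file:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    public class Report {
        public static void main(String[] args) throws Exception {
            List<String> lines = Files.readAllLines(Path.of(args[0])); // I/O at the edge
            double result = analyze(lines);                            // pure in the middle
            System.out.println("result = " + result);                  // output at the edge
        }

        // Easy to test: no hardcoded paths, no plotting, just data in and data out.
        static double analyze(List<String> lines) {
            return lines.stream().mapToDouble(Double::parseDouble).average().orElse(0.0);
        }
    }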
Let's not throw the baby out with the bathwater. We can still measure how quickly new (average) developers become proficient, average team velocity over time, and a host of other metrics that tell us if we are increasing or decreasing the quality of our code over time. Ignoring it all because it's somewhat subjective is selfish and bad for your business.
Leave off the word "clean" or whatever... DO have metrics and don't ignore them. You have people on your team that make it easier for the others, and people who take their "wins" at the expense of their teammates' productivity.
I know that I'm late to this party, but what would Clean Coders think about algorithm-heavy code like "TimSort.java"[1]? (This is the Java port of the famous Python stable sort.) Since Java doesn't have mutable references (pointers) or allow multiple return values, it gets very tricky to manage all the local variables across different function scopes. I guess you could put all your locals into a struct/POJO and then pass it around to endless tiny functions. (Honestly, the Java regex library basically does this... successfully.) Somehow, I feel it would be objectively worse if this algo code were split into endless 1/5/10-line functions! (Yes yes, that is an _opinion_... not a fact!)
Come to think of it, is the original C code for Python's timsort equally non-Clean Code-ish?[2] Probably not!
Articles like these make me feel better about never having read any of the 'how to code' books. Mainly substituting them by reading TheDailyWTF during the formative years.
I have the same complaint with Code Complete. I read bits in college and I'm not sure I follow most of its advice today (e.g. putting constants on the left side of a comparison).
However, the book also presents the psych study about people not remembering more than 7 (+/- 2) things at a time (therefore you should simplify your code so readers don't have to keep track of too much stuff) and it stuck with me. I must be one of the people with only 5 available slots in their brain...
That study was done for specific stimuli (words, digits), and doesn't generalize to e.g. statements. There are studies that show that rate of presentation, complexity, and processing load have an effect. However, STM capacity is obviously limited, so it's good to keep that in mind when you're worried about readability. And I think it's also safe to assume that expert programmers can "chunk" more than novices, and have a lower processing load.
> putting constants on the left side of a comparison
Yoda conditions? I hate those, they are difficult to read. Yes they are useful for languages which allow assignments in conditionals, but even then it's not really worth it. It's a very novice mistake to make. For me equality rarely appears in conditionals, it's either a numeric comparison or checking for existence.
Also, the tooling and compiler speed aren't fucked like they are in Scala or Kotlin. I like Kotlin, especially the null-safety, but the ecosystem's quality is kinda shoddy. Everything is just faster and less buggy in Java.
Honestly Clean Code probably isn't worth recommending anymore. We've taken the good bits and absorbed it into best practice. I think it has been usurped by books like "Software Engineering at Google" and "Building Secure and Reliable Systems".
I don't believe in being prescriptive to anyone about how they write their code, because I think people have different preferences and forcing someone to write small functions when they tend to write large functions well is a unique form of torture. Just leave people alone and let them do their job!
I don't think it is the perfect solution, but a lot of people assert "we can't do better, no point in trying, just write whatever you feel like" and I think that is a degenerate attitude. We CAN find better ways to construct and organize our code, and I don't think we should stop trying because people don't want to update their pull requests.
I've heard this before, and I agree, but don't let the name put you off. I agree that designing and iterating for google scale is a bad idea, but there is a lot in that book that is applicable to all software teams.
Maybe there is no such thing as reaching clean code by following a set of rules. I know the author never advocates his book as a "bible", but it does give the reader that feeling.
There are only years, even decades, of deep experience in certain domains (e.g. game rendering engines, or Customer Relationship backends), extra hours spent reading proven high-quality code, countless rounds of reflection on existing code (which also means extra hours reviewing it), and a strong will to improve it, based not on some set of rules but on common sense about programming and many attempts at rewriting the code into another form.
I think ultimately it comes down to something similar to the 10,000-hour rule: we need to put in a lot of time on X, and not only that, we also need to challenge ourselves at every step.
I think the book is still useful with all its flaws, mainly because "overengineering code" is like exercising too much: sure, it happens to some people and can be a big issue, but for the vast majority it's the opposite that is the problem.
What well-regarded codebases has this author written, so you can see his principles in action? OTOH, if you’re wondering about the quality of John Ousterhout’s advice in _A Philosophy of Software Design_, you can just read the Tcl source.
The article quotes a code sample from FitNesse – the author has apparently maintained that codebase since then. You can check out the code for the current version at https://github.com/unclebob/fitnesse, or browse the code in a Monaco editor using https://github1s.com/unclebob/fitnesse/blob/HEAD/src/. (I have no idea if that code is “well-regarded”, but as you wrote, you can read it for yourself.)
I'm surprised by the number of detractors.
We know from history that any book of advice should not be taken too literally. Reading the comments here, it feels almost like I read a different book (about 10 years ago).
I actually found the SetupTeardownIncluder class the author complains about easier to read and interpret than the original version. I know from the start what the intention was and where I should go if I have an issue with some part of the code.
I don't even take issue with the name. It makes it easy to find the class just by remembering what it should do. I don't really care all that much about verbs vs nouns. I want to be able to find the right place by having a rough idea of what it should do. I want to get a hint about the class's functionality from its name too.
'Clean Code' is a style, not all practices are best. I feel that a good software team understands each other's styles, therefore making it easier to read others code within the context of a style. However, when people disagree on code style it has a way of creating cliques within teams, so sometimes it's just easier to pick a style that is well documented already and be done with mainly petty disagreements. Clean code fits the definition of well documented and is a lazy way of defining a team wide style.
I am interested in reading books about software development and best practices like Clean Code and The Pragmatic Programmer [0]. I have coded for about eight years, but I would like to do it better. I would like to know your opinion about [0], since Clean Code has been significantly criticized.
How about we throw Clean Architecture in this while we're at it. And also realize that the only rule in SOLID that isn't defined subjectively or partially is the "L".
This is the first time I've heard of this book. I certainly agree some of these recommendations are way off the mark.
One guideline I've always tried to keep in mind is that statistically speaking, the number of bugs in a function goes way up when the code exceeds a page or so in length. I try to keep that in mind. I still routinely write functions well over a page in length but I give them extra care when they do, lots of comments and I make sure there's a "narrative flow" to the logic.
The big one to keep an eye on is cyclomatic complexity with respect to function length. Just 3 conditional statements in your code gives you no less than 8 ways through your code and it only goes up from there.
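A toy illustration of that arithmetic (invented example, not from any book): three independent conditionals already multiply out to 2 x 2 x 2 = 8 distinct paths a test suite would have to cover.

    class Pricing {
        static int price(boolean member, boolean holiday, boolean bulk) {
            int p = 100;
            if (member)  p -= 10;
            if (holiday) p -= 5;
            if (bulk)    p -= 20;
            return p;  // 8 different flag combinations can reach this line
        }
    }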
All of these 'clean code' style systems have the same flaw. People follow them without understanding why the system was made. It is why you see companies put in ping pong tables, but no one uses them. They saw what someone else was doing successfully, so they copied them without understanding why the ping pong table was there. They ignore the reason Chesterton's fence was built, which is just as important if you are removing it. Clean code by itself is 'ok'. I personally am not very good at that particular style of coding. I do like that it makes things very nice to decompose into testing units.
A downside to this style of coding is it can hide complexity with an even more complex framework. It seems to have a nasty side effect of smearing the code across dozens of functions/methods, which makes it harder in some ways to get the 'big picture'. You can wander into a meeting and say 'my method has CC of 1' but the reality is that thing is called at the bottom of a for loop, inside of 2 other if conditions. But you 'pass' because your function is short.
4 line functions everywhere is insanity. Yes, you should aim for short functions that do one thing, but in the real world readability and maintainability would suffer greatly if you fragment everything down to an arbitrarily small number.
Number of bugs per line also goes way up when the average length of functions goes below 5, and the effect in most studies is larger than the effect of too large functions.
> Why are we using both int[] and ArrayList<Integer>? (Answer: because replacing the int[] with a second ArrayList<Integer> causes an out-of-bounds exception.)
Isn't it because one is pre-allocated with a known size of n and the other is grown dynamically?
> And what of thread safety?
Indeed. If he had written the prime number class like the earlier example, with public static methods creating a new instance for each call and all the other methods being private instance methods, this wouldn't be an issue.
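Roughly, the pattern being described looks like this in Java (illustrative names, not the book's actual code): the public static entry point builds a fresh instance per call, so the instance fields are never shared between threads.

    import java.util.ArrayList;
    import java.util.List;

    class PrimeGenerator {
        private final List<Integer> primes = new ArrayList<>();

        private PrimeGenerator(int limit) {
            for (int candidate = 2; candidate <= limit; candidate++) {
                if (isPrime(candidate)) primes.add(candidate);
            }
        }

        private boolean isPrime(int n) {
            for (int p : primes) {
                if ((long) p * p > n) break;
                if (n % p == 0) return false;
            }
            return true;
        }

        // Every call works on its own instance; no shared mutable state.
        public static List<Integer> generatePrimes(int limit) {
            return new PrimeGenerator(limit).primes;
        }
    }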
Pardon me for stating this, but the pundits of the Clean Code mantra I've worked with tend to be those consultants who bill enormous amounts of money to ensure they have lengthy contracts, justified by wrapping the code in so many classes to abstract it so it can be considered *CLEAN* and *TESTABLE*.
They will preach the awesomeness of clean code in terms of maintainability, scalability and all those fancy enter-pricey terms that, at the end of the day, bring not enough value to justify their cost.
Welcome to another episode of "X, as per my definition of X, is bad - Let's talk about Y, which is another definition of X, but not the one I disagree with".
So many coding recommendations trip up when they fail to take into account Ron's First Law: all extreme positions are wrong. Functions that are too long are bad, but functions that are too short are equally bad. 2-4 lines? Good grief! That's not even enough for a single properly formatted if-then-else!
IMHO it's only when you can't see the top and bottom of a function on the screen at the same time that you should start to worry.
I don't completely disagree, but his point about the irrelevance of SOLID, OO, and Java in this supposedly grand new age of FP programming ignores that OO is still the pre-eminent paradigm for most applications and that Java remains one of the largest and most utilized languages in the world. Also, I would say that excitement around FP has waned more than it has for Java.
I often hear that people should read Clean Code and that it is necessary in large projects. I would say that there is no direct correlation between how large and complex the business logic is and the difficulty of understanding and maintaining the code. I have seen small simple applications that are not maintainable because people have followed SOLID to the extreme.
One of the biggest issues I have found is that I can sometimes not easily create a test for code that I have modified because it is part of a larger class (I'm coding in C++). This normally happens when I cannot extract the function out of the class and it relies on internal state, and the class is not already being tested.
Love to know if there is an easy way of doing this!
A lot of Robert C. Martin's pieces are just variations on his strong belief that ill-defined concepts like "craftsmanship" and "clean code" (which are basically just whatever his opinions are on any given day) are how to reduce defects and increase quality, not built-in safety and better tools, and that if you think built-in safety and better tools are desirable, you're not a Real Programmer (tm).
I'm not the only one who is skeptical of this toxic, holier-than-thou and dangerous attitude.
Removing braces from if statements is a great example of another dangerous thing he advocates for no justifiable reason.
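The failure mode is the classic one sketched below (invented names): without braces only the first statement is guarded, even though the indentation suggests otherwise.

    class BracelessIf {
        static boolean elevated;

        static void handle(boolean isAdmin) {
            if (isAdmin)
                grantAdminAccess();
                elevated = true;   // bug: runs for every caller, not just admins
        }

        static void grantAdminAccess() { /* ... */ }
    }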
> The current state of software safety discussion resembles the state of medical safety discussion 2, 3 decades ago (yeah, software is really, really behind the times).
> Back then, too, the thoughts on medical safety were divided into 2 schools: the professionalism school and the process-oriented school. The former argues more or less what Uncle Bob argues: blame the damned and * who made the mistakes; be more careful, damn it.
> But of course, that stupidity fell out of favor. After all, when mistakes kill, people are serious about it. After a while, serious people realized that blaming and clamoring for care backfires big time. That's when they applied, you know, science and statistics to safety.
> So, tools are upgraded: better color-coded medicine boxes, for example, or checklists in surgery. But it's more. They figured out what trainings and processes provide high impact and do them rigorously. Nurses are taught (I am not kidding you) how to question doctors when weird things happen; identity verification (ever notice why nurses ask your birthday like a thousand times a day?) got extremely serious; etc.
> My take: give it a few more years, and software, too, will probably follow the same path. We need more data, though.
Clean Code is not a blind religion, Uncle Bob is trying to make a point with the concepts behind the book and teaching you to consider/question if you're falling into bad code traps.
This book was written to make developers think and consider their choices, not a script for good code.
An output argument is when you pass an argument to a function, the function makes changes, and after returning you examine the argument you passed to see what happened.
Example: the caller could pass an empty list, and the method adds items to the list.
Why not return the list? Well, maybe the method computes more things than just the list.
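A small Java sketch of that shape (invented names): the caller supplies the list, the method fills it, and the return value carries something else.

    import java.util.List;

    class Parser {
        // "errors" is an output argument: the caller passes an empty list
        // and examines it afterwards; the count is returned separately.
        static int parseAll(List<String> inputs, List<String> errors) {
            int parsed = 0;
            for (String s : inputs) {
                try {
                    Integer.parseInt(s);
                    parsed++;
                } catch (NumberFormatException e) {
                    errors.add("not a number: " + s);
                }
            }
            return parsed;
        }
    }

The caller would do something like: create a new ArrayList, pass it in, then inspect it after the call returns.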
> Why not return the list? Well, maybe the method computes more things than just the list.
Or in C you want to allocate the list yourself in a particular way and the method should not concern with doing the allocation itself. And the return value is usually the error status/code since C doesn't have exceptions.
That's a C/C++ trick where a location to dump the output is presented as an argument to the function. This makes functions impure and leads to all kinds of nastiness such as buffer overruns if you are not very careful.
It's wrong to call output parameters a "C/C++ trick" because the concept really has nothing to do with C, C++, buffer overruns, purity, or "other nastiness".
The idea is that the caller tells the function its calling where to store results, rather than returning the results as values.
For example, Ada and Pascal both have 'out' parameters.
Theoretically, other than different calling syntax, there's conceptually no difference between "out" parameters and returning values.
In practice, though, many languages (C, C++, Java, Python, ...) support "out" parameters accidentally by passing references to non-constant objects, and that's where things get ugly.
Not only in C land; C# has "ref" (pass by reference, usually implying you want to overwrite it) and "out" (like ref but you _must_ set it in all code paths). Both are a bit of a code smell and you're nearly always better off with tuples.
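The Java analogue of "just return a tuple" would be returning a small record (Java 16+; the names here are invented for illustration):

    record ParseResult(int value, boolean ok) {}

    class SafeParser {
        // No ref/out parameter needed: both results travel in the return value.
        static ParseResult tryParse(String s) {
            try {
                return new ParseResult(Integer.parseInt(s), true);
            } catch (NumberFormatException e) {
                return new ParseResult(0, false);
            }
        }
    }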
Unfortunately in C land for all sorts of important system APIs you have to use output arguments.
An output argument (or parameter) is assigned a result. In Pascal, for instance, a procedure like ReadInteger(n) would assign the result to n. In C (which does not have variable parameters) you need to pass the address of the argument, so the function call is instead ReadInteger(&n). The example function ReadInteger has a side effect, so it is therefore preferable to use an output parameter rather than to return a result.
1. Clean Code is not a fixed destination to which you'll ever arrive. It's a way of life.
2. We might not use the same methods to write clean code. But when you see clean code, you know it is clean.
3. Some traits clean code can have
- When you read the function name, you understand what the function does
- When you read the content of a function, you can understand what it is about without reading line by line.
- When you try to refactor clean code, you find yourself sometimes ending up only changing one cog in the whole system.
I worked at a large billion dollar company in the Bay Area (who is in the health space) and they religiously followed Clean Code. Their main architect was such a zealot for it.
My problem is not with the book and author itself but the seniority that peddles this as some gospel to the more junior engineers.
Clean code is not the be-all and end-all. Be your own person and develop WHAT IS RIGHT FOR YOUR ORG, not peddle some book as gospel
So glad I work at a company now where we actually THINK about the right abstractions and don't peddle some book
It's an ok book to read and think about, but understand it is written by someone that hasn't really built a lot of great software, but rather is paid to consult and give out sage advice that is difficult to verify.
Read with great skepticism, but don't feel bad if you decide not to read it at all.
I'd like to say that software engineering is a lot like playing jazz. It's really hard for the beginner to know where to start, and there're also endless sources for the "right" way to do things.
In truth however, like playing jazz, the only thing that really matters is results, and even those can be subjective. You can learn and practice scales all day long, but that doesn't really tell you how to make music.
I developed a style of software engineering that works really well for me. It's fast, accurate, and naturally extensible and easily refactorable. However, for various reasons, I've never been able to explain it to junior (or even senior) engineers when asked why I code a certain way. At a certain point, it's not the material that matters, but the audience's ability to really get what's at the heart of the lesson.
Technically, you could say something that's indisputably accurate, like "there're only 12 notes in a (western) octave, and you just mix them until they sound good", but that's obviously true to a degree that's fundamentally unhelpful. At the same time, you could say "A good way to think about how to use a scale is to focus less on the notes that you play and more on the ones you don't". This is better advice, but it may altogether be unhelpful, because it doesn't really yet strike at the true heart of what holds people back.
So at a certain point, I don't really know if anyone can be taught something as fundamentally "artful" (i.e. a hard thing largely consisting of innumerable decisions that are mostly matters of taste - a word that should not be confused with mere "opinion") as software engineering or jazz music. This is because teaching alone is just not enough. At a certain point people just have to learn for themselves, and obviously the material that's out there is helpful, but I'm not sure if anything can ever be explained so well as to remove the need for the student at a certain point to simply "feel" what sounding good sounds like, or what good software engineering feels like.
I'll add one last thing, going back to what I was saying about not being able to explain to "junior (or even senior)" engineers. Were the same "lesson" to happen with someone who is very, very advanced, like a seasoned principal engineer who's built and delivered many things, time and time again, across many different engineering organizations and technologies - someone like a jazz great, for example - anything I would have to say about my approach would be treated as obvious and boring, and they'd probably much rather talk about something else. I don't say this because I mean to imply that whatever I would have to say is wrong or incorrect, but rather that at a certain level of advancement, you forget everything that you know and don't remember what it took to get there. There are a few who have a specific passion for teaching, but that's orthogonal to the subject.
I think it was Bill Evans who said something like "it takes years of study and practice to learn theory and technique, but it takes still a lifetime to forget it all". When you play like you've forgotten it all, that is when you achieve that certain sound in jazz music. Parenthetically, I'll add that doesn't mean you can't sound good being less advanced, but there's a certain sound that I'm trying to tie together with this metaphor that's parallel to great software engineering from knowledge, practice, and then the experience to forget it all and simply do.
I think that's fundamentally what's at the heart of the matter, not that it takes anyone any closer to getting there. You just have to do it, because we don't really know how to teach how to do really hard things in a way that produces reproducible results.
Uncle "Literally who?" Bob claims you should separate your code into as many small functions spread across as many classes as you can and makes a living selling (proverbial) shovels. John Carmack says you should keep functions long and have the business logic all be encapsulated together for mental cohesion. Carmack makes a living writing software.
I happen to agree with you and have posted in various HN threads over the years about the research on this, which (for what it's worth) showed that longer functions were less error prone. However, the snarky and nasty way that you made the point makes the comment a bad one for HN, no matter how right you are. Can you please not post that way? We're trying for something quite different here: https://news.ycombinator.com/newsguidelines.html.
It's even more important to stick to the site guidelines when you're right, because otherwise you discredit the truth and give people a reason to reject it, which harms all of us.
> The function that is least likely to cause a problem is one that doesn't exist, which is the benefit of inlining it. If a function is only called in a single place, the decision is fairly simple.
> In almost all cases, code duplication is a greater evil than whatever second order problems arise from functions being called in different circumstances, so I would rarely advocate duplicating code to avoid a function, but in a lot of cases you can still avoid the function by flagging an operation to be performed at the properly controlled time. For instance, having one check in the player think code for health <= 0 && !killed is almost certain to spawn less bugs than having KillPlayer() called in 20 different places.
On the spectrum you've described, I'm progressively shifting from Uncle Bob's end to Carmack's the further I get into my career. I think of it as code density. I've found that high density code is often easier to grok because there's less ceremony to keep in my head (e.g. many long method names that may or may not be named well, jumping around a bunch of files). Of course, there's a point at which code becomes so dense that it again becomes difficult to grok.
Or perhaps the length of the function is orthogonal to the quality of the author's code. Make the function as long as necessary to be readable and maintainable by the people most likely to read and maintain it. But that's not a very sellable snippet, nor a rule that can be grokked in 5 minutes.
Carmack is literally the top .1% (or higher) of ability and experience. Not to mention has mostly worked in a field with different constraints than most. I don't think looking to him for general development advice is all that useful.
Read the doom source code and you can see that he didn't mess around with trying to put everything into some nonsense function just because he has some part of a larger function that can be scoped and named.
The way he wrote programs even back then is very direct. You don't have to jump around into lots of different functions and files for no reason. There aren't many specialized data structures or overly clever syntax tricks to prove how smart he is.
There also aren't attempts to write overly general libraries with the idea that they will be reused 100 times in the future. Everything just does what it needs to do directly.
But why should any of that be aspirational to an "average" developer? It's like learning how to do mathematics by copying Terence Tao's patterns of behavior. Perhaps Carmack's output is more a function of the programmer than an indication of good practice for average devs.
I guess I'm still not being clear - when you read John Carmack's programs you realize that it just isn't necessary to do complex nonsense.
If you take a look at the doom source code you realize this isn't the cutting edge of mathematics, he is cranking out great software by avoiding all that and using high school algebra (literally and figuratively) instead.
While other people spin their wheels sweating over following snake oil bob's pamphlet Carmack is making programs that people want, source code that people want and work that stands the test of time.
The circumstances he operated under while writing Doom were much different than most people encounter. The couple of people working with him on the code were all experts in their field and have a complete understanding of the problem space. What you seem to be identifying as indications of unnecessary complexity of modern development practices might really just be accident of circumstance and the individual skill of the contributors at the time. It is a mistake to look at the behaviors of the unusually talented few and see takeaways to apply more broadly.
He wrote doom by himself and worked with people on later projects. You can look at the source yourself. What he himself said is the exact opposite of what you are saying - because he was able to write things directly he was able to experiment a lot. He was able to get the easier things working and go through trial and error on more difficult aspects.
If you would actually read through some of his work you would see that it is a refreshingly simple way to do things. No pretending that global data isn't global, no unnecessary indirection, etc. etc.
> What you seem to be identifying as indications of unnecessary complexity of modern development practices might really just be accident of circumstance and the individual skill of the contributors at the time
I have no idea what this is supposed to mean.
> It is a mistake to look at the behaviors of the unusually talented few and see takeaways to apply more broadly.
This is the point you are trying to make but you just keep repeating it without backing it up in any way.
John Carmack used his skill to do things in a simple and direct way. Anyone can start to imitate that immediately. There is no invisible magic going on, he just doesn't subscribe to a bunch of snake oil nonsense that distracts people from writing the parts of their program that actually do things.
>This is the point you are trying to make but you just keep repeating it without backing it up in any way.
Do I really need to back up the claim that what applies to the rare talents doesn't automatically apply to everyone? That should be the default assumption unless proven otherwise.
> Do I really need to back up the claim that what applies to the rare talents doesn't automatically apply to everyone? That should be the default assumption unless proven otherwise.
You do actually, yes
This makes me think you are just ignoring what I'm actually saying. Read some of his source code like I mentioned multiple times and tell me this somehow only applies to John Carmack. There is no reason anyone couldn't program that way, but people have their head filled with nonsense and get distracted thinking they need unnecessary overhead. John Carmack's programs are incredibly simple and clear. Why would someone try to emulate this charlatan who doesn't make anything when they could mimic Carmack?
This isn't saying everyone can drive as fast as a race car driver, it's saying that you should at least go in the same direction if you want to get to the same place.
>Read some of his source code like I mentioned multiple times and tell me this somehow only applies to John Carmack.
I've read the Quake 3 Arena source code before. I wasn't impressed. In fact, it was what I would consider "bad code". Granted, his constraints were different so I don't judge him or his code for it. It turns out you cannot generalize from very specific scenarios with unique constraints, as I have been saying. I really don't see what general principles you think you can glean from the source code written by one unusually capable man in a mad rush to be the first to deliver a revolutionary gaming experience. The fact that you think you can is rather puzzling.
If you were so unimpressed, why do you think Carmack is in the 0.1% of programming skill?
> In fact, it was what I would consider "bad code".
lol, I don't think history agrees with you.
> The fact that you think you can is rather puzzling.
It shouldn't be puzzling since I gave you half a dozen examples of pitfalls that Carmack doesn't fall in to.
Let's reiterate: Carmack doesn't do any of the bob martin snake oil bullshit and he writes very simple, clear, direct programs that were used as foundations for multiple companies and reused over and over for decades after. He demonstrates what exceptional programming looks like through simplicity and ignoring nonsense silver bullets from charlatans.
You can keep saying how baffled and confused you are, but you haven't actually given any examples or anything concrete at all.
The biggest tell that Bob Martin isn't operating from direct experience and first principles is his 180 from object-oriented to functional programming. You don't get a dichotomy like that if you derived your opinions yourself. The principles that functional programming is built on, what actually provides benefit to the program, would have gradually permeated the earlier teachings; they wouldn't just come out of nowhere and flip the entire ideology.
He's selling a product. He'll say what sounds convincing and he won't say things that are true but unpopular (or boring). I don't think it's deliberate but the end result is similar.
One point I do think is relevant though, is that smarter people and better programmers can cope with more complexity in one place. Carmack can probably cope with a lot more state/lines/whatever within a single function, whereas chunking everything up into a form digestible by 5-year-olds is sometimes necessary for your code to be approached by a junior. But the consequence of making Dostoevsky digestible by children is that you either remove a significant amount of the content, or the book becomes hundreds of thousands of pages long. And it raises the obvious question: do you want your 747 designed by juniors?
The type of software John writes is different (much more conceptually challenging), and I don't recall him being as big of a proponent of TDD (which is the biggest benefit to small functions).
I think the right answer depends on a number of other factors.
The problem with Clean Code is also the problem with saying to ignore Clean Code. If you treat everything as a dogmatic rule on how to do things, you're going to have a bad time.
Because, they're more like guidelines. If you try not to repeat yourself, you'll generally wind up with better code. If you try to make your methods short, you'll generally wind up with better code.
However, if you abuse partial just to meet some arbitrary length requirement, then you haven't really understood the reason for the guideline.
But the problem isn't so much because the book has a mix of good and bad recommendations. We as an evolutionary race have been pretty good at selectively filtering out bad recommendations over the long term.
The problem is that Uncle Bob has a delusional cult following (that he deliberately cultivated), which takes everything he says at face value, and are willing to drown out any dissenting voices with a non-stop barrage of bullshit platitudes.
There are plenty of ideas in Clean Code that are great, and there are plenty that are terrible... but the religiosity of adherence to it prevents us from separating the two.
Clean Code is fine. It's a little dated, as you would expect, and for the most part, everything of value in it has been assimilated into the best practices and osmotic ether that pervades software development now. It's effectively all the same stuff as you see in Refactoring or Code Complete or Pragmatic Programmer.
I suspect a lot of backlash against it centers around Uncle Bob's less than progressive political and social stances in recent years.
I never read Clean Code and know nothing about its author so I'm willing to trust you on the first part, but the second paragraph is really uncalled for IMO. The article is long and gives precise examples of its issues with the book. Assuming an ulterior motive is unwarranted.
This article is garbage. The argument is basically like saying "famous scientist X was wrong about Y, let's stop doing science. Clearly there is no point to it."
I cannot believe what I am reading here.
My open source community knows exactly what good code looks like and we've delivered great products in very short timeframes repeatedly and often beating our own expectations.
These kinds of articles make me feel like I must have discovered something revolutionary... But in reality I'm just following some very simple principles which were invented by other people several decades ago.
Too many coders these days have been misled into all sorts of goofy trends. Most coders don't know how to code. The vast majority of the people who claim to be experts and who write books about it don't know what they're talking about. That's the real problem. The industry has been hijacked by people who simply aren't wise or clever enough to be sharing any kind of complex knowledge. There absolutely is such a thing as good code.
I'm tired of hearing developers who have never read a single word of Alan Kay (the father of OOP) tell everyone else how bad OOP is and why FP is the answer. It's like watching someone drive a nail straight into their own hand and then complain to everyone that hammer and nails are not the right tool for attaching two pieces of wood together... That instead, the answer is clearly to tie them together with a small piece of string because nobody can get hurt that way.
Just read the manual written by the inventor of the tool.
Alan Kay said "The Big Idea is Messaging"... Yet almost none of the OOP code I read designs their components in such a way that they're "communicating" together... Instead, all the components try to use methods to micromanage each other's internal state... Passing around ridiculously complex instances to each other (clearly a whole object instance is not a message).
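A toy Java contrast of the two styles (invented names, not Kay's own example): in the "messaging" version the caller tells the object what to do and the object keeps its own state consistent, instead of exposing that state for everyone to poke at.

    import java.util.ArrayList;
    import java.util.List;

    class Order {
        private final List<String> items = new ArrayList<>();
        private int totalCents;

        // Message style: callers say what they want done.
        void addItem(String sku, int priceCents) {
            items.add(sku);
            totalCents += priceCents;
        }

        int totalCents() { return totalCents; }
    }

    // The micromanaging style would instead expose getItems() and setTotalCents()
    // and rely on every caller to keep the two fields in sync by hand.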
> The argument is basically like saying "famous scientist X was wrong about Y, let's stop doing science. Clearly there is no point to it."
In my opinion the argument is more "famous paper X by scientist Y was wrong, let's stop citing it". Except that Clean Code isn't science and doesn't pretend to be.
If the article only attacked that specific book "Clean Code", then I would not be as critical. But the first line in the article suggests that it is an attack against the entire idea of writing good quality code:
'It may not be possible for us to ever reach empirical definitions of "good code" or "clean code"'
It might seem far fetched that someone might question the benefits of writing high quality code (readable, composable, maintainable, succinct, efficient...) but I've been in this industry long enough (and worked for enough different kinds of companies) to realize that there is an actual agenda to push the industry in that direction.
Some people in the corporate sphere really believe that the best way to implement software is to brute force it by throwing thousands of engineers at a giant ball of spaghetti code then writing an even more gargantuan spaghetti ball of tests to ensure that the monstrosity actually works.
> is an attack against the entire idea of writing good quality code:
> 'It may not be possible for us to ever reach empirical definitions of "good code" or "clean code"'
I read it as an attack against the idea that there are hard and fast, objective and empirically verifiable rules for good quality code. The Clean Code book itself is a perfect example of how subjective such rules are. I feel really sorry for you if the only methods for software development that you know are "brute force it by throwing thousands of engineers at a giant ball of spaghetti code" and sticking to a book that has some fanatic supporters. "Readable", "maintainable", "succinct" or "efficient" don't really describe Martin's examples, and many functional programming enthusiasts would question "composable" too. Yes, I wasted several hours of my life reading that book and I'm never getting them back.
I never said that this is what I believe. I said it's what a lot of people in the corporate sphere believe. It's the opposite of what I believe.
OOP solves complex problems in a simple way.
Functional Programming solves simple problems in a complex way.
Some concepts from FP are useful when applied within OOP, but pure FP is simply not practical. It doesn't scale in terms of code size, it's inefficient, it's inflexible, it takes longer to develop and maintain, it's less readable because it encourages devs to write long chains of logic spread out across many files. FP's lack of emphasis on important concepts such as blackboxing, high cohesion and loose coupling encourages developers to produce poor abstractions whose names sound highly technical but whose responsibilities are vague and impossible to explain without a long list of contrived statements which have little in common with one another.
Abstractions in FP tend to be all over the place. It seems to encourage vague, unstructured thinking. Decoupling state from logic makes it impossible to produce abstractions which are high cohesion and loosely coupled. It forces every component to mind every other component's business.
This is madness. If you don't care about structure, why not just write the entire system as a single file and define thousands of functions which call each other all over the place? You would get the same spaghetti, but you would save yourself the effort of having to jump around all these files which don't add any meaningful structure anyway.
It's always the case that when I present these arguments above to FP devs, they respond with personal insults instead of counter-arguments. This suggests that they know my arguments are accurate but they are too invested in FP and are in denial - It's emotional, so they respond emotionally.
You shouldn't think of it like "I've been fooled and wasted all this time on FP". You should think about it like "I gave FP a thorough analysis over several years and it was a worthwhile experiment which didn't work out but I learned a lot from it".
I also spent quite a lot of time working with and reading FP code over the last 15 years. That's why I can criticize it with confidence today. It was not a waste.
Comments like "Functional Programming solves simple problems in a complex way, "Abstractions in FP tend to be all over the place." or "If you don't care about structure, why not just write the entire system as a single file and define thousands of functions which call each other all over the place?" don't really invite polite discussion. To me they look like you either are reacting on pure emotion, or don't understand what you are writing about (note that this opposite of "accurate"). It also seems that you think that I'm whatever you believe "functional programmers" are.
If there's something we maybe agree on, it's that enforcing pure functionality (I guess that's what you're writing about. If you think functions as first class objects is bad, you're incorrigibly wrong) is about as bad as maximal OOP that Martin is preaching. There's a lot of space between these two extremes if you're willing to look. And a lot of money to be made by sticking to an ideology and preaching it.
It would be an interesting experiment if you could show me the GitHub repo of the best written open source FP project you've ever encountered and I could point out its flaws and rank them on a scale based on how critical they are in terms of maintenance and performance.
> But the first line in the article suggests that it is an attack against the entire idea of writing good quality code:
> 'It may not be possible for us to ever reach empirical definitions of "good code" or "clean code"
If you re-read that line you quoted you may find that it talks of the (im)possibility of defining what "good code" or "clean code" is, not of actually writing it.
Martin is, and always has been, a plagiarist, ghost-written, clueless, idiot, with a way of convincing other know-nothings that he knew something. At one time he tried to set up a reputation on StackOverflow, and was rapidly seen off.
Yeah I agree with the author, and I would go further, it's a nice list of reasons why Uncle Bob is insufferable.
Because of stuff like this:
> Martin's reasoning is rather that a Boolean argument means that a function does more than one thing, which it shouldn't.
Really? Really? Not even for dependency injection? Or, you know, you should duplicate your function into two very similar things to have one with the flag and another one without. Oh but DRY. Sure.
> He says that an ideal function has zero arguments (but still no side effects??), and that a function with just three arguments is confusing and difficult to test
Again, really?
I find it funny that people treat him as a guru. Or maybe that's the right way to treat him: like those self-help gurus with meaningless guidance and wishy-washy feel-good statements.
> Every function in this program was just two, or three, or four lines long. Each was transparently obvious. Each told a story. And each led you to the next in a compelling order.
Wow, illumination! Salvation! Right here!
Until, of course, you have to actually maintain this and have to chase down, 3 or 4 levels of functions deep, what it is that the code is actually doing. And think of a function signature for every minor thing. And pass all the arguments you need (ignoring that "perfect functions have zero arguments" above - good luck with that)
Again, it sounds like self-help BS and not much more than that.
> Until, of course, you have to actually maintain this and have to chase down, 3 or 4 levels of functions deep, what it is that the code is actually doing.
The art is to chain your short functions like a paragraph, not to nest them a mile deep, where the "shortness" is purely an illusion and the outer ones are doing tons of things by calling the inner ones.
That's a lot harder, though.
But it fits much better with the spirit of "don't have a lot of args for your functions" - if you're making deeply nested calls, you're gonna have to pass all the arguments the inner ones need through the outer ones. Or else do something to obfuscate how much you're passing around (global deps/state, crazy amounts of deep DI, etc...) which doesn't make testing any easier.
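A small sketch of the difference (invented names): in the "paragraph" version the top-level method is a flat sequence of calls and each helper is a leaf, rather than helpers calling helpers several levels deep.

    import java.util.List;

    class ReportJob {
        String run(List<String> rows) {
            List<String> cleaned = removeBlank(rows);
            int count = countRows(cleaned);
            return format(count);
        }

        // Each step is a leaf: none of these calls the others.
        private List<String> removeBlank(List<String> rows) {
            return rows.stream().filter(r -> !r.isBlank()).toList();
        }

        private int countRows(List<String> rows) { return rows.size(); }

        private String format(int count) { return count + " rows"; }
    }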
> Really? Really? Not even for dependency injection? Or, you know, you should duplicate your function into two very similar things to have one with the flag and another one without. Oh but DRY. Sure.
I'm not sure dependency injection has anything to do with boolean flags or method args. I think the key point here is that he is a proponent of object-oriented programming. I think he touches on dependency injection later in the book, but it's been a while since I've read it. He suggests your dependencies get passed at object initialization, not passed as method options. That lets you mock stuff easily without needing to make any modifications to the method that uses that dependency.
> Until, of course, you have to actually maintain this and have to chase down, 3 or 4 levels of functions deep, what it is that the code is actually doing. And think of a function signature for every minor thing. And pass all the arguments you need (ignoring that "perfect functions have zero arguments" above - good luck with that)
I myself find it easier to read and understand simple functions than large ones with multiple indentation levels. Also, it definitely does not make sense to pass many arguments along with those many small functions. He recommends making them object instance properties so that you don't need to do that.
It may not be for everyone, but I'll take reading code that follows his principles instead of code that had no thought about design put into it any day of the week. It's not some dogmatic law that should be followed in all cases, but to me it's a set of pretty great ideas to keep in mind to lay out code that is easy to maintain and test.
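A minimal sketch of the "pass dependencies at object initialization" point (invented names): the dependency is injected once through the constructor, so method signatures stay small and a test can substitute a fake without any framework.

    interface Clock { long now(); }

    class InvoiceService {
        private final Clock clock;   // injected once, at construction

        InvoiceService(Clock clock) { this.clock = clock; }

        boolean isOverdue(long dueAtMillis) {
            return clock.now() > dueAtMillis;  // no Clock argument on the method
        }
    }

In production you might construct it with System::currentTimeMillis; in a test, new InvoiceService(() -> 0L) pins the time.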
> I'm not sure dependency injection has anything to do with boolean flags or method args.
DI can be abused as a way to get around long function signatures. "I don't take a lot of arguments here" (I'm just inside of an object that has ten other dependencies injected in). Welcome back to testing hell.
Yea... that could probably turn ugly in a lot of cases. I think the flow of object creation and dependency injection would be the important part to handle well in a case with a lot of dependencies. I think the dependencies should be passed down through objects in one direction. So if you have an object (a) that works on objects (b) that have a lot of dependencies, that outer object (a) is responsible for injecting dependencies into those second objects (b). So if you have a mocked dependency, you pass it when initializing object a, and object a is responsible for injecting that mocked dependency into object b.
A disclaimer, I'm an sre, so definitely not the most expert proponent of oop.
It's actually a good marketing trick. He can sell something slightly different and more "pure" and make promises on it and then sell books, trainings and merchandises.
That's what the wellness industry does all the time.
The boolean issue is probably the one that's caused me the most pain. That contradiction with DRY has actually had me go back and forth between repeating myself and using a flag, wasting a ton of time on something incredibly pointless to be thinking that hard about. I feel like the best thing for my career would have been to not read that book right when I started my first professional programming job.
It's been a while since I've read it, but I think to handle boolean-flag-type logic well he suggests relying on object subclassing instead. So, for an example that uses a dry-run flag for scary operations, you can have your normal object (a) with all of its methods that actually perform those scary operations. Then you can subclass that object (a) to create a dry-run subclass (b). Object (b) can override only the methods that perform the scary operations you want to dry-run, while reusing all of the non-scary methods. That lets you avoid having if dry_run == true; then dry_run_method() else scary_method() scattered across lots of different methods.
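A rough sketch of that subclassing approach (invented names, and only one way to read the book's advice): the dry-run variant overrides just the scary operations, and callers choose an implementation once instead of threading a dry_run flag through every method.

    class Deployer {
        void deploy(String service) {
            pushArtifacts(service);
            restart(service);
        }

        protected void pushArtifacts(String service) {
            System.out.println("pushing " + service);
        }

        protected void restart(String service) {
            System.out.println("restarting " + service);   // the scary part
        }
    }

    class DryRunDeployer extends Deployer {
        @Override protected void pushArtifacts(String service) {
            System.out.println("[dry run] would push " + service);
        }

        @Override protected void restart(String service) {
            System.out.println("[dry run] would restart " + service);
        }
    }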
It might make sense to divide your function with a boolean flag into two functions and extract the common code into a third, private function. Or maybe it'll make things ugly.
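One possible shape of that split, as a tiny sketch (invented names):

    class Report {
        // instead of render(String body, boolean includeFooter)
        String renderWithFooter(String body)    { return renderBody(body) + "\n-- end --"; }
        String renderWithoutFooter(String body) { return renderBody(body); }

        private String renderBody(String body) { return "REPORT\n" + body; }
    }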
I treat those books as something to show me how other people do things. I learn from it and I add it to my skill book. Then I'll apply it and see if I like it. If I don't like it in this particular case, I'll not apply it. IMO it's all about having insight into every possible solution. If you can implement something 10 different ways, you can choose the best one among them, but you have to learn those 10 different ways first.