Code only says what it does (brooker.co.za)
170 points by mjb on July 6, 2020 | 114 comments



This is a major problem with code: You don't know which quirks are load-bearing. You may remember, or be able to guess, or be able to puzzle it out from first principles, or not care, but all of those things are slow and error-prone.

This is a problem from both the negative (not breaking things) and positive (knowing how to add things) perspectives. The positive perspective was written about by Peter Naur in one of my favorite software engineering papers, "Programming as Theory Building," in which he describes how the original authors of a codebase have a mental model for how it can be extended in simple ways to meet predictable future changes, which he calls their "theory" of the program, and how subsequent programmers inheriting the codebase can fail to understand the theory and end up making extensive, invasive modifications to the codebase to accomplish tasks that the original authors would have accomplished much more simply.

I highly recommend finding Naur's paper (easily done via Google) and reading it to understand why divining the "theory" of a codebase is a fundamentally difficult intellectual problem which cannot be addressed merely by good design, and not with 100% reliability by good documentation, either.


On the topic of the "theory" (mental model) of a program, I recommend John Ousterhout's book "A Philosophy of Software Design". You get such gems as:

"... the greatest limitation in writing software is our ability to understand the systems we are creating."

"Complexity manifests itself in three general ways... change amplification, cognitive load, and unknown unknowns."

"Complexity is caused by two things: dependencies and obscurity."

"Obscurity occurs when important information is not obvious."

"The goal of modular design is to minimize the dependencies between modules."


This is important to understand when moving through the early stages of a project.

Many projects go through a clear prototype stage (where a lot of disjoint things are written, like a set of utilities to print out information on a file based on the format spec, make files with hardcoded content, etc.), then a system starts coalescing, and finally it's released.

The problem I've encountered is when the prototype is too good. It's an 80% solution, it seems to do everything that's wanted, but the people who wrote it (contractors/consultants/too expensive older devs) aren't the ones who are tasked with finishing the last 20%. The original developers may have understood how to create that last portion with what they'd written, or they may have intended to throw it away [0].

The new developers don't know what's present (and so recreate a lot of existing capabilities), don't understand how to extend it properly (so a lot of copy/paste where the original devs laid out a nice extendable system with generics and/or interfaces or whatever the language provides), and the whole thing turns into a mess. This communication between developers is critical, but usually absent.

[0] "This is more of a proof of concept, it does everything you want for converting two file formats between each other, but doesn't scale yet because it's all 1-to-1 mappings, we are working on the intermediate representation now that we have a firmer grasp of what's needed."

"Oh, that's fine, you guys can go work on the next project we've got a crack team that can wrap this up."

"...Ok, thanks for the money."

The crack team never makes that intermediate representation and just creates 1-to-n mappings between each format. The explosion in code size becomes unmaintainable, most of the mappings are the result of copy/paste, and bugs proliferate throughout because, while a bug gets fixed in one section, no one realizes how many other places that same bug resides in.

EDIT: For the record, [0] started off short enough to be a footnote then grew to be too long for it, and I forgot to edit it properly when I came back from getting a glass of water.


I think I have to disagree with Naur on this, in that people using the Scientific Method don't ship their theories, but we do.

As a scientist who has just succeeded in testing a hypothesis, I now need to go back and document a simplified series of steps that should lead any independent party to the same phenomenon. Once we are on the same page, they can confirm or refute my theory based on their own perspectives on the problem space.

During that process I may discover that I based half of my experiment on another hypothesis that I never tested, or was plain wrong. Now I've discovered my 'load bearing' assumptions. I may discover something even more interesting there, or I may slink away having never told anybody about my mistake.

Essentially, scientists still 'build one to throw away'; we developers haven't in ages. And my read on Brooks's insistence that we build one to throw away is that it was aspirational, not descriptive. And notably, he apparently recants in the 20th anniversary edition (which is itself 25 years old now):

> "This I now perceived to be wrong, not because it is too radical, but because it is too simplistic. The biggest mistake in the 'Build one to throw away' concept is that it implicitly assumes the classical sequential or waterfall model of software construction."

So we are very much at odds with the scientific method. And we have the benefit of hindsight. We have seen the horrors that can occur when you take the word Theory out of context and try to apply it to non-scientific theories. We should learn from the mistakes of others and summarily reject any plan where we do it too.

In other words: next metaphor, please, and with all due haste.


I think I have to disagree with you on this one, I've used the scientific method (though not in an explicit checkbox-y way) plenty of times to ship and debug code.

In particular, as I've said on this forum many times, I work primarily on the maintenance end of software. I don't know what the creators or previous developers were thinking, especially with more recent projects (documentation quality has really gone downhill; people call autogenerated UML diagrams "design docs", but without commentary they only reflect the state of the system, not its design). I have to try different changes based on my understanding of the system and see the consequences. That is, I form a hypothesis about what will happen if I do X, I do it, I collect the results, and I've either confirmed my hypothesis, refuted it, or left it in an indeterminate state. I form another and repeat. Over time I build up a model (theory) of how the system behaves and should be updated/extended. Since I can't keep tens of thousands of lines of code in my head, let alone hundreds of thousands or millions, I always only have a model (theory), because I never have the totality of it in my mind. Though good code, with good use of modules, makes it easier to keep large chunks in mind, I still have to have a model of how those modules work and work together.

Hell, this is half (or more) of testing for older software systems. You put in some input and see if you get the output you expected. If you don't, you evaluate why (is my model wrong or is the system wrong) and repeat.
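That probe-and-check loop can be sketched in miniature. Here `legacy_discount` is a hypothetical stand-in for some inherited routine whose rules nobody documented; the asserts are the hypotheses, and the boundary values are the experiments:

```python
def legacy_discount(order_total):
    """Hypothetical stand-in for an inherited, undocumented pricing rule."""
    if order_total >= 100:
        return order_total * 0.9
    return order_total

# Hypothesis: totals of 100 or more get 10% off. Probe the boundary:
assert legacy_discount(99) == 99      # no discount just below the boundary
assert legacy_discount(100) == 90.0   # confirms the boundary is inclusive
```

Each confirmed or refuted assert refines the model of the system; collected together as a test file, they become the "theory" made checkable.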


I don't mean 'use' as in a #7 torx wrench. I mean 'use' as in air.

I have shipped bug fixes using organized hypothesis checking as well. Especially sanity checks (make sure the instruments are working). But it is not the software developer's default behavior, and I'm sure you've lamented it just as I have. You and I are tourists, and many around us aren't even that. So when we speak of whether 'we' apply formal rigor to our work? Is it still rigor when there is no discipline? I don't think rigor is something you do on a random Thursday. It's something you do all the time.

So no, 'we' do not use the scientific method. We dabble.

And so when someone like Naur tries to summarize software with a line about theory building, he's not speaking about everybody. If he were honest he might not even be speaking accurately about himself.

ETA: But he's talking about the long arc, not a single bug fix. That we are circling in on what the actual problem is and feeling it out with code. But since we stop at "if it ain't broke don't fix it", we never actually crystallize the thing we built. We never test the hypothesis we suppose we have created. We have spot-checked this organic thing that never gets pinned down and might actually be DOA. We hope the evidence we are wrong is just 'glitches' or problems with the user's machine. Until someone comes to us with a counter-proof that shows unequivocally that we were wrong.

Which leads to problems like those mentioned in this comment tree.


Insofar as "what happens when I do this?" goes, it neglects the null hypothesis. If you don't pay attention to falsifiability can you claim to be doing science?


I use the scientific process constantly while shipping code (most of my time is spent writing fixes for large production systems that are being actively used, where a regression could cost millions of dollars). In particular, I explicitly state my hypotheses, and use positive and negative control experiments when evaluating my fix.

I often "build one to throw away", but half the time what I build is good enough that it goes into production and lasts for a while.


I've had that experience describing the plot of a novel to friends of mine. The novel is pretty complex and covers ideas in crime, internet anonymity, memetics/virality, and then some technical things with vehicles and atmospheric science. Some friends I've talked it through with, they come up with ideas to add stuff and it's like "that makes no sense for this novel, but it's not a dumb idea given how little of the theory of the novel I've communicated to you." Whereas other friends seem to immediately grasp onto the theory and make suggestions that actually fit with the overall concept very well, and usually recommend small changes rather than wholesale plot rearchitectures. It's like an architect coming in and saying "we should move this door two inches so that this door doesn't bang this wall" vs. "We should build this whole house as one story into the side of a cliff."


>...or not care, but all of those things are slow and error-prone

Nah. Not caring is pretty quick and simple. It has served me well!

Seriously I do agree though. One mistake I've seen a lot is assuming that an extensive code base, developed by competent engineers, but which is very complex needs simplifying or rewriting in a simpler way.

Often that complexity is there for a reason, covering platform, customer or situation specific edge cases discovered through hard won experience and feedback from production use.

Twice I've worked at companies where a massive project to replace the core product with a new clean-sheet implementation killed the business. That doesn't mean clean-sheet implementations are always bad, not at all, but they can look like a nice, clean, beautiful opportunity while actually being blood-curdlingly risky.


> One mistake I've seen a lot is assuming that an extensive code base, developed by competent engineers, but which is very complex needs simplifying or rewriting in a simpler way.

Omg, this yes. I've made this mistake countless times. I've done my share of rewrites or refactorings that ended poorly. Work long enough on a big project, and junior engineers will do it to your code too. Being on that end of it is a very frustrating experience.

But let's now balance that against the assumption at the other end of the spectrum: that an extensive code base, developed by competent engineers doesn't need simplifying or rewriting in a simpler way. As it turns out, this too, is a flawed assumption.

The more code I've seen, the more I've seen that most codebases that have survived any length of time are a mix of both, and it's hard to figure out what is what until you get some deep experience with it yourself. If there's been turnover in the team and the code has been under heavy churn, it's probably a mix of everything!


Thanks for sharing that, I'll read it!

I do agree that comments, documentation, and other artifacts aren't sufficient to solve this problem. The closest I've come is with formal specification, where the intent of the program can be communicated very clearly. Multiple approaches are needed. One of those approaches is continuity: Keeping people around who are familiar with the code base and can pass on this knowledge to others.


So many times this.

"Clear code shouldn't need comments" - clear code can make it easy to see what but it can never say why. Let me know what corner cases you thought about when you wrote this.

"The comments are in the commit messages" - almost nobody ever goes looking for them there, they're effectively invisible from `git blame` when they remove lines, people rarely make fine grained enough commits to be able to target specific lines or blocks sufficiently with context.

"Nobody ever updates comments, so they're always out of date" - don't hire such people. It is an crucial task resolving the meaning of comments to make sure everything still makes cohesive sense. Neglecting to do this will often lead to commits that don't quite grok any subtleties of the original design. Don't make the reader of the code do the job of trying to piece together the scattered history of 5 different people's intentions. Of course, it's also useful to try to keep comments as close to the code in question as possible so that references which need updating are obvious to see.


> almost nobody ever goes looking for them there

I've seen this claim a number of times and it's always so odd to me. One of my most common activities each day - certainly more common than the activity of writing new code - is reading the commit history for different files. It's always surprising to me to hear that this is an uncommon thing to do.

Edit to add: But I also think comments and documentation of all kinds are good. I don't advocate good commit messages instead of comments, but rather in addition to comments. The more documentation the better.


Commit messages are much harder to get to for a given line of code than a comment would be.




I actually find file and directory level history useful more often than the line-level annotate/blame history that seems to get all the attention. I do find line level history useful as well, I'm just saying less often.

At the directory level, I look at the history of commit messages to get a sense for how the module(s) under that directory have evolved over time. At the file level, I look at the history of commit messages and often dig into the associated diff in order to see how the API and/or implementation have evolved. I'll often follow the file-level diff to the full commit to see what the entire change entailed. This gives me tons of context on the what and the why of a codebase's evolution, and it's all aided by good commit messages.

Line level blame/annotate is sometimes useful when I'm trying to trace how some particular implementation came to be, but it is easy to lose the thread across deletions and renames, which I find file and directory level history to be much more resilient to.


Which tool do you use to view commit messages and revisions for each file? I think one of the reasons this is uncommon is a lack of tooling (or widespread knowledge of it). I'd really like to be able to easily see all the previous commits that affected a specific line while I'm editing code. But I usually have to resort to interacting with git, rather than having something pop up on my screen (I use PyCharm and vim regularly).


Not the person you were replying to, but in PyCharm:

- right click on the line number, click Annotate; this gives you the commit date and author in the gutter

- hover over the date/author name; this gives you the commit hash and message

- click on the hash itself in the popover; this shows the git commit graph on the Version Control tab

- right click on the date/author name, click Annotate Revision; this opens up the version committed then, with its git blame in the gutter.


That's nice, and setting the options to detect movements really improves it. It'd be nice to see the commit message, rather than the author, without needing to hover, though.


I use gitlens for vscode


I use tig, which is a terminal UI. The "blame" view has a shortcut to jump to the commit before the one that changed a given line. This makes it easy to quickly dig backwards through the history of a file.


Visual Studio has this. It's called "Annotate".


Agreed. I think the "nobody ever goes looking for them" case is common for code bases where people are not writing good useful commit messages.


The only time I place comments is exactly this: to explain why.

Today I had just such an example. I placed a little sleep in a loop, but there is absolutely no way to know from the code why it is there. So I added a comment explaining that the loop would otherwise DoS a server by constantly requesting it, and the sleep reduces the load on that server.
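A minimal sketch of that kind of why-comment (the names `poll_status` and `get_status` are illustrative, not from the original code):

```python
import time

def poll_status(get_status, poll_interval_s=0.5, max_attempts=20):
    """Poll a status callback until it reports 'done' or we give up."""
    for _ in range(max_attempts):
        if get_status() == "done":
            return True
        # Why, not what: without this sleep the loop hammers the server
        # with back-to-back requests, effectively DoS-ing it. The delay
        # caps the load at a few requests per second.
        time.sleep(poll_interval_s)
    return False
```

The `time.sleep` line is obvious in what it does; only the comment records why removing it would be a production incident.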

Those comments are not only for others but also for yourself. Even weeks from now it is easy to lose track of why you did things the way you did.

Comments on why are very helpful.


That's exactly what I do.

Why is a lot more important than what.

Here's an example I use (Verbatim from here[0]):

-

Why Vs. What

I’ve come to realize that the most important inline documentation concerns WHY we are doing something; not WHAT we are doing. For example, no one wants to read “// Set the value of b to 3,” for a line of code that looks like let b = 3. That’s just dumb.

let b = 3 // Set the value of b to 3

However, they may want to know “// Set the value of b to the number of iterations we'll be making.”

let b = 3 // Set the value of b to the number of iterations we'll be making

-

[0] https://medium.com/chrismarshallny/leaving-a-legacy-1c2ddb0c...


Or perhaps even better:

let iterations = 3 // We need to iterate 3 times for the value to stabilize


Yes, I think your example is better because "// Set the value of b to the number of iterations we'll be making" is almost the same as "// Set the value of b to 3" when the code explains that `b` is used for the iterations.


I actually take it forward a bit more in that article, to where clear naming eliminates the need for a comment at all:

-

With a good name, we could probably do away with the comment entirely:

let numberOfIterations = 3

-

It’s a fairly exhaustive article (but kind of a long read).

Documentation is an important topic.

https://medium.com/chrismarshallny/leaving-a-legacy-1c2ddb0c...


Yeah, but the phrase "only time" somewhat suggests you use "why" as an excuse to comment rarely. You can nonetheless write such a comment for essentially every line.

My job description is not "developer" at the moment, so when I was asked to comment my code in order to turn it over to the developers, I looked for some standards. The document I found said, more or less, that you should write comments such that if the code was removed, someone could use the comments to completely reconstruct it.

An obvious problem is that you can write about the "why" of anything on a micro or macro level or in between.


> that you should write comments such that if the code was removed, someone could use the comments to completely reconstruct it.

I think this is an "archaic" concept in line with heavily structured methodologies, where everything is first documented in meticulous detail and then the code will map to documentation 1:1. Anyone heard of/still remembers SSADM?

Anyway, that approach has turned out to be less than practical, to say the least, and was one of the reasons the agile methodologies were invented in the first place.

The point today is that the documentation should supplement the code to make the other developers/readers of the code aware of the hidden intricacies of the code. Lately, we are focusing on the "why" because it is turning out to be the most useful. As you say, there are still questions on the level of detail of the "why" we need to capture. Mostly, people just go with their gut feeling and that's fine, as long as we take care to calibrate the feeling through feedback and improve it with time.

Maybe there could be a formal way of knowing what needs more clarification but I doubt it. We still lack a way of mapping our human level understanding of the system to a formal machine analyzable system, so a computer analyzing our program cannot know what step in the program would be surprising to a human reader.


> you should write comments such that if the code was removed, someone could use the comments to completely reconstruct it.

I understand that this is just a rule of thumb, but it's so far from anything I could expect to happen in reality that it serves as no justification at all. A codebase is a living entity that grows and changes over time. Without sound justification that buttresses both when and when not to comment, advice like this can lead to exactly the brittle comments that disillusion people from commenting as a whole.

Comments are just a part of a healthy breakfast. You want the code to be as clear as possible, both in the small (algorithmically) and in the large (architecturally). When the code must necessarily fall short, comments must fill that gap -- and only that gap. (Other forms of documentation serve other needs.)

It's like unit tests and integration tests: you want as many unit tests as possible, to give you assurance that the pieces from which you assemble your system are correct. Where unit tests cannot speak, other kinds of tests fill the gaps. But if you try to build your test suite out of integration tests, you'll end up pretty miserable.


> advice like this can lead to exactly the brittle comments that disillusion people from commenting as a whole

As I turn this over in my head, it doesn't sound that convincing because the whole "code should be self-explanatory" ethos seems to me just as susceptible to encouraging bad behavior. Saying something is clear allows you to elevate yourself and blame others if they don't follow. Expecting people to judge their own communication is a definite conflict of interest. Exaggerated commenting requirements as I described are at least a reminder that you should try to err on the other side.

Also, for context, the agency I work for is responsible for an accounting system that affects a lot of people and is very much not a "move fast and break things" place. The most exciting thing that happens year after year is reducing the scheduled downtime window(s). Even that obviously must have diminishing returns.

The other thing is that in my particular case, the code I wrote is not directly applicable to the accounting system and is written in a different language, so anyone trying to modify it will probably be relatively inexperienced and there is zero chance of anyone being hired to work on it.


> the whole "code should be self-explanatory" ethos seems to me just as susceptible to encouraging bad behavior.

I hope it's clear from what I wrote that I don't think code can be perfectly self-explanatory. I don't think it's all-or-nothing in either direction. I do, however, think that direct clarity is the first line of defense, followed by comments and other documentation artifacts to capture what the code cannot.

> Expecting people to judge their own communication is a definite conflict of interest.

I don't disagree: while I think it's possible to train your communications so that you meet a minimum bar by default, that training necessarily comes by testing your communication against others. We have peer review processes in part to help account for this: if something isn't clear, that's the first forum of opportunity to address it.

The (only) good thing about pithy quotes like "you should be able to rewrite it from the comments" and "code should be self-explanatory" is that they stake out concrete, extreme positions that can be judged on their merits. The space of practice is much wider than the ideologically pure boundary.

I'm sure that, for your situation, commenting everything such that it could be rewritten from scratch makes sense. If you needed to pass ownership of a codebase more or less instantaneously, it may make sense as a kind of snapshot of the intent behind the codebase at that point in time.

I also work with some critical, ancient software. I wish I had more documentation of all kinds, but mostly because it would help me decipher the codebase itself. I wish the codebase didn't need the degree of deciphering that it does. It makes even minor changes take longer just to make sure I'm not breaking something else along the way -- to say nothing of major changes, which while relatively rare do come along. Again: my biggest need is to decipher the codebase itself, and I would rather it inherently require less effort to do so. There are straightforward principles, like "avoid globals", that are violated left and right in this software by no real necessity. Commenting these aspects is simply a band-aid on something that should have been engineered better from the beginning.


"...an excuse to comment rarely"

Well you can also look at it from another angle: unnecessary comments are a distraction and make code less readable.

You are not writing comments just to comment. You write them to make things clear.

Sometimes I also write comments to explain what, but most of the time those are written in large pieces of logic with multiple steps that cannot (or should not) be split into multiple functions. But even then those comments tend to take the form of 'why'.

For example: "// First we do this // Then we do this because.. // And as last step we do this because..."

My personal rule for comments:

If I come back in half a year, would I still understand what is going on here? No? -> Comment!


> people rarely make fine grained enough commits to be able to target specific lines or blocks

Take your own advice then: don't hire such people. Why would you insist on maintaining documentation and not maintaining a reasonable change log in your commits, if knowing "why" is important to you?

(I make the advice tongue-in-cheek btw. Better to educate and lift developers up wherever you can than just freeze them out. Not everyone has the experience with these things that we do.)

There's not a one-size-fits-all solution here. Different orgs handle their code history different ways, but having tasted the power Git gives you over certain other VCSes to make the commit log really useful as a record of change, and having taken advantage of that feature several times myself, I wouldn't want to go back.

This follows another comment I just made on the topic: https://news.ycombinator.com/item?id=23743973


> "Clear code shouldn't need comments"

That's like saying articles don't need a summary.


Agreed; or like saying "You don't need a map; you can clearly see where this road leads if you follow it far enough."


A lot of comments can (and should) be replaced with good naming. Sometimes the only reason to introduce a variable or function is to name something. The advantage is that it is harder to not read a name than it is to not read a comment, and it is more obviously a bug when we try to set a variable, or make a function calculate, something not agreeing with its name.
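A hypothetical sketch of that point: the comment-carried fact becomes a named function, so it is repeated (and checked) at every call site rather than trusted once.

```python
from collections import namedtuple

# Illustrative types, not from any particular codebase.
User = namedtuple("User", ["id", "role"])
Doc = namedtuple("Doc", ["owner_id"])

# Instead of:
#   if user.role == "admin" or doc.owner_id == user.id:  # admins or owners may edit
# name the condition so the code says what the comment would:
def can_edit(user, doc):
    is_admin = user.role == "admin"
    is_owner = doc.owner_id == user.id
    return is_admin or is_owner
```

If someone later makes `can_edit` return something that disagrees with its name, the bug is visible at a glance in a way a stale comment never is.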


> Let me know what corner cases you thought about when you wrote this.

We actually rarely use comments in code. The best way to capture the corner cases is tests: they are the comments, codified.

> but it can never say why

Usually people don't need to know the why for a small piece of code unless it's high-level stuff. For high-level stuff we have architecture decision records (ADRs), which record the motivations and circumstances. But we also keep them very short, or people would never read them.

> "Nobody ever updates comments, so they're always out of date" - don't hire such people.

That's a really high standard. It's almost like "We have bugs" / "Well, then don't hire people who write bugs"...

There are many ways of breaking comments. For example, someone changes module A, and the properties of module B change because of A. If it's a test, the breakage shows up in CI; but if it's a comment, there's almost no way for people to notice.
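The "tests as codified comments" idea above can be sketched like this; `parse_port` is a hypothetical helper, and the test name carries the why that a comment would otherwise hold:

```python
def parse_port(value, default=8080):
    """Hypothetical helper: parse a port number from config text."""
    text = value.strip()
    if not text:
        return default
    return int(text)

def test_blank_value_falls_back_to_default():
    # Corner case from production: a config file with an empty `port =`
    # line used to crash startup. The test name records the why, and CI
    # re-verifies it on every change -- unlike a comment.
    assert parse_port("   ") == 8080
```

Unlike a prose comment, this record of the corner case starts failing the moment a change to `parse_port` breaks it.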


Clear code needs some comments. Not many, though. If you write a numerical algorithm it probably needs an explanation or a reference to where it came from or something like that. Generally, one needs a comment when it is difficult to figure out why something was written that way. In many cases comments are not needed. I perhaps write a comment once every ten files or maybe even less but then it tends to be a rather long comment because the thing explained is very non-obvious. The kind of comment that I hate seeing is when somebody feels the need to tell me what methods are constructors or similar such nonsense.


> "The comments are in the commit messages" - almost nobody ever goes looking for them there

Consider it another tool in the toolbox. I've gone spelunking through git history to decipher the reasoning of something still in use, though comments had been deleted and no one had documented it before me.


This works iff there are good commit messages. In my experience, people who don't document their code tend not to bother with good commit messages either.


Clear code and clear tests show both the what and the why, with the additional advantage that they can't silently diverge and fall out of sync the way comments can, because the tests would fail. Comments should be used to explain something unexpected. Commenting each line of code is a recipe for disaster.


I've found that questions during a code review make excellent fodder for comments. I.e. if someone has a question (not necessarily an issue) about a piece of code, it's a good bet someone might have a similar question when reading the code later.


yes, comments should be considered code too, but for humans, so later coders (including your later self) can effectively simulate (though not necessarily replicate) the prior code-writing process. they're as critical as the code for machines.

which is to say, the comments should be kept concise and current too.


If I had a nickel for every programmer who thought their code was so good it didn't require comments... or thinks somehow that unit tests make up for comments... only to come back years later and have no idea why the logic is working how it is.


This was a thing at my last job. The lack of comments always made me cringe. While not every line needs a comment, there's a reason why comments would be useful, especially in security software.

I learned a lot of bad habits there and I'm glad I no longer work there.

Their excuses were literally "no one reads comments" "no one keeps comments up to date" "my code is self documenting" etc.


The self documenting one always gets me. It's not like the person has never read code that is difficult to understand.

Yet they think it is just other people who write 'bad' code. Their own code can't possibly be bad. In fact it is so good that it 'documents itself'. It's just a statement that drips with arrogance.


Ya, but with a security product there are important considerations and while we commented on areas where we fixed a bug due to something fixable, it was a pain in general.

Decisions maybe don’t belong in code but with a security product I feel there has to be some quality level of comments to explain why and how. Someone coming along later can’t be expected to be in the authors head and the author won’t remember all this stuff years later when it might matter or need to be rewritten.

Such a shit show


It seems to be fairly common for experienced programmers to point to unit test code as a way to explore or understand an open source software project. I don't doubt that this works for them, but it definitely doesn't work for me. When I'm trying to explore a new software project, the first thing I want to do is find the relevant "entry point," which is arguably the exact opposite end from the unit test code.


Tests that target the surfaces of the project (public APIs, endpoints, etc) and code examples start to blend together at some point. "Here's a basic example that should do X" sounds a lot like "Do something basic and assert it does X".


One thing that I like about Rust is that they actually allow you to merge those two into one. In your documentation of anything, you can include code snippets, and those code snippets become tests that can ensure that the documented behavior still applies to the current code.


Yes, this is how I use them as well. I’ll go to the unit tests after I have a decent understanding of the project and am looking for examples that aren’t covered in the README.


I think the biggest problem with using unit tests for this purpose is that unit tests tend to (rightfully) spend a disproportionate amount of lines of code on edge cases.


Where applicable, this is what doctests are for, and they're magical. ... where applicable.


Clear code and clear tests absolutely don't need comments that explain them if they are really clear, at least for me. Comments are extremely useful to explain something unexpected. Commit messages are too limited to properly explain a use case, but linking to a Jira ticket with the proper explanation does the trick.

From the tests you can see both the typical use cases and the correct way of using some piece of code, and have a guarantee that the code respects the specification. With comments you have none of these guarantees. I had the opposite experience from you, apparently: the people I worked with who used excessive comments were not up to par with the rest of the team. It may also be related to the mental model; comments just break my flow and make it more difficult for me to read the code.


Right, you are exactly the kind of person I'm talking about. You think your code is clear. It is not. The logic your code performs is the distilled by-product of higher-level reasoning. Reasoning that should be the basis of your comments.

People reading your code will not be in the same mind state that you were in when you wrote it. That is what the comments are meant to assist with.

If you write comments for anyone, write comments for yourself first. You will not remember, years from now, the reasoning that went into the code you wrote.


If someone's code isn't self explanatory from the naming and organization, why would you expect their comments to do a better job?

The vast majority of comments I see are completely useless, either giving incorrect/outdated information or restating exactly what the code says. I've even made comments nearly the same color as the background in my text editor so it's easier for my brain to skip them.

Comments certainly can be useful, but I think they should be used extremely sparingly.


I don't even mind if the comments restate what the code says. The code only says what it does, not what it is intended to do.


The tests say what the code is supposed to do. And, unlike the comments, they cannot be out of sync with the code because they will fail.


Ok, so for example, let's say I have a function npow2(x) that returns the next power of two above x, unless x is a power of 2 already, in which case it returns x. And I want to use this to get the next power of 2 below x.

I can write:

y = npow2(x+1)/2; //get y such that the next power of 2 above y is at least x.

The comment says exactly what the code does, but at least it also says that that's what it's intended to do. This lets a reviewer check correctness, add optimizations, and also understand the intentions at a glance. The why is important too, of course.

Now I could alternatively add unit tests that specify a large number of input cases in which the output happens to be the previous power of 2. But unless I make the test cases exhaustive, I'm only hinting at what this code should do, not defining or explaining it.

I could break this out into a function too, called prevpow2(x), which might or might not be appropriate in this case. But if so, I'd expect that function to have a docstring explaining... exactly the same thing as the above inline comment: what the function supposedly does.
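To make the example concrete, here's a runnable Python sketch (npow2 implemented to behave as described above, and integer division standing in for the original `/2`):

```python
def npow2(x):
    """Return the next power of 2 above x, or x itself if x is already a power of 2."""
    p = 1
    while p < x:
        p *= 2
    return p

# The trick from the comment above: derive a "previous power of 2" from npow2.
x = 7
y = npow2(x + 1) // 2  # y is the largest power of 2 that is <= x

assert npow2(8) == 8   # already a power of 2: returned unchanged
assert npow2(9) == 16  # otherwise: next power of 2 above
assert y == 4          # largest power of 2 <= 7
```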


Adding a comment there is not useful, it should be added only in the function definition if needed. In this case it’s needed because the function does something unexpected. I would call it nextPower2(n) rather than npow2 and add a comment in the definition saying that the function returns n if n is a power of 2 because it’s unexpected and it can’t be inferred looking at the function name. As I said before, comments for unexpected behaviours are not only perfectly fine but extremely important. Also removing all the useless comments will increase the relative importance of the ones remaining since you will have only comments for really important stuff.

Edit: And anyway in the tests you should have at least these two test cases:

Test nextPower2 returns n if it’s a power of 2

Test nextPower2 returns the next power of 2 if n is not a power of 2
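Spelled out in Python's unittest, those two cases might look like this (nextPower2 is implemented here only so the sketch is self-contained):

```python
import unittest

def nextPower2(n):
    """Smallest power of 2 >= n; returns n itself if n is already a power of 2."""
    p = 1
    while p < n:
        p *= 2
    return p

class TestNextPower2(unittest.TestCase):
    def test_returns_n_if_power_of_2(self):
        self.assertEqual(nextPower2(8), 8)

    def test_returns_next_power_of_2_otherwise(self):
        self.assertEqual(nextPower2(9), 16)

if __name__ == "__main__":
    unittest.main()
```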


The comments are informing the reader about the trick that turns npow2 into prevpow2. Your reply seems focused on npow2 itself, not the line npow2(x+1)/2 that I said deserves a comment. I agree that npow2 should have the tests and documentation that you say, but in my example I'm taking that function as given, eg. by a library.


It's a matter of judgement. I've read code in languages I don't know, even, where I could immediately figure out what was going on without reading any comments. I've also patched libraries in ten minutes without reading comments. It's certainly possible to write code that doesn't need comments, though perhaps, we're not the best people to judge when that's the case or not for our own code.


Same here. Though if you are putting thought into the code you are writing (the logic, the structure, etc.), then those thoughts should probably be written down as comments as well.


Self-documenting code is not a judgment that you can pass on your own code: that's the role of code review.

95% of the code you write day to day should be clear enough that someone who didn't write it can understand it on first read (the reviewer). If they can't, you probably need to rewrite the code so that it is clearer, not comment it. That's what self-documenting code is.

Instances where the code needs comments to be understandable are rare.


I agree with most of the points made here, though I think some of the bias toward up-front exhaustive documentation is probably not a good fit for most of the projects I've been a part of. Prototyping often reveals necessary changes due to resource constraints, or to unconsidered corner cases. Documentation needs to be a living thing as much as the code, and I think that pushes you toward documenting within the code more than externally.

One of the more important points the author brings up is that authorial intent, the "why" of comments, matters most. A corollary I'll add is that the "what"s should be encoded in tests. Tests can be great documentation, and they have the added benefit of informing developers when the goals of the software are being violated (when they fail).

What has worked for me is conceiving of documentation this way:

- Design Documents: Historical use only, not to be updated.

- Readme: intro to project; why it exists, overview of how it's meant to function, how to edit, etc. Tends to be updated when big things change.

- Code comments: why something exists, what considerations were made in that code's creation

- Test descriptions and comments: binding goals of previous development to future development

This approach has done a pretty good job of keeping documentation from getting too out-of-sync with code while enforcing basic business objectives, still tilting the balance toward development rather than documentation.


This is quite good actually.

I would add that some elements of design are worth keeping up, like a general architectural overview and the details of some things, like state-machines or specific kinds of statefulness.

It can be done in the comments, at the package level, that way developers can keep it up to date without much fuss.


A big issue with documenting what the code does is that the code and documentation can very quickly fall out of sync. As this post says, it's much more useful to document the intent of the code, or why there's this mess of seemingly hacky code (see issues #80681, #82108, #66065).

Also be wary of unit tests that are overly tied to the specifics of an implementation. These can be worse than useless when it comes to changing code. I.e. asking "why are my tests failing?" and finding out it's only because I breathed near the code.


System tests too!

The last code-base I worked on had do-everything system tests (with unexpectedly good coverage, I'll admit). They were so slow and passed often enough that I didn't immediately spot flakiness.

I got suspicious when the tests started failing regularly when I added new unrelated code.


A good way of avoiding this trap is to require updated comments as a condition of merging a pull request.


I’ve had similar arguments here once or twice. There’s so much context that isn’t deducible from code.

You rarely need to document the “how” (that much should be evident if the code is well-written) but you absolutely should document the “why” (or, often as important, the “why not”: what code could be here but isn’t).


I agree that you shouldn't document "how", but when I'm reading unfamiliar code, I find what I miss is "what", not "why".

To my mind, in well-written code each function should be documenting its contract: what it assumes, what it guarantees if that assumption holds.

(And if it turns out that what you'd write is just the function's name and its parameter and return types with a few grammatical particles added, maybe it's OK to omit the documentation.)

Then if you find that in order to do that you have to write a little essay, or you need terminology that doesn't correspond to a named thing in the codebase, or you're repeating yourself in multiple comments, that tells you something you need to put in higher-level documentation.
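A sketch of what such a contract comment might look like in Python (the function and its names are invented purely for illustration):

```python
def reserve_stock(inventory, item, quantity):
    """Reserve `quantity` units of `item`.

    Assumes: quantity > 0 and `item` is a key in `inventory`.
    Guarantees: on return, inventory[item] has decreased by exactly
    `quantity`; on error, a ValueError is raised and inventory is unchanged.
    """
    if inventory[item] < quantity:
        raise ValueError("insufficient stock")
    inventory[item] -= quantity
```

The docstring states what callers may rely on without reading the body, which is exactly the "what" the parent comment is asking for.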


Yes. Floyd-Hoare logic is worth learning about in this sense — a formal system for imperative languages where the rules essentially say "given preconditions X and code Y, if X is true before you run Y, we guarantee Z to be true afterwards". I've never ever ever proven code correct using this formalism, but that way of thinking permeates my every action as a programmer.
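That {X} Y {Z} way of thinking can be approximated informally with plain assertions: precondition X checked on entry, postcondition Z checked before returning. A toy Python sketch (the function is invented for illustration):

```python
def integer_sqrt(n):
    # Precondition X: n is a non-negative integer.
    assert n >= 0
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    # Postcondition Z: r is the largest integer with r*r <= n.
    assert r * r <= n < (r + 1) * (r + 1)
    return r
```

The asserts don't prove anything, but they make the implicit triple executable and document the contract at the same time.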


Whether or not to document "how" will depend on the target audience.

The code is written for people to read and only incidentally for computers to execute. From that point of view the target audience will matter a lot. I agree that in most cases, you expect the audience to be on level with the code and if you are writing code that is not on level with the audience (for example it is way too advanced) then you are doing something wrong (at the very least not considering who will be working with it).

But not always.

Example: I work with a lot of bright quants. With regards to code that does a lot of complicated calculations, they typically understand "why" but might need help with understanding "how".

Another example: when your audience is a mostly junior team you might want to explain "how". My current project uses more and more reactive constructs, and oftentimes I throw in an explanation of "how" when, for example, I get questions during code reviews.


Give me an example, and I will write you code documenting the business context.


And then what? You throw away the example specification so that the why only continues on in the code?

And after 10 years of maintenance, the code has drifted with each iteration so that the why is no longer clear in the code.


I think the most reliable thing to see what quirks are load bearing is tests, particularly regression tests. You change something, you break expected behavior, you fix it and you write a test. Now the next person may wonder if some quirk is load bearing, but they'll know for sure when running the regression tests.
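A concrete sketch of that workflow in Python (the quirk and function are invented for illustration): once a quirk turns out to be load-bearing, a regression test pins it down for the next person:

```python
def normalize_path(p):
    """Normalize separators. Quirk: the trailing slash is load-bearing;
    downstream code uses it to distinguish directories from files."""
    return p.replace("\\", "/")

def test_trailing_slash_preserved():
    # Regression test, added after a refactor stripped the slash and broke callers.
    assert normalize_path("dir\\sub\\") == "dir/sub/"

test_trailing_slash_preserved()
```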

Additionally, I'd say naming things, though one of the hardest things we have to do, can go a long way towards explaining the "why". Some programmers I've worked with have a knack for knowing just when to break a giant line into separate lines, giving local variables great expressive names, and all of a sudden the code reads a million times better.

All that being said - I agree with most of the points of the article and do push my teams to do a lot of upfront writing down of designs. These things tend to go stale, but in the moment they're a great tool for fleshing out ideas and sparking discussions.


I comment my code[0].

I don't particularly care what people think about it.

I will say that I have turned over a lot of code, over the years, and virtually never get asked about what it does. When people ask me about my code, I generally tell them where to look, and contact me if they need explanations.

I don't get contacted, so I guess they could figure it out.

I also tend to write a lot of supporting documentation.

We do have to be careful, though. Documentation can easily become "concrete galoshes"[1], so things like header/auto documents are pretty important.

[0] https://medium.com/chrismarshallny/leaving-a-legacy-1c2ddb0c...

[1] https://medium.com/chrismarshallny/concrete-galoshes-a5798a5...


My pathway into software development was through electrical engineering and embedded systems, so I don't know if this applies to other ways into software development as well. But what really stood out to me in the beginning was how useless code comments were. I would almost always see code like this:

  x = 1;      // assign 1 to x
  y = x * 2;  // multiply x by 2
I don't know if it was because they thought electrical engineers needed everything about code explained to them, or if it was because all the teaching material used this style and people just copied it. But I never understood why you would add comments like this; I had to do so anyway, otherwise I would not pass my exams.

It took me a while to learn that comments are the tool in which you can express your expectation of what the code should do.


Those samples were probably written by someone who had learned how to code in assembly language. In assembly, this kind of comment reminds the reader of the semantic meaning of what is in each CPU register. It could be very useful. Then the person learns C and never drops this habit.


It gives me no end of pain that "comments are lies because they aren't code" is a fad that we're currently suffering through as an industry. For decades the prevailing wisdom was that comments were a net benefit, and only in the last few years has this trend become prevalent. How much perfectly good code is going to have to be rewritten from scratch in 10 years because no one remembers what it does?


If no one understands what it does, it's not perfectly good code is it? Of course there are rare cases where code cannot be simplified, made more readable or self explanatory and in those cases comments are vital. But the aim should be for the vast majority of code to be easily readable by humans.


Essential versus accidental complexity.

Perfectly good code can be unclear because of the accidental complexity included within it. Memory management, error handling (especially in languages with less expressive type systems), configuring hardware/database/network connections, etc. Those things are important, but they prevent the essential portion of the program from being expressed on its own.

Type systems, a brief example: C versus Ada. Implement a network protocol where the data packet has specific n-bit sized fields with ranges less than the maximum for that size. You can easily do this in both languages. But in C, you'd either need to add bounds checking to all of those fields or risk letting errors propagate. That error handling obscures the essential portion of the program. In Ada, you make a type that is n-bits and only accepts values of the correct range. The errors can still exist in received packets, but the error checking is partially elided from the code because the type system itself can catch it.

There's nothing wrong with the C code, and there's nothing wrong (many will disagree with that) with choosing C to implement the protocol. But it will increase the complexity due to factors beyond the inherent, essential complexity of the network protocol itself.
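Python has no ranged integer types either, but the Ada idea can be imitated by centralizing the bounds check in a small value type, so the check is written once rather than at every field access. A sketch (names invented; real Ada ranged types are enforced by the language, this is only an approximation):

```python
class RangedInt:
    """Value that enforces its range once, at construction (imitating an Ada ranged type)."""

    def __init__(self, value, low, high):
        if not (low <= value <= high):
            raise ValueError(f"{value} outside [{low}, {high}]")
        self.value = value

# e.g. a 4-bit protocol field whose legal values are only 0..9
version = RangedInt(7, 0, 9)
```

Code that receives a RangedInt never needs its own bounds check, which is the "elided error handling" the parent comment describes.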


I mean it's "perfectly fine" because the people building it know how it works (because they were there when it was built), and think they don't need comments to help out future maintainers.


Relevant: "Writing system software: code comments" at http://antirez.com/news/124 and the discussion at https://news.ycombinator.com/item?id=18157047


The marketing corollary is that metrics only tell you what happened, they cannot tell why. Yet somehow, entire companies have been built on the promise that they can answer the "why" by looking at the metrics.


What I’ve noticed is that code is the only medium of communication that is non-ambiguous. Designers, product people, stakeholders, etc, all use ambiguous mediums. It’s impossible to understand just how pedantic and explicit you need to be when writing code versus, say, giving your human colleagues instructions.

So it's 0% ambiguity for code, 100% ambiguity for the rest. In other words, communicating with the computer is overly pedantic, and code bears all the frustration. Can we make it more fair? 70-30, perhaps?

Rather than concentrating on mediums that help with communication (as this post mentions: design docs, TLA+, comments...), I want a new medium that allows me to share the burden of over-scrupulousness with the rest of the people on my team, not just developers.


Pseudocode, perhaps?


When I teach coding I tell the students to document their code well, but not to document what the code is doing. That should be evident from the code itself (if it's not, consider a refactor). Rather, explain why the code is doing it.


The counterexample here is the declarative style of programming. Most ideally this looks like an executable spec and is documentation itself.


Sure, but then you offload the complexity to the functions used as the declarative building blocks, so you do the documenting in a different place, though you will probably end up documenting complex declarative business logic anyway. (like why is process X that is so similar to process Y require Z different declarative blocks)


Only for simple cases. When it becomes complex enough, the same problems show up.

Example: CSS.


This article fits nicely with the recent post discussing how Linus spends the majority of his time writing emails. For projects with n+1 contributors, inter-contributor communication is just as important as what code is being written. Emails, commit messages, code comments, docs, are all just different ways to communicate.

I always think of the Underhanded C Contest[1] as my favorite example of readable code that doesn't act as expected after a quick read.

[1] http://underhanded-c.org/


If you want your cake and also the ability to consume it, you might want to consider what functional programming can do for your codebase's ability to document itself. Having type systems that are closely aligned with the abstract business model is the best way to avoid frustration when you are trying to figure out why something is the way it is.

The trick is understanding that functional vs imperative is a spectrum, and trying to force 100% on one side or the other is how you wind up killing any project. We find that keeping our business-level abstractions functional with the underlying infrastructure code imperative provides the best of both worlds. The code that is changing and analyzed most frequently is in the functional domain, whereas code that we touch maybe 1-2 times per month lives in an imperative domain (but sometimes functional wherever it makes sense here too).


I find functional code much shorter and cleaner, but also when you want to change something along a new axis you need to do a much bigger rewrite than with imperative/procedural/object-oriented code.

OO code: "here's a detailed and long-winded description of what happens that's hard to understand. Ignore 95% of it and change that 1 little detail and hope for the best"

Functional code: "here's a concise and easy to understand description of what happens, understand it fully, throw it away, and create a new, just as clear and concise description of what should happen from now on"


The huge value that I see in a formal specification language like TLA+ is that we could have a precise way of communicating the problem in a way that is agnostic to the implementation language.

Imagine something like StackOverflow, but instead of posting a question, you post a formal spec. Thinking even further, you could then find a way to combine/interface these specs and build something like a global database of computational problems.

We're currently doing this already with StackOverflow, but we're focusing on the implementations, not the problems themselves.

Please correct me if there's a mistake in this line of thought, I'd love to know.


What you're talking about there is a Model Repository. We're building one at the bank I work at, except because our modelling language (or meta model) is based on OMG's MOF we can generate artifacts (code) from our models. You can't do that with TLA+ as far as I know. It's pretty powerful - you can compose models together very easily, as well as generate loads of useful things for data-in-motion.


Hey, thanks a lot for your response! It's really hard to search for abstract ideas like this if you don't know the terminology (like Model Repository), so this is super helpful. This is a very interesting topic for me, may I ask you a few questions? I sent you a request on LinkedIn.


False. If we used only meaningless symbols like “A”, “B”, etc. for names, then it would be true. But if I have a method named “addItemToCart”, then I know what it ought to do. If it does NOT in fact add the item to the cart, then I’ve found a bug. It’s true that a short method name might not capture all of the subtleties, but usually the variable names within the method can give you an idea of what the intent of the programmer is as well. Obviously there are still things you should write comments for, but really well-thought-out names can get you surprisingly far.


About the only time I use comments are to delineate a block of code that for whatever reason can’t be a method or because I am working around some bug and I think I might be tempted in the future to remove that code as “useless”. (Think “the dispatch_main here ensures that the code runs on the next runloop iteration, which is necessary for the animation to work”.)


Code is a mixture of what and how (and with IaC, sometimes who and where).

I agree that "why" is the role of documentation. I've been experimenting with tying the two together with (machine checked, automatically surfaced) cross-references, so we can better know what bits of documentation a test supports, &c. I haven't yet gotten rigorous about it.


“works as coded”

At my last job we used to say that when asked whether our code was correct or bug-free ;). Often the devs get thrown under the bus if something doesn't work "correctly" when in reality it might perfectly pass all unit tests based on the best understanding of the problem.

Of course whether we could get any support to help define “correct” from anyone was another matter...


This fits my theory of programming, and theory of bugs - We take a problem, create a plan, and then write code that implements that plan.

Defects can come from:

   * having/being given the wrong problem

   * right problem, but plan does not actually solve it

   * right plan, but your code did not correctly implement it


Then of course there are the maddening cases of accidental correctness. ie. a bug in your implementation of the wrong plan does the right thing.


I'm looking at some ETL code from the dark past, and it uses low-level Db code to shovel in .csv files like a boss.

I'm all' "Well, I guess that code that obtains the .csv files must be rock solid."

Famous last words.

A freshly done system later, I'm left to infer that there had been some other cleansing step to which I was never privy.


The "why" can be expressed in code.

Think of "Event Storming". A great way to talk about the "why" and to grow an understanding of how a problem can be solved. The result can be multiple "flows" that describe a series of events from the first command to the expected outcome.

We have the option to directly translate a flow to code. And by keeping the flow in one place, we also keep a direct mapping from our code to the Event Storming results. The "why" will not be lost. The code can contain multiple flows, and every flow can have a scenario-like description like: "the user is able to select a product"

This is what my passion is all about. To keep the "why" in one place. This also enables better collaboration between multiple disciplines (like UX <-> DEV)

This is the idea behind scenario based programming. I am working on an open source project: https://github.com/ThomasDeutsch/flowcards

Write me a line if you would like to get involved.

Have you found other solutions for this problem?



Here is a little demo of codeBERT - https://youtu.be/oDqW1JHmaYY

codeBERT is trying to predict if a certain function and its docstring are associated or not.

I thought I'd share it, since I guess it is interesting in this context.


Ja, to me the rule of thumb is still: "Code Is The How, Comments Are The Why"


Any thoughts on applying this article to...

programming languages defined by a single portable implementation?


So.. code is what it is?



