I have the same problem (i.e. terrible short-term memory, though my long-term memory is fine), and I've picked up a number of compensating behaviors over the years. By far, documentation is my #1 go-to strategy. I document code extensively, even if I'm the only person who is ever going to read it again. People have mocked me for this, but I believe it's a superpower.
Most programmers believe a number of blatant falsehoods about documentation, with the most prevalent being "comments go out of date quickly, so there's no point in investing in them". Maybe I'm just hyper-aware of it because my short-term memory sucks, but code comments have saved me on so many occasions that they're simply not optional.
You can document your code. You can keep it up to date. It isn't that hard. You just don't want to.
I practice something I could only describe as “comment-driven development”. Whenever I need to implement something non-trivial, something that might have subtle edge cases or has to be done in a certain way because of the external dependencies, I first write the comment before writing the code. I might iterate on the comment, not unlike one might iterate on a design document, and basically use it as a mental tool to fully understand the details of what needs to be done. After I do that, I use the comment to remind me about all these details as I go through the implementation.
All this is doubly important when designing an API, where the comment (also) serves as the externally visible documentation for that API. It helps me put myself into the shoes of the caller reading that documentation without knowing the implementation details, which I believe helps me design a better API.
I hear that “comments go out of date” a lot, but in my case, if a comment doesn’t match the code, it’s usually the code that needs fixing.
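A minimal sketch of what "comment first" can look like, in Python. The function `split_into_batches` and its edge cases are invented for illustration, not taken from any real project; the point is only that the comment block was drafted and revised before the two lines of code under it existed.

```python
def split_into_batches(items, batch_size):
    # Written BEFORE the implementation, then revised a few times:
    #
    # Split `items` into consecutive batches of at most `batch_size`.
    #   - Order must be preserved (assumption: a downstream consumer
    #     processes the batches sequentially).
    #   - The last batch may be shorter; never pad it.
    #   - An empty input yields no batches at all, not one empty batch.
    #   - A batch_size below 1 is a caller bug: raise rather than
    #     return nonsense.
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

Each bullet in the comment maps onto something you can check in the code, so if the two ever disagree, the comment tells you which behavior was intended.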
"The literate programming paradigm, as conceived by Donald Knuth, represents a move away from writing computer programs in the manner and order imposed by the computer, and instead gives programmers macros to develop programs in the order demanded by the logic and flow of their thoughts.[4] Literate programs are written as an exposition of logic in more natural language in which macros are used to hide abstractions and traditional source code, more like the text of an essay."
I would go so far as to say it is literate programming. People tend to think literate programs require an extensive system to support them, such as org-mode's Emacs integration, or weird meta-programming facilities, but the truth is that a program written in any language that supports comments can be a literate program, so long as the content of the comments dominates and dictates the structure of the code -- which is exactly what GP commenter's method sounds like.
> so long as the content of the comments dominates and dictates the structure of the code
The key feature of literate programming is separating the order that code is written in from the order that the compiler requires, not comment to code ratio.
"The key feature of literate programming is separating the order that code is written in from the order that the compiler requires"... in the languages that were popular at the time Knuth wrote the definition, which were very rigid and uncompromising on layout.
We don't need specialized literate programming tools today because modern programming languages are already capable out-of-the-box of sufficient flexibility in organization that the additional marginal benefit of a superspecialized higher-layer toolset to weave pieces together is no longer worth the effort of learning and propagating out to teams. Yes, they technically can do things programming languages can't, but I could even honestly quibble on whether those things are a good idea.
This is why strict Knuth-ian "literate programming" has never taken off on a technical level and why it never will; by the time it might have gotten enough attention to really take off, programming languages had already largely incorporated the necessary additional flexibility that made it unnecessary to use specialized tools to write good code.
The human reason, of course, is that merely making documentation easier or nicer is not generally enough to get people to write it, and arguably literate programming makes it harder to write. Not because of the tooling being bad or anything, but because of the bar being raised really high to have "proper" documentation under its dictates. People can write plenty of well-documented, high-quality code in any modern language right now. They just generally don't.
You are certainly correct that older languages had more constraints on ordering, but I completely disagree that modern languages have removed all restrictions. I assume in whatever language you are imagining you have things like methods in classes or namespaces. This forces you to write them all together. The ability to separate those out is powerful.
As evidence of this, compare a well organized C codebase to whatever language that you have in mind. Does it really look that different? Literate programs on the other hand do look quite different.
Furthermore a big programming project is still just a list of files. What do you read first? What are the important concepts? There is nothing to tell you this information in a typical program. A literate program has a page 1 and you start there. You can think of it like the difference between API documentation, and a well-written tutorial. Modern code files are great API documentation, but are lacking in that other form of presentation.
Another feature modern languages don't have (and probably shouldn't) is the ability to integrate other media types into the code (images, math notation, etc). This is another feature facilitated by literate programming systems, although not the essence of it.
"I completely disagree that modern languages have removed all restrictions."
So do I. That's why I said the marginal advantage makes it not worth it, and said already that literate programming does things that conventional doesn't.
(If you don't know what a marginal advantage is, you may want to poke around on the internet a bit. It's a very valuable concept. And I see a lot of people using the word in a way that shows they don't really know what it means; it is commonly thought that it means something "small" or "insignificant". That's not what it means.)
"There is nothing to tell you this information in a typical program."
This is true, but it's not true because nobody uses strict Knuthian literate programming; it's true because nobody writes those overviews. Languages generally do have a place to put a top-level summary and organize all the rest of the documentation. I know; I use them and write them. All my major systems have an official "top level" location, and generally I put it in the local system's language-specific documentation format (godoc for most of my systems). People just don't write them. Handing them better tools won't solve that problem.
> The key feature of literate programming is separating the order that code is written in from the order that the compiler requires
But what's the purpose of divorcing code execution order from the machine, and instead tying that to prose? It's simply to facilitate the reader's understanding of the logical structure of the program without the superficial parts.
Just because you do discuss those superficial features with the "literate-programming as code comments" paradigm doesn't make the program less literate, it just means the prose has to be less efficient.
I do agree it's better to use a system like org-mode for literate programming because it is better to separate the description from its execution, but I'd argue it isn't really the essential element of literate programming.
I do this too, and it’s great. I’d much rather rewrite a comment six times and then write a function once than rewrite the function and re-modify all affected code six times.
Also, difficulty writing a clear, concise comment is a strong code smell indicating that you don’t sufficiently understand the following block of code. Much better to realise this at the comment stage than at the debugging stage.
Yes! Rewrite that comment until it's sound! Being able to write in spoken language what the code is doing is so often related to the writer's ability to write that code succinctly.
I did this early in my career then stopped when I realized it generally made the code less readable. Generally, not always. Sometimes a comment about what the code does is invaluable if it’s complex, but usually it is redundant with reading the code directly. I feel like coming across code littered with comments about what the next line does reflects someone not cleaning up after themselves; a string of trivial comments would not pass code review on most teams I’ve been part of.
This applies to comments about single statements in the code, unless they have an esoteric side-effect or there is a particular why that needs to be explained.
But comments that describe a paragraph/block of code have a separate value. Being able to skim code faster because of a series of well chosen sentences is an aid beyond clear code. As the OP described, it is also a good way to design before the code is written.
Very similar situation here. I have always approached my code, my code's documentation, and the actual user documentation with this attitude: I assume that someone in the future looking at it will be on their first day on the job, and may not be familiar with the context I am so familiar with.
Comments on non-trivial code are invaluable. I do it constantly and appreciate traversing previously made comments, especially when beyond short-term memory. Regexes, multi-conditionals, all those weird edge cases. How does any of that go "out of date"? As time progresses, comments become _more_ valuable, not less. What is the use case where comments are bothersome or out of date?
For figuring out how to build complex systems I’ve started doing something like this too, but in prose form, in a separate document, purely as a form of thinking through how the pieces will all fit together. Most recently, it was for building a multitrack video player, and working out how to organize the various parts (decoders, mixer, compositor, player, handling seeking, etc etc). I drew out a TON of block diagrams while constantly feeling like they were missing the how and the why, and then I sat down and wrote out my thoughts. It helped a lot with getting clear on how it would work.
I do the same thing. I then replace the comment with a method name that describes exactly what the comment described, e.g. _getLowestFooIdFromBarCollection(). I then implement the code inside this private method.
I do pseudo code sometimes too. I notice that your functions/methods/whatever are verbs that are assigned to nouns. It reminds me of:
Verbs in Javaland are responsible for all the work, but as they are held in contempt by all, no Verb is ever permitted to wander about freely. If a Verb is to be seen in public at all, it must be escorted at all times by a Noun.
I’m not too lazy for comments, but I hate it when the flow of explanation/logic gets piled up with details, all in the same visual style. A source becomes a lengthy book with randomly discussed topics mixed with code. Instead of comments I’d like to emphasize some parts and write irrelevant details and technicalities in a smaller font with a lower contrast. And/or collapse parts of it, but with manual collapse statements, not wherever program syntax allows (I find this completely useless). Also abbreviate expressions into comments that by default replace them completely. E.g.
<cycle through params and start a service for each one>
…
<set up an ocr subprocess and handle errors>
…
if (<is a numeric string>) {
<trim whitespace>
<extract integer part>
<send it to /path/to/route>
}
expands into a corresponding full-page code and more details when you click on these parts. Can’t simulate fonts and colors on HN, sorry. This isn’t exactly literate programming, but more like a programming markdown. I wish it was every language’s way to go. I know you can simulate that with functions and IDE, but having a continuous context has its own benefits (you don’t have to pass it around, which is tedious and again technical), and also naming things properly and in a unique non-polluting way is hard.
I ran into this issue and have settled on very descriptive naming for variables, functions, classes, objects, what have you. I still comment a lot, but it's things you might not infer from the naming conventions. Even when I don't think anybody will ever see the code I do it this way because when I open it up a year later to expand, between the git logs and my comments I'm ready to rock right away.
This is slightly different from my idea. I also comment a lot, create sections of code and use names which at least I can pick up later. My issue is not with names (words), but with a structure of text. A wall of it is “either read or go away”. I’d like to see through it as if it was a high-level description and only dig into details when required. This comes from an observation that you perceive best what’s on the page when you don’t have to jump around and collect knowledge. A page is a best perception unit. Everyone talks about good descriptive names, but then you open any project and see verbAdjNouns at best, which implies that you are familiar with their pretty local jargon.
There are TwoHardThings. I'll take longer descriptive names any day. There is a discussion about whether a prefix like "get" (i.e. `getName()`) is necessary. Please describe the _type_ of the object in the name as rarely as possible, as in `MyDataDTO`, `ListOfPeopleArray`, `ApiRequestObject`.
I definitely do this as well when I have a complex feature or pattern I need to implement. It’s best to break down the problem into solvable chunks and writing the comments out first helps you break it apart and also make sure you haven’t missed anything.
This is the stuff that a software engineering professor will tell you to do, you ignore it for years, and then later on, after you’ve matured, you find yourself picking it up again because it’s just one of those incredibly simple and responsible ways to break down complex problems.
I've also hit on doing this as well. I find it really helps to organize my thoughts, uncover edge cases, etc before I start to write code.
Then I might start sketching the implementation with function signatures docs, types, loops and calls, and gradually start filling it in beginning with the parts that are less clear to me.
Usually, when I do this, I immediately end up with a comment that doesn't reflect the code, and is cut and moved around so many times that it's not legible anymore. So I try to replace the comment with code, at once or piecewise (in which case the comment doubles as a TODO list) during the process.
I believe "TODO" comments are valuable and should be left in; it shows the author also understands there's a better way, once it's viable. Most of my professional colleagues do not agree. I swear it's an ego thing.
I believe this is due to how IDEs have introduced useful features to elevate things like specific comment structures like TODO into something "official" and recognized and reported on.
So things like a quick "let's improve this part later, but this is fine for now" become items that are highlighted as not "DONE" and then are seen as eyesores or unprioritized tickets, which leaves a bad feeling for devs.
While I think these features of automatically tracking TODOs can have their uses, like most things if they become metrics they start to lose their value. Sometimes it's better to just have a keyword you can easily grep for when you do find some time. Having these results show up as cases against your ability to finish the work is unhelpful, but that's what the manager and maybe coworkers see, unfortunately.
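For what it's worth, the "keyword you can grep for" approach needs very little tooling. Below is a small Python sketch that collects TODO comments from a piece of source text. The `TODO(name):` owner convention, the sample code in `EXAMPLE`, and the name "sam" are all assumptions made up for this example, not a standard.

```python
import re

# Matches "# TODO: text" and "# TODO(name): text"; the owner part is optional.
TODO_PATTERN = re.compile(r"#\s*TODO(?:\((?P<owner>[^)]*)\))?:?\s*(?P<note>.*)")


def find_todos(source):
    """Return (line_number, owner, note) for every TODO comment in `source`."""
    todos = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        match = TODO_PATTERN.search(line)
        if match:
            todos.append((line_number, match.group("owner"), match.group("note").strip()))
    return todos


# Hypothetical source file contents, used only to exercise find_todos.
EXAMPLE = '''\
def total(prices):
    # TODO(sam): switch to Decimal once the schema migration lands
    return sum(prices)  # TODO: handle empty carts explicitly
'''
```

Calling `find_todos(EXAMPLE)` gives you a plain list you can read when you have time, without it turning into a dashboard metric.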
Hum... Your comment reads like "I have this very specific and subjective process that works for me; people that do things differently are too egocentric to notice their incompetence". Your colleagues may react badly to that style.
I believe (but I'm not sure) I do get what you're trying to say, but not every TODO reflects some eternal trade-off. Some are for real clear improvements (like the "there isn't any code here" into "the code was written" in my example) whose history won't add any value after they got done.
Code comments are implicitly associated with code that already exists. Sounds like you're making a presumption about what code exists and potentially missing the forest for the trees.
On larger teams the likelihood that comments get out of date is probably higher - particularly if the original comment was very long and detailed and the original author has moved on.
The number of times I've seen obscure code that would have been elucidated by a good comment is vastly higher than the number of times I've seen outdated comments.
Funnily enough, most of the outdated comments I can remember were dead docs links despite that being a very common 'solution' to the problem of outdated comments.
One way to cap the cognitive load of adding and maintaining comments is to limit them to explain why a piece of code does what it does, rather than how. Rely on the fact that the how is self-documenting to other programmers who may need to maintain your code later (including you), but the why is not always so.
Eg, were other options considered and this one chosen? Why? Is this piece of code non-standard or particularly clever? Why? Is this piece of code more complex than it seems like it needs to be? Why? Does this piece of code have a non-local dependency that may not be obvious at first glance? Are there security concerns with this piece of code which are non-obvious to a novice or an outsourced worker? Etc.
There's the old saying, "since debugging code is more difficult than writing it, then if you write the most clever code you are capable of, then you are by definition incapable of debugging it". Pretend you're explaining your code to future you who has forgotten why you did what you did and isn't clever enough to debug it.
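As one illustration of the "security concern that isn't obvious" case from the questions above, here is a short Python sketch. The function name `tokens_match` is made up for the example; `hmac.compare_digest` is a real standard-library function.

```python
import hmac


def tokens_match(expected, provided):
    # WHY (not how): a plain `==` on strings may stop at the first differing
    # character, so the time it takes can leak how much of the token was
    # right. hmac.compare_digest is designed to reduce that timing leak.
    # If you "simplify" this back to `==`, the tests will still pass and
    # the vulnerability will come back.
    return hmac.compare_digest(expected.encode(), provided.encode())
```

The code itself explains how the comparison is done; only the comment explains why the obvious one-liner was rejected.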
I think "why" comments are the most useful, but "what" comments are a bit undervalued. A few types of "what" comments are, I find, very useful:
1. Comments that are effectively headers:
# If the user is foobar, do thing
user = User.find(current_user_id)
user.preload_bizbaz!(cache: false)
is_foobar = user.bizbaz.foobar.status == "yes"
do_thing(user) if is_foobar
Yes, you can read the 4 lines of code to see the "what" - the comment is duplicative. But when I'm skimming through a large codebase trying to find or understand something, these headers can be invaluable.
2. Comments on dense code
# Matches any email like foo@bar.com (only .com, .net, .org email addresses)
return email.match(/^.*@.*\.(com|net|org)$/)
Yes, I can read that regex, but I can read that comment a lot faster.
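Some languages also let you put the comments inside the dense code itself. A sketch in Python, using the standard `re.VERBOSE` flag on the same pattern; the names `EMAIL_PATTERN` and `looks_like_supported_email` are just placeholders for this example.

```python
import re

# Matches any email like foo@bar.com (only .com, .net, .org email addresses)
EMAIL_PATTERN = re.compile(
    r"""
    ^.*            # anything before the at sign (deliberately loose)
    @              # the at sign itself
    .*             # the domain name
    \.             # a literal dot
    (com|net|org)  # only these three top-level domains
    $
    """,
    re.VERBOSE,
)


def looks_like_supported_email(email):
    """Return True if `email` matches the loose pattern above."""
    return EMAIL_PATTERN.match(email) is not None
```

The header comment is still what you read when skimming; the inline ones sit right next to the piece they describe, so a change to one is harder to make without seeing the other.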
I love header comments, but many people fail to get their usefulness, and I get the "good code does not need comments" mantra recited to me pretty often.
Here is the thing: Header comments are not really about explaining their following code. They are about reducing the lines you need to read to navigate through the code by a factor of 5 to 10. Huge time-saver, and it helps newcomers become productive quickly.
// The following block should have the same
// result as this (simpler) code:
// <code>
// but for <reason> this version is better
This is great for things like opaque optimizations, things that mimic a library function except for one critical difference, blocks that become really convoluted to deal with special cases, and similar.
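A concrete instance of that comment template, as a Python sketch. The function and the claim that `n` gets large are assumptions chosen for illustration; the closed form itself is the standard sum-of-squares identity.

```python
def sum_of_squares_below(n):
    # The code below should have the same result as this (simpler) code:
    #     total = 0
    #     for k in range(n):
    #         total += k * k
    #     return total
    # but the closed form runs in constant time, which matters if n is
    # large (assumed here purely for the sake of the example).
    if n <= 0:
        return 0  # range(n) is empty for n <= 0, so the simple version returns 0
    return (n - 1) * n * (2 * n - 1) // 6
```

A nice side effect is that the simpler version in the comment doubles as a specification you can test the optimized version against.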
> I think "why" comments are the most useful, but "what" comments are a bit undervalued. A few types of "what" comments are, I find, very useful:
I agree. However, "what" comments need a little more skill and judgement to know what to write and what to leave out.
Having a terrible memory helps develop that skill, because you kind of get a sense of what you'll want explained to you again in six months and what's hard to decode.
> And the latter is a perfect example of what goes wrong, when someone adds "gov" to the regex but doesn't notice the comment to update it.
That always gets trotted out, but it's sort of like "don't write code unless you can guarantee the code won't have any bugs, so never write any code."
The trick is write code and comments, but be a little skeptical of both. If you're disciplined (especially with commit messages), it's usually pretty easy to figure out a mistake or omission after the fact.
I am 99% in agreement with everything you said, so of course I'm going to focus on the 1%! If you have a 4 line stanza that needs a comment to explain what it's doing then I'd suggest that needs breaking out into a function so that your code then becomes `if (is_foobar?(current_user_id)) {do_thing()}`. Just as readable with no comments needed.
There are definitely places where that's hard or impossible and your regex code is a good example of that. I write a fair amount of C at the moment and you can have some pointer arithmetic that's not even that complicated, but a comment on it makes scanning through the function much simpler.
> If you have a 4 line stanza that needs a comment to explain what it's doing then I'd suggest that needs breaking out into a function
I'm mostly in agreement, so I'm going to focus on my disagreement too! ;)
This view is pretty common in the ruby community. So much so that I think they take it too far at times. I sometimes find myself reading a class that could have been a single 24-line method but is instead one method with method calls, each of which calls 0-2 more methods.
This style is good for when I'm trying to quickly understand what that class is supposed to do - I just read the top-level method that looks like:
But it's terrible when I need to understand what that function is actually doing - eg to debug it or to find some underlying function call I'm looking for. I have to bounce all over the file to mentally reconstruct a linear sequence of code which could have just been one function with some headers.
Of course, this style is a reaction to impenetrable 500-line functions which are also terrible. I'd definitely prefer many small functions to that! I think it's a matter of judgement and experience to know whether some code is better as small stanzas with comments or small functions.
For your first example, use a method with a fitting name.
Second one is not a good example, because I think it's a very common pattern and email.match with com|net|org provides enough context by itself. It's just bloat imo.
The "why" is the most underrated aspect to good documentation.
I generally prefer the word "context".
The trickiest aspects of writing and editing software tend to boil down to two subjects, plus a third that ties them together:
1. What's the underlying system we are trying to manipulate? How is its data structured, and what kind of loops and logic branches can be hijacked for our new functionality?
2. Where and when is our new code interacting with the underlying structure?
3. To tie the first two together, what data/functionality are we exposing to the user? How and when? What is the user?
Most contemporary programming over the last two decades has been focused on nouns. The last 5-10 years has seen a shift in popularity to verb-based (functional programming) paradigms.
We can find insight from noun-based programming to answer "what?", and verb-based programming is good at answering, "how?".
Neither really does a good job of answering "where?" or "why?". Those have to be pieced together like evidence in a crime scene, and by the time you figure it out, you've probably learned everything there is to know about the entire codebase.
This is funny to me because we seem to be complete opposites! I have a great short-term memory, but my long-term memory is sadly completely trashed. I don't care if someone wants to document their code, but I never actually trust that documentation to be accurate or up-to-date. I expect it to be misleading and time-wasting, and so I only trust the code itself. I personally only write comments for bits of code that I find confusing, and I think it's a bit of a failing when I have to do that -- I'd much prefer to put in the extra time and effort to rewrite the code to not be confusing.
(I don't write code for a living these days btw, only for myself. It's much more fun, and I get to not care anymore whether other people think I'm wrong about things like these).
You may not remember, but you and I have had this discussion while working together. You were probably one of the people mocking me. ;-)
> I never actually trust that documentation to be accurate or up-to-date. I expect it to be misleading and time-wasting, and so I only trust the code itself.
Two things here:
1) I don't implicitly trust comments. Obviously, the code is the definitive source of reference, and where the comment differs from the code, it only means that something interesting happened. But...
2) I believe in commenting "why" or "who", and less often, "what" (and rarely "how"). My experience is that "why" comments age well -- even if the code drifts, the intent of a method or class changes infrequently.
For example:
# I (timr) wrote this method because I needed a way to
# invert the index for {situation}, and {method a} and
# {method b} didn't work because {reason}.
is a better, more evergreen comment than:
# this method inverts the index for {situation}.
which is far better than:
##
# inverts index.
# @args foo, bar, baz
#
Unfortunately, nearly all doc generation software encourages the latter, and so many comments are pretty darned useless. The first example is great because even if {method a} and {method b} and {reason} fail to be true in the future, some other programmer can come along and read it and say "ok, I understand why this was written this way, and the preconditions motivating it are no longer valid. maybe I can refactor."
> some other programmer can come along and read it and say "ok, I understand why this was written this way, and the preconditions motivating it are no longer valid. maybe I can refactor."
I appreciate having this kind of information available, but it often gets too verbose for my taste to keep as an inline comment. For this, I typically push this kind of documentation to the commit messages. This however requires very disciplined use of git: you need to "massage" your commits so each of them is a self-contained change, and OFC avoid squashing during PR merges. Then, a git blame + git show will bring up the relevant information.
Oh wow! No, sadly I don't remember that, nor much else from that long ago. That's funny (assuming I wasn't a dick about mocking you!)
I actually completely agree with your ordering of least-useful to most-useful comments. I still tend to view these most-useful comments as just kind of interesting historical tidbits, rather than something that really helps me do my job, but at least we agree on the order :)
This so much. I really don't trust most comments at all. I think we also need to be explicit about the kind of comment.
There are comments that are categorically always bad. These are the "this code does X" kind of comments. Those go out of date really quickly in a shared code base. They don't even make sense for myself on a project I'm doing myself. The code should read like the comment would. Almost like prose. If it doesn't, I haven't made the code good enough yet. We no longer live in a world where the code needs to only be machine readable and unreadable to most mortals. We have the luxury of being able to use good abstractions and extract functions without worrying about code running slow because of too many indirections, stacks that are too big etc. We can optimize for direct code readability!
Then there are the "why" comments. Those can be invaluable. If I assume I have the kind of good code I can just read as to the "what is this doing", I can sparingly add information to things that might seem unusual or weird or inexplicable.
Same with tests mentioned in some sibling posts. Tests should be written in a self documenting way. I like to name my tests after what they're testing. Like "should behave in X manner when doing Y to Z" from a user's point of view, not on a technical level. User not necessarily meaning end user, say if you're testing an API or library function. Different languages make this easier or harder but I do it in all of them. Armed with that documentation of what my test's prerequisites are and what the expected outcome is, I can write the actual test. I should be able to deduce the expectations of the test from the test's "name", thus double checking that I am testing the correct thing. Many tests I find in shared code bases are utterly unreadable, have way too many expectations and side effects and test too many things at once. With the above technique there's usually only one or very few expectations. If I parsed out the test names from all my tests and just gave them to you as a document, it should almost read like a documentation of all of the expected behaviors of my piece of software.
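To make the "parse out the test names and read them as a document" idea concrete, here's a sketch using Python's standard `unittest`. The function under test, `apply_discount`, and the helper `behaviors` are hypothetical and exist only for this example; `TestLoader.getTestCaseNames` is a real `unittest` method.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical function under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100


class ApplyDiscountBehavior(unittest.TestCase):
    # One expectation per test; the name states the behavior from the
    # caller's point of view.
    def test_should_return_the_full_price_when_the_discount_is_zero(self):
        self.assertEqual(apply_discount(80, 0), 80)

    def test_should_halve_the_price_when_the_discount_is_fifty_percent(self):
        self.assertEqual(apply_discount(80, 50), 40)

    def test_should_reject_a_discount_above_one_hundred_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(80, 101)


def behaviors(test_case_class):
    """Turn test method names into a readable list of expected behaviors."""
    names = unittest.TestLoader().getTestCaseNames(test_case_class)
    return [name[len("test_"):].replace("_", " ") for name in names]
```

Printing `behaviors(ApplyDiscountBehavior)` one item per line gives a short behavior list for the function, in whatever order the loader returns the names.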
> but I never actually trust that documentation to be accurate or up-to-date
This is like saying food is bad, because it can spoil. Sure, but we have ways of preventing that and figuring out when it happens.
A simple git blame or history (or the equivalent) should quickly answer a number of questions. Is the code significantly newer than the comment? Who can I ask for verification? etc.
It's not perfect, but significantly better than the alternative.
Similarly, presumably code changes are approved by reviewers, who should be preventing the merging of code that invalidates its own inline documentation without an update to the comments.
Yes, I am also in this camp. I document very rarely, only when sure something is inherently complicated and needs to be written down. A great example: odd bug fixes related to the evolution of features, underlying services and data structures often deserve callouts, usually linking to the bug in question. I do take quite a lot of care in naming things, clarity in how things operate and what the key data structures are.
We need to write code that is optimized for communication and clarity. We have a number of tools we can use to craft communicative code.
1. Variable and function names. These should be descriptive and never deceptive. For example, I've seen metric tons of code like this `const json = makeSomeApiCall(params)` where the contents of `json` is the decoded response data and not a JSON string. This code is deceptive and obfuscatory. But if you write `const decodedFooResponse = makeSomeApiCall(params);` then you are accurately describing what the value is. This is particularly important in untyped/loosely typed languages where the value of the variable could be anything.
2. Code structure and layout. When writing prose, we achieve clarity through structure. Code is no different. Keep related things together. Avoid run-ons. In other words, write code that has smallish functions with smallish interfaces, so it's easier to reason about how the code will behave. Avoid unnecessary mutation of values--repeated mutation of the same value is particularly pernicious. Use white space to create visual separations of distinct ideas. Use abstraction and encapsulation to keep code focused. Avoid deeply nested conditionals, flattening the tree whenever possible. And a million other strategies that can all help.
3. Tests. Tests can be documentation, but they certainly aren't always. They also aren't convenient documentation--what they offer isn't going to show up in your editor when you inspect a method. Unit tests are a nice way of recording verifiable expectations for behavior. But if the test code is complicated or poorly organized, all it does is compound the misery of working on a messy codebase.
4. Comments are distinct from documentation here. Comments are little side notes like "this implementation is a bit of a hack. It is brittle because .... but we decided to keep it because X. See someURL for contemporaneous discussion." They tell us why a thing is the way it is or about some sort of risk or unexpected detail.
5. Documentation is absolutely a part of making code understandable. These docs are primarily going to be used in an IDE/editor when viewing function signature data. Ideally you can write JavaDoc/TSDoc/POD or similar inline docs. Examples are worth their weight in gold.
We should be using all our tools to write communicative code. We don't need every technique for every line or block, but over the scope of a package or project, we should judiciously employ them all. Every line added comes with a concomitant maintenance cost--we must ensure that every line we add is worth that cost.
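A rough Python sketch of the naming and inline-docs points from the list above. `fetch_foo_body` and its canned payload are hypothetical stand-ins, not a real API; the only point is that the names say which form the data is in, and that the docstring carries an example an editor can show.

```python
import json


def fetch_foo_body(params):
    """Return the raw JSON text of a Foo response.

    Hypothetical stand-in for a real API call; it returns canned data.
    """
    return json.dumps({"id": params["id"], "tags": ["a", "b"]})


def decode_foo_response(raw_json):
    """Decode the JSON text of a Foo response into a dict.

    Example:
        >>> decode_foo_response('{"id": 7, "tags": []}')
        {'id': 7, 'tags': []}
    """
    return json.loads(raw_json)


# The names distinguish JSON text from the decoded value.
foo_json = fetch_foo_body({"id": 7})
decoded_foo_response = decode_foo_response(foo_json)
```

If this lived in a file, `python -m doctest thatfile.py` would run the docstring example, which is one cheap way to notice when an example has drifted from the code.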
Comments are absolutely necessary when explaining complex logic that is hard to read for the casual reader (or yourself 6 months later).
Comments are not necessary when the code itself explains what it does by being clearly named, isolated and organized.
Comments are not documentation and shouldn't be used as such. They are tools to improve code clarity for the person who is changing the code. They are not tools for someone who is using the code. If the intent is to teach someone what the code does so they can use it somewhere else, write documentation in a wiki with examples and link the wiki URL to that point in the code.
I have the same problem. Rather than fight it, use it as your superpower. I find it helps me write easier to understand abstractions because I can't physically hold enough in my head to understand anything more complex. Same is true for documentation!
We have some blatant falsehoods about how documentation should be organized as well. Virtually everyone seems to think that people know which file to open, and so we organize files as if the person is always in the right place.
This doesn't scale, because as the size of the project and the number of docs increases, we start refining generic knowledge into more refined slices of the problem domain. At first the docs are so far apart that you rarely miss, but as time goes on you miss more and more. So while your outline might suggest that finding docs grows logarithmically, it's linear at best.
The first line of any doc should answer "where am I?" and "why do I care" because the odds that they don't care go up over time, and people put a time limit on self-service. After 2-3 wrong pages in a row they start getting impatient, unless they got through those wrong pages in 5 seconds apiece.
If I could I would upvote this 10 times. Always write documentation (especially for your own code). Future you will say thank you to today's you. At least I have done so multiple times. And I always feel like I'm cheating myself when I feel too lazy to write documentation.
One thing that I believe is under-documented in code is _business decisions_. The team arrives at a decision in a meeting or informal discussion, and it leads to some code that might be tough to understand without context. I'll usually add a comment briefly describing the decision, and initial and date it.
Initially I would just link to a wiki page in a comment, but occasionally these links break, so in my experience it's better to include the notes directly.
It is often also useful not only to document the decision the team arrives at, but also briefly write down what was decided not to implement and why. This is very helpful when people come in a few weeks later with a "new" idea that has already been discussed, or in onboarding new people to the project.
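Something like the following is what such a comment could look like. The decision, the initials, the date and the function are all invented for illustration; the shape is what matters: what was decided, by whom and when, and what was rejected and why.

```python
from datetime import date, timedelta

TRIAL_LENGTH_DAYS = 14


def trial_end_date(signup_date):
    # DECISION (J.D., 2024-03-05, invented for this example): trials run for
    # 14 calendar days, weekends included.
    # Considered and rejected: counting business days only. It would need a
    # holiday calendar per country, and nobody could say who would maintain it.
    return signup_date + timedelta(days=TRIAL_LENGTH_DAYS)
```

Because the notes sit next to the code rather than behind a wiki link, they survive the link rotting, and the "rejected" part answers the "new" idea before it gets re-proposed.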
The commit message should contain all the unusual context needed to understand the change. In practice however, most people write terrible commit messages.
It's not just that. Commit messages have a property that they get overwritten. When I refactor some code for performance reasons, I don't want to remove documentation - which is exactly what happens if I make a commit. True, older commit messages are available in the history, but... not conveniently available.
Oh gosh, this 100 times. I comment everything, even rationale behind decisions, I'm recently writing a book as I write a compiler[1] because some parts are confusing me so much that I just can't solve them, and can't go back to work on them later, if I don't document them.
Even if things go out of date, it does not matter, because it still is somewhat close to what used to be there and can still help the future reader (me or someone else) figure out what might have happened since then. It's a million times better than no documentation.
I feel like nobody goes to look at changelogs, or PRs, or commits. Probably because they don't ever expect anything good from it. Also they're not really searchable.
But still, how would you search through commit history to figure out one thing? Comments are right here in the code, and books/external doc/rfcs refer to concepts.
I feel like commits are only good if you're spelunking, which is usually for a single reason: you're bisecting looking for a bug.
If you write down in comments the history of why the code is written the way it is now, not just the current code but all the things that were tried before and why they had to be changed, you'll have too many comments and it will be hard to read the code. That's why it's rarely done.
> I feel like commits are only good if you're spelunking, which is usually for a single reason: you're bisecting looking for a bug.
I've read history to see why things are the way they are, but I agree that most of the time it's looking to see when a bug is introduced and what was known about it at the time. That's a pretty important use, though. If all you get is a commit with no useful message, you can see what line of code was changed but not what the reasoning and investigative data behind them was. Many bugs show up from changes that were themselves supposed to fix bugs or make something subtle work in a particular way, so the reasoning behind them is relevant when a new bug is discovered.
With a good issue tracker, a commit message like "fixed #386" is theoretically enough because the information is in issue #386. But tbh it's still friction to see lists of commits which contain nothing more than # references to pages somewhere on GitHub and no useful description. I prefer to summarise the issue and the fix in the commit message (and PR message) in those cases.
(To an extent it depends on whether you're using Git itself, or GitHub/equivalent, as the latter expand # references to include the one-liner description when displaying the messages. I find GitHub extremely slow compared with Git, and it has awful commit history tools (won't show the graph for example), so I use Git and see # references by themselves. When colleagues produce a lot of these, it's like a sea of unexplained changes, as if nobody can be bothered to say what their code does at all.)
Another completely different reason I've grepped through git history with the Linux kernel and other widely used projects like Glibc and GCC, is to see every change to an API or subsystem or function throughout its history, in order to write "portable" code that will work with every version across a large time range. Occasionally I've even written a short document listing every change that's relevant to what I'm building, to help me build the thing.
This is particularly important with system calls, library functions, and internal APIs (e.g. for kernel modules). Although it's rare for an external API change to break existing code (though it does happen), it's common for an API feature which works today to be missing or buggy in the past, in versions which are still being used by someone. Internal APIs change more often, so finding the changes is even more essential. Writing portable code means finding the history of all those changes, including bugs and feature additions, to write code that works correctly when it's running on any version.
For example when I was writing code to use io_uring, a large part of the work was going through every change to the kernel io_uring subsystem to check every change affecting those parts of the API I was using, so I could avoid using them on buggy kernel versions, and so I could adapt to API changes that occurred. (This was also useful for future-proofing the code in that my test environment wasn't able to run the latest kernel, but in examining the history I'd also see "future" changes that my code would need to work with when shipped.)
The explanatory commit messages were essential for that. There's no way I could have understood the purpose of relevant changes in a useful timescale without those messages. Particularly for things which affected performance or thread correctness in subtle ways only with some machines and some applications, that you simply could not see from the code.
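For anyone who hasn't done this kind of history search, a minimal sketch of one way to start, using Git's "pickaxe" option (`git log -S`). The helper names and the output format are my own choices, not anything the kernel workflow prescribes, and the symbol and path are placeholders.

```python
import subprocess


def history_command(symbol, path="."):
    """Build a `git log` command listing commits that added or removed `symbol`."""
    return [
        "git", "log",
        "--reverse",            # oldest first, to read the history in order
        "--date=short",
        "--format=%h %ad %s",   # short hash, author date, subject line
        "-S" + symbol,          # pickaxe: commits that change the occurrence count
        "--", path,
    ]


def symbol_history(symbol, path="."):
    """Run the command inside a Git work tree and return one string per commit."""
    result = subprocess.run(
        history_command(symbol, path),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()
```

This only finds the commits; whether the result is useful still depends on the commit messages explaining why each change was made.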
You might argue that comments should be there to explain all non-obvious aspects of the current code, but for code like this, which contains thousands of "Chesterton's fences" at high density, that style would be very comment-heavy, and that style is generally discouraged. In effect, there's more to the code than meets the eye. At least with the Linux kernel, the culture evolved to expect explanatory Git commits (before Git it was the mailing list, go back far enough and there were more comments in the code), so everyone knows to look at Git and lists now, keeping the code itself relatively clean as a result.
"Comments go out of date quickly" is true -- it's the "there's no point in investing in them" that's incorrect. In addition to being valuable in the short term (to both you and anyone else who ends up looking at the code), "comment just became outdated" is a highly valuable signal that a prior assumption made by other code could now be false, and you should look into that.
I have worked on teams where the code is very well documented (basically every public method, parameter, property or class gets a comment) and it's not at all hard to keep up to date. Sometimes people forget, but when you look at a PR and see the usage or meaning has changed with no corresponding comment change, it's easy to flag.
I feel like many people take the "comments will go out of date" adage the wrong way. To me, it doesn't say "don't write comments", it says "give your comments some love while you code".
It's meant to warn against letting comments become false, not against writing them. To me, at least.
> Most programmers believe a number of blatant falsehoods about documentation, with the most prevalent being "comments go out of date quickly, so there's no point in investing in them".
More broadly, a problem with our cognition is that when we remember something right now, we often think it is unlikely or even not possible that we wouldn't remember it a day, a week, a month from now.
IMO there is some common sense to comments that is often not followed. Example: I work on a project that has a pre-submit script that requires comments for tests. This leads to ridiculousness where I have code like
// Test that uploading data with XYZ function works
TEST(UploadingDataWithXYZFuncWorks) {
...
}
Like WTF! The function name says all that needs to be said. There is no more info needed but I had to appease some overzealous programmer.
Meanwhile, in an existing test in the same file I see code like this
// Set the dimensions
SetDimensions(0, 0, width, height)
Again, WTF!
Comments are important for explaining why something exists, what its assumptions are, edge cases, etc..., but you can also go overboard with comments and you can make your code more readable by using readable names for functions and variables.
I agree it's a lame excuse not to document code, however there are just some parts that change way too often, by multiple people, where outdated comments can actually lead you astray. The stuff I touch doesn't change often so I'll comment to explain the weird and ambiguous looking parts.
I'm not saying you have to document each line of code with equal rigor. In fact, there's an art to knowing what to document, and like most arts, you get better at it over time. Sometimes I feel silly about what I've documented and what I've neglected, but that's OK. I learn.
The "code changes way too often" excuse is basically a restatement of the myth that I wrote above. Yes, code changes over time. It's pretty easy to change the comment with the code, when necessary. The team members who don't do this aren't doing their jobs.
But at the end of the day, even if code deviates from comments that's still fine. Having an outdated, well-written comment is a historical record of what the code was supposed to be doing. That's useful.
The added bonus to writing good comments is you’re practicing technical writing directly associated with your code.
It’s similar to writing your own flash cards, but by rephrasing the code into a comment you’re visually and linguistically using a form of repetition to stick the code inside your head with a human readable explanation. This can make you a better technical communicator when discussing any code with others.
There is a lot more than “I will read this comment in 6 months and it will save my ass” going on when you practice writing good comments.
I really like this because in my opinion, half of my job is good communication, another quarter is planning/design, and the final quarter is actually writing code.
> You can document your code. You can keep it up to date. It isn't that hard. You just don't want to.
Documentation that is close to the things it is talking about isn't hard to keep up to date. What is hard is knowing, when I make a change here, that someone (maybe me) talked about this thing over there without any pointer from here to there. Can I do a careful search of everywhere there might be documentation every time I make a change? Yeah, I guess; it'll definitely slow me down in the short term and I'll still miss things.
Does "documentation" mean separate documents completely disconnected to the codebase? Isn't that the problem? Code gets updated according to necessity whereas documentation is for human consumption and not required to make the thing work.
> Does "documentation" mean separate documents completely disconnected to the codebase?
At worst, yes. I've increasingly seen docs in Notion, but Confluence was common before and probably still is. I agree that a big part of the solution is "get it all in the repo" but I don't think that's enough.
Even within a repo, writing can be pretty distant from the code it discusses. In my experience, programmers and reviewers are great about comments on or adjacent to changed lines (not coincidentally, what shows up in git diff and code review software). Both are still pretty good about comments that they can structurally expect to be present, like function documentation at the top of a function.
It falls off a lot for programmers when it's a comment talking about something more than maybe 20 lines away, whether that's a quick sketch at the top of a loop or a block comment somewhere in the file about invariants or caveats or gotchas. At that point, reviewers simply will not catch that the programmer didn't update the documentation, unless they happen to be unusually familiar with the file in question.
Comments that reference code in other files are thankfully rare if you have reasonable modularity, but will also be missed by both programmer and reviewer unless the task in question touches both files in a relevant way.
Documentation in the repository that is not a part of code is hit and miss. Developer setup guides are often brittle to particular system configurations, but usually get some effort every time the new hire trips over something. Runbooks similarly, and less limited to the new hire, and hopefully you look them over proactively on occasion. Large scope architecture documents are doomed; I am more optimistic about something like ADRs which should be relatively narrow and aren't intended to be updated beyond deprecation.
Much of this can be helped with tooling, but where it exists it's poorly standardized.
Docs don't go stale like bread. They atrophy from disuse.
I always force the onboarding process to go through our docs, and I spend a little time with each new person observing their progress looking for regressions in the docs. You can't get that with old-timers because of the echo chamber/curse of knowledge effect.
This breaks down when you have a place that never hires new people. And rather than thinking that's a flaw in my process, I'm starting to think that's a flaw in the business itself. Without fresh ideas and feedback a project stagnates.
Depending on the nature of your memory and thought process, the insights gained from potentially-false statements in the documentation when they are true might be outweighed by the blind alleys and misunderstandings generated when they are false. You'd have to be very good at remembering where things came from and tagging them with their level of certainty. An important skill! But not a trivial one.
> You'd have to be very good at remembering where things came from and tagging them with their level of certainty.
I wonder why not more people do this? Not just for code, but for everything. I remember where I learned everything I know, I don't trust anything unless I remember the source. How do other people think, do they think that whatever pops up in their head is the truth without any source? Then how do they know it is a fact and not just a hunch or a guess?
It would be neat if your code editor could easily highlight the code newer than the comment(s) and tests based on your git history (without jumping out of your work, of course). Even something as simple as "highlight all code newer than the current line", for example, would be quite useful.
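A rough sketch of the data side of that idea, assuming the output of `git blame --line-porcelain` as input. The function names are made up, there is no editor integration here, and "newer" is simply taken to mean a later committer timestamp than the comment's line.

```python
import subprocess


def parse_line_porcelain(blame_output):
    """Map each line number to its commit time (Unix seconds).

    Expects the text printed by `git blame --line-porcelain <file>`.
    """
    commit_times = {}
    current_line = None
    for raw in blame_output.split("\n"):
        if not raw:
            continue
        if raw.startswith("\t"):
            # A tab-prefixed line is the file content and closes the entry.
            current_line = None
        elif current_line is None:
            # First line of an entry: <sha> <original line> <final line> [...]
            current_line = int(raw.split(" ")[2])
        elif raw.startswith("committer-time "):
            commit_times[current_line] = int(raw.split(" ")[1])
    return commit_times


def lines_newer_than(commit_times, comment_line):
    """Return the line numbers last changed after `comment_line` was."""
    cutoff = commit_times[comment_line]
    return sorted(n for n, t in commit_times.items() if t > cutoff)


def blame_commit_times(path):
    """Run git blame on `path` (inside a Git work tree) and parse the result."""
    result = subprocess.run(
        ["git", "blame", "--line-porcelain", path],
        check=True, capture_output=True, text=True,
    )
    return parse_line_porcelain(result.stdout)
```

An editor plugin could call `lines_newer_than(blame_commit_times(path), line_of_comment)` and highlight the result. A whitespace-only commit would count as "newer" too, so a real tool would need to be smarter than this.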
Of course comments stay in sync with the code... if you're the only one working on that code. As soon as you have multiple developers, you can forget about it.
"Of course comments stay in sync with the code... if you're the only one working on that code."
If I am in a rush to implement something and then am sidetracked because of a stupid small bug, then I just fix that small bug in the code. And then another one. In these states I only pay attention to code, and not text. If I would also have to read the text, then I would forget the original task.
So sadly no, comments do not automatically stay in sync with code for me, unless I put in the extra effort of cleaning up afterwards.
What if the small bug got introduced because of lack of time to cater to all the verbose comments?
But seriously, in the cases I remember - not really. Most bugs are a cause of lack of higher level understanding of a certain module. Some clear comments can help with that, but better is proper higher level documentation and the time to read and maintain them.
> Most bugs are a cause of lack of higher level understanding of a certain module. Some clear comments can help with that, but better is proper higher level documentation
Agreed. However I've found that the farther away the documentation is, the less people will use it.
If it's external, it's very hard to get people to use it.
I'd not approve your PRs. "Cleanup later" virtually never happens, and it's too likely someone will re-fix your rushed fix incorrectly because of misleading comments.
And you are free to be as slow as you want, but my flow state works a bit different and context switches are expensive. Which is why I want the minimum of comments and rather have self explaining code and proper higher level documentation.
> You can document your code. You can keep it up to date. It isn't that hard. You just don't want to.
Yeah, I can, or an individual can, but that misses the point of the advice, which is intended for groups on large projects with deadlines. Like once a manager says get this done now and we'll find time later for the rest... That never happens.
If you are diligent about documenting AND updating, sure, seems very reasonable. But most people aren't. You know what the best documentation is? Working tests: it's much harder to have wrong tests than wrong documentation. Even some of the largest projects like Ruby on Rails have had incorrect internal code docs.
No. Tests are not documentation. Tests are tests, written in code, which must be explained. You should document your tests, too, because I guarantee the next programmer won't understand your tests as well as you (think you) do. Also: the "next programmer" will be you in a year.
This is maybe a close #2 on the list of documentation falsehoods that programmers believe.
Yes. Tests _can_ very much be documentation if you write them as such, although they aren't complete documentation. Tests are the specification, i.e. the how; documentation provides the why.
Tests can be written to tell a story but most people aren't taught this way (or simply don't buy it or don't write tests).
No, tests are not specification, they are examples, specifically of the form "do this, and this happens".
Imagining a specification out of this is like solving those problems of "what is the next number in this sequence: 1 18 31". It simply cannot be done; you can guess something, but you will never know if it's the real answer.
They _can_ be examples but I prefer to write them as specs. I mean, there are entire libraries centred around writing tests as specs. Just because you don't do it that way or buy into the idea doesn't mean it's not possible or valuable.
Specs are complete. If you create real world software with complete tests, I will really want to read your Turing Award acceptance speech. It should be great.
You are right about commenting tests, but it seems wrong to categorically reject their value as documentation for users. I use tests all the time to get code examples and understand what's actually supported. One of the great things about open source projects is that you can see the tests.
> You should document your tests, too, because I guarantee the next programmer won't understand your tests as well as you (think you) do. Also: the "next programmer" will be you in a year.
I hear this a lot, but I haven't found the same with myself. I can read and understand old code I wrote.
I attribute it to my IQ being mostly held up by reading and writing comprehension skills. The way I process information makes it easier for me to remember and understand old code (I believe), compared to a lot of programmers whose inherent skills align more closely to things like math and logic.
Documentation can be a rabbit hole for me, and I often don't feel I benefit from it within the code.
I appreciate this is a good tactic for many people, but implying it's for everyone is too dogmatic.
I wouldn't describe it as a "falsehood that programmers believe", just one of those perhaps unfortunate realities that exist for some codebases that have a decent set of unit tests but little else in the way of up-to-date documentation that explains all edge case behaviour etc.
The only documentation you can really trust is that which 100s of others rely on regularly, but a significant percentage of the code most of us work with isn't going to meet that criterion.
Explaining edge case behavior is one use-case for comments, and not the most valuable one in my estimation. Aside from that, often incorrect edge case handling is in both the code and the unit test because the problem is that the developer didn't understand the requirements. In my experience, in an undocumented and difficult codebase the tests will be as mysterious or unreliable as the code itself, which makes sense since they are usually written by the same people.
You'd be surprised. There are generations of programmers now who think nothing of writing 20+ unit tests with quite clear names demonstrating what the behaviour should be under a variety of conditions, but with virtually no other documentation. Especially true in dev shops that have high coverage requirements for a successful ci build.
Tests are often a pretty good way to learn how a given piece of code behaves. Particularly, if the author is being nice and provides a "usage example" type test.
Tests verify what the code is doing. And I agree: they are a great source of insight when understanding an unfamiliar codebase.
However, comments are typically better at answering the why.
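To make the split concrete, here is a small sketch with a made-up `parse_port` helper. The test names read as a list of behaviours (the "what"), while the one comment records a reason the tests can't express (the "why"); that reason is an assumed requirement, invented for the example.

```python
import unittest


def parse_port(text):
    """Parse a TCP port number, returning None for anything invalid."""
    # Why None instead of raising (assumed requirement for this example):
    # the caller checks a whole config file and reports every bad line at once.
    try:
        port = int(text)
    except ValueError:
        return None
    return port if 1 <= port <= 65535 else None


class ParsePortUsage(unittest.TestCase):
    def test_plain_number_is_accepted(self):
        self.assertEqual(parse_port("8080"), 8080)

    def test_surrounding_whitespace_is_ignored(self):
        self.assertEqual(parse_port(" 443\n"), 443)

    def test_port_zero_is_rejected(self):
        self.assertIsNone(parse_port("0"))

    def test_non_numeric_text_is_rejected(self):
        self.assertIsNone(parse_port("http"))
```

Run with `python -m unittest`. Delete the comment and the tests still pass, but the next reader no longer knows whether returning None was a decision or an accident.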
> If you are diligent about documenting AND updating
I don't understand why this is viewed as challenging. Writing a sentence or two here and there is orders of magnitude easier than writing the code itself. And any code review process should help to prevent situations where the code and comments are out of sync.
Lastly, I feel like there's a larger human issue. I write comments to explain certain why's in the code because I care about my teammates and I care about the project.
If others don't do the same, I think it speaks to a lack of care for their fellow engineers and the work itself. I think, "I just spent five hours figuring out something that you could have explained with a single 30-second comment?"
I'm baffled that some engineers think that this is okay.
> You know what the best documentation is? Working tests
Tests do have a slight overlap with documentation, but it's that, only slight.
If a piece of code has some weird non-obvious behavior, the presence of a test for that particular behavior is a signal that it's actually intentional, not a random bug.
But, it doesn't tell me anything why that design choice happened. That's what the documentation is for. So facing such code, I sure hope it's well documented.
Show me a set of tests that I can use to build something more easily than using the documentation? It sounds like just kicking the can down the road to the next person.
Same issue here. It's one of the reasons I've become so enamored with Ada.
With Ada, it's not only easy, but encouraged, to encode so much information about how things are modeled into the program itself. Not only does it function somewhat like documentation, it also lets the compiler helpfully yell at me when I still manage to forget how things actually work. It's saved me so much stress and debugging time.
Now if only any of these 'safer' languages would add even just strong typedefs. Even if they don't particularly encourage their use, it'd be something.
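For comparison only, and not Ada: Python's `typing.NewType` is one approximation of a strong typedef. The big caveat is that it is enforced only by a static checker such as mypy; at runtime the values are plain floats. The unit names and `climb_rate` are invented for the sketch.

```python
from typing import NewType

Meters = NewType("Meters", float)
Feet = NewType("Feet", float)


def climb_rate(altitude_gain: Meters, seconds: float) -> float:
    """Return the climb rate in meters per second."""
    return altitude_gain / seconds


rate = climb_rate(Meters(300.0), 60.0)   # accepted by the type checker
# climb_rate(Feet(984.0), 60.0)          # flagged by a checker: Feet is not Meters
# climb_rate(300.0, 60.0)                # flagged too: a bare float is not Meters
```

The type names double as documentation of what the numbers mean, which is the part that helps a bad memory, but nothing stops the commented-out calls from running if no checker is in the loop.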
> "comments go out of date quickly, so there's no point in investing in them"
The key difference is where the described thing lives.
A comment describing the next few lines of code or some loop that follows, or similar, won't go out of date since it's easily updated together with what it describes.
A comment living in another file, describing loosely some far-reaching but still-evolving aspect will likely be outdated soon, if the thing it describes changes and that comment is out of sight and needs extra effort to be remembered and updated.
I should maintain my little compendium of shell 'one liners' better; they do all sorts of incredibly useful things, and getting them right the first time was hard won.
Having that handy you can seem like a god at times to your colleagues when they are trying to solve something. "Oh, did you know you can do this? I'll message it to you."
I think my working memory isn't very good, but there is something in there that is very good surely, because how else would I have been able to work with computers for this long?
Usually when writing some code that deals with something new for me, I get many "idiot questions" in my head. I try to write comments in a way that will answer my future self's "idiot questions". Answering all those questions, I feel more like I truly understand what I am doing.
I often don't comment code, especially in personal stuff, but when I occasionally do, which happens mostly when things get overwhelming, I find bugs or fix things that I was stuck on. Writing forces you to understand better, name better, and almost feels like providing you with another perspective, all without leaving your own self.
Joking aside, documenting as a habit alongside coding is like a superpower. I find that writing can act as validation against my understanding of a problem; if I struggle to write about it, then it's likely that I don't understand the problem as well as I thought.
I'm in exactly the same position. My memory is awful so I write notes to my future self. It's not hard to keep comments up to date if you keep them near the code and don't use a lot of boilerplate.
> It's not hard to keep comments up to date if you keep them near the code and don't use a lot of boilerplate.
Yep, exactly. It only gets "complicated" when you start having these heavy-handed doc generators that are parsing your code and breaking the build.
I feel pretty strongly that sphinx, rubydoc, python docstrings et al. are fine tools, but you have to have a light touch with how they're applied. Autogenerated external docs are a separate problem, and shouldn't discourage developers from commenting their code.
There are some “tricks”. For example blending what you want to remember with something memorable, creating a mental picture and recalling it a few times. That’s what I got from the book Moonwalking with Einstein.
Write things down. Use reminders. Place things where you will notice them when you need to be reminded of them, like the proverbial string tied around your finger.
Code is already a formal specification of what the machine is set to do.
All your documentation can do is make it more ambiguous. Usually the documentation is wrong as well, but that might be because the programmer didn't know how to specify clearly what he wanted.
> Code is already a formal specification of what the machine is set to do.
I take back my other comment where I said that "tests are documentation" is the #2 falsehood about documentation that programmers believe. This is the #2 falsehood.
Yes, "formally" the code is the spec. That doesn't help you very much though, squishy human, because you're not a computer.
// there is a method overload which accepts an array,
// but it's buggy and crashes. workaround: cast to tuple first.
...
then you seem to be assuming people only want to write and read the first kind. Because no matter how well you 'know all the fine details of the programming language', comments can tell you things the code can't.
Even aside from the pointless gatekeeping of "competent programmers" - ok what about people who aren't employed as programmers but still need to read and write code? What about the people who just have to deal with it being an unfamiliar language because nobody who knows the language is available?
Substitute any comment that tells you something which the code cannot tell you[1], to see why "competent programmers understand the language thoroughly" is not a good justification for being against comments.
[1] e.g. why the code was written this way instead of another way, or why it exists at all.
A competent English reader would understand what I wrote without any further comments needed, right? Apparently understanding of the language isn't enough. That's the point I was trying to make, but meta.
In this scenario you can't fix or remove it, it's third party library code which is outside your (scope, remit, time or effort constraints).
First you wouldn't write such stuff in comments, that's useless and not what comments are for. Comments are to explain non-obvious things in code, or clarify assumptions that are made, not to narrate that you found bugs but are too lazy to fix them.
If a third-party library is broken, you either fix it or stop using it.
As a developer, you're responsible with ensuring your application works well and is easy to maintain. It doesn't matter if someone else wrote some of the code.
> First you wouldn't write such stuff in comments, that's useless and not what comments are for. Comments are to explain non-obvious things in code,
First, that is a non-obvious thing in code. You look at code someone else wrote, you use your "competent programmer's understanding of the language", you see the overload they "should" have used, you are about to rewrite their function call to cut out the unnecessary cast, and the comment tells you the non-obvious reason why you shouldn't do that. Thus making it a useful comment.
> "If a third-party library is broken, you either fix it or stop using it. As a developer, you're responsible with ensuring your application works well and is easy to maintain. It doesn't matter if someone else wrote some of the code."
If touching any piece of code written by anyone makes it "your application" and immediately mandates that you rewrite all of it to your standards, that is "not how software development works".