Accidentally Load Bearing

TheCapn · on July 20, 2023

Huh. Finally a name for it.

I do a lot of support work for Control Systems. It isn't unheard to find a chunk of PLC code that treats some sort of physical equipment in a unique way that unintentionally creates problems. I like to parrot a line I heard elsewhere: "Every time Software is used to fix a [Electrical/Mechanical] problem, a Gremlin is born".

But often enough when I find a root cause of a bug, or some sort of programmed limitation, the client wants removed. I always refuse until I can find out why that code exists. Nobody puts code in there for no reason, so I need to know why we have a timer, or an override in the first place. Often the answer is the problem it was solving no longer exists, and that's excellent, but for all the times where that code was put there to prevent something from happening and the client had a bunch of staff turnover, the original purpose is lost. Without documentation telling me why it was done that way I'm very cautious to immediately undo someone else's work.

I suppose the other aspect is knowing that I trust my coworkers. They don't (typically) do something for no good reason. If it is in there, it is there for a purpose and I must trust my coworkers to have done their due diligence in the first place. If that trust breaks down then everything becomes more difficult to make decisions on.

bamfly · on July 20, 2023

This is why I comment a "why" for any line of code that's not incredibly obvious. And 100% of the time when it's due to interaction with something outside the codebase, whether that's an OS, filesystem, database, HTTP endpoint, hardware, whatever, if it's not some straightforward call to some API or library.

Sleep due to rate limiting from another service? COMMENT. Who's requiring it, the limits if I know exactly what they are at the time (and noting that I do not and this is just an educated guess that seems to work, if not), what the system behavior might look like if it's exceeded (if I know). Using a database for something trivial the filesystem could plausibly do, but in-fact cannot for very good reasons (say, your only ergonomic way to access the FS the way that you need to, in that environment, results in resource exhaustion via runaway syscalls under load)? Comment. Workaround for a bug in some widely-used library that Ubuntu inexplicably refuses to fix in their LTS release? Comment. That kind of thing.

I have written so very many "yes, I know this sucks, but here's why..." comments.

I also do it when I write code that I know won't do well at larger scale, but can't be bothered to make it more scalable just then, and it doesn't need to be under current expectations (which, 99% of the time, ends up being fine indefinitely). But that may be more about protecting my ego. :-) "Yes, I know this is reading the whole file into memory, but since this is just a batch-job program with an infrequent and predictable invocation and this file is expected to be smallish... whatever. If you're running out of memory, maybe start debugging here. If you're turning this into something invoked on-demand, maybe rewrite this." At least they know I knew it'd break, LOL.

MrBuddyCasino · on July 20, 2023

You're doing the lords work. I often get pushback on doing this with some variation of "comments bad, code should be self-documenting". This is unwise, because there are "what code does" and "why code does" comments, but this turns out to be to nuanced to battle the meme.

derefr · on July 20, 2023

“Why” isn’t a property of the code itself, though; it’s rather a property of the process of creating the code.

Because of this, my personal belief is that the justification for any line of code belongs in the SCM commit message that introduced the code. `git blame` should ultimately take you to a clear, stand-alone explanation of “why” — one that, as a bonus, can be much longer than would be tolerated inline in the code itself.

Of course, I’m probably pretty unusual, in that I frequently write paragraphs of commit message for single-line changes if they’re sufficiently non-obvious; and I also believe that the best way to get to know a codebase isn’t to look at it in its current gestalt form, but rather to read its commit history forward from the beginning. (And that if other programmers also did that, maybe we’d all write better commit messages, for our collective future selves.)

yuliyp · on July 21, 2023

I think there's room for the comments having a brief explanation and a longer one in the commit message. Sometimes people need the callout to go looking at the history because otherwise they might not realize that there's anything significant there at all.

derefr · on July 21, 2023

The "brief explanation" is what the commit message's subject line is for :) Just turn on "show `git blame` beside code" in your IDE, and you'll get all the brief explanations you could ever want!

...but no, to be serious for a moment: this isn't really a workable idea as-is, but it could be. It needs discoverability — mostly in not just throwing noise at you, so that you'll actually pay attention to the signal. If there was some way for your text editor or IDE to not show you all the `git blame` subject lines, but just the ones that were somehow "marked as non-obvious" at commit time, then we could really have something here.

Personally, I think commit messages aren't structured enough. Github had the great idea of enabling PR code-reviewers to select line ranges and annotate them to point out problems. But there's no equivalent mechanism (in Github, or in git, or in any IDE I know of) for annotating the code "at commit time" to explain what you did out-of-band of the code itself, in a way that ends up in the commit message.

In my imagination, text-editors and IDEs would work together with SCMs to establish a standard for "embeddable commit-message code annotations." Rather than the commit being one long block of text, `git commit -p` (or equivalent IDE porcelain) would take you through your staged hunks like `git reset -p` does; but for each hunk, it would ask you to populate a few form fields. You'd give the hunk a (log-scaled) rating of non-obviousness, an arbitrary set of /[A-Z]+/ tags (think "TODO" or "XXX"), an eye-catching one-liner start to the explanation, and then as much additional free-text explanation as you like. All the per-hunk annotations would then get baked into a markdown-like microformat that embeds into the commit message, that text-editors/IDEs could recognize and pull back out of the commit message for display.

And then, your text-editor or IDE would:

1. embed each hunk's annotation-block above the code it references (as long as the code still exists to be referenced — think of it as vertical, hunk-wise "show `git blame` beside code");

2. calculate a visibility score for each annotation-block based on a project-config-file-based, but user-overridable arbitrary formula involving the non-obviousness value, the tags, the file path, and the lexical identifier path from the syntax highlighter (the same lexical-identifier-path modern `git diff` gives as a context line for each diff-hunk);

3a. if the visibility score is > 1, then show the full annotation-block for the hunk by default;

3b. else, if the visibility score is > 0, then show the annotation-block folded to just its first line;

3c. else, hide the annotation-block (but you can still reveal the matching annotation with some hotkey when the annotated lines are focused.)

Of course, because this is just sourced from (let's-pretend-it's-immutable) git history, these annotation-block lines would be "virtual" — i.e. they'd be read-only, and wouldn't have line-numbers in the editor. If the text-editor wants to be fancy, they could even be rendered in a little sticky-note callout box, and could default to rendering in a proportional-width font with soft wrapping. Think of some hybrid of regular doc-comments, and the editing annotations in Office/Google Docs.

---

...though, that's still not going as far as I'd really like. My real wish (that I don't expect to ever really happen) is for us to all be writing code as a series of literate codebase migrations — where your editor shows you the migration you're editing on the left, and the gestalt codebase that's generated as a result of running all migrations up to that one on the right, with the changes from the migration highlighted. You never directly edit the gestalt codebase; you only edit the migrations. And the migrations are what get committed to source-control — meaning that any code comments are there to be the literate documentation for the changes themselves; while the commit messages exist only to document the meta-level "editing process" that justifies the inclusion of the change.

Why? Because the goal is to structure the codebase for reading. Such a codebase would have one definitive way to learn it: just read the migrations like a book, front to back; watching the corresponding generated code evolve with each migration. If you're only interested in one component, then filter for just the migrations that make changes to it (`git log -S` style) and then read those. And if you, during development, realize you've just made a simple typo, or that you wrote some confusing code and later came up with a simpler way to express the same semantics — i.e. if you come up with code that "you wish you could have written from the start" — then you should go back and modify the earlier migration so that it's introduced there, so that new dev reading the code never has to see you introduce the bad version and then correct it, but instead just sees the good version from the start.

In other words: don't think of it as "meticulously grooming a commit history"; instead, think of it as your actual job not being "developer", but rather, as you (and all your coworkers) being the writers and editors of a programming textbook about the process of building program X... which happens to compile to program X.

jimmaswell · on July 21, 2023

I don't understand the focus on commit messages. I never read the git log. You can't assume anyone reading the code has access to the commit history or the time to read it. The codebase itself should contain any important documentation.

derefr · on July 22, 2023

> I never read the git log.

Well, no, because 1. it's not useful, because 2. most people never write anything useful there (which is a two-part vicious cycle), and 3. editors don't usefully surface it.

If we fix #3; and then individual projects fix #2 for themselves with contribution policies that enforce writing good commit messages; then #1 will no longer be true.

> You can't assume anyone reading the code has access to the commit history or the time to read it.

You can if you're the project's core maintainer/architect/whoever decides how people contribute to your software (in a private IT company, the CTO, I suppose.) You get to decide how to onboard people onto your project. And you get to decide what balance there will be between the amount of time new devs waste learning an impenetrable codebase, vs. the amount of time existing devs "waste" making the codebase more lucid by explaining their changes.

> The codebase itself should contain any important documentation.

My entire point is that commit messages are part of "the codebase" — that "the codebase" is the SCM repo, not some particular denuded snapshot of an SCM checkout. And that both humans and software should take more advantage of — even rely upon! — that fact.

jimmaswell · on July 22, 2023

I've been in enough projects that changed version control systems, had to restart the version control from a snapshot for whatever reason (data loss, performance issues with tools due to the commit history, etc) that I wouldn't want to take this approach.

> amount of time new devs waste learning an impenetrable codebase, vs. the amount of time existing devs "waste" making the codebase more lucid by explaining their changes.

That's a false dichotomy. The codebase won't be impenetrable if there are appropriate comments in it. In my experience time would be better spent making the codebase more lucid in the source code than an external commit history. The commit messages should be good too but I only rely on them when something is impossible to understand without digging through and finding the associated ticket/motivation, which is a bad state to be in, so at that point a comment is added. Of course good commit messages are fine too, none of this precludes them.

funcDropShadow · on July 23, 2023

Agreed on most points, but a good SCM provides also the ability to bisect bugs and to show context that is hard to capture by explicit comments. E.g. what changed at the same time as some other piece of code. What was changed by the same person a few days before and after some line got introduced.

Regarding your first point:

> I've been in enough projects that changed version control systems

I have the impression that with the introduction of git, it became suddenly en-vogue to have tools to migrate history from one SCM to another. Therefore, I wouldn't settle on restarting from a snapshot anymore.

With git you can cut off history that is too old but weighs down the tools. You can simplify older history for example, while keeping newer history as it is. That is of course not easy but it can be done with git and some scripting.

lmm · on July 24, 2023

> I've been in enough projects that changed version control systems, had to restart the version control from a snapshot for whatever reason (data loss, performance issues with tools due to the commit history, etc) that I wouldn't want to take this approach.

When was that? I've never seen that in 15-20 years of software development; I've seen plenty of projects change VCS but they always had the option of preserving the history.

yuliyp · on July 22, 2023

Sure it's there, but having to wade through a large history of experiments tried and failed when trying to answer why this thing is here right now just feels subpar. Definitely sometimes it's helpful to read the commits which introduced a behavior, but that feels like a fallback when reading the code as it exists now. It works, but is much slower.

funcDropShadow · on July 23, 2023

> Sure it's there, but having to wade through a large history of experiments tried and failed when trying to answer why this thing is here right now just feels subpar

That is one the downsides of trunk-based developments. One keeps a history of all failed experiments, the usefulness of the commit history deteriorates. That is for reading commit messages as well as for bisecting bugs.

derefr · on July 22, 2023

Which is why I said this in my previous comment:

> In other words: don't think of it as "meticulously grooming a commit history"; instead, think of it as your actual job not being "developer", but rather, as you (and all your coworkers) being the writers and editors of a programming textbook about the process of building program X... which happens to compile to program X.

If you have to "wade through experiments" to read the commit history, that means that the commit history hasn't had a structural editing pass applied to it.

Again: think of your job as writing and editing a textbook on the process of writing your program. As such, the commit history is an entirely mutable object — and, in fact, the product.

Your job as editor of the commit-history is, like the job of an editor of a book, to rearrange the work (through rebasing) into a "narrative" that presents each new feature or aspect of the codebase as a single, well-documented, cohesive commit or sequence of commits.

(If you've ever read a programming book that presents itself as a Socratic dialogue — e.g. Uncle Bob's The Clean Coder — each feature should get its own chapter, and each commit its own discussion and reflected code change.)

Experiments? If they don't contribute to the "narrative" of the evolution of the codebase — helping you to understand what comes later — then get rid of them. If they do contribute, then keep them: you'll want to have read about them.

Features gradually introduced over hundreds of commits? Move the commits together so that the feature happens all at once; squash commits that can't be understood without one-another into single commits; break commits that can be understood as separate "steps" into separate commits.

After factoring hunks that should have been independent out into their own commits, squashing commits with their revert-commits, etc., your commit history, concatenated into a file, should literally be a readable literate-programming metaprogram that you read as a textbook, that when executed, generates your codebase. While also still serving as a commit history!

(Keeping in mind that you still have all the other things living immutably in your SCM — dead experiments in feature branches; a develop branch that immutably reflects the order things were merged in; etc. It's only the main branch that is groomed in this fashion. But this groomed main branch is also the base for new development branches. Which works because nobody is `git merge`ing to main. Like LKML, the output-artifact of a development branch should be a hand-groomed patchset.)

And, like I said, this is all strictly inferior to an approach that actually involves literate programming of a metaprogram of codebase migrations — because, by using git commit-history in this way, you're gaining a narrative view of your codebase, but you're losing the ability to use git commits to track the "process of editing the history of the process of developing the program." Whereas, if you are actually committing the narrative as the content of the commits, then the "process of editing the history" is tracked in the regular git commits of the repo — which themselves require no grooming for presentation.

But "literate programming of a metaprogram that generates the final codebase" can only work if you have editor support for live-generating+viewing the final codebase side-by-side with your edits to the metaprogram. Otherwise it's an impenetrably-thick layer of indirection — the same reason Aspect-Oriented Programming never took off as a paradigm. Whereas "grooming your commit history into a textbook" doesn't require any tooling that doesn't already exist, and can be done today, by any project willing to adopt contribution policies to make it tenable.

---

Or, to put this all another way:

Imagine there is an existing codebase in an SCM, and you're a technical writer trying to tell the story of the development of that codebase in textbook form.

Being technically-minded, you'd create a new git repo for the source code of your textbook — and then begin wading through the messy, un-groomed commit history of the original codebase, to refactor that "narrative" into one that can be clearly presented in textbook form. Your work on each chapter would become commits into your book's repo. When you find a new part of the story you want to tell, across several chapters, you'd make a feature branch in your book's repo to experiment with modifying the chapters to weave in mentions of this side-story. Etc.

Presuming you finish writing this textbook, and publish it, anyone being onboarded to the codebase itself would then be well-advised to first read your textbook, rather than trying to first read the codebase itself. (They wouldn't need to ever see the git history of your textbook, though; that's inside-baseball to them, only relevant to you and any co-editors.)

Now imagine that "writing the textbook that should be read to understand the code in place of reading the code itself" is part of the job of developing the program; that the same SCM repo is used to store both the codebase and this textbook; and that, in fact, the same textual content has to be developed by the same people under the constraints of solving both problems in a DRY fashion. How would you do it?

funcDropShadow · on July 23, 2023

> I never read the git log.

Imho, you are missing out on a great source of insight. When I want to understand some piece of code, I usually start navigating the git log from git blame. Even just one-line commit messages that refer to a ticket can help understanding tremendously. Even the output of git blame itself is helpful. You can see which lines changed together in which order. You see, which colleague to ask questions.

funcDropShadow · on July 23, 2023

One could accomplish a part of your vision, today, by splitting commits into smaller commits. Then you have the relation between hunks of changes and text in the commit message on a smaller level. Then you can use branches with a non-trivial commit message in the merge commit to document the whole set of commits.

As far as I know changes to the Linux kernel are usally submitted as a series of patches, i.e. a sequence of commits. I.e. a branch, although it is usually not represented as git branch while submitting.

jrs235 · on July 21, 2023

I disagree.

WHY: Using _______ sort because at time of writing code, the anticipated sets are ... and given this programming language and environment ... this tends to be more performant (or this code was easiest and quickest to write and understand).

This way when someone later come along and says WTF?! They know why or at least some of the original developers reasoning for choosing that code implementation.

ericbarrett · on July 20, 2023

Completely agree. I find this attitude toward (against?) comments is common in fields where code churn is the norm and time spent at a company rarely exceeds a year or two. When you have to work on something that lasts more than a few seasons, or are providing an API, comments are gold.

sedev · on July 20, 2023

Concurrence: the ability of the code to be self-documenting ends at the borders of the code itself. Factors outside of the code that impose requirements on the code must be documented more conventionally. The "sleep 10 seconds to (hopefully) ensure that a file has finished downloading" anecdote from upthread is a great example.

scubbo · on July 21, 2023

You may enjoy https://brooker.co.za/blog/2020/06/23/code.html

MrBuddyCasino · on July 21, 2023

I indeed enjoyed.

mostlylurks · on July 20, 2023

Self-documenting code is perfectly capable of expressing the "why" in addition to the "what". It's just that often the extra effort and/or complexity required to express the "why" through code is not worth it when a simple comment would suffice.

tetha · on July 20, 2023

> Self-documenting code is perfectly capable of expressing the "why" in addition to the "what".

I don't think anymore that's true, at least in a number of areas.

In another life, I've worked on concurrent data structures in java and/or lock-free java code I'd at this point call too arcane to ever write. The code ended up looking deceptively simple and it was the correct, minimal set of code to write. I don't see any way to express the correctness reasoning for these parts of code in code.

And now I'm dealing with configuration management. Code managing applications and systems. Some of these applications just resist any obvious or intuitive approach, and some others exhibit destructive or catastrophic behavior if approached with an intuitive approach. Again, how to do this in code? The working code is the smallest, most robust set of code I can setup to work around the madness. But why this is the least horrible way to approach this, I cannot express that in code. I can express this in comments.

scubbo · on July 21, 2023

This is a statement which is technically-true (so long as your language of choice has no length on names), but unhelpful since it does not apply in most practical cases.

gav · on July 20, 2023

I like to make sure the "why" is documented, but it's hard to get people to care about that.

I remember a former client tracking me down to ask about a bug that they had struggled to fix for months. There was a comment that I'd left 10 years earlier saying that while the logic was confusing, there was a good reason it was like that. Another developer had come along and commented out the line of code, leaving a comment saying that it was confusing!

overnight5349 · on July 20, 2023

Hah, I did one of these just last week. There's some sort of silicon bug or incorrect documentation that causes this lithium battery charger to read the charge current at half of what it should be. This could cause the battery to literally explode, so I left a big comment with lots of warnings explaining why I'm sending "incorrect" config values to the charge controller.

It's absolutely imperative that the next guy knows what the fuck I'm doing by tampering with safety limits.

jrs235 · on July 21, 2023

I like to add the date the why comment was added and the date the comment was last reviewed/veified as still true/neccesary (which will rarely differ because they are seldomly reviewed/re-verified).

COMMENT WRITTEN: 2023-03-21

COMMENT LAST REVIEWED/VERIFIED AS STILL TRUE: 2023-05-04

WHY THIS CODE: This sucks but ...

csours · on July 20, 2023

There was some inscrutable code that had sleep 10 [seconds] in the middle.

It took several hours to figure it out, but the sleep was there in case a file had not finished downloading

scubbo · on July 21, 2023

I see your "wait for a file to finish downloading", and raise you a "wait before responding because, for one of our clients, if the latency is below a certain threshold, it will assume that the response was a failure and will discard it". That was a fun codebase.

PlunderBunny · on July 21, 2023

Back in the 90s, we were trying to debug a crash on a customer’s site (using the then miraculous pcAnywhere). We couldn’t figure it out, and in desperation sent a developer that lived in the same country to observe ‘live’. As soon as he watched us reproduce it, he said “the modem was still disconnecting” - he could hear it, and we couldn’t. The solution, of course, was a sleep statement.

eschneider · on July 20, 2023

shudder

jasfi · on July 21, 2023

I like to write frequent comments, so that I can just scan the comments (only) of a function to know how and why it does what it does.

mike_hock · on July 20, 2023

The last part is also useful. Tells the next person where to look when they do run into scaling issues, and also tells them that there wasn't some other reason to do it.

Karellen · on July 20, 2023

> But often enough when I find a root cause of a bug, or some sort of programmed limitation, the client wants removed. I always refuse until I can find out why that code exists. Nobody puts code in there for no reason, so I need to know why we have a timer, or an override in the first place.

Isn't that just the regular Chesterton's Fence argument though?

The one the article is specifically written to point out is not enough by itself, because you need to know what else has been built with the assumption that that code is there?

TheCapn · on July 20, 2023

All my comment is adding a software anecdote to the story. It really is just regular Chesterton's Fence, a term I've never heard until now but dealt with for the last several years.

You're not wrong, but in the context of a PLC controlling a motor or gate it is far more segregated than the code you're probably thinking of. Having a timer override on a single gate's position limit sensor would have no effect on a separate sensor/gate/motor.

If the gate's function block had specific code built into it that affected all gates then what you're talking about would be more applicable.

bombcar · on July 20, 2023

I think they’re thinking of things like that software control for the health machine that could get in a state where it would give a lethal dose to a patient.

letitbeirie · on July 20, 2023

The most haunting comment line I've ever seen was buried deep in an Allen Bradley PLC:

> I don't know why this rung is needed but delete it and see what happens for yourself

Did not fuck around; did not find out.

mauvehaus · on July 20, 2023

Context for those who haven't worked in the field: A PLC is a programmable logic controller. They are typically programmed with ladder logic which grew out of discrete relay based control systems.

Generally they're controlling industrial equipment of some sort, and making changes without a thorough understanding of what's happening now and how your change will affect the equipment and process is frowned upon.

https://en.wikipedia.org/wiki/Ladder_logic

mhink · on July 20, 2023

I interned briefly at a company which mainly built industrial control systems. One of its most interesting features (which is also very mind-bending if you're coming from any sort of typical programming ecosystem) is that every "rung" is evaluated in parallel. (As a physical relay-based control system would have back in the day.)

Very wild stuff.

praptak · on July 20, 2023

This has the "more magic" anecdote feel to it.

http://www.catb.org/jargon/html/magic-story.html

w0mbat · on July 20, 2023

I remember reading a story like this from the early days of Acorn. The first production sample of the BBC Micro came in, and would crash unexpectedly. Trial and error found that connecting a jumper wire between 2 particular points on the board stopped it crashing, but nobody could work out why it crashed or how that fixed it. They never worked it out and ended up shipping mass quantities of the BBC Micro with the magic jumper wire in place on each one.

sumtechguy · on July 20, 2023

but now its like the number of licks to the center of tootsie pop the world may never know!

weaksauce · on July 20, 2023

do you remember what the instruction was?

letitbeirie · on July 21, 2023

I think it was something like

|---]/[---[ONS]---[MOV]---|

ke88y · on July 20, 2023

> I do a lot of support work for Control Systems. It isn't unheard to find a chunk of PLC code that treats some sort of physical equipment in a unique way that unintentionally creates problems. I like to parrot a line I heard elsewhere: "Every time Software to fix a [Electrical/Mechanical] problem, a Gremlin is born".

At least some of this is cultural. EEs and MEs have historically viewed software less seriously than electrical and mechanical systems. As a result, engineering cultures dominated by EEs/MEs tend to produce shit code. Inexcusably incompetent software engineering remains common among ostensibly Professional Engineers.

TheCapn · on July 20, 2023

You're not wrong. It shows in the state of PLC/HMI Development tools. Even simple things like Revision Control is decades behind in some cases.

I've basically found my niche in the industry as a Software Engineer though I can't say I see myself staying in the industry much longer. The amount of time's I've gotten my hands on code published by my EE coworkers only to rewrite it to work 10x faster at half the size with less bugs? Yikes. HMI/PLC work is almost like working in C++ at times, there's so many potential pitfalls for people that don't really understand the entire system, but the mentality by EE/ME types in the industry is to treat the software as a second class citizen.

Even the clients treat their IT/OT systems that way. A production mill has super strict service intervals with very defined procedures to make sure there is NO downtime to production. But get the very same management team to build a redundant SCADA server? Or even have them schedule regular reboots? God no.

dmvdoug · on July 20, 2023

I have no clue, but I wonder how unique this attitude turns out to be among MEs/EEs versus, say, everyone dealing with electronics. Because the stories and complaints about mechanical and electrical engineers treating code (and programmers) as second-class, remind me very much of how front-end people complain about how they’re perceived by back-end people. And about how backend people complain about how they’re perceived by systems programming people. And so on up/down the line(s) of abstraction.

weaksauce · on July 20, 2023

> It shows in the state of PLC/HMI Development tools.

i mean you are talking about upgrading things that are going to be in service for decades perhaps. the requirements for the programs is generally not complicated. turn on a pump for a time, weigh something and then alarm if some sensor doesn't see something.

Structured text was an improvement over ladder logic as you could fit more of the particular program in the screen real estate you had and could edit it easier since it was just text. though, that had its own set of issues that needed to be worked through and it wasn't a panacea.

liminalsunset · on July 20, 2023

I think one of the reasons I've seen this happen is because typically, EE and ME programs in university teach very little CS "enough to be dangerous", and the few coding projects you are required to do are often taught in a way that downplays the importance of the software. Software is often seen as simply a translation or manifestation of a classical mathematical model or control system ( or even directly generated by Matlab/Simulink).

Software, being less familiar, is not viewed as a fundamental architectural component because there often isn't sufficient understanding of the structure or nuance involved in building it. In my experiences software or firmware engineers tend to be isolated from the people who designed the physical systems, and a lot of meaning is lost between the two teams because the software side does not understand the limitations and principles of the hardware and the hardware team does not understand the capabilities and limitations of the software.

ke88y · on July 20, 2023

The worst part is there's no particular reason for this -- infusing proper software development best practices into existing EE/ME coursework isn't that hard.

It's an instance of the larger pattern in which technical degree programs lag industry requirements by decades, as older faculty ossify at the state of the art circa 2-3 years prior to when they received tenure.

IMO one way to help would be to get rid of the entire notion of a "Professor".

Instead, courses should be taught primarily by a combination of professional instructors on permanent contracts and teacher-practitioners supported by the instructors. The instructors should have occasional sabbaticals for the professional instructors to embed in firms and ensure they're up to date on the industry.

The research side of the university can even more easily replace Professors and tenure with first-line lab managers on 3-5 year contracts whose job is simply to apply for grants and run labs, and who can teach if they want but are held to the same standards as any other applicant for an ad junct teaching position in any particular term.

liminalsunset · on July 20, 2023

I definitely think there are two sides to this. The school I went to had a lot of professors for whom the "ossified 2 years prior to tenure" thing was true for, but I also found them to be helpful for teaching fundamental concepts that didn't change in an effective way.

I think one barrier to better engineering programs in universities is that there typically is an onerous set of "accreditation requirements" which prevents significant modification of the curriculum to adapt to modern needs.

The other barrier is that students culturally appear to not always want to do more coding than needed. Courses involving coding were widely regarded as the most difficult by the people around me, despite something like up to 80% of an EE class going into SW engineering after graduating.

I think in general, degree programs are designed to be something that they're not used for anymore often. The usual line is that they're designed to provide a track to academia, and aren't vocational training. But nowadays degrees seem very ritualistic and ornamental - it seems that people are doing their learning on the job mostly whatever they do, and the relevance of the degree itself is just a shibboleth of some sort.

0xffff2 · on July 20, 2023

> I think one barrier to better engineering programs in universities is that there typically is an onerous set of "accreditation requirements" which prevents significant modification of the curriculum to adapt to modern needs.

This seems to be rapidly dissolving, at least in California. Several schools including Stanford, Cal Tech, and several of the UCs have dropped ABET accreditation for most of their programs in recent years, with more likely to follow as they come up for renewal.

overnight5349 · on July 20, 2023

Why should EEs also be software engineers? These are two distinct specializations.

No sane person would expect a programmer to just design a lithium battery charge circuit that goes in your user's pocket, that'd be reckless and dangerous. I likewise would never expect a programmer to break out an oscilloscope to debug a clock glitch, or characterize a power supply, or debug protocol communication one bit at a time. I wouldn't ask a programmer to do FCC pre-validation, or to even know what that means.

Why then do you want to rely on an EE to produce critical software?

As an EE, I know my limits and how to separate concerns. I keep my bad C++ restricted to firmware and I simply do not write code further up the stack. We have programmers for that. Where the app needs to talk to the device, I'll write an example implementation, document it, and pass it off to the programmer in charge. It's their job to write good code, my job is to keep the product from catching fire.

If you want good code, hire a programmer. If you want pretty firmware, hire a programmer to work with your EEs. If you expect an EE to write code, you get EE code because their specialization is in electronics.

Unless you really want an EE who is also a software engineer, but then you're paying someone very highly skilled to do two jobs at once.

Electronics and software are two different jobs requiring two differently specialized people. It just looks like they overlap.

bee_rider · on July 20, 2023

I think it is the result of how these things tend to be taught. At least in my school, all the EE’s and all the computer engineering students had the same first couple programming classes.

Lots of EE’s need to do some programming, and lots of people getting EE degrees end up in programming careers, so it would be a disservice not to teach them any programming at all. In particular, an engineer should be taught enough programming to put together a matlab or Numpy program, right?

Meanwhile, some of their classmates will go on to program microcontrollers as their job.

Writing programs and a product, and writing programs to help design a product, are two basically different types of jobs that happen to use similar tools, so it isn’t that surprising that people mix them up.

TheCapn · on July 24, 2023

This is a discussion that is much larger than what's available in a comment section like this. But I agree with you wholeheartedly.

I think part of the thing is Software Engineers haven't been a thing for as long in the industry. I'm the only Software Engineer I've met doing controls. My supervisor has a CS degree and an Electrical Technician diploma, but never another SE.

Second is I think up until recently, the work done by Control Systems has been what's capable of an EE or ME so having a SE hasn't been necessary. I've been with my company for 10 years now, and in that time I've watched the evolution of what my clients are seeking in terms of requirements to their systems.

I primarily work in Agriculture or Food Production. 10 years ago my projects were assembling plants and getting their motors to start, with the required protections then some rudimentary automation to align paths and ensure flow.

Today? I'm building traceability systems to report on exactly which truck load was the origin of contamination for a railcar shipped months later. Or integrating production data to ERP systems. Adding MES capabilities to track downtimes and optimize production usage. Generating tools to do root cause analysis on events... It's a different world and the skills of a Software Engineer haven't really been a super important role for quite a while.

I think the mindset is shifting, but it is slow.

hef19898 · on July 20, 2023

On the other hand there was Juicero...

ke88y · on July 20, 2023

I don't follow. Did you accidentally reply to the wrong comment?

hef19898 · on July 20, 2023

No, why? The point was EEs and MEs, or rather traditional engineering heavy culture produces bad software (never mind the first software devs tended to be EEs), so Juicero is good example of a software-leaning culture producing shitty hardware products.

overnight5349 · on July 20, 2023

Uh, the Juicero was spectacular hardware. The mechanical engineering in that beast was absolutely beautiful.

I don't recall what the software was like, but none of that is why it failed, it was simply a moronic business idea. An overpriced subscription for low quality fruit in a DRM-laden pouch. Nobody wanted it then or now.

ndiddy · on July 20, 2023

The Juicero was terrible hardware in the sense that they could have made a product that was functionally similar but cost far less to make. It seems like they got a hardware engineering team straight out of college and gave them no constraints or budget. You have a giant CNC'd aluminum frame, a custom gearbox, a custom power supply, a custom drive motor, etc. All of this is only necessary because they decided to squeeze the juice out of the bags by applying thousands of pounds of force to the entire surface of the bag at once vs. using a roller or something. They were likely losing hundreds of dollars per unit even when they were selling the press for $600.

Dylan16807 · on July 21, 2023

Squeezing the entire bag was the selling point. It's not negotiable. The important question is how much money they could have saved without changing that aspect.

sam_bristow · on July 20, 2023

I kinda want a Juicero now, but only as an object d'Art. It was a garbage product.

causi · on July 20, 2023

I'll never do PLC work again. Forget undocumented code, most of the time there's no schematics for the hardware you're working on because it was custom-built thirty years ago.

TheCapn · on July 20, 2023

My company is generally good about that. We have lots of overlapping documentation that answers questions like that in different ways. From Electrical schemas to QA docs, picture archives of panels and wiring, ticketing systems, spreadsheets over I/O, etc. etc.

I hate PLC work for other reasons. I'm starting to look at going back to more traditional software role. I'm a bit tired of the road work and find the compensation for the amount asked of you to be drastically underwhelming. This meme is very much relevant:

https://i.redd.it/rawo5uki1v9b1.jpg

causi · on July 20, 2023

Ayep. When you're the PLC guy they expect you to know how to fix anything that plugs into a wall but isn't understood by the electricians.

smallpipe · on July 20, 2023

> I suppose the other aspect is knowing that I trust my coworkers. They don't (typically) do something for no good reason

This so much. Depending on the git blame, I'll either remove it blindly or actually think about it way more.

glonq · on July 20, 2023

> "Every time Software is used to fix a [Electrical/Mechanical] problem, a Gremlin is born"

Early in my career, I was confused by seemingly-crazy questions in the Hacker Test (https://www-users.york.ac.uk/~ss44/joke/hacker.htm) like...

> 0133 Ever fix a hardware problem in software?

> 0134 ... Vice versa?

But after spending years developing embedded systems, I don't even blink at such questions. Yes, of course I have committed such necessarily evils!

hinkley · on July 20, 2023

> Nobody puts code in there for no reason, so I need to know why we have a timer, or an override in the first place.

I would like to think that if I sent out an email about git hygiene that you would support me against the people who don’t understand why I get grumpy at them for commits that are fifty times as long as the commit message, and mix four concerns two of which aren’t mentioned at all.

Git history is useless until you need it, and then it’s priceless.

I can’t always tell what I meant by a block of code I wrote two years ago, let alone what you meant by one you wrote five years ago.

scubbo · on July 21, 2023

> commits that are fifty times as long as the commit message

One of my proudest commits had a 1:30 commit:message length ratio. The change may have only been ~3 lines, but boy was there a lot of knowledge represented there!

hinkley · on July 21, 2023

Been there, got the tshirt. The one liners can be the worst. Any time the comment has a ticket url for another product it tends to get juicy.

kvmet · on July 21, 2023

I work with control systems and have a similar mantra: "You can't overcome physics with software." It's super common to have someone ask if a mechanical/human/electrical/process issue can be fixed with software because people believe that programming time is free. Sometimes it's not even that it's impossible to do in software, but adding unnecessary complexity almost always backfires and you'll wind up fixing it the right way anyway in the end.

woleium · on July 24, 2023

Chesterton's Fence is a principle that says change should not be made until the reasoning behind the current state of affairs is understood. It says the rash move, upon coming across a fence, would be to tear it down without understanding why it was put up.

throw9away6 · on July 20, 2023

Sometimes you have to remove the chunk and see what breaks because the person that put it in is long gone and documentation is missing

cortesoft · on July 20, 2023

This sounds more like the original Chesterton’s fence than what the article is describing. The article is about understanding something’s actual current purpose, rather than just the intended purpose.

What the article is describing reminds me of the XKCD comic workflow: https://xkcd.com/1172/

A system exists external to the creators original purpose, and can take on purposes that were never intended but naturally evolve. It isn’t enough to say “well that is not in the spec”, because that doesn’t change reality.

dacox · on July 21, 2023

The unspoken thing here is that PLC code often(usually?) isn't exactly written in text, or in a format readable by anything other than the PLC programming software.

After a year long foray into the world of PLC, I felt like I was programming in the dark ages.

I'm assuming its a bit better at very big plants/operations, but still.

alex-robbins · on July 20, 2023

> "Every time Software is used to fix a [Electrical/Mechanical] problem, a Gremlin is born"

I'm definitely going to use this, and I think there's a more general statement: "Every time software is used to fix a problem in a lower layer (which may also be software), a gremlin is born."

xen2xen1 · on July 20, 2023

cough 737 Max cough

peteradio · on July 20, 2023

> "Every time Software to fix a [Electrical/Mechanical] problem, a Gremlin is born".

I think I get the gist, but that sentence is missing some words.

TheCapn · on July 20, 2023

oooboy is it. That's what I get for not proofreading.

Izkata · on July 20, 2023

> "Every time Software is used to fix a [Electrical/Mechanical] problem, a Gremlin is born"

Gizmo caca.

(...I just watched both Gremlins movies last weekend...)

HeyLaughingBoy · on July 20, 2023

> "Every time Software is used to fix a [Electrical/Mechanical] problem, a Gremlin is born".

Just did one of those this morning. Hmmm.

irisgrunn · on July 20, 2023

I had something similar. A couple of years ago I bought on old house. The previous owners did most of the work themselves from the 1960s on.

The zinc gutter had leaked for probably decades and it destroyed part of the roof structure. The roof was held up by the wooden paneling the used to cover it on the inside (70s). So the wooden paneling was actually load bearing

Actually I've found way more stuff in this house. For example at the end of the roof the tiles weren't wide enough and instead of buying extra roofing tiles, they decided to fill it with cement and pieces of ceramic flower pots.

hinkley · on July 20, 2023

If you go back to the early 1900s, the cladding was installed on a diagonal, which helped greatly in preventing the building from racking. Now we rely on sheet rock and plywood to provide that protection in earthquakes and wind storms.

If you ever see a house stripped down to the sticks for a rebuild, you will hopefully notice a few braces added. Not to keep the walls from falling down, but to keep them square and true until the walls are rebuilt.

jefftk · on July 20, 2023

For what it's worth, in the Boston area traditional sheathing is boards parallel to the ground, not diagonal.

(I've seen dozens of houses from ~1900-1920 in the process of full gut rebuilds, and none of them had diagonally installed sheathing.)

hinkley · on July 20, 2023

It might be more of a commercial building thing than a domestic construction one. I know I've seen it in a number of videos of renovations of larger buildings, including barns. I might have the time range off. Big cities are full of 1920's constructions especially on the West Coast, and they don't do that.

My old house (~1920) had diagonal shiplap under the floors instead of plywood (but parallel in the oldest walls). That's probably more for making hardwood floors easier to install than structural integrity.

Edit: The internet says 'start of the 20th century' phased out in 1950's (plywood), and 'sometimes diagonally'.

jefftk · on July 20, 2023

The subfloors I've seen for old houses in Boston are also laid perpendicular to the joists, though diagonal does seem like it would make more sense.

hinkley · on July 20, 2023

Laying a board over a seam in the subfloor sounds like a titanic pain in the ass. Diagonal means you might have to move the nail a little. Probably also limits the amount of dust and water that passes through from floor to ceiling.

UncleEntity · on July 20, 2023

Yep, sounds like my garage.

At first look it seem like someone backed into the garage door and mangled the hell out of it but on more careful inspection the roof is being barely held up by the tracks that the door runs in and is pretty near to giving up the ghost. Was just going to splice the ends of the rafters (like someone did on the other side who knows how many years ago...if it works, it works) and replace the garage door but now its looking like I'll need a whole new roof.

What really worries me is the dodgy wiring strung all across the basement which is a combination of newish wires, old cloth covered wires and liberal applications of electrical tape to splice it all together. Luckily none of the wires seem to be load bearing...

jedberg · on July 20, 2023

I had to do a garage roof a few years ago. The previous owner thought it would be a good idea to put a hole in the main crossbeam to attach the garage door opener. As soon as I bought the house I removed the garage door opener and put metal plates on both side of the hole bolted together.

My "fix" held for about 11 years, but apparently it very slowly weakened, creating a small divot on the roof. Which got bigger and bigger with each rain, but since I never go on the garage roof, I didn't notice.

Until during one heavy rain I got a surprise skylight!

So yeah, you probably want to fix that before you get a total collapse like I did.

hinkley · on July 20, 2023

The thing I learned when we fixed our old house up for sale, is that in some municipalities you need a building permit to replace a floor that already exists with a new one. In our case water damage was from a claw foot tub drain, which I had fixed but years after the problem started, so it was in the middle of the floor. The contractor ripped it down to the joists and doubled them, which does not it turns out require a permit.

Sounds like your guy had a similar experience.

bombcar · on July 20, 2023

Carefully reading the NEC indicates that loads of electrical tape is permitted way more often than you’d think it should …

dmbche · on July 20, 2023

Degrees away from load bearing paint!

hef19898 · on July 20, 2023

Or a poster!

bcrosby95 · on July 20, 2023

I had a house that had really bad termite damage and the contractor called it "structural stucco".

hinkley · on July 20, 2023

PSA: houses don't 'get termites'. They get water damage, and then the water damage gets termites.

Most problems with houses come back to managing water, air, or some other infiltration. But mostly it's water.

jwrallie · on July 21, 2023

Maybe in general, but there are particular species of termites (drywood termites) that can derive their water from the wood they consume.

w10-1 · on July 20, 2023

This article and every other comment seems to miss the real issue: the lack of testing.

Software differs from all other means of production in that we can in fact test any change we make before realizing it in the world.

With good tests, I don't care what the intent was, or whether this feature has grown new uses or or new users. I "fix" it and run the tests, and they indicate whether the fix is good.

With good tests, there's no need for software archeology, the grizzled old veteran who knows every crack, the new wunderkind who can model complex systems in her brain, the comprehensive requirements documentation, or the tentative deploy systems that force user sub-populations to act as lab rats.

Indeed, with good tests, I could randomly change the system and stop when I get improvements (exactly how Google reports AI "developed" improvements to sorting).

And yet, test developers are paid half or less, test departments are relatively small, QA is put on a fixed and limited schedule, and no tech hero ever rose up through QA. Because it's derivative and reactive?

jefftk · on July 20, 2023

> With good tests, I don't care what the intent was, or whether this feature has grown new uses or or new users. I "fix" it and run the tests, and they indicate whether the fix is good.

Except when the tests verify what the code was designed to do, but other systems have grown dependencies on what the code actually does.

Or when you're removing unused code and its associated tests, but it turns out the code is still used.

Or when your change fails tests, but only because the tests are brittle so you fix the test to match the new situation. Except it turns out something had grown a dependency on the old behavior.

Tests are great, and for sufficiently self contained systems they can be all you need. In larger systems, though, sometimes you also need telemetry and/or staged rollouts.

conor- · on July 21, 2023

> Except when the tests verify what the code was designed to do, but other systems have grown dependencies on what the code actually does.

Assuming you mean systems in terms of actually separate systems communicating via some type of messaging, isn't that where strong enforcement of a contract comes into play so that downstream doesn't have to care about what the code actually does as long as it produces a correct message?

> Or when your change fails tests, but only because the tests are brittle so you fix the test to match the new situation. Except it turns out something had grown a dependency on the old behavior.

I think this supports GPs point about tests being second-class and not receiving the same level of upkeep or care that the application code receives and one can argue that you should try to prevent being in a position where tests become so brittle you end up with transient dependencies on behavior that was not clear.

makeitdouble · on July 21, 2023

To extend on jefftk's point, the tests have a specific scope (what the code is responsible for), which probably doesn't cover what it is now expected to do after months/years of getting used.

Your code was supposed to calculate the VAT for a list of purchases, but progressively becomes a way to also force update the VAT cache for each product category and will be called in originally unexpected contexts.

BTW this is the same for comments: they'll cover the original intent and side effects, but not what the method/class is used for or actually does way down the line. In a perfect world these comments are updated as the world around them changes, but in practice that's seldom done except if the internal code changes as well.

fardo · on July 20, 2023

> the lack of testing

Having worked on very long-running projects, testing or well-funded QA doesn’t tend to save you from organizational issues.

What typically seems to happen is tests rot, as tests often seem to have a shelf life. Eventually some subset of the tests start to die - usually because of some combination of dependency issues, API expectations changes, security updates, account and credential expirations, machine endpoint and state updates, and so on - and because the results from the test no longer indicate correctness of the program, and the marginal business value of fixing one individual broken test is typically very low, they often either get shut off entirely, or are “forced to pass” even if they ought to be an error.

Repeat for a decade or two and there quickly start being “the tests we actually trust”, and “the tests we’re too busy to actually fix or clean up.”

Which tests are the good ones and bad ones quickly become tribal knowledge that gets lost with job and role changes, and at some point, the mass of “tests that are lying that they’re working” and “tests we no longer care to figure out if they’re telling the truth that they’re failing” itself becomes accidentally load-bearing.

joe_the_user · on July 20, 2023

And yet, test developers are paid half or less, test departments are relatively small...

Huh? Your first part seemed to be repeating TDD optimism but then you switch test departments. Just make your claims consistent, I'd suggest you instead talk about tests being written by the programmers, kept with the code and automatically run with the build process.

However, I don't think even TDD done right can replace good design and good practices. Notably, even very simple specifications can't be replaced by tests; if f(S) just specified to spit out a string concatenated with itself, there's not obvious test treating f as a black box that verifies that f is correct. Formal specifications matter, etc. You can spot check this but if the situation is one magic wrong value screws you in some way, then your test won't show this.

there's no need for software archeology, the grizzled old veteran who knows every crack, the new wunderkind who can model complex systems in her brain, the comprehensive requirements documentation, or the tentative deploy systems that force user sub-populations to act as lab rats.

Wow, sure, all that stuff can be parodied but it's all a response to software being hard. And software is hard, sorry.

perrygeo · on July 20, 2023

test developers? test departments? QA teams? It's worth mentioning that the vast majority of software orgs don't have the luxury of such distinctions, i.e. software teams are generally directly responsible for the quality of their own work and can't punt problems across the org chart.

hinkley · on July 20, 2023

Probably the most shocking revelation I've had in my time as a developer has been that testers are never paid the correct salary (some too high, most too low), and that if you have someone who is good at testing and is becoming proficient at coding, their smartest move is not to become a FT developer, but to skip right over developing and go into security consulting. Instead of making 75% more money by switching careers, you can make 150% more.

Software needs people with the suspicious minds of good testers, but security people make more money for the same skillset.

mucle6 · on July 20, 2023

Chesterton's Test

pickledish · on July 20, 2023

This (the unimportant stud becoming load bearing later on) makes sense, but in my experience it’s kind of a sign of lazy design. When making software at least, you know when you’re trying to use a decorative stud to hold up part of your house, and choosing to do it anyway instead of building some new better structure does make for a pretty sad dev team later on

This is to say —- I agree with the article, but much nicer is to work at a place where you don’t expect to make this particular discovery very often, hah

bgribble · on July 20, 2023

> you know when you’re trying to use a decorative stud to hold up part of your house

"You" might know it in "your" creations, but in my career I am much more often working and reworking in other people's creations.

I think the point of the article is not that you should avoid using decorative studs as load-bearing elements, but that you should be aware that others may have done so before you came along.

This is an even more conservative position than the default Chesterton's Fence reading, which is itself dismissed by a lot of people as pedantically restrictive.

For me, the parent article resonates. I have definitely had ceilings come crashing down on my head when I removed a piece of "ornamental" trim (programmatically speaking)

jerf · on July 20, 2023

"This is an even more conservative position than the default Chesterton's Fence reading, which is itself dismissed by a lot of people as pedantically restrictive."

In a normal, real-life context, I can see why someone would feel that way.

In a software engineering context I think it's just a further emphasis that you ought to understand what something is doing before fiddling with it, and both the original intent and what it is currently doing are interesting information. Many times I've removed dead code, only to learn that not only was it alive (which wouldn't have been that surprising, it's easy to miss some little thing), but that it is very alive and actually a critical component of the system through some aspect I didn't even know existed, which means I badly screwed up my analysis.

The differences between the physical world and the software world greatly amplify the value of the Chesterton's Fence metric. In the physical world we can all use our eyes and clearly see what is going on, and while maybe there's still a hidden reason for the fence that doesn't leap to mind, it's still a 3-dimensional physical problem in the end. Software is so much more interconnected and has so much less regard for physical dimensions that it is much easier to miss relationships at a casual glance. Fortunately if we try, it's actually easier to create a complete understanding of what a given piece of code does, but it is something we have to try at... the "default view" we tend to end up with of software leaves a lot more places for things to hide on us. We must systematically examine the situation with intention. If you don't have time, desire, or ability to do that, the Chesterton's Fence heuristic is more important.

We're also more prone to drown under Chesterton's fences if we're not careful, though. I've been in code bases where everyone is terrified to touch everything because it seems like everything depends on everything and the slightest change breaks everything. We have to be careful not to overuse the heuristic too. Software engineering is hard.

wpietri · on July 20, 2023

In software, I would strongly disagree with this:

> you ought to understand what something is doing before fiddling with it

I think understanding before fiddling is one option. But I think a better option is often fiddling and seeing what happens. The trick is to make it so that fiddling is safe. E.g., on a project where I have good test coverage and find mystery code, I can just delete it and see what fails. Or I set up a big smoke test that runs a bunch of input and compares outputs to see what changes.

A lot of bad software is effectively incoherent, so it can't be understood as machinery. Instead it has to be understood in psychological, historical, or archaeological terms. "Well back then they were trying to achieve X, and the programmer doing it was not very experienced and was really interested in trendy approach Y so he tried using library Z, but it wasn't really suited for the problem at hand, so he misused it pretty severely."

That can be interesting, but it's often much more efficient to say, "Who cares how or why this got to be such a tangled mess. Let's solve for the actual current needs."

jefftk · on July 20, 2023

> But I think a better option is often fiddling and seeing what happens. The trick is to make it so that fiddling is safe. E.g., on a project where I have good test coverage and find mystery code, I can just delete it and see what fails. Or I set up a big smoke test that runs a bunch of input and compares outputs to see what changes.

There was a good discussion of this in the comments, starting at https://www.jefftk.com/p/accidentally-load-bearing#fb-109168...

In addition to tests, you can also add logging to your running system, or make the change behind a flag that you A/B test in production.

bombcar · on July 20, 2023

The problem “fiddle and see if Rome burns” is that you may have broken something that only runs once a year/decade and you won’t know if that process isn’t in the list of tests.

And by the time it breaks, will anyone remember the probable cause?

wpietri · on July 21, 2023

If people are building things that only run once a year or once a decade and haven't included sufficient tests to make sure those things work and keep working, then the failure lies with them, not with the people who work on the system later. The real "probable cause" isn't some later change. It's the initial negligent construction.

jerf · on July 20, 2023

"But I think a better option is often fiddling and seeing what happens."

If you're safely fiddling with it, I would consider that part of the process of understanding. I'm particularly prioritizing understanding what is actually doing. Historical context is primarily useful because it will point you in the direction of what else you may need to look at; when I fail to realize that removing X broke Y it's because I didn't realize that there's a reason those are together.

wpietri · on July 20, 2023

I agree, but what I'm talking about is distinct from a usual "process of understanding", because the goal is explicitly remain as ignorant as possible. In many cases I want to learn the exact minimum needed to safely make a change. What were they thinking? What motivated them? Don't know, don't care. I just want to clean up the mess and move on to something actually useful and pleasant.

As a practical example, last year I was supposed to figure out why a data collection system was not getting all the data we wanted. The person who worked on it was long gone, so I looked at the code. It was a bunch of stuff using Apache Beam. No tests, no docs. The original author was not a very experienced programmer. It jumbled together a variety of concerns. And after a day of trying to understand, it became obvious to me that some of the uses of Beam were unconventional. Plus Beam itself is its own little world.

The next day, I said "fuck that", and wrote a very simple Python requests-based collector. I pretty quickly had something that was both much simpler and much more effective at actually getting the data. And from there I went into the usual sort of feedback-driven discovery process to figure out what the current users actually needed, with zero regard for original intent of the system was in solving problems for people no longer present.

What was the thing doing? Why did it to it? How did it end up that way? All excellent mysteries that will remain mysteries, as I eventually deleted all the Beam-related code and removed its scheduled jobs. For me, this ignorance was truly bliss. And for others too in that the users got their needs solved with less work, and that developers had something much cleaner and clearer to work with in the future.

marcosdumay · on July 20, 2023

> But I think a better option is often fiddling and seeing what happens.

Yeah, as always, IMMV.

But I do agree that online discourse puts too much emphasis on statically analyzing systems, and too little on adding instrumentation or just breaking it and seeing what happens.

At the same time, my experience is that on practice people put too much emphasis on instrumentation or just breaking it and seeing what happens, and way too little on statically analyzing the system.

dkarl · on July 20, 2023

> in my experience it’s kind of a sign of lazy design

Is it always lazy in the bad way, though? In software there's no sharp distinction between "built to carry weight" and "built to tack drywall onto." Whether a system is robust or dangerously unscalable depends on the context. You can always do the thought experiment, "What if our sales team doubled in size and then sold and onboarded customers as fast as they could until we had 100% of the market," and maybe using your database as a message queue is fine for that.

If it results in a sad dev team, then that's a case where it was a mistake. It's hard to maintain, or it's an operational nightmare. That isn't the inevitable result of using a (software) decorative stud as a (software) load-bearing element, though. There are a lot of systems happily and invisibly doing jobs they weren't designed for, saving months of work building a "proper" solution.

im3w1l · on July 20, 2023

Let's say you code defensively. You add some handling invalid input to your function. Because the rest of your codebase never sends it invalid input it's dead code - not load bearing. Until at some point a bug is introduced and sends you invalid input which is then dutifully handled and recovered from. The branch has become load bearing.

andrewaylett · on July 20, 2023

I've been happily running a service that's non-critical, only to discover when we have an outage (that should be a non-event) that another team has started relying on it for something business critical.

jefftk · on July 20, 2023

This was famously a problem for Google's distributed lock service, Chubby. They handled it by intentionally having outages to flush out ways it might have started to bear loads it wasn't designed for: https://sre.google/sre-book/service-level-objectives/#xref_r...

throwawaymobule · on July 22, 2023

I'm a fan of the 'chaos monkey' (Netflix software) approach of this.

Can't expect your platform to be reliable, if it just breaks at random.

herpdyderp · on July 20, 2023

> you know when

More often than not I've seen this happen because they, in fact, do not know.

cesaref · on July 20, 2023

Well, let's say you are working on a well engineered and tested product, and you look at the code coverage, and there's a whole lump of functionality that has no coverage.

You could conclude that the code is unnecessary and remove it, or you could conclude that some test cases need to be added to exercise it. How do you decide which is correct?

The problem is usually that well thought and and designed software was build for a moving target, and invariably things have changed over time. It's not necessarily a sign of lazy design, it's where the real world intersects with the nice neat pretend world we design for :)

convolvatron · on July 20, 2023

I'm deeply frustrated at the missing answer here. read the code. figure out what's it for. take out it and see what breaks. software is not a house, you can completely wreck it and set in back exactly the way it was in a second.

there is no excuse for not owning and knowing the software you are supposed to be in control of.

cesaref · on July 20, 2023

Well, if it isn't covered, then you can take it out, and see what breaks (nothing) and then discover in production why it was there :)

But yes, it's basically what the job comes down to - having a strategy for managing complexity in all it's forms, and this is a fine example of the sort of problem that you don't learn in college.

I've (thankfully) never deprecated code and caused serious production issues, but i've seen it happen. The best places to work expect this sort of issue, and have process in place to roll back and deal with it, like any other business continuity issue (e.g. power/network loss).

The moment you find yourself scared of changing code because you don't understand the consequences then you've basically lost the battle.

convolvatron · on July 21, 2023

hopefully there is some other place to try your code than in production. that gives you the agency to say "lets just take it out and see what happens".

bjornasm · on July 20, 2023

>When making software at least, you know when you’re trying to use a decorative stud to hold up part of your house

Do you? Sometimes quick-one-time fixes becomes the center of important software.

dylan604 · on July 20, 2023

>When making software at least, you know when you’re trying to use a decorative stud to hold up part of your house, and choosing to do it anyway instead

Whenever I find myself doing this, I at least leave a comment typically worded along the lines of "the dev is too lazy, not enough time to do it right, or just has no clue what to do, so here you go..."

Qwertious · on July 20, 2023

s/lazy/lack of/

Hidden files in unix were a bug.

troupo · on July 20, 2023

And then .DS_Store in MacOS was basically a bug, too. A lack of understanding coupled with a lack of design.

lambda · on July 20, 2023

My favorite "accidentally load bearing" artifact that I've found was a misconfigured sudo that would easily allow arbitrary code execution as root (it allowed passwordless sudo for a find command, which could be used with -exec), which a number of important support scripts for the product had been written to use. Load bearing privilege escalation!

duck · on July 20, 2023

We remodeled our kitchen several years ago. In the old kitchen there was a large beam running across one end, added during some other remodel before we owned the house to add a second floor, but since we wanted to expand on the kitchen this really needed to go. Upon tearing into the ceiling, it was found this beam was ~2 feet to the right of where it should of been to support the wall above. They fixed it and moved the beam into the wall upstairs and it all worked out, but when I asked about the original location the builder basically said: "you had one person that wanted to do the right thing, and one person that didn't care, and quality always ends up with the lowest setting".

ochoseis · on July 20, 2023

One benefit software has over physical systems is that you can easily document it right inside the code with comments and types to make the intention clearer. This isn’t foolproof, especially in dynamic languages like Python, but helps a lot.

The analogy for the load bearing stud might be a hackathon project that never expected to see production. In reality, a lot of what we do is hack on something until it barely works, and move on to the next thing.

MilStdJunkie · on July 20, 2023

Yuuuuuup. This precise set of problems is why Systems Engineering was adopted in Aerospace and Defense, because maintenance plans need to know what the "loads" are for each replaceable (spareable) unit or assembly.

Today's SE has wandered far, far afield from original goals, tragically enough, but that was the original conception. One of the reasons for today's relatively toothless SE departments is the rise of finance into maintenance planning. Inventory depreciation is a cruel mistress, and "what gets spared" is rarely a SE judgement these days, at least in my experience. This has predictable results, but is partially offset by the exceptionally high bar for aerospace maintenance staff, who are generally pretty damn badass compared to, say, a washing machine repairman. Finance, naturally, would like to knock that bar down a few pegs, too.

eschneider · on July 20, 2023

That's easy to do, if it's _intentional_. I'm always amazed at how often I see systems rate limited by some seemingly "decorative" upstream component, that when removed, causes everything else to run amok. :/

geocrasher · on July 20, 2023

This reminds me of end users who unknowingly exploit a bug in a piece of software and integrate it into their normal workflow. The result is that when the bug is fixed, their workflow is interrupted, causing complaints.

gerad · on July 20, 2023

https://xkcd.com/1172/

Karellen · on July 20, 2023

More formally:

> With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.

https://www.hyrumslaw.com/

dventimihasura · on July 20, 2023

> I could easily tell why it was there: it was part of a partition for a closet.

> Except that over time it had become accidentally load bearing

> through other (ill conceived) changes to the structure this stud was now helping hold up the second floor of the house

Evidently, you couldn't easily tell why it was there. Moreover, I'm not persuaded that it accidentally became load bearing. It seems quite plausible that it deliberately became load bearing, for reasons which are ill conceived to you but not to the people who had them.

lmm · on July 20, 2023

> Evidently, you couldn't easily tell why it was there.

No, they could tell why it was there. It's just that knowing why it was there in the first place doesn't tell you what it's doing now.

dventimihasura · on July 21, 2023

"Why it was there" is something that matters to people, plural. Knowing why it was for some people doesn't rule out the possibility that you don't know why it was there for other people.

lmm · on July 21, 2023

No, why it was there is a question for whoever put it there, and only them. You might ask other people why they didn't remove it, but that's a different question from why it was there.

dventimihasura · on July 21, 2023

Whoever put it there isn't answering any questions, and neither are the people who made it load-bearing later, if they're not alive anymore. No, the question is for you. If you owned this property, would you want to know about the work done by the people who came later and made it load-bearing, or would you rather remain unaware as you go about your renovations?

lmm · on July 24, 2023

All models are false, some are useful. Knowing the whole history of everything that anyone has ever done in your property would be ideal, but impossible. Knowing why things were originally put in is a useful heuristic that strikes a good balance between simplicity and false positives/negatives.

dventimihasura · on July 31, 2023

Sounds like the answer is, "No, I would prefer to be unaware as I go my renovations."

lmm · on July 31, 2023

If I could know everything, I would. But unfortunately every piece of knowledge has a cost.

dventimihasura · on Aug 1, 2023

So does ignorance.

7402 · on July 20, 2023

I used to work with a physics postdoc who would sometime leave a sign on an equipment setup that read, "Do not touch. There are hidden dangers."

The lab was full of smart people who were used to looking at things and making their own well-reasoned conclusions about whether it was OK to change something. This was a warning not to be too hasty about doing that!

JohnMakin · on July 20, 2023

I have seen too often overly complicated systems where one small change creates these situations that go unnoticed for far too long and then something breaks in a mystifying and spectacular way. The reaction, generally, is then a fear to make any change to the system at all, regardless of how benign, even if it can't possibly mess something up - because of fear of the "unknown."

IME, having robust alerting and monitoring tools, good rollback plans and procedures/automation should eliminate this fear entirely. If I was afraid to touch anything for fear of breaking it, I'd likely never get anything done.

ninkendo · on July 20, 2023

> having robust alerting and monitoring tools, good rollback plans and procedures/automation should eliminate this fear entirely.

Sure, but that all sounds like stuff that happens after you deploy/release… you really need to catch things sooner than that. Don’t make the user into the one who has to find the breakage, please. No matter how fast you roll back. Test your software thoroughly!

JohnMakin · on July 20, 2023

Nothing I said implies the user has to find it. With rolling deployments and blue/green strategies, bad changes don't even have the potential to go live.

cratermoon · on July 20, 2023

This could be considered a variant of Hyrum's Law: "With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody". At some point the stud was became a structural member, because other changes made it so. Now the second floor depends on it, and removing it would compromise the structural integrity of the second floor.

rkagerer · on July 20, 2023

Part of writing and maintaining good code is maintaining useful documentation (such as comments) expressing clear rationale and intent. Simple stuff can be elegantly implicit, and modern languages are getting better at capturing more of this in syntax and eliminating common footguns. But more complex systems tend to benefit from explicit explanation or even external design documents (like diagrams). A fix that doesn't keep those current isn't a completed fix.

When I come in on a consulting basis I often have to help developers unwind the unintended effects of years of patches before we can safely make seemingly-simple changes. It's analysis-intensive, and like an archaeologist any artifacts I can dig up that you've left behind providing clues to what was in your head at the time can be helpful. In some cases where a set of functions is super critical we've made it part of the culture every time altered code is checked in to perform a walkthrough analysis to uncover any fresh unintended side-effects, ensure adherence to design requirements and discover and spotlight non-obvious relationships. The key is to turn the unintended or non-obvious into explicit. Sounds painful but in practice due to the high risk attached to getting that portion of code wrong the developers actually love that time for this is worked into the project schedules - it helps them build confidence the alteration is correct and safe.

I wish it were easier to impress the importance of this on inexperienced developers without them having to go through the hell of maintaining code in an environment where it was lacking.

It's a skill and art to keep the appropriate level of conciseness to avoid the documentation (no matter which form it takes) becoming too verbose or unwieldy to be helpful.

latchkey · on July 20, 2023

document the why, not the what.

this is also why tests are so important. if you want to remove something, you have to think twice... once for the original code and once to fix the broken tests.

regularfry · on July 20, 2023

That has saved me more times than I care to count. "Oh, that's why that code was there. I see."

Izkata · on July 20, 2023

> and like an archaeologist any artifacts I can dig up that you've left behind providing clues to what was in your head at the time can be helpful.

This is why I'm against squashing commits in git, that our other development teams don't understand: you're going through extra effort to make it more difficult in the future.

joshka · on July 20, 2023

> This is a concept I've run into a lot when making changes to complex computer systems. It's useful to look back through the change history, read original design documents, and understand why a component was built the way it was. But you also need to look closely at how the component integrates into the system today, where it can easily have taken on additional roles.

We're probably at a stage where doing this in an automated fashion might be reasonable. The context for why something works a particular way (say in some github repo) is the commit message and change detail, the PR, related issues, related chats in <chat tool of choice>, and then things temporally related to those. Feeding an LLM with the context and generating ADRs for an entire project automatically could be a feasible approach.

themillerdave · on July 20, 2023

I always refer to this as "Load-bearing wallpaper" - in so much that folks who see it, wouldn't think twice about tearing it down - it's wallpaper, right? Well, turns out it was load-bearing wallpaper and the house came down as a result.

binarymax · on July 20, 2023

I feel like part of the story is missing. Did he remove the stud and learn the hard way?

jefftk · on July 20, 2023

>> How did you find out the closet partition stud was load bearing, in the end ?

> I tried to cut it and it started to bind the saw. I stopped and thought, and then did not cut anymore until I had shored up the upstairs.

https://www.jefftk.com/p/accidentally-load-bearing#fb-231219...

h2odragon · on July 20, 2023

A smack with a hammer can tell you all sorts of things. If the board is rotten; some is likely to come off. If it's under tension it'll sing out with a resonance absent from a board carrying only its own weight.

In your circumstance, the board should have had some wiggle but prolly wouldn't have.

Ensorceled · on July 20, 2023

>> I tried to cut it and it started to bind the saw. I stopped and thought, and then did not cut anymore until I had shored up the upstairs.

This is exactly I found out that a stud was now load bearing.

wellpast · on July 20, 2023

What I've learned from decades of software development: ANY and every component or capability you build into your system, incidentally or explicitly, will immediately (ie tomorrow) become "Accidentally Load Bearing".

Dependencies flow in and onto new components like water. This is a fundamental law.

Even secret interfaces, etc that you think no one could find or touch will be found and fondled and depended upon.

This is why the correct view is that all software is a liability and we should write as little of it as possible, just the minimal that is needed to get the dollars and nothing more.

nighthawk454 · on July 21, 2023

Right, so the trick is not to fall into the false-dichotomy trap. Chesterson's Fence tells if you if you don't understand it's purpose, it's probably not safe to remove. That does not imply that if you do understand it's purpose you can now do whatever you want. Chesterson's Fence doesn't tell you anything about when it's safe to remove, only one category of situations where it probably isn't. But that could be an easy misconception.

iancmceachern · on July 21, 2023

In mechanical engineering we call this the load path. You want to be sure 1- you know the load path 2- the load path doesn't go where you don't want it to and 3- everything in the load path is strong enough.

A good example of a famous failure if this:

https://en.m.wikipedia.org/wiki/Hyatt_Regency_walkway_collap...

praptak · on July 20, 2023

Let the system go long enough and everything becomes accidentally load bearing. See also "implementation becomes the specification" - people will come to rely on every single undocumented implementation detail regardless of what the official spec says.

That's why it is a good practice to have DiRT or similar exercises. Break things on purpose so that people don't rely on them when they shouldn't.

hinkley · on July 20, 2023

> Figuring out something's designed purpose can be helpful in evaluating changes, but a risk is that it puts you in a frame of mind where what matters is the role the original builders intended.

One of the biggest challenges I find in refactoring/rearchitecting is getting people to separate why something was done from whether it still needs to exist.

Too many times someone has insisted that a weird piece of code needs to exist because we never would have shipped without it. They treat every piece of the structure as intrinsic, and can't conceive of any part of it being scaffolding, which can now be removed to replace with something more elegant and just as functional.

When you put a doorway through a loadbearing wall, contractors and smart people place a temporary jack to brace the wall until a lintel can be substituted for the missing vertical members. Some developers behave as if the jack is now a permanent fixture, and I have many theories why but don't know which ones are the real reasons.

csours · on July 20, 2023

See also "Swiss Cheese Accident Model" and Melted Cheese Accident Model. Also see also Stockton Rush's statements on safety. He said that most accidents are caused by operator error, so making the sub strong enough wouldn't affect safety.

marcosdumay · on July 20, 2023

What is that molten cheese accident model? Are you trying to describe deviance normalization?

csours · on July 20, 2023

Molten cheese means that something that was intended to increase safety became a hazard during an incident.

Think of how the SCRAM at Pripyat/Chernobyl caused the reactor to go supercritical because of the graphite tips. The control rods should have reduced reactivity in the reactor, but the first section of the rod increased reactivity.

Or how a hard hat dropped from a height may cause injury.

mcv · on July 20, 2023

I've definitely had this happen. With my own code, even. I build something complex, it's released to production, and a few months later I realise that something about it should be done in a different way, so I look at it, and other people have already built on top of the thing I wanted to change.

On the other hand, if you can't figure out what something is for, sometimes the easiest way to find out is to remove it and see what breaks. Maybe that's not such a great idea for that load-bearing stud, but in software it's easy to undo.

dmbche · on July 20, 2023

The term load bearing always bring to my mind the breakdown of how the towers failed during 911 - especially how the plane cutting a lot of vertical steel beams redistributed the weight of the building onto other beams that were not rated to hold most of the buildings weight (while being heated) - they buckle and fail.

Guess I'm thinking of being very aware of how the structure reacts to failure, and not necessarily the strenght of all individual parts.

hprotagonist · on July 20, 2023

reminds me of Hyrum’s law a bit.

kuchenbecker · on July 20, 2023

It is Hyrums law. The observable feature of the board was that it could bear load despite that not being the intention, and the house started using this observable feature.

jefftk · on July 20, 2023

>> Hyrum’s law

> It is Hyrums law.

No, it's "Hyrum's" not "Hyrums": https://www.hyrumslaw.com/

skitter · on July 20, 2023

Is this an instance of Cole's law: The fastest way to get the right answer is to post the wrong answer?

pitaj · on July 20, 2023

Not sure if this is a joke or if you're actually that pedantic to think they purposefully left off the apostrophe because they think it's called "Hyrums law" not "Hyrum's law".

They were agreeing with the top-level comment, not correcting their spelling.

kuchenbecker · on July 23, 2023

You're correct, I was agreeing and using and example.

Dylan16807 · on July 21, 2023

It looks like a correction. Confusion here is not being "pedantic".

ez667 · on July 21, 2023

you misspelled "your"

wduquette · on July 21, 2023

I think the article interprets Chesterton's Fence too strictly. Chesterton's point was simply that you shouldn't destroy something before making sure that it no longer serves any purpose. The fence was simply an illustration.

Or, to put it another way, figure out what the consequences are before you decide whether you're willing to intend them or not.

squokko · on July 20, 2023

I've seen this SO many times. "We'll spin up this server that renders thumbnails of images for the UI."... months pass... "Hey why don't we use the thumbnail server to create low-res images for our marketing emails".... months pass... "The UI doesn't show thumbnails anymore so we can tear out the thumbnail server"

dventimihasura · on July 20, 2023

> It is extremely probable that we have overlooked some whole aspect of the question

https://www.reddit.com/r/ProgrammerHumor/comments/q9x1d2/ask...

Simon_O_Rourke · on July 20, 2023

Good Gawd! A stud holding up a second floor by itself... time to have that domicile condemned and the inhabitants moved out asap!

jefftk · on July 20, 2023

The closet partition stud was bearing enough load to matter, but it wasn't holding up the whole second floor by itself.

I put in a new beam to replace the load bearing wall the preivous-previous owners had removed, with posts down to the basement: https://www.jefftk.com/p/bathroom-construction-framing

Cerium · on July 20, 2023

The only option is to do a complete rebuild. The load bearing stud indicates that the foundation design was inadequate for current needs.

OldGuyInTheClub · on July 20, 2023

Also the denouement to the Fawlty Towers "Builders" episode.

https://www.youtube.com/watch?v=jfBOIbjbLv0

(or, jump to https://youtu.be/jfBOIbjbLv0?t=1560)

dclowd9901 · on July 20, 2023

This is precisely the reason so many developers have come to love functional programming and side-effect-free designs. If that stud _can't_ affect anything outside of its original intended purpose, it can more easily be understood for its purpose and maintained or removed more easily.

orbital-decay · on July 20, 2023

Related concept from software engineering is "accidentally quadratic", which can manifest itself in the most unexpected places when two unrelated things collide.

Both generalize to various forms of causal and failure analysis in systems design.

oftenwrong · on July 20, 2023

The worst part is that it's typically harder to repair the "accidentally load bearing" change than it was to create it in the first place.

Waterluvian · on July 20, 2023

On the topic of load bearing studs. Is there some way to test for that? How to tell non-destructively how much compression one is under?

avgcorrection · on July 20, 2023

HN has such love for that early 20th century writer. So many seemed to have discovered The Fence at about the same time.

gumby · on July 20, 2023

Evolution does this all the time, which is one of the myriad reasons biology is so hard.

Really good, brief post.

hinkley · on July 20, 2023

I’ve been using, “that was a load bearing poster” for years. Usually right after a P1 and right before an RCA.

ajuc · on July 20, 2023

This is why comments in the code aren't that useful, and why commit messages are essential.

dudeinjapan · on July 20, 2023

I’m a believer in making a “Single Responsibility: ___” comment at the top of each class.

marcosdumay · on July 20, 2023

If your class name doesn't answer it, that comment won't either.