Working on Linux storage drivers for a decade or so taught me how to write code iteratively, slowly and methodically morphing it towards my goal, with each step along the way functioning, and an improvement over the previous step. You basically were forced to do things gradually. Upstream generally wouldn't accept a patch that just changed things wholesale, or even a patch that didn't make one "logical" change. Tools emerged to make such a workflow easier, easy, even. First Andrew Morton's patch scripts[1], then quilt[2], and then stgit[3], which I still use to this day on my own projects.
I will say it does help when slowly morphing code that it was all C code. C is very malleable, like clay. I've found code in some other languages I won't name to be less malleable, more crystalline, more necessary to smash to atoms to effect the desired change.
I used quilt over 15 years ago as part of a from-scratch embedded distro, for applying the package patches.
Years later I encountered the same trick inside Yocto: one of the few things done right which made life easier, in contrast to almost everything else inside that dumpster fire.
Having had to do the occasional "start with an empty file" kind of rewrite/refactor, both in hobby projects and at work, I just take the "copy/paste" approach.
Start with the structure of what you really want it to be and then flesh it out. You soon discover that a lot of the existing code is OK, anywhere from a simple "for" loop to an entire module, because it works just as well in the new structure and doesn't have significant technical debt of its own. These pieces can be pasted into the rewrite and possibly adjusted lightly.
When I was working on Knot Resolver, I did a pretty successful rewrite of its I/O. The original one was a mess, but of course a lot of it did make sense in the grand scheme of things - it's only that the requirements had changed with time (mostly caused by the addition of DNS-over-TLS, DNS-over-HTTPS, and, in the future, DNS-over-QUIC), so various adaptations needed to be made to the original code. But, as the changes were largely incremental, it resulted in something that was pretty hard to reason about.
So I literally tore out the thing that I considered the most "wrong" about it, but sort of kept it at the side; then I started reimplementing it by mixing new code better suited for the current (and hopefully near future) requirements with the old copy-pasted code.
Spinning a bit off-topic, but perhaps sort of relevant to anyone planning to do something like this. The biggest gratification for that effort did not come immediately. Rather, the result was initially pretty underwhelming. Yes, the code seemed to be more readable, robust, and extensible, but since there was no immediate need to touch that area for some time, I started questioning whether it was actually worth it. But then, about a year and a half later, some new requirements arose, which proved that the effort was actually worthwhile. And we even had a direct comparison, because a security fix was needed for both the old, still-supported, version, and the new one, so it was nice to see the difference between what it took to fix the old one and the new one.
Much as I hate to admit it, I can relate to this. When you come onto a new project, the first thing you notice is all of the things that could be improved. But most seasoned developers know that knocking down walls right away violates the whole Chesterton's Fence[1] thing and so you make restrained changes and small commits while learning the system. A few years on when you finally have the mental model and understanding to make the changes... you don't see the inefficiencies anymore. I don't know how to square this circle.
option 1: when you are new, work with a seasoned dev. They can tell you if this design can be changed and provide guidance, you do the actual work. This is great for learning the codebase too, btw.
option 2: keep notes during onboarding. Revisit them after a few years working on the code.
option 3: help new people onboard. Pay attention to where they struggle and if those parts could be improved.
> you don't see the inefficiencies anymore. I don't know how to square this circle.
I think it is because you are then able to assess the value of changing it - which is usually not very high. Also, when you are a couple of years in on a project, you probably have more important things to attend to.
I think it is great for new developers to attempt to fix these things when they come in: if they are able to do so without breaking anything (and can also cheaply convince the rest of the team that they did not break anything), then they have learned the code base and optimized it.
However, one should be aware that there is a high risk of them failing. Especially if the new developer is junior.
If you dislike a situation you're in and you try and fix it by switching to a new situation, you'll generally bring with you some of the problems that created that prior situation.
If instead, you bit by bit improve the situation until you feel at peace with it, you'll then either no longer want to move to a new situation, or if you do want to move, you'll no longer bring with you the problems of the prior situation.
Applies to job changes, relationships, projects, goals. And, from OP, applies to architecting software projects.
I wouldn’t knock that as a personal approach, but I do wonder whether it’s possible to hold to it in group settings, which require not only your own self-discipline, but the discipline of others to pursue the improvements.
Personally I am a fan of switching to new situations in groups, as a way to push people out of their comfort zone and force them to account for things they may not have had the perspective to appreciate previously. People are generally resistant to change, but once they start to get caught up in it, it’s difficult to avoid growing from the experience.
This is almost universally true. We need morals/ideals, but they must be grounded in reality, in what is. If we just trudge along, we lose any vision for the future that could be better. If we just have idealism when we try to make things better, things usually get worse.
I'm not a constructivist (and definitely not a critical constructivist); I believe reality is knowable and consequential.
I enjoyed this post since this is exactly my approach to improvements: imagine the magic wand solution, find the deltas between it and the current state, code the deltas.
At work I often get tapped to work with folks who struggle with "Better Engineering" ideas (codebase improvements with an eye towards increased productivity). Usually it's just people being unable to come up with any improvements.
I always prompt them: 1. "Is this the best codebase you've ever worked in?" and 2. "If you were to rewrite this from scratch, would it look exactly like this?".
It's amusing how often those two questions trigger a light bulb moment. I of course follow up to ensure their ideas are actually good and grounded (no "let's convert the monolith to microservices") but it does wonders for inspiration.
I have a somewhat more wordy version of this blogpost as a conference talk I've done pre-COVID (and pre-kids). In my perception, this mostly boils down to reviewing and revising interfaces.
Code that's not well compartmentalized and is full of complex dependency chains and flawed abstractions is hard to work in, and more importantly to the topic at hand: extremely hard to refactor well.
Once the abstractions are shuffled to their own "corners" of the codebase, and you've got well defined modules/services/microservices/foobars... you'll find refactoring to be far less of an investment. It also becomes far less attractive, as a well abstracted module is easy to ignore and forget about.
Of course, it's always best to make these things right the first time. Whenever I kick off a greenfield project, my first code-style objective is to make things easy to delete/remove.
Addendum: I find the worst spaghetti code comes from very dynamically typed languages. All the "easy" coding makes skipping interfaces/abstractions effortless, thus nothing's "doing just one thing well" and it snowballs from there. On the flip-side, when done right, it's quite a joy to write delete-able code in Python, and it makes prototyping and defining boundaries a breeze.
sometimes when a dev pushes for a refactor they'll say that the reason is about building maintainability/best-practices/code-philosophy/whatever but the TRUE reason that they may not even be willing to admit to themselves is that they see something FUN in the refactor.
finding patterns that could be abstracted is fun like solving a tricky puzzle. or getting a chance to play with a new framework or language feature and just seeing what it's like.
i don't know how a manager is supposed to handle this situation, but as a dev once i realize what my true motivation is it becomes a lot easier. i can separate the 'coding for work' from the 'coding for fun' and just put my head down and do the boring work and look for the fun somewhere else.
A simplistic take on a complicated subject. Some things, many things, warrant a rewrite. There are several reasons why: 1) the system is not used in the way in which it was designed 2) you've learned a great deal more about the problem 3) duct-tape instead of welds in the initial approach 4) technology obsolescence 5) overly-clever code 6) undocumented code, etc.
Now, I think the 'no rewrite' argument has validity at certain levels. Systems will morph (because life/problems change) but the underlying functions may not need to. Composability and interfaces are wonderful tools to address that problem.
Composability is a sadly underrated concept, at least for code with a longer expected lifespan. A lot of good things fall out from keeping one eye on it while designing / developing / maintaining.
This is related to how I define technical debt: the delta between what exists, and what you would write if you could start from scratch and had infinite time. Paying down technical debt is therefore the process of moving the system toward that clean-slate state.
Of course, if you look at all those little deltas between the current and clean-slate states, some of them are much more expensive to live with or solve than others. This definition doesn’t help you decide which changes to make first, only what your North Star should be.
> 2. There are probably edge cases this code solves that we don’t remember.
But fortunately, I write in-code comments documenting these, because I pay close attention to such details.
> 4. Your own code always feels better to read, because you wrote it. That doesn’t mean it’s actually better to read than someone else’s.
Having that mental context makes your own code easier to read. That's why those comments are so important: they share that context with the next reader. Well-commented code legitimately is better.
Kinda funny I was writing a chess engine in Python that was able to beat my tester (who beats the average person but is near the bottom of the bracket at the chess club) with just 6 plies of alpha-beta.
He tells me he'd be happy if it played at the same level but was faster (play more games) and also that if I want to take it to the chess club it has to respect time control. Supporting something like XBoard and UCI would also be a hassle in Python because it needs a comms thread that can interrupt a think thread.
I rewrote it in Java and the process was super-fast because I could cut-and-paste the data in the test suite. Also, by then I had mastered the signs in the negamax algorithm (I screwed that up and it discovered this https://en.wikipedia.org/wiki/Fool%27s_mate !)
It's different from a lot of applications work because it's really a simple program and doesn't have the panoply of features that you miss when you try something like
the hard part is that right now it is spending roughly equal time in evaluation and managing transposition tables. I think I can speed up eval about 20x which is going to make me code up some kind of specialized off-heap hashtable.
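For anyone wondering what "the signs in the negamax algorithm" refers to, here is a minimal sketch, not the commenter's engine. It assumes a toy take-away game instead of chess (take 1 to 3 stones, whoever takes the last stone wins), and every name in it (`negamax`, `legal_moves`, `WIN`, `LOSS`, `INF`) is made up for illustration. The score is always from the point of view of the side to move, so each recursive call has to negate the child's result and swap and negate the alpha-beta window. Drop one of those minus signs and the search quietly plays for the opponent.

```python
from typing import List

WIN = 1          # side to move can force a win
LOSS = -1        # side to move loses against best play
INF = 10**9      # stands in for "infinity" in the search window


def legal_moves(stones: int) -> List[int]:
    """Toy game (an assumption for this sketch): take 1, 2 or 3 stones."""
    return [take for take in (1, 2, 3) if take <= stones]


def negamax(stones: int, depth: int, alpha: int, beta: int) -> int:
    """Return the score from the point of view of the side to move."""
    if stones == 0:
        # The previous player took the last stone, so the side to move lost.
        return LOSS
    if depth == 0:
        # Search horizon reached: a real engine would call its evaluation
        # function here; the toy game just reports "unknown".
        return 0

    best = -INF
    for take in legal_moves(stones):
        # The sign-sensitive line: the child's score is from the opponent's
        # point of view, so negate it, and pass the window swapped and negated.
        score = -negamax(stones - take, depth - 1, -beta, -alpha)
        if score > best:
            best = score
        if best > alpha:
            alpha = best
        if alpha >= beta:
            break  # beta cutoff: the opponent will avoid this line anyway
    return best
```

Called as `negamax(n, n, -INF, INF)`, this reports a loss for the side to move exactly when `n` is a multiple of 4, which is the known result for this game and makes a handy sanity check for the signs.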
> The programmer builds from pure thought-stuff: concepts and very flexible representations thereof. Because the medium is tractable, we expect few difficulties in implementation; hence our pervasive optimism. Because our ideas are faulty, we have bugs; hence our optimism is unjustified.
Humans just have bad intuitions for this problem space, so you have to be consciously empirical; externalize decision factors, track outcomes, articulate hypotheses and be honest.
An old friend once said "Evolution beats Revolution" and that just stuck as a mantra in my mind.
> I find that usually (good) programmers enter a new project with idealistic dreams of ripping out the walls; this could be done differently, that could be removed entirely, and so on. Then, later, the longer they stay the more those walls seem familiar, and the idea of changing everything becomes instead a distant memory.
Oh... analogies! In reality those walls will not disappear. Instead, the rooms that have been defined will become inhabited, and you'll notice that you just lack proper resources to remove walls. You'll start questioning whether you're in the construction business, interior designer (walls need decoration!) or just any other tenant for your landlord. You'll never be the needed tornado, so it's just best to hope to discover that the walls are actually made out of cardboard and some tropical storm is coming up after the past dry summer! :D
It seems bad decisions can be made in a heartbeat, while it takes a lot of time and effort to iteratively leave them behind. I estimate that ratio to be somewhere in the 1/1'000'000 - 1/10'000 range and that's where I have some actual hope in LLMs to bring that number closer to 1!
My newest revelation about unwanted walls: If part of your code doesn't need to have a new release-(or life-)cycle, just don't put it in a new repository. The scaling advantages of microservices can be achieved with just a single repository.
> There are probably edge cases this code solves that we don’t remember.
Oh yeah, I agree with that. I wrote a program for work recently that probably should have been object-oriented. It would have been nice for it to be, because it's a bit of a mess now. But it works flawlessly and in truth it doesn't need any real feature upgrades now that it's done. So, I decided to keep it as it is.
[1] https://lore.kernel.org/lkml/3DB30283.5CEEE032@digeo.com/
[2] https://savannah.nongnu.org/projects/quilt/
[3] https://stacked-git.github.io/