"Come and hear how we’ve built an international team of developers to tackle the problems of resurrecting a poorly-understood, gigantic code-base extensively commented in German, with no unit tests, a tangled build infrastructure, and twenty-five years of un-paid technical debt."
Looks like a project I worked on. It's eerie how similar it is (and no, it was not an office app, it was something different).
At least the comments weren't in German, but there were a couple in Dutch. (Not that the comments were of any help.)
No unit tests: CHECK. Twenty-five years of technical debt: CHECK. And now we're supposed to make it work on a different platform.
So thank you, but I don't plan on touching a monolithic piece of old C++ ever again. Not even with gloves.
Start by adding integration tests, then (try to) identify the core of the project and isolate it in its own sub-project in the repo. Add unit tests for the functions in the core, then split the remainder into parts that are peripheral and could be removed or disabled (say, a spellchecker). Next, refactor the core until you're happy with the state of affairs, but keep its interface the same. If there was no interface to the rest of the code, you'll need to define one and implement it, as well as change the peripheral code to use that interface. Then look at each peripheral piece and decide whether it is worth saving, needs rebuilding, or should be refactored as well.
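As a rough illustration of the "define an interface and make the peripheral code use it" step, here is a minimal C++ sketch. Everything in it is hypothetical: `SpellChecker`, `NullSpellChecker`, `WordListSpellChecker` and `Document` are invented names, and the word-list checker only stands in for whatever the real legacy component does. The idea is that the core depends on an abstract interface, so a peripheral can be kept, rebuilt, or switched off just by passing a different implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface between the core and one peripheral feature.
// The core sees only this abstract type, so the peripheral can be
// replaced, rebuilt, or disabled without touching core code.
class SpellChecker {
public:
    virtual ~SpellChecker() = default;
    virtual bool isCorrect(const std::string& word) const = 0;
};

// "Disabled" peripheral: accepts every word, so the core needs no special cases.
class NullSpellChecker : public SpellChecker {
public:
    bool isCorrect(const std::string&) const override { return true; }
};

// Hypothetical adapter standing in for the legacy spellchecker, now reached
// only through the interface (a word list keeps the sketch self-contained).
class WordListSpellChecker : public SpellChecker {
public:
    explicit WordListSpellChecker(std::vector<std::string> words)
        : words_(std::move(words)) {}
    bool isCorrect(const std::string& word) const override {
        return std::find(words_.begin(), words_.end(), word) != words_.end();
    }
private:
    std::vector<std::string> words_;
};

// Core code: owns a SpellChecker through the interface and never names
// a concrete peripheral type.
class Document {
public:
    explicit Document(std::unique_ptr<SpellChecker> checker)
        : checker_(std::move(checker)) {}
    void addWord(std::string word) { words_.push_back(std::move(word)); }
    std::size_t countMisspelled() const {
        return static_cast<std::size_t>(std::count_if(
            words_.begin(), words_.end(),
            [this](const std::string& w) { return !checker_->isCorrect(w); }));
    }
private:
    std::unique_ptr<SpellChecker> checker_;
    std::vector<std::string> words_;
};
```

A `Document` built with a `WordListSpellChecker` over {"hello", "world"} reports one misspelling for the words "hello" and "wrold"; the same document built with a `NullSpellChecker` reports none. Using a null object instead of scattering `nullptr` checks through the core is a design choice: the core never has to know the feature was disabled.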
The basic trick is to reduce scope until a subsection of the work becomes tractable.
With a really large project this is a multi-year effort for a team of seasoned programmers. It's hard, and it means that while you do it there will be no honoring feature requests. You need management buy-in, patience and perseverance, but that goes for any large effort.
Building fresh and green stuff is obviously much more fun to do but taking an old codebase and making it nice is (to me, at least) just as rewarding.
This argument smells of rationalisation to me: you admit yourself that building new stuff is much more fun. Say making an old codebase nice is "just as rewarding", but to actually understand all the legacy you need to put in 2x (or whatever the factor is) the work you'd need to build a new thing. Wouldn't you rather build 2 new things instead?
I could never imagine someone on their own accord, going by themselves into a 30-year-old codebase and refactoring it, just to make it "look nice". On the other hand people constantly build new stuff (too much of it really). I think it would be fine to admit that working on new code is more desirable than working on old code. Sometimes the latter needs to be done, but I think most programmers are happy to avoid maintenance work when they can.
The problem is that in cases like these, invariably the old, crappy code contains documentation and functionality that will never be reproduced by a ground-up re-write. Understanding the old code and rewriting is, in general, just as expensive as refactoring. This is the common dilemma with old codebases like this -- in spite of its horrible, unreadable state, it contains business knowledge too valuable to lose.
As far as making the code "look nice" -- you've missed the point. The goal is to make it functional, readable and extensible again, and doing that to a 1M+ line codebase (which I've had to do) can be enormously rewarding, and the business value gained is more than tangible.
Building something new is more fun in the beginning. Refactoring an old codebase and making it fly is rewarding when you're done. This is not about cosmetics.
Both are fun. Assuming you stay long enough, the first will at some point turn into the second, no matter how good your intentions were at the beginning ('this one will be different, this time I'll get it right').
Yeah but programmers are generally not bound to a codebase for life. A friend of mine who works at Google says that he generally works on projects in the "do it new and figure out what it needs to do" phase, while there are other engineers at Google who are very good at the maintenance phase and "keeping stuff running and adding slowly to it". I think that's a very nice way of wording it, but I'd definitely want to see myself in the former group, not the latter.
> Yeah but programmers are generally not bound to a codebase for life.
If you're young I can see how you might have that impression. But I know quite a few programmers personally that have been working on the same codebase for more than a decade, and even a couple that have been working on the same codebase for 3 decades.
The web is still young enough that most people that came into programming through building websites have no idea how long most code bases are alive.
There is a neat little proverb here: programs are like children, you can start one in an evening but you'll end up supporting it for the rest of your life.
Not arguing against the fact that old codebases exist and that people have to take care of them.
However I don't agree with glorifying old codebases or putting them as "equal" to working on new stuff. If you're happy doing that, then fine, no one's going to stop you from doing it, but I do think that in general, given the choice, almost anyone would pick building something anew over working on a legacy codebase.
Heh -- stick around, kid. :) Those shiny new greenfield codebases, so full of promise, perfect architectures and unfettered extensibility, can turn into slow-moving, spaghetti-code Leviathans in a matter of weeks. It's not just the decades-old, massive LOC codebases that provoke this dichotomy: chances are very good that the code you're working on right now, that gorgeous, sleek, sexy code -- will be staring at you with its cold, incomprehensible dead eyes within the year, asking you how, how did it come to this, how did you neglect me so, and why do you hate me now?
OK, or maybe not. But it can happen, and it does, all the time, in really good shops with very experienced coders using all the right techniques and best intentions. I know. Dear God, I know.
Of course anyone would rather bootstrap new code and feel the thrill of new object models, shiny, hip new libraries breezing through their editors. But until you've experienced both sides of the equation, the long-term consequences of technical debt growing through a codebase like some kind of...virus...you'll not know how important the second-order effects of refactoring, TDD, and truly clean code really are.
> Not arguing against the fact that old codebases exist and that people have to take care of them.
ok
> However I don't agree with glorifying old codebases or putting them as "equal" to working on new stuff.
Ok, so we will disagree on that then. I think that both jobs, new stuff and maintaining old stuff are equally rewarding and are essential skills. If all you can do is make new shiny MVPs you'll never run a business.
> If you're happy doing that, then fine, no one's going to stop you from doing it, but I do think that in general, given the choice, almost anyone would pick building something anew over working on a legacy codebase.
You won't be given the choice, unless you keep running away from your creations.
The neat thing about old codebases is that even if the code is messy, the problem it solves can be very interesting.
"Legacy codebase" is another word for "important codebase".
When you're coding, you solve the most important problems first. Thus, in some sense, the older the code base, the more important the problem it solves.
> I could never imagine someone on their own accord, going by themselves into a 30-year-old codebase and refactoring it, just to make it "look nice".
One of my "hobbies" that I unfortunately have too little time to delve into, is to find old code that I want to see live on, especially old Amiga code, and make it live on. Some I "only" make compile for AROS (an AmigaOS re-implementation). Some I try to port to e.g. Linux.
Part of it is nostalgia. Part of it is the learning experience - code written for a system with 512KB of RAM and a floppy is often structured very differently, and sometimes the approaches are still interesting. Part of it is that it is just relaxing to clean up something where all the hard implementation decisions have already been made.
You could do the refactoring to enable yourself to add shiny new functionality to the big old product. In the end you will have a revived mature product with shiny new brass bells and whistles, while that other guy will have nothing more than just another half-usable prototype.
- The project ran on embedded hardware (hence it has no user interface; its only interface is with hardware devices)
- There was a lack of understanding of the problem domain, so it's not, as in the LibreOffice case, 'this is a list of paragraphs', but 'we don't know what this piece of data is', because it refers to a specific domain of knowledge
Not to mention that unit tests (both writing and running them) are more painful in C++ than in, for example, Python.
I think acquiring domain knowledge is one of the most overlooked elements in software design. Lots of people write programs for things they essentially do not understand.
This causes all kinds of misery.
I'd see it being C++ rather than Python, and embedded, as advantages rather than disadvantages! To each his own, I guess :)
Another side of the same approach is just starting to hack bits out into their own services. Really, the key point you identified is trying to find the internal interfaces of the code and fracturing the monolithic codebase along those lines.
I agree, it's doable. You do need to put a lot more effort into coordination between team members than usual, because it's very easy for the developers (should I say the refactorers?) to step on each others' toes.
I think this scenario routinely happens at many companies. You just don't hear public stories about it, because in the end what happens is the same for all software: continuous transition/refactoring. I've worked on many projects in these conditions, though not of that size.
With more experience of that kind, and after working on stinking codebases, I've come to the conclusion that while in the past I might have thought trashing the code and starting from scratch would help, now I would approach most problems by pushing new code in the form I want and transitioning the rest as changes are required.
I've had projects that I built myself with great design care, but after 5-6 years of shifting requirements they also started to look like I could have done a better job by starting from scratch again. The reality is that, in retrospect, all code is suboptimal.
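One way to push new code in the form you want and transition the rest gradually is to put a small facade in front of a legacy routine and switch it between the old code, the new code, and a shadow mode that runs both and counts disagreements. The sketch below assumes a hypothetical tax calculation: `legacyTaxCents`, `newTaxCents`, `Mode` and `TaxFacade` are invented for illustration, and the rounding difference between the two versions is deliberate, to show the kind of buried behavior a shadow mode is meant to surface.

```cpp
#include <cassert>

// Hypothetical legacy routine: left untouched until a change request
// forces us to touch it. It truncates fractional cents.
int legacyTaxCents(int amountCents) {
    return amountCents * 19 / 100;
}

// Hypothetical rewrite in the form we actually want. It rounds to the
// nearest cent (for non-negative amounts), which silently changes behavior.
int newTaxCents(int amountCents) {
    return (amountCents * 19 + 50) / 100;
}

enum class Mode { Legacy, Shadow, New };

// Facade that callers use instead of either implementation directly.
struct TaxFacade {
    Mode mode = Mode::Legacy;
    int mismatches = 0;  // how often old and new disagreed in Shadow mode

    int taxCents(int amountCents) {
        switch (mode) {
        case Mode::Legacy:
            return legacyTaxCents(amountCents);
        case Mode::New:
            return newTaxCents(amountCents);
        case Mode::Shadow: {
            int oldResult = legacyTaxCents(amountCents);
            int newResult = newTaxCents(amountCents);
            if (oldResult != newResult) ++mismatches;
            // Legacy stays authoritative until every mismatch is understood.
            return oldResult;
        }
        }
        return legacyTaxCents(amountCents);  // not reached; silences warnings
    }
};
```

In shadow mode, `taxCents(1000)` returns 190 with no mismatch, while `taxCents(250)` returns the legacy result, 47, and bumps `mismatches` to 1 because the rewrite would have returned 48. Keeping the legacy result authoritative until every mismatch is explained is what lets the transition happen piece by piece, as changes are required, rather than as one big rewrite.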