They neglect maintenance because we reward resolving outages and fighting fires more than we reward good stewardship.
Imagine the following scenario.
Acme Co. has some piece of software that is critical to the operation of their business.
Dave is in charge of the software system. The software crashes often, but Dave is always there to save the day.
Dave often has to stay up all night and is seen fixing problems at the eleventh hour. Management takes notice of this and regularly sends out emails congratulating Dave on his hard work.
One day Dave leaves, and is replaced by Amy. Amy takes over the system and slowly and methodically fixes all the bugs that were causing the outages in the software.¹
Over time the code becomes more reliable, until eventually it runs smoothly.
Now, what would management say if you asked which programmer was better? They'd likely say: "Amy is pretty good, but boy, that guy Dave was a real rockstar!"
When you reward a behaviour, you will get more of it.
If you want software maintenance, then you need to reward it. But often, perversely, we reward its opposite.
I'm working mostly as an ops person these days, and if I do my job well, my contribution is invisible. When things are going right, there's nothing to notice.
Amy would have seemed as much of a rockstar as Dave in the beginning, since she had to do all of his work plus the diagnostics, analysis, and fixing. If she made sure to communicate what she was fixing, management would appreciate her more.
There are nefarious managers, but most of the time it's our own failure to communicate that makes us invisible.
> It's not a failing to be introverted and not desire attention/spotlight.
It is a failing if you care about getting recognition. Self-promotion isn't the only reason to want recognition. For instance, you might want to promote proactive maintenance instead of reactive repairs.
> Most people usually get annoyed when you tell them what you're doing for them.
That is very different from my personal experience.
Anecdotally, I have to agree with you. When I worked at Microsoft as a Cloud Solution Architect, my job was to help companies adopt the Microsoft cloud. I was already engaged with several companies from a fairly famous startup accelerator that everyone here is familiar with when a new colleague also started helping companies from that accelerator. I was quietly doing my job and helping solve problems; he made a big fuss about how he got Microsoft's foot in the door at this accelerator. Naturally, he got tons of accolades and even a bonus for his work. This need to constantly sell yourself internally was ultimately what led to my departure from Microsoft. I'm here to solve problems, not to talk myself up as the savior of the world. It's also why I much prefer working with startups. Your work is much more evident and you aren't constantly having to tell everyone what you've done.
Maybe I got unlucky in my career but I've worked with a "Dave" and an "Amy" where the OP's description is spot on.
"Dave" got all the hero credit and "Amy" was deemed as nothing more than someone average at her job.
While peers might know "Amy" is the real hero, non-technical people don't grasp that very well. And unfortunately, those are usually the managers and executives.
> While peers might know "Amy" is the real hero, non-technical people don't grasp that very well
This could also be a failure to communicate effectively. We developers sometimes have a hard time translating:
"the join duplicates records because X table has duplicate values in Y field. We need to add a unique key constraint to it and catch the exception in module Z"
into
"your report has duplicated products because someone captured the serial number twice. I'm going to add the missing validation so this doesn't happen again".
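To make the developer-side sentence concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table `products`, the column `serial_number`, and the function `capture_product` are hypothetical stand-ins for "X table", "Y field", and "module Z"; the point is only the shape of the fix: a unique constraint in the database plus code that catches the resulting error instead of storing a duplicate.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The UNIQUE constraint is the "missing validation": the database itself
    # now refuses a second row with the same serial number.
    conn.execute(
        "CREATE TABLE products ("
        " id INTEGER PRIMARY KEY,"
        " serial_number TEXT NOT NULL UNIQUE)"
    )
    return conn


def capture_product(conn: sqlite3.Connection, serial_number: str) -> bool:
    """Insert a product; return False instead of crashing on a duplicate."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO products (serial_number) VALUES (?)",
                (serial_number,),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate serial number: tell the user rather than storing it twice.
        return False


conn = make_db()
print(capture_product(conn, "SN-001"))  # True
print(capture_product(conn, "SN-001"))  # False
print(conn.execute("SELECT COUNT(*) FROM products").fetchone()[0])  # 1
```

On an existing table, the duplicates already captured would have to be cleaned up before the constraint could be added.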
If Dave built the system and its features from nothing to something, which is where the difficulty lies, then he probably deserves more praise.
I think polishing or even refactoring an existing solution that now has well defined and understood required features, corner cases, and quirks will always be vastly easier than the unknowns involved in building something from scratch.
In all the places I've worked over the last decade Dave would have many, many incident reports against his name, and all the root cause investigations would not look good for him either.
I wonder if the difference lies in whether the business specializes in software. In a business that doesn't, I don't think they would invest much in investigating. Dave might be the only person qualified to do such an investigation, after all.
It could also be a matter of the ratio of tech work to be done to tech workers available. In an IT department that has too much pending work, it might be more worthwhile to implement quick crap solutions than to go back and properly fix stuff. You could call this a failure to properly fund such a department, but it is likely to happen in a business where the executives (or whoever controls the funding) don't really understand the difference between crap tech work and good tech work.
This would presume that the person doing the firefighting is the original author of the software. That you're always on call specifically for your own code.
In most ops roles, Dave would be fixing someone else's broken-ass system, and anyone else in the rotation would have a similar number of "incidents" against them.
I worked at a place where the chief architect was exactly Dave. Our core app was a pile of crap he had written that routinely got gummed up, and he heroically "fixed it" and all the execs raved about him. The poor saps who supported the app (and I as well) knew the app leaked like a sieve and was fixable, but he wouldn't allow it to be fixed because then he would have to admit it was his fault.
Over several decades I have been burnt many, many times trying to be the "Amy".
Because Dave isn't blamed for the incident reports - either the original reporter is, or no blame is assigned.
Dave gets all the accolades though.
Human nature at its "finest".
I now work at a place where that's no longer an issue for me because "blaming the messenger" isn't a management culture where I am now.
I worked at a place where I was able to focus on maintainability for a few months. We were seeing about 30 PagerDuty alerts per month. Through a combination of bug resolution and auto scaling, we got it to an average of fewer than two pages a month. Moving to AWS Aurora removed the final two.
Eventually the entire system was self-supporting... and was retired a year later.
Even with a suitable reward system, one still needs the skill to find and fix bugs that cause systemic issues. Many times, a company opts to get developers who are good enough - they can build the system, and then they hand it off to some Ops team that maintains it, and good luck trying to get them to fix any of the bugs introduced over the course of the project.
What should be rewarded is the skill necessary to architect software in such a way that it is easy to maintain. I often ask myself: does this decrease the effort needed by the reporting team to create reports? Does it isolate features and make it easy for devs to find the source of issues and fix them? Is it modular enough that users can expect a quick turnaround time for new features?
What I've found is that many developers, even those who believe they should be seniors and above, are unable to think from this perspective, or frequently think it's "impossible" to make software that addresses and solves the issues of all groups. In my experience, it is never impossible. It is all about having the right architecture, which can take weeks to fully think up and realize, and may still change afterwards, but the design of the software is what I usually choose to reward.
> frequently think it's "impossible" to make software that addresses and solves the issues of all groups.
I find this increasingly to be the case.
The fashion now is to do everything with MVPs and sprints.
The big hidden assumption with this style of development is that any problem can be solved using incremental 'local searches'¹ through the design space.
This is definitely not true, especially for co-optimization problems like you describe above².
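The "local search" point can be illustrated with a toy hill climber that only ever accepts small improving steps, so where it ends up depends on where it starts. The `score` function below is made up purely for illustration (a small peak at 3 and a taller one at 15) and is not a model of any real design space.

```python
def score(x: int) -> int:
    """Toy 'design quality' over integer designs 0..20.

    Small peak at x=3 (score 10), taller peak at x=15 (score 30).
    """
    return max(10 - 3 * abs(x - 3), 30 - 3 * abs(x - 15), 0)


def hill_climb(start: int, lo: int = 0, hi: int = 20) -> int:
    """Move to the best neighbour (x-1 or x+1) for as long as it improves the score."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbours, key=score)
        if score(best) <= score(x):
            return x  # no neighbour is better: a local optimum
        x = best


print(hill_climb(0))   # 3: stuck on the small peak
print(hill_climb(10))  # 15: finds the taller peak
```

Starting near 0, every incremental step looks like progress, yet the climber settles on the small peak and never discovers the better design, because reaching it would require accepting a temporarily worse one.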
i started my career in electrical control systems (relays, mcc design, etc) and have a very different take on maintaining a system as a result. we maintained systems as a matter of safety and productivity. this isn’t just a truck roll to check terminations and maybe do a thermal scan, but revisiting the system design when a change was ordered. will this create a safety concern? will it cause a production stop? will the lines go down?
after a number of years in (mostly) software i’ve observed less tolerance for that level of analysis. i understand why it can/will impact velocity, and that the cost to change software (and time) is a fraction of that with physical systems...but still...a part of me winces whenever i’m like “screw it, i’m merging this sans impact analysis” ;)
Early in my career, I got stuck doing a lot of software maintenance on legacy systems (I suspect this permanently stunted my professional growth, but this was during the great recession and I took whatever work I could find). It's a hard rut to get out of - I spent a lot of time developing relevant skills in my spare time.
Anyway, one place I worked had a 10-page procedure for building the system. It involved running several scripts, logging into 2 or 3 different servers (we were running Solaris 8 - which was already quite old by then) as different users. About half the procedure was written down; the rest was unspoken and passed down via oral tradition.
A particularly intelligent and motivated guy once took it upon himself to unravel the whole mess and come up with a sane build solution. Unfortunately, his quest was not approved by management, the time taken away from bug fixing caused him to fall behind on "points" (i.e., number of tickets closed in the bug tracker), and he was laid off. So we stuck to the old system that sort of worked, most of the time.
Now, one of the user accounts that was required to build a certain module belonged to a person that no longer worked at the company (let's call him Steve - I don't remember his real name). We were certain there was a permissions or environment issue somewhere, but were unable to track down the specific reason why only Steve's user account could successfully build said module. We were successfully ignoring this obviously absurd situation until we got a new sysadmin....
The new sysadmin was unfortunately a very motivated guy who wanted to set things in order. He reviewed all of the user accounts, noticed "Steve" no longer worked for the company, and deleted his account. We were unable to build until the backup containing Steve's account was restored.
We never fixed the FUBARed build procedure, mostly because our customer (this was a defense contract, so the govt was the customer) didn't want to pay for such an effort - everything we did had to be tied back to a specific req number in our system requirements specification (SRS). Fixing technical debt was not covered in the SRS, so it never got fixed.
Eventually I was caught up in a mass layoff and went on to better things :)
1) Long term software maintenance should not be treated like help desk ticket work nor should it be dumped off on junior programmers. Every company should have at least one guy who enjoys maintaining legacy systems and that should be his sole job.
2) Make sure fixing technical debt is covered in the SRS.
> Every company should have at least one guy who enjoys maintaining legacy systems and that should be his sole job.
That's a career death sentence. I suppose if I was retired and wanted a part-time gig for extra money/amusement, I'd do it. But one thing I learned from working maintenance early in my career was... don't do maintenance work.
I think it depends on the KIND of maintenance work. In my case, I have a few languages under my belt (both OOP and FP). But this means I have a reputation in the area as a "Mikey for Maintenance: He'll fix anything"
Which actually wound up giving me the opportunity to study a lot of both good and bad designs.
I still maintain things, but now people ask me for design advice and help with POCs too.
Which, trust me, is less stressful and more socially rewarding than being a technical lead on a project (done that too).
But that's not a common thing. I do agree with GP's assessment that there are good maintenance devs. But they aren't common, and it can be tough to tell if someone with 20 years of maintenance actually enjoys it... or just could never be good enough at it to get a chance at a different role.
if by maintenance work you mean "patch up the system so it keeps trudging along", then sure, it sucks.
And I feel that when people refer to maintenance work that is what they end up doing.
I like improving systems (I think it is a much bigger challenge than making a new system from scratch). I suppose this is a kind of maintenance work.
But I am still trying to figure out how to get the green flag to do this. I often want to re-architect big parts of the system, but at the same time, I don't want to get overworked (nobody does, really).
I find that I don't really have an incentive to refactor a semi-large part of the code into a better design (how do I even know if it's better? seems subjective). My incentives (deliver the story, complete the sprint's points) push me to just change as little as I can (patch it up) and then forget about it instead of refactoring a lot of it.
And every time I have done that I keep thinking as I do it: "this is exactly how this code got to this point. A few generations of developers with no incentive to refactor this from a slightly bigger picture, just carefully hack in what the story requires and move on". aah the joy of software engineering in real life :/
It is. Also, people don't want to do it for any amount of money.
I quit a previous job and took less money because I got moved to maintain the crumbling Forte4GL core system - I did get a handsome raise for that, but the writing was also on the wall, as they were also working to replace it. The migration project has been ongoing for 8 years and is nowhere near done; I might have been able to retire before it's completed, though. It was also soul-sucking.
> It is. Also, people don't want to do it for any amount of money.
Like many things, there are people who are actually interested in doing this kind of thing. But since it isn't rewarded in any way (hell, it's actively punished by many organizations, often when you need to look for a new job and lack the bullet points that get you hired), even the people who would actually like to do it won't be willing to.
But there are people who love doing it. My wife is one example. To her it is a great mystery novel where she gets to tease out all of the details and try to understand why things were built the way they were. Then she tries to make them 'better'.
Of course, she was a government employee (mil) and her career path was set regardless of what she actually did and the pay level was set in stone regardless of what she did as well. So there is that.
> Of course, she was a government employee (mil) and her career path was set regardless of what she actually did and the pay level was set in stone regardless of what she did as well. So there is that.
I mean, this is only relevant insofar as that this position allowed her to pursue her interest in what kind of programming she wanted to do. In any private company, the incentives are so perverse that she wouldn't have been able to do any of it, but that wouldn't change her preferences. It's just further indication that, given the right environment, there are people willing to do this kind of thing.
This. I like this too. But then, in practice, I very rarely get a green flag to actually make it better. Not until it's do or die. And by that point necessity forces a rush job, so it's not actually made better, just patched up to live another 'day'.
Only if you optimize for your current salary instead of optimizing for your future career prospects.
There isn't a reasonable amount of money (it would have to be twice my current salary) for me to accept a job doing VB6 or any other job that would endanger my future employment prospects.
If I can retire on my savings after the job is done, I don't care about future prospects. I'd rather work on [insert any hobby with no commercial value] at home than go to work.
Exactly, that’s why I said any “reasonable” amount. It usually takes me less than a month to find a job as a bog standard “enterprise developer” when I need the “right now” job or contract. It would take me a lot longer if my resume showed that I’ve been maintaining legacy software for two years and wasn’t doing Resume Driven Development.
I would have to spend a few months doing some work I could post on Github using new technology (something I never do).
The other take away is this; for many situations, investments made toward cultivating the existing system can save the company more money in personnel and run time costs than can be gained from developing new functionality for the same system.
I've worked for Gov, Consultancies and Private sector clients. They all have shared part of the blame at some point.
But I've specifically seen companies charge £75k for firewall changes or server upgrades, when the reality was they were already paid £10,000,000 a year to manage servers that a couple of good Puppet admins could have handled.
Point being, it's sometimes not about not wanting to pay, but simply about not wanting to pay for something that should have already been done.
Of course, this all comes back to the fact that they gleefully outsourced their tech competency, based on the recommendation of consultancies, because of their lack of long-term strategic management, so they apparently get what they deserve.
It's a shame that Gov is spending tax money though.
Outsourcing of technical capability (especially prevalent in government) bites you hard when you lose the ability to work directly with your contract teams and have to rely on their direction. (Especially if the management is non-technical)
There’s a clear conflict of interest on the part of the contractor, and you need to be able to set their direction competently. I would recommend always having an in-house capability, even if it’s just an in-house consultancy, so that you don’t run into these kinds of issues and can provide adequate direction / oversight.
The GS payscale is not set up to be able to hire and retain highly skilled technical experts. Even if I were to be hired at the GS-15 level, I'd take a pay cut from what I earn as a contractor.
I once worked on a contract in a Systems Engineering and Technical Assistance (SETA) capacity, which was both an odd and eye-opening experience - basically advising the govt customer, as a technical expert, on contractors' proposals and sitting in on reviews and such.
That's a good fallback option, though it's still difficult to control turnover and feel like you are aligned in terms of value using that approach. You'd want to find somebody that you trust and bring them on in that special capacity, rather than just putting an RFP on the street for a SETA advisor.
Agreed on the GS payscale and its ability to keep up with high-end private sector pay though I also think that the entire hiring process may be more of a barrier than the pay itself.
All that said, a non-supervisory GS-15 in the DC area (18% locality pay bump - more if you're in SF or NY) is maxed out in the 170s, which isn't shabby!!
When you combine that wage with generous retirement matching options, job security, good work/life balance, and excellent health insurance at well negotiated rates it ends up being a pretty attractive offer.
Furthermore, there are other government agencies and roles that have their own specialized payscale outside of the GS scale. This can include financial regulators (higher pay to avoid revolving door problems) as well as "cybersecurity" billets across different places.
I find the trick is to do maintenance while doing meaningful (as described by the powers that be) work. So when a new feature or bug fix is asked for, add another 15% to the time required, and clean up the code base a bit.
The knee-jerk, which rarely works, is to demand time to refactor everything or ignore everything until it breaks irreparably.
I think in your case, simplifying a script by a dozen lines a week or so could have been done without taking away "points". Abandoning the status quo and committing too much time will inevitably be looked down on in most industries, however.
The poster was working on a military contract, not for his own company's product. His company's money - and thus his bonus - comes from billing hours to do what is requested. Saving the client money means they get paid for fewer hours now that the project is smaller, so the bonus is smaller because they are paid less.
In private industry it can (but often does not) mean that you get more profit, and thus a larger bonus, by saving your company money long term. However, in the world of contractors, saving the client money is not rewarded.
this story is so familiar that it’s not even funny. I have also seen this while the project is being developed. No incentives for fixing stuff that would exponentially increase the productivity of everyone - just keep adding shit until you move on. Pretend like learning the obscurities of a home-brewed/hastily hacked mess matters.
OP mentioned the work was on legacy systems, and many of the top jobs that everyone wants are likely going to be doing new development with new(ish) tech. I'd imagine doing legacy work, especially at a non-tech company, will not look very good on your resume to a tech company (whether that be a startup, small software firm, or big n).
> ... doing legacy work at a non-tech company will not look very good on your resume to a tech company...
This, more or less. Maintenance also tends to be a very low-visibility activity at most companies as well. Promotions go to folks who can make a name for themselves working on the company's hot new project.
It took a while to find a job using modern tech when I had a resume full of things like Solaris 8, Ada95, ClearCase, and Oracle. I finally found a company to give me a chance doing new development using modern tech (well, Java, at least) back in 2012 (in my early thirties). My career has been doing much better since, but I feel like I'm far behind where I should be for someone my age (not in skill, mind you, but in pay, role, status, etc).
Interestingly this is also related to employee churn. New devs coming to our company usually don’t want to “maintain and fix” the code of others. They’re keen on building new stuff with the newest technology/stack. That adds to the problem.
At your annual review, "I kept our rickety leaning tower of tech debt from falling over" doesn't sound as good as "I deployed blah blah systems to (do whatever)".
This is bad, of course, but it's generally how things go.
Even more so, when you're interviewing somewhere else you (and usually they) want to talk about new things you deployed or new systems you created, not how you maintained this boring code base for 2 years.
I've had good results highlighting performance problems and security holes I've fixed.
Engineers also appreciate improvements I've made just for my own sanity, like making the build pipeline faster.
None of these were things I did because someone told me to; they were things I did because I cared. Meanwhile, plenty of my peers ignored the problems or couldn't be bothered to fix them.
Newest technology isn't always the problem. The problem is always the choices and incentives you have.
In every project I lead, I always ask how it will be maintained when I'm gone. So I tend to "force" the project owner to spend money/time on simplifying unmaintainable parts of the infrastructure. They're always happy about these changes after the fact, however.
It's also the opposite of what most big consulting firms do, because they do want to maintain a steady revenue stream from their clients.
It is because maintain and fix doesn't raise your profile as much. People do what they are incentivized to do. If there were extra points at review time for maintenance, people would refactor old code all the time.
I might enjoy making systems cleaner and more maintainable but if the incentives for my next promotion are "lead new large scale project to ship something a few times and maybe you can get a promo", why bother?
It's probably one of the worst parts of large-ish organizations
Really? I'm quite the opposite: I love to look through old code, add missing test cases and fix minor issues.
It helps me greatly to learn the ins and outs of the software.
>I love to look through old code, add missing testcases and fix minor issues.
You've been working with decent code, in that case.
The legacy stuff I come across isn't even testable without massive refactoring, even if the company allows you to do that in the first place.
Also, deprecated libraries for which there is no upgrade, which leads to further framework support issues (e.g. you can't upgrade X 'cos it doesn't support Y, where swapping out X would take significant effort).
The worst problem I come across is spaghetti code where the business logic is ridiculously hard to follow, and it's not entirely clear if something is intentional or not, till you inevitably break it, and get an angry email from someone you didn't even know used the system.
I like the definition of legacy code as untested code, since it means your fresh pile of untested crap is immediately legacy code someone will have to deal with later -- often yourself.
Refactoring (for testing or not) is something you can do without the company 'allowing' it or adding 'dedicated trust work time' to the schedule. Sure, it does need buy-in from whoever reviews/signs off your commits if they bother to ask why you did that thing, and in pathological cases you'll have to keep a test suite to yourself / a local environment, but the social issues seem less of a barrier than people just throwing their hands up at perceived technical issues. I like recommending Feathers' Working Effectively With Legacy Code since it serves as a technical reference of solutions to the usual technical challenges/complaints that hang people up.
That's pretty tame. I once got a bug like this: some contractor wrote this, got mgmt to get a new, insane deployment procedure scripted, and deployed it on completely the wrong server. Some days it produces a report; other days it swallows 16GB and dies.
After he left we wrote the contents of his hard drive to this stack of CDs, and they are scratchy enough to have data loss. Of course the prod executable differs from all variants of code found on the CDs.
It turns out to be a custom reporting framework for producing exactly 1 report. For extra speed, it starts 100 threads but doesn't have any synchronisation. Which turns out not to matter, as they are all fighting over 1 DB connection.
You are not allowed to touch prod directly, but have to get some Unix guy on the phone just to recover logs, executable, and config. And as that guy has to support this mess, he hates your guts by association.
I actually quite enjoy the true mysteries that nobody has touched for years. It's a situation where you're expected to slink off into a corner and swear quietly for a few days, and nearly-certain hero status if you pull it off.
The more frustrating situation is needing to jump in and make a one-off fix to something that is at least notionally "modern". These can be just as impenetrable, but you have to hold your tongue much more (especially if some of the developers are still around), and there's generally a lot less kudos for sorting things out. Personal "favourite": debugging stuff that's virtually impossible to build locally "...because we've got this great CD pipeline".
Hah, I have the opposite. New devs come in and want to refactor everything. We should make our whole react codebase functional instead of class based, because it’s clearer.
That's what you get for choosing React. Many ways to do the same thing, many alternative packages. Extra layers of complexity just to bind REST JSON to a control. Good CV booster though.
At the only company where I was tasked with "maintaining and fixing" the code of others, parts of the codebase were '90s black boxes nobody would touch with a 10-foot pole, and simply setting up the proper environment to reproduce a bug could take up to two days.
Then some weird error relating to some obscure fax-sending module repurposed as an email templating service would crop up somehow and have to be fixed before I could go back to debugging.
So, yeah. I churned, and went on to build new things.
New devs almost certainly shouldn't be given maintenance and repair tasks. How can someone know all of the business logic and needs a given solution provides within a month of starting?
This problem is because the old guard wants to move on from the problems they created. They want to play with new toys. So they hire new devs to do the work they don't want and "free" themselves for greenfield development (i.e. the exact opposite of how it should be managed).
> New devs almost certainly shouldn't be given maintenance and repair tasks.
This would just massively increase churn. Old guys get sick of doing the same old crap especially when seeing the new guys get all the good stuff. So it increases the likelihood that people will just move around a lot more often in order to build that "new stuff". Eventually you'll have lots of companies saturated with "new stuff" that's no longer so new and nobody to maintain it because everybody knows the second you're too old in the company you get saddled with maintenance tasks.
> the old guard wants to move on from the problems they created
Just because it's old doesn't mean it's a problem. It still has to be maintained though, with higher effort since the technology may be outdated. Even the best designed systems become a chore to maintain when you have to do it for 10 years instead of doing anything new.
There's a difference between handing new engineers ownership of maintenance tasks and training them to maintain an existing system. The age of a system isn't the problem. It's the baked in processes that are critical to the business that a new engineer could not have learned from outside of the company. Management sees that baked in knowledge as "some light configuration" when it is often far more complicated than whatever system it was built on.
that’s okay. they get brought in and save the day when the new guy messes up. the classic definition of win-win: you’re off the monster you created and doing things with exciting technology AND you are a hero for saving everyone on a regular basis.
It sucks being the old hand as well. I was once sent to a customer's site to fix a deployment that had taken too long. It was for a telco product that required a couple racks of UNIX machines and Cisco routers.
They had sent a new guy with essentially no training and he had not done a bad job. He was just new and needed time to get up to speed. When I started working with him he clearly knew what he was doing. But we got everything working 1 day after I showed up, because I had been deploying this product for a few years already.
Management basically praised me and shat on the new guy. I felt like crap and took new guy aside. We became friends, but this hurt his review later on.
This can be good or bad depending on how it’s done. If the new code addresses the same issues as the old code more succinctly, or if it can cut out some unused code that the original solution had, then it can be a net improvement. E.g., refactoring an NIH solution to use new libraries that didn’t exist originally.
If it’s replacing sound code with a half-thought-out implementation, maybe not so great.
As I get older the simpler things seem so much more attractive. For example, a recent project is using lists of dicts (plain Python built-in types) because it's simple and easy to think about. It has recently got to the stage where "we should use pandas dataframes" is entering my brain, but the simple approach has meant that we focused on the business issues of handling the data, not the issues of a large third-party code base (even a well-run, well-thought-out one).
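A minimal sketch of that "lists of dicts" style, using hypothetical order records and only the standard library. The pandas equivalent would be roughly `df.groupby("customer")["amount"].sum()`, which is shorter but brings in the large third-party dependency the comment is weighing against.

```python
from collections import defaultdict

# Hypothetical records: each row is a plain dict, no third-party library involved.
orders = [
    {"customer": "acme", "amount": 120.0},
    {"customer": "globex", "amount": 80.0},
    {"customer": "acme", "amount": 30.0},
]


def total_by_customer(rows):
    """Sum the "amount" field per customer using only built-in types."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


print(total_by_customer(orders))  # {'acme': 150.0, 'globex': 80.0}
```

Everything here is visible in a debugger as ordinary dicts and floats, which is the "easy to think about" property the comment values.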
Maintenance starts with not adding complexity to your code.
I agree, with a caveat: don't do dumb things performance-wise.
The other day I worked with a Lisp code base that used built-in lists for math-heavy code. The result had garbage performance and even looked less readable than proper array-based code that loops on stuff imperatively.
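The comment doesn't say exactly why the list-based code was slow. One common cause is indexing into a singly linked list (Lisp's `nth`), which walks the list from the head on every access. A minimal Python sketch, with a hypothetical `Cons` class standing in for Lisp cons cells, counts the link hops an indexed dot product needs compared with array indexing:

```python
class Cons:
    """Hypothetical stand-in for a Lisp cons cell: a value plus a link to the rest."""

    __slots__ = ("head", "tail")

    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail


def from_iterable(xs):
    """Build a cons list with the same order as xs."""
    result = None
    for x in reversed(list(xs)):
        result = Cons(x, result)
    return result


def nth(lst, i):
    """Return (element i, hops taken); like Lisp's nth, this walks i links."""
    hops = 0
    for _ in range(i):
        lst = lst.tail
        hops += 1
    return lst.head, hops


def dot_indexed_list(a, b, n):
    """Indexed dot product over cons lists: O(n^2) link hops in total."""
    total, hops = 0.0, 0
    for i in range(n):
        x, hx = nth(a, i)
        y, hy = nth(b, i)
        total += x * y
        hops += hx + hy
    return total, hops


def dot_array(a, b):
    """Same dot product over Python lists (arrays): O(1) indexing, O(n) total."""
    return sum(a[i] * b[i] for i in range(len(a)))


xs = [1.0] * 100
print(dot_indexed_list(from_iterable(xs), from_iterable(xs), len(xs)))  # (100.0, 9900)
print(dot_array(xs, xs))  # 100.0
```

Walking both lists together in a single pass (as `mapcar` would) is O(n) as well; the quadratic cost comes specifically from index-style access on a linked structure, which is the pattern math-heavy code tends to reach for.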
Threads like this really help me appreciate my workplace and stay confident in applying pressure to reduce technical debt.
I've been the lead at a ~30-person, ~7-dev, ~$5M/yr company for 10 years. As we've grown from fewer than 10 people and 2-3 devs, I've had to transition from doing whatever I felt was needed as I found the time, to getting approval across our departments and scheduling work across quarters and years, and across another handful or two of people. But I've always been afforded a good deal of trust in my opinion and understanding of the costs and risks of technical debt.
I often second-guess my emphasis on improving existing code. To come even remotely close to keeping up takes a lot of developer and QA time and has opportunity cost.
Deep down I believe it's vitally important, but there's a nagging voice asking if this is really worth it. Sometimes it turns out not to have been. More often I look back and think that part we made better could've used even more time investment.
Personally, I enjoy maintenance work. It's an opportunity to refine and improve ideas I or others had previously. It's less uncertain - the big picture is there, it has history, we've seen how it's expanded and changed over time already, I don't have to try to predict the future as much as I make architectural decisions.
Completing a significant refactoring effort is every bit as gratifying to me as launching something new. Staying in one place for a decade has given me opportunity to see through efforts that take years before all the pieces are in place.
Everything has economic tradeoffs, and triage is needed, but a big problem seems to be a deep-seated belief that the maintenance you can achieve in x time/budget won't actually pay for itself, i.e. that it won't prevent enough later issues that would cost more than x to address. Or, similarly, that the savings are not large enough or frequent enough to create a feedback loop of compounding savings.
I don't want to discount just building things better to begin with though. For all the buggy stuff out there limping along there's also plenty of stuff that simply doesn't need maintenance, and another lot of stuff that only needs trivial maintenance.
Well, maintenance can have many different consequences. It can save more than it costs on disaster aversion, it can save more than it costs on small problems, it can save less than it costs on a disaster or small problems. If done earlier it can reduce the cost of future maintenance by much more than it costs, or it can reduce by less.
Doing it right in the first place also has all those possibilities, plus the risk that you may discover you spent a lot of time doing the wrong thing completely right.
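The cases above can be made concrete with a rough expected-value calculation. This is a minimal sketch, not an established model: the function name, parameters, and all the numbers are hypothetical, and real estimates of incident probability and cost are exactly the hard part.

```python
def maintenance_net_value(cost, p_incident, incident_cost, risk_reduction):
    """Expected savings minus cost for a single maintenance task.

    cost:           effort spent on the maintenance (e.g. engineer-days)
    p_incident:     probability the problem bites within the planning horizon
    incident_cost:  cost of dealing with it if it does (same units as cost)
    risk_reduction: fraction of p_incident removed by the maintenance, 0..1
    """
    expected_avoided = p_incident * risk_reduction * incident_cost
    return expected_avoided - cost


# Hypothetical numbers: 10 engineer-days of cleanup vs. a 100-day incident.
print(maintenance_net_value(10, 0.20, 100, 0.9))  # positive: pays for itself
print(maintenance_net_value(10, 0.05, 100, 0.9))  # negative: costs more than it saves
```

This deliberately ignores the compounding effect mentioned above (maintenance lowering the cost of future maintenance), which is arguably where much of the value lies.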
I'd say in a case like this, it's an institutional issue. If you aren't getting support from your organization for incentivizing and clearing the path towards maintenance work, then it quite simply isn't your problem anymore.
Exercise, eating well, sleeping enough, socialising and having a little fun are often treated as luxuries in many cultures yet are essential to maintaining healthy living.
Developers who do maintenance are seen (by other, typically greener, developers[0]) as being less skilled than those who work on "new" projects.
Personally, I prefer a good mix. I like new development, but I also like helping a business thrive and so I know that maintenance is a part of that.
[0] I am not convinced that most managers see it this same way. I think most managers see the people they want on maintenance as some of their most valuable employees.
The article has some good points, but I think the primary issue is economics: most organizations are too small for the cultural aspect to override economic concerns.
Maintenance doesn't provide immediate economic value: not to customers, not to board meetings, not to developers, not to managers' promotions.
Whether developers like myself like it or not, maintenance is a strategic investment driven by long-term policy, which many organizations lack or can't afford.
Joel Spolsky once said "I can’t tell you how many times I’ve been frustrated by programmers with crazy ideas that make sense in code but don’t make sense in capitalism". I think it applies fairly reasonably to the cost of maintenance.
By the time that outstanding tech debt becomes truly problematic, the original authors have most likely moved companies. I think a lot of people know this deep down, which is why so many of them don't go as far as building things in a maintainable way. There's economics for the company, and then there's economics for the individual engineer.
Similarly, fixing that technical debt will likely look like pure overhead while the benefit of "other people will finish future features faster and with fewer bugs" gets socialized, realized diffusely and goes largely unrecognized (except, if you're lucky, by the other developers themselves).
I learned that one the hard way. Features get priority because features have got visibility.
The picture gets even worse when you consider the probability that fixing technical debt will break things.
Even if they were still at the same company they'd have moved on or been promoted so it's unlikely they'd be asked to maintain.
In my own experience, though, it is one reason I left my last job: 2 years of new development, then 3 years of maintenance with no end in sight. GPU-related stuff; new GPUs come out with new drivers all the time, and I was working around their bugs so the people using my code didn't have to deal with them. There is some feeling of reward, but it's mostly no fun.
Even in my personal projects, the more I have, the more maintenance piles on. Browsers break things, or I get warnings about libraries. Each project adds X more notifications a month. At some point I'll be overwhelmed with maintenance notifications and have no time for anything else unless I ignore them.
> Maintenance doesn't provide immediate economic value
Reading Taleb's Skin in the Game helped me articulate to my team and managers how tech debt is a huge risk and why we need to find sustainable ways to pay it off. For the last few years I have insisted that in every sprint one engineer is dedicated to working on maintenance. Also, once every year we do a 1% hackathon (2-3 days) where we work on low-priority tickets that have not been worked on for a long time. This stops low-priority tasks from getting starved out. They impact only a very small number of customers, but added up they do end up increasing the overall satisfaction of all customers.
Joel said that in the context of begging programmers to learn some micro-economics. It's fair to treat maintenance as an economic tradeoff, but the keyword is tradeoff: it's not just a list of pros and cons, but interlinked relations (just like supply and demand) where doing more/less of X necessitates more/less of Y.
Beyond that, real engineering programs will also at least have some sort of 'engineering economics' component that, if nothing else, exposes one to the fact that, say, a mechanical engineering company will use present value / future value / maintenance cost / replacement cost / lifetime cost calculations and estimates all the time for machinery and so forth, some of which could be applied to software.
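A minimal sketch of the two policies described above, assuming a simple round-robin rotation and tickets represented as dicts with hypothetical `priority` and `opened` fields; the comment doesn't say how the engineer or the tickets are actually chosen, and the engineer names are made up.

```python
from datetime import date
from itertools import cycle


def maintenance_rotation(engineers, sprints):
    """Assign one engineer per sprint to maintenance, round-robin."""
    rotation = cycle(engineers)
    return {sprint: next(rotation) for sprint in range(1, sprints + 1)}


def oldest_low_priority(tickets, n):
    """Pick the n oldest low-priority tickets for the yearly hackathon,
    so they don't stay starved behind higher-priority work forever."""
    low = [t for t in tickets if t["priority"] == "low"]
    return sorted(low, key=lambda t: t["opened"])[:n]


print(maintenance_rotation(["ana", "ben", "chen"], 4))
# {1: 'ana', 2: 'ben', 3: 'chen', 4: 'ana'}

tickets = [
    {"id": 1, "priority": "low", "opened": date(2019, 3, 1)},
    {"id": 2, "priority": "high", "opened": date(2018, 1, 1)},
    {"id": 3, "priority": "low", "opened": date(2017, 6, 1)},
]
print([t["id"] for t in oldest_low_priority(tickets, 2)])  # [3, 1]
```

Rotating the maintenance seat also spreads knowledge of the older parts of the system across the team, rather than concentrating it in one person.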
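As an illustration of the kind of calculation meant here, a minimal present-value sketch comparing two hypothetical options: a cheaper build with higher yearly maintenance versus a more careful build with lower yearly maintenance. The 5% discount rate, five-year horizon, and all costs are assumptions chosen for the example.

```python
def present_value(amount, rate, years):
    """Discount an amount paid `years` from now back to today: amount / (1 + rate)^years."""
    return amount / (1 + rate) ** years


def lifetime_cost(upfront, annual_maintenance, rate, years, replacement=0.0):
    """Present value of upfront cost, yearly maintenance (paid at the end of
    each year), and an optional replacement cost at end of life."""
    pv_maintenance = sum(
        present_value(annual_maintenance, rate, t) for t in range(1, years + 1)
    )
    return upfront + pv_maintenance + present_value(replacement, rate, years)


# Hypothetical options, costs in thousands:
quick = lifetime_cost(upfront=100, annual_maintenance=40, rate=0.05, years=5)
careful = lifetime_cost(upfront=160, annual_maintenance=20, rate=0.05, years=5)
print(round(quick, 1), round(careful, 1))  # 273.2 246.6
```

Under these made-up numbers the option that costs more upfront is cheaper over its lifetime; with a shorter horizon or a higher discount rate the comparison can flip, which is the tradeoff the comments describe.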
Since Joel's "Joel Test" includes things like "Do you fix bugs before writing new code?" I doubt code maintenance was high on his mind when he wrote the bit about crazy ideas. Can any company even answer yes to that these days? Do they have a strong economic argument? (Best I see is "P1 and P2 bugs closed before release".)
I don't think so. Tech debt isn't like real debt, you don't have conspicuous interest rates set by the market or state. Also, "future discounting" of both engineers and managers is driven by career options (i.e. promotions and job-hopping), which don't directly depend on interest rates.
> you don't have conspicuous interest rates set by the market or state
Not explicitly, but you could say that the interest rate on technical debt is determined by the rate of change in connected components (be it build-time or run-time dependencies, data producers or data consumers).
I don't think we're going to make it on this planet if we think of maintenance as a crazy idea that doesn't make sense in capitalism. We don't have to see capitalism as the be-all and end-all, much as it seems like that right now.
I worked for a company at one point where there were two teams. One team was maintaining the company's main product, which was consistently making positive returns for the company year after year. The second team was working on greenfield projects whose value was entirely speculative. I started on team two, and quickly switched to team one.
After a year when the layoffs came, which team got cut?
The statement was about getting a promotion, not about avoiding a layoff.
I've also seen the opposite of your scenario, where a company (almost blindly) cuts staff, including IT/Engineering, lets go of some of the very key people (sometimes the ONLY people) who have the intricate knowledge needed to maintain the system, and then gets caught in all kinds of problems.
In one case, I walked into a team that was maintaining a component and they had to claw back the actual laptop from the guy who left and attempt to dig up uncommitted source code that was needed to maintain the system.
It will impress your fellow engineers but management won’t care at bonus time. It’s the classic sysadmin’s dilemma: the better you are, the more the organisation takes you for granted.
They will say that you did a good job fixing the issue but most mid level managers won't give you a bonus for that. In fact, in the next breath, they will criticize you for not putting in enough effort into introducing new tech. Fixing the problem is an opportunity cost.
The only exception I have seen to the above is if you catch the eye of top level engineering management by your actions.
I have seen major problems lingering for years because management's actions have shown that they don't value problems being fixed.
Furthermore, when bugs become a priority, the metrics they care about are bug fix time. You are better off not being associated with a difficult bug because it lowers your close rate and lengthens your average time to close bugs. Perverse incentives.
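A tiny worked example of that perverse incentive, with hypothetical numbers for one quarter: engineer A closes ten easy bugs at a day each, while engineer B spends a month on the hard bug and closes two easy ones besides. The metric definitions are assumptions; the comment doesn't say exactly what was tracked.

```python
from statistics import mean


def bug_metrics(days_to_close):
    """Return (bugs closed, mean days to close) for one engineer's closed bugs."""
    return len(days_to_close), mean(days_to_close)


engineer_a = [1] * 10     # ten easy bugs, one day each
engineer_b = [30, 1, 1]   # the hard bug, plus two easy ones

print(bug_metrics(engineer_a))  # 10 bugs, mean of 1 day
print(bug_metrics(engineer_b))  # 3 bugs, mean of about 10.7 days
```

By both metrics B looks far worse, even if the hard bug was the one actually costing the company money.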
It might not impact your bonus but it can be added to a doc detailing your accomplishments for justifying a promotion (or at least as a potential topic reminder when interviewing), and can help ensure positive feedback from coworkers when management asks them about you. If there's anything generalizable from the experience ("I once saw something similar and so I did such and such to try to confirm or rule it out") you can sometimes multiply the value to yourself and the company by documenting it, automating it, hosting a lunch and learn type of thing...
It all depends, of course. But in any case you have to be the primary person looking out for your own interests, since it's the rare manager who will go beyond the minimum guidance, support, and feedback. On the other hand, some engineers go too far on self-interest and seemingly start the fires that they then 'heroically' fight to get high-level cross-team attention and get fast-tracked up the ladder...
Pretty much everybody in this thread is putting the blame on management. Man, no wonder SV is such a shitshow. Either all management is that retarded or all the developers eschew responsibility for their actions.
I have noticed that the very good maintainers are so indispensable that they hardly ever get promoted or allowed to move on to other things. That bond dies with the maintained product. Unfortunately, for some maintainers it is a layoff sentence.
Here's an amusing personal anecdote I find relevant, and would enjoy reliving by typing it out:
Years ago I had recently started a new job at a clustered database startup as a software engineer. Part of their on-boarding process was to have new hires go through the bug tracker and resolve issues which had been flagged for this purpose. They were supposed to be low-priority low-hanging-fruit type things that jumped all over the code base, good for acquiring some breadth of familiarity while being useful.
Well I got bored with that pretty quickly and started looking for the oldest high-priority bugs. To my surprise there were actually quite a number of them, and it wasn't like they were ignored - they had all been touched by many hands but seemed to be difficult to reproduce according to the comments from engineers.
One in particular looked especially terrible; the database would close connections as if they were idle during the data import process, causing the import to fail. The import was a very common thing as this product was supposed to replace large, existing FOSS databases, by providing a more scalable clustered solution. Customers would be doing this as the first thing they did with the product, and it's failing! And the bug was over a year old.
The thing with import is it's highly optimized because the databases are expected to be huge, so while the logs were showing the connection had been closed due to inactivity, that's very unlikely to be what really happened. But the engineers were all just pointing fingers at the customer side saying there must be a network problem causing the timeout under all the load.
But the way the database was implemented was rather complex with cooperatively scheduled coroutines running in their own per-cpu schedulers. It was very Golang-like, except this was not Golang, it was C, and we controlled everything. Knowing that the architecture was cooperatively scheduled, and imports would be causing a whole lot of write-bound work saturating the cluster on the backend, I figured it was probably a starvation problem preventing the coroutine responsible for servicing the sockets from running in a timely fashion, then some idle socket timer would notice and kill it.
Some instrumentation code later and a synthetic import running with the cluster on my slow personal laptop (you see I hadn't even gotten a beastly company machine yet, so running a cluster on my old personal machine made it very easy to saturate the local cluster), and I easily reproduced the failure with the instrumentation clearly showing it was indeed due to internal socket-IO coroutine starvation.
The way they had the socket-IO getting serviced was only when a given CPU's scheduler was idle. The mostly correct assumption was that there'd frequently be moments where no coroutine was runnable on some CPU, and that would be the best time to service the CPU's sockets. Well, it turns out sometimes that just isn't the case.
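A toy single-CPU model of the failure mode described above, in Python rather than the original C. The tick-based scheduler, the `IDLE_TIMEOUT` value, and the workload numbers are all hypothetical; the point is only that "service sockets when idle" gives no upper bound on the gap between servicing when the run queue never drains.

```python
IDLE_TIMEOUT = 5  # hypothetical: ticks a socket may go unserviced before it's closed as idle


def run_idle_only(arrivals):
    """Simulate one CPU's cooperative scheduler, one tick per loop iteration.

    arrivals[t] is how many write-bound coroutines become runnable at tick t.
    Sockets are serviced only on ticks when the run queue is empty.
    Returns the tick at which the idle timer would kill the connection, or None.
    """
    queue = 0
    last_serviced = 0  # assume sockets were serviced just before tick 0
    for t, new_work in enumerate(arrivals):
        queue += new_work
        if queue > 0:
            queue -= 1  # run one coroutine until it yields
        else:
            last_serviced = t  # scheduler idle: service this CPU's sockets
        if t - last_serviced > IDLE_TIMEOUT:
            return t  # connection looks idle and gets closed
    return None


print(run_idle_only([1, 0] * 10))  # None: light load, the CPU idles often
print(run_idle_only([1] * 20))     # 6: saturated, sockets never serviced
```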
Coded a trivial fix to have a per-CPU timer service all the local CPU's sockets occasionally, like every five seconds or so, just to prevent the scheduler starvation from triggering a timeout, and submitted it for review as a fix to this ancient, horrible, and embarrassing bug.
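The same kind of toy model with the fix added: a per-CPU timer services the sockets every `SERVICE_INTERVAL` ticks even when the run queue is busy. Both constants are hypothetical stand-ins for the real idle timeout and the roughly five-second timer; what matters is that the timer interval is shorter than the idle timeout, which bounds the gap regardless of load.

```python
IDLE_TIMEOUT = 5      # hypothetical idle-socket timeout, in ticks
SERVICE_INTERVAL = 3  # hypothetical timer period; must be below IDLE_TIMEOUT


def run_with_timer(arrivals):
    """Cooperative scheduler where sockets are serviced when idle OR when a timer fires.

    arrivals[t] is how many coroutines become runnable at tick t.
    Returns the tick at which the idle timer would kill the connection, or None.
    """
    queue = 0
    last_serviced = 0
    for t, new_work in enumerate(arrivals):
        queue += new_work
        if queue > 0:
            queue -= 1  # run one coroutine until it yields
        else:
            last_serviced = t  # idle: service sockets, as before
        if t - last_serviced >= SERVICE_INTERVAL:
            last_serviced = t  # timer fired: service sockets even though busy
        if t - last_serviced > IDLE_TIMEOUT:
            return t
    return None


print(run_with_timer([1] * 20))  # None
print(run_with_timer([3] * 50))  # None: even an ever-growing queue can't starve the sockets
```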
Well as you can imagine this started a bit of a shitstorm requiring a lot of arguing back and forth and demonstration of the problem, proof of instrumentation showing this was possible, patch actually fixing it, etc.
The reason I'm telling this story however is that after the dust settled, and an urgent release was shipped with this fix, I had some new friends in high places at that company. One of the cofounders was in charge of the operations and support department and had been fighting with engineering for ages over this particular bug.
Obviously maintenance of neglected but important things can get you promoted. Especially in a smaller company where the important decision makers are watching and still give a fuck.
I wonder how big a part of that shitstorm was due to a "hey look, the new guy thinks he is smarter than us!" attitude. Which is often warranted; way too often, fresh blood charges at Chesterton's fences.
Except this time, the new guy was indeed smarter. Kudos.
Honestly there were many people on the team much smarter than me. It was mostly just a case of a fresh set of eyes, and I think their egos prevented them from seriously considering such an obvious oversight could exist in their masterpiece.
As for the shitstorm, no doubt my being new and irreverent played a part. But the real trouble was the senior engineers had been arguing it can't be on their end for over a year. They weren't stupid, it was just some overlooked details, and like I mentioned, ego probably got in the way. It was going to make them look quite bad regardless of who fixed it, just because of how long it had been ignored while impacting paying customers.
There was a subtlety to the bug I didn't mention. After I had posted the tested fix for review, it became known that someone on the team had already investigated the socket-IO starvation theory and was unable to observe anywhere near long enough intervals between schedulers idling. But what they didn't do properly was measure the idle intervals per-CPU. All they had checked was if any scheduler went idle often enough. The problem with that is every CPU had its own list of sockets associated with coroutines belonging to that CPU's scheduler. So it required checking the interval on a per-CPU basis, and that's when my instrumentation showed the problem under load. They just missed that detail in the early investigation, and nobody ever revisited it. It was literally a mistake of using a global vs. per-CPU "last-idle-ticks" variable in their instrumentation code, oops!
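A minimal sketch of that measurement mistake, with hypothetical per-CPU idle timestamps: merging every CPU's idle moments into one timeline (the global "last-idle" view) hides the fact that one CPU never goes idle, while measuring per CPU exposes it.

```python
def max_gap(idle_ticks, horizon):
    """Longest stretch between idle moments in [0, horizon], counting both ends."""
    points = [0] + sorted(idle_ticks) + [horizon]
    return max(b - a for a, b in zip(points, points[1:]))


def global_max_gap(per_cpu_idle, horizon):
    """Flawed check: treats "some CPU went idle" as if every CPU's sockets were serviced."""
    merged = set()
    for ticks in per_cpu_idle.values():
        merged.update(ticks)
    return max_gap(merged, horizon)


def per_cpu_max_gap(per_cpu_idle, horizon):
    """Correct check: each CPU only services its own sockets when it goes idle."""
    return {cpu: max_gap(ticks, horizon) for cpu, ticks in per_cpu_idle.items()}


# Hypothetical trace: CPU 0 idles every 2 ticks, CPU 1 is saturated by import work.
idle = {0: list(range(2, 20, 2)), 1: []}
print(global_max_gap(idle, 20))   # 2
print(per_cpu_max_gap(idle, 20))  # {0: 2, 1: 20}
```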
There is great value in maintaining old software; from an evolutionary perspective, it is often the critical systems that have a reason to exist. An organism that has been around for a hundred years has a bigger chance of surviving to the next day than something born yesterday. It's counterintuitive until you compare the life expectancy of an oak tree and a fruit fly.
Most principal software engineers I've known have been with the project since the beginning and live in symbiosis with the project.
It's a U-shaped mortality curve. More generally, many things show a Pareto power law distribution: if the thing has existed for 10 seconds, expected remaining life is 10 seconds. If it has existed for 2 days, expected remaining life is 2 days. And so on. It shows up in a lot of places.
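The "expected remaining life equals current age" rule corresponds to one specific power law. A short derivation, assuming a Pareto lifetime T with minimum t_m and shape alpha > 1 (these symbols are introduced here, not in the comment):

```latex
S(t) = \Pr(T > t) = \left(\frac{t_m}{t}\right)^{\alpha}, \qquad t \ge t_m,\ \alpha > 1

\Pr(T > s \mid T > t) = \left(\frac{t}{s}\right)^{\alpha}, \qquad s \ge t
\quad \text{(Pareto again, with scale } t\text{)}

\mathbb{E}[T \mid T > t] = \frac{\alpha\, t}{\alpha - 1}
\quad\Longrightarrow\quad
\mathbb{E}[T - t \mid T > t] = \frac{t}{\alpha - 1}
```

With alpha = 2 the expected remaining life is exactly t, matching the 10-seconds and 2-days examples; other shapes keep remaining life proportional to age, just with a different constant.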
Large enterprise solutions usually have maintainers, or contract a company to do the maintenance for them.
Startups in today's world - not necessarily 'startups', but also new products - move too fast for maintenance; by the time they pivot, the old stack is easier to replace with whatever new fancy system was sold to stakeholders.
Also, most new software churned out today is built from some building blocks, usually community maintained, so no one is going to hire a developer to maintain <insert_fancy_open_source_framework_that_is_heavily_used_and_actively_maintained_by_the_community>.
It is disappointing how their post devolves into the same old tar pit of racial intersectionality nonsense that stunts and atrophies the mind.
"Concierges paid minimum wage" … many concierges make immense amounts of money for what they do, often more than hotel managers. Most people don't even realize that, with overtime, most hotel desk clerks make more money than salaried hotel managers. Not to mention the hotel porters who run the valet, who make immense amounts of money.
It's all just the intersectionality nonsense of privileged, sheltered navel gazers that have no idea what goes on outside of their university and screens.
I really hope that this intersectionality degeneracy will be killed with fire and relegated to the embarrassing and shameful pile of mass mental abuse and degeneracy where it belongs. One can hope.
The worst part about it all is that it just utterly stultifies the mind like any other abused person exhibits, that can only, e.g., reference anything and everything through what their abuser would want or do. But, that was the point after all, creating abused minds that can only think in subservient and submissive ways. It's sad that that abuse of several generations now was allowed to happen.
Here's a reason I heard once. I've never heard anyone else mention it, so it's probably BS, maybe a lawyer or accountant can confirm or deny.
Developing new features is classed as "productive" or "R&D" or such, which means companies can claim some kind of tech subsidy or tax rebate on employees' wages for time spent doing it.
Maintenance tasks like fixing bugs and upgrading servers don't come under this classification, so they can't.
The incentives are thus set up for upper management to push against and actively neglect maintenance efforts.
Pure maintenance is a grind to keep the status quo. Ideally, we would love everything to be maintenance-free.
There are exceptions to this. This usually occurs when the object or system to maintain evokes an intrinsic positive emotional response as opposed to being something purely functional or utilitarian.
Some people love maintaining e.g. old steam engines and keeping them in peak condition. However, even in those cases it usually amounts to select parts of the effort usually involving some physical/sensory satisfaction, while other parts are still considered a chore.
That's an interesting question, because it is clearly a function of culture, a culture of maintaining things. It is a culture that has to be fostered and cultivated and … well … maintained. And just like things that are not maintained, a culture of maintenance will not be maintained either if it is not maintained.
The way I see it, maintenance is a function of values and whether one earned what one has; a kind of positive self-fulfilling prophecy. It is why across the board today you can see a crumbling of the maintenance culture at all levels, even if there is clear clustering, from the poor who are given things they don't have to earn anymore and therefore don't maintain anything, to the top ruling class that has never maintained anything and has seen everything, from the most solid castles, to human beings, to ships, to centuries-old culture and traditions as totally and utterly disposable if it even slightly stands in their way of achieving their usually megalomaniacal personal gain; to the common people today that have all manner of things, from phones to telecommunications to television to flight to institutions to cars … literally everything … yet lack any perspective whatsoever on how it was produced, the thousands if not millions of prerequisite steps and failures and attempts it took to produce what they just take for granted.
One who does not know the actual value of something will not and cannot ever understand the notion of maintenance. It's a concept that even the ancients knew, yet one we have lost in many, if not most, ways, as people have so lost their sense of value that they don't even value money anymore, to such a degree that they think of it as nothing but a mindless exercise they engage in to do or acquire the thing that stimulates their primitive impulses, whatever those may be.
I know some people are trying to push a maintenance culture again, by the stick of enforcement, but the reality is that it simply will not work as long as the seemingly lasting negation of scarcity holds. Maybe one day people will realize that all the promises to pay or be paid for the work they did, as embodied in debt, are simply not going to be paid, nor can they be, and it all comes crumbling down and a natural order of scarcity is established that will make people want to maintain things. But until that time, this will only get worse with every passing day as you chuck your $1,000 phone aside every year for a new one without even a second thought.
I am dealing with a huge maintenance headache right now, upgrading an entire set of ancient Rails applications from 4.2 to 5.x. It’s a monumental task that involves fixing all the things, from libraries to controllers to models to rspec tests. We’ve literally waited 5 years to do this. Why? Because pushing out cool new features has greater “business value” than maintenance. It’s all customer-driven.
One of the strongest reasons is Minimum Effort Syndrome: people do everything with a focus on expending the minimum effort. Maintenance takes work and sweat, and burns neurons, the same way admitting that you're wrong takes effort, giving way when driving takes effort, etc. Of course, there's the money factor too.
The problem of maintenance is not new and is a global problem, global as in any area that needs maintaining. This affects software systems, hardware systems, legal, political, medical, buildings, equipment, biological, etc., etc., etc.
In the great scheme of things, it is ignored because it is not seen as a "here and now" problem. To see the importance of any form of maintenance in any field, you have to have a long term view of things. You have to understand the "costs" of not doing maintenance for the organisation or system in question.
I did not understand this long term view until I worked for someone who did understand and who was able to articulate the "costs" of not doing maintenance.
People talk about "tech debt", but are unable to articulate that every system has a cost due to its usage. This is irrespective of whether or not it appears to be bringing income into the organisation or system. This income can be energy, finance (profit) or some other "thing".
We live in a world in which everything decays due to ongoing use. In the case of computer systems, this decay can simply be based on changing requirements, bugs, old code running in new environments, etc.
If you cannot understand that the "cost" you pay today will affect the "cost" you pay later, then you won't understand how important the art of maintenance is.
The example that jcadam gives about being "stuck" doing maintenance on "legacy" systems is a good example of the lessons about "cost" not being understood by the organisation. From the point of view of maintenance, anything you spend money to keep going is an "asset". It has a function that brings about a benefit for the organisation or system to continue in its existence. So, "legacy" systems are assets, if they need to exist then they need maintenance.
However, organisations have a mentality of assigning various areas or systems as "cost" centres and other areas and systems as "profit" centres. This inherently works against both the organisation and the maintenance of its "assets". The idea that some section or system is a "cost" centre is an artifice of the "accountant" mentality. This mentality is very prevalent throughout most organisations, whether very small or very large and all the sizes in-between.
Is there anything that can be done about it? If your organisation is controlled by the "accountant" mentality then probably not. If this mentality is not in charge, then you should be able to bring a case for maintenance to those who have the authority to approve it. But, you need to show the benefits both short-term and long-term for doing maintenance for your organisation.
This is a great comment! From very early in my career I heard this "cost" center classification and I never understood it. Unless a company's hand is dictated by some type of external force such as regulation compliance, why on earth would a company voluntarily keep a "cost" center? Either the ROI is acceptable or it's not.
It ... Kinda is? The company started the project long before I showed up, continues to profit from the project, and is absolutely going to reap the rewards of refusing to maintain the stupid thing. This isn't about other people refusing to help with my favorite pet project, it's the company refusing to act in its own best interests.
Why do many companies have internal projects that do the same as well known open source software? Because someone was bored, or wanted job security, or wanted to feel needed.
You should not start a project if a readily available, production-strength alternative exists unless you have a very good reason.
Internal projects are usually underfunded and have bad UIs.
1: In other words, code maintenance.