There are multiple cases of "killed by technical debt".
There's the case of mysterious and unsolvable breakage. The product simply stops working, and the team is unable to get it working again, period. This can happen with really ancient legacy products where the original team is gone, or young products that are written badly by inadequate teams.
There's the case of unpleasantness. A product is so difficult and slow to work on that the company simply loses interest in it, and shuts it down rather than suffering through more maintenance. This does not happen with products that are highly successful business-wise, no matter how bad the suffering, so it's really a business failure rather than a technical one.
There's real antiquation. The product is dependent on a product from an outside vendor that is no longer available or maintained. I've dealt with this on a mainframe replacement, and it was horrible. I've also dealt with it in Java, and it was plenty painful there too.
And finally, there's replacement. A product is replaced (or intended to be replaced) by a new product that does more or less the same thing, only this time with a smart new team, in a hip new language, and by the gods, this time it's not going to be stupid and suck like that piece of crap the morons on the old team built! Most of these projects fail before they ever replace the old, working code, so I'm not sure this counts as technical debt failure.
> This does not happen with products that are highly successful business-wise, no matter how bad the suffering, so it's really a business failure rather than a technical one.
One thing often feeds on the other. Because the system is hard to change, it does not get necessary features. Because it does not have necessary features, it provides less business value. Because it provides less business value, there is less of a budget for improving it. And so on.
Some of the worst examples of that are when a project uses a custom build of a library whose modified source no longer exists, with no record of the changes either.
Reintegrating future upstream changes is also made nearly impossible as a result; our tech lead made a change to numpy a few years ago, didn't manage to get it accepted by the project, and we're stuck with that version until the sun burns out.
If there are changes in future numpy versions we want, it's up to us to backport them, which is nowhere near our core business.
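To give a flavor of what that looks like in practice - not our actual setup, just a hedged sketch with a hypothetical internal URL and tag - the fork ends up pinned in the build forever:

```
# requirements.txt - hedged sketch; URL and tag are made up for illustration
numpy @ git+https://git.internal.example/forks/numpy.git@our-2017-patch
# everything built against the fork's ABI gets frozen along with it:
scipy==1.3.3
```

Every dependency upgrade now starts with the question "does this still work against our frozen fork?", which is exactly the tax described above.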
There's a lot to be said for standardization and 'boring.'
Well, you could estimate the impact of backing out of the changes on the application side, with the upside being continued savings in operational complexity. Or you could address the operational complexity with processes or tooling - which would be easiest with something like a developer OS image.
Yep, seen that. You might be thinking of tweaked builds of open-source components, but before package managers were so common, I also saw internal projects with a `/lib` folder full of artefacts like `MiscDbUtils.dll` - internal "useful" utility functions that are widely used.
Now add in a script that updates this artefact to the latest version, a breaking change lands, it all goes wrong, and it's hard to find the correct previous version of the binary artefact to build your code against any more. Especially if the build has been broken for a while because the project was on the back-burner, and it quietly dies when no-one is looking.
Builds that involve non-version-controlled files that exist only on a certain developer's machine, and because that developer is a control freak (or overworked), he refuses to automate or put those files in version control...
I believe this is called "Job security." I've seen it a few times, including a dev deleting the source code repo and substituting all copies of a set of scripts he was responsible for with compiled binaries. This was discovered due to a platform incompatibility between one of the hosts running the script and the binary wrapper. Data was then restored from backups and the dev was summarily let go.
I should add here that the "antiquation" case is the one that has caused the most grief I've observed in my career. The forces causing failure come from outside the code/business (dependence on an outside vendor), and sometimes collide with the forward momentum of other parts of the code (that graph library will never, ever work with Java 7, to name an example). These become life-or-death situations, and the tendrils of the product dependency are often deeply integrated. It might be easier to rewrite than to fix.
Also, this case can impact not just products, but organizations. You can still find teams dependent on an antique commercial version control system or IDE that greatly slows down or even stops work. I've tech-led jumps to new version control systems a few times, and it's always riddled with anxiety, strain, and management angst. (And it always makes the team far happier and more productive!)
Yeah, that. But even things like CVS, modern and hip in 1998, are still floating around 20 years later.
I actually thought Subversion would be the last version control system, when it came out. Of course, now it's git. Maybe someday we'll get something better, and git will look decrepit.
I have little doubt that git will be replaced eventually (or perhaps severely modified). It seems like a fad to me. It is indeed very powerful, but its UI is sheer insanity. At least SVN is very straightforward to use and understand. It just doesn't offer the distributed nature that git does, as it relies on a centralized server.
The cool thing about Git is that the underpinnings ultimately boil down to a key-value data store.
I'm not an expert on these inner workings, but in theory there's nothing stopping someone from creating a new UI that maintains most or all of the same strengths - except that everyone already uses, and is used to, the current way.
I suspect if you came out with "SuperVCS" that was ultimately just a new UI on Git you'd have more success than releasing the exact same project as some kind of Git enhancement.
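To make that concrete, here's a toy sketch (Python; not git's actual code, just the same core idea) of a content-addressed key-value store. Branches and tags are then nothing more than names pointing at keys:

```python
import hashlib

class TinyObjectStore:
    """Toy content-addressed store - the core idea under git's plumbing."""
    def __init__(self):
        self.objects = {}  # sha1 hex digest -> raw bytes
        self.refs = {}     # human-friendly names ("master") -> digests

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()  # the content IS the address
        self.objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]

store = TinyObjectStore()
key = store.put(b"hello, world")
store.refs["master"] = key            # a "branch" is just a movable name
assert store.get(store.refs["master"]) == b"hello, world"
```

Any number of alternative UIs could sit on top of a store like that without losing the underlying strengths.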
What UI? Using git on the command line is exactly the same as using SVN on the command line. At least for basic, every day things like status, add, commit etc.
> I've tech-led jumps to new version control systems a few times, and it's always riddled with anxiety, strain, and management angst. (And it always makes the team far happier and more productive!)
In the cases of this I've seen, it's always been because management and team priorities were unaligned.
Management in those places cared about a minimum level of productivity and minimizing risk.
Teams cared about maximizing productivity and their work days not sucking.
As long as teams kept managing to soldier through... rarely saw things change in those shops.
mysterious and unsolvable breakage: Helping another startup work through one now. It's a case of reclaiming functionality from a mystery outsourced codebase (without source control) meets inexperienced developers who try their hand at sysadmin plus a 100% rotated bevy of actors (the whole team, PM and all, have jumped ship), no documentation and no technical oversight. Offshore outsourcing adds cultural fun.
unpleasantness: I would expand this to unpleasant or incomprehensible. I have seen projects be de-resourced because of lack of management comprehension when they literally paved the best and most rapid path to profit (later taken successfully by the now-dominant competition).
antiquation: The best example of this I've seen was a hardware product an employer was developing as a joint venture in Taiwan early in my career. Engineers had made the decision to use a sucky chipset from a struggling company to save money, but the supplier went under and the API froze (bugs, missing functionality and all) before our product development could complete. The target feature set was literally impossible to implement on the hardware and nobody wanted ownership. Many millions of USD, wasted.
replacement: It can work out, just infrequently. Generally when it works it's a smaller system with well defined interfaces.
Would these be cases where Robert Martin's 'The Clean Architecture' [1] would help, where the core enterprise logic is separated from third party dependencies, making the latter easy to swap out and replace?
I'd imagine a number of these cases are caused by a heavy reliance on third party technologies that are no longer supported, or very few people still understand.
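Roughly the kind of seam I mean - a minimal sketch with made-up names, where the core logic only sees an interface it owns and the vendor-specific bits live behind one replaceable adapter:

```python
from abc import ABC, abstractmethod

class DocumentStore(ABC):
    """Interface owned by the core; business logic sees only this."""
    @abstractmethod
    def save(self, doc_id: str, body: bytes) -> None: ...
    @abstractmethod
    def load(self, doc_id: str) -> bytes: ...

class InMemoryStore(DocumentStore):
    """Stand-in adapter; a real one would wrap the vendor's client."""
    def __init__(self):
        self._data: dict[str, bytes] = {}
    def save(self, doc_id: str, body: bytes) -> None:
        self._data[doc_id] = body
    def load(self, doc_id: str) -> bytes:
        return self._data[doc_id]

def archive_report(store: DocumentStore, report: bytes) -> str:
    """Core logic: never names the vendor, so the adapter is swappable."""
    doc_id = "report-001"
    store.save(doc_id, report)
    return doc_id

print(archive_report(InMemoryStore(), b"<report/>"))
```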
It's a great idea but in reality most third-party tools are going to work slightly differently and the abstractions will leak (unless they were developed against an existing interface, in which case you don't need to create the wrappers yourself anyway).
I think the optimal route is to not bother with (extra) abstractions and interfaces, but to try to avoid using things that are unreasonably tied to a vendor - unless you save a lot of time.
If the code base is not a pile of dung anyway, the cost of find/replacing and refactoring obsolete or replaced APIs once is much smaller than the running cost of maintaining an extra layer of leaky abstractions for many years.
It is guaranteed that the abstraction will not work without a lot of changes anyway, and what typically takes the most time is the regression testing.
"They did it that way because they where stupid" is a ridiculously common assumption, when the correct answer is often "They did it that way because they knew stuff I don't know".
Well, I think one has to be alert to both possibilities.
This is perhaps a subtle and underappreciated reason that code quality is so important. If you're looking at an obviously well-written piece of code and you see something you don't understand, you can figure it's probably there for a good reason. If the code has visible sloppiness, it's much more difficult to tease apart the good parts from the bad.
There's obviously a lot of stuff I don't know then. Like the benefits of copy pasted code, or 300 column lines, or implementing the logic in 20 places when it has existed in the standard library for a decade. If only the ancient sage I inherited this code base from had left notes to guide me on this path of wisdom.
300 column lines: Support for management buying everyone nice, new, giant monitors.
Logic in 20 places: But what if we want subtle differences between each implementation?
On a more serious note, a product that I've worked on was started in about 1998 in C++. We support something like 15 different platforms, and we've got our own implementations of things like vectors because we needed a least common denominator codebase; the standard libraries of a lot of platforms didn't provide what we needed, or provided implementations that were incompatible with other platforms. By the time everything we needed to support was modern enough (in about 2010), the system had a few million lines of code, and replacing things with library functions/classes would've been a nightmare. New development is saner, but the legacy stuff is entrenched.
Development of that particular product moved to China and India last year, so I don't have a part in its development anymore, just build+release, because there are some legal benefits to releasing it from this country.
On the plus side, there are only a few platforms that they still have to support gcc 3.x on, and all the ones that ran on 2.x are out of support (until a customer holds a few million dollars in management's face, as happened a few weeks ago with AIX 5.1).
I've been there a few times, and the worst thing is when, every five or ten WTFs, there's something that seems as ill-thought-out as the previous couple of similar blocks, except this time it actually makes sense, as it implements (awkwardly, of course) some important corner case.
Well, the correct answer is often also "They did it that way because it was a reasonable choice then, and they didn't have the benefit of hindsight that I have now."
It's also possible that whatever constraint they were working around no longer exists. Either way, it's a case of not tearing down a fence before you know why it was put up.
G.K. Chesterton, 1929:
>In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”
"They did it that way because they knew stuff I don't know".
Or "They did what they did because they were the first one's to do it, and it only looks whack in hindsight"
Or "They did what they did because they were living within totally different constraints - like having to support old crap browsers like ie9, or a 'lowest common denominator' of slow end-user PCs etc., or some old chipset on the firmware code they wrote, or some old language paradigm, old libraries etc."
They did something in a horrible way, knowing it was horrible, because they had been asked to deliver a feature as soon as possible and at any cost.
This is a slippery slope. It lets a company move faster till it reaches the point that the software becomes an unmaintainable pile of hacks.
I worked for a company that grew considerably for 10 years and then lost its biggest client and folded quickly. We had spent a few years reworking our platform in a way that might have been successful enough to weather the storm of losing that client, but technical debt really slowed us down.
Technical debt may not have killed the company directly, but we have to wonder how we might have done if we could have spent more of our time on new development.
This is revenue diversification not technical debt. Companies with a single customer funding the business should be actively pursuing a high priority strategy to reduce this risk.
You misunderstand - our software itself was full of technical debt. We spent a lot of time dealing with the consequences of that debt, and I'm wondering if we'd have been able to hit vital targets sooner without it and possibly have survived.
The project wasn't killed specifically because "you have technical debt". It was killed because there was no way for anyone to be effective with the combination of poor, undocumented code and a locked-down environment.
"We need to change the email message that goes out when someone registers". This took a team of (4?) people 5 calendar days to change. As a contractor, I had to vpn in to one system, then remote desktop over another vpn to another system. Building web apps, these dev systems were not allowed to talk to the internet at all, so things like pulling external dependencies (security libraries, templating libraries, etc) was impossible - pretty much everything was handrolled, largely due to this restriction.
The last big killer was that the system was not passing accessibility audits. Trying to determine where to make a change to any single element would take minutes to hours, vs seconds to minutes you'd normally expect. Much of the 'templates' used were the result of a SQL statement joining 12 tables (html_meta, html_form, html_link, html_grid, etc) and complex concat()s, so adding a page or making a change might take an hour to track down the appropriate collection of tables, then figure out a SQL script to run, then send it to the person who had permissions to make updates to the SQL, then wait and see.
Did the technical debt itself kill the project? Technically no, but the inability to do anything productive in a reasonable amount of time forced the project to shut down.
This is a great example of how technical debt 'kills'. It's not a murder, it's negligence and a slow demise.
I went through one of these projects. The tech debt was never as bad as you describe, but it was a small company operating on a short runway. It also taught me an unfortunate lesson about non-technical founders and the dangers of outsourced code.
The MVP for the company had been bought off the shelf. It worked fine, but the code was abstruse and utterly resistant to change. As the price (in time and dollars) of change requests grew, they sensibly in-housed development. Unfortunately, their clients had some idea what to expect in terms of features per day and dollar. Requests like "let us use our logo and custom color scheme" turned out to be serious challenges since every color and style decision was clumsily hardcoded, so we took far too long to achieve them.
Ultimately, we ended up a contract behind - bringing in business to fund delivering on the previous request. Most startups operate under the gun like that (with either fundraising or contracts), but they start there and labor to escape. We started solvent, and had no clear plan to break out of tech debt - a rebuild would have been too slow, 'working smarter' wasn't viable, and expanding the tech team would have come too late and too costly.
So, we died. Not because we couldn't do work, but because we couldn't do it at a competitive speed.
Seen this a lot. A lot of companies think they are "product" companies, but due to their unwillingness to push back on customers, they become custom engineering shops, bolting on little one-off mods to their product over and over to appease bad customers (or to appease POTENTIAL customers who haven't even bought the product yet).
Stop me when you recognize this one: "Hey your product is great, but we really want something that does [totally different thing]. If you just add that thing, we will pay for all the NRE and you can sell it to others as part of your product! Win win!" Advice to junior developers: If you hear such talk in the hallway, RUN!
We're an enterprise software shop, which necessarily means we do a lot of custom work, but we're careful to consider what we'll do. My mentor is an old hand who's been through multiple exits, and in every meeting we have, he hammers this point: you're either a product shop or a professional services shop, and if you don't know which one you are (or you believe wrongly), you die. Simple as that. The deeper you get into the consequences of knowing (or even forcing) which you are, the more implications it has for everything from product design to business strategy, and it's extraordinary how much of the company such a simple-seeming thing affects.
Companies that think they are a product shop but chase enterprise customers and do professional services often fail to charge appropriately for their services. Enterprise-level customers not only require more features, more guarantees, and more support - they require more attention. Are you including sales time and the expense of chasing them to get a contract, as well as support resources, in your CAC? Are you appropriately accounting for all the added expenses (and future expenses, including lost opportunities)? If not, you're probably losing your tail.
P.S. "Are you" is not directed to the OP but to the business owners/leaders that don't know what they are doing.
Yes, this. One of the big things I talk about with sales is the difference between changing a priority for a customer vs. adding distinct new things. As a recent example, a client wants better and faster feedback on the trial they're conducting (we're in the med-tech space), and we've already got a new dashboard designed and on our product roadmap. I'm more than happy to prioritize that over other product pieces if it'll get us the contract, because we're already going to do it, we're only changing the 'when'.
On the other hand, when they ask for something off the roadmap, we get into more complex issues (is this market-demand data, or custom work?). Particularly for grunt-level custom work (say, adding support for tracking data on a niche wearable device that we don't currently support), there are a lot more questions that follow.
One of the most insidious of the latter, IMO, is that if it's just for one contract, then we're either hiring contractors/outsourcers (expensive, high management overhead), hiring new engineers (risky to grow headcount on a whim), or redirecting resources to tasks that are likely to have both lower ROI and provide lower growth for the re-tasked engineer. At our small size, with our need for high-quality people, I consider this a real cost too.
Then we also get side tracked and lose focus. Leadership and management expend too much energy trying to figure out what to do. Then they want estimates from the developers so they can figure out an estimated ROI. But they rarely seem to worry about the true income potential, focusing mostly on just the initial development cost.
Pursue it? Don't pursue it? If we do, how will we? Will we be "hiring contractors/outsourcers (expensive, high management overhead), hiring new engineers (risky to grow headcount on a whim), or redirecting resources to tasks that are likely to have both lower ROI and provide lower growth for the re-tasked engineer"?
Then is it really surprising that this lack of focus and discipline trickles down to those doing the work and the work itself? Technical debt in the making. It starts at the top.
Absolutely. A brief story on tech debt from the top:
One of the more frustrating things I've experienced is when I got push-back for implementing more project management process (we have a very light process, but when I took over it was sticky-notes-on-the-desk level). The complaint was "we can't slow down development to do more process". Very through-the-looking-glass, as I, the Engineer, was arguing for more management process and Leadership wanted less.
But of course, accurate estimates were needed - just, you know, without making measurements. I implemented some process anyway. We actually increased development speed through less churn and lower communication overhead (consult the docs before breaking someone's flow), improved our estimates, and we've been able to better contain our tech debt.
> Very through-the-looking-glass, as I, the Engineer, was arguing for more management process and Leadership wanted less.
I suspect you could go a long way with the heuristic "If engineering asks for more process, always give it to them."
It's not flawless, but it's like hearing Ron Paul call for a new regulation - when a request is that out of character, you should usually suspect that there's some good motivation.
This is a frighteningly accurate description of the company I'm currently at. They spent many years chasing the enterprise-level customers at the cost of alienating their smaller team-level users, and never had an answer when requests for features would crop up from the larger accounts ('just get it done'). Now they're trying to pivot back to the team-level customers and are having a supremely difficult time dealing with the tech debt built up by addressing the enterprise-level concerns. We tout ourselves as a product shop when in reality we're trying to be both.
Sure, but the answer is pretty trivial: If you spend more than half your time on customization, you're a professional services shop.
He also added that if you're a product shop doing less than 70% off-the-shelf work, you're probably screwed, while 90% off-the-shelf is really the ideal (again, enterprise software).
I think the more interesting question is "what counts as professional services?" This gets much trickier: for example, when you start building out APIs to make second- or third-party integrations easier, is that "product" or "professional services"? It certainly seems like product building, but if you're doing it for one customer's use, it gets real blurry real fast. If you're not using that API internally, you're almost certainly on the professional services side. If you do use it internally, is it rock solid enough that you can support and expose it without that support becoming professional services?
Drawing sharp lines aside, this all probably seems kind of trivial, but the first time I ran through our product design with him and we discussed this, I went back and radically re-thought a lot of our strategy, particularly at the customer interfaces.
Ouch, that's almost word-for-word from the company that died of debt.
It was enterprise sales, so customization was unavoidable, but no one was differentiating between big and small changes, or big and small buyers. The product was desperately struggling to do ~3 things at once, and still being sold to potential buyers on the promise of a fourth thing it would do "soon".
Enterprise customers which require enterprise sales require enterprise pricing. If appropriate enterprise pricing is not in place then you risk an enterprise failure.
I think every one of my former employers who have failed, did so by doing those 'customs.'
The last one even spun off a dedicated team that built (hacked) prototype customs in order to secure sales, then threw away the prototype and, after collecting the commission, told the new customers that it would take several years to get what they had just seen into production - but in the meantime we could do our existing product with some mods.
I imagine the pressure to accept these deals is immense though. Why let an innocuous little feature request hold up such a great deal?
That was part of the problem: the sales people couldn't push back on most requests because they were often quite reasonable. When they were more demanding, it was usually from a large prospective buyer so we had to bend over backwards.
The result was that we had huge tasks to do with no (current) revenue, and small tasks to do that took 10x as long as they should have. Since servicing existing revenue streams (even on reasonable requests) became so time-consuming, handling big enterprise demands became totally untenable.
We needed a lot of things. Better (or more technical) sales was one. More mid-level engineers was another. Mostly, though, we just needed more time or money.
Our target market was very reluctant to move from a paper system to a software system, so there was a lot of foot-dragging and a lot of feature requests. That delay had just never been budgeted into schedules or runway.
And it was one line of code, after several hundred lines had been torn out and rearranged to ensure that different clients could insert their own pictures of different sizes without everything exploding. The whole team was desperately trying to force enough flexibility into the software that one-line changes could be made in <10 lines, instead of >100.
The last company I worked for did that to great success. All our customers got a custom version of our product tailored to their needs and their project. At least half our customers were doing something that needed at least one new feature we didn't currently have. If you build your business model around that, it is not necessarily a problematic model.
I'm working for a company like that, BUT they allowed me to completely rewrite 3 of the tools from scratch in a more modular fashion, so that I could do these things without having to modify the old code bases. There are still two other applications I have to support (written by a consulting company we no longer contract through). It's night and day. So this isn't really the worst thing if you're given the authority and power to take full control of an application, rebuild it, and take ownership of it. Of course, this doesn't really apply to junior devs.
This part in Rich Hickey's Simple Made Easy talk [1] had a lasting impression on me. It really drove home the point on how a build up of complexity (one of the most common forms of tech debt, and one of the hardest to avoid) can eventually "kill" a project in exactly the way you described, slowly and painfully:
"But I have all this speed. I'm agile. I'm fast. You know, this easy stuff is making my life good because I have a lot of speed."
What kind of runner can run as fast as they possibly can from the very start of a race?
[Audience reply: Sprinter]
Right, only somebody who runs really short races, okay?
But of course, we are programmers, and we are smarter than runners, apparently, because we know how to fix that problem, right?
We just fire the starting pistol every hundred yards and call it a new sprint.
...It's my contention, based on experience, that if you ignore complexity, you will slow down.
You will invariably slow down over the long haul.
...if you focus on ease, you will be able to go as fast as possible from the beginning of the race.
But no matter what technology you use, or sprints or firing pistols, or whatever, the complexity will eventually kill you.
It will kill you in a way that will make every sprint accomplish less.
Most sprints will be about completely redoing things you've already done.
And the net effect is you're not moving forward in any significant way.
This. It's not like there is a sign saying "Technical Debt Required to Proceed"...but rather the slow death from a thousand cuts to productivity caused by having to analyze every potential system, process, template, stored procedure, etc, etc...to make any stable(ish) change. Even if things are loosely coupled and not dependent on each other...you still have to go in and make those changes. Telling this to a room full of non-understanding management is a whole different challenge...
Templates stored across a database is probably the worst thing I've seen repeatedly across projects. Just because a database can store everything doesn't mean it has to.
Some people really seem(ed) to have an allergy to plain files for storage. A plain file with OS level caching will beat most (if not all) databases for static content. But doesn't sound as fancy, so it's probably harder to charge a lot of money for it.
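For what it's worth, the entire "template engine" could have been something like this (a hedged sketch with a hypothetical path; the OS page cache makes repeat reads essentially free):

```python
from pathlib import Path

TEMPLATE_DIR = Path("/srv/app/templates")  # hypothetical location

def load_template(name: str) -> str:
    """One plain file per template; repeat reads hit the OS page cache."""
    return (TEMPLATE_DIR / f"{name}.html").read_text(encoding="utf-8")

# Versioning, diffing, code review and deploys come free - they're
# just files in the repo, not rows scattered across a dozen tables.
```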
A template in one database table I can live with (pros and cons, multiple front-ends, etc.). One template broken up into 12 tables, requiring a 100+ line SQL statement with concat()s and HTML interspersed, is insane. Had there been an API or utilities to manage it, it might have been manageable, but nope - just "write some queries".
Also, just repeated your comment to a friend who said "that's the worst thing you've seen? can i have your job?" :)
(blown away by all the responses to my original question!!!)
Your story here makes me laugh, if only because of a very painfully familiar memory. Luckily this wasn't a big production system but rather an internal tool (that I guess clients did also use, but it wasn't part of 'production' per se). It was written entirely in Perl CGI, filled with cryptic regular expressions and complete spaghetti code. It would concatenate together entire webpages, with bits of them rendered by including the contents of files strewn all over the file system, and of course the logic to concatenate all the HTML together was spread across a fistful of files in disparate locations. In short, I was once asked to make a simple change to some HTML, and after 5 days of reading through Perl CGI and developing a pure hatred for Larry Wall, I decided to do a Java rewrite that took 3 days. I mean... crikey. Haha.
We have a similar application in PHP. By the time I've traced through all of the included files that are touched by a particular function, I've forgotten what I'm looking for. It's truly a nightmare.
Wait until you see a Turing complete DSL programming language stored line-by-line in rows in a database table and executed by pl/SQL using cursors, locking the entire execution to prevent concurrency.
I'm dealing with one internal project where this happens because there's an artificial IT/build distinction between "emergency" code push and "casual" raw database change.
This means lots of business-rule crap gets softcoded into the database or ini files (increasing complexity and bug-risk) just to support a hypothetical future where somebody needs it changed without a full sprint cycle.
And you aren't kidding about "repeatedly". Personally I associate it with the late 1990s/early oughts and ColdFusion; I think one of the early CF frameworks really encouraged it, and it kind of just stuck from there, particularly in Government web work. But it's probably wider than that...
This has been my experience. Since technical debt is hard to measure, it's more a case of a series of unwise technical decisions leading to a lack of productivity. Due to tight schedules, short-cuts are taken which lead to more unwise technical decisions, and you have a death-spiral.
It is, but the question is what "killed" by technical debt means. It's uncommon but not unheard of for code to reach the point of "we can't do that". Mostly, though, the proximate cause of death is a funding shortage or management decision to shutdown. Technical debt is just driving the cost overruns or inefficiencies that kill the project.
I don't disagree that in most situations the root cause is not engineering, and that it's usually an indication of a symptom, but attempting to change the meaning of the term itself is not a great approach for communicating that.
ACK - I FORGOT THE BEST BIT... (well, maybe not best, but...)
No one could install anything locally - everything had to be done on their locked down remote systems (some were Amazon remote desktops).
For the accessibility testing, the auditing company used JAWS. The company I was contracting to had one license (or so I was told) so I couldn't have one. We actually tried to install JAWS on an Amazon desktop, but it just crashed the entire virtual desktop, requiring re-imaging. That happened twice, so we gave up.
So, the proposed workflow was: I'd make a change, push code, and email someone to move that code to a system where an internal tester could look at it. I'd get an email back, then email the internal tester that the code was ready to look at. The internal tester would go to the screen(s) in question using JAWS, then "tell me what JAWS said". That would often take several hours or a day.
I was then supposed to make changes based on that feedback, then repeat the cycle until things were 'fixed', then we'd ask the auditing company for another test, which they'd schedule for 2 weeks in the future. Then we'd wait.
During the first iteration of this part, sr mgrs kept asking me "when will this be done?". I kept trying to explain that we didn't even know what "done" was - the auditing company just had blind folks that would use the system with JAWS enabled and if they felt it was usable, they'd say so, otherwise, they'd report back "hey, this isn't usable", and we'd have to start digging in again.
This account kind of confirms what I think about technical debt: most of the time, the problem is more likely a lack of documentation than anything else.
I don't see how a big project could be coded without containing things specific to that project. And even then, the architecture by itself is unique and deserves documentation.
It happened to me twice. The first time was in a start-up at the beginning of the century, we were developing an electronic health record and we had outsourced the database abstraction layer to a company in Greece. In the beginning things went fine but after a while the development of the DAL went slower and slower and it became unstable as well.
Eventually the word came out: the main developer of the DAL framework had left the company and, according to the Greek CEO, she had been 'too smart' which meant that nobody understood her code. They had tried adding features but that had made things only worse and the DAL had started to crash randomly.
We tried to take over the framework by ourselves but it was written in Eiffel and the code was a horrible entangled mess. Eventually we rewrote it in Java but, being a start-up, we lost too much precious time already and eventually went almost bankrupt and were bought up by a competitor.
The second time was in a small company whose product was a search engine for consumers. The web layer was written in a mixture of JSF, jQuery and Ajax. While that combination already slowed down development on the front end, the main problem was the performance of JSF on the server. Because JSF is rendered on the backend, it placed a massive load on our server for certain heavily used pages, and we just couldn't scale any further. Swapping JSF for a framework that rendered on the front end would have been the solution, but that was a massive refactor for which the company just didn't have enough resources. Eventually the company had to drop their search product and change their business model to a more community-based website.
> We tried to take over the framework by ourselves but it was written in Eiffel and the code was a horrible entangled mess. Eventually we rewrote it in Java but, being a start-up, we lost too much precious time already
I wonder, would the result be different if you had access to competent Eiffel developers? How large was the Eiffel codebase?
Eiffel is an interesting language, with a somewhat unique feature-set (I think only Ada is coming close). Design by contract and static typing as core language features - if used right - should greatly help with both stability and ease of refactoring.
How large the codebase was is an important question, also how bad it really was. I saw a similar story - external codebase getting worse and worse from some point on - with Clojure at the center. The code quality was quite ok for a couple of months, then it worsened. At that point and for a couple of following months the codebase was possible to save - a single competent Clojure programmer would make a difference, I think. The project was less than 10k LOC then. However, more than 1.5 years and 60k LOC later, doing anything became nearly impossible for anyone, including original authors.
You had a search engine and rendering the search results was the bottleneck?
That's really weird. I don't know a lot about JSF, but other templating languages are really almost never the bottleneck. Maybe if you have some giant table with thousands of cells, each with its own complicated template directive (for loops with conditionals, etc.).
This sounds less like technical debt, and more like liabilities of over engineering. Possibly feature creep.
That is, technical debt is not necessarily tangled over-engineered code. It is more compromises that were made to actually ship and operate in the world. You can see this in the world with devices.
Consider: technical debt is the reason you have AC delivered to your house, going through as many converters as you have devices - often converting to the same target power characteristics for those devices. It is not the reason that your coffee machine that also grinds and whatever is likely to fail within the year.
Another example; Technical debt is the reason we are still predominantly using petrol for automobiles. It is not the reason the dashboards are horribly non-responsive on modern cars.
> Consider, technical debt is the reason you have AC delivered to your house going through as many converters as you do devices. Often to the same target power characteristics for those devices.
Bad example. AC power has many desirable characteristics for the local transmission grid. If you were to do the grid over from scratch you'd still use AC. You're also too focused on household electronic usage, which is a very tiny percentage of the overall electricity used.
It's just an illustrative example. And I'm going to bet that most of us, the vast majority of us, really only have experience with household usage. So it would make no sense to get into other usages, which most people won't understand.
HVDC does have advantages in certain scenarios (very long transmission lines, for example) but parent is still correct--the majority of the grid makes way more sense with AC.
I meant my follow-on to be a concession, but worded it poorly. I thought it had advantages, but yes, I was mainly thinking of small appliances - in particular, in the home. And not just computers, but lights and control panels. It seems many things all use the same power characteristics and are now complicated by having to deal with AC.
Which, amusingly, is fitting for the tech debt debate. Eradicating some choices from the project is likely to be missing the point. Just as eradicating AC from all power would be short sighted/wrong.
AC is much better in the home. There is no way to get around the fact that you need massive wires to supply low voltage at high amps.
It is much cheaper to have a power supply on every electronic device turning 100-200 volts into 5 volts than to have one big power supply turning power-line voltage into 5 volts. Of course, a lot of computers need 3 volts or less, so those power supplies exist anyway. It is also more efficient: big power supplies running at low loads are inefficient, while the power supply on each device is sized to what the device needs and so is more likely to be operating in a high-efficiency range.
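To put numbers on the "massive wires" point - a back-of-the-envelope sketch, assuming a 15 m run of 14 AWG copper (about 0.24 ohms round trip) and 100 W delivered:

```python
# Deliver 100 W through ~0.24 ohms of loop resistance
# (assumed: 15 m run of 14 AWG copper, ~30 m round trip).
R = 0.24
for volts in (5, 120):
    amps = 100 / volts        # current needed to deliver 100 W
    loss = amps ** 2 * R      # I^2 * R heating in the wire
    print(f"{volts:>3} V: {amps:5.2f} A, {loss:6.2f} W lost in the wire")
# at 5 V: 20 A and ~96 W wasted in the walls (about as much as delivered);
# at 120 V: 0.83 A and ~0.17 W wasted.
```

At 5 V you'd lose roughly as much in the wiring as you deliver, which is why low-voltage distribution needs enormous conductors.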
>AC is much better in the home. There is no way to get around the fact that you need massive wires to supply low voltage at high amps.
That's orthogonal. What you really mean is that you want high(ish) voltage to distribute power in a home, in order to minimize losses due to wire resistance over distances of dozens of meters.
You don't need AC to do that. In fact, with modern power electronics, the switching converters we now use for supplying LVDC to our devices can work just as well with DC as with AC input power.
The primary advantage of AC over DC is that it can be converted between voltage levels easily with transformers. But today, we can do the same thing with DC using DC-to-DC converters. These didn't really exist in an economical way before a couple decades ago, maybe even more recently.
If for some odd reason western society decided to re-engineer and replace the whole power grid, I think it's quite likely they would simply switch to DC for everything. With deployment at that scale, the cost issues with the equipment should go away, making it no more expensive to replace everything with DC converters than with transformers. DC is more efficient than AC because it stays at its peak voltage, and because it has no skin effect. But the technology needed to make it inexpensive to use for power transmission has only been around for a fairly short time (namely, modern power electronics). Up until recently, it was simply a no-brainer to use AC because of its simplicity in generation, transmission (with transformers for stepping up the voltage), and usage (with AC motors).
> it's quite likely I think they would simply switch to DC for everything
I'm not sure. AC has some important safety considerations that would make it better even if the efficiency were significantly worse.
Switches, fuses and circuit breakers that work with DC are more expensive than AC. When a circuit opens there is a spark, and this spark can in some cases create a conductive plasma. With AC the wave goes to zero and the plasma disappears, while with DC it continues. There are cases where a DC fuse blew but the fuse continued to conduct. Of course this can be engineered around, but generally with larger and more expensive parts.
When someone touches power accidentally, AC is slightly safer. With DC your muscles will grab and never let go. AC gives you a chance to let go. This is a low probability thing, but is a factor.
The guy who wanted us to debate is wrong for one other reason though: I'm approaching the limits of what I know on the subject, while you seem to have a lot more knowledge.
The desire for debate was to increase our collective knowledge. Not to prove someone right. I am fully comfortable with the idea that I was wrong. You both have knowledge I find interesting.
I'd be interested in you two debating this more, since you both clearly know the topic better than I do. This post is reflecting what I thought I had heard. But, I am not in this field.
There's really nothing to debate; the guy I replied to was totally correct about everything except the bit about "AC is much better in the home", where I pointed out that he really meant that a high voltage roughly where our current AC systems are (120V-240V) is much better in the home than some kind of low-voltage DC system, and that with modern technology, it would probably actually be better to have a DC system. But realistically, that's not going to happen because the gains (probably very minimal) aren't worthwhile compared to the enormous cost of conversion, given how standardized our current AC system is and how all our infrastructure, point-of-use devices, etc. are all designed around that.
Basically, he was assuming practical real-world considerations, I'm going off on a tangent about ideal conditions. His argument is about whether it's better to stick with the current AC system that your house has, or if it's better to install a low-voltage DC system to supply 5V, 12V, etc. to all your devices from a single, central, whole-house power supply as many people who don't understand electricity will frequently suggest. He's completely correct: low-voltage DC is a terrible way to supply power over any distance more than a meter or two because of resistive losses, so it'd require massively large copper cables or busbars. And power supplies are generally very low-efficiency when operated at low load. So our current approach (separate little optimized power supplies for every device, plugged into a higher-voltage AC supply) is actually optimal.
I was never arguing that an individual should replace the AC in their house. My argument was, with current technology, the AC setup can be seen as tech debt.
Which seems compatible with what you are saying, but the parent was specifically claiming I was wrong.
That is, you seem to be echoing my point. But seem to be claiming it is different. What am I missing?
I wouldn't call it "tech debt". Present-day AC systems may not be completely optimal (given current electronics technology), but they do work well.
As I understand it, "tech debt" is something that has to be reckoned with at some point, or else you're going to have real problems in the future (just like refusing to pay off a money debt will generally cause you real problems at some point when the creditor sues you and gets a judgment). You can't just let it go on forever; eventually you need to "pay it down" (by cleaning up the codebase, migrating to newer technologies, etc.), or else catastrophe happens (the company is unable to compete and goes under). One common factor cited in these stories is that the code becomes too unmaintainable and unreliable: too many weird changes for customers pile up and introduce serious bugs which cause the product to not work properly.
This isn't like that at all. We can go on with our current household AC power systems indefinitely. Maybe we could get a 1% improvement by switching to DC systems (at an enormous cost because most of your appliances and devices won't work with it without adapters), I don't really know exactly how much better DC would be (not much really), but what we have now works fine. Furthermore, it's not like the whole electric grid system needs to be changed: it's entirely possible, for instance, to switch distribution systems to DC and leave household systems AC. Instead of distributing the power at 30-something kVAC in your neighborhood and using outdoor transformers to step it down to 240VAC for your house, it could be distributed in DC form, and those transformers replaced by modules which convert the 30-something kVDC to 240VAC. In the old days, this was hard and expensive to do, but with modern power electronics it's not. But even here, the question is: are the gains worth the expense? And the answer is very likely "no". (For reference, I'm not a power engineer, I just studied it in college as a small part of my EE curriculum.)
So this does not, to me, resemble "tech debt" at all. It's just a system that we use for legacy reasons and which is extremely reliable and works well, even though it might not be the absolute most efficient way to solve the problem. This is no different than many other engineered systems. Perhaps you have a decent and extremely reliable car. Could it be better? Sure: you could build the chassis out of carbon fiber, use forged aluminum wheels instead of cast, etc. all to save weight and improve fuel economy. Are you going to do that? Of course not, because the cost is astronomical. There's cars like that now, and they cost $1M+.
So for AC systems that we're talking about, the question is: what is wrong with them that we want to consider replacing them with something else, instead of just sticking with them even if they're not quite as efficient as they could be? Because the cost to upgrade them would be enormous, so you need to have a very good reason.
Most instances of tech debt are things you don't have to deal with. Usually, it is the term pulled out for things people don't like. Or generally deprecated methods that have better replacements, but still work.
It is this second sense that I was latching onto. It - tech debt - will drive decisions today. But it is not clearly bad. Just a constraint on current decisions that was made in the past, often for decent or really good reasons.
Bit rot is another term, for things that start to decline in how well they work. That is generally different, though: usually a byproduct of replacing implementations without keeping functionality, such that people relying on old behavior are left cold. (I can see how tech debt can easily turn into bit rot. But it is not required.)
Consider, LaTeX being an old code base is often used to call it tech debt filled. People want to modernize it. Not because it doesn't work. But because they think there are better ways, now. And they do not consider all of the documents made on it as infrastructure.
Now, I concede that all of this is my wanting the terms to have unique and actionable meanings. Elsewhere I was told "tech debt" is a catch-all term now. That seems to rob it of usefulness.
Edit: I forgot to address the monetary aspect of the analogy. I like that, to an extent. But most debt is taken on in very specific terms financially, unlike colloquially termed debts between friends. That is, there is no notion of interest in this metaphor that works, nor a party you are borrowing from.
>Most instances of tech debt are things you don't have to deal with. Usually, it is the term pulled out for things people don't like. Or generally deprecated methods that have better replacements, but still work.
I'm not so sure about this. To me, "debt" is something that has to be paid eventually. Otherwise, why use the term "debt" at all?
So if something works fine, why waste your time and energy replacing it with something newer?
Usually, the reason for this is the assumption that sticking with something deprecated will eventually bite you in the ass: something you're depending on won't be supported, will have security holes that won't get fixed, etc., and you're going to wish you had fixed it earlier. So this is a valid use of the term "tech debt" IMO.
But if something is just something someone doesn't like, that isn't "tech debt" at all. I don't like .NET, but it's invalid for me to call all software written in .NET "tech debt". I don't like Apple's ecosystem, but it would be pretty ridiculous for me to call all iOS software and apps "tech debt" when many millions of people use and enjoy that software every day.
So, for your LaTeX example, I don't consider that tech debt at all; instead, it's just like iOS and .NET software to me. If someone doesn't like it, that's their problem; the fact that it isn't brand new isn't a problem for me and all the people who still happily use it.
So personally, I think anyone using the term "tech debt" to just refer to things they don't like is using it incorrectly and in a totally invalid way.
I find this a compelling view. But, I urge you, just google technical debt. You will see the definition: "Technical debt is a concept in programming that reflects the extra development work that arises when code that is easy to implement in the short run is used instead of applying the best overall solution."
So, in this case, AC/DC fits if we agree there is a chance the "best overall" solution is DC. (Which, I fully grant, is not a given.) There is also a bit of playing loose with "short run."
Then, skip back to the top of this thread, where you will find: "products that are written badly by inadequate teams" and "case of unpleasantness" and "A product is replaced (or intended to be replaced) by a new product that does more or less the same thing, only this time with a smart new team, in a hip new language..."
All of this is the first, most highly voted, post. The next post is a highlight of poorly engineered solutions.
My point? Find a case study that has the usage you are referring to here.
Now, certainly it has this rhetorical appeal to people. But I have never seen it used in a way that fits the metaphor. Just used to hit the emotional strings of "you must pay back your debt!" while usually claiming that the design or lack of some technology is the debt.
I think we're going off on a tangent here, but even with that definition from Wikipedia, there's no such thing as "the best overall solution". Everyone is going to disagree about that; the best you'll get is a consensus. For instance, back to LaTeX, there's countless academics out there who use TeX/LaTeX/whateverTeX for writing academic papers, and getting beautiful results while not having to mess around with a WYSIWYG editor like MS Word and just typing in some simple formatting codes. That's what *TeX was designed for and has worked well for for ages. But I'm sure you'll find a few people who say this is bad because it's "old" and that they should switch to the latest MS Word for everything, and rewrite all their papers in the latest MS Word. If you look really hard, you might even find someone who thinks both are bad, and that all academics should rewrite everything in WordStar.
"The best overall solution" is up for debate. It's the same with programming languages; one team will say that C is the best overall solution for a certain problem, another team will say it's Python, another team will say it's one of the .NET languages. I'm sure you can find plenty of engineers who will claim that mission-critical real-time avionics systems or automotive ABS controllers should be redesigned to use x86 CPUs and run Windows and have the code written in C# instead of using C/C++ and running on a small RTOS on an embedded microcontroller.
The implication I see with your Wikipedia definition is that implementing something easy in the short run instead of something that really is the best overall solution will eventually lead to more work to fix the shortcomings of the quick-n-easy solution. So, like I said before, a "debt", because it has to be paid back eventually (with work). The problem I see is that not everyone agrees on what is the best overall solution, and unlike a money debt that's easily seen by looking at a dollar figure, the only way to really know how much "tech debt" you have is through experience, i.e. accumulating it and then finding out over time how much work you have to expend to fix things when your quick-n-easy solutions start having real, demonstrable problems. If your solution has no actual, demonstrable problem (e.g., you use LaTeX and it continues working great year after year for your use-case), then I don't consider that to be "tech debt" at all, even if some people don't like it.
Yes, even light bulbs. A typical household LED is very easy to run off AC. You just need a capacitor big enough to hold the charge between each cycle of AC (which is very little). More information here: http://www.ledsmagazine.com/articles/2006/05/running-leds-fr...
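For scale, the capacitor sizing is simple arithmetic (assuming full-wave rectified 60 Hz, a 20 mA LED, and ~1 V of tolerated ripple - my numbers for illustration, not from the article):

```python
# C = I * dt / dV: charge drawn between peaks over the allowed voltage sag.
I = 0.020        # amps - typical indicator-LED current (assumed)
dt = 1 / 120     # seconds between peaks of full-wave rectified 60 Hz
dV = 1.0         # volts of ripple we tolerate (assumed)
C = I * dt / dV
print(f"{C * 1e6:.0f} uF")   # ~170 uF - a small, cheap part
```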
It'd be vastly more expensive to wire up an entire house for low voltage DC than it is to include the simple rectification components in every light bulb. In a house you're talking about many wire runs of many dozens of meters. This is not a good environment for low voltage DC at all.
I recall seeing IEEE articles talking about the DC-wired home. I confess I stopped paying attention, as it will be a long time before this is actionable for me. Can't claim surprise to learn I had some of it wrong.
Of course, the cynic (and, ironically optimist) in me still has this as evidence that "technical debt" is often used in BS circumstances by people that just don't fully understand the reasons for the things they are talking about. :)
Saying that technical debt is only deliberate is an old argument[1], but usage defines meaning and modern usage is that "technical debt" is a catch-all term. It just means bad code we know should be fixed.
Stretching the debt analogy, you can go bankrupt from payday loans (the "just push it out" tech debt) and from getting too big of a mortgage to build/fix up a house (over-engineered tech debt).
They upgraded the server of course, to as much as they could afford. But it wasn't enough; the rendering load soon caught up. Partly because their number of visitors grew, but also because they wanted to add new features to their JSF pages, and every new feature required extra rendering power as well.
That was considered, but it would of course take some refactoring on the back end, and it would still cost quite a lot in hardware.
The thing with JSP and JSF is, they do OK as long as your content is relatively static, because then rendered content can be cached.
In the case of this company, their most visited page was the list of search results, which by its very nature was not very static at all.
Every problem is different, so I hate to judge, but what you're saying doesn't add up to any experience I've had.
It sounds like your company seriously screwed up the design if you can't scale your web tier code horizontally. I've also never had a view technology take up a significant chunk of cpu resources - it's always the Java code carrying out the functionality. E.g. I would expect the largest factor in CPU usage in the list of search results to be... generating the data for the search result. If the largest factor was rendering the result, then something was probably seriously wrong.
"Eventually the word came out: the main developer of the DAL framework had left the company and, according to the Greek CEO, she had been 'too smart' which meant that nobody understood her code."
OMG no - run for the hills.
95% of software systems are not inherently sophisticated. They are "complex", yes: maybe there are many features and moving parts. But there should be no pieces of the system that are hard for anyone to understand. With decent architecture plus decent design and coding, an entire bank's system should read like a long, but well-articulated, user manual.
Unless you're doing super low-level stuff, complex algorithms, heavy math stuff, or issues with massive scale or performance etc. ... the end result should almost be mundane in most cases.
The closest I've come was a Rails project I inherited from a star developer who had just left the company. It was a B2B project that involved importing large Excel spreadsheets of various different formats into a standardized database for itemized review.
The code was pretty sloppy, but didn't deviate much from standard Rails idioms. Not many people on the team understood Rails well enough to read it, but I did. Bug reports were constantly flooding in. I suggested taking a sprint to build up an integration test suite and then letting loose on the backlog.
We did build up a sufficient test suite in one sprint. But the bug reports never slowed. By the time we had the confidence to truly start tackling bugs at speed, the battle had been lost. We had been so busy writing tests that we forgot to manage the bug tracker. The impression was that we were overwhelmed and unable to make progress. The project was swiftly closed.
People remembered that codebase as an exemplar of sloppy code and technical debt, but that's not the lesson I took from it. I had seen, and others would see later, much worse. The lesson I took was that perceptions are as important to manage as results.
Excel imports with Perl did pretty well for me. I was pretty careful about insisting on some rules for the sheet data, and enforced them strictly with decent debugging info for the users.
I still think the Robustness Principle[1] is a crock, and strictly controlling inputs is one key to happiness. It also, frankly, helps your users in the long run by giving them exactly what they want, and it actually cuts down on the amount of thought they have to put into it. Chaos and disappointment do not make a good user experience.
Ruby has had good ETL libraries for a long time. In my opinion, our product team was too lenient concerning the format of the Excel files. Asking customers to fill out a template spreadsheet to submit to our system, rather than letting them submit any old XLS file they happen to have on their computer, would have gone a long way towards simplifying the problem space.
EDIT: I know this version won't support escaped separator/newline characters, but I made it for a specific use case in which I knew that would not occur. Adding that functionality would make it a little messier, but still not too bad.
EDIT2: Thanks for the interesting comments! Not so trivial after all!
Perhaps a more accurate version of what I was attempting to say above is that 'it is often (not always) easy to build a CSV parser to interact with one specific program'. The four-line version above works perfectly for reading the type of files I designed it for. If you want to work with human-created or more complex variants of CSV, all bets are off.
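To make that concrete, here's a minimal sketch of the kind of parser I mean (my own illustration in Python, not the original four-liner), which works only because the input format is fixed and machine-generated:

    # Reads a fixed, machine-generated format: no quoting, no escaped
    # separators, no embedded newlines. All bets are off otherwise.
    def read_simple_csv(path, sep=','):
        with open(path) as f:
            return [line.rstrip('\n').split(sep) for line in f]

    rows = read_simple_csv('export.csv')  # hypothetical file name

The moment quoting, escapes, or human editing enter the picture, none of those assumptions hold.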
You need a lot more than that to handle CSV in the wild (quoting, Unicode, line termination, etc.) but the real killer I see is when it's edited by humans. The special cases for errors and inconsistencies will add up quickly; in some cases you may be able to reject invalid data but you may not have that option or an easy way to tell whether any particular value is wrong.
Excel takes that and adds some fun twists, like people using color and formatting to store data, or Excel auto-corrupting values which look like dates, which may not be noticed until you do something with the data.
I know of at least one company whose entire business is handling this stuff. They find growing companies as they hit critical mass and need to move their Excel data into a real database. The product is just "Your data is hideous and was entered by hand without validation or formatting; it'll never convert and it'll be wrong when it does. We can help."
They handle all kinds of theory and technical stuff, like normalization and processing Excel-corrupted dates. But they also handle a lot of easy-but-agonizing tasks like regularizing typographic single quotes into apostrophes, which crop up as soon as you let humans enter free-form data.
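To illustrate the flavor of that cleanup (my example, not theirs; a Python sketch):

    # Map typographic single quotes to plain apostrophes.
    CURLY = str.maketrans({'\u2018': "'", '\u2019': "'", '\u201b': "'"})

    def normalize_quotes(text):
        return text.translate(CURLY)

    assert normalize_quotes('It\u2019s fine') == "It's fine"

Trivial on its own; agonizing when it's one of a few hundred such rules.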
I used to use Google Refine (now OpenRefine [0]) for this. It lets you load up the data and then apply rules to see if they are mostly correct. It doesn't get you all the way, but it is better than blindly revising a huge Excel "database" by hand.
I'll try to remember. I ran into them at a career fair a few years ago, so it's not leaping to mind, but it seemed like they had good software and a great market niche.
Let's not forget Japan Post's CSV for all the Japanese Address data that contains some lines that are line-wrapped, that is, one record spans two or more lines in the CSV file. A line-wrapped CSV... I just can't even.
That is very interesting, thanks! I hadn't thought about Unicode or tolerating human error. Although the times I have worked with it have been when it is a transport medium between two computer programs.
That's definitely a less-aggravating situation by far. I've had a lot of cases where a significant amount of specialist human time was invested in a spreadsheet, and it's really made me wish there were an Excel-for-data which acknowledges how many people are using it for semi-structured data like this.
Relational database can be expressed in tabular form, but tabular data is not necessarily relational.
> (A "relation" in relational theory is a table with a name, fields with names and types, and the data in the table.)
A relation is a system of one or more functions (in the mathematical sense) each of which has a domain that is a candidate key of the relation and a range that is the composite of the non-key attributes.
Interesting definition. Do you have a source for it? It seems ambiguous.
From the Wikipedia article on relational databases, subsection relational model.
"This model organizes data into one or more tables (or "relations") of columns and rows, with a unique key identifying each row. Rows are also called records or tuples."
I submitted that edit before the post had any replies.
Also, I did try to make clear that the given code was created 'for a specific use case in which I knew' that the format of the input files was tightly defined.
You can define a narrow subset or version of CSV that is trivial, but that doesn't reflect what one finds in the wild as "CSV", which was not systematically defined or described until well after many mutually incompatible things by that name were well established.
Thanks for your thoughts. As I have stated elsewhere, the code handles all of the cases I needed it to handle, due to the stability of the input file format (which was emitted from another program). I don't see that this should be too hard to believe.
I also said in my second edit, on the top line, 'Not so trivial after all!'. If I was putting on some kind of act, wouldn't that have been dropping it? Further, I noted in my first edit, before I had received any replies, that I 'know this version won't support escaped separator/newline characters', so I am not sure what you were trying to add with your example?
I think that my central point (and I totally accept that I didn't express this well) is that depending on the specifications of your program, the required CSV parser /can be/ very short. When one compares this to other data exchange formats, for example JSON, it is clear that the barrier to /entry/ is much lower. The shortest JSON parser I could find with a cursory look was 200 lines of C.
I totally appreciate that to write a CSV parser that works for all cases would be extremely longwinded. It has been interesting to hear other people's experiences and opinions about that. But the fact remains true that /in some cases/, depending on the requirements of the program, the parser can be very short.
> We all recognize the classic developer I-could-build-that-in-a-weekend hubris when we see it. :)
It is funny you should say this. I needed the CSV parser because I thought it would be fun and interesting to see if I could build an anti-malware tool in a week (I am taking a malware detection class at the moment, and I wanted it done before the next lecture). I did not expect I would be able to have anything good working in that time, but by the early hours of the next morning I had a perfectly functional anti-malware tool. It can use ClamAV signatures (so it can detect everything(?) that ClamAV can), runs in parallel, has a nice text console with a DSL, and is fast enough (processing 210k small files in ~5 minutes, checking against ~60k sigs). It is about 650 lines of Erlang (including comments). I am saying this not to boast(!), but to make the point that I greatly underestimated how productive I could be, beat my expectations many times over, and then people comment about my hubris online the next day. It is funny how life goes!
Every failed product/project I've worked on in my professional career, which had full intent to ship from the start, was killed by technical debt. It's usually indirect, but it's always the root cause.
It takes many forms:
* Too buggy to ship, due to a creaky old code base being over-stretched to cover a product with reliability/experience expectations that were too high.
* Product form factor, efficiency, user experience not good enough to sell well, due to spaghetti code base which couldn't be whittled down to removable pieces. Result: large runtime, more expensive, less efficient hardware.
* Existing old codebase deemed too bad to ship a product, requiring a rewrite-from-scratch, but timescale too long to make any sense -> product killed.
It's difficult to elaborate more while maintaining some discretion about exact companies and projects. The general point is: technical debt isn't just some fuzzy intangible issue — it indirectly creates enormous costs in people and time, can affect the physical form products take on, and impact the user experience. Products always get started without taking this debt into account, but when it's finally realized, it can change basic features, and then it kills them.
Products are designed with faulty assumptions about what existing resources can be applied to them.
Interesting that you talk about projects that never shipped. When I read OP's question I was thinking about already-shipped products that became too hard to run and maintain.
I am curious how long your products/projects were in development for before falling to tech debt? Were these net-new projects?
> When I read OP's question I was thinking about already-shipped products that became too hard to run and maintain.
I've been mostly in consumer electronics related companies, where a product which ships and then becomes too hard to maintain usually doesn't "fail". It just gets phased out. In a way, this is another way technical debt has an indirect, but large impact on products: obsolescence becomes a necessity. Not so much planned — which implies malice — as simply realizing it's not possible to maintain indefinitely.
> I am curious how long your products/projects were in development for before falling to tech debt? Were these net-new projects?
Usually very quickly, or after far too long.
The better projects know ahead of time that there are Dragons lurking in the code base. But that's effectively saying there are projects which never even got past brainstorming because we knew the technical debt was too high.
On the other hand, there are projects where it only becomes apparent how much debt there is after a lot has already been invested. It's like you'd expect, e.g "There's a performance problem because of a basic primitive this library uses everywhere. And that was originally a workaround for a compiler performance bug. We could fix the compiler bug, but it turns out other libraries relied on it..." and so on. Extra time-to-market makes a product make less and less sense — fashions change, hardware improves, new tech arrives — and so it gets killed. Or worse, shipped.
The last place I worked at will die because it will take them years to migrate from Oracle to Postgres due to "technical debt" (the codebase is coupled with the database to a hilarious degree: business logic in triggers, huge PL/SQL packages, plain SQL queries in the Java codebase, a half-assed home-rolled ORM). They're not getting as many new customers as they could because, for various reasons, the Oracle licensing terms are now unacceptable to the new customers they have been in contact with over the last two years.
That's the most concrete reason I can come up with for why the technical debt will kill them, but there are plenty of vaguer reasons why it's been killing them for the past 5 years and will finish them off over the next 5. The attrition rate has been around 20% a year since I joined. For most of the time I worked there they compensated somewhat by hiring new people. Word has gotten around though, and they've run out of qualified candidates willing to work on their mess. Hell, we even had a couple of gifted hires leave after a month or two while shaking their heads.
My current workplace's main product uses the same tech, is the same size (LOC) and has the same functionality as the other company's, but serves a different market. They did the Oracle to Postgres migration in 2 months. 2 MAN months, one guy.
New workplace: 15ish developers, serving the same amount of customers, doing similar revenue, making stable releases every week
Old workplace: 80 developers at its peak, doing non-hotfix releases around every 3 months. Just a mess in every way, mostly stemming from the codebase and the architectural choices that had been made along the way.
Hey, sounds like we worked at the same place! That, or the "wedded to Oracle for life" is a common antipattern. I'd add "shared everything architecture" to the horrors.
Yeah, once you get that deeply entrenched in Oracle, it's almost impossible to get away, and after that experience I vowed never to work at another Oracle shop.
Yes, and pretty easily, if you buy into all the new features that no one else will ever have all of. If you stay clean with simple storage, compute, DB, and email, then you should be okay.
Ideally you'd have some kind of plan though from the start, for which other cloud provider you would use and how the services would map, in case using AWS becomes untenable.
Cloud product life cycles should definitely be more interesting. Azure, for example, already has a "classic" model and the new ARM model. Either way, avoid tightly coupling code with some external vendor's service.
I don't know if I've ever seen a successful database transition for a large project at a large firm. You basically have to build that from the start.
Doing it after the fact in a politics-heavy organization is confounded by not just the technical difficulty of the task, but the glad-handing and perception management that has to happen to keep your team from getting fired during the process.
I was CTO of a company that had a two-week outage due to technical debt. I didn't sleep much for any of it. We fixed it, and we'd lost about 30% of our subscriber base in that period. The company took on new funding to survive, invested that in a new set of products, and shuttered the old stuff just to stay afloat.
I am currently working in a business where there is a nearly 8-year-old Rails app (600+ models, 250+ controllers, 400+ libraries, around 60k LOC) that sits at the heart of everything we do.
The company is struggling to grow and believes the cause is that engineering is slow. We have asked to refactor this code base multiple times, and point to the technical debt as the reason features that should take a day to implement typically take 3-4 weeks.
It is only recently that the penny has finally dropped and they've realised that if they don't invest in replacing this thing (there is too much technical debt to fix, so we're declaring bankruptcy on it and moving to a brand new architecture piecemeal), the business is likely to fail within 1-2 years.
That means my current employer is likely to go bust because of technical debt within 2 years max unless we become really good at fixing this.
IMO this is the price to pay for a dynamically typed language. 60K LOC is not much in a static language, you can use tools to refactor it easily or to visualize the control flow.
But with a dynamically typed language? It's a nightmare. You change one thing and cannot possibly know what else could have gone wrong.
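A contrived sketch of the failure mode (assuming Python; all names are made up):

    class User:
        # Renamed from full_name() during a refactor.
        def display_name(self):
            return self.name

    def greet(user):
        # Stale call site: no error at load time, only when this
        # line actually runs - which a compiler would catch up front.
        return 'Hello, ' + user.full_name()

With static typing the rename is mechanical; here every call site is a runtime gamble unless a test happens to execute it.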
The legacy app has > 80% test coverage. Refactoring is still slow because there are all sorts of business assumptions put into place that add functionality without ever questioning the need for it.
Basically, for a long time, the company never really re-evaluated what it had learned and spent time trimming things down, so as a result there is this ungodly mess. At the heart of what the business does, there is no real need for more than a dozen models. So why do we have so many more? Nobody ever refactored away stuff we didn't need any more, and so weird things happen.
There is also a coupling issue that is endemic to all monoliths. We're moving to a micro-service architecture with clean domain separation, and we'll probably go to 1/10th of the code base in LOC terms within 12 months, even if we move some of that functionality into Go, Java or Python services (all options).
Unit tests just make the mess a bit easier to solve, but they are far from a perfect solution. Wanna move and rename a function? Do it, then spend hours fixing your 10 broken tests, writing new ones, and testing your app for hours, because unit tests don't cover integration. A static language + an IDE does it automatically within seconds.
I work on that type of codebase, but we have a fully covering test suite, so applying changes is not a problem (interestingly, I've just realized that the line count of the testing code is 50%+ more than the base application code itself).
So ultimately I think company culture (that is, emphasis on automated testing, for dynamically typed languages) is the crucial factor.
I would say that 8 years to write 60k LOC is slow. I worked as the sole GUI software engineer for a hardware firm and wrote > 100k LOC in 3 years, not including the test projects preceding the actual real project. This was in C++, and included client/server stuff, an entirely custom resizeable GUI, OpenGL 3D graphics, and modelling of 3D assets and textures etc. too. And getting it running under OSX + Win32, fixing issues on both.
And that wasn't a stressful place to work with insane deadlines - it was fairly relaxed for the most part.
You can't compare C++ with Ruby code. Rails code especially can very easily become a hairball where everything happens "somewhere else". You don't have static type checking or other compile time hints to figure out what is going on. You can't see which functions are called from which call sites. Refactoring tools? Forget it. Ruby is a very compact and flexible language, but if you're not disciplined you'll pay the price for it.
A 60kloc C++ project is small and easily manageable, a 60kloc Ruby hairball can drive a person insane.
Rails code gives you a lot of bang for your buck. That, and the better metric for the project's complexity is the stupidly high number of controllers and models.
As a former C++ dev: you can't compare Ruby LOC to C++ LOC. Just 4x the Ruby and then it becomes more fair. C++ is just verbose; #include <algorithm> doesn't fix everything that blocks fix.
If I had to guess, I'd say it's probably an issue with the build/deploy system. Perhaps someone deployed a broken build, then tried to revert/rollback, and realized that the previous version didn't build "cleanly" anymore.
This could happen if you have a lot of dependencies, switched compiler versions but left the binaries "in place" and deployed changes incrementally.
There are two problems with giving a full answer: firstly I'm still under NDA for some of that, and too much detail would breach that; secondly, my exact memory of it is limited.
In short, my predecessor had attempted a move to SOA without understanding dependencies, circuit breaking and failure modes. This would then cause scenarios where the entire front end would fail to render because a single downstream service took a little longer than necessary.
When identifying how to stop that happening, I discovered a large number of comments tagged "TODO" with statements like "Refactor this when we have time" or "We need to find a way to do this better".
Further down on the downstream services there were rather esoteric SQL queries doing large joins that nobody had done a query plan on. It was hard to identify these because the ORM had been trusted to do magic, and it was happy to do so, but there was a point where it was not apparent _why_ these joins were happening, but when you found the code, there were more comments "This needs improving", "We should refactor this", etc.
We were able to get something back quite quickly with liberal application of indexes, and it took us a day or two to refactor the queries enough to bring response times down, but the error rate was still > 20%, and it was random, so 1 in 5 page loads of the front-end service would fail.
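For flavour, the kind of check that had been skipped looks roughly like this (an illustrative sketch using Python's built-in sqlite3, not their actual stack):

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)')

    query = 'SELECT * FROM orders WHERE customer_id = ?'
    # Before the index: the plan reports a full table scan.
    print(conn.execute('EXPLAIN QUERY PLAN ' + query, (42,)).fetchall())

    conn.execute('CREATE INDEX idx_orders_customer ON orders (customer_id)')
    # After: the plan reports a search using idx_orders_customer.
    print(conn.execute('EXPLAIN QUERY PLAN ' + query, (42,)).fetchall())

Ten minutes of looking at plans like these would have flagged those ORM-generated joins much earlier.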
We refactored the code to circuit break and handle degraded services better, but that took a few days, and then we started working down to the back end service and figuring out the final steps.
It was a small team looking after legacy code that everybody knew was a bit messy.
A few weeks before this code was shuttered, I heard from a friend that some of our content did not render at all on certain Android devices. I identified the cause as a half-finished refactor (again, my predecessor), that had never been finished because he had been pushed to work on something else. This caused a dramatic decline within a key market segment that resulted in declining ad revenue, subscriptions and overall viability of the business.
Basically, when you start something, finish it. If you find yourself putting in comments like "We should refactor this" anywhere in your code base, and you're doing so because the business is pushing you to work on new features, you have a massive problem culturally that is going to cause a rise in technical debt that raises risk to revenue.
All technical debt ultimately will lead to problems that the business will see on balance sheets, but they will rarely successfully identify the cause as being technical debt because they can't see, understand or rationalise it. They think it's engineers being grumpy idealists.
People play too fast and loose with the concept of "MVP" for my tastes, and it's a problem I see over and over again. The risk of that is, long-term, it will cause business failure.
I'm currently working on the reincarnation of a project that was killed by technical debt -- TWICE.
The original codebase was about 20 years old. It was control code for something best described as an industrial robot. Written for the last 20 years by greybeards who knew a lot about the manufacturing process, and were reasonably good at getting a product out the door.
But the whole thing was riddled with #ifdefs for this customer or that, or one batch of machines or another. All long forgotten, written by people who had since left, or been pensioned.
It was in dire need of improvement and extension, but it would have been superhuman to inject new features into this rat's nest. Plus their electronics supplier was discontinuing the control electronics the system was designed for. The UI also looked like it had been designed by German engineers in the 1980s. Which was the case.
So they made the defensible decision to start from scratch. A team of engineers was to develop a brand new machine, with all new electronics and all new code. They got to work -- and had to scrap the new software about three years in. It was just utterly misdesigned, and riddled with bugs.
It featured wonderful WTFs like the embedded realtime code depending on the Qt libraries.
I observed its instability myself: it would just spontaneously crash every five minutes, sometimes just while idling. Once the project lead was on holiday, the programmers revolted, went to the head of the company, and the project lead found himself without a project on his return. Whee.
Now we've started from scratch again, and have at least succeeded in making different mistakes this time around. Fingers crossed, this might end up working.
I'd say it wasn't due to technical debt, more a startup-like development approach at a company that trades millions within seconds in full automation. It sounds like the deployment process wasn't that complicated for a company of that size, but it was deployed without a single check by a second person.
If you're trading automatically, you'll need a very, very solid deployment and audit process, even if you're just a small company. The reason banks are so slow in deploying software is that most of them lost a few million at some point due to some bug.
Startups that think they can act faster than banks just haven't had that bug yet. That's also why I'm rather negative on the whole Fintech scene at the moment.
That's not a startup approach. That's an enterprise approach. Believe me, I've fought tons of resistance in automating deployment operations in the enterprise. There's a perception that automation is dangerous, and you need human checkpoints. In practice, I've worked on projects much, much larger than Knight Capital, where the deployment process was driven by a huge spreadsheet, and orchestrated by non-technical overseers telling techs what commands to run based on the spreadsheet in front of them. It's incredibly vulnerable to human error like "Oops, forgot to deploy to one of the eight servers in the cluster".
In the enterprise, this is called "mature" and is a sign of great sophistication.
Yeah, the amount of resistance towards automation at some larger companies is a total mindfuck.
My first ops job back in 2008 was at a large exchange's NOC, where we shut down and cleaned the application environment every day. Every Friday, we would have to take a backup of the 20 or so production databases - by hand, in an ancient CDE-based UI. Right click -> menu -> submenu -> backup database. Very little room for error, and you weren't allowed to do it without somebody else watching you. Throughout the weekend, customers would then run tests against the production databases. Once testing was done, we'd restore the prod databases back to their original state to wipe out test data.
At one point, I asked my boss if it was alright if I automated it, after showing him a POC, and was rejected because, "We don't trust automation to do it accurately every single time." Mind-boggling. In mild fairness, in the 15 or so years they were doing that, I don't think anyone ever did it wrong... which is an enormous miracle in itself.
(That was a strange company. My boss was a JW who'd worked there for 30 years, regularly tried to convert me, and would spend four hours a day on spreadsheets for his church. We'd also manually kick off stock split processing from a ~10" CRT monitor from the early nineties.)
Call it "process debt", or "management debt" (i.e. the lack of investment in proper management and the culture that goes a long with it -- in favor of a "STFU and just add that feature now! I need it yesterday!" mentality). Either way, part of the same boat, basically.
"The consequences of the failures were substantial. For the 212 incoming parent
orders that were processed by the defective Power Peg code, SMARS sent millions of child orders,
resulting in 4 million executions in 154 stocks for more than 397 million shares in approximately
45 minutes. Knight inadvertently assumed an approximately $3.5 billion net long position in 80
stocks and an approximately $3.15 billion net short position in 74 stocks. Ultimately, Knight
realized a $460 million loss on these positions. "
About 1986 I was tasked with moving a small block (a few KB) of data very quickly from cabinet A to B, with the racks full of custom electronics - no PCs, all original stuff on a flight sim with 386 Intel processors all over the place. The racks had Multibus backplanes.
I suggested a 'TAXI' fast optical link (oooh - optical..too radical) or a pair of Intel 589 (Ethernet) cards for an off-the-shelf solution. Nope, too expensive. Engineering Management suggested a twisted pair ribbon cable between the two adjacent racks - um, OK..
Long story short - me and the senior design engineer decided to use the Intel 8257 DMA controller chip to grab the bus and blast the data between the RAM on two cards.
After a short period of fails, we found that the engineers who designed our 386 cards had not bidirectionally buffered the DMA request line onto the backplane, as they never expected any card except the master CPU ones to initiate a DMA, so the CPU cards could not see the line being toggled from elsewhere.
Engineers would not accept a change request, for 'reasons'.
Intel 589 cards it is, then!
All because someone chose to omit one tristate buffer.
Projects rarely die because of technical debt. Instead, it becomes ridiculously expensive and difficult to add new features. But the software itself can remain in use for decades, gradually decaying and rarely adapting to changes in the business environment. Eventually either the software gets thrown out and replaced with something new, or the company is no longer able to compete.
I've seen this play out probably close to a dozen times now, at different employers and consulting clients.
This is only true if developing new features isn't part of how the company succeeds. That's probably true for some tools that are used internally. If you can't modernize your payroll, that might cost you some money, but it's not make or break.
For a company that makes software as a product, or to directly support or create their main product, not being able to add new features is a really bad place to be.
I have seen a product getting killed by trying to resolve technical debt. The refactor took nine months and in the end didn't work better.
I am a big fan of constant refactoring on a small scale but I am very skeptical of large refactoring of a whole project. You may end up with something that's just different but not really better.
I've had the opposite happen every time a team I've been on decided to refactor a large portion (or even the entire code base). Every time, what was a source of constant bugs (i.e., X bugs per week, every week, never lessening) became tractable and moved to stable after the rewrite (X bugs the first week, 0.7X bugs the second week, etc., until finally we're encountering the odd bug only once every few months, if at all).
I'm not sure what the differentiator is. I'd be curious if others have ideas. I think part of it is that in both cases it was a small team, who caught the issues early enough that it hadn't gotten too bad yet, but late enough that the right direction to move in was clear.
I was talking about enterprise projects that had had years of development, ever-changing personnel, and complex and changing business rules to follow. These tend to be ugly and difficult to work with after a few years of development. In my view the only way to deal with them is to break them down into smaller components and then refactor. But that turns it into a political issue, because the managers (and a lot of developers) don't see the need.
Yeah, I could see that. I've been in those environments too; I have no data points from that, because getting the okay to refactor was so hard it never happened while I was on the team (I'd been on projects that claimed to have refactored the code, but then it was mixed as to whether people claimed it was a success or a waste).
I always tell the younger guys not to try to get an explicit OK to refactor but just add 20% to all estimates and use that for continuous refactoring without asking. It's just a regular part of professional work like writing code, pull requests and testing. This also has the advantage that refactors are relatively small so you can rollback if it turns out that the idea for the refactor was wrong (yes, this happens:-) ).
Even single man-month long "refactors" are something I'm wary of. Smaller changes are easier to test, easier to review, easier to merge, easier to verify are actually improving the state of things and heading in the right direction, easier to pause when your priorities unexpectedly shift midway through cleanup without leaving a terrible mess...
I'm okay with the occasional week-long rewrite of a subsystem, but usually only after I've spent some time coming to grips with exactly why the old one is terrible and have a firm grip of exactly how the new one will be better.
Technical debt is not a thing that kills products. Shit-ass management kills products. Technical debt may or may not be a symptom of shit-ass management.
Agreed. I feel like technical debt is more of a locus of control issue among developers than a real business concern. The only thing we look at all day is code; therefore, if the project fails, it must be because of the code.
I'm a fan of continuous refactoring, making small improvements to code and environments constantly, rather than trying to do everything all at once. It might not be as satisfying, but it's less risky and a lot more realistic in most work environments.
The problem is when you need to change your "platform".
I worked on a 300k LOC Business BASIC application at one point.
The big question everyone was asking was: how do you move to something else? Everyone wanted something else; they started writing new services on top of the old system, and they had some ideas on where to go, but it just didn't seem like a gradual rewrite was possible.
And to be honest, a greenfield rewrite just wouldn't work for something this size with the resources they had. So it stayed in Business BASIC.
Except by one measure: Netscape died as a company. The huge rewrite contributed to killing it. If you don't ship a product (for like 4-6 years?) you're gonna die. Mozilla originally chose the name Phoenix (then Firebird to avoid trademark problems, then finally Firefox) because it was a phoenix rising from Netscape's ashes. Its major innovation: it was 'blazing fast' compared to IE 5.5/6. Tabbed browsing was also pretty cool.
You can learn a lot of lessons from Netscape, but this isn't one of them. Servo is a great example of how a rewrite should/can work. Mozilla hasn't devoted 100% of its resources to Servo, but instead is letting Servo grow all on its own, and at some unclearly defined point in the future, the two could merge (but might not!). It's a separate product, and nobody is pinning all their hopes and dreams on it.
I remember how long it took to release a stable version of Mozilla and Mozilla Phoenix. In the meantime, had to recompile newer releases all the time manually. There was no alternative browser on Linux or *NIX for that matter (OK, macOS still had MSIE).
The successor of Netscape Communicator was Mozilla (IIRC it was just called that, later renamed Mozilla SeaMonkey), and the successor of Netscape Navigator was Mozilla Phoenix (later renamed Mozilla Firebird and eventually Mozilla Firefox). Firefox and Thunderbird were once again separate clients.
Mozilla was still considered bloated, but Phoenix was far less bloated, which was nice on lower-RAM machines, and allowed the start of Web 2.0. It was also the return of doing one thing and doing it right: browsing the WWW. Netscape Communicator (unlike its predecessor, Netscape Navigator) came with a Usenet client and an e-mail client.
Later in development, addons became a thing, and you could add features which were previously part of Netscape Communicator, such as the calendar and the HTML editor. You can also add such features to Mozilla Thunderbird with addons.
Then Google Chrome happened, and people switched to that, but I'm not entirely sure why.
Also, Servo is just the engine, and modern web rendering engines are themselves highly modular. I think the Gecko engine powering Firefox has had its JavaScript interpreter replaced 2-3 times.
So when it comes to it, the most likely outcome will be a kind of "my grandfather's axe" scenario where over time parts of Servo replace Gecko within Firefox until Servo has completely replaced Gecko.
Sorry, but I emphatically disagree. Servo entailed creating a new programming language, building a community around that language, and using the project as a playground for feature validation. This might work for a non-commercial entity, but it is not a good example of a rewrite.
It's more about the integration side than the particulars of it. It's a huge project with different goals than Firefox, so it should be (and is being) treated as such.
It's a huge, audacious, hairy project, which might happen if a startup said "OK let's rewrite everything from scratch!"
Then again, Firefox was itself a strip-down of the rewritten Netscape suite. Stripped down in that the suite included not just a browser but also an email client, an IRC client and an HTML editor, and the UI was done using JS and XUL markup.
What the Firefox devs did was take the browser part, make it stand alone, and replace much of the XUL UI with native widgets (GTK on *nix).
Same thing, back in the first dotcom boom. I think the company would have died anyway, but they burned whatever runway they had by undertaking a complete rewrite of a working ASP/SQL web app (full stack Microsoft). The new version was to run on Linux and use a variety of custom code, sourceforge and/or freshmeat projects, and several different data storage tiers. An explosion of architectural complexity. As far as I could tell the main reason was that the CTO and his top architects were all Unix zealots and hated Microsoft.
I can't remember which podcast episode it was, but I do remember him mentioning a professor in South Korea had once told him he was using his blog posts for his CS classes.
I'm currently watching this happen to a product from the outside. The company I work for has an ERP system from the late 80s, written in COBOL for the HP 3000 series computers. At the time, it was probably an excellent system; however, over the years it's had modernizations tacked on with no regard to actually improving the core system. Some examples:
* In the early 2000s, they added support for Windows NT to the product. Unfortunately, they did this with an MPE compatibility layer, which means the entire thing still thinks it's running on an HP 3000, so controlling it programmatically means writing MPE job streams.
* It was originally written to store data in COBOL records. When they added support for SQL databases, they apparently just copy-pasted the schema verbatim from the COBOL copybook format. This means the database has no foreign keys, FLAGS columns all over the place (including tables where you have to JOIN ON SUBSTRING), and, most egregiously, a table with ITEMNO_001, ITEMNO_002, ITEMNO_003, PRICE_001, PRICE_002, PRICE_003 and so on, which has to be queried three times and UNIONed to get the data out (see the sketch after this list).
* Printing packing lists requires not only a specific model of printer, but also an extra several-hundred-dollar chip to be installed in that printer. I'm told that this chip's sole function is to enable barcode printing.
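Here's roughly what reading that repeated-column table back looks like (my reconstruction; the table name 'order_lines' and the 'order_id' key are hypothetical, only the ITEMNO_/PRICE_ columns come from the schema described above):

    # Un-pivot ITEMNO_001..003 / PRICE_001..003 into one row per item.
    suffixes = ['001', '002', '003']
    union_query = '\nUNION ALL\n'.join(
        'SELECT order_id, ITEMNO_{0} AS itemno, PRICE_{0} AS price '
        'FROM order_lines'.format(s)
        for s in suffixes
    )

A properly foreign-keyed child table would make all of this a single SELECT.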
I have no insight into what goes on inside the company that makes this thing, but it certainly looks to me like they have a severe case of technical debt. Any bug fixes generally take 4-6 weeks in the best case scenario, and frequently either don't fix the bug or introduce new ones instead. Their only customers are the ones that have been using the system for so long that they're stuck with the system, and can't switch--in fact, many of them are still running HP 3000 systems, which HP has been trying to end-of-life since at least 2006.
The end result of this is that the product is dying a slow, agonizing death of attrition. I think the only reason it still exists at all is because the company that makes it is stuck with support contracts that haven't expired yet.
I am working on a project now that bears some study.
I built this extranet app for a Fortune-class / NYSE company in 2001. They were a Lotus Domino shop so for that and various other reasons the extranet was deployed in Domino. The initial rollout was considered quite successful, but it was definitely "v1" code, and I'm being really generous with the code quality. Plus, Domino.
The application was considered a stopgap until the shop had become fully Microsoft-centric, at which point it was expected to be migrated to .NET. That was expected to be in ~5 years.
The result was that no investment was made in the app for over fifteen years. Every now and then an enhancement would be needed, and a contractor would be called up to bolt on a feature in a shockingly slipshod manner (this app is much too complex for the average Domino dev). But no technical debt was ever cleaned up, because "meh, we're going to replace that app by 2007."
2007 was 10 years ago. In the meantime two projects to replace the app were spun up and killed. The app is finally being retired this year. I was called up at the 11th hour to jump back in (15 years later) to help support the thing through the conversion, as the one existing Domino dev they had on staff finally (wisely) jumped ship.
I cannot even begin to describe the state of this app.... it's a case study in "how to not manage IT."
---
Another recent client was a content-creation shop (think glossy magazines). Their outgoing sr dev had deployed a CMS that nobody had (or has) ever heard of. This CMS was originally developed during the glory days of XML. Believe it or not, the app worked by loading all of the CMS content into a single in-memory XML document. This was probably OK for a brochure site, but this was a site with hundreds of thousands of pages of content. As a result the application required a server with 64GB of RAM just to launch. Also - launching the app took about ten minutes after the server OS was loaded. And there was no server farm, just the one server. If the app was ever stopped, it would stay down for at minimum 10 minutes.
I came in to fill in temporarily and to try to find someone to staff the position permanently. Even with a competitive salary, nobody qualified wanted the job.
Meanwhile, the same company also had a set of blogs that they managed in WordPress....
Sure, any amount of work could have been done to replatform the app.
They were already using WordPress for blogging. A custom WordPress implementation would have easily solved their CMS problems and devs are trivial to find.
The point was that the thing had just been rolled out the prior year. There was no budget or appetite for throwing the thing away. It did work. So there it stands, aside some dozen WordPress sites...
I owned the project and decided to kill it. I guess you could call it death by technical debt, but more accurately it was due to incompetence, both by the developers and by myself as their "manager".
I hired 3 mid-level PHP and JSP developers in Thailand and had them build the website + reporting page.
Total nightmare. Don't hire developers and assume that they will rise to the occasion (learn new tricks). I gave them as much time as they needed to research and make sound engineering decisions, and I ended up with a spaghetti-nightmare Frankenstein mix of server-side scripts mixed with client-side scripts mixed with server-side code that generates client-side script.
In Thailand at least, you always need a manager to force architecture and design decisions, and force devs to refactor poorly thought out solutions.
I was naive and thought that I could have a team of 3 figure out the web part while I write the desktop client and provide PM-level guidance.
I had a chance to ship the first Tower Defense game for iOS. The OS X game I was porting had some crippling performance problems that were incredibly hard to track down.
The problem was two-fold:
1. The relevant tools (Unity3D) were extremely immature, and the problem was quite diffuse: no profiler, poor quality of generated code, tiny caches, etc.
2. A problem in string-handling code that was quite diffuse throughout the game. As near as I can tell, it was blowing out the tiny CPU cache hundreds of times per frame.
On desktop, this code was a complete non-issue. On the puny little ARM on the iPhone? It was the difference between having dozens of towers and 50 enemies in play vs half a dozen towers and less than a dozen enemies in play. The impact on game dynamics, and need to re-balance everything by itself would add weeks to the shipping schedule.
There were plenty of other things that needed to be scaled WAY back of course: Switching from 3D to 2D to get vertex count and draw call count down. Completely rebuilding the entire UI. Revamping the pathfinding and suffix caching to not play havoc with the CPU cache. Moving from a 24x24 grid to a 12x12 grid. All of that combined helped a LOT, but not nearly enough.
The string manipulation was for a hierarchical property system that let me parameterize all sorts of attributes for enemies/spells/towers/projectiles in a set of text files. Ultimately, I had over-engineered on the assumption that I would be tweaking many more things -- with much greater frequency -- than I wound up actually tweaking.
Had I ripped most of it out and just had local properties on each prefab that I assigned manually, I might've hit that market opportunity. Finding that that was the cause was a multi-month project because of how interwoven it was with everything else. Hell, it would've been fine had I not over-generalized it into a shared component on each prefab that the other components queried to get property values. But I did. And it took me vastly too long to identify it as the major problem it was.
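A caricature of the trade-off (my sketch, in Python rather than the actual Unity/C# code):

    # Over-generalized: every read walks a string-keyed hierarchy,
    # churning through string splits and dict hops hundreds of
    # times per frame.
    class PropertySystem:
        def __init__(self, tree):
            self.tree = tree  # nested dicts parsed from text files

        def get(self, path):
            node = self.tree
            for key in path.split('.'):
                node = node[key]
            return node

    # damage = props.get('towers.cannon.projectile.damage')

    # What shipping needed: a plain field, assigned once by hand.
    class CannonTower:
        projectile_damage = 12

On a desktop cache the first version is invisible; on a tiny embedded cache it eats the frame budget.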
Opportunity missed, and that was the final nail in the coffin for my fledgling game studio.
Wow. I'm actually seriously considering starting something on a smaller scale (tower defense for mobile as well) in the next few days as my next long-term(ish) side-project. I've done native iOS and Android for a few years now, but nothing with Unity as of yet. I know C# well enough. Other than what you've mentioned, do you have any immediate tips before I fall face first?
Technical debt won't necessarily kill a project, but over time it will reduce the speed at which you iterate and ship software. That loss of speed does kill companies, young and old.
I've seen projects die not because the technical debt existed, but because of a lack of addressing it. The developers got sick of working around the debt to do trivial things, so they left. Customer support personnel left because of the constant manual "tweaking" they had to do due to the technical debt. Management focused on tacking on features instead of paying off any technical debt, which had rippling effects throughout the company.
I worked for a startup that had basically the right idea but proper execution took so long we ran out of runway. The first iteration of the product was built in an extremely haphazard, cowboy way - and took months, if not years, to refactor into something stable, usable and crash-proof. By the time the product was operational, the company was bankrupt. We simply hemorrhaged money until we bled to death.
As someone else pointed out - technological debt is not a cause per se; it's an indication of some deeper problem - usually of human, not technological, nature.
That may be a 'startup problem'. Do it cheap and cowboy because, runway. Assuming the money will come along later to do it all again. But that (lots of money later) happens only if you get bought out, not if you have to make it on your own.
So any business plan that includes the steps "A miracle occurs" and then "We get bought out" is probably going to suffer that fate?
Even if the miracle occurs and you get bought out and get shittons of money thrown at you, you've already built a company with a "fake it till you make it" culture. Even if you get to hire great new engineers to build your product the right way, your existing team is a ragtag bunch of amateurs who don't know how to build things properly and block any attempts to improve the status quo. I've been there. You can't build castles on the foundation of mud. You'd have to throw it all out and start again - and that's a recipe for disaster.
I wouldn't say killed, but severely burdened? Limited by technical debt? Sure.
One application was a web application built in C++ in the 90's. It didn't have the STL, it implemented everything from XML parsing to PDF rendering from scratch. It stored all data in XML files on the file system. It was a single-threaded CGI application. And it was the core product of the small business that created it.
There was no series B/C/D/E that was going to appear so we could hire more developers and rewrite everything or develop a new, superior product, etc. This is where I learned how to maintain and extend legacy software. I spent hours poring over Michael Feathers' book. We did manage to extend and breathe new life into the system. We wrapped the old code in Python, wrote a tonne of integration and unit tests on every change, and wrote some code to sync data to a database alongside the XML file storage scheme it used. We even got to a place where we started replacing code paths from the Python API with functionally equivalent (as far as our test suite was concerned) code written in nice, clean Python (and gained some features along the way thanks to Python's nice libraries!).
We kept the lights on without having to spend too much time hacking on undocumented, untested C++ code and without trying to just re-write everything. It was much more difficult to make progress than a typical greenfield project in a dynamic language but that would've cost more upfront without a clear payoff... so we did what we had to do.
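The pattern, roughly (a minimal sketch of the approach with hypothetical names; the real system was messier):

    import subprocess

    def legacy_render(input_xml):
        """Thin Python wrapper around the old C++ binary."""
        result = subprocess.run(
            ['./legacy/render'],            # hypothetical path
            input=input_xml.encode(),
            capture_output=True,
            check=True,
        )
        return result.stdout

    def test_render_matches_golden_output():
        # Characterization test: pin down current behaviour,
        # whatever it is, so replacements can be diffed against it.
        expected = open('fixtures/invoice.out', 'rb').read()  # hypothetical fixture
        with open('fixtures/invoice.xml') as f:
            assert legacy_render(f.read()) == expected

Once enough behaviour is pinned this way, you can swap clean Python in behind the wrapper one code path at a time, exactly as described above.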
Another company? Well, they decided to use a document-based data storage system as the source of truth in a hot new microservices architecture that was going to save everything... only there was no schema validation, and their use cases were killing performance in some scenarios. Random breakages caused by changes at a distance. It hasn't killed their business, but it has limited their options.
I've seen a whole company killed by technical debt. Because the software was written so badly far more developers had to be hired to firefight than the company could afford. The technical support team was similarly bloated to deal with the endless problems the customers had. Sales were low due to the bad reputation.
A rewrite was started, but never got anywhere. The company folded under the weight of its massive salary costs.
Our product, a large-scale piece of enterprise software, is slowly getting killed.
It's old and it's rather unusable (by the users).
Plus, for "backward compatibility", it supports dozens of strange configurations. It's dragged down by so much technical debt (functions longer than 3000 lines with 60 parameters!) that every small changes requires so much time.
We're slowly killing it (i.e. no big new developments, only maintenance for existing customers) and abandoning it. And luckily we're not rewriting it. :-)
ITA software is a good example of a company that succeeded due to the collective technical debt across their competitors.
Though they only really succeeded on the shopping part. They didn't ever get to a credible booking engine that anyone would buy. Which may point to something other than tech debt being the biggest barrier to modernizing an airline reservation system.
Former ITA engineer here. Our airfare search product QPX was untouchable at the time due to design: it got results that were far better than those of the competitors because ITA was modeling the problem better (search through a graph). While competitor tech debt didn't hurt us, I don't think it was the pivotal factor in ITA's success. As you point out, our hopes of replacing a major carrier's reservation never came to fruition, unfortunately. A res system is a complex beast.
I do agree that QPX was untouchable, but I still think tech debt in competitors was a major factor. There were plenty of smart people at your competitors... I'm sure graph search occurred to them. I suspect efforts to greenfield it were squashed... nobody wanted to throw out the hairball they had, because of the existing investment. Thus, they tried to "fix" what they already had... with obviously bad results.
Edit: And, worth mentioning that your competitors wouldn't have had to be better than, or even as good as QPX. "Good enough" would have squashed several big sales, since shopping was typically bundled in with what their customers already paid.
"This is indeed a bitter pill for ITA Software’s founders to swallow as they put years and millions of dollars into their dream to transform the nuts and bolts of the way airline reservations systems...are handled"
I think implementations are more often killed by technical debt rather than products or brands tbh.
Although at that point I wouldn't call it technical debt. If you've got a million lines of spaghetti code, then you've got a million lines of spaghetti code, not technical debt. I.e., a camel is a camel; it's not a horse with technical debt.
Kind of, but not really. When I complain about that product I almost always complain about marketing, intransigent leadership, etc. Ostensibly technical debt killed the product, but technical debt is almost always a symptom not a cause. Technical debt can get out of control, but you have to wonder how it got to be that way.
Back in '96 I was in a startup company in the Boston area. We had a B2B app that we customized per customer. It worked over dial-up. We didn't foresee affordable internet connections coming, and when they arrived we couldn't rewrite critical subsystems quickly enough.
Because our product was customized per customer - not just look-and-feel, we coded their business rules into it - we had problems scaling. Further, our design didn't lend itself to rapid development.
First our sales team left. Then developers started to leave. Within a couple of years after I left, the company folded. Many good experiences had in that company. Many lessons learned.
Two examples I can recall. First time was an embedded application for a smallish (2-3 SW developers total in the company) firm. The code was, by my estimation, initially developed by a real talented senior guy. It at one time did what it was meant to do pretty well, and adhered to sound (at the time) best practices.

Well, Mr. Senior Guy ended up leaving for whatever reason, and since then, a string of less senior folks were brought in one by one to burn out maintaining the system throughout the years. And by "maintain" I mean "toss in every feature under the sun that the CEO asked for, whether or not it made sense and whether or not the rest of the system kept working". Every little thing that might get one potential customer to say yes was bolted on hastily and shipped as soon as it could compile. Just get it to work by any means and ship it.

I was brought in as one of this long string of people. By the time I came, the CEO was frustrated that the system could not be added to quickly enough anymore, and that core functionality would fail in the field more and more often. It was a total mess, and eventually proved irredeemable. I'd like to say I left the system in better shape than I found it, but at the end of the day it didn't matter. Technically, this project is not dead, so it doesn't meet this Ask HN's criteria, but I don't think they ever have or will make another major release of it. It will limp along until they start taking technical debt seriously.
The second one was a mobile app that was originally ported from some legacy J2ME app and "gotten to work" on the iPhone platform. It was pretty much a straight port, data structure by data structure, from Java to Objective C, and didn't really use the platform properly at all. For example, each and every control was hand-crafted to mimic the original J2ME app, rather than using the built in UIs that iPhone provided. It got to the point where nobody could touch it without it falling over, and no senior person was willing to work on it anymore. I was senior enough in my career at that point that I could insist that a complete re-write was the only way to go. We did that successfully and the previous pile of technical debt was killed.
Those examples aside, I'd say that almost every place I have worked suffered from technical debt to a large degree. The common theme was a huge legacy code base that suffered for years (decades) from repeated "just cram it in and get it to work" abuse. The metaphor I always like to use is: No home builder on earth would, when the requirements were to build a 5-story apartment building, take a single-story single-family home and just add 4 floors. But seemingly every company building software attempts to do this.
It took a year to build an index from a new crawl, and they were only doing incremental "freshness" updates in between where they updated certain pages. It was a fiasco.
I have never personally experienced a project failing because of technical debt. I have experienced "the opposite" of that, where there was such a focus on doing things "the right way" that development felt slowed down by it. Our product iterations were slow, and from my perspective it cost us valuable chances to test different product ideas.
I'm using this definition of technical debt.
Technical debt is a concept in programming that reflects the extra development work that arises when code that is easy to implement in the short run is used instead of applying the best overall solution.[1]
The project I am referring to was a consumer web project. No mission critical type data.
In my experience (mostly consumer web, social media management for B2B) I have seen causes of failure to be heavily weighted towards product issues not engineering. The one big success I was a part of had the most technical debt. :) But that's probably because it lived the longest (and still lives today!). My experience is limited to being an employee at 4 different tech companies and several failed attempts of my own.
"Killed" as in "That was the reason it was ultimately replaced by a green field project, after ten years in the market" or "Killed" as in "It never shipped because it was so bogged down in technical debt we could not ship it"?
The former has happened to every project I know of that didn't die for some other reason (market disappearing, etc.). The latter I have not experienced.
It seems everybody here is suggesting that they have worked on products that were killed by technical debt.
I'm going to take a different stance and suggest it is very difficult for a product to be killed by technical debt.
I'm working on a product now which has huge technical debt requiring a full re-write, but we have customers who love the product, so we just keep the old version going while building the new version.
Let's assume we are talking about a single-product company here, to keep the discussion simple: it avoids getting bogged down in larger corporate issues and lets us focus on the case where the product is what pays the bills (assuming we're talking about commercial products).
Let's look at a few examples.
The product has some customers, but it is difficult to add new customers due to some technical debt issue, and the lifetime customer value in your target market does not cover the cost of re-developing the product and continuing operations. OK, you're probably dead.
You've got some customers, and you can't sell more, but the lifetime customer value is greater than the cost of re-engineering the system. You've got some bad times ahead, but there is a path. So technical debt doesn't kill you.
You've got technical debt and not many users; you want to add features but can't because of the debt. Did technical debt kill the product? Or was it killed by a lack of market? You don't know that the new features would have saved it; all you know is that what existed did not make enough to keep you going, so the failure can't really be blamed on the technical debt.
Technical debt can be costly, but rarely fatal. It's great not to have it, and it can make it difficult to keep good people (I lost 2 amazing devs partly because they hated the old code-base we inherited, though they also had amazing opportunities elsewhere).
> You don't know that the new features would have saved it,
Or could have killed it even faster.
Imagine that at the beginning you decided not to accumulate "technical debt". Instead you got some product that developers believed had no "technical debt", and because of that it was late to the market. Money was running out, and once customers started using it you had to be really lucky for new features to make enough impact to stay alive. Because customers don't necessarily care that much about features that are easy to add, but maybe want some features that are hard to add, and nobody anticipated that. Either way I cannot find a reason for "technical debt" to kill a product.
I put "technical debt" in quotes, because even the underlying idea of the concept doesn't make sense and relies on a belief of knowing how to do it "the right way", which I don't think can be a good way to write software. It's better to substitute it with more complex concepts of flexibility and simplicity and corresponding trade offs.
I think this point is very underappreciated, thanks for pointing it out.
Along with the 'feature' that could kill things faster is general bloat, when people keep building without knowing what the customer wants. You have to try something, but product testing should be done in such a modular way that most things can be removed if they turn out to be the wrong direction. Of course, you need a base data structure, but features should conform to that rather than constantly extend it.
One example I'm seeing with a start-up I currently know: they are building SDKs for multiple languages. Most of the code is auto-generated, but just the time spent on examples, documentation, and packaging is killing them, while they don't have customers for most of the languages they're publishing. This isn't just 'code' debt; there is overhead in documentation and in managing the code.
I worked for a dotcom back in the day. The underlying tech was appalling, to the extent they couldn't actually stay on the internet. We fixed this. They then proceeded to vastly expand the site, but every part of it was written as a special snowflake.
Five years later, they decided on a rebrand. This was literally a reskinning of the existing site. It took 50-odd people, ten months, and more overtime than you care to think about. Many of those people were contractors.
Not only was this a huge expense, it prevented us from making the site any better. It was an extremely competitive market and the other sites ate our lunch. The next year brought another round of redundancies, a financial statement that wrote the value of the business down to its cash reserves, and finally a sell-off.
Ironically, the new owners turned it back into a viable business, but they had a very different attitude to technical debt.
I am dealing with a set of legacy apps that will eventually be phased out. There are 'generations' of code, trying to solve architectural problems that are never taken to completion. Making large moves like that is a form of technical debt; not taking every step through to completion (like, documenting the code ...) is another form of debt.
The original architect, despite his outward emphasis on correctness, was really not a great developer or a good technical lead. He insisted on writing "perfect" code, ignoring the rest of the development cycle and the people he worked with. Regardless, he left last year, for reasons I am not really aware of. I knew he hated me, since I didn't share his obsession.
Fast forward to today ... the applications work. There are no more feature plans for them. But it is frustrating to go back into the code, because it is utter garbage. I am the only one in my company with any deep understanding of it. Those who tried to understand it struggled as well.
We are taking on technical debt in the new application we are working on, but we have been planning for it. I have the chance, with the devs I am working with, to get this "right".
Throughout my career, the sense I got was that people did not understand something critical: owning code is a full-time job even after it is written. I never worked with technical leads who asserted that. They always made passing nods to the idea that we should comment more code or write more tests, as if it were just some tired dogma to follow. But no one ever made the case to me for being clear about what you are accomplishing and how, for owning the explanation of what the code does and what trade-offs were made, and for having a plan to pay back the debt that was taken out. As tech lead now, I am continually communicating this, and the progress is starting to show. I am just glad my company is recognizing these needs too and understands the investment that comes with maintaining its major product.
I am right now, but I can't take a step back and fix any of it because that wouldn't be "Agile". The project won't be declared a failure though, they will just keep adding more and more "resources" aka people to get the same amount of work done.
I've long thought that "premature optimization is the root of all evil" is sort of a waste of breath... not because it's wrong but because over-engineering is a far greater problem today than premature optimization.
Over-engineering is a plague in modern software. Most of the failures due to technical debt that I've seen involved cases where "smart" developers built swiss army chainsaws to do things that required a hammer. This also often results in products that require orders of magnitude more resources than they should, which makes cloud vendors like Amazon and Digital Ocean a lot of money I guess.
Do major design fails count as technical debt? Among other things, I've run across an e-commerce site that required choosing a payment method before browsing products or putting things in the cart... (fortunately not my own job). (The reason for this odd choice was evidently that they wanted customers to have access to special offers/pricing/services/products depending on how they were going to pay. Cure worse than the disease type situation, IMO.)
I think I've worked for a company that did essentially mostly screw itself with technical debt, though.
They originally had overseas contractors write parts of their product without a proper developer to assess and vet the results or requirements, and it resulted in zero separation of concerns, business processes and display logic combined, and a terrible database structure. Eventually they wanted to move from their original layout and page design to a new one, and it proved almost impossible without multiple developers spending around half a year. And even that was wasted effort, because at the end of it they realized they actually wanted a mobile site and they still hadn't really created a good separation of concerns... Of course, the whole time, engineering and development wanted to refactor the business logic into code separate from the other layers, but management didn't want to expend resources on non-customer-facing dev work.
I think the product still exists, but they've killed their momentum.
If the company has a government contract, the technical debt becomes a hidden feature.
Government customers rarely have the ability to independently assess what something should cost if done according to industry best practices. So what you do is bid low, with an extremely short time horizon to release. You get the contract, and now have license to run up a lot of technical debt, because it needs to be done fast and cheap.
Now you get to maintenance phase. That's where the money is. The typical government customer is never willing to spend even one penny on paying off the principal on technical debt, but will make the installment payments forever. They will spend $100k on one new feature, but not even $0.01 on reducing the price of new features. Many still use SLOC as a management metric. So you run your codebase up to a million lines of code, when the software itself is just another glorified CRUD app. For bonus points, you give yourself 100% test coverage on a bunch of functions that have boolean "isUnitTest" parameters.
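If you've never seen the "isUnitTest" trick, here's a hypothetical sketch (every name in it is invented) of the kind of coverage it buys:

    // Hypothetical sketch (all names invented) of the "isUnitTest"
    // anti-pattern: the function short-circuits under test, so the
    // "test" exercises none of the real logic, yet the coverage tool
    // counts every line of the call as covered.
    interface Order { id: number; total: number; }

    function chargeCustomer(order: Order): boolean {
      return order.total > 0; // stand-in for real billing logic
    }

    function submitOrder(order: Order, isUnitTest: boolean): boolean {
      if (isUnitTest) return true; // skip the hard parts when testing
      return chargeCustomer(order);
    }

    // The "unit test" that buys the coverage number:
    console.assert(submitOrder({ id: 1, total: 0 }, true) === true);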
I imagine this is similar to how VC firms loot companies by manipulating their financial structure. I find it to be extremely unethical, but there is literally nothing I can do individually to put a stop to it.
Yes. Technical debt rendered it difficult to create features, and difficult to hire. The project dragged on as engineers came, tried to refactor and then left. I didn't stay.
My understanding is that it was never released, so all of the money the company put into the project was wasted.
This is maybe different from what people normally consider 'technical debt'. I don't mean just code aesthetics but also bugs, redundant code, and bad abstractions.
We have lots of tech debt (we are gradually paying it down), but we keep engineers for a loooong time (they basically don't leave unless they move) because of the social aspects of our team. A bunch of really nice, helpful people, who really want to make things better but are willing to balance "good" with "practical".
I wrote a complicated and horrendously ugly PHP scraping script with hundreds of regexes to generate a PDF file from a CMS. It worked for about 90% of all articles, and the rest needed only a few manual fixes in InDesign.
I always intended this as a stopgap measure until we implemented a clean document structure from which both the web page and the PDF could be generated. But that never happened. Then I left the company.
A few years later the company made a redesign of the website and totally scrapped the PDF feature.
However I still sometimes see these PDF pages out in the wild, mostly saved by the authors of the articles on their own web sites, because authors are allowed to link to their own content for free. It's a shame because I think these PDF versions of the articles were beautiful. But as you would say, technical debt killed the feature.
Another project, for an NPO, was written in Python on an ancient back-end. When the shared hosting provider did an upgrade, the back-end died; I was not able to repair it, and I was not being paid enough to invest much time. I bluntly told the board that their project was as dead as a dodo, and they accepted that.
Yes, this is actually quite common. What I've seen happen, twice personally, is a large company asking teams of developers to work on a codebase they can't really understand, and have no chance ever to, because the code has basically got to the point of no return, in terms of messiness.
What happens next IME is that progress mostly stops, and the company loses market share rapidly.
I did work on a project that was Rails 3, and for various reasons it can never be upgraded without basically rebuilding the entire system. They keep hoping for an exit that will simply never happen, because who would buy a rotting system like that? I think myself and only a few other devs there understood the situation. It's not like sales or the CEO fully understood.
This is true: the firm I work for at the moment just completed buying an incredibly expensive Fortran-based platform. That's despite the fact it cannot integrate with our existing products without serious work, nobody knows Fortran here, and the original developer sold it so he could retire.
So even though it makes no technical sense, it fills a gap in the product offering, and they'll have to find consultants to limp it along every time they need something small done that would otherwise be very cheap. It's all about the balancing act.
I have seen projects that were all but dead from technical debt, that management simply refused to believe were dead.
I once worked on a system that was so messed up that fixing any bug would create 2 more. At one point, the entire team was only working bug fixes for 6 months straight and in the end, we had more bugs than we started with.
We tried to refactor the code several times but it was just so fucked up.
This was the stuff of nightmares. Most of the module consisted of one class with 50,000+ lines of VB.NET code in a single file.
Global mutating state referenced in functions everywhere. Functions that were 500-1000 lines long (today I have ESLint limit functions to 10 lines).
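For what it's worth, the rule I mean is ESLint's max-lines-per-function; a minimal flat-config sketch (the rule name and options are ESLint's own, the limit of 10 is my personal preference, not a default):

    // eslint.config.js: a minimal sketch of capping function length.
    export default [
      {
        rules: {
          "max-lines-per-function": ["error", {
            max: 10,             // my preference; pick your own ceiling
            skipBlankLines: true,
            skipComments: true,
          }],
        },
      },
    ];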
After working on it for over a year, I recommended to my boss that they initiate a complete re-write and made it clear that in my estimation there was NO saving this code base.
I got moved to working on single page applications in JavaScript, but when I quit that job and moved on roughly a year and a half later, that code base was still in production, generating ever more bugs for that team.
When I was a consultant I saw it happen often, BUT, seldom does technical debt get called out explicitly as a reason for failure. More usually it goes like this:
feature requests with no mind to maintainability/cleanup -> tech debt -> slow dev. velocity -> failure to deliver features for the businesses -> failure to retain/grow/compete
I worked on an EDA product for a short time. Its code base had a few components, some of which were written by the product team, and some of which were copied (without source history) from other teams. The team decided to make a cloud version of the product too, and to do that, the team split in half and each new team had their own copy of the code base.
No sharing code, because those other teams were our competition. There were teams working on components for our product, and another team working on a copy of our product, but nobody shared code. So bugfixes from one of the components would never make it into our code base. (The components were products in their own right.)
The code base was ~30 years old, and it had "survived" a port from Unix to Windows. The build system was a hellish nightmare of Cygwin, makefiles, Perl scripts, hard-coded paths, and unsupported internal tools, and after ~2 hours of manual fiddling it spat out a build. A few million lines of code, which according to my tests, would take ~10 minutes to build with a proper build system. I have a proper rant about the horrors I saw in the build system, I had never imagined that a build system could be so bad, and whenever I mentioned the name of the internal tool it was built on, company veterans on other teams would recoil.
Meanwhile, I witnessed how much damage the other developers were doing to the product. Some were trying to catch up with feature requests, some were adding hacks to the core algorithm to try and improve numerical stability, and some were doing real damage to the code base by adding buggy and poorly designed features--think major performance regressions, memory errors, and deadlocks fixed by adding calls to Sleep().
I tried to improve things as much as I could there. The company culture had some upsides (good work/life balance) but the internal competition strangled innovation, the wrong people made technical decisions, and the team didn't have a balanced set of skills.
The product hasn't been "killed" per se. It's still for sale. But it's the walking dead. Being EDA, licenses can run into five figures per seat, but the product's revenue was on a downward trend last time I checked. I have a couple friends that still work on the team but I'm encouraging them to apply for new jobs. I left when I got an offer from a major (big five) tech company.
Having used several different EDA tools, I completely believe this. I don't know how they can be so poor across the board. It is part of the reason I switched from doing software & hardware to just hardware. Life is too short to deal with bugs in synthesis tools.
I've been on one lack-of-loose-coupling project. "All you're doing is importing from here and exporting to there, no problem, right?" and eighteen months later it's somehow expanded into touching every part of the business except the vending machine firmware (no, it was not a vending machine company). This leads to paralysis, where touching anything makes anything and everything randomly crash. This was fundamentally a business process failure, in that you can trivially specify a process too big for humans to implement, even tool-assisted humans. However, the technical-debt failure mode could have hit a little less suddenly and catastrophically with better technical design decisions (we should have jumped off the rails at 6 months and redirected, instead of patching around until it exploded at 18 months, etc.).
I've been on two lack-of-refactoring/standardization projects. If you use Rails, like back in the 1.0 era, you can't just stop updating and do something else; you have a tiger by the tail, and if you don't keep up you'll never, ever be able to catch up again. You can't go from the 1.0 era straight to today, where "today" is any time in the last five years. Scrap and complete rewrite. Of course management doesn't understand dependency trees, and going back to an OS and libraries from 2007 means rolling back 10 years of security patches or hand-compiling everything, so it would be a lot simpler to just rewrite.
I've been tangentially involved in a poor leadership situation where basically an entire department was forced out by a new leader, taking all their domain-specific knowledge with them. Then the consultant friends who were brought in bled the company dry, killing it in a race between consultant expenses and smelly code becoming unusable. On paper the company died from the financial load of switching completely to outsourcing ("We're not a software company, so we will not have developer employees anymore ... but we will have twice as many consultants working two thousand hours per year for five times the pay, temporarily").
I've never experienced the parallel development trap, or lack of test suite, those must be interesting.
Oh yeah, I've had a few projects killed by technical debt.
A lot of these are game-mod related. Since, hey, even the base game is a black box loaded with the technical debt of a team that could have been gone for decades. And the very tools and patches you're using on top of it were also built by people wanting things done the 'quick' and 'easy' way over the 'right' way, usually because they constructed the tool for their own project and designed it specifically for the environment they were working in, rather than anyone else's. So you'll often see such a project completely fall apart, because you don't understand all the code you're building on or how it interacts with everything else you pulled in.
But a few were actual work projects for clients. These fell apart because the following sequence of events occurred:
1. A coder was hired to work on the system and had a very different coding style to everyone else in the company. They thought they were being 'smart' but had overengineered the project by about a hundredfold.
2. They got sacked without telling anyone else how the project was constructed or why it was built that way. So developer B took over.
3. Developer B tried to 'rewrite' the system completely, but ended up merely creating a hodgepodge of his work and the other developer's work that ended up being rather unstable.
4. More features were requested by the client (features the system wasn't designed for), so three more developers each added them independently. None of this work was commented, documented anywhere, or stored in version control; they bolted the extras on, tested just that one part of the system, and claimed it worked fine.
5. Project ran into large numbers of bugs, often ones which crashed the system or took down the database for a while. Multiple times a day, whatever developer was free would have to apply patches to whatever random thing stopped working in the last few hours.
6. Everyone ended up complaining that the system didn't work. Or that it should be rewritten. Or that it wasn't what the client 'wanted' at all despite the latter having changed their plans three times this week.
Either way, what should have been simple websites turned into giant unwieldy messes that no one developer understood the full design of. Which sat in endless limbo while developers ran around trying to patch up problems caused by no one having a coherent plan for the whole project.
Worked on a product that was written in a legacy technology and built up over two decades. The company does not want to invest further in the product; it just wants to support existing clients, who have done a lot of customization on top of the existing codebase. New players have come into the market and eaten away the market share of the category the product competes in. It's a slow death for the product now. Management feels it is better to buy a new product built by another company than to build the same product in-house with no guarantee of returns.
Several start-ups folded on me; perhaps this was the most flagrant technical debt among them. Octek [then Octek-Foxboro, then shuttered] was state-of-the-art machine vision in 1980, but stuck with PDP-11 bus+boards in a box instead of investing in microprocessor single-board systems, and stuck with an oddball interpreted language, "Magic-L". By 1986 they got run over by competitors with much lower price points and higher inspection rates. I could still make the boxes do the job, but it was boutique stuff by then.
Not sure if it can be classified as technical debt, but I worked at a place that had a very clever and prodigiously productive architect. As the company grew, some perks available to developers got cut, to the point that he decided to move on.
From that point on, the architecture froze. It was still possible to keep building new features on top of it, but I am sure that architect would have been able to provide both guidance and the new solutions I feel are needed. No one else picked up his tasks or his vision, nor was the architect position ever filled.
Sort-of. I worked on a product that was a rewrite of an older product, and suffered second system syndrome.
The problem was that the original programmers didn't understand how to program with a database, and management was unwilling to address the core design flaws.
As a result, upper management told us we missed our market window and the project was killed. In reality, it was the technical debt from not understanding how to correctly write a data access layer that made us move too slowly to meet our market window.
I recently left a company whose software product may die due to a misdiagnosis of tech debt. They are in the midst of an unnecessary rewrite, with nothing to really show for it after almost a year.
The previous project had some issues, but they were fixable with refactoring while continuing feature development. The rewrite was done over my objections. New management had been hired, and they were looking to make their mark. I left before it could all come crashing down.
Definitely have, 10+ years ago. The non-technical partners (the majority) thought no refactoring was needed until it was almost too late. The company still exists but had to pivot quite a bit.
More recently, another one is being killed by tech debt, but the tech is so old that only a rewrite can really save it, and the income from it is just not high enough to warrant that.
(the former was written by me at first before the company really took off and the latter was an acquisition)
Killed - no. But I worked on several products that weren't able to move fast enough because of it, and lost money as a result.
One of those products was released half a year late and turned out to be a poor market fit. The company closed several months later. It could've used this half a year to complete a pivot with another product, which could have been successful.
If it was released late, that sounds more like a case of not having enough technical debt. If they'd kicked the can a bit further down the road, maybe they could have released sooner, realised it didn't fit the market, and cut their losses.
No, it was just such awful technical debt that it caught up with the project way too quickly. I was brought in as lead developer to finally get it done after the initial developers missed all the deadlines; their own code quality was their biggest obstacle.
Yes. As a solopreneur, I had a moderately successful side project. When the app fell over, it took me two weeks to get it back online due to compounded tech debt. As a daily service, I lost most of my users when the app didn't work for 14 days. The app never recovered, and ultimately I sold it. One more vote against doing everything yourself.
My first company was consumed by technical debt. We all bickered about refactoring the database schema and improving the code, but investors wanted more "rooftops" (customers) now.
The company IP and employees were absorbed by the highest bidder and the company lost its funding after the CEO and CTO left.
In my experience tech debt doesn't kill codebases so much as make estimating impossible and therefore create tension between execs and engineers. It of course also makes working in the code suck.
Typically these projects are able to limp along until some other forces kill the company.
I have many times taken over projects that were so deep in technical debt that only extreme surgery would save it from being killed. It can take a long time to turn those projects around and make them healthy again but it is possible.
The first step for me is always to add solid tests. On one project (for example) it took 8000+ test cases to make it solid, maintainable, and bug free in production.
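Those first tests are usually characterization ("golden master") tests: record what the inherited code does today, warts and all, and refactor against that. A minimal sketch of the idea, with invented names:

    // Characterization-test sketch (invented names): pin down current
    // behavior and assert that refactored versions still match. The
    // point is to capture what the code does, not what it "should" do.
    function legacyLabel(qty: number, name: string): string {
      // stand-in for the inherited logic, naive pluralization included
      return qty + "x " + name.toUpperCase() + (qty > 1 ? "s" : "");
    }

    // Outputs recorded from the current implementation, bug and all.
    const golden: Array<[number, string, string]> = [
      [1, "box", "1x BOX"],
      [2, "box", "2x BOXs"], // yes, "BOXs": pinned until we decide to fix it
    ];

    golden.forEach(([qty, name, expected]) => {
      console.assert(legacyLabel(qty, name) === expected,
        `behavior drifted for (${qty}, "${name}")`);
    });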
I've worked on a product that might have looked like that. In truth it wasn't "technical debt" as such; it was bad decisions by the chief technical architect, decisions he stood by the whole time.
Almost all of them. It becomes cheaper to buy someone else's product than to continue development, because the bought product has its costs spread over all of its buyers, while the NIH product's costs fall on you alone.
I guess you could say this is a form of technical debt, or technical stubbornness.
I used to work for a company that specialized in solar panel data. I was hired to help the company catch up on their client queue (clients waiting for the product to be set up)... when I first started they were behind by about 140 clients, and I eventually got the queue down to about 30 remaining.
So our company would sell solar panels to corporations, who would then put kiosks in the lobbies of their buildings so their clients could come take a look at the energy savings and other information about the building. It was a very popular way of doing things, especially in cities like New York, Chicago, and Los Angeles. My job was actually designing the graphics displayed on those kiosks and then syncing the data with the graphics.
There was a platform that did the work, but I still had to design the graphics and type in all the serial and model numbers and so on to sync everything together and make it show in chart form in a gorgeous view. Anyway, for years, it was all done in Flash. At one time our company's technology was the best, even having received several awards. But over the years HTML5, JSON, and other technologies became better and faster; they could read data from a database just as our system could, and their turnaround time for production was faster. It took about a week for us to develop, set up, and sync everything from system to database, while these other companies could get everything done in days. I think a huge part of the problem was that my company had these corporations interested but forgot to hire enough people to actually do the work.
We did have an in-house software developer working on improving this technology and upgrading the software, but the company seemed very reluctant to adopt the new software quickly. I had worked in the new software for a few days when, all of a sudden, they called me into an office and laid me off. A few months later everyone else followed, and today I think the company only holds on to 2 or 3 employees who just maintain the kiosks and software because of the existing contracts. Or maybe they went under; their website doesn't even work anymore.
Why did they go under? We were warned that the technology was changing fast and that we needed to do something about it... it was just the reluctance of the CEO to adopt the new technology and push it out faster. Our competition slaughtered us. Luckily, it was my second job, so I worked out a great deal with them on letting me go... I got about a month of vacation time on the condition that I wouldn't file for unemployment. Obviously, already having another job, it worked out in my favor.
I'm currently working on a project that hasn't died yet, but has gone through a lot of pain in the last couple years partly because of technical debt.
I ran the project for the first four years. It was my first large project and the first time I had run a team of any size. I made a few mistakes that might be worth learning from, but my mistakes weren't the only ones responsible for the debt.
The largest driver of the technical debt issue was the timelines. My boss was new to software development, and his expectations were not in line with reality. We frequently had to ship ad hoc features under tight deadlines to keep him happy. When I pushed back on the timelines, he became unhappy. So we acquired debt in the form of 1) hastily designed sections of code that don't lend themselves to scalability or refactoring, and 2) a lot of small code-smell issues that by themselves don't amount to much, but in the aggregate form a kind of surface scum that makes future development, and more importantly testing, difficult and brittle.
The debt, and the accompanying bugs and delays it caused, eventually led to enough dissatisfaction that a new manager was hired. We disagreed over strategy, so I was taken off the project (I am in a weird outside position now, sort of half on, half off: supporting a tangent of the software, but not working on the main branch or involved in the architecture discussions, planning, or execution of the new replacement).
He wanted to start over from scratch, which is essentially what they are doing now. The plan is to pattern the new solution off the old one by incorporating all the business rules (which they want me to document), but they have very different implementations in mind. They don't want to reuse existing libraries, partly out of a lack of familiarity with those libraries: how they work, and why they approached the problem the way they did.
They face some significant challenges:
1) Their coding velocity is slow. More than 60% of the original team has left and been replaced, so a large part of the team hasn't been on the project for more than 2-3 months. This means there is a significant dearth of institutional knowledge. I think a measured pace is good for software development, but they aren't starting from scratch, and there are a lot of expectations to meet.
2) Because of that dearth of institutional knowledge, when they do start implementation they are going to end up repeating many of the mistakes made by the old team. I have limited insight into what those are, but based on what exposure I do have, I can see planned missteps already.
3) My boss, the owner of the company, has learned some patience, but they've been at the rewrite for nearly 6 months with little to show for it. In terms of feature parity with the old software, they are severely lacking. I don't expect he is going to be willing to wait another full year to get an app that does essentially what he already has, only differently, even if the architecture is more flexible.
In all honesty, I hope they succeed. My boss is a good friend and this experience hasn't ruined that. I have and am dealing with some resentments, but I don't want him to fail. And because this project was my baby (so to speak) there is a part of me that wants it to live.
----------------------
In retrospect, I'm not entirely sure what I should have done differently. Had I ignored the requests for ad hoc features, I likely wouldn't have made it as far as I did, because without those features he would have pulled the plug. The company grew at an exponential rate in the first 2 to 3 years of the project, and part of what fueled that growth was the ad hoc, fast turnaround of my dev team. We were incurring debt, but we were also making big gains.
If I was going to do this all over again here is what I would do:
1) I would push back more, in smaller increments, placing more emphasis on getting the details right before shipping.
2) I would push back more on requirement creep. My boss would frequently have these "great" ideas that he would insist we work on, which would get half done and then discarded, leaving the code base littered with dead ends that needed cleaning up later. Part of the debt we acquired stemmed from the fact that, in order to keep those activities from impacting ongoing efforts, I built an architecture that was somewhat disjointed. The lots-of-little-islands approach meant we had apps that were reinventing the wheel, taking different approaches, etc.
3) Place a greater emphasis on automated testing (fewer human testers, more test engineers).
The other aspect to this is that when you are creating software to solve problems that don't currently have software solutions, you spend a fair amount of time going down trails that don't pan out. The R&D aspect left us with bits and pieces of code in the code base that were incomplete or incompatible, but that one app or another relied upon.
We lost time discovering that certain approaches didn't work.
I think that's it... I hope someone finds the story useful.
I did a Flex/AIR project a few years ago. At the time, it was a customer requirement. They had fired their previous provider, because their business model was to charge a subscription fee for the server-based product.
All their "media" was in flash format; so they made it a requirement, even though they could have saved themselves many tens of thousands of dollars in hardware costs by going to a mobile platform. (they did not have reliable access to networking in most of their deployments, and didn't want to bear the costs of rolling their own with cellular).
So I wrote the app in an "AIR isolated thick client" mode.
There were many problems with garbage collection and application freezes that could have been addressed had they funded a migration to post-Adobe Flex (Apache Flex). But they ran out of funds for that.
Their platform was Win 7 laptops. The first hardware iteration was fine. The next year, when they started replacing the laptops, they hit a driver bug that caused the whole screen to freeze. With the next hardware iteration they were able to fix that, but they started having problems with AIR trying to download an update when the machines were not connected to a network (i.e. deployed in the field). AIR behaved badly in this situation and refused to run.
When they moved to Windows 10, Adobe hadn't updated the AIR player yet, so they had another botched deployment.
It was in our contract to hand over the source code, so they tried to hire their own developer. I spent many hours very thoroughly documenting the code, but they had no hours budgeted for ongoing support. Judging by some of the desperate emails I was getting last year, before my Program Manager told them to fuck off, my documentation was not enough. Of course: the dev environment setup was Win 7, Eclipse 3.x with the Adobe plugin. I maintain a VM of the dev environment, but I doubt even I could follow my own directions to set it up anymore, since the Adobe SDK of that version is so difficult to locate.
Had they listened to my original recommendation to write the entire application in Java, they would still be running fine.
That was my most recent experience with "death by technical debt".
Many years ago, a product died simply because a competitor bought our company and tried to sell both products (theirs was a "consumer market" product, ours was "enterprise market"). Over the next 18 months they cut development on the enterprise product and tried to tart up the consumer product to meet the needs of our enterprise customers. I was a major account manager, so I watched one by one as our frustrated customers dumped everything our company sold and went to the other competitor. So that product wasn't so much killed by technical debt as it was killed by moron MBAs.
In the end, all of those products are obsolete, because nobody uses dedicated backup tape library software anymore. It became a very small market because tape backup hardware never got commoditized and prices just never came down from "insane". Even blank tapes were more expensive than a removable hard drive. Poor people just back up to the cloud, pay rent to someone else for their own data, and end up losing it.
I can probably think of about a dozen other examples from my long and miserable career in software.
Yes. I've been on projects that I believe would have succeeded if we'd approached them differently, with less debt. One was due to magpie syndrome - influential architects who enjoyed working with bleeding edge technology and convinced management not to listen to the more conservative recommendations of the developers who would be doing the majority of the work. The software was a wreck, and morale dropped (and the original architects left the team). I think the project would have worked with a better approach, but after a change in upper management, they decided to end the project completely.
Another was a rewrite of an application that had been very popular but was written by someone who was actively learning on his own (a talented developer who later learned to write very clean and well-organized code). It was, as I like to call it, "superglued" to the server. SQL statements that could have been handled with a join were instead handled by querying all the ids, running through them in a loop, building a new query, getting results, and stashing them in an array, one by one. When those queries took too long, they were run in the background, or they just crashed the system. Changes were all made in place, on the prod system. There was no build, no archive, no nothing (this was in the early 2000s; even back then this was a no-no, but it wasn't quite as shockingly unusual as it is now).

I was part of a team that tried to rewrite it, but halfway through the organization got frustrated with the time and expense and terminated the project.
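For anyone who hasn't seen it, that loop-over-ids pattern (the classic "N+1 queries" problem) versus a single join looks roughly like this. A sketch with an invented query helper, not the actual code:

    // `query` is an invented helper standing in for whatever DB client
    // the app used; only the shape of the two approaches matters here.
    declare function query(sql: string, params?: unknown[]): Promise<any[]>;

    // The "superglued" approach: one query for the ids, then one more
    // query per id, results stashed in an array one by one. That is
    // N+1 round trips to the database.
    async function ordersPerCustomerSlow(): Promise<any[]> {
      const results: any[] = [];
      const customers = await query("SELECT id FROM customers");
      for (const c of customers) {
        const orders = await query(
          "SELECT * FROM orders WHERE customer_id = ?", [c.id]);
        results.push({ customerId: c.id, orders });
      }
      return results;
    }

    // The same question answered with one join, in a single round trip.
    async function ordersPerCustomerFast(): Promise<any[]> {
      return query(
        "SELECT c.id AS customer_id, o.* " +
        "FROM customers c LEFT JOIN orders o ON o.customer_id = c.id");
    }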
Interestingly, I think there's almost always a psychological or political factor in play when a project is killed purely by "Technical debt". In theory, debt itself shouldn't really play much of a role in whether a project is worth pursuing, because it's all sunk cost. If a project has a positive enough ROI that it would be worth pursuing as a greenfield project, then even the worst case scenario, trash everything and start over, is a net win.
However, I think what happens is that some people never wanted to see the project happen in the first place, or people get very frustrated with the expense, or people start to suspect that the failure was inevitable and that the technical problems are just a distraction. Think Jurassic Park (which in many ways is a story of a software project failure, a theme that is much stronger in the book). At the end, during the "post mortem", some of the characters think the problem was in the approach: that with a bigger budget, less dependency on a few key people, and less corner-cutting driven by a lowball bid from a software developer (enforced through threats to reputation and lawsuits), it all would have worked. On the other hand, you have the chaotician, who insists from the start that a project like this will fail inevitably.
My guess is that if it was ever worth pursuing, it is always worth pursuing. The problem is, we can never really tell. Sometimes failure due to technical debt is taken as a sign that this failure was inevitable. Other times, it isn't.
It's a good question, because all software has "technical debt". Which means "technical debt" is not a real thing, but a concept invented out of a limited understanding of the problem. Instead, there is always a tension between a simple solution and a more flexible one, and there is always a way to make things simpler, or more flexible, or even more complex and less flexible. For many problems the key to keeping both simplicity and flexibility is to decouple everything as much as possible: share nothing, functions rather than objects with methods, scalar arguments rather than complex structures, libraries rather than frameworks, etc. And there is still a lot of room to do better, like DSLs and metaprogramming.
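To make that concrete, a small invented sketch contrasting the two ends of that spectrum: the same computation as a stateful object versus plain functions over scalar arguments.

    // Coupled: the logic lives inside a structure it must be handed.
    class Invoice {
      constructor(public items: { price: number; qty: number }[],
                  public taxRate: number) {}
      total(): number {
        const subtotal = this.items.reduce((s, i) => s + i.price * i.qty, 0);
        return subtotal * (1 + this.taxRate);
      }
    }

    // Decoupled: each piece takes only the scalars it needs, so it is
    // trivial to test, reuse, or replace without dragging the rest along.
    const lineTotal = (price: number, qty: number): number => price * qty;
    const withTax = (amount: number, taxRate: number): number =>
      amount * (1 + taxRate);

    // Usage: compose the small pieces.
    const subtotal = lineTotal(9.99, 3) + lineTotal(4.5, 1);
    console.log(withTax(subtotal, 0.08));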
I disagree with your definition of technical debt. Decoupling everything as much as possible may be good (or it may overcomplicate the problem), but technical debt occurs when you don't do things like that (whatever you define as the "proper way") for the sake of expedience or from lack of experience.
All software has some technical debt but you can have more or less depending on how much effort you or your organization takes in reducing it or avoiding it from the start.
> but technical debt occurs when you don't do things like that
When you don't do things like that, you may or may not simply weaken your flexibility and/or simplicity. That doesn't mean it's a bad thing or a debt, because you have no idea whether you'll need any of that later. Long-term software projects, for example, might benefit far more from DSLs than from solving those problems at a lower level with all the decoupling; is skipping the DSL a debt, then? Absolutely unclear. Because it's all about productivity and risks and all the non-measurable things.