There are multiple cases of "killed by technical debt".
There's the case of mysterious and unsolvable breakage. The product simply stops working, and the team is unable to get it working again, period. This can happen with really ancient legacy products where the original team is gone, or young products that are written badly by inadequate teams.
There's the case of unpleasantness. A product is so difficult and slow to work on that the company simply loses interest in it, and shuts it down rather than suffering through more maintenance. This does not happen with products that are highly successful business-wise, no matter how bad the suffering, so it's really a business failure rather than a technical one.
There's real antiquation. The product is dependent on a product from an outside vendor that is no longer available or maintained. I've dealt with this on a mainframe replacement, and it was horrible. I've also dealt with it in Java, and it was plenty painful there too.
And finally, there's replacement. A product is replaced (or intended to be replaced) by a new product that does more or less the same thing, only this time with a smart new team, in a hip new language, and by the gods, this time it's not going to be stupid and suck like that piece of crap the morons on the old team built! Most of these projects fail before they ever replace the old, working code, so I'm not sure this counts as technical debt failure.
> This does not happen with products that are highly successful business-wise, no matter how bad the suffering, so it's really a business failure rather than a technical one.
One thing often feeds on the other. Because the system is hard to change, it does not get necessary features. Because it does not have necessary features, it provides less business value. Because it provides less business value, there is less of a budget for improving it. And so on.
Some of the worst examples of that are when a project uses a custom build of a library whose modified source no longer exists, with no record of the changes either.
Reintegrating future upstream changes is also made nearly impossible as a result; our tech lead made a change to numpy a few years ago, didn't manage to get it accepted by the project, and we're stuck with that version until the sun burns out.
If there are changes in future numpy versions we want, it's up to us to backport them, which is nowhere near our core business.
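To give a flavor of what that looks like in practice - not our actual setup, just a hedged sketch with a hypothetical internal URL and tag - the fork ends up pinned in the build forever:

```
# requirements.txt - hedged sketch; URL and tag are made up for illustration
numpy @ git+https://git.internal.example/forks/numpy.git@our-2017-patch
# everything built against the fork's ABI gets frozen along with it:
scipy==1.3.3
```

Every dependency upgrade now starts with the question "does this still work against our frozen fork?", which is exactly the tax described above.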
There's a lot to be said for standardization and 'boring.'
Well, you could estimate the impact of backing out of the changes on the application side, with the upside being continued savings in operational complexity. Or you could address the operational complexity with processes or tooling - which would be easiest with something like a developer OS image.
Yep, seen that. You might be thinking of tweaked builds of open-source components, but before package managers were so common, I also saw internal projects with a `/lib` folder full of artefacts like `MiscDbUtils.dll` - internal "useful" utility functions that are widely used.
Now add in a script that updates this artefact to the latest version, a breaking change lands, it all goes wrong, and it's hard to find the correct previous version of the binary artefact to build your code against any more. Especially if the build has been broken for a while because the project was on the back-burner, and it quietly dies when no-one is looking.
Builds that involve non-version-controlled files that exist only on a certain developer's machine, and because that developer is a control freak (or overworked), he refuses to automate or put those files in version control...
I believe this is called "Job security." I've seen it a few times, including a dev deleting the source code repo and substituting all copies of a set of scripts he was responsible for with compiled binaries. This was discovered due to a platform incompatibility between one of the hosts running the script and the binary wrapper. Data was then restored from backups and the dev was summarily let go.
I should add here that the "antiquation" case is the one that has caused the most grief I've observed in my career. The forces causing failure come from outside the code/business (dependence on an outside vendor), and sometimes collide with the forward momentum of other parts of the code (that graph library will never, ever work with Java 7, to name an example). These become life-or-death situations, and the tendrils of the product dependency are often deeply integrated. It might be easier to rewrite than to fix.
Also, this case can impact not just products, but organizations. You can still find teams dependent on an antique commercial version control system or IDE that greatly slows down or even stops work. I've tech-led jumps to new version control systems a few times, and it's always riddled with anxiety, strain, and management angst. (And it always makes the team far happier and more productive!)
Yeah, that. But even things like CVS, modern and hip in 1998, are still floating around 20 years later.
I actually thought Subversion would be the last version control system, when it came out. Of course, now it's git. Maybe someday we'll get something better, and git will look decrepit.
I have little doubt that git will be replaced eventually (or perhaps severely modified). It seems like a fad to me. It is indeed very powerful, but its UI is sheer insanity. At least SVN is very straightforward to use and understand. It just doesn't offer the distributed nature that git does, as it relies on a centralized server.
The cool thing about Git is that the underpinnings ultimately boil down to a key-value data store.
I'm not an expert on these inner workings, but in theory there's nothing stopping someone from creating a new UI that maintains most or all of the same strengths - except that everyone already uses, and is used to, the current way.
I suspect if you came out with "SuperVCS" that was ultimately just a new UI on Git you'd have more success than releasing the exact same project as some kind of Git enhancement.
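To make that concrete, here's a toy sketch (Python; not git's actual code, just the same core idea) of a content-addressed key-value store. Branches and tags are then nothing more than names pointing at keys:

```python
import hashlib

class TinyObjectStore:
    """Toy content-addressed store - the core idea under git's plumbing."""
    def __init__(self):
        self.objects = {}  # sha1 hex digest -> raw bytes
        self.refs = {}     # human-friendly names ("master") -> digests

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()  # the content IS the address
        self.objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]

store = TinyObjectStore()
key = store.put(b"hello, world")
store.refs["master"] = key            # a "branch" is just a movable name
assert store.get(store.refs["master"]) == b"hello, world"
```

Any number of alternative UIs could sit on top of a store like that without losing the underlying strengths.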
What UI? Using git on the command line is exactly the same as using SVN on the command line. At least for basic, every day things like status, add, commit etc.
> I've tech-led jumps to new version control systems a few times, and it's always riddled with anxiety, strain, and management angst. (And it always makes the team far happier and more productive!)
In the cases of this I've seen, it's always been because management and team priorities were unaligned.
Management in those places cared about a minimum level of productivity and minimizing risk.
Teams cared about maximizing productivity and their work days not sucking.
As long as teams kept managing to soldier through... rarely saw things change in those shops.
mysterious and unsolvable breakage: Helping another startup work through one now. It's a case of reclaiming functionality from a mystery outsourced codebase (without source control) meets inexperienced developers who try their hand at sysadmin plus a 100% rotated bevy of actors (the whole team, PM and all, have jumped ship), no documentation and no technical oversight. Offshore outsourcing adds cultural fun.
unpleasantness: I would expand this to unpleasant or incomprehensible. I have seen projects be de-resourced because of lack of management comprehension when they literally paved the best and most rapid path to profit (later taken successfully by the now-dominant competition).
antiquation: The best example of this I've seen was a hardware product an employer was developing as a joint venture in Taiwan early in my career. Engineers had made the decision to use a sucky chipset from a struggling company to save money, but the supplier went under and the API froze (bugs, missing functionality and all) before our product development could complete. The target feature set was literally impossible to implement on the hardware and nobody wanted ownership. Many millions of USD, wasted.
replacement: It can work out, just infrequently. Generally when it works it's a smaller system with well defined interfaces.
Would these be cases where Robert Martin's 'The Clean Architecture' [1] would help, where the core enterprise logic is separated from third party dependencies, making the latter easy to swap out and replace?
I'd imagine a number of these cases are caused by a heavy reliance on third party technologies that are no longer supported, or very few people still understand.
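Roughly the kind of seam I mean - a minimal sketch with made-up names, where the core logic only sees an interface it owns and the vendor-specific bits live behind one replaceable adapter:

```python
from abc import ABC, abstractmethod

class DocumentStore(ABC):
    """Interface owned by the core; business logic sees only this."""
    @abstractmethod
    def save(self, doc_id: str, body: bytes) -> None: ...
    @abstractmethod
    def load(self, doc_id: str) -> bytes: ...

class InMemoryStore(DocumentStore):
    """Stand-in adapter; a real one would wrap the vendor's client."""
    def __init__(self):
        self._data: dict[str, bytes] = {}
    def save(self, doc_id: str, body: bytes) -> None:
        self._data[doc_id] = body
    def load(self, doc_id: str) -> bytes:
        return self._data[doc_id]

def archive_report(store: DocumentStore, report: bytes) -> str:
    """Core logic: never names the vendor, so the adapter is swappable."""
    doc_id = "report-001"
    store.save(doc_id, report)
    return doc_id

print(archive_report(InMemoryStore(), b"<report/>"))
```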
It's a great idea but in reality most third-party tools are going to work slightly differently and the abstractions will leak (unless they were developed against an existing interface, in which case you don't need to create the wrappers yourself anyway).
I think the optimal route is to not bother with (extra) abstractions and interfaces, but to try to avoid using things that are unreasonably tied to a vendor - unless you save a lot of time.
If the code base is not a pile of dung anyway, the cost of find/replacing and refactoring obsolete or replaced APIs once is much smaller than the running cost of maintaining an extra layer of leaky abstractions for many years.
It is guaranteed that the abstraction will not work without a lot of changes anyway, and what typically takes the most time is the regression testing.
"They did it that way because they where stupid" is a ridiculously common assumption, when the correct answer is often "They did it that way because they knew stuff I don't know".
Well, I think one has to be alert to both possibilities.
This is perhaps a subtle and underappreciated reason that code quality is so important. If you're looking at an obviously well-written piece of code and you see something you don't understand, you can figure it's probably there for a good reason. If the code has visible sloppiness, it's much more difficult to tease apart the good parts from the bad.
There's obviously a lot of stuff I don't know then. Like the benefits of copy pasted code, or 300 column lines, or implementing the logic in 20 places when it has existed in the standard library for a decade. If only the ancient sage I inherited this code base from had left notes to guide me on this path of wisdom.
300 column lines: Support for management buying everyone nice, new, giant monitors.
Logic in 20 places: But what if we want subtle differences between each implementation?
On a more serious note, a product that I've worked on was started in about 1998 in C++. We support something like 15 different platforms, and we've got our own implementations of things like vectors because we needed a least common denominator codebase; the standard libraries of a lot of platforms didn't provide what we needed, or provided implementations that were incompatible with other platforms. By the time everything we needed to support was modern enough (in about 2010), the system had a few million lines of code, and replacing things with library functions/classes would've been a nightmare. New development is saner, but the legacy stuff is entrenched.
Development of that particular product moved to China and India last year, so I don't have a part in its development anymore, just build+release, because there are some legal benefits to releasing it from this country.
On the plus side, there are only a few platforms that they still have to support gcc 3.x on, and all the ones that ran on 2.x are out of support (until a customer holds a few million dollars in management's face, as happened a few weeks ago with AIX 5.1).
I've been there a few times, and the worst thing is when, every five or ten WTFs, there's something that seems as ill-thought-out as the previous couple of similar blocks, except this time it actually makes sense, as it implements (awkwardly, of course) some important corner case.
Well, the correct answer is often also "They did it that way because it was a reasonable choice then, and they didn't have the benefit of hindsight that I have now."
It's also possible that whatever constraint they were working around no longer exists. Either way, it's a case of not tearing down a fence before you know why it was put up.
G.K. Chesterton, 1929:
>In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”
"They did it that way because they knew stuff I don't know".
Or "They did what they did because they were the first one's to do it, and it only looks whack in hindsight"
Or "They did what they did because they were living within totally different constraints - like having to support old crap browsers like ie9, or a 'lowest common denominator' of slow end-user PCs etc., or some old chipset on the firmware code they wrote, or some old language paradigm, old libraries etc."
They did something in a horrible way, knowing it was horrible, because they had been asked to deliver a feature as soon as possible and at any cost.
This is a slippery slope. It lets a company move faster till it reaches the point that the software becomes an unmaintainable pile of hacks.
I worked for a company that grew considerably for 10 years and then lost its biggest client and folded quickly. We had spent a few years reworking our platform in a way that might have been successful enough to weather the storm of losing that client, but technical debt really slowed us down.
Technical debt may not have killed the company directly, but we have to wonder how we might have done if we could have spent more of our time on new development.
This is revenue diversification not technical debt. Companies with a single customer funding the business should be actively pursuing a high priority strategy to reduce this risk.
You misunderstand - our software itself was full of technical debt. We spent a lot of time dealing with the consequences of that debt, and I'm wondering if we'd have been able to hit vital targets sooner without it and possibly have survived.
The project wasn't killed specifically because "you have technical debt". It was killed because there was no way for anyone to be effective with the combination of poor, undocumented code and a locked-down environment.
"We need to change the email message that goes out when someone registers". This took a team of (4?) people 5 calendar days to change. As a contractor, I had to vpn in to one system, then remote desktop over another vpn to another system. Building web apps, these dev systems were not allowed to talk to the internet at all, so things like pulling external dependencies (security libraries, templating libraries, etc) was impossible - pretty much everything was handrolled, largely due to this restriction.
The last big killer was that the system was not passing accessibility audits. Trying to determine where to make a change to any single element would take minutes to hours, vs seconds to minutes you'd normally expect. Much of the 'templates' used were the result of a SQL statement joining 12 tables (html_meta, html_form, html_link, html_grid, etc) and complex concat()s, so adding a page or making a change might take an hour to track down the appropriate collection of tables, then figure out a SQL script to run, then send it to the person who had permissions to make updates to the SQL, then wait and see.
Did the technical debt itself kill the project? Technically no, but the inability to do anything productive in a reasonable amount of time forced the project to shut down.
This is a great example of how technical debt 'kills'. It's not a murder, it's negligence and a slow demise.
I went through one of these projects. The tech debt was never as bad as you describe, but it was a small company operating on a short runway. It also taught me an unfortunate lesson about non-technical founders and the dangers of outsourced code.
The MVP for the company had been bought off the shelf. It worked fine, but the code was abstruse and utterly resistant to change. As the price (in time and dollars) of change requests grew, they sensibly in-housed development. Unfortunately, their clients had some idea what to expect in terms of features per day and dollar. Requests like "let us use our logo and custom color scheme" turned out to be serious challenges since every color and style decision was clumsily hardcoded, so we took far too long to achieve them.
Ultimately, we ended up a contract behind - bringing in business to fund delivering on the previous request. Most startups operate under the gun like that (with either fundraising or contracts), but they start there and labor to escape. We started solvent, and had no clear plan to break out of tech debt - a rebuild would have been too slow, 'working smarter' wasn't viable, and expanding the tech team would have come too late and too costly.
So, we died. Not because we couldn't do work, but because we couldn't do it at a competitive speed.
Seen this a lot. A lot of companies think they are "product" companies, but due to their unwillingness to push back on customers, they become custom engineering shops, bolting on little one-off mods to their product over and over to appease bad customers (or to appease POTENTIAL customers who haven't even bought the product yet).
Stop me when you recognize this one: "Hey your product is great, but we really want something that does [totally different thing]. If you just add that thing, we will pay for all the NRE and you can sell it to others as part of your product! Win win!" Advice to junior developers: If you hear such talk in the hallway, RUN!
We're an enterprise software shop, which necessarily means we do a lot of custom work, but we're careful to consider what we'll do. My mentor is an old hand who's been through multiple exits, and in every meeting we have, he hammers this point: you're either a product shop or a professional services shop, and if you don't know which one you are (or you believe wrongly), you die. Simple as that. The deeper you get into the consequences of knowing (or even forcing) which you are, the more implications it has for everything from product design to business strategy, and it's extraordinary how much of the company such a simple-seeming thing affects.
Companies that think they are a product shop but chase enterprise customers and do professional services often fail to charge appropriately for their services. Enterprise-level customers not only require more features, more guarantees, and more support - they require more attention. Are you including sales time and the expense of chasing them to get a contract, as well as support resources, in your CAC? Are you appropriately accounting for all the added expenses (and future expenses, including lost opportunities)? If not, you're probably losing your tail.
P.S. "Are you" is not directed to the OP but to the business owners/leaders that don't know what they are doing.
Yes, this. One of the big things I talk about with sales is the difference between changing a priority for a customer vs. adding distinct new things. As a recent example, a client wants better and faster feedback on the trial they're conducting (we're in the med-tech space), and we've already got a new dashboard designed and on our product roadmap. I'm more than happy to prioritize that over other product pieces if it'll get us the contract, because we're already going to do it, we're only changing the 'when'.
On the other hand, when they ask for something off the roadmap, we get into more complex issues (is this market-demand data, or custom work?). Particularly for grunt-level custom work (say, adding support for tracking data on a niche wearable device that we don't currently support), there are a lot more questions that follow.
One of the most insidious of the latter, IMO, is that if it's just for one contract, then we're either hiring contractors/outsourcers (expensive, high management overhead), hiring new engineers (risky to grow headcount on a whim), or redirecting resources to tasks that are likely to have both lower ROI and provide lower growth for the re-tasked engineer. At our small size, with our need for high-quality people, I consider this a real cost too.
Then we also get side tracked and lose focus. Leadership and management expend too much energy trying to figure out what to do. Then they want estimates from the developers so they can figure out an estimated ROI. But they rarely seem to worry about the true income potential, focusing mostly on just the initial development cost.
Pursue it? Don't pursue it? If we do, how will we? Will we be "hiring contractors/outsourcers (expensive, high management overhead), hiring new engineers (risky to grow headcount on a whim), or redirecting resources to tasks that are likely to have both lower ROI and provide lower growth for the re-tasked engineer"?
Then is it really surprising that this lack of focus and discipline trickles down to those doing the work and the work itself? Technical debt in the making. It starts at the top.
Absolutely. A brief story on tech debt from the top:
One of the more frustrating things I've experienced is when I got push-back for implementing more project management process (we have a very light process, but when I took over it was sticky-notes-on-the-desk level). The complaint was "we can't slow down development to do more process". Very through-the-looking-glass, as I, the Engineer, was arguing for more management process and Leadership wanted less.
But of course, accurate estimates were needed - just, you know, without making measurements. I implemented some process anyway. We actually increased development speed through less churn and lower communication overhead (consult the docs before breaking someone's flow), improved our estimates, and we've been able to better contain our tech debt.
> Very through-the-looking-glass, as I, the Engineer, was arguing for more management process and Leadership wanted less.
I suspect you could go a long way with the heuristic "If engineering asks for more process, always give it to them."
It's not flawless, but it's like hearing Ron Paul call for a new regulation - when a request is that out of character, you should usually suspect that there's some good motivation.
This is a frighteningly accurate description of the company I'm currently at. They spent many years chasing the enterprise-level customers at the cost of alienating their smaller team-level users, and never had an answer when requests for features would crop up from the larger accounts ('just get it done'). Now they're trying to pivot back to the team-level customers and are having a supremely difficult time dealing with the tech debt built up by addressing the enterprise-level concerns. We tout ourselves as a product shop when in reality we're trying to be both.
Sure, but the answer is pretty trivial: If you spend more than half your time on customization, you're a professional services shop.
He also added that if you're a product shop doing less than 70% off-the-shelf work, you're probably screwed, while 90% off-the-shelf is really the ideal (again, enterprise software).
I think the more interesting question is "what counts as professional services?" This gets much trickier: for example, when you start building out APIs to make second- or third-party integrations easier, is that "product" or "professional services"? It certainly seems like product building, but if you're doing it for one customer's use, it gets real blurry real fast. If you're not using that API internally, you're almost certainly on the professional services side. If you do use it internally, is it rock solid enough that you can support and expose it without that support becoming professional services?
Drawing sharp lines aside, this all probably seems kind of trivial, but the first time I ran through our product design with him and we discussed this, I went back and radically re-thought a lot of our strategy, particularly at the customer interfaces.
Ouch, that's almost word-for-word from the company that died of debt.
It was enterprise sales, so customization was unavoidable, but no one was differentiating between big and small changes, or big and small buyers. The product was desperately struggling to do ~3 things at once, and still being sold to potential buyers on the promise of a fourth thing it would do "soon".
Enterprise customers which require enterprise sales require enterprise pricing. If appropriate enterprise pricing is not in place then you risk an enterprise failure.
I think every one of my former employers who have failed, did so by doing those 'customs.'
The last one even spun off a dedicated team that built (hacked) prototype customs in order to secure sales, then threw away the prototype and, after collecting the commission, told the new customers that it would take several years to get what they had just seen into production - but in the meantime we could do our existing product with some mods.
I imagine the pressure to accept these deals is immense though. Why let an innocuous little feature request hold up such a great deal?
That was part of the problem: the sales people couldn't push back on most requests because they were often quite reasonable. When they were more demanding, it was usually from a large prospective buyer so we had to bend over backwards.
The result was that we had huge tasks to do with no (current) revenue, and small tasks to do that took 10x as long as they should have. Since servicing existing revenue streams (even on reasonable requests) became so time-consuming, handling big enterprise demands became totally untenable.
We needed a lot of things. Better (or more technical) sales was one. More mid-level engineers was another. Mostly, though, we just needed more time or money.
Our target market was very reluctant to move from a paper system to a software system, so there was a lot of foot-dragging and a lot of feature requests. That delay had just never been budgeted into schedules or runway.
And it was one line of code, after several hundred lines had been torn out and rearranged to ensure that different clients could insert their own pictures of different sizes without everything exploding. The whole team was desperately trying to force enough flexibility into the software that one-line changes could be made in <10 lines, instead of >100.
The last company I worked for did that to great success. All our customers got a custom version of our product tailored to their needs and their project. At least half our customers were doing something that needed at least one new feature we didn't currently have. If you build your business model around that, it is not necessarily a problematic model.
I'm working for a company like that, BUT they allowed me to completely rewrite 3 of the tools from scratch in a more modular fashion, so that I could do these things without having to modify the old code bases. There are still two other applications I have to support (written by a consulting company we no longer contract through). It's night and day. So this isn't really the worst thing if you're given the authority and power to take full control of an application, rebuild it, and take ownership of it. Of course, this doesn't really apply to junior devs.
This part in Rich Hickey's Simple Made Easy talk [1] had a lasting impression on me. It really drove home the point on how a build up of complexity (one of the most common forms of tech debt, and one of the hardest to avoid) can eventually "kill" a project in exactly the way you described, slowly and painfully:
"But I have all this speed. I'm agile. I'm fast. You know, this easy stuff is making my life good because I have a lot of speed."
What kind of runner can run as fast as they possibly can from the very start of a race?
[Audience reply: Sprinter]
Right, only somebody who runs really short races, okay?
But of course, we are programmers, and we are smarter than runners, apparently, because we know how to fix that problem, right?
We just fire the starting pistol every hundred yards and call it a new sprint.
...It's my contention, based on experience, that if you ignore complexity, you will slow down.
You will invariably slow down over the long haul.
...if you focus on ease, you will be able to go as fast as possible from the beginning of the race.
But no matter what technology you use, or sprints or firing pistols, or whatever, the complexity will eventually kill you.
It will kill you in a way that will make every sprint accomplish less.
Most sprints will be about completely redoing things you've already done.
And the net effect is you're not moving forward in any significant way.
This. It's not like there is a sign saying "Technical Debt Required to Proceed"...but rather the slow death from a thousand cuts to productivity caused by having to analyze every potential system, process, template, stored procedure, etc, etc...to make any stable(ish) change. Even if things are loosely coupled and not dependent on each other...you still have to go in and make those changes. Telling this to a room full of non-understanding management is a whole different challenge...
Templates stored across a database is probably the worst thing I've seen repeatedly across projects. Just because a database can store everything doesn't mean it has to.
Some people really seem(ed) to have an allergy to plain files for storage. A plain file with OS level caching will beat most (if not all) databases for static content. But doesn't sound as fancy, so it's probably harder to charge a lot of money for it.
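For what it's worth, the entire "template engine" could have been something like this (a hedged sketch with a hypothetical path; the OS page cache makes repeat reads essentially free):

```python
from pathlib import Path

TEMPLATE_DIR = Path("/srv/app/templates")  # hypothetical location

def load_template(name: str) -> str:
    """One plain file per template; repeat reads hit the OS page cache."""
    return (TEMPLATE_DIR / f"{name}.html").read_text(encoding="utf-8")

# Versioning, diffing, code review and deploys come free - they're
# just files in the repo, not rows scattered across a dozen tables.
```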
A template in one database table I can live with (pros and cons, multiple front-ends, etc.). One template broken up into 12 tables, requiring a 100+ line SQL statement with concat()s and HTML interspersed, is insane. Had there been an API or utilities to manage it, it might have been manageable, but nope - just "write some queries".
Also, just repeated your comment to a friend who said "that's the worst thing you've seen? can i have your job?" :)
(blown away by all the responses to my original question!!!)
Your story here makes me laugh, if only because of a very painfully familiar memory. Luckily this wasn't a big production system but rather an internal tool (that I guess clients did also use, but it wasn't part of 'production' per se). It was written entirely in Perl CGI, filled with cryptic regular expressions and complete spaghetti code. It would concatenate together entire webpages, with bits of them rendered by including the contents of files strewn all over the file system, and of course the logic to concatenate all the HTML together was spread across a fistful of files in disparate locations. In short, I was once asked to make a simple change to some HTML, and after 5 days of reading through Perl CGI and developing a pure hatred for Larry Wall, I decided to do a Java rewrite that took 3 days. I mean... crikey. Haha.
We have a similar application in PHP. By the time I've traced through all of the included files that are touched by a particular function, I've forgotten what I'm looking for. It's truly a nightmare.
Wait until you see a Turing complete DSL programming language stored line-by-line in rows in a database table and executed by pl/SQL using cursors, locking the entire execution to prevent concurrency.
I'm dealing with one internal project where this happens because there's an artificial IT/build distinction between "emergency" code push and "casual" raw database change.
This means lots of business-rule crap gets softcoded into the database or ini files (increasing complexity and bug-risk) just to support a hypothetical future where somebody needs it changed without a full sprint cycle.
And you aren't kidding about "repeatedly". Personally I associate it with the late 1990s/early oughts and ColdFusion; I think one of the early CF frameworks really encouraged it, and it kind of just stuck from there, particularly in Government web work. But it's probably wider than that...
This has been my experience. Since technical debt is hard to measure, it's more a case of a series of unwise technical decisions leading to a lack of productivity. Due to tight schedules, short-cuts are taken which lead to more unwise technical decisions, and you have a death-spiral.
It is, but the question is what "killed" by technical debt means. It's uncommon but not unheard of for code to reach the point of "we can't do that". Mostly, though, the proximate cause of death is a funding shortage or management decision to shutdown. Technical debt is just driving the cost overruns or inefficiencies that kill the project.
I don't disagree that in most situations the root cause is not engineering, and that it's usually an indication of a symptom, but attempting to change the meaning of the term itself is not a great approach for communicating that.
ACK - I FORGOT THE BEST BIT... (well, maybe not best, but...)
No one could install anything locally - everything had to be done on their locked down remote systems (some were Amazon remote desktops).
For the accessibility testing, the auditing company used JAWS. The company I was contracting to had one license (or so I was told) so I couldn't have one. We actually tried to install JAWS on an Amazon desktop, but it just crashed the entire virtual desktop, requiring re-imaging. That happened twice, so we gave up.
So, the proposed workflow was: I'd make a change, push code, and email someone to move that code to a system where an internal tester could look at it. I'd get an email back, then email the internal tester that the code was ready to look at. The internal tester would go to the screen(s) in question using JAWS, then "tell me what JAWS said". That would often take several hours or a day.
I was then supposed to make changes based on that feedback, then repeat the cycle until things were 'fixed', then we'd ask the auditing company for another test, which they'd schedule for 2 weeks in the future. Then we'd wait.
During the first iteration of this part, sr mgrs kept asking me "when will this be done?". I kept trying to explain that we didn't even know what "done" was - the auditing company just had blind folks that would use the system with JAWS enabled and if they felt it was usable, they'd say so, otherwise, they'd report back "hey, this isn't usable", and we'd have to start digging in again.
This account kind of confirms what I think about technical debt: most of the time, the problem is more likely a lack of documentation than anything else.
I don't see how a big project could be coded without containing things specific to that project. And even then, the architecture by itself is unique and deserves documentation.
It happened to me twice. The first time was in a start-up at the beginning of the century, we were developing an electronic health record and we had outsourced the database abstraction layer to a company in Greece. In the beginning things went fine but after a while the development of the DAL went slower and slower and it became unstable as well.
Eventually the word came out: the main developer of the DAL framework had left the company and, according to the Greek CEO, she had been 'too smart' which meant that nobody understood her code. They had tried adding features but that had made things only worse and the DAL had started to crash randomly.
We tried to take over the framework by ourselves but it was written in Eiffel and the code was a horrible entangled mess. Eventually we rewrote it in Java but, being a start-up, we lost too much precious time already and eventually went almost bankrupt and were bought up by a competitor.
The second time was in a small company whose product was a search engine for consumers. The web layer was written in a mixture of JSF, jQuery and Ajax. While that combination already slowed down development on the front end, the main problem was the performance of JSF on the server. Because JSF is rendered on the backend, it placed a massive load on our server for certain heavily used pages, and we just couldn't scale any further. Swapping JSF for a framework that rendered on the front end would have been the solution, but that was a massive refactor for which the company just didn't have enough resources. Eventually the company had to drop their search product and change their business model to a more community-based website.
> We tried to take over the framework by ourselves but it was written in Eiffel and the code was a horrible entangled mess. Eventually we rewrote it in Java but, being a start-up, we lost too much precious time already
I wonder, would the result be different if you had access to competent Eiffel developers? How large was the Eiffel codebase?
Eiffel is an interesting language, with a somewhat unique feature-set (I think only Ada is coming close). Design by contract and static typing as core language features - if used right - should greatly help with both stability and ease of refactoring.
How large the codebase was is an important question, also how bad it really was. I saw a similar story - external codebase getting worse and worse from some point on - with Clojure at the center. The code quality was quite ok for a couple of months, then it worsened. At that point and for a couple of following months the codebase was possible to save - a single competent Clojure programmer would make a difference, I think. The project was less than 10k LOC then. However, more than 1.5 years and 60k LOC later, doing anything became nearly impossible for anyone, including original authors.
You had a search engine and rendering the search results was the bottleneck?
That's really weird. I don't know a lot about JSF, but other templating languages are really almost never the bottleneck. Maybe if you have some giant table with thousands of cells, each with its own complicated template directive (for loops with conditionals, etc.).
This sounds less like technical debt, and more like liabilities of over engineering. Possibly feature creep.
That is, technical debt is not necessarily tangled over-engineered code. It is more compromises that were made to actually ship and operate in the world. You can see this in the world with devices.
Consider: technical debt is the reason you have AC delivered to your house, going through as many converters as you have devices - often converting to the same target power characteristics for those devices. It is not the reason that your coffee machine that also grinds and whatever is likely to fail within the year.
Another example; Technical debt is the reason we are still predominantly using petrol for automobiles. It is not the reason the dashboards are horribly non-responsive on modern cars.
> Consider, technical debt is the reason you have AC delivered to your house going through as many converters as you do devices. Often to the same target power characteristics for those devices.
Bad example. AC power has many desirable characteristics for the local transmission grid. If you were to do the grid over from scratch you'd still use AC. You're also too focused on household electronic usage, which is a very tiny percentage of the overall electricity used.
It's just an illustrative example. And I'm going to bet that most of us, the vast majority of us, really only have experience with household usage. So it would make no sense to get into other usages, which most people won't understand.
HVDC does have advantages in certain scenarios (very long transmission lines, for example) but parent is still correct--the majority of the grid makes way more sense with AC.
I meant my follow-on to be a concession, but worded it poorly. I thought it had advantages, but yes, I was mainly thinking of small appliances - in particular, in the home. And not just computers, but lights and control panels. It seems many things all use the same power characteristics and are now complicated by having to deal with AC.
Which, amusingly, is fitting for the tech debt debate. Eradicating some choices from the project is likely to be missing the point. Just as eradicating AC from all power would be short sighted/wrong.
AC is much better in the home. There is no way to get around the fact that you need massive wires to supply low voltage at high amps.
It is much cheaper to have a power supply on every electronic device turning 100-200 volts into 5 volts than to have one big power supply turning power-line voltage into 5 volts. Of course, a lot of computers need 3 volts or less, so those power supplies exist anyway. It is also more efficient: big power supplies running at low loads are inefficient, while the power supply on each device is sized to what the device needs and so is more likely to be operating in a high-efficiency range.
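To put numbers on the "massive wires" point - a back-of-the-envelope sketch, assuming a 15 m run of 14 AWG copper (about 0.24 ohms round trip) and 100 W delivered:

```python
# Deliver 100 W through ~0.24 ohms of loop resistance
# (assumed: 15 m run of 14 AWG copper, ~30 m round trip).
R = 0.24
for volts in (5, 120):
    amps = 100 / volts        # current needed to deliver 100 W
    loss = amps ** 2 * R      # I^2 * R heating in the wire
    print(f"{volts:>3} V: {amps:5.2f} A, {loss:6.2f} W lost in the wire")
# at 5 V: 20 A and ~96 W wasted in the walls (about as much as delivered);
# at 120 V: 0.83 A and ~0.17 W wasted.
```

At 5 V you'd lose roughly as much in the wiring as you deliver, which is why low-voltage distribution needs enormous conductors.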
>AC is much better in the home. There is no way to get around the fact that you need massive wires to supply low voltage at high amps.
That's orthogonal. What you really mean is that you want high(ish) voltage to distribute power in a home, in order to minimize losses due to wire resistance over distances of dozens of meters.
You don't need AC to do that. In fact, with modern power electronics, the switching converters we now use for supplying LVDC to our devices can work just as well with DC as with AC input power.
The primary advantage of AC over DC is that it can be converted between voltage levels easily with transformers. But today, we can do the same thing with DC using DC-to-DC converters. These didn't really exist in an economical way before a couple decades ago, maybe even more recently.
If for some odd reason western society decided to re-engineer and replace the whole power grid, I think it's quite likely they would simply switch to DC for everything. With deployment at that scale, the cost issues with the equipment should go away, making it no more expensive to replace everything with DC converters than with transformers. DC is more efficient than AC because it stays at its peak voltage, and because it has no skin effect. But the technology needed to make it inexpensive to use for power transmission has only been around for a fairly short time (namely, modern power electronics). Up until recently, it was simply a no-brainer to use AC because of its simplicity in generation, transmission (with transformers for stepping up the voltage), and usage (with AC motors).
> it's quite likely I think they would simply switch to DC for everything
I'm not sure. AC has some important safety considerations that would make it better even if the efficiency were significantly worse.
Switches, fuses and circuit breakers that work with DC are more expensive than AC. When a circuit opens there is a spark, and this spark can in some cases create a conductive plasma. With AC the wave goes to zero and the plasma disappears, while with DC it continues. There are cases where a DC fuse blew but the fuse continued to conduct. Of course this can be engineered around, but generally with larger and more expensive parts.
When someone touches power accidentally, AC is slightly safer. With DC your muscles will grab and never let go. AC gives you a chance to let go. This is a low probability thing, but is a factor.
The guy who wanted us to debate is wrong for one other reason though: I'm approaching the limits of what I know on the subject, while you seem to have a lot more knowledge.
The desire for debate was to increase our collective knowledge. Not to prove someone right. I am fully comfortable with the idea that I was wrong. You both have knowledge I find interesting.
I'd be interested in you two debating this more, since you both clearly know the topic better than I do. This post is reflecting what I thought I had heard. But, I am not in this field.
There's really nothing to debate; the guy I replied to was totally correct about everything except the bit about "AC is much better in the home", where I pointed out that he really meant that a high voltage roughly where our current AC systems are (120V-240V) is much better in the home than some kind of low-voltage DC system, and that with modern technology, it would probably actually be better to have a DC system. But realistically, that's not going to happen because the gains (probably very minimal) aren't worthwhile compared to the enormous cost of conversion, given how standardized our current AC system is and how all our infrastructure, point-of-use devices, etc. are all designed around that.
Basically, he was assuming practical real-world considerations, I'm going off on a tangent about ideal conditions. His argument is about whether it's better to stick with the current AC system that your house has, or if it's better to install a low-voltage DC system to supply 5V, 12V, etc. to all your devices from a single, central, whole-house power supply as many people who don't understand electricity will frequently suggest. He's completely correct: low-voltage DC is a terrible way to supply power over any distance more than a meter or two because of resistive losses, so it'd require massively large copper cables or busbars. And power supplies are generally very low-efficiency when operated at low load. So our current approach (separate little optimized power supplies for every device, plugged into a higher-voltage AC supply) is actually optimal.
I was never arguing that an individual should replace the AC in their house. My argument was, with current technology, the AC setup can be seen as tech debt.
Which seems compatible with what you are saying, but the parent was specifically claiming I was wrong.
That is, you seem to be echoing my point. But seem to be claiming it is different. What am I missing?
I wouldn't call it "tech debt". Present-day AC systems may not be completely optimal (given current electronics technology), but they do work well.
As I understand it, "tech debt" is something that has to be reckoned with at some point, or else you're going to have real problems in the future (just like refusing to pay off a money debt will generally cause you real problems at some point when the creditor sues you and gets a judgment). You can't just let it go on forever; eventually you need to "pay it down" (by cleaning up the codebase, migrating to newer technologies, etc.), or else catastrophe happens (the company is unable to compete and goes under). One common factor cited in these stories is that the code becomes too unmaintainable and unreliable: too many weird changes for customers pile up and introduce serious bugs which cause the product to not work properly.
This isn't like that at all. We can go on with our current household AC power systems indefinitely. Maybe we could get a 1% improvement by switching to DC systems (at an enormous cost because most of your appliances and devices won't work with it without adapters), I don't really know exactly how much better DC would be (not much really), but what we have now works fine. Furthermore, it's not like the whole electric grid system needs to be changed: it's entirely possible, for instance, to switch distribution systems to DC and leave household systems AC. Instead of distributing the power at 30-something kVAC in your neighborhood and using outdoor transformers to step it down to 240VAC for your house, it could be distributed in DC form, and those transformers replaced by modules which convert the 30-something kVDC to 240VAC. In the old days, this was hard and expensive to do, but with modern power electronics it's not. But even here, the question is: are the gains worth the expense? And the answer is very likely "no". (For reference, I'm not a power engineer, I just studied it in college as a small part of my EE curriculum.)
So this does not, to me, resemble "tech debt" at all. It's just a system that we use for legacy reasons and which is extremely reliable and works well, even though it might not be the absolute most efficient way to solve the problem. This is no different than many other engineered systems. Perhaps you have a decent and extremely reliable car. Could it be better? Sure: you could build the chassis out of carbon fiber, use forged aluminum wheels instead of cast, etc. all to save weight and improve fuel economy. Are you going to do that? Of course not, because the cost is astronomical. There's cars like that now, and they cost $1M+.
So for AC systems that we're talking about, the question is: what is wrong with them that we want to consider replacing them with something else, instead of just sticking with them even if they're not quite as efficient as they could be? Because the cost to upgrade them would be enormous, so you need to have a very good reason.
Most instances of tech debt are things you don't have to deal with. Usually, it is the term pulled out for things people don't like. Or generally deprecated methods that have better replacements, but still work.
It is this second sense that I was latching onto. It - tech debt - will drive decisions today. But it is not clearly bad. Just a constraint on current decisions that was made in the past, often for decent or really good reasons.
Bit rot is another term, for things that start to decline in how well they work. That is generally different, though: usually a byproduct of replacing implementations without keeping functionality, such that people relying on old behavior are left cold. (I can see how tech debt can easily turn into bit rot. But it is not required.)
Consider, LaTeX being an old code base is often used to call it tech debt filled. People want to modernize it. Not because it doesn't work. But because they think there are better ways, now. And they do not consider all of the documents made on it as infrastructure.
Now, I concede that all of this is my wanting the terms to have unique and actionable meanings. Elsewhere I was told "tech debt" is a catch-all term now. That seems to rob it of usefulness.
Edit: I forgot to address the monetary aspect of the analogy. I like that, to an extent. But most debt is taken on in very specific terms financially, unlike colloquially termed debts between friends. That is, there is no notion of interest in this metaphor that works, nor a party you are borrowing from.
>Most instances of tech debt are things you don't have to deal with. Usually, it is the term pulled out for things people don't like. Or generally deprecated methods that have better replacements, but still work.
I'm not so sure about this. To me, "debt" is something that has to be paid eventually. Otherwise, why use the term "debt" at all?
So if something works fine, why waste your time and energy replacing it with something newer?
Usually, the reason for this is the assumption that sticking with something deprecated will eventually bite you in the ass: something you're depending on won't be supported, will have security holes that won't get fixed, etc., and you're going to wish you had fixed it earlier. So this is a valid use of the term "tech debt" IMO.
But if something is just something someone doesn't like, that isn't "tech debt" at all. I don't like .NET, but it's invalid for me to call all software written in .NET "tech debt". I don't like Apple's ecosystem, but it would be pretty ridiculous for me to call all iOS software and apps "tech debt" when many millions of people use and enjoy that software every day.
So, for your LaTeX example, I don't consider that tech debt at all; instead, it's just like iOS and .NET software to me. If someone doesn't like it, that's their problem; the fact that it isn't brand new isn't a problem for me and all the people who still happily use it.
So personally, I think anyone using the term "tech debt" to just refer to things they don't like is using it incorrectly and in a totally invalid way.
I find this a compelling view. But, I urge you, just google technical debt. You will see the definition: "Technical debt is a concept in programming that reflects the extra development work that arises when code that is easy to implement in the short run is used instead of applying the best overall solution."
So, in this case, AC/DC fits if we agree there is a chance the "best overall" solution is DC. (Which, I fully grant, is not a given.) There is also a bit of playing loose with "short run."
Then, skip back to the top of this thread, where you will find: "products that are written badly by inadequate teams" and "case of unpleasantness" and "A product is replaced (or intended to be replaced) by a new product that does more or less the same thing, only this time with a smart new team, in a hip new language..."
All of this is the first, most highly voted, post. The next post is a highlight of poorly engineered solutions.
My point? Find a case study that has the usage you are referring to here.
Now, certainly it has this rhetorical appeal to people. But I have never seen it used in a way that fits the metaphor. Just used to hit the emotional strings of "you must pay back your debt!" while usually claiming that the design or lack of some technology is the debt.
I think we're going off on a tangent here, but even with that definition from Wikipedia, there's no such thing as "the best overall solution". Everyone is going to disagree about that; the best you'll get is a consensus. For instance, back to LaTeX, there's countless academics out there who use TeX/LaTeX/whateverTeX for writing academic papers, and getting beautiful results while not having to mess around with a WYSIWYG editor like MS Word and just typing in some simple formatting codes. That's what *TeX was designed for and has worked well for for ages. But I'm sure you'll find a few people who say this is bad because it's "old" and that they should switch to the latest MS Word for everything, and rewrite all their papers in the latest MS Word. If you look really hard, you might even find someone who thinks both are bad, and that all academics should rewrite everything in WordStar.
"The best overall solution" is up for debate. It's the same with programming languages; one team will say that C is the best overall solution for a certain problem, another team will say it's Python, another team will say it's one of the .NET languages. I'm sure you can find plenty of engineers who will claim that mission-critical real-time avionics systems or automotive ABS controllers should be redesigned to use x86 CPUs and run Windows and have the code written in C# instead of using C/C++ and running on a small RTOS on an embedded microcontroller.
The implication I see with your Wikipedia definition is that implementing something easy in the short run instead of something that really is the best overall solution will eventually lead to more work to fix the shortcomings of the quick-n-easy solution. So, like I said before, a "debt", because it has to be paid back eventually (with work). The problem I see is that not everyone agrees on what is the best overall solution, and unlike a money debt that's easily seen by looking at a dollar figure, the only way to really know how much "tech debt" you have is through experience, i.e. accumulating it and then finding out over time how much work you have to expend to fix things when your quick-n-easy solutions start having real, demonstrable problems. If your solution has no actual, demonstrable problem (e.g., you use LaTeX and it continues working great year after year for your use-case), then I don't consider that to be "tech debt" at all, even if some people don't like it.
Yes, even light bulbs. A typical household LED is very easy to run off AC. You just need a capacitor big enough to hold the charge between each cycle of AC (which is very little). More information here: http://www.ledsmagazine.com/articles/2006/05/running-leds-fr...
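For scale, the capacitor sizing is simple arithmetic (assuming full-wave rectified 60 Hz, a 20 mA LED, and ~1 V of tolerated ripple - my numbers for illustration, not from the article):

```python
# C = I * dt / dV: charge drawn between peaks over the allowed voltage sag.
I = 0.020        # amps - typical indicator-LED current (assumed)
dt = 1 / 120     # seconds between peaks of full-wave rectified 60 Hz
dV = 1.0         # volts of ripple we tolerate (assumed)
C = I * dt / dV
print(f"{C * 1e6:.0f} uF")   # ~170 uF - a small, cheap part
```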
It'd be vastly more expensive to wire up an entire house for low voltage DC than it is to include the simple rectification components in every light bulb. In a house you're talking about many wire runs of many dozens of meters. This is not a good environment for low voltage DC at all.
I recall seeing IEEE articles talking about the DC-wired home. I confess I stopped paying attention, as it will be a long time before this is actionable for me. Can't claim surprise to learn I had some of it wrong.
Of course, the cynic (and, ironically optimist) in me still has this as evidence that "technical debt" is often used in BS circumstances by people that just don't fully understand the reasons for the things they are talking about. :)
Saying that technical debt is only deliberate is an old argument[1], but usage defines meaning and modern usage is that "technical debt" is a catch-all term. It just means bad code we know should be fixed.
Stretching the debt analogy, you can go bankrupt from payday loans (the "just push it out" tech debt) and from getting too big of a mortgage to build/fix up a house (over-engineered tech debt).
They upgraded the server of course, to as much as they could afford. But it wasn't enough; the rendering load soon caught up. Partly because their number of visitors grew, but also because they wanted to add new features to their JSF pages, and every new feature required extra rendering power as well.
That was considered, but it would of course take some refactoring on the back end, and it would still cost quite a lot in hardware.
The thing with JSP and JSF is, they do OK as long as your content is relatively static, because then rendered content can be cached.
In the case of this company, their most visited page was the list of search results, which by its very nature was not very static at all.
Every problem is different, so I hate to judge, but what you're saying doesn't add up to any experience I've had.
It sounds like your company seriously screwed up the design if you can't scale your web tier code horizontally. I've also never had a view technology take up a significant chunk of cpu resources - it's always the Java code carrying out the functionality. E.g. I would expect the largest factor in CPU usage in the list of search results to be... generating the data for the search result. If the largest factor was rendering the result, then something was probably seriously wrong.
"Eventually the word came out: the main developer of the DAL framework had left the company and, according to the Greek CEO, she had been 'too smart' which meant that nobody understood her code."
OMG no - run for the hills.
95% of software systems are not inherently sophisticated. They are "complex", yes: maybe there are many features and moving parts. But there should be no pieces of the system that are hard for anyone to understand. With decent architecture plus decent design and coding, an entire bank's system should read like a long, but well-articulated, user manual.
Unless you're doing super low-level stuff, complex algorithms, heavy math stuff, or issues with massive scale or performance etc. ... the end result should almost be mundane in most cases.
The closest I've come was a Rails project I inherited from a star developer who had just left the company. It was a B2B project that involved importing large Excel spreadsheets of various different formats into a standardized database for itemized review.
The code was pretty sloppy, but didn't deviate much from standard Rails idioms. Not many people on the team understood Rails well enough to read it, but I did. Bug reports were constantly flooding in. I suggested taking a sprint to build up an integration test suite and then letting loose on the backlog.
We did build up a sufficient test suite in one sprint. But the bug reports never slowed. By the time we had the confidence to truly start tackling bugs at speed, the battle had been lost. We had been so busy writing tests that we forgot to manage the bug tracker. The impression was that we were overwhelmed and unable to make progress. The project was swiftly closed.
People remembered that codebase as an exemplar of sloppy code and technical debt, but that's not the lesson I took from it. I had seen, and others would see later, much worse. The lesson I took was that perceptions are as important to manage as results.
Excel imports with Perl did pretty well for me. I was pretty careful about insisting on some rules for the sheet data, and enforced them strictly with decent debugging info for the users.
I still think the Robustness Principle[1] is a crock, and strictly controlling inputs is one key to happiness. It also, frankly, helps your users in the long run by giving them exactly what they want, and it actually cuts down on the amount of thought they have to put into it. Chaos and disappointment do not make a good user experience.
Ruby has had good ETL libraries for a long time. In my opinion, our product team was too lenient concerning the format of the Excel files. Asking customers to fill out a template spreadsheet to submit to our system, rather than letting them submit any old XLS file they happen to have on their computer, would have gone a long way towards simplifying the problem space.
EDIT: I know this version won't support escaped separator/newline characters, but I made it for a specific use case in which I knew that would not occur. Adding that functionality would make it a little messier, but still not too bad.
EDIT2: Thanks for the interesting comments! Not so trivial after all!
Perhaps a more accurate version of what I was attempting to say above is that 'it is often (not always) easy to build a CSV parser to interact with one specific program'. The four-line version above works perfectly for reading the type of files I designed it for. If you want to work with human-created or more complex variants of CSV, all bets are off.
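To make that concrete, here's a minimal sketch of the kind of parser I mean (my own illustration in Python, not the original four-liner), which works only because the input format is fixed and machine-generated:

    # Reads a fixed, machine-generated format: no quoting, no escaped
    # separators, no embedded newlines. All bets are off otherwise.
    def read_simple_csv(path, sep=','):
        with open(path) as f:
            return [line.rstrip('\n').split(sep) for line in f]

    rows = read_simple_csv('export.csv')  # hypothetical file name

The moment quoting, escapes, or human editing enter the picture, none of those assumptions hold.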
You need a lot more than that to handle CSV in the wild (quoting, Unicode, line termination, etc.) but the real killer I see is when it's edited by humans. The special cases for errors and inconsistencies will add up quickly; in some cases you may be able to reject invalid data but you may not have that option or an easy way to tell whether any particular value is wrong.
Excel takes that and adds some fun twists, like people using color and formatting to store data, or Excel auto-corrupting values which look like dates, which may not be noticed until you do something with the data.
I know of at least one company whose entire business is handling this stuff. They find growing companies as they hit critical mass and need to move their Excel data into a real database. The product is just "Your data is hideous and was entered by hand without validation or formatting; it'll never convert and it'll be wrong when it does. We can help."
They handle all kinds of theory and technical stuff, like normalization and processing Excel-corrupted dates. But they also handle a lot of easy-but-agonizing tasks like regularizing typographic single quotes into apostrophes, which crop up as soon as you let humans enter free-form data.
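To illustrate the flavor of that cleanup (my example, not theirs; a Python sketch):

    # Map typographic single quotes to plain apostrophes.
    CURLY = str.maketrans({'\u2018': "'", '\u2019': "'", '\u201b': "'"})

    def normalize_quotes(text):
        return text.translate(CURLY)

    assert normalize_quotes('It\u2019s fine') == "It's fine"

Trivial on its own; agonizing when it's one of a few hundred such rules.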
I used to use Google Refine (now OpenRefine [0]) for this. It lets you load up the data and then apply rules to see if they are mostly correct. It doesn't get you all the way, but it is better than blindly revising a huge Excel "database" by hand.
I'll try to remember. I ran into them at a career fair a few years ago, so it's not leaping to mind, but it seemed like they had good software and a great market niche.
Let's not forget Japan Post's CSV for all the Japanese Address data that contains some lines that are line-wrapped, that is, one record spans two or more lines in the CSV file. A line-wrapped CSV... I just can't even.
That is very interesting, thanks! I hadn't thought about Unicode or tolerating human error. Although the times I have worked with it have been when it is a transport medium between two computer programs.
That's definitely a less-aggravating situation by far. I've had a lot of cases where a significant amount of specialist human time was invested in a spreadsheet, and it's really made me wish there were an Excel-for-data which acknowledges how many people are using it for semi-structured data like this.
Relational database can be expressed in tabular form, but tabular data is not necessarily relational.
> (A "relation" in relational theory is a table with a name, fields with names and types, and the data in the table.)
A relation is a system of one or more functions (in the mathematical sense) each of which has a domain that is a candidate key of the relation and a range that is the composite of the non-key attributes.
Interesting definition. Do you have a source for it? It seems ambiguous.
From the Wikipedia article on relational databases, subsection relational model.
"This model organizes data into one or more tables (or "relations") of columns and rows, with a unique key identifying each row. Rows are also called records or tuples."
I submitted that edit before the post had any replies.
Also, I did try to make clear that the given code was created 'for a specific use case in which I knew' that the format of the input files was tightly defined.
You can define a narrow subset or version of CSV that is trivial, but that doesn't reflect what one finds in the wild as "CSV", which was not systematically defined or described until well after many mutually incompatible things by that name were well established.
Thanks for your thoughts. As I have stated elsewhere, the code handles all of the cases I needed it to handle, due to the stability of the input file format (which was emitted from another program). I don't see that this should be too hard to believe.
I also said in my second edit, on the top line, 'Not so trivial after all!'. If I was putting on some kind of act, wouldn't that have been dropping it? Further, I noted in my first edit, before I had received any replies, that I 'know this version won't support escaped separator/newline characters', so I am not sure what you were trying to add with your example?
I think that my central point (and I totally accept that I didn't express this well) is that depending on the specifications of your program, the required CSV parser /can be/ very short. When one compares this to other data exchange formats, for example JSON, it is clear that the barrier to /entry/ is much lower. The shortest JSON parser I could find with a cursory look was 200 lines of C.
I totally appreciate that to write a CSV parser that works for all cases would be extremely longwinded. It has been interesting to hear other people's experiences and opinions about that. But the fact remains true that /in some cases/, depending on the requirements of the program, the parser can be very short.
> We all recognize the classic developer I-could-build-that-in-a-weekend hubris when we see it. :)
It is funny you should say this. I needed the CSV parser because I thought it would be fun and interesting to see if I could build an anti-malware tool in a week (I am taking a malware detection class at the moment, and I wanted it done before the next lecture). I did not expect I would be able to have anything good working in that time, but by the early hours of the next morning I had a perfectly functional anti-malware tool. It can use ClamAV signatures (so it can detect everything(?) that ClamAV can), runs in parallel, has a nice text console with a DSL, and is fast enough (processing 210k small files in ~5 minutes, checking against ~60k sigs). It is about 650 lines of Erlang (including comments). I am saying this not to boast(!), but to make the point that I greatly underestimated how productive I could be, beat my expectations many times over, and then people comment about my hubris online the next day. It is funny how life goes!
Every failed product/project I've worked on in my professional career, which had full intent to ship from the start, was killed by technical debt. It's usually indirect, but it's always the root cause.
It takes many forms:
* Too buggy to ship, due to a creaky old code base being over-stretched to cover a product with reliability/experience expectations that were too high.
* Product form factor, efficiency, user experience not good enough to sell well, due to spaghetti code base which couldn't be whittled down to removable pieces. Result: large runtime, more expensive, less efficient hardware.
* Existing old codebase deemed too bad to ship a product, requiring a rewrite-from-scratch, but timescale too long to make any sense -> product killed.
It's difficult to elaborate more while maintaining some discretion about exact companies and projects. The general point is: technical debt isn't just some fuzzy intangible issue — it indirectly creates enormous costs in people and time, can affect the physical form products take on, and impact the user experience. Products always get started without taking this debt into account, but when it's finally realized, it can change basic features, and then it kills them.
Products are designed with faulty assumptions about what existing resources can be applied to them.
Interesting that you talk about projects that never shipped. When I read OP's question I was thinking about already-shipped products that became too hard to run and maintain.
I am curious how long your products/projects were in development for before falling to tech debt? Were these net-new projects?
> When I read OP's question I was thinking about already-shipped products that became too hard to run and maintain.
I've been mostly in consumer electronics related companies, where a product which ships and then becomes too hard to maintain usually doesn't "fail". It just gets phased out. In a way, this is another way technical debt has an indirect, but large impact on products: obsolescence becomes a necessity. Not so much planned — which implies malice — as simply realizing it's not possible to maintain indefinitely.
> I am curious how long your products/projects were in development for before falling to tech debt? Were these net-new projects?
Usually very quickly, or after far too long.
The better projects know ahead of time that there are Dragons lurking in the code base. But that's effectively saying there are projects which never even got past brainstorming because we knew the technical debt was too high.
On the other hand, there are projects where it only becomes apparent how much debt there is after a lot has already been invested. It's like you'd expect, e.g "There's a performance problem because of a basic primitive this library uses everywhere. And that was originally a workaround for a compiler performance bug. We could fix the compiler bug, but it turns out other libraries relied on it..." and so on. Extra time-to-market makes a product make less and less sense — fashions change, hardware improves, new tech arrives — and so it gets killed. Or worse, shipped.
The last place I worked at will die because it will take them years to migrate from Oracle to Postgres due to "technical debt" (the codebase is coupled with the database to a hilarious degree: business logic in triggers, huge PL/SQL packages, plain SQL queries in the Java codebase, a half-assed home-rolled ORM). They're not getting as many new customers as they could because, for various reasons, the Oracle licensing terms are now unacceptable to the new customers they have been in contact with over the last two years.
That's the most concrete reason I can come up with for why the technical debt will kill them, but there are plenty of vaguer reasons why it's been killing them for the past 5 years and will finish them off over the next 5. The attrition rate has been around 20% a year since I joined. For most of the time I worked there they compensated somewhat by hiring new people. Word has gotten around though, and they've run out of qualified candidates willing to work on their mess. Hell, we even had a couple of gifted hires leave after a month or two while shaking their heads.
My current workplace's main product uses the same tech, is the same size (LOC) and has the same functionality as the other company's, but serves a different market. They did the Oracle to Postgres migration in 2 months. 2 MAN months, one guy.
New workplace: 15ish developers, serving the same amount of customers, doing similar revenue, making stable releases every week
Old workplace: 80 developers at its peak, doing non-hotfix releases around every 3 months. Just a mess in every way, mostly stemming from the codebase and the architectural choices that had been made along the way.
Hey, sounds like we worked at the same place! That, or the "wedded to Oracle for life" is a common antipattern. I'd add "shared everything architecture" to the horrors.
Yeah, once you get that deeply entrenched in Oracle, it's almost impossible to get away, and after that experience I vowed never to work at another Oracle shop.
Yes, and pretty easily, if you buy into all the new features that no one else will ever have all of. If you stay clean with simple storage, compute, DB, and email, then you should be okay.
Ideally you'd have some kind of plan though from the start, for which other cloud provider you would use and how the services would map, in case using AWS becomes untenable.
Cloud product life cycles should definitely be more interesting. Azure, for example, already has a "classic" model and the new ARM model. Either way, avoid tightly coupling code with some external vendor's service.
I don't know if I've ever seen a successful database transition for a large project at a large firm. You basically have to build that from the start.
Doing it after the fact in a politics-heavy organization is confounded by not just the technical difficulty of the task, but the glad-handing and perception management that has to happen to keep your team from getting fired during the process.
I was CTO of a company that had a two-week outage due to technical debt. I didn't sleep much for any of it. We fixed it, and we'd lost about 30% of our subscriber base in that period. The company took on new funding to survive, invested that in a new set of products, and shuttered the old stuff just to stay afloat.
I am currently working in a business where there is a nearly 8-year-old Rails app (600+ models, 250+ controllers, 400+ libraries, around 60k LOC) that sits at the heart of everything we do.
The company is struggling to grow and believes the cause is that engineering is slow. We have asked to refactor this code base multiple times, and point to the technical debt as the reason features that should take a day to implement typically take 3-4 weeks.
It is only recently that the penny has finally dropped and they've realised that if they don't invest in replacing this thing (there is too much technical debt to fix, so we're declaring bankruptcy on it and moving to a brand new architecture piecemeal), the business is likely to fail within 1-2 years.
That means my current employer is likely to go bust because of technical debt within 2 years max unless we become really good at fixing this.
IMO this is the price to pay for a dynamically typed language. 60K LOC is not much in a static language, you can use tools to refactor it easily or to visualize the control flow.
But with a dynamically typed language? It's a nightmare. You change one thing and cannot possibly know what else could have gone wrong.
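A contrived sketch of the failure mode (assuming Python; all names are made up):

    class User:
        # Renamed from full_name() during a refactor.
        def display_name(self):
            return self.name

    def greet(user):
        # Stale call site: no error at load time, only when this
        # line actually runs - which a compiler would catch up front.
        return 'Hello, ' + user.full_name()

With static typing the rename is mechanical; here every call site is a runtime gamble unless a test happens to execute it.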
The legacy app has > 80% test coverage. Refactoring is still slow because there are all sorts of business assumptions put into place that add functionality without ever questioning the need for it.
Basically, for a long time, the company never really re-evaluated what it had learned and spent time trimming things down, so as a result there is this ungodly mess. At the heart of what the business does, there is no real need for more than a dozen models. So why do we have so many more? Nobody ever refactored away stuff we didn't need any more, and so weird things happen.
There is also a coupling issue that is endemic to all monoliths. We're moving to a micro-service architecture with clean domain separation, and we'll probably go to 1/10th of the code base in LOC terms within 12 months, even if we move some of that functionality into Go, Java or Python services (all options).
Unit tests just make the mess a bit easier to solve, but they are far from a perfect solution. Wanna move and rename a function? Do it, then spend hours fixing your 10 broken tests, writing new ones, and testing your app for hours, because unit tests don't cover integration. A static language + an IDE does it automatically within seconds.
I work on that type of codebase, but we have a fully covering test suite, so applying changes is not a problem (interestingly, I've just realized that the line count of the testing code is 50%+ more than the base application code itself).
So ultimately I think company culture (that is, emphasis on automated testing, for dynamically typed languages) is the crucial factor.
I would say that 8 years to write 60k LOC is slow. I worked as the sole GUI software engineer for a hardware firm and wrote > 100k LOC in 3 years, not including the test projects preceding the actual real project. This was in C++, and included client/server stuff, an entirely custom resizeable GUI, OpenGL 3D graphics, and modelling of 3D assets and textures etc. too. And getting it running under OSX + Win32, fixing issues on both.
And that wasn't a stressful place to work with insane deadlines - it was fairly relaxed for the most part.
You can't compare C++ with Ruby code. Rails code especially can very easily become a hairball where everything happens "somewhere else". You don't have static type checking or other compile time hints to figure out what is going on. You can't see which functions are called from which call sites. Refactoring tools? Forget it. Ruby is a very compact and flexible language, but if you're not disciplined you'll pay the price for it.
A 60kloc C++ project is small and easily manageable, a 60kloc Ruby hairball can drive a person insane.
Rails code gives you a lot of bang for your buck. That, and the better metric for the project's complexity is the stupidly high number of controllers and models.
As a former C++ dev: you can't compare Ruby LOC to C++ LOC. Just 4x the Ruby and then it becomes more fair. C++ is just verbose; #include <algorithm> doesn't fix everything that blocks fix.
If I had to guess, I'd say it's probably an issue with the build/deploy system. Perhaps someone deployed a broken build, then tried to revert/rollback, and realized that the previous version didn't build "cleanly" anymore.
This could happen if you have a lot of dependencies, switched compiler versions but left the binaries "in place" and deployed changes incrementally.
There are two problems with giving a full answer: firstly I'm still under NDA for some of that, and too much detail would breach that; secondly, my exact memory of it is limited.
In short, my predecessor had attempted a move to SOA without understanding dependencies, circuit breaking and failure modes. This would then cause scenarios where the entire front end would fail to render because a single downstream service took a little longer than necessary.
When identifying how to stop that happening, I discovered a large number of comments tagged "TODO" with statements like "Refactor this when we have time" or "We need to find a way to do this better".
Further down on the downstream services there were rather esoteric SQL queries doing large joins that nobody had done a query plan on. It was hard to identify these because the ORM had been trusted to do magic, and it was happy to do so, but there was a point where it was not apparent _why_ these joins were happening, but when you found the code, there were more comments "This needs improving", "We should refactor this", etc.
We were able to get something back quite quickly with liberal application of indexes, and it took us a day or two to refactor the queries enough to bring response times down, but the error rate was still > 20%, and it was random, so 1 in 5 page loads of the front-end service would fail.
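For flavour, the kind of check that had been skipped looks roughly like this (an illustrative sketch using Python's built-in sqlite3, not their actual stack):

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)')

    query = 'SELECT * FROM orders WHERE customer_id = ?'
    # Before the index: the plan reports a full table scan.
    print(conn.execute('EXPLAIN QUERY PLAN ' + query, (42,)).fetchall())

    conn.execute('CREATE INDEX idx_orders_customer ON orders (customer_id)')
    # After: the plan reports a search using idx_orders_customer.
    print(conn.execute('EXPLAIN QUERY PLAN ' + query, (42,)).fetchall())

Ten minutes of looking at plans like these would have flagged those ORM-generated joins much earlier.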
We refactored the code to circuit break and handle degraded services better, but that took a few days, and then we started working down to the back end service and figuring out the final steps.
It was a small team looking after legacy code that everybody knew was a bit messy.
A few weeks before this code was shuttered, I heard from a friend that some of our content did not render at all on certain Android devices. I identified the cause as a half-finished refactor (again, my predecessor), that had never been finished because he had been pushed to work on something else. This caused a dramatic decline within a key market segment that resulted in declining ad revenue, subscriptions and overall viability of the business.
Basically, when you start something, finish it. If you find yourself putting in comments like "We should refactor this" anywhere in your code base, and you're doing so because the business is pushing you to work on new features, you have a massive problem culturally that is going to cause a rise in technical debt that raises risk to revenue.
All technical debt ultimately will lead to problems that the business will see on balance sheets, but they will rarely successfully identify the cause as being technical debt because they can't see, understand or rationalise it. They think it's engineers being grumpy idealists.
People play too fast and loose with the concept of "MVP" for my tastes, and it's a problem I see over and over again. The risk of that is, long-term, it will cause business failure.
I'm currently working on the reincarnation of a project that was killed by technical debt -- TWICE.
The original codebase was about 20 years old. It was control code for something best described as an industrial robot. Written for the last 20 years by greybeards who knew a lot about the manufacturing process, and were reasonably good at getting a product out the door.
But the whole thing was riddled with #ifdefs for this customer or that, or one batch of machines or another. All long forgotten, written by people who had since left, or been pensioned.
It was in dire need of improvement and extension, but it would have been superhuman to inject new features into this rat's nest. Plus their electronics supplier was discontinuing the control electronics the system was designed for. The UI also looked like it had been designed by German engineers in the 1980s. Which was the case.
So they made the defensible decision to start from scratch. A team of engineers was to develop a brand new machine, with all new electronics and all new code. They got to work -- and had to scrap the new software about three years in. It was just utterly misdesigned, and riddled with bugs.
It featured wonderful WTFs like the embedded realtime code depending on the Qt libraries.
I observed its instability myself: it would just spontaneously crash every five minutes, sometimes just while idling. Once the project lead was on holiday, the programmers revolted, went to the head of the company, and the project lead found himself without a project on his return. Whee.
Now we've started from scratch again, and have at least succeeded in making different mistakes this time around. Fingers crossed, this might end up working.
I'd say it wasn't due to technical debt, more a startup-like development approach at a company that trades millions within seconds in full automation. It sounds like the deployment process wasn't that complicated for a company of that size, but it was deployed without a single check by a second person.
If you're trading automatically, you'll need a very, very solid deployment and audit process, even if you're just a small company. The reason banks are so slow in deploying software is that most of them lost a few million at some point due to some bug.
Startups that think they can act faster than banks just haven't had that bug yet. That's also why I'm rather negative on the whole Fintech scene at the moment.
That's not a startup approach. That's an enterprise approach. Believe me, I've fought tons of resistance in automating deployment operations in the enterprise. There's a perception that automation is dangerous, and you need human checkpoints. In practice, I've worked on projects much, much larger than Knight Capital, where the deployment process was driven by a huge spreadsheet, and orchestrated by non-technical overseers telling techs what commands to run based on the spreadsheet in front of them. It's incredibly vulnerable to human error like "Oops, forgot to deploy to one of the eight servers in the cluster".
In the enterprise, this is called "mature" and is a sign of great sophistication.
Yeah, the amount of resistance towards automation at some larger companies is a total mindfuck.
My first ops job back in 2008 was at a large exchange's NOC, where we shut down and cleaned the application environment every day. Every Friday, we would have to take a backup of the 20 or so production databases - by hand, in an ancient CDE-based UI. Right click -> menu -> submenu -> backup database. Very little room for error, and you weren't allowed to do it without somebody else watching you. Throughout the weekend, customers would then run tests against the production databases. Once testing was done, we'd restore the prod databases back to their original state to wipe out test data.
At one point, I asked my boss if it was alright if I automated it, after showing him a POC, and was rejected because, "We don't trust automation to do it accurately every single time." Mind-boggling. In mild fairness, in the 15 or so years they were doing that, I don't think anyone ever did it wrong... which is an enormous miracle in itself.
(That was a strange company. My boss was a JW who'd worked there for 30 years, regularly tried to convert me, and would spend four hours a day on spreadsheets for his church. We'd also manually kick off stock split processing from a ~10" CRT monitor from the early nineties.)
Call it "process debt", or "management debt" (i.e. the lack of investment in proper management and the culture that goes a long with it -- in favor of a "STFU and just add that feature now! I need it yesterday!" mentality). Either way, part of the same boat, basically.
"The consequences of the failures were substantial. For the 212 incoming parent
orders that were processed by the defective Power Peg code, SMARS sent millions of child orders,
resulting in 4 million executions in 154 stocks for more than 397 million shares in approximately
45 minutes. Knight inadvertently assumed an approximately $3.5 billion net long position in 80
stocks and an approximately $3.15 billion net short position in 74 stocks. Ultimately, Knight
realized a $460 million loss on these positions. "
About 1986 I was tasked with moving a small block (a few KB) of data very quickly from cabinet A to B, with the racks full of custom electronics - no PCs, all original stuff on a flight sim with 386 Intel processors all over the place. The racks had Multibus backplanes.
I suggested a 'TAXI' fast optical link (oooh - optical..too radical) or a pair of Intel 589 (Ethernet) cards for an off-the-shelf solution. Nope, too expensive. Engineering Management suggested a twisted pair ribbon cable between the two adjacent racks - um, OK..
Long story short - me and the senior design engineer decided to use the Intel 8257 DMA controller chip to grab the bus and blast the data between the RAM on two cards.
After a short period of fails, we found that the engineers who designed our 386 cards had not bidirectionally buffered the DMA request line onto the backplane, as they never expected any card except the master CPU ones to initiate a DMA, so the CPU cards could not see the line being toggled from elsewhere.
Engineers would not accept a change request, for 'reasons'.
Intel 589 cards it is, then!
All because someone chose to omit one tristate buffer.
Projects rarely die because of technical debt. Instead, it becomes ridiculously expensive and difficult to add new features. But the software itself can remain in use for decades, gradually decaying and rarely adapting to changes in the business environment. Eventually either the software gets thrown out and replaced with something new, or the company is no longer able to compete.
I've seen this play out probably close to a dozen times now, at different employers and consulting clients.
This is only true if developing new features isn't part of how the company succeeds. That's probably true for some tools that are used internally. If you can't modernize your payroll, that might cost you some money, but it's not make or break.
For a company that makes software as a product, or to directly support or create their main product, not being able to add new features is a really bad place to be.
I have seen a product getting killed by trying to resolve technical debt. The refactor took nine months and in the end didn't work better.
I am a big fan of constant refactoring on a small scale but I am very skeptical of large refactoring of a whole project. You may end up with something that's just different but not really better.
I've had the opposite happen every time a team I've been on decided to refactor a large portion (or even the entire code base). Every time, what was a source of constant bugs (i.e., X bugs per week, every week, never lessening) became tractable and moved to stable after the rewrite (X bugs the first week, 0.7X bugs the second week, etc., until finally we're encountering the odd bug only once every few months, if at all).
I'm not sure what the differentiator is. I'd be curious if others have ideas. I think part of it is that in both cases it was a small team, who caught the issues early enough that it hadn't gotten too bad yet, but late enough that the right direction to move in was clear.
I was talking about enterprise projects that had had years of development, ever-changing personnel, and complex and changing business rules to follow. These tend to be ugly and difficult to work with after a few years of development. In my view the only way to deal with them is to break them down into smaller components and then refactor. But that turns it into a political issue, because the managers (and a lot of developers) don't see the need.
Yeah, I could see that. I've been in those environments too; I have no data points from that, because getting the okay to refactor was so hard it never happened while I was on the team (I'd been on projects that claimed to have refactored the code, but then it was mixed as to whether people claimed it was a success or a waste).
I always tell the younger guys not to try to get an explicit OK to refactor but just add 20% to all estimates and use that for continuous refactoring without asking. It's just a regular part of professional work like writing code, pull requests and testing. This also has the advantage that refactors are relatively small so you can rollback if it turns out that the idea for the refactor was wrong (yes, this happens:-) ).
Even single man-month long "refactors" are something I'm wary of. Smaller changes are easier to test, easier to review, easier to merge, easier to verify are actually improving the state of things and heading in the right direction, easier to pause when your priorities unexpectedly shift midway through cleanup without leaving a terrible mess...
I'm okay with the occasional week-long rewrite of a subsystem, but usually only after I've spent some time coming to grips with exactly why the old one is terrible and have a firm grip of exactly how the new one will be better.
Technical debt is not a thing that kills products. Shit-ass management kills products. Technical debt may or may not be a symptom of shit-ass management.
Agreed. I feel like technical debt is more of a locus of control issue among developers than a real business concern. The only thing we look at all day is code; therefore, if the project fails, it must be because of the code.
I'm a fan of continuous refactoring, making small improvements to code and environments constantly, rather than trying to do everything all at once. It might not be as satisfying, but it's less risky and a lot more realistic in most work environments.
The problem is when you need to change your "platform".
I worked on a 300k LOC Business BASIC application at one point.
The big question everyone was asking was: how do you move to something else? Everyone wanted something else; they started writing new services on top of the old system, and they had some ideas on where to go, but it just didn't seem like a gradual rewrite was possible.
And to be honest, a greenfield rewrite just wouldn't work for something this size with the resources they had. So it stayed in Business BASIC.
Except by one measure: Netscape died as a company. The huge rewrite contributed to killing it. If you don't ship a product (for like 4-6 years?) you're gonna die. Mozilla originally chose the name Phoenix (then Firebird to avoid trademark problems, then finally Firefox) because it was a phoenix rising from Netscape's ashes. Its major innovation: it was 'blazing fast' compared to IE 5.5/6. Tabbed browsing was also pretty cool.
You can learn a lot of lessons from Netscape, but this isn't one of them. Servo is a great example of how a rewrite should/can work. Mozilla hasn't devoted 100% of its resources to Servo, but instead is letting Servo grow all on its own, and at some unclearly defined point in the future, the two could merge (but might not!). It's a separate product, and nobody is pinning all their hopes and dreams on it.
I remember how long it took to release a stable version of Mozilla and Mozilla Phoenix. In the meantime, had to recompile newer releases all the time manually. There was no alternative browser on Linux or *NIX for that matter (OK, macOS still had MSIE).
The successor of Netscape Communicator was Mozilla (IIRC it was just called that, later renamed Mozilla SeaMonkey), and the successor of Netscape Navigator was Mozilla Phoenix (later renamed Mozilla Firebird and eventually Mozilla Firefox). Firefox and Thunderbird were once again separate clients.
Mozilla was still considered bloated, but Phoenix was far less bloated, which was nice on lower-RAM machines, and allowed the start of Web 2.0. It was also the return of doing one thing and doing it right: browsing the WWW. Netscape Communicator (unlike its predecessor, Netscape Navigator) came with a Usenet client and an e-mail client.
Later in development, addons became a thing, and you could add features which were previously part of Netscape Communicator, such as the calendar and the HTML editor. You can also add such features to Mozilla Thunderbird with addons.
Then Google Chrome happened, and people switched to that, but I'm not entirely sure why.
Also, Servo is just the engine, and modern web rendering engines are themselves highly modular. I think the Gecko engine powering Firefox has had its JavaScript interpreter replaced 2-3 times.
So when it comes to it, the most likely outcome will be a kind of "my grandfather's axe" scenario where over time parts of Servo replace Gecko within Firefox until Servo has completely replaced Gecko.
Sorry, but I emphatically disagree. Servo entailed creating a new programming language, building a community around that language, and using the project as a playground for feature validation. This might work for a non-commercial entity, but it is not a good example of a rewrite.
It's more about the integration side than the particulars of it. It's a huge project with different goals than Firefox, so it should be (and is being) treated as such.
It's a huge, audacious, hairy project, which might happen if a startup said "OK let's rewrite everything from scratch!"
Then again, Firefox was itself a strip-down of the rewritten Netscape suite. Stripped down in that the suite included not just a browser but also an email client, an IRC client and an HTML editor, and the UI was done using JS and XUL markup.
What the Firefox devs did was take the browser part, make it stand alone, and replace much of the XUL UI with native widgets (GTK on *nix).
Same thing, back in the first dotcom boom. I think the company would have died anyway, but they burned whatever runway they had by undertaking a complete rewrite of a working ASP/SQL web app (full stack Microsoft). The new version was to run on Linux and use a variety of custom code, sourceforge and/or freshmeat projects, and several different data storage tiers. An explosion of architectural complexity. As far as I could tell the main reason was that the CTO and his top architects were all Unix zealots and hated Microsoft.
I can't remember which podcast episode it was, but I do remember him mentioning a professor in South Korea had once told him he was using his blog posts for his CS classes.
I'm currently watching this happen to a product from the outside. The company I work for has an ERP system from the late 80s, written in COBOL for the HP 3000 series computers. At the time, it was probably an excellent system; however, over the years it's had modernizations tacked on with no regard to actually improving the core system. Some examples:
* In the early 2000s, they added support for Windows NT to the product. Unfortunately, they did this with an MPE compatibility layer, which means the entire thing still thinks it's running on an HP 3000, so controlling it programmatically means writing MPE job streams.
* It was originally written to store data in COBOL records. When they added support for SQL databases, they apparently just copy-pasted the schema verbatim from the COBOL copybook format. This means the database has no foreign keys, FLAGS columns all over the place (including tables where you have to JOIN ON SUBSTRING), and, most egregiously, a table with ITEMNO_001, ITEMNO_002, ITEMNO_003, PRICE_001, PRICE_002, PRICE_003 and so on, which has to be queried three times and UNIONed to get the data out (see the sketch after this list).
* Printing packing lists requires not only a specific model of printer, but also an extra several-hundred-dollar chip to be installed in that printer. I'm told that this chip's sole function is to enable barcode printing.
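Here's roughly what reading that repeated-column table back looks like (my reconstruction; the table name 'order_lines' and the 'order_id' key are hypothetical, only the ITEMNO_/PRICE_ columns come from the schema described above):

    # Un-pivot ITEMNO_001..003 / PRICE_001..003 into one row per item.
    suffixes = ['001', '002', '003']
    union_query = '\nUNION ALL\n'.join(
        'SELECT order_id, ITEMNO_{0} AS itemno, PRICE_{0} AS price '
        'FROM order_lines'.format(s)
        for s in suffixes
    )

A properly foreign-keyed child table would make all of this a single SELECT.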
I have no insight into what goes on inside the company that makes this thing, but it certainly looks to me like they have a severe case of technical debt. Any bug fixes generally take 4-6 weeks in the best case scenario, and frequently either don't fix the bug or introduce new ones instead. Their only customers are the ones that have been using the system for so long that they're stuck with the system, and can't switch--in fact, many of them are still running HP 3000 systems, which HP has been trying to end-of-life since at least 2006.
The end result of this is that the product is dying a slow, agonizing death of attrition. I think the only reason it still exists at all is because the company that makes it is stuck with support contracts that haven't expired yet.
I am working on a project now that bears some study.
I built this extranet app for a Fortune-class / NYSE company in 2001. They were a Lotus Domino shop so for that and various other reasons the extranet was deployed in Domino. The initial rollout was considered quite successful, but it was definitely "v1" code, and I'm being really generous with the code quality. Plus, Domino.
The application was considered a stopgap until the shop had become fully Microsoft-centric, at which point it was expected to be migrated to .NET. That was expected to be in ~5 years.
The result was that no investment was made in the app for over fifteen years. Every now and then an enhancement would be needed, and a contractor would be called up to bolt on a feature in a shockingly slipshod manner (this app is much too complex for the average Domino dev). But no technical debt was ever cleaned up, because "meh, we're going to replace that app by 2007."
2007 was 10 years ago. In the meantime two projects to replace the app were spun up and killed. The app is finally being retired this year. I was called up at the 11th hour to jump back in (15 years later) to help support the thing through the conversion, as the one existing Domino dev they had on staff finally (wisely) jumped ship.
I cannot even begin to describe the state of this app.... it's a case study in "how to not manage IT."
---
Another recent client was a content-creation shop (think glossy magazines). Their outgoing sr dev had deployed a CMS that nobody had (or has) ever heard of. This CMS was originally developed during the glory days of XML. Believe it or not, the app worked by loading all of the CMS content into a single in-memory XML document. This was probably OK for a brochure site, but this was a site with hundreds of thousands of pages of content. As a result the application required a server with 64GB of RAM just to launch. Also - launching the app took about ten minutes after the server OS was loaded. And there was no server farm, just the one server. If the app was ever stopped, it would stay down for at minimum 10 minutes.
I came in to fill in temporarily and to try to find someone to staff the position permanently. Even with a competitive salary, nobody qualified wanted the job.
Meanwhile, the same company also had a set of blogs that they managed in WordPress....
Sure, any amount of work could have been done to replatform the app.
They were already using WordPress for blogging. A custom WordPress implementation would have easily solved their CMS problems and devs are trivial to find.
The point was that the thing had just been rolled out the prior year. There was no budget or appetite for throwing the thing away. It did work. So there it stands, aside some dozen WordPress sites...
I owned the project and decided to kill it. I guess you could call it death by technical debt, but more accurately it was due to incompetence, both by the developers and by myself as their "manager".
I hired 3 mid-level PHP and JSP developers in Thailand and had them build the website + reporting page.
Total nightmare. Don't hire developers and assume that they will rise to the occasion (learn new tricks). I gave them as much time as they needed to research and make sound engineering decisions, and I ended up with a spaghetti-nightmare Frankenstein mix of server-side scripts mixed with client-side scripts mixed with server-side code that generates client-side script.
In Thailand at least, you always need a manager to force architecture and design decisions, and force devs to refactor poorly thought out solutions.
I was naive and thought that I could have a team of 3 figure out the web part while I write the desktop client and provide PM-level guidance.
I had a chance to ship the first Tower Defense game for iOS. The OS X game I was porting had some crippling performance problems that were incredibly hard to track down.
The problem was two-fold:
1. The relevant tools (Unity3D) were extremely immature, and the problem was quite diffuse: no profiler, poor quality of generated code, tiny caches, etc.
2. A problem in string-handling code that was quite diffuse throughout the game. As near as I can tell, it was blowing out the tiny CPU cache hundreds of times per frame.
On desktop, this code was a complete non-issue. On the puny little ARM on the iPhone? It was the difference between having dozens of towers and 50 enemies in play vs half a dozen towers and less than a dozen enemies in play. The impact on game dynamics, and need to re-balance everything by itself would add weeks to the shipping schedule.
There were plenty of other things that needed to be scaled WAY back of course: Switching from 3D to 2D to get vertex count and draw call count down. Completely rebuilding the entire UI. Revamping the pathfinding and suffix caching to not play havoc with the CPU cache. Moving from a 24x24 grid to a 12x12 grid. All of that combined helped a LOT, but not nearly enough.
The string manipulation was for a hierarchical property system that let me parameterize all sorts of attributes for enemies/spells/towers/projectiles in a set of text files. Ultimately, I had over-engineered on the assumption that I would be tweaking many more things -- with much greater frequency -- than I wound up actually tweaking.
Had I ripped most of it out and just had local properties on each prefab that I assigned manually, I might've hit that market opportunity. Finding that that was the cause was a multi-month project because of how interwoven it was with everything else. Hell, it would've been fine had I not over-generalized it into a shared component on each prefab that the other components queried to get property values. But I did. And it took me vastly too long to identify it as the major problem it was.
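A caricature of the trade-off (my sketch, in Python rather than the actual Unity/C# code):

    # Over-generalized: every read walks a string-keyed hierarchy,
    # churning through string splits and dict hops hundreds of
    # times per frame.
    class PropertySystem:
        def __init__(self, tree):
            self.tree = tree  # nested dicts parsed from text files

        def get(self, path):
            node = self.tree
            for key in path.split('.'):
                node = node[key]
            return node

    # damage = props.get('towers.cannon.projectile.damage')

    # What shipping needed: a plain field, assigned once by hand.
    class CannonTower:
        projectile_damage = 12

On a desktop cache the first version is invisible; on a tiny embedded cache it eats the frame budget.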
Opportunity missed, and that was the final nail in the coffin for my fledgling game studio.
Wow. I'm actually seriously considering starting something on a smaller scale (tower defense for mobile as well) in the next few days as my next long-term(ish) side-project. I've done native iOS and Android for a few years now, but nothing with Unity as of yet. I know C# well enough. Other than what you've mentioned, do you have any immediate tips before I fall face first?
Technical debt won't necessarily kill a project, but over time it will reduce the speed at which you iterate and ship software. That loss of speed does kill companies, young and old.
I've seen projects die not because the technical debt existed, but because of a lack of addressing it. The developers got sick of working around the debt to do trivial things, so they left. Customer support personnel left because of the constant manual "tweaking" they had to do due to the technical debt. Management focused on tacking on features instead of paying off any technical debt, which had rippling effects throughout the company.
I worked for a startup that had basically the right idea but proper execution took so long we ran out of runway. The first iteration of the product was built in an extremely haphazard, cowboy way - and took months, if not years, to refactor into something stable, usable and crash-proof. By the time the product was operational, the company was bankrupt. We simply hemorrhaged money until we bled to death.
As someone else pointed out - technological debt is not a cause per se; it's an indication of some deeper problem - usually of human, not technological, nature.
That may be a 'startup problem'. Do it cheap and cowboy because, runway. Assuming the money will come along later to do it all again. But that (lots of money later) happens only if you get bought out, not if you have to make it on your own.
So any business plan that includes the steps "A miracle occurs" and then "We get bought out" is probably going to suffer that fate?
Even if the miracle occurs and you get bought out and get shittons of money thrown at you, you've already built a company with a "fake it till you make it" culture. Even if you get to hire great new engineers to build your product the right way, your existing team is a ragtag bunch of amateurs who don't know how to build things properly and block any attempts to improve the status quo. I've been there. You can't build castles on the foundation of mud. You'd have to throw it all out and start again - and that's a recipe for disaster.
I wouldn't say killed, but severely burdened? Limited by technical debt? Sure.
One application was a web application built in C++ in the 90's. It didn't have the STL, it implemented everything from XML parsing to PDF rendering from scratch. It stored all data in XML files on the file system. It was a single-threaded CGI application. And it was the core product of the small business that created it.
There was no series B/C/D/E that was going to appear so we could hire more developers and rewrite everything or develop a new, superior product, etc. This is where I learned how to maintain and extend legacy software. I spent hours poring over Michael Feathers' book. We did manage to extend and breathe new life into the system. We wrapped the old code in Python, wrote a tonne of integration and unit tests on every change, and wrote some code to sync data to a database alongside the XML file storage scheme it used. We even got to a place where we started replacing code paths from the Python API with functionally equivalent (as far as our test suite was concerned) code written in nice, clean Python (and gained some features along the way thanks to Python's nice libraries!).
We kept the lights on without having to spend too much time hacking on undocumented, untested C++ code and without trying to just re-write everything. It was much more difficult to make progress than a typical greenfield project in a dynamic language but that would've cost more upfront without a clear payoff... so we did what we had to do.
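The pattern, roughly (a minimal sketch of the approach with hypothetical names; the real system was messier):

    import subprocess

    def legacy_render(input_xml):
        """Thin Python wrapper around the old C++ binary."""
        result = subprocess.run(
            ['./legacy/render'],            # hypothetical path
            input=input_xml.encode(),
            capture_output=True,
            check=True,
        )
        return result.stdout

    def test_render_matches_golden_output():
        # Characterization test: pin down current behaviour,
        # whatever it is, so replacements can be diffed against it.
        expected = open('fixtures/invoice.out', 'rb').read()  # hypothetical fixture
        with open('fixtures/invoice.xml') as f:
            assert legacy_render(f.read()) == expected

Once enough behaviour is pinned this way, you can swap clean Python in behind the wrapper one code path at a time, exactly as described above.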
Another company? Well, they decided to use a document-based data storage system as the source of truth in a hot new microservices architecture that was going to save everything... only there was no schema validation, and their use cases were killing performance in some scenarios. Random breakages caused by changes at a distance. It hasn't killed their business, but it has limited their options.
I've seen a whole company killed by technical debt. Because the software was written so badly far more developers had to be hired to firefight than the company could afford. The technical support team was similarly bloated to deal with the endless problems the customers had. Sales were low due to the bad reputation.
A rewrite was started, but never got anywhere. The company folded under the weight of its massive salary costs.
Our product, a large-scale piece of enterprise software, is slowly getting killed.
It's old and it's rather unusable (by the users).
Plus, for "backward compatibility", it supports dozens of strange configurations. It's dragged down by so much technical debt (functions longer than 3000 lines with 60 parameters!) that every small changes requires so much time.
We're slowly killing it (i.e. no big new developments, only maintenance for existing customers) and abandoning it. And luckily we're not rewriting it. :-)
ITA software is a good example of a company that succeeded due to the collective technical debt across their competitors.
Though they only really succeeded on the shopping part. They didn't ever get to a credible booking engine that anyone would buy. Which may point to something other than tech debt being the biggest barrier to modernizing an airline reservation system.
Former ITA engineer here. Our airfare search product QPX was untouchable at the time due to design: it got results that were far better than those of the competitors because ITA was modeling the problem better (search through a graph). While competitor tech debt didn't hurt us, I don't think it was the pivotal factor in ITA's success. As you point out, our hopes of replacing a major carrier's reservation never came to fruition, unfortunately. A res system is a complex beast.
I do agree that QPX was untouchable, but I still think tech debt in competitors was a major factor. There were plenty of smart people at your competitors... I'm sure graph search occurred to them. I suspect efforts to greenfield it were squashed... nobody wanted to throw out the hairball they had, because of the existing investment. Thus, they tried to "fix" what they already had... with obviously bad results.
Edit: And, worth mentioning that your competitors wouldn't have had to be better than, or even as good as QPX. "Good enough" would have squashed several big sales, since shopping was typically bundled in with what their customers already paid.
"This is indeed a bitter pill for ITA Software’s founders to swallow as they put years and millions of dollars into their dream to transform the nuts and bolts of the way airline reservations systems...are handled"
I think implementations are more often killed by technical debt rather than products or brands tbh.
Although at that point I wouldn't call it technical debt. If you've got a million lines of spaghetti code, then you've got a million lines of spaghetti code, not technical debt. I.e., a camel is a camel; it's not a horse with technical debt.
Kind of, but not really. When I complain about that product I almost always complain about marketing, intransigent leadership, etc. Ostensibly technical debt killed the product, but technical debt is almost always a symptom not a cause. Technical debt can get out of control, but you have to wonder how it got to be that way.
Back in '96 I was in a startup company in the Boston area. We had a B2B app that we customized per customer. It worked over dial-up. We didn't foresee affordable internet connections coming, and when they arrived we couldn't rewrite critical subsystems quickly enough.
Because our product was customized per customer - not just look-and-feel, we coded their business rules into it - we had problems scaling. Further, our design didn't lend itself to rapid development.
First our sales team left. Then developers started to leave. Within a couple of years after I left, the company folded. Many good experiences had in that company. Many lessons learned.
Two examples I can recall. First time was an embedded application for a smallish (2-3 SW developers total in the company) firm. The code was, by my estimation, initially developed by a real talented senior guy. It at one time did what it was meant to do pretty well, and adhered to sound (at the time) best practices.

Well, Mr. Senior Guy ended up leaving for whatever reason, and since then, a string of less senior folks were brought in one by one to burn out maintaining the system throughout the years. And by "maintain" I mean "toss in every feature under the sun that the CEO asked for, whether or not it made sense and whether or not the rest of the system kept working". Every little thing that might get one potential customer to say yes was bolted on hastily and shipped as soon as it could compile. Just get it to work by any means and ship it.

I was brought in as one of this long string of people. By the time I came, the CEO was frustrated that the system could not be added to quickly enough anymore, and that core functionality would fail in the field more and more often. It was a total mess, and eventually proved irredeemable. I'd like to say I left the system in better shape than I found it, but at the end of the day it didn't matter. Technically, this project is not dead, so it doesn't meet this Ask HN's criteria, but I don't think they ever have or will make another major release of it. It will limp along until they start taking technical debt seriously.
The second one was a mobile app that was originally ported from some legacy J2ME app and "gotten to work" on the iPhone platform. It was pretty much a straight port, data structure by data structure, from Java to Objective C, and didn't really use the platform properly at all. For example, each and every control was hand-crafted to mimic the original J2ME app, rather than using the built in UIs that iPhone provided. It got to the point where nobody could touch it without it falling over, and no senior person was willing to work on it anymore. I was senior enough in my career at that point that I could insist that a complete re-write was the only way to go. We did that successfully and the previous pile of technical debt was killed.
Those examples aside, I'd say that almost every place I have worked suffered from technical debt to a large degree. The common theme was a huge legacy code base that suffered for years (decades) from repeated "just cram it in and get it to work" abuse. The metaphor I always like to use is: No home builder on earth would, when the requirements were to build a 5-story apartment building, take a single-story single-family home and just add 4 floors. But seemingly every company building software attempts to do this.
It took a year to build an index from a new crawl, and they were only doing incremental "freshness" updates in between where they updated certain pages. It was a fiasco.
I have never personally experienced a project failing because of technical debt. I have experienced "the opposite" of that, where there was such a focus on doing things "the right way" that development felt slowed down by it. Our product iterations were slow, and from my perspective it cost us valuable chances to test different product ideas.
I'm using this definition of technical debt.
Technical debt is a concept in programming that reflects the extra development work that arises when code that is easy to implement in the short run is used instead of applying the best overall solution.[1]
The project I am referring to was a consumer web project. No mission critical type data.
In my experience (mostly consumer web, social media management for B2B) I have seen causes of failure to be heavily weighted towards product issues not engineering. The one big success I was a part of had the most technical debt. :) But that's probably because it lived the longest (and still lives today!). My experience is limited to being an employee at 4 different tech companies and several failed attempts of my own.
"Killed" as in "That was the reason it was ultimately replaced by a green field project, after ten years in the market" or "Killed" as in "It never shipped because it was so bogged down in technical debt we could not ship it"?
The former has happened to every project I know of that didn't die for some other reason (market disappearing, etc.). The latter I have not experienced.
It seems everybody here is suggesting that they have worked on products that were killed by technical debt.
I'm going to take a different stance and suggest it is very difficult for a product to be killed by technical debt.
I'm working on a product now which has huge technical debt requiring a full re-write, but we have customers who love the product, so we just keep the old version going while building the new version.
Let's assume we are talking about a single-product company here, to keep the discussion simple: it avoids getting bogged down in larger corporate issues and lets us focus on the case where the product is what pays the bills (assuming we're talking about commercial products).
Let's look at a few examples.
The product has some customers, but it is difficult to add new customers due to some technical debt issue, and the lifetime customer value in your target market does not cover the cost of re-developing the product and continuing operations. OK, you're probably dead.
You've got some customers, and you can't sell more, but the lifetime customer value is greater than the cost of re-engineering the system. You've got some bad times ahead, but there is a path. So technical debt doesn't kill you.
You've got technical debt and not many users; you want to add features but can't because of the debt. Did technical debt kill the product? Or was it killed by a lack of market? You don't know that the new features would have saved it; all you know is that what existed did not make enough to keep you going, so the failure can't really be blamed on the technical debt.
Technical debt can be costly, but rarely fatal. It's great not to have it, and it can make it difficult to keep good people (I lost 2 amazing devs partly because they hated the old code-base we inherited, though they also had amazing opportunities elsewhere).
> You don't know that the new features would have saved it,
Or could have killed it even faster.
Imagine that at the beginning you decided not to accumulate "technical debt". Instead you got some product that developers believed had no "technical debt", and because of that it was late to the market. Money was running out, and once customers started using it you had to be really lucky for new features to make enough impact to stay alive. Because customers don't necessarily care that much about features that are easy to add, but maybe want some features that are hard to add, and nobody anticipated that. Either way I cannot find a reason for "technical debt" to kill a product.
I put "technical debt" in quotes, because even the underlying idea of the concept doesn't make sense and relies on a belief of knowing how to do it "the right way", which I don't think can be a good way to write software. It's better to substitute it with more complex concepts of flexibility and simplicity and corresponding trade offs.
I think this point is very underappreciated, thanks for pointing it out.
Along with the 'feature' that could kill things faster is general bloat, when people keep building without knowing what the customer wants. You have to try something, but product testing should be done in such a modular way that most things can be removed if they turn out to be the wrong direction. Of course, you need a base data structure, but features should conform to that rather than constantly extend it.
One example I'm seeing with a start-up I currently know: they are building SDKs for multiple languages. Most of the code is auto-generated, but just the time spent on examples, documentation, and packaging is killing them, while they don't have customers for most of the languages they're publishing. This isn't just 'code' debt; there is overhead in documentation and in managing the code.
I worked for a dotcom back in the day. The underlying tech was appalling, to the extent they couldn't actually stay on the internet. We fixed this. They then proceeded to vastly expand the site, but every part of it was written as a special snowflake.
Five years later, they decided on a rebrand. This was literally a reskinning of the existing site. It took 50-odd people, ten months, and more overtime than you care to think about. Many of those people were contractors.
Not only was this a huge expense, it prevented us from making the site any better. It was an extremely competitive market and the other sites ate our lunch. The next year brought another round of redundancies, a financial statement that wrote the value of the business down to its cash reserves, and finally a sell-off.
Ironically, the new owners turned it back into a viable business, but they had a very different attitude to technical debt.
I am dealing with a set of legacy apps that will eventually be phased out. There are 'generations' of code, trying to solve architectural problems that are never taken to completion. Making large moves like that is a form of technical debt; not taking every step through to completion (like, documenting the code ...) is another form of debt.
The original architect, despite his outward emphasis on correctness, was really not a great developer or a good technical lead. He insisted on writing "perfect" code, ignoring the rest of the development cycle and the people he worked with. Regardless, he left last year, for reasons I am not really aware of. I knew he hated me, since I didn't share his obsession.
Fast forward to today ... the applications work. There are no more feature plans for them. But it is frustrating to go back into the code, because it is utter garbage. I am the only one in my company with any deep understanding of it. Those who tried to understand it struggled as well.
We are taking on technical debt in the new application we are working on, but we have been planning for it. I have the chance, with the devs I am working with, to get this "right".
Throughout my career, the sense I got was that people did not understand something critical: owning code is a full-time job even after it is written. I never worked with technical leads who asserted that. They always made passing nods to the idea that we should comment more code or write more tests, as if it were just some tired dogma to follow. But no one ever made the case to me for being clear about what you are accomplishing and how, for owning the explanation of what the code does and what trade-offs were made, and for having a plan to pay back the debt that was taken out. As tech lead now, I am continually communicating this, and the progress is starting to show. I am just glad my company is recognizing these needs too and understands the investment that comes with maintaining its major product.
I am right now, but I can't take a step back and fix any of it because that wouldn't be "Agile". The project won't be declared a failure though, they will just keep adding more and more "resources" aka people to get the same amount of work done.
I've long thought that "premature optimization is the root of all evil" is sort of a waste of breath... not because it's wrong but because over-engineering is a far greater problem today than premature optimization.
Over-engineering is a plague in modern software. Most of the failures due to technical debt that I've seen involved cases where "smart" developers built swiss army chainsaws to do things that required a hammer. This also often results in products that require orders of magnitude more resources than they should, which makes cloud vendors like Amazon and Digital Ocean a lot of money I guess.
Do major design fails count as technical debt? Among other things, I've run across an e-commerce site that required choosing a payment method before browsing products or putting things in the cart... (fortunately not my own job). (The reason for this odd choice was evidently that they wanted customers to have access to special offers/pricing/services/products depending on how they were going to pay. Cure worse than the disease type situation, IMO.)
I think I've worked for a company that did essentially mostly screw itself with technical debt, though.
They originally had overseas contractors write parts of their product without a proper developer to assess and vet the results or requirements, and it resulted in zero separation of concerns, business processes and display logic combined, and a terrible database structure. Eventually they wanted to move from their original layout and page design to a new one, and it proved almost impossible without multiple developers spending around half a year. And even that was wasted effort, because at the end of it they realized they actually wanted a mobile site and they still hadn't really created a good separation of concerns... Of course, the whole time, engineering and development wanted to refactor the business logic into code separate from the other layers, but management didn't want to expend resources on non-customer-facing dev work.
I think the product still exists, but they've killed their momentum.
If the company has a government contract, the technical debt becomes a hidden feature.
Government customers rarely have the ability to independently assess what something should cost if done according to industry best practices. So what you do is bid low, with an extremely short time horizon to release. You get the contract, and now have license to run up a lot of technical debt, because it needs to be done fast and cheap.
Now you get to maintenance phase. That's where the money is. The typical government customer is never willing to spend even one penny on paying off the principal on technical debt, but will make the installment payments forever. They will spend $100k on one new feature, but not even $0.01 on reducing the price of new features. Many still use SLOC as a management metric. So you run your codebase up to a million lines of code, when the software itself is just another glorified CRUD app. For bonus points, you give yourself 100% test coverage on a bunch of functions that have boolean "isUnitTest" parameters.
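If you've never seen the "isUnitTest" trick, here's a hypothetical sketch (every name in it is invented) of the kind of coverage it buys:

    // Hypothetical sketch (all names invented) of the "isUnitTest"
    // anti-pattern: the function short-circuits under test, so the
    // "test" exercises none of the real logic, yet the coverage tool
    // counts every line of the call as covered.
    interface Order { id: number; total: number; }

    function chargeCustomer(order: Order): boolean {
      return order.total > 0; // stand-in for real billing logic
    }

    function submitOrder(order: Order, isUnitTest: boolean): boolean {
      if (isUnitTest) return true; // skip the hard parts when testing
      return chargeCustomer(order);
    }

    // The "unit test" that buys the coverage number:
    console.assert(submitOrder({ id: 1, total: 0 }, true) === true);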
I imagine this is similar to how VC firms loot companies by manipulating their financial structure. I find it to be extremely unethical, but there is literally nothing I can do individually to put a stop to it.
Yes. Technical debt rendered it difficult to create features, and difficult to hire. The project dragged on as engineers came, tried to refactor and then left. I didn't stay.
My understanding is that it was never released, so all of the money the company put into the project was wasted.
This is maybe different from what people normally consider 'technical debt'. I don't mean just code aesthetics but also bugs, redundant code, and bad abstractions.
We have lots of tech debt (we are gradually paying it down), but we keep engineers for a loooong time (they basically don't leave unless they move) because of the social aspects of our team. A bunch of really nice, helpful people, who really want to make things better but are willing to balance "good" with "practical".
I wrote a complicated and horrendously ugly PHP scraping script with hundreds of regexes to generate a PDF file from a CMS. It worked for about 90% of all articles, and the rest needed only a few manual fixes in InDesign.
I always intended this as a stopgap measure until we implemented a clean document structure from which both the web page and the PDF could be generated. But that never happened. Then I left the company.
A few years later the company made a redesign of the website and totally scrapped the PDF feature.
However I still sometimes see these PDF pages out in the wild, mostly saved by the authors of the articles on their own web sites, because authors are allowed to link to their own content for free. It's a shame because I think these PDF versions of the articles were beautiful. But as you would say, technical debt killed the feature.
Another project, for an NPO, was written in Python on an ancient back-end. When the shared hosting provider did an upgrade, the back-end died; I was not able to repair it, and I was not being paid enough to invest much time. I bluntly told the board that their project was as dead as a dodo, and they accepted that.
Yes, this is actually quite common. What I've seen happen, twice personally, is a large company asking teams of developers to work on a codebase they can't really understand, and have no chance ever to, because the code has basically got to the point of no return, in terms of messiness.
What happens next IME is that progress mostly stops, and the company loses market share rapidly.
I did work on a project that was Rails 3, and for various reasons it can never be upgraded without basically rebuilding the entire system. They keep hoping for an exit that will simply never happen, because who would buy a rotting system like that? I think myself and only a few other devs there understood the situation. It's not like sales or the CEO fully understood.
This is true: the firm I work for at the moment just completed buying an incredibly expensive Fortran-based platform. That's despite the fact it cannot integrate with our existing products without serious work, nobody knows Fortran here, and the original developer sold it so he could retire.
So even though it makes no technical sense, it fills a gap in the product offering, and they'll have to find consultants to limp it along every time they need something small done that would otherwise be very cheap. It's all about the balancing act.
I have seen projects that were all but dead from technical debt, that management simply refused to believe were dead.
I once worked on a system that was so messed up that fixing any bug would create 2 more. At one point, the entire team was only working bug fixes for 6 months straight and in the end, we had more bugs than we started with.
We tried to refactor the code several times but it was just so fucked up.
This was the stuff of nightmares. Most of the module consisted of one class with 50,000+ lines of VB.NET code in a single file.
Global mutating state referenced in functions everywhere. Functions that were 500-1000 lines long (today I have ESLint limit functions to 10 lines).
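For what it's worth, the rule I mean is ESLint's max-lines-per-function; a minimal flat-config sketch (the rule name and options are ESLint's own, the limit of 10 is my personal preference, not a default):

    // eslint.config.js: a minimal sketch of capping function length.
    export default [
      {
        rules: {
          "max-lines-per-function": ["error", {
            max: 10,             // my preference; pick your own ceiling
            skipBlankLines: true,
            skipComments: true,
          }],
        },
      },
    ];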
After working on it for over a year, I recommended to my boss that they initiate a complete re-write and made it clear that in my estimation there was NO saving this code base.
I got moved to working on single page applications in JavaScript, but when I quit that job and moved on roughly a year and a half later, that code base was still in production, generating ever more bugs for that team.
When I was a consultant I saw it happen often, BUT, seldom does technical debt get called out explicitly as a reason for failure. More usually it goes like this:
feature requests with no mind to maintainability/cleanup -> tech debt -> slow dev. velocity -> failure to deliver features for the businesses -> failure to retain/grow/compete
I worked on an EDA product for a short time. Its code base had a few components, some of which were written by the product team, and some of which were copied (without source history) from other teams. The team decided to make a cloud version of the product too, and to do that, the team split in half and each new team had their own copy of the code base.
No sharing code, because those other teams were our competition. There were teams working on components for our product, and another team working on a copy of our product, but nobody shared code. So bugfixes from one of the components would never make it into our code base. (The components were products in their own right.)
The code base was ~30 years old, and it had "survived" a port from Unix to Windows. The build system was a hellish nightmare of Cygwin, makefiles, Perl scripts, hard-coded paths, and unsupported internal tools, and after ~2 hours of manual fiddling it spat out a build. A few million lines of code, which according to my tests, would take ~10 minutes to build with a proper build system. I have a proper rant about the horrors I saw in the build system, I had never imagined that a build system could be so bad, and whenever I mentioned the name of the internal tool it was built on, company veterans on other teams would recoil.
Meanwhile, I witnessed how much damage the other developers were doing to the product. Some were trying to catch up with feature requests, some were adding hacks to the core algorithm to try and improve numerical stability, and some were doing real damage to the code base by adding buggy and poorly designed features--think major performance regressions, memory errors, and deadlocks fixed by adding calls to Sleep().
I tried to improve things as much as I could there. The company culture had some upsides (good work/life balance) but the internal competition strangled innovation, the wrong people made technical decisions, and the team didn't have a balanced set of skills.
The product hasn't been "killed" per se. It's still for sale. But it's the walking dead. Being EDA, licenses can run into five figures per seat, but the product's revenue was on a downward trend last time I checked. I have a couple friends that still work on the team but I'm encouraging them to apply for new jobs. I left when I got an offer from a major (big five) tech company.
Having used several different EDA tools, I completely believe this. I don't know how they can be so poor across the board. It is part of the reason I switched from doing software & hardware to just hardware. Life is too short to deal with bugs in synthesis tools.
I've been on one lack-of-loose-coupling project. "All you're doing is importing from here and exporting to there, no problem, right?" and eighteen months later it's somehow expanded into touching every part of the business except the vending machine firmware (no, it was not a vending machine company). This leads to paralysis, where touching anything makes anything and everything randomly crash. This was fundamentally a business process failure, in that you can trivially specify a process too big for humans to implement, even tool-assisted humans. However, the technical-debt failure mode could have hit a little less suddenly and catastrophically with better technical design decisions (we should have jumped off the rails at 6 months and redirected, instead of patching around until it exploded at 18 months, etc.).
I've been on two lack-of-refactoring/standardization projects. If you use Rails, like back in the 1.0 era, you can't just stop updating and do something else; you have a tiger by the tail, and if you don't keep up you'll never, ever be able to catch up again. You can't go from the 1.0 era straight to today, where "today" is any time in the last five years. Scrap and complete rewrite. Of course management doesn't understand dependency trees, and going back to an OS and libraries from 2007 means rolling back 10 years of security patches or hand-compiling everything, so it would be a lot simpler to just rewrite.
I've been tangentially involved in a poor leadership situation where basically an entire department was forced out by a new leader, taking all their domain-specific knowledge with them. Then the consultant friends who were brought in bled the company dry, killing it in a race between consultant expenses and smelly code becoming unusable. On paper the company died from the financial load of switching completely to outsourcing ("We're not a software company, so we will not have developer employees anymore ... but we will have twice as many consultants working two thousand hours per year for five times the pay, temporarily").
I've never experienced the parallel development trap, or lack of test suite, those must be interesting.
Oh yeah, I've had a few projects killed by technical debt.
A lot of these are game-mod related. Since, hey, even the base game is a black box loaded with the technical debt of a team that could have been gone for decades. And the very tools and patches you're using on top of it were also built by people wanting things done the 'quick' and 'easy' way over the 'right' way, usually because they constructed the tool for their own project and designed it specifically for the environment they were working in, rather than anyone else's. So you'll often see such a project completely fall apart, because you don't understand all the code you're building on or how it interacts with everything else you pulled in.
But a few were actual work projects for clients. These fell apart because the following sequence of events occurred:
1. A coder was hired to work on the system and had a very different coding style to everyone else in the company. They thought they were being 'smart' but had overengineered the project by about a hundredfold.
2. They got sacked without telling anyone else how the project was constructed or why it was built that way. So developer B took over.
3. Developer B tried to 'rewrite' the system completely, but ended up merely creating a hodgepodge of his work and the other developer's work that ended up being rather unstable.
4. More features were requested by the client (features the system wasn't designed for), so three more developers each added them independently. None of this work was commented, documented anywhere, or stored in version control; they bolted the extras on, tested just that one part of the system, and claimed it worked fine.
5. Project ran into large numbers of bugs, often ones which crashed the system or took down the database for a while. Multiple times a day, whatever developer was free would have to apply patches to whatever random thing stopped working in the last few hours.
6. Everyone ended up complaining that the system didn't work. Or that it should be rewritten. Or that it wasn't what the client 'wanted' at all despite the latter having changed their plans three times this week.
Either way, what should have been simple websites turned into giant unwieldy messes that no one developer understood the full design of. Which sat in endless limbo while developers ran around trying to patch up problems caused by no one having a coherent plan for the whole project.
Worked on a product that was written in a legacy technology and built up over two decades. The company does not want to invest further in the product; it just wants to support existing clients, who have done a lot of customization on top of the existing codebase. New players have come into the market and eaten away the market share of the category the product competes in. It's a slow death for the product now. Management feels it is better to buy a new product built by another company than to build the same product in-house with no guarantee of returns.
Several start-ups folded on me; perhaps this was the most flagrant technical debt among them. Octek [then Octek-Foxboro, then shuttered] was state-of-the-art machine vision in 1980, but stuck with PDP-11 bus+boards in a box instead of investing in microprocessor single-board systems, and stuck with an oddball interpreted language, "Magic-L". By 1986 they got run over by competitors with much lower price points and higher inspection rates. I could still make the boxes do the job, but it was boutique stuff by then.
Not sure if it can be classified as technical debt, but I worked at a place that had a very clever and prodigiously productive architect. As the company grew, some perks available to developers got cut, to the point that he decided to move on.
From that point on, the architecture froze. It was still possible to keep building new features on top of it, but I am sure that architect would have been able to provide both guidance and the new solutions I feel are needed. No one else picked up his tasks or his vision, nor was the architect position ever filled.
Sort-of. I worked on a product that was a rewrite of an older product, and suffered second system syndrome.
The problem was that the original programmers didn't understand how to program with a database, and management was unwilling to address the core design flaws.
As a result, upper management told us we missed our market window and the project was killed. In reality, it was the technical debt from not understanding how to correctly write a data access layer that made us move too slowly to meet our market window.
I recently left a company whose software product may die due to a misdiagnosis of tech debt. They are in the midst of an unnecessary rewrite, with nothing to really show for it after almost a year.
The previous project had some issues, but they were fixable with refactoring while continuing feature development. The rewrite was done over my objections. New management had been hired, and they were looking to make their mark. I left before it could all come crashing down.
Definitely have, 10+ years ago. The non-technical partners (the majority) thought no refactoring was needed until it was almost too late. The company still exists but had to pivot quite a bit.
More recently, another one is being killed by tech debt, but the tech is so old that only a rewrite can really save it, and the income from it is just not high enough to warrant that.
(the former was written by me at first before the company really took off and the latter was an acquisition)
Killed - no. But I worked on several products that weren't able to move fast enough because of it, and lost money as a result.
One of those products was released half a year late and turned out to be a poor market fit. The company closed several months later. It could've used this half a year to complete a pivot with another product, which could have been successful.
If it was released late, that sounds more like a case of not having enough technical debt. If they'd kicked the can a bit further down the road, maybe they could have released sooner, realised it didn't fit the market, and cut their losses.
No, it was just such awful technical debt that it caught up with the project way too quickly. I was brought in as lead developer to finally get it done after the initial developers missed all the deadlines; their own code quality was their biggest obstacle.
Yes. As a solopreneur, I had a moderately successful side project. When the app fell over, it took me two weeks to get it back online due to compounded tech debt. As a daily service, I lost most of my users when the app didn't work for 14 days. The app never recovered, and ultimately I sold it. One more vote against doing everything yourself.
My first company was consumed by technical debt. We all bickered about refactoring the database schema and improving the code, but investors wanted more "rooftops" (customers) now.
The company IP and employees were absorbed by the highest bidder and the company lost its funding after the CEO and CTO left.
In my experience tech debt doesn't kill codebases so much as make estimating impossible and therefore create tension between execs and engineers. It of course also makes working in the code suck.
Typically these projects are able to limp along until some other forces kill the company.
I have many times taken over projects that were so deep in technical debt that only extreme surgery would save it from being killed. It can take a long time to turn those projects around and make them healthy again but it is possible.
The first step for me is always to add solid tests. On one project (for example) it took 8000+ test cases to make it solid, maintainable, and bug free in production.
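Those first tests are usually characterization ("golden master") tests: record what the inherited code does today, warts and all, and refactor against that. A minimal sketch of the idea, with invented names:

    // Characterization-test sketch (invented names): pin down current
    // behavior and assert that refactored versions still match. The
    // point is to capture what the code does, not what it "should" do.
    function legacyLabel(qty: number, name: string): string {
      // stand-in for the inherited logic, naive pluralization included
      return qty + "x " + name.toUpperCase() + (qty > 1 ? "s" : "");
    }

    // Outputs recorded from the current implementation, bug and all.
    const golden: Array<[number, string, string]> = [
      [1, "box", "1x BOX"],
      [2, "box", "2x BOXs"], // yes, "BOXs": pinned until we decide to fix it
    ];

    golden.forEach(([qty, name, expected]) => {
      console.assert(legacyLabel(qty, name) === expected,
        `behavior drifted for (${qty}, "${name}")`);
    });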
I've worked on a product that might have looked like that. In truth it wasn't "technical debt" as such; it was bad decisions by the chief technical architect, decisions he stood by the whole time.
Almost all of them. It becomes cheaper to buy someone else's product than to continue development, because the bought product has its costs spread over all of its buyers, while the NIH product's costs fall on you alone.
I guess you could say this is a form of technical debt, or technical stubbornness.
I used to work for a company that specialized in solar panel data. I was hired to help the company catch up on their client queue (clients waiting for the product to be set up)... when I first started they were behind by about 140 clients, and I eventually got the queue down to about 30 remaining.
So our company would sell solar panels to corporations, who would then put kiosks in the lobbies of their buildings so their clients could come take a look at the energy savings and other information about the building. It was a very popular way of doing things, especially in cities like New York, Chicago, and Los Angeles. My job was actually designing the graphics displayed on those kiosks and then syncing the data with the graphics.
There was a platform that did the work, but I still had to design the graphics and type in all the serial and model numbers and so on to sync everything together and make it show in chart form in a gorgeous view. Anyway, for years, it was all done in Flash. At one time our company's technology was the best, even having received several awards. But over the years HTML5, JSON, and other technologies became better and faster; they could read data from a database just as our system could, and their turnaround time for production was faster. It took about a week for us to develop, set up, and sync everything from system to database, while these other companies could get everything done in days. I think a huge part of the problem was that my company had these corporations interested but forgot to hire enough people to actually do the work.
We did have an in-house software developer working on improving this technology and upgrading the software, but the company seemed very reluctant to adopt the new software quickly. I had worked in the new software for a few days when, all of a sudden, they called me into an office and laid me off. A few months later everyone else followed, and today I think the company only holds on to 2 or 3 employees who just maintain the kiosks and software because of the existing contracts. Or maybe they went under; their website doesn't even work anymore.
Why did they go under? We were warned that the technology was changing fast and that we needed to do something about it... it was just the reluctance of the CEO to adopt the new technology and push it out faster. Our competition slaughtered us. Luckily, it was my second job, so I worked out a great deal with them on letting me go... I got about a month of vacation time on the condition that I wouldn't file for unemployment. Obviously, already having another job, it worked out in my favor.
I'm currently working on a project that hasn't died yet, but has gone through a lot of pain in the last couple years partly because of technical debt.
I ran the project for the first four years. It was my first large project and the first time I had run a team of any size. I made a few mistakes that might be worth learning from, but my mistakes weren't the only ones responsible for the debt.
The largest driver of the technical debt issue was the timelines. My boss was new to software development, and his expectations were not in line with reality. We frequently had to ship ad hoc features under tight deadlines to keep him happy. When I pushed back on the timelines, he became unhappy. So we acquired debt in the form of 1) hastily designed sections of code that don't lend themselves to scalability or refactoring, and 2) a lot of small code-smell issues that by themselves don't amount to much, but in the aggregate form a kind of surface scum that makes future development, and more importantly testing, difficult and brittle.
The debt, and the accompanying bugs and delays it caused, eventually led to enough dissatisfaction that a new manager was hired. We disagreed over strategy, so I was taken off the project (I am in a weird outside position now, sort of half on, half off: supporting a tangent of the software, but not working on the main branch or involved in the architecture discussions, planning, or execution of the new replacement).
He wanted to start over from scratch, which is essentially what they are doing now. The plan is to pattern the new solution off the old one by incorporating all the business rules (which they want me to document), but they have very different implementations in mind. They don't want to reuse existing libraries, partly out of a lack of familiarity with those libraries: how they work, and why they approached the problem the way they did.
They face some significant challenges:
1) Their coding velocity is slow. More than 60% of the original team has left and been replaced, so a large part of the team hasn't been on the project for more than 2-3 months. This means there is a significant dearth of institutional knowledge. I think a measured pace is good for software development, but they aren't starting from scratch, and there are a lot of expectations to meet.
2) Because of that dearth of institutional knowledge, when they do start implementation they are going to end up repeating many of the mistakes made by the old team. I have limited insight into what those are, but based on what exposure I do have, I can see planned missteps already.
3) My boss, the owner of the company, has learned some patience, but they've been at the rewrite for nearly 6 months with little to show for it. In terms of feature parity with the old software, they are severely lacking. I don't expect he is going to be willing to wait another full year to get an app that does essentially what he already has, only differently, even if the architecture is more flexible.
In all honesty, I hope they succeed. My boss is a good friend and this experience hasn't ruined that. I have and am dealing with some resentments, but I don't want him to fail. And because this project was my baby (so to speak) there is a part of me that wants it to live.
----------------------
In retrospect, I'm not entirely sure what I should have done differently. Had I ignored the requests for ad hoc features, I likely wouldn't have made it as far as I did, because without those features he would have pulled the plug. The company grew at an exponential rate in the first 2 to 3 years of the project, and part of what fueled that growth was the ad hoc, fast turnaround of my dev team. We were incurring debt, but we were also making big gains.
If I was going to do this all over again here is what I would do:
1) I would push back more, in smaller increments, placing more emphasis on getting the details right before shipping.
2) I would push back more on requirement creep. My boss would frequently have these "great" ideas that he would insist we work on, which would get half done and then discarded, leaving the code base littered with dead ends that needed cleaning up later. Part of the debt we acquired stemmed from the fact that, in order to keep those activities from impacting ongoing efforts, I built an architecture that was somewhat disjointed. The lots-of-little-islands approach meant we had apps that were reinventing the wheel, taking different approaches, etc.
3) Place a greater emphasis on automated testing (fewer human testers, more test engineers).
The other aspect to this is that when you are creating software to solve problems that don't currently have software solutions, you spend a fair amount of time going down trails that don't pan out. The R&D aspect left us with bits and pieces of code in the code base that were incomplete or incompatible, but that one app or another relied upon.
We lost time discovering that certain approaches didn't work.
I think that's it... I hope someone finds the story useful.
I did a Flex/AIR project a few years ago. At the time, it was a customer requirement. They had fired their previous provider, because their business model was to charge a subscription fee for the server-based product.
All their "media" was in flash format; so they made it a requirement, even though they could have saved themselves many tens of thousands of dollars in hardware costs by going to a mobile platform. (they did not have reliable access to networking in most of their deployments, and didn't want to bear the costs of rolling their own with cellular).
So I wrote the app in an "AIR isolated thick client" mode.
There were many problems with garbage collection and application freezes that could have been addressed had they funded a migration to post-Adobe Flex (Apache Flex). But they ran out of funds for that.
Their platform was Win 7 laptops. The first hardware iteration was fine. The next year, when they started replacing the laptops, they hit a driver bug that caused the whole screen to freeze. With the next hardware iteration they were able to fix that, but they started having problems with AIR trying to download an update when the machines were not connected to a network (i.e. deployed in the field). AIR behaved badly in this situation and refused to run.
When they moved to Windows 10, Adobe hadn't updated the AIR player yet, so they had another botched deployment.
It was in our contract to hand over the source code, so they tried to hire their own developer. I spent many hours very thoroughly documenting the code, but they had no hours budgeted for ongoing support. Judging by some of the desperate emails I was getting last year, before my Program Manager told them to fuck off, my documentation was not enough. Of course: the dev environment setup was Win 7, Eclipse 3.x with the Adobe plugin. I maintain a VM of the dev environment, but I doubt even I could follow my own directions to set it up anymore, since the Adobe SDK of that version is so difficult to locate.
Had they listened to my original recommendation to write the entire application in Java, they would still be running fine.
That was my most recent experience with "death by technical debt".
Many years ago, a product died simply because a competitor bought our company and tried to sell both products (theirs was a "consumer market" product, ours was "enterprise market"). Over the next 18 months they cut development on the enterprise product and tried to tart up the consumer product to meet the needs of our enterprise customers. I was a major account manager, so I watched one by one as our frustrated customers dumped everything our company sold and went to the other competitor. So that product wasn't so much killed by technical debt as it was killed by moron MBAs.
In the end, all of those products are obsolete, because nobody uses dedicated backup tape library software anymore. It became a very small market because tape backup hardware never got commoditized and prices just never came down from "insane". Even blank tapes were more expensive than a removable hard drive. Poor people just back up to the cloud, pay rent to someone else for their own data, and end up losing it.
I can probably think of about a dozen other examples from my long and miserable career in software.
Yes. I've been on projects that I believe would have succeeded if we'd approached them differently, with less debt. One was due to magpie syndrome - influential architects who enjoyed working with bleeding edge technology and convinced management not to listen to the more conservative recommendations of the developers who would be doing the majority of the work. The software was a wreck, and morale dropped (and the original architects left the team). I think the project would have worked with a better approach, but after a change in upper management, they decided to end the project completely.
Another was a rewrite of an application that had been very popular but was written by someone who was actively learning on his own (a talented developer who later learned to write very clean and well-organized code). It was, as I like to call it, "superglued" to the server. SQL statements that could have been handled with a join were instead handled by querying all the ids, running through them in a loop, building a new query, getting results, and stashing them in an array, one by one. When those queries took too long, they were run in the background, or they just crashed the system. Changes were all made in place, on the prod system. There was no build, no archive, no nothing (this was in the early 2000s; even back then this was a no-no, but it wasn't quite as shockingly unusual as it is now).

I was part of a team that tried to rewrite it, but halfway through the organization got frustrated with the time and expense and terminated the project.
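For anyone who hasn't seen it, that loop-over-ids pattern (the classic "N+1 queries" problem) versus a single join looks roughly like this. A sketch with an invented query helper, not the actual code:

    // `query` is an invented helper standing in for whatever DB client
    // the app used; only the shape of the two approaches matters here.
    declare function query(sql: string, params?: unknown[]): Promise<any[]>;

    // The "superglued" approach: one query for the ids, then one more
    // query per id, results stashed in an array one by one. That is
    // N+1 round trips to the database.
    async function ordersPerCustomerSlow(): Promise<any[]> {
      const results: any[] = [];
      const customers = await query("SELECT id FROM customers");
      for (const c of customers) {
        const orders = await query(
          "SELECT * FROM orders WHERE customer_id = ?", [c.id]);
        results.push({ customerId: c.id, orders });
      }
      return results;
    }

    // The same question answered with one join, in a single round trip.
    async function ordersPerCustomerFast(): Promise<any[]> {
      return query(
        "SELECT c.id AS customer_id, o.* " +
        "FROM customers c LEFT JOIN orders o ON o.customer_id = c.id");
    }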
Interestingly, I think there's almost always a psychological or political factor in play when a project is killed purely by "Technical debt". In theory, debt itself shouldn't really play much of a role in whether a project is worth pursuing, because it's all sunk cost. If a project has a positive enough ROI that it would be worth pursuing as a greenfield project, then even the worst case scenario, trash everything and start over, is a net win.
However, I think what happens is that some people never wanted to see the project happen in the first place, or people get very frustrated with the expense, or people start to suspect that the failure was inevitable and that the technical problems are just a distraction. Think Jurassic Park (which in many ways is a story of a software project failure, a theme that is much stronger in the book). At the end, during the "post mortem", some of the characters think the problem was in the approach: that with a bigger budget, less dependency on a few key people, and less corner-cutting driven by a lowball bid from a software developer (enforced through threats to reputation and lawsuits), it all would have worked. On the other hand, you have the chaotician, who insists from the start that a project like this will fail inevitably.
My guess is that if it was ever worth pursuing, it is always worth pursuing. The problem is, we can never really tell. Sometimes failure due to technical debt is taken as a sign that this failure was inevitable. Other times, it isn't.
It's a good question, because all software has "technical debt". Which means "technical debt" is not a real thing, but a concept invented out of a limited understanding of the problem. Instead, there is always a tension between a simple solution and a more flexible one, and there is always a way to make things simpler, or more flexible, or even more complex and less flexible. For many problems the key to keeping both simplicity and flexibility is to decouple everything as much as possible: share nothing, functions rather than objects with methods, scalar arguments rather than complex structures, libraries rather than frameworks, etc. And there is still a lot of room to do better, like DSLs and metaprogramming.
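To make that concrete, a small invented sketch contrasting the two ends of that spectrum: the same computation as a stateful object versus plain functions over scalar arguments.

    // Coupled: the logic lives inside a structure it must be handed.
    class Invoice {
      constructor(public items: { price: number; qty: number }[],
                  public taxRate: number) {}
      total(): number {
        const subtotal = this.items.reduce((s, i) => s + i.price * i.qty, 0);
        return subtotal * (1 + this.taxRate);
      }
    }

    // Decoupled: each piece takes only the scalars it needs, so it is
    // trivial to test, reuse, or replace without dragging the rest along.
    const lineTotal = (price: number, qty: number): number => price * qty;
    const withTax = (amount: number, taxRate: number): number =>
      amount * (1 + taxRate);

    // Usage: compose the small pieces.
    const subtotal = lineTotal(9.99, 3) + lineTotal(4.5, 1);
    console.log(withTax(subtotal, 0.08));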
I disagree with your definition of technical debt. Decoupling everything as much as possible may be good (or it may overcomplicate the problem), but technical debt occurs when you don't do things like that (whatever you define as the "proper way") for the sake of expedience or from lack of experience.
All software has some technical debt but you can have more or less depending on how much effort you or your organization takes in reducing it or avoiding it from the start.
> but technical debt occurs when you don't do things like that
When you don't do things like that, you may or may not simply weaken your flexibility and/or simplicity. That doesn't mean it's a bad thing or a debt, because you have no idea whether you'll need any of that later. Long-term software projects, for example, might benefit far more from DSLs than from solving those problems at a lower level with all the decoupling; is skipping the DSL a debt, then? Absolutely unclear. Because it's all about productivity and risks and all the non-measurable things.