Why do we fall into the rewrite trap? (justindfuller.com)
201 points by iamjfu on Jan 21, 2020 | 169 comments



A reminder that the Joel Spolsky essay being cited is about people rewriting ugly code that already works.

That nuance is important because successful companies do invest in rewrites when there's an architecture change, because it's the cleanest, most economical way to do it. My previous comment with Microsoft examples: https://news.ycombinator.com/item?id=19245653

Another example is Google rewriting their early web crawlers from Python/Java into C++ to make them scale more efficiently to thousands of machines. They also rewrote the frontend web server from C++ to Java.

But some rewrites also failed such as Evernote's rewrite to C#/WPF.

I think for the topic of "rewrites bad/good", it's better to list a bunch of famous real-world case studies and extract the common criteria that makes rewrites successful.


I was around for the Google Webserver rewrite and it was actually a rewrite done by refactoring not clean slate. It's amazing what you can do with disciplined refactoring, even moving to a new language.


If you have a decent test suite to lean on, you can work miracles this way!


And if you don't have a decent test suite, you might as well build one. So much of a rewrite is archaeology, figuring out what elements of a system are intentional versus accidental. A great way to do that archaeology is to start weaving a net of tests around the existing system. Sort of like Fowler's Strangler Fig pattern: https://martinfowler.com/bliki/StranglerFigApplication.html
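For example, a characterization test (sometimes called a golden-master test) records what the existing code actually does, quirks and all, so a rewrite can be checked against it. A minimal sketch in Python — `legacy_format_price` is a made-up stand-in for some undocumented legacy routine:

```python
# Characterization ("golden master") test: pin down what the legacy code
# actually does today, intentional or accidental, before any rewrite.

def legacy_format_price(cents):
    # Stand-in for the old code, quirks included.
    dollars = cents // 100
    remainder = cents % 100
    return "$%d.%02d" % (dollars, remainder)

def test_characterizes_current_behavior():
    # These expected values were produced by running the legacy code,
    # not by reading a spec -- note the odd negative-price output,
    # which is locked in deliberately so it can be revisited consciously.
    cases = {0: "$0.00", 199: "$1.99", -50: "$-1.50"}
    for cents, expected in cases.items():
        assert legacy_format_price(cents) == expected

test_characterizes_current_behavior()
```

The point is that the expectations come from the system's observed behavior; the rewrite can then decide, case by case, whether a pinned-down quirk is a feature or a bug.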


Yeah, if you don't have a detailed test suite, and can't build one, you probably don't fully understand your requirements, and your rewrite is in trouble already.


Mind you, maintenance is in trouble too. Just a different kind of trouble.

If you have the original code, you can always write a test suite based on it, unless management is averse to this. (Which begs the question of why they're not averse to a rewrite.) Like any test suite, it is not going to be complete.

Ultimately it's best not to get buried: rewrite early and often, in small parts. (The Carmack way.)


This sounds really interesting to read about, especially refactoring into an entirely new language. Are there any articles about what that process looked like for Google at the time? Or any example that changed languages really.


Not the OP, and no experience with Google, but I have personally worked in a way that sounds as if it could be similar:

First, you start with a slow but correct prototype in python. This can then be refactored into python closer to C, thinking about the data structures and functions that will be needed in C.

Then, you can refactor these bits into C. I've usually done this one module at a time.

You then write python bindings to this new C code. There are a number of shortcuts available here with differing tradeoffs, but you can just use the Python/C API.
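For illustration only, one of those shortcuts is ctypes, which lets Python call into an already-compiled C library without writing any glue in C. This sketch assumes a POSIX system, where `CDLL(None)` exposes the C runtime of the current process:

```python
# Sketch: binding existing C code to Python with ctypes, a lighter-weight
# alternative to hand-written Python/C API glue.
# Assumes POSIX, where CDLL(None) loads the process's own C library.
import ctypes

libc = ctypes.CDLL(None)

# Declare the C signature so ctypes converts arguments correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(s: str) -> int:
    """Thin Python wrapper around the C function -- the 'binding' layer."""
    return libc.strlen(s.encode("utf-8"))

print(c_strlen("hello"))  # 5
```

The Python/C API the commenter mentions goes the other way — you write the glue in C — but the shape of the result is the same: a Python-callable surface over C internals.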

In the end you're left with a C library, some bindings, and some python that just calls to the python bindings for the C library.

It would not be a huge leap to go from that to a full C implementation.

Nothing I've said here is C specific, so you can do the same with C++.

There are upsides and downsides to this whole approach. For me, one of the downsides is the amount of stuff to remember, especially some of the quirks of extending Python.

I find it easier to start with something correct but high-level in Python and work down to something easier to translate into C, even if I skip straight to the writing-in-C bit. I have found this usually leads to a significantly better design while it is still Python. You can also test the behaviour of the C and Python modules against each other.
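That last step can be sketched as a differential test: keep the high-level Python version as the reference, and check the C-style rewrite against it on generated inputs (both `mean_*` functions here are invented examples):

```python
import random

def mean_highlevel(values):
    # Original idiomatic Python: the reference implementation.
    return sum(values) / len(values)

def mean_clike(values):
    # Rewritten in a C-like style: explicit loop, index, accumulator.
    # This version translates almost line-for-line into C.
    total = 0.0
    i = 0
    n = len(values)
    while i < n:
        total += values[i]
        i += 1
    return total / n

# Differential test: the rewrite must agree with the reference on
# randomly generated inputs, before (and after) it is ported to C.
random.seed(42)
for _ in range(100):
    data = [random.uniform(-1e6, 1e6) for _ in range(random.randint(1, 50))]
    assert abs(mean_highlevel(data) - mean_clike(data)) < 1e-6

print("rewrite agrees with reference")
```

Once the C-like Python is ported to real C, the same harness can drive both through the bindings, so the reference implementation keeps earning its keep.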


I think the ironic thing about Spolsky's example is that, in the end, it worked. It eventually gave birth to Firefox. So I think there are many lessons to be learned about how to do a rewrite, how to message it, how to do it in a way that it doesn't stop other forward progress. But if a rewrite fails, that doesn't necessarily mean that not doing a rewrite would have been any more successful in the end, and that's the trap I think people fall into.


Joel's article was written in 2000, and Firefox 1.0 was eventually released in November 2004. In the meantime they lost almost all their market share. They eventually won some of it back, but this was probably only because Microsoft had stopped all development of IE and disbanded the IE team.

Usually you can't count on your competitors stopping all development for half a decade.

Obviously you can rewrite from scratch and end up with a fine product, since the original version of the product was developed from scratch. The question is whether you can accept losing all your customers and wasting years of effort.


There is still more to the story though. When Netscape went under and open sourced their code, they had to rip out all of the proprietary bits. The result? What they had didn't actually run!

It wasn't a question of rewriting working code. It was a question about whether the better path forward was to try to fix what they had and didn't understand, or whether to rewrite into something that they understood.

In other words, Joel set up a straw man argument.


They could just rewrite the components which they didn't own, then. They would have to do that anyway.

I'm not sure what you mean with "didn't understand". It was their own code. They could just read it.


To write "It was their own code. They could just read it." misses a great deal about code bases. First and foremost, the vast majority of working code bases do NOT have sufficient internal documentation to actually understand the whys and wherefores of that code base.

The specific reasoning for doing something, or for using a specific technique, or for the magic numbers, etc., is often not put into any form of accessible documentation. This is just a fact of life. Documenting such things can be very tedious, and it's often thought of as not being needed because the code is "obvious". Yet to someone coming in to do work on it in six months, let alone six years, it is vital missing information.

{Edits spelling mistakes - should have checked before submitting}


Code often sucks, but a rewrite from scratch doesn't change anything. The rewritten code will suck just as much, but now you just lost a lot of time and have new bugs.


> I'm not sure what you mean with "didn't understand". It was their own code. They could just read it.

I meant exactly what I said.

The missing proprietary bits were not their own code. There was nothing to read and no documentation. They could see the call, and that something important happened, but what it did and how it should work was guesswork.

Furthermore key people were gone. How the code that they had was supposed to work is harder to dig out than you might think. Read http://pages.cs.wisc.edu/~remzi/Naur.pdf for an explanation of why.

Just to add to the fun, just getting the browser running wasn't enough. Netscape and IE were incompatible all over the place, on everything from how styling worked to what order event handlers ran. (In IE, the thing you clicked on gets the event first. In Netscape, the window did. That's a pretty fundamental difference!) They needed to not just have a running browser, but have a browser with enough IE compatibility that it could be usable on webpages designed for IE. It is easier to design something to be compatible from scratch than to find out what deep assumptions need to be changed in existing code to fix incompatibilities.

As fun as it was for Joel to beat up on them for their choice, their choice was arguably the best way forward given the actual situation.


>The missing proprietary bits were not their own code. There was nothing to read and no documentation. They could see the call, and that something important happened, but what it did and how it should work was guesswork.

To your point, the following link is another famous example of smart programmers not being able to easily understand someone else's source code. Raymond Chen at Microsoft couldn't easily figure out why Pinball didn't work on 64-bit Windows and gave up:

https://devblogs.microsoft.com/oldnewthing/20121218-00/?p=58...


I'm sure they could have figured it out if it was important enough. They just decided it was not worth it and scrapped the product instead.

"we couldn’t afford to spend days studying the code trying..."

So the estimated time to fix the bug was on the scale of days. It would certainly have taken longer to rewrite the game from scratch if that was the alternative.


I’ve never understood why they had to port it at all. The 32-bit version of Pinball would have kept working just fine. And apparently, the binary from Windows XP still runs on Windows 10.


What kind of stuff were these third-party bits, then? I would have thought it was stuff like, say, a JPEG decoder library.


I only know about this second hand..

Back when Joel's article was written there was also a response to it written by someone who was part of the decision. And that was one of the big points that stuck with me.

There were dozens of proprietary components that had to be removed. Java being the biggest (because it had been integrated into every part of the browser). They succeeded in removing them, but after 2 years of trying to fix the result, everything from print preview to text selection was broken. That was when they decided to scrap and rewrite.

Now consider. You've had 100 engineers working for 2 years and you don't have a functioning product or a realistic timeline for getting one. How many more years are you planning to throw into it before deciding that you're going down a dead end?

Now let's throw a monkey wrench into this. The main reason the product was being funded was as a bargaining chip. Specifically, AOL had a deal with Microsoft to bundle a specially branded version of IE, and had every reason to believe that when the contract expired, Microsoft was not going to renew. AOL didn't care about browser market share; they wanted a viable alternative if Microsoft refused to negotiate. (As it turns out, Microsoft renewed the deal, but part of the price was that AOL had to stop supporting Mozilla...)


So they had to change the Java integration to some kind of plug-in interface. Perhaps not trivial, but it would not require you to rewrite everything from the ground up. And it was well understood what Java does. If it wasn't, you would still have to figure it out in order to do the rewrite.

> Now consider. You've had 100 engineers working for 2 years and you don't have a functioning product or a realistic timeline for getting one.

You are definitely in serious trouble when it gets to that point. But if you can't make a realistic estimate for fixing text selection, how can you make a realistic estimate for the much larger and more complex task of rewriting everything from scratch? What if they rewrite everything, but then after four years of work still can't get text selection to work?


What if the development team is hit by a bus? What if they're suddenly replaced with chimpanzees?

Literally collecting all the problems with the current code base and designing around them is the only thing anyone can reasonably do.

If you have a poor development team, you're just screwed - hire a better one. The problem there is figuring out whom you can keep and in which position. Since nobody knows how to grade programmers (and stupid algorithmic tests do not work for design and engineering, which is what's critical), you get to try until you run out of money, or perhaps rely on peer or expert opinion.

As a point of reference, rewriting a GUI with a layout engine (not a web browser but a music editor, which shares a lot of the design) took a small team half a year and solved all of the problems reported, and a bunch of others, including performance. At 4 months we had an almost matching result, with the exception of a few issues that took additional redesign. (We asked the client if they wanted to keep it as it was; they said to finish it.) And the testing was limited to manual checklists. A web browser is bigger, and its testing is somewhat easier to automate.

The truly important thing we did was dig into the requirements, including fuzzy ones, and design using good practices that make change easy. (Including a few times when the client just changed the requirements.)


> So they had to change the Java integration to some kind of plug-in interface.

No. They had to remove it entirely because back in 1998 there was no open sourced version of Java that was compatible with their license.

And it wasn't just that they had to rip out Java. The browser had been partly written in Java, and THAT was ripped out as well. And the parts that had been in Java needed to be rewritten.

> But if you can't make a realistic estimate for fixing text selection, how can you make a realistic estimate for the much larger and more complex task of rewriting everything from scratch?

Because the second task is inherently easier to estimate.

Fixing a broken large code base involves a lot of chasing down rabbit holes to figure out what isn't working, what depends on what, etc. When you start down a rabbit hole you have no idea how deep it goes, or what else you are impacting. This is why maintenance is harder than writing fresh code. Furthermore entropy says that code will not improve through this process. Costs only go up.

Estimating a rewrite is a question of coming up with a clean design for the rewrite, and then estimating how long that design will take. This is still a notoriously hard problem to solve. But it is much easier in principle than estimating the depth of rabbit holes. And if your design is clean, there is little chance of not being able to get a feature like text selection to work.


But you are assuming the result of the rewrite will be cleaner and have fewer rabbit holes. There is no reason to think that. If it is developed by the same organization, it will probably end up with the same level of quality. Perhaps worse, since it will suffer from second-system effect.

Indeed Netscape 6 was more buggy than any of the previous versions.


Doing emergency surgery on a code base and not knowing how many holes are left is likely to leave more rabbit holes than a rewrite.

Otherwise I agree with you. Rewrites are tricky. And the second system is particularly dangerous. But then it gets better - the third system tends to be a lot better than either of the first two.


I think this is the siren song of putting a clever architectural decision front-and-center in the code.

"Everything is an X."

What happens when it turns out not everything should be an X, or worse, X ends up being harmful? How do you refactor such a system when the biggest mistake is pervasive?

[Edit: I have done it 2-3 times, but I'm not entirely sure I could adequately explain how I did it, which means I should still be asking that question.]


Doing the rewrite definitely tanked the company. That's a fairly low bar that "not doing a rewrite" has to get over to be a better option.


Microsoft giving away IE for free probably had more to do with Netscape's downfall than the rewrite alone.


How many people actually paid for the Netscape browser in the first place? Was selling the browser ever a viable business model?

Netscape could have developed different ways of getting money, like ads on the start page or default search engine. But such schemes could only have worked if they had retained a large market share, which they jeopardized with the rewrite.


I was always a little irritated to see a boxed copy of Netscape in stores, but it did solve the 'how do I get my initial web browser?' problem for people who had no idea what "FTP" is. So I'd say that for the majority of people (the 90% who were not tech people), they bought it once. Once they got later in the adoption cycle that would have dropped off, and the moment IE shipped that would immediately trend to zero.

Netscape was free because Mosaic was free, and Mosaic was free because of the National Science Foundation. (As for getting your initial browser, IIRC you could send away for a copy of Mosaic on 2 floppies for the cost of shipping and media, but that could take weeks.)


I think ISPs probably did more for distributing browsers at the time.


The Netscape business model was largely about selling their FastTrack Server (for "eCommerce") and per-seat browser licensing to for-profits.

They had a reputation for giving away betas and pre-release copies on their FTP servers.


Being unable to respond won't have helped.


Here is a nice write-up on different examples that either worked or failed: https://medium.com/@herbcaudill/lessons-from-6-software-rewr...


A few years ago, I worked on a rewrite of a system that had been implemented in ColdFusion (yes, a few years ago - well into the 21st century). I can't imagine even Spolsky arguing that it shouldn't have been rewritten at that point.


Even with rewrites, there are good rewrites and bad rewrites. For my money, big-bang rewrites are bad. If you can strangle out the old system as functionality is gradually moved across into the new one, that can work, but still relies on actually finishing the transition so the old system goes away.

What was yours like?


My team has been working on a big-bang rewrite for about 5 years now. It started out as the "strangle" approach. Early on, everyone was on board with killing the old thing. But of course, you can't stop developing new features; you still need to make some forward progress in the midst of developing the new app. At the beginning we were 80% rewrite / 20% new feature work. But as time goes on, that balance has pretty much completely flipped; we're now more like 95% new feature work / 5% rewrite. Once we reached a certain critical mass of compatibility, the appetite for finishing the work and completely killing the old thing has pretty much disappeared. And there's a large loud minority that hate change and complain about having to use the new tool vs. the old tool.

FWIW, we were rewriting a Visual Basic 6 application using web technologies. It's VERY hard for me to imagine saying "don't rewrite that VB6 application, just make it better". It's literally not even supported by Microsoft anymore, it was written by one person at the company who has since left, no one else here writes VB6, etc. etc. It was a critical business application, so it wasn't the kind of thing anyone wanted to just leave alone and keep on life support.

Even still, it's hard for me not to wonder if we did the right thing. Maybe we should have just hired a bunch of VB6 contractors to maintain the application and continue adding functionality to it. Maybe moving it to VB.net would have been better. Maybe there's some other migration path for VB6 applications...

Anyway, my biggest takeaway from all of this has been: think very, very hard (and be honest with yourself) about how long a rewrite will take. Make sure "the business" and you are on the same page about how long it's going to take and are making a serious commitment to focus on migrating to the new tool, and, if you can, get down in writing the conditions necessary to kill the old tool. If your users are anything like ours, they're going to have to be dragged kicking and screaming to the new tool, because we all hate change, so if you don't have clear conditions about when you're shutting the old stuff off, it's going to be painful.


In saying "If your users are anything like ours, they're going to have to be dragged kicking and screaming to the new tool", I think you miss how the end-user sees things. Having worked with many such end-user teams over decades, what I found was that the end-user is seen as the "troublesome" bit of the organisation, to be corralled like "wild animals" (to use a euphemism).

If they are consulted and treated as if they have serious input into what is being developed (here I mean what they see and what they do), they become much more amenable to the new system.

What end-users really don't like is being told how to do their jobs by people who do not do their jobs, especially when they are now required (for management purposes) to do something that does not fit into their workflows. This is where development teams, by being very smart about it, are well placed to collect that information without user intervention.

However, this also means that the development teams have to show that they are NOT just cost centres but are "profit" centres for the business as a whole. This takes a lot of effort in many organisations because of the biases of other parts of those organisations toward the IT function.


I agree that it's not uncommon to hear things on the development team that basically amount to "if it weren't for our users our application would be so much simpler" :)

We have spent a significant chunk of time working directly with our users during this migration. This is an internal tool, so we can literally walk over and talk to our user base if we have questions or want suggestions on things. We have done surveys, we track using analytics, we have a user council that we meet with regularly to run ideas by and get feedback on our work and they come to our sprint reviews (well, 3 of them do...) to see what new work we've done etc. etc.

This doesn't change the fact that you can't make everyone happy. One user will tell us that they hate the way a certain page is designed and that they won't use it because it doesn't fit their workflow. Another user will tell us the exact opposite. So we have to take the feedback and make a judgement call, using the data we have available, to try and make the best decision. One great thing about having analytics is it helps you identify squeaky wheels. We had some people on our user council that would basically complain about every single thing we did. If you just listened to them, you would have been tempted to just give up. When we looked at the data, we realized 90% of our user base was getting A LOT of work done, and we weren't hearing anything from them. Obviously it was possible they also hated the app and just didn't want to tell us, but after reaching out it became clear that they were perfectly happy with the way it worked and were just getting shit done. In my experience, the temptation is just to listen to the people you're getting feedback from and assume that they're a representative sample. DON'T. Do the work (or put the systems in place) to help you vet the feedback you're getting.

Another thing that makes this very difficult is one of the things you're touching on; trying to develop a tool for a domain that you don't have intimate knowledge of makes it pretty difficult to know which way to go when you are getting conflicting information. None of the developers is an expert user in the domain (or even a beginner for that matter...) and so we often end up having to look at things from a higher level (i.e. what other applications have we used that function this way? How do they solve this problem? etc.). I think this mostly works, but there are bound to be some misses here or there.


I don't disagree with you that there are end-users who are, shall we say, "difficult". That is the nature of having to work with a wide variety of different types of people. That is one of those "unenviable" things that happen. As far as not being "problem domain experts", I have found that getting the co-operation of the "most-liked" and "most-expert" of your end-users in that area can be helpful as to getting most people on side to give you the best chance of gathering what you need. This is not necessarily an easy find as I have had to work with some who don't like to lose their "control" and that can be frustrating, to say the least.

My original comment wasn't meant to say that getting end-users on side was going to be easy, sometimes it is and sometimes it isn't. But the reputational boost you get when you do, eases the next time something has to be done.


Curious how much the interface changed, and whether you think that plays a role in the sentiment towards it. With VB6 I'd expect a desktop app or Office add-ins; going from that to the web maybe changes more than necessary for the users.


I think this plays a role in the difficulty of the migration, because yes, the interface has changed quite a bit. In general, we tried to hew as close to the old paradigms as made sense. We never changed anything just because we didn't like it; we tried to make sure that the transition would be easy and the users didn't have to drastically change their mental models. But of course, there are always going to be differences in look/feel/capability between native applications and the web. I will say the web has come a long way since I started as a web developer 15 years ago and it feels like closing this gap is more a priority these days.

But there were parts of the application that were poorly designed (our users told us so...) and so, even though they were the primary ones driving change, there's always pain when changing to a new way of doing things.


ColdFusion URL traversals were the first servers I cracked as part of the OSCP course. They seemed to be pretty horrible! A rewrite definitely sounds appropriate.

I think we should consider developer morale when it comes to legacy systems - it's hard to hire, and a lot of people want to leave if they're stuck with them.


>I think we should consider developer morale when it comes to legacy systems - it's hard to hire, and a lot of people want to leave if they're stuck with them.

Basically. Some systems/languages just don't have many people on the market with expertise in those things, and you may not be able to hire someone who does, and get them to relocate to where you are. If you try to stick someone else with that job, they're not going to want that on their resume, so they'll just look for a new job that does fit their career goals. For instance, personally, I have no desire to work on VB.NET code at all, so if my company suddenly decided to make me a full-time VB.NET developer (including training etc.), and wouldn't take no for an answer, I'd immediately look for a new job. Having VB.NET on my resume isn't much better than having a job gap.


One of my favorite pieces of trivia is that MySpace's rewrite to ASP.NET included mimicking all their old ColdFusion .cfm routes.


well, Joel did make FogCreek write a cross-compiler from VB to whatever (perl or php or something?) so...


Including the monetary outcome of the rewrite.

How much was the cost of the rewrite (total hours expended), and what was the benefit (energy savings, fewer hours spent bug fixing, fewer customer support calls, ...)?


Wasn't Evernote's (successful) rewrite _from_ C#?


All writing is rewriting, and all factoring is refactoring. It's more a question of how it's actually done, i.e. in-situ or in parallel. And of course there's no technique that's good or bad per se. How many times have we all heard "Never rebase!!"? That's just a crock.


Rewriting from scratch is usually, but not universally, a bad idea. There are times in your life when you encounter that piece of legacy internal software that is simply garbage. The math is simple: if it takes less time to rewrite than to refactor, then rewriting is better. But keep in mind that estimating software effort is often inaccurate.

I have also noticed that in Agile teams refactoring is generally not rewarded and sometimes is even discouraged. Instead it is a voluntary effort taken at the developer's expense and risk, which is perhaps the greatest crime of Agile.


> I have also noticed that in Agile teams refactoring is generally not rewarded and sometimes is even discouraged

As a long time Agilist, the spread of Fake Agile is a sad thing to see.

Constant refactoring is a core Agile practice! https://martinfowler.com/bliki/OpportunisticRefactoring.html


I always get nervous when "refactoring" shows up on the backlog and then gets deprioritized. In my mind it should never show up as a separate task, and then it can never get ignored. It should just be a normal part of the workflow for professional development that gets done without much fuss. Most long-term technical debt can be reduced a lot by proactively refactoring small things that for some reason don't work well.

When I worked closer to production the people at the workshops also cleaned and repaired their equipment every evening, crunch time or not. They didn’t just let their machines die long term because it may buy some time in the short term to skip maintenance.


This has been my experience. Carving out time for refactoring explicitly is only beneficial when your stakeholders, PM's, etc. understand its importance completely. Usually, at least one person up the chain says "it's already working and we have more important things to focus on. Put it on the backlog indefinitely."

Better to just incorporate refactoring into the estimate for any given coding task and do it without explicit permission. Unless of course you're in a culture that is willing to straight up miss a deadline in order to get refactoring finished.


“Better to just incorporate refactoring into the estimate for any given coding task and do it without explicit permission.”

It should be as implicit as attending planning meetings. For some reason, PMs usually have no problem forcing people to sit through meetings, but they have a problem with spending the same amount of time refactoring.


In my teams we take tickets for meetings, which makes product owners very aware of their impact on delivery.

I like including refactoring in tickets, but if the refactor is substantial, that's not going to work. Thankfully that's not so often; I usually manage to convince the PO of the value of it, then try not to budge once it's decided.


Evidently the answer is refactoring during meetings. /s


Sarcasm not needed. I've probably done this during a meeting. I definitely code if it's an all hands on deck meeting that gets hijacked by technical discussions that aren't relevant to my work.


I probably spend 1/3 of my average day refactoring.

Occasional there is a bigger refactoring project, but that's my normal day.


“Real agile” relies on a thorough suite of tests, a stable team that understands the domain and the codebase, and clear explanations of problems rather than a shopping list of solutions. Most teams don’t work in such an environment, so safe refactoring is impossible. Instead it’s just a whole series of shotgun surgery appointments.


Every "Agile" shop I've worked in was awful and has been doing it wrong according to Agile fans.

"That wasn't true Agile" is starting to sound about as credible as "That wasn't true Communism."

Best methodology: Self-organizing teams of experienced, expert engineers. Let them develop whatever process works best for the individual team and project.

That requires two things:

  1.  Hiring the right people.
  2.  Trusting them.
Most companies aren't willing to do this, so... we get a semi-unstructured form of micro-management hell which goes by the name "Agile." Regardless of whatever that term originally meant, this is what it means now.


> "That wasn't true Agile" is starting to sound about as credible as "That wasn't true Communism."

The difference is that many people have worked on highly functional "real agile" teams and can tell the tale.

> Best methodology: Self-organizing teams of experienced, expert engineers.

Minor quibble: While that is of course the best scenario, it can also work quite well with a few experts guiding the less experienced. Pair programming works wonders here.

If your team is all inexperienced newbs, I don't know what can save your project :)

> Regardless of whatever that term originally meant, this is what it means now.

Perhaps. I have no interest in debating word definitions though.


> The difference is that many people have worked on highly functional "real agile" teams and can tell the tale.

Sure, but is it really the "agility" or the people making the difference?

The highly functional teams I've worked on either ignored or satirized the elements of Agile development (e.g. the "story" format, story points, any sincere effort at estimation poker) to get out of those discussions as quickly as possible. This probably bears some similarity to how the original "Agile" developers dealt with the preceding methodologies.


Both these things are directly written in the Agile principles.

“Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.”

“At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.”

So based on your description those teams were Agile in the sense meant by the manifesto.


You know, that is a fair point!

The great Agile teams I was on had really good people on them. So, sure, they might have done great work in other environments.

Then again, this was the methodology they chose.

It also sounds like the teams you describe picked their own way of developing, which, in a Big Picture sense, is all the Agile Manifesto says: Let the team control their own work, and get out of the way.

Which, in a lawyerly and annoying, but still kinda true way, makes that team agile too :)


“Self-organizing teams of experienced, expert engineers. Let them develop whatever process works best for the individual team and project.”

That’s what “Agile” is when you read the Agile Manifesto.

It seems every reasonable methodology quickly gets perverted by leadership to a top down, dogmatic dictatorship caricature of itself. Happened to OOP, Agile, will probably happen to FP. You could argue that what Lenin and friends did in Russia was also not what Marx had in mind.


This is what agile proponents always say. Whatever someone is suggesting, that's agile. It appears to mean nothing. Here, you're saying that the agile manifesto literally says nothing beyond "let people make up whatever process they want", which boils down to nothing at all. If that's really the case then agile as a concept may as well not exist.

But we have story points, epics, scrums, retros etc. Agile is very much more than that.


I think people just confuse Agile the manifesto and various methodologies that claim to adhere to it. You can work in a way that meets the manifesto without any of those concepts that are tied to it through things like Scrum.

Most “no true agile” posts are really “no true scrum” because the issue is firmly in process and methodology rather than underlying principles.

In fact I’d say Scrum itself whilst designed to meet the principles doesn’t actually do so in practice.


You have to look at how Agile was started. Before it often projects were planned out in detail beforehand and then they tried to execute that plan without changes. As it turned out that often didn’t work. So the Agile Manifesto writers tried to explain that software is unpredictable and you have to make sure people can adapt to changes.

Scrum as a base methodology isn’t too bad either but as always management prefers precise plans and a lot of Agile consultants are happy to feed that. Saying “let people make up whatever process they want” is actually quite profound because it often goes complete against what management wants.


Even granting that Marx wasn't particularly specific, the defining invention of Leninism, the Party as "intermediary" between the proletariat and History with a capital H, is conceived of absolutely nowhere in Marx's work. Yet the 'perversion' of Marxism into Leninism into Maoism is no different from the 'perversion' of Agile into Scrum into SAFe. There's no 'perversion' at all, because the idea of "lineage", while it works for sequences of organic structures, is completely unfit for understanding ideas and how and why they change. It's difficult to even use the word "origin" when it comes to ideas, particularly when each phase distinguishes itself from the last by what problems it's trying to solve, the constraints on solutions, etc. We can only really talk about them being related insofar as there are structural identities in the concepts, but that's about it.

If this is clear and understood, congratulations, you basically understand the gist of the French historian Michel Foucault's criticism of modernity (it supposes origins and progress when no such thing actually exists, so how do we talk about politics, history, ethics?), a criticism which would go on to inform all of contemporary Left politics, including contemporary progressivism and identity politics.


Then, to clarify, I almost never see story points assigned for refactoring existing code. Completion of story points is generally the metric of success in the absence of more valid metrics. That measure of success applies to the scrum master that moves the project forward and the developer assigned to perform the work.

> Although there are places for some scheduled refactoring efforts, I prefer to encourage refactoring as an opportunistic activity, done whenever and wherever code needs to cleaned up - by whoever.

That is how I personally view refactoring as well. It's like a necessary sanitation ritual to keep things tidy, like doing the laundry or the dishes, but far more rewarding knowing that the change is more than mere maintenance: an improvement that simplifies things. Refactoring shouldn't require a scheduled event, but it's the quantity of scheduled events that is visible and reinforced by the system.


You're describing another sad disease. Or two, really.

1. Story points are only intended as a prognosis tool. If we produce 14p/week on average, we'll probably work through this 80p block in 5-7 weeks.

That's it.
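A minimal sketch of points as a prognosis tool and nothing more (the numbers are hypothetical):

```python
# Story points used purely for forecasting: given recent weekly velocities,
# estimate a best/expected/worst range for finishing a backlog.
def forecast_weeks(backlog_points, weekly_velocities):
    """Estimate weeks to finish a backlog from historical weekly velocity."""
    avg = sum(weekly_velocities) / len(weekly_velocities)
    best = backlog_points / max(weekly_velocities)   # fastest observed week
    worst = backlog_points / min(weekly_velocities)  # slowest observed week
    return best, backlog_points / avg, worst

best, expected, worst = forecast_weeks(80, [12, 14, 16])
# Averaging ~14 points/week, an 80-point block lands in roughly 5-7 weeks.
```

The moment those same numbers are used to grade a team or an individual, Goodhart's law kicks in and the forecast stops being honest.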

When you start using it to evaluate teams or individuals, Goodhart's law sets in, and it all turns to garbage.

2. The reason you don't count refactoring or bug fixes as points is that they don't add user value. If you spend all your time fixing bugs you created earlier, you are not being productive to the customer. It's easy to get caught up in polishing your toys instead of delivering things the customer is paying for.

Though what's often unstated or forgotten is that refactoring, and leaving the code in at least as good a shape as you found it, is an integral part of what calling a story "done" should mean.

https://en.wikipedia.org/wiki/Goodhart%27s_law


> The math is simple if it takes less time to rewrite than to refactor then rewriting is better

This is the common pitfall with rewriting. It does not account for the existing software being tested in production for years. There often are subtle behaviors/bugs, that users by now rely on.
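One way to guard against losing those subtle behaviors is a characterization ("golden master") test: record what the legacy code actually does, quirks included, and hold the rewrite to the same table. A minimal sketch, with a hypothetical legacy function:

```python
# Hypothetical legacy function with a quirk users now depend on:
# it strips whitespace AND silently truncates to 10 characters.
def legacy_normalize(name):
    return name.strip()[:10]

# Characterization test: pin down observed behavior before any rewrite.
# We assert what the code DOES, not what we wish it did.
cases = {
    "  alice  ": "alice",
    "extraordinarily": "extraordin",   # truncation quirk, preserved on purpose
}

def check(impl):
    return all(impl(raw) == expected for raw, expected in cases.items())

assert check(legacy_normalize)  # any replacement must pass the same table
```

The table becomes the contract: a rewrite that fails it has changed behavior someone may be relying on.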


This is why it's worth the effort to make behavior clean, coherent, and logical.

"Huh the behavior of this system is a little eccentric, but if I insert my code in just the right place, I can get it to do what I need!" This is how legacy systems get built.

Instead you have to ask, "How can I describe what this system does in one or two sentences?"

"How can I describe how it does it in a short paragraph?"

The first explanation should make it clear why you're building the new functionality into or on top of the system. The second explanation should make it clear how the system supports the new functionality and why it works. These two explanations form a team's shared understanding of the system. If you build a new system that conforms to this understanding, porting is straightforward, even if you don't replicate all the details.

Invariably, though, there are developers who are too lazy to ask how a system works, and instead crawl through the code sticking their new feature in likely-seeming places and leaving it in the first place where it seems to work, regardless of whether the reason it works is related to the shared understanding of the system. Because everybody else must keep this new feature working, there can be no understanding of the system separate from the system itself: there is just a mass of equally significant details. The intentional developers are reduced to the level of the random developers. There is no such thing as a rewrite of a system like this. All you can do is replicate it nut by nut, screw by screw, and then it still doesn't work because you didn't match the spacing of the screw threads.

The corollary to this is that if you build a complicated enough system, with an abundance of distinct places to plug in code, the random developers will always find a place to stick their code and will always be happy and "productive." This is the way to achieve architectural immortality inside a company: create something so complex that it can't be understood but solves every problem if you look hard enough.


In some cases, it's possible to rewrite incrementally via refactoring. (Whether you can do this depends on the architecture of the system and whether the starting and target languages have decent interoperability.) In other words, rewrite one small piece of a system at a time while maintaining the entire unit and integration test suite. While this doesn't completely obviate the risks you point out, being able to rely on existing tests can make the effort far more plausible and less risky.


Joel's examples of hairy functions: "Why does this need to be 50 lines and make half a dozen case exceptions? Just strip out that noise, god who wrote this?"

Now legacy systems don't cooperate, well I guess they can just rewrite their APIs too because "software entropy".


So, unmaintainable undocumented untestable legacy code? Yes please, rewrite it, break it, and when its real intended behavior is understood, write tests and document it so the next guy doesn't have to wonder about that crazy function.


Please stay off my systems. If you deploy broken software that runs your business, no one has to wonder about any of the functions any more.


If you keep doing that, there won’t be a next guy. The company will be bankrupt.


That's why commercial software is doomed to be crap.


Why are you “rewriting” instead of refactoring? Even if the existing code is “garbage”, if you need to add new functionality, you can usually use the “strangler pattern”. Put a nice facade over the old code to provide an interface and write on top of it.

If I can do it by putting a COM wrapper on top of 20 year old PowerBuilder code and then interact with it with C#, I can’t imagine a case where this can’t be done.
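A minimal sketch of that facade approach (all class and method names hypothetical): callers talk only to the facade, and endpoints are strangled one at a time:

```python
# Strangler pattern sketch: a facade fronts the legacy code, and each
# endpoint migrates to the new implementation independently.
class LegacyBilling:
    def invoice(self, amount):
        return f"LEGACY invoice: {amount}"

class NewBilling:
    def invoice(self, amount):
        return f"NEW invoice: {amount:.2f}"

class BillingFacade:
    def __init__(self):
        self._legacy = LegacyBilling()
        self._new = NewBilling()
        self._migrated = set()          # endpoints already strangled

    def migrate(self, endpoint):
        self._migrated.add(endpoint)

    def invoice(self, amount):
        impl = self._new if "invoice" in self._migrated else self._legacy
        return impl.invoice(amount)

facade = BillingFacade()
before = facade.invoice(42)   # still served by the legacy code
facade.migrate("invoice")
after = facade.invoice(42)    # now served by the new code; callers unchanged
```

The point is that clients never see the cutover: the facade's interface is stable while the implementation behind each route flips.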


I was once partially responsible for maintaining an old .NET remoting system that did absolutely nothing to isolate its transactional boundaries.

The services were baked into the model which made it easy to write in the first place but very hard to refactor. The code would load a "hollow" entity and then as the entity was interacted with: automagically over the wire the data would be lazy loaded. If you applied an SQL trace you might see over 300 separate SQL commands executed just to load a page that visually represented the entity.

This resulted in some dire performance outcomes, a simple for loop over a given collection on the entity could result in N round trips and sometimes even N*N depending on the data modelling in the back end for given types of data within a collection or association.
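A toy model of that N+1 behavior (all names hypothetical): the lazy-loaded loop issues one query per entity, while a client that states its data demand upfront issues a single batched query.

```python
# Toy model of the N+1 problem described above.
queries = []

DB = {1: "ann", 2: "bob", 3: "cy"}

def fetch_one(entity_id):
    queries.append(f"SELECT ... WHERE id={entity_id}")
    return DB[entity_id]

def fetch_many(ids):
    queries.append(f"SELECT ... WHERE id IN {sorted(ids)}")
    return [DB[i] for i in ids]

# Lazy style: each access in the loop goes back over the wire -> N queries.
lazy = [fetch_one(i) for i in DB]
n_lazy = len(queries)

# Eager style: the client declares its data demand upfront -> 1 query.
queries.clear()
eager = fetch_many(list(DB))
n_eager = len(queries)
```

With nested collections the lazy version degrades to N*N round trips, which is exactly the "300 SQL commands per page" pathology above.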

A primary issue of this system is that lots of client code had been written that didn't express its intent. There were about eight relatively complex clients of this software and nowhere in that code base did the client ask for the data it wanted upfront in a way that would allow someone to refactor it without rewriting every single one of those eight clients.

Now yes, you could refactor this "architecture-less" sub-system by replacing it with another "architecture-less" sub-system that was "better", but sadly you're left with the same problem: the clients will never tell you what data they need for a use-case they will just assume it will appear when its accessed. To add in those "data demands" you're performing huge shotgun changes across the entire code-base which are always too expensive to justify.

One particularly egregious example was a WinForm loading an entity and then stuffing it into the .Tag of a control where a handler would cast it back out later and access it again in a lazy-loading kind of way.

Your only hope in that case is some sort of static code analysis to attempt to reverse engineer the client's data demands but that's probably approaching as expensive as any of the changes.


Microservice architectures, if you have the organizational scale to support them, are intended to produce services that can be rewritten and replaced.

I've actually done this on purpose, by getting a v0 API into production quickly (1 week), then getting a v1 with all the bells and whistles out with sufficient time. It completely replaces the v0 microservice, which is a separate deployment.

The first version provided a specific implementation of the functionality for a single, small user population. The full version scales to millions of users, globally. It is still small enough that we could rewrite it, if we had to. The infrastructure is actually harder to produce (as code) than the API functions themselves.

CD pipelines, extensive unit tests, automated smoke tests, automatically configured health monitoring... all of these things make it possible to replace a service with a different implementation that conforms.

It's a hard discipline to get to, but once you're there, it is often more economical to rewrite to move on. I guess that's not the case for the usual kinds of projects. We had to spend a year as an organization to get to the point where this was true. It's usually not worth the cost for small teams to do this.


> CD pipelines, extensive unit tests, automated smoke tests, automatically configured health monitoring... all of these things make it possible to replace a service with a different implementation that conforms.

That’s all true in theory, but how many organizations have badly written code that needs to be refactored and good dev ops practices and/or unit test coverage?

And that still goes back to my point about using the strangler pattern. You put a facade on top of the “legacy code”, write integration tests that capture existing functionality and you slowly start refactoring the code or replace entire subsystems as necessary.

As far as v0 versus v1. I did just the opposite. My microservice was written in Node/Express. It was our company’s first Node app, we were a Windows/.Net/IIS shop.

I could easily get buy in for Node because everyone was expected to know JS. But I would never have gotten buy in to set up a Linux server and I didn’t want to run Node on Windows.

I deployed the API on AWS Lambda even though I knew it wouldn’t be good enough for phase 2 - 6MB request/response and the “warmup time”. I also didn’t know anything about Docker or Fargate (AWS Serverless Docker).

We got the first version in, I spent some time getting up to speed on Docker, Fargate, and a deployment strategy and rearchitected the entire back end hosting and deployment to use Docker and Fargate.


I always wonder how micro services will do once a refactor doesn’t impact just one service but several services and teams. This seems to be a recipe for a lot of political machinations.


Part of the expense of microservices is making them individually replaceable. Every microservice has dependencies on the interface of other services, and only the interface, which is typically RESTful. Refactoring in one code-base (read: one microservice) would never affect another code-base.

The unit of reuse in this architecture tends to be the entire API, or libraries published as open source. If there is generic code that is suitable for a library, it is also suitable for contribution. If there is a core business function that should be reused, it is a microservice with an API and gets shared and consumed that way.

Our hard rule for refactoring or enhancing services is if it is a breaking change: use another name (Rich Hickey's recommendation). If you provide more or accept less, fine. If you provide less, or require more, use a new route (like a /v2...).

Therefore, refactoring in your code base, which makes your life easier, is transparent to everyone else who depends on your code. We have many small teams, so each team is providing 5 to 10 microservices, and refactoring within each independent repo is simple and low impact.

There's no magic here, we simply paid the upfront cost of adopting an architecture that can be fielded by a very large dev organization, with a concerted effort in high quality DevOps practices. This move was also a means to scale the organization without dragging our efforts to a screeching halt with maintenance burden.


If you have to change the API signature and not just add more information, you have to version it and have a decent deprecation schedule.


It's not as easy to just refactor when it comes to UI, because oftentimes you are writing a completely new UX.


True,

It was probably bad wording on his part when he said “rewriting from C# to Angular”. If he meant, “we created an API in C# and decoupled the MVC rendering from the business logic and rewrote the front end in Angular”, that’s different. Since he didn’t mention rewriting the back end, I can’t assume they rewrote the backend.


I agree. I recently worked on a project where one Java class had 14 methods named construct<subtype>fields which all built an HTML+JS+CSS mess using a string builder. I was asked to add support for 3 more subtypes. And this was the god class for the UI. We rewrote the whole UI in 3 months. There was no way that could be refactored.


A rewrite that small is a refactor.


That just sounds like garbage code to begin with.


One of the oft-unspoken things about refactoring is that it's a way to do a rewrite without anyone noticing. Essentially it's the Ship of Theseus gambit: https://en.wikipedia.org/wiki/Ship_of_Theseus

Later on I realized a sort of reverse Catch-22 in this: The people who are most capable to do a rewrite are also the least likely to ask for one. They will just do it.

An official rewrite often just becomes a pressure release valve. During the early stages nobody can measure how the team is doing, so the team gets a break from scrutiny. Whether that's healthy or not comes down to a lot of factors, but the odds are in favor of it being unhealthy.


I rewrote two applications, once on my own, once with some help from the team. The first time we switched the backend to a completely new framework (I'm in the middle of one of those right now...) and the second time we upgraded from an outdated version of the framework to the latest one in one go (so, essentially a rewrite from the ground up).

Both times sucked, both times we underestimated the amount of work to be done, and both times we were much happier with the new product than what we had before.


If the team refuses to work on the old code, you have to do something. A rewrite may be the least bad thing you can do in that case, but if you don't ask how we got here and why we won't end up right back here again...

The Relentless Improvement model that many refactoring advocates espouse is one (great) answer to that question. But it also means they won't wait for a rewrite. They'll fix the bullshit they're staring at today, and stare at some other bullshit tomorrow.


Lately I've been testing a "third way" of thinking about refactor/rewrite decisions, which is to first document the existing requirements.

The trick is in making those docs so small and lean that they fit neatly in comments inlined with key functions, and this I have accomplished by reduction to a list of "must" and "cannot" (abbreviated to + and -) bullet point items. It must function like this, it cannot function like that. Some requirements can be turned into automated tests or static checks, others need an eye to verify. But the cycle I'm seeing often works out to be: write speculative requirements, write speculative code, then reverse-engineer true requirements and maintain code against that.

Doing this creates a base quality bar that the code has to hit in both the rewrite and refactor cases. The trap that rewrites tend to fall into is that the requirements are lost during the rewrite, so the new code becomes just as speculative as the original greenfield code. If you have the requirements as your reference, on the other hand, the area of bad code you can write is limited to the boundaries of those requirements and any missing requirements. This makes new code much less of a gamble.
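A sketch of what such an inlined must/cannot list might look like, with the checkable items turned into a guard (the function and its requirements are hypothetical):

```python
def apply_discount(price, percent):
    """Requirements (reverse-engineered from the original behavior):
    + must return the price reduced by `percent`
    + must round to 2 decimal places
    - cannot return a negative price
    - cannot accept a discount above 100%
    """
    # The "cannot" items become an executable check rather than tribal memory.
    if not 0 <= percent <= 100:
        raise ValueError("discount must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)
```

During a rewrite, the docstring is the spec and the guard plus a few assertions are the tests; any behavior not covered by a + or - bullet is explicitly up for grabs.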


That's a very good point.

It's so hard to get people to write good requirements that we often overlook (or rather, won't look at it) as a solution.

Rewrites are easier with good requirements. They're a constant stream of bad surprises without them.

It's possible, while refactoring, to begin to collect those requirements, in the tests if nowhere else. But it's not a given.


> The company had built heavily around .NET, which, while it has its merits, did not provide the type of experience that was best for this environment.

And then he makes the same mistake that his entire article is based on. He needed to rewrite the code to use Angular instead of .Net? I’m assuming he meant he needed to use Angular instead of server side rendering. I can’t imagine a scenario where server side rendering wouldn’t have gotten the job done. Especially since he admitted that he didn’t need high UI performance.


It depends, I've seen some beasts built in ASP.Net Webforms. Worse than anything you'd see in PHP from a reliability and extensibility perspective (although factors better in security - Microsoft pushed their kludgy ORM hard back then whilst PHP tutorials started with string-based SQL).

If I inherited an ASP.Net Webforms product which was built by amateurs from an MVP/PoC which would go on to form a very small part of a large product, then I would absolutely re-architect (and thus rewrite) it.

If it's an existing product where the rewrite cannot be compartmentalized, then there's no way I'd touch the code.

In the scenario of an aging product, I think you're honestly better off taking your best engineers and product people and starting again. At some point, when the new product is mature enough, you can look at how to move the last straggling customers over. I've seen that done successfully by clients and also by large companies (Microsoft with WebForms > MVC and later .Net Framework > .Net Core).


WebForms is pretty similar to Coldfusion. Which he spent the whole start of the article saying was fine and got the job done.

There's nothing inherent in .NET that would 'not provide the type of experience that was best for this environment'. If anything, probably the vast majority of code that is written for managing 401Ks out there is probably written in .NET or Java.


In my experience it’s a management problem more than a technologist problem. New engineering manager, wants to leave their stamp, has a grand vision, and is able to get buy in and budget from the higher ups.

“Higher ups” in most companies don’t have the tech literacy to question such a move. And there’s a lot of ego going around to say “this time is different” when it’s really not.

Usually when I see just a technologist push for a “big rewrite” they won’t succeed without management allies or enablers.

One reason this happens is that management and technologists alike have a huge dearth of skills when it comes to dealing with and evolving legacy code. It’s not taught anywhere and has to be learned. It’s this lack of skills that CAUSES a lot of new shiny tech to come out, as it’s more fun to write my own JS framework than try to work with other people to improve theirs.


I agree. I think the other dynamic that I've observed in these situations is that the technologists who oppose the big rewrite often sound like a wet blanket who are just opposed to the excitement. The old curmudgeons are just raining on the parade of excitement.

For some senior executives who are looking to generate passion and excitement the desire to endorse and support the big rewrite is just too much to resist.


I saw a link here on HN a short while ago (though I can't remember where exactly) to a Ribbon Farm post talking about Legibility, or the idea that people generally like things that can be understood and categorised.

https://www.ribbonfarm.com/2010/07/26/a-big-little-idea-call...

It seems to capture the idea of The Grand Rewrite fairly well. From the post:

  - Look at a complex and confusing reality, such as the social dynamics of an old city
  - Fail to understand all the subtleties of how the complex reality works
  - Attribute that failure to the irrationality of what you are looking at, rather than your own limitations
  - Come up with an idealized blank-slate vision of what that reality ought to look like
  - Argue that the relative simplicity and platonic orderliness of the vision represents rationality
  - Use authoritarian power to impose that vision, by demolishing the old reality if necessary
  - Watch your rational Utopia fail horribly
There are often good technical reasons for a rewrite or major refactoring, but more than once in my career the urge was definitely based on the fact that if we rewrite it then we will know it inside out and we will only have what we need, nothing more. Full control and understanding.


My feeling is that rewrites simply don't scale well.

To make them work you need resources, good planning, and motivation from everyone involved. As the number of people increases, the probability that any of them made a mistake estimating how long it will take, or how well their part will run, explodes. That's why refactoring tends to work better: it's a rewrite with a scope.

I worked at a company where we built small embedded devices (EM dataloggers). I was supposed to take an existing code base shared by several products and modify it to build a new one. The code was beyond ugly. As an example:

   delay_ms(5)
Would yield a 5 millisecond delay, as long as you weren't in any of a variety of modes, the GPRS connection wasn't hung up, and GPS had a lock. It was painful to think about, let alone work with.

I convinced everyone that a total rewrite was necessary, and then I let everyone keep working on the old code base while I worked on the new one to add the basic functionalities. Then we started to move product by product, adding the "extras" each device required. Now the new products are built with that skeleton, and refactors are way less scary and faster, and can be made across devices easily (one of the improvements is that "business" logic has automated testing now).
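A rough sketch of the kind of separation that makes that testability possible (all names hypothetical): keep the decision logic pure and put the radio behind an interface, so the "business" rules can be tested without a device on the bench:

```python
# Datalogger sketch: pure decision logic, hardware behind a test double.
def should_upload(samples, batch_size, gps_locked):
    """Pure decision: upload once a full batch exists and GPS has a fix."""
    return gps_locked and len(samples) >= batch_size

class FakeModem:
    """Test double standing in for the GPRS driver."""
    def __init__(self):
        self.sent = []
    def send(self, payload):
        self.sent.append(payload)

def flush(samples, modem, batch_size=3, gps_locked=True):
    if should_upload(samples, batch_size, gps_locked):
        modem.send(list(samples))
        samples.clear()

modem = FakeModem()
buf = [1, 2, 3]
flush(buf, modem)   # batch full -> uploaded and buffer cleared
```

Because `should_upload` never touches hardware, it can't grow the "delay depends on GPRS and GPS state" entanglement the old code had.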

To this day I'm convinced that the only reason that could be made is because we started with the new product, and then moved upwards. Trying to build a new ecosystem for all of them would have failed hard.


> rewrites simply don't scale well.

I learned early on that rewriting a codebase is a lose-lose proposition: I’ll never suggest or recommend it again. Not because I don’t think it’s ever a good idea, but because it can only go wrong. If everything happens perfectly without a single hiccup or surprise, you’re still right back “where you started from” from the perspective of management. I worked on a massive rewrite porting a C++-based codebase to Java. The rewrite was actually suggested by “the business” because they were having trouble finding C++ developers and they noticed that Java developers were easier to come by. I was actually hired because I had a strong background in both, so I jumped right in. I was too young and too naive to realize how backstabbing, conniving, and treacherous business people can be: I actually ended up being blamed for every surprise that came up during the rewrite process.


The biggest problem with rewrites is that you need someone who understands how the old system works, and why it works that way. Sometimes there are good reasons for why something is a mess, that were the result of long iterations over customer requirements. And sometimes things are just a mess, because. If you are walking into a codebase cold, you don't know which is which, without doing a lot of archeology - assuming, of course, that more than potsherds of that train of cognition is left buried in the commit messages, JIRA tickets, and email archives you have access to.


Another aspect I didn't see mentioned was that, at least in my experience, one can put too much effort into solving a problem. Then one ends up with a nice bespoke solution, but once it's been used for a bit, the requirements change and the solution might end up being a poor fit.

Very often in my experience this is because the client doesn't have a perfect handle on what exactly it is they want, which is only natural when venturing into the unknown. So they end up spec'ing a solution that doesn't exactly match the ideal work flow.

These days at work, when the client wants some new custom feature, we try to figure out if we can get something good enough by making small adjustments to existing features.

Then, when the inevitable changes come, we're in a much better position to determine the true requirements if we decide to go for a bespoke solution.

This does however require that the code is malleable, so I write with that in mind.


He kind of touched on it, but he really didn’t go into one reason to rewrite code is if it is based on old legacy technology that won’t be supported in the future.

If your website was Flash heavy, you had to rewrite it for it to work on mobile.

I mentioned in another post that I was responsible for rewriting a legacy 20 year old PowerBuilder app that depended on SQL Server 2003. The first step I took was to upgrade it to a “newer” version of PowerBuilder that supported COM, put a C# REST API on top of it, and then create Postman/Newman integration tests.

After that, we slowly started moving the code from PB to C#.

Honestly though because everyone has their own self interest at heart, sometimes you have to rewrite or transition to newer tech to keep existing employees and/or to get new employees.

I was at a job where the department grew from 4 to 15 developers in a year. The product was a B2B product where we had 90%+ of the sellers market using a PHP web app with complicated business logic. We didn’t have a product for the buyers market.

We were going to put COM on top of the (well written) PHP rules engine and share it with a new product for the buyers written in C#.

The buyers product written in C# didn’t find a market, but we were getting interest in expanding the PHP product. Every single developer left within 6 months because no one wanted to spend the next year doing PHP. We all knew that PHP wasn’t marketable at the compensation rates we were looking for. The only reason it took six months for them to leave is because some were waiting for their 3 year vest.


Probably the best section is right here:

> However, refactoring can be a better tool before you begin making any changes as a way to make your change easier. As Kent Beck said “Make the change easy (warning: this may be hard), then make the easy change.”


Combine that with "Red, Green, Refactor" (write a failing test, make it pass, refactor) and we have another pair for the list of mutually contradicting proverbs.


They're different development phases. If you've got a green test suite, you can just "Refactor, Green, Refactor, Green".
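A tiny illustration of that cycle (hypothetical code): the suite is the invariant, and it stays green across a behavior-preserving change:

```python
# "Refactor, Green": the tests never change, the implementation does.
def total_v1(items):
    # Original: manual accumulation.
    t = 0
    for price, qty in items:
        t = t + price * qty
    return t

def total_v2(items):
    # Refactored: same behavior, more idiomatic.
    return sum(price * qty for price, qty in items)

def suite(total):
    assert total([]) == 0
    assert total([(5, 2)]) == 10
    assert total([(5, 2), (3, 1)]) == 13
    return "green"

# Green before, green after: the refactor is safe to ship.
assert suite(total_v1) == "green"
assert suite(total_v2) == "green"
```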


> Just like in Joel's post, where Netscape took years to rewrite its already working code, rewriting code because Coldfusion was “dead” or “inferior” would have served little business purpose.

I understand there are plenty of counter-arguments to be made here. First and foremost, it gets harder every day to find a Coldfusion developer. Those who know the platform are most likely looking to give a facelift to their skillset, rather than prolonging the inevitable drying-up of the proverbial job pool. More, it may be harder to implement functionality in Coldfusion, particularly the kind we were dealing with, than, say, in .Net, Ruby on Rails, or even Go.

Even so, there are other, better ways to move toward new technology without rewriting an entire codebase.

Interesting that the author doesn't provide any real suggestions to this specific case, only the other examples, which are not similar.

The facts are the following:

1. The Coldfusion community is dying.

2. Adobe is actively making it more difficult to commercially pay for the software. They are now seeking compensation for use of CF as part of "SaaS Revenue". You have to submit a form that says what your revenues are. I kid you not, that's what the salespeople are saying to customers.

These two situations have serious economic ramifications for a company running CF.


When the answer to "what is your pricing?" is "well... how much can you pay?" the vendor gets kicked out on their tuchus in our shop.


How much do you charge your customers?


Well, I work in a government so...

But anyway, the problem of that behavior is not really the pricing, it's all the rest that comes bundled with that attitude.


I don't ask them what's in their bank account. We study the market and price our products accordingly and competitively.


Rewrites are a symptom of orgs that focus only on feature velocity, and not other forms of customer value.

If in addition to getting features done, teams were held more accountable for uptime, performance, and stability, there would be appropriate focus on careful redesign, but only in proportion to some customer end-value after that is delivered.

A rewrite for the sake of a rewrite, because X doesn't conform to software engineering technobabble is a huge red flag. These things exist to support customer value, and are not ends in-and-of-themselves.

In short, there shouldn't be a difference between new feature delivery and refactoring to support some other customer need (like uptime or performance). It's all work that needs to get put into a backlog, prioritized, and delivered with clear success outcomes.


Don't fall into the trap of "never do X" prescriptions written in blog posts. Rewriting an application or library from the ground up is often a joyful experience packed with an immense amount of lessons to learn. Similarly, refactoring an existing application and watching it come back to life can also be a joyful experience packed with an immense amount of lessons to learn. Whether or not you know what you're getting into on either side, you'll learn soon enough, and experiencing both approaches many times over is part of what being a programmer is about.


If you are just developing for your own enjoyment, sure, you can rewrite for the learning experience or for fun.


If you are clever enough, you can get your employer to go with either approach you choose, and you can learn while getting paid.


> If your reason for rewriting the code is that you don't understand it, you should not rewrite it.

This is a fundamental thing that most people miss.

If you don't understand the code/system/structure/process/law, you can't possibly rewrite it in a way that is a) effective and b) addresses the use cases it was designed to address.

If you try to rewrite something you don't understand, everything after that decision is re-discovering the same old pains, lessons, and practices that the first person discovered.


This notion of code being "done" is the root problem. The idea that you write something, ship it, and are done is wrong. Code is never done; it is almost biological in its emerging complexities. The current code is the running manifestation of everyone's understanding of the architecture: it will and must change.

Viewed through that lens you can see:

* estimation is hard, as you are never really done.

* software has a very high maintenance cost, plan for it.

* architecture and design are important longer term: spending more money on things with a longer value life just makes sense.

* Interfaces are important and allow for rewrites on a smaller scale.

* you will always be releasing: focus on CI and devops to take the grind out of that.

Rewriting is a form of refactoring I guess: It's like chopping a leg off or something. One thing I like about microservices (or rather, strict boundaries) is that you can rewrite the worst part but keep the rest going.
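As a sketch of what a strict boundary buys you (all the names here are hypothetical): callers depend only on an interface, so the implementation behind it can be rewritten wholesale while the rest keeps going.

```python
from abc import ABC, abstractmethod

class TaxCalculator(ABC):          # the boundary; callers only see this
    @abstractmethod
    def tax(self, amount: float) -> float: ...

class LegacyTaxCalculator(TaxCalculator):    # old code, kept running
    def tax(self, amount):
        return amount * 0.2

class RewrittenTaxCalculator(TaxCalculator): # the rewritten "leg"
    def tax(self, amount):
        return round(amount * 0.2, 2)

def checkout(calc: TaxCalculator, amount: float) -> float:
    # callers are indifferent to which implementation sits behind the interface
    return amount + calc.tax(amount)
```

Swapping `LegacyTaxCalculator` for `RewrittenTaxCalculator` changes nothing for `checkout` -- that's the "chop off one leg" property.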

Yeah and oh man I hate the "this code sucks" culture :(


The most important thing for developers is will they have the skills to get the next job or will they be working on technology that no one wants in a few years time. If FANG companies are hiring people with SPA, JS, Python experience it makes perfect sense to get rid of any technology no matter how well it is working.


.NET is amazing and very popular right now. Why would anyone get rid of it?


Agreed, I like .NET. I've modified the original to make it any generic technology that isn't hot in SV.


Those are the same companies that 5 years down the line discover why all big corporations bet on .NET and Java when performance matters.


Big enterprises (corporations and government) have been betting on Java/.NET often without considering performance; they do it because the skills are widespread and the talent is cheap to hire (not necessarily low salaries, but easy-to-fill positions).

That's also why JS/Python is becoming attractive to them now.


They do when accounting sees the cloud renting costs to scale those js/python apps.


"Nobody ever got fired for buying IBM", or so the industry said for decades.

But IBM wasn't (always) the best, they were just safe bets. Until the world started changing and they weren't.

.Net and Java have always been good enough. Even today they're good enough for many, many problems.

I probably would not reach for .Net and Java if performance literally mattered.


> I probably would not reach for .Net and Java if performance literally mattered.

There are many tiers for performance mattering. It is quite possible, and common, for Python to be too slow while .Net is fast enough.


>"Nobody ever got fired for buying IBM"

Now that I think about it, given this is no longer true, at some point it must have been the case that people started getting fired for buying IBM.


The biggest trap is new and shiny. Let me explain: they decided at the company to do a complete rewrite when we only needed a few APIs connecting to the existing software. It's a group of 8 people. The current software is written in PHP, jQuery, and MySQL, with some use of Yii. Blazing fast, customers are happy. New guys come in and of course we need a rewrite. Everything has to be microservices. Now they have to work with Kubernetes, Docker, React, Node, Elasticsearch, MongoDB. There's nothing wrong with any of these, but it's a lot of new technology for 8 people to pick up. Now when clients want a change or modification, they hear: no, just wait until we have platform 2.0. Long story short, development has gone downhill, customers are unhappy, and after 6 months they are already 12 months behind schedule. This company will bite the dust within a year.


A lot of rewrites end up trading one set of problems for a similarly sized set of different problems.

But to be honest the rewrite was probably a positive for the devs. They have more cool stuff on their resume and will do better in the market. That’s one of the big problems with finding jobs. If you do what’s good for the company you may quickly end up with an outdated resume and be marked as dinosaur who hasn’t been keeping up. Even your own company will hire new people for new sexy projects instead of the people who did what was good for business.


A very common story. Engineering managers need to be aware that their engineers will cry for a rewrite and say there's too much "technical debt", but it could easily destroy the company if you let them go hog wild with the rewrite. You need to quantify the benefits up front and make the engineering team meet the milestones, none of this "yeah we're polishing the codebase because reasons"


10% necessary to fix the code itself

20% necessary to get into a better ecosystem (especially for hiring)

30% NIH

40% know it's bad but padding review/resume


> If your reason for rewriting the code is that you don't understand it, you should not rewrite it.

Which is just another example of Chesterton's Fence: https://en.wikipedia.org/wiki/G._K._Chesterton#Chesterton's_...

I once lived through a complete application rewrite. The rewrite was necessary to add a new feature that didn't fit into the existing code organization. We were able to reuse large parts of the old code, but it still took twice as long as estimated. And the result was the most bug-ridden version of the app ever. But ultimately it was successful, and the new feature was much loved.


And again: it depends. I was involved in at least one rewrite where large sections of the original code were incomprehensible, and rewriting the whole thing from scratch proved a massive success.


I would ask a meta-question: why did the author fall into the trap of writing about the rewrite trap? While much of what the article says is true, it talks about a phenomenon that exists mostly in the funhouse mirror of HN and Reddit. People don't actually rewrite so much; in fact, they probably do it less than they should. Yes, there is a "contempt culture" but it almost always comes from the losing side. E.g. Go is not very popular outside the US, but it's hugely popular in America; how popular is it in the US for active development? 33x less than Java [1]. That's still very respectable, but let's not get carried away. Moreover, HN and Reddit are not only terrible at describing the current state of the industry, they're also terrible at predicting trends. While a few of the ideas/language/techniques hyped on such forums do end up successful, the vast majority end up failing. Some of them continue to be hyped on those forums even long after they've failed.

Why do Reddit and HN paint such a distorted image? I have a couple of guesses. For one, by their very nature, they focus on new and unusual things, which is what they're supposed to do. This means that by their nature they focus on things people don't actually do.

For another, these forums can, at best, represent the content that's actually produced online, and here there is a big bias towards smaller companies -- perhaps because such content is an efficient form of marketing for them -- as well as towards places and industries where producing blog posts featured on HN is part of the culture, namely startups and SV companies. Even there we see that an SV company like Google, which employs 2000 times as many people as a startup, does not produce 2000 times the number of technical blog posts. When the content is so skewed towards small organizations, it gives disproportionate representation to the practices of small companies working on small codebases. When those companies grow, they tend to shift their practices and technologies to more established ones that are not as new and not as unusual, and also produce fewer technical blog posts. Similarly, younger, less experienced developers have more time and incentive to publish posts, and so are overrepresented; as they mature they often realize, like their predecessors, the error of their ways, but also produce less online content.

The result is that HN and Reddit mostly talk about things that aren't usually done, and overrepresent smaller problems and practices that come from inexperience. That makes them fun to read, but readers shouldn't forget that the genre they're reading is closer to GQ/Vogue than, say, the New York Times.

[1]: https://www.hiringlab.org/2019/11/19/todays-top-tech-skills/


A programmer never starts from scratch. Every new line of code typed brings with it the legacy of past struggles and new understanding.


I agree. Mostly you need refactoring to account for the changes in requirements over time. We don't spend enough time focusing on encapsulation and modularization.

People want to rewrite because it is green field and they can do it how they want. As you said, this doesn't really add value to the company.


Quick question, since we are on the topic: do you think that it is a feasible (or even the optimal) approach to first develop an MVP, using a very productive, but much less performant, language (Python) and then, upon reaching a product-market fit, if needed [i.e., after trying to remove performance bottlenecks and using a faster compiler, like PyPy], rewrite the product (SaaS platform) in a more performant and otherwise more suitable language (e.g., Julia, C#)?

P.S. By "rewrite the product" I meant rewriting only the backend.


Depending on the size of the refactor, the lines between it and "rewrite" start to blur, so I'm just going to consider what I'm doing right now at my job a big refactor of how the data is stored and read, even if it involves making some big changes to the database models and how they are interacted with.

Really, it's pretty much a rewrite of the database handling code but hey, that won't impact how the data is sent through the endpoints so the interface didn't change. Refactor!


What if the rewrite was intended to be competing product rather than a replacement for the existing one?

So many rewrites die as they attempt to reach parity and acceptance from users of the legacy version.

When a startup comes along and "disrupts" a product with their fancy new tech, they have basically done a rewrite without caring about the state of the legacy product. And then the incumbent often acquires them for a lot of money.

So, why not just do that whole process in-house?


The company where I'm working is doing a full rewrite of one of its products, from a single-tenant solution installed on-premise to a multi-tenant solution in the cloud. Except that the previous version is still being worked on (new features & fixes), with perhaps fewer people than a few years ago. And the new version is developed by new teams set up for this project, and the targeted market seems a little different.

Well we'll see in a few years how it ends.


> Coldfusion (you've probably never heard of it),

I still use it.


> One of my favorite reads is Joel Spolsky's Things You Should Never Do. He wrote this post almost twenty years ago, outlining the downfall of Netscape and others because they spent years rewriting working code.

I've read that piece long ago, and over time my cynicism has ripened. Should I be surprised that the opinion of a former Microsoft employee is that Netscape's downfall was their own fault?


My policy on rewrites is that you must already fundamentally understand not only the code but also the business that the code covers. If the developer responsible for the rewrite satisfies this constraint, I have a much greater deal of confidence in the success of the effort.

Also, any rewrite should be done in contrasting terms of: Value to the end customer VS long-term technical value. Many rewrites remove value from both ends of the scale. Some may improve both. I think as long as you keep this equation balanced you will be ok. Many developers have ulterior motivations which can greatly compromise this equation if not carefully accounted for (e.g. chasing shiny things, resume padding, etc).

I would say that in purely technology terms, rewrites are amazing. My approach to writing a new system usually involves writing it 2 times at a minimum. The first pass is the fastest, hacky way I can get to MVP with a major focus on the most difficult technical aspects of the product. This pass is what I put in front of our internal developers to get some input so that I can quickly correct course if needed. During this time, I allow myself to make sloppy mistakes in favor of proving that a certain feature can (theoretically) work. This seems to do wonders for productivity because you are much more likely to experiment and find better paths when you aren't worried about your extremely pedantic code policies (of which we have many).

The 2nd pass (aka the rewrite) is when I have the first pass project up on the left hand side of my monitor, and a new solution on the right hand side. I use my original implementation as a reference, but now view it through the lens of auditing another developer's code for quality and policy. Having already proven that the hard thing can be done, my mind is now free to focus on the correctness and standards applied throughout. This is the code that I would actually put in front of our customers. From this point onward, unless a major shift in framework, architecture or language is decided, all future iterations are done on top of this 2nd pass codebase.

I would also say that just because you decide to do the rewrite doesn't mean you have to throw away the old code and any support for it. The most successful rewrites I've ever seen occur where the legacy system is run in parallel with the new system, with both being maintained and used in production simultaneously. Obviously this has some overhead, but it also ensures you have a stable fallback option with clear A/B comparison capabilities throughout the migration phase.
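The parallel-run idea can be sketched as a "shadow mode" wrapper. Everything here is hypothetical: customers keep getting the legacy output, while the rewrite runs on the same input and divergences are logged for comparison.

```python
def handle_request(request, legacy, rewrite, log):
    result = legacy(request)           # customers still get the legacy answer
    try:
        shadow = rewrite(request)      # the rewrite runs on the same input
        if shadow != result:
            log.append((request, result, shadow))  # record any divergence
    except Exception as e:             # a crash in the rewrite must not hurt users
        log.append((request, result, repr(e)))
    return result
```

When the divergence log stays empty over real production traffic, cutting over to the rewrite becomes a low-drama decision.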


Usually the way I turn down junior ideas about rewriting the whole deal is by making them think about the monetary cost of a rewrite, and showing what the benefit for the end customer is in a cost/benefit analysis report.

Rewriting stuff is cool when one isn't paying the work hours expended in them.


The successful rewrites I've seen are usually because the requirements have changed so much since the original was written that the original code just isn't the right thing anymore, and refactoring it to be the right thing amounts to a rewrite anyway.

Depending on what the software does and how it works, you can often have the original software running along side the rewritten software, and gradually move users over to the new software, giving you time to iron out bugs in a fairly safe way.

Of course, that doesn't work for everything. It's easier for applications that are used internally by a company vs external customers.
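One common way to do that gradual cutover is deterministic percentage bucketing; the function below is an illustrative sketch, not from the article:

```python
import hashlib

def use_new_system(user_id: str, rollout_pct: int) -> bool:
    # Hash the user id into a stable bucket 0-99; users whose bucket falls
    # below the rollout percentage are routed to the rewritten system.
    # The same user always lands in the same bucket, so their experience
    # doesn't flip-flop between old and new.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_pct
```

You ratchet `rollout_pct` up from 1 to 100 as confidence grows, and drop it back to 0 if bugs surface.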


It's called investing in your product. If a rewrite will cost X amount in dollars and man-hours upfront, BUT will save much more over the lifetime of the product in support and pain for both you and your customer, then it is ALWAYS worth it.


If that is proven for the use case in question, by all means.

If not, then there are better places where to spend money.


Because we did it wrong the first time.


but it'll be so much better the second time!! https://news.ycombinator.com/item?id=22104506


The only times I had to rewrite software were when the initial documentation didn't match what was actually delivered/expected by the client. Sometimes it's very hard to plan architecture when the whole concept changes.


While you should be careful about rewrites, Joel is not correct that a rewrite is always a bad idea. I've been (reluctantly!) involved in a bunch of rewrites in my career and they were invariably highly successful.

So, as always:

1. it depends, and

2. please use your best judgement.


Parable: Company wants to move "into the cloud" and adopts AWS managed services or Kubernetes to recreate and re-platform existing and functional, though sketchy, on-prem services. "Agile coaches" and "Solution Architects" advise rewriting the old code because of its fundamental alien-ness in a cloud-only environment. So it's a rewrite. But of course, the rewrite doesn't work, primarily because of fundamental misunderstandings about what properties allow software to scale across many machines, in many time zones, running processes thousands of times per day. So it becomes necessary to either cut losses and rewrite the rewrite, or start iterating now on the crappy rewrite to bring it closer in functionality to what already existed in the crappy on-prem systems -- which raises the question of why you didn't just iterate on the original sketchy code in the first place.

There are times when rewriting does make sense - rare though they may be. I've only seen a few, and they always had to do with improving stability and reducing the potential for bugs. None of them required you to use the latest shiny to "completely reimagine what software could be doing for your business". I've helped to rewrite subscription-management systems that ended up noticeably reducing churn and cutting out spurious chargebacks from customers who couldn't navigate our bug-prone subscription service wasteland. But some refactors are so labor intensive that they amount to re-writes, even if you're using the same tools and languages. Some code is acutely awful and it needs to die for people to be able to go to sleep at night without having to drink a fifth of whiskey beforehand. Some atrocities don't need to exist.

But, again, if the reason for a refactor or rewrite is summed up as "we're bored and want to play with these new shinies" then you are in for it. If rewriting provides you with some sort of new business capability or fundamentally changes a business process drastically for the better, then go ahead. But in my experience, if a technology or paradigm is fundamentally "so much better than what we do now", then it should work with what we do now so well that it doesn't require fundamental rewrites of large software systems. So rewriting things in Python or C++ is great, or using Kubernetes instead of Lambdas or whatever is fine, but it's more important that these paradigm shifts can be incremental rather than a complete recreation of what already exists, that the gulf between refactoring and rewriting can be relatively small instead of two completely different processes.


Does anyone know the error handling pattern the author mentions?


Something I've noticed over the years is that the people who say that reading code is harder than writing code are people who have been writing code for a long time and are paid to write code.

I think the truth is that it's actually a different skill, and one that's much harder to pick up, because when you write code there's a very quick feedback loop telling you that you're wrong, but you can read code, think it does something different than it does, and never learn you were wrong.


I think reading code is very good for one’s character. It teaches compassion and openness to other people’s ideas. Pretty much like listening without prejudice. I am very guilty of not being interested in what other people did so now I am making a point to really trying to understand code before criticizing. It’s so much easier to throw a fit and think everybody else is stupid.


What if it’s written in ASP classic and powered by deprecated SQL procedures that won’t run on anything newer than SQL Server 2005?


People not wanting to write Cold Fusion isn't "contempt culture", it's an objectively terrible platform.


But isn't Firefox Quantum a result of a successful rewrite of Firefox Gecko?


Firefox Quantum isn't a rewrite of Gecko. At one point, Servo was going to be a rewrite of Gecko, but that petered out. They pivoted to a strategy where some parts of Servo would be integrated into Gecko, and other components of Gecko would be rewritten in Rust. The most significant of these was Stylo, Servo's CSS style (but not rendering) subsystem. However, despite the rebranding, the core of the engine and the overall architecture is still Gecko.


The thrust of Spolsky's argument is that you can't stop delivering just for the sake of a rewrite. If a rewrite happens, it has to happen in the background while your main product keeps shipping. Mozilla spent years letting the pieces that would become Quantum incubate in Servo, all the while still shipping Firefox as normal.


its rendering engine hasn't been completely re-written though. They've been re-writing parts of it piece by piece.


Do we have a model for differential structure migration back to formal specs?


This came up a few weeks ago regarding crypto.

There's a distinct difference between rewriting code WITH a regression suite, and rewriting code WITHOUT a regression suite.

The difference is enormous.

I come from a hardware background where the # of employees looks like this:

lead architects = N employees

microarchitects = 10 * N employees

circuit designers = 10 * 10 * N employees

validation = 10 * 10 * 10 * N employees

It is rough, but true: 10 lead architects can mean tens of thousands of validation engineers.

There's a reason why it is so upside-down: bugs cost a lot more in the hardware world than in software because they are much harder to fix in the wild.

The answer to the question, "Should we rewrite?" is always "maybe", and then requires risk exposure analysis and a robust mitigation plan. You need regression testing and a bug tracking system from day one, but you also need a rollout plan and a response team.

It is not an easy decision for established software. Or rather, it shouldn't be! :)
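A cheap way to get a regression suite before a rewrite is a characterization ("golden master") test: pin down the legacy behavior, then assert the rewrite matches it on a broad set of inputs. A minimal sketch, where both slugify functions are hypothetical stand-ins for the legacy and rewritten code:

```python
def legacy_slugify(title):
    # the old code whose behavior we want to preserve
    return title.strip().lower().replace(" ", "-")

def new_slugify(title):
    # the rewrite; must match legacy behavior on every case we care about
    return "-".join(title.lower().split())

def test_rewrite_matches_legacy():
    cases = ["Hello World", "  padded  ", "MiXeD Case"]
    for c in cases:
        assert new_slugify(c) == legacy_slugify(c), c
```

The cases list is where the archaeology goes: every weird input found in production logs gets appended, turning tribal knowledge about the legacy system into an executable spec.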


Sometimes there's no option, because enterprise policy mandates that you port that uncommented, spaghetti-code VB6 image-processing application, dependent on archaic Win32 calls and written by amateurs, to something modern like .NET. BTW, if anyone needs that skillset I'm open to offers!


I feel if you need to rewrite something you're already in the trap. Rewriting is climbing out, not rewriting is digging deeper.

The problem with rewriting though is sometimes it's another trap, but anytime you feel the need to rewrite... you are already in a trap.


Obligatory comment I have to repost every time rewrites come up: https://news.ycombinator.com/item?id=11554288



