Totally agreed. This is pretty much the definitive guide on how to perform a high-stakes migration where downtime is absolutely unacceptable. It's extremely tempting, particularly for startups, to simply do a big-bang migration where the old system gets replaced in one shot. I have never, ever seen that approach work out well. The Guardian approach is certainly conservative, but it's hard to read that article and conclude anything other than that they did the right thing at every step along the way.
Well done and congratulations to everyone on the team.
Yeah, it did take a long time! Part of this though was due to people moving on/off the project a fair bit as other, more pressing business needs took priority. We sort of justified the cost with the expected savings from no longer paying for OpsManager/Mongo support (in the RDS world, support effectively became 'free' since we were already paying for AWS support) - which took the pressure off a bit.
Another team at the Guardian did a similar migration but went for a 'bit by bit' approach, migrating a few parts of the API at a time, which worked out faster, in part because things were tested in production more quickly. Our approach with the proxy, whilst imitating production traffic, didn't actually serve Postgres data to users until 'the big switch' - so not really a continuous delivery migration!
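For anyone curious what that dark-launch proxy pattern looks like in outline, here's a minimal sketch in Scala. It isn't our actual code; the Response type and the callMongoApi / callPostgresApi stubs are invented for illustration. The idea is just that the user always gets the Mongo-backed response, while the Postgres-backed one is fetched in parallel and compared in the background:

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

// Illustrative stand-in only, not the real API types.
final case class Response(status: Int, body: String)

object DarkLaunchProxy {
  // Stubs for the Mongo-backed (live) and Postgres-backed (shadow) endpoints.
  def callMongoApi(path: String)(implicit ec: ExecutionContext): Future[Response] =
    Future(Response(200, s"mongo result for $path"))

  def callPostgresApi(path: String)(implicit ec: ExecutionContext): Future[Response] =
    Future(Response(200, s"postgres result for $path"))

  /** Always serve the Mongo response; mirror the request to Postgres and
    * log any difference so mismatches surface before 'the big switch'. */
  def handle(path: String)(implicit ec: ExecutionContext): Future[Response] = {
    val primary = callMongoApi(path)
    val shadow  = callPostgresApi(path)

    // Compare in the background; the shadow call can never affect the user.
    primary.zip(shadow).onComplete {
      case Success((m, p)) if m.body != p.body =>
        println(s"MISMATCH on $path: mongo=${m.body} postgres=${p.body}")
      case Failure(e) =>
        println(s"shadow comparison failed on $path: ${e.getMessage}")
      case _ => () // responses agreed
    }

    primary // users never see Postgres data at this stage
  }
}
```

The key property is that the shadow call never influences what the user sees, which is also why it's not really continuous delivery: Postgres wasn't serving real users until the switch.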
The article mentions several corner cases that weren't well covered by testing and caused issues later. What sort of test tooling did you use, ScalaCheck?
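(For anyone who hasn't used it, a ScalaCheck property for that kind of mapping might look roughly like the sketch below. The Article model and the deliberately naive pipe-delimited encoding are made up here just to show how the tool surfaces corner cases; run it and ScalaCheck will almost certainly falsify the property and shrink to a small counterexample, e.g. a headline containing '|' or an article with an empty headline and no tags.)

```scala
import org.scalacheck.{Gen, Prop, Properties}

// Invented model and encoding, purely for illustration.
final case class Article(id: String, headline: String, tags: List[String])

object ArticleRoundTripSpec extends Properties("ArticleRoundTrip") {

  // Naive pipe-delimited encoding, standing in for a real document-to-row mapping.
  def encode(a: Article): String =
    (a.id :: a.headline :: a.tags).mkString("|")

  def decode(s: String): Option[Article] =
    s.split('|').toList match {
      case id :: headline :: tags => Some(Article(id, headline, tags))
      case _                      => None
    }

  // Generator that deliberately includes awkward inputs: empty strings,
  // arbitrary printable characters, variable-length tag lists.
  val genArticle: Gen[Article] = for {
    id       <- Gen.identifier
    headline <- Gen.asciiPrintableStr
    tags     <- Gen.listOf(Gen.alphaNumStr)
  } yield Article(id, headline, tags)

  // This property typically fails: ScalaCheck finds and shrinks counterexamples
  // such as a headline containing the '|' delimiter, which hand-written
  // example-based tests rarely cover.
  property("encode/decode round-trips every article") =
    Prop.forAll(genArticle) { article =>
      decode(encode(article)).contains(article)
    }
}
```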