jack6e's Hacker News comments

Specific technology choices aside, this was an incredible write-up of their migration process: thorough, organized, readable prose about a technical topic. It is helpful to read how other teams handle these types of processes in real production systems. Perhaps most refreshing is the description of choices made for the various infrastructure pieces, because it is reasonable and real-world. Blog posts so often describe building a system from scratch where all latest-and-greatest software can be used at each layer. However, here is a more realistic mix of, on the one hand, swapping out DBs for an entirely new (and better) one, but on the other hand finding new tools within their existing primary language to extend the API and proxy.

Great read. Well done.


This was what was particularly interesting to me - that they went to the effort of writing a purely technical article on the particulars of how parts of their environment operate, and to publish it on their platform even when that's not the sort of content they're known for.


It's not their main platform though:

> Digital Blog

> A blog by the Guardian's internal Digital team. We build the Guardian website, mobile apps, Editorial tools, revenue products, support our infrastructure and manage all things tech around the Guardian


Hi! Thanks for your comments. I'm one of the authors of this post. It is the same platform at the moment (just not tagged with editorial tags, so it stays away from the fronts). That said, the team that approves non-editorial posts to the site can be concerned about us writing about outages and the like, as it might carry a 'reputational risk', so we may end up migrating to a different platform in the future so we can publish more quickly. We'll see!


In our era of deplatforming, a publisher publishing on something like Medium seems antithetical, even for a dev team that just wants to get words out. Should you spend some cycles on the dev blog? Probably, but you should also split-test the donation copy, get the data warehouse moved forward for 2019 initiatives, and fix 1200 other issues. Thanks for sharing a great post. I shared it with my team and we all learned a lot.


Thanks to Hacker News and Reddit, this piece got over 100,000 page views, which should be enough to justify the blog staying on the platform!


Does SecureDrop run on your AWS infrastructure?


No, I think it's on our own infrastructure.


Thanks for the info, here and below.


It’s probably better if they didn’t answer this question...


It's a bit disturbing to me that they seem to be using AWS for confidential editorial work.

> Due to editorial requirements, we needed to run the database cluster and OpsManager on our own infrastructure in AWS rather than using Mongo’s managed database offering.


In a happy world the Guardian wouldn't rely on a company we spend a lot of time reporting on for unethical practices (tax avoidance, worker exploitation, etc.), but we decided it was the only way to compete. One of the big drivers was a denial-of-service attack on our datacentre on Boxing Day 2014 - not an experience any of us wants to have to deal with again.


>Since all our other services are running in AWS, the obvious choice was DynamoDB – Amazon’s NoSQL database offering. Unfortunately at the time Dynamo didn’t support encryption at rest. After waiting around nine months for this feature to be added, we ended up giving up and looking for something else, ultimately choosing to use Postgres on AWS RDS.


Anyone who gets control of the live server can still read a database, even if it encrypts its storage.


Exactly. As I read the original article, which mentions "encryption-at-rest", there was a voice in my head crying: "No, what they need is E2EE". That would enable the authors to write confidential drafts of the articles, no matter where the data is stored (and AWS would be perfectly fine of course).

Disclaimer: The voice in my head does not come out of nowhere. I am building a product which addresses this: https://github.com/wallix/datapeps-sdk-js is an API/SDK solution for E2EE. A sample app integration is available at: https://github.com/wallix/notes (you can switch between the master and datapeps branches to see the changes for the E2EE integration)


In which case they could've just used a separate encryption layer with any database, including DynamoDB. The HSM-backed security keys available from all the clouds make this rather simple.
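A separate client-side encryption layer of the sort suggested here can be sketched generically. This is an illustrative pattern only: the base64 "cipher" is a placeholder so the example stays self-contained; a real implementation would use an AEAD cipher such as AES-GCM with keys held in a KMS/HSM.

```python
import base64

def encrypt(plaintext: bytes) -> bytes:
    # Placeholder "cipher" so the sketch is dependency-free.
    # In practice: AES-GCM (nonce + ciphertext + tag), key from KMS/HSM.
    return base64.b64encode(plaintext)

def decrypt(blob: bytes) -> bytes:
    return base64.b64decode(blob)

class EncryptedStore:
    """Wraps any dict-like backend; the backend only ever sees ciphertext."""

    def __init__(self, backend):
        self._backend = backend

    def put(self, key: str, value: bytes) -> None:
        self._backend[key] = encrypt(value)

    def get(self, key: str) -> bytes:
        return decrypt(self._backend[key])

backend = {}  # stand-in for DynamoDB, Postgres, anything
store = EncryptedStore(backend)
store.put("draft-1", b"confidential copy")
```

With this shape, the choice of database becomes independent of the confidentiality requirement, which is the point being made in this sub-thread.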


Yes, any db including Dynamo would have been fine.

Our software E2EE solution has advantages over HSMs though: cost, obviously, and more features and extensibility.


Great idea.


Encryption at rest is still important as it closes off a few attack/loss vectors: mis-disposed hard drives, re-allocated hosts. I'm probably missing a few others.


Yeah, but it doesn't really address my concern.


Anyone who can control a server in any environment can potentially interact with the database powering applications running on that server.

How is running on AWS different from Guardian Cloud in their basement?


The level of control over who has physical access, of course.

Did reports of the Snowden revelations reside on the CMS?


Sadly we don't trust our security practices anywhere near enough for that! Secret investigations happen in an air-gapped room on computers with their network cards removed, then get moved across to the main CMS when they're ready to publish.


Probably not, no, until they were about to be published. I imagine that the choice between "run an entire data centre ourselves, store everything there" and "use AWS, but keep high sensitivity stories on local machines" is an easy one.

After all, the client computer that connects to the CMS is just as, or more likely to be compromised. I wouldn't be surprised if the coverage (or at least parts of it) were edited on airgapped laptops.


> the choice between "run an entire data centre ourselves, store everything there"

If those were the only two choices, you might be right. But the resources needed for the actual CMS functionality sound modest enough to run independently of the main website.

> the client computer that connects to the CMS is just as, or more likely to be compromised

That's faulty reasoning.


> That's faulty reasoning.

Why? It's an obvious potential point of compromise.


Sorry, I misunderstood. I read it as saying "We're going to get hacked via this other vector, anyway, so why bother?" I see your point, now.


They're using AWS VPC (Virtual Private Cloud), which isn't open to the world (you use a VPN to bridge the VPC into your internal network) and in which you can spin up dedicated instances that don't share underlying hardware with other AWS customers.


Thanks for writing the blog post. Insights like these and of such high quality are rare.

Can I ask what was the total cost of the migration?

If there was software that could do this database migration without downtime, how much would you/Guardian be willing to pay?


This is pretty much how all Guardian articles are formatted. Some of their regular pieces could be called "blog posts" - Felicity Cloake's cooking series comes to mind.

Guess it makes sense to reuse the platform that already has the templates rather than use another platform and reimplement the design.


Given their employers, one would hope that they get good editorial support though!


Ish. From what I can tell, the “Digital Blog” seems to be set up as just another column on the platform.


Totally agreed. This is pretty much the definitive guide on how to perform a high stakes migration where downtime is absolutely unacceptable. It's extremely tempting, particularly for startups, to simply have a big-bang migration where an old system gets replaced by something else in one shot. I have never, ever seen that approach work out well. The Guardian approach is certainly conservative but it's hard to read that article and conclude anything other than that they did the right thing at every step along the way.

Well done and congratulations to everyone on the team.


I agree, but it took them a year, if I am reading the article right.

In most early stage startups, that would be an unacceptable loss of time.

So I don't judge them for doing a one-shot migration even if it causes an hour of downtime.

It all depends on the business.


Yeah, it did take a long time! Part of this, though, was due to people moving on/off the project a fair bit as other, more pressing business needs took priority. We sort of justified the cost through the expected savings from not paying for OpsManager/Mongo support (in the RDS world, support became 'free', as we were already paying for AWS support), which took the pressure off a bit.

Another team at the Guardian did a similar migration but went for a 'bit by bit' approach - migrating a few bits of the API at a time - which worked out faster, in part because stuff was tested in production more quickly. Our approach with the proxy, whilst imitating production traffic, didn't actually serve Postgres data to users until 'the big switch' - so not really a continuous-delivery migration!
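The proxy pattern described above (replaying production traffic against the new store while still serving the old one) can be sketched roughly like this; the backends are stand-in callables, not the Guardian's actual code.

```python
import logging

logger = logging.getLogger("migration-proxy")

class ShadowProxy:
    """Serve reads from the primary; replay them against the candidate
    and record mismatches, without ever exposing candidate data."""

    def __init__(self, primary, candidate):
        self.primary = primary
        self.candidate = candidate
        self.mismatches = []

    def get(self, key):
        result = self.primary(key)
        try:
            if self.candidate(key) != result:
                self.mismatches.append(key)
                logger.warning("mismatch for %s", key)
        except Exception:
            # A broken candidate must never break the live read path.
            logger.exception("candidate backend failed for %s", key)
        return result  # users see primary data until 'the big switch'
```

The trade-off the comment identifies falls out of the last line: the candidate is exercised by real traffic, but nothing it returns reaches users, so confidence builds without the continuous-delivery benefit of shipping bit by bit.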


The article mentions several corner cases that weren’t well covered by testing and caused issues later. What sort of test tooling did you use, Scalacheck?
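For context, the property-based style the question alludes to looks roughly like this; a hand-rolled Python sketch rather than ScalaCheck, with a hypothetical document-to-row transform standing in for the real migration code.

```python
import random
import string

def random_doc(rng):
    # Generator: an arbitrary 'document' with a string id and an int list.
    return {
        "id": "".join(rng.choices(string.ascii_lowercase, k=8)),
        "tags": [rng.randint(0, 9) for _ in range(rng.randint(0, 5))],
    }

def to_row(doc):
    # Hypothetical flattening for a relational schema.
    return {"id": doc["id"], "tags": ",".join(map(str, doc["tags"]))}

def from_row(row):
    tags = [int(t) for t in row["tags"].split(",")] if row["tags"] else []
    return {"id": row["id"], "tags": tags}

def check_roundtrip(n=200, seed=0):
    # Property: transforming to a row and back preserves the document.
    rng = random.Random(seed)
    for _ in range(n):
        doc = random_doc(rng)
        assert from_row(to_row(doc)) == doc
    return True
```

ScalaCheck automates the generation and shrinking; either way, the value is that randomly generated inputs tend to find the corner cases hand-written fixtures miss.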


Agreed! I don't think enough engineering orgs appreciate the value of a great narrative on any technical topic.

Part of my duties at work requires me to deal with "large" issues. While the solution to them is usually necessary, quick, and high quality, I've seen the analyses that come after them vary in quality.

Good writeups tend to stick around in people's memories and become company culture, and drive everyone to do better. Bad writeups are forgotten, and thus the lessons learned from them are forgotten as well.

This particular article stands out for me. English is not my first language, and I've spent most of my life dealing with very fundamental technical details, so most of my writeups aren't the best. I'm going to bookmark this one and come back to it to learn how to write accessible technical narratives.


Writing is their core business after all :) I agree though - I read it like a fascinating breaking news story.


The BBC's Online and R&D departments have very interesting blogs, if you like this sort of thing.

http://www.bbc.co.uk/blogs/internet

https://www.bbc.co.uk/rd



Has world class platform for delivering articles. Uses Medium.

I don't even.


I imagine this is a nice benefit of working at an organization that is primarily about writing.


I was actually a little confused by the article - it seems to go up and down in terms of technical depth, and it feels like it was written by several people. The hyperlink to "a screen session" was odd as well... the Ammonite hyperlink I get, but screen is a pretty ancient tool; people either know it or can find out about it. Why link to screen but not the "ELK" stack?

I like the article but it was a bit hard for me to consume with multiple voices in different parts.


screen is hard to google.


This is the first thing I noticed too. This was an excellent read.


Old versions of Mongo were very bad.

We accrued lots of downtime due to Mongo.

But later versions were rock solid, and I've maintained Mongo installations at many startups and SMEs. Once you set up alerts for disk/memory usage, off you go - it works like a charm 99% of the time.
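The disk/memory alerting mentioned here amounts to something like the following minimal sketch; the threshold and function names are illustrative, not from any particular monitoring tool.

```python
import shutil

DISK_USAGE_THRESHOLD = 0.90  # alert when a volume is 90% full

def disk_usage_fraction(path="/"):
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def should_alert(fraction, threshold=DISK_USAGE_THRESHOLD):
    return fraction >= threshold
```

A cron job running this and paging when should_alert() fires covers the failure mode the comment describes: a database quietly eating a disk.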


MongoDB is proof that with the right strategy, marketing, and luck, you really can fake it until you make it.

Not that that's really a surprise or was unknown; it's just fairly new to see it in the open source ecosystem instead of the enterprise one.


I'd posit it's more a matter of maturing a new paradigm. There are a lot more edge cases you have to cover as NoSQL becomes more popular for production-at-scale.

SQL has decades of production maturation, and has wider domain knowledge.


I'm sure there's some of that. But a lot of the early problems were a bit more weighted towards poor engineering in general, IIRC. For example, I seem to recall an early problem was truncating large amounts of data on crash occasionally.


That's probably true.

Unfortunately the number of my customers who would sign off on just "two nines" is approximately zero...


The link to the actual vote results [0] states that the electorate was "CPython core developers with a known email-address."

[0] https://discuss.python.org/t/python-governance-vote-december...


Ah, I see. Thanks for the info!


Skipping over the little annoying facts that fully remote work is not some secret strategy, and that no Fortune 500 company (as an index of success) is 100% remote, let's just do a direct comparison of competitors:

- GitLab is 100% remote and, according the article, was most recently valued at $1 billion, 350 employees, and has an unknown/hidden (or I can't find) number of users.

- Github has a headquarters and also remote work, and was recently acquired for $7.5 billion, with 800 employees and 31 million users.

So...is remote work a secret? Did it lead to a comparative success over a competitor offering similar services but a different organizational strategy? Not really, not even close, no.

More accurately, we should say GitLab has so far managed to make remote work a success for themselves through leadership, organizational culture, and some other actually secret ingredient, which is where the real story lies. Lots of remote companies fail. What has GitLab done right? Sadly, this article only skims the possibilities.


Github also had real trouble with their cash flow. MS bought the popularity and the current and future projects hosted there (and probably overpaid for all that).

Now, why remoting would work better for the company (off the top of my head):

1. Next to zero office-space rental costs.

2. You're shopping for talent in the whole world and can hire regardless of visas, eligibility, etc. I.e. you hire better talent, for less money (they don't need to pay outrageous rents for SF/London/Munich/Dublin/whatever).

3. You get happier employees (they don't end up seeing their families only once every six months or so).

4. You get easier 24/7/365 on-call schedules if you have a few people in different timezones.

5. You get diversity from day 0 and local eyes in almost all markets you care to sell anything in.

6. You _have_ to document more and better, since you _have_ to work with tickets.

7. You make your meetings worthwhile because your time matters (and you're not valued or paid according to "chair-time" that can be filled with boring nonsense meetings so that you can coast through the day).

There's a bunch of other advantages in other areas (ecology, general economy, tech, etc) but since the focus is on what's in it for the company I won't go into these.


> MS bought the [...] current and future projects that are hosted there

what? microsoft doesn't own any projects hosted on github other than their own


It's pretty clear the GP was using that in the sense that one often refers to acquirers buying customers, not in terms of actually buying IP of the projects.


Exactly.


I think they meant projects like Electron which are owned by Github IIRC.


Gitlab is also 3 years younger as a company, and Github had an undeniable first-mover advantage in the space.


GitLab the open source project was started in 2011. GitLab the company didn't start until 2014. GitHub is 6 years older than GitLab.


On the other hand, GitLab didn't have to do any UI work to start with, they just appeared to copy and paste GitHub.


Huh? It doesn’t look the same at all, or at least it doesn’t now or at any time I used it.


Today they have their own identity, but when they started GL was basically a clone of what GH was at the time.


I'm sure the fact that they pay near zero in office rent/leases helps such companies with cash flow.

You're also likely to get more productivity from employees who don't have to spend 2-3 hours per day traveling. I found myself working a full 8-9 hours when working from home, whereas if I was traveling to the office I'd be looking to duck out after 6-7 hours. The 3pm slump is a drag, and I personally dread the commute home. Those few extra hours mean that I can squeeze out more code.


I'd like to point out that my comment mostly referred to the article's original title, which was click-bait referencing GitLab's "secret" to multi-million dollar "success." Spoiler alert: the "secret" was remote work. I took issue with calling their organizational structure of 100% remote work a secret strategy, and with calling their current revenue and valuation a comparative success or that they could be directly attributed to their remote structure.

The updated title is much more informative, accurate, and devoid of click-bait hype.


Github is also the first mover and the dominant player in the market.

It's silly to try to say the differences in their success are based on remote work or not.


Qualtrics is pretty popular among their initial target audience mentioned in the article: academics. Among academic survey shops this product is the definitive go-to.

I imagine, based on my own experience, the average tech worker thinks of surveys as small sets of questions used to easily and quickly collect a few data points. SurveyMonkey is fine for that. But for the types of real surveys that provide the data for serious research, Qualtrics is what you need. I've used it for government surveys, sociological, health/medical, etc., web and in-person, with lots of flow control, randomization, and all the other features needed to produce robust and accurate data.
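For a flavour of the flow control and randomization mentioned, here is a minimal sketch; the function names and fields are made up for illustration and are not Qualtrics' API.

```python
import random

def randomize_order(questions, seed):
    # Per-respondent seed keeps each respondent's order reproducible
    # while varying order across respondents (reduces order effects).
    rng = random.Random(seed)
    order = list(questions)
    rng.shuffle(order)
    return order

def next_question(answers):
    # Skip logic: only ask the follow-up if the screener was answered "yes".
    if answers.get("uses_product") == "yes":
        return "satisfaction"
    return "demographics"
```

Real survey platforms layer many more features on top (quotas, branching blocks, piping), but randomization plus branching like this is the core of what separates serious survey tooling from a flat list of questions.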


What? This is entirely untrue, and I feel bad that you had a teacher or reading experience that did not focus on how much inner life, motivation, and emotion are in this poem. I have had the exact opposite experience: people think of Beowulf as a monster-fighting action poem and are disappointed or surprised when they read it and discover most of the content is songs, conversation, historical reminiscences, and not-fighting.

Take as just one brief example these passages from XXXIV lines 52-7 and XXXV lines 5-7 [0]:

So to hoar-headed hero ’tis heavily crushing

To live to see his son as he rideth

Young on the gallows: then measures he chanteth,

A song of sorrow, when his son is hanging

For the raven’s delight, and aged and hoary

He is unable to offer any assistance.

Every morning his offspring’s departure

Is constant recalled

...

So the helm of the Weders

Hrethel grieves for Herebald.

Mindful of Herebald heart-sorrow carried,

Stirred with emotion, nowise was able

To wreak his ruin on the ruthless destroyer

This is a description of an experience almost (but not totally) unknown in modern American/western culture. To dmreedy's point, this captures the duality that on the one hand, these were humans just like us, and fathers who have lost sons violently in war or crime who are weighed down with the burden of that grief and the inability to do anything, to take any vengeance, can probably identify with this ancient father. These words can be a foil that gives voice to their own hearts. On the other hand, we can consider how different Anglo-Saxon society was by looking at the culture of vengeance and reciprocal warfare that resulted in this son's death. Very few of us even experience war, much less can imagine a life in which we expect every spring/summer to be attacked by, or to go out and attack, some neighboring city.

What forces shaped society to develop those systems? How can we prevent returning to that? What was good about life then - what did the people enjoy, and what was evil? These are all questions that are worth asking and trying to answer. But we only get to explore those questions if we approach these texts, as the article's author states, as texts with presence in time and place, creations of real people from a certain time, encoding their particular moment with all its similarities and differences to ours.

Lastly, it does sound better in Old English [1].

[0] https://www.gutenberg.org/files/16328/16328-h/16328-h.htm#XX...

[1] https://www.youtube.com/watch?v=ROghKY1jmuE

(edit for formatting)


Maybe I'm not explaining myself well -- the passages you've quoted above do tell us in very poetic language that some character is sad that their sons died in battle, but they don't show that, really. As a reader, we have no access to their interior mental state, we just get the narrator's report that Herebald carries "heart-sorrow" (great word!).

Something like this article might explain the distinction I'm not articulating correctly: http://nautil.us/issue/65/in-plain-sight/why-doesnt-ancient-...


Considering that miniature books have been a part of literary and publishing history since at least the 16th century, I doubt this modern rehash will drastically change reading any more than the existing four centuries of iterations.

That said, mini books with this horizontal orientation are a cute and maybe convenient idea for some people. Anything that removes barriers to reading paper books is great.


They state that the cumulative valuation of these 100 companies is $100+ billion, of which $81+ billion comes from the top 5 companies, and $95+ billion comes from the top 10.

Also interesting that the top 2 (Airbnb and Stripe) were in the 2009 cohorts. From the depths of the recession to $50+ billion combined valuation.


Yeah, these statistics should give any armchair VC/angel wannabes some pause:

Out of the 2000 or so startups that deserve some type of funding[1], the chance that you will pick a $1B unicorn is about 0.2%.

Additionally, the chance that you will choose the next Airbnb or Stripe is closer to 0.002%.

For comparison, you could turn $150,000 into $4,500,000 with about a 3% chance if you find a Vegas table willing to service this bet size (which they have done in the past).

[1]a16z said this recently


I don't see how you get those numbers.

From the article, 19 / 1900 YC startups are above $1B, or 1% (not 0.2%). Many of the 1900 are young, so the fraction of any cohort that will eventually reach $1B is more than that.

The chance of having picked exactly AirBnB or Stripe from among YC startups is 2/1900, or 0.11% (not 0.002%). More companies out of that 1900 are likely to achieve similar valuations in the future.

The payoff ratio for angel investing is much higher than the 35:1 that roulette pays. On the order of 1000:1 is typical (a $10M-cap SAFE to a $20B IPO, with 50% dilution on the way).
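As a quick sanity check on the figures traded in this sub-thread (using the commenters' own numbers, not verified data):

```python
# $150k -> $4.5M is a 30x multiple; a single-number bet on a European
# wheel pays 35:1 (i.e. 36x your stake back) with probability 1/37.
stake, target = 150_000, 4_500_000
roulette_multiple = target / stake          # 30x
single_number_prob = 1 / 37                 # ~2.7%, the "about 3%" above

# $10M-cap SAFE, $20B IPO, 50% dilution on the way -> ~1000x.
safe_cap, ipo_valuation, dilution = 10e6, 20e9, 0.5
angel_multiple = (ipo_valuation / safe_cap) * dilution
```

So the multiples quoted in both comments are internally consistent; the disagreement is only about the probability of picking a winner.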


No worries, we're using different numbers.

A16Z said that there are 2000 startups worth funding every year. Side note, A16Z only talks to about 1000 of them due to focus areas and competitive overlap.

Based on the 19 unicorns that YC has unearthed over the past decade, we can really only say that roughly 2 unicorns are born every year, and 2 megaunicorns born every 5 or so years.

The payoff is immense for finding one, but the chance that you keep at least your full $150k is probably worse than your roulette odds, not to mention the time invested and emotional capital.


What am I missing? You seem to be taking the YC-specific unicorn numbers and dividing them by the much broader A16Z "fundable" numbers. Apples & oranges, no?


I assert that they aren't apples and oranges because YC has picked every single unicorn there was to pick up until very recently. Therefore, both the 2,000 number from A16Z and the 19 number from YC can be directly compared.

Do you think it is significantly easier to pick unicorns than I am describing? Because it seems very, very hard.

So hard, in fact, that YC has to pick 150 or so per year and hope that one or two hit 1000x to make up for the others.


I wish YC could have funded every unicorn! In reality there are a few hundred such companies worldwide, so it's less than 1 in 10.

https://www.cbinsights.com/research-unicorn-companies has a list.


Old tweet, but here is the basis for my "YC funded every unicorn" assertion (as of 2015):

https://mobile.twitter.com/rabois/status/634205368172814337


The tweet you referenced is talking about companies that actually applied to YC. Uber, for example, never applied to YC.


I've done quite a bit of angel investing and have a pretty good IRR. The key is to make sure you stay within your area of competence, or within the area of competence of those you trust.

Also, the numbers will necessarily look worse for a fund or accelerator like Y Combinator than for an individual angel. Y Combinator has to fund cohorts of companies, right? They can't just say, one year, "Nope, I found nothing I want to invest in this year." But as an individual angel, you can.


But the effect of that would be to cause massive distrust of Chinese suppliers and cause a shift away from electronics being produced there. IC and cyber experts generally identify the Chinese as using intelligence operations for primarily economic purposes, as compared to Russian/Iranian/North Korean objectives being military or political. A Chinese military intelligence agency using cyber espionage to intentionally disrupt one of the most significant export industries of the Chinese economy does not seem likely, nor does it seem to provide such an out-sized strategic benefit as to be worth the economic cost.


Good point, I agree with that thinking. But the actual execution of such a hardware-based attack would surely be discovered at some point anyway, and risk the same negative outcome. So then that would leave the only possible conclusion that the story just isn't true at all. In the end, none of it makes clear sense...


The difference is two-fold: actively planting a fake story means that first, the espionage is fake and thus no real intelligence can be gathered, so the only benefit is the hypothetical respect you suggested; second, the story will definitely get out, thus the potential for the negative effect is innately 100%. However, as a real intelligence operation the cost/benefit analysis is inverted, because there is a real, tangible benefit to extracting possibly sensitive commercial and national security information. And while an eventual discovery is always a possibility, it seems care was taken to ensure it would only be a small possibility, and that in any case it would be in the future, hopefully after a large amount of useful data is extracted.

So in the planted story hypothesis, there is certainty of negative outcomes with only the potential for positive outcomes, and those only intangible, while in the this-is-real hypothesis, there is near certainty of some tangible benefit with good probability of significant tangible benefit, with only a potential, distant, deniable risk of negative outcome.


I would say that, given the number of motherboard variants, even generational ones with varying differences, especially in component suppliers, it was pretty unlikely anyone would spot the issue. I mean, while some may take a MB out and inspect it thoroughly, most people that I'm aware of will plug it in and, if it works, leave it there.


> Or at least have testimonies by employees in these company

The original article directly addressed this: "The companies’ denials are countered by six current and former senior national security officials, who—in conversations that began during the Obama administration and continued under the Trump administration—detailed the discovery of the chips and the government’s investigation. One of those officials and two people inside AWS provided extensive information on how the attack played out at Elemental and Amazon; the official and one of the insiders also described Amazon’s cooperation with the government investigation. In addition to the three Apple insiders, four of the six U.S. officials confirmed that Apple was a victim. In all, 17 people confirmed the manipulation of Supermicro’s hardware and other elements of the attacks. The sources were granted anonymity because of the sensitive, and in some cases classified, nature of the information."

It is entirely likely that the companies affected were directed by the IC agencies working on this not to discuss or reveal their knowledge of the hack. Often in intelligence operations it is important and useful to not alert your adversary that you are aware of their intrusions until you are fully ready to take action against them, or have fully removed the danger.

I don't see any reason to take the companies' categorical denials as evidence that this did not happen or that they were not targeted. Those statements are what one would expect in a national security incident and investigation of this magnitude, with such serious implications.


> "Not as clickbaity though...."

Not as accurate, either, given that the title of the paper is "Get Billions and Billions of Correct Digits of pi from a Wrong Formula," and it was published before "click-bait" was a thing.


Click-bait has long existed as headlines and movie and novel titles.


Perhaps that means this paper invented click-bait.

