The report points out (Image 8) design choices that contributed to a raging fire, once the fire started:
* no emergency electricity cut-off device, an "economic strategy choice of the site operator" - the electrical room where the fire started was hot, 400C at the door (measured by thermal camera), with meter-long electrical arcs from the door and deafening, thunderous sounds. This made access for utility technicians "difficult" - it took them 2 hours to cut incoming electrical service from the utility. On-site UPS devices also had no cut-off, so they kept supplying power.
* the emergency water network provided only 70m3/h at the site. A firefighting boat, Europa 1, was called in, supplying a max flow rate of 14.5m3/min.
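The two flow rates above are quoted in different units (per hour vs. per minute), which hides how large the gap is. A quick sketch that puts both in m3/h, using only the two figures quoted from the report; the function and constant names are mine:

```python
# Figures as quoted from the report summary above.
HYDRANT_M3_PER_HOUR = 70.0   # on-site emergency water network
BOAT_M3_PER_MIN = 14.5       # Europa 1 fireboat, max flow rate


def per_min_to_per_hour(flow_m3_per_min: float) -> float:
    """Convert a flow rate from m3/min to m3/h."""
    return flow_m3_per_min * 60.0


boat_m3_per_hour = per_min_to_per_hour(BOAT_M3_PER_MIN)
ratio = boat_m3_per_hour / HYDRANT_M3_PER_HOUR

print(f"Fireboat: {boat_m3_per_hour:.0f} m3/h")
print(f"Fireboat vs. site network: {ratio:.1f}x")
```

So the boat could deliver 870 m3/h at its maximum, roughly 12 times what the site network provided.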
The freeflowing air cooling design, a good design choice as it saves on operating costs for cooling, contributed to nourishing the fire.
* Freeflowing air in DCs is good for cooling;
* It’s bad when you have a fire;
* An improved DC design would allow an operator to $somehow stop the freeflowing air (although one could argue that it’s not free flowing anymore if one can control it);
* I’d like to know how much money really was saved by not allowing the UPSes to be cut off.
I’m very curious how the insurance companies respond, and whether they’ll demand e.g. UPS supplies to be able to cut off. Or maybe in general, the fire department should be more aware of these types of trade-offs being made, and give their approval accordingly.
It is VERY common in buildings to have ventilation dampers that are closed when a fire alarm starts exactly for this reason. I am most surprised that the design sounds like it was made a certain way deliberately even though the basics were got wrong. Does Strasbourg not have any fire regulations that require fireman accessible power cut-outs, automatic extinguishers etc?
It's not just that their equipment burned: the severity of the fire was massively dangerous to the fire brigade and to anything else nearby if it couldn't be extinguished properly.
> Does Strasbourg not have any fire regulations that require fireman accessible power cut-outs, automatic extinguishers etc?
Fire codes for industrial buildings, office buildings and residential buildings differ by quite a lot. Usually, they get stricter the more people are supposed to be in a building and what their training status in firefighting/escape is.
> Does Strasbourg not have any fire regulations that require fireman accessible power cut-outs, automatic extinguishers etc?
On the other hand, it could be that this was deliberately allowed for buildings with low/no occupancy, easy, safe & quick escape routes and no risk to neighbouring properties.
If someone wants to risk their own property, as long as they don't risk harming anyone, why not allow it?
>If someone wants to risk their own property, as long as they don't risk harming anyone, why not allow it?
Because it's a flawed premise, you don't exist in isolation from the outside world.
The smoke is harmful to everyone nearby, especially a data center fire where you have burning servers, UPS batteries, etc. contributing large quantities of burning cadmium, lead, mercury, assorted plastics and so on. The environmental impact from the fire event alone should make this a non-starter.
Also, little to nothing can be recycled after such an event, and replacing the lost equipment requires a whole bunch of environmentally damaging mining and manufacturing processes.
>no risk to neighbouring properties.
Even if you plan carefully there is an unacceptable risk of fire spreading. And while some datacenters are moving to the countryside or aquatic environments, most are in built-up areas for easy access to industrial power supply, workers, spare parts, freight, etc... And I have no confidence that a company whose disaster plan is "screw it, let it all burn" would adequately maintain firebreaks and other containment measures.
Frankly, I find the very idea of such excessive and deliberately considered wastefulness to be grotesque.
I can't speak for Strasbourg, but here in the UK you don't get an inspection by the local building inspector or fire department - instead, you hire a fire risk assessor of your choice, on the free market.
So if you're not in compliance, you don't need to bribe anyone or cheat - you simply have to hire a friendly dumbass who'll check the basics but won't ask too many questions.
Mark this one up as something else that's been privatized in the UK that's shocking to me as an American. This sort of stuff is always run through the government (the level depends on the state).
Insurance companies might impose additional requirements, but then they'll send out their own inspectors.
>Instead, you hire a fire risk assessor of your choice, on the free market.
That sounds a lot better than the one size fits all crap we get in the US. If you are trying to do anything other than take over an existing and operating industrial site that's grandfathered in or run a warehouse for some sort of boring and non-reactive goods it tends to be a terrible maze of conflicting and irrelevant requirements.
Note the OP said the inspectors were privatized, not the actual building codes. The only difference is the private inspectors sound like they'll let you skate by on some stuff.
You find the same dynamic in states with private emissions inspections. Spend any time on a car enthusiast website and everyone knows "the guy" to go to that'll pass your obviously illegal emissions equipment bypass...
I know exactly what he said and that's why I said exactly what I said.
>The only difference is the private inspectors sound like they'll let you skate by on some stuff
Government building inspections are incredibly onerous unless you're a homeowner building a deck or a developer building cookie cutter houses or offices. You run into all sorts of stupid edge cases depending on what the facility being built is intended for and you need to get clarification on what you should do. Have you ever actually tried to get government clarification about something? They need to be dragged kicking and screaming, often through a courtroom to ever clarify anything in writing because nobody wants to set a precedent or take the responsibility of making judgement calls. Having professionals who will take the responsibility of those judgement calls is a massive net plus.
The fact that your knee-jerk response is to frame professionals using judgement in their jobs as "letting you skate by" is just crazy. Would you say the same thing if the context was a PE crunching numbers, finding something that didn't pass the default rules and then using their judgement to conclude it was fine because of other situational details? People do their jobs satisfactorily the overwhelming majority of the time. Being on the government payroll doesn't make public inspectors immune from playing favorites or making screw-ups. It just means the oversights follow a different pattern. Everyone's experienced plenty of government inspectors who don't do their job because it's Friday and they want to GTFO.
> Spend any time on a car enthusiast website and everyone knows "the guy" to go to that'll pass your obviously illegal emissions equipment bypass...
If the vehicle code were written the way the building code and occupational health and safety code is that's the only way any commercial vehicle that isn't a cookie cutter box truck would ever pass an inspection.
> The fact that your knee-jerk response is to frame professionals using judgement in their jobs as "letting you skate by" is just crazy.
The UK system (builders choose their own building control inspector in the free market, building owners choose their own fire risk assessor in the free market, makers of building materials choose their own flammability testing house in the free market) produced hundreds of tower blocks clad in flammable cladding and insulation, and a fire in which 72 people died.
After all, why choose the guy who's going to give you a hard time, when you could choose someone friendly, helpful and trusting?
And inspectors and testers know that. And those that are still in business are so very friendly, helpful and trusting, in fact, that the person certifying whether your insulation is flammable will let you set up the test rig yourself, and they "won't notice" you've put magnesium oxide fireproof boards over the flame sensors.
> I’d like to know how much money really was saved by not allowing the UPSes to be cut off.
I don't think this was a cost-cutting measure. Instead, (automatic) cut-off devices are also systems that can fail. And if there is such a centralized device, it can nullify the hard work of 4 redundant power supplies if it "decides" to malfunction one day. So, this was a SPOF-cutting measure. I still think that having 4 separate switches for 4 power supplies is not so wrong. Firefighters anyway wrote that the electrical room had a door at 400C and observed 1-meter arcs, so they had to ask the electrical company to cut the power from their side for the sake of safety anyway.
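To make the SPOF argument concrete, here is a rough sketch comparing the chance of losing power with four independent redundant supplies against the same four supplies behind one shared cut-off that can trip spuriously. Both probabilities are made-up illustrative numbers, not anything from the report, and independence between supplies is an assumption:

```python
def p_outage_redundant(p_supply_fail: float, n_supplies: int) -> float:
    """Probability that all n independent supplies are down at once."""
    return p_supply_fail ** n_supplies


def p_outage_with_central_cutoff(
    p_supply_fail: float, n_supplies: int, p_spurious_trip: float
) -> float:
    """Outage if the shared cut-off trips spuriously OR all supplies fail."""
    p_all_fail = p_outage_redundant(p_supply_fail, n_supplies)
    return 1.0 - (1.0 - p_spurious_trip) * (1.0 - p_all_fail)


# Hypothetical per-year probabilities, chosen only for illustration.
P_SUPPLY_FAIL = 0.01     # one supply is unavailable
P_SPURIOUS_TRIP = 0.001  # the central cut-off trips when it should not

without_cutoff = p_outage_redundant(P_SUPPLY_FAIL, 4)
with_cutoff = p_outage_with_central_cutoff(P_SUPPLY_FAIL, 4, P_SPURIOUS_TRIP)

print(f"4 redundant supplies only:      {without_cutoff:.2e}")
print(f"same, behind a central cut-off: {with_cutoff:.2e}")
```

Under these assumed numbers the shared device dominates the outage risk, which is the availability side of the trade-off. It says nothing about the safety side, which is what the firefighters were dealing with.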
I disagree with your idea that not having an automated power disconnect system is acceptable, but for the sake of argument let's say it is. The standard for manual emergency power cutoff switches is to be located where they are easily accessible in an emergency, which means outside of the electrical room. Most large data centers have manual switches in multiple locations. They should at a minimum have had cutoff switches located in their NOC.
Right! Sorry, I didn't write it clearly but I agree that there should be automatic disconnect, but think it's OK to have 4 separate disconnects to maintain the supply redundancy. Though now it can get really expensive.
The report says that the cut-off device was not present due to an economic/financial decision made by the company. It's weird to see that in a post-incident report from a fire department, but I guess they thought it was worth mentioning.
> 1. Imagine if Amazon starts building datacenters for others as a service
Just as an FYI, those fancy datacentres you see on Amazon/Google/Microsoft marketing videos are really the exception rather than the rule.
You will find the majority of cloud servers in exactly the same third-party datacentres that everyone else uses.
Why?
Because only the US and perhaps one or two other countries in the world have the spare land for a cloud operator to dump a massive datacentre campus on.
Most other countries don't. Or if they do, it's either protected land (greenbelt etc.) or it's uninhabitable (e.g. Australian Outback).
The observation in the first half of your comment seems sound but your explanation doesn't make sense.
There's rather a lot of land in the world outside of the US that could have a datacenter built on it. There's also a lot of Australia that's uninhabited but not uninhabitable.
I oversimplified a bit, by "land" I also meant associated infrastructure, which includes for example access to electricity. In most countries, building out new high-voltage infrastructure to where it does not exist is both financially expensive and technically painful (planning permission etc). Same for pulling fibre runs to the middle of nowhere.
No doubt other things like local laws, tax rebates and whatnot also come into play as well.
You may seek to argue that there are a small number of third-party datacentre sites where the cloud operator is the sole tenant of the building. But this again is not the same thing as the cloud operator building their own. They get the option to up sticks and leave at the end of their contract. They also don't have any responsibility over facilities management etc.
At most sites, the cloud operator simply has whole or part of a floor (or floors in larger buildings), the rest of the building is occupied by third-party customers.
Your order of operations is inverted. The existing data center providers have been doing this for a few decades. Equinix, Dell, and friends are the ones who do the build-out, management, etc. If you're big enough they'll also build to suit, though at that size you probably have a lot of in-house expertise as well.
AWS, Azure, Google Cloud, etc. are the alternative to dealing with dedicated physical infrastructure and financing. Things like AWS Outposts are there to bridge the gap for companies/workloads that don't fit existing public cloud offerings.
> The freeflowing air cooling design, a good design choice as it saves on operating costs for cooling, contributed to nourishing the fire.
The reason you use forced air in your house instead of being built for natural convection is so you don't die in a fire. Fires are all about convection. Infernos doubly so.
There was an early luxury cruise ship tragedy, I think in New York City. A 'freeflowing air cooling design' and all wood paneling. It caught on fire, so they turned around to come back into port... and burned to the waterline, killing a bunch of people.
Building code for passenger ships got changed to require forced air and limit natural (flammable) materials after this.
It's a great thing for startups that OVH provides servers at such an amazing low cost. Yes, there is always a risk that someone will make a mistake in the building design. There's a chance that eliminating some redundancies increases the possibility of a failure. There's always a chance that something bad will go wrong.
However, this isn't just a matter of Hanlon's razor (incompetence vs malice), but more of a matter of an intelligent guesstimate of risk versus a lack of knowledge in some areas (wasn't this OVH's very first datacenter?), and a strong focus on reducing costs. Perhaps the latter went too far, and definitely some obvious mistakes were made by not having a universal power cut off of some sort, but dealing with the amount of power on tap in a datacenter is always dangerous, even when there is no fire at all.
I'm not saying we need to give OVH a complete pass on this. I'm just saying that there are a lot of extenuating circumstances and, except for the power cut off, it's not clear that OVH made any choices due to extreme negligence or cost-cutting. In other words, they didn't do anything immoral. At worst, it appears that (even from the most anti-OVH party here), this was just a mistake in the design of a new (at the time) style of datacenter, and it did work properly for many years before there was a problem. Making a mistake is not immoral.
When it has worked for years without problems, but also without correcting the initial sources of risk (learning the business after the clueless first years), that's like learning while driving that the trunk is full of flammable fluid but driving on because "nothing bad happened before".
Buildings should be used within their safety margins and prepared for certain types of extremes, especially fire. We do not put risky operations into a construction that cannot handle or mitigate the potential risks (no electricity cutoff, not enough fire extinguishing material, no cut-off of intense ventilation). Operating a bakery in a barn without alterations comes to mind.
Where fire’s concerned, I do think all mistakes (failure to take reasonable precautions, and it sounds like this is the case) really are either negligent or immoral. The costs you save for yourself and your customers don’t factor in the externalities that will impact third parties in the event of a fire - namely, risk of damage to surrounding property, and risk of injury or death for the people who have to put that fire out.
In this case at least the neighboring buildings are both from OVH as well and the only thing realistically at risk from spreading fire is a rail yard.
Of course that doesn't excuse the design at all. Putting employees and firefighters at risk isn't ok, and I'm kind of baffled that they were allowed to operate like that.
I don't think any self aware extremist is going to feel attacked by the phrase "extremist minority". They know where their beliefs stand relative to the population at large.
Any comment that begins "Thank god attitudes like yours" is very likely breaking the site guidelines. Would you mind reviewing them and taking the intended spirit of this place more to heart? We don't want a culture in which people are just bashing each other. It's bad for conversation.
Those sorts of presumptive openers go unchecked all the time and we both know it. I'm not going to say anything about the patterns about when they do and don't go unchecked because I don't have the data but at some point poor enforcement becomes selective enforcement.
If by 'unchecked' you mean unmoderated, that's likely for the simple reason that we don't come close to seeing everything that gets posted here. You can help with that by flagging such comments or by emailing hn@ycombinator.com in egregious cases.
People breaking the rules doesn't make it ok for you or anyone else to break them, though. I replied to your comment because I happened to see it—it's not as if I recognized your username or have any idea what your views are!
I mean both unmoderated and uncriticized by the community. I know you can't mod everything but it's not exactly a secret that nobody snidely cites the guidelines when they're being lightly violated by people arguing for whatever the majority view is, and people are happy to snidely cite them when people arguing for whatever the minority opinion is even get close to getting out of line. To then show up and do that as moderator is not great. Basically the ref can stand to go a little less hard on whoever is losing.
Did OVH make it clear to customers that the saved money was coming from lack of safety? I've never worked with them so I don't know. But it seems to me that if you're advertising yourself as equivalent to a competitor only they have fire suppression and you don't, you are obligated to bring that up.
"We save you money because if anything goes wrong, your servers and all the critical data on them will be melted."
> neither can have more than 3 HDD per machine which is just absurd
Not sure where you get that information from. Some of Hetzner's “auction” machines have 10 drives, many of OVH's offerings support more than three (even some of the budget range, Kimsufi, are 4x2T).
I'm not talking about auctions. I'm talking about regular servers.
And I get those facts from ACTUALLY FUCKING TRYING TO CONFIGURE MORE THAN 3 HDDS on those machines.
You go ahead and try it before talking out your ass and downvoting.
Woops I meant HDD, can't edit it anymore.
Won't make a lot of difference, the CPU is so bad it would be your bottleneck here.
It's a bare-metal server available on their low-cost brand: https://www.kimsufi.com/en/
Hetzner Falkenstein was toured recently by a YouTuber and I didn't spot any fire suppression systems in the video [1]. OVH SBG1 (which was partially destroyed by the fire that wiped out SBG2) used shipping containers which also didn't appear to have any fire suppression systems [2].
By contrast, the typical data centers people know which have fire suppression systems appear to have much of their key electrical equipment (and control systems) located in the same area or adjoining rooms of the same facility [3] [4] [5] [6].
In fact, there is even a video recording of a UPS failure [7] showing a lucky case where no one was injured and not too much damage was caused. The employee lingered in the room when they should have immediately left at the first sign of danger. Arc flashes are a scary possibility as shown in [8] and [9] because of the need to switch multiple megawatts of electricity through complex power systems that includes UPS battery banks and automatic diesel generators.
There are video recordings demonstrating how fire suppression systems work [10] [11] and a description of how a data center would be designed to respond to a fire (including closing ventilation dampers) [12] [13]. I'm sure fire suppression systems are not cheap, but in the grand scheme of a data center full of millions of dollars of equipment (not to mention cost of customer downtime) surely it would make sense to install them.
> Hetzner Falkenstein was toured recently by a YouTuber and I didn't spot any fire suppression systems in the video [1]. OVH SBG1 (which was partially destroyed by the fire that wiped out SBG2) used shipping containers which also didn't appear to have any fire suppression systems [2].
They can definitely shut the vents at the top so that the problem with air circulation during a fire wouldn't appear. Furthermore, their dcs are only on one level, so it couldn't spread upwards. I don't know if they got an external fuse/switch, but it seems likely that there's something between substation and the individual dcs.
As for fire suppression, I tried to spot one but couldn't either. But he never shows all parts of the DC, it could be in a part that's not shown.
However, with the design they use, if they can ensure that a fire is contained within one DC, the damage would probably be comparably low, especially if you consider the space they have between the racks.
For anyone interested, I maintained a server within the affected DC.
OVH provided 3x the price of the service for the downtime. But for the recovery we needed to buy a new server from them as our backups were only accessible from dedicated machines...
At the end we basically received 2x the price of the server when discounting the temp machine.
Communication during the downtime was not bad from OVH side, taking into account the huge amount of affected servers.
At the end, as a small customer I can't do or ask for much more. It's worth neither the money nor the time.
IMHO, 3x is not covering any business losses for anyone. We had our own backups within the OVH network and it took 3 additional days to be able to access them as the network was a mess after the fire. For some businesses that is going to be a huge sum.
This is the illusion of SLAs. These money-back-guarantees only refund the cost of the service. They aren't insurance for loss of business.
If I'm using your service for $5k a month and making less than $5k a month because of it, my finance people might rightfully ask where my head is at. 2x is better than 1x, but in general I think we are looking for higher rates of return for that. This hardware has to pay for my development and really my entire payroll after all.
I also can't trust that you losing $5k an hour motivates you to fix the problem ASAP as me losing $5k an hour, let alone if I'm losing more than that.
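A back-of-the-envelope sketch of that gap between an SLA credit and actual business loss. The inputs are borrowed loosely from the comments above ($5k/month service, a 3x credit, $5k/hour of lost business, roughly 3 days of outage) and combining them is purely my illustration, not anyone's real figures:

```python
def sla_credit(monthly_fee: float, multiplier: float) -> float:
    """Compensation paid as a multiple of the service price."""
    return monthly_fee * multiplier


def business_loss(loss_per_hour: float, outage_hours: float) -> float:
    """Revenue lost while the service is down."""
    return loss_per_hour * outage_hours


# Illustrative inputs only (assumptions, see the lead-in).
MONTHLY_FEE = 5_000.0
CREDIT_MULTIPLIER = 3.0
LOSS_PER_HOUR = 5_000.0
OUTAGE_HOURS = 72.0  # about 3 days

credit = sla_credit(MONTHLY_FEE, CREDIT_MULTIPLIER)
loss = business_loss(LOSS_PER_HOUR, OUTAGE_HOURS)
uncovered = loss - credit

print(f"SLA credit:    ${credit:,.0f}")
print(f"Business loss: ${loss:,.0f}")
print(f"Uncovered:     ${uncovered:,.0f}")
```

With these numbers the credit covers about 4% of the loss, which is why the refund reads as a goodwill gesture rather than insurance.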
Just a heads up that different providers doesn't always mean different buildings. Different cities is more important for a physical disaster than different companies. For operational disasters (BGP/DNS/accounting error/abuse detection false positive/other customer DDoS, etc) different company is useful too, of course.
With this kind of provider you're kind of expected to have your own online backups in order to avoid outages. The price point certainly allows for it.
This is so weird I can hardly believe it, maybe some details were lost in the writing.
Were they connecting everything directly to the grid? Even the most basic electrical setup goes through a fusebox with switches that turn everything on/off.
Perhaps that box was burning as well, or the fire blocked access to it, idk. If there truly was no way to cut power from the site then, wow, that was just an abysmally stupid decision.
It wasn't directly plugged into the grid but the report says:
>Aucun organe de coupure externe
Meaning there was no way to cut external power from going into the DC. But the DC also had
>4 niveaux de reprise automatique de courant [very roughly translates to "4 layers of automatic power restart"]
Which I guess kept switching the power back on. So the only way to completely shut down power was by cutting off the building from the grid... with a switch that didn't exist. I don't know anything about data centers but that does not make a lot of sense to me, why would you want your safety systems to cycle back off automatically?
The report also states that the lack of main switch was due to "economic decisions made by the company", but does not give further details about that multi layered restart system
Yes exactly! But it seems like the breakers for the internal circuits were turned back on repeatedly by whatever system was doing a "reprise automatique du courant". I guess that means UPS power backups, but why would those not be shut off automatically either by the fire alarm or just the breakers? Maybe they weren't, it's not really clear, but it's surprising to me that the breakers would not cut all current if they are damaged. Do fire-safe breakers exist?
This does seem crazy to me. I've probably toured 30-40 datacenter facilities in my career, and they ALWAYS have a big red EPO (Emergency Power Off) button at the major exits to each datahall. I've been in plenty of facilities all over the US, Europe and Japan. (I've also had to deal with the fallout from outages caused by someone accidentally pushing that button. I think they thought it would open the door. Later on those buttons were always covered with a clear plastic housing)
Though never in France, and everything I've spent time in was a retail or wholesale provider, e.g. selling space to other companies. Not something owned and operated by a single company.
They said they couldn't access the electrical room due to electrical arcs (and fire?). That's where the switches and fuse boxes would have been located.
What they wanted is a switch outside the building, that could cut power to the whole building without having to get to the electrical room.
I think they refer to the absence of one single mains switch to turn off everything in the datacenter. It took 3 hours for the firefighters to find all the individual switches and turn them off, according to the report.
Datacenters sometimes have redundant main power providers, I'm guessing that could change the assumption to "there should be a number of supply boxes nearby, which all need to be interfered with".
It said there were meter long electrical arcs in the "power room". If I had to guess, I'd guess there was a shutoff in the power room, but not one accessible outside of it.
Irresponsible design. It's not just the fire and the damage to businesses, but the report lists concerns about lead from the UPS batteries being spread all the way to Germany as a result of the plume from the fire, as well as in the water from the firemen. Fortunately, in this case, they measured the water and found no significant amounts, nor did the German environmental authorities, but it could easily have been as bad as the Notre-Dame fire where a huge chunk of innermost Paris was contaminated by lead from the destroyed roof.
At the time of the fire I used nodechef that hosted their services on OVH and seemingly all their backups as well (in the same datacenter). Turns out when extraordinary events happen promises of backups and such aren't always kept.
We lost some data because of that, luckily we had our own backups. A good reminder to make sure you have backups and that they are working correctly. No matter what promises anyone gives you should always have your own backup strategy that's disconnected from the vendor you use.
It was Nodechef's fault, not OVH's, obviously, but perhaps it's interesting for others.
Make sure to read the T's and C's and availability / retention rates closely; it's a process that involves decoding the legalese and trying to associate it with RL situations.
Amazon's S3 for example offers a 99.999999999% data durability guarantee, with other bits implying they can withstand a datacenter going up in flames. But there are two caveats there: data availability is lower (so if that datacenter goes up in flames your data may not be lost, but it may also not be directly accessible until they restore their backups), and if they do lose data, what are the consequences to them? It'll be financial compensation at best.
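For a sense of what eleven nines means in practice, here is a small sketch. It assumes the figure is read as a per-object, per-year durability and that object losses are independent, which is my simplification for illustration; exact fractions are used to avoid floating-point noise at that many digits:

```python
from fractions import Fraction

# 99.999999999% read as per-object, per-year durability (an assumption).
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(n_objects: int) -> Fraction:
    """Expected number of objects lost per year among n stored objects."""
    return n_objects * (1 - DURABILITY)


def years_per_lost_object(n_objects: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(n_objects)


n = 10_000_000
print(f"Expected losses per year: {float(expected_annual_losses(n))}")
print(f"Years per lost object:    {years_per_lost_object(n)}")
```

For 10 million objects that works out to one expected loss every 10,000 years. Note that this models random object loss only; it does not cover the availability caveat above or a correlated event the design did not anticipate.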
Breaking the S3 durability guarantees badly enough to get press attention would be a critical reputational hit, with the consequences going far beyond just the contractual SLA compensation for the customers.
It would still heal with time, though. There was a period after some nasty incidents where the general advice was to avoid EBS for anything critical... but that's over now.
Would it be critical though? Unless there are lots of alternatives at a similar availability and price-point, most people aren't just going to jump to another provider. Clouds sound largely interchangeable but you try swapping from Azure to AWS or vice-versa: for anything more than the simplest system, this would probably take 12 months+
> Amazon's S3 for example offers a 99.999999999% data durability guarantee
IIRC AWS S3 was subjected to formal methods design using TLA+, so as long as the underlying infrastructure is correctly implemented, I suspect the durability guarantee is on fairly solid foundations.
Also of course, I suspect the durability guarantee is based on what option you select for your bucket in the S3 console.
Well yes, but Nodechef was used before I joined the company I work for. Needless to say, we migrated away from them because they didn't handle the situation to our satisfaction.
Since then we've updated our view on backups a lot and it was a valuable lesson for us (or rather the managers). We didn't lose that much data and not anything important since we had backups but they were not daily.
Even if you read the terms and conditions it's hard as a developer to really grok what the legalese actually says in practice, and since I work for a small company we can't really go out and hire a lawyer for each and every set of terms and conditions. Instead we now spend time writing our own backup strategy in case a new fire should occur, or something else like war.
Disclaimer: I am French and I had servers (well, my employer did, and I was the admin) in that DC.
From the report (not the post): something that really annoyed the firemen is that not only was there no universal electrical cut-off, there were 4 different electrical backups, which they had to figure out how to cut off one by one...
It's the same thing as the self-cooling design: OVH optimized for what typically matters in a DC. You want an energy-efficient design and you never want power to go down.
Well... except when you do. I suppose by now them and other hosting providers have taken that issue into account and are modifying their DC designs accordingly.
A proper datacenter can be both efficient and safe. Add solid blinds to close the chimneys, and an oxygen suppressant system (NOVEC, etc.), and add motorized switch-fuses. They can all be orchestrated by a PLC, and a fire alarm/control system.
Close blinds, release NOVEC, disable power rails to computers. That's all. It might not stop everything, but it can help a lot.
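As a rough sketch of that sequence (not real PLC code, and not how any particular fire panel works), the orchestration could look like the following. Every name here is hypothetical, and the three actuator functions are stand-ins that simply report success:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class FireResponse:
    """Runs shutdown steps in a fixed order and records what happened."""
    steps: List[Tuple[str, Callable[[], bool]]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def add_step(self, name: str, action: Callable[[], bool]) -> None:
        self.steps.append((name, action))

    def on_alarm(self) -> bool:
        """Run every step even if one fails; True only if all succeeded."""
        all_ok = True
        for name, action in self.steps:
            try:
                ok = action()
            except Exception:
                ok = False  # a failed actuator must not block later steps
            self.log.append(f"{name}: {'ok' if ok else 'FAILED'}")
            all_ok = all_ok and ok
        return all_ok


# Hypothetical actuator stand-ins; real ones would talk to hardware.
def close_chimney_blinds() -> bool:
    return True


def release_suppressant() -> bool:
    return True


def open_power_rail_switches() -> bool:
    return True


response = FireResponse()
response.add_step("close blinds", close_chimney_blinds)
response.add_step("release suppressant", release_suppressant)
response.add_step("cut power rails", open_power_rail_switches)

result = response.on_alarm()
print(response.log)
```

The one design choice worth pointing out is that a failing step does not abort the sequence: if the blinds jam, you still want the suppressant released and the power cut.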
You can even have passive gates to close ducting which have a wax element which releases if it gets too hot. No extra electrical or PLC knowledge required, just put these on your inlets and outlets and you're good to go.
Unless you assume other people's money is unlimited (which is the accepted industry standard for online discourse related to subjects such as safety and reliability) there will always be tradeoffs.
It's really easy to make low-effort comments about the right balance being obvious when you're defining the right choice as "literally any balance of factors that does not recreate the precise set of circumstances that led to the events that spurred this discussion."
Which providers were you looking at? IIRC Huawei was the only company providing ISO container DCs, but they were super low density (e.g. 50kW total) compared to what the original Google/Sun/HP designs were.
I remember back when OVH was smugly mocking anyone who had concerns about their WC system, to be fair the concerns were kind of ridiculous but ultimately their system _did_ fail catastrophically.
Glad to have left that company years before the fire, never doing business with them ever again.
> Glad to have left that company years before the fire, never doing business with them ever again.
What alternatives are there? Not many good ones at a comparable price point. (Yeah I know about online.net, Hetzner and so on. OVH offers a much more polished product and vastly better network)
At the prices OVH offers, you can have your servers replicated in multiple datacenters for less money than many DCs would charge you for a single server with the same specs.
At this price point it should be perfectly fine if a datacenter burns down occasionally. At least their network is otherwise very reliable.
Well, prices are being jacked up and customers like myself are being kicked out on short notice. French providers become less and less attractive as electricity prices soar in France.
I'm not aware of OVH having jacked up prices recently at all (I have services with their SoYouStart and Kimsufi brands, and I'm sure such an occurrence would have started a bun-fight on the LET forums), though at least some providers have.
As the problems causing price increases for power continue, I expect many cheap providers (who have very low margins so can't afford not to pass on significant increases in power costs), and larger outfits with budget lines, will have to change prices soon or fail.
I was so annoyed by that I actually started renting a new larger server from Hetzner because the old two would cost more and be slower, but since I’m a lazy bum and dislike the idea of losing anything, I haven’t actually decommissioned anything yet, so I’m now paying 250% of what I did before, instead of 125%.
It is not OVH, but you will see it happen there as well. Electricity prices have gone up by 500% in France. Cheap nuclear energy was the reason they were able to do low cost. That's gone now.
> At this price point it should be perfectly fine if a datacenter burns down occasionally. At least their network is otherwise very reliable.
That depends on what you're looking for. If you're looking for sustainable options, this probably doesn't include companies that burn down datacenters.
I would argue Hetzner is much better than OVH price-wise and I've been using a multiple-datacenter scheme with them for several years now for several of my clients. And they're very happy because the cost is so reasonable.
Depending on what exactly you mean by multiple datacenters, make sure you take a look at the pictures of their DC sites - the DCs in one location may be right next to each other. Like having your servers in SBG2 and backups in SBG1.
You would need to use multiple locations, not just datacenters, for durability. The network performance between locations just isn't as good as between datacenters in the same location.
With AWS, the "datacenter" isn't a concept users have to worry about (or can even discover or manage). Each AZ comprises many datacenters, each region comprises many AZs, and AWS makes it easy to deploy most services across multiple AZs in a region. If one AWS datacenter goes down (and this happens occasionally), typically only one AZ in the region is affected.
Public rates offered by Hetzner are definitely slightly cheaper than OVH, but OVH offers vastly better volume discounts to even fairly small businesses (talking like 10-20k/mo spend).
IME OVH also offers much better connectivity around the world (which makes sense, given that they operate at a much larger scale than Hetzner)
Oh, and Hetzner billing support is absolutely terrible. I had to fight with them for months to stop charging for servers which had already been cancelled, after tens of emails they eventually owned up to having a bug on their end. It was like talking to a wall.
I've had the opposite experience. OVH charged my credit card twice for the same payment, then simply refused to see the issue. I sent screenshots showing both payments, they just kept reiterating there was only one invoice (which was exactly the point). And every interaction typically took about five days -- for each follow-up to the same ticket. Had to revert one with my credit card provider.
With Hetzner on the other hand I've had technical support issues responded to adequately within two hours, multiple times. As a small individual customer no less.
> With Hetzner on the other hand I've had technical support issues responded to adequately within two hours, multiple times. As a small individual customer no less.
Hetzner has very good technical support, definitely no complaints about that. It’s their billing department which is downright unpleasant to work with.
I had the opposite experience - OVH support not responding for several days for fairly simple cases. They even forced me once to send a fax to them.
I also had several outages with OVH (Exchange and Dedicated Server) - none yet with Hetzner. Right now planning to finally leave OVH for Hetzner and Microsoft (Exchange)
They might, but they fail to disclose any kind of pricing for dedicated servers on their website and just send you to a “contact us” form instead. Hardly promising.
Note that OVH is primarily focused on dedicated servers, a VPS host isn’t comparable.
But… Hetzner offers an even lower quality service at an even lower price point. (But only to low-volume buyers, OVH offers vastly better volume discounts)
It would be strange to expect them to deal with the situation any differently if there was a catastrophic event in one of their DCs resulting in destroyed servers.
When working with dedicated servers at this price point, you’re very much expected to deal with your own backups.
Edit: To clarify, they are claiming generic compliance with, for example, ISO/IEC 27001, 27017 and 27018. It does not look like it from the incident report. Maybe it applies only to some of their offers, and that is the detail I am referring to.
OVH SBG1, SBG2 and SBG3 were all at:
Rue du Bassin de l'Industrie, 67000 Strasbourg, France.
The addendum to that document says it is only related to activities
on the sites mentioned. The address Rue du Bassin de l'Industrie,
67000 Strasbourg is listed, so sounds like they have now addressed
the issues.
As architect I don't understand how you can blame OVH here.
Local fire code is within the city council, which uses the local fire department, and then with the maintainer.
Without proper planning and fire code measures you won't get a permit.
How did the constructor get a permit at all? This is a wooden building for F 60?? This must be F90 for starters. Then the electrical planning: How did they get a permit at all?
OVH was only a renter. First I would blame local fire department for not enforcing their fire codes.
You don't rent a log house to operate a glass factory inside! Each construction has limits of usage, quality- or quantity-wise, you know that.
Those who put their operation into a place must ascertain that the location can withstand the operations and their risks. Even renting a flat puts limits on your activity in the contract: what you can or cannot do there.
If they requested the proper fire rating and safety measures then it is not their mistake being deceived. Otherwise it is.
It could be that only risk to life, limb and neighbouring properties was evaluated and (probably correctly) concluded that the combination of low-occupation, trained staff and sufficient escape routes meant that the risk was minimal.
If someone wants to burn down their own property, why not let them? That's up to the business and their insurance to judge.
I've spent a lot of time in US data centers, and I can't remember seeing one without an EPO. I've even seen them pressed a few times accidentally when people exiting the floor thought they were "exit" buttons.
Are they talking about the lack of an EPO or something more fundamental to the power system? It almost seems like you need something external to the facility in case of a fire where it's unsafe to enter the building.
This setup is so far off standard practice it's shocking.
Freeflowing does not excuse not having proper auto-closing dampers, triggered by the fire alarms or suppression system.
Automatic power disconnects based on fire suppression system triggering are pretty standard. Most of the systems are fail-safe, meaning if the fire suppression panel loses power or is disconnected from the breakers feeding the room, the breakers trip off.
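A rough way to picture "fail-safe" here: the breaker stays closed only while it keeps receiving a healthy all-clear signal, so any missing input trips it. The function and parameter names below are hypothetical, purely for illustration.

```python
def breaker_closed(panel_powered: bool, link_intact: bool,
                   suppression_triggered: bool) -> bool:
    """Return True only if the breaker may stay closed (room energized)."""
    # Fail-safe: power is held on by an active "all clear" signal.
    # Losing panel power, losing the wiring to the breaker, or a
    # suppression trigger all remove that signal, so the breaker trips.
    return panel_powered and link_intact and not suppression_triggered


if __name__ == "__main__":
    print(breaker_closed(True, True, False))   # normal operation
    print(breaker_closed(False, True, False))  # panel lost power
```

The design choice is that the safe state (power off) is the default, and keeping power on is what needs continuous confirmation.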
That there were no automatic shutoffs or easily accessible breakers to disable the generators is shocking. Most NOCs I've visited, there are remote genset controls on one of the walls, often remote transfer switch controls as well. At the very least, on the way out the door someone in the NOC should have pushed the right buttons.
I've never even heard of a commercial datacenter that didn't have fire suppression. Some companies go for preaction systems to reduce the chances of an "everything gets completely fucked by water" scenario (gaseous systems are the gold standard, but hugely expensive to install compared to water).
They had wood ceilings!?
In my career in the US, I've been around numerous situations where it was pretty obvious the fire inspector was being paid off or at the very least had an extremely cordial relationship with the building management. My guess is that something similar was happening here.
I don't understand how they managed to have insurance, or get the ISO certifications they claimed they had.
What fire department didn't do the walkthrough? Here that is obligatory, and when it happens you can be sure that even a carton still left over from the newest installation is reprimanded.
The last room I took care of was accepted with protocol by an architect. The carpenter and electrician had to explain every single change they made and justify that fire protection was reasonable (e.g. if necessary an extra layer of plasterboard and plaster on every small bit that could ignite was mandatory to gain an est. 30min on top here and there)
ISO: what, apart from documentation, is necessary for ISO? Which standard are you referring to?
How does it practically work? e.g. if I were OVH I just pick any building owner that gives me the cheapest deal, and such owner has absolutely zero concerns as well about safety? Or would OVH own or co-own the building? I am surprised that the landlord didn't have a virtuous conflict of interest in this case.
It's a massive multi-story datacenter used only by them, specifically designed and built for their passive cooling concept, surrounded by other datacenters used only by them. I would be very surprised if they didn't own the building
If you make such strong statements, I would expect that you back it up. For now, it reads as a shallow dismissal, and the guidelines talk about that: https://news.ycombinator.com/newsguidelines.html
Simple comparison. The ones I've been in, including the bad ones, actually had more than adequate fire suppression systems, proper electrical cut outs, fire monitoring systems and certificates proving that they had been audited and tested.
This OVH DC was designed to a cost rather than a safety specification and the risk did not pay off.
Authorities typically take fire safety seriously. So if the fire department writes that in their report, authorities typically will adapt their regulations and inspections.
I have done work on a power system for a data center in a previous life.
It is not easy to shut power off in a data center. They are designed that way intentionally. Yes, it is fairly easy to shut down utility power. But then you have automatic diesel generators that will start. If you shut them down, then you have battery powered UPS units.
In the building that I knew well in Canada, they had a well guarded and covered button behind the security desk that was labeled 'EPO': Emergency Power Off.
This button would send a signal to all systems (utility switches, diesel generator, and UPS units) to immediately shut down, or to not even start (diesels).
Small correction: the UPS kicks in after utility power goes down, and only THEN does the diesel start up, because you can't start a diesel generator in a fraction of a heartbeat.
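The layering described in this subthread can be sketched as a simple priority function. This is a hedged illustration only: `active_source` and its parameters are made-up names, and real transfer switches involve timing and interlocks that are not modeled here.

```python
def active_source(utility_ok: bool, ups_charged: bool,
                  diesel_running: bool, epo_pressed: bool) -> str:
    """Return which source is feeding the load in this toy model."""
    if epo_pressed:
        # EPO overrides everything: utility, UPS and diesel are all cut.
        return "none"
    if utility_ok:
        return "utility"
    if diesel_running:
        # The generator takes over once it has spun up.
        return "diesel"
    if ups_charged:
        # Batteries bridge the gap while the diesel is still starting.
        return "ups"
    return "none"


if __name__ == "__main__":
    # Utility fails, diesel not yet running: the UPS carries the load.
    print(active_source(False, True, False, False))
```

Without an EPO, a firefighter who cuts the utility feed only moves the load to the next branch of this function, which matches what the report describes.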
(Title truncated from original "OVHcloud fire report: SBG2 data center had wooden ceilings, no extinguisher, and no power cut-out" to fit HN length limit)
As a former OVH employee, I was constantly reminded to avoid "surqualité" and try to understand OVH's "bricolocracie" better. I never could. I hope this shock will incite the company to improve the quality of its infrastructure and products. I only wish OVH the best.
"Bricolo" is kind of a slang term for "Bricolage" which can be translated to DIY / Tinkering.
-cracy as in the ancient greek word that is used in democracy/meritocracy/...
In this specific context, I believe parent is hinting (in a negative way) about the internal culture of OVH that could be interpreted as not up to professional standards ?
"Bricolage", in a private context (typically DIY light home renovation) doesn't bear any negative connotations. In fact, several hardware store chains use the term (Mr. Bricolage, Brico Depot, Bricorama).
But using the term when describing a product or in a professional context is far more negative, often describing something which doesn't look well made or designed.
Unrelated tidbit of knowledge I learned yesterday. The first fire brigade was created by Julius Caesar's partner/general Crassus. He put together a team of about 500 people to put out fires that happened on a near daily basis in Rome. The catch: he was a land speculator who would flip burnt homes on the cheap, so his brigade would run to a fire, but until the owner sold to Crassus on the cheap they would watch the home burn.
Talk about a hard nose business or completely unethical leverage, wow.
In a datacenter fire, the risk to human life is very small (has anyone ever died in a datacenter fire, apart from being suffocated by a halon system?).
It's just property risk, data loss and service downtime. Therefore, it's a business decision.
I have worked in a business that, during a fire, prioritized maintaining service uptime over putting the fire out. The end result: They had to buy more new servers, but customer workloads were migrated away within 15 minutes and saw no outage. For them, it was the right decision.
Indeed, and they also often enter buildings to look for people who are in harm's way. OVHcloud's omissions did not amount to a good business decision; it was an irresponsible one.
Maybe risk to life was eliminated by a combination of low occupancy, trained staff and adequate escape routes so that there's no reason for firefighters to enter it, and the building was essentially designed to be disposable should the worst happen?
I don't see this as a particularly bad thing - if risk to life and neighbouring properties was correctly managed (and it seems like it was here), why not allow this?
I think the upset is a general feeling, not necessarily at the incident.
Living in a modern era, I think we take for granted that the building regulations prevent this type of disaster, and it looks like they don’t.
Even in an industrial park, one would expect that engineering effort would have gone into fire suppression and safety systems. It’s easy to wave off something as “whatever, it’s an industrial facility”, but skimping on things like shutoff switches is a demonstration that the company was chasing pennies and putting worker lives at risk.
It could have spread, and people could have gotten hurt. If you have a useful head on your shoulders, then you should have a proactive mindset and try to prevent issues instead of sending thoughts and prayers after the fact.
> It could have spread, and people could have gotten hurt.
Could it though? It is a standalone installation on an industrial site. There is about a ten-meter-ish standoff between them and their neighbours (which are also industrial sites).
> If you have a useful head on your shoulders, then you should have a proactive mindset and try to prevent issues
And if you have an even more useful head on your shoulders you calibrate your level of upset to the level of actual danger.
> There are about ten meter-ish standoff between them and their neighbours.
Here in Norway there was a vicious fire[1] in a small town in 2014. It was during winter, and it had been below freezing for months before. If you think 10m is enough to contain a fire, here's a quote from the Wikipedia article:
For a while it was thought that the [village's ice rink] would be a fireline. But the fire made a jump of over 130 meters, and then set fire to a water truck.
If you think 10m is enough of a fire break, you have no experience of fire. The infra-red heat that comes off of a fire is incredibly intense and can easily spread across that distance, especially if pieces of flammable debris were to bridge the gap.
It would be fair enough if that design had been recognised and signed off by the local inspectors; then no one would complain. But it sounds like they were just not doing the risk assessment properly.
Yes, firemen and the technical personnel who had to turn off the power in a room with 1-meter long arcs definitely could have gotten hurt.
Also,
> the level of actual danger
was quite high. 10 meters is nowhere near enough to ensure a fire won't jump over, and it might suit you not to make insinuations on the level of danger if you don't have enough experience in the area you're talking about.
Why is Hacker News uniquely such a hotbed for uninformed libertarianism?
1. There were concerns about lead poisoning that were only quelled with testing. I doubt you have knowledge beyond an environmental protection agency, or a crystal ball.
2. As someone else has already said, a break that small for a fire that large is laughable.
3. Fighting this fire put human life at risk.
4. Fighting this fire also used resources that could’ve been allocated elsewhere, or not at all.
Every business operation does not exist in its own vacuum, and you cannot say that concern is illegitimate just because “nothing bad happened”. Again, you did not have a crystal ball then, and you still don’t.
It’s a pretty common attitude in software and business. A significant number of people mistake their work implementing processes in software as domain expertise. (Why would I talk to a building engineer, I has the Google?) It’s a type of hubris or arrogance.
Often times software people get deep in the weeds and think a lot about a particular problem set, but don’t realize that they are actually working inside a fence and don’t see the bigger picture. It’s just like driving - your skill on a racecourse is scoped by the ability of generations of engineering of the brakes that enable you to go fast.
Nowadays, we have mid career tech people here who have never known anything but building stuff in clouds, and have had everything related to building facilities magically taken care of behind the AWS curtain. Anything related to datacenter facilities is black magic.
I think the sentiment is this was easily avoidable if the company had put in some standard technology. It feels like there were corners cut and maybe even the data center was built in an area that had lax fire standards to lower the cost.
I remember this fire, it was definitely one of the biggest disruptions for us at the time.
> it took three hours to cut off the power supply because there was no universal cut-off.
This seems really egregious given the nature of data centers, I would argue more so than the wooden ceilings, which were treated to survive an hour-long fire. One would think that if the building had been cut off from all electricity, the fire could have been handled much faster.
Is the reimbursement just pro-rated for the service costs (i.e. 1 day downtime = 1/30th discount of the monthly bill) or something more? If it's just service costs, I find it almost inconsequential compared to the costs to the business of having downtime.
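For scale, the pro-rated case in the question works out as below. The function name and the 60.0 monthly bill are made up for illustration; nothing here reflects OVH's actual reimbursement terms.

```python
def prorated_credit(monthly_bill: float, downtime_days: float,
                    days_in_month: int = 30) -> float:
    """Credit for downtime if reimbursement is purely pro-rated."""
    # One day of downtime is worth 1/30th of the monthly bill.
    return monthly_bill * downtime_days / days_in_month


if __name__ == "__main__":
    # Hypothetical 60.0/month server with one day of downtime.
    print(prorated_credit(60.0, 1))
```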
This is cheap self-service hosting, the customer is expected to deal with things like backups themselves. If your business suffers from a single DC going down, it's your fault.
I deal with a ton of spam/phishing/malware that comes from OVH datacenters - OVH does nothing with complaints. Sometimes you really do reap what you sow.
I have been an ovh customer for over 8 years but they kept increasing their prices and lowering their offerings, I finally moved on and have zero sympathy for them.
I gave OVH a chance by suggesting it to a client. Prior to that point, I had been using an OVH dedicated server for personal stuff without any issues.
We opted for OVH Public Cloud and their Managed Kubernetes (which was considered stable at that point). Kubernetes randomly froze up after deployment, requiring manual intervention from the staff. We had no way to restart the Control Plane on our own. It always took 3+ messages / 1h+ on the phone to reach someone capable of handling it. After months of complaining, they didn't address anything. That was late 2020.
What forced us to migrate away was when we got a VM with a broken NVMe SSD that broke PostgreSQL data and OVH refused to acknowledge the problem. Phone support was still practically unreachable. After spending countless hours debugging their crappy services, we just gave up and moved to Oracle Cloud. Best decision ever - their discounts are great and support is excellent. They're clearly trying to challenge the big 3.
As we were leaving OVH, we sent one last email that finally reached someone who gave a crap about customers. They replied that they will "have a manager talk with us"... I don't think that ever happened.
Even if they improve their services, I don't trust them anymore. It was beyond horrible.