Hacker News
Anatomy of a Catastrophic Boiler Accident (1997) (nationalboard.org)
100 points by zetalyrae on Sept 7, 2021 | 47 comments



This tragically illustrates the wisdom of the saying in the U.S. Navy's nuclear propulsion program: You get what you INspect, not what you EXpect.

The Navy's SUBSAFE program [0] arose directly from the 1963 loss with all hands of the USS Thresher, which is generally thought to have resulted from a cascade of damage ultimately traced back to a shipyard's faulty silver brazing of a salt-water pipe [1].

[0] https://en.wikipedia.org/wiki/USS_Thresher_(SSN-593)#SUBSAFE...

[1] https://en.wikipedia.org/wiki/USS_Thresher_(SSN-593)#Cause


Years ago I was offered a job at Portsmouth Navy Yard, and due to the great difficulty they were having hiring they invited me out for a bit of a reverse interview - meetings with their leadership and the grand tour of the facility to try to sell me on taking the position. This was basically an entry level position so the experience was rather unusual.

There are a lot of very interesting things to see there, but one thing that stood out is how they took me and a couple of other prospective employees to see the memorial wall to the Thresher in one of their conference rooms. Even fifty years later it felt that the incident was still well known at Portsmouth and influential on their culture.

Later I would work at facilities handling much more dangerous weapons that seemed to lack anywhere near that level of personal connection to safety culture, despite a legacy that had claimed numerous lives and not only among the enemy.


Slightly off topic, today I learned there is a Naval Yard in Portsmouth, Maine. It genuinely confused me for a second because Portsmouth UK is also a historically important dockyard where BAE Systems, who build the UK submarine fleet, still have an important presence.


It's an especially confusing situation because Portsmouth Navy Yard is kind-of sort-of in Kittery, Maine, not Portsmouth, which is in New Hampshire. It's an island in the river between the two and the only bridge connects it to Kittery, not Portsmouth. This was part of a border dispute between the states: https://en.wikipedia.org/wiki/Piscataqua_River_border_disput...


37 Pings : Death Throes of the USS Thresher

https://www.youtube.com/watch?v=HV5FGTxIU4Q

video is about:

The Navy released the ninth and tenth set of documents from a previously classified investigation into the April 10, 1963 loss of USS Thresher and its crew of 129 sailors off the coast of New England.

https://news.usni.org/2021/07/09/navy-releases-latest-round-...

maybe interesting


Wrong bolts. That matters.

In the private sector in the US, there's The Hartford Steam Boiler Inspection and Insurance Company, established in 1866. They were the first company to insure steam boilers, and they still do. They inspect them before insuring them, and re-inspect at random times thereafter.

They've been trying to expand this approach into "cyber insurance", but with limited success. They will insure the cooling and power systems for your data center, though. They know how to inspect those.


Not the bolts, the wrong NUTS. The bolts were the original: the mechanic wanted to change them. They asked for replacements and were shown a box of parts from which they selected the nuts and re-used the bolts.

From reading the OP it appears the failure was the replacement nuts, not the re-used bolts.

This wasn't a failure of the boiler vessel itself, although the article says that the stop valve was defined as "a boiler-boundary".

BTW 600 psi and 850 F is a lot of heat and energy. UPDATE a post below talks about 1200 psi 975 degree superheated steam.


Something people don't appreciate is that at that temperature a jet of steam is invisible. When we think of steam we are often picturing condensed water vapour, which is really just tiny droplets of water.


Yeah, sounds like brass nuts used instead of steel - they expand about 80% more per degree than steel.
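A rough back-of-the-envelope sketch of that mismatch, for anyone who wants numbers. The two expansion coefficients are typical handbook values that I am assuming for illustration; they are not from the article, and the exact ratio depends on the specific brass and steel alloys. The 850 F figure is the steam temperature mentioned upthread.

```python
# Linear thermal expansion: dL / L = alpha * dT
# ASSUMED typical handbook coefficients (not taken from the article).
ALPHA_BRASS = 19e-6  # 1/K, varies by alloy
ALPHA_STEEL = 12e-6  # 1/K, varies by alloy


def f_to_c(deg_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def relative_growth(alpha: float, t_cold_c: float, t_hot_c: float) -> float:
    """Fractional change in a linear dimension between two temperatures."""
    return alpha * (t_hot_c - t_cold_c)


def nut_bolt_mismatch(t_cold_c: float, t_hot_c: float) -> float:
    """How much more a brass nut grows than a steel bolt, as a fraction of size."""
    brass = relative_growth(ALPHA_BRASS, t_cold_c, t_hot_c)
    steel = relative_growth(ALPHA_STEEL, t_cold_c, t_hot_c)
    return brass - steel


hot_c = f_to_c(850.0)  # steam temperature quoted upthread
mismatch = nut_bolt_mismatch(20.0, hot_c)
extra_percent = (ALPHA_BRASS / ALPHA_STEEL - 1.0) * 100.0

print(f"brass expands {extra_percent:.0f}% more per degree than steel (assumed values)")
print(f"nut grows {mismatch * 100:.2f}% of diameter more than the bolt at {hot_c:.0f} C")
```

With these assumed values the nut outgrows the bolt by about 0.3% of its diameter, which sounds small until you compare it to the depth of thread engagement. It also ignores the fact that brass loses a lot of strength at that temperature.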


Basically all the world over, steam boiler accidents led to the founding of external inspection organizations.

For example, the TÜV[0] in Germany was founded in 1866 after "the explosion of the boiler at the Mannheim Aktienbrauerei in January 1865, the idea was pursued there to subject boilers to regular inspections on a voluntary basis, as was already the case in Great Britain".

[0]: https://en.wikipedia.org/wiki/Technischer_%C3%9Cberwachungsv...


If you enjoyed this, you might also enjoy the US Chemical Safety Board's YouTube channel, where they analyze major industrial accidents and their root causes: https://www.youtube.com/user/USCSB/videos?view=0&sort=p&flow...


I read it, but I can't say I enjoyed it. I worked on a US Navy ship in the early 1970s, with 1200 psi 975 degree superheated steam. A couple years before I came aboard, that ship had suffered a boiler explosion, killing the four men on watch in the after fireroom. This article was too familiar.


Same thing.

As a trainee I was helping a crew supervise a power plant in charge of providing superheated steam to a large nuclear facility.

There was a bronze plaque on the ground in one of the technical rooms where some guys had been cooked alive.

Superheated steam and boilers scare the hell out of me.


The twitter account @swiftonsecurity linked me to these videos once and I've been watching every single one of them ever since. It's very interesting to see how a set of complex systems can disastrously fail because of a small mistake two days earlier or because of some basic human error that anyone could make.


We've almost come to full scale nuclear war, I think more than once, due to failure of singular computer chips in a known failure state.

Functioning modern society/things in general just working (for those of us living in developed countries) is incredibly fragile and regulations are often still taken completely for granted.


The official report has more details https://www.jag.navy.mil/library/investigations/IWO%2520JIMA...

The accident is an example of the rare cases where the tolerance for error has to be tiny to prevent catastrophic consequences. Usually systems are designed with multiple lines of defense, but this is not possible with a steam valve. The process to ensure that the correct fasteners were used was not in place. Ideally in these cases engineers should use poka yoke, where the device can't be assembled incorrectly, e.g. using an unusual thread size or marking all low-strength fasteners in an obvious way.


Ha, I looked up "poka yoke", and found this Medium article:

https://medium.com/@bhavyamangla/error-proofing-poka-yoke-fo...

but the jigsaw pieces with the words "Poka" "Yoke" can be connected in seven incorrect ways :-)


>poka yoke

In an online legal forum, a lawyer said (paraphrasing), "I keep trying to make my master services agreements more and more idiot-proof, but they keep coming up with better and better idiots."


That seems to be based on a quote from the book "The Wizardry Compiled" by Rick Cook:

Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning.


Interesting, I've never run across this term before but that's how we design electrical connectors. Ideally it should be physically impossible to connect a cable to the wrong place, via deliberately varied pin counts, polarity, keying etc.


There is this spectacular rocket failure: https://youtu.be/ycRVAcZC5R4

According to the reports it happened because a technician installed the inertial measurement unit upside down. It was supposedly designed to be impossible to do that, but the technician found a clever way to bend the PCB :)


What I'm surprised by here is that the rocket didn't self destruct as soon as it started tilting - it could have easily flown towards the crowd. Is self destruct not a thing most rockets have?


It’s a race between engineers inventing more and more foolproof devices…


Any time someone designs a fool proof device, someone goes out and designs a better fool.


having done some steam work before... that's the one thing that terrifies me in industrial settings. so much pressure and heat with little margin for error. having worked with the people that install things like that is no comfort at all.


I’m not sure what you have against union pipefitters, they’re probably one of the highest skill building trades. They’re generally at the top of prevailing wage scale, they do clean looking work, and pipefitting is still pretty much exclusively union labor.

I’m on the commercial side, so maybe industrial is different?


I have a relative who is a boilermaker, though now he supervises. Listening to his stories, there's never any shortage of idiots. A good team requires some veterans and some good supervisors, all of whom need to constantly keep an eye on the idiots, lest they kill or maim themselves or someone else. Also, AFAIU, there are often rules requiring a minimum percentage of local union members on a project. So the number of idiots on a project is a function of both the quality of the local membership and how many reliable veterans the national union and contractor can put on a project, assuming a project will even have non-local union members.

While it's much less physically exhausting, he doesn't much like his new job supervising, which entails a lot of walking around and exclaiming, "WTF!?"


Does this link work for you?


Me neither. Looks like he over-escaped the URL somehow ('%2520' is what you get when the '%' in '%20', the escaped space, is itself escaped again), and that's why you're getting that access-denied error (because the over-escaped URL doesn't correspond to an existing file). If I go to https://www.jag.navy.mil/library/investigations/IWO%20JIMA%2... directly, or try out an IA version like https://web.archive.org/web/20110716095835/https://www.jag.n... then it works.
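For the curious, here is a minimal sketch of how that double escaping happens, using Python's standard urllib.parse. The only piece taken from the real link is the "IWO JIMA" fragment visible above; I am not reconstructing the rest of the path.

```python
from urllib.parse import quote, unquote

# Only the fragment of the path that is visible in the thread.
segment = "IWO JIMA"

# Escaping once turns the space into %20, which is what the server expects.
once = quote(segment)

# Escaping the already-escaped string turns the '%' itself into %25,
# so %20 becomes %2520 and the server looks for a file that does not exist.
twice = quote(once)

# Undoing exactly one layer of escaping recovers the working form.
repaired = unquote(twice)

print(once)      # IWO%20JIMA
print(twice)     # IWO%2520JIMA
print(repaired)  # IWO%20JIMA
```

The usual cause is some tool escaping a string that was already a valid URL, so the fix is to escape exactly once, at the point where the raw path is turned into a URL.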


Yep, looks like it's fixed now?


For those looking for an illustration of what exactly the bonnet of a valve is:

https://www.littlepeng.com/single-post/2017/06/29/major-valv...

I hear software engineers frequently talk about things being "dangerous", but IMHO most of the time it's incomparable to the dangers in working with heavy machinery.


Reminds me of this story [1] about the BA flight where a window blew out, nearly costing the life of the pilot who only survived being blown out the cockpit because the cabin crew hung onto him until the plane landed.

[1] https://en.wikipedia.org/wiki/British_Airways_Flight_5390


Another accident comes to mind: sub-specced metal used for steam pipes on the nuclear cruiser Peter the Great. The pipe ruptured, 5 killed. After years of delays during the last years of the USSR and the economic collapse of the early 1990s, the cruiser was rushed to launch in time for the deadline of the Russian Navy's 300th anniversary celebration.


I looked at the Wikipedia article on that ship, and several other articles, and could find no mention of that. Are you sure it wasn't the Kirov?


https://lenta.ru/articles/2004/03/24/petr/ or https://www.kommersant.ru/doc/242286 (the latter, written right at the time, reports 4 dead and 2 seriously injured)

and it wasn't the only mishap, though the deadliest.


Ah, no wonder I couldn't find it--it's in Russian. Thank goodness for machine translation, and thank you for the links!


For similar and excellent write-ups of the causes and consequences of air crashes, I highly recommend Admiral Cloudberg's write-ups: https://admiralcloudberg.medium.com/


> First, the mechanic wanted to replace the fasteners, but he did not have any. He also did not speak English very well. Allegedly, the mechanic asked one of the boiler room personnel for new nuts and bolts, and was given permission to look through the boiler room's spare parts bins. He selected parts that he thought would work.

This makes it sound like the mechanic was out of his depth, if he didn't know about the importance of nut material in a high temperature application.

The subsequent failures in inspection are not excused, of course.


Differential expansion is well known to most mechanical engineers - it frequently is the cause of leaks or issues.

However, I have never heard of nuts and bolts suffering differential expansion so badly as to cause a total failure. I would attribute at least half the blame to the original designer for not having enough dimensional and/or strength margin on the threads.

Saturated 600 psi steam is only at roughly 250 degrees Celsius, which really isn't much thermal expansion for metals.


I worked on a ship in which there had been a steam pipe rupture. Injured several sailors, caused deaths of two in the hospital from severe burns.

There is so much that can go wrong in engineering systems. Unintentional mistakes could lead to dire consequences. Safety and quality assurance programs may seem like time consuming inconveniences, but have a very important role.


In 2020 there are very few uses left for steam boilers.

Steam systems still have plenty of uses, but there is no longer any need for storage of large quantities of high pressure steam in a boiler.

Steam should be treated like electricity - generate and use it at exactly the same rate. That way, you need no storage, and if a pipe bursts there won't be an explosion.


The rate of steam generation can't be controlled in the same way as electricity, no? As I think about it, I suppose electricity generation also can depend on large amounts of moving water - maybe it would work by having extra capacity that is 'bled off' when not needed.


Steam isn't exactly "stored", but the pressures involved mean that any small breach of confinement becomes dangerous.


oh yeah - it's still dangerous. But the danger changes to "will probably kill one person" rather than "will kill everyone in the ship/building"


>It is especially important in today's environment of cost-cutting and increased profit margins that safety not be sacrificed.

Was there ever a time where the environment was not one of cost cutting and increased profit margins?


I think there are two challenges here:

1) Safety systems work so well that people get complacent. "Approval of the fasteners is required, so I'm not going to get out a flashlight and mirror and double check."

2) At one point, many failure modes were totally unknown. Someone discovers them for the first time. You can have a comprehensive safety program that's well funded and always performed correctly, but if there is a failure mode that nobody knows about, it's as likely to happen to you as it is to someone else.

And hey, at least people give safety lip service. Nobody ever posts signs that say "cost cutting is our #1 priority", they always say that safety is their #1 priority. Their heart's in the right place at the very least ;)


Especially when tied to expensive compensation, it then becomes a no-brainer to put safety first as it's cheaper. I do wish laws were designed like this to inculcate a safety-focused culture.



