snippet summary: the preliminary failure cause appears to be a strut holding down a helium bottle inside the second-stage oxygen tank, which failed well below its rated load and released helium into the oxygen tank, over-pressurizing it to failure.

snippet:

"Preliminary conclusion is that a COPV (helium container) strut in the CRS-7 second stage failed at 3.2Gs. A lot of data was analysed, it took only 0.893 seconds between first sign of trouble and end of data. Preliminary failure arose from a strut in the second stage liquid oxygen tanks that was holding down one composite helium bottle used to pressurize the stage. High pressure helium bottles are pressurized at 5500 psi, stored inside in LOX tank. Several helium bottles in upper stage. At ~3.2 g, one of those struts snapped and broke free inside the tank. Buoyancy increases in accordance with G-load. Released lots of helium into LOX tank. Data shows a drop in the helium pressure, then a rise in the helium pressure system. Quite confusing. As helium bottle broke free and pinched off manifold, restored the pressure but released enough helium to cause the LOX tank to fail. It was a really odd failure mode."



Indeed, it appears that they had to test thousands of struts. Most passed but they found one that failed. Microscopic examination showed bad grain structure. They're made by a vendor so it would appear that vendor has a quality control problem.

The delays will be in setting up a QA process to test each individual strut part, as well as eventually testing all vendor-supplied parts. If anyone can do it, SpaceX can. No reason they can't design robot systems to test every part.


I worked for a while for a vendor that produced some critical components for ... let's say major entities. NASA, US military, Boeing, and so on. I was in electrical QA, the next-to-last step before shipping. My job was to electrically scrutinize individual pieces (in some cases) or random samples from a batch (depending on how much the customer was paying) and compare their output to the customer's spec.

If your stuff is mission critical, if anybody could potentially die, you really can't trust the parts from these vendors. A few of their smarter customers would repeat all of our tests and send back defective parts. The thing was, we had a lot of borderline stuff come through from bad production (poorly paid or poorly trained staff or defective tools or materials), and once some of these parts hit QA, they had a lot of expense sunk into them. The company didn't want to eat that cost, so there were a lot of arguments between myself and the general manager. I failed a lot of stuff that previous people in my position had let slide.

They also had a really stupid hockey-stick output graph each month. The beginning of the month was slow, we were all cleaning our work areas and retesting our test equipment, and then in the last week of the month they'd try to produce 90% of their expected output for the month. Because of my reputation for rejecting stuff, the GM would hover over my work area for the last day or two each month.

Given the size of the company I worked for, I have to assume this is not uncommon practice.

It was a heck of an experience, I finally got a better understanding for why so many things seem to break all the time.


I'm bookmarking your comment. It is just the slice-of-life that I want to show newbie engineers. Like some people say you need to spend a year or two in the service industry to learn empathy, I feel engineers likewise need to spend time in QA to learn what their ethics really are. QA is hard, and the pressure to pass is difficult to withstand; eventually you take the "my boss told me to do it" attitude, or you learn to make ... well, not enemies exactly, but you certainly learn to rock the boat.

Kudos to you!


> I feel engineers likewise need to spend time in QA to learn what their ethics really are. QA is hard, and the pressure to pass is difficult to withstand

You really nailed it. The GM's position -- and he said this more than a few times -- was that the parts were designed with extra tolerances already, so if they were a little below spec it was OK.

Engineers have to keep that in mind when designing products: production knows there's a margin for error and they'll take that into consideration when deciding whether or not they can get away with shipping something.

(And the GM was a pretty OK guy, we got along fine otherwise. He in turn was just under a lot of pressure from further up the ladder to meet certain production goals.)


“Look at this. What do you see?” He nodded at Tony again.

“A laser weld, sir.”

“So it would appear. Your identification is quite understandable --- and quite wrong. I want you all to memorize this piece of work. Look well. Because it may easily be the most evil object you will ever encounter.”

They looked wildly impressed, but totally bewildered. He commanded their absolute silence and utmost attention.

“That,” he pointed for emphasis, his voice growing heavy with scorn, “is a falsified inspection record. Worse, it’s one of a series. A certain subcontractor... found its profit margin endangered by a high volume of its work being rejected... The welds passed the computer certification all right --- because it was the same damn good weld, replicated over and over again...”

He gathered his breath. “This is the most important thing I will ever say to you. The human mind is the ultimate testing device... There is nothing, nothing, nothing more important to me in the men and women I train than their absolute personal integrity. Whether you function as welders or inspectors, the laws of physics are implacable lie detectors. You may fool men. You will never fool the metal.”

--- Lois McMaster Bujold, _Falling Free_


I was really pleased that they found a strut in their current batch that would have failed in a similar way. That, for me, is key to the confidence on this root cause. If they had been unable to find one it would have remained "theoretically possible but we don't know how", now it is "if the strut is made improperly it can fail."


Dad worked at a firm that produced high-reliability capacitors (they went into the IBM System/360, etc.) and they did 100% testing. IBM would do their own 100% testing upon receipt, and they would still find a few rejects each month. They investigated these, of course, but never really found a cause. They chalked it up to delayed yield problems from the production line.

The funny thing was that Delphi (the electronics subsidiary of General Motors) was also a customer. They wouldn't pay for more than normal statistical sampling. When he visited them, they had large bins of defective car radios that wouldn't turn on, etc. To Delphi it was cheaper to run the production line flat out and deal with the failures than to find problems earlier by inspecting received parts.
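The tradeoff that comment describes can be made concrete with a toy expected-cost comparison. Every rate and cost below is a made-up assumption, chosen only to show the shape of the decision:

    # Toy model: 100% incoming inspection vs. statistical sampling.
    # All numbers are assumed placeholders, not data from the comment.
    n_parts = 100_000          # parts received per month
    defect_rate = 0.001        # assumed incoming defect fraction
    test_cost = 0.50           # assumed cost to test one part ($)
    escape_cost = 40.0         # assumed cost of a defect that ships ($)
    sample_fraction = 0.02     # assumed sampling plan: test 2% of parts

    # Option A: test everything; essentially nothing defective escapes.
    cost_full = n_parts * test_cost

    # Option B: sample; untested defects escape and are dealt with downstream.
    escapes = n_parts * (1 - sample_fraction) * defect_rate
    cost_sample = n_parts * sample_fraction * test_cost + escapes * escape_cost

    print(f"100% inspection: ${cost_full:,.0f}")    # ~$50,000
    print(f"Sampling:        ${cost_sample:,.0f}")  # ~$5,000

Under these assumptions sampling wins by an order of magnitude, which is the Delphi/GM logic; make one escaped defect cost a rocket (or a life) and the comparison flips immediately.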


Listen to the "This American Life" episode about NUMMI, and you'll see that this approach was endemic to GM assembly and its supply chain for, like, ever.

http://www.thisamericanlife.org/radio-archives/episode/561/n...


Thank you for this comment! So simple, yet so deep. Sunk costs.


Great story, thaumaturgy! Does anyone know how much more rigorous Statistical Process Control [0] needs to be for aerospace than for consumer electronics (for example)? I wouldn't be surprised if a 1/1000 failure rate is A-OK for consumer hardware but catastrophic for aerospace (as we see with SpaceX).

[0] https://en.wikipedia.org/wiki/Statistical_process_control
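For a rough sense of why a per-part rate that's tolerable for gadgets is fatal for rockets, here's a back-of-the-envelope sketch; the 1/1000 rate comes from the comment above, while the part counts and the independence assumption are mine:

    # P(at least one of n critical parts is defective), assuming independent
    # defects at rate p.  Both values are illustrative, not real SPC numbers.
    def prob_any_failure(p: float, n: int) -> float:
        return 1.0 - (1.0 - p) ** n

    p = 1 / 1000    # per-part defect rate from the comment above
    for n in (1, 10, 100, 1000):
        print(f"{n:5d} critical parts -> P(at least one bad) = "
              f"{prob_any_failure(p, n):.1%}")

One part at 1/1000 is a perfectly fine return rate for a consumer gadget; a thousand such parts on one vehicle gives roughly a 63% chance of at least one bad one, which is hopeless when a single bad strut ends the mission.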


Hmm, the takeaway I see here is to find out the period each vendor's quota is measured in, and slot your orders in early.


> setting up a QA process to test each individual strut part

Can they actually do that? Aren't most of those tests destructive?


If it's rated to 6000lbs, and you test it at 6000lbs, it certainly _shouldn't_ be destructive.


It should spring back as long as the test stays below the yield point: https://en.wikipedia.org/wiki/Yield_(engineering) ... but there is also fatigue: https://en.wikipedia.org/wiki/Fatigue_(material) . This is just for the raw material, though; I'm not sure how it works for the connectors.


If you only do a few cycles under the yield point, fatigue isn't going to be an issue. If you ran 10^5 or more cycles on a fatigue-critical part, then I'd start to question it.
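A minimal sketch of the proof-test logic being discussed, with an assumed strut geometry and material (none of these values come from SpaceX; swap in real ones and the structure of the check stays the same):

    # Check that a proof load keeps an assumed strut below its yield stress.
    # Geometry and material properties are assumptions for illustration only.
    import math

    rod_diameter_m = 0.016        # assumed 16 mm solid steel rod
    yield_strength_pa = 250e6     # assumed mild-steel yield strength (250 MPa)
    proof_load_lbf = 6000         # proof load discussed upthread
    proof_load_n = proof_load_lbf * 4.448

    area_m2 = math.pi * (rod_diameter_m / 2) ** 2
    stress_pa = proof_load_n / area_m2
    print(f"Proof stress: {stress_pa / 1e6:.0f} MPa "
          f"({stress_pa / yield_strength_pa:.0%} of assumed yield)")

If the proof stress stays comfortably below yield, the part deforms elastically and springs back, so the test is non-destructive; and a handful of such load cycles is far too few for fatigue to matter.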


Those struts are also used in the stages that are about to be reusable, so my guess is that they're supposed to withstand the flight conditions multiple times. That would suggest the equivalent tests should also not be destructive to the material.


The post said they're rated to 10,000lb and failed at 2000. Sounds like there's a lot of margin between the highest force expected and the rated level, so they could test somewhere between the two.


From what I understand, they are designed for 10,000 lb, rated/certified for 6,000 lb, and only need to withstand about 2,000 lb in flight. They found one/some struts in stock that failed at/below 2,000 lb. Does anyone have another interpretation? It seems like every place I read, one of those numbers is different.
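Taking those three numbers at face value (and, per the comment above, they may well be garbled), the margins work out like this:

    # Safety-margin arithmetic using the figures quoted in this subthread;
    # the figures themselves are from the discussion, not verified.
    design_lb = 10_000   # claimed design load
    cert_lb = 6_000      # claimed rated/certified load
    flight_lb = 2_000    # claimed maximum expected flight load

    print(f"Certified margin over flight load: {cert_lb / flight_lb:.1f}x")   # 3.0x
    print(f"Design margin over flight load:    {design_lb / flight_lb:.1f}x") # 5.0x

A part with a nominal 3x to 5x margin failing at or below the expected flight load points at a material or QC defect rather than an under-designed strut, which is consistent with the bad grain structure mentioned at the top of the thread.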



