I'd love to see a full post-mortem. It smells like something was very off in either the hardware design or software configuration, but not knowing their architecture, it's very hard to say with any certainty what could have been improved. A couple of questions I have:
- other systems I know of that care deeply about their attitude have multiple redundant sensors in place to "vote" on a consensus output in case one or more of them fails. Was that the case in this hardware design? If not, why not? If yes, how did the collective answer end up a constant error?
- did they have other sensors (such as a strain gauge) that could have been integrated into the model to spot-check this kind of failure mode? A rule like "If the satellite 'feels' like it's tearing itself apart, stop accelerating" could perhaps have been useful. On the other hand, it'd leave the craft vulnerable to other known failure modes, such as "thruster stuck in the on position and must be countered by another thruster to keep the craft stable," which almost killed the crew of one of the U.S. manned missions (Gemini 8).
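To make the "voting" idea in the first question concrete, here is a minimal sketch of a median voter over redundant sensors. It is not based on anything known about this satellite's architecture: the function name `vote`, the `tolerance` parameter, and the sample gyro readings are all hypothetical, and it treats each reading as a single scalar rate about one axis.

```python
from statistics import median


def vote(readings, tolerance):
    """Return (consensus, suspects) for a set of redundant sensor readings.

    consensus is the median reading; suspects lists the indices of sensors
    whose reading differs from the consensus by more than tolerance.
    Both names are illustrative, not taken from any real flight software.
    """
    if len(readings) < 3:
        raise ValueError("need at least three readings to outvote a single fault")
    consensus = median(readings)
    suspects = [i for i, r in enumerate(readings) if abs(r - consensus) > tolerance]
    return consensus, suspects


# Three hypothetical rate gyros (deg/s); the third is stuck at a large constant value.
consensus, suspects = vote([0.02, -0.01, 21.7], tolerance=0.5)
# consensus == 0.02, suspects == [2]
```

A voter like this only helps against independent faults. If every channel shares the same bad input or the same bad estimate, the median is just as wrong as each sensor, which is one way a "collective answer" could still end up a constant error.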
Something as simple as software that prevents extended firings of a thruster for any reason would have worked. A LEO satellite is constantly cycling through night and day, so it isn't in danger of draining its batteries in safe mode, no matter what orientation it is in. LEO satellites have low-bandwidth TT&C (tracking, telemetry, and control) radio systems with omnidirectional antennas in the L and S bands that don't particularly care about the orientation of the satellite. Code as simple as "if a thruster tries to fire for longer than some set period of time, raise an exception and place the satellite in safe mode" would have worked. Using ground-based TT&C systems, it's possible to manually reorient a satellite in safe mode, or query what its star tracker sees.
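The rule quoted above could look roughly like the sketch below. Everything in it is an assumption for illustration: the names `ThrusterWatchdog`, `ThrusterTimeout`, and `control_step`, the 5-second limit, and the idea that the control loop calls the watchdog once per step with a monotonic timestamp. Real flight software would be structured very differently.

```python
class ThrusterTimeout(Exception):
    """Raised when a thruster has been firing longer than the allowed limit."""


class ThrusterWatchdog:
    """Tracks one thruster and flags any continuous burn above max_burn_s seconds."""

    def __init__(self, max_burn_s):
        self.max_burn_s = max_burn_s
        self._burn_start = None  # time the current burn began, or None if idle

    def update(self, firing, now_s):
        if not firing:
            self._burn_start = None  # burn ended, reset the timer
            return
        if self._burn_start is None:
            self._burn_start = now_s  # a new burn starts on this step
        if now_s - self._burn_start > self.max_burn_s:
            raise ThrusterTimeout(
                f"thruster firing for {now_s - self._burn_start:.1f} s "
                f"(limit {self.max_burn_s:.1f} s)"
            )


def control_step(watchdog, firing, now_s):
    """Return the mode after this step: 'nominal', or 'safe' if the watchdog trips."""
    try:
        watchdog.update(firing, now_s)
    except ThrusterTimeout:
        return "safe"  # hypothetical hand-off: stop thrusting, wait for ground contact
    return "nominal"


# Hypothetical 5-second limit on any single continuous burn.
wd = ThrusterWatchdog(max_burn_s=5.0)
modes = [control_step(wd, True, t) for t in (0.0, 4.0, 6.0)]
# modes == ["nominal", "nominal", "safe"]
```

The check is deliberately independent of the attitude estimate, so it still trips when the estimate itself is what's wrong. The trade-off is that a legitimate long burn needs an explicit override.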