I wonder if a physics simulator would have predicted this outcome, and if in fact they have such a simulator for testing both the hardware and software together.
It wasn't a matter of them not knowing what would happen if the satellite was stressed; it was a case of bad data that kept reporting that the satellite was spinning. The software then tried to correct the spin, which resulted in an actual spin in the opposite direction that kept accelerating, since the software was under the impression that the corrections weren't working.
I'd love to see a full post-mortem. It smells like something was very off in either the hardware design or software configuration, but not knowing their architecture, it's very hard to say with any certainty what could have been improved. A couple of questions I have:
- other systems I know of that care deeply about their attitude have multiple redundant sensors in place to "vote" on a consensus output in case one or more of them fails. Was that the case in this hardware design? If not, why not? If yes, how did the collective answer end up a constant error?
- did they have other sensors (such as a strain gauge) that could have been integrated into the model to spot-check this kind of failure mode? A rule like "If the satellite 'feels' like it's tearing itself apart, stop accelerating" could perhaps have been useful (on the other hand, it'd leave the craft vulnerable to other known failure modes, such as "thruster stuck in the on position and must be countered by another thruster to keep the craft stable," which almost killed one of the U.S. manned missions).
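To make the "voting" idea concrete, here is a minimal sketch of median voting over redundant rate sensors. It is only an illustration: the function name `vote_rate`, the tolerance and the sample readings are made up, and nothing here reflects the actual spacecraft design.

```python
from statistics import median


def vote_rate(readings, tolerance):
    """Median-vote redundant rate readings (deg/s).

    Returns (consensus, suspects), where suspects lists the indices of
    sensors that disagree with the median by more than `tolerance`.
    Hypothetical helper, not taken from any real flight software.
    """
    if len(readings) < 3:
        raise ValueError("need at least three sensors to outvote one failure")
    consensus = median(readings)
    suspects = [i for i, r in enumerate(readings)
                if abs(r - consensus) > tolerance]
    return consensus, suspects


# Made-up readings: the third sensor has drifted away from the other two.
consensus, suspects = vote_rate([0.001, 0.002, 0.006], tolerance=0.003)
```

Note that voting only protects against independent failures. If every sensor shares the same bias or the same wrong configuration, the vote will happily agree on the wrong answer.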
Something as simple as software to prevent extended firings of a thruster for any reason would have worked. A LEO satellite is constantly exposed to night/day cycles and isn't in danger of draining its batteries in safe mode, no matter what orientation it is in. LEO satellites have low-bandwidth TT&C (telemetry, tracking, and control) omnidirectional antennas and radio systems in the L and S bands that don't particularly care about the orientation of the satellite. Code as simple as "if a thruster tries to fire for longer than some set period of time, raise an exception and place the satellite in safe mode" would have worked. Using ground-based TT&C systems it's possible to manually reorient a satellite in safe mode, or query what its star tracker sees.
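A rough sketch of that rule, assuming a fixed-rate control loop. The names `ThrusterWatchdog`, `control_cycle`, `max_burn_s` and the `enter_safe_mode` callback are all hypothetical, and the 5 second limit is an arbitrary number for illustration.

```python
class ThrusterTimeout(Exception):
    """Raised when a thruster has fired continuously for too long."""


class ThrusterWatchdog:
    def __init__(self, max_burn_s):
        self.max_burn_s = max_burn_s  # assumed limit, would need tuning
        self.burn_s = 0.0             # length of the current continuous burn

    def update(self, firing, dt_s):
        """Call once per control cycle with the commanded valve state."""
        # Accumulate while firing; any pause resets the counter.
        self.burn_s = self.burn_s + dt_s if firing else 0.0
        if self.burn_s > self.max_burn_s:
            raise ThrusterTimeout(
                f"continuous burn of {self.burn_s:.1f} s exceeds limit")


def control_cycle(watchdog, firing, dt_s, enter_safe_mode):
    """Return the valve command that is actually allowed this cycle."""
    try:
        watchdog.update(firing, dt_s)
    except ThrusterTimeout:
        enter_safe_mode()  # hypothetical hook into the mode manager
        return False       # force the valve closed
    return firing


# Usage: the controller asks for a burn on six consecutive 1 s cycles.
events = []
wd = ThrusterWatchdog(max_burn_s=5.0)
allowed = [control_cycle(wd, True, 1.0, lambda: events.append("safe"))
           for _ in range(6)]
```

The first five cycles are allowed and the sixth trips the watchdog. A real implementation would also have to decide what "safe mode" may do with the thrusters, since a safe mode that itself fires thrusters can run into the same problem.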
So was it a faulty sensor then, not bad software? It's not really clear.
> but its data apparently was wrong, reporting a rotation rate of 20 degrees per hour, which was not occurring. The satellite attempted to stop this erroneous rotation using reaction wheels. The satellite configuration information uploaded earlier was wrong and the reaction wheels made the spin worse.
So were the data and the configuration both wrong, or are these one and the same?
It depends on what you call a "physics simulator". I don't know what they do at JAXA, but I would guess that they have at least some kind of simulator. But one thing that is difficult to avoid is feeding your simulator's configuration with the same values as your satellite.
From what I understood, the failure (the final one, in safe mode, which is not the initial error but the one which killed the satellite) seems to be an error in the thrust parameters. Somewhere in the satellite software, you evaluate your rotation speed. In safe mode, you probably want to nullify this speed. Let's say that you have measured 1 deg/s around the X axis. The controller will say "give me a torque of -10 N·m around X" (10 is a made-up value, and I'm not taking the inertia tensor into account).
The next stage of the software will convert this "-10 N·m around X" into valve openings for one or more thrusters. To do this conversion, it must know how much torque is generated by each thruster. This information must be on board, but to compute it, you need: each thruster's position, each thruster's direction, each thruster's force magnitude, and the position of the satellite's center of mass.
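As a sketch of what goes into that per-thruster number: the torque about the center of mass is the lever arm crossed with the force, (r - r_com) x (F * d). The function names and the example geometry below are invented for illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def thruster_torque(position, direction, force_n, center_of_mass):
    """Torque (N*m) about the center of mass produced by one thruster.

    `direction` is assumed to be the unit vector of the force acting ON
    the spacecraft, i.e. opposite to the exhaust flow. Mixing up those
    two conventions flips the sign of the result.
    """
    lever = tuple(p - c for p, c in zip(position, center_of_mass))
    force = tuple(force_n * d for d in direction)
    return cross(lever, force)


# Made-up geometry: a 2 N thruster mounted 1 m along +Y, pushing along +Z.
torque = thruster_torque(position=(0.0, 1.0, 0.0),
                         direction=(0.0, 0.0, 1.0),
                         force_n=2.0,
                         center_of_mass=(0.0, 0.0, 0.0))

# Same thruster, but with the exhaust direction entered by mistake.
flipped = thruster_torque(position=(0.0, 1.0, 0.0),
                          direction=(0.0, 0.0, -1.0),
                          force_n=2.0,
                          center_of_mass=(0.0, 0.0, 0.0))
```

Here `torque` is 2 N·m about +X and `flipped` is 2 N·m about -X. All four inputs come from different people, which is why each of them is a chance for a convention mismatch.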
It's not a problem: this information is available somewhere, in a database. The propulsion engineer, together with the people doing the CAD model and the people computing the mass parameters, have these values. Someone has also checked them.
Now imagine a scenario where this has been tested but the error has not been found. I'm not saying it happened like this, but I've encountered this kind of situation. Fortunately, always before launch ;-)
Let's look at the simulator side. What does it need to simulate the actuation of this thruster? Basically, the same information. If the valve is open for t seconds, the torque is r x F and the angular impulse delivered is t*(r x F), with r the thruster position relative to the center of mass. Where does it get these values? Same database: it's the same spacecraft, the same thruster, the same position, the same direction, ...
It absolutely makes sense to not duplicate this value.
But what happens if the value is wrong? For example, if the database says that you should have the thrust vector direction, but you put in the direction of the mass flow? You just get -T instead of T for this thruster, both in the simulator and in the software. Not a problem, your simulation can still work. (If you do it for only one thruster, it may not work, as you may lose control around one axis. But if you do it for all of them, or maybe for just two opposite thrusters, it still works. It may or may not be noticed that it does not make sense.)
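A toy single-axis example of why the simulator does not catch this. It is a deliberately crude model (one axis, one thruster, no inertia, made-up gain), and `run_loop` and its parameters are hypothetical names.

```python
def run_loop(controller_sign, plant_sign, rate=1.0, gain=0.5, steps=20):
    """Damp a rotation rate on one axis and return the final rate.

    controller_sign: sign of the torque per unit thruster command as the
                     on-board software believes it (from the database).
    plant_sign:      sign the "world" actually applies (the simulator's
                     model, or the real hardware).
    """
    for _ in range(steps):
        desired_change = -gain * rate               # reduce the rate
        command = desired_change / controller_sign  # on-board conversion
        rate += plant_sign * command                # plant response
    return rate


DATABASE_SIGN = -1.0  # wrong entry: exhaust direction instead of thrust
TRUE_SIGN = +1.0      # what the hardware really does

# The simulator reads the same database as the flight software.
sim_rate = run_loop(controller_sign=DATABASE_SIGN, plant_sign=DATABASE_SIGN)

# In orbit, the hardware does not care what the database says.
flight_rate = run_loop(controller_sign=DATABASE_SIGN, plant_sign=TRUE_SIGN)
```

In the simulated run the two sign errors cancel and the rate halves every step; in the flight run the same command stream multiplies the rate by 1.5 every step. The only way to see the difference on the ground is to compare against something that does not come from the shared database, such as an end-to-end polarity test on the hardware.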
The verification of this database is an extremely difficult (and boring, IMHO) job. There are traps everywhere. You have this kind of sign error; you need to know whether the rotation matrix goes from reference frame A to B or from B to A; does a 1 in a bit field mean actuating your RW CCW or CW; is the tachometer of the wheel oriented in the same direction; how are the thrusters numbered from 1 to X (bonus points if you have several different numbering schemes because the software guy, the electrical guy and the mechanical guy each use a different one...). And this database is huge. (The people managing it deserve respect, but we prefer to yell at them because they are always late. Due to our own input, of course.)
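The rotation matrix trap is easy to show with a made-up entry. For a rotation matrix the inverse is the transpose, so reading an "A to B" matrix as "B to A" silently gives a valid-looking but wrong answer. The matrix and helper names below are illustrative only.

```python
def matvec(m, v):
    """Multiply a 3x3 matrix (tuple of rows) by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def transpose(m):
    """Transpose of a 3x3 matrix; for a rotation this is its inverse."""
    return tuple(tuple(m[j][i] for j in range(3)) for i in range(3))


# Hypothetical database entry: a 90 degree rotation about Z.
# The entry does not say whether it maps frame A to B or B to A.
R = ((0.0, -1.0, 0.0),
     (1.0,  0.0, 0.0),
     (0.0,  0.0, 1.0))

x_axis = (1.0, 0.0, 0.0)

as_stored = matvec(R, x_axis)               # one reading of the entry
as_inverse = matvec(transpose(R), x_axis)   # the other reading
```

The two readings send the X axis to +Y and -Y respectively. Both are perfectly good rotations, so only a check against a vector whose coordinates are known independently in both frames can tell which convention the entry was written in.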