The grid is supposed to tolerate any single failure, even under full load. Of course sometimes the first failure is a fire or equipment malfunction and the second failure is a planning failure or someone pressing the wrong button.
Cascade failures are common. The weakest link fails, load gets re-distributed evenly, the second-weakest link fails and so on.
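A toy sketch of that chain reaction, under assumptions that are mine and not a real grid model: load is shared evenly among surviving lines, a line trips the moment its share exceeds its capacity, and the function name, parameters, and numbers are made up for illustration.

```python
def simulate_cascade(capacities, total_load, first_failure):
    """Return line indices in the order they fail (toy model, even load sharing)."""
    # Surviving lines after the initial fault (fire, malfunction, wrong button).
    alive = {i: c for i, c in enumerate(capacities) if i != first_failure}
    failed = [first_failure]
    while alive:
        share = total_load / len(alive)       # load is re-distributed evenly
        weakest = min(alive, key=alive.get)   # weakest surviving link
        if alive[weakest] >= share:
            break                             # everyone can carry their share
        failed.append(weakest)                # weakest link trips, repeat
        del alive[weakest]
    return failed


# Four lines carrying 25 units each, all with spare capacity before the fault.
print(simulate_cascade([30, 32, 35, 40], 100, first_failure=0))
# With more headroom per line, the same fault stays contained.
print(simulate_cascade([40, 40, 40, 40], 100, first_failure=0))
```

In the first case every line had 20% or more headroom before the fault, and one failure still takes out all four. In the second, the remaining three lines absorb the load and the cascade stops.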
In theory [a flawed one] you have enough spare capacity to survive N failures, and N+1 failures are statistically unlikely because p^(N+1) is close to zero, where p is the probability of a single failure.
In practice [or with a better theory] you can't multiply probabilities in a grid system because the failures aren't independent random variables. 30% spare capacity can go to -100% in a second.
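A back-of-the-envelope sketch of why the multiplication breaks down. The model is an assumption of mine: on top of independent failures with probability p each, there is a single common-cause event with probability q (a storm, a bad setting pushed everywhere) that takes out every component at once. The numbers are illustrative, not measured.

```python
def p_all_fail_independent(p, n):
    """Chance that all n+1 components fail if failures are independent."""
    return p ** (n + 1)


def p_all_fail_common_cause(p, n, q):
    """Same, plus a common-cause event (probability q) that fails everything."""
    return q + (1 - q) * p ** (n + 1)


p, n, q = 0.01, 2, 0.001  # hypothetical values
naive = p_all_fail_independent(p, n)
correlated = p_all_fail_common_cause(p, n, q)
print(f"independent: {naive:.2e}")
print(f"with common cause: {correlated:.2e}")
print(f"ratio: {correlated / naive:.0f}x")
```

Even a rare shared cause dominates the result: the p^(N+1) term becomes a rounding error next to q, so the real risk is set by how correlated the failures are, not by how many spares you have.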
Grey failures are harder for large systems to handle. If a chunk goes hard down, that's usually easy to deal with. Something like voltage oscillations, which trip one component after another, can set off self-reinforcing feedback loops that bring it all down.