Unless I'm misunderstanding something, the system did not perform as documented. It should have scaled; it didn't.
When a critical piece of infrastructure fails under massive load, I'm not sure it'll help much to politely tell your engineers they fucked up for not anticipating it.
You learn lessons. Both Slack and AWS seem to have learnt lessons here.
I agree with much of what you say, but if you change it to "It's Amazon's fault, not ours", that's where I diverge.
Slack did fuck up here, as evidenced by the outage, and you seem to at least partially agree, given that you say Slack learned a lesson. Further, I think that "understanding how your system scales up from a low baseline to a high level of utilization (such as Black Friday/Cyber Monday for e-commerce, special event launches, or a Super Bowl ad landing page)" is a standard, par-for-the-course cloud engineering topic to be on top of nowadays.