> Amazon does a good job of training new hires on the 'Amazon way'. Amazon does 6-pagers; they do design docs. Amazon does SOA. Amazon does not use relational databases. Everything has an API. Because of the 'Amazon way' and the training they do, new team members understand at least some of the context and expectations.
As a counterpoint, a huge part of Amazon's culture (or at least AWS's) in my experience was the emphasis on operations and the fact that there was no separation between SREs/on-call engineers and the people who implement the services. For me, as someone who had never been on-call in any meaningful capacity (my previous job was working on libraries rather than services), the training for it was basically non-existent on the two teams I spent time on. The "training" I did receive consisted of being put on the rotation once to shadow, where I could sort of see what the actual on-call person did, but without any real explanation of how to do those things myself, other than being told to read the runbooks, which were not written in a way that was easy to understand for someone so new to the internal AWS tooling and to ops in general. The next time I came up on the rotation, I was expected to manage on my own, which meant that no matter what occurred, I ended up having to escalate, because I wasn't knowledgeable enough to fix anything within a reasonable timeframe.
Which is the only way to learn, tbh. You can receive as much positive reinforcement as imaginable, but nothing prepares you for a large-scale incident like living through one: building the connections you need to solve it, getting the shame of your life, and losing sleep over your failure.
I'm not really sure what you mean by "positive reinforcement", but I don't think it's possible to disagree more with this sentiment. "Building the connections you need to solve it, getting the shame of your life, and losing sleep over your failure" isn't a strategy for teaching something; it's a coping mechanism for someone trying to brute-force their way through something they weren't adequately trained for.
Most people seem to think it's fine for companies to offload the entire burden of learning onto individual employees, and maybe I'm an outlier in this regard, but to me this seems more like a cop-out to avoid actually solving the problem, at the cost of the employee's emotional health. I'm not surprised that companies default to this, but it's also not surprising that burnout is so common in our industry when this is considered the "best" or "only" way to do things.
My team didn't add new members to the oncall rotation for about 6 months to ameliorate this issue. But starting oncall is rough at first, and even with months of context on our systems, people usually take a few rotations before they really figure it out. We expected new members to have to escalate.
I don't think this really ameliorates the issue much; it just pushes the problem down the line. IMO this is a big part of why people transfer internally between teams so much at Amazon, which masks the problem even further. If you expect someone to be self-sufficient after six months of on-call rotation, but they only start the rotation after six months on the team, they'll have been on the team a year by that point, and people transferring or leaving the company after a year on average isn't going to be immediately obvious as a problem. But if people start a few weeks in, you'll have 5-6 months of noticing that there are issues when that person is on-call, and if that happens more than a few times, the trend will be noticeable.