> Amazon does a good job of training new hires on the 'Amazon way'. Amazon does 6-pagers; they do design docs. Amazon does SOA. Amazon does not use relational databases. Everything has an API. Because of the 'Amazon way' and the training they do, new team members understand at least some of the context and expectations.
As a counterpoint, a huge part of Amazon's culture (or at least AWS's) in my experience was the emphasis on operations and the fact that there was no separation between SREs/on-call engineers and the people who implement the services. For me, as someone who had never been on-call in any meaningful capacity (my previous job was working on libraries rather than services), the training for it was basically non-existent on the two teams I spent time on. The "training" I did receive consisted of being put on the rotation once to shadow, where I could sort of see what the actual on-call person did, but without any real explanation of how to do those things myself, other than being told to read the runbooks, which were not written in a way that was easy to understand for someone so new to the internal AWS tooling and to ops in general. The next time I came up on the rotation, I was expected to manage on my own, which meant that no matter what occurred, I ended up having to escalate, because I wasn't knowledgeable enough to fix anything within a reasonable timeframe.
Which is the only way to learn, tbh. You can receive as much positive reinforcement as imaginable, but nothing prepares you for a large-scale incident like living through one: building the connections you need to solve it, getting the shame of your life, and losing sleep over your failure.
I'm not really sure what you mean by "positive reinforcement", but I don't think it's possible to disagree more with this sentiment. "Building the connections you need to solve it, getting the shame of your life, and losing sleep over your failure" isn't a strategy for teaching something; it's a coping mechanism for someone trying to brute-force their way through something they weren't adequately trained for.
Most people seem to think it's fine for companies to offload the entire burden of learning onto individual employees, and maybe I'm an outlier in this regard, but to me this seems more like a cop-out to avoid actually solving the problem, at the cost of the employee's emotional health. I'm not surprised that companies default to this, but it's also not surprising that burnout is so common in our industry when this is considered the "best" or "only" way to do things.
My team didn't add new members to the oncall rotation for about 6 months to ameliorate this issue. But starting oncall is rough at first, and even with months of context on our systems, people usually take a few rotations before they really figure it out. We expected new members to have to escalate.
I don't think this really ameliorates the issue much; it just pushes the problem down the line. IMO this is a big part of why people transfer internally between teams so much at Amazon, which masks the problem even further. If you expect someone to be self-sufficient after six months of on-call rotation, but they only start the rotation after six months on the team, they'll have been on the team a year by that point, and people transferring or leaving the company after a year on average isn't going to be immediately obvious as a problem. But if people start a few weeks in, you'll have 5-6 months of noticing that there are issues when that person is on-call, and if that happens more than a few times, the trend will be noticeable.