Oddly, the call to put the SWEs in the on-call rotation was one of the original goals of site reliability engineering as an institutional discipline. The idea at conception was that SREs were expensive, and only after product teams got their act together could they truly justify the cost of full-time reliability engineering support.
It's only in the past 10 years (reasonable people may disagree on that figure) that being a site reliability engineer came to mean being something other than a professional cranky jackass.
What I care about as an SRE is not graphs or performance or even whether my pager stays silent (though, that would be nice). No, I want the product teams to have good enough tools (and, crucially, the knowledge behind them) to keep hitting their goals.
Sometimes, frankly, the monitoring and performance get in the way of that.
It's only in the past 10 years (reasonable people may disagree on that figure) that being a site reliability engineer came to mean being something other than a professional cranky jackass.
What I care about as an SRE is not graphs or performance or even whether my pager stays silent (though, that would be nice). No, I want the product teams to have good enough tools (and, crucially, the knowledge behind them) to keep hitting their goals.
Sometimes, frankly, the monitoring and performance get in the way of that.