I was so glad to get away after 5 years of 24/7/365. I once had to drive 5 hours home from a holiday, leaving the rest of the family behind, spend 20 minutes sorting things out, and drive back - the untold joy of pre-cloud startups :)
What do you recommend? I figure that if you're working on something without oncall, probably no one cares about it anyway. I'd rather have a good rotation than no rotation.
Staff so that on-call is handled by people who are currently in the office. As I understand it, this is pretty much what Google does. When you're in the office, you're in the office, but when you're not, you're not. Global coverage means having ops in several time zones, which is how Google accomplishes it.
Never knowing when your phone or pager will go off wears on you in interesting ways over time.
It depends on the team and type of oncall rotation for the service. My team (a SWE team) has its own oncall rotation as we don't have dedicated SREs for all of our services.
Since we're US-based only, the oncall person has pager duty while they sleep. Our pager can be a bit loud at night due to the nature of our services, so it's definitely not for everyone (luckily it's optional).
I'll note you're a SWE, not an SRE. I'm talking mostly about dedicated Ops crews on pager.
It's one thing if you're responding to pages caused by other groups' coding errors or their failure to build sufficiently robust systems; it's another if you're servicing your own.
One of my own "take this job and shove it" moments came after pages started rolling in at 2am, keeping me on-site until 6am. I went home to sleep, showed up that afternoon, and pointed out that no one on the dev team had answered calls, pages, or texts (the site was falling over, I had very limited access, and I was new to the team). The response was shrugs.
Mine was "That wasn't your ass being hauled out of bed. See ya."
The opinions stated here are my own, not necessarily those of Google.
Yes, it is at Google. Our important, high-visibility pieces have SREs who help monitor them (SREs actually approached us to take over some of the more important pieces).
Google has a lot of oncall people who are never going to go into a data center (most Googlers never see one). So there are lots of oncall rotations whose SLA can still be met from bed if a page comes in at 2am.
This is not generally true for at least the big SRE-supported services at Google. I don't know what every team does, but my team's oncall shift (for example) is 10am-10pm, Mon-Thu or Fri-Sun. Another office covers the 10pm-10am part of the US day.
Games, mobile apps, desktop apps like Photoshop, Office, IntelliJ, etc., and some shrink-wrapped server-side apps. But you're right, some of these products are starting to have an online component as well.
I recommend others do on call so I don't have to. I'm not an ops person, though I probably wouldn't mind some of the job, and I hate being on call. I did it for a year at my current workplace (as a dev). I automated away all the problems I was capable of fixing, and got really annoyed that others didn't do the same for their areas of expertise. In hindsight, we probably should have had separate rosters for separate areas to encourage ownership, but we were a very small team (6 or so).
Developing software that other people deploy, as opposed to running a service people use directly, is pretty great for doing something people care about without being on call.
At my last job (comfortably small enterprise software shop), we had customers with more employees deploying and running our product than we ourselves had engineers. The only people who were first-line pageable were IT and the one engineer maintaining our demo server, which we eventually shut down.