
A few things that worked for us:

1. The roster is set weekly. You need at least 4-5 engineers so that nobody is rostered more than about once a month. Any more often than that and your engineers will burn out.

2. There is always a primary and a secondary. The secondary gets called when the primary cannot be reached.

3. You are expected to triage the issues that come up during your on-call week, but not to work on long-term fixes. Those go to the team for discussion and get allocated there. No one wants to do too much maintenance work.

4. Your top priority should be the issues that come up repeatedly and eat into your productivity. Fixing them could take up to a year. Once things settle down, your engineers should have enough free time to work on things that interest them.

5. For any cross-team collaboration that takes more than a day, the manager should be the point of contact, so that your engineers don't get shoulder-tapped and pulled away from what they are working on.
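For anyone who wants to automate points 1 and 2, here's a rough Python sketch of a weekly round-robin roster with a primary and a secondary. It's my own illustration, not necessarily how the poster's team does it: the names are hypothetical, and making the secondary simply next week's primary is an assumption. With five people, each person is primary once every five weeks, which is roughly the once-a-month cadence from point 1.

```python
from datetime import date, timedelta


def weekly_roster(engineers, start, weeks):
    """Return (week_start, primary, secondary) tuples for a round-robin rotation."""
    if len(engineers) < 2:
        raise ValueError("need at least two engineers for a primary and a secondary")
    roster = []
    for week in range(weeks):
        primary = engineers[week % len(engineers)]
        # Assumption: the secondary is whoever is primary next week.
        secondary = engineers[(week + 1) % len(engineers)]
        roster.append((start + timedelta(weeks=week), primary, secondary))
    return roster


team = ["ana", "ben", "chen", "dev", "eli"]  # hypothetical engineers
for week_start, primary, secondary in weekly_roster(team, date(2024, 1, 1), 6):
    print(week_start, primary, secondary)
```

Note that in this scheme everyone is also secondary once per cycle, so each person is on the hook (primary or secondary) two weeks out of five.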

Hope this helps.



> 2. There is always a primary and secondary. Secondary gets called up in cases when primary cannot be reached.

Now you have two people on call, unless the expectation is that the secondary doesn't need to carry a laptop and can be unreachable. That's an important consideration if you want to meet "only on call every x weeks".


The megacorp I work for solves this by automatically escalating a page up the org chart (looked up via LDAP) every 30 minutes when it isn't acknowledged. While this seems scary, it means managers carry a pager too and feel the pain; many actually get paged whenever their engineers do, just so they know when things are breaking and how bad the tech debt is. It also means you don't need a secondary: if a page gets lost, the manager just hands it out.

It has other big benefits. It lets tier N+1 know when tier N doesn't have a pager set up. Sometimes that's the engineers, but it gets real fun when a Director or VP gets paged; ops culture sharpens up very quickly. As I said, it also forces managers to buy in to on-call, which is a good thing imho.
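Here's a minimal Python sketch of that escalation rule, purely for illustration. A plain dict mapping each person to their manager stands in for the LDAP org-chart lookup, and the function names, the org chart, and the choice to keep counting everyone already notified as paged are all my assumptions, not the commenter's actual system. Every 30 minutes without an ack, the page climbs one level, and it stops at the top of the chain.

```python
def escalation_chain(oncall, manager_of):
    """Walk up the org chart from the on-call engineer to the top."""
    chain = [oncall]
    seen = {oncall}
    while chain[-1] in manager_of:
        boss = manager_of[chain[-1]]
        if boss in seen:  # stop on cycles caused by bad directory data
            break
        chain.append(boss)
        seen.add(boss)
    return chain


def paged_so_far(oncall, manager_of, minutes_unacked, interval_minutes=30):
    """Return everyone paged after `minutes_unacked` minutes without an ack."""
    chain = escalation_chain(oncall, manager_of)
    # One extra level per full interval; the top of the chain absorbs any overflow.
    levels = min(minutes_unacked // interval_minutes, len(chain) - 1)
    return chain[: levels + 1]


# Hypothetical org chart (stand-in for the LDAP lookup): engineer -> manager -> director -> VP
org = {"engineer": "manager", "manager": "director", "director": "vp"}
for minutes in (0, 30, 60, 90, 120):
    print(minutes, paged_so_far("engineer", org, minutes))
```

With this org chart, a page nobody acknowledges reaches the VP after 90 minutes, which is the "real fun" case described above.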



