One way to reduce outages and bugs would be to put an incentive plan in place that pays the folks on call a bonus each time they fix a problem during their shift.
This would have two effects:
- Management would strongly encourage the design and implementation of systems that fail less often, since fewer failures mean fewer bonus payouts.
- Employees would be more willing to take part in the on-call process.
Imagine one group of your workforce eagerly waiting to fix an impending failure while an opposing group is just as eagerly making sure they never get paid that bonus.
And you could tie it all together with bonuses for everyone when you meet certain performance levels.
I certainly agree that there are opportunities to game the "system," but in startup environments lots of people keep a close eye on development and production. Someone gaming it will quickly be exposed.
Just to clarify, I'm not recommending this approach for enterprise corporate environments, where layers upon layers of management and developers could easily derail what I am suggesting.
But for a startup, it is worth considering.