It's been completely reliable when there were actual problems. The web interface was pretty intuitive to us as well. I really like the login flow.
Our PagerDuty is integrated right now with Scout, Pingdom, and our own custom alerting system.
So far most complaints we've had while using the service for the first week were our own fault: for example, our monitoring was too sensitive, which we fixed by using the regexp filters and by eliminating spurious errors from reporting on our side. One thing PagerDuty did was basically force us to fix these reporting issues so that we weren't woken up at 5AM unless it was a real emergency.
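To illustrate the kind of regexp filtering mentioned above, here is a minimal sketch. The patterns and the `should_page` helper are hypothetical and only show the idea; this is not PagerDuty's actual filter syntax.

```python
import re

# Hypothetical noise patterns (assumptions for illustration only).
SPURIOUS_PATTERNS = [
    re.compile(r"\bWARNING\b", re.IGNORECASE),
    re.compile(r"load average .* recovered", re.IGNORECASE),
]

def should_page(subject: str) -> bool:
    """Return True only if no noise pattern matches the alert subject."""
    return not any(p.search(subject) for p in SPURIOUS_PATTERNS)

alerts = [
    "quora.com is DOWN",
    "WARNING: disk usage at 71%",
    "Service X is DOWN",
]

# Keep only the alerts that should wake somebody up.
pages = [a for a in alerts if should_page(a)]
print(pages)
```

Filtering on the reporting side as well as in the alerting tool means a noisy check never turns into a page in the first place.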
The SMS interface got a little confusing when we had two errors at once. For example, a frequent case is getting two pages at once, "Service X is DOWN" and "quora.com is DOWN". I think what I tried doing was:
1. Receive the first report (site).
2. Receive the second report (service).
3. ACK both reports using the second report's code.
4. Fix the service.
5. Attempt to resolve the second report, receive a "code already used" error.
Resolving things via SMS is a little bit clumsy (it's what I usually default to). A link to the PagerDuty login would be cool, but I don't know if it would fit in the 160-character limit.
Thanks for the feedback! Very good point, we should allow you to send multiple replies per SMS alert to ack and then resolve (currently, there's a limit of one reply per alert).
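For anyone trying to picture the limit, here is a toy model of the "one reply per alert" behavior described above. The class and method names are made up for illustration and say nothing about how PagerDuty implements this internally.

```python
class SmsAlert:
    """Toy model: each SMS alert accepts exactly one reply."""

    def __init__(self, description: str):
        self.description = description
        self.status = "triggered"
        self.reply_used = False

    def reply(self, action: str) -> str:
        if action not in ("ack", "resolve"):
            return "unknown action"
        if self.reply_used:
            # The single allowed reply has already been spent.
            return "code already used"
        self.reply_used = True
        self.status = "acknowledged" if action == "ack" else "resolved"
        return "ok"

service = SmsAlert("Service X is DOWN")
first = service.reply("ack")       # uses up the one reply
second = service.reply("resolve")  # rejected under the one-reply limit
print(first, second, service.status)
```

Allowing multiple replies per alert would amount to dropping the `reply_used` check, so that an ack can later be followed by a resolve.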
This looks like a great tool for corporate IT departments too. From their website (http://www.pagerduty.com), I like how it works with any monitoring system as long as the tool can send email. That would make it easy to integrate.
Large-scale engineering dept guy here. The problem there is that most companies like that won't accept a SaaS solution.
Don't get me wrong, it's absolutely brilliant. I think it's the first time I've ever given a thumbs up to a third level metasolution to a problem.
PagerDuty needs to push some use cases on their site. It might break the SaaS reluctance of steaks-and-strippers type corporate managers. "It can eat my ridiculously complicated jasper report that I send straight to the trash bin on arrival so I don't have to read it and figure out what buttons on the phone I have to push with all the reluctance of a four year old kid with a plate of Brussels sprouts and broccoli in front of them? Sign me up!"
It would be funny to set up a server that specifically crashes when you want to wake up, similar to those alarm clocks that require you to solve a puzzle to stop blaring.
Beyond their features, these guys are great - they're determined and helpful. And for a service like theirs these things are paramount. Congratulations PagerDuty!
We have a simple integration API and we can also integrate with any system that can send email. Regarding the integration, just shoot us an email: support@pagerduty.com.
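As a rough sketch of the email route, any monitoring script that can build and send a message like the one below would do. The addresses are placeholders (assumptions), and the subject format is just borrowed from the alerts quoted earlier in the thread.

```python
from email.message import EmailMessage

def build_alert_email(service: str, state: str, to_addr: str,
                      from_addr: str = "monitor@example.com") -> EmailMessage:
    """Build a plain-text alert email; all addresses here are placeholders."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = f"{service} is {state}"
    msg.set_content(f"Automated alert: {service} is {state}.")
    return msg

msg = build_alert_email("Service X", "DOWN", "alerts@example.com")
print(msg["Subject"])

# Sending is left out so the sketch runs anywhere; with a local mail relay
# it would look something like:
#   import smtplib
#   with smtplib.SMTP("localhost") as smtp:
#       smtp.send_message(msg)
```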
Was looking for it too, found it: http://www.pagerduty.com/docs/api/api-documentation. It took a while, though; in my opinion, a link to the API documentation is important enough to put in the footer, maybe next to or under "integration guides".
Not really. It doesn't do any monitoring, it just turns an e-mail into a phone call, and decides who to call based on a schedule you maintain in the web app.
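The "decides who to call based on a schedule" part can be pictured with a small sketch like this one. It assumes a simple weekly rotation with made-up names; the real schedules in the web app may well be more flexible.

```python
from datetime import date

def on_call(rotation, start, day):
    """Weekly rotation: each person takes one 7-day shift, beginning at `start`."""
    weeks_elapsed = (day - start).days // 7
    return rotation[weeks_elapsed % len(rotation)]

rotation = ["alice", "bob", "carol"]  # hypothetical on-call engineers
start = date(2010, 1, 4)              # first day of the first shift

who = on_call(rotation, start, date(2010, 1, 20))
print(who)
```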
The fact that it doesn't do monitoring is pretty much my only major gripe with PagerDuty. I've been a customer for a few months.
It is nothing like Nagios. PagerDuty handles alerting, not monitoring. Once your monitoring system discovers something is wrong, it tells PagerDuty. PagerDuty then gets ahold of the right person. It lets you set up on-call rotations. It does automatic escalation if the primary contact doesn't acknowledge the incident. If your monitoring system supports it, it will even automatically resolve the incident once you fix your servers.
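A minimal sketch of the escalation idea, assuming an ordered list of contacts and a callback that says whether a given contact acknowledges in time. Both are illustrative stand-ins, not PagerDuty's API.

```python
from typing import Callable, List, Optional, Tuple

def escalate(policy: List[str],
             will_ack: Callable[[str], bool]) -> Tuple[Optional[str], List[str]]:
    """Notify contacts in order until one acknowledges.

    Returns (who acknowledged or None, everyone notified along the way).
    """
    notified = []
    for contact in policy:
        notified.append(contact)
        if will_ack(contact):
            return contact, notified
    return None, notified

policy = ["primary", "secondary", "manager"]  # hypothetical escalation chain

# Pretend the primary sleeps through the page and the secondary picks up.
owner, notified = escalate(policy, lambda contact: contact == "secondary")
print(owner, notified)
```

In a real system the callback would be replaced by waiting for an acknowledgement with a timeout before moving to the next contact.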
In point of fact, Nagios (which I use extensively) handles alerting, notification, auto escalations, acknowledgements, on-call rotations, and resets itself post-incident. http://library.nagios.com/ Sounds like reinventing the wheel for a well-solved problem.