Launch HN: Rootly (YC S21) – Manage Incidents in Slack
132 points by kwent on June 7, 2022 | 93 comments
Hi HN, Quentin and JJ here! We are co-founders at Rootly (https://rootly.com/), an incident management platform built on Slack. Rootly helps automate the manual admin work during incidents, like creating Slack channels, Jira tickets, Zoom rooms, and more. We also help you get data on your incidents and automate postmortem creation.

We met at Instacart, where I was the first SRE and JJ was on the product side owning ~20% GMV on the enterprise and last-mile delivery business. As Instacart grew from processing hundreds to millions of orders, we had to scale our infrastructure, teams, and processes to keep up with this growth. Unsurprisingly, this led to our fair share of incidents (e.g. checkout issues, site outages, etc.) and a lot of restless nights while on-call.

This was further compounded by COVID-19 and the first wave of lockdowns. Our traffic surged 500% overnight as everyone turned to online grocery. This stressed every element of our incident management process and highlighted our need for a better one. Our manual ways of working across Slack, PagerDuty, and Datadog simply weren’t enough. At first we figured this was an Instacart-specific problem, but luckily realized it wasn’t.

A few things stood out. Our process lacked consistency: it varied greatly depending on who was responding and their incident experience. After declaring an incident, most companies rely on a runbook buried away in Confluence or Google Docs and try to follow a lengthy checklist of steps. That runbook is hard to find, difficult to follow accurately, slow, and stress inducing, especially after you’ve been woken up by a page at 3 am. We started working on how to automate this.

Fast forward to today: companies like Canva, Grammarly, Bolt, Faire, Productboard, OpenSea, and Shell use Rootly for their incident response. We think of ourselves as part of the post-alerting workflow. Tools like PagerDuty and Datadog act as the smoke alarm that alerts you to an incident, then hand off to Rootly so we can orchestrate the actual response.

We’ve learned a lot along the way. We realized the majority of our customers use roughly the same six tools (Slack, PagerDuty, Jira, Zoom, Confluence, Google Docs, etc.) and follow roughly the same incident response process (create incident → collaborate → write postmortem), but the details of that process vary dramatically from company to company, and changing those processes is hard.

Our focus in the early days was to build a hyper-opinionated product to help them follow what we believe are best practices. Now our product direction is focused on configuration and flexibility: how can we plug Rootly into your existing way of working and automate it? This has helped our larger enterprise customers be successful by automating their current processes.

Our biggest competition is not PagerDuty/Opsgenie (in fact 98% of our customers use them) or other startups. It’s the internal tooling companies have built out of necessity, often because tools like Rootly didn’t exist yet. Stripe (https://www.youtube.com/watch?v=fZ8rvMhLyI4) and GitLab (https://about.gitlab.com/handbook/engineering/infrastructure...) are good examples of this.

Our journey is just getting started as we learn more each day. Would love to hear any feedback on our product or anything you find frustrating about incident response today.

Leaving you with a quick demo: https://www.loom.com/share/313a8f81f0a046f284629afc3263ebff




I will echo the other comments on no upfront pricing. Even though this could be potentially useful for my team, I won't "contact" you for pricing. I am sick of having to deal with salespeople who want to know a ton of info about your business so they can gouge every last cent out of you and then some.

I would gladly pay a little extra just to have clear pricing and sign up with a credit card. I've got an engineering org to run; I don't have time to moonlight as a procurement officer.


I'll just ask, flag if this is out of place, I see lots of complaining about pricing - how many of those people are in a position where they would actually be buying this? Isn't it an enterprise product targeted to executive level buyers, and customized to the business? Just because some users or hands-on folks at smaller companies are angry they can't get a price doesn't mean there is a problem. It's probably a feature from a business perspective. Not every business model can just give a $x/user/month price.


I'm in a position to buy this kind of product (CTO of ~50 engineers), and I could be interested in this kind of tool (we alert a lot through Slack, and I'm searching for a way to better track my incidents), but I'll never take an hour to "present my company" and "discuss pricing" for such a "little" product.

For a product that costs tens of thousands of dollars I understand; for a product that would cost hundreds, it's a red flag and I don't have time for that. Incidentally, there is nothing worse than non-transparent prices, which can change dramatically every year.

If AWS manages to have public pricing (with customers ranging from $100/mo to $1,000,000/mo), why not you?


Consider the possibility that the team hasn't yet figured out what the correct pricing model should be. Being able to talk to teams and understand how they use it and what they value in the product will help them shape a pricing model that works.


> Isn't it an enterprise product targeted to executive level buyers, and customized to the business?

Maybe, and maybe not. Execs aren't, or at least shouldn't be, making these kinds of decisions in a vacuum; they should be asking their engineering/technical people for input. And in a lot of places stuff like this might be purchased at the business unit or even team level.

I work at a large enterprise (as in >100k employees), but our business unit/division's execs would consult with teams like mine before buying into a product like this, and for some tooling we may actually be tasked with investigating it and talking to sales creatures ourselves. For example, we're looking for new ways to implement and manage SLOs that would shape development work for ~9k people, and even the junior engineers on our team attended meetings and demos with multiple vendors related to this goal.


I wish it was also standard to provide a small sandbox environment where I can go set up things and play around. I don't want to book a demo where I'll again be hooked up with someone from sales for a long presentation. For someone with a few million in funding, there should really be no excuse not to invest in this.


Totally agree, we do offer a 14-day free trial here if you want to give it a go: https://rootly.com/users/sign_up.

This feedback is helpful, I think we can make that a bit more obvious on our website!


Appreciate the feedback here. We'll look into what we can do to help you make a more informed decision around the financial cost structure and how that works before deciding to try us.


Congrats on the launch, I would use this for my smaller team. Why have a pricing page with no pricing on it though?


Because they want to look at Crunchbase, find out how much funding you got, then charge you based on that.

I wish companies would boycott companies doing "contact us for pricing".


Well, then they would charge me pennies because the startup I work for has $0 funding :)


> I wish companies would boycott companies doing "contact us for pricing".

A lot of companies do, by simply never contacting them for pricing.


Thank you for the kind words and feedback.

If you email jj@rootly.com we can get you some pricing ASAP. It'll depend on the number of users and the level of support/onboarding/custom feature development required.

But if you're a small team, Rootly is powerful and ready to go out of the box with defaults and you might not require some of the custom stuff.

To answer your question directly, we customize each package for each individual customer on a bunch of variables.


I'm sorry, but if your pricing is too complex to write out on the pricing page so I can know upfront if it's even worth talking to you, I won't reach out to you. It also suggests that your pricing is simply too complex, and that would hurt me as a paying customer as well, as I wouldn't fully understand the invoices I receive from you.

Building a calculator that lets potential customers input their variables and see what the pricing would be solves that problem fully, and takes a minimal amount of time to implement.

That you haven't spent that short amount of time in order to be transparent looks shady to me (and to others, some of whom say the same in this comment thread), as otherwise you'd surely display your pricing upfront.


Thank you for the feedback, a calculator is a good suggestion! We realize there are a fair number of people that will be turned away by this, we'll see what we can do for a better middle ground.


I’d also suggest you have a free plan for very small teams. You can already see how many Slack members they have. Make your tool something people just adopt as their default, and then as the team grows they naturally start paying you up the tiers.


Thank you for the feedback.

Actually, we did offer a free plan for up to 5 users, and even 10 at one point. What we found is that at companies of that scale the collaboration overhead during incidents is low enough that the tool wasn't too useful, so we pivoted away from it. Instead we offer a 14-day trial so customers can get their feet wet without contacting us: https://rootly.com/users/sign_up.


You seem to have an implicit assumption that there will necessarily be some amount of "support/onboarding/custom feature development" needed.

For the vast majority of your users, I would expect the amount of installation-engineering required to be zero. For every big business on an Enterprise plan of a SaaS like e.g. Cloudflare, there are 10 or 20 accounts for customers just as big(!) who are on their Business plan — which is basically the same feature-set, but without any of the high-touch custom stuff. Because they don't need any of it, in order to get all the value they need from the SaaS.

In other words, it feels like you have an Enterprise plan, but no Business plan. It's pretty easy to construct one — just assume any high-touch stuff doesn't happen. What's the base cost, per user?


If there is no pricing, I will likely never return. If you cannot get your money right, how can I expect you will get anything else right?


Appreciate the feedback!


I’ve always wondered about building a startup on another startup’s back. What happens if they cut you off? Is getting bought up by Slack the end goal here? Seems like a big risk, one whim at Slack and you’re toast.


We have seen Slack start investing in this area with their own Workflow Builder they announced last year. One of the big use cases they highlighted was incident response. We haven't run into any customers trying to leverage that just yet though, as there's still a lot of heavy lifting required.

IMO what makes Slack so powerful is their app ecosystem. We aren't too worried about them shutting that down or competing with us. We see the awesome folks on the Slack Platform team continue to invest heavily there.

But if Slack wants to seriously compete in this space, we'd welcome it. The more attention and competition the better. Most accounts we approach didn't even know off-the-shelf solutions existed!


I'm not familiar with Workflow Builder. Does it have a separate data retention scheme? Slack seems to have a big problem in that information quickly ages out, the search is pretty bad, and, in some cases, the data is actually ephemeral and will just disappear. Incident response is one of the categories that I want to preserve. Does Rootly address any of these problems?

OK, I see at the end of the demo that there is a chat transcript, so that's useful. Does it differentiate between incidents if there are multiple active incidents? Where is that archive stored?


Rootly would address that problem: we keep a database of all of your incidents and metadata (impact, timeline, participants, metrics, etc.) on our Web platform, separate from Slack. You can customize a data retention policy with us if you want, but it's helpful to be able to quickly search for similar incidents without trying to find them in Slack channels.

It does differentiate between incidents if there are multiple too. We'll even warn you if you're opening an incident and another one that could be related is also active to avoid duplications.

And of course we keep the garden walls low on the product. You can export any of this data out via CSV, JSON, API or via our integrations (Airtable, Google Sheets, Looker, etc.).
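
As an example, pulling incidents into a CSV via an HTTP API could look something like the sketch below (the endpoint path, response envelope, and field names here are illustrative placeholders rather than exact API documentation):

    # Hypothetical sketch of pulling incidents into a CSV for offline analysis.
    # The endpoint path, auth header, and field names are assumptions for
    # illustration; check the real API docs before relying on any of them.
    import csv
    import requests

    API_TOKEN = "your-api-token"  # placeholder
    resp = requests.get(
        "https://api.rootly.com/v1/incidents",              # assumed endpoint
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    incidents = resp.json().get("data", [])                 # assumed response envelope

    with open("incidents.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "title", "severity", "started_at"])
        for inc in incidents:
            attrs = inc.get("attributes", {})               # assumed attribute keys
            writer.writerow([inc.get("id"), attrs.get("title"),
                             attrs.get("severity"), attrs.get("started_at")])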


> information quickly ages out

Only on free plans. Corporations that have customers with SLAs that they're doing incident-response for probably aren't using Slack's free plan.

But even if they are, the data's also not actually purged from their systems. It's still in the Slack-workspace archive export, if/when you do one of those. They just hide it from you. Paying for a plan un-hides it.


It’s actually pretty common to start on the back of an incumbent. As a startup gains success it can do more to reduce dependency by building more of the end to end experience and distribute risk to more than one partner.


That is spot on. We're investing in our Web platform, which has the exact same experience, and quite a few companies run incidents from there.

But the Slack ecosystem has been great for us so far. Easy to develop on and fairly flexible in terms of what we can do. I think the most challenging part is going through the review cycle, which can take longer than expected when you're constantly shipping new features!


The risk is lower when you're not actually competing directly with that platform. For instance, if you're building a Twitter client reliant on the Twitter API, that's a bigger risk since you're directly competing with Twitter.

Unless Rootly is doing something grossly negligent, the chances are extremely low that Slack will not allow Rootly to build a Slack app.


Yup that is a very good point. We make Slack even stickier for their customers and ours alike.


Let's hope slack itself doesn't start using rootly for incident response :)


Although this post is largely focused on our integration with Slack, we have a standalone Web platform that does the exact same. We have quite a few companies (especially MS Teams shops) run incidents solely from there.

This also serves as a backup when Slack goes down: users can continue using Rootly (one of the top 5 most common questions we get).


Aren't you concerned with IR or ITSM tools (ala ServiceNow) just copying your patterns and adding it to their existing Slack Apps? Slack already has public demos with pretty much the same pattern you've adopted.

Have you got a bigger end-goal, because this is trivial to copy well. You need a moat, or you need to seek acquisition quick.


My thought as well. Their opening statement talks about Pagerduty handing off incident response to Rootly. Well, Pagerduty can just launch an incident response platform and now you're out of business.


PagerDuty legitimately have this flow on their YouTube demo from March last year: https://www.youtube.com/watch?v=zEZCZG8y1eY

Surely there's more to it if they got YC funded?


I don't get the pricing on things like this and Pingdom. This stuff seems like it should be cheap, like $5/user/mo. But everyone seems to go expensive.

There are other industries that are similar, but this always stood out to me as an industry where the pricing never felt right to me.


I appreciate the feedback.

We tried to think of our pricing as the value a user would get out of the tool: the amount of time and headache we'd save them when actually responding to an incident. We've found a lot more pushback from smaller companies (e.g. <20 eng), but have also realized the challenges of managing incidents are less pronounced there.

Just curious for my own learning: is the thinking behind "should be cheap" motivated by the idea that not everyone would need access to monitoring, alerting, and response tools?


For non-technical purchasers, it makes sense to price by the value the organisation will get out of the tool. However for tools that technical users are involved in you have to fight against the "I could build this myself" factor.

There are lots of tools that are basically CRUD apps, or maybe CRUD with a chat interface in this case, which are fundamentally straightforward to build a first-pass version of. I'm sure the product here is far better than a first-pass version, but it's an uphill battle to justify that when the pricing is on the value to the user, rather than based on the cost to build.

Another complicating factor for this market in particular is that there are often two types of users: regular and infrequent. In my experience tools like this would be used heavily by the engineering team, but there was value in having everyone in the org have access to the tool. There may be 10x the number of non-engineers, but they're often worth 1/10th or less to have on the platform. Each person isn't worth having by themselves but having everyone there is worth something. Nickel-and-diming customers on the basis of lots of users who rarely use the platform isn't great.

Edit: also, don't have a pricing page with no pricing on it.


This was an awesome response and you absolutely nailed it.

A vast majority of our customers actually have some sort of internal bot built. It's quite limited usually and largely focused on the "creation" portions of an incident. Most commonly create an incident channel, create Zoom, link to a few integrations, and that seems to be it.

But depending on who you speak with that can be enough which is totally fair. Especially when complexity and incident volumes are low.

And yes agree not every user on the platform will value it the same. An SRE vs. someone in legal will find different value out of it. Our goal is to make this accessible to the entire org not just engineering teams.

Thank you for the feedback on the pricing page as well!


One last thought, if engineering teams paid "what it was worth" for a tool for every piece of their stack, they'd have no money for engineers or building an actual product. It is very easy to spend the entire personnel budget again on tools, and most companies just can't do this. At that point, you're competing with your competitors for business, and with other unrelated service providers for budget.


If you charge the value a user gets out of the tool, an economically rational user would be indifferent between buying your tool and not buying it. (Given that choice, a user should probably avoid buying as they’re economically no better off, but now they have the complexity of another vendor to manage for no net gain.)

There are well established reference points that you’re at least nominally going to be compared to. “Is X worth more or less than Slack? O365? Box? gSuite?”

That is probably going to create a stronger anchor on perceived value than a carefully calculated product of reduction in MTTR * cost of outage * frequency of such savings.


Not the OP, but in one of the smaller companies (~30 people/~12 engineers) you describe that's currently looking for a tool like this.

I can't actually see the pricing because it's behind a nebulous "contact us" link, but if this is more than about $5/user/month I would definitely balk at the price.

Larger companies already have dedicated platform and tooling teams with enough technical talent and bandwidth to build this sort of solution (I've seen something eerily similar to this at a previous employer that had about ~75 engineers). IMHO it's the small companies that need off the shelf incident management because they have very few people to dedicate to solving this problem and need a way to manage the communication chaos that incidents can cause.


You'd balk at more than $5/user/mo? For 12 engineers, that's only $60/mo (and for all 30 employees, $150/mo). I'd guess that you're not a C-level since you said you'd balk at that price, which is incredibly cheap. I pay more for my business's status page.


Thank you for the feedback, we'll make some changes to the page.

Agree on those companies having technical talent, and in fact a vast majority of our customers came to us with their own bot built out that resembles some of our features. It really comes down to the age-old question: build vs. buy? The maintenance cost and feature enhancements for something owned in-house can be burdensome, but I am obviously biased here.

We've usually seen it take an engineer at least one whole quarter to stand up the basics of getting incident creation right with some basic automation.

But the times customers decide building it themselves isn't worth it are often when a) the person who owned it has left the company, or b) it's become a big distraction from their core focus / a full-time job to maintain.


firehydrant.com has a free tier that allows people to open incidents from Slack. It also includes the service catalog, runbooks, and status pages.


Congratulations on your launch!

The incident response space is brutally competitive and there are so many players all providing the same functionality.

I think the main problem you would have with your customers is their inertia. If an enterprise has their tools and processes set up, even though you provide a better tool at a lower price point, it's not worth their time switching to a new provider if whatever they have is working just fine.


Thank you!

And yes, the space is heating up for sure. Really good awareness and attention developing as a result though. We are also noticing monitoring companies starting to snap up or build their way into this space (e.g. Datadog https://www.datadoghq.com/blog/incident-response-with-datado...). But by far our fiercest competition is companies still building a subset of what we have in-house. Depending on complexity and incident volume, we've seen many cases where that's good enough, like you mentioned.

Inertia and change management is the #1 barrier to adoption. Companies have ways of working (right or wrong) that are ingrained and established. To come in, rip it all up, and say "this is the right way to manage incidents" is a tough pill to swallow. Even the inability to manage IaC or integrate with a specific tool can cause quite a bit of friction. The technical setup of any of these tools is quite easy; the real home run is how the tool helps you drive adoption.


I'm very curious, why base everything on Slack? There's no denying it's super popular (and my preferred choice for collab) but the majority of businesses don't use it, and it indirectly adds cost to your product; if I want to use (and pay for) Rootly I also have to pay for Slack (which is expensive).

And to echo everyone else here; not having up front pricing is a big red flag. You've done well to attract the large customers you have at the moment, but you're going to struggle to draw in the long tail. Engineering teams don't want to talk to your sales people and they shouldn't need to - if AWS can tell me what I'll be paying right down to the second for something, I'm sure you can tell me how much your tool will cost me to use each month without needing to call me.

EDIT:

I should also clarify, I actually quite like this tool from what I've seen of it. Looking at how we handle incidents I could see something like this addressing a lot of our pain points.


We found the messaging of "managing incidents on Slack" a lot easier to understand and digest than something like "all-in-one incident management platform".

We have an entire Web platform that performs the same functions as Slack, but also lets you configure custom Workflows and integrations, view metrics, manage a service catalog, and more.

And thank you for the feedback on pricing, we have a bit of work to do here to make it easier to understand the financial cost before investing time to chat with us!


Congrats on the launch. How is this different than FireHydrant?


The CEO of FireHydrant had a less charitable answer last year:

https://twitter.com/bobbytables/status/1403090735038189573


I've been a victim of this too and it sucks. People will flat out copy your business, and while they're at it, they'll go and copy and paste your painstakingly-written documentation as well. This really leaves a bad taste in my mouth.

I always try to go with the OG when I find these types of instances, since they're likely the one actually innovating (I know nothing of FireHydrant, so this is just conjecture).


Apparently not just FireHydrant either. Someone posted on HN about 8 months ago that Chris Evans from incident.io had called them out for copying their Slack and changelog formats literally word for word.

https://twitter.com/evnsio/status/1442908782405701634

Standing on the shoulders of giants huh


FireHydrant was my worst experience out of every incident manager I experimented with - literally nothing worked during our tests - and after two months of asking they're still refusing to remove our account; we still get a weekly email dashboard.


Yikes! Never like seeing that. Can you email me directly and I will get this sorted? robert@firehydrant.com


Tickets #1452, #1454, #1588. Was told "the account has been removed" on March 22nd, but I continue to receive "Last week on FireHydrant" emails specific to our org, most recently on May 29.


Thanks, emailed you as well. Sorry you ran into this!


Sorry to hear that. I'll do my best to resist asking if you'd like to chat instead ;).

We try to take a very "partnership" centric approach. What that looks like day-to-day is our engineers/success/leadership team collaborating in a shared Slack Connect channel on new features. For a lot of our customers we get deep into the problem and bring in outside speakers from the industry to come do workshops, AMAs, etc. that might align well with the challenges.

This is the fun part about the job!


>Sorry to hear that. I'll do my best to resist asking if you'd like to chat instead ;).

Wow - you're looking for dirt on a company in the EXACT thread that calls out some shady things you've done against them.


Thank you!

Great question. There are quite a few differences, namely in our product design focus. We've taken a more configurable and flexible approach that focuses on plugging into a company's existing stack and process. Oftentimes we'll have customers send us their entire playbook for what they have now and ask us to automate that as a starting point (e.g. rename Slack channels to match my Jira number for incidents, etc.). We do this to hopefully reduce the amount of change required when a new tool is brought in. As a result we focus on features such as our Workflows engine that allow for this customization. Another big area of focus for us is, unsurprisingly, Slack; we think of the other areas of Rootly, such as our Web platform, as the backend that powers this.

FH does a lot of things well and has great customers too. They have a sleek UI, strong security posture, and more. Their approach is more opinionated in guiding you through incident best practices. There is no wrong answer here as we hear plenty from customers that want both.


Why did you or your team copy & paste the FireHydrant docs?


Thank you.


What do you see as different to incident.io?


Great question.

Incident.io is likely our closest competitor and are doing amazing work over there. They have a strong team and smart founders that have been in the trenches before. From a product perspective they are my favourite of our competitors if I had to pick one myself to use.

Our differences are largely driven by the customer segment we serve and the needs of SMB vs. enterprise. We've found, by going upmarket to enterprises such as Canva, Shell, and Bolt, that it is quite difficult to develop an opinionated platform based only on industry best practices, as each organization's approach to incidents (even with the same tooling) varies greatly. You'll find Rootly to be a lot more pluggable and customizable; you can turn any knob to make the product work for you if needed. The reason for this is that we find change is hard, even small tweaks in process. We want to reduce as much of that as possible when a new tool is brought in.

We've been around longer, so naturally our product maturity is further along, with things such as Workflows (https://rootly.com/changelog/2021-11-30-workflows-2-0), integrations (30+), a Terraform/Pulumi provider, and security (https://rootly.com/security).

Again, they do a great job. Just different needs and requirements for startups vs. enterprise.

If helpful, customers have written reviews for us on our respective G2 pages: https://www.g2.com/products/rootly-manage-incidents-on-slack...


Hey folks. Co-founder of incident.io here.

Firstly, thanks for the kind words Quentin – appreciate it, and congrats on your progress so far.

incident.io is pitched slightly differently to Rootly, insomuch as we're not building an engineering product, but instead something that's designed to work for entire organisations. I saw first hand what this looks like at Monzo – a bank here in the UK – where incidents weren't just declared when Cassandra fell over (ahem, https://monzo.com/blog/2019/09/08/why-monzo-wasnt-working-on...), but were also declared for things like excessive customer support queries and not enough people to serve them, regulatory issues, or a customer threatening staff in reception. All of these things require teams to form quickly, communicate well, and follow a process. We're building for this.

In terms of market and customer segments, we're working with a wide range of companies with up to 6k employees. That said, we're a perfect fit right now for folks in the 200-1500 people range.

By all means reach out if you have any questions.


Oh and if you're not in the market to buy something, I open sourced the tool I originally wrote at Monzo: https://github.com/monzo/response


The open source option from Netflix is quite popular too: https://github.com/Netflix/dispatch


Watching the demo, how is this more than a glorified Slack workflow to a generic ticketing system? The only additions I can see are lifecycle events that update your ticket and the postmortem reminder?

This could be easily done with an engineer/tech following a runbook/doc that's linked in the alert. Cutting tickets, creating a slack channel, escalating, and/or setting up a bridge can be done in minutes (and most alerting systems can automate that).

IME, defining and enforcing the process is more important than the automation.
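
For what it's worth, the "just script it" version of that runbook looks roughly like this (a rough sketch using Slack's Python SDK; the token, on-call user IDs, and runbook URL are placeholders):

    # A "DIY runbook" sketch: create the incident channel, pull in the
    # on-call responders, and drop the checklist link. The token, user
    # IDs, and runbook URL below are placeholders.
    from datetime import datetime, timezone
    from slack_sdk import WebClient

    client = WebClient(token="xoxb-...")          # bot token with channels:manage, chat:write
    ONCALL_USER_IDS = ["U0AAAAAAA", "U0BBBBBBB"]  # hypothetical on-call user IDs
    RUNBOOK_URL = "https://wiki.example.com/incident-runbook"

    def open_incident(summary: str) -> str:
        """Create a dedicated channel, invite responders, post the runbook link."""
        name = "inc-" + datetime.now(timezone.utc).strftime("%Y%m%d-%H%M")
        channel_id = client.conversations_create(name=name)["channel"]["id"]
        client.conversations_invite(channel=channel_id, users=",".join(ONCALL_USER_IDS))
        client.chat_postMessage(
            channel=channel_id,
            text=f":rotating_light: {summary}\nRunbook: {RUNBOOK_URL}",
        )
        return channel_id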


Appreciate the feedback!

You could certainly go through a checklist/runbook that gets attached to the incident. This is usually the setup most companies we speak with have. However, compliance, consistency, and speed in following it during a stressful incident tend to be quite poor. Tasks like creating Slack channels or a Zoom bridge aren't difficult, I agree, but they're not things expensive engineers should be focused on. We want them to focus on putting out the fire, not the admin.

A few examples of things a simple doc won't be able to accomplish:

- automatically track incident metrics

- set recurring reminders to e.g. update the statuspage

- auto-archive channels after periods of inactivity

- create incident timeline without copy-pasting

- update Jira/Asana/Linear tickets with incident metadata and action items

- automatically invite responders to the incident channel

None of these are impossible to do manually; it's just a question of how much time you'd like people spending on them (the auto-archive one is sketched below). If you have a few incidents a year, I agree this is likely overkill.
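
To make that concrete, here's roughly what just the auto-archive task looks like if you script it yourself against the Slack API (a sketch only; the "inc-" prefix and 14-day threshold are arbitrary choices, and pagination/error handling are omitted):

    # Sketch: archive "inc-*" channels with no activity for 14 days.
    # The prefix and threshold are arbitrary choices; pagination and
    # error handling are omitted. The bot must be a member of each
    # channel to read its history (it is, if it created them).
    import time
    from slack_sdk import WebClient

    client = WebClient(token="xoxb-...")
    INACTIVE_SECONDS = 14 * 24 * 3600

    def archive_stale_incident_channels() -> None:
        now = time.time()
        channels = client.conversations_list(types="public_channel", limit=200)["channels"]
        for ch in channels:
            if not ch["name"].startswith("inc-") or ch.get("is_archived"):
                continue
            messages = client.conversations_history(channel=ch["id"], limit=1)["messages"]
            last_ts = float(messages[0]["ts"]) if messages else float(ch.get("created", 0))
            if now - last_ts > INACTIVE_SECONDS:
                client.conversations_archive(channel=ch["id"])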


Congrats on the launch! This is really cool. I remember having to join several incidents and it was always a mess, especially with people being left out and others being added who shouldn't be there in the first place.

What happens when the incident is over? Where does all that data live and can there be some fancy data analytics that could potentially address bigger issues that keep reoccurring?


Many thanks!

So glad you asked. Once the incident is resolved, we'll prompt you to edit your postmortem. This can be done inside of Rootly, but most commonly we'll auto-generate a Confluence or Google Doc. There you'll have all your incident metadata, a template to fill out, and most importantly your incident timeline (no copy-paste required).

From there we can help you do things like automatically scheduling your postmortem meeting with everyone that was involved.

We also want to help you improve your process and response. We'll prompt anyone involved in the incident for feedback (they can submit anonymously too) and collect important metrics.

There are top-line metrics like incident count, MTTX, and outstanding action items, but also finer-grained ones, which is what I think you might be hinting at. For example, you can automatically visualize which services are being impacted the most. That might be an indicator of an area to focus on more.

We try to keep the garden walls on the product quite low, allowing you to export any of the data out of Rootly and into your own analytics engines.


I wish we could move away from using postmortem. I know how the term is used in tech is a bit different but I've been chewed out for it (worked adjacent to pharma where postmortem has different connotations). Retrospective. Please.


Agree with many of the sentiments on here about pricing. I actually came across Rootly a couple months ago when looking for this sort of solution and the lack of transparency on cost led me to immediately dismiss it as an option


Oof, thank you for the feedback. We certainly don't want that happening. We'll look into adding greater transparency that works for our model so you can make a better informed decision upfront.


Took a quick look at your loom. Very cool. In practice, we prefer just looking through a single channel with threads, but perhaps that's because we're a small team.


Thank you!

You're not alone, I've run into quite a few cases where working from a thread will suffice (at big and small companies). We're still team "dedicated incident channel", but there are a few things that threads do better.

For example, the number of incident channels that get spun up can get out of hand. Threads are a lot cleaner for that. So we built a workflow that'll let you specify an auto archive behaviour.

We also don't want you to lose context of your threads or conversations, you can run /incident convert in Slack and we'll pull that context over. Lastly, we often see people working from threads from a primary #incidents or #outage channel for better visibility. We'll actually let you specify exactly who you want to notify whenever an incident gets opened/closed.

But generally we found we can do a lot more powerful things in dedicated channels around integrations or even assigning roles if that is important :)

Happy to chat more on your use case tho!


Great stuff. Good luck with your product. I imagine we'll need it later but for the moment, I think we're too small to be worth your time.

Would have loved to have run into you guys when/if you were looking for seed investment though


Appreciate the kind words and agree on fit.

Happy to compare notes and stay in touch if you'd like to connect, of course: jj@rootly.com.


If every text class has emphasis, no text class has emphasis.


Congrats on your launch! Seems like you've had to build a lot of integrations with other platforms, any tips or tools that made that easier?


Thank you!

We built out our own Workflows engine that makes our life significantly easier. This way different customers can use the same integrations N number of ways without us needing to bake in a default configuration. Otherwise we'd be playing catch up all the time.
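
Conceptually (just an illustration of the idea, not the actual schema), a workflow is data: a trigger, some conditions, and an ordered list of actions that map onto integration calls. Something along these lines:

    # Illustration only: a workflow is data with a trigger, some conditions,
    # and an ordered list of actions that map onto integration calls.
    # The action names here are made up and stubbed out with prints.
    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class Workflow:
        trigger: str                                      # e.g. "incident.created"
        conditions: Dict[str, str] = field(default_factory=dict)
        actions: List[str] = field(default_factory=list)  # ordered integration calls

    ACTIONS: Dict[str, Callable[[dict], None]] = {
        "create_jira_ticket": lambda inc: print("Jira ticket for", inc["id"]),
        "create_zoom_bridge": lambda inc: print("Zoom bridge for", inc["id"]),
        "post_statuspage":    lambda inc: print("Statuspage update for", inc["id"]),
    }

    def dispatch(workflows: List[Workflow], event: str, incident: dict) -> None:
        """Run every workflow whose trigger and conditions match this event."""
        for wf in workflows:
            if wf.trigger != event:
                continue
            if any(incident.get(k) != v for k, v in wf.conditions.items()):
                continue
            for name in wf.actions:
                ACTIONS[name](incident)

    # Two customers can wire the same actions completely differently:
    dispatch(
        [Workflow("incident.created", {"severity": "sev1"},
                  ["create_zoom_bridge", "create_jira_ticket"])],
        "incident.created",
        {"id": "INC-42", "severity": "sev1"},
    )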


PagerDuty has a product called Rundeck, which sounds similar to your offering. Do you view them as competition?


Indirectly. Their automations are less incident management/process focused and more about remediation (e.g. run A to restart the db), whereas ours are focused more on things like updating the status page automatically, inviting responders, posting reminders, syncing with Jira, etc.


no pricing, no thanks


Hi Quentin! This is Jade

The loom link at the end 404s for me.


Fixed, thank you for catching that Jade!


How is Blameless doing?


We just picked them up across an engineering/tech org of ~350 to do precisely what they describe here.

PagerDuty for notifications and on-call rotations, Datadog for monitoring, Slack for communication in-the-moment, Google Docs for post-mortem documentation; Blameless as the glue and automation that takes away a lot of the incidental mental overhead of communicating and documenting while the incident is happening.

Super encouraging to see competition, though. A former teammate turned me on to https://how.complexsystems.fail/ and I'm willing to believe that in a complex enough system, the closest we will get to understanding how it actually works is during/after incident response.


One of my favourite sites!

And that is great, we know plenty of happy Blameless customers, they're certainly one of the better ones in the market we compete against.


From our experience we still see them in a number of deals. I think their focus has shifted more towards SLOs and Microsoft Teams though, areas where we aren't investing in right now.


Cool, thanks for this view.

I'm also intrigued by the text in this launch announcement:

> Our focus in the early days was to build a hyper-opinionated product to help them follow what we believe are best practices. Now our product direction is focused on configuration and flexibility: how can we plug Rootly into your existing way of working and automate it? This has helped our larger enterprise customers be successful by automating their current processes.

As I have gotten more experience managing complex incidents I've come around to the idea that having a standard process you follow for big issues is somewhat more important than what the process really is.

I loved the PagerDuty response documentation ( https://response.pagerduty.com/ ) not so much because of the specifics but because it suggests they have a culture where there is a well-understood protocol they always try to follow for big problems.

I think about archery and "shot grouping" - once you learn to always land in the same place, you can move your aim to start landing somewhere else.

A number of the things that I see as valuable incident management involve having responders with a shared set of priorities. Tooling can influence how easy/hard some of these things are but it's really up to the people to do things like:

* Actually finding and fixing the problem and being sure the fix worked

* Clearly communicating the current user impact to the people who care

* Figuring out who the right responders are, and getting them in the room quickly

* Making one production change at a time with the incident coordinator's signoff, so you know which one helped and when it happened

* Helping the rest of the organization learn from what happened (you may not know what there is to learn)

Do you see room for the tooling company to also provide best-practices training, mentorship, or other kinds of support? That stuff scales less well than a web app but is arguably more important to changing a company's culture in a way that gets better user outcomes.


I LOVE this.

Ironically having good tooling is the least important element in a successful incident response program (but it does help).

Motivated people and a good process far supersede tooling. And yes, we certainly see room for this. We are doing this already as part of our partnership model.

Part 1 is getting you set up and using Rootly. Part 2 is helping you successfully drive adoption now and 365 days onwards. We'll run workshops and AMAs with guest speakers on topics completely unrelated to Rootly when we're able to identify needs (we bear the cost). For example, we did a session with an F500 organization on on-call compensation and with another startup on communicating incidents to leadership.

The biggest mistake for a tool in this space is to think of this as only a PLG/SaaS based offering.


Just built out Rootly and it's a fantastic product. On top of that, JJ and team are responsive and heavily engaged with customers, listening to feedback and continually implementing improvements.

Having trialed Rootly against competing products Rootly won out for the customer engagement. The competition just wasn't responsive.

Implementing easy incident management tooling also saw the volume of incidents increase rapidly as teams started to notice and handle issues as actual incidents. While the metrics increase is bad on the face of it, actually addressing far more issues that were previously just ignored is fantastic, and it has led to increased stability through better incident and problem management.


Thank you for the kind words, Alex. The whole team over at Gemini has been such a treat to work with.

The collaboration on building new features together is what really excites us. Appreciate you always pushing us to be better.

I feel like I should get you a t-shirt now that says "more incidents = better" because I couldn't agree more.



