Hacker News

> “Oscar, do you mind sharing your screen so Deepak and Deanna can see the weird log messages too?”

It seems so obvious from an Incident Commander's perspective, but so much goes into this workflow during an incident:

* What if the person is a fresher? You are asking them to share their screen, debug, and perform actions in front of 100 people on the incident call, with all the anxiety that comes with it.

* While the IC has much more practice handling fires continuously, individual teams do not. For instance, if there is a fire every week in a 50-team organisation, a specific team would only see an incident about once a year.

* Self-consciousness/awareness instantly triggers a fight-or-flight response in even the most experienced folks.

I don't know how other industries handle this. I'm pretty sure that even outside tech there is a hierarchy for anomaly response, and sometimes leaf-level teams are called to answer questions at the top level of the incident response (a forest fire response, for example, might have a statewide response team that pulls in a local response team and asks them questions). They probably get much more time to prepare than in tech, where it's a matter of minutes.



In a previous job, I had a critical incident crop up and we were dealing with the offshore parent company. All the senior management had been cc’ed into the emails about the problem.

Result: nobody was willing to say anything for fear of looking bad in front of those people. This was frustrating to say the least.

I solved this by replying all, but I took out all the senior people. I said something along the lines of “Hey guys, I’m the guy who needs this fixed. I can see you are all working hard. I’m removing a number of people from the cc list and we will communicate with them in a separate email. Just keep me up to date with how it’s going and tell me what you need from my end.”

This worked wonders. They worked the issue, and though it took some time it was to be expected.

When it was solved, I found the original email, replied all (including management) and explained that the problem was solved, and made a point of highlighting the excellent work the team fixing the problem had done on resolving the issue.

I never had any issues with the parent company’s dev team after that :-) In fact, they went through our incident reports and fixed 80% of the longstanding issues within the next week! Which I wasn’t expecting…

Moral of the story - take as much pressure off the incident team as you can.


Thanks so much, that was good, practical, wise, in-real-life experience.


> 100 people in the incident call

Well, there's your first problem...


I picked a high number to showcase the problem. For a fresher it doesn't change much even if that number is as low as 15 or 20, or even just 5 people they don't know or who are at higher levels.

Also, I feel like the number of people who hop on the incident call is almost always related to the category of the incident. Sure, you can always break out to a separate room, but often the person will already have realised the impact and the weight of the incident.


And the point is that both of these are problems that an incident commander is there in part to solve, both in the sense of making sure that those investigating have what they need including the ability to focus, and in that of handling communications with stakeholders including leadership.

If whoever feels like it can "hop on" the incident call and stay on it, regardless of whether or not they can contribute to the investigation, then the IC needs to do a better job. Granted, usually this is for lack of institutional competence; I've been one place where the IC role was taken seriously, and incident response there ranged from solid to legendary, where most places never rise above "cautionary tale." But nonetheless.


In my experience people get pulled in and then are never let go for the rest of the incident. The coordinator needs to ask, 'Do we need XYZ anymore? If not, they can go, and we can call them back if needed.' Not letting anyone go is how you end up with 30+ people on a call. Don't hold them hostage.


Can you comment on why you think it is an issue for anyone to hop on an incident call, whether or not they can contribute?

It is one thing if they are being disruptive, but I don't see a problem with observers.

As for this thread's point that some people may feel scared to share a screen or participate if the group is too large, again, that is for the IC to control. But I wouldn't kick anyone just for lurking. There may be a good reason, and I'm not going to call out everyone on the call asking why they are there; that is just as disruptive.

TIA


An ongoing major incident is already stressful enough for everyone involved, and looky-loos don't help that at all. Nobody does a better job of debugging for having to fight a helmet fire at the same time, and one of the IC role's responsibilities is to proactively minimize that risk as far as possible.

It does depend somewhat on the situation and the organization, and on the role; IC engineers observing for familiarization is fine, VPs joining never is. My approach is that the incident call is for those actively involved in the investigation or who have been invited to join by those who are, including engineering ICs who wish to observe for familiarization. Meanwhile, stakeholders not directly participating in response receive updates from the incident commander via a separate (usually Slack) channel. Managing that communication is also part of the IC role, whether directly or by delegation.


I've been on an incident call that Jeff Bezos hopped on to listen in on. The "IC" (we had some different name like problem management engineer or something like that) did not ask him to get off it.


This makes sense. Amazon's corporate culture is famous for its deficits.


Surely you'd want to instead share a link to the logs being investigated so others can investigate concurrently, instead of having 2 backseat drivers observe someone observing logs.


Depends. In some situations it would in fact be better to have everyone discuss one person's shared screen, instead of having to constantly coordinate what they are talking about.


+1 Depending on how complex the system/tooling is, it is rarely just one log file to share in a text editor.

If you have logs, metrics, tracing, other dashboards for context you want to see how they are debugging.

Some of these tools are very complex and other eyes can help pinpoint inefficiencies.


Ideally, wouldn't it be the IC's (or the group of ICs') responsibility to introduce a blameless culture before the incident?

I've worked in blameful places, always without ICs; just shouting HIPPOs.

I hope that an org evolved enough to create IC roles would back that up with culture, but I could be wrong.


Indeed - in that kind of environment an important role is "managing upwards", preventing the people who are actually doing the work from being overwhelmed by constant requests for status and explanations.


What is a fresher?


Recently graduated, just entered the workforce.


Fresher is not a good term for this example.

There are engineers who are great coders but bad in an incident environment. They may not be fresh, but they need the same help as a "fresher".


It's a very US centric term, in the UK we'd just call them graduates, for example.


Nope, not a US term. I've found it in a couple dictionaries as a UK term for "freshman", which is a similar idea but not quite the usage in OP.

The equivalent that I've usually heard in the US is "recent graduate", rather than just "graduate".

https://dictionary.cambridge.org/us/dictionary/english/fresh...


As a US developer for nearly 25 years, I've never heard this term used in business context. I'd call them a graduate as well.


Recent (this generation) Indian immigrants to the US use the term in my experience. I've never heard anyone else say it.


It's mostly a South Asian centric term.


> It's a very US centric term

You've never heard of "freshers week"? That being said, I've never heard the term used to refer to anything other than university students.


I live in the US and have never heard of it.


not a US term. SE Asian.

"Fresher" + "100 people on the call" immediately makes me think Tata or Cognizant.



