Just my personal take, I think this is a really well-written incident postmortem. It's specific, extensive, candid, and dare I say, entertaining?
Many incident reports are entirely lacking in meaningful detail, or wholly unapologetic. I actually enjoyed learning tidbits about the author, in particular their mention of https://how.complexsystems.fail/.
Reading this boosted my confidence in Slack's teams, which should ultimately be the objective of a release like this. It's not pure PR nor a gruff legally-obligated disclosure.
It helps that I wasn't really affected by this incident.
The named authors on the blog post are an amazing set of engineers, exactly the type who would add a lot of introspection and expertise to what happened while maintaining a high level of lightheartedness. I spent a number of years at Slack and was involved in a postmortem in my first week as an Engineering Manager. I was impressed by how excellent practices were borrowed from Etsy in the early days and then magnified by Google's practices. As someone who once had to run these sessions and distill the learnings into the wee hours of the morning, this is great to see.
The fact that the current top comment thread is quibbling about the date format in the title seems to support this assessment: if there were anything real to complain about, that's what we'd be seeing. Instead we get bikeshedding over the date format in the title of a post.
Now a more philosoraptor-style comment: I see Mcrib is a service built to quickly detect and replace memcached instances. I treat memcached in infrastructure as a very stable service, meaning it is infrequently necessary to upgrade it, and it will generally not fail on its own. If it does, the failures will be highly infrequent compared to services with higher churn or more complexity/dependencies. This means that if they're failing often enough that you need to rapidly detect and replace them, you have a more fundamental problem.
From a structural standpoint I think my technical comment can be useful. If things really are failing this much: A) you should figure out why and slow that down; B) if you have a generally stable system and understand the typical rate of failure, you can add tripwires into Mcrib to avoid over-culling services and to loudly raise alarms (a sketch of what I mean is below); then C) you can improve technical reliability with redundancy/extstore/etc.
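To make B) concrete, here's a minimal sketch in Python of the kind of tripwire I mean; the class name and thresholds are entirely hypothetical, and the right numbers depend on your measured baseline:

    import time

    class ReplacementTripwire:
        # Allow automated node replacement only up to the expected baseline
        # failure rate; past that, refuse and raise a loud alarm instead.
        def __init__(self, max_replacements=3, window_seconds=3600):
            self.max_replacements = max_replacements
            self.window_seconds = window_seconds
            self.events = []  # timestamps of recent automated replacements

        def allow_replacement(self):
            now = time.time()
            # Keep only events inside the rolling window.
            self.events = [t for t in self.events if now - t < self.window_seconds]
            if len(self.events) >= self.max_replacements:
                # More failures than the stable baseline predicts means
                # something systemic is wrong: stop culling, page a human.
                return False
            self.events.append(now)
            return True

The exact numbers don't matter; the point is that the limit encodes your understanding of how often the service _should_ fail.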
I've also seen plenty of cases where folks let a dependency of a service determine whether that service is usable, which I disagree with quite strongly. Consul being down on a node should trigger something to consider whether the service is dead, not a conclusion that it is. This matters both for reliability (don't kill perfectly working things, because you end up having to design around it) and for maintainability, as you've now made people afraid of upgrading Consul or other co-dependent services. A similar failure mode is single-point-of-testing availability checking, where you probably want two points of truth before shooting a service instead.
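A minimal sketch of the two-points-of-truth idea, assuming a plain TCP probe is acceptable (the function names are mine, not from any real tooling):

    import socket

    def memcached_alive(host, port=11211, timeout=2.0):
        # Probe the service itself with its own protocol; "version" is a
        # cheap, side-effect-free memcached command.
        try:
            with socket.create_connection((host, port), timeout=timeout) as s:
                s.sendall(b"version\r\n")
                return s.recv(128).startswith(b"VERSION")
        except OSError:
            return False

    def service_is_dead(probes):
        # Only declare a service dead when at least two independent probes
        # (say, the direct check above run from two vantage points) ALL fail.
        # A dead sidecar like Consul can page a human, but it never pulls
        # the trigger by itself.
        return len(probes) >= 2 and all(not probe() for probe in probes)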
Now you risk people being afraid of upgrading just about anything, which means they will work around it, abstract it, or needlessly replace it with something they feel safer managing. The latter is at best a waste of time, at worst a time bomb until you find out what conditions the new thing breaks under.
This isn't to say you should stop assuming anything can fail anywhere at any time; I'm just pointing out that how often a service _should_ fail is extremely useful information when designing systems, fail-safes, alerts, monitoring, etc.
"I treat memcached in infrastructure as a very stable service."
I run memcached at a large scale. You are totally right. Every other year we will find ONE bad memcached node down. We use Nutcracker instead of Mcrouter for consistent hashing to each memcached node. Once I read "We also run a control plane for the cache tier, called Mcrib. Mcrib’s role is to generate up-to-date Mcrouter configurations", I was like: oooooh boy, here we go....
Knowing memcache is a rock comes with experience though.
Our underlying hardware (AWS) is nothing like this reliable. We see regular (several times a year) failure of racks of machines or whole DCs.
Across the whole fleet (all services), we lose 1-10 servers per day as a baseline. Major events are then on top of that and can impact thousands of hosts at once.
I don't believe you run it at the scale Slack does.
The people at Slack who decided to use Mcrouter (and created Mcrib) have experience running Memcached, Mcrouter and Nutcracker in production at two of the biggest web properties in the world.
I think you nailed the real issue that caused the incident: treating "consul down == unhealthy memcached" and then evicting the node. If Mcrib instead did some actual applicative health checks (e.g. a memcached ping), correlated with some system metrics (CPU, RAM), it could avoid evicting perfectly good nodes with a warm cache that just happen to have a restarting consul agent.
Granted, this is easy to say once the incident has happened and an excellent postmortem is published, but this should be an industry-wide wake-up call: don't do this.
I have the same issue at work, where people treat a "prometheus node_exporter down" alert as "the app on the machine is down". I've started adding the actual app name to our alerts, and now people don't freak out anymore when they see "down" alerts: oh, node_exporter is down but not the app? Don't panic; calmly check why.
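As a sketch, checking the app itself rather than inferring its health from a sidecar might look something like this (the ports and classification strings are illustrative; a real setup would query Consul's HTTP API rather than just its port):

    import socket

    def tcp_probe(host, port, payload, expect, timeout=2.0):
        # Applicative probe: send one protocol command and check the reply.
        try:
            with socket.create_connection((host, port), timeout=timeout) as s:
                s.sendall(payload)
                return s.recv(256).startswith(expect)
        except OSError:
            return False

    def tcp_connects(host, port, timeout=2.0):
        try:
            socket.create_connection((host, port), timeout=timeout).close()
            return True
        except OSError:
            return False

    def classify(host):
        # memcached's "version" command is a cheap liveness check.
        app_up = tcp_probe(host, 11211, b"version\r\n", b"VERSION")
        sidecar_up = tcp_connects(host, 8500)  # illustrative consul agent port
        if app_up and not sidecar_up:
            return "sidecar down, app fine: investigate, do NOT evict the warm cache"
        if not app_up:
            return "app down: eviction candidate"
        return "healthy"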
It’s likely that the memcached install is so large that the underlying instances themselves are failing. When you have hundreds or thousands of instances, failures in the instances themselves become pretty regular.
I don't see this. I have thousands of long-lived instances - full VMs, not containers, running in our hardware.
If they start "going bad", something is wrong. That's a signal I wouldn't want to ignore.
It has happened - once an HBA in a storage node was causing occasional corruption, another time due to a communication failure people were building things with the wrong version of something which had a memory leak and would eventually summon the OOM killer. There have been other issues.
"Have you tried turning it off and back on again" is still a terrible system management strategy.
I can say with certainty this isn't strictly true. The failures should be relatively rare; when I say relatively I mean on the level of natural node failure. If natural node failure isn't survivable without special systems to quickly replace downed nodes, you don't actually have an N+1 redundancy system. Thus, the pools aren't large enough :) Or, in this case, if they really are failing this much, then having them always lose their cache is a major reliability hole.
It's a subtle difference. I think many operators get used to node failures being extremely common when they don't necessarily have to be. I suspect the note about "if they come back on their own, ensure they're flushed" means they have something unusual causing ephemeral failures. If that's just "cloud networking" there isn't much they can do, but it's almost always fixable.
> The failures should be relatively rare; when I say relatively I mean on the level of natural node failure.
And exactly how rare do you believe this to be?
In my experience, node failures at a scale of hundreds to thousands of nodes are monthly to weekly, if not daily. Generally speaking, failures are spread fairly evenly across instance age: young, new instances experience similar failure rates to old instances. If you have any sort of maximum node lifetime (for example, a week) or scale dynamically on a daily basis, then you'll see a lot of failures.
Which still means you could implement a hard limit of one failure per hour and only allow more replacements with manual intervention. With a thousand nodes, having several (let alone hundreds) fail within a few hours is so unlikely that you're probably better off preventing automatic failover in those cases.
But that generally mirrors my experience that automatic failover for stable software tends to cause more issues than it solves. A good (i.e. redundant hardware and software) PostgreSQL server is so unlikely to fail that wrong detection and cascading issues from automatic failover are more likely than its actual benefits.
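To put rough numbers on "so unlikely": assume independent failures and a baseline of about one lost node per day across a thousand-node fleet (both assumptions mine). A Poisson model then says that a burst of simultaneous "failures" is almost certainly a broken detector, not a broken fleet:

    from math import exp, factorial

    lam = 1 / 24  # expected failures per hour at ~1 node/day fleet-wide

    def p_at_least(k, lam):
        # P(X >= k) for X ~ Poisson(lam)
        return 1 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))

    print(p_at_least(2, lam))  # ~8e-4: even two in the same hour is rare
    print(p_at_least(5, lam))  # ~1e-9: five in an hour means the detector,
                               # not the fleet, is what failed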
I think you're looking at it the wrong way. A server is never just postgres or memcached, there's always other stuff running, and it's that other stuff that can cause problems. Like maybe you're patching the fleet and a node fails to come back up, or due to misconfiguration the disk gets full.
I'd argue that stable systems are actually worse for operational stability as you become complacent and comfortable and when shit hits the fan you are unprepared.
Hi! I'd like to offer some hopefully useful information if any Slack folks end
up reading this, or anyone else with a similar infrastructure. I'll start with
some tech and make a separate philosophical comment.
Also caveat: I have no deep view into Slack's infrastructure so anything I say
here may not even be relevant. YMMV.
First, some self-promotion: https://github.com/memcached/memcached/wiki/Proxy
memcached itself now ships router/proxy software. Mcrouter is difficult to manage and unsupported. This proxy is community-developed, more flexible, likely faster, and will support more of memcached's native features. We're currently in a stabilization round making sure it won't eat pets, but all of the basic features have been in for a while. Documentation and example libraries are still needed, and community feedback (or any kind of question/help request) helps speed those up tremendously.
It's not clear to me why memcached is being managed like this; Mcrouter seems to be used only to abstract the configuration from the clients, yet it has a lot of features for redundant pools and so on. Especially with what sounds like globally immutable data and the threat of cascading failures during rolling upgrades, those sound like they would be very helpful here.
If cost or pool sizes are the main reasons why the structure is flat, using
Extstore (https://github.com/memcached/memcached/wiki/Extstore) can likely
help. Even if object value sizes are in the realm of 500 bytes, using flash
storage can still greatly reduce the amount of RAM necessary or reduce the
pool size (granted the network can still keep up) with nearly identical
performance. Extstore makes a lot of tradeoffs (i.e., keeping keys in RAM) to ensure most operations don't actually write to flash or double-read.
Extstore's in use in tons of places and everyone's immediately addicted.
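For reference, enabling it is a one-flag affair: point -o ext_path at a file on flash (syntax per the Extstore wiki page above; the sizes here are made up):

    import subprocess

    # Keys and hot items stay in RAM (-m); colder values overflow to flash.
    subprocess.run([
        "memcached",
        "-m", "4096",                              # RAM budget in MB
        "-o", "ext_path=/mnt/nvme/extstore:256G",  # flash-backed value store
    ])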
Finally, the Meta Protocol (https://github.com/memcached/memcached/wiki/MetaCommands) can help with stampeding herds, keeping DB load from exploding without adding excess network round trips under normal conditions. I've seen lots of workarounds people build, but this protocol extension gives a lot of flexibility you can use to survive degraded states: anti-stampede, serve-stale, better counter semantics, and so on.
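As a sketch of how the meta flags compose for anti-stampede (flag letters are from the MetaCommands wiki; double-check the exact semantics there before relying on this):

    def meta_get_command(key):
        # v: return value, t: return remaining TTL,
        # N30: on a miss, auto-vivify the key with a 30s TTL and grant
        #      exactly one caller the "W" (won) flag,
        # R30: within 30s of expiry, grant one caller an early-recache win
        #      while everyone else keeps getting the still-valid value.
        return f"mg {key} v t N30 R30\r\n".encode()

    def action(response_flags):
        if "W" in response_flags:
            return "recompute and set"    # we hold the single recache token
        if "Z" in response_flags or "X" in response_flags:
            return "serve stale or wait"  # someone else is recomputing
        return "use cached value"

The net effect is that exactly one client goes back to the database while the rest keep serving, which is the anti-stampede behavior described above.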
Sometimes a new rollout causes an outage; sometimes rollouts are delayed due to the overall system architecture. Reading the postmortem, I could not help but be reminded of this issue as described here: https://www.youtube.com/watch?v=y8OnoxKotPQ
> Because the GDM data is immutable and thus can tolerate staleness, the query was also updated to read from replicas as well as Vitess primaries.
Any data that is fronted by the memcached caching tier can tolerate staleness, right? There is not much difference between a short TTL of 1s and an async replication delay of 1s.
Abusing Vitess primaries is the root cause of this incident, and a similar incident could happen even without any scatter query.
(The original McDonald's McRib sandwich is well known for only being sold a limited time.)
So "Mcrouter" comes from Memcache-Router, then the obvious McDonalds jokes are made and someone cleverly suggests "Mcrib" for the next service. But I can't think what the backronym would be for it. Memcache Ring Buffer maybe. Or Broker.
RIB is a common term in networking for "Routing Information Base" (being the set of all routes which could be chosen to be installed in the routing table (or FIB -- "Forwarding") by the control plane. I don't know that this is the actual etymology but it's not implausible.
They had to know when they picked that name. If workers at Slack actually pronounce it like "Ehm See Rib" or, forbid it, "Ehm See Ahr Eye Bee" and not "McRib", I have very little interest in working there.
1.5 hours for a tool like Slack is major. That's lots of productivity lost (or gained, depending on how you view Slack), and thus $ impact at companies that rely heavily on Slack for internal/team comms.
I find reading about these incidents super interesting, and I generally find the work performed by the folks keeping these services running (and dealing with the inevitable falling-over of any computer system) impressive.
At the same time it seems like a horrifying job I would never ever want :D
This is very transparent and a good write-up. I wonder if someone at Slack could explain how they calculate their downtime on their status page. This outage was for 3 hours and 14 minutes but they claim 99.79% uptime for the month of February.
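For what it's worth, the arithmetic doesn't line up if the whole window counts as hard downtime (assuming a 28-day February):

    feb_minutes = 28 * 24 * 60                        # 40320
    outage_minutes = 3 * 60 + 14                      # 194
    print(f"{1 - outage_minutes / feb_minutes:.2%}")  # ~99.52%, not 99.79%
    # 99.79% implies only ~85 counted minutes of downtime, so presumably
    # partial degradation is pro-rated or a narrower impact window is used.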
You're probably right. I've noticed a trend of people fantasizing out loud in the past 3-5 years. It's nearly always cynical / conspiracy-theory fantasies, and they're stated as fact (or near-fact, as GP did), but without any backing information or logic... just pure fantasizing.
But, thinking back, it's not that common. Hopefully just a fad. People expressing their frustration at inequality and fear of corporate dystopia.
> engulfs the worker inside a dead-eyed grunt culture, featuring an endless spree of work-life balance destroyers. It might be great for people who ask for things from others, but for the people who have to actually do the thing being asked of them, Slack is a nightmare world.
I think it depends on the organization and how you use it. In a previous role I would’ve agreed with you. People expected you to reply at all hours, where I am now that isn’t the case.
Tools do not create toxic culture or destroy work-life balance. Organizations do that.
Nah, nothing can save a work culture that uses Slack for anything. I have been on both sides of the relationship in the Slack-based work society. It always ends up yielding the same outcome, whether I am a manager, an individual contributor, or even an outside contractor.
There are always a couple of people in your unit who run the show, who make all the animated GIF posts, who make all the slackbots that don't do anything. They always use Slack to gain undue notoriety within their firm. Slack caters to these people because these people tend to control the purse strings.
It's a terrible thing to do to a business, to force Slack upon it. You cannot blame Slack for making this dang product, but you sure can blame the people who consume and purchase the service without even once thinking about the well-being of the employees.
I can't say I've seen any of what you're describing, despite using Slack since the early days across several organizations. I know that my experience is just that, but you might want to do a little introspection; this sounds very pessimistic and a bit paranoid.
I am confused by the query that had the problem. Specifically, I am confused by why the sharding is done by user id.
Even the largest Slack instance probably has under 100,000 users and fewer than 1,000 peak messages per second. That feels to me like it could be served by a single master DB, and sharding by instance feels like it would be the better approach.
The major downside is the big variation in shard sizes, so some management/migration might be needed, but it seems doable to me.
Certainly it feels, naively, that scaling should be easy given the way Slack instances are independent (unlike, say, Twitter).
Thanks! Indeed, the variability of size and usage of each instance was the big issue, and then doing features that crossed instances meant they'd be crossing shards whatever they did, so it made sense to fix the variability issue.
(I'm also surprised/reminded how fast Slack grew and how quickly it became effectively ubiquitous — I think every company I've contracted for in the last five years has used Slack).
I love the diagrams of the cache<->DB cycle in normal vs. degenerate states. Those illustrate the problem very clearly and succinctly, and I hope they make it into a textbook some day. Kudos.
“Mcrib is objectively a better system for generating memcached configurations — but its efficiency made the broader system behave in a less safe way.” Be good but not _that_ good :)
Great postmortem, I love reading these. It's pretty neat that the tech industry is relatively transparent about these situations -- we all benefit from learning about them.
That date format is actually the worst I have ever encountered. m-d-y, with year in 2 digits, numbers not zero-padded, US "order" yet using dashes. It's like a moderator of /r/ISO8601 came up with the worst possible format on purpose. Am I missing something?
Came here to complain specifically about this. 2022-02-22 is unambiguous, big endian, and sorts nicely. IDK why society still uses any other date formats considering how international everything is.
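"Sorts nicely" means plain lexicographic string order is already chronological order, so nothing ever needs to parse the date:

    from datetime import date

    dates = ["2022-02-22", "2021-12-31", "2022-01-05"]
    assert sorted(dates) == ["2021-12-31", "2022-01-05", "2022-02-22"]
    print(date(2022, 2, 22).isoformat())  # '2022-02-22' is ISO 8601 by default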
It's because people have, for hundreds of years, been saying "March second, nineteen sixty-two", which they then write out in that order. As a programmer I find people's frustrations understandable, but you're a bit naïve if you think even a percentage point of the English-speaking population of the world knows or is concerned with big-endianness or sortability. They speak English, at least in America, in that order, and that's the way they write it. Europeans only got it a little better.
I'm an Australian who occasionally has video chats with Americans overseas for work and regularly plays D&D online at least once a week with friends all over the world.
The only time I've ever heard someone say "<Month> <Ordinal>" or "<Month> the <Ordinal>" is when talking with Americans.
Every other time it's always "<Ordinal> of <Month>" or just "<Ordinal>" for short.
There is a reasonable argument for little endian dates (as in the least significant information is usually the most relevant as it changes most often), but apart from the "it has been like this forever" I don't see any reasonable argument for middle endian date formats. Then again, the US is notoriously resistant to the metric system too.
So, European in the US, here. I switch my dates stubbornly to DD-MM-YYYY, 'cause that's the only way. Of course I would. But then there's so many US applications that don't adhere to my settings and use MM-DD-YYYY. So then I am still deciphering 05-07-2020-kind of stuff. All. The. Freakin'. Time.
I sometimes format dates in documents or emails as dd-MMM-yyyy when the audience is international, i.e. 2-Feb-2022. Using the short month form disambiguates the fields and, I think, avoids mental gymnastics for the reader like "what month is 09 again?" (or, in my case, the finger-counting....)
I frequently receive date data in Excel spreadsheets from the UK, but as a US user of Excel, I cannot convince it to interpret the date correctly. It is astonishingly bad at this.
I agree that the European format is probably not more useful, and you probably convinced me to go change my settings to YYYY-MM-DD. But I _do_ think that the European format makes more _sense_, it being in chronological "magnitude" order.
I can understand that perspective, although I maintain the usage is only preferred because of familiarity.
Since you are already on the fence with ISO8601 I invite you to consider time of day. Would you use second:minute:hour? That is also in (reverse!) “chronological magnitude” order.
It's because it matches the way we speak dates aloud. When intended for human consumption, sortability and big-endianness don't matter, but matching the way we speak does. Maybe other cultures actually speak dates differently, I don't know, but I have never seen a native English speaker habitually speak dates any differently than "January 1st, 2001".
All that said, I definitely agree with the original complaint, m-dd-yy is an atrocious format. If you're going to use dashes, stick with yyyy-mm-dd. Replacing the dashes with slashes, as in 2/22/22, would have been fine.
In the UK I think "1st of January" is probably slightly more common than "January the 1st" although you hear both. "January 1st" (no "the") sounds American.
Given that so many (all?) other English-speaking nations, including the UK, usually speak it the other way around and write it day-month-year, I wonder if writing (I'm thinking especially of newspapers here) influenced the way you speak it and not the other way around: "March 1st" saves space and ink over "1st of March", or some other rationale. Someone has certainly already investigated the origin of putting the month first?
Edit: [1] says my hypothesis is most likely wrong, and that the UK just changed later to match the rest of Europe. So maybe that influenced their way of speaking? In any case, matching the way one speaks doesn't seem to be a strong reason, as it's easily adaptable and month names are unambiguous. Interestingly, it also says that using a purely numeric format is incorrect in any formal use, so as not to confuse month and day.
“the twenty-sixth of April” would be the way I say today’s date and anecdotally is in common usage in both countries I’ve lived in (the UK and Australia, both using d/m/y). I’d say it’s about as frequent as “April the twenty-sixth” by itself, and definitely more common if you include the day (“Tuesday, the twenty-sixth of April”).
Oh, this is a great point! I'd never realized that. I know that in Spanish (and I assume many of the Romance languages) we always say the day first, e.g. "dos de febrero" (2nd of February). In American English, even though day-first is technically grammatically correct, we pretty much never say it in that order (February 2nd instead of the 2nd of February).
If you want to write the date little endian then you should do the same with the year. So today’s little-endian date is 26-04-2220. Or maybe that is 62-40-2220? Or is it 62-40-2202?
ISO8601 is the only sane date format. Anything else is only favored for familiarity.
Our Independence Day is probably a special case. Clearly language is flexible enough to say all the formats, but the date format we write matches the most common verbalization.
I write it in that order some of the time… But when I do, I spell out the month because when I say “I was born on June 14th, 1962,” I don’t say I was born on 6/14/62. I also never say 14/6/62. In fact, I almost never say a month’s number in conversation.
If you want to write it out the way it's spoken, write it out the way it's spoken. Mixing the computer's numbers with the spoken word's grammar makes for misunderstandings, and as a programmer, eliminating misunderstandings is one of my goals.
Same, but with spaces, and the month always fully capitalized. I learned this habit in the military as an alternative to 20220222. "22 FEB 2022" is nice because it's not a string of numbers, which is very intimidating to read when written out to include hours, minutes, and seconds, like 20220222122222. It also completely bypasses the argument around month and day, because the format spells the month out.
If I'm writing a letter or addressing a specific thing in a formal context, I choose to "revert" to Month, Day, Year because it's the social standard for the country I am in, and I want to fit that cultural expectation. But for a business document or normal chatter, I think DD MMM YYYY is probably the clearest to both English and non-English speakers. It eliminates the distractions I'd normally be dealing with when considering whether I'm talking to someone out of country or not. It would be really great if it ended up being more widely adopted.
The one exception I can think of is a bug in the mssql datetime type (but not date or datetime2) where strings in that format are assumed to be yyyy-dd-mm if the locale dateformat is dmy (e.g. British English).
I'm so tired of having to do that game every time I see a date. It is not hard, but it is quite annoying. Especially since it isn't solvable in a lot of cases, so you try to reason your way to the most realistic interpretation.
The complaint isn't about the particular other order, but the fact that the order is ambiguous. In this case that doesn't matter, but often it does.
Americans memorize inches and yards, and often also memorize centimeters and meters, and working with either is fine, but we're not so often faced with numbers where it might be inches or centimeters and we have to figure out which (and when we are, it's sometimes a pain - certainly a bigger pain that working with known units).
Or, working with your language analogy, please go fetch me some "pasta" without knowing whether I'm speaking Italian or Polish.
… all the major predominantly English-speaking countries mostly use hyphens in the dd-mm-yyyy format. So although there is ambiguity, it's easily resolved by picking that as the default mentally and only backtracking on failure.
Now, in the more general case, this whole thing feels like a lieutenant/leftenant situation. We are annoyed simply because it's not the way we do things in one peculiar case, when otherwise the language is fully intelligible.
Even people in non-English-speaking countries write in English all the time, especially on the Internet.
Picking one default and back-tracking on failure really isn't that comforting nor the constant reminder that the date you thought it was might be something else.
Since the text itself doesn't clarify, context is the only way of resolving any of the scenarios. In each case it's usually sufficient and often not all that hard. But it's always harder than if the system in use was made explicit, and I understand the complaint (even if my annoyance at the ambiguity is quite significantly below the level where I would have complained myself, particularly in this case).
> Think about it like speaking a different language
The correct analogy is that I don't know which language is being spoken, and the same words are used in multiple languages with different meanings. I can apply heuristics to figure it out, but in some cases I can only guess.
Is there a different post where they used this date format?
None of their other incident reports even have a date in the title. Yet this one does, and in a weird format. Maybe there's something novel about the date, and it's written this way to emphasize the novelty, not to provide some vital information that happens to be excluded from every other incident report title they've posted.
It's a waste of effort and makes me wonder about the competence of the person who wrote it (when looking at a mangled date generally). Display dates are for humans, so write the month name; then it doesn't matter what the order is.
22 Feb 22
Feb 22 22 (weird but still better)
22 22 Feb (very weird but still better)
This also goes to show that 2022 is a better choice. My own personal preference - the 22nd of February 2022.
OMG - I thought I clicked on the tablet thread regarding Sumerian OOOs -- and I thought you were sarcastically making fun of the way the Sumerians captured dates on limestone tablets ~4,000 years ago...
(I had scrolled straight down, so the thread title wasn't visible when I was reading your comment)
This is what you sometimes see for best-before dates in Canada. Even better, because our dates are “supposed to” be like 22/2 but I don’t think anyone here does that, except Quebec perhaps. Sometimes you just have no clue
I know it's somewhat trivial but it does bother me a bit too, because I am used to the dashes being an indicator that ISO 8601 is being used. If you're going to use a nonstandard format, I'd much prefer it not look like a standard one.
This. The use of dashes here is a bit annoying IMO ...
Not sure if this is standard but I usually see the delimiter being used to define the date format: big-endian y-m-d uses dashes, middle-endian m/d/y uses slashes, and d.m.y little-endian uses dots.
The issue is that a great deal of the rest of the world doesn't do this, so you need to decide whether to apply best-guess heuristics to parse it or decide that it's a typo ("ah, there aren't 22 months, so maybe it's the 22nd of February, or someone fat-fingered the 2nd of February...?").
In this case you can look up Slack outages to disambiguate it, but the frustration here, and I share it, is directed at the stubborn refusal to use a standard format that the rest of the world has agreed upon.
Yes, the numbers are all the same, and the author is based in the US, and thus is using the default US format. So it's odd that this is the top comment.
I agree with you though, the point of a date like yyyy-mm-dd is to avoid working out stuff like this. You don't pick a date format based on whether the current date is ambiguous or not.
If you're trying to write a date parser for their outage report titles, the problem isn't the format of this date. It's that this is their only outage report with a date for the title.
It's the title because it's a novel date, and formatted this way to emphasize the novelty. It's also a date for which your question is irrelevant: it's the same either way.
The official EU rules say 22.02.2022, but nobody in Europe would have trouble parsing 22/2/22 or any variation thereof. And the / (or -) separator is indeed used in parts of the EU.
It’s the ordering that’s significant, not the separator.
Yeah, I guess if people look at it and parse it, they understand it. But what I notice is that the IT bubble I am in has no trouble parsing these dates, because we use them a lot in IT, even in Europe. People outside of IT, however, do seem confused by US-formatted dates from time to time, because they rarely encounter them.
More than once have I noticed someone struggling to fill in their birthday on an online form because the people making the form decided to use M/D/Y instead of D.M.Y.
Of course most can help themselves, but it's not like these date formats seem natural or normal to everyone in Europe.
The separator is often a good clue. Dots and dashes strongly imply d-m-y, slashes imply an English date, which might be m/d/y if it is from North America.
A mixture is even more likely to be d-m-y, today is 27/4-2022 in Danish handwriting.
Doesn't anyone learn anything? Youth is not an achievement; experience is. Having to refer to a book, rather than good change management practices, highlights the madness of agile tosh and ignorance of capacity and performance management. It's also a data breach incident (denial of service, unavailability); I hope they reported this to the UK ICO and each country's data protection regulator. Amateurish rubbish.
1: Yup.
2: It's the legal definition under the GDPR (a European law that applies to the US via Privacy Shield). If you can't get to your data - you get the idea?
> The notification obligations under the GDPR are only triggered when there is a breach of personal data which is likely to result in a risk to the rights and freedoms of individuals.
> including the author — which certainly made my role as Incident Commander more challenging!
As if no other way to communicate exists?
I remember using Slack, feeling fed up with emails, until I realized that if I wanted to sync Slack messages offline and have a standard way to view these messages that I was SOL. I am so glad that I've returned to email and optimized my workflow to use email effectively and efficiently. The best part is no more vendor lock-in.