HBO Max accidentally sent an integration email test to users (twitter.com/internetofshit)
565 points by minimaxir on June 18, 2021 | 331 comments



Amateur mistake, compared to my professional mistakes.

I once wrote a bot that sent email within the company (100K+ employees). I kick-started the bot on a server remotely and only then discovered it was stuck in an endless loop. It required server admin rights to stop it, which I did not have.

I couldn't immediately reach the server admins, so I had to physically drive there. An hour or so later somebody helped me kill the process.

The emails already sent could not be cleared out server-side, which meant recipients had email clients freezing for over a week, unable to handle the volume, typically around 300K new emails per recipient. They had to Ctrl+A and delete 100 emails, then the next 100, and so on, all while not deleting real and useful emails.

I pretty much destroyed email for those people.

I don't just destroy things at scale though, also at home. Around the time we had our first home broadband internet connection, I set up a web server and just kept my PC running. Unknown to me, the web server software included an email server with open relay enabled by default.

About 3 days later, my dad complained about the internet connection not working. The ISP had detected the issue (millions of emails sent out via my home server) and gave us a red card, fully shutting us down, permanently.


> I kick-started the bot on a server remotely and only then discovered it was stuck in an endless loop. It required server admin rights to stop it, which I did not have.

That sinking feeling when you realize you've started something bad and can't stop it always gives me a visceral feeling like the world is doing a dolly zoom* around me.

I have had two really fun bulk email screwups:

The first one started when we hit the send button for a mass email campaign driving traffic to our newly launched website redesign. We immediately realized that the email marketing software had put a unique query parameter in every link, which made every request miss our cache and go to origin, instantly smoking the little VM hosting the site and sending thousands of clicks to the now famous 503 Guru Meditation page. With the infrastructure folks offline in another timezone, it was the perfect environment to learn Varnish Configuration Language on the fly with the whole marketing team hanging over me with looks of horror on their faces!

The second one involved coming to work, sitting down with my coffee and noticing that our email sending process had crashed overnight. Given the rate they could be sent sequentially I realized we'd have a big backlog of tasks, so I wrote a quick shell script to split the tasks into separate lists and parallelize them across the number of cpus on our (big, colocated, dedicated, hosting many important apps and websites) server. As soon as I ran it I sent it to the background and opened up `top`, only to see thousands and thousands of forks of my process filling the list. By the time I realized that I'd flipped the numbers and split 4 email tasks into each of 50,000 processes instead of 50,000 tasks into 4 processes, the server locked up and my SSH session disconnected. Cue several panicked minutes of our apps being offline while I scrambled for the restart button in the remote management console. Somehow there were no lasting effects, and every service started up on its own when the box came back online, even though it hadn't been restarted in several years.

* https://filmschoolrejects.com/wp-content/uploads/2021/01/Jaw...
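
For anyone curious, the correct version of the split in that second story is tiny; a rough Python sketch (function names are hypothetical) that chunks the backlog across a small fixed pool of workers rather than forking one process per task:

    import multiprocessing

    def send_email(task):
        # stand-in for the real per-email work
        print("sending", task)

    def drain_backlog(tasks, workers=4):
        # The key detail: the pool size is the small number (the CPU count),
        # and the task list is the big one. Swapping the two forks a process
        # per task, which is exactly what locked up the box in the story above.
        with multiprocessing.Pool(processes=workers) as pool:
            pool.map(send_email, tasks)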


The proposed name for that moment of realization and horror is the ohnosecond:

https://youtu.be/X6NJkWbM1xk


Beautiful screw-ups, thanks for sharing.

On a serious note, I find it concerning how development and admin work now blur together into a single role. It may now be quite common for a front-end developer to also make all kinds of potentially disastrous admin/infra changes.

I think this is particularly true with all the cloud stuff. 20 years ago, adding new servers to our DC would be a 6-month process. Now I can accidentally spin up 500 in 1 second.


> On a serious note, I find it concerning how development and admin work now blur together into a single role.

It is. I know enough about admin stuff to know that I don't know much of the endless number of small but production-critical things which you can get away without on a local dev setup and which don't tell you when they are wrong. So it will just not work great, you have no idea why, and you might even think it's a problem with your software instead of your setup.

Doing admin properly is as big of a job as any programming task and it's a very different field of expertise.

And docker doesn't fix it, at all. It at best improves the illusion of you doing admin stuff correctly.


>That sinking feeling when you realize you've started something bad and can't stop it always gives me a visceral feeling like the world is doing a dolly zoom* around me.

The largest unit of time known to mankind, the "Ohnosecond".


Thanks for sharing this. I hope that the person who made the mistake (or better yet, their manager) sees these types of stories and realizes how common this sort of thing is, even for good engineers.

I wouldn't go so far as to say that it's never an engineer's fault, but this sort of thing usually relates more to faulty processes than people.

I'm reminded of the engineer at AWS who was writing a bash script about 5 years ago and unwittingly took down a good chunk of AWS's main east coast region, causing outages for tens of thousands of websites. I remember admiring that their response wasn't to fire the engineer, but rather to say that any system that large that allows a single non-malicious engineer to take it down must need some beefing up to make that sort of mistake impossible.


Thank you for the appreciation, it was in part my goal to normalize mistakes and to treat them in a light manner.

I particularly object to kicking someone when they're already down, which is a common behavior these days.

I think spreading awareness of the error, pointing out the stupidity of it, or taking entertainment value from it is really cruel. I'm sure the person involved is aware they screwed up and is embarrassed, so all these piled up messages only hurt.

And I still think it was incredibly harmless. Any individual would get at most one useless email, which contained nothing inappropriate.

As said, this intern must professionalize their errors, as this is not a pro-level screwup.


To avoid accidentally deleting useful emails while “Ctrl-A-ing”, could they have run a search by sender or some other criteria first? Then only delete the search results.


In the email client? Probably yes, if they know how. This was a very long time ago, early 2000s, and the email client was the dreaded Lotus Notes software.


I had to use that client recently. I can't believe it doesn't even have a proper email search function. Best you can get is ctrl-f on the open page or sorting all your email alphabetically.


The company didn't have sysadmins for the email server that could have cleaned up the mess?


> I pretty much destroyed email for those people.

GUI clients might be destroyed, but wasn't writing an IMAP script to bulk delete all emails an option?
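
For what it's worth, a bulk cleanup along those lines can be a short script on any mailbox reachable over IMAP; a rough sketch with Python's standard imaplib (server, credentials and sender address are all made up):

    import imaplib

    # hypothetical host, account and bot sender address
    with imaplib.IMAP4_SSL("mail.example.com") as imap:
        imap.login("victim", "password")
        imap.select("INBOX")
        # find every message sent by the runaway bot
        _, data = imap.search(None, 'FROM', '"runaway-bot@example.com"')
        for num in data[0].split():
            imap.store(num, "+FLAGS", "\\Deleted")  # flag for deletion
        imap.expunge()  # actually remove the flagged messages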


>email server with open relay enabled by default.

Holy cow, every self-hoster's nightmare.


The only fun part of the experience was checking the email queue, once I discovered it.

To my amazement, it was simply a dictionary-attack style of approach:

a@yahoo.com aa@yahoo.com ab@yahoo.com

And so on. This was 20 years ago so I'm sure it's more sophisticated now, sending based on web scrapers, leaked databases, etc.


Aww I’m feeling bad for the poor engineer who is saying “crap” a thousand times now, or is blissfully unaware that their day is about to get much worse.

Once during a system upgrade we ran some scripts and they triggered emails to about 8,000 people before we realized it (would’ve been 150k people otherwise). The next day was all about clean up, sending an apology email etc.

My mother had visited the next day and asked why I wasn’t hanging out, and my son (6 at the time) said, totally unfazed, “oh he can’t come now because he’s saying sorry to 8,000 people”

Hope you get through it! These mistakes happen. Oh, and your test passed. :-)


That engineer should absolutely point out that despite the mistake, he just created the most engaging newsletter email ever in defiance of modern marketing practices. The real cosmic brain strategy is to run with this somehow and turn it into something like those guerrilla marketing campaigns.


It's a promo for the new season of Silicon Valley.


that’s correct.

happened to me as well, we sent out a newsletter for a b2b ecom site with all links pointing to staging, behind htaccess of course: so no tracking pixels, no images etc.

some complaints, lots of „hey you did something wrong“ - even more of „i can’t open it please send again“ — and it was the best week in terms of sales ever.


It’s number two trending so it sure worked!


The Accidental Integration Test on Purpose. Larry David would be proud.


That’s honestly what I thought it was at first!


Similar story... about 10 years ago, I had written a really simple script to email all our customers. It worked great for a long time, but then suddenly we went over 1000 customers.

My script was supposed to try to grab batches of 1000 customers and keep looping until it ran out of customers (signaled by having retrieved less than 1000 customers in my last request for the next batch of 1000 customers).

My script was missing the offset part of the query, so after we hit 1000 users, it just kept looping, sending the same email over and over to our first 1000 users.

I felt so bad that day. From then on, sending out emails was this whole huge process that involved queuing them all and then having like 6 people review to make sure we didn't mess it up.
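
For reference, the loop the script needed is the standard offset-pagination pattern; a minimal Python sketch, with the data-access and send functions left as hypothetical callables:

    def send_to_all(fetch_batch, send_email, page_size=1000):
        """fetch_batch(offset, limit) and send_email(customer) are whatever
        your data layer and mailer provide (hypothetical here)."""
        offset = 0
        while True:
            batch = fetch_batch(offset, page_size)
            for customer in batch:
                send_email(customer)
            if len(batch) < page_size:
                break                # fewer than a full page: we're done
            offset += page_size      # the line the original script lacked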


When I worked on email systems, my worst nightmare was a Sorcerer's Apprentice sort of problem.


oh man I just wrote a piece of code that lets me write any markdown, push a button and it just sends it out to our ~1-2000 users.

An hour later I just commented it all out, and wrote a note to myself: "if you need this, uncomment and push back to stage". Just having that code even sitting around makes me nervous


We introduced an allowlist on all our testing and staging environments to ensure that only certain recipients can get email. We also make sure that no email address in these databases would work, unless we really want to send to it.
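
A guard like that can sit directly in front of the send call; here is a minimal sketch, assuming an environment flag and a simple domain allowlist (all names hypothetical):

    import os

    ALLOWED_DOMAINS = {"example-corp.com", "qa.example-corp.com"}  # assumption

    def safe_send(mailer, to_address, subject, body):
        # Outside production, drop mail to anyone not on the allowlist.
        if os.environ.get("APP_ENV", "development") != "production":
            domain = to_address.rsplit("@", 1)[-1].lower()
            if domain not in ALLOWED_DOMAINS:
                print(f"dropping email to {to_address}: not allowlisted")
                return
        mailer.send(to_address, subject, body)  # mailer is the real client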


We did the same thing after that incident. It’s worked beautifully.


I always was super careful with that stuff. The final send function wasn't unlocked until I knew for certain it all worked as intended.


I sympathize because once, when I was a junior engineer, I accidentally emptied the test email spool, sending the entire company including upper management all sorts of fake test emails accumulated over the past years about hires and fires and whatnot. What is more, a coworker convinced an HR person to play a prank and call me to HR office next day for giggles. Needless to say it was stressful and infuriating at the same time.

Now I know better, and if somehow they are reading here, I would advise the person to just chill and not take any more shit than necessary. If you were not one of the few people with root access but somehow still had the capability for mass emailing in prod, that is not your problem, it is an organizational problem. For an operation the size of HBO, anything prod has to be behind sufficient failsafes and a peer-reviewed process (except maybe for a very rare "break glass" emergency).

Hope there will be a good, rational postmortem that can cool-headedly identify the root causes and create action items for the actual stakeholders. If your shop is worth its salt, there won't be performance evaluation consequences for you. If there are, no worries either, it is time to look for a better place.


> What is more, a coworker convinced an HR person to play a prank and call me to HR office next day for giggles.

That's the type of thing HR people should be putting a stop to, not literally being a party to. I don't have any illusions about HR being there for the employee rather than the employer, but I can't imagine working for a place where HR is abusing their authority to add stress and shame solely for their own amusement.


> Aww I’m feeling bad for the poor engineer who is saying “crap” a thousand times now, or is blissfully unaware that their day is about to get much worse.

I hope not. It sounds like the test database was not being anonymized, but sometimes things like this can be as simple as not selecting a debug build in Visual Studio; either way it is an organizational issue and not an individual one, so he shouldn't be punished.

This is especially true if you have corporate buzzwords like "taking ownership and responsibility". No one will take responsibility if there are punishments for owning up to and admitting mistakes. Odds are they feel pretty bad about it already.

While we're sharing personal anecdotes: parents get very upset when they incorrectly receive truancy reports because you forgot to check the IsDeceased flag...


> I hope not. It sounds like the test database was not being anonymized

Taking a copy of a production database and using it for tests is a bad idea, even if you believe you're expunging any private user data.

Development, staging, and test environments just shouldn't ever have access to production data. If you're at a company that's ISO27001 certified for data security it even goes as far as most employees not having any access to data. I've never seen any production data for the app I work on.

https://en.m.wikipedia.org/wiki/ISO/IEC_27001


I agree with the part about not accessing information from production.

But I am wondering how could we debug or test something which happens only on production? I ask this because there are some bugs that can appear at the intersection of code and data.

So far my strategy is to do the following:

1. Only one person can access the production DB. This person makes a backup copy and encrypts it to internal storage.

2. Another person takes the backup and runs an anonymizer script on the data. What exactly the anonymizer should do beyond the obvious cleaning of personal data from user accounts is still up for debate. One important (and hard) step is regenerating the UUIDs while keeping foreign key integrity (a sketch of this follows below).

At the end, this person creates a new DB internally with the anonymized data.

3. Someone reviews the new DB and marks it as ready to be used.

Then a dev can ask access to this fresh copy.

In some teams I experimented with making this process fully automated up to the review step. But then, if there are bugs, we suddenly have a live internal DB with customer data, which is not wanted.

As an alternative, but only for small projects, I once wrote a script which analyses the DB data and tries to create a similar data structure from scratch, but with fake data.
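
The UUID-regeneration step mentioned above mostly comes down to applying one shared old-to-new mapping across every table, so foreign keys keep pointing at the same (renamed) rows; a rough Python sketch with hypothetical column names:

    import uuid

    def remap_uuids(rows, id_columns, mapping=None):
        """rows: dicts from any table; id_columns: e.g. ("id", "user_id").
        Reuse the same `mapping` across all tables to keep FKs consistent."""
        mapping = {} if mapping is None else mapping

        def fresh(old):
            if old not in mapping:
                mapping[old] = str(uuid.uuid4())
            return mapping[old]

        for row in rows:
            for col in id_columns:
                if row.get(col):
                    row[col] = fresh(row[col])
        return rows, mapping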


> But I am wondering how could we debug or test something which happens only on production? I ask this because there are some bugs that can appear at the intersection of code and data.

I've found that your strategy depends greatly on the kind of bug and what kind of service:

* If you're implementing a DNS server, you can copy live queries and compare good-to-bad. Then you can notify when something bad crops up. But odds are you aren't implementing a DNS server.

* If you're working on something whose behavior potentially changes under load, you need to find a way to replicate load. Some companies have entire production environments where release candidates are sent without being less secure. Cloudflare has some of these - I implemented one of the early versions.

* If you're dealing with weird logic tied to edge cases in the database, you need to work to identify those. Having live data often makes it only marginally easier.

There are products out there that will synthesize large amounts of production-like data based on the patterns in your database. I've used tonic.ai, and I know there are others. As you say, this is a touchy process with nasty error cases. Having someone else implementing it might be desirable.


Use a copy of production (perhaps anonymized) for debugging, and delete the copy afterwards.

> But then if there are bugs suddenly we have a live internal DB with customer data which is not wanted.

Don't let the production-copy touch your normal development environment. Make sure it's deleted in time.


> Use a copy of production (perhaps anonymized) for debugging, and delete the copy afterwards.

This way of debugging assumes a lot of things:

- You're assuming that your anonymization script works. What if some data isn't removed?

- What if the system you're using for debugging sends an email or connects to a webhook or attaches to a remote volume or pushes to a cloud service etc etc? Did your anonymization step really work?

- What if someone has connected the system you're debugging on to a production service by mistake? That would mean you're not even using the anonymized database. You're really on production..

- What if you forget to delete the database afterwards? Or forget to purge a cache? Or you fail to delete a container? Or you do delete the container, but not the container volumes? That production data is still there. Oops.

It's much simpler to just not use production data for debugging. It makes debugging harder, which is annoying, but you can't go wrong and accidentally leak your users' data. I'd prefer to just spend more time on debugging than have my users' data be put at risk.


Yes, obviously you'd try to debug as much as possible without touching production data.

Of course, different businesses also have different requirements on how sensitive production data is.


> I've never seen any production data for the app I work on.

The rest I agree with you, at least in a perfect world, but not allowed to look at production data? In the jobs I've had recently I wouldn't even be able to hypothesize what the problem is without looking at production data and production logs. Some of the issues wouldn't even have been reported if I wasn't checking the logs.

How do you bridge the gap from problem to replication and/or something actionable? Do you have someone knowledgeable enough in a role where they can feed you this information?


I guess the poster meant production customer data. Production logs and metrics should be easy to access, but customer data should be highly privileged and definitely not present in logs. At my old employer, viewing the production customer database required a customer support escalation.


> The rest I agree with you, at least in a perfect world, but not allowed to look at production data?

For some context, the app is all about visualising corporate and legal structures at global law firms, so it's all very private and very secure. Never having access to production data to replicate issues certainly makes debugging a bit harder, but it's never been so complex that we've not been able to figure out what's happened. I've learned a lot about understanding how an application works, how data flows through it, and intuitively zeroing in on a likely problem area while I've worked on it.


It's a lot of effort, but you can make a system like this work.

Eg Google does for example.

(SREs can still look at some metadata of the running system, like load etc.

Logs themselves have to be carefully anonymized.

The data itself is almost completely off-limits.)


I'm also in a similar situation, but in my case I cannot even get access to the application logs unless I explicitly ask for them (typically as part of solving a certain problem, given a time of occurrence), same for APM data.

While there's certainly something good to be said about the data security in such instances, it makes catching and fixing errors absolute hell, especially if the clients are unaware of the occasional exceptions appearing in the logs, or they send the wrong logs (in the case of old-fashioned file-based logging with unclear logging strategies).

Daily ETL with data anonymization/pseudonymization from prod into the test environments would be really good to have, yet I haven't really seen any companies adopt it. The closest I've seen were situations where the production data would be manually exported, scripts run against it, and then given to the developers quarterly at best.

That concludes my tiny rant that's vaguely related to the topic (DB data vs log data), though it could also encourage discussion about which data is available to other developers and how they approach it (e.g. trying to never log things like monetary amounts or even personal data so the logs are harmless, and the tradeoffs of that, like them becoming more useless). Heck, maybe someone out there has automated the things I mentioned above.


Honestly I'd rather them not send an apology email... because then I'd have two useless emails. And in the grand scheme of things, it's literally just one more email I receive during the day.


I for one am happy to know that they're running integration tests at all, so I don't think they have anything to apologize for. It was just an email -- I get many uninvited emails per day. This uninvited email happened to be from an engineer rather than a marketer.


The first thought I had, though, was whether some team decided to pull down their prod DB for testing so they had 'realistic data'.

Because that would be a much bigger problem than sending an email by accident.


Of course you say sorry after you bump into someone. It's basic courtesy.


Yeah but you don't call up some stranger on the phone to say sorry for pocket calling them.


Good point, but considering a lot of people are probably calling/emailing HBO without any sort of response, asking what the email is about.. it might be more fair to say that the stranger called back, you didn't pick up, and now you're calling back to clear the air.


These were most likely paying customers on the quick dial.


I can't get HBO max here (UK) and I got the email. I don't recall signing up for anything.

If I get a moment I might GDPR the info out of them but honestly loads of spammers have my email it's nbd.

e: Going by past emails someone decided to change their account's email to mine (which HBO was fine with, no confirmation required, hope the user can still use their account).. Don't you just love end users.


Sounds like it could be the scam where they put in your email address and hope that you pay the bill for them:

https://jameshfisher.com/2018/04/07/the-dots-do-matter-how-t...


Ooh I'd not come across that one. Cheeky bastards (if so)


Be glad! Sounds like you could have a subscription for free.


If that's the case then ~20 people bump into me on a daily basis.


I think in this case the first email is too cryptic not to follow up on; leaving it unexplained would be unsettling to people.


I got it. It was one line and included the word "test", so I just ignored it. It wasn’t scary but it wasn’t clear what it was.


HBO should email everyone to give us all the choice of whether to get an apology email.


However - most of HBO's customers are non-technical and will be wondering what kind of test this is and if they should be worried.


Well, I got my integration test e-mail yesterday but still no apology e-mail, so I think they listened to you.


It depends what the email was about. If its content reads like "test email" or some nonsense like that, nevermind. But if it looks like a legit email that would have significant consequences for the recipient, were it legit, it should definitely be clarified what's up. Also, a well-crafted email will hopefully prevent people invoking the GDPR on the sender.


I got one. It says:

Integration Test Email #1

This template is used by integration tests only.


The first email was useful to someone! An apology email is useless to everyone.


I disagree, it's definitely useful. It's useless only if it feels/is dishonest. One way to avoid sounding dishonest is letting the dev team write the small apology piece.


> Once during a system upgrade we ran some scripts and they triggered emails to about 8,000 people before we realized it (would’ve been 150k people otherwise). The next day was all about clean up, sending an apology email etc.

On the bright side, if you can accidentally send an automated email to that many people, then sending another email to them to apologise is unlikely to be a manual effort either.


When you've interrupted the process partway through as they did, figuring out which part of the list you need to apologize to may well be significantly more effort.


Just restart the script and send it to all 150k people, then just send the apology to the whole list.


Found the DevOps


This is why at my startup we have a “sent_emails” table with the to and from addresses, email type and a template id (if applicable). Saved a lot of headaches
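
Something like this, sketched with Python's built-in sqlite3; the column names are assumptions and the real schema will obviously differ:

    import sqlite3

    conn = sqlite3.connect("mail_audit.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS sent_emails (
            id           INTEGER PRIMARY KEY,
            to_address   TEXT NOT NULL,
            from_address TEXT NOT NULL,
            email_type   TEXT NOT NULL,
            template_id  TEXT,      -- nullable: not every mail uses a template
            sent_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)

    def record_send(to_address, from_address, email_type, template_id=None):
        # called right before handing the message to the mailer
        conn.execute(
            "INSERT INTO sent_emails (to_address, from_address, email_type, template_id)"
            " VALUES (?, ?, ?, ?)",
            (to_address, from_address, email_type, template_id),
        )
        conn.commit()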


Yup and we have deduplication logic anyway from day one.

It was easy for us to figure out the 8,000 since we log every message.


Additionally, you have some sort of paper trail you can use in GDPR or T&C disputes.


The bigger and more concerning issue would be recipients complaining about the incoming email and marking it as spam, which could damage the reputation of HBO Max's domain and potentially send future legitimate messages to spam (assuming the integration tests send from the primary domain rather than a separate one).

I learned this the hard way in my last role, I worked for a small company that wanted to do custom email marketing. I was pretty gung-ho about it, I thought I'd just set up a script to loop through contacts and use mailgun to send the email from a custom domain. As we used the tool we saw a steady drop-off in click-throughs to the site, the majority of our messages were getting caught up in spam filters.

Turns out there's a whole science to email marketing, how emails should be structured and formatted etc. A lot of times the criteria for spam is how often the domain was flagged for spam in the past, the length of the email, the contents etc.

Ended up relying on this tool quite a bit: https://www.mailgun.com/deliverability/email-spam-checker/


I had an experience like this once. Luckily it was less visible, but I felt like a fool all the same.

I came out as trans and changed my name last year, and with the name change I set up a new email alias for work. Then I set up automation to send out a gentle reminder email about the change for people who emailed my old alias. It worked fantastically for a few months… right up until the point that (due to a series of individually innocent events) the automation ended up running across the entirety of my 7 years worth of inbox. Everyone who had mailed me over the 7 years prior to the name change started getting the reminder email. One reminder for each email they had sent me. The worst part is that due to a bug in the email automation stuff, emails sent by the automation weren’t preserved in my sent box. So I don’t even know how many people I spammed. If I had to guess, I sent dozens of emails to the CEO and other execs, hundreds to my director, and thousands to people who worked closely with me over the years.

I learned a valuable lesson that day.


I've been laughing at myself for like two years now about how awkwardly I came out, and only now do I realize how fortunate I am that I didn't try to automate it. Thank you very much


I thought I was done with queer tragedy stories, but I could read more of this subgenre.


What was the valuable lesson? :p


Don't transition unless you're able to budget for a software QA contractor.


> Don't transition unless...

ancient trans proverb


until*

Gotta keep positive!


Don't write your own email auto-responders?


You joke, but there is a profound truth to this. Don’t reinvent the wheel. To be fair, it’s not like I wrote any real code for my auto-responder. I used a slightly janky mail tool that could, if held just right, be used to set up an auto-responder. I should have looked for something a bit more bulletproof and focused, but I wanted to play with this specific tool and see what I could make it do.

Mostly, I’m kicking myself for not thinking to put safeguards in. Dependencies can fail in unexpected ways, and I should have set up my auto-responder to be a bit more defensive.


I did similar in Mac Mail a few years ago: it caused an out-of-office reply to be sent to every email I’d ever sent going back years. I was surprised Mail allowed for this scenario. Needless to say, my holiday got off to a stressful start!


Oh no


Ouch


[flagged]


You're getting downvoted to hell, but I'm going to respond anyway. The parent poster here is fully transitioning their gender. They're not cross-dressing at night, they're not "closeted trans"--they're changing their identity for their whole life, in all contexts--personal, professional, etc. Furthermore, they don't want to be known by their old identity anymore--in the parlance, that's called a deadname. So they're informing people of that happening, of who they're identifying as from now on.


It’s often considered professional courtesy to let people know when a property that matters to 99% of humanity (name and gender) changes permanently. Everyone takes a different approach. There are upsides and downsides to the “email autoresponder” method, but it’s certainly an acceptable option in local instances of context.


It turns out, people generally don’t change the terms of address they use for you unless you ask them to!


There are certain predictable exceptions, like when the physical changes reach a threshold of severe mismatch versus your old name and they get uncomfortable and figure it out (or ask :)


Is there any professional context where you wouldn't share when your professional email address changed?


[removed]


"Don't feed egregious comments by replying; flag them instead."

https://news.ycombinator.com/newsguidelines.html


I appreciate the solidarity coming in from the replies to their response[1].

We've all done it. If you haven't broken production, you haven't broken production yet.

https://twitter.com/HBOMaxHelp/status/1405712235108917249


An HR person was poking around the admin panels of a payroll system and accidentally clicked a notify-on-SSN-change box. An email with the before and after social security numbers went out to 20,000 people.


Seems like a bug in the system?


I had a bug in our invoice sending script. Someone had been getting thousands of copies of their invoice. They called and politely asked if we could stop sending them that mail :D


Badge of honor to be honest. Even if HBO Max fired me I would forever be stoked to be "that guy/gal" who sent out an integ test message to the entire userbase.


If HBO fired them for this then they'd have to bring in someone without the "Don't accidentally send a million emails" training to replace them

Intern isn't going to be sending mass email again without a double check


Can relate. Had an excitable junior engineer working on migrating between platforms, spammed the global list with '123456'


Without knowing any details, that you're framing it this way (their fault for being junior and "excitable") and that you don't mention any failed safeguards and process failures (implying that their mistake was an easy one to make) doesn't let you (and that org) appear in a great light. This sort of thing is a huge org smell for me; people will always make mistakes, how you deal with that fact tells a lot about the quality of leadership.

And what exactly is the "excitable" bit supposed to tell us?


It's probably supposed to tell you this is a humorous anecdote about a mistake someone made and not a thinkpiece about all of the valuable lessons this company learned.


who thought sending another spam mail as an apology for sending spam mail was a good idea?


Now imagine it didn't just say "test" or "asdf".

I still might sometimes put a funny/lighthearted twist on debugging logs that nobody outside the company would see, but I never put in swear words, condescending things, or anything else I wouldn't be okay with if it was accidentally logged in production.

I fortunately have not had that experience myself, but I've seen too many supposedly "internal only" messages turn uninternal.


I set up an Apache web server one time with a single index.html with the word "FOO" in it.

It was a placeholder until the software dev team responsible for the server could deploy software onto it (they asked that it be set up with something in the index.html so they could confirm it was running; they didn't care what).

In advance of that, the networking team wired it up to some load balancers.

They took an IP which had formerly been a decommissioned cluster of webservers serving the main website for the entire company (an internet retailer named after a river in some other country, you've probably never heard of them).

The DNS load balancers found the old IP (it had never been deleted from their configs, which was the root cause) was now live and it was REALLY fast, so they shunted most of the traffic over to it.

Created a sev1 outage for "users reporting 'foo' on the website"

I'm happy I kept it professional that day.


future alien FOO technology

> Let's say 99 of your 100 machines are taking 750 msec to handle a request (and actually do work), but this single "bad boy" [machine] is taking merely 15 msec to grab it and kill it. Is it any surprise that it's going to wind up getting the majority of incoming requests? Every time the load balancers check their list of servers, they'll see this one machine with nothing on the queue and a wonderfully low load value.

> It's like this machine has some future alien technology which lets it run 50 times faster than its buddies... but of course, it doesn't. It's just punting on all of the work.

-- https://rachelbythebay.com/w/2015/02/16/capture/


they were all returning 200s though for "GET /", so they were perfectly healthy and happy...

optimized as shit for that one request...


This made me VERY happy


I learned that lesson the hard way when I was a teenager.

I'd semi-automated the generation of emails to parents about how their kids were doing at summer camp. It was basically just a script that asked various questions and pulled various data about the camper from the database, then used that to generate a letter that could be used as a starting point and edited to be more personal (or, if in a rush, just sent as-is).

Long story short, due to user error, a value of 0 was set for a camper's behavior/politeness/helpfulness rating, which resulted in a joke sentence that I'd written as an Easter egg getting slipped into that particular report. Cue egg on my face when a parent calls in, baffled about the otherwise normal report containing a casual aside about how terrible their nice little girl was.

After that experience, I always assume that every string in the code will inevitably be seen by a real user/customer.


> assume that every string in the code will inevitably be seen by a real user/customer.

That's a good rule, and the same principle can be applied to communications in general. In fact, I have the three rules printed on my office door:

    Dance like nobody's watching
    Encrypt like everybody is
    Email as if it were read out loud at a deposition


The first production outage of my sysadmin career was when I crashed a university email system at a summer camp (I was a kid attending, probably very early 1990s?) by circumventing the block on emailing the entire university, using the Mac keyboard shortcut Cmd-A to select all in the recipient field in the cc:Mail UI. I apologized for childishly causing harm with something I thought would be funny, and so they decided not to kick me out.


As a former HBO dev, can confirm that the majority of test cases or random debugging messages are Game of Thrones quotes.


I guess HBO devs are too young to be able to quote The Wire.


404: Code Not Found. Man’s Gotta Have A Code.


400: WTF Did I do?


    401: I robs drug dealers


502: you want things to be one way, but they’re the other way


502: What the fuck did I do?


500: Fuck fuck fuck fuck … fuck


To this day, that is in my top 5 favorite scenes, ever. So much said with a single word.


Totally agree. I loved how it just showed the difference between a cop going through the motions, and these 2 guys show up months later and solve the thing in the span of 5 mins. "Real police"

Show me a GoT scene anywhere near this level


The Sopranos too.


Not directly related, but it still enrages me to this day that HBO allowed the two idiots who shall not be named to butcher Game of Thrones. It went from one of the biggest things in popular culture to basically nothing over the span of a few weeks.


They created the show in the first place, so it's hard to blame HBO for keeping them.


> They created the show in the first place

Well, "created" may be too big a word here... Transferred it from book to screen, rather.

As evidenced by the fact that when they ran out of stuff to transfer and started actually creating for themselves, quality took a nosedive.


Actually it's not; anyone with half a brain and access to the script should have known it wouldn't work.


Ginger Minge


This, a thousand times. I have a standard bit of lorem-ipsum-style content for this situation (whose origin I have forgotten and cannot credit, lo siento):

    This is a test.
    This is only a test.
    Had this been a real emergency, we would have fled
    and you would not have been informed.


To be honest things like that may sound funny to you, but it's easy to take things the wrong way. I'd leave out the last sentence.


This is a great point, but it's a good idea to be cautious about anything you write in a work context - like in IMs, commit messages or emails. I've been working by the principle that I won't write out things that I would be uncomfortable being confronted with out of context - say if a newspaper got hold of some leaks, or some court case caused a company to have to turn over emails etc. This doesn't mean I don't swear or joke ever, but it means that I don't send stupid one-off throwaway dumb things. So "The fucking $foo-service is down again, I've no idea what caused it" is fine, but "Well $foo-service is down, hope our idiot customers don't figure it out before we fix it lol" is obviously a no-no (not least because that's not actually how I joke or talk anyway).


> I still might sometimes put a funny/lighthearted twist on debugging logs that nobody outside the company would see, but I never put in swear words, condescending things, or anything else I wouldn't be okay with if it was accidentally logged in production.

A previous company I worked in once managed to send the following in patch notes:

"So sit back, smoke a spliff and stop worrying you nappy wearing fucktards"

... so, yeah, good advice.


>good advice.

Both are, really.


Yep, a colleague of mine was running a quick test (this was a long time ago though) to see if some new feature, which sent email, was working, and because it was not supposed to actually send email anywhere other than to him, he quickly put something with 'fucker' in it, pressed test and... yep, all customers received that. Luckily we were in NL with only NL people receiving it and they mostly just found it funny. My colleague nearly had a coronary of course and never did that again.


We got a support ticket about swearing in browser console logs before


Not using swear words was a big change I had to make when going from working alone as a freelancer to working with other engineers and testers. Where I work we delegated some frontend stuff to a team last year who would test with things like “God damned” and when I saw that I realized how much I’d matured as my first thought was, “unprofessional.”


One team I worked on put a profanity filter on the source code check-in so that we could put curse words in whatever code we were writing and make sure it didn't get merged until that code was refactored.


How tolerant was it against typos? For example: "fuk". Or did it contain every variation already?


I used to write silly or rude things when testing. However, I once found out a client of ours was getting tagged onto all of my emails via BCC. From then on I always use "This is a test, please disregard"


A national Swedish business newspaper learnt this the hard way https://translate.google.com/translate?sl=auto&tl=en&u=https...


I worked with a guy that was convinced at the last second to change his test email from a swear word to just "test". When he accidentally sent that to the customer base the CEO tore him a new one; it could have been so much worse.

I learned my lesson from him and now I keep these things very clean, just in case.


I set up the production servers for Pokemon GO using my personal email address as the owner. When we hit 120 million users and all the servers melted, guess which address every single one of those people was told to email about the problem?

It took a script running for five days to delete everything.


Can I just say I would love a detailed breakdown of the glorious mess behind the Pokemon GO launch. As a person who both wanted to play from the very start and couldn't even log in, and a DevOps person who empathized heavily, I'd love to hear all about the firefighting behind the scenes during those first few hours to weeks.


I’d love to write one. I need to take a look at my NDA again just to make sure I don’t infringe on anything.

I can say one thing: the problems were heavily exacerbated by a few botters trying to scrape the entire planet's worth of data every minute so they could charge money for realtime maps of everything. Every time we’d shut them out, they would find another way around the limits. Not fun when we were already so overloaded with real users.


Seconded - is there any article anywhere about the technical side of these early days?


This is one from Google’s side. Note that we had a total of five server engineers compared to their org :)

https://cloud.google.com/blog/products/containers-kubernetes...

Also, we got a call from them at one point early on saying we’d broken their global L7 load balancers because of too much traffic :)


if you can't find anything about Pokemon Go, might be worth looking for Ingress instead - I believe that Pokemon Go was sort of based on that mobile game


Oh man. I'm assuming you used your Gmail or Microsoft account where they can handle that amount of influx. Imagine if that happened on a late 90s ISP-run email account running on a single server. Ouch.


A while ago I worked on an SMS gateway and somebody had entered their own phone number in some sort of test. Somebody tested it, and there was a bug which triggered 100s of messages to be sent to their phone.

This was over a decade ago, when the messages were stored on the SIM and there was a limit as to how many they could hold (something like 20). So you just fill up the limit and that's it right? Nope, the carrier helpfully buffers messages that can't be received, and they will be sent/received when there is space on the device. I can't remember how they resolved it in the end, I guess just waiting for the messages to expire (72 hours).


I once tried to apply a hotfix in prod by opening the PHP file over FTP, modifying it inline and just saving. What I was fixing was the email sending logic. Since it had been a long time since I had written any PHP, I forgot to add a $ before my i variable in the loop. It still ran, though, just getting stuck with index=0 and sending email after email. Luckily my user was the first one in the db so no one else was affected.

But since PHP running on a managed host isn't something one can easily "shut off" it ran until it timed out, sending thousands of email to my gmail. While Google could handle it, it ended up locking my account for a few days with an error every time I tried opening the inbox.

Luckily my domain / provider didn't get blocked or spam-listed in the future.


As someone who was playing late-night Pokemon Go minutes ago, this is hilarious to read.


Just curious, how is something like this typically handled/addressed internally? Is it one of those live and learn type situations or was there any consequence other than your email getting flooded?


I would pay actual money to see your GCP invoice for that month.


My first IT job was in a large call center. I was sweeping up in a data center and there was a keyboard cable stretched across a walkway. The keyboard wasn't movable (I don't remember why) so I unplugged it from the PC, swept around it, and plugged it back in. About 2 minutes later half a dozen people ran into the room. Apparently the Sun workstation I had unplugged the keyboard from was a critical component of the call manager and there was a bug that forced a reboot when the keyboard was plugged in while the system was running.

I had hung up on ~36,000 people. Lesson learned and I only had to keep the data center clean for another year.


That's....not your fault.

There should have been a label on the computer near the keyboard jack written by the people who knew about the problem.

If you didn't make a label afterwards yourself, the next keyboard-based outage would be your fault :)


A cover on the service critical cable stretched across a walkway wouldn't be amiss either.


> keyboard-based outage

I love it.


Reminds me of the "load-bearing poster" bit from The Simpsons


It sounds like the real lesson that needed to be learned was taught to whoever decided that a system without any fault-tolerance that reboots whenever a keyboard is plugged in was fit for production.


My work is doing planned DR testing right now. One key system had some problems failing over - so they delayed failing back, then they decided to change when they were going to fail back.

Each time they changed their plans, there were war rooms discussing the "impact" of them changing plans. In each meeting, I'm scratching my head: this is the closest thing to an actual disaster and we are all in a tizzy.


Doing regular fail overs is a healthy practice, but it's also a test of the company.

A large outage is just about the only thing that can convince management that it is important.

Reducing technical debt and staying current with technology ought to be a priority, but all too often it's not.

Netflix learned it the hard way:

https://opensource.com/article/18/4/how-netflix-does-failove...


They had an outage which caused them to optimize the time to recover. However, this article didn't make any mention of regular testing of the new failover. If I'm reading it right, they designed their backup to not report on its health at all, and they instead have to just hope it works when they need it.

Was this not the completely wrong lesson to learn?

>Since our capacity injection is swift, we don't have to cautiously move the traffic by proxying to allow scaling policies to react. We can simply switch the DNS and open the floodgates, thus shaving even more precious minutes during an outage.

>We added filters in the shadow cluster to prevent the dark instances from reporting metrics. Otherwise, they will pollute the metric space and confuse the normal operating behavior.

>We also stopped the instances in the shadow clusters from registering themselves UP in discovery by modifying our discovery client. These instances will continue to remain in the dark (pun fully intended) until we trigger a failover.


Reminds me, my friend recently had a "scoping" meeting. It was planned for 2 hours. Lasted 9. Obviously they needed a scoping meeting to scope their scoping meeting.


Ah, I love when people decide they need committees for plans for their committees.


I wonder if it was one of the earlier PS/2 keyboards. As I recall some of those weren’t plug and play, so it would have been less of a bug and more just that the hardware wasn’t designed to be hot swapped.


> As I recall some of those weren’t plug and play

Due to how it's implemented on the motherboard, standard PS/2 isn't hot swappable. Any support for hot swapping is the exception, rather than the norm.

I always thought it was a bit ironic that VGA and RS-232 cables (with D-sub connectors) could be mechanically attached with screws, even though they were hot swappable. Yet PS/2, which used screwless mini-DIN connectors, wasn't hot swappable.


I'm hilariously reminded of the episode of Rick and Morty where Morty flips the wrong light switch, killing a room full of cryogenically frozen people.


I did almost the same action and unplugged and replugged a Sun keyboard while rearranging cables. But in my case the machine just suspended to the boot mode. 10 minutes later my team lead came and found me, figured out the situation and typed 'go'.

The server restarted and just continued happily doing its job.


So, you have a keyboard cable stretched across the walkway, maybe a tripping hazard, unlabeled, that crashes everything when it is unplugged...

It looks a lot like this https://xkcd.com/908/

I have also seen instances of similar things, but usually there is a large sign saying "do not unplug/turn off" and they try to make it as unobtrusive as possible.

The problem here is that you were the newbie. It should have been done by a higher-up who could then properly chastise whoever was responsible for that mess.


Fundamentally, a large "do not unplug" sign is just like any other failsafe system. Nuclear missile silos, probably the most "do not touch this switch" systems in existence, are only slightly more refined, and have red covers over all the important switches instead.

The alternative is to epoxy the keyboard in. But then when you legitimately need to unplug it, you need to find a hammer.

As an aside, that XKCD strip feels heavily inspired by an episode of the IT Crowd [1]

[1] https://www.youtube.com/watch?v=iDbyYGrswtg


I mean, at least they could have put some superglue on the plug!


I heard how a company long ago got a phone call from an elderly customer: I've got such a strange letter from you. Reception asks what it says. Says the customer: I don't know yet, they're still busy putting it in the hallway. Huh?

It turns out an error was made in a mass mailing, and every letter was sent with the first address on the list. The list was roughly sorted by birth date, so the eldest customer got all of them. Post office workers drove a van to the customer and duly delivered bags and bags of mail to the same address.


Same thing happened somewhere I worked, except all mail was being sent to the company HQ. It was loaded on lorries but someone at the sorting centre had the foresight to phone up and ask "is this correct"?


My email sending bug: I work at a large email sending company. You’ve heard of us. One of my first days on the job (years ago now), my task was to set up a cron to send out emails to folks to say “hey you’re almost out of credits you should upgrade your plan.” Well, there was a bug caused by a couple different problems in the system, and it turns out that the utility got stuck in a tight loop on one poor individual’s inbox. This poor Yahoo recipient received over 400,000 emails from me in a matter of seconds. Yeah, I felt pretty bad there.


In a way it’s better to completely obliterate one person’s inbox than send one email by accident to 400k people.

The first you could apologize in many ways including financial, the second you really can only apologize by doing it again.


>the second you really can only apologize by doing it again.

Do you though? Is there value in sending yet another email? Is there harm in not sending another email? For those of us "tuned in", we just grab the popcorn and show sympathy for the poor chap that is having a bad day. The rest of the world probably ignores it. Maybe some people report it as spam?


I once sent spam out to persons unknown. It was probably sent to about 4000 people - but no one could actually be sure.

So to fix it, I sent an "I'm sorry email" to the following: my boss, his boss, and 50 key customers, all on BCC so they didn't know who I was "apologizing to" and left it at that.


Depending on the type of accidental email it may have tracking links and the software may let you redirect those links to a “we screwed up” page.

Sending an apology email really should be reserved for situations where the mistake included erroneous information - a “Dewey defeats Truman” type of email, or with wrong dates, etc.


I was building a system to call patients and remind them of doctors appointments. This was hooked to a T1 I think, with dozens of phone lines. Naturally, we tested this with our own phone numbers. One evening I get a call reminding me of a fake appointment, and while I listen to the reminder and enjoy knowing I built this... I get a call waiting notification (and you know the rest)

... (but I'll say the rest anyway). The system was using dozens of phone lines to call me and leaving multiple voice messages simultaneously. I considered myself reminded.


I did that to myself via Gmail. Gmail has some kind of weird quota. It got to the point where it would only let like 100 emails per hour through. So that 5 minutes of email blast turned into one big blast, and then 100 emails per hour, for like 14 days straight. Thankfully filters at least sent them to the trash.


Depending on the time period, it's pretty easy to clean out nice, well-patterned spam like that these days. In days of yesteryear, it could be a largely manual process.

In AOL one prank was to flood someone with emails because it was difficult to remove them, IIRC and needed to be done almost one by one.


> In AOL one prank was to flood someone with emails because it was difficult to remove them, IIRC and needed to be done almost one by one.

An email service one of my friends used around 2005/6 (I can't remember if it was Lycos or the school internal system) had a similar issue - you could only remove a page (10-15 IIRC) of emails at a time. Once he mentioned this, one of my other friends took this as a cue to blast him with 32k (pretty sure it was 32768) emails. I don't think he ever got them all deleted.


My company had a bug that did this and Yahoo throttled mail from our server into uselessness for weeks.


Serves them right for using Yahoo mail.



To share my mistakes as well:

After launching a web app we noticed severe performance issues and had to roll back to a previous version. To identify the performance problems introduced in the new version, I wrote a script mimicking thousands of typical user interactions. This also included sending email. In the moment, we forgot to properly configure the dev environment and the emails went out... but it did not stop there.

This was a good decade ago and some clients still preferred to get faxes instead of emails. We used a service which automatically converted emails to fax messages. We tried to intervene but were not able to stop thousands of fax messages, and basically DDoS'd many fax machines and then had to call the affected clients..


Maybe this prompted some people to switch to email? Thanks for your service in that case! =P


The bigger problem I see is that they’re clearly using production data on test systems (or using the production system for testing), including PII. This is a pretty big no-no and violates many security standards. I don’t blame the tester per se, but I do blame HBO for not having a process in place that prevents this kind of thing from happening.


Jump to conclusion and shame. Classic.

How about, just maybe, they use a mail list management system like millions of other companies and PII wasn’t available in tests at all?


Except they also sent emails to people who are not subscribers such as myself and there was no unsubscribe link in the email. There was also no header, footer or any branding at all in the email. The only content in the email was the single line of text that read:

"template is used by integration tests only."

None of that sounds like a mail list management system to me. Also nowhere does the OP appear to "shame" anyone. In fact the OP very clearly states they don't blame any person but that they felt fault lies in lack of process to prevent such incidents.


That sounds exactly like what you would send to a mail list management system. Since you've likely never used one, think of it as a black box: you feed it a template that has a bunch of vars you can reference, like

“Dear {firstname},

Check out our new movie: Batman Undresses.

Thanks for being a subscriber for {accountlife}.

{termsfooter} {unsubscribefooter} {alternatelanguagesfooter} “

The whole point of a template is that you send to an entire distribution list with a single API call and the mail system handles rendering the template into per-user emails, setting up the unsubscribe link, tracking pixels, etc.
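
To make that concrete, here's a minimal sketch of what the sending side might look like. The endpoint, field names, template id and list id are all made up for illustration; the point is that the caller makes one API call and never touches subscriber PII.

    # Hypothetical mail list management API call (Python + requests).
    # The mail system resolves the list, fills in {firstname} etc., and
    # appends the unsubscribe/terms footers itself.
    import os
    import requests

    MAIL_API = "https://mail.example.com/v1/campaigns"  # hypothetical endpoint

    def send_campaign(template_id: str, list_id: str) -> str:
        """Ask the mail system to render `template_id` for every address on `list_id`."""
        resp = requests.post(
            MAIL_API,
            headers={"Authorization": f"Bearer {os.environ['MAIL_API_KEY']}"},
            json={
                "template_id": template_id,  # e.g. "integration-test-email-1"
                "list_id": list_id,          # the mail system maps this to real addresses
            },
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()["campaign_id"]

    if __name__ == "__main__":
        # One call; the fan-out to individual recipients happens inside the mail system.
        print(send_campaign("new-movie-announcement", "all-active-subscribers"))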

> Except they also sent emails to people who are not subscribers

That has exactly zero relationship to your name being in their mail distribution system.


> That has exactly zero relationship to your name being in their mail distribution system.

Seems it has some relation after all: If your name isn't in the system, you wouldn't receive the result of the template being applied: Somewhere a bit higher in the code than your example, up among the headers, there's a bit like "to:{emailaddress}".

I mean, what is sending mail to non-subscribers related to, if not the contact data of those non-subscribers being in the sender's mail distribution system?


No, you're getting confused. An email list management system has far different data than the system that dispatched the fucked up template to it. Additionally, whether or not someone is an active subscriber has no impact on them being in the mail system in general.

The whole point is that the piece that screwed up and pushed this template would have no PII access itself.

Beyond it being PII, it's just how you sanely design these types of mass email/SMS/push notification distribution systems.


No, no, that can't be. Don't be reasonable


This incident doesn't necessarily indicate that they were using prod data in a test system.

I can plausibly imagine that there's some separate system that takes an identifier for some list of customers, and some template, and blasts out emails. Such a system could exist to help manage compliance issues with e.g. unsubscribe requests.

If so, and with a few "shortcuts" taken in making test environments for integration testing, I could envision a scenario where this incident happens that doesn't involve the test having direct access to real user data.


HBO Max sounds like a big company (I've heard of HBO, and "Max" sounds big to me). But it is possible they are using something like Mailchimp for their mailing list and don't do it in house. I'm just guessing here - a quick look at the headers would reveal this.

And with a confusing and horrendous UI such as Mailchimp's, it's quite easy to send a test email to the "live list". VERY easy indeed.

We've done it twice now. Once to about 10,000 emails and another to almost the entire list of 800,000. Luckily the template we were testing was 95% complete and not many people noticed. It just looked like the email got truncated with gibberish at the end.


They use SendGrid. From the headers, abmail.mail.hbomax.com resolves to u6146175.wl176.sendgrid.net.


Or they ran a suite of integration tests against prod as a way to verify a release was working correctly.


but... why would you do that? Production is not the place for that.


If you don’t test against prod your CD pipeline is missing a critical step.


In some systems the cost of writing and maintaining stubbed or alternate versions of things you don’t want hit by tests in prod can be pretty overbearing. Good SLOs can serve the same purpose as well for that sort of flow


+1. The first thought I had when I saw that email was that I felt bad for the tester/dev; it's not really their fault, but they're certainly going to get at least some backlash for this. Really, it's a director/VP level issue that this kind of mistake was even possible with my email address.


This tweet does at least give some indication that they're not (outwardly) trying to throw their intern under the bus: https://twitter.com/HBOMaxHelp/status/1405712235108917249


It was IMO still an ill-considered tweet even if made in a jokey way. While no one was really hurt by this--OMG I got an extra email!? :-)--something more along the lines of "Oops, sorry for sending out that test email by mistake. We'll be putting processes in place so it doesn't happen again." would have been better.


Exactly this, though anonymizing production data properly so it can be used as test data is very hard.

Generating useful test data is much better, but it's hard to represent all the edge-cases you see in Production.


We all make mistakes but rarely at this scale. I hope whoever accidentally sent out the integration email to everyone doesn't get in trouble. I would hate for my dumb mistakes to be put on blast by hundreds of thousands or millions of people.


I often see "this person should be fired" etc in response to legitimate mistakes.. but that's massively undervaluing the mistake.

There's a pretty good chance the person who made such a mistake is less likely to repeat it or other mistakes than someone who hasn't yet had the experience. So at best you're just sending a more valuable employee to one of your competitors.

There's some gray area when it comes to making a poor judgement call to skip a well-defined procedure or be lazy, and there are some cases this doesn't apply to, like intentional malice or a continued series of similar, obviously avoidable mistakes.

On a related note, I'm often frustrated when people play a blame game (even if one might be somewhat called for, in a way) instead of considering "human factors" and how to systematically keep people from being able to make such mistakes. Or when they refuse to accept a human-factors-style explanation and just say the person "should have known better" etc. The best high-profile example was the Hawaii missile alert, where what was supposed to be a test went out for real because of an incredibly poor UI. But there are countless examples everywhere all the time. I try to think about this a lot in my work.


How's the anecdote go... "Why would I fire him? I just spent $200,000 training him not to do this thing, he'll never make this mistake again!"


According to one source:

“Recently, I was asked if I was going to fire an employee who made a mistake that cost the company $600,000. No, I replied, I just spent $600,000 training him. Why would I want somebody to hire his experience?”

– Thomas John Watson Sr., IBM


Wasn’t it, “I just spent 1,000,000 on training there’s no way I could afford to fire you.”


Ah, that's I think closer to what I was remembering!


If someone can make these kinds of mistakes, IMHO it is usually not their fault and instead is the fault of the systems that failed to prevent it from happening.


It is still their fault -- they did something they shouldn't have done. But the mistake exposes a bigger problem in the underlying infra, which is a good thing.


I think it's harmful to ascribe fault as a binary thing.

Assuming, of course, that this wasn't some deliberate act (because that would be weird):

The person who ultimately pressed the button which caused the code to run that sent this email only shares some portion of the fault. Maybe that person even wrote and deployed the code.

There are many other deficient processes that led to this even being possible: why did test code run in a place that had access to production credentials? What caused the code to run in the first place - was it accidentally triggered by some other bug, or deliberately run by somebody who didn't realize they were in production? If so, why are their systems built in a way that makes it hard to realize when you're in production? Why is the system architected in such a way that large quantities of email can be sent inadvertently without some sort of approval? You could always delay large batches and send an alert so a human on-call could be in the loop to detect and delay such emails.
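
As a rough illustration of that last safeguard (threshold, names and alerting are all invented here, nothing HBO-specific), a dispatcher could park oversized batches for human approval instead of sending them straight out:

    # Sketch: hold large batches until a human approves them.
    from dataclasses import dataclass

    APPROVAL_THRESHOLD = 1_000  # anything larger waits for an on-call human

    @dataclass
    class EmailBatch:
        template_id: str
        recipient_count: int
        approved: bool = False

    held_batches: list[EmailBatch] = []

    def alert_oncall(batch: EmailBatch) -> None:
        # Page someone; nothing goes out until they approve.
        print(f"HOLD: {batch.template_id} targets {batch.recipient_count} recipients; needs approval")

    def dispatch(batch: EmailBatch, send_fn) -> None:
        if batch.recipient_count > APPROVAL_THRESHOLD and not batch.approved:
            held_batches.append(batch)
            alert_oncall(batch)
            return
        send_fn(batch)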


You can't say that without knowing the details.

I've definitely seen issues where the engineers at the keyboards that day weren't at all at fault, and were just doing exactly what was asked of them, but systemic issues caused something like this. You can blame poor tech hygiene by the whole team, and lack of foresight by the manager, but most of that would be 20/20 hindsight.

This is why blameless postmortems are a good thing, because humans are simply awful with hindsight bias.

Best thing to do is just figure out how not to do it in the future.


On the list of screw-ups I've seen, this is pretty benign. I'm sure not long after this happened, they either realized their mistake or a flaw on the system they were working with. No reprimand needed, maybe some joking criticism to ease the stress I'm sure they're feeling knowing some asshat manager might actually think this is terrible and do something stupid like fire them.

The worst side effect of this for HBO is probably the cost of confused customers unnecessarily calling customer service.


The other side effect is a non-trivial number of people probably reported email from HBO Max as SPAM to Gmail etc. Speaking of which--I got this and, while I subscribed to HBO on Comcast for a while, I've never been an HBO Max customer.


"Judge on instances of success, but patterns of failure"


Some successes are obvious luck (e.g. winning the lottery). Some failures are the obvious result of bad choices or are so egregious they don't warrant generosity (e.g. cheating on a spouse).


Surely HBO will be trending on twitter within the hour. The engineer should get a raise :)


A lot of confused users will mark the email as a spam though.

It would be interesting for the marketing team to check the related metrics before and after...


A bonus equivalent to the earned media exposure should definitely be considered.


You would think there was some kind of thing in place to prevent an email like this.


There usually is, after you've done this the first time.


Only way there wouldn’t be is if this was a high level engineer who made this mistake (I hope).



Anyone can make a mistake. I think it speaks to their professionalism that they test with straightforward content in the email instead of having "underpants gnome alert" or a bunch of swear words in there.

We just got a peek inside HBO's technological kitchen. Seems all right.


> We just got a peek inside HBO's technological kitchen. Seems all right.

My PS4 HBOmax app won't run, won't uninstall, and won't update. The only way out I can see is reinstalling the system software. It's possible it's a PS4 issue, but none of my other software has this problem. If the PS4 was the only way I could consume my subscription, I'd unsubscribe.

So, I'm not really impressed with their technological kitchen.


Just got mine. I like it. Short and sweet. No ads. More like that!


You have a good point there.


Bah, amateurs. I did this with physical mail.

I was developing an e-letter integration. Basically I had to push a specifically formatted text file to an FTP server, from where the Post office's systems would grab it, print it and mail the physical letter.

There was a specific "this is a test" bit that had to be set so that the actual letter wouldn't get sent.

So... I kinda forgot that bit at one point and received a hefty pile of letters addressed to me at the office.


IMO the most important lesson to learn from such incidents is that you should never put shitty content in "emails that never get sent", because eventually they might get sent out anyway. That way your fuck-up is just a minor annoyance — something to laugh about and a tale of how everyone also fucked up that one time — rather than an actual shit-storm.


This one is pretty tame. I feel like we have all been tempted to write ridiculous test data -- "Dear Valued Customer, we regret to inform you that the HBO Max Defraculator has been compromised and your credit card details have been sold to a self-aware purchasing bot. We are sorry and wish you the best." Sending that to your customers is a minor disaster. Sending "This template is to be used by integration tests only." will be ignored by everyone who isn't a software engineer.


I did something very similar in my current job during my first few weeks. We had an "engineering test" script that monkeypatched the definition of "all users" (so that it returned the emails of the engineering team rather than the several hundred thousand people) before calling "send to everyone". This is obviously bad because someone might come along and change the internals of the "send to everyone" method and that person happened to be fresh-faced me. I had been tasked with making "send to everyone" a little more judicious (e.g. excluding people who hadn't opened an email in 90 days or had reported emails as spam), so I replaced the call to "all users" with some other method that calculated the recipients. And then, because I didn't totally understand how the script worked (and it seemed to be listing out some internal emails as recipients), I ran a final "engineering test" in production to see how it would look with real email content. Whoops x a few hundred thousand. I did get some pretty great responses to my apology email though.


> excluding people who […] had reported emails as spam

How do you find out whether a user reports you as spam? Wouldn’t that require some sort of feedback from Google, Microsoft, etc?


You can get that feedback from most larger email providers. A lot of email tools can report on it automatically and make sure not to email again.


In our case, Sendgrid will send a certain webhook when you attempt to send to one of these email addresses.
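
For illustration, consuming that kind of feedback might look roughly like this. The field names follow my recollection of SendGrid's event webhook payload (a JSON array of events with "event" and "email" fields), so treat them as assumptions and check the current docs:

    # Minimal sketch: receive provider webhook events and build a suppression set.
    from flask import Flask, request

    app = Flask(__name__)
    suppressed: set[str] = set()  # in real life, a database table

    @app.route("/email-events", methods=["POST"])
    def email_events():
        for event in request.get_json(force=True) or []:
            if event.get("event") in ("spamreport", "unsubscribe", "bounce"):
                suppressed.add(event["email"].lower())
        return "", 204

    def should_email(address: str) -> bool:
        """Check before every send."""
        return address.lower() not in suppressed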


It would be one hell of a marketing genius if this wasn't actually an integration test email.

For the first time I actually read their damn email.


It would have been a brilliant marketing move to announce another season of Silicon Valley.


TJ Miller is completely out and Thomas Middleditch turned out to be not so nice. Kumail is doing Marvel shit now.

Silicon Valley isn't coming back.


I wish wish wish this was the case haha


Right? And apparently there's a tracking pixel. On the other hand, if it was intentional it's hard to believe there wouldn't have been a scaling issue in production.


I wouldn't read anything into the presence of tracking pixels. They plausibly would route all outbound mail through some system which instruments it with click tracking and tracking pixels, and probably also handles rate limiting, bounced delivery, unsubscribe, etc.

At least, I'd consider doing something like that if my systems sent a lot of email and had to deal with a myriad of tracking and compliance needs.

Plus if you're going to send an "accidental" email that gets everybody talking, you would absolutely want to use something a little less dry and clinical, because that'd get a lot more coverage outside of the usual tech circles.


Apparently it really was done by an "intern".

>We mistakenly sent out an empty test email to a portion of our HBO Max mailing list this evening. We apologize for the inconvenience, and as the jokes pile in, yes, it was the intern. No, really. And we’re helping them through it.

https://twitter.com/hbomaxhelp/status/1405712235108917249?s=...


I guess they're going for technically correct since 100% is indeed "a portion"


This just gives me so many more questions.

I hope that their apparently compassionate public stance translates into a healthy internal response that doesn't railroad the poor intern.


On 30th May I received an SMS from my post office at 2 AM. It said that a package with some tracking number couldn't be immediately delivered and that there would be a follow-up SMS when I could claim the package at the local post office.

I asked at the post office about it and they said: "oh, many people ask about it. We received that sms too. That is to be ignored."

I wonder why they couldn't send the follow-up SMS and save people a trip/call to the local post office.


I’m assuming folks that responded to your call are not the same folks who can send the follow up SMS. Happens all too frequently when e.g. developers are detached from production and oncallers are dying under the pager load.


Well, those employees should pass that info UP the chain and let them know that this mistake causes inconvenience and confusion, and that they should clean up the mess.

Of course that lady is not the one sending the SMS...

I wonder if they are testing new functionality - I've read that we will be able to have our own pickup station right at the house, for some subscription price or something like that. So packages will be delivered directly to my door, into a closed box (given they're delivered by the national post office).


And this is why you should be careful with "funny" test messages/placeholder names/dummy products etc. This could be a lot more trouble if it weren't obviously a technical thing, although a clear sentence along the lines of "if you are a customer and received this email, sorry, please ignore it" would maybe be a slight improvement.


Their tweet solely blames an intern but doesn't try to explain why an intern was running tests on production data in the first place. You kind of expect big players to have proper data protection in place, but then you hear stories about interns having access to production data with passwords in plaintext.


I've worked for a few big players. Expect that none of your data is protected unless the government will fine & punish them/individuals if they don't. Though even then, the company just forces employees to take some token training first, so later when they're compromised they can just blame some poor random employee that had no authority to change things.

Did you know only a handful of US states have laws about how you can use or handle social security numbers? If the business isn't in one of those states, expect the intern to have a big list of SSNs on their laptop. Even when there is a regulation, sometimes it's intentionally violated on a regular basis, either as part of a cost/benefit analysis, or some loophole that means nobody will personally be held responsible.


This happened to me once at an old job. I discovered a race condition between the process that sanitized data out of the staging environment as it was (continuously) refreshed in from a production fork, and the process that sent notification e-mails during mass events.

Our customers were all rather large corporate types, so a great many "Was this sent to my stores?" type messages from VPs and C-staff ensued.

I luckily wasn't involved in the communications fallout from this, but it did initiate a sweeping change inside of engineering to go proactively find and correct for these sorts of issues. We had the same pattern in use throughout our platform, and really all that stood in the way of this race condition being triggered had been QA not testing at the exact same time a critical part of that refresh process was running.


This is a perfect mistake. Embarrassing but mostly harmless. Probably not too expensive (depends on ESP cost). Great teachable moment. I wish I could get my engineers to make more mistakes like this. :)


I'm not even a subscriber and I got one.


you should file a personal data request if you live in CA or EU!

"""

If you are a California resident, California Civil Code Section 1798.83 permits you to request information about our practices related to the disclosure of your personal information by certain members of the WarnerMedia family of companies to certain third parties for their direct marketing purposes. You may be able to opt out of our sharing of your personal information with unaffiliated third parties for the third parties’ direct marketing purposes in certain circumstances. Please send your request (along with your full name, email address, postal address, and the subject line labeled “Your California Privacy Rights”) by email at WMPrivacy@warnermediagroup.com.

"""

https://www.hbomax.com/privacy


Same. I used to be a HBO GO Subscriber (not anymore). I guess they don't delete their records. :|


I just removed it from the submission title. (despite it raising more questions than answers)


Me too. I reported it as spam.


Me too.


patio11 shared on Twitter how he pushed faulty code to prod *just before changing apartments* so that it started emailing, texting and calling hundreds of people every 5 min for 13 hours:

https://twitter.com/patio11/status/1405704339969220615

https://twitter.com/patio11/status/1405706868110987269

https://twitter.com/patio11/status/1405707285268103169


This is a great reminder to make sure your tests are phrased in such a way that if a layperson comes across it, they can ignore it, sort of in the same way we came across "This is a test of the Emergency Broadcast System."[1] And not "Big Brother Is Watching."[2]

[1]: https://en.wikipedia.org/wiki/Emergency_Broadcast_System

[2]: https://www.google.com/search?q=facebook+oculus+big+brother+...


I once accidentally ran a load test against our password API in production and reset the passwords of about 100k users to random characters. Hang in there, bud. Happens to us all.


That's a good test, good coverage from end to end. If this test succeeds, then they know it will work in production too.


Last August, the Polish internet bank mBank had two "issues": out of nowhere they sent 3 notifications to mobile devices, one even literally saying "push notification test" and another containing some Polish diacritics [1]. The site went offline soon after these were delivered. Then a few days later, the database was damaged in a way that let some clients see the account balance and personal data of other people - that was fixed relatively fast but permanently disabled the feature of accessing the recipients list in shared accounts. No idea if that was ever fixed - my family moved their accounts elsewhere.

[1] - https://niebezpiecznik.pl/wp-content/uploads/2020/08/mbank1-...


Someone I know who doesn't have and never has had an HBO account received one.


Yup, same here. Not sure how I was ever on their list :/


My colleague did it once. He accidentally used an incorrect API key and sent a garbage push notification like "ohai" (don't recall exactly). We were a pizza delivery startup.

A few hours later, we sent another notification as an apology.

"Oh Hi. Use this coupon code for 20% discount: OHAI20"

Good ol' days


A few decades ago, I was rewriting the standard sendmail.cf that we used in the company where I worked as a sysad. I think we had about one hundred or so Sun 3/60s and 3/80s that were using it.

Well, in one of my test emails, I triggered an email loop. Suddenly, all the clients were furiously mailing the mail host and queuing up additional copies as fast as they could.

To make matters worse, my test mail was unprofessional in terms of the language I used.

Fortunately, it was a Sunday afternoon when it happened, and the mail stayed local. But it was scary and a real pain in the neck to halt and clean up.

In a future post, I’ll share how a fellow sysad and I shorted out the RG-8 coax backbone while trying to add a vampire tap. That was another kind of all night deal.

:-|


I laughed when I got the email, knowing I’ve done the same damn thing. Seeing it here I’m more worried for the poor soul


As is usually the case with such things, this wasn't really the intern's fault. The question I'd be asking is: why wasn't there proper access control in place to stop this? The system actually sending the emails should not have allowed "testuser" to send to a production list. So, second question: when this was proposed during some planning meeting (and it almost certainly was), why was it shot down? That's the process problem right there.


Years ago I worked at a company that sent out a test email to all our users with some light profanity in it. We got a little (unexpectedly) good press about it and it ended up being no harm, no foul.

Hopefully everyone at HBO is as nice about today's mistake!

https://thenextweb.com/news/how-fetchnotes-discovered-that-c...


A long time ago I worked at a company that prided itself on unrestricted source code access to promote cross-group cooperation. The check-in procedure relied on an understanding of the code review and ownership policy but wasn't enforced programmatically.

One day I came back from lunch to see that someone had marked an entire Perforce depot for delete. I've never scrambled so fast to try to identify this unknown user. I figured it had to be an intern and ran across the floor looking for him, only to find a poor confused intern surrounded by out of breath developers imploring him to not hit submit. Every 30s someone else ran up to try to get him to not delete the entire depot.

It was the intern's first week on the job and he thought he was just deleting some local files. The event definitely inspired some access lock-downs and automated check-in procedures going forward.

---

I laughed when I saw the email but was also a little pissed because I am a former, not current customer. I hope HBO Max doesn't send out an apology email. That's the equivalent of unsubscribing from an email list only to get an email confirming the unsubscription that I also have to delete.


Oof. I think there are two types of engineers in the world. Those that have done something like this, and those that will someday do something like this.


Since we are all sharing our bad stories:

I work on an application which, among other things, sends out push notifications to apps via Firebase on a schedule. I was implementing a limiter for these notifications (so you can only send ~4 in a 24h window) and tested it on a local instance by generating a lot of messages. As I wanted to create a few quickly, I didn't change the date from the default, now.

The actual limit was configurable, so I went and changed the config and realized I used the wrong file all along. So I switched the file, without realizing that the other file also had the actual live credentials configured (it was used for testing while developing this feature not long ago). I only realized the mistake when I had a lot of log messages along the lines of "sending push notification" at startup.

Luckily for me, we had thought of rotating the secret, so no notification got through. But I had a fun five minutes searching for my phone in a panic to find out how bad it really was.

Hope the poor guy working at HBO reads these stories later to brighten his day a bit :)


At least you know they do integration testing now. :)


At my company I had a co-worker named Swaroop. Who decided to use swaroop@ourcompany.com for some test email address in some e2e tests. Fast forward a day, we get an email from the CEO containing a forward from a very concerned Board Member (also named swaroop and the owner of swaroop@ourcompany.com email address) asking if we had been hacked.

Obviously not as embarrassing but still fun.


what are the odds?


We once sent 1 single user 100k emails. We sent him flowers to apologize :P


Automation is fun because when you make one mistake you actually make it millions of times inside of a few seconds :) been there!


Meh, it happens. Sending out test emails is about the best-case on the spectrum. I've been involved in cleaning up incidents where someone accidentally sent out discounts worth 7 figures that were meant to be for a test but it was misconfigured. That was not a good day for a bunch of people.


Since we're sharing email sending mistakes: We wanted to start sending users occasional summary emails about activity on the platform that they might be interested in. We wrote the email generation logic first, and then the plan was to start adding different activity events and dogfooding the emails internally to see how they looked in practice. This would be fine because, while we were developing, only internal users would have activity data, so only we'd get emailed. But there was a bug in the "is this email empty" logic, so we once sent every user an email with a subject along the lines of "Check out your recent activity on X", a few headings, and absolutely no activity content. Oops.
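
Roughly, the kind of emptiness check that was missing looks like this (illustrative names only, not the commenter's actual code): build the digest from activity sections and bail out before sending if every section is empty.

    def build_digest(sections: dict[str, list[str]]) -> str | None:
        """Return the rendered email body, or None if there is nothing to say."""
        non_empty = {title: items for title, items in sections.items() if items}
        if not non_empty:  # the check that, when buggy, sends a headings-only email
            return None
        parts = []
        for title, items in non_empty.items():
            parts.append(f"## {title}")
            parts.extend(f"- {item}" for item in items)
        return "\n".join(parts)

    if __name__ == "__main__":
        assert build_digest({"New comments": [], "Mentions": []}) is None  # don't send
        print(build_digest({"New comments": ["Alice replied to your post"]}))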


I received it, and immediately felt for the person who accidentally sent it. It happens.


They should somehow tie this into one of their shows and use it as a marketing stunt.



The email is innocuous. I opened the headers to see what was going on and why it got past spam blockers. My next thought was how much more valuable this mistake is for getting attention than most of their advertising budget.


Reality is this will lose them paying subscribers, so without a doubt this is an expensive mistake regardless of who’s to blame. Large subscription businesses are very careful about when to email the segment of their subscriber base that is paying but inactive. This email just reminded a bunch of people they are paying for something but not using it, and therefore should cancel (granted, it’s unclear who it went out to).

It could cause some ESP/deliverability problems too.

On the bright side, this can serve as a reminder to us all to make sure this doesn’t happen at our companies.


Everyone I've asked that is paying has said they got it but paid it no mind, since it had no information that seemed relevant to them, or they were a developer who thought it was funny.

I'm actually hard pressed to even imagine the person that is cancelling their HBO max account because this single email was just too much for them.


I don't imagine many people will cancel HBO Max because this email was just too much, but rather because this email reminded them that they have HBO Max, and might spur them to think "I never use that so I should cancel it"


I wonder if someone loaded a copy of the prod database into dev/CI.


When I was working at Kiwi.com as a senior product designer one of the guys on the development team did something very similar.

We were testing a new push notification refactoring and managed to send a push notification saying "Test test" to our whole Android user base (a couple hundred thousand users). Interestingly enough, it was a great reactivation campaign. We bumped bookings by a noticeable margin, and since we had redesigned our "rate us" flow a couple of weeks before, we had a huge influx of 5* ratings as well.


At a previous start-up, we made a similar mistake.

This was one of the worst days of my "engineer" life. I was very young and inexperienced at that time.

We were running a Ruby on Rails app with push notifications. We had an "on creating" life-cycle hook on a model recording an event, so that every time a new model was saved, a job sending a push notification (Android or iPhone) to the appropriate user would then be triggered. This would notify the users that the "event" was done.

The implementation was wrong. Instead of an "on creating" hook, the push notification logic was added in an "on saving" hook.

As the model was supposed to only ever be saved once (on creation), this worked for a while, and no-one ever realized. You now guess what's going to happen.

Then after some time came the day when we needed to do some data migration. Easy, right? The data migration included updating some data in the above model (all rows had to be updated). We ran a Ruby script to update the model.

Everything was working in the staging environment, so we released to production. The moment we released, we sent hundreds of push notifications to every one of our clients (including to our own phones, as we were using the service ourselves). Basically everyone received as many push notifications as the number of "events" they had completed up to that point.

Immediately, all the phones in the office (CEO's phone, sales rep, etc.) started to ring with complaints from customers asking what was going on. The CEO was as angry as you can imagine. But no time to be angry, as everyone needed to apologize and explain the situation to all our customers.

Basically the whole team was at fault for letting the problem slip through the code review. From then on, we improved code review process, and decided to include push-notifications in staging environment too.


This right here shows the danger of relying on ActiveRecord callbacks, instead of explicitly calling the notification/etc code from the caller which did the creation – potentially very impactful things can happen in a non-obvious way
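
The same point, language-agnostic and sketched in Python rather than Rails (names are illustrative): trigger the notification explicitly at the one place records are created, not from a save hook, so a bulk migration that re-saves rows can't fan out pushes.

    class Event:
        def __init__(self, user_id: int, payload: dict):
            self.user_id = user_id
            self.payload = payload

        def save(self) -> None:
            ...  # persist only; no side effects hidden here

    def create_event(user_id: int, payload: dict, notifier) -> Event:
        event = Event(user_id, payload)
        event.save()
        notifier.push(user_id, payload)  # explicit, visible at the creation call site
        return event

    def migrate_all_events(events: list[Event]) -> None:
        for event in events:
            event.payload["schema_version"] = 2
            event.save()  # re-saving sends nothing, by construction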


Aside: I'm a big fan of HBO's content but their app is really poor quality. Playback frequently is unable to start or halts midstream. I've noticed this on distinct devices (android, AppleTV). DRM run amok? I don't know but you would think making playback work consistently would be near the top of their list.

It's gotten slightly better over the last year or so. But it still happens way more often than it should.


Interesting that it appears to have gone out to former subscribers too. They really shouldn't be retaining those email addresses, should they?


Are you protected by GDPR? If so, yes; if not, it's just business as usual: gather as much data as possible and keep it just in case you need it in a hypothetical future for a hypothetical use case.


I was at a company that did something like this. However, the fallout was the sending IPs and domains got put on so many email blacklists, spent months on improving emails to no avail. Our sending success rate to Gmail never recovered even after talking to an engineer who said we were on like 4 blacklists. I hope HBO's servers or ESP fare better.


For about 24 hours last summer, Citizen app on Android would play the audio from a "Who's that Pokemon? Rick Roll" video when there was an alert. They quietly fixed it. I can only imagine that it went to prod because it was a goofy test alert sound that accidentally got promoted out of a QA environment?


Worst mistake I made like this: B2B service, we billed customers, and then added white label service where we wouldn't bill our customer's end customers (they shouldn't even know we are the underlying provider). Guess who forgot to update that code and ended up sending billing emails to everyone.


Stuff like this can unfortunately happen. This is a reminder not to use obscenities or jokes in your test data.

Remember Mr. Rich Bastard: https://www.snopes.com/fact-check/dear-rich-bastard/


Not much to add to this, except that I also got this email sent to an address which I have only given out to Hulu.


Intern or not, everybody makes mistakes.

Are we talking about heart surgery here? No, we're just talking about an insignificant test email. Nobody was hurt, and the reactions were actually fun to read.

So, whoever unintentionally did that: this is okay. And if your manager says it's not, this is not okay.


This is why you always use the special use[0] domain names like example.com for your tests or documentation.

[0] https://en.wikipedia.org/wiki/Special-use_domain_name


Heh, I saw that earlier and thought that someone had pointed to the wrong DB in QA or dev.


It's not going to help if this ran on production. But I can't recommend mailtrap.io enough for catching all emails sent from staging/development. Nice for testing as well since you have an inbox you can view all emails from.
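
A minimal sketch of the setup, assuming generic placeholder host/port/credentials (copy the real ones from your Mailtrap, Mailhog, or similar inbox settings): point staging's SMTP config at the trap so every outbound message lands in a shared test inbox instead of real mailboxes.

    import smtplib
    from email.message import EmailMessage

    SMTP_HOST = "sandbox.smtp.example"  # placeholder for the trap's SMTP host
    SMTP_PORT = 2525                    # placeholder port
    SMTP_USER = "your-inbox-user"
    SMTP_PASS = "your-inbox-pass"

    def send_via_trap(to_addr: str, subject: str, body: str) -> None:
        msg = EmailMessage()
        msg["From"] = "noreply@staging.example.com"
        msg["To"] = to_addr
        msg["Subject"] = subject
        msg.set_content(body)
        with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as smtp:
            smtp.login(SMTP_USER, SMTP_PASS)
            smtp.send_message(msg)  # caught by the trap, never delivered for real

    if __name__ == "__main__":
        send_via_trap("anyone@example.com", "Integration Test Email #1",
                      "This template is used by integration tests only.")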


Is there any recommendation for self hosted version of mailtrap.io?


Mailhog is nice too.



I was wanting to start a pool to see how high the Test Email # got. Test #1 seems successful. I wonder what Test Email #2 will be. I know a lot can be learned in a single test, but surely there's more to test yet.



These things happen. It's not your fault but a consequence of things that can improve. You'll recover.

Misconfiguration is also cyber security threat #1 these days.


> "This template is used by integration test only."

No, it is not only for integration tests. It is also used in production.

They need to update that description. :)


My config had an additional property: OverrideTo.

This is for the test environment: emails never go to the end user, but are redirected to that value instead.
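
A rough sketch of that OverrideTo idea (names here are illustrative, not any specific framework's settings): in non-production environments every outgoing message is rerouted to a single configured address.

    import os

    def resolve_recipient(intended: str) -> str:
        """Return the address mail should actually go to."""
        override = os.environ.get("EMAIL_OVERRIDE_TO")  # set only in test/staging
        if override:
            return override  # real users never see test traffic
        return intended

    # usage: send_email(resolve_recipient(user.email), subject, body)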


If this was a marketing ploy, it is genius. The amount of play this email is getting is ridiculous. Brilliant move if it was planned.


I have some amount of suspicion that it is... "Integration Test Email #1" sounds a bit TOO obvious or sanitized.


I work at WarnerMedia, who own HBO Max, and I thought this was a test using an employee account pool. Never mind, then!


Came in at 8:43 PM EST, Integration Test Email #1 with the body, “This template is used by integration tests only”


I once wrote an endless loop triggering thousands of Twilio SMS messages. Took a few minutes to kill the server


Got 1st one, 3 hours ago

And I just got a 2nd one (different email) just 5 minutes ago.

3 hours of email sending, and they still haven't noticed.


I bet what happened was that their integration environment has unredacted production emails.


This is me every time I compose an email on Mailchimp. I am always petrified that I'll send to the entire mailing list rather than send out a test email.

I wish these mass-emailer companies would ask you to solve a captcha, puzzle, or a simple math question before you send out a mass mailer.

The heart attacks ... ugh


I can’t wait to watch Integration Test Email #2 at the same time next week.


Well, this is a true integration test, when you finally reach real users :)


They just need a count of the accounts that actually pay for it.


Dunno why but I’ve gotten 7 integration test emails so far


I got two of them, and I was like "Test works."



I got this email, shrugged, deleted it, moved on.

No harm no foul.


typos and automation are a hell of a drug.


No-blame post-mortem of this will be fun!


Yap, been there :/.


This reminds me of when I took over a project from a developer who left the company.

They had just added SMS notifications to production but to my horror left some test code that would text you "Hey motherf**r" every time you would change your phone number.

To this day I still wonder how many people got that text.


I got this email!


Just got mine :)


Pretty sure this will happen to every web dev at some point in their career. It's why I strictly enforce professional test messages and test users; nothing worse than doing a product demo with users named Donald Trump and Osama Bin Laden.

Hope the guy didn't get fired, things like this happen because of multiple failures in dev ops, not just 1 guy pressing the wrong button.



