CrowdStrike's impact on aviation (heavymeta.org)
414 points by jjwiseman 41 days ago | 318 comments



>> Why were other airlines able to get back to normal so much faster than Delta?

I read somewhere that their crew tracking software was hit hard and took time to recover. Will look for source on that.

(Edited) source: https://news.delta.com/update-delta-customers-ceo-ed-bastian

“… and in particular one of our crew tracking-related tools was affected and unable to effectively process the unprecedented number of changes triggered by the system shutdown…”


One other compounding problem is that Delta's headquarters and main traffic patterns are on the East Coast. CrowdStrike affected all the airlines at roughly the same time, which gave Delta roughly one to two fewer hours to respond before it hit its morning peak flights.

As someone else pointed out, they probably weren't ready by the time they needed their systems for the morning rush, so they went to their (manual) business continuity strategy. This has a throughput and recovery-time penalty, and obviously it compounds the longer they are in that mode.

I think what we're finding with the Southwest meltdown and now the Delta meltdown is that the big airlines just don't have the manpower or scheduling slack to accommodate going into business continuity. I do think this should be investigated. Hopefully financial penalties incentivize action but time will tell.


They prioritized stock buybacks instead of investing in a robust IT operation.


As well they should!

Which one profits the CEO more, stock buybacks or robust IT? Robust IT only pays off for the company in the long term. With stock buybacks or other skimping on IT, if a disaster like this happens, the CEO just takes his golden parachute and leaves; if no disaster happens, he gets a huge bonus to buy another private yacht.


> big airlines just don't have the manpower or scheduling slack to accommodate going into business continuity

Do small airlines have it? And how much more are you willing to pay in ticket prices to have this ability?


> Hopefully financial penalties incentivize action

Delta already took a huge financial hit for this.


Re Delta

It's not so much that Delta's outage was more severe; it's that with the hub-and-spoke model Delta uses, scheduling being down at all on Friday, combined with FAA duty-hour limits, makes it exponentially difficult to reschedule flights.

Put more plainly: on Friday, your scheduling software is down for 4 hours in the morning, so you "borrow" replacements for any employees who are late or sick. This ruins availability for the next flights, by which time you hope the system is back up; if it's not, you borrow from the evening flights. On top of that, every flight that was delayed or cancelled while you were hoping to fill it now eats into the hours available to the employees who were available. Finally, having cascaded all this, you head into a weekend trying to catalog how many hours each crew member did or did not log, and you're not sure how to get them back in time.
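A toy model makes the cascade concrete. This is a minimal sketch under made-up assumptions: a hypothetical 10-hour daily duty limit (not the real FAA rules), interchangeable crews who are always in the right place, and a morning outage that keeps the first flights waiting on the ground while the duty clock runs. The names `simulate`, `FLIGHTS`, and `DUTY_LIMIT` are illustrative only.

```python
# Toy model of the crew-borrowing cascade. All numbers are hypothetical;
# real FAA duty rules are far more detailed.
DUTY_LIMIT = 10.0  # hours a crew may be on duty in one day (assumed)

# (flight_id, scheduled_crew, block_hours, departs_in_the_morning)
FLIGHTS = [
    ("F1", "A", 4.0, True),
    ("F2", "B", 4.0, True),
    ("F3", "A", 4.0, False),
    ("F4", "B", 4.0, False),
    ("F5", "C", 4.0, False),  # crew C is the evening crew
    ("F6", "C", 4.0, False),
]


def simulate(flights, morning_delay):
    """Fly the schedule in departure order.

    Morning flights wait `morning_delay` hours for the scheduling system,
    and that waiting still counts as duty time. If the scheduled crew would
    go over DUTY_LIMIT, borrow the first other crew that is still legal;
    if nobody is, cancel the flight. Returns (flown, cancelled).
    """
    duty = {crew: 0.0 for _, crew, _, _ in flights}
    flown, cancelled = [], []
    for flight_id, crew, hours, is_morning in flights:
        needed = hours + (morning_delay if is_morning else 0.0)
        candidates = [crew] + [c for c in duty if c != crew]
        chosen = next((c for c in candidates if duty[c] + needed <= DUTY_LIMIT), None)
        if chosen is None:
            cancelled.append(flight_id)
        else:
            duty[chosen] += needed
            flown.append((flight_id, chosen))
    return flown, cancelled


if __name__ == "__main__":
    print(simulate(FLIGHTS, morning_delay=0.0))  # everything flies
    print(simulate(FLIGHTS, morning_delay=4.0))  # morning delay, evening cancellations
```

With a 4-hour morning delay, crews A and B burn 8 hours on their first legs, F3 and F4 borrow the evening crew, and F5 and F6, which were never delayed themselves, are the ones that get cancelled.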


Except for Southwest, the other legacy airlines (United, American) also use a hub-and-spoke model. So does JetBlue.


Funny you should mention WN. Delta's meltdown is the exact same scenario as Southwest's: crew scheduling is messed up, they have no way of tracking where employees are, whether each employee is still legal to fly, etc., and so the operation grinds to a halt.


To clarify, I mean Southwest's meltdown last year, which was all about the difficulties of crew scheduling and its knock-on effects.


Wouldn't this imply either an upper bound on downtime (the airline simply folds as it never catches up) or an upper bound on the duration of the impact?


Worst case, with good weather, you can stop service for a few days: day 1 mandatory rest; day 2 fly crews to where they need to be to start service; day 3 mandatory rest; day 4 return to service. Then start rebooking passengers and picking up the pieces. Carriers with long haul international may need longer, and maybe you need more rest days to ensure everyone is ready for their normal shift, but that's a reasonable napkin estimate.

On the other hand, Delta seemed to have recovered after about a week, and canceled about 1,000 out of about 4,000 flights for several days. It's way better to fly 75% of the daily flights than not. There's less wiggle room in a summer schedule for weather, but there's still some wiggle room.


They did a "reset." Cancel enough flights to reduce load, then manually recalibrate the crew tracking software to figure out where everyone is and their hours. Then start operations again.


It's like stopping your in-place manual software recovery efforts and restoring from backup. You KNOW it's going to take a massive amount of time, but at least you know how much time it's expected to take, and what the expected result is, rather than "2 more hours... 2 more hours.... 2 more hours.." for a week.


There probably is an upper bound on down time, by which point the business has suffered some irreparable harm. It might not result in the business simply folding, but might result in significant expense or legal complications, long-term reputational damage, etc. In business continuity speak, that's the "maximum tolerable downtime," and while I don't know how Delta defines it for the impacted systems... I imagine they're not happy with how long they were down.


>> Why were other airlines able to get back to normal so much faster than Delta?

> I read somewhere that their crew tracking software was hit hard and took time to recover. Will look for source on that.

I heard on the radio (maybe NPR, not sure) it wasn't about the computers, it was about Delta's response.

According to the report, the other airlines delayed flights, while Delta cancelled them outright. That left Delta with more people and planes in the wrong places, making it harder to recover.


Because they used Windows 3.1


I chased through this chain the other day...

https://www.tomshardware.com/software/windows/windows-31-sav...

https://www.forbes.com/sites/tedreed/2024/07/20/meltdown-wha...

> A story on the website govtech.com on Friday asked the question, “Why isn’t Southwest affected by the CrowdStrike/Microsoft outage?”

> “That’s because major portions of the airline’s computer systems are still using Windows 3.1, a 32-year-old version of Microsoft’s computer operating software,” the website said. “It’s so old that the CrowdStrike issue doesn’t affect it so Southwest is still operating as normal. It’s typically not a good idea to wait so long to update, but in this one instance Southwest has done itself a favor.”

The govtech.com article is https://www.govtech.com/question-of-the-day/why-isnt-southwe...

which linked to https://www.digitaltrends.com/computing/southwest-cloudstrik...

which linked to an earlier Forbes article - https://www.forbes.com/sites/hershshefrin/2022/12/31/can-sou...

> The December 2022 scheduling fiasco was the result of skimping on information technology. I am old enough to remember when Microsoft introduced a new operating system called Windows 95, to replace its predecessor operating system Windows 3.1. The 95 in Windows 95 refers to the year of its introduction: 1995. By some accounts, major portions of Southwest’s scheduling system for pilots and flight attendants is built on the Windows 95 platform. That platform is now more than 25 years old.


Southwest does not run Windows 3.1:

“That’s it. That’s where all these stories can trace their origin to. These few paragraphs do not say that Southwest is still using ancient Windows versions; it just states that the systems they developed internally, SkySolver and Crew Web Access, look ‘historic like they were designed on Windows 95’.”

https://www.osnews.com/story/140301/no-southwest-airlines-is...


The other day, I saw a screen capture from Tom's Hardware and so chased the series of links and quotes to try to find the earliest one that had reporting on it that was the source. That was the chain that I found.

I am not claiming that they run Windows 3.1 or Windows 95 ... but rather "this is where that story was sourced from" because everyone kept linking to somewhere else. The relevant XKCD is https://xkcd.com/978/


Funny enough, this cycle is close to what the Russian disinformation machine does deliberately to spread bullshit.


Is that actually true, or just something that's repeated until people believe it?



Yes, is there some evidence beyond the claims of "intelligence officials"?



I see what you did there.


Russian approaches are well known and documented. None of this is new, and wasn't even really that new in 2016, it's just become better known.

Essentially modern versions of Soviet-style disinformation campaigns, but augmented with new technology (social media), and without the ideological hindrances of a Communist government (e.g. sell hard to both Right and Left).

RAND Corp calls it "the Russian Firehose" model: https://www.rand.org/pubs/perspectives/PE198.html

Similar approaches are also used by NK, Indian, Chinese, and other national-tier disinfo campaigns. This contrasts with models used by the West, which are often less about creating a disinformation clusterfuck, and more of a "watch our Disney / BBC / Scandinavian TV & movies and their implied messages about freedom and human rights and shit".


Not too sure they are. The "experts" insisted that Russia colluded with Trump to "hack the election", that they somehow faked or planted or were responsible for various laptop and email leaks, etc., all such things which have since been found to be false or at best no real evidence has ever been produced in support of.

Obviously Russian, Chinese, and all other governments engage in information campaigns, and obviously the US government knows a lot about what they are. But we, the public, do not necessarily have the same information. It's not really possible to distinguish the "well known and documented (by the military and espionage industrial complex)" operations of foreign countries from domestic propaganda developed by those corporations and agencies to influence their own citizens.


In the article it says Southwest used 3.1, not Delta (though, that's apparently incorrect according to other posters).


And Southwest had two crew-management outages in 2022[0], so let's not sing their praises for escaping the CrowdStrike disruption. Southwest has been widely criticized for under-investment in technology; Delta, on the other hand, purchased one of the best security products on the market and that backfired.

[0] https://en.wikipedia.org/wiki/2022_Southwest_Airlines_schedu...


Delta put all their eggs in one basket and had no DR capability


What basis do you have for saying that? It is likely their DR was running on a mirror of their production systems, and was similarly impacted by the Crowdstrike outage. So they fell back to Windows Servers similarly stuck in a boot-loop.

Keep in mind there was no way to opt out or delay CS Channel updates.


If your DR system is susceptible to the same faults as your main system it’s not a DR system.

It would be like claiming RAID 1 is a backup.
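A minimal way to make the "same faults" point concrete: list what each environment depends on and intersect the two sets. Anything in the intersection is a single fault that takes down production and DR together. The component names here are hypothetical examples, not a description of Delta's actual systems.

```python
# Components each environment depends on (hypothetical examples).
PRIMARY = {"windows", "crowdstrike-sensor", "crew-app", "datacenter-east"}
DR_SITE = {"windows", "crowdstrike-sensor", "crew-app", "datacenter-west"}


def shared_failure_points(primary: set, dr: set) -> set:
    """Single components whose failure would take out both environments."""
    return primary & dr


if __name__ == "__main__":
    for component in sorted(shared_failure_points(PRIMARY, DR_SITE)):
        print("common-mode risk:", component)
```

Some overlap is almost always accepted in practice; the value of the exercise is knowing exactly which common-mode risks you are carrying.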


Or it would be like claiming my backup isn’t a backup because both systems run openssh, so a remote code execution vuln there could take down both systems.

Any DR system will have to accept some risks, and those don’t necessarily invalidate it in general, just make it insufficient for some scenarios.

Conversely, if they ran the main system on Windows with CrowdStrike and the DR one on poorly configured Linux with no security software, they probably would have needed more sysadmins, had more trouble maintaining software for both, and been vulnerable to risk from both Linux and Windows bugs, so I feel like they made the right tradeoff in general.

I’m sure you, who can deride this DR system, have devised your own system such that it is resilient to a meteor destroying the earth.


> I’m sure you, who can deride this DR system, have devised your own system such that it is resilient to a meteor destroying the earth.

That reminds me of one of Corey Quinn's comfortable AWS truths.

https://x.com/QuinnyPig/status/1173371749808783360

> If your DR plan assumes us-east-1 dies unrecoverably, what you're really planning for is 100 square miles of Northern Virginia no longer existing. Good luck with that ad farm in a nuclear wasteland, buddy!


As HN itself discovered a couple of years ago when a set of same-manufacturer, same-batch disks within both RAID arrays and backup server failed within a few hours of one another:

<https://news.ycombinator.com/item?id=32048148>

<https://news.ycombinator.com/item?id=32031243>


One idea: build a DR system and turn it off. Ideally it would be cloneable, but even without that ability, one could test it every few months to make sure it boots adequately quickly and then turn it back off. The attack surface of a bunch of computers or instances that are powered down is pretty low.


Better yet, alternate between them every month or two.


> Keep in mind there was no way to opt out or delay CS Channel updates.

Do CS updates somehow work over airgaps? You know, the kind that production systems have to prevent any access to or from external networks? Well... some production systems anyway.


What's your point? An air gapped disaster recovery system would be useless. An airline operations application has to connect to a bunch of other external systems to be of any use.


>Delta on the other hand purchased one of the best security products on the market and that backfired.

It looks like it wasn't a good security product after all...


I would like to know if a solid, up-to-date, well-rehearsed disaster recovery plan saved anyone's butt, or if we're all just raw dogging our machines whether or not IT is paying for backup and recovery?


Our systems worked fine. We expect things to fail, including software like SentinelOne, CrowdStrike, etc., and have DR systems which can keep us limping along. We have DR systems which will work should other things happen, say if the Thames Barrier fails (i.e. no Docklands).

Unfortunately some of our outsourced suppliers didn't have such attitudes.


Sure has. HSRP and VRRP plus other SD-WAN features definitely made a difference when one of our sites had its fiber pulled by accident. A data center tech screwed up bigly and took down us plus at least one other of their customers.

We definitely saw a blip, and stuff had issues for 10 minutes (e.g. pages timed out or a process had to be restarted), but generally sites failed over and were able to keep limping on while we did triage.

We got something like a $19 ($21?) service credit and an apology from the data center. Our CEO shouted a lot and threatened lawsuits, but it never went anywhere. The Director of IT Infra quietly thanked all of us for having failover that mostly worked.
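For readers who haven't met them: HSRP (Cisco-proprietary) and VRRP let several routers share one virtual gateway address, and a backup takes over when it stops hearing the current master's advertisements. Below is a simplified sketch of the VRRP election, assuming the default 1-second advertisement interval and ignoring details such as skew time and preemption settings; the router names, addresses, and timestamps are made up.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List, Optional

ADVERT_INTERVAL = 1.0              # seconds; VRRP's default advertisement interval
MASTER_DOWN = 3 * ADVERT_INTERVAL  # simplified: real VRRP also adds a priority-based skew time


@dataclass
class Router:
    name: str
    ip: IPv4Address     # the router's own (primary) address
    priority: int       # 1-254 for backups; 255 is reserved for the address owner
    last_advert: float  # time of the last advertisement heard from this router


def elect_master(routers: List[Router], now: float) -> Optional[Router]:
    """Pick the router that should own the shared virtual gateway IP at `now`."""
    # A router counts as alive only if it advertised within the master-down window.
    alive = [r for r in routers if now - r.last_advert < MASTER_DOWN]
    if not alive:
        return None
    # Highest priority wins; ties go to the higher primary IP address.
    return max(alive, key=lambda r: (r.priority, int(r.ip)))


if __name__ == "__main__":
    site_a = Router("site-a", IPv4Address("10.0.0.2"), priority=200, last_advert=10.0)
    site_b = Router("site-b", IPv4Address("10.0.0.3"), priority=100, last_advert=10.2)
    print(elect_master([site_a, site_b], now=10.5).name)  # site-a
    # Fiber cut: site-a goes quiet while site-b keeps advertising.
    site_b.last_advert = 13.2
    print(elect_master([site_a, site_b], now=13.5).name)  # site-b
```

The short gap between the master going quiet and the backup taking over is roughly the "blip" described above.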


They certainly have DR infrastructure primed and ready to go… with Crowdstrike pre-installed on every DR server.


I've never seen it.

Obviously some selection bias there, but I'd love to hear some success stories.


I see just moments after I posted, someone posted this: https://news.ycombinator.com/item?id=41103486

So, yeah, lack of DR is why Delta was so screwed.


One thing I don't understand from these graphs - why was there a relative uptick in takeoffs starting a short time /before/ the CrowdStrike update was pushed? It's in the overall graph, as well as the graphs for United, American, and especially Delta. I can't think of any reason for this, maybe it's just random noise, or maybe there was something unusual about the previous week at the same time?


One reason I put charts for both absolute numbers of flights and percentage change is to help understand the larger context. Those relative upticks happened just before CrowdStrike hit, so around midnight Eastern time, when traffic was already very low. So a 25% increase might be 12 extra flights taking off in the U.S. in an hour. There's plenty of noise, for sure, and looking at absolute numbers and percent change together can help give you a sense of what was going on. Looking at two days worth of data is probably enough to give you the main themes of the CrowdStrike impact, but not enough to explain every variation.
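The point about reading absolute counts alongside percent change is easy to check with a little arithmetic. The 48-flight baseline below is inferred from the comment's example (a 25% rise equal to 12 flights); the midday figure of 2,400 takeoffs per hour is a made-up round number used only for contrast.

```python
def pct_change(baseline: float, observed: float) -> float:
    """Percent change from baseline to observed."""
    return 100.0 * (observed - baseline) / baseline


# The same 12 extra takeoffs in an hour, on two different (hypothetical) baselines.
overnight = pct_change(48, 60)
midday = pct_change(2400, 2412)

if __name__ == "__main__":
    print(f"overnight: {overnight:+.1f}%")  # overnight: +25.0%
    print(f"midday:    {midday:+.1f}%")     # midday:    +0.5%
```

The identical absolute change looks dramatic overnight and invisible at midday, which is why the two kinds of chart are worth reading together.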


It was widely reported to be the busiest travel day for quite a long time, which compounded problems.


Yeah, shouldn't have been too hard to add a couple more weeks so you at least get an idea about variance.


It will be most interesting to see how this lawsuit by Delta against Microsoft & CrowdStrike plays out:

https://www.marketwatch.com/story/delta-hires-law-firm-seeki...


From the included link: https://www.techradar.com/pro/security/southwest-airlines-av...

> To give you an idea of just how outdated this operating system is, Windows 3.1 was originally launched in 1992, and Microsoft ended support for it on December 31, 2001, except for the embedded version, which was officially retired in 2008.

I keep hearing the Windows 3.1 story repeated. I mean here it comes from TechRadar and even has the "Pro" in the name, they can't possibly make stuff up, right? But still don't quite believe it.

Can anyone working at Southwest confirm that their main scheduling system is running on Windows 3.1?


> keep hearing the Windows 3.1 story repeated

It’s wrong [1] and serves as a litmus test for whether an outlet independently verifies its claims.

(“The systems [Southwest] developed internally, SkySolver and Crew Web Access, look ‘historic like they were designed on Windows 95’.” That got mangled into they run 3.1.)

[1] https://www.osnews.com/story/140301/no-southwest-airlines-is...


Wow, that’s even more frustrating considering it’s conflating an unfashionable UI (which I’d argue is a good thing, since all modern UI trends are towards slick, minimalism-worshiping messes which hide everything from users) and old, provably-flawed technological foundations (like a 16-bit system without things like filesystem access control or memory protection).

I knew this story was false immediately though because no company ever even in 1993 had production server systems which ran a desktop OS like Win 3.1. It just wasn’t up to the task. They would have used NT if anything.


http://www3.alpa.org/LinkClick.aspx?fileticket=IO7kd%2Bfm2Do... shows the system as of 2020. To the parent’s point, it’s actually quite a reasonable UX, with colored outputs, filter banks, and just enough abbreviations and whitespace to balance density with intuitiveness.

But that doesn’t mean this is the only modern design system that meets those requirements. And conflating all modern UI with consumer design trends is an equally frustratingly broad statement.


OK, this is definitely unfashionable looking if your main exposure to apps is the latest doodah on your phone that was literally updated yesterday.

It's a very standard-looking legacy Win32 app, which, admittedly, would probably have looked very similar had it been on Windows 3, but in reality is probably running on LTSC Windows 10 or something.


Doesn't look Microsoft at all to me, just colored to mimic XP. Java on some Unix?


Page 7 (as labeled) of the slides. The tabs and checkboxes layout have a distinctly Win 9x era look/feel. I do agree that it's missing an obvious menu, and the theme for the window decorations reminds me of win 3.1, but that was probably an option for software of that era just as it is in this if someone pushes hard enough.


Perhaps you just aren't old enough? It looks very Windows 95 to me.


Age has nothing to do with it, the interface just doesn't look like Windows 95.

The button shapes, minimize/close window buttons, the titlebar are all looking wrong for Windows 95.

It looks significantly more like Swing, but then the buttons don't match that either.


It looks like every single piece of hospital or car rental software I have managed to peek at.

It's not old-fashioned, it is _timeless_ B)


Link worked for me but took a long time to load. It just seems like their server is overloaded.


Broken link


Windows 95 is an "unfashionable" OS which has not received any security updates since 2001.


Yes and the fact that my software’s UI looks like Windows 95 makes it vulnerable to all the same security vulnerabilities.

/s

The systems don’t run on W95, they look like W95


Being blasted by media for running your own software, incredible. As others have commented, just a single tweet was enough to propagate this story. Quite concerning how easy it is to fake reality nowadays.


This is the same as the "Olympic cardboard beds are anti-sex" fake story that persisted. Anyone who publishes it demonstrates they don't actually research.


Thanks, I updated the post.


i miss the lemonodor blog


I know this is a hot take, but companies have to figure out whether modernizing a UI will be worth retraining everyone on the new UI. Many people were involved in its creation and maintenance, and due to its age the UI may have a large amount of glue code that can't be separated unless you build an API around the other software. That's especially true if some upcoming change to the system would make moving off the old one pointless. Southwest is also making changes to its operations, so the software is probably in maintenance mode, especially after the outage of their current software, since they will no longer let passengers choose any open seat. [1]

[1] https://www.cnn.com/2024/07/25/investing/southwest-airlines-...


I don't know, I like the classic Windows UI. I don't think modern UIs are an improvement on that.


No no no. We must now have floating headers that don't give any indication they belong to the columns below them, much less that you can click them to sort the columns. 95% of possible actions must only appear when hovered over. Buttons should not look like buttons, nor should they provide any feedback that they've actually been clicked. Etc.


To be fair, some of the Java-era software with its default toolkits does look very Windows 3.1/95-ish (all that blue and teal).


Tech Radar quotes Tom's Hardware; Tom's Hardware quotes a tweet.

Not a tweet from Southwest, mind you. Not even a tweet from someone who says that they used to work for Southwest. Just... a tweet.


I just wish there was some type of identifiable credit / penalty system for writing accurately as a news source. And this would include quotes / retweets. Never been a better time to be wrong about everything.


> wish there was some type of identifiable credit / penalty system for writing accurately as a news source

Good starting point is if the news is free. A shocking fraction of people get their news from solely free sources.


And why would someone put in effort for free?


This is a misunderstanding of the problem. Effort is made in both cases. In one case effort is made to find verifiable truth as a service. In the other effort is made to provide eyeballs to advertisers.


What's "solely free"? Does the ad-driven model count as free? Why do you think an outlet that works for you will necessarily deliver better quality news than the one that works for advertisers? There are obvious bias downsides to both.


The ad-driven model does count as free, and it's far less likely to deliver better quality news than a subscription service users pay for. The core metric for ad-driven news sites is maximizing views—it doesn't matter how you get views as long as you get them. This means free sites are heavily incentivized to be the first to break a news story even if the details are wrong or sparse. Sure, they'll issue corrections and updates later, but only a small percentage of the initial viewers will ever see these, and there's essentially zero cost for having made the mistake.

The core metric for subscription news sites is minimizing churn. A mistake will cost a subscription site subscribers who have a massive lifetime value. These sites are heavily incentivized to report high quality, accurate news even if they're not the first to break the story.


> What's "solely free"? Does the ad-driven model count as free?

Yes, in this context.

> Why do you think an outlet that works for you will necessarily deliver better quality news than the one that works for advertisers?

I can’t explain the mechanics precisely. But it’s pretty clear when I compare my subscription and non-subscription sources where the quality lies.


Yet paying for news is a very weak guarantee of not being fed propaganda/inaccurate reporting.

If we held food safety to the same standards as paid news sources are held, people would get salmonella once a week.


Your cure is worse than the disease. The second such a system existed, it would be gamed to hell and back, and nobody would believe it anyway since they'd all angrily insist that "you shouldn't have counted X" or "you should've counted Y more" and it would just turn into a war over who got to control the system and use it to deplatform their enemies.


It doesn't have to, and indeed shouldn't, be a single system. We'd rather have a handful of independent news checker orgs, maybe some topic-specific ones. Funding remains an exercise for the reader.


There just isn’t. You just have to read enough of one source to determine your own opinion.

Just like with anyone you meet: you are the judge if they are trustworthy, nice, mean, funny, etc.

That said, I think tech journalism is the bottom of the barrel. I just feel like they focus more on tech than journalism.


The cost of producing BS is too low; back in the day it would at least require time and money to print and distribute.


Community Notes on Twitter is the closest thing to what you're describing I've seen yet. It's been very helpful too, IMO.


A great example of why people don't trust journalists anymore. They don't even perform a basic amount of fact checking before publishing.


Articles from the likes of TechRadar or Tom's Hardware I would trust to a higher standard than a random tweet, but really I wouldn't label them as "real journalists".

I question the ethics and standards of the New York Times at least a little at this point so it's not like great journalism is common.


Also, there is the effect of a lie oft repeated becoming the truth: the times I've seen small outlets write nonsense and have it picked up by progressively bigger papers, citing the smaller ones as credible sources, are too many to count.

Generally there is a chain of trust in news publishing that goes nowhere and there's nothing we can do about it, as more often than not, someone credible repeats the hearsay nonsense down the line, at which point they count as a primary source.

So much of news publishing I would describe as not even wrong.


People don't pay for news anymore so we get what we pay for.


People never paid for news really. If you're thinking of the days when you had to pay 25 cents for a newspaper at the convenience store, that didn't come even close to the cost of running a newspaper in those days. Your quarter only covered (maybe) the cost of the paper and printing it. These days, we don't need paper, and running a web service is probably cheaper per-reader than physical paper.

Newspapers got the bulk of their funding from advertising back then, just as they do now.


The important thing is you were able to justify not paying for stuff.


I don't trust any kind of generalization like this, which only serves further disinformation and misinformation.

There are bad journalists (if they can be called journalists at all) and good journalists. At this point in history, our only hope lies with diligent reporters from reputable publishers.




When's the last time you paid for a newspaper or a subscription to a newspaper?


It's unfair to pretend that all journalists have the same level of professionalism (or lack thereof) with regard to sourcing.

They don't.


It's kind of depressing to think that we have had this world-spanning system of knowledge and "hyperlinks" for decades now, individual pieces that should've enabled an easy chain of attribution/citation...


And encourage the reader to move away from your site‽ No self respecting PHB could condone such a thing.


I've started seeing this on Wikipedia.

Wikipedia sources an article from a semi-legit source. That semi-legit source either just says "sources" or points to something less-legit, like a Tweet.

You can bring new "facts" into existence by just laundering them from lower- and lower-quality sources.
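Writing down who cites whom and following the links to the end is a mechanical way to spot this kind of laundering. A minimal sketch: the dictionary encodes the chain described upthread (TechRadar quoting Tom's Hardware quoting a tweet), and `trace_to_root` is a hypothetical helper name, not an existing tool.

```python
from typing import Dict, List, Optional


def trace_to_root(cites: Dict[str, Optional[str]], start: str) -> List[str]:
    """Follow 'X cites Y' links from `start` until a source cites nothing.

    Returns the whole chain, ending at the original source.
    Raises ValueError if the citations loop back on themselves.
    """
    chain, seen = [start], {start}
    while (nxt := cites.get(chain[-1])) is not None:
        if nxt in seen:
            raise ValueError(f"citation loop at {nxt!r}")
        chain.append(nxt)
        seen.add(nxt)
    return chain


# TechRadar quotes Tom's Hardware; Tom's Hardware quotes a tweet.
CITES = {"TechRadar": "Tom's Hardware", "Tom's Hardware": "a tweet", "a tweet": None}

if __name__ == "__main__":
    print(" -> ".join(trace_to_root(CITES, "TechRadar")))
```

Each hop looks like a citation, but the chain bottoms out at a source nobody would have cited directly.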


Source-laundering is a bit catchy, I have to say.


The guy that started it all said it was just a "troll tweet":

https://x.com/ArtemR/status/1815408553131426179


The "Southwest uses Windows 3.1" claim is false, and is a great example of how bullshit can spread on the Internet once some semi "reputable" organizations repeat the false rumor:

https://kotaku.com/southwest-airlines-windows-3-1-blue-scree...


I worked for SITA (https://en.wikipedia.org/wiki/SITA_(business_services_compan...) back in the late 2000s. They had a massive X.25 serial network connecting airlines across the globe. Some of its customers were still running Windows 3.11 in the data center on old AT systems. We would buy old computers on Craigslist and eBay to keep hardware around for when it failed. I wouldn't be surprised if those systems are still in use today.


The San Francisco subway runs off 5.25-inch floppy disks.

https://sfstandard.com/2023/02/02/sfs-market-street-subway-r...

That article links to an (only slightly older) article about British Airways loading navigation updates every month off of the fancy new 3.5-inch floppy disks.


> Can anyone working at Southwest confirm that their main scheduling system is running on Windows 3.1?

I can't confirm that, but I can certainly confirm lots of hospital equipment is still running Windows XP and lots of hospital personnel browse the internet with Internet Explorer.


This story is another example of how hallucinations from LLMs can successfully replace many "news" portals.


Berlin Brandenburg got hit hard. As a disgruntled BER user, I am NOT surprised they had some of the worst repercussions.


German IT is often hit hard by such things. Unless of course they're still running on paper.

At least they immediately mentioned it on their website, as a banner right at the top. The immigration office's appointment system has been down for over a month, and it took them 3 weeks to just acknowledge it.


Thanks for your website btw. My partner’s currently renewing her work visa (she’s from Australia) and it’s insane how bad the situation is. To the point where it feels like it should just be illegal for Berlin to operate services like this. Appalling.


I'm still shocked that Brandenburg is actually open!


Lawsuits inbound. Delta appears to be gearing up for one already:

https://finance.yahoo.com/news/delta-air-lines-seek-compensa...


> has hired a law firm and will seek compensation from Microsoft and CrowdStrike

Going after Microsoft seems like a misguided move here. What does Microsoft have to do with a third party driver installed by your own IT department?


I suspect the lawsuit is created by lawyers, not techies.

Equally, reporting on this whole issue seems to be by journalists, not techies. It's been framed (a lot) as a Windows issue, not a CrowdStrike issue.

(8.5 million machines were affected, out of 1.4+ billion windows machines [1])

I have one affected customer (10k machines) who assumed I'd suffered like he did, and was surprised when I said we weren't affected. The reporting was consistently that it was a Windows issue, caused by an MS update.

Even this article leans into this narrative...

"as the New York Times put it, “It is more apt to ask what was not affected.” The answer is Linux, Macs, and phones."

Let me add "not to mention 99.4% of windows".

So the journalists don't know what happened, or who was affected, and felt "some computers have a problem" was a weak headline. The lawyers get that narrative and run with it.

And yes, it's easy to squint and claim the "OS should cope with this", but there are realistically limits on what an OS can do once you install a kernel-level driver on the machine. Should we go after Intel for making the chips?

[1] https://www.pcworld.com/article/608447/microsoft-delighted-b...
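A quick check of the arithmetic behind the 99.4% figure, using the two numbers quoted above and treating "1.4+ billion" as exactly 1.4 billion (an assumption; a larger install base would only shrink the affected share):

```python
# Figures from the comment above; "1.4+ billion" is taken as exactly 1.4 billion.
affected = 8.5e6          # machines hit by the faulty update
windows_fleet = 1.4e9     # assumed size of the Windows install base

affected_share = affected / windows_fleet
unaffected_share = 1 - affected_share

print(f"affected:   {affected_share:.2%}")    # affected:   0.61%
print(f"unaffected: {unaffected_share:.2%}")  # unaffected: 99.39%
```

Rounded to one decimal place, that is the 99.4% unaffected figure.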


Reports in the mainstream media were absolutely insane. All the language used points to some kind of unlucky event similar to a bad weather pattern. I knew general IT knowledge isn't very high, but I didn't expect newspapers to report on it like a tornado or an earthquake...


"Let me add "not to mention 99.4% of windows".

...And neither were any of my Windows 7 systems affected.

"And yes, it's easy to squint and claim the "OS should cope with this", but there's realistically limits on what an OS can do once you install a kernel-level driver on the machine."

What are those limits, and why are they limits? Are there solutions? Yes, there are. For instance, Microsoft doesn't take a snapshot of the working system state before loading a kernel patch, one it could automatically restore without the patch after a crash, nor does it employ various other techniques that would solve the problem.

I've discussed these issues in other posts so I won't repeat them here.
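A toy model of the snapshot-and-revert idea described above, assuming a hypothetical boot manager that counts consecutive failed boots after a driver or content update and falls back to the last set that booted cleanly. The class and its fields are made up for illustration; this is not a description of how Windows (or any shipping OS) handles boot failures.

```python
from dataclasses import dataclass


@dataclass
class BootManager:
    """Hypothetical last-known-good fallback for kernel driver/content updates."""
    max_failed_boots: int = 2      # consecutive crashes tolerated before reverting
    active: str = "base"           # driver/content set selected for the next boot
    last_known_good: str = "base"  # most recent set that booted cleanly (the snapshot)
    failed_boots: int = 0

    def stage_update(self, new_set: str) -> None:
        # last_known_good is left untouched: it is the snapshot to revert to.
        self.active = new_set
        self.failed_boots = 0

    def report_boot(self, succeeded: bool) -> str:
        """Record one boot attempt and return the set to load on the next boot."""
        if succeeded:
            self.last_known_good = self.active
            self.failed_boots = 0
        else:
            self.failed_boots += 1
            if self.failed_boots >= self.max_failed_boots:
                # Revert automatically instead of crash-looping forever.
                self.active = self.last_known_good
                self.failed_boots = 0
        return self.active


mgr = BootManager()
mgr.stage_update("base+bad-update")
print(mgr.report_boot(False))  # base+bad-update  (one more try)
print(mgr.report_boot(False))  # base             (reverted)
```

The hard part in practice is deciding what counts as a "successful boot" for a security product that is supposed to load early; the sketch simply takes that signal as given.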


It's arguably also a fault of MS: https://news.ycombinator.com/item?id=41096344


> Windows is unsuitable precisely because it can be brought down by third party updates

If I run bullshit on a Linux or MacOS box it can also be unstable and brought to its knees. Or is that poster really trying to argue there's no way you can get a Linux box to lock up?


Key quote from my link:

> Third party vendors are forced into writing unsafe kernel drivers because Microsoft does not provide sufficient user mode APIs.

AFAIK it's different on Linux, and the reliability is higher. Is this not the case?


https://forums.rockylinux.org/t/crowdstrike-freezing-rockyli...

https://access.redhat.com/solutions/7068083

https://lists.debian.org/debian-kernel/2024/04/msg00202.html

You can install buggy kernel modules in Linux as well. I can't even count how many times an apt upgrade/yum update made my system unbootable when using nvidia GPU drivers.

And besides, if you really want that AV system to know everything the operating system is doing and hook into tons of syscalls, you pretty much can't run exclusively in usermode. If someone has compromised root on the system, you can't trust the info the kernel is giving your usermode application. eBPF isn't usermode.

And in the end, the poster literally said Windows is unsuitable because third party updates can kill it. That was the key takeaway from their post. Well, third party updates can kill Linux, it can kill MacOS, it can kill darn near everything.


It's an argument. I'm not sure it's a _good_ argument, but hey it's an argument.


It was too tempting to include damages from accidentally enabling New New Outlook.


You want Microsoft named in the case so CrowdStrike can't deflect by claiming it's Microsoft's fault.


Anyone know why Minneapolis-St Paul began experiencing cancellations much earlier than other US airports?


How to avoid getting rekt

> Southwest wasn’t affected because they don’t use CrowdStrike


I really don't understand: how can my social media have better backup and infrastructure than an OS that is used worldwide?


Because IT is your social media's business. They know IT inside and out. They understand what can go wrong and how to mitigate it. The business for airlines (for example) is to fly planes. They are pretty damn good at it. IT, however, is just a tool to them that they buy elsewhere. They don't understand it in the same way as social media. They rely on outside contractors to do it right: outside contractors who get the job based on being the cheapest or convincing the buyers their service is "industry best practice."


I wouldn't overestimate FAANG's immunity to crash-the-world config updates. Facebook had everything including engineers' access to the datacenter down for hours in 2021:

https://news.ycombinator.com/item?id=28750894

> infrastructure as compared to an OS

By the way, I don't think quality of Microsoft's infrastructure is relevant here.


Compare the salaries, working conditions and prestige offered by tech jobs at a social media companies vs some large legacy company like a bank or airline.

In the former, you are paid well and have some sort of prestige and political capital. In the latter, you are underpaid and your prestige/political capital is often equivalent to the janitor's.


Meta is one of the most valuable companies in the world with the most resources to buy the best of everything. At 1,280 Billion dollars of market cap it is 30x bigger than American, Delta and United put together. It made $39 Billion last year compared to $7.8 Billion for all US airlines together. Of course it has better systems.


Because one is in a data center that can be controlled, and the other is deployed to user owned hardware that cannot.


Because they don't do mass rollouts on the servers. Then again, those companies could fail too if they had a single point of failure with automatic mass deployments...

This could happen for anything that supports this type of automatic mass deployment. Just in this case that thing was popular enough and happened on one of the most popular platforms.
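The usual alternative to a single automatic mass deployment is a staged (canary) rollout: push to a small slice of machines first and stop if too many of them break. A minimal sketch, where the wave sizes, the 2% failure threshold, and the `apply_update` callback are all assumptions for illustration:

```python
from __future__ import annotations

from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], bool],   # True if the host is healthy afterwards
    stages: Sequence[float] = (0.01, 0.10, 0.50, 1.0),  # cumulative fractions, ascending
    max_failure_rate: float = 0.02,
) -> tuple[list[str], bool]:
    """Update hosts in waves and halt before the next wave if one goes badly.

    Returns (hosts that received the update, whether the rollout was halted).
    """
    touched: list[str] = []
    done = 0
    for fraction in stages:
        target = int(len(hosts) * fraction)
        wave = list(hosts[done:target])
        if not wave:
            continue
        failures = sum(1 for host in wave if not apply_update(host))
        touched.extend(wave)
        done = target
        if failures / len(wave) > max_failure_rate:
            return touched, True   # the remaining hosts never see the bad update
    return touched, False
```

With 1,000 hosts and an update that crashes every machine it touches, only the first wave of 10 hosts is affected; the other 990 never receive it.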


Windows has always been terrible for reliability. Adding a "security" system which is invasive and always-updated makes the reliability worse.


Is there a similar global analysis?


Maybe I'll do a Part 2: The World.


As a non-American that would be very interesting.


love to see the airlines using linux and what kind of problems, if any, they experienced that day


I’d love to have some solid numbers of “global cancellations due to” - I heard a bunch of varying figures so far.


basically any airline using linux is not on that list


It's more accurate to say "any airline not using Crowd Strike is not on that list."

Blaming Windows for this outage is like blaming Linux for Apache bugs. The two systems are distinct.

It just so happens that Crowd Strike was very successful at selling to large corporates. That includes some airlines.

99.4% of Windows machines were unaffected. Including those of airlines using Windows, but not Crowd Strike.


ahhh yes you are correct


every airline in the world "uses linux", the core reservation and distribution systems were migrated from TPF to Linux over the past 20 years


Can you name one?


One interesting feature of this outage was that "PROD" was generally fine, on account of mostly running on Linux and/or ancient proprietary software, while "CORP" was generally wrecked, on account of mostly running Windows. In other words, the bank systems responsible for moving money mostly worked, while the systems responsible for allowing humans to interact with them (to issue approvals, change configuration, or other ops things) often did not.


Same thing for a lot of industries actually. PROD runs on Linux and probably has some delay to prevent this. Corp gets hosed.


Yep, here in manufacturing, production/OT PLCs from Rockwell, Siemens, and others run on Wind River VxWorks. The HMI (human-machine interface, basically a touchscreen used to display status and enter setpoints and other data) and SCADA/ERP systems run on Windows. Sometimes this is an industrial fanless PC running e.g. Ignition (Java+Python) software; other times it's a Rockwell PanelView, which actually still runs Windows CE 6.0.

This gets to be a problem when IT wants to get their hooks into OT networks. The PLC is meant to be left alone, and will happily send its Ethernet packet to that servo drive or digital IO card every 10ms for literal decades. There is no reason to update its firmware ever, just don't expose it to the Internet. But corporate wants everything on the Internet.

The PLC will reliably run its sequence when you close the contacts on the physical "Cycle Start" pushbutton. But if corporate is down, you can't know what part number you're supposed to make or how many of them, or get a serial number from and report test results to the traceability database.


On the flip side, there are a lot of physical production systems (think CNC mills or 3d printer farms) where remote observability and management would be very handy, or where you'd really like to upload gcode files directly from your workstation. However, because they've been air-gapped, you need to instead walk across the shop floor to "that one PC" that allows you to insert a USB stick, copy the files off the network drive to the USB, then walk back to the lab with the machine tools and insert the stick to feed the files over.

If you want to monitor, you need to sit in the lab and watch, or if you're lucky, leave a PC with a webcam pointed at the tool and remote into that machine from your desk or your laptop at home.

This works, but long cycle times kill productivity, and engineering twiddling their thumbs costs money. It's easy to end up spending multiple hours a week just walking back and forth doing this dumb dance. One would expect that with 40+ years of networking experience, we would have come up with a way to securely perform these tasks without simultaneously exposing our tooling to cyberattack. Perhaps some kind of segregated network that can't access the internet, but gets pull-only access to the file share? Or vice versa - a screencap feed that gets sent through a data diode so an engineer can monitor the tool from their phone or laptop without being able to affect it?

Perhaps such solutions exist and are just beyond the IT skills, budget, or complexity appetite of the sorts of production tooling shops that I'm familiar with.

Caveat tho - I don't work in this space, I'm just friends with people who do.
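The data-diode idea above boils down to a sender that only transmits and a receiver that never talks back, so there is no retransmission: lost or corrupted pieces are simply dropped and the next frame replaces them. A minimal in-memory sketch of the framing, assuming a hypothetical datagram layout (frame id, chunk index, chunk count, CRC32). In a real setup the datagrams would be sent over UDP toward the diode hardware, which is not shown here.

```python
from __future__ import annotations

import struct
import zlib

# Hypothetical datagram header: frame id, chunk index, chunk count, CRC32 of the chunk.
HEADER = struct.Struct("!IHHI")
CHUNK_SIZE = 1024  # assumption: comfortably fits in a single UDP datagram


def encode_frame(frame_id: int, payload: bytes) -> list[bytes]:
    """Split one frame (e.g. a screen capture) into self-describing datagrams."""
    chunks = [payload[i:i + CHUNK_SIZE] for i in range(0, len(payload), CHUNK_SIZE)] or [b""]
    return [
        HEADER.pack(frame_id, idx, len(chunks), zlib.crc32(chunk)) + chunk
        for idx, chunk in enumerate(chunks)
    ]


class OneWayReceiver:
    """Receive-only side: it never acknowledges anything or asks for a resend."""

    def __init__(self) -> None:
        self.partial: dict[int, dict[int, bytes]] = {}
        self.latest: tuple[int, bytes] | None = None  # newest complete frame

    def feed(self, datagram: bytes) -> None:
        if len(datagram) < HEADER.size:
            return  # runt datagram: drop it
        frame_id, idx, count, crc = HEADER.unpack_from(datagram)
        chunk = datagram[HEADER.size:]
        if count == 0 or idx >= count or zlib.crc32(chunk) != crc:
            return  # malformed or corrupted in transit; nobody to ask, so drop it
        parts = self.partial.setdefault(frame_id, {})
        parts[idx] = chunk
        if len(parts) == count:
            data = b"".join(parts[i] for i in range(count))
            # Discard this frame's buffer and any older, never-completed frames.
            for old in [f for f in self.partial if f <= frame_id]:
                del self.partial[old]
            if self.latest is None or frame_id > self.latest[0]:
                self.latest = (frame_id, data)
```

Because the receiver has no way to influence the sender, the engineer watching the feed can see the tool but cannot affect it, which is the property the comment is after.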


100% - banks were just a specific example that popped to mind immediately. If this bug had affected Linux Crowdstrike, we might have seen the opposite - people hard at work on their laptops trying to fix the production server outages. Probably nothing could have taken down the FAA systems though, on account of them being too old and bespoke to have a supported Crowdstrike module.


In the original thread there were some reports of people having their Linux systems taken down by Crowdstrike as well. At separate times, of course, and I suppose the greater heterogeneity of Linux distros prevents events of this magnitude. But that would be little consolation when it takes down your systems.


Those should be considered coincidence until proven otherwise. Crowdstrike is intended to bring down systems when it believes there was an intrusion, after all.


Not by crashing the kernel...


Outsourcing a core business competency, and surely also cutting the contracts to the bone to pocket the savings, embrittled Delta. I seriously hope the compensation to customers costs more than any savings or profits they made in the interim. It MUST be painful enough that they do not repeat this mistake.

The article quotes https://www.reddit.com/r/delta/comments/1edtfbh/why_did_delt... (with improper attribution)

topgun966 (Platinum) wrote on Reddit """ These "experts" are completely wrong. The core issue was Delta did NOT have a proper DR plan ready and did NOT have a proper IT business continuity plan ready. UA, AA, and F9 recovered so fast because they had plans on stand-by and engaged them immediately. After the SWA IT problem, UA and AA put in robust DR plans staged everywhere from the server farms, to cloud solutions, to end-user stations at airports. They had plans on how to recover systems. DL outsources a lot of their IT. UA and AA engaged those plans quickly. They did not hold back paying OT for staff. UA and AA have just as much reliance on Windows as Delta. AA was recovered by end of day Friday and resumed normal operations Saturday. UA was about 12 hours behind them having it resolved by Saturday morning resuming normal schedules Saturday afternoon. The ONUS is 100% on DL C+ level in their IT decisions. The problem is that the lower level IT staff is going to get the brunt of the blame and the consequences. """


That’s why I think the suit against CrowdStrike and MS is mostly a dud. First you have to get around the waiver (much harder for a business than a consumer), and then you have to deal with comparative fault, i.e. Delta’s disaster recovery system sucked.


I love that “CrowdStrike” is now a synonym for “global outage”. Not some cute hihi name like “heartbleed”, just the name of the company that did the screwup. Seems fair.


Not sure it's fair, but I am certainly waiting for it to become a verb or a noun.

    crowdstrike. n.
     1. A set of major disruptions caused by an update that was not tested enough, pushed to many devices across the globe.
     2. The name of such an update.
     3. (by extension) a joke so bad it causes major disruptions.

     For instance:
       - Congrats for your crowdstrike! Now my weekend is ruined as I'll be the one who'll be asked to fix this mess.

    crowdstrike. v. (simple past crowdstruck or crowdstriked¹, past participle crowdstricken, or crowdstruck, or (obsolete, regionalism) crowdstroke²)
     1. Action of pushing an update to many devices that causes a global outage or major disruptions in various sectors.

     For instance:
       - We've been crowdstruck. Again.

    crowdstrike. adj.
     1. Qualifies an update that, when pushed to many devices across the world, causes major disruptions across the globe.
     2. Qualifies such a (set of) event(s).

    For instance:
       - We are sorry for the crowdstrike event we caused. We gently remind our kind customers and their end users that per our ToS, we will issue no refund, and that no liability can be held against us. Customers who don't try to contact us in the following month will get a discount for their next contract renewal. You will hear us speak before the Congress, who nicely invited us for some comedy in the hope it will appease you all. Make sure you like the related videos on the various online platforms. We wish you a nice end of the week and nice, relaxing summer holidays.
¹ people have differing but strong opinions on which simple past form is correct, mainly due to regional differences. Some avoid saying crowdstrike and say crowdhit instead.

² some people have tried to push crowdstricken, which first caught on in some areas or particular contexts. The idea that this form likens the qualified subject to the bearer of some sickness has eventually seduced a critical mass of people after some initial pushback. Please also see the usage notes for strike for other, rarer, alternative forms [*].

[*] https://en.wiktionary.org/wiki/strike#Usage%20notes

(Thanks to the contributors in this thread)


since nothing will happen to them except a slap on the wrist, and all our employers will continue to force this crapware on our machines, i think we should make a point of using their name as a pejorative (similar to the 'santorum' neologism). and when they inevitably try to rebrand, use that term too


> since nothing will happen to them except a slap on the wrist

I've already bought some of their stock; I'm pretty sure it's bottomed. I bet I make 30% a year from now. This always happens: some "ohnoes!" event cuts a stock price off at the knees, but then everyone forgets, and in a year or so it's back to where it was before the event.


“The intern crowdstruck half the customers”


Exactly, by the way I added the irregular inflections and fixed the example for the verb. Thanks for your contribution.


I disagree, I think that the simple past should be "crowdstruck" but the participle should be "crowdstricken", as might apply to someone afflicted by an illness:

"The update wasn't tested, so the servers are all crowdstricken."


Thanks, I added the documentation for this form, and added a second usage note. I initially wanted to tease you by documenting that people with bad taste tried to push for this form, but I really like this illness idea.


> crowdstruck

    Said, "Yeah, it's all right
    We're doing fine"
    Yeah, it's all right
    We're doing fine, so fine

    Crowdstruck
    Yeah, yeah, yeah, crowdstruck
    Crowdstruck (crowdstruck)
    Whoa, baby, baby (crowdstruck)
    You've been crowdstruck
(AC/DC's Thunderstruck, but replacing "thunderstruck" with "crowdstruck")


They do have a song about an insidious disabling virus you know: https://www.youtube.com/watch?v=6njy7mZbwdc


Does anyone know (or have any guesses as to) why the founder(s) named it "CrowdStrike"? What was (or might have been) the idea behind the name? I'm guessing it's not patterned after "crowdfunding" "crowdsourcing" "crowdlending", etc.


It's part of a trend where companies name themselves after a self-describing disaster they're going to cause. Oceangate also did this.

New investing strategy is to look for companies whose name also fits this pattern but who have not yet caused the disaster and short the stock.


The cute name was Blue Friday, but it doesn't seem to have caught on.


Rebranding project coming up at CrowdStrike?


That would be a shame, the name is so fitting, more than ever!

They struck a very big crowd real bad.


I found it quite interesting that CrowdStrike explicitly excludes a bunch of use cases. They also basically say: don’t use it if it needs to be reliable. I don’t know if this is standard for software, but for me this was quite surprising.

From CrowdStrike's terms and conditions [1]: […] THERE IS NO WARRANTY THAT THE OFFERINGS OR CROWDSTRIKE TOOLS WILL BE ERROR FREE, OR THAT THEY WILL OPERATE WITHOUT INTERRUPTION OR WILL FULFILL ANY OF CUSTOMER’S PARTICULAR PURPOSES OR NEEDS. THE OFFERINGS AND CROWDSTRIKE TOOLS ARE NOT FAULT-TOLERANT AND ARE NOT DESIGNED OR INTENDED FOR USE IN ANY HAZARDOUS ENVIRONMENT REQUIRING FAIL-SAFE PERFORMANCE OR OPERATION. NEITHER THE OFFERINGS NOR CROWDSTRIKE TOOLS ARE FOR USE IN THE OPERATION OF AIRCRAFT NAVIGATION, NUCLEAR FACILITIES, COMMUNICATION SYSTEMS, WEAPONS SYSTEMS, DIRECT OR INDIRECT LIFE-SUPPORT SYSTEMS, AIR TRAFFIC CONTROL, OR ANY APPLICATION OR INSTALLATION WHERE FAILURE COULD RESULT IN DEATH, SEVERE PHYSICAL INJURY, OR PROPERTY DAMAGE. Customer agrees that it is Customer’s responsibility to ensure safe use of an Offering and the CrowdStrike Tools in such applications and installations. CROWDSTRIKE DOES NOT WARRANT ANY THIRD PARTY PRODUCTS OR SERVICES.

[1] section 8.6 of https://www.crowdstrike.com/terms-conditions/


> I don’t know if this is standard for software

This is pretty standard. There is almost identical language in the Windows and macOS EULAs, for example.


Same for datasheets of most electronic components. The manufacturers don't want the responsibility to avoid possible multi-million lawsuits.


So how does it get installed on all the endpoints in 911 dispatch centers?


Because FBI CJIS requirements, adopted by state law enforcement bodies, require it. I support a Public Safety Answering Point (PSAP, aka a 911 call center) and I push back on as many of the inane requirements as I can with compensating controls.

Example: As of right now I am still required to expire passwords every 90 days. My state is considering the current guidance from NIST but FBI CJIS policy still mandates the expirations.
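For contrast, the two policies side by side as a small sketch. The function and parameter names are hypothetical. The NIST behavior modeled here is SP 800-63B's guidance that verifiers should not force arbitrary (e.g. periodic) password changes but should force one when there is evidence of compromise; the 90-day rule is the mandate described above.

```python
from datetime import date, timedelta
from typing import Optional


def must_change_password(
    last_changed: date,
    today: date,
    *,
    max_age_days: Optional[int],
    evidence_of_compromise: bool,
) -> bool:
    """Decide whether a user must pick a new password today."""
    if evidence_of_compromise:
        return True                # both policies force a change here
    if max_age_days is None:
        return False               # no arbitrary, calendar-based expiry
    return today - last_changed >= timedelta(days=max_age_days)


NINETY_DAY_EXPIRY = {"max_age_days": 90}     # the mandate described above
NIST_800_63B_STYLE = {"max_age_days": None}  # change only on evidence of compromise
```

The only difference between the two is the calendar branch; both still force a change when a password is known or suspected to be compromised.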


I don't know what CJIS requirements entail precisely, but at a first glance, they seem reasonable. But it's weird that people then think they can comply by installing a product with a disclaimer against their intended use. It's just a token acknowledgment: "Yeah, we've read it, but we don't really care."

If that's also the interpretation of the courts, then each company would be individually liable, at least towards the government.


Holy shit I cannot stand the password expiration requirements. Like you said, NIST literally recommends against it but so many regulations require it. So aggravating.


Because no endpoint protection software exists that doesn’t have the same disclaimer clause. So you install this one and accept the lack of vendor liability.

(If such a thing did exist, it would cost a lot more!)


What is the alternative? Have you considered a possibility that those could be the best out there for 911 despite their imperfections?


The data entry endpoints in a 911 dispatch center should not be running a general purpose consumer OS. They should be single purpose machines much closer to a dumb VT100 terminal than a personal computer. Maybe something like a stripped down hardened Chromebook. No internet connection. No personal email, web, or other use allowed or even possible. A product like crowdstrike should not be needed because it should not be possible to run anything but the dispatching software on those machines.


That's what computer aided dispatch (CAD, in the industry) software was 30 years ago (my PSAP had an AS/400). The market has rejected it. Also, see my other comment re: FBI CJIS policy.

In the PSAP I support we have three dedicated PCs at each workstation to run the CAD, phones, and radio. Each of those has a dedicated VLAN, separate physical servers and storage, separate Active Directory forest for CAD (no AD for radios or phones-- standalone PCs), and default-deny ACLs for inbound and outbound traffic on the hosts and at the borders.

A fourth dedicated PC (VLAN, ACLs, physical servers, AD environment) does email, web browsing, etc. (All of it is shackled together with a nice KVM that supports a single keyboard and mouse controlling up to 5 PCs.)
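"Default-deny" in those ACLs means traffic is dropped unless a rule explicitly permits it. A minimal first-match evaluator to illustrate the idea; the addresses, port, and rule format are hypothetical and don't mirror any particular firewall's syntax:

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    action: str            # "permit" or "deny"
    src: str               # source network in CIDR form
    dst: str               # destination network in CIDR form
    port: Optional[int]    # destination port; None matches any port


def evaluate(rules: Sequence[Rule], src: str, dst: str, port: int) -> str:
    """Return the action of the first matching rule; no match means deny."""
    s, d = ip_address(src), ip_address(dst)
    for rule in rules:
        if (s in ip_network(rule.src)
                and d in ip_network(rule.dst)
                and (rule.port is None or rule.port == port)):
            return rule.action
    return "deny"  # implicit default: anything not explicitly allowed is dropped


# Hypothetical CAD VLAN: workstations may reach the CAD server on 443 and nothing else.
CAD_VLAN_OUTBOUND = [Rule("permit", "10.20.0.0/24", "10.20.10.5/32", 443)]
```

Applying the same evaluation to both inbound and outbound traffic, on the hosts and at the borders, is what keeps a compromised workstation from reaching anything it has no business talking to.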

Not every PSAP does this and I think that's insane. The law and fire agencies we interface with absolutely do put a single PC on a desk (or in a cruiser) and use it for everything (and we filter and monitor the traffic coming in from them over our VPN heavily and block access at the first sign of anomalous traffic). Often their budgets don't support the notion of using dedicated computers for task-oriented work. The marketers have pushed general purpose devices for this kind of application.

In the last 5 years all three "hardened" systems we use (all companies acquired by Motorola) have started requiring Internet access for various APIs they use, and for integration with third-party vendors (mapping, public information databases, and task instructions for telecommunications). I think it's ridiculous, but I don't get to decide the direction of the product roadmaps or what the business stakeholders want from a feature perspective.

Motorola (who makes the CAD software used by some of the largest US municipalities) is pushing for hosted CAD and integrating hosted features into on-prem systems. (Of course, they have a managed security product offering that they want to sell along side it.)


Usually the largest of companies will have their own customized T&Cs governed in their Master Services Agreement (MSA) which are often very modified versions of these publicly available ones


In my experience, better legal counsel has the relevant terms struck before the deal is signed. In this case, it would have been the terms around aircraft and aviation.


There often are limits to how much you can disclaim in your T&C. If, under the same terms, you cause damages deliberately, you'll be held liable, and obvious gross negligence can be a factor as well.

There are often 3 opinions between any 2 lawyers so we have a chance to learn the outcome many months and millions of dollars later.


> The outage highlighted a different kind of digital divide. On one side, gmail, Facebook, and Twitter kept running, letting us post photos of blue screens located on the other side: the Windows machines responsible for actually doing things in the world like making appointments, opening accounts, and dispatching police.

At this point, using Windows for these tasks seems like clinging to legacy software, either because training people to use an iPad or a web browser seems too complicated or because no one wants to pay to move their age-old systems to a more modern web-based system. Native apps work great, but I think the world is moving to the cloud, and that means web-based everything should be the norm. Yes, AWS and Azure outages can still happen, but those can be fixed by spinning up a VM in a different cloud.

This is also why software jobs aren’t going anywhere for a while. Many systems need to be moved to more modern and robust clouds. This transformation might take decades across the globe.


Your “modern and robust cloud” is my “why on Earth doesn’t this thing work offline”.

The world is absolutely full of things that have worked for decades to centuries without the Internet, are eventually more or less consistent (remember carbon paper credit card machines?), and did an amazing job of keeping the world running despite, wars, network partitions (the “network” would basically always be partitioned), mistakes, entire branches offline, etc.

Sure, a lot of things are easier when centralized, and “the cloud” is incredibly powerful. But it’s not necessarily more robust. Also, depending on any sort of cloud means you’re also depending on the network, and networks are far from infallible. There’s a reason that a lot of stored-value transit systems still track balances on the card and will let people in even if a fare gate cannot connect to a cloud service.

And CrowdStrike took out plenty of cloud instances, and recovering them can be worse than recovering physical hardware, as the “robust cloud” has an absolutely terrible ability to do anything outside the happy path of booting an instance normally.
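The stored-value transit example works because the balance lives on the card, so the gate can decide locally and reconcile later. A toy sketch of that offline-first pattern; the class names, the fare, and the list standing in for the central ledger are hypothetical, and real fare systems add cryptography and fraud controls that are omitted here:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Card:
    card_id: str
    balance_cents: int  # the authoritative balance is stored on the card itself


@dataclass
class FareGate:
    fare_cents: int
    ledger: Optional[List[Tuple[str, int]]] = None  # None means the backend is unreachable
    queue: List[Tuple[str, int]] = field(default_factory=list)

    def tap(self, card: Card) -> bool:
        """Admit or refuse using only data on the card; no network round trip."""
        if card.balance_cents < self.fare_cents:
            return False
        card.balance_cents -= self.fare_cents
        self.queue.append((card.card_id, self.fare_cents))
        self.sync()
        return True

    def sync(self) -> int:
        """Upload queued taps if the backend is reachable; return how many were sent."""
        if self.ledger is None:
            return 0  # offline: keep the records and try again later
        sent = len(self.queue)
        self.ledger.extend(self.queue)
        self.queue.clear()
        return sent
```

The gate keeps letting people through during a network partition; the backend just finds out later, which is the "eventually more or less consistent" behavior described above.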


Okay this sounds all very reasonable, but how do you know when your washing machine is finished, when it's not connected to the cloud and you won't get notified in your app? It sure is not an easy thing and the cloud helps very much here


When the noise from the white box stops, then I know. And if I'm not at home to hear it, I'm not quite sure why I'd need to know.


Well, for people in an apartment it doesn't matter all that much, but if your laundry washer or dryer is in the basement, you don't necessarily hear it if you're out in the garden.


Sure, it might be a "nice to have" thing. But the machines usually show how long they'll take. Even if it's a newer one with sensors that make the whole process vary in time, I'd still be like "Oh, okay, it'll take about 3 hours, so I'll be back at 6pm". It doesn't really matter if the clothes chill out for about an hour; especially with the newer machines, they don't stink that fast. And on top of that, I don't think it has to go over the internet if you need some sort of notification. Local would be sufficient.

If I buy something new like this and have a few choices, I intentionally pick the one with as few smart features as possible.


What happened to the good old tin can telephone down the side of the house to the washing room?


I think you are joking, but I'll reply with a serious answer.

Where I went to college, our dorms had (free) shared washing machines. This was "pre cloud", but wifi was throughout. One student rigged up a hall-effect sensor and attached it to each power cable. It could detect if the washers and dryers were on. It sent this info to a specific website that the students could monitor to see if there were any available washers or dryers.
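Turning raw current readings into a reliable "in use / free" status takes a little care, because washers pause between cycles. A minimal sketch using two thresholds (hysteresis) plus an off-delay; the threshold values and sample counts are made-up assumptions, not figures from that dorm setup:

```python
from typing import Iterable, List


def machine_states(
    current_amps: Iterable[float],
    on_threshold: float = 0.5,   # at or above this, the machine is clearly running
    off_threshold: float = 0.2,  # below this, it may be finished
    off_samples: int = 3,        # consecutive low readings required before "idle"
) -> List[str]:
    """Convert a stream of current readings into 'running' / 'idle' states."""
    running = False
    low_count = 0
    states = []
    for amps in current_amps:
        if amps >= on_threshold:
            running, low_count = True, 0
        elif running and amps < off_threshold:
            low_count += 1
            if low_count >= off_samples:
                running, low_count = False, 0
        else:
            low_count = 0  # in-between reading, or already idle: reset the off-delay
        states.append("running" if running else "idle")
    return states


print(machine_states([0.0, 2.0, 2.1, 0.1, 2.0, 0.1, 0.1, 0.1, 0.0]))
```

The single low reading in the middle (a pause between wash and spin, say) does not flip the machine to "idle"; only three low readings in a row do.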


Wasn't the first webcam setup to show whether a coffee pot was full?


Also the reason we got Hyper Text Coffee Pot Control Protocol (HTCPCP) in RFC 2324


I hope this is sarcasm, but if it isn't washing machine cycles have a fixed duration so a timer on your phone is more than enough, no cloud necessary.


I wish washing machines had a fixed cycle duration. When I start the cycle my washing machines tells me the same duration, always, but in actuality it takes different amounts of time every time. Madness. I've been told this is a feature.


> Madness. I've been told this is a feature.

It actually is. Fixed length cycles haven't been a thing for many years now - modern washing machines adjust the washing cycle length by the weight of the laundry and its behavior during spin-drying, both its vibration behavior aka weight distribution (that can have multiple adjustment cycles to achieve reasonably even distribution) and how much water it loses - when no more water comes out during spinning, it will cut the cycle short to save energy.


Yes, newer machines shorten the cycle for lower loads and less dirty clothes.


> When I start the cycle my washing machines tells me the same duration, always, but in actuality it takes different amounts of time every time.

If it says (e.g.) 43 minutes, but sometimes it takes 40 and sometimes 49 or 53, set your timer for 60 minutes and get on with life. Your laundry sitting for 17 or 7 minutes isn't the end of the world. If your timer goes off and it's still not done, set it for another 20 and do something else.

Of all the things to fill your head with worry and annoyance with, laundry is near the bottom of the list for me.


Except when you live in a building with communal washing machines and where you need to book time for laundry, as it is common in many European cities.


My washing machine is kind enough to both indicate time to end in minutes, but also allows me to delay start so that the cycle is finished in [x] hours. It's not even that modern.


My modern dishwasher is also very kind, and displays the time to end in minutes throughout the wash. Counting down from an hour. But I don't know what kind of upbringing it had, for some reason, the sneaky bastard always adds another 25 minutes, when there is supposedly only 10 minutes left.

I guess dishwasher years are like dog years. At least it definitely behaves like a teenager at 2 years old, finishing when it wants to finish. Estimates be damned.


Do you always load your machine up to the same level? A low load will trigger a shorter cycle time to save energy and water.


My home assistant does approximately this without the cloud, but it isn't magic: cloud is just 'someone else's servers' and I just host it on my own raspberry pi.


At this point I'm tempted to start using "the ground" as the opposite of "the cloud".

I'm already mentally replacing "cloud" with "clown" anyway, to the point I have to stop myself from accidentally saying "clown computing" out loud.


> Okay this sounds all very reasonable, but how do you know when your washing machine is finished

1. Check back in an hour (like my (grand)mother did—and she managed to do laundry without Wifi).

2. Or: have a washer that beeps.

3. Or: set a countdown kitchen timer (or a timer on my phone) that will beep if my washer does not.

There are complicated situations in life: doing laundry is not one of them.


What does this have to do with “the cloud?” If you want to make a washing machine robustly notify its user that it’s done, surely a message sent over the local network or even Bluetooth is a better start. Anything involving the Internet is only useful when the user is outside the house, and there are more robust solutions to that than a server in us-east-1 that you hope the manufacturer keeps paying for.


I can't tell if this comment is sarcastic but maybe washing doesn't need to be hyperoptimised down to the instant the machine finished


First I thought you were joking, then got hit by the disbelief of realizing you were not...


Nah, you're good. I was joking.


Wait. They aren't being sarcastic?

In all seriousness, I think never has there been a better time to educate people on the fundamental philosophy of computing freedom, and I usually start with Eben Moglen and RMS's talks with people.

I don't know how much of this is generational, or how much of this is corporate sell out, or maybe even sockpuppetry for consensus cracking and other psyop techniques, but relearning the lessons of early computing (such as being able to do things offline, locally, as a core part of a functioning decentralized system), seems highly in order.


I'm really hoping your comment is sarcastic.

If it is serious, you could always set a timer.


BTLE exists and is good and cheap.


This could have been fixed by having a minimal baseline of machines not running the same software

Resilience comes from diversity, in computing and in biology. That could mean having critical workloads on multiple cloud providers, or having one user interface on Windows on network A (Arista) with CrowdStrike and another on a Mac on network B (Cisco) with SentinelOne.

Sometimes perhaps you can't eliminate a single point of failure, but you can sure reduce them to a minimum.

Or you can choose to increase next year's bottom line, and thus your bonus, by not having a robust DR plan or system. You can also skip boring things like RAID and backups.

The trick for a CxO is to ensure that when failure happens, it's massive and widespread. Then it's not your fault. The CxOs in a given industry won't be fired when their DR plans fail, because they believed Gartner and all their CxO chums at competitors did the same thing.

Nobody got fired for choosing IBM/Microsoft/Cisco/Crowdstrike/Azure, even if it's worse than the alternatives. People do get fired for bucking the trend even when it's measurably more reliable.


The update affected fewer than 1% of all Windows machines. [1] Although it may have been the biggest software failure in history, it was far from the biggest possible one. The level of cloud connectivity in the world could basically break the world if we didn't have diversity.

[1] https://blogs.microsoft.com/blog/2024/07/20/helping-our-cust...


Diversity increases your attack surface, however. What you want instead is redundancy and easy deployment or rollback of your clients and servers.


Diversity means a successful attack will take out part of your operation.

Monoculture means a successful attack will take out all of your operation.


That is not a good model.

Cyber attacks rarely take down stuff directly. Rather, attackers will first establish a bridgehead in your organization, then inspect the network and gather data for further (phishing) attacks.

Diversity only means more opportunities to establish bridgeheads.


let's not throw the baby out with the bathwater.

native desktop apps are absolutely necessary for most professional / serious work and native desktop apps need offline support too.

with cloud - your risk factor goes up massively.

the risk here is that most of these companies are reliant on windows and, of course, the snake-oil salesmen of antivirus tools.

if you have a proper native desktop app, that runs in a sandboxed environment then you simply wouldn't need crowdstrike and the likes.

unikernels / bsd jails are things that have been well known and will easily mitigate "security" issues.

even windows these days has sandbox mode.

but incentives rule the world.


I'm not sure I follow, I doubt the web vs native implementation of an application makes much difference when the terminal used to access it is unavailable. A cloud based web-app is not much help if no one has a working computer and browser.

I'm not sure we're quite at the stage where a check-in agent using their personal un-managed devices to handle passenger data via a web-app is a great idea.


It does make a difference, because now you can give end-users iPads or Chromebooks which don't need all this "security" BS.


They might not need them, but I'd be surprised if at least some companies don't install security BS on them anyway (just like they do on Linux machines), because of compliance reasons. It can't hurt, can it? (at least that was what most IT departments thought before CrowdStrike)


Try making a graph in excel online and then come back to tell us everything needs to move to the cloud asap.


Ok, just did. It went just about as smoothly as the desktop client. What's the hold up again?


Hmmm, my experience is vastly different. I wanted to make a graph using 5 cols of data: the first col is x labels, then data for 4 lines. The cols are not next to each other in the sheet. Then add linear fits through those. Then give the original lines and the fitted lines custom colors (specific HTML colors? whoops, impossible) and line types. It's possible, but the UI is terrible. Changing line type is simply bugged half of the time.


Counterpoints:

- Latency

- Security

- Legal obligations

- Offline work

- Managing the different sources of locking.

- Avoiding a single point of failure (I get the irony).


> training people to use an iPad or a web browser seems too complicated

iPads aren't designed to be turned into kiosks or airport departure displays and web browsers aren't operating systems (except maybe ChromeOS). So this advice boils down to don't run Windows, but CrowdStrike has caused outages of Linux as well.


By the way, ChromeOS is a perfect fit for digital signage and kiosks. It's officially supported.


It blows my mind how many people actually believed the claim -- clearly in the obvious-joke category -- that SWA is running their mission critical flight systems on Windows 3.1. (Yes, Southwest runs a lot of old tech in their stack, but that claim is patently hyperbolic.)

People need to stop believing everything they read on the Internet and have a little bit of skepticism.


It's insane to me that CrowdStrike's stock is still up 66% year-over-year.

With all of the angry customers, lots of incoming lawsuits, and the fact that their "protection" is provably more costly than no protection at all now - I can't imagine why investors aren't dumping it like mad.


My guesses:

- no one really cancels their security vendors, since security budgets don't shrink

- they have a big moat, so their customers won't be able to leave them


You don’t drop them until budget renewals, at least not for this. SolarWinds comes to mind as a company in a similar situation.


I am confused why they are around to begin with.

Companies already trust Microsoft, they buy Windows, Office, Azure.

Why would they bother with a 3rd party here when the low-effort, low-risk solution is to pick the tool made by the OS vendor, i.e. Windows Defender?

It should be a "nobody gets fired for picking IBM" situation. How did this random company get so much credibility that people trust them over the manufacturer?


Because they provide far more protection than Windows defender. You can write your own custom never-before-seen malware, and CrowdStrike will detect it purely based on behavioral signals. Windows Defender is still largely an antivirus solution.


Microsoft's E5 offerings, which are a lot more than just Windows Defender on endpoints, are a direct competitor to CrowdStrike's threat response products. I'd imagine many of CrowdStrike's customers will be looking to move this to MS's tools instead as a result of this.


crowdstrike has an oracle-style enterprise sales model. have you ever been to one of their events?


1. Compliance. No protection at all isn’t a contractual option in many cases.

2. Companies react slowly. When has a vendor paid a high price for failure? Boeing can kill people and fail time after time, and still sell planes.

Catastrophe always changes less than anticipated.


CrowdStrike makes it easy to pass your security audit. That's where the value is.


Good news then! You can short it and make a ton of money if you're confident this share price increase is a mistake.


I replied upthread; I think the stock price has bottomed out from this event. It's way too hard to switch vendors like this at an enterprise scale. What's going to happen is the CrowdStrike account reps are going to get yelled at and abused, some discounts are going to be offered on annual renewals, then two years from now all will be forgiven/forgotten. In a year or so the stock price will recover and trend to more or less where it was before this event. I've already bought as much stock as I could.


> It's way too hard to switch vendors like this at an enterprise scale.

Huh? Once you get the control plane/backend of a new AV vendor configured, you just uninstall AppA and deploy AppB on your nodes.

It's not like Crowdstrike is deeply integrated with other systems: it's an agent.


This is massively underselling the kind of change management processes and potential challenges of scale a deployment like this would require at large enterprises. It's never as simple as "deploy app to nodes". Approvals, maintenance windows, deployment in waves (ironic, I know, given the nature of the outage in the first place). Most places I've worked would require deployment to many sets of lower-environment machines with different functions first, then allow time to "bake" and ensure no issues crop up after things have settled. You would NEVER just yeet out a new agent to critical production systems without extensive change management, testing, and validation. I've deployed different AV products multiple times throughout my career (including CrowdStrike). It was never simple, and it almost always took months to complete.
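For anyone who hasn't lived through one of these rollouts, the wave-and-bake part can be sketched roughly like this. It's a toy Python illustration, not any vendor's tooling: `Wave`, `rollout`, and the `deploy`/`health_check`/`wait` callbacks are hypothetical stand-ins, and the approvals and maintenance windows would wrap each step rather than live inside the loop.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Wave:
    """One deployment wave: a named group of hosts plus its bake time."""
    name: str
    hosts: list[str]
    bake_hours: int  # hypothetical: how long to let the wave settle


def rollout(
    waves: list[Wave],
    deploy: Callable[[str], None],
    health_check: Callable[[str], bool],
    wait: Callable[[int], None],
) -> list[str]:
    """Deploy wave by wave, halting at the first wave with an unhealthy host.

    Returns the names of waves that were deployed and passed their bake period.
    """
    completed: list[str] = []
    for wave in waves:
        for host in wave.hosts:
            deploy(host)
        wait(wave.bake_hours)  # let the change "bake" before judging it
        unhealthy = [h for h in wave.hosts if not health_check(h)]
        if unhealthy:
            # Stop here: later (usually more critical) waves never get the change.
            print(f"halting at {wave.name}: unhealthy hosts {unhealthy}")
            break
        completed.append(wave.name)
    return completed


if __name__ == "__main__":
    waves = [
        Wave("lab", ["test-01", "test-02"], bake_hours=24),
        Wave("back-office", ["hr-01", "fin-01"], bake_hours=48),
        Wave("airport-kiosks", ["kiosk-01", "kiosk-02"], bake_hours=72),
    ]
    deployed: list[str] = []
    done = rollout(
        waves,
        deploy=deployed.append,
        health_check=lambda host: host != "fin-01",  # pretend one box fails
        wait=lambda hours: None,  # skip the real waiting in this demo
    )
    print(done)  # ['lab']; the kiosks were never touched
```

The key line is the `break`: a failing wave stops the rollout before it reaches the later, more critical waves.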


What this also tells me is there are a lot of computers connected to the internet that probably shouldn’t be.


Hmm. I think this is a pretty shallow take.

My experience from the airline industry is that the vast majority of systems classified as flight safety critical are not connected to the internet, or large networks at all. Which is good.

But unless we want to drastically change how airlines operate, the rest need to be online.

Today, you can purchase a ticket (or rebook an existing one) on your phone really close to the departure time. When that happens, a gazillion interconnected systems, across legal entity borders, need to cooperate to take you (and your luggage) to the destination.

To put all of this in a large non-internet network seems pretty pointless.

If we wanna go down that route, the only real "security improvement" I can think of is to dismantle the digital systems and go back to paper, like Ryanair did during this incident: handwritten boarding passes, verified against print-outs of passenger manifests.


> the vast majority ... are not connected to the internet

But those couldn't have gone down due to Crowdstrike.


It's an interesting thought experiment to consider everything that would have to go into running operations for a business like one of the largest airlines in the world using a non-Internet connected network. Among other things, you probably lose the ability for your employees who aren't physically in the office (which is kind of a lot of them if you're running an airline) to interact with your operations network. If you're an airline trying to schedule employees and share information with them while they're in hotel rooms, that's probably a deal breaker.

That's really only a secondary problem though, because disconnecting a network from the Internet isn't a replacement for security software or software updates, so you wouldn't even avoid the root cause of the issue here. I'm not saying CrowdStrike is essential software for Internet connected computers either, but if your business thinks it is, you should probably be running it on your "airgapped" computers too. And you should definitely be installing updates, so you can still fall victim to a bad update regardless of which software you run. At best you perhaps increase the likelihood of hearing about a problem with an update before you deploy it on disconnected computers, but you can get a similar effect by delaying deployment of updates even on Internet connected networks.


The related question is "How do you run your business out of downtown Fort Worth (American Airlines) and get your updates to 350 airports in 60 countries?"

Saying "run your own network" isn't exactly practical. Even imagining the very small airlines (that partner with big ones for the last leg) that only service a handful of rural airports, this doesn't seem practical.

The days of point-to-point updates over modems are more or less over, given the amount of data that needs to be consistently transmitted and available.

I can imagine a modem at each airport and a phone bank of about 700 modems that are each getting or sending updates. The long distance calls to distant countries for that data could get expensive. Woe to the power outage in Texas that takes down the phone bank for a day or two or three or four.

Alternatively, there's a system that was developed to do this, and it works pretty well most of the time. Combine this with having redundant systems that are geographically separated. It isn't turnkey, but it's probably better than other options that would involve home-grown solutions.


Air-gapped networks have gotten scarcer in the age of cloud computing. Except for certain DoD and cleared spaces, I've even seen PLC networks connected to the internet....


So Web 2.0 was a mistake, I take it?

The problem once again is humans. Humans need to interact with systems: either receive information from them, give it to them, or use the systems to process it. And for efficiency, nowadays that generally happens online. It could be offline, but that would be slow. Or it could be on segregated networks, but that would get really expensive. Imagine having your own fiber line for your instant messaging and email, with a different terminal...

In the end, most of these affected computers are on the Internet for very good reasons. And this model really does work the vast majority of the time, generating a lot of efficiency.


I suspect it is incredibly challenging to keep a non-trivial number of computers available but somewhat airgapped. There are too many ways to unintentionally bridge the networks without extreme diligence, which is slightly incompatible with something like airlines, which employ enormous numbers of people.


These are not small time operations (most of them). They are multi billion dollar companies with complex technical needs. This is doable and the bread and butter of good networking engineers.

Having not done this just cost them billions. Now imagine the US at war or another nation state that just wants to cause havoc.


That's ridiculous. The cost of building air-gapped operations systems, including the resulting loss of productivity and efficiency, would be worse than just accepting an occasional outage. People who lack technical competence and operational experience always tend to overreact to software failures.

The major airlines may not be "small time operations" but none of them are even in the top 100 US corporations by market cap. They simply don't have the level of IT resources or competence that we see at major tech companies.


It doesn’t need to be strictly air gapped, but there’s nothing technologically demanding about a computer used to check people in on a flight, it could be done over a 9600 baud modem and a thin client.


Nothing technologically demanding about global logistics software, being run by an industry with 0% profit margins (give or take, airlines don’t make much money), being asked to cripple itself 24x7 to avoid one weird outage every several years?


Crowdstrike is a trusted application on every computer in these people's farms. It is not uncommon to have specific rules for these packages to be downloaded directly from the Internet.

You're suggesting either that Crowdstrike itself will get used as a vessel for an attack, or that banks and airlines have firewall rules open for enemies.


Exactly, most of these systems probably wouldn't require EDR software if networking was done correctly in the first place.


Sounds like we saved a lot of tons of CO2.


Hard to say; it could actually have increased emissions. When the timing of things doesn't align, resource usage commonly goes up, e.g. people travelling less optimal routes, extra commutes back and forth to the airport, people having to physically travel to datacentres to fix things. Even needlessly rebooting a machine uses more CPU.


Or planes taking less efficient routes or flying at faster speeds to “make up time”.


Always look on the bright side!


In my country, companies are required by law to keep track of their CO2 emissions.

In the case of CrowdStrike, they would have been able to deduct this event from their emissions for decades.


> Apparently Southwest Airlines’ ingenious strategy of never upgrading from Windows 3.1 allowed it to remain unscathed.

this is pretty damning both ways

on the one hand, it's insane, unfathomable and inconceivable that anyone can run anything critical on windows 3.1 (!!!)

on the other hand, it's equally insane, unfathomable and inconceivable that those who do are actually better off - 30 years of "progress" is actually just bs? what are we as an industry "even doing here"???? is computing actually a solved problem and we're really just mostly reinventing the wheel and enshittifying perfectly already working systems?


From my memory, this wildly circulating Windows 3.1 quote is inaccurate. The software they were running was compared to running something like Windows 3.1, but it wasn't actually running on Windows 3.1, as far as I understand.

Edit: https://kotaku.com/southwest-airlines-windows-3-1-blue-scree...


Obviously Southwest is not using Windows 3.1, and you should probably be thinking hard about the trustworthiness of any outlet or article that repeats that claim: https://www.osnews.com/story/140301/no-southwest-airlines-is...


Thirty years of progress is still progress for managing the additional complexity that modern software needs.

A lot of software doesn't need that additional complexity. Having thirty-year-old software, if properly sequestered (since there are security holes large enough to fly a 737 through), means that this is software that has been working for three decades. It has issues (as their previous mess showed), but Southwest appears to be able to manage that to some degree without incurring the additional complexity of managing a modern software stack for application software that doesn't need it.

The ability to play minesweeper on critical computing equipment without impacting it isn't necessarily a desirable feature. Having the computer boot in five seconds and run the desired application is.

And there are a number of ways to handle that ... running old operating systems is one of them. Space Force S02E07 is not a desirable situation https://youtu.be/xDLvUqhwHZc . You could also have a Kubernetes cluster with multiple replicas and load balancing and all of that additional complexity, which takes more people to manage without any real gains in what the application itself is doing.


> Having the computer boot in five seconds

There are certainly more modern options that allow that and it's highly doubtful that specifically is particularly relevant for Southwest.

Not that there is any evidence that they're actually using 3.1 for anything?

> that this is software that has been working for three decades

Or it's so buggy or designed (or more likely updated) so poorly that everyone is afraid to touch it. e.g. I doubt there are many (even any?) practical reasons for airlines to use GDS besides the cost and complexity involved in designing an entirely new system and somehow forcing all other airlines to switch to it?


> Not that there is any evidence that they're actually using 3.1 for anything?

Windows 3.1? No. I'd even say there's no evidence that windows 95 is being used but rather that they've got what appears to be some old software with older design.

https://www.dallasnews.com/business/airlines/2022/12/30/what...

> 2. The crew scheduling system is the main culprit.

> Southwest uses internally built and maintained systems called SkySolver and Crew Web Access for pilots and flight attendants. They can sign on to those systems to pick flights and then make changes when flights are canceled or delayed or when there is an illness.

> “Southwest has generated systems internally themselves instead of using more standard programs that others have used,” Montgomery said. “Some systems even look historic like they were designed on Windows 95.”

Screen shots of this are in http://www3.alpa.org/LinkClick.aspx?fileticket=IO7kd%2Bfm2Do...

Unfortunately, I don't know the nuances of Microsoft Windows UI well enough to be able to pick out which OS version is running the software in those screen shots.

---

> Or it's so buggy or designed (or more likely updated) so poorly that everyone is afraid to touch it.

That is a very common occurrence (I'm dealing with that now ... a .jar file that hasn't been rebuilt in 15 years). The big rewrite is something that comes with one part excitement (I can do it right this time!) and dread (oh my, that's how much code that I need to retest?!).

I was involved in the tail end of a 3 year project at one company that replaced software previously running on DOS (and yes, it was DOS: they had a C and assembly guru employed whose job it was to remove / optimize code in the binary to get it to fit into 640k) with Java Web Start (which was a neat technology), and with the millions of lines of code that monstrosity had and that needed to be debugged and fixed.

While they're in a better spot now (they can use modern hardware), and it's something they can build in house (a major reason to do it was to drop the external contractor who didn't like maintaining the C code), that also came with the added complexity of the software they licensed and the maintenance and deployment of that software. Before, they could put the software on a floppy and have it shipped to each location ... now it's a bit bigger and more complex of a deployment (one that was built with duct tape and chewing gum one night to do diff deployments of specific class files rather than trying to push the entirety down the pipe for each location).
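The "diff deployment" trick, shipping only the class files that changed instead of pushing the whole application to every location, mostly comes down to comparing content hashes. A minimal Python sketch, assuming each location can report a manifest of the files it currently has (the manifest format and the `manifest`/`diff_to_push` names are hypothetical, not what that team actually built):

```python
import hashlib


def manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each relative path to the SHA-256 hex digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def diff_to_push(new_build: dict[str, str], at_location: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (files to send, files to delete) so a location matches the new build."""
    # Anything new, or whose hash differs from what the location has, must be shipped.
    to_send = sorted(p for p, digest in new_build.items() if at_location.get(p) != digest)
    # Anything the location has that the new build dropped should be removed.
    to_delete = sorted(p for p in at_location if p not in new_build)
    return to_send, to_delete


if __name__ == "__main__":
    old = manifest({"com/app/Main.class": b"v1", "com/app/Util.class": b"u1", "com/app/Old.class": b"x"})
    new = manifest({"com/app/Main.class": b"v2", "com/app/Util.class": b"u1"})
    print(diff_to_push(new, old))
    # (['com/app/Main.class'], ['com/app/Old.class'])
```

Only the changed `Main.class` goes down the pipe; the unchanged `Util.class` stays put, which is the whole point when each location sits on a slow link.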

My rambling point is that we are moving forward with complexity - and that allows us to manage more complex situations, but it comes at the cost of managing that additional complexity of the infrastructure and software it needs and that cost is ongoing and not always taken into account.


Those screenshots look more like WinXP to me with the rounded and shaded button elements. It’s the boring grey and buttons people presumably associate with the 90s.


Most of the time it's less about a system being poorly designed and more about it being able to solve very hard problems that most existing employees today haven't even heard of. Institutional knowledge plays a big role in these decisions.


It was a "troll tweet" says the guy who started the rumor:

https://x.com/ArtemR/status/1815408553131426179


You don't want to know what OS Sabre (backend for 30% of world's airlines) is using on their mainframes.


I do actually. My last job had a mainframe team maintaining (and adding to) an AS/400 application. They still had punchcard programs.

They had json apis. Each one had some variation on parsing http from a raw tcp connection with IBM RPG. I had to do some unspeakable things to a ruby library so I could control the order of the headers.


Looks like it was IBM System/360 mainframes, but they've recently migrated to Google-hosted services.


actually, I do :) is it DOS? some IBM mainframe OS? do tell


Was DOS actually ever used on mainframes/servers on a significant scale? (genuine question, not saying it wasn't)


A mainframe OS called DOS was in fact quite popular, but it’s not the same thing as the DOS that was in PCs. (There were others, too, like Apple ][ DOS. As soon as your computer gets the capability of attaching a disk drive, somebody has to write a Disk Operating System.)


Well, what actual new features does Windows 11 give you compared to Windows 3.1?

It will support a huge number of new chips, new peripherals, more memory, and so on.

If I'm running Southwest's crew scheduling software, how much of that do I care about? Do I care that it will now support the latest Bluetooth? Do I care that it now has the same UI as tablets? Do I care that it has better ads to display on the start menu? No, no, and no.

The only thing might be more memory. (I mean, the UI might not look like it belonged in the Stone Age, so that's something, I guess...)

There hasn't been a real fundamental improvement in the functionality of OSes since Windows 3.1. It's all been device support (including new classes of devices), new CPU support, and new UI styles. (The security improvements in Windows were a legitimately big deal, but those were fixing what was broken, not adding new functionality.)

And I'm sure that, having said this, someone is going to point out something really important that I forgot...


> If I'm running Southwest's crew scheduling software, how much of that do I care about?

Not a lot, if you can't or don't want to upgrade or replace that software so it runs on modern OSes. I'm sure that for the most part that software is causing various unnecessary issues and decreasing potential productivity at least to some extent. Just look at GDS: airlines would love to get rid of it, but that would require a coordinated effort and extensive collaboration between all major airlines, which is somewhat tricky.

> There hasn't been a real fundamental improvement in the functionality of OSes since Windows 3.1.

Multitasking? A massive amount of other important features that matter if you want to build new software or significantly improve what you're using now.

Also, you seem to be downplaying security a bit too much? Those devices would need to be carefully isolated from everything else (not that, as I understand it, there is any evidence that Southwest Airlines is actually using 3.1).

> but those were fixing what was broken, not adding new functionality

It's like saying that every new feature introduced in any type of software that wasn't entirely novel was actually fixing something that was broken and wasn't "new". I guess the same would apply to claiming the GUI/desktop wasn't something new but was just "fixing" (inherently "broken") command-line interfaces?

Being able to design significantly, objectively better software (based on how much it could increase productivity) is, I guess, not strictly tied to the OS, at least in some cases. But a modern OS certainly makes it a lot cheaper/easier.


Multitasking. Yeah, I'll give you that. That is actually a huge step up.

I'm presuming that Southwest's internal software backends are not internet exposed, which is why I'm downplaying security.


Well all you'd really need to do to avoid this outage is not run auto-updating proprietary kernel modules in the early anti-malware environment. Bare Windows 11 would have been fine - the problem was Crowdstrike.


> is computing actually a solved problem and we're really just mostly reinventing the wheel and enshittifying perfectly already working systems?

On a whim I tried playing Solitaire on windows the other day. You know, that game that’s shipped with windows since forever. Well, it’s horrible now. When I tried firing it up, it first spent several minutes downloading software updates. Then it loaded in some horrible “casual games bundle” app which felt laggy like a web app - complete with Xbox cloud sync for my progress, and daily achievements and other junk.

The game used to run flawlessly on my old 486. My computer now is orders of magnitude more powerful - but solitaire feels laggy. I bet the entirety of windows XP is smaller than the “update” it performed to install solitaire.

I have a personal theory that there’s always something that gets the attention of the best engineers. Decades ago it was human interface guidelines and UI toolkits. Today it’s LLMs and AAA game engines. Most of the rest of the software in the world is worked on by the B team. And they don’t blink an eye at the idea of rewriting solitaire for windows on top of electron. If JavaScript is all their team knows, so be it. Heaven forbid we have to learn how to properly build software for windows.


>Heaven forbid we have to learn how to properly build software for windows

They already are properly building software for Windows, and your Solitaire app is a good example of it. It's much, much better than it used to be: it's laggy and slow, and downloads a bunch of extra crap, which has the potential of getting you to spend more money. In short, changing from the old and simple Solitaire to this bloated mess makes Microsoft more profitable, and customers like you keep using Windows regardless, so why shouldn't they do this?

Meanwhile, on my Linux box, my simpler games haven't changed substantially in years, if not decades (perhaps updates for the newer GUI toolkits being used), and still work great. But people prefer to stick with Windows and then complain about things being too bloated.


> which has the potential of getting you to spend more money.

I don't think any of the extra stuff is monetised. I don't know; I didn't look closely at it.

> customers like you keep using Windows regardless ... But people prefer to stick with Windows and then complain about things being too bloated.

"Stick with windows"? I run windows, macos, linux and freebsd at home. I want all of them to be better. It pains me to see any of the great operating systems of our time fall slowly toward mediocrity.

Of course I complain about it. Windows should be better than this.


>I don't think any of the extra stuff is monetised. I don't know; I didn't look closely at it.

If they weren't making money out of those changes somehow, they likely wouldn't be dedicating resources to making them.

>Of course I complain about it. Windows should be better than this.

Why should it? It's not yours, it's Microsoft's, and it exists solely so their shareholders can extract money from customers. It's not there to be whatever you think an OS should be. They're doing things to it that they believe will improve their profits, and that's all Windows needs. As long as MS makes changes to Windows that improve profits, that by definition makes Windows "better".


It’s obviously not that simple.

Microsoft makes more money when the broader community wants to use their products. Users are stakeholders.

You’re right in part - Microsoft’s incentives aren’t perfectly aligned with mine. But they aren’t in opposition either. We’re both better off when Microsoft makes their products better for me such that I give them more money. Microsoft will try things on - of course! But if the broad community hates what they’re doing, they will stop doing it. They must, if they want to remain in business.

Which is a round about way of saying, yes of course windows isn’t “mine”. But as I’m a paying customer, my opinion still matters - or at least, the opinion of their customers in aggregate.

When you’re a customer in a situation like this, you have at least 2 choices: Voice and Exit. In this case, exit means dumping windows. And voice - at least how I’m exercising it - means making a bit of a stink about it in a forum like HN where microsoft engineers sometimes visit.

If you don’t think this is the forum for that, feel free to downvote my comments. But in the meantime, it seems mighty strange of you to try and convince me that my opinions don’t matter and I should… what exactly? Be quiet and take it instead? Provide no feedback and quietly migrate to another equally imperfect ecosystem?

I’m not sure how I would be helped by being smaller in the world.


I think your analysis is highly flawed.

>Microsoft makes more money when the broader community wants to use their products. Users are stakeholders.

>But if the broad community hates what they’re doing, they will stop doing it. They must, if they want to remain in business.

This mostly isn't true. MS customers are going to use Windows no matter what, unless MS somehow makes it so completely unusable they're forced to abandon it. So they can make whatever annoying changes they want, and their customers aren't going anywhere.

However, this isn't true of everything MS makes, so you have to be careful not to conflate Windows with anything else. If people hate Teams enough, they might be convinced to switch to Zoom or whatever. This just isn't the case with Windows. They're free to piss people off as much as they want here.

>But as I’m a paying customer, my opinion still matters - or at least, the opinion of their customers in aggregate.

No, it really doesn't. You aren't paying for a Windows subscription, you paid for a Windows license. You probably paid it when you bought your PC. Or if you're a big company, you probably have some sort of site license, because you made the decision to use MS products across your organization. And if you're BigCo, you probably don't have Solitaire installed on employee computers anyway. So MS making the solitaire game slow and shitty isn't going to change the amount of money they get from you: they already got your money when you bought your PC. Even worse, you probably didn't have much choice: there aren't a lot of no-OS computers unless you assembled yours from parts, and those that exist (or come with Linux pre-installed) usually cost more, because MS isn't making money from you so much, but from kickbacks from all the crapware that's pre-installed.

>...a forum like HN where microsoft engineers sometimes visit.

What makes you think MS engineers have any sway at all over the user experience? That stuff is decided by upper management, product managers, etc. And they don't care about you; they only care about the company's profitability. Adding more crapware and ads into Windows makes it more profitable, so that's what makes sense for them to do. Making a bloat-free, high-performance OS without any annoying ads or other crap doesn't make them more money, because you and everyone else are going to keep using Windows (and buying new computers with it pre-installed) no matter what.

>it seems mighty strange of you to try and convince me that my opinions don’t matter and I should… what exactly? Be quiet and take it instead?

Basically yes: you're wasting your keystrokes complaining about Windows enshittification, because your opinion really doesn't matter. It's not like MacOS, where someone has to actively want to use a Mac to go buy one, or Linux where you not only have to actively be willing to buck the trend, but also choose which of dozens of distros you want to use. With Windows, it's just the default choice for 90% of users, and they're not going to switch to something else, which MS has found out over the course of decades now.

>Provide no feedback and quietly migrate to another equally imperfect ecosystem?

Other ecosystems are far more likely to take your complaints seriously and to worry about your user experience.


I hear what you're saying; I just think your worldview is too cynical. The world isn't full of middle managers micromanaging engineers and laughing all the way to their overflowing bank accounts. People want money - sure. But people also want to do good work. If the world was really as dismal as you think it is, Microsoft would destaff the entire Windows team and just let the ecosystem rot while raking in their locked-in revenue.

That's not what happens.

Windows is worked on constantly. And most of the changes are clearly done with the intent of making the product better. Things like IO completion ports. C# and the .NET ecosystem. Support for big/little (P/E) cores in the kernel. Edge replacing IE. ARM support. And so on. Of course like any big company, they make plenty of bad choices - like telemetry, the "watch everything you do" AI assistant, and so on. But it's too simple to paint any large company with the "evil megacorp" paintbrush. Microsoft has 200 000 employees. I'm sure some of them are exactly who you think they are. But some - I suspect most - of them really want to do a good job and make products their customers want to use.

Just like any big company.

> Other ecosystems are far more likely to take your complaints seriously and to worry about your user experience.

Are they though?

Parts of Apple are great. And other parts are obviously horrible, extractive and greedy. Did you know companies can't use the NFC chip in Apple phones - in my phone, which I've already paid for - without paying Apple millions of dollars for the privilege? I'd love it if I could pay for public transit in my city using my phone, which I own, but my government doesn't want to pay the extortionate price Apple is charging to bless them with that capability. Horrible.

Companies aren't good guys or bad guys. They're just big groups of people. And people are complex, and they have a wide variety of incentives and drives. Calling companies out for bad behaviour and sloppy work is important - even though it often has no effect. "Killed by google" is infamous inside google. Microsoft rolled back their AI assistant thing after the bad press.

If you're looking for great products made by companies who are ethically spotless, well, there aren't a lot of options. Linux on the desktop is pretty decent these days. But I still use Windows for gaming. And I'm typing this on a Mac.


>The world isn't full of middle managers micromanaging engineers and laughing all the way to their overflowing bank accounts.

Now this isn't what I think either. It's the top executives mainly that drive this stuff, since they set the direction for the company. Middle managers are just pawns, except for the ones who are empire-builders trying to work their way up. Also, I'm not claiming that all companies are this bad, but I think MS is a unique company, and Windows a unique product, because of its monopoly status. Other companies really can't afford to piss off their customers too much because they'll just jump ship. MS doesn't have this problem with Windows. Remember how paranoid they were about Linux back in the early 00s? Now they don't seem to care at all, and I think it's because they figured out they had nothing to be worried about after all: almost no one was going to stop buying or using Windows (or even if they did stop using it, they still bought a license with their PC anyway; the addition of pre-loaded crapware also changed the economics here so they weren't worried about actual license fees from individuals).

>If the world was really as dismal as you think it is, microsoft would destaff the entire windows team and just let the ecosystem rot while reaping in their locked in revenue.

Of course, they can't just go to this extreme. They'd keep taking in money, but things would go down eventually, and they wouldn't grow revenues either, and remember, Wall Street wants never-ending growth. And as you point out, the ecosystem is important (as Apple has shown); they make a lot of money from all that other stuff too. Windows itself might not even be that much of a money-maker these days.

>Of course like any big company, they make plenty of bad choices - like telemetry, the "watch everything you do" AI assistant, and so on.

MS can afford to do a lot more bad stuff like this, because Windows users aren't going anywhere. It's why they can bake ads into the OS.

>I'm sure some of them are exactly who you think they are. But some - I suspect most - of them really want to do a good job and make products their customers want to use.

I'm sure there are some of them too, but they're not pulling the strings. They do get to work on cool stuff now and then of course, like the interesting kernel features you listed, but those are in support of their long-term goals (like being prepared for a post-x86 world).

>Did you know companies can't use the NFC chip in Apple phones - in my phone, which I've already paid for - without being paid millions of dollars for the privilege? I'd love it if I could pay for public transit in my city using my phone, which I own, but my government doesn't want to pay the extortionate price apple is charging

No, I didn't know that. It doesn't surprise me though; Apple was like this with FireWire too, and it's why FireWire is dead now. But that's weird too, because here in Japan Apple users routinely use their phones for public transit, and I kinda doubt the IC card companies paid that much, though I guess it's possible. (Android users use their phones too, but only if their phones are Japanese models. iPhones, however, all have the FeliCa NFC chip needed for Japanese transit systems.)


FWIW, that newer Windows Solitaire app isn't actually using Electron now; it's a UWP app using XAML. I guess that just goes to show that the problem isn't Electron specifically.


In the long run, the hardware that can still run Windows 3.1 will become harder and harder to find and they'll be forced to upgrade, but for now they're enjoying the benefits of "if it ain't broke don't fix it". Plus, there were literally millions of systems made that can run Windows 3.1 so it will be many many years before the hardware is too hard to find.

We're talking about a problem on the scale of 4,000 flights per day. Assuming you avoid O(n^2) computations, that's the sort of thing even 90s computers could handle easily.
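For a sense of scale, a rough back-of-the-envelope sketch. The two operation-count models below (an all-pairs pass vs. a sort-style n log n pass) and the function names are illustrative assumptions, not a description of how any airline's crew-scheduling software actually works.

```python
import math

FLIGHTS_PER_DAY = 4_000  # figure quoted in the comment above

def pairwise_ops(n: int) -> int:
    """Rough op count for an O(n^2) approach, e.g. comparing every flight with every other one."""
    return n * n

def n_log_n_ops(n: int) -> int:
    """Rough op count for an O(n log n) approach, e.g. sorting flights by departure time."""
    return math.ceil(n * math.log2(n))

if __name__ == "__main__":
    print(f"O(n^2):     {pairwise_ops(FLIGHTS_PER_DAY):,}")
    print(f"O(n log n): {n_log_n_ops(FLIGHTS_PER_DAY):,}")
```

The quadratic count comes out to 16 million, versus under 50,000 for the n log n count, so avoiding the all-pairs blow-up keeps the work to a few hundred times less.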


>> Plus, there were literally millions of systems made that can run Windows 3.1 so it will be many many years before the hardware is too hard to find.

Two of the first contractor roles I had as a developer really opened my eyes to a lot of this.

We were building an inventory management system for a large company that built farm equipment. We started building it, and once we got to the browser and mobile requirements, one of the VPs spoke up and asked if it would run on IE6, since they had not one but THREE legacy inventory systems that still ran on Win98. This was in 2014, a full 6 years after many companies had stopped supporting it, and 4 years after websites had stopped supporting it.

The other one was for a very large regional construction company. Same thing: we were building a web app for them, and in one of the conference calls, one of the VPs asked how it would run on Windows 95, for the same reason. They had several legacy ERP systems that were running on Win95 and had specific requirements for stuff to run on that OS.

As a developer who was used to working with somewhat current tech - it was a real eye opener. It was crazy to think how many massive companies just didn't have the constitution to upgrade their stuff, and then by not doing so, had now dug themselves into an even deeper hole.

Once I started hearing stories about the details of this scenario, it made perfect sense to me, since I had seen it multiple times. And not from little companies who didn't have the money or resources to upgrade, but massive Fortune 500 companies who just neglected their stuff until it was too late.


You can basically run Windows 3.1 in dosbox on a potato now, so the hardware really isn't even a problem. If any of this was actually true...


Productivity-wise, the 70s and 80s peaked with those thin terminals with every action carried out by keyboard... workers tapped at light speed due to muscle memory. It didn't look fancy, but it got the job done. GUI is sexy but, like short videos, ultimately did nothing for the user.


> productivity wise 70's and 80's peaked with those thin terminals with every action carried out by keyboard

Computers only started showing up in nationwide productivity figures in the 80s to 90s.


> GUI is sexy but like short videos ultimately did nothing for the user

That's a stretch.

I guess it increased the productivity expectations (measured in amount of "work" done, not necessarily something useful) for most workers, which effectively did nothing for them, because they still need to work as much even if modern software allows them to accomplish much more in the same amount of time. So it might make sense in that regard.

> workers tapped at light speed due to muscle memory

That's great if we're mainly talking about robotic tasks that could mostly be automated down to a fraction of those keystrokes, but weren't, for some unclear reason.


Almost every industrial embedded system I've ever used runs Windows XP at the absolute newest, and it is not uncommon in the slightest to see stuff as old as 95/3.1. These are computers that operate machinery that costs 6 or 7 figures. If it ain't broke, don't fix it.

Don't get me wrong, my Macbook is an absolute beast for all of my work tasks, and my gaming PC is an utter joy to use for my recreation time, but at the end of the day, for a ton of applications, a computer doesn't need to do shit beyond sending a lot of signals out of a parallel/RS232 port to control systems to operate... I mean Christ, anything. CNC mills, building lighting/security systems, packing machines, or to do things like issue tickets to people parking in a ramp. Like... a lot of this stuff just does not benefit at all from a modern software stack. Stick a crappy PC inside instead, load it up with the same image it had before which includes firewall rules that shut down every last port and connection apart from whatever needs to manage it, and you're done.

Don't fix what ain't broke.


It’s like a calculator, if you need to do basic math, you can use an old calc.


> is computing actually a solved problem and we're really just mostly reinventing the wheel and enshittifying perfectly already working systems?

80% of the work is json bureaucracy

The other 80% is adapting to new requirements

And if you’re lucky maybe 0.1% of the time you get to build something new.

Fear not, a lot of this stuff was perfectly solved with pen and paper long before us computer nerds came to play in the big boy sandbox


It is BS. The notion of continuous updates for security especially so. That said, the barrier to entry for programmers did come down significantly.


> Apparently Southwest Airlines’ ingenious strategy of never upgrading from Windows 3.1 allowed it to remain unscathed.

OMFG, does this mean we need to be prepared for a (juicy) “IT failure” that brings down Southwest at some point?


Southwest experienced this kind of scheduling issue in 2021 [1], and again in 2022 [2]. Honestly, if they're running Windows 3.1 or Windows 95 as suggested, I think that puts them in a better place tech-wise than keeping up with the Joneses on the upgrade treadmill --- although they should consider updating to Windows 3.11, because they have a workgroup :P and the Microsoft Hearts network is pretty cool. But historically they have done poorly on scheduling after a significant disruption. An article from last year [3] says they updated their crew assignment software, increased staffing at colder airports and in general, and got more deicing equipment. We won't really be able to tell if it works until they experience another disruption.

[1] https://www.cnbc.com/2021/10/12/southwest-airlines-reduces-c...

[2] https://www.npr.org/2022/12/26/1145536902/southwest-flight-c...

[3] https://www.npr.org/2023/11/09/1211064462/southwest-airlines...


This isn't true, and that should have been obvious to technical people. It's so sad that we have a tech media that doesn't give a damn about making things up.


Southwest had two of these recently. It was widely reported:

https://en.wikipedia.org/wiki/2022_Southwest_Airlines_schedu...


You don't even have to wait; it's been happening.


For everyone flabbergasted by Southwest running ~Windows 3.1~ old software, I have bad news about the telecom industry. I worked at Ericsson at an R&D branch, and one of the projects in the works was to move one of the main pieces of routing equipment that handled millions of telephony operations a day away from an ancient version of Windows.

A lot of code lives on much longer than you think. The general attitude we took was that most of the code we were writing would be running for at least 30 years. And that was the attitude at an R&D branch, arguably a side of that industry where we were working on the new tech.

Edit: Win 3.1 or something else, the point still stands. There is a lot of old software running out there that will continue to run our core services. Legacy software doesn't just mean v1 versus v2, it can mean v1 versus v41.


>For everyone flabbergasted by Southwest running Windows 3.1

Southwest isn't running Windows 3.1, though. That's some rather lame, but predictable, truth-through-repeated-assertion thing on social media.

Not everyone uses CrowdStrike, and in this case SW was the lucky one that didn't.


Southwest does not use Windows 3.1. Why does no one read the article?

Southwest wasn't affected because they don't use Crowdstrike. That's it.


I did read the article. It links to https://www.techradar.com/pro/security/southwest-airlines-av... perhaps you might want to read it again?

Notably, CrowdStrike won't run on 3.1, and thus you're kinda right.


No one runs servers with Windows 3.1. They would have used Windows NT.

The really damning bit is Windows 3.1 did not have preemptive multitasking. It barely had networking. You couldn’t run a server with it if you wanted to.


> They would have used Windows NT.

Or, at the time, OS/2


Yes and that article is wrong.


Win 3.1 or not, the point still stands. There is a lot of software out there that has been running for a really long time and will continue to do so.

Relatedly, it is nice to provide a source for your claims. I did see this [0], which would have been an appropriate thing to link.

[0] https://kotaku.com/southwest-airlines-windows-3-1-blue-scree...


Dude, take the L. But for that false story being recited, you wouldn't have made that point in the first place.

> For everyone flabbergasted by Southwest running ~Windows 3.1~ old software

You said it; own it. You could have said "For everyone who was tricked by the joke that Southwest runs ~Windows 3.1~ old software" but didn't.


That is fair. I wasn't aware of the kerfuffle around whether Southwest was using 3.1 or not. I took the source and linked source at face value. It would have been nice to have had someone do more than "nope" and instead link to a reputable source. This is how you counter disinformation.


It had already been done a dozen times in the original item across multiple threads.


Good thing a lot of our banking still runs on mainframes, which will never be taken out by CrowdStrike.


I find this more surprising. Even if the Southwest & Win3.1 claims were true, I would expect most Ericsson systems to be Erlang-based and thus happily chugging along on a (perhaps ancient) Linux box.


I did too. But I knew of only 1 project that was using Erlang. I have always wanted to use it.

Instead I saw a lot of MML and (happily) TCL.

https://en.m.wikipedia.org/wiki/MML_(programming_language)


> Apparently Southwest Airlines’ ingenious strategy of never upgrading from Windows 3.1 allowed it to remain unscathed.

The “ingenious” strategy saved them from a week's worth of downtime this year. But that same “ingenious” strategy was the primary reason for their meltdown in 2022.

[1] https://www.npr.org/2022/12/30/1146377342/5-things-to-know-a...

[2] https://www.nytimes.com/2022/12/28/travel/southwest-airlines...


It’s also not true



