So CrowdStrike is deployed as third party software into the critical path of mission critical systems and then left to update itself. It's easy to blame CrowdStrike but that seems too easy on both the orgs that do this and the upstream forces that compel them to do it.
My org which does mission critical healthcare just deployed ZScaler on every computer which is now in the critical path of every computer starting up and then in the critical path of every network connection the computer makes. The risk of ZScaler being a central point of failure is not considered. But - the risk of failing the compliance checkbox it satisfies is paramount.
All over the place I'm seeing checkbox compliance being prioritised above actual real risks from how the compliance is implemented. Orgs are doing this because they are more scared of failing an audit than they are of the consequences of failure of the underlying systems the audits are supposed to be protecting. So we need to hold regulatory bodies accountable as well - when they frame regulation such that organisations are cornered into this they get to be part of the culpability here too.
> The risk of ZScaler being a central point of failure is not considered. But - the risk of failing the compliance checkbox it satisfies is paramount.
You're conflating Risk and Impact, and you're not considering the target of that Risk and that Impact.
Failing an audit:
1. Risk: high (audits happen all the time)
2. Impact to business: minimal (audits are failed all the time and then rectified)
3. Impact to manager: high (manager gets dinged for a failing audit).
Compare with failing an actual threat/intrusion:
1. Risk: low (so few companies get hacked)
2. Impact to business: extremely high
3. Impact to manager: minimal, if audits were all passed.
Now, with that perspective, how do you expect a rational person to behave?
[EDIT: as some replies pointed out, I stupidly wrote "Risk" instead of "Odds" (or "Chance"). Risk is, of course, the expected value, which is probability X impact. My post would make a lot more sense if you mentally replace "Risk" with "probability".]
Moreover, no manager gets dinged for "internet-wide" outages unfortunately, so the compliance department keeps calling the shots. The number of times I've had to explain there's no added security in adding an "antivirus" to our Linux servers, as we already have proper monitoring at the eBPF level, is annoying.
I'd be fired if I caused enough loss in revenue to pay my own salary for a year.
I am responsible for my choices. I'm CTO, I don't doubt that in some cases execs cover for each other, but at least I have anecdotal experience of what it would take for me to be fired- and this is clearly communicated to me.
Hope you get paid a lot! Otherwise you are either in a very young or very stupid job.
I regularly spend multiples of my salary every month on various commitments my company makes, any small mistake could easily mean that its multiples of my salary type of problem within 10 days.
A friend of mine spent half a million on a storage device that we never used. It sat in the IT area for years until we were acquired. Everyone gave him so much shit. Finance asked me about it numerous times (going around my friend the CTO) so they could properly depreciate it. He didn't get dinged by the board at all. It remained an open secret. We were making million dollar decisions once a month, though.
> I regularly spend multiples of my salary every month on various commitments my company makes.
Yeah, same here.
But if I choose a vendor and that vendor fails us so catastrophically as to make us financially insolvent, then it's my job to have run a risk analysis and to have an answer for why.
If it's more cost effective to take an outage, that's fine, if it's not: then why didn't I have a DRP in place, why did we rely so much on one vendor, what's the exposure.
It's a pretty important part of being a serious business person.
Sure, but that's not what I said or you said, and my commentary was about relative measures of your salary to your budget.
If you can't make a mistake of your salary size in your budget then your budget is small or very tight; most corporations fuck up big multiples of their CTO's salary quarterly (but that turns out to be single-digit percentage points of anything useful).
> I'd be fired if I caused enough loss in revenue to pay my own salary for a year.
I'm not so sure.
I know of a major company that had a glitch, multiple times, that caused them to lose ~15 million dollars at least once (a non-prod test hit prod because of a poorly designed tool).
I was told the decision-makers decided not to fix the problem (the risk of losing more money again) because the "money had already been lost."
"no manager gets dinged for "internet-wide" outages"
Kind of like, nobody gets fired for hiring IBM, or using SAP. They are just so big, every manager can say, "look how many people are using them, how was I supposed to know they are crap".
But, it seems like for uptime, someone should be identifiable. If your job is uptime, and there is a worldwide outage, I'd think it would roll downhill onto someone.
> Kind of like, nobody gets fired for hiring IBM, or using SAP. They are just so big, every manager can say, "look how many people are using them, how was I supposed to know they are crap".
I wouldn't necessarily say IBM or SAP are "crap". It's much more likely that orgs buying into IBM or SAP don't do the due diligence on what the true costs are to properly set it up and keep it running, and therefore cut tons of corners.
They basically want to own a Ferrari, and when it comes to maintenance, they want to run regular gas and try to get their local mechanic to slap Ford parts on it because it's too expensive to keep going back to the dealership.
The thing is usually this argument goes something like this:
A: Should prod be running a failover / <insert other safety mechanism>?
B: Yes!
A: This is how much it costs: <number>
B: Errm... Let me check... OK I got an answer, let's document how we'd do it, but we can't afford the overhead of an auto-failover setup.
And so then there will be 2 types of companies, the ones that "do it properly" will have more costs, their margins will be lower, over time they'll be less successful as long as no big incident happens. When a big incident happens though, for most businesses - recent history proves that if everyone was down, nobody really complains. If your customers have 1 vendor down due to this issue, they will complain, but if your customers have 10 vendors down, and are themselves down, they don't complain anymore. And so you get this tragedy of the commons type dynamic where it pays off to do what most people do rather than the right thing.
And the thing is, in practice, doing the thing most people do is probably not a bad yardstick - however disappointing that is. 20 years ago nobody had 2FA and it was acceptable, today most sites do and it's not acceptable anymore not to have it.
Parents may teach this to kids but the kids usually notice their parents don't practice what they preach. So they don't either.
The world is filled with people following everybody else off a cliff. If you're warning people or even just not playing along in a time of great hysteria, people at best ignore your warnings and direct verbal abuse at you. At worst, you can face active persecution for being right when the crowd has gone insane. So most people are cowards who go along to get along.
I think the parent was correct in the use of the word "Risk"; it's different than your definition, which appears to be closer to "likelihood".
Risk is a combination of likelihood and impact. If "risk" were just equivalent to "likelihood" then leaving without an umbrella on a cloudy day would be a "high-risk situation".
A rational person needs to weigh both the likelihood and impact of a threat in order to properly evaluate its risk. In many cases, the impact is high enough that even a low likelihood needs to be addressed.
ZScaler and similar software also has some hidden costs: Performance and all the other fun that comes with a proxy between you and the server you connect to.
> What I'm saying is that the business's interests are not aligned with the people comprising that business.
Yep, that's the point of capitalism.
> In that regard, what "the business" wants is irrelevant.
And yet here we are. Companies get fined left and right for breaching rules but it's ok because it earned them money. There are literal plans made to calculate whether it's profitable to cheat or not. In the current system, what the business wants always wins over individual qualms, unfortunately.
Because the punitive system in most countries doesn't affect individuals. As a manager, you're not going to jail for breaking environmental laws; a different entity (the company) is paying for being caught. So, it's still the rational thing to do to break the environmental laws to make your group's numbers go up and get a promo or bonus.
Almost correct, but you mean 'chance' where you write 'risk':
Risk = Chance × Impact
The chance of failing an audit initially is high (or medium, present at least). The impact is usually low-ish. It means a bunch of people need to fix policy and set out improvement plans in a rush. It won't cost you your certification if the rectification is handled properly.
It's actually possible that both of your examples are awarded the same level of risk, but in practice the latter example will have its chance minimized to make the risk look acceptable.
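To make the Risk = Chance × Impact framing concrete, here's a toy calculation (every number is invented purely for illustration) showing how the expected loss can point one way for the business and the other way for the individual manager:

    # Toy expected-value comparison; every number here is invented for illustration.
    def expected_loss(chance: float, impact: float) -> float:
        """Risk as expected value: chance of the event times its impact."""
        return chance * impact

    # To the business (impact in dollars):
    failed_audit_biz = expected_loss(0.5, 50_000)       # frequent, cheap to rectify -> 25,000
    major_breach_biz = expected_loss(0.02, 10_000_000)  # rare, catastrophic         -> 200,000

    # To the individual manager (impact in arbitrary "career damage" units):
    failed_audit_mgr = expected_loss(0.5, 10)   # -> 5.0   (gets dinged for the audit)
    major_breach_mgr = expected_loss(0.02, 1)   # -> 0.02  (covered: all boxes were ticked)

    print(failed_audit_biz, major_breach_biz)  # the breach dominates for the business
    print(failed_audit_mgr, major_breach_mgr)  # the audit dominates for the manager

The misalignment in the thread above falls straight out of the numbers: the business-level expected loss is dominated by the breach, while the manager-level expected loss is dominated by the audit.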
> Now, with that perspective, how do you expect a rational person to behave?
They'd deploy the software on the critical path. That's exactly GP's point, isn't it? That's why GP explicitly wants us to shift some of the blame from the business to the regulators. GP advocates for different regulatory incentives so that a rational person would then do the right thing instead of the wrong thing.
At the risk of sounding like Chicken Little: the reality is companies are getting popped all the time - you just don’t hear about them very often. The bar for media reporting is constantly being raised to the point where you only hear about the really big ones.
If you read through any of the weekly Risky Biz News posts [1] you’ll often see five or more highly impactful incidents affecting government and industry, and they’re just the reported ones.
I wonder how much that's still true now that ransomware has apparently become viable.
Finding an insecure target, setting up the data hostage situation, and having the victim come to pay is scalable and could work in volume. If getting small money from a range of small targets becomes profitable, small fish will bear similar risks to juicier targets.
But...surely you're also missing another point of consideration:
Single point of failure fails, taking down all your systems for an indeterminate length of time:
1. Risk: moderate (an auto-updating piece of software without adequate checks? yeah, that's gonna fail sooner or later)
2. Impact to business: high
3. Impact to manager: varies (depending on just how easy it is to spin the decision to go with a single point of failure rather than a more robust solution to the compliance mandate)
> 3. Impact to manager: minimal, if audits were all passed.
I don't know about you, but I'll be making sure everyone knows that the manager signed off on the spectacularly stupid idea to push through an update on a Friday without testing.
Of course, disabling those auto updates will have you fail the external security audit and now your security team needs to fight with the rest of the leadership in the company explaining why you're generating needless delays, costs against the "state of the art in security industry" and why your security guys are smarter than the people who have the power to approve or deny your security certification.
I've taken part in some security audits where I work. They're not a joke only because they're a tragic story of incompetence, hubris, and rubberstamping. They 100% focus on checking boxes and cargo-culting, while leaving enormous vulnerabilities wide open.
What I don't understand is why they don't have a canary update process. Server side deployments do this all the time. You would think Windows would offer that to their institutional customers, for all types of updates including (especially) 3rd party.
This isn't a Windows update (which absolutely does let you do blue/green deployments via SUS), but rather a Crowdstrike update, which also lets you stage rollouts, and I expect several administrators are finding out why that is important.
I know about update policies, but afaik those are about the “agent” version. Today’s update doesn’t look like an agent version. The version my box is running was released something like a week ago.
Is there some possibility to stage rollouts of the other stuff it seems to download?
Kind of a big thing most people don't understand about the various forms of "Business Insurance": for the most part, businesses carry whatever insurance the work they do requires them to have. Those requirements are set by laws/regulations applied to those entities and the various entities they want to do business with.
At every small shop I've worked when the topic of Business Insurance came up with one of the owners, the response was extremely negative -- basically summarized as "it's the most you will ever pay for something you won't ever be able to use".
Yep, it’s pretty much a toll on doing business with entities. I’ve no doubt the intention is so your customer can sue you without you winding up, whether it actually works… no idea.
>> It's easy to blame CrowdStrike but that seems too easy on both the orgs that do this and the upstream forces that compel them to do it.
While orgs using auto update should reconsider, the fact that CrowdStrike don't test these updates on a small amount of live traffic (e.g. 1%) is a huge failure on their part. If they released to 1% of customers and waited even 24 hours before rolling out further this seems like it would have been caught and had minimal impact. You have to be pretty arrogant to just roll out updates to millions of customers devices in one fell swoop.
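For illustration, a minimal sketch of what a phased rollout with a bake period could look like. This is not CrowdStrike's actual pipeline; the stage sizes, bake times, deploy call, and telemetry check are all assumptions:

    # Hypothetical phased ("canary") rollout with a bake time between stages.
    # Not CrowdStrike's real pipeline; stages, timings, and checks are invented.
    import time

    ROLLOUT_STAGES = [
        (0.01, 24 * 3600),  # 1% of the fleet, then bake for 24 hours
        (0.10, 12 * 3600),  # 10% of the fleet, then bake for 12 hours
        (1.00, 0),          # finally, everyone
    ]

    def push_update(update_id: str, fraction: float) -> None:
        # Stub: tell the update service which share of hosts should receive the file.
        print(f"pushing {update_id} to {fraction:.0%} of hosts")

    def fleet_is_healthy() -> bool:
        # Stub: query crash/boot-loop telemetry from the hosts already updated.
        return True

    def rollout(update_id: str) -> None:
        for fraction, bake_seconds in ROLLOUT_STAGES:
            push_update(update_id, fraction)
            time.sleep(bake_seconds)
            if not fleet_is_healthy():
                print(f"halting and rolling back {update_id} at {fraction:.0%}")
                return

The point of the early stages is simply that a boot-looping 1% shows up in telemetry long before the other 99% ever receive the file.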
Why even test the updates on a small amount of live customers first? Wouldn't this issue already have surfaced if they tested the update on a handful of their own machines?
You are completely right. BTW, it wasn't a software update; it was a content update, a 'channel file'.
Someone didn't do enough testing. edit: or any testing at all?
It's an automatic update of the product. The "channel vs. binary" semantics don't change anything. If your software's definition files can cause a kernel-mode driver to crash in a boot loop you have bigger problems, but the outcome is the same as if the driver itself had been updated.
Indeed. It's worse, really: it means there was a bug lurking in their product that was waiting for a badly formatted file to surface it.
Given how widespread the problem is it also means they are pushing these files out without basic testing.
edit: It will be very interesting to see how CrowdStrike wriggle out of the obvious conclusion that their company no longer deserves to exist after a f*k up like this.
That's funny, because IIRC McAfee back in the Windows XP days did this exact same thing! They added a system file to the signature registry and caused Windows computers to BSOD on boot.
That's even worse: they should be fuzz testing with bad definitions files to make sure this is safe. Inevitably the definitions updates will be rushed out to address zero-days, and the work should be done ahead of time to make them safe.
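As a sketch of what that ahead-of-time work could look like: feed the parser random and truncated blobs and require that the only acceptable failure mode is a controlled rejection. The parse_channel_file function below is an invented stand-in, not CrowdStrike's actual format or API:

    # Minimal fuzzing sketch. `parse_channel_file` is an invented stand-in parser;
    # the point is the harness around it.
    import os
    import random

    class MalformedDefinitions(Exception):
        pass

    def parse_channel_file(data: bytes) -> dict:
        # Stand-in: reject anything without a plausible header.
        if len(data) < 8 or data[:4] != b"CHNL":
            raise MalformedDefinitions("bad header")
        return {"entries": data[8:]}

    def fuzz(iterations: int = 10_000) -> None:
        for _ in range(iterations):
            blob = os.urandom(random.randint(0, 4096))
            try:
                parse_channel_file(blob)
            except MalformedDefinitions:
                pass  # a controlled rejection is the desired behaviour
            # Any other exception here would be a finding to fix long before
            # the parser ever runs with kernel-level consequences.

    if __name__ == "__main__":
        fuzz()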
Having spent time reverse-engineering Crowdstrike Falcon, a lot of funny things can happen if you feed it bad input.
But I suspect they don't have much motivation to make the sensor resilient to fuzzing, since the thing's a remote shell anyways, so they must think that all inputs are absolutely trusted (i.e. if any malicious packet can reach the sensor, your attackers can just politely ask to run arbitrary commands, so might as well assume the sensor will never see bad data..)
This is something funny to say when the inputs contain malware signatures, which are essentially determined by the malware itself.
I mean, how hard would it be to craft a malware that has the same signature as an important system file? Preferably one that doesn't cause immediate havoc when quarantined, just a BSOD after reboot, so it slips through QA.
Even if the signature is not completely predictable, the bad guys can try as often as they want and there would not even be way to detect these attempts.
> malware signatures, which are essentially determined by the malware itself.
No they're not. The tool vendor decides the signature, they pick something characteristic that the malware has and other things don't, that's the whole point.
> how hard would it be to craft a malware that has the same signature as an important system file?
Completely impossible, unless you mean, like, bribe one of the employees to put the signature of a system file instead of your malware or something.
Sure, but they do it following a certain process. It's not that CrowdStrike employees get paid to be extra creative in their job, so you likely could predict what they choose to include in the signature.
In addition to that, you have no pressure to get it right the first time. You can try as often as you want and analyzing the updated signatures you even get some feedback about your attempts.
Like, «We require that your employees open only links on a whitelist, and social networks cannot be put on this list, and we require a managed antivirus/firewall solution, but we are OK with this solution having a backdoor directly to a 3rd-party organization»?
It is crazy. All these PCI DSS and SOC 2 frameworks look like a comedy if they allow such things.
At a former employer of about 15K employees, two tools come to mind that allowed us to do this on every Windows host on our network[0].
It's an absolute necessity: you can manage Windows updates and a limited set of other updates via things like WSUS. Back when I was at this employer, Adobe Flash and Java plug-in attacks were our largest source of infection. The only way to reliably get those updates installed was to configure everything to run the installer if an old version was detected, and then find some other ways to get it to run.
To do this, we'd often resort to scripts/custom apps just to detect the installation correctly. Too often a machine would be vulnerable but something would keep it from showing up on various tools that limit checks to "Add/Remove Programs" entries or other mechanisms that might let a browser plug-in slip through, so we'd resort various methods all the way down to "inspecting the drive directory-by-directory" to find offending libraries.
We used a similar capability all the way back in the NIMDA days to deploy an in-house removal tool[1]
[0] Symantec Endpoint Protection and System Center Configuration Manager
[1] I worked at a large telecom at that time -- our IPS devices crashed our monitoring tool when the malware that immediately followed NIMDA landed. The result was a coworker and I dissecting/containing it and providing the findings to Trend Micro (our A/V vendor at the time) maybe 30 minutes before the news started breaking and several hours before they had anything that could detect it on their end.
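A rough sketch of that last-resort "inspect the drive directory-by-directory" approach; the file names here are illustrative, not the actual indicators anyone used:

    # Walk the filesystem for library files that inventory tools missed because
    # they only looked at Add/Remove Programs. File names are illustrative only.
    import os

    SUSPECT_NAMES = {"npswf32.dll", "java.dll"}  # e.g. stray Flash / Java plug-in binaries

    def find_suspect_files(root: str = "C:\\"):
        hits = []
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.lower() in SUSPECT_NAMES:
                    hits.append(os.path.join(dirpath, name))
        return hits

    if __name__ == "__main__":
        for path in find_suspect_files():
            print(path)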
Hilariously, my last employer was switching to Crowdstrike a few months ago when my contract ended. We previously used Trellis which did not have any remote control features beyond network isolation and pulling filesystem images. During the Crowdstrike onboarding, they definitely showed us a demo of basically a virtual terminal that you could access from the Falcon portal, kind of like the GCP or AWS web console terminals you can use if SSH isn't working.
As I understand, this only manifests after a reboot and if the 'content update' is tested at all it is probably in a VM that just gets thrown away after the test and is never rebooted.
Also, this makes me think:
How hard would it be to craft a malware that has the same signature as an important system file?
Preferably one that doesn't cause immediate havoc when quarantined, just a BSOD after reboot, so it slips through QA.
I don't believe this is what's happened, but I think it is an interesting threat.
Nope, not after a reboot. Once the "channel update" is loaded into Falcon, the machine will crash with a BSOD and then it will not boot properly until you remove the defective file.
> How hard would it be to craft a malware that has the same signature as an important system file?
Very, otherwise digital signatures wouldn’t be much use. There are no publicly known ways to make an input which hashes to the same value as another known input through the SHA256 hash algorithm any quicker than brute-force trial and error of every possibility.
This is the difficulty that BitCoin mining is based on - the work that all the GPUs were doing, the reason for the massive global energy use people complain about is basically a global brute-force through the SHA256 input space.
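A quick sketch of why "craft a file with the same hash" is infeasible against SHA-256: you would be looking for a second preimage of one specific 256-bit value, so each random attempt succeeds with probability about 2^-256:

    # Second-preimage intuition: matching a specific SHA-256 digest by trial and
    # error has success probability ~2**-256 per attempt.
    import hashlib
    import os

    target = hashlib.sha256(b"contents of some important system file").hexdigest()

    def attempt() -> bool:
        candidate = os.urandom(64)  # pretend this is our crafted "malware"
        return hashlib.sha256(candidate).hexdigest() == target

    # Even at billions of attempts per second, the expected number of tries
    # (~2**256) dwarfs anything physically feasible.
    print(any(attempt() for _ in range(1_000_000)))  # all but certain to print False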
I was talking about malware signatures, which don't necessarily use cryptographic hashes. They are probably more optimized for speed because the engine needs to check a huge number of files as fast as possible.
Cryptographic hashes are not the fastest possible hash, but they are not slow; CPUs have hardware SHA acceleration: https://www.intel.com/content/www/us/en/developer/articles/t... - compared to the likes of a password hash where you want to do a lot of rounds and make checking slow, as a defense against bruteforcing.
That sounds even harder; Windows Authenticode uses SHA1 or SHA256 on partial file bytes, the AV will use its own hash likely on the full file bytes, and you need a malware which matches both - so the AV will think it's legit and Windows will think it's legit.
AFAIK important system files on Windows are (or should be) cryptographically signed by Microsoft. And the presence of such signature is one of the parameters fed to the heuristics engine of the AV software.
> How hard would it be to craft a malware that has the same signature as an important system file?
If you can craft malware that is digitally signed with the same keys as Microsoft's system files, we got way bigger problems.
>How hard would it be to craft a malware that has the same signature as an important system file?
Extremely, if it were easy that means basically all cryptography commonly in use today is broken, the entire Public Key Infrastructure is borderline useless and there's no point in code signing anymore.
Admittedly, I don't know exactly what's in these files. When I hear 'content' I think 'config'. This is going to be very hypothetical, I ask for some patience. Not arguments.
The 'config file' parser is so unsafe that... not only will the thing consuming it break, but it'll take down the environment around it.
Sure, this isn't completely fair. It's working in kernel space so one misstep can be dire. Again, testing.
I think it's a reasonable assumption/request that something try to degrade itself, not the systems around it
edit: When a distinction between 'config' and 'agent' releases is made, it's typically with the understanding that content releases move much faster/flow freely. The releases around the software itself tend to be more controlled, being what is actually executed.
In short, the risk modeling and such doesn't line up. The content updates get certain privileges under certain (apparently mistaken) robustness assumptions. Too much credit, or attention, is given to the Agent!
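A sketch of the "degrade yourself, not the systems around you" idea: validate a new content file, and if it doesn't validate, keep running on the last known-good one rather than letting the failure propagate. The file names, format, and validation here are all invented:

    # Fall back to the last known-good content file instead of crashing on a bad
    # update. Paths, format, and validation are invented for illustration.
    import json
    import shutil

    ACTIVE = "channel_active.json"
    LAST_GOOD = "channel_last_good.json"

    def load_content(path: str) -> dict:
        with open(path) as f:
            content = json.load(f)
        if "signatures" not in content:  # minimal structural validation
            raise ValueError("missing signatures section")
        return content

    def apply_update(new_path: str) -> dict:
        try:
            content = load_content(new_path)
        except Exception:
            # Reject the update and carry on with what already worked.
            return load_content(LAST_GOOD)
        shutil.copy(new_path, LAST_GOOD)  # promote only after it validated
        shutil.copy(new_path, ACTIVE)
        return content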
"All over the place I'm seeing checkbox compliance being prioritised above actual real risks from how the compliance is implemented."
Great statement and one that needs to be seriously considered - would DORA regulation in the EU address this, I wonder? It's a monster piece of tech legislation that SHOULD target this, but WILL it? Someone should use today's disaster and apply it to the regs to see if it's fit for purpose.
Emphatically NO. Involved in (IT) Risk and DORA in a firm that actually does IT risk scenario planning (sort of the opposite of checkbox compliance). DORA is rubber-stamping all the way round. One caveat is that we are way ahead of DORA, so treating DORA as a checkbox exercise might be situational. But I haven't noticed a place where the rubber hits the road regulatory-wise. It's too easy to stay in checkbox compliance if the board doesn't see IT risk as a major concern. I'm happy one of our board members does. We've gone so far as to introduce a person- and paper-based credit line, so we can continue an outgoing cashflow if most of our processes fail (for an insurer).
Well, yeah. If a regulation is broken and not achieving its goal it should be changed. What's the alternative? "Regulation? We tried that once and it didn't work perfectly, so now we let The Market™ sort out safety standards."
Who needs regulation when you can have free Fentanyl with your CrowdStrike subscription! All of your systems will go down, but you won't care, and the chance of accidental overdose is probably less than 10%!
Yes, in many contexts that may well be the correct conclusion. Your comment presumes that regulation here has proven itself useful and not resulted in a single point of failure which potentially reduces overall safety. It’s of course the correct comment from a regulator’s perspective.
For the market to work, wouldn't you need something to hold the corps accountable if they fail to be secure AND to make regular people whole if the corps' failures cause them problems?
Especially for something like technology and infosec which rapidly changes, it’s silly to look to slow moving regulations as a solution, not to mention ignoring history and gambling politicians will do it competently and it won’t have negative side effects like distracting teams from doing real work that’d actually help.
You can make fines and consequences after the fact for blatant security failures as incentives but inventing a new “compliance” checklist of requirements is going to be out of date by the time it’s widely adopted and most companies do the bare minimum bullshit to pass these checklists.
There are so many English-centric assumptions here.
Regulation of liability can be very generic and broad, with open standards that don't need to be updated.
Case in point: most of continental Europe still uses Napoleon's Code civil to prescribe how and when private parties are liable. This is more than 200 years old.
The real issue is that most Americans are stuck with an old English regulatory system, which for fear of overreach was never modernized.
This can be true of security (and every other expense) whether it's regulated or not. Which do you think will result in fewer incidents: the regulated bare minimum, or the unregulated base minimum?
> So we need to hold regulatory bodies accountable as well - when they frame regulation such that organisations are cornered into this they get to be part of the culpability here too.
No, we need to hold Architects accountable, and this is the core of the issue. Creating systems with single, outsourced responsibility, in the critical path.
This is the point of much of the security efforts we see now.
Outsourcing of security functions, and things like login push a lot of liability and legal issues off into someone else's house.
It's hard to be the source of a password leak, or be compromised, when you don't control the passwords. But like any chain, you're only as secure as your weakest link... Snowflake is a great current example of this. Meanwhile the USPS just told us "oops", we had tracking pixels for a bunch of vendors all over our delivery preview tool.
Candidly, most people's stacks look a lot less like software and more like a toolbar-riddled IE5 install circa 2000. I don't think our industry is in a good place.
This is one of the interesting aspects in Ethereum.
If your validator is down, you lose a small amount of stake, but if a large percentage of the total set of validators are down, you all start being heavily penalized.
This incentivizes people running validators not to use the most popular Ethereum client, to avoid using a single compute provider, and, overall, to avoid relying on the popular choice, since doing so can cause them to lose the majority of their stake.
There hasn't been a major Ethereum consensus outage, but when that happens, the impact of being lazy and following the herd will be huge.
How is it lazy and herd-like to _not_ run the latest and greatest? Sounds like Ethereum's design is promoting a robustly diverse ecosystem rather than a monoculture.
> How is it lazy and herd-like to _not_ run the latest and greatest?
I'm not sure what you're asking here. Ethereum incentives don't make you run the latest version of your client's software (unless there's a hardfork you need to support). You can run any version that follows the network consensus rules.
The incentives are there to punish people who use the most common software. For example, let's say there are around 5 consensus clients which are each developed by independent teams. If everyone ran the same client, a bug could take down the entire network. If each of those 5 clients were used to run 20% of the network, then a bug in any one of them wouldn't be a problem for Ethereum users and the network would keep running.
If the network is evenly split across those 5 clients but all of them are running in AWS, then that still leaves AWS as a single point of failure.
The incentives baked into the consensus protocol exist to push people towards using a validator client that isn't used by the majority of other validators. That same logic applies to other things like physical host locations, 3rd party hosting providers, network providers, operating systems, etc... You never want to use the same dependencies as the majority of other validators. If you do and a wide-spread issue happens, you're setting yourself up to lose a lot of money.
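A toy model of that correlation penalty, purely to illustrate the shape of the incentive; Ethereum's real inactivity-leak math is more involved than this, and the numbers below are made up:

    # Toy correlation penalty: being offline costs little on its own, but much
    # more when a large fraction of the network is offline with you.
    # Not Ethereum's actual formula; numbers are illustrative.
    def penalty(stake: float, fraction_offline: float) -> float:
        base = 0.001 * stake                    # small fixed cost for being down
        correlated = stake * fraction_offline   # grows with how many are down with you
        return base + correlated

    print(penalty(32.0, 0.01))  # down with 1% of the network  -> ~0.35
    print(penalty(32.0, 0.40))  # down with 40% of the network -> ~12.83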
It sounds like you're describing the advantages of diversity, with a little game theory thrown in to sweeten the deal. Still not sure how that can be described as lazy, or did I completely mis-read the original phrasing?
I find that in today's world it is no longer about one person being "accountable". There is always an interplay of factors; like others have pointed out, cyber security has a compliance angle. Other times it is a cost factor: redundancy costs money. Then there is the whole revolving door of employees coming and going, so institutional knowledge about why a decision was made is lost with them.
That is hard to do for even a small company. How do you balance all that out for critical infrastructure at a much larger scale?
The problem is that even knowing that this likely to happen many companies would still put CrowdStrike into a critical system for the sake of security compliance / audit. And it's not even prioritization of security over reliability because incentives are to care more about check-boxes in the audit report than about the actual security. Looks like almost no party in this tragic incident had a strong incentive to prevent it so it's likely to happen again.
Can anyone explain how CrowdStrike could possibly fix this now? If affected machines are stuck in an endless BSOD cycle, is it even possible to remotely roll out a fix? My understanding is that the machines will never come to the point where a CS update would be automatically installed. Is the only feasible option the official workaround of manually deleting system files after booting into the recovery environment? How could this possibly be done on scale in organizations with tens of thousands of machines?
There are orgs out there right now with 50,000+ systems in a reboot loop. Each one needs to be manually configured to disable CS via safe mode so that the agent can be updated to the fixed version. Throw BitLocker into the mix, which makes this process even longer, and we're talking about weeks of work to recover all systems.
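For reference, the circulated workaround boiled down to removing the offending channel file on each machine from a recovery environment (after dealing with BitLocker keys first). A rough sketch of just that single step, as an illustration only; defer to CrowdStrike's official guidance, and note the path/pattern here is the one that was circulated at the time:

    # Sketch of the file-deletion step from the circulated workaround: remove the
    # offending channel file so the sensor stops crashing at boot. In practice
    # this was done from safe mode / WinRE per machine; treat this purely as an
    # illustration and defer to CrowdStrike's official guidance.
    import glob
    import os

    PATTERN = r"C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys"

    def remove_bad_channel_files() -> None:
        for path in glob.glob(PATTERN):
            print(f"removing {path}")
            os.remove(path)

    if __name__ == "__main__":
        remove_bad_channel_files()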
CrowdStrike itself will not fix anything. They published a guide on how to work around the problem and that's it. Most likely a lot of sales reps and VPs will be fielding calls all over the weekend explaining to large customers how they managed to screw up and how much of a discount they will offer on the next renewal cycle.
Legally, I think somewhere in their license it says that they're not responsible in any way or form if their software malfunctions in any way.
Like if I kill someone of course I go to jail. But if I get some people together, say we're a company, and then kill 100 people, nobody goes to jail. How does that work? What a huge loophole.
Philips (the company) basically killed people with malfunctioning CPAP machines (which are meant to help against sleep apnea) and no one went to jail. So that's a practical example.
It's already the norm for devs to not be responsible for software malfunctions. They can choose to end their relationship with you, but they can't sue you for damages.
Yep, I've been involved in many vendor contracts at my company and the contracts take weeks to months to finalize because every aspect of the agreement is up for discussion. Even things like SLAs (including how they're calculated), liability limitations, indemnity, and recourse in the event of system failure are all put through the wringer until both sides come to agreeable terms. This is true for big and tiny vendors.
This isn't a Github project with a MIT license. When you do B2B software, there aren't software licenses, there are contractual terms and conditions. The T&Cs outline any number of elements but including SLAs, financial penalties for contractual breaches, etc. Larger customers negotiate these T&Cs line by line. Smaller customers often accept the standard T&Cs.
Penalties, as far as I was involved in vendor discussions, are a part of the negotiation only when the software provider does any work on the client's premises and are liable to that extent.
For software, you don't pay penalties that it might malfunction once in a while, that's what bug-fixes are for and you get offered an SLA for that, but only for response time, not actual bug fixing. Where you do get penalties and maybe even your money back, is when the software is listed as being able to do X,Y,Z and it only does X and Z and the contract says it must do everything it said it does.
Well, probably no?
I've never seen liabilities in dollar value, or rather any significant value. Also, I saw our company's Crowdstrike contract for 10k+ seats; no liabilities there.
Sounds like people in some of these environments will be doing their level best to automate an appropriate fix.
Hopefully they have IPMI and remote booting of some form available for the majority of the affected boxes/VMs, as that could likely fix a large chunk of the problem.
Imagine if North Korea came out with a statement that they did it. It would spawn such an amount of work internally at CS to prove whether it was intentional or a simple mistake.
I work for government organization that is constantly audited and I've seen this play out over and over.
An important aspect I never see mentioned is most Cyber Security personnel don't have the technical experience to truly understand the systems they are assessing, they are, like you said, just pushing to check those compliance boxes.
I say this as someone who is currently in a Cyber Security role, unfortunately, as I'm coming to learn cyber roles suck. But this isn't a jab at those Cyber Security personnel's intelligence. It's literally impossible to understand multiple systems at a deep level, it takes employees working on those systems weeks to months to understand this stuff, and that's with them being in the loop. Cyber is always on the outside looking in, trying like hell to piece it all together.
Sorry for the rant. I just wanted to add on with my personal opinions on the cyber security framework being severely broken because I deal with it on a daily basis.
> It's literally impossible to understand multiple systems at a deep level
No, it's not. It takes above average intelligence, and major investment in actual education (not just "training"), and actual depth of experience, but it's not impossible.
Do you think it comes from a fundamental misconception of how these roles should be structured? My take is that you just can't fundamentally assess technical elements from the outside unless they have been designed that way in the first place (for assessability). For example, I educate my team that they have to structure their git commits in a way that demonstrates their safety for audit / compliance purposes (never ever combine a high-risk change with a low-risk one, for example). That should go all the way up the chain. Failure to produce an auditable output is failure to produce an output that can be deployed.
I know of an important company currently pushing to implement a redundant network data loss prevention solution, while they don't have persistent VPN enabled and multiple known misconfigurations of things that prevent web decryption working properly.
The flip side is: if you don't do auto-updates, and an exploit you would have been protected against had it auto-updated is published and used against you before you've tested/pushed the patch, you are up the creek without a paddle in that situation as well.
To some degree you have to trust the software you are using not to mess things up.
So since I do mission critical healthcare I do run into this concept. But it's not as unresolvable as you portray. Consider for example the HIPAA "break the glass" requirement. It says that whatever else you implement in terms of security you must implement a bypass that can be activated by routinely non-authorised staff to access health information if someone's life is in danger.
Similarly, when I questioned, "why can't users turn off ZScaler in an emergency" we were told that it wouldn't be compliant. But it's completely implementable at a technical level (Zscaler even supports this). You give users a code to use in an emergency and they can activate it and it will be logged and reviewed after use. But the org is too scared of compliance failure to let users do it.
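A sketch of what that "emergency code, logged and reviewed afterwards" flow can look like. The names and storage here are invented; this only illustrates the log-and-review-later pattern, not any particular vendor's implementation:

    # Break-the-glass pattern: never block the emergency action up front, but log
    # every use for after-the-fact review. Names and storage are invented.
    import time

    AUDIT_LOG = []

    def disable_inspection(user: str, duration_s: int) -> None:
        # Stub: in a real deployment this would toggle the proxy/agent policy
        # for this user's device for a limited time window.
        print(f"traffic inspection bypassed for {user} for {duration_s}s")

    def break_glass(user: str, reason: str, duration_s: int = 3600) -> dict:
        grant = {
            "user": user,
            "reason": reason,
            "granted_at": time.time(),
            "expires_at": time.time() + duration_s,
        }
        AUDIT_LOG.append(grant)  # reviewed by security/compliance after the fact
        disable_inspection(user, duration_s)
        return grant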
Well, if the vault says you have COPD, and the devious bank robber is interested in your continued breathing, perhaps we can just review the footage after the fact.
This is one of those cases where you don't disable emergency systems to defend against rogue employees. If people abuse emergency procedures, you let the legal system sort it out.
> It says that whatever else you implement in terms of security you must implement a bypass that can be activated by routinely non-authorised staff to access health information if someone's life is in danger.
Huh.
I can see why this needs to exist, but hadn't thought of it before. Same deal as cryptography and law-enforcement backdoors.
> logged and reviewed after use
I was going to ask how this has protection from mis-use.
Seems good to me… but then I don't, not really, not deeply, not properly, feel medical privacy. To me, violation of that privacy is clearly rude, but how the bar raises from "rude" to "illegal" is a perceptual gap where, although I see the importance to others, I don't really feel it myself.
So it seems good enough to me, but am I right or is this an imagination failure on my part? Is that actually good enough?
I don't think cryptography in general can use that, unfortunately. A simple review process can be too slow for the damage in other cases.
This is an oversimplification. If we are talking about compliance with ISO 27001, you are supposed to do your own risk assessment and implement the necessary controls. The auditor will basically just check that you have done the risk assessment, and that you have implemented the controls you said yourself you need.
I'd say this has nothing to do with regulatory compliance at all. The real truth is that modern organizations are way too attached to cloud solutions. And this runs across all parts of the organization with SaaS and PaaS, whether it's email (imagine Google Workspace having a major issue), AWS, Azure, Okta…
I've had the discussions so many times and the answer is always: the risks don't matter because the future is cloud, even talking about self-hosting anything is naive, and honestly we need to evaluate your competence for even suggesting it.
(Also, the cloud would maybe not be this fragile if it wasn't for lock-in with different vendors. If you read the TOS it says, basically on all cloud services, that you are responsible for the backup – but getting your data out of the service is still a pain in the ass – if possible at all.)
> The real truth is that modern organizations are way too attached to cloud solutions.
I'm confused. This is a security product for your local machine. Not the cloud.
Unless you call software auto-update "the cloud", but that's not what people usually mean. The cloud isn't about downloading files, it's about running programs and storage remotely.
I mean, if CrowdStrike were running entirely on the cloud, it seems like the problem would be vastly easier to catch immediately and fix. Cloud engineers can roll back software versions a lot easier than millions of end users can figure out how to safe boot and follow a bunch of instructions.
Well, there has usually been the option to run a local proxy/cache for your updates so that you can properly test them inside your own organization before rolling them out to all your clients (precisely to avoid this kind of shit show). But doing that requires an internal team running it and actually testing all updates. But modern organizations don't want an IT department; they want to be "cloud first". So they rely on services that promise they can solve everything for them (until they don't).
Cloud is not just about where things are – it's also about the idea that you can outsource every single piece of responsibility to an intangible vendor somewhere on the other side of the globe – or "in the cloud".
> Cloud is not just about where things are – it's about the idea that you can outsource every single piece of responsibility to an intangible vendor somewhere in the cloud.
I've never heard of a definition of cloud like that.
Cloud is entirely about where things are.
Outsourcing responsibility to a vendor is totally orthogonal to the idea of the cloud. You can outsource responsibility in the cloud or not. You can also outsource responsibility on local machines or not.
And outsourcing responsibility has existed since long before the concept of the cloud was invented.
The product affected here is literally called "CrowdStrike Falcon® Cloud Security". Meraki, although they sell routers and switches, markets their products as a "cloud-based network platform". Jamf, although their product runs on endpoint devices, is marketed as "Jamf Cloud MDM". I think it's fair to say that cloud these days does not only mean storing data or running servers in the cloud, but also infrastructure that is in any way MANAGED in the cloud.
So to tie back to what I wrote earlier – none of these services has to have the management part in the cloud. They could just give you a piece of software to run on your own server. That would certainly distribute the risk, since as it stands it only takes someone hacking the vendor to go after all their customers, or, in this case, one faulty update to break every user's experience. And as far as I can see, it seems we are willing to take those risks because we think it's nice having someone else manage the infrastructure (and that was my main point in the first comment).
> My org which does mission critical healthcare just deployed ZScaler on every computer which is now in the critical path of every computer starting up
Hi fellow CVS employee. Are you enjoying your zscaler induced SSO outages every week that torpedo access to email and every internal application? Well now your VMs can bluescreen too. A few more vendor parasites and we'll be completely nonfunctional. Sit tight!
When we think "security" on HN we think about the people who escalate wiggling voltages at just the right time into a hypervisor shell on XBox, but I've had to recognize that my learned bias is not correct in the real world. In the real world, "computer security" is a profession full of hucksters that can't tell post-quantum from heap and whose daily work of telling people repeatedly to not click links in Outlook and filling out checklists made by people exactly like them has essentially no bearing on actual security of any sort.
It's driven by a lot of things. Part of it is driven by rising cyber liability insurance rates, for one. A lot of organizations would rather not pay for CrowdStrike, but the premiums for not having an "EDR/XDR/NGAV" solution can be astoundingly high at-scale.
Fundamentally there's a lot of factors in this ecosystem. It's really wild how incentives that seem unrelated end up with crazy "security" products or practices deployed.
> A lot of organizations would rather not pay for CrowdStrike, but the premiums for not having an "EDR/XDR/NGAV" solution can be astoundingly high at-scale.
Just like a lot of homeowners would rather not pay for ADT, but insurance requires a box-ticking “professionally-monitored fire alarm system.” Nevermind that I can dial 911 as well as the “professional” when I get the same notification as they do.
> In the real world, "computer security" is a profession full of hucksters
Always has been. The information security model is about analogizing digital systems as physical systems, and employing the analogues of those physical controls that date back hundreds of years on those digital systems. At no point, in my relatively long career, have I ever met anyone in Information Security who actually understands at depth anything about how to secure digital systems. I say this as someone who has spent a lot of my career trying to do information security correctly, but from the perspective of operations and software engineering, which is where it must start.
The entire information security model the world works with is tacking on security after the fact, thinking you need to build walls and a vault door to protect the room after the house has already been built, when in fact you need to build the house to be secure from the start, because attacks don't go through doors, attacks are airborne (I recognize the irony of my analogizing digital concepts to physical concepts surrounding security, but I do it for the benefit of any infosec people that may read my comment, so they can understand my point).
Because of this model, we have gone from buying "boxes" to buying "services", but it has never matured away from the box-checking exercise it's been since day one. In fact, many information security people have /no training or education/ in security, it's entirely in regulatory compliance.
I’ve met highly paid “security engineers” who talked about not really being into programming, or being okay with Python but finding everything else too complicated.
It shocks me that such a low level of technical competence is required.
> So CrowdStrike is deployed as third party software into the critical path of mission critical systems and then left to update itself.
TIL that the US government has pressured foreign nations to install a mystery blob in the kernel of machines that run critical software "for compliance".
If this wasn't a providential goof on the part of Crowdstrike -- the entire planet is now aware of this little known fact -- then some helpful soul in Crowdstrike has given us a heads-up.
Don't put all your eggs in one basket: I use multiple anti-virus products so that if one blows up, at least not all computers are affected. Looks like my old wisdom is still new wisdom.
Clarification: I mean that every computer has one anti-virus product, but not every computer has the same anti-virus product. I'm not installing multiple anti-virus products on the same computer.
You use multiple anti-virus products. Let's assume you use 3. Do you have multiple clusters of machines, each running their own AV product, so in case one has this problem the other two are unaffected?
How much overhead are we talking about here? Because if you're just using multiple AV software installed on one machine, 1) holy shit, the performance penalty, 2) you'd still be impacted by this, as CS would have taken it down.
They surely mean that all odd-numbered assets are running CrowdStrike and even-numbered ones are running SentinelOne (or similar; %3, %4, etc.). At least then you only lose half your estate.
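A sketch of that kind of split: assign each asset to one of several EDR products deterministically (here by hashing the hostname), so a bad update from any single vendor only hits a slice of the fleet. The vendor names and hostnames are just examples:

    # Deterministically spread hosts across EDR vendors so one vendor's bad update
    # only affects part of the fleet. Vendor names are examples only.
    import hashlib

    VENDORS = ["crowdstrike", "sentinelone", "defender"]

    def vendor_for(hostname: str) -> str:
        digest = int(hashlib.sha256(hostname.encode()).hexdigest(), 16)
        return VENDORS[digest % len(VENDORS)]

    for host in ["hr-laptop-017", "radiology-ws-03", "build-agent-42"]:
        print(host, "->", vendor_for(host))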
I have never seen a company that uses multiple AV products rolled out to user machines, ever. Sure, when you transition from one product to another, but across the whole company, at the same time? Never... I have also never seen a distribution of something like active directory servers based on antivirus software. I think these stories are purely academic, "why didn't you just..." tall tales.
Mine certainly does: our key Windows-based control systems use Windows Defender, and the corporate crap gets SentinelOne and Zscaler and whatever else has been bought on a whim.
I'd assumed that any essential company would be similar. OK if your purchasing systems for your hospital are down for a couple of days it's a pain. If you can't get x-rays it's a catastrophe.
If half your x-ray machines are down and half are up, then it's a pain, but you can prioritise.
But lots of companies like a single supplier. Ho hum.
Not the person you're replying to, but in any reasonable organization with automated software deployment it should be easy to pool machines into groups, so you can make sure that each department has at least one machine that uses a different anti-virus software.
Bonus, in case you do catch a malware, chances are higher that one of the three products you use will flag it.
So you have multiple AV products and you target those groups. You have those groups isolated on their own networks, right? With all the overhead that comes with strict firewall rules and transmission policies between various services on each one. With redundant services on each network... you've doubled or tripled your network device costs solely to isolate for anti virus software. So if only one thing finds the zero day network based virus, it won't propagate to the other networks that haven't been patched against this zero day thing.
How far down the rabbit hole do we want to go? If you assume many companies are doing this kind of thing, or even a double digit percentage of companies, I have bad news for you.
But, cost of maintenance aside, it wouldn't be that bad to deploy each half of the fleet with a distinct EDR.
This is actually implicitly in place for big companies that support BYOD. If half your fleet is on Windows, another 40% on macOS, and 10% on Linux, you need distinct EDR solutions, and a single issue can't affect your whole fleet at once.
I know a few people who have Zscaler deployed at work. It will routinely kick them off the internet, like multiple times a day. It has gotten to the point where they can sort of tell in advance that it's about to happen.
The theory so far is that it's related to their activities: working in DevOps, they will sometimes generate "suspicious" traffic patterns which then trigger some policy in Zscaler, but they're not actually sure.
ZScaler itself uses port 443 UDP, but blocks QUIC. The last time I checked it didn't support IPv6 so they told customers to disable IPv6. Security software is legacy software out of the box and cuts the performance of computers in half.
> more scared of failing an audit than they are of the consequences of failure of the underlying systems the audits are supposed to be protecting.
Duh, else there would be no need to audit them to force compliance, they'd just do it by themselves. The only reason it needs forcing is that they otherwise aren't motivated enough.
> Good point. But the audit seems useless now. It's supposed to prevent the carelessness from causing... this thing that happened anyway.
> Sure, maybe it prevented even more events like this from happening. But still.
Because the point of the audit is not to prevent hacks; it's to prove that you did your due diligence to not get hacked, so the fact that a hack happened is not your fault.
You can hide under the umbrella of "sometimes hacks happen no matter what you do".
CYA is the reason you do the audit. But the reason for the audit's existence and requirement is definitely so that hacks don't happen. Don't tell me regulatory agencies require things so that companies can hide behind them.
Who is them though? The airport that used this software? You can't put all the blame on the software vendor. It can be a good and useful component when not relied on exclusively for the functioning of the airport. Not relying on a single point of failure should be the responsibility of the business customer who knows the business context and requirements.
You will have each company person pointing at the others. That's why you have contracts in place.
You won't ever have real consequences for executives and real decision makers and stakeholders because the same kind of people make the laws. They are friends, revolving door etc.
There's no responsibility at any level, is the thing. Those people who couldn't fly might get a rebooking and some vouchers sent out to them, but they won't really get made whole. The airport knows they won't really be on the hook, so they don't demand real responsibility from their vendors, and so on.
In the grand scheme of things, being able to fly around the globe at these prices is a pretty good deal, even with these rare events taken into account. It's not like the planes fell out of the sky. If you must must definitely be somewhere at a time, plan to arrive one or two days earlier.
I don't even want to know how many mission critical systems automatically deploy open source software downloaded from github or (effectively random) public repositories.
Unlike Windows, there is at least the option to use curated software distributions such as Debian or RH that won't apply random stuff from upstream repositories.
If I were running an organization that needs these audits, I'd always have fallback procedures in place that would keep everything running even if all computers suddenly stop working, like they did today. General-purpose software is too fragile to be fully relied upon, IMO.
If a general-purpose computer must be used for something mission-critical, it should not have an internet connection and it should definitely not allow an outside organization to remotely push arbitrary kernel-mode code to it. It should probably also boot from a read-only OS image so that it could always be restored to a known-good state by just rebooting.
Organizations don't want to increase risk by listening to an employee with their personal opinion. Orgs want an outside vendor who they can point at and say "it's their fault", and await a solution. Employees going rogue and not following the vendor defined SW updates is a much higher risk than this particular crisis.
Isn't there a way to schedule the updates? With Windows updates, when I used to work at a firm with a critical system running on Windows, we had main and DR servers, and the updates were scheduled to first roll out on the main server and a day later, I think, on the DR, which has saved us at least once in the past from a bad Windows update...
More or less. You can set up update policies and apply those to subsets of your machines. You can disable updates during time blocks, or block them altogether. There's also the option of automatically installing the "n-1" update.
We run auto n-1 at work, but this also happened at the same time on my test machine which runs "auto n". It never happened before, so this looks like something different from the actual installed sensor version, especially since the latest version was released something like a week ago.
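To illustrate the ring idea on the customer side, here is a sketch of how an org might model its update rings internally. The field names and host patterns are invented, and, as the comments above note, sensor-version policies apparently did not gate this particular channel-file content anyway:

    # Sketch of customer-side update rings: a small test ring tracks the latest
    # sensor ("auto n"), everything else lags one version behind ("n-1").
    # Field names are invented; version policies reportedly did not gate this
    # particular channel-file update.
    UPDATE_RINGS = {
        "test":  {"hosts": ["test-vm-01", "test-vm-02"], "sensor_policy": "auto n"},
        "early": {"hosts": ["it-laptop-*"],              "sensor_policy": "auto n-1"},
        "prod":  {"hosts": ["*"],                        "sensor_policy": "auto n-1",
                  "maintenance_window": "Sat 02:00-06:00"},
    }

    def policy_for(ring: str) -> str:
        return UPDATE_RINGS[ring]["sensor_policy"]

    print(policy_for("test"), policy_for("prod"))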
It's a big stretch to call this the regulator's fault when it's a basic lack of testing by Microsoft and/or Crowdstrike. If a car manufacturer made safety belts that broke, you don't blame the regulators.
The root cause is automatic, mindless software update without proper testing - nothing to do with regulators.
That's some very twisted logic. If I expect someone to clean the kitchen as part of the restaurant closing checklist, and they fuck it all up, would I blame the checklist, or the person doing the work?
You blame the person fucking it up. In this case, it's someone who only cares about checking a box. Or someone who pushes broken shit.
If this person simultaneously fucks up millions of kitchens around the world, you do not blame that person. You blame the checklist which encouraged giving a single person global interlocked control over millions of kitchens, without any compartmentalization.
> If this person simultaneously fucks up millions of kitchens around the world, you do not blame that person.
No, you definitely do, even more than before. Let's say for example that the requirement is to disinfect anything that touches food. And the bleach supplier fucks it all up. You blame the bleach supplier. You don't throw out the disinfectant requirement.
Most enterprises will have teams of risk and security people. They will be asking who authorized deployment of an untested update into production. If CrowdStrike deployments cannot be managed, then they will switch to a product which can be managed.
Well, if you fail at compliance, you can be fired and sometimes even sued. If your compliance efforts cause a system-wide outage - nobody's to blame, shit happens. I predict this screwup will end up with zero consequences for anyone who made the decisions that led to it, too. So how else do you expect this system to evolve, given this incentive structure?
> Orgs are doing this because they are more scared of failing an audit than they are of the consequences failure of the underlying systems the audits are supposed to be protecting.
If a failed audit is the big scary monster in their closet, then it sounds like the senior leadership is not intimately familiar with the technology and software in general, and is unable to properly weigh the risks of their decisions.
More and more companies are becoming software companies whether they like it or not. The software is essential to the product. And just like you would never want a non-lawyer running your law firm, you don't want a non-software person running your software company.
Very sharp and to the point, this comment. I would like to add that in large companies the audit will, in my experience, very often examine documents only -- not actual configuration or code.
This is all well deserved for executives who trust MS to run their businesses. If you have the resources, like a bank, it is a crime to put your company in the hands of MS.
It's possible that CrowdStrike heavily incentivises being left to update itself.
Removing the features that would let sysadmins handle updates themselves (even via the installer) would definitely be one way; another could be aggressive focus-stealing nags (similar to Windows' own), which in a server environment can cause major issues, especially when automating processes on Windows (since you need to close the program to update it).
I think it's easy to blame the sysadmins, but I would be remiss if I didn't point out that in the Windows world we have slowly been accepting these automatic-update dark patterns, while alternative (more controlled) mechanisms have been removed over time.
I almost don't recognise the deployment environment today compared to what it was in 2004; and yes, 20 years is a long time, but the total loss of control over what a computer is doing is only going to make issues like this significantly more common.
They say it was caused by a faulty channel file. I don't know what a channel file is, and they claim not to rely on virus signatures, but an antivirus product typically needs the latest signatures all the time and probably polls for them once an hour or so. So I'm not surprised that an antivirus product wants to stay hyper-updated, with updates rolled out immediately to everyone globally.
No, I'm not surprised either. But if you're operating at this kind of scale and with this level of immediate roll-out, what I would expect are:
* A staggered process for the roll-out, so that machines that are updated check in with some metrics that say "this new version is OK" (aka "canary deployment"), and the update is paused/rolled back if not
* Basic smoke testing of the files before they're pushed to any customers
* Validation that the file is OK before accepting an update (via a checksum or whatever, matched against the "this update works" automated test checksums)
* Fuzz tests that broken files don't brick the machine
Literally any of the above would have saved millions and millions of dollars today.
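A minimal sketch of the first three points, assuming a checksum recorded when the update passed automated testing and a per-host health check; every name here is hypothetical, and a real pipeline would of course push to real machines and use real telemetry:

    import hashlib
    import random

    # Checksum recorded when this build passed the automated tests (hypothetical).
    EXPECTED_SHA256 = hashlib.sha256(b"known-good channel file").hexdigest()

    def file_is_valid(payload: bytes) -> bool:
        # Basic smoke/validation step: refuse anything that doesn't match the
        # checksum of the build that actually passed testing.
        return hashlib.sha256(payload).hexdigest() == EXPECTED_SHA256

    def host_healthy_after_update(host: str) -> bool:
        # Stand-in for "the machine checked back in and reports OK".
        return random.random() > 0.01

    def roll_out(payload: bytes, hosts: list[str], batch_size: int = 10) -> None:
        if not file_is_valid(payload):
            print("validation failed: refusing to push to any customer")
            return
        for start in range(0, len(hosts), batch_size):
            batch = hosts[start:start + batch_size]
            if not all(host_healthy_after_update(h) for h in batch):
                print(f"canary batch starting at {batch[0]} unhealthy: halting roll-out")
                return
        print("roll-out completed")

    if __name__ == "__main__":
        roll_out(b"known-good channel file", [f"host-{n}" for n in range(100)])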
In any kind of serious environment the admin should not have any interaction with any system's screen when performing any kind of configuration change. If it can't be applied in a GPO without any interaction it has no business being in a datacenter.
1) There are situations where you will interact with the desktop, if only for debugging. Saying anything else is hopelessly naive. For example: how do you know your program didn't start due to missing DLL dependencies? There is no automated way: you must check the desktop, because Windows itself only shows a popup.
2) What displays on the screen is absolutely material to the functioning of the operating system.
The Windows shell (UI) is intrinsically intertwined with the NT kernel. There have been attempts to create headless systems with it (Windows Core, etc.); however, in those circumstances, if there is a popup, that UI prompt can crash the process because the dependencies needed to show the pop-up aren't there.
If you're running Windows Core and a program crashes when auto-updates are not enabled... well, you're more likely than not to enable updates to avoid the crash; after all, what's the harm.
Otherwise, you will be aware that when a program has a UI (Windows console), the execution speed of the process is linked to the draw rate of the screen, so a faster draw rate or fewer things on screen can actually affect performance.
Those who write Linux programs know this is also true on Linux (writes to STDOUT are blocking); however, you can't put I/O on another thread in the same way on Windows.
Anyway, all this to say: it's clear you've never worked in a serious Windows environment. I've deployed many thousands of bare-metal Windows machines across the world, and of course it was automated, from PXE/BIOS to application serving on the internet, the whole 9 yards. But believing that the UI has no effect on administration is just absurd.
> So we need to hold regulatory bodies accountable as well...
My bank, my insurer, my payment card processor, my accounting auditor and probably others may all insist I have anti-virus and insist that it is up to date. That is why we have to have these systems. However, I used to prefer systems that allowed me to control the update cycle and push it to smaller groups.
> So we need to hold regulatory bodies accountable as well - when they frame regulation such that organisations are cornered into this they get to be part of the culpability here too.
Replacing common-law liability with prescriptive regulation is one of the main drivers of this problem today. Instead of holding people accountable for the actual consequences of their decisions, we increasingly attempt to preempt their decisions, which is the very thing that incentivizes cargo-cult "checkbox compliance".
It motivates people who otherwise have skin in the game and immediate situational awareness to outsource their responsibility to systems of generalized rules, which by definition are incapable of dealing effectively with outliers.
No doubt there will be another piece of software mandated to check up on the compliance software. When that causes a global IT outage, software that checks up on the software that checks up on the compliance software will be mandated.
When Crowdstrike messes up and BSODs thousands of machines, they have a dedicated team of engineers working the problem and can deliver a solution.
When your company gets owned because you didn't check a compliance checkbox, it's on you to fix it (and you may not even currently have the talent to do so).
We see similar risk tradeoffs in cloud computing in general; yes, hosting your stuff on AWS leaves you vulnerable to AWS outages, but it's not like outages don't happen if you run your own iron. You're just going to have to dispatch someone a three hour drive away to the datacenter to fix it when they do.
CrowdStrike has various auto update policies, including not to automatically update to the latest version, but to the latest version -1 or even -2. Customers with those two policies are also impacted.
> Orgs are doing this because they are more scared of failing an audit than they are of the consequences failure of the underlying systems the audits are supposed to be protecting.
I've been someone in one of those audit meetings, defending decisions made and defending things based on the records we keep, and I understand this: it is both a deeply unpleasant and expensive affair to pull people off current projects and place them before auditors for several hours to debate what compliance actually means.
It's even worse. The consultants who run the audits (usually recent business-school grads) work with other consultants who shill the third-party software and the implementation work.
So true! It seems like all of these were invented to create another market for B2B SaaS security, audit, monitoring, etc. companies. Nobody cares about actual security or infrastructure anymore. Everything is just buying a subscription from some random SaaS company, not checking its permissions and grant policies, and ticking boxes because... compliance.
It depends on what your position is. Are you there to actually provide security to your org, or to tick a box in an audit? If both, which is more important? Because failing an audit has real consequences, while having breaches in security has almost none. Just look at the credit score companies.
Regulation or auditors rarely require specific solutions. It's the companies themselves that choose to achieve the goals by applying security like tinctures: "security solutions". The issue is that the tinctures are an approved remedy.
Zscaler is such insane garbage. Legitimately one of the worst pieces of software I have ever used. If your organization is structurally competent, it will never use Zscaler and will just use wireguard or something.
It's VERY easy to blame CrowdStrike and companies like them, as they are the ones LOBBYING for those checkboxes. Both Zscaler and CrowdStrike spent 500K last year on lobbying.
There's a reasonable number of circumstances where cybersecurity standards get imposed on organisations: by insurance, by a customer, or by the government (especially if it is a customer). These standards are usually fairly reasonably written, but they are also necessarily vague and say stuff like "have a risk assessment" and "take industry-standard precautions". This vagueness can create a kind of escalation ratchet: when people tasked with (or responsible for) compliance are risk-averse and/or lazy, they will essentially just try to find as many precautions as they can and throw them all in blanket-style, because it's the easiest and safest way to say that you're in compliance. This is especially true when you can more or less just buy one or two products which promise to tick every possible box. And if something else pops up as a suggestion, they'll throw that in too. Which then becomes the new 'industry standard', and it becomes harder to justify not doing it, and so on.
It's easy to blame CrowdStrike because they're the ones to blame here. They lit a billion system32 folders on fire with an untested product and push out fear-mongering, corny marketing material. Turns out you should be afraid.
> All over the place I'm seeing checkbox compliance being prioritized above actual real risks from how the compliance is implemented.
Because if everyone is doing their job and checks their box, they're not gonna get fired. Might be out of a job because the company goes under, but hey, it was no one's fault, they just did their job.