
Yeah... the comment above reads like someone who has read a lot of books on CI deployment but has zero experience actually doing it in a real-world environment. Quick to throw stones with absolutely no understanding of any of the nuances involved.


There is no nuance needed - this is a giant corporation that sells kernel layer intermediation at global scale. You better be spending billions on bulletproof deployment automation because *waves hands around in the air pointing at what's happening just like with SolarWinds*

Bottom line: this was avoidable and negligent

For the record, I owned global infrastructure as CTO for the USAF Air Operations weapons system - one of the largest multi-classification networked IT systems ever created for the DoD - even more so during a multi-region refactor as an HQE hire into the AF

So I don’t have any patience for millionaires not putting the work in when it’s critical infrastructure

People need to do better, and we need accountability for people who make bad decisions to save money


Almost everything that goes wrong in the world is avoidable one way or the other. Simply stating "it was avoidable" as an axiom is simplistic to the point of silliness.

Lots of very smart people have been hard at work to prevent airplanes from crashing for many decades now, and planes still crash for all sorts of reasons, usually considered "avoidable" in hindsight.

Nothing is "bulletproof"; this is a meaningless buzzword with no content. The world is too complex for this.


> You better be spending billions on bulletproof deployment automation

There is no such thing.


You must have insanely cool stories :-)

What are your thoughts on MSFT's role in this?

They’ve been iterating Windows since 1985 - doesn’t it seem reasonable that their kernel should be able to survive a bad third-party driver?


1. System high/network isolation is a disaster in practice and is the root of MSFT and AD/ADFS architecture

2. The problem is the ubiquity of Windows, so it’s embedded in the infrastructure

We’ve put too many computers in charge of too much stuff for the level of combined capabilities of the computer and the human operator interface


So let's hear the "nuances" that excuse this.


I am not defending or excusing anything. I am saying there is not enough information to make a judgement one way or the other. Right now, we have almost zero technical details.

Call me old-fashioned and boring, but I'd like to have some basic facts about the situation first. After this I decide who does and doesn't deserve a bollocking.


I think we do have enough info to judge, e.g.: "This should not have passed a competent CI pipeline for a system in the critical path."

That info includes that the faulty file consisted entirely of zeros.


> That info includes that the faulty file consisted entirely of zeros.

Even that is not certain. Some people are reporting that this isn't the case and that the all-zeroed file may be a "quick hack" to send out a no-op.

So no, we have very little info.


But the all-zero file is the version CS has IDed as the cause, right?


No, CS has explicitly stated that the cause was a logic error in the rules file. They have also stated "This is not related to null bytes contained within Channel File 291 or any other Channel File."


It’s not a matter of excusing or not excusing it. Incidents like this one happen for a reason, though, and the real solution is almost never “just do better.”

Presumably CrowdStrike employs some smart engineers. I think it’s reasonable to assume that those engineers know what CI/CD is, they understand its utility, and they’ve used it in the past, hopefully even at CrowdStrike. Assuming that this is the case, then how does a bug like this make it into production? Why aren’t they doing the things that would have prevented this? If they cut corners, why? It’s not useful or productive to throw around accusations or demands for specific improvements without answering questions like these.


Not an excuse - they should be testing for this exact thing - but CrowdStrike (and many similar security tools) have a separation between "signature updates" and "agent/code" updates. My (limited) reading of this situation is that this was an update of their "data", not the application. Now apparently the dynamic update included operating code, not just something equivalent to a YAML file or whatever, but I can see how different kinds of changes like this go through different pipelines. Of course, that is all the more reason to ensure you have integration coverage.
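
To make "integration coverage" for a data-only update concrete, here is a rough sketch of a release gate that refuses to publish a content file unless it loads through the same parser the agent would use. Everything in it is hypothetical: the `CHNL` header, the length-prefixed rule layout, and the names `parse_content`, `release_gate` and `build_content` are invented for illustration and say nothing about CrowdStrike's actual file format or pipeline.

```python
import struct

MAGIC = b"CHNL"  # hypothetical 4-byte header, not a real vendor format


class ContentError(ValueError):
    """Raised when a content file fails validation."""


def parse_content(blob: bytes) -> list[bytes]:
    """Parse a made-up layout: MAGIC, uint32 rule count, then uint16-length-prefixed rules."""
    if len(blob) < 8 or blob[:4] != MAGIC:
        raise ContentError("bad or missing header")
    (count,) = struct.unpack_from("<I", blob, 4)
    offset, rules = 8, []
    for _ in range(count):
        if offset + 2 > len(blob):
            raise ContentError("truncated rule length")
        (size,) = struct.unpack_from("<H", blob, offset)
        offset += 2
        if size == 0 or offset + size > len(blob):
            raise ContentError("rule body out of bounds")
        rules.append(blob[offset:offset + size])
        offset += size
    if offset != len(blob):
        raise ContentError("trailing bytes after last rule")
    return rules


def release_gate(blob: bytes) -> bool:
    """Allow publishing only if the file parses cleanly and contains at least one rule."""
    try:
        return len(parse_content(blob)) > 0
    except ContentError:
        return False


def build_content(rules: list[bytes]) -> bytes:
    """Helper that serializes rules into the same made-up layout, for tests."""
    out = MAGIC + struct.pack("<I", len(rules))
    for rule in rules:
        out += struct.pack("<H", len(rule)) + rule
    return out
```

A gate like this only catches malformed files. A file that parses fine but trips a logic error at runtime would still need a test that actually loads it into a running agent, plus a staged rollout.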



