I'm with Amir in comment 23 and with Aaron in previous comments. Stuff happens. And when there are multiple moving pieces, the process and policies are the issue, not the individuals, since individuals rarely have a complete overview of the entire system.
Also, as noted in the comments, it sets a bad precedent for people coming forward reporting issues.
> Since individuals rarely have a complete overview of the entire system.
I think the key phrase you are missing which is repeated many places in the timeline is “Engineering was not consulted”.
His failure wasn’t that he didn’t know the entire system; it was that he kept engineering in the dark, “investigated” on his own, and concluded he could disregard the persistent researcher who kept banging at the door saying “there is an issue” again and again. Notably, all of this happened without consulting engineering.
You don’t need to know the entire system as CISO, but you certainly do need to know that you don’t know the entire system as CISO. This wasn’t an institutional failure, it was a personal failure at a very high level.
I’m not arguing that the guy had to resign, merely pointing out that the narrative used to argue against his resignation is flawed.
When I said "overview" I mostly meant both vision (as in observability) and understanding of everything. I would have said "overseer", but then I'd be asking the reader to crack open a thesaurus. To avoid digressing further: a CISO not only needs proper knowledge of what happens, but needs to understand why it happens, how it happens, and in which of the several contexts it happens. And that's hard.
Jeremy Rowley has been C-level at DigiCert for over 7 years and is/was CISO. If anyone should have a complete overview of the entire system, it’s him.
* Does replacing the CISO actually make the system more secure? Presumably he had a lot of tribal knowledge built up, and who is going to know the system better than him?
* As systems get more and more complex, it's likely impossible for a single individual to truly understand and prevent these types of situations 100% of the time. It seems that any application that needs to be 100% secure (if that is even possible) has to be provably secure in a strict mathematical sense, which goes beyond individual culpability.
* Does shooting the person accountable actually encourage responsible disclosure or discourage it?
Does replacing the CISO actually make the system more secure?
Counterintuitively, probably yes. Tone flows from the top down, and if you want to change the tone you need to start at the top. It's very difficult to try to build a coalition to change the system from underneath.
Presumably he had a lot of tribal knowledge built-up and who is going to know the system better than him?
Likely he has a lot of political influence and knowledge of the system and for lasting change all of that has to go. If it has gotten that bad it's no good and needs to be swept away.
But it would also make the system less secure, since reporting failures would mean dismissals. In the end, DigiCert self-reported the issue they were having. Without that, other operators would have been blind to this flaw.
This one C level individual failed to do the most important part of the job, which is to build the team of people who have shared knowledge to understand and prevent these issues 100% of the time.
According to his LinkedIn he joined DigiCert as a lawyer. This is an organizational failure of DigiCert's leadership to put a non-technical person in the CISO role.
Enough companies are looking for their CISO to be an attorney, or to also be an attorney, because you spend a lot of your time threading through laws, contracts, policies, company risks, partner risks, etc.
Much of the job at that level is NOT architecture and software discussions. You wouldn't think the role would resemble lead counsel, but unfortunately, nowadays the majority of a certain kind of company's risk is in that area.
And this guy is the gold standard for accountability based on these comments. Whoever pressured him to resign I think is making a mistake.
Bugs happen, and he's doing exactly what's expected of a leader in this situation. Anyone who thinks "this incident is personally my fault because I didn't read every line of code the devs who work for me wrote, and for this dishonor I am now unfit to lead" is a sane reaction to these events is not living the blameless postmortem life.
Last I looked, a CISO's tenure was only 3-5 years (I'm being generous) because the stress is incredible and you end up being responsible to every other CxO in the company.
You have much power, but little as well because every department claims security is the reason their projects are delayed - or if they move quickly and there is an issue, they point back at CISO.
It's always the way. The people with honour tend to hold themselves to a high standard, and step back when they find they do not meet it. Their replacements are either the same in this regard, or they're not.
Captains used to go down with their ships. In most organisations, this is no longer the case, because we lost all the captains willing to do so, without replacement.
Resigning when you fail to prevent an incident rarely helps, directly. But it's not something the power-hungry do, unless forced (and if they expect to be forced, they will try to cover things up). It rarely fails to win my respect. As a move in a social game, I suspect that the general strategy "resign when an unacceptable failure has occurred" makes things better, overall, even if it doesn't directly benefit the organisation you're leaving. (I'm not sure whether this applies when you don't expect that your replacement feels the same way about duty.)
We don't know what happened in those board rooms between CISO and CEO. It's not clear whether it was _strongly_ impressed upon Jeremy that he should resign because the market needs a blood sacrifice, or whether Jeremy really did, of his own free will, commit the corporate equivalent of seppuku: in the wake of bringing great shame, he simply felt it was no longer right for him to lead them into the future.
Honestly, I can understand that. I can respect it too. I can imagine the stress over that period of time being very high. Personal attacks, business attacks, mounting legal issues from business _partners_. We work in computers sitting on our ass. We are not used to these stressful events. This was probably Jeremy's _stress test_, and he unfortunately broke.
I just hope he finds another gig or maybe starts up something on his own. I would probably follow him.
Yeah what a terrible mistake to resign and for the CEO to accept such a resignation. Shows a complete lack of understanding of what accountability actually means and shows that Amit (CEO) has no idea how to handle a critical security event.
It also shows a (potential) lack of understanding of what's happening internally at the upper levels of DigiCert. This might have been a known issue, or some other very poorly handled thing that's part of a larger pattern, and he may have been given the chance to "resign".
Don't condemn people until you have all the facts, which multiple reporters are working on figuring out right now.
> it sets a bad precedent for people coming forward reporting issues.
Disagree entirely. Shit happens, and it’s important to accurately report what's going on. Sure, someone could interpret this as “we should fire the people who make mistakes”, but I think that interpretation is unrealistic to begin with, and if it were to happen, people would see through it.
“When DigiCert has another incident (and while I have tremendous faith in Tim, it will happen), I would rather that they have Jeremy Rowley with his wisdom and scar tissue around to guide their response and subsequent improvement.”
> This incident made me realize that I am no longer the person for this role. As such, after consulting with Amit (CEO), we have agreed that the path forward is for me to tender my resignation at the company. I will definitely miss the community, browsers, and public interactions as the PKI ecosystem has been my home and life for such a long time.
wow that is an extreme level of ownership. the debate of the resignation and the chilling effect it might lend to the incident report itself is also interesting.
> The ultimate root cause ended up being me. I have led the compliance team for the past several years. The fact this went unnoticed in our many reviews during that time shows that we need a different approach to both our internal investigations and compliance controls. I also dropped the ball on the certificate problem report by failing to escalate the issue to engineering and give it the proper attention it deserved. Although I did some investigation, I failed to treat the allegations with sufficient seriousness based on what could have been wrong. I assumed I knew the systems and what was happening in them rather than deeply investigating the report. Finally, I didn’t do enough to eliminate the silos between compliance and engineering.
Really does sound like he personally dropped the ball in the handling of the report. It would be interesting to hear the story from the researcher, who will undoubtedly have been frustrated beyond reason that DigiCert kept acting like there was no issue despite the repeated, persistent attempts at getting them to take it seriously.
I question why we accept someone resigning after making a big mistake.
Unless it's malice, or the fault truly is entirely on that person, what good would resigning do?
Rowley admitted he fucked up, badly, and he laid out on several levels what must change: how he must change, how the org must change, how the way things presently are is not good enough. He made an extremely deep dive into what happened.
And now he's leaving??? Someone who royally messes up would not want to mess up on the same issue twice. So all that experience is now worthless and doesn't benefit DigiCert in the slightest.
> We note that other customers have also initiated legal action against us to block revocation.
This seems crazy to me. In what world does suing your business partner make more sense than clicking some buttons in a UI or running some shell commands to renew your cert?
Maybe some mobile apps or IoT hardware with pinned certs that would all need to be rolled first? So they could get a new cert immediately but still need the old one for a while until the roll over completes.
Ops here: there are probably plenty of companies who use DigiCert for various certs where nothing is automated. Also, in any regulatorily strict company, filling out all the Change Controls and pushing them through the approval process would take me longer than 24 hours.
"Clicking some buttons" gets hard if you have hundreds of certificates, or the responsible people are on vacation, or are using the certificates in ways other than securing a website. 24 hours is a very short window.
If you publish a cert for your product and require (for whatever reason) that all 3rd parties validate the cert before using it, you're going to have a very bad time here. It could cost you contracts.
Now, this could be the fault of DigiCert's customers, but suing to block revocation would be logical.
One thing I find odd about this: the rules for CAs are long and detailed, but they don’t seem especially complicated. If I were implementing a CA, I would have the main code (their “service oriented architecture” or a monolith or whatever) produce not just an instruction to issue a certificate but a transcript of the entire exchange. Then a completely separate code path (plain old synchronous Rust or Python or Go or Haskell or ML — no microservices) would check the transcript for compliance with each clause of the requirements and block issuance if anything fails. And raise an alert that gets noticed.
One could even get fancy and use verifiable randomness for everything in the protocol that is supposed to be random.
And then one could refactor some other code with much less worry about messing up.
This might also reduce the blast radius from a bug in some other component. If the magic random string generator can be coerced into returning ‘www’, then a separate check would prevent this from compromising everything.
(I work in a different industry, and in my industry there is plenty of complex, evolving code, that needs to do the right thing. The more competent players have separate verification code as a double-check.)
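The "completely separate code path" idea above can be sketched roughly like this. All field names and clause checks here are hypothetical illustrations, not DigiCert's actual schema or the real Baseline Requirements text; the point is only that the checker shares no code with the issuance path:

```python
def check_transcript(transcript: dict) -> list:
    """Independent compliance check over an issuance transcript.

    Returns a list of violations; an empty list means OK to issue.
    The rules below are illustrative stand-ins for real BR clauses.
    """
    violations = []

    # Clause: CNAME-based DCV random values must carry an underscore
    # prefix so they can never collide with a real hostname label.
    if transcript.get("dcv_method") == "cname":
        value = transcript.get("random_value", "")
        if not value.startswith("_"):
            violations.append("CNAME DCV value missing underscore prefix")

    # Clause: random values need sufficient entropy; we approximate
    # with a minimum-length check on the value body here.
    body = transcript.get("random_value", "").lstrip("_")
    if len(body) < 19:
        violations.append("random value too short for required entropy")

    # Clause: the validated domain must match the certificate SAN.
    if transcript.get("validated_domain") != transcript.get("san"):
        violations.append("validated domain does not match SAN")

    return violations


def issue_if_compliant(transcript: dict) -> bool:
    """Block issuance (and, in a real CA, raise an alert) on any violation."""
    problems = check_transcript(transcript)
    if problems:
        print("BLOCKED:", "; ".join(problems))
        return False
    return True
```

Because the checker reads only the transcript, it would have caught the missing-underscore bug no matter which microservice generated the value.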
They want to get paid for the extra work that they will need to do. Also, there might be some systems that can't be modified fast enough, so they need more time before revocation. Of course, they could just ask directly.
In previous incidents I’ve seen cases of customers who had contracted with their own customers to give them 90 days’ notice of any changes to certificates.
Probably cheaper to file for one TRO than defend yourself against cases from a bunch of your own customers.
I mean, cheaper still to not contractually obligate yourself to something you can’t guarantee or perform, but once you’re already in that position…
I have low expectations from C-level executives. But this incident and his response to it has changed my perspective of them just slightly.
It's a rare incident where a C-level executive actually takes accountability for their fuck-up. Shit rolls downhill. He is very likely to end up taking the helm at another place or starting up something on his own. He is the exact opposite of CrowdStrike's CEO (George Kurtz), who presided over an absolute shitstorm compared to the DigiCert incident.
> We also found that the bug in the code was inadvertently remediated when engineering completed a user-experience enhancement project that collapsed multiple random value generation microservices into a single service.
Interesting. What is the value of a microservice that generates random numbers over just using a language's SecureRandom equivalent?
edit: well I got that wrong. The reason seems to be they had multiple different validation methods and wanted to be able to use the same random number across the different methods. Lots of details in the incident report[0].
I'll leave this in case some find the links interesting:
For someone like DigiCert, I imagine they could ensure a much higher-quality source of randomness, i.e. multiple different hardware sources like the avalanche noise of semiconductor junctions[1], cosmic rays[2], or lava lamps[3].
It's also easier to ensure the random numbers used are of good quality when you have a single source, i.e. you can collect statistics as they're served.
An underscore prefix was added in some microservices but required in all of them. Abstracting random-value generation out into a single service fixed the code paths that required an underscore but weren't adding one.
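The failure mode described above can be sketched as follows. These function names are made up for illustration, not DigiCert's actual code; the shape of the bug is that one generation path relied on callers to add the prefix, and a caller didn't:

```python
import secrets

def generate_random_value() -> str:
    # Consolidated generator after the refactor: the underscore is
    # added in one place, so every caller gets a safe value.
    return "_" + secrets.token_hex(16)

def legacy_generate_random_value() -> str:
    # The buggy legacy path: returns the raw token and relies on each
    # caller to prepend the underscore -- which one code path didn't.
    return secrets.token_hex(16)

def cname_record_is_safe(value: str) -> bool:
    # Underscore-prefixed DNS labels can never be valid hostnames, so
    # the DCV CNAME cannot collide with a real, resolvable subdomain.
    return value.startswith("_")
```

Collapsing the generators into one service removed the "caller must remember the prefix" contract entirely, which is why the UX project fixed the bug as a side effect.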
He resigned with honor, grace and responsibility – and should be applauded.
This is what real accountability looks like, and doing so not only preserves the reputation and trustworthiness of his employer, but demonstrates that he is a valuable contributor and trustworthy individual. He will land on his feet as a result.
I don’t quite follow why a missing underscore results in a security problem. It seems like it must be somehow related to what’s valid for CNAME records?
Given that a revocation is simply a publication of additional data by a CA, and does not directly affect the customer’s systems, how is the TRO in this case not unconstitutional? I’m not a lawyer but it feels like prior restraint, no?
It does directly affect the customer’s systems. Revoking the certificates would have broken the customer’s website as an immediate, foreseen, and arguably intentional consequence. I’m skeptical any judge would buy an argument that this is an expressive act.
Even if one would, I very much doubt that a company which argued it has a free speech right to break your website with 24 hours notice would survive the ensuing controversy.
While I applaud his openness and willingness to take accountability, I agree with others that resigning shouldn't be necessary.
Resigning is what you do when you are clearly not fit for your post. Jeremy has demonstrated that he is anything but unfit. People that can see where things went wrong, who can communicate such, can come up with changes to fix those issues, and can implement them are exactly what is needed at such a high level of management. Most people would bury the story or claim ignorance, but Jeremy doesn't hide anything and takes full responsibility.
I wish Jeremy could have stayed and used this honesty and insight to make the necessary changes. Firing a C-level executive when things go wrong doesn't fix anything any more than finding a low level engineer to blame and fire. Experienced people learn lessons by making mistakes. It sucks that it happens, but unexpected circumstances can't be foreseen. Hindsight is 20/20. Now that they know, they know to look out for it and to change the system to prevent it next time.
Perhaps he did overlook it. Perhaps he didn't respond when he should have. It's easy to get complacent. This is a wake up call. I have no doubt that he would be much more attentive and responsive as a result of this, and as such, be exactly what's needed for his post.
Mistakes don't call for sacrifices; they call for systematic changes to prevent making the same mistakes again.
Thank you Jeremy for being as forthcoming as you have been. I only wish more C-level execs would do the same. I hope you find a good place to land where you can take this experience and do an even better job. And I hope that whoever replaces you can bring the same rigor and professionalism that you brought.