Hacker News
Google's response to Reptar CPU vulnerability (bughunters.google.com)
113 points by iseappsec on Dec 14, 2023 | 86 comments



"A non-disruptive hotload microcode update" a what? I've only seen microcode updates as a boot-time side path, how "hot" is this hotload?


On Linux you can enable a kernel option to update microcode at runtime (late microcode loading), but this is not recommended: processes that started before the late load might have already run into CPU bugs.


> might have already run into CPU bugs

Not only that. There were microcode updates that disabled CPU features (most notably TSX). At startup, a process might have detected such features as available and chosen optimized subroutines accordingly. Disabling features while the process is still running might either severely degrade performance or outright crash it.


Was TSX ever used? I didn't even see an ounce of dev documentation about it.


HLE, which is a subset of TSX, is (was?) supported by glibc [1]. Not sure if it is enabled by default. If it is, pretty much every multithreaded linux process would be affected.

[1] https://www.gnu.org/software/libc/manual/html_node/Elision-T...


And, more presently relevant, the actual sequence of CPU state manipulations needed to apply the update reliably is gnarly. Hopefully the thing currently implemented in Linux is fully reliable.


I think you can load microcode after boot, it's just more likely to have weird effects, especially with older processors.

For VMs running in GCP, they do migrate VMs across machines from time to time, and you could handwave a process where they take a machine, update the hypervisor image with new microcode, migrate all VMs off the machine, then make it available to receive VM migrations. Migrating a VM is disruptive, although the disruption is brief.


Oh, that's a good point, you could even determine that your hypervisor doesn't touch any changed microcode paths, and just pause the VMs, take the update, and unpause. (Still seems... brave, maybe? Or at least involves more information that I'd expect anyone outside of intel to actually have, I guess.)


Did tavis sell his soul for infinite exploits?

Also the amount of stuff he finds alone makes you think about how much stuff gov agencies find.. they get a lot of smart people as well.


Tavis collabs with plenty of people (both within Google P0 and outside of Google; private 0day groups). Plus, they have the resources / hardware to fuzz the hell out of everything.


If they push updates for stuff like this to Chrome OS prior to disclosure is it not possible to watch chrome OS updates and reverse engineer exploits, at least in theory?


People RE patches all the time. But they don't contain exploits, just vulnerability fixes.

In the microcode update case, it's not much help for developing an exploit when the update is encrypted.


Occam's razor: Google's processes are all built around hosted software. They're still learning how to actually ship code to users' devices.


Seriously? Is that the simplest explanation you could come up with?

Occam's razor: They do it like everyone else.


compare google QA and release process to someone bound by auto or medical industry.

imagine a google pacemaker. lol


> the exploit on a guest machine causes the host machine to crash resulting in a Denial of Service to other guest machines running on the same host. Additionally, the vulnerability could potentially lead to information disclosure

I don't quite understand how this issue can be a DoS vector, and 'maybe' lead to info disclosure...

If the only issue is that a machine check exception is thrown when it shouldn't be, then I don't see how that leads to info disclosure...


In Tavis' blog post he explains it:

> In general, if the cores are SMT (Symmetric Multithreading) siblings, then you may observe random branches, and, if they're SMP (Symmetric Multiprocessing) siblings from the same package, then you may observe machine checks.

[...]

> However, we simply don't know if we can control the corruption precisely enough to achieve privilege escalation. I suspect that it is possible, but we don't have any way to debug μop execution!

The exploit they published crashes the machine, but it may be possible to write one that does other things, they just don't know how.

Actually useful blog post: https://bughunters.google.com/blog/5997221712101376/the-rept...


> In Travis' blog post he explains it:

Tavis.


Thanks.


It is not an exception. That would have happened if the CPUs had not been buggy.

It is a crash that requires a reboot of the computer.

Therefore a rogue VM can take down the hypervisor and all the other VMs hosted on the same server.

The bug is that, according to the x86-64 ISA specification, that sequence of instruction prefixes should have caused an illegal instruction exception, but it does not cause any exception. Instead, it desynchronizes the microcode execution and the CPU executes other microinstructions than it should, leading to a crash.


If you can get the exception thrown only when another process has a particular state, or if that other process's state can affect the timing of the exception throw, then that's information disclosure.

Typically these are hard to pull off. I remember a case a number of years ago where the proof-of-concept private key exfiltration came months or maybe even a year after the vulnerability was shown, and even then I believe the process took minutes or perhaps hours to run. This stuff isn't magic, it's really hard, but that doesn't mean it's not possible.


They produced the processor equivalent of a segfault. Obviously this is a denial of service but it could also mask a more problematic bug lurking behind the scenes.


I guess it also could 'maybe' lead to people dying... under the right circumstances.

In other words: just marketing.


Why would they take the risk of hotloading microcode instead of rolling out to small numbers of machines and rebooting gracefully?

Unless they run their datacenter at redline all the time, there should be plenty of spare resources to shift load and bring a machine down for maintenance. Right?

I really don't understand how the risk of crashing programs or introducing undetectable errors outweighs the effort of rolling the update out gradually. Surely they have an automated system that can shut down any number of machines without disrupting the network?

Then again I have no idea what a datacenter on Google's scale even looks like. Maybe there's some crucial reason they can't shut down any machines at all ever. Smells bad to me though.


This reads like advertising copy for Google Cloud. "Come and host with us, so you get the advantage of pre-disclosure protection against issues that we discover.".

I wonder to what extent the security industry is self-reinforcing and how many of these vulnerabilities would be discovered by 'the bad guys' taking into account that none of them have these kind of resources. But now the rest of the world has to deal with the fall-out of the disclosure that Google got a head start on and can relatively easily deploy across their cloud. It feels a bit like the hero model to me: only we can keep you safe from the problems that we create.


You may have missed this part:

> From there, Google partnered and collaborated with Intel to securely share the vulnerability mitigation information with other large industry players to ensure they too could respond and protect all users globally (not only Google users).

So while the blog post of course is marketing, Google does not gain any direct competitive advantage from the vulnerability, all the large clouds are more or less on the same starting line.


Well, as long as you’re a large industry player you’re on the same starting line.


What are you suggesting? They shouldn't secure their services before public disclosure?


Yes, either everyone is operating on the same playing field or this is a giant “fuck you” to every independent operator designed to pressure consolidation into mega providers.


I work for a startup that competes (in some sense) with Google Cloud. The resources Google can apply to identifying vulnerabilities are absolutely a competitive advantage they have over us. What's the point in complaining about it? It's a true fact, a very real reason to use GCP as opposed to other cloud platforms.


Google Cloud customers had their servers patched a month before the vulnerability was publicly announced. Your customers got the patch whenever your Linux distro pushed out the microcode update. But this doesn't mean that Azure and AWS also had to wait, I would expect Intel to provide the update to them some time before making it public.

Also, how many vulnerabilities relevant to public clouds does Google detect in one year, and does that outweigh the lack of support and care you can expect from Google?


>But this doesn't mean that Azure and AWS also had to wait

The article says they didn't wait, but got informed.


> 'the bad guys' taking into account that none of them have these kind of resources

State actors definitely have these resources. USA, China, Russia at a minimum. I definitely feel the angst against the effort it takes to protect against bad actors. For example, the adoption of https/ssl came with a lot of worry about how expensive it was to encrypt server traffic. There were people arguing that it just wasn't worth it because the risk felt low, but it turns out compromises were happening in practice. The ability to not have to guard against bad actors would make implementing technology a radically simpler endeavor; we just don't live in that world.


>State actors definitely have these resources. USA, China, Russia at a minimum

Do they? Amazon- and Google-level knowledge, both from live systems with data like all of Google and Amazon, and the developers and analysts to look through data from those systems and comb through source code? I very much doubt Russia has this, at least, though it is a widespread American point of view that Russia has basically everything one day and nothing at all the next (e.g. the Ukraine war). I doubt anyone, except maybe the US, has this.


> Do they?

Have the resources to develop zero days? Of course they do. Like, even Iran, North Korea, Uzbekistan, Vietnam, etc have the resources required to either make their own zero days, pay someone unscrupulous for it or acquire them some other way. It only costs a few million.


Is your solution that these things shouldn’t be looked into by Google and let’s just hope no bad actors find and make use of it?

Really seems like there’s no winning for Google in the public narrative!


Google really needs to be more like Apple and close source everything, keep all innovations to themselves and stop giving away software. Apple has shown that locked down operating systems and ecosystems are the most profitable and legal friendly courses of action.


Hmm. This is either the most sarcastic or misinformed comment I have read in a long time on HN.


Hmm...Apple's quarterly results would say otherwise.


This bothers me even more when it comes to web browsers. (Or should I say the browser?)

I grew up in a world where everything was hackable. Chrome (and Firefox, but it really doesn't matter anymore) have become less and less modifiable and adaptable, in theory and in practice.

I think the main reason there is no viable alternative browser is that it has become far too complicated and far too much effort to write and maintain one. But - and here comes the point - even if we could muster the resources to pull one off, we'd never gain enough trust that it is as secure as Chrome to make it even remotely popular.

As long as things are as they are, it is a game we're never gonna win.


>we'd never gain enough trust that it is as secure as Chrome to make it even remotely popular.

The vast majority of people do not care about security, privacy, etc. at all.

Chrome achieved dominion simply because it's better to use than anything else, and it doesn't help that Firefox also Mozilla'd itself into irrelevance.


Chrome achieved dominion because it was pushed heavily by Google abusing their monopoly. If Chrome had been fielded by a small software company (assuming they could have) it would have maybe reached parity with FF but it would have never gotten to the dominant position that it has today. All of those 'download Chrome here' and all of those articles that happen to move to the top of your search results add up to a pretty big competitive advantage.


Chromium still seems to be plenty modifiable and adaptable. It's pretty modular and there are plenty of active forks of it: Edge, Brave, Opera, etc.


And those projects are dependent on Google's perpetual goodwill to keep Chromium open and up to date. What if Google one day yanks the rug out from under them?


It's instructive to think about some history. Chrome's rendering engine, Blink, is a fork of WebKit. Before 2013, the landscape of browsers and browser-like projects was even bleaker than today: basically everyone depended on Apple's perpetual goodwill. Even Google, because Apple had a notoriously bad code review process for committing to WebKit itself. Then Google forked it.

You ask what if Google one day yanks the rug out, and the answer is, plenty of large companies will fork it.


> Before 2013, the landscape of browsers and browser-like projects were even bleaker than today

What are you talking about? That was the prime of Firefox and browser modification


Since GP was talking about dependence on Chromium, I was naturally talking about the dominance of WebKit at that time. WebKit had almost 100% dominance on mobile and ~60% market share on desktop. Of course by then everyone knew that mobile was the future so it was even fine to target just WebKit.


That applies to any project you fork or build. The team working on it can change licenses or stop development at any time.

Considering how many companies depend on working with Chromium, there is financial backing to fund development if Google were to go away. It is the browser in the most favorable position if this were to happen.


The difference is that most projects are of reasonable scope so that an individual or small team can take over maintenance completely if needed.

Web browsers, on the other hand, have become so complex that it is no longer possible without an enormous amount of resources, so you really are dependent on big G to keep feeding you updates. This complexity is at least in part due to the ever-increasing number of standards and expanding scope that Google themselves are pushing for.


What prevents literally anybody from continuing to support Chromium? Has Microsoft run out of competent engineers?


Microsoft is not and will never be the hero in any story.

> What prevents literally anybody from continuing to support Chromium? Has Microsoft run out of competent engineers?

Remember that Microsoft failed at its promise of "if we don't match Chrome bug for bug, that is a defect in Edge". They failed and gave up trying. My personal conspiracy theory is that it costs well over a billion dollars a year, not including marketing or bribery dollars, just to keep Chrome running.

I don't think Google will pull the rug on Chromium but then again all bets are off if Google has new overlords or if Google isn't making that USD 200B+ revenue year over year. Things feel permanent probably right before giving up the ghost. I think if Microsoft felt like it could avoid using Chromium with its own stuff, it would have never touched Chromium.

tl;dr I doubt Microsoft will put in the money or energy it takes to maintain Chromium.


If chromium were abandoned they wouldn't really have a choice, would they? I guess they could migrate edge to run on top of firefox, but I'm not sure if they'd want to.

It's actually a little interesting, why did they choose Chromium in the first place over firefox, when Microsoft and Google are more directly competitors?


Sure, you can fork it but your fork will never reach the same level of security as the original simply because you cannot afford to put in the same effort as Google does.

My point is that these forks are futile because it takes only one major vulnerability - found by Project Zero and publicized with Google's might - to blow you out of business.


Project Zero notifies vendors before disclosing the vulnerability. They would also be able to provide a patch from Google assuming the code hadn't diverged too much since the fork.


It really feels like Google could announce the cure for cancer and HN would find a way to talk shit about it.


Hmmm...

if the cure for cancer is offered as a subscription model, that requires an active Google account, well...


History is full of organizations that do good with a sprinkle of evil and they do get their fair share of criticism. Christian missions come to mind


Here is the adblocker for your cancer meds.


They'd just cease producing the pills in 18 months.


That's not even a joke. If Google had a magical cancer cure, they would absolutely stop supporting it. The manager in charge of that project got their promotion, and the new manager brought in won't get any credit for maintaining an existing project, they need a new project they can put their name on.


The striking similarity of this pattern with what real world politicians are doing is not a coincidence


The worst part is that you have to take the medication for 20 months before you're cured


They'd develop the cure in the open; all their researchers would end up working for a new startup, starting off as a non-profit but then spinning off a for-profit company which would actually make a working cure, and everybody would wonder why Google lost its lead in cancer research.


If you were emperor of Google Security what would you have them do differently?


Release the information to the rest of Google in the same timeframe and in the same way as Amazon, Intel, Linux, BSD, Microsoft, etc.


This reads more like an advertisement for the person who wrote the post - as they did not even discover the bug, just led security response.


> I wonder to what extent the security industry is self-reinforcing

Works wonders for Apple. ("look, we fixed the bug discovered by Citizenlab").

Security, unfortunately, has become theater. Some things are fixed when and if they are discovered but, in general, the main issues were not addressed (fine-grained permissions, the web browser as a remote code executor, etc.).


Anyone else hate this particular blogging style?

Grandiose "expert" posturing, focusing exclusively on corporate bullshit hierarchy while being scarce on meaningful details. So they chose to hotload the microcode patch instead of rebooting, that's the only actual piece of information.


Once I realized this was reported in august 2023 I kind of just closed the browser window.

It's a hardware bug and it's been reported; that doesn't really seem like enough time to start holding people's feet to the fire.


Grandstanding on the achievements of others. He didn't even rate a mention on Intel's page, so I guess this is his attempt at hogging the limelight.

https://www.intel.com/content/www/us/en/security-center/advi...

It's interesting that Intel more or less hints at having been aware of this before Google reported it. Also: a lot of fear mongering about information disclosure, but there is afaik no proof of concept for that (a way might eventually be found, but it doesn't exist right now).


Yes, I found it depressing how many people needed to be involved for such a simple decision. "We installed a vendor update" is not blog worthy.


Next idea: pay Intel to introduce obscure vulnerabilities so Google can have a head start and try to attract more people to GCP. :)


Oh no, Google is benefiting from investing in security research. Damned if they do, damned if they don't.


Funny how they used to do it before GCP when the company had a moral backbone.


"We like the results of security research, but to be concerned about money is bad taste".


When you pitch a team as research, it’s in very bad taste. Google used to set itself apart from Oracle by not tying stupid commercial interests to teams like that.

The reputation damage to that team for being tangled in GCP shilling is subtle but immense. Pure security researchers will be driven away by this kind of thing and just stick with academia or other research institutions.


LMAO they are free to go to academia now if they don't like Google salaries... ahahaha who am I kidding.


They are. Not everyone is motivated by money, especially researchers.


> They are

Demonstrated by said researchers publishing under the Google brand.


How did they do it before?


For the good of the broader Internet/society. Like an academic research institution.

Value to Google came in the form of attracting top talent because it indicated it was more than a version of oracle/ibm focused on nothing but money making.

It also came in the form of feedback on how to improve software security in general. Google was also able to tap the team resources to look into the security of a new proposal.

8 years ago if the notion of “we’ll sit on vulnerabilities, patch them in our products, and use that as advertising for the products” came up, it would be laughed out of the room.

People joined project zero to do cutting edge research and improve the world, not to line Google’s pockets through early disclosure access.


Security quality seems like a competitive advantage for GCP.


The "Reptar" name may have different cultural references based on your generation: cf https://rugrats.fandom.com/wiki/Reptar_(character)


That's clearly the namesake. It's a bug that triggered machine check exceptions that halted the machine, and Reptar's known for the line "HALT! I am Reptar! HALT! I am Reptar!" Is there another Reptar?


My cat.


What’s the other reference? Rugrats is the only one I associated it with (from the movie trailers on network tv, I didn’t actually watch rugrats beyond a couple episodes while visiting relatives with cable).



