TikTag: Breaking ARM's memory tagging extension with speculative execution (arxiv.org)
180 points by skilled 5 months ago | 26 comments



ARM has confirmed the possibility of an attack on systems with Cortex-X2, Cortex-X3, Cortex-A510, Cortex-A520, Cortex-A710, Cortex-A715 and Cortex-A720 processors, but does not intend to make changes to the CPU to block the problem[0], because under the MemTag architecture tags are not considered sensitive data to applications. Chrome's security team acknowledged the issues but won't fix them, as the V8 sandbox isn't designed for memory data confidentiality and MTE defenses aren't enabled by default.

It's mentioned quite early in the paper but there's a prototype toolkit on GitHub[1] if you want to jump straight into that.

[0]: https://developer.arm.com/documentation/109544/latest

[1]: https://github.com/compsec-snu/tiktag


I'm not sure anyone is really surprised by this. Apple, for example, likewise calls out that you are at risk of an attacker using timing attacks to construct an authentication oracle for PAC, which is much more explicit about when exactly authentication checks actually happen.


"Confirmed the possibility" -- it was known when they invented this.

https://infosec.exchange/@david_chisnall/112632040031740747


Are you going to be posting the original Shor's algorithm paper when the first quantum prime factorization hits the streets too?


Does this actually have any real world implications? If you're using MTE you already know there's a simple attack that succeeds with 1/16th probability (or maybe greater if you reserve some tags for special purposes or have some exploitable pattern in how your allocator hands out tags). In that setting this pushes the attack's success probability to 100%. A nice technique, but (hopefully) you've already architected things assuming MTE will be broken, so this doesn't change anything.

Certainly MTE was always pushed as a debugging feature only.


> Certainly MTE was always pushed as a debugging feature only.

This claim seems to come from ARM marketing, but every software company applying it - notably Google, which I believe may have worked with ARM on this - mentions these types of tags in a security context (I believe Herb Sutter also talks about this as a way to harden C++ code). In fact, it's seen as one very important way to harden existing legacy C/C++ code (and even unsafe Rust). There's a big difference between a 100% success rate and 1/16th - while you will still get an exploit through with a 1/16th chance, the chances of detection rise significantly, which isn't nothing in a security context as it leaves a trail of breadcrumbs to investigate.
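
To put rough numbers on that 1/16th point (a back-of-envelope sketch, assuming uniformly random, independent 4-bit tags, which a real allocator may not guarantee):

    /* Odds for a blind tag-guessing attacker under the assumptions above. */
    #include <stdio.h>

    int main(void) {
        double p = 1.0 / 16.0;  /* chance a single guess matches the tag */
        printf("P(first guess lands silently):    %.1f%%\n", 100 * p);
        printf("P(a tag-check fault fires first): %.1f%%\n", 100 * (1 - p));
        printf("Expected guesses until success:   %.0f\n", 1 / p);
        return 0;
    }

So a blind attacker is far more likely to leave a crash report behind than to land the first attempt; leaking the tag, as TikTag does, removes exactly that detection pressure.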

> It will take more fundamental mitigations that address the root issue of memory corruption, like MTE, in order to dramatically raise the bar for attackers.

https://googleprojectzero.blogspot.com/2024/06/driving-forwa...

> Hardware memory tagging may fix some security pitfalls of C++, while still allowing high performance. We are looking forward to see a more broad adoption of hardware memory tagging in the future and suggest using Scan on top of hardware memory tagging to fix temporal memory safety for C++

https://security.googleblog.com/2022/05/retrofitting-tempora...

Of course, MTE + Scan as described above may not be vulnerable to this attack, but my point stands that MTE is very much viewed as a memory safety hardening feature (and thus carries the security implications that come with safety hardening).

This isn’t that bad news either. It may leave existing chips vulnerable but the next iteration of MTE will probably account for this problem (assuming you can’t patch microcode to solve this).


Are Google really pushing it as a 'proper' security feature? You've quoted a line at the end of a blog post that doesn't elaborate further. The Android documentation, for instance, states '[MTE] helps detect use-after-free and buffer-overflow bugs': https://source.android.com/docs/security/test/memory-safety/... without making any stronger claims.

> This isn’t that bad news either. It may leave existing chips vulnerable but the next iteration of MTE will probably account for this problem (assuming you can’t patch microcode to solve this).

I might argue fixing it is actively bad. It helps push the idea that this is a feature to actually use for security rather than to augment bug detection. Still, fixing it shouldn't be too hard: just speculate as you do currently but always assume the tag check passes, so you get the same speculative behaviour regardless of tag values (likely easier said than done, of course; probably some fun corner cases where they'd still differ).


I’d say yes. Reading the blogs fully (and reading comments from people like Herb) makes it pretty clear it’s seen as a defense-in-depth mechanism. Yes, it’s a debugging tool, but since it’s always running at runtime it’s also usable for security purposes.

Whether it should be used for security or not is irrelevant - it clearly will be, and I don’t see why it’s inherently undesirable. It seems like a useful technique to further raise the complexity of exploit chains needed to execute an attack, and in some applications the speculation bypass may not even matter (e.g. CHERI + MTE is not subject to this attack based on what’s been written).

The annoying thing is that ARM engineers took your position instead of realizing it would be used for security applications anyway and fixing the speculation issue during the design phase, since they did know about this.


MTE can be enabled in two modes. One is synchronous: tag checking occurs on the relevant load/store instruction and raises a segmentation fault immediately on a mismatch; this mode is slower but retains precise information about where the bad memory access happened (hence intended for debugging). The other is asynchronous: mismatches are only reported later, after a context switch; this mode has minimal overhead but loses that debugging information (hence intended as a run-time protection).
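
On Linux the mode is chosen per thread via prctl(PR_SET_TAGGED_ADDR_CTRL, ...). A minimal sketch, assuming a 5.10+ kernel, headers exposing the MTE constants, and MTE-capable hardware:

    #include <stdio.h>
    #include <sys/prctl.h>

    /* Fallbacks in case the installed headers predate MTE/TBI support. */
    #ifndef PR_SET_TAGGED_ADDR_CTRL
    #define PR_SET_TAGGED_ADDR_CTRL 55
    #define PR_TAGGED_ADDR_ENABLE   (1UL << 0)
    #endif
    #ifndef PR_MTE_TCF_SYNC
    #define PR_MTE_TCF_SYNC  (1UL << 1)
    #define PR_MTE_TCF_ASYNC (2UL << 1)
    #define PR_MTE_TAG_SHIFT 3
    #endif

    int main(void) {
        /* Synchronous checking (precise faults, slower); swap in
         * PR_MTE_TCF_ASYNC for the low-overhead asynchronous mode.
         * 0xfffe allows random tags 1-15 and excludes tag 0. */
        unsigned long ctrl = PR_TAGGED_ADDR_ENABLE
                           | PR_MTE_TCF_SYNC
                           | (0xfffeUL << PR_MTE_TAG_SHIFT);
        if (prctl(PR_SET_TAGGED_ADDR_CTRL, ctrl, 0, 0, 0))
            perror("prctl");  /* e.g. no MTE on this CPU/kernel */
        return 0;
    }

Note the prctl only selects the checking/reporting mode; memory is only tag-checked once it's mapped with PROT_MTE and the allocator actually colours allocations (as Android's scudo does when MTE is enabled).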


The difference between 100% reliability and 7% reliability in this environment might exceed a million dollars.


I totally agree. The authors have found a security vulnerability in a mechanism that wasn’t even trying to protect the software from attacks. I don’t see any relevance in this paper except as a good exercise.


The authors' hypothesis about the STLF (store-to-load forwarding) case:

> Considering the affected core (i.e., Cortex-A715) dispatches 5 instructions in a cycle [55], it is likely that the CPU cannot detect the dependency if the store and load instructions are executed in the same cycle, since the store information is not yet written to the internal buffers.

> If Len(GAP) is 4 or more, the store and load instructions are executed in the different cycles, and the CPU can detect the dependency. Therefore, the CPU skips the tag check and always forwards the data from the store to load instructions.

> If Len(GAP) is less than 4, the store and load instructions are executed in the same cycle, and the CPU fails to detect the dependency and performs the tag check for the load instruction. In this case, the forwarding is blocked on tag check faults.

This would mean there's no forwarding from stores to loads within the same dispatch group on these cores?
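
For concreteness, the instruction shape whose gap length the authors vary looks roughly like the snippet below. This is a crude illustration only, not the paper's actual TikTag gadget, which as I read it runs the sequence speculatively and recovers the tag-check outcome from cache state rather than performing an access that could fault architecturally:

    #include <stdint.h>

    /* Store -> GAP -> load pattern (Len(GAP) = number of filler
     * instructions, fixed at 3 here). A wrongly guessed tag would simply
     * fault in synchronous MTE mode; the real gadget avoids that. */
    static inline uint64_t store_gap_load(uint64_t *tagged_ptr)
    {
        uint64_t out;
        asm volatile(
            "str %[v], [%[p]]\n\t"  /* store through guessed-tag pointer  */
            "nop\n\t"               /* GAP filler: fewer than 4 keeps the */
            "nop\n\t"               /* store and load in the same dispatch */
            "nop\n\t"               /* group on Cortex-A715, per the quote */
            "ldr %[o], [%[p]]\n\t"  /* dependent load: forwarded or not?  */
            : [o] "=r"(out)
            : [v] "r"((uint64_t)0x42), [p] "r"(tagged_ptr)
            : "memory");
        return out;
    }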


> No problem, I was told, it’s not intended as a mitigation technique, it’s intended for at-scale debugging. Then certain companies decided it was a security technology.

Yeah, and it did work well for that use case. https://grapheneos.social/@GrapheneOS/112066872276203917


My brain automatically read this as "TikTok" -> Why would they release research on security if they are being targeted for security concerns


> Why would they release research on security if they are being targeted for security concerns

Hmm. That doesn’t really follow IMO.

Every big corporation needs to have information security people. And most of them probably do some form of research as well. That’s the only way to stay on top of information security.

Whether they publish anything or not is up to them. But it’d be super silly to base that decision on the fact that the US government is banning TikTok.

By that line of argument, couldn’t you just as well say that they shouldn’t reveal that they hire infosec people either? Yet, for example, at https://careers.tiktok.com/m/position?category=6704215862603... they have several job openings related to infosec, such as this one: https://careers.tiktok.com/m/position/7379470900063095078/de...

> The Global Security Organization provides industry-leading cyber-security and business protection services to TikTok globally. Our organization employs four principles that guide our strategic and tactical operations. Firstly, we Champion Transparency & Trust by leading the charge in organizational transparency, prioritizing customer trust, and placing user needs first. Secondly, we aim to maintain Best in Class Global Security by proactively identifying and reducing risks while enabling innovative product development. We constantly work towards a sustainable world-class security capability. Thirdly, we strive to be a Business Catalyst & Enabler by embodying the DNA of technical innovation and ensuring our Global Security operations are fast and agile.

Heck, here they even say just that: “proactively identifying and reducing risks while enabling innovative product development”. A natural part of that would be doing, for example, the kind of security research that the OP's article is about.



This is not a bug caused by memory unsafe languages.


No, this is a feature created for memory-unsafe languages which has turned out to be yet another memory safety attack vector. But there's no way to prevent this.


> for memory-unsafe languages

And languages which require unsafe blocks to write any real system code.

> But there's no way to prevent this

There isn't. There are only mitigations. The environment is actively hostile.

You can try to bake them into the hardware, which has huge up-front design costs, and if a vulnerability is ever found, they become useless instantly.

You can try to bake them into the language, which has moderate continual design costs, but if someone simply types 'unsafe' anywhere in your codebase they become useless instantly.

You can try to enforce them in the framework of your own code, which has moderate up-front design costs and is possibly less reliable than the other methods, but they can be changed and redeployed when vulnerabilities are discovered.

These problems don't go away simply because you made a particular hardware or language choice. You can't sweep this engineering under the rug and then act like your software is more secure than anything else.


> And languages which require unsafe blocks to write any real system code.

The difference you've missed is cultural. Rust's unsafe blocks just denote code which the compiler can't check, which, if you have the appropriate culture, makes those blocks a focus for human oversight in a way that cannot be sustained in legacy languages where the whole codebase has the same problem.

Also, there's a lot of security and performance critical code which isn't "real system code" by this weird definition. Whole domains where it's just greed and stupidity versus a correct solution with better performance, and yet the greed and stupidity gets ahead as often as not. Depressing.


> The difference you've missed is cultural.

Describe how this culture is created and enforced by the language itself. Otherwise, if it does actually exist, I'd worry it's simply an ephemeral phenomenon that will disappear if the language becomes more commonplace. You'd also expect to be able to develop this culture in any other project in any other language.

Further, in my experience "lack of culture" is not why software engineers make mistakes.

> Whole domains where it's just greed and stupidity versus a correct solution with better performance,

In open source projects? If a better, correct, more performant solution existed, people would use it. If they're not, something else is causing it, and I highly doubt it's because they're just "stupid" or "greedy."


You have the cart in front of the horse. Culture drives language. This is why the WG21 musings, especially those by Bjarne himself, are so amusing. Bjarne actually gave a big talk about how he can solve this, and the talk includes the C word exactly once, in a quote from someone else explaining the problem, which Bjarne promptly dismisses.

> Further, in my experience "lack of culture" is not why software engineers make mistakes

Culture is why the mistakes aren't caught and instead go uncorrected. The language features, the tooling, and larger engineering processes which rely on them are a product of culture.

> If a better, correct, more performant solution existed, people would use it.

This is a really common mistake on HN: the false belief that people will somehow magically know a choice would be better and always make that choice, so all their choices must have been optimal. Nothing about our world suggests this is the case, and yet it's so often relied upon as if it were somehow obvious. And no, it's not just stupidity; laziness is also crucial - humans are very lazy.


Memory-safe languages contain memory bugs as soon as, e.g., you want them to be fast enough to implement a JIT.


The only exception is if the microcode responsible for this was written in C somehow. I highly doubt it though.


Oh no, I meant that this feature was made so people could continue to use memory-unsafe languages, but it turns out to be defeatable. If only there was some way to prevent this.


That's the point. /s



