Reverse Engineering the M1 [pdf] (blackhat.com)
224 points by todsacerdoti on Aug 5, 2021 | 52 comments



Of course most of the platform weirdness is down to "we're vertically integrated and our business goal is the end product and getting that product done fast", but some of the design decisions are kinda baffling even taking that into account. Like what exactly did they gain by making a weird non-PCIe NVMe situation? Is it really any easier/faster to make the kernel handle that crap than to put the drive on a virtual PCIe bus in hardware+firmware? Are they... (oh no) trying to improve boot speed by not having to discover the drive on PCIe?!


Why have PCIe when you don't need it? Nothing else is on PCIe internally on mobile SoCs. It's just not a thing. That would just add silicon that you don't need.

All these "crazy" design decisions only look crazy from the point of view of x86/server hardware. From the point of view of an embedded SoC this is all reasonable and standard practice.


Well, yeah – embedded standard practice is not caring about standards and expecting the kernel to accommodate you. Makes sense for Apple with their vertical integration, but even the less integrated things do that; see the entire concept of a "BSP".

But that is what sucks. That's why embedded is creating all these piles of e-waste doomed to only run crappy outdated vendor kernels unless someone invests huge effort into reverse engineering. The whole "we don't need it" attitude towards standard things that mainline kernels Just Work with is evil.

And now – with both Apple and Qualcomm – we have this embedded crap powering general purpose laptops…


Embedded has the DeviceTree standard to serve the same purpose as PCIe enumeration. This is even supported inside UEFI (which is how we will boot standard distros, once our kernel patches trickle upstream). There's no reason why not using PCIe means "creating all these piles of e-waste doomed to only run crappy outdated vendor kernels". What causes that isn't whatever device enumeration you use; it's vendors not bothering to upstream anything.

(This is another difference between the Asahi Linux project and Corellium's kernel; we're going through the bureaucracy of standardizing all of our DeviceTree bindings, which takes time but establishes a common reference that other OSes such as OpenBSD, and bootloaders such as U-Boot, can use to support drivers for this hardware on top of our first-level bootloader.)


How exactly does one standardize DeviceTree bindings?


The canonical repo is the Linux kernel tree, so you go through there (and the DeviceTree maintainers). This is mostly for practical reasons since they are the primary consumer, but the bindings are packaged for use by other OSes too.


> And now – with both Apple and Qualcomm – we have this embedded crap powering general purpose laptops…

Indeed. This is why I'm not as enthusiastic about an ARM desktop future as everyone else is, and honestly I'm quite terrified. The happy accident of the original IBM PC is that it had an open BIOS and open HW interfaces, which allowed HW vendors to come up with clones compatible with the rest of the ecosystem. That allowed for a chaotic anarcho-democracy where no vendor had control over the ecosystem, so today in the PC realm we have this open garden where everyone can install virtually whatever HW and SW they want.

Now, Apple, Qualcomm, Microsoft, and Nvidia (through its desired acquisition of ARM) have seen the mistakes IBM made that got it kicked out of its own ecosystem, and instead of going the standardized, open route, they try to create their own HW+SW walled gardens where they can rule with an iron fist and lock everything in.

I don't care if they bring 2X the performance/slimness; I just don't want to be locked into a walled garden and then be monetized through rent-seeking behavior.


The IBM PC was never "standardized" in any sense. It was simply one of many de-facto standards. Early PCs didn't even support any sort of hardware enumeration; that only came way later with "Plug and Play"-compatible hardware.


NVMe requires a co-processor (which Apple calls "ANS") to be up and running before it works. This co-processor firmware seems to have a lot of code and strings dealing with PCIe. Now I haven't looked at the firmware in detail but I'm willing to bet that the actual drives are on a PCIe bus (or at least used to be on a PCIe bus on previous hardware).

It's just that this bus is not exposed to the main CPU but only to this co-processor instead. The co-processor then seems to emulate (or maybe it's just a passthrough) a relatively standard NVMe MMIO space.


Yes, the raw NAND storage modules are connected over PCIe on all M1 machines, to dedicated root ports that are behind ANS. As far as I can tell ANS implements (parts of?) the FTL and data striping and other high-level features, and effectively RAIDs together the underlying storage modules into a single unified NVMe interface. So in this sense, the PCIe here is just an implementation detail, the physical/logical interface Apple chose to connect its Flash modules to the built-in non-volatile storage controller in the M1.
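
Purely to illustrate what "RAIDing together" could look like, here's a toy C sketch of round-robin striping; the actual layout and FTL logic inside ANS are unknown to me and certainly far more involved:

    #include <stdint.h>

    /* Toy model: stripe logical blocks round-robin across the NAND
     * modules, so several modules appear as one namespace. Invented for
     * illustration; not the real ANS scheme. */
    struct stripe_target {
        unsigned module;    /* which NAND module to hit */
        uint64_t lba;       /* block address within that module */
    };

    static struct stripe_target stripe_map(uint64_t logical_lba,
                                           unsigned n_modules)
    {
        struct stripe_target t = {
            .module = (unsigned)(logical_lba % n_modules),
            .lba    = logical_lba / n_modules,
        };
        return t;
    }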


Ah, that makes a lot of sense. Then this unified MMIO NVMe is "just" emulated inside ANS.


Are there any plans to replace the Apple firmware for ANS as well, or is that so locked down with signature checks that we can't expect to be able to?


> Like what exactly did they gain by making a weird non-PCIe NVMe situation?

When Intel did exactly that, there was a clear plausible chain of decisions leading to that madness. I have no clue what may have led Apple in this direction, but the excuses probably aren't any more pathetic than trying to explain why Intel has shipped two mutually-incompatible "solutions" for preventing NVMe drives from working out of the box with unmodified Windows.


VMD is weird, but at least it doesn't require big intrusive changes like decoupling your NVMe driver from PCIe. It's more like a weird special PCI-PCI bridge. Only needs a little extra driver: https://reviews.freebsd.org/D21383


VMD isn't the only method Intel has used to mess with how NVMe works. Their consumer chipsets going back at least to Kaby Lake had an even weirder "feature" that hid NVMe devices from PCIe enumeration and made them only accessible through proprietary interfaces on the chipset's SATA controller. Intel had to start using VMD on consumer platforms instead when AMD forced them to start providing more PCIe lanes from the CPU.


Well, their SSDs are not PCIe devices; they are directly connected to the SoC. I think their use of NVMe is just a compatibility stopgap, and they will probably move to a custom direct-access interface in the near future.


They're using NVMe with customizations because NVMe is a good standard. There's no reason to reinvent it from scratch when it works; they can just make the non-standard changes they feel like making, as they have already done.

There's no fundamental reason why NVMe has to be tied to PCIe; it just happens to be that way on existing devices.


Wow, I would buy every book written by this person; the amount of knowledge and experience required to pull off a stunt like that feels overwhelming.

Does anyone know if https://dougallj.github.io/applecpu/firestorm.html is the "Dougall Johnson's work if you can find it" reference, or is there some dark web version?



> A14/M1 use a non-standard NVMe queue format

> USB4... but Apple made it their own, as usual

I am wondering about these two points. Why?


If you're going to have your own platform where you control all the parts and none of them are replaceable, why would you religiously stick to the standards when you don't need to? Maybe they saved a slight amount of power by not supporting 64K queues, as the NVMe standard allows. A standard will try to be all things to all people, but it will never be the best thing for any given application. There is always room to specialize.


The non-standard NVMe queue format is because they support offloading encryption/decryption to the NVMe controller, so they had to increase the queue entry size to be able to fit an encryption key in there.


The "queue" format itself is incredibly similar to the normal NVMe queue.

The normal queue is (more or less) a ring buffer with N slots in memory and head/tail pointers. You append the command to the next slot and advance the tail by writing to a doorbell. Once the controller is done, it advances the head the same way.

Apple's "queue" instead is just a memory region without those head/tail pointers. Command submission now works by again putting the request into a free slot followed by just writing the ID of that slot to a MMIO register. Once a command is done the CPU again gets an interrupt and can just take the command out of the buffer again.

This probably makes the driver a little bit easier to implement.
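
To make the contrast concrete, here's a rough C sketch of the two submission paths as described above. The register names and the free-slot bookkeeping are invented for illustration; this is not the actual NVMe or Apple register layout:

    #include <stdint.h>

    #define QUEUE_DEPTH 64

    struct nvme_cmd { uint8_t bytes[64]; };  /* 64-byte submission queue entry */

    /* Standard NVMe: ring buffer in memory, tail advanced via a doorbell */
    struct nvme_sq {
        struct nvme_cmd    slots[QUEUE_DEPTH];
        uint32_t           tail;
        volatile uint32_t *doorbell;         /* MMIO tail doorbell */
    };

    static void nvme_submit(struct nvme_sq *sq, const struct nvme_cmd *cmd)
    {
        sq->slots[sq->tail] = *cmd;
        sq->tail = (sq->tail + 1) % QUEUE_DEPTH;
        *sq->doorbell = sq->tail;            /* controller consumes up to tail */
    }

    /* Apple-style (per the description above): no head/tail, just slot IDs */
    struct apple_sq {
        struct nvme_cmd    slots[QUEUE_DEPTH];
        uint64_t           busy;             /* bitmap of in-flight slots */
        volatile uint32_t *submit_reg;       /* hypothetical MMIO register */
    };

    static int apple_submit(struct apple_sq *sq, const struct nvme_cmd *cmd)
    {
        for (int id = 0; id < QUEUE_DEPTH; id++) {
            if (!(sq->busy & (1ULL << id))) {
                sq->busy |= 1ULL << id;
                sq->slots[id] = *cmd;
                *sq->submit_reg = id;        /* tell the controller which slot */
                return id;                   /* the completion IRQ frees the slot */
            }
        }
        return -1;                           /* no free slot */
    }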

On top of that, a similar structure (which identifies the DMA buffers that need to be allowed) also needs to be put into their NVMe-IOMMU, with a reference to the command buffer entry. The slightly weird thing about the encryption is that you put the key/IV into this buffer instead of the normal queue. My best guess is that this IOMMU design pushed them to also simplify the command queue to make the matching easier.

Hiding the encryption part inside the IOMMU also makes sense for them, because the whole IOMMU management is hidden inside a highly protected area of their kernel with more privileges, while the NVMe driver itself is just a regular kernel module which possibly doesn't have access to the keys.
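
And a purely hypothetical sketch of that companion entry, just to illustrate the idea; every field name and size here is invented, and the real layout is certainly different:

    /* Invented layout: one IOMMU-side entry per command slot, naming the
     * DMA buffer the controller may touch plus the per-I/O key/IV. */
    struct apple_nvme_iommu_entry {
        uint32_t slot_id;   /* matches the command buffer slot */
        uint64_t dma_addr;  /* buffer the controller is allowed to DMA */
        uint64_t dma_len;
        uint8_t  key[32];   /* encryption key lives here, not in the queue */
        uint8_t  iv[16];
    };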


Ah, I was thinking of what they did for the T2 Macs (change the queue entry size), not the new changes for M1. But yeah, once they're doing proprietary variants of the NVMe spec, they can do whatever they want and probably had a good reason to make this change too.

(For those following: sven has been working on the NVMe stuff for Asahi Linux recently, he knows about this more than me)


There's probably an interesting groupthink/psychology to Apple, but I think in this case the proprietary-ness of parts of the M1 is them focusing on getting it out the door: Apple have a lot of experience making ARM mobile chips, but this was their first desktop part.

I say this based on nothing but gossip, of course.


Think Different™

Apple has a really long history of doing stuff like that.

https://news.ycombinator.com/item?id=12924051


I remember when Microsoft were the archetypal "not invented here"-ers.


For USB, Apple makes proprietary connectors so that competing cable makers have to pay them for a license (along with other licensing terms)


This is not true of any M1 device.


Is there a video recording to go with this?


AFAIK, Black Hat talks/presentations go public after around half a year.


First thing I was gonna ask. I don't think I'm good enough yet to get much value out of the slides without some voiceover.


> M1 Linux does not have a cool logo or name

Wonder if this is a subtle jab at Asahi Linux. Hopefully not, as I was thoroughly unamused by Corellium’s previous antics regarding Asahi Linux. It would be a lot nicer to see at least polite relations in the future, if not collaboration…


I'm pretty sure it is.

They also apparently still haven't figured out that they have a bad setting in their tunables (one that macOS does not use) which disables the SError reports when I/O writes use the wrong memory type. Those writes definitely don't get "silently" ignored if you don't turn off the error reports :-)

But this presentation finally answers the question of why Corellium did this. It wasn't a fun side project or a way to contribute to the community. Their Linux port is a validation platform for their emulation/VM product. Now it makes perfect sense. Releasing it and claiming they were going to upstream it was a PR stunt; they haven't updated their repo since February and didn't reply to any of the upstreaming mailing list threads they were CCed on. I'm pretty sure they don't have any actual interest in collaborating with anyone or upstreaming anything. To them, this is a platform validation tool for their commercial product, and it only has to work once, not be maintainable or upstreamable. They have no business reason to spend time on that.

Apple have done the same, by the way. They have internal Linux ports to their SoCs that they use for silicon validation.

I should probably give a talk on m1n1 and the hypervisor I built for M1 reverse engineering...


> I should probably give a talk on m1n1 and the hypervisor I built for M1 reverse engineering...

Please do. Pretty please.


Not in "proper talk" format, but I do have a 3h stream where I go over the hypervisor, why it was made, and how all the different parts of the code work, if you're interested.

https://youtu.be/igYgGH6PnOw


Isn't the feeling mutual? I was under the impression that Asahi didn't really want Corellium's contributions because they considered Corellium's looks at proprietary Apple code to be an existential threat to the project's need for clean room RE.


Sometimes we look at Apple code too, within reason and where strictly necessary.

The problem is that there are safe ways of doing this and terrible ways of doing this. When I look at vendor code, my goal is to understand what I need to do to the hardware to get it to behave how I want. I don't care how Apple's code works; I only want to use it to help answer questions about the hardware, and then build my own driver with my own approach.

However, I've seen other people (including a Corellium co-founder, in a past project that predates the company) "reverse engineer" code by literally translating it back to compilable C code, with the same logic, functions, and everything. And that is where you run into copyright problems.

And the issue is that unless you re-do all the work from scratch, it's not easy to tell from the output whether this has happened or not. I've been bitten by this in the past. And so, as the Asahi Linux project founder, I can't afford to accept contributions from any random person that are not clean-room, without having some level of trust. I have a few long-time friends in the project who I know won't screw this up, so I'm not worried about them taking a peek at Apple's drivers in Ghidra. But if it's someone I don't know, they need to build trust and we need to have a conversation about this first.

Corellium have so far refused to have any conversations with us. They never replied to any emails they were CCed on, nor have they joined our project IRC channels.

So, without being able to build trust that their approach is kosher, I can't just take their code and use it.

Practically speaking, Corellium certainly have internal documentation on the hardware that they could share (much like we're dumping info on our wiki) and that would be immensely useful to us, but I doubt they'd ever do so, as it would potentially help someone compete with their product. Asahi Linux is, in this way, fundamentally at odds with Corellium: we're trying to reverse engineer the M1 and share that knowledge, while they did so to build a proprietary product, and keeping that reverse-engineered knowledge a trade secret is important to them.


In any case, it never had to turn petty, even if they really did feel like RE knowledge from Sandcastle was being used. Even if code wasn’t shared, knowledge could have been. I came out with the feeling that they had bigger concerns about optics above all, and they couldn’t use the same approaches that might’ve worked when they were the David and not the Goliath.

I don’t want to be the asshole dredging up drama needlessly, but the line just feels snide after all that happened. Like, if you wanted a slick logo, name and marketing page, you all could just make one; you had absolutely no trouble doing so for Project Sandcastle. I don’t see why this statement needs to be made here and now, as nobody would’ve batted an eye if they hadn’t said anything.


>Even if code wasn’t shared, knowledge could have been.

Legally it would still be an issue for someone who has seen the code to share "knowledge".


Isn’t that false? I thought that being able to do exactly that is basically the principle behind Chinese-wall reverse engineering.


Emphasis on "wall" here. People who are being careful (and when your potential opponent is as big and powerful as Apple it's best to be careful) don't let the engineers who have seen the secrets talk directly to those who are doing the clean-room reimplementation. The first group documents what they found and how they found it, lawyers check the report for anything iffy, and if the implementers need to talk to the first group they don't do so directly, everything goes through legal.

But better to get legal advice on this than just listen to some random HN person (like me).


That interpretation of "clean room" is largely a myth in the context of open source reverse engineering projects. That level of rigour only really applies when you have companies directly opposed to each other, e.g. someone making unlicensed games for a game console by reverse engineering the DRM.

Clean-rooming is not a legal requirement to avoid copyright infringement; it's a legal defense against claims of copyright infringement.

In practice, the goal when doing open source projects like this, without a legal team to consult, is to use methods and approaches that yield a result similar to strict clean-room RE and produce an end product that is not a derivative work of the original code. The tricky thing here is that you need to be able to trust that the people doing the work know how to do this safely.


That may apply where patents are concerned.

For copyright, there really is no need to be as paranoid. Copyright doesn’t protect generic logic, only the particular rendition of it.


Totally agreed: the pettiness in this case (if we're reading a one-off sentence fragment in a slide correctly) is unacceptable. I was just commenting that I don't see collaboration occurring, given the well-meaning but fundamental differences of opinion, even once you strip away the pettiness.


I believe this is false. The Corellium guys are (much?) further along technically. But they don't care to put their code in a format that upstream will ever accept. I believe this was the main reason collaboration went nowhere. Corellium's forked kernel will always be little more than a marketing toy because of this.


FWIW, they had a ~year of head start working on their A14 kernel, but they also haven't done any graphics work and our userspace stack now passes >90% of the GLESv2 tests... so I'm not sure it's fair to say their demo kernel (which they haven't touched since February) is further along technically any more.

We're focusing on different things, and now that the DART code is in good shape (which makes PCI and USB work) and ASC/RTKit support is coming along, the only significant driver code they have that we don't is the Thunderbolt/USB4 stuff. The rest is trivial stuff (SPI, clocking, CPU frequency scaling) that takes more bureaucracy to figure out how to do properly and upstreamably than actual code. And the drivers we are writing are higher quality than theirs (e.g. their NVMe support sometimes wedges in a way that can apparently only be fixed with a hard power cycle; sven's patches for Asahi Linux not only don't do that, but actively fix the wedge so you can use the Corellium driver again after booting with our kernel... I don't know why this happens, but I suspect it has to do with their ASC/RTKit driver being buggy).


I assumed they had a working graphics stack when I said they were further along :p. But yeah, I follow the Asahi channels on IRC and am a GitHub sponsor myself!

So I'm definitely not knocking you guys' effort or anything. I'm very grateful that I will soon have the opportunity to buy a thin & light laptop that doesn't throttle itself into the ground if you so much as think about using the GPU.


Well, they objectively were.


Haha, I have to admit I did think the Asahi Linux logo has way too many different colors.


[flagged]


Arguably, the comment is off-topic, but TIL about the BMW M1 :)


Take that, downvoters! :-D

Myself, I'm so old I genuinely still think of that first when I see the string "M1". (Also, I hear it's a motorway in England.) So I thought it could be worth mentioning as A) a reminder that headlines could be clearer, and B) an interesting tidbit of knowledge for those who weren't familiar with what else the name could refer to.



