> For example, I have not encountered hardware for reading vertex attributes or uniform buffer objects. The OpenGL and Vulkan specifications assume dedicated hardware for each, so what’s the catch?
That is not my understanding of those specs (as someone who has written graphics drivers). Uniform Buffer Objects are not a "hardware" thing. They're just a way to communicate uniforms faster than one uniform per API call. What happens on the backend is undefined by those specs and is not remotely tied to some hardware implementation. Vertex attributes might have been a hardware thing long ago, but I'm pretty sure there are older references, and this 2012 book (OpenGL Insights, https://xeolabs.com/pdfs/OpenGLInsights.pdf, chapter 21) already talks about GPUs that don't have hardware-based vertex attributes.
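Roughly, the difference at the API level looks like this (a minimal sketch, assuming a GL 3.1+ context with a function loader already set up; `prog`, `ubo`, and the block/uniform names are made up):

```c
/* Assumes a GL 3.1+ context and loaded function pointers (e.g. glad/GLEW). */
#include <GL/gl.h>

/* The "one uniform per API call" way: every value is its own driver call. */
static void upload_uniforms_individually(GLuint prog)
{
    glUseProgram(prog);
    glUniform1f(glGetUniformLocation(prog, "u_time"), 1.0f);
    glUniform4f(glGetUniformLocation(prog, "u_tint"), 1.0f, 0.5f, 0.25f, 1.0f);
    /* ...dozens more calls like this... */
}

/* The UBO way: one buffer update covers the whole uniform block. */
static void upload_uniform_block(GLuint prog, GLuint ubo,
                                 const void *data, GLsizeiptr size)
{
    GLuint idx = glGetUniformBlockIndex(prog, "Globals");
    glUniformBlockBinding(prog, idx, 0);          /* block -> binding point 0 */
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);  /* binding point 0 -> buffer */
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferSubData(GL_UNIFORM_BUFFER, 0, size, data);  /* one upload, done */
}
```

Where the driver actually puts those bytes (dedicated descriptor hardware, or just a buffer the shader reads with ordinary loads) is entirely its business; the spec only pins down the observable behaviour.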
This is completely off topic, I am very sorry, but given your comment and your username -- are there any learning resources you would particularly recommend for graphics programming? I have collected a few already (however they are all beginner level), and was wondering if there are hidden gems I missed.
An excellent starting point for anyone interested in low-level graphics programming is Sokolov’s tinyraytracer [0]. It’s also a great way to learn a new language (work through the code while porting it to $DIFFERENT_LANGUAGE).
> Simply put – Apple doesn’t need to care about Vulkan or OpenGL performance.
OpenGL and Vulkan make it easier for an implementer to build such specialized HW, but they don't assume it exists in any other way. If your HW is fast enough, there is absolutely no need to implement a specialized block for it, and you pay no performance penalty.
It's trivial to implement things like the input assembler without specific HW: just issue loads. Going the other way around would be a massive pain - trying to sniff out which loads fit a pattern that could be tossed into a fixed-function input assembler. That's a no-go.
This is the right way around to do things, as there is no performance penalty for "emulating" it; there is nothing to emulate in the end.
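For illustration, here's that load-plus-convert pattern written out in plain C (just a sketch of the idea, not anything a real driver emits): compute an address from index and stride, issue plain loads, and do the format conversion in regular ALU code.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* An interleaved vertex as it sits in the application's buffer:
 * 3 floats of position followed by 4 normalized bytes of color. */
struct packed_vertex {
    float   pos[3];
    uint8_t color[4];   /* RGBA8, normalized to [0,1] on fetch */
};

/* What a driver-generated shader preamble effectively does when there is
 * no fixed-function input assembler: address = base + index * stride,
 * then plain loads plus a UNORM8 -> float conversion. */
static void fetch_vertex(const uint8_t *buf, uint32_t stride, uint32_t index,
                         float out_pos[3], float out_color[4])
{
    const uint8_t *v = buf + (uint64_t)index * stride;

    memcpy(out_pos, v, 3 * sizeof(float));      /* raw float loads */

    for (int i = 0; i < 4; i++)                 /* UNORM8 -> float */
        out_color[i] = v[12 + i] / 255.0f;
}

int main(void)
{
    struct packed_vertex verts[2] = {
        { { 0.0f, 1.0f, 0.0f }, { 255, 128, 0, 255 } },
        { { 1.0f, 0.0f, 0.0f }, {   0, 255, 0, 255 } },
    };
    float p[3], c[4];

    fetch_vertex((const uint8_t *)verts, sizeof(verts[0]), 1, p, c);
    printf("pos = (%g, %g, %g), color.g = %g\n", p[0], p[1], p[2], c[1]);
    return 0;
}
```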
This is great work—I'm glad to see this being tackled with such speed.
From the Phoronix comments on this post[0]:
> I have an idea. Why not support exclusively Vulkan, and then do the rest using Zink (that keeps getting faster and faster)?
> This way you could finish the driver in one year or two.
(For context: Zink is an OpenGL to Vulkan translator integrated into Mesa)
I had the same thought. Zink is 95% the speed of Intel's OpenGL driver[1], so why not completely ignore anything but Vulkan? On the Windows side, dxvk (DirectX to Vulkan) is already much faster (in most cases) than Microsoft's DX9-11 implementations, so it's completely feasible that Zink could become faster than most vendors' OpenGL implementations.
I have no knowledge of low-level graphics, so I don't know the ease of implementing the two APIs. I could envision, however, that because this GPU was never designed for OpenGL, there may be some small optimizations that could be made if Vulkan was skipped.
Yes, it works the other way: Zink is the Gallium->Vulkan translation layer, while the main Mesa code is effectively an OpenGL->Gallium translation layer.
Like, the classic "Intel OpenGL driver" in Mesa (i.e., i965) doesn't use Gallium and NIR, and hence has to implement each graphics API itself, whereas their modern "Iris" driver using Gallium presumably just handles NIR -> hardware?
Or does the Gallium approach still require some knowledge of higher-level constructs and some knowledge of things above NIR?
Think of it as a HAL, on top of which state trackers implement their chosen APIs. OpenGL is one of them; there's also Gallium Nine, which implements DirectX 9.
This is top-notch and very impressive work. I'm currently in the middle of tuning performance of piet-gpu for the Pixel 4[1], and I find myself relying on similar open resources from the Freedreno project. When the time comes to get this running efficiently on M1, having detailed knowledge of the hardware will be similarly invaluable - just the info on registers and occupancy is an important start.
Alyssa isn't personally taking donations for her work on this project, but she suggests you donate to the Autistic Self Advocacy Network or the Software Freedom Conservancy instead :)
So much for "It'll take years before we get the GPU working". Obviously this is far from a full implementation but seems like progress has been quick. Hopefully the power management stuff will be equally quick.
Also curious how far progress is on reversing the Apple NVMe SSDs. Last I heard, Linux couldn't properly install itself on modern Macs, only do liveboot.
Apple NVMe SSDs have worked fine for years in mainline. This is a myth that won't die.
The Linux driver required two new quirks (different queue entry size, and an issue with using multiple queues IIRC). That's it. That's all it was.
On the M1, NVMe is not PCIe but rather a platform device, which requires abstracting out the bus from the driver (not hard); Arnd already has a prototype implementation of this and I'm going to work on it next.
As I understand it, Apple's NVMe controllers were pretty wildly non-standards-compliant. They assume that tags are allocated to commands the same way Apple's driver allocates them (including crashing if you use the same tag at the same time in both the admin and IO queues), and they only accept a limited range of tags. As you say, they also use a totally different queue entry size from the one required by the standard. And apparently interrupts didn't work properly, or something along those lines.
Oh, and it looks like the fixes only made it into mainline Linux in 5.4, less than a year and a half ago, and from there it would've taken some time to reach distros...
There's also applespi for their input devices. With MacBooks you never know what protocol Apple will change in the next iteration. Not something I would use as a daily driver (running Linux) anymore.
Maybe I'm remembering it wrong, but wasn't there an issue with a secret handshake, and if the system didn't do it in a certain time after the boot, the drive disappeared? I.e. some kind of T2-based security?
Interesting... what bus does it use if not PCIe? At the driver level I’m guessing it just dumps NVMe commands into shared memory and twiddles some sort of M1-specific hardware register?
Generally, "platform device" means that it's just a direct physical memory map. Honestly, from a driver perspective, that's sort of what you get with PCIe as well. The physical addresses is just dynamically determined during enumeration instead. Of course, there's some boilerplate core stuff to perform mappings and handle interrupts specific to PCI, but at the end of the day, you just get a memory mapped interface.
This is unlike something like USB where you need to deal with packets directly.
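To make "platform device" concrete, this is roughly the shape of a minimal Linux platform driver probe (entirely hypothetical: the compatible string, driver name, and register offset are made up):

```c
// SPDX-License-Identifier: GPL-2.0
/* Hypothetical sketch of a Linux platform driver probe, for illustration. */
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/of.h>
#include <linux/io.h>
#include <linux/err.h>

static int example_probe(struct platform_device *pdev)
{
	struct resource *res;
	void __iomem *regs;

	/* The register block comes straight from the devicetree node:
	 * a fixed physical address range, no bus enumeration involved. */
	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	regs = devm_ioremap_resource(&pdev->dev, res);
	if (IS_ERR(regs))
		return PTR_ERR(regs);

	/* From here it looks much like a mapped PCIe BAR: plain MMIO. */
	dev_info(&pdev->dev, "reg 0x0 = 0x%08x\n", readl(regs + 0x0));
	return 0;
}

static const struct of_device_id example_of_match[] = {
	{ .compatible = "example,mmio-device" },   /* made-up compatible string */
	{ }
};
MODULE_DEVICE_TABLE(of, example_of_match);

static struct platform_driver example_driver = {
	.probe = example_probe,
	.driver = {
		.name = "example-mmio",
		.of_match_table = example_of_match,
	},
};
module_platform_driver(example_driver);

MODULE_LICENSE("GPL");
```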
Yes. It seems there is no distinct SSD in the system; the M1 SoC seems to communicate with raw flash. I tried looking up the datasheet for the flash ICs (SDRGJHI4) to see if they would leave any clues, but it’s not publicly available AFAICT. It is rather interesting that Apple has custom or semi-custom IP that manages raw flash as part of their SoC. That does seem like a natural outgrowth of shipping iPhones for so many years.
The specific logical signals between separate IPs on the SoC are slightly less interesting to me, then. It’s likely something similar to ACE5, like you said, for sharing the memory bus.
Ah, yeah, it's been integrated on their SoCs for quite a while. Word on the street is that it's the (internal only successor to the) Anobit IP they bought back in 2011 with an ARM core strapped to the front for the NVMe interface.
There have to be some Apple engineers reading this, wondering which feature she’ll find next, and hopefully with a big smile when she gets something right.
>What’s less obvious is that we can infer the size of the machine’s register file. On one hand, if 256 registers are used, the machine can still support 384 threads, so the register file must be at least 256 half-words * 2 bytes per half-word * 384 threads = 192 KiB large. Likewise, to support 1024 threads at 104 registers requires at least 104 * 2 * 1024 = 208 KiB. If the file were any bigger, we would expect more threads to be possible at higher pressure, so we guess each threadgroup has exactly 208 KiB in its register file.
>The story does not end there. From Apple’s public specifications, the M1 GPU supports 24576 = 1024 * 24 simultaneous threads. Since the table shows a maximum of 1024 threads per threadgroup, we infer 24 threadgroups may execute in parallel across the chip, each with its own register file. Putting it together, the GPU has 208 KiB * 24 = 4.875 MiB of register file! This size puts it in league with desktop GPUs.
I don't think this is quite right. To compare it to Nvidia GPUs, for example, a Volta V100 has 80 Streaming Multiprocessors (SMs), each having a 256 KiB register file (65536 32-bit registers [1]). The maximum number of resident threads per SM is 2048, and the maximum number of threads per thread block is 1024.
While a single thread block _can_ use the entire register file (64 registers per thread * 1024 threads per block), this is rare, and it is then no longer possible to reach the maximum number of resident threads. To reach 2048 threads on an SM requires the threads to use no more than 32 registers on average, and two or more thread blocks to share the SM's register file.
Similarly, the M1 GPU may support 24576 simultaneous threads, yet there is no guarantee it can do so while each thread uses 104 registers.
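For anyone who wants to check the arithmetic, here's the back-of-the-envelope version of both the article's estimate and the caveat above (plain C; the helper and constants are just for illustration):

```c
#include <stdio.h>

/* Back-of-the-envelope check of the register-file reasoning.
 * Registers are 16-bit ("half-words"), per the article. */
static unsigned max_threads(unsigned regfile_bytes, unsigned regs_per_thread)
{
    return regfile_bytes / (regs_per_thread * 2);
}

int main(void)
{
    const unsigned regfile = 208 * 1024;  /* the article's 208 KiB guess */

    /* 104 registers/thread -> exactly 1024 threads, matching the table. */
    printf("104 regs -> %u threads\n", max_threads(regfile, 104));

    /* 256 registers/thread -> 416 threads' worth of registers would fit,
     * but the table shows 384, so presumably some other limit applies there. */
    printf("256 regs -> %u threads\n", max_threads(regfile, 256));

    /* The caveat: keeping all 1024 threads of a threadgroup resident only
     * works out if each thread uses at most this many half-word registers. */
    printf("regs/thread at 1024 threads: %u\n", regfile / (1024 * 2));

    return 0;
}
```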
I'm not really a developer so maybe I'm just not understanding something, but why in the world isn't Apple making it easier for people to optimize for the M1? I would think it's in their best interest to help developers make the best software possible, by sharing information about how to leverage the architecture. It's bizarre to me that the best sources of information are posts like this.
Apple is helping developers make the best software possible. They have a full suite of incredibly optimized and performant frameworks for compute, graphics, signal processing, machine learning, rendering, animation, networking, etc... That is all available via free download for writing MacOS and iOS apps.
Remember, they are selling computers as a combination of hardware and software. They are not selling processors so they are of course not supporting driver and other low level software for other OSs. That's a bummer if you are into other OSs, but it is not part of their business model so it should not be surprising.
Other OSs are supported via their virtualization framework. My limited tests show about a 10% performance penalty. Not too bad right out of the gate with a new system.
That being said, Ms. Rosenzweig is doing some incredible and interesting work. Really enjoying the series.
In a few months Apple is going to release a successor to the M1 processor, and then maybe in a year or two another revision after that. Apple would like your code to be optimized for that processor as well, in addition to the M1. The way Apple does this is by wrapping their hardware in a high-level API that knows how to use the hardware it is running on, rather than exposing architectural details and having developers hardcode assumptions into their code until the end of time.
Abstractions are both leaky and expensive. There are a lot of things that could have much better performance if they had access to the lower level APIs.
Metal is a good fit for the M1 GPU (since the GPU was essentially designed to run it). There isn't a need for a lower level API than Metal.
Most people will not end up writing code in the optimal way though, since they also want to support discrete GPUs with their own VRAM and those have totally different memory management.
Apple's official graphics API is Metal. There is plenty of documentation for that. Apple considers Metal a commercial advantage over OpenGL / Vulkan; that is, they want developers to develop against Metal.
Why do you think they are not? They are helping developers who develop for the macOS platform that they own, develop, and support. They are not responsible for third-party OSes or development on those platforms. Why would you expect anything else, given their history and track record of closed/walled ecosystems?
Believe it or not, you're just purchasing rare earth materials packed very tightly into a nice box when you buy a Mac or an iPhone. The operating system is paid for on the backend by developers giving up 30% of revenue that goes through the App Store. The M1 is turning the Mac into an iPhone in exchange for an extremely fast processor and insane battery life, so they're not interested in helping you bypass the technical challenges of running other operating systems on their hardware (much like how they don't help you jailbreak iOS in order to help you install Cydia or third-party app stores).
You are drawing a false equivalency between the Mac and iPhones. iOS devices are deliberately locked so that running your own low-level software on them is not supposed to be possible, and requires breaking Apple's security. If they make no mistakes, doing it is completely impractical (the cost of an attack outside their attack model is greater than the price of the device).
macOS devices are not, and Apple invested significant development effort into allowing third-party kernels on M1 Macs. The situation is very different. They are not actively supporting the development of third-party OSes, but they are actively supporting their existence. They built an entire boot policy system to allow not just this, but insecure/non-Apple-signed OSes to coexist and dual-boot on the same device next to a full-secure blessed macOS with all of their DRM stuff enabled, which is something not even open Android devices do.
You can triple-boot a macOS capable of running iOS apps (only possible with secureboot enabled), a macOS running unsigned kernel modules or even your own XNU kernel build, and Linux on the same M1 Mac.
This pales in comparison to the rhetoric when OS X came out. You can see that rhetoric survive today at opensource.apple.com; it's there, just not in the spirit of the freedom that was promised.
I acknowledge that it’s possible to run unsigned code at “ring 0” on M1 MacBooks but the existence of the DRM restrictions leads me to believe that certain DRM-relevant hardware is not accessible unless a signed OS is running. I’m not exactly sure how the attestation works from software to hardware but I have to guess that it exists, otherwise it would be relatively trivial to load a kext that enables 4K DRM with SIP disabled.
One may consider that not important but I think it’s important to at least note the hardware of these machines are not fully under user software control.
Then again, I don’t think the iOS support requires any hardware so I’m not sure why someone hasn’t released a mod (requiring a kext or not) that enables iOS app loading with SIP disabled.
The Secure Enclave Processor (SEP) is the part that you cannot run your own code on, and it knows what state the device was booted in. However, it doesn't get in the way of normal OS usage. It also serves as a secure element, e.g. we can use it to store SSH keys and authorize use with Touch ID, or to secure a password store, or as a U2F token, just like macOS does.
This is much better than most unlocked Android devices, which do this kind of thing in EL3/TrustZone, which runs on the main CPU and means there is always a proprietary, locked "supervisor" over your OS. This is also the case on x86, where the firmware runs code in SMM (ring -2) that the OS cannot touch or control, and in addition there is a management engine running even more intrusive stuff.
On the M1, the main CPU is owned entirely by the OS, and doesn't even implement EL3 (only EL2, the VM hypervisor level, which we boot in - we already have the Linux KVM hypervisor working on the M1), making it a much more user-controlled execution environment than almost all x86 machines and Android phones.
In fact, the SEP is re-bootstrapped by the main OS itself after the boot stuff is done (we get the SEP firmware blob in memory passed in from the bootloader), so even though we cannot run our own code on it, we can choose not to run Apple's signed code either, and thus guarantee that it isn't doing nefarious things behind our back. For most intents and purposes, it's like an optional built-in YubiKey with a fingerprint reader.
iOS apps should be FairPlay encrypted AIUI, and presumably that goes through the SEP to only authorize decryption when booted in secure mode. That's my understanding anyway, I haven't had anything to do with that ecosystem for ages. Of course, either way you could load decrypted iOS apps just like you can pirate iOS apps on a jailbroken phone.
The M1 does have other co-processor CPUs that run signed firmware loaded before Linux boots (e.g. for power management, graphics management, sensors, etc), but all of those other firmware blobs are plaintext (only the SEP one is encrypted); we may not be able to change them (unclear yet how much control we have over those CPUs post-iBoot, there might be a way to stop them or reload the firmware) but we can at least reverse engineer them and audit that they don't do anything nasty. Besides, I think most of this stuff goes through OS-controlled IOMMUs to access memory anyway, so it can't do much harm to the main OS.
All the low-level details so far have been reverse-engineered since Apple doesn't provide documentation. Just because m1n1 finds the CPU to be in the EL2 state when its first instruction executes doesn't mean EL3 doesn't exist. An equally valid conclusion is that iBoot dropped from EL3 to EL2 before jumping to the m1n1 code.
Apple's phone chips use EL3 as a "god mode" to silently scan the kernel's code pages for modifications, and panic the processor if any are found:
> On the M1, the main CPU is owned entirely by the OS, and doesn't even implement EL3 (only EL2, the VM hypervisor level, which we boot in - we already have the Linux KVM hypervisor working on the M1), making it a much more user-controlled execution environment than almost all x86 machines and Android phones.
I agree that Apple probably has less random junk running at exceptionally high privilege levels, but your argument is not convincing to me. We have control of the exception levels, which we can check the existence of from the public ISA, but that doesn't mean Apple hasn't added any new stuff elsewhere that can touch the CPU in ways that are not yet known (and, to be entirely fair: I don't even think we know we have control of the exception levels. We have EL2 execution, but GXF exists, and even if we know how to navigate through it who can really say for sure what it does to the processor state?). I think the right argument here is "Apple has no reason to add stupidity to their processor (and many reasons to not add this garbage) so it likely does not exist" and leave it at that, rather than trying to draw up technical reasons why it seems more open.
> iOS apps should be FairPlay encrypted AIUI, and presumably that goes through the SEP to only authorize decryption when booted in secure mode.
Wow is it really possible that iOS apps are encrypted with a private key that is stored within all SEP devices and it hasn’t been cracked yet? If so that’s incredible and would explain why a workaround for using iOS apps with SIP disabled hasn’t been released. Of course I shouldn’t be that surprised since 4K DRM media content would rely on the same property.
Edit: I looked into this and it turns out that each device has its own public key and the server encrypts the app/content with a key derivable from the device public key on-the-fly at download time. This is a simplified explanation but the essential implication is that there is no global private iOS app package key.
> Besides, I think most of this stuff goes through OS-controlled IOMMUs to access memory anyway, so it can't do much harm to the main OS.
Great point. If these other onboard devices have unfettered access to the memory bus and/or can trigger some sort of NMI then you can never really trust these devices. Though as you point out, most contemporary x86 PCs are no different in that regard.
This is the standard content protection mechanism on pretty much every DRM download/streaming/whatever system in the world. Each app/movie/whatever is encrypted with a per-app key (so you can stick it in a CDN). Then each device has some kind of private certificate or key. When the user buys/rents content, you send the content key encrypted with the device key. This is how pretty much every game console, streaming service, etc does it.
There are global keys, which are used for system software. iOS used to be encrypted as a whole (not any more though, but the SEP firmware and iBoot still are) and getting those keys is tricky, as they are baked into hardware and different for each generation. You can build hardware so it lets you decrypt content or subkeys with a key, but not access the key material itself; if done properly (it often isn't done properly), that can mean you can only use the devices as an oracle (decrypt anything, but only directly on-device) unless you spend a lot of time and money reverse engineering the baked-in hardware key using a scanning electron microscope.
> If changing fixed-function attribute state can affect the shader, the compiler could be invoked at inopportune times during random OpenGL calls. Here, Apple has another trick: Metal requires the layout of vertex attributes to be specified when the pipeline is created, allowing the compiler to specialize formats at no additional cost. The OpenGL driver pays the price of the design decision; Metal is exempt from shader recompile tax.
I've just started playing with OpenGL recently and I don't know what "changing fixed-function attribute state can affect the shader" means.
Can anyone give an example of what kind of operations in the shader code might cause these unnecessary recompiles?
OpenGL has a model of the hardware pipeline that is quite old. A lot of things that are expressed as OpenGL state are now actually implemented in software as part of the final compiled shader on the GPU. For example, GLSL code does not define the data format in which vertex attributes are stored in their buffers. This is set when providing the attribute pointers. The driver then has to put an appropriate decoding sequence for the buffer into the shader machine code. Similar things happen for fragment shader outputs and blending these days. This can lead to situations where you're in the middle of a frame and perform a state change that pulls the rug out from under the shader instances that the driver has created for you so far. So the driver has to go off and rewrite and re-upload shader code for you before the actually requested command can be run.
More modern interfaces now force you to clump a lot of state together into pretty big immutable state objects (e.g. pipeline objects) so that the driver has to deal with fewer surprises at inopportune times.
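A concrete sketch of where that rug-pull happens (assuming a GL 3.x context with a VAO, VBO and linked program already bound; the strides, offsets and counts are made up):

```c
/* Assumes a GL 3.x context with a VAO, VBO, and linked program bound. */
#include <GL/gl.h>

static void draw_twice_with_a_format_change(void)
{
    /* First draw: attribute 0 is three 32-bit floats. */
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 12, (const void *)0);
    glEnableVertexAttribArray(0);
    glDrawArrays(GL_TRIANGLES, 0, 36);

    /* Same shader, same program, but now attribute 0 is normalized bytes.
     * GLSL never sees this change, so a driver that lowers attribute fetch
     * into the vertex shader may have to patch and recompile it right here,
     * mid-frame. A Metal pipeline object bakes this layout in up front. */
    glVertexAttribPointer(0, 3, GL_UNSIGNED_BYTE, GL_TRUE, 4, (const void *)0);
    glDrawArrays(GL_TRIANGLES, 0, 36);
}
```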
I think I understand now. Ideally the GLSL shader code is compiled once and sent to the GPU and used as-is to render many frames.
But if you use the stateful OpenGL APIs to send instructions from the CPU side during rendering you can invalidate the shader code that was compiled.
It had not occurred to me because the library I am using makes it difficult to do that, encouraging setting the state up front and running the shaders against the buffers as a single "render" call.
I’m very amused by the fact that this hardware does not have hardware for some specialised operations that competitors do.
From the article, it would seem that compensating in software was fine (performance-wise). Apple’s approach seems to break the norm in areas where the norm has proven to be unnecessary complexity, which opens up room for just more raw performance.
AMD has removed exactly the same hardware, and NVIDIA doesn't benefit much from the specialized vertex fetch hardware anymore, even specialized uniform buffer stuff is getting close to marginal benefit. Hardly unique to Apple.
Lots of what they do on the M1 is stuff that PowerVR was doing before (and apple's older GPUs were based on PowerVR's via a licensing deal). There are other vendors who have also ditched some of this stuff.
It's a smart move for Apple to double down on pruning hw features you don't think you need, but sadly you can only go all-in on it if you control the entire ecosystem.
As described in the article, by not having fixed-function units Apple can put in more regular floating-point math units. These units can be used in many situations, for example when doing compute tasks or 3D tasks that don't hit those very focused use cases. In the end, if it is a big perf hit for some approach, devs will just use a different solution, as they need to develop explicitly for Metal anyway.
No, this is purely a hobby project undertaken in my spare time. (The email addresses on the git commits are force-of-habit, apologies for the confusion.)
I came to say the same thing. Fantastic technical writing—as if doing the reverse engineering and novel development wasn't enough! Thank you, Alyssa, if you're reading, for this gift to the community. <3
Unrelated - All this work is done by an 18 year old. Absolutely incredible. Some people are simply built differently and admittedly, it makes me jealous.
Good for her. But to be honest, a lot of us did in fact do a lot of things, including lots of reverse engineering, when we were in high school. Arguably high school and early university are where you have the most time to focus on these sorts of things, without having to worry about growing your "career" or thinking about how you will earn enough money to buy a house or afford a family.
Some of the comments below surely are out of jealousy. But then again that jealousy is understandable; not too long ago, people wouldn't celebrate the age of the person, nor would they even mention it anywhere.
To some extent, I personally am jealous that in the place I grew up we didn't understand how this kind of marketing helps with life later on. And I still find myself jealous of Americans, who oftentimes market less work than lots of us have done as something that turns the person in question into some sort of hero character.
Though I have plenty of other comments that actively critique hero worshipping. I personally think that even if you remove the jealousy aspect, it's damaging to the person's character development.
Wow, hopefully they are getting enough in donations/sponsorship to compensate them for their extraordinary efforts. It’s very easy to see these skills getting snapped up for a multiple 6 figure salary in the near future…
Being born in the San Francisco Bay Area and going to an "elite" public high school in an upper middle class suburb helps a little (and creates lots of other issues too :) ). Really tells you how many bright kids we're missing because they're not born in the right environment.
Don’t be jealous, just engage yourself in work that challenges you at that level! There is lots of challenging free software stuff that needs doing with your name on it. Alyssa is just having fun.
It is great for her future. However, back in Portugal, if you take the technical high school path you are expected to be a good developer by 18, when you're done with high school, as an alternative path to university.
During my time it meant:
- Knowledge of BASIC (GW, Turbo and Quick), Turbo Pascal, Turbo C, Turbo C++, 80x86 Assembly, dBase III Plus and Clipper
- Databases and their data organization on harddisk
- Digital circuits
- OS design, with experience on MS-DOS/Netware and Xenix
- A 3-month traineeship at a local company at the end of the degree
- All the remaining stuff of a traditional high school, like physics, math, geometry and whatever else.
Now would everyone be as good as she is?
Certainly not, but the tools are there for anyone that wants to have a go at it.
Something about this comment on each of Alyssa’s posts[1] rubs me the wrong way. I find her work remarkable even without the context of her age. Why mention it? Highlighting it almost feels diminishing, even though I know that the comment is not intended to be that way.
Because it’s interesting and intent matters. Maybe you’re being too sensitive, and too vicariously sensitive, about a fact that’s obviously stand-out interesting.
Yes, I am very impressed with the fact that the person with this level of skill and understanding of the task at hand is ONLY 18. How could pointing out the impressively young age be considered bad? The percentage of young people that have this level of skill in whatever aspect is very small and is a valid thing to recognize.
This is the advantage of building your hardware and your software under the same roof. You can make optimizations (such as not including hardware that makes OpenGL faster, but doesn't impact Metal nearly as much.) The "standard" here is the Metal API, not the GPU hardware.
I don't think it's an advantage to sabotage the adoption of a common API like Vulkan. It holds progress back, and I totally blame Apple for pointless NIH and lack of collaboration here.
Apple are doing it out of a rather sickening lock-in culture in the company, and Metal is far from the only example of that.
They aren't "sabotaging" anything, they are making a perfectly normal design trade-off to implement some features in shaders instead of as fixed-function hardware. By your definition, every modern GPU built in the last decade is "sabotaging" OpenGL 1.x support, because fixed-function vertex and pixel processing hasn't been a thing in that long and it's all done in shaders now, even if you use the legacy APIs.
Alyssa clearly explained how avoiding fixed-function hardware means they can cram more shaders in, which means they can increase performance; we have no idea, at this stage, whether this ends up being a net gain or a net loss for, say, a Vulkan app. And we probably never will, because we don't have an "AGX-but-it-has-this-stuff-and-fewer-shader-cores-in-the-same-silicon-area" to compare with. And it doesn't matter. In the end OpenGL and Vulkan apps will run fine.
If we ever end up with empirical evidence that these design choices significantly hurt real-world OpenGL and Vulkan workloads in ways which cannot be worked around, you can start complaining about Apple. Until then, there is absolutely no indication that this will be a problem, never mind zero evidence for your conspiracy theory that it was a deliberate attempt by Apple to sabotage other APIs.
You're talking about macOS. We're talking about AGX2. If you want to complain about Apple's API support in macOS, a discussion about AGX2 support for Linux is not the right venue.
I am, quite honestly, getting very tired of all the off-topic gratuitous Apple bashing in articles about our Linux porting project.
Shmerl pops up on every thread mentioning Vulkan/Apple spouting conspiracy theory nonsense that every design decision is some kind of evil plan to screw over open standards. Ignore him.
Keep up the great work, plenty of people really appreciate it.
Yup, I doubt he has experience working in the games industry. Many engines support multiple graphics APIs, and there are often only 1-2 employees implementing/maintaining them, so speaking about vendor lock-in is not a strong argument.
I am aware of my limitations as a human being in this society, I speak from actual work experience, and I will use any tooling that I rant about when it is in the best interests of the customers, regardless of my personal agenda.
Your argument is equivalent to criticizing ARM for putting in instructions to optimize Javascript into their architecture, as if that "sabotages" every other programming language. Or Intel for putting in instructions to optimize AES into their architecture, as if that "sabotages" Salsa20.
It doesn't make any sense. Of course Apple optimizes Metal for their GPUs and their GPUs for Metal. None of that is hostile towards other APIs. All of this hardware is Turing-complete and by definition can implement any conceivable graphics API. The only question is how well it performs with those APIs, and until we have benchmark numbers, your argument is based on assumptions lacking any evidence.
I'm not sure what the Turing completeness argument has to do with anything. A Turing machine is also Turing complete; are you going to make GPUs like that?
We are talking about a simple fact: Apple don't care to collaborate on Vulkan, neither when designing their GPUs nor for their OS. I see no point in arguing further about facts, and I see criticism of that as completely valid.
Supporting both Vulkan and Metal in a game engine is not a huge task. I work in the games industry, my job is to implement and maintain graphics backends to a renderer engine, so I can speak from experience.
Huge or not, duplication of effort is a tax. And no, it's not trivial as you claim. Especially when some engine wasn't designed from the ground up to address these differences.
Why should they support Vulkan? What does Apple get out of that, apart from less well optimised compute and shader code, using more battery and producing more heat for the same output? (The reason it would be less well optimised is that Vulkan is an API designed by a group to be the best compromise across many GPU vendors.)
If Apple wanted to support Vulkan without it being worse than Metal, they would either need to add so many Apple-only extensions that it would be Vulkan in name only, or make their GPUs identical to AMD's or Nvidia's (and unfortunately, due to IP patents, Apple can't just make a copy of AMD's GPUs; they need to find another IP partner, and that is PowerVR).
If PowerVR had 80% of the GPU market (like Nvidia), they would have pushed Vulkan to line up with a TBDR pipeline. But they do not, so while you can run Vulkan on a TBDR pipeline, you end up throwing away lots and lots of optimisations.
Adding to this, the whole render pass concept in the Vulkan API came from the TBDR vendors being very active contributors to the API. Describing the data flow more explicitly there allows TBDR architectures to work on multiple parts of modern render graphs simultaneously and keep their tiles filled with work in places where other synchronization methods wouldn't (or would require the kind of divination on the part of the driver that Vulkan is trying to avoid).
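For a concrete (if simplified) sketch of that explicit data flow: the loadOp/storeOp declarations in a render pass tell a tiler up front that an attachment never has to leave tile memory (assuming an existing VkDevice; the format choice and lack of error handling are just for brevity):

```c
#include <vulkan/vulkan.h>

static VkRenderPass make_pass(VkDevice dev)
{
    /* Depth buffer: cleared at the start of the pass, thrown away at the
     * end. A TBDR driver can keep it entirely on-chip and never write it
     * out to memory. */
    VkAttachmentDescription depth = {
        .format         = VK_FORMAT_D32_SFLOAT,
        .samples        = VK_SAMPLE_COUNT_1_BIT,
        .loadOp         = VK_ATTACHMENT_LOAD_OP_CLEAR,
        .storeOp        = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .stencilLoadOp  = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout  = VK_IMAGE_LAYOUT_UNDEFINED,
        .finalLayout    = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
    };

    VkAttachmentReference depth_ref = {
        .attachment = 0,
        .layout     = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
    };

    VkSubpassDescription subpass = {
        .pipelineBindPoint       = VK_PIPELINE_BIND_POINT_GRAPHICS,
        .pDepthStencilAttachment = &depth_ref,
    };

    VkRenderPassCreateInfo info = {
        .sType           = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,
        .attachmentCount = 1,
        .pAttachments    = &depth,
        .subpassCount    = 1,
        .pSubpasses      = &subpass,
    };

    VkRenderPass pass = VK_NULL_HANDLE;
    vkCreateRenderPass(dev, &info, NULL, &pass);
    return pass;
}
```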
The fixed-function hardware that Alyssa assumed existed (vertex attribute fetch, special uniform buffer hardware) doesn't exist on many GPUs outside of mobile. In fact, Vulkan was designed to be a closer match for these GPUs, by making many of the same tradeoffs that Metal did. If they wanted to sabotage Vulkan, choosing the same tradeoffs as it and following the same path as most other GPU vendors doesn't seem like a very effective way to do it.
Apple has every right to do that as they control their stack. And people who favor FOSS projects have every right to criticize Apple for exercising their control in a way that adversely impacts FOSS.
Sure, but comments like this attribute hostility instead of simple practicality.
> ...pointless NIH and lack of collaboration here.
> Apple are doing it out of rather sickening lock-in culture in the company and Metal is far from the only example like that.
> no benefit to them in doing it the way that you want them to.
portray Apple almost as a helpless besieged small business that should be shielded from critique of its decisions. Whereas they are an industry titan, and people should criticize them as they see fit, even if others don't find merit in the criticisms.
> Whether you like it or not is immaterial
is a completely true, and utterly banal statement, as it can be applied to any opinion made in conversation. No one here has any power over Apple, but we do have the power to free discussion, do we not?
I find such lock-in to be hostile and not something to be excused with practicality. It's like saying ActiveX is practical, so don't blame you-know-who for not supporting HTML, or something along those lines.
There is a benefit to them, but they're myopic about it. The fact that we had OpenGL and DirectX as a standard meant that a vibrant 3d accelerator market opened up, they made immense advancements since the late 90s. Apple benefitted tremendously from being able to just pick up 2 decades of R&D into GPUs that existed because consumers had a competitive choice. If software had been locked into a single GPU, say a 3dfx Voodoo1, and all software was targeted at a proprietary API and design, how much advancement would have been lost?
Apple didn't even design their own GPU; the IP behind it is largely PowerVR, again arising from a company trying to compete against ATI, NVidia, 3dfx, Matrox, etc., who were all running into a bandwidth wall, by taking a big risk on a tile-based deferred renderer.
Now look at what is being competed on: ray intersection hardware. This is happening because of ray tracing extensions to DirectX and Vulkan. Otherwise you end up with a game console, and while game consoles can leverage their HW maximally, they don't necessarily produce top-end HW innovation and performance.
They sure think lock-in is a big benefit for them, that's part of their corporate culture that I was talking about. I'm just saying that it's nasty, bad for progress and it's the wrong way to do things.
> Apple are doing it out of rather sickening lock-in culture in the company and Metal is far from the only example like that.
I don’t disagree, but what else could they possibly do?
Metal shipped in 2014 for iOS, in 2015 for OSX. Vulkan 1.0 was released in 2016.
I don’t think it was reasonable to postpone a long-overdue next-gen GPU API for a few years, waiting for some consortium (outside of their control) to come up with API specs. By the time Vulkan 1.0 was released, people had already been using Metal for a couple of years.
> our driver should follow Linux’s best practices like upstream development. That includes using the New Intermediate Representation (NIR) in Mesa, the home for open source graphics drivers. NIR is a lightweight library for shader compilers, with a GLSL frontend
I remember when building everything on LLVM bytecode was best practice. It wouldn't be the Linux ecosystem without continual reinvention of the wheel, would it?