
Got downvoted out of view for saying this once but no less true. Absolutely a pity & a shame no one else has competed with this market dominance by Nvidia. Just a shit world entirely that we are single vendored up.

A lot of pissant shitty defenses of monopolization too. Wrong or right, this is a shit world we're in now. https://news.ycombinator.com/item?id=36304225



Apple's take on GPUs is quite different and very interesting IMO: a unified memory architecture with absolutely massive RAM (and hence VRAM) support, e.g. the new Mac Studio with 192GB of RAM/VRAM, which can run far bigger models than anything easily consumer-accessible even at the high end with Nvidia 4090s. It's not as fast as Nvidia, but it's not horribly far off in the latest chips.

As LLM adoption grows, I wonder whether Apple's approach will start to make more sense for consumers, letting you run models on your own machine instead of paying large subscription costs for AI-powered apps (since OpenAI et al. have fairly high fees). In my opinion, the high cost of using the APIs is a drag on certain types of adoption in consumer apps.

Llama.cpp actually first started as a way to run LLMs on Macs! It was CPU-only at first, and the first GPU backend added later was Metal, not anything from Nvidia.
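
For anyone curious what that looks like in practice, here is a minimal sketch assuming the llama-cpp-python bindings; the model path is a placeholder, and n_gpu_layers=-1 simply asks llama.cpp to offload every layer to whatever GPU backend it was built with (Metal on Apple Silicon, CUDA elsewhere).

    # Minimal sketch, assuming the llama-cpp-python bindings; the model path is hypothetical.
    from llama_cpp import Llama

    # n_gpu_layers=-1 offloads all layers to the compiled-in GPU backend
    # (Metal on Apple Silicon builds).
    llm = Llama(model_path="models/llama-7b.Q4_K_M.gguf", n_gpu_layers=-1)

    out = llm("Q: What is unified memory? A:", max_tokens=64)
    print(out["choices"][0]["text"])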


I agree, though it’s still vendor lock-in. I think a lot of devs aren’t aware of how tightly integrated these frameworks are with the hardware. It’s not trivial to separate the two, and the companies driving this tech have no motive to do so; they’re also among a very small number of hardware designers.


I think criticizing Apple for not supporting Nvidia is a classic case of missing the forest for the trees. Unifying memory is the next logical step in general-purpose computing, and the flexibility that comes from it has as-yet-unseen potential.


It's a step that AMD has been trying to make happen for like 10 years without managing to make it commercially successful in anything but consoles, ending up permanently stuck at the lowest end of the market. In both the PC and datacenter spaces it has found basically the opposite of product-market fit: nobody actually wants their generic CPU compute tied to their GPU compute.

The PC gamers really want the GPU component to be separately upgradable from the CPU. Non-gamer PC users don't care about the GPU performance, just cost. The datacenter folks want to be able to use a single $1k CPU to host $100k worth of GPUs.

It's plausible that AI accelerators follow a different path for the consumers. It's harder to see it happening for the datacenter market.


You’re missing the forest for the trees.

Being able to switch from an optimized CPU-centric workload to an optimized GPU-centric workload without any hardware changes sounds useful to me.

You could even do unified memory with upgradeable, separate CPU and GPU. You won’t get the benefits of having them on the same chip, but there’s nothing intrinsic to the separation that requires separate memory spaces.


Thinking of what Apple sells as a GPU bundled with a large amount of VRAM and a free ARM CPU makes it easier to accept.

NVIDIA never sold a GPU with expandable memory, either.


Is that RAM as fast as A100/H100 VRAM? AFAIR, those GPUs push ~1 TB/s.


The Apple M2 Ultra's memory bandwidth is 800GB/s, so it's not a long way off 1TB/s.

A100 goes up to 2TB/s in the largest configuration, and H100 claims 3TB/s per GPU node. (These figures keep changing with new variants.) But you can buy several Mac Studios for the price :-)

The real use of H100 is for training, as it can pool many GPUs together with a dedicated high bandwidth network. You can't do that with Mac Studios.
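
To put those bandwidth numbers in context, a rough back-of-the-envelope sketch: single-stream decoding of a dense model is usually memory-bandwidth bound, so dividing bandwidth by the bytes of weights read per token gives an upper bound on tokens/sec. The model size and quantization below are assumptions for illustration, not figures from this thread.

    # Back-of-the-envelope only: assumes a 70B-parameter model at ~4-bit
    # quantization (~35 GB of weights) and that every weight is read once per
    # generated token. Ignores compute, KV-cache traffic, batching, and pooling.
    weights_gb = 70e9 * 0.5 / 1e9  # ~35 GB (assumption)

    for name, bandwidth_gb_s in [("M2 Ultra", 800), ("A100", 2000), ("H100", 3000)]:
        upper_bound = bandwidth_gb_s / weights_gb
        print(f"{name}: ~{upper_bound:.0f} tokens/s upper bound")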


Those figures are really impressive. A single node is still useful for fine-tuning smaller models; Apple could move the needle in this market a bit.


In addition to the cost, OpenAI censors their models, and while that seems protective at first, if you think about it: if the model is the knowledge graph, then censoring data in the graph is censoring free speech.


I don't personally like model censorship, but you need to come up with a better argument than...whatever that is. It is entirely unconvincing, and seems like you misunderstand what free speech is.


Freedom of speech is a principle that supports the right of an individual or a community to articulate their opinions and ideas without fear of retaliation, censorship, or legal sanction.

Since the models are not supposed to answer certain questions and have indeed been demonstrated to be biased toward one end of the political spectrum, and if you treat the model’s output as a knowledge graph, as proposed by John Schulman, one of the cofounders of OpenAI, then yes, I would say the suppression of freedom of speech is a valid argument to make. Otherwise why would there be a set of “uncensored” models in the open source world?

I would suggest you read about these models and think about the implications. Perhaps that’ll lead you to reconsider your stance.


If we ignore the corporate-structure details where your analogy breaks down, then in the simplest case, choosing to self-inhibit is not a violation of freedom of speech. OpenAI is voluntarily self-censoring their output. This isn’t an ideological battle for them, it’s a business.


> I would suggest you read about these models and think about the implications.

I have done that, and come to the conclusion that "that's business, baby!"

Do you have any more meaningful objections to this "censorship" of a free thing you agreed to license?


I've said it before and I'll say it again. Cuda dominance is the darkest timeline.

OpenCL was the utopia timeline.


OpenCL still works however :)


Shame Apple never really invested in OpenCL. Things would look a lot different now if they hadn't abandoned it.


AMD almost pulled it off, but software let Nvidia take the lead again. AI being CUDA-dependent really gave Nvidia an edge, in both the consumer and business markets.


I have hope that AMD can close that gap over the next few years. Their hardware is already great, and the business case for investing in AI software is crazy strong. Their stock price would probably get a bump just from announcing the investment.

It seems like a no brainer to hire a bunch of people to work on making PyTorch / Tensorflow on AMD become a competitive option. It’ll just take a few years.


I think it's been a couple of years already that we've been hoping for better from AMD, to general disappointment.


Overview of some of the efforts to move away from cuda: https://www.semianalysis.com/p/nvidiaopenaitritonpytorch


Can hardware vendors even put their differences aside to build such a thing? We can't even build a unified open raster graphics API, and now you're asking for machine learning acceleration in that vein?


Machine learning would probably be the simpler API. If you can speak Linear Algebra, you're most of the way there.
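
As a sketch of that point: the hot loop of a transformer layer is a handful of linear-algebra primitives, so a backend that provides a fast matmul plus a few elementwise ops covers most of the surface. The shapes below are purely illustrative.

    import numpy as np

    def attention(q, k, v):
        # Scaled dot-product attention: two matmuls and a softmax.
        scores = q @ k.T / np.sqrt(k.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    q = k = v = np.random.randn(16, 64)
    print(attention(q, k, v).shape)  # (16, 64)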


You're right, and it's why projects like the ONNX runtime exist to unify vendor-specific AI accelerators. Covering the basics isn't too hard.

What GP seems to be asking for is an open CUDA replacement, which is kinda like asking someone to fund a Free and Open Source cruise ship to compete with Carnival for you. You'll get somewhere with some effort, luck and good old human intuition, but Nvidia can outspend you 10:1 unless you have funding leverage from FAANG.
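
For reference, "covering the basics" with ONNX Runtime looks roughly like this; the model file and input name are placeholders, and the provider list just falls back to whatever the local build actually supports.

    import numpy as np
    import onnxruntime as ort

    # Prefer vendor accelerators when the local build has them, else fall back to CPU.
    preferred = ["CUDAExecutionProvider", "ROCMExecutionProvider", "CPUExecutionProvider"]
    providers = [p for p in preferred if p in ort.get_available_providers()]

    sess = ort.InferenceSession("model.onnx", providers=providers)  # placeholder model file
    x = np.zeros((1, 3, 224, 224), dtype=np.float32)                # placeholder input shape
    outputs = sess.run(None, {"input": x})                          # assumes an input named "input"
    print("ran on:", sess.get_providers())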


IMHO, it comes down to the software.

It turns out you need very different kernels for good performance on different GPUs, so OpenCL is a nice tool, but not sufficient; you need a hardware-specific kernel library.

From the framework side, each integration is relatively expensive to support, so you really don’t want to invest in many of them. Without some sort of kernel API standard, you’re into a proprietary solution, and NVidia did an amazing job at investing in their software, so that’s the way things go.

I think we had a pretty solid foundation for doing something smarter with PlaidML, but after we were bought by Intel, some architectural decisions and some business decisions consigned that to be a research project; I don’t know that it’s going anywhere.

These days, I’d probably look into OctoML / TVM, or maybe Modular, for a better solution in this space… or just buy NVidia.

(I worked a bit on Intel’s Meteor Lake VPU; it’s a lovely machine, but I’m not sure what the story will be for general framework integrations. I bet OpenVINO will run really well on it, though :-)
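
A toy sketch of the kernel-library point above: one logical op, several hardware-specific libraries behind it, and every framework-backend pairing is an integration someone has to build and maintain. The registry entries name real libraries, but the dispatch code itself is purely illustrative, not any real framework's API.

    # Toy illustration of per-hardware kernel dispatch; names in the registry refer
    # to real vendor libraries, but the structure is made up for illustration.
    KERNEL_REGISTRY = {
        ("matmul", "nvidia"): "cublasGemmEx (cuBLAS)",
        ("matmul", "amd"):    "rocblas_gemm_ex (rocBLAS)",
        ("matmul", "cpu"):    "matmul primitive (oneDNN)",
    }

    def dispatch(op: str, device: str) -> str:
        try:
            return KERNEL_REGISTRY[(op, device)]
        except KeyError:
            raise NotImplementedError(f"no tuned {op} kernel for {device}") from None

    print(dispatch("matmul", "amd"))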


> Absolutely a pity & a shame no one else has competed with this market dominance by Nvidia.

Well, at least you admit it's not Nvidia's fault. Apparently Apple, Intel and AMD don't think there's much money to grab here.


I'm sure Intel and AMD heavily regret their neglect of OpenCL since the rise of LLMs and stable diffusion.


For me it’s doubly amazing that Intel doesn’t even show up in these discussions about alternatives, which are already rare. They should write a book about how to blow up a successful semiconductor company from the inside.


FWIW, Intel has OpenVINO acceleration on their ARC GPU lineup. Their $300 Arc A770 outperforms the M1 Ultra by ~10% in OpenCL (which OpenVINO uses): https://browser.geekbench.com/opencl-benchmarks

It stands to reason that Intel is making highly price-competitive hardware at the moment, but people don't talk about them as much as Nvidia because they have a minuscule install base and primitive Windows drivers. I wouldn't count them out if their first showing is this impressive, though.


High technology hardware needs a lot of investment, and it's hard to gather enough resources (human, capital, market share).

But AMD is coming on strong, and they are trying to compete with Nvidia now. https://www.forbes.com/sites/iainmartin/2023/05/31/lisa-su-s...


The others tried to build a more open environment with OpenCL, but that will always lose against a single vendor tailoring its solution to its own lineup.

Think of Apple's ecosystem vs Android, or MS's Office-Outlook-Teams vs anything else.


Well, we could do a much, much better job of it, but Qualcomm does in fact compete with NVIDIA for use cases like this (inference), both in mobile devices and in the data center.

Disclaimer: I work at Qualcomm.


What's on offer that's optimized for running these LLMs?


The Hexagon NSP is reasonably well suited to running ML in general. I know it's used for some image/CV use cases, and I think it will work well for language models, though maybe suboptimally for the recent large ones.

This processor shows up in both Snapdragon SoCs and the Cloud AI 100.


With any luck projects like MLC will help close the gap

https://github.com/mlc-ai/mlc-llm


You can also use TPUs or other training cards. Nvidia is just the best one that's accessible.

But I think you're getting downvoted since it's very off-topic.


I wonder if it would be possible to emulate/translate CUDA to target non-NVIDIA hardware.

I suspect it would be more of a legal challenge than a technical one.


That's what ROCm was trying to do


WebGPU strikes me as the answer to that. Perhaps I am missing something?


It could be. But there's quite a bit of momentum behind CUDA. Plus, CUDA is just wicked fast. I wrote a WebGPU version of LLaMA inference and there's still a bit of a gap in performance between WebGPU and CUDA. Admittedly, WebGPU can't access tensor cores and I undoubtedly need to optimize further.


Looking forward to the AMD MI300; hopefully it will be a game changer.


There are far more important things to be this worried about


tinygrad is trying to address this problem. We'll see if it's successful.


it’s uh… not that serious



