Seeing those gigantic models makes me sad that even the 4090 is supposedly staying at 24GB of RAM max. I really would like to be able to run and experiment on larger models at home.
It's also a power issue. With the 4090 it sounds like you're going to need a much, MUCH beefier PSU than you currently use... or your machine will suddenly shut off, since it draws 2-3x the power.
You'll need your own wiring to run your PC soon :-)
This may be a stupid question, but does the power consumption processors need for inference, compared to human brains, demonstrate that there is something fundamentally wrong with the AI approach, or is it more physics-related?
I am not a physicist or a biologist or anything like that, so my intuition is probably completely wrong, but it seems to me that for basic operations (let's say adding two numbers) the power consumption of a processor and a brain is not that different. Otherwise, given how expensive it is for computers to run inference on any NLP model, humans should have to eat carbs continuously just to talk.
Around room temperature, an ideal silicon transistor has a 60 mV/decade subthreshold swing, which (roughly speaking) means that a 10-fold increase in current requires at least a 60 mV increase in gate potential. There are some techniques (e.g. tunneling) that can allow you to get a bit below this, but it's a fairly fundamental limitation of transistors' efficiency.
[It's been quite a while since I studied this stuff, so I can't recall whether 60 mV/decade is a constant for silicon specifically or all semiconductors.]
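For what it's worth, the ~60 mV/decade floor falls out of Boltzmann statistics rather than anything silicon-specific, so it applies to any conventional (thermionic) FET:

    S_\text{min} = \frac{k_B T}{q}\,\ln 10
                 \approx 25.85\ \text{mV} \times 2.303
                 \approx 59.5\ \text{mV/decade} \qquad (T = 300\ \text{K})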
> but it seems to me that for more basic inference operations (lets say add two numbers) power consumption from a processor and a brain is not that different
Sure it is - it's too hard to tell from just two numbers, but multiply that by a billion: how much energy does it take a computer to add two billion numbers? Far less than the energy it would take a human brain to add them.
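Back-of-envelope, with deliberately rough constants (the per-add energy and the human arithmetic rate below are ballpark assumptions, not measurements):

    # Rough energy comparison: adding two billion numbers on a CPU vs. a human brain.
    # All constants are order-of-magnitude assumptions, not measurements.

    N = 2_000_000_000                 # two billion additions

    # CPU: assume ~1 nJ per add including memory traffic (the ALU op alone is
    # orders of magnitude cheaper; DRAM access dominates).
    ENERGY_PER_ADD_J = 1e-9
    cpu_energy_j = N * ENERGY_PER_ADD_J                       # ~2 J

    # Brain: ~20 W total power, assuming an (optimistic) one addition per second
    # when doing the arithmetic by hand.
    BRAIN_POWER_W = 20
    ADDS_PER_SECOND_HUMAN = 1
    brain_energy_j = BRAIN_POWER_W * (N / ADDS_PER_SECOND_HUMAN)   # ~4e10 J

    print(f"CPU:   ~{cpu_energy_j:.0f} J")
    print(f"Brain: ~{brain_energy_j:.1e} J ({brain_energy_j / cpu_energy_j:.0e}x more)")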
Nvidia deliberately keeps their consumer/gamer cards limited in memory. If you have a use for more RAM, they want you to buy their workstation offerings like the RTX A6000, which has 48GB of GDDR6, or the A100, which has 80GB.
What NVIDIA predominantly limits on their consumer cards is RAM sharing, not the RAM itself. The inability of the GPUs to share their RAM is the limiting factor. It is why I have RTX A5000 GPUs and not RTX 3090 GPUs.
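Without pooled memory you can still spread a model across cards, it's just manual and you pay a transfer cost at the seam. A minimal PyTorch sketch of that kind of naive model parallelism (the layer sizes and two-way split are made up for illustration, and it assumes two CUDA devices):

    import torch
    import torch.nn as nn

    # Naive model parallelism: without a shared memory pool, each GPU only sees
    # its own VRAM, so the model has to be split by hand and activations copied
    # across the PCIe/NVLink boundary at the seam.
    class SplitModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.part1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
            self.part2 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:1")

        def forward(self, x):
            x = self.part1(x.to("cuda:0"))
            x = x.to("cuda:1")            # explicit device-to-device copy
            return self.part2(x)

    model = SplitModel()
    out = model(torch.randn(8, 4096))
    print(out.device)                      # cuda:1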
It only gets expensive if you insist on sourcing it from enterprise vendors. The first 256GB I paid $2,400 for. The second 256GB I paid $1,200 a little over a year later. And the third 256GB I paid $800 about seven months later. I've got a workstation with 768GB of DDR4 and I am considering upping that to 1.5TB if prices on the 256GB sticks come down.
What's the difference between Apple's unified memory and the shared memory pool Intel and AMD integrated GPUs have had for years?
In theory you could probably assign a powerful enough iGPU a few hundred gigabytes of memory already, but just like Apple Silicon, the integrated GPU isn't exactly very powerful. The performance difference between the M1 iGPU and the AMD 5700G's is less than 10%, and a loaded-out system should theoretically be tweakable to dedicate hundreds of gigabytes of VRAM to it.
It's just a waste of space. An RTX 3090 is 6 to 7 times faster than even the M1, and the promised performance increase of about 35% for the M2 will mean nothing once the 4090 is released this year.
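For what it's worth, the practical difference shows up in how much memory the GPU can actually address. A quick sketch (assumes PyTorch with either the CUDA or MPS backend available):

    import torch

    # Check how much memory the GPU can address on each kind of setup.
    # On a discrete card the hard ceiling is the VRAM on the board; on Apple
    # Silicon (MPS backend) allocations come out of the same unified pool as
    # system RAM.
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"{props.name}: {props.total_memory / 2**30:.0f} GiB of dedicated VRAM")
    elif torch.backends.mps.is_available():
        # No fixed VRAM figure to report: the practical limit is whatever
        # system RAM the machine shipped with.
        print("MPS device: unified memory, capped by total system RAM")
    else:
        print("No GPU backend available")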
I think there are better solutions for this. The high throughput of PCIe 5 and resizable BAR support could be leveraged to quickly swap out banks of GPU memory, for example, at some performance cost.
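Something in that spirit already exists in software: keep most of the weights in host RAM and stream layers into VRAM on demand, paying for it in transfer time. A rough PyTorch sketch of the idea (layer count and sizes are arbitrary; real offloading schemes prefetch the next layer while the current one computes):

    import torch
    import torch.nn as nn

    # Keep the full model in host RAM and stream one layer at a time into VRAM.
    # This trades PCIe transfer time for the ability to run models larger than
    # the card's memory.
    layers = [nn.Linear(8192, 8192) for _ in range(32)]   # lives in CPU memory

    def offloaded_forward(x):
        x = x.to("cuda")
        for layer in layers:
            layer.to("cuda")              # copy weights in over PCIe
            with torch.no_grad():
                x = layer(x)
            layer.to("cpu")               # evict to make room for the next layer
        return x

    out = offloaded_forward(torch.randn(4, 8192))
    print(out.shape)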
One big problem with this is that GPU manufacturers have an incentive not to let consumer GPUs compete with their datacenter products. If a 3080 with some memory tricks can approach an A800 well enough, Nvidia would let a lot of profit slip through their hands, and they can't have that.
Maybe Apple's tensor chip will be able to provide a performance boost here, but it only works with macOS and the implementations all seem proprietary, so I don't think cross-platform researchers will really care about using it. You're restricted by Apple's memory limitations anyway; it's not like you can upgrade their hardware.