AMD ROCm Software Blogs (amd.com)
118 points by jrepinc on Feb 23, 2024 | 54 comments



With a little finagling, I was able to get ComfyUI working on my AMD card. I purchased a 7800 XT with 16GB of VRAM and have been pretty happy with its value. I'm getting around 9 it/s for a simple SD 1.5 pipeline with a 512x512 latent image. Not a speed beast, but plenty fast to teach my kids about AI and run some local models.
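For anyone trying the same, here's a minimal sanity check (a sketch, assuming you installed the ROCm build of PyTorch, which is what ComfyUI runs on for AMD) that the card is actually visible:

    import torch

    # ROCm builds of PyTorch reuse the torch.cuda API, so these
    # calls work on AMD cards despite the "cuda" naming.
    print(torch.__version__)          # e.g. "2.x.x+rocmN.N" on a ROCm wheel
    print(torch.version.hip)          # HIP version string; None on CUDA/CPU builds
    print(torch.cuda.is_available())  # True if the GPU is usable
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))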


The 7900 line-up (and, weirdly, the old Radeon VII) are the only consumer cards officially supported on ROCm 6. https://rocm.docs.amd.com/projects/install-on-linux/en/lates...

There seem to be hit-and-miss reports about coaxing a 7800 XT to run. But the uncertainty (and the fact that it's no better, and in some ways worse, than a 6800 XT) basically dissuaded me from bothering to upgrade my old RX 580 recently.


I have automatic1111 working on my 7900xtx and it does alright.

Took some finagling to get it working with ROCm 6 and the newest PyTorch, which I couldn't seem to coax Invoke or Comfy into doing.


I've done similar with an 8GB 7600, and I've run SD 1.5, SDXL and Stable Cascade on it with ComfyUI. To get the best speeds I need to not run the desktop; that way I get all 8GB for inference, and I control it from another computer. I think the 16GB 7600 XT could be a nice entry-level card for ML applications.
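If you want to see how much VRAM the desktop is actually eating, recent PyTorch can ask the driver directly (a sketch; mem_get_info reports what the driver sees, not just what torch itself allocated):

    import torch

    # Free/total memory in bytes as reported by the driver, so the
    # total reflects whatever the desktop compositor is holding.
    free, total = torch.cuda.mem_get_info(0)
    print(f"free {free / 2**30:.2f} GiB of {total / 2**30:.2f} GiB")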


Looks like tinycorp is shipping a box with 6x 7900 soon.

https://twitter.com/__tinygrad__/status/1760988080754856210


I wonder what the PCIe AER difficulties he refers to are.

Switching from HIP to HSA is an interesting turn of events. I thought HSA was dead, or mostly a buzzword. The Wikipedia article is short on dates, but it feels like HSA was trying to be a thing a decade ago. https://en.wikipedia.org/wiki/Heterogeneous_System_Architect...


I can't help but wonder what'll happen to those boxes when AMD inevitably discontinues ROCm support for the 7900XTX 3 years down the line.


6x 7900 XTX according to the link. That's a lot of horsepower.


If you think about doing something using ROCm, read these first to get a feeling for what to expect:

https://www.reddit.com/r/Amd/comments/a9tjge/amd_rocm_hcc_pr...

https://www.reddit.com/r/ROCm/


Just watch out. Sometimes AMD's ROCm support for their consumer GPUs lasts only about 4 years (as with the RX 580). So if you're not buying cutting edge on release day, that means ~2-3 years max of ROCm support. After that you have to use OpenCL, and let me assure you: it's not fun or widespread.
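For what it's worth, you can at least see what the OpenCL stack still recognises once ROCm drops your card; a minimal sketch, assuming the pyopencl package and a working ICD for your driver:

    import pyopencl as cl

    # Enumerate every OpenCL platform/device the ICD loader can find.
    for platform in cl.get_platforms():
        print(platform.name)
        for device in platform.get_devices():
            print(" ", device.name, device.global_mem_size // 2**20, "MiB")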


I have wasted so many nights debugging and trying to figure out how to get LLMs to run on my AMD iGPU. Despite PyTorch and xformers being (unofficially) supported, I was still unsuccessful. CUDA, on the other hand, just works. I don't see the value proposition of AMD over Nvidia for now.
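For iGPUs, the workaround people usually point at is the HSA_OVERRIDE_GFX_VERSION environment variable, which makes ROCm treat an unsupported gfx target as a supported one. A sketch only; the right version string depends on the chip, and this is explicitly unsupported territory:

    import os

    # Must be set before torch initializes the HIP runtime.
    # "10.3.0" (gfx1030) is the value commonly suggested for RDNA2
    # parts; your iGPU may need a different one, or none may work.
    os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

    import torch
    print(torch.cuda.is_available())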


For that I'd suggest using llama.cpp compiled with the CLBlast (OpenCL) config.
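If you'd rather drive it from Python, the llama-cpp-python bindings expose the same backends; a sketch, assuming a build with CLBlast enabled and a local GGUF model (the path is just a placeholder):

    from llama_cpp import Llama

    # n_gpu_layers controls how many transformer layers get
    # offloaded to the GPU through the OpenCL/CLBlast backend.
    llm = Llama(model_path="./models/llama-7b.Q4_K_M.gguf",  # placeholder
                n_gpu_layers=32)
    out = llm("Q: What is ROCm? A:", max_tokens=64)
    print(out["choices"][0]["text"])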


You could also run CUDA on AMD cards using ZLUDA.


ZLUDA has its own baggage for now. It's still lacking support for many things, but that's changing fast, maybe in a month or so.


It also took AMD a year to officially support ROCm on the 7900 XTX. IME it's still rough, because ROCm is fundamentally rough, but non-officially-supported cards are even rougher.


Some of these are pretty good. The AITemplate Stable Diffusion demo, for instance, is a nice hidden gem (though I'm not sure if it works on consumer hardware these days).


Guessing there are a few of us who are frustrated with our past experiences with ROCm software: it was awful to install; there were long guides with loads of steps to follow and not the greatest clarity or simplicity in the instructions; the only option they suggested if you messed it up was to reinstall the operating system; there were multiple guides/pages and it wasn't clear which was the latest one; it only supported older versions of TensorFlow/PyTorch/JAX; it only supported recent/higher-end cards; and so on. (It may be better now; this is my experience from a few years ago.) But we at the same time recognise that it would be great for GPU compute to be more affordable, and for there to be good competitors to Nvidia.


I don't think that anyone is trying to whitewash the past. All we can do is look to the future. Fact is that the demand for AI has forced their hand. As I see it, AMD is trying to clean up their act as quickly as possible, but change of this magnitude isn't going to happen overnight.

These sorts of updates to things, like their blogs, are baby steps in the right direction.


Andrew Ng says ROCm has improved a lot in the last year and is no longer as bad as people make it out to be.


Thanks, I do agree with this, and I am rooting for them. Would love to find out at some point that it's turned around.


Waiting for this to work on Windows.

My gaming machine is a Windows machine, so that's where my GPU is. Not willing to add a Linux partition, and the "Linux in Windows" support doesn't work with AMD GPUs.


You could add another SSD and have that be a Linux distro. It's not a partition per se, since the whole drive is dedicated to it.

I have that on my PC and I like it a lot


If you want to not be moving SATA cables, or worse, swapping NVMe drives, you're still having to modify the bootloader, which is probably what OP worries about. I've had a bad time trying to dual boot due to boot loaders not working the way the manual says they should.


This is the way. Using a partition is just begging for an update (Windows or Linux) to kill your boot. Convert the amount of time you would spend periodically fixing a broken boot into a dollar amount and suddenly having a separate SSD looks dirt cheap.

See also: fixing software Bluetooth issues, especially those caused by dual boot, by using an external 3.5mm or TOSLINK -> Bluetooth transmitter. I use a B03Pro and haven't had to fight Bluetooth pairing/quality/microphone issues in years, despite dual booting.

Software sucks. Replace it with dedicated hardware and your life will suck less.


> Using a partition is just begging for an update (Windows or Linux) to kill your boot.

Everyone says that, but I really haven't found that to be true in the last 12 years or so. I dual-booted Windows and Arch Linux for multiple years (2014-2017), doing plenty of system updates. I never really had any boot issues caused by an update. The only time I ever had any boot issues is when I was mucking with Grub configs and I didn't really know what I was doing.

Maybe I've just been lucky, but I've had a ton of computers, many of which I've dual-booted with partitions, and it feels like a pretty streamlined process that doesn't seem to break.


My lived experience is just wildly different from yours and stretches from shortly after 9/11 to about 2019 when I swore off partitions.

> The only time I ever had any boot issues is when I was mucking with Grub configs and I didn't really know what I was doing.

Why were you mucking with Grub configs? Because something went wrong.

Why did you know what you were doing? Because it was far from the first time something had gone wrong.

Beware the rose-tinted glasses.


> Why were you mucking with Grub configs? Because something went wrong.

Because I needed to enable a driver, actually, and something on the Arch forums or AskUbuntu said that I could enable it in a boot parameter in my grub config. I cannot remember the specific driver, but it was absolutely not to fix any boot partition issues.

Also, I explicitly stated that I didn't know what I was doing in the comment you're replying to.

Please don't get me wrong; I really don't have any rose-tinted glasses in regards to Linux. I'm basically a Mac person now; my work computer is a Mac M3, my home computer is a MacBook i9. I do have NixOS dual-booted on my personal computer, and it's been a nightmare to get all the drivers working. Just kidding, they still don't work, because the Linux drivers for the T2 MacBooks are garbage. There's a good chance that I will be nuking the Linux partitions this weekend in all honesty. The only thing that runs Linux full time in my house is my server, which runs a NixOS install on a tmpfs root.

It's not like I have a ton of love for Linux, if I did I would probably still be using it for my daily driver, but I just haven't had the dual-boot-partition issues that people complain about. Maybe it's because I so rarely actually went to Windows, but grub more or less did the job.

I will acknowledge that getting Linux installed in a secureboot environment continues to be a pain, however.


Windows is usually the side that causes problems, so if you stayed out of it or supervised the updates, that could be why.

The most common killer combo is that Windows tries an update, reboots as part of it, and the reboot unexpectedly (from Windows' point of view) goes into Linux. Some time later you log into Windows, it looks at the clock, realizes the reboot didn't go as planned, and flips its shit into some kind of hare-brained recovery process that winds up overwriting your bootloader with something that can't load Linux and can't load Windows either. It doesn't happen every time, so clearly there are heuristics, but it definitely happens some of the time, because the dead Windows loader it leaves behind is distinctive and unmistakable. That said, I've seen both Ubuntu and Fedora updates that automatically "fixed" the bootloader until it was broken, so it's wrong to finger-point too hard here.

The upshot is that "leave bootloader on boot drive alone" isn't part of the implicit operating system social contract, but "leave bootloader on non-boot drive alone" is, so you can solve a big hairy software problem with a small hardware investment.


I currently have dual boot on my laptop.

It is not straightforward: you have to disable Windows fast boot, and some Windows updates might mess stuff up. Sometimes weird behaviour happens with the bootloader, but that might be bad configuration.

For peace of mind I would not use dual boot. The perfect way would be to have Windows and Linux on separate drives with an easy-to-use switch.


You were lucky. There is a reason why, when VMware Workstation and VirtualBox became good enough around 2010, I stopped using partitions.

The blame has been more on the Windows side than Linux, along with special OEM partitions on laptops. Nevertheless, I no longer had to reinstall my computer or try to rescue boot partitions.


Just use a live USB. New USB 3.0 external drives are fast. Then you can just switch the boot device when the BIOS starts and switch OS without needing a chainloader. GRUB is very stable these days, but if you don't want any hassle at all, this keeps things completely separate.


We've come to the sad state of the ecosystem where tools work by default on Linux and rarely on Windows.


Don't fret, Microsoft and AMD are working together; Windows recently added NPU usage to Task Manager: https://www.extremetech.com/computing/windows-task-manager-w...


Is it sad? It seems like the natural conclusion of locking down your software and making it (both technically and legally) hard to extend.


> We've come to the happy state of the ecosystem where tools are available to everyone and not just those who pay Microsoft for access.


The nVidia stack seems to work on Windows in various ways, so it's up to AMD to be competitive with respect to compatibility.


Why "sad"? Linux users have had to stomach second-class status in so many other software categories.


The tools I rely on work just fine on Windows, natively.


Lists don't make great HN submissions: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor.... It would be better to pick the most interesting element in the list and submit that instead.


hi dang, I think in most cases you are correct. In this case, however, the list subtly acknowledges the general change AMD is making in its overall focus on and support of ROCm (which has been a complaint in the past), more than any single blog post on the topic does. I think that is what is spawning the discussion here now.



The MI25 was dropped from the ROCm supported list from 4.5.x onwards.


I have a 6950 XT. Last I read, it was impossible to run Stable Diffusion on it on Windows with any decent iteration speed. Is there anywhere to track progress on that? I'm trying to practice with some AI/ML tools, but the software issues make me wish I'd bought an Nvidia card instead.
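The usual route on Windows for AMD cards has been DirectML rather than ROCm; a minimal check, assuming the torch-directml package (speed is the open question, not whether it runs at all):

    import torch
    import torch_directml

    # torch-directml exposes AMD (and other) GPUs on Windows
    # through DirectX 12, independently of ROCm.
    dml = torch_directml.device()
    x = torch.randn(1024, 1024, device=dml)
    print((x @ x).sum().item())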


When are we likely to see MI300 available for rent and what would the cost per hour be?


Thanks for asking.

Soon! Estimated ship date is 2/29. Should arrive around 5 days after that. We are going to fly to the data center, get the boxes all set up and start to launch things.

Pricing is TBD since there isn't really a comparable offering, and we expect that since these GPUs are so new, people will want to do a lot of testing before they are willing to commit. Therefore, we are looking for partnerships over just customers.

Disclosure: Check out my profile; I'm building a white glove bare metal service that is focused on only top-end AMD compute, and specifically the MI300x (and future generations). Feel free to reach out via email.


We can't quite afford an 8x bare metal instance, but I am pinning this for later.

Think y'all will ever offer single MI300X instances?


Yes! Like I said above, we fully expect people to be toe dipping at first given how new this hardware is.

We offer single-GPU access through a VM and PCIe passthrough. Either we can provide you a basic Ubuntu VM with everything loaded into it (with regards to drivers and such), or you can send us your VM image and set it up however you like. Then you'll just have direct SSH access into it.


OK, that sounds lovely. We are but a poor startup, but I am hoping we can afford to reach out to y'all soon. Maybe even share workflows/stuff that works for us. If some ML tools (unsloth? lorax? sglang?) work on an MI300 with a little tinkering, that would be spectacular.


Thanks! Again, partnerships over customers. If you're experienced and have the technical chops to make a MI300x sing, we want to work with you. Our model is that we are the capex/opex investor for businesses. As much as I love software, Hot Aisle is more of a hardware business. Running super high end large scale compute is an extreme challenge in itself. We are less interested in building the software side of things and want to foster those who can focus on that side.

https://github.com/unslothai/unsloth/issues/160

https://github.com/search?q=repo%3Apredibase%2Florax+rocm&ty...

https://github.com/sgl-project/sglang/issues/157

https://github.com/casper-hansen/AutoAWQ (supports rocm)


Which companies that already rent out H100s would risk the ire of Nvidia, and getting less capacity from them next time?

edit: It seems cudocompute and dataknox are willing to rent them out.


MI300 is early access in Azure right now -- https://techcommunity.microsoft.com/t5/azure-high-performanc...


This sadly doesn't mean that they are actually available. I haven't seen anyone saying they are using them. Obviously, that doesn't mean much either, but I'd at least hope to see someone post some performance benchmarks.


Early access means NDA.


Well, that's no fun at all.



