Hacker News | ftufek's comments

https://huggingface.co/models is usually a good place to look; you can sort by trending and filter by the task you care about (e.g. Image-Text-to-Text). The first page will usually have the leading-edge/newer models.
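
If you'd rather script the same lookup, here's a rough sketch using the huggingface_hub client (the task tag and sort key are just examples, adjust to taste):

    # Rough sketch: query the Hub programmatically instead of browsing the website.
    # Assumes `pip install huggingface_hub`; the task tag below is just an example.
    from huggingface_hub import list_models

    models = list_models(
        filter="image-text-to-text",  # pipeline tag, i.e. the Image-Text-to-Text task
        sort="downloads",             # rough proxy for popularity/trending
        direction=-1,                 # descending
        limit=10,
    )
    for m in models:
        print(m.id)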


Yeah, some people say they got replacements through the warranty. The problem is, this thing is really big and heavy, so boxing it up is a real pain, especially if you've had it a while and already threw out the original box.


That's why my buddy said it's time to buy shares in bubble wrap


Nah, just be a geezer and wrap it in bin bags and then tape around. It's bricked anyway, innit.


Waste of bin bags. Just write the address on the front in marker pen.


Unfortunately those "solutions" don't work. The person who had a potential solution was at least able to go through the inputs; that's not the case here, you can't even get that far.

I've tried all the potential solutions this morning. It seems permanent unless Samsung somehow finds some magic to fix it, especially since the soundbar won't connect to WiFi/internet and doesn't do anything with the USB plugged in.


Their consumer cards are the entry point for many researchers and students though, so it pays off eventually when they become engineers working with the expensive enterprise cards.


Great point - an analogy comes to mind: MSFT (and Adobe) were totally okay with (and even encouraged) students pirating Windows/Photoshop etc. in non-Western countries, in the hope that they'd grow up and carry that knowledge into a future legal venture.

They were right.

This is the hardware equivalent of that.

Only until CUDA is replaceable, though: and who is gonna do that? Intel had better carry through on their promises in this regard.


It's not just the home insurance either. Last week, I bought a car and 3 of the big insurance companies refused to insure it without a 15-day waiting/underwriting period (Geico, which I've had for many years, State Farm, and Progressive). It was pretty surprising; it seems like they're trying to stop offering insurance without explicitly pulling out of the state or something. Thankfully, AAA gave immediate coverage.


You'll need something like EPYC/Xeon CPUs and motherboards, which not only have many more PCIe lanes but also allow bifurcation. Once you have that, you can get bifurcated risers and run many GPUs. These risers use normal cables, not the typical gamer PCIe risers, which are pretty hard to arrange. You won't get this for just $200 though.
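
Once it's wired up, a quick sanity check that every card actually negotiated the link width you expect (just a sketch; assumes an NVIDIA driver install with nvidia-smi on the PATH):

    # Sketch: print the current PCIe generation and link width per GPU.
    # Assumes nvidia-smi is on the PATH (ships with the NVIDIA driver).
    import subprocess

    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,name,pcie.link.gen.current,pcie.link.width.current",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    for line in out.stdout.strip().splitlines():
        print(line)  # e.g. "0, NVIDIA GeForce RTX 3090, 4, 16"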

For the chassis, you could try a 4U Rosewill like this: https://www.youtube.com/watch?v=ypn0jRHTsrQ, though I'm not sure 6 3090s would fit. You're probably better off getting a mining chassis: it's easier to set up and cool, and also cheaper, unless you plan on putting them in a server rack.


Really depends on the model and the software tricks you're using. With DDP and gradient accumulation, you can reduce the bandwidth bottleneck by quite a bit. We've trained with 4090s running at x4 lanes with very small impact. And running at x4 means you can stuff up to 26-28 GPUs on a single CPU node (say EPYC), stay at PCIe latency, and get rid of the networking hassle.
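
For a flavor of the trick, here's a minimal sketch of DDP with gradient accumulation; the key piece is no_sync(), which skips the gradient all-reduce on the accumulation steps, so the x4 link is only hit once per effective batch. Toy model and data, and it assumes the process group is already initialized (e.g. launched with torchrun):

    # Sketch: DDP + gradient accumulation, syncing gradients only every `accum` micro-batches.
    # Assumes torch.distributed is already initialized (e.g. via torchrun);
    # the model and data below are toy stand-ins.
    import contextlib
    import os

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device("cuda", local_rank)

    model = torch.nn.Linear(1024, 10).to(device)      # toy model stand-in
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)
    accum = 8                                          # micro-batches per optimizer step

    for step in range(100):
        x = torch.randn(32, 1024, device=device)       # toy batch stand-in
        y = torch.randint(0, 10, (32,), device=device)
        sync_now = (step + 1) % accum == 0
        # no_sync() skips the gradient all-reduce, so the x4 PCIe link is only
        # hit once per `accum` micro-batches instead of on every backward pass.
        ctx = contextlib.nullcontext() if sync_now else ddp_model.no_sync()
        with ctx:
            loss = torch.nn.functional.cross_entropy(ddp_model(x), y) / accum
            loss.backward()
        if sync_now:
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)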


Interesting, I would have expected the impact to be noticeable at x4! And yeah, it heavily depends on the model, the sharding method, and model vs. data parallelism. I'm hitting peak bandwidth due to a very wide, shallow model that is split across the GPUs model-parallel, with CPU optimizer offload - so worst-case scenario there.

But it does kind of validate Nvidia's choice to remove NVLink. How useful would it really be if x4 PCIe gets reasonably decent perf? Unless your inner dim is massive or something, you should be fine.


Do you have any pictures and/or documentation of that setup, power draw and performance? It sounds pretty interesting!


Never got around to writing some public docs. It's essentially a bunch of GPUs on custom aluminum extrusion frames sitting in a server rack, connected to a ROMED8-2T motherboard through PCIe splitters.

Power limited to 240W, negligible performance loss while halving energy usage; uses 3 20A circuits.
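
For reference, a sketch of applying that cap across all cards (assumes nvidia-smi and root privileges; 240 is just the value used here):

    # Sketch: cap every detected GPU at 240W via nvidia-smi (needs root privileges).
    import subprocess

    LIMIT_W = 240

    idx = subprocess.run(
        ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    for i in idx.stdout.split():
        subprocess.run(["nvidia-smi", "-i", i, "-pl", str(LIMIT_W)], check=True)
        print(f"GPU {i}: power limit set to {LIMIT_W} W")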

Performance can range anywhere from 2x 4090 = 1x A100 to 4x 4090 = 1x A100 depending on the model, etc.

It's great value for the money, and very easy to resell as well.


Very nice!

240W?

3 x 20A = 6600W?


I meant each card is limited to 240W, instead of the usual 450W. Also, it's more like 4 circuits after all, because the main CPU/MB/2 GPUs are on a 15A circuit too.


Ah! OK, thank you, now I get it. That's a very nice rig you have there. So, at a guess, you didn't care as much about peak compute capacity as long as whatever you're doing fits in GPU memory, and this is your way of collecting that much memory in a single machine while still keeping reasonable interconnect speeds between the GPUs?


Yeah, it's really just trying to get as much compute as possible as cheaply as possible interconnected in a reasonably fast way with low latency. Slow networking would be a bottleneck and expensive high end networking would defeat the purpose of staying cheap.


You'd be surprised at how cheap high-end networking that outperforms PCIe 4.0 x4 is: 100Gb Omni-Path NICs are going for $20 on eBay! And those will saturate PCIe 3.0 x16.

Though of course with multiple boards/RAM/CPUs it gets complicated again.


Which cards? I've been looking at NICs but couldn't find cheap ones past 25-40Gb.


Omni-Path is/was Intel's fork of InfiniBand, which, from rough memory, they bought from QLogic some years ago.

* Switch: https://www.ebay.com/itm/273064154224

* Adapters: https://www.ebay.com/itm/296188773061 / https://www.ebay.com/itm/166199451199

* Cables: No idea, sorry. ;)

* Description: https://www.youtube.com/watch?v=dOIXtsjJMYE

Note that I don't know those eBay sellers at all; they're just some of the cheaper results showing up from searching. There seem to be plenty of other results too. :)

---

Possibly useful, though it's Proxmox focused:

https://forum.level1techs.com/t/proxmox-with-intel-omni-path...


Very smart approach. I may copy your setup for some project that I've been working on for years but that stalled waiting for more memory in GPUs.


240W per card probably


Indeed.


Local workstation is much cheaper in the long run.

Even ignoring that, most of the development is running experiments. You're gonna be hesitant to run lots of experiments if they each cost money, whereas when you pay upfront for the hardware, you have an incentive to fully utilize it with lots of experiments.

I'd go with the RTX 4090 and deal with the memory limitation through software tricks. It's an underrated card that's as performant as cards that are an order of magnitude pricier. It's a great way to get started with that budget.
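
To give a flavor of those tricks, here's a sketch combining bf16 autocast with activation checkpointing, which together usually free up enough memory to train much larger models in 24GB (toy model; actual savings depend heavily on the architecture):

    # Sketch: two common memory tricks on a 24GB card: bf16 autocast + activation checkpointing.
    # The model here is a toy stand-in; real savings depend on the architecture.
    import torch
    from torch.utils.checkpoint import checkpoint

    device = torch.device("cuda")
    blocks = torch.nn.ModuleList(
        [torch.nn.Sequential(torch.nn.Linear(4096, 4096), torch.nn.GELU()) for _ in range(24)]
    ).to(device)
    head = torch.nn.Linear(4096, 10).to(device)
    opt = torch.optim.AdamW(list(blocks.parameters()) + list(head.parameters()), lr=1e-4)

    x = torch.randn(64, 4096, device=device)
    y = torch.randint(0, 10, (64,), device=device)

    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        h = x
        for block in blocks:
            # Recompute activations during backward instead of storing them all.
            h = checkpoint(block, h, use_reentrant=False)
        loss = torch.nn.functional.cross_entropy(head(h), y)

    loss.backward()
    opt.step()
    opt.zero_grad(set_to_none=True)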


I agree with you, but right now RTX 4090 cards are pushing $2000, which doesn't leave much budget left. I'd suggest picking up a used 3090 from eBay; they're currently around $800. That will still give you 24GB of VRAM, like the 4090.


I've seen some blog posts saying that if you buy a used 3090 that was used for bitcoin mining, there's a risk of thermal throttling, because the thermal paste on the VRAM isn't great, and it's worse if the card was run hot for a long time.

Any recommendations on how to buy one? E.g. any particular 24GB model for running LLMs? What's the biggest, baddest LLM you can run on a single card?

I've been thinking about it but have stuck with cloud/Colab for experiments so far.


The good deals are gonna be on local ads. Facebook Marketplace in most of the US.


Craigslist and eBay have some great deals.


I remember videos (likely on YouTube) of thermal paste replacement that was an upgrade over the stock card, so an average person should be able to do it. It'll cost a few $$ for the paste. I would go with a local workstation, then you don't have to think much about it while running Stable Diffusion. Plus, if it's bought used from eBay, prices can't go much lower, so you'll get something back at the end. Also, for image work the training dataset can be quite big for network transfers.


Strong endorse here. I pick up used RTX 3090s from Facebook Marketplace and eBay at $800 maximum. Can usually find them locally for $700-750, and can typically test them too, which is nice (though I've had no issues yet).
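
A quick-and-dirty burn-in along these lines is usually enough for an in-person test (just a sketch: it hammers matmuls for a minute while polling the reported temperature):

    # Sketch: minimal burn-in for a used GPU: sustained matmuls while polling temperature.
    # Assumes PyTorch with CUDA and nvidia-smi on the PATH.
    import subprocess
    import time

    import torch

    device = torch.device("cuda:0")
    a = torch.randn(8192, 8192, device=device)
    b = torch.randn(8192, 8192, device=device)

    start = time.time()
    while time.time() - start < 60:  # run for ~1 minute
        c = a @ b
        torch.cuda.synchronize()
        temp = subprocess.run(
            ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader"],
            capture_output=True, text=True,
        ).stdout.split()[0]
        print(f"temp: {temp} C")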


Depending on what you're doing, 2x used 3090s are the same price and offer you more VRAM. That's what I'm planning on doing, in any case - being able to run 70B LLMs entirely on the GPU is more useful than being able to run 34B faster.
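
As a rough sketch of what that looks like with the transformers library (the model ID is just an example; 4-bit quantization is what lets a 70B fit in 2x 24GB):

    # Sketch: load a 70B-class model 4-bit quantized and sharded across two 24GB GPUs.
    # Assumes `pip install transformers accelerate bitsandbytes`; the model ID is just an example.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "meta-llama/Llama-2-70b-hf"  # example only; any 70B-class checkpoint

    bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb,
        device_map="auto",  # accelerate splits the layers across both GPUs
    )

    inputs = tok("The cheapest way to add VRAM on a budget is", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=50)
    print(tok.decode(out[0], skip_special_tokens=True))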


Yeah multiple 3090s is the best budget way to go for sure. Also older server boards with tons of PCIe lanes if you can swing rack mounted hardware and have some technical skills.


Agreed. I recently completed a new build with two 3090 GPUs and really appreciate being able to run 70B models.


Which CPU did you go with?


i7-14700K

Z790 chipset w/ a mobo that supports x8/x8 bifurcation

96GB DDR5 @ 5600MHz


I've been seeing a bunch of them in the Bay Area; I thought it had already launched and started deliveries. In person, it looks like something out of a movie set.


The release event is in like ~30 minutes on their Discord; the announcement probably went out a bit early.

