Yeah, write them a ticket. They are really supportive;
pretty sure they will come up with something for you
(as they did for us when we were in a similar bind).
On the order of 5-25. Interestingly, these Hetzner machines are so cheap monthly that it is much easier to rent via Hetzner, and they provide good networking / data center ops as well. Break-even time on cost, versus buying the hardware yourself, is on the order of a year.
Break-even point is the term for the point at which your gains equal your initial investment, implying that after this point you start making a net profit.
Break-even applies to the crossover point between any two options. You can break even between renting and buying a house at some point, even though both options cost you money.
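To make that concrete, here's a minimal sketch of the rent-vs-buy crossover calculation; all the numbers are illustrative assumptions, not real quotes:

    # Break-even between renting a Hetzner box and buying the hardware.
    # All numbers are illustrative assumptions, not real quotes.
    hardware_cost = 1200.0    # one-time cost of a comparable GPU box, USD
    hetzner_monthly = 110.0   # monthly rent for the dedicated server, USD
    self_host_monthly = 25.0  # power, bandwidth, etc. if you host it yourself

    # Renting wins until the accumulated rent premium covers the purchase.
    break_even_months = hardware_cost / (hetzner_monthly - self_host_monthly)
    print(f"Break-even after ~{break_even_months:.1f} months")  # ~14 months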
Another option is to over-provision spot instances, such that it would feel like a 24/7 instance but cost like spot. However, you would probably need an automation platform to achieve that.
An idea I've had is to use GCP preemptibles + a script to automatically run an ML training job on instance start + Google Cloud Function + Cloud Scheduler to attempt to start the instance every minute if it gets preempted. The latter two are effectively free, so you'd get the cost benefits of preemptibles as long as the ML training job is resilient to random shutdowns.
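A minimal sketch of what that Cloud Function could look like, assuming the google-api-python-client library and hypothetical PROJECT/ZONE/INSTANCE values; Cloud Scheduler would invoke it every minute:

    # Hypothetical Cloud Function: restart a preempted GCE instance.
    # PROJECT, ZONE, and INSTANCE are placeholder names.
    import googleapiclient.discovery

    PROJECT, ZONE, INSTANCE = "my-project", "us-central1-a", "ml-trainer"

    def restart_if_preempted(request):
        compute = googleapiclient.discovery.build("compute", "v1")
        inst = compute.instances().get(
            project=PROJECT, zone=ZONE, instance=INSTANCE).execute()
        # A preempted instance ends up in the TERMINATED state; start it again.
        if inst["status"] == "TERMINATED":
            compute.instances().start(
                project=PROJECT, zone=ZONE, instance=INSTANCE).execute()
            return "restart requested"
        return "instance status: " + inst["status"]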
Yes, this is basically the idea. However, there are different solutions for training and inference. For training, I would recommend that you add automatic checkpointing, and even consider model migration. For inference (which I think is the original concern), over-provisioning is the key, simply because it takes a long time to load the model. You also want to diversify your node types, etc.
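For the training side, a minimal PyTorch checkpointing sketch (the model, optimizer, and path are assumptions; the point is just that a preempted run can resume):

    # Minimal checkpointing sketch so training survives preemption.
    # `model`, `optimizer`, and CKPT_PATH are assumed/hypothetical.
    import os
    import torch

    CKPT_PATH = "/mnt/persistent/checkpoint.pt"  # must outlive the instance

    def save_checkpoint(model, optimizer, epoch):
        torch.save({"epoch": epoch,
                    "model": model.state_dict(),
                    "optimizer": optimizer.state_dict()}, CKPT_PATH)

    def load_checkpoint(model, optimizer):
        # Resume where the preempted run left off, if a checkpoint exists.
        if not os.path.exists(CKPT_PATH):
            return 0  # start from epoch 0
        ckpt = torch.load(CKPT_PATH)
        model.load_state_dict(ckpt["model"])
        optimizer.load_state_dict(ckpt["optimizer"])
        return ckpt["epoch"] + 1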
AWS spot instances for GPU can have availability issues as well, or sometimes be priced higher than on-demand (still trying to understand that one).
If you do choose to run spot at AWS (GPU or otherwise), be sure to check out the excellent project at autospotting.org and donate if you use it. It makes it super easy to replace on-demand nodes in an ASG with spot nodes and always makes sure you're getting a good price.
> AWS spot instances for GPU can have availability issues as well, or sometimes be priced higher than on-demand (still trying to understand that one).
You always pay the spot market price, not your bid. Your instance gets killed if somebody outbids you. A higher bid increases the probability that your instances don't get killed, while still letting you pay only the spot market price.
By bidding above the on-demand price, you are speculating that for the majority of the time, nobody else will bid more than the on-demand price. If you're not the only one doing that, the spot price can rise above the on-demand price.
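A toy illustration of those mechanics (all prices made up):

    # Toy spot-market illustration; all prices are made up.
    on_demand = 0.90                         # $/hr on-demand price
    my_bid = 1.00                            # bid above on-demand
    spot_history = [0.30, 0.45, 0.95, 1.10]  # hourly spot market prices

    for price in spot_history:
        if price <= my_bid:
            # You run and pay the market price, not your bid. Note that the
            # 0.95 hour costs more than on-demand (0.90), as described above.
            print(f"spot={price:.2f}: running, paying {price:.2f}/hr")
        else:
            # The market rose above your bid: the instance gets killed.
            print(f"spot={price:.2f}: outbid, instance killed")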
I really like Nocix servers; they're out of Kansas City. They offer an "i7-6700K 32GB + 2x 480GB SSD + GTX 1080" for $105/mo: https://www.nocix.net/cart/?id=338
No, I need NVIDIA CUDA support. Also, the most important metric for me is cost per month per GB of GPU RAM, and AWS Elastic Inference is pretty bad on that metric.
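For concreteness, a sketch of that metric; the prices are rough assumptions loosely based on figures mentioned elsewhere in this thread, not quotes:

    # Cost per month per GB of GPU RAM; prices are rough assumptions.
    offers = {
        "Hetzner GTX 1080 (8 GB)": (100.0, 8),  # ($/month, GB of GPU RAM)
        "Nocix GTX 1080 (8 GB)":   (105.0, 8),
        "Vast.ai GTX 1080 (8 GB)": (170.0, 8),
    }
    for name, (monthly, vram_gb) in offers.items():
        print(f"{name}: ${monthly / vram_gb:.2f}/month per GB")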
That's not really how Hetzner's dedis work: you typically pay a setup fee (that's about 1/2 of a month of usage) and you have to cancel a full billing cycle (usually a month) in advance. You're meant to run them for months/years at a time, not just spin them up/down on-demand.
Elsewhere in the thread, it sounds like your needs are a little more robust. But for anyone reading who might be interested:
- Tesla K40 12GB cards are ~$130 on US eBay, and have been for a while
- These cards are passively cooled, though, intended to be used in wind-tunnel servers
- However, you can put 40mm screamer fans on them to cool them in a normal desktop. Something like https://www.thingiverse.com/thing:3032044 or just wire-tie them to the back grill
High pressure 40mm fans are LOUD, though. I run a desktop with two K40s in my garage, and I can hear it a tiny bit standing outside in my driveway. (I mostly use that box for neural net art experiments like various style transfer workflows)
It still shows up to the OS as two 12GB cards. So it kind of depends on whether the higher cost is worth the extra slot (in my case it isn't). Total power for the card is less than 2x that of one K40, although I'm not sure offhand whether that indicates less performance or just power savings from shared components.
As mentioned in a peer comment, K40s aren't the fastest cards either, but the 12GB is really nice for some use cases.
I have one K40 as well; you should be fine, even the Haas F1 team built a CFD supercomputer full of them fairly recently ;-) Kepler is pretty good at FP64.
The article link mentions an NVIDIA card, and an NVIDIA GeForce GTX 1080 8GB sells for ~$750 new. I don't know how this $400 card compares for this use case, but it is relatively cheap.
The K80 is about as fast as the 1080 in FP32 (8.2 vs 8.8 TFLOPS) but vastly faster in FP64 (2.7 vs 0.28 TFLOPS; FP64 supercomputers are still being built with it). It also has 3x the memory, so fitting the BERT_large NLP model might be possible. The 1080 is pretty much outdated at this point, as not many state-of-the-art models can fit inside 8GB. A disadvantage of the K80 is older CUDA kernel versions, so customized kernels for new CUDA versions might not work, but most models don't touch CUDA directly anyway.
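A back-of-the-envelope check on fitting BERT_large (the 340M parameter count is the published size; the overheads are rough assumptions):

    # Rough memory estimate for training BERT_large with Adam.
    # These are back-of-the-envelope assumptions, not measurements.
    params = 340e6       # BERT_large parameter count
    bytes_per_param = 4  # FP32 weights
    extra_copies = 3     # gradients + two Adam moment buffers

    weights_gb = params * bytes_per_param / 1e9
    training_gb = weights_gb * (1 + extra_copies)
    print(f"weights ~{weights_gb:.1f} GB, training state ~{training_gb:.1f} GB")
    # ~1.4 GB of weights, ~5.4 GB of training state before activations;
    # activations plus a useful batch size push past 8 GB, hence the
    # appeal of 12-24 GB cards.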
Being a dual-GPU model, it really can only fit half the total memory in the majority of cases, because sharing memory across GPUs is far slower and more difficult to implement.
Sorry about that! While we had the Dedispec GPU servers listed, our spider system didn't correctly detect the presence of a GPU, so they didn't show up as such. It has now been resolved, and they show up in your search results.
Courts in Germany have ruled that you cannot add additional clauses to a contract after the contract has been signed, i.e. after the purchase has been made.
So unless Hetzner is buying directly from Nvidia, Nvidia has no way to enforce such a clause. And when I buy an Nvidia card at any retailer in Germany, I will not have to sign a contract that obligates me not to use it in a data center - heck, I can just take the box off the shelf and pay in cash without having to sign anything.
So anything they put into the EULA is void, as it is only known after the time of purchase.
The EULA is attached to the drivers. Newer versions of CUDA often require newer drivers, so you bump into the issue regardless of when you buy the GPU.
They have add-ons for most support issues, including priority incidents. Throughout the years we've been with them, it could take ~2-5 hours before someone even replied to complete server downtime.
We mostly host Proxmox nodes and use HA, so it's not much of an issue when used like that.
OVH is very similar, although Hetzner is often quicker.
We only use server-grade hardware (like Xeon or EPYC), not the desktop kind (no ECC memory).
If one needs many IPs, OVH always wins on price, since it's a one-time cost, while Hetzner charges monthly.
Vast.ai (https://vast.ai) is good; 1x GTX 1080 comes to about $170/month. You are charged by the hour, unlike Hetzner.
There are obviously downsides to renting time from random individuals though, so it's not suitable for a server-like workload. Good for development notebooks or training.
I recently used Vast.ai for ~3 weeks to run an anime BigGAN ( https://www.gwern.net/Faces#biggan ) and the stability/uptime was pretty much 100%: it never went down or caused problems. (I had problems, but they were all due to the BigGAN.) As long as a little downtime isn't too big a deal or you can script changing instances (they have a CLI tool), you probably could run as a server.
Wasn't it forbidden by the driver license to use a 1080 in the cloud? Nvidia tries to bully you into buying an ultra-expensive Tesla card for that. Maybe the Germans don't care?
Do USA tyre manufacturers sell tyres that you're not allowed to use on your car if you're Uber-ing?
Are there other examples of this sort of restriction that actually make sense? I can think of warnings ('don't use this keychain carabiner for climbing') but not actual use restrictions that are sensible.
It's not one of those ICO scams. Maybe you'd prefer their GitHub page, in which the team has written over 100,000 lines of code: https://github.com/golemfactory/golem
It's a marketplace, so you name your own price. At the moment it's used for CGI rendering, but next week more use cases are being added, specifically WASM.
Not the experience I've had with Hetzner (been with them for 5+ years); basically all responses have been within a day, two at most. Although their servers have been pretty reliable, so I haven't had to contact them much.
My friend's experience has been the same. We've had issues where our WordPress instance was exploited and we got kicked off the network.
Also, we've had issues where there was an open UPnP connection and we got kicked off the network for that as well.
In those cases it took days for them to get back to us via email, and then they struggled to open up a channel through which we could resolve the issue and get compliant again.
Also, we've had situations where there was maintenance but no notice that it was happening.
Any other options out there without going bankrupt on AWS?