Yeah, write them a ticket. They are really supportive;
pretty sure they will come up with something for you
(as they did for us when we were in a similar bind).
On the order of 5-25. Interestingly, these Hetzner machines are so cheap monthly that it is much easier to rent via Hetzner, and they provide good networking / data center ops as well. Break-even time on cost, versus buying the hardware yourself, is on the order of a year.
Break-even point is the term for the point at which your gains equal your initial investment, implying that after this point you start making a net profit.
Break-even applies to the crossover point between any two options. You can break even between renting and buying a house at some point, even though both options cost you money.
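To make that concrete, here's a minimal sketch of the rent-vs-buy crossover calculation; all the numbers are illustrative assumptions, not real quotes:

    # Break-even between renting a Hetzner box and buying the hardware.
    # All numbers are illustrative assumptions, not real quotes.
    hardware_cost = 1200.0    # one-time cost of a comparable GPU box, USD
    hetzner_monthly = 110.0   # monthly rent for the dedicated server, USD
    self_host_monthly = 25.0  # power, bandwidth, etc. if you host it yourself

    # Renting wins until the accumulated rent premium covers the purchase.
    break_even_months = hardware_cost / (hetzner_monthly - self_host_monthly)
    print(f"Break-even after ~{break_even_months:.1f} months")  # ~14 months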
Another option is to over-provision spot instances, such that it would feel like a 24/7 instance but cost like spot. However, you would probably need an automation platform to achieve that.
An idea I've had is to use GCP preemptibles + a script to automatically run an ML training job on instance start + Google Cloud Function + Cloud Scheduler to attempt to start the instance every minute if it gets preempted. The latter two are effectively free, so you'd get the cost benefits of preemptibles as long as the ML training job is resilient to random shutdowns.
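A minimal sketch of what that Cloud Function could look like, assuming the google-api-python-client library and hypothetical PROJECT/ZONE/INSTANCE values; Cloud Scheduler would invoke it every minute:

    # Hypothetical Cloud Function: restart a preempted GCE instance.
    # PROJECT, ZONE, and INSTANCE are placeholder names.
    import googleapiclient.discovery

    PROJECT, ZONE, INSTANCE = "my-project", "us-central1-a", "ml-trainer"

    def restart_if_preempted(request):
        compute = googleapiclient.discovery.build("compute", "v1")
        inst = compute.instances().get(
            project=PROJECT, zone=ZONE, instance=INSTANCE).execute()
        # A preempted instance ends up in the TERMINATED state; start it again.
        if inst["status"] == "TERMINATED":
            compute.instances().start(
                project=PROJECT, zone=ZONE, instance=INSTANCE).execute()
            return "restart requested"
        return "instance status: " + inst["status"]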
Yes, this is basically the idea. However, there are different solutions for training and inference. For training, I would recommend that you add automatic checkpointing, and even consider model migration. For inference (which I think is the original concern), over-provisioning is the key, simply because it takes a long time to load the model. You also want to diversify your node types, etc.
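For the training side, a minimal PyTorch checkpointing sketch (the model, optimizer, and path are assumptions; the point is just that a preempted run can resume):

    # Minimal checkpointing sketch so training survives preemption.
    # `model`, `optimizer`, and CKPT_PATH are assumed/hypothetical.
    import os
    import torch

    CKPT_PATH = "/mnt/persistent/checkpoint.pt"  # must outlive the instance

    def save_checkpoint(model, optimizer, epoch):
        torch.save({"epoch": epoch,
                    "model": model.state_dict(),
                    "optimizer": optimizer.state_dict()}, CKPT_PATH)

    def load_checkpoint(model, optimizer):
        # Resume where the preempted run left off, if a checkpoint exists.
        if not os.path.exists(CKPT_PATH):
            return 0  # start from epoch 0
        ckpt = torch.load(CKPT_PATH)
        model.load_state_dict(ckpt["model"])
        optimizer.load_state_dict(ckpt["optimizer"])
        return ckpt["epoch"] + 1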
AWS spot instances for GPU can have availability issues as well, or sometimes be priced higher than on-demand (still trying to understand that one).
If you do choose to run spot at AWS (GPU or otherwise), be sure to check out the excellent project at autospotting.org and donate if you use it. It makes it super easy to replace on-demand nodes in an ASG with spot nodes and always makes sure you're getting a good price.
> AWS spot instances for GPU can have availability issues as well, or sometimes be priced higher than on-demand (still trying to understand that one).
You always pay the spot market price, not your bid. Your instance gets killed if somebody outbids you. A higher bid increases the probability that your instances don't get killed, while still letting you pay only the spot market price.
By bidding above the on-demand price, you are speculating that for the majority of the time, nobody else will bid more than the on-demand price. If you're not the only one doing that, the spot price can rise above the on-demand price.
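A toy illustration of those mechanics (all prices made up):

    # Toy spot-market illustration; all prices are made up.
    on_demand = 0.90                         # $/hr on-demand price
    my_bid = 1.00                            # bid above on-demand
    spot_history = [0.30, 0.45, 0.95, 1.10]  # hourly spot market prices

    for price in spot_history:
        if price <= my_bid:
            # You run and pay the market price, not your bid. Note that the
            # 0.95 hour costs more than on-demand (0.90), as described above.
            print(f"spot={price:.2f}: running, paying {price:.2f}/hr")
        else:
            # The market rose above your bid: the instance gets killed.
            print(f"spot={price:.2f}: outbid, instance killed")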
I really like Nocix servers; they're out of Kansas City. They offer an "i7-6700K 32GB + 2x 480GB SSD + GTX 1080" for $105/mo: https://www.nocix.net/cart/?id=338
No, I need NVIDIA CUDA support. Also, the most important metric for me is cost per month per GB of GPU RAM, and AWS Elastic Inference is pretty bad on that metric.
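For concreteness, a sketch of that metric; the prices are rough assumptions loosely based on figures mentioned elsewhere in this thread, not quotes:

    # Cost per month per GB of GPU RAM; prices are rough assumptions.
    offers = {
        "Hetzner GTX 1080 (8 GB)": (100.0, 8),  # ($/month, GB of GPU RAM)
        "Nocix GTX 1080 (8 GB)":   (105.0, 8),
        "Vast.ai GTX 1080 (8 GB)": (170.0, 8),
    }
    for name, (monthly, vram_gb) in offers.items():
        print(f"{name}: ${monthly / vram_gb:.2f}/month per GB")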
That's not really how Hetzner's dedis work: you typically pay a setup fee (that's about 1/2 of a month of usage) and you have to cancel a full billing cycle (usually a month) in advance. You're meant to run them for months/years at a time, not just spin them up/down on-demand.
Elsewhere in the thread, it sounds like your needs are a little more robust. But for anyone reading who might be interested:
- Tesla K40 12GB cards are ~$130 on US eBay, and have been for a while
- These cards are passively cooled, though, intended to be used in wind-tunnel servers
- However, you can put 40mm screamer fans on them to cool them in a normal desktop. Something like https://www.thingiverse.com/thing:3032044 or just wire-tie them to the back grill
High pressure 40mm fans are LOUD, though. I run a desktop with two K40s in my garage, and I can hear it a tiny bit standing outside in my driveway. (I mostly use that box for neural net art experiments like various style transfer workflows)
It still shows up to the OS as two 12GB cards. So it kind of depends on whether the higher cost is worth the extra slot (in my case it isn't). Total power for the card is less than 2x that of one K40, although I'm not sure offhand whether that indicates less performance or just power savings from shared components.
As mentioned in a peer comment, K40s aren't the fastest cards either, but the 12GB is really nice for some use cases.
I have one K40 as well; you should be fine, even the Haas F1 team built a CFD supercomputer full of them fairly recently ;-) Kepler is pretty good at FP64.
The article link mentions an NVIDIA card, and an NVIDIA GeForce GTX 1080 8GB sells for ~$750 new. I don't know how this $400 card compares for this use case, but it is relatively cheap.
The K80 is about as fast as the 1080 in FP32 (8.2 vs 8.8 TFLOPS) but vastly faster in FP64 (2.7 vs 0.28 TFLOPS; FP64 supercomputers are still being built with it). It also has 3x the memory, so fitting the BERT_large NLP model might be possible. The 1080 is pretty much outdated at this point, as not many state-of-the-art models can fit inside 8GB. A disadvantage of the K80 is older CUDA kernel versions, so customized kernels for new CUDA versions might not work, but most models don't touch CUDA directly anyway.
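A back-of-the-envelope check on fitting BERT_large (the 340M parameter count is the published size; the overheads are rough assumptions):

    # Rough memory estimate for training BERT_large with Adam.
    # These are back-of-the-envelope assumptions, not measurements.
    params = 340e6       # BERT_large parameter count
    bytes_per_param = 4  # FP32 weights
    extra_copies = 3     # gradients + two Adam moment buffers

    weights_gb = params * bytes_per_param / 1e9
    training_gb = weights_gb * (1 + extra_copies)
    print(f"weights ~{weights_gb:.1f} GB, training state ~{training_gb:.1f} GB")
    # ~1.4 GB of weights, ~5.4 GB of training state before activations;
    # activations plus a useful batch size push past 8 GB, hence the
    # appeal of 12-24 GB cards.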
Being a dual-GPU model, it really can only fit half the total memory in the majority of cases, because sharing memory across GPUs is far slower and more difficult to implement.
Sorry about that! While we had the Dedispec GPU servers listed, our spider system didn't correctly detect the presence of a GPU, so they didn't show up as such. It has now been resolved, and they show up in your search results.
Courts in Germany have ruled that you cannot add additional clauses to a contract after the contract has been signed, i.e. after the purchase has been made.
So unless Hetzner is buying directly from Nvidia, Nvidia has no way to enforce such a clause. And when I buy an Nvidia card at any retailer in Germany, I will not have to sign a contract that obligates me not to use it in a data center - heck, I can just take the box off the shelf and pay in cash without having to sign anything.
So anything they put into the EULA is void, as it is only known after the time of purchase.
The EULA is attached to the drivers. Newer versions of CUDA often require newer drivers, so you bump into the issue regardless of when you buy the GPU.
They have add-ons for most support issues, including priority incidents. Throughout the years we've been with them, it could take ~2-5 hours before someone even replied to complete server downtime.
We mostly host Proxmox nodes and use HA, so it's not much of an issue when used like that.
OVH is very similar, although Hetzner is often quicker.
We only use server-grade hardware (like Xeon or EPYC), not the desktop kind (no ECC memory).
If one needs many IPs, OVH always wins on price, since it's a one-time cost, while Hetzner charges monthly.
Vast.ai (https://vast.ai) is good; 1x GTX 1080 comes to about $170/month. You are charged by the hour, unlike Hetzner.
There are obviously downsides to renting time from random individuals though, so it's not suitable for a server-like workload. Good for development notebooks or training.
I recently used Vast.ai for ~3 weeks to run an anime BigGAN ( https://www.gwern.net/Faces#biggan ) and the stability/uptime was pretty much 100%: it never went down or caused problems. (I had problems, but they were all due to the BigGAN.) As long as a little downtime isn't too big a deal or you can script changing instances (they have a CLI tool), you probably could run as a server.
Wasn't it forbidden by the driver license to use a 1080 in the cloud? Nvidia tries to bully you into buying an ultra-expensive Tesla card for that. Maybe the Germans don't care?
Do USA tyre manufacturers sell tyres that you're not allowed to use on your car if you're Uber-ing?
Are there other examples of this sort of restriction that actually make sense? I can think of warnings ('don't use this keychain carabiner for climbing') but not actual use restrictions that are sensible.
It's not one of those ICO scams. Maybe you'd prefer their GitHub page, in which the team has written over 100,000 lines of code: https://github.com/golemfactory/golem
It's a marketplace, so you name your own price. At the moment it's used for CGI rendering, but next week more use cases are being added, specifically WASM.
Not the experience I've had with Hetzner (been with them for 5+ years); basically all responses have been within a day, two at most. Although their servers have been pretty reliable, so I haven't had to contact them much.
My friend's experience has been the same. We've had issues where our WordPress instance was exploited and we got kicked off the network.
Also, we've had issues where there was an open UPnP connection and we got kicked off the network for that as well.
In those cases it took days for them to get back to us via email, and then they struggled to open up a channel through which we could resolve the issue and get compliant again.
Also, we've had situations where there was maintenance but no notice that it was happening.
Any other options out there without going bankrupt on AWS?