EDIT: if you really need GPU compute power, go buy a GPU from the computer store around the corner - they are cheap, fast and available.
I gave up on EC2 when they started requiring you to "request for quota" to start a GPU instance.
You have to "request for quota" even if you want to run a single instance.
You have to specify which specific instance type you want.
You have to specify which region you want it in.
Then you sit back and wait for some human to "approve your request".
In my case this took 24 hours.
In this time I literally could have walked (not driven) down to the local computer store, bought a computer, pushed it back to my house in a shopping cart, spent a few hours configuring it and still be left with 16 hours to have a sleep, eat and do some other things.
AWS quota system is so far from "scalable" and "elastic" that it's effectively useless. You can't design any sort of infrastructure around that sort of quota system.
I dumped AWS at that point. Mind you, Azure has exactly the same quota system.
Seriously, just rent a server from Ionos or Hetzner or get a fast Internet connection and self host. It's faster and better and cheaper than any of the big clouds.
I can understand why though. One employer's first use of a GPU instance was when we were hacked and someone fired up a few to mine crypto; just a day of that cost us a few thousand dollars, which AWS fortunately refunded. It's quite likely that most users never use them, so a GPU launch is a good signal that you've been hacked.
They have a quota system for sending emails too, and it's not because they need to purchase any hw for sending more emails. It's because those are also a magnet for hackers.
If AWS was really clear about what you were spending then it would be easy to be running an app that tells you the AWS usage and alerts you of anomalous patterns.
My bank contacts me if there are any questionable transactions.
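A minimal sketch of what such an alerting app could do, assuming you already export one total cost figure per day from your billing data. The function name, the 7-day window, and the thresholds are illustrative assumptions, and no AWS API is called here.

```python
from statistics import mean, stdev


def find_spend_anomalies(daily_costs, window=7, threshold=3.0):
    """Return indices of days whose cost is far above the trailing window.

    daily_costs: list of daily totals in dollars (hypothetical export).
    """
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        # Use a floor on the spread so a perfectly flat history
        # does not flag every tiny increase.
        limit = baseline + threshold * max(spread, 0.05 * baseline, 0.01)
        if daily_costs[i] > limit:
            anomalies.append(i)
    return anomalies


# Made-up numbers: a quiet account, then someone starts GPU instances.
costs = [12.0, 11.5, 12.3, 11.9, 12.1, 12.4, 11.8, 12.2, 950.0]
print(find_spend_anomalies(costs))
```

A real version would need per-service breakdowns and would have to stop the anomaly itself from polluting later baselines, which this sketch does not attempt.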
I think you would be shocked at how often banks have similar restrictions on corporate accounts. You would also be surprised at how often companies are hacked.
In that world, you'd better make it very difficult to change your alerts downward: otherwise, the very first thing a hacker would do is take over the alert system, and only then start using your account for their profit-making ventures.
They do this. There's a fine balance to strike between the overhead (contacting customers, monitoring) and false positives.
But if weird stuff happens, AWS will take action. I'm sure part of it is because they care, but another part is because they don't want to get stuck holding the bag.
AWS will tell you as much as you want to know about how much you’re spending. Cost breakdowns and billing alerts are more configurable than you’ll see pretty much anywhere else. This is a miscategorisation of the problem with AWS billing. The actual problem - which is not misunderstood - is that it’s complicated. There’s a reason that AWS has an unwritten ‘one-time get out of jail free card’ system if you accidentally mess up and charge $50,000 to your corporate credit card. And there’s a reason that a condition of this escape hatch is that you set up billing alerts like you should as a responsible person hooking up a PAYG service to what’s sometimes an effectively infinite line of credit.
That's like saying the Internet will tell you all you want to know.
Yes, (most of) the detail is in the CUR/CBR - but you have to be smart enough to understand it.
It's misleading to say that AWS will tell you all you want to know. Try getting them to explain your network costs in detail using network load balancers in a simple email.
Lambdalabs used to have spot instances for GPUs and now they are completely sold out. Even vast.ai hardly has any A100 instances. We are starting to see paperclip maximizer economics start with GPUs since they are so damned useful.
Well, paperclips are clearly useful, not so sure about ChatGPTs ... so far looks like their effect is net negative - more cheating and more spam of all kinds.
It's okay for an infinitely burstable product to delay provisioning services by 24 hours and require you to capacity plan in specific detail before you can use things?
I don't agree. Their reality distortion field is fooling customers, AWS has their cake and eats it too. Try being a startup and spin up 1000 c6in.8xlarge for a 2 hour batch job.
Also, I felt that Amazon was playing sketchy little games with spot instances.
It's not impossible but quite technically challenging to find out how much you are actually paying for a spot instance.
It felt really dodgy to me that I might have started a spot instance thinking I was paying the minimum listed rate but somewhere amongst the formulas and terms and conditions AWS actually decided I would be paying maximum rate.
They don't actually tell you up front what your spot instances are costing. No doubt it would not be hard for them to do so, but it seems a deliberate strategy to hide this information.
It's just not worth the game playing when self hosting or renting a server from Ionos is cheaper and takes away the uncertainty.
Comparing with self-hosting or renting a server is just dishonest. At that point you may as well compare with spinning up an actual EC2 instance. Your alternatives miss the main original value proposition of EC2 altogether, let alone spot instances.
I don't really buy the idea that companies can't afford to run their own computers due to hardware and infrastructure costs, and that clouds don't require hiring or specialised technical experts.
Pay per second Linux boxes with public IPs. A lot of the VPS/server hosts I used in the past took minutes to boot an instance and billed you by the hour (or maybe by the month, although that was uncommon), not so with EC2.
There’s much less value if you never need to scale of course, which is true of 90% of businesses. But doing a dumb experiment on a 10 cent spot instance for an hour where it would’ve taken a week to arrange space on the company VMWare box? That is awesome.
EC2 was billed by the hour initially. Minimum one hour charge as soon as you started an instance. It was still better than the alternatives; nobody else had per-hour billing with no commitments in 2006. Moving to per-second billing was a much later change (in 2017).
> Pay per second Linux boxes with public IPs. A lot of the VPS/server hosts I used in the past took minutes to boot an instance and billed you by the hour (or maybe by the month, although that was uncommon), not so with EC2.
It's only worth it if the pay-per-second box has an equivalent or better cost-performance ratio than the "legacy" alternative, otherwise it's still cheaper to get one "legacy" VPS/dedicated server, use it for 10 seconds and let the rest go to "waste".
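One way to put numbers on this trade-off is the break-even usage: below it the metered box wins, above it the flat-rate server wins. This is a rough sketch with made-up prices ($0.10/hour metered versus $5/month flat) and hypothetical function names. It ignores performance differences between the two machines, which the comment above says matter too.

```python
SECONDS_PER_HOUR = 3600


def metered_cost(hourly_rate, seconds_used):
    """Cost of a per-second-billed box, given its hourly list price."""
    return hourly_rate * seconds_used / SECONDS_PER_HOUR


def break_even_hours(hourly_rate, flat_monthly_price):
    """Hours of use per month at which both options cost the same."""
    return flat_monthly_price / hourly_rate


def cheaper_option(hourly_rate, seconds_used, flat_monthly_price):
    """Return 'metered' or 'flat' for one month of usage."""
    if metered_cost(hourly_rate, seconds_used) < flat_monthly_price:
        return "metered"
    return "flat"


# Assumed prices, for illustration only.
print(break_even_hours(0.10, 5.0))
print(cheaper_option(0.10, 10, 5.0))
print(cheaper_option(0.10, 60 * SECONDS_PER_HOUR, 5.0))
```

To account for cost-performance, you could scale `seconds_used` by how much longer the job takes on the slower machine before comparing.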
Wow, never realized that that could be a great money laundering strategy.
1. Take cash, load it onto prepaid cards.
2. Buy GPU instances.
3. Mine crypto.
4. Pay for compute with #1
5. Sell crypto
6. "Clean" cash
I just don't have the mind of a criminal I guess, it sounds like this was figured out a long time ago.
There are (were) enough random kids in their basements making millions on crypto that it probably wouldn't raise too many eyebrows at the IRS either if you actually tried to report your crypto earnings cleanly; they would likely spend their efforts chasing down the people who weren't reporting.
Fortunately your evil ML plan is going to fail at #1. It's basically impossible to get cash loaded onto prepaid cards anywhere. Also, AWS and other hosting providers wouldn't accept prepaid cards anyway. Instead you just buy crypto with cash P2P and report that as mined or whatever.
On the other hand, credit card fraud is a real problem, especially because Amazon doesn't want to add any KYC burden on their customers or try to track down anomalous spending. After all, a good chunk of AWS profits comes from how bad people are at tracking their cloud costs.
If Amazon were to implement a built-in option to track suspicious jumps in AWS costs, it would cut not only fraud but also overall AWS profits.
That would be true regardless of the means of obtaining the GPUs -- I was just replying to the issue of the cash->GPU funnel, not asserting that any other part of the operation is easy.
That's a good way of monetizing stolen credit cards but it would be a very bad way to do money laundering, because crypto is effectively "cash" so it still isn't clean. It doesn't have a good origin story.
Basically for money laundering you want to have a good story for your money that you can get tax authorities and banks to believe. "I found some crypto" would only work at small scale, and crypto is considered "high risk" anyway so you'll get higher scrutiny.
If I was in the business of laundering money and I wanted to do it via crypto, I think I would create an NFT instead then buy it from myself. Then I have funds I can transfer to cash, and I can claim I got them through my great artistic abilities.
The NFT sounds better, but assuming the police, FBI, DEA, etc. are smart, which they probably are, they can see that you are not famous in the NFT world, so why did someone pay so much? So you would almost need to be able to make a million anyway (just pure NFT grift) to launder a million (from some other source).
There are a lot of systems and features that are hard (impossible?) to design in a way where you can predict (every dimension of) spend; you can only react to spend events.
For example, a theoretical "prepaid AWS" might allow you to put a hold on a vCPU-month of account credits to start a 1vCPU instance. But what about the bandwidth egress fees when someone makes requests to said instance? Those are going to be completely variable, depending on how much traffic the instance receives.
Yeah, but there are plenty of organizations which have very modest or predictable loads that would be significantly well served by knowing that the monthly spend was capped at $X prepaid.
As a nobody, I want to use AWS, but I refuse to have the unlimited liability in case I screw up something and wake up to a $30k bill. Hell, they could even over-charge me on the credits, and I would gladly take that deal if I knew that once the kitty ran dry, services would stop.
Would there be edge cases and complications to resolve (eg what about storage?). Sure, but AWS pays some smart people a lot of money to figure out tricky things.
I work at AWS in Professional Services. Of course, all opinions are my own.
The answer for you is LightSail.
LightSail is a standard VPS. But if you want to upgrade to “real AWS” later on, you can. The only thing that I’m aware of that could cause bills to go up is egress over your allowance.
To be honest with you, I would be slightly afraid of screwing up and having an unexpected bill from AWS if I were doing a personal project and I do this for a living. There have been plenty of times where I left something expensive running or provisioned an expensive service (Kendra) and forgot to shut it down until I ended up on a list of “people with the highest spend” on our internal system on one of my non production accounts.
If AWS truly cared about customers they would implement spending limits. Note the plural: Customers don't want their S3 data deleted because some GPU stuff went crazy.
How would that actually work? When you reach your spending limit, delete your data from S3 that you’re being charged for? Stop allowing egress traffic? Stop allowing any API calls that cost money? Stop your EC2 instance?
AWS has over 200 services. How would you implement that conceptually?
I know it’s a real concern when learning AWS for most people. I first learned AWS technologies at a 60 person company where I had admin access from day one to the AWS account and then went to AWS where I can open as many accounts as I want for learning. So I haven’t had to deal with that issue.
But what better way would you suggest than LightSail where you have known costs up front?
I think it could be done reactively, as long as two things are true:
1. spending limits are fine-grained — rather than having one global budget for your entire AWS project, instead, each billable SKU inside a project would have its own separate configurable spending limit. The goal here isn't to say "I ran out of money; stop trying to charge me more money"; it's rather to say "I have budgeted X for the base spend for the static resources, which I will continue paying; but I have budgeted Y for the unpredictable/variable spend, and have exceeded that limit, so stop allowing anything to happen that will generate unpredictable/variable spend."
This way, you can continue to pay for e.g. S3 storage, while capping spend on S3 download (which would presumably make reading from buckets in the project impossible while this is in effect); or you can continue paying for your EC2 instances, while capping egress fees on them (which would presumably make you unable to make requests to the instances, but they'd still be running, so you wouldn't lose the state for any ephemeral instances.)
2. AWS "eats" the credit-spend events of a billing SKU between the time it detects budget-overlimit of that billing SKU, and the time it finishes applying policy to the resource that will stop it from generating any more credit-spend events on that billing SKU. (This is why this kind of protection logic can never be implemented the way people want by a third party: a third party can only watch AWS audit events and react by sending API requests; it has no authority to retroactively say "and anything that happens in between the two, disregard that at billing time, since that spend was our fault for not reacting faster.")
Note that implementing #2 actually makes implementing #1 much easier. To implement #1 alone, you'd have to have each service have some internal accounting-quota system that predicts how much spend "would be" happening in the billing layer, and can respond to that by disabling features in (soft) realtime for specific users in response to those users exceeding a credit quota configured in some other service. But if you add #2, then that accounting logic can be handled centrally and asynchronously in an accounting service which consumes periodic batched pushes of credit-spend-counter increments from other services. The accounting service could emit CQRS command "disable services generating billable SKU X for customer Y starting from timestamp Z" to a message queue, and the service itself could see it (and react by writing to an in-memory blackboard that endpoints A/B/C are disabled for user Y); but the invoicing service could also see it, and recompute the invoice for customer Y for the current month, with all spend events for billing SKU X after timestamp Z dropped from the invoice.
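A toy sketch of the central accounting idea described above, under heavy assumptions: all class and field names are hypothetical, batches are assumed to arrive in time order, and the "message queue" is just a list. It shows the two consumers of the cutoff timestamp Z: the service-side disable command and the invoice recomputation that drops later spend.

```python
from dataclasses import dataclass, field


@dataclass
class SpendEvent:
    customer: str
    sku: str
    timestamp: int   # seconds since an arbitrary epoch
    amount: float    # credits charged


@dataclass
class AccountingService:
    limits: dict                                       # (customer, sku) -> budget
    totals: dict = field(default_factory=dict)
    cutoffs: dict = field(default_factory=dict)        # (customer, sku) -> Z
    command_queue: list = field(default_factory=list)  # stand-in for a real queue

    def ingest_batch(self, events):
        """Consume a batch of spend increments pushed by other services."""
        for ev in sorted(events, key=lambda e: e.timestamp):
            key = (ev.customer, ev.sku)
            if key in self.cutoffs:
                continue  # already over budget; later spend is forgiven
            self.totals[key] = self.totals.get(key, 0.0) + ev.amount
            limit = self.limits.get(key)
            if limit is not None and self.totals[key] >= limit:
                self.cutoffs[key] = ev.timestamp
                self.command_queue.append(
                    ("disable", ev.customer, ev.sku, ev.timestamp))


def compute_invoice(events, cutoffs):
    """Bill every event except capped-SKU spend after its cutoff Z."""
    total = 0.0
    for ev in events:
        z = cutoffs.get((ev.customer, ev.sku))
        if z is None or ev.timestamp <= z:
            total += ev.amount
    return total


events = [
    SpendEvent("acme", "s3-storage", 1, 5.0),
    SpendEvent("acme", "egress", 1, 6.0),
    SpendEvent("acme", "egress", 2, 6.0),    # crosses the 10.0 budget
    SpendEvent("acme", "egress", 3, 100.0),  # arrives before enforcement lands
    SpendEvent("acme", "s3-storage", 4, 5.0),
]
svc = AccountingService(limits={("acme", "egress"): 10.0})
svc.ingest_batch(events)
print(svc.command_queue, compute_invoice(events, svc.cutoffs))
```

Note that the uncapped storage SKU keeps billing after the egress cap trips, which is the fine-grained behaviour point 1 asks for.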
In Washington State we have an account for toll infrastructure. You set a top-up amount and a minimum. When the minimum is reached, the top-up is charged to your card and applied. If that fails a bill is mailed to you. If you fail to pay that then civil penalties are assessed.
Your point? The goal here would be to have a policy that protects you from someone stealing your toll badge (i.e. hacking into your AWS account) and running you through a toll bridge a million times (i.e. generating huge variable spend.) I don't see how what you're saying relates.
Devil's advocate: 10 year old accounts are probably just as likely as any to get hacked into and used for crypto mining, and honestly I bet a majority of their customers don't even use GPU instances.
One could argue that older accounts are even more likely to get hacked, as they are more likely to have older passwords that are weaker that may have been leaked along the way, along with various other accumulated security issues (leaked API keys, out of date 2FA choices etc).
Yep, Amazon's threat model has less to do with how reputable a given account is, since credential theft is rampant. Even at massive institutional customers on POs I've had to apply for quotas.
I think it’s tough to blame Amazon here. p4d server hardware costs around 100k and they were almost certainly having countless hours of use on these by stolen aws accounts and credit cards. This doesn’t excuse the requirement to specify region and instance types in advance. The region part at least can be explained by the fact that Amazon tries to make regions as independent as possible.
>and still be left with 16 hours to have a sleep, eat and do some other things.
I hope somewhere in those 16 hours you bother to return the shopping cart. Or are you one of those that just leaves it wherever because you can't be bothered? I will be bringing this up at the next tenants' meeting.
Google Cloud does this, too, for GPUs on their compute instances, but the approval (I assume if there is manual approval, there are conditions which trigger it that I missed) was near instant in my experience.
They closed last year, it's now a 50-minute bus ride and then ~10 minutes walking (one way). Apparently we were one of the last customers to buy most parts for a desktop there. I fear we may not be able to refer to corner stores much longer if they're not of a type most people actually need on a weekly basis, with things moving to online sales
> In my case this took 24 hours.
Even shipping is faster than that, so yeah point still taken
Having your own GPU makes sense if it works out cheaper than the cloud equivalent (not just the GPU bit but everything else - bandwidth/storage fees, etc).
Given the prices of cloud infrastructure, it doesn't take much before walking to your local computer store and buying a GPU (or more!) becomes more cost-effective.
> and still be left with 16 hours to have a sleep, eat and do some other things.
Well there's your problem. You don't need to sit back the whole time while you're waiting for approval, it's okay to leave your chair.
Jokes aside, I think this probably depends on use-case. You can't easily return the computer. And you can't easily buy (and then return) 100 computers if you have a large one-time computation to run.
Do they still wait until the very last moment before launching to tell you that the account you're logged in under doesn't actually have permission to launch EC2 instances?
AWS is following the same path as other web technologies offered by big tech. In the beginning, they want to lure as many people as possible to their new, shiny product, so they offer irresistible prices. The next step, which is the important one, is to secure corporate and government contracts for that technology. Next thing you know, they will see small clients as a nuisance, and do everything to make their lives more difficult.
Effectively, it starts up cheap spot instances (based on specified criteria) across a variety of instance types to replace whatever regular instance in an autoscaling group comes online and then spins down the regular instance.
EG: That m4a you wanted may be expensive... but nobody is using m4ad so it's 85% off and it meets the specified CPU/RAM requirements... auto spotting will spin it up instead.
Having used it on and off over the years it is sometimes eyebrow raising to see 4xl boxes running cheaper than the xl box they replaced :)
It works, but it may depend on how resilient your workload is. We have timed spot instances that come online shortly before the top of the hour in the morning eastern, and via specifying instance alternates we'd often get instances that were much larger than our low-ish bids would warrant. Then most of them would get pre-empted <2 minutes later and there would be a bunch of churn as we had to replace them with what we should have gotten originally.
The author of AutoSpotting here, glad to see it mentioned in here and happy to help clarify such issues.
Are you sharing this as an experience of using AutoSpotting or other mechanisms to launch Spot instances?
What you described may still happen occasionally, and it happened a lot with older versions, but the recent versions of AutoSpotting, especially the commercial edition available on the AWS Marketplace (although the community edition available on GitHub also does it to some degree), should avoid such situations in the default configuration and should be much more reliable than before.
Actually, when it comes to capacity, when Spot capacity is not available and we fail over to on-demand, the latest AutoSpotting version will fail over between multiple instance types with the same diversification used for Spot instances, so you're pretty much guaranteed to have capacity even better than your initial single-instance ASG.
We also use the capacity-optimized-prioritized allocation strategy with a custom priority based on lowest cost, but also preferring recent instance types if available for more performance and lower carbon emissions.
Let me know if you have further questions about AutoSpotting, I'm happy to help.
To be fair, it's also easier than ever before to use spot instances.
If you set up an EKS/K8s cluster and install Karpenter, you can configure it very easily to use spot instances anytime prices are available for less than on-demand and to use on-demand when spot instances are unavailable or too expensive.
You end up never thinking about it but having full availability.
In practice it means the ceiling on your bill is the on-demand price, but you usually average out much lower.
I suspect as more and more people switch over to this model then the use of spot instances will stay closer to full saturation, with those discounts becoming negligible.
It's all about how your spot instances can be terminated. Perhaps this could interfere with a workload that might take a long while to finish.
On paper your app should be resilient to these things but in practice it's essentially an unforgiving chaos monkey that will delete servers from your cluster.
If you can handle that, you can definitely save a lot though. Especially web servers where it's expected you'll be responding in a few seconds while maybe your background workers are churning through longer lived tasks. The problem there is you need to already be big enough to have different nodes handling web vs worker traffic. A lot of decently sized apps can operate in a world where maybe you have 2-5 nodes on your cluster and your web and worker pods land on any of them based on Kubernetes' default scheduler.
Afair Karpenter handles that. AWS gives some lead time before taking away spot instances. Karpenter puts a taint on the node and spins up a replacement. The scheduler gently evicts the pods, and nothing is lost.
This does require architecting your services to be shut down tolerant, but that's par for the course. If you're using Kubernetes you've probably already settled on the "[herd, not] cattle, not pets" idea.
> The scheduler gently evicts the pods, and nothing is lost.
If you have heavy duty work being done, it might get kill -9'd after the grace period is surpassed. If you weren't using spot instances you have full control over the grace period.
You can of course handle this too, by breaking up your work in ways where it can be interrupted without losing progress but depending on what you're doing this could be very complicated. Basically the takeaway is you can't blindly start using spot instances, even if your app has been running in Kubernetes successfully for a while.
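As a rough illustration of "breaking up your work so it can be interrupted without losing progress", here is a minimal checkpointing worker. The class name, checkpoint file format, and item-by-item granularity are assumptions for the sketch; real workloads usually need coarser or domain-specific checkpoints.

```python
import json
import os
import signal


class GracefulWorker:
    """Processes items one at a time and records progress after each."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        # Usable as a signal handler: just set a flag, finish the
        # current item, then exit the loop cleanly.
        self.stop_requested = True

    def load_position(self):
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                return json.load(f)["next_index"]
        return 0

    def save_position(self, next_index):
        # Write to a temp file and rename, so a kill mid-write
        # cannot leave a corrupt checkpoint behind.
        tmp = self.checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"next_index": next_index}, f)
        os.replace(tmp, self.checkpoint_path)

    def run(self, items, process):
        """Resume from the last checkpoint; return the next unprocessed index."""
        i = self.load_position()
        while i < len(items) and not self.stop_requested:
            process(items[i])
            i += 1
            self.save_position(i)
        return i


def install_sigterm_handler(worker):
    """Call from the main thread so SIGTERM triggers a clean stop."""
    signal.signal(signal.SIGTERM, worker.request_stop)
```

This only helps if each item finishes well inside the grace period. An item that outlives it still gets killed, and is redone from the start on the replacement node.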
there's a cost to engineer something that switches automatically, or to have someone go in and change it manually. so the spot price has to be higher than on-demand + switching costs before switching pays off. the new pricing models (a couple of years old) have mostly alleviated this though
This shocks me too. When I first had to design a system around spot instances I assumed it caps out at the on-demand price. But in practice that is very much not the case. The spot price can routinely go above on-demand, with all the downsides of spot still being applied to the instance.
It is because many systems are only built to support spot. When running a workload, they might not want to give up availability so they just pay the higher price. To be fair, this is a game that AWS engineers. They essentially want to kick people off the infrastructure to free it up for other uses. So this is a way to signal how badly you don't want to be kicked off. A lot of times after a price goes above on-demand pricing, it will dip down below fairly quickly as workloads all re-adjust. Some companies are willing to play that gamble, that a short period above on-demand is still cheaper in the long run when you average out the invoice at the end of the month.
This is no longer the case. Back in 2018 AWS changed the pricing model so now there's no more bidding.
This means "bidding" more won't protect your capacity, AWS will just take it if they need it.
Prices are now relatively flat, changing slightly over time by a fraction of a cent per day with a cap at the on-demand price, while previously there used to be lots of swings from 0.1x to 10x the on-demand price, sometimes multiple times within a day.
If an instance on-demand is, for the sake of easy discussion, $1.00/hr and the spot rate is usually $0.30/hr, it might make sense to bid $1.25/hr (or $1.26/hr) if you didn’t want to get pre-empted.
Most hours, you’d pay something close to $0.30/hr. Some hours, you’d pay over a $1.00/hr, but you’d save money overall against on-demand.
(This ignores Reserved Instances and Savings Plans.)
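The arithmetic behind that example, as a small sketch of the old bidding model: in each hour you run (and pay the market price) only if the market price is at or below your bid. The 24 hourly prices are invented to match the $0.30 / $1.00 / $1.25 figures above, and the function name is hypothetical.

```python
def simulate_bid(market_prices, bid):
    """Return (average price paid per hour run, hours pre-empted)."""
    paid = [p for p in market_prices if p <= bid]
    interrupted = len(market_prices) - len(paid)
    average = sum(paid) / len(paid) if paid else 0.0
    return average, interrupted


ON_DEMAND = 1.00
# Invented day: 22 quiet hours, 2 hours where spot spikes above on-demand.
prices = [0.30] * 22 + [1.10, 1.20]

avg_high, cut_high = simulate_bid(prices, bid=1.25)
avg_low, cut_low = simulate_bid(prices, bid=ON_DEMAND)
print(round(avg_high, 4), cut_high)
print(round(avg_low, 4), cut_low)
```

With the higher bid the instance survives the spike hours and still averages well under the on-demand rate. With a bid at the on-demand price it is pre-empted for those two hours instead.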
Maybe poorly architected software, where it’s cheaper to increase the value of a variable than have a bunch of API calls that check the current onDemand price and call different APIs depending on which number is smaller.
Interesting. While my data does show this being the case in a handful of pools (1-5 out of 13k pools at any given time), it appears to occur only exceedingly rarely.
I know at AWS they drill into folks that spot capacity isn’t guaranteed capacity. I didn’t even think about more demand for spot reducing supply or additionally AWS just simply not investing in as much hardware to accommodate spot because of lack of growth. That said, we're seeing leading indicators that this trend will reverse as cloud growth returns, https://www.vantage.sh/cloud-cost-report/2023-q1
Even on-demand isn't guaranteed capacity. There have been a handful of times in my career where I've tried to spin up a lot more instances and got met with an out of capacity error. There's a special kind of reserved instance for guaranteed capacity.
You have the option to get On-Demand Capacity Reservations now which are separate from RIs. Capacity reservations don't require 1- or 3-year commitments like RIs do. This can be used if you want to get your savings through Regional RIs or Savings Plans instead.
A few years ago we ran builds on m5d instances (the ones with local nvme) and they'd pretty regularly run out of capacity in certain AZs (although it was usually only a single AZ and there'd be a slight delay before the ASG grabbed one from a different AZ)
Exactly this. Our group regularly maxes out instance types in a region for some of our simulations. Have to overflow the requests over to other regions in those cases. Unfortunately the load is not consistent enough to pay for the reserved capacity.
The q1 on-demand metrics are definitely very interesting!
One possible cause of this trend is actually still consistent with spot demand growth in aggregate: when customers deploy spot fleets with on-demand backup capacity and all spot prices get closer to on-demand, those fleets would become more likely to revert to on-demand, rather than a more expensive pool of spot instances.
Measuring this would be a great opportunity to leverage the vantage point that your team has!
It would be a bit odd to build capacity for spot instances. The whole point of low priority, preemptable instances is to fill up the capacity that you have available, but have earmarked for scheduling big jobs or handling failures (say a power domain or decent chunk of the network goes down).
Otherwise they would be no different from any other instance type.
I stopped using cloud hosting until _it is actually necessary_
I have two home workstations that are collecting dust. I have 6 and 16 core CPUs with HT hooked up to 32 and 128gb ram.
I figured out how to make my dynamic IP still serve requests accounting for IP changes using a background script. If someone has a better solution I’d appreciate that.
Both these machines are more than enough to do everything I could dream of building and they host dozens of my little apps and services that I use for myself.
I will only use cloud when necessary for commercial products and services.
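For reference, the kind of background script described here can be quite small. This is only a sketch: `update_record` is a hypothetical callback standing in for whatever DNS provider API you use, and the lookup URL assumes the public api.ipify.org service, which returns your address as plain text.

```python
import time
import urllib.request


def fetch_public_ip(url="https://api.ipify.org"):
    """Ask an external service which address our traffic comes from."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode().strip()


def sync_dns_once(last_ip, get_ip, update_record):
    """Update DNS only when the address changed; return the current IP."""
    current = get_ip()
    if current != last_ip:
        update_record(current)
    return current


def watch(update_record, interval_seconds=300):
    """Poll forever. Not called here; run it from a service or cron wrapper."""
    last_ip = None
    while True:
        try:
            last_ip = sync_dns_once(last_ip, fetch_public_ip, update_record)
        except OSError:
            pass  # network hiccup: keep the old value and retry later
        time.sleep(interval_seconds)
```

Passing the lookup and update steps in as functions keeps the change-detection logic testable without touching the network.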
- Rent a cheap VPS with a static IP address (for example, the most basic at Vultr)
- Set up wireguard or tailscale/headscale (or your preferred VPN) both on the VPS and your workstations, so you can access the services in your workstations from the VPS through the VPN
- In the VPS configure Nginx/traefik/caddy (or your preferred reverse proxy) pointing to the services in your workstations
Ta-da! Now people access through your VPS, but the processing is done at home. The VPS is cheap, and it doesn't matter if your home IP address changes. You could even take your workstation, move it to any place with an internet connection, and it would keep serving pages without changing anything.
Whoah! That's awesome! Thank you for making my weekend better :D
I'll try setting that up over the weekend. Just wondering - would you recommend using your suggested setup for a "production" app?
The hardware I own is powerful enough to host postgres + a half dozen services to run the app, including a bunch of real time processing. If my calculations are correct I should have enough resources to handle a few thousand or more concurrent users, and at an old gig we were able to service millions of DAU with even less power. As long as I can meet <500ms client request times I think it should be fine, considering there's gonna be extra network hops.
Don't forget to choose a VPS close to you with low ping for this. Also, depending on the skills you want to learn, you can use nftables with DNAT to do the forwarding at L3/L4.
I know this started as an anti-cloud stance, but there is a lot that cloud servers can do for the DIY self-hoster, for really not that much money per month. And one of those things is knowing that even though you're having a power outage at home, that your server is still up.
> You will get a lot of people who will try to find all the theoretical reasons why this wouldn't work.
It's not a theoretical.
Depending on where you live either:
- The available tech (i.e. without fibre to the premise your "last mile" might be entirely copper, or just the run to the cabinet, but both can affect speed)
- ISP offerings (Some only offer symmetric speed on much more expensive "business class" packages)
>I figured out how to make my dynamic IP still serve requests accounting for IP changes using a background script. If someone has a better solution I’d appreciate that.
Cloudflare costs $0, scales up to your gigabit connection without blinking, and there are Docker containers that keep the IP updated as if it were a dynamic DNS service.
The analysis ignores the basics: AWS controls the supply. At some point there’s a limit to that control via supply chain (meaning they wouldn’t be able to make it cheaper when they want to) but if the number is going up, it’s pretty likely that’s totally intentional.
"Ignore" is a pretty strong word here. This is an analysis of demand. Of course Amazon controls supply, but allowing compute to sit idle at these price points is not profitable so we can assume that they are selling available compute. The question then is whether aggregate instance demand is increasing with respect to available supply and pushing prices up.
You'll have to explain what you mean by "intentional" given raising prices above market equilibrium would leave compute idle (again, very unlikely). In that way Amazon is just a market actor (albeit a large one) just like customers.
Yeah but welcome to the era of steeply discounted savings plans. Isn’t it like half of the cost of on-demand, or something kind of crazy?
Had a couple of calls with their “please don’t leave us” team and they cut our bill more or less in half (more through fixing my boneheaded designs though).
This seems to ignore the fact that the prices are consistently low if you are tracking the lesser used instance types/regions, even when talking explicitly within AWS.
Spot prices don't directly reflect capacity. EC2 Spot used to be a real spot market, with actual real auctions where higher bids displaced lower bids.
But it was changed a while ago, so prices are now set algorithmically based on predicted demand/supply.
We run stateless calculations on EC2 across regions, and we definitely see that instances are harder to come by. Especially instances with GPUs. And for many instance types, the price advantage of EC2 Spot compared to committed spending is not significant anymore.
Well, I was one of the engineers that made the change :) I'm not sure how much I can tell, but the public reason was: "to make pricing more predictable".
Basically, one of the problems was customers who just set the spot price to 10x of the nominal price and leave the bids unattended. This was usually fine, when the price was 0.2x of the nominal price. But sometimes EC2 instance capacity crunches happened, and these high bids actually started competing with each other. As a result, customers could easily get 100 _times_ higher bill than they expected.
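To put rough numbers on that, here is a back-of-the-envelope sketch. It treats the round figures above as assumptions (prices as multiples of the on-demand price) and assumes you were charged the market price as long as it stayed at or below your bid. The function name is made up for illustration.

```python
def hourly_multiplier(usual_price, market_price, bid_cap):
    """How many times the usual hourly rate you pay during a capacity crunch.

    All three arguments are multiples of the on-demand price.
    Returns None when the market price exceeds the bid, because the
    instance is then terminated rather than billed.
    """
    if market_price > bid_cap:
        return None
    return market_price / usual_price


# Usual price 0.2x, bid cap 10x, market pushed all the way up to the cap.
print(round(hourly_multiplier(0.2, 10.0, 10.0)))
# Same cap, but a usual price of 0.1x.
print(round(hourly_multiplier(0.1, 10.0, 10.0)))
```

With a 10x cap and a 0.2x baseline the hourly rate tops out at about 50 times the usual one; a lower baseline or a higher cap gets you to 100 times. Either way, an unattended high bid turns a capacity crunch into a bill rather than a termination.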
Another issue was that EC2 spot internally in AWS was implemented as a "bolted on" service that was not on the EC2 instance launch path, so it was easy to game the market. You could do very fun things, like:
1. You need an instance type that is right now under heavy contention. Not to worry! There's a way to get these instances!
2. You create a small VPC with only a couple of available IPs.
3. Then you submit a thousand EC2 Spot bids at 10x the price.
4. Your bids win and EC2 terminates other customers' instances. After all, you're willing to pay more!
5. EC2 then tries to launch these 1000 EC2 instances into the VPC. And fails, because there aren't any IPs available (see item 2).
6. Whoopsie. The bids are cancelled and EC2 instances are returned to the pool. Oh, and you're not charged anything because instances failed to launch.
7. Profit! Now there is plenty of capacity and you can submit bids at a normal price.
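The steps above can be sketched as a toy model. This is only an illustration of the logic as described here, not how EC2 was actually implemented; the function, its parameters and the simplification that every displaced slot either launches or goes back to the pool are my own assumptions.

```python
def run_exploit(running_bids, attacker_bid, attacker_requests, free_ips):
    """Toy model of steps 2-7.

    running_bids: bids of other customers' running instances
    attacker_bid: the (very high) price on each attacker request
    attacker_requests: how many instances the attacker asks for
    free_ips: addresses left in the attacker's deliberately tiny subnet

    Returns (surviving_bids, freed_capacity, attacker_launched).
    """
    # Step 4: higher bids displace lower ones, lowest bid first.
    survivors = sorted(running_bids, reverse=True)
    evicted = 0
    while (evicted < attacker_requests
           and survivors
           and survivors[-1] < attacker_bid):
        survivors.pop()
        evicted += 1

    # Step 5: a launch only succeeds if the subnet still has an address.
    launched = min(evicted, free_ips)

    # Step 6: failed launches are cancelled unbilled and the capacity
    # goes back to the pool.
    freed = evicted - launched
    return survivors, freed, launched


survivors, freed, launched = run_exploit(
    running_bids=[0.2, 0.3, 0.25, 0.5],
    attacker_bid=10.0,
    attacker_requests=1000,
    free_ips=0,
)
print(survivors, freed, launched)
```

In this run every other customer is evicted, nothing launches for the attacker, and all four slots end up free for step 7. The point is that the auction and the launch were decided separately, so a bid could win without ever turning into a billable instance.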
> Basically, one of the problems was customers who just set the spot price to 10x of the nominal price and leave the bids unattended. This was usually fine, when the price was 0.2x of the nominal price. But sometimes EC2 instance capacity crunches happened, and these high bids actually started competing with each other. As a result, customers could easily get 100 _times_ higher bill than they expected.
Thanks for sharing, though I'm confused about this part. It seems like expected behaviour?
Any possible spot/auction system I can think of would have the inherent possibility of a sudden surge in pricing.
But it turns out that this is not customer-friendly behavior. So AWS decided to take bidding out of the equation, and instead terminate instances based on a complicated scoring system.
The idea is that it's easier to deal with the missing compute capacity, which you notice right away, rather than be blindsided with a 100x bill at the end of the month.
I would guess they were seeing thundering herds of bots trying to fight to grab spot-instance stock whenever any appeared below the bots' set cost threshold. Basically like scalper bots.
It's tough to know for sure. Many of the instance types in these regions are at the price floor, so there could be increases in aggregate demand that just don't show up because the DCs are overbuilt. These markets are thinner so if demand shifts to them (big if because moving regions is hard) we might quickly see price spikes.
The point of the article is pricing, not preemption. The strategies you’re talking about improve preemption but they won’t save you money if pricing is increasing across the board.
I was wondering if this could be a contributing factor. There is almost unlimited demand for training models - and a lot of value to be derived from doing so - and it seems like a perfect workload for spot.
I have just purchased 5% of a data center's Deep Learning servers. 10 GPU / 384 GB of RAM, 2 x Xeon 10-Core processors. I am putting them in a solar powered data center and plan to lease them out at $0.2 per GPU hour.