It is completely incorrect to characterize these observations as "mispricing" - this is a quirk of automatically-determined prices across very different products. If the author actually tried to use these instances in any significant volume they would understand the driver - capacity pools are nowhere near equal, and not as interchangeable for AWS as the article implies they would be for a user. Prices reflect demand munged with available capacity - uncommon instance types are uncommon precisely because they aren't used as much, so there aren't the same signals to drive the price up and down automatically.
Instances with attached NVMe are available in much lower volumes than others, as are AMD instances. Obviously these pools cannot be used as a drop-in replacement for non-"d" instances or Intel families.
In financial markets, this quirk of automatically-determined prices across different products is frequently called "mispricing" when those products logically should have a relationship with each other.
Straightforwardly: All hosts with space for a c6gd spot instance have space for a c6g instance. If Amazon is willing to host a c6gd instance in that slot for $X, they should be willing to also host a c6g instance there for $X.
In financial markets, the way this gets handled is through arbitrage: someone will buy the equivalent of the c6gd instance, and sell the c6g part for the higher price (they may also sell the "d" part for even more money). This has the effect of "correcting" the price. The AWS spot market does not allow you to do arbitrage, and AWS doesn't appear to do the arbitrage for you.
AWS probably likes this inefficiency in their market: some instance types are more popular than others, and some customers make assumptions that require them to use a very specific instance type (i.e., a c6gd would not work as a substitute for their c6g instance). However, the vast majority of users probably could work just fine if their c6g instance were a c6gd, and don't look for the arbitrage opportunity. That means Amazon gets paid extra.
> If Amazon is willing to host a c6gd instance in that slot for $X, they should be willing to also host a c6g instance there for $X.
The reality is that direct c6gd demand might be an order of magnitude lower than c6g direct demand - if AWS can get some more flexible people to adopt c6gd by offering a lower price, c6g capacity is slightly stabilized for on-demand usage by people who don't value the flexibility.
Also note that c6g to c6gd has a non-zero switching cost - extra NVMe on the instance adds a new source of potential hardware failure, increasing the probability of termination very slightly. There might be other software-related costs depending on whether your application makes any ill-advised assumptions about attached storage during setup.
So overall, I would just be happier to read this article if it was framed as "PSA: having more features in an ec2 instance is sometimes cheaper! Don't rule yourself out of extra savings by making overly-constrained fleet requests." The extra commentary about foregone revenue makes too many assumptions and detracts from the core point.
The point is that Amazon doesn't have to fill that slot with a c6gd. They can also fill it with a c6g. They just choose not to.
The fact that you have to host a c6gd to get that price instead of a c6g is an inefficiency in the spot market that likely makes Amazon money, but is a little customer-hostile. I think the article is probably wrong that Amazon is foregoing revenue due to this. This is a form of price discrimination and it is likely making Amazon money, but in a scummy way.
Agreed that it's definitely difficult to know the true missed revenue here without internal data, and even then you'd be making some assumptions. I am confident there is some missed revenue here, as Amazon routinely has spot capacity constraints under existing prices, so it could definitely sell some substitute instances without moving the original instance market (even one instance per pool substituted equates to >$1M per year). In either case, a savvy organization can definitely benefit from the price discrepancy even if Amazon couldn't.
I can agree that there is missed revenue - but realistically it would make much more sense to sell that capacity via Fargate (which is closer to undifferentiated generic compute and RAM) rather than monkeying with the spot pricing algorithm.
Great point on Fargate, I'd be very curious whether they select capacity for that from EC2 capacity or if there's a separate physical footprint for it.
Author here. The key here is that customers can leverage these pools in addition to their existing pools, improving capacity and price. AWS actually supports this out of the box (including substituting instances with drives) by specifying core and memory requirements directly instead of instance types.
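To make the "specify cores and memory instead of instance types" idea concrete, here is a minimal local sketch of the selection logic. It does not call any AWS API; the `Pool` class, the `eligible_pools` helper, and every price below are hypothetical and only illustrate how a requirement-based request can pick up a cheaper pool (including one with a drive) that a request pinned to a single instance type would miss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str
    vcpus: int
    memory_gib: int
    has_local_nvme: bool
    spot_price: float  # USD per hour; made-up numbers for illustration


def eligible_pools(pools, min_vcpus, min_memory_gib):
    """Return every pool that meets the minimums, cheapest first."""
    ok = [
        p for p in pools
        if p.vcpus >= min_vcpus and p.memory_gib >= min_memory_gib
    ]
    return sorted(ok, key=lambda p: p.spot_price)


# Hypothetical snapshot of one availability zone.
POOLS = [
    Pool("c6g.xlarge", 4, 8, False, 0.070),
    Pool("c6gd.xlarge", 4, 8, True, 0.055),
    Pool("m6g.xlarge", 4, 16, False, 0.065),
    Pool("c6g.large", 2, 4, False, 0.035),
]

# Ask for "at least 4 vCPUs and 8 GiB" rather than "c6g.xlarge".
ranked = eligible_pools(POOLS, min_vcpus=4, min_memory_gib=8)
for pool in ranked:
    print(f"{pool.name}: ${pool.spot_price:.3f}/hr")
```

With these made-up prices the drive-equipped pool ranks first, and the extra pools also add capacity beyond the single original pool.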
Totally agree with that; it is a pretty common approach. The only part I don't agree with is calling out the price differences as some kind of "gotcha" that AWS somehow missed, particularly given the speculative "lost revenue" data which have no basis in reality.
See the emphasis on transparent substitutes in the article. This analysis is limited strictly to sets of instances that are fully hardware compatible, meaning AWS could resell one instance as another. There are way more savings to be had as a customer by leveraging instances that aren't transparent substitutes.
Which instances are not transparent substitutes, in your opinion? Keep in mind the definition here is that Amazon could substitute the image transparently, e.g., by ignoring the additional resources in the hypervisor, not that the instances are by default indistinguishable.
That being said, the substitute instances considered could be trivially accepted by any task running on the original instance, so long as it doesn't misbehave when given too many resources. In the case of vCPU, you can even hide extra vCPU cores, so a c6g.xlarge can be made effectively indistinguishable from an m6g.2xlarge by disabling the extra vCPUs at the hypervisor level.
> Across all AWS availability zones instances are mispriced by roughly $400/hr at any given time. This means that, with just a single instance of each type, Amazon is missing out on $200/hr or roughly $1.7 million each year. This is over roughly 15,000 pools of instances. Given Amazon controls roughly 100 million IPs, we can guess that each instance pool probably has on the order of 1000 instances (more for smaller instances, less for larger instances). Given this, the average mispriced pool might have hundreds of instances, meaning hundreds of millions each year in missed revenue due to mispriced spot instances. Because amazon keeps their number of instances a secret, it’s difficult to make a precise estimate from the outside, but the missed revenue probably falls somewhere in this range.
You are hypothesizing that the price differences produce "lost" revenues.
An alternative hypothesis can be that the price differences produce similar or higher level of revenues for AWS through price segmentation, with Amazon recognizing the lack of adoption of certain spot instance bidding features and auction markets reacting appropriately.
Unless you have the capacity and quantity demanded for each instance type, you can't prove your hypothesis. You are assuming scenario 3 (below) with no insight into the price elasticity of the underlying customers.
Example:
Baseline:
Instance types A and B are equivalent.
A is priced at $3, with capacity of 1000, quantity demanded of 800.
B is priced at $2, with capacity of 1000, quantity demanded of 200.
Total quantity demanded = 1,000.
Revenues from instance type A = $3 x 800 = $2,400
Revenues from instance type B = $2 x 200 = $400
Total revenues = $2,800
Scenario 1: All customers purchase instance B instead due to better price discovery.
Revenues from instance type A = $3 x 0 = $0
Revenues from instance type B = $2 x 1,000 = $2,000
Total quantity demanded = 1,000.
Total revenues = $2,000
Amazon loses $800 in revenues, there are no "lost" revenues recovered.
Scenario 2: Amazon changes instance type B price to $3. Total quantity demanded decreases to 900 due to price elasticity of instance type B customers.
Revenues from instance type A = $3 x 800 = $2,400
Revenues from instance type B = $3 x 100 = $300
Total revenues = $2,700
Amazon loses $100 in revenues, there are no "lost" revenues recovered.
Scenario 3: Amazon changes instance type B price to $3. Total quantity demanded remains at 1,000.
Revenues from instance type A = $3 x 800 = $2,400
Revenues from instance type B = $3 x 200 = $600
Total revenues = $3,000
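For anyone who wants to vary the inputs, the arithmetic in the example above can be reproduced with a few lines of Python. This is only a sketch: the `revenue` helper and the scenario labels are illustrative names, and the prices and quantities are the hypothetical ones from the example, not real AWS data.

```python
def revenue(price_a, qty_a, price_b, qty_b):
    """Total revenue from two equivalent instance types A and B."""
    return price_a * qty_a + price_b * qty_b


# Baseline: A at $3 with 800 demanded, B at $2 with 200 demanded.
baseline = revenue(3, 800, 2, 200)

scenarios = {
    "1 (everyone moves to B at $2)": revenue(3, 0, 2, 1000),
    "2 (B repriced to $3, B demand falls to 100)": revenue(3, 800, 3, 100),
    "3 (B repriced to $3, demand unchanged)": revenue(3, 800, 3, 200),
}

print(f"Baseline: ${baseline:,}")
for label, total in scenarios.items():
    # Signed difference shows whether the scenario gains or loses revenue.
    print(f"Scenario {label}: ${total:,} ({total - baseline:+,} vs. baseline)")
```

Only the third scenario comes out ahead of the baseline, which is why the outcome hinges on the elasticity assumption.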
The missing component of your analysis is that Amazon has a 4th option: re-sell instances of B as instances of A when A is more expensive, and otherwise allow the market to adjust. The analysis is strictly limited to instances where Amazon could, in theory, do this (e.g., reselling c6gd as c6g).
Assuming the market is in equilibrium, the above scenarios aren't realistic, as demand at the market price would equal supply at the current price (roughly, of course).
Suppose there are 1000 c6g and 200 c6gd, with equilibrium prices of $3 and $2, respectively (i.e., all instances have demand). Amazon re-SKUs c6gd as c6g until there are 1100 c6g selling for $2.90 and 100 c6gd selling at $2.90. Total revenue is $3480 vs. $3400. Of course it's impossible to know the true numbers without hidden knowledge of the market, but this is more akin to what would occur. Amazon effectively has a risk-free arbitrage opportunity here, so it stands to reason that there is revenue to be made. Customers don't have this option (since you can't short spot instances), so the best you can do is diversify and save money.
Edit: Actually, the AWS spot market is often out of equilibrium in a way that makes this reselling even more effective. For instance, in the example in the article the c6gd instance is actually pegged at the minimum price, so some number of those instances could be resold as c6g without moving the c6gd price at all.
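The re-SKU numbers above (1000 c6g at $3 and 200 c6gd at $2, versus 1100 c6g and 100 c6gd both at $2.90) can be checked with a short sketch. The `revenue_cents` helper is an illustrative name, the $2.90 clearing price is the assumption made in the comment rather than a measured value, and prices are kept in integer cents to avoid floating-point rounding.

```python
def revenue_cents(pools):
    """Sum count * price over pools given as {name: (count, price_in_cents)}."""
    return sum(count * price for count, price in pools.values())


# Before: separate markets at their own equilibrium prices.
before = {"c6g": (1000, 300), "c6gd": (200, 200)}

# After: 100 c6gd hosts are resold as c6g and both markets clear at $2.90
# (an assumed price, since the real demand curves are not public).
after = {"c6g": (1100, 290), "c6gd": (100, 290)}

gain = revenue_cents(after) - revenue_cents(before)
print(f"before: ${revenue_cents(before) / 100:,.2f}")
print(f"after:  ${revenue_cents(after) / 100:,.2f}")
print(f"gain:   ${gain / 100:,.2f}")
```

Changing the assumed clearing price shows how sensitive the gain is to the unknown demand curves.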
I think you’re thinking about the revenue functions for spot instances in isolation from the larger supply base of all instances. Spot instances are already a result of revenue management of a fixed supply base that increases in discrete increments over time. Instance capacity overall usually leads instance demand, because shortage costs are very high in data centers.
Spot instance capacities are a function of the all instance capacity for the same type and on-demand instance usage. Spot instance pricing can influence the quantity demanded of on-demand instances of the same type, and vice-versa.
Anyhow, there’s no way we can figure out whether you’re right or wrong with any reasonable level of certainty.
While it's tough to say with certainty how much revenue is lost, there is certainly lost revenue. Consider that many substitute instances are available at the minimum allowable price (i.e., won't go any lower, there is unused capacity). These could be resold without moving the substitute market.