Hacker News
Show HN: I built a service to help companies reduce AWS spend by 50% (usage.ai)
123 points by kavehkhorram on Feb 3, 2022 | hide | past | favorite | 60 comments
Hey HN: Kaveh here, the founder of https://www.usage.ai/

We help companies drive down AWS EC2 spend. Why? Because the way it's done now is a pain. DevOps and Software Engineers end up spending time managing costs rather than focusing on business problems.

Before founding Usage, I worked on high-performance computing research at JP Morgan Chase and as a software engineer at a number of smaller startups.

Here's how it works: we're typically brought in by a DevOps manager to cut AWS EC2 costs. The app is entirely self-service and the savings are generated automatically; we typically do this live on a call. On average, we reduce AWS EC2 spend by 50% for about 5 minutes of work.

To reduce spend by 50%+, we don't touch your instances, don't require any code changes, and don't change the performance of your instances. We buy Reserved Instances on your behalf (a billing-layer change only) and bundle them with a guaranteed buyback. So you get the steep 57% savings of 3-year no-upfront RIs with none of the commitment (you can sell them back to us any time after 30 days).

We make money off a 20% Savings Fee. Happy to chat directly: kaveh@usage.ai
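To make the arithmetic above concrete, here's a quick sketch of how the discount and the fee could combine. The spend figure is hypothetical, and it assumes the fee is taken as 20% of the realized savings:

```python
# Illustrative math only: a 57% 3-year no-upfront RI discount,
# netted against a fee equal to 20% of the realized savings.
on_demand_monthly = 100.00   # hypothetical on-demand spend ($/month)
ri_discount = 0.57           # 3-yr no-upfront RI discount cited above
savings_fee_rate = 0.20      # fee taken as a share of savings

gross_savings = on_demand_monthly * ri_discount   # 57.00
fee = gross_savings * savings_fee_rate            # 11.40
net_savings = gross_savings - fee                 # 45.60
net_spend = on_demand_monthly - net_savings       # 54.40

print(round(net_savings, 2), round(net_spend, 2))
```

Under those assumptions, every $100 of on-demand spend becomes roughly $54.40, i.e. about 45.6% net savings after the fee.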

Have you experienced any issues with managing your company or organization's AWS expenses? We'd love to hear your feedback and ideas!




The IAM policy shown on the website allows sts:AssumeRole without any restriction on resources or conditions, which will be a deal breaker for many. Presumably you can restrict this to certain AWS principals?


Hi FujiApple,

We use the sts:AssumeRole policy to create temporary, short-lived credentials that let us access the AWS APIs on your behalf. The assume-role permission is constrained to the policy we've defined on our landing page and in our app, which is read-only plus the ability to manage your reservations on your behalf.
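For reference, a cross-account role's trust policy can be restricted to a specific principal and an ExternalId condition, which addresses the concern above. A minimal sketch (the account ID and ExternalId are placeholders, not Usage's actual values):

```python
import json

# Sketch of a cross-account IAM trust policy that limits who may call
# sts:AssumeRole and requires an ExternalId (mitigates the
# confused-deputy problem). Account ID and ExternalId are placeholders.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {"sts:ExternalId": "example-external-id"}
            },
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```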


Yeah this should be addressed.

Also kudos I guess for landing kik as a customer.


A good place to start with cloud savings is just knowing what is out there. I built CloudOptimizer.io [1] for this purpose, aggregating 10 cloud providers in one place.

Running it as a free, hobby project.

[1] https://cloudoptimizer.io


Nice. Are you calculating Alibaba $/month correctly? Multi-million $ per month? Maybe a currency thing?


This is cool, and anything in this space is useful, but everywhere I've been with significant AWS spend had already negotiated something directly. Caching and proper autoscaling policies usually take care of the rest; I've found the tricky thing to be RDS.

On the other hand... can I buy time on abandoned RIs directly from you for extra savings?


Yes - you can! Feel free to shoot me a note and we can chat more about this: kaveh@usage.ai


RIs like this are great, but the biggest savings we've found is moving everything we can to Spot Instances. We're hoping Aurora on spot becomes a thing as that's really our only remaining RI/on-demand cost.


Yeah, spot is where it's at. The problem is that to leverage spot, the application in question needs to be 'cloud native'. Many companies moving to the cloud are simply picking up legacy app servers, dropping them on EC2 instances, and declaring success. Those will simply not survive the properties of spot.


Relevant: https://github.com/cloudutil/AutoSpotting

I've seen some third party services that automate migration to / replacement with spot instances, but haven't used them yet personally.

Going serverless, in many places, has been the most effective cost optimization for me.


Why is serverless cheap? How can Amazon offer serverless at a lower cost than instances? What I mean is, Amazon still needs to run instances and build serverless on top of them. So where does the cost reduction happen for Amazon with serverless?


The amount of waste we experience can be really high. At Remind we started tracking this using a metric we call OUCH (Overprovisioned Underutilized CPU Hours).

Consider a k8s/ECS installation that autoscales.

Each application will have a CPU target. In order to prevent spiky traffic patterns from overwhelming the running containers, we target 70% CPU. As usage goes above 70% CPU, we will launch more containers.

The k8s cluster will have a reservation target. In order to allow fast launching of new containers, we want the cluster to have only an 80% occupancy rate. If more than 80% of the cluster is reserved, we launch more instances and expand the cluster.

So to run my autoscaling container-based applications in a way that will be reactive and respond to incoming load, I have to leave 44% (1 - .7*.8) of my hardware idle. If we also factor in that AWS itself doesn't target a 100% occupancy rate (because then nobody could launch new instances), each unit of CPU I actually use requires a significant amount of idle infrastructure. Easily double, possibly triple in the larger scheme of things. All of that is either directly paid for (k8s nodes) or indirectly paid for (ec2 pricing inevitably has idle capacity costs baked in).
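The headroom arithmetic above can be sketched directly:

```python
# The compounding headroom described above: a 70% per-container CPU
# target combined with an 80% cluster reservation target means only
# 0.7 * 0.8 = 56% of provisioned CPU does useful work.
cpu_target = 0.70          # autoscaler adds containers above this utilization
reservation_target = 0.80  # cluster adds instances above this occupancy

useful_fraction = cpu_target * reservation_target  # 0.56
idle_fraction = 1 - useful_fraction                # 0.44

print(f"{idle_fraction:.0%} idle")  # 44% idle
```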

With serverless, we would eliminate many of those inefficiencies. With ultra fast launch times, we don't need to give containers headroom to handle spikes. By not running our compute cluster, we can instantly launch everything we need, eliminating k8s overprovisioning. We're left with AWS idle capacity as the only waste, and AWS mostly has solved that with the spot market.

The math doesn't always come out ahead, but there's a lot of opportunity for serverless to be cheaper in many cases.


> Aurora on spot

How would that work for a database? And have you considered or tried Aurora Serverless?


I guess the same way as Aurora Serverless. In general, RDS uses a separate storage layer from the actual instances, so you can do vertical scaling/upgrades with zero downtime (either the read replica goes down or the replica becomes master).


We run 3 nodes 24/7 (writer + 2 readers) and have RIs for them. But during daytime hours we autoscale and run an additional 8 or 9 readers. Some of these run for just a few hours and could easily run on spot (especially with minimum duration).

Serverless couldn't provide enough capacity for us (at peak we use up to 300 vCPUs in this cluster). That was on v1; v2 might change that when it adds Postgres support.


I started an MVP a little along these lines.

The idea was a single page that showed all your AWS resources across all regions and all accounts.

The good thing was it ran purely in a browser via the AWS JavaScript APIs, so you did not need to create users or roles or give access to any third party - you just put the AWS key into the browser and it ran locally.

It's still there but effectively abandoned.

https://www.singlepagecloud.com


Part of a recent project I worked on involved this very issue, and significant savings are definitely possible. As an idea this is great, there's a gap in the market for this and very few addressing the issue, least of all AWS.

However, there's a couple of little things that may block its wider adoption.

1. It's a big ask for some companies to create any sort of IAM role for an external company or contractor. Even though they send and receive sensitive data from any number of 3rd party APIs, most will be uneasy about IAM access. It's just a hang up more than a concern, but still.

2. Engineering managers either don't understand or don't care about cloud spend. They get their budget at the start of the year, and they grow it based on the previous year. They usually don't have anywhere to put savings later on in the year, and don't want to reduce spend, and hence budget targets, for the following year.

3. I'd half expect your idea to be bought out by Amazon and shuttered. Kudos to you if that's what happens! But it's costing Jeff Bezos another yacht, so he may not like that.


There are dozens upon dozens of companies doing exactly this sort of service for AWS. And they all use the describe API calls requiring IAM permissions.


About 3: they didn't buy any of the other companies that already do RI cost optimization, so he's probably safe.


If it’s not getting bought out what’s the point? There’s faster ways to make money.


I previously worked in this space and have a fair bit of experience optimising AWS spend for customers.

Cool idea! I see the pain point you are addressing here is friction in the marketplace, rather than simply cost optimisation? Traditionally the use of standard RIs was a pain, due to the inflexibility of moving between instance families, and the fact that the marketplace requires a US bank account made it a no-go for non-US customers.

However, these problems were addressed first through the use of convertible RIs, which allow exchange of instance types but can't be sold on the marketplace (from memory). To be honest, though, they were still a pain to manage: you needed a good cost person, or a good TAM, to keep on top of the required conversions. So, secondly, savings plans were introduced. I generally recommend compute savings plans these days, as they are much more set-and-forget, though I acknowledge they provide less discount than standard RIs. My personal opinion is that EC2-based RIs will probably be deprecated by AWS at some point in the future. For this reason I don't think it's likely they'll release this automated marketplace as a feature.

I work almost exclusively with enterprise customers and see very little use of standard RIs these days, which, given the marketplace angle, I'm assuming is the only purchase option you are working with? Are you doing zonal or regional scope? But if you can find an angle that lets customers make more efficient use of standard RIs, all the power to you; that is a win!

Recommendation wise, the native tools (Cost Explorer and the CUR) can deliver reasonable recommendations that are good enough for most customers. Especially when using more flexible purchase options like compute savings plans the need to be super accurate just isn't there anymore.


How do you handle the risk of not being able to resell instances that you've bought back? What if I buy 10,000 instances and sell them back to you after 30 days? Seems like a competitor could do something like that to intentionally sabotage your business. Though maybe there's so much liquidity in the market that this isn't much of a risk, and in the worst case you could probably find someone you could sell to at a loss.


Excellent question!


I guess self-service is good most of the time, but how is it going to handle a temporary increase in resource usage? Think of a big infra refactoring project where a team may create lots of temporary instances and terminate them after a couple of months.

What's your view on Savings Plans vs Reserved Instances? Savings Plans seem to be much more flexible overall. Why only RIs then?


Usage refreshes its recommendations on a daily basis. When your instance count increases, Usage buys RIs. When it decreases or changes, Usage sells RIs.

Our RIs are actually more flexible than SPs. There is no commitment, and if you want to change region or instance type, Usage will buy back the old type and sell you the new type.

We chose RIs because AWS allows us to buy and sell RIs. There is no marketplace for SPs at the moment.
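A hypothetical sketch of that daily reconciliation loop (the function and data layout are illustrative, not Usage's actual implementation):

```python
from collections import Counter

def reconcile(running: Counter, reserved: Counter) -> dict:
    """Return the RI delta per (region, instance_type) key:
    positive = buy that many RIs, negative = sell back."""
    keys = set(running) | set(reserved)
    return {k: running[k] - reserved[k] for k in keys
            if running[k] != reserved[k]}

# Hypothetical fleet snapshot vs. currently held RIs.
running = Counter({("us-east-1", "m5.large"): 12})
reserved = Counter({("us-east-1", "m5.large"): 10,
                    ("us-west-2", "c5.xlarge"): 3})

print(reconcile(running, reserved))
# buys 2 m5.large RIs in us-east-1, sells back the 3 idle c5.xlarge RIs
```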


That feels like something that AWS would want to shut down if the business ever gets large enough. AWS has its own partners / AWS distribution program, which usage.ai doesn't seem to be a part of.

Do you believe you'll be able to continue running this once someone high enough in AWS "notices" you?


This doesn’t make sense; there is no reason AWS would want this shut down. They are buying reserved instances, which AWS sells because it benefits them to do so. They are charging the customer 20% of the savings to help them. They are buying the reserved instances back themselves if needed.

From an AWS perspective this is simple market usage of AWS RIs: cost savings for the customer, and easy/reliable usage predictability for AWS's CPU forecasting. It’s a win. And as below, it looks like they have a healthy relationship with AWS.


They're effectively a kind of bulk reseller. AWS may choose to be happy about it (happy customers keen to spend more on services) or not (less AWS income). It really depends on how the management sees it.


I don't think it would result in less AWS income -- AWS knows exactly how much on-demand, reserved, and spot instances cost them and they price them accordingly.


I don't think this is true, if it shifts usage from spot instances to the reseller's spot instances backed by AWS reserved instances they'll be making less money.


They shift to reserved instances, not spot instances.

It'd be hard for a service provider to shift their customers instances to spot instances unless the customer could tolerate the spot instances being shut down on short notice, and if they can, that customer may as well just use the spot instances themselves.

Worst case, this will increase usage of reserved instances and reduce on-demand usage, but AWS priced them accordingly, so they don't care.


That’s only true in a nominal sense. Money right now is worth more than money in the future. Money spent on reserved instances is money Amazon has right now, whereas net AWS spend from spot instances is money in the future.

How much money right now is worth more varies, but Amazon knows best here and prices accordingly.


We have a strong positive relationship with AWS and will be partners with them in the coming months!


like how Hollywood has a strong positive relationship with China until the censors suddenly deny all films access? Last year was pretty tough.. for example.

well, hope you get a few months of nice payouts! individuals don't need ARR :) one or two nice paychecks is good enough for lifelong success, so you only have to be right once or solve a market need once!


Does it break TOS?

Did Amazon shut down Snowflake despite losing a bunch of Redshift dough to them?

I'm not sure why you feel AWS would shut down a company who is using their resources in a clever manner.


Interesting idea however in my experience EC2 is generally not where I need to start optimizing my AWS bills. RDS & other state [1] are by far the largest line items on my bills.

[1] RDS / Aurora / Elasticache / OpenSearch


We have optimization features early in R&D for RDS, ElastiCache, and OpenSearch. If you'd like to try them out at some point, feel free to shoot me a note: kaveh@usage.ai


EBS and data egress are the big ones IMO.


ottertune.com optimizes RDS & Aurora, just fyi


Do users also get the reservation? If there is a capacity constraint in an AZ, are the instances reserved to the user’s account or to usage.ai’s account? (This is important to a minority of users, probably.)


The user's account!


Isn’t this one of the features of CloudHealth?

[0] https://www.cloudhealthtech.com


I wonder what percentage of Amazon's AWS revenue derives from designing its interface in a way that maximizes unnecessary spend?


it'll be interesting one day when cloud computing units become fungible and futures trading starts on a commodities exchange alongside an actual spot market for excess capacity.


Has anyone used Cast.ai or Spot by netapp as a comparison?


Spot has an almost identical service to this called Eco: https://spot.io/products/eco/

It reads your cost and usage report + AWS APIs and offers RI options for you to purchase or ignore. What Eco is lacking is being a reseller like usage.ai and buying your unused RIs.


We use cast.ai and it has certainly saved us a LOT of money. Unfortunately I'm not on the implementation/devops side of this and can't speak to exact figures, but it has been worth using and was definitely a significant difference in expenditures.


Do you support EKS or ECS on EC2?


Yes -- as long as it's backed by EC2, we support cost reduction for EKS and ECS.


this is like energyogre.com but for EC2 RIs. really nice.


Honest question, who is still on bare ec2 anymore?


Are you asking that cuz serverless is the new marketing hotness? There's a big world of tech beyond your bubble and it's not running lambdas.


Everyone running profitable companies on the cloud.

"Serverless" is basically AWS making money off of your technical debt and uncertainty.

It is cheaper to run your prototype with lambdas and "managed services" but as soon as it scales to a certain point, you're much better off with bare VMs reserved and orchestrated as per your business needs.


What about containers that can be freely moved between clouds and on premises?


Huh? What is wrong with "bare" EC2 instances?


Not sure what you mean by bare, but my ECS clusters are backed by a mix of on-demand and spot instances.


I have found ECS Fargate, just running containers directly, to be more convenient for most of my services and workloads. But I do occasionally miss some of the features that don't have analogues in ECS Fargate (yet).


We are, AMA.


Many of our customers are on EKS or ECS backed by EC2!


I cut costs of AWS by 99% by not using AWS at all



