Here's my $700 surprise bill story. There are many like it, but this one is mine.
* An example CloudFormation template from a re:Invent (AWS conference) session silently failed to tear down some resources.
* Not trusting CloudFormation, I manually walked through each (known service, region) pair to make sure resources had been torn down. This failed to identify the running resources because a tutorial div that opened in regions with no running resources stayed open when I switched to a region that did have running resources, hiding them.
* Not trusting my manual service tour, I kept a close eye on my daily costs until I saw several days pass with $0 spend. This failed because free tier credits were hiding substantial service usage.
* Not trusting any of the above, I had billing alerts set as a catch-all. They did trigger, on an unrelated usage surge, but with such high latency that when they failed to reset, I blamed the latency rather than a genuine underlying charge.
Bam, $700 charge next month. Amazon was quick to refund half of it. I was eventually able to get them to refund the other half by making waves in the support system of a high-spend business account.
At the last re:Invent session I went to, I surveyed a table of 6 people. After sharing my $700 figure, 3 of the 6 came forward with even bigger numbers, 1 of the 6 with a smaller number, and the remaining person was a newbie.
This sort of story probably fully explains the origins of the OP's lament (that AWS billing is so opaque that even internal Amazon teams can't figure it out). Some AWS middle manager knows that if he orders better billing tools to be implemented, AWS income will drop; not significantly, but ~$700 multiplied by the number of AWS users might make enough of a dent to be noted in his performance report. To make this omission more plausible, even internal investigation tools (which would encourage infrastructural improvements that make user-facing tools easier to build) are neglected.
But they didn't actually get the $700; they refunded it, along with spending who knows how many support hours along the way.
Maybe there are enough people out there who don't notice a $700 overage to offset that, but it seems like there would at least be a visible decrease in support costs if they had a better system.
I suspect that full refunds for user mistakes aren't entirely typical.
Not everyone has a high spend account and a rep who brags about their policy of refunding mistakes frequently enough to snipe them with "If you have a policy of refunding mistakes, then why didn't you?" in front of customers.
I would like to emphasize that I've chosen my words carefully: I don't know for certain that the rep intervened, but I made trouble, emailed him the information necessary to resolve my issue, and then my half refund became a full refund. Any connection between these dots is pure speculation on my part.
Part of retail sales is people not caring enough to spend the energy to return / get a refund. Even if this story resulted in a refund, there are probably 100 others who didn't bother because it wasn't worth their time.
This is why it's crucial to review employees against each of the _project_ goals that they aimed to deliver. You cannot judge all work equally; sometimes the appropriate outcome is not just orthogonal (zero benefit with respect to performance metric "A") but even detrimental (negative with respect to performance metric "A"). Not handling this is what creates these kinds of problems.
Not from re:Invent, but a similar bill-latency problem...
Many government agencies have to go through 3rd parties ("lowest qualified bidder", which AWS often doesn't bid on) to pay our bill... And the contractors themselves use 3rd parties to figure out how much to bill us... so things like AWS bill alerts are not possible (imagine the whole billing section being permission denied while using the root account user). In addition, 3rd parties do not always provide great tools to set up bill alerts.
We have 40+ AWS accounts and can't track our spend without 10 button clicks per account, which can't be done in parallel because the app's browser caching locks us to working with one account at a time (or it corrupts the cache). The process takes a minute per account, if the website doesn't crash trying to process our records (which run to >3 million line-item records for some accounts).
All that said: we basically were screwed at monitoring our bill.
In the fun that is semi-serverless, we had a container running in ECS that logged directly into CloudWatch. It went into an infinite loop late one month. That didn't really take our bill out of the expected ballpark when it was processed by the third party and delivered to us 20 days after the bill cycle closed.
Next month, however, by the time we got the bill, it had run in an infinite loop for the ENTIRE month, plus 75% of the next month already. It pushed 50,000 GB of infinite-loop log data at 50 cents per GB ingested in the first month (plus storage costs). (That's about $30,000 for month 1.)
AWS did not provide assistance because it was paid for with credits, but it basically ate all of our remaining credits after the second month's bill came in.
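Sanity-checking the comment's own numbers (ingestion at the quoted $0.50/GB, plus CloudWatch Logs storage at an assumed ~$0.03/GB-month list price, which may vary by region):

```python
ingested_gb = 50_000
INGEST_RATE = 0.50    # USD/GB ingested (figure from the comment above)
STORAGE_RATE = 0.03   # USD/GB-month stored -- assumed list price, check your region

ingest_cost = ingested_gb * INGEST_RATE    # 25,000
storage_cost = ingested_gb * STORAGE_RATE  # 1,500 for the first month retained
print(ingest_cost + storage_cost)          # ~26,500 -- "about $30,000" is the right ballpark
```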
After that, we got the contractor to simply give us a copy of the "detailed billing reports" daily and built a process to make our own bill monitors (which at that stage included markups from the contractor). We eventually got a somewhat better monitoring system through the third-party app as well, but we hadn't been aware that was possible because it wasn't accessible until the contractor set it up for us (hidden menu options to combine accounts).
Better: there is a plug-in that manages your account switching for AWS so you can stay in one container. I'm on mobile, so I don't have the name right now.
This all being said, I'm a generally happy AWS admin... What happened really can't be blamed squarely on anyone... AWS can't show us a bill since they don't bill us... the 3rd party probably could have helped with the bill alerts before the incident, but they did help after... Also, we couldn't really expect AWS to provide more credits to cover lost credits they had already provided to help us get off the ground. Overall we really wouldn't be able to do what we do with any on-prem solution.
> What happened really can't be blamed squarely on anyone
Call this a "distributed denial of responsibility attack". It's very convenient that there's no one point of easy blame, that means that nobody has to change.
When I shut down my startup, I had to remove the AWS resources, which were generating ~$1000/mo in charges. I removed them and followed the same practices as the parent did, along with regular Any.do alerts to ensure I didn't miss anything. In subsequent months I found that something or other kept coming up in the charges, like a CloudWatch alarm or a log somewhere. It took a while to remove all of them and bring the bill to zero.
I'm not complaining that it's something nefarious on AWS's part; I'm saying it's designed in a way such that someone using its various services can easily lose track of billing.
Yes, I could have closed the account altogether; but I didn't want to. Now I wonder, if AWS starts charging for the billing alerts itself whether I would catch it before I actually receive a billing alert for the billing alerts.
Among the cloud providers, this seems to be unique to Amazon. There's a lot of malfeasance that's unique to Amazon, like HQ2 and the counterfeit and offensive products on Amazon.com. I don't feel like giving any part of their company money. For Twitch I donate via Streamlabs. It sucks not being counted as a subscriber or being able to use the emotes, but I prefer that over Amazon taking a cut.
Google's Firebase has had some absolute horror stories, especially when Firebase found out they were under-billing some people and then "adjusted" their bill to be more accurate, causing massive charges.
I recently had a surprise GCP bill from a personal kubernetes project I did. It wasn't anything crazy, but it made me realize that I'm always only one click away from seriously hitting my credit card.
What was the HQ2 malfeasance? I.e., if they committed unlawful acts, why not pursue that in court?
I know they selected a location in New York they shouldn't have / that the community rejected, and after a big outcry they ended up going with, not against, community pressure.
I think the one in Virginia is going ahead; they just got permits there for a metro station. Are there protests in Virginia they are not listening to?
Actually, the scale is larger than that. Amazon got over 100 cities to bid on the project and entered into deeper negotiations with at least 20 of those. They got many cities to spend hundreds (or thousands) of hours putting proposals together, pitching teams, crunching numbers, and offering deeper tax incentives than anyone else could get. In the end, they chose the location they were going to choose anyway.
So it was a colossal waste of time for 5,000+ people across North America. Burnt a lot of goodwill. Many cities learned a valuable lesson on dealing with these big multi-nationals as a result of it. So it won't likely happen like this again.
Genuinely curious - I thought it was strange when it first started - but surely every company must do this? Wouldn't making the investment of a large presence likely involve seeking concessions from each location as part of the scouting process?
What did Amazon do different here? Instead of in back rooms, it was all out in the open.
The posters here are misusing malfeasance terribly.
Other examples are federal grant programs: agencies are required to publicize them, and lots of folks / nonprofits spend lots of time applying, but the reality is that most awards renew with existing partners, etc.
I believe (from having read a fair amount of the process hullabaloo) that they were taking advantage of tax incentives available to all employers choosing to locate in that area. To the extent that’s the case, I see absolutely no malfeasance.
There were additional, negotiated real estate related credits that are also likely to be similarly negotiated by any other developer.
They were negotiating with the government - who writes the laws.
I was curious about the malfeasance - but if this is the claim of criminality - uh... not a good look for the folks yelling at Amazon.
I think MUCH stronger claims may be possible around just their fake product volume and consumer harm there, but good luck with these claims - are they being litigated anywhere?
What is the cost reporting situation on Azure and GCP?
We've gone for AWS because they're supposed to have good customer support, but opaque cost reporting and the inability to impose spending limits is a concern.
Does anyone know how Azure/GCP:
1. Handle cost reporting
2. Handle spending limits (e.g. can I impose a hard spending limit per service/per user/globally?)
Used all 3 for complicated things, preferred AWS for better overall security/compliance support and more features. Azure is largely comparable to AWS for most shops and some services can be slightly cheaper. GCP seems to be way behind both but I'm sure lots of places could get away with using them.
1. Azure has basically comparable cost reporting to Amazon, though it also has a cost aggregator if you want to use both Azure and AWS. I personally thought it didn't really bring all the nice features AWS billing has into Azure very well, so I'd not recommend it if your AWS usage is large and varied. I found GCP to have fewer features than either Azure or AWS for billing.
2. None of the 3 providers have a hard spending limit feature, though Google's App Engine service (not GCP) lets you shut it down. Other than that, permission roles are generally the same; AWS wins slightly on features again, but Azure has a slightly nicer UI.
Anyways, you should do your own research on what cloud seems sane to you, and not let randos on HN make your business decisions.
One thing that bit me was that GCP spending limits could only be updated on a 24-hour timer (so one full day before changes kicked in).
This is absolutely terrible if you have a spike in legitimate traffic and try to increase the limit: you lose all that "front page of HN" traffic forever because your supposedly scalable system didn't scale.
I have no idea if they have fixed that issue yet but I doubt it.
2) No cloud has proper spending limits (aside from barebones compute like Linode). The model relies on your bill continuing to get pricier as their overall multi-tenant costs come down.
> An example CloudFormation template from a re:Invent (AWS conference) session silently failed to tear down some resources.
Something similar happened to me at a Kubernetes/GCP tutorial. We started with a $100 credit. The tutorial included setting up massive (for tutorial purposes) instances. Because I had played with my account before and had already created a single instance, I hit account limits and my tutorial code failed to work. One frustrating experience richer, I was very busy at work when I returned from the conference.
When I got back to my tutorial code 3 weeks later, I noticed that less than $1 of my $100 was left. The n - 1 instances I created during the tutorial had been running for 3 weeks. User error, of course, but at which software job do no errors happen?
The only positive about GCP I remember is that they promised not to overflow from the credit into real money from my credit card. At least that's how I remember it. I did not (need to) test it, because I noticed the issue $1 early. With AWS there is no such promise, as we know.
My job is cost optimizations at a very large corporation. We have been given the order to go all in on AWS. Some things I've found to be particularly annoying:
* Data transfer will bite you in the ass if you let it, especially over a NAT gateway on very high-traffic sites. So you do the right thing and put your application in private subnets, route traffic in over the load balancers and out over the NAT gateway. Then you get a $20k/mo bill for your microserviced application that handles hundreds of requests per second during peak hours (see the back-of-envelope sketch after this list). Pro-tip: the pooh-poohed NAT instances are actually a cheaper solution, but you're on the hook for maintaining them.
* The CUR (Cost & Usage Report) can get huge. I mean millions and millions of lines. AWS says you can throw it into S3, query it with Athena, etc. But if that data set is huge, even _that_ will cost you a lot of money for reporting, analysis, etc. Especially after you build that dashboard for the refresh-happy VP.
* The Cost Explorer is admittedly getting better, but it still lacks a lot of necessary detail. You have to pair it up with CloudWatch to get actual cost and usage in a usable way. The value-add services (EMR/Elasticsearch Service/all the ML stuff) are hideously good at hiding actual usage. You gotta dig hard.
* The third party cost tracking tools (CloudHealth/Metricly/CloudAbility/Cloudyn) are just a wrapper around what you can get out of the CUR. Their value-add is reporting and advisement, and giving recommendations on right sizing and reserved instances and savings plans. Though if your cloud team is sufficiently savvy they can do this themselves.
* No matter how you do your analysis, tagging will make your life so much easier. Can't emphasize this enough.
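Here's the back-of-envelope NAT gateway sketch promised above, assuming rough us-east-1 list prices (~$0.045/hr per gateway and ~$0.045/GB processed; these are assumptions from memory, check current pricing):

```python
HOURLY_RATE = 0.045  # USD per NAT-gateway-hour (assumed us-east-1 list price)
PER_GB_RATE = 0.045  # USD per GB processed through the gateway (assumed)

def natgw_monthly_cost(gateways: int, gb_processed: float, hours: int = 730) -> float:
    # The hourly charge is pocket change; the per-GB processing charge
    # is what blows up at "hundreds of requests per second".
    return gateways * hours * HOURLY_RATE + gb_processed * PER_GB_RATE

# Three AZs' worth of gateways pushing ~400 TB/month is already ~$18k,
# before any cross-AZ or internet egress charges on top.
print(natgw_monthly_cost(gateways=3, gb_processed=400_000))  # ~18098.55
```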
> The third party cost tracking tools (CloudHealth/Metricly/CloudAbility/Cloudyn) are just a wrapper around what you can get out of the CUR. Their value-add is reporting and advisement, and giving recommendations on right sizing and reserved instances and savings plans. Though if your cloud team is sufficiently savvy they can do this themselves.
I work for one of the vendors you mentioned, and while you're not wrong - the data sources all come from the cloud vendors - there's a fair bit of work that goes into actually making sense of it, to the point where you can give it to the people actually causing the spend. There's also work that goes into optimisation, so that we can bear the cost of your second point.
Your last point is dead on though. For anyone doing cloud at any scale, tagging is non-optional if you want to do any kind of optimisation, chargeback or the like.
The FinOps Foundation in general is good for cloud finance management - I believe the author of that post has an O'Reilly book on the subject coming out this month as well.
Tagging is simple*. The real killer is the things you can't tag, like bandwidth. Be sure to use multiple AWS accounts if you want to split bills (or at least track them) by sub-group, like per department. Give each of these groups their own account (or set of accounts, preferably, if we're talking about a business... maybe even different accounts for dev/preprod/prod, if this is a major cost and the project is worth it).
* For tags, you can make any tag you want and summarize bills by tag... so anything that can be tagged is trackable. But things like bandwidth are not.
It's also hard to enforce tagging when you can't automatically destroy non-compliant objects, so again, separate accounts help here. If the sub-department wants to know their spend better, THEY are more likely to enforce the rule than a top-down policy from a disconnected IT group... And you can't simply apply a blanket "all things must be tagged" rule enforced at the AWS level, because some items can't be tagged, or the tagging has to happen after creation (for instance, via SDK/CLI, you can't create an EC2 instance with tags: you make the instance, then tag it. The GUI does this behind the scenes so it looks like one step).
So again, for major booking boundaries, use different accounts. Past that point, it's on the delegated entities to use tags appropriately... and it's often different for each group anyway.
> It's also hard to enforce tagging when you can't automatically destroy non-compliant objects
You can automatically destroy objects that are non-compliant with your tagging policy, by querying the objects that exist and examining their tags through the API (heck, you could even script the CLI to do this). And if you use AWS Organizations, you can prevent noncompliant resources with a combination of service control policies (to require tagging) and tag policies (to specify use of tags).
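A minimal sketch of that query-and-examine loop with boto3 (the required tag keys are placeholders for whatever your policy mandates; the terminate call is left commented out, since warn-then-terminate is the saner default):

```python
import boto3

REQUIRED_TAGS = {"Owner", "CostCenter"}  # placeholder policy

ec2 = boto3.client("ec2")

# Walk every instance and report the ones missing required tags.
paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            tags = {t["Key"] for t in instance.get("Tags", [])}
            missing = REQUIRED_TAGS - tags
            if missing:
                print(f"{instance['InstanceId']} is missing tags: {missing}")
                # After a warning period, destroy with extreme prejudice:
                # ec2.terminate_instances(InstanceIds=[instance["InstanceId"]])
```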
> (for instance, via SDK/CLI, you can't create an EC2 instance with tags: you make the instance, then tag it.
That's... not true. The RunInstances call in the SDK that creates one or more instances from an AMI takes an optional set of tag specifications, for tags that can be applied to the instances and/or any of a wide variety of associated resources.
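For instance, with boto3 (AMI ID and tag values here are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# Launch an instance with tags applied atomically at creation time,
# via TagSpecifications -- no separate tagging step required.
ec2.run_instances(
    ImageId="ami-12345678",  # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[
        {
            "ResourceType": "instance",
            "Tags": [{"Key": "Owner", "Value": "team-a"}],
        },
        {
            # Associated resources, like the root EBS volume, can be tagged too.
            "ResourceType": "volume",
            "Tags": [{"Key": "Owner", "Value": "team-a"}],
        },
    ],
)
```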
> And you can't simply apply a blanket "all things must be tagged" rule enforced at the AWS level, because some items can't be tagged, or the tagging has to happen after creation (for instance, via SDK/CLI, you can't create an EC2 instance with tags: you make the instance, then tag it. The GUI does this behind the scenes so it looks like one step)
The accepted approach is warn then terminate. Give them an hour and then if nothing's done start the slaughter.
Set the standard early, and enforce it with whatever means necessary. It's common practice to use tools like CloudCustodian to terminate instances that do not have the required tags (with extreme prejudice). Also, normalize everything, and implement the standards/normalization in your build toolset (Jenkins/Terraform/CI du jour) to enforce this.
I've wondered if the reasoning behind the poor ML usage info has to do with the API endpoints being so profitable that the notion of even looking at the code and implementing more straightforward metrics might kill the golden goose.
It's less that, and more that the underlying technologies that make up those tools are what surface in the CUR. For instance, say you want to know how much you're spending on Elastic Beanstalk. How do you do that? The easiest way is to get cost broken down by AMI, then look at the AMIs that are EB (not the best solution, but it works most of the time). The AI/ML tools are the same. It's hard to break them out (hence why tagging is so important).
We've got a last-ditch alert set for when we spend more than $X in a day. But the way you have to do this is baroque, and it is a bit unreliable. We set a metric-math alarm on
Max(EstimatedCharges, over 1 day) - Min(EstimatedCharges, over 1 day).
Unfortunately, EstimatedCharges only updates once a day, and sometimes the Max updates before the Min, triggering a false alarm. Obviously we could make it more reliable by using a 48-hour period, but then we'd only find out something had gone haywire after it had been going for 2 days.
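For reference, a sketch of that baroque setup via boto3 and CloudWatch metric math (alarm name and threshold are placeholders; note that billing metrics only live in us-east-1):

```python
import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")  # billing metrics are us-east-1 only

billing_metric = {
    "Namespace": "AWS/Billing",
    "MetricName": "EstimatedCharges",
    "Dimensions": [{"Name": "Currency", "Value": "USD"}],
}

cw.put_metric_alarm(
    AlarmName="daily-spend-spike",  # placeholder
    ComparisonOperator="GreaterThanThreshold",
    Threshold=100.0,  # $X: alarm when charges grow by more than this in a day
    EvaluationPeriods=1,
    Metrics=[
        # The alarmed expression: max minus min over the 1-day window.
        {"Id": "delta", "Expression": "mx - mn", "Label": "1-day charge increase"},
        {
            "Id": "mx",
            "ReturnData": False,
            "MetricStat": {"Metric": billing_metric, "Period": 86400, "Stat": "Maximum"},
        },
        {
            "Id": "mn",
            "ReturnData": False,
            "MetricStat": {"Metric": billing_metric, "Period": 86400, "Stat": "Minimum"},
        },
    ],
)
```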
Really, how much would it cost them to run the cron job for EstimatedCharges once an hour? Even more often would be better (you can spend a lot on AWS in an hour).
It also stops working if you have a credit (until it runs out) so good luck if something goes wrong during that period.
We even asked a consultant (recommended by Amazon) if there was a more fine-grained method, and they thought the only way would be a third-party service that ingests all the events and does its own estimate. This is nuts! If your charging is as fine-grained as AWS's is, your reporting should be too.
Wondering if this would be a good Lambda application. Assuming boto3 allows it, you could have a Lambda function poll for this up to every minute. It actually might be a trivial function to write.
The problem is, the metric only changes value once per day. I don't think reading it in a lambda would trigger it to update faster (although I suppose it's worth trying).
Looked at my AWS bill today. On the positive side, the bill is zero, because I have a voucher I got at a conference. But it expires at the end of the year, so I'd better understand what I will be paying for.
I spent quite some time in their Cost Explorer, but I don't understand a lot of it. Most days I have some positive costs, which is probably my usage. Some days I have negative costs; that's probably when they transfer credits from the voucher. They do this at the end of every month, but also irregularly on some days when I use particular services and/or have higher-than-usual usage.
It appears to me that the negative balances are the sum of several days' costs and the credit from the voucher. As a simplified example, I might see +1, +1, +1, -6. So I "reverse engineer" this as: I "spent" 1, 1, 1, 3, and on the 4th day they credited 6 from my voucher. Too bad the 3 is not visible; I need to dig it out myself. In reality I use several services, and they seem to credit them on different days, so the reverse engineering is not really possible. At least not without major effort.
I remember that many years ago it was possible to download hourly (probably also daily) usage reports, i.e. usage in hours, KB, requests, etc., not in money. I can't find them at all anymore. Does anybody know whether they still exist?
Also, to my surprise, I was billed for 135 SQS requests last month. Well, I wasn't actually billed, because 1,000,000 are free. But my point is that I don't even know what SQS is, and I am sure I haven't used it directly. It appears they are "billing" me for their implementation details, because they might use those SQS queues internally. Is that how it works? If basically not using the services at all causes 135 requests, how much would that be if I really ran some production there?
All in all, very opaque. Thank you AWS for the voucher, but I am not impressed about the billing transparency.
I would love to have tooling for what AWS considers an IO op. I've read the documentation quite a few times and I think I get it: basically every 256 KB read counts as one, as do reads under that size, but at what level of the stack is that counted? I tried to find metrics/tools to count it the way AWS counts, but couldn't really find much.
The reason being, I have a library that reads a file iteratively in an x-size buffer using bufio in Go, and I'm not exactly sure what optimizations are happening that I can't see. At some points I'm also incrementing a file a byte at a time, which is by definition one IO op each, I think (super inefficient). Unfortunately, a lot of the cloud metrics don't give you enough granularity or quick enough feedback to optimize.
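Under that reading of the docs -- one billable op per 256 KiB of a request, with a floor of one op per request; this is an assumption, not AWS's confirmed accounting -- a rough counter looks like:

```python
import math

OP_UNIT = 256 * 1024  # 256 KiB per billable operation (assumed model)

def billable_ops(request_sizes_bytes):
    """Estimate billable IO ops: one op per 256 KiB of each request,
    and any request, however small, still costs at least one op."""
    return sum(max(1, math.ceil(size / OP_UNIT)) for size in request_sizes_bytes)

# A single 1 MiB read costs 4 ops; a thousand 1-byte writes cost 1000 ops,
# which is why byte-at-a-time IO is so expensive under this model.
print(billable_ops([1024 * 1024]))  # 4
print(billable_ops([1] * 1000))     # 1000
```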
Understanding the AWS bill is far harder than it should be. That said, there are some resources that weren't mentioned in this blog post. The ultimate source of truth is the AWS Cost & Usage Report (CUR) [1], which can be delivered in Parquet and queried with SQL via Athena.
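For example, a per-service cost rollup against the CUR via Athena (database, table, and output bucket names are placeholders; the line_item_* columns follow the standard CUR-on-Athena schema):

```python
import boto3

athena = boto3.client("athena")

# Which services cost the most this month, by unblended cost.
query = """
SELECT line_item_product_code,
       SUM(line_item_unblended_cost) AS unblended_cost
FROM cur_db.cur_table
WHERE line_item_usage_start_date >= DATE '2019-11-01'
GROUP BY line_item_product_code
ORDER BY unblended_cost DESC
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "cur_db"},  # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder bucket
)
```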
Although the Cost & Usage Report alone can solve many billing mysteries, in some cases it's also necessary to go to CloudTrail logs to determine exactly which user or application incurred charges.
I'm one of the founders of https://taloflow.ai (the company tied to the blog post above). We built Tim (the Taloflow Infra Monitor) to save endless hours going through spreadsheets or Cost Explorer. We built it as a real-time dataflow with visualizations in Grafana, so you can correlate events such as deployments with how your costs change. We also built a model that predicts your real-time costs, inferred from infrastructure metrics that aren't available through Cost Explorer.
Our tool is free for any devs spending less than $60K a year on AWS. Let me know if you wanna test it out!
A friend of mine makes a living by helping big companies navigate AWS pricing and guess what the bill will look like. He even built a complete software suite for that:
https://github.com/trackit/trackit
I would guess that the human driving it is the value-add. I'm sure s/he uses the custom built software to produce useful, actionable results for their clients.
My company was at one point hosting a boatload of large media files on Amazon S3 for our clients to download. It was a financial disaster. As soon as bandwidth from regular hosting services became available, I switched and laughed all the way to the bank. Now we're switching to dedicated servers for even cheaper.
Is there a cloud computing service that provides a free tier that doesn't need a credit card? I like to maintain a personal website, and I've had to periodically hop from one platform to another.
I started on free.fr, my parents' ISP, with a web page built in iWeb. But then I wanted a .com domain.
I moved to Wordpress, but that didn't let me customise the layout.
Then I moved to Google App Engine (appspot), which is good, but it's blocked in China.
Then I moved to OpenShift (rhcloud), which was great while it lasted. It wasn't just for hosting a static web page - I could SSH into the server, at last! But it shut down the free tier.
I tried Heroku, but I'm getting warnings about 80% of monthly usage even when there's no content there, just a redirecting page.
Currently I'm using GitHub Pages, but I worry about how Microsoft will try monetising it. It's also a pity not to have SSH, FTP, or SQL - all my apps (e.g. Pingtype) have to run on the client in localStorage.
The company I'm working for spends over $3000 per month on AWS.
Whenever I read about these "free giveaway" AWS coupons that require registering with a credit card, I just think that there's going to be a nasty fee like this. So in practice I just run things locally on my laptop. If there's a better provider, please tell me though! It's been a few years since I last checked the options, and moved everything over to Github.
Netlify does what you want for static sites. I like it better than GitHub Pages because you can just drag and drop a folder to deploy; it's not necessary to deploy through git pushes.
Netlify.com is pretty great for static websites. Its free tier gives you something like 100GB of data bandwidth, forever. But it does not give you SSH access - it's a managed service. I think you'll have a hard time finding any service willing to deploy virtual machines for free, without a credit card. However, Vultr has $2.50 instances, which are basically free and can be resized and rebooted in a few seconds.
Are you specifically looking for something free? Or are you just trying to avoid waking up to a massive credit card charge?
If you don't get a lot of traffic, I'd go for an Amazon LightSail instance. $5/month includes 1 TB of data transfer. If you really don't like AWS, Digital Ocean has similar offerings.
Does AWS allow you to pay with pre-paid Visa/MC cards? Could use one of those to pay for the account to avoid a surprise bill from draining your bank account.
Ideally something that doesn't need a credit card to register, yet has paid tiers to allow scaling up if there's demand. A prepaid credit card sounds like a decent idea; if I had less money in my TransferWise account, then I could probably use that.
Good choice. This exact scenario just bit me this month. Fortunately the bill is less than $100 but I can't even figure out where some of the charges, specifically RDS, are coming from!
Dealing with $130 in charges after the $1000 was used up because they sent absolutely no notification that the credits were almost gone. Didn't find out until the charge appeared on my card.
Genuinely curious if this has been done, but considering that $1000 is not a lot of money, which suggests you are not running a big operation: why not use it up and tie the rest to a card with a limit, such as one from privacy.com? Granted, Amazon might deny such cards, in which case, if you really want it, you could set up a new debit card whose account has, say, a maximum you can spend.
Just because you can't actually pay it doesn't mean they won't bill you for it. That's just going to generate a debt to Amazon once the money runs out. It won't magically turn off your AWS charges... unfortunately.
I’ve had the card I use for AWS expire in the past, so I’ve got the billing notifications from them. Basically they start emailing you as soon as the charge fails and give you a couple months before they shut off your services. As long as you pay the past due charges by then, there are no complications. Based on the wording in the emails, it seems like they will just shut down the account if you don’t pay the past due charges. Which feels like the right outcome to me. I suppose they could send you to collections but companies that do that tend to be more upfront about it.
I'm more concerned about my account. Will they terminate my Amazon & prime account? Would they send it to collections and mess up my credit? I don't have the time or emotional energy to sort through that sort of mess.
AWS is great about dogfooding; almost all their services run on top of native AWS (EC2, Lambda, DynamoDB)... but they don't do that for billing. It's all just fake money being thrown around internally.
Not surprising. There's little benefit to AWS from it being clear. We are pretty happy with Cloudability. Especially if you apply tags for app name, app version, portfolio group, environment, etc.
I still get charged a buck a month even though I have zero services running. I tore everything down a couple of months ago and verified recently. I haven't had a chance to complain because it is so little money, but AWS billing is complete shit and their alerts are a dark pattern.
They have their own solution for this. Which is fine, and may turn out to be great.
But you might want to talk to people like @QuinnyPig from the Duckbill Group before you assume the fix to your AWS issues is a third party vendor's product.
Their billing tools are notoriously poor. Just yesterday they confirmed for me that Cost Explorer only has access to data going back 12 months, so good luck trying to do year over year comparisons.
I agree Cost Explorer leaves a lot to be desired compared to what third party vendors offer, but CUR can give you more granularity.
Options seem limited from providers in general since Azure and GCP haven't done much better in this regard - GCP cloud billing in particular felt less far along than the other two providers.
I've found this issue with other SaaS-like providers too. E.g., a popular email relay/delivery service has a per-email price if you go over your rate plan, with no ability to set a hard limit for the account or per sub-user.
A compromised account or server? It'd be interesting to see whether their spam filters catch most of it. But an accidental loop or issue in your code? (Like another commenter mentioned with their $30k bill.) Yikes.
It's incredible to lack such a basic account protection feature, especially when money is involved.
I have also had instances of Amazon raising large bills against my account when the stated services were not being used. I was able to dispute the payment and get it reversed, because it just so happened that during the specified period my account was locked, which also means that none of the Amazon resources allotted to my account could have been in an active state.
One thing I hate about AWS billing is that it doesn't separate out disk costs from the VM compute. Also, is it just me or is RDS ridiculously expensive? I'm fairly new to AWS, having mostly used (and continue to use) Azure.
People still treat their databases like mainframes; just make the machine bigger to get better performance, and use more reliable components to decrease the risk of data loss or downtime. Amazon is happy to take your money to manage that for you. Their profit margin exists in you being scared of what can go wrong, and the ease of getting things set up.
(I don't have a good suggestion for an alternative to this approach. The design of commonly-used relational databases surprises me. They all assume disks have some sort of intrinsic durability. Disk manufacturers all make you pay extra money to maintain this illusion; underprovisioning, wear leveling, background garbage collection, hardware RAID. But at the end of the day, it can all just get sucked up by a tornado (or a rogue shell script) and all that means nothing. I do not understand why disks are not just dumb blocks of NAND flash connected directly to the application, which can then provide cross-datacenter redundancy and save everyone a ton of money. I guess that is why Google makes their own SSDs and their own planet-scale database engine. They know it's silly. The rest of us are stuck with expensive garbage that is, with 100% certainty, going to fail. Who to blame is all that we can work around, and blaming Amazon is better than blaming yourself!)
Distributed systems bring their own problems. I've seen more production outages caused by poorly coded or configured HA setups than due to any single-host level hardware failure. I remember one application where a 60 second network outage caused a forty minute database outage due to HA supervisors going batshit. What works well are simple setups like async log shipping to another site with manual fail-over. If you think you cannot afford 30 minutes downtime there is really no amount of money you can spend to be sure you are improving your chances.
> One thing I hate about AWS billing is that it doesn't separate out disk costs from the VM compute.
Doesn't it? EBS stuff (storage and IOPS) was a separate line item, last I checked. Ephemeral storage (if applicable) is included in the compute price.
> Also, is it just me or is RDS ridiculously expensive? I'm fairly new to AWS, having mostly used (and continue to use) Azure.
I seem to remember it being around what an EC2 instance cost until you go to multi-AZ and then you're paying for an extra instance. But I've only used RDS with postgres and mysql type engines, none of the proprietary stuff that would add on extra licensing fees.
It may in the actual invoice, but in the dashboard it's difficult to narrow down what something actually costs when you're doing maintenance. A lot of things are just listed as "EC2 (other)". I can't tell what I'm saving by getting rid of a disk. In this way Azure is more friendly, at least in my experience.
Nope, the RDS instances cost quite a bit more. For example, in us-east-1, an unreserved t3.micro costs $91.104 a year, but a Postgres db.t3.micro costs $157.68 a year. That particular RDS instance does not include any storage; you pay as you go with EBS.
For the m4 series, RDS is almost double the instance cost.
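Those yearly figures correspond exactly to the hourly on-demand rates at the time (us-east-1: $0.0104/hr for an EC2 t3.micro vs $0.018/hr for a single-AZ RDS PostgreSQL db.t3.micro):

```python
HOURS_PER_YEAR = 24 * 365  # 8760

ec2_t3_micro = 0.0104      # USD/hr, EC2 on-demand
rds_pg_t3_micro = 0.018    # USD/hr, RDS PostgreSQL on-demand, single-AZ

print(ec2_t3_micro * HOURS_PER_YEAR)     # 91.104
print(rds_pg_t3_micro * HOURS_PER_YEAR)  # 157.68
print(rds_pg_t3_micro / ec2_t3_micro)    # ~1.73x, before storage or Multi-AZ
```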
Yeah, RDS is pretty expensive but unlike raw compute you do get guaranteed performance.
The issue with AWS is that it's easy to add new services without finding new vendors so companies just spend more and more on AWS as features are not as important as 'cost savings'.
Disk costs at Amazon are an annoyance when you need to spin up a VM for just a few days. Apparently you pay for the entire month in advance, even if you only need it for a shorter period.
Whereas Azure disk costs for VMs accrue on a daily basis.
What kind of disk costs? I'm fairly certain this is not how EBS works: you pay only for the time you have the storage provisioned. I think the granularity on that calculation goes down to the level of seconds? Much smaller units than months, at any rate.
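A quick proration sketch, assuming gp2's ~$0.10/GB-month us-east-1 list price (an assumption; your volume type and region will vary):

```python
GB_MONTH_RATE = 0.10  # USD per GB-month, assumed gp2 list price in us-east-1
SECONDS_PER_MONTH = 30 * 24 * 3600

def ebs_cost(gb: float, seconds_provisioned: float) -> float:
    # EBS bills per GB-month, prorated down to the time the volume exists.
    return gb * GB_MONTH_RATE * seconds_provisioned / SECONDS_PER_MONTH

# A 100 GB volume kept for 3 days costs about $1, not a full month's $10.
print(ebs_cost(100, 3 * 24 * 3600))  # ~1.0
```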
That's not what we were finding in our bill. Maybe it's the way we provisioned the VMs or something. But we don't have this issue with Azure. We can see the daily cost associated with the disks.
Aurora is a binary-compatible (MySQL, PG) API over an internal AWS infra/storage layer that provides cross-AZ redundancy and transparent scaling. Storage is replicated 6 ways across 3 AZs. You still need to define the "compute" component as an EC2 instance, but storage is not via EBS. Read replicas are compute-only; storage is shared. Backups are continuous to S3.
RDS is a managed database "cluster" (in the PG sense of the word) running on EC2 infra. You need to define both compute and storage. Backups are snapshots, not PIT. Replicas are via the standard DB engine replication.
A big (by NZ standards) telco CEO, Theresa Gattung, was famously quoted as saying, "What has every telco in the world done in the past? It's used confusion as its chief marketing tool. And that's fine." She was rightly pilloried for this, but the honesty is admirable.
Not really a billing opaqueness issue, but endless Lambda loops are a nasty spending risk. One idea for at least partially protecting against them: https://theburningmonk.com/2019/06/aws-lambda-how-to-detect-and-stop-accidental-infinite-recursions/
Disclaimer: I have not personally used Lambda for anything serious, so my experience is limited.
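One crude guard in the spirit of the linked article (a sketch, not necessarily the article's exact method): carry an invocation-depth counter in the payload and refuse to re-invoke past a sane limit.

```python
import json
import boto3

MAX_DEPTH = 10  # beyond this, assume we're in an accidental loop

lam = boto3.client("lambda")

def handler(event, context):
    depth = event.get("depth", 0)
    if depth >= MAX_DEPTH:
        # Fail loudly instead of silently re-invoking forever.
        raise RuntimeError(f"suspected accidental recursion at depth {depth}")

    # ... actual work would go here ...

    # Propagate the counter on any downstream self-invocation.
    lam.invoke(
        FunctionName=context.function_name,
        InvocationType="Event",  # async, fire-and-forget
        Payload=json.dumps({"depth": depth + 1}),
    )
```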