We use Kubernetes and spot instances to reduce EC2 billing up to 80% (tuananh.net)
514 points by talonx on Feb 25, 2020 | 293 comments



I hope people don't go and take this advice and just run everything on Spot as that is a mistake.

It is very common for AWS to completely run out of entire classes of instance types e.g. all of R5 or all of M5. And when that happens your cluster will die.

What you want to do is split your cluster into a minimum of two node groups, e.g. Core and Task:

Core: On-Demand for all of your critical and management apps, e.g. monitoring.

Task: Spot for your random, ephemeral jobs that aren't a big deal if they need to be re-run.

So for a Spark cluster, for example, you would pin your driver to the Core nodes and let the executors run on the Task nodes.
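
For illustration, a minimal sketch of what that pinning can look like with the Kubernetes Python client (the "nodegroup: core" label, image and names are assumptions on my part, not anything from the article):

    from kubernetes import client, config

    # Sketch: the driver pod carries a nodeSelector so it only lands on the
    # on-demand "core" group, while executor pods without the selector can be
    # scheduled onto the spot "task" group.
    config.load_kube_config()

    driver_pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="spark-driver"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            node_selector={"nodegroup": "core"},   # pin to on-demand Core nodes only
            containers=[client.V1Container(name="driver", image="spark:latest")],
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="default", body=driver_pod)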


Shout-out for "AutoSpotting", which transparently re-launches a regular On-Demand ASG as spot instances, and will fall back to regular instances: https://github.com/AutoSpotting/AutoSpotting/

Combined with the fact that you can have an ASG with multiple instance types: https://aws.amazon.com/blogs/aws/new-ec2-auto-scaling-groups...

This means that you can be reasonably certain you'll never run out of capacity unless AWS runs out of every single instance type you have requested, terminates your Spot instances, and you can't launch any more On-Demand ones.

(and even so, set a minimum percentage of On-Demand in AutoSpotting to ensure you maintain at least some capacity)
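
For reference, here's roughly what such a mixed-instances ASG looks like via boto3 (just a sketch; the group name, launch template, subnets and instance types below are placeholders I made up):

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Keep a small on-demand base, fill the rest with spot spread across
    # several interchangeable instance types so one pool drying up isn't fatal.
    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="worker-group",
        MinSize=2,
        MaxSize=20,
        VPCZoneIdentifier="subnet-aaa,subnet-bbb",
        MixedInstancesPolicy={
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": "worker-template",
                    "Version": "$Latest",
                },
                "Overrides": [
                    {"InstanceType": "m5.xlarge"},
                    {"InstanceType": "m5a.xlarge"},
                    {"InstanceType": "r5.xlarge"},
                ],
            },
            "InstancesDistribution": {
                "OnDemandBaseCapacity": 2,                  # always keep some on-demand
                "OnDemandPercentageAboveBaseCapacity": 25,  # plus a share above the base
                "SpotAllocationStrategy": "lowest-price",
                "SpotInstancePools": 3,
            },
        },
    )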


> runs out of every single instance type you have requested, terminates your Spot instances, and you can't launch any more On-Demand ones.

This is more common than you think.

Internally, cloud providers schedule instance types onto real hardware, so running out of an instance type likely means they have run out of capacity overall, with only a tiny amount left over as fragmentation. To access that tiny remainder, they'll terminate spot instances and live-migrate existing users (which they have to do very slowly) to make space for a few more of whichever instance types make the most business sense (which varies depending on the mix of real hardware and existing instance types).

It takes someone like AWS a good few weeks, sometimes months, to provision new actual hardware.

It isn't uncommon for big users to be told they'll be given a service credit if they'll move away from a capacity constrained zone.


Is there a concept similar to airline upgrades? Better that than denying a paying customer boarding. Surely there must be spare capacity somewhere in the datacentre with slightly better specs.


Yes - they totally do that. If there is only space for a large instance, but you want a small one, they fit your small one in the free capacity, and there is now space for someone else to fit another small one next to it.

For business reasons they might decide not to do that though - your small instance might mean they have to say no to a big allocation later.

Instead they just delay your instance starting and hope other instances moving around opens up a more suitable location for it.

There's an entire paper on the topic: https://dl.acm.org/doi/10.1145/2797211


The AutoSpotting author here, always feels great to see my little pet project mentioned by happy users. Thank you for making my day!

To set matters straight, AutoSpotting pre-dates the new AutoScaling mixed instance types functionality by a couple of years and it (intentionally) doesn't make use of it under the hood for reliability reasons related to failover to on-demand. To avoid any race conditions, AutoSpotting currently ignores any groups configured with mixed instances policy.

In the default configuration AutoSpotting implements a lazy/best-effort on-demand->spot replacement logic with built-in failover to on-demand and to different spot instance types. To keep costs down, the failover is only triggered when new spot instances fail to launch (for whatever reason, including insufficient spot capacity).

What we do is iterate over compatible instance types in increasing order of spot price until we successfully launch a compatible spot instance (roughly at least as large as the original from a CPU/memory/disk perspective, but cheaper per hour). If all compatible spot instances fail to launch, the group keeps running the existing on-demand capacity. We retry this every few minutes until we eventually succeed.
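
For anyone curious, a rough Python sketch of that selection loop (just to illustrate the logic; the real implementation is in Go and the helper names here are made up):

    from typing import Callable, Dict, Optional

    def pick_spot_replacement(
        spot_prices: Dict[str, float],               # compatible type -> current spot price
        on_demand_price: float,                      # hourly price of the instance to replace
        try_launch: Callable[[str], Optional[str]],  # returns an instance id, or None on failure
    ) -> Optional[str]:
        """Walk compatible types cheapest-first; stop once one launches or nothing is cheaper."""
        for instance_type in sorted(spot_prices, key=spot_prices.get):
            if spot_prices[instance_type] >= on_demand_price:
                break  # no savings possible right now
            instance_id = try_launch(instance_type)
            if instance_id is not None:
                return instance_id   # caller attaches it and drains the on-demand node
        return None  # keep the on-demand capacity and retry in a few minutes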

There's currently no failover to multiple on-demand instance types (this is a known limitation), but this could be implemented with reasonable effort.

We're also working on significantly improving the current replacement logic to address a bunch of edge cases with an architectural change (making use of instance launch events). I'm very excited about this improvement and looking forward to having it land, hopefully within a few weeks.

At the end of the day, unlike most tools in this space (including AWS offerings), AutoSpotting is an open source project. So if anyone is interested in helping implement any of these improvements (or maybe others), while at the same time getting experience with Go and the AWS APIs, which are nowadays very valuable skills, you're more than welcome to join the fun.


Thanks for the shout-out, really appreciate it.

If you don't mind I'd like to get some feedback/feature ideas from users like you.

Please get in touch with me on https://gitter.im/cristim


ASG, per the blog-post you linked to, now supports starting both on-demand and spot instances, so what's the use of AutoSpotting?


The author of AutoSpotting here; this gets asked often and I'm happy to clarify.

The mixed-capacity ASGs currently run at decreased capacity when they fail to launch spot instances. AutoSpotting will automatically fail over to on-demand capacity when spot capacity is lost, and back to spot once it can launch it again.

Another useful feature is that it most often requires no configuration of older on-demand ASGs, because it can just take them over and replace their nodes with compatible spot instances.

This makes it very popular with people who run legacy infrastructure that can't be tampered with for whatever reason, as well as for large-scale rollouts across hundreds of accounts. Someone recently deployed it on infrastructure still running on EC2 Classic, started in 2008 or so, that hadn't been touched for years.

Another large company deployed it with the default opt-in configuration against hundreds of AWS accounts owned by as many teams, many with legacy instances running for years. It would normally have taken them years to coordinate a mass migration, but it took them just a couple of months to migrate to spot. The teams could opt in and try it out on their application, or opt out known sensitive workloads. A few weeks later they centrally switched the configuration to opt-out mode, converting most of their infrastructure to spot literally overnight and saving lots of money with very little configuration effort and very little disruption to the teams.

If you want to learn more about it have a look at our FAQ at https://autospotting.org/faq/index.html

It's also the most prominent open source tool in this space. Most competition consists of closed-source, commercial (and often quite expensive) tools so if you're currently having any issues or missing functionality, anyone skilled enough can submit a fix or improvement pull request.


Where can I read about some of these more impressive use cases you describe?


Have a look at https://github.com/AutoSpotting/AutoSpotting or the FAQ section on https://autospotting.org

If those don't answer your questions feel free to reach out to me and I'll do my best to explain further.


It replaces on-demand instances in place. If there are no spot instances available, it will leave them running. If a spot instance gets killed, it will start again as on-demand.

It sounds a bit hinky, but it tends to leave you with the number of instances you want running without having to determine what percentage of the ASG should be on demand or spot — especially with the possibility of not being able to start new spot instances if they’ve been terminated.


Yes. We ran into this early on. Setting your bid price (the maximum price you are willing to pay for the resource you are spinning up) higher than the average spot price does not protect you against instance termination. Even if you set your spot bid price at the current on-demand price (you only get charged the current spot market price), your instance will be terminated if there is no capacity available in the spot instance pool. For example, say you spin up a spot EC2 instance with 100 GB of RAM, with the bid price set to the on-demand price. If someone else spins up a 100 GB on-demand instance and there is no capacity left in the spot pool, your instance will be terminated.
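
To illustrate, this is roughly how the max price gets set on a spot request with boto3 (the AMI, instance type and price below are placeholders); the max price only caps what you pay per hour, it doesn't reserve capacity:

    import boto3

    ec2 = boto3.client("ec2")

    # Even with MaxPrice set to the on-demand rate, the instance can still be
    # reclaimed the moment the spot pool for this type runs dry.
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="r5.4xlarge",
        MinCount=1,
        MaxCount=1,
        InstanceMarketOptions={
            "MarketType": "spot",
            "SpotOptions": {
                "MaxPrice": "1.008",                     # e.g. set to the on-demand price
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        },
    )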


author here: this is why I suggest using a mix of reserved instances and spot instances. worst case scenario, everything ends up on the small number of reserved + on-demand instances.

there are other strategies to avoid this as well

using multiple instance types, of different sizes, in different availability zones, because price spikes are different for each combination (see the sketch below)

using bigger instances than usual, because bigger = more stable / less likely to get evicted
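
if you want to eyeball how differently each (type, AZ) combination spikes, a quick boto3 sketch like this works (instance types are just examples):

    import boto3
    from datetime import datetime, timedelta, timezone

    ec2 = boto3.client("ec2")

    # Spot price history is reported per instance type and availability zone,
    # so you can see which combinations spike and which stay flat.
    history = ec2.describe_spot_price_history(
        InstanceTypes=["m5.xlarge", "m5a.xlarge", "m4.xlarge", "r5.xlarge"],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.now(timezone.utc) - timedelta(days=1),
    )

    for entry in history["SpotPriceHistory"]:
        print(entry["AvailabilityZone"], entry["InstanceType"], entry["SpotPrice"])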


Why not use spot fleets to solve this? You can specify multiple acceptable instance types.


I can't speak for the other regions but in ap-southeast-2 we had a couple of incidents in 2019 where there was no Spot capacity for ANY instance types between say 64GB and 512GB of RAM.

And so even our instance fleets were failing to provision capacity.


Requiring a minimum of 64GB of RAM seems like quite a unique computational profile, so if it's mission critical that you have at least one instance of this type up all the time, then maybe you could fully reserve that one instance while depending on spot for scaling?


this is why the ability to fall back to on-demand instances is critical.


When there are no spot instances, it's common that on-demand instances can't be started either.

Unless you paid for a reservation, you're SOL.


Fwiw, Product@SpotInst here. Curious what led you to this conclusion? At SpotInst we have customers who migrate between Spot <--> RI <--> OD every day.


that's for you to decide: what percentage of your total compute should be reserved instances. it's always a trade-off. in that case, the percentage of cost saving is lower but availability is higher.


It seems weird that AWS wouldn't just do this automatically, right?


Yes and no. Because they do not want to disclose how much capacity they have and on demand costs more, customers would be left wondering if AWS unfairly decided to run on demand when spot could have been used instead.


I'm also puzzled why people wouldn't just use spot fleet with on-demand as backup.


because at the time we did this (2016-2017), kops did not support spot fleet. worst case scenario, you get a mix of reserved + on-demand instances.


I don't think I've ever seen spotfleet not satisfy a request with all the same instance type. When the instance gets terminated, it seems to fill again with the same instance type as before (and then get killed again in busy periods).

For the past few weeks I've been having this exact problem with GPU instances, where on-demand requests would kill my spotfleet instances in the middle of a job, I'd spin up again, get killed again a few minutes later.

The only way to fix it without rewriting a bunch of stuff was to blacklist that instance type from my spotfleet requests.

It definitely wasted a bunch of my time though which seems to be increasingly more common with newer AWS APIs.


Fwiw, Product@SpotInst here. Might I suggest that you give us a try? We guard against this specific scenario by calculating a Spot score based on instance type/AZ and deliver flexibility with a choice of different instance types.


because at the time when we did this (2016-2017), kops did not support spot fleet


Fwiw, I lead Product for Ocean at SpotInst. Your analysis is _spot_ on :). At SpotInst, we predict interruptions by calculating a Spot Market Score which is based on Instance Family, Type and AZ (using historic data). This allows us to spread pods/workloads across a variety of spot instance types while delivering an SLA. As an additional data point, we have a mix of stateless/stateful workloads running 24x7 on our platform. For critical/management workloads/apps, RI could make a lot of sense and we can dynamically buy/sell those on your behalf, delivering a fully-managed Spot <--> RI <--> OD solution where needed.


Fargate Spot is out now and pretty nice. Since it just runs a container of a certain size and AWS can decide where it runs, it seems like it would be less likely to run out of capacity as well.


The main problem with AWS Fargate, as with most of AWS's offerings, is cost. More precisely, granularity. At best, Fargate charges you in increments of 0.25 vCPU and 0.5GB of RAM (rounded up) per second, up to a ceiling of 2GB of RAM. If your containers have spiky workloads that require more than 2GB of RAM then the granularity gets coarser and you are charged per 1GB of RAM (rounded up) per second. That means a spiky workload can lead you to spend more than a month's worth of an EC2 instance with 2GB of RAM (say, a t3.small with 2 vCPU and 2GB of RAM).

The whole service becomes even more ridiculous if a company is already operating EC2 instances, or even Kubernetes clusters on EC2 spot instances (the topic of this discussion), because more often than not these workloads could already be covered by the cluster's unused capacity.


Product@SpotInst here. Just fyi, we have customers launch Spark workloads that require 300G RAM and we are able to efficiently deliver this capacity by ensuring that the underlying infrastructure is shared with other pods that are not memory bound. We bin-pack, scale up and down quickly, while allowing you to have warmed-up spare capacity when needed to ensure that your pods/tasks are serviceable immediately.


I spent months in the summer of 2018 unable to deploy Azure GS4/GS5s in North or West Europe - such a pain.


Are you using or have you built any dashboards to track this sort of stuff? It seems like deprioritizing certain types of work to save money might warrant some sort of forecasting info. Should I rely on something happening today or not?


Why are people so obsessed by AWS? It is one of the most expensive hosting solutions that tries hard to lock you into their ecosystem.

I somewhat understand why enterprises want to use it, but why are small startups using it so much and then complaining about the cost?

Nowadays when we have high speed internet, and a lot of things are containerized, it is so simple to change hosting partners. Just pick one that doesn't cost an arm and a leg and move to a different one if it didn't fit very well.

I have used linux containers for 10 years now and changed hosting a few times, each time reducing costs even more. Yes, it is a bit of manual labour, but if you have someone with sysadmin/devops skills, it is easily doable.


> Why are people so obsessed by AWS? It is one of the most expensive hosting solutions that tries hard to lock you into their ecosystem.

I agree with you, and that's why I try to get the point of view of those who actually decide to adopt AWS. They aren't crazy or stupid, and as AWS is the world's leading cloud provider, it's highly doubtful that the decision is irrational and motivated by ignorance.

So far, the main fallacy with regard to companies picking AWS is the assumption that cost is relevant. It isn't. AWS might overcharge a lot, but the truth of the matter is that for any sizeable corporation it's irrelevant whether they spend 200€ or 400€ on their cloud infrastructure. It's far less than a salary and it's even less than the bill for some office utilities. So once the infrastructure foot is in the door, why would management worry about cost? What they do care about is uptime and development speed, because that has a direct impact on productivity, and thus on the value extracted from salaries. If a particular service provider enables you to show tangible results in no time at all (say, spinning up a database or message broker or workflow in next to no time) then they don't mind paying a premium that matches, say, their air conditioning bill.


For a startup, it can work out like this: Start out on AWS/GCP/Azure in the initial phase when you want to optimize for velocity in terms of pushing out new functionality and services. When you start to require several message queues, different data stores, dynamic provisioning and high availability, you save a lot on setup and maintenance - the initial cost of getting your private own cloud up and running, and doing so stably, is not to be underestimated. Especially when you're still exploring and haven't figured out the best technologies for you long-term.

Then, at some point, that dynamic changes as you get a better understanding of your needs, the bills start to build up and the architecture is in less flux. You might also have a bigger team and can afford to start allocating more resources to operations. That is the point when it might make sense to migrate over to self-managed.

Then at the same time, you have the scalability, which might be more of a key point for even larger organizations.

I think building somewhat cloud-agnostic to ease friction of provider migration is good, regardless, but do so pragmatically and look at the APIs from a service perspective.

Kubernetes? All the bigger providers have alternatives and you can run your own. Fargate? You're going to have to do some rewrites. MemoryStore? Just swap in another Redis instance. BigTable? Highly GCP specific. Etc.

Not saying there aren't a lot of companies who choose the wrong provider for the wrong reasons, but it can also be part of a conscious strategy. Also, nobody ever got fired for buying IBM, and so on.


> Then, at some point, that dynamic changes as you get a better understanding of your needs, the bills start to build up and the architecture is in less flux. You might also have a bigger team and can afford to start allocating more resources to operations. That is the point when it might make sense to migrate over to self-managed.

I completely agree, and I have had this discussion with my direct manager in the past. Yet, even if the potential savings are significant, managers might not be too keen on investing in switching your infrastructure. Running your own infrastructure is risky, and although top managers enjoy lower utility bills, they don't enjoy the prospect of a greater risk of downtime, especially if the downtime is self-inflicted and affects business-critical services.

So, if this transition doesn't go perfectly smoothly... The people signing off on the migration to self-hosted services might be risking their whole career on a play that, at best, only brings some short-term operational cost savings. Does that justify a move whose best case scenario is equivalent to an AWS discount?


For sure, there are a lot of factors to consider here. For heavy workloads where you need solid I/O, the cost for the same performance on bare metal vs VMs in the cloud can be >4x, so you could even afford to have a duplicate setup on another network for redundancy and still be saving.


Moving workloads in-house does happen. But, in general, you're right. It's hard to advocate for a near-term expensive (in time and money) and at least somewhat risky (expect some nights and weekends crises) migration for possibly some longer term cost benefit (assuming you've accounted for all the costs). Which BTW neither you nor your manager may still be around to take credit for. And also BTW is at least somewhat counter to what companies are doing in general, for better or worse, and which execs will probably rightly see as a potential distraction from whatever the company is trying to accomplish.

Frankly the whole discussion mostly highlights that these are things you need to think about upfront before you're fully committed.


Back in the day, when I was part of a startup, the DB guy was all into making us write "provider agnostic" SQL in case we ever wanted to switch to MySQL or Oracle. We were actually using Postgres. This was a nightmare.

Things started improving when we said ‘f-it we are not moving out of Postgres, let us at least use the best features of PG’

There is a similar problem when trying to use AWS with the constant thought about moving out of AWS at some point.


Yeah, this is a bit of what I mean by doing it pragmatically - at least when you choose provider-specific services, know that either 1) you have an idea of how you would migrate it or 2) it is a conscious decision to leverage a USP. By taking the provider-agnostic paradigm to the extreme, you end up with the least common denominator, getting none of the upsides.


> Start out on AWS/GCP/Azure in the initial phase when you want to optimize for velocity in terms of pushing out new functionality and services.

Have you ever done this? It's exorbitantly hard to migrate off of a cloud provider, and few ever do.


I agree that migrating off a cloud provider can be very hard. However, architecting your system with portability in mind can help a lot, as rumanator points out:

> I think building somewhat cloud-agnostic to ease friction of provider migration is good

Of course, that's not always an option if the system is already built, but it's definitely a good approach.


I have, and it's only hard if things aren't dockerized imo. If everything is in something like Cloud Foundry it will be much harder.

If you don’t have heavy AD and policies in place even better.


+1 on containerization, and I would add that controlling your container orchestration service (particularly ingress and security) is another key factor. Whether someone uses Docker Swarm or Kubernetes, this setup enables anyone to redeploy their entire application in the blink of an eye, regardless of which cloud service provider they use.


If your entire infra is dockerized, you are in the vast minority and should probably be discarded as an outlier.


> They aren't crazy or stupid, and as AWS is the world's leading cloud provider then it's highly doubtful that the decision is irrational and motivated by ignorance.

Part of the problem is that a huge proportion of the people I come across who chose AWS used this exact argument. Part of the problem with that argument is that none of the big guys are paying the list prices (unless they're not doing their jobs; I've seen the kind of discounts available once you get into even the few hundred k/month range and tell your account manager you're considering moving), and a lot of them also used the same line of thinking.

It pulls in a lot of people who pick AWS for all the wrong reasons.

> AWS might overcharge a lot, but truth of the matter is that for any sizeable corporation it's irrelevant if they spend 200€ or 400€ on their cloud infrastructure.

The ones I used to deal with used to be more like a 3x-10x cost difference on bills in the 10k-100k/month range. I agree with you that if the difference is ~200/month, then who cares. But a lot of much bigger companies burn money this way. Often because they started off with a 200/month difference, and then never made it a point to re-evaluate as their costs grew.

The difference isn't always that bad, but especially bandwidth hungry services are ridiculously expensive on AWS (to the point where if people really badly want to stay on AWS and spend a lot on bandwidth, a quick and dirty fix is to rent servers to use as caches in front of their AWS setup)

I'm not saying people shouldn't use AWS. But as you point out, the right use case for AWS is when you don't mind the cost, and pick it for convenience, and there's the warm fuzzy feeling of knowing you can hire people "off the street" who know how AWS works.

AWS is the luxury option. Sometimes you want the luxury option.

But it worries me how many startups build in ways that end up locking them into a provider that for some of them multiplies their per user cost by anything from 2x to 10x. When I evaluate startup pitches today, I often ask whether or not they have thought this through. It doesn't matter so much that they're on AWS - that might well be a fine choice. What matters is whether it was a conscious decision, and they've done at least a superficial attempt at modelling the costs both for AWS and some alternatives, rather than just picked it by default.


For any start-up that expects to work with enterprise customers there is no choice but to support AWS, and it has been this way for at least 3 years now. This doesn't mean you must use AWS for your entire footprint (Azure is a close second, and GCP is essentially irrelevant for most large non-tech customers), but you will need to have a POP in most regions that the F500 works with strategically. Any enterprise tech founder who utters the word cloud should know this as table stakes to compete for the foreseeable future.


That absolutely makes sense. But you can achieve that either by being able to deploy the client-specific bits to AWS for clients that absolutely insist, or by simply deploying proxies and picking and choosing on a per-service basis whether it makes sense to deploy something to AWS or proxy it to your own infra.


Sure, but there's a fun catch with this strategy. I'm familiar with some companies that refuse to work with vendors that use AWS for production hosting of their footprint, because it would be funding their competitor. In contrast, there's no such thing as a company that contractually requires all of production to be in AWS, not even AWS itself.


wrt AWS, there's also the option of building out with serverless tech, which is a dramatic cost reduction (not paying for idle) and ideally scales usage with your business model/revenue. Cloud portability suffers, but I've found that for an HA setup it's a dramatic cost saving (10x) while getting traction for a service. I've seen that transformation bear out in many enterprise companies. As an example I built out https://awsapichanges.info and run it for less than a dollar a month with Fargate Spot, S3, etc. Just saying ymmv depending on how you want to build/design your app's use of infra resources.


It really isn't a dramatic cost reduction for most people, as most people simply don't have a sufficiently large difference between base load and peak. I love the concept of serverless, but I'm not seeing it as a cost savings measure for the most part; rather, it's a simplification of architecture.


I basically agree...

However, I'd argue if you pick the right tools from the start you can leverage AWS relatively inexpensively... But that's hard without enough cloud knowledge in the industry yet, and consultants are (generally) terrible at this.

The main advantage is you can pay that crazy $200/month for a scalable database without paying $5,000+/month (burdened) for the guy who can build it and maintain it for you. A developer can learn to connect to and write code against a cluster more easily than they can learn how to build a scalable database -- and this is just an example; replace DBs with some other function you might hire a person or a team to do.


> However, I'd argue if you pick the right tools from the start you can leverage AWS relatively inexpensively... But that's hard without enough cloud knowledge in the industry yet, and consultants are (generally) terrible at this.

You'll have a hard time finding an AWS consultant who is specialized in (or inclined towards) helping you set up your infrastructure so that you don't use, or need to use, AWS. Not only is there no need for that sort of service, it would actually kill the goose that lays their golden eggs.

Odds are that you could find consultants that are specialized in some other cloud service provider, and aren't experienced enough in AWS to be in a position to smoothly migrate services out of it.


I used to do that, and used to actively recommend customers not to use AWS in most cases, or to use hybrid setups, as while my billable hours tended to be higher for AWS setups, the demand is high enough that it's not worth making bad recommendations to milk a client vs. showing them that you can help them secure substantial savings.

Some setups we made cloud agnostic enough that when we finally got migrations approved we were able to do zero downtime migrations by splitting the setup between providers temporarily. That incidentally was the best way of getting people to migrate: You make the case for resilience and flexibility, then argue for a test run for a month or so, and then all it takes is for them to compare the bills.

I even offered several times to migrate customers off AWS for a proportion of what I estimated they'd save over the next 3-6 months. None of them ever took me up on it once they realized just how much that would add up to vs. the fixed time-based offer I gave them, but it was a useful sales tool to demonstrate that I was willing to stand behind my estimates. One customer slashed their hosting bill 90% by getting off AWS (they were bandwidth heavy, and we cut their bandwidth costs by 98%; AWS outbound transfer is ridiculous).

[as I've said elsewhere, AWS has its uses, but keeping cost down is not one of them... Ironically one of the good uses for AWS is to keep the cost of a dedicated setup down: Being able to "spill" over onto AWS (or other cloud) instances in the case of unexpected events lets you operate far closer to the wire than you otherwise would dare on a dedicated environment, even if you rarely use the capability; doing so also allows for more easily spinning up additional test/dev environments etc]

The biggest reason I can see why you don't find more consultants offering those services, is that a surprising proportion of people hire consultants to give them backing to do what they already want done ("see, the AWS consultant says I was right to want to use this AWS service") rather than to genuinely give independent advice. If you're not comfortable being repeatedly told "yes, but here's why we'll be ignoring the professional advice we ostensibly hired you for" and being very careful about forcefully presenting opinions backed by evidence that don't match the hiring managers preconceptions, the pool of contracts rapidly shrinks.


I didn't say avoid AWS... I was more trying to point out that consultants and whatnot with certifications and "experience" are great at regurgitating information but often lack personal experience with the results of their work months down the line. They suggest using tools based on their understanding, which is frequently built on minimal hands-on experience... E.g. use a bunch of DynamoDB tables with several indexes each to get a serverless database, ignoring concepts like duplicating data, leveraging hash/range keys, avoiding scans, etc. (as an example).


You're assuming the organization/company does not do a lot of computation. If that's true, then yes, cost is not much of a factor. But in that case, a company could just rent a server at some ISP and be done with it.

In interesting cases, cost is _very_ significant. And it's not a 2x factor, IIRC it can easily be 10x.


> You're assuming the organization/company does not do a lot of computation.

I am assuming nothing. I personally had this very same debate with a program manager at a company whose bread and butter was doing a lot of computation. In the eyes of upper-level management, arguing about spending 20€ or 100€ on a cloud service provider is akin to debating which brand of detergent the company should buy. It's irrelevant.

> But in that case, a company could just rent a server at some ISP and be done with it.

That's where you get it all wrong. Hiring bare metal services does nothing for your ability to scale, either to meet demand or to develop/test/try out new services, nor does it help you use higher-level managed services. Everything you have to do or manage by yourself is a productivity hit, and that productivity hit is measured as a percentage of your entire payroll, which eclipses how much you pay your cloud service provider.


100€ on cloud service is not "a lot of computation".


Computing itself is the cheap part, especially on spot instances. It's the outbound bandwidth that'll kill you


Ok, so s/computing/computing, storage and network I\/O/ . Same argument.

... although if it's outbound bandwidth - if you can take care of your compute, maybe it's possible to purchase outgoing bandwidth at an hourly resolution, rather than "whatever your wires can send us", and keep your flexibility.


Because it works?

I was doing some testing of Serverless (the framework) for a personal project. I wanted to do it with Google Functions + a database, but even for the basic examples GCS wouldn't work; I spent 3 hours fiddling around. I moved on to AWS Lambda + Dynamo and was done in 1 hour.

Also, AWS support is simply amazing. Considering Google's history of bad support, I wouldn't consider it for anything serious.


> Also AWS support is simply amazing.

A few weeks ago, I had a two-node Elasticsearch cluster (evidently my mistake...though IDK why AWS can operate high availability two-node transaction RDS clusters no problem but not ES).

One node went down, only manual intervention by AWS support could fix it, automatic backups were completely inaccessible (since they rely on the cluster being up? ... WTF), and it took many hours for support to reset the cluster.

I eventually just said "screw it" and rebuilt the cluster from scratch.


AWS has over 175 services and is continually improving. I would say most new services don't live up to the promise at launch, but quarter after quarter you see dramatic improvement; for some services it's year after year. What makes AWS so valuable is not the individual services but the fact that you can string them together, which outweighs the feature set of any single service.

GitHub is amazing for developer happiness, but CodeCommit is secure and integrates seamlessly with so many AWS services that I can live without all the bells and whistles of GitHub.


Elasticsearch is on the older side of those 175, since it launched in 2015. [1]

[1] https://aws.amazon.com/blogs/aws/new-amazon-elasticsearch-se...


I've never had AWS hand over anything under a tenuous legal argument. GitHub bent over and found the first reason they could. I'll pay my money to AWS first, because it doesn't compromise my business.


This. AWS support is wildly expensive. I actually encourage most companies to find a smaller partner with strong infrastructure for this exact reason.


AWS Business Support is only $100 USD per month, and you can call in 24/7 and be connected with a Cloud Support Engineer within 5-15 mins. If you don't like the engineer, hang up, call again and get someone different. That's incredibly inexpensive; it saves me hours, or having to hire more devs.

You can get this Business Support for free for a year if you join Startup School and get $3K USD in credits along with the business support.

If you can't afford it, maybe you are just hobbying around, but AWS offers lots of ways to support you until you have sustainable revenue.


Last I checked it was a percentage of your spend.


I think you may not appreciate how much $100 a month is for many people around the globe.


That may be. But then you're definitely in the category where your time isn't really worth anything, and screwing around for a few days doing DIY on software/hardware/etc. is a better solution than paying someone to do something for you. That's fine, but you're really describing paid support in general.


This right here

I always hear people on the internet talking about AWS being crazy expensive, but from the SF Bay Area it looks really damn cheap. Would I rather give thousands to AWS or hundreds of thousands to an internal specialist who's likely gonna say my company is too small and boring to keep them interested anyway?

AWS wins that math every day. And that's the market they target. Why wouldn't they?


AWS is extremely expensive once you get to any meaningful size. If you save on infrastructure you pay for it on enterprise support, engineers or consultants and/or bandwidth.

I LIKE AWS. I think it can be a great choice for many companies and use cases. But the idea that AWS is for everyone, that it's less expensive for everyone when you factor in the TCO, simply isn't correct.


> But the idea that AWS is for everyone, that it's less expensive for everyone when you factor in the TCO, simply isn't correct.

Correct. I specifically said it’s perfect if your labor cost is higher than your AWS cost

Another good case is indie hackers whose project likely won’t ever make it out of the free tier


This just isn't true; in fact quite the opposite. If you just have a website you are working on, AWS is complex and expensive. When you need to handle things at scale, AWS is far better than other clouds and an order of magnitude cheaper than rolling your own racks.


What's meaningful? It'll cost you half a million to build a full rack, power it and give it connectivity for a year. That's a lot of spot instances if you don't have a constant load.


> It'll cost you a half million to build a full rack, power it and give it connectivity for a year.

For $1500/mo you can get a half rack (5Kv) and 50 Mbps internet. A couple of $5k switches and 6-8 $10k servers and you're well under $125k, plus you probably don't need $10k servers or $5k switches and can find cheaper hosting. I realize you said a full rack, but that's probably overkill and could be done for $350k or less. Once you have the servers/switches your fixed costs are relatively low.
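
Quick back-of-the-envelope with those rough numbers (taking the high end of 6-8 servers):

    # All figures are the rough estimates above, not real quotes.
    hosting = 1500 * 12          # half rack + 50 Mbps for a year
    switches = 2 * 5_000
    servers = 8 * 10_000         # high end of "6-8 $10k servers"
    total = hosting + switches + servers
    print(total)                 # 108000 -> indeed well under $125k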


Part of it is that people see the AWS or support bill or consultant bill or whatever. But they don't really see the cost of DIY whether initial cost or ongoing support. The cost of build (on many dimensions) vs. buy is often underestimated. That's not to say you should never build--no, you shouldn't use AWS for everything--but it's important to understand the real costs.


I ran the tech team of a startup in a third world country (bullet lending $100 to people). We were cheap, and it was always a problem convincing finance to buy technology instead of throwing more cheap labour at things.

Still, that $100 a month we paid for AWS support was worth its weight in gold.


For a person anywhere on earth? Sure. But 99.99999% of those people don't need on-demand cloud hosting support.

But for a business that requires cloud hosting with support? There aren't many places on the planet where $100 is a prohibitive cost for a business that's willing to spend at minimum that much for hosting.

In short, it's the cost of doing business.


They once tried to get me to sign up for $100/month to fix an issue on their side. I refused and 3 days later my problem was magically resolved. AWS is starting to mimic the poor customer treatment I used to only associate with the Amazon marketplace.


Elasticsearch is a bad example. Their elasticsearch offering has always been wack.

There's a tier of services at AWS:

s3/ec2/ecs/rds/elasticache - Flagships, will almost always work except in weird edge cases. Everyone uses these.

Niche Stuff - If you need it you'll know. It'll generally at least be an 80% tool (think athena/firehose/aws waf/etc).

Stuff with shitty pricing - Stuff they obviously only built for feature parity with GCP/Microsoft (think EKS/Managed SFTP/Cloud Active Directory)

Broken shit you use once, it screws you, and you never use it again - Usually it's either because the underlying open source tool is built in a fashion that isn't appropriate for a shared services environment or it's because someone at AWS has 'opinions' (think Elasticsearch/Cloudformation)


It is not recommended to run two node ES clusters. The AWS service shouldn't allow that configuration.


Yeah, that was my mistake.


I once used a much larger EC2 than I needed and got billed a few hundred. After explaining the situation they refunded all of it.


We do it because those alt vendors don’t have the security or compliance options offered by AWS/GCP/Azure.

For example, I wanted to replicate GCS to another hosted object store so that we could have a backup of our systems outside of our GCP account (they have been known to lock the accounts of small businesses and not be very helpful in fixing it; GCS itself has been as stable as S3 in my experience).

Anyhow, I really wanted to use Backblaze B2 service for this purpose. Unfortunately, they don’t have the kinds of security controls or third party audits our industry requires, and their sales team indicated it wasn’t on the roadmap. I appreciate that honesty, but it’s one more reason the major players have a leg up.

They amortize the cost of compliance to the point where you don’t see it. For a long time AWS charged a lot extra for it, but GCP did not and was cheaper anyway. Now AWS gives compliance away for “free” as well.

It often leaves me wondering about other startups... how secure are they, really? I know my industry is onerous, but a lot of it is just "common sense" security-wise. Why should my browsing history or e-commerce purchases be any different from my medical records, when there are ways to use the former to reverse engineer the latter?


I used to get an 18k monthly bill on AWS; I moved everything to DO and now it costs me 8k a month. Billing is so much easier to understand now as well. It was a k8s-to-k8s cluster transfer, so the migration wasn't very painful.


And even DO is still quite expensive (though the cheaper alternatives do tend to have limitations - e.g. if you're ok with Europe only Hetzner tends to be far cheaper, but that doesn't work for everyone).


Thanks, but the majority of the target audience is in SEA. Also, we only considered moving to DO once they had their k8s offering ready.

I might have missed it in my search, but it seems Hetzner is probably still working on theirs. Nobody here likes to deploy k8s themselves, although the tools to do that are super sexy these days.


Hetzner deploys new features very slowly; that's the main issue with them if you're not prepared to roll your own.

But certainly for SEA they don't make sense, unless you have subsets of functionality that are bandwidth intensive but not latency sensitive (e.g. I used to manage a network that was split between the UK, Germany and New Zealand, and we used Hetzner for the German footprint and put anything where latency didn't matter, like bulk e-mailing, there, while customer-facing stuff was all in its respective country). For that to be worthwhile you need quite significant volume, though.


You could get that 8k/mo bill down to $500/month with an $8000 hardware purchase and well-thought-out colocation.


And support work. Saving 20k a month is one engineer in expensive markets.


If you use k8s, you can shop around more easily.

Of course, the flexibility of k8s does come with complexity, so YMMV.


Two main reasons: Elasticity and Availability.

Our services are highly elastic, and can vary from processing 8 million events/day to 600 million events/day. The same goes for our users, who are mainly active during work hours, with some running night shifts, and fewer on weekends.

We are probably the prime-case for cloud, since elasticity is where you save cost by going away from dedicated hosting.

As for availability: our customers are highly dependent on us processing their data live, and on being able to monitor, get alarms and react to their data. They rely on us to notify whichever technician needs to fix their production line when it stops, since they are losing money for every minute the line is not running (true for almost any manufacturing company).

This means that we need a lot of redundancy, and these things are built-in to almost all AWS offerings.

--

There's a case for dedicated servers still, I'll agree on that, but we are definitely benefiting from the cloud.


I'm no cloud expert, but - I'm not sure your argument is convincing.

Let's say that, for a unit of computing work, the cloud price is N times the non-cloud price. I'm being a bit vague here for generality; plus, when not on a cloud, the cost structure is different, but bear with me.

Ok. Now, your load varies by a factor of up to ~75x (8M to 600M events/day). But unless your peak load is over N times your average, it is still cheaper for you to keep machines which support the peak load and have a bunch of idle time.

Extra benefits:

* Can do other computational work (e.g. experiments) during off hours without impacting system responsiveness.

* Can perhaps put some machines to sleep, or other power-saving measures, during off hours.

Extra detriments:

* You have to take care of more system and cluster administration work than on the cloud.

If your peak-to-average ratio is above N, the cloud makes sense. Close to N - not sure. Well below N - it doesn't seem to make sense.

You'll need to tell us that N is low enough.
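
To make the break-even concrete, a toy version of the comparison (my framing of the argument, with the dedicated unit price normalised to 1):

    def cloud_is_cheaper(peak_load: float, avg_load: float, n: float) -> bool:
        """Dedicated gear must be sized for peak; cloud bills roughly on average
        usage but at N times the unit price. Cloud wins only when the
        peak-to-average ratio exceeds N."""
        dedicated_cost = peak_load * 1.0   # dedicated unit price normalised to 1
        cloud_cost = avg_load * n          # pay-per-use at N x the unit price
        return cloud_cost < dedicated_cost

    # e.g. a 10x peak-to-average ratio beats a 4x cloud price premium:
    print(cloud_is_cheaper(peak_load=10.0, avg_load=1.0, n=4.0))   # True
    print(cloud_is_cheaper(peak_load=10.0, avg_load=1.0, n=12.0))  # False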


My rule of thumb is that you need peaks shorter than ~6 hours for most "normal" type setups before AWS becomes viable from a cost perspective. Of course that depends on how significant the peaks are. Most consumer websites with an international audience for example, rarely have significant enough peaks.

This is exacerbated because the alternative is not either-or - the most cost effective system is often dedicated hosting + the ability to spin up cloud instances to take the peaks.

Doing so lets you provision your dedicated servers to run at much higher utilization most of the time, which makes pure cloud setups look far more expensive, and most people with setups like that end up needing the cloud instances very rarely.

That's not to say there aren't exceptions with genuinely massive peaks, but even then it doesn't take a huge base load before a hybrid system starts to have the potential to bring substantial savings.


Also keep in mind that, despite how big it sounds, 600m events/day shouldn't blow anyone's socks off in terms of upkeep.

It translates to about 7k requests/second averaged over a day. Assuming it's all in one region during the daytime, that's ~14k rps over 12 hours, or ~42k rps over 4 hours. It's well within a couple of high-powered servers even in the worst case.
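
Quick arithmetic check of those figures:

    events_per_day = 600_000_000
    print(events_per_day / (24 * 3600))   # ~6944 rps averaged over 24h
    print(events_per_day / (12 * 3600))   # ~13889 rps if squeezed into 12h
    print(events_per_day / (4 * 3600))    # ~41667 rps if squeezed into 4h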


Because the cloud providers advertise to tech startups. They are told that the scalability of the cloud is needed, because YOUR startup can become an overnight success and can’t handle the traffic.

This combined with how unsexy most of the VPS or dedicated hosting providers look makes the cloud providers a seemingly good choice for startups.


I'm not sure this is the only reason. Clouds are super, super attractive to stodgy, boring enterprises.

And people here working in startup environments are not aware of how much hardware from internal IT departments costs. Internal IT departments are generally way less efficient than the cutthroat cloud providers. AWS/Azure/etc. hosting costs are peanuts compared to what most banks or other big enterprises pay for their internal hosting.

On top of that, internal IT response times are frequently horrible. Any developer worth his pay generally loves cloud providers and abhors IT departments. They're slow to deploy hardware or VMs, update firewall rules, install software, etc.


The main competition to AWS is not (or should not be) internal IT, but other cloud providers outside the big 2-3 (Digital Ocean etc.), and dedicated and managed hosting providers. I can usually get a server in less than 24h from a provider like Hetzner, but if I need capacity urgently I can also temporarily spin up cloud instances there.

The biggest issue is not that people want to avoid internal IT and/or want elasticity, but the number of people who never even do a proper cost comparison beyond the top tier most expensive providers.


AWS hasn't been selling itself as a competitor to other clouds so much as to in-house IT for like a decade, because their market entrance point _is_ technology-lagging companies. The first demo use cases for, say, SQS back in 2008 were literally failing over databases into the cloud. It wasn't really until Azure that there was any truly serious attempt to compete cloud-to-cloud at enterprise scale. For all the HN hype of all the various vendors out there, they're all small fries collectively compared to the many billions of the cloud gorillas like Alibaba, Azure, and AWS. DO, Linode, Vultr, etc. are simply not intended for the bureaucracy-laden companies that are the lion's share of the global IT market, and are also not even allowed under many enterprise vendor and purchasing agreements (I've had this problem with several customers' contracts/legal before; it would have killed 7+ figure deals).

Most places outside tech hubs put massive emphasis upon being able to swap labor and are technology laggards because their switching costs are so high due to lack of velocity in their work in technology in the first place (and also why waterfall or spiral makes more sense for their efforts rather than Agile probably even today). The irony here is that strong AWS professionals are not cheap in any way, but neither is being dead in the water because your 60+ year old guru graybeard IT guy that managed the racks faithfully for 12+ years retires suddenly.


> AWS hasn't been selling itself as a competitor to other clouds for like a decade

And yet they're handing out $25k-$100k in credits like candy to startups to prevent them from going to Google cloud or Azure.


"Selling" is what I was getting at. When looking at the tail end of the funnel every other vendor is giving away drugs to hook kids when they're young. Tail distribution dealers can't afford to do this in contrast.

Another tell is that the ways to "prove" start-up status under most of the big providers' terms for cloud credit align more around "did you get VC funding?" than "are you small?" due to the longer lead time nature of bootstrapped start-ups. Most don't give you credits if you've been around 2+ years and soured on another provider. Maybe I just sucked at it but I was bootstrapping before and had trouble getting decent credit then.


I have strong evidence of the opposite. I worked at a hedge fund (unnamed for privacy); they paid top dollar for IT admins, who by the way were very capable. An incursion into AWS taught them a hard lesson: they were nearly 4-5x more efficient than AWS at similar hosting capacity. Moving to GCP was their only option to maintain elasticity at relatively "acceptable" prices.


My "anecdata" is stronger than your "anecdata", I'd say.

> they paid top $ for IT admins, which by the way were very capable.

Most companies don't pay top dollar and as a result have to scrape by with the leftovers, so Amazon for sure beats them in sysadmin efficiency.


> Because the cloud providers advertise to tech startups.

That's not true at all. Cloud services are very attractive to good old fashioned companies, because they express the cost of using computational resources as a simple monthly bill, just like any other utility. Your manager goes through the monthly bills and he sees, say, electricity, water, cloud infrastructure, cleaning, office supplies. That's it. No need to know what a server is or what a spot instance is. If the costs fluctuate smoothly and within boundaries then you simply don't worry about them.


> That's not true at all.

I have lost count of the number of startups I've evaluated that have gotten $25k-$100k in credits from AWS or Google Cloud, or both. When I was doing consulting I once got paid to first set up an environment on AWS despite them knowing it was too expensive for them, then to migrate from AWS to Google Cloud, then to migrate from Google Cloud to Hetzner - all because my fees for doing the setups and migrations were far lower than the free credits heaped on them by AWS and Google.

There are certainly reasons why people in bigger companies like them as well, though from what I see most of the time it is because they can slip the cost under the nose of the manager one level up in a way that is harder to do with more clearly quantified contracts. It's boiling frogs - as long as the cost just slowly creeps up, it doesn't get queried.


I am a bit skeptical of your claims. Without any detail: if you're talking about companies whose business justifies $100k in credits from AWS, then you're not talking about normal businesses that have at best a local presence. If your company has a global presence then Hetzner doesn't quite cut it, as they only have data centers in Germany and Finland[1], and the added latency of accessing their data centers from outside Europe easily adds 100-200ms to each request.

Hetzner is indeed excellent if you are a cost-conscious European client who is willing to absorb the cost of self-managed bare-metal or quasi-bare-metal servers and cares about saving on bandwidth costs. However, if your goal is to provide a worldwide service then you are compelled to look elsewhere. Even competing European service providers such as OVH[2] or Scaleway[3] fare better than Hetzner in this regard.

[1] https://www.hetzner.com/unternehmen/rechenzentrum/

[2] https://us.ovhcloud.com/about/company/data-centers

[3] https://www.scaleway.com/en/datacenter/


I'm talking about tiny startups.

And Hetzner works just fine for most small startups even if/when they have a US audience - the latency is perfectly manageable, though in those kinds of situations you will certainly want to expand into other regions later.

The main point, though, is that even when you scale, nothing stops you from using Hetzner for Europe, and e.g. OVH for US, and indeed filling in with things like AWS when there are needs you can't otherwise meet.

In reality very few startups get to a scale where this starts to matter, and overoptimizing for it at the cost of infra spend at the start is a great way of running out of money - I've seen that happen way too many times.


And Hetzner has cloud/VPS/etc offerings in Germany & Finland: https://www.hetzner.com/cloud


It seems some consulting companies prefer to push AWS over other service providers. AWS has the advantage of a lot of professional certifications and a big catalog of managed services to meet all kinds of needs. Cost is often mostly dev time for a lot of software running on it. It does pretty well at saving developers' time.


Also: free cloud credits. Don't underestimate that. I know of two startups (one that I work at). Both are hooked on the cloud, albeit one has an insane cloud bill and one has a very reasonable bill.

The reason in both cases is MSFT/AWS gave them massive up front "free" credits, which meant there was no incentive to impose internal controls on usage or put in place a culture of conservation. AWS doesn't think twice before dropping $20k on cloud credits to anyone who wrote a mobile app.

At the company with the unreasonable bill, it's not even a SaaS provider. It literally just runs a few very low traffic websites for user manuals, some CI boxes and some marketing sites. The entire thing could be run off three or four dedicated boxes 95% of which would be CI, I'm sure of it. Yet this firm spends several tech salaries worth of money on Azure every quarter. The bill is eyebleeding.

The problem is everyone who wanted one got a VM. You got VMs being used for nothing except running a single test program that the owner then forgot about. People used expensive hosted cloud services instead of installing Postgres because "their time is more valuable", etc. Free credits created a culture in which the company just institutionally forgot that servers cost money. When the free credits ran out, it was invisible to the engineers using the services and simply became another (opaque) line item in the burn rate for the accountants to worry about. In the rare cases engineers decided to "optimise", they did it by spending lots of time and effort on using Kubernetes so stuff would scale down at night.

There was an abortive attempt to move off the cloud. This was unfortunately stymied by incompetence by both the firm and the chosen hosting provider. It got some boxes from OVH and then didn't pay the bills for them, so the boxes were simply yanked and deleted. All the setup time was lost. In another case OVH allocated machines in datacenters that were far apart but they needed to be close together. In another case the machines were delivered but the hardware on one was faulty. Of course this stuff is a one-time hit and avoidable with non-broken processes, but it empowers those who just want to keep learning Azure/Kubernetes/Docker so they can put it on their CV.

The other firm is much smaller and has much more reasonable costs, they also get more for it (e.g. use Heroku which automates a lot of stuff).

Having observed these two different companies, I resolved up front that if/when I go back to running my own business I will not use cloud services, even if they offer free credits. I'll be taking pride in finding ways to drive server costs down as far as possible. Perhaps even hosting non-HA-required servers at my home given we have duplex gigabit fiber now. Yes, my time is valuable and all that, but establishing a culture up front in which people use the resources they need and not more is even more valuable.


Counterpoint: my org got a large credit pool, but our development and engineering teams were closely communicating with relevant leadership to explain costs and processes. We stretched out the credits for 3 years, despite spinning up/down nearly a hundred thousand VMs. Our infrastructure team still controlled access and setup, but it was a lot easier to say "director, Bob wants $200/month for this project, approved?" Then it was created in a way that the costs tracked to that project. Slower than Bob with direct access, but much faster than traditional systems since the VM is up in minutes and could be put in a sandbox if needed.

So in our case, credits got us into AWS, but instead of treating it like free money, we spent a lot of time improving efficiency and reducing our bill so when the credits ran out we could run our system for an affordable rate.

I take pride in reducing costs in the cloud. I don't argue cloud is the only way, but there's a lot of use cases where it just makes sense -- CloudFront+S3 for static websites, for example.


Fair enough. Perhaps it's unfair to blame the cloud providers for these situations. Badly run companies can waste money in a lot of ways. Just seems like cloud frequently crops up as one of those ways.


But..but..BUT!!! IT'S FREEEEEeeeeeeee!!!!!!!!!!


Nobody ever got fired for buying IB... er, AWS. [1]

[1] until the bill shows up ;-)

Actually -- to be honest, we're in the golden age of infrastructure. I love building on cloud. I don't have to worry about the fiddly bits as much and when I do things are often documented either in official docs or in blog posts and things.


IBM was the 90s. The 2000s was VMWare.


No one got fired for choosing AWS ... Nor blamed in case of outage because of AWS.


This.

It's sad to say but people are not rewarded for going with a perceived riskier option in an effort to save a couple hundred thousand dollars.

You go with what works and is proven, and with what will make your life the easiest. If AWS goes down, then it's much easier to explain why. If "Big Bill's Lowcost Cloud Solutions" goes down, then your entire judgement will be questioned. Even if Big Bill's cloud solution is equally or more capable and only half the cost.


Yep, it's risk aversion at the individual level. Your solution you fought for breaks once? That's associated with you, personally. AWS breaks three times in the same time period, or ends up costing a ton more than expected, or whatever? Well it's an "industry standard" so no-one gets the blame, even if it gets bad enough that a change (how about Azure!) is initiated.


Exactly. Try telling your management to switch to DO or Linode, which already hold the top spots in second-tier cloud hosting. Who is there to take the risk of failure for savings that are, in the grand scheme of things, negligible?

AWS failed? Great, you have half of the internet down as well to cover your ass. No one gets the blame and everyone continues with their work.

Of course this depends on culture and regional reputation. For example you would have no problem if your Startup in France is using OVH.


Not sure about DO, but Linode will bounce your servers whenever they feel like it. They usually give notice.


But some were forced to close as they were unable to afford the bills.


B2B SaaS hosting costs are often a low single-digit percentage of revenue.

Considering that software is known to be high margin, there's lots and lots of companies for which hosting can be painful, but not threatening.


> if you have someone with sysadmin/devops skills, it is easily doable.

And if you don't, AWS can be pretty cost-competitive because your developers can handle a lot of the "infrastructure" that you used to need a dedicated sysadmin (or a team of) to handle.

Don't get me wrong, I wouldn't likely spend my own money on them, but it does beat having to try to find an on-call infrastructure person and a reliable data centre with proper routes, and then developers/devops people who can scale, monitor, manage, and maintain queueing services, message gateways, object storage, databases, firewalls, load balancers, containers, VMs, network attached storage, etc.

Also, don't underestimate the value of things like CloudFormation. Being able to make an API call and have an entire cluster configured, with load balancing, backups, multi-AZ redundancy, CDNs, etc., is pretty potent.

AWS might be expensive, but it gives you access to a lot of things that you might not otherwise have, even if you do have a sysadmin/devops guy.
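To make that concrete, here is a minimal sketch of what such a template can look like (hypothetical CloudFormation YAML with a placeholder AMI and placeholder subnet IDs; the load balancer, backup and CDN resources are left out for brevity):

    # One launch template plus an Auto Scaling group spread over two AZs.
    AWSTemplateFormatVersion: '2010-09-09'
    Resources:
      WebLaunchTemplate:
        Type: AWS::EC2::LaunchTemplate
        Properties:
          LaunchTemplateData:
            ImageId: ami-0123456789abcdef0   # placeholder AMI
            InstanceType: m5.large
      WebAutoScalingGroup:
        Type: AWS::AutoScaling::AutoScalingGroup
        Properties:
          MinSize: '2'
          MaxSize: '6'
          VPCZoneIdentifier:
            - subnet-aaaa1111                # placeholder subnet in AZ a
            - subnet-bbbb2222                # placeholder subnet in AZ b
          LaunchTemplate:
            LaunchTemplateId: !Ref WebLaunchTemplate
            Version: !GetAtt WebLaunchTemplate.LatestVersionNumber

One "aws cloudformation deploy" call (or an update to the template in git) and the whole thing is created, changed or torn down as a unit.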


Because fundamentally there are tiers of service that one needs to use:

There's a critical core infrastructure - I would argue that it is:

* some sort of centralized management platform, even if the platform is just a set of deploy scripts (1 instance),

* Origin DNS servers (3 instances - one per AZ),

* HTTP/HTTPS entry points (3 instances - one per AZ),

* A couple of database servers (say a primary and a backup - 2 instances),

* I would add a job server, though it probably could be collapsed into the management platform.

You need to have fixed IP addresses for this (you really do not want to deal with service discovery at this level), and you want it to be at a provider that won't ever make you renumber - preferably a provider that never breaks fundamental things like IP address assignment and never runs out of resources.

One's "internet presence" disappears when this dies. Running all of this as a core workload on AWS/GCP/Azure is a no brainier - it will cost about $100/mo, be nearly insta-rebuildable and re-deployable and a couple configuration files in git would take care of bringing up the beach head up.

At this point your core is up and everything else becomes service specific. This is the point where costs become a consideration, but the drive to low cost should not come at the expense of tooling. If, by embracing ephemeral resources and existing tooling, one can cut the base AWS/GCP/Azure price down by 80%, most people would think it is a win over having to invest in building tooling to make containers stable (as in always behaving in a predictable manner) on a provider that can shave off 95% of the costs.

The biggest issues with the cloud providers are the cost of IO and the cost of bandwidth, which scale linearly with the workload, but that's an issue for a very specific subset of customers that should be hiring real ops people.


I use it when needing vast geographical spread - e.g. nodes that are closest to exchanges or some such.

Because so many others use it, if you care about latency to the counter-party server it's often a nice way to ensure low latency (by using the same availability zone).


I didn't have the impression people were choosing AWS for costs alone.

What cloud provider has a similar feature set?

Sure, if you want to meddle around with VMs and containers, you can use pretty much any provider, but if you want to go a step further there isn't much left


What's a popular feature set that only AWS can do and cannot easily roll your own solution?


there were several reasons in our case. There might be more for other startups. Here are some:

- our clients (airlines) would very much prefer we use AWS over the smaller, lesser known offering.

- AWS offers free promotional credits for startups.

- AWS when utilized right (requires work) is not much more expensive than traditional hosting.

- Etc...


Those are the most expensive computers I ever saw: https://aws.amazon.com/ec2/dedicated-hosts/pricing/


Maybe you haven't seen enough computers? :-p

Internal IT departments or hosting providers meant for highly regulated environments can charge 5-10x that for a VM of a similar size :-)


So that will be 50-100x more expensive than Hetzner?


Yes


I completely agree. Most startups can do with cheaper or even bare-metal hosting for years before having to use highly scalable solutions like AWS.


How much would it cost to get someone dedicated to manage your bare metal infrastructure? In this day and age, where using ansible or terraform is considered close to the metal, how much would it cost you to manage your own server?

And what about scaling?

That's precisely the problem. Bare metal hosts are unbeatable in cost, but fixed costs render them too expensive for a startup. Then, when fixed costs start to become irrelevant, you need to factor in the cost of rearchitecting your solution.

Then, when both of those costs become irrelevant, you already have the entire team trained and experienced in using a cloud service provider.


As someone who has been that "someone dedicated to managing your bare metal infrastructure": it will typically cost less than that "someone dedicated to ensuring your AWS setup is correct and works, and handling all the regular config changes".

I know that first hand from having done both, seeing how the AWS systems consistently earned me more money, and how rarely I had to deal with the bare metal (I've done anything from actual own hardware in colos to renting servers from places like Hetzner; when I was handling servers in two separate colos I spent on average a day a year in each of them - the rest was done by "remote hands" at the data centre).

> That's precisely the problem. Bare metal hosts are unbeatable in cost, but fixed costs render them too expensive for a startup. Then, when fixed costs start to become irrelevant, you need to factor in the cost of rearchitecting your solution.

Actually buying hardware is too expensive. But colocating leased hardware, or renting on a month-by-month basis from a dedicated hosting provider, costs about the same when you amortize over a three-year period, unless you're physically located somewhere with cheap land. E.g. I work out of London, and when I was doing this we eventually deprecated the own hardware in favour of renting from Hetzner, because colo space in London was so much more expensive that the savings on the actual hardware couldn't make up for it.

> Then, when fixed costs start to become irrelevant, you need to factor in the cost of rearchitecting your solution.

Or you architect it properly from the start. I've done zero-downtime migrations between AWS, GCE and Hetzner. I've had systems that tied in cloud instances, VMs running on our own hardware, containers running on dedicated instances, and VMs on rented hardware, all tied into a single system. If you run everything in containers anyway, all you need to make that happen is a simple orchestration system, a reliable network overlay, and an architecture that ensures reliable replication of your data.

Once you've done that, you're free to pick and choose and migrate services as you please depending on cost and need, and it really is not that hard to get working - you already do most of the necessary planning if you are setting up a reproducible cloud setup anyway.


It's more complicated than that. Sometimes, the decision is not just in the engineers' hands. eg - our clients (airlines) would very much prefer we use AWS over the smaller, lesser known offering.


What justification do the airlines give for placing demands on your own suppliers?


It's pretty common for Fortune 500 type companies to ask for all sorts of intrusive terms they perceive as a benefit to them. Indemnity, long net payment terms, penalties for missed SLA, etc. They could be specifying AWS in this case to reduce latency and/or avoid the internet as a dependency. Maybe the client side is on AWS already, or they already have high bandwidth / dual path via AWS direct connect, and don't wish to do that for a different provider.

The reward being that once you're in, they are too big and slow to ever move away from your product.


"We want it because it makes our lives (specifically, probably some paperwork) easier. Do you still want our money or not?"


i'm not familiar with that part. also, sales people can make all kinds of promises to close the deal.


Not necessarily. StackOverflow runs on a couple of servers.


You get what you pay for. You can pay a lot of money and have almost no worries (and lots of free time), or pay a little money and have lots of worries (and no free time).

And since the OP was talking about Kubernetes, and you're talking about "lock-in", another reason I love AWS is it is not a lock-in device. I can use any service of AWS's by itself, without being forced to use anything else or do something in a proprietary way. All the interfaces are a command-line or REST API with JSON data formats. They are all designed to operate by themselves, so you can provide replacement components at any time, hosted anywhere.

On the other hand, you can't just use one part of K8s, because you have to at least set up and manage an entire cluster first. And there's dozens of services K8s simply does not have, and other hosting providers don't have.


Baloney. All the major clouds offer managed k8s. And using k8s in no way precludes you from integrating with services that are not on k8s (of course though, k8s can run anything you can containerize). So AWS has vastly more "lock-in" debt than k8s.


To replace an AWS service, you just plug in a different service integration, or use a literal complete API clone. To replace the use of any part of K8s with something non-K8s, you basically have to replace all your use of anything in K8s. So the debt is much higher with k8s, because it takes much more engineering work to get off it and it's not compatible with anything but itself.

Furthermore, just because someone has a managed k8s doesn't make it less lock-in or less work. With AWS you don't need to use a cluster of anything. With k8s you are signing yourself up to tons of complex services and specific design and operation paradigms. With AWS you have no such inherent restrictions.

K8s is inherently more complex and difficult to use than AWS services, which aren't even a good comparison because they are so simple by comparison.


> You get what you pay for. You can pay a lot of money and have almost no worries (and lots of free time), or pay a little money and have lots of worries (and no free time).

This is, in my experience, quite a false statement when talking about AWS. You are either a (team of) 10x engineer(s) capable of anything, or, as you grow in people and services, you will need to spend a good chunk of your developer time managing AWS, or hire a dedicated person (a sysadmin).


There are literally complete turn-key solutions in the AWS Marketplace for most things any business needs to do, and you don't need to be an engineer to use them. A monkey who can read a walk-through with screenshots can build clusters of apps and CI/CD pipelines with AWS. And there are thousands of companies that you can pay to help manage your AWS resources.

Just because businesses hire dedicated backend people doesn't mean they actually need to; they just feel better about it when they do, because that's how it was always done before.


Here's my two cents as a developer who has only paid the cost of AWS once for a one-off project (in retrospect I could've gotten more value with similar processing power with a different service).

AWS is established and relatively straightforward. I found the user experience to be pretty seamless. Maybe there are better alternatives, but I've found Google Cloud Platform and IBM Cloud to be absolutely miserable and confusing (the latter more so). WRT Google Cloud Platform, I think the main sin is the UI/UX, and if I had to foot the bill I would give them another consideration. I had very limited (but good) experience with DigitalOcean and Heroku, but those just don't have the infrastructure at scale to compete with AWS in my opinion.

What do you think is the best alternative to AWS?


I wonder if a lot of startups think "What if we're an overnight success and we have to scale up fast?". This is of course what Amazon makes sure to use in their marketing as well.

Examples: Twitter became far more successful than their architecture and infrastructure could handle back then.

Pokemon Go became an insanely huge success far exceeding their worst case scenario. There's a postmortem/whitepaper on Google Cloud Engine about that IIRC.


The white paper on Pokemon Go ramping up on GCP after launch:

https://cloud.google.com/blog/products/gcp/bringing-pokemon-...


It's all relative. For most businesses, productivity is more important than the cost of infrastructure because it creates more revenue to cover the cost, leading to a net profit.

That being said, it also makes sense to apply some basic optimizations to reduce costs by a significant margin, as long as the work involved in optimizing pays off. A few days to cut costs by 50% makes sense; a few months to save a few dollars doesn't.


I have absolutely no idea how much my company is paying for AWS. I expect I will soon, but it's a little alarming that this data is not readily available to me. I think I am not the only one in that boat.

For people who know very well? I think it’s not unlike interpersonal dynamics. Codependency looks very different from outside versus inside. It’s difficult to tell why your friend keeps investing in this bozo, coming up with ever more elaborate ways to manage them and their moods.

And over lunch a mutual friend will discuss how they might be better off with someone else, and once in a great while someone will ask if maybe So-and-so might not be better off alone.

AWS is a dishonest partner. They have decided that you not knowing how much this is going to cost you is a good thing, and give no evidence that they are willing to change. This is who they are. Do the good things about them make up for that kind of bad thing? I don’t think so. I think you should look for another partner. Maybe just something light, not a serious relationship. Or maybe try some alone time and see what that’s like.


I am not sure they are obsessed with AWS. I think they are obsessed with scalability (both technical and financial), elasticity (turning off capacity that you do not use), security compliance (required by many industries), reliability, availability, and support in case something goes sideways. Containers and Linux are orthogonal to this.


If you need a few servers, then yeah, AWS is super pricey and locked in.

If you need 50 engineers to be able to manage 200ish servers, well, uh, you just don't have a ton of options.

Obviously not everyone falls into that second category but plenty do.


> If you need 50 engineers to be able to manage 200ish servers, well, uh, you just don't have a ton of options.

Who needs 50 engineers for 200 servers? My IT team manages 750 servers in 5 co-locations + 200 end users + DBA responsibilities with 5. If we grow above 1000 we might need to hire a 6th.


It's also a very common skill. A lot of people are familiar with AWS. What money you would save by using an alternative, you will lose on training new recruits, or mistakes made from lack of expertise.


Much of the knowledge works across other vendors that are just Linux. What AWS-specific feature are people so desperate to use that they get themselves locked in?


We moved to GCP from AWS because we thought it'd be cheaper. Turns out it wasn't that much cheaper at all. But Google paid for the transition so there's that...


Yes, and this is sad.

I have seen this as the outcome of a bubble which keeps repeating that "having to maintain is bad", "servers are bad" or "this is old".


AKA marketing.


According to what I think: accounting.

The bill of the cloud vendor is already approved, and the 3rd-party apps don't need to have a separate approval process.


You'll never get fired for suggesting AWS. And IAM is the killer feature.


I think it's probably the new "no one got fired for buying IBM"


Because AWS is not just a server hosting company.


What do you suggest instead of AWS?


Two reasons come to mind:

1. Nobody gets fired for picking AWS.
2. A plentiful amount of talent that understands AWS and AWS patterns.


AWS can be expensive, but it comes down to engineering and needs.

I'd argue it's most expensive for small (10-50 person... ish) businesses -- too small to hire a LOT of overhead staff, but not small enough that you're still in POC/MVP stage.

A lot of ideas can be tested in AWS essentially free. However, if you pick the wrong tools or don't anticipate future bills, you can end up with additional costs and need to re-engineer slightly.

That said, it works best for scalability-sensitive workloads: if you only need 10,000 machines for a few hours or days (research workloads), if your load is much heavier just 9-5, or if you need vast, constant scalability.

My org currently has 19,500 VMs in AWS right now -- they are student VMs running a mix of OS's. While many of them are offline for days or weeks at a time, we can easily start 5,000 of them in a minute or two if needed (though we usually don't see more than 400/minute). Sure, we can run these in a large virtualization system of our own, but we're only 3 developers, a single system admin, and one cloud-focused person (though we do cross-train a lot)... So AWS allows us to scale quickly without as much concern about the underlying architecture... purchasing equipment... dealing with hardware failures or warranty/replacement claims. We can spend our 9-5 building, improving, and optimizing the system (including reducing costs). With this setup, nearly anyone in our team has the capability to research a new component if we really want to add it (like websocket support) and play around (in a sandbox), without getting others involved until it is ready for a POC demo... Then we can have a new feature in production in days/weeks despite needing new equipment.

All that said, I'm not a cloud-is-the-only-way evangelist... If you don't need any of that, you can fairly easily make a cloud-agnostic system with containers... Using ECS for many typical use cases is rather portable to other container systems. If you're making larger lambda packages that can handle many paths (i.e. ELB/API Gateway Lambda proxy), they can move with less work than 100 purpose-built functions.

I'd argue most people unhappy with AWS have leadership with goals or ideas sold to them that are silly (lift and shift)... or staff that are not trained on the cloud provider, so they don't know the right considerations for building in the provider they pick. I'd wager many developers don't even know how to use DynamoDB properly for the first year or two they use it. It's hard to say it's AWS's fault if you have a large bill because your developers 'SELECT * FROM users' every time you look up a single user... and AWS scales to support the inefficient scan (how does a DB provider know if your application logic only uses one value?).

Anyway, I hope this gives some insight on why orgs use AWS.


You can do a lot with lambda while keeping your costs down.


On the one hand, building fault tolerant infrastructure that can, as a side effect, work painlessly on spot instances is great.

On the other hand, you can purchase reserved instances and get ~60% cost savings with zero engineering work. It's worth thinking long and hard about whether the cost of engineering time is worth that next 20%.

There's also a lot of useful ground in between "critical state, must never be lost (like a database)" and "can handle being terminated with 2 minutes notice". A service that can be re-created if necessary but takes 10 minutes to start up is really scary if run on spot instances, but can still be pretty useful.


Back when I used AWS, I went the reserved instances route. The pricing is pretty okay, and you are guaranteed to have machines at busy periods of the day.

The problem with spot autoscaling is that if everyone does it, it stops working. Everyone in us-east-1 gets most of their traffic from 9am EST to 5pm EST, because everyone hosts close to their users, their users are human, and humans are diurnal. Most "batch" workloads that people have also follow the same cycle; they're working on something in the office and want results now, not tomorrow morning. So they run their batch jobs during the day. If you can figure out how to get your traffic spikes from 9pm until 5am, spot instances are going to be great -- millions of CPUs are sitting idle waiting for your novel workload. But if your customers are working 9-5 jobs like you, and you care about latency enough to host close to them... you're competing for instances with every other computer user in the region.


Most batch workloads are not that latency sensitive. You can send them to the other side of the world, where everyone is asleep right now.


A thing to keep in mind is AWS's major outbound traffic costs, at $90/TB. So taxiing the data overseas might not be affordable either.


Even for inter-AWS region transfers?


It depends on how the data is being transferred. In general, the answer is yes because you'd be using public IPs.

If you're transferring between EC2 instances in peered VPCs meaning you can still use private IPs, it's more like $20/TB.

https://datapath.io/resources/blog/what-are-aws-data-transfe...


That's not been my experience, actually. The first CI service I used was hosted entirely in Europe, and when we needed to ssh in to debug something, the keystroke latency was maddening. We eventually unsubscribed and just bought a big EC2 instance in us-east with Jenkins running on it. It cost approximately 100x more, but our productivity was high and frustration low. Well worth it.

I personally think it will breed huge organizational problems if things like CI are slow. "I'll get a cup of coffee while this runs" and then you come back and forget what you were going to release. Soon it becomes "let's get another change into this build before we release" and then it's "well, it's been six months since we've released anything, what do we do." You have to start fast and stay fast if you want to keep developers productive. So saving a couple bucks on computers that are half a world away can end up being a huge expense if you're not careful.

As other comments mention, you also have to be careful about transfer costs. In the CI case, getting your source code into the CI server is cheap, but getting the containers out is going to cost you, especially if you don't make an effort to optimize them. For batch data processing jobs, the same applies; getting the result out is cheap, but getting the data in is going to be a lot of transfer. (If you were using Small Data, you could just run the job on your laptop, after all.)

The speed of computers half a world away is not great either. I remember updating some Samsung drivers once, which were served out of a Korean AWS region instead of CloudFront... and the downloads were glacially slow. Their website is the same way. I couldn't believe how a multinational corporation could push bits at me so slowly. When you're reading their documentation all day, or tweaking drivers, you notice it, and you start to think "next time I'm going to buy Intel". (Compare Samsung's SSD website with McMaster-Carr's website. What site do you hope to interact with again in the future?)

Anyway, you get a bill for compute resources, and you don't get a bill for unhappy employees context-switching all day, so I see why people want to craft clever schemes to save pennies on their compute costs. But be careful. Not every cost is charged directly to your credit card.


What about using something like mosh for latency?


Note, AWS savings plans make this even easier

https://aws.amazon.com/savingsplans/


^ Came here to say the above. The majority of the time the Savings Plan is actually what you want.

Unless you have a legacy app you know must be around for 3 years and have zero intention of trying to refactor.


Any idea of when this product was introduced?



So is this blog post redundant?


No. Spot gives deeper discounts than savings plans do.


And you can stack them.


I did something like this in GKE with preemptible instances, which are guaranteed to go away at least once every 24 hours. I had separate node groups for stateless and stateful workloads. My clusters were roughly a 50/50 mix of each. It worked out pretty well and yielded some decent cost savings.
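For anyone wanting to do the same split: GKE labels preemptible nodes with cloud.google.com/gke-preemptible, so pinning a stateless workload to that pool is just a nodeSelector away. A minimal sketch (the app name and image are placeholders):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: stateless-worker
    spec:
      replicas: 4
      selector:
        matchLabels:
          app: stateless-worker
      template:
        metadata:
          labels:
            app: stateless-worker
        spec:
          # land only on the preemptible node pool
          nodeSelector:
            cloud.google.com/gke-preemptible: "true"
          containers:
          - name: worker
            image: gcr.io/example/worker:latest   # placeholder image

Stateful workloads simply omit the selector (or the preemptible pool gets a taint so nothing lands there by accident).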


preemptible instances are worse than spot. we observed spot instances that lasted a year for us.


That presumes that an instance type that lasts longer is "better".


Yes, preemptible instances are expected to be expendable, I don't see this as a bad thing.


i believe what he meant is that, by being shut down more frequently, we will be forced to build a more resilient system. otherwise, we would be more complacent.


Yeah, we were wondering the same thing... How much does this cost in engineering hours? It's like asking someone who smokes... they always forget/lie about at least 50% of the cigarettes... And even if the engineering hours were super minimal compared to reserved instances... will this be true for other teams?


i hear all this about engineering-hours cost, but only because engineering cost in the US is so high. we're in Southeast Asia where a senior engineer only cost $1000 a month back in 2016.


Isn't cost of living also much lower in Southeast Asia?


Sure, but AWS infrastructure costs pretty much the same no matter where you live.


i believe he was talking about how it costs more in engineering than the savings we can get from infrastructure by migrating.

eg: if the infra cost goes down by 10,000 USD per month, they may say it's not worth it because they pay more than that for 1 developer in the US in a month.


Reserved instances and savings plans are predicated on predicting usage for 1 to 3 years in advance.

With spot instances, you don’t need to do this planning, assuming you can fall back to reserved quickly enough.


this is why i love google cloud over aws these days - i can get that cost saving while doing none of the work and making none of the commitment. It's usually that commitment businesses hate, imho.


why not both :)

we did mix reserved instances and spot instances for our production workload. worst case scenario will be reserved + on-demand


And how much developer time did it cost?

We have done the same - our bills went down, but not by as much as 80%; I think closer to 50%. But it took a fair bit of developer time, and we now have a lot of Kubernetes-related problems to deal with. I guess those will smooth out over time, but I don't think anyone ever factors in this stuff when they claim great savings. Developer time ain't cheap.

On a plus note, running multiple small boxes via Kubernetes does give you a more high availability system. If one instance goes down, there will still be another one available, so it's not all negative.


I think most people over-estimate the need for availability anyway. It's tempting to build the very best you can, when in reality a business really needs a good-enough solution. A whole ton of software can be down every weekend if needed.

A great in-between is to simply have a backup server ready to go in a few minutes time. Super simple compared to orchestrated container system.

Of course, client projects that spec out a certain number of 9's must be done just so, but they can also be billed accordingly.


Goes back to the fact that Stack Overflow itself runs on a single beefed-up machine for all their traffic (with a backup machine of course). What does this company do that needs so many instances? And they use the same tech too (.NET). Instead of thinking about that, people always think about over-engineering for "scale" to compensate for bad code.


Do you have any idea how big their actual database is? And how many clients they serve?

It does seem to be an example of a "standard" architecture done well. Our application has a tiny fraction of the traffic and it struggles with some things.


It's all public: https://stackexchange.com/performance

Here's a series of blog posts with a lot more detail by one of their devs: https://nickcraver.com/blog/archive/

They're very optimized and can serve all of their traffic on a single webserver, redis instance and SQL server.


Being a compiled program vs something in python or Ruby probably makes the majority of the difference!


> And how much developer time did it cost?

The nice side effect is they did migrate to .NET Core

And as another commenter said, another side effect was they got fault tolerant infrastructure. So not every minute they spent on migrating to "spot" instances is dedicated to that.


what were the challenges, or what challenges do you have? currently my company runs on GKE, and auto-scheduling, even onto "spot" instances (called preemptible on GKE), is a breeze.

we basically only run jobs on the spot instances that don't need to run instantaneously, so it's really cheap for us.


Large uploads and downloads filling up memory and crashing pods. When we had one large machine it didn't have this problem.


When that developer leaves you will still be saving 50%. It should pay for itself over the long haul.


Sigh. I had a very long and detailed reply typed out on my phone about the travails of dealing with Kubernetes in the last 2 weeks. Then Safari decided to reload the page and it all got lost.

I’m literally emotionally drained after unsuccessfully working with k8s after 2+ weeks.

It’s incredibly over complicated and documentation is all over the place. I had a large write up of my experiences but those are lost and I don’t have the energy to retype all of that.

I simply wanted to utilize k8s to help provide some auto scaling and redundancies for a 10 year old service I run.

After 2 weeks of deep diving on this topic and getting essentially nowhere, even with the help of a friend who does this for his day job - and him waving his hands, unable to help - I'm reluctantly done.

The technology is just not ready. It’s too complicated. The documentation isn’t sufficient. Sure you can document every nut and bolt, but if you can’t create simple patterns for people to follow you lose. There’s too much change going on between versions.

At my last 2 companies, they each had a team of 2-10 people working on implementing kubernetes. After over a year at each company, no significant progress had been made on k8s. Sure some stuff was migrated over but no significant services were running with it.


You are definitely right that it is a fast moving target and hence can be frustrating to work with at the moment, particularly if you are trying to get it running on-prem. It is still relatively early days, and there is plenty of distillation to come, before an easy predictable set of patterns emerge.

Not wanting you to go to the pain of trying to recreate your original post, but out of interest, what kinds of things were the primary areas of pain in your work?


I think this depends on where you run Kubernetes and how you set it up. We moved from AWS + ECS to GCP + Kubernetes. We use terraform to set it all up, so that consistency in tooling/patterns helps.

We had one full-time senior person doing the move (me) and 6 others coming in and out of the project through the year as their capacity allowed.

One of my clusters is running 5000+ containers with ease. Not huge by other company standards but big enough.


This is quite a neat strategy, leveraging elastic compute costs and kubernetes "self-healing". I'm surprised I haven't heard more about this kind of technique before.

I fully acknowledge this will only work in certain scenarios and for certain workloads, eg not ideal for long running/cache/database style services.


> leveraging elastic compute costs and kubernetes "self-healing"

The indirect effect of building on a system like this is that the recovery mechanisms get tested on a regular basis instead of just on the odd day when things fail.

Spot instances are like a natural chaosmonkey mode, with money being saved and forcing you to build failure tolerance, retries & circuit breakers early in dev.


GCP has a similar instance type called "preemptible"; they're not quite as cheap as spot, but they don't "dry up" and they're guaranteed to go down every 24 hours.

This precludes one from becoming complacent with spot instances that rarely go away.


https://cloud.google.com/kubernetes-engine/docs/how-to/preem... states that preemptible VMs aren't guaranteed to be available.


You're right. spot instances are a lot more stable than preemptible. we've seen spot instances that lasted a year for us.


This is the second response of this sort that I'm replying to.

"A lot more stable" isn't really a desirable characteristic of ephemeral compute capacity. In my experience, the less frequently the instances went away, the more complacent the operators became.

Preemptible instance are stable in the sense that you know they're going away within 24 hours and must be prepared for that.


true that.

spot used to be that way and the price was very sensitive. but AWS tweaked it so that it's more stable.

to the point that, after a year or 2 of running spot instances, we didn't feel the difference between spot and on-demand that much. we got complacent.


Yeah but Kubernetes "self-healing" a lot of times isn't. Or it just trips over itself (stuck pods, stuck cronjobs, etc).


This is run of the mill for job schedulers (Nomad, Mesos/Marathon, Globus), it’s just more accessible with k8s, containers, and VM spot pricing than historically.

As you mention, definitely don’t do this where persistence is paramount (cache that is expensive to backfill upon recovery, database, etc) but it’s just fine for transient workloads or workloads you can rapidly and safely preempt and resume.


If your instance is stateless, and your app can easily self-heal, there’s lots of computing paradigms you can explore. Serverless/function is also an option.

Of course, at the end of the day rarely is something ever truly stateless.


Eh, I'm currently running an old version of DCOS which does a pretty terrible job of scaling down. Much more than just a scheduler is required to make this work well.


Yeah, your fleet can become fragmented.


> not ideal for long running/cache/database style services.

Well, one question to ask yourself when considering going down this route is whether it makes more sense to move all the statefulness into managed services, like Aurora, BigTable, S3, etc.

That drastically simplifies life. Now the only infrastructure directly managed by you are stateless workloads that can easily be self-healed, rolled back, scaled up/down, etc. Managed DBs are more expensive than running your own DB, but most likely the cost savings of moving the rest of the infrastructure to spot/preemptible outweighs this difference.


What kind of services are you guys running that require you to scale up and down? Why not get one or two dedicated server(s) and run everything on them!? The post had no numbers, but I'm pretty sure you would come off even cheaper if you use dedicated servers, even managed.


I'm running a workload that takes about 3 minutes to run per request (compute heavy MVP), which means that a big surge of users once a day at peak would require a LOT of dedicated servers to serve in time.

My plan is to use dedicated servers for most of the load and some elastic capacity at peak loads if necessary.


I have a similar use case in mind. TBF I plan to just keep a small EC2 instance up for this need, and assemble a nice PC at home to catch up with the queue for heavy workloads - the cost is so much cheaper and I get a PC as well! Worst case, I spin up one more worker with more specs if the queue gets long. It sounds like less effort than doing all this scaling work for an MVP, and counterproductive when I'd rather spend more time on my actual logic.


That's my initial plan as well, until the requirements get too high for my PC :)


author here: please note that this is what we did in 2016-2017 where kops(what we use to provision) did not support spot fleet yet.

also, this works out so well for our use case because we were using .net framework at the time so the cost saving was huge.

a lot has changed since then.

Also, this strategy is not limited to AWS; similar types of instances are also available on Azure, GCP, etc...


How much would it cost to host this on bare metal and co-lo servers, I wonder. Probably orders of magnitude less, but only if your ops costs are very low. If you developers have a DevOps culture, it's doable.


our clients (airlines) would very much prefer we use AWS over the smaller, lesser known offering.


Very interesting seeing as Airlines don’t move fast tech wise and usually have DCs.


If you want to play with an equivalent (barebones, spot instance) K3s Azure setup I use, the template is available here:

https://github.com/rcarmo/azure-k3s-cluster

This is NOT For production use (that’s what the managed AKS service is for), but I like to tinker with the internals, so I keep an instance running permanently with a few toy services.


Thanks. This would be useful to spin up a k3s cluster for testing.


Btw, Spotinst (we, I work there) released Ocean in 2018/2019, which is the K8s equivalent of our EC2 solution (Elastigroup), and Eco, which is our AWS reserved-instance recommendation product. I won't start a whole marketing speech, but I'll just say that Spotinst has moved away from being just a "cost saving solution" and is now a more rounded cloud management solution (ease of use, cost monitoring and insights).


Hi,

The whole idea was from the Spotinst blog. Thanks a lot! I just glued all the open-source projects together with some changes here and there. If the idea hadn't worked, I would def have considered using Spotinst.

However, every cost saving was important for our startup back then. We were a small shop in Southeast Asia where a senior engineer merely cost $1000 a month. I was thinking maybe I could save the cut from Spotinst too :)


Now that is the cut-throat cost saving strategy we like to see! :)


.NET core has been a godsend in making .NET an interoperable option for cloud architectures. I'm continually impressed with the open source embrace from Microsoft.


> Now, the biggest sunk cost are obviously RDS and EC2.

I don't understand how something you pay for every month is being considered a sunk cost? Am I missing an up front charge, or does the writer not understand what it means?


This is pretty common, databricks for example uses regular instances for the driver and spot instances for workers by default.


I suspect doing that will break down in the case of large Spark shuffles though?


It's the default but you can change it. Most people will appreciate the cost savings


Love this thread. If you are using K8s and want to reduce both the time you spend managing compute infra and the associated cloud costs (whether for AWS, Azure or GCP), a Spot Instance or Preemptible VM DIY approach is certainly possible, but it will require a lot of setup work. Imagine handling multiple autoscaling groups for multiple Spot Instance types (an absolute necessity to diversify interruption risk), dealing with slow autoscaling or classic autoscaling that only considers cpu/mem and not actual pod requirements, having no easy way to create a buffer of spare nodes for high-priority workloads that suddenly need capacity, or identifying over-provisioning of machine sizes (based on incorrect pod requirements) which greatly exceed the actual needs of your pods. As an alternative, you can try Spotinst's Ocean product (yes, I work there) for K8s and ECS, where not only is your infra management simplified, but you can easily reduce your cloud compute cost by 80%.


I also did this with GCP preemptible instances and it worked great for a while, until I found out one random day that networking issues may also occur in addition to your instances being shut off within 24 hours. On sandbox clusters though it's been very smooth for over half a year. Highly recommend.


We use preemptibles for our CI fleet and it's great. We can run a hundred instances at full tilt boogie for 8 hours a day and the nodepool downscales to zero while we sleep. It's a no brainer if your controller (and use case) can handle preemption gracefully.


What CI software do you use? I played around with spot instances and Jenkins, and it was quite a poor experience.


We use Jenkins to invoke Tekton pipelines (https://github.com/tektoncd/pipeline) with a wrapper we wrote. The pipeline runs, outputs junit to a bucket and we pull it back and give it to jenkins. Was a bit of a lift to get working out of the gate but it's been mostly smooth (and flexible and cheap) since then.
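For anyone curious what the Tekton side looks like, a minimal Task is roughly this (a sketch only - the image and script are placeholders rather than the wrapper mentioned above, and the apiVersion depends on your Tekton release):

    apiVersion: tekton.dev/v1beta1
    kind: Task
    metadata:
      name: run-tests
    spec:
      workspaces:
      - name: source            # the checked-out repo is mounted here
      steps:
      - name: test
        image: golang:1.13      # placeholder build image
        workingDir: $(workspaces.source.path)
        script: |
          go test ./... 2>&1 | tee junit-raw.log

A TaskRun (or PipelineRun) kicks it off, and the pods it creates are just regular pods, so they schedule onto the preemptible node pool like everything else.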


Disclosure: I work for Google Cloud (and on preemptible VMs).

There isn’t anything about preemptible to “cause networking issues”. We may have had a general networking outage, if that’s what you experienced, but we don’t additionally make networking worse for preemptible. You’re likely to get shot, but we don’t adjust throttles or anything.


Thanks for letting me know. Overall, we use them for high availability and non-mission-critical tasks anyway, but now that you say this, it gives me more confidence in recommending it. It was strange, it only affected one of the two GKE clusters (both had preemptible nodes). I tried debugging and found that it wasn't able to even ping any external IPs (which is why I thought maybe preemptible nodes also might have networking issues). There weren't any notices about general networking outages within the last +/- 48 hours (I subscribe to the GCP outages RSS via Slack). I ended up manually rebuilding the nodes (temporarily with non-preemptible nodes) and it went away. As soon as healthchecks were failing, I created a debug container, kubectl exec -it bashed in and saw that it wasn't able to connect to the internet. It could ping/telnet internal IP databases just fine. It does warn me that it's in beta so, again, maybe you're working through some things, but I thought you should know in case other people have experienced this.


Did your instances have external IPs? We had connectivity issues when using NAT due to it dropping SYN packets. We just gave everything that needed to talk to external IPs their own external IPs.


Glad you found a workaround, but please file a support ticket for this. If you don’t have support: (a) consider it and/or (b) send me an email with your info (contact in profile) and I’ll file a bug.


We actually did have a dedicated Googler who oversaw our move from AWS to GCP. I asked him about it but never got a reply.

Anyway, yes. It caused us problems. Attempts to connect to machines outside of GCP would fail repeatedly due to the NAT dropping the SYN. This would mean our connect() call would time out repeatedly. There was an ICMP response I could see with tcpdump:

ICMP host aaa.bbb.ccc.ddd unreachable - admin prohibited filter, length 68


Well I don't think that using spot instances is a very good thing to consider there.

1) What if all of them go down - would it make the application unusable? I recall one or two weeks on AWS when all spots were down because of capacity problems.

2) You cannot rely on requests completing on spot instances; they can go up and down whenever they need to. For background jobs it's fine, you can retry, but for UX?

And if you need to choose this approach to save costs instead of migrating to bare metal, which is usually ~10 times cheaper than AWS, then something is wrong with your architecture and you are locked in.


For 2 - horizontal scaling would address this. If there are 2+ copies of the service running behind a load balancer, either can go down/be recreated as an on demand instance if the spot capacity is revoked. AWS (and I’m guessing the other cloud platforms too) has a container service that will dynamically add containers to the load balancer for you


> save your costs instead of migration to bare metal,

like i mentioned in other comments, due to our use cases, bare metal is out of the question since the clients (airlines) would very much prefer we use AWS over a smaller, lesser-known offering.


We recently migrated from AWS to Hetzner. On AWS we used combined clusters with on-demand + spot instances. After migrating to Hetzner we doubled capacity and still saved around half. AWS is very good for flexible workloads, but if the load and process are constant and well defined, bare metal hosting is much more effective. And there are a lot of well-known bare metal hosting providers on the market.


if the workload is constant, nothing beats bare metal in terms of cost.


Good to see .NET Core here in the article and Microsoft embracing more open-source solutions. Thanks to the interoperability, the application can be deployed at lower cost on containers.


We are using a similar strategy. We have 4 different node groups: 2 with on-demand instance types and 2 with spot instance types. It has been working really well for us. We are doing some further optimisations to reduce the costs and improve the performance of our stack. Also, we have Jenkins scheduled jobs which kill the staging infra every evening and start it up in the morning. We also keep staging completely down over the weekends.
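If you ever want to drop the Jenkins dependency for the evening shutdown, the same idea can be sketched as a Kubernetes CronJob (a hypothetical example - it assumes a "scaler" service account with RBAC rights to scale deployments in the staging namespace, plus a matching morning job to scale things back up):

    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: staging-shutdown
      namespace: staging
    spec:
      schedule: "0 19 * * 1-5"            # 19:00 on weekdays
      jobTemplate:
        spec:
          template:
            spec:
              serviceAccountName: scaler   # assumed SA allowed to scale deployments
              restartPolicy: OnFailure
              containers:
              - name: kubectl
                image: bitnami/kubectl:latest   # placeholder image
                command: ["kubectl", "scale", "deployment", "--all", "--replicas=0", "-n", "staging"]

With the cluster autoscaler running, scaling the deployments to zero also lets the staging node groups drain down until the morning job brings everything back.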


"We don’t have any real ops guy as you think these days. Whenever we need something setup, we just have to page someone from India team to create instances for us and then proceed to set them up ourselves."

Whiskey Tango Foxtrot. That is insane. Is this a vain attempt to constrain costs?


I'm still trying to decide on an orchestrator for my deployments. Any reason Swarm wouldn't work instead of k8s?


Docker is pretty much abandoning Swarm though, so wouldn't recommend that at least.

https://www.bretfisher.com/is-swarm-dead-answered-by-a-docke...


> Docker is pretty much abandoning Swarm though, so wouldn't recommend that at least.

> https://www.bretfisher.com/is-swarm-dead-answered-by-a-docke....

The post you link specifically says swarm is not dead, in like the first sentence of the post: "No. Swarm is not dead."

It also says: "This was written early 2018. See my update to this video including Docker Inc. interview in late 2018, and then at DockerCon 2019 Docker again confirmed new features and commitment to Swarm. Most of their customers prefer it."


It's effectively dead based on usage. Just about every modern project uses Kubernetes for container orchestration while Swarm (and others like Cattle, DC/OS, etc) have decreased in popularity.


Here's a commitment from Docker's new overlords to support swarm for the next 2 years: https://www.mirantis.com/blog/mirantis-will-continue-to-supp...


Support doesn't mean usage. Kubernetes is where the community is, not Swarm.


What's a good replacement for Swarm that isn't as complex as Kubernetes? I like the platform abstraction and flexibility of k8s, it's just really heavy for many of my use cases.


I'd advise learning the basics of Kubernetes anyway. Managed offerings like GKE take away all of the operational burden so you can just deploy your app with minimal setup, usually just 1 or a handful of YAML files.
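For reference, that "one YAML file" is typically just a Deployment plus a Service, something like this sketch (names and image are placeholders):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-web
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: hello-web
      template:
        metadata:
          labels:
            app: hello-web
        spec:
          containers:
          - name: web
            image: gcr.io/example/hello-web:1.0   # placeholder image
            ports:
            - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: hello-web
    spec:
      type: LoadBalancer
      selector:
        app: hello-web
      ports:
      - port: 80
        targetPort: 8080

kubectl apply -f that file against a GKE cluster and you get the pods plus a public load balancer, with no nodes or control plane to manage yourself.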


I hear good things about Hashicorp Nomad from people who've used it: https://nomadproject.io/

Especially if you're already using Consul, it should fit the bill perfectly.


I'd checkout HashiCorp's Nomad: https://nomadproject.io/


What are your core use cases? And would you imagine growing into more complex use cases that Kubernetes makes sense for?


That's a post from early 2018. As far as I know they continue to support and maintain swarm. To be fair though, I haven't been following it closely. I need to dive into docker clusters, but I still haven't picked the tech (k8s? swarm? flynn?)


Swarm is basically dead. I’ve spent over 2 weeks trying to get kubernetes to work and have completely failed. Left another comment about that.

I’ve been looking and haven’t found any alternatives. Really feels like there should be.


Honestly don’t know why this is being downvoted. Swarm is officially being deprecated by Docker. There aren’t any current alternatives to k8s being proposed.


I second this. Even if Swarm isn't officially deprecated, my time with it has shown that Swarm was abandoned for k8s around 2016/2018. Long-standing issues have been around for years.

Even trivial things like deploying only a single service from a stack file that contains multiple services isn't possible (even though docker compose has no problems with it).

The current recommended method to restart a service is to set its scale to 0 and then back to whatever number it was previously (because you remember it, right?).

I thought it would be as good/polished as docker compose but boy was I wrong.


How about flynn [0] and nomad [1]? Both have been described as easier alternatives to k8s. There's also k3s [2] that caught my attention.

[0]: http://flynn.io [1]: https://www.nomadproject.io [2]: http://k3s.io


Thank you so much I'll check those out!


We are moving from swarm to k3s.


if i were you, i would watch k3s closely. it's super promising; i just may not want to deploy anything production critical there.


Why is that? I thought it was a verified distribution?


it's certified yes. but for me, community is a crucial point and i think it's not big enough yet. maybe that's just me.


I think it has one of the best communities. And especially the Rancher team are extremely responsive on twitter


I reduce my RDS bill by 66% by keeping it on EC2 and using Percona Server.


Hello fellow hackers, I have worked on some terraform automation for running EKS on spot instances. If you find that interesting, let me know. I can help run that in production if needed.


I've found that rewriting an application from C# to something quicker like PHP or Go negates the whole need to do this kind of stuff because the C# application is 10x faster written in PHP or 20x faster written in Go. I love it when you change one line of a shared lib in C# and it takes an hour for the application to recompile.


1. Compilation time != execution time.

2. PHP is ridiculously slow compared to C#. You're doing it wrong (or whoever created the application did it wrong). Go is comparable, but not really faster.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...


yeah benchmarking all the stuff that is not really needed for a real web app.


1. Not everything is a web app.

2. That stuff is deep inside the web frameworks you use, in some way or another.

3. There you go, bud: https://www.techempower.com/benchmarks/ The first non-micro framework is ASP.NET Core.

Personally, I don't think performance is everything. Spring (Java) is #221 in that ranking and I'd bet it's used in almost as many places as everything above it combined.


10x faster in PHP you say? I'd love to see such a benchmark!


GP seems to be placing a high premium on developer productivity. Given salaries in the valley these days, that’s probably not wrong. PHP, though...


> the C# application is 10x faster written in PHP

My interpretation of this is that the parent commenter thinks the application runtime is faster in PHP than C#..

Modern (dotnet core) C# often beats the pants off go in webserver workloads (Golang likely has the edge in compute and certainly in anything that values startup time). I really don't see a universe where idiomatic PHP beats out idiomatic C# though.

> I love it when you change one line of a shared lib in C# and it takes an hour for the application to recompile

Here they switch gears to developer productivity. My employer has an extremely large (LoC in the millions) C# codebase that does a from-scratch compile (including C++ dependencies) in ~30minutes. On developer environments, this is incremental and while not as fast as we'd like, it only takes a handful of seconds per iteration if you're actively developing on it. If you aren't familiar with C# tooling or really value a REPL, other stacks _will_ likely be more productive. But like you're saying, at that point why PHP? Why not reach for Ruby (is that still hip?) or node or something?


An hour? If you use NuGet you should be able to keep your compile times down. In my own projects I have maybe 10s compile times.


I think your issue with C# might not be the runtime...

But yes, write your app in Java instead of Python and you might not need a bunch of servers for most workloads.

The problem is, when you do... :)



