Another drawback is that while yes, you can scale up fairly easily with terrafor...

vidarh · on Jan 4, 2020

You can set up autoscaling groups via terraform just fine, with a little bit of care taken to ensure that you trigger on the right metrics.

If anything mail is pretty much the easiest thing you can possibly pick to scale, because the inbound mail will be automatically retried. And haproxy in front of SMTP servers works just fine (really, any load balancer that can load balance raw TCP connections, but I've used haproxy to load balance thousands of messages a second).

For your user-facing side you need a bit more resiliency, but nothing stops you from using a service like SES to back a traditional setup for sending either. Reliably scaling outbound mail is the easy bit - the hard part is the reputation management that managed mail services provides, and no specific deployment mechanism will solve that.

danenania · on Jan 5, 2020

Sure, but for heavy/bursty traffic, you can still have downtime while new VMs spin up. Retries might save you or they might make the problem worse, depending on the size and pattern of the burst and how your auto-scaling config interacts with the retry config of various hosts.

It may seem like a nitpick or something not worth worrying about, and for most that's probably the case. But for some businesses it could be a crucial difference. My point is simply that this is a legitimate benefit of serverless that wasn't mentioned above--I didn't think that would be a controversial point.

vidarh · on Jan 5, 2020

That is no different for serverless. You don't magically escape startup times - you need to carefully ensure that cold startup times are low enough, or that you maintain excess capacity to compensate.

The precise extent is different between different platforms depending on overheads, but that just means the point at which you need to trigger scaling is different.

You can find lots of descriptions of approaches people have taken to keep serverless instances running to avoid the cold start delays to work around this... For autoscaling groups you'd instead configure the alarm points used to trigger the scaling accordingly.

Serverless platforms tends to assume the startup will be fast enough to keep the connection open rather than return an error, but that is a load balancer config issue - you can set up retries and wait for other platforms too if it makes sense.

(Though for email this really does not matter - retries for all commonly used mail servers follow some form of exponential backoff up to many hours at least; retries works just fine in practice)

speedplane · on Jan 5, 2020

> That is no different for serverless. You don't magically escape startup times - you need to carefully ensure that cold startup times are low enough, or that you maintain excess capacity to compensate.

Serverless deployments are just another ladder step up the abstraction level, continuing the tradition that hardware doesn't matter. Similar to code compiled into assembly or a garbage collector managing memory. In the common cases, these cases are harmless (otherwise they wouldn't be popular), but they generally hide what's actually happening. Doing a garbage collection on a 200MB app is generally pretty snappy. But doing one on a 32GB server app can take seconds or minutes.

Abstractions like these are fine, as long as the limits of the abstraction are well understood. Sadly, that is rarely the case.

crtlaltdel · on Jan 4, 2020

> Reliably scaling outbound mail is the easy bit - the hard part is the reputation management that managed mail services provides, and no specific deployment mechanism will solve that.

^this. if you want to send email its not hard....but if you want that mail to pass spam filters its a different problem altogether. hosted services like SES and mailgun will expose problems in how you are using emails (not handling bounces, not handling unsubscribes, etc) and in our case was very helpful.

arno1 · on Jan 4, 2020

Yeah, this is normal. One bus can't fit more people than it physically can.

The high load can be alleviated by the use of more MX server DNS records (and the MX servers of course, across the different locations), LBs, smarter thresholds. Of course nothing is a panacea.

Either way you will hit the AWS's limits or will get a huge bill. And then, even if you set up the budget limits, it still won't make the service more available once you reach the limits.

danenania · on Jan 4, 2020

If you're running a saas and the increased traffic comes from paying customers, you likely prefer a huge bill to downtime.

But apart from that, there's a huge benefit in saying "I'm happy to spend any amount up to X" and not needing to do any capacity planning beyond that vs. continually trying to guess what's the right % to over-provision your VMs and having downtime when you get it wrong.

arno1 · on Jan 4, 2020

Yep, but how can you be sure the serverless provider will never go down? I've witnessed multiple times when AWS's services went down.

> If you're running a saas and the increased traffic comes from paying customers, you likely prefer a huge bill to downtime.

Well, in such situation, I would probably run more advanced container orchestrators such as Kubernetes which you will then configure to automatically spawn the additional server instances.

Of course there are certain advantages in running a serverless code as you have just mentioned, but since my primary concerns are "my data is mine" + "no vendor lock-in" + "I control all the gears", it is not the best option for me. Unless I want to run & provide the serverless services by & for myself.

It's always a game between the security (more freedom) and the convenience (less freedom). Though, for some, there is more freedom in the convenience (until they start to see their hands are in the digital cuffs :P)

danenania · on Jan 4, 2020

The serverless provider can go down just like the VM provider can go down, but the key difference is that it won't go down due to traffic bursts.

Auto-scaling helps, but it still takes awhile to spin up new VMs, and you'll have downtime in the meantime for sufficiently large bursts.

On lock-in, in my experience with any non-trivial infrastructure you end up tied to your provider anyway. You're still going to need IAM users and VPCs and subnets and provider-specific auto-scaling and all that jazz. Serverless lock-in is deeper, but either way switching is a big project.

arno1 · on Jan 6, 2020

This is why the important projects must be thoroughly planned and tested in order avoid non-trivial infra or a blind cat situation. And then it is a really good idea to leverage IaaC (Infrastructure as a code) which would bring the whole thing up really quick.

dillonmckay · on Jan 4, 2020

Email itself has built-in resilience for such scenarios.

z3t4 · on Jan 4, 2020

If it's a well behaving e-mail server it will keep trying to send the e-mail... A trick to stop spammers is to block all new connections for an hour. While spammers wont try again. Sadly some legitimate e-mail servers will not try again either :/ Also some e-mail servers wont try your backup e-mail server... Some servers will even give up if they haven't been able to establish a connection within a second. Some developers/admins give zero shit about edge cases and conditions outside their developer machine. Especially if it's a company that buys invoices they will take any reason to drop the e-mail so they can add a reminder fee.

kortilla · on Jan 4, 2020

It’s a mail server. Traffic bursts aren’t really a thing unless you’re servicing thousands of users.

danenania · on Jan 5, 2020

Right, I'm looking at this more as a potential backend for an email-heavy saas, in which case I think handling bursts without downtime could be pretty important. If you just need a mail server for yourself or a small company, I agree it's not an issue.