EC2 Boot Time Benchmarking (daemonology.net)
108 points by cperciva on Aug 17, 2021 | 57 comments



So basically, the latency from "CreateInstance" to "I can SSH in" for the fastest AMI is roughly 3 seconds, with max being 8 seconds.

That's actually pretty solid. If it were 5-10x faster than that you could probably fit that into a lot of interesting use-cases for on-demand workloads. The bottleneck is the EC2 allocation itself, and I'd be interested in seeing what warm EC2 instances can do for you there.

That said, I think for the majority of use cases boot performance is not particularly important. If you want really fast 'boot' you might just want VMs and containers - that would cut out the 3-8 seconds of instance allocation time, as well as most of the rest of the work.

Curious to see a follow up on what's going on with FreeBSD - seems like it takes ages to get the network up.


>If it were 5-10x faster than that you could probably fit that into a lot of interesting use-cases for on-demand workloads

That's basically Lambda, although you lose control of the kernel and some of the userspace (though you can use Docker containers and the HTTP interface on Lambda to get some flexibility back).

Under the hood, Lambda uses optimized Firecracker VMs for < 1 sec boot

>I think for the majority of use cases boot performance is not particularly important

Anything with autoscaling. I think CI is probably a big use case (those instances get torn down and spun up pretty quickly), and otherwise you get into hairy things like unprivileged Docker-in-Docker builds trying to run nested inside a container.


Yeah, CI is a good use case. Even for autoscaling, though, I kinda feel like you'd need to be a lot faster for it to make a huge difference, tbh.

And yeah, Firecracker is pretty sick, but it's also something you can just run yourself on EC2 metal instances, and then you get full control over the kernel and networking too, which is neat.
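If anyone's curious what that looks like, here's a rough sketch of driving Firecracker's API yourself (nothing AWS-specific): Firecracker listens on a Unix socket, and you configure the microVM with a few PUT requests before starting it. The socket, kernel, and rootfs paths below are placeholders, and error handling is omitted.

    import http.client
    import json
    import socket

    SOCK = "/tmp/firecracker.socket"  # whatever you passed to `firecracker --api-sock`

    # Firecracker's REST API lives on a Unix domain socket, so we need a tiny
    # HTTPConnection subclass that connects to it instead of a TCP host/port.
    class UnixSocketHTTPConnection(http.client.HTTPConnection):
        def __init__(self, path):
            super().__init__("localhost")
            self.socket_path = path

        def connect(self):
            self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            self.sock.connect(self.socket_path)

    def api_put(endpoint, body):
        conn = UnixSocketHTTPConnection(SOCK)
        conn.request("PUT", endpoint, json.dumps(body),
                     {"Content-Type": "application/json"})
        status = conn.getresponse().status
        conn.close()
        assert status < 300, (endpoint, status)

    # Point the microVM at a kernel and root filesystem (placeholder paths),
    # then start it. Network/vsock config would be additional PUTs.
    api_put("/boot-source", {
        "kernel_image_path": "vmlinux",
        "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
    })
    api_put("/drives/rootfs", {
        "drive_id": "rootfs",
        "path_on_host": "rootfs.ext4",
        "is_root_device": True,
        "is_read_only": False,
    })
    api_put("/actions", {"action_type": "InstanceStart"})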


There's no fundamental reason Lambda needs to make you lose control of the kernel; I'd love to have Lambda with a custom kernel, and that doesn't seem like it'd be too difficult to support.


You can do lambda with containers which should get you close, I think.


A container does not include the kernel, so it doesn't get any closer. I just want a single static binary and a kernel, not a full container.


I wonder why they don't expose a kernel instead of just the rootfs. It's hard to imagine a great reason. Maybe they harden their guest kernel?


At one point, Lambda didn't expose the ability to write custom runtimes, and only provided runtimes for specific languages. People reverse-engineered those runtimes and figured out how to build custom ones. Eventually, Amazon provided a documented, well-supported way to build and run custom runtimes, but that required documenting the interfaces provided to those runtimes (e.g. environment variables and HTTP APIs instead of language-runtime-specific APIs).
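For anyone who hasn't looked at it, a custom runtime is basically just a loop against that documented HTTP API. A minimal sketch of the shape of it (not production code; a real runtime also needs error reporting and init-failure handling):

    import json
    import os
    import urllib.request

    # Lambda passes the Runtime API endpoint in this environment variable.
    base = "http://{}/2018-06-01/runtime".format(os.environ["AWS_LAMBDA_RUNTIME_API"])

    while True:
        # Long-poll for the next invocation.
        with urllib.request.urlopen(base + "/invocation/next") as resp:
            request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
            event = json.loads(resp.read())

        result = {"echo": event}  # the actual handler logic goes here

        # Post the handler's result back for this invocation.
        post = urllib.request.Request(
            base + "/invocation/{}/response".format(request_id),
            data=json.dumps(result).encode(),
            method="POST",
        )
        with urllib.request.urlopen(post):
            pass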

I'd love to see Lambda support custom kernels. That would require a similar approach: document the actual VM interface provided to the kernel, including the virtual hardware and the mechanism used to expose the Lambda API. I'd guess that they haven't yet due to a combination of 1) not enough demand for Lambda with custom kernels and 2) the freedom to modify the kernel/hardware interface arbitrarily because they control both sides of it.


Yeah I'd bet that it's just a "haven't justified this work yet" kinda thing. We just run Firecracker VMs ourselves.


Interesting.

How does that work?

How do Firecracker VMs differ from containers on lambda or fargate?


Lambda uses a stripped down Linux kernel (afaik it has some syscalls removed)

The kernel surface is part of their security model. There are some details here: https://www.bschaatsbergen.com/behind-the-scenes-lambda

E.g. exposing the kernel would undo some intentional isolation


I've seen that, but I wonder to what extent they've done custom work and to what extent they've just used established Kconfig options to compile out surface area they're not using.

In any case, Firecracker+Nitro seem like they'd be a sufficient security boundary.


> Curious to see a follow up on what's going on with FreeBSD - seems like it takes ages to get the network up.

Our BIOS boot loader is very slow. I'll be writing about FreeBSD boot performance in a later post.


Hi Colin

Do you know if there are any plans for FreeBSD to create a super-minimal server version that can be used as a Docker host, similar in size to Alpine Linux?


I know lots of people have made stripped down versions of FreeBSD, e.g. nanobsd. Beyond that, I'm not aware of anything specific... but I probably wouldn't be since I don't pay much attention to the container space. Try asking on one of the mailing lists maybe?


Where did you get 3? The fastest numbers I could see in the post add up to 7-8s


Per the post it's roughly 8.4 seconds and independent of the OS:

> The first two values — the time taken for a RunInstances API call to successfully return, and the time taken after RunInstances returns before a DescribeInstances call says that the instance is "running" — are consistent across all the AMIs I tested, at roughly 1.5 and 6.9 seconds respectively

"Running to available" is what's in the table, ranging from 1.23s to about 70s.
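If you want to reproduce roughly those phases without the article's C tool, a simplified boto3 sketch of the same idea looks something like this (the AMI ID and instance type are placeholders; it assumes the instance gets a public IP and a security group that allows port 22, and it omits error handling and the article's finer-grained port-closed/port-open distinction):

    import socket
    import time

    import boto3  # assumes AWS credentials and a region are already configured

    ec2 = boto3.client("ec2")

    t0 = time.time()
    resp = ec2.run_instances(ImageId="ami-xxxxxxxx",   # placeholder AMI
                             InstanceType="c5.xlarge",
                             MinCount=1, MaxCount=1)
    instance_id = resp["Instances"][0]["InstanceId"]
    t_api = time.time()        # RunInstances returned

    # Poll until DescribeInstances reports the instance as "running".
    while True:
        desc = ec2.describe_instances(InstanceIds=[instance_id])
        inst = desc["Reservations"][0]["Instances"][0]
        if inst["State"]["Name"] == "running":
            break
        time.sleep(0.2)
    t_running = time.time()

    # Poll until something answers on port 22.
    ip = inst["PublicIpAddress"]
    while True:
        try:
            socket.create_connection((ip, 22), timeout=1).close()
            break
        except OSError:
            time.sleep(0.2)
    t_ssh = time.time()

    print(f"RunInstances call:  {t_api - t0:.2f}s")
    print(f"pending -> running: {t_running - t_api:.2f}s")
    print(f"running -> port 22: {t_ssh - t_running:.2f}s")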


> use-cases for on-demand workloads

Yeah, true! Maybe, depending on availability, they could "spawn up" some amount of spare servers to get the median time even lower.


> Curious to see a follow up on what's going on with FreeBSD - seems like it takes ages to get the network up.

FreeBSD rc executes all rc.d scripts sequentially, in one thread. OpenRC AFAIK can start daemons in parallel, but unfortunately the switch to OpenRC was abandoned: https://reviews.freebsd.org/D18578


The benchmark that would interest me is the time from "launch GPU spot instance" to "I can actually use the GPU, at least by running nvidia-smi".

I find that this can take up to 10 minutes, and for the more expensive instances that can mean a non-negligible amount of money.

Great article BTW


Oh, that explains it! I thought the driver installation had somehow broken, despite my having made an AMI of the same instance type with the drivers already installed, so I just run the install shell script every time and then it works. I figured maybe some hardware address was wrong or it was subtly different hardware, but it being time-based makes a lot more sense.


Absolutely, same here. Additionally, there seems to be very little consistency in GPU instance start-up times: I've had 30 seconds one moment and 5 minutes another. Can't say I've experienced 10 minutes, luckily.


Yeah, consistently getting under a few seconds with spot would be nice, but I've never seen it. When I was handling autoscaling via the EC2 API + Python + nginx, the daemon I wrote pretty much had to have a while loop continuously checking SSH connectivity after a t3a.medium (with Ubuntu) was kicked off from `ec2.request_spot_instances`.

Perhaps I should have been using "Clear Linux 34640"
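Presumably something like this, give or take -- a rough boto3 sketch, not the parent's actual daemon, with placeholder launch-spec values -- followed by the same kind of port-22 polling as in the sketch further up the thread:

    import time

    import boto3

    ec2 = boto3.client("ec2")

    req = ec2.request_spot_instances(
        InstanceCount=1,
        LaunchSpecification={              # placeholder values
            "ImageId": "ami-xxxxxxxx",
            "InstanceType": "t3a.medium",
        },
    )
    request_id = req["SpotInstanceRequests"][0]["SpotInstanceRequestId"]

    # Wait until the spot request is fulfilled and an instance ID appears;
    # after that, poll port 22 until sshd answers, as in the earlier sketch.
    while True:
        desc = ec2.describe_spot_instance_requests(
            SpotInstanceRequestIds=[request_id])
        sir = desc["SpotInstanceRequests"][0]
        if sir["Status"]["Code"] == "fulfilled":
            instance_id = sir["InstanceId"]
            break
        time.sleep(2)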


Probably depends on your spot price. We always set the maximum spot price at the on-demand price, but due to how priorities work it may still sometimes fail when the data center in a given region is out of capacity for a given instance type.

TBH, a bigger issue for AWS than boot time is the lack of physical resources to fulfill demand - this is what most big players are struggling with.


No no, the spot price matches, the instance boots, and you can SSH to it and do everything except use the GPU. E.g. you try to run your PyTorch NN training, but it freezes for 5-10 minutes, then it runs fine. If you start your training again, it runs immediately.


I wonder why there isn't a well known firm running these kinds of benchmarks and "true" uptime reporting for the major clouds -- knowing perf stats for S3 or SQS or Kinesis etc in various geos and AZs seems like it would be useful.

It's not a big edge, but knowing about downtime before others could certainly be an edge


Gartner does exactly this - including boot times which are calculated using similar metrics (but over longer time periods and with more runs).

Not sure if the data is freely available, but I used to be part of the team at AWS that calculated these. See [0].

[0] https://www.gartner.com/en/cloud-decisions/benchmark-library


Wow, up until now I only associated Gartner with the Magic Quadrant; I had no idea they had this.

It doesn't seem like they'll give you up-to-the-hour information, and you do have to pay for "Gartner for Professionals", which I'm not familiar with, but it's definitely out there.

The key to this has to be supporting it cross-cloud, IMO -- people have the AWS vs GCP vs Azure conversation so often, and I think most people believe GCP offers the best performance for the buck, but the reasons to pick the other two are numerous. It might be nice to have some numbers on just how much performance differs between them, since that's one of the more quantifiable things to weigh.


Do you know what takes the majority of those 8 seconds on AWS?


I've been doing a lot of work optimizing EC2 boot time lately. I can get an instance up and connecting outbound to a network port in a few hundred milliseconds, with plenty of room left to optimize there. But all instances do indeed seem to take at least 8 seconds to come up on the AWS end, as this article points out. This has been why I haven't put much effort into optimizing further, because that 8 seconds is entirely out of my hands.

I've tried changing the various parameters under my control: assigning a specific network address rather than making AWS pick one from the VPC, disabling metadata, using a launch template, not using a launch template, using the smallest possible AMI (just a kernel and static binary), using Fast Snapshot Restore on the snapshot in the AMI, etc. None of those makes the slightest difference in instance start time; still ~8 seconds.


I would be interested to know the 95th percentiles as well as the median (calculated over more than 10 runs, I suppose). It's often the worst-case times that are important, rather than the average or median.


This. I'm surprised that even people who are very well-versed in math omit statistics other than an average or, at best, a median. Is it really just me who wonders about, and in this case really expects, variability in the value? Especially given the huge differences, you might have had a few slow or a few lucky starts that shifted the middle value. Probably they saw the raw data and didn't notice such huge spikes and would have changed the method if they had, but it's not even mentioned so this is pure speculation on my part.
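For what it's worth, once the raw boot times are collected, those extra statistics are a few lines with Python's standard library (the numbers below are made up purely to show the shape):

    import statistics

    # Hypothetical boot times in seconds for one AMI -- not real measurements.
    boot_times = [7.9, 8.1, 8.0, 8.3, 9.7, 8.2, 8.0, 14.5, 8.1, 8.4]

    cuts = statistics.quantiles(boot_times, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    print("median: ", statistics.median(boot_times))
    print("p80:    ", cuts[15])
    print("p95:    ", cuts[18])
    print("min/max:", min(boot_times), max(boot_times))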


What's Clear Linux's secret to booting so fast compared to other distributions?


Here is a presentation from 2019 showing how they reduced Kernel boot time from 3s to 300ms: https://www.linuxplumbersconf.org/event/4/contributions/281/...


That's for the embedded automotive version of Linux they created. The AMI isn't quite the same, but it has a kernel specifically patched and optimized to run under a KVM hypervisor, and it uses systemd-boot, which is a bit faster than GRUB, the bootloader these other distros are likely using.

Figuring out what else they did would probably require booting into Clear Linux myself and checking it out, but disabling services you don't need so systemd isn't wasting time, baking the kernel modules you know you'll need directly into the kernel instead of loading them at boot, and skipping the initramfs if you don't need it is probably the "general" answer for how to get a faster boot.

If you really want ultra-fast, you can turn Linux into something like a unikernel by compiling the kernel with your application as init, running no other services at all, and booting via the EFISTUB instead of using a bootloader at all. I don't know if the AWS EC2 service provides any way of doing that, though. On your own machine, you can use efibootmgr from a different booted system on the same firmware, set the boot order in the firmware setup, or name the EFISTUB binary whatever that magic path is that EFI automatically loads if a boot order isn't configured in NVRAM.


They restructured the boot sequence and probably stripped everything that didn't fit a server/cloud context, and perhaps also removed everything that wasn't necessary from the base. Not unlike OpenBSD.


It would be interesting to know how fast Bottlerocket [0], Amazon's optimized OS for running containers, is compared to Clear Linux.

[0] https://github.com/bottlerocket-os/bottlerocket


Afaik Clear Linux is the one using Intel's proprietary compiler optimized for Intel CPUs (so it's theoretically generating more efficient code)


“Better compiler options” don’t translate to a 10x lower “time to accessible”.



It would also be interesting to compare regions: us-east-1 is by far the largest, so I'd like to see what, say, us-west-1 or us-east-2 look like.


Nice to see this.

Median is an odd metric to use. I think I might truly be more interested in the mean; maybe the geometric mean.

I am also very interested in worst-cases with this sort of thing, so the 80th and 95th and 100th percentiles would be helpful.

It would be interesting to see plots of distributions, as well: are boot times really unimodal? They seem like they could easily have multiple modes, which could really mess with all of these measures (but especially the median!).

Because the script is open source (thank you!) I may try this myself.


I used the median mainly because it was robust against any network glitches which occurred during testing.

That said, I did my final measurements from inside EC2 (rather than from my laptop over wifi) so it probably wasn't necessary.


Network glitches will have a one-sided effect: they will make you slower, but never faster. Trimming out the top 25% slowest measurements, or even just using the minimum, can be more effective. Those are common when profiling programs for exactly that reason - the median execution time for a loop might be influenced by OS weirdness like some background job, but the minimum will hopefully really just measure the loop.

Anyway - probably not a big deal, probably mostly academic, although modal behavior would be really interesting.


Yes and no. If there's a network glitch while we're polling to see if the instance is "running", it will result in the "pending-to-running" time being high but the "running-to-port-closed" time being low.


https://github.com/cperciva/ec2-boot-bench/blob/master/ec2-b...

C is an interesting choice of language for this task. They even implemented a rustic XML parser in there.


I wonder if an EFI boot with something like EFISTUB would be even faster. AFAIK EC2 does a BIOS boot, which has some fixed overhead (although Clear Linux still manages to boot very quickly).

Also, on boot I /think/ EC2 pulls the AMI down to EBS from S3, so theoretically smaller AMIs might be faster, but I'm not sure.


My experience with desktop hardware is that a traditional BIOS boots faster than EFI, because the former simply does much less. (This is not the difference between the "BIOS boot" and "EFI boot" options in EFI firmware, but rather a "pure native" BIOS vs a "pure native" EFI on the same motherboard.)


Was that with or without a bootloader? My understanding was that EFI was marginally faster to initialize (which might not be true, based on what you're saying), but that you could also skip the bootloader and load the kernel directly (EFISTUB), which would also eliminate the initramfs (which seems fine assuming you're using vanilla VMs on popular hypervisors/providers).


I've tested both EFI and BIOS boots on EC2, and they have about the same overhead.

And I've tested AMIs that are no bigger than a Linux kernel and a single static binary; times are still on par with what's listed here. Still takes 7-8 seconds before the first user-controlled instruction gets to run.


Gotcha, thanks for the info

That's a bit disappointing but I guess AWS has spent a lot of time optimizing Lambda vs EC2


> EC2 pulls down the AMI to EBS from S3 so theoretically smaller AMIs might be faster

I managed to create a 1GB AMI of Debian. It will be interesting to test this.

https://blog.miyuru.lk/aws-root-volume/


I clocked a custom Ubuntu-based AMI at 2.85 seconds for the median of 10 runs.

https://gist.github.com/hashbrowncipher/17a92c6afb9642503876...


I'd be interested in the time spent running user-data modifications, so that benchmarking bootstrapping scripts would be possible.


Isn't that strictly dependent on completely different factors, like network latency and the kind of bootstrap scripts you want to run?

Or are you talking about cloud-init, which is run by AWS themselves before running user-data scripts?


Does anybody know if something like this exists for Azure as well?


PerfKitBenchmarker has a boot timer like this (time to SSH) that works for all major cloud providers.

https://github.com/GoogleCloudPlatform/PerfKitBenchmarker



