*"Firecracker's RAM footprint starts low, but once a workload inside allocates R...

paulv · on July 10, 2023

The first footnote says: "If you squint hard enough, you'll find that Firecracker does support dynamic memory management with a technique called ballooning. However, in practice, it's not usable. To reclaim memory, you need to make sure that the guest OS isn't using it, which, for a general-purpose workload, is nearly impossible."

dathinab · on July 10, 2023

> is nearly impossible

for many mostly "general purpose" use cases it's quite viable, or else ~fly.io~ AWS Fargate wouldn't be able to use it

this doesn't mean it's easy to implement the necessary automated tooling etc.

so depending on your dev resources and priorities it might be a bad choice

still I feel the article was quite subtly judgemental, moving some parts that are quite relevant to its content into a footnote and also omitting that this "supposedly unusable tool" is used successfully by various other companies...

as if it was written by an engineer being overly defensive about their decision due to having to defend it for the 100th time because shareholders, customers, higher level management just wouldn't shut up about "but that uses Firecracker"

dspillett · on July 10, 2023

> which, for a general-purpose workload, is nearly impossible

That depends on the workload and the maximum memory allocated to the guest OS.

A lot of workloads rely on the OS cache/buffers to manage IO so unless RAM is quite restricted you can call in to release that pretty easily prior to having the balloon driver do its thing. In fact I'd not be surprised to be told the balloon process does this automatically itself.
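As a rough illustration of that "release the cache, then inflate the balloon" sequence, here is a minimal sketch. It assumes a Linux guest (where writing `1` to `/proc/sys/vm/drop_caches` drops the page cache) and Firecracker's `PATCH /balloon` API call with an `amount_mib` field. The helper names, the 256 MiB reserve, and the idea of sizing the balloon from `MemAvailable` are all hypothetical choices for illustration, not something described in the article or known to be how any particular deployment does it. Actually sending the request over the Firecracker API socket, and getting the guest's numbers to the host, are left out.

```python
import json


def drop_page_cache(path="/proc/sys/vm/drop_caches"):
    """Guest side: ask the kernel to drop clean page cache (needs root)."""
    with open(path, "w") as f:
        f.write("1\n")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: KiB} dict."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def balloon_target_mib(meminfo, reserve_mib=256):
    """Hypothetical policy: reclaim MemAvailable minus a safety reserve."""
    available_mib = meminfo.get("MemAvailable", 0) // 1024
    return max(0, available_mib - reserve_mib)


def balloon_patch_request(amount_mib):
    """Build the raw HTTP request that would resize the balloon."""
    body = json.dumps({"amount_mib": amount_mib})
    return (
        "PATCH /balloon HTTP/1.1\r\n"
        "Host: localhost\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )
```

The reserve is there because inflating the balloon right up to the amount the guest reports as available leaves no headroom if the workload grows before the balloon is deflated again.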

If the workload does its own IO management and memory allocation (something like SQL Server which will eat what RAM it can and does its own IO caching) or the VM's memory allocation is too small for OS caching to be a significant use after the rest of the workload (you might pare memory down to the bare minimum like this for a “fairly static content” server that doesn't see much variation in memory needs and can be allowed to swap a little if things grow temporarily), then I'd believe it is more difficult. That is hardly the use case for Firecracker though, so if that is the sort of workload being run perhaps reassessing the tool used for the job was the right call.

Having said that my use of VMs is generally such that I can give them a good static amount of RAM for their needs and don't need to worry about dynamic allocation, so I'm far from a subject expert here.

And, isn't Firecracker more geared towards short-lived VMs, quick to spin up, do a job, spin down immediately (or after only a short idle timeout if the VM might answer another request if one comes in immediately or is already queued), so you are better off cycling VMs, which is probably happening anyway, than messing around with memory balloons? Again, I'm not talking from a position of personal experience here so corrections/details welcome!

tedunangst · on July 10, 2023

I'm struggling to understand how qemu with free page reporting isn't exactly the same as a firecracker balloon.

adql · on July 10, 2023

Yeah it's a pretty hard problem as you'd need to defragment physical memory (while fixing all the virtual-to-physical mappings) to make a contiguous block to free

flaminHotSpeedo · on July 11, 2023

A bit disingenuous to make a broad sweeping claim, then have a footnote which contradicts that claim, and upon closer inspection even that claim is incorrect.

It's absolutely usable in practice, it just makes oversubscription more challenging.

0xbadcafebee · on July 10, 2023

That and the fact that this was after "several weeks of testing" tells me this team doesn't have much virtualization experience. Firecracker is designed to quickly virtualize 1 headless stateless app (like a container), not run hundreds of different programs in a developer environment.

CompuIves · on July 10, 2023

Yes, we use this at CodeSandbox for reclaiming memory to the host (and to reduce snapshot size when we hibernate the VM).