I don't use GHA as some of our code is stored in Perforce, but we've faced the same challenges with EC2 instance startup times on our self-managed runners on a different provider.
We would happily pay someone like depot for "here's the AMI I want to run & autoscale, can you please do it faster than AWS?"
We hit this problem with containers too - we'd _love_ to just run all our CI on something like Fargate and have it automatically scale and respond to our demand, but the response times and rate limiting are just _so slow_ that instead we end up starting/stopping instances with a lambda, which feels so 2014.
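For the curious, the lambda in question isn't anything fancy - roughly this shape, though the `role: ci-runner` tag and the `pending_jobs` event field here are made up for illustration:

```python
# Hypothetical sketch of the "start/stop instances from a lambda" approach.
# The 'role: ci-runner' tag and the 'pending_jobs' event field are made up.
import boto3

ec2 = boto3.client("ec2")

RUNNER_FILTER = {"Name": "tag:role", "Values": ["ci-runner"]}

def handler(event, context):
    pending = event.get("pending_jobs", 0)  # assumed to come from a queue poller

    # Find pre-provisioned runner instances that are currently stopped.
    resp = ec2.describe_instances(
        Filters=[RUNNER_FILTER, {"Name": "instance-state-name", "Values": ["stopped"]}]
    )
    stopped = [
        inst["InstanceId"]
        for res in resp["Reservations"]
        for inst in res["Instances"]
    ]

    # Wake up just enough instances to cover the queued jobs.
    if pending and stopped:
        ec2.start_instances(InstanceIds=stopped[:pending])
```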
> We would happily pay someone like depot for "here's the AMI I want to run & autoscale, can you please do it faster than AWS?"
Change that to "here's the ISO/IMG I want to run & autoscale, can you please do it faster than AWS?" and you'll have tons of options. Most platforms using Firecracker would most likely be faster, maybe try to use that as a search vector.
Can you maybe share some examples? We're fine to use other image formats, but a lot of the value of AWS is that the services interact, IAM works nicely together, etc.
Fly.io comes up often [0] on HN, but there's an overwhelming amount of "it's a nice idea, but it just doesn't work" feedback on it.
Depot also does remote Docker builds using a remote BuildKit agent. It was actually their original product. If you could feasibly put everything into a Dockerfile, including running your tests, then you could use that product and get the benefits.
I actually didn't know this. We've had some teething issues _building_ in docker, but we actually run our services in containers. I'm sure a few hours of banging my head against a wall would be worth it here.
> including running your tests,
"thankfully", we use maven which means that our tests are part of the build lifecycle. It's a bit annoying because our CI provider has some neat parallelism stuff that we could lean on if we could separate out the test phase from the build phase. We use docker-compose inside our builders for dev dependencies (we run our tests against a real database running in docker) but I think they should be our only major issues here.
I haven't fully investigated Fargate's limitations, but I think it would be possible to run any k8s-native CI on EKS + Fargate, maybe even use KubeVirt for VM creation? From my exploration of Fargate with EKS, AWS provisioned capacity in around 1s in my region.
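If anyone wants to poke at this, the wiring is mostly a Fargate profile on the cluster. A minimal boto3 sketch, where the cluster name, role ARN, and namespace are all placeholders:

```python
# Hedged sketch: a Fargate profile that lets pods in a 'ci' namespace schedule
# onto Fargate. Cluster name, role ARN, and namespace are all placeholders.
import boto3

eks = boto3.client("eks")

eks.create_fargate_profile(
    fargateProfileName="ci-jobs",
    clusterName="build-cluster",
    podExecutionRoleArn="arn:aws:iam::123456789012:role/ci-pod-exec",
    selectors=[{"namespace": "ci"}],
)
```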
> AWS offers something very similar to this approach called warm pools for EC2 Auto Scaling. This allows you to define a certain number of EC2 instances inside an autoscaling group that are booted once, perform initialization, then shut down, and the autoscaling group will pull from this pool of compute first when scaling up.
> While this sounds like it would serve our needs, autoscaling groups are very slow to react to incoming requests to scale up. From experimentation, it appears that autoscaling groups may have a slow poll loop that checks if new instances are needed, so the delay between requesting a scale up and the instance starting can exceed 60 seconds. For us, this negates the benefit of the warm pool.
I pulled this from the article, but it's the same problem. Technically yes, EKS + Fargate works. In practice, the response time from "thing added to queue" to "node is responding" is minutes with that setup.
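For reference, the warm pool itself is only a couple of API calls - a minimal boto3 sketch, with "ci-runners" as a placeholder ASG name:

```python
# Minimal sketch of the warm pool setup the article describes, via boto3.
# "ci-runners" is a placeholder ASG name.
import boto3

autoscaling = boto3.client("autoscaling")

# Keep ten initialized-but-stopped instances on standby for the group.
autoscaling.put_warm_pool(
    AutoScalingGroupName="ci-runners",
    MinSize=10,
    PoolState="Stopped",
)

# Scaling up still goes through the ASG's own control loop, which is where
# the 60s+ delay the article mentions comes from.
autoscaling.set_desired_capacity(
    AutoScalingGroupName="ci-runners",
    DesiredCapacity=5,
    HonorCooldown=False,
)
```

The setup isn't the hard part - the slow piece is the ASG's own poll loop deciding when to pull from the pool.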
My theory is that they keep nodes booted up and ready, and when kube-scheduler can't assign a node, AWS just adds one of these ready instances to your VPC and asks it to join your cluster.
From the user's perspective, it looks like you always have available capacity in your cluster.
Our game code is in P4, but our backend services are on GH. Having a single CI system means we get easy interop, e.g. game updates can trigger backend pipelines and vice versa.
In the past I've used TeamCity, Jenkins, and ElectricCommander(!)