
Hi Seth,

What about clusters that are used for lumpy workloads, like data science pipelines? For example, our org has a few dozen clusters being used that way.

Each pipeline gets its own cluster instance as a way to enforce rough-and-ready isolation. Most of the time the clusters sit unused. To keep them alive we keep a small, cheap, preemptible node running on the idle cluster. When a new batch of data comes in, we fire up Kubernetes Jobs, which in turn trigger GKE autoscaling to process the workload.
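
Roughly sketched, each batch is a Job whose resource requests can't fit on the keepalive node, which is what forces the autoscaler to add capacity (the name, image, and resource figures below are illustrative, not our actual config):

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pipeline-batch            # hypothetical name
    spec:
      parallelism: 8                  # fan the batch out across nodes
      completions: 8
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: worker
            image: gcr.io/example-project/pipeline-worker:latest  # placeholder image
            resources:
              requests:
                cpu: "4"              # too big for the small keepalive node,
                memory: 16Gi          # so the cluster autoscaler scales the pool up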

This pricing change means we're looking at thousands of dollars more in billing per month, without any tangible improvement in service: at $0.10 per hour the fee works out to roughly $73 per cluster per month, which adds up fast across a few dozen clusters. (The keepalive node hack only costs $5 a month per cluster.) We could consolidate the segmented cluster instances into a single cluster with separate namespaces, but that would also cost thousands in valuable developer time.

I don't know how common our use pattern is, but I think we would be a lot better served by a discounted management fee when the cluster is just being kept alive and not actually using any resources. At $0.01, maybe even $0.02, per hour we could justify it. But paying $0.10 to keep empty clusters alive is just egregious.



Those empty clusters that you get for free cost Google money. Perhaps they never should have been free, because that skewed incentives toward usage patterns like this.


Unfortunately, even if they switch to dynamically started clusters, the latency of spinning up a new cluster is much higher than the latency of adding a bunch of preemptible nodes to an existing node pool :/


Google are (were) not the only ones offering this free control plane model, though. My DigitalOcean managed Kubernetes (DOKS) clusters tend toward instability if they are used with node pools that are too small. (I don't know why that is, but it seems like a good way to make sure I pay attention to the workloads and also spend at least $20/mo for each cluster I run with them.)

It will be interesting in any case to see if DigitalOcean and Azure are going to follow suit! I'd be very surprised if they do (but I've also been wrong before, recently too).


The term is "loss leader." GKE provides the master node and cluster management so that we don't have to, and in exchange you sell more compute, storage, network, and app services. This is some ex-Oracle, "what can we do to meet growth objectives," "how can we tax the people we own" thinking. They're customers, not assets, Tim. Your cloud portability play should be the last project to jerk them around on.


Keep in mind that cluster management was a paid feature in the original GKE. GCP only stopped billing for cluster management when EKS released free cluster management.


When did EKS release free cluster management?


On GKE, you can use a single cluster with multiple node pools to achieve a similar effect. Just set the right affinity on your job resources.
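
For example, assuming a dedicated node pool named pipeline-a that was created with a matching taint (the pool name and taint key here are only illustrative), the Job's pod template can target that pool via the node pool label GKE applies to every node:

    spec:
      template:
        spec:
          nodeSelector:
            cloud.google.com/gke-nodepool: pipeline-a  # label GKE puts on each node in the pool
          tolerations:
          - key: dedicated                             # matches a taint set when creating the pool
            operator: Equal
            value: pipeline-a
            effect: NoSchedule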


PTAL at doing Multi-Tenancy in GKE!

https://cloud.google.com/kubernetes-engine/docs/best-practic...

We don't recommend using node pools for isolation.
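
Roughly, the namespace-based approach described there pairs a namespace per tenant with a ResourceQuota; a minimal sketch (the tenant name and limits below are made up):

    apiVersion: v1
    kind: Namespace
    metadata:
      name: team-a                 # hypothetical tenant namespace
    ---
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: team-a-quota
      namespace: team-a
    spec:
      hard:
        requests.cpu: "32"         # illustrative per-tenant caps
        requests.memory: 128Gi
        pods: "100"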


If it's only for workload isolation, why not?


For security isolation, we learned node pools are not sufficient. They're good for resource isolation, though.

PTAL at https://www.youtube.com/watch?v=6rMGRvcjvKc


That guide looks nice. Have you guys thought about releasing a Terraform module, or even a Cloud Composer workflow, that sets that up in a project?


Thanks! We actually did, and shipped it together with the best practices.

https://github.com/GoogleCloudPlatform/gke-enterprise-mt

Please give us feedback there if you hit any issues!


Yes, this is the general approach. However, it unfortunately has security implications, as you are putting multi-tenant workloads on a pool with access back to a shared control plane. Dealing with customer-uploaded code is a nightmare.


Here you go:

https://github.com/rcarmo/azure-k3s-cluster (this is an Azure template that I use precisely for testing that kind of workload - spinning up one of these, master included, takes a couple of minutes at most).

(full disclosure: I work at Microsoft - Azure Kubernetes Service works fine, but I built the above because I wanted full control over scaling and a very very simple shared filesystem)


Likely this model is precisely why they are introducing this fee.

I guess they realized they couldn't make cluster management multi-tenant.



