Databases on Kubernetes are fundamentally the same as a database on a VM (twitter.com/kelseyhightower)
110 points by kretaceous on Feb 11, 2023 | 106 comments



In theory there's no difference between theory and practice, but in practice there is.

With a stable operator it's theoretically similar, but it is not the same, by any stretch, if only because I guarantee your incident rate will be higher when performing changes to the database, all other factors being similar. You are now also exposed to a bunch of other failure modes: the shenanigans of k8s networking, a missing daemonset, someone tweaking istio and breaking everything...

The argument of "don't run it on k8s" isn't because you can't run it physically. It's because all those other things are going to increase your failure rate, and those failures will be way more complex to solve and so your time to recovery will also go up.

You'll have to deal with this complexity already for the things that Kubernetes is going to give you actual benefits on; doing databases should be at the bottom of that list. Use your complexity budget wisely. If you're already at the bottom of the list, you look at your team, and you go "we have extra capacity and are very comfortable with the complexity of our current incidents", then go ahead - it's likely the right time for you, if you like adventure.


If you run everything else on k8s, having just one way to manage things makes life less complex.

Running an app with state (db, storage, etc.) like this is not that hard. We ran all of the low-level storage on Borg at Google when I was there. It worked well.


Let's say a new Kubernetes version comes out in April. In November, as everything works perfectly well, you decide to install a Postgres operator on it. Bummer, it doesn't work. It's not a huge issue, you just wait until the bug is resolved (already done[0]), but it's just one of these tiny things that I don't get when running Postgres natively. And I'm saying this as a big fan of Crunchy Data, running some production loads on it without a failure for quite some time now.

[0] https://github.com/CrunchyData/postgres-operator/issues/3476


And that repo you linked to has 1846 issues, 161 open. Which doesn't seem extraordinary based on my limited exposure to k8s.

Another example: https://github.com/zalando/postgres-operator/issues with 445 open issues. Why?

Maybe I'm wrong and this is all a good sign of progress, but my impression is that the entire k8s ecosystem is held together with reused duct tape.


Look at the feature list of that repo. Half of it is actually deficiencies in postgres that warrant the external complexity, not something induced by kubernetes.

Sure, a lot of it is nice-to-haves, but some are absolutely essential for even a basic postgres deployment even if you used VMs... Like, why is replication so difficult, why do I need all these extensions, and why do I need to deploy a bouncer as a separate service in 2023?!


You're probably right and I'm probably being unfair.

My opinion is colored by seeing HA postgres deployed on k8s for no good reason, then having to deal with the consequences.

It just seems like once you're in the k8s world, the likelihood that your system will be massively over-engineered goes way up. This isn't the fault of the operators, or k8s itself, it's more of a cultural problem I think. And resume driven development is real.


Wait, borg has several special features and considerations for D (the service that actually holds 99% of the state). It's not exactly as seamless as you make it out to be.


Mostly just a higher priority


Practice that beats theory is just a new theory, so practice can't beat theory, it can only beat out of date or simplistic theories.

I understand what the saying means, but it gives people so many bad ideas about the value of theories and science or how to work with them. In practice you start out based on general and overly simplistic theories, and then expand the theories in the direction you need to solve your problems, and then that creates a new theory you base your practice around.

Wandering around willy-nilly without testing isn't good practice, while going deliberately with lots of testing and logic is what we mean when we talk about theory.


Emerging practice is the ground theory hasn't covered yet.

You are correct, though, in that if you want to make these practices more useful and grow your overarching theory, they absolutely must be approached with lots of testing and logic. The trick is knowing where it's most important to do this & when.

Many areas of practice are left untouched by theory because there is no obvious compelling benefit.


There can still be a difference between practise and a(n insufficient) theory without there being a new theory superseding the old one. A lot of historical developments take this shape: we discover something not predicted by the theory we thought would explain it, work on it for a while, and only later do we find the new theory that also explains the novel thing.

For practise to be condensed into theory requires predictive power, and not all differences between practise and a theory are predictable -- yet.

That said, I agree with your overall point. If one finds a difference between practise and theory, it's because one tried to use the wrong theory for the situation at hand.


> Use your complexity budget wisely

Well said


I was deathly afraid to run databases in k8s in 2019. But I've been running CrunchyData's postgres-operator (no affiliation) since 2020 and I can't say anything bad about it, it just works.

It's basically just a bunch of already well thought out open source tools like patroni, pgbackrest and postgresql combined into an operator. So it's not really magic.

And the backup to s3 feature in pgbackrest is so solid that I've used the clone cluster and restore cluster feature a few times, just because it's so convenient.

But this is in environments that serve only thousands of users, not at all comparable to huge startups, or environments where ms latency is important.


Know that you can't use crunchydata pgo in prod without a subscription as per their licensing terms for the docker containers


Could you link to something that sets this out? From what I am seeing (just reading in response to your comment, I don’t use pgo myself), both the container and pgo repos have an apache license.



Terms of service != licensing terms. The Docker repo terms of use seem to talk about what it's "intended for" and are about support, suitability, security etc., covering them against liability. Doesn't read like a license:

> By participating in the Program and accepting these terms, you represent that you understand that (i) absent an active Crunchy Data Support Subscription, the Crunchy Developer Software is unsupported and (ii) Crunchy Developer Software is intended for development purposes only, (iii) Crunchy Developer Software may not address known security vulnerabilities, and (iv) Crunchy Data is relying on your representation as a condition of our providing you access to the Crunchy Developer Software.

Whereas their GitHub repo has an actual license (Apache) and says stuff like

> Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.


The operator is open source, but the operator deploys images that need to be licensed for production environments.

Contact Crunchy's email with your specific use case and ask if it's within their free license if you don't believe this. Expect a gigantic bill, however, as you're definitely in violation. (The bill will be their normal licensing fee, not because of the previous violation)


If you are worried about being unjustly bullied by their lawyers, I guess you can also set up a builder for images yourself - according to the Docker Hub page they're built from the GitHub repo. Could be a good exercise to do anyway, to ensure you know what's in the images and to not depend on random blobs off Docker Hub.
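Roughly something like this, if you go that route (a sketch only - the Dockerfile path and registry name are my own guesses, the repo's Makefile/build docs have the real targets):

    # clone the operator source and build your own image instead of pulling theirs
    git clone https://github.com/CrunchyData/postgres-operator.git
    cd postgres-operator
    # Dockerfile location is an assumption; adjust to whatever the repo actually uses
    docker build -t registry.example.com/postgres-operator:self-built .
    docker push registry.example.com/postgres-operator:self-built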


Well, they should sue everyone commenting on this page then.

You are probably confusing PGO with their commercial certified offering.


It appears there are two operators by Crunchy Data. The PGO (FOSS one) and the commercial one. See this FAQ [1]. The PGO is Apache 2 licensed.

1: https://access.crunchydata.com/documentation/postgres-operat...


It’s not the only PostgreSQL operator for Kubernetes. There’s Zalando, Stackgres and others.


Can you tell me how the storage there works when using their replica feature?

Can I just provide it storage from a k8s-host and it replicates it to the hosts where the other replicas live via software or do I need to provide it a StorageClass that supports distributed volumes?

This is the one thing I cannot wrap my head around, and apparently there isn't that much information (read: benchmarks) on the implications of using different kinds of storage for databases on k8s (or at least I can't find anything).


In the cloud, volumes are just disks. They are attached to the instance where your pod gets scheduled. In GKE you can choose the disk type (IOPS).

If configured as a statefulset, you are guaranteed to always match disk to pod instance ordinal (postgres-0 gets disk 0, postgres-1 gets disk 1).

So basically a way to run multiple postgres instances.

But the devil is in the details.
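A minimal sketch of that pattern (names, sizes and the password are made up, and there is no replication here - that's exactly what the operators layer on top). The point is just how the ordinal-to-disk mapping falls out of a StatefulSet's volumeClaimTemplates:

    kubectl apply -f - <<'EOF'
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: postgres
    spec:
      serviceName: postgres
      replicas: 2
      selector:
        matchLabels: {app: postgres}
      template:
        metadata:
          labels: {app: postgres}
        spec:
          containers:
          - name: postgres
            image: postgres:15
            env:
            - name: POSTGRES_PASSWORD
              value: changeme                          # demo only
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata   # keep data out of the volume root
            volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumeClaimTemplates:        # postgres-0 gets PVC data-postgres-0, postgres-1 gets data-postgres-1, ...
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
    EOF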


I know that, I was asking more about the specifics of the CrunchyData postgres operator since they have replication built in. If it was just them using the volumes and doing the replication at the application level (not using k8s storage management, I don't know how to better describe it) it'd be easy. If they expect me however to provide a StorageClass that provides replication on top of the volumes, it'd be more annoying.

I could try this out myself to be honest, but I didn't quite get to it until now


Pg manages replication. If you use something like Longhorn to replicate the volume you're just wasting storage - run pg replicas and stream WAL to object storage with wal-g, which the operator probably does (I use StackGres, and it does).
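For reference, the moving parts are pretty small even if you wired it up by hand (a sketch; the bucket name is made up):

    # ship WAL and base backups to object storage with wal-g
    export WALG_S3_PREFIX=s3://my-pg-backups/main
    # in postgresql.conf:
    #   archive_mode = on
    #   archive_command = 'wal-g wal-push %p'
    #   restore_command = 'wal-g wal-fetch %f %p'   # used when a replica/restore pulls WAL back
    wal-g backup-push "$PGDATA"    # periodic base backup, e.g. from cron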


I think you misunderstand, crunchydata only deploys postgresql and provisions streaming replication between postgres nodes using patroni, which provisions the postgresql.conf necessary when you choose more than one replica in the PostgresCluster object definition.


I’m curious, have you had the chance of upgrading the postgresql version using that operator? If yes, how’s your experience?


Hmm good question. I have two sets of clusters and both are on 13.9 so I'm honestly not sure if I ever did a postgres version upgrade.

Without checking the docs I would imagine you could do a little hack where you restore from pgbackrest into a clean postgres 14 cluster.


We run our databases (ScyllaDB, Elasticsearch) on our workload orchestrator (Nomad), just like any other job. The only difference is that the servers running the database workloads are not available for scheduling any other jobs, that way the database will be the only thing running on them. I can't overstate how much easier this has made database operations for us. We get to keep all of our config in our Nomad repo, use the same mechanism for secrets management, service discovery, the fleet-wide logging and auditing sidecars, etc. Plus Nomad allows rolling updates with automatic revert which makes upgrading 10+ nodes of Elasticsearch one-by-one with healthchecks a total breeze. Since we've had this setup we are able to trigger database release updates with a one-line change and a single command.


Those databases are distributed, cluster-native databases.

They have failover, cluster elections, node auto joining mechanisms.

Vanilla postgres has none of that.


True, that makes running these databases a bit easier since there are multiple nodes that can be restarted in any order. However, the main point still stands: you could just as well run Postgres on Nomad/k8s by pinning it to a specific node and excluding that node from the general job scheduling pool. We do that for a single-node Redis instance, for example. Still vastly superior to managing unique snowflake nodes just for the DBs via ansible/systemd.


> secrets management, service discovery, the fleet-wide logging and auditing sidecars

Interesting, I have run my hobby projects on Nomad happily and am looking for ways to run serious workloads. Would you like to share more wisdom? How do you accomplish the things above? Thanks.


Not OP, but the last couple of releases of Nomad have added quite a few QoL features without having to reach for or set up Consul or Vault, depending on your needs. At least in regard to basic service discovery and secrets management.

I'm unsure if something like open policy agent can directly work with the orchestrator and may have to be at the application level.

https://www.hashicorp.com/blog/nomad-service-discovery

https://www.hashicorp.com/blog/nomad-1-4-adds-nomad-variable...


Consul + Vault!


I think we're conflating Kubernetes with containers here.

I definitely agree that databases don't benefit from running under containers - they are pet-like (and so don't benefit from fast spin up or massive horizontal scaling) and tend to require host-level tuning (which breaks the container abstraction).

What Kubernetes brings is a well-principled orchestration framework that can easily be extended with custom operators for workloads such as databases that need it. (In fairness, he does refer to this in passing at the end.)


Everything is easy if you don't have to handle state.

K8s was always really crap for persistent storage. On AWS you can always dump to EFS or the managed Lustre (whatever that's called), which is better than attaching block storage.


Please don't use EFS for anything requiring speed, especially writes. It's extremely slow in my experience.


One of my peeves with fargate is it only supports volumes on EFS.


yeah, metadata performance sucks arse, but it serves a purpose.


Unnecessarily harsh rant: senior staff/principal engineers who haven't operated (or been on-call for) anything important come to evangelize.

I suppose this is mainly a thought for projects like neondb/cockroachdb/stackgres (who I haven't heard of but was linked in the thread). It might be reasonable if you need incredibly many db instances, but for the general business who needs "a couple" of database instances, I can't imagine that putting Kubernetes on top would ever serve you better. I'm staying as far away as I can.


> I suppose this is mainly a thought for projects like neondb/cockroachdb/stackgres

StackGres is "just" a platform for running Postgres on Kubernetes. It helps you deploy and manage HA, connection pooling, monitoring, automated backups, upgrades and many other things. Whether you have a tiny Postgres instance or hundreds of beefy clusters with many instances is up to you. It's not a distributed database (like the other ones mentioned), it is still "vanilla" Postgres.

Disclosure: Founder of OnGres (company behind StackGres)


You won't see databases on k8s in enterprise production environments. Startups or companies/services with lower reliability requirements, sure. But don't expect to walk into a fortune 500 and standup a postgres operator in production expecting to replace the existing federated solution.


> You won't see databases on k8s in enterprise production environments. Startups or companies/services with lower reliability requirements, sure. But don't expect to walk into a fortune 500 and standup a postgres operator in production expecting to replace the existing federated solution.

Blanket statements like that should be taken with a grain of salt.

F500's are not one thing. You don't have to scratch deeply to find teams running production DB's on k8s (ignoring or accepting the trade-offs, of which there are many including working with vendors and existing DBA's and their solutions) and you'll find DBA's evaluating the same and other trade-offs for themselves.

I personally think that running DB's on multi-tenant k8s with nodes that weren't specifically allocated for it is strapping in for a bad ride.


I don't mean to nit, but we are saying the same thing..


Ideally you should not be seeing k8s at all in mission critical infrastructure in any tech company. I know a few FAANGs that stay away from it.


I disagree with that. Many fortune 500s are running k8s to power critical infra. GMF processes all OnStar data in realtime on k8s, GitHub runs entirely on k8s, etc etc. You need the personnel and the tools to manage it, but at a certain point k8s makes sense. There are still use cases where k8s is not a solution.

EDIT: parts of Actions, Codespaces and Packages are not run on k8s, but 80% of GitHub services are


> GitHub runs entirely on k8s

That's really not the endorsement you think it is.


Your comment doesn't really help understand why.


GitHub experiences outages pretty regularly, for example on 12 separate days last month: https://www.githubstatus.com/history


So what? Nothing here is sufficient to conclude it has anything to do with k8s whatsoever.

For example “users cannot resume code spaces created before the incident” sounds a lot more like an application level problem.


The point above was that it wasn't a good endorsement. Correlation is not causation but the opposite is also true.


Why? Yes the operations can be a bit messy. But in practice it solves the "I want to run, update and deploy my service without worrying about hardware allocations" problem. Otherwise you create an implementation of half of it.


My company has operated a cloud for three years that now manages hundreds of ClickHouse clusters on Kubernetes. We use the Altinity Kubernetes Operator for ClickHouse, aka "clickhouse operator," which we wrote and maintain.

I was very skeptical of data on Kubernetes when we first started, in part due to some initial experience with Kubernetes in 2018 but mostly due to prejudice against change. Overall it has worked out great. Here are 4 of many things we've learned.

1. Most modern databases are distributed systems. You don't just set up a single node but rather several or even dozens of nodes. Well-written operators make this relatively trivial even though it's quite complex underneath. In fact, the simplest way to learn how to set up a ClickHouse cluster is to bring it up under the operator and then look at the configuration on each container. That's how I learned it (see the sketch after this list).

2. Kubernetes portability is overall quite good. We ported our cloud from AWS to GCP in 8 weeks. We've since expanded to run in many other environments as well.

3. We map ClickHouse server containers 1-to-1 to VMs spawned using Karpenter or native node groups. It makes it a lot easier to reason about performance, including things like network bandwidth to storage.

4. ClickHouse is still basically a shared nothing architecture where individual servers own patches of storage. Kubernetes enables a great scaling model if you use VMs attached to block storage--you can scale nodes from 2 to 64 vCPUs in a few minutes, plus you can easily extend volumes. This scaling model is in my opinion highly under-rated for databases. It's decoupled compute/storage that really works. With Kubernetes you get it essentially for free.
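For point 1 above, the "learn by looking" approach is literally just poking at what the operator generated - resource and pod names below are illustrative, and the config path is my assumption:

    # the operator manages ClickHouseInstallation custom resources
    kubectl get clickhouseinstallations.clickhouse.altinity.com
    # exec into a generated pod and read the cluster config it wrote out
    kubectl exec -it chi-demo-cluster-0-0-0 -- \
        ls /etc/clickhouse-server/config.d/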

It's not all roses. Containers create new failure modes. You can't just ssh in, look at logs, and fix things. Pod crash loops [0] can be very problematic. Certain failure modes like bad EBS volumes (kinda-alive states) are hard to fix if your operator cannot quickly replace a node. And operator bugs create a new class of very-hard-to-debug problems. The best solution to all of these problems is not to have them, which means you need to focus--often for years--on operator reliability and your day 2 infrastructure, such as monitoring.

[0] https://altinity.com/blog/fixing-the-dreaded-clickhouse-cras...

Disclaimer: I work for Altinity


Kubernetes does not innovate in the area of storage. In fact, it does not even adopt any interesting concepts from HPC.

Each database container is attached to a single disk volume and they are not fungible.

There are ways of moving beyond this model, but you have to innovate at the API layer.

For example, Microsoft Service Fabric has reliable collections and queues. It's a single dictionary and queue accessible by your application that is partitioned and replicated with your application. It scales with your application and always writes to local disk.

While I am unsure of the merits of this implementation, I imagine we need something like that to truly have a good approach to databases in the new paradigm.

https://learn.microsoft.com/en-us/azure/service-fabric/servi...


Are people still using Service Fabric? I was under the impression it was going away...


I believe some core azure services are built on top of it.

I have no real experience with it, just found their approach interesting.


Kubernetes is fundamentally a middleware for dynamic network automation and code/container segmentation and provisioning. I never understood how those make running a db any easier. You do not want to treat your DBs as cattle. Sadly, they are largely pets. Your data is your most sacred possession.

I'm a fan of k8s, but the rationale for running DBs on it has always seemed specious to me.


Your data is on a disk. Treat data like gold, postgres instances like cattle.

More like a trick pony, actually, because you'll have to teach it a lot about what to do with the several container lifetime hooks and states.

But hey, if you really need to run your postgres instances in kubernetes (which you don't, because cloud postgres instances can be fully automated), kubernetes operators are like the world's expert in pony training working for you.


Postgres is neither cattle, nor are there “instances”, because it assumes full responsibility for the data it manages.

For example, you cannot have 2 postgres instances, each with its own locking mechanism, check whether transactions are serializable, since they don't know about the locks in the other instance.

This is very basic stuff so I wonder whether people who argue for treating databases like web servers lack basic training in this area.


Yeah, right? You'd be amazed about what people can achieve with bash scripts and kubernetes containers.

The crux of my argument is: you probably don't need postgres in kubernetes on the cloud (just use the damn cloud api to manage your database), but if for any reason you want to, managing a postgres with kubernetes is like managing it with systemd.

Way more involved, but doable. The great part is: just as the vast majority of people don't write their own systemd units or postgres management scripts, they can use kubernetes operators and benefit from the open source knowledge and ecosystem.


This works well until a misconfiguration has your cattle eating gold.


In the pet model you fix the problem and patch the rest of the pets. In the cattle model you fix the problem and thaw a new herd.

What's the practical difference?


Well, pets eat their weight in gold these days.

Your pet, your problem. One operator, the world's problem.


I suppose you've never had a misconfiguration corrupt all your data before? Yeh, it's on the disk, and it's just a program reading/writing to that disk... but even if those programs are cattle, those disks sure as hell are not.


Which is why pretty much every Postgres k8s operator includes a backup solution that is fully integrated.

I much prefer to be able to treat the database instances like cattle, while the conceptual "cluster" and its associated backups should be treated as a pet, as you said.


I think this glosses over the downsides of sharing a kernel. There are lots of edge cases where containers can interfere with each other by consuming kernel resources or causing contention in certain kernel code paths.

So you can run a db on kube OK, but a "production-like" setup will be keeping other workloads off the same kernel. So the kube bit is because it's what you prefer to operate, and you might as well be using VMs.


Create a scalable node pool just for databases. Each node with the right amount of memory (125% of what a database instance will use) and CPU (db + 1 for kubernetes' infra).

Taint the node pool so that only the database can tolerate it.

Presto. Now when you scale up N instances, N new nodes are created.
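In kubectl terms, something like this (node and key names are made up; managed node pools usually let you set the taint and label at pool creation instead):

    # keep general workloads off the database nodes
    kubectl taint nodes db-node-1 dedicated=database:NoSchedule
    kubectl label nodes db-node-1 dedicated=database
    # the database pods then need a matching toleration plus a
    # nodeSelector/affinity on dedicated=database to land there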


You just simply pin the DB on dedicated workers. No big deal.


This was clearly written by somebody that has never run a large, very very busy database, with strict DR requirements.


You clearly didn't read or hear Kelsey's full take because he agrees with you. People here are taking a clip/tweet out of context

I suggest listening to his Space in full


What's the difference between running a large DB at 90% on a VM compared to k8s?

The method of snapshotting uses a different plugin, but apart from that, it's essentially the same.


It is written by a L9 at Google though


But likely not a L9 SWE, is he?


Then the Emperor's new clothes must be the finest in the land, no?


The "don't run databases in Kubernetes/containers" predates CSI and stable persistent storage in Kubernetes, and is long since obsolete. With the proliferation of amazing Kubernetes operators and "cloud-native" databases such as CockroachDB and ScyllaDB, if anything, you need a good reason not to run your DBs in Kube if that's your primary orchestration platform.


Kelsey is super smart, but this is possibly the worst type of advice you can give.

You should NOT run your databases on Containers. As Werner Vogels says, "eventually, everything will fail" at scale, and the more complexity you add to a system, the higher the chances. K8S is an unnecessary layer.

Plus, don't even get me started on the security aspects of this choice...


You obviously didn't read or listen to the full content, because this is what Kelsey says too, but with more context and humility.

I suggest listening to his full space on the subject


Then why is he tweeting something that implies the opposite?

Don't tweet something that implies a very obvious conclusion, and then say "well if you read the reams of fine print you know he doesn't mean that".


Probably because a single tweet is limited to 140 chars, so he spread it across several.

Did you read beyond the first tweet in the sequence?

His point is that the decision depends on what you are trying to do. Running prod database, bad idea. Letting developers or automation spin up ephemeral databases, good use case.

The idea that k8s and vms are the same is that neither handles day 2/3 issues for you and both methods require a lot of expertise. i.e. deploying on a VM doesn't solve the hard parts either


> Then why is he tweeting something that implies the opposite?

Exactly this.


To some extent it is true, especially after the container checkpoint and restore functionality, where you stop the container at an exact instruction, all of its state including registers is saved, and then it can be restored on a different machine, waking up as if nothing happened. Also known as live migration. This is based on [1], and it seems it is either already in or on its way into k8s [2].

[1] https://criu.org/Main_Page [2] https://www.youtube.com/watch?v=wCb1Rfoy7Fk
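At the process level, that's roughly what CRIU does (the PID and paths below are illustrative; in Kubernetes this is wired through the kubelet's checkpoint API rather than called by hand):

    criu dump --tree 1234 --images-dir /tmp/ckpt --shell-job    # freeze the process tree and dump its state to disk
    # ...copy /tmp/ckpt to another machine...
    criu restore --images-dir /tmp/ckpt --shell-job             # resume it as if nothing happened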


People still thinking one "runs stuff on kubernetes".

No, you run stuff on Linux, using kubernetes to manage it.


You run stuff in containers on Linux, along with 50 million other containers, and an automated system that will do all kinds of shit to your containers, not to mention your storage, networking, access control, sysctl, etc


    Is deploying a database binary and mounting
    a volume on a VM really the hard part?
The nice thing about containers is that you are independent of the underlying infrastructure. Be it a VM or bare metal.

I can easily spin up a Docker container on my laptop. Takes a second or so.

Spinning up VirtualBox or something is so much more hassle.
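For example, a throwaway Postgres for local development is one line (the password is obviously just for the demo):

    docker run --rm -d --name dev-pg -e POSTGRES_PASSWORD=devonly -p 5432:5432 postgres:15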


This falsehood needs to die.

It's true for trivial cases, and exceptionally false for some things. Like databases. A lot of the sysctl tuning for large database environments has to be done on the host, either directly, by allowing unsafe sysctls and restarting the kubelet, running with privileges, etc.

This is not "independent of the underlying infrastructure" or "the VM and my laptop have the same kernel tunables set". They don't. And it matters.


Only when your application depends on optimizing your database to the max.

Most applications don't.

And even if the application I work on does - then only in production. Not on my laptop. So on my laptop I will simply choose a docker image that resembles production as closely as possible. I will not use a VM.


> Only when your application depends on optimizing your database to the max.

Only when you don't need to store data for any length of time. The point of containers is that they don't store state. You have to manage that.

And for some cases you can get away with it. But for most websites, keeping a single container up for many years isn't a great strategy. (Yes, I know there are ways around this. but they are not as simple as "docker run postgres")


Trivial use cases are generally trivial.


The vast majority of use cases for databases are trivial. For every esoteric, optimized to the max, bleeding edge kernel parameter tuned environment, you have 10,000+ CRUD apps that do little more than primary key look ups on vanilla btree indexes.


I’ve worked at a half dozen companies most people have never heard of and they all had database complexity and performance issues. The whole “only FAANG has scale” meme needs to die.

You’re probably right there’s a magnitude more tiny apps but who cares. A large number of us still are interested in and need to solve for non-trivial cases.


Most database performance issues are bad choice of indices, bad data model, bad queries, not enough RAM, missing pooling, unrealistic expectations and lack of pre-computed caches.

None of these are solved by "optimize kernel-parameters to the max".


I agree! I must have misread or misunderstood the context.


This is exactly what I was talking about.


This sounds like a good argument for firecracker or some other micro hypervisor wrapper around your container to give you more control. You should never ever touch the host OS of a production system.


I can spin up a Db on my laptop in a few minutes.

The difference is my laptop doesn't provide any guarantees of stability, uptime or throughput.

installing a DB is as simple as "apt install $db"

> The nice thing about containers is that you are independent of the underlying infrastructure

I mean you're really not. You're dependent on the machine, the OS, and the kernel version. Docker on windows is a VM, from what I remember. I assume OSX's docker is still VM based too.


Is docker fast because you have other containers running already and you’ve got a vm up?

Vagrant used to be a thing and it would spin vms up pretty quick.

There are differences technically, between docker and vms, certainly, but conceptually they are the same, aren’t they?

There’s no real


It's just fast.

I have not used a container today. Let's use one:

    $ time docker run --rm debian:11-slim echo hello
    hello

    real    0m0.528s
    user    0m0.015s
    sys     0m0.023s


VM platforms had the opportunity but not the motivation to achieve comparable startup times. There are a handful of examples out there like AWS Firecracker, but the majority out there are “full fat”.

As a random example, Azure full-clones 127GB disk images by default. It takes over a minute to create a VM. Booting from a cold start is sluggish because there is no sharing with other tenants, hence no caching.

Using the same hypervisor (Hyper-V) I can clone out a Windows server VM and boot it in about 3 seconds by simply using delta cloning. Subsequent boots of it or any of its siblings is just over a second!

Containers and Kubernetes are throwing the gloves down and will force the competition to pick up the pace.


Azure is the worst possible example and is invalid as a comparison outside of to show exactly how bad it is.

AWS EC2s start within a couple of tens of seconds max. AWS Lambda, Google Cloud Functions, Google Cloud Run start within ms (Google Cloud Functions used to have a cold start problem where sometimes they'd take up to a few seconds to start, but that has been fixed).

But overall, I'd say VM platforms are somewhat a thing of the past. Nobody cares about running an OS, what you need is the things inside (your application, database, etc.) so fundamentally a VM is an abstraction at the wrong layer, a means to an end. Don't get me wrong, they're still here and aren't going anywhere, but should no longer be the go-to outside of a few specific cases - bare metal, containers and "serverless" (running on bare metal) is where it's at.


Fast compared to traditional VMs but quite slow for what it's doing (setting up some namespaces and a cow filesystem forked off the image).


Swap is missing in k8s. That's a difference from a VM.

Otherwise the operator concept makes it even easier. The Zalando operator for PostgreSQL takes care of HA and backups.



It is impossible to predict the behavior of k8s. The more you have "going on" in a k8s cluster, the worse it gets. There's already a metric shit ton of things to worry about with a database without also having to worry about an entire k8s cluster and what it might be doing to your database.


When run with a good operator, databases have much more automation than conventionally run databases on VMs - you tend to get high availability built in, easy upgrades, backups, etc.


"Storing data on an ephemeral drive in AWS is fundamentally the same as storing data on an EBS volume".

Yeah, but it's not.



