Databases on Kubernetes are fundamentally the same as a database on a VM (twitter.com/kelseyhightower)
110 points by kretaceous on Feb 11, 2023 | 106 comments



In theory there's no difference between theory and practice, but in practice there is.

With a stable operator it's theoretically similar, but it is not the same, by any stretch, if only because I guarantee your incident rate will be higher when performing changes to the database, all other factors being similar. You are now also exposed to a bunch of other failure modes: the shenanigans of k8s networking, a missing daemonset, someone tweaking istio and breaking everything...

The argument of "don't run it on k8s" isn't because you can't run it physically. It's because all those other things are going to increase your failure rate, and those failures will be way more complex to solve and so your time to recovery will also go up.

You'll have to deal with this complexity already for the things that Kubernetes is going to give you actual benefits on; doing databases should be at the bottom of that list. Use your complexity budget wisely. If you're already at the bottom of the list, you look at your team, and you go "we have extra capacity and are very comfortable with the complexity of our current incidents", then go ahead - it's likely the right time for you, if you like adventure.


If you run everything else on k8s, having just one way to manage things makes life less complex.

Running an app with state (db, storage, etc.) like this is not that hard. We ran all of the low-level storage on Borg at Google when I was there. It worked well.


Let's say a new Kubernetes version comes out in April. In November, as everything works perfectly well, you decide to install a Postgres operator on it. Bummer, it doesn't work. It's not a huge issue, you just wait until the bug is resolved (already done[0]), but it's just one of these tiny things that I don't get when running Postgres natively. And I'm saying this as a big fan of Crunchy Data, running some production loads on it without a failure for quite some time now.

[0] https://github.com/CrunchyData/postgres-operator/issues/3476


And that repo you linked to has 1846 issues, 161 open. Which doesn't seem extraordinary based on my limited exposure to k8s.

Another example: https://github.com/zalando/postgres-operator/issues with 445 open issues. Why?

Maybe I'm wrong and this is all a good sign of progress, but my impression is that the entire k8s ecosystem is held together with reused duct tape.


Look at the feature list of that repo. Half of it is actually deficiencies in postgres that warrant the external complexity, not something induced by kubernetes.

Sure, a lot of it is nice-to-haves, but some are absolutely essential for even a basic postgres deployment even if you used VMs... Like, why is replication so difficult, why do I need all these extensions, and why do I need to deploy a bouncer as a separate service in 2023?!


You're probably right and I'm probably being unfair.

My opinion is colored by seeing HA postgres deployed on k8s for no good reason, then having to deal with the consequences.

It just seems like once you're in the k8s world, the likelihood that your system will be massively over-engineered goes way up. This isn't the fault of the operators, or k8s itself, it's more of a cultural problem I think. And resume driven development is real.


Wait, borg has several special features and considerations for D (the service that actually holds 99% of the state). It's not exactly as seamless as you make it out to be.


Mostly just a higher priority


Practice that beats theory is just a new theory, so practice can't beat theory, it can only beat out of date or simplistic theories.

I understand what the saying means, but it gives people so many bad ideas about the value of theories and science or how to work with them. In practice you start out based on general and overly simplistic theories, and then expand the theories in the direction you need to solve your problems, and then that creates a new theory you base your practice around.

Wandering around willy-nilly without testing isn't good practice, while going deliberately with lots of testing and logic is what we mean when we talk about theory.


Emerging practice is the ground theory hasn't covered yet.

You are correct, though, in that if you want to make these practices more useful and grow your overarching theory, they absolutely must be approached with lots of testing and logic. The trick is knowing where it's most important to do this & when.

Many areas of practice are left untouched by theory because there is no obvious compelling benefit.


There can still be a difference between practise and a(n insufficient) theory without there being a new theory superseding the old one. A lot of historical developments take this shape: we discover something not predicted by the theory we thought would explain it, work on it for a while, and only later do we find the new theory that also explains the novel thing.

For practise to be condensed into theory requires predictive power, and not all differences between practise and a theory are predictable -- yet.

That said, I agree with your overall point. If one finds a difference between practise and theory, it's because one tried to use the wrong theory for the situation at hand.


> Use your complexity budget wisely

Well said


I was deathly afraid to run databases in k8s in 2019. But I've been running CrunchyData's postgres-operator (no affiliation) since 2020 and I can't say anything bad about it, it just works.

It's basically just a bunch of already well thought out open source tools like patroni, pgbackrest and postgresql combined into an operator. So it's not really magic.

And the backup to s3 feature in pgbackrest is so solid that I've used the clone cluster and restore cluster feature a few times, just because it's so convenient.

But this is in environments that serve only thousands of users, not at all comparable to huge startups, or environments where ms latency is important.


Know that you can't use crunchydata pgo in prod without a subscription as per their licensing terms for the docker containers


Could you link to something that sets this out? From what I am seeing (just reading in response to your comment, I don’t use pgo myself), both the container and pgo repos have an apache license.



Terms of service != licensing terms. The Docker repo terms of use seem to talk about what it's "intended for" and are about support, suitability, security etc., covering them against liability. Doesn't read like a license:

> By participating in the Program and accepting these terms, you represent that you understand that (i) absent an active Crunchy Data Support Subscription, the Crunchy Developer Software is unsupported and (ii) Crunchy Developer Software is intended for development purposes only, (iii) Crunchy Developer Software may not address known security vulnerabilities, and (iv) Crunchy Data is relying on your representation as a condition of our providing you access to the Crunchy Developer Software.

Whereas their GitHub repo has an actual license (Apache) and says stuff like

> Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.


The operator is open source, but the operator deploys images that need to be licensed for production environments.

Contact Crunchy's email with your specific use case and ask if it's within their free license if you don't believe this. Expect a gigantic bill, however, as you're definitely in violation. (The bill will be their normal licensing fee, not because of the previous violation)


If you are worried about being unjustly bullied by their lawyers, I guess you can also set up a builder for images yourself - according to the Docker Hub page they're built from the GitHub repo. Could be a good exercise to do anyway, to ensure you know what's in the images and to not depend on random blobs off Docker Hub.
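Roughly something like this, if you go that route (a sketch only - the Dockerfile path and registry name are my own guesses, the repo's Makefile/build docs have the real targets):

    # clone the operator source and build your own image instead of pulling theirs
    git clone https://github.com/CrunchyData/postgres-operator.git
    cd postgres-operator
    # Dockerfile location is an assumption; adjust to whatever the repo actually uses
    docker build -t registry.example.com/postgres-operator:self-built .
    docker push registry.example.com/postgres-operator:self-built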


Well, they should sue everyone commenting on this page then.

You are probably confusing PGO with their commercial certified offering.


It appears there are two operators by Crunchy Data. The PGO (FOSS one) and the commercial one. See this FAQ [1]. The PGO is Apache 2 licensed.

1: https://access.crunchydata.com/documentation/postgres-operat...


It’s not the only PostgreSQL operator for Kubernetes. There’s Zalando, Stackgres and others.


Can you tell me how the storage there works when using their replica feature?

Can I just provide it storage from a k8s-host and it replicates it to the hosts where the other replicas live via software or do I need to provide it a StorageClass that supports distributed volumes?

This is the one thing I cannot wrap my head around, and apparently there isn't that much information (read: benchmarks) on the implications of using different kinds of storage for databases on k8s (or at least I can't find anything).


In the cloud, volumes are just disks. They are attached to the instance where your pod gets scheduled. In GKE you can choose the disk type (IOPS).

If configured as a statefulset, you are guaranteed to always match disk to pod instance ordinal (postgres-0 gets disk 0, postgres-1 gets disk 1).

So basically a way to run multiple postgres instances.

But the devil is in the details.
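A minimal sketch of that pattern (names, sizes and the password are made up, and there is no replication here - that's exactly what the operators layer on top). The point is just how the ordinal-to-disk mapping falls out of a StatefulSet's volumeClaimTemplates:

    kubectl apply -f - <<'EOF'
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: postgres
    spec:
      serviceName: postgres
      replicas: 2
      selector:
        matchLabels: {app: postgres}
      template:
        metadata:
          labels: {app: postgres}
        spec:
          containers:
          - name: postgres
            image: postgres:15
            env:
            - name: POSTGRES_PASSWORD
              value: changeme                          # demo only
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata   # keep data out of the volume root
            volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumeClaimTemplates:        # postgres-0 gets PVC data-postgres-0, postgres-1 gets data-postgres-1, ...
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
    EOF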


I know that, I was asking more about the specifics of the CrunchyData postgres operator since they have replication built in. If it was just them using the volumes and doing the replication at the application level (not using k8s storage management, I don't know how to better describe it) it'd be easy. If they expect me however to provide a StorageClass that provides replication on top of the volumes, it'd be more annoying.

I could try this out myself to be honest, but I didn't quite get to it until now


Pg manages replication. If you use something like Longhorn to replicate the volume you're just wasting storage - run pg replicas and stream WAL to object storage with wal-g, which the operator probably does (I use StackGres, and it does).
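For reference, the moving parts are pretty small even if you wired it up by hand (a sketch; the bucket name is made up):

    # ship WAL and base backups to object storage with wal-g
    export WALG_S3_PREFIX=s3://my-pg-backups/main
    # in postgresql.conf:
    #   archive_mode = on
    #   archive_command = 'wal-g wal-push %p'
    #   restore_command = 'wal-g wal-fetch %f %p'   # used when a replica/restore pulls WAL back
    wal-g backup-push "$PGDATA"    # periodic base backup, e.g. from cron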


I think you misunderstand, crunchydata only deploys postgresql and provisions streaming replication between postgres nodes using patroni, which provisions the postgresql.conf necessary when you choose more than one replica in the PostgresCluster object definition.


I’m curious, have you had the chance of upgrading the postgresql version using that operator? If yes, how’s your experience?


Hmm good question. I have two sets of clusters and both are on 13.9 so I'm honestly not sure if I ever did a postgres version upgrade.

Without checking the docs I would imagine you could do a little hack where you restore from pgbackrest into a clean postgres 14 cluster.


We run our databases (ScyllaDB, Elasticsearch) on our workload orchestrator (Nomad), just like any other job. The only difference is that the servers running the database workloads are not available for scheduling any other jobs, that way the database will be the only thing running on them. I can't overstate how much easier this has made database operations for us. We get to keep all of our config in our Nomad repo, use the same mechanism for secrets management, service discovery, the fleet-wide logging and auditing sidecars, etc. Plus Nomad allows rolling updates with automatic revert which makes upgrading 10+ nodes of Elasticsearch one-by-one with healthchecks a total breeze. Since we've had this setup we are able to trigger database release updates with a one-line change and a single command.


Those databases are distributed, cluster-native databases.

They have failover, cluster elections, node auto joining mechanisms.

Vanilla postgres has none of that.


True, that makes running these databases a bit easier since there are multiple nodes that can be restarted in any order. However, the main point still stands: you could just as well run Postgres on Nomad/k8s by pinning it to a specific node and excluding that node from the general job scheduling pool. We do that for a single-node Redis instance, for example. Still vastly superior to managing unique snowflake nodes just for the DBs via ansible/systemd.


> secrets management, service discovery, the fleet-wide logging and auditing sidecars

Interesting, I have run my hobby projects on Nomad happily and am looking for ways to run serious workloads. Would you like to share more wisdom? How do you accomplish the things above? Thanks.


Not OP, but the last couple of releases of Nomad have added quite a few QoL features without having to reach for or set up Consul or Vault, depending on your needs. At least in regard to basic service discovery and secrets management.

I'm unsure if something like open policy agent can directly work with the orchestrator and may have to be at the application level.

https://www.hashicorp.com/blog/nomad-service-discovery

https://www.hashicorp.com/blog/nomad-1-4-adds-nomad-variable...


Consul + Vault!


I think we're conflating Kubernetes with containers here.

I definitely agree that databases don't benefit from running under containers - they are pet-like (and so don't benefit from fast spin up or massive horizontal scaling) and tend to require host-level tuning (which breaks the container abstraction).

What Kubernetes brings is a well-principled orchestration framework that can easily be extended with custom operators for workloads such as databases that need it. (In fairness, he does refer to this in passing at the end.)


Everything is easy if you don't have to handle state.

K8s was always really crap for persistent storage. On AWS you can always dump to EFS or the managed Lustre (whatever that's called), which is better than attaching block storage.


Please don't use EFS for anything requiring speed, especially writes. It's extremely slow in my experience.


One of my peeves with fargate is it only supports volumes on EFS.


yeah, metadata performance sucks arse, but it serves a purpose.


Unnecessarily harsh rant: senior staff/principal engineers who haven't operated (or been on-call for) anything important come to evangelize.

I suppose this is mainly a thought for projects like neondb/cockroachdb/stackgres (who I haven't heard of but was linked in the thread). It might be reasonable if you need incredibly many db instances, but for the general business who needs "a couple" of database instances, I can't imagine that putting Kubernetes on top would ever serve you better. I'm staying as far away as I can.


> I suppose this is mainly a thought for projects like neondb/cockroachdb/stackgres

StackGres is "just" a platform for running Postgres on Kubernetes. It helps you deploy and manage HA, connection pooling, monitoring, automated backups, upgrades and many other things. Whether you have a tiny Postgres instance or hundreds of beefy clusters with many instances is up to you. It's not a distributed database (like the other ones mentioned), it is still "vanilla" Postgres.

Disclosure: Founder of OnGres (company behind StackGres)


You won't see databases on k8s in enterprise production environments. Startups or companies/services with lower reliability requirements, sure. But don't expect to walk into a fortune 500 and standup a postgres operator in production expecting to replace the existing federated solution.


> You won't see databases on k8s in enterprise production environments. Startups or companies/services with lower reliability requirements, sure. But don't expect to walk into a fortune 500 and standup a postgres operator in production expecting to replace the existing federated solution.

Blanket statements like that should be taken with a grain of salt.

F500's are not one thing. You don't have to scratch deeply to find teams running production DB's on k8s (ignoring or accepting the trade-offs, of which there are many including working with vendors and existing DBA's and their solutions) and you'll find DBA's evaluating the same and other trade-offs for themselves.

I personally think that running DB's on multi-tenant k8s with nodes that weren't specifically allocated for it is strapping in for a bad ride.


I don't mean to nit, but we are saying the same thing..


Ideally you should not be seeing k8s at all in mission critical infrastructure in any tech company. I know a few FAANGs that stay away from it.


I disagree with that. Many fortune 500s are running k8s to power critical infra. GMF processes all OnStar data in realtime on k8s, GitHub runs entirely on k8s, etc etc. You need the personnel and the tools to manage it, but at a certain point k8s makes sense. There are still use cases where k8s is not a solution.

EDIT: parts of Actions, Codespaces and Packages are not run on k8s, but 80% of GitHub services are


> GitHub runs entirely on k8s

That's really not the endorsement you think it is.


Your comment doesn't really help understand why.


GitHub experiences outages pretty regularly, for example on 12 separate days last month: https://www.githubstatus.com/history


So what? Nothing here is sufficient to conclude it has anything to do with k8s whatsoever.

For example “users cannot resume code spaces created before the incident” sounds a lot more like an application level problem.


The point above was that it wasn't a good endorsement. Correlation is not causation but the opposite is also true.


Why? Yes the operations can be a bit messy. But in practice it solves the "I want to run, update and deploy my service without worrying about hardware allocations" problem. Otherwise you create an implementation of half of it.


My company has operated a cloud for three years that now manages hundreds of ClickHouse clusters on Kubernetes. We use the Altinity Kubernetes Operator for ClickHouse, aka "clickhouse operator," which we wrote and maintain.

I was very skeptical of data on Kubernetes when we first started, in part due to some initial experience with Kubernetes in 2018 but mostly due to prejudice against change. Overall it has worked out great. Here are 4 of many things we've learned.

1. Most modern databases are distributed systems. You don't just set up a single node but rather several or even dozens of nodes. Well-written operators make this relatively trivial even though it's quite complex underneath. In fact, the simplest way to learn how to set up a ClickHouse cluster is to bring it up under the operator and then look at the configuration on each container. That's how I learned it (see the sketch after this list).

2. Kubernetes portability is overall quite good. We ported our cloud from AWS to GCP in 8 weeks. We've since expanded to run in many other environments as well.

3. We map ClickHouse server containers 1-to-1 to VMs spawned using Karpenter or native node groups. It makes it a lot easier to reason about performance, including things like network bandwidth to storage.

4. ClickHouse is still basically a shared nothing architecture where individual servers own patches of storage. Kubernetes enables a great scaling model if you use VMs attached to block storage--you can scale nodes from 2 to 64 vCPUs in a few minutes, plus you can easily extend volumes. This scaling model is in my opinion highly under-rated for databases. It's decoupled compute/storage that really works. With Kubernetes you get it essentially for free.
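For point 1 above, the "learn by looking" approach is literally just poking at what the operator generated - resource and pod names below are illustrative, and the config path is my assumption:

    # the operator manages ClickHouseInstallation custom resources
    kubectl get clickhouseinstallations.clickhouse.altinity.com
    # exec into a generated pod and read the cluster config it wrote out
    kubectl exec -it chi-demo-cluster-0-0-0 -- \
        ls /etc/clickhouse-server/config.d/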

It's not all roses. Containers create new failure modes. You can't just ssh in, look at logs, and fix things. Pod crash loops [0] can be very problematic. Certain failure modes like bad EBS volumes (kinda-alive states) are hard to fix if your operator cannot quickly replace a node. And operator bugs create a new class of very-hard-to-debug problems. The best solution to all of these problems is not to have them, which means you need to focus--often for years--on operator reliability and your day 2 infrastructure, such as monitoring.

[0] https://altinity.com/blog/fixing-the-dreaded-clickhouse-cras...

Disclaimer: I work for Altinity


Kubernetes does not innovate in the area of storage. In fact, it does not even adopt any interesting concepts from HPC.

Each database container is attached to a single disk volume and they are not fungible.

There are ways of moving beyond this model, but you have to innovate at the API layer.

For example, Microsoft Service Fabric has reliable collections and queues. It's a single dictionary and queue accessible by your application that is partitioned and replicated with your application. It scales with your application and always writes to local disk.

While I am unsure of the merits of this implementation, I imagine we need something like that to truly have a good approach to databases in the new paradigm.

https://learn.microsoft.com/en-us/azure/service-fabric/servi...


Are people still using Service Fabric? I was under the impression it was going away...


I believe some core azure services are built on top of it.

I have no real experience with it, just found their approach interesting.


Kubernetes is fundamentally a middleware for dynamic network automation and code/container segmentation and provisioning. I never understood how those make running a db any easier. You do not want to treat your DBs as cattle. Sadly, they are largely pets. Your data is your most sacred possession.

I'm a fan of k8s, but the rationale for running DBs on it has always seemed specious to me.


Your data is on a disk. Treat data like gold, postgres instances like cattle.

More like a trick pony, actually, because you'll have to teach it a lot about what to do with the several container lifetime hooks and states.

But hey, if you really need to run your postgres instances in kubernetes (which you don't, because cloud postgres instances can be fully automated), kubernetes operators are like the world's expert in pony training working for you.


Postgres is neither cattle, nor are there “instances”, because it assumes full responsibility for the data it manages.

For example, you cannot have 2 postgres instances, each with its own locking mechanism, check whether transactions are serializable, since they don't know about the locks in the other instance.

This is very basic stuff so I wonder whether people who argue for treating databases like web servers lack basic training in this area.


Yeah, right? You'd be amazed about what people can achieve with bash scripts and kubernetes containers.

The crux of my argument is: you probably don't need postgres in kubernetes on the cloud (just use the damn cloud api to manage your database), but if for any reason you want to, managing a postgres with kubernetes is like managing it with systemd.

Way more involved, but doable. The great part is: just as the vast majority of people don't write their own systemd units or postgres management scripts, they can use kubernetes operators and benefit from the open source knowledge and ecosystem.


This works well until a misconfiguration has your cattle eating gold.


In the pet model you fix the problem and patch the rest of the pets. In the cattle model you fix the problem and thaw a new herd.

What's the practical difference?


Well, pets eat their weight in gold these days.

Your pet, your problem. One operator, the world's problem.


I suppose you've never had a misconfiguration corrupt all your data before? Yeh, it's on the disk, and it's just a program reading/writing to that disk... but even if those programs are cattle, those disks sure as hell are not.


Which is why pretty much every Postgres k8s operator includes a backup solution that is fully integrated.

I much prefer to be able to treat the database instances like cattle, while the conceptual "cluster" and its associated backups should be treated as a pet, as you said.


I think this glosses over the downsides of sharing a kernel. There are lots of edge cases where containers can interfere with each other by consuming kernel resources or causing contention in certain kernel code paths.

So you can run a db on kube OK, but a "production-like" setup will be keeping other workloads off the same kernel. So the kube bit is because it's what you prefer to operate, and you might as well be using VMs.


Create a scalable node pool just for databases. Each node with the right amount of memory (125% of what a database instance will use) and CPU (db + 1 for kubernetes' infra).

Taint the node pool so that only the database can tolerate it.

Presto. Now when you scale up N instances, N new nodes are created.
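In kubectl terms, something like this (node and key names are made up; managed node pools usually let you set the taint and label at pool creation instead):

    # keep general workloads off the database nodes
    kubectl taint nodes db-node-1 dedicated=database:NoSchedule
    kubectl label nodes db-node-1 dedicated=database
    # the database pods then need a matching toleration plus a
    # nodeSelector/affinity on dedicated=database to land there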


You just simply pin the DB on dedicated workers. No big deal.


This was clearly written by somebody that has never run a large, very very busy database, with strict DR requirements.


You clearly didn't read or hear Kelsey's full take because he agrees with you. People here are taking a clip/tweet out of context

I suggest listening to his Space in full


What's the difference between running a large DB at 90% on a VM compared to k8s?

The method of snapshotting uses a different plugin, but apart from that, it's essentially the same.


It is written by a L9 at Google though


But likely not a L9 SWE, is he?


Then the Emperor's new clothes must be the finest in the land, no?


The "don't run databases in Kubernetes/containers" predates CSI and stable persistent storage in Kubernetes, and is long since obsolete. With the proliferation of amazing Kubernetes operators and "cloud-native" databases such as CockroachDB and ScyllaDB, if anything, you need a good reason not to run your DBs in Kube if that's your primary orchestration platform.


Kelsey is super smart, but this is possibly the worst type of advice you can give.

You should NOT run your databases on Containers. As Werner Vogels says, "eventually, everything will fail" at scale, and the more complexity you add to a system, the higher the chances. K8S is an unnecessary layer.

Plus, don't even get me started on the security aspects of this choice...


You obviously didn't read or listen to the full content, because this is what Kelsey says too, but with more context and humility.

I suggest listening to his full space on the subject


Then why is he tweeting something that implies the opposite?

Don't tweet something that implies a very obvious conclusion, and then say "well if you read the reams of fine print you know he doesn't mean that".


Probably because a single tweet is limited to 140 chars, so he spread it across several.

Did you read beyond the first tweet in the sequence?

His point is that the decision depends on what you are trying to do. Running prod database, bad idea. Letting developers or automation spin up ephemeral databases, good use case.

The idea that k8s and vms are the same is that neither handles day 2/3 issues for you and both methods require a lot of expertise. i.e. deploying on a VM doesn't solve the hard parts either


> Then why is he tweeting something that implies the opposite?

Exactly this.


To some extent it is true, especially after the container checkpoint and restore functionality, where you stop the container at an exact instruction, all of its state including registers is saved, and then it can be restored on a different machine, waking up as if nothing happened. Also known as live migration. This is based on [1], and it seems it is either already in or on its way into k8s [2].

[1] https://criu.org/Main_Page [2] https://www.youtube.com/watch?v=wCb1Rfoy7Fk
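At the process level, that's roughly what CRIU does (the PID and paths below are illustrative; in Kubernetes this is wired through the kubelet's checkpoint API rather than called by hand):

    criu dump --tree 1234 --images-dir /tmp/ckpt --shell-job    # freeze the process tree and dump its state to disk
    # ...copy /tmp/ckpt to another machine...
    criu restore --images-dir /tmp/ckpt --shell-job             # resume it as if nothing happened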


People still thinking one "runs stuff on kubernetes".

No, you run stuff on Linux, using kubernetes to manage it.


You run stuff in containers on Linux, along with 50 million other containers, and an automated system that will do all kinds of shit to your containers, not to mention your storage, networking, access control, sysctl, etc


    Is deploying a database binary and mounting
    a volume on a VM really the hard part?
The nice thing about containers is that you are independent of the underlying infrastructure. Be it a VM or bare metal.

I can easily spin up a Docker container on my laptop. Takes a second or so.

Spinning up VirtualBox or something is so much more hassle.
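For example, a throwaway Postgres for local development is one line (the password is obviously just for the demo):

    docker run --rm -d --name dev-pg -e POSTGRES_PASSWORD=devonly -p 5432:5432 postgres:15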


This falsehood needs to die.

It's true for trivial cases, and exceptionally false for some things. Like databases. A lot of the sysctl tuning for large database environments has to be done on the host, either directly, by allowing unsafe sysctls and restarting the kubelet, running with privileges, etc.

This is not "independent of the underlying infrastructure" or "the VM and my laptop have the same kernel tunables set". They don't. And it matters.


Only when your application depends on optimizing your database to the max.

Most applications don't.

And even if the application I work on does - then only in production. Not on my laptop. So on my laptop I will simply choose a docker image that resembles production as closely as possible. I will not use a VM.


> Only when your application depends on optimizing your database to the max.

Only when you don't need to store data for any length of time. The point of containers is that they don't store state. You have to manage that.

And for some cases you can get away with it. But for most websites, keeping a single container up for many years isn't a great strategy. (Yes, I know there are ways around this. but they are not as simple as "docker run postgres")


Trivial use cases are generally trivial.


The vast majority of use cases for databases are trivial. For every esoteric, optimized to the max, bleeding edge kernel parameter tuned environment, you have 10,000+ CRUD apps that do little more than primary key look ups on vanilla btree indexes.


I’ve worked at a half dozen companies most people have never heard of and they all had database complexity and performance issues. The whole “only FAANG has scale” meme needs to die.

You’re probably right there’s a magnitude more tiny apps but who cares. A large number of us still are interested in and need to solve for non-trivial cases.


Most database performance issues are bad choice of indices, bad data model, bad queries, not enough RAM, missing pooling, unrealistic expectations and lack of pre-computed caches.

None of these are solved by "optimize kernel-parameters to the max".


I agree! I must have misread or misunderstood the context.


This is exactly what I was talking about.


This sounds like a good argument for firecracker or some other micro hypervisor wrapper around your container to give you more control. You should never ever touch the host OS of a production system.


I can spin up a Db on my laptop in a few minutes.

The difference is my laptop doesn't provide any guarantees of stability, uptime or throughput.

installing a DB is as simple as "apt install $db"

> The nice thing about containers is that you are independent of the underlying infrastructure

I mean you're really not. You're dependent on the machine, the OS, and the kernel version. Docker on windows is a VM, from what I remember. I assume OSX's docker is still VM based too.


Is docker fast because you have other containers running already and you’ve got a vm up?

Vagrant used to be a thing and it would spin vms up pretty quick.

There are differences technically, between docker and vms, certainly, but conceptually they are the same, aren’t they?

There’s no real


It's just fast.

I have not used a container today. Let's use one:

    $ time docker run --rm debian:11-slim echo hello
    hello

    real    0m0.528s
    user    0m0.015s
    sys     0m0.023s


VM platforms had the opportunity but not the motivation to achieve comparable startup times. There are a handful of examples out there like AWS Firecracker, but the majority out there are “full fat”.

As a random example, Azure full-clones 127GB disk images by default. It takes over a minute to create a VM. Booting from a cold start is sluggish because there is no sharing with other tenants, hence no caching.

Using the same hypervisor (Hyper-V) I can clone out a Windows server VM and boot it in about 3 seconds by simply using delta cloning. Subsequent boots of it or any of its siblings is just over a second!

Containers and Kubernetes are throwing the gloves down and will force the competition to pick up the pace.


Azure is the worst possible example and is invalid as a comparison outside of to show exactly how bad it is.

AWS EC2s start within a couple of tens of seconds max. AWS Lambda, Google Cloud Functions, Google Cloud Run start within ms (Google Cloud Functions used to have a cold start problem where sometimes they'd take up to a few seconds to start, but that has been fixed).

But overall, I'd say VM platforms are somewhat a thing of the past. Nobody cares about running an OS, what you need is the things inside (your application, database, etc.) so fundamentally a VM is an abstraction at the wrong layer, a means to an end. Don't get me wrong, they're still here and aren't going anywhere, but should no longer be the go-to outside of a few specific cases - bare metal, containers and "serverless" (running on bare metal) is where it's at.


Fast compared to traditional VMs but quite slow for what it's doing (setting up some namespaces and a cow filesystem forked off the image).


Swap is missing in k8s. That's a difference from a VM.

Otherwise the operator concept makes it even easier. The Zalando operator for PostgreSQL takes care of HA and backups.



It is impossible to predict the behavior of k8s. The more you have "going on" in a k8s cluster, the worse it gets. There's already a metric shit ton of things to worry about with a database without also having to worry about an entire k8s cluster and what it might be doing to your database.


When run with a good operator, databases have much more automation than conventionally run databases on VMs - you tend to get high availability built in, easy upgrades, backups, etc.


"Storing data on an ephemeral drive in AWS is fundamentally the same as storing data on an EBS volume".

Yeah, but it's not.



