
It's odd, but I actually really enjoy using Kubernetes in production.

We have a few rules:

1. Read a good intro book cover-to-cover before trying to understand it.

2. Pay a cloud vendor to supply a working, managed Kubernetes cluster.

3. Prefer fewer larger clusters with namespaces (and node pools if needed) to lots of tiny clusters.

3. Don't get clever with Kubernetes networking. In fact, touch it as little as possible and hope really hard it continues to work.

This is enough to handle 10-50 servers with occasional spikes above 300. It's not perfect, but then again, once you have that many machines, pretty much every solution requires some occasional care and feeding.

My personal Kubernetes nightmare is having to build a cluster from scratch on bare metal.



> 3. Don't get clever with Kubernetes networking. In fact, touch it as little as possible and hope really hard it continues to work.

This one.


Kubernetes on bare metal is actually pretty easy. Kubernetes on a hosted solution which doesn't have a managed version is prone to error. Usually on bare metal you can make some guarantees regarding bandwidth and storage speed. Trying to roll out a cluster on a service that can't give you these guarantees is truly a nightmare.


I would also say that if you are going to be administering clusters at your company that you should at least set up a cluster from scratch (doesn't have to be bare metal) and learn how the kubernetes control plane works by breaking it in various ways etc.

In my experience most people don't like black magic, they want something that they understand on some level. A fully managed k8s cluster is black magic, once you have set up a vanilla cluster you get a much better feeling about how the control plane works together to get things done.


I have tried several times over the past few years to install Kubernetes on bare metal, and it has never worked.

I don't mean installing it on VMs on a laptop, I mean on a real linux cluster of 8 to 32 nodes, with real networks and real switches.

Managing bare metal machines is a cakewalk compared to getting Kubernetes running in-house, at least in my experience.

Obviously the cloud providers do it, so it's possible. But IMO it is something you do only if you have a full-time admin team available to set it up and manage it. It's not by any stretch of the imagination something you install and forget about.


What were you using to install kubernetes?


Did you try using kubeadm to bootstrap installing kubernetes? It is pretty simple.


> Kubernetes on bare metal is actually pretty easy.

I would not call it easy at all. Last time I tried that a year ago you still needed a special load balancer to get it going (https://metallb.universe.tf). Has this changed?


MetalLB is pretty simple to configure.


That's just not true, especially if you compare it to the LoadBalancer you get on a cloud platform which usually involves zero clicks. I'm not saying it's impossible but it's definitely not "easy".

Configuration instructions: https://metallb.universe.tf/configuration/

Hint: You better know what all of these are in your environment:

    For a basic configuration featuring one BGP router and one IP address range, you need 4 pieces of information:

    The router IP address that MetalLB should connect to,
    The router’s AS number,
    The AS number MetalLB should use,
    An IP address range expressed as a CIDR prefix.
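
Those four values can at least be sanity-checked before they go anywhere near a cluster. A minimal sketch using only Python's standard `ipaddress` module; the function name and the sample values (private-range addresses, private-use AS numbers) are hypothetical and not taken from the MetalLB docs.

```python
import ipaddress

def check_bgp_inputs(router_ip, router_asn, local_asn, pool_cidr):
    """Validate the four values a basic one-router BGP setup needs.

    Returns the parsed (router address, address pool) or raises ValueError.
    """
    router = ipaddress.ip_address(router_ip)             # ValueError if malformed
    pool = ipaddress.ip_network(pool_cidr, strict=True)  # host bits must be zero
    for name, asn in (("router", router_asn), ("local", local_asn)):
        # AS numbers are 32-bit unsigned values; 0 is reserved.
        if not 0 < asn < 2**32:
            raise ValueError(f"{name} AS number out of range: {asn}")
    if router in pool:
        raise ValueError("router address falls inside the service address pool")
    return router, pool

# Hypothetical example values.
router, pool = check_bgp_inputs("10.0.0.1", 64500, 64501, "192.168.10.0/24")
print(router, pool, pool.num_addresses)
```

This only catches typos and obvious mistakes; it says nothing about whether the router will actually accept the peering.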


Did you miss the part about layer 2 configuration, where you don't need BGP at all? https://metallb.universe.tf/configuration/#layer-2-configura...


But then "When announcing in layer2 mode, one node in your cluster will attract traffic for the service IP."

This bottlenecking seems undesirable. At the very least, if you have one "main" traffic heavy service whichever node ends up servicing that IP address could have elevated cpu usage from processing all the network traffic via kube-proxy.

The obvious solution would be to allocate, say, two or more IP addresses for the service with DNS round robin set up. Then, as long as they are all handled by different nodes, you are not bottlenecking nearly as badly. Perhaps I am missing it, but I'm not seeing a feature where you can force those two or more IP addresses to be claimed by different nodes. (If the feature is strict, then you would want more data plane nodes than IPs, so that having one node down does not leave part of the round-robin DNS unclaimed by any node.)
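
To make the arithmetic concrete, here is a small model of that idea. It is not a MetalLB feature or API, just a sketch assuming DNS round robin spreads requests evenly over the service IPs and that a strict "one IP per node" rule exists. Node names and addresses are made up.

```python
from collections import Counter

def busiest_share(ip_to_node):
    """Fraction of round-robin traffic landing on the most loaded node."""
    per_node = Counter(ip_to_node.values())
    return max(per_node.values()) / len(ip_to_node)

def can_reassign(ip_to_node, nodes, failed):
    """Under a strict one-IP-per-node rule, is there a free node to take
    over every IP held by the failed node?"""
    in_use = set(ip_to_node.values()) - {failed}
    spare = [n for n in nodes if n != failed and n not in in_use]
    orphaned = [ip for ip, n in ip_to_node.items() if n == failed]
    return len(spare) >= len(orphaned)

spread = {"203.0.113.10": "node-a", "203.0.113.11": "node-b", "203.0.113.12": "node-c"}
stacked = {"203.0.113.10": "node-a", "203.0.113.11": "node-a", "203.0.113.12": "node-b"}

print(busiest_share(spread))    # one third of the traffic per node
print(busiest_share(stacked))   # two thirds on node-a
print(can_reassign(spread, ["node-a", "node-b", "node-c"], "node-a"))
print(can_reassign(spread, ["node-a", "node-b", "node-c", "node-d"], "node-a"))
```

With three IPs on three nodes there is nowhere for an orphaned IP to go; a fourth node gives it a home.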


True. If you want true load balancing, you need a layer 3 solution (BGP.)


MetalLB has been in beta for YEARS. It's OK for dev/qa/staging, but I wouldn't put it in prod.


I wouldn't use it in prod when there are other alternatives from cloud providers. But to say it is difficult to configure for a bare metal dev cluster is not true. The instructions are pretty clear.


I don't disagree. I think it is easy to install on a bare metal cluster, although I think using HAProxy is just as easy and probably a better solution. I was just pointing out that it has been in beta for a very long time.


HAProxy isn't complicated to set up.


Good rules.

>2. Pay a cloud vendor to supply a working, managed Kubernetes cluster.

If one is at that level already, I don't think there's anything better than AWS ECS out there. It just works. Just works. Yes, sure, it does not offer stateful workloads, for example, among other things, but it works for 90% of the cases.

> 3. Don't get clever with Kubernetes networking. In fact, touch it as little as possible and hope really hard it continues to work.

Pretty much... Each CNI generates the SDN in its own way, slightly differently from the others. It's like how you can write a program that prints a chessboard on a terminal in ten different ways.

Unfortunately, these implementation details aren't written or documented anywhere and they of course would keep changing from release to release anyway. Your only way out if you have production workloads that you can't afford going down without missing revenue? Just pay for the support for respective CNI as only they would know what the voodoo magic is under the hood.

Sure you can see the source code and all of them are open source but that's not your main business or the main day job and of course, the solutions aren't 100 line trivial implementations either.


> If one is at that level already, I don't think there's anything better than AWS ECS out there.

100%. To answer the OP's question: my nightmare is having to use it at all. I work with small, very early-stage companies whose applications by and large are not complicated. Perhaps at some level of scale and/or complexity, k8s makes sense. For the vast majority of the cases I see, something like ECS does everything they need, while being significantly simpler to understand, develop for, and debug.


Do you still recommend they host their applications in containers (e.g. Docker)? I feel like it's fairly low effort to start out that way, but can be a pain to add later.


Being that they're all using ECS, yes containerizing using Docker is a prerequisite.


Doh. I am only familiar with Azure, and was confusing ECS with ordinary VMs. Sorry about the stupid question!


No worries, the cloud acronyms overlap so much these days, if it's helpful:

EC2: Elastic Compute Cloud, bare VMs.

ECS: Elastic Container Service, Docker containers.

EKS: Elastic Kubernetes Service.


> I don't think there's anything better than AWS ECS out there

Do you have any experience with Kubernetes on GCP being less good than AWS ECS? I'd expect them to be the gold-standard when it's a project coming from Google originally and we haven't had any Kubernetes problems that were related to GCP.


I have experience with Google's managed Kubernetes service (GKE).

It's basically great. Solid, few surprises, no compatibility issues with third party software packaged for Kubernetes. Autopilot looks even better – billing you only for the resources you allocate rather than for the full nodes, basically removing the bin-packing problem. Very little about our config was Google specific or would cause issues porting to another provider. It was up-to-date enough for us to use relatively new features, while lagging enough that everything felt pretty stable.
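
For anyone who hasn't hit the bin-packing problem yet, a toy illustration: pods come with CPU requests, nodes come in fixed sizes, and you pay for whatever whole nodes the packing needs. This is a plain first-fit-decreasing sketch with made-up numbers, not a description of how GKE or Autopilot schedules or bills.

```python
def first_fit_decreasing(requests, node_capacity):
    """Pack CPU requests onto identical nodes; return the per-node loads."""
    nodes = []
    for req in sorted(requests, reverse=True):
        if req > node_capacity:
            raise ValueError(f"request {req} exceeds node capacity")
        for i, load in enumerate(nodes):
            if load + req <= node_capacity:
                nodes[i] = load + req
                break
        else:
            nodes.append(req)  # no existing node has room: add one
    return nodes

requests = [3, 3, 3, 2, 2]            # hypothetical pod CPU requests, in cores
loads = first_fit_decreasing(requests, node_capacity=4)
paid_per_node = len(loads) * 4        # billed for whole nodes
paid_per_request = sum(requests)      # billed only for what the pods ask for
print(loads, paid_per_node, paid_per_request)
```

Here four 4-core nodes are needed for 13 cores of requests, so 3 cores are paid for and never used under per-node billing.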

The only issue we had was wanting to use a somewhat obscure configuration for the Google Cloud load balancer instance that was underlying the Kubernetes ingress. This was possible, we just had to configure it manually in Terraform and point it at the cluster rather than being able to treat it as a cluster resource if I remember correctly. This was only a temporary solution while they were in the process of adding more custom control via K8s.

As far as I can tell it is considered to be the gold-standard.

Disclaimer: I now work for Google on non-cloud stuff, but this was my experience doing a port from bare-metal to GKE.


> 3. Prefer fewer larger clusters with namespaces (and node pools if needed) to lots of tiny clusters.

This is interesting - last time I worked with Microsoft Engineers from Azure - they said exactly the opposite.

One workload = One cluster.

„There are too many shared resources in Kubernetes that can leak collateral damage from one workload to another”.


Azure might also require special precautions. Honestly, I've seen Azure have a lot of networking issues, for example. But this is based on scuttlebutt and limited personal experience.

I've found that on GCP, certain workloads benefit from a dedicated node pool. This gets them their own CPU and RAM and volume I/O. Yes, I could imagine that there are shared Kubernetes control plane resources that might be affected, but I haven't seen that with any of our workloads. It might get more complicated if you have lots of in-cluster networking.

But none of this is my area of expertise. I just think that Kubernetes can mostly be pretty pleasant in practice for companies that have outgrown PaaS offerings like Heroku and Render.com.


Not really. You can do fairly large clusters; you just need differently sized node pools. For example, we run Apache NiFi in AKS, which is a complete memory and CPU hog. We have a node pool of 16 CPU / 64 GB RAM nodes for that workload, which we target with a node selector. Microservices use a different node pool. System services run on the default node pool.

If you're running Azure Functions with KEDA, set up a node pool for that with a lower CPU/memory footprint.
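
A rough sketch of what pinning a workload to a node pool looks like, built as a plain Python dict and dumped as JSON (kubectl accepts JSON manifests as well as YAML). `nodeSelector` and `resources.requests` are real Pod fields; the `pool: bigmem` label, the pod name, the image and the resource figures are placeholders, so check which labels your provider actually puts on its node pools.

```python
import json

def pod_with_node_selector(name, image, pool_labels, cpu, memory):
    """Build a Pod manifest that only schedules onto nodes with the given labels."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "nodeSelector": pool_labels,  # only nodes carrying these labels qualify
            "containers": [{
                "name": name,
                "image": image,
                "resources": {"requests": {"cpu": cpu, "memory": memory}},
            }],
        },
    }

# Hypothetical label, image and sizes.
manifest = pod_with_node_selector(
    "nifi", "example/nifi:latest", {"pool": "bigmem"}, cpu="8", memory="32Gi")
print(json.dumps(manifest, indent=2))
```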


It's really quite frustrating to find that out from experience. Things like operators, anything Cluster*, named resources, etc.

If you're doing relatively simple things with the cluster, then you can do namespaces. The more custom shit you do, the better off you are with true isolation.


What is their definition of a workload? Do they want a cluster per microservice? Per application? Per customer?


We had different sets of microservices, each handling a specific part of the system.

One part was responsible for data transformation and the other was responsible for user modifications.

It's hard to tell where the cut-off point is; it depends on the system architecture.


Wouldn't a namespace make more sense than a whole cluster?


Why is bare metal a nightmare? I have a project coming up which must be on bare metal so was thinking of doing this. Also, if it's so bad, what's better to use on bare metal? Thanks


My experience with bare metal is multi-fold:

* Documentation sees it as a second-class citizen, if that (loadbalancers, volumes are heavily biased towards cloud providers)

* Many cloud-provided instances of kubernetes will always use the exact same VMs backing the nodes. So they really don't have to care all that much about what config your bare metal cluster has or needs.

RancherOS/K3S can be really quite nice for getting bare-metal clusters up & going really fast. They don't always feel the most complete though, mostly lacking around failure documentation. Even RancherOS has a bias towards cloud clusters, but it's quite easy at least to get a simple k3s cluster going. I'd personally recommend going that way. RancherOS if you're managing multiple clusters, plain k3s if you're doing just one. It'll even come with a pretty decent LoadBalancer & volumes. If you need better management of volumes, Longhorn or minio isn't bad.

microk8s/KinD are for dev-env only, and I wouldn't recommend it for any bare metal cluster. 'Fun' to screw around with though.

Edit: I had a lot of really obnoxious DNS problems, mostly due to docker daemon & how the system config would interact with k8s/k3s. Super annoying when you can get everything working in docker containers manually, but not working in k8s. Once you get your bare metal system configured to work, it'll be fine. It's also very confusing how many different network options there are, and their claims are dubious at best.

To expand on the network subsystems: canal/calico/flannel/ipvs based vs iptables based, etc. We did a bunch of low-latency (sub ms) perf testing for ipvs vs iptables. Docs say ipvs should be both faster (throughput) and lower latency. Tested evidence did not show that to be the case for both small #s of pods & large numbers of pods. This was for a small cluster, so that could be impacting the results.
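
If you want to repeat this kind of comparison, look at percentiles rather than averages. A minimal, hedged sketch of the measurement side in plain Python; the workload here is a stand-in lambda, and a real test would issue a request to a Service address under each proxy mode.

```python
import statistics
import time

def time_calls(fn, n=1000):
    """Return per-call latencies in microseconds for n invocations of fn."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e6)
    return samples

def summarize(samples):
    """Median and 99th percentile of a list of latency samples."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": statistics.median(samples), "p99": cuts[98]}

# Stand-in workload (assumption); swap in a real request for an actual test.
print(summarize(time_calls(lambda: sum(range(100)))))
```

Run it once per configuration on the same nodes and compare the two summaries; at sub-ms scale the p99 usually tells you more than the median.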

Never mind that it's a rather huge PITA to switch between them all. Rancher/K3S makes it a bit easier, but still annoying.


> loadbalancers, volumes are heavily biased towards cloud providers

Can you even run a "loadbalancer" if all you have is a single machine with a single IP behind a router you don't control? I got stuck on that the last time I tried running my own kubes.


Not necessarily a router you don't control, but MetalLB does provide some nice LoadBalancer constructs for a bare-metal deployment. Putting VyOS in front of it is magical!

https://metallb.universe.tf/


K3s uses their own loadbalancer, so yes. It will add extra hops to any of your services if you care about sub-ms latency.

I looked at MetalLB, and it didn't really fit with what we wanted to do, so YMMV. It's pretty limited unless you control a lot about your IP space.


Why would you need a load-balancer if you only have a single machine?


Because kubernetes says so? Can you run it without a load balancer?


You can use NodePort instead of LoadBalancer. Or use MetalLB if you insist on having an LB so that it's closer to real production environments.
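
For the single-machine case, a NodePort Service is about as small as it gets. A sketch that builds the manifest as a Python dict (kubectl accepts JSON); the service name, selector and port numbers are made up, and 30000-32767 is the default NodePort range, which a cluster admin can change.

```python
import json

NODE_PORT_RANGE = range(30000, 32768)  # Kubernetes default; configurable per cluster

def nodeport_service(name, selector, port, target_port, node_port):
    """Build a Service of type NodePort, exposed on every node's IP at node_port."""
    if node_port not in NODE_PORT_RANGE:
        raise ValueError(f"nodePort {node_port} outside default range 30000-32767")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": selector,
            "ports": [{"port": port, "targetPort": target_port, "nodePort": node_port}],
        },
    }

# Hypothetical names and ports.
svc = nodeport_service("web", {"app": "web"}, port=80, target_port=8080, node_port=30080)
print(json.dumps(svc, indent=2))
```

The app is then reachable on the machine's own IP at port 30080, with no load balancer involved.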


“Real production” smh. This is why docs make baremetal second class citizens, people assume something has to be a certain way for it to be “real.”


There's always one more thing that you need to install to have a working cluster that comes out of the box in cloud.

You want networking? OK, go read about Calico, Flannel, Cilium, etc and choose one. If you didn't fully read the instructions for the networking plugin you plan to use, plan to blow away your cluster and set it back up from scratch with the correct RFC1918 address range for your network plugin that doesn't conflict with your presumably existing network. Plan to dive in and re-jigger things when you need IPv6.
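
The address-range mistake is cheap to check for up front. A minimal sketch with Python's standard `ipaddress` module; the function name and the "existing" networks are hypothetical, and the pod CIDRs are only examples, so substitute whatever range your chosen plugin actually uses.

```python
import ipaddress

RFC1918 = [ipaddress.IPv4Network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def pod_cidr_conflicts(pod_cidr, existing_cidrs):
    """Return the existing networks that overlap the proposed pod CIDR.

    Raises ValueError if the pod CIDR is not inside an RFC 1918 range.
    """
    pod_net = ipaddress.IPv4Network(pod_cidr)
    if not any(pod_net.subnet_of(r) for r in RFC1918):
        raise ValueError(f"{pod_cidr} is not within an RFC 1918 range")
    return [c for c in existing_cidrs
            if pod_net.overlaps(ipaddress.IPv4Network(c))]

existing = ["192.168.1.0/24", "10.20.0.0/16"]          # hypothetical office networks
print(pod_cidr_conflicts("192.168.0.0/16", existing))  # overlaps the /24
print(pod_cidr_conflicts("10.244.0.0/16", existing))   # no overlap
```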

You want a working LoadBalancer? OK, now you need MetalLB or PureLB, among others. Make sure your IPAM people know that you've blocked off several addresses or a CIDR range for K8S dynamic address allocation. IPs allocated via K8S aren't going to respond to ICMP packets and people will assume they're unused :)

You want ingress controllers? OK, well you can pick from Nginx or Traefik. There's actually a ton of them but those seem to be the most popular.

You want certificate management? OK, go install cert-manager. You'll need to have programmatic access to your DNS providers if you want to use Let's Encrypt with wildcard certificates.

Oh, you need some kind of volume provider? Well... there's hostPath, but people generally don't recommend that for security reasons. I guess you could use the NFS volume provider, but that's a little creaky for all of the usual reasons that NFS has been creaky for the last 30 years. You could go install Rook, but that's another entire complex distributed system on top of your distributed system. (I love Ceph, BTW, but this is really overwhelming for a new person.)

At this point you have essentially a working cluster, probably with a single master unless you set up something like OKD, in which case you already had to set up an entire HAProxy installation before even approaching the K8S parts.

Prepare to have a not insignificant number of full-time employees keeping the plane flying while you swap out the wings in real time to keep up with the fast K8S release cycle.

IMO, the complexity of K8S really incentivizes trashing all of your on-prem hardware and just paying for cloud. That's the end game.


I actually found bare metal to be fairly pleasant, and because I built it I understood a ton about how it worked so was able to figure out issues a lot easier.

My advice would be to take careful notes about your setup steps though, even if you're following a guide. For some reason in the k8s world I have a hard time finding blog posts/guides/etc that I used months later, and Chrome seems to eat my bookmarks :-(. I suspect SEO is a ruthless beast when it comes to K8s.


I did a project 5 years back that had to be bare metal, and going for Kubernetes was probably the worst project decision I've made so far. We didn't have the required competency and wasted so much time on it. We should have gone for something more bland and simple.

My only tip if you really decide to go for it is to make sure to use a well-supported Linux distro. We had to be on RHEL and that turned out to be a poor fit.


If you plan on running bare-metal I highly recommend RKE2. It just works, it sets up most things for you (CNI included).

Don't even think about using kubeadm, it's the worst. It's overcomplicated and the smallest issue will wreck your cluster.

Also as a quick tip, don't use firewalld or iptables, use CNI resources (eg calico GlobalNetworkPolicy c; )


Because a vendor will do a lot of ground work (choosing a CNI and CSI implementation for instance) for you, and everything usually covered by a cloud-controller will be entirely up to you (e.g. LBs)


Actual bare metal, where you own the physical hardware and pay for the physical network connections, is pretty painless. I see many people trying to use hosted compute to try and set up a bare metal cluster. This is a recipe for heisenbugs.


Having physical access (or IPMI) certainly helps, but there's also a lot more knowledge about networks bundled in companies that already run data centers, so setting up something like MetalLB (BGP load balancers) and Rook (Ceph CSI) to cover the parts that your cloud vendor would usually provide automatically is not as big of a deal. But the overall complexity for someone completely new to the topic is still higher.


> 2. Pay a cloud vendor to supply a working, managed Kubernetes cluster.

... which makes it trash IMHO. I don't see anything intrinsic about the problem domain that mandates a completely uninstallable unmaintainable Rube Goldberg machine. But having it be that way certainly benefits the cloud vendors who push it since it keeps people from escaping big cloud costs and using simple commodity VMs, bare metal, or colocated stuff.

Complex is the new closed. It can be fully "open" but it doesn't matter if mere mortals can't use it.


Regarding item 1, any recommendations for a good book? Manning has a couple titles that look good, but I’m curious to hear what others would suggest.


> My personal Kubernetes nightmare is having to build a cluster from scratch on bare metal.

We have a bare-metal k8s cluster... In my opinion the thing we got right is to use external load-balancers (good old haproxy) to point at nginx-ingress-controllers (whose pods are pinned to two "service" nodes) and to load-balance the apiserver traffic.

Most other traffic is intra-cluster, and managed by Calico anyway.


And don't expose workloads to the internet unless it is a prod app.

Can you recommend a good intro book to read cover to cover (hopefully not too thick)?


"Kubernetes in Action" by Marko Lukša (Manning Publications) is my recommendation. It took me from someone who knows docker/docker-compose to someone who can handle Azure/AWS Kubernetes, understand the terms, and design apps. Very good book.


I'll second that rec, same story... I wouldn't consider myself at all an expert based solely on that book, but it did give me a lot more confidence in branching out from a straight-and-narrow configuration, that I'd at least be able to know what to look up when I run into problems.


Thanks


Which book would you suggest please?


You hit every high point of my own experience, but there are caveats.

1. If you have to do on-prem, then virtualize and package until you can wash, rinse, repeat.

2. Secure systems with k8s are a thing: STIGed k8s, STIGed host systems, mTLS, PSP, network policy, MAC integration. This makes k8s really unpleasant to deal with if you come from a public cloud or a public k8s provider. See #1.

3. Performance: DNS sucks, and it sucks for all kinds of reasons. It's usually avoidable with node-local caching approaches, but sometimes not.

4. Yes: big clusters... until you need federation.


>My personal Kubernetes nightmare is having to build a cluster from scratch on bare metal.

One person's heaven is another's nightmare. I like building it from scratch because then I know every single knob, and doing an excellent job on documentation makes sure that others have that knowledge too.

But hey, since being an "administrator" is a forgotten art, you are probably better off just buying some black-boxes with terrible performance.


Depending on your individual circumstances buying "some black-boxes with terrible performance" might be a worthwhile tradeoff


Well yes that's true. It's always about the circumstances.


What are some of the best Kubernetes books?


I personally liked O'Reilly's Kubernetes: Up and Running, which was fairly thorough, and Nigel Poulton's books, which were shorter and focused on the highlights (at least the editions I read).

The reason I always recommend that people read a book before getting into Kubernetes is that there are several things that make a lot more sense once someone takes the time to explain them.

It actually gave me some 90s nostalgia. In order to use a new server technology, I actually needed to sit down with an O'Reilly book.


Is the website documentation not good enough? It looks very thorough. Actually, I'd say that it's rare to encounter such verbose and complete documentation nowadays. Maybe it's even too deep, but I enjoyed reading it.


The website has tons of reference documentation!

But what a lot of people need is someone to just explain:

1. The basic idea of setting a desired configuration, and having the cluster try to bring reality into sync with the config.

2. How pods, replica sets, deployments and services fit together, and why Google thought it was a good idea to split them up that way. Also, how ingress fits in with all this.

3. Basic volume management.

4. Other common optional topics, just to get an overview.
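
Item 1 in the list above is the part that clicks fastest when you see it as code. This is a toy control loop in plain Python, not how any real Kubernetes controller is implemented; the workload names and replica counts are made up.

```python
def reconcile(desired, actual):
    """One pass of a control loop: list the actions needed to move
    `actual` (name -> replica count) toward `desired`."""
    actions = []
    for name, want in desired.items():
        have = actual.get(name, 0)
        if have < want:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    for name in actual.keys() - desired.keys():
        actions.append(("stop", name, actual[name]))  # no longer declared at all
    return actions

def apply_actions(actions, actual):
    """Pretend to carry out the actions and return the new observed state."""
    new = dict(actual)
    for verb, name, count in actions:
        new[name] = new.get(name, 0) + (count if verb == "start" else -count)
        if new[name] == 0:
            del new[name]
    return new

desired = {"web": 3, "worker": 2}   # what the config says should exist
actual = {"web": 1, "cron": 1}      # what is running right now
while (actions := reconcile(desired, actual)):
    actual = apply_actions(actions, actual)
print(actual)
```

You never say "start two more web replicas"; you say "there should be three", and the loop keeps working out the difference.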

The big advantage of a book is that it will try to cover the essential ideas, and why they work the way they do, without getting lost in describing a hundred advanced features you can look up later.

If there's an introductory section on the website that covers just the essentials, that might be enough! But I didn't find one when I was learning.


For intro level: Kubernetes Up and Running. (Here's a free version provided by VMware: https://www.vmware.com/content/dam/digitalmarketing/vmware/e...). Will teach you the basic vocabulary and get you well enough oriented to use k8s.

For trying to get to pro level: Programming Kubernetes. This one is focused on writing code for the k8s ecosystem, but it will teach you a lot of the internals.


Kubernetes in Action by Manning (https://www.manning.com/books/kubernetes-in-action) is quite thorough, good and beginner friendly.


Second that, but I do recommend coming in with an understanding of docker/docker-compose.


There’s a second edition in MEAP.


>Prefer fewer larger clusters with namespaces (and node pools if needed) to lots of tiny clusters.

People do this? I thought the whole point was to abstract everything away. You should have containers running on pods. You shouldn't care about what's in the containers or what metal the pods are running on.


Some people don't trust the namespacing in Kubernetes, or have contractual obligations to keep environments separate. I've rarely seen clusters with more than 10 nodes, but I have seen single customers run 5 different tiny clusters, one per environment.


Thank you for your insight. Really solid advice.

I have a question though.

> My personal Kubernetes nightmare is having to build a cluster from scratch on bare metal.

Can you share a few details about which distribution you used and how you handled ingress?


I've done exactly that. It wasn't fun. RHEL, k3s, ansible, Longhorn, metallb, to name a few.

Storage is the fun part.


I was playing around with a local Elasticsearch cluster and I couldn't figure out how to "do" k8s storage. Some kind of like... shared NFS volume or something maybe?

Could you share some tips from your experiences?


Longhorn is honestly pretty easy to get up & going for backing a PersistentVolume, which you can then mount however you want.

K3S has some local storage options too, but that's of mixed usage. Or you just do a hostpath + NFS if you want something that has as little Kubernetes magic as possible.


Yeah, NFS is your best bet if you are in a lab environment. Mount the volume on each host, then use hostPath to mount it in the pods (https://kubernetes.io/docs/concepts/storage/volumes/#hostpat...).
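
Roughly what that looks like on the pod side, as a Python dict dumped to JSON (kubectl accepts JSON manifests). The `hostPath` volume and `volumeMounts` fields are real; the pod name, image and paths are placeholders, and this assumes the NFS share is already mounted at the same path on every node.

```python
import json

def pod_with_host_path(name, image, host_dir, mount_path):
    """Pod manifest exposing a directory that already exists on the node
    (for example an NFS share mounted there) inside the container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "volumeMounts": [{"name": "shared", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "shared",
                # "Directory" requires the path to already exist on the node.
                "hostPath": {"path": host_dir, "type": "Directory"},
            }],
        },
    }

# Hypothetical name, image and paths.
pod = pod_with_host_path("es-data", "example/elasticsearch:latest",
                         host_dir="/mnt/nfs/es", mount_path="/data")
print(json.dumps(pod, indent=2))
```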


We used a large amount of local physical storage spread pretty evenly over the machines in the cluster and then gave this to Longhorn to manage.

It pretty much takes care of everything else, but it does require some preventative TLC.


Can you recommend any good intro books?


Responded to a sibling comment with the same question here: https://news.ycombinator.com/item?id=31894095



