Kubernetes SidecarContainers feature is merged (github.com/kubernetes)
217 points by xdasf on July 10, 2023 | 59 comments



While this is a very welcome improvement in terms of functionality, I can't help but feel that the re-use of "restartPolicy" to mean something similar, but different, when used in a different context, is a very poor decision.

Kubernetes already has an issue with having a (perceived) high barrier to entry, and I'm not sure that "restartPolicy on a container means this, unless it's used in this list of containers, in which case it means that" helps with that.

I would have preferred to see a separate attribute (such as `sidecar: true`), rather than overloading (and in my opinion, abusing) the existing `restartPolicy`.


The challenge with a separate attribute is that it is not forward compatible with new features we might add to pods around ordering and lifecycle. If we used a simple boolean, eventually we’d have to have it interact with other fields and deal with conflicts between what “sidecar” means and that added flexibility.

The only difference today between init containers and regular containers is:

a) init containers have an implicit default restart policy of OnFailure, and regular containers inherit the pod's restartPolicy

b) init containers are serial, regular containers are parallel

We are leaving room for the possibility that init containers can fail the pod, and be parallelized, as well as regular containers having unique restartPolicies. Both of those would allow more control for workflow / job engines to break apart monolith containers and get better isolation.

The key design point was that “sidecars aren’t special containers” - because we want to leave room for future growth.
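
To make that concrete: under this design a sidecar is just an init container whose new per-container restartPolicy is set to Always. A rough sketch (container names and images are made up, and the field is only accepted with the SidecarContainers feature gate enabled):

  apiVersion: v1
  kind: Pod
  metadata:
    name: app-with-sidecar
  spec:
    initContainers:
      - name: log-shipper            # hypothetical sidecar
        image: example/log-shipper
        restartPolicy: Always        # the new field: keeps this init container running for the pod's lifetime
    containers:
      - name: app                    # ordinary container, unchanged
        image: example/app

Everything else is a plain pod spec, which is what “sidecars aren’t special containers” buys us.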


It's par for the course. Most of K8s's design has been shoving whatever crap they feel like in, regardless of confusion, difficulty, complexity, etc for the end user.


At some level it seems deliberate so that administration of the complexity can be sold to you for a price once you realise that you can't hack it on your own, but are now too invested to back out.


I've been brushing up on my Kubernetes knowledge recently and came across so much gross stuff like this. "If field X is set to Y, then value Z for key V is invalid." Jesus christ. I wish they put more effort into approachability.


A very welcome change. It's gonna be helpful for the case where the database proxy (CloudSQL) and the main container get terminated out of order.

https://cloud.google.com/sql/docs/postgres/connect-kubernete...


That is very annoying. I remember having spent some time with this same issue in Google App Engine as well, which also runs Cloud SQL Proxy as a sidecar container.

https://github.com/GoogleCloudPlatform/cloudsql-proxy/issues...


The fact that this is needed for so many different things Google pushes, and that it has been so slow to land, has been very frustrating and telling.


Just FYI for people who don't know about it yet: with cloudsql-proxy v2 there's a new parameter called "--quitquitquit" that starts up an HTTP endpoint to be used for graceful shutdowns. Basically your main container makes a POST to this endpoint, and the sidecar exits.
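
For example, in a Job you can wrap your main command so it hits that endpoint when it's done. A sketch (the image tag, instance name, and the admin port/path are assumptions on my part - check the proxy docs for your version):

  apiVersion: batch/v1
  kind: Job
  metadata:
    name: db-migrate
  spec:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: cloud-sql-proxy
            image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2   # hypothetical tag, use the current v2 image
            args: ["--quitquitquit", "my-project:my-region:my-instance"]
          - name: migrate
            image: example/migrate                                 # hypothetical image
            command:
              - sh
              - -c
              # run the work, then tell the proxy to shut down
              # (9091 is the default admin port, if I remember right)
              - ./migrate && curl -s -X POST localhost:9091/quitquitquit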


On the one hand, great.

On the other hand, one of the main criticisms of Kubernetes is that it has no composition or orchestration capabilities. It's great at defining pieces of state, but managing blocks of state & multiple things at once is left almost entirely to external tools.

The ability to compose & sequence multiple containers feels like a very specific example of a much broader general capability. There's bedevilling, infinite complexity in trying to figure out a fully expressive state management system - I get why refining a couple of specialized existing capabilities is the way to go - but it does make me a little sad to see a lack of appetite for the broader cross-cutting system problem at the root here.


Yeah I work on the team that builds Amazon Elastic Container Service so I can't help but compare this implementation with how we solved this same problem in ECS.

Inside of an ECS task you can add multiple containers and on each container you can specify two fields: `dependsOn` and `essential`. ECS automatically manages container startup order to respect the dependencies you have specified, and on shutdown it tears things down in reverse order. Instead of having multiple container types with different hardcoded behaviors there is one container type with flexible, configurable behavior. If you want to chain together 4 or 5 containers to start up one by one in a series you can do that. If you want to run two things in parallel and then once both of them have become healthy start a third you can do that. If you want a container to run to completion and then start a second container only if the first container had a zero exit code you can do that. The dependency tree can be as complex or as simple as you want it to be: "init containers" and "sidecar containers" are just nodes on the tree like any other container.
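
Roughly, in CloudFormation-ish YAML (a sketch with made-up names; the HEALTHY condition assumes the envoy container defines a health check):

  ContainerDefinitions:
    - Name: init-db                # runs to completion first
      Image: example/db-init
      Essential: false
    - Name: envoy                  # long-running sidecar, starts only after init-db exits 0
      Image: example/envoy
      Essential: true
      DependsOn:
        - ContainerName: init-db
          Condition: SUCCESS
    - Name: app                    # starts only once envoy reports healthy
      Image: example/app
      Essential: true
      DependsOn:
        - ContainerName: envoy
          Condition: HEALTHY

On shutdown the same tree is walked in reverse, so the app drains before envoy goes away.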

In some places I love the Kubernetes design philosophy of more resource types, but in other aspects I prefer having fewer resource types that are just more configurable on a resource by resource basis.


Your approach sounds a lot like systemd's, with explicit dependencies in units coupling them to each other.

It's pretty cool how one can have a .device or whatnot that then wants a service: plug in a device and its service starts. The arbitrary composability enables lots of neat system behaviors.


As a consumer, ECS + Fargate is my happy path. I appreciate the lack of complexity. Thanks.


Deploying Fargate with CDK has to have been the most pleasant developer experience I have ever had with any product so far.

If image caching becomes a reality with Fargate I can't imagine a need to ever use anything else

https://github.com/aws/containers-roadmap/issues/696


So I can give some behind the scenes insight on that. I don't think image caching will be a thing in the way people are explicitly asking, but we are exploring some alternative approaches to speeding up container launch that we think will actually be even more effective than what people are asking for.

First of all we want to leverage some of the learnings from AWS Lambda, specifically some research we've done that shows that about 75% of container images only contain 5% unique bytes (https://brooker.co.za/blog/2023/05/23/snapshot-loading.html). This makes deduplication incredibly effective, and allows the deployment of a smart cache that holds the 95% of popular recurring files and file chunks from container images, while letting the unique 5% be loaded over the network. There will be outliers of course, but if you base your image off a well-used base image then it will already be in the cache. This is partially implemented. You will notice that if you use certain base images your Fargate tasks seem to start a bit faster. (Unfortunately we do not really publish this list or commit to what base images are in the cache at this time.)

In another step along this path we are working on SOCI Snapshotter (https://github.com/awslabs/soci-snapshotter), forked off of Stargz Snapshotter. This allows a container image to have an attached index file so it can start up before all the contents are downloaded, lazy-loading remaining chunks of the image as needed. This takes advantage of another aspect of container images, which is that many of them don't actually use all of the bytes in the image anyway.

Over time we want to make these two pieces (deduplication and lazy loading) completely behind the scenes so you just upload your image to Elastic Container Registry and AWS Fargate seems to magically start your image dramatically faster than you could locally if downloading the image from scratch.


Ditto. ECS/Fargate has always been the easiest, most flexible, most useful containerization solution. It's the one AWS service with the most value to containerized services, and the least appreciated.


There was a pretty big feature gulf between it and K8s when it first launched. I found myself wishing I had a number of Kubernetes controllers initially (Jobs (with restart policies), CronJobs, volume management, etc.).

From what I've heard they've made a great many quality-of-life improvements, but as is often the case it can be hard to regain share when you've already lost people.


In general, the intent here is to leave open room for just that.

dependsOn was proposed during the KEP review but deferred. But because init containers and regular containers share the same behavior and shape, and differ only on container restart policy, we are taking a step towards “a tree of container nodes” without breaking forward or backward compatibility.

Given the success of mapping workloads to k8s, the original design goal was to not take on that complexity, and it’s good to see others making the case for bringing that flexibility back in.


I've a question that I've been wondering about for a while. Why does ECS impose a 10-container limit on a task? It proves very limiting in some cases, and I've had to find hacky workarounds like dividing a task into two when it should all have lived and died together.


I like it this way to be honest. We needed to create a custom controller for Dask clusters consisting of a single scheduler, an auto-scaling set of nodes, an ingress and a myriad of secrets, configmaps and other resources.

It wasn’t simple, but with Metacontroller [1] it was relatively easy to orchestrate the complex state transitions this single logical resource needed and to treat the whole thing as a single unit.

I’m not saying Kubernetes can’t make simple patterns easier, but baking it into core leads to the classic “tragedy of the standard library” problem where it becomes hard to change that implementation. And the k8s ecosystem is definitely all about change.

1. https://metacontroller.github.io/metacontroller/intro.html


This is all true, and if you read the KEPs they were thinking about this. One camp was advocating for solving the problem of specifying the full dependency graph spec (of which sidecars are one case), another advocating for just solving the most needed case with a sidecar-specific solution to get a solution shipped. The latter was complicated by a desire to at least leave the door open for the former.

Pragmatism won out, thankfully IMO.

Edit to add: see this better description from one of the senior k8s maintainers: https://news.ycombinator.com/item?id=36666359


There's lots of tools built on top of K8s to accomplish this tho. For example, Argo, Tekton, Flyte etc.


Absolutely, no shortage of things atop. Helm is probably the most widely used composition tool.

It seems unideal to me to bunt on this topic forever, leaving it out of core. Especially when we are slowly adding very specialized composition/orchestration tools in core.


Requiring the user to write their own operators to manage state using the kubernetes api is very much a feature and not something which is missing.


Agreed that is a feature and not a bug.

But! The one thing that custom orchestrators can’t do is easily get the benefit of kubelet isolation of containers and resource management. Part of slowly moving down this path is to allow those orchestrators to get isolation from the node without having to reimplement that isolation. But it will take some time.


Oh I see orchestrating runtimes is quite different. Good points!


Helm really solves a different use case than this.

This is about describing the desired coordination among running containers. Helm is about how you template or generate your declarative state. You could certainly add this description to your templates with Helm, but you couldn't actually implement this feature with Helm itself.


I bundled both composition & orchestration under the same header.

It so happens that pods have multiple containers, which is another example of Kubernetes having a specialized, specific composition or orchestration implementation. One that started as composition, and here iterates towards orchestration.


Composing blocks of state may not end up producing more reliable software. Each block of state is managed by independent processes that may interact with each other (example: horizontal pod autoscalers are not directly aware of cluster-autoscaler). The whole system is more like an ecology or a complex adaptive system than something you can reason about directly with abstractions.

In the Cynefin framework (https://en.wikipedia.org/wiki/Cynefin_framework), you can reason through "complicated" domains the way you are suggesting, but that will not work in the "complex" domain. And I think what Kubernetes helps manage is in the "complex", not the "complicated", domain.


Orchestration of k8s wouldn't be necessary if they had made K8s' operation immutable. As it stands now you just throw some random YAML at it and hope for the best. When that stops working, you can't just revert back to the old working version, you have to start throwing more crap at it and running various operations to "fix" the state. So you end up with all these tools that are effectively configuration management tools to continuously "fix" the cluster back to where you want it.

I hope the irony is lost on no one that this is an orchestration tool for an immutable technology, and the orchestrator isn't immutable.


You can use GitOps (e.g. Flux CD) to revert to previous cluster states.


If you wanted to do the opposite of what I'm saying, sure


Worth noting that this is hitting Alpha in Kubernetes 1.28, so won't be available by default at this stage.

If you've got self-managed clusters, it'd be possible to enable with a feature gate on the API server, but it's unlikely to be available on managed Kubernetes until it gets to GA.
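
For anyone who wants to kick the tires locally, one way (assuming a 1.28+ node image; kind's featureGates setting applies cluster-wide, as far as I know) is a kind config like:

  kind: Cluster
  apiVersion: kind.x-k8s.io/v1alpha4
  featureGates:
    SidecarContainers: true

and then `kind create cluster --config sidecar-test.yaml`.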


In case anyone else was looking for a clear, concise summary of the new feature:

"The new feature gate "SidecarContainers" is now available. This feature introduces sidecar containers, a new type of init container that starts before other containers but remains running for the full duration of the pod's lifecycle and will not block pod termination."


thank you!


It's a shame it took so long. If the main container's shutdown (i.e. connection draining, processing in-flight queue items) takes a while and your service mesh sidecar dies first (nice Go binary), the main container can no longer communicate with the internet.

But I'm not sure about initContainers being used. The init keyword implies it'd run and exit in order for others to continue. Using restartPolicy with init instead of a dedicated sidecars field feels weird.


We did that to leave open more complex ordering of both init containers and sidecars (regular containers do not have a restart order). For instance, you might have a service mesh that needs a vault secret - those both might be sidecars, and you may need to ensure the vault sidecar starts first if both go down. Eventually we may want to add parallelism to that start order, and a separate field would prevent simple ordering from working now.

Also, these are mostly just init containers that run longer; you want a sidecar that fails to start to be able to block the regular containers; and adding a new container type (like ephemeral containers) is extremely disruptive to other parts of the system (security, observability, and UI), so we looked to minimize that disruption.
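
A sketch of what that ordering looks like in practice (names, images, and the probe are illustrative): sidecars keep the existing serial init order, and with restartPolicy: Always the kubelet waits for startup, not completion, before moving on.

  spec:
    initContainers:
      - name: vault-agent            # hypothetical: must be up before the mesh proxy
        image: example/vault-agent
        restartPolicy: Always
        startupProbe:                # startup completion is what gates the next init container
          httpGet:
            path: /healthz           # hypothetical endpoint
            port: 8200
      - name: mesh-proxy             # hypothetical: started once vault-agent reports started
        image: example/mesh-proxy
        restartPolicy: Always
    containers:
      - name: app
        image: example/app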


Without restart policy, a failing init container is retried forever. With a policy of never, the entire pod is marked as having failed. The init containers still have to run and succeed before the main pod continues.


Any documentation on this? What does this mean?


So, until now, a sidecar container was just the idea of running containers in your Kubernetes pod, along with your main service, that were 'helpers' for something: connection to databases or VPNs, mesh networking, pulling secrets or config, debugging... But they didn't have special status, they were just regular containers in your pod.

This sometimes posed problems because they weren't available for the full life cycle of the pod, notably during the init phase. So if your init containers needed secrets, connections, or networking that was being provided via a sidecar container, you were going to have a hard time.

With this change, among other things, sidecar containers are going to be available for the whole life cycle of the pod.

There are other implications, probably, but I still haven't finished reading the KEP [0]. Check it out, and there you'll find its motivation and several interesting examples.

  0: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/753-sidecar-containers
Edit: corrected syntax


The KEP (Kubernetes Enhancement Proposal) is linked to in the PR [1]. From the summary:

> Sidecar containers are a new type of containers that start among the Init containers, run through the lifecycle of the Pod and don’t block pod termination. Kubelet makes a best effort to keep them alive and running while other containers are running.

[1] https://github.com/kubernetes/enhancements/tree/master/keps/...


KEP: https://github.com/kubernetes/enhancements/tree/master/keps/...

TLDR: Introduce a restartPolicy field to init containers and use it to indicate that an init container is a sidecar container. Kubelet will start init containers with restartPolicy=Always in order with the other init containers, but instead of waiting for their completion, it will wait for container startup completion.


Hopefully these changes should make Envoy sidecars (and sidecar co-existence in general) more reliable.


What's the use-case for Envoy as a sidecar? (As someone using Envoy Gateway.)


Service mesh. Istio for example injects an envoy sidecar to each pod, and manages everything "meshy" (routing, retries, mTLS, etc) via this sidecar.

This is also how linkerd works, though they're using their own purpose-built sidecar rather than envoy.


Isn’t the new ambient mesh available in Istio 1.18 sidecar-less?


Yep, it’s also alpha, under intense development, and by every account (including those vendors who are chomping at the bit to start selling it to customers) absolutely not production ready.


Oh, good to know. I was about to pack a spike into an upcoming sprint.


It seems like there could be a better marker for this. Maybe my skill with Kubernetes is too low for it to make sense.


When I first learned about the sidecar pattern I thought it was great. I am not sure about it anymore. Most of it could be propagated to custom images or layers at the boundary. To me this feels a bit sketchy: to have containers that are kinda part of the mesh but then don't share the same lifecycle as the mesh.


If you create a custom image you would need to create a complex health endpoint that is essentially only considered healthy if all the components baked into your image are considered healthy. This gets harder when you are not the author of the sidecar process on which you rely. With a single image it would be easier to run into a situation where the sidecar process (baked into your image) is in an unhealthy state but your container is not restarted because the app itself is not reporting unhealthy status.


Monolith apps can have many dependency checks and it is not really an issue, but I get your point. It can become messy. What I have seen go into sidecars is TLS termination, caching, authentication, service clients, metrics, and logging. Things I would prefer to have in a dedicated proxy layer or in the images.


> Pod is terminated even if sidecar is still running

this is great for things like Jobs and Istio

eliminates the scheme where the main container had to signal to the sidecar it was exiting otherwise the pod would hang


Yep, I was looking into running Jobs with sidecars a while back and came across this issue. I was actually surprised this morning to see a link on HN be in the "already read" state. Nice to see this feature merged, however our cluster is on 1.25 I think? Probably a ways away from being able to use this.


Is there a clean way to share an emptyDir between sidecar(s) and main container(s)?

Looking at the logging usecase and want to be able to add a log shipper sidecar to a pod with ephemeral storage.
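
From my reading of the KEP, yes - a sidecar is still just a container in the pod, so the usual shared emptyDir works. A sketch (untested; names and images are made up):

  spec:
    volumes:
      - name: app-logs
        emptyDir: {}
    initContainers:
      - name: log-shipper
        image: example/log-shipper
        restartPolicy: Always        # sidecar: lives for the whole pod lifecycle
        volumeMounts:
          - name: app-logs
            mountPath: /var/log/app
    containers:
      - name: app
        image: example/app
        volumeMounts:
          - name: app-logs
            mountPath: /var/log/app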


An easier solution for you might be something like Vector, which will automatically harvest the logs from pods and has excellent routing capabilities.

You wouldn’t need a sidecar-per-pod this way either.


This is great. My team at Netflix (I'm no longer there) sponsored some of the work behind this, via Kinvolk (now acquired by MSFT). Great to see that it finally shipped. At the time, this was a blocker to us using Kubelet, and we thought it might take a few...months to sort out. Turns out it was closer to a few years, but it's a tricky API, and important to get right.


The lack of native sidecar support was my biggest surprise when moving from ECS to EKS, and it was not fun hacking with shared process IDs to accomplish sidecars. I'm glad this is finally in, but also curious how it took roughly 3ish years(?) from KEP proposal to merge.


How does the syntax look for defining a sidecar in a deployment? Is it similar to initContainers?



