Is it me, or does it seem weird to encrypt your secrets by uploading the secret key to GCP (contained in the config .yaml file)? I assume the controller instances are operated by Google in this[1] example.
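For context, the .yaml in question looks roughly like this in the tutorial (a sketch; the aescbc data-encryption key is generated locally and then pushed to the controller instances, which is the part that feels odd):

    cat > encryption-config.yaml <<EOF
    kind: EncryptionConfig
    apiVersion: v1
    resources:
      - resources:
          - secrets
        providers:
          - aescbc:
              keys:
                - name: key1
                  secret: $(head -c 32 /dev/urandom | base64)
          - identity: {}
    EOF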
Moreover, is there any sensible way at all to encrypt secrets without baking the secret key into your image? I can’t think of any.
I want to deploy an app that makes use of one or more fairly important secrets, but I haven’t found a sensible way to make it auto-scale while keeping the secrets on-premise.
As far as I can see, the only sensible solution is to create in-cloud/off-premise secret keys that can only be accessed by images signed with an on-premise secret key.
So,
1. Create secret key on an offline, on-premise machine
2. Produce application image, transfer to offline machine, sign with on-premise secret key
3. Create off-premise (in-cloud) secret, which can only be accessed by images signed with the on-premise secret key
4. Upload app image and signature to the cloud, allowing only this image access to the in-cloud secret
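Roughly, with plain openssl as a stand-in (a sketch of steps 1, 2 and 4; step 3, actually binding the in-cloud secret to the signature check, depends on whatever policy mechanism your provider offers and isn't shown; all names are made up):

    # 1. on the offline, on-premise machine: generate the signing keypair
    openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:4096 -out onprem-signing.key
    openssl rsa -in onprem-signing.key -pubout -out onprem-signing.pub

    # 2. sign the digest of the built application image
    #    (the RepoDigest exists once the image has been pushed to a registry)
    docker inspect --format '{{index .RepoDigests 0}}' myapp:1.0 > image.digest
    openssl dgst -sha256 -sign onprem-signing.key -out image.sig image.digest

    # 4. in the cloud, the gatekeeper in front of the secret verifies the signature
    #    against the trusted public key before releasing the secret
    openssl dgst -sha256 -verify onprem-signing.pub -signature image.sig image.digest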
To do this sort of encryption at rest you need a root of trust. There are essentially three common solutions:
1. Keep the secret key in memory. This can be annoying though, as a system administrator needs to unlock every process as it comes up; automation can help but is rarely deployed. This is what Vault does, for example. There is a long design doc that we all worked on with this as an eventual option[1], but it is unclear if there is a ton of utility for users to be unlocking API servers.
2. Hide the secret key inside of hardware using something like an HSM or a TPM. This is really the state of the art however it doesn't really work well in many environments and requires good inventory management.
3. Give machines a stable cryptographic identity and exchange that identity for a secret key that is kept in memory. This solution sort of shifts the problem around but is generally a good tradeoff between automation and security. The idea is, similar to SSH host keys, on first boot a machine generates an identity and then a system administrator approves that identity and later gives that identity authorization to do tasks. Kubernetes has a lot of the groundwork in place for this and it is still underway[2].
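To make option 3 concrete, the groundwork referenced here is kubelet TLS bootstrapping: a new node generates a key and submits a certificate signing request on first boot, and an administrator (or an automated approver) grants it. A minimal sketch of the manual approval step:

    # list pending node CSRs, then approve the one for the new machine
    kubectl get csr
    kubectl certificate approve <csr-name>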
We are working on a solution to this problem in the Clevis project[0]. It is a basic FUSE filesystem that will transparently decrypt your secrets/configuration. It will evaluate your decryption policy on each open and log the attempt.
You can see the initial proof of concept[1]. It isn't secure yet, for a variety of reasons. But it is enough to play around with. Moving to a better encryption scheme will give us the ability to do locks and per-block validation.
In AWS the combination of IAM roles + Parameter Store gives us a pretty decent version of this. We have the app talk to Parameter Store to get secrets, then just keep them in memory.
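A minimal sketch of that startup lookup, assuming the instance's IAM role is allowed to read the (made-up) parameter path:

    # fetch a SecureString parameter at process start; the value stays in the
    # process environment / memory only, never on disk
    DB_PASSWORD="$(aws ssm get-parameter \
        --name /myapp/prod/db_password \
        --with-decryption \
        --query 'Parameter.Value' \
        --output text)"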
I can highly recommend OpenShift and its Ansible deployment scripts. Documentation is very well-written and complete.
It takes care of all the annoying parts of Kubernetes and even ships services like a full-featured Docker registry with ACLs, a Docker build system, and a centralized logging mechanism (all optional, of course).
Going through the previous version of this tutorial really helped me, even though we're doing IBM Cloud private on-prem + Bluemix Container Service (don't ask.)
It works pretty well with Cloud Shell, in case you have corporate firewall issues. If your session is interrupted, run the commands that set the compute region and zone again.
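For reference, those commands are along these lines (the region/zone values are just the tutorial's defaults):

    gcloud config set compute/region us-west1
    gcloud config set compute/zone us-west1-c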
I can confirm that it costs about $6/day while the machines are provisioned, and it is well worth it, but remember to run all the clean-up steps in the last chapter when you're done, or if you're not going to finish it right away.
This is great. I have been looking at Kubernetes for some time and have struggled with adapting it to our deployment model. A lot of the tools and tutorials want someone to sit and run commands in order to start controllers and worker nodes, but that doesn't make sense in our automated environment. What we really want is a way to bake AMIs etc. that have everything ready to go, so that when we do a deployment or scale out it is as simple as starting an instance. This collection of labs lays a lot of that out and I think this is something we can work with.
We have, and as far as we could see it is a command line tool that needs to be run manually. This model doesn't work for us as we scale instances in and out by the dozens throughout the day. I could be missing something but anything that requires command line interaction isn't viable.
For instance, if we see that our worker cluster has reached some consumed-capacity threshold (CPU, memory, etc.) then we need to add more worker nodes. The way we do this is via Auto Scaling Groups. Auto Scaling Groups work on launch configs, which are essentially an AMI plus instance user data. How to get an AMI that just has the Kubernetes stuff and can be started at will and told to join a cluster at runtime has not been clear from the documentation I've read.
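For what it's worth, one way to get there is a pre-baked AMI with the kubelet and container runtime already installed, plus user data that joins the cluster at boot. A hypothetical kubeadm-based sketch (the API server address, token, and CA hash are placeholders):

    #!/bin/bash
    # instance user data: everything heavy is already baked into the AMI,
    # so joining the existing cluster is a single command at boot
    kubeadm join 10.0.0.10:6443 \
        --token <bootstrap-token> \
        --discovery-token-ca-cert-hash sha256:<ca-hash>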
Thanks, I have looked at this as well. There are a few issues:
- It only scales worker nodes
- It only scales nodes if a pod can't be scheduled. This is too late for us, as our requirements are to scale when resource utilization crosses a threshold (i.e. reserved memory is over 70%).
- It still doesn't solve the baked AMI problem. I may be wrong about this, but the AWS docs for CA are lacking. For instance, when does the ASG/launch config get created? Is that something it does, or does it need to be done ahead of time? Which AMI does it use? etc.
1) Yes, but I'm not sure why you would want to scale other nodes (I assume you're referring to the masters?)
2) Yes, the cluster-autoscaler will give you more node capacity. If you want to scale Pods on a CPU basis, you can use a Horizontal Pod Autoscaler (https://kubernetes.io/docs/tasks/run-application/horizontal-...). I believe it now supports custom metrics as well, so you can scale on any resource threshold.
3) The ASG/Launch Config is created by kops when you create the cluster, and is also manageable through the kops tooling. It will default to the kops default AMI (which includes the kubelet and everything you need for k8s to run), but you can also override that with a different AMI if you have a k8s-compatible AMI that better fits your needs.
To expand on the sibling comment by nrmitchi, in Kubernetes you don't scale your worker nodes by CPU or memory. Instead you scale your pods by CPU or memory. Running the cluster autoscaler ensures that when you run out of space to schedule a new pod new worker nodes are brought up. It also works the other way; when your nodes are being underutilized for a while, the autoscaler will kill some nodes (and the scheduler will automatically handle placing any pods on those nodes onto other nodes).
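A minimal sketch of that split, with the pod side handled by an HPA (the deployment name and thresholds are made up); the cluster autoscaler then adds or removes nodes as the pod count pushes against capacity:

    # scale pods on CPU; the cluster autoscaler takes care of node capacity behind it
    kubectl autoscale deployment myapp --cpu-percent=70 --min=3 --max=20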
Right, I see that in the CA docs. This doesn't work with our business requirements, since our workloads are spiky and we don't want to wait for pods to become unschedulable before more nodes are started.
A contrived example: imagine that for every user who logs into Gmail, Google spins up a pod for that user. Now imagine there isn't any capacity to schedule a pod for a user who is logging in. That user would either be denied or have to wait for an instance to come up and then for the pod to be scheduled. Not ideal.
Yes, this is my biggest problem with the autoscaler as well. I opened an issue about this almost a year ago, looks like they're finally getting around to addressing it: https://github.com/kubernetes/autoscaler/pull/77
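One workaround people use in the meantime is to keep explicit headroom: run a placeholder deployment of do-nothing pause pods with real resource requests, so the autoscaler brings nodes up ahead of actual demand; when real workloads need the room you scale the placeholder down (or, once pod priorities are available, let it be preempted). A rough sketch (replica count and request sizes are made up):

    # a deployment of pause containers that only exists to reserve capacity
    kubectl run headroom --image=gcr.io/google_containers/pause-amd64:3.0 \
        --replicas=5 --requests='cpu=500m,memory=512Mi'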
We built a Kubernetes platform installer[1] using Terraform which works fairly well and I think is more white box because it uses Terraform directly. However, we are still dealing with some of the necessary Terraform language repetition in our modules.
I did something similar off of CoreOS's tutorial. So while I'm missing a lot of understanding of the newer functionality, going through this was worth it.
Another vote for CoreOS's repo; it was how I _really_ loaded k8s into my head, because the vagrant setup is multi-machine but _local_ and thus one can see what's going on and destroy-recreate it at will.
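The destroy-recreate loop being, from the directory with the Vagrantfile:

    # tear the local multi-machine cluster down and bring it back up from scratch
    vagrant destroy -f && vagrant up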
[1] https://github.com/kelseyhightower/kubernetes-the-hard-way/b...