Horrors of Using Azure Kubernetes Service in Production

summarity · on Aug 6, 2018

My $DAYJOB is leading a team which develops applications and gateways (for the 1k+ employee B2B market) that integrate deeply with Azure, Azure AD and anything that comes with it. We do have Microsoft employees (who work on Azure) on our payroll, too.

I can tell you, as I'm sure anyone in my team can, that Azure is one big alpha-stage amalgation of half-baked services. I would never ever recommend Azure to literally any organization no matter the size. Seeing our customers struggle with it, us struggle with it, and even MS folks struggle with even the most basic tasks gets tiring really fast. We have so many workarounds in our software for inconsistency, unavailability, questionable security and general quirks in Azure that it's not even funny anymore.

There are some days where random parts of Azure completely fail, like customers not being able to view resources, role assignments or even their directory config.

An automatic integration test of one of our apps, which makes heavy use of Azure Resource Management APIs, just fails dozens of times a week not because we have a bug, but because state within Azure didn't propagate (RBAC changes, resource properties) within a timeout of more than 15 minutes!

Two weeks back, the same test managed to reproducibly produce a state within Azure that completely disabled the Azure Portal resource view. All "blades" in Azure just displayed "unable to access data". Only an ultra-specific sequence of UI interactions and API calls could restore Azure (while uncovering a lot of other issues).

That is the norm, not the exception. In 1.5 years worth of development, there has never been a single week without an Azure issue robbing us of hours of work just debugging their systems and writing workarounds.

/rant

On topic though, we've had good experiences with these k8s runtimes:

- GKE

- Rancher + DO

- IBM Cloud k8s (yeah, I know!)

blablabla123 · on Aug 7, 2018

Haha... I have experience with Azure as well, seen both good and bad things. As I read the title, I was already quite sure to read such a post. When Kubernetes became popular, I tried it with Azure and both scripts and documentation were broken. When I found out, I stopped trying.

Regarding Azure in general: Azure Websites is c*. Having used Heroku and App Engine for some time before, this feels like a joke. Deployments sometimes work, sometimes they don't. Have to deal with node gyps? Don't, just don't. If you ever are forced to use Azure Websites (free startup package? ;)), learn Ansible as soon as possible and convince your team to switch to VMs.

The VMs are okay, you can't do much wrong with. I don't really know where the complexity of Azure Websites comes from, maybe from the fact that it runs on Windows, but this cannot be the full explanation. I have seen people work with node on Windows (even without Ubuntu on Windows) and they were fine. For anyone interested, this is the Azure Websites backend: https://github.com/projectkudu/kudu

Disclaimer: my long adventure with it was years ago, maybe the service has changed 100% but I doubt it

RoadieRoller · on Aug 6, 2018

My Team uses CosmosDB heavily (so far, and not too far anymore) and it is another half-baked service. The support people are Indians (Microsoft outsourced Azure support to an Indian company - MindTree) and are not very knowledgeable on the CosmosDB service. They always point us to the URI of a web article (of course from Microsoft) and says everything will work if you follow the article, and close the ticket. Over the time, we understood that we know more than them on CosmosDB, and used to ignore their replies, but nevertheless raise tickets to make sure that they are aware.

megaman22 · on Aug 7, 2018

I generally like Microsoft, but their official support channels are pretty terrible. Just go looking for something in the MSDN forums - a large percentage of the posts from "Microsoft" people are telling the customers/developers that they posted in the wrong forum (often incorrectly), or suggesting some inane thing that the original post already specified and then closing the thread. GitHub issues are slightly better, although if you get off the beaten path of the new hotness, responses get very thin.

You're much better off trying to find some backchannels via MVPs on Twitter or through blogs, or figure out the developers or evangelists that give talks on this kind of stuff and contact them directly.

gamblor956 · on Aug 6, 2018

My company uses Azure quite heavily, but not Kubernetes.

No crashes, ever. Way more reliable than AWS ever was. (GCP is our failover.)

So it seems that your experience is, from my POV, the exception. Maybe there's something wrong with the way you guys have Azure set up?

politician · on Aug 6, 2018

Which provisioning model are you using -- ASM or ARM? The last time I used Azure we used the (deprecated) ASM model which was pretty stable instead of the newer and often broken ARM model. We ended up staying on the deprecated model until we moved to AWS (for unrelated reasons).

electricEmu · on Aug 7, 2018

Wow, our experiences were different. I found ARM to be a great tool compared to ASM.

The moment I removed the last ASM bits, my entire infrastructure became reliably versioned and deployable.

politician · on Aug 7, 2018

That's really great to hear!

jgalentine007 · on Aug 9, 2018

This has been my experience as well, I was surprised to read about some of the issues that were posted. Been using azure app service for about 50 .net/core services for over a year with 100% uptime. Guess I'm just lucky!

partiallypro · on Aug 7, 2018

In most cases the cause is on the DevOps team and not on Azure, GCS or AWS. I can attest to that in screwing up some configs early on. That being said this is a very new offering from Microsoft and is possible it has some kinks to work out.

Spoom · on Aug 7, 2018

The biggest red flag I saw when I was working with Azure is that I noticed that a lot of their CLI commands ("az do-a-thing") actually have the pattern of "retry until it works"... and the first few tries often actually fail!

brianafrank · on Aug 7, 2018

Thanks for the shout out to IKS (ibm cloud). you have no idea how obsessed we are with this service so it's always great to see someone noticed. :)

merqurio · on Aug 7, 2018

We were very surprised with the quality of IBM's Kubernetes service too ! We had a cluster there for almost a year and everything runned very smoothly.

We missed having more instance types to choose from, but it was a nice experience.

danberg · on Aug 8, 2018

We are constantly working to add more machine-types based on customer requests. Recently we have added several new flavors including bare metal systems. You may want to check them out. https://console.bluemix.net/docs/containers/cs_clusters.html...

mijoharas · on Aug 7, 2018

So, just to understand, are you writing the software because the large business clients are already tied to azure and gonna keep using it? Trying to understand why there is a market, and why people would want to use it.

ripberge · on Aug 7, 2018

Can't speak to the Azure Resource Management API's but I've had a very different experience with Web Apps, Azure SQL, Storage, Traffic Manager. Other than a few short-lived bouts of missed writes into Service Bus and Cosmos (their worst product, IMO) the platform has achieved 100% uptime for us for several years. Pretty amazing. From what you're describing I'm guessing you're REALLY not their average user and they de-prioritize quality on your use cases. Probably better off on another platform if you can help it.

blablabla123 · on Aug 7, 2018

Some stuff works really good, I had very good experience with the Table and Blob Storage, Traffic Manager as well. But seriously, these are pretty basic services. ;-) What did you host on Web Apps if I can ask?

ripberge · on Aug 7, 2018

It's basically a SASS eCommerce app. Kind of an enterprise class OLTP solution, but with pretty unusual traffic bursts and due to what we're selling very high DB contention. We test with 100,000 concurrent users, have achieved about 10,000 orders per minute--much more than our credit card gateway (Stripe) will allow us to process. There are some rough edges here and there, but I've experienced just as bad and sometimes worse with AWS.

lovich · on Aug 7, 2018

I've had wildly different results. My shop wasn't large by any means but azure worked pretty much perfectly for us. The only issue we ever had was when changing the size on an azure swl db went from a 20 minute operation to taking up to an hour sometimes. Other than that it let us scale as we wanted and have the engineers duplicate environments with their local changes arbitrarily. Gave us a 400 dollar/month bill for something that would have taken a full devops to handle with 10 engineers

Bombthecat · on Aug 7, 2018

With DO you mean digital Ocean?

lloeki · on Aug 7, 2018

I guess so since RancherOS is available as a fully supported option (as well as Fedora Atomic and CoreOS).

FWIW we're running a simple, custom cluster made of Debian droplets set up using kubeadm.

alexpi · on Aug 7, 2018

I think DO stands for DC/OS in this context

summarity · on Aug 7, 2018

No, I actually meant DigitalOcean. We tested both Rancher (k8s) and Portainer (Swarm) backed by DO Infrastructure. Both worked well. Of course it's not a managed solution, but both are operationally very easy. DO also announced native k8s support, so I'm excited for that.

sitepodmatt · on Aug 7, 2018

It's in beta and closed access at the moment though https://www.digitalocean.com/products/kubernetes/ Would be interested as test/QA environment but I guess combined with 'CPU Optimized Droplets' could work for production loads too

eip · on Aug 8, 2018

> That is the norm, not the exception.

Have you never used any other Microsoft software?

I mean it is the same software company that made Windows ME, Vista, 7, and 10, along with countless other chocolate covered turds.

JediPig · on Aug 7, 2018

i rarely comment. I was going to give a thumbs up on testing azure for k8s. I am removing azure from the list perm. After reading these horror stories, azure just killed itself as a cloud provider. GC & AWS. AWS support engineers bend over backwards for us.

QiKe · on Aug 7, 2018

(Eng lead for AKS here) While lots of people have had great success with AKS, we're always concerned when someone has a bad time. In this particular case the AKS engineering team spent over a day helping identify that the user had over scheduled their nodes, by running applications without memory limit, resulting in the kernel oom (out of memory) killer terminating the Docker daemon and kubelet. As part of this investigation we increased the system reservation for both Docker and kubelet to ensure that in the future if a user over schedules their nodes the kernel will only terminate their applications and not the critical system daemons.

wpietri · on Aug 7, 2018

Does it seem weird to anybody else that a vendor would semi-blame the customer in public like this? I can't imagine seeing a statement like this from a Google or Amazon engineer.

It also doesn't seems to ignore a number of the points, especially how support was handled. I think it's bad form to only respond to the one thing that can be rebutted, ignoring the rest. And personally, I would have apologized for the bad experience here.

ummonk · on Aug 7, 2018

While it might be phrased in a way that implies the customer is partly to blame, the actual details would indicate the main problem was with Azure Kubernetes Service. Critical system daemons going down because the application uses too much memory is not a reasonable failure mode (and the AKS team rightfully fixed it).

wpietri · on Aug 7, 2018

Exactly. The whole point of offering a service to the public is that you know more than other people. So of course customers will do wrong things, be confused, etc.

In Microsoft's shoes, I would have strongly avoided anything that sounded like customer blame. E.g.: "We really regret the bad experience they had here. They were using the platform in a way we didn't expect, which led to an obviously unacceptable failure mode. We appreciate their bringing it to our attention; we've made sure it won't happen going forward. We also agree that some of the responses from support weren't what they should have been and will be looking how to improve that for all Azure users."

The goal with a public statement like this isn't to be "right". It isn't even to convince the customer. It's to convince everybody else that their experience will be much better than what is hopefully a bad outlier. The impression I'm left with is that a) Azure isn't really owning their failures, and b) if I use their service in a way that seems "wrong" to them, I shouldn't expect much in the way of support.

Drdrdrq · on Aug 7, 2018

...and apparently forgot to notify customer of it, and in general communicate with customer better.

I think this is the main reason for AWS lead. They simply treat customers right (well, better than G and MS anyway).

rblatz · on Aug 7, 2018

Yet Azure is the top cloud provider, and AWS is #2.

nemothekid · on Aug 7, 2018

They are number one because Microsoft doesn’t break out Azure and Office 365 revenue.

cthalupa · on Aug 7, 2018

My understanding, which may be incorrect, is also that they consider all SPLA revenue as cloud revenue as well.

(SPLA is the licensing paid by service providers to lease their customers infrastructure running Microsoft products. So if you pay some VPS or server provider $30/mo or whatever they charge for Server 2012, and they turn around and send $28 of it to MS or whatever, MS reports that $28 as cloud revenue)

ickler9 · on Aug 7, 2018

Well, this is on the front page, the top comment is misinformation, the posters left out details that made them look bad, and they seem to be going on a smear campaign out of spite on every platform they have. at what point is any of this in good faith?

wpietri · on Aug 7, 2018

What makes you think it's not in good faith? As far as I can tell, Prashant Deva had a series of bad experiences on Azure, including significant downtime. He's mad, and he's saying so.

From his perspective he was using it right; from Azure's apparently he was using it wrong. A difference in perspective isn't bad faith.

ickler9 · on Aug 8, 2018

Probably the part where he doesn't actually ever say it's a difference in perspective- that's your take. He says AKS is terrible, etc, etc. You're giving him a benefit of the doubt, which I appreciate, but he's gone too far in his bias. Maybe underlying it is a real issue, that clearly hackernews wants to indulge, but the threshold has been crossed.

wpietri · on Aug 8, 2018

He doesn't have to say it's a difference in perspective. He's giving his perspective. That's what blog posts generally are.

I note that you don't say your comments here are just your perspective as you trash-talk him. Does that mean you're pursuing a smear campaign and not acting in good faith? Why should he be held to a standard you yourself aren't willing to follow?

shaklee3 · on Aug 7, 2018

I would prefer a vendor responds publicly rather than request a private message. It's possible that one side was angry, and writing a blog post that makes it on HN will surely get a ton of negative attention. If that's the case, they should have the right to clear anything they'd like. I didn't read it as blame, but explanation.

azurezyq · on Aug 7, 2018

I think it's good to listen to both sides. But the response from Azure eng can be more professional. Customers have the right to do anything, maybe technically right or wrong. But the original post's attitude is more like blaming and throwing out random tech details, not explanation.

brudgers · on Aug 7, 2018

Steve Jobs told customers they were holding their phones wrong, so, to me, not really.

Cthulhu_ · on Aug 7, 2018

Jobs was a dick; Microsoft has PR and developer relations teams that are trained in how to provide feedback to their community.

wizardofmysore · on Aug 7, 2018

No. If the customer is at fault there is no problem in blaming them especially if they run a smear campaign.

jschwartzi · on Aug 7, 2018

Well, we really don't know what was said as the blog didn't actually provide any of the original communications. It's a he said she said thing at this point. Frankly the author comes across as having a huge axe to grind. That may be with good reason but it's hard for me to judge the quality of the Azure support when we never see any of their communications, just paraphrases.

anaisbetts · on Aug 7, 2018

[flagged]

jazoom · on Aug 7, 2018

I like hearing the other side of the story. I wish I got the other side more often.

po · on Aug 7, 2018

AKS engineering team spent over a day helping identify that the user had over scheduled their nodes, by running applications without memory limit, resulting in the kernel oom (out of memory) killer terminating the Docker daemon and kubelet.

I'm a bit confused why the cluster nodes don't come configured like this out of the box... kubernetes users aren't supposed to have to worry about OOM of the underlying system killing ops-side processes are they?

dilyevsky · on Aug 7, 2018

they do if cluster admin didn't setup proper system-reserved and kube-reserved (both are kubelet flags) and configured enforcement.

alpb · on Aug 7, 2018

In this case, the cluster admin would be whoever is provisioning the cluster nodes. In Google Kubernetes Engine, the "Capacity" and "Allocatable" info shown on the nodes are different (I see some mem/cpu reserved for probably system stuff). This makes me think GKE probably subtracts node capacity from what's allocated for the system automatically.

P.S. I work at Google.

dilyevsky · on Aug 7, 2018

correct, it should be provisioned by k8s provider (AKS in this case) and is what GKE is doing https://cloud.google.com/kubernetes-engine/docs/concepts/clu...

note, it also needs to match node configuration (how cgroups are setup specifically) so I doubt this works well on EKS which is BYO node. Maybe it's the issue with AKS too, I don't know enough about how it works...

QiKe · on Aug 7, 2018

AKS now reserves 20% of memory from each agent node and a very small amount of CPU to protect docker daemon and kubelet to function with misbehaving customer pods. However, that just means customer's pods will be evicted or no place to schedule when all resource is used up. This is something we see now in customer support cases.

dilyevsky · on Aug 7, 2018

That seems crazy high. If I have a node with 512G of RAM the kubelet/sys will take 100G? why would kubelet ever need this much?

QiKe · on Aug 7, 2018

AKS caps at 4G.

crb · on Aug 8, 2018

4GB of RAM per machine, or 4GB reservation of a 20GB machine?

wokwokwok · on Aug 7, 2018

'my stuff didnt work on AKS' is one thing; 'my stuff brought AKS and the dashboard down' is an fundamental failure that is in no way mitigated by this comment, and it feels very dishonest to try to redirect the blame for it.

My experience with azure has been reasonably positive, but even I've seen some weird stuff where things randomly don't work (AAD) or the dashboard just refuses to show anything for a while.

That this is a widespread endemic problem in Azure seems entirely plausible...

pdeva1 · on Aug 7, 2018

it is unclear what this response hopes to achieve. it is mentioned in the post that our containers do crash. that should under no condition cause the underlying node to go down. this has even been pointed out by others responding to this thread. it is interesting though that none of the other issues in the blog post are bought up.

HelloNurse · on Aug 7, 2018

Setting aside the workarounds and safety margins discussed in other comments, I would expect a reasonable operating system to allow explicitly prioritizing processes so that the important ones can only run out of memory after all user processes have been preemptively terminated to reclaim their memory. I would also expect a good container platform to restart system processes reliably, even if they crash.

amouat · on Aug 7, 2018

Yeah, it should do that. You can read up on how the kubelet and Linux OOM work in k8s here https://kubernetes.io/docs/tasks/administer-cluster/out-of-r.... Once the OOM kicks in though, I think you're in a pretty bad place.

Scheduling is only really going to work well if you set limits, requests and quotas for containers. Please do this if you're running containers in production. I know it's a pain, as it's non-trivial to figure out how much resource your containers need, but the payoff is you avoid the issues described in the article.

baaym · on Aug 7, 2018

My guess is that the system reservation change was very welcome for me as well.

Note that a service as AKS also draws in new customers that may not yet have years of Kubernetes experience. I'm one of those for example, and I created an AKS cluster so we could deploy short-lived environments for branches of our product. We're using GitLab and the 'Review Apps' integration with Kubernetes.

The instability experienced by the author of this article is something I experienced as well, and I have spent a lot of time draining, rebooting, and scaling nodes to try and find out what is happening. I would not have been able to guess the absence of resource limits could possibly kill a node.

Fortunately these instabilities disappeared a couple of weeks ago after a redeployment of the AKS instance, and it has been stable ever since. I guess the system reservation change was included there? From my perspective that was also the moment AKS truly started feeling like a GA product.

crunchlibrarian · on Aug 7, 2018

Sounds like you're still beta testing

exikyut · on Aug 7, 2018

Ah, and Hyper-V supports dynamic memory, so the system reservation backing can effectively be thin provisioned. That's nice. (Hm, dynamic memory probably got switched on from the start.)

Thanks for posting this here. It would be cool for there to be a way to hold application users to account without needing to chase viral Internet posts and do your best to pin some accurate reporting on slightly after the fact. A tricky general problem.

If there's one thing I miss with Azure (and AWS), it's the perpetually-free 600MB RAM KVM VM GCloud gives everyone to play with. It only has 1GB outbound, but inbound bandwidth is free, and I can do pretty much whatever I want with it. But anyways...

AaronFriel · on Aug 7, 2018

I don't think Azure ever uses dynamic memory for VMs - if I SSH into a VM I see the full allocation of whatever size it was supposed to be out of the bat.

I think this has to do with cgroups and ensuring the OOM killer doesn't target what is essentially the `init` process of a Kubernetes cluster - the docker daemon or kubelet.

specialp · on Aug 7, 2018

This is a pretty bad mistake from the customer if this is true. If not done already it would probably be good to expose Prometheus metrics on CPU/Memory usage per node.

markbnj · on Aug 7, 2018

Yes, this is true on it's face: it's bad to deploy containers to k8s without appropriate resource limits. However, this should in no way affect the operation of the node, so the implied transfer of responsibility for this incident from AKS to the customer is invalid imo.

sleepybrett · on Aug 7, 2018

It does if you don't have enough system reserved. It causes the kubelet to not function well and eventually the node goes sideways due to oom killing.

markbnj · on Aug 7, 2018

Right, I think we're saying the same thing. If the node is properly configured an end-user pod should not be able to take down the kubelet.

dilyevsky · on Aug 7, 2018

lol so aks forgot to provision enough resources and possibly setup enforcement and you are blaming the user? the user should be able to run as close to edge of "allocatable" as possible or even go over it and be oom kill'ed without bringing down the entire node. this functionality is even built into kubelet already. there's no way you can twist this to make it into user error.

btown · on Aug 7, 2018

More generally I should be able to choose to run an interruptible workload that I know to leak memory. I should expect that if I don’t, one of my coworkers will, and the node will stay up. Not leaving enough RAM for the node’s core resources is a mistake, but far from the worst thing in the world.

QiKe · on Aug 7, 2018

We are indeed working on more convenient container monitoring and logging on Azure portal.

bengale · on Aug 7, 2018

Last time I tried to use AKS I just got cryptic errors about the size of VMs available in Europe so I gave up and used GCP.

trhway · on Aug 7, 2018

>the AKS engineering team spent over a day helping identify that the user had over scheduled their nodes, by running applications without memory limit, resulting in the kernel oom (out of memory) killer terminating the Docker daemon and kubelet.

sounds like a bunch of people have just learned for the first time about OOM killer. I mean the production systems with overcommits and the running loose OOM killer and I bet without swap ... And they blame the customer. Sounds like a PaaS MVP quickly slapped together by an alpha state startup. You may want to look into man pages, in particular oom scoring and the code -17.

praseodym · on Aug 7, 2018

Actually Kubelet should already be adjusting OOM scores to make sure that user pods (containers) get killed over Kubelet or the Docker daemon. Why didn't that work here?

trhway · on Aug 7, 2018

adjusting scores for other processes skews the chances yet doesn't guarantee. The way to guarantee it for a given process is to disable the killer for that particular process.

nimos · on Aug 7, 2018

Interesting. The kubelet seems to use varying negative OOMAdjusts to prioritise killing[0] but if I'm reading the kernel code right anything at -999/-998 would return 1 from the badness function and essentially be equally valid to kill unless it was using over 99.9% of available memory.[1]

I see OOMScoreAdjust=-999 for kubelet being used but why not -1000. -999 seems like it would be equally likely to be evicted as -998 unless the for_each_process(p) macro always goes first to last processes?

[0]https://github.com/kubernetes/kubernetes/blob/master/pkg/kub...

[1]https://elixir.bootlin.com/linux/latest/source/mm/oom_kill.c...

trhway · on Aug 7, 2018

seems that way to me too - everybody like kubelet and "guaranteed" containers gets 1.

>unless the for_each_process(p) macro always goes first to last processes?

It seems it would usually go first to the first processes - the macro below - i.e. it would get to the "top" processes like kubelet, docker, etc before the containers.

#define next_task(p) \ list_entry_rcu((p)->tasks.next, struct task_struct, tasks)

#define for_each_process(p) \ for (p = &init_task ; (p = next_task(p)) != &init_task ; )

Given that "chosen" is updated only "if (points > chosen_points)" it seems that the first listed process with score 1 will stay the "chosen" in that situation, ie. it will be one of the top processes like the [-999] kubelet, not a [-998] container.

From a provider of an Azure class i'd have expected that they wouldn't rely on that machinery and would instead go the way of disabling the killer for the top processes outright.

enitihas · on Aug 7, 2018

Interesting, How does one disable the killer for a process? I thought on linux it was only possible to adjust oom scores.

technofiend · on Aug 7, 2018

Mental note - create a new cgroup for docker and kubelet.

h4b4n3r0 · on Aug 7, 2018

At Google you can’t even run anything on Borg until you specify how much memory it will use. You also have to specify how many cores you need and how much local (ephemeral) disk. And memory limit is hard: your task is killed without any warning if it attempts to exceed the limit. I was actually puzzled to discover that these limits are not required on k8s. Not only this leads to screwups like this one, it also makes it impossible to optimally schedule workloads, because you simply don’t know how much of each resource each job is going to use.

dilyevsky · on Aug 7, 2018

that's not actually how this works on Borg these days (and by "these days" i mean past 5+ years) and there's nothing about k8s not requiring limits by default that lead to this.

h4b4n3r0 · on Aug 7, 2018

I'll let current googlers comment on that. That's how it worked 3 years ago when I was there. You could also let Borg learn how much a job is going to use, but no serious service that I'm aware of used this for anything in Prod.

puzzle · on Aug 7, 2018

I left years ago and there were serious services using Autopilot. The name is no longer secret, see https://github.com/kubernetes/kubernetes/issues/44095 or Tim Hockin's slides from two years ago, where he revealed that 2/3 of Borg users rely on it: https://speakerdeck.com/thockin/everything-you-ever-wanted-t...

h4b4n3r0 · on Aug 7, 2018

The slide merely says "most Borg users use Autopilot", which could easily be true. Heck, I used it myself for non-production batch jobs. Those jobs were run as me. Any engineer at Google can spin up a job, and I'd venture to guess that most of them run at least something there every now and then. That's ~40k logical "users" as of 2018. The interesting question (which I admit I don't know the answer to as of today) is whether users that run search, ads, spanner, bigtable, and other shared service behemoths use Autopilot. FWIW my team did not use it at all.

JediPig · on Aug 7, 2018

you just killed any hope for azure running k8s. seriously. killed it with that statement.

empath75 · on Aug 7, 2018

[flagged]

nickbauman · on Aug 7, 2018

When I deploy to Amazon ECS the upper limit of the resource geometry of my service is checked and if it exceeds that upper limit available of the underlying cluster, it refuses to deploy. I understand k8s has similar features. It reads like Azure doesn't have their k8s configured correctly.

markbnj · on Aug 7, 2018

If the containers in a pod request more ram than is available on any node in the cluster then the pod will fail to schedule and will remain in pending state, which can be seen in the events for the controller (replicaset, daemonset, etc) using, for example `kubectl describe replicaset myreplicaset.` We've gotten ourselves into this situation a few times on GKE. It's easily resolvable by tuning the resource requests or scaling the nodepool and has no adverse effect on the operation of the cluster.

paxys · on Aug 7, 2018

Worth nothing that both Microsoft and Amazon's Kubernetes offerings are very new (literally weeks since GA). While "officially" ready, it is pretty naive to rely on them for production-critical workloads just yet, at least compared to Google Kubernetes Engine which has been running for years.

If you absolutely need managed Kubernetes, stick to GCP for now.

ageitgey · on Aug 7, 2018

Here's a fun fact about Azure Kubernetes:

1. Deploy your Linux service on k8s with redundant nodes

2. Create a k8s VolumeClaim and mount it on your nodes to give your application some long-lived or shared disk storage (i.e. for processing user-uploaded files).

3. Wait until the subtle bugs start to appear in your app.

Because persistent k8s volumes on Azure are provided by Azure disk storage service behind the scenes, lots of weird Windows-isms apply. And this goes beyond stuff like case insensitivity for file names.

For example, if a user tries to upload a file called "COM1" or "PRN1", it will blow up with a disk write error.

Yes, that's right, Azure is the only cloud vendor that is 100% compatible with enforcing DOS 1.0 reserved filenames - on your Linux server in 2018!

manigandham · on Aug 7, 2018

You're not using Azure Disks because they are attached to your VM as a block device and have no knowledge of the file system. PVs in AKS using Azure Disks can only be attached to a single node, as clearly stated in the documentation: https://docs.microsoft.com/en-us/azure/aks/azure-disks-dynam...

>> An Azure disk can only be mounted with Access mode type ReadWriteOnce, which makes it available to only a single AKS node. If needing to share a persistent volume across multiple nodes, consider using Azure Files.

So you must be using a file share across multiple nodes using Azure Files, which is a SMB file share service that may have compatibility issues with the Samba protocol as described in the (arguably hard to find) docs: https://docs.microsoft.com/en-us/rest/api/storageservices/na...

>> Directory and file names are case-preserving and case-insensitive.

>> The following file names are not allowed: LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, PRN, AUX, NUL, CON, CLOCK$, dot character (.), and two dot characters (..).

colemickens · on Aug 7, 2018

The only way this would make any sense is if you were using Azure Files, rather than Azure Disks. There's virtually never a time when it makes sense to use Azure Files over Azure Disks, and even when it does, a change in the application would be better advised than using Azure Files.

>Yes, that's right, Azure is the only cloud vendor that is 100% compatible with enforcing DOS 1.0 reserved filenames - on your Linux server in 2018!

This is hyperbolic bordering on flatly false. This is more reasonable and accurate:

"Azure is the only cloud vendor that serves their Samba product from Windows boxes, and thus leak Win/NTFS-isms into their Samba shares [that shouldn't be used anyway]."

How would an ext4 filesystem, mounted under Linux, attached as a block device to a VM, be subjected to Windows-isms? What you're implying doesn't even make sense.

parasubvert · on Aug 9, 2018

I really would have to disagree with the statement one should never use Azure Files over Azure disks.

1. Most Azure VM types have very stringent limits on attached disks; a K8s worker can easily blow past this limit.

2. You have tremendous complexity to deal with: pick Azure managed disks vs unmanaged disks on storage accounts (you can’t mix them on the same cluster). You have to understand the trade of of standard vs premium storage and how they bill (premium rounds up and charges by capacity, not consumption). And you need the right VM types for premium.

3. Managed disks each create a resource object in your resource group. A resource group last I checked had hard limits on the number of resources (like 4000?). Each VM is minimum 3 to 4 resources (with a NIC, image, and disk)... at scale this gets difficult.

4. Azure disks require significant time to create. , mount and remount. A StatefulSet pod failure will sometimes take 3-5 minutes for it’s PV to move to a different worker. And worse when your Azure region has allocation problems. Azure files are near instantaneous unmoubt/remount.

5. Azure disks are block storage and thus only ReadWriteOnce. Azure files are RWM.

So, sure, if you’re running a cluster database with dedicated per node PVs and limited expected redeployments... use Azure disks. If you need a PV for any other reason... especially for application tiers that churn frequently.. use Azure files.

colemickens · on Aug 10, 2018

This is all true, I sort of forget that I still have a sort of Azure Stockholm syndrome. I'd say it's good feedback but it's nothing Azure doesn't know about.

Maybe Azure Files performance has improved to the point where it's more usable for storage scenarios. I suppose it probably comes down to the use case and application behavior.

It would be good if Azure had someone testing out these scenarios and interfacing with the larger k8s community, maybe through the SIG, for these sorts of musings and questions.

AaronFriel · on Aug 7, 2018

Is this with the managed-disk volume (which IIRC is formatted ext4) or with `AzureFiles`, which is essentially SMB/CIFS?

nojvek · on Aug 7, 2018

I believe this is a cultural problem with Microsoft. Probably similar to other companies but it was very evident at Microsoft. People responsible to allocate resources (The management chain) rarely dogfood the product.

While the Engineers and PM would complain a lot about quality issues, management wants to prioritize more features. It was a running joke at Microsoft: No one gets promoted for improving existing things, if you want a quick promo, build a new thing.

So when you see a bazillion half baked things in Azure. That’s because someone got promoted for building each of those half baked things and moving on to the next big thing.

Going from 0-90% is the same amount of work as 90-99% and the same amount of work as 99.0% - 99.99%. Making things insanely great is hard and requires a lot of dedicated focus and a commitment to set a higher bar for yourself.

hb3b · on Aug 7, 2018

I joined a healthcare startup in 2014 that had a small infrastructure on Azure. Back then AWS weren't signing BAAs and Azure was the only player in town. Being an early startup, the company didn't purchase a support plan from Azure. One day Azure suffered a major outage (may have been storage related) and over an hour later, I reached out to Microsoft for written confirmation that we could forward to customers. Since we didn't have a support plan they flat-out refused to provide any documentation whatsoever about the issue. They wanted $10,000.

Azure - never again. Company moved to AWS within a quarter.

mgalgs · on Aug 6, 2018

FWIW, Amazon's hosted Kubernetes offering (EKS) isn't stable either (DNS failures, HPA is known to be broken, etc.).

shamsalmon · on Aug 6, 2018

I know HPA is a legit issue but DNS failures seem to be fairly normal in kubernetes. Scaling up kube-dns has helped us resolve that particular issue as well as moving away from Alpine and into minimal Debian images. Alpine has its own DNS issues that caused us much pain.

atombender · on Aug 7, 2018

We've had issues with KubeDNS, too. Lots of retries and timeouts on the client side, and lots of conntrack entries.

Libc has pretty slow retries (5s, I think) by default, and until 1.11 hits you can't easily set up resolver configs, though you can inject an envvar separately into each. And musl-based distros like Alpine don't even support some of libc's options, iirc.

We ended up scaling up KubeDNS to 2 replicas and moving them to a dedicated nodepool just to make sure they weren't competing with other nodes. That fixed our issues for now.

praseodym · on Aug 7, 2018

Kube-dns (or CoreDNS in newer clusters) is pretty stable in my experience. It's still a very good idea to run more than one replica so that you can tolerate a single node failure, but if DNS failures are "fairly normal" that definitely warrants some additional investigation.

sleepybrett · on Aug 7, 2018

Most dns problems in kubernetes, in my experience, can be traced to udp failures due to the iptables kubeproxy backend.

shamsalmon · on Aug 7, 2018

Thanks we did look into it but not as thoroughly as probably needed. Switching out from Alpine fixed pretty much all our issues.

ivelichkovich · on Aug 17, 2018

EKS HPA workaround https://medium.com/eks-hpa-workaround/k8s-hpa-controller-6ac...

ivelichkovich · on Aug 17, 2018

EKS HPA Workaround https://medium.com/eks-hpa-workaround/k8s-hpa-controller-6ac...

curiousDog · on Aug 6, 2018

This is sad. From what I hear, one of the founders of k8s works on AKS.

Only a matter of time before GCP becomes the #1/2 cloud provider.

h4b4n3r0 · on Aug 7, 2018

I’ve seen at least 4 “founders of Kubernetes” by now. How many are there in total?

rhencke · on Aug 7, 2018

At least three to five.

https://en.wikipedia.org/wiki/Kubernetes#History

cheeze · on Aug 7, 2018

Agree. The quality of their software is so much better than the other major players.

swozey · on Aug 7, 2018

Yeah, Brendan Burns is at ms

taherchhabra · on Aug 7, 2018

Had a similar experience with azure cosmos graph API. The API is half baked. Doesn't support all gremlin operations. Even supported operations give non standard output. Switched to aws Neptune immediately when they launched

wnsire · on Aug 7, 2018

> The API is half baked

Doesn't surprise me. Cosmos was too good too be true :

- Serverless - Infinite Scalability - Mongo API or Gremlin API or SQL API

It's obvious that it can't hold up to all it's promises.

the_new_guy_29 · on Aug 7, 2018

"Noone can give you what i can promise you"

AaronFriel · on Aug 7, 2018

There are definitely growing pains with using Kubernetes on Azure. I've wondered a few times if other platforms have similar issues and have seen more than a few complaints about EKS.

Microsoft has some great people working on Azure, but I do feel like AKS was released to GA too soon. Without a published roadmap and scant acknowledgment of issues, I'm not sure I could recommend it to my clients or employer. It's disappointing, because I've had few issues with other Azure services.

Full disclosure: I receive a monthly credit through a Microsoft program for Azure.

markbnj · on Aug 7, 2018

> I've wondered a few times if other platforms have similar issues and have seen more than a few complaints about EKS.

I can't speak to EKS but we've been running production workloads on GKE for over a year with very good results. There have been a very few really troublesome "growing pains" type issues (an early example: loadbalancer services getting stuck with a pending external IP assignment for days) but Google has been awesome about support, even to the extent of getting Tim Hockin and Brendan Burns on the phone with us at various times to gather information about stuff like the example I gave above. I give them high marks and would recommend the service without hesitation.

curiousDog · on Aug 7, 2018

Brendan Burns is the lead engineer on AKS AFAIK

mohamedmansour · on Aug 7, 2018

Brendan Burns was the Lead Engineer on Kubernetes at Google around 3 years ago, then went to Microsoft to lead this entire AKS/ACS/Container effort.

https://www.linkedin.com/in/brendan-burns-487aa590

rcconf · on Aug 6, 2018

Hmm, this isn't great. Currently using Azure Kubernetes Service and we haven't had many issues so far, but we just made the shift.

Hope I don't have to move over to Google cloud.

RantyDave · on Aug 7, 2018

It's heaps and heaps better than Azure.

zeroone101 · on Aug 7, 2018

You'll be fine. We run a number of AKS clusters and they have all been rock solid. I think the problem is people hear "managed cluster" and so they don't think they need to understand how k8s works. Follow best practices (resource limits, etc.) and you'll be just fine. We've even tested out the upgrade flow on a live production cluster and it was butter smooth.

outworlder · on Aug 6, 2018

Why? GKE works perfectly over there.

specialp · on Aug 7, 2018

Many people (me included) don't trust Google to stay with any business besides advertising. There's been too many times that they have ended services with not much time to get off.

axitanull · on Aug 7, 2018

That and countless horror stories about once your google account or service is banned for whatever reason detected by their bot, you have no way to appeal with their nonexistent customer support.

brazzledazzle · on Aug 7, 2018

The Cloud products follow a Deprecation Policy documented here: https://cloud.google.com/terms/

I can understand being upset about their consumer products but it doesn’t really apply here.

dvasdekis · on Aug 7, 2018

This is what always puzzled me about the Google-Alphabet strategy, specifically the idea of having all the assets under a single share ticker (GOOGL).

The more services you put under one banner, the more the stink of one disaster is going to linger, and hinder adoption of the successes.

To me, a far simpler proposition would be a new brand & share issuance for each new sub-company (eg. Waymo), with existing Alphabet shareholders getting pro-rata shares in the new company.

brazzledazzle · on Aug 7, 2018

I’d bet on it being about the recognition that Google has been a pioneer and thought leader in the scalable systems/hosting space and they didn’t want to throw out the baby with the bathwater.

parasubvert · on Aug 9, 2018

DNS failures were almost certainly related to all the k8s system services on the cluster not having CPU or memory reservations, and KubeDNS was flaking.

In general AKS is a vanilla k8s cluster and expects you know what you’re doing. MS arguably should enforce some opinions about how things like system services have reservations, etc, but none of this is vanilla. The trouble is that K8s defaults are pretty poor from a security (no seccomp profiles or apparmor/se profiles) and performance perspective (no reservations on key system DaemonSets).

We’ve had this interesting industry pendulum swing between extreme poles of “we hate opinionated platforms! Give me all the knobs!” And “this is too hard, we need opinions and guard rails!”. I think the success of K8s is exposing people to the complexity of supplying all of the config details yourself and we will see a new breed of opinionated platforms on top of it very shortly. It reminds me of the early Linux Slackware and SLS and Debian days where people traded X11 configs and window manager configs like they were treasured artifacts before Red Hat, Gnome and KDE, SuSE, and eventually Ubuntu, started to force opinions.

spicyusername · on Aug 6, 2018

This is probably why they're releasing OpenShift on Azure. To let Red Hat engineers manage the kubernetes part.

https://azure.microsoft.com/en-us/blog/openshift-on-azure-th...

netdur · on Aug 6, 2018

I like openshift, I am not sure why developers are not hot about it.

wnsire · on Aug 7, 2018

> I am not sure why developers are not hot about it.

Because it's designed entirely for enterprise customers. If you have a startup you have very little reason to choose OpenShift compared to Heroku or AWS honnestly.

I still love Redhat tho.

federicoponzi · on Aug 6, 2018

What do you like about openshift that lacks in kubernetes?

FlorianRappl · on Aug 7, 2018

We have a larger migration project going on for months. So far not a single failure occurred and our TEST environment is already fully migrated (quite responsive and rock solid) since 2 weeks.

However, I do share that Azure indeed has released a lot of half-baked features and services lately (last 1.5 to 2 years). I hope this trend does not continue.

stefanatfrg · on Aug 7, 2018

Couple of questions to the OP:

1. What version of docker / container runtime is being used?

2. What base image for your containers is being used? eg. alpine has known DNS issues [1]

[1] https://www.youtube.com/watch?v=ZnW3k6m5AY8

bsaul · on Aug 7, 2018

Side question : what are the best practices for development ? Are you suppose to run a local kubernetes deployment ( it looks like it's pretty hard to set up) , or do you run everything outside of containers when developping and then deal with k8 packaging and deployment as a completely separated issue ( which looks like it could lead to discovering a lot of issues on the preproduction environment) ?

gercheq · on Aug 7, 2018

Azure is not bad but there are definitely some rough edges. We're having trouble with their Bizspark Sponsorship biling https://news.ycombinator.com/item?id=17698948

rdl · on Aug 7, 2018

Key Vault (their HSM product) is even worse.

ubuntunero · on Aug 7, 2018

interesting. thanks

partiallypro · on Aug 7, 2018

It's a very new offering, the Linux App Services are still in beta, I have no idea why you would roll it into production expecting no hiccups. AWS is also new on this. Give it 6 months and let the kinks work out before migrating workloads. Seems like common sense.

verst · on Aug 7, 2018

App Service on Linux is an unrelated service that actually runs on top of Service Fabric (a stateful microservice and container orchestration platform).

apurvajo · on Aug 7, 2018

App Service on Linux is not in Beta, it is GA product for over year now with SLA of 99.95%. It does not use Service Fabric in backend, it uses it's own custom Orchestrator (which essentially removes the quirks of learning about orchestration away from the user)