Hacker News

Reading the comments here validates my experience. When K8s was pitched as a way to make this all run smoothly, I thought, "Great! I'll write my code, specify what gets deployed and how many times, and it'll Just Work(tm)." I built a service which had one driver node and three workers. Nothing big. It deployed Dask to parallelize some compute. The workload was typically ~30 seconds of burst compute with some pretty minor data transfer between pods. Really straightforward, IMO.
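Not the original code, but the shape of the workload described — fan a batch of short compute tasks out to a small worker pool and gather the results — can be sketched with the standard library alone (Dask's `client.map`/`client.gather` follows the same pattern; the function names here are made up for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def burst_task(x: int) -> int:
    # Stand-in for the ~30 seconds of burst compute per task.
    return x * x

def run_batch(inputs):
    # Three workers, mirroring the one-driver / three-worker layout above.
    # Executor.map preserves input order in its results.
    with ThreadPoolExecutor(max_workers=3) as pool:
        return list(pool.map(burst_task, inputs))

if __name__ == "__main__":
    print(run_batch(range(5)))  # [0, 1, 4, 9, 16]
```

The point of reaching for Dask (or k8s) over a sketch like this is distribution across machines; for a single node, a pool like the above is often enough.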

Holy smokes, did that thing blow up. A pod would go down, get stuck in some weird state (I don't recall what anymore), and K8s would spin a new one up. Okay, so it was running, but with ever-increasing zombie pods. Whatever. Then one pod would get in such a bad state that I had to nuke all pods. Fortunately, K8s was always able to re-create them once I deleted them. But I was literally deleting all my pods maybe six or seven times per day in order to keep the service up.

Ultimately, I rewrote the whole thing with a simplified architecture, and I vowed to keep clear of K8s for as long as possible. What a mess.




This can probably be chalked up to youre-doing-it-wrong (sorry), but without knowing your precise scenario it's hard to say what went wrong. Maybe really old versions misbehaved (we only started a few years ago, and it's been smooth sailing), but I've never seen your problem on any of our stuff. We have dozens of different services on a bunch of languages/frameworks, and none of them just give up for no reason (though plenty die for predictable, self-induced reasons).

I think there was some jank in the AWS CNI drivers at one point that delayed pod init, but that's about the most "wtf" I've personally bumped into, thankfully.


> This can probably be chalked up to youre-doing-it-wrong

Yes, and the unforgiving part of k8s is that there is a right way documented somewhere; you just might have to spend three days sifting through docs, posts, and community forums to find it.

It's sometimes worth it, sometimes not. My main gripe with k8s is that there are no "simple things", so it shouldn't be pitched as making life easier for small shops. Even if a small use case can be done elegantly, keeping that elegance will probably require comprehensive, up-to-date knowledge of the whole system.
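For a sense of scale: even the "simple" version of the three-worker setup from the top comment means writing a full Deployment manifest, with labels, a selector that must match the pod template, and a container spec — none of it optional. A minimal sketch (the image and scheduler address are placeholders, not from the original post):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dask-worker
spec:
  replicas: 3                  # the three workers from the anecdote
  selector:
    matchLabels:
      app: dask-worker         # must match the pod template labels below
  template:
    metadata:
      labels:
        app: dask-worker
    spec:
      containers:
        - name: worker
          image: ghcr.io/dask/dask:latest                       # placeholder image
          args: ["dask-worker", "tcp://dask-scheduler:8786"]    # placeholder scheduler address
```

And this still leaves out the scheduler Deployment, a Service for pod-to-pod networking, and resource requests/limits — which is roughly the point about there being no "simple things".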


Yep, very much so. Doing It Wrong™ applies to any deployment and shouldn't be held against k8s. We have over a hundred services deployed across who knows how many pods in a dozen environments, and it's definitely not that unstable.


> Doing it wrong ™ applies to any deployment and shouldn’t be held against k8s

I think there's definitely a huge asterisk there if the tool makes it very easy to "do it wrong" and hard to "do it right".

Of course with k8s it's tough because it's capturing computation! It's hard for it to "know" what one is trying to do inside the containers. The only thing I can think of that is kinda in that space is managing volumes, since that runs into the dilemma of adding persistence to ephemeral things.


I imagine it's akin to management expecting you to spend your off hours learning all this "great new tech" while they think working off hours is reading online articles on hacker news to "stay up to date".


> it’s definitely not that unstable

So how unstable is it?


Not at all unstable in my experience


> This can probably be chalked up to youre-doing-it-wrong (sorry)

I think you're absolutely right. I freely admit that I knew NOTHING about K8s before embarking on this project (and still pretty much know nothing about it now), and I was able to cobble together something that 'worked', but that doesn't mean it was right.

And as another commenter points out, there's "a huge asterisk there if the tool makes it very easy to 'do it wrong'". I would rather be clearly told that I've got it wrong and be prevented from progressing further than make something that superficially seems right and then crashes and burns in prod.

I'm sure there are folks that can wield Kubernetes with great effectiveness, and good on them, but I found it to be supremely frustrating and the wrong tool for the right job. Not that I have a better solution, so I'm admittedly just kind of complaining.


We've had great success running Celery applications in k8s, so it's surprising to hear Dask was a problem for you, especially considering Dask recommends k8s as a deployment option.


Love Dask. Very robust, and therefore very easy to get wrong. When you need a longer-term solution that uses Dask, it pays to architect things well in advance rather than on the fly in a sandbox.



