Operator Framework: Building Apps on Kubernetes (coreos.com)
133 points by polvi on May 1, 2018 | 45 comments



I've been an early adopter of Docker. I used Compose when it was still called Fig, and used and deployed Kubernetes betas up to version 1 for an in-house PaaS/Heroku-like environment.

Must say I do miss those days when K8s was an idea that could fit in your head. The primitives were just enough back then. It was a powerful developer tool for teams, and we used it aggressively to accelerate our development process.

K8s has now moved beyond this and seems to me to be focussing strongly on its operational patterns. You can see these operational patterns being used together to create a fairly advanced on-prem cloud infrastructure. At times, to me, it looks like over-engineering.

Looking at the Borg papers, I don't remember seeing operational primitives this advanced. The developer interface was fairly simple, i.e. this is my app, give me these resources, go!

I know you don't have to use this new construct but it sure does make the landscape a lot more complicated.


I agree that this new construct makes the landscape even more complicated, but I disagree that k8s has reached the point of over-engineering. Most parts of k8s still look like essential complexity to me -- they're what you'd need if you wanted to build a robust resource-pool-management kind of platform.

Ironically, the push to "simplify" the platform with various add-on tools is what is making it seem more complicated. Rather than just bucking up and telling everyone to read the documentation, and understand the concepts they need to be productive, everyone keeps building random, uncoordinated things to "help", and newcomers become confused.

For example, I don't know who this operator framework is aimed at -- it's not aimed at application developers, but at k8s component creators who write cluster-level tools, and what cluster-tool writer would want to write a tool without understanding k8s at its core? Those are the table stakes -- if I understand k8s and already understand the operator pattern (which is really just a Controller + a CRD, two essential bits of k8s), why would I use this framework?
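
(For anyone following along: the "Controller + CRD" combination above is roughly the following, sketched with made-up Go types rather than the real client-go/CRD machinery. The resource name and structs are hypothetical; the point is just the shape of the pattern.)

  // A rough skeleton of "Controller + CRD", with hypothetical types.
  // The "CRD" half is just a typed desired-state object; the "Controller"
  // half is a loop that watches for changes and reconciles toward them.
  // A real controller would talk to the API server instead of a channel.
  package main

  import "fmt"

  // EtcdClusterSpec is the desired state a user would declare through a
  // CustomResourceDefinition (hypothetical example resource).
  type EtcdClusterSpec struct {
      Name string
      Size int
  }

  // clusterState stands in for what the controller can observe and change.
  type clusterState struct {
      members map[string]int
  }

  // reconcile nudges the observed state one step toward the desired state
  // and reports whether the two have converged.
  func (s *clusterState) reconcile(desired EtcdClusterSpec) bool {
      have := s.members[desired.Name]
      switch {
      case have < desired.Size:
          s.members[desired.Name]++
      case have > desired.Size:
          s.members[desired.Name]--
      default:
          return true
      }
      fmt.Printf("%s: %d/%d members\n", desired.Name, s.members[desired.Name], desired.Size)
      return false
  }

  func main() {
      state := &clusterState{members: map[string]int{}}

      // In a real operator this channel would be fed by a watch on the
      // custom resource; here we hand it a single "applied" object.
      updates := make(chan EtcdClusterSpec, 1)
      updates <- EtcdClusterSpec{Name: "example-etcd", Size: 3}
      close(updates)

      for desired := range updates {
          for !state.reconcile(desired) {
          }
      }
  }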

I think if they really wanted to help, they'd produce some good documentation or a cookbook and maintain an annex of the state of the art in how to create/implement operators. But that's boring, and has no notoriety in it.


I don’t see how these abstractions make the product more complex. They’re still optional.


It's not that they force Kubernetes to be more complex, it's that they muddy the waters. I clearly understand that they're optional, and that they're an add-on essentially, but it might not look this way to a newcomer.

People are being encouraged to download a helm chart before they even write their first kubernetes resource definition. People might start using this Operator Framework before they implement their own operator from scratch (that's kind of the point) -- though honestly it's unlikely that they'll actually be clueless since it's for cluster operators.


> You can see these operational patterns being used together to create a fairly advanced on-prem cloud infrastructure. At times, to me, it looks like over-engineering.

Well, consider that you want a highly available solution that supports blue-green / rolling deploys without downtime. You either build it yourself or you rely on something like k8s. It's not that much over-engineering. K8s is a lot of code, yes, but the constructs are still pretty simple. I think deploying k8s is still way easier than most other solutions out there, like all these PaaS and cloud solutions. Spinning up k8s is basically just using Ignition/cloud-config, CoreOS and PXE, or better iPXE. Yeah, sometimes it's troublesome to upgrade a k8s version or an etcd cluster. However, everything on top of k8s, or even CoreOS itself, is extremely simple to upgrade.

inb4: our current system uses Consul, HAProxy, Ansible and some custom-built stuff to actually run our services. System upgrades are still done manually or through Ansible, and my company plans to replace that with k8s. It's just way simpler to keep everything up to date and run for high availability without disruption on deployments. It's also way simpler to actually get new services/tools into production, e.g. Redis/Elasticsearch, without needing to keep them up to date and running yourself.


>I think deploying k8s is still way easier than most other solutions out there, like all these PaaS's and Cloud solutions

Have you seen nomad + consul + traefik? Much easier to install and the end result is close to a K8s cluster.


Not the parent, but I really like Nomad + Consul + Fabio (or Traefik) too. I tried learning Kubernetes but there was so much to take in all at once; I tried learning the HashiStack and I could try it out one product at a time.


It's not clear to me whether you're conflating the Compose/Swarm development path (which matches your ideals) with Kubernetes (which AFAIK was always over-engineered to begin with).

Kubernetes has the huge day-1 problem that it doesn't solve all of your problems. The hard stuff like networking and distributed storage are hook-in APIs. That's fine on Google's cloud, where all the other stuff is there and was developed with these interfaces in mind, so all the endpoints exist. But most companies don't work in GCP/AWS alone. The moment you come on-premise you see that Kubernetes only does 25% of what it needs to do to get the job done.

So you have this tool that already lacks 75% of what it needs in its original design, and it tries to overcome this by adding more stuff. Then you combine this with a prematurely hyped community that just adds more stuff to solve problems that are already solved, that don't need to be solved, or that aren't problems at all, just to get their own names and logos into what's out there.

These two are patterns that make it very clear that it is impossible for Kubernetes to ever become a lean, developer friendly tool. But it's a great environment to make money already, I can tell you. And I think maybe that was the main goal from the beginning.


There's some truth and some wistful hope in your post. In my time at Google, the only thing that was anything like these "Operators" was what was developed by the MySQL SRE team, which was great, but they also admitted it was a bit "round peg, square hole". There's a shared persistence layer that hasn't quite shown up yet; you need a low-latency POSIX filesystem and a throughput-heavy non-POSIX system (Chubby and GFS in the Borg world, etcd and ??? in k8s). Not having the ability to work with persistent, shared objects is the biggest detriment to the ecosystem. S3 sorta works if you're in AWS, GCE supports Bigtable, etc.


Operations is always more complex than folk expect it to be and product evolution typically reflects that. Kubernetes was simple because it couldn't miraculously teleport to do all the things ever-larger clusters require of it.

We forever rush to the limits of current technology and then blame the technology.

I think it's worth noting that Kubernetes never tried hard to impose an opinion about what belongs to the operator (as in the person running it) and what belongs to the developer. You get the box and then you work out amongst yourselves where to draw the value line.

Cloud Foundry, which came along earlier, took inspiration from Heroku and had a lot of folks of the convention-over-configuration school involved in its early days. The value line is explicitly drawn. It's the opinionated contract of `cf push` and the services API. That dev/ops contract allowed Cloud Foundry to evolve its container orchestration system through several generations of technology without developers having to know or care about the changes. From pre-Docker to post-Istio.

Disclosure: I work for Pivotal, we do Cloud Foundry stuff. But as it happens my current day job involves a lot of thinking about Kubernetes.


I just find Docker Swarm to be so much simpler, and it works for the type of stuff we deploy.

Sad to see it 'lose the race' against kubernetes.


Can anyone talk about the positives/negatives of Operators vs. Helm Charts?

From what I see, it seems like Operators are a better tool for defining and deploying custom k8s resources, whereas Helm charts are a good way to organize applications (deployment, service, etc. templates packaged into one tar).


Helm and Community Charts maintainer here...

Helm is a package manager. Think of it like apt for Kubernetes.

Operators enable you to manage the operation of applications within Kubernetes.

They are complementary. You can deploy an operator as part of a Helm Chart. I recently wrote a blog post that explains how the different tools relate. https://codeengineered.com/blog/2018/kubernetes-helm-related...


Any plans to switch from templatized YAML to Jsonnet/Ksonnet?


The helm 3 proposal uses Lua. Not sure if they've actually started working on it: https://github.com/kubernetes-helm/community/blob/master/hel...


Honestly, this is a terrible choice! I like Lua, but it's not mainstream, and it's not going to be! There are tons of embeddable JavaScript engines - they could've used TypeScript, too!


Let me try to add a little color here.

Lua support is NOT being added for templating. It's being added to enable extensions in charts (a new thing) and for cross platform plugins. Existing style plugins will still work. This is providing some extras to help enable more cross platform work.

There are a couple reasons Lua was chosen and documentation has been written up as well.

1. There are Go packages for this, so we can do it in a cross-platform manner (mac/win/linux/bsd/etc). Helm is used on many platforms.

2. Security issues, especially with extensions to charts, have a path forward to being addressed. In a similar way to how apps opt in to features on cell phones, we can do that for features used in extensions.

The people who did the analysis have embedded JS engines in applications recently. They are aware of the benefits, pitfalls, and how it all works.
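
To make point 1 above a bit more concrete, embedding a pure-Go Lua engine is only a handful of lines. The example below uses gopher-lua purely as an illustration of such a package; it isn't a statement about which engine Helm will actually ship, and the variable names are made up for the demo:

  package main

  import (
      "fmt"

      lua "github.com/yuin/gopher-lua"
  )

  func main() {
      L := lua.NewState()
      defer L.Close()

      // Expose a value from the host process (think: chart metadata) to
      // the extension script.
      L.SetGlobal("chartName", lua.LString("example-chart"))

      // Run an extension script. No cgo is involved, so this builds the
      // same way on mac/win/linux/bsd.
      if err := L.DoString(`print("hello from " .. chartName)`); err != nil {
          fmt.Println("lua error:", err)
      }
  }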


> but it's not mainstream, and it's not going to be!

Why are you suggesting jsonnet then?


Because it's popular in Kubernetes and makes sense, unlike Lua in the context.


> Any plans to switch from templatized YAML to Jsonnet/Ksonnet?

Not exactly. Two things to know...

First, a little history.

Only a tiny number of people have asked for it. A prototype was created, and no one, not even the ksonnet devs, was up for taking the work forward from the prototype phase. The code is still up on GitHub, and Helm was built with the intent of more engines being used.

Second, ksonnet is going through some major changes right now. We can all come back and look at it again when it's ready.

Helm 3 is working to make it easier to bring your own tools, like jsonnet. It's just not likely to be baked in or replace YAML. People really aren't asking for it much.


Ksonnet might be under active development, but Jsonnet is stable. YAML + Jinja2 is a dirty workaround - you can't even guarantee valid YAML with it, since to Jinja2 everything is just text, unlike with Jsonnet.


To understand operators you should think differently about k8s. At its core, k8s is a distributed messaging queue, where the messages are in the form of a declarative desired state (defined by a YAML file) of a thing.

The operator (or, to be more precise, one or more controllers) listens to those messages and tries to reconcile the desired state with the current state.

So an operator is a combination of the message types and the controllers that take care of the reconciliation process.

Taking this point of view, k8s is much more than a container orchestration framework: it can orchestrate anything in the real world, as long as there is a way to define the desired state of a thing and a controller that can affect the real world.
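
To make that concrete, here is a toy illustration of the same idea with no Kubernetes libraries involved -- just a queue of declarative desired-state messages and a controller that nudges the real world toward them. The lamp example is obviously made up; the point is that the pattern isn't specific to containers:

  package main

  import "fmt"

  // DesiredLamp is a declarative message: "this lamp should be on/off".
  type DesiredLamp struct {
      ID string
      On bool
  }

  // lampDriver is the piece that can actually affect the real world.
  type lampDriver struct {
      actual map[string]bool
  }

  // reconcile compares desired vs. current state and acts only on a diff.
  func (d *lampDriver) reconcile(msg DesiredLamp) {
      if d.actual[msg.ID] == msg.On {
          return // already in the desired state, nothing to do
      }
      d.actual[msg.ID] = msg.On // a real controller would flip a relay here
      fmt.Printf("lamp %s -> on=%v\n", msg.ID, msg.On)
  }

  func main() {
      // The "distributed messaging queue" of desired state.
      queue := make(chan DesiredLamp, 3)
      queue <- DesiredLamp{ID: "porch", On: true}
      queue <- DesiredLamp{ID: "porch", On: true} // duplicate: a no-op
      queue <- DesiredLamp{ID: "garage", On: false}
      close(queue)

      driver := &lampDriver{actual: map[string]bool{}}
      for msg := range queue {
          driver.reconcile(msg)
      }
  }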

Back to the original question: Helm was created in order to raise the abstraction of resource definition (i.e. the desired state) from plain YAML to a property file, which is much more readable and smaller. Along the way, it also became a packaging tool.


The main difference of Operators is explained here: https://coreos.com/blog/introducing-operators.html. Operators basically encapsulate human operational knowledge about a particular system -- for example, how to run a memcached ring on Kubernetes, as encapsulated by https://github.com/ianlewis/memcached-operator.


You can think of K8s Controllers/Operators as a way to extend the K8s API at runtime with a new type of object that is managed by a controller you've written yourself (as opposed to the built in k8s objects that are handled by the default controller-manager).

A Helm chart by comparison is a way to template out K8s objects to make them configurable for different environments.

Some shops will end up combining one with the other.


And that's why the team created a Helm-operator based on Lostromos (a tool for expanding Helm charts without running a Tiller server) that makes it easy to use a Helm chart for templating but still add additional lifecycle management on top:

https://github.com/operator-framework/helm-app-operator-kit

The templating aspect of Helm and its set of quality content are complementary to being able to add higher-level lifecycle management.


tl;dr - this is a rant. the operator pattern should stay a pattern. "framework" and "application" are meaningless these days. Stop trying to make things so easy a "monkey could do it".

No no no no no. The words "framework" and "application" are so meaningless now that even reading this post is draining.

CoreOS pioneered the Operator pattern, but I think building that pattern up into a framework so that people can develop against it without knowing the basics of k8s is such a mistake. The operator pattern falls out of the primitives that k8s offers (there's literally a concept called a controller) -- this makes it seem like just another app platform. I think the level of abstraction isn't even right; this is like trying to enable people to write daemons without knowing anything about Linux or signals or processes.

Then again, I also dislike tools like Helm because they do the same thing. Why is everyone in such a rush to make inevitably leaky abstractions so that everything is so easy a monkey could do it? All you're doing is encouraging cookie-cutter programmers to write cookie-cutter, poorly understood code that will inevitably break on someone.

All essential complexity introduced by features in the layer below an abstraction cannot be simplified; it can only be hidden or removed. It is OK for things to be hard, as long as they are simple (in the Rich Hickey easy-vs-simple sense).


Can we use the Operator Framework for building operators that also require a new API server using the API aggregator?


> You may be familiar with Operators from the concept’s introduction in 2016. An Operator is a method of packaging, deploying and managing a Kubernetes application.

"Operators", as introduced in 2016, were just bespoke Go programs that communicated with Kubernetes internals in a pretty low-level way.

You were writing special-case plugins for Kubernetes, but they didn't want to make it sound that way, because I guess that just doesn't sound hip or devopsy. This branding exercise worked out for CoreOS -- Red Hat just bought them.

This whole space is massively infused with bullshit. It's because all of these companies want to make money selling you cloud stuff, because it's profitable to rent computers at 3-5x the TCO. Google especially is hungry to claw back the lead in the cloud space from Amazon, and it's not hard to conceive why Kubernetes doesn't seem to work without fuss anywhere except GKE, or to understand the massive marketing dollars that Google is pumping into this whole Kubernetes farce (and for the record, Google seems to consider HN an important platform for k8s PR; I've been censured after too many Googlers found my k8s-skeptical posts "tedious").

Anyway, I guess that's neither here nor there. Just annoyed at what is by now the totally conventional status quo of overhyped empty promises made by people who seem more like ignorant promoters and fanboys than serious engineers.

This "Operator Framework" seems to be the same concept of Operators, just with additional library support for the plugins -- err, "Operators". It may be a good improvement, will have to research more.


Operators are just a pattern, not a technology.

I think they chose "operator" over the more traditional "controller" because the latter can be quite simple, whereas an operator is potentially a combination of several things, including CRDs, API extensions, and controllers. For example, an operator might start different controllers depending on what cloud it's deploying to. It's a useful distinction; if someone says "I'm using this operator for X", I instantly know what they mean.
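
A minimal sketch of that composition, with made-up types (this isn't the operator-sdk or any real API, just the shape of "an operator is a bundle of controllers, wired up based on where it runs"):

  package main

  import (
      "fmt"
      "os"
      "sync"
  )

  // Controller is a stand-in for anything with a reconcile loop.
  type Controller interface {
      Run(stop <-chan struct{})
  }

  type backupController struct{}

  func (backupController) Run(stop <-chan struct{}) {
      fmt.Println("backup controller: reconciling backup schedules")
      <-stop
  }

  type awsVolumeController struct{}

  func (awsVolumeController) Run(stop <-chan struct{}) {
      fmt.Println("aws controller: reconciling EBS volumes")
      <-stop
  }

  type gceVolumeController struct{}

  func (gceVolumeController) Run(stop <-chan struct{}) {
      fmt.Println("gce controller: reconciling persistent disks")
      <-stop
  }

  func main() {
      // Hypothetical: the operator inspects its environment at startup
      // and only starts the controllers that make sense there.
      controllers := []Controller{backupController{}}
      switch os.Getenv("CLOUD_PROVIDER") {
      case "aws":
          controllers = append(controllers, awsVolumeController{})
      case "gce":
          controllers = append(controllers, gceVolumeController{})
      }

      stop := make(chan struct{})
      close(stop) // a real operator would keep this open until shutdown

      var wg sync.WaitGroup
      for _, c := range controllers {
          wg.Add(1)
          go func(c Controller) {
              defer wg.Done()
              c.Run(stop)
          }(c)
      }
      wg.Wait()
  }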

FWIW, I'm one of those who remember your name, simply because you pop up in every Kubernetes discussion with a predictably contrarian, long-winded opinion. I don't know what you're getting out of it. In this case, you're not wrong — but the curmudgeonly, somewhat tone deaf way that you go about it isn't very nice, which probably explains the downvotes.


>FWIW, I'm one of those who remember your name, simply because you pop up in every Kubernetes discussion with a predictably contrarian, long-winded opinion.

There are several k8s conversations on HN each day. I skip most of them.

> I don't know what you're getting out of it.

I get a lot of useful feedback out of it. Most of the posts I make on HN are about trying something out and gauging the response, because I want to learn from it. Sometimes I get a complete correction, which is great, because I stop believing something that's wrong. More often I get minor shifts in my personal POV, perspective on what arguments are effective and which aren't, the pitfalls/common counterarguments to specific positions, good feedback on tone / interpretation, and lots of other valuable information. Also, once in a while I make a nice personal connection.

I'll note for the record that response on k8s/containers is mixed. There are certainly a substantial number of threads where I'm at the bottom, but there are also threads where I make essentially the same arguments and score pretty well, along with a few supporting comments from people who say they don't get it either. HN's responses generally seem to be signaled by the tone of the thread and headline, and the preponderance of existing responses. If the groundwork is laid with a positive outlook, negative comments will usually have a hard time, and vice-versa.

Also, my arguments are usually not purely repetitive, even if they have the same core message (because same core things remain relevant). I have never talked about Operators on HN before. They came up and they're a good example of how people pretend that Kubernetes is more production-ready than it is, by obfuscating things like "you have to write special programs to teach Kubernetes how to deploy and manage your applications because the YAML configuration interface they tout isn't good for complex cases", behind the much hipper "Build a Kubernetes Operator, then you'll be cloudified and Dockerized out the wazoo!!"

I admit that I find a culture focused on this type of hype to be grating and immature, and a sign of its inability to really bring substantial improvement to the table. I don't think I feel this way about anything new in general; I just think it's a reaction to a progressively worsening engineering deficit in the "devops" field. I hope that I can learn whether this is right or wrong as time goes on.

> In this case, you're not wrong — but the curmudgeonly, somewhat tone deaf way that you go about it isn't very nice, which probably explains the downvotes.

Yeah, so this is a great example of why I continue to post about this. Most people would consider this subject matter very dry, but Kubernetes is something that people imbue with much more personal identification than is typical for infrastructure orchestration projects. Where are the CloudFoundry Diego disciples (paging jacques_chester ;) )?

It's important to post and learn the pressure points, and if there is any argument or circumvention that is effective against that identity imprint. I'm still trying to learn, so I continue to post and draw feedback from the community. I appreciate your participation in teaching me thus far. :)


Your comment would be more useful if it shared any hands-on experience with Operators instead of the usual ranting. I think you've made your general opinion on containers and Kubernetes pretty clear already.


First, I don't think anyone notices or cares when I post specifically, so it's hard to really feel like my "general opinion" is well-known enough to not talk about it anymore. I'm no Joel Spolsky over here!

Second, I feel like it's valid to point out that Operators are not really just a method of "packaging", in a post that tries to make it sound like Operators are just a small bit of YAML or metadata. You're writing real, non-trivial Go code that tells Kubernetes explicitly how to deploy and manage the lifecycles of specific types of applications.

At least until now with the "Operator Framework", there wasn't really even anything that firmly defined an Operator as an Operator; it's just what some people called their Go code that manipulated k8s's object handling and lifecycle internals.

But, if you insist, here's one operator I've worked with: https://github.com/coreos/prometheus-operator . This is from CoreOS itself.

Here's a patch I submitted about a year ago: https://github.com/coreos/prometheus-operator/pull/289 . This required updating the way the software handled HTTP response codes in one of its "watcher" daemons (because all packaging methods need those, right?), and fixing the order of operations in the bootstrap scripts.

Some more general info about this repo:

    $ du -sh prometheus-operator/.git
        51M     prometheus-operator/.git
The repo size is 51M.

    $ git rev-list --count master
        1716
There have been almost 2000 commits.

    $ cloc --vcs=git --exclude-dir vendor,example,contrib .
         290 text files.
         278 unique files.
         121 files ignored.
    
    github.com/AlDanial/cloc v 1.74  T=0.82 s (295.4 files/s, 48117.2 lines/s)
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    Go                              50           1581           1392          20622
    JSON                             9              0              0           6276
    YAML                           132            260            885           4164
    Markdown                        30            792              0           2957
    Bourne Shell                    17             65             58            257
    make                             1             34              2             91
    Python                           1             10              5             40
    TOML                             1             11             20             30
    Dockerfile                       2              8              0             27
    -------------------------------------------------------------------------------
    SUM:                           243           2761           2362          34464
    -------------------------------------------------------------------------------
It appears there are over 20k lines of Go code after excluding vendor libraries and the example and contrib directories (arguably, contrib should've been included).

I dunno, it just feels a little disingenuous, to me, to say that something that involves this much code is just a "packaging method" for a normal application. "Sure, just write an operator to package that up" like it's comparable to a package.json manifest or something. It's not! You need custom daemons that watch files to make sure that your k8s deployment stays in sync, and then you need to exert very meticulous and specific control over Kubernetes' behavior to make things work well.

I think it's demonstrative that it takes north of 20k lines of Go code to package an application for deployment on Kubernetes. What do you think?

-------------

EDIT: And one clarification: my opinion on containers as such is probably not well-known, since you're conflating it with my opinion on Kubernetes.

I like containers conceptually (who wouldn't?) and I run several of them through LXC:

    NAME               STATE   AUTOSTART GROUPS IPV4            IPV6 UNPRIVILEGED 
    axxxx-dev          STOPPED 0         -      -               -    false        
    gentoo-encoder     STOPPED 0         -      -               -    false        
    jeff-arch-base-lxc STOPPED 0         -      -               -    false        
    jeff-crypto        RUNNING 0         -      xxx.xxx.xx.xxx  -    false        
    jeff-ffmpeg        STOPPED 0         -      -               -    false        
    jeff-netsec        STOPPED 0         -      -               -    false        
    jeff-ocr           STOPPED 0         -      -               -    false        
    localtorrents-lxc  RUNNING 0         -      xxx.xxx.xx.xxx  -    false        
    nim-dev            STOPPED 0         -      -               -    false        
    plex-2018          RUNNING 1         -      xxx.xxx.xxx.xxx -    true         
    unifi              STOPPED 0         -      -               -    true    
I believe this is the kind of thing people actually want. Highly efficient, thin "VMs" that are easy to manage and run as independent systems without requiring the resource commitment.

There is a good place for Kubernetes in probably about 1% of deployments where it's used. Most other people are just trying to run something like LXC, but they're confused because everyone who is critical of k8s drops to -4 and gets HN's mods after them. :)


> It appears there are over 20k lines of Go code after excluding vendor libraries and the example and contrib directories (arguably, contrib should've been included).

close..

  $ find -name '*.go'|grep -v ./vendor|xargs wc -l|sort -n|tail
     507 ./test/e2e/alertmanager_test.go
     540 ./pkg/client/monitoring/v1/types.go
     562 ./pkg/alertmanager/operator.go
     643 ./pkg/prometheus/statefulset.go
     719 ./pkg/prometheus/promcfg.go
     760 ./test/e2e/prometheus_test.go
     835 ./pkg/client/monitoring/v1/zz_generated.deepcopy.go
    1152 ./pkg/prometheus/operator.go
   11410 ./pkg/client/monitoring/v1/openapi_generated.go
   24526 total
Deleting the auto generate api files and ignoring test/ gives

  -------------------------------------------------------------------------------
  Language                     files          blank        comment           code
  -------------------------------------------------------------------------------
  JSON                             9              0              0           6276
  Go                              31           1020           1065           6028
which is quite a bit off from 20k.

> I dunno, it just feels a little disingenuous, to me, to say that something that involves this much code is just a "packaging method" for a normal application.

What is disingenuous is to call the prometheus operator, which deploys an entire monitoring stack, a "normal application". The operator sets up monitoring on all the nodes and runs all the server components, including configuring Grafana and setting up dashboards.

Meanwhile,

  $ cloc puppet-prometheus-5.0.0 puppet-grafana-4.2.0
     192 text files.
     157 unique files.
      48 files ignored.
  
  github.com/AlDanial/cloc v 1.76  T=1.09 s (131.9 files/s, 10534.0 lines/s)
  --------------------------------------------------------------------------------
  Language                      files          blank        comment           code
  --------------------------------------------------------------------------------
  Ruby                             43            425            108           2477
  Puppet                           32            132           1747           2280
  Markdown                          6            494              0           1201
  ERB                              20            105              0            925
  YAML                             35              0            200            459
  JSON                              4              0              0            329
  Bourne Shell                      2             28            130            198
  Bourne Again Shell                2             28             56            176
  --------------------------------------------------------------------------------
  SUM:                            144           1212           2241           8045
  --------------------------------------------------------------------------------
Which is already that large and doesn't do ANY of the things that the prometheus operator does.

> I believe this is the kind of thing people actually want. Highly efficient, thin "VMs" that are easy to manage and run as independent systems without requiring the resource commitment.

Or, people want to use k8s so they can run entire clusters of machines as a single consistent system and take advantage of things like rolling deployments and self healing applications.


> which is quite a bit off from 20k.

True. I certainly admit that I don't know the code well enough to know exactly which lines or files are critical and which aren't, but I think it's getting into the weeds to nitpick the specifics too much (for example, I would argue that while it's probably good to exclude generated files, you shouldn't exclude the tests from the line count since they require real human time to maintain).

The point is that thousands of lines of unique, application-specific code are required to create an "operator" that runs Prometheus within k8s. This is not what most people think of when someone says "a method for packaging".

> What is disingenuous is to call the prometheus operator, that deploys an entire monitoring stack a "normal application"

While Prometheus is certainly a big system in its own right, I don't think that necessarily makes it a bad representative. Many people are planning to port their own complicated systems to Kubernetes.

> Or, people want to use k8s so they can run entire clusters of machines as a single consistent system and take advantage of things like rolling deployments and self healing applications.

It's hard to talk about this because there is so much wrapped up into the gob that is k8s, and of course not all of it is bad. But we've had "rolling deployments" and "self-healing applications" before, without having to write 10k+ lines of code to manage the platform deployment. These aren't a new thing to k8s.

k8s provides a platform that gives a nomenclature to them, but it's not always clear that there is a benefit to running on that platform v. running more traditional setups, especially when you consider that you still have to configure and code your (k8s-internal) load balancers, web servers, and applications to handle these things.

There's no free lunch. Kubernetes is a container orchestrator. It automates system-level commands like "docker run ..." and provides a (mostly redundant) fabric for those containers to feed into. That's great and there are some people who really need that, but far too many people read comments like yours and interpret it to mean "If I use Kubernetes I will have self-healing applications". It doesn't work that way.


You just said you don't understand the code well enough to discern which files matter and which don't, then go on to say it takes thousands of lines of code.

If you remove the generated code and discount the tests, it's barely a thousand. Much of it is test code, and Go is fairly verbose for testing code.

It generally takes maybe a few hundred lines of code to write simple to moderately complex operators. A lot of it is generated and boilerplate. I would say a lot of that's due to the lack of generics in Go, but I wouldn't say it's very much code overall. Additionally, the framework being presented here aims to reduce that further by removing the boilerplate and making it easier to express the end goal (e.g. self-healing, auto-rebalancing, etc.) using less code.

It's certainly not much more code to implement an operator than what you would see in a well written Puppet module/Chef cookbook/Ansible playbook, and it does a lot more. You certainly could try to do self-healing using these tools, but it's significantly more difficult in my experience.

I agree that there's no free lunch and that you won't just necessarily get self-healing applications by using Kubernetes. But it's certainly easier to build them when using Kubernetes. The only thing that's really changed is instead of writing to a particular cloud provider API to handle this, you're able to leverage something more agnostic to the specific cloud/vendor you're using for your infrastructure.


> But we've had "rolling deployments" and "self-healing applications" before, without having to write 10k+ lines of code to manage the platform deployment

You continue to be disingenuous and imply that every application requires 10k lines of code to run on k8s.

I recently used k8s to deploy an application. To configure 2 services with exterior and interior load balancing and health checks for rolling deployments and self healing took 115 lines of yaml, maybe 40 of which was specific to my application.

115 lines. Not 10,000+

Then once things were working I created a 2nd namespace for production and deployed an entire 2nd copy of everything. This took me 10 minutes and 2 kubectl commands.

> "If I use Kubernetes I will have self-healing applications". It doesn't work that way.

But that's exactly how it worked. I wrote 115 lines of yaml and had multiple environments, load balancers, health checks, and rolling deployments.

I know how to do this using "traditional setups", and I know it takes a lot more than 115 lines of generic yaml.


> You continue to be disingenuous and imply that every application requires 10k lines of code to run on k8s.

Let me clarify and state unambiguously that it won't necessarily take 10k lines of code to run any random application on Kubernetes.

You can, in fact, deploy Prometheus without using the Prometheus operator and you'll technically be "running your monitoring" within k8s. It just isn't likely to be very reliable or useful. :)

> But that's exactly how it worked. I wrote 115 lines of yaml and had multiple environments, load balancers, health checks, and rolling deployments.

If you already had a fully "stateless", self-healing capable application running on not-k8s, and your layout is as simplistic as "2 services with load balancers", you can probably move to Kubernetes with a comparatively small amount of fuss. If your existing setup was pretty tiny, this may have been a worthwhile project.

If you didn't already have a stateless, self-healing-capable system, and you didn't change your application to accommodate it as part of the port, then regardless of what Kubernetes reports about your pod state, you don't have a self-healing application.

The barrier between application and platform is artificial. They must work together. It's sort of a convenient fantasy that you can try to demarcate these areas. You can't just take any random thing and throw it on Kubernetes and say it's all good now because you can watch k8s cycle your pods.

Maybe you think this is implicit, but as someone who has spent the last 2.5 years building out k8s clusters for software written by average developers, I can assure you that there are a great deal of people who aren't getting this message.

I went full-time freelance about a month ago. One of the last in-house k8s services I deployed, the guy told me, "Oh yeah, we can't run more than one instance of this, or it will delete everything." Yet, these people are very proud of the "crazy scalability" they get from running on Kubernetes. Hope the next guy reads the comments and doesn't nudge that replicas field!

If you already had a non-trivial system that worked well for failover, recovery, self-healing, etc., why'd you replace it with something that is, for example, still just barely learning how to communicate reliably with non-network-attached-storage, as a beta feature in 1.10 [0], released last month? There are many things that sysadmins take for granted that don't really work well within k8s.

I accept that at first glance and with superficial projects, it can be easy to throw the thing over the fence and let k8s's defaults deal with everything. This is definitely the model and the demographic that Google has been pursuing. But if you have something more serious going on, you still have to dig into the internals of nginx and haproxy within your k8s cluster. You still have to deal with DNS. You have to deal with all the normal stuff that is used in network operations, but now, you're just dealing with a weirdly-shaped vaguely-YAMLish version of it, within the Great Googly Hall of Mirrors.

Once you do that enough, you say "Well, why am I not just doing this through real DNS, real haproxy, real nginx, like we used to do? Why am I adding this extra layer of complication to everything, including the application code that has to be adapted for Kubernetes-specific restrictions, and for which I must write <INSERT_ACCEPTABLE_LINE_NO_HERE> lines of code as an operator to ensure proper lifecycle behavior?"

Most people aren't willing to give themselves an honest answer to that question, partially because they don't really ask it. They just write some YAML and throw their code over the fence, now naively assured that the system is "self-healing". Then they get on HN and blast anyone who dares to question that experience.

[0] https://github.com/kubernetes/features/issues/121


> If your existing setup was pretty tiny, this may have been a worthwhile project.

What existing setup? I wrote and deployed this application to k8s in the span of like 4 days. If I was using "real DNS, real haproxy, real nginx" I'd probably still be trying to work out how to do zero downtime rolling deployments, and then how to clone the whole thing so I could have a separate production environment.


Yeah, so there's the crux. If you're starting from scratch and you design something explicitly to fit within Kubernetes's constraints and demands, and those constraints and demands work well with the specific application you're designing, it will, of course, be a pleasant experience to deploy on the targeted platform. The same is true for anything else.

If you make your goal to "build something that runs great on Platform", it shouldn't be a surprise that the new thing you made runs great on it. I've been talking about Real Things That Already Exist and Run Real Businesses. That's usually what we're talking about when we talk about infrastructure and servers, and that's where we see this dangerous cargo culting where people don't realize "Just use an Operator" means "just write thousands of lines of highly-specific lifecycle management code so that Kubernetes knows how to do your things".


Well, that's not really what I did.

It was a variation of an earlier project that I had deployed to EC2.. just on EC2 I had a mess of fragile boto/fabric stuff to get the test/prod machines provisioned and the application installed. It "worked" but I had no redundancy and deploys were hard cut-overs.. and if the new version didn't come up for whatever reason, it was just down.

I didn't do anything in the application itself to design it to run on k8s, I was able to re-use some existing test code to define things like

        livenessProbe:
          exec:
            command:
            - /app/healthcheck.py
          initialDelaySeconds: 
          periodSeconds: 300
          timeoutSeconds: 5
so, 7 lines of yaml and I had self healing and rolling deployments. I could have built this out on EC2.. probably would have taken me a few hundred lines of terraform/ansible/whatever and never worked as well. It's the kind of thing where I could have maybe gotten rolling deployments working, but I would have just ended up with an "ad-hoc, informally-specified, bug-ridden, slow implementation of half of k8s"

I would have been perfectly happy to just run this whole thing on EB/heroku/lambda/whatever but the application was doing low level tcp socket stuff, not http, and almost every PaaS these days wants you to speak http.


I'm sorry you've been down voted. FWIW, I've asked you several times for feedback, but you've never responded. We'd love to be better!

Aronchick (at) google.com

Disclosure: I work at Google on Kubeflow


Minor correction: I've replied a couple of times on HN and gotten no response. You once clarified that this is because you don't check HN often enough to reply before the deadline, which is fine. I do admit that you once sent me an email and that I didn't reply to it.


cookiecaper covers his lack of response to k8s threads in his bio at https://news.ycombinator.com/user?id=cookiecaper. There is an email if you want to contact him directly.


In a world where we are trying to codify more and more of the operational side of our applications, I personally look at operators as a better alternative to the configuration management systems we use to configure generic operating systems today.

We're moving away from an imperative configuration/operational model to a declarative one. While these operators target applications running in k8s, I could imagine them being created to manage applications running elsewhere as well.


I agree with your conclusions that the world is becoming more declarative and that we need to converge on an agnostic system to make these declarations.

However, you might've been trained incorrectly, perhaps as a joke, because you've assigned properties to the opposite solutions!

Configuration management tools like Ansible or Salt expose agnostic modules that can be plumbed under the hood to work with any platform: use Ansible's "copy" module and it can be implemented under the hood as cp, rsync, passenger pigeon, whatever. You can use the same playbook against any target, including targets that run on orchestration platforms like Kubernetes.

Kubernetes Operators, on the other hand, are tightly linked to Kubernetes internals. They are not declarative; they require explicit instruction on how to manage your application's lifecycle, implemented in a statically-typed programming language. Indeed, if your system doesn't need to muck around in the Kubernetes internals, you don't create an "Operator", you just use the pre-baked object types like Service, Deployment, etc.

So I agree in principle, but Kubernetes is the opposite of what you've expressed here.



