Istio moved to CNCF Graduation stage (github.com/cncf)
197 points by AlexB138 on July 12, 2023 | 117 comments



We are using istio at scale.

I have a love-hate relationship with it. It is very complex and builds on 5 other layers of abstraction (K8s, Envoy, iptables, ...). Grasping what is going on requires you to understand all of those layers first. Istio essentially adds one layer of proxy for all your ingress/egress requests, and from an engineering/performance/cost perspective that is not amazing.

Once it is working and deployed, though, it provides a solid set of functionality directly in the infrastructure. AuthN/Z, mTLS, security, metrics, and logs are all there by default without the end user having to do anything.
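To give a sense of how little the end user has to do: a minimal sketch (illustrative, not our actual config) of the mesh-wide policy that enforces mutual TLS is just:

```yaml
# Hypothetical example: a single PeerAuthentication in the Istio root
# namespace makes every sidecar require mTLS for inbound traffic.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```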

Eventually I expect Istio will evolve to a model that makes more sense with Ambient/eBPF (For cost/performance reasons)

The community behind Istio is especially helpful and one of the main reasons why we went with this project.


You have no idea how many companies have moved from Istio to Linkerd because of this very complexity. Here's one of my favorite write-ups: https://nais.io/blog/posts/changing-service-mesh/


It's malpractice to not seriously consider Linkerd over Istio at this point.


What about Traefik?


https://github.com/traefik/mesh

Last commit was on Nov 28, 2022.

In the Kubernetes world that means the project is dead, I guess?



traefik? [1]

1. https://traefik.io/


Give Kuma (https://kuma.io) a try; it's also a CNCF project.


> It is very complex and builds on 5 other layers of abstraction

Yeah this is a definite no for me.


Boy do I have bad news for you about all of modern software...


Not all modern software. No one is forcing anyone to use this stack.


Yes, pretty much all modern software. The real difference is whether it’s a leaky abstraction or not. Sounds like istio is leaky.


Abstraction is different from automation. Istio is closer to automation.


I don’t deal with Istio daily but I observed it sucked up a vast number of hours. Mysterious cracks seem to lurk in its bowels but nobody has any idea precisely where because it’s such a complex beast. Beware.


“once it is working and deployed” is the caveat here. debugging issues with it at my last job was such a constant headache we nearly scrapped it for consul.


We also run Istio at scale and feel the same way. During adoption it's a pain; when it's up and running it's a dream.


Did you folks try to upgrade it? For us it was a nightmare (breaking changes, etc.)


We tried Istio, but our DevOps team (8 people) said they didn't have the capacity to manage that complexity. We've been rolling with Linkerd ever since, and it's still a joy.


have you tried Contour yet?

https://projectcontour.io


Contour is a gateway: a controller that manages Envoy proxies at the edge of a Kubernetes environment. Istio is a service mesh: a controller that manages Envoy proxies at the edge and alongside each workload. If you are using Istio, you probably don't need Contour.

A year ago, a number of Envoy gateway maintainers (including Contour) announced their intention to join up to build one implementation of an Envoy gateway. They haven't made a lot of noise since, but they are apparently up to v0.4.

https://blogs.vmware.com/opensource/2022/05/16/contour-and-c...

You can also use the Gateway API to manage Istio. So, if you are using Istio, you probably don't need Envoy Gateway either.
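A rough sketch of what that looks like with the Gateway API (resource and backend names here are illustrative; Istio registers a GatewayClass named "istio"):

```yaml
# Hedged sketch: Istio acting as the implementation behind the
# Kubernetes Gateway API instead of its own Gateway/VirtualService types.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: ingress
spec:
  gatewayClassName: istio
  listeners:
  - name: http
    port: 80
    protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: web
spec:
  parentRefs:
  - name: ingress
  rules:
  - backendRefs:
    - name: web-svc   # hypothetical backend Service
      port: 8080
```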

Wherever you look, it's still Envoy. Unless of course you look at Linkerd, who have their own thing.


Hi everyone, I'm the person who drove the CNCF process for Istio (and made the linked commit). I'm happy to answer any questions.


Congrats on the graduation. My company has been using it for a long while through all its design iterations.

We still haven’t achieved an amazing distributed tracing strategy, we don’t use its MySQL or Redis interfaces, and haven’t rolled out more advanced features like smart retries. It’s hard to get momentum on that versus other must have work.

But for mTLS and authn and authz, it works great. Thanks for the hard work.


And yet -- grpc is still "incubating". Do these statuses really mean much?


gRPC had a graduation application open for 3 years. It was rejected very recently: https://github.com/cncf/toc/pull/300.

Reading between the lines, it sounds like the main problem is Google's tight control over the project. Apple contributes to the Swift implementation and MSFT drives the native .NET implementation, but there's little non-Google input in decision-making for Go, Java, C++ core, or any of the implementations that wrap core.

More subjectively, I'm impressed by the CNCF's willingness to stick to their stated graduation criteria. gRPC is widely used (even among other CNCF projects), and comes from the company that organized the CNCF - there must have been a lot of pressure to rubber-stamp the application.


100% this. gRPC has nearly all of its code contributed by Google. If Google pulled its funding, the project would be at risk. It's closer to a proprietary offering with source available than a mature FOSS ecosystem. Think of gRPC as Red Hat Enterprise Linux, whereas Istio/k8s is closer to Debian.


You acknowledge contributions from several large companies, so it obviously won't go away if Google pulls support. Perhaps the fact that random folks across the internet don't feel compelled to submit patches is simply due to the low-level nature of the project: it does the job quite well, and its scope is limited by its nature, which limits the need for customization by every individual user. I really wonder what the standard is. If I recall correctly, a certain proxy graduated by the CNCF was such crap at one point that it did a linear search over routes. That naturally necessitates contributions if you actually use such software in production at scale.


Why is there a need to read between the lines? The post seems quite clear about what is needed, and it sounds like the ball is just in gRPC's court. If anything it seems promising that there was movement after 3.5 years.


It’s really hard to have any influence on it unless you’re inside the Google fence. Even for the primary maintainers of those two external parties that you mentioned.


IMHO that's accurate for grpc. The project works great if you're all golang on backend. As soon as you use other languages it gets complicated and the story falls apart--you almost certainly have to pull in tooling to manage protobuf generation, and proxying your grpc backend code to web frontend code (easy if your backend is golang, but many more options and questions if not). The fact grpc (and protobufs in general) need so much extra tooling is a bit of a code smell of immaturity and incubation IMHO.


Yes, you need additional tooling, but often, as with C++, that's just the nature of the build environment for the language. There are many organizations using gRPC in mission-critical environments along with C++.


.NET tooling is quite good, although I have only used it in toy examples.


It's maintained by James Newton-King, so it's in the hands of .NET royalty :)

Unlike the other gRPC implementations, it's also taken seriously by MSFT's entire developer division. MSFT contributed a bunch of performance improvements to the protobuf runtime and generated code for .NET, the Kestrel webserver team runs gRPC benchmarks, and support for gRPC-Web is baked right into the core frameworks. From a user perspective, it's clear that someone cares about how all the pieces fit together.


Even though I only briefly used it, I wish they had the same love for the COM tooling; after 30 years, Visual Studio still can't offer an IDL editing experience similar to editing proto files.

Maybe it is about time to gRPC Windows. :)


+1 for gRPC for .NET. It's quite nice _except_ on macOS. I forget the exact issue, but there's some extra friction to hosting HTTPS locally on macOS that makes the dev flow a lot more cumbersome.


The lack of server ALPN support on macOS is probably the extra friction you're referring to. This made accepting HTTP/2 connections with TLS impossible. Fortunately, support will be added in .NET 8 with https://github.com/dotnet/runtime/pull/79434.


Bazel, rules_proto, and its gRPC rules for various languages are the tooling you are looking for, in my opinion! It's so nice to be able to share your proto files and generate stubs/clients across language boundaries, and to see exactly which pieces of your application break when a shape or signature changes in a proto file. Without a monorepo, it would be hard to see the impact of proto changes across disparate code repositories.


Perhaps the tooling feels somewhat overkill for toy apps that could be done with Ruby on Rails or such. But many large orgs with billions of dollars of revenue per year are using gRPC across many languages.


Surely Google has been using gRPC across many languages for a decade or more at this point, and surely this is a solved problem at Google. How is this not solved outside of Google?


Google uses Bazel, but the rest of the world doesn't.


They do have meaning, but the meaning is orthogonal to metrics like total use throughout an industry: https://github.com/cncf/toc/blob/main/process/graduation_cri...

There is little doubt in my mind that gRPC is a larger and more impactful project than Istio.


Here are the insights for gRPC: https://devboard.gitsense.com/grpc/grpc They are attracting fewer people than Istio, but not by much.

Full disclosure: This is my tool


The CNCF has their own tool - a Grafana dashboard for all their projects:

https://devstats.cncf.io

A bit awkward to use but lots of great info there. I use it every now and then (I'm a maintainer of Dapr).


Yeah I was aware of devstats and yes the UI is awkward to use. I'm planning on open sourcing DevBoard, since GitSense is really the differentiating factor, so CNCF is free to use it, if it wants to. I personally think Grafana is great for analyzing time series data but I don't believe it's a very good dashboard system if you need to tell a story, which is what I believe software development insights needs.

If you go to https://devboard.gitsense.com/dapr/dapr?board=gitsense_examp... you can see how my DevBoard widget system is different from Grafana's. Note, the repo that I talk about on the Intro page hasn't been pushed to GitHub yet, but it will be soon (hopefully by the end of this week). I'm planning on creating widgets where you can just feed in some numbers and it will generate a graph, but since my widgets can be programmed, you can do much more with the data to tell a story and to help surface insights.


Some of Google Cloud's critical APIs only seem to use gRPC over HTTPS. It relies on such an esoteric part of the TLS specification that many (most?) proxies can't carry the traffic. You end up hunting for 2 days to find out why your connections don't work only to realize they probably never will. So I would say it's good that gRPC isn't being pushed hard yet.

On a personal level, it's one of those projects that someone obsessed with "perfect engineering" develops, regardless of the human cost. Crappier solutions (ex. JSON-over-HTTP) are better in almost all cases.


I've never encountered a proxy that can't do the portions of TLS 1.3 that gRPC requires - NGINX, Envoy, linkerd, all the managed cloud offerings I know of, and any random Go binary can handle it. What esoteric portions of the spec are you referring to?

gRPC _does_ require support for HTTP trailers, which aren't used much elsewhere. If you want to use streaming RPCs, you also need a proxy that doesn't buffer (or allows you to disable buffering).


> perfect engineering

You couldn't get a better example of good pragmatic engineering than gRPC compared to something like CORBA or DCOM. I can't talk about "all cases" but in the cases I've come across it's a much better solution than JSON over http.


Hopefully grpc will never graduate as to not encourage people to use it. There are better ways.


c'mon, you gotta at least drop some of the "better ways" you prefer.


I prefer another CNCF incubating project, NATS. (nats.io)

It decouples the service addresses via a pubsub architecture.

So if I want service A to send a request to service B, it is done by subscribing to a shared topic; there is no service discovery.

It kind of replaces GRPC and Istio.

I like the “static typing” and code generation you get from grpc so a hybrid of the 2 would be my preference.

I actually solved the code generation part for NATS, though, by using AsyncAPI (like OpenAPI but for message-based systems). It would be better if it were baked in.


Nats is a message bus/queue though, which is different from the serialisation format and rpc framework offered by protobufs +grpc.

I love a good queue, but these are “orthogonal” to borrow a favourite HN term.


Yeah completely different tech but it can solve the same problem—connection and communication between services. Implementation details.

If you don’t think about the tech between services, at the end of the day my service is using some protocol to send and receive data, using grpc or otherwise.

NATS has a clean request/reply paradigm built in that makes this simpler.
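A rough sketch of that request/reply pattern with the Go client (subject name and payload are made up; error handling trimmed):

```go
package main

import (
	"fmt"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to a local NATS server (nats://127.0.0.1:4222 by default).
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Drain()

	// "Service B" side: reply to requests on a shared subject.
	// No addresses, ports, or service discovery involved, only the subject.
	nc.Subscribe("svc.b.echo", func(m *nats.Msg) {
		m.Respond(append([]byte("echo: "), m.Data...))
	})

	// "Service A" side: send a request and wait up to 2s for the reply.
	resp, err := nc.Request("svc.b.echo", []byte("hello"), 2*time.Second)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(resp.Data))
}
```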


Orthogonal is a math term fwiw. Most SWEs are math majors by way of CS/CE. Hence it shows up here a lot.


Nats is probably one of the most underappreciated pieces of software. It solves so many problems and I think makes grpc a thing of the past.


There's a proto service implementation from NATS folks that I think does what you want - https://github.com/nats-rpc/nrpc


Nice! We don’t use Go though :) we use F#, Rust, and C.

Would love to see this work supported by the core Nats team. A standard spec so others could make clients.


The story below is about my experience in Java.

The proto files format is ok, because it has nullability defined in the type.

However, everything else is bad. I had to use gRPC on a project, and it was a pain and created problems.

Want to use Postman to test locally? Forget it; you have to use some gRPC client, and none of them are ideal.

Want to write automation tests? Good luck finding a tool that supports gRPC.

Want to add distributed tracing to calls? You have to use some unofficial code, and you'd better learn the gRPC implementation details if you want to be sure that the code is good.

Use json over http, or json over http2 if possible. You will have a much better and less frustrating experience.

gRPC is good for servers moving petabytes of data and for low-latency needs (compressed JSON over HTTP/2 would be about the same performance in terms of latency, maybe a few percent slower). I guess 99% of its users would do much better with a JSON-over-HTTP interface.

Nowadays it is very easy to make an HTTP call. In Java it can be done with an annotation over an interface method, e.g.: https://docs.spring.io/spring-cloud-openfeign/docs/current/r... It is also recommended to store the data types used in the interface externally, similar to how proto files work, so that you don't have code duplication between the client and the server.


what do you people use for inter-service comms? HTTP??? why?


What do you mean "why"? Why not?

gRPC itself runs on top of HTTP.


If I needed a service mesh, I'd probably use Linkerd. What would I be missing out on?


I like to equate the question to "If I needed a container orchestrator, I'd probably use Nomad. What would I be missing out on with Kubernetes?"

Ignore the CNCF for a second. Both are open source, so will survive regardless, but the former has a single vendor behind it, and the latter has almost all the cloud industry.

There are valid use cases for FreeBSD, but the default choice is Linux.


> the former has a single vendor behind it, and the latter has almost all the cloud industry

The same could be said about Apple products, but that doesn't mean people should be dissuaded from using them. Quite the opposite: being in charge of a technology means you can be 100% focused on it and be relentlessly focused on making it great for your customers.


Exactly. Linkerd is fast and simple in no small part because it doesn't have 20 competing, sharp-elbowed vendors pulling it in 21 different directions. Customer focused is everything.


I'd love to hear a compelling comparison between linkerd and istio without having to compare either with other (unrelated) categories of software.

Anyone who reads these comments should be able to get an understanding of both service meshes without having to research other software.

Are you saying that istio IS kubernetes and linkerd is not? I don't think linkerd WANTS to be kubernetes.

I love your podcast, Craig, but this "hot take" is too hot to hold


This is a good analogy. For more context: in the past 5 years working with customers in the Bay Area, I've not encountered one who mentioned Linkerd, let alone ran it in production.

More than half those companies ran istio in production at large scales.


Maybe you're just not talking to the right companies. There are a ton of Linkerd adopters and the list is constantly growing! https://linkerd.io/community/adopters/


I think it does come down to risk and risk mitigation. As someone who works for an “Istio vendor” we see some of the largest deployments of Istio in the world for mission critical/tier-0 workloads… and all the steps it took to get there including evaluation/POC of other vendors/mesh technologies.

Part of these decisions are based on things like “What is the rest of the industry doing?” “How vibrant/diverse is the community?” “How mature is the project _for enterprise adoption_?” “What vendors are available for enterprise support?” “Is it already available in my platform of choice?” etc.etc.

The sting of “picking the wrong container orchestrator” is still fresh in a lot of organizations.

We see Istio make it through these questions with good answers for a lot of organizations where other/alternative service mesh vendors strike out pretty quickly.

This is even before we get to the “feature comparisons” for usecases these large organizations focus on/have.


Interesting. IMO Istio is built to entice VPs and Linkerd is built to entice engineers.


No need to ignore the CNCF. Linkerd has had a "Graduated CNCF" rubber stamp for 2 years.


You have not given me one single reason why I would be better off with Istio.


What didn't you like about Traefik, Consul, NGINX Service Mesh, or AWS AppMesh?


I think you could disregard a few of these without too much thought. Nginx - predatory vendor, AWS - only makes sense if you need the deep integration, Consul - only makes sense if you are on the hashi stack. Traefik I haven't spent much time with recently, but I'm a bit suspect of ingresses that reposition themselves as mesh to gain adoption.


> Consul - only makes sense if you are on the hashi stack.

As an enterprise Hashistack customer, every time I contact Consul support, they assume I'm using Kubernetes instead of Nomad, and when I tell them I'm using Nomad, I get blank stares.

Vault is great, and Consul is...fine, but the Consul PMs saw which way the orchestrator market was trending (towards k8s domination) and have adjusted their priorities accordingly. But if I had my way, I wouldn't touch the Nomad+Consul stack with a 20-foot pole ever again.


Also a Hashistack enterprise customer and we have the same experience.

Thankfully we don't have to talk to Consul support very often, but each time we meet with Consul product people it's like they don't even know Nomad exists.

We're starting to adopt Consul Service Mesh with Nomad and I am not excited.


In case anyone wants to read the rendered markdown:

https://github.com/cncf/toc/blob/main/proposals/graduation/i...



I may have missed the announcement: is Istio's ownership being transferred to a vendor-neutral foundation like the CNCF, or is the Open Usage Commons what is being used in its place?



I think this just demonstrates the power of vendor-neutral Open Source. I don't mean that in an inflammatory way. Istio, a collaborative project from Google/IBM that was arguably going to be a slight differentiator for their respective clouds, was forced to go vendor-neutral after Linkerd did.

Same thing happened to Knative.

The CNCF definitely has some politics, but it's been interesting to see large OSS projects be essentially dead on arrival now if they're not in a vendor-neutral holding org.

I personally try to favor vendor neutral projects now. Slightly smaller chance of being burned like I was with Grafana switching licenses.


Enough searching around told me what CNCF is, but I still don't know what it means to "graduate"


You can read more about incubating and graduated CNCF projects here: https://www.cncf.io/projects/


TL;DR: you can bring your open-source project to the CNCF, and if there are enough infra/platform developers vouching for it, you can become an incubating project. After that, there are a bunch of boxes (traction, integration, stability) that you need to check before the CNCF endorses you as a graduated project.


But what does it actually mean? Is this an incubator in the same sense as a startup incubator? This just sounds like a popularity contest where the winners get a stamp of approval from the CNCF, provided you play by their rules. What value does the CNCF add?


Think of it as process maturity. They don't want to just rubber-stamp some company's project as a "CNCF" project, only to have it collapse when the company pulls employees off it.


As a gatekeeper (and as a result, status, attention, prestige, etc.)

Funding can be attained more readily if you have a project that reaches any of these stages.


So it's just a stamp of approval?


Also, a lot of projects in sandbox or incubation are “finding themselves”, so to speak. It's what used to happen when a project would release a v1 that's really mostly a POC, then have some major overhauls/rewrites in v2/v3. At least in theory, by “graduation” or v3 you know that this thing has a non-trivial user base and is fairly well defined in what it does and how it does it. Some independent reviews and established patterns, etc.

Nothing protecting them from rotting or dying or breaking obviously, but at least you know you won’t be shouting at it alone.


Sort of. It's also a statement of maturity.


> Sort of. It's also a statement of maturity.

And of funding. Some of the hoops such as a 3rd party audit are _not_ cheap


There are 3 stages for a CNCF project: sandbox, incubating, and graduated.

For each there are conditions, including the number of contributors, the number of companies officially backing it, etc.


Why would you want this?


A “graduated” project might find it easier to get adoption and contributors, as the guarantees (stability, integrity, security, governance) are some of the criteria that orgs use when deciding on tooling.

It’s basically saying: If you were shying away from using Istio in production, we graduated, take a look now?


The graduation criterion I care about is "Have completed an independent and third party security audit". Lots of software in the cloud-native world puts security in the back seat, sadly!


It's a statement that it's not some personal project and it is backed more or less consistently by a bunch of people.


Funding can be attained more readily if you have a project that reaches any of these stages.


What are some major CNCF projects that have gotten investor funding in the last year?


It's about the modern day J2EE - Kubernetes. Approximately the same cost per "getting something done" ratio. Exactly the same kind of orgs (both on the seller and buyer sides). I hear the money is good though, if you're into that kind of slow work.


You might not know what J2EE or Kubernetes do if you compare them like that.


I know and understand both technologies in enough detail, thank you. They were/are abused as instruments of complexity by consulting firms. They are both fantastic at this purpose.

Of course it's also possible to use this tech to actually get stuff done. But that's not what it's widely used for in this consulting environment.

The typical end customer is you (the taxpayer). You have no influence over what you paid for, though. Also: it's a recurring cost, like Netflix. It doesn't really stop...


The problem you're describing is consultancies, not Kubernetes.


Yeah, but like J2EE, Kubernetes will become synonymous with waste [in the public sector] in about 5-10 years from now.

I mean, you've got to admit, it's a repeat of history?


Here is some community information for istio https://devboard.gitsense.com/istio/istio

Not kubernetes level https://devboard.gitsense.com/kubernetes/kubernetes but still very good.

Full Disclosure: This is my tool, but I figure the insights would be interesting/useful.


Finally… took a while.

Now CNCF needs to figure out how to get Istio to work nicely with the networking k8s addons


The CNCF doesn't really dictate that kind of stuff. If something doesn't play nice try the Istio slack or file an issue on the main repo: https://github.com/istio/istio



I think it should be titled Envoy+Istio in the same spirit of GNU+Linux.

Jokes aside, Envoy really deserves some spotlight.


What's an alternative to Istio? I want to have HTTP metrics between our services inside Kubernetes. I don't really want all the fancy-schmancy mTLS, DPI and stuff; they don't bring value to me.


Try Kuma (https://kuma.io/), also part of the CNCF, which was created with a much simpler model for supporting a fleet of Envoy proxies across single and multiple clusters.


If you don't want any of the fancy stuff, then you can just use Envoy without Istio and configure it yourself.


That's an interesting thought. Even if I ended up throwing it away for Istio in the end, the experience of managing Envoy might be valuable.

How do I do that exactly? Do I need to install some iptables rules inside a pod to redirect pod traffic to Envoy?


Envoy is the proxy that does the heavy lifting. Istio is just a glorified configuration system. Even if you choose to use Istio you're still using Envoy.

You're spot-on about using iptables rules. There is an example here with a yaml configuration and some iptables commands: https://github.com/envoyproxy/envoy/blob/main/configs/origin...

You might be able to re-use some of that. It should be pretty easy to get metrics for outbound/inbound http requests, but I don't remember the exact yaml incantation.
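If it helps, the part I do remember: enabling Envoy's admin interface in the bootstrap config exposes request counters for every configured listener and cluster at /stats (and /stats/prometheus), with no mesh involved at all. A hedged fragment, not a complete bootstrap:

```yaml
# Fragment of an Envoy bootstrap config: expose the admin endpoint,
# which serves HTTP request/response stats at /stats and /stats/prometheus.
admin:
  address:
    socket_address:
      address: 127.0.0.1
      port_value: 9901
```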


Thanks, I'll look into that. Might actually be the simplest solution in the end.


lol i’m not quite following how manually injecting envoy, “configuring envoy yourself/by hand” in a pod and “copying istio code for iptables re-direction” and then trying to maintain this yourself is easier than just using istio?

install istio, turn off mtls if you don't want that (https://istio.io/latest/docs/reference/config/security/peer_...) and you have what you're looking for. doesn't get simpler than that.
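For reference, a minimal sketch of the "turn off mTLS" bit (the mode names come from Istio's PeerAuthentication API; this is an illustrative manifest, not a recommendation):

```yaml
# Hedged sketch: mesh-wide PeerAuthentication relaxing mTLS.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: PERMISSIVE   # accepts plaintext and mTLS; DISABLE turns mTLS off entirely
```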


https://linkerd.io/ is a much lighter-weight alternative, but you do still get some of the fancy things like mTLS without needing any manual configuration. Install it, label your namespaces, and let it do its thing!


I don't think that I need mTLS, and extra CPU load for (to me) useless encryption does not sound so good. Can I opt out of this specific feature?

Also, I'm worried about its pervasiveness. Is it possible to enable those sidecars only on selected pods?


So to answer my own question:

It's not possible to disable mTLS between meshed services; there is no configuration option for this particular feature.

There's no pervasiveness problem with Linkerd: you add the `linkerd.io/inject: enabled` annotation to the target workload and restart the deployment.
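For anyone else landing here, a sketch of where that annotation goes (workload names are made up); it sits on the pod template, so only annotated workloads get the proxy:

```yaml
# Hedged sketch: opt a single Deployment into the Linkerd mesh.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
      annotations:
        linkerd.io/inject: enabled   # proxy is injected only where this is set
    spec:
      containers:
      - name: app
        image: my-service:1.0.0
```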


At least in Istio yes, you can annotate pods or namespaces to be part of your service mesh.
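For example (namespace name made up), the namespace-level switch is a label that opts every pod created there into sidecar injection:

```yaml
# Hedged sketch: any pod created in this namespace gets an Istio sidecar.
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  labels:
    istio-injection: enabled
```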


Istio is amazing once you grok how it works and get it running. It has a lot of gotchas (objects in istio-system become global?) and there are a lot of ways to abuse or misuse it.



