I have a love-hate relationship with it.
It is very complex and builds on 5 other layers of abstraction (K8s, Envoy, iptables, ...). Grasping what is going on requires you to understand all of those layers first. Istio essentially adds one layer of proxy for all your ingress/egress requests, and from an engineering/performance/cost perspective that is not amazing.
Once it is working and deployed though it provides a solid set of functionalities as part of the infrastructure directly. AuthN/Z, mTLS, security, metrics and logs are all deployed by default without the end-user having to do anything.
Eventually I expect Istio will evolve to a model that makes more sense with Ambient/eBPF (for cost/performance reasons).
The community behind Istio is especially helpful and one of the main reasons why we went with this project.
I don’t deal with Istio daily but I observed it sucked up a vast number of hours. Mysterious cracks seem to lurk in its bowels but nobody has any idea precisely where because it’s such a complex beast. Beware.
“once it is working and deployed” is the caveat here. debugging issues with it at my last job was such a constant headache we nearly scrapped it for consul.
We tried Istio, but our DevOps team (8 people) said they don't have the capacity to manage that complexity.
We've been rolling with Linkerd ever since, still a joy.
Contour is a gateway: a controller that manages Envoy proxies at the edge of a Kubernetes environment. Istio is a service mesh: a controller that manages Envoy proxies at the edge and alongside each workload. If you are using Istio, you probably don't need Contour.
A year ago, a number of Envoy gateway maintainers (including Contour) announced their intention to join up to build one implementation of an Envoy gateway. They haven't made a lot of noise since, but they are apparently up to v0.4.
Congrats on the graduation. My company has been using it for a long while through all its design iterations.
We still haven’t achieved an amazing distributed tracing strategy, we don’t use its MySQL or Redis interfaces, and haven’t rolled out more advanced features like smart retries. It’s hard to get momentum on that versus other must have work.
But for mTLS and authn and authz, it works great. Thanks for the hard work.
Reading between the lines, it sounds like the main problem is Google's tight control over the project. Apple contributes to the Swift implementation and MSFT drives the native .NET implementation, but there's little non-Google input in decision-making for Go, Java, C++ core, or any of the implementations that wrap core.
More subjectively, I'm impressed by the CNCF's willingness to stick to their stated graduation criteria. gRPC is widely used (even among other CNCF projects), and comes from the company that organized the CNCF - there must have been a lot of pressure to rubber-stamp the application.
100% this. Nearly all of gRPC's code contributions come from Google. If Google pulled its funding, the project would be at risk. It's closer to a proprietary offering with source code available than a mature FOSS ecosystem. Think Red Hat Enterprise Linux for gRPC, whereas Istio/K8s is closer to Debian.
You acknowledge contributions from several large companies. It's obvious it won't go away if Google pulls support or anything. Perhaps the fact that some random folks across the internet don't feel compelled to submit patches is simply due to the low-level nature of the project: it does the job quite well, and its scope is limited by its nature, which in turn limits the need for customization by every individual. I really wonder what the standard is. If I recall correctly, for instance, a certain CNCF-graduated proxy was such crap at one point that it did a linear search over routes. That naturally necessitates contributions if you actually use such software in production at scale.
Why is there a need to read between the lines? The post seems quite clear about what is needed, and it sounds like the ball is just in gRPC's court. If anything, it seems promising that there was movement after 3.5 years.
It’s really hard to have any influence on it unless you’re inside the Google fence. Even for the primary maintainers of those two external parties that you mentioned.
IMHO that's accurate for grpc. The project works great if you're all golang on backend. As soon as you use other languages it gets complicated and the story falls apart--you almost certainly have to pull in tooling to manage protobuf generation, and proxying your grpc backend code to web frontend code (easy if your backend is golang, but many more options and questions if not). The fact grpc (and protobufs in general) need so much extra tooling is a bit of a code smell of immaturity and incubation IMHO.
Yes, you need additional tooling, but often, as with C++, that's just the nature of the build environment for that language. There are many organizations using gRPC with C++ in mission-critical environments.
It's maintained by James Newton-King, so it's in the hands of .NET royalty :)
Unlike the other gRPC implementations, it's also taken seriously by MSFT's entire developer division. MSFT contributed a bunch of performance improvements to the protobuf runtime and generated code for .NET, the Kestrel webserver team runs gRPC benchmarks, and support for gRPC-Web is baked right into the core frameworks. From a user perspective, it's clear that someone cares about how all the pieces fit together.
Even though I only used it briefly, I wish they had the same love for the COM tooling; after 30 years, Visual Studio still can't offer an IDL editing experience similar to editing proto files.
+1 for gRPC for .NET. It's quite nice _except_ on macOS. I forget the exact issue but there's some extra friction to hosting HTTPS locally on macOS that makes the dev flow a lot more cumbersome.
The lack of server ALPN support on macOS is probably the extra friction you're referring to. This made accepting HTTP/2 connections with TLS impossible. Fortunately, support will be added in .NET 8 with https://github.com/dotnet/runtime/pull/79434.
Bazel with rules_proto and its gRPC rules for various languages is the tooling you are looking for, in my opinion! It's so nice to be able to share your proto files and generate stubs/clients across language boundaries, and to see exactly which pieces of your application break when a shape or signature changes in a proto file. Without a monorepo, it would be hard to see the impact of proto changes across disparate code repositories.
Perhaps the tooling feels somewhat overkill for toy apps that could be done with Ruby on Rails or the like. But many large orgs with billions of dollars of revenue per year are using gRPC across many languages.
Surely Google has been using gRPC across many languages for a decade or more at this point, and surely this is a solved problem at Google. How is this not solved outside of Google?
Yeah I was aware of devstats and yes the UI is awkward to use. I'm planning on open sourcing DevBoard, since GitSense is really the differentiating factor, so CNCF is free to use it, if it wants to. I personally think Grafana is great for analyzing time series data but I don't believe it's a very good dashboard system if you need to tell a story, which is what I believe software development insights needs.
If you go to https://devboard.gitsense.com/dapr/dapr?board=gitsense_examp... you can see how my DevBoard widget system is different from Grafana's. Note, the repo that I talk about on the Intro page hasn't been pushed to GitHub yet, but will be soon (hopefully by the end of this week). I'm planning on creating widgets where you can just feed in some numbers and it will generate a graph, but since my widgets can be programmed, you can do much more with the data to tell a story and help surface insights.
Some of Google Cloud's critical APIs only seem to use gRPC over HTTPS. It relies on such an esoteric part of the TLS specification that many (most?) proxies can't carry the traffic. You end up hunting for 2 days to find out why your connections don't work only to realize they probably never will. So I would say it's good that gRPC isn't being pushed hard yet.
On a personal level, it's one of those projects that someone obsessed with "perfect engineering" develops, regardless of the human cost. Crappier solutions (ex. JSON-over-HTTP) are better in almost all cases.
I've never encountered a proxy that can't do the portions of TLS 1.3 that gRPC requires - NGINX, Envoy, linkerd, all the managed cloud offerings I know of, and any random Go binary can handle it. What esoteric portions of the spec are you referring to?
gRPC _does_ require support for HTTP trailers, which aren't used much elsewhere. If you want to use streaming RPCs, you also need a proxy that doesn't buffer (or allows you to disable buffering).
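For anyone who hasn't run into trailers before: they're headers sent after the response body, and gRPC puts grpc-status/grpc-message there, which is exactly what buffering or trailer-stripping proxies break. A small self-contained Go sketch of the mechanism using plain net/http, not gRPC itself (the X-Status name is just a stand-in for grpc-status):

    package main

    import (
    	"fmt"
    	"io"
    	"net/http"
    	"net/http/httptest"
    )

    func main() {
    	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    		// Announce the trailer up front, send its value after the body.
    		w.Header().Set("Trailer", "X-Status")
    		io.WriteString(w, "response body\n")
    		w.Header().Set("X-Status", "0") // stand-in for grpc-status
    	}))
    	defer srv.Close()

    	resp, err := http.Get(srv.URL)
    	if err != nil {
    		panic(err)
    	}
    	io.ReadAll(resp.Body) // trailers only arrive after the body is fully read
    	resp.Body.Close()
    	fmt.Println("trailer X-Status =", resp.Trailer.Get("X-Status"))
    }

If a proxy between client and server drops or buffers away the trailer, that last line prints nothing useful; in gRPC's case the missing value is the status code, so the client can't even tell whether the call succeeded.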
You couldn't get a better example of good pragmatic engineering than gRPC compared to something like CORBA or DCOM. I can't talk about "all cases" but in the cases I've come across it's a much better solution than JSON over http.
I prefer another CNCF incubating project, NATS. (nats.io)
It decouples the service addresses via a pubsub architecture.
So if I want service A to send a request to service B, it's done by subscribing to a shared topic; there is no service discovery.
It kind of replaces GRPC and Istio.
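To make that concrete, here's a minimal request/reply sketch with the Go client (assuming nats.go; the subject name "svc.b.requests" is made up, any agreed-upon subject works):

    package main

    import (
    	"fmt"
    	"time"

    	"github.com/nats-io/nats.go"
    )

    func main() {
    	nc, err := nats.Connect(nats.DefaultURL)
    	if err != nil {
    		panic(err)
    	}
    	defer nc.Drain()

    	// "Service B": answer anything that arrives on the shared subject.
    	nc.Subscribe("svc.b.requests", func(m *nats.Msg) {
    		m.Respond([]byte("pong: " + string(m.Data)))
    	})

    	// "Service A": send a request and wait for a single reply.
    	reply, err := nc.Request("svc.b.requests", []byte("ping"), 2*time.Second)
    	if err != nil {
    		panic(err)
    	}
    	fmt.Println(string(reply.Data)) // pong: ping
    }

Whoever subscribes to the subject answers, so scaling or relocating service B doesn't involve addresses or a service registry at all.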
I like the “static typing” and code generation you get from grpc so a hybrid of the 2 would be my preference.
I actually solved the code generation part for NATS by using AsyncAPI (like OpenAPI but for message-based systems). Would be better if it were baked in.
Yeah completely different tech but it can solve the same problem—connection and communication between services. Implementation details.
If you don’t think about the tech between services, at the end of the day my service is using some protocol to send and receive data, using grpc or otherwise.
NATS has a clean request/reply paradigm built in that makes this simpler.
The proto files format is ok, because it has nullability defined in the type.
However everything else is bad.
I had to use grpc on a project and it was a pain and created problems.
Want to use postman to test locally?
Forget it, you have to use some grpc client, and none of them are ideal.
Want to write automation tests? Good luck finding a tool that supports grpc.
Want to add distributed tracing to calls? You have to use some unofficial code, and better learn the grpc implementation details if you want to be sure that the code is good.
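If you do end up wiring it by hand, the usual shape is a client interceptor that copies a trace ID into outgoing metadata. A rough Go sketch (the x-trace-id header, the context key, and the address are made up, not an official convention):

    package main

    import (
    	"context"
    	"log"

    	"google.golang.org/grpc"
    	"google.golang.org/grpc/credentials/insecure"
    	"google.golang.org/grpc/metadata"
    )

    type traceKey struct{}

    // traceIDFrom is a stand-in for wherever your app keeps the current trace ID.
    func traceIDFrom(ctx context.Context) string {
    	id, _ := ctx.Value(traceKey{}).(string)
    	return id
    }

    // tracingInterceptor copies the trace ID into outgoing gRPC metadata so the
    // server (or the next hop) can pick it up and continue the trace.
    func tracingInterceptor(ctx context.Context, method string, req, reply any,
    	cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
    	if id := traceIDFrom(ctx); id != "" {
    		ctx = metadata.AppendToOutgoingContext(ctx, "x-trace-id", id)
    	}
    	return invoker(ctx, method, req, reply, cc, opts...)
    }

    func main() {
    	conn, err := grpc.Dial("localhost:50051",
    		grpc.WithTransportCredentials(insecure.NewCredentials()),
    		grpc.WithUnaryInterceptor(tracingInterceptor))
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer conn.Close()
    	// every unary call made through conn now carries x-trace-id metadata
    }

It works, but you're maintaining that glue yourself, which is exactly the kind of thing you'd expect the ecosystem to hand you.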
Use json over http, or json over http2 if possible. You will have a much better and less frustrating experience.
Grpc is good for servers moving petabytes of data and for low-latency needs (compressed json over http2 would be the same performance in terms of low latency, maybe a few percent slower). I guess 99% of its users would do much better with a json over http interface.
Nowadays it is very easy to make an http call. In Java it can be done with an annotation over an interface method, ex: https://docs.spring.io/spring-cloud-openfeign/docs/current/r...
It is also recommended to store the data types used in the interface externally, similar to how proto files work, so that you don't have code duplication between the client and the server.
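For comparison outside of Java, it's just a few lines with the Go standard library; a rough sketch (the endpoint and payload shape are made up):

    package main

    import (
    	"encoding/json"
    	"fmt"
    	"net/http"
    )

    // User is the shared type; in practice it would live in a small module that
    // both the client and the server import, much like a shared proto file.
    type User struct {
    	ID   int    `json:"id"`
    	Name string `json:"name"`
    }

    func main() {
    	resp, err := http.Get("https://api.example.com/users/42")
    	if err != nil {
    		panic(err)
    	}
    	defer resp.Body.Close()

    	var u User
    	if err := json.NewDecoder(resp.Body).Decode(&u); err != nil {
    		panic(err)
    	}
    	fmt.Println(u.Name)
    }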
I like to equate the question to "If I needed a container orchestrator, I'd probably use Nomad. What would I be missing out on with Kubernetes?"
Ignore the CNCF for a second. Both are open source, so will survive regardless, but the former has a single vendor behind it, and the latter has almost all the cloud industry.
There are valid use cases for FreeBSD, but the default choice is Linux.
> the former has a single vendor behind it, and the latter has almost all the cloud industry
The same could be said about Apple products, but that doesn't mean people should be dissuaded from using them. Quite the opposite: being in charge of a technology means you can be 100% focused on it and be relentlessly focused on making it great for your customers.
Exactly. Linkerd is fast and simple in no small part because it doesn't have 20 competing, sharp-elbowed vendors pulling it in 21 different directions. Customer focused is everything.
This is a good analogy. For more context: in the past 5 years working with customers in the Bay Area, I've not encountered one who mentioned Linkerd, let alone ran it in production.
More than half those companies ran istio in production at large scales.
Maybe you're just not talking to the right companies. There are a ton of Linkerd adopters and the list is constantly growing! https://linkerd.io/community/adopters/
I think it does come down to risk and risk mitigation. As someone who works for an “Istio vendor” we see some of the largest deployments of Istio in the world for mission critical/tier-0 workloads… and all the steps it took to get there including evaluation/POC of other vendors/mesh technologies.
Part of these decisions are based on things like “What is the rest of the industry doing?” “How vibrant/diverse is the community?” “How mature is the project _for enterprise adoption_?” “What vendors are available for enterprise support?” “Is it already available in my platform of choice?” etc.etc.
The sting of “picking the wrong container orchestrator” is still fresh in a lot of organizations.
We see Istio make it through these questions with good answers for a lot of organizations where other/alternative service mesh vendors strike out pretty quickly.
This is even before we get to the “feature comparisons” for usecases these large organizations focus on/have.
I think you could disregard a few of these without too much thought. Nginx - predatory vendor, AWS - only makes sense if you need the deep integration, Consul - only makes sense if you are on the hashi stack. Traefik I haven't spent much time with recently, but I'm a bit suspect of ingresses that reposition themselves as mesh to gain adoption.
> Consul - only makes sense if you are on the hashi stack.
As an enterprise Hashistack customer, every time I contact Consul support, they assume I'm using Kubernetes instead of Nomad, and when I tell them I'm using Nomad, I get blank stares.
Vault is great, and Consul is...fine, but the Consul PMs saw which way the orchestrator market was trending (towards k8s domination) and have adjusted their priorities accordingly. But if I had my way, I wouldn't touch the Nomad+Consul stack with a 20-foot pole ever again.
Also a Hashistack enterprise customer and we have the same experience.
Thankfully we don't have to talk to Consul support very often, but each time we meet with Consul product people it's like they don't even know Nomad exists.
We're starting to adopt Consul Service Mesh with Nomad and I am not excited.
I may have missed the announcement where Istio's ownership was transferred to a vendor-neutral foundation like the CNCF. Or is the Open Usage Commons what's being used in its place?
I think this just demonstrates the power of vendor-neutral open source. I don't mean that in an inflammatory way. Istio, a collaborative project from Google/IBM that was arguably going to be a slight differentiator for their respective clouds, was forced to go vendor-neutral after Linkerd did.
Same thing happened to Knative.
CNCF definitely has some politics, but it's been interesting to see large OSS projects be essentially dead on arrival now if they're not in a vendor-neutral holding org.
I personally try to favor vendor neutral projects now. Slightly smaller chance of being burned like I was with Grafana switching licenses.
TL;DR: you can bring your open source project to the CNCF, and if there are enough infra/platform developers vouching for it, you can become an incubated project.
After that, there are a bunch of boxes (traction, integration, stability) that you need to check before the CNCF endorses you as a graduated project.
But what does it actually mean? Is this an incubator in the same sense as a startup incubator? This just sounds like a popularity contest, where the winners get a stamp of approval from the CNCF, provided you play by their rules. What value does the CNCF add?
Think of it as process maturity. They don't want to just rubber-stamp some company's project as a "CNCF" project, only to have it collapse when the company pulls employees off it.
Also, a lot of projects in sandbox or incubation are “finding themselves”, so to speak. It's what used to happen when a project would release a v1 that's really mostly a POC, then go through some major overhauls/rewrites in v2/v3. At least in theory, by “graduation” or v3 you know that this thing has a non-trivial user base, is fairly well defined in what it does and how it does it, and has some independent reviews and established patterns, etc.
Nothing protecting them from rotting or dying or breaking obviously, but at least you know you won’t be shouting at it alone.
A “graduated” project might find it easier to get adoption and contributors, as the guarantees (stability, integrity, security, governance) are some of the criteria that orgs use when deciding on tooling.
It’s basically saying: If you were shying away from using Istio in production, we graduated, take a look now?
The graduation criterion I care about is "Have completed an independent and third party security audit". Lots of software in the cloud native world puts security in the back seat, sadly!
It's about the modern-day J2EE: Kubernetes. Approximately the same cost per "getting something done" ratio. Exactly the same kind of orgs (on both the seller and buyer sides). I hear the money is good though, if you're into that kind of slow work.
I know and understand both technologies in enough detail, thank you. They were/are abused as instruments of complexity by consulting firms. They are both fantastic at this purpose.
Of course it's also possible to use this tech to actually get stuff done. But that's not what it's widely used for in this consulting environment.
The typical end-customer is you (the taxpayer). You have no influence over what you paid for, though. Also: it's a recurring cost, like Netflix. It doesn't really stop...
The CNCF doesn't really dictate that kind of stuff.
If something doesn't play nice try the Istio slack or file an issue on the main repo: https://github.com/istio/istio
What's an alternative to istio? I want to have http metrics between our services inside kubernetes. I don't really want all the fancy shmancy mtls, dpi and stuff, they don't bring value to me.
Try Kuma (https://kuma.io/), also part of the CNCF, which was created with a much simpler model for supporting a fleet of Envoy proxies across single and multiple clusters.
Envoy is the proxy that does the heavy lifting. Istio is just a glorified configuration system. Even if you choose to use Istio you're still using Envoy.
You might be able to re-use some of that. It should be pretty easy to get metrics for outbound/inbound http requests, but I don't remember the exact yaml incantation.
lol i’m not quite following how manually injecting envoy, “configuring envoy yourself/by hand” in a pod and “copying istio code for iptables re-direction” and then trying to maintain this yourself is easier than just using istio?
https://linkerd.io/ is a much lighter-weight alternative, but you still get some of the fancy things like mTLS without needing any manual configuration. Install it, label your namespaces, and let it do its thing!
Istio is amazing once you grok how it works and get it running. It has a lot of gotchas (objects in istio-system become global?) and there’s a lot of ways to abuse or misuse it.