I recently built a bunch of stuff on Azure, and the product limitations are abso...

verst · on June 5, 2021

That sounds painful indeed. I've never had to use any of these services or features on any other cloud, so I can't compare. I've certainly heard that Azure Networking isn't great, but then again, I've never had needs that go beyond what is on offer.

It sounds like you mostly deal with the infrastructure-level services: VMs, availability sets and networking.

What are your thoughts on the PaaS offerings (and there are many, so many that it gets confusing)? The Log Analytics issue seems very surprising; it's definitely something I'd expect to recover quickly without the need for intervention.

jiggawatts · on June 5, 2021

I've used or touched most of their flagship services, including a bunch of PaaS stuff: DNS, App Service, Service Fabric, AKS, Front Door, etc.

Microsoft's Azure team simply doesn't have quality in their vocabulary. Everything they do misses the mark, and PaaS is significantly worse than IaaS, especially for performance. Just barely good enough? Ship it! Not good enough? Ship it anyway!

The first thing I noticed about App Service is that if you use ARM templates, there's a different schema for the "Primary" slot and the other named slots. This is an insanely bad design that should have been caught very early on and never seen the light of day. That team is just beyond lazy: instead of updating the ARM schema when they introduce new features, they just shove them into barely documented (or undocumented) environment variables that the platform picks up. In other words, the "bag of app settings" isn't just the settings used by your app, it is also the system configuration! This makes it nigh impossible to factor out reusable chunks of ARM templates, because they blend wildly unrelated things into the same flat list of variables. Things like regional options, network routing(!), and App Insights monitoring settings sit side by side with your app settings. It's nuts.
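To make the "flat bag" problem concrete, here's a rough Python sketch of the untangling you end up doing by hand. It assumes platform knobs can be spotted by name prefix. The setting names in the comments are real App Service examples (WEBSITE_VNET_ROUTE_ALL for routing, the App Insights keys for monitoring), but the prefix list is an illustrative assumption, not an official or complete catalogue, and `split_app_settings` is a hypothetical helper.

```python
# Illustrative prefixes only (an assumption, not an exhaustive official list).
PLATFORM_PREFIXES = (
    "WEBSITE_",                   # e.g. WEBSITE_VNET_ROUTE_ALL (network routing)
    "APPINSIGHTS_",               # e.g. APPINSIGHTS_INSTRUMENTATIONKEY
    "APPLICATIONINSIGHTS_",       # e.g. APPLICATIONINSIGHTS_CONNECTION_STRING
    "ApplicationInsightsAgent_",  # e.g. ApplicationInsightsAgent_EXTENSION_VERSION
)


def split_app_settings(settings: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split one flat app-settings dict into (platform_settings, app_settings)."""
    platform: dict[str, str] = {}
    app: dict[str, str] = {}
    for name, value in settings.items():
        # str.startswith accepts a tuple and matches any of the prefixes.
        target = platform if name.startswith(PLATFORM_PREFIXES) else app
        target[name] = value
    return platform, app
```

Whatever lands in the first dict is configuration that arguably belonged in the resource schema rather than next to something like a hypothetical `Orders__BatchSize` app setting.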

But the performance issues just blew my mind. App Service is shockingly slow. Microsoft runs it in their own VNets and then tunnels the traffic through what is basically a VPN gateway running on virtual machines. So if you need private VNet integration, your latencies go from merely disappointing to fantastically bad. Think up to 10 ms for a "ping" HTTPS REST call, or 3-7 ms for a SQL "print 1" statement to their "Business Critical" tier! It's absurd.
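If you want to check numbers like these yourself, a minimal Python timing harness along these lines will do. `measure_latency_ms` is a hypothetical helper, and what you pass in (an HTTPS GET, a `PRINT 1` through whatever DB driver you use) is up to you.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(call: Callable[[], object], runs: int = 200, warmup: int = 10) -> dict[str, float]:
    """Time `call` repeatedly and summarize the samples in milliseconds."""
    if runs < 2:
        raise ValueError("need at least two runs for percentiles")
    for _ in range(warmup):
        call()  # keep connection/TLS setup and cold caches out of the samples
    samples: list[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(samples, n=100, method="inclusive")[98]
    return {"min": min(samples), "median": statistics.median(samples), "p99": p99}
```

For example, with an already-open DB-API cursor `cur` (an assumption), you'd call `measure_latency_ms(lambda: cur.execute("PRINT 1"))`. The warm-up calls keep one-off setup costs out of the figures, so what you get back is steady-state round-trip latency.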

For comparison, in the IaaS space they're catching up with the network performance that AWS and GCP have been providing for a while. The combination of Proximity Placement Groups and Accelerated Networking reduces latency to about 50 microseconds, which is very good and noticeably speeds up practically all applications. Combined with the new AMD EPYC VM SKUs, I have yet to see a speed-up of less than 2x compared to the older Intel SKUs without those networking features.
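Back-of-envelope arithmetic shows why the gap between a few milliseconds and ~50 µs matters so much for chatty apps. This Python sketch assumes a hypothetical request doing 20 ms of CPU work plus 100 strictly sequential SQL round-trips; only the two latency figures come from this thread.

```python
def request_time_ms(cpu_ms: float, round_trips: int, rtt_ms: float) -> float:
    """Wall-clock estimate for a request whose network calls run one after another."""
    return cpu_ms + round_trips * rtt_ms


# Hypothetical request: 20 ms of CPU plus 100 sequential SQL round-trips.
CPU_MS, ROUND_TRIPS = 20.0, 100
paas = request_time_ms(CPU_MS, ROUND_TRIPS, rtt_ms=5.0)   # mid-range of the 3-7 ms quoted above
iaas = request_time_ms(CPU_MS, ROUND_TRIPS, rtt_ms=0.05)  # ~50 microseconds with PPG + Accelerated Networking
print(f"PaaS: {paas:.0f} ms, IaaS: {iaas:.0f} ms, speed-up: {paas / iaas:.1f}x")
```

It prints `PaaS: 520 ms, IaaS: 25 ms, speed-up: 20.8x`. The model ignores pipelining and parallel queries, so treat it as a rough worst case for a sequential workload rather than a measurement.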

Unfortunately, 100% of their PaaS components run without the aforementioned features. All of it. The Private Endpoints? Reedy little VMs running on old Intel CPUs with software emulated NICs. Azure SQL Database? Ditto. App Service? No acceleration, and can't be placed in a proximity placement group to be close to the Azure SQL Database! In some locations you have a mere 20% chance of your web server being put into the same data centre as your database! Crazy.

My impression is that their insistence on using IPv4 for everything is the source of most of their woes. Everything has to be NAT-ed multiple times in a typical PaaS app, or even tunneled or proxied, which is madness. If they had just embraced IPv6 early on and used it for all of their PaaS services, they could have eliminated an awful lot of complexity while boosting performance very dramatically. For example, there are at least four or five different, unrelated ways of connecting App Service to a VNet, none of which would be required at all if they just used IPv6: https://docs.microsoft.com/en-us/azure/app-service/web-sites...

Someone really needs to bang some heads together at that place and explain to them that scalability is not the only concern, and that latency also matters.

And availability.

App Service has no Zone Redundant option! Azure SQL Database does, but not the matching App Service. So a typical 2-tier PaaS application has mismatched high availability capabilities at the various tiers. Again, how did nobody notice this and fix it years ago? Boggles the mind.

I could seriously go on and on for hours, like how Azure DNS only collects metrics every 2 hours, which means that if you make an administrative change you don't get to see its impact for at least an hour. At which point you see a ZERO in the graph (which only shows values at 1-hour intervals) and you have a panic attack.
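A toy Python illustration of that panic: assuming, as described, that values only arrive every 2 hours while the chart plots 1-hour buckets, every other bucket comes out as zero. `hourly_series` is a hypothetical helper and the sample values are made up.

```python
from collections import defaultdict


def hourly_series(samples: list[tuple[int, float]], start_hour: int, end_hour: int) -> list[float]:
    """Sum samples into 1-hour buckets; hours that received no sample plot as 0."""
    buckets: defaultdict[int, float] = defaultdict(float)
    for hour, value in samples:
        buckets[hour] += value
    return [buckets[h] for h in range(start_hour, end_hour)]


# Made-up query counts that only arrive every 2 hours.
samples = [(0, 1200.0), (2, 1180.0), (4, 1210.0)]
print(hourly_series(samples, 0, 6))
```

This prints `[1200.0, 0.0, 1180.0, 0.0, 1210.0, 0.0]`: nothing actually dropped, yet half the chart reads zero.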

Or how Azure Front Door supports none of the technologies you'd expect and actually slows down most web applications. It's missing all of the following: Brotli, 0-RTT, HTTP/3, ECC certificates, TLS 1.3, OCSP stapling, and probably more that I've forgotten. The competition, like Cloudflare, typically supports all of these and more. Don't worry, you can now enable HSTS headers, but they charge you $50/mo for the privilege.
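Some of these are easy to spot from the outside. Here's a rough Python check of the header-visible ones (`audit_edge_headers` is a hypothetical helper). TLS 1.3, 0-RTT, ECC certificates and OCSP stapling are settled in the TLS handshake, so they need a TLS-level tool instead.

```python
def audit_edge_headers(headers: dict[str, str]) -> dict[str, bool]:
    """Report a few edge features that are visible in HTTP response headers."""
    h = {name.lower(): value.lower() for name, value in headers.items()}
    encodings = [e.strip() for e in h.get("content-encoding", "").split(",")]
    return {
        # Only present if the request advertised "Accept-Encoding: br".
        "brotli": "br" in encodings,
        # HTTP/3 is advertised via Alt-Svc, e.g. 'h3=":443"; ma=86400'.
        "http3_advertised": "h3" in h.get("alt-svc", ""),
        "hsts": "strict-transport-security" in h,
    }
```

You could feed it `dict(resp.headers)` from a `urllib.request.urlopen` call whose request sends `Accept-Encoding: br`, since servers only use Brotli when the client says it can handle it.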