I recently built a bunch of stuff on Azure, and the product limitations are absolutely insane. I came up with a new term in the aftermath of this project: "Almost Minimal Viable Product" (AMVP). It's like an MVP, but not quite.

Just in the last few weeks I hit these fun "broken by design" issues:

Availability Sets decrease your availability because they force big-bang changes on the member VMs. They flat out prevent one-VM-at-a-time changes for large categories of settings, such as SKU family, Accelerated Networking, and Proximity Placement Groups. VMware had similar features, without such limits, over a decade ago.

Speaking of availability sets, you can create one with the number of fault domains set to "1", which makes sure that your critical servers are all plugged in to the same power rail and will fail together, ensuring disaster. You can't change this parameter.
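To make the fault-domain point concrete, here is a toy Python model. It assumes VMs are spread round-robin across fault domains (VM i lands in domain i mod the domain count). That placement rule and the function names are illustrative assumptions, not Azure's documented algorithm. With the count stuck at 1, every VM shares a domain, so a single failure takes out all of them.

```python
def assign_fault_domains(vm_count: int, fault_domain_count: int) -> list[int]:
    """Toy placement rule (an assumption): VM i goes to fault domain i % count."""
    if fault_domain_count < 1:
        raise ValueError("an availability set needs at least one fault domain")
    return [i % fault_domain_count for i in range(vm_count)]


def worst_case_vms_lost(vm_count: int, fault_domain_count: int) -> int:
    """Most VMs sharing a single fault domain, i.e. lost together when that
    domain's power or network fails."""
    placement = assign_fault_domains(vm_count, fault_domain_count)
    return max((placement.count(fd) for fd in set(placement)), default=0)


print(assign_fault_domains(4, 1))  # [0, 0, 0, 0]
print(worst_case_vms_lost(4, 1))   # 4: one rack failure takes out every VM
print(worst_case_vms_lost(4, 2))   # 2
```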

Oh don't worry, their doco helpfully tells you to work around these glaring issues by deleting the VMs and recreating them. Except that this wipes out a bunch of settings and data that can't be recreated. Data loss is their official solution!

Speaking of data loss: You can't move a VM from one Recovery Vault to another without permanently deleting its backups first.

Other than that, Recovery Vault is a great product with only a few small feature gaps, such as the inability to back up Ultra SSD disks. You know: the type used for the most important VMs!

They NAT IPv6. I still can't get over that. You can't do anything if you enable IPv6 anywhere. For example, they just released Virtual WAN, but it has exactly zero support for IPv6. It just flat refuses to work with it. Ditto for NAT Gateway, which will refuse to NAT IPv4 if you have IPv6 enabled.

Speaking of IPv6: They generously hand them out in blocks as large as 16 addresses at a time. You get a whole /124 range all to yourself!
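The arithmetic behind "16 addresses": a /124 prefix leaves 128 − 124 = 4 host bits, so 2^4 = 16 addresses, whereas a /64, the standard size for a single IPv6 subnet, holds 2^64. Python's standard `ipaddress` module confirms it. The `2001:db8::` addresses come from the IPv6 documentation range and are placeholders, not anything Azure actually hands out.

```python
import ipaddress

# 2001:db8::/32 is reserved for documentation; these are placeholder networks.
small_block = ipaddress.ip_network("2001:db8::/124")
standard_subnet = ipaddress.ip_network("2001:db8::/64")

print(small_block.num_addresses)      # 16, i.e. 2**(128 - 124)
print(standard_subnet.num_addresses)  # 18446744073709551616, i.e. 2**64
```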

Stopping a VM can take up to half an hour, sometimes 2-3 hours. I hope you weren't making those aforementioned big-bang changes!

They have Gen 2 images for Windows, but not Windows + SQL Server. In fact, SQL Server has a random subset of the images you'd expect it to have, with gaps all over the place.

You can enable OS-level ("Guest") metrics, but you can only see them one VM at a time, not in any multi-VM view. You cannot imagine how fiddly this is to enable through any kind of automation.

Recently, Log Analytics randomly stopped collecting IIS logs worldwide. The fix is to restart the service manually. This went on for like a week.

Some of their managed certificates are validated based on the "TLD" name, not the DNS zone name. So if you have "dev.myapp.dept.org.megacorp.com", then you have to figure out who receives these emails at the head office. In a different time zone. PS: They've never heard of you, and this looks 100% like a phishing attempt. PPS: This is totally broken for some domains, it goes to the wrong one by design.
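Certificate authorities commonly validate domain control by emailing one of five constructed mailboxes (admin, administrator, webmaster, hostmaster, postmaster) at the domain being validated, as the CA/Browser Forum Baseline Requirements permit. How Azure picks that domain isn't stated here, so this sketch assumes a naive "last two labels" rule; `naive_base_domain` and `approver_emails` are hypothetical helpers. Under that assumption, a deep subdomain sends mail to head office, and a multi-label public suffix like `co.uk` breaks outright. Getting this right requires the Public Suffix List.

```python
# Mailbox names that CAs may construct for email-based domain validation.
APPROVER_MAILBOXES = ("admin", "administrator", "webmaster", "hostmaster", "postmaster")


def naive_base_domain(fqdn: str) -> str:
    """Hypothetical naive rule: keep only the last two DNS labels."""
    labels = fqdn.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def approver_emails(fqdn: str) -> list[str]:
    """Candidate approval addresses under the naive rule above."""
    base = naive_base_domain(fqdn)
    return [f"{mailbox}@{base}" for mailbox in APPROVER_MAILBOXES]


print(approver_emails("dev.myapp.dept.org.megacorp.com")[0])  # admin@megacorp.com
print(naive_base_domain("shop.example.co.uk"))                # co.uk (wrong: a public suffix)
```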

Look, I could go on, but listing all of the showstopper issues I encountered while doing rather trivial stuff in just the last few weeks would require several hours of typing, and I'm tired because I was up until 9:30pm waiting for Azure VMs to take their sweet time to reboot.



That sounds painful indeed. I've never had to use any of these services or features on any other cloud so I can't compare. I've certainly heard that Azure Networking isn't great, but then again I am not someone who ever has needs that can't be met by what is being offered.

It sounds like you mostly deal with the Infrastructure level services - VMs, availability sets and networking.

What are your thoughts on the PaaS offerings (and there are many - too many to the point it gets confusing)? The Log Analytics issue seems very surprising - definitely something I'd expect to recover quickly without the need for intervention.


I used or touched most of their flagship services, including a bunch of PaaS stuff such as DNS, App Service, Service Fabric, AKS, Front Door, etc.

Microsoft's Azure team simply doesn't have quality in their vocabulary. Everything they do misses the mark, and PaaS is significantly worse than IaaS, especially for performance. Just barely good enough? Ship it! Not good enough? Ship it anyway!

The first thing I noticed about App Service is that if you use ARM templates, there's a different schema for the "Primary" slot and the other named slots. This is an insanely bad design that should have been caught very early on and never seen the light of day. That team is just beyond lazy: instead of updating the ARM schema when they introduce new features, they just shove them into barely-documented (or undocumented) environment variables that the platform picks up. In other words, the "bag of app settings" isn't just the settings used by your app, it is also the system configuration! This makes it nigh impossible to factor out reusable chunks of ARM templates, because they blend wildly unrelated things into the same flat list of variables. Things like regional options, network routing(!), and App Insights monitoring settings sit side-by-side with your app settings. It's nuts.
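If you're stuck with the flat list, one workaround is to split it by key prefix before templating, so platform knobs and real app settings can live in separate reusable pieces. This Python sketch assumes platform settings can be recognised by prefix. The `PLATFORM_PREFIXES` tuple is a non-exhaustive assumption, and `ORDERS_API_URL` is a made-up app setting. The other keys (`WEBSITE_VNET_ROUTE_ALL`, `WEBSITE_TIME_ZONE`, `APPLICATIONINSIGHTS_CONNECTION_STRING`) are real App Service settings covering network routing, regional options, and monitoring.

```python
# Assumed, non-exhaustive prefixes for settings the App Service platform consumes.
PLATFORM_PREFIXES = ("WEBSITE_", "APPINSIGHTS_", "APPLICATIONINSIGHTS_", "XDT_")


def split_app_settings(settings: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split one flat App Service settings dict into (platform, application)."""
    platform = {k: v for k, v in settings.items() if k.startswith(PLATFORM_PREFIXES)}
    app = {k: v for k, v in settings.items() if not k.startswith(PLATFORM_PREFIXES)}
    return platform, app


flat_settings = {
    "WEBSITE_VNET_ROUTE_ALL": "1",                          # network routing
    "WEBSITE_TIME_ZONE": "UTC",                             # regional option
    "APPLICATIONINSIGHTS_CONNECTION_STRING": "<redacted>",  # monitoring
    "ORDERS_API_URL": "https://orders.example.com",         # hypothetical app setting
}
platform, app = split_app_settings(flat_settings)
print(sorted(platform))  # the three platform keys
print(app)               # {'ORDERS_API_URL': 'https://orders.example.com'}
```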

But the performance issues just blew my mind. App Service is shockingly slow. Microsoft runs it in their VNets, and then tunnels the traffic through basically a VPN gateway running on virtual machines. So if you need private VNet integration, your latencies go from merely disappointing to fantastically bad. Think up to 10 ms for a "ping" HTTPS REST call, or 3-7 ms for a SQL "print 1" statement to their "Business Critical" tier! It's absurd.

For comparison, in the IaaS space, they're catching up with the network performance that AWS and GCP have been providing for a while. The combination of Proximity Placement Groups and Accelerated Networking reduces latency to about 50 microseconds, which is very good and very noticeably speeds up practically all applications. Combined with the new AMD EPYC VM SKUs, I've yet to see a speed-up of less than 2x compared to the older Intel SKUs without those networking features.
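Why a few milliseconds matter: latency adds up linearly when calls are made one after another, as when a page issues its database queries serially. This back-of-the-envelope sketch assumes a hypothetical page issuing 100 sequential SQL round trips. It compares 5 ms per call (the middle of the 3-7 ms range quoted for private VNet integration) with the ~50 µs figure for Proximity Placement Groups plus Accelerated Networking. It uses integer microseconds to avoid float rounding.

```python
def sequential_wait_us(round_trips: int, rtt_us: int) -> int:
    """Time spent purely waiting on the network when calls run one after another."""
    return round_trips * rtt_us


ROUND_TRIPS = 100  # hypothetical: a page that issues 100 sequential SQL queries

paas_vnet = sequential_wait_us(ROUND_TRIPS, 5_000)  # ~5 ms per call
iaas_ppg = sequential_wait_us(ROUND_TRIPS, 50)      # ~50 µs per call

print(paas_vnet // 1000, "ms")  # 500 ms
print(iaas_ppg // 1000, "ms")   # 5 ms
```

The network wait alone differs by a factor of 100, before any query does real work.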

Unfortunately, 100% of their PaaS components run without the aforementioned features. All of it. The Private Endpoints? Reedy little VMs running on old Intel CPUs with software emulated NICs. Azure SQL Database? Ditto. App Service? No acceleration, and can't be placed in a proximity placement group to be close to the Azure SQL Database! In some locations you have a mere 20% chance of your web server being put into the same data centre as your database! Crazy.
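The 20% figure is what you'd get if the web tier and the database were each placed independently and uniformly in one of five datacentres in the region, since the chance of landing together is then 1/N. That placement model is an assumption for illustration; the comment doesn't say how Azure actually places them. The sketch computes 1/N exactly and cross-checks it with a seeded simulation.

```python
import random
from fractions import Fraction


def same_dc_probability(datacentres: int) -> Fraction:
    """Exact chance that two independent, uniform placements coincide: 1/N."""
    return Fraction(1, datacentres)


def simulate(datacentres: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability under the same assumption."""
    rng = random.Random(seed)
    hits = sum(
        rng.randrange(datacentres) == rng.randrange(datacentres)
        for _ in range(trials)
    )
    return hits / trials


print(same_dc_probability(5))  # 1/5
print(simulate(5, 100_000))    # close to 0.2
```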

My impression is that their insistence on using IPv4 for everything is the source of most of their woes. Everything has to be NAT-ed multiple times in a typical PaaS app, or even tunneled or proxied, which is madness. If they had just embraced IPv6 early on and used it for all of their PaaS services, they could have eliminated an awful lot of complexity while boosting performance very dramatically. For example, there are at least four or five different, unrelated ways of connecting App Service to a VNet, none of which would be required at all if they just used IPv6: https://docs.microsoft.com/en-us/azure/app-service/web-sites...

Someone really needs to bang some heads together at that place and explain to them that scalability is not the only concern, and that latency also matters.

And availability.

App Service has no Zone Redundant option! Azure SQL Database does, but not the matching App Service. So a typical 2-tier PaaS application has mismatched high availability capabilities at the various tiers. Again, how did nobody notice this and fix it years ago? Boggles the mind.

I could seriously go on and on for hours, like how Azure DNS only collects metrics every 2 hours, which means if you make an administrative change you don't get to see the impact for at least an hour. At which point you see a ZERO in the graph (that only shows values in 1 hour intervals) and you have a panic attack.
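A toy model of the scary zero: if a metric is only sampled every 2 hours but drawn on a 1-hour grid, and empty buckets are rendered as 0, every other bar reads zero, including the latest one right after you make a change. Whether the Azure portal fills gaps this way is an assumption; `hourly_view` and the query counts are hypothetical.

```python
def hourly_view(samples: dict[int, int], first_hour: int, last_hour: int) -> list[int]:
    """Render sparse samples onto a 1-hour grid, drawing missing buckets as 0."""
    return [samples.get(hour, 0) for hour in range(first_hour, last_hour + 1)]


# Hypothetical DNS query counts, sampled every 2 hours (hours 0, 2 and 4).
query_counts = {0: 1200, 2: 1150, 4: 1230}
print(hourly_view(query_counts, 0, 5))  # [1200, 0, 1150, 0, 1230, 0]
```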

Or how Azure Front Door supports none of the technologies you'd expect, and actually slows down most web applications. It's missing all of the following: Brotli, 0-RTT, HTTP/3, ECC certificates, TLS v1.3, OCSP stapling, and probably more that I've forgotten. The competition, like Cloudflare, typically supports all of these and more. Don't worry, you can now enable HSTS headers, but they charge you $50/mo for the privilege.
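For what it's worth, HSTS is just a `Strict-Transport-Security` response header (RFC 6797), so an origin can send it itself. A minimal sketch, assuming the origin is a Python WSGI app: `HSTSMiddleware` is a hypothetical helper, not a Front Door feature, and the one-year `max-age` with `includeSubDomains` is an illustrative choice. Browsers only honour the header when it arrives over HTTPS.

```python
HSTS_VALUE = "max-age=31536000; includeSubDomains"  # one year, an illustrative choice


class HSTSMiddleware:
    """WSGI middleware that adds a Strict-Transport-Security header to every response."""

    def __init__(self, app, value: str = HSTS_VALUE):
        self.app = app
        self.value = value

    def __call__(self, environ, start_response):
        def start_response_with_hsts(status, headers, exc_info=None):
            # Drop any existing HSTS header so the response carries exactly one.
            headers = [(k, v) for k, v in headers if k.lower() != "strict-transport-security"]
            headers.append(("Strict-Transport-Security", self.value))
            return start_response(status, headers, exc_info)

        return self.app(environ, start_response_with_hsts)


def hello_app(environ, start_response):
    """Stand-in origin application."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello"]


application = HSTSMiddleware(hello_app)
```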



