The ESP32 and ESP8266 are significantly more complex and come with a bunch of downsides that make an already difficult venture into "hardware land" even more so. Hit a watchdog timeout or some other exception and you'll get a stack trace that requires a fragile piece of Java software to decode, and it only does that to some extent. It's seriously not pleasant.
In my anecdotal experience the ESPs are also not as robust against mistreatment as the AVRs are. According to the datasheet they shouldn't survive as much as they seem to.
So while the hardware might be better, and there is always better hardware out there, it's sometimes worth avoiding the complexity.
The ESP is compatible with the Arduino IDE. I've found it as easy to program as any other microcontroller (though I have encountered watchdog issues with unoptimized code).
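For anyone wondering what those watchdog issues look like in practice: a long loop that never hands control back to the system is a typical trigger. On the ESP8266 Arduino core the usual remedy is to call `yield()` or `delay()` every so often inside the loop. Here is a rough sketch of that pattern, written so it compiles on a desktop. `give_system_time()` and `checksum_with_yield()` are names I made up for illustration; on real hardware the stub would be replaced by the platform call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the platform call that hands control back to the
// system (on the ESP8266 Arduino core that would typically be yield()).
// Here it only counts calls so the sketch can run on a desktop compiler.
static unsigned g_yield_calls = 0;
static void give_system_time() { ++g_yield_calls; }

// Sums a buffer byte by byte, handing control back after every `chunk`
// elements so that one long computation does not starve the system.
static uint32_t checksum_with_yield(const std::vector<uint8_t>& data,
                                    std::size_t chunk) {
    assert(chunk > 0);  // a chunk size of zero would never yield
    uint32_t sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        sum += data[i];
        if ((i + 1) % chunk == 0) {
            give_system_time();
        }
    }
    return sum;
}
```

With a 10,000-byte buffer and a chunk size of 1,000, the loop hands control back ten times. The right chunk size is a guess until you measure how long one chunk takes on the actual chip.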
As long as you stay within the guide rails and libraries provided by the IDE.
The average user cannot debug the RTOS running on an ESP. A single Cortex-M4 is more complex than an AVR, but it is still understandable down to bare metal by your average enthusiast.
I'm in the middle of porting a project from Arduino to ESP-IDF. If you want to do anything slightly complex with the Bluetooth stack, those features just don't exist in the Arduino core. It's very frustrating.
That's why I typically prefer to build my base application on a standalone processor and offload Bluetooth or Wi-Fi duties to an nRF52820 or ATWILC1000.
Yes, the processor required to run these network protocols is pretty beefy, so a big and fast chip like those made by Espressif has plenty of capacity to run your own app alongside the big communication stacks. But it's far simpler and more power efficient to let a small and simple Bluetooth chip do only Bluetooth stuff: power it up or down as needed, and talk to it over SPI or whatever from a host processor that you have full control of.
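To make the split concrete, here is a minimal sketch of a host that powers the radio chip only for the duration of one transfer. Everything in it is an assumption for illustration: `RadioPort`, `RadioSession`, and `FakeRadioPort` are made-up names, not part of any vendor SDK, and a real part would also need its wake-up time and command protocol handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hardware abstraction. A real project would back these two
// calls with the host MCU's GPIO (power enable pin) and SPI drivers.
struct RadioPort {
    virtual void set_power(bool on) = 0;
    virtual void spi_write(const uint8_t* bytes, std::size_t len) = 0;
    virtual ~RadioPort() = default;
};

// Powers the radio up on construction and down again on destruction, so the
// chip is only on while the host actually has something to send.
class RadioSession {
public:
    explicit RadioSession(RadioPort& port) : port_(port) { port_.set_power(true); }
    ~RadioSession() { port_.set_power(false); }
    RadioSession(const RadioSession&) = delete;
    RadioSession& operator=(const RadioSession&) = delete;

    void send(const std::vector<uint8_t>& payload) {
        port_.spi_write(payload.data(), payload.size());
    }

private:
    RadioPort& port_;
};

// Desktop fake that records what the host asked for, in place of hardware.
struct FakeRadioPort : RadioPort {
    bool powered = false;
    int power_cycles = 0;
    std::vector<uint8_t> wire;  // bytes that reached the (fake) radio

    void set_power(bool on) override {
        if (on && !powered) ++power_cycles;
        powered = on;
    }
    void spi_write(const uint8_t* bytes, std::size_t len) override {
        if (powered) wire.insert(wire.end(), bytes, bytes + len);
    }
};
```

The host application only ever sees `RadioPort`, so the whole communication stack stays on the other side of a two-function interface that you can fake and test without hardware.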