Airflow is a really nice solution for simple workflows and tasks. But once we use complex DAGs with too many SubDAGs, the real pain starts. The real issues with Airflow are:
1. Scalability
2. Scheduler delays
3. Security (open source Airflow):
A) Airflow uses a single super role that has access to the resources of all its orchestration jobs, which is a potential compliance risk.
B) It lacks granular roles and security groups, so we have to rely on trust that no Airflow user mistakenly makes changes through the UI.
I feel there is some undocumented dependency between the scheduler, Celery, and the web server that keeps hurting the performance of our ETL jobs.
We also see more reliability issues on the platform as more workloads are added.