Interesting:

> Airflow was started in October 2014 by Maxime Beauchemin at Airbnb. It was open source from the very first commit and officially brought under the Airbnb GitHub and announced in June 2015.

I believe I started building my tool somewhere around 2010, possibly 2011. The core mechanism has been completely unchanged in that time. If Airflow had been a thing at the time, I'd hopefully have looked into it. I looked at a handful of similar products and didn't find anything that was a good fit.

Based on a really quick skim of the Airflow docs it seems like it checks all of the boxes. Off the top of my head:

* LocalExecutor (with some degree of parallelism, assuming the dependencies are all declared properly) seems to do exactly what I want.

* I could write an Operator to handle the interaction with the system where the processes actually run. The existing Python script that does this interaction can probably get me 90% of the way there. Due to the nature of what I'm running, any job scheduler will have to tell the target system to do a thing and then poll it to wait for the thing to be done. To do this without any custom code, I could just use BashOperator to call my existing script (there's a rough sketch of what that could look like after this list).

* It's written in Python, so the barrier to entry (for me) is fairly low.

* Converting the existing Makefile to an Airflow DAG is likely something that can be done automatically. We deliberately keep the Makefile very consistent, so a conversion program can take advantage of that.
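
To make that concrete, here's roughly the shape of DAG I'd expect the conversion to spit out. This is only a sketch under assumptions I haven't verified against my setup: Airflow 2.x, made-up DAG and task names, and a hypothetical run_remote.sh standing in for my real submit-and-poll script:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="nightly_batch",            # made-up name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Each task shells out to the existing wrapper script, which tells the
        # target system to start a job and then polls until it finishes.
        extract = BashOperator(
            task_id="extract",
            bash_command="run_remote.sh extract",    # hypothetical wrapper + args
        )
        transform = BashOperator(
            task_id="transform",
            bash_command="run_remote.sh transform",
        )
        load = BashOperator(
            task_id="load",
            bash_command="run_remote.sh load",
        )

        # Makefile rules like "transform: extract" and "load: transform"
        # become task dependencies:
        extract >> transform >> load

The conversion program would basically just read each "target: prerequisites" line of the Makefile, emit one BashOperator per target, and turn the prerequisites into the corresponding >> edges.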

I think my dream of replacing this might have new life!



But... why would you want to spend energy replacing something that has been running stable for over a decade?


There are a number of deficiencies with the current system that aren't showstoppers, but are pain points nonetheless. Off the top of my head:

* There's no reasonable way to do cross-batch dependencies (e.g., if process X in batch A fails, don't run process Y in batch B). I've got a few ideas on how I could add this in, but nothing has been implemented yet. (There's a rough sketch of how Airflow might handle this after this list.)

* There's no easy way to visualize what's going on. Airflow has a Gantt view that looks very useful for this purpose, our business users would absolutely LOVE the task duration graph, and the visualization of the DAG looks really helpful too.

* Continuing a failed batch is a very manual process.
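
For the cross-batch piece, my understanding from skimming the docs (I haven't actually built this) is that Airflow's ExternalTaskSensor covers the "don't run Y in batch B unless X in batch A succeeded" case. A rough sketch, again assuming Airflow 2.x, made-up DAG/task names, and the same hypothetical run_remote.sh wrapper:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.sensors.external_task import ExternalTaskSensor

    with DAG(
        dag_id="batch_b",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Wait for task "process_x" in DAG "batch_a" to succeed for the same
        # logical date; if it never does, the sensor eventually times out and
        # process_y is not run.
        wait_for_x = ExternalTaskSensor(
            task_id="wait_for_batch_a_process_x",
            external_dag_id="batch_a",
            external_task_id="process_x",
        )
        process_y = BashOperator(
            task_id="process_y",
            bash_command="run_remote.sh process_y",   # hypothetical wrapper + args
        )
        wait_for_x >> process_y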

None of these are showstoppers because, as you said, this has been running fine for over a decade. These are mostly quality-of-life improvements.


Ah, I understand. That makes sense. If you have business users, then it makes sense to go with something like Airflow, since tools like that make it easier for less technical users to inspect jobs, kick them off, visualize them, etc. The UI makes all the difference for those use cases.



