
> what you really want is a company-wide DAG where individuals can add their own nodes and which handles schema matching as nodes are upgraded, invalidation of downstream data when upstream data is invalidated

I've never seen this work in practice, and doubt it can work, due to the complexities involved.

It really helped in our case. We have a team of 10+ researchers who also ship code to production. They were repeatedly running into the same problem: recomputing the same data at runtime, or reinventing the wheel because they didn’t know somebody had already computed that datum. I ended up writing a small single-process (for now) workflow engine running a “company-wide DAG” of reusable data processing nodes (all derived from user-submitted input + models). Now it is much easier for individuals to contribute + much easier to optimize pipelines separately. I might open source it sometime soon.
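To make the idea concrete, here is a minimal sketch of the shape such an engine could take. This is my own illustration under assumed names (Engine, node, compute, invalidate are all hypothetical), not the parent's actual code:

    # Hypothetical single-process DAG engine: nodes declare their
    # upstream dependencies, results are cached so the same datum is
    # never recomputed, and invalidating a node clears the cache for
    # everything downstream of it.

    class Engine:
        def __init__(self):
            self.funcs = {}   # node name -> function
            self.deps = {}    # node name -> upstream node names
            self.cache = {}   # node name -> computed value

        def node(self, *deps):
            def register(fn):
                self.funcs[fn.__name__] = fn
                self.deps[fn.__name__] = deps
                return fn
            return register

        def compute(self, name):
            # Recursively compute upstream values, memoizing each node.
            if name not in self.cache:
                args = [self.compute(d) for d in self.deps[name]]
                self.cache[name] = self.funcs[name](*args)
            return self.cache[name]

        def invalidate(self, name):
            # Drop this node's cached value, then every downstream node's.
            self.cache.pop(name, None)
            for n, deps in self.deps.items():
                if name in deps:
                    self.invalidate(n)

    engine = Engine()

    @engine.node()
    def raw_events():
        return [1, 2, 3]              # stand-in for user-submitted input

    @engine.node("raw_events")
    def features(events):
        return [e * 10 for e in events]

    @engine.node("features")
    def model_score(feats):
        return sum(feats) / len(feats)

    print(engine.compute("model_score"))  # computes all three nodes, caches each
    engine.invalidate("raw_events")       # also clears features and model_score

Schema matching and multi-process execution would have to layer on top of this, but cached nodes plus recursive downstream invalidation is the core loop being described.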


It's what large enough groups end up doing de facto anyway. They just have to kludge the tooling together for it.



