
In some fields it's more important than in others.

In life sciences research that supports synthetic control arms, the FDA cares more about the lineage and manipulation of the data than about the data science models used to predict X/Y/Z.

I.e., what was the data originally, what did it end up as prior to ingestion into AI/ML, why was it changed, what steps were involved, etc.

There aren't a ton of good out-of-the-box solutions for data lineage, and it's driving me nuts.

We have Apache NiFi, which promises data lineage out of the box and _appears_ to deliver. I've never implemented it, though.

We have Pachyderm, which has some support here, but I don't know much about it.

Beyond that, it appears to be roll-your-own.
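
For what it's worth, here is a rough sketch of what a roll-your-own version might look like, covering the questions above: what the data was, what it became, why it changed, and which steps ran. Everything in it (`LineageLog`, `fingerprint`, the toy rows and step names) is hypothetical and made up for illustration. It is not how NiFi, Pachyderm, or any regulator-approved tool does it, and a real setup would also need timestamps, code versions, and tamper-evident storage.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(rows):
    """Stable SHA-256 over a list of JSON-serializable rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class LineageLog:
    """Hypothetical append-only record of every transformation step."""
    steps: list = field(default_factory=list)

    def apply(self, rows, transform, name, reason):
        """Run transform(rows) and record what changed and why."""
        out = transform(rows)
        self.steps.append({
            "step": name,
            "reason": reason,
            "input_sha256": fingerprint(rows),
            "output_sha256": fingerprint(out),
            "rows_in": len(rows),
            "rows_out": len(out),
        })
        return out


# Toy data, invented for this example.
raw = [{"id": 1, "age": 34}, {"id": 2, "age": None}, {"id": 3, "age": 51}]

log = LineageLog()
clean = log.apply(
    raw,
    lambda rs: [r for r in rs if r["age"] is not None],
    name="drop_missing_age",
    reason="downstream model cannot handle missing age",
)
final = log.apply(
    clean,
    lambda rs: [{**r, "age_decade": r["age"] // 10} for r in rs],
    name="add_age_decade",
    reason="model uses age bucketed by decade",
)

print(json.dumps(log.steps, indent=2))
```

The one design choice worth pointing out is the hash chain: each step's output hash should equal the next step's input hash, so a gap in the chain means the data was changed somewhere that was not recorded.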

I kind of wish there was an accepted best practice for data lineage, but it's, surprisingly, the wild west. And it's 100% required for industry use.




dbt does pretty well?



