
The idea is that it doesn't store binary files locally, just pointers + metadata in the DB (SQLite if you run locally, open source). So it does versioning, structuring of datasets, etc. by "reference", if you wish.

(That's different from, say, DVC, which always copies files into a local cache.)
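
A minimal sketch of what "versioning by reference" might look like, assuming the `DataChain.from_storage` / `save` API shown in the project's README (the bucket path and dataset name here are hypothetical):

    from datachain import DataChain

    # Saving a dataset records pointers + metadata in the local
    # SQLite DB; the binary files themselves never leave the bucket.
    DataChain.from_storage("s3://my-bucket/images/").save("images-v1")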



So in the case from the README, where you're trying to curate a sample of your data, the only thing you're reading is the metadata, UNTIL you run `export_files`, which actually copies the binary data to your local machine?


Exactly! DataChain does lazy compute. It reads the metadata/JSON while applying filters, and only downloads the sample of data files (JPGs) that the filter selects.

This way, you might end up downloading just 1% of your data, as defined by the metadata filter.
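
A minimal sketch of that flow, assuming the `from_storage` / `filter` / `export_files` API from the README (the bucket path and size threshold are made up for illustration):

    from datachain import C, DataChain

    # Building and filtering the chain touches only metadata;
    # no image bytes are downloaded yet (lazy compute).
    sample = (
        DataChain.from_storage("gs://my-bucket/photos/")
        .filter(C("file.size") > 100_000)  # filter on a metadata field
    )

    # Only the files that passed the filter are copied locally.
    sample.export_files("./sample")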



