
Thanks! I hadn't seen anyone do it this way before with a very large, partitioned dataset, but it works shockingly well as long as you're not trying to `SELECT *` the entire table. Props to the DuckDB folks.

Eventually I plan to add some thin R and Python wrapper packages around the DuckDB calls just to make it easier for researchers.
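For anyone curious what such a thin wrapper might look like, here is a rough sketch, not the author's actual code. It only builds the SQL text: it names the columns it needs (instead of `SELECT *`) and leaves filter values as `?` placeholders to bind later. The table and column names (`db.times`, `origin_id`, `duration_sec`, `year`) are made up for illustration.

```python
def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_query(table, columns, filters):
    """Build a SELECT that names its columns and filters rows.

    `table` is a dotted name such as "db.times" (hypothetical).
    `filters` maps column names to values; the values are returned
    separately so the caller can pass them as bound parameters.
    """
    if not columns:
        raise ValueError("name the columns you need; avoid SELECT *")
    table_sql = ".".join(quote_ident(part) for part in table.split("."))
    col_sql = ", ".join(quote_ident(c) for c in columns)
    sql = f"SELECT {col_sql} FROM {table_sql}"
    if filters:
        where = " AND ".join(f"{quote_ident(k)} = ?" for k in filters)
        sql += f" WHERE {where}"
    return sql, list(filters.values())


# Hypothetical table and column names, for illustration only.
sql, params = build_query("db.times", ["origin_id", "duration_sec"], {"year": "2024"})
print(sql)
print(params)
```

Filtering on a partition column is what should keep the scan small on a partitioned dataset, assuming the engine can skip the partitions that don't match.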



I blogged a few more notes here: https://simonwillison.net/2025/Mar/17/opentimes/


Nice! I know a couple of projects that have been using this pattern.

- https://bsky.app/profile/jakthom.bsky.social/post/3lbarcvzrc...

- https://bsky.app/profile/jakthom.bsky.social/post/3lb4y65z24...

- https://skyfirehose.com

Love this distribution pattern. Users can go to the Parquet files or attach to your "curated views" on a small DuckDB database file.
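A rough sketch of the "attach" half of that pattern, in case it helps. It only assembles the DuckDB statements as strings; the URL and alias are placeholders, not a real endpoint, and it assumes the database file is served over HTTPS and read through DuckDB's `httpfs` extension.

```python
def attach_statements(db_url, alias):
    """Return DuckDB statements that attach a remote database file read-only."""
    safe_url = db_url.replace("'", "''")               # escape quotes in the string literal
    safe_alias = '"' + alias.replace('"', '""') + '"'  # quote the alias as an identifier
    return [
        "INSTALL httpfs",  # extension DuckDB uses to read over HTTP(S)
        "LOAD httpfs",
        f"ATTACH '{safe_url}' AS {safe_alias} (READ_ONLY)",
    ]


# Placeholder URL and alias, for illustration only.
statements = attach_statements("https://example.com/catalog.duckdb", "catalog")
for statement in statements:
    print(statement + ";")
```

With the `duckdb` Python package installed, each statement could be run via `duckdb.connect().execute(statement)`, after which the views would be queryable under the `catalog` alias.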



