
So you just dump the data in and process it via SQL. What about pre-processing, i.e. different data sources (PDF, CSV), or incomplete or overlapping data? I imagine some code had to be written to do this?


All my data sources were CSV and MDB files, so I imported them directly into PostgreSQL. Very little code was written to clean the data. 99% of the cleaning was done with SQL queries.

The code is mostly used to display the data. Cleaning the data with SQL queries is much faster than writing code.
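To give a rough idea of what "cleaning with SQL" can look like, here is a minimal sketch, not my actual pipeline. It assumes a made-up CSV with hypothetical columns (id, name, city) and uses Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs anywhere. The raw rows go into a staging table untouched, and a single query trims whitespace, turns empty strings into NULL, and drops overlapping rows.

```python
import csv
import io
import sqlite3

# Hypothetical raw CSV: one duplicated (overlapping) row, one missing city,
# and stray whitespace around a name. All names here are illustrative only.
RAW_CSV = """id,name,city
1,Alice,Paris
2,Bob,
2,Bob,
3, Carol ,Berlin
"""


def load_and_clean(raw_csv: str) -> list[tuple]:
    """Load raw CSV text into a staging table, then clean it with SQL only."""
    conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL database

    # Step 1: dump the data in as-is, everything as text.
    conn.execute("CREATE TABLE staging (id TEXT, name TEXT, city TEXT)")
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    conn.executemany(
        "INSERT INTO staging (id, name, city) VALUES (:id, :name, :city)",
        rows,
    )

    # Step 2: do the cleaning in one SQL statement.
    conn.execute(
        """
        CREATE TABLE clean AS
        SELECT DISTINCT                        -- drop overlapping rows
            CAST(id AS INTEGER)     AS id,     -- fix the column type
            TRIM(name)              AS name,   -- strip stray whitespace
            NULLIF(TRIM(city), '')  AS city    -- empty string becomes NULL
        FROM staging
        """
    )

    result = conn.execute(
        "SELECT id, name, city FROM clean ORDER BY id"
    ).fetchall()
    conn.close()
    return result


cleaned = load_and_clean(RAW_CSV)
for row in cleaned:
    print(row)
```

The same statements (TRIM, NULLIF, CAST, SELECT DISTINCT) also exist in PostgreSQL, where the load step would typically be a bulk import rather than row-by-row inserts.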



