So just dump the data in and process via SQL. What about pre-processing, i.e. diff... | Hacker News


bootload on June 5, 2015 | on: Show HN: Explore 16 Years of Green Card Applicatio...

So you just dump the data in and process it via SQL? What about pre-processing, i.e. different data sources (PDF, CSV), or incomplete and overlapping data? I imagine some code had to be written to handle that?

negrit on June 5, 2015

All my data sources were CSV and MDB files, so I imported them directly into PostgreSQL. Very little code was written to clean the data; 99% of the cleaning was done with SQL queries.

The code is mostly used to display the data. Cleaning the data with SQL queries is much faster than writing code.
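A minimal sketch of the "load raw, clean in SQL" approach, for illustration only. The commenter used PostgreSQL (where the load step would typically be a `COPY ... FROM` command); this sketch swaps in Python's built-in `sqlite3` so it runs without a server. The CSV contents, table names (`raw_cases`, `cases`) and columns are hypothetical, not the real green card dataset. Everything is loaded as text into a staging table, and the trimming, NULL handling, type casting and de-duplication of overlapping rows happen in a single SQL statement.

```python
import csv
import io
import sqlite3

# Hypothetical extracts from two sources that overlap on case A-003.
SOURCE_A = """case_number,employer,wage,year
A-001,Acme Corp ,85000,2013
A-002,Globex,,2013
A-003,initech,72000,2014
"""
SOURCE_B = """case_number,employer,wage,year
A-003,Initech,72000,2014
A-004,Umbrella,91000,2014
"""


def load_csv(conn, text, source):
    """Dump raw CSV rows into the staging table as-is (no cleaning in Python)."""
    rows = csv.DictReader(io.StringIO(text))
    conn.executemany(
        "INSERT INTO raw_cases (source, case_number, employer, wage, year) "
        "VALUES (?, ?, ?, ?, ?)",
        [(source, r["case_number"], r["employer"], r["wage"], r["year"]) for r in rows],
    )


def build_clean_table(conn):
    """Do all cleaning in SQL: normalize text, empty -> NULL, cast, de-duplicate."""
    conn.executescript("""
        DROP TABLE IF EXISTS cases;
        CREATE TABLE cases AS
        SELECT case_number,
               UPPER(TRIM(employer))                   AS employer,
               CAST(NULLIF(TRIM(wage), '') AS INTEGER) AS wage,
               CAST(year AS INTEGER)                   AS year
        FROM raw_cases
        -- Overlapping sources: keep the first-loaded row per case_number.
        WHERE rowid IN (SELECT MIN(rowid) FROM raw_cases GROUP BY case_number);
    """)


def main():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_cases "
        "(source TEXT, case_number TEXT, employer TEXT, wage TEXT, year TEXT)"
    )
    load_csv(conn, SOURCE_A, "a")
    load_csv(conn, SOURCE_B, "b")
    build_clean_table(conn)
    return conn.execute(
        "SELECT case_number, employer, wage, year FROM cases ORDER BY case_number"
    ).fetchall()


if __name__ == "__main__":
    for row in main():
        print(row)
```

Keeping the raw table untouched means a cleaning rule can be changed and the `cases` table rebuilt by rerunning one query, which is much of why SQL-side cleaning tends to be quicker to iterate on than custom parsing code.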
