So you just dump the data in and process it via SQL. What about pre-processing, i.e. different data sources (PDF, CSV) and incomplete or overlapping data? I imagine some code had to be written to do this?
All my data sources were CSV and MDB files, so I imported them directly into PostgreSQL. Very little code was written to clean the data; 99% of the cleaning was done with SQL queries.
The code is mostly used to display the data. Cleaning the data with SQL queries is much faster than writing code.
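To give an idea of what "cleaning with SQL" can look like, here is a minimal sketch. It is not the actual queries from this project: the table and column names (`staging`, `cleaned`, `station`, `day`, `value`) are made up for illustration, and it uses Python's built-in `sqlite3` module instead of PostgreSQL so it runs without a server. The same pattern (load raw rows into a staging table, then filter and de-duplicate with a query) should carry over to PostgreSQL with minor syntax changes.

```python
import sqlite3


def clean_readings(rows):
    """Load raw rows into a staging table, then clean them using SQL only.

    Hypothetical schema: (station, day, value), all imported as text,
    the way a plain CSV import would leave them.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (station TEXT, day TEXT, value TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)

    # One query handles three problems:
    #  - stray whitespace in keys (TRIM)
    #  - incomplete rows (NULL or empty value filtered out)
    #  - overlapping rows from several sources (GROUP BY collapses them;
    #    keeping MAX is an arbitrary choice made for this sketch)
    conn.execute("""
        CREATE TABLE cleaned AS
        SELECT TRIM(station)            AS station,
               day,
               MAX(CAST(value AS REAL)) AS value
        FROM staging
        WHERE value IS NOT NULL AND TRIM(value) <> ''
        GROUP BY TRIM(station), day
    """)

    result = conn.execute(
        "SELECT station, day, value FROM cleaned ORDER BY station, day"
    ).fetchall()
    conn.close()
    return result


raw = [
    ("A", "2020-01-01", "1.5"),
    (" A ", "2020-01-01", "1.5"),  # same reading, imported from a second file
    ("A", "2020-01-02", ""),       # incomplete: empty value
    ("B", "2020-01-01", None),     # incomplete: missing value
    ("B", "2020-01-02", "3.0"),
]

for row in clean_readings(raw):
    print(row)
```

With a real import you would fill the staging table from the CSV or MDB export rather than from a Python list; the cleaning query itself stays the same.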