Or "better" (for some value of better), join the data to something identifiable (e.g., the public records you listed) before developing models for everything you want to retain, then discard the original data once your deep enough net has effectively auto-encoded whatever you wanted to retain in an identifiable manner.