> This is really a form of compression, not normalization.
First of all, the index hashes of the emails are functionally dependent on the email strings, so the indexed original schema is not normalised.
Secondly, it would not effectively be compression unless there were in fact dependencies in the data. But we can make many fair assumptions about the statistical dependencies: for example, certain emails/IPs occur together more often than others. Insofar as our assumptions about these dependencies are correct, normalisation gives us all the usual benefits.
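
To make the second point concrete, here is a rough sketch in Python. The table layout, the function names and the sample rows are all made up for illustration, and it only counts stored string characters, ignoring the cost of the integer keys. It factors (email, ip) rows into lookup tables and compares how much string data each form holds when values repeat and when they do not.

```python
def normalise(rows):
    """Split (email, ip) rows into two lookup tables plus integer-keyed rows."""
    email_ids, ip_ids = {}, {}
    keyed_rows = []
    for email, ip in rows:
        # setdefault assigns the next integer id the first time a value is seen
        e = email_ids.setdefault(email, len(email_ids))
        i = ip_ids.setdefault(ip, len(ip_ids))
        keyed_rows.append((e, i))
    return email_ids, ip_ids, keyed_rows


def chars_denormalised(rows):
    """String characters stored when every row carries its own strings."""
    return sum(len(email) + len(ip) for email, ip in rows)


def chars_normalised(email_ids, ip_ids):
    """String characters stored when each distinct string is kept once."""
    return sum(len(s) for s in email_ids) + sum(len(s) for s in ip_ids)


# Hypothetical data: the same email/ip pair occurs several times.
repeated = [
    ("a@example.com", "10.0.0.1"),
    ("a@example.com", "10.0.0.1"),
    ("b@example.com", "10.0.0.2"),
    ("a@example.com", "10.0.0.1"),
]

# Hypothetical data: no value ever repeats, so there is nothing to factor out.
unique = [
    ("a@example.com", "10.0.0.1"),
    ("b@example.com", "10.0.0.2"),
    ("c@example.com", "10.0.0.3"),
    ("d@example.com", "10.0.0.4"),
]

for name, rows in (("repeated", repeated), ("unique", unique)):
    email_ids, ip_ids, keyed_rows = normalise(rows)
    print(name, chars_denormalised(rows), chars_normalised(email_ids, ip_ids))
```

With the repeated rows the normalised form stores half as many string characters; with the all-unique rows it stores exactly as many as before, plus the keys. So the saving comes entirely from the repetition in the data, which is the sense in which it only looks like compression when the dependencies are really there.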