Hacker News new | past | comments | ask | show | jobs | submit login

A hash function that takes arbitrary data and generates fixed-length hashes will always have collisions, because there is simply more possible inputs than possible outputs. Also known as Pigeonhole principle[0] - if you have 8 pigeons and 5 holes for them, and you want to put all pigeons in the holes, then at least 3 of them will have to go to an already occupied one.

The practical aspect of making hash function "collision-free" is by using trickery and dark magic to reduce the probability of collision down to somewhere below "Earth just got hit by an asteroid, we all have other problems". You want to go really, really low because hashes are mostly used to quickly verify data, and malicious actors will be happy to throw compute at trying to generate collisions so they can alter the data.

As for how the trickery and dark magic is performed, I'm a wizard of too low level to explain these details to you; all that I recall is that you try and make the hash maximally sensitive to every bit of the input data, so that a random bitflip anywhere has a huge probability of greatly changing the hash.

--

[0] - https://en.wikipedia.org/wiki/Pigeonhole_principle




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: