Hacker News new | past | comments | ask | show | jobs | submit login

It's not enough to find 33 independent questions that evenly split the world's population.

An optimal, though inelegant solution to that goal might look something like this:

"Is the {1..33}th bit of sha1(name : location : date of birth) 1?".

Clearly you'll have tons of collisions with that solution, as you would have with any solution using 33 independent questions.

To uniquely identify people, we'd either need to use more bits, or look very closely at the population and derive very specific questions.




> Clearly you'll have tons of collisions with that solution

Why?

If we assume the hash code assignment to one of the 2^32 people is uniformly random from a set of 2^160 codes, the odds of finding a collision are astronomically small (order of 2^-95 or so). Am I missing something?


You are not taking the entire hash, you are only taking the first 33 bits of the hash. Since there are only about 8.5 billion different values for the 33 bits and there are about 7 billion people, the odds are astronomically low that each of those 7 billion people will receive a different one of those 8.5 billion possibilities.

This is the birthday paradox with instead of 365 days you have 2^33 possible answer values and instead of 23 people you have 7 billion people. I leave it as an exercise for the reader to fill in these values into one of the formulas to calculate the probability of successfully giving each person a unique 33 bit answer: http://en.wikipedia.org/wiki/Birthday_problem


> You are not taking the entire hash, you are only taking the first 33 bits

Right, my bad, didn't pay attention to the problem we are trying to solve :)


It is interesting to see your comment way down the thread, even though it is by far the best.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: