Hacker News new | past | comments | ask | show | jobs | submit login

Anyone know, if we permute all 6-16 character length alpha numeric strings, how many would would have their sha-1 hash be a match for a given five character prefix?

I’m certainly not saying I think this is an issue! I’m just academically curious about the number and how to go about calculating it.




A SHA-1 hash digest is a 160-bit hexadecimal string. These are 40 digits long, with 16 possible values per digit, yielding a total search space of 16^40 possible values.

If we splice off the first five digits (which Troy originally used as the database partition key), we get 1,048,576 five digit values each (16^5). Then we continue calculating with the sixth position as the new first position in the string.

The math from here is a straightforward function mapping 16^n -> 16^(n - 5):

* 16 six digit values match any given five digit partition key, p,

* 16^2 = 256 seven digit values match any p,

* 16^3 = 4,096 eight digit values match any p,

* 16^4 = 65,536 nine digit values match any p,

* 16^5 = 1,048,576 ten digit values match any p,

* 16^6 = 16,777,216 11 digit values match any p,

* 16^7 = 268,435,456 12 digit values match any p,

* 16^8 = 4,294,967,296 13 digit values match any p,

* 16^9 = 68,719,476,736 14 digit values match any p,

* 16^10 = 1,099,511,627,776 15 digit values match any p,

* 16^11 = 17,592,186,044,416 16 digit values match any p.

So in general, to calculate how many n digit SHA-1 digests correspond to any m digit prefix, we simply calculate (16^n)/(16^m), which yields 16^(n - m). Thus we have (16^[6..16]) / (16^5) for the five digit prefix case. Hopefully that elucidates it for you!


Shouldn't the size of the alphanumeric alphabet used for the passwords be in there somewhere? The question was about how many permutations of all 6-16 character alphanumeric strings map to a given 5 digit SHA-1 prefix.

Your answer seems to be for the case where the alphabet of the input string is hex digits.

I.e., I think we want Sum[A^i,{i,6,16}]/16^5, where A is the size of the alphanumeric alphabet.

For A=62, this is 4.6x10^22 or 2^75.3.

For A=95, this is 4.2x10^25 or 2^85.1.


Note that there's no need to actually perform N-M+1 exponentiations and N-M additions. Two exponentiations, a subtraction, and a division will give you the same answer.

  Sum[A^i,{i,1,N}] = (A^(N+1) - 1) / (A-1).
  Sum[A^i,{i,M,N}] = (A^(N+1) - A^M) / (A-1).
Consider A=10, M=6, N=2. Sum = (10,000,000 - 100) / 9 = (9,999,900) / 9 = 1,111,100, which is easily checked mentally.

Yes, I've stared at permutations for far too long, but I can't have been the first to notice the pattern. It must be in plenty of textbooks.

Another cute combinatorics question: using an alphabet of size A, generate the Nth largest M-character sequence where no two adjacent characters are equal. A simple count-and-check algorithm takes O(N) time, but there's a simple O(M) algorithm (that is, constant time if the number of characters is fixed), assuming constant-time basic arithmetic operations.


If by "alphanumeric" you mean [0-9a-zA-Z], that's an alphabet of 2 * 26+10=62 characters. Since I'll be doing this in my head, I'll add two punctation characters so we end up at 64=2^6.

Now, I assume you mean "five character prefix of the hex-encoded 160 bit hash"? One hex digit is half a byte; a nibble or 4 bits. 5 * 4=20 bits, leaves 160-20=140 bit hash. Which can represent 2^140 different values.

För a password of 1 to 16 characters from our alphabet, we have 64^16 passwords. The five character prefixes account for 64^5 of those.

64^16=2^(6 * 16)=2^96

64^5=2^30.

(2^96-2^30) < 2^140.

So, if I understood your question: all of them could (should) have unique representations in the remaining hash.

But I think you meant how many would collide in the prefix?

That'd be (2^96-2^30)/(2^20).

I think, it's getting to be bed time. But at least I should be sufficiently wrong for someone to correct me ;)


In theory, if sha-1 were a "good" hash function in the sense of evenly distributed, then if there are 2^20 prefixes then each would get 1/2^20 fraction of all the hashes.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: