That's not Benford's law though. That's just a weird distribution due to a weird cutoff.

Benford's law is 1: 30.1%, 2: 17.6%, 3: 12.5%, etc.
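For reference, those percentages come from the usual statement of the law, P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. A minimal sketch that reproduces them (the function name `benford` is just an illustrative choice):

```python
import math

def benford(d):
    """Benford's law probability that the leading digit is d (1..9)."""
    return math.log10(1 + 1 / d)

# Print the expected share of each leading digit as a percentage.
for d in range(1, 10):
    print(f"{d}: {benford(d) * 100:.1f}%")
```

The nine values sum to log10(10) = 1, so this is a proper distribution over the leading digits.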




For the record, you’re moving the goalposts. The OP claimed that his example proves that the digits always have the same chance of appearing, which is clearly false.

When the max is uniformly distributed then Benford’s law emerges. I mean, all you have to do is read the link - where I derive it.

What exactly is the law — please don’t handwave. If the law is those exact point values mentioned in the article then I just showed you how we arrived at them.


What you are describing is not even the result of a uniform distribution. It's a two-step process involving two uniform distributions. The end result is some weird, non-uniform, downward-sloping distribution.


That’s because we aren’t trying to look at one specific uniform distribution. We were asking why Benford’s law happens for almost all processes that follow a uniform distribution and record the result as positional notation with digits — namely that 1 appears a lot more than 2, which appears a lot more than 3, etc. Roughly in the proportion that 1 is twice that of 2, which is 1/3 more than 3, etc.

(Btw it is NOT true for dictionary words, for example: an initial A doesn't appear more than B. That should tell you something!)

And to understand the reason we just have to look at the family of uniform distributions, and see that for almost all of them, this proportion holds. Sure, for some of them, the 1,2,3 may be even MORE prevalent relative to 4-9 because the maximum value was 400 or 4000 or 40000. Ok? You can see this. For a uniformly distributed process that happens to have that as the maximum, Benford’s law will have the same proportions between 1,2,3 but then drop for 4-9 since they didn’t get that “boost”.

But if you keep sampling and this maximum keeps growing by some continuous distribution that’s not perfectly synced with the decimal system (i.e. with powers of 10), then it’s as likely to be in the range 100-200 as it is to be in 200-300. And then as likely to be in 1000-2000 as in 2000-3000. Given that, we get something like Benford’s law.

Now, perhaps it is ALSO TRUE for other distributions. I just explained why it’s true for uniform ones.
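One way to check this numerically instead of arguing about it: a sketch that computes the leading-digit frequencies for the two-step process described above and prints them next to the log10(1 + 1/d) values. The assumptions are mine, not anything fixed by the thread: the maximum M is a uniform integer from 100 to 999, X is then a uniform integer from 1 to M, and the function names are made up for illustration.

```python
import math

def first_digit(n):
    """Leading decimal digit of a positive integer."""
    while n >= 10:
        n //= 10
    return n

def two_step_first_digit_probs(lo, hi):
    """Exact leading-digit probabilities when M is uniform on lo..hi
    (inclusive) and X is then uniform on 1..M."""
    counts = [0] * 10   # counts[d] = how many of 1..m start with digit d
    totals = [0.0] * 10
    for m in range(1, hi + 1):
        counts[first_digit(m)] += 1
        if m >= lo:
            # For this maximum m, P(leading digit = d) is counts[d] / m.
            for d in range(1, 10):
                totals[d] += counts[d] / m
    n_maxima = hi - lo + 1
    return {d: totals[d] / n_maxima for d in range(1, 10)}

probs = two_step_first_digit_probs(100, 999)
for d in range(1, 10):
    print(f"{d}: two-step {probs[d]:.3f}   Benford {math.log10(1 + 1 / d):.3f}")
```

Changing `lo` and `hi` (say 100 to 399, or 1000 to 9999) shows how much the result depends on the assumed range of the maximum, which is the point under dispute here.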


If you took a random letter in the alphabet and then sampled from any letter before this letter, you would get more samples from the earlier letters of the alphabet. That is because the two-step process discards higher letters in the first step. This is not a uniform distribution and is not Benford's law. It's just a weird two-step process that over-samples earlier letters.
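The alphabet version is small enough to work out exactly. A sketch, assuming the cutoff letter is picked uniformly from A to Z and the sample is then uniform over all letters up to and including the cutoff (the inclusive cutoff and the function name are my assumptions, so that a cutoff of A still has something to sample):

```python
from fractions import Fraction
import string

def two_step_letter_probs(alphabet=string.ascii_uppercase):
    """Exact probability of each letter under the two-step process."""
    n = len(alphabet)
    probs = {ch: Fraction(0) for ch in alphabet}
    for k in range(1, n + 1):        # cutoff is the k-th letter, chance 1/n
        for ch in alphabet[:k]:      # then uniform over the first k letters
            probs[ch] += Fraction(1, n) * Fraction(1, k)
    return probs

letter_probs = two_step_letter_probs()
for ch in "ABCZ":
    print(ch, float(letter_probs[ch]))
```

Z can only be drawn when the cutoff itself is Z, so its probability is (1/26) * (1/26), while A is available for every cutoff.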


You are right, it would — but that’s not how it works. People don’t sample words by beginning with AAA and moving on to larger ones. So your point is just being put out there to “win”?


You just keep going round and round with handwaving that makes no sense. I read your link. I did not see Benford's law emerging anywhere in your link.

What does "max is uniformly distributed" even mean? If you think that the Benford's law holds good for a set of uniformly distributed numbers, why not simply provide that set? It would be so easy to prove your claim if you just provide an example set of numbers that obeys Benford's law.

All sets of numbers you have presented so far (0-300, 0-30000, 0-300000000000000000000000) do not follow Benford's law. It is very simple to show. In all these sets, the probability of first digit as 1 is equal to the probability of first digit as 2 which contradicts Benford's law.
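This is easy to verify by brute force. A short sketch that counts leading digits over the integers 1 to 299 (I'm reading "0-300" as excluding 0 and the endpoint 300; the helper name is illustrative):

```python
from collections import Counter

def first_digit_counts(upper):
    """Count leading digits over the integers 1..upper-1."""
    return Counter(int(str(n)[0]) for n in range(1, upper))

counts = first_digit_counts(300)
total = sum(counts.values())
for d in range(1, 10):
    print(f"{d}: {counts[d]} ({counts[d] / total:.1%})")
```

For this fixed maximum, 1 and 2 each lead 111 of the 299 numbers, and every digit from 3 to 9 leads only 11.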


That’s because you aren’t trying to find the probability of a digit given any SPECIFIC maximum, you are trying to sum the probability of the digit given that the maximum is in a certain range, over all ranges.

> With large ranges, even if you exclude a power of 10 in the upper bound, it does not change the 11.11% chance of each digit being the first digit.

That is JUST FALSE, ok? For pretty much any distribution you choose for the max, other than 100% chance it is a power of 10 and 0% chance other numbers, you’ll get that the digit 1 comes up way more than 2, which would come up more than 3, etc. How much more? This comes from the fact that there are just as many numbers 100-200 as there are 0-100. Ok? And that’s all 1s. Then you hit the 2s, and so on.

If the max happens to be anywhere in the range 100-1000 with equal probability, you get that result. Benford’s law. If the max is distributed as some sort of continuous distribution — and not that ridiculous distribution of ONLY ever being powers of 10 — then you likely get something similar.

What are you arguing about?? If you are saying it’s mysterious why the lower digits come up more than higher ones, well the mystery is over. If you want an EXACT fit to the numbers in the article then I think they come out whenever the max is uniformly distributed between 10^n and 10^(n+1). But they may also have a sort of “law of large numbers” thing where pretty much any continuous distribution of the max leads to this law. That part I can’t tell you. What I can tell you is OBVIOUSLY the lower digits will come out more frequently.



