
Can you provide a concrete example of a range of numbers that you think obeys Benford's law?



This is exactly the question I was going to ask.

I wrote: > The leading digits of a uniform distribution do not follow Benford's law.

And @EGreg wrote: > I’m sorry to tell you this, but you inadvertently misled people with that empirical test. This just goes to show that we have to check our assumptions, as scientists or mathematicians trying to prove a statement. (Even with empirical tests :)

So, what specific range of the uniform distribution yields leading digits that follow Benford's law?


Literally any range with min = 0 and where the max isn’t a power of 10.

For example 0-300

One third of numbers are evenly distributed: 0-100

One third starts with 1: 100-200

One third starts with 2: 200-300

Do you understand?


I understand.

There is a distribution of leading digits that looks like:

    d   P(d)
    1   30.1%   
    2   17.6%   
    3   12.5%   
    4   9.7%    
    5   7.9%    
    6   6.7%    
    7   5.8%    
    8   5.1%    
    9   4.6%    
 
As Wikipedia says, "It has been shown that this result applies to a wide variety of data sets, including electricity bills, street addresses, stock prices, house prices, population numbers, death rates, lengths of rivers, physical and mathematical constants."
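
Those percentages are just log10(1 + 1/d) for d = 1 through 9; a quick sketch to reproduce the table (Python, purely for illustration):

    import math

    # Benford's law: P(d) = log10(1 + 1/d)
    for d in range(1, 10):
        print(d, f"{100 * math.log10(1 + 1/d):.1f}%")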

Neat! For each of those data sets you get the same distribution. Now, someone (I won't say who) says that it is also true for the uniform distribution.

But it isn't.

It simply isn't.

And I said as much when I said, "The leading digits of a uniform distribution do not follow Benford's law."

And your counterexample is that if you take a uniform distribution from 0-300, the leading digits go to something like:

    d   P(d)
    1   37.0%
    2   37.0%
    3   3.7%
    4   3.7%
    5   3.7%
    6   3.7%
    7   3.7%
    8   3.7%
    9   3.7%
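
A quick way to check those numbers is to count leading digits over the integers 1 through 300 (a minimal sketch; the exact percentages shift slightly depending on how you treat the endpoints):

    from collections import Counter

    # Tally the leading digit of each integer from 1 to 300
    counts = Counter(int(str(n)[0]) for n in range(1, 301))
    for d in range(1, 10):
        print(d, f"{100 * counts[d] / 300:.1f}%")
    # Roughly 37% each for 1 and 2, about 4% for everything else: not Benford's curve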

Great, so I don't know how we can disagree at this point. The above distribution is not Benford's law.

> "The leading digits of a uniform distribution does not follow Benford's law." -- me

And you, directly disagreeing with that correct statement:

> This just goes to show that we have to check our assumptions, as scientists or mathematicians trying to prove a statement. -- EGreg

Indeed.


That's not Benford's law though. That's just a weird distribution due to a weird cutoff.

Benford's law is 1: 30.1%, 2: 17.6%, 3: 12.5%, etc.


For the record, you’re moving the goalposts. The OP claimed that his example proves that the digits always have the same chance of appearing, which is clearly false.

When the max is uniformly distributed, Benford’s law emerges. I mean, all you have to do is read the link, where I derive it.

What exactly is the law — please don’t handwave. If the law is those exact point values mentioned in the article then I just showed you how we arrived at them.


What you are describing is not even the result of a uniform distribution. It's a two-step process involving two uniform distributions. The end result is some weird, non-uniform, downward-sloping distribution.


That’s because we aren’t trying to look at one specific uniform distribution. We were asking why Benford’s law happens for almost all processes that follow a uniform distribution and record the result as positional notation with digits — namely that 1 appears a lot more than 2, which appears a lot more than 3, etc. Roughly in the proportion that 1 is twice that of 2, which is 1/3 more than 3, etc.

(Btw, it is NOT true for, e.g., dictionary words: an initial A doesn’t appear more often than B. That should tell you something!)

And to understand the reason we just have to look at the family of uniform distributions, and see that for almost all of them, this proportion holds. Sure, for some of them, the 1,2,3 may be even MORE prevalent relative to 4-9 because the maximum value was 400 or 4000 or 40000. Ok? You can see this. For a uniformly distributed process that happens to have that as the maximum, Benford’s law will have the same proportions between 1,2,3 but then drop for 4-9 since they didn’t get that “boost”.

But if you keep sampling and this maximum keeps growing by some continuous distribution that’s not perfectly synced with the metric system, then it’s as likely to be in the range 100-200 as it is to be in 200-300. And then as likely to be in 1000-2000 as in 2000-3000. Given that, we get something like Benford’s law.

Now, perhaps it is ALSO TRUE for other distributions. I just explained why it’s true for uniform ones.
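
A rough sketch of that two-step picture (the ranges and sample count here are just illustrative): draw a max M that isn't synced to powers of 10, then draw X uniformly below it, and tally leading digits.

    import random
    from collections import Counter

    counts = Counter()
    for _ in range(100_000):
        M = random.uniform(100, 1000)  # the max itself is uniformly distributed
        X = random.uniform(1, M)       # then sample uniformly below that max
        counts[int(str(int(X))[0])] += 1

    for d in range(1, 10):
        print(d, f"{100 * counts[d] / 100_000:.1f}%")
    # Smaller leading digits dominate (1 is the most common), though the exact
    # percentages need not match Benford's 30.1/17.6/... values.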


If you took a random letter in the alphabet and then sampled from any letter before this letter, you would get more samples from the earlier letters of the alphabet. That is because the two-step process discards higher letters in the first step. This is not a uniform distribution and is not Benford's law. It's just a weird two-step process that over-samples earlier letters.


You are right, it would — but that’s not how it works. People don’t sample words by beginning with AAA and moving on to larger ones. So your point is just being put out there to “win”?


You just keep going round and round with handwaving that makes no sense. I read your link. I did not see Benford's law emerging anywhere in your link.

What does "max is uniformly distributed" even mean? If you think that the Benford's law holds good for a set of uniformly distributed numbers, why not simply provide that set? It would be so easy to prove your claim if you just provide an example set of numbers that obeys Benford's law.

All sets of numbers you have presented so far (0-300, 0-30000, 0-300000000000000000000000) do not follow Benford's law. It is very simple to show. In all these sets, the probability of the first digit being 1 is equal to the probability of it being 2, which contradicts Benford's law.


That’s because you aren’t trying to find the probability of a digit given any SPECIFIC maximum; you are trying to sum the probability of the digit given that the maximum is in a certain range, over all ranges.

*With large ranges, even if you exclude a power of 10 in the upper bound, it does not change the 11.11% chance of each digit being the first digit.*

That is JUST FALSE, ok? For pretty much any distribution you choose for the max, other than 100% chance it is a power of 10 and 0% chance other numbers, you’ll get that the digit 1 comes up way more than 2, which would come up more than 3, etc. How much more? This comes from the fact that there are just as many numbers 100-200 as there are 0-100. Ok? And that’s all 1s. Then you hit the 2s, and so on.

If the max happens to be anywhere in the range 100-1000 with equal probability, you get that result. Benford’s law. If the max is distributed as some sort of continuous distribution — and not that ridiculous distribution of ONLY ever being powers of 10 — then you likely get something similar.

What are you arguing about?? If you are saying it’s mysterious why the lower digits come up more than higher ones, well the mystery is over. If you want an EXACT fit to the numbers in the article then I think they come out whenever the max is uniformly distributed between 10^n and 10^(n+1). But they may also have a sort of “law of large numbers” thing where pretty much any continuous distribution of the max leads to this law. That part I can’t tell you. What I can tell you is OBVIOUSLY the lower digits will come out more frequently.


The numbers in the range 0-300 do not obey Benford's law. In base 10, a set of numbers obeys Benford's law if the leading significant digit d (0 < d < 10) occurs with probability log10(1 + 1/d). This isn't the case for the set of numbers between 1 and 300, inclusive.


Your assertion that for large ranges every digit has the same chance of appearing is very wrong. Your empirical test is rigged by choosing a very rare max, literally the only one where it would “prove” your assertion.

Benford’s law appears when the max of your range is uniformly distributed.


If you present a weird distribution to begin with, it should not be surprising that every digit does not have the same chance of appearing. That's not the point. We are not talking about weird distributions here.

If we are going to argue like this, I might as well present a set of two numbers S = {1, 2} and claim that when we choose numbers from a uniform distribution, the probability of 3 occurring as the first digit is 0. Other commenters are not assuming weird distributions like this because this kind of discussion does not provide any new insights and is just a waste of time.


You can create all the strawmen you want. I am going to quote from Wikipedia:

The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small.

I have explained why that happens for the vast majority of UNIFORMLY DISTRIBUTED VARIABLES.

The vast majority. That implies that there is a collection of all possible uniformly distributed variables, and in particular those that are sampled from real world processes.

As long as they are uniformly distributed, with 0 as the minimum and M as the maximum, the smaller first digits will appear more often.

I explained it several times. Why are you still insisting that statements about the MAJORITY of uniform distributions are weird?

Yes, statements about collections of uniform distributions are not statements about ONE SPECIFIC uniform distribution. And?


Can you provide an example range of uniformly distributed integers that obeys Benford's law?


*The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small.*

Pretty much all of them.


You are quoting from Wikipedia and that quote is an oversimplification.

If you redefine the law like that, then sure, I agree that there are many uniform distributions too where the 1st digit is likely to be small. Here is another simple example: Consider the distribution of positive integers from 1 to 2. If we pick a number at random from {1, 2} then the 1st digit is likely to be small. This kind of analysis is boring.

But (fortunately!) that's not what Benford's law says. Benford's law provides a specific formula. Check https://en.wikipedia.org/wiki/Benford%27s_law#Definition to see the specific formula that must hold for a set of numbers to be said to obey Benford's law. That's what makes Benford's law so interesting, whereas your example ranges are degenerate cases where nothing new, surprising, or interesting is going on.


Actually, if you look at examples in the real world, you don't always get that EXACT distribution, but rather the phenomenon that the smaller digits are way more prevalent.

And once again, this is because of a simple analysis. Let me state it a DIFFERENT way, maybe this is something that you will take note of: the only time all possible digits have equal probability of being leading digits is when we have a uniform distribution with a max that is a power of 10. Literally every other distribution starts to exhibit that phenomenon. Now you can quibble as to what distributions lead to that EXACT curve fit. And there can be explanations for why power law distributions do. But other distributions exhibit this same PHENOMENON, while not necessarily converging to that exact proportion. As I said before, any other uniform distribution would not have those exact proportions, but would exhibit the phenomenon.
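
To make that contrast concrete, here is a small sketch (the 4000 cutoff is just an example of a non-power-of-10 max; the helper name is mine):

    from collections import Counter

    def leading_digit_freqs(upper):
        # Leading-digit frequencies, in percent, over the integers 1..upper-1
        counts = Counter(int(str(n)[0]) for n in range(1, upper))
        return {d: round(100 * counts[d] / (upper - 1), 1) for d in range(1, 10)}

    print(leading_digit_freqs(10_000))  # max is a power of 10: every digit ~11.1%
    print(leading_digit_freqs(4_000))   # max 4000: digits 1-3 at ~27.8%, 4-9 at ~2.8%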

Basically, every continuous distribution is highly unlikely to have a cliff at a power of 10. It is going to go down gradually, and therefore if it includes the range 8000-9000 then it will probably include numbers above 10000. And even a discontinuous one with a uniform distribution (with a cliff at the end) will exhibit the phenomenon. OK? So if you have the range 8000-9000 in there, that means 1s and 2s were a lot more prevalent, and if you have a continuous distribution then 10,000+ numbers will be there, but perhaps not numbers 90,000+.

Do you at least get the intuition behind this? As soon as you get numbers close to a power of 10, your distribution probably includes numbers in the next order of magnitude, i.e. a lot of leading 1s. The more numbers you get starting with 9, the less likely it is that you're right at the max of your distribution, with a cliff. Unless that happens to be the contrived "empirical test" that was linked to as "proof" that uniform distributions lead to equal chances for every digit to be leading.

The intuition is what matters. Now, maybe for UNIFORM distributions, or POWER distributions, that exact curve fit can be worked out. Perhaps you can show there is a "large" family of distributions for which the curve fits. Kind of like the law of large numbers. But I didn't do anything quite as ambitious. I simply showed why it's not just true for normally distributed processes, but also for others which you would think are uniformly distributed. Because chances are, that process has more 1s than 2s, and 2s than 3s, in exactly the proportion that a uniform distribution with some max would have, and the chances are the max isn't exactly a power of 10.

In practice, all this means is that the distribution may look like Benford's law for 1, 2, 3 and then drop to equally small for 4, 5, 6, 7, 8, 9. The 1 is going to be 10x more prevalent than the 9. The 2 would be 5x more prevalent, or whatever, UNLESS the distribution had a cutoff right before 200 or 2000. Understand? And this DOES HAPPEN.



