If you present a weird distribution to begin with, it should not be surprising t...

EGreg · on Feb 17, 2020

You can create all the strawmen you want. I am going to quote from Wikipedia:

The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small.

I have explained why that happens for the vast majority of UNIFORMLY DISTRIBUTED VARIABLES.

The vast majority. That implies that there is a collection of all possible uniformly distributed variables, and in particular those that are sampled from real world processes.

As long as they are uniformly distributed, with 0 as the minimum and M as the maximum, the first digit will appear more commonly.

I explained it several times. Why are you still insisting that statements about the MAJORITY of uniform distributions are weird?

Yes, statements about collections of uniform distributions are not statements about ONE SPECIFIC uniform distribution. And?

foo101 · on Feb 17, 2020

Can you provide an example range of uniformly distributed integers that obeys Benford's law?

EGreg · on Feb 17, 2020

*The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small.*

Pretty much all of them.

foo101 · on Feb 17, 2020

You are quoting from Wikipedia and that quote is an oversimplification.

If you redefine the law like that, then sure, I agree that there are many uniform distributions too where the 1st digit is likely to be small. Here is another simple example: Consider the distribution of positive integers from 1 to 2. If we pick a number at random from {1, 2} then the 1st digit is likely to be small. This kind of analysis is boring.

But (fortunately!) that's not what Benford's law says. Benford's law provides a specific formula. Check https://en.wikipedia.org/wiki/Benford%27s_law#Definition to see the specific formula that must hold for a set of numbers to be said to obey Benford's law. That's what makes Benford's law so interesting, whereas your example ranges are degenerate cases where nothing new, surprising, or interesting is going on.
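For reference, the formula on that page gives the probability of leading digit d as log10(1 + 1/d). A minimal Python sketch of it (the name `benford_probability` is just an illustrative choice, not something from the page):

```python
import math

def benford_probability(d: int) -> float:
    """Benford probability that the leading digit is d (1..9): log10(1 + 1/d)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be in 1..9")
    return math.log10(1 + 1 / d)

# Table of the nine probabilities; they sum to 1 because the product of
# (1 + 1/d) for d = 1..9 telescopes to 10.
benford = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford.items():
    print(d, round(p, 3))
```

A data set is usually said to obey the law only if its leading-digit frequencies are close to this whole table, not merely if small digits happen to be more common.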

EGreg · on Feb 18, 2020

Actually if you look at examples in the real world, you don't always get that EXACT distribution, but rather the phenomenon, that the smaller digits are way more prevalent.

And once again, this is because of a simple analysis. Let me state it a DIFFERENT way, maybe this is something that you will take note of: the only time all possible digits have equal probability of being leading digits is when we have a uniform distribution with a max that is a power of 10. Literally every other distribution starts to exhibit that phenomenon. Now you can quibble as to what distributions lead to that EXACT curve fit. And there can be explanations for why power law distributions do. But other distributions exhibit this same PHENOMENON, while not necessarily converging to that exact proportion. As I said before, any other uniform distribution would not have those exact proportions, but would exhibit the phenomenon.
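The uniform case is easy to check numerically. This is only a sketch, assuming integers drawn uniformly from 1..M; the helper name `leading_digit_shares` and the sample maxima are illustrative choices, not anything from this thread:

```python
from collections import Counter

def leading_digit_shares(max_value: int) -> dict[int, float]:
    """Share of each leading digit among the integers 1..max_value, all equally likely."""
    counts = Counter(int(str(n)[0]) for n in range(1, max_value + 1))
    return {d: counts[d] / max_value for d in range(1, 10)}

# 9999 is one below a power of 10, so every digit gets exactly the same share.
# The other maxima stop partway through a decade and tilt toward small digits.
for m in (9999, 2500, 5000):
    shares = leading_digit_shares(m)
    print(m, [round(shares[d], 3) for d in range(1, 10)])
```

With M = 2500 the digit 1 is most common, 2 is next, and 3 through 9 are tied, which is the "phenomenon" without the exact Benford proportions.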

Basically, every continuous distribution is highly unlikely to have a cliff at a power of 10. It is going to go down gradually, and therefore if it includes the range 8000-9000 then it will probably include numbers above 10000. And even a discontinuous one with a uniform distribution (with a cliff at the end) will exhibit the phenomenon. OK? So if you have the range 8000-9000 in there, that means 1s and 2s were a lot more prevalent, and if you have a continuous distribution then 10,000+ numbers will be there, but perhaps not numbers 90,000+.

Do you at least get the intuition behind this? As soon as you get numbers close to a power of 10, your distribution probably includes numbers in the next order of magnitude, i.e. a lot of leading 1s. The more numbers you get starting with 9, the more it is highly unlikely you're right at the max of your distribution, with a cliff. Unless that happens to be the contrived "empirical test" that was linked to as "proof" that uniform distributions lead to equal chances for every digit to be leading.

The intuition is what matters. Now, maybe for UNIFORM distributions, or POWER distributions, that exact curve fit can be worked out. Perhaps you can show there is a "large" family of distributions for which the curve fits. Kind of like the law of large numbers. But I didn't do anything quite as ambitious. I simply showed why it's not just true for normally distributed processes, but others which you would think are uniformly distributed. Because chances are, that process has more 1s than 2s, and 2s than 3s, in exactly the proportion that a uniform distribution with some max would have, and the chances are the max isn't exactly a power of 10.

In practice, all this means is that the distribution may look like Benford's law for 1, 2, 3 and then drop to equally small for 4, 5, 6, 7, 8, 9. The 1 is going to be 10x more prevalent than the 9. The 2 would be 5x more prevalent or whatever, UNLESS the distribution had a cutoff right before 200 or 2000. Understand? And this DOES HAPPEN.
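To put numbers on the ratios, here is a rough sketch that counts leading digits in closed form, assuming integers uniform on 1..M. The function `count_leading` and the cutoffs 1999 and 2999 are hypothetical choices for illustration:

```python
def count_leading(d: int, max_value: int) -> int:
    """Count the integers in 1..max_value whose leading digit is d."""
    total, power = 0, 1
    while d * power <= max_value:
        # At this magnitude the numbers with leading digit d run from
        # d*power to (d+1)*power - 1, clipped at max_value.
        total += min(max_value, (d + 1) * power - 1) - d * power + 1
        power *= 10
    return total

for m in (1999, 2999):
    counts = [count_leading(d, m) for d in range(1, 10)]
    print(m, counts, "ratio 1:9 =", round(counts[0] / counts[8], 2))
```

With the cutoff at 1999 the 1s outnumber the 9s by about 10 to 1, while the 2s are no more common than the 9s. With the cutoff at 2999 the 1s and 2s are tied. How far the ratios land from the Benford values depends entirely on where the max falls.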