Hacker News

Don't repeating patterns increase the likelihood that the pattern is non-random with each repetition?

If you saw a graph of coin flips that showed a clear sinusoidal pattern and that pattern continued over many cycles, wouldn't it be reasonable to at least suspect that the coin was not "fair"? One cycle, maybe not. Two, okay, I'm slightly suspicious. Three, four, five?

Of course the pattern says nothing about causation, just that something is biasing the output somehow.

I am not a statistician, but I do have a fair amount of background in things like information theory, and I find it very hard to believe that regularity has zero significance. Plug that output into Shannon's equations and compare it with a random source. Or compare VBR mp3 compression of /dev/urandom with that of a piece of music.
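The Shannon comparison above can be sketched directly. This is a minimal, illustrative example (the sample sizes and the repeated phrase are arbitrary choices): compute the empirical byte-level Shannon entropy of output from /dev/urandom versus a highly regular text, and the gap is obvious.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Empirical Shannon entropy of a byte string, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

random_bytes = os.urandom(100_000)          # high-entropy source
structured = b"the rain in spain " * 5_000  # highly regular source

print(shannon_entropy(random_bytes))  # close to the maximum of 8.0 bits/byte
print(shannon_entropy(structured))    # far lower: few symbols, skewed frequencies
```

Note this only measures zeroth-order symbol frequencies; a real compressor (like the VBR mp3 example) also exploits higher-order structure, so the gap in practice is even larger.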

The fooled-by-randomness argument can be taken too far. As a reductio ad absurdum, you could argue that I do not exist because theoretically a random source could produce what I just typed. You would theoretically be correct. If you output digits of pi long enough, they will eventually contain this text (assuming pi is normal, which is widely believed but unproven).




> Don't repeating patterns increase the likelihood that the pattern is non-random with each repetition?

1. No, not for finite-length patterns anyway.

2. Your question is deeper than it appears, and it alludes to some complex theories about randomness.

Let's say I present a series of decimal digits that is a million digits long, and I claim that they're random -- in this context meaning they have no exploitable internal order (i.e. high entropy). If you understand the risks in making assumptions about randomness, then in spite of all sorts of apparently non-random sequences within the list, you might say, "the list isn't long enough to draw any conclusion about randomness." And you would be right.

My million-digit list might actually be a sequence of digits of Pi starting at some arbitrary point and extending for a million digits from there. The list appears statistically random, but it isn't -- in the algorithmic sense it has very low complexity, because a short program can reproduce the entire sequence.
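That last point can be made concrete. The few lines below (a sketch using Gibbons' unbounded spigot algorithm, one standard way to stream decimal digits of pi) generate output whose digit frequencies look statistically random, yet the whole infinite sequence is determined by this tiny program:

```python
from itertools import islice

def pi_digits():
    """Gibbons' unbounded spigot algorithm: yields decimal digits of pi forever."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            # shift out the digit just produced
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # consume another term of the series
            q, r, t, n, k, l = (q * k, (2 * q + r) * l, t * l,
                                (q * (7 * k + 2) + r * l) // (t * l), k + 1, l + 2)

digits = list(islice(pi_digits(), 15))
print(digits)  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```

A frequency test on the first million digits would find them close to uniform, but the "entropy" of the source, in the algorithmic sense, is just the length of this program.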

On that basis, guess how many digits you would need to be assured that they represent a random sequence? Spoiler: an infinite number.

All the principles that apply to a claim of randomness, apply to a claim of non-randomness, and for the same reasons.

> I am not a statistician but I do have a fair amount of background in things like information theory, and I find it very hard to believe that regularity has zero significance.

Okay. If I flip a fair, unbiased coin and record all the flips, counting heads as 1 and tails as 0, over time I will see any number of apparently significant patterns, and the longer the sequence, the more likely I am to see "significance".

In a long sequence of flips of a fair coin, any particular pattern of length n appears on average about once every 2^n flips. For example, after a few hundred flips, an uninterrupted run of 8 heads becomes more likely than not. Therein lies the problem -- with a large enough data set, you will see all sorts of meaningless patterns.
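A quick Monte Carlo sketch makes this tangible (the run length, sequence lengths, and trial count here are arbitrary illustrative choices): estimate the probability that a fair coin produces a "suspicious" run of 8 heads somewhere in a sequence, and watch it climb toward certainty as the sequence grows.

```python
import random

def has_run(n_flips: int, run_len: int, rng: random.Random) -> bool:
    """Flip a fair coin n_flips times; report whether run_len heads in a row occur."""
    streak = 0
    for _ in range(n_flips):
        if rng.random() < 0.5:   # heads
            streak += 1
            if streak >= run_len:
                return True
        else:
            streak = 0
    return False

def estimate(n_flips: int, run_len: int = 8, trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(at least one run of run_len heads in n_flips)."""
    rng = random.Random(seed)
    return sum(has_run(n_flips, run_len, rng) for _ in range(trials)) / trials

for n in (64, 256, 1024, 4096):
    print(n, estimate(n))  # probability rises steadily with sequence length
```

No bias, no mechanism -- just enough data that the "pattern" was bound to show up.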

The only way to sort this kind of thing out is to:

1. Adopt the "null hypothesis" -- meaning start out assuming that a correlation means nothing, then gather evidence to contradict that assumption. In other words, don't start out by assuming what you must prove.

2. Consider alternative explanations for the tested outcome -- avoid confirmation bias.

3. Remember lex parsimoniae, otherwise known as "Occam's razor" -- the precept that, among competing explanations, the one requiring the fewest assumptions should be preferred. And the simplest explanation is that a pattern that lacks an explanation also lacks significance.
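Step 1 above has a standard concrete form. As an illustrative sketch (the 60-heads-in-100-flips figure is made up for the example), an exact one-sided binomial test asks: assuming the null hypothesis of a fair coin, how surprising is the observed count?

```python
from math import comb

def binomial_p_value(heads: int, flips: int) -> float:
    """One-sided exact p-value: probability of at least `heads` heads in
    `flips` flips of a fair coin (the null hypothesis)."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips

# 60 heads in 100 flips: surprising under the null, but not overwhelming.
p = binomial_p_value(60, 100)
print(p)  # roughly 0.028
```

Even that small p-value comes with the caveat the thread keeps circling: if you test enough patterns in a big data set, values this small turn up by chance alone.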

In the final analysis, one cannot make any reliable statement about a data pattern without knowing how the sequence was generated. The result data ultimately has no meaning on its own; only knowledge of the generating function can offer that kind of significance.

> If you output digits of pi long enough, they will contain this text.

Yes, but that's an argument against the meaning of perceived patterns, not in favor of it.


This gets really philosophically deep if you keep going... :)

Like... how do you bootstrap epistemology? Given what you say above, how is it that a completely naive learner learns anything? If you immerse a naive learning entity in random noise, it will only learn random correlations. But if you immerse it in an environment with structure, we must assume it would begin to mirror that structure in its internal state. (Learning is information transfer.) But it has to start somewhere... by attempting to correlate one apparently non-random pattern with another.

BTW, I do agree that the graph I posted doesn't prove anything. But if it continues to repeat, at what point should we start questioning the null hypothesis and searching for underlying causal factors? Does statistics have anything to say about that?

This gets into areas like "if we built an autonomous space probe, how would we program it to look for 'interesting' things? Define interesting..."


> Given what you say above, how is it that a completely naive learner learns anything?

Well, that's a very good question, and I think the answer is by being naive, meaning suspending disbelief until the person has enough experience to be an informed consumer of ideas.

> If you immerse a naive learning entity in random noise, it will only learn random correlations.

That's true, but children are natural scientists, naturally curious, predisposed to think there's a mechanism behind everything. If that instinct succeeds, they will look for and sometimes find actual mechanisms -- where they exist.

> But if you immerse it in an environment with structure, we must assume it would begin to mirror that structure in its internal state.

That's true even when the structure is an illusion, as with religion and fixed belief systems. To me personally, the hardest part of growing up is not discovering the real mechanisms of life, but unlearning the phony mechanisms that we tend to be force-fed as children.

> But at some point it has to start somewhere... to start by attempting to correlate one apparently non-random pattern with another.

I would have said that the start is locating a plausible mechanism for a pattern that might otherwise mean nothing, then proving a correlation. Then offering the explanation to one's friends to see if they can find a flaw in your reasoning. Hmm -- I just described about 80% of modern science. :)

> BTW, I do agree that the graph I posted doesn't prove anything. But if it continues to repeat, at what point should we start questioning the null hypothesis and searching for underlying causal factors? Does statistics have anything to say about that?

Yes, it does -- it's the same with all apparently nonrandom sequences. Unless the observer tries to find and test explanations, the default assumption must be that, no matter how persuasive, the data are random and lacking a cause-effect relationship.

Here's my favorite example of what can go wrong. Let's say I'm a doctor and I think I've cured the common cold. My cure is to shake a dried gourd over the patient until he gets better. The cure might take several days but it always works. It's repeatable. It's falsifiable (it might fail, but so far it hasn't). Other laboratories successfully replicate the experiment. So it's "scientific", at least according to the definition of science that doesn't require things to be explained (as with psychology).

To a mature, skeptical mind, everything is wrong with it -- no attempt to explain, confirmation bias, etc. But to someone starting out in life, to someone not sufficiently skeptical, it's a scientific breakthrough. It's not random. :)




