The frequentist answer is that you could get the proportion of tosses that come up heads arbitrarily close to 50% in a large enough sample.
The Bayesian answer is that given all the evidence available to me I would be willing to bet $1 to win $1 if the next toss comes up heads. (Since I work in finance I'll add that this assumes I'm risk neutral, so losing $1 is exactly as bad for me as winning $1 is good.)
Except for the risk-neutrality detail this is all Probability 101, right? Or are you thinking of something else?
That the relative frequency converges to 50 % is obviously not true. You would have to be exceptionally lucky, but you could get heads, heads, tails repeated forever, and then the relative frequency of heads would fluctuate by increasingly tiny amounts around 66.(6) %. This of course has probability zero, but it is not impossible. And there are of course many other sequences of outcomes for which the relative frequency does not converge to 50 %. So at best you could say that the relative frequency converges to 50 % with high probability, but now you have a circular definition because you make use of probabilities while defining probabilities.
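A quick sketch (my own illustration in Python, not a proof) of why the repeating heads, heads, tails sequence settles near 66.(6) % rather than 50 %:

```python
# Running relative frequency of heads for the repeating H, H, T pattern.
# It oscillates in an ever-narrowing band around 2/3, never approaching 1/2.
def heads_frequency(n_tosses):
    """Relative frequency of heads after n_tosses of the repeating H, H, T pattern."""
    pattern = [1, 1, 0]  # 1 = heads, 0 = tails
    heads = sum(pattern[i % 3] for i in range(n_tosses))
    return heads / n_tosses

for n in (3, 30, 300, 3000):
    print(n, heads_frequency(n))  # exactly 2/3 at every multiple of 3
```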
The Bayesian view is problematic for several reasons. Why do I need someone with beliefs about the coin? We are talking about intrinsic properties of tossing a coin, after all. And if that is not enough, we also throw some betting in. Tossing a coin certainly does not depend on the invention of money and gambling, at least ignoring that coins are usually money. Last but not least, you have to explain where your belief in a 50 % probability for heads comes from and how it is justified. I could certainly believe that the probability for heads is 25 %; that would just not be a good belief.
>That the relative frequency converges to 50 % is obviously not true. You would have to be exceptionally lucky, but you could get heads, heads, tails repeated forever
This would violate the law of large numbers. You may end up with the sequence HHT 10 million times, but the chance that you continue to get that sequence for another 10 million times is all but zero, and it gets even smaller as you add another 10 million trials. As the number of trials approaches infinity, you will arrive at 50%.
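For what it's worth, here's a small simulation (my sketch, and only an illustration: it shows typical behavior, not a guarantee) of the relative frequency of heads for a simulated fair coin at increasing sample sizes:

```python
import random

# Typical law-of-large-numbers behavior: the relative frequency of heads
# usually gets close to 0.5 as the sample grows. Any single finite run can
# still deviate; this only shows what happens with overwhelming probability.
def relative_frequency(n_tosses, seed=0):
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(n))
```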
No, the law of large numbers does not assert that all sequences of outcomes converge, only that this happens almost surely. Or look at it the other way round, what mechanism would prevent heads, heads, tails repeated forever? I can certainly get heads, heads, tails on the first three tosses. After that I start over, three more tosses all independent of what just happened, again a 12.5 % chance for heads, heads, tails. Why could this not continue forever?
I'm pretty sure your confusion about probability stems from you not understanding the mathematical concept of a "limit". Your sequence HHT has a 12.5% chance of happening, but so does the sequence TTH and as the number of trials approaches infinity, you will get similar counts of the two because they have equivalent probabilities.
No, I will get similar counts with high probability but not surely. There are sequences of probability zero that do not converge. Think about it this way. Every coin toss in a sequence, finite or infinite, on its own can surely turn out to be heads, can't it? And all tosses are independent, aren't they? So why can't all tosses turn out heads? Unless you have a convincing argument why some tosses have to yield tails eventually, you have to deal with the fact that not all sequences of tosses converge to the expected probability.
> Every coin toss in a sequence, finite or infinite, on its own can surely turn out to be heads, can't it?
I would agree if the sentence contained just the word "finite". The "or infinite" is where you are thinking too intuitively and are no longer mathematically correct. The difference between "finite" and "infinite" is precisely the solution to this paradox in your mind.
I hope to be able to point out that it is easy to correct for that with just a bit of structured (but maybe non-intuitive) thinking.
One of the simplest (and, I would argue, most complete) definitions of "a probability of 0.5 for both test outcomes A and B" is that, given an infinitely large sample, half of the samples show result A, while the other half shows result B. Think about this for a second, and I recommend also using this opportunity to appreciate again that half of infinity is still infinity.
This relates the mathematical concept of infinity to the definition of probability.
With this definition, it probably feels like I so far just reworded your question. That may not be satisfying. So, I would like to encourage you to imagine that you have superpowers and can actually perform an infinite number of tests. You do that on a sunny day and observe that all test outcomes were the same: A. You call it a day and you can conclude (using the mathematical definition from above) in your diary of days-with-superpowers: "Today I have empirically determined that the test shows outcome A with a probability of 1". You might smile and add "Peter said that outcome A has a probability of 0.5, but I have proven him wrong".
In other words: if you do an infinite number of tests, the normalized distribution of test results precisely is the probability distribution of test results.
I think we have learned by now that the concepts of infinity and probability are deeply related and can, by definition, be used to explain each other. That might still not be satisfying. So, I would like to focus on the "finite" case for a bit.
Imagine you don't have superpowers anymore, but you're pretty resilient and motivated, and you want to do the experiment to (in)validate Peter's claim: "The probability of both outcomes, A and B, is 0.5 each!"
After 1,333,337 tests you have seen 1,333,337 test results showing A. You're tired from all the testing and you complain (correctly!): "it is now really pretty unlikely that Peter is right! I am pretty damn sure that he is wrong! How long do I still need to do this to be absolutely sure?" -- and then a voice from the darkness reminds you: "for being absolutely sure, that is, for finding an answer that is correct with a probability of 1, you need to have superpowers and make an infinite number of tests -- sorry dude, you can't do that, ever, because it's inconvenient, takes infinitely long and such -- so I need to disappoint you, you will never be sure, but maybe just enjoy your life as much as you can".
Infinity usually does not allow for actual intuitive thinking. But there are a few really simple mathematical rules around infinity and convergence that make it actually pretty simple and again intuitive to deal with the concept.
I appreciate your attempt, but you did not convince me in the slightest. Let's take the 1,333,337 coin tosses, all heads. This result has no bearing on the probability of the coin at all. It may make you strongly doubt that the coin is indeed a fair coin, but - and that is the point I am trying to get at - there is nothing that prevents a fair coin from coming up heads 1,333,337 times in a row. Whatever your experiment shows, it could always be a statistical fluke.
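Just to put a number on how extreme that fluke would be (my back-of-the-envelope calculation; the probability itself underflows floating point, so I work with its base-10 logarithm):

```python
import math

# Order of magnitude of P(1,333,337 heads in a row on a fair coin) = 0.5**1333337.
# The value is far below floating-point range, so compute log10 instead.
def log10_p_all_heads(n):
    return n * math.log10(0.5)

exponent = log10_p_all_heads(1_333_337)
print(f"P(1,333,337 heads in a row) ~ 10^{exponent:.0f}")  # astronomically small, yet nonzero
```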
And the infinite case does not change much, at least not in a way obvious to me. Back with those superpowers, I toss the coin infinitely often and get heads 50 % of the time. That was fun, let me do that again tomorrow. Well, again heads 50 % of the time. This is the way it goes for a long time, but then something strange happens: one day all tosses come up heads. The very next day everything is back to normal. What is now the probability of heads? We got two different answers from your way of defining the probability. And all it took was an extreme statistical outlier on a single day.
> What is now the probability of heads? We got two different answers from your way of defining the probability. And all it took was an extreme statistical outlier on a single day.
The point of probability is that performing an experiment an infinite number of times guarantees that every outcome happens with a proportion that exactly equals its probability (for a formalization of what that even means, look at measure theory). If you get different proportions on different days, you have different probabilities. That means, you weren't performing the same experiment.
That's why I say you can't prove a probabilist/statistician wrong about the probability of an outcome in any finite amount of time... It's always good to be in such a business, except that they can't be proven right either (in finite time).
I agree, there are problems and non-intuitive aspects to both approaches to probability. Otherwise there wouldn't be two approaches.
I think your objection to frequentism unfairly conflates the idealized world with the physical world. Only in the idealized world can you assert a priori that a coin has 50% probability of coming up heads. In that world you can toss the coin infinitely many times and it almost surely comes up heads close to 50% of the time.
In the physical world you merely have a model stating that heads will come up 50% of the time. In this world you could toss the coin millions of times and have it come up heads 66% of the time, but all you've done is provide really strong evidence that your model is wrong.
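To quantify "really strong evidence" (my rough sketch, using the normal approximation to the binomial; the specific numbers 66% and one million come from the comment above):

```python
import math

# How far is 660,000 heads in 1,000,000 tosses from what a fair-coin model
# predicts? Under the normal approximation, the standard deviation of the
# head count is sqrt(n * p * (1 - p)), so we measure the gap in those units.
def z_score(heads, n, p=0.5):
    """Standard score of an observed head count under a p-coin model."""
    return (heads - n * p) / math.sqrt(n * p * (1 - p))

z = z_score(660_000, 1_000_000)
print(f"z = {z:.0f} standard deviations")  # z = 320
```

Three hundred twenty standard deviations is not "proof" in the logical sense, but it is about as decisive as empirical evidence gets.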
Also, you left out 2 arguments against frequentism: it allows inconsistent beliefs, and in practice it has allowed bad approaches in scientific papers.
As for the Bayesian view, being non-intuitive isn't the same as problematic. The Twin Paradox https://en.wikipedia.org/wiki/Twin_paradox violates our intuitive understanding of how time works, but that's because our intuition is wrong in some conditions. You thought when I said the coin had 50% chance of coming up heads that I was making a statement about the coin, but really I was making a subjective statement about how I would bet. If you believe it's 25% then there's a clear way for us to resolve our different beliefs.
I think the interesting part is not that the relative frequency might converge to something other than 50 % under non-ideal conditions, but that it might not converge at all, admittedly only very, very rarely. And this seems to force you into an infinite regress.
50 % probability for heads means that if you toss the coin infinitely often, the relative frequency will converge to 50 %. But not quite: in very rare cases it won't. So you have to toss a coin infinitely often an infinite number of times, and then you will see that all but a tiny fraction of the experiments indeed show the relative frequency converging to 50 %.
But now you have to quantify that this tiny fraction is something of probability zero. And even worse, it is still possible that none of your repeated experiments showed convergence; you seem to be right back where you started.
I would love to know what you are referring to with the inconsistent beliefs.
I would not say that the Bayesian view is non-intuitive, I would say it fails to account for important things. A priori probabilities have to be rooted somewhere. Where does your belief in a 50 % probability for heads come from? Because you have previously observed the relative frequencies of coin tosses? Because you made some theoretical observations about symmetries?
There must be, at least so it seems to me, something about the probabilities associated with coin tosses that is independent of any individual, otherwise it would become rather difficult to explain how different individuals arrive at similar probabilities independently of each other. So banishing probabilities to the realm of beliefs does not cut it, in my opinion.
> I would love to know what you are referring to with the inconsistent beliefs.
You could build frequentist models for the odds of 0 to 1 inches of rain falling in Cleveland tomorrow, 1 to 2 inches, and 0 to 2 inches, and the odds of 0 to 1 plus 1 to 2 don't need to add up to 0 to 2 inches. You can justify all 3 models, but they are inconsistent. Bayesian models don't allow that. I read about this in Aaron Brown's "Red-Blooded Risk". Here's an online source asserting "frequentists can have two different unbiased estimators under the same likelihood functions." https://chenghanyu.wordpress.com/2014/03/26/the-strengths-an...
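A concrete illustration of "two different unbiased estimators" (this example is mine, not from Brown's book or the linked post): for a symmetric distribution, both the sample mean and the sample median are unbiased estimators of the center, yet on any particular sample they generally disagree.

```python
import random
import statistics

# Two unbiased estimators of the center of a symmetric distribution:
# on average both equal the true center (0 here), but on any given
# sample they give different answers.
rng = random.Random(42)
sample = [rng.gauss(0.0, 1.0) for _ in range(101)]
print("sample mean:  ", statistics.mean(sample))
print("sample median:", statistics.median(sample))
```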
> So you have to toss a coin infinitely often an infinite number of times
This is a core piece of your confusion. Infinity isn't a number; you can't multiply it by another number and get a meaningful result. You are saying you could run an infinite number of tests on Monday and then run an infinite number of tests on Tuesday and so on, but the concept of infinity holds that you did as many trials on Monday as you did on Monday and Tuesday combined. (Inf * 2 = Inf).
Well, you can build a physical model of a coin, and all forces acting on it during toss/fall, and show that in an ideal scenario (perfect coin/perfect landing surface), there would be roughly half of the initial conditions leading to heads, and roughly half leading to tails.
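A heavily simplified toy version of that argument (my sketch; the model, the parameter ranges, and the names `omega` and `t` are all my assumptions): suppose the coin starts heads-up, spins at a constant rate omega, flies for time t, and lands heads if and only if it completes an even number of half-turns. Sweeping a grid of initial conditions, about half lead to heads.

```python
import math

# Toy deterministic coin: outcome depends only on total rotation omega * t.
# Real tosses involve bouncing, precession, air drag, etc. -- ignored here.
def lands_heads(omega, t):
    half_turns = math.floor(omega * t / math.pi)
    return half_turns % 2 == 0

def heads_fraction(n=200):
    """Fraction of a grid of initial conditions (omega, t) that lead to heads."""
    heads = 0
    for i in range(n):
        for j in range(n):
            omega = 200.0 + 200.0 * i / n   # spin rate in rad/s (assumed range)
            t = 0.4 + 0.2 * j / n           # flight time in s (assumed range)
            heads += lands_heads(omega, t)
    return heads / (n * n)

print(heads_fraction())  # close to 0.5: heads and tails basins alternate finely
```

The point of the toy model is only that the heads/tails basins interleave so finely that any smooth distribution over initial conditions puts about half its mass on each.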
If you want to go down this road, then we will have to switch to a nuclear decay based coin or something like that. In the case of a coin toss there never was any real randomness, as you say we were just ignorant of the initial conditions. Given a distribution over the possible initial conditions, we can determine the probabilities for heads and tails. The possibly huge number of degrees of freedom and deterministic chaos will of course make this an unpleasant exercise.
Can we base our belief about coin toss probability on the belief that about half of the initial conditions lead to heads and half to tails, without verifying it?
I hope to have convinced you with my previous answer that the mathematical definition of probability is key to resolving your paradox here. Mathematically (using the concept of infinity) and also physically (using the concept of large numbers) there is a simple and commonly accepted answer to your question.
Since that definition is very basic, it is hard to find a direct answer to your question.
You might find it interesting that in physics when we deal with large numbers that are not infinity it is usually good enough to know that a certain event is pretty damn unlikely (and it is essential to corroborate this with numbers as in "this happens once in 10^100 years"). In physics, we rarely have 100 % certainty. Logical consistency and a certainty that is large enough are usually key to success (i.e. growth of knowledge) in natural sciences.
I guess another way of stating the problem is this: say you toss a coin and get heads the first 1000 times. Now you are likely to believe that the coin is biased. But that cannot be proved. It may be that you have simply not tossed the coin enough times to perceive its fairness. Maybe after the 1000th toss you start to get enough tails such that after the millionth toss, it is not at all clear that there is any bias.
So practically we can make a judgement as to whether the coin is biased or not based on how many tosses we think are sufficient, but theoretically it is impossible to distinguish a biased coin from a fair one even if we toss to infinity.
Yes, that is exactly what I had in mind. Probabilities are weird in that they say what will happen but then still leave open the possibility that it will not happen at all. You toss a coin a billion times and you will get about 500 million heads. Well, or you don't, and there are exactly zero heads.
That one is easy. First let's clarify what exactly we mean by bigger. Let's say we mean larger in volume. Now we pick a procedure for determining the volume of animals, say we submerge them briefly in a large water tank and mark the resulting water level. The animal that leads to the highest water level is the bigger one. Any objections?
The same objections you have regarding the fair coin.
You say you submerge the animals in a tank of water. How does that work? You have this giant tank, put in the animal and one molecule of water? Just like a single coin toss, a single molecule of water will not tell you much.
You need hundreds of billions of water molecules?
Ok. And how do you know these will behave in the expected way?
Because you have a model in your head how molecules behave? Well, I have a model how fair coins behave.
Because you saw different sized objects result in different water levels before? I saw fair and unfair coins result in different head:tail distributions before.
But similar to your objection that the Bayesian view of probability requires money and betting to be defined, you've now defined the idea of size to be dependent on giant tanks of water.
That we're modeling the outcome of a coin toss as a sample space set containing at least two event elements, one of which we call "heads," and that we have a probability measure which assigns 0.5 to the subset consisting of only the "heads" event.
Yes, but what are the consequences of this definition? What does a probability of 50 % for heads imply about the coin? Or what is the difference from an unfair coin with a 40 % probability for heads and a 60 % probability for tails?
It almost sounds to me as if he is pushing into gambler's fallacy space. Like if a (fair) roulette wheel hasn't hit 7 in 200 spins, then 7 is somehow due.
No, his argument is exactly the opposite of the gambler's fallacy. If the gambler's fallacy were not a fallacy, then there would be no issue. But because we know it is a fallacy, the fact that we have 1000 heads so far tells us (in theory) nothing about the fairness of the coin. But in that case, what does a probability of 50% actually mean?
So after how many consecutive heads will you decide that the coin is biased? 10? 100? 1000? And what is that number based on?
In any case, if you choose any number X, then what you are really saying is that a fair coin cannot produce X heads in a row. In that case, consider this thought experiment: let's assume we know a coin is fair. Then, by your logic, as soon as we have X-1 heads in a row, we already know the outcome of the Xth toss: it should be a tail. But this is a contradiction, because if the coin is really fair, then the probability of a tail on any toss should always be 50%, irrespective of what has happened before.
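That thought experiment is easy to check by simulation (my sketch; X = 5 is my choice, purely to keep the conditioning event frequent enough to sample): among simulated fair-coin runs whose first X-1 tosses are all heads, the Xth toss still comes up heads about half the time.

```python
import random

# Condition on a streak of x-1 heads and look at the next toss: the streak
# carries no information, so the next toss is heads roughly half the time.
def next_toss_heads_rate(x=5, n_runs=200_000, seed=1):
    rng = random.Random(seed)
    streaks = next_heads = 0
    for _ in range(n_runs):
        tosses = [rng.random() < 0.5 for _ in range(x)]
        if all(tosses[:-1]):            # first x-1 tosses were all heads
            streaks += 1
            next_heads += tosses[-1]
    return next_heads / streaks

print(next_toss_heads_rate())  # close to 0.5
```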
Well, that is already in the setup, there is also a 100 % chance that the other side is tails. I don't think that observation will turn out to be very useful.
What does it mean that tossing a fair coin has a 50 % probability of showing heads?
If you think you know the answer, you are probably wrong.
EDIT: Instead of just voting this down, try to give an answer. If you think it is easy, you have not thought about it careful enough.