Bayesian models are not commonly used at the scale of modern LLMs because the inference techniques they need (like MCMC) are far more expensive than fitting a single point estimate, which limits how big and useful the model can be. This is a practical example of pragmatic frequentism working better in a real scenario.
> it is a counting problem
It isn’t a counting problem in the general case. If you use MLE and compute number of heads / total flips, that is the Frequentist approach. Of course, I deliberately picked the simplest random variable I could think of, so that the APPROACH could be differentiated.
The Bayesian approach starts with a prior. This is implicit in the Frequentist approach, but explicit in the Bayesian one. In this case, the equivalent prior is a uniform distribution between 0 and 1. The Bayesian approach then uses Bayes' theorem to update that uniform distribution based on the result of every flip.
Is the result the same? Yes, in the sense that the most probable value under the posterior (the MAP estimate) equals the MLE when the prior is uniform. These are different approaches, and both are valid. In this case the Frequentist approach gave a simpler solution because its implicit assumptions matched the ones we would make anyway and matched our intuition. However, if you believed that the prior distribution was non-uniform, then Bayes may become easier.
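As a rough sketch of the two approaches on the same flips (the flip data and the function names here are made up for illustration, not taken from anything above): the MLE is just heads / flips, while the Bayesian version starts from a Beta(1, 1) prior, which is the uniform distribution, and bumps one of two counts per flip. Under the uniform prior the posterior mode lands on the same number as the MLE; with an informative prior it does not.

```python
from fractions import Fraction


def mle(flips):
    """Frequentist point estimate: number of heads / total flips."""
    return Fraction(sum(flips), len(flips))


def bayes_update(flips, alpha=1, beta=1):
    """Update a Beta(alpha, beta) prior one flip at a time.

    Beta(1, 1) is the uniform prior on [0, 1].
    """
    for flip in flips:
        if flip:
            alpha += 1  # a head adds one to alpha
        else:
            beta += 1   # a tail adds one to beta
    return alpha, beta


def posterior_mode(alpha, beta):
    # Mode of Beta(alpha, beta); only valid when alpha > 1 and beta > 1.
    return Fraction(alpha - 1, alpha + beta - 2)


def posterior_mean(alpha, beta):
    return Fraction(alpha, alpha + beta)


# Hypothetical data: 1 = heads, 0 = tails (7 heads, 3 tails).
flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

a, b = bayes_update(flips)                     # uniform prior
print(mle(flips))                              # 7/10
print(posterior_mode(a, b))                    # 7/10, same as the MLE
print(posterior_mean(a, b))                    # 2/3

a2, b2 = bayes_update(flips, alpha=5, beta=5)  # assumed prior belief: roughly fair coin
print(posterior_mode(a2, b2))                  # 11/18, pulled toward 1/2
```

The posterior mean differs slightly from the MLE even under the uniform prior, so "the same result" depends on which summary of the posterior you report.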
> Bayes needs at least two scenarios
In the general case, Bayes needs a prior distribution (here, the probability of heads is uniform between 0 and 1, which is a Beta(1, 1) distribution). You then apply Bayes' rule conditioned on the data to get the “update” rule that produces the posterior, given the result of n coin flips.
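Written out, assuming independent flips with h heads out of n, the update looks roughly like this (the symbols \theta for the probability of heads, D for the observed flips, and \alpha, \beta for the prior parameters are my own notation):

```latex
\begin{aligned}
p(\theta) &\propto \theta^{\alpha-1}(1-\theta)^{\beta-1}
  && \text{prior: } \mathrm{Beta}(\alpha, \beta) \\
p(D \mid \theta) &= \theta^{h}(1-\theta)^{n-h}
  && \text{likelihood of } h \text{ heads in } n \text{ flips} \\
p(\theta \mid D) &\propto p(D \mid \theta)\, p(\theta)
  = \theta^{\alpha+h-1}(1-\theta)^{\beta+n-h-1}
  && \text{posterior: } \mathrm{Beta}(\alpha+h,\ \beta+n-h)
\end{aligned}
```

With the uniform prior, \alpha = \beta = 1, so the posterior is Beta(h + 1, n - h + 1), and its mode is h / n.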
Could you explain why you believe this?