There's no fundamental ontological difference between "give me your likelihood ratios between drawing cards [a and b] as the next draw from this deck of cards" and "give me your likelihood ratios between finding and not finding a continent in an unexplored part of the world". While what we ultimately care about is the territory, which is fixed, we only have our imperfect map of the world with which to model it. There is nothing inherently "probabilistic" about a shuffled deck of cards - the cards have a specific configuration; the uncertainty is only present in the observer(s). Predicting the next draw from a deck of cards is subject to "out of context" model violations the same way that any other prediction over future world states is.
The proposed escape hatch seems to be to dodge the question, but of course in reality you are not agnostic over all possible future events that haven't been ruled out by being "bad explanations". You can say, "well, don't make predictions with numbers on them", but by acting in the world you are making predictions with numbers on them all the time! Bayesian epistemology doesn't claim to be perfect, it claims to be the least inaccurate way of making those predictions and updating on new evidence. Now, there's a separate question of how bounded reasoners (i.e. humans) can best accomplish their goals, and "explicitly run Bayes to decide whether to get out of bed" probably isn't it. But superforecasters are an existence proof that explicit Bayesian calculations are still extremely useful even in highly uncertain domains.
Mechanistic explanations are great but even having a mechanistic explanation doesn't let you say P(x) = 1. There can be enormous differences in reasonable mitigations to take between P(x) = 0.99 and 0.999999, depending on context.
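To put toy numbers on that last point (the failure and mitigation costs below are invented purely for illustration):

    # Hypothetical costs, just to show how much the gap between
    # P(x) = 0.99 and P(x) = 0.999999 can matter for mitigation decisions.
    failure_cost = 1_000_000   # assumed cost if x turns out to be false
    mitigation_cost = 500      # assumed cost of taking the mitigation

    for p in (0.99, 0.999999):
        expected_loss = (1 - p) * failure_cost
        print(p, expected_loss, expected_loss > mitigation_cost)

    # 0.99     -> expected loss ~10,000 -> mitigation is worth it
    # 0.999999 -> expected loss ~1      -> mitigation is not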
> There's no fundamental ontological difference between "give me your likelihood ratios between drawing cards [a and b] as the next draw from this deck of cards" and "give me your likelihood ratios between finding and not finding a continent in an unexplored part of the world".
Can you explain this claim? It seems very clearly false to me, so perhaps we're thinking about things in very different terms here. To me, the extremely obvious and fundamental difference between the two is that we have a very good explanation (to use Deutsch's favorite term for emphasis, although that terminology isn't strictly important here) for why we know the probabilities of various events in a game of cards. Of course, our explanation of why we know those probabilities does depend on some assumptions of our model, namely some things about how the cards are shuffled and dealt, but the explanation doesn't really have anything to do with modeling subjective uncertainty of a particular observer. Sure, it is probably physically possible for some observer to have more certainty, like someone just outside the room with an extremely sensitive radar, but that observer would likely be understood to be violating the rules of the game (just like any card player who cheats using more feasible methods).
To me, that's fundamentally different from invoking probability when discussing things like the existence of a specific thing in a specific unexplored part of the world, again assuming we don't have some explanation for why the existence of that thing in that place actually has a probability associated with it.
> Suppose you have a penny and you are allowed to examine it carefully, convince yourself that it's an honest coin; i.e. accurately round, with head and tail, and a center of gravity where it ought to be. Then, you're asked to assign a probability that this coin will come up heads on the first toss. I'm sure you'll say 1/2.
> Now, suppose you are asked to assign a probability to the proposition that there was once life on Mars. Well, I don't know what your opinion is there but on the basis of all the things that I have read on the subject, I would again say about 1/2 for the probability. But, even though I have assigned the same "external" probabilities to them, I have a very different "internal" state of knowledge about those propositions.
> To see this, imagine the effect of getting new information. Suppose we tossed the coin 5 times and it comes up tails every time. You ask me what's my probability for heads on the next throw; I'll still say 1/2. But if you tell me one more fact about Mars, I'm ready to change my probability assignment completely. There is something which makes my state of belief very stable in the case of the penny, but very unstable in the case of Mars.
> Ultimately it's a question of how much each bit of evidence would cause you to update (that is, difference of degree, rather than kind).
Isn't this only because you have strong priors for the coin and weak ones for Mars life? Suppose you didn't have your example's information about the coin. Then the situations are the same, and a new piece of evidence in either case (e.g., discovering whether the coin is round, has two heads, or where its center of gravity is) would shift your belief just as much.
There is no such thing as a "strong" prior or a "weak" prior. A prior is just a number between 0 and 1.
But the coin example is really not very apt because there is potential information that will change your belief about the coin. For example, you might discover that the coin is magnetic, and there is an external field being generated somewhere in just the right way to make the coin land heads every time. Or something like that. Whatever the case, if you get (say) 100 heads in a row you should start to have some serious doubts about your initial analysis of the situation.
Priors are not just numbers between 0 and 1. They are probability distributions. If they were not, it would be impossible to update beliefs based on evidence.
My prior beliefs about the coin include the possibility that the coin has a bias other than 50%. This distribution may be centered at 50%, but I may assign more or less probability to extreme biases, and this determines how I update my belief when I observe several heads in a row.
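For what it's worth, here's a minimal sketch of that in code, using a conjugate Beta prior on the coin's bias (the particular Beta parameters are assumptions chosen just to illustrate "strong" vs. "weak"):

    def posterior_mean(a, b, heads, tails):
        # Beta(a, b) prior on the heads-probability; conjugate Beta-Binomial update.
        return (a + heads) / (a + b + heads + tails)

    # "Coin"-like prior: tightly concentrated around 0.5, e.g. Beta(500, 500).
    # "Mars"-like prior: weakly informed, e.g. Beta(1, 1), i.e. uniform.
    heads, tails = 5, 0  # five heads in a row, as in the example above

    print(posterior_mean(500, 500, heads, tails))  # ~0.502 -- barely moves
    print(posterior_mean(1, 1, heads, tails))      # ~0.857 -- moves a lot

Same evidence, very different updates, purely because of how much probability the prior put on extreme biases.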
Of course it is possible to update beliefs based on evidence if priors are numbers. If I flip a coin but don't show you the outcome, you will rationally assign a prior of 0.5 to the proposition "The coin landed heads-up". If I then show you that the coin did indeed land heads-up (or tails-up) you will update your assessment of the probability of "The coin landed heads up" to something close to 1 (or 0).
Now, it is possible to do Bayesian analysis on probability distributions rather than "bare" priors, but that is just a way of assigning priors to a family of propositions rather than single propositions. For example, "This is a fair coin" is not a proposition to which one can assign a prior because it isn't a well-defined proposition. The intuition behind "fair coin" is something like "the ratio of heads to tails will approach 1 as the number of flips goes to infinity". But no coin can be flipped an infinite number of times, so this is a meaningless definition. Furthermore, consider a coin that always landed on the face opposite that of the previous flip. The ratio of heads to tails for such a bizarre coin would indeed approach 1 as the number of flips approached infinity, but that is not what is generally meant by a "fair coin". So the usual way of dealing with coins is assigning priors to the family of propositions, "If I flip this coin N times the probability that I will get M heads is X" for all possible values of N and M. But each of the propositions in that family gets a prior which is a number.
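Concretely, each member of that family gets a plain number; for a coin you believe to be fair the assignment is just the binomial formula (the p = 0.5 below is the assumption doing all the work):

    from math import comb

    def p_m_heads_in_n_flips(m, n, p=0.5):
        # Prior assigned to the proposition "I get M heads in N flips".
        return comb(n, m) * p**m * (1 - p)**(n - m)

    print(p_m_heads_in_n_flips(5, 10))   # ~0.246
    print(p_m_heads_in_n_flips(10, 10))  # ~0.000977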
But priors are not *just* numbers between 0 and 1, as GP claimed.
It is true that in your example, the outcome is binary: there is only a single proposition (the coin landed heads). And so the prior distribution is a Bernoulli distribution, but I would agree you could think of this prior as a single number between 0 and 1.
But in such a scenario, the only possibilities for updating beliefs based on evidence is to observe heads, then update the posterior to P(Heads)=100%, or observe tails, then update posterior to P(Heads)=0%. I might say that this is updating your beliefs based on evidence, but only trivially so.
If you observe heads and form a posterior of P(Heads)=99.999%, then it means that your prior probability distribution was more complex: it included the possibility that "the coin landed heads but I mistakenly observed tails". In this case, your priors include a family of propositions, not a single number.
In GP's example, the prior included the possibility that the coin was magnetic, etc. So it could not be represented by a single number.
> the prior distribution is a Bernoulli distribution
No, it isn't. The prior on the outcome of multiple tosses is a Bernoulli distribution, but the prior on the outcome of a single toss is a number. And for a fair coin, that number is 0.5 before you see the result, and it is either 0 or 1 after.
Note that "This coin will land/landed heads" is a different proposition than "this coin is a fair coin." The latter is much more problematic because it is hard to define what a "fair coin" is.
> your prior probability distribution was more complex
No. You are confusing my prior on a single proposition (a number) with the set of priors on all propositions which I have considered in my lifetime. The latter is obviously a set of numbers, one for each proposition.
> "the coin landed heads but I mistakenly observed tails"
Yes, that is two separate propositions:
1. The coin landed heads.
2. My senses are reliable.
All evidence must be assessed relative to some "background model", a set of propositions whose priors are very nearly 1. And it is also possible to model an infinite family of propositions about continuous properties, e.g. The length of that rod is between X and Y for any two real numbers X and Y. But none of this changes the fact that for any single proposition the prior is a number.
> it could not be represented by a single number
That depends on what you mean by "it". The probability that the coin landed heads certainly can be represented by a single number, as can the probability that the coin is magnetic. Two different propositions, hence two different priors.
That framing actually stops holding for continuous distributions. Like in a distribution on heights of people there isn't a probability or prior associated with a particular height. Probabilities are assigned to subsets of the outcome space (the measurable sets of a sigma-algebra). Priors stands for prior distribution and it really is best to see Bayesian analysis as a machine that consumes probability distributions as input and spits out probability distributions as output.
> That framing actually stops holding for continuous distributions.
No, it doesn't. A continuous distribution is just a way of assigning probabilities to a parameterized set of propositions where the parameters are continuous and so the set is infinite. But for any given proposition in that set the prior is a number.
> Priors stands for prior distribution
No. Bayes's theorem is:
P(H|E) = P(E|H) * P(H) / P(E)
The "prior" is P(H) which is a probability, i.e. a number between 0 and 1, as are all the other "P" values. The "*" and "/" operations in Bayes's theorems are scalar multiplication and division.
> No, it doesn't. A continuous distribution is just a way of assigning probabilities to a parameterized set of propositions where the parameters are continuous and so the set is infinite. But for any given proposition in that set the prior is a number.
A continuous distribution does not assign probabilities to each proposition but subsets of them. To see this concretely consider the continuous Uniform distribution from 1 to 1.5. It will for all values of its support have a probability density of 2. Most people would not consider 2 a probability.
For continuous distributions, Bayes's theorem becomes about probability densities and not probabilities.
> A continuous distribution does not assign probabilities to each proposition but subsets of them.
Nope. It assigns probabilities to individual propositions.
> Most people would not consider 2 a probability.
That's true, but in your example 2 is not a probability but a probability density, and a probability density is not the same as a probability. To get a probability out of a probability density you have to integrate. For each possible interval over which you could integrate there is a corresponding proposition whose probability of being true is exactly the value of the integral.
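A quick numerical sketch of that distinction, reusing the Uniform(1, 1.5) example from upthread (the integration is deliberately crude):

    def uniform_density(x, lo=1.0, hi=1.5):
        return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

    def prob_between(a, b, n=100_000):
        # Crude midpoint-rule integration of the density over [a, b].
        width = (b - a) / n
        return sum(uniform_density(a + (i + 0.5) * width) for i in range(n)) * width

    print(uniform_density(1.2))      # 2.0 -- a density, not a probability
    print(prob_between(1.0, 1.25))   # ~0.5 -- P("the value lies in [1, 1.25]")
    print(prob_between(1.2, 1.2001)) # ~0.0002 -- a near-point interval has tiny probability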
No, propositions are simply statements for which it is meaningful to assign a truth value. It might be possible to come up with a proposition that is associated with a non-measurable set, though I can't offhand think of an example. But remember: Bayesian probabilities are models of belief. Priors are personal. And so in order to assign a prior to a proposition, the statement of the proposition must have some referent in your personal ontology. You can't hold a belief about the truth value of a statement unless you know (or at least think you know) what that statement means. So unless a person has something in their ontology that corresponds to a non-measurable set (and I suspect most people don't) then that person cannot assign a Bayesian prior to a statement associated with a non-measurable set. But that's a limitation of that particular individual, not a limitation of Bayesian reasoning. For example: you cannot assign a Bayesian prior to the statement, "The frobnostication of any integer is even" because you don't know what a frobnostication is.
(Note that there are all kinds of ways that statements can fail to be propositions. For example, you can't assign a Bayesian prior to the statement: "All even integers are green" despite the fact that you know what all the words mean.)
But ultimately, it should take more surprising evidence to shake up your beliefs about the coin than it would to shake up your beliefs about Mars, right?
So, the prior which is strong/weak is the probability distribution of probability distributions. There is a <<1% chance that I will ever believe there is a >90% chance of flipping heads. But there is a much greater probability that I will one day believe there is a >90% chance of life on Mars.
What terminology would you use to describe that, if not strong/weak priors?
> it should take more surprising evidence to shake up your beliefs about the coin than it would to shake up your beliefs about Mars, right?
I would say it's the exact opposite. Encountering an unfair coin would not surprise me a bit because I know, for example, that double-headed coins exist. But encountering life on Mars would surprise me a lot because I've accumulated a lot of evidence already that there isn't any. (There might have been in the past, but there isn't any now.)
The only thing that I consider "surprising" is encountering evidence that moves one of my priors from being close to 0 to being close to 1 or vice versa. That happens less and less as time goes by because (I hypothesize) my beliefs converge to reality, and reality is stable in the sense that there is something out there towards which it is possible to converge. But at no stage in this process are my beliefs about propositions modeled by anything other than numbers between 0 and 1, with 0 meaning, "I am absolutely certain this proposition is false", 1 meaning, "I am absolutely certain this proposition is true", and 0.5 meaning, "I have absolutely no idea whether or not this proposition is true."
How does this work for the reasoning you just gave? Was there some evidence that moved your priors about Bayesian reasoning from near 0 to close to 1?
Apply that to any philosophical or mathematical reasoning. Let's take an interpretation of QM. How would you assign a probability to Many Worlds Interpretation? Do you have some way of assigning your degree of belief between 0 and 1? Or are there just arguments you find more or less convincing for it?
There was a thread on here the other day about black hole information paradox and how we may never be able to test any mathematical reasoning regarding what happens to the information inside a black hole. Can you assign a Bayesian probability to that?
> Was there some evidence that moved your priors about Bayesian reasoning from near 0 to close to 1?
I think you misinterpreted what I said. My belief in Bayesian reasoning was very nearly 1 from the first time I encountered Bayes's theorem. Nothing I've seen since has changed this, i.e. nothing about Bayes's theorem has surprised me.
The last time I can remember being surprised (in a Bayesian sense) about anything was when Donald Trump won the 2016 election. My prior on his winning was very close to zero when he announced his candidacy (and frankly I think his was too), and obviously my posterior after the election was very close to 1. (Contrast this with the reasoning currently being deployed by proponents of the Big Lie.) That event really rocked my world, but I can't think of anything that has surprised me in the same way since then.
> How would you assign a probability to Many Worlds Interpretation?
The same way you assign a probability to any proposition: you look at the evidence and decide whether the MWI is a good explanation that accounts for all the evidence. I did a very deep dive into this exact question about three years ago:
It depends on what you mean by "that". You can assign a Bayesian probability to any proposition, but you have to be careful because not all statements are propositions. For example: "Penguins are beautiful" is not a proposition. It makes no sense to say that it is "true" or "false" because it is ambiguous (do you mean all penguins or just some penguins?) and it involves a value judgement.
The MWI is particularly thorny because it is really easy to hide ambiguity and value judgements in seemingly innocuous language. For example, you will often hear proponents of the MWI say something like, "The universe splits every time a quantum measurement is made." But that is ambiguous because it doesn't tell you what a quantum measurement is. It also has a tacit "value judgement" hidden in the phrase "the universe". The whole point of the MWI is that there is not one universe, there are unfathomably many of them. So which universe is "the" universe? Well, to me, "the" universe is the universe that "I" live in. But what does the word "I" mean in the context of the MWI? Because according to the MWI, every time the universe splits, I split along with it and become multiple me's in multiple universes. So which "me" is the real me?
The MWI is very strongly analogous to the beauty of penguins.
> I think you misinterpreted what I said. My belief in Bayesian reasoning was very nearly 1 from the first time I encountered Bayes's theorem. Nothing I've seen since has changed this, i.e. nothing about Bayes's theorem has surprised me.
My question was whether you used Bayesian reasoning to accept Bayesian reasoning as your epistemology. And I don't think you did. Instead, I think you were given a persuasive argument for why it's a good way to think about evaluating claims or making predictions, and then you started to think that way.
MWI is an example where there isn't a good way to assign a probability, because so far it's not testable. Instead, people like Sean Carroll are convinced by the mathematics that it's true, or they're skeptical for some reason like it violating Occam's razor or the measurement problem you pointed out. At any rate, it isn't because there is some probability attached.
IOW, Bayesian reasoning isn't the only reason we believe things.
But both of those cases are concerning the “Bayesian epistemology” side of things, not using probability to reason about how to bet in games of chance with well-established rules and mechanics. The usage of probability Deutsch is approving of is the latter, and is unrelated to a particular observer’s level of confidence that the cards are designed fairly or that the dealer and other players are playing fairly.
> suppose you are asked to assign a probability to the proposition that there was once life on Mars
As soon as you accept this question as meaningful, you have already come out as a Bayesian. Deutsch would simply say (if I understand him right) that this question is not a matter of probabilities, whereas the coin question is.
Suppose you were a bookie and somebody wanted to make a million dollar bet on it. You'd love to take the money, but obviously you need a line. How do you pick it?
A bookie would not care at all what the real likelihood of life on Mars would be. A bookie would care about setting the line that would attract approximately equal bets on both sides, since that way they guarantee that they make money. So they would do whatever sociological research would be required to estimate that and set the line accordingly, and probably not do an ounce of research on Mars itself.
This assumes that there is a chance to create a matched book. This is usually possible for something "liquid" like a big league sporting event, but not always for poorly-followed sports or weird prop bets on low-interest games. Even for liquid games, the book is not always perfectly matched and they may end up with some actual exposure to the outcome of the event.
In any case, in this scenario it's just one guy looking to make a bet. There is definitely SOME line, possibly million-to-one or higher, at which a smart book (or any rational actor) should be willing to take the bet.
> Predicting the next draw from a deck of cards is subject to "out of context" model violations the same way that any other prediction over future world states is.
The observation that the uncertainty is only present in the observer is certainly true in the card context, but really questionable in the world states one. For one, don't we need to assume a really aggressive "deterministic evolution of future world states" to argue this uncertainty is only present in the observer? I, for one, am totally uncomfortable with this assumption...
Also, there's a clear computational difference between these two settings. You kind of point this out by acknowledging that explicit Bayesian calculations are unreasonable in many settings - but in practice, I'm on a rationalist email thread where folks are trying to calculate explicit probabilities about the increased likelihood of nuclear war over the past 6 months. It's totally silly.
We need to look at the actual way that Bayesian tools are actually used (and usable in practice) by its adherents. As far as I've observed, it's mostly just silly signaling games where people make up numbers to justify whatever story they want to tell (given that the rationalist communities spend so much time worrying about confirmation bias as a fundamental one, I'm not sure this is even surprising).
Also, superforecasters are most certainly not "proof that explicit Bayesian calculations are still extremely useful." There are literally _millions_ of experts who make predictions - there's no version of history in which there isn't some random subset that performs dramatically better than average just due to statistical chance! Misunderstanding this as proof of the usefulness of the probabilistic reasoning tool is the classic example of being fooled by randomness.
>a really aggressive "deterministic evolution of future world states"
What would it be if not deterministic? Chaotic, sure, but that just means things are very sensitive to initial conditions. Unless we extend those initial conditions to include quantum uncertainty, things are pretty much deterministic.
Of course that idea goes out the window depending on your opinions on free will.
Even if it's entirely deterministic above the quantum/micro level, that doesn't mean we can predict anything without computing the entire future world, just like in the game of life. Also, the wavefunction is deterministic, so you could in principle compute the many worlds, but that wouldn't tell us which one we will observe.
> The observation that the uncertainty is only present in the observer is certainly true in the card context, but really questionable in the world states one. For one, don't we need to assume a really aggressive "deterministic evolution of future world states" to argue this uncertainty is only present in the observer? I, for one, am totally uncomfortable with this assumption...
My objection is to Deutsch's special pleading with regards to games of chance, or whatever else he thinks is "based on a physical understanding of the situation where a randomising process had approximated probabilities". The difference in irreducible uncertainty between "shuffled cards" and "anything else" is a matter of degree, not kind.
> Also, there's a clear computational difference between these two settings. You kind of point this out by acknowledging that explicit Bayesian calculations are unreasonable in many settings - but in practice, I'm on a rationalist email thread where folks are trying to calculate explicit probabilities about the increased likelihood of nuclear war over the past 6 months. It's totally silly.
Is it? Rationalists did better than pretty much any other set of people that could usefully be regarded as a "community" when it came to COVID, in terms of seeing it coming and dealing with it successfully. Do you think they did this by gut feeling? (Sure, some of them did, but much of that gut feeling was informed by explicitly probabilistic reasoning performed by others.)
I'd love to have a better toolset for deciding when the level of risk involved in staying in [random major city] crosses a threshold that justifies moving. Right now we have mechanistic reasoning (what effects do nuclear explosions have) and reference class forecasting (how likely is Putin to do [some unusual thing]), both of which spit out credences with respect to the questions we care about (how likely am I to die if I stay here). The point isn't to follow the numbers off a cliff, but if you're well-calibrated (as in, have a track record with a decent Brier score, or something), then pretending those numbers are totally useless is a bit silly. The question is not "are these numbers wrong" - yes, of course they are! The question is what alternative you're proposing, and whether it's any better. If you aren't well-calibrated and don't have experience using explicit Bayesian reasoning to make major life decisions like "I should leave this city because I think the risk of it getting bombed in a nuclear war has crossed some threshold", then I don't suggest you start now. But you could do worse than looking at people who do have such track records. Well, you'd be able to do that in a world where we cared about keeping track of that sort of thing. Too bad people keep finding reasons not to do that, huh?
> We need to look at the actual way that Bayesian tools are actually used (and usable in practice) by its adherents. As far as I've observed, it's mostly just silly signaling games where people make up numbers to justify whatever story they want to tell (given that the rationalist communities spend so much time worrying about confirmation bias as a fundamental one, I'm not sure this is even surprising).
Surely you aren't telling me that you've updated your priors on observed evidence, and as a result have a different expectation about future world states (that is, "how useful would explicit Bayesian reasoning be if I tried using it")?
> Also, superforecasters are most certainly not "proof that explicit Bayesian calculations are still extremely useful." There are literally _millions_ of experts who make predictions - there's no version of history in which there isn't some random subset that performs dramatically better than average just due to statistical chance! Misunderstanding this as proof of the usefulness of the probabilistic reasoning tool is the classic example of being fooled by randomness.
Superforecasters consistently perform much better than chance (about as well as domain experts who aren't superforecasters, for difficult questions, in fact). Their level of performance is not compatible with the "the small number of people out of millions for whom the coin flip came up heads 10x in a row" hypothesis - if you had conducted an experiment to distinguish between the hypotheses "Superforecasters will perform no better than chance" and "Superforecasters will perform [x] better than chance", the posterior likelihood ratio in favor of the second would be truly absurd, even if you were pretty far off on where exactly in the distribution they'd be.
Thanks for the points and taking the time to write up so much! I am enjoying this interaction, even if we don't fully agree :}
> The difference in irreducible uncertainty between "shuffled cards" and "anything else" is a matter of degree, not kind.
Difference in degree or difference in kind is not the important point to sort out. The important question is what computation methods actually work in practice for humans. Some algorithms that work well on small datasets fail to terminate within the lifetime of the universe on large datasets.
If you've ever operated in a particularly uncertain environment (like, as a first-time startup founder), it immediately becomes clear how totally useless explicit probability calculations are when making decisions. I was a self-proclaimed rationalist and tried reasonably hard to be a good Bayesian; I ended up deciding I might as well sacrifice a goat and read its entrails.
> The point isn't to follow the numbers off a cliff, but if you're well-calibrated (as in, have a track record with a decent Brier score, or something), then pretending those numbers are totally useless is a bit silly.
I'm actually trying to make a stronger point than that these numbers are "totally useless": I think in practice many applications of explicit probability calculation are actually quite harmful.
Rationalists love to talk about map and territory as the two items to be concerned about, but there's also another thing called "the agent's belief in the effectiveness of their map." Whatever models/explicit math/probability gain you in improved mapping, they take away by inflating your belief in the effectiveness of that map.
If you look at large-scale model failure in practice, it's almost always because someone thought the math they were doing captured the full system in a complete way (and then it didn't). 2008 is a fantastic example of this.
> Well, you'd be able to do that in a world where we cared about keeping track of that sort of thing. Too bad people keep finding reasons not to do that, huh?
I'd love to start tracking predictions generally! I agree with you here!
> Surely you aren't telling me that you've updated your priors on observed evidence, and as a result have a different expectation about future world states (that is, "how useful would explicit Bayesian reasoning be if I tried using it")?
To be clear: it's the explicit probability calculation and mathification that I take my major issue with. I am most definitely not against learning from what I observe :-)
> Their level of performance is not compatible with the "the small number of people out of millions for whom the coin flip came up heads 10x in a row" hypothesis
Can you link this math? I'd love to see it - not flippant, genuinely looking to check it out and be educated here!
Yeah, to clarify, my original objections were generated by Deutsch's dismissals of Bayesian epistemology on grounds of fundamental invalidity, rather than practicality. I agree that practicality is often a serious concern! I've also attempted a startup and not once did I ever run an explicit Bayesian calculation.
Naive applications of many powerful techniques often turn out to be actively harmful, unfortunately.
But without explicit probability calculation and mathification, we can't actually track our predictions (often of hugely uncertain events) over time, and if you can't track your predictions over time you'll have a hard time improving.
I don't know if anyone's written up the math, but it should be pretty intuitive to get a sense of the differential. Consider that you need 20 bits of information to distinguish one thing from approximately a million. (Isomorphic: the probability of a coin coming up heads 20 times in a row is 1/2^20.) Now, of course, superforecasters aren't perfect predictors, but they're also predicting things much harder than coin flips - there's no such thing as "playing it safe" with Brier scores. Matching or outperforming domain experts across a wide variety of domains, over a long period of time, involving many questions (hundreds if not thousands) is not really the sort of thing that happens by coincidence.
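I don't have a published writeup to hand either, but the back-of-the-envelope version is easy to sketch (the question count and hit rate below are placeholder assumptions, not real superforecaster data):

    from math import comb

    def binom_tail(n, k, p=0.5):
        # P(at least k correct out of n) for a pure 50/50 guesser.
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    print(0.5 ** 20)             # ~9.5e-7 -- the "20 bits" point above
    print(binom_tail(300, 195))  # ~1e-7 -- chance a coin-flipper gets 65% of 300 binary questions right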
> fundamental invalidity, rather than practicality
I'm not sure there's any difference between these in practice - same with the difference being degree or kind. At the end of the day, the question is: what can I actually implement in my own brain to make good decisions.
I think the biggest place we might disagree here is about _what the goals_ are of prediction in the first place. "if you can't track your predictions over time you'll have a hard time improving" might be true, but for me, the terminal goal is _not_ to make increasingly accurate predictions.
My goal is to operate as effectively as possible. We both agree that explicit Bayesian calculations aren't useful in a startup world. We also both agree that Bayesian techniques are very easy to footgun with in the wild. For me, both of these point pretty squarely in the direction called: we need non-Bayesian decision making tools.
To paint the shape of what I think these decision making tools should look like:
1. They avoid mechanistic explanations.
2. They are explicitly not mathy.
3. They apply simple and robust heuristics to generate lots of options.
4. They exist in a purposely constructed iterative environment so that you get lots of attempts.
These decision making tools are built around a) acknowledging that we cannot mathematically reason about the uncertainty we're dealing with in any legit capacity (without just foot-gunning), and that b) spending time on _decisions_ vs. on _execution_ is silly in most contexts, as execution is where you actually learn things (and thus you should make lots of quick and dirty decisions), and c) more good options are always valuable!
I can talk more about what this looks like in practice, as it's sort of the shape of how my cofounders and I run our startup. It's a WIP of course, but in practice we find the iterative, quick and dirty heuristic approach to lead to much faster and more robust growth than long-term predictions (which we used to do a lot of).
Also, do you have links to work that has informed your thinking on superforecasters? Or links to the specific set of superforecasters you're talking about? It seems you're thinking about a specific set of people, and I'd love to learn more about em!
> by acting in the world you are making predictions with numbers on them all the time!
This was a key insight for me several years ago. Pretty much any time you make a choice, you are implicitly making a probability calculation, something along the lines of "which of my options has the highest expected value/probability of success?" You may not be putting specific numbers on things, but you are necessarily making an inference about which of several numbers is largest. So the question is not whether you should be making predictions based on probability (you don't have a choice) but instead how you should be doing so.
> Pretty much any time you make a choice, you are implicitly making a probability calculation, something along the lines of "which of my options has the highest expected value/probability of success?"
That's not quite right. The calculation is more along the lines of "which of my options has the highest risk-adjusted reward?" A 10% chance of getting $1,000 is more attractive than a 50% chance of getting $100.
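To put rough numbers on "risk-adjusted", here's a toy comparison using log utility as a stand-in for risk aversion (the wealth level and the utility function are my own assumptions for illustration, nothing established):

    from math import log

    def expected_utility(p, prize, wealth=1_000):
        # Log utility: a simple model of diminishing returns to money.
        return p * log(wealth + prize) + (1 - p) * log(wealth)

    print(0.10 * 1_000, 0.50 * 100)        # raw expected values: 100 vs 50
    print(expected_utility(0.10, 1_000))   # which of these two is larger depends
    print(expected_utility(0.50, 100))     # on the assumed wealth (try wealth=10)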
It should be noted though that there are other ways to model the decision, such as regret minimization and its variants.
It depends on the choice being made. If you're considering one of several routes on the way to work, you don't care about the expected arrival time, you only care about the probability of arriving before a certain time.
In any case, the calculation isn't the point. My point is that in any non-trivial choice, you are necessarily making such a calculation, even if only approximately.
>There's no fundamental ontological difference between "give me your likelihood ratios between drawing cards [a and b] as the next draw from this deck of cards" and "give me your likelihood ratios between finding and not finding a continent in an unexplored part of the world".
These are not the same: my observables being independent of each other is not the same thing as my not knowing about a correlated observable. In a practical sense, there are lots of ways that I could reason that there is a continent in the unexplored bit, and obviously no way to know which card will come up in a shuffle.
My point is that various observations appear "independent" (i.e. uncorrelated) only because of uncertainty in our map. (You could point to quantum randomness but it doesn't seem likely that this matters for most of the questions we're trying to answer, but even if it did there'd be no problem modeling it with Bayes.)
It's possible I'm misunderstanding your objection, though.
>I could reason that there is a continent in the unexplored bit, and obviously no way to know which card will come up in a shuffle.
You can reason the continent exists, but you can't know. You're using knowledge you already have. Similarly, knowledge accumulates as you run through a deck, which can be used to reason that a specific card will come up next. I'm not sure I see a difference.
What a rich essay. Not sure why Bayes features in the title (and opens a distracting garden path into discussions of statistics), as the actual discussion is deeper, regarding the use of knowledge of the past to construct "knowledge" of the future, and how these are really two different epistemologies.

Rather oddly I found it connected a lot with another post from today, "Why Pessimism Sounds Smart".

Deutsch says: "In the 20th century, there was a sort of congealing of the intellectual climate into a very rigid pessimism, so that a prediction or prophecy could only be taken seriously if it was negative."

If he unpacks this, it's in earlier statements about the faults of induction and the lack of wonder, faith, possibility and sheer mischief that a scientific method occludes if it admits strong theories and builds a probability cone of worlds that extrapolate from them.

I guess, as I was trying to say in the other thread, that what Deutsch calls the "epitome of the wrong theory" is that we've used technology to create a set of distorting optics that stop us seeing optimistic interpretations all around us. How do we get out of that parochialism of the present? Embrace radical fallibilism and entertain the idea that against all probability you, I and all the smartest people we know might be hopelessly wrong - and that that's a good thing?
He doesn't actually argue against the Bayesian view of statistics:
> The word ‘Bayesianism’ is used for a variety of things, a whole spectrum of things at one end of which I have no quarrel with whatsoever and at the other end of which I think is just plain inductivism. So at the good end, Bayesianism is just a word for using conditional probabilities correctly. So if you find that your milkman was born in the same small village as you, and you are wondering what kind of a coincidence that is, and so on, you've got to look at the conditional probabilities, rather than the absolute probabilities. So there isn't just one chance in so many million, but there's a smaller chance.
His problem seems to be the extension of it to epistemology:
> At the other end of the spectrum, a thing which is called Bayesianism is what I prefer to call ‘Bayesian epistemology’, because it's the epistemology that's wrong, not Bayes’ theorem. Bayes’ theorem is true enough. But Bayesian epistemology is just the name of a mistake. It's a species of inductivism and currently the most popular species. But the idea of Bayesian epistemology is that, first of all, it completely swallows the justified true belief theory of knowledge.
His problem with inductivism is that when you follow it you don't try to make theories more believable by getting rid of ones that don't fit, but by confirming instances in which your theory does fit:
> It's inductivism with a particular measure of how strongly you believe a theory and with a particular kind of framework for how you justify theories: you justify theories by finding confirming instances. So that is a mistake because if theories had probabilities – which they don’t – then the probability of a theory (‘probability’ or ‘credence’, in this philosophy they’re identical, they’re synonymous)... if you find a confirming instance, the reason your credence goes up is because some of the theories that you that were previously consistent with the evidence are now ruled out.
> And so there's a deductive part of the theory whose credence goes up. But the instances never imply the theory. So you want to ask: “The part of the theory that's not implied logically by the evidence – why does our credence for that go up?” Well, unfortunately it goes down. And that's the thing that Popper and Miller proved in the 1980s. A colleague and I have been trying to write a paper about this for several years to explain why this is so in more understandable terms.
He cites this paper as an example of a proof, but claims it isn't very approachable (which is why he is working on one with more understandable terms).
Yeah. Sadly, the standard understanding of what "Bayesian" means among philosophers is very different from its meaning among statisticians. I've written on the subject for both audiences and it's been a real pain. And it's not just disagreement about what probabilities are (although I suspect that was the root cause historically). It's each discipline ignoring all sorts of aspects of the literature of the other discipline. Oh well. Life is suffering.
Yeah, and Hume's problem of induction is that we don't have solid reasons for expecting the future to be like the past, because causality is not observed, but rather is a habit of thought. We think the sun will shine tomorrow because of the laws of physics regarding nuclear fusion, but we can't prove they will be the same tomorrow morning. We just assume it. That goes for all science. Which is why Kant was motivated to try and save science from Hume.
This isn't even a case where we can assign some probability. If the laws of physics just changed on us, we have no priors with which to calculate that. We can argue that they won't, but we have no proof of that. The laws might just be contingent (A always follows B for no reason until it no longer does for no reason, because nothing causes it to follow or not follow, because there is no causality and laws are nothing but constant conjunction up to this point in time).
Bayesian probabilities are different from frequentist probabilities. An abridged summary:
Frequentist probability is about the limit that something approaches after many repetitions, e.g. rolling a die. It works well in a casino and some quantum physics applications but mostly fails outside of that, e.g. if we want to know the probability of a team winning the next Super Bowl, or the probability of life on Mars. A frequentist would say the question doesn't make sense: there's either life on Mars or there isn't, and there's going to be only one result of the next Super Bowl.
Bayesian probability is about degree of belief. It is about how to update the degree of belief based on prior knowledge and observations. e.g. If we thought there was life on Mars, we should see some evidence of it; we don't see evidence, so we should adjust our prior belief in the probability of life on Mars downward. It answers the questions we want to ask, but those answers frequently depend on what priors you choose, and those priors can sometimes dominate the analysis.
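A minimal sketch of that last point about priors dominating, using the odds form of Bayes's theorem (the likelihood ratio and priors are invented for the illustration):

    def update(prior, likelihood_ratio):
        # Odds-form Bayes: posterior odds = prior odds * likelihood ratio.
        odds = prior / (1 - prior) * likelihood_ratio
        return odds / (1 + odds)

    lr_null_search = 0.5  # assume each failed search for evidence halves the odds

    for prior in (0.5, 0.01):
        p = prior
        for _ in range(3):  # three searches that turn up nothing
            p = update(p, lr_null_search)
        print(prior, "->", round(p, 4))

    # 0.5  -> 0.1111
    # 0.01 -> 0.0013  (same evidence, very different conclusions)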
I'm having trouble figuring out a non-absurd interpretation of that paper.
For example, their Equation-7:
> p(h←e, e) = p((h←e)e, e) = p(he, e) = p(h, e)
, which looks like they're saying that, when there's evidence "e", the probability that a hypothesis "h" is true is equal to the probability that "e" proved it.
For example, say we consider the hypothesis, "h", that there aren't 10-armed spider-monkeys that like jazz-music currently on Earth. Then based on that hypothesis, we make the prediction that we won't see a 10-armed spider-monkey listening to jazz-music in the next room. Then, let's say we check that room, and there's not such a 10-armed spider-monkey listening to jazz-music, such that there's "e".
Did our test prove the hypothesis? Wouldn't seem like it.. I mean, even if there were 10-armed spider-monkeys, it'd seem like they could just be in places other than in the other-room. So, "p(h←e, e)" would seem pretty close to zero.
However, it still seems like the hypothesis that such spider-monkeys don't currently exist on Earth would seem fairly probable. So, "p(h, e)" would seem fairly close to one.
So, "p(h←e, e) = p(h, e)" wouldn't seem to hold, even approximately.
That said, the author didn't clearly specify exactly what they meant, so maybe they meant something else? But I'm not seeing an obvious, non-absurd interpretation of their claims.
The way I read it, the comma "," means "given", so what they write p(a, b) would be written as the conditional probability p(a|b) in contemporary CS/math literature.
h <- e means (h OR (NOT E)), which is the usual meaning of implication (either the consequent is true or the antecedent is false)
So p(h <- e GIVEN e) = p((h OR (NOT e)) GIVEN e) = p(h GIVEN e) since NOT e is false given that e is true.
, and then if we take the condition of "b" as assumed for brevity,
1 = p(h←e) + p(e) - p(he)
, then it appears that the conditions of "h←e" and "e" cover all possibilities, plus an excess overlap of "he".
So, "h←e" refers to NOT(e) plus AND(h,e).
So, "h←e" equals OR(NOT(e), AND(h,e)).
So, the evidence "e" implies the hypothesis "h" when both are true, plus also when evidence "e" is false.
---
So, "Theorem 1" claims
p(h←e, e) < p(h←e)
, which we can now parse given the above to
OR(NOT(e), AND(h,e)) when e < OR(NOT(e), AND(h,e))
, and we can reduce the left-hand side to find
h when e < OR(NOT(e), AND(h,e))
h when e < NOT(e) + AND(h,e)
h when e < NOT(e) + (h when e) * e
0 < NOT(e) + (h when e) * e - (h when e)
0 < NOT(e) + (h when e) * (e - 1)
0 < (1 - e) + (h when e) * (e - 1)
e - 1 < (h when e) * (e - 1)
1 - e > (h when e) * (1 - e)
1 > h when e
, or to write that last line out,
p(h | e) < 1
, which matches out with the condition that they attached to "Theorem 1", which requires that p(h|e)!=1.
But to work that out with the sides keeping their values,
h when e < NOT(e) + (h when e) * e
h when e < NOT(e) + (h when e) * (1-NOT(e))
h when e < (h when e) + NOT(e) - (h when e) * NOT(e)
h when e < (h when e) + NOT(e) * (1- (h when e))
h when e < (h when e) + NOT(e) * (NOT(h) when e)
, which appears to be the last line of their "Theorem 2".
So.. I guess that explains the definitions that they were using.
---
Anyway, what seems odd to me about that is that "Theorem 1" seems like it's meant to be surprising -- like it's meant to show that finding evidence reduces the meaningfulness of the evidence itself, or something?
However, some things seem off. For example, the expression of "h←e" seems weird to me; it'd seem more sensible for it to be like this:
OR(AND(NOT(e), NOT(h)), AND(h,e)) when e < OR(AND(NOT(e), NOT(h)), AND(h,e))
h when e < OR(AND(NOT(e), NOT(h)), AND(h,e))
h when e < (!h when !e) * !e + (h when e) * e
0 < (!h when !e) * !e + (h when e) * (e - 1)
0 < (!h when !e) * (1 - e) - (h when e) * (1 - e)
0 < (!h when !e) - (h when e)
(h when e) < (!h when !e)
, where the inequality isn't obviously of particular interest.
Because the second thing that seems off is the notion that this matters -- that the evidence, "e", should be a concern for not just figuring out the probabilities in the model, but also retro-actively adjusting the meta-model, or something?
In short, after tracing their math and such, it's unclear what point they might be trying to make, as this doesn't seem surprising or unexpected.
> And so there's a deductive part of the theory whose credence goes up. But the instances never imply the theory. So you want to ask: “The part of the theory that's not implied logically by the evidence – why does our credence for that go up?” Well, unfortunately it goes down. And that's the thing that Popper and Miller proved in the 1980s. A colleague and I have been trying to write a paper about this for several years to explain why this is so in more understandable terms.
This seems to be his argument against Bayesianism epistemology (besides the unapproachable paper). I am curious, but have no idea what he's trying to say. If someone does understand, would you mind giving a concrete example of a theory plus some evidence plus "the part of the theory that's not logically implied by that evidence"?
You have a hypothesis h = "it's raining somewhere in England" and evidence e = "it's raining in London". You have an empirical theory that e implies h, which may or may not be true depending on whether London is in England in your universe, but you don't know which universe you live in. There are also universes where London is not in England but your theory is still true for some complicated meteorological reasons that are unknown.
For all h and e, you can always write (using C bitwise operations to denote logic) h = (h | ~e) & (h | e). The second factor (h | e) is the part which logically follows from the evidence, that is, it is true in all universes in which e is true. The first factor is the part that is not logically implied by the evidence, that is, there are universes where it is raining in London and yet h is false because London is not in England.
Now the real question: somebody tells you that it is raining in London, so your credence in e goes up. What happens to the probability of (h | ~e)? It should go down, because as e becomes "more true", ~e becomes "more false", and thus (h | ~e) becomes more false in the sense that there are fewer worlds where (h | ~e) is true.
But (h | ~e) is the same as "e implies h", which is your empirical theory. So your belief in the theory should go down as you gather more evidence. Another way to say it is that, as the evidence becomes stronger, the part logically implied by e becomes more likely, and whatever remains (h | ~e) becomes a smaller set of possibilities, so it is less likely.
Note that your belief in (h | ~e) goes down, but your belief in h goes up. I think Deutsch's criticism is that people confuse the two, and they think that evidence increases the credence in the theory instead of the hypothesis.
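If it helps, the claim is easy to check numerically with a made-up joint distribution over the four possible worlds (the 0.3/0.3/0.1/0.3 split is arbitrary; "not h and e" is the "rain in London but London not in England" case):

    # Worlds are (h, e) pairs; the probabilities are invented for the check.
    joint = {
        (True, True):   0.30,  # h and e
        (True, False):  0.30,  # h and not e
        (False, True):  0.10,  # not h and e
        (False, False): 0.30,  # neither
    }

    def p(pred):
        return sum(q for world, q in joint.items() if pred(*world))

    p_e = p(lambda h, e: e)
    p_h_given_e = p(lambda h, e: h and e) / p_e
    p_impl = p(lambda h, e: h or not e)                        # p(e -> h)
    p_impl_given_e = p(lambda h, e: (h or not e) and e) / p_e

    print(p_h_given_e, ">", p(lambda h, e: h))  # 0.75 > 0.6: belief in h goes up
    print(p_impl_given_e, "<", p_impl)          # 0.75 < 0.9: belief in (e -> h) goes down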
I'm having trouble following this hypothetical world where we don't know if London is in England, so let me try translating to a historical theory:
h = "this person has cancer"
e = "this person is a smoker"
For simplicity, let's ignore the fact that not everyone who smokes gets cancer, and that those who do get cancer might not get it on their first cigarette. Just: smoking -> cancer, is the theory.
h = (h | ~e) & (h | e) -- this is in fact, always true
Now we learn that the person is a smoker. So e is 1, and (h | e) goes up. And (h | ~e) = (e -> h) goes down. Because the person's a smoker so---uh oh---our theory is getting tested and it could be proved false right now.
This... doesn't seem that surprising? If we observe lots of people smoking, but haven't checked anyone for cancer, it does make the strict theory (smoking -> cancer) less likely because we're making the very strong prediction that this large set of people must all have cancer.
I wouldn't expect to get evidence confirming the implication (smoking -> cancer) until we started checking whether smokers have cancer. And once you start checking that, then (h | ~e) = (e -> h) will rise (assuming your theory is true), right?
Yeah, now that I think more about it, I think I was confused myself. Specifically, I think I got the math right but the example wrong (and I confused you too).
Let me try again. I think that Deutsch is saying that h is the proposition "smoker implies cancer", and e is a specific instance of a person where the hypothesis holds (either a nonsmoker or a smoker with cancer). He is talking about e being instances of h, so h must be a higher order proposition about instances.
But now h can be decomposed as we said into a logically necessary part (h|e) and a part (h|~e) that may be true or false depending upon which universe you live in. By the argument above, finding more instances of the theory should decrease our belief in the (h|~e) part. Since h|~e is the same as e->h, gathering more e should decrease our faith that the evidence validates the hypothesis.
Presumably Deutsch is saying that the logically necessary part is sort of trivial (a mere theorem) whereas (h|~e) has actual physical content, so why do we believe that the evidence increases our confidence in the physical portion of the hypothesis?
By the way, I got all this math from the unapproachable paper, which is not that unapproachable if one looks at the math alone. Like you, I am trying to figure out how this math applies to the real world.
> Specifically, I think I got the math right but the example wrong (and I confused you too).
Don't worry, I was much more confused before you came along :-). I also followed the math in the paper but didn't know what devastating contradiction for Bayesianism it was oh-so-darkly hinting at.
I keep trying to describe what happens if e and h are both general, and bouncing off.
Let's try a different assumption: e is the statement "the first person tested is consistent with smoking->cancer". And in fact, we just tested them, and they're a smoker and have cancer. So e is unambiguously true. And h is the hypothesis "smoking -> cancer". Then:
- e is 1
- h has increased
- h|e has increased, in fact it is 1
- h|~e has decreased
h|e is tautologically true, once we have observed e. And h|~e is e->h, the amount to which this observation implies "smoking->cancer". Which seems odd, though I don't have an immediate intuition about how an observation is supposed to affect, not the hypothesis that it supports, but the implication between itself and that hypothesis. But maybe that's the darkly-hinted-at problem?
I notice I'm confused, though. Is h|e about this specific instance of e, or the general e? How about e->h? It feels like we're moving the goal-posts: at first h|e = 1 because we're talking about a specific e (we found a smoker with cancer), and e->h decreases because we're talking about that specific e, but then later we start to draw conclusions about the general e->h such as describing it as whether "the evidence validates the hypothesis". And I don't know how to formally relate the specific e->h to the general e->h.
There is a subtlety going on here. You should really parameterize by time. So we have e.g. h(t) - it's raining somewhere in England at time t.
Now our empirical theory is forall t e(t) -> h(t).
We are told that e(T) for some particular T. This makes us believe that h(T) | ~e(T), but should barely change anything about our belief in the theory.
Edit: I noticed you realized the part of higher order in a side thread, but a specific important point to add here. You mentioned that "logically necessary part is sort of trivial (a mere theorem) whereas (h|~e) has actual physical content". Our theory makes predictions about the future - so it's impossible to observe e(t) for all t. We can't just rely on observing all instances of the actual physical thing. We have to make use of logical necessity.
> His problem seems to be the extension of it to epistemology
Isn't that a bit of a strawman? Bayesian epistemology as stated would require someone to believe something like "drawing without replacement from this urn, I got 100 white balls and 1 black ball, therefore there is a nonzero probability that this urn contains only white balls", which is not a belief I can imagine anybody seriously holding.
Here is my understanding of what Deutsch and the paper by Popper/Miller are trying to say.
There are three concepts involved: a piece of evidence "e", for example "I extracted 1 black ball"; a hypothesis "h", for example "the urn does not contain only white balls"; and a theory "h <- e" that, by means of logic or otherwise, deduces the hypothesis from the evidence. Your (qsort) theory is that if you see a black ball then the hypothesis is correct.
Everybody, including you, me, and Deutsch, agree that if the probability of the evidence goes up, then the probability of the hypothesis goes up as well.
What Deutsch and Popper/Miller are also saying, however, is that if the probability of the evidence goes up then the probability of the theory goes down (proof in the paper).
I need to study the paper more carefully, because I am not 100% sure that it is strictly correct (there are many factors that would transform a < into a <= if they were 1, and I suspect that some are), but I believe the weaker statement that if the probability of the evidence goes up, the probability of the theory does not go up at all.
In any case, this conclusion is consistent with what all scientists have believed forever: the only way to increase confidence in a theory is to try to break it. Or at least they believed this until fact checkers and censorship came along and threw the baby out with the bathwater.
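For what it's worth, here is the inequality as I reconstruct it (my own paraphrase of the argument, not a quote from the paper). Writing the "theory" factor as the disjunction h ∨ ¬e, which is logically the same thing as e -> h, and conditioning on the evidence e:

    p(h \lor \lnot e \mid e) - p(h \lor \lnot e)
      = p(h \mid e) - \left[ 1 - p(e \land \lnot h) \right]
      = p(h \mid e) - 1 + p(e)\, p(\lnot h \mid e)
      = -\left( 1 - p(e) \right) \left( 1 - p(h \mid e) \right) \le 0

So the change is never positive, and it is strictly negative unless p(e) = 1 or p(h|e) = 1, which looks like exactly the "< versus <=" caveat.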
This is the best explanation I’ve seen yet of what Deutsch was getting at. I think he actually has a good point. The problem with misapplications of Bayesian probability is that they tend to produce a bogus sense of progress while eliding the interesting questions. For example, take the hypothetical statement: “The news that a supercomputer has defeated the reigning Go champion has increased the probability that machines are capable of consciousness.” The form of this statement completely obscures the interesting question, which is: what is the causal connection between being good at Go and consciousness? There is an unspoken theory that the two are connected, but the statement offers no evidence for it, and hence is meaningless.
For some reason, the Go example reminds me of the monolith in Clarke's 3001 and the protomolecule in The Expanse. The book authors describe both as very advanced technology that lacks consciousness. They do what they're designed to do, but they do not experience doing it, and thus lack the advantage of a certain kind of reflective awareness that the conscious constructs they create, such as Dave or Miller, do possess. The Star Trek holodeck also sometimes accomplishes this.
I have no idea whether it's possible to create a machine that can generate conscious simulations without itself being conscious. But there's no Bayesian reasoning to apply to such scenarios, since we have no idea as of yet. They're philosophical arguments. And applying that to current technology is misplaced.
I could believe it, in this[1] sense. "It is possible to draw a black ball from an urn full of white balls, without any tricks" would pretty much upset my entire epistemic grounding, but with sufficient evidence, I don't see why I would refuse to believe it. Thus, I have no problem assigning it a non-zero (but insanely low) probability.
I think it's more like the black swan problem: you only see white swans, so you believe there are only white swans (historically, people made reference to a black swan to imply something that clearly didn't exist). Which is the problem of induction. Eventually new knowledge was introduced and the knowledge situation completely changed.
Unless you replace the balls after drawing, there is a nonzero probability that the urn contains only white balls after the draw, though the probability is zero (under the assumption of stable ball colors) that it contained all white balls before the draw.
It's logically invalid. You can't have an urn filled with only white balls and draw a black ball. That would mean the urn wasn't filled with only white balls. There's zero probability the urn contained only white balls after you draw that black ball.
Before you might have given a nonzero probability because you didn't know whether there was a black ball. But once you know there is, you can't have some nonzero belief in there only being white balls.
Ah, I thought you were talking about the urn after the balls had been drawn.
Still, I've been to enough magic shows, been fooled by enough subtly-wrong mathematical proofs, been utterly convinced that my program's behavior was impossible, etc., that I side with the Bayesian perspective here. I don't think there's a single thing that I'm justified in saying is true with 100% probability.
> I don't think there's a single thing that I'm justified in saying is true with 100% probability.
Not even simple deductive proofs? 2+2=4 by definition. There's no probability for it being wrong. We couldn't do math if that wasn't the case. It's not even correct to assign those kinds of statements a probability. Now there is a possibility for humans to make mistakes with complicated proofs, as you pointed out. But the correct proof doesn't have a probability, only the belief in it being correct.
For empirical matters, I'm not sure what sort of probability you could assign to some skeptical scenario like it's all a dream, or nobody exists outside your mind, or we live inside a simulation. We take it that there's a world with other people for granted.
Sure, you can be fooled into thinking a black ball was pulled out of an urn of white balls, but whether the urn has all white balls is just a fact, not a probability. And under proper conditions, we should be able to verify that fact. To doubt that is to entertain wild skeptical scenarios where we can't really do science.
At any rate, I don't see that sort of reasoning as Bayesian. It's just radical skepticism.
We absolutely can do math in the face of some probability of being wrong about our proofs (and the proofs of our proof systems, and our proofs of the consistency of our logics, and so on). I know this because we do, in fact, do math and some of our proofs are almost certainly wrong.
Obviously I couldn't tell you which generally accepted proofs are wrong, only that we've drawn a few black balls from the urn of double-checking-established-proofs already, and so we should not assume that we won't find more. Imre Lakatos' essay Proofs and Refutations is a fun read on the subject.
A simple deductive proof of 2+2=4 (say in Peano Arithmetic) is clear enough that I'd give it a very small chance of being wrong, easily less than 10^-9 but certainly not less than 10^-100. To push something below a probability like 10^-100 one would have to be unreasonably confident in things like "I haven't developed a brain injury causing me to be very confident in 2+2=4 and confabulating all other evidence as needed". Such a thing is absurdly unlikely of course, but a 10^-100 occurrence is far, far less likely still.
> Sure, you can be fooled into thinking a black ball was pulled out of an urn of white balls, but whether the urn has all white balls is just a fact, not a probability. And under proper conditions, we should be able to verify that fact. To doubt that is to entertain wild skeptical scenarios where we can't really do science.
You can verify it enough to put the odds of being wrong low enough to treat it, arguendo, as true for most purposes, but that's because for most purposes a certain probability of being wrong is acceptable, and once you've pushed something far below that threshold you can fairly safely ignore it.
As an intuition pump, imagine that you have personally drawn the balls from the urn. You drew a black ball and carefully observed it. Then, someone approaches you and offers to make you a bet: if the urn had ever contained a black ball then he would pay you one dollar, otherwise you would owe him a hundred million dollars. He hands you a clear, unambiguous, and legally enforceable contract to that effect.
Do you sign? If you believe that there is a 100% chance that you drew a black ball, then there's a zero percent chance that you'll lose this bet. It's a free dollar, why wouldn't you sign it?
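To put a number on it (this is my framing of the bet, assuming you care only about expected dollars, and writing q for your credence that the urn never contained a black ball despite what you saw):

    E[\text{sign}] = (1 - q) \cdot 1 - q \cdot 10^{8} > 0 \iff q < \frac{1}{10^{8} + 1} \approx 10^{-8}

So the bet can't distinguish "exactly zero" from "anything below about one in a hundred million"; what it does test is whether you'd accept arbitrarily lopsided stakes, which is what a literal 100% would commit you to.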
> To doubt that is to entertain wild skeptical scenarios where we can't really do science.
> At any rate, I don't see that sort of reasoning as Bayesian. It's just radical skepticism.
We can do science in the face of the probability that we might be wrong about anything and everything, the key thing is to keep in mind the bounds – at least roughly – of the different ways that we can be wrong.
It does involve a bit of humility, but not a step change compared to the ordinary scientific humility of "this is the best answer we have so far, but it could be wrong".
> A simple deductive proof of 2+2=4 (say in Peano Arithmetic) is clear enough that I'd give it a very small chance of being wrong, easily less than 10^-9 but certainly not less than 10^-100. To push something below a probability like 10^-100 one would have to be unreasonably confident in things like "I haven't developed a brain injury causing me to be very confident in 2+2=4 and confabulating all other evidence as needed". Such a thing is absurdly unlikely of course, but a 10^-100 occurrence is far, far less likely still.
If you have a brain injury that makes you incorrectly think a simple deductive proof is true when it's not, then Bayesian reasoning isn't going to fare any better. We're in hinge-proposition territory, where we can't reasonably doubt what gives us the basis for reasoning. 2+2=4 by definition of basic arithmetic. The cost of doubting that is to doubt any consistent reasoning, including probabilities.
> Do you sign? If you believe that there is a 100% chance that you drew a black ball, then there's a zero percent chance that you'll lose this bet. It's a free dollar, why wouldn't you sign it?
The only reason not to sign is because there's strong reason to believe they have a trick up their sleeve they can fool you with. But if we take something verifiable by everyone, like the roughly spherical shape of the Earth, or the speed of light in a vacuum, there is no reason not to take the bet. It would be a free dollar.
> We can do science in the face of the probability that we might be wrong about anything and everything, the key thing is to keep in mind the bounds – at least roughly – of the different ways that we can be wrong.
We can't be wrong about everything, because that would mean science was impossible. That would be the radical skepticism I was mentioning, and you can't use Bayesian reasoning with radical skepticism.
Which is why Kant had to come up with categories of thought to rescue rationality in the face of Humean skepticism. You can't give a probability to the likelihood of the sun continuing to undergo fusion tomorrow if the laws of nature could change on us at any time for no reason (tomorrow is Thanksgiving for the turkey scientist who is confident of continued survival, as Hume would say).
I don't usually make this kind of comment, but the comment to which I am responding should be occupying the top spot on this forum, not the trash flamewar comment that currently occupies the top spot. If your moderation system can't differentiate actual expertise from flamewar trolling, then is it fair to say that the moderation system is not working?
Great essay, deeper than being about just Bayesianism. In the essay David talks a lot about explanations and what makes a good one versus a bad one. His short TED lecture gives a deeper insight into his explanation of why good explanations (hard-to-vary theories of the world) are the key to scientific and human progress since the scientific revolution.
For the clearest and most precise view of Deutsch's philosophy of science in the context of quantum theory, read his paper The logic of experimental tests, particularly of Everettian quantum theory: https://www.sciencedirect.com/science/article/pii/S135521981...
Prevailing discussions (e.g. Dawid & Thébault, 2014; Greaves & Myrvold, 2010) of the testability of various versions of quantum theory have approached the matter indirectly, in terms of support or confirmation – asking how our credence (degree of belief) for a theory should be changed by experiencing results of experiments. However, experimental confirmation is a philosophically contentious concept. Notably, it is rejected root and branch by Popper (1959). I shall present an account of the nature and methodology of scientific testing that closely follows Popper's. It differs from his, if at all, by regarding fundamental science as exclusively explanatory. That is to say, I take a scientific theory to be a conjectured explanation (explanatory theory) of some aspects of the physical world – the explicanda of the theory – that is testable (I shall elaborate what that means below) by observation and experiment. A scientific explanation is a statement of what is there in reality, and how it behaves and how that accounts for the explicanda. Neither confirmation nor credence nor ‘inductive reasoning’ (from observations to theories or to justifications of theories as true or probable) appear in this account. So in this view the problem described in Section 1 is about testing theories.
This contradicts the ‘Bayesian’ philosophy that rational credences obey the probability calculus and that science is a process of finding theories with high rational credences, given the observations. It also contradicts, for instance, instrumentalism and positivism, which identify a scientific theory with its predictions of the results of experiments, not with its explanations. My argument here, that Everettian quantum theory is testable, depends on regarding it as an explanatory theory, and on adopting an improved notion of experimental testing that takes account of that.
Scientific methodology, in this conception, is not about anyone's beliefs or disbeliefs. Rather, it assumes that someone has conjectured explanatory theories (though it says nothing about how to do that), and it requires those who know (i.e. are aware of) those theories and want to improve them, to attempt to locate specific flaws and deficiencies and to attempt to correct those by conjecturing new theories or modifications to existing theories. Explicanda in the sciences usually involve appearances of some sort (e.g. the perceived blueness of the sky). Theoretical matters can also be explicanda (e.g. that classical gravity and electrostatics both have an inverse-square force law), but those will not concern us here. Explanations of appearances typically account for them in terms of an unperceived, underlying reality (e.g. differential scattering of photons of different energies) that brings about those appearances (though not only them).
I believe you can actually get Popperian falsification out of Bayesianism if you squint right. Consider a falsification experiment. It disproves some theories while not changing our relative beliefs in other theories.
This would be basically:
for all i in S1 (the falsified theories), p(evidence | theory i) = 0
for all i in S2 (the surviving theories), p(evidence | theory i) = k * p(evidence)
I would say that most scientific evidence is of this sort, except that the probabilities for the "falsified" theories can also be a little bit above 0 to account for measurement error.
Edit: Actually it may be fruitful to introduce a distinction similar to the one probability theory has... In probability theory there is a difference between a sample and an event. An event is a set of samples. In our case, I believe we want a distinction between a theory... and, let's call it, a micro-theory. To pick a funny and memorable example, a theory could be something like "there is a Loch Ness monster". Now a micro-theory could be something like "there is a Loch Ness monster, that is invisible, and makes no sounds, but it can be detected by radar... and... and...". So it would include all these additional constraints. So the theory is composed of a lot of these micro-theories, right? Now if we take photos of every inch of Loch Ness, and we don't find any Loch Ness monster, we make it less likely that there is a monster, right? We may say "oh, we were careless, and just missed it", but if we keep looking eventually that becomes an impossibility. So we disprove a bunch of micro-theories, but some will remain. Our previous micro-theory that, among other things, says that the monster is invisible remains. And what's worse, the relative likelihood of those remaining micro-theories is unchanged compared to the no-monster theory.
p(can't get monster on photo | no monster) = k * p(can't get monster on photo) and
p(can't get monster on photo | invisible monster) = k * p(can't get monster on photo)
Now if we want to further increase our belief in "no monster" we would have to go after these wacky micro-theories and disprove them, using e.g. radar. But given that a sensible person assigns those micro-theories low prior likelihood, we may be satisfied with the situation and not bother.
So basically this was just Popperianism in Bayesian clothing, right? Almost... Notice the very last point. We allowed ourselves not to bother with theories of invisible monsters, because of our prior likelihood.
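Here's that last update as a throwaway sketch; the priors and likelihoods are completely made up, only the structure matters:

    # Invented priors over "micro-theories", just to show the mechanics above.
    priors = {
        "no monster": 0.90,
        "visible monster": 0.0999,
        "invisible monster": 0.0001,
    }

    # Likelihood of the evidence "photographed every inch, no monster on any photo"
    # under each micro-theory (again, stipulated for the sketch).
    likelihood = {
        "no monster": 1.0,
        "visible monster": 0.0,    # falsified, up to measurement error
        "invisible monster": 1.0,  # predicts exactly the same photos
    }

    unnormalized = {t: priors[t] * likelihood[t] for t in priors}
    z = sum(unnormalized.values())
    posterior = {t: v / z for t, v in unnormalized.items()}

    print(posterior)
    # Prior odds vs posterior odds of "no monster" against "invisible monster":
    print(priors["no monster"] / priors["invisible monster"],
          posterior["no monster"] / posterior["invisible monster"])

The "visible monster" micro-theory gets wiped out, but the posterior odds of "no monster" against "invisible monster" come out exactly equal to the prior odds (9000:1 with these invented numbers), which is the point about having to go after the wacky micro-theories directly.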
This is an awful take, and I hope other readers ignore it, and give the podcast a listen/read.
It's a long podcast, and if you skim the transcript you can see that the discussion doesn't start until much later.
I'm a pretty strong Bayesian, and have heard more than my fair share of vague, hand wavy, and stubborn frequentist arguments against Bayesian statistics. When I see an "Against Bayesianism" rant, I'm already biased against it from seeing so many awful arguments thrown out there, mostly to troll Bayesians.
This is absolutely not one of those. This is a very thoughtful and clearly articulated discussion of the application of what Deutsch readily agrees is a correct statistical methodology to larger epistemological questions.
It is long, so I only had a chance to skim this but it is incredibly obvious that David Deutsch is not "seeking attention", but has very legitimate concerns with the mindless application of Bayesian reasoning to larger epistemological problems in science. I'll certainly be revisiting this later for a closer listen.
So that's the best appraisal you can give of David Deutsch: "Just an old man seeking attention"?
Well, leaving aside the usual embarrassment I feel when it comes to the impromptu nonsense a fair share of HN commentators think it's worthwhile to contribute here when there's a piece of news involving physics or physicists, that's an ageist take without any content whatsoever.
There are more old people who know what they're talking about than young people. That's just the obvious consequence of having been around reading and thinking about stuff for more time. You'll notice it eventually because, as the song goes, time waits for no one.
Yep, we're getting too impatient, and that's not helpful when it comes to thinking deeply about what we've been taught. Yet that's the most important part of any job, IMO. I'll check it out; maybe you're right and he's rambling. That would be surprising to me, which is the reason I replied to your post.
I don't think he's rambling. He isn't sure what familiarity his audience will have with the intellectual underpinnings of his argument, so he recaps those before embarking on the argument proper, in order to make sure the audience can follow.
Granted, he does tell a couple of anecdotes in the process, but maybe that's his style. I think it's fair to consider impatience implicated here - for what it's worth, when I find myself feeling that way about coverage of stuff I already know but not everyone is guaranteed to, I usually just skip ahead or scroll ahead, checking in here and there, until I hit something on point or that I don't already know. (Usually the second one!)
Impatience is an emotion, and while we can't help much what we feel or how we feel it, we can most of the time treat what we feel as input. Think of it, if you want, like a Datadog alert. How do we handle those? By investigating to understand the root cause and taking whatever action that requires in the context, if any. If we let them drive our behavior directly without taking the time for considered action, we easily risk causing more problems than we're likely to solve.
Granted, I don't entirely love this metaphor, which is no less flawed than any. Maybe, too, some dork on Twitter will use this as an example of the mechanistic techbro attitude endemic to the diseased discourse of Hacker News comments, or something; it does lend itself somewhat to such misrepresentation.
But despite that lossiness I think it's not wholly without use, because it does point at least vaguely toward a way in which we can manage and make valuable use of even the most unpleasant among our emotions, and one that's served me well over the years since I stumbled upon the concept in some writing or other, I've long since forgotten where.
(I don't think Deutsch was rambling, but I certainly am, in an effort to distract myself from a quite unpleasant facial pain I can't do anything meaningful about until Thursday. Please excuse me.)
I read the transcript and he doesn't seem to be rambling any more than anyone else speaking off the top of their head, rather than writing a carefully edited essay.
Also what mnl said. I hope their comment helps you see your comment in the context of how it would appear to people who read HN with a hostile attitude and look for reasons to reject it (not mnl obviously).
He only started talking about Bayesianism about 29 minutes into the podcast.
Don't blame him for that... the interviewer asked him all sorts of tangential questions before finally asking him about Bayesianism directly.
But even some of the earlier things he talks about (particularly Popper's objections to inductivism) are actually relevant to his critique of "Bayesian epistemology", which he claims is "a species of inductivism" (which, to his mind, Popper demolished).
That's not the comment I was expecting to see at the top.
I haven't read TFA yet but I popped in to say how much respect I've got for David Deutsch and what an influence he has been on my intellectual development.
Physicists have this nice saying: "Shut up and calculate".
Bayesianism, frequentism, who cares? "All models are wrong, but some are useful".
Bayesianism would be a footnote in history if it weren't for Markov Chain Monte Carlo, which allows you to perform estimations when Maximum Likelihood fails.
All those people who wax poetic about priors and expert knowledge and stuff, just keep away from them. You don't want to mix with that crowd.
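Since MCMC came up: the thing it buys you is that you only ever need the posterior up to a normalizing constant. A minimal random-walk Metropolis sketch, with a made-up model and made-up data, just to illustrate the mechanics:

    import math
    import random

    data = [2.1, 1.9, 2.4, 2.2, 1.8]  # hypothetical observations

    def log_posterior(mu):
        # Normal(0, 10) prior on mu, Normal(mu, 1) likelihood for each point,
        # both written up to additive constants that cancel in the acceptance ratio.
        log_prior = -mu * mu / (2 * 10.0 ** 2)
        log_lik = -sum((x - mu) ** 2 for x in data) / 2.0
        return log_prior + log_lik

    random.seed(0)
    mu, samples = 0.0, []
    for _ in range(20000):
        proposal = mu + random.gauss(0, 0.5)  # random-walk proposal
        accept_logprob = min(0.0, log_posterior(proposal) - log_posterior(mu))
        if random.random() < math.exp(accept_logprob):
            mu = proposal
        samples.append(mu)

    kept = samples[5000:]  # discard burn-in
    print(sum(kept) / len(kept))  # posterior mean estimate, close to the sample mean

Nothing here required a closed-form maximum or a normalizing integral; the chain's samples approximate the posterior, and averaging them gives the estimate.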
> Physicists have this nice saying: "Shut up and calculate".
Some, but not all. Some want to know what it is that makes it so we can calculate the probability of a measurement, or whether a black hole loses information. Some are/were quite philosophical, including Feynman, despite his protestations.
Not some, all. Physicists earn their right to talk about deep foundational things after they first shut up and calculate. Those who are not able to calculate, don't get tenure.