EDIT: There is actually a short discussion at the end of the PDF on ranked programming:
"Besides ranking theory, there are other alternatives to probability theory. These include Dempster-Shafer functions,
possibility measures, and plausibility measures (see [Halpern, 2017] for an overview and comparison, which includes ranking theory). Possibility measures measure uncertainty on an arbitrary totally ordered scale, such as the integer scale from 0 to ∞ (making them equivalent to ranking functions) or real numbers from the interval [0, 1]. The latter is often chosen to model vagueness or fuzziness. Possibility theory has been used to equip logic programming and answer set programming with the ability to deal with uncertainty [Nicolas et al., 2006; Bauters et al., 2010; Alsinet and Godo, 2000]".
> Our work is based on the idea that ranking theory can be used as a general tool -- like probability theory -- to build arbitrary models involving uncertainty.
Why would I believe that? What benefit does calculating these "degrees of surprise" give?
Compare with probability theory which has done an outstanding job at providing quantitative answers to questions in an uncertain context.
Author of this work here (and a bit surprised to find this on hacker news).
I don't disagree with your point about probability. However, there are various alternatives to probability which can be more appropriate for certain applications. For instance, with degrees of surprise (or ranks as they are called) you do not need to know exact probabilities, although the price you pay is that the scale is more coarse grained. If you reason about events that are either very probable or very improbable, then degrees of surprise are easier to work with than probabilities.
There's also a computational benefit. Execution of a ranked program is done much like a non-deterministic program where choice points execute "least surprising first". Compare that with the sampling techniques necessary in probabilistic programming that do not even provide exact results.
What I don't understand is _what_ these applications are.
What does this surprise value tell me, and how can it be calibrated to anything quantitative? What's the equivalent of statistics that I can use to alter my original model's rankings to conform to what I observe?
I can understand ranking as coarse-graining of probability if there's some way to approximately map it back to the standard notion of probability.
On the other hand, using rank to direct collections of traces in a well-defined order is admittedly a neat trick; I can see using it to find something like the "most reasonable" or "easiest to explain" solution to a given logic problem. (I have actually seen a hand-coded version for Sudoku that attempts to minimize the backtracking depth to make it as easy to explain as possible.) But when would I want to do that to a program where the ranks represent instead how common a given event is? Verifying the no-exception case first? Is that useful?
Ranks represent subjective degrees of belief but they do not, like subjective probabilities, nicely map to relative frequencies. So compared to probabilities, ranks are indeed limited in this regard. For instance, you can learn probabilities from data but there's no obvious way to learn ranks from data.
But for subjective belief they're still useful. Consider the problem of diagnosing a system with components that fail in rare cases. However we have no idea about failure probabilities. We can then use ranks. A diagnosis for some observed behaviour would then be the least surprising (i.e., lowest ranked) failure state that explains the observed behaviour. This is also the reason for least-surprising-first execution: the most important prediction or hypothesis is the most likely one and thus the least surprising one. There are some concrete examples in the paper which demonstrate this.
I am currently thinking about combining probabilities with ranks so that you can reason about both kinds of uncertainty in the same model. This could be implemented using a programming language that supports both ranked choice statements and probabilistic choice statements.
Probability only deals with a very specific type of uncertainty: stochasticity.
This is by far not the only type of uncertainty. Compare e.g. uncertainty as vagueness. Probability is entirely ill-equipped to deal with this kind of uncertainty.
E.g. consider needing to make a distinction between class A and B (real life example: heart muscle vs papillary muscle). Now consider the existence of a 'hybrid' class between the two (e.g. a cellular structure that is neither heart, nor papillary muscle, but a mixture of the two).
Any probabilistically stated problem that attempts to define uncertainty of the hybrid tissue, will fail even before starting, because it will attempt to provide a stochastic intepretation, when in fact sampling from that pixel should always deterministically lead to the exact same result, albeit uncertain in terms of vagueness. This type of uncertainty is better addressed using fuzzy measures other than probability (which is only a subset of more general fuzzy theory).
See Klir & Folger for nice discussion on this (I forget the title, but it is called something like Uncertainty in Fuzzy Theory Applications).
> This type of uncertainty is better addressed using fuzzy measures other than probability (which is only a subset of more general fuzzy theory).
Do you mean that both probability theory and possibility theory are subsumed by fuzzy logic? What about works that integrate fuzzy logic and probability? eg. https://arxiv.org/abs/1610.09156 employs both fuzzy logic and probability; can those be understood in terms of fuzzy logic alone?
> See Klir & Folger for nice discussion on this (I forget the title, but it is called something like Uncertainty in Fuzzy Theory Applications).
Do you mean the 1988 book "Fuzzy Sets, Uncertainty, and Information" by George J. Klir and Tina A. Folger?
If you have any other references on this I would be glad to read!
edit: regarding the my first question, I found this
Fuzzy Logic and Probability
Petr Hajek, Lluis Godo, Francesc Esteva
In this paper we deal with a new approach to probabilistic reasoning in a logical framework. Nearly almost all logics of probability that have been proposed in the literature are based on classical two-valued logic. After making clear the differences between fuzzy logic and probability theory, here we propose a {em fuzzy} logic of probability for which completeness results (in a probabilistic sense) are provided. The main idea behind this approach is that probability values of crisp propositions can be understood as truth-values of some suitable fuzzy propositions associated to the crisp ones. Moreover, suggestions and examples of how to extend the formalism to cope with conditional probabilities and with other uncertainty formalisms are also provided.
Sorry for the delay, just seeing this. Yes, effectively the entirety of probability theory can be reduced to Cox's axioms of probability, which themselves are specialised instances of fuzzy set/logic theory. Therefore in theory any probabilistic formulation can also be expressed in fuzzy terms with suitable constraints.
E.g. I think what is implied by your final quote is the distinction between 'crisp' events and non-crisp ones. Probability only deals with crisp events. In other words, if you have a probability distribution over events, those events are required to be mutually exclusive and exhaustive for Cox's axioms of probability to apply. A fuzzy formulation can express the same problem but without that constraint, e.g. events may be 'fuzzy' (in that you might consider an 'event' with an associated 'probability / fuzzy membership in some set' of its own, but also the definition of the event itself is 'leaking out' into neighbouring events.
As an example, consider a set of integers {1 .. 10}, and you want to define a probability distribution for each of the numbers on that set. A fuzzy formulation could allow you to generalise this space further using 'fuzzy integers', expressing the notion e.g. 'approximately ten'. You can then then talk about the probability of 'approximately ten' in the set (where the definition of 'approximately' is not necessarily stochastic).
I'm not an expert in any of this, alas, but it does seem to me that exposure to fuzzy theory in the public seems to follow a pareto distribution, where 99% of people have only heard of ridiculously elementary uses of fuzzy theory (and therefore think it's some sort of poor-man's probability theory), which probably only accounts for something like 1% of the field, and the remaining 99% of fuzzy theory is only dealt with by 1% of people who are more preoccupied with abstract math formalizations and have no interest in practical applications (which is a shame).
There are mathematical formalisms (like possibility theory http://scholarpedia.org/article/Possibility_theory - this article seems to be better than in Wikipedia) that allow making decisions based on the relative likelihood of events (those "degrees of surprise").
It can be useful when the actual probabilities of events are not known (it's often the case in real life scenarios). Then possibility theory may actually tell you what decision to make based on this limited information. My understanding is that the point is not that probability theory does not work, but that it requires a lot of knowledge about the system, which might be not available to the agent. In such situations, possibility theory provides a framework for making reasonable deductions.
While I didn't read that article in detail, I did not find any clear discussion of how it differs from probability.
The one difference I saw was that total possibility is not normalised, in principle for a continuous variable the total can be infinity. But I'm not sure what effect this has in practice besides making it tricky to work with some contrived examples.
Logistic regression or neural networks with softmax activation use $ln p$ when manipulating probabilities. Normalization to a total probability of 1 is usually delayed to the end (so intermediate values are actually $ln p_i + c$ where c will only be computed during normalization). This representation of probabilities has the nice property that Bayes' law turns into simple addition.
The linked approach sounds very similar, calling $-ln p$ rank. However it appears to use a different normalization approach.
EDIT: There is actually a short discussion at the end of the PDF on ranked programming:
"Besides ranking theory, there are other alternatives to probability theory. These include Dempster-Shafer functions, possibility measures, and plausibility measures (see [Halpern, 2017] for an overview and comparison, which includes ranking theory). Possibility measures measure uncertainty on an arbitrary totally ordered scale, such as the integer scale from 0 to ∞ (making them equivalent to ranking functions) or real numbers from the interval [0, 1]. The latter is often chosen to model vagueness or fuzziness. Possibility theory has been used to equip logic programming and answer set programming with the ability to deal with uncertainty [Nicolas et al., 2006; Bauters et al., 2010; Alsinet and Godo, 2000]".