What is ergodicity? (2016) (larspsyll.wordpress.com)
155 points by lamby on June 26, 2019 | hide | past | favorite | 54 comments



Interestingly, ergodicity has two meanings (https://www.merriam-webster.com/dictionary/ergodic):

1. of or relating to a process in which every sequence or sizable sample is equally representative of the whole (as in regard to a statistical parameter)

2. involving or relating to the probability that any state will recur especially : having zero probability that any state will never recur

Nassim Taleb mostly uses ergodicity in the second meaning, while the article concentrates on the first. I find it utterly confusing.


Taleb uses his own definition of ergodicity which he's yet to state


I’m curious why you would think that.

afaik, Taleb has repeatedly & explicitly stated “no probability without ergodicity”, which is pretty much a frequentist mantra. The problem with non-statisticians like Nate Silver (who holds a BS in Econ, as opposed to Taleb, who earned his PhD in Statistics) is that they traffic in electoral polls involving human subjects, which are anything but ergodic. Yet they brazenly attribute probability models to their conclusions. That has severe definitional issues, which is what Taleb objects to.

As to the gatekeeping issues, my mom buys eggs for $3 and groceries for $7. 3+7=10, so she hands over a $10 bill. Wait a minute, does that mean my mom is a mathematician, because she has engaged in arithmetic, which is math? That’s the issue here. Silver could simply post how many people want to vote which way & leave it at that. It’s just an opinion poll. Instead he engages in an elaborate charade where he takes polls of polls, ensemble averages, then says this is what is most likely, then on election day, when most likely becomes least likely, he washes his hands of it. That’s just clear-cut fraud. All Taleb is saying is that you simply don’t have ergodicity here, so you don’t have a probability. You just have a [0..1] fraction that doesn’t mean much because it doesn’t converge to a limit as information approaches infinity. So if you had skin in the game and purchased an option based on the projected outcome, you would lose big time.


I don't understand your (or Taleb's) point. If you insist on "no probability without ergodicity" then you can't put a probability on stock price movement, or any election or sporting event. But people put probabilities on those all the time. If that breaks the assumptions of frequentist models then it's the frequentists who are wrong about what people mean when they use probabilities.

I agree with the idea that Nate Silver is an entertainer, and if he were a bookie he'd need to be more precise about the odds he places on things and his range of uncertainty. But to me that doesn't suggest that Silver shouldn't speak in terms of probability at all. For a journalist (a big caveat) he's remarkable for being rigorous, open about his level of certainty, and willing to admit to mistakes.


I disagree that statisticians should adjust their definitions to fit common usage.

I can put any probability on anything I want. That doesn’t imply those numbers mean anything useful or that what I’m doing is backed by any established theory.


There are established theories of probability that have nothing to do with ergodicity. See for example:

Laplace - Théorie analytique des probabilités - 1812

Keynes - A Treatise on Probability - 1921

Jeffreys - Theory of Probability - 1939

Savage - The Foundations of Statistics - 1954

de Finetti - Teoria delle Probabilità - 1970


I hardly think it's fair to say that Nate Silver isn't interested in accountability for his predictions. The site recently finished a retrospective that tries to address this issue:

https://projects.fivethirtyeight.com/checking-our-work/

Probability theory, whether frequentist or otherwise, is "just" math. Identifying that mathematical structure with real-world events is always going to involve a judgment call. So if your framework requires assumptions that don't hold in the real world, then maybe it's not the appropriate framework to use. Because clearly people are making decisions based on numbers between 0 and 1, and they're not generally losing big time.


> So if you had skin in the game and purchased an option based on the projected outcome you would lose big time.

Would you lose at any odds? Then the probability is zero.

Would you (expect to) lose at some odds and win if the odds offered were different? Then the probability is somewhere in between.


> as opposed to Taleb who earned his PhD in Statistics

Wrong.


Management Science / Nassim Taleb - 1998 : The Microstructure of Dynamic Hedging

https://helyettegeman.com/phd-students/

Not so different from economics.


your reply made me question myself, so I got Skin in the Game off the shelf and, whaddya know, he _does_ give a definition:

"Perfect ergodicity means that each one of us, should he live forever, would spend a proportion of the time in the economic conditions of the entire cross-section [of the U.S. population]: out of, say, a century, an average of sixty years in the lower middle class, twenty years in the blue-collar class, and perhaps one single year in the one percent"


Honest question: Aren't both equivalent? If the process will eventually traverse its entire state space, you should be able to infer the behavior of the process across all time from a finite (sufficiently large) sample of the process's behavior.


The second version is much looser, e.g. most semi-stationary processes fit the bill.

In fact, the article makes a mistake by requiring stationarity of an ergodic process. You can have an ergodic process of the first kind, where the ML expectation equals the value of the process function, which is nonetheless not stationary in the absolute sense (because the definite integral does not match the indefinite one): perfectly predictable, with the indefinite integral identical to the integral over the state space. That implies strictly bounded, but not exactly stationary in any defined sense.

Likewise you can prove an equivalence class of the process to be identical to another known ergodic process without actually calculating such difficult to obtain complete integrals.

And ergodicity is not required for representative sampling either, just definite error bounds on predictive (statistical) power or likelihood estimates. Which can be estimated or directly calculated for many kinds of processes accurately. It is only required for certainty.


Yes, that's the content of the Birkhoff Ergodic Theorem.


Naively I would have thought that there is another definition, which goes along the lines that a process is ergodic if it loses memory of its starting conditions. But maybe I’m mixing up things here.


That's essentially equivalent, except in pathological synthetic cases, as time goes to infinity. Since any sample is representative of the whole, samples that don't include the initial conditions are as representative as samples that do.


Well, trivially, a dynamical system that switches between two states is ergodic, yet we always know what the past state of the system was given the present.

No, ergodicity is more about visiting each part of the phase space with some positive frequency, i.e. on a long trajectory you see everything that can happen, and not too infrequently.


The dictionary is not a reliable place to look up technical terms. (Actually, it's not a reliable place to look up most anything.)


Are you contesting either of the meanings?


No, I'm questioning the validity of answering the title question by consulting a dictionary. That's not how one should go about answering technical questions, even if we find it works sometimes. (The proper way to address the question would also allow you to recognize when a dictionary was wrong.)

Dictionaries don't actually tell you what a word means, they give examples of what it may mean from common usage.


There is a recent paper[1] and an epic twitter thread[2] well worth the reading

[1] https://arxiv.org/pdf/1906.04652.pdf

[2] https://twitter.com/hulme_oliver/status/1139148255969906689


The twitter thread gives a nice example of a failure of ergodicity that I was not aware of:

"Now consider a gamble with multiplicative dynamics.

Win 50% of your current wealth for heads, lose 40% of your current wealth for tails.

Changes in wealth now are non-ergodic, so calculating the expectation value is not informative of the time average growth rate of wealth.

This exact gamble has a positive expectation value, but it has a negative time average growth rate. We call it the Peters coin game."

Clearly the expectation of return at a fixed time is > 0, but they give an example time series with obvious negative drift.
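The split between the two averages is easy to see in a quick simulation. This is an illustrative Python sketch of the Peters coin game as described above (the seed, horizon, and player counts are arbitrary choices): the ensemble mean grows like 1.05^n, while any single long trajectory compounds at roughly sqrt(1.5 * 0.6) ≈ 0.949 per flip and collapses toward zero.

```python
import random

random.seed(42)

# Ensemble average over many players, short horizon:
# expected wealth after 20 flips is 1.05**20 ≈ 2.65 (looks like a great game)
n_players, horizon = 10_000, 20
total = 0.0
for _ in range(n_players):
    w = 1.0
    for _ in range(horizon):
        w *= 1.5 if random.random() < 0.5 else 0.6
    total += w
ensemble_mean = total / n_players

# Time average for a single player, long horizon:
# per-flip growth factor is sqrt(1.5 * 0.6) ≈ 0.949 < 1, so wealth -> 0
w = 1.0
for _ in range(10_000):
    w *= 1.5 if random.random() < 0.5 else 0.6
single_player_wealth = w
```

Same rules, opposite conclusions, depending on whether you average across players or along one player's lifetime.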


I'm confused. My takeaway was that dice rolls and coin tosses are ergodic because flipping one coin 20 times or flipping 20 coins is meaningfully equivalent. Whereas following an individual person for 1000 days is different from following 1000 people for 1 day.

Is that not the right heuristic?

Also -- why is it called ergodicity? I keep thinking it has something to do with work or thermodynamics, but I'm missing it...


You are remembering the ergodic hypothesis in thermodynamics, where I think the name originated and was then generalized.


Is there a relationship?


In a sense. The ergodic hypothesis is basically that the phase space is explored "equally" (in a precise sense) over time. Ergodic random processes are pretty much independent of their initial state. So at a hand-wavy level it's a very similar idea.


How does this work?

By the binomial theorem, the expectation after n flips is (1.5 + 0.6)^n / 2^n = 1.05^n, so the "annualized" return per unit time is 1.05, exactly the same as the single-flip expectation.

What is "time average growth rate"? Web search turns up lots of references to it by Peters, but no definitions.


You have the wrong model. You're betting a different amount each time. Say you start with $1 and get a heads then a tails:

($1 * 1.5) * .6 = $.90

What if you get a tails then a heads?

($1 * .6) * 1.5 = $.90

Doesn't seem like such a great game now, does it?

The Kelly Criterion sets a limit for how much of your total wealth you should bet when the odds favor you. If you bet more than that limit you increase your odds of losing everything without improving your expected return.

So imagine a better game where you could bet any amount, and you still got paid $5 for every $4 you risked and had a 50% chance of winning. The correct amount to bet is 10% of your bankroll. If you bet more than that in the long run you will go broke.
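The 10% figure matches the standard Kelly formula f* = p - q/b, where b is the net odds (profit per unit staked). A minimal Python sketch; kelly_fraction is an illustrative helper, not a library function:

```python
def kelly_fraction(p, b):
    """Kelly criterion: optimal fraction of bankroll to stake.

    p: probability of winning; b: net odds, i.e. profit per unit staked.
    """
    return p - (1 - p) / b

# The game above: $5 profit per $4 staked (b = 5/4), 50% win probability
f = kelly_fraction(0.5, 5 / 4)   # f ≈ 0.1, i.e. bet 10% of bankroll
```

At even money (b = 1) with a fair coin the formula gives 0, i.e. don't bet at all, which is the sanity check you'd hope for.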


You are ignoring half of the possible outcomes. If you get HH or TT, that's $2.25 and $0.36, which average to $1.305.

50% of ($1.305 + $0.90) = $1.1025 expected per $1 bet seems like a great game to play.


This is actually an awesome illustration of why a lot of famous results in academic finance were wrong in the 70s and 80s. Returns don't average, they multiply. If you get HHTT or TTHH you're in terrible shape:

2.25 * .36 = .36 * 2.25 = .81

You started with $1 and now have 81 cents. I recently learned that a lot of early computer-era academic finance actually made the same error of averaging returns, as described in Finding Alpha:

http://falkenblog.blogspot.com/2016/08/finding-alpha-pdf.htm...


After many rounds, you expect to see half heads and half tails. In that case your outcome after n rounds would be around 1.5^(n/2) * 0.6^(n/2) = 0.9^(n/2), which trends towards zero.

You would need around 26% more heads than tails just to break even (0.6 * 1.5^1.26 = 1). But even if you were lucky at first, you will always "regress toward the mean" in terms of the heads-to-tails ratio if you keep playing.
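The 26% figure can be checked directly by solving 1.5^r * 0.6 = 1 for r, the number of heads needed per tail to stay even. A quick sketch, assuming the 50/40 game above:

```python
import math

# heads needed per tail so that 1.5**r * 0.6 == 1
r = -math.log(0.6) / math.log(1.5)   # ≈ 1.26, i.e. ~26% more heads than tails
check = 0.6 * 1.5 ** r               # ≈ 1.0, break-even
```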


> If you bet more than that in the long run you will go broke.

The optimal bet is not the same as the critical bet separating the positive growth and negative growth regions.


You're correct. Betting more than double the Kelly amount results in negative expected growth:

https://wizardofodds.com/gambling/kelly-criterion/


This is assuming additive growth, not multiplicative growth. The expectation is relevant when you have additive growth, but multiplicative growth is the relevant model here because you are increasing by 50% of your wealth vs. decreasing by 40% of your wealth (not adding fixed amounts, e.g., +$0.50 vs. -$0.40). Multiplicative growth also returns about 0.95 per unit time, whereas the (irrelevant) expectation value is 1.05 (hence the violation of most people's intuition). Here is some R code demonstrating it:

    set.seed(123)

    # simulating multiplicative growth
    n <- 1e4
    init_wealth <- 1
    x <- sample(c(0.6, 1.5), n, replace=TRUE)
    init_wealth * prod(x)

    # analytic model: per-flip geometric rate from observed frequencies
    n1 <- length(x[x==0.6])/n
    n2 <- length(x[x==1.5])/n
    rate <- (0.6^n1) * (1.5^n2)
    init_wealth * (rate^n)

(Edit: formatting, and wording for clarity)


>calculating the expectation value is not informative

Well, only if you use a linear estimator, i.e. the arithmetic mean. Any accountant would know to use a geometric mean (the annualised rate of return).
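For the 50/40 game the two means disagree, which is the whole point: the arithmetic mean (the ensemble view) is above 1, while the geometric mean (the rate a single player actually compounds at) is below 1. A one-line check in Python:

```python
import math

arithmetic_mean = (1.5 + 0.6) / 2       # 1.05: the per-flip expectation
geometric_mean = math.sqrt(1.5 * 0.6)   # ≈ 0.9487: the per-flip compounding rate
```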


It is still an interesting example. Following your suggestion, which is a good one, the player's holdings at time N, starting with $1, are:

   H = [1.5 or 0.6] * [1.5 or 0.6] * ... } for N factors
To make it into a more familiar process, take logs:

   log H = log([1.5 or 0.6]) + log([1.5 or 0.6]) + ...
Now we have a familiar random walk with drift, and our intuition is much better-developed for this process. In particular, the mean

  E log H = Sum{1..N} E log([1.5 or 0.6])
          = Sum{1..N} [-0.023]
          = -0.023 N 
          < 0
So the expectation of the log earnings is negative. In fact, because of the drift, sample paths of log H will be unbounded below with probability 1. This nontrivial fact (from the basic theory of random walk) will determine how the process sample paths (even outside the log domain) "look" - they will always eventually squash towards 0. Every single one.
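The -0.023 per-step drift above corresponds to base-10 logs; natural logs give about -0.053, with the same sign and therefore the same conclusion. A quick check:

```python
import math

# expected per-flip change in log wealth, heads and tails equally likely
drift_base10 = 0.5 * math.log10(1.5) + 0.5 * math.log10(0.6)  # ≈ -0.023
drift_natural = 0.5 * math.log(1.5) + 0.5 * math.log(0.6)     # ≈ -0.053
```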

But the expectation of the earnings themselves (as several comments have shown) is > 1. Apparently, without the compression given by the log, the upward excursions given by lucky bets outweigh the unlucky ones.

We can look at this discrepancy ("good game in original units, bad game in log units") the other way. Even though the sample paths eventually go to zero, in that case, you only lose $1. Whereas many other sample paths will make large excursions upward before coming to zero. So the expectation of the non-logged process adds up over many losses of $1, and a few low-probability gains of much more.


They're simply calculating expectation incorrectly. Expectation is a probability-weighted sum over every attempt, while the 50/40 game compounds as a probability-weighted product over every attempt.

Basically the "average expectation" for that game is 0.95 (and they give a false name to it).

Tl;dr: it's got nothing to do with ergodicity.

More importantly, they quote "Peters coin game" which has no results in Google whatsoever.


>>> More importantly, they quote "Peters coin game" which has no results in Google whatsoever.

Maybe a reference to this paper [1] ?

[1] The time resolution of the St. Petersburg paradox, by Ole Peters

https://arxiv.org/pdf/1011.4404.pdf


> they quote "Peters coin game" which has no results in Google whatsoever.

As he says in the Twitter thread anyway, they are coining this name.


The expectation of a single flip is 'positive' ( > 1 ), the expectation of n flips is 'negative' ( < 1).

My layman reading of this is that this 'appears' to be a violation of martingale theory because this 'should' be a sub-martingale but martingale theory requires finite variance so I suspect the variance of this must go infinite as n tends to infinity?


Why do you say that the expectation of n flips is negative?

For 2 flips there are 4 equiprobable cases with gains +125%, -10%, -10%, -64%, and the expectation is 10.25%. The expected gain for n flips is 1.05^n - 1.


AFAIK (am reading quickly) mean values of COMPOUND interest rates (which is what you are doing) are not useful.

If you earn 10% and then lose 10% you are worse off than before. Start with 10: 10 × 1.1 = 11, then 11 × 0.9 = 9.9 < 10.


Earning 50% and then losing 40% is equivalent to losing 10%. The naïve calculation you're arguing against would yield a gain of 10%. kgwgk didn't make that mistake.


The definition of expectation is what it is :-)

I agree that expectation may not be the most meaningful number here; in the long run it is almost sure that we lose everything, but there is a vanishingly small probability of enormous gains that makes up for it in expectation!


1.05^n is > 1, with positive finite expectation at every finite time step.

Why do you say almost sure to lose everything?


For any threshold (say $1, assuming you start with $1mn) and any confidence level (say 99%) there is a number of flips n such that you have over 99% probability of holding less than $1 after playing n times. On the other hand, the expected value of your wealth at that time is 1.05^n million dollars.
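A simulation sketch of that statement (the seed, flip count, and player count are arbitrary choices): after 1000 flips nearly every simulated player who started with $1mn holds less than $1, while the ensemble mean multiplier is 1.05^1000, an astronomically large number.

```python
import random

random.seed(0)

n_flips, n_players = 1000, 2000
start = 1_000_000.0

below = 0
for _ in range(n_players):
    w = start
    for _ in range(n_flips):
        w *= 1.5 if random.random() < 0.5 else 0.6
    if w < 1.0:
        below += 1

frac_below = below / n_players     # ≈ 0.99: almost every player is broke
ensemble_mean = 1.05 ** n_flips    # ≈ 1.5e21: the "expected" wealth multiplier
```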


This is well-stated.

In fact, an even stronger statement is possible: If you start with $1MM, eventually you will fall below $1 and never go above $1 again no matter how long you keep playing.

And all of that not just on a fraction of sequences, but with probability 1.


True, as I said in a previous comment we have “almost sure” convergence to zero (i.e. the sequence will converge to zero with probability one).


You're right, I screwed up the arithmetic making it look like something it's not.

It's just a regular sub-martingale with 'positive' expectation as you say.


In addition to the paper and Twitter thread mentioned by fasteo elsewhere on this thread[a], this video of a lecture by the author is well-worth watching too:

https://www.youtube.com/watch?v=f1vXAHGIpfc

Among other things, in this lecture the author shows, step by step, in non-technical language, an example with coin tosses in which changes in wealth are non-ergodic (in the sense he explains), such that the expected value of winnings over time is not informative of the time average growth rate of wealth.

[a] https://news.ycombinator.com/item?id=20285638


Getting a good feel for both stationarity and ergodicity in systems is worthwhile; assuming them is a common way to “intuitively” infer incorrect behaviour.


The author writes,

>Paul Samuelson once famously claimed that the “ergodic hypothesis” is essential for advancing economics from the realm of history to the realm of science.

It's curious that science and history are pitted against each other; not only is science notoriously difficult to define, but there are several historical approaches to scientific objects (e.g. archaeology for the object of human history). This also seems to involve a logical model in which the concepts stay static, but some popular models outside of mainstream economics are used because their authors argue that an approach divorced from real development leads to results that don't apply to real situations (e.g. the dialectical logical method, which is both syntactic and semantic).

Unfortunately the sort of logic Samuelson applies has been misapplied (through neo-Ricardian lenses) to thinkers who seem to hold strictly temporal (rather than equilibrium) interpretations of economy.


Ergodic and chaotic are not the same thing either (the difference lies in the long-time diverging behavior of two trajectories that start very close together); many lazy physics science writers, and occasionally physicists, conflate them.


Great read. Eat your heart out, Betteridge's Law!



