So You Think You Have a Power Law (umich.edu)
83 points by saurabh on Dec 15, 2013 | 17 comments



The author is quite an interesting thinker, and I like many of his posts. (I agree that the default font size on his website is much too small, so I zoomed in with my browser to read it.) The key claim in his abstract is important and deserves more attention. (The article title in the HN submission should include "2007", as this post dates back that far.)

"Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the empirical detection and characterization of power laws is made difficult by the large fluctuations that occur in the tail of the distribution. In particular, standard methods such as least-squares fitting are known to produce systematically biased estimates of parameters for power-law distributions and should not be used in most circumstances."

The author's take-home points are a list of good practices. He comments that much of the content of whole journals would disappear if authors and reviewers followed these practices.

"1. Lots of distributions give you straight-ish lines on a log-log plot. True, a Gaussian or a Poisson won't, but lots of other things will.

"2. Abusing linear regression makes the baby Gauss cry.

"3. Use maximum likelihood to estimate the scaling exponent. It's fast! The formula is easy! Best of all, it works!

"4. Use goodness of fit to estimate where the scaling region begins.

"5. Use a goodness-of-fit test to check goodness of fit. In particular, if you're looking at the goodness of fit of a distribution, use a statistic meant for distributions, not one for regression curves.

"6. Use Vuong's test to check alternatives, and be prepared for disappointment. Even if you've estimated the parameters of your power law properly, and the fit is decent, you're not done yet.

"7. Ask yourself whether you really care. Maybe you don't. A lot of the time, we think, all that's genuinely important is that the tail is heavy, and it doesn't really matter whether it decays linearly in the log of the variable (power law) or quadratically (log-normal) or something else."

Good stuff. It takes a lot of practice to get statistical analysis right.
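To make points 3 to 5 concrete, here is a minimal Python sketch for the continuous case. It assumes the closed-form maximum-likelihood estimate alpha = 1 + n / sum(ln(x_i / xmin)) and picks xmin by minimizing the Kolmogorov-Smirnov distance between the tail and the fitted model. The function names, the synthetic data, and the `min_tail` cutoff are illustrative choices of mine, not taken from the article.

```python
import math
import random


def sample_power_law(n, alpha, xmin, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha) for x >= xmin."""
    # Inverse-transform sampling; 1 - u lies in (0, 1], so the power is finite.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(tail, xmin):
    """Maximum-likelihood exponent for the continuous case, using only x >= xmin."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(tail, xmin, alpha):
    """Largest gap between the tail's empirical CDF and the fitted power-law CDF."""
    ordered = sorted(tail)
    n = len(ordered)
    worst = 0.0
    for i, x in enumerate(ordered):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst


def choose_xmin(data, min_tail=50):
    """Try each observed value as xmin; keep the fit with the smallest KS distance."""
    values = sorted(data)
    best = None
    for start, xmin in enumerate(values):
        tail = values[start:]
        if len(tail) < min_tail:  # hypothetical cutoff: too few points to fit
            break
        alpha = fit_alpha(tail, xmin)
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (xmin, alpha, d)
    return best


rng = random.Random(42)
body = [rng.uniform(0.1, 1.0) for _ in range(500)]  # not power-law distributed
tail = sample_power_law(1000, alpha=2.5, xmin=1.0, rng=rng)
xmin_hat, alpha_hat, ks = choose_xmin(body + tail)
print(f"xmin ~ {xmin_hat:.2f}, alpha ~ {alpha_hat:.2f}, KS distance = {ks:.3f}")
```

The scan is quadratic in the sample size, which is fine for a demonstration but not for large data sets. A small KS distance here is only a fitting criterion; turning it into a goodness-of-fit p-value still needs a separate step.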


Random note to the author, if they read this: the default font is unreadably small on my 24" monitor.


Use CTRL and +/- to zoom in/out.


Same here, unreadable to me


Install iReader on Chrome (like Safari reader).


I love Clauset, Shalizi, and Newman for keeping up this fight. Even if you agree with them, you still have to include a linear regression in your paper to satisfy reviewers.

I was surprised to see this on HN, though. I guess everyone is trying to use power laws in one way or another.


Well, ever since Saddam tried to use power law to justify his invasion of Kuwait we've all been stepping on eggshells.


What?


Visually identifying power laws from a log-log plot is a pervasive anti-pattern; this is a good treatment of why we should be sceptical and what we should do instead. Thanks to OP.
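A quick way to see why the eyeball test fails: the sketch below draws from a log-normal distribution (so, by construction, not a power law), keeps the upper half as the "tail", and fits a straight line to the log-log empirical CCDF by ordinary least squares. The sample size, the sigma of 2.0, and the upper-half cutoff are arbitrary choices for illustration.

```python
import math
import random


def loglog_ccdf_r2(sample):
    """R^2 of an ordinary least-squares line through the log-log empirical CCDF."""
    xs = sorted(sample)
    n = len(xs)
    lx = [math.log(x) for x in xs]
    # Empirical CCDF at the i-th smallest point: fraction of the sample >= it.
    ly = [math.log((n - i) / n) for i in range(n)]
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy * sxy / (sxx * syy)


rng = random.Random(0)
# Log-normal draws, NOT a power law; keep only the upper half as the "tail".
draws = sorted(rng.lognormvariate(0.0, 2.0) for _ in range(4000))
upper_half = draws[len(draws) // 2:]
r2 = loglog_ccdf_r2(upper_half)
print(f"R^2 of a straight line on the log-log CCDF: {r2:.3f}")
```

The R^2 comes out close to 1 even though the data are log-normal, which is exactly why a straight-looking line and a high R^2 say very little about whether a power law is present.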


This pattern annoys me to no end. People are quick to jump from a claim like "some users are more valuable than others" to "POWER LAW", and from there to folksy wisdom about an "80-20 rule".


I agree, both terms feel oddly specific to me when people throw them around casually, but the layman use is at least directionally correct.

Much more cringe-inducing are all of the invented definitions of the Law of Large Numbers! (e.g. http://www.fastcompany.com/1825592/9-reasons-choose-corporat...).


Guh. I encountered this so much on an investment site that I wrote a blog post about the Law of Large Numbers because it was annoying me enough that I wanted to have a shorthand link. http://confounding.net/2012/03/12/thats-not-how-the-law-of-l...

I equally hated "Well, it means something different in this field." No it doesn't, it's math. That's the whole point.


About eight months ago I took the data from the Angel Investor Performance Project(1) and ran it through the powerlaw software(2).

The software said the data on angel-funded startups (at least the ones in the AIPP survey) did not indicate that returns followed a power law. Given how often we say "power law!" with regard to startup outcomes, I just thought that was interesting.

(1) http://www.angelcapitalassociation.org/data/Documents/Resour...

(2) http://code.google.com/p/powerlaw/
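For readers wondering what a comparison like this involves, here is a standard-library sketch of a Vuong-style likelihood-ratio test. It is not the powerlaw package's code, and it does not use the AIPP data. As a simplifying assumption it compares a power law against a shifted exponential above a fixed xmin, because both have closed-form maximum-likelihood fits; the function name and the synthetic data are hypothetical.

```python
import math
import random


def vuong_power_law_vs_exponential(tail, xmin):
    """Return (log-likelihood ratio, two-sided p-value); ratio > 0 favors the power law."""
    n = len(tail)
    # Closed-form maximum-likelihood fits for both candidates above xmin.
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    lam = n / sum(x - xmin for x in tail)
    # Pointwise log-likelihood differences between the two fitted models.
    diffs = []
    for x in tail:
        ll_power = math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin)
        ll_expon = math.log(lam) - lam * (x - xmin)
        diffs.append(ll_power - ll_expon)
    total = sum(diffs)
    mean = total / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    # Normalized ratio is approximately standard normal if the models fit equally well.
    z = total / (sd * math.sqrt(n))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return total, p_value


rng = random.Random(7)
# Synthetic power-law tail with exponent 3.5 and xmin = 1 (inverse-transform sampling).
synthetic = [(1.0 - rng.random()) ** (-1.0 / 2.5) for _ in range(5000)]
ratio, p = vuong_power_law_vs_exponential(synthetic, xmin=1.0)
print(f"log-likelihood ratio = {ratio:.1f}, p = {p:.3g}")
```

A large p-value means the data cannot tell the two candidates apart, which is a different outcome from the alternative winning. A heavier-tailed alternative such as the log-normal is the more demanding comparison, but fitting it above xmin needs numerical optimization.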


Investment returns may not follow a power law given that they can come from "Mediocristan". The distribution of startup values for both funded and unfunded startups (including IPOs AND failures) would come from "Extremistan" due to the presence of outliers. The data from Extremistan will more likely follow a power law distribution. The terms Mediocristan and Extremistan come from the writings of Nassim Taleb, author of Fooled by Randomness and The Black Swan: The Impact of the Highly Improbable.


> The data from extremistan will more likely follow a power law distribution.

That's exactly the kind of logical inference that the linked article warns against, unless you have some additional reasoning behind that assertion.


In general, very, very few things in nature follow a power law. And either way, as the article states, it is fairly difficult (if not impossible) to even make sense of what is a power law. Notably, in phenomenological cases, power laws have no interpretable units; those come about only through the use of constants.


This post is fantastic and the paper is fascinating. I particularly enjoyed his writing:

> This is why God, in Her wisdom and mercy, gave us the bootstrap.
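One common use of the bootstrap in this setting is to put an interval around the fitted exponent. The sketch below is a plain nonparametric bootstrap with xmin held fixed, on synthetic data. That is a simplification I am assuming for brevity, and it is not necessarily the procedure used in the paper. A fuller version would re-estimate xmin on every resample.

```python
import math
import random


def fit_alpha(tail, xmin):
    """Maximum-likelihood exponent for a continuous power law above xmin."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def bootstrap_alpha_interval(tail, xmin, n_boot, rng):
    """95% percentile interval for alpha from resampling the tail with replacement."""
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(tail, k=len(tail))
        estimates.append(fit_alpha(resample, xmin))
    estimates.sort()
    low = estimates[int(0.025 * n_boot)]
    high = estimates[int(0.975 * n_boot) - 1]
    return low, high


rng = random.Random(3)
# Synthetic tail with true exponent 2.5 and xmin = 1 (inverse-transform sampling).
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(1000)]
point = fit_alpha(data, 1.0)
low, high = bootstrap_alpha_interval(data, 1.0, n_boot=500, rng=rng)
print(f"alpha = {point:.2f}, 95% bootstrap interval = ({low:.2f}, {high:.2f})")
```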



