General factor of intelligence g, a Statistical Myth (umich.edu)
42 points by eru on Feb 12, 2011 | 33 comments



It's a great article. But it's a bit more philosophical than practical, and I particularly take issue with the author's use of the term "statistical myth".

Near as I can tell, Thompson's model is the following: there are lots of abilities, and any test uses many (232+) of them at random. Individuals have a random selection of abilities. The author then declares that any individual ability is not g, since no specific ability describes performance very well. All very true.

But there is a single, explanatory variable in this model: g = # of abilities an individual has [1]! Moreover, Thompson's model makes a specific prediction about g - it should be normally distributed. So overall, I agree with the author's scientific argument: g is very likely to be decomposable into subfactors.
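This reading of the model is easy to check numerically. Here is a minimal sketch (mine, not from the article; the pool size, test size, and 50% ability rate are arbitrary assumptions) in which g is literally the count of abilities a person has:

```python
import numpy as np

rng = np.random.default_rng(0)

n_abilities = 1000   # pool of independent elementary abilities (assumed size)
n_people = 500
n_tests = 8
per_test = 300       # abilities each test samples (illustrative)

# Each person either has or lacks each ability, independently.
has_ability = rng.random((n_people, n_abilities)) < 0.5

# "g" in this reading: how many abilities a person has (binomial, so
# approximately normally distributed, as the model predicts).
g = has_ability.sum(axis=1)

# Each test samples a random subset of abilities; score = # possessed.
scores = np.empty((n_people, n_tests))
for t in range(n_tests):
    subset = rng.choice(n_abilities, size=per_test, replace=False)
    scores[:, t] = has_ability[:, subset].sum(axis=1)

# The test scores all correlate positively with one another and with g,
# even though no single ability explains much of anything.
corr = np.corrcoef(scores, rowvar=False)
print(corr.min())
print(np.corrcoef(g, scores[:, 0])[0, 1])
```

Every pairwise test correlation comes out positive (tests overlap in which abilities they sample), and each test tracks g, which is exactly the "positive manifold" the factor analysts observe.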

But I don't agree with his claim that g is a "statistical myth". Let me give an argument illustrating the fallacy he is making. Suppose I want to explain the ideal gas law PV = nRT. I can build a moderately more complicated statistical model [2] involving only 10^23 newtonian particles, with normally distributed velocities, and completely reproduce all the predictions of thermodynamics. But not a single one of those particle positions explains pressure or temperature! Thus, thermodynamics is just a statistical myth.

Thermodynamics and g are simplified models of the world, based on the fact that the macroscale depends primarily on the sum of a large number of microscale variables [3]. They both have decent, though imperfect, predictive power. There is almost certainly a more complicated underlying theory, which will reproduce thermo/g as theorems about statistical aggregates. (For example, g may eventually be explained as the interaction of neurons.) Does this make them "statistical myths"? Of course not. Just macroscale models which have an underlying microscale explanation.
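The thermodynamic half of the analogy can be sketched the same way. In this toy 1D model (mine; constants dropped, unit-variance velocities assumed), "temperature" is just the mean squared velocity of a large ensemble: any single particle tells you almost nothing, but the aggregate is extremely stable:

```python
import numpy as np

rng = np.random.default_rng(1)

# Microscale: velocities of a million particles (toy 1D Maxwell-like model).
n = 1_000_000
v = rng.normal(0.0, 1.0, size=n)

# Macroscale "temperature": proportional to mean kinetic energy.
T = np.mean(v**2)

# Two disjoint halves of the ensemble agree on T almost exactly,
# while any individual v[i] is uninformative about it.
T_half1 = np.mean(v[:n // 2] ** 2)
T_half2 = np.mean(v[n // 2:] ** 2)
print(T, T_half1, T_half2)
```

The macroscale variable concentrates tightly around its expectation even though no microscale variable "is" the temperature, which is the sense in which T and g are the same kind of object.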

[1] Or perhaps a weighted average based on how frequently abilities are used in tests.

[2] http://en.wikipedia.org/wiki/Statistical_mechanics

[3] For example, pressure is the average force imparted by particles colliding with the side of a vessel, divided by the area on which the collisions occur.


Reading the original article further I understand more of what the author meant -- and I agree with him more.

What he means by "statistical myth", I think, is that you can throw any number of measurements into a big bag, even weakly correlated ones, and you'll always be able to find a general macro-factor that is correlated with all of them and therefore seems to "describe the bag" pretty well... even if the bag itself is absolutely meaningless.

For example, one could throw into the bag height, sexiness, foot length, and BMI, which are all somewhat correlated with one another and with "IQ".

There is a macro factor describing such a bag; but would you call it "general intelligence"?

We talk about g because we consider a priori that there actually is such a thing as "general intelligence" and that we are measuring elements of it. Maybe we are, maybe we're not; correlations alone tell us nothing about it.

The analogy with physics, be it temperature or pressure or what have you, is in my opinion quite flawed, because in the case of those entities we know that the elementary events / forces are linked by way of causality with the macro factor, and we know that for reasons other than mere correlation.

In the case of g, on the contrary, the only link between the different elements is the correlation itself. And it gets worse: to be included as a relevant IQ test, a new test has to be correlated with g. To quote from the article:

"By this point, I'd guess it's impossible for something to become accepted as an "intelligence test" if it doesn't correlate well with the Weschler and its kin, no matter how much intelligence, in the ordinary sense, it requires. (...) This is circular and self-confirming, and the real surprise is that it doesn't work better."

So g is at the same time what we're looking for and what we're building upon, having decided it's there already.

This looks much more like religion than science.


For example, one could throw into the bag height, sexiness, foot length, and BMI, which are all somewhat correlated with one another and with "IQ".

The point is that if those phenomena are all correlated with one another, there is likely to be an underlying reason for that. Maybe it's not general intelligence, but there is something.

Suppose you threw a different set of things into the bag: phase of the moon at birth, whether you were bitten by a wolf, and how good you are with computers. These things almost certainly will not be correlated, because there is indeed no relationship between them. The only reason the author discovered a large principal component in all his models is that he explicitly built one in!
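The two bags are easy to contrast numerically. In this sketch (mine; the 0.7 loadings and six variables are arbitrary), one bag has a common factor deliberately built in and the other doesn't, and only the first yields a dominant principal component:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Bag A: variables deliberately built on a shared latent factor.
latent = rng.normal(size=n)
bag_a = np.column_stack([0.7 * latent + 0.7 * rng.normal(size=n)
                         for _ in range(6)])

# Bag B: genuinely unrelated variables (moon phase, wolf bites, ...).
bag_b = rng.normal(size=(n, 6))

def leading_share(x):
    # Fraction of total variance captured by the first principal component
    # of the correlation matrix.
    eigvals = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))
    return eigvals[-1] / eigvals.sum()

print(leading_share(bag_a))  # large: a "general factor" appears
print(leading_share(bag_b))  # near 1/6: no dominant factor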

Now, my analogy with physics is not to the physics of today. Rather, it's to physics pre-Boltzmann. Before that, we didn't really know how the atomic theory of matter was related to thermodynamics. All we really had were correlations - boiling is correlated with burning, with faster chemical reactions, with the expansion of solids, with the human perception of warmth, etc. We observed they all occurred together, and postulated an underlying variable T which correlated well with all of them. We then exploited the correlation between this hidden variable T and thermal expansion to design a specific test to measure it (thermometers).

Pre-Boltzmann, all we had were correlations between fundamentally different physical phenomena. We had various incorrect theories about why some of them occurred together, but that's about it.

Now suppose Boltzmann comes along. He says "what you call a single variable T, I can explain with the sum of many variables." Does that mean it's reasonable to conclude T is a "statistical myth"?


See, the article is not arguing that. The article has nothing against latent variables to account for what we observe as intelligence. What it is against is misapplication of methodology. The myth is not a latent variable summarizing intelligence; that may very well exist, as the author admits. The myth is that g is derived in a meaningful way and explains the correlations in the data, when it is instead a byproduct of the fact that the data are made to correlate (these days) and is just a measure of the correlation of the tests (now made to correlate). Again, I remain sceptical that you read the article in full.

g is not as useful as T, since its ability as an explanatory variable and verification in experimental settings are sorely lacking.

Also, even if a valid concept of a single explanatory variable for intelligence were created, I personally remain sceptical of the scope of its usefulness, considering the complexity of the space at hand (humans, genes, environment, ...); it would likely be a lot less profound and far-reaching than Boltzmann's insights. And on the political side, its capacity for damage could be large - many people's lives could be impacted negatively. So it would have to be wielded carefully; one eugenics movement is enough.


What it is against is misapplication of methodology.

Yes, and the author rightly points out that latent variables do not exclude the possibility of microstructure. That doesn't mean the use of the latent variables is a "statistical myth", unless you define the term "statistical myth" so broadly as to include temperature and pressure.

g is not as useful as T, since its ability as an explanatory variable and verification in experimental settings are sorely lacking.

This is both undisputed, and unrelated to the author's argument. The difference between our pre-Boltzmann understanding of T and our contemporary understanding of g is one of precision. The author's argument was independent of precision, so invoking precision to protect his argument is disingenuous.

You might want to criticize the confidence levels of g. That's a perfectly legitimate thing to do. But that's not what I'm responding to.

So it would have to be wielded carefully, one eugenics movement is enough.

Not sure about that. The Dor Yeshorim organization does such a great job of eugenics, I'd love to see further eugenics movements in other genetically isolated groups.


Yes! This is what I am trying to say, but in a much more muddled and roundabout way.


The author's viewpoint seems to be that things that can be directly measured do exist. So I would think the author would be fine with talking about temperature the same way he's OK with talking about IQ ("that thing that IQ tests measure").

I would agree with you personally, and say that really any measurement involves some implicit figuring so there really isn't any difference between IQ and T.


A flaw in your argument is that biology is harder than physics. That is, the interactions of the elementary constituents of biology are orders of magnitude more complex than those of the point particles of Newtonian physics, which makes sense considering the compositional hierarchy involved, from atoms to cells to humans. Hence reductionist statistical methods are likely to be far less useful/meaningful than in thermodynamics.


I do not care about the downvoting. But I am guessing it is due to my saying that biology is harder than physics [1]. Perhaps I should clarify: a broad mathematical understanding of biology is a harder problem than achieving one for your typical system studied in physics. In biology, systems can only be modeled, not understood or derived from fundamental principles, as the interactions of the basic entities are too complex. Or consider quantum mechanics: it is a straightforward linear theory (just conceptually difficult), while much of biology is analytically and computationally hard, often involving things like non-linear dynamical systems. A further difficulty is that the behaviour of the whole of the things studied in biology cannot be inferred by studying its constituents - due to feedback-dissipation processes leading to self-organization.

While I am clarifying, I might as well point out that while I personally believe that looking for broad general factors is meaningless considering the dimensions in play, there might be some use for them in aggregate. However, these should not be taken for more than they are: statistics. Furthermore, you, yummyfajitas, argue against a strawman. Did you read the article? Because if you did you would have seen:

I don't want to be mis-understood as being on some positivist-behaviorist crusade against inferences to latent mental variables or structures. As I said, my deepest research interest is, exactly, how to reconstruct hidden causal structures from data.

...

Similarly, pointing out that factor analysis and related techniques are unreliable guides to causal structure does not establish the non-existence of a one-dimensional latent variable driving the success of almost all human mental performance. It's possible that there is such a thing. But the major supposed evidence for it is irrelevant, and it accords very badly with what we actually know about the functioning of the brain and the mind.

Which leads me to the fact that Shalizi has no qualms about dimensionality reduction. Instead, he is against the methodologies used and how the conclusions are drawn:

1) g is almost a tautology. It arises from the correlations in tests which are made to correlate. When performing factor analysis on such variables, a dominating factor which explains most of their variance must appear for algebraic reasons.

2) No one has tried to explain g directly, experimentally or otherwise. Instead they still argue in terms of correlations.

3) They still use simple correlation matrices - linear models from yestercentury. More appropriate or robust methods, such as non-parametric statistics, have since been refined or developed.

[1] http://metabolism.math.duke.edu/docs/04whyams.pdf


The downvoting (not by me, BTW) is probably because you didn't understand the math and simply appealed to complexity. Your appeal to complexity was both incorrect (pressure = sum of hidden variables, g = sum of hidden variables, complexity is equal) and also irrelevant (Shalizi's argument is against statistical reductions and is independent of complexity).

My objection is not to the fact that Shalizi is hypothesizing a microstructure to g. Here is what I had to say about that part of the article: "It's a great article... Thompson's model makes a specific prediction about g - it should be normally distributed. So overall, I agree with the author's scientific argument: g is very likely to be decomposable into subfactors."

My objection is simply to the term "statistical myth". By Shalizi's argument, macroscale variables which abstract away an ensemble of microscale variables are a "statistical myth". Depending on your philosophical axioms, that's a fine thing to believe - but we should acknowledge that if you believe this, then you also believe pressure is a statistical myth.

As for the new points you raise:

1) g is a tautology if you deliberately posit a family of normally distributed variables with positive correlations. In fact, it's a tautology if you posit a family of strongly correlated normally distributed variables of any sort (the factors might just have negative components).

This does not, however, explain why performance on various tests is correlated. g and multifactor models are an attempt at explaining this. Thompson's model (which has g built in) is another.

2) Indeed, we don't really understand it. For a long time we didn't understand pressure or temperature either - all we understood was the macroscopic effects. So what?

3) You are clearly speaking about something you don't know much about. If you want to argue that linear models don't work because the data is nonlinear, do it. You haven't. Neither did the author.


Complexity is not equal, because the hidden variables of g (I will use the term as you do for now) are not the hidden variables of pressure.

The term statistical myth is appropriate because g is a myth. It is not backed by any direct experimental evidence but arises from manipulations of statistical techniques. The Gas Law you cite was derived from experiments, not from muddled manufacturings of statistics. And the tests correlate because they are made to correlate - it is all quite circular. Your post misrepresents Shalizi. Shalizi's argument is not "macroscale variables which abstract away an ensemble of microscale variables are a 'statistical myth'". Rather, it is that the way in which the latent variable g is first arrived at and then subsequently used to drive conclusions is invalid and meaningless:

But now new tests are validated by showing that they are highly correlated with the common factor, and the validity of g is confirmed by pointing to how well intelligence tests correlate with one another and how much of the inter-test correlations g accounts for. (That is, to the extent construct validity is worried about at all, which, as Borsboom explains, is not as much as it should be. There are better ideas about validity, but they drive us back to problems of causal inference.) By this point, I'd guess it's impossible for something to become accepted as an "intelligence test" if it doesn't correlate well with the Weschler and its kin, no matter how much intelligence, in the ordinary sense, it requires, but, as we saw with the first simulated factor analysis example, that makes it inevitable that the leading factor fits well. [13] This is circular and self-confirming, and the real surprise is that it doesn't work better.

As I quoted Shalizi prior: 'I don't want to be mis-understood as being on some positivist-behaviorist crusade against inferences to latent mental variables or structures. As I said, my deepest research interest is, exactly, how to reconstruct hidden causal structures from data.'

I also do not understand how you use g; as explained, it is not composed of factors - it is the dominating factor. While I do not know about intelligence tests, I did follow the mathematics he gives and have applied similar techniques in machine learning contexts. And considering the wide variety of data that linear models fail to properly capture, my intuition is that yes, linear methods and Gaussian assumptions are overly simplistic without a solid backing argument, which no one has been able to give for near on a century now.

p.s. you are right, my original post was orthogonal to what Shalizi had to say. It was just my argument against yours, in that I don't feel generalizing to one factor for a system as complex as human intelligence will produce results as meaningful as generalizing to a simple law for a collection of atoms / a gas did.


I think what the author means is that g is a statistical necessity, and that being such, doesn't imply that it's an actual reality.

What you seem to be saying is that, in any case, g is a useful proxy for a reality that is difficult to know in detail. This doesn't contradict the author's thesis.

The question remains, does g exist beyond arithmetic; is there such a thing as "general intelligence"? Maybe it's a useless question, in that we don't need to know whether g "exists": we only need to know that it works.

But there is a big difference between intelligence and thermodynamics: policies are implemented that impact the lives of millions of people based on an understanding of g that may be completely false. If g is only a statistical artifact and nothing more, then it's still useful for statisticians, but for policy makers, the underlying factors need to be identified and understood.


That's why I object to the term "statistical myth" - it implies the measure is not useful in cases when it is. If g is a useful proxy, it's useful for policy makers. All they need to know is that low T implies dead homeless, and low g implies a higher probability of unskilled employment.

The fact that g is actually a sum of independent variables, or T is a sum of squares of velocities of particles is irrelevant to them.


I wish there were more Cosma Shalizis in the world. Somehow, statistics has become the ultimate in cargo cult mathematics. We need more statisticians who can write clearly to set fire to our thatched airplanes.


Pardon my naivety, but what do you mean by "thatched airplanes"?


See: http://en.wikipedia.org/wiki/Cargo_cult

"Thatched airplanes" being an imitation of the real thing.


Nogwater nails it below.

Specifically, I'm thinking of things like http://cscs.umich.edu/~crshalizi/weblog/491.html and http://cscs.umich.edu/~crshalizi/weblog/656.html


Even if we can't measure it, what makes it so hard to believe that intelligence is inheritable? Is height inheritable? Yes. Is eye color inheritable? Yes. Is skin color inheritable? Yes. Is physical strength....? Yes. Why wouldn't a human characteristic that has provided one of the biggest advantages through evolution be inheritable too?


The author isn't saying that intelligence isn't heritable. He just showed that we can measure heritability in things that don't exist (the sum of a person's weight and height in some units) as well as in things that do exist, like their actual weight and height. Therefore, the fact that g is heritable doesn't mean that it's a thing that exists.

Except I think the author would point out that there isn't just _one_ heritability for things like height because height would be much more heritable in environments where everyone has adequate nutrition as opposed to in environments where nutrition was a matter of luck, for instance.
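That point can be illustrated with a toy additive model (my sketch, not the author's; all numbers are purely illustrative): the same genetic contribution yields different heritability estimates depending on how much variance the environment adds:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Stylized genetic contribution to height, identical in both scenarios.
genes = rng.normal(size=n)

def h2(env_sd):
    # Heritability estimated as Var(genetic) / Var(phenotype)
    # in a purely additive toy model: phenotype = genes + environment.
    height = genes + rng.normal(scale=env_sd, size=n)
    return genes.var() / height.var()

print(h2(0.3))  # uniform good nutrition: environment adds little variance
print(h2(1.5))  # nutrition a matter of luck: same genes, lower heritability
```

Nothing about the genes changes between the two calls; only the environmental variance does, which is why a heritability figure is a property of a population in an environment, not of the trait itself.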

All of this assumes a certain philosophical viewpoint on what it means to "really exist", which I don't agree with, but which I do recognize as coherent.


What if we postulate a general factor "b" that correlates with a superior body, is that inheritable? It's not that certain characteristics of the brain aren't inheritable, it's that "intelligence" is very diffuse and hard to define.

It's not even so clear that what many of us would call intelligence is mainly an evolutionary advantage, either.


The example of the German monozygotic twins Otto and Ewald, both well nourished but sportsmen who pursued different sports, shows that physique is exquisitely sensitive to environmental influences, even between two individuals who share a genome and a prenatal environment in the same mother's womb. Take a look at the photos.

http://www.marksdailyapple.com/control-gene-expression/

http://www.joebower.org/2010/05/we-inherit-and-we-also-becom...

AFTER EDIT: I'm asked in a reply below what my point was, and it's partly to point out that the term "heritable" means something far, far different from "determined by genes." There are whole books

http://www.amazon.com/Nature-Nurture-Environmental-Influence...

http://www.amazon.com/Genes-Behavior-Nature-Nurture-Interpla...

http://www.amazon.com/Dependent-Gene-Fallacy-Nature-Nurture/...

by professional geneticists, medical doctors, and psychologists patiently refuting the confusion in most popular literature about what "heritability" means, but the main point in this thread is that Shalizi is correct, and many psychologists are wrong, about what heritability figures mean in relation to IQ.


Not really sure what your point is there. It's natural two people with virtually identical genes subjected to vastly different training would have different body-types. It's also natural that two people with vastly different genes would respond differently to virtually identical training.

Consider someone like this boy vs his classmates: http://www.sciencentral.com/articles/view.php3?type=article&...


Is height entirely inherited? No, diet makes extraordinary differences. Is physical strength entirely inherited? No, exercise builds muscle.

There are of course factors on both sides, but for mental / physical strength I'll happily argue for landing waaaay towards the what-you-do-with-it, not-what-you-have side.


Where I live you honestly can't say that diet stunts people's growth. That would apply in places where food isn't readily available, but certainly not in the western world. From looking at people I know, I can tell you too that the differences in muscle size are more related to how much muscle they had before going to the gym than to how much time they spend there.

Can you start going to the gym and get stronger than your friends who don't go? No. Out of 10 friends maybe you'll pass 1 or 2 of them in strength, but that's it. That is what I have seen but I'd say it is pretty far from what you suggest.


Can you start going to the gym and get stronger than your friends who don't go? No.

Presumably your circle of friends consists of people who are already physically active. Even so, I doubt your assessment concerning strength training. Looking at the people around me, I'm guessing fewer than 5% (not 80%) of them are up to 100 pushups [http://www.hundredpushups.com/]


You presume a) everyone has access to healthy food as kids, and b) strength training is nearly ineffective.

What, you forgot about the massive population of poor the world has, western included? Or how many in the western world eat crap, because it's easier than making things for ourselves? Height is largely determined by what your parents feed you, once you're able to be on your own you're essentially fully grown. I know a decent number of people whose parents were either relatively poor, and they subsisted on mac & cheese frequently, or were largely idiots when it came to food and raising kids. And many of them are noticeably shorter than average. My wife, for instance, is just barely over 5 feet, and her family-then-single-father fell into both categories until he joined the military.

As to the second... I have absolutely no idea where you get that idea from. Maybe everyone around you weight-trains outside of the gym frequently, or you've never done so for an extended period of time? You're absolutely nuts, though.


It wasn't about me, I've been doing sports my whole life but never tried weight lifting. Before I spend my time writing a reply, please answer this question. Do you think that if we all lived in the same environment, had the same diet, did as much physical exercise and all other variables were the same except our genes, we wouldn't see a difference of 1 foot between people or a difference of pounds of muscle mass? And more related to the article, we wouldn't see a difference in intelligence?


Foot between heights: assuming identical diets, absolutely. Different people metabolize and absorb things differently. Assuming ideal diets for everyone, I'd bet a foot would be near the limits.

Muscle mass, depends significantly on how much exercise. Some respond to it differently than others, so some would be more muscular with light exercise while others would be more with heavier. A difference though, yes; if somewhere near the middle, accounting for height / overall body build, a moderate amount of difference, but not a whole lot.

Intelligence, barring physical defects, I don't know. People certainly seem to have specialties, especially if you consider some of the "greats" of history as nigh-savants, but measuring an overall intelligence of a person is a nightmarishly subjective task. Very broad statement: not much of a difference. A lot of specialists are severely lacking in communication skills, a lot of artists in engineering, etc, and those deficiencies would have to be factored in.

At the end of the day, I'd bet we're all pretty darn similar. But I do lean significantly towards the nurture-over-nature side of the argument; do nothing, and you become a blob, regardless of your biology. It happens to animals too. There are, of course, some differences, but you have an incredible amount of control over what you make of you.


Because that would contradict their political ideology.


Yes, I think that might be it. Unfortunately, there isn't much evidence any side can put on the table. Genetics will advance a lot in the coming years, so that might put an end to the debate, and that ideology will be shattered. If that same ideology prevents us from studying heritability in the West, I'm sure China will still study it.


Cosma was one of my very favorite professors at CMU.


There's a point he omitted: In curve fitting, exploratory data analysis, data mining, etc., we are looking for X. We don't know if X exists. But if X does exist, then our methods have a shot at finding X. So, we look. Maybe we find something. We test what we find, and it appears to work as we believe X would. So, we start to believe that X exists and we have found it.

Do I like following this 'paradigm'? No! But when people do follow this paradigm and claim to have found X, then I can't be sure they are wrong!

E.g., maybe there is a statistical model that predicts the stock market. So, do a lot of curve fitting. Find something that appears to predict. Then if there is a predictive model, maybe we have found it. Test the model on old data not used in constructing the model and see if it works. If it does, then we start to believe that there is a predictive model and that we have found it, or something close enough.

When people do such things, I can't say that they are wrong.


Largely I agree with him, and at times I have suspected some such. E.g., just testing some software, I generated some 'random' symmetric, positive definite matrices, found the eigenvalues, and noticed that there was a big one and the sizes went down quickly. So, in linear equations, a few variables constructed from the eigenvectors of the largest few eigenvalues can make a good approximation to all of many variables. So factor analysis makes a good data compression technique. Can't find fault with that.

That just a few of the largest eigenvalues/vectors can explain all the data well is curious. So really he might have just used R and some Monte Carlo to show us how variance explained increases with the number of factors used. I'm surprised he didn't do this.
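For what it's worth, here is a quick Monte Carlo along those lines (a sketch in Python rather than R; note that generating the matrix "naively" from uniform entries gives them a positive mean, and that shared mean is precisely what builds in the big leading eigenvalue):

```python
import numpy as np

rng = np.random.default_rng(3)
k = 20

# A "random" SPD matrix the naive way: entries of B are uniform(0, 1),
# so they have mean 0.5, and B @ B.T picks up a dominant rank-one
# component along the all-ones direction.
b = rng.random((k, k))
a = b @ b.T

# Cumulative fraction of the trace explained by the top eigenvalues,
# i.e. "variance explained" vs. number of factors used.
eigvals = np.sort(np.linalg.eigvalsh(a))[::-1]
share = np.cumsum(eigvals) / eigvals.sum()

print(share[0])   # the leading eigenvalue alone explains most of the trace
print(share[4])   # a handful of factors explain nearly everything
```

With mean-zero entries instead (`rng.standard_normal((k, k))`), no single eigenvalue dominates, which is the same "you get a big first factor because you built one in" phenomenon discussed upthread.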

Much or all of this has long been clear.

But what I didn't like was his drifting off into old goals of the psychologists. I couldn't figure out if he was a psychologist with an ax to grind or what. Instead, he's a statistical physicist. Curious.

The psychologists looking for 'causality' have a goal that ranges from tough to impossible. We should have been concluding that. That he got off into arguing about 'causality' seemed a bit silly. I don't know what silly stuff the psychos are trying to believe, but arguing with silly psychos is a bit silly.

But it remains: Give a test with some mental puzzle problems, and in just a linear way can explain a lot of the data with just one factor. Curious. Maybe somewhat useful. 'Causality'? Likely not if only because we know that there is a biological and neurological basis and have made no connection with that.

Then I didn't like his use of 'factors': he has some factors correlated. No: the usual approach is that, just as the eigenvectors are all orthogonal, the factors are all uncorrelated. Maybe the psychos look for some uncorrelated 'factors' trying to get at some of their guesses about causality, but in this case he should have been more clear.

Finally, he wants Gaussian to justify being interested in means, variances, and covariances. Well, in the Gaussian case, sample mean and variance are 'sufficient' statistics. But even without Gaussian, means and covariances remain important, e.g., for the inner products in the Hilbert space of L^2 real random variables.



