When I wrote "In this case it measures the relative reduction in MSE" I meant ex...

justk · 2024-07-04T14:23:32

Let d1 = data[state==0] and d2 = data[state==1], then var(d1$pref) = 0.26, var(d2$pref)= 0.26 and var(d$pref)= 0.256 (using R and one of your dataframes), so the intuition is that knowing the state does not give information about the preferences of the voters, so this suggests that any model based on state should give poor results and so having R^2=1 is not a big paradox in this case.

There must be a formula to compute R^2 from variances both among states and inside states but anyway, when the variances inside any state are bigger that the total variance that should imply that the feature that divides the population in groups is of little value for prediction so it should have a small R^2 value.

kgwgk · 2024-07-04T14:30:32

That seems more or less what I said in the comment you replied to: the prediction of individual votes remains quite bad even if the state is taken into account. That's why the relative reduction in MSE is low. That's why the R² is low. I don't think there is any paradox.

I was replying to someone who claimed that "R2 is not the correct measure to use. This article is a perfect example of the principle that simply doing math and getting results is not necessarily meaningful." I've not seen any comment from anyone getting "different results" with a different measure.

Edit: You used var(...) which includes a factor N/N-1 and doesn't give exactly the total sum of squares.

The example dataframe contains 40 observations (20 per state) and you get higher variance estimate for the subsamples than for the aggregate sample but if you put toghether a few copies of the data (for example doing "data <- rbind(data, data, data, data, data)") even the adjusted (unbiased) estimator of the variance is lower for the states.

You can calculate the "exact" values yourself doing (x-mean(x))^2 or undoing the adjustment:

  > var(data$pref)*39/40
  [1] 0.25
  > var(data[data$state==0, "pref"])*19/20
  [1] 0.2475
  > var(data[data$state==1, "pref"])*19/20
  [1] 0.2475

> when the variances inside any state are bigger that the total variance

They are not. But you're right in that a small difference shows that dividing the population in groups is of little value for prediction and that's why the R^2 value is small.

justk · 2024-07-04T15:18:03

The correcting factor n/(n-1) in R is what explains my paradox about the law of total variance Var(Y) = E(var(Y|X)) + Var(E(X|Y)), I was obtaining result that don't match this formula because I corrected all the variances with the factor 20/19 but the total variance should have the factor 40/39 just like you pointed. Thanks for the comments and the correction.

I just added another comment that relates analysis of variance to this post to show that there is no real paradox here.

Finally, the formula for the total variance above is related to my intuition that having some information (having the data for each state) should make the means of the variances in each group smaller that the total variance, because variance is related to lack of information. But analysis of variance suggests (see other comment of mine) that the state factor is not representative because the high variance in each group (each state) and the low difference between the groups means and the total mean.