Success of a Marriage in 15 Minutes? (slate.com)
79 points by JeffJenkins on March 15, 2010 | 35 comments



Slate and Rolling Stone continue to be some of my favorite websites strictly because of this level of reporting. When I first heard of Gottman's claim in a separate article on another site, my immediate thought was, "Anyone can do that: just say everyone stays together and you'll get 80% accuracy." This is the problem with non-scientific researchers. They don't want to learn about statistics. They want easy, simple claims that everyone can understand. The corresponding problem with the media is that they do not understand science well enough to be critical of researchers' claims.


I love anything that points out Gladwell's pseudo-intellectual bullshit too.


He fitted a curve to the training data and called it a prediction. How is this not fraud?


I'd say it would only be fraud if he misreported his methods in his papers. I think the Slate author just says that she assumed his methods to have been different than they were, after reading Gladwell's story about them.

Also, it seems to me that if you make a model that fits the data, that doesn't at all mean that it doesn't have predictive power. You could fit it with one data-set, and test it on another one. It just means that you now have a purely empirical model, with no built-in assumptions on why it is so.


Not testing the model on a validation set doesn't mean it doesn't have predictive power... but if you don't test it, there's no way you can say it does have predictive power. And saying it predicts the data 87% of the time when you're using the training set is in fact fraudulent.

I can't believe that no one called him on this before, or that he got away with it! Anyone who has taken a college statistics course knows that you have to have separate sets for training and for validation. There are even a ton of statistical methods to deal with this problem with a small data set, e.g. bootstrapping your training set. There's no excuse.
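To make that concrete, here's a toy sketch (Python, synthetic data, nothing to do with Gottman's actual variables) showing how training error flatters a flexible model, plus a crude bootstrap estimate of its real prediction error:

    # Toy sketch: train/validation split plus a simple bootstrap estimate.
    # All data is synthetic; the "model" is just an over-flexible polynomial.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, 60)              # 60 made-up couples' scores
    y = 2 * x + rng.normal(0, 0.5, 60)     # noisy outcome

    train_x, val_x = x[:40], x[40:]
    train_y, val_y = y[:40], y[40:]

    coeffs = np.polyfit(train_x, train_y, 9)     # deliberately over-flexible

    def mse(px, py, c=coeffs):
        return np.mean((np.polyval(c, px) - py) ** 2)

    print("training error:  ", mse(train_x, train_y))   # looks great
    print("validation error:", mse(val_x, val_y))       # much worse

    # Bootstrap: refit on resamples of the training set and score each fit
    # on the points left out of that resample ("out of bag" points).
    errors = []
    for _ in range(200):
        idx = rng.integers(0, 40, 40)
        oob = np.setdiff1d(np.arange(40), idx)
        c = np.polyfit(train_x[idx], train_y[idx], 9)
        errors.append(mse(train_x[oob], train_y[oob], c))
    print("bootstrap estimate of prediction error:", np.mean(errors))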

In fact, I remember reading some Feynman lecture in which he specifically mentions how bad this is... anyone got a citation for me?


I remember reading some Feynman lecture in which he specifically mentions how bad this is... anyone got a citation for me?

Perhaps you are referring to this passage of "Cargo Cult Science":

"There is also a more subtle problem. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition."

http://calteches.library.caltech.edu/51/02/CargoCult.pdf

(Caltech lecture series is primary source)

http://www.lhup.edu/~DSIMANEK/cargocul.htm

(There are several online reprints of this lecture text in HTML, more searchable than .PDF format.)

See

http://en.wikipedia.org/wiki/Cargo_cult_science

for background and both of those citations.


> In fact, I remember reading some Feynman lecture in which he specifically mentions how bad this is... anyone got a citation for me?

It's interesting that we still long for citations of some authority figure --- even when we all agree. (And I, too, would like to see what Feynman said.)


It's more of a longing to figure out what the heck I'm remembering. This happens to me all the time with people, too... someone will look really familiar but I can't place them at all, and it drives me crazy.


I'd say it would only be fraud if he misreported his methods in his papers.

Fair point, you are right. Many authors do, in fact, misreport in this fashion, but I have no evidence that John Gottman has done so.

I think the Slate author just says that she assumed his methods to have been different than they were, after reading Gladwell's story about them.

No, she doesn't just say that. This is not a difference of opinion; it's a matter of what is actually written in the article. "I think" does not apply.

Also, it seems to me that if you make a model that fits the data, that doesn't at all mean that it doesn't have predictive power.

That is not what I said.

You could fit it with one data-set, and test it on another one. It just means that you now have a purely empirical model, with no built-in assumptions on why it is so.

That is a valid methodology, but not the one the article says he used. What was described in the article was pure curve-fitting to the training data, which is not predictive. You can only claim predictiveness by using the generated curve on a test set and seeing how well it does, which is exactly what the article says he did not do. You are arguing in circles!

You should, in fact, expect to fit a curve to a training set with higher accuracy than its general predictive power! You can only test its predictive power on test data.
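A quick synthetic illustration of that: fit a flexible model to outcomes that are literally random coin flips, and the in-sample "accuracy" still looks impressive, while fresh data from the same process brings it back down to chance. (Python sketch, all numbers made up:)

    # Sketch: enough free parameters will "predict" pure noise on the data
    # you fit to, which is why in-sample accuracy by itself means little.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 60
    X = rng.normal(size=(n, 20))       # 20 meaningless "coded variables"
    y = rng.integers(0, 2, n)          # outcomes with no real relationship

    # Least-squares fit, thresholded into a divorce/no-divorce prediction.
    w, *_ = np.linalg.lstsq(X, y - 0.5, rcond=None)
    in_sample = np.mean((X @ w > 0) == (y == 1))

    # Fresh data drawn from the same (purely random) process.
    X_new = rng.normal(size=(n, 20))
    y_new = rng.integers(0, 2, n)
    out_of_sample = np.mean((X_new @ w > 0) == (y_new == 1))

    print("in-sample 'accuracy':  ", in_sample)     # typically well above 0.5
    print("out-of-sample accuracy:", out_of_sample) # hovers around 0.5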


> That is not what I said.

Right, sorry, I should have been clearer that I was reacting to the article and not to you there. Or actually, I was reacting to what I thought the article said, because indeed that's not what it said! Thanks.


> I think the Slate author just says that she assumed his methods to have been different than they were, after reading Gladwell's story about them.

I did too, because I used to think Gladwell was a reliable source. Oops!


At least as this article describes it, it's not even predictive on its face.

He did not assess "fresh" couples and come back a few years later to see how they fared. Rather, he interviewed people who either were already having trouble or weren't. Although I haven't studied it for proof, I have to assume that the interactions of a honeymoon couple will be different from those of a couple in trouble, even if the honeymoon couple is destined to have problems. Thus, he's not able to interview the honeymoon couple and determine what their prospects are.


Well, to be fair, if I give you the data set:

a=1 b=2 d=4 e=5 f=6

and you predict that c=3, you are not committing fraud.

The gold standard would be testing on some more couples and finding the model he made is predictive.

However, I would be convinced by a strong correlation (i.e. the model is not overfitted), coupled with a plausible model. I'm not a statistician - does anyone know if there is a measure of how over-fitted a model is based on degrees of freedom and size of sample?


To properly use your analogy, if I first saw your data, then "predicted" that "a = 1.1, b = 2.1, d = 3.99, e = 5.01, f = 3.9999" and claimed that I had "predicted" your data with 90% (or whatever) accuracy, that is substantially misrepresenting what I have done.

However, I would be convinced by a strong correlation (i.e. the model is not overfitted), coupled with a plausible model.

That is a problem. That should not be enough to convince you. It is enough to make it an avenue worth exploring.


I don't think so. I think my analogy was that given the data, if I can create a model that accurately predicts the data given, and is plausibly simple, i.e. for X=Y, Y=ordinal(X), then I can have some faith in the model.

To put it another way, if Gottman said: Add up these 5000 variables multiplied by these factors to get the score then I wouldn't believe it. But I think he said something much closer to (on p. 13 of the 98 paper): We tried a couple of combinations of variables, that we had hypotheses for. One combination gives a very accurate predictor (Table 1, Husband high negativity). Two others are also accurate (Table 1, low negativity by husband or wife.)

I think where the paper went wrong, and what the Slate article is criticising is the claim made in the second column of p. 16, where they do seem to put all the variables into a model and pull the answer from nowhere. So, I'll give you that the 82% prediction is not justifiable. But negativity as a predictor for divorce seems well supported.

---

If I drop 30 monkeys out of trees of different heights and I measure 100 things about each monkey, each second, as it falls, and then I give you a model that purports to tell you a monkey's speed from those 100 measurements, then you won't be surprised to receive an over-fitted piece of junk.

If instead I give you speed = constant * t, would I need to drop another 30 monkeys?


This is one of those things where the size of the result set lets you easily overfit the data. Take 50 people born in 1970 and you could make a fairly simple model based on their birthdays to predict whether they are living or dead, because you only need 50 bits of information and 10 of them can be wrong. There is even a strong bias, since most people born in that year are still alive.

Edit: As a simple rule of thumb, compare the complexity of the formula with the number of accurate bits in your output relative to the simplest possible model. If the formula is anywhere near as complex as your results, it's probably overfitting the data.
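One standard way to make that rule of thumb quantitative is an information criterion such as AIC, which charges a penalty for every extra parameter, so added complexity has to buy a real improvement in fit. A toy sketch (Python, synthetic data, not Gottman's model):

    # AIC = 2k - 2*ln(L); lower is better. Each extra parameter must improve
    # the likelihood enough to pay for itself.
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 1, 50)
    y = 2 * x + rng.normal(0, 0.3, 50)      # truly linear, plus noise

    def aic_for_polyfit(deg):
        coeffs = np.polyfit(x, y, deg)
        resid = y - np.polyval(coeffs, x)
        sigma2 = np.mean(resid ** 2)
        k = deg + 2                          # coefficients + noise variance
        log_lik = -0.5 * len(x) * (np.log(2 * np.pi * sigma2) + 1)
        return 2 * k - 2 * log_lik

    for deg in (1, 3, 9, 15):
        print(f"degree {deg:2d}: AIC = {aic_for_polyfit(deg):.1f}")
    # The degree-1 fit usually wins: higher degrees fit this sample slightly
    # better but not enough to justify the extra parameters.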


Fraud implies he did it on purpose; it seems likely in this case that he didn't realize his methods were invalid.


I agree, it's not fraud, strictly speaking.

However, to create a "predictive model", and then not check it on fresh data, is so staggeringly inept as to be reckless.

He should have known better, and the zillions of folks who parroted the results really should have read the study carefully.


It is a bit disappointing. At my university, we were taught about this fallacy in the mandatory introduction-to-science course. I remember a beautiful analogy which went something like:

As long as both your sample data and your test data are collected between January and May, you can make a model that accurately predicts the time the sun sets based on the length of your hair.
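The analogy is easy to reproduce: both quantities just trend with the date, so a model trained and tested only on January-to-May data looks accurate, then falls apart once the trend reverses after the solstice. A rough Python sketch with invented numbers:

    # Hair length and sunset time both increase roughly linearly in spring,
    # so "sunset from hair length" fits well in-sample and on spring data.
    import numpy as np

    rng = np.random.default_rng(3)
    days = np.arange(0, 150)                                   # Jan 1 .. late May
    hair_cm = 10 + 0.03 * days + rng.normal(0, 0.2, len(days))
    sunset_min = 17 * 60 + 1.2 * days + rng.normal(0, 5, len(days))

    a, b = np.polyfit(hair_cm[:100], sunset_min[:100], 1)      # "train"
    pred = a * hair_cm[100:] + b                               # "test", still spring
    print("spring RMS error (min):", np.sqrt(np.mean((pred - sunset_min[100:]) ** 2)))

    # After the solstice the sun sets earlier while hair keeps growing.
    days2 = np.arange(200, 280)
    hair2 = 10 + 0.03 * days2 + rng.normal(0, 0.2, len(days2))
    sunset2 = 17 * 60 + 1.2 * 172 - 1.0 * (days2 - 172) + rng.normal(0, 5, len(days2))
    pred2 = a * hair2 + b
    print("summer RMS error (min):", np.sqrt(np.mean((pred2 - sunset2) ** 2)))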


The evidence would abundantly suggest that merely being exposed to this idea in a course is not sufficient to guarantee that the resulting scientist will have any clue.

I see two problems here: First, by the time you get to that level of schooling, you've developed a "resistance" (for lack of a better term) to schooling. That is, you've long since internalized the fact that there are two worlds, the one described in school and the real world, and the overlap between the two is tenuous at best. Merely being told about this stuff in the school world isn't enough; you somehow need it to penetrate down to the "real" world. Computer scientists get a bit of an advantage here in that after being told about this, their homework assignment will generally be to implement the situations described and verify the results. Computer courses are rather good at bridging this gap; most disciplines lack the equivalent, and a shocking amount of basic stuff is safely locked away in the "schooling" portion of the world for most people.

This is a result of the second part of the problem, which is that the teachers themselves help create this environment. I went up the computer science grad school path for this stuff, where not only do they show you the fundamental math, they make you actually implement it (like I said above). I got to watch my wife go up through the biology stats courses, and they don't do that. At the time I didn't notice how bad it was, but the stats courses they do are terrible; they teach a variety of "leaf node" techniques (that is, the final results of the other math), but they never teach the actual prerequisites for using the techniques, and in the end it is nothing but voodoo. Students who consign this stuff to "in the school world but not real" are, arguably, being more rational than those who don't; the confidence-interval and p-value recipes they teach are only applicable in Gaussian domains, and I know they say that at some point, but they don't ever cover anything else from what I can see. By never connecting it to the real world (very well), and indeed treating it in the courses as a bit of voodoo ("wave this statistical chicken over the data and it will squawk the answer, which you will then write in your paper"), they contribute to the division between school and real.

So, I'd say two things: First, those who say this stuff is taught in "any college stats course" are probably literally wrong, in that it is possible to take a lot of stats without really covering this. Second, even if it is covered, it ends up categorized away unless great efforts are taken to "make it real" for the student, which, short of forcing implementation on the students (impractical in general), I'm not sure how to do. Most scientists seem to come out of PhD programs with a very thin understanding of this stuff.


He was a math major and should certainly know better and thoroughly know his methods. I am unwilling to give the benefit of the doubt in this case.

Update: Also, he did it twice, and the article implies he has continued doing it. That's way beyond simple methodological error.


I agree with the several commenters who have commented that statistically speaking, the cited researcher has not yet achieved "prediction." He has done some interesting curve-fitting on a smallish data set, but has not tested his curves on fresh data sets--something that every responsible scientist must do sooner rather than later, if the scientist wishes to speak of "prediction." I indicated my agreement with other comments by a bunch of upvotes, and also posted the citation for Feynman's "Cargo Cult Science" lecture.

That said, my wife and I have been using a couple of Gottman's recent books to reexamine aspects of our twenty-six-year-plus marriage, and there is some good advice for couples in Gottman's writings. It may not all be rigorous science, but clinical experience based on close observation of many couples can be helpful to any one couple who want to enjoy their marriage more and to pass on advice to their children (four, in our case) about how to have happier marriages in the next generation. So once we all agree that proper science involves TESTING models developed through analysis of one data set on other data sets, we can start some application of the clinical observations on ourselves and see what we think after trying this out at home. My wife and I have been pleased to become acquainted with these writings and to discuss them together.


i realise this might be too personal, but i'd be interested in knowing what you thought was useful after all that time (i've been in a relationship for around 15 years).


Styles of talking while discussing problems. EVERY marriage has problems, for sure, but how a couple deals with problems can make problems part of what enriches the relationship, or part of what drags it down. We encountered some new kinds of problems in the last few years (from the outside), and thus had this interest recently.


thanks. sounds interesting, i'll look out for a copy...


Scribd link to original paper: http://www.scribd.com/doc/28394063/353438

JSTOR link to original paper: http://www.jstor.org/stable/353438


The title of the paper being "Predicting Marital Happiness and Stability from Newlywed Interactions" is quite meaningful in light of what is being discussed in this thread.

Full citation is

John M. Gottman, James Coan, Sybil Carrere and Catherine Swanson, Predicting Marital Happiness and Stability from Newlywed Interactions, Journal of Marriage and Family, Vol. 60, No. 1 (Feb., 1998), pp. 5-22


tl;dr Researcher coded fairly short conversations between about 60 married couples, then six years later he fed their marital status and self-rated happiness into the computer. He used modeling software to create an optimal equation for predicting marital status and happiness from the variables he had coded for originally.

It's a good first step, but it's not strictly predictive. It's like an older financial model.


Sweet, delicious over-fitting. What a way to start the day d;D.


I'm surprised that the discussion here is only about how Gottman came to his results, although that's fair game and I agree with you.

But there is learning here, too. My mother is a psychologist and she says that the most destructive thing she sees in marriages is the level of toxicity. Couples can have lots of fights, but if they never lose respect for each other, they're usually ok. It's when this respect devolves that trouble begins.


There's a journal article criticizing that exact assumption: "The Hazards of Predicting Divorce Without Crossvalidation"

http://www3.interscience.wiley.com/journal/118971429/abstrac...


The article is about whether metrics gathered from a 15-minute session, in which a man and a woman argue a contentious issue, can accurately predict divorce rates. The metrics are primarily measured using facial recognition software, it seems. John Gottman, who claims to predict divorce with 91% accuracy, is the main subject.


I'm surprised he didn't use resampling techniques (e.g. http://en.wikipedia.org/wiki/Resampling_%28statistics%29) to validate the models. Not that resampling is a perfect answer, but...
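For instance, even plain k-fold cross-validation would have been a sanity check. A minimal sketch (Python, synthetic data; not the paper's actual model or variables):

    # Minimal k-fold cross-validation: fit on k-1 folds, score on the
    # held-out fold, and average the held-out errors.
    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.uniform(0, 1, 60)
    y = 2 * x + rng.normal(0, 0.5, 60)

    def cv_mse(deg, k=5):
        idx = rng.permutation(len(x))
        folds = np.array_split(idx, k)
        errs = []
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            c = np.polyfit(x[train], y[train], deg)
            errs.append(np.mean((np.polyval(c, x[test]) - y[test]) ** 2))
        return np.mean(errs)

    for deg in (1, 5, 12):
        print(f"degree {deg:2d}: cross-validated MSE = {cv_mse(deg):.3f}")
    # The simple fit typically wins; the over-flexible ones look better
    # in-sample but worse on the held-out folds.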


If he now interviews a new set of couples, makes his predictions according to the developed model, and it _still_ gives high accuracy, then we can call it science.

The power of a good model lies not in how well it predicts what you already know, but in how well it can be used to model unknown or new phenomena.


Makes you wonder how many of the other studies in Gladwell's books are equally flawed.



