Do Social News Sites Deteriorate? (Analysis of 1.8M HN Comments)

pg · on May 31, 2011

"Thus, if we look at the PG analysis as a measure of his mental state, we can speculate this is due to the prosperity he’s experienced in these last few years."

Actually it's because the site has become so large that I have to be more diplomatic. When it was small I could say harsh-sounding things without worrying about being misinterpreted. Now I have to be more of a politician.

tansey · on May 31, 2011

That was my other hypothesis, but it didn't really correlate well with the growth in the site's user base. HN has experienced an exponential growth rate, but your comments show a linear decline in negative impact. I went with the "better life" theory because it seems like personal prosperity and happiness would be more likely to show gradual, linear change. This is also supported by the trends I saw in other famous top HN users, particularly those that I know have experienced increased financial prosperity.

staunch · on May 31, 2011

Instant link bait if you let us look up our own charts.

djcapelis · on May 31, 2011

Maybe they decided to become more diplomatic as the community has gotten larger and begun to have real impact on people's reputation. Why do you think that would just be limited to pg?

I would think HN-famous people would especially be prone to become more diplomatic as their comments began to have more impact.

tansey · on May 31, 2011

Hi everyone,

So there seems to be a lot of skepticism about whether our algorithm can actually measure emotional impact accurately. For the long answer, I'll refer you to the about page[1] for EffectCheck. For the short answer:

My co-founders are an AI PhD and a Clinical Psychology PhD. They spent three years curating a huge dictionary of words using a methodology similar to the Harvard Psychosocial Dictionary [2], but with the twist that they were focused on lexical impact of words. The dictionary is pretty accurate at measuring both impact and sentiment [3]. For example, we can predict Amazon reviews as being positive or negative, using the stars to validate if we are correct-- blog post on that coming soon.

Regarding context of the word usage: Both behavioral studies and fMRI scans have confirmed that context is not as important as one might believe. Our brains process multiple meanings of words in parallel, and the emotions associated with those words linger in our subconscious even after we know the correct context. Similarly, in cases of reviews and comments, people who use hostility-evoking words are often hostile themselves (angry people tend to make others angry) and the same is true of the other five fundamental emotions we measure.

Happy to answer more questions. Also happy to analyze any data that you would like to see in order to verify accuracy of the algorithm-- just give me a link or the text. :)

[1] http://effectcheck.com/about

[2] http://www.wjh.harvard.edu/~inquirer/homecat.htm

[3] Note that sentiment is correlated to impact only if the writer or speaker is writing without detailed attention to word choice. For example, political speech writers comb over every word to make sure they have the desired impact-- thus, the final text likely has little correlation to the sentiment of the orator.

hugh3 · on May 31, 2011

I can easily believe that you can predict quite well ratings of Amazon reviews by a simple keyword-counting algorithm. Just by counting words like "good", "enjoyed", "fantastic" vs "terrible", "boring", "awful" you're gonna get a very strong correlation in that limited domain.

But what tests have you done on your broader methodology? What experiments can you really do to figure out the extent to which the use of the word "nosegay" is correlated with actual depression?

Also, as someone else said, where are the error bars? If there really is a correlation between word choice and other metrics, then some simple statistics should give you error bars on your other metrics, right?

Oh, one more thing: in the example on your website you say that the sentence:

"That joke kills me!"

is "subconsciously" aggressive. My question: would your algorithm rate that at exactly the same level of aggression as the sentence:

"I'm gonna kill you!"?

cuz, y'know, intuitively one seems rather more aggressive than the other.

rhizome · on May 31, 2011

It seems that the point is to introduce their special-sauce black box, with an argument to authority about its methodology. I think the correlations you ask for are where the problems will lie, in that there is a value judgement that is being hidden. If I can put myself out on a limb here, I'd say that that measurement is going to be fundamentally unscientific.

tansey · on May 31, 2011

> Also, as someone else said, where are the error bars? If there really is a correlation between word choice and other metrics, then some simple statistics should give you error bars on your other metrics, right?

Please see my response to the original question about this. [1]

"That joke kills me!" vs. "I'm going to kill you!" (corrected grammar as we do not usually focus on slang)

They do not score exactly the same, though they are both very high on anxiety, hostility, and depression. The former has high marks for happiness and compassion due to the word "joke" being used.

[1] http://news.ycombinator.com/item?id=2603946

jlees · on May 31, 2011

You claim that context is not important, but surely in a domain-specific corpus such as HN comments you have to take domain, if not context, into account?

For example, words that are generally positive may be only used in a snarky context in hacker circles, whereas words that are generally negative may similarly be used with positive affect (e.g. the word "hacker" itself). Did you customise your lexicon at all for this genre?

Also, in my experience, it's not the lexicon of words -- heck, SentiWordNet has existed for long enough -- it's what you do with that input signal that counts. Many "bag of words" approaches are on the shelf now, and very few of them are particularly accurate or clever, but most of them do the job just well enough that they get away with it.

tansey · on May 31, 2011

...surely in a domain-specific corpus such as HN comments you have to take domain, if not context, into account?

That's a great insight and I believe you are correct. In general, I think the domain does have an impact and I am working on some automatic demographic profiling technology. However, for a first-look, I think our general word dictionary covers an abundant set of terms that do not vary much between domains. For instance, your comment has the following anxiety-evoking words:

not, important, but, example, positive, used, hacker, negative, long, bag, but, away

Of all those words, only "hacker" would likely need to be adjusted for the HN domain. For a first look, we'll chalk that up to some minor added noise in the data.

EffectCheck does include the ability to automatically profile the document type (e.g., "Press Release" or "Motivational Speech") based on a training corpus. For example, most press releases have a high degree of confidence-- based on a corpus of press releases, we normalize the levels so that "Very High" on confidence for a press release requires much higher confidence elicited per-word than it would in a complaint letter.

jlees · on May 31, 2011

Is there really much of interest in press releases, or were they just a handy corpus you had around? I started my PhD off looking at affect in scientific papers, and changed tack because there basically is none.

Anyway, I guess I'm not clearly understanding your definition of "anxiety" because few of those words evoke anxiety as I would define it -- not? bag? away? I'll try to dig into whatever you've published to deeper understand your system, but I'm not convinced by the standalone word approach yet, though I see what you're trying to do (I think). I'd really want to check out the bigrams/trigrams, even if you don't do full parsing or any intermediate sophistication. Obviously there's an efficiency aspect, so I guess if your results are good enough already and you have customers, no need to make sacrifices.

On a tangent, how does your system interpret common HN words like "healthcare", "Obama" and "education", I wonder? Could you have an "intellectual" or "hacker-ish" dimension based on manual categorisation of words into "this is stuff HN should be talking about" vs "this is stuff that belongs on reddit"?

tansey · on May 31, 2011

>Is there really much of interest in press releases, or were they just a handy corpus you had around?

Actually, both. Corporate marketing departments spend a lot of effort crafting the proper press releases. Entire agencies exist to help companies with press releases. I think we make a great companion tool for professionals in that area. We have an example of how EffectCheck can help with press releases on our blog [1].

>On a tangent, how does your system interpret common HN words like "healthcare", "Obama" and "education", I wonder? Could you have an "intellectual" or "hacker-ish" dimension based on manual categorisation of words into "this is stuff HN should be talking about" vs "this is stuff that belongs on reddit"?

We certainly can do that, and we offer it as a service to clients that want hand-crafted, demographic-specific dictionaries (like political campaigns).

[1] http://blog.effectcheck.com/2011/04/14/delivering-bad-news-a...

Geee · on May 31, 2011

I'm pretty sure that even when we don't know how accurately it can measure the 'absolute' values, we can clearly see the change which is relative to the comments on the same context and domain. It can be argued if it really measures happiness etc., but I'm sure there's correlation in the change even if it's not accurate.

jarrett · on May 31, 2011

Check out the results for their own webpage. Click "Try EffectCheck on this page!" here:

http://effectcheck.com/tour

I'm a little skeptical.

It rates "anxiety" as "medium-high." Yet I don't feel the slightest bit anxious reading that page. Nor does the author come off as anxious. It all feels very positive to me. (I'm counting the text alone, not the graphic design.)

It rates "compassion" as "very high." I wasn't able to find a description of what exactly the compassion measurement means in EffectCheck. But I'll go out on a limb and assume a high compassion rating means a text will either make you feel like the author is compassionate, or make you feel compassionate yourself. I don't feel either for that page. Not that it's hostile--I just don't feel anything on that axis when I read the page.

It's entirely possible that I've misunderstood how EffectCheck's ratings should be interpreted. It's also possible that I'm missing the point by comparing how I think I feel against EffectCheck's results. Maybe we're unreliable observers of how a text emotionally affects us, and EffectCheck more accurately predicts what's really going on deep in our brains. In which case, forgive my unwarranted naysaying.

tansey · on May 31, 2011

>It's also possible that I'm missing the point by comparing how I think I feel against EffectCheck's results. Maybe we're unreliable observers of how a text emotionally affects us, and EffectCheck more accurately predicts what's really going on deep in our brains. In which case, forgive my unwarranted naysaying.

Bingo. Our algorithm focuses on the lingering, subliminal emotions that you aren't necessarily aware you're feeling. Though not related to the emotions we measure, one commonly-referenced study shows that if you're walking down the street and you see a big sign with the word "ELDERLY" written on it, you'll actually walk slower without realizing it [1]. Lots of words have these subconscious impacts and you simply cannot accurately poll your own brain to determine how you feel.

[1] See "Stumbling on Happiness" for a great overview of this study: http://www.amazon.com/gp/product/1400077427/ref=as_li_ss_tl?...

scott_s · on May 31, 2011

How can we falsify your claims? That is, what is a test we can perform that if it went a certain way, would show that your claims are false?

jarrett · on May 31, 2011

That seems very believable in a general sense, and I have vague memories of other studies that suggest similar things.

The remaining question for me is whether and how EffectCheck's results have been compared against actual measurements of these subliminal effects.

ChuckFrank · on May 31, 2011

This sounds like solid and worthy research. I would be especially interested to see your Amazon words v. stars results. Just off the top of my head, I can see a ton of uses for your research -- More accurate real time political polling / Better product assessments across the vast internet landscape for both designers and brand managers / Policy analysis for new policy proposal and communication evaluations. This has fantastic potential. Please show us the testing and results when you get to it. Thank you for sharing.

PaulHoule · on May 31, 2011

I know this stuff works. Some friends of mine from Cornell were able to train an SVM to predict people's moods on Livejournal.

noelsequeira · on May 31, 2011

I don't mean to come across as hostile or undermine the effort that's been invested, but I can't help but be a tad skeptical about these visualizations, especially given the fact that the author (in what might be construed as cavalier fashion) presents seemingly nebulous metrics like they are absolute matter-of-fact ("Anxiety/Confidence Ratio", "Hostility/Compassion Ratio", "Depression/Happiness Ratio").

It would certainly help if the algorithm used to compute these metrics were shared and dissected. I tend to believe that sentiment analysis is much more art than science, given how tricky and profound context can be.

To conclude, I'm not going to end this comment on a negative note. I'd much rather reserve my vitriol and caustic criticism for another thread and day and abstain from calling this an attempt at gaming HN to promote a startup.

I do this not out of a lack of indignation towards what I've just read, but because I'd like to end this seemingly hateful comment on a note that isn't bitter or negative but is instead quite the opposite (without using a single word that would help the OP's algorithm figure this out).

And with that, I throw down the gauntlet. Analyze this!

tansey · on May 31, 2011

>And with that, I throw down the gauntlet. Analyze this!

Sure. Scored with EffectCheck:

Anxiety - Very High

Hostility - High

Depression - Very High

Confidence - Low

Compassion - Low

Happiness - Very Low

I will post a longer explanation detailing how/why it works, since others have had this question as well.

rotoscoped · on May 31, 2011

This is excellent. Would you be willing to check this text from my blog please?

"It's said that great military commanders, chess players and Go players feel physical pressure on their stomachs when their game pieces are threatened, and the pressure indicates the moves to make. This full-body thinking communicates much more rapidly than purely deductive mental reasoning. The intuition is the result of thousands of prior episodes where such reasoning was employed, acted on and the outcome experienced in all its pain or glory.

Other than hours of practicing the game, or whatever one does, the only other way to improve the chance of learning this physical intuition is to be sure one's body is not sending conflicting signals. Please eat well and exercise."

tansey · on May 31, 2011

Sure. Here you go:

Anxiety - High

Hostility - High

Depression - High

Confidence - Medium

Compassion - Very Low to Low

Happiness - Low to Medium

rotoscoped · on May 31, 2011

Haha, thanks. Looking forward to an explanation of this.

sfk · on May 31, 2011

I'm not sure that I understand. You have just demonstrated that the algorithm does not work.

ma2rten · on May 31, 2011

Depends if you actually ment what your wrote. I think his point is that if you did, you would have chosen a different wording from a psychological point of view.

waterlesscloud · on June 1, 2011

I'd argue that it worked perfectly, actually.

What were the poster's true motivations? Not what they stated, but what they truly meant.

recoiledsnake · on May 31, 2011

What this kind of analysis needs is a blind test. Ask three groups to rate text independently in the same metrics - 1) General public. 2) Psychologists 3) Algorithm.

The results should be interesting. However, if the authors claim subliminal effects, then I do not how that could be tested.

hack_edu · on May 31, 2011

Whats your model for folks who have proven mental health issues? I have little faith that a clinically depressed, or worse, person would fit the same profile.

And there's got to be enough folks with mental health problems to destroy your margin of error, this is the lonely internet afterall.

rryan · on May 31, 2011

Where are the confidence intervals and error bars? In order to be taken seriously when aggregating 1.8MM comments, you need those. The variability in pg's plots makes me think the data for any individual is going to be just as noisy.

tansey · on May 31, 2011

We are looking at a time-based analysis of the monthly means of PG and the HN community. The null hypothesis is that the slope of a regression line should be 0. Since we are dealing with a time-based analysis of the mean, the variance of each bucket is irrelevant. I re-ran the analysis to make sure that my results were statistically significant:

Anxiety/Confidence PG: p <= 0.0007 HN: p <= 0.0001

Hostility/Compassion PG: p <= 0.0005 HN: p <= 0.0001

Depression/Happiness PG: p <= 0.0067 HN: p <= 0.0001

Note that for the HN comments, even though I used a 2nd order (parabola) fit for the graphs, the p values above are for a linear regression as that is the more appropriate fit for determining statistical significance here.

vecter · on June 1, 2011

p values only make sense when the residuals are normal and independent. Is that true for this data set?

pgroves · on May 31, 2011

What jumps out at me is that the community curves all peak in the months following May 2009, which is when the stock market bottomed. The community was least happy when the economy was at it's "darkest before the dawn" moment.

In other words, I wonder if these graphs are just tracking general sentiment, and would look the same for any site.

tansey · on May 31, 2011

It's a good question, and I guess we'll see how well the correlation holds. It's worth noting that the stock market bottomed in March 2009, so this may be a lagging indicator.

topomorph · on May 31, 2011

In case anyone wants to make their own trends (less sophisticated, but maybe more transparent), I have an app at http://hn-trends.heroku.com that plots word percentages on HN over time.

For example, some emotive words that appear to have increased since the beginning:

* sad: http://hn-trends.heroku.com/trends?q=sad

* fuck: http://hn-trends.heroku.com/trends?q=fuck

* shit: http://hn-trends.heroku.com/trends?q=shit

Some words that appear to have decreased:

* great: http://hn-trends.heroku.com/trends?q=great

* cool: http://hn-trends.heroku.com/trends?q=cool

* lol: http://hn-trends.heroku.com/trends?q=lol

* reddit: http://hn-trends.heroku.com/trends?q=reddit

* digg: http://hn-trends.heroku.com/trends?q=digg

I haven't looked at any of this closely, so make of it what you will.

(I was meaning to wait until I had time to dig deeper and add extra features like time series smoothers and trend/slope metrics before "releasing" this, but figured this might be useful now given the post. I still plan on adding those later.)

ChuckFrank · on May 31, 2011

Certainly we can all argue about the underlying structure of the analysis of the data. However, I think that @tansey has done two great services here:

1.He's proposed a full service taxonomy with by proposing a name for the phenomenon SND. Which is much a much better choice than JTS - Jumping the Shark or otherwise.

2. He's asking how we can evaluate that phenomenon. Proposing one solution.

So the question becomes, how else can we evaluate the phenomenon and what can we do to reduce SND?

Well having spent time elsewhere, here are a number of clear indicators of SND:

1. Shorter, less thoughtful responses, often veering into humor or the absurd. With chuckles getting the most upvotes

2. Less fact checking and less source linking in both posts and comments

3. More image / pic posting

4. Linkjacking, with materials not linked to the original materials.

5. More community centered posts aka AMA etc.

6. Fewer news links.

So perhaps simply evaluating the length of comments of that same 1.8M HN data could support PG allegations.

The next question is what can be done to prevent SND?

I think that would be clear: Don't support the characteristics that lead to the decline.

stcredzero · on May 31, 2011

1. He's proposed a full service taxonomy with by proposing a name for the phenomenon SND. Which is much a much better choice than JTS - Jumping the Shark or otherwise

I think "Evaporative Cooling" is descriptive and very apt.

recoiledsnake · on May 31, 2011

> 5. More community centered posts aka AMA etc.

I consider AMA the best kind of self post... I don't see community centered posts as necessarily a bad thing except if there is too much of drama or he-did, she-did type content.

nbashaw · on May 31, 2011

How does the author measure anxiety, hostility, depression, confidence, compassion, and happiness? That seems to be left out.

tansey · on May 31, 2011

Hi! My startup [1] is based on an algorithm that can measure the emotional impact of text. Our blog is where we post interesting results from applying our algorithm.

[1] http://effectcheck.com

raganwald · on May 31, 2011

Speaking of your startup, it appears to have "Oracle Pricing" as its revenue model. You have to talk to a salesperson to get a price, which is often code for "We need to know how much you can afford and how much you are willing to pay before we quote a price."

I like everything else I see.

gammarator · on May 31, 2011

Trying the sample EffectCheck script on http://effectcheck.com/about : the sidebar says it's counting occurrences of phrases which score on each dimension. How is the association of word/topic determined? The measurement is valuable if and only if "Anxiety" defined as the count of ("different", "trigger", "taken out", "over", "carefully"...) corresponds to anxiety on some other psychologically relevant scale.

hugh3 · on May 31, 2011

Uhhhhh-huh. But how does the algorithm work, and (more importantly) how do we know whether it works or not?

pavel_lishin · on May 31, 2011

I assume the actual algorithm itself is not open sourced.

I'm wondering if there's anything that trips it up, like adding points to a comment's "hostility" score for profanities, which some people use when they're fucking excited about an idea.

brudgers · on May 31, 2011

Hypothesis: PG's mood is improving because YC of which HN is a part is increasingly successful.

Hypothesis: The net mood of HNer's is deteriorating because it the ratio of financially successful members is lowered as a more people join. In addition as HN grows the ratio of preexisting and external relationships among members gets smaller, e.g. the number of HN'ers affiliated with YC companies has not kept pace with the overall growth in HN membership.

elblanco · on June 5, 2011

At risk of coming out of HN retirement, a friend forwarded this topic to me. I wrote an extensive commentary before signing-off:

http://news.ycombinator.com/item?id=2059012

(now back to peaceful slumber)

bhousel · on May 31, 2011

Any plans to compare against corpora from other social news sites?

BTW, I think what you're doing is brilliant, and I can think of about a dozen practical applications for your technology.. Ignore the naysayers.

tansey · on May 31, 2011

Thanks! I'd be happy to analyze of corpora like reddit or Digg, but I don't want to scrape the sites as that is a great way to have my IP banned. Are there publicly-available datasets that I can download somewhere?

mef · on May 31, 2011

Thanks for this interesting analysis.

I would challenge the assumption that the symptoms of SND include a decrease in the positive emotion measured in the average HN comment.

Is it not possible that there is a growing segment of the HN user base which posts and enjoys the kind of comment that induces negative sentiment in people like pg? Would that not raise the sentiment of the average HN comment, while at the same time lowering the sentiment of the average "good old days" user?

sidww2 · on May 31, 2011

Are the number of votes for a posts available? It could be interesting to use the metrics given to compare posts with high vs low number of votes and could also help validate the use of given metrics.

tansey · on May 31, 2011

I did originally have a similar idea, where I started out by looking at the top 10% of comments each month (ranked by upvotes). However, the resulting trends looked pretty similar to the overall community: http://blog.effectcheck.com/top_dep_hap.png

recoiledsnake · on May 31, 2011

Others have done a good job at critiquing the automated analysis, so I will concentrate on the hypothesis.

>There is a perceived phenomenon among online social news communities (e.g. Digg, reddit) that as the popularity of the community increases, the average member becomes baser and the overall quality of discourse decreases.

I think that's not an accurate portrayal. Having been through the "decline" of Reddit (been reading it even before it had comments!), I observed that the real decline was in the quality(as perceived by me) of stories that made the frontpage(s) and in the quality(as perceived by me) of comments that were upvoted.

In 2006, I would literally read almost every story on the first few pages and the content was very interesting, there were long essays and articles that were on the front page. Now? After the influx of the general public and the juveniles(not saying this in a demeaning manner), quality has dropped in every way. Pun and argumentative threads were the ones that were on top. I do like humor but not just endless pages of snark and humor. Pictures or story-in-headline posts requiring an attention span of 3 seconds tend to get the most votes,

>If a community were suffering from SND, there are a few symptoms we might observe:

>Anxiety, hostility, and depression would rise.

>Confidence, compassion, and happiness would fall.

>The ratios of anxiety/hostility, hostility/compassion, depression/happiness would rise.

>Since each of the negative emotions has an opposite emotion, the last point enables us to measure the general negative/positive trend in each of the three main categories.

How does this analysis address the above points? All the supposed-to-be-funny short comments probably would count as a positive under this analysis. Reddit's community now is pretty happy, confident and compassionate. The analysis is pretty lacking when applied to a supposed-to-be intellectually stimulating social site.

Anyway I see that Slashdot(although not as popular now) and HN seem to have bucked the trend. Although there is some decline on HN, it's not as as steep or as bad as Reddit so far.

Edit: Here's an idea, do your analysis of all the comments of the first x posts of Reddit vs. the same on HN. Could be interesting since they both have almost the exact same format.

AJ007 · on May 31, 2011

Here is a metric that should be measured, reading level: http://www.google.com/support/websearch/bin/answer.py?hl=en&...

In the 16 or so years I've been online I've watched many communities grow, expand, and vanish. For better or worse a large community never functions the same as a small one. Beyond theories and conjectures, it would be interesting to know why.

ignifero · on May 31, 2011

The solution: divide the community to more specialized subgroups. That's what http://textchannels.com is doing