It's worth keeping in mind that with things like these you get what you put in.
When the service first came up on the front page, I punched some personal writings from my journal into it. Comparing my results from that to this HN assessment is like night and day. The HN assessment shows me as highly intellectual, imaginative, and adventurous, with a strong value for self-enhancement. The personal stuff showed me as being much more emotional, with a huge emotional range, because (surprise, surprise) I tend to write more about emotional parts of my life in my journal than in HN comments.
While this service might be useful to get a broad sense of people for marketing purposes, using it on an individual basis is like talking to a fortune teller—it could tell you nearly anything about yourself, and you'd be able to come up with an explanation to justify it.
Exactly. We adopt a different tone based on the venue and, since no one shows the same face everywhere, even with the same name, these are going to be wildly different.
Ironically, this really did feel like a machine-based cold reading.
It's a bit more complicated due to the reddit api, but if you run that, then run redcom <user>, it should throw all your comments into <user>-redcom.txt
Never used/heard of jq before; it's a pretty nice tool.
Is there any insight as to how this works? I got an Openness rating of 97% and a Harmony rating of 100%, both of which I know are not true. (I also received a Love rating of 1% under my Needs, although that's pretty accurate.)
I, too, got a "Love" rating in the low single digits. That's probably not too far from the truth in a global sense, but remember that in this case, the API is working with a local context (HN comments). I'm unlikely to have discussed the subject of love on HN, or anything tangentially related to it, or anything that would somehow give an indication of my need for love. Now, if you were to run my Facebook history through the same process, I imagine you'd find a slightly different analysis.
To some extent, our personalities are our personalities. We behave, on some level, the same in every context and in every community. But the extent to which that's the case is up for debate. We probably use different approaches, or if you prefer, we show different aspects of our personalities, in different contexts and in front of different groups. That's why it's extremely difficult to take one context (HN, for instance), and extrapolate universal characteristics from it.
Even within the set of HN history, I got some oddball results. For instance, Watson considers me very "Fiery" (51%) here. There are plenty of areas in my life in which the word "Fiery" makes a bit of sense. HN isn't one of them.
A problem I've observed with this kind of black-box system is that the process from input to output really is a mystery.
When the results are right, they're just "right," so you should accept them; when they're wrong, they're actually also right by whatever magical hamster wheel is operating inside the thing, and you just don't "get it."
The problem is that humans like to have some clue as to how the results were derived, something easy to explain that gets the gist across. Something like "Watson counted all the words you use and compared them to different reference lexicons to arrive at the score". This provides a little bit of context so we understand the semantics of the result and how to consider them and reason with them.
But for all we know the results we're seeing are from some arbitrary stochastic method:
openness=rand(90,99)
harmony=rand(90,100)
etc.
For things like this to be accepted by the users (humans), there needs to be a quick explanation of how it works; otherwise we get head scratchers.
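To make the "counted all the words and compared them to reference lexicons" kind of explanation concrete, here is a minimal Python sketch of what such a scorer could look like. To be clear, this is only an illustration of that one-sentence description, not how Watson actually works: the `LEXICONS` word lists and the `score_text` function are made up for the example.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, invented for illustration only.
# They are NOT Watson's reference data.
LEXICONS = {
    "openness": {"idea", "art", "curious", "imagine", "novel"},
    "harmony": {"agree", "together", "peace", "thanks", "share"},
}


def score_text(text, lexicons=LEXICONS):
    """Return, per trait, the share of words in `text` found in that trait's lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        # No words at all: every trait scores zero.
        return {trait: 0.0 for trait in lexicons}
    return {
        trait: sum(counts[word] for word in vocab) / total
        for trait, vocab in lexicons.items()
    }


if __name__ == "__main__":
    print(score_text("I agree, thanks for the idea"))
```

Even a toy like this gives the reader something to reason with: a high score just means many of your words landed in that list, so the result depends heavily on what you happened to write about in that venue.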
We plan to add more information to our docs soon about the service, including a description of each of the traits, and possibly reference some of the many data sources used.
Meanwhile, you can do a search on "IBM System U" (the project's not-so-internal code name). This particular slideshare.net prez has some great info on the methodology, validation tests, and references: http://slidesha.re/1ri0vPV
I also get rather high Openness and Harmony scores, though I'm particularly amused that Hedonism is scored. How can I be 72% sympathetic and 47% cooperative but not agreeable?
The IBM documentation doesn't really say anything about how these numbers are calculated.
This is very cool. Although I don't know how accurate this is, if calibrated and tuned to yield a certain degree of accuracy it will have a variety of use cases.
For example -- When interviewing someone, being able to run their github username (if known of course) to analyze their commit messages, comments, discussions. Or even their hn, reddit, twitter user names (if the usernames are linked with their first names, nothing creepy). It will potentially help to identify candidates that are downright rude, arrogant etc.
Or analyze internal mailing lists and hipchat/slack channels for coworkers who are potentially burnt out.
>>It will potentially help to identify candidates that are downright rude, arrogant etc.
This sounds very dangerous to me. I assume when recruiters and/or lead engineers decide to reach out to me via LinkedIn, they did their homework on me. I purposely link to enough stuff for them to realize "smtddr" is my handle. The same blog & youtube channel in my HN profile is also in my LinkedIn. But I expect a human to look, not some computer judging me. I can totally see people getting lazy and just doing stuff like only filtering for people who rate 80% on openness or something. Then everyone will start grooming their posts simply to get positive results... then someone will create a social website that claims to block those scanners so people can say whatever they want.
It just forces people underground and the filters won't work anymore since at that point you might as well assume everyone is gaming the system.
(fwiw, I'm also against standardized tests. Anything that forces a whole group of people to start grooming themselves for a very specific measurement kills diversity, imho. Since the very term "standardized" kinda goes against the concept of diverse... and people become lazy and just rely on such tests to make or break the deal)
But can you really judge a person based upon commits they might make?
You can read the words they wrote, but you cannot perceive the tone they wrote them in. I don't think this is helpful, and it may dismiss candidates who are otherwise perfect except when Watson analyzes them.
Another commenter mentioned http://watson-um-demo.mybluemix.net/demo above, so I threw this together in bash for reddit comments. Bit more of a pain than entering a username, but if you take this:
run it in bash, then run redcom <user>, it should throw all your comments into <user>-redcom.txt which you can then copy/paste to http://watson-um-demo.mybluemix.net/demo
Yeah, I think it's worth keeping in mind it's scoring you based on your comments on a particular kind of site which tends towards certain kinds of interactions.
Does Watson internally use observations drawn from this study[1]?
And another question: is this applicable to non-native English speakers? Do they acquire the same language habits as if English were their mother tongue?
See my response to minimaxir above about additional documentation coming soon, and a link with good background info on the technology. Meanwhile, here are brief descriptions of the Big 5 Personality traits:
Big 5 Personality:
- Openness - associated with curiosity, intellect, and an appreciation for art and adventure
- Conscientiousness - associated with organization and industriousness
- Extraversion - associated with positive and outgoing attitudes toward other people
- Agreeableness - associated with compassion and cooperation toward other people
- Emotional Range - associated with a sensitivity to negative emotions
For more information on systematic associations between personality and individual differences in word use, please refer to studies like Tal Yarkoni, "Personality in 100,000 Words: A large-scale analysis of personality and word use among bloggers", 2010.