"(an obvious example that may have had potential 5 years ago was sentiment, although now everyone and their dog is offering that and it's not entirely clear there's direct value to it)"
Presumably the sentiment you are referring to is real-time financial sentiment derived from online chatter. A source other than online chatter wouldn't yield enough data in terms of time granularity or symbols; i.e., we've had sentiment data for decades in the form of the AAII and NAAIM, but they just weren't useful as data sources for active trading.
Five years ago there wasn't a statistically significant sample of investors and traders chatting online to yield actionable data. Today there is; the data has alpha, and smart money is trading on it.
No, actually, I am referring to real-time sentiment analysis of any kind. Services such as http://www.dataminr.com/ or http://www.infotrie.com/ do this today, mostly using online chatter, which, as you correctly point out, probably would not have been possible 5 years ago. However, real-time news sentiment analysis would have been possible and relatively novel 5 years ago. For instance, Columbia's Paul Tetlock published a very interesting paper on the subject in the Journal of Finance in 2007: http://www0.gsb.columbia.edu/faculty/ptetlock/papers/Tetlock... (he used words from a Wall Street Journal column). If you read the paper, you'll find he used daily data over the 16-year period 1984-1999. While this is not of the same granularity you could get from services like InfoTrie or Dataminr, it certainly is of high enough granularity to trade on (and apparently could have been done 10 years ago!) and appears more useful than AAII or NAAIM.
I am interested in your claim that the data has alpha and smart money is trading on it. I have been looking for any peer-reviewed academic articles actually documenting that alpha (i.e., by anyone except the people trying to sell you the data). One such paper I found was Bollen et al.'s http://arxiv.org/abs/1010.3003. From my understanding, there are several problems with the methodology (I could find a rebuttal paper I read if you're interested). The methodology was used at Derwent Capital, although the fund only traded it for 1 month: http://venturebeat.com/2012/05/28/twitter-fueled-hedge-fund-...
DataMinr is an event detection service, not 'real-time sentiment'.
InfoTrie does sentiment analysis on news stories; they also mix this news sentiment with social media sentiment. This is a muddled approach and not pure real-time sentiment.
Let's back up a moment. You wouldn't expect Gallup to have accurate surveys of the presidential election by processing news articles and blogs. They poll individuals in brief snapshots of time. Polling for sentiment is the same thing. News articles and blog posts are amplified opinions of a small set of people; they are a poor substitute for direct measurement of many, many people.
The Bollen paper is widely discredited in our space. Derwent licensed the technology and subsequently ran into trouble because the tech and data were not sound, though Bollen's initial theory was correct.
Not peer reviewed but you can see a few papers Deltix published documenting alpha:
"News articles and blog posts are amplified opinions of a small set of people; they are a poor substitute for direct measurement of many, many people."
You might be correct (it sounds very reasonable), but I guess the question is: why do I necessarily care about directly measuring many, many people? What if loud, amplified opinions are actually a better indicator of market momentum? Tetlock's article suggests that those amplified opinions actually do contain useful information. Again, I have yet to find any conclusive evidence that measuring people's chatter actually ... matters.
"Derwent licensed the technology and subsequently ran into trouble because the tech and data were not sound, though Bollen's initial theory was correct."
Right - I think there are more than a couple of complaints about his paper. Why do you say his initial theory was correct, though? If I recall correctly, his initial theory was that social chatter contained market information that could be extracted using neural networks and used to predict short-term market returns (like a few days forward). I haven't really seen anything along the same lines (especially using his approach of neural networks) anywhere, although the articles you link to might be considered similar.
I am just reading through the articles you posted from Deltix now; they sound interesting. One problem is that because of the relative novelty of services like PsychSignal, the data does not go back as far as traditional indicators (this was a problem with Bollen's paper as well). When you say you have impressive alpha, Sharpe, or whatever for 6 months, that does not really prove much, especially in the eyes of academics. I'd love to replicate or work with the data presented by Deltix, but given that the data is proprietary, that's difficult to do.
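To put a rough number on the "6 months does not prove much" point, here is a back-of-the-envelope sketch. It assumes returns are i.i.d. (a simplification that real strategy returns rarely satisfy), in which case the t-statistic for "mean return is not zero" is approximately the annualized Sharpe ratio times the square root of the track record length in years. The function names and the Sharpe values below are hypothetical, chosen only for illustration; they are not taken from the Deltix papers.

```python
import math


def sharpe_t_stat(annual_sharpe: float, years: float) -> float:
    """Approximate t-statistic for the null hypothesis of zero mean return.

    Assumes i.i.d. returns, so t = (mean / std) * sqrt(n_periods), which
    works out to annualized Sharpe * sqrt(years) for any sampling frequency.
    """
    return annual_sharpe * math.sqrt(years)


def years_needed(annual_sharpe: float, t_target: float = 2.0) -> float:
    """Track record length (in years) needed to reach a target t-statistic."""
    return (t_target / annual_sharpe) ** 2


if __name__ == "__main__":
    # Hypothetical: an annualized Sharpe of 2.0 observed over 6 months.
    print(f"t-stat over 6 months: {sharpe_t_stat(2.0, 0.5):.2f}")
    # Hypothetical: how long a Sharpe of 1.0 must persist to reach t = 2.
    print(f"years needed at Sharpe 1.0: {years_needed(1.0):.1f}")
```

Under these assumptions, even an annualized Sharpe of 2.0 over half a year gives a t-statistic of only about 1.4, short of the conventional threshold of 2, and that is before accounting for any selection or backtest overfitting.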
side note:
This was just mentioned in the Wall Street Journal yesterday: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1807265 claims to find signal in blog posts. That's not to say measuring many people's chatter wouldn't work as well or better, though. Also, I just heard of the paper today, so no guarantees about quality, but maybe someone finds it interesting.