Deep Learning Sentiment Analysis for Movie Reviews Using Neo4j (kennybastani.com)
43 points by kennybastani on Sept 17, 2014 | hide | past | favorite | 4 comments



How does this compare to other benchmarks? It looks like you're using the sentence-level dataset out of Cornell, based on your GitHub. Even a naive unigram baseline easily beats the 70% threshold you mentioned in your post. A few years ago, I co-authored [1] a publication with very similar graph-based features on this dataset that achieved 77% accuracy, and the state of the art has moved beyond that since then. Without a comparison to a baseline, it's hard to tell whether this (much more sophisticated) technique is adding value.
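For context, a naive unigram baseline of the kind mentioned above can be sketched in a few lines of Python. The toy sentences here are hypothetical stand-ins for the Cornell sentence-level dataset, and the smoothed log-count scoring is a simple Naive Bayes-style choice of mine, not something from the post:

```python
import math
from collections import Counter

# Toy labeled sentences (hypothetical stand-ins for the Cornell dataset).
train = [
    ("a moving and beautiful film", "pos"),
    ("an utterly boring and predictable mess", "neg"),
    ("beautiful photography and a moving story", "pos"),
    ("predictable plot and boring characters", "neg"),
]

# Count unigram frequencies per class.
counts = {"pos": Counter(), "neg": Counter()}
for text, label in train:
    counts[label].update(text.split())

def classify(text):
    # Add-one smoothed log-count score per class; highest score wins.
    scores = {}
    for label, c in counts.items():
        total = sum(c.values()) + len(c)
        scores[label] = sum(
            math.log((c[w] + 1) / total) for w in text.split()
        )
    return max(scores, key=scores.get)

print(classify("a boring and predictable film"))  # "neg"
```

Even a throwaway baseline like this gives a reference point against which a more elaborate graph-based model can be judged.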

[1] Shilpa Arora, et al. "Sentiment classification using automatically extracted subgraph features." Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text.


The benchmarks really depend on the classifier used. The explanation in this blog post is more about the natural language parsing model. I came up with that part based on books by thought leaders (Gleick, Kurzweil, J. Hawkins). The value I'm hoping to provide is less in my own research and more in the use of a graph database. I'll leave the research part to fine people like you.

As for benchmarks, I've seen differences at different sample sizes during training. The model seems to do better with more training examples. Though that increases the number of features and the dimension of the vectors when calculating cosine similarity. I'm really hoping to attract more input like this as Graphify grows as an open source project. Please feel free to get in touch with me. Skype is kenny.bastani.

I'll post benchmarks in the next blog post.


Nice post, but when you say you used deep learning, what exactly do you mean? You describe your method for picking features, and then you presumably used deep learning to find which of those features should be most informative for classification.

It would be helpful to know what specific deep learning algorithm you used (convolutional, deep belief network?). Or, at the very least, whose implementation of neural nets you used in your model, and how it compares performance-wise to the more conventional tools in NLP (when you give them the same original features to start with).


Great question. To be honest, I used deep learning algorithms as a metaphor for Neo4j's property graph data model. Graph databases like Neo4j store data as a graph, which is a similar data structure to a neural network. I store weights in the relationships based on the frequency that a feature has been matched, from the low-level representations near the bottom of the tree up to higher-level representations.
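As an illustrative sketch (my own toy code, not Graphify's actual implementation), the idea of storing match-frequency weights on feature-to-label relationships can be mimicked with a plain dictionary keyed by edges; the feature patterns below are made up:

```python
from collections import defaultdict

# Illustrative sketch only: relationships from extracted phrase features
# to class labels, weighted by match frequency. This mimics weighted
# edges in a property graph; it is not Graphify's actual code.
edge_weight = defaultdict(int)

def record_match(feature, label):
    # Each time a feature pattern matches a training example,
    # increment the weight on its (feature)-[:MATCHES]->(label) edge.
    edge_weight[(feature, label)] += 1

for feature, label in [
    ("truly {0}", "positive"),
    ("truly {0}", "positive"),
    ("utterly {0}", "negative"),
]:
    record_match(feature, label)

print(edge_weight[("truly {0}", "positive")])  # 2
```

In Neo4j itself, the equivalent would be a weight property on a relationship that gets incremented on each match.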

So there are two parts: building a natural language parsing model, and then a Vector Space Model classifier that uses TF-IDF weights as vectors to calculate the cosine similarity between inputs.
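The second part can be sketched as a minimal toy example of TF-IDF weighting plus cosine similarity (the documents and labels here are made up, and this is not the classifier's actual code):

```python
import math
from collections import Counter

# Toy labeled "documents" (made-up stand-ins for the training classes).
docs = {
    "pos": "great acting great story",
    "neg": "dull plot dull acting",
}

def tfidf_vector(text, corpus):
    # Term frequency times smoothed inverse document frequency.
    tf = Counter(text.split())
    return {
        term: freq * math.log(
            (1 + len(corpus))
            / (1 + sum(1 for d in corpus.values() if term in d.split()))
        )
        for term, freq in tf.items()
    }

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = tfidf_vector("great story", docs)
sims = {label: cosine(query, tfidf_vector(text, docs))
        for label, text in docs.items()}
print(max(sims, key=sims.get))  # "pos"
```

The input is assigned to whichever class vector it has the highest cosine similarity with; terms shared across all classes get an IDF near zero and so contribute little.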

I explain more about the high-level idea here: http://bit.ly/1lMjSm5

Let it be known that I've arrived at most of this stuff by means of intuition and graph data modeling in Neo4j. I'm a hobbyist when it comes to the machine learning stuff. My goal is to show how amazing a combined application/persistence solution, like a Neo4j extension, is for solving these kinds of machine learning problems.

People smarter than me should take a look at it to solve similar problems.



