The benchmarks really depend on the classifier used. The explanation in this blog post is more about the natural language parsing model, which I came up with based on books I read by thought leaders (Gleick, Kurzweil, J. Hawkins). The value I'm hoping to provide is less in my own research and more in the use of a graph database. I'll leave the research part to fine people like you.
As for benchmarks, I've seen differences at different training sample sizes. The model seems to do better with more training examples, though that also increases the number of features, and with it the dimension of the vectors used when calculating cosine similarity. I'm really hoping to attract more input like this as Graphify grows as an open source project. Please feel free to get in touch with me. My Skype is kenny.bastani.
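To make the cosine similarity trade-off a bit more concrete, here is a minimal Python sketch. It is not Graphify's actual code; the function names and the toy feature counts are made up for illustration, and I'm assuming each class is represented as a sparse vector of feature weights. It shows how the similarity is computed and how the shared feature space grows as more examples bring in more features.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {feature: weight} dicts."""
    # Only features present in both vectors contribute to the dot product.
    dot = sum(weight * b[feature] for feature, weight in a.items() if feature in b)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def dimension(vectors):
    """Number of distinct features across all vectors, i.e. the vector dimension."""
    return len(set().union(*vectors))


# Hypothetical feature counts for two classes after a small training run.
class_a = {"graph": 2, "database": 1}
class_b = {"graph": 1, "node": 1}

print(cosine_similarity(class_a, class_b))
print(dimension([class_a, class_b]))

# More training examples add new features, so the dimension grows.
class_b["traversal"] = 1
print(dimension([class_a, class_b]))
```

Storing only the non-zero features keeps the calculation proportional to the features a class actually has, rather than to the full dimension.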
I'll post benchmarks in the next blog post.