I've just skimmed the paper, and this new algorithm looks very nice -- the authors call it "LargeVis."
According to the paper, LargeVis improves on Barnes-Hut t-SNE in two ways: first, it uses the idea that "the neighbors of my neighbors are likely my neighbors too" to construct an approximate graph of nearest neighbors in the high-dimensional space in a manner that is computationally much more efficient than the method used by t-SNE. Second, the authors apparently have found a clever way to use SGD to map this graph to two (or three) dimensions with computational cost linear in the number of nodes.
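To make the neighbor-exploration idea concrete, here is a rough sketch of one refinement round. This is my own illustration, not the authors' code; as I understand it, the paper seeds the graph with random projection trees before refining it this way.

    import numpy as np

    def refine_knn(X, neighbors, k):
        # One round of "neighbors of my neighbors" refinement of an approximate kNN graph.
        # X: (n, d) data matrix; neighbors: list of candidate-neighbor index arrays, one per point.
        refined = []
        for i in range(len(X)):
            # Candidate pool: my current neighbors plus their neighbors, minus myself.
            cand = np.unique(np.concatenate([neighbors[i]] + [neighbors[j] for j in neighbors[i]]))
            cand = cand[cand != i]
            # Keep the k closest candidates as the new neighbor list for point i.
            dists = np.linalg.norm(X[cand] - X[i], axis=1)
            refined.append(cand[np.argsort(dists)[:k]])
        return refined

Each point only compares distances against a small candidate pool rather than the whole dataset, which is where the speedup over exact neighbor search comes from.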
If the authors release an open-source implementation, LargeVis looks likely to supplant t-SNE as the go-to algorithm for visualizing high-dimensional data.
The idea that the neighbors of my neighbors are likely to be neighbors is basically an assumption of positive curvature. In large codimension, embedded submanifolds can have very negative curvature, in which case neighbors of neighbors might not be as likely to be neighbors as one might first think.
Firstly - in general it's trivially true that, over the set of all possible high-dimensional data, you have probability 0 of cleanly embedding it into a 2- or 3-dimensional representation - yet interesting data often does have lower-dimensional structure.
Secondly - so what? Can you think of a plausible scenario where this assumption does not hold and it's still possible to generate a low-dimensional embedding? If it's impossible to embed, then it's not an algorithmic problem if you fail to find an embedding.
Correct. The authors in fact mention this in the paper, and state that this is probably not an issue because in practice most large high-dimensional datasets tend to have points which lie on/near embedded submanifolds of much lower dimension.
I wonder why they are positioning this as a visualisation technique rather than as dimensionality reduction, which would be a much bigger deal. Is there something to suggest the technique doesn't work well for reduction to arbitrary dimensions?
Disclaimer: I haven't read it, but I have read the summaries here.
It sounds like it is very fast but not very rigorous. It lets you get a feel for the data, but it doesn't give you the same guarantees that other dimensionality-reduction methods do.
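For contrast, here is the kind of quantifiable property I have in mind, sketched with sklearn's PCA (my own toy example, nothing from the paper):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    # PCA tells you exactly how much of the data's variance the 2-D projection retains.
    X, _ = load_digits(return_X_y=True)
    pca = PCA(n_components=2).fit(X)
    print(pca.explained_variance_ratio_.sum())  # fraction of variance kept

Neighbor-embedding methods like t-SNE (and, it seems, LargeVis) optimize a loss over local neighborhoods and don't come with a comparable retained-information number.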
I read a paper [1] about visualizing the Deep Q-Network used for playing Atari games. They map the game states to 2D and then color them by their Q scores. As a result, they could visualize the strategy and observe how it is hierarchically organized into clusters containing sub-clusters. They could show regions associated with initiation and termination rules. This method can be used to map out a game strategy and to focus on specific regions.
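Roughly, that kind of plot could be sketched like this (made-up stand-in data; `activations` and `q_values` are placeholders for what a trained agent would actually produce):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    # Stand-ins for real data: hidden-layer activations of the network for 5000
    # game states, and the max Q-value the network assigns to each state.
    activations = np.random.rand(5000, 512)
    q_values = np.random.rand(5000)

    # Embed the states in 2-D, then color each point by its Q-value.
    emb = TSNE(n_components=2).fit_transform(activations)
    plt.scatter(emb[:, 0], emb[:, 1], c=q_values, s=3)
    plt.colorbar(label='max Q-value')
    plt.show()

A faster embedding step is exactly where something like LargeVis would help, since these datasets can have a large number of states.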
The dimensionality-reduction problem space rarely tries to reduce down to only two or three dimensions. The fact that you end up reducing to so few dimensions means a lot of potential errors don't meaningfully impact the result.
Does anyone know if it's possible, or makes sense, to use something like t-SNE for dimensionality reduction? If so, could the reduced data set be used to build a classifier?
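For concreteness, something like this sketch is what I have in mind (sklearn, toy dataset; note that standard t-SNE has no transform() for unseen points, so all the data has to be embedded up front):

    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # Embed everything to 2-D first, then cross-validate a simple classifier
    # on the embedded coordinates.
    X, y = load_digits(return_X_y=True)
    X_2d = TSNE(n_components=2).fit_transform(X)
    print(cross_val_score(KNeighborsClassifier(), X_2d, y, cv=5).mean())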
I wish academics would publish pure Python implementations of their "new" algorithms. Standard Python with PyPy is enough for speed of development and runtime.
The biggest thing about t-SNE is that it's been used successfully in competitive machine learning for quite a long time by many different people, because it's available in R via CRAN and in Python via sklearn. LargeVis has potential, but it could also turn out not to be very useful, like the vast majority of academic work.
After a paper like this comes out, usually someone adds it to their library or makes a public implementation of it. Often there is more than one. Maybe the authors aren't the best at implementing a clean version of it.
Any version is better than none, even if it is just a basic, crude reference implementation that library authors can use to add the algorithm to their own libraries.
This seems like a nice and useful method, but the conclusions of the paper seem problematic, specifically that the resulting visualizations are more "effective" and higher "quality" than t-SNE (or anything else for that matter). Can that aspect of a visualization method be judged by only showing a few examples, without any tests of how people actually use or benefit from it?
As far as I can see, the authors haven't decided to publish their implementation of the algorithm. That's a real pity, given that it's a really useful one.