The dot-product of two vectors quantifies their similarity -- in fact, we call it "dot-product similarity" in a nearest-neighbors context: If the dot product > 0, the vectors point in similar directions; if < 0, the two vectors point in dissimilar directions; if = 0, the two vectors are orthogonal.
To make it more explicit, the dot product of two vectors is just the cosine of the angle between them, multiplied by the product of their lengths.
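A quick numerical check of that identity, with two arbitrary example 2-D vectors (the angle is computed with atan2 rather than derived from the dot product, so the check isn't circular):

```python
import math

# Two example vectors: a along the x-axis, b at 45 degrees from it.
a = (3.0, 0.0)
b = (1.0, 1.0)

dot = a[0] * b[0] + a[1] * b[1]
len_a = math.hypot(*a)
len_b = math.hypot(*b)
# Angle between them, measured directly (independently of the dot product)
theta = math.atan2(b[1], b[0]) - math.atan2(a[1], a[0])

# dot(a, b) == |a| * |b| * cos(theta)
assert math.isclose(dot, len_a * len_b * math.cos(theta))
```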
The angle and cosine start to lose their geometric intuition when we go beyond 3D.
The concept of correlation has no issue with additional components. The similarity of two 17-element vectors is a clear concept. In fact, correlation intuitively scales to "infinite-component vectors": the dot product becomes multiplying two functions together and then taking an integral.
The Fourier transform of a periodic signal is based on the concept of how similar the signal is to a certain basis of sine/cosine quadratures spaced along a frequency spectrum. This is like projecting a vector onto a basis; only the vector has an infinite number of components, since it is an interval of a smooth function.
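This can be sketched numerically: each Fourier coefficient is literally a "dot product" of the signal with a basis sinusoid, i.e. pointwise multiplication followed by an integral over the period. The test signal below is an arbitrary example:

```python
import numpy as np

# One period of an example signal, sampled densely.
t = np.linspace(0.0, 1.0, 100_000, endpoint=False)
signal = 2.0 * np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 5 * t)

def coeff(basis):
    # "Dot product" of two functions: multiply pointwise, integrate
    # over the period (the mean approximates the integral; the factor
    # of 2 normalizes against mean(sin^2) = 1/2).
    return 2.0 * np.mean(signal * basis)

# Projecting onto the matching basis functions recovers the amplitudes;
# projecting onto a frequency not in the signal gives ~0 (orthogonality).
assert np.isclose(coeff(np.sin(2 * np.pi * 3 * t)), 2.0, atol=1e-3)
assert np.isclose(coeff(np.cos(2 * np.pi * 5 * t)), 0.5, atol=1e-3)
assert np.isclose(coeff(np.sin(2 * np.pi * 7 * t)), 0.0, atol=1e-3)
```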
> The angle and cosine start to lose their geometric intuition when we go beyond 3D
... they do?
"Geometry" in general loses intuition beyond 3D, but apart from that, angles between two vectors are probably the one thing that still remains intuitive in higher dimensions (since the two vectors can always be reduced to their common plane).
Even angles behave very counter-intuitively in high dimensions. E.g. in high-dimensional spaces, uniformly randomly chosen vectors almost always have nearly the same inner product. Why? Sum x_i y_i is a sum of iid random variables, so the variance goes to zero by the central limit theorem.
I would say that this is intuitive. For any direction you pick, there are (n-1) orthogonal directions in nD space. It's only natural that the expected inner product drops to zero.
The variance goes to 0 only if you normalize, which is to say that two random high-dimensional vectors are very likely to be close to orthogonal (under mild assumptions on their distribution).
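A small simulation of this concentration effect (random Gaussian directions, normalized to unit length; the spread of the inner product shrinks roughly like 1/sqrt(dim)):

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_spread(dim, trials=2000):
    # Draw pairs of random unit vectors and measure the spread
    # (standard deviation) of their inner products.
    a = rng.standard_normal((trials, dim))
    b = rng.standard_normal((trials, dim))
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    return np.std(np.einsum('ij,ij->i', a, b))

# Higher dimension -> inner products concentrate near 0 (near-orthogonal).
assert cosine_spread(1000) < cosine_spread(10) < cosine_spread(2)
```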
I agree that that's one of those important but initially unintuitive facts about high dimensions. Just like almost all of the volume of a reasonably round convex body is near its surface. But it also doesn't really contradict the GP comment.
Most people don't find arguing from formulas intuitive unless the formulas themselves are intuitive. If you truly believe they are, I'd be curious to know why.
There is an intuitive version of this. Volume in n dimensions is C*r^n (C is some constant) and the surface is its first derivative, C*n*r^(n-1), leading to a surface-to-volume ratio of n/r (the C constant cancels out). Hmm... Maybe not that intuitive
But the former formula already tells you that most of the volume is near the high range of r: that which was to be shown.
The surface area to volume concept adds nothing.
Because the volume of a sphere is proportional to r cubed, you know there is much more volume between r in [0.9, 1.0] than in the same-sized interval r in [0.0, 0.1].
You can find the break-even point almost in your head. At what r value is half the volume of a R = 1.0 sphere below that value? Why, that's just the cube root of 1/2 ~= 0.794. So almost half the volume is within 20% of the radius from the surface.
That's far from the claim that almost all the volume is near the surface: half is not almost all, and 20% isn't all that near. However, you can see how it gets nearer and nearer for higher dimensions.
For a ten-dimensional sphere, the tenth root of 1/2 is ~ 0.933. So over half the volume of a ten-dimensional sphere is within 7% of the radius from the surface.
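The whole argument above fits in a few lines: since volume scales as r^n, the fraction of a unit n-ball's volume within depth d of the surface is 1 - (1 - d)^n.

```python
# Fraction of a unit n-ball's volume within depth d of the surface.
# Volume scales as r^n, so the inner ball of radius (1 - d) holds
# (1 - d)^n of the total.
def shell_fraction(n, depth):
    return 1.0 - (1.0 - depth) ** n

# 3-D: about half the volume in the outer ~20% (cube root of 1/2 ~= 0.794)
assert abs(shell_fraction(3, 0.206) - 0.5) < 0.01
# 10-D: over half the volume within 7% of the surface (tenth root ~= 0.933)
assert shell_fraction(10, 0.07) > 0.5
# 100-D: essentially everything sits in that same thin 7% shell
assert shell_fraction(100, 0.07) > 0.999
```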
The surface area to volume ratio is just a limit of the shell volume to total volume ratio as the shell thickness goes to zero. So both should asymptotically scale with higher dimensions in the same way.
Word2vec with embedding sizes of 300 and more does refute your claim. I successfully trained word2vec models with such embedding sizes and used the inner-product similarity to create word clusters, as it works out of the box there. Then I made a clustering language model and got significantly lower perplexity compared to a word-based language model.
Not really, for example, in physics, lines in 4D are just as meaningful as they are in 3D, more even (they are called geodesics). So are the angles between them. The real problem is that we just don't have good intuitions of higher dimensions in general.
I can intuit that two such vectors are perpendicular to each other, which I will easily call ninety degrees, and that two such collinear vectors are at zero degrees.
But I somehow wouldn't go from that intuition into specific cosines. Like "Oh, look, if I divide out the lengths from the dot product, I'm getting 0.5! Why that's the cosine of 60 degrees!"
> dot product of two vectors is just cosine of the angle between them, multiplied by their lengths
How do you define the "angle" between two n-dimensional vectors? Most likely using the dot product and the arccos. "cos(angle) times lengths" might give a good intuition in 2D or 3D space, but it doesn't help with higher-dimensional vectors.
It generalizes perfectly. The angle between two lines in any dimensions is the same concept.
Two (non-collinear) lines share a plane. The angle on that plane is just the ordinary angle, no matter how many dimensions the two lines are embedded in.
In the case they are collinear, the angle between them is zero on any plane that intersects them. So that corner case works too, regardless of the number of dimensions.
> Two (non-collinear) lines share a plane. The angle on that plane is just the ordinary angle, no matter how many dimensions the two lines are embedded in.
Okay, but now you've got a plane in n-dimensional space. How do you define/calculate the angle between the two vectors without falling back on the dot product?
You could say: The angle between the two vectors A and B is defined as the smallest rotation around their common normal vector so that the rotated first vector points in the same direction as the second vector. But what is the normal vector? It's a third vector C which is 90° to each of A and B. Now your definition is cyclic. Okay, then: C is a non-zero vector so that dot(A,C)=0 and dot(B,C)=0. Now you're back to using dot-products for defining angles in higher-dimensional space.
take the first line/vector, then just call that line/vector "first-new-dimension".
Then take the second line/vector, and decide that this vector can be written as a 2-dimensional vector made out of "first-new-dimension" and "second-new-dimension", now you just have to figure out what "second-new-dimension" is.
A simple trick would be to measure the length of the line, and then add/remove a multiple of the first dimension until the length of the line becomes as short as possible; this new line is your "second-new-dimension".
Now, even if you are working with a 10-dimensional space, you have two lines that only exist in the first two (new) dimensions, so, you can treat them as two-dimensional objects and find an angle using your old 2-dimensional methods.
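A sketch of that construction, with two arbitrary example 5-D vectors. Only lengths and a 1-D minimization are used to build the shared plane (the scale factor is found by ternary search, matching the "shorten until it's as short as possible" step), and the resulting angle matches the usual arccos-of-dot-product definition:

```python
import math

# Two example vectors in 5-D (arbitrary values).
a = [1.0, 2.0, 0.0, -1.0, 3.0]
b = [2.0, 0.0, 1.0, 1.0, 1.0]

def length(v):
    return math.sqrt(sum(x * x for x in v))

def residual_length(t):
    # |b - t * (a / |a|)|: shorten b by a multiple of the first direction.
    la = length(a)
    return length([bi - t * ai / la for ai, bi in zip(a, b)])

# "Add/remove a multiple of the first dimension until the length becomes
# as short as possible" -- a ternary search over the scale factor t.
lo, hi = -100.0, 100.0
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if residual_length(m1) < residual_length(m2):
        hi = m2
    else:
        lo = m1
t = (lo + hi) / 2

# In the new plane: a = (|a|, 0) and b = (t, |residual|), so the angle
# is just ordinary 2-D trigonometry.
angle = math.atan2(residual_length(t), t)

# Agrees with the usual arccos(dot / (|a| * |b|)) definition.
dot = sum(ai * bi for ai, bi in zip(a, b))
assert math.isclose(angle, math.acos(dot / (length(a) * length(b))), rel_tol=1e-6)
```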