This comment is similar to the comment I wanted to make because I also thought it was pretty nifty.
Joking aside this is pretty cool. One thing about the whole embeddings/cosine similarity thing for people who are struggling with understanding it.
Computers are good at doing lots of sums. Embeddings turn a problem that seems to be about something else[1] into a problem involving lots of sums by turning that something else into numbers.
So when we turn some text into embeddings (numbers), what do those numbers mean? You could imagine a space with a lot of dimensions - the author is using OpenAI embeddings, so it's on the order of a thousand dimensions - and every point in that space is some embedding, which is a numerical representation of the meaning of some text.[2] Things with similar meaning have embeddings that are close to one another in this space. How do you decide what "close" is?
Well one easy way is cosine similarity. Since these are vectors, imagine two arrows coming from the origin. To make things simpler, imagine it in the two-dimensional plane rather than 1000 dimensions, which would make your brain leak out of your ears. So you have two arrows going from the origin to two points. One way to think about closeness is the length of the line from the tip of one arrow to the tip of the other. For people who struggle to remember their trig, that length is given by the law of cosines: c^2 = a^2 + b^2 - 2ab cos theta. It just so happens that if you take the dot product of two vectors and divide by the product of their norms you get the cosine of the angle between them (cos theta). Cosine similarity uses that cos theta directly as the measure of closeness: 1 means the arrows point the same way, 0 means they're at right angles. That's why it's called cosine similarity even though you don't see an explicit cosine in the formula.[3] (There's a quick code sketch of this after the footnotes.)
[1] usually language in the case of LLMs, but embeddings aren't only about text.
[2] this is why searching embeddings is called semantic search.
[3] The term cosine distance is often used loosely for this, although I believe it's technically not a true distance (metric) because it doesn't satisfy the triangle inequality.
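Here's a minimal sketch of that formula in Python with numpy. The function name and the toy 3-dimensional vectors are just made up for illustration; real OpenAI embeddings would have on the order of a thousand dimensions:

    import numpy as np

    def cosine_similarity(a, b):
        # dot product divided by the product of the norms gives cos(theta)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # toy 3-dimensional "embeddings", purely illustrative
    cat = np.array([0.9, 0.1, 0.2])
    kitten = np.array([0.85, 0.15, 0.25])
    invoice = np.array([0.1, 0.8, 0.6])

    print(cosine_similarity(cat, kitten))   # close to 1: similar meaning
    print(cosine_similarity(cat, invoice))  # smaller: less similar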
Even that simple explanation makes my brain itch a little :D I never did master trig
I'm curious if there ARE alternative methods to cosine similarity. A lot of the things I've read mention that cosine similarity is "one of the ways to compute distance..." or "a simple way...". But I've not seen any real suggestions for alternatives. Guess everyone's thinking "if it ain't broke, don't fix it" as cosine similarity works pretty darn well
Yeah there are a few other ways. The most common is Euclidean distance (the “L2 norm” of the difference), which would be the hypotenuse of a right triangle: if your points are (x1,y1), (x2,y2) then it is sqrt((x1-x2)^2 + (y1-y2)^2), which you might recognise from Pythagoras’ theorem (c^2 = a^2 + b^2). If you have 1000 dimensions then instead of just the x and y terms you are summing a thousand squared differences, but the principle is the same.
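A quick Python sketch of that (toy 2-d points, not real embeddings):

    import numpy as np

    def euclidean_distance(a, b):
        # square root of the sum of squared differences, one term per dimension
        return np.sqrt(np.sum((a - b) ** 2))  # equivalent to np.linalg.norm(a - b)

    a = np.array([1.0, 2.0])
    b = np.array([4.0, 6.0])
    print(euclidean_distance(a, b))  # sqrt(3^2 + 4^2) = 5.0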
Another one is “Manhattan distance” (known as the L1 norm, or sometimes “taxicab distance”), which is just abs(x1-x2) + abs(y1-y2) in that example. If you imagine a set of city blocks and you want to go from one place to another, the cab has to go north/south and east/west and can’t go diagonally; that’s the distance it travels. You’re adding up all the north/south parts and the east/west parts.
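And the same toy points with Manhattan distance:

    import numpy as np

    def manhattan_distance(a, b):
        # sum of absolute differences per dimension ("taxicab" blocks)
        return np.sum(np.abs(a - b))

    a = np.array([1.0, 2.0])
    b = np.array([4.0, 6.0])
    print(manhattan_distance(a, b))  # |1-4| + |2-6| = 7.0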
There are a bunch of other distance measures too, e.g. on one project I worked on we used Mahalanobis distance, a more complex measure that adjusts for covariance between the dimensions of your space. That wouldn’t be useful for this particular problem though.
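For completeness, a rough sketch of Mahalanobis distance. The data here is just randomly generated to have two correlated dimensions; in practice you’d estimate the covariance from your own dataset:

    import numpy as np

    # toy data with two correlated dimensions
    rng = np.random.default_rng(0)
    data = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=500)

    # inverse of the covariance matrix estimated from the data
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))

    def mahalanobis_distance(a, b, cov_inv):
        # like Euclidean distance, but rescaled by the inverse covariance
        d = a - b
        return np.sqrt(d @ cov_inv @ d)

    print(mahalanobis_distance(data[0], data[1], cov_inv))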