That cosine distance works at all as a concept isn't terribly shocking, especially given our habit of norming everything. Cosine similarity on unit vectors is monotonic in Euclidean distance, and we're using this stuff in "select the K most relevant vectors" sorts of queries, so it ranks results exactly the way Euclidean distance would. Tack on the fact that every finite set of vectors, with your favorite metric, can be embedded in Euclidean space with at most ~41% relative error (errors that high require somewhat special circumstances, so you'd expect most real-world data to come in lower -- and the bound is a worst case over pairs of points anyway, so many pairs will do much better), and you can use normed cosine similarity somewhat reasonably on any finite set of stuff you care about, so long as you choose an appropriate embedding. All sets of things you care about in ML are finite, and the sub-metric induced by whichever infinite set you're considering works just fine for everything we've discussed, so cosine similarity is reasonable for all practical ML purposes.
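To make the unit-sphere point concrete, here's a minimal numpy sketch (random vectors standing in for real embeddings; all the names are made up) checking both the identity ||q - d||^2 = 2 - 2*cos(q, d) and that top-K by cosine similarity and top-K by Euclidean distance pick out the same vectors in the same order:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 64))   # stand-ins for real embedding vectors
query = rng.normal(size=64)

# Norm everything onto the unit sphere.
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query /= np.linalg.norm(query)

cos_sim = docs @ query
eucl = np.linalg.norm(docs - query, axis=1)

# On the unit sphere, ||q - d||^2 = 2 - 2*cos(q, d) ...
assert np.allclose(eucl**2, 2 - 2 * cos_sim)

# ... so "K most similar by cosine" and "K nearest by Euclidean distance"
# select the same vectors in the same order.
k = 10
assert list(np.argsort(-cos_sim)[:k]) == list(np.argsort(eucl)[:k])
```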
It's much more interesting that almost any set of ML-adjacent vectors can be somewhat reasonably compared via cosine distance _even without_ explicitly constructing an optimal embedding. It's not at all intuitive to me that an autoencoder's interior layer should behave well with respect to cosine similarity and not have any knots or anything warping that (associated) metric's usefulness.
Tbh, I would argue that's also pretty surprising, as Euclidean distance is notoriously unintuitive[1] (and noisy) in higher dimensions. (I guess norming does help, so that's likely a good point.)
I don't have any references off the top of my head, but I can state a result (which hopefully is easier to search) and then build to the thing I claimed:
- There's a well-known result that says you can embed any finite, undirected, unweighted graph (with shortest-path as the metric of choice) in Euclidean space with at most sqrt(2) multiplicative error (~41% relative error). The proof of that, as I recall, basically shows that the worst case is a graph containing a 4-vertex cycle, from which the result immediately follows (the first sketch after this list numerically checks the 4-cycle case).
- Construct the complete graph where each vertex is one of your (finitely many) vectors.
- Let G(0) denote that graph.
- Greedily construct a sequence of graphs G(1), G(2), ... by adding one extra vertex that splits some edge. The edge you split is the one that minimizes the skew between your metric's distances on the original vectors and the new shortest-path distances. When more than one split is equally good, pick one arbitrarily. (The second sketch after this list shows the subdivision idea, just not the greedy version.)
- Note that the G(N) give better and better rational approximations of the desired distances, with skew approaching 0 as N approaches infinity.
- Using that well-known result, we can embed each G(N) with at most ~41% relative error, and as N grows the extra relative error from the approximation goes to 0, so multiplying the two factors together you still get ~41% in the limit.
- Restrict the embedding of G(N) to the original vertices (delete the extra ones) to obtain the embedding for your original vectors.
- This isn't strictly necessary, but you can (I'm 90% sure -- I'd have to work through these details very carefully to be positive) obtain sqrt(2) exactly (instead of just less than sqrt(2)+epsilon for every positive epsilon) in a few ways. The one I'm most familiar with falls back on logical compactness, basically taking all the math that exists for the hyperreals and using it to create a nonstandard space holding all of these graphs. The limit approaching 0 can be phrased as a first-order statement whose consequences let you conclude that there exists an embedding which actually attains the limit. Topological arguments of some sort would probably be a more common way to go about it.
- That last point (attaining sqrt(2)) also follows from Bolzano-Weierstrass. If you have n points, unroll each embedding into a single point of an n^2-dimensional space and take a convergent subsequence of that sequence of embeddings (as a technical detail, center them on the origin so they all sit in a bounded set). The skew of the limiting embedding equals the limit of the skews in this case, so some embedding actually attains the bound (this is written out a bit more carefully in the last sketch below).
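Here's the 4-cycle check referenced in the first bullet -- just the arithmetic, nothing pulled from a reference. It draws C4 as a unit square and compares hop distances to Euclidean ones; the spread of ratios comes out to sqrt(2), i.e. the ~41% figure. It only verifies that this one embedding of this one graph sits at sqrt(2), not that sqrt(2) suffices in general:

```python
import itertools
import math

# The 4-cycle 0-1-2-3-0, drawn as the corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

def hop_dist(i, j):
    # Shortest-path distance on the cycle: go whichever way around is shorter.
    return min(abs(i - j), 4 - abs(i - j))

ratios = [math.dist(square[i], square[j]) / hop_dist(i, j)
          for i, j in itertools.combinations(range(4), 2)]

# Worst expansion over worst contraction = the multiplicative error.
print(max(ratios) / min(ratios))  # 1.4142... = sqrt(2)
```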
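And here's the rough sketch of the edge-splitting construction. Rather than the greedy one-edge-at-a-time version above, it subdivides every edge of the complete graph at a fixed resolution (an integer weight k stands in for k unit edges and k-1 extra vertices), which gets at the same point: the shortest-path distances become better and better rational approximations of the original metric as the resolution grows. The L1 distance on random points is an arbitrary stand-in for "your favorite metric":

```python
import itertools
import random

def floyd_warshall(w):
    """All-pairs shortest paths for a dense weight matrix (list of lists)."""
    n = len(w)
    dist = [row[:] for row in w]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def max_skew(original, approx):
    """Worst multiplicative gap between two metrics on the same point set."""
    worst = 1.0
    for i, j in itertools.combinations(range(len(original)), 2):
        r = approx[i][j] / original[i][j]
        worst = max(worst, r, 1.0 / r)
    return worst

# A small finite metric: random 3-d points under the L1 (taxicab) distance.
random.seed(0)
pts = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
d = [[sum(abs(a - b) for a, b in zip(p, q)) for q in pts] for p in pts]

for scale in (1, 10, 100, 1000):
    # Weight k on an edge == splitting it into k unit edges (k - 1 new
    # vertices), so shortest paths here are shortest paths in the
    # subdivided, unweighted graph G(N) from the list above.
    w = [[0 if i == j else max(1, round(scale * d[i][j]))
          for j in range(len(d))] for i in range(len(d))]
    sp = floyd_warshall(w)
    approx = [[sp[i][j] / scale for j in range(len(sp))] for i in range(len(sp))]
    print(f"scale={scale:5d}  skew={max_skew(d, approx):.4f}")
```

The printed skew heads toward 1.0 (i.e. 0 extra error), which is the "approaching 0 skew" step; the sqrt(2) factor then comes from applying the embedding result to each G(N).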
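For the Bolzano-Weierstrass point, here's the compactness step written out a bit more carefully (my phrasing, so treat it as a sketch rather than a proof):

```latex
% Normalize each near-optimal embedding to be non-contracting, and pin one
% point at the origin so the whole embedding sits in a bounded region:
f_k(x_1) = 0, \qquad
d(x_i, x_j) \;\le\; \lVert f_k(x_i) - f_k(x_j) \rVert
           \;\le\; \Bigl(\sqrt{2} + \tfrac{1}{k}\Bigr)\, d(x_i, x_j).
% Unrolled, each f_k is a point in a bounded subset of R^{n^2}, so
% Bolzano-Weierstrass gives a convergent subsequence f_{k_j} -> f.  Skew is
% continuous as long as distinct points stay distinct, which the left-hand
% inequality guarantees in the limit, so
\operatorname{skew}(f) \;=\; \lim_{j \to \infty} \operatorname{skew}(f_{k_j}) \;\le\; \sqrt{2}.
```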
I'm still not exactly sure how the result I'm basing this all on is proven.