k-D trees and BSPs don't work well in high dimensions (say beyond 20 to 30 dims), since your true nearest neighbor is highly likely to end up on the opposite side of some dividing hyperplane from your query.
If each dividing hyperplane splits a set of vectors into two roughly equal parts, then for N vectors the tree needs on the order of log2(N) levels of separating planes to get down to a single vector per terminal node. But for, say, a billion vectors in a 100-dimensional space (and assuming something like a k-D tree, where you split on one dimension per level), you have only partitioned along log2(10^9) ~= 30 dimensions this way, leaving roughly 70 other dimensions through which you can travel to reach your true nearest neighbor (the curse of dimensionality: in high dimensions everything is "near" everything else) and which you never check if you discard subspaces along the way.
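To make the "everything is near everything else" point concrete, here is a minimal numpy sketch (the sample sizes and dimensions are arbitrary choices, not from the discussion above): as the dimension grows, the relative gap between the nearest and the farthest point from a random query collapses, so there is very little for hyperplane-based pruning to exploit.

```python
# Sketch: distance concentration as dimension grows (illustrative only;
# sample sizes and dimensions below are arbitrary choices).
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 10, 30, 100, 300):
    # Random uniform points stand in for "generic" vectors in d dimensions.
    points = rng.random((2000, d))
    query = rng.random(d)

    dists = np.linalg.norm(points - query, axis=1)
    nearest, farthest = dists.min(), dists.max()

    # As d grows, the relative gap (farthest - nearest) / nearest shrinks,
    # so discarding half-spaces prunes almost nothing useful.
    print(f"d={d:4d}  relative gap = {(farthest - nearest) / nearest:.3f}")
```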
This is a problem for other approximate methods too. IVF methods still do a form of partitioning (via Voronoi cells), but the partition comes from clustering the data, so it is attuned to the global structure of the data rather than to greedy divide and conquer. It is still possible that you need to check every IVF cell to find your true nearest neighbor; it is just less likely to matter in practice, precisely because the cells follow the overall structure of the data rather than a greedy subdivision.
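For what it's worth, here is a toy IVF-style sketch (plain numpy plus scikit-learn's KMeans; the cell count and probe count are arbitrary): k-means centroids define the Voronoi cells, and at query time only the few closest cells are scanned, which is exactly why the true neighbor can occasionally sit in a cell you never probe.

```python
# Toy IVF-style index: k-means cells, probe only a few cells per query.
# A sketch for illustration, not tuned; all parameters are arbitrary.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.standard_normal((20000, 64)).astype(np.float32)

n_cells, n_probe = 256, 8
kmeans = KMeans(n_clusters=n_cells, n_init=4, random_state=0).fit(data)
cells = [np.where(kmeans.labels_ == c)[0] for c in range(n_cells)]  # Voronoi cell members

def ivf_search(query):
    # Rank cells by distance from the query to each centroid, scan the closest few.
    d_cent = np.linalg.norm(kmeans.cluster_centers_ - query, axis=1)
    candidates = np.concatenate([cells[c] for c in np.argsort(d_cent)[:n_probe]])
    d = np.linalg.norm(data[candidates] - query, axis=1)
    return candidates[np.argmin(d)]

query = rng.standard_normal(64).astype(np.float32)
exact = np.argmin(np.linalg.norm(data - query, axis=1))   # brute-force ground truth
approx = ivf_search(query)
print("exact:", exact, "ivf:", approx, "match:", exact == approx)
```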
I wonder if nearest neighbor is actually what you want to begin with. Imagine points distributed in a U-shape: the nearest neighbor might require jumping across the gap, while a better match could be a bit further away but somewhere along the shape. Especially if the dimensions do not have a meaningful intrinsic scale, I could just scale the x dimension and pull the two sides of the U apart. So maybe it could actually be more useful to look at the topology of the points, something like a Vietoris–Rips complex [1] but invariant under [local] rescaling of dimensions. Not sure if something like that exists or is possible in a robust and efficient way.
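As a tiny illustration of the rescaling point (the shape and spacing below are made up): with the two arms of the U close together in x, the Euclidean nearest neighbor of a point near the tip of one arm sits across the gap; stretch the x axis and the "nearest" point becomes one further down the same arm.

```python
# Sketch: rescaling one axis changes which point is "nearest" (Euclidean NN
# is not invariant under per-dimension rescaling). Shape/spacing are made up.
import numpy as np

# Two parallel arms of a U (the connecting bottom is omitted for brevity):
# left arm at x = 0.0, right arm at x = 0.5, points spaced 1.0 apart in y.
ys = np.arange(0.0, 11.0)
left = np.column_stack([np.zeros_like(ys), ys])
right = np.column_stack([np.full_like(ys, 0.5), ys])
points = np.vstack([left, right])

query = np.array([0.0, 10.0])          # the tip of the left arm

def nearest(pts, q):
    d = np.linalg.norm(pts - q, axis=1)
    d[d == 0.0] = np.inf               # ignore the query point itself
    return pts[np.argmin(d)]

print("original scale:", nearest(points, query))                      # across the gap: [0.5, 10.]

stretch = np.array([10.0, 1.0])        # pull the two sides of the U apart in x
print("x stretched:   ", nearest(points * stretch, query * stretch))  # same arm: [0., 9.]
```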
I think you would prefer something like a support vector machine over k-NN for better learned point-wise local rescaling (you could also use e.g. ISOMAP or MDS to try to linearize the manifold), which you could also combine with various TDA methods (see the giotto-tda GitHub repo Jupyter notebooks for some similar examples).
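A short scikit-learn sketch of the ISOMAP suggestion (an assumed workflow, not taken from the giotto-tda notebooks): unroll an S-curve with Isomap, then compare which point comes out as nearest in the original coordinates versus in the unrolled ones, where distance now tracks position along the shape.

```python
# Sketch of the ISOMAP idea: embed the manifold, then do NN in the embedding,
# so "near" is measured along the shape. Parameters here are arbitrary.
import numpy as np
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap
from sklearn.neighbors import NearestNeighbors

X, t = make_s_curve(n_samples=2000, random_state=0)   # t = position along the curve

# Isomap approximates along-the-manifold (geodesic) distances via a k-NN graph,
# then embeds them with classical MDS.
Z = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

query_idx = 0
nn_ambient = NearestNeighbors(n_neighbors=2).fit(X)
nn_embedded = NearestNeighbors(n_neighbors=2).fit(Z)

# Index [:, 1] skips the query point itself (its own distance is zero).
i_ambient = nn_ambient.kneighbors(X[[query_idx]])[1][0, 1]
i_embedded = nn_embedded.kneighbors(Z[[query_idx]])[1][0, 1]

print("curve position of query:            ", round(t[query_idx], 3))
print("curve position of ambient-space NN: ", round(t[i_ambient], 3))
print("curve position of embedded-space NN:", round(t[i_embedded], 3))
```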