Hacker News new | past | comments | ask | show | jobs | submit login

No, it is not possible to approximate the median in the general case. The linked paper does no such thing. Their algorithm can be completely defeated by carefully tailored inputs. They only ever test them against very well-behaved random distributions.

But it brings up an important point: real data that you actually encounter is not a random sampling from "the general case". It is often possible to do a pretty good job approximating the median from real-world data, if you have some understanding of the distribution of that data. You just have to accept the fact that you might be totally wrong, if the data behaves in ways you don't expect. But of course, the more you have to know about your data before running the algorithm, the less useful the algorithm itself is.

The difference is whether you can make guarantees or not. Most algorithm design is concerned about such guarantees: on all possible inputs, this algorithm has at worst such-and-such performance. Hence the reliance on Big-O. You cannot ever make such guarantees on a "median approximator" without specifying some sort of precondition on the inputs.

What guarantees do we want to make? With "an estimator", it'd be nice to say something like: we'd like our approximation to get better given more values. That is: the more values we see, the less the next single value should be able to change our approximation. If you've looked at a billion values, all between [min, max], it'd be nice if you knew that looking at the next one or two values could only have an effect of at most 1 / f(1 billion) for some monotonically increasing f. Median does not have that property: looking at just two more values (even still within the range of [min, max]) could move your final answer all the way from max to min. If you stopped 2 data points earlier, your answer would be as wrong as possible. This remains true, for some inputs, even if you've looked at 10^10^10^10^300 values. The next 2 might change your answer by 100%.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: