Hacker News new | past | comments | ask | show | jobs | submit login

I've been using a variation of quickselect to find median rows of a matrix at certain columns.

There's a problem with quickselect: while it will find the median, it doesn't properly pivot the median around itself if the median occurs multiple times. So you could end up with the median sprinkled around both sides, which may or may not be a problem (it was for me).

One way to solve this is to take a second pass over the data to pivot -- which is essentially the core of QS where you nibble from both ends and swap low/high values.

Another nice property of QS is that it'll be approximately sorted, with values generally growing closer to the median towards the middle. This helps if you need to do repeated sub-selections on the partitions.




There is actually a paper about exactly this:

"Optimal Pivot Selection in Fast Weighted Median Search"

ieeexplore.ieee.org/iel5/78/6236322/06193457.pdf

It gives a formula for the optimal subset size to use to find the pivot and also does some optimization after the first partitioning step.


I like quickselect, because it is essentially quicksort. The difference is that one of the two partitions is ignored, since you know the median is in the other partition.


The two-pivot variant has a smaller recursion depth in addition to avoiding the multiplicitous median issue, but does come at the cost of extra swaps. But in the end, quickselect is what you use when you don't worry about pessimal performance.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: