Hacker News new | past | comments | ask | show | jobs | submit login

> Reservoir sampling is still O(N), and it requires generating a random number for every element.

I agree with your comment in general. Only: you can do better than generate a random number for each element. (Generate a random number to tell you how many elements in the steam to skip.)




Can you show that this will still maintain a uniform probability distribution?


> Can you show that [generating a number of records to skip instead of a yes/no for each record] will still maintain a uniform probability distribution?

That should work if you pull the skip count from the appropriate distribution (negative binomial, I think).


Yes, you have to work out the right distribution, and how to sample from it (at all, and perhaps how to sample from it efficiently).


Yes.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: