> “Which pages are getting an unusual hit in the last 30 minutes?”
A good statistic for this is an exponentially weighted moving average, which can be computed online by a leaky integrator using a single accumulator register per counted entity.
It's not so much counting as estimating a rate.
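A minimal sketch of that per-entity leaky integrator, assuming Python and a 30-minute half-life (the class and method names here are made up for illustration):

```python
import time
from collections import defaultdict

class LeakyRateEstimator:
    """One decaying accumulator per key: an exponentially weighted hit rate."""

    def __init__(self, half_life_seconds=1800.0):  # assumed 30-minute half-life
        self.half_life = half_life_seconds
        # key -> (accumulated value, timestamp of last update)
        self.acc = defaultdict(lambda: (0.0, time.time()))

    def hit(self, key, now=None):
        now = time.time() if now is None else now
        value, last = self.acc[key]
        # Decay the old value for the elapsed time, then add the new hit.
        decay = 0.5 ** ((now - last) / self.half_life)
        self.acc[key] = (value * decay + 1.0, now)

    def score(self, key, now=None):
        now = time.time() if now is None else now
        value, last = self.acc[key]
        return value * 0.5 ** ((now - last) / self.half_life)
```

Calling hit(page) on every request and reading score(page) gives exactly one decayed counter per page.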
> a single accumulator register per counted entity.
That requires O(n) memory space, where n is the number of counted entities. If you're willing to spend this much memory, the problem is easy. Algorithms like the one in the article (or the paper I mentioned) are for when you want to use far less memory than that.
Count-min sketches can be adapted to use exponential decay. Presumably any sort of recurrent decay could be used to get approximately rectangular windows as well.
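One way that adaptation could look, with the decay applied lazily to each touched cell (the class name, default parameters, and tuple-hashing trick are my own choices, not from any particular implementation):

```python
import random

class DecayingCountMinSketch:
    """Count-min sketch whose counters decay with a fixed half-life."""

    def __init__(self, width=2048, depth=4, half_life=1800.0):
        self.width, self.depth, self.half_life = width, depth, half_life
        self.counts = [[0.0] * width for _ in range(depth)]
        self.stamps = [[0.0] * width for _ in range(depth)]  # last update time per cell
        self.seeds = [random.randrange(1 << 31) for _ in range(depth)]

    def _cells(self, item):
        for row, seed in enumerate(self.seeds):
            yield row, hash((seed, item)) % self.width

    def _decayed(self, row, col, now):
        dt = now - self.stamps[row][col]
        return self.counts[row][col] * 0.5 ** (dt / self.half_life)

    def add(self, item, now, weight=1.0):
        for row, col in self._cells(item):
            # Bring the cell forward to "now", then add the new observation.
            self.counts[row][col] = self._decayed(row, col, now) + weight
            self.stamps[row][col] = now

    def estimate(self, item, now):
        # Standard count-min query: take the minimum over the rows,
        # which over-counts only by hash-collision noise.
        return min(self._decayed(row, col, now) for row, col in self._cells(item))
```

Memory stays fixed at width × depth cells no matter how many distinct items appear, which is the point of using a sketch here.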
Why not store the items in a random order? This can be achieved by performing a sort using a random key. Sorting can be done as a background job, so that the list is always "reasonably" random.
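For illustration, a "sort using a random key" is a one-liner in Python (the counts here are placeholder data):

```python
import random

counts = {"Red": 45, "Blue": 23, "Green": 7}

# A background job could periodically recompute a shuffled display order:
display_order = sorted(counts, key=lambda _: random.random())
```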
The items are stored in a map in random order (Red -> 45, Blue -> 23, etc.). The picture makes it easier to read by showing many red boxes, but the actual implementation just stores a single number for each item label.
My understanding is that HyperLogLog will estimate the cardinality, i.e., the number of distinct types of events in the stream. The frequency counting algorithms in the article can be used to estimate the number of occurrences of each of the most frequently occurring types.
For example, where the frequency counting algorithms might estimate the following counts for a given stream:
A: 2, B: 5, C: 1
...HyperLogLog would estimate that the stream contains three distinct types of events (A, B, and C, but you only get the count: 3). Presumably, HyperLogLog requires less time and space to achieve this more summarized perspective.
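To make the distinction concrete, here are exact versions of both answers computed naively; HyperLogLog and the frequency-counting sketches approximate these with far less memory:

```python
from collections import Counter

stream = ["B", "A", "B", "C", "B", "A", "B", "B"]

# Per-item frequencies (what the article's algorithms approximate): B: 5, A: 2, C: 1
print(Counter(stream))

# Cardinality (what HyperLogLog approximates): just the single number 3
print(len(set(stream)))
```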
This 2002 paper describes a data structure for this that I suspect is superior to what is described in the article: https://www.cs.rutgers.edu/~farach/pubs/FrequentStream.pdf
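Assuming that link is the Charikar/Chen/Farach-Colton "Finding Frequent Items in Data Streams" paper, its central structure is the count sketch. A minimal illustrative version (hash each item to one cell per row with a random +/-1 sign, and estimate with the median of the signed cells):

```python
import random
import statistics

class CountSketch:
    """Each item hits one cell per row with a random +/-1 sign; the median of
    the signed cell values estimates the item's frequency."""

    def __init__(self, width=2048, depth=5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]
        self.seeds = [(random.randrange(1 << 31), random.randrange(1 << 31))
                      for _ in range(depth)]

    def _cells(self, item):
        for row, (bucket_seed, sign_seed) in enumerate(self.seeds):
            col = hash((bucket_seed, item)) % self.width
            sign = 1 if hash((sign_seed, item)) % 2 else -1
            yield row, col, sign

    def add(self, item, count=1):
        for row, col, sign in self._cells(item):
            self.table[row][col] += sign * count

    def estimate(self, item):
        return statistics.median(sign * self.table[row][col]
                                 for row, col, sign in self._cells(item))
```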