Please elaborate.

kragen · on Sept 9, 2021

For both alternatives we begin by computing how far the mouse has gone:

    int m = abs(dx) + abs(dy);   // Manhattan distance

For the single-pole RC exponential filter as WanderPanda suggested:

    c -= c >> 5;                 // exponential decay without a multiply (not actually faster on most modern CPUs)
    c += m;

For the box filter with the running-sum table as nostrademons suggested:

    s += m;                      // update running sum
    size_t j = (i + 1) % n;      // calculate index in prefix sum table to overwrite
    int d = s - t[j];            // calculate sum of last n mouse movement Manhattan distances
    t[j] = s;
    i = j;

Here c, i, s, and t are all presumed to persist from one event to the next, so maybe they're part of some context struct, while in old-fashioned C they'd be static variables. If n is a compile-time constant, this will be more efficient, especially if it's a power of 2. You don't really need a separate persistent s; that's an optimization nostrademons suggested, but you could instead use a local s at the cost of an extra array-indexing operation:

    int s = t[i] + m;

Depending on context this might not actually cost any extra time.

Once you've computed your smoothed mouse velocity in c or d, you compare it against some kind of predetermined threshold, or maybe apply a smoothstep to it to get the mouse pointer size.

Roughly I think WanderPanda's approach is about 12 RISCish CPU instructions, and nostrademons's approach is about 18 but works a lot better. Either way you're probably looking at about 4-8 clock cycles on one core per mouse movement, considerably less than actually drawing the mouse pointer (if you're doing it on the CPU, anyway).

Does that help?