For both alternatives we begin by computing how far the mouse has gone:
int m = abs(dx) + abs(dy); // Manhattan distance
For the single-pole RC exponential filter, as WanderPanda suggested:
c -= c >> 5; // exponential decay without a multiply (not actually faster on most modern CPUs)
c += m;
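Wrapped into a function, the exponential filter looks something like the sketch below. The function name `ema_update` and the use of `unsigned` for the state are my own assumptions for illustration, not part of WanderPanda's suggestion.

```c
#include <stdlib.h> /* abs */

/* Hypothetical helper: one step of the single-pole exponential filter.
   c is the persistent filter state; dx, dy are this event's mouse deltas. */
static unsigned ema_update(unsigned c, int dx, int dy)
{
    unsigned m = (unsigned)(abs(dx) + abs(dy)); /* Manhattan distance */
    c -= c >> 5; /* decay by roughly 1/32 without a multiply (truncating shift) */
    c += m;      /* add this event's movement */
    return c;
}
```

With a constant per-event distance m, c settles near 32m (exactly 128 for m = 4 when starting from 0). Because `c >> 5` truncates, c never decays below 31 once the mouse stops, so any threshold should sit comfortably above that.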
For the box filter with the running-sum table, as nostrademons suggested:
s += m; // update running sum
size_t j = (i + 1) % n; // calculate index in prefix sum table to overwrite
int d = s - t[j]; // calculate sum of last n mouse movement Manhattan distances
t[j] = s;
i = j;
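Collecting those lines into one function makes the persistent state explicit. This is a minimal sketch under a few assumptions of mine: a hypothetical `struct box_ctx` holding i, s, and t, a window length `N` of 16, and unsigned arithmetic, so that the ever-growing running sum wraps modulo 2^32 instead of overflowing (which is undefined behavior for signed `int`). The difference `s - t[j]` still comes out right after wrapping as long as the true window sum fits in an `unsigned`.

```c
#include <stdlib.h> /* abs, size_t */

#define N 16 /* window length (assumed); a compile-time power of 2 */

/* Hypothetical context struct for the state that persists between events.
   Zero-initialize it before the first event. */
struct box_ctx {
    unsigned t[N]; /* ring buffer of running-sum snapshots */
    unsigned s;    /* running sum of all Manhattan distances so far */
    size_t i;      /* index of the most recently written snapshot */
};

/* Returns the sum of the Manhattan distances of the last N events. */
static unsigned box_update(struct box_ctx *ctx, int dx, int dy)
{
    unsigned m = (unsigned)(abs(dx) + abs(dy)); /* Manhattan distance */
    ctx->s += m;                     /* update running sum (wraps mod 2^32) */
    size_t j = (ctx->i + 1) % N;     /* oldest snapshot, about to be overwritten */
    unsigned d = ctx->s - ctx->t[j]; /* sum over the last N events */
    ctx->t[j] = ctx->s;
    ctx->i = j;
    return d;
}
```

Usage: declare `struct box_ctx ctx = {0};` once, then call `box_update(&ctx, dx, dy)` on every mouse event and compare the result against your threshold.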
Here c, i, s, and t are all presumed to persist from one event to the next, so they might be fields of some context struct, or, in old-fashioned C, static variables. If n is a compile-time constant, this will be more efficient, especially if it's a power of 2, because the unsigned `%` then compiles to a bitwise AND. You don't really need a separate persistent s; that's an optimization nostrademons suggested. Instead, you could use a local s at the cost of an extra array-indexing operation:
int s = t[i] + m;
Depending on context this might not actually cost any extra time.
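Here is a sketch of that variant, where the running sum is recovered from the most recent snapshot `t[i]` instead of being stored separately. As before, `struct box_ctx_local`, the function name, and `N = 16` are hypothetical; the explicit mask `& (N - 1)` shows what `% N` reduces to when N is a power of 2.

```c
#include <stdlib.h> /* abs, size_t */

#define N 16 /* window length (assumed) */
_Static_assert((N & (N - 1)) == 0, "N must be a power of 2 for the mask below");

/* Hypothetical context: no separate running sum, only the snapshot table. */
struct box_ctx_local {
    unsigned t[N]; /* ring buffer of running-sum snapshots */
    size_t i;      /* index of the most recently written snapshot */
};

/* Returns the sum of the Manhattan distances of the last N events. */
static unsigned box_update_local(struct box_ctx_local *ctx, int dx, int dy)
{
    unsigned m = (unsigned)(abs(dx) + abs(dy));
    unsigned s = ctx->t[ctx->i] + m;   /* local s: previous running sum + this event */
    size_t j = (ctx->i + 1) & (N - 1); /* same as % N for a power-of-2 N */
    unsigned d = s - ctx->t[j];
    ctx->t[j] = s;
    ctx->i = j;
    return d;
}
```

The struct shrinks by one field, and the extra load of `t[i]` is often hidden by the other memory traffic, which is consistent with the point that it may not cost any extra time.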
Once you've computed your smoothed mouse velocity in c or d, you compare it against a predetermined threshold, or perhaps apply a smoothstep to it to get the mouse pointer size.
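Both options are only a few lines. In this sketch the threshold, the speed range, and the pointer sizes are made-up values for illustration, and `smoothstep` is written out by hand using the usual clamped cubic 3t² − 2t³ definition.

```c
/* Threshold test: nonzero if the smoothed speed exceeds the (assumed) threshold. */
static int is_fast(unsigned speed, unsigned threshold)
{
    return speed > threshold;
}

/* Classic smoothstep: 0 below e0, 1 above e1, 3t^2 - 2t^3 in between. */
static float smoothstep(float e0, float e1, float x)
{
    float t = (x - e0) / (e1 - e0);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return t * t * (3.0f - 2.0f * t);
}

/* Hypothetical mapping from smoothed speed (c or d) to pointer size in pixels. */
static float pointer_size(unsigned speed)
{
    const float slow = 200.0f, fast = 800.0f; /* assumed speed range */
    const float base = 32.0f, big = 96.0f;    /* assumed pointer sizes */
    return base + (big - base) * smoothstep(slow, fast, (float)speed);
}
```

The smoothstep version grows the pointer gradually between the two speeds instead of snapping at a single threshold, which avoids flicker when the speed hovers around the cutoff.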
I think WanderPanda's approach is roughly 12 RISC-ish CPU instructions, and nostrademons's is about 18 but works a lot better. Either way, you're probably looking at about 4-8 clock cycles on one core per mouse movement, considerably less than actually drawing the mouse pointer (if you're doing that on the CPU, anyway).