For bonus points, can someone either write a stable partition that runs in linear time with no extra storage (or maybe just constant storage) or prove that it is impossible?
Move all non-matching elements to the end of the list in the order you encounter them, then read the matching elements forwards from the start of the list (as before) and the non-matching elements backwards from the end of the list?
You could even be nifty and return a reversed iterator yourself to hide the implementation detail from the caller.
Not sure I understand you algorithm - are you saying to append the items as you see them to the end of the list? Wouldn't that violate the no-additional memory requirement? That would also add a dependency on using iterators that support growing the underlying data structure without invalidating the existing iterators.
Swap with the element at the end of the list, not append, and move the swap point one position back each time. Otherwise why would I say you would need to read them in reverse order?
I think that's what he's trying to address when he refers to moving the swap point. This is a standard algorithm (e.g. http://www.cplusplus.com/reference/algorithm/partition/), but it's not stable, which means it doesn't meet your requirements.
Stable partition with constant storage is harder. I'm pretty sure it's not possible, but I'm not prepared to offer a proof of that. I believe it (at best) turns into NlgN moves (which is how stable_partition performs as well) no matter what exact approach you take.