That makes sense, thanks for clarifying. So if, say, there are 1000 client connection threads that each monitor some status and ping their clients every 5 seconds, then on wake-up only a small fraction of each stack might have to be paged in. And how much would depend on how deep the stack was when the thread was put to sleep?
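To make the question concrete, here is a rough sketch in C of how one could check this on Linux. It is not a real thread stack: it assumes an 8 MiB anonymous mapping is a fair stand-in for one, touches only the top few pages (stacks grow downward), and asks mincore(2) how many pages are resident. The function name and parameters are made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map region_bytes of anonymous memory (a stand-in
 * for a thread stack), write to the top touch_pages pages, and return how
 * many pages mincore(2) reports as resident. Returns -1 on any error.
 */
long resident_pages_after_touch(size_t region_bytes, size_t touch_pages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t psz = (size_t)page;
    size_t total = region_bytes / psz;   /* whole pages in the region */
    if (total == 0 || touch_pages > total)
        return -1;
    size_t len = total * psz;

    unsigned char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

#ifdef MADV_NOHUGEPAGE
    /* Best effort: keep transparent huge pages from inflating the count. */
    madvise(base, len, MADV_NOHUGEPAGE);
#endif

    /* Stacks grow downward, so dirty the highest pages first. */
    for (size_t i = 0; i < touch_pages; i++)
        base[len - 1 - i * psz] = 1;

    /* mincore fills one byte per page; the low bit means "resident". */
    unsigned char *vec = malloc(total);
    long resident = -1;
    if (vec != NULL && mincore(base, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < total; i++)
            resident += vec[i] & 1;
    }

    free(vec);
    munmap(base, len);
    return resident;
}
```

Calling resident_pages_after_touch(8u << 20, 4) should come back with a count close to the number of pages touched rather than the full size of the mapping, though the exact figure depends on the kernel and its huge page settings.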
That's not to say threads aren't expensive: they still require several heavyweight struct allocations on the kernel side, e.g. struct task_struct, which I counted to 1k before getting bored (and I wasn't even a quarter of the way through the fields).
Edit: Linux git HEAD with Debian unstable .config: