Neither CRITICAL_SECTION nor SRWLock enters the kernel when uncontended. (SRWLock is based on keyed events; CRITICAL_SECTION nowadays creates a kernel object on demand but falls back to a keyed event on failure.)
What _is_ the big difference between CRITICAL_SECTION and a futex, really? I always assumed that futexes were meant to be pretty similar to CRITICAL_SECTION (mostly-userspace locks that didn't have fatal spinning issues on contention).
A CRITICAL_SECTION is a mutex; a futex is a general synchronization primitive. (A CRITICAL_SECTION might be implemented on top of a more general primitive, of course; I'm not a Windows expert.)
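To make the mutex-versus-primitive distinction concrete, here is a rough sketch of a mutex layered on the Linux futex syscall. It is Linux-only, and the names (`futex_mutex`, `futex_wait`, `futex_wake_one`) are made up for illustration; it follows the common three-state design (unlocked / locked / locked with waiters) and is not how any particular libc or Windows lock is actually implemented.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical type. state: 0 = unlocked, 1 = locked,
 * 2 = locked and there may be sleeping waiters. */
typedef struct { _Atomic uint32_t state; } futex_mutex;

/* Sleep only if *addr still equals `expected` (checked by the kernel). */
static long futex_wait(_Atomic uint32_t *addr, uint32_t expected) {
    return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

/* Wake at most one thread sleeping on addr. */
static long futex_wake_one(_Atomic uint32_t *addr) {
    return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}

/* Returns 1 if the lock was taken, 0 otherwise. Never enters the kernel. */
static int futex_mutex_trylock(futex_mutex *m) {
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(&m->state, &expected, 1);
}

static void futex_mutex_lock(futex_mutex *m) {
    uint32_t c = 0;
    /* Fast path: uncontended 0 -> 1, no syscall. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the lock contended, then sleep until it is free. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex_wait(&m->state, 2);           /* returns at once if state != 2 */
        c = atomic_exchange(&m->state, 2);  /* retry, staying "contended" */
    }
}

static void futex_mutex_unlock(futex_mutex *m) {
    uint32_t prev = atomic_exchange(&m->state, 0);
    assert(prev != 0);                      /* unlocking an unlocked mutex */
    if (prev == 2)
        futex_wake_one(&m->state);          /* syscall only if someone may sleep */
}
```

The mutex policy lives entirely in the 32-bit word and the userspace code; the kernel only provides "sleep if the word still has this value" and "wake someone sleeping on this word". The same two operations could just as well back a semaphore or a condition variable, which is the sense in which the futex is the more general primitive.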
IIRC, CRITICAL_SECTION was built on top of Windows manual/auto-reset events, which are a different primitive: useful for more than just mutexes, but without the userspace coordination aspect (the 32-bit value) of futexes.
Well, technically both WaitOnAddress and SRWLOCK use the same "wait/wake by thread ID" primitive. WaitOnAddress uses a hash table to store the thread ID to wake for an address, whereas SRWLOCK can just store that in the SRWLOCK itself (well, in an object on the waiter's stack, pointed to by the SRWLOCK).
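A toy sketch of the two bookkeeping schemes described above, in case it helps. Everything here is hypothetical (the `waiter` struct, the bucket count, the function names), there is no locking and no actual blocking, and wake order is LIFO just to keep it short; it only shows where the "which thread do I wake" record lives in each case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUCKETS 64

/* One record per sleeping thread; intended to live on that thread's stack. */
typedef struct waiter {
    const void *addr;       /* address (or lock) being waited on */
    unsigned thread_id;     /* who to wake; 0 is reserved for "nobody" */
    struct waiter *next;
} waiter;

/* Scheme 1 (WaitOnAddress-like): a global table keyed by address. */
static waiter *table[BUCKETS];

static size_t bucket_of(const void *addr) {
    return (size_t)(((uintptr_t)addr >> 4) % BUCKETS);
}

static void enqueue_waiter(waiter *w, const void *addr, unsigned tid) {
    assert(tid != 0);
    size_t b = bucket_of(addr);
    w->addr = addr;
    w->thread_id = tid;
    w->next = table[b];
    table[b] = w;
}

/* Unlink one waiter for addr; returns its thread ID, or 0 if none. */
static unsigned dequeue_waiter(const void *addr) {
    for (waiter **pp = &table[bucket_of(addr)]; *pp; pp = &(*pp)->next) {
        if ((*pp)->addr == addr) {          /* buckets are shared, so compare */
            waiter *w = *pp;
            *pp = w->next;
            return w->thread_id;
        }
    }
    return 0;
}

/* Scheme 2 (SRWLOCK-like): the lock itself points at the waiter list. */
typedef struct { waiter *waiters; } intrusive_lock;

static void lock_enqueue(intrusive_lock *l, waiter *w, unsigned tid) {
    assert(tid != 0);
    w->addr = l;
    w->thread_id = tid;
    w->next = l->waiters;
    l->waiters = w;
}

static unsigned lock_dequeue(intrusive_lock *l) {
    waiter *w = l->waiters;
    if (!w)
        return 0;
    l->waiters = w->next;
    return w->thread_id;
}
```

The table version works for any address but pays for a hash lookup and has to cope with unrelated addresses sharing a bucket; the intrusive version needs no lookup at all, at the cost of reserving pointer-sized space in the lock.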