Many high-performance locks do not, fundamentally, need to allocate memory. When they do, it is usually for one of two reasons:
1) optional features such as statistics tracking or diagnostics (e.g., the ability to traverse all held locks in the process)
2) a space optimization that keeps the lock object very small while uncontended, allocating on the heap the ancillary data structures needed to block at the OS level only once contention is detected.
You can easily avoid both: don't support (1) in your specific malloc lock, and don't apply optimization (2); just put those fields in the main structure.
In any case, no fancy lock is needed either: really they just want a vanilla "userspace acquire + limited userspace spin + fallback to OS blocking primitive" lock, which is pretty much strictly better than their existing "userspace acquire + spin forever" lock. I don't think this is hard to do at all unless the OS-level locking primitives are very strange on OSX.
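To make that concrete, here is a rough sketch of such a lock in C11 with pthreads. It is not anyone's actual malloc lock: the names (`spin_block_lock`, `sbl_lock`, `sbl_unlock`, `SPIN_LIMIT`) are made up for illustration and the spin count is arbitrary. It also assumes that a statically initialized `pthread_mutex_t` and `pthread_cond_t` do not themselves allocate on the target platform, which you would need to verify. All blocking state lives inline in the struct, so the lock code itself never calls malloc.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical spin budget; a real value would need tuning. */
#define SPIN_LIMIT 100

typedef struct {
    atomic_int state;     /* 0 = free, 1 = held, 2 = held, waiters possible */
    pthread_mutex_t mu;   /* protects the sleep/wake handshake only */
    pthread_cond_t cv;    /* blocked threads park here */
} spin_block_lock;

/* Everything is inline, so static initialization needs no heap. */
#define SPIN_BLOCK_LOCK_INIT \
    { 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

static void sbl_lock(spin_block_lock *l) {
    /* Userspace acquire plus a bounded spin. */
    for (int i = 0; i < SPIN_LIMIT; i++) {
        int expected = 0;
        if (atomic_compare_exchange_weak(&l->state, &expected, 1))
            return;
    }
    /* Slow path: mark the lock as contended and block in the OS.
     * Writing 2 while holding mu means an unlocker that observes 2
     * cannot signal before this thread is actually waiting. */
    pthread_mutex_lock(&l->mu);
    while (atomic_exchange(&l->state, 2) != 0)
        pthread_cond_wait(&l->cv, &l->mu);
    pthread_mutex_unlock(&l->mu);
}

static void sbl_unlock(spin_block_lock *l) {
    int prev = atomic_exchange(&l->state, 0);
    assert(prev != 0);    /* unlocking a free lock is a caller bug */
    if (prev == 2) {
        /* Someone may be parked: wake one waiter. */
        pthread_mutex_lock(&l->mu);
        pthread_cond_signal(&l->cv);
        pthread_mutex_unlock(&l->mu);
    }
}
```

The uncontended path is a single compare-and-swap on lock and a single exchange on unlock. The pthread primitives are only touched after the spin budget runs out, or when an unlocker sees that someone may be parked. A thread that acquires through the slow path leaves the state at 2, so its unlock may issue one unnecessary signal, which is harmless.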