I've written some memory allocators that use lock-free allocation maps. In this case, a 4 KiB granularity allocator that used compressed free bitmaps, for use in both userspace and kernelspace.
The entire thing was lock-free, so allocations would never stall on a lock, even when the allocator had to decompress a bitmap. (Compression worked by finding free memory ranges and replacing them with a range expression, which saved a few MB of metadata per gigabyte of managed memory; largely-free memory blocks were also ordered to the start of the bitmap list to reduce seek time.) Despite the complexity in the allocator, it could serve any number of threads without ever locking up. The worst case was a long latency when a thread had to scan the entire free list for a free block (it switched to a different allocation behaviour once a thread had more than a dozen failed allocation attempts).
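Roughly, the compression idea can be sketched like this (my own illustration with made-up names, not the actual allocator's types): a region is stored either as a raw bitmap, or as a range expression when the whole span is free.

```rust
// Hypothetical sketch of a "compressed bitmap" region: either one bit per
// 4 KiB page, or a range expression saying the whole run of pages is free.
enum Region {
    /// One bit per page; 1 = free. 8 × u64 = 512 pages (2 MiB) per map.
    Bitmap(Box<[u64; 8]>),
    /// The entire run of `pages` pages starting at `first_page` is free,
    /// so no per-page bits need to be stored at all.
    FreeRange { first_page: usize, pages: usize },
}

impl Region {
    fn free_pages(&self) -> usize {
        match self {
            Region::Bitmap(words) => words.iter().map(|w| w.count_ones() as usize).sum(),
            Region::FreeRange { pages, .. } => *pages,
        }
    }
}
```

The saving comes from the `FreeRange` arm: a mostly-free gigabyte collapses into a handful of range records instead of megabytes of bitmap words.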
A bunch of ring buffers I've worked with are atomic (i.e., lock-free); that's probably the most common application of lock-free techniques.
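For reference, the classic single-producer/single-consumer variant needs nothing beyond two atomic indices (a minimal sketch, not taken from any of the buffers mentioned above):

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Minimal lock-free SPSC ring buffer: the producer owns `tail`, the
// consumer owns `head`; each only ever stores to its own index.
struct SpscRing<T, const N: usize> {
    buf: [UnsafeCell<Option<T>>; N],
    head: AtomicUsize, // next slot to pop
    tail: AtomicUsize, // next slot to push
}

// Safe only under the SPSC discipline: one pusher, one popper.
unsafe impl<T: Send, const N: usize> Sync for SpscRing<T, N> {}

impl<T, const N: usize> SpscRing<T, N> {
    fn new() -> Self {
        Self {
            buf: std::array::from_fn(|_| UnsafeCell::new(None)),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side: returns false if the ring is full.
    fn push(&self, v: T) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == N {
            return false; // full
        }
        unsafe { *self.buf[tail % N].get() = Some(v) };
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        true
    }

    /// Consumer side: returns None if the ring is empty.
    fn pop(&self) -> Option<T> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None; // empty
        }
        let v = unsafe { (*self.buf[head % N].get()).take() };
        self.head.store(head.wrapping_add(1), Ordering::Release);
        v
    }
}
```

The release/acquire pairing on the indices is what makes the slot contents visible to the other side without any lock.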
Interesting! Is this work public somewhere perhaps?
I'm currently working on a similar technique (though with smaller blocks, currently 64–1024 bytes, i.e. five consecutive power-of-two sizes) for a lock-free allocator intended for real-time embedded systems. I use a 32-bit word for each 1024-byte "page" and keep a tree structure inside the word (it takes 31 bits to cover five power-of-two levels: 1 + 2 + 4 + 8 + 16).
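Roughly, the in-word tree can be sketched like this (a simplified illustration, not my real code: it checks the node and its ancestors but skips the descendant check a full buddy scheme needs):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Bit index within the 31-bit in-word tree for node `idx` at `level`
// (level 0 = one 1024 B node, ..., level 4 = sixteen 64 B nodes).
fn node_bit(level: u32, idx: u32) -> u32 {
    (1 << level) - 1 + idx
}

// Try to claim the block at (level, idx) with a single CAS on the word.
// Simplification: only the node and its ancestors are checked for
// conflicts; a real buddy allocator must also check descendants.
fn try_alloc(word: &AtomicU32, level: u32, idx: u32) -> bool {
    loop {
        let cur = word.load(Ordering::Relaxed);
        // Walk up the tree: a set bit on the node or any ancestor
        // means that span is already claimed.
        let (mut l, mut i) = (level, idx);
        loop {
            if cur & (1 << node_bit(l, i)) != 0 {
                return false;
            }
            if l == 0 {
                break;
            }
            l -= 1;
            i /= 2; // parent index
        }
        let new = cur | (1 << node_bit(level, idx));
        if word
            .compare_exchange(cur, new, Ordering::AcqRel, Ordering::Relaxed)
            .is_ok()
        {
            return true;
        }
        // CAS lost a race: the word changed under us, re-read and retry.
    }
}
```

The nice property of packing the whole tree into one word is that any allocation or free is a single CAS, so there is never a partially-updated tree for other threads to observe.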
My code (which only allocates and never frees at the moment) is here:
Only a much older version, which sadly features neither the compressed bitmaps nor the reordering of the bitmaps, and has one or two rather critical bugs. It's part of my toy kernel project. It's largely written in Rust with a few unsafe parts, since it's intended to bootstrap itself given only a pointer to and size of a heap memory block, using only 8 kilobytes of stack, and to support adding or removing memory from the list.
There is only one section that behaves like a lock: the free-range decompression. It has to be somewhat exclusive, since it must be able to decompress a range without allocating memory; instead it uses the decompressed block itself to allocate the memory needed to accommodate the overhang of the range (i.e., when a range is larger than a single block). This simply means that, for the duration, available memory shrinks by the amount the range represents.
IIRC it's about 128 bytes of metadata; the rest is the bitmap bits that track page allocation. Each block can in theory handle about 124 MiB, though I do have some logic so that sections can be split into multiple blocks to reduce contention, as well as to handle large pages (which largely reside in a single block, since even 16 MB pages let one block manage terabytes of memory).
Allocation is simply a CAS on the byte containing the free bit for the specific page (address = base address + 4 KiB × bit index), followed by an atomic update of the free-page counter. Deallocation is the reverse. A failed CAS means the thread continues with the next free bit, starting at least at the next byte (i.e., bit ((bit index >> 3) + 1) << 3). The happy allocation path is fewer than a hundred instructions; the unhappy path is about a thousand. CAS latency can be quite unpredictable though, so it's probably not suitable for realtime systems.
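The happy path can be sketched roughly like this (illustrative names and layout, not the actual kernel code; here a set bit means "free", and a lost CAS skips to the next byte as described):

```rust
use std::sync::atomic::{AtomicU8, AtomicUsize, Ordering};

// Sketch of the allocation fast path: scan the free bitmap byte by byte,
// CAS the lowest set (= free) bit to 0, then update the page counter.
fn alloc_page(bitmap: &[AtomicU8], used: &AtomicUsize, base: usize) -> Option<usize> {
    for (byte_idx, byte) in bitmap.iter().enumerate() {
        let cur = byte.load(Ordering::Relaxed);
        if cur == 0 {
            continue; // no free pages in this byte
        }
        let bit = cur.trailing_zeros(); // lowest free bit
        let new = cur & !(1u8 << bit);
        if byte
            .compare_exchange(cur, new, Ordering::AcqRel, Ordering::Relaxed)
            .is_ok()
        {
            used.fetch_add(1, Ordering::Relaxed);
            // address = base + 4 KiB * bit index
            return Some(base + 4096 * (byte_idx * 8 + bit as usize));
        }
        // CAS lost a race on this byte: move on to the next byte rather
        // than retrying, which spreads contending threads apart.
    }
    None // bitmap exhausted (the unhappy path would widen the search here)
}
```

Deallocation is the mirror image: a CAS that sets the bit back to 1 and the counter update in the opposite direction.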