There are so many complaints here about how simplistic this is and how it is nowhere near what a "good" malloc does. That's kind of the point. Let's implement the most brain-dead version of this code. This is code you can quickly understand and reason about. It gives you a baseline to understand all the improvements that are included in a "real" malloc.
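For concreteness, here's a sketch of roughly what that brain-dead baseline looks like (illustrative, not the article's code): a first-fit free list grown with sbrk, single-threaded, no splitting, no coalescing, no real alignment guarantees.

```c
/* Deliberately naive first-fit allocator: a sketch of the "brain-dead
 * baseline" idea, not the article's code. Single-threaded, never returns
 * memory to the OS, no splitting or coalescing, no alignment guarantees
 * beyond whatever sbrk happens to give. */
#include <stddef.h>
#include <unistd.h>

typedef struct header {
    size_t size;          /* payload size in bytes */
    struct header *next;  /* next block in the list (newest first) */
    int free;
} header_t;

static header_t *head = NULL;

void *naive_malloc(size_t size) {
    /* First fit: walk the list for any free block that's big enough. */
    for (header_t *h = head; h != NULL; h = h->next)
        if (h->free && h->size >= size) { h->free = 0; return h + 1; }

    /* Nothing fits: grow the heap and prepend a new block. */
    header_t *h = sbrk(sizeof(header_t) + size);
    if (h == (void *)-1) return NULL;
    h->size = size; h->free = 0; h->next = head;
    head = h;
    return h + 1;  /* payload starts right after the header */
}

void naive_free(void *ptr) {
    if (ptr) ((header_t *)ptr - 1)->free = 1;  /* no coalescing */
}
```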
It would have been nice to include a list of the types of changes that a modern malloc would make to this basic structure to make it faster or fix bugs.
Maybe we could standardise on a set of adjectives to cover the spectrum from "this is a handwave that needs filling in" and "this is cheap and cheerful code that does the basic thing" all the way to "this has been in production for a decade" and "this code not only handles 99.7% of the corner cases, it even handles several dozen corner cases that only ever occurred on architectures which haven't existed since the 20th century"?
As someone who really enjoys reinventing the wheel for fun, the final words are something of a mantra for me:
> Let me stress here that while this collector is simple, it isn’t a toy.
> There are a ton of optimizations you can build on top of this—in GCs and programming languages, optimization is 90% of the effort—but the core code here is a legitimate real GC.
> It’s very similar to the collectors that were in Ruby and Lua until recently.
> You can ship production code that uses something exactly like this.
One of my OS class assignments was writing an allocator. It was scored by running some artificial workloads and measuring the memory overhead.
The results from past years were public and included test case names like "PowersOfTwo" or "Primes". So instead of a generic allocator, some of us optimized for the test suite :)
I managed to get one of the all-time high scores by implementing a variable-length integer encoding scheme that could fit powers of two, primes, and other "interesting" numbers up to some size into one or two bytes.
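A sketch of the general idea (the tag layout and prime table here are illustrative, not the exact scheme): sizes with a recognizable shape compress to a single tagged byte, and everything else falls back to a raw multi-byte encoding.

```c
/* Illustrative sketch, not the exact scheme: sizes with a recognizable
 * shape compress to a single tagged byte; everything else falls back to a
 * 5-byte raw encoding. */
#include <stdint.h>
#include <stddef.h>

static const uint32_t small_primes[16] = {
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53
};

/* Encodes n into out; returns the number of bytes written (1 or 5). */
size_t encode_size(uint32_t n, uint8_t out[5]) {
    if (n != 0 && (n & (n - 1)) == 0) {        /* power of two */
        unsigned exp = 0;
        while ((UINT32_C(1) << exp) != n) exp++;
        out[0] = (uint8_t)(0x00 | exp);        /* tag 0x00-0x1F: 2^exp */
        return 1;
    }
    for (unsigned i = 0; i < 16; i++) {        /* small prime table */
        if (small_primes[i] == n) {
            out[0] = (uint8_t)(0x40 | i);      /* tag 0x40-0x4F: table hit */
            return 1;
        }
    }
    out[0] = 0xFF;                             /* escape: raw little-endian */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(n >> (8 * i));
    return 5;
}
```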
I recently had to find a malloc simple enough to port to another language, and this one was really well written, with good features and research papers behind it: https://github.com/mattconte/tlsf
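If memory serves, using it looks roughly like this; double-check the exact signatures against tlsf.h in the repo before trusting my recollection.

```c
/* Rough usage sketch from memory of the repo's tlsf.h; verify the exact
 * signatures against the header. The pool is carved out of one big host
 * allocation. */
#include <stdlib.h>
#include "tlsf.h"

int main(void) {
    size_t pool_bytes = 1 << 20;               /* 1 MiB backing pool */
    void *pool = malloc(pool_bytes);
    tlsf_t tlsf = tlsf_create_with_pool(pool, pool_bytes);

    void *p = tlsf_malloc(tlsf, 128);          /* O(1) allocate */
    tlsf_free(tlsf, p);                        /* O(1) free */

    tlsf_destroy(tlsf);
    free(pool);
    return 0;
}
```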
Unfortunately, allocation is far more complicated than that. Even more unfortunately, the ecosystem around such a critical task is woefully incomplete.
One of the biggest dangers with the linked article's approach is partial interception: some allocations go through your replacement allocator while others still go through the system one, so pointers end up freed by a library that never allocated them.
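To make that concrete, here's a deliberately broken preload-style shim (the bump allocator is a stand-in): it intercepts malloc and free but forgets everything else, so the two allocators end up trading pointers.

```c
/* Sketch of the partial-interception trap: a preload-style allocator that
 * overrides malloc and free but nothing else. Build as a shared object,
 * e.g. cc -shared -fPIC shim.c -o shim.so, and load via LD_PRELOAD. */
#include <stddef.h>

static char arena[1 << 20];
static size_t used;

void *malloc(size_t n) {
    /* Bump allocation from a private arena, 16-byte aligned. */
    size_t off = (used + 15) & ~(size_t)15;
    if (off + n > sizeof arena) return NULL;
    used = off + n;
    return arena + off;
}

void free(void *p) { (void)p; /* never reclaims */ }

/* The bug: realloc, calloc, strdup, getline, posix_memalign and C++
 * operator new still resolve to libc. realloc(p, n) on a pointer from the
 * arena above reads libc heap metadata that doesn't exist, which is
 * undefined behavior, typically a crash far from the cause. */
```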
Implementing a malloc was one of the assignments of CS110 when I was an undergrad at Stanford ages ago. Grades were based on correctness, memory efficiency and maybe speed. IIRC mine used a variant of the dlmalloc strategy, and got perfect or near perfect marks. Definitely a cool and challenging assignment.
Yes, by all means, let's implement malloc(3) in a way it should never have been implemented after the VAX-11 architecture, and using much less readable code and worse examples than in the Old Testament?
Hint: storing the free list in the free space itself means that to free even more memory, you first have to page in otherwise unused VM pages.
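In other words, the free-list links are intrusive: they live inside the freed blocks, so any traversal dereferences the dead memory itself. A sketch of the pattern being warned about:

```c
/* Each free block stores the list link in its own (otherwise dead) bytes,
 * so a first-fit search faults every free block's page back in just to
 * read a header. */
#include <stddef.h>

struct free_block {
    size_t size;
    struct free_block *next;   /* stored inside the free block itself */
};

struct free_block *find_fit(struct free_block *head, size_t size) {
    for (struct free_block *b = head; b; b = b->next)  /* touches each page */
        if (b->size >= size)
            return b;
    return NULL;
}
```

Serious allocators mitigate this with size-segregated bins or out-of-band metadata, so a search rarely has to walk cold pages.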
Nice to see this here. This article really helped me learn how memory allocation works. I managed to implement my own thanks to it and a few other resources.
I used a doubly linked list of memory blocks. They are split into smaller blocks on allocation and merged with surrounding free blocks on deallocation.
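A minimal sketch of the split/coalesce idea (simplified, with the layout and field names made up for illustration):

```c
/* Illustrative split/coalesce over an address-ordered doubly linked list.
 * Blocks are laid out contiguously, so a block's list neighbor is also its
 * physical neighbor. */
#include <stddef.h>
#include <stdbool.h>

typedef struct block {
    size_t size;                 /* payload bytes */
    bool free;
    struct block *prev, *next;   /* address-ordered neighbors */
} block_t;

#define MIN_SPLIT (sizeof(block_t) + 16)

/* Split: carve the tail of a large free block into a new free block. */
static void split(block_t *b, size_t want) {
    if (b->size < want + MIN_SPLIT) return;  /* remainder too small */
    block_t *rest = (block_t *)((char *)(b + 1) + want);
    rest->size = b->size - want - sizeof(block_t);
    rest->free = true;
    rest->prev = b; rest->next = b->next;
    if (b->next) b->next->prev = rest;
    b->next = rest;
    b->size = want;
}

/* Coalesce: merge a freed block (b->free already set) with any free
 * physical neighbors. */
static void coalesce(block_t *b) {
    if (b->next && b->next->free) {          /* absorb right neighbor */
        b->size += sizeof(block_t) + b->next->size;
        b->next = b->next->next;
        if (b->next) b->next->prev = b;
    }
    if (b->prev && b->prev->free)
        coalesce(b->prev);                   /* let left neighbor absorb us */
}
```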
Super cool. I'm not a C programmer by any stretch (I used to code C++ a lot, a long time ago), but I'm always in awe of the low-level shenanigans going on in the bowels of C.
Calloc sometimes doesn't really do this. I believe on some systems (Linux, for example) it just requests fresh pages from the kernel, which are backed by a shared zero page and only actually zeroed, via a page fault, when you first touch them. This winds up being much better for cache behavior, since pages get zeroed right before you use them instead of all up front.
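A quick Linux-only sketch to watch this happen, reading resident-set size in pages from /proc/self/statm:

```c
/* Linux-only demo of lazy zeroing: a large calloc costs almost no resident
 * memory until the pages are actually touched. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static long rss_pages(void) {
    long size = 0, rss = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f) { fscanf(f, "%ld %ld", &size, &rss); fclose(f); }
    return rss;
}

int main(void) {
    size_t n = 256 << 20;  /* 256 MiB */
    printf("before calloc: %ld pages\n", rss_pages());
    char *p = calloc(n, 1);   /* mapped, but not yet faulted in */
    if (!p) return 1;
    printf("after calloc:  %ld pages\n", rss_pages());
    memset(p, 1, n);          /* first touch forces real zeroed pages */
    printf("after touch:   %ld pages\n", rss_pages());
    free(p);
    return 0;
}
```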