Hacker News

I know custom allocators are important for tiny platforms and so on, but what's a reason C/C++ can't allocate memory in a satisfactory way by default on those platforms? And how do other languages get away with it?


C++ doesn't come out of the box with an allocator at all. Implementations have to provide it. But this article isn't talking about the difference between, say, jemalloc and mimalloc. It's talking about cases where you want to minimize calls to global operator new, or cases where you want to make a lot of allocations but you don't want to have to delete anything. The latter is often a massive advantage in speed. For example if you need to use a std::set<int> within a scope, and it doesn't escape that scope, it will be much faster to provide an arena allocator that allocates the nodes used by std::set, both because it will minimize the necessary calls to global new -- it may even eliminate them if you can safely use enough stack space -- and especially because there isn't a corresponding deallocation for every allocation. You simply discard the entire arena at the end of the scope.

Fans of Java may rightly point out that garbage collection also has this property, but GC brings other costs. Nothing in Java even remotely approximates the performance of an STL container backed by an arena allocator.


"Satisfactory" means different things for different people in different contexts, so a general-purpose malloc might not do as well as you'd like.

The simplest and fastest allocator you could possibly write is something where malloc just increments a pointer into a chunk of memory, and free is a no-op. This is obviously terrible as your system-wide allocator, but you can allocate an arena from the system malloc, then use the simple allocator to chunk out that arena. This is useful for example if you know that a web request has a reasonable upper bound on memory usage. You just allocate that much memory from the system at the start of the request, use the increment allocator for all request-bound objects, then release the arena at the end of the request. One single call to the system's somewhat heavy malloc/free, many calls to a trivially simple malloc and a no-op free.

Other languages don't so much "get away with it" as they're simply not used for workloads where it matters.


There's no single default allocator in any C/C++ implementation that satisfies every application's needs. One example off the top of my head: the default glibc malloc() stores an allocation's size near or just below the allocation data itself. In contrast, an alternative like jemalloc tracks the size in a separate control block. When freeing with the former, the memory adjacent to the allocation has to be touched; with the latter, the control block is touched instead. The two approaches also lead to more or less efficient packing of aligned data. All of this yields better or worse performance depending on the application.


It is very hard to beat a good off the shelf allocator in the general case, but it is easy in specialized cases, simply because you can make tradeoffs and rely on knowledge the allocator doesn't have.


> but what's a reason C/C++ can't allocate memory in a satisfactory way by default on those platforms?

Custom allocators are important when memory performance is relevant. Allocation/deallocation is really slow compared to using memory which is already allocated to a program, so this can often be one of the biggest performance sinks in a program, especially one which churns through a lot of data (e.g. video games, simulation, ML training, etc.)

Custom allocators can also help with memory locality: you can pack objects that are used together close to each other in memory, which minimizes CPU cache misses, themselves often one of the most expensive parts of execution. It may be difficult or impossible for a compiler to design a more optimal memory layout than a programmer with knowledge of how the program will use the data.

> And how do other languages get away with it?

Custom allocators are most relevant in performance-critical contexts, so you're probably already using C++ because a higher-level language was already too slow for your use-case. In other words, other languages pay the same cost for memory management, but if you're programming in Python you're probably working in a domain where the performance difference doesn't matter.

Even programs written in a "fast" GC'd language like Go will have a performance ceiling largely dictated by memory churn.


It's not just tiny platforms. You often want allocators with some predictable characteristic, e.g. constant-time allocation for a given request size. Or you might want an allocator that keeps certain items contiguous so they'll tend to already be in cache. Custom allocators tend to be used when you have a particular requirement that a general-purpose allocator isn't optimal for.

For tiny platforms often it's bare-metal so there's only one running "process" and so something like "malloc" that needs to be aware of memory usage across the system isn't relevant. Moreover, embedded systems tend to care about deterministic performance so an allocator specific to the application can be a better match.


> And how do other languages get away with it?

General purpose allocators are (basically) never the best for speed, fragmentation, and so on. They can at most be good enough. Basic knowledge about how your memory is going to be used, informing a few simple custom allocators, is likely to give you several times the performance you'd see with a general-purpose allocator. In the best case you'll literally reset an allocation pointer once per loop and just overwrite memory that you know isn't used anymore, making allocation a write into an already existing buffer and "free" the resetting of a pointer. This has nothing to do with platforms; it's just basic removal of dumb code that shouldn't be running. There's zero reason an actual `free`/`delete` should be used in those cases, and using one is likely to slow down that loop considerably.

> And how do other languages get away with it?

Much like general purpose allocators are never the best, there is zero evidence to suggest that garbage collection is ever the optimal choice; it can only ever be good enough. There's also a lot of lore around garbage collectors being magic and doing lots of great things for you to ensure less fragmentation and nice cache locality but people are usually just amazed at how it's "not that bad" in the end.

There's no silver bullet: Garbage collectors and general purpose allocators aren't magic pieces of code written by developers who can conjure up the best code for every use case. Like most general code they're "not bad" at everything but not actually very good at anything. Running less code for your allocation and eliminating things based on your knowledge of how things are going to be used is always going to be better.


>> I know custom allocators are important for tiny platforms and so on...

For tiny platforms like the microcontrollers used all over your car, standard practice is to never use dynamic memory allocation. When you have 1K or even 64K of RAM and need to run for a long time, the only way to be sure you never run out of memory is to not use malloc at all. Fragmentation is a thing. This may mean managing fixed collections of objects much like the allocators in the article, but the linker figures out where to put them at compile time.


>And how do other languages get away with it?

They don't get away with it; they suffer the consequences in performance. Of course, depending on your metric, they may be "faster" in some situations common to that language, but C++ often lets you avoid temporary heap allocations entirely by using the stack, which further increases performance.


> And how do other languages get away with it?

They either bring similar mechanisms, ship a selection of their own allocators that covers enough of the design space, or just don't allow influencing allocation much and get away with it because many applications/users don't care.


You can allocate pools of memory so that you have a fixed memory footprint. That way you have (almost) no risk of OOMing later on in the process.

This can also give you better control over cache locality of memory.


It's not just useful for tiny platforms. Consider the Erlang virtual machine, which has something like 11 custom allocators internally, for different performance characteristics.


Which other languages do you have in mind?



