There should be a minor law in the spirit of Greenspun’s tenth rule: Every sufficiently performance-sensitive system eventually grows some version of a slab allocator[1].
There’s really no substitute for packing fixed-sized objects together in N big array chunks. If you can operate in a particular context (or entirely) with N=1, you can substitute handles for pointers. For any reasonable # of objects, those handles can be smaller than a pointer and provide even better compaction and thus cache effectiveness.
Everything I read these days seems to trend towards columnar stores...
I write a mixture of R, Python, C and C++, in that order of prevalence, and mostly for data intensive tasks. After nearly two decades of unlearning loops and implementing instead with FP idioms, vectorization, and matrix multiplies to take advantage of R's strengths, I often have trouble recreating R's performance when I first reimplement in C or C++.
I think it would be beneficial to start with functional programming in more CS programs, they are a much better fit for modern CPU architectures in many ways.
In my Perfect Programming Language That Doesn't Have To Deal With The Problems Of Actually Existing, serialization of data structures into memory is explicitly defined in the language, just as one would define a JSON or Protobuf serialization. One could define a "Point3{x,y,z}", and hypothetically define one array of them in the conventional manner, define another array as a columnar serialization, and potentially even define a compressed data structure right there in memory if that was advantageous.
It opens a wide array of exciting problems and possibilities, if you start thinking of that as something a language should let you twiddle with rather than hard coding it. It seems like a lot of high performance stuff is moving in a direction where this would be advantageous.
(For example, if you compress an array, you lose random access. The language would have to carefully build up capabilities from scratch so it can have a concept of "arrays" that can't be randomly accessed, can only be written in some very particular manner that may even require a "flush" concept, etc. My gut says it ought to be possible, since, after all, we can do it manually, but it would be a very careful operation to disentangle casual assumptions about what data structures can do based on the definition of what data they contain vs. how they are represented in memory, and what capabilities you get from that.)
The idea of "univalence", being pursued in Homotopy Type Theory (AKA HoTT), is that isomorphic types (i.e. we can convert back and forth without information loss) are equal, and hence one can be used in place of the other.
The dream scenario is to write all of our libraries and application code using types that are the most straightforward (e.g. counting in unary, mapping over linked-lists, looking up values from lists of key/value pairs, etc.), then cast these to more efficient types (e.g. machine words, arrays, tries, etc.) for the final program. It's still ongoing, and a naive approach would simply lead to a bunch of conversion functions being scattered around (making the program even slower), so a smart approach would be needed, e.g. where compilers can fuse away these conversions.
HoTT changes what equal means. In order for univalence to work you must be comfortable with arbitrary code running to perform the necessary substitutions. Isomorphism isn't for free, you can do some fancy cast-conversions (where cast is an identity function) that has been explored in generic zero-cost reuse [1] but this is not HoTT and uses essentially an UIP equality (which is inconsistent with HoTT).
Your usage of the word "cast" in the second paragraph is a misnomer, because no systems programmer is going to want to perform an expensive transporation in HoTT!
That's what I was getting at w.r.t. naive vs smart implementations. The point of univalence is that, from a logical/functional perspective, `slowFoo(slowX)` is indistinguishable from `fastToSlow(fastFoo(slowToFast(slowX)))`. Applying this to composite expressions will produce lots of calls like `slowToFast(fastToSlow(x))` which can be optimised down to `x`.
The problem, of course, is that we can't enforce this within the logic (since those expressions are indistinguishable), so we must rely on compiler optimisations (i.e. deforestation/fusion).
That link looks interesting, but appears to be more about re-using the same computational-content/runtime-representation for multiple compile-time/proof-context semantics (akin to McBride's "ornaments"), which seems more like a dual problem to that of swapping out slow representations for fast ones.
That sounds like, for performance reasons, you want control over the actual types used. But you could eliminate those changes from affecting the semantic representation of your program. Or at the very least with an easy check that the optimization doesn't affect your semantics.
I remember when that first came out. I have not checked in on it in a while. Do you have any links to communities/reddits/etc working with that? (/r/haskell was very excited for a while, but then the energy went somewhere else, and I don't know where.)
One of the things I only alluded to in my third paragraph, but very much had this sort of thing in mind, is that it's going to take a lot of type work and a lot of thinking in properties (like Traversable, Functor, etc. in Haskell) to make this work out well, because a lot of these rewrites to underlying representation will have manifestations at the property level (this array is now Traversable but no longer RandomAccessible, etc.).
Jonathan Blow's new language, Jai, has some control over layout allowing easy switching between array of structs and struct of arrays without having to change the accessing code.
That video briefly mentions the CppCon 2014 talk by Mike Afton, "Data-oriented Design and C++". The talk was great, pretty much lined up with how I think of modern systems (treat RAM like disk: seeks take a long time, but streams will likely be able to maximize CPU usage)
But the questions at the end, and the resistance to these basic ideas, roughly "what about platform portability, programmer productivity, etc."; these questions were a bit shocking to me. Is this not a C++ conference? This was six years ago, so perhaps the machine constraints were not as well known back then, but it really shows how strongly the community culture will influence a language and its capabilities.
I thought he pulled that back out of the language, though with the intent of re implementing it with as a library with the metaprogramming facilities of the language for anyone who wanted it?
I haven't seen the update, but to me that seems a bit silly. This is a pretty fundamental type manipulation, and ideally the compiler would have some understanding of it...
Could you explain? What I meant, I believe Jai is vaporware and gets too much attention. I mean Jai sound exciting until you find out that the download link is missing.
It's still under development, with no public release, but I think it's a bit unfair to call it vapourware, since he has produced plenty of material that demonstrates he has a working compiler and has clearly spent a great deal of time developing it, not just talking about it.
Whether it gets too much attention is debatable. He has some good ideas, imo, worth exploring, although I don't agree with all of his opinions. He has a prominent enough profile that it will likely inspire others to incorporate features from Jai if they are seen to be desirable, even if Jai is not ultimately released. (Personally, I think it will eventually.)
Well, what is your hunch when it will be released? My expectation is eithet never, or there will be a release and it will crumble quickly when people want to do things with it.
I was privy to a fairly high-wattage conversation between Walid Taha, Jeremy Siek, Todd Veldhuizen, et al, where they discussed this issue at length. They were concerned that languages almost universally equate data types with data structures. Their opinion (drunkenly expressed) was that the program semantics should be based on both. The wrinkle was was happened if you wanted to expose transformations of the underlying structure against the type when trying to say something reasonable about escape hatches.
If I had a PPLTDHTWTPOAE (?) it'd have this capability for sure.
Yes, as I was typing this up I began to realize that's where I was really going. Essentially turning an entire struct of, say, { int A; string B; Thingy C } into only a set of promises about being able to get an int, a string, a Thingy, etc., and promises around being able to set them (let's just do a struct for now, no methods or hiding), and then decoupling the type, being the set of promises being made, from the structure, being the concrete layout in memory and corresponding code capable of fulfilling those promises.
It's all easy in the simple struct case, gets more interesting when you start trying to manipulate them by putting them into arrays or associative maps, etc.
On the one hand, this is nearly revolutionary in some sense, but in another it's simply an incremental advance in a direction we're already going. "Object's are a poor man's closures, closures are a poor man's objects" being an old saying leaning in this direction, and it's hardly a new observation in Haskell that there's not a heck of a lot of practical difference between an associative array of X -> Y and a function (x -> y), at least when it comes to reading them.
In a language like Haskell, you'd parameterize every function by both the "data type" and the "data structure": monomorphization will happily emit ABI-compatible functions. I think a more interesting case is how you'd do this without a monomorphization explosion. It seems like being able to describe both subtyping and substructuring relationship (a la C's structure prefix ABI convention) would keep things mostly in check.
I dunno... seems like you'd be combining all the hard bits of a nominative type system with all the hard bits of a polymorphic structural type subsystem.
I am sure I am missing something, but this thread of comments really got me thinking. I started thinking this was a use case for existential types, but I’m not sure it’s immediately applicable. Then went down a research paper rabbit hole and google binge looking at various ideas. Now I’m sitting here thinking (and I doubt this scotch is making it clearer) that this is sort of like Haskell’s type classes being seen as the ‘Data Type’ level and regular type constructors being seen as the ‘Data Structure’ level. Obviously a language would need to incorporate the class syntax as a less separate construct than it is in Haskell, but I can’t shake the feeling this almost captures the thrust of the concept.
Sign me up! I'm constantly defining `XYZ` and `PackedXYZ` for memory-bound problems and I have often though it would be nice to be able to do this (in a more generic way than mere bitfields). It never occurred to me that it could be extended to column stores though.
Typically functional programming isn’t considered to be CPU friendly. Specifically typical functional programming allocates memory all over the heap, especially given the prevalence of things like linked lists and so on. This obviously makes for crumby cache performance and memory pressure. I’m not sure in what sense FP lends itself to columnar data structures, but those would obviously have still better cache and vectorization properties. I’d be curious to learn more!
The difference is that if you're doing functional programming with vectors as the fundamental data type (rather than scalars in linked lists), then the implementation could be hyper optimized underneath the programmer to maximize computational throughout. Whereas with imperative for loops, the implementation is somewhat fixed because python, C, and C++ force you to choose row (array of struct), or columnar (struct of arrays). R prefers struct of arrays, and makes arrays of struct a very difficult and horribly unacceptable slow. But if you are willing to use higher order functions, R makes struct of arrays suuuuuper fast. Python/C/C++ make arrays of structs the "default", by programmer culture if not by language/runtime capabilities.
Moving away from pointers and using a shared pool of handles with integer indexes will introduce a whole new array of issues. A plain int carries no information about the validity of the object behind that handle because they are carried around as copies, not references.
I remember debugging a regression with UNIX network sockets where valid connections were being killed, and the bug was only triggered under heavy load. Deadlines were approaching and as a desperate measure I did the exact opposite the article suggests: I wrapped all the socket calls to only accept pointers to an opaque struct with a single integer and made sure the int was set to a guardian value to indicate invalidation after an error. The culprit was a double-close that normal debugging tools such as Valgrind could not find but my unorthodox refactoring did.
Later I've learnt to love UUID's whenever performance allows. They're not pointers so they don't introduce memory handling issues and they can be easily tracked in e.g. distributed data pipelines and logging systems.
A pointer is an absolute locator of a byte in the address space of a process, while an integer index implies a certain base (start of the module-private array) and a multiplier (object size).
By using handles, "user code" is freed from having to know the base and the multiplier, which allows for a more compact locator that remains constant in case of base/multiplier changes.
You should probably read the whole article, because it describes strategies for dealing with exactly this problem in the section under Memory safety considerations and then again in the update at the end. The short form is: if you have a bounded size on your array, then you will often have unused bits in your index pointer; you can use these to store a "generation counter", which needs to match the slot. You can now re-use indices, because you can compare the generation of the index (stored in the extra bits) with the generation count in the slot, and if they match, it means that the item has been 'freed' and a new thing is in the slot.
> A plain int carries no information about the validity of the object behind that handle because they are carried around as copies, not references.
The article suggests a handle containing a "plain int" plus a "unique bit pattern", the latter acting like a watermark which is compared to the one in the private array of the corresponding module, preventing dangling accesses.
Apart from rare occasional collisions in the bit pattern causing a false negative on a dangling access, what other issues can be found with this approach?
That's probably not a bad way to think about it - in an application that large, dealing with lots of data, needing high performance, it is effectively an application-specific database.
On the other hand, think about the improved cache footprint from using 2-byte object handles rather than full-size 8-byte pointers. If the "user code" keeps a lot of those handles in sequential structs, that is a huge win.
Oh, sorry, I didn't mean to be disparaging. What I meant was that if the application needs to handle lots of in-memory data, then thinking of it as an in-memory database seems like a great idea. The ECS architecture seems to be pulling in ideas from column oriented databases, there are probably lots of ideas like that which could be repurposed. (Of course there is the other side of that, which is if you can avoid storing lots of data in-memory at all, possibly using an off-the-shelf relational database, then that's a great approach too.)
I mean, calling it a database might be technically accurate, but is misleading in the sense the it gives the impression that it's a big heavyweight thing. It's not like there's a SQL implementation in there.
You jest, but SQLite run in-memory is a pretty fast database. I'm currently using it for ECS implementation in a little game I'm writing on the side (albeit in Common Lisp, and not yet very performance-heavy).
The reason I went this way is because when writing my third ECS with various access pattern optimizations, I realized I'm hand-implementing database indices - so I may as well plug SQLite for now, nail the interface, and refactor it into Array-of-Struct implementation with handles later.
The translation of that key to a memory location should be very straightforward. A few bitwise operations to extract the necessary pieces from the handle, a multiplication of the index, an addition to the private base pointer and there you go. Nothing fancy or CPU-expensive.
Weak_ptr in C++ only applies to shared_ptr, which is not used for most C++ pointers. Because of the atomic ops done on each copy, in performance sensitive contexts people aren't going to be using it.
I was writing an interpreter using shared ptrs all over the place.
Performance was fine until I linked pthreads, at which point it took a nosedive forcing me to refactor the entire project.
No one really tells you this. People will say don't use shared ptrs because they're slow, which is far from the whole truth. They're pretty darn fast as long as you don't enable threading.
I would go a step farther and say that uncontended shared_ptrs are typically fine. The problem comes in when you have shared_ptrs that are frequently copied in more than one thread. That requires synchronizing the shared state across cores, sometimes across NUMA nodes. That's slow.
The same thing goes for mutexes. Modern, uncontended mutexes are very fast. But they can start sucking resources when you're locking them from multiple threads at the same time.
>I was writing an interpreter using shared ptrs all over the place.
I am doing that :/
In Object Pascal it is the standard memory management way. All strings and arrays are reference counted, and so I use it for all my own data structures as well
I have not even added threading and I doubt I will, but the reference counting is already the slowest part
You can avoid reference counting overhead by using handles as described in this article. Each object in a slab would have just one reference, and you handles would be indexes into an array of them. Now, after you do that you will be on be hook for making sure you don't accidentally deallocate any of the objects that is currently in use. And it's not guaranteed to be faster. But it is one way to get around this issue.
> Later I've learnt to love UUID's whenever performance allows.
This is essentially the same as what the article suggests, just with a lot more spare bits for creating unique handles.
"This is where the ‘free bits’ in a handle come in: Let’s say our handles are 16-bits, but we only ever need 1024 items alive at the same time. Only 10 index bits are needed to address 1024 items, which leaves 6 bits free for something else.
If those 6 bits contain some sort of ‘unique pattern’, it’s possible to detect dangling accesses: [...]"
The problem isn't with the integer indices being by-value, but rather that they are subject to manual memory management. A descriptor can be closed and then replaced by a different object at the same integer value. (Or both can happen atomically with dup2.)
> "I wrapped all the socket calls to only accept pointers to an opaque struct with a single integer and made sure the int was set to a guardian value to indicate invalidation after an error."
This worked because after destroying the descriptor and storing the sentinel value, you didn't destroy the structure.
In production code, that would be a problem. You wouldn't want to leak these wrapper structures, so at some point they would have to be deleted.
And then you're back to the same problem: a structure containing a guardian int is freed while still in use, then the memory is re-allocated to some other purpose, and now the guardian checks are inappropriately testing memory they no longer own.
What works is garbage-collected handles: pointers to objects that wrap a manually-managed resource, but which themselves do not go away while they are referenced.
By the way, POSIX file descriptors could provide a sort of garbage-collected discipline, because they support reference counting. So that is to say, instead of passing around an integer descriptor, you can enforce that dup() must be used by anyone wanting to share the underlying object: everyone must share through their own integer descriptor, not by sharing the integer. Then if five objects or threads or whatever are sharing the same open file, they have five different integers. The object goes away when all five of those contexts separately issue a close on their integer. Nobody closes an integer that they got from somewhere else, always their own dup.
POSIX descriptors also lack the feature described in the article: tag bits. So that is to say, imagine a parallel universe with POSIX descriptors that are not simply non-negative integers clumped close to zero, but have two parts: the integer clumped close to zero, and some bits indicating a generation. For instance suppose descriptor 0x01FC means "generation 0, index 1FC". Then suppose you close(0x01FC) such that the 1FC slot is the lowest available position. Something opens a new descriptor. That results in 0x11FC, not 0x01FC. The generation has incremented. Now suppose a double close happens: another close(0x01FC). The kernel extracts the generation 0, and looks at object 1FC. It sees, hey that thing is generation 1! This is a stale handle! and returns an error.
floooh suggests handles not be plain ints, but also have some bits with a 'unique pattern' (eg. generation counter) for safety checks. This is one of the reasons given for handles being better than pointers.
I am surprised that not every large project uses this technique. If you are running on a 64bit system, the use of
32bit handles instead of pointer results in huge memory savings. You can also put the start addressed of the arrays (the handles are indexing into) anywhere in virtual memory and grow them as needed.
Finally you can stripe the data for one object into multiple arrays and use the same handle to index each of them which can improve locality.
> you can stripe the data for one object into multiple arrays and use the same handle to index each of them which can improve locality.
I wish programming languages made it easy to structure arrays of objects like this. I think Johnathan Blows’s JAI has some kind of transparent support for SoA.
This basically is how tables work in k. Tables are defined as a list of dictionaries, but in fact stored as an array for each element (ie. each column of the table is an array, but each row can be read as a dictionary).
We have a mixin class that will allocate a subclass with its own array of handles (the handles overload * and -> so they act like pointers). If you have an instance variable you can choose to give it a different allocator, where that instance variable of object with handle `n` is the nth object in the new area. Then your instance method `barf& foo();`, instead of getting the instance variable `barf m_foo` Instead does some inlined arithmetic and gives the caller a reference to the object.
I don't know. The hugest saving from not malloc-ing small objects comes from the memory-chunk overhead I believe. You could do object pools and still use pointer as usual I think with no runtime cost of passing handles to getter-functions. The usual pattern of "wrapping" malloc/free.
What about CPU data cache misses due to the fact that malloc doesn't have a concept for how best to organize that data chunk of data based on your application's specific usage pattern?
That can definitely be a problem. Using handles makes it easier to swap out your allocation mechanism if you ever have to optimize your cache efficiency.
Hell, projects that need to squeeze out that level of performance will typically have multiple different memory allocation mechanisms.
Object pools pale in comparison to striping the data into arrays (or as we call it in the game dev world a Struct of Arrays) when it comes to reduced cache misses.
Given that you iterate over the object fields in a "column like" manner, right? Random access should be slower since an object of say four elements would have to fetch four cache lines.
Hell, you might even be able to go smaller. For example, if you know that you're going to have less than 256 instances of a given object, an 8-bit handle will do the job nicely at an eighth of the size of a full-blown 64-bit pointer.
Since handles refer to offer rather than absolute addresses, this sort of scheme should allow for "serializing" data by memmapping chunks of memory to disk.
Obviously, this scheme wouldn't be useful for cross-platform save files, and might not even translate across builds.
Still, as long as you have a robust scheme for invalidating data when a new version is deployed, I could see this being useful for caching, and for saving state for short periods of time on mobile platforms that like to restart processes with little warning. (Rather than potentially losing your place within the game level any time you switch between apps, you only lose your place within the game level when an app update gets installed.)
Handles are essentially integer indexes, relative to a base pointer, also implying a certain object size (needed for index multiplication). A nice property therefore is that the base pointer can be different each time the program is started, but the handle stays constant.
So in case that module-private array is mmapped to disk, handles themselves can be serialised, unlike pointers. This can enable quick and easy storage of most of the program state, allowing for a fast recovery afterwards.
We used to call this "seekfree" loading back in games because you could read the whole game level in one chunk without seeking on spinning media.
If you wanted pointers out of it after you loaded it you just had a fix-up phase that swapped the offsets for pointers.
We also used the non pointer mmap version on eMMC platforms for really easy ways to store large data while letting the kernel handle the paging with clean pages.
Also forces you to think about data locality which is a powerful thing.
MacOS 1.0 in 1984 used handles in order to allow the heap to be coalesced more easily (although locked handles and regular pointers often created islands). Given the first Mac had 128K RAM and no MMU, it was highly necessary at the time.
Those were not the same kind of handles AFAIK. This particular article is about pooling objects of the same time AND using indices (which the articles calls handles) to reference them. They aren't handles by any definition I've heard of in the past. AFAIK the definition of a handle is an opaque reference to some object, when you want to use the object you ask the handle for the object's actual address being aware that the address is only useful for some defined period (like having to call ptr = acquire(handle) ... do work with p ... release(handle) (p is now not valid).
Handles by that definition have the usage the can be moved in memory to clear up fragmentation but they have none of the other benefits this particular post is talking about. Those benefits only come from their specific implementation, not from the idea of "handles".
Of course you mean System 1, not MacOS 1.0. Handles were used all the way through System 7, last released in 1997. Perhaps by then it was more about the legacy than the need. Don't know the physical limits, but a Quadra 950 could support 256MB RAM and more through VM (ramdisk), and of course these later gen machines did have MMUs.
In ye olden days, addressing was probably physical, so being non-multi-tasking ensured the safety of using handles, ie if used properly. IOW the system process was free to clean up the heap and manipulate your pointers b/c your app would only yield at specific points where you aren't holding onto a dereferenced handle.
Handles continued to be used all the way through Mac OS 9, as they were used by a ton of critical Toolbox APIs like the Resource Manager. A vestigal version even existed in Mac OS X -- handles still existed in Carbon, but I'm not sure they would ever be relocated by the system like they were previously.
> of course these later gen machines did have MMUs.
They did, but it was only used to simulate a system with more physical memory. Every process still saw the same view of a single "flat" memory space.
> the system process was free to clean up the heap and manipulate your pointers b/c your app would only yield at specific points where you aren't holding onto a dereferenced handle.
True, but it was also your job as a developer to not be holding onto those pointers when the memory manager ran.
In many ways a pointer on a modern OS with a virtual memory subsystem really is a handle already. It's not like it points to a single given exact location in physical memory. The OS and CPU are already presenting a bit of a facade.
Sadly this is always a pointer in actual implementations, but should have been a handle, where a handle is a structure containing two fields: an index or pointer-like value, and a verifier that can be used to check that the handle is [still] valid. Using a handle would have made it possible to get much better memory safety.
I think it would be much easier to pull off and work with now than then. The c++ world now has raii and smart pointers, so locking and unlocking could be something the compiler does for you based on scope, all using a pointer syntax.
Having worked on games that used pooled allocation techniques like this, and others that just had tons of tiny allocations in a single heap, my experience is that the latter pain is much worse.
Also helps with arrays that you might wish to realloc. I've sometimes hit the (obvious, in hindsight) bug that you realloc an array and all pointers to members that are held elsewhere suddenly become invalid.
Somehow, my first specialty as a developer was perf analysis, and so all through the Java Golden Era I read every press release about the JVM and subscribed to the ACM SIGPLAN, which had at least one paper on GC in virtually every proceeding that was published, and some occasional cool stuff on interpreter construction. I believe the 4rd technical book I read was The Java Virtual Machine. After Stevens, Comer (3 volumes), and the Java Programming Language.
The original GC had handles. With Hotspot (JDK 1.2, IIRC?) those were gone, and by Java ~8 they were back again, but embedded in the object header, where the pointer indirection presumably hurts less.
In an alternate universe, Sun would have shrunk the memory size for Strings earlier and kept the overhead per object, in which case I suspect the GC stall around 1-2GB would never have happened. But I don't think they had admitted to themselves how Stringly Typed production Java code was, and how UTF-16 wasn't going to last forever.
shared_ptrs, if misused, can cause a lot of memory management issues, because you end up with a web of object lifetime dependencies that is hard to reason about. I think this is a severely underrated problem in C++, particularly for newer developers. shared_ptr is extremely powerful, but it it makes it easy to gloss over issues of object lifetimes and ownership, when in fact those issues are still very important for building robust systems.
In the code bases I've worked on, we have pushed hard to have unique ownership wherever possible and only use shared_ptr where there was a clear need. You can still have a huge object graph kept alive by a single unique_ptr, but it happens less often and it's easier to trace back and fix.
A nice thing about the handle approach is that it makes it a lot harder to build up these object graphs or in general to implement anything without being explicit about object lifetime.
I've seen parts of the handle/object pool approach misapplied and cause more trouble than it's worth, though. It's a good idea for self-contained subsystems where there are a limited number of object types that you would apply this to. I don't think it scales to 100s or 1000s of distinct types of objects because then you're going to have headaches dealing with the sheer number the object pools.
I've also seen object pooling be implemented proactively as an "optimisation" to avoid calling malloc() but then become a bottleneck because of lock contention in the object pool.
Does this assume that the handle consumers have to somehow 'release' the handle in addition to 'deleting' it? As was explained with an example of delayed release of some deleted but still GPU queued element.
I'm not sure if the suggested generation counter approach is the optimal way to deal with the delayed release. Also, when it comes down to own memory managers, benchmarking is a must. Added complexities may eventually eat up the initial savings.
The "generation counter" suggested in the article has nothing to do with delayed release, as far as I understand. It is used as a strategy to avoid returning colliding handles close in the time domain, reducing the chances for a collision of the "unique bit pattern", which would allow non-detectable dangling access conditions.
My understanding is that the dangling access conditions potentially arise due to the delayed release of deleted handles.
["... Instead the rendering system would only mark the resource object for destruction when the user code requests its destruction, but the actual destruction happens at a later time when the GPU no longer uses the resource."]
There’s really no substitute for packing fixed-sized objects together in N big array chunks. If you can operate in a particular context (or entirely) with N=1, you can substitute handles for pointers. For any reasonable # of objects, those handles can be smaller than a pointer and provide even better compaction and thus cache effectiveness.
[1] https://people.eecs.berkeley.edu/~kubitron/courses/cs194-24-...