Handles Are the Better Pointers (2018) (floooh.github.io)
144 points by fanf2 on Aug 13, 2020 | 106 comments



There should be a minor law in the spirit of Greenspun’s tenth rule: Every sufficiently performance-sensitive system eventually grows some version of a slab allocator[1].

There’s really no substitute for packing fixed-sized objects together in N big array chunks. If you can operate in a particular context (or entirely) with N=1, you can substitute handles for pointers. For any reasonable # of objects, those handles can be smaller than a pointer and provide even better compaction and thus cache effectiveness.

[1] https://people.eecs.berkeley.edu/~kubitron/courses/cs194-24-...
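
For concreteness, here's a minimal sketch (illustrative, not from the article) of the N=1 case: one fixed-size array chunk, with 16-bit indices handed out in place of pointers.

    #include <cassert>
    #include <cstdint>

    struct Particle { float x, y, z, vx, vy, vz; };

    // One big array chunk; 16-bit handles index into it.
    constexpr uint16_t kCapacity = 1024;
    static Particle pool[kCapacity];
    static uint16_t free_list[kCapacity];
    static uint16_t free_count = 0;   // call pool_init() before use

    void pool_init() {
        free_count = kCapacity;
        for (uint16_t i = 0; i < kCapacity; ++i)
            free_list[i] = kCapacity - 1 - i;   // hand out low indices first
    }

    uint16_t pool_alloc() {           // returns a handle, not a pointer
        assert(free_count > 0);
        return free_list[--free_count];
    }

    void pool_free(uint16_t h) {
        free_list[free_count++] = h;
    }

    Particle* pool_get(uint16_t h) {  // resolve handle -> pointer, use briefly
        assert(h < kCapacity);
        return &pool[h];
    }

Two bytes per reference instead of eight, and every live object sits in one contiguous chunk.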


Everything I read these days seems to trend towards columnar stores...

I write a mixture of R, Python, C and C++, in that order of prevalence, and mostly for data intensive tasks. After nearly two decades of unlearning loops and implementing instead with FP idioms, vectorization, and matrix multiplies to take advantage of R's strengths, I often have trouble recreating R's performance when I first reimplement in C or C++.

I think it would be beneficial to start with functional programming in more CS programs; functional idioms are a much better fit for modern CPU architectures in many ways.


In my Perfect Programming Language That Doesn't Have To Deal With The Problems Of Actually Existing, serialization of data structures into memory is explicitly defined in the language, just as one would define a JSON or Protobuf serialization. One could define a "Point3{x,y,z}", and hypothetically define one array of them in the conventional manner, define another array as a columnar serialization, and potentially even define a compressed data structure right there in memory if that was advantageous.

It opens a wide array of exciting problems and possibilities, if you start thinking of that as something a language should let you twiddle with rather than hard coding it. It seems like a lot of high performance stuff is moving in a direction where this would be advantageous.

(For example, if you compress an array, you lose random access. The language would have to carefully build up capabilities from scratch so it can have a concept of "arrays" that can't be randomly accessed, can only be written in some very particular manner that may even require a "flush" concept, etc. My gut says it ought to be possible, since, after all, we can do it manually, but it would be a very careful operation to disentangle casual assumptions about what data structures can do based on the definition of what data they contain vs. how they are represented in memory, and what capabilities you get from that.)


The idea of "univalence", being pursued in Homotopy Type Theory (AKA HoTT), is that isomorphic types (i.e. we can convert back and forth without information loss) are equal, and hence one can be used in place of the other.

The dream scenario is to write all of our libraries and application code using types that are the most straightforward (e.g. counting in unary, mapping over linked-lists, looking up values from lists of key/value pairs, etc.), then cast these to more efficient types (e.g. machine words, arrays, tries, etc.) for the final program. It's still ongoing, and a naive approach would simply lead to a bunch of conversion functions being scattered around (making the program even slower), so a smart approach would be needed, e.g. where compilers can fuse away these conversions.


HoTT changes what equal means. In order for univalence to work you must be comfortable with arbitrary code running to perform the necessary substitutions. Isomorphism isn't free; you can do some fancy cast-conversions (where cast is an identity function), which have been explored in generic zero-cost reuse [1], but this is not HoTT and essentially uses a UIP equality (which is inconsistent with HoTT).

Your usage of the word "cast" in the second paragraph is a misnomer, because no systems programmer is going to want to perform an expensive transport in HoTT!

[1] https://arxiv.org/abs/1803.08150


That's what I was getting at w.r.t. naive vs smart implementations. The point of univalence is that, from a logical/functional perspective, `slowFoo(slowX)` is indistinguishable from `fastToSlow(fastFoo(slowToFast(slowX)))`. Applying this to composite expressions will produce lots of calls like `slowToFast(fastToSlow(x))` which can be optimised down to `x`.

The problem, of course, is that we can't enforce this within the logic (since those expressions are indistinguishable), so we must rely on compiler optimisations (i.e. deforestation/fusion).

That link looks interesting, but appears to be more about re-using the same computational-content/runtime-representation for multiple compile-time/proof-context semantics (akin to McBride's "ornaments"), which seems more like a dual problem to that of swapping out slow representations for fast ones.


That sounds like, for performance reasons, you want control over the actual types used. But you could prevent those changes from affecting the semantic representation of your program, or at the very least provide an easy check that the optimization doesn't affect your semantics.


I remember when that first came out. I have not checked in on it in a while. Do you have any links to communities/reddits/etc working with that? (/r/haskell was very excited for a while, but then the energy went somewhere else, and I don't know where.)

One of the things I only alluded to in my third paragraph, but very much had in mind, is that it's going to take a lot of type work and a lot of thinking in properties (like Traversable, Functor, etc. in Haskell) to make this work out well, because a lot of these rewrites of the underlying representation will have manifestations at the property level (this array is now Traversable but no longer RandomAccessible, etc.).


Jonathan Blow's new language, Jai, has some control over layout, allowing easy switching between array of structs and struct of arrays without having to change the accessing code.
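
For readers who haven't seen the distinction: here's the manual version of that switch in C++ (illustrative types; Jai's pitch is that the accessing code would look identical in both cases).

    #include <vector>

    // Array of Structs: the fields of one object are adjacent in memory.
    struct PointAoS { float x, y, z; };
    std::vector<PointAoS> points_aos;

    // Struct of Arrays: each field gets its own contiguous array.
    struct PointsSoA { std::vector<float> x, y, z; };
    PointsSoA points_soa;

    // Summing a single field touches a third of the cache lines in SoA form.
    float sum_x_aos(const std::vector<PointAoS>& ps) {
        float s = 0;
        for (const auto& p : ps) s += p.x;   // strides over y and z too
        return s;
    }

    float sum_x_soa(const PointsSoA& ps) {
        float s = 0;
        for (float v : ps.x) s += v;         // dense and easy to vectorize
        return s;
    }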


Here is the demo video of the stuff zero_iq is talking about: https://www.youtube.com/watch?v=ZHqFrNyLlpA

Basically right on point with what the GP is saying.


That video briefly mentions the CppCon 2014 talk by Mike Acton, "Data-Oriented Design and C++". The talk was great; it pretty much lined up with how I think of modern systems (treat RAM like disk: seeks take a long time, but streams will likely be able to maximize CPU usage).

But the questions at the end, and the resistance to these basic ideas (roughly, "what about platform portability, programmer productivity, etc."), were a bit shocking to me. Is this not a C++ conference? This was six years ago, so perhaps the machine constraints were not as well known back then, but it really shows how strongly the community culture will influence a language and its capabilities.


I thought he pulled that back out of the language, though with the intent of reimplementing it as a library, using the metaprogramming facilities of the language, for anyone who wanted it?


I haven't seen the update, but to me that seems a bit silly. This is a pretty fundamental type manipulation, and ideally the compiler would have some understanding of it...


There is no Jai.


There Is Only IBM And AT&T And Exxon?


Could you explain? What I meant was: I believe Jai is vaporware and gets too much attention. Jai sounds exciting until you find out that the download link is missing.


It's still under development, with no public release, but I think it's a bit unfair to call it vapourware, since he has produced plenty of material that demonstrates he has a working compiler and has clearly spent a great deal of time developing it, not just talking about it.

Whether it gets too much attention is debatable. He has some good ideas, imo, worth exploring, although I don't agree with all of his opinions. He has a prominent enough profile that it will likely inspire others to incorporate features from Jai if they are seen to be desirable, even if Jai is not ultimately released. (Personally, I think it will eventually.)


Well, what is your hunch as to when it will be released? My expectation is either never, or there will be a release and it will crumble quickly when people want to do things with it.


I was privy to a fairly high-wattage conversation between Walid Taha, Jeremy Siek, Todd Veldhuizen, et al., where they discussed this issue at length. They were concerned that languages almost universally equate data types with data structures. Their opinion (drunkenly expressed) was that the program semantics should be based on both. The wrinkle was what happened if you wanted to expose transformations of the underlying structure against the type when trying to say something reasonable about escape hatches.

If I had a PPLTDHTWTPOAE (?) it'd have this capability for sure.


Yes, as I was typing this up I began to realize that's where I was really going. Essentially turning an entire struct of, say, { int A; string B; Thingy C } into only a set of promises about being able to get an int, a string, a Thingy, etc., and promises around being able to set them (let's just do a struct for now, no methods or hiding), and then decoupling the type, being the set of promises being made, from the structure, being the concrete layout in memory and corresponding code capable of fulfilling those promises.

It's all easy in the simple struct case, gets more interesting when you start trying to manipulate them by putting them into arrays or associative maps, etc.

On the one hand, this is nearly revolutionary in some sense, but in another it's simply an incremental advance in a direction we're already going. "Objects are a poor man's closures, closures are a poor man's objects" is an old saying leaning in this direction, and it's hardly a new observation in Haskell that there's not a heck of a lot of practical difference between an associative array of X -> Y and a function (x -> y), at least when it comes to reading them.


In a language like Haskell, you'd parameterize every function by both the "data type" and the "data structure": monomorphization will happily emit ABI-compatible functions. I think a more interesting case is how you'd do this without a monomorphization explosion. It seems like being able to describe both subtyping and substructuring relationship (a la C's structure prefix ABI convention) would keep things mostly in check.

I dunno... seems like you'd be combining all the hard bits of a nominative type system with all the hard bits of a polymorphic structural type subsystem.


I am sure I am missing something, but this thread of comments really got me thinking. I started thinking this was a use case for existential types, but I’m not sure it’s immediately applicable. Then went down a research paper rabbit hole and google binge looking at various ideas. Now I’m sitting here thinking (and I doubt this scotch is making it clearer) that this is sort of like Haskell’s type classes being seen as the ‘Data Type’ level and regular type constructors being seen as the ‘Data Structure’ level. Obviously a language would need to incorporate the class syntax as a less separate construct than it is in Haskell, but I can’t shake the feeling this almost captures the thrust of the concept.


Sign me up! I'm constantly defining `XYZ` and `PackedXYZ` for memory-bound problems and I have often thought it would be nice to be able to do this (in a more generic way than mere bitfields). It never occurred to me that it could be extended to column stores, though.


Isn't this just an interface in an OO language?


Typically functional programming isn’t considered to be CPU-friendly. Specifically, typical functional programming allocates memory all over the heap, especially given the prevalence of things like linked lists and so on. This obviously makes for crummy cache performance and memory pressure. I’m not sure in what sense FP lends itself to columnar data structures, but those would obviously have still better cache and vectorization properties. I’d be curious to learn more!


The difference is that if you're doing functional programming with vectors as the fundamental data type (rather than scalars in linked lists), then the implementation can be hyper-optimized underneath the programmer to maximize computational throughput. Whereas with imperative for loops, the implementation is somewhat fixed, because Python, C, and C++ force you to choose row-oriented (array of structs) or columnar (struct of arrays). R prefers struct of arrays, and makes arrays of structs very difficult and horribly, unacceptably slow. But if you are willing to use higher-order functions, R makes struct of arrays suuuuuper fast. Python/C/C++ make arrays of structs the "default", by programmer culture if not by language/runtime capabilities.


Also I always find it's so easy to make everything data parallel if you program in a functional way.

Sure it might not be the most efficient implementation possible, but due to the ease of parallelizing, you are more likely to just try it out.


Moving away from pointers and using a shared pool of handles with integer indexes will introduce a whole new array of issues. A plain int carries no information about the validity of the object behind that handle because they are carried around as copies, not references.

I remember debugging a regression with UNIX network sockets where valid connections were being killed, and the bug was only triggered under heavy load. Deadlines were approaching and as a desperate measure I did the exact opposite the article suggests: I wrapped all the socket calls to only accept pointers to an opaque struct with a single integer and made sure the int was set to a guardian value to indicate invalidation after an error. The culprit was a double-close that normal debugging tools such as Valgrind could not find but my unorthodox refactoring did.

Later I've learnt to love UUIDs whenever performance allows. They're not pointers, so they don't introduce memory handling issues, and they can be easily tracked in e.g. distributed data pipelines and logging systems.


> A plain int carries no information about the validity of the object behind that handle because they are carried around as copies, not references.

A pointer is just an integer index to a byte in memory in most computing architectures.


A pointer is an absolute locator of a byte in the address space of a process, while an integer index implies a certain base (start of the module-private array) and a multiplier (object size).

By using handles, "user code" is freed from having to know the base and the multiplier, which allows for a more compact locator that remains constant in case of base/multiplier changes.


Ah, but that pointer doesn't get re-used on you until you free it, and for that there are strategies like refcounting and GC.

Integers in a small array will get re-used; they have to.


You should probably read the whole article, because it describes strategies for dealing with exactly this problem in the section under Memory safety considerations, and then again in the update at the end. The short form is: if you have a bounded size on your array, then you will often have unused bits in your handle; you can use these to store a "generation counter", which needs to match a counter kept in the slot. You can now re-use indices, because you can compare the generation of the handle (stored in the extra bits) with the generation count in the slot, and if they don't match, it means that the item has been 'freed' and a new thing is in the slot.
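
A sketch of that scheme in C++ (the 24/8 bit split and the names are arbitrary): the handle carries an index plus a generation, and a lookup only succeeds when the generations agree.

    #include <cstdint>
    #include <vector>

    struct Handle {                 // 24-bit index + 8-bit generation
        uint32_t bits;
        uint32_t index() const { return bits & 0xFFFFFF; }
        uint32_t gen()   const { return bits >> 24; }
    };

    struct Slot { uint8_t gen = 0; bool alive = false; /* payload ... */ };
    std::vector<Slot> slots(1024);

    // Stale handles yield nullptr instead of touching recycled data.
    Slot* lookup(Handle h) {
        Slot& s = slots[h.index()];
        if (!s.alive || s.gen != h.gen()) return nullptr;  // dangling handle
        return &s;
    }

    void destroy(Handle h) {
        Slot& s = slots[h.index()];
        if (s.alive && s.gen == h.gen()) {
            s.alive = false;
            ++s.gen;                // every outstanding handle is now stale
        }
    }

With only 8 generation bits the counter wraps after 256 reuses of a slot, which is the "rare collision" caveat discussed elsewhere in this thread.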


> A plain int carries no information about the validity of the object behind that handle because they are carried around as copies, not references.

The article suggests a handle containing a "plain int" plus a "unique bit pattern", the latter acting like a watermark which is compared to the one in the private array of the corresponding module, preventing dangling accesses.

Apart from rare occasional collisions in the bit pattern causing a false negative on a dangling access, what other issues can be found with this approach?


You are essentially passing a database index key to the getter function. The only issue I see is that your system has a database integrated into it.


That's probably not a bad way to think about it - in an application that large, dealing with lots of data, needing high performance, it is effectively an application-specific database.


On the other hand, think about the improved cache footprint from using 2-byte object handles rather than full-size 8-byte pointers. If the "user code" keeps a lot of those handles in sequential structs, that is a huge win.
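
Hypothetically, on a typical 64-bit platform:

    #include <cstdint>

    struct NodeWithPointers { void* left; void* right; void* parent; void* payload; };
    struct NodeWithHandles  { uint16_t left, right, parent, payload; };

    static_assert(sizeof(NodeWithPointers) == 32, "four 8-byte pointers");
    static_assert(sizeof(NodeWithHandles)  == 8,  "four 2-byte handles");

Four times as many nodes per cache line.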


Oh, sorry, I didn't mean to be disparaging. What I meant was that if the application needs to handle lots of in-memory data, then thinking of it as an in-memory database seems like a great idea. The ECS architecture seems to be pulling in ideas from column oriented databases, there are probably lots of ideas like that which could be repurposed. (Of course there is the other side of that, which is if you can avoid storing lots of data in-memory at all, possibly using an off-the-shelf relational database, then that's a great approach too.)


I mean, calling it a database might be technically accurate, but it is misleading in the sense that it gives the impression that it's a big heavyweight thing. It's not like there's a SQL implementation in there.


You jest, but SQLite run in-memory is a pretty fast database. I'm currently using it for the ECS implementation in a little game I'm writing on the side (albeit in Common Lisp, and not yet very performance-heavy).

The reason I went this way is that when writing my third ECS with various access-pattern optimizations, I realized I was hand-implementing database indices - so I may as well plug in SQLite for now, nail the interface, and refactor it into an Array-of-Structs implementation with handles later.


The translation of that key to a memory location should be very straightforward. A few bitwise operations to extract the necessary pieces from the handle, a multiplication of the index, an addition to the private base pointer and there you go. Nothing fancy or CPU-expensive.
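
Roughly this (a sketch; the 16-bit index mask is arbitrary):

    #include <cstdint>

    struct Item { int payload; };
    static Item* base;                     // module-private array base

    Item* resolve(uint32_t handle) {
        uint32_t index = handle & 0xFFFF;  // strip off tag/generation bits
        return base + index;               // scales by sizeof(Item) implicitly
    }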


> A plain int carries no information about the validity of the object behind that handle because they are carried around as copies, not references.

All of this applies to pointers as well.


In C world yes, in C++ world no, because of weak_ptr.


Weak_ptr in C++ only applies to shared_ptr, which is not used for most C++ pointers. Because of the atomic ops done on each copy, in performance sensitive contexts people aren't going to be using it.


Been there, done that.

I was writing an interpreter using shared ptrs all over the place.

Performance was fine until I linked pthreads, at which point it took a nosedive forcing me to refactor the entire project.

No one really tells you this. People will say don't use shared ptrs because they're slow, which is far from the whole truth. They're pretty darn fast as long as you don't enable threading.


I would go a step farther and say that uncontended shared_ptrs are typically fine. The problem comes in when you have shared_ptrs that are frequently copied in more than one thread. That requires synchronizing the shared state across cores, sometimes across NUMA nodes. That's slow.

The same thing goes for mutexes. Modern, uncontended mutexes are very fast. But they can start sucking resources when you're locking them from multiple threads at the same time.


Shared pointers also defer ownership decisions and lead to memory leaks via cycles, which is its own very special level of hell.


>I was writing an interpreter using shared ptrs all over the place.

I am doing that :/

In Object Pascal it is the standard memory management way. All strings and arrays are reference counted, and so I use it for all my own data structures as well

I have not even added threading and I doubt I will, but the reference counting is already the slowest part

Not sure what the alternative is


You can avoid reference counting overhead by using handles as described in this article. Each object in a slab would have just one reference, and your handles would be indices into an array of them. Now, after you do that, you will be on the hook for making sure you don't accidentally deallocate any of the objects that are currently in use. And it's not guaranteed to be faster. But it is one way to get around this issue.


I do not understand how this would help

It would still need reference counting on the handles to know when the handle is not used anymore and the element in the array can be reused

Without reusing handles it would surely run out of memory. Unless a full GC is used


> Later I've learnt to love UUID's whenever performance allows.

This is essentially the same as what the article suggests, just with a lot more spare bits for creating unique handles.

"This is where the ‘free bits’ in a handle come in: Let’s say our handles are 16-bits, but we only ever need 1024 items alive at the same time. Only 10 index bits are needed to address 1024 items, which leaves 6 bits free for something else.

If those 6 bits contain some sort of ‘unique pattern’, it’s possible to detect dangling accesses: [...]"


>A plain int carries no information about the validity of the object

So, just like a pointer then?


The problem isn't with the integer indices being by-value, but rather that they are subject to manual memory management. A descriptor can be closed and then replaced by a different object at the same integer value. (Or both can happen atomically with dup2.)

> "I wrapped all the socket calls to only accept pointers to an opaque struct with a single integer and made sure the int was set to a guardian value to indicate invalidation after an error."

This worked because after destroying the descriptor and storing the sentinel value, you didn't destroy the structure.

In production code, that would be a problem. You wouldn't want to leak these wrapper structures, so at some point they would have to be deleted.

And then you're back to the same problem: a structure containing a guardian int is freed while still in use, then the memory is re-allocated to some other purpose, and now the guardian checks are inappropriately testing memory they no longer own.

What works is garbage-collected handles: pointers to objects that wrap a manually-managed resource, but which themselves do not go away while they are referenced.

By the way, POSIX file descriptors could provide a sort of garbage-collected discipline, because they support reference counting. So that is to say, instead of passing around an integer descriptor, you can enforce that dup() must be used by anyone wanting to share the underlying object: everyone must share through their own integer descriptor, not by sharing the integer. Then if five objects or threads or whatever are sharing the same open file, they have five different integers. The object goes away when all five of those contexts separately issue a close on their integer. Nobody closes an integer that they got from somewhere else, always their own dup.
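
That discipline looks like this in practice (a sketch, error handling omitted): each context dup()s its own descriptor and closes only that.

    #include <fcntl.h>
    #include <unistd.h>

    void share_example(void) {
        int fd = open("/tmp/log", O_WRONLY | O_CREAT, 0644);

        int worker_fd = dup(fd);       // the worker's own reference;
                                       // the open file's refcount is now 2

        close(fd);                     // our reference is gone...
        write(worker_fd, "hi\n", 3);   // ...but the file stays open for the worker

        close(worker_fd);              // last reference: now it really closes
    }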

POSIX descriptors also lack the feature described in the article: tag bits. So that is to say, imagine a parallel universe with POSIX descriptors that are not simply non-negative integers clumped close to zero, but have two parts: the integer clumped close to zero, and some bits indicating a generation. For instance suppose descriptor 0x01FC means "generation 0, index 1FC". Then suppose you close(0x01FC) such that the 1FC slot is the lowest available position. Something opens a new descriptor. That results in 0x11FC, not 0x01FC. The generation has incremented. Now suppose a double close happens: another close(0x01FC). The kernel extracts the generation 0, and looks at object 1FC. It sees, hey that thing is generation 1! This is a stale handle! and returns an error.


floooh suggests handles not be plain ints, but also have some bits with a 'unique pattern' (eg. generation counter) for safety checks. This is one of the reasons given for handles being better than pointers.


I am surprised that not every large project uses this technique. If you are running on a 64-bit system, the use of 32-bit handles instead of pointers results in huge memory savings. You can also put the start addresses of the arrays (which the handles index into) anywhere in virtual memory and grow them as needed. Finally, you can stripe the data for one object into multiple arrays and use the same handle to index each of them, which can improve locality.


> you can stripe the data for one object into multiple arrays and use the same handle to index each of them which can improve locality.

I wish programming languages made it easy to structure arrays of objects like this. I think Jonathan Blow's Jai has some kind of transparent support for SoA.


This basically is how tables work in k. Tables are defined as a list of dictionaries, but in fact stored as an array for each element (ie. each column of the table is an array, but each row can be read as a dictionary).


It’s trivial to do this in C++ and often the caller doesn’t even realize.


Do you have any pointers to such an implementation?


We have a mixin class that will allocate a subclass with its own array of handles (the handles overload * and -> so they act like pointers). If you have an instance variable, you can choose to give it a different allocator, where that instance variable of the object with handle `n` is the nth object in the new area. Then your accessor `barf& foo();`, instead of returning the instance variable `barf m_foo`, does some inlined arithmetic and gives the caller a reference to the object.
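
Presumably something along these lines (a reconstruction, not the poster's actual code): the handle stores an index `n`, and the overloaded operators do the pool arithmetic.

    #include <cstdint>
    #include <vector>

    template <typename T>
    struct Pool {
        static std::vector<T> items;     // module-private storage per type
    };
    template <typename T> std::vector<T> Pool<T>::items;

    template <typename T>
    struct Handle {
        uint32_t n;                      // index into Pool<T>::items
        T& operator*()  const { return Pool<T>::items[n]; }
        T* operator->() const { return &Pool<T>::items[n]; }
    };

    struct barf { int value = 0; };

    int use(Handle<barf> h) {
        return h->value;                 // reads like a pointer dereference
    }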


I don't know. The biggest saving from not malloc-ing small objects comes from the per-allocation memory-chunk overhead, I believe. You could do object pools and still use pointers as usual, I think, with no runtime cost of passing handles to getter functions. The usual pattern of "wrapping" malloc/free.


What about CPU data cache misses due to the fact that malloc doesn't have a concept of how best to organize that chunk of data based on your application's specific usage pattern?


That can definitely be a problem. Using handles makes it easier to swap out your allocation mechanism if you ever have to optimize your cache efficiency.

Hell, projects that need to squeeze out that level of performance will typically have multiple different memory allocation mechanisms.


Yeah, well, then you'll need something fancy. The size saving is there, though.


Object pools pale in comparison to striping the data into arrays (or, as we call it in the game dev world, a Struct of Arrays) when it comes to reducing cache misses.


Given that you iterate over the object fields in a "column-like" manner, right? Random access should be slower, since an object of, say, four fields would have to fetch four cache lines.


You're more likely to have the next object cached by the time you access it compared to separate allocations.

The more sequential the access pattern and the smaller the objects, the greater the benefit.


Hell, you might even be able to go smaller. For example, if you know that you're going to have less than 256 instances of a given object, an 8-bit handle will do the job nicely at an eighth of the size of a full-blown 64-bit pointer.


Until you get too many objects. That can happen quickly nowadays.


Thinking out loud:

Since handles refer to offsets rather than absolute addresses, this sort of scheme should allow for "serializing" data by memory-mapping chunks of memory to disk.

Obviously, this scheme wouldn't be useful for cross-platform save files, and might not even translate across builds.

Still, as long as you have a robust scheme for invalidating data when a new version is deployed, I could see this being useful for caching, and for saving state for short periods of time on mobile platforms that like to restart processes with little warning. (Rather than potentially losing your place within the game level any time you switch between apps, you only lose your place within the game level when an app update gets installed.)


Handles are essentially integer indexes, relative to a base pointer, also implying a certain object size (needed for index multiplication). A nice property therefore is that the base pointer can be different each time the program is started, but the handle stays constant.

So in case that module-private array is mmapped to disk, handles themselves can be serialised, unlike pointers. This can enable quick and easy storage of most of the program state, allowing for a fast recovery afterwards.
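
A sketch of why that works (plain file I/O instead of mmap for brevity; POD types and error handling assumed away): the array round-trips byte-for-byte, and a handle saved alongside it still refers to the same object after reload, which a pointer never would.

    #include <cstdint>
    #include <cstdio>

    struct Enemy { float x, y; int hp; };   // POD: safe to dump as raw bytes
    static Enemy enemies[256];

    void save(const char* path, uint16_t player_target /* a handle */) {
        FILE* f = fopen(path, "wb");
        fwrite(enemies, sizeof(enemies), 1, f);
        fwrite(&player_target, sizeof(player_target), 1, f);  // handles serialize
        fclose(f);
    }

    uint16_t load(const char* path) {
        FILE* f = fopen(path, "rb");
        fread(enemies, sizeof(enemies), 1, f);
        uint16_t player_target;
        fread(&player_target, sizeof(player_target), 1, f);
        fclose(f);
        return player_target;               // still indexes the same enemy slot
    }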


We used to call this "seekfree" loading back in games because you could read the whole game level in one chunk without seeking on spinning media.

If you wanted pointers out of it after you loaded it you just had a fix-up phase that swapped the offsets for pointers.

We also used the non pointer mmap version on eMMC platforms for really easy ways to store large data while letting the kernel handle the paging with clean pages.

Also forces you to think about data locality which is a powerful thing.


Excellent insight, thank you.


MacOS 1.0 in 1984 used handles in order to allow the heap to be coalesced more easily (although locked handles and regular pointers often created islands). Given the first Mac had 128K RAM and no MMU, it was highly necessary at the time.


Those were not the same kind of handles, AFAIK. This particular article is about pooling objects of the same type AND using indices (which the article calls handles) to reference them. They aren't handles by any definition I've heard of in the past. AFAIK the definition of a handle is an opaque reference to some object; when you want to use the object, you ask the handle for the object's actual address, being aware that the address is only valid for some defined period (like having to call p = acquire(handle) ... do work with p ... release(handle); p is now not valid).

Handles by that definition have the advantage that the underlying memory can be moved around to clear up fragmentation, but they have none of the other benefits this particular post is talking about. Those benefits only come from their specific implementation, not from the idea of "handles".


Microsoft Windows did the same:

https://devblogs.microsoft.com/oldnewthing/20041104-00/?p=37...

In later versions, heap memory handles were even implemented in Intel x86 hardware:

https://devblogs.microsoft.com/oldnewthing/20041105-00/?p=37...


Aren't *nix file descriptors kind of an OS "handle"?... Okay, I'm hiding in case anyone sees this comment and lobs a criticism my way ;).


No need to hide, same thought occurred to me reading the article too.


Of course you mean System 1, not MacOS 1.0. Handles were used all the way through System 7, last released in 1997. Perhaps by then it was more about the legacy than the need. Don't know the physical limits, but a Quadra 950 could support 256MB RAM and more through VM (ramdisk), and of course these later gen machines did have MMUs.

In ye olden days, addressing was probably physical, so being non-multitasking ensured the safety of using handles, i.e. if used properly. IOW, the system process was free to clean up the heap and manipulate your pointers because your app would only yield at specific points where you weren't holding onto a dereferenced handle.


> Handles were used all the way through System 7

Handles continued to be used all the way through Mac OS 9, as they were used by a ton of critical Toolbox APIs like the Resource Manager. A vestigial version even existed in Mac OS X -- handles still existed in Carbon, but I'm not sure they would ever be relocated by the system like they were previously.

> of course these later gen machines did have MMUs.

They did, but it was only used to simulate a system with more physical memory. Every process still saw the same view of a single "flat" memory space.


> the system process was free to clean up the heap and manipulate your pointers b/c your app would only yield at specific points where you aren't holding onto a dereferenced handle.

True, but it was also your job as a developer to not be holding onto those pointers when the memory manager ran.


Oh right, my modern brain went to the wrong name.


Yes, and it could be a huge pain to develop in. We always wrapped the handles in library calls that asserted on them, which minimized problems.

Ideas are in a wheel, same ones always coming around again.


Came here to say this. Referencing objects through handles allowed much better memory management on a primitive OS like System 3.


In many ways a pointer on a modern OS with a virtual memory subsystem really is a handle already. It's not like it points to a single given exact location in physical memory. The OS and CPU are already presenting a bit of a facade.


Yes! For example, in the GSS-API the various complex, opaque objects in the C bindings are specified in the RFC (2744) as:

   typedef <platform-specific> gss_ctx_id_t;
   typedef <platform-specific> gss_cred_id_t;
   typedef <platform-specific> gss_name_t;
Sadly this is always a pointer in actual implementations, but should have been a handle, where a handle is a structure containing two fields: an index or pointer-like value, and a verifier that can be used to check that the handle is [still] valid. Using a handle would have made it possible to get much better memory safety.


Perhaps they should ask the old MacOS toolbox Handle users what they think.

(Pretty painful trade-off in the long run, is what I think, having lived through that.)


Visions of the "moves memory" icon in THINK Reference came to mind as soon as I saw the headline.


I think it would be much easier to pull off and work with now than then. The C++ world now has RAII and smart pointers, so locking and unlocking could be something the compiler does for you based on scope, all using pointer syntax.


Having worked on games that used pooled allocation techniques like this, and others that just had tons of tiny allocations in a single heap, my experience is that the latter pain is much worse.



(2018)

This is often a really useful strategy to use in Rust code as well.


There's even a crate for it: "generational_arena".


You can also use it in GCed languages if you want to avoid creating cycles in the object graph.


Yes, Go supports this idiom very well.


Thanks for that contribution, Steve


Also helps with arrays that you might wish to realloc. I've sometimes hit the (obvious, in hindsight) bug that you realloc an array and all pointers to members that are held elsewhere suddenly become invalid.
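
A minimal illustration: after the vector grows, the pointer is dangling but the index still works.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> v = {10, 20, 30};

        int* ptr = &v[1];        // pointer into the array
        size_t idx = 1;          // handle-style reference

        v.resize(1000000);       // likely reallocates: ptr now dangles

        // printf("%d\n", *ptr); // undefined behaviour
        printf("%d\n", v[idx]);  // fine: prints 20
        return 0;
    }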


Somehow, my first specialty as a developer was perf analysis, and so all through the Java Golden Era I read every press release about the JVM and subscribed to ACM SIGPLAN, which had at least one paper on GC in virtually every proceedings it published, and some occasional cool stuff on interpreter construction. I believe the fourth technical book I read was The Java Virtual Machine, after Stevens, Comer (3 volumes), and The Java Programming Language.

The original GC had handles. With HotSpot (JDK 1.2, IIRC?) those were gone, and by Java ~8 they were back again, but embedded in the object header, where the pointer indirection presumably hurts less.

In an alternate universe, Sun would have shrunk the memory size for Strings earlier and kept the overhead per object, in which case I suspect the GC stall around 1-2GB would never have happened. But I don't think they had admitted to themselves how Stringly Typed production Java code was, and how UTF-16 wasn't going to last forever.


We can solve any problem by introducing an extra level of indirection.

— “Fundamental theorem of software engineering”

https://en.wikipedia.org/wiki/Fundamental_theorem_of_softwar...


shared_ptrs, if misused, can cause a lot of memory management issues, because you end up with a web of object lifetime dependencies that is hard to reason about. I think this is a severely underrated problem in C++, particularly for newer developers. shared_ptr is extremely powerful, but it makes it easy to gloss over issues of object lifetimes and ownership, when in fact those issues are still very important for building robust systems.

In the code bases I've worked on, we have pushed hard to have unique ownership wherever possible and only use shared_ptr where there was a clear need. You can still have a huge object graph kept alive by a single unique_ptr, but it happens less often and it's easier to trace back and fix.

A nice thing about the handle approach is that it makes it a lot harder to build up these object graphs or in general to implement anything without being explicit about object lifetime.

I've seen parts of the handle/object-pool approach misapplied and cause more trouble than it's worth, though. It's a good idea for self-contained subsystems where there are a limited number of object types that you would apply this to. I don't think it scales to 100s or 1000s of distinct types of objects, because then you're going to have headaches dealing with the sheer number of object pools.

I've also seen object pooling be implemented proactively as an "optimisation" to avoid calling malloc() but then become a bottleneck because of lock contention in the object pool.


Does this assume that the handle consumers have to somehow 'release' the handle in addition to 'deleting' it? As was explained with the example of delayed release of some deleted but still GPU-queued element.

I'm not sure if the suggested generation counter approach is the optimal way to deal with the delayed release. Also, when it comes down to writing your own memory managers, benchmarking is a must. Added complexities may eventually eat up the initial savings.


The "generation counter" suggested in the article has nothing to do with delayed release, as far as I understand. It is used as a strategy to avoid returning colliding handles close in the time domain, reducing the chances for a collision of the "unique bit pattern", which would allow non-detectable dangling access conditions.


> ... non-detectable dangling access conditions.

My understanding is that the dangling access conditions potentially arise due to the delayed release of deleted handles.

["... Instead the rendering system would only mark the resource object for destruction when the user code requests its destruction, but the actual destruction happens at a later time when the GPU no longer uses the resource."]


Not necessarily; there's also the "use after free" error, which can happen anywhere.


I have a standard mixin class that does this, overloading * and -> so users often don’t even notice.


Curious how Rust's ownership interplays with this memory management approach. Does it actually get in the way of implementing this?


On the contrary, it kind of encourages it https://kyren.github.io/2018/09/14/rustconf-talk.html



