Yes, I am wondering too. How would it even stop use-after-free and double-free? ...

jsnell · on July 12, 2023

I think the part where your reasoning is invalid is this part:

> so its generation number is undefined

With e.g. a random C compiler and a random malloc, that's true. But why couldn't the language and runtime cooperate to ensure it is defined?

For example, deallocation can write a predictable value to that slot, one that is never used as a legitimate generation. The memory allocator can make sure that a memory address that ever contained a generation never contains anything other than generations for the entire runtime of the program (e.g. by ensuring that, for a given page, all objects are the same size and allocations are aligned to that size). The language can make sure that nothing else gets written to such an address by enforcing bounds checks.
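
A minimal C sketch of that idea, under stated assumptions: a single fixed-size pool stands in for "a page where all objects are the same size", and `Pool`, `Slot`, `GenRef`, and `DEAD_GENERATION` are hypothetical names chosen for illustration, not anything from Vale's runtime. Because every slot has the same layout, the generation field never moves, and freeing writes a sentinel that no live reference can hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_COUNT 4
#define PAYLOAD_SIZE 24
#define DEAD_GENERATION 0u /* sentinel: never handed out as a live generation */

/* Every slot in the pool has the same layout, so the generation always
   lives at the same offsets for the pool's entire lifetime. */
typedef struct {
  uint64_t generation;
  unsigned char payload[PAYLOAD_SIZE];
} Slot;

typedef struct {
  Slot slots[SLOT_COUNT];
  uint64_t next_generation; /* starts at 1; 0 is reserved for "dead" */
} Pool;

typedef struct {
  Slot *slot;
  uint64_t remembered_generation;
} GenRef;

static void pool_init(Pool *pool) {
  for (size_t i = 0; i < SLOT_COUNT; i++) {
    pool->slots[i].generation = DEAD_GENERATION;
  }
  pool->next_generation = 1;
}

/* Returns a reference to the first dead slot, or a null reference if full. */
static GenRef pool_alloc(Pool *pool) {
  for (size_t i = 0; i < SLOT_COUNT; i++) {
    Slot *s = &pool->slots[i];
    if (s->generation == DEAD_GENERATION) {
      s->generation = pool->next_generation++;
      return (GenRef){ s, s->generation };
    }
  }
  return (GenRef){ NULL, DEAD_GENERATION };
}

static bool ref_is_live(GenRef ref) {
  return ref.slot != NULL && ref.slot->generation == ref.remembered_generation;
}

/* Freeing writes the sentinel, so stale references fail the check.
   A double free is rejected because the reference is already stale. */
static bool pool_free(GenRef ref) {
  if (!ref_is_live(ref)) return false;
  ref.slot->generation = DEAD_GENERATION;
  return true;
}
```

In this sketch a second `pool_free` on the same reference returns `false`, and a reference to a slot that has since been reallocated fails `ref_is_live` because the slot now carries a newer generation.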

verdagon · on July 12, 2023

Yep, this is the correct answer. Accessing released memory is undefined in C, but well-defined in Vale. The goal is to ensure that the user predictably+safely gets either a segmentation fault or an assertion failure.

We have a future improvement planned here too: for unrelated reasons (to support generation pre-checking) the random generational references implementation will soon not even unmap any virtual address space, instead remapping it to a central page, so we won't even get any segmentation faults, just assertion failures.

ajb · on July 12, 2023

Ok but then aren't you going to get memory fragmentation? If you allocate and then deallocate a billion 1kB objects, you can't then coalesce them to allocate larger units because the generation number locations before each 1kB can't be given back to user code.

verdagon · on July 12, 2023

In the basic generational references approach that was a drawback, and the reason it couldn't release memory back to the OS. We planned to use something like MESH [0] to reduce the fragmentation.

We created two newer approaches since then, which let any memory be reused for any purpose:

* Random generational references, where it's fine if generations overlap with other data.

* Side-table generations, which is slower, but keeps the generations in a side table. It can be seen in old 0.1 versions as the "resilient-v2" mode, and I plan on resurrecting it for unrelated reasons.

The former will be the default, and the latter we'll be adding back in as an option. Hope that helps!
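
To make the side-table variant concrete, here is a rough C sketch. It is an assumption about the general shape of such a scheme, not a description of how "resilient-v2" was actually implemented: the heap is divided into fixed 16-byte granules, a separate array holds one generation per granule, and `SideRef`, `side_ref_to`, and the other names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE 16
#define GRANULE_COUNT 8

static unsigned char heap[GRANULE * GRANULE_COUNT]; /* user data only */
static uint32_t generations[GRANULE_COUNT]; /* side table, one entry per granule */

typedef struct {
  void *ptr;
  uint32_t remembered_generation;
} SideRef;

/* Maps a heap address to its entry in the side table. */
static size_t granule_of(const void *ptr) {
  return (size_t)((const unsigned char *)ptr - heap) / GRANULE;
}

/* Hypothetical helper: make a reference to the object starting at a granule. */
static SideRef side_ref_to(size_t granule) {
  return (SideRef){ &heap[granule * GRANULE], generations[granule] };
}

/* The check reads only the side table, never the heap bytes themselves. */
static bool side_ref_is_live(SideRef ref) {
  return generations[granule_of(ref.ptr)] == ref.remembered_generation;
}

/* Freeing bumps the table entry; the heap bytes can then hold anything. */
static void side_free(SideRef ref) {
  generations[granule_of(ref.ptr)]++;
}
```

Since no generation lives inside `heap`, user code can overwrite any byte of it without affecting the check. The cost is the extra address-to-index computation and a second memory access on every check.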

[0] https://arxiv.org/pdf/1902.04738.pdf

flohofwoe · on July 12, 2023

Traditional memory allocators set aside some of the memory for metadata (for instance, to keep track of allocated and free memory regions). I guess that Vale stores the generation count associated with an "allocation item" in a similar way, e.g. somewhere other than in the actual items.

Also, the blog post talks about 'generational indices', not pointers. This seems to indicate that items of the same type (or at least same size) are grouped into arrays (and since it's an index anyway, the metadata could be stored in one or multiple separate arrays at the same index).
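
As a sketch of that array-based layout in C (assuming one fixed-capacity array per item type; `Item`, `Handle`, and the `item_*` functions are hypothetical names, not taken from Vale or from the linked post): a handle carries an index plus the generation it saw at creation, and the generations live in a separate array at the same index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ITEMS 4

typedef struct { float x, y; } Item; /* hypothetical payload type */
typedef struct { uint32_t index; uint32_t generation; } Handle;

static Item items[MAX_ITEMS];                /* payload array */
static uint32_t item_generations[MAX_ITEMS]; /* metadata, same index */
static unsigned char item_in_use[MAX_ITEMS];

/* Returns 1 and fills *out on success, 0 if the array is full. */
static int item_create(Handle *out) {
  for (uint32_t i = 0; i < MAX_ITEMS; i++) {
    if (!item_in_use[i]) {
      item_in_use[i] = 1;
      out->index = i;
      out->generation = item_generations[i];
      return 1;
    }
  }
  return 0;
}

/* Turns a handle into a pointer, or NULL if the handle is stale. */
static Item *item_lookup(Handle h) {
  if (h.index >= MAX_ITEMS || !item_in_use[h.index]) return NULL;
  if (item_generations[h.index] != h.generation) return NULL;
  return &items[h.index];
}

/* Destroying bumps the generation so older handles stop resolving. */
static void item_destroy(Handle h) {
  if (item_lookup(h) == NULL) return; /* stale handle or double destroy */
  item_in_use[h.index] = 0;
  item_generations[h.index]++;
}
```

The pointer returned by `item_lookup` is only trustworthy until the next destroy, so callers would look it up again rather than store it.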

PS: I already linked it elsewhere, but here's how the same can be achieved without language/compiler support (at least it's the same general idea): https://floooh.github.io/2018/06/17/handles-vs-pointers.html

The big step forward by Vale is that the compiler can elide most of the 'dangling checks' on memory accesses. The method outlined in the blog post requires a few rules of thumb that the coder must follow when using a pointer that's been looked up from a generational index.

conaclos · on July 12, 2023

If I am not wrong, there is a generation number embedded in the reference (smart pointer?). This allows checking whether the generation of the reference and the generation of the referent match.

marhee · on July 12, 2023

Yes, there is a generation number in the reference. It is checked against the allocation's current generation number, which is stored in the allocation itself:

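  void __check(GenerationalReference genRef) {
    // The current generation is read from the 8 bytes just before the allocation.
    uint64_t currentGeneration = *(uint64_t*)((char*)genRef.alloc - 8);
    // A mismatch means the reference is stale.
    assert(genRef.rememberedGeneration == currentGeneration);
  }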

So indeed it allows you to check for a match, as long as the alloc pointer is valid. The alloc pointer is invalid after a free, because it may be in a region no longer accessible to the program (it was returned to the OS by free's implementation) or it was given out as part of another allocation, in which case it can hold arbitrary data.
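
For illustration, here is a self-contained C sketch of how the generation could end up 8 bytes before the object in the first place. The `GenerationalReference` fields follow the snippet above; `gen_alloc`, `gen_matches`, `gen_retire`, and the global counter are hypothetical, and plain `malloc` is used only to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HEADER_SIZE 8 /* the generation sits in the 8 bytes before the object */

typedef struct {
  void *alloc;
  uint64_t rememberedGeneration;
} GenerationalReference;

static uint64_t next_generation = 1; /* hypothetical global counter; 0 means dead */

/* Allocates size bytes preceded by a generation header.
   The payload is only guaranteed 8-byte alignment in this sketch. */
static GenerationalReference gen_alloc(size_t size) {
  GenerationalReference ref = { NULL, 0 };
  unsigned char *block = malloc(HEADER_SIZE + size);
  if (block == NULL) return ref;
  uint64_t generation = next_generation++;
  memcpy(block, &generation, sizeof generation);
  ref.alloc = block + HEADER_SIZE;
  ref.rememberedGeneration = generation;
  return ref;
}

/* Same comparison as __check, but returns a bool instead of asserting. */
static bool gen_matches(GenerationalReference ref) {
  uint64_t current;
  memcpy(&current, (unsigned char *)ref.alloc - HEADER_SIZE, sizeof current);
  return current == ref.rememberedGeneration;
}

/* Invalidates outstanding references but keeps the memory allocated, so a
   later gen_matches call still reads a defined value. */
static void gen_retire(GenerationalReference ref) {
  uint64_t dead = 0;
  memcpy((unsigned char *)ref.alloc - HEADER_SIZE, &dead, sizeof dead);
}
```

The sketch deliberately separates "retire" from actually releasing the block. Once the block goes back to `free`, reading the header is exactly the invalid access described above, which is why the check needs cooperation from the allocator to stay well-defined.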