How would it even stop use-after-free and double-free?
The "check" function accesses the allocation because it needs the generation number of the allocation. So basically, the reference needs to access the allocation to check if it can access the allocation. Right.
(That doesn't work of course, because if the allocation was freed, access to the allocation and so its generation number is undefined).
This seems obvious so maybe I am missing something big here?
Or something entirely different is meant or targeted here with "memory safety".
I think the part where your reasoning breaks down is this:
> so its generation number is undefined
With e.g. a random C compiler and a random malloc, that's true. But why couldn't the language and runtime cooperate to ensure it is defined?
For example, deallocation can write a predictable value to that slot, one that is never used as a legitimate generation. The memory allocator can make sure that a memory address that ever contained a generation can never contain anything other than generations for the entire runtime of the program (e.g. by ensuring that, for a given page, all objects are the same size and the allocations are aligned to that size). The language can make sure that nothing else gets written to such a memory address by enforcing bounds checks.
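To make that concrete, here is a toy model in Python of the scheme as I read it. It is only a sketch: the names (`SlotPool`, `FREED`, the `(index, generation)` reference tuple) are made up for illustration and are not Vale's runtime. The point it shows is that the generation cell of a slot outlives the object stored in it, so checking a stale reference reads a well-defined value.

```python
FREED = 0  # sentinel written on deallocation; never handed out as a live generation


class SlotPool:
    """Same-size slots whose generation cell is never reused for anything else."""

    def __init__(self, slot_count):
        # Each slot is [generation, payload]; the generation cell exists for the whole run.
        self.slots = [[FREED, None] for _ in range(slot_count)]
        self.free_slots = list(range(slot_count))
        self.next_generation = 1  # live generations start at 1, so FREED never matches

    def allocate(self, payload):
        index = self.free_slots.pop()
        generation = self.next_generation
        self.next_generation += 1
        self.slots[index][0] = generation
        self.slots[index][1] = payload
        return (index, generation)  # the reference carries its own copy of the generation

    def check(self, ref):
        index, generation = ref
        # Reading the generation cell is always defined, even after a free.
        if self.slots[index][0] != generation:
            raise AssertionError("stale reference")

    def read(self, ref):
        self.check(ref)
        return self.slots[ref[0]][1]

    def free(self, ref):
        self.check(ref)  # a double free fails here
        index, _ = ref
        self.slots[index][0] = FREED
        self.slots[index][1] = None
        self.free_slots.append(index)


pool = SlotPool(2)
a = pool.allocate("first")
pool.free(a)
b = pool.allocate("second")  # reuses the slot that `a` pointed at
try:
    pool.read(a)
except AssertionError as err:
    print("use-after-free caught:", err)
```

A use-after-free and a double free both end in the same assertion failure, because the stale reference's generation can never equal `FREED` or any later generation written to that slot.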
Yep, this is the correct answer. Accessing released memory is undefined in C, but well-defined in Vale. The goal is to ensure that the user predictably and safely gets either a segmentation fault or an assertion failure.
We have a future improvement planned here too: for unrelated reasons (to support generation pre-checking) the random generational references implementation will soon not even unmap any virtual address space, instead remapping it to a central page, so we won't even get any segmentation faults, just assertion failures.
Ok but then aren't you going to get memory fragmentation? If you allocate and then deallocate a billion 1kB objects, you can't then coalesce them to allocate larger units because the generation number locations before each 1kB can't be given back to user code.
In the basic generational references approach that was a drawback, and the reason it couldn't release memory back to the OS. We planned to use something like MESH [0] to reduce the fragmentation.
We created two newer approaches since then, which let any memory be reused for any purpose:
* Random generational references, where it's fine if generations overlap with other data.
* Side-table generations, which is slower, but keeps the generations in a side-table. It can be seen in old 0.1 versions as the "resilient-v2" mode, and I plan on resurrecting it for unrelated reasons.
The former will be the default, and the latter we'll be adding back in as an option. Hope that helps!
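For readers wondering how generations can safely "overlap with other data", here is a rough Python sketch of one way to read the random variant. Everything in it is an assumption for illustration: the 8-byte generation width, the `ToyHeap` class, scrambling the generation on release, and the layout with the generation directly in front of the payload are not taken from Vale's implementation. The idea it models is that if a generation is a random 64-bit value, then whatever bytes later land on its old location match a stale reference with probability of about 2^-64.

```python
import random

GEN_BYTES = 8  # assumed generation width; 64 bits makes an accidental match very unlikely


class ToyHeap:
    """Flat byte heap where any region may be reused for any purpose."""

    def __init__(self, size, seed=0):
        self.mem = bytearray(size)
        self.rng = random.Random(seed)  # seeded only to keep the demo repeatable

    def _fresh_generation(self):
        return self.rng.getrandbits(8 * GEN_BYTES).to_bytes(GEN_BYTES, "little")

    def allocate(self, addr, payload):
        """Place a random generation at addr, then the payload right after it."""
        end = addr + GEN_BYTES + len(payload)
        if end > len(self.mem):
            raise ValueError("allocation does not fit")
        generation = self._fresh_generation()
        self.mem[addr:addr + GEN_BYTES] = generation
        self.mem[addr + GEN_BYTES:end] = payload
        return (addr, generation)

    def release(self, ref):
        """Scramble the generation so existing references stop matching."""
        addr, _ = ref
        self.mem[addr:addr + GEN_BYTES] = self._fresh_generation()

    def is_live(self, ref):
        addr, generation = ref
        return bytes(self.mem[addr:addr + GEN_BYTES]) == generation


heap = ToyHeap(64)
small = heap.allocate(16, b"abcd")
heap.release(small)
# A larger object now covers bytes 0..47, including the old generation at 16..23.
big = heap.allocate(0, b"x" * 40)
print(heap.is_live(small), heap.is_live(big))
```

In this model nothing has to be reserved for generations forever, which is what removes the fragmentation concern. The trade-off is that the check becomes probabilistic rather than exact.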
Traditional memory allocators set aside some of the memory for metadata (for instance, to keep track of allocated and free memory regions). I guess that Vale stores the generation count associated with an "allocation item" in a similar way, i.e. somewhere other than the actual items.
Also, the blog post talks about 'generational indices', not pointers. This seems to indicate that items of the same type (or at least same size) are grouped into arrays (and since it's an index anyway, the metadata could be stored in one or multiple separate arrays at the same index).
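A minimal sketch of that layout in Python, assuming one array of items plus a parallel array of generations at the same index (the `Arena` class and its method names are hypothetical, not taken from the blog post or from Vale):

```python
class Arena:
    """Generational-index arena with metadata kept in a separate parallel array."""

    def __init__(self):
        self.items = []         # payloads only, no metadata mixed in
        self.generations = []   # generations[i] belongs to items[i]
        self.free_indices = []  # indices available for reuse

    def insert(self, item):
        if self.free_indices:
            index = self.free_indices.pop()
            self.items[index] = item
        else:
            index = len(self.items)
            self.items.append(item)
            self.generations.append(0)
        return (index, self.generations[index])  # handle = (index, generation)

    def is_live(self, handle):
        index, generation = handle
        return index < len(self.generations) and self.generations[index] == generation

    def get(self, handle):
        """Return the item, or None if the handle is stale."""
        return self.items[handle[0]] if self.is_live(handle) else None

    def remove(self, handle):
        if not self.is_live(handle):
            return False  # stale handle: already removed, nothing to do
        index, _ = handle
        self.items[index] = None
        self.generations[index] += 1  # invalidates every outstanding handle to this index
        self.free_indices.append(index)
        return True


arena = Arena()
old = arena.insert("player")
arena.remove(old)
new = arena.insert("enemy")  # lands at the same index with a newer generation
print(arena.get(old), arena.get(new))
```

Because a handle is an index rather than a pointer, a stale lookup only ever touches the arena's own arrays, so it can return `None` instead of reading freed memory.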
The big step forward by Vale is that the compiler can elide most of the 'dangling checks' on memory accesses, whereas the method outlined in the blog post requires a few rules of thumb that the coder must follow when using a pointer that's been looked up from a generational index.
If I am not wrong, there is a generation number embedded in the reference (smart pointer?). This makes it possible to check whether the generation of the reference and the generation of the referent match.
So indeed it allows you to check for a match, as long as the alloc pointer is valid. The alloc pointer is invalid after a free, because it may be in a region no longer accessible to the program (it was returned to the OS by free's implementation) or it was given out as part of another allocation, in which case it can hold arbitrary data.