Using uninitialized memory for fun and profit (2008) (swtch.com)
120 points by Smaug123 on Sept 4, 2023 | 67 comments



A similar data-structure that is also very useful uses sparse-dense arrays but encodes a free-list in the unused memory. It's an ideal data structure if your access patterns are sporadic reads/writes bookended by frequent complete iterations of the dense array. You can insert and remove (swap 'n' pop) in O(1) and iterate in O(n) without polluting your cache or skipping over empty slots.

It's frequently used in Entity-Component Systems (ECS), which introduce safety by associating a counter with each entry in the sparse array (usually called a "generation") and incrementing it on deletion. You can also verify cross-linkage between the sparse and dense arrays like the data structure described in the article, but that comes at the cost of another indirection and fetch from memory.
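
A minimal C++ sketch of that generation check (names and integer widths are just illustrative, not taken from any particular ECS):

  #include <cstdint>
  #include <vector>

  struct Handle      { uint32_t slot; uint32_t generation; };
  struct SparseEntry { uint32_t dense_index; uint32_t generation; };

  // a handle is stale once a deletion has bumped the slot's generation
  bool is_alive(const std::vector<SparseEntry>& sparse, Handle h) {
      return h.slot < sparse.size() && sparse[h.slot].generation == h.generation;
  }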


Is this related to std::hive [1] as proposed for the next C++ standard?

[1] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p04...


At a quick glance, this is very close. However, the proposal goes through several contortions to maintain pointer stability. If you preallocate sufficient storage for your worst case (which can be unmapped virtual memory) then it makes implementation a lot simpler.

Neat that the committee is looking to introduce it. Too bad we eschew the STL (and even CRT) in games.


This sounds like just what I need for a project. Any idea where I could find more info on this, or what it's called?


In Rust there's my crate slotmap; the design described above matches its DenseSlotMap type. It's also often called a generational arena.


There are some articles floating around that touch on the idea. Bitsquid comes to mind. [1]

To give a slightly more detailed description, you allocate two arrays of the same extent. One is the dense array of whatever type, say T, and the other is the sparse array of 32-bit or 64-bit integers. The sparse array stores indices into the dense array for extant objects. However, for extinct objects, the indices instead point into the sparse array itself to encode a free-list.

To insert, in O(1), you check the free-list for an unused slot. If there is none, then you grow the sparse array (increment a counter) and use the newly introduced slot. Then you insert (or construct in place) your object at the end of the dense array. Finally, you map that slot in the sparse array to the end of the dense array.

To delete, in O(1), you swap the object you wish to delete with the last in the dense array and decrement the counter (swap and pop). Then you add the index in the sparse array to the free-list. You’ll need to know the index assigned to the object you swapped with to do this, which can either be stored on the object itself or by introducing another array that maintains an inverse mapping.

To iterate, you can treat the dense array like any other array.
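
Putting that together, a rough C++ sketch of the scheme (generations omitted; every name here is invented, not production code):

  #include <cstdint>
  #include <utility>
  #include <vector>

  template <typename T>
  struct SparseDense {
      static constexpr uint32_t kNone = UINT32_MAX;

      std::vector<uint32_t> sparse;          // slot -> dense index, or next free slot
      std::vector<T>        dense;           // tightly packed objects
      std::vector<uint32_t> dense_to_sparse; // dense index -> slot (inverse mapping)
      uint32_t free_head = kNone;            // head of free list threaded through 'sparse'

      // O(1) insert: reuse a free slot if any, otherwise append a new one
      uint32_t insert(T value) {
          uint32_t slot;
          if (free_head != kNone) {
              slot = free_head;
              free_head = sparse[slot];          // pop from the free list
          } else {
              slot = (uint32_t)sparse.size();
              sparse.push_back(0);
          }
          sparse[slot] = (uint32_t)dense.size();
          dense.push_back(std::move(value));
          dense_to_sparse.push_back(slot);
          return slot;                           // hand this out as the handle
      }

      // O(1) delete: swap with the last dense element, pop, fix the mappings
      void erase(uint32_t slot) {
          uint32_t d    = sparse[slot];
          uint32_t last = (uint32_t)dense.size() - 1;
          dense[d] = std::move(dense[last]);
          uint32_t moved_slot = dense_to_sparse[last];
          dense_to_sparse[d]  = moved_slot;
          sparse[moved_slot]  = d;
          dense.pop_back();
          dense_to_sparse.pop_back();
          sparse[slot] = free_head;              // push the slot onto the free list
          free_head    = slot;
      }
  };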

This scheme poses some challenges:

- You lose pointer stability because you are relocating your objects. You must access them through indices. If you can defer your deletions, say to a frame boundary, you can assume your pointers are stable in between.

- Indices, without further work, are not safe. They are just as safe as pointers. Usually you don’t expect more than N objects, so you can use log2(N) bits for the forward and reverse mappings between sparse and dense arrays and then use the leftover bits to identify stale indices. You can, and should, encapsulate this in a type-safe handle system. [2][3]

- High churn can cause index invalidation to fail (if a counter wraps), and under heavy churn this performs worse than other data structures.

- “Swap and pop” requires T to be trivially copyable, or ideally, trivially relocatable.

It’s also worth mentioning that you can reorder the dense array to maintain custom invariants. This is really useful if you have dependencies between objects and want to iterate in order.[4]

[1]: http://bitsquid.blogspot.com/2014/08/building-data-oriented-...

[2]: https://floooh.github.io/2018/06/17/handles-vs-pointers.html

[3]: You can usually use 24 bits for the index and 8 bits for a discriminator (counter) in games; a small packing sketch follows these footnotes. Handles (that you pass around) can be wider to exploit register widths and increase safety, since you then have 32 spare bits for further runtime checks.

[4]: By doing more work at insertion and deletion, a constant factor on O(1), you can save work on iteration or updates, a constant factor on O(n). A great example of this being a worthwhile optimization is for scene graphs or transform hierarchies. If you partially order by depth in the hierarchy, you can guarantee correct computation without chasing indexes or pointers. If you maintain additional information about boundaries, you can trivially parallelize computation of transforms or reduce redundant work.
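
Footnote [3]'s split might look like this (an illustrative sketch; the widths are a common choice, not mandated by anything):

  #include <cstdint>

  // pack a 24-bit index and an 8-bit generation into one 32-bit handle
  struct Handle32 {
      uint32_t bits;

      static Handle32 make(uint32_t index, uint8_t generation) {
          return { (index & 0xFFFFFFu) | (uint32_t(generation) << 24) };
      }
      uint32_t index()      const { return bits & 0xFFFFFFu; }
      uint8_t  generation() const { return uint8_t(bits >> 24); }
  };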


> return sparse[i] < n && dense[sparse[i]] == i

This line lives in fear of the optimizer. sparse[i] references uninitialized memory and is thus UB, so it could be optimized out at a certain optimization level. Also, I think I read once that even if we allow that sparse[i] stays in the program, there's still no guarantee that each access of sparse[i] will read the same value, since it's still UB.


The way I usually deal with this nonsense is to write a function in assembly that returns its pointer argument:

  stop_messing_with_my_code:
    mov rax, rdi
    ret

  extern "C" void *stop_messing_with_my_code(void *);

  void *p = stop_messing_with_my_code(malloc(100));
  // compiler can't know what my function does; it might have initialized it,
  // aligned it properly, returned the correct type, etc.
Also useful for getting around the idiotic type-based aliasing so you can write custom allocators (i.e. a byte pool allocated statically/globally that you then allocate data from, which is common in embedded but technically not possible in standard C or C++...)


> a byte pool allocated statically/globally, then allocate data from that which is common in embedded, which is technically not possible

Is it? I thought that char-pointers-into-char-arrays are specifically allowed to alias with whatever type.

But yeah, it's a shame we don't have an official "launder" function à la C++: the malloc/calloc/realloc get to launder the pointers they return simply because the standard declares those functions special.


>Is it? I thought that char-pointers-into-char-arrays are specifically allowed to alias with whatever type.

I think they mean something like this:

  char storage[1024];
  struct pool pool_of_foos = { storage, sizeof(storage) }; // use storage as my memory
  struct foo *p = pool_allocate(&pool_of_foos, sizeof(struct foo));
  // pool_allocate cannot be implemented in conforming C or C++.
  // i.e. you cannot place a 'struct foo' object in a memory location containing 'char' objects

How can malloc() work then, you might ask? Well, it has defined semantics in the standard, so it's essentially a magical special exception; you cannot write such a function yourself in standard C, you must use compiler extensions or assembly.

It's quite insane that it has come to this; I doubt K&R had that as their intention for the language.


After a closer re-reading of C's rules on effective type, I think you're actually right: you can't return pointers-to-structs that point inside a (static or automatic) char array, you have to slice the storage returned from malloc/calloc/realloc instead.

But C++ has std::launder that allows you to do that, see the very last example in [0]: it does placement-new on a char array, then launders the pointer to that array recast as a pointer-to-struct, and it's supposed to not be UB.

[0] https://en.cppreference.com/w/cpp/utility/launder
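
Condensed, the shape of that last cppreference example is roughly this (assumes C++17; the struct is made up):

  #include <new>   // placement new, std::launder

  struct foo { int x; };

  alignas(foo) unsigned char storage[sizeof(foo)];

  foo* make_foo() {
      new (storage) foo{42};                               // construct inside the char array
      // std::launder lets us form a usable foo* from the reused storage
      return std::launder(reinterpret_cast<foo*>(storage));
  }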


You do not need launder for this. Placement new is enough as long as you refer to the newed object via the pointer returned by new.

My understanding is that launder exists if you need to recover the pointer to the allocated object directly from the pool pointer.

Re C, I'm far from an expert, but I think that the issue exists primarily if you try to allocate within an existing named object (like in the parent example). If you carve out your allocations from a larger chunk of anonymous memory it is supposedly conforming.

[A lot of "I think", "supposedly"... All object lifetimes and pointer derivation stuff is underspecified in the standards]


Laundering your pointers through no-op inline asm also helps.


This is an extremely useful trick for foiling the optimizer. To make it explicit, it goes like this:

    asm volatile("" : "+r"(my_val) ::);
This means "an empty assembly statement that you must assume contains side effects, and takes my_val as an input register and modifies it somehow." Note that because there is no assembly, this is perfectly portable (across architectures, maybe not across compilers, since only GCC and Clang know this syntax).

I've used it to get the compiler to forget that function pointers cannot be null, for example.

Of course you are still in "implementation defined behavior" land re: the standard. But it's often pretty easy to reason like the compiler does once you know your target architecture, and once your values are all unconstrained integers.
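
For example, a sketch of the function-pointer case (GCC/Clang extended asm; all names made up):

  extern void callback();

  void call_if_present() {
      void (*fp)() = &callback;          // the compiler knows this is non-null...
      asm volatile("" : "+r"(fp) ::);    // ...until the barrier hides where it came from
      if (fp)                            // so this check is not folded away
          fp();
  }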


If you're running in some theoretical fairy dust C++ spec environment sure, but in reality sparse is just a pointer and the compiler can't know its value, or whether or not it's "initialized".

Even if you construct a degenerate case:

    int* sparse = (int*)malloc(100 * sizeof(int));
    int x = sparse[40]; // provably uninitialized
it still won't "optimize out" the access, since

1. it's insane. nobody would write such an optimization because there's no possible performance gain, and it changes program behavior

2. malloc is just a function, it's not treated in any special way

3. there's no guarantee this malloc is a per-spec malloc, it can be a user-defined function for which this is perfectly valid


> 1. it's insane. nobody would write such an optimization because there's no possible performance gain, and it changes program behavior

LLVM explicitly has an 'undef' constant value which facilitates this optimization. https://llvm.org/docs/LangRef.html#undefined-values

FWIW, the most important reason compilers do this is to decrease compile time. The compiler notices that some code has undefined behaviour and deletes the code now instead of waiting to prove that the path to this code is unreachable. The later optimizations tend to be slower and scale badly with more code in the function, so deleting it earlier will speed up the compilation.

> 2. malloc is just a function, it's not treated in any special way

The compiler is full of optimizations that treat malloc and other functions specially. This file implements an analysis, but the results of the analysis are used by transformations. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Anal...

> 3. there's no guarantee this malloc is a per-spec malloc, it can be a user-defined function for which this is perfectly valid

Yep, there's a flag for that mode, `-ffreestanding` which corresponds to freestanding mode in the C89 standard, section 2.1.2.1. Without that flag, we assume malloc, strlen, etc., are the standard library functions that do as described in the standard.


Others have shown that your degenerate example really does optimize it all away, but I'll just say that we really do exist in a theoretical fairy dust C++ spec environment, because when we compile a piece of code we compile it against the abstract machine. This is one of those insane gotchas of programming languages - unless you're writing assembly yourself you must pass the bar set by the abstract machine and nothing else. I recommend [0], along with every link in the article, and every article on the blog.

0: https://research.swtch.com/ub


This is one of those insane gotchas of programming languages

...created only by adversarial compiler writers who seem to be as user-hostile as a lot of other "developers" these days. They can get away with being pedantic assholes, because they know that realistically no one is going to switch compilers that easily.

That said, C and even C++ used to not be like this.


This seems like the sort of a comment that can only be written by completely refusing to understand why compiler writers take advantage of UB. Compiler writers are not engaging in "adversarial" behavior with their users -- they are their own users!

The simple truth is a lot of UB exists to allow for optimizations that programmers entirely take for granted. In order to optimize something, you have to make assumptions, particularly when dealing with languages as permissive as C and C++. By programming to an abstract machine and explicitly specifying which behavior is defined and which is not, we allow a compiler to make these assumptions. Despite what you may wish, a compiler is not able to read your mind to know when a given piece of UB is actually code that you intended to execute with particular semantics, or whether it's the result of some other optimizations on a code path that will never execute.


> Compiler writers are not engaging in "adversarial" behavior with their users -- they are their own users!

Only if your language is self-hosting, isn't it?


That's usually a major checkpoint for a programming language maturing no?

Go became self-hosting in 2015 (1.5) [1]

Rust was originally written in OCaml and became self-hosted in 2011 - excluding LLVM of course, but I guess, that's kind of up to you where you draw that specific line.

[1] https://go.dev/blog/go1.5


It might make sense for a General Purpose language, but really only if the language's goals make it suitable.

For example suppose you decided goal #1 is ease of use for beginners so you consciously choose verbose syntax. It's unlikely that your compiler is written by beginners and they may get sick of needing all this verbosity.

For Special Purpose languages it only makes sense if their purpose is centrally writing compilers. It doesn't make sense to write your Verilog "compiler" in Verilog.


> That's usually a major checkpoint for a programming language maturing no?

It depends on the goals of the language. Of course, compiler writers tend to be (relative) fans of their language, so they might want to use it more than might be wise.

But eg I wouldn't want to write _any_ compiler in C or Fortran or Cobol. That includes not wanting to write C, Fortran or Cobol compilers in these languages.

(But I think any language that you might want to write a compiler in should have algebraic data types and pattern matching at the very least, especially these days.)

Somewhat ironically, writing a Python compiler in Python would be an ok idea, but writing a Python interpreter in Python is probably not a good one.


Compiler authors aren't doing this because they hate you; they're doing this because we want ever higher performance. C and C++ didn't use to be like this because compilers were not capable of making the optimizations they can today.


MSVC and especially ICC got higher performance than GCC without exploiting UB, so how do you explain that?


They definitely “exploit” UBs; any proof to the contrary?


> MSVC

Citation needed. This does not match my experience.

> ICC

If I'm not mistaken, this is largely because ICC uses unsafe math optimizations by default.


Here's some examples where ICC comes out ahead, often far ahead:

https://keyj.emphy.de/compiler-benchmark/

https://iitd-plos.github.io/col729/labs/lab0/lab0_submission...

As for MSVC, my experience has been that it often generates smaller and slightly faster code than the others (except for ICC), but there's definitely some styles of code which it won't want to optimise.


Somebody removed all the insane UB-"optimizations" and the result wasn't even any slower. Can't find the blog now.


How do you decide whether a UB-optimization is insane or not? "I know it when I see it" may work for lawmakers, but it's not good enough for compiler developers.

Because if you want to disable all UB-based optimizations, there's already a compiler flag that does that: -O0.

Seriously, in a language as low level as C++, every optimization needs some form of assumptions, and UB is how we currently allow compilers to make those assumptions.

Example: because it's undefined behavior to read pointers out-of-bounds, it is not possible for a valid program to scan a whole stack frame, so it is okay for the compiler to remove variables from the stack frame and instead store them in registers.

Without UB-optimizations, you would need some other justification for why it is okay for the compiler to change the result of a stack scan. Or use -O0, which keeps all variables on the stack (but that certainly will be slower!).
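
To make that concrete, a deliberately non-conforming sketch of the kind of stack scan the argument rules out:

  int scan_neighbour() {
      int marker = 0;
      // out-of-bounds pointer arithmetic: UB, so the compiler may assume no
      // valid program ever observes which locals actually live on the stack
      return *(&marker + 1);
  }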


Look at Intel's compiler or MSVC, vs. GCC and LLVM. The former two are known to rarely exploit UB, yet are competitive and even faster with certain code (especially ICC) compared to the latter two.

and UB is how we currently allow compilers to make those assumptions.

No, the behaviour of the target hardware is what should guide that.

Example: because it's undefined behavior to read pointers out-of-bounds, it is not possible for a valid program to scan a whole stack frame, so it is okay for the compiler to remove variables from the stack frame and instead store them in registers.

UB has no bearing on that. If a variable has never had its address taken, then there's no expectation that it ever be in memory.


> UB has no bearing on that. If a variable has never had its address taken, then there's no expectation that it ever be in memory.

There were plenty of programs that expected specific stack layouts and were broken when compilers started optimizing stack spilling/register allocation more aggressively. Remember that register existed as a keyword.

The address taken might have been a useful practical rule for some compiler, but I don't think it was ever endorsed by the standards (except for prohibiting taking the address of a register variable in C).


I'd be quite interested to read that post, given that one of the big UB hammers is strict aliasing which gets used a _lot_


As mentioned in another comment in this thread, UB optimizations (whatever that means) are not about “hah, let’s make this edge case a trap for the programmers”, it is about general assumptions the compiler writer gets to make if they leave the door open for that through UB.

Say, you could remove this memory read and reuse the variable’s value from the top of the function, but to be able to make this optimization you need it explicitly stated that no other thread can modify the memory in the meanwhile. UB makes the assumption valid, without it you could barely apply any optimizations.
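
For instance, in a (hypothetical) loop like the one below, the compiler can keep *p in a register for the whole loop only because a concurrent unsynchronised write would be a data race, i.e. UB:

  int sum_repeated(const int* p, int n) {
      int total = 0;
      for (int i = 0; i < n; ++i)
          total += *p;   // may be loaded once and reused rather than re-read n times
      return total;
  }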


c code:

    #include <stdlib.h>

    int foo(int x) {
        int* p = malloc(1024);
        x = p[x];
        free (p);
        return x;
    }
assembly:

    foo:                                    # @foo
            ret
https://godbolt.org/z/jfvKon5v7


change `int* p = malloc(1024);` to `volatile int* p = malloc(1024);`

asm:

  foo:                                    # @foo
        push    rbx
        mov     ebx, edi
        mov     edi, 1024
        call    malloc@PLT
        movsxd  rcx, ebx
        mov     ebx, dword ptr [rax + 4*rcx]
        mov     rdi, rax
        call    free@PLT
        mov     eax, ebx
        pop     rbx
        ret
https://godbolt.org/z/E6Teqns8b


That is because there is a read of p[x]: if that read gets removed, the malloc and free disappear too. volatile shouldn't appear in code like this that does not handle MMIO memory.


Why not? I have used volatile countless times to work around optimisations


The optimisation is (absent a bug in the compiler) valid, unless you've written UB in there somewhere. The _reason_ to try not to volatile something is that this optimisation is quite a powerful one; if the compiler can prove something written to memory can't have changed since it was written, then it doesn't need to load it from memory if it's later used. Hits to memory can be quite slow, so this optimisation entirely removing the hit can make a big difference if the code is performance sensitive.


In the general case you're right; in this case, where the optimization is related to the use of uninitialized memory, I don't think you're correct. Of course this means that the programmer must 'think like an optimizer' and rewrite

  is-member(i):
    return sparse[i] < n && dense[sparse[i]] == i

as

  uint sparse = sparse[i]
  return sparse < n && dense[sparse] == i

otherwise the compiler would generate two loads instead of one.


Well, you're 100% wrong: https://godbolt.org/z/en31xx9Yx

clang optimizes this function to a single ret, optimizing away not just the memory access, but the call to malloc() as well. It does this (and is allowed to do this) because it recognizes that sparse[40] accesses uninitialized memory, which is UB, and therefore anything goes – including replacing the entire function with a no-op.


> and therefore anything goes – including replacing the entire function with a no-op.

Well, I think the compiler is just reasoning that the control flow of a correct program cannot cause UB, so the path causing UB is an infeasible path and is optimized away. E.g. https://godbolt.org/z/h69GnfYbd where the path causing UB is optimized away and it just returns 42 (and a call to rand).
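
Roughly the shape of thing being described, as a sketch (not the code behind the godbolt link):

  #include <cstdlib>

  int g() {
      int* p = (int*)std::malloc(sizeof(int));
      if (std::rand() % 2)
          return *p;   // uninitialized read: treated as a path no correct program takes
      return 42;       // so the function folds down to the rand() call plus "return 42"
  }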


I wonder why the original example emits `ret` instead of `ud2`. Clang is known to emit it for paths with provable UB.


It's because accessing uninitialized memory is not UB in clang.

From clang's point of view, a read from uninitialized memory returns the special value `undef`. Many operations applied to `undef` also return `undef`, but some (e.g. branching `if (undef)`) are outright UB. But the original example doesn't do anything with `undef`, so it's not clang-UB. The compiler generated valid code for the function, it just happens to pick "whatever was previously in RAX" as the value read from the uninitialized memory. The malloc call is optimized out because it's no longer necessary after the uninitialized read was replaced (the same will happen to any other malloc call where the return value is unused).

Note that the reason for this weird `undef` semantics ("almost but not quite UB yet") is that padding bytes are also uninitialized, but copying a whole struct incl. padding (even a memcpy-style byte-wise copy) must be defined behavior.
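
For illustration, the kind of whole-struct copy that must stay well defined even though the padding bytes were never written (a sketch):

  #include <cstring>

  struct Padded {
      char c;   // typically followed by 3 padding bytes
      int  i;
  };

  void copy_whole(Padded* dst, const Padded* src) {
      // the byte-wise copy reads the uninitialized padding too; this has to be
      // defined behaviour, hence the "undef value" rather than outright UB
      std::memcpy(dst, src, sizeof(Padded));
  }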


Theirs not to reason why. Theirs but to do and die.

Likely there's some SSA going on that reckons the value from the pointer is undefined. So any value will do. And whatever value happens to be in eax is a perfectly cromulent value; might as well just return that.


I mean, there is no reason to even emit a single instruction. However, it would be helpful for the unconditional UB to turn into a trap (in fact, UB is particularly bad because some of it won't have a visible effect for a long time). Given that clang and LLVM do emit ud2 for traps from UB, I think this could also have become ud2, but it somehow didn't.


Is there an option to emit a warning on encountering UB?


That's impossible in the general case, but there are compiler warnings that try to help you catch as much as they can, use `-Wall` when building code, and there is runtime instrumentation `-fsanitize=undefined` to catch as much UB as possible without breaking the ABI. Also consider using `-fsanitize=address,undefined` while developing and `-fsanitize=thread` for threaded code. `-fsanitize=memory` is a bit difficult to deploy because you need all your libraries (except libc) to be built with it too, including your C++ standard library if you're building C++.


Others have explained pretty well about UBs, but I would give explicit evidence against point 2 by quoting the actual glibc code [1]:

    /* Allocate SIZE bytes of memory.  */
    extern void *malloc (size_t __size) __THROW __attribute_malloc__
         __attribute_alloc_size__ ((1)) __wur;
Expanding the macros gives three GCC function attributes [2]: `__attribute__ ((malloc))`, `__attribute__ ((alloc_size(1)))` and `__attribute__ ((warn_unused_result))`. They are used by GCC (and others recognizing them) to ensure that these functions behave as the standard dictates and to give better error messages. Your own malloc-like functions won't be treated the same unless you give them similar attributes.

[1] https://github.com/bminor/glibc/blob/807690610916df8aef17cd1...

[2] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attribute...
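
So to get similar treatment for your own allocator you'd have to annotate it yourself, e.g. (a sketch using the GCC attributes listed above; the function itself is hypothetical):

  #include <cstddef>

  // hint the compiler that this behaves like malloc: returns non-aliasing
  // memory of 'size' bytes, and the result shouldn't be ignored
  void* pool_alloc(std::size_t size)
      __attribute__((malloc, alloc_size(1), warn_unused_result));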


to address the points in order:

1. a specific optimisation pass won't do this, but the combination of several passes written with other intentions can

2 & 3. the C runtime a compiler links with is known to the compiler/linker, and it _does_ usually treat intrinsics like malloc specially

A quick google reveals https://github.com/llvm/llvm-project/issues/52930, LLVM recognising uninitialised memory as 'undef', and needing to treat it as poisoned (IIRC propagation of undef can cause things like separate uses of the variable to not even unify to a single value)

EDIT: duplicated link


See also https://stackoverflow.com/a/43825157 for possible workarounds. If you care about standard conformance, `calloc` seems the best approach.


If we actually got sparse from mmap, it's all zeros, and if it has nonzero numbers in it, it's because we previously wrote them. Seems like you only catch nasal demons if you make the mistake of calling malloc.


or use calloc. Or just zero the whole buffer the first time: both mmap and calloc will have to zero on first use anyway, but future resets won't need zeroing.

Or use optimization barriers as described elsethread.


Yes, this is actually a good solution.


It's not required that the memory be uninitialized. The trick also permits reusing the same memory without clearing every time.


For those wondering: in C, this is probably OK on your system, but if you don’t fully understand the explanation I give below, then you should get sign-off from an expert before using this code. Here’s how it works:

Accessing an uninitialized variable is undefined in only the following case:

* the storage could have been declared with the register keyword (i.e. its address is never taken)

Because this is an array of uninitialized data, that case is not relevant here. Instead, the standard specifies that the value produced is indeterminate. An indeterminate value can either be:

* a trap representation

* an unspecified value

unsigned char never has any trap representations. Its value when uninitialized is always unspecified. Other types may have a trap representation, in theory: this is implementation defined. On your system this is almost certainly not true for integer types, but you may have signaling floats. Accessing a value that is a trap representation is undefined behavior.

If you don’t have any trap representations, then the value produced is unspecified. An unspecified value is, morally, “some random value from the range of values this thing can take”. Of note, the value produced is not required to be consistent: two “reads” can produce different results. However the act of performing the read and the results it produces (bounded by what I just mentioned) are well defined.

Putting it all together, in the case of this algorithm, on a system where integers do not have trap representations, the uninitialized read will produce an unspecified value, which the algorithm does not rely on a consistent result from. So, overall, it’s probably ok.

If you are using this code and don’t have a comment that describes exactly what I just said then you need to add one immediately. There are not many cases where it is crucial to juggle the precise definition of well-defined, undefined, implementation-defined, indeterminate, and unspecified. This is one of them.


I don't believe it's guaranteed to work correctly on any system, because in this code:

    is-member(i):
    return sparse[i] < n && dense[sparse[i]] == i
there's no requirement for the two sparse[i] to take the same value (since that expression evaluates to an indeterminate value). So the second occurrence of sparse[i] can take a value >= n, or indeed greater than the length of the whole array, at which point reading it is definitely UB.


Would it be better to do

  sparse_val = sparse[i]
  return sparse_val < n && dense[sparse_val] == i
Regardless, we are treading in undefined territory, and the behaviour is not something that can reliably be depended upon.


At least in clang, your variant is not any better: uninitialized reads return the special value `undef`, which the compiler can constant-propagate to both uses of `sparse_val`. But then each `undef` can turn to a different value at each use, even if they both came from the same uninitialized read.


Yes, I missed that, sorry! You’re correct.


Related:

Using Uninitialized Memory for Fun and Profit (2008) - https://news.ycombinator.com/item?id=8710557 - Dec 2014 (17 comments)

Using Uninitialized Memory for Fun and Profit - https://news.ycombinator.com/item?id=3565212 - Feb 2012 (14 comments)

Using Uninitialized Memory for Fun and Profit - https://news.ycombinator.com/item?id=1156628 - Feb 2010 (16 comments)


A neat trick that I discovered at university was that VMS' email system used 'sparse' disk blocks for your personal mailbox file, and immediately after forcing a rebuild of your mailbox by compacting it, you'd have dozens of sparse, uninitialised, disk blocks in your file - which you naturally had read access to.

Since compacting mail was the most common activity on a time shared university system, the odds were a good number of those uninit'ed blocks would be other people's erased blocks, and thus you could essentially read other people's emails just by sitting and compacting your email, then hex viewing the raw mailbox file.


This maybe calls for a way to tell the compiler that we don't care what value this object has, but we promise it does already have some concrete value.

As written this can't work, but once the compiler understands what's going on it seems fine.

In Rust I'd imagine this as a method on MaybeUninit<T> the type which might be a T but maybe not yet. It's possible we can just assume_init() today where T is something like a byte but if it's unclear, a method which expresses our intent to get an unspecified value is better than just saying well, maybe you can assume_init() but we're not sure if that means what you intended.


Problem is, an uninit value is unstable. The compiler could be using it to store something else, and could change what it's storing at different times. You need a freeze operation to say "I don't care what this is, but it should have one value".

https://www.ralfj.de/blog/2019/07/14/uninit.html


Right, I have seen that post but I didn't end up remembering the use of "freeze" to name this operation.

I'd be fine if it was named freeze, labelled unsafe, and specifically called out in Safety as requiring that all bit patterns for T are valid representations of T, which I think meets the requirement. But it sounds like this can't be done (easily?) without help from LLVM.


Right, in relatively recent versions of LLVM, there is a "freeze" operation. Making it available to programmers, possibly as a method off MaybeUninit, would potentially be useful. The safety overlaps that of "safe transmute" (and ecosystem crates such as bytemuck that provide that). The Pod trait[1] probably establishes the invariants so that freeze on uninitialized memory is safe.

I imagine the details are tricky (my understanding is that "freeze" took a long time to land in LLVM), but that there's no fundamental reason it can't be done. But I also imagine there are a lot higher priorities; the use case in this blog is pretty niche.

There's more discussion in this thread: [2]

[1]: https://docs.rs/bytemuck/latest/bytemuck/trait.Pod.html

[2]: https://internals.rust-lang.org/t/freeze-maybeuninit-t-maybe...


I definitely care more about growing what is possible, and specifically what can be written correctly than about the - perhaps more sensible - goal of making possible things easier. So that's a place where my priorities aren't Rust's priorities. My preferences absolutely can lead to the Turing Tarpit.

Reading for after work, thanks for the links.



