I still don't understand why it's slower to mask to 33 or 34 bits rather than 32...

nagisa · 2025-09-17T23:17:22 1758151042

That's because with 32-bit addresses the runtime did not need to do any masking at all. It could allocate a 4GiB area of virtual memory, set up page permissions as appropriate and all memory accesses would be hardware checked without any additional work. Well that, and a special SIGSEGV/SIGBUS handler to generate a trap to the embedder.

With 64-bit addresses, and the requirements for how invalid memory accesses should work, this is no longer possible. AND-masking does not really allow for producing the necessary traps for invalid accesses, so every access now needs a conditional check beforehand to validate that it is in bounds. The addresses cannot be trivially offset either, as they can wrap around (and/or accidentally hit some other mapping).
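To make the masking-versus-checking difference concrete, here is a minimal Python sketch. It is only a model: `MEM_SIZE`, `masked_load`, `checked_load`, and `Trap` are hypothetical names, and a real runtime would emit machine code rather than run in an interpreter. It assumes a linear memory whose size is a power of two, and shows that masking silently redirects an invalid address to a valid one, while an explicit comparison can trap.

```python
MEM_SIZE = 1 << 16          # assumed linear-memory size (power of two so a mask exists)
memory = bytearray(MEM_SIZE)
memory[0x10] = 0xAB


class Trap(Exception):
    """Stands in for the trap delivered to the embedder."""


def masked_load(addr: int) -> int:
    # AND-masking: always lands in bounds, so an invalid address is never reported.
    return memory[addr & (MEM_SIZE - 1)]


def checked_load(addr: int) -> int:
    # Explicit bounds check: one compare-and-branch before every access.
    if not 0 <= addr < MEM_SIZE:
        raise Trap(f"out-of-bounds access at {addr:#x}")
    return memory[addr]


bad_addr = MEM_SIZE + 0x10          # one full memory size past a valid address
print(hex(masked_load(bad_addr)))   # reads memory[0x10] instead of trapping
try:
    checked_load(bad_addr)
except Trap as exc:
    print("trap:", exc)
```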

kannanvijayan · 2025-09-18T00:07:30 1758154050

I don't feel this is going to be as big of a problem as one might think in practice.

The biggest contributor to pointer arithmetic is reads at a small offset from a pointer: what gets generated for struct field accesses.

The other class of cases is when you're actually doing more general pointer arithmetic, usually scanning across a buffer. These are cases that typically get loop-unrolled to some degree by the compiler to improve pipeline efficiency on the CPU.

In the first case, you can avoid the masking entirely by using an unmapped barrier region after the mapped region. So you can guarantee that if pointer `P` is valid, then `P + d` for small d is either valid, or falls into the barrier region.

In the second case, the barrier region approach lets you lift the mask check to the top of the unrolled segment. There's still a cost, but it's spread out over multiple iterations of a loop.

As a last step: if you can prove that you're stepping monotonically through some address space using small increments, then you can guarantee that, even if the "end" of the iteration might theoretically step into invalid space, the incremental stepping will hit the unmapped barrier region before that occurs.

It's a bit more engineering effort on the compiler side, and you will see some small perf loss, but it should only matter in a meaningful way in the extreme cases of hot paths.
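A rough Python model of the barrier-region argument above, under stated assumptions: `MEM_SIZE` and `GUARD_SIZE` are made-up sizes, `raw_access` stands in for the hardware page-permission check, and the `assert` in `load_field` documents a precondition the compiler would have to prove rather than a check it would emit. The point is that once a base pointer is known to be valid, any offset smaller than the guard either stays in bounds or faults in the guard, and a monotonic scan with a stride no larger than the guard cannot step over it.

```python
MEM_SIZE = 1 << 16      # assumed size of the mapped (accessible) region
GUARD_SIZE = 1 << 12    # assumed size of the unmapped guard region after it


class Trap(Exception):
    """Stands in for the fault the hardware raises on an unmapped page."""


def raw_access(addr: int) -> str:
    # Models the hardware: no software check, only page permissions.
    if addr < MEM_SIZE:
        return "ok"
    if addr < MEM_SIZE + GUARD_SIZE:
        raise Trap(f"guard page hit at {addr:#x}")
    raise AssertionError("escaped the guard region: the scheme was misused")


def load_field(base: int, offset: int) -> str:
    # Precondition (proved by the compiler, not checked at run time):
    # base is valid and offset < GUARD_SIZE, so base + offset cannot
    # get past the end of the guard region.
    assert 0 <= base < MEM_SIZE and 0 <= offset < GUARD_SIZE
    return raw_access(base + offset)


def scan(start: int, stride: int, count: int) -> None:
    # Monotonic stepping with stride <= GUARD_SIZE: the first address past
    # the mapped region must fall inside the guard, so no per-step check.
    addr = start
    for _ in range(count):
        raw_access(addr)
        addr += stride


print(load_field(0x100, 8))            # in bounds
try:
    load_field(MEM_SIZE - 4, 8)        # runs off the end, lands in the guard
except Trap as exc:
    print("trap:", exc)
try:
    scan(0, GUARD_SIZE, 1000)          # would end far out of bounds, traps first
except Trap as exc:
    print("trap:", exc)
```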

zarzavat · 2025-09-18T09:55:07 1758189307

> AND-masking does not really allow for producing the necessary traps for invalid accesses.

Why does it need to trap? Can't they just make it UB?

Specifying that invalid accesses always trap is going to degrade performance, that's not a 64-bit problem, that's a spec problem. Even if you define it in WASM, it's still UB in the compiler so you aren't saving anyone from UB they didn't already have. Just make the trapping guarantee a debug option only.

_nalply · 2025-09-18T12:37:05 1758199025

It's WASM. WASM runs in a sandbox and you can't have UB on the hardware level. Imagine someone exploiting the behavior of some browser when UB is triggered. Except that it's not the programmer who gets the nasal demons [1] but some poor user, like a mom of four children in Abraska browsing a website on her cell phone.

[1]: http://catb.org/jargon/html/N/nasal-demons.html

zarzavat · 2025-09-18T13:07:22 1758200842

The UB in this case is "you may get another value in the sandboxed memory region if you dereference an invalid pointer, rather than a guaranteed trap". You can still have UB even in a sandbox.

Seems like they got overly attached to the guaranteed trapping they got on 32-bit and wanted to keep it even though it's totally not worth the cost of bounds checking every pointer access. Save the trapping for debug mode only.

_nalply · 2025-09-18T13:18:49 1758201529

Ah, so you meant UB = unspecified behavior, not UB = undefined behavior.

Maybe. Bugs that come from spooky behavior at a distance are notoriously hard to debug, especially in production, and it's worthwhile to pay the cost of trapping to avoid them.

azakai · 2025-09-17T23:09:06 1758150546

The special part is the "signal handler trick" that is easy to use for 32-bit pointers. You reserve 4GB of memory - all that 32 bits can address - and mark everything above used memory as trapping. Then you can just do normal reads and writes, and the CPU hardware catches out-of-bounds accesses.

With 64-bit pointers, you can't really reserve all the possible space a pointer might refer to. So you end up doing manual bounds checks.
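The arithmetic behind this can be sketched in a few lines of Python. One figure here is an assumption rather than something from the thread: a 47-bit user address space, which is typical for x86-64 Linux with 4-level paging; other platforms differ.

```python
GIB = 1 << 30

# 32-bit case: every possible index fits in a 4 GiB reservation.
reserve_32 = 1 << 32
print(reserve_32 // GIB, "GiB")

# 64-bit case: covering every possible index would need 2**64 bytes.
reserve_64 = 1 << 64

# Assumption: 47-bit user address space (x86-64 Linux, 4-level paging).
user_space = 1 << 47
print(reserve_64 // user_space, "times the available address space")
```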

kannanvijayan · 2025-09-18T00:14:42 1758154482

Hi Alon! It's been a while.

Can't bounds checks be avoided in the vast majority of cases?

See my reply to nagisa above (https://news.ycombinator.com/item?id=45283102). It feels like by using trailing unmapped barrier/guard regions, one should be able to elide almost all bounds checks that occur in the program with a bit of compiler cleverness, and convert them into trap handlers instead.

azakai · 2025-09-18T01:35:05 1758159305

Hi!

Yeah, certainly compiler smarts can remove many bounds checks (in particular for small deltas, as you mention), hoist them, and so forth. Maybe even most of them in theory?

Still, there are common patterns like pointer-chasing in linked list traversal where you just keep getting an unknown i64 pointer, that you just need to bounds check...
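A small Python model of the pointer-chasing case, as a sketch only: the node layout (`NODE`), `MEM_SIZE`, and the helper names are hypothetical. Each `next` pointer is read out of memory, so nothing is known about it ahead of time and every hop carries its own check.

```python
import struct

MEM_SIZE = 1 << 16                 # assumed linear-memory size
memory = bytearray(MEM_SIZE)
NODE = struct.Struct("<Qq")        # hypothetical node layout: u64 next, i64 value


class Trap(Exception):
    """Stands in for the wasm out-of-bounds trap."""


def store_node(addr: int, next_addr: int, value: int) -> None:
    NODE.pack_into(memory, addr, next_addr, value)


def load_node(addr: int) -> tuple[int, int]:
    # addr was just read from memory, so nothing is known about it statically:
    # every hop pays for its own check. (Python ints cannot overflow; real
    # code must also make sure addr + NODE.size does not wrap around.)
    if addr + NODE.size > MEM_SIZE:
        raise Trap(f"out-of-bounds node at {addr:#x}")
    return NODE.unpack_from(memory, addr)


def sum_list(head: int) -> int:
    total, addr = 0, head
    while addr != 0:               # address 0 plays the role of the null pointer
        addr, value = load_node(addr)
        total += value
    return total


store_node(0x100, 0x200, 1)
store_node(0x200, 0x300, 2)
store_node(0x300, 0, 3)
print(sum_list(0x100))             # sums the three values

store_node(0x300, 1 << 40, 3)      # corrupt the last next pointer
try:
    sum_list(0x100)
except Trap as exc:
    print("trap:", exc)
```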

phire · 2025-09-18T00:05:16 1758153916

Because CPUs still have instructions that automatically truncate the result of all math operations to 32 bits (and sometimes 8-bit and 16-bit too, though not universally).

To operate on any other size, you need to insert extra instructions to mask addresses to the desired size before they are used.

dist1ll · 2025-09-18T03:32:13 1758166333

WASM traps on out-of-bounds accesses (including overflow). Masking addresses would hide that.