
I'd like to see integer types for C/C++ that are able to saturate or trap on overflow - or both types. I'm pretty sure it's even been proposed for the C standard but was rejected or postponed.

It seems like a really obvious feature that's easy to support but isn't.
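A minimal sketch of what such a trapping type's addition could do today, assuming GCC or Clang (which provide the `__builtin_add_overflow` and `__builtin_trap` builtins); the function name is hypothetical:

```c
#include <stdint.h>

/* Hypothetical sketch: what the addition operator of a trapping
   checked-int64 type could compile to on GCC/Clang. */
static int64_t trapping_add(int64_t a, int64_t b) {
    int64_t result;
    if (__builtin_add_overflow(a, b, &result))
        __builtin_trap();  /* abort on overflow */
    return result;
}
```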



Free Pascal (fpc) supports trapping on overflow. It uses the overflow flag and tests it after integer operations; it's pretty basic. Here's the assembly output for an Int64 add:

  # [16] c := a + b;
    movq U_$P$TEST_$$_A(%rip),%rdx
    movq U_$P$TEST_$$_B(%rip),%rax
    addq %rdx,%rax
    jno .Lj9
    call FPC_OVERFLOW
  .Lj9:
    movq %rax,U_$P$TEST_$$_C(%rip)
It should be easy to support in C/C++, as FPC uses the same LLVM backend.


That’s bizarre - it jumps on not overflow? And jumps forward? But that’s statically predicted not-taken - it’ll statically mispredict and stall the processor every time!


My gut feeling says this is due to FreePascal supporting platforms[1] that only have short conditional jumps (-128 to +127 bytes), like i8086[2]. The alternative on those platforms would be an unconditional jump after the conditional, to jump over the call to the error handler.

On i8086 at least, a non-taken conditional jump is 4 cycles while an unconditional jump is 15[3], for a total of 19 cycles, while a taken conditional jump is only 16. So it makes sense to almost always take the conditional jump.

[1]: https://wiki.freepascal.org/Free_Pascal_supported_targets

[2]: https://edge.edx.org/c4x/BITSPilani/EEE231/asset/8086_family... (page 2-45, or 60 in the PDF)

[3]: page 2-58 in [2], or 73 in the PDF


It's the way that error handling is done in Free Pascal. I've never worried about the code it generates before, I just care about correct results. Yeah, it could be better.


That's hardware specific, right? You're not guaranteed to have a solution on every platform.


All platforms can detect overflow in one way or another; for example by checking for an unexpected sign change.


AMD64 addressing modes cannot detect overflow, and this is where in practice 90% of arithmetic is done!


That can't be true. Why would a compiler use memory access to do math? Math operations are register based, none of the quantities are addresses.

How can you use memory addressing modes to do math, like 987*765?


> That can't be true.

Lol. See for yourself if you don't believe me.

https://godbolt.org/z/o14f7cdsc

> Why would a compiler use memory access to do math?

Because there are efficient instructions for it. Have you written an arithmetic code generation pass? It's normal to do it like this.

> Math operations are register based, none of the quantities are addresses.

An address is just a number. Yeah they're in registers or can be immediate. So what?

> How can you use memory addressing modes to do math, like 987*765?

Because AMD64 addressing modes have a base, a scaled index, and a displacement component. That's multiplying and adding.

Let me know how you think you can work out if that lea overflowed!
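To illustrate the point, here is a small C function of the shape compilers commonly lower to a single lea on x86-64 (the function name is made up for this example):

```c
#include <stdint.h>

/* On x86-64, GCC and Clang commonly compile this to a single
   instruction such as "lea rax, [rdi + rsi*8 + 16]": the addressing
   mode performs the multiply (by the scale 8) and both additions in
   one step, sets no flags, and gives no way to observe overflow. */
int64_t index_math(int64_t base, int64_t i) {
    return base + i * 8 + 16;
}
```

Addressing-mode scales are limited to 1, 2, 4, and 8, so a general product like 987*765 still goes through imul; but compilers also synthesize multiplies by small constants from lea (e.g. x*5 as `lea rax, [rdi + rdi*4]`), which is where much of the "90% of arithmetic" lands.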


Wow, that's nuts. I'd have never expected that abuse.

You can use -ftrapv to check for integer overflows, like this

https://godbolt.org/z/snbdadv8b

>Let me know how you think you can work out if that lea overflowed!

LEA (Load Effective Address) can't overflow... memory is supposed to wrap, the byte after the last byte is byte 0.


Right but have you seen what that’s done to your machine code? That’s why people don’t do it normally.

> LEA (Load Effective Address) can't overflow...

You're mistaken - it's defined to overflow.

https://reverseengineering.stackexchange.com/questions/11442...


I'm not 100% sure, but I think the behavior of LEA is to wrap; it does not set the overflow flag.


I don't know specific AMD64 addressing modes, but you should always be able to detect overflow in signed 2s complement addition by checking whether both addends have the same sign and the result has the opposite sign. There is a similar check for subtraction.
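A C sketch of that sign-based check (covering both sign directions); the sum is computed in unsigned arithmetic, where wraparound is well defined, to avoid triggering the very undefined behavior being detected:

```c
#include <stdint.h>
#include <stdbool.h>

/* Detect signed 32-bit addition overflow by the sign rule: overflow
   occurred iff both operands have the same sign and the result has
   the opposite sign. */
static bool add_overflows(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;  /* defined wraparound */
    /* (a ^ sum) & (b ^ sum) has the sign bit set exactly when both
       operands agree in sign and the sum disagrees with them. */
    return ((a ^ (int32_t)sum) & (b ^ (int32_t)sum)) < 0;
}
```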


Yeah, but the point is addressing modes fuse the addition, the multiplication, and the use of the result, so there's no opportunity to check anything after the operations.


The only modern architecture which does not have overflow detection is RISC-V.


Interesting. Is it a deliberate decision or just an omission?


Deliberate.

RISC-V architects weighed the pros and cons of having a flags register, and the pros and cons of having overflow exceptions.

They concluded it is best to have neither flags (conditional branches do their own testing, and no flag dependencies need to be tracked, which simplifies superscalar implementations) nor overflow exceptions (flow-breaking and costly; if you need the check, the cost of a software check is minimal, by design).


I understand for overflow exceptions, but I would have expected the cost of a flags register to be zero or near zero, for example in an integer adder?

It also doesn't look to me like the cost of a software check can always be trivial. It can be for a single operation, but an advantage of an overflow flag is that it allows checking a group of operations as a whole (check/branch just once and abort), which is probably what is practical to do algorithmically. In such a scenario, switching to software checks for each op and/or bounds-checking the inputs sounds far from minimal.


Flags aren't that simple. In a superscalar microarchitecture, you have a lot more to track re: who set the flag, as the flag is a target of every instruction that can set it.


Minimal... It depends! I remember that the developers behind GMP (a GNU bignum library) weren't happy with the performance! But that's pretty niche.


Likewise. That's what I was expecting going in rather than Lovecraftian monstrosities like INT_NAN. Plus they're actually something that's supported in some existing ISAs.


Totally agree. Overflowing silently is almost never what you want. Sometimes it’s useful but that’s the rare case.


Standard (signed) integers in C could be implemented to trap on overflow. Overflow on a signed integer is an undefined operation, like NULL dereferencing or division by zero, so a trap is perfectly acceptable behavior.

The issue is that unsigned integers must not trap on overflow, and CPUs, even if they could be configured to trap on arithmetic overflow, usually do not have separate opcodes for wrapping arithmetic operations and trapping arithmetic operations.
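That unsigned requirement is worth spelling out: the C standard defines unsigned overflow as reduction modulo 2^N, so the result below is guaranteed on every conforming implementation and can never trap:

```c
#include <limits.h>

/* Unsigned overflow is defined as wraparound modulo 2^N, so this
   is guaranteed by the C standard; no implementation may trap here. */
static unsigned wrap_demo(void) {
    unsigned x = UINT_MAX;
    return x + 1u;  /* wraps to 0 */
}
```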


Most compilers, like gcc or clang, have a compile option to trap on integer overflow, instead of leaving the behavior to be undefined for such cases.

For example with gcc one should always use these compile options, unless there exists an extremely serious reason to do otherwise:

-fsanitize=address,undefined -fsanitize-undefined-trap-on-error

These options provide correct behavior not only on overflows, but also on out-of-bounds accesses and other C undefined operations.


It’s fine for debug builds, but people should be aware that building with the address sanitizer enabled balloons the size of the executable and slows it down tremendously.


One should always measure to see if there is a noticeable slowdown or not.

That is what I have meant by "an extremely serious reason to do otherwise".

For many programs the speed is determined by things like the throughput or latency of accessing the main memory, or by I/O operations, or by interaction with an user, in which case the sanitize options have negligible influence on speed.

When necessary, they should be turned off only for specific well tested functions whose performance is limited by CPU computation in registers or in cache memory, and they should remain on for the rest of the program.


ASAN can easily slow your program by 100x or worse. Your bias should always be that its impact would not be acceptable in production. If you were willing to accept the cost of ASAN you should have simply chosen a slower, safer language in the first place.


I don't understand. It sounds like you're describing the existing behavior for the existing types. But I'm asking for new types with new behavior. The behavior could be well defined for these types, which is what I want. I don't want types that happen-to-trap-because-it's-UB. I want a type that traps because that's its mission. And another type that saturates.

I understand the existing undefined behavior, that's not particularly interesting. What would be interesting is well-defined behavior for new types that does not interfere with however defined or undefined behavior exists for the existing types.

> usually do not have separate opcodes for overflowing arithmetic operations and trapping arithmetic operations.

I don't care how good or bad the codegen is. Just emit the code for the right behavior, because I'm using this type that requires this behavior.


> The issue is that unsigned integers must not trap on overflow

You forgot to add 'in C', in Zig they have the same behaviour as signed integers.


Saturating arithmetic is available on some DSPs and may be available to C programs via #pragma on those architectures. IMHO this is a burden to every ALU and instruction that does not need the feature. We should not push application specific features like that down to the basic hardware. We'd also need to provide language support for it.


Saturating arithmetic is probably not all that expensive, since it's been in every ARM SIMD unit (and sometimes also scalar) for many years. You just pay a little bit in latency when using those instructions.
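For reference, the behavior those ARM QADD-style instructions implement in hardware can be sketched portably in C; this version assumes GCC or Clang for `__builtin_add_overflow`, and the function name is made up:

```c
#include <stdint.h>

/* Saturating signed 32-bit addition: on overflow, clamp to
   INT32_MAX or INT32_MIN instead of wrapping. */
static int32_t sat_add(int32_t a, int32_t b) {
    int32_t result;
    if (!__builtin_add_overflow(a, b, &result))
        return result;
    /* Signed addition can only overflow when both operands have the
       same sign, so the sign of either one picks the clamp bound. */
    return a > 0 ? INT32_MAX : INT32_MIN;
}
```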


+1. Would be great to have integer NAN, INF and -INF supported in h/w.


The k language has implemented this capability (in software) for decades.

https://code.kx.com/q/ref/#datatypes


Yes, true - thanks. Love that in K in particular, and I like K and array languages in general. Another thing they get right. :-)


> trap on overflow

What do you see happening in that case?


The program terminates - depending on context, I suppose. I don't know what all toolchains/backends do, but some will generate an unaligned load or some other architecturally illegal instruction (__builtin_trap). Or you could trap to some handler.


x86 has had the `INTO` (interrupt on overflow) instruction (just a one byte form of `INT 4`) for decades. After an `ADD` or other arithmetic operation, you would `INTO` to trap it. However, because compilers love exploiting undefined behavior for speedups, they don't emit that instruction, so it's actually fairly slow on modern processors.


I am not sure if I remember it correctly, but wasn't the problem that you had to explicitly call INTO after every operation? You didn't have a global flag to set or a flag bit in the add instruction, you had to explicitly emit the check. So using it carried a significant performance penalty over leaving the operation unchecked.

> However, because compilers love exploiting undefined behavior for speedups

Except if the behavior had been well defined any attempt to catch overflow would violate the standard - see unsigned integers for reference.

> so it's actually fairly slow on modern processors.

Are you sure that having a potential branch every second operation, that depends directly on the result of the preceding operation, has at any point in time not been horribly slow?


> I am not sure if I remember it correctly, but wasn't the problem that you had to explicitly call INTO after every operation? You didn't have a global flag to set or a flag bit in the add instruction, you had to explicitly emit the check. So using it carried a significant performance penalty over leaving the operation unchecked.

Correct.

> Except if the behavior had been well defined any attempt to catch overflow would violate the standard - see unsigned integers for reference.

Then the compiler could emit them only on signed operations, but they don't.

> Are you sure that having a potential branch every second operation, that depends directly on the result of the preceding operation, has at any point in time not been horribly slow?

Yes; it's been shown that `INTO` is slower than `JO raiseOverflowError` because, by the nature of `INTO` being an interrupt, it doesn't get the advantage of branch prediction, but `JO` would.

On an older processor without a pipeline, sure, it'd be just as slow as an equivalent check, but with how massive the pipelines are on today's processors, it's bad.


> Yes; it's been shown that `INTO` is slower than `JO raiseOverflowError` because, by the nature of `INTO` being an interrupt, it doesn't get the advantage of branch prediction, but `JO` would.

A CPU could predict INTO — simply predicting it as not taken would be a good guess. This would require designing, implementing, verifying, and maintaining that logic, and it might add some area to important parts of the CPU. The vendors haven't seen it as a priority.

(AIUI at least older CPUs couldn’t predict branches in microcode, and INTO is presumably fully microcoded, so it doesn’t get predicted.)


Right, well a new type would give them that flexibility because I can opt in to the trap or saturate.



