I very much believe overflow checking should be on for release builds, and not j...

kibwen · on April 29, 2016

The operations themselves may only impose a performance hit of 10% or less, but the optimizations that those checks inhibit can easily cascade into a 2x slowdown, which isn't acceptable if Rust wants to compete with C++.

As for your correlation, I care a lot about safety, but I also recognize that the safest language in the world isn't going to pose any real-world improvement if it's not capable of competing with less-safe languages in their entrenched domains.

pcwalton · on April 29, 2016

> In practice, for anything which isn't a micro-benchmark, it's a negligible performance hit.

Citation needed? Every paper I've seen has indicated a non-negligible performance hit in real apps.

> I see a strong correlation between people complaining about overflow checking overhead, and people who don't actually care about the safety features

I'm a counterexample.

nickpsecurity · on April 29, 2016

Parent might mean this:

http://danluu.com/integer-overflow/

It starts with 2x like kibwen says. Then says those ops will only be small part of workload or something. Extrapolating lead to 5%. Article goes on from there.

Not sure if it applies here but it matches the claim you quoted of micro vs macro benchmark.

vardump · on April 29, 2016

It would be interesting to have support for saturated types as well. u8saturated 200 + u8saturated 200 = 255. That's often desired behavior.

Saturated arithmetic is supported in hardware in SSE. Although not much point if the values need to be constantly shuffled between SSE and scalar registers...

Manishearth · on April 29, 2016

.saturating_add() exists. You can create a wrapper type around this and implement the Add/Sub/Mul traits.

vardump · on April 30, 2016

Does it have branchless codegen?

Manishearth · on April 30, 2016

Not right now (I'm surprised that's the case, there should be intrinsics for it), but that could change.

dbaupp · on April 30, 2016

It gets optimised to be branchless if LLVM thinks that will be faster, yes.

vardump · on April 29, 2016

That can easily lead to excessive branch target buffer invalidation and mess up branch prediction.

It might look acceptable in microbenchmarks, with just 1-10% performance loss and be a total disaster in an actual running system.

A mispredicted branch costs about 15 clock cycles. You'll have a lot of those when CPU can't do bookkeeping on all of your range check branches anymore. The range checks themselves might run fast and fine, but those other branches your program has to execute can become pathological. Those, where the default choice for branches not present in CPU branch prediction buffer and branch target buffer is wrong.

15 clock cycles is rather excessive on any modern CPU that's supposed to execute 1-4 instructions per clock cycle! In the worst pathological cases this can mean an order of magnitude (~10x) worse performance.

Of course data flow analysis might mitigate need for actual range checks, if the compiler can prove the value can never be out of range (of representable values with a given type).

Regardless it is important to understand worst case price can be very high, this type of behavior should not be default in any performance oriented language.

vardump · on April 29, 2016

To add, to avoid any misunderstanding: the cost is probably within 1-10% in likely scenarios on most real world code.

Pathological case would be code with a lot of taken branches mixed with range/overflow checks.

For range checks, any sane compiler would generate code that doesn't branch if the value is in range. If no branch predictor entry is present, branches are assumed [1] to be not taken. These branches can still thrash BTB (kind of cache trashing) and cause taken branches that fell out of BTB to become always mispredicted, thus very expensive.

[1]: Some architectures do support branch hints, even x86 did so in the past. Modern x86 CPUs simply ignore these hints.

pcwalton · on April 30, 2016

Not only that, you lose the ability to pattern match simple arithmetic to LEA and bloat your code size.

giovannibajo1 · on April 30, 2016

It's very hard for an overflow check to be mis-predicted. On x86, for static branch prediction, it's sufficient to generate code with a forward branch to be predicted as unlikely the first time. From that point, as the branch will never be taken (unless it actually triggers a panic but then who cares), it will keep being predicted as not taken.

A correctly-predicted non taken conditional jump on x86 has 0 latency. In the specific case of overflow, you don't even need to generate a cmp instruction. It basically means that's close to a wash, in performance.

The actual negative effect that could be measurable is disabling optimizations (like composition of multiple operations, as you need to check each one for overflow).

Manishearth · on April 29, 2016

> Overflow is defined by the language as an illegal operation

Is it? What do you mean by illegal here? Overflow can't cause UB madness in Rust like it does in C; the compiler does not optimize loops assuming no overflow.

vardump · on April 29, 2016

UB can be confusing, but please remember it also means a lot of very reasonable optimization opportunities.

Say in data flow analysis, when the compiler can assume "(value + 1) > value" to be always true, a whole expensive branch can be removed.

If the overflow is never going to happen anyways, the savings are considerable (expensive branch eliminated) and the result is always correct.

Manishearth · on April 29, 2016

I know that UB can be useful for optimizations. Where did I assert otherwise?

I'm saying that overflow is not undefined in Rust (the very first section of the blog post says this). I made this same mistake a few days ago and thought it was undefined (and implementation-defined to panic on debug), but I was wrong. Overflow is well defined in Rust.

adrianN · on April 29, 2016

Many people complaining about overflow checking either work in high performance computing, where it really matters how fast you can add two numbers, or in embedded systems where you can't afford either the extra space for the instructions, or the extra execution time.

Gibbon1 · on April 29, 2016

I get hammered every time I mention this, but what happens when a primate type overflows should be handled by the type system.

IE, by default overflow checked and is an error. But you can declare a type with something like twos_complement. And the compiler will implement the expected behavior. Or if you want no you can disable overflow checks.

This stuff really should not be global behavior. Because 90% of the time the cost of overflow checks is nil so you should do them.

chewbacha · on April 29, 2016

But the article specifically states that you can enable it globally. Meaning that you can either do it or not do it. And sometimes you want overflow, so you can specifically allow it at a function level.

zAy0LfpBZLC8mAC · on April 29, 2016

I think the only sensible thing to do is to have distinct operators for integer arithmetic (where the impossibility to represent a result obviously is an error that should be caught) and wraparound arithmetic that you can use if you actually need wraparound semantics (or where you need the speed and can prove that it does give you the correct result better than the compiler can).

weinzierl · on April 29, 2016

Swift has this and this is a sensible way to go but not the only one.

I personally would prefer integer overflow resulting in NaN like it does with IEEE floating point operations. Regher has written about this option a bit more in [1]. On the other hand, as long as we have no hardware support this is not going to fly.

If we consider current popular hardware as given, want to be as safe as possible while still retaining performance competitive with C/C++ the solution the Rust community chose is a very good one and one I can live with.

[1] http://blog.regehr.org/archives/641

zAy0LfpBZLC8mAC · on April 29, 2016

Well, I would kindof consider that to be an implementation of that approach, if you think of NaN as an error return. The important part is that you don't return a value that is arithmetically incorrect for the expected semantics, but not recognizable as such by itself.

I think the approach of Rust is stupid because it makes the fast but insecure option the default for all the code that's written without thinking about it too much that isn't in any way performance critical. Most arithmetic operators in code are not performance critical, only a few that are executed often, are, so it doesn't make much sense to make the behaviour that you need in that case the default, when that also is a security risk. And overflow checking only in debug builds is kindof useless from a security perspective.

Having different types instead of different operators also is weird. You don't have "additive integers" and "multiplicative integers" either, where the former's "+" operator does addition, while the latter's does multiplication (which is a form of (repeated) addition, after all!). If you have the same set of numbers to operate on, the obviously sensible thing to do is to have different operators for different ways to map pairs of those numbers to single numbers from the same set.

weinzierl · on April 29, 2016

It's not the same as an error return because the NaN propagates through all subsequent calculations.

In numerical code very often there are no checks for NaN at all. I'd even go so far to say that a lot of FORTRAN code in use today was written before NaN was introduced. I/O functions like printf or WRITE treat NaN properly so if something went wrong you end up with the string "NaN" in your output.

In rust the checked_... methods that return an Option come close to the NaN behaviour.

zAy0LfpBZLC8mAC · on April 29, 2016

NaN is a value that is returned to indicate that an operation could not be performed in the way that is normally expected. How is that not pretty much the definition of an error return?

Propagating an error return through subsequent computations is just monadic error handling ... which is, well, error handling?

Also, of course, there are NaN checks everywhere, as is evidenced by the error propagation. You can't propagate the sentinel value if you don't check for it. It just so happens that this check usually is implemented in hardware as part of every instruction for IEEE 754 numbers, so it's extremely fast in the error-free case and doesn't cause any code bloat, at the cost of some cycles being spent on otherwise useless error propagation in case an error occurs.

Now, what you are saying isn't exactly wrong--but I think it's much less confusing to think of it in this way.

reycharles · on April 30, 2016

n+1 is a number.

Alphasite_ · on April 29, 2016

Sounds like what swift does, checked and unchecked arthritic operators.

vog · on April 29, 2016

Why do people invent these strange terms?

"wraparound arithmetic". "unchecked arithmetics". Really?

The correct term is modular arithmetic, which is well established in maths for centuries. And it is properly backed by a really nice and almost dead-simple theory (at least from a programmer's point of view). Is this no longer taught in school?

If you like to emphasize the fact that it's mostly modulo a power of two (2^n), then call it two's complement. However, that overemphasizes the binary representation, which I find most of the time more distracting than helpful.

As a side note, integer division is usually not performed in modular arithmetics, but addition, subtraction and multiplication are.

jimktrains2 · on April 29, 2016

It's modular arithmetic because of the internals of the processor, not because of some law-of-the-universe.

It could just as easy a processor not allow over- or under- flow operations. However, it's easiest to just truncate the number and essentially do modular arithmetic. That's the difference. Over- or under- flow, checked, wrap-around all refer to the state at which the operation's answer needs more space than it is given. Modular arithmetic and checked instructions are what to do in those situations.

Aside, division in modular arithmetic can be done, but it's not what most people want (hence why usually not performed). 4 / 3 mod 7 -> 4 * 5 mod 7 ( 5 is the multiplicative inverse of 3 mod 7 ( 3 * 5 = 1 mod 7) -> 6 mod 7

vog · on April 30, 2016

> It's modular arithmetic because of the internals of the processor, not because of some law-of-the-universe

I beg to differ. There were various integer representations tried, such as having a separate "sign" bit on integers. But the two's complement finally was used by everyone. So there's something deeply unique to modulo arithmetics that none of the other representations could offer. This is not just about saving a few gates in the circuit.

> However, it's easiest to just truncate the number and essentially do modular arithmetic

The point is, there is no truncatenation step. It's just a side effect. If you build any naive adder circuit, substractor, multiplicator - even if you totally ignore negative numbers, you automatically have a full working two's complement, that works for negative numbers without any additional effort.

And why is that? Because the mathematical structure behind it is so dead-simple that it almost works by miracle. (except for division, which is highly non-intuitive and has more pitfalls than usual pitfall of division by zero.)

As a consequence of this simple math structure, it is very easy to build in hardware without any additional handling of nasty special cases.

jimktrains2 · on May 1, 2016

> As a consequence of this simple math structure, it is very easy to build in hardware without any additional handling of nasty special cases.

Which is exactly my point! It's only the way we build things, not some hard-and-fast way things work intrinsically. I have no argument with it's an extremely simple way to deal with an over- or under- flow. But that's my point! It's a way to deal with the case, not the case itself!

jimktrains2 · on May 1, 2016

I guess I missed my edit window:

> except for division, which is highly non-intuitive and has more pitfalls than usual pitfall of division by zero.

Division literally works the same way it does normally:

10 / 2 = 5 => 5 subtractions of 2 from 10 gives 0.

4 / 3 = 6 mod 7 => 6 subtractions of 3 from 4 gives 0

I wouldn't say it's fraught with pitfalls, just that it's not common to see. FWIW, in an under- or overflow- situation we normally don't want modular arithmetic, hence why we still attempt to trap instead of ignore it.

vog · on May 2, 2016

I was referring to the non-prime modulo values, such as 2^32 or 2^64. In those cases, you do not only have "well-defined" and "division by zero", but also "division with multiple valid results". More generally, this is the issue of zero divisors. [1]

Modulo 2^n, you have proper division only if you divide by odd numbers, then you have division by zero, and finally you have division by any other even number, which is very different from what people used to. There may be no solution, or two solutions, or many more solutions.

Yes, could can still solve them by reducing the modulo, but that's the extra pitfall that you don't have with plain integers or rationals.

[1] https://en.wikipedia.org/wiki/Zero_divisor

khedoros · on April 29, 2016

> Why do people invent these strange terms?

Same reason we talk about a "for loop" or a "while loop", instead of just a "conditional branch". Different words add different context to the concept that they're based on. For example, "modular arithmetic" doesn't include the context of checking for undesired leaks in Rust's arithmetic implementation. "Unchecked arithmetic" does.

zAy0LfpBZLC8mAC · on April 29, 2016

(2147483647 + 1) % m = -2147483648

What is the m?

periodontal · on April 29, 2016

2147483647 + 1 is congruent to -2147483648 modulo 4294967296.

vardump · on April 29, 2016

Maybe it's an Apple thing. Sometimes they make up silly names like "Grand Central Dispatch" for executing a block in parallel in multiple threads. "ARC" makes me still think "adaptive replacement cache", not "automatic reference counting". At least they don't call it "automatic retain count". After all, they used to call "reference count" as "retain count". Confused the heck out of me on my first exposure to Objective-C.

Not that others like Microsoft don't do the same thing, renaming commonly known concepts to something they made up. Like DEP vs NX vs XD. Or Interlocked* for atomic operations. Atomic add in Microsoftese is InterlockedAdd.

renox · on April 29, 2016

Unchecked != modulo in Rust case, I think that the compiler is free to generate either a trap on overflow or a modulo result. After all the MIPS has trap on overflow instructions..

dllthomas · on April 30, 2016

> If you like to emphasize the fact that it's mostly modulo a power of two (2^n), then call it two's complement.

And it's signed, and negative numbers work by, well, two's complement...

Referring to unsigned integers, modulo a power of two, as "two's compliment" is at least weird.

Referring to integers with a different encoding of negative numbers as "two's complement" would be incorrect, even if they still wrap around near a power of two.