Long time ago I wrote C. Could anyone fill me in why the first code snippet is a...

tapete2 · 2025-12-11T09:47:38 1765446458

It doesn't even make sense to use strchr for determining the position of 'r', when the code checks that the position of '-' is at index 0.

Your solution is perfectly fine. Even if you don't have access to strchr for some reason, the original snippet is really convoluted.

You could just write (strlen(argv[1]) > 1 && argv[1][0] == '-' && argv[1][0] == 'r') if you really want to.

microtherion · 2025-12-11T10:02:33 1765447353

It could make some sense to use strchr, because in idiomatic UNIX tools, single-character command line options can be clustered. But that also means the subsequent code should not test for a specific position.

And if you ever find yourself actually doing command line parsing, use getopt(). It handles all the corner cases reliably and consistently with other tools.

unwind · 2025-12-11T12:14:40 1765455280

Of course, `&&` in C is short-circuiting, so it's safe without the `strlen()` too, as long as the argument is there, i.e. not NULL.

Also, the use of a convoluted `if` to conditionally assign a literal boolean is a code smell (to me), I would drop the `if` and just use:

    in_reverse = argc > 1 && argv[1][0] == '-' && argv[1][1] == 'r';

if a more forward-thinking/strict check is not needed.
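
To make the short-circuit argument concrete, here is a small sketch; the helper name looks_like_dash_r is hypothetical. Each `&&` operand is evaluated only if the previous one was true, so arg[1] is read only once arg[0] is known to be '-', which means arg[0] is not the terminating NUL and arg[1] is still inside the string.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if arg is non-NULL and begins with "-r".
 * No strlen() needed: && stops at the first false operand, so the
 * function never reads past the terminating '\0'. */
bool looks_like_dash_r(const char *arg)
{
    return arg != NULL && arg[0] == '-' && arg[1] == 'r';
}
```

This is deliberately loose: it accepts anything that starts with "-r", including longer arguments.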

eska · 2025-12-11T15:41:26 1765467686

Your code actually has two bugs. The first I assume is just a typo and you meant to use [1][1] == 'r'. The second one is that you would accept "-rblah" as well.
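
If the stricter behaviour is wanted, one possible sketch compares the whole argument with strcmp(), so "-rblah" is rejected. The function name is_reverse_flag is made up for illustration, and this handles only a lone -r, not clustered options.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true only when the first argument is exactly "-r".
 * argc > 1 guarantees argv[1] is a real argument and not the NULL
 * pointer that terminates argv. */
bool is_reverse_flag(int argc, char *argv[])
{
    return argc > 1 && strcmp(argv[1], "-r") == 0;
}
```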

CerryuDu · 2025-12-11T18:23:37 1765477417

Not to mention the potential signed integer overflow in (*right - *left) and (*left - *right), which is undefined behavior. And even if you rely on common two's complement wraparound, the result may be wrong; for example, (INT_MAX-(-1)) should mathematically yield a positive value, but the function will produce INT_MIN, which is negative.
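
The INT_MAX example can be reproduced without invoking undefined behavior by doing the subtraction in unsigned arithmetic, where wraparound is well defined, and then looking at the bit an int would use as its sign. This is only a sketch: the function name is hypothetical and it assumes the usual layout where UINT_MAX == 2 * INT_MAX + 1.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: would a wrapping "left - right" come out negative?
 * Unsigned subtraction wraps modulo UINT_MAX + 1, which is well defined;
 * a result above INT_MAX has the bit set that an int uses as its sign. */
bool wrapped_difference_is_negative(int left, int right)
{
    unsigned int diff = (unsigned int)left - (unsigned int)right;
    return diff > (unsigned int)INT_MAX;
}
```

wrapped_difference_is_negative(INT_MAX, -1) is true even though INT_MAX > -1, which is the wrong ordering a subtraction-based comparator would report.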

And then we have this "modern" way of spelling pointers, "const int* right" (note the space). In C, declaration syntax mirrors use, so it should be "const int *right", because "*right" is a "const int".

I feel too old for this shit. :(

adrian_b · 2025-12-12T18:30:23 1765564223

You are right, the implementation of the "compare" function should have used only comparison operators, not subtraction, because unlike subtraction the comparison operations are not affected by overflow (the hardware implementation of integer comparison handles overflow automatically).

When there is no overflow, the sign of the subtraction result provides the same information as a comparison operator, but this is no longer true when overflow happens.

Joker_vD · 2025-12-11T21:14:22 1765487662

    const int left = *(const int*)untyped_left, right = *(const int*)untyped_right;

    return in_reverse?
        (right > left) - (right < left)
      : (left > right) - (left < right);

I wonder if there is a way to actually do it with only arithmetic, without comparisons?

adrian_b · 2025-12-12T18:36:33 1765564593

Using only arithmetic operators would require explicit checks for overflow, which would make for a very inefficient implementation. The comparison operators handle overflow implicitly: they do not use the sign of the subtraction result to decide their truth value, but compute the true sign of the result as the modulo-2 sum (a.k.a. XOR) of the result sign with the integer overflow flag. This is done in hardware, with no overhead.
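
For what it's worth, one way to sidestep the overflow checks rather than perform them is to widen before subtracting. This is a different technique from checking for overflow explicitly: it assumes int is narrower than 64 bits, so the difference always fits in an int64_t, and the function name is made up. The sketch makes no claim about being faster than the comparison version.

```c
#include <limits.h>
#include <stdint.h>

#if INT_MAX >= INT64_MAX
#error "this sketch assumes int is narrower than int64_t"
#endif

/* Hypothetical helper: returns -1, 0 or +1 using only subtraction,
 * negation and shifts; no comparison operators. */
int sign_of_difference(int left, int right)
{
    int64_t d = (int64_t)left - (int64_t)right;   /* cannot overflow */

    /* Top bit of -d is set exactly when d > 0; top bit of d when d < 0.
     * Shifting the unsigned conversions keeps everything well defined. */
    int positive = (int)((uint64_t)-d >> 63);
    int negative = (int)((uint64_t)d >> 63);
    return positive - negative;
}
```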

Correct integer comparison regardless of overflow has been a requirement in CPU design since the earliest times. Very few CPUs lack this feature, the most notable examples being Intel 8080 and RISC-V. The competitors and successors of Intel 8080, i.e. Motorola MC6800 and Zilog Z80, added it, making them much more similar to the previously existing CPUs than Intel 8080 was. The fact that even microprocessors implemented with a few thousand transistors around 1975 had this feature emphasizes how weird it is that RISC-V lacks it in 2025, 50 years later and implemented with a number of transistors many orders of magnitude greater.

Joker_vD · 2025-12-12T19:15:54 1765566954

But RISC-V has it?

    SLT     a2, a0, a1
    SLT     a0, a1, a0
    SUB     a0, a0, a2
    RET

camel-cdr · 2025-12-13T00:17:28 1765585048

ARM:

    subs w0, w10, w11
    b.vs trap

RISC-V:

    subw a0, t0, t1
    sub a1, t0, t1
    bne a0, a1, trap
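
For comparison, roughly the same check can be requested from C. The sketch below uses __builtin_sub_overflow, which is a GCC and Clang extension rather than standard C; the names checked_sub and sub_or_abort are hypothetical. What instructions a given compiler emits for it is not shown here and will vary by target and version.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical wrapper around the GCC/Clang builtin.
 * Returns true and stores left - right in *result when it fits in an int;
 * returns false when the subtraction would overflow. */
bool checked_sub(int left, int right, int *result)
{
    return !__builtin_sub_overflow(left, right, result);
}

/* "Trap" variant: abort instead of handing back a wrapped value. */
int sub_or_abort(int left, int right)
{
    int result;
    if (!checked_sub(left, right, &result))
        abort();
    return result;
}
```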

adrian_b · 2025-12-12T19:41:00 1765568460

Sorry, but this is the kind of ridiculous reply that the RISC-V fans give when they are asked why their ISA lacks many of the features that any decent ISA has and which have a negligible implementation cost, therefore no reason to be missing.

The workaround suggested by the RISC-V documentation consists in replacing a very large fraction of all instructions of a program (because there are a lot of integer additions, subtractions and comparisons in any program, close to a half of all instructions) with 3 or more instructions, in order to approximate what in any other CPU is done with single instructions.

The other ridiculous workaround proposed to save RISC-V is that any high-performance implementation must supplant its missing features by instruction fusion.

Yes, the missing hardware for overflow detection can be replaced by multiplying the number of instructions for any operation and the missing addressing modes can be synthesized by instruction fusion, but such implementation solutions are extraordinarily more expensive than the normal solutions used for three quarters of a century in other computers, ever since they were made with vacuum tubes.

Because of the extreme overhead of checking for overflow, I bet that most programs compiled for RISC-V do not check for overflow, which is not acceptable in reliable programs (even when using C/C++, I always compile them with overflow checking enabled, which should have been the default option, to be disabled only in specific cases where it can be proven that overflow is impossible and the checks reduce the performance).

Joker_vD · 2025-12-13T06:38:29 1765607909

Oh, sorry, I thought you were saying that "RISC-V's comparison instructions don't properly handle integer overflow that internally happens when they do the comparisons, i.e. it only has unsigned comparisons".

zozbot234 · 2025-12-12T19:53:52 1765569232

The cost of overflow checks turns out to be largely about missed optimizations due to heavier constraints wrt. how the program should behave if overflow occurs (e.g. preserving partial results). Having an overflow check instruction in the ISA just doesn't matter all that much; it can even hurt in bignum computation (often cited as a favorable case for overflow checks) by introducing unwanted insn dependencies.

adrian_b · 2025-12-12T20:14:01 1765570441

What you say about missed optimizations is true only when the compiler itself attempts to handle gracefully the cases where overflow would occur, instead of raising exceptions.

This is not what normal overflow checking is. Normal overflow checking just raises a specific exception when integer overflow happens.

This has absolutely no effect upon compiler optimizations. The compiler always generates code ignoring the possibility of exceptions. When exceptions happen, the control is passed far away to the exception handler, which decides what to do, e.g. to save debugging information and abort partially or totally the offending program, because an overflow is normally the consequence of an unforeseen program bug and it is impossible to do any action that will allow the continuation of the execution.

You should remember that there is nothing special about integer overflow, almost every instruction that the compiler generates can raise an exception at run time. Any branch instruction, any memory-access instruction, any floating-point instruction, any vector instruction can raise an exception due to hardware. In recent CPUs, integer overflow is not raised implicitly, so you have to insert a conditional branch, but this is irrelevant.

If your theory that the possibility of raising exceptions can influence compiler optimizations were true, there would exist no compiler optimizations, because from every 10 or so instructions generated by a compiler at least a half can raise various kinds of exceptions, in a manner completely unpredictable by the compiler. Adding integer overflow exceptions changes nothing.

camel-cdr · 2025-12-13T00:02:52 1765584172

Ok, let's test it then!

For testing, I use a custom qemu plugin to calculate the dynamic instruction count, dynamic uop count, and dynamic instruction size. Every instruction with multiple register writebacks was counted as one uop per writeback, and to make the results more comparable, SIMD was disabled.

I used this setup to run self-compiling single-file versions of chibicc (assembling) and tinycc (generating object file), which are small C compilers of 9K and 24K LOC respectively. Both compilers were cross-compiled using clang-22 and were benchmarked cross-compiling themselves to x86.

Let's look at the impact of -ftrapv first. In chibicc O3/O2/Os the dynamic uops increased due to -ftrapv for RISC-V by 5.3%/5.1%/6.7%, and for ARM by 5.1%/5.0%/6.4%. Interestingly, in tinycc it only increased for RISC-V by 1.6%/1.0%/1.0%, while ARM increased slightly more with 1.6%/2.0%/1.3%.

In terms of dynamic instruction count, ARM needed to execute 6%/15% fewer instructions than RISC-V for chibicc/tinycc. Looking at the uops, RISC-V needs to execute 6% more uops in tinycc, but ARM needs to execute 0.5% more uops with chibicc. The dynamic instruction size, which estimates the pressure on icache and fetch bandwidth, was 24%/10% lower in RISC-V for chibicc/tinycc.

Note that this did not model any instruction fusion in RISC-V and only treated incrementing loads and load pairs as multiple uops (to mirror Apple Silicon).

If the only fusion pair you implement is adjacent compressed sp relative stores, then RISC-V ends up with a lower uop count for both programs. They are trivial to implement because you can just interpret the two adjacent 16-bit instructions as a single 32-bit instruction, and compilers always generate them next to each other and in sorted order in function prolog code. You can do this directly in your RVC expander; it only adds minimal additional delay (zero with a trick), which is constant regardless of decode width.

Raw data:

    chibicc/clang-O3-armv9:       insns: 419886184    uops:  450136257    bytes: 1679544736
    chibicc/clang-O3-armv9-trap:  insns: 450205913    uops:  474206409    bytes: 1800823652
    chibicc/clang-O3-rva23:       insns: 449328186    uops:  449328186    bytes: 1288202666
    chibicc/clang-O3-rva23-trap:  insns: 474623648    uops:  474623648    bytes: 1375991094
    chibicc/clang-O2-armv9:       insns: 421810039    uops:  451501004    bytes: 1687240156
    chibicc/clang-O2-armv9-trap:  insns: 451642152    uops:  475084965    bytes: 1806568608
    chibicc/clang-O2-rva23:       insns: 449625081    uops:  449625081    bytes: 1286452180
    chibicc/clang-O2-rva23-trap:  insns: 473682134    uops:  473682134    bytes: 1369720036
    chibicc/clang-Os-armv9:       insns: 457841653    uops:  489902437    bytes: 1831366612
    chibicc/clang-Os-armv9-trap:  insns: 497189616    uops:  523323893    bytes: 1988758464
    chibicc/clang-Os-rva23:       insns: 486216287    uops:  486216287    bytes: 1363135906
    chibicc/clang-Os-rva23-trap:  insns: 520889604    uops:  520889604    bytes: 1473263784


    tinycc/clang-O3-armv9:        insns: 115189179    uops:  126358884    bytes: 460756716
    tinycc/clang-O3-armv9-trap:   insns: 117139555    uops:  128361973    bytes: 468558220
    tinycc/clang-O3-rva23:        insns: 137035509    uops:  137035509    bytes: 427878586
    tinycc/clang-O3-rva23-trap:   insns: 139248009    uops:  139248009    bytes: 436548988
    tinycc/clang-O2-armv9:        insns: 115184314    uops:  126568360    bytes: 460737256
    tinycc/clang-O2-armv9-trap:   insns: 117651772    uops:  129195276    bytes: 470607088
    tinycc/clang-O2-rva23:        insns: 137362294    uops:  137362294    bytes: 420468990
    tinycc/clang-O2-rva23-trap:   insns: 138649335    uops:  138649335    bytes: 428680948
    tinycc/clang-Os-armv9:        insns: 130661270    uops:  144718253    bytes: 522645080
    tinycc/clang-Os-armv9-trap:   insns: 132574148    uops:  146565708    bytes: 530296592
    tinycc/clang-Os-rva23:        insns: 152798316    uops:  152798316    bytes: 452181732
    tinycc/clang-Os-rva23-trap:   insns: 154232874    uops:  154232874    bytes: 458257882
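
For anyone re-deriving percentages from the raw counts, a minimal sketch follows. The function name is hypothetical, and dividing by the -ftrapv count rather than the baseline is an assumption; that convention seems to match the quoted 5.1% for ARM chibicc at O3, whereas dividing by the baseline would give about 5.3%.

```c
#include <stdint.h>

/* Hypothetical helper: share of the -ftrapv run's uops that are extra
 * compared with the baseline run, in percent.
 * Assumption: the denominator is the -ftrapv count, not the baseline.
 * Expects trap_uops >= baseline_uops and trap_uops > 0. */
double trap_overhead_percent(uint64_t baseline_uops, uint64_t trap_uops)
{
    return 100.0 * (double)(trap_uops - baseline_uops) / (double)trap_uops;
}
```

For example, trap_overhead_percent(450136257, 474206409) uses the chibicc/clang-O3-armv9 rows above.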

CerryuDu · 2025-12-13T00:11:28 1765584688

    return in_reverse?
        (right > left) - (right < left)
      : (left > right) - (left < right);

I prefer (with "greater" being ±1, defaulting to +1):

    return left < right ? -greater :
           left > right ? greater :
           0;
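
Fleshed out into something compilable, that might look like the sketch below. It assumes the comparator is meant for qsort() (the thread only shows fragments), keeps `greater` at file scope because a qsort() callback receives no extra context argument, and uses a hypothetical sort_ints wrapper.

```c
#include <stddef.h>
#include <stdlib.h>

/* +1 sorts ascending, -1 descending. */
static int greater = 1;

/* Uses only comparisons, so extreme values cannot overflow. */
static int compare_ints(const void *untyped_left, const void *untyped_right)
{
    const int left = *(const int *)untyped_left;
    const int right = *(const int *)untyped_right;

    return left < right ? -greater :
           left > right ?  greater :
           0;
}

/* Hypothetical wrapper: sorts in place, descending when in_reverse != 0. */
void sort_ints(int *values, size_t count, int in_reverse)
{
    greater = in_reverse ? -1 : 1;
    qsort(values, count, sizeof values[0], compare_ints);
}
```

Because the direction lives in a global, this sketch is not safe to call from several threads at once.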

Joker_vD · 2025-12-11T12:22:42 1765455762

I suspect it was adopted from a bigger snippet that had support for parsing things like "-abc" as "-a -b -c", etc.