> I blame Intel for why so many folks avoid assembly language. x86 (the worst as...

akira2501 · on July 11, 2024

I think that's putting the cart before the horse. I think it wouldn't matter which architecture you choose as there will always be deep performance considerations that must be understood in order to write efficient software.

Otherwise your statement might amount down to "I hope there is an ISA that intentionally wastes performance and energy in deference to human standards of beauty."

It's why the annals of expertise rarely makes for good dinner table conversation.

hajile · on July 11, 2024

x86 has a parity flag. It only takes the parity of the lowest 8 bits though. Why is it there? Because it was in the 8086 because it was in the 8080 because it was in the 8008 because Intel was trying to win a contract for the Datapoint 2200.

Sometimes the short instruction variant is correct, but not if it makes a single instruction break down into many uops as the microcode is 1000x slower.

Oh, but you need to use those longer variants without extra uops to align functions to cache boundaries because they perform better than NOP.

Floats and SIMD are a mess with x87, AMX (with incompatible variants), SSE1-4 (with incompatible variants), AVX, AVX2, and AVX512 (with incompatible variants) among others.

The segment and offset was off dealing with memory is painful.

What about the weird rules about which registers are reserved for multiply and divide? Half the “general purpose” registers are actually locked in at the ISA level.

Now APX is coming up and you get to choose between shorter instructions with 16 registers and 2 registers syntax or long instructions with 32 registers ands 3 registers instructions.

And this just scratches the surface.

RISCV is better in every way. The instruction density is significantly higher. The instructions are more simple and easier to understand while being just as powerful. Optimizing compilers are easier to write because there’s generally just one way to do things and is guaranteed to be optimized.

saagarjha · on July 12, 2024

> Optimizing compilers are easier to write because there’s generally just one way to do things and is guaranteed to be optimized.

Yeah I’m sure RISC-V has completely eliminated zeroing idioms

LegionMammal978 · on July 10, 2024

What do you find particularly problematic about x86 assembly, from a pedagogical standpoint? I've never noticed any glaring issues with it, except for the weird suffixes and sigils if you use AT&T syntax (which I generally avoid).

timmisiak · on July 10, 2024

I suspect the biggest issue is that courses like to talk about how instructions are encoded, and that can be difficult with x86 considering how complex the encoding scheme is. Personally, I don't think x86 is all that bad as long as you look at a small useful subset of instructions and ignore legacy and encoding.

LegionMammal978 · on July 10, 2024

True, encoding is one thing that really sets x86 apart. But as you say, the assembly itself doesn't seem that uniquely horrible (at least not since the 32-bit era), which is why I found the sentiment confusing as it was phrased.

Maybe it's the haphazard SIMD instruction set, with every extension adding various subtly-different ways to permute bytes and whatnot? But that would hardly seem like a beginner's issue. The integer multiplication and division instructions can also be a bit wonky to use, but hardly unbearably so.

hajile · on July 11, 2024

An ordinary developer cannot write performant x86 without a massive optimizing compiler.

Actual instruction encoding is horrible. If you’re arguing that you can write a high-level assembly over the top, then you aren’t so much writing assembly as you are writing something in between.

When you need to start caring about the actual assembly (padding a cache line, avoiding instructions with too many uops, or choosing between using APX 32 registers and more normal shorter instructions, etc) rather than some high level abstraction, the experience is worse than any other popular ISA.

LegionMammal978 · on July 11, 2024

> Actual instruction encoding is horrible. If you’re arguing that you can write a high-level assembly over the top, then you aren’t so much writing assembly as you are writing something in between.

One instruction in x86 assembly is one instruction in the machine code, and one instruction as recognized by the processor. And except for legacy instructions that we shouldn't teach people to use, each of these is not much higher-level than an instruction in any other assembly language. So I still don't see what the issue is, apart from "the byte encoding is wacky".

(There are μops beneath it of course, but these are reordered and placed into execution units in a very implementation-dependent manner that can't easily be optimized until runtime. Recall how VLIW failed at exposing this to the programmer/compiler.)

> padding a cache line, avoiding instructions with too many uops

Any realistic ARM or RISC-V processor these days also supports out-of-order execution with an instruction cache. The Cortex processors even have μops to support this! The classic 5-stage pipeline is obsolete outside the classroom. So if you're aiming for maximum performance, I don't see how these are concerns that arise far less in other assembly languages. E.g., you'll always have to be worried about register dependencies, execution units, optimal loop unrolling, etc. It's not like a typical program will be blocked on the μop cache in any case.

> APX 32 registers and more normal shorter instructions

APX doesn't exist yet, and I'd wager there's a good chance it will never reach consumer CPUs.

hajile · on July 11, 2024

Which “one instruction” is the right one? You can have the exact same instruction represented many different ways in x86. Some are shorter and some are longer (and some are different, but the same length). When you say to add two numbers, which instruction variant is correct?

This isn’t a straightforward answer. As I alluded to in another statement you quoted, padding cache lines is a fascinating example. Functions should ideally start at the beginning of a cache line. This means the preceding cache line needs to be filled with something. NOP seems like the perfect solution, but it adds unnecessary instructions in the uop cache which slow things down. Instead, the compiler goes to the code on the previous cache line and expands instructions to their largest size to fill the space because adding a few bytes of useless prefix is allowed and doesn’t generate NOPs in the uop cache.

Your uop statements aren’t what I was talking about. Almost all RISCV instructions result in just one uop. On fast machines, they may even generate less than one uop if they get fused together.

x86 has a different situation. If your instruction generates too many uops (or is too esoteric), it will skip the fast hardware decoders and be sent to a microcode decoder. There’s a massive penalty for doing this that slows performance to a crawl. To my knowledge, no such instructions exist in any modern ISA.

Intel only has one high performance core. When they introduce APX, it’ll be on everything. There’s good reason to believe this will be introduced. It adds lots of good features and offers increased code density in quite a few situations which is something we haven’t seen in a meaningful way since AMD64.

saagarjha · on July 12, 2024

> x86 has a different situation. If your instruction generates too many uops (or is too esoteric), it will skip the fast hardware decoders and be sent to a microcode decoder. There’s a massive penalty for doing this that slows performance to a crawl. To my knowledge, no such instructions exist in any modern ISA.

Not surprising that modern ISAs are missing legacy instructions

tengwar2 · on July 12, 2024

Personally I found 80286 and 80386 much easier than Z80.