
In-order parallel designs are "VLIW". The jargon indeed gets thick. :)

But as to OoO: the whole idea of issuing sequential instructions in parallel means that the hardware needs to track dependencies between them so they can't race ahead of their inputs. And if you're going to do that anyway, allowing them to retire out of order is a big performance/transistor-count win, as it allows the pipeline lengths to be different.
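To make the pipeline-length point concrete, here's a minimal Python sketch (the latencies and instructions are invented, nothing machine-specific): with in-order completion the short add has to wait behind the long multiply in front of it, while out-of-order completion lets each result retire as soon as its own pipeline finishes.

    # Toy model: two independent instructions on execution units with
    # different pipeline lengths (latencies are made up for illustration).
    # Assume one instruction issues per cycle, in program order.
    # With in-order completion the short op cannot retire before the long op
    # ahead of it; with out-of-order completion each op retires as soon as
    # its own pipeline produces the result.

    def completion_cycles(instrs, in_order):
        results = []
        retire_free = 0                        # earliest cycle the retire slot is free
        for issue_cycle, (name, latency) in enumerate(instrs):
            finish = issue_cycle + latency     # when this op's pipeline is done
            if in_order:
                finish = max(finish, retire_free)  # may not pass older instructions
                retire_free = finish + 1
            results.append((name, finish))
        return results

    prog = [("mul r1, r2, r3", 4),   # long pipeline
            ("add r4, r5, r6", 1)]   # short pipeline, independent of the mul

    print("in-order completion:    ", completion_cycles(prog, True))
    print("out-of-order completion:", completion_cycles(prog, False))

The add's result is ready at cycle 2 either way; in-order completion just makes it sit there until the multiply ahead of it drains.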




VLIW is again a different thing. It uses a single instruction that encodes multiple independent operations to simplify decoding and tracking, usually with exposed pipelines.

But you can have, for example, a classic in-order RISC design that allows for parallel execution. OoO renaming is not necessary for dependency tracking (in fact, even scalar in-order CPUs need dependency tracking to resolve RAW and other hazards); it is "only" needed for executing around stalled instructions (whereas an in-order design will stall the whole pipeline).

Again, the P5 (i.e. the original Pentium) was a very traditional in-order design, yet it could execute up to two instructions per cycle.
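As a rough sketch of what that dependency tracking looks like without any renaming (a toy greedy pairer with an invented register syntax, not the actual P5 pairing tables): the second instruction of a would-be pair is held back whenever it reads a register the first one writes.

    # Toy in-order dual issue: no renaming, just a RAW-hazard check between
    # the two candidate instructions each cycle. Register names and the
    # "anything can pair with anything" assumption are simplifications.

    def raw_hazard(older, newer):
        return older["dst"] in newer["src"]

    def issue_groups(program):
        i, groups = 0, []
        while i < len(program):
            first = program[i]
            if i + 1 < len(program) and not raw_hazard(first, program[i + 1]):
                groups.append((first["op"], program[i + 1]["op"]))  # dual issue
                i += 2
            else:
                groups.append((first["op"],))                       # issue alone
                i += 1
        return groups

    prog = [
        {"op": "add r1, r2, r3", "dst": "r1", "src": {"r2", "r3"}},
        {"op": "sub r4, r1, r5", "dst": "r4", "src": {"r1", "r5"}},  # RAW on r1
        {"op": "or  r6, r7, r8", "dst": "r6", "src": {"r7", "r8"}},
    ]
    print(issue_groups(prog))
    # [('add r1, r2, r3',), ('sub r4, r1, r5', 'or  r6, r7, r8')]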


> VLIW is again a different thing.

No it isn't. I'm being very deliberate here in refusing pedantry. In practice, "multiple dispatch" means "OoO" in the same way that "VLIW" means "parallel in-order dispatch". Yes, you can imagine hypothetical CPUs that blur the distinction, but they'd be so weird that they'd never be built. Discussing the jargon without context only confuses things.

> you can have, for example, a classic in-order RISC design that allows for parallel execution.

Only by inventing VLIW, though; otherwise there's no way to tell the CPU how to order what it does. Which is my point; the ideas are joined at the hip. Note that the Pentium had two defined pipes with specific rules about how the pairing was encoded in the instruction stream. It was, in practice, a VLIW architecture (just one with a variable length encoding and where most of the available instruction bundles only filled one slot)! Pedantry hurts in this world; it doesn't help.


> Note that the Pentium had two defined pipes with specific rules about how the pairing was encoded in the instruction stream. It was, in practice, a VLIW architecture (just one with a variable length encoding and where most of the available instruction bundles only filled one slot)!

This is ridiculous. There are no nop-filled slots in the instruction stream, and you can't even be sure which instructions will issue together unless you trace backwards far enough to find a sequence of instructions that can only be executed on port 0 and thus provide a known synchronization point. The P5 only has one small thing in common with VLIW, and there's already a well-accepted name for that feature, and it isn't VLIW.
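The "no fixed bundles" point can be shown with the same kind of toy pairer (the instruction classes are invented for the example): greedy pairing of the identical stream groups instructions differently depending on where decode starts, and only falls back in sync at an instruction that can only go down the first pipe.

    # The same instruction stream, greedily paired from two different
    # starting points. "U" stands for an instruction that can only issue
    # in the first pipe; everything else can go in either slot.

    u_only = {"U"}

    def pair_from(stream, start):
        i, groups = start, []
        while i < len(stream):
            if i + 1 < len(stream) and stream[i + 1] not in u_only:
                groups.append((stream[i], stream[i + 1]))
                i += 2
            else:
                groups.append((stream[i],))
                i += 1
        return groups

    stream = ["a", "b", "c", "d", "U", "e", "f"]
    print(pair_from(stream, 0))  # [('a','b'), ('c','d'), ('U','e'), ('f',)]
    print(pair_from(stream, 1))  # [('b','c'), ('d',), ('U','e'), ('f',)]

There's no slot structure in the encoding itself; the grouping depends on where the front end happens to be, which is exactly why you need a known synchronization point to predict it.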


Meh. P5's decode algorithm looks very much like VLIW to me, and emphatically not like the 21264/P6 style of dispatch that came to dominate later. I find that notable, and in particular I find that senseless adherence to jargon definitions[1] hurts rather than helps in this sphere. Arguing about how to label technology instead of explaining what it does is a bad smell.

[1] That never really worked anyway. ia64, Transmeta's devices, and Xtensa HiFi are all "VLIW" by your definition, yet they work nothing like each other.


I'm sorry, but if P5 was VLIW then the word has lost all meaning. The two couldn't possibly be more different.


The point was exactly that "VLIW" as a term has basically no meaning. What it "means" in practice is parallel in-order dispatch (and nothing about the instruction format), which is what I said upthread.


VLIW means Very Long Instruction Word. It is a property of the instruction set, not of the processor that implements it.

You could have a VLIW ISA that is implemented by a processor that "unrolls" each instruction word and mostly executes the constituent instructions serially.
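A minimal sketch of that split (the bundle width and operations are made up): the ISA fixes the bundle format, and the implementation is free to dispatch the slots together or crack them and run them one per cycle.

    # A toy three-slot VLIW bundle. The format guarantees the slots are
    # independent; whether they actually execute in parallel is up to the
    # implementation. Bundle width and op strings are invented.

    bundle = ("add r1, r2, r3", "mul f0, f1, f2", "ld  r4, [r5]")

    def execute_unrolled(bundle):
        # cheap implementation: crack the word and run one slot per cycle
        for cycle, op in enumerate(bundle):
            print(f"cycle {cycle}: {op}")

    def execute_wide(bundle):
        # wide implementation: all slots dispatched in the same cycle
        print("cycle 0: " + " | ".join(bundle))

    execute_unrolled(bundle)
    execute_wide(bundle)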


Also, you can have out-of-order VLIW. Later Itaniums were like that, because it turns out VLIW doesn't help much with random memory access latency.



