All of those already existed by the 80s.

ithkuil · on April 22, 2022

A US patent for the technology behind hyper-threading was granted to Kenneth Okin at Sun Microsystems in November 1994.

code_biologist · on April 22, 2022

I don't want to dismiss hyper-threading as trite (it's not, especially in implementation), but the idea is pretty obvious.

Prior to 1994 the CPU-memory speed delta wasn't so bad that you needed to cover for stalled execution units constantly. Looking at the core clock vs FSB of 1994 Intel chips is a great throwback! [1] Then CPU speed exploded relative to memory, as was probably anticipated by forward-looking CPU architects in 1994.

With slow memory there are a few obvious changes you make, to the degree you need to cover for load stalls: 1) OoO execution, 2) data prefetching, 3) finding other computation (that likely has its own memory stalls) to interleave. The thread level is a pretty obvious granularity at which to interleave work, if deeply non-trivial to actually implement.
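To put a rough number on option 3, here is a back-of-the-envelope sketch in Python. It assumes a deliberately idealized model that is not taken from any real chip: every thread alternates between `compute_cycles` of useful work and `stall_cycles` of waiting on memory, and switching threads is free. The function name and parameters are made up for illustration.

```python
def utilization(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles the shared execution unit stays busy.

    Idealized assumption: each thread repeats `compute_cycles` of work followed
    by `stall_cycles` of memory wait, and thread switches cost nothing.
    """
    if threads < 1 or compute_cycles < 1 or stall_cycles < 0:
        raise ValueError("need threads >= 1, compute_cycles >= 1, stall_cycles >= 0")
    # One thread keeps the unit busy for compute / (compute + stall) of the time;
    # extra threads fill the stall gaps until the unit saturates at 100%.
    return min(1.0, threads * compute_cycles / (compute_cycles + stall_cycles))


if __name__ == "__main__":
    for n in (1, 2, 4, 10):
        print(n, utilization(n, compute_cycles=10, stall_cycles=90))
```

Under these assumptions the longer the stall is relative to the work between stalls, the more threads you need before the execution unit stops idling, which is why the idea gets more attractive as memory falls further behind the core.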

Performance-oriented programmers have always had to think about memory access patterns. Needing to be friendly to your architecture there is nothing new since the 80s.

[1] https://en.wikipedia.org/wiki/Pentium#Pentium

formerly_proven · on April 22, 2022

CDC 6600 ran ten threads on one processor in a way that seems a lot like the Niagara T1 on paper.

ithkuil · on April 23, 2022

I think the main difference is that in the CDC 6600 the PP state barrel would rotate in a regular way, constantly multiplexing the execution of the 10 virtual PPs on the same hardware.

Hyper-threading is the idea where the multiplexing of multiple threads happens dynamically, based on data dependencies / stalls.
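A toy simulation may make the distinction concrete. This is only a sketch under simplifying assumptions, not a model of the CDC 6600 or of any real hyper-threading implementation: there is a single issue slot per cycle, and each instruction is just a number giving how many cycles pass before its thread can issue again (1 means back-to-back, larger values stand in for a stall). The `simulate` function and the policy names are hypothetical.

```python
def simulate(programs, policy):
    """Return the number of cycles until every instruction has issued.

    programs: one list per hardware thread; each entry is the number of cycles
              until that thread may issue its next instruction.
    policy:   "barrel"  -> the slot rotates over threads in a fixed order
              "dynamic" -> the slot goes to any thread that is ready
    """
    if policy not in ("barrel", "dynamic"):
        raise ValueError("policy must be 'barrel' or 'dynamic'")
    n = len(programs)
    pc = [0] * n          # next instruction index per thread
    ready_at = [0] * n    # first cycle at which each thread may issue again
    last = n - 1          # thread that issued most recently
    cycle = 0
    while any(pc[t] < len(programs[t]) for t in range(n)):
        if policy == "barrel":
            # Fixed rotation: this cycle belongs to exactly one thread.
            candidates = [cycle % n]
        else:
            # Stall-driven: try every thread, starting after the last issuer.
            candidates = [(last + 1 + i) % n for i in range(n)]
        for t in candidates:
            if pc[t] < len(programs[t]) and ready_at[t] <= cycle:
                ready_at[t] = cycle + programs[t][pc[t]]
                pc[t] += 1
                last = t
                break
        # If no candidate could issue, the slot is simply wasted.
        cycle += 1
    return cycle


if __name__ == "__main__":
    compute_heavy = [1] * 8   # eight back-to-back instructions
    load_heavy = [4, 4]       # two instructions that each stall the thread
    print("barrel :", simulate([compute_heavy, load_heavy], "barrel"))
    print("dynamic:", simulate([compute_heavy, load_heavy], "dynamic"))
```

With this particular workload the fixed rotation leaves slots empty whenever it is the stalled (or finished) thread's turn, while the stall-driven policy hands those slots to the thread that still has work. A real barrel design avoids the waste by having enough threads that each one's latency has elapsed by the time its turn comes around again.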

yvdriess · on April 24, 2022

Fundamentally they are the same SMT concept though: multiple hardware threads that have separate register (and other) state but share an execution unit.

HT as an implementation tries to give long sequential execution windows to a hardware thread. It's designed to hide occasional long latencies (typically IO), but pays for that with a higher switching cost.

A barrel processor trades off individual instruction latency for a higher degree of parallelism and improved latency hiding.

anamax · on April 23, 2022

And between the two was HEP, by Burton Smith.

jandrewrogers · on April 23, 2022

The MTA architecture developed by Tera Computing came several years earlier (later acquired by Cray). It was arguably a purer expression of the concept. And barrel processing concurrency mechanics can be found on much older architectures.

yvdriess · on April 24, 2022

Other way around: Tera acquired Cray and took its name.