I don't want to dismiss hyper-threading as trivial. It isn't, especially in implementation, but the idea is pretty obvious.
Before 1994, the CPU-memory speed gap wasn't so large that you constantly needed to cover for stalled execution units. Looking at the core clock vs. FSB of 1994 Intel chips is a great throwback! [1] Then CPU speed exploded relative to memory, as forward-looking CPU architects in 1994 probably anticipated.
With slow memory, there are a few obvious changes you make, to the degree you need to cover for load stalls: 1) out-of-order (OoO) execution, 2) data prefetching, 3) finding other computation (which likely has its own memory stalls) to interleave. The thread level is a pretty obvious granularity at which to interleave work, even if it is deeply non-trivial to actually implement.
Performance-oriented programmers have always had to think about memory access patterns. Needing to be friendly to your architecture in that respect hasn't been new since the '80s.
I think the main difference is that in the CDC 6600, the PP (peripheral processor) state barrel rotated at a fixed, regular pace, constantly multiplexing the execution of the 10 virtual PPs onto the same hardware.
Hyper-threading is the idea that the multiplexing of multiple threads happens dynamically, based on data dependencies and stalls.
Fundamentally they are the same SMT concept though: multiple hardware threads that have separate register (and other) state but share an execution unit.
HT, as an implementation, tries to give a hardware thread long sequential execution windows. It's designed to hide occasional long latencies (typically IO), but pays for that with a higher switching cost.
A barrel processor trades off individual instruction latency for a higher degree of parallelism and improved latency hiding.
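To make the contrast between a fixed-rotation barrel and stall-driven switching concrete, here's a toy cycle-count simulation. It's only a sketch under loud assumptions: one issue slot per cycle, ALU ops that let a thread issue again on the next cycle, loads with a single fixed latency, and a "switch on stall" policy with a hypothetical `switch_penalty` standing in for the switching cost mentioned above. All names (`Thread`, `run_barrel`, `run_switch_on_stall`) are made up for illustration. It does not model Intel's actual Hyper-Threading (simultaneous multithreading inside an out-of-order core) or the real CDC 6600 PP timing.

```python
from dataclasses import dataclass

ALU, LOAD = "alu", "load"  # the only two op kinds in this toy model


@dataclass
class Thread:
    program: list       # sequence of ALU/LOAD ops
    pc: int = 0         # index of the next op to issue
    ready_at: int = 0   # earliest cycle at which the next op may issue

    def done(self) -> bool:
        return self.pc >= len(self.program)

    def ready(self, cycle: int) -> bool:
        return not self.done() and cycle >= self.ready_at

    def issue(self, cycle: int, load_latency: int) -> None:
        op = self.program[self.pc]
        self.pc += 1
        # An ALU op lets the thread issue again next cycle; a load stalls it
        # for a fixed (assumed) memory latency.
        self.ready_at = cycle + (load_latency if op == LOAD else 1)


def run_barrel(programs, load_latency):
    """Strict rotation: cycle t belongs to thread t % N, used or not.

    Returns the number of cycles until every op has issued."""
    threads = [Thread(list(p)) for p in programs]
    cycle = 0
    while not all(t.done() for t in threads):
        t = threads[cycle % len(threads)]
        if t.ready(cycle):
            t.issue(cycle, load_latency)
        # otherwise the slot is simply wasted
        cycle += 1
    return cycle


def run_switch_on_stall(programs, load_latency, switch_penalty=0):
    """Stay on one thread; switch only when it stalls or finishes.

    switch_penalty is a hypothetical cost, in cycles, of each switch."""
    threads = [Thread(list(p)) for p in programs]
    current, cycle = 0, 0
    while not all(t.done() for t in threads):
        if threads[current].ready(cycle):
            threads[current].issue(cycle, load_latency)
            cycle += 1
            continue
        others = [i for i, t in enumerate(threads)
                  if i != current and t.ready(cycle)]
        if others:
            current = others[0]
            cycle += switch_penalty  # pay to swap the other thread in
        else:
            cycle += 1               # nobody can issue: idle cycle
    return cycle


if __name__ == "__main__":
    mem_bound = [[LOAD, ALU] * 2 for _ in range(4)]   # 4 threads, 16 ops
    print(run_barrel(mem_bound, load_latency=4))      # 16: no slot wasted
    print(run_barrel(mem_bound, load_latency=8))      # 24: latency > rotation
    lone = [[ALU] * 4, [], [], []]                    # one busy thread
    print(run_barrel(lone, load_latency=4))           # 13: one slot per 4 cycles
    print(run_switch_on_stall(lone, load_latency=4))  # 4: back-to-back issue
```

The lone-thread case is the barrel tradeoff in miniature: strict rotation hands a single busy thread only one slot in four, while the stall-driven policy runs it back to back. Conversely, for the memory-bound workload, once the load latency fits inside the rotation (`load_latency <= 4` with 4 threads), the barrel wastes no slots without any scheduling logic at all.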
The MTA architecture developed by Tera Computer Company came several years earlier (Tera later acquired Cray Research from SGI and renamed itself Cray Inc.). It was arguably a purer expression of the concept. And barrel-processing concurrency mechanics can be found in much older architectures.