They're wider, but that's just because one "word" holds the whole instruction instead of spreading it over multiple bytes. In fact, reverse engineering efforts[0] (and the "RISC86" patent[1]) make clear that they're actually "vertical". Intel Goldmont (from [0]) has entries that are 176 bits each, but each entry is really three distinct 48-bit uops plus a 30-bit "sequence word".
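To make the entry layout concrete, here is a rough Python sketch that splits a 176-bit entry into three 48-bit uops and a 30-bit sequence word. The packing order (uop 0 in the lowest bits, sequence word above the uops) and the names `split_entry` and `pack_entry` are assumptions for illustration only; the real layout is whatever [0] documents. Note that 3 × 48 + 30 = 174, so with the figures above 2 bits are left over, and the sketch just returns them without interpreting them.

```python
# Field widths as quoted above; the bit ordering is an assumption.
UOP_BITS = 48
SEQ_BITS = 30
UOPS_PER_ENTRY = 3
ENTRY_BITS = 176


def split_entry(entry: int):
    """Split one entry into (uops, seqword, leftover bits)."""
    if not 0 <= entry < (1 << ENTRY_BITS):
        raise ValueError("entry does not fit in 176 bits")
    uop_mask = (1 << UOP_BITS) - 1
    # Assumed packing: uop 0 in the lowest 48 bits, then uop 1, then uop 2.
    uops = [(entry >> (i * UOP_BITS)) & uop_mask for i in range(UOPS_PER_ENTRY)]
    seq_shift = UOPS_PER_ENTRY * UOP_BITS
    seqword = (entry >> seq_shift) & ((1 << SEQ_BITS) - 1)
    # Whatever sits above the 174 named bits (2 bits with these widths).
    leftover = entry >> (seq_shift + SEQ_BITS)
    return uops, seqword, leftover


def pack_entry(uops, seqword, leftover=0):
    """Inverse of split_entry, under the same assumed layout."""
    if len(uops) != UOPS_PER_ENTRY:
        raise ValueError("expected exactly three uops")
    if any(not 0 <= u < (1 << UOP_BITS) for u in uops):
        raise ValueError("uop does not fit in 48 bits")
    if not 0 <= seqword < (1 << SEQ_BITS):
        raise ValueError("sequence word does not fit in 30 bits")
    seq_shift = UOPS_PER_ENTRY * UOP_BITS
    if not 0 <= leftover < (1 << (ENTRY_BITS - seq_shift - SEQ_BITS)):
        raise ValueError("leftover bits do not fit in the entry")
    entry = 0
    for i, u in enumerate(uops):
        entry |= u << (i * UOP_BITS)
    entry |= seqword << seq_shift
    entry |= leftover << (seq_shift + SEQ_BITS)
    return entry
```

The point is only that each 48-bit uop is a compact, encoded operation that still needs decoding, which is what makes the format "vertical" despite the wide entry.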
Horizontal microcode is much simpler for in-order processors, but as far as I understand it, it wouldn't work well with today's superscalar processors. Gating hundreds of control lines seems (to me) like more effort than gating a few dozen bits of a uop.
In college we were tasked with designing a CPU. Mine was a stack-oriented design (it started out register-based and kept the registers, but most ops worked on the stack) that used a very large microcode word, one per clock cycle of the instruction being executed. In the end, I was shaving bits off the control word by using "ready" signals between the blocks, so that the microcode didn't need to drive everything. In theory, it could do more than one thing in a clock cycle if the stars aligned just right and there were no dependencies. No instruction used the feature in the end, because the deadline was too close.
Wish I had the time to implement it. OTOH, I'm glad I never had to debug all the analog glitches and timing bugs that design would certainly have shown when it collided with reality.
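For anyone who hasn't seen the "one wide control word per clock cycle" style, here is a toy Python sketch of it. Everything in it is hypothetical (the four control lines, the `Ctl` and `run` names, the ADD sequence) and it is not the design described above; it only shows each bit driving one control line directly, and two lines being asserted in the same cycle when nothing conflicts. The ALU is treated as combinational, which is a simplification.

```python
from enum import IntFlag


class Ctl(IntFlag):
    """Hypothetical control lines: one bit of the microcode word each."""
    POP_TO_A = 1
    POP_TO_B = 2
    ALU_ADD = 4
    PUSH_ALU = 8


# ADD as one control word per clock cycle, one line asserted per cycle.
ADD_SLOW = [Ctl.POP_TO_A, Ctl.POP_TO_B, Ctl.ALU_ADD, Ctl.PUSH_ALU]

# Same instruction with the last two steps sharing a cycle.
ADD_FUSED = [Ctl.POP_TO_A, Ctl.POP_TO_B, Ctl.ALU_ADD | Ctl.PUSH_ALU]


def run(microcode, stack):
    """Step through the control words; return (final stack, cycle count)."""
    stack = list(stack)
    a = b = alu = 0
    cycles = 0
    for word in microcode:
        cycles += 1
        # No decoding step: each bit gates one block directly.
        if word & Ctl.POP_TO_A:
            a = stack.pop()
        if word & Ctl.POP_TO_B:
            b = stack.pop()
        if word & Ctl.ALU_ADD:
            alu = a + b
        if word & Ctl.PUSH_ALU:
            stack.append(alu)
    return stack, cycles
```

With a starting stack of `[10, 2, 3]`, both sequences leave `[10, 5]`; `ADD_SLOW` takes 4 cycles and `ADD_FUSED` takes 3. In real hardware the fused cycle is exactly where the timing trouble would start.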
[0]: https://github.com/chip-red-pill/uCodeDisasm
[1]: https://patents.google.com/patent/US5926642A