It actually is true, though it's easy to exaggerate how much it matters. Decoding one x86 instruction into the RISCish internal format isn't hard, but decoding 4 at once can be quite a challenge. There's no way to look at a single byte of the incoming instruction stream and tell whether it's the start of an instruction; you have to do a fair amount of decoding on each byte of the chunk you fetch from the I$ before you can even find the instruction boundaries. With A64 ARM you know every fourth byte starts a new instruction. I wouldn't put the penalty at more than maaaaybe 20% at the very most, but it is a thing.
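To make the boundary problem concrete, here's a toy C sketch (mine, not from any real decoder) that computes instruction length for just NOP, RET, and MOV-imm plus the operand-size prefix; even for this tiny subset you have to chew through the bytes in order, and a real decoder also has to handle REX, ModRM, SIB, displacements, and more before it can hand start offsets to the parallel decode lanes:

    #include <stdint.h>

    /* Toy x86 (32-bit mode) length decoder for a tiny opcode subset.
       Returns the instruction length in bytes, or -1 for anything we
       don't handle. The point: you can't find the NEXT instruction
       without partially decoding THIS one. */
    static int x86_insn_len(const uint8_t *p)
    {
        int len = 0, opsize16 = 0;
        while (*p == 0x66) { opsize16 = 1; p++; len++; } /* prefixes first */
        switch (*p) {
        case 0x90: return len + 1;                       /* NOP */
        case 0xC3: return len + 1;                       /* RET */
        case 0xB8: return len + 1 + (opsize16 ? 2 : 4);  /* MOV eAX, imm */
        default:   return -1;  /* needs ModRM/SIB/disp/imm decoding */
        }
    }

    /* The A64 equivalent is just: next = p + 4. No decoding needed. */

Now imagine lanes 2 through 4 of the decoder: each lane's start offset depends on the length the previous lane computed, which is exactly the serialization you have to buy your way out of with predecode bits or brute force.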
OTOH you can cache the decoded micro-ops (IIRC modern Intel chips do this, while modern AMD and the original Pentium cached instruction boundary markers instead), and in Thumb mode ARM is variable length too (Thumb-2 mixes 16- and 32-bit encodings).
EDIT: And there are things like call gates and the rest of the segmentation machinery, which genuinely are cruft gumming up how the memory system works, and some would argue the x86 memory ordering constraints fall under this heading as well. On the other hand, you should listen to Linus rant about how REP MOVS is the best thing since sliced bread and every other architecture is missing out on it.
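For the curious, REP MOVSB is literally memcpy in a single instruction, and on CPUs with fast string ops (ERMSB) the microcode picks the copy strategy for you. A minimal sketch in GNU C inline asm for x86-64 (the wrapper name is mine):

    #include <stddef.h>

    /* memcpy via REP MOVSB. RDI = dest, RSI = src, RCX = count;
       "rep movsb" copies RCX bytes and updates all three registers,
       hence the "+" (read-write) constraints. */
    static void *movsb_memcpy(void *dst, const void *src, size_t n)
    {
        void *ret = dst;
        __asm__ volatile ("rep movsb"
                          : "+D" (dst), "+S" (src), "+c" (n)
                          :
                          : "memory");
        return ret;
    }

The appeal is that the software side never has to guess at vector widths or unroll factors; the hardware gets to choose, per model, how wide the copy actually runs.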