When you say Mersenne-Twister isn't good enough, what are the other shortcomings apart from speed? It seems that even modern versions of Python are continuing to use it...
The homepage might come across as a little overzealous (for example, ChaCha's quality is listed as good rather than excellent), but it generally makes good points.
For example, for one of his arguments, he specifically chose a generator called pcg32_once_insecure, which the PCG author does not recommend due to its invertible output function!
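To make "invertible output function" concrete, here is a minimal sketch, not the actual pcg32_once_insecure code. It assumes an output function built only from an xorshift and a multiplication by an odd constant; the names `scramble` and `unscramble` and the constant `MULT` are made up for illustration. Both steps are bijections on 32-bit values, so anyone who sees one output can recover the full input. The modular inverse via `pow(..., -1, ...)` needs Python 3.8 or later.

```python
MASK32 = 0xFFFFFFFF
MULT = 277803737  # illustrative odd constant (assumption, chosen for the demo)

def scramble(x):
    """Toy output function: one xorshift, then one odd multiply mod 2**32."""
    x ^= x >> 16
    return (x * MULT) & MASK32

def unxorshift(y, shift, bits=32):
    """Undo y = x ^ (x >> shift) by recovering `shift` more bits per pass."""
    x = y
    known = shift  # the top `shift` bits of y already equal those of x
    while known < bits:
        x = y ^ (x >> shift)
        known += shift
    return x

def unscramble(y):
    """Invert scramble(): odd multipliers have an inverse mod 2**32."""
    inv = pow(MULT, -1, 1 << 32)
    x = (y * inv) & MASK32
    return unxorshift(x, 16)

secret_state = 0xDEADBEEF
assert unscramble(scramble(secret_state)) == secret_state
```

If the generator's whole state passes through a function like this, a single output gives the state away, which is presumably why that variant carries "insecure" in its name.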
Personally, I have read both arguments in detail, and I would always use PCG or even a truncated LCG over xoshiro, which has a comparatively large state size, potentially worse statistical properties, and no consistent speed gain: it is faster in some benchmarks and slower in others.
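For anyone unfamiliar with the two alternatives, here is a rough Python sketch. The truncated LCG steps a 64-bit LCG and returns only the top 32 bits; the PCG32 step follows the published XSH-RR variant (xorshift the old state, then rotate by its top 5 bits). The function names are my own, the fixed increment `INC` is one common choice rather than a requirement, and the seeding is deliberately simplified (the reference implementation runs an extra initialization routine), so treat this as an illustration rather than a drop-in generator.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
MULT = 6364136223846793005  # 64-bit LCG multiplier used by PCG32
INC = 1442695040888963407   # odd increment, so the LCG has period 2**64

def lcg_step(state):
    """Advance the underlying 64-bit LCG by one step."""
    return (state * MULT + INC) & MASK64

def truncated_lcg(state):
    """Return (new_state, output); the output is the top 32 bits only."""
    state = lcg_step(state)
    return state, state >> 32

def pcg32(state):
    """Return (new_state, output) using the XSH-RR output permutation."""
    old = state
    state = lcg_step(old)
    xorshifted = (((old >> 18) ^ old) >> 27) & MASK32
    rot = old >> 59  # top 5 bits select the rotation amount
    out = ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & MASK32
    return state, out

state = 42  # simplified seeding (assumption): raw seed used as the state
for _ in range(3):
    state, value = pcg32(state)
    print(f"{value:#010x}")
```

Either way the whole generator is 64 bits of state plus a multiply and a handful of shifts, which is the size comparison being made against xoshiro.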
Yeah, xor is simpler than multiplication in terms of hardware complexity. Luckily, we have the multiplication circuits built in, so we may as well take advantage of them.