Glancing at the authors' names, it's possible that none of them are native English speakers. Any chance that the sections you're referring to were just AI-polished rather than AI-generated?
No, this paper was edited yesterday. The original (you can verify on arXiv) contained this incredible section: "6.10 Optimised Routing and Pruning Operations (ORPO)".
The actual ORPO paper is "Odds Ratio Preference Optimisation", and it has nothing to do with pruning. This goes way beyond non-native English phrasing.
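For anyone unfamiliar: as I recall it from the paper, ORPO just adds an odds-ratio preference penalty on top of the standard SFT loss. There is no routing or pruning anywhere in it:

$$
\mathcal{L}_{\mathrm{ORPO}} = \mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}}, \qquad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right),
$$

where $\mathrm{odds}_\theta(y \mid x) = P_\theta(y \mid x) \,/\, (1 - P_\theta(y \mid x))$, $y_w$/$y_l$ are the chosen/rejected responses, and $\lambda$ weights the preference term.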
It takes no time at all to find other major mistakes. For instance, the Mixtral diagram in § 6.6.1 shows a single router selecting between separate 32-layer transformers. In reality, Mixtral has one router per layer (inside each block), and it doesn't select a transformer block: it selects a feedforward network.
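To make the distinction concrete, here's a minimal PyTorch-style sketch of how Mixtral-style routing works (illustrative names and toy dimensions, not the real implementation; the actual experts are SwiGLU blocks, and Mixtral 8x7B uses hidden size 4096, 8 experts, top-2 routing):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Mixture-of-experts FFN: the router lives *inside* the layer and
    picks feedforward experts, not whole transformers."""
    def __init__(self, dim, hidden, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts, bias=False)  # one router per layer
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                                # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.top_k)   # top-2 experts per token
        weights = F.softmax(weights, dim=-1)             # renormalise over the chosen two
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e                    # tokens that routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

class DecoderLayer(nn.Module):
    """Attention omitted for brevity; the MoE simply replaces the dense FFN."""
    def __init__(self, dim=64, hidden=128):
        super().__init__()
        self.moe = MoEFeedForward(dim, hidden)

    def forward(self, x):
        return x + self.moe(x)                           # residual connection

# 32 layers, each with its *own* router over 8 FFN experts --
# there is no single top-level router choosing between 32-layer transformers.
model = nn.Sequential(*(DecoderLayer() for _ in range(32)))
print(model(torch.randn(10, 64)).shape)                  # torch.Size([10, 64])
```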