Some CPUs shipped with a whole plethora of optional optimizations. Cyrix, for example, packed its CPUs with goodies but had no money to test them, so it made them all optional.
L1 cache, Branch Target Buffer, LSSER (load/store reordering), Loop Buffer, Memory Type Range Registers (Write Combining, Cacheability): all of these could be switched on or off by software running on the machine.
Testing of the Loop Buffer on the Cyrix 5x86 showed an average speed boost of 0.2%, with a maximum observed boost of 2.7%.
https://www.ardent-tool.com/CPU/Cyrix_Cx486.html#soft
https://www.vogons.org/viewtopic.php?t=45756 Register settings for various CPUs
https://www.vogons.org/viewtopic.php?t=30607 Cyrix 5x86 Register Enhancements Revealed
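
To give a rough idea of what "controlled from software" means here: Cyrix configuration registers are reached through an index/data port pair (index written to port 0x22, data then read or written at port 0x23). The sketch below simulates that pattern with a fake port bus so it runs anywhere. The register index (PCR0 = 0x20) and the bit positions are my assumptions for illustration, so check them against the links above before using them on real hardware. FakePortBus, read_reg, write_reg and set_features are hypothetical names, and any unlock step a real chip may need is left out.

```python
# Sketch of the index/data access pattern for Cyrix configuration registers.
# Ports 0x22/0x23 are the Cyrix index/data pair; the register index and the
# bit positions below are ASSUMPTIONS for illustration only.

INDEX_PORT = 0x22
DATA_PORT = 0x23

PCR0 = 0x20        # assumed: 5x86 performance control register index
RSTK_EN = 1 << 0   # assumed: return stack enable
BTB_EN = 1 << 1    # assumed: branch target buffer enable
LOOP_EN = 1 << 2   # assumed: loop buffer enable
LSSER = 1 << 7     # assumed: load/store serialization control


class FakePortBus:
    """Hypothetical stand-in for real port I/O (which needs ring 0 or DOS)."""

    def __init__(self):
        self.regs = {}
        self.index = None

    def outb(self, port, value):
        if port == INDEX_PORT:
            self.index = value
        elif port == DATA_PORT and self.index is not None:
            self.regs[self.index] = value & 0xFF
            self.index = None  # each data access needs a fresh index write

    def inb(self, port):
        if port == DATA_PORT and self.index is not None:
            value = self.regs.get(self.index, 0)
            self.index = None
            return value
        return 0xFF  # nothing selected: behave like an open bus


def read_reg(bus, index):
    bus.outb(INDEX_PORT, index)
    return bus.inb(DATA_PORT)


def write_reg(bus, index, value):
    bus.outb(INDEX_PORT, index)
    bus.outb(DATA_PORT, value)


def set_features(bus, enable=0, disable=0):
    """Read-modify-write PCR0; returns (old, new) register values."""
    old = read_reg(bus, PCR0)
    new = (old | enable) & ~disable & 0xFF
    write_reg(bus, PCR0, new)
    return old, new


if __name__ == "__main__":
    bus = FakePortBus()
    old, new = set_features(bus, enable=BTB_EN | LOOP_EN)
    print(f"PCR0: {old:#04x} -> {new:#04x}")
```

The read-modify-write in set_features is the important part: the other bits in the register are left as the BIOS set them, and only the feature being tested is flipped.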