What I don't get is how optimization is supposed to work for RISC-V. For Intel chips, for example, you can tune for every generation, because each one changes parts of the microarchitecture and so enables different tricks. But with RISC-V only the instruction set is shared; every CPU can implement it differently, with different performance characteristics. How is anyone supposed to optimize for that?
There are two ways. Firstly, different chips will have different sets of extensions.
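One common way to cope with differing extensions is to detect them and pick a code path accordingly. Below is a minimal Python sketch of that idea, assuming you already have an ISA string such as `rv64imafdc_zicsr_zba_zbb` (on Linux the `isa` line of `/proc/cpuinfo` reports a string of roughly this form, though the exact contents vary by kernel and hardware). `parse_isa` and `pick_popcount_impl` are hypothetical helpers, and the parser assumes no version suffixes and does not expand shorthands like `g`.

```python
import re


def parse_isa(isa: str) -> tuple[int, set[str]]:
    """Split a RISC-V ISA string into (XLEN, set of extension names).

    Simplified sketch: assumes no version suffixes (e.g. "m2p0") and
    does not expand shorthands such as "g".
    """
    m = re.fullmatch(r"rv(32|64|128)([a-z0-9_]*)", isa.strip().lower())
    if m is None:
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")
    xlen, rest = int(m.group(1)), m.group(2)
    tokens = [t for t in rest.split("_") if t]
    exts: set[str] = set()
    if tokens:
        first = tokens.pop(0)
        # Single-letter extensions run until a multi-letter prefix (z, s, x).
        i = 0
        while i < len(first) and first[i] not in "zsx":
            exts.add(first[i])
            i += 1
        if i < len(first):
            tokens.insert(0, first[i:])
    exts.update(tokens)  # remaining underscore-separated multi-letter names
    return xlen, exts


def pick_popcount_impl(isa: str) -> str:
    """Hypothetical dispatch: use Zbb's cpop instruction if available."""
    _, exts = parse_isa(isa)
    return "cpop" if "zbb" in exts else "portable"


print(pick_popcount_impl("rv64imafdc_zicsr_zba_zbb"))  # cpop
print(pick_popcount_impl("rv64imac"))                  # portable
```

In real code the "choice" would be a function pointer or an ifunc-style resolver rather than a string; the string just keeps the sketch runnable anywhere.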
Secondly, macro-op fusion: you can arrange particular sequences of instructions so that the core fuses them into a single optimized micro-op, and it's expected that in future different cores will want different sequences. A "wrong" sequence will run more slowly, but it will still run.
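To make "particular sequences" concrete, here is a toy Python model of a decoder that fuses two adjacent-instruction patterns often cited in RISC-V fusion discussions: `slli`/`srli` by 32 (zero-extending a 32-bit value on base RV64I) and `lui`/`addi` (building a 32-bit constant). Which pairs a given core actually fuses, if any, is implementation-specific; `Insn`, `fusible` and `count_fused` are hypothetical and don't model any real core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """Hypothetical, simplified instruction record (not a real decoder format)."""
    op: str       # mnemonic, e.g. "slli"
    rd: int       # destination register number
    rs1: int      # first source register number (rs2 omitted for brevity)
    imm: int = 0  # immediate operand, if any


def fusible(a: Insn, b: Insn) -> bool:
    """Would this toy core issue the adjacent pair (a, b) as one macro-op?"""
    # Zero-extend 32 -> 64 bits: slli rd, rs, 32 ; srli rd, rd, 32
    if (a.op, b.op) == ("slli", "srli") and a.imm == b.imm == 32:
        return b.rs1 == a.rd == b.rd
    # Build a 32-bit constant: lui rd, hi ; addi rd, rd, lo
    if (a.op, b.op) == ("lui", "addi"):
        return b.rs1 == a.rd == b.rd
    return False


def count_fused(seq: list[Insn]) -> int:
    """Greedily count adjacent pairs this toy core would fuse."""
    fused, i = 0, 0
    while i + 1 < len(seq):
        if fusible(seq[i], seq[i + 1]):
            fused += 1
            i += 2  # both instructions are consumed by the fused op
        else:
            i += 1
    return fused


# Same result in a0 (x10), two different instruction schedules.
tight = [Insn("slli", 10, 11, 32), Insn("srli", 10, 10, 32)]
split = [Insn("slli", 10, 11, 32), Insn("add", 12, 13), Insn("srli", 10, 10, 32)]
print(count_fused(tight), count_fused(split))  # 1 0
```

The `split` schedule computes the same value but gets no fusion on this toy core: the "wrong" sequence still runs, it just costs an extra micro-op.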
Of course all of this is a PITA from our (software) point of view.
Yeah, because ARM sells complete implementations of its cores. But with RISC-V there may be hundreds of different implementations; you can't really add optimizations for every one of them.
Not everyone buys the implementations; I specifically listed several non-Cortex ones. Cavium/Marvell ThunderX, ThunderX2 (Broadcom Vulcan), Qualcomm Centriq (press F for Falkor), Ampere eMAG (Skylark), and all current Apple cores are very different custom implementations.