The chip's ability to run at sustained load is also part of its design. Precisely because modern chips have to throttle to stay within their power and thermal envelopes, we should be looking at sustained performance as the more accurate measure.
In a majority of cases, burst performance only affects things like responsiveness, and those things should be measured instead for a better reflection of the benefits.
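To make the burst vs. sustained distinction concrete, here is a rough sketch of how you could measure both on the same workload. Everything in it is my own illustration: `workload`, `run_iterations`, `burst_and_sustained`, and the `burst_count` cutoff are hypothetical names and choices, not how Geekbench or Cinebench score anything.

```python
import time


def workload(n=200_000):
    # Hypothetical CPU-bound stand-in for a benchmark kernel (an assumption,
    # not a real benchmark workload).
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_iterations(work, iterations):
    """Call `work` repeatedly and return each call's duration in seconds."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        work()
        durations.append(time.perf_counter() - start)
    return durations


def burst_and_sustained(durations, burst_count=3):
    """Return (burst, sustained) throughput in iterations per second.

    burst     = rate over the first `burst_count` iterations
    sustained = rate over all remaining iterations
    The cutoff is arbitrary and chosen only for illustration.
    """
    if not 0 < burst_count < len(durations):
        raise ValueError("burst_count must leave iterations on both sides")
    head = durations[:burst_count]
    tail = durations[burst_count:]
    return len(head) / sum(head), len(tail) / sum(tail)


if __name__ == "__main__":
    times = run_iterations(workload, 50)
    burst, sustained = burst_and_sustained(times)
    print(f"burst: {burst:.1f} it/s, sustained: {sustained:.1f} it/s")
```

A short run like this will probably show no gap at all, since throttling usually takes minutes of load to appear. You would need a much longer run, and the result would then reflect the machine's cooling as much as the chip.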
If you perform an integrated test, would you not also perform unit tests? A unit test may reveal areas for easy improvement if other aspects of the total package are changed.
For example, if someone thought the M1 was thermally constrained, they might decide to rip the Mini out of its case and attach a different cooling method.
"Geekbench 5 is a cross-platform benchmark that measures your system's performance with the press of a button. How will your mobile device or desktop computer perform when push comes to crunch? How will it compare to the newest devices on the market? Find out today with Geekbench 5"
GB deliberately avoids running up the heat because it is focused on testing the chip, not the machine's cooling ability.
Cinebench, as you say, tests "real-world" conditions, meaning the entire machine, not just the chip.