> a common optimizing operation for most architectures is to trade calculation for memory (unroll loops, lookup tables...)

That really depends. A cache miss adds eons of latency, so it's far worse than a few extra cycles of work, though depending on the workload the reorder buffer might manage to hide the stall entirely. Memory bandwidth as a whole is also incredibly scarce relative to CPU clock cycles.

The only time it's a sure win is when you trade instruction count for data that sits in registers or hits L1 cache, but those are themselves very scarce resources.
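To make that concrete, here's a rough sketch of the same function written both ways: a bit count done with a lookup table and done with pure arithmetic. The function names and the 256-entry table size are my own picks for illustration, not anything from the parent comment. The table is only 256 bytes (four 64-byte cache lines on typical hardware), so it's the favorable case where it can stay resident in L1. Make the table bigger or touch it rarely and the arithmetic version, which never reads memory, is likely to come out ahead.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names, not from any library. The table is 256 bytes,
   small enough to stay in L1 as long as it is used often. */
static uint8_t popcount_table[256];

static void init_popcount_table(void) {
    /* Entry 0 is already 0 (static storage is zero-initialized).
       bits(i) = lowest bit of i + bits(i >> 1), and i >> 1 < i for i > 0,
       so every entry we read has been filled in already. */
    for (int i = 1; i < 256; i++) {
        popcount_table[i] = (uint8_t)((i & 1) + popcount_table[i >> 1]);
    }
}

/* Memory for calculation: four loads, three adds. Cheap when the table
   hits L1, very expensive if any of the loads miss. */
static unsigned popcount_lookup(uint32_t x) {
    return popcount_table[x & 0xffu]
         + popcount_table[(x >> 8) & 0xffu]
         + popcount_table[(x >> 16) & 0xffu]
         + popcount_table[x >> 24];
}

/* Calculation only: more ALU instructions, zero data memory traffic. */
static unsigned popcount_compute(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit partial sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit partial sums */
    x = (x + (x >> 4)) & 0x0f0f0f0fu;                 /* 8-bit partial sums */
    return (x * 0x01010101u) >> 24;                   /* add the four bytes */
}

/* Both versions must return the same answer; only the cost profile differs. */
static void check_agreement(void) {
    init_popcount_table();
    for (uint32_t x = 0; x < 1000000u; x++) {
        assert(popcount_lookup(x) == popcount_compute(x));
    }
}
```

Which one is faster isn't something you can read off the source. It depends on what else is competing for L1 in the surrounding code, so you'd have to measure it inside the real workload rather than in an isolated loop where the table is always hot.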