Just using the I$ hit ratio is problematic in many ways. E.g.:
- You'll probably not find implementations of different ISAs with identical cache configurations (size, associativity, etc.).
- It says little about what work is actually done (different ISAs = insns do different amounts of work).
- On x86 all bets are off w.r.t. the effect of the uop cache on the L1I cache hit ratio, and the uop cache hit ratio can't be compared to any other machine.
- You need to reproduce the same program flow on different architectures to be able to compare the numbers.
...etc.
I think that the only reasonable way to do it is to have a multi-ISA simulator where you are in full control of all these aspects (a toy version of the cache-model half is sketched below). And it would be really hard work.
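To make that concrete, here is a minimal sketch of what the cache-model half of such a simulator could look like, assuming you can already generate a per-ISA trace of instruction-fetch addresses (e.g. from an emulator). The geometry constants are illustrative, not a claim about any real core.

```c
/* Toy set-associative I$ model with LRU replacement.
 * Reads one instruction-fetch address per line (hex) from stdin. */
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SETS      64   /* illustrative: 32 KiB = 64 sets * 8 ways * 64 B */
#define WAYS      8
#define LINE_BITS 6    /* 64-byte cache lines */

typedef struct {
    uint64_t tag[SETS][WAYS];
    uint64_t lru[SETS][WAYS];   /* larger stamp = more recently used */
    bool     valid[SETS][WAYS];
    uint64_t accesses, misses, tick;
} icache_t;

static void icache_access(icache_t *c, uint64_t pc)
{
    uint64_t line = pc >> LINE_BITS;
    uint64_t set  = line % SETS;
    uint64_t tag  = line / SETS;
    c->accesses++;
    c->tick++;

    for (int w = 0; w < WAYS; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->lru[set][w] = c->tick;   /* hit: refresh the LRU stamp */
            return;
        }
    }
    /* Miss: fill the least recently used (or first invalid) way. */
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (!c->valid[set][w] || c->lru[set][w] < c->lru[set][victim])
            victim = w;
    c->valid[set][victim] = true;
    c->tag[set][victim]   = tag;
    c->lru[set][victim]   = c->tick;
    c->misses++;
}

int main(void)
{
    icache_t c;
    memset(&c, 0, sizeof c);
    uint64_t pc;
    while (scanf("%" SCNx64, &pc) == 1)
        icache_access(&c, pc);
    printf("accesses=%" PRIu64 " misses=%" PRIu64 " hit ratio=%.4f\n",
           c.accesses, c.misses,
           c.accesses ? 1.0 - (double)c.misses / c.accesses : 0.0);
    return 0;
}
```

Because the model is identical for every ISA, differences in the resulting hit ratio reflect only code density and layout, which is exactly the control you can't get from real hardware.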
Re point 2: the work per instruction doesn't matter if you compare the same program/program execution; in practice you get an estimate of the code's resident set size relative to the amount of work.
All your other points do stand, and that's what I mean by 'is very machine dependent'. And yes, if you want to fully isolate the effect of instruction density, an emulator might be the only solution. Still, I think profiling counters can get you 90% of the way there (see the sketch below).
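For the counter route, something like the following is what I mean on Linux. This is a hedged sketch using perf_event_open(2), which defines generic L1I access/miss cache events; whether the hardware actually implements them varies by CPU (the syscall fails if not), and `workload()` is a placeholder for the code under test.

```c
/* Sketch: measure the L1I hit ratio of a workload via perf_event_open(2).
 * Linux-only; generic cache events may be unimplemented on some CPUs. */
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int open_l1i_counter(uint64_t result)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HW_CACHE;
    /* Encoding per perf_event_open(2): id | (op << 8) | (result << 16). */
    attr.config = PERF_COUNT_HW_CACHE_L1I
                | (PERF_COUNT_HW_CACHE_OP_READ << 8)
                | (result << 16);
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    return (int)syscall(SYS_perf_event_open, &attr,
                        0 /* this thread */, -1 /* any CPU */,
                        -1 /* no group */, 0);
}

static void workload(void)
{
    /* Placeholder: run the code whose I$ behaviour you want to measure. */
}

int main(void)
{
    int acc = open_l1i_counter(PERF_COUNT_HW_CACHE_RESULT_ACCESS);
    int mis = open_l1i_counter(PERF_COUNT_HW_CACHE_RESULT_MISS);
    if (acc < 0 || mis < 0) {
        perror("perf_event_open (L1I events unsupported on this CPU?)");
        return 1;
    }
    ioctl(acc, PERF_EVENT_IOC_ENABLE, 0);
    ioctl(mis, PERF_EVENT_IOC_ENABLE, 0);
    workload();
    ioctl(acc, PERF_EVENT_IOC_DISABLE, 0);
    ioctl(mis, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t a = 0, m = 0;
    read(acc, &a, sizeof a);
    read(mis, &m, sizeof m);
    printf("L1I accesses=%" PRIu64 " misses=%" PRIu64 " hit ratio=%.4f\n",
           a, m, a ? 1.0 - (double)m / a : 0.0);
    return 0;
}
```

Keep the x86 caveat above in mind when reading the output: while the front end streams uops out of the uop cache, 'L1I accesses' undercounts fetch work, so the ratio isn't comparable across machines.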