I was responding to the PC's desire to write an analysis pass.
If perf isn't doing the job, then you need to use a "proper" profiler like VTune. Perf is really good for a first pass, but the analysis it provides is very superficial compared to a better profiler. As far as I'm concerned, profiling on real hardware is better than analysis in an emulated environment: most simulators don't even come close to mimicking a real processor (even gem5 isn't cycle-accurate, so if you are latency-bound you may get the wrong result).
Perf can also track page faults; if you want to know where they happen, you probably want perf mem.
Would Blinkenlights be proper if I told you that it uses Intel Xed? I'm sure Intel VTune is wonderful, but I probably can't use it since I usually don't have access to a desktop. I feel the same way about Valgrind: you have to generate the report, copy it over, spin up the VM, and view it in KCacheGrind.
> Would Blinkenlights be proper if I told you that it uses Intel Xed?
Why would it? I have every faith that the tool is sound (i.e. it parses x86 correctly, etc.); I was just thinking aloud about this kind of analysis.
What do you need to optimize but can't run on a desktop? Or is this the same type of "optimization" that I do fiddling around on compiler explorer? e.g. I like making little snippets faster
Sorry I wasn't able to answer your question. What have I ever done to make you treat me so disrespectfully? Business must not be going well if you need to come online and grill computer hobbyists like me.