The other comments have mentioned the tools. On Linux, there's good old `perf`.
There's `perf stat`, which uses CPU performance counters to give you a high-level view of whether your workload is stalling while waiting on memory: https://stackoverflow.com/questions/22165299/what-are-stalle.... However, it won't tell you exactly where the problem is, just that there is a problem.
You can run `perf record` on your process, and then run `perf report` on the generated data. You'll see which functions and which lines/instructions are taking the most time. Most of the time it will be pretty obvious that it's a memory bottleneck, because the hot spot will be some kind of assignment or lookup rather than actual computation.
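To make the "assignment or lookup" point concrete, here's a rough sketch of a workload where almost all the time goes into one dependent lookup. It walks the same number of list slots twice, once in sequential order and once in shuffled order. The names `build_chain` and `walk` and the sizes are made up for illustration. In CPython the interpreter overhead blurs the effect, so a compiled language would show it more sharply, but the access pattern is the same idea.

```python
import random
import time


def build_chain(n, shuffle, seed=0):
    """Return a list where chain[i] is the slot to visit after slot i.

    The links form a single cycle through all n slots, either in
    sequential order or in a shuffled (cache-unfriendly) order.
    """
    order = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(order)
    chain = [0] * n
    for pos in range(n):
        chain[order[pos]] = order[(pos + 1) % n]
    return chain


def walk(chain, steps):
    """Follow the chain for `steps` hops and return the final slot.

    Each hop depends on the previous one, so there is no computation
    to hide behind: the loop is essentially one lookup per iteration.
    """
    i = 0
    for _ in range(steps):
        i = chain[i]
    return i


if __name__ == "__main__":
    n = 5_000_000  # assumed size; large enough to spill out of cache on most machines
    for shuffle in (False, True):
        chain = build_chain(n, shuffle)
        start = time.perf_counter()
        walk(chain, n)
        elapsed = time.perf_counter() - start
        print(f"shuffle={shuffle}: {elapsed:.2f}s")
```

Both runs execute the same number of hops, so any timing gap between them comes from the access pattern and not from extra work. Running the script under `perf stat` is one way to see whether the counters agree with the timings.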
If you're using an Intel processor, VTune is extremely detailed. Here's a nice article from Intel on using it: https://www.intel.com/content/www/us/en/docs/vtune-profiler/... . One of the tables in the article lists functions as "memory bound", meaning most of their time is spent waiting on memory as opposed to executing computations.