
That last part is important. I have worked with many engineers whom I would even classify as hard-working, but who spent little to no time understanding the hardware they were running on and the possibilities it provided them.

I have heard "that's slow" or "that's good" too many times in performance talks that have completely ignored the underlying machine and what was possible.




Learning about how the CPU cache works is probably the most useful thing you can do if you write anything that's not I/O limited. There are definitely a ton of experienced programmers who don't quite understand how often the CPU is just waiting around for data from RAM.


It is a shame that there are not better monitoring tools that surface this. When I use Activity Monitor on macOS, it would be useful to see how much of “% CPU” is just waiting on memory. I know I can drill down with various profilers, but having it more accessible is way overdue.


Instruments?


Digging around in Instruments is the opposite of accessible.

Every OS has always had easy ways to tell if a process is waiting on disk or network (e.g., top, Activity Monitor). The mechanisms for measuring how often a process is waiting on memory exist too, but you have to reach for a profiler to use them. Making them more accessible is overdue. Think of a column after “% CPU” that shows the percentage of time blocked on memory.


What would you do with that information? You'd need a profiler (and either a copy of your code, or a disassembler) to make it actionable…


I would do the same thing with the information I get from top and Activity Monitor: use that to guide me to what needs investigating.

I am often developing small one-off programs to process data. I then keep some of these running in various workflows for years. Currently, I might notice a process taking an enormous amount of CPU according to top, but it might really be just waiting on memory. Surfacing that would tell me where to spend my time with a profiler.


I’m having a very hard time imagining how you would go from a “percent time waiting on memory” to something productive without doing more work in between. Even assuming you’re dealing with your own, native code, the number tells you almost nothing about where the problem is. The only process I’ve ever seen working is “hmm I have a CPU-bound performance problem (as reported by e.g. Activity Monitor)” → “I used a profiler and the problem is here or it’s spread out” → “I used a specialized tool”.


> The only process I’ve ever seen working is “hmm I have a CPU-bound performance problem (as reported by e.g. Activity Monitor)”

I want to be able to do the same for memory bound performance problems.

But the top-level tools are stuck decades in the past, when CPUs were the bottleneck.


My point is that this isn't how performance work is done. You have to first diagnose that the issue is CPU-bound before its being memory-bound can enter the picture. Time spent waiting for memory is accounted for the same as any other CPU work, so it falls under that metric.

To make an analogy, this would be like adding a metric for function calls into Activity Monitor and using it to diagnose quadratic performance. You can't just take that number and immediately figure out the problem; you need to go look at the code and see what it's doing first and then go "oh ok this number is too high". The same applies to waiting for memory. What are you going to do with a number that says the program is spending 30% of its time stalled on loads? Is that too high? A good number? You need to analyze it in more detail elsewhere first.


> Time spent waiting for memory is accounted the same as any other CPU work, so it goes under that metric.

Yes. I know. That’s my point. Tools exist to dig deeper, but they are not surfaced well.

I do performance work often. I simply stated that it is a shame that the highest level tools do not show us valuable information.

You are free to accept the status quo.


You’re really just making a case for firing up a profiler more often. That’s fine, I do that a lot. But what you’re looking for has no meaning outside of that context.


I would like to fire up a profiler less often.


Instruments is not nearly good enough for any serious performance work. It only tells me what percent of time is spent in which part of the code. This is fine for a first pass, but it doesn’t tell me _why_ something is slow. I really need a VTune-like profiler on macOS.


I’ve used it professionally and generally been happy with it. What are you missing from it?


I’ve tried to use it professionally, but I always end up switching to my x86 desktop to profile my code, just so I can use VTune.

It’s missing any kind of deeper statistics such as memory bandwidth, cache misses, branch mispredictions, etc. I think Apple’s tooling is fundamentally geared towards application development, whereas I’m working on more HPC-like things.


Have you tried using the performance counters? They've been useful in my experience, although I don't touch them often. Instruments is definitely not geared towards this since most application developers rarely need to do profiling at this level, but it has some level of this built in when you need it.


It’s only useful once you understand how algorithmic complexity works, how to profile your code, and how your language runtime does things. Before that, the CPU cache is largely opaque, and trying to peer into it is probably counterproductive.


Okay, you've made me want to learn about it. Where do I start? What concepts do I need to understand? Any reading recommendations?


Haven't read through it, but I suspect this would be a good place to start: https://cpu.land/

HN Discussion: https://news.ycombinator.com/item?id=36823605


Drepper's "What every programmer should know about memory", though you mightn't find it all interesting. https://gwern.net/doc/cs/hardware/2007-drepper.pdf



