Once you've identified long response times, you could also check whether the threads handling requests spend a large percentage of their time waiting for a CPU, which would point to CPU saturation as the problem. I'm not sure how you do this on GNU/Linux, but on illumos you can assess it with ptime(1) or prstat(1M).
I think your concern about measuring CPU utilization is real, though. You can use frequent sampling and present samples on a heat map to deal with this problem. There are some examples here:
http://www.brendangregg.com/HeatMaps/utilization.html
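To make the sampling idea concrete, here is a rough Python sketch of the bucketing step behind such a heat map. It is only an illustration under my own assumptions, not how the linked examples were produced: the function names, the 1-second columns, the 10% utilization buckets, and the sample data are all made up for this post.

```python
from collections import Counter

def heatmap_counts(samples, column_seconds=1.0, bucket_pct=10):
    """Bucket (timestamp_seconds, utilization_percent) samples into cells.

    Returns a Counter keyed by (column, row): column is the time interval,
    row is the utilization bucket (row 0 = 0-9%, the top row includes 100%).
    """
    n_buckets = 100 // bucket_pct
    counts = Counter()
    for ts, util in samples:
        col = int(ts // column_seconds)
        row = min(int(util // bucket_pct), n_buckets - 1)
        counts[(col, row)] += 1
    return counts

def render(counts, n_buckets=10):
    """Draw the cells as text: one line per bucket, darker means more samples."""
    shades = " .:*#"
    first = min(col for col, _ in counts)
    last = max(col for col, _ in counts)
    peak = max(counts.values())
    lines = []
    for row in reversed(range(n_buckets)):
        cells = []
        for col in range(first, last + 1):
            n = counts.get((col, row), 0)
            # Round up so any non-empty cell gets a visible shade.
            cells.append(shades[(n * (len(shades) - 1) + peak - 1) // peak])
        lines.append("".join(cells))
    return "\n".join(lines)

if __name__ == "__main__":
    # Hypothetical 10 Hz samples for one second: mostly idle, with a short
    # burst at 100% that a per-second average (24%) would hide.
    samples = [(i * 0.1, 5) for i in range(8)] + [(0.8, 100), (0.9, 100)]
    print(render(heatmap_counts(samples)))
```

The point of keeping every sample in a cell, rather than averaging per interval, is that the short burst at 100% stays visible in the top row instead of being smeared into a low average.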