Different parts of the computation are bound in different ways. The "rule of thumb" for AlexNet-derived networks (conv layers followed by fully connected layers) is that roughly 95% of the computation time is spent in the conv layers, while roughly 95% of the parameters live in the weights of the FC layers. I imagine the FC stage could be memory-bound while the conv stage is compute-bound. A rough back-of-the-envelope calculation below shows where those numbers come from.
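Here's a minimal sketch that counts parameters and multiply-accumulates (MACs) for AlexNet-ish layer shapes. The shapes are approximate (ungrouped convolutions, biases ignored, 227x227 input assumed), so treat it as an illustration of the rule of thumb rather than an exact accounting of the original network:

```python
# Back-of-the-envelope parameter and MAC counts for an AlexNet-like network.
# Shapes are approximate: no grouping, no biases -- just enough to show why
# conv layers dominate compute while FC layers dominate parameters.

def conv_cost(out_h, out_w, out_c, k, in_c):
    """Return (params, macs) for a conv layer with k x k kernels."""
    params = out_c * k * k * in_c
    macs = out_h * out_w * params      # same weights reused at every output position
    return params, macs

def fc_cost(in_dim, out_dim):
    """Return (params, macs) for a fully connected layer."""
    params = in_dim * out_dim
    macs = params                      # each weight is used exactly once per image
    return params, macs

conv_layers = [
    conv_cost(55, 55, 96, 11, 3),      # conv1
    conv_cost(27, 27, 256, 5, 96),     # conv2
    conv_cost(13, 13, 384, 3, 256),    # conv3
    conv_cost(13, 13, 384, 3, 384),    # conv4
    conv_cost(13, 13, 256, 3, 384),    # conv5
]
fc_layers = [
    fc_cost(6 * 6 * 256, 4096),        # fc6
    fc_cost(4096, 4096),               # fc7
    fc_cost(4096, 1000),               # fc8
]

conv_params = sum(p for p, _ in conv_layers)
conv_macs   = sum(m for _, m in conv_layers)
fc_params   = sum(p for p, _ in fc_layers)
fc_macs     = sum(m for _, m in fc_layers)

print(f"conv: {conv_params/1e6:.1f}M params, {conv_macs/1e9:.2f}G MACs")
print(f"fc:   {fc_params/1e6:.1f}M params, {fc_macs/1e6:.1f}M MACs")
print(f"conv share of MACs:  {conv_macs / (conv_macs + fc_macs):.0%}")
print(f"fc share of params:  {fc_params / (conv_params + fc_params):.0%}")
```

With these shapes the conv layers come out to roughly 3.7M parameters but about 1.1G MACs per image, while the FC layers hold about 58.6M parameters but only about 58.6M MACs, i.e. the conv stage does ~95% of the arithmetic and the FC stage holds ~94% of the weights. The FC ratio of one MAC per weight per image is also why that stage looks memory-bound at small batch sizes: every weight has to be fetched but is only used once.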
Not sure if this holds true for other NN structures like GoogLeNet or the weirder recent fully convolutional networks.