Different parts of the computation are bound in different ways. The "rule of thumb" for AlexNet-derived networks (conv layers followed by fully connected layers) is that roughly 95% of the computation time is spent in the conv layers, while roughly 95% of the parameters live in the weights of the FC layers. I imagine the FC stage could be memory-bound while the conv stage is compute-bound. A rough back-of-the-envelope calculation below shows where those numbers come from.
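Here's a minimal sketch that counts parameters and multiply-accumulates (MACs) for AlexNet-ish layer shapes. The shapes are approximate (ungrouped convolutions, biases ignored, 227x227 input assumed), so treat it as an illustration of the rule of thumb rather than an exact accounting of the original network:

```python
# Back-of-the-envelope parameter and MAC counts for an AlexNet-like network.
# Shapes are approximate: no grouping, no biases -- just enough to show why
# conv layers dominate compute while FC layers dominate parameters.

def conv_cost(out_h, out_w, out_c, k, in_c):
    """Return (params, macs) for a conv layer with k x k kernels."""
    params = out_c * k * k * in_c
    macs = out_h * out_w * params      # same weights reused at every output position
    return params, macs

def fc_cost(in_dim, out_dim):
    """Return (params, macs) for a fully connected layer."""
    params = in_dim * out_dim
    macs = params                      # each weight is used exactly once per image
    return params, macs

conv_layers = [
    conv_cost(55, 55, 96, 11, 3),      # conv1
    conv_cost(27, 27, 256, 5, 96),     # conv2
    conv_cost(13, 13, 384, 3, 256),    # conv3
    conv_cost(13, 13, 384, 3, 384),    # conv4
    conv_cost(13, 13, 256, 3, 384),    # conv5
]
fc_layers = [
    fc_cost(6 * 6 * 256, 4096),        # fc6
    fc_cost(4096, 4096),               # fc7
    fc_cost(4096, 1000),               # fc8
]

conv_params = sum(p for p, _ in conv_layers)
conv_macs   = sum(m for _, m in conv_layers)
fc_params   = sum(p for p, _ in fc_layers)
fc_macs     = sum(m for _, m in fc_layers)

print(f"conv: {conv_params/1e6:.1f}M params, {conv_macs/1e9:.2f}G MACs")
print(f"fc:   {fc_params/1e6:.1f}M params, {fc_macs/1e6:.1f}M MACs")
print(f"conv share of MACs:  {conv_macs / (conv_macs + fc_macs):.0%}")
print(f"fc share of params:  {fc_params / (conv_params + fc_params):.0%}")
```

With these shapes the conv layers come out to roughly 3.7M parameters but about 1.1G MACs per image, while the FC layers hold about 58.6M parameters but only about 58.6M MACs, i.e. the conv stage does ~95% of the arithmetic and the FC stage holds ~94% of the weights. The FC ratio of one MAC per weight per image is also why that stage looks memory-bound at small batch sizes: every weight has to be fetched but is only used once.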
Not sure if this holds true for other NN structures like GoogLeNet or the weirder recent fully convolutional networks.