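On a log-log plot of error against scale, the slope of the power law (its exponent) is determined by the problem and dataset. Compute, parameter count, and data move you along the curve. A change in architecture or inductive bias is a constant offset: it shifts the curve up or down without changing the slope.
So a better architecture gives an advantage, but because the offset is constant, a worse architecture can close the gap with a fixed multiplicative increase in scale.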
https://arxiv.org/pdf/1712.00409.pdf
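
A minimal sketch of the claim above, assuming error follows error = coeff * scale^(-exponent). The exponent and both coefficients are made-up illustrative values, not numbers from the paper, and the function names are hypothetical. It shows that two architectures sharing an exponent have the same log-log slope and a constant log-space gap, and computes how much extra scale the worse one needs to catch up.

```python
import math

# Hypothetical values chosen only for illustration (not from the paper).
EXPONENT = 0.3          # set by the problem and dataset; shared by both architectures
COEFF_BASELINE = 10.0   # baseline architecture
COEFF_IMPROVED = 8.0    # better architecture: lower offset, same slope


def power_law_error(scale, coeff, exponent):
    """Error predicted by the assumed power law: coeff * scale ** (-exponent)."""
    return coeff * scale ** (-exponent)


def loglog_slope(coeff, exponent, scale_lo=1e3, scale_hi=1e6):
    """Slope of log(error) against log(scale) between two scales."""
    err_lo = power_law_error(scale_lo, coeff, exponent)
    err_hi = power_law_error(scale_hi, coeff, exponent)
    return (math.log(err_hi) - math.log(err_lo)) / (math.log(scale_hi) - math.log(scale_lo))


def scale_multiplier_to_match(coeff_worse, coeff_better, exponent):
    """Factor by which the worse architecture must increase scale to match the better one."""
    return (coeff_worse / coeff_better) ** (1.0 / exponent)


multiplier = scale_multiplier_to_match(COEFF_BASELINE, COEFF_IMPROVED, EXPONENT)
print(f"slope (baseline): {loglog_slope(COEFF_BASELINE, EXPONENT):.3f}")
print(f"slope (improved): {loglog_slope(COEFF_IMPROVED, EXPONENT):.3f}")
print(f"extra scale needed by baseline: {multiplier:.2f}x")
```

Under this model the multiplier does not depend on the starting scale, which is what "constant offset" means here: the architectural advantage costs the same factor of extra scale at every point on the curve. The smaller the exponent, the larger that factor becomes.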