The goal isn't smaller images but smaller downloads, and smaller downloads don't necessarily mean smaller images. The very first docker pull usually isn't the problem; it's all the subsequent ones.
The key is layers: cache the ones that rarely change, and put the ones that change frequently at the end.
That way you download as little as possible and reuse the layers you already have.
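To make the ordering argument concrete, here is a toy model in Python, not Docker's real implementation. It assumes each layer's ID depends on its own content and on every layer below it. This approximates how Docker's build cache rebuilds every instruction after the first one that changed. Real registry digests are content hashes of each layer's tarball. The layer names and sizes are hypothetical.

```python
import hashlib

def layer_digests(layers):
    """Chain each layer's ID to every layer below it.

    Simplifying assumption: this mimics Docker's build cache, where changing one
    instruction rebuilds all later layers, so their IDs change too.
    """
    digests, parent = [], ""
    for content in layers:
        parent = hashlib.sha256(f"{parent}\n{content}".encode()).hexdigest()
        digests.append(parent)
    return digests

def bytes_to_pull(old_layers, new_layers, sizes):
    """Bytes a host that already pulled old_layers downloads for new_layers."""
    cached = set(layer_digests(old_layers))
    return sum(size for d, size in zip(layer_digests(new_layers), sizes) if d not in cached)

# Hypothetical layer sizes in MB: OS base 100, dependencies 50, app code 5.
good_v1, good_v2 = ["os", "deps", "app v1"], ["os", "deps", "app v2"]
print(bytes_to_pull(good_v1, good_v2, [100, 50, 5]))  # 5: only the top layer changed

bad_v1, bad_v2 = ["os", "app v1", "deps"], ["os", "app v2", "deps"]
print(bytes_to_pull(bad_v1, bad_v2, [100, 5, 50]))    # 55: app change invalidates deps too
```

The total image size is identical in both orderings, but with the frequently changing layer on top, an update costs 5 MB instead of 55 MB.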
Some techniques for smaller containers, such as squashing, actually make downloads worse: a squashed image is a single layer, so any change means pulling the whole thing again.
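A minimal sketch of the squashing trade-off, again a simplified model rather than how a registry actually works. Layers are identified by a hash of their content, and a squashed image is treated as one merged layer. All sizes are hypothetical. The squashed image is assumed to be somewhat smaller (140 MB vs 155 MB), since squashing can drop files that later layers delete.

```python
import hashlib

def digest(content: str) -> str:
    """Content-addressed ID for a layer (simplified stand-in for a blob digest)."""
    return hashlib.sha256(content.encode()).hexdigest()

def pull_layered(old_layers, new_layers, sizes):
    """Bytes pulled when the host already has every layer of the old image."""
    cached = {digest(layer) for layer in old_layers}
    return sum(size for layer, size in zip(new_layers, sizes) if digest(layer) not in cached)

def pull_squashed(old_layers, new_layers, squashed_size):
    """Bytes pulled for a squashed image: the single merged layer is all or nothing."""
    if digest("\n".join(old_layers)) == digest("\n".join(new_layers)):
        return 0
    return squashed_size

old = ["os", "deps", "app v1"]
new = ["os", "deps", "app v2"]
print(pull_layered(old, new, [100, 50, 5]))  # 5
print(pull_squashed(old, new, 140))          # 140
```

The squashed image is smaller on disk, yet every update re-downloads all of it.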
That entirely depends on your execution environment! If you are running on one fixed host and only use Docker for the benefit of reproducibility or easy software upgrades, you should definitely go for a caching strategy.
If you are running your own (bare-metal) Kubernetes or another orchestrator, go for smaller image sizes instead.
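One way to read the difference between the two environments is cache warmth. The sketch below assumes that a fixed host already holds the previous version's layers. It also assumes that an orchestrator may schedule a pod onto a node with no cached layers at all, for example a freshly added one. Names and sizes are hypothetical.

```python
import hashlib

def digest(content: str) -> str:
    """Content-addressed ID for a layer (simplified model)."""
    return hashlib.sha256(content.encode()).hexdigest()

def bytes_pulled(cache, layers, sizes):
    """Bytes a node downloads to run an image, given the layer digests it already has."""
    return sum(size for content, size in zip(layers, sizes) if digest(content) not in cache)

image = ["os", "deps", "app v2"]
sizes = [100, 50, 5]  # hypothetical MB
warm_cache = {digest(c) for c in ["os", "deps", "app v1"]}  # fixed host that ran v1
cold_cache = set()  # node that has never pulled this image

print(bytes_pulled(warm_cache, image, sizes))  # 5
print(bytes_pulled(cold_cache, image, sizes))  # 155
```

On a warm host, layer ordering dominates the cost. On a cold node, every pull is the full image, so total image size is what matters.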