In the grand scheme of things it doesn't matter that much.
Once you start parallelizing, CI time is: setup_time + (test_run_time / workers), so assuming money isn't a problem you can keep adding workers as you keep adding tests.
What really matters is how fast you can set up your test workers and how slow your individual tests are.
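A back-of-the-envelope sketch of that formula (the numbers are made up for illustration, not from any real pipeline): test_run_time / workers shrinks as you add workers, but wall-clock time can never drop below setup_time.

```python
def ci_time(setup_time: float, test_run_time: float, workers: int) -> float:
    """Approximate wall-clock CI time in minutes: setup + parallelized tests."""
    return setup_time + test_run_time / workers

SETUP = 3.0          # minutes to boot/provision one worker (assumed)
TOTAL_TESTS = 120.0  # minutes of serial test runtime (assumed)

for workers in (1, 4, 16, 64, 256):
    print(f"{workers:>3} workers -> {ci_time(SETUP, TOTAL_TESTS, workers):6.2f} min")

# Output:
#   1 workers -> 123.00 min
#   4 workers ->  33.00 min
#  16 workers ->  10.50 min
#  64 workers ->   4.88 min
# 256 workers ->   3.47 min
```

Past a certain point the test half of the equation is negligible and you're paying for workers mostly to shave seconds off a floor set by setup time.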
I think your last statement captures something that's not often emphasized: worker and app startup are costs you pay per worker, so the total setup work scales with the amount of parallelization. If you have a very large test suite, there's a point where setup eats up so much of the time that parallelizing further is pretty "wasteful", and I guess that's where the money comes in.
Secondly, if you're actually trying to start up 100+ test workers per build, there's going to be some distribution of how long each worker takes to start, and waiting for the slowest one adds a bit more time before the build can finish. That distribution probably isn't _that_ wide timewise, but if you really start to push your test suite runtime down it may pop up. If you're running things in Docker, sometimes a node doesn't have the image in its Docker cache...
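A rough simulation of that straggler effect, with an invented startup-time distribution (normal-ish boot time plus an occasional cold Docker cache), just to show how the expected wait for the slowest worker grows with worker count:

```python
# The distribution parameters here are assumptions for illustration only.
import random

def worker_startup() -> float:
    """One worker's startup time in seconds (assumed distribution)."""
    base = max(random.gauss(30, 5), 0)                 # typical boot/provision time
    cache_miss = 60 if random.random() < 0.05 else 0   # occasional cold docker cache
    return base + cache_miss

def slowest_startup(workers: int, trials: int = 1000) -> float:
    """Average time until the *last* worker is ready."""
    return sum(max(worker_startup() for _ in range(workers))
               for _ in range(trials)) / trials

for n in (1, 10, 100):
    print(f"{n:>3} workers: last one ready after ~{slowest_startup(n):.0f}s on average")
```

With a single worker you pay roughly the mean startup time; with 100 workers you're almost guaranteed to hit the tail of the distribution on every build.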
Unsure if CI services like buildkite have really made this that much faster, but it seems like they are using a single box with 64 cores.
> Unsure if CI services like buildkite have really made this that much faster
Buildkite doesn't directly help with it, but since you bring your own hardware and it's highly customizable, it does allow you to invest in improving setup time quite dramatically. It's a great product.