Simple. Imagine you are using Tor and one of the relays has problems (instability, high latency, or packet loss). You don’t know which of the three hops failed. We have no option but to build a new circuit and we don’t want that. That’s why Tor needs stable, trustworthy relays.
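To make that failure mode concrete, here is a toy sketch (not Tor code; the relay names and loss rates are invented) of why the client only ever sees end-to-end symptoms:

```python
import random

# Toy model of a 3-hop circuit: guard -> middle -> exit.
# The client only observes whether traffic survives end to end,
# so it cannot tell which hop is the unstable one.
CIRCUIT = ["guard", "middle", "exit"]
LOSS_RATE = {"guard": 0.01, "middle": 0.20, "exit": 0.01}  # the middle relay is flaky

def send_cell(circuit):
    """True if a cell survives every hop; the client never learns which hop dropped it."""
    return all(random.random() > LOSS_RATE[relay] for relay in circuit)

lost = sum(not send_cell(CIRCUIT) for _ in range(1000))
print(f"{lost}/1000 cells lost -- the only symptom the client can observe")
# The only remedy the client has is to tear down the whole circuit and
# build a new one through (hopefully) better relays.
```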
From the Tor blog (https://blog.torproject.org/lifecycle-new-relay):
> A new relay, assuming it is reliable and has plenty of bandwidth, goes through four phases: the unmeasured phase (days 0-3) where it gets roughly no use, the remote-measurement phase (days 3-8) where load starts to increase, the ramp-up guard phase (days 8-68) where load counterintuitively drops and then rises higher, and the steady-state guard phase (days 68+).
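For concreteness, the day boundaries from that quote can be written down directly; this tiny sketch only encodes what the post says and nothing more:

```python
def relay_phase(age_days: int) -> str:
    """Map a relay's age to the lifecycle phase described in the quoted post."""
    if age_days < 3:
        return "unmeasured: roughly no use"
    if age_days < 8:
        return "remote measurement: load starts to increase"
    if age_days < 68:
        return "ramp-up guard: load drops, then rises higher"
    return "steady-state guard"

for day in (1, 5, 30, 100):
    print(f"day {day}: {relay_phase(day)}")
```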
> We have no option but to build a new circuit and we don’t want that.
That's true; you never want to rebuild the circuit. But it strikes me that the idea that rebuilding is avoidable falls into at least two of the Eight Fallacies of Distributed Computing[1], namely "The Network Is Reliable" and "Topology Doesn't Change".
If we instead assume that the network isn't reliable and the topology does change, then rather than eliminating unreliable nodes and being conservative about changes to the topology, we would focus on reducing the cost of rebuilding a circuit, so that network unreliability and topology changes aren't disastrous.
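As a purely hypothetical sketch of what "reducing the cost of rebuilding" could look like on the client side (invented names, not a claim about Tor's actual implementation), you could keep a spare circuit warm and promote it the instant the active one misbehaves:

```python
import threading

class StandbyCircuits:
    """Hot-spare strategy: a failure costs a pointer swap instead of a
    full three-hop build on the critical path."""

    def __init__(self, build_circuit):
        self.build_circuit = build_circuit   # the slow part: the 3-hop handshake
        self.active = build_circuit()
        self.spare = build_circuit()

    def on_failure(self):
        # Promote the pre-built spare immediately...
        self.active = self.spare
        # ...and rebuild the next spare off the critical path.
        threading.Thread(target=self._rebuild_spare, daemon=True).start()

    def _rebuild_spare(self):
        self.spare = self.build_circuit()
```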
But it sounds like the Tor team has instead decided to bolster these assumptions so that they hold more often in practice: trying to make the network as reliable as possible and trying to make the topology change as little as possible.
I don't mean this to be a harsh criticism of the Tor team. I'm an outsider, and beyond an uncompromising privacy constraint, I don't know all the constraints Tor was built under. I'm sure the tradeoffs made by the Tor team make sense within the context of their constraints. Obviously, the Tor network works well enough to have a large user base, so they have provided a good-enough solution.
But I wonder if changes could be made to Tor's design in the future that would allow nodes to be added and removed more quickly and handle network reliability issues better, so that Tor would be faster.
One possibility which stands out to me is to pool circuits and load-balance between them, so that if a circuit begins to have issues, you are still connected along the other circuits while you build a new circuit to replace the unreliable one. This could run into issues where an adversary correlates traffic from different circuits to unmask clients, so you'd have to be careful, but I'm not sure these problems would be insurmountable.
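A rough sketch of that pooling idea on the client side might look like this (this is not how Tor works today, and every name here is invented):

```python
import itertools

class Circuit:
    def __init__(self, cid):
        self.cid = cid
        self.healthy = True          # flipped to False by some health check

class CircuitPool:
    """Keep several circuits open and spread new streams across them, so one
    flaky circuit only degrades a fraction of the traffic while it is replaced."""

    def __init__(self, size=3):
        self._ids = itertools.count()
        self.circuits = [Circuit(next(self._ids)) for _ in range(size)]
        self._rr = itertools.cycle(range(size))

    def circuit_for_new_stream(self):
        i = next(self._rr)
        if not self.circuits[i].healthy:
            # Replace the bad circuit; the others keep carrying their streams.
            self.circuits[i] = Circuit(next(self._ids))
        return self.circuits[i]

pool = CircuitPool(size=3)
for _ in range(5):
    print("new stream -> circuit", pool.circuit_for_new_stream().cid)
```

The correlation worry above applies to exactly this kind of multiplexing: a single client's streams would now touch several exits at once, which might give an observer more to correlate.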
But remember: what Tor is doing is hard. They are doing complex crypto, networking, security... the hard stuff, the real stuff. The Tor Project is a nonprofit organization with limited resources. They are doing their best. It took three years to design and implement DoS mitigation techniques, for example.
Your proposed plan could take over 10 years, even for a well-funded corporation. It might take a long time and still fail. It might introduce serious vulnerabilities due to code complexity. AFAIK, Tor can’t risk that.