What's interesting is that each token passes through the entire model. Basically, each token touches the synthesis of the whole of human culture before being fully formed.
That depends on whether the model's weight matrices are sparse or dense. If they're sparse, then a large swath of the path quickly becomes 0 (which could still be counted as "visited", though in a pretty pathological sense).
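A rough numpy sketch of what that means for a single linear layer, with the sparsity level (95% zero weights) and dimensions chosen arbitrarily for illustration: each product w[i, j] * x[j] is one "path" the token's activation takes through the layer, and under a sparse weight matrix most of those paths carry exactly nothing.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 1024
x = rng.standard_normal(d)                        # one token's activation vector

w_dense = rng.standard_normal((d, d)) / np.sqrt(d)
w_sparse = w_dense * (rng.random((d, d)) < 0.05)  # ~95% of weights zeroed out

# Each entry w[i, j] * x[j] is one "path" through this layer:
# input unit j -> output unit i.
contrib_dense = w_dense * x                       # broadcasts x over rows
contrib_sparse = w_sparse * x

print("zero paths, dense: ", np.mean(contrib_dense == 0.0))   # ~0
print("zero paths, sparse:", np.mean(contrib_sparse == 0.0))  # ~0.95
```

So the token still "visits" every layer, but in the sparse case most of the individual connections it visits contribute zero.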
And it's probably the closest approximation to what happens in our heads when we utter each word or take an action. We are thin layers of customisation running on top of Language.