When relationships are represented implicitly by the magnitude of the dot product between two vectors, there's no particular advantage to leaving relationships "uncreated" (i.e. to enforcing orthogonality for "uncreated" relationships).
That's not right; there are many vectors between unrelated tokens that simply go unbuilt. Creating a ton of empty relationships would obviously generate an immense amount of useless data.
Your links are not about actually orthogonal vectors, so they're not relevant. Also, that's not how superposition is defined in your own links:
> In this paper, we use toy models — small ReLU networks trained on synthetic data with sparse input features — to investigate how and when models represent more features than they have dimensions. We call this phenomenon superposition
On the contrary, by allowing the vectors for unrelated concepts to be only almost orthogonal rather than exactly orthogonal, it's possible to represent far more concepts than there are dimensions. https://terrytao.wordpress.com/2013/07/18/a-cheap-version-of...
In machine learning, this phenomenon is known as polysemanticity or superposition: https://transformer-circuits.pub/2022/toy_model/index.html
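Not from either link, but here's a quick numerical sketch of that claim (the dimension, the number of vectors, and the use of numpy are just illustrative choices): draw far more random unit vectors than there are dimensions and check that every sampled pair is nearly, but not exactly, orthogonal.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 512, 10_000  # 512 dimensions, 10,000 "concept" vectors (n >> d)

    # Only d directions can ever be exactly orthogonal, but random unit
    # vectors in high dimensions are almost orthogonal to one another.
    vectors = rng.normal(size=(n, d))
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

    # Sample pairs of distinct vectors and inspect their dot products.
    pairs = rng.choice(n, size=(100_000, 2))
    pairs = pairs[pairs[:, 0] != pairs[:, 1]]
    dots = np.einsum("ij,ij->i", vectors[pairs[:, 0]], vectors[pairs[:, 1]])

    print(f"typical |dot|: {np.abs(dots).mean():.3f}")  # on the order of 1/sqrt(d), ~0.04 here
    print(f"largest |dot|: {np.abs(dots).max():.3f}")   # still far from 1.0
    # The small but nonzero interference between unrelated vectors is the
    # price of packing far more than d directions into d dimensions, which
    # is what the links above describe as superposition.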