Dense vectors can do things one-hot vectors cannot -- nothing says the inputs must be rows of a token_id -> vector embeddings map. In fact, we are already doing this implicitly when we move from single one-hot vectors to n-tuples of one-hot vectors, which increases the effective vocabulary size from V to V^n.
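A minimal sketch of this idea, assuming a toy vocabulary size `V` and tuple length `n` chosen purely for illustration: concatenating n one-hot vectors gives an input of length n*V, yet it can distinguish V^n different tuples, and a dense embedding table drops the one-hot constraint entirely.

```python
import numpy as np

V = 4  # toy vocabulary size (illustrative assumption)
n = 2  # tuple length
d = 3  # dense embedding dimension

def one_hot(token_id, size=V):
    """Return a one-hot row vector for a single token id."""
    v = np.zeros(size)
    v[token_id] = 1.0
    return v

# An n-tuple of one-hot vectors: the concatenated input has length
# n*V, but it can encode V**n distinct tuples.
pair = np.concatenate([one_hot(1), one_hot(3)])  # shape (n*V,)
effective_vocab = V ** n                         # V^n distinct tuples

# A dense embedding table maps token ids to arbitrary d-dimensional
# vectors; inputs need not be one-hot (or rows of any fixed map) at all.
embedding = np.random.randn(V, d)
dense = embedding[1]                             # shape (d,)
```

The point of the sketch is only the counting argument: the same-length concatenated input indexes exponentially many tuples, which is the V to V^n jump described above.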