(Opinion) I think internally, they record how closely 2 words are in meaning based on the training data.
If everyone uses similar language to describe different problems, then the LLM should be able to at least act like it’s generalizing