The answer to that is a huge part of the NLP field. The current answer is that you break down the string into constituent parts and map each of them into a high dimensional space. “cat” becomes a large vector whose position is continuous and therefore differentiable. “the cat” probably becomes a pair of vectors.
It's weirder than that. You typically are differentiating a loss function of strings and various opaque weights. You are optimizing the loss function over the weight space, so in some informal contrived sense you are actually differentiating with respect to the string.