To expand on this, one of the most common tricks is nucleus sampling. Roughly, you sort the outcomes by probability, zero out the least likely ones so that the remaining probabilities sum to just above some threshold you decide (often around 80%), then renormalize and sample from what is left.
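As a rough illustration, here is a minimal sketch of that filtering step in plain Python. The names `nucleus_filter` and `sample` and the example distribution are made up for this comment, and real implementations usually work on tensors of logits rather than Python lists.

```python
import random

def nucleus_filter(probs, p=0.8):
    """Keep the most probable outcomes until their mass reaches p, then renormalize."""
    # Indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:  # stop as soon as the kept mass reaches the threshold
            break
    # Everything outside the nucleus gets probability zero.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered

def sample(probs, rng=random):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical next-token distribution over five outcomes.
probs = [0.5, 0.25, 0.15, 0.07, 0.03]
filtered = nucleus_filter(probs, p=0.8)
print(filtered)          # the last two outcomes are zeroed out
print(sample(filtered))  # always one of the indices 0, 1, 2
```

With this input the first two outcomes only cover 75%, so the third is kept as well and the nucleus ends up holding 90% of the original mass.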

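The idea is that this is more flexible than, e.g., changing the temperature of the softmax, or using top-k, where you just keep the k most probable outcomes: with a probability threshold, the number of outcomes kept varies with how peaked the distribution is.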
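To make the comparison concrete, here is a small sketch of the two alternatives next to a count of how many outcomes a probability threshold would keep. The helper names (`softmax`, `top_k_filter`, `nucleus_size`) and the two toy distributions are assumptions for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature reshapes the distribution but never zeroes anything out."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def top_k_filter(probs, k):
    """Keep exactly the k most probable outcomes and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def nucleus_size(probs, p):
    """Number of outcomes a threshold of p would keep."""
    total = 0.0
    for n, q in enumerate(sorted(probs, reverse=True), start=1):
        total += q
        if total >= p:
            return n
    return len(probs)

peaked = [0.9, 0.05, 0.03, 0.02]   # model is confident
flat = [0.25, 0.25, 0.25, 0.25]    # model is unsure

print(nucleus_size(peaked, 0.8))   # 1
print(nucleus_size(flat, 0.8))     # 4
print(top_k_filter(peaked, 2))     # two outcomes kept, whatever the shape
print(top_k_filter(flat, 2))
print(softmax([2.0, 1.0, 0.0], temperature=0.5))  # sharper, but all nonzero
```

Top-k keeps two outcomes in both cases, while the threshold keeps one when the model is confident and all four when it is not.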

Note that if you do nucleus sampling (aka top-p) with the threshold p=0%, you just pick the single most likely outcome every time, i.e. greedy decoding.