I’m a co-author on one recent (cited) work in this area, Meteor. The challenge in LLM steganography is that at each step the model outputs a probability distribution over the next token. Ordinarily you just sample a token from that distribution using a random number generator. For stego, the idea is instead to sample using a string of (pseudorandom) bits that encode your message, in a way that (1) still gives you a valid sampling distribution and (2) is recoverable, meaning that someone with a second copy of the model, seeing each token you output, can recover the bits that were used to sample it (there’s a toy sketch of this at the end of the comment).

The first part is relatively easy; it’s the second part that’s “hard,” in the sense that you either waste a lot of bits (an inefficient covertext) or end up with sampling that isn’t statistically identical to ordinary random sampling. In Meteor we used a practical approach that’s fairly efficient but gives up a bit of statistical perfection. This paper focuses more on the theory of how to do this optimally, which is nice! I don’t fully understand the result yet (it’s a tough paper to read), but it’s great to see this kind of theoretical follow-up to earlier work.
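
To give a flavor of the recovery problem, here’s a toy sketch of one simple embed/recover scheme. This is not Meteor’s actual algorithm; the fixed-point precision, toy distribution, and function names are all made up for illustration. The sender uses the message bits as the “random” value that picks a token; the receiver, running the same model on the same context, looks up which probability interval the sent token corresponds to and recovers whatever leading bits every value in that interval shares.

```python
# Toy sketch of bit-embedding via token sampling (not Meteor's real scheme).
# Assumption: 16-bit fixed-point "randomness" per step and a fake 4-token
# distribution standing in for the LLM's output.

PRECISION = 16

def intervals(probs):
    """Partition [0, 2^PRECISION) into per-token subintervals."""
    total = 2 ** PRECISION
    bounds, acc = [], 0
    for p in probs:
        width = round(p * total)
        bounds.append((acc, acc + width))
        acc += width
    lo, _ = bounds[-1]
    bounds[-1] = (lo, total)  # absorb rounding error into the last interval
    return bounds

def shared_prefix_bits(lo, hi):
    """Number of leading bits every integer in [lo, hi) agrees on."""
    n = 0
    while n < PRECISION:
        shift = PRECISION - 1 - n
        if (lo >> shift) != ((hi - 1) >> shift):
            break
        n += 1
    return n

def embed_step(probs, bits):
    """Sample one token using message bits as the random value;
    return (token_index, number_of_bits_the_receiver_can_recover)."""
    r = int((bits + "0" * PRECISION)[:PRECISION], 2)
    for idx, (lo, hi) in enumerate(intervals(probs)):
        if lo <= r < hi:
            return idx, shared_prefix_bits(lo, hi)
    raise AssertionError("r must fall in some interval")

def recover_step(probs, token_idx):
    """Given the same distribution and the sent token, recover the bits
    that are common to every random value mapping to that token."""
    lo, hi = intervals(probs)[token_idx]
    n = shared_prefix_bits(lo, hi)
    return format(lo >> (PRECISION - n), "0{}b".format(n)) if n else ""

# Toy example: pretend the model assigned these next-token probabilities.
probs = [0.5, 0.25, 0.125, 0.125]
message = "1101001110"
idx, n = embed_step(probs, message)
print("sent token", idx, "-> receiver recovers:", recover_step(probs, idx))
assert recover_step(probs, idx) == message[:n]
```

The inefficiency I mentioned shows up directly in this toy scheme: a very likely token covers a wide interval, so it pins down few (possibly zero) shared leading bits, and most of the sampling bits at that step are unrecoverable.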