Can someone who understands this work explain why using a range (de)coder to unc...

pizza · on May 23, 2023

If I’m not mistaken, you still need the key itself to undo the steganography. They provide a way to autoregressively rewrite the decoded message with the aid of the key. I think the perfect secrecy part means that embedding the plaintext doesn’t cause any divergence from the model’s output distribution, so it’s undetectable. You just adapt the covertext in a way that still has about the same information content but, e.g., uses different words.

matthewdgreen · on May 23, 2023

I’m a co-author on one recent (cited) work, Meteor. The challenge in LLM/stego is that the model outputs a probability-weighted vector at each step. Ordinarily you just sample a token from this vector using a random number generator. For stego the idea is to sample from that vector using a string of (pseudorandom) input bits that represent your message, in a way that (1) gives you a valid sampling distribution and (2) is recoverable, meaning that given a second copy of the model plus each token it outputs you can recover the bits used to sample. The first part is relatively easy; it’s the second part that’s “hard” in the sense that you either waste a lot of bits (meaning an inefficient covertext) or you end up with sampling that isn’t statistically identical to a random sampling approach. In Meteor we used a practical approach that’s fairly efficient but trades away a bit of statistical perfection. This paper focuses more on the theory of how to do this optimally, which is nice! I don’t fully understand the result right now (it’s a tough paper to read) but this kind of theoretical improvement inspired by previous work is nice.
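
To make the two requirements concrete, here is a minimal toy sketch. It is not Meteor's method and not the paper's. It assumes a hypothetical, fixed next-token distribution whose probabilities are exact powers of 1/2, which is the one case where a simple prefix code lets uniform message bits pick tokens with exactly the right probabilities and lets the receiver read the bits back. The token names, `PROBS`, `CODE`, `embed` and `extract` are all made up for illustration; real model distributions are not dyadic, which is where the difficulty described above begins.

```python
from fractions import Fraction

# Hypothetical next-token distribution (assumption: dyadic probabilities).
PROBS = {"the": Fraction(1, 2), "a": Fraction(1, 4), "one": Fraction(1, 4)}
# Prefix-free code whose codeword length equals -log2(p) for each token,
# so uniform random bits select each token with exactly probability p.
CODE = {"the": "0", "a": "10", "one": "11"}


def embed(bits):
    """Consume message bits and emit one token per completed codeword."""
    decode = {codeword: token for token, codeword in CODE.items()}
    tokens, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode:
            tokens.append(decode[buf])
            buf = ""
    return tokens, buf  # buf holds bits that did not complete a codeword


def extract(tokens):
    """Recover the message bits from the tokens (the 'recoverable' property)."""
    return "".join(CODE[t] for t in tokens)


tokens, leftover = embed("0101100")
print(tokens, repr(leftover), extract(tokens))
```

Once the probabilities are not powers of 1/2, no such code matches them exactly, so you either waste bits or skew the sampling distribution.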

nullc · on May 23, 2023

Right, so in total ignorance what I would do is just make a GPT-powered text compressor and decompressor using a range coder with its probabilities at each token set by the model. Care would need to be taken with precision and end termination so a bias wouldn't be introduced (more care than is typical in compressors, since no one normally cares much about 0.01% inefficiencies; things like end termination have been addressed by bijective range coders).

Then to use it for stego just take your headerless ciphertext and decompress it. Tada: you get model output. To decode the stego just compress it. Assuming everything was built right and that your ciphertext is uniform, the output should be indistinguishable from the model just sampling using an RNG.
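
A rough sketch of that idea, under loud assumptions: `toy_model` is a hypothetical stand-in for an LLM, exact `Fraction` arithmetic replaces a real fixed-precision range coder, and the receiver is assumed to know the message length `n`. It only illustrates the decompress-to-embed, compress-to-extract mechanics. It makes no claim of perfect security, and the stopping rule in particular is a simplification rather than a bijective termination.

```python
from fractions import Fraction
from math import ceil


def toy_model(context):
    """Hypothetical stand-in for an LLM: next-token probabilities given context."""
    if context and context[-1] == "cat":
        return [("sat", Fraction(3, 5)), ("ran", Fraction(2, 5))]
    return [("the", Fraction(1, 2)), ("cat", Fraction(3, 10)), ("dog", Fraction(1, 5))]


def narrow(low, width, context, pick):
    """Split [low, low + width) by the model's probabilities; `pick` selects a slice."""
    for token, p in toy_model(context):
        if pick(token, low, width * p):
            return token, low, width * p
        low += width * p
    raise ValueError("no slice selected")


def decompress(bits):
    """'Decompress' ciphertext bits into tokens (this is the stego embedding step)."""
    n = len(bits)
    x = Fraction(int(bits, 2), 2 ** n)  # read the bits as a point in [0, 1)
    low, width, tokens = Fraction(0), Fraction(1), []
    # Keep emitting tokens until the interval pins down all n bits.
    while width > Fraction(1, 2 ** n):
        token, low, width = narrow(low, width, tokens,
                                   lambda t, lo, w: lo <= x < lo + w)
        tokens.append(token)
    return tokens


def compress(tokens, n):
    """'Compress' the tokens back into the n ciphertext bits (the extraction step)."""
    low, width = Fraction(0), Fraction(1)
    for i, tok in enumerate(tokens):
        _, low, width = narrow(low, width, tokens[:i],
                               lambda t, lo, w, tok=tok: t == tok)
    k = ceil(low * 2 ** n)  # the only multiple of 2^-n inside the final interval
    return format(k, "0{}b".format(n))


message = "1011001110"
covertext = decompress(message)
print(covertext, compress(covertext, len(message)) == message)
```

The round trip works because the final interval is no wider than 2^-n, so it contains exactly one n-bit value. Whether the resulting token distribution matches the model's exactly is a separate question.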

As a bonus, you don't even have a stego tool on your computer; you just have a particularly carefully constructed text compressor and decompressor that is perfectly usable (and even state of the art in compression rate, given a big enough model) for the compression application.

csdewitt · on May 25, 2023

Hi, I am one of the authors on the PSSuMEC paper. Thanks a lot for your interest in our work. If I understand correctly, range coding would have the same lack of perfect security properties as arithmetic coding (cf. our paper). Having said this, we are investigating the utility of iMEC in compression settings.

nullc · on May 27, 2023

Ah, yeah, I was trying to get a lay explanation of why arithmetic coding, assuming it was done with enough precision, wouldn't achieve the perfect security properties.

It might just be that the precision rapidly becomes unmanageable, because I guess you need to support the least probable symbol for every token without ever losing precision (normally arithmetic coders will renormalize after each symbol to keep the accumulator at a reasonable precision, though they could be designed to do so as infrequently as you like, at a performance cost). If no renormalization is possible, I guess an arithmetic coder accumulator would need to handle values like (1/least_prob_token)^n_tokens, which gets extremely big extremely fast, and would at the very least need an unconventional construction.
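
As a back-of-the-envelope illustration of that growth, the sketch below uses exact rational arithmetic with no renormalization and always codes the least probable token. The function name `accumulator_bits` and the example probability of 1/50000 are assumptions chosen only for illustration. The magnitude of the accumulator grows exponentially in the number of tokens, which means the bit width needed to hold it grows linearly, at roughly log2(1/p_min) bits per token.

```python
from fractions import Fraction


def accumulator_bits(p_min, n_tokens):
    """Bits needed for the denominator of the interval width after coding the
    least probable token n_tokens times with no renormalization."""
    width = Fraction(1)
    for _ in range(n_tokens):
        width *= p_min
    return width.denominator.bit_length()


# Assumed example: the rarest token has probability 1/50000 at every step.
for n in (1, 10, 100, 1000):
    print(n, accumulator_bits(Fraction(1, 50000), n))
```

So an exact coder is not impossible in principle, but it needs arbitrary-precision arithmetic whose cost keeps growing with the length of the covertext.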