
So, let me see what I think I understand here:

1. AI models are good at Cæsar-cypher shifts, because they occur often enough in the training data for certain values of the cypher offset. Outside those values, AI doesn't handle the transformation well.

2. Somehow AI models perform this cypher within high ranges of Unicode as well, because the characters are encoded as three tokens each, of which the last one encodes the same difference as between alphabetic letters. The first two tokens get discarded as irrelevant, so by sheer chance the alphabet maps perfectly via Cæsar cypher (with a two-token offset) onto a specific range of Unicode characters reserved for Byzantine music notation (see the sketch after this list).

3. This is easy to understand for one AI model, because it's explicable by chance that the offset between the alphabet and Byzantine music notation should coincide perfectly with the two less-significant tokens. It's harder to understand why this works in more than one AI model, though.
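
For what it's worth, the byte arithmetic behind point 2 is easy to illustrate in Python. This is a sketch of the alignment only; U+1D021 is picked here just because its last UTF-8 byte happens to equal ord('a') + 64, not because it is the exact character the article maps 'a' to:

  >>> 'a'.encode('utf-8')
  b'a'
  >>> hex(ord('a') + 64)
  '0xa1'
  >>> chr(0x1D021).encode('utf-8')  # a Byzantine Musical Symbol
  b'\xf0\x9d\x80\xa1'

Within an aligned 64-character run of code points the leading UTF-8 bytes stay fixed, so only the final byte varies from letter to letter, which is the alignment point 2 describes.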

It's not that surprising that models encode Byzantine Music Notation characters using separate tokens for each UTF-8 byte, since they're unlikely to occur often enough for byte-pair encoding to allocate dedicated multi-byte tokens to them.

What isn't clear to me is where ASCII text with 64 added to every byte occurs in the wild.
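
As for the first point, the byte-level fallback is easy to check with a BPE tokenizer. Here is a sketch using OpenAI's tiktoken package with the cl100k_base vocabulary as a stand-in for whatever the models in question actually use:

  import tiktoken

  enc = tiktoken.get_encoding("cl100k_base")
  sym = chr(0x1D000)              # a Byzantine Musical Symbol
  print(sym.encode("utf-8"))      # b'\xf0\x9d\x80\x80' -- four UTF-8 bytes
  print(len(enc.encode(sym)))     # more than 1 if the symbol has no dedicated token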


Translating uppercase to lowercase adds 32.

Maybe it's not "thinking" in terms of "add 64", but rather "translate uppercase to lowercase, twice".
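
For reference, the arithmetic behind that guess: the two ASCII cases differ by 32, so two case shifts add up to 64, and lowercase 'a' plus 64 lands exactly on the 0xa1 byte that comes up further down the thread.

  >>> ord('a') - ord('A')
  32
  >>> ord('a') + 64 == 0xa1
  True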


Possibly because of XOR with 0xc0, which, for lowercase ASCII, has the same effect as adding 64.

XOR-encoded input is probably common in the training data.
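
That equivalence is easy to verify: lowercase ASCII has the 0x40 bit set and the 0x80 bit clear, so XOR with 0xc0 clears 0x40 and sets 0x80, a net change of +64.

  >>> all((b ^ 0xC0) == (b + 64) for b in range(ord('a'), ord('z') + 1))
  True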


Have you checked the UTF-8 encodings of all the typical "Unicode fonts" e.g. circled letters, superscripts and so on?

Good idea. I checked all alphanumeric characters whose UTF-8 encoding ends with \xa1.

  >>> '\N{fullwidth latin capital letter a}'.encode('utf-8')
  b'\xef\xbc\xa1'
Fullwidth Ａ seems like the culprit.
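
Extending that check across the whole alphabet (a quick sanity check, not part of the original comment): the last UTF-8 byte of every fullwidth capital letter is exactly the corresponding lowercase ASCII byte plus 64.

  >>> all(chr(0xFF21 + i).encode('utf-8')[-1] == ord('a') + i + 64
  ...     for i in range(26))
  True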

Mystery solved!

> What isn't clear to me is where ASCII text with 64 added to every byte occurs in the wild.

Maybe it has some general ASCII bit-twiddling capability, learned from Base64 and uuencode, that helps here too.


NaN boxing?

Does it make sense to talk about the index of a token? The model never sees such an index, does it? I thought the tokens are converted to learned embeddings at the first layer.

This is a great point, I think I might have been wrong actually. It doesn't really make sense that one row of the embedding matrix is treated differently than another...

Indeed. Maybe the learned circuit does something like the following. For each token's feature vector, compute a representation of positions where it appears in the sentence. This could be made possible by the positional embeddings.

  Token Features 0 => list[1, 5, 6, 10]
  Token Features 1 => list[7, 8]
  ...

These "list features" would be invariant to Caesar cipher. So then the LLM could pass these list features to a learned Caesar cipher decoder unit to spit out the decoded text.
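
A character-level sketch of that invariance, with plain characters and string indices standing in for tokens and positional embeddings (purely to illustrate the idea, not a claim about the actual circuit):

  def occurrence_pattern(symbols):
      # for each distinct symbol, record the positions where it occurs,
      # then keep only the position lists and forget the symbols themselves
      positions = {}
      for i, s in enumerate(symbols):
          positions.setdefault(s, []).append(i)
      return sorted(positions.values())

  def caesar(text, k):
      return ''.join(chr((ord(c) - 97 + k) % 26 + 97) if c.islower() else c
                     for c in text)

  msg = 'how much wood would a woodchuck chuck'
  print(occurrence_pattern(msg) == occurrence_pattern(caesar(msg, 13)))  # True

The pattern of repeated positions survives any shift, so a decoder working from it would not care which shift was used.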

It's still unexplained, however, why the Byzantine Music Notation would trigger this circuit while other Caesar ciphers wouldn't.


I found in my very limited testing with the free version of ChatGPT that it has a real bias for the Caesar method in general, suggesting it or bringing it up unnecessarily in almost every conversation.

Here is a sample. This 'encoded' form is clearly not a cipher (it is in fact random), but it 'tried' anyway:

Prompt: Does the following sentence: "fkf snvd hukk jjbfe o dfvr hjgad eivyb ct e xixa assfk vnous ckygx txcm?" bear resemblance to the phrase "how much wood would a wood chuck chuck if a wood chuck could chuck wood?"?

ChatGPT said:

Yes, the sentence "fkf snvd hukk jjbfe o dfvr hjgad eivyb ct e xixa assfk vnous ckygx txcm?" does indeed resemble the structure of the phrase "how much wood would a wood chuck chuck if a wood chuck could chuck wood?".

It seems like it could be an encrypted version of the latter phrase. Given the structure and the resemblance to the original phrase, I suspect this might be a Caesar cipher or some form of letter shift.

Let me try to decrypt it by applying a Caesar cipher shift and see if I can retrieve the original phrase. I'll start with a few different shift values. Hang on!


LLMs ought to be good at generating simple substitutions too, but they're not: https://claude.ai/share/9495246c-b77a-496e-8643-dbc5ec58dbcb

It tries, but it fails both to generate and to apply a simple substitution. It starts substituting some letters as if the substitution had been applied twice.

I spent a very frustrating time trying to solve it, especially after asking it to confirm it had created a 1:1 mapping.

Since it has no reasoning, it just lies about having done so.


Why should they be? Tokens are more or less at the word level, so the model can't see the letters to substitute.

When we get character-level models this should be more viable.
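
A quick way to see that granularity, using OpenAI's tiktoken package with the cl100k_base vocabulary as a stand-in (the exact boundaries depend on the model's tokenizer):

  import tiktoken

  enc = tiktoken.get_encoding("cl100k_base")
  for word in ["woodchuck", "chuck", "qzxv"]:
      ids = enc.encode(word)
      print(word, ids, [enc.decode([t]) for t in ids])

Common words come out as one or two tokens, so the model never sees their individual letters unless a word is rare enough to be split into smaller pieces.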


> Outside those values, AI doesn't handle the transformations well.

It doesn't handle it well if you prevent it from thinking about it and ask it for a zero-shot, single-pass answer.



