
A guesser answers the question: what encoding did they actually use?

FTFY answers the question: what horrifying sequence of encode/decode transforms could produce bytes that decode correctly as UTF-8 and still come out as total gibberish?

In other words...

The problem fixed by an encoding guesser:

1. I encode my text with something that's not UTF-8-compatible.

2. I lie to you and say it's UTF-8.

3. You decode it as UTF-8 and get nonsense. What the heck?

4. A guesser tells you what encoding I actually used.

5. You decode it from the guessed encoding and get text.
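
A minimal sketch of that guesser workflow in Python, standard library only. The `guess_and_decode` name and the candidate list are made up for illustration; real guessers such as chardet rank candidates statistically instead of taking the first codec that happens to decode cleanly.

```python
def guess_and_decode(data: bytes, candidates=("utf-8", "cp1252")):
    """Return (encoding, text) for the first candidate that decodes cleanly.

    Hypothetical helper: a toy stand-in for a real encoding guesser.
    """
    for enc in candidates:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits")


# Step 1: text encoded with a codec that is not UTF-8-compatible.
payload = "caf\u00e9".encode("cp1252")  # "café" -> b'caf\xe9'

# Step 3: trusting the UTF-8 label gives nonsense (a replacement character).
wrong = payload.decode("utf-8", errors="replace")

# Steps 4-5: guess the encoding that was actually used and decode from it.
encoding, text = guess_and_decode(payload)
```

The key point is that the original bytes are still intact, so picking the right decoder is enough.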

----
The problem fixed by FTFY:

1. I encode string S with non-UTF-8 codec C.

2. I lie that it's UTF-8.

3. Someone decodes it as UTF-8. It's full of garbage, but they don't care.

4. They encode that sequence of nonsense symbols, not the original text, as UTF-8. Let's charitably name this "encoding" C'.

5. They say: Here teddyh, take this nice UTF-8.

6. You decode it as UTF-8. What the heck?

7. Is it ISO-8859? Some version of windows-X? Nope. It's UTF-8 carrying C', a non-encoding someone's broken algorithm made up on the spot. There's no decoder that can turn your UTF-8 back into the symbols of S, because the text you got was already garbage.

8. FTFY figures out what sequence of mismatched encode/decode steps generates text in C' and does the inverse, giving you back C^-1( C'^-1( C'( C( S )))) = S.
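
A rough sketch of the round trip in Python, standard library only. Assumptions, stated loudly: this uses the most common reversible variant of the mistake, where UTF-8 bytes get misread as Latin-1 and the resulting garbage is re-encoded as UTF-8. `mangle` and `unmangle` are hypothetical names, and this is not how FTFY is implemented; the real library has to detect which chain of mistakes happened, while this sketch hard-codes one.

```python
def mangle(s: str) -> bytes:
    """Build the broken "UTF-8 carrying C'" bytes for string s.

    Assumed mistake chain: UTF-8 bytes misread as Latin-1, then the
    garbage characters re-encoded as UTF-8.
    """
    garbage = s.encode("utf-8").decode("latin-1")
    return garbage.encode("utf-8")


def unmangle(data: bytes) -> str:
    """Invert mangle() by running the same steps backwards."""
    garbage = data.decode("utf-8")  # valid UTF-8, but the text is garbage
    return garbage.encode("latin-1").decode("utf-8")


original = "caf\u00e9"  # "café"
broken = mangle(original)

# An encoding guesser is no help: the bytes are perfectly valid UTF-8,
# they just spell out the wrong characters ("cafÃ©").
as_received = broken.decode("utf-8")

# Undoing the mismatched steps in reverse order recovers S.
recovered = unmangle(broken)
```

Latin-1 is used here because it maps every byte value to a character, which keeps the toy round trip lossless; other mismatched codecs can drop information and are not always invertible.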


