
Reminds me of the Universal Cyrillic decoder [1]

An old MySQL db dump I have has some values such as: !¡!HONDA POW

Does anyone here have an idea if/how I can recover the mangled text?

[1] https://2cyr.com/decode/?lang=en



In fact, ftfy already figures that text out! Here are the recovery steps that the website outputs:

    import ftfy.bad_codecs  # enables sloppy- codecs
    s = '!¡!HONDA POW'
    s = s.encode('sloppy-windows-1252')
    s = s.decode('utf-8')
    s = s.encode('sloppy-windows-1252')
    s = s.decode('utf-8')
    s = s.encode('latin-1')
    s = s.decode('utf-8')
    print(s)

And the decoded text is (for some reason):

    !¡!HONDA POW


Thank you. I had tested that too, but it seems to simply remove the mangled part of the string. Maybe it's impossible to recover it automatically after all :/


No, no. That is the recovered text.

Originally, the text had one non-ASCII character, an upside-down exclamation point. A series of unfortunate (but typical) things happened to that character, turning it into 9 characters of nonsense, the 9th of which is also an upside-down exclamation point.

It looks like ftfy is just removing the first 8 characters, but it's reversing a sequence of very specific things that happened to the text (which just happens to be equivalent to removing the first 8 characters).
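
To make that concrete, here is a rough sketch using only the Python standard library. It assumes the damage was the usual "encode as UTF-8, then misread the bytes as Windows-1252" mistake applied three times, which matches the three encode/decode rounds in the recovery steps. The helper names `mangle` and `unmangle` are made up for this illustration and are not part of ftfy.

```python
# Sketch only: assumes three rounds of "UTF-8 bytes misread as Windows-1252".
# mangle/unmangle are hypothetical helper names, not ftfy functions.

def mangle(text: str, rounds: int = 3) -> str:
    """Simulate the damage: encode as UTF-8, wrongly decode as cp1252."""
    for _ in range(rounds):
        text = text.encode("utf-8").decode("cp1252")
    return text

def unmangle(text: str, rounds: int = 3) -> str:
    """Reverse the damage one round at a time."""
    for _ in range(rounds):
        text = text.encode("cp1252").decode("utf-8")
    return text

original = "\u00a1HONDA POW"  # \u00a1 is the upside-down exclamation point
broken = mangle(original)

print(len(broken) - len(original))   # 8 extra characters of nonsense
print(broken[8:] == original)        # True: dropping the first 8 gives the original
print(unmangle(broken) == original)  # True: so does reversing the three rounds
```

The plain `cp1252` codec is enough for this particular character. ftfy's `sloppy-windows-1252` exists because the standard codec leaves a few byte values undefined, and other mangled text can run into them.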



