Rot8000

danbruc · on Nov 20, 2018

In case the author sees this, some comments about Rotator.cs.

1. This algorithm will break if the number of valid characters in the BMP becomes odd.

EDIT: As user platforms pointed out, there is a unit test for this.

2. There is an overflow in line 39 because of the check i <= BMP_SIZE in line 37.

3. The web server at rot8000.com exposes at least some errors with stack traces, try rotating the string <script>.

4. In line 42 you are performing a linear search for every character you transform, that is very inefficient, especially with characters at the end of the BMP. At least use a hash map or even better just use an array mapping the input code point directly to the output code point.

5. rot8000.com does at the very least allow rather long inputs which paired with the inefficiency of the linear search makes a DoS attack pretty easy. I tried a 10,000 word lorem ipsum, it was not rejected and the request took a minute to complete.

rottytooth · on Nov 20, 2018

Thanks -- added an issue for the linear search https://github.com/rottytooth/rot8000/issues/2 -- will place a limit on chars in that textbox as well

danbruc · on Nov 21, 2018

For reference, I created an optimized implementation and tested it with a string containing all characters from U+0000 to U+FFFF in order and got the following times. The original implementation took 5202.766 ms, the optimized implementation took 0.079 ms, for a speed-up of about 65858. That this is pretty close to 65536 is probably a reflection of the cost of the linear search through almost that number of characters and of the test pattern I chose, but I am not entirely sure. Intuitively I would have expected a factor of 0.5 in there to account for the average case, but I am too lazy right now to do the math.

rottytooth · on Nov 21, 2018

I've updated it to use a hashtable and the tests run quite a lot faster

danbruc · on Nov 21, 2018

I took the array approach, which should be faster still because it avoids the hash calculations. Just build an array Char[65536] containing at every index i the character that character i should be mapped to. Rotator.Rotate() then simply becomes the following, where Rotator.map is the precomputed array. Probably very similar to an implementation using a hash table. I also got rid of the string builder but did not profile the difference. If one uses a string builder, it would most likely help to specify the capacity in the constructor call so that the internal array does not have to be resized repeatedly as the result is constructed and grows in length.

  public static String Rotate(String input)
  {
    var result = new Char[input.Length];

    for (var index = 0; index < input.Length; index++)
    {
      result[index] = Rotator.map[input[index]];
    }

    return new String(result);
  }

andyburke · on Nov 20, 2018

Limiting the text box will protect you against the most naive DoS attacks, but you need some kind of limit at the API level (request size, etc.). Never trust the client.

cat199 · on Nov 21, 2018

not a js guy - any reason this needs any lookup table at all?

quick googling seems to suggest simple bitshifting could be possible..

TJSomething · on Nov 21, 2018

Because the basic multilingual plane that it operates on isn't actually full.
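
To illustrate the point about gaps in the BMP, here is a minimal Python sketch of the naive bit-flip idea. The names `naive_flip` and `SURROGATES` are made up for this example and are not taken from the rot8000 source.

```python
# Naive "rotation": flip the top bit of a 16-bit code point.
SURROGATES = range(0xD800, 0xE000)  # reserved for UTF-16 pairs, not characters


def naive_flip(code_point: int) -> int:
    """Rotate a BMP code point by 0x8000 with a single XOR."""
    return code_point ^ 0x8000


# 'A' (U+0041) lands on U+8041, an ordinary CJK ideograph.
a_target = naive_flip(ord("A"))

# The CJK ideograph U+5800 lands on U+D800, a lone surrogate.
surrogate_target = naive_flip(0x5800)

# The CJK ideograph U+8009 lands on U+0009, a tab control character.
control_target = naive_flip(0x8009)
```

The XOR is its own inverse, but some printable characters end up on code points that are not printable characters at all, which is why a filtered lookup table is used instead.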

jstanley · on Nov 20, 2018

Interesting! I made a very similar tool earlier this year.

It comes with presets for various different areas of Unicode, and some example text. The intended use case was very different, though: I looked at it from a steganography perspective rather than an honours-system obfuscation perspective.

https://incoherency.co.uk/mojibake/

I initially thought it would be able to decode the rot8000 output without any modification but I think the utf-8 escaping that my tool expects (from its own output) gets confused by the output from rot8000.

brlewis · on Nov 20, 2018

It may also be that you're rotating by 0x8000 and this code is not. It's creating a mapping that's restricted to non-control, non-surrogate, non-whitespace characters and rotating by half the size of that mapping.

https://github.com/rottytooth/rot8000/blob/master/Rottytooth...
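
A rough Python sketch of the mapping described above, under stated assumptions: the filter in `is_valid` (categories Cc and Cs plus `str.isspace()`) only approximates "non-control, non-surrogate, non-whitespace" and is not copied from Rotator.cs, and all names here are hypothetical. The resulting set of characters depends on the Unicode version of the Python build.

```python
import unicodedata


def is_valid(ch: str) -> bool:
    """Assumed filter: skip control, surrogate and whitespace characters."""
    return unicodedata.category(ch) not in ("Cc", "Cs") and not ch.isspace()


def build_table() -> dict:
    """Map every valid BMP code point to the one half the list away."""
    valid = [cp for cp in range(0x10000) if is_valid(chr(cp))]
    if len(valid) % 2:
        valid.pop()  # sketch-only workaround: keep the count even
    half = len(valid) // 2
    return {cp: valid[(i + half) % len(valid)] for i, cp in enumerate(valid)}


TABLE = build_table()


def rotate(text: str) -> str:
    # str.translate leaves code points that are missing from TABLE untouched,
    # so whitespace, controls and anything outside the BMP pass through.
    return text.translate(TABLE)
```

Because the table is built once, each call to `rotate` is a direct lookup per character rather than a search.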

danbruc · on Nov 20, 2018

This will break, i.e. two consecutive rotations will no longer be the identity, if the number of valid characters in the BMP ever becomes odd. And there are still a few unallocated code points in the BMP. There is also an overflow in line 39 because of the check i <= BMP_SIZE in line 37 which, I guess, previously used Char.MaxValue instead of BMP_SIZE. But it does no harm here, U+0000 just gets filtered out twice.
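
The parity problem can be seen on a toy alphabet. This is a minimal Python sketch with a hypothetical helper `rotate_half`; it is not the rot8000 code.

```python
def rotate_half(text: str, alphabet: str) -> str:
    """Shift each character by half the alphabet size, wrapping around."""
    half = len(alphabet) // 2
    index = {ch: i for i, ch in enumerate(alphabet)}
    return "".join(alphabet[(index[ch] + half) % len(alphabet)] for ch in text)


even = "abcdef"  # 6 symbols: half = 3, two rotations shift by 6, i.e. 0 mod 6
odd = "abcde"    # 5 symbols: half = 2, two rotations shift by 4, not 0 mod 5

twice_even = rotate_half(rotate_half("abc", even), even)  # back to "abc"
twice_odd = rotate_half(rotate_half("abc", odd), odd)     # "eab", not "abc"
```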

platforms · on Nov 20, 2018

There's a test for BMP characters being even: https://github.com/rottytooth/rot8000/blob/master/Rottytooth...

More critically, if the # of valid chars changes, previously rot-8000'd text will no longer be reversible through the tool

hyper_reality · on Nov 20, 2018

I also had a similar idea a few years ago for a CTF challenge, coming at it from a "modern Caesar cipher" perspective: https://laurencetennant.com/unicode-shift-cipher

Also a crude "modern Bacon cipher" using Punycode characters as the B's and ASCII-range characters as the A's.

razster · on Nov 20, 2018

᫧Ⴑڠ⧧ޱᒱ෧ឱޮᏧƱภᷧ↱ᢒРⓧ⢱ᚽᏧ₱ឱ˧ⲱⶲ೧ⲱஷ⣧ᶱ឴〠˧ᢱᎪᆱ֫⧧঱ྸ১⪱ᦾዧ઱ᦽ⸠ֲࣧւ໧኱↸ョற⮾Ġ೧ᶱធ℠ᛧᲱᎽ⣧঱ಸᇧ↱ⲾᏧケᚬ૧ᢱằဠǧྱ⮶ᷧ⊲ંⷧұᖼ໧኱மყ₱⦵ყ↱ឯ

ninjin · on Nov 20, 2018

This certainly is what I would call a “neat hack”. Out of curiosity I had to check what it rotates Japanese into. Turns out, mostly Korean: “日本語はどうかな？” becomes “ື걅갿개갡걀等”.

egypturnash · on Nov 20, 2018

It meticulously refrains from rotating emoji. Somehow this feels like failure.

SomeCallMeTim · on Nov 20, 2018

ROT-8000 is only touching the first 65536 Unicode characters (UCS-2). Unicode has >1M code points. [0]

Most emojis seem to be above the first 16 bits. [1] But there are a number of emojis in the first 16 bits, like the "frowning face" emoji at U+2639 -- it rotates just fine -- plus others in the first 16 bits.

(TIL you can't paste emojis into HN comment threads. Probably all for the best.)

[0] https://en.wikipedia.org/wiki/Unicode

[1] https://unicode.org/emoji/charts/emoji-list.html
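
A short Python sketch of the difference between the two kinds of emoji mentioned above. The helper name `utf16_units` is made up for this example.

```python
frowning = "\u2639"      # WHITE FROWNING FACE, inside the BMP
grinning = "\U0001F600"  # GRINNING FACE, outside the BMP


def utf16_units(ch: str) -> list:
    """Return the 16-bit code units used to store ch in UTF-16."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp_units = utf16_units(frowning)     # a single unit: 0x2639
astral_units = utf16_units(grinning)  # a surrogate pair: 0xD83D, 0xDE00
```

A tool that works on 16-bit units and skips surrogates would therefore leave the second emoji untouched while rotating the first.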

wl · on Nov 20, 2018

> TIL you can't paste emojis into HN comment threads. Probably all for the best.

It got in the way of me explaining the Rebus principle used in Egyptian hieroglyphs a little while ago.

Then again, the topic was phalluses in Unicode (which do display!), so maybe you're right.

have_faith · on Nov 20, 2018

> TIL you can't paste emojis into HN comment threads. Probably all for the best.

I'm gonna make a HN where you can only speak in Emoji!

Sorta unrelated, does anyone remember the social network where you could only write in Emoji? http://emoj.li/

tyingq · on Nov 20, 2018

Are the rules for what it does allow written down somewhere? I know country flags work: 🇩🇪

codetrotter · on Nov 21, 2018

If your question isn’t answered, you could post all of them in a comment and see which ones remain unfiltered.

https://unicode.org/emoji/charts/full-emoji-list.html

https://unicode.org/emoji/charts/full-emoji-modifiers.html

You might need to split the text over multiple comments. Don’t remember whether or not there is a limit to the length a comment can have. Probably there is.

tyingq · on Nov 21, 2018

I may try that. It seems a little arbitrary.

I can post, for example: ↙️ ↩️ ⌚ ⌛ ⌨ ⏏ ⏩ ⏰ ️⏱ ⏲ ⏳ ️◾ 󠁧󠁢󠁷󠁬󠁳󠁿

angus-prune · on Nov 22, 2018

There is a wonderful talk by the founders of emoj.li about how it was all a joke which got out of hand. https://www.youtube.com/watch?v=GsyhGHUEt-k

ucosty · on Nov 20, 2018

It seems you can type them in natively, though

edit: scratch that, they get stripped out.

theophrastus · on Nov 20, 2018

I was curious as to how one might implement this with a familiar language, and fetched up on this interesting python github script, specifically "rot32768"[0]

[0] https://gist.github.com/terrorbyte/7967039

ConcernedCoder · on Nov 20, 2018

FYI: Here's a static JavaScript version I whipped up (as a lunch-time challenge) that will reversibly rotate everything except whitespace...

https://github.com/jeffallen6767/rot0x8000

rottytooth · on Nov 21, 2018

Don't see how this will work without checking for control characters, surrogates and chars above 0x10000 (try 𝄞 for instance)

platforms · on Nov 20, 2018

籝籱籮籺籾籲籬籴籫类籸粀籷籯籸粁米籾籶籹籼籸籿籮类簹粁籁簹簹簹籭籸籰籼簷 http://rot8000.com/Index?%E7%B1%9D%E7%B1%B1%E7%B1%AE%20%E7%B...

tsaoyu · on Nov 21, 2018

Reminds me of 锟斤拷, which comes from misinterpreting the Unicode replacement character. When the placeholder U+FFFD, encoded as UTF-8, is decoded using GBK, it is displayed as these characters. Some of these glitches can still be found online, e.g., https://docs.oracle.com/cd/E19199-01/817-4244-10/preface.htm...
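
The effect can be reproduced in a few lines of Python using the standard `gbk` codec. Variable names are only for illustration.

```python
replacement = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

# Two replacement characters in UTF-8 are six bytes: EF BF BD EF BF BD.
raw = (replacement * 2).encode("utf-8")

# Read as GBK, those six bytes pair up into three two-byte characters.
mojibake = raw.decode("gbk")
```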

omarforgotpwd · on Nov 21, 2018

If you are just starting to get interested in cryptography, try to make a program that can break ciphers like this one or similar. Hint: use frequency analysis on sample ciphertext and compare it to known letter frequencies in English to match it to plaintext. Then you can determine the offset and decrypt.
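
A minimal Python sketch of that exercise for a classic Caesar shift over a-z. The frequency table holds approximate, commonly quoted percentages for English, and the function names are hypothetical. It scores every possible offset and keeps the one whose decryption looks most like English.

```python
import string

# Approximate letter frequencies (percent) in English text.
ENGLISH_FREQ = {
    "a": 8.17, "b": 1.49, "c": 2.78, "d": 4.25, "e": 12.70, "f": 2.23,
    "g": 2.02, "h": 6.09, "i": 6.97, "j": 0.15, "k": 0.77, "l": 4.03,
    "m": 2.41, "n": 6.75, "o": 7.51, "p": 1.93, "q": 0.10, "r": 5.99,
    "s": 6.33, "t": 9.06, "u": 2.76, "v": 0.98, "w": 2.36, "x": 0.15,
    "y": 1.97, "z": 0.07,
}


def shift_text(text: str, shift: int) -> str:
    """Shift lowercase letters by `shift` places; leave everything else."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def score(text: str) -> float:
    """Higher means the letter mix looks more like English."""
    return sum(ENGLISH_FREQ.get(ch, 0.0) for ch in text)


def crack(ciphertext: str) -> tuple:
    """Try all 26 offsets and return (offset, plaintext) for the best score."""
    best = max(range(26), key=lambda s: score(shift_text(ciphertext, -s)))
    return best, shift_text(ciphertext, -best)
```

The approach needs a reasonable amount of ciphertext; very short messages may not have a letter mix close enough to the table.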

collyw · on Nov 20, 2018

Can someone explain what this is doing please?

Crespyl · on Nov 20, 2018

See: http://rot8000.com/info

It's essentially a Unicode version of the old "Rot 13" cypher.

In Rot 13, you translate each letter 13 places down (as if on a code wheel), such that 'A' becomes 'N', 'B' becomes 'O', wrapping such that 'Z' becomes 'M', and so on.

This version, instead of using the simple 'A=1...Z=26' number space, uses the Unicode range and rotates by 32,768 (0x8000).

scrooched_moose · on Nov 20, 2018

One key aspect you skipped over is it's self-reversible. 'A' becomes 'N', and applying it again 'N' becomes 'A'.

"rot13 is reversible" -> "ebg13 vf erirefvoyr" -> "rot13 is reversible".

"rot8000 is also reversible" -> "类籸籽籁簹簹簹籲籼籪籵籼籸类籮籿籮类籼籲籫籵籮" -> "rot8000 is also reversible"

Rot13 is English-alphabet only so it skips numbers, while rot8000 doesn't have this limitation because it uses the larger unicode set.
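
For comparison, the rot13 half of this can be checked with Python's built-in `rot_13` codec. The wrapper name `rot13` is just for this sketch.

```python
import codecs


def rot13(text: str) -> str:
    """Apply rot13; only A-Z and a-z change, digits and spaces stay put."""
    return codecs.encode(text, "rot_13")


once = rot13("rot13 is reversible")  # "ebg13 vf erirefvoyr"
twice = rot13(once)                  # back to the original text
```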

joshuamcginnis · on Nov 20, 2018

The only link on the page links to the explanation.

collyw · on Nov 20, 2018

I missed that - meow_info doesn't really convey that it's an explanation.

TazeTSchnitzel · on Nov 20, 2018

Reminds me of the infamous 畂桳栠摩琠敨映捡獴.

Arkanosis · on Nov 20, 2018

For anyone not getting it: https://en.wikipedia.org/wiki/Bush_hid_the_facts
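
The effect is easy to reproduce: the linked article describes plain ASCII text being misread as UTF-16LE. A small Python sketch, with illustrative variable names:

```python
ascii_bytes = "Bush hid the facts".encode("ascii")  # 18 bytes

# Pairing the bytes up as little-endian 16-bit units gives 9 characters,
# most of which fall in the CJK range.
misread = ascii_bytes.decode("utf-16-le")

# 'B' is 0x42 and 'u' is 0x75, so the first unit is 0x7542.
first_unit = ord(misread[0])
```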

supakeen · on Nov 20, 2018

Fun, but outputs unprintable or non-used characters and only functions on the BMP?

loa_in_ · on Nov 20, 2018

Reminds me of http://base91.sourceforge.net/.

We could go further, straight to Base8000!

ar-nelson · on Nov 20, 2018

Already exists: https://github.com/qntm/base65536

It's actually pretty useful for compressing data in Unicode-aware environments, like Twitter. Which makes me wonder if Unicode support is universal enough now that an encoding like this could replace MIME/base64 in email.

lifthrasiir · on Nov 21, 2018

Okay, I have seen this 10 times or so when I tried to compare various binary-to-text encodings and basE91 is the only one without a format description. Probably it's time to directly look at the source code. Amazingly, this one turns out to be the only binary-to-text encoding I have ever seen that groups the input bits by a varying number of bits. More specifically:

* The input bits are packed in the reverse order (e.g. 1A 2B 3C is packed as 0x3C2B1A) unlike most other binary-to-text encodings. The last bits are padded with preceding zeroes.

* A pair of basE91 alphabets encode a number 0 through 8280. The first alphabet is least significant: `AB` encodes 91 and not 1.

* 91^2 = 8281 > 2^13 = 8192, so groups of 13 bits are read and encoded as two basE91 alphabets from the least significant to the most significant. But it's not always the case. Occasionally the lowermost 14 bits will be read instead, when the resulting value is still less than 91^2 (that is, when the 13-bit value is at most 88). As a result, the first 8281 - 8192 = 89 values (0..88) and the last 89 values (8192..8280) actually encode 14 bits, and it includes all-zero bits. Its average overhead is therefore 22.93% (16 / lg 8281 - 1) and can reach 14.29% (16 / 14 - 1) when all bits are zero.

It reminds me of Ascii85 [1] which had a shorthand for all-zero groups and all-space groups, but this one is more general. Speaking of generality, probably a binary-to-text encoding with arithmetic coding is now viable?

[1] https://en.wikipedia.org/wiki/Ascii85#btoa_version
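
The bit-grouping rules listed above can be sketched as a small Python encoder. This is an illustration rather than the reference implementation: the symbol order in `ALPHABET` is an assumption and should be checked against the basE91 source, and the names are made up for this example.

```python
import math

# Assumed basE91 symbol table (91 symbols); verify against the real source.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "!#$%&()*+,./:;<=>?@[]^_`{|}~\""
)


def encode(data: bytes) -> str:
    out = []
    acc = 0    # bit accumulator; new bytes enter above the existing bits
    nbits = 0  # number of valid bits currently in acc
    for byte in data:
        acc |= byte << nbits
        nbits += 8
        if nbits > 13:
            value = acc & 0x1FFF  # try the lowest 13 bits first
            if value > 88:
                acc >>= 13
                nbits -= 13
            else:  # small value: 14 bits still fit below 91 * 91
                value = acc & 0x3FFF
                acc >>= 14
                nbits -= 14
            out.append(ALPHABET[value % 91])   # least significant symbol first
            out.append(ALPHABET[value // 91])
    if nbits:  # flush leftover bits, padded with zeroes on top
        out.append(ALPHABET[acc % 91])
        if nbits > 7 or acc > 90:
            out.append(ALPHABET[acc // 91])
    return "".join(out)


average_overhead = 16 / math.log2(91 ** 2) - 1  # about 0.2293
best_case_overhead = 16 / 14 - 1                # about 0.1429
```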

dullroar · on Nov 20, 2018

Should also change spaces to zero-width spaces, which would then make it less obvious where the word breaks are.

dana321 · on Nov 20, 2018

籖粂籶籸籽籱籮类籪籽籮粂籸籾类籬籱籲粀籸粀籸粀籸粀粀

tuttle7 · on Nov 20, 2018

No one is concerned by the fact this is sending your text using POST requests. The guy could not use DOM/JS.

Crespyl · on Nov 20, 2018

No, no one is concerned by this. Not every toy website needs to have JS.

Sohcahtoa82 · on Nov 20, 2018

I think the point tuttle7 was trying to make was that this site could be implemented client-side quite easily. There's no real reason to make the translation server-side and require more server CPU resources and bandwidth.

I feel the same way about https://www.base64decode.org/ . By default, everything gets translated server-side. I wonder how many people use this site on a regular basis for translating secrets. I'd bet my life that the number is greater than zero.

ravenstine · on Nov 20, 2018

Nah bro, it needs Webpack and a mishmash of Angular and Vue with a "sprinkling" of React along with an Elixir backend so it's fault-tolerant. Else, how is this toy site supposed to scale at all?

stilldavid · on Nov 20, 2018

I'd be more concerned if you used this for actual secrets.

richrichardsson · on Nov 20, 2018

That's why you write rude messages to give him/her a laugh when checking server logs.

tuttle7 · on Nov 20, 2018

Yep, passed the test. So there are checks made on the contents sent. A little warning on how the sent data is handled would have been appreciated. Thank you.

marssaxman · on Nov 20, 2018

Why on earth would you need a warning that text entered into an HTML form would be posted to the server when you pressed the button? What else would you expect it to do?

Sohcahtoa82 · on Nov 21, 2018

I'd expect such a trivial operation to be done client-side in JavaScript and not need to ask a server to do it for them.

bratch · on Nov 21, 2018

I was delighted that a fun toy didn't need JS for once. If the concern is one of privacy, the author could just be sending the text to the server in the background with JS too.

krsdcbl · on Nov 20, 2018

If i can carry a bucket on my shoulder, why get a car to move it?