In case the author sees this, some comments about Rotator.cs.
1. This algorithm will break if the number of valid characters in the BMP becomes odd.
EDIT: As user platforms pointed out, there is a unit test for this.
2. There is an overflow in line 39 because of the check i <= BMP_SIZE in line 37.
3. The web server at rot8000.com exposes at least some errors with stack traces; try rotating the string <script>.
4. In line 42 you perform a linear search for every character you transform. That is very inefficient, especially for characters at the end of the BMP. At least use a hash map or, even better, an array that maps the input code point directly to the output code point.
5. rot8000.com allows rather long inputs, which, paired with the inefficiency of the linear search, makes a DoS attack pretty easy. I tried a 10,000-word lorem ipsum; it was not rejected and the request took a minute to complete.
For reference, I created an optimized implementation and tested both with a string containing all characters from U+0000 to U+FFFF in order. The original implementation took 5202.766 ms and the optimized one took 0.079 ms, a speed-up by a factor of about 65,858. That this is pretty close to 65,536 probably reflects the cost of the linear search through almost that many characters, combined with the test pattern I chose, but I am not entirely sure. Intuitively I would have expected a factor of 0.5 in there to account for the average case, but I am too lazy right now to do the math.
I took the array approach, which should be faster still because it avoids the hash calculations. Just build an array Char[65536] that contains, at every index i, the character that character i should be mapped to. Rotator.Rotate() then simply becomes the following, where Rotator.map is the precomputed array; an implementation using a hash table would probably look very similar. I also got rid of the string builder but did not profile the difference. If one uses a string builder, it would most likely help to specify the capacity in the constructor call so that the internal array does not have to be resized repeatedly as the result grows.
    public static String Rotate(String input)
    {
        var result = new Char[input.Length];
        for (var index = 0; index < input.Length; index++)
        {
            result[index] = Rotator.map[input[index]];
        }
        return new String(result);
    }
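A sketch of how such a precomputed table could be built, written in Python rather than the C# of Rotator.cs. The filter (no control, surrogate, or whitespace characters) and the rotation by half the number of valid characters follow the description elsewhere in this thread. The names is_valid, build_table, TABLE, and rotate are made up for illustration, and Python's str.isspace() and unicodedata need not agree exactly with the .NET Char methods, so the output will not necessarily match rot8000.com.

```python
import unicodedata

BMP_SIZE = 0x10000  # number of code points in the Basic Multilingual Plane


def is_valid(cp):
    """Assumed filter: keep everything except control, surrogate, whitespace."""
    ch = chr(cp)
    return unicodedata.category(ch) not in ("Cc", "Cs") and not ch.isspace()


def build_table():
    """Return a list where table[cp] is the character that chr(cp) maps to."""
    valid = [cp for cp in range(BMP_SIZE) if is_valid(cp)]
    if len(valid) % 2:
        # Assumption (not from Rotator.cs): with an odd count, leave the last
        # character unmapped so that rotating twice is still the identity.
        valid.pop()
    half = len(valid) // 2
    table = [chr(cp) for cp in range(BMP_SIZE)]  # start from the identity mapping
    for i, cp in enumerate(valid):
        table[cp] = chr(valid[(i + half) % len(valid)])
    return table


TABLE = build_table()  # built once, reused for every call


def rotate(text):
    # One constant-time table lookup per character; characters outside the
    # BMP are passed through unchanged.
    return "".join(TABLE[ord(ch)] if ord(ch) < BMP_SIZE else ch for ch in text)
```

Building the table costs one pass over the BMP up front; after that, the cost of rotate() depends only on the input length, not on where in the BMP a character sits.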
Limiting the text box will protect you against the most naive DoS attacks, but you need some kind of limit at the API level (request size, etc.). Never trust the client.
Interesting! I made a very similar tool earlier this year.
It comes with presets for various areas of Unicode and some example text, although the intended use case was very different: I looked at it from a steganography perspective rather than an honours-system obfuscation perspective.
I initially thought it would be able to decode the rot8000 output without any modification, but I think the UTF-8 escaping that my tool expects (from its own output) gets confused by the output from rot8000.
It may also be that you're rotating by 0x8000 and this code is not. It's creating a mapping that's restricted to non-control, non-surrogate, non-whitespace characters and rotating by half the size of that mapping.
This will break, i.e. two consecutive rotations will no longer be the identity, if the number of valid characters in the BMP ever becomes odd. And there are still a few unallocated code points in the BMP. There is also an overflow in line 39 because of the check i <= BMP_SIZE in line 37 which, I guess, previously used Char.MaxValue instead of BMP_SIZE. But it does no harm here, U+0000 just gets filtered out twice.
This certainly is what I would call a “neat hack”. Out of curiosity I had to check what it rotates Japanese into. Turns out, mostly Korean: “日本語はどうかな?” becomes “ື걅갿개갡걀等”.
ROT-8000 is only touching the first 65536 Unicode characters (UCS-2). Unicode has >1M code points. [0]
Most emojis seem to be above the first 16 bits. [1] But there are a number of emojis in the first 16 bits, like the "frowning face" emoji at U+2639 -- it rotates just fine -- plus others in the first 16 bits.
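To check where a given character falls, a small Python sketch (the helper names in_bmp and utf16_units are hypothetical): a character is in the BMP if its code point is at most U+FFFF, and anything above that needs two UTF-16 code units, a surrogate pair.

```python
def in_bmp(ch):
    """True if the single character ch lies in the Basic Multilingual Plane."""
    return ord(ch) <= 0xFFFF


def utf16_units(ch):
    """Number of 16-bit code units needed to store ch in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


frowning = "\u2639"      # WHITE FROWNING FACE, inside the BMP
grinning = "\U0001F600"  # GRINNING FACE, outside the BMP

print(in_bmp(frowning), utf16_units(frowning))
print(in_bmp(grinning), utf16_units(grinning))
```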
(TIL you can't paste emojis into HN comment threads. Probably all for the best.)
You might need to split the text over multiple comments. Don’t remember whether or not there is a limit to the length a comment can have. Probably there is.
I was curious as to how one might implement this in a familiar language, and fetched up on this interesting Python script on GitHub, specifically "rot32768" [0].
This reminds me of 锟斤拷, which comes from misinterpreting the Unicode replacement character. When the placeholder U+FFFD, encoded as UTF-8, is decoded as GBK, it is displayed as these characters. Some of these glitches can still be found online, e.g.,
https://docs.oracle.com/cd/E19199-01/817-4244-10/preface.htm...
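The effect is easy to reproduce. A minimal Python sketch, assuming the usual explanation that two consecutive replacement characters stored as UTF-8 are later read back as GBK (the function name garble is made up):

```python
def garble(count=2):
    """Encode `count` U+FFFD characters as UTF-8, then misread the bytes as GBK."""
    raw = ("\ufffd" * count).encode("utf-8")  # EF BF BD per replacement character
    return raw.decode("gbk")                  # GBK pairs the bytes up differently


print(garble())
```

Two replacement characters give six bytes, which GBK reads as three two-byte characters.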
If you are just starting to get interested in cryptography, try to write a program that can break ciphers like this one or similar. Hint: use frequency analysis on sample ciphertext and compare the result to known letter frequencies in English to match ciphertext letters to plaintext letters. Then you can determine the offset and decrypt.
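A sketch of that hint in Python for the classic 26-letter rotation cipher. The frequency table holds approximate, commonly quoted percentages for English, and the names shift, score, and crack are made up for this example. It tries all 26 offsets and keeps the one whose decryption looks most like English.

```python
import string

# Approximate relative frequencies of letters in English text, in percent.
ENGLISH_FREQ = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.77, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.095, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 0.98, "w": 2.4, "x": 0.15, "y": 2.0, "z": 0.074,
}


def shift(text, k):
    """Rotate ASCII letters by k places; leave everything else untouched."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + k) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + k) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def score(text):
    """Higher means the letter distribution looks more like English."""
    return sum(ENGLISH_FREQ.get(ch, 0.0) for ch in text.lower())


def crack(ciphertext):
    """Return (offset, plaintext) for the most English-looking decryption."""
    best = max(range(26), key=lambda k: score(shift(ciphertext, -k)))
    return best, shift(ciphertext, -best)
```

This works reliably only when the sample is long enough for the letter statistics to show; very short ciphertexts can fool it.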
It's essentially a Unicode version of the old "Rot 13" cypher.
In Rot 13, you translate each letter 13 places down (as if on a code wheel), such that 'A' becomes 'N', 'B' becomes 'O', wrapping such that 'Z' becomes 'M', and so on.
This version, instead of using the simple 'A=1...Z=26' number space, uses the Unicode range and rotates by 32,768 (0x8000).
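As a rough illustration in Python: rot13 below is the classic cipher, and naive_rot32768 is a deliberately naive take on "rotate the BMP by 0x8000" (both function names are just for this sketch). The naive version is its own inverse, but it also moves spaces and lands some characters in the surrogate range, which is presumably why the real tool restricts the mapping to a filtered set of characters, as noted elsewhere in this thread.

```python
def rot13(text):
    """Rotate ASCII letters by 13 places; applying it twice gives the input back."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def naive_rot32768(text):
    """Add 0x8000 modulo 0x10000 to every BMP code point, with no filtering."""
    return "".join(
        chr((ord(ch) + 0x8000) % 0x10000) if ord(ch) < 0x10000 else ch
        for ch in text
    )


print(rot13("Hello"))                      # Uryyb
print(hex(ord(naive_rot32768("A"))))       # 0x8041
print(hex(ord(naive_rot32768("\u5800"))))  # 0xd800, a lone surrogate
```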
It's actually pretty useful for compressing data in Unicode-aware environments, like Twitter. Which makes me wonder if Unicode support is universal enough now that an encoding like this could replace MIME/base64 in email.
Okay, I have seen this 10 times or so while comparing various binary-to-text encodings, and basE91 is the only one without a format description, so it is probably time to look directly at the source code. Amazingly, it turns out to be the only binary-to-text encoding I have ever seen that groups the input bits into groups of varying size. More specifically:
* The input bits are packed in reverse order (e.g. the bytes 1A 2B 3C are packed as 0x3C2B1A), unlike in most other binary-to-text encodings. The last bits are padded with leading zeroes.
* A pair of basE91 characters encodes a number from 0 through 8280. The first character is the least significant: `AB` encodes 91, not 1.
* 91^2 = 8281 > 2^13 = 8192, so groups of 13 bits are read and encoded as two basE91 characters, least significant first. But that is not always the case: when the lowermost 13 bits are at most 88, the lowermost 14 bits are read instead, because the 14-bit value is then still less than 91^2. As a result, the first 8281 - 8192 = 89 values (0..88) and the last 89 values (8192..8280) actually encode 14 bits, and that includes the all-zero bits. Its average overhead is therefore 22.93% (16 / lg 8281 - 1) and can drop to 14.29% (16 / 14 - 1) when all bits are zero.
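The scheme described in the list above can be sketched in Python as follows. This is a reading of the format as summarized here, not the official code. The alphabet is the 91-character set as I understand it from the reference source (letters, digits, then punctuation without the dash, backslash, and apostrophe), and the function and variable names are my own.

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "!#$%&()*+,./:;<=>?@[]^_`{|}~\""
)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(data):
    acc = 0    # bit accumulator; each new byte is placed above the queued bits
    nbits = 0  # number of bits currently queued in acc
    out = []
    for byte in data:
        acc |= byte << nbits
        nbits += 8
        if nbits > 13:
            value = acc & 0x1FFF      # try the low 13 bits first
            if value > 88:
                acc >>= 13
                nbits -= 13
            else:
                value = acc & 0x3FFF  # low 13 bits <= 88: take 14 bits instead
                acc >>= 14
                nbits -= 14
            out.append(ALPHABET[value % 91])   # least significant character first
            out.append(ALPHABET[value // 91])
    if nbits:
        # Flush the remaining bits, zero-padded at the top.
        out.append(ALPHABET[acc % 91])
        if nbits > 7 or acc > 90:
            out.append(ALPHABET[acc // 91])
    return "".join(out)


def decode(text):
    acc = 0
    nbits = 0
    first = None  # first character of a pending pair, if any
    out = bytearray()
    for ch in text:
        if ch not in INDEX:
            continue  # ignore characters outside the alphabet
        if first is None:
            first = INDEX[ch]
            continue
        value = first + INDEX[ch] * 91
        first = None
        acc |= value << nbits
        nbits += 13 if (value & 0x1FFF) > 88 else 14
        while nbits > 7:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if first is not None:
        out.append((acc | first << nbits) & 0xFF)
    return bytes(out)
```

With this sketch, seven zero bytes (56 bits) come out as four 14-bit groups, that is eight characters, which is the 16 / 14 ratio from the last bullet.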
It reminds me of Ascii85 [1] which had a shorthand for all-zero groups and all-space groups, but this one is more general. Speaking of generality, probably a binary-to-text encoding with arithmetic coding is now viable?
I think the point tuttle7 was trying to make is that this site could be implemented client-side quite easily. There's no real reason to do the translation server-side and use more server CPU resources and bandwidth.
I feel the same way about https://www.base64decode.org/ . By default, everything gets translated server-side. I wonder how many people use this site on a regular basis for translating secrets. I'd bet my life that the number is greater than zero.
Nah bro, it needs Webpack and a mishmash of Angular and Vue with a "sprinkling" of React along with an Elixir backend so it's fault-tolerant. Else, how is this toy site supposed to scale at all?
Yep, passed the test.
So there are checks made on the content that is sent. A little warning about how the sent data is handled would have been appreciated. Thank you.
Why on earth would you need a warning that text entered into an HTML form would be posted to the server when you pressed the button? What else would you expect it to do?
I was delighted that a fun toy didn't need JS for once. If the concern is one of privacy, the author could just be sending the text to the server in the background with JS too.