Hacker News

Apropos of nothing in particular ...

Not bothering to use ROT13, I just dropped that into my generic substitution cipher decoder and it spat out the answer almost immediately - quite pleased with that.




Interesting. You cycle through substitutions and match against a dictionary to detect a hit?


Nope, I use the shotgun stochastic hill-climbing algorithm I wrote for something else, with the ballistic option disabled.

In short:

    While True:
        Generate a random key
        "Decode"
        Score
        Start timer
        While timer not expired:
            Perturb the key
            Score
            If better:
                Keep the new key
                Reset timer
        Print decrypt
In this case the first output was completely readable.
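That loop can be sketched in Python. The fitness function below is a toy letter-frequency stand-in (the trigram scorer described later in the thread would slot in the same way), and all the names and timings are illustrative, not the parent poster's actual code:

```python
import random
import string
import time

ALPHABET = string.ascii_uppercase

# Toy fitness table: frequency-weighted letter counts. A real attack
# would use something smoother, e.g. trigram statistics.
LETTER_FREQ = {"E": 0.127, "T": 0.091, "A": 0.082, "O": 0.075,
               "I": 0.070, "N": 0.067, "S": 0.063, "H": 0.061}

def decode(ciphertext, key):
    # key is a 26-letter permutation: ciphertext A..Z -> key[0..25].
    return ciphertext.translate(str.maketrans(ALPHABET, key))

def score(text):
    return sum(LETTER_FREQ.get(c, 0.0) for c in text)

def perturb(key):
    # Swap two letters of the key.
    i, j = random.sample(range(26), 2)
    k = list(key)
    k[i], k[j] = k[j], k[i]
    return "".join(k)

def crack(ciphertext, patience=0.5):
    while True:                                         # shotgun restarts
        key = "".join(random.sample(ALPHABET, 26))      # random key
        best = score(decode(ciphertext, key))           # "decode", score
        deadline = time.monotonic() + patience          # start timer
        while time.monotonic() < deadline:              # timer not expired
            cand = perturb(key)
            s = score(decode(ciphertext, cand))
            if s > best:                                # if better:
                key, best = cand, s                     #   keep the new key
                deadline = time.monotonic() + patience  #   reset timer
        print(decode(ciphertext, key))                  # print decrypt
```

The timer-reset is what makes a restart "give up": once no perturbation improves the score for `patience` seconds, the inner loop falls through, prints the best decrypt, and the outer loop reshuffles from scratch.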


I have no idea how you'd make a scoring system that works well with a hill-climbing algorithm. You could check each word for a match in the dictionary, but with a substitution cipher you can essentially turn one word into any other word of the same length (as long as the pattern of repeated letters matches). Without a way to check for "close" words, you wouldn't be able to climb the hill. Would "pello" score a 0.7 and "hello" a 1? I can't think of a fast way to get that number.
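For what it's worth, a "close word" number like the 0.7-for-"pello" example can be had cheaply from a similarity ratio; this is just one way to illustrate the idea, not what anyone in the thread actually used:

```python
from difflib import SequenceMatcher

def closeness(candidate, target):
    # Ratio of matched characters to total characters, in [0, 1]:
    # identical strings score 1.0, fully disjoint strings 0.0.
    return SequenceMatcher(None, candidate, target).ratio()
```

Here "pello" vs "hello" shares the four-character block "ello" out of ten total characters, so it comes out at 0.8. The thread ends up taking a different route (trigram statistics) that avoids per-word comparisons entirely.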

I'd like to try my own hand at this, but the way I'm thinking of making the fitness function doesn't seem like it would be smooth enough.

What did you do?


Take a frequency chart of all trigrams in English, and take the dot product of the trigram frequencies in the decrypt with the frequencies in English.

So, let E(t) be the frequency of occurrence of the trigram t in English, d(t) be how often it occurs in the decrypted text, and compute:

  Score = sum( E(t)*d(t) for each distinct trigram t in the decrypt )
If common trigrams turn up frequently in the decrypt, this will be "large". If uncommon trigrams turn up frequently, this will be "small".
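A minimal version of that scorer, with a toy frequency table standing in for a real English trigram corpus (the numbers and names here are illustrative):

```python
from collections import Counter

# Toy table; a real one would hold thousands of trigram frequencies.
E = {"THE": 0.0182, "AND": 0.0073, "ING": 0.0072, "HER": 0.0036}

def trigram_score(decrypt):
    # Count every trigram occurrence in the candidate decrypt...
    counts = Counter(decrypt[i:i + 3] for i in range(len(decrypt) - 2))
    # ...then dot against the English frequencies: common trigrams that
    # appear often push the score up; rare ones contribute ~nothing.
    return sum(E.get(t, 0.0) * n for t, n in counts.items())
```

Because the sum runs only over trigrams that actually appear, the cost per candidate key is linear in the length of the text, which is what makes thousands of hill-climbing steps per second feasible.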


...

that is a really good idea...

Turns out there's a fair amount of data to use as well.

http://www.cse.chalmers.se/edu/year/2010/course/TDA351/ass1/...

I was thinking too rigidly. Thanks!


Norvig has made n-gram data available:

http://norvig.com/ngrams/


Why not use a residual sum of squares here?

https://en.wikipedia.org/wiki/Residual_sum_of_squares


I don't understand your question.

My (admittedly naive) understanding of using RSS is this. I have a model of the data, which in this case will be expected frequencies of my n-grams. Then I compute the actual frequencies of the n-grams, take the difference, square the difference, and add up all the squares. A small number is then indicative of a good fit.

I don't see why this would be better. Not least, this has the problem of all those n-grams that don't turn up in my decrypt. I still have to square their expected frequency and add them into the sum. That contrasts with the method I'm using, where I just take the n-grams that do appear, multiply by the expected frequency, and add that into a running total.
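The sparsity point can be made concrete. With an assumed toy model table, the dot-product score touches only the trigrams that appear in the decrypt, while an RSS has a term for every trigram in the model:

```python
E = {"THE": 0.018, "AND": 0.007, "ING": 0.007}  # toy model frequencies

def dot_score(observed):
    # observed: trigram -> count. Only trigrams that actually occur
    # in the decrypt contribute a term to the sum.
    return sum(E.get(t, 0.0) * n for t, n in observed.items())

def rss(observed, total):
    # Every trigram in the model or the text contributes; a trigram
    # absent from the decrypt still adds E[t]**2 to the sum.
    keys = set(E) | set(observed)
    return sum((E.get(t, 0.0) - observed.get(t, 0) / total) ** 2
               for t in keys)
```

With a real table of tens of thousands of trigrams, `rss` pays for all of them on every candidate key, whereas `dot_score` stays proportional to the text length.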

So I guess I don't understand your suggestion.



