Hacker News new | past | comments | ask | show | jobs | submit login

I take issue with their proof/illustration showing that Kolmogorov complexity is incomputable; perhaps someone can help me explain better. Here's the passage in question:

> There’s just one drawback to Kolmogorov complexity: It’s incomputable, meaning that there is no program that can calculate the complexity of every possible string. We know this because if there were such a program, we’d end up with a contradiction.

> To see this, imagine we have a program that can compute Kolmogorov complexity for any string. Let’s call the program K. Now, let’s search for the smallest string of numbers — call it S — whose Kolmogorov complexity is double the length of K. To be concrete, we could imagine that K has 1 million characters, so we’re looking for a string S whose Kolmogorov complexity is 2 million (meaning that the shortest program that outputs S has 2 million characters).

> With program K in our toolbox, calculating S is easy (though not necessarily quick): We can write a new program that we’ll call P. The program P essentially says, “Go through all strings in order, using program K to compute their Kolmogorov complexity, until you find the first one whose Kolmogorov complexity is 2 million.” We’ll need to use program K when building P, so altogether P will have slightly more than 1 million characters. But this program outputs S, and we defined S as a string whose shortest program has 2 million characters. There’s the contradiction.

The way I understand it, the upper bound of the complexity of a given string is essentially len(string), since you can just print out the thing itself. So in the above example, the string S is at least 2 million characters long.

Now, it describes program K as itself having complexity of only 1 million, but it takes numerous strings as input. It seems to me that if an arbitrary input is given to a program in a case where you're measuring the character length of something, the character length of your input should be included. So while program K has complexity 1 million inherently, program K in the state of processing string S has complexity of 3 million- the complexity of S plus the complexity of K. Thus the contradiction vanishes.

What the author is doing is tantamount to trying to write the shortest program to print all of Lorem Ipsum, and their submission looks like this:

  c=`cat <<EOF
  import sys
  for line in sys.stdin:
      print(line)
  EOF`
  cat loremipsum.txt | python -c "$c"
You don't get to claim a 100 character submission without also counting the contents of loremipsum.txt.



In their example, you're not giving S as an input to K — instead, you're using K to write another program P which iterates over all possible strings and returns the first whose complexity is some value ("2 million" in their example).

That program P will certainly be longer than K (as it contains K), but not much longer — it's adding to K only the instructions needed to iterate over strings and define the threshold complexity ("2 million"). P will then produce that "2 million"-complexity output, but it didn't need any input and thus the complexity of its output is truly just the length of P (which is smaller than "2 million"). It eventually stumbles upon S by going through all possible strings, and didn't need S to be provided.

The main idea of the proof is very similar to the interesting-number paradox [0] or the Berry paradox [1].

[0] https://en.wikipedia.org/wiki/Interesting_number_paradox

[1] https://en.wikipedia.org/wiki/Berry_paradox


Got it, this makes sense, and continues to make more sense the more I think about it.

I appreciate you taking the time to break this down for me.


P+K does not include S, but can output any arbitrary S. That is the whole reason for introducing P, you need a way of getting K(S) into your program without "hard coding" it directly and thus including the complexity of S.

So, if we assume K is computable, then both S and P+K can output S, however, for an arbitrarily large S, K(P+K) < K(S). This is the proof by contradiction. Specifically, the value K(S) is supposed to be the the shortest program that can output S, yet P+K, which can also output S, is shorter.


I'm not really following the logic of your argument, but one correction to:

> it describes program K as itself having complexity of only 1 million

K has length 1M, not complexity 1M.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: