
>What would you propose as an alternative? If nothing, fine, but how can we relate, when we only have a single best (or if you prefer: flawed) thing?

I tend to favor Karl Friston's "free-energy minimization" theory of the brain. For specifying tasks in engineering situations, I like the KL-control paradigm: the agent's task is to minimize (via the normal mechanisms of active inference) the Kullback-Leibler divergence between its induced distribution over latent variables (ie: causal models of the world) and some "target" distribution.
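
To pin down what I mean by the KL-control objective, here is a throwaway numerical sketch (the latent states, policies, and numbers are all invented for illustration; this is the bare objective, not Friston's actual formulation):

    import numpy as np

    def kl_divergence(p, q, eps=1e-12):
        # KL(p || q) for discrete distributions given as arrays
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

    # Toy world with three latent states. Each policy induces a distribution
    # over those states; the task is a target distribution, and the cost of a
    # policy is KL(induced || target). Lower is better.
    target = np.array([0.7, 0.2, 0.1])
    policies = {
        "stay":    np.array([0.34, 0.33, 0.33]),
        "explore": np.array([0.55, 0.30, 0.15]),
        "exploit": np.array([0.72, 0.18, 0.10]),
    }

    for name, induced in policies.items():
        print(f"{name:8s} KL(induced || target) = {kl_divergence(induced, target):.4f}")

    best = min(policies, key=lambda name: kl_divergence(policies[name], target))
    print("KL-control picks:", best)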

>It can add its internal states to the environment, and hence model these internal states.

No, it can't. I'd have to do a bunch of work to sketch some proofs, but without the ability to consider multiple observable random variables and use hierarchical Bayes on them, AIXI will not be able to detect that certain environmental states are actually equivalent to its own internal states. Hell, since Solomonoff Induction is incomputable, AIXI itself isn't even in its own hypothesis space, so it can never locate a program that generates itself.
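
To gesture at the kind of machinery I mean by "multiple observable random variables plus hierarchical Bayes", here is a toy sketch of my own (nothing to do with AIXI's actual mechanics): two binary observation channels, and a Bayes-factor test of whether the second channel is just the first one seen through a noisy copy, ie: whether the "two" states are really one.

    import numpy as np
    from math import lgamma

    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, size=200)   # "environmental" channel
    y = x.copy()                       # "internal" channel; here it really is the same state

    def log_beta_binomial(k, n):
        # log marginal likelihood of k successes in n trials under a Beta(1,1) prior
        return lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)

    n = len(x)
    # M_equiv: y is x passed through a channel with an unknown flip rate.
    log_m_equiv = log_beta_binomial(int(np.sum(x != y)), n)
    # M_indep: y is an unrelated coin with its own unknown bias.
    log_m_indep = log_beta_binomial(int(np.sum(y)), n)

    # Strongly positive: the evidence says the two variables are one state.
    print("log Bayes factor (copy vs independent):", log_m_equiv - log_m_indep)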

>It's bounded by how many compressors / programs are available to it. Calculating the length of these programs that are consistent with the environment is feasible.

See below, please.

>I'm not sure if you mean imprecise stimuli here or imprecise sensors.

Both. From the perspective of any possible reasoning device, the problem is simply low likelihood precision (that is, high variance/entropy of the likelihood function). If I have so much noise in my sensory stimulus that all I see is 50% "heads" and 50% "tails" in my input bits, without any ordering to those bits, then I do not have the information to infer any complex causal structure (ie: the real world) behind those bits.
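
Concrete toy version of that (my own numbers, not anyone's model): if every candidate causal structure assigns the same 50/50 likelihood to the observed bits, the Bayesian update is a no-op, no matter how many bits come in.

    import numpy as np

    hypotheses = ["fair coin", "hidden periodic source", "complex causal model"]
    prior = np.array([0.5, 0.3, 0.2])

    # With maximally noisy sensors every hypothesis predicts each bit as 50/50,
    # so all log-likelihoods are identical no matter how much data arrives.
    n_bits = 1_000_000
    loglik = np.full(len(hypotheses), n_bits * np.log(0.5))

    unnorm = prior * np.exp(loglik - loglik.max())   # subtract max for stability
    posterior = unnorm / unnorm.sum()

    print("prior:    ", prior)       # [0.5 0.3 0.2]
    print("posterior:", posterior)   # identical: the noise carried zero evidence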

The point being: when the environment is noisy, it forces the mind to favor simpler explanations, even when those explanations are not the Truth, because the space of possible Truths (for example, random seeds added to some causal structure) grows too large and spreads the probability mass too thinly. Since Solomonoff Induction weights each program in its hypothesis space by 2^-(program length), any noise in the environment sufficient to add one bit of random seed to the shortest generating program cuts the probability mass allocated to the correct hypothesis in half.
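
Back-of-the-envelope version of that last sentence, assuming the usual 2^-(program length) prior and a made-up base length:

    # A program of length L bits gets prior weight 2**(-L) in the Solomonoff measure.
    # Each extra bit of random seed the shortest generating program must carry
    # therefore halves the mass assigned to the correct hypothesis.
    base_length = 100  # hypothetical length (in bits) of the noise-free generator

    for seed_bits in range(5):
        weight = 2.0 ** -(base_length + seed_bits)
        print(f"{seed_bits} seed bit(s): prior weight = {weight:.3e}")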

>This can be seen as a flaw, or as a simple property (or even a feature). Something that can be optimal, and "stupid", does not detract much from its ability to be optimal.

AIXI is only optimal relative to a fixed universal Turing machine (ie: the reference machine used to measure program length). The "arbitrarily stupid prior" thing is just to say that for any program, we can construct a universal Turing machine on which that program's shortest encoding is arbitrarily long, thus making that program arbitrarily improbable in the Solomonoff Measure over that Turing machine's programs.
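
Crude numerical version of the same point (invented lengths, not the formal construction): the same program gets a different shortest encoding on different universal machines, and the 2^-(length) weighting turns that into as large a prior penalty as you like.

    def solomonoff_weight(length_bits):
        # prior weight of a program whose shortest encoding is length_bits long
        return 2.0 ** (-length_bits)

    length_on_machine_A = 50    # bits; a machine that encodes this program cheaply
    length_on_machine_B = 550   # bits; a machine rigged to encode it wastefully

    print("weight under machine A:", solomonoff_weight(length_on_machine_A))
    print("weight under machine B:", solomonoff_weight(length_on_machine_B))
    # Pick machine B adversarially and the gap grows without bound, which is all
    # the "arbitrarily stupid prior" complaint amounts to.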

>Just because it is uncomputable, does not mean the theory is flawed. Sure, it is not practical, and we like practical things, but it is still valuable to have such a theory. Especially when approximations do yield practical applications.

Which is why I called them "mere" computational issues.



