
A couple of years ago there was a paper reported somewhere (it may have been here) that dealt with unsupervised learning using entropy as the only fitness function. Regardless of the task or any other factors, the researchers used maximizing entropy as the sole objective, and this immediately led to complex, interesting, and desirable behavior. Applied to a cart-pole system, it learned to balance the pole upright. Given a ball and a hoop, it learned to put the ball through the hoop. I tried to reach out to the author for a full copy of the paper (I could only find a paywalled abstract online) but never got a response. It seemed like a very interesting approach, and this sounds like basically the same thing: favor moving to whichever state keeps the most likely future states open. Increase entropy, and 'intelligence' emerges.
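To make the "keep future states open" idea concrete, here's a toy sketch of my own (not from the paper, which I never got hold of): an agent on an obstacle-free grid that always moves to the neighboring cell from which the most distinct states remain reachable within a short horizon. The count of reachable states is a crude stand-in for entropy over futures; grid size and horizon are arbitrary choices.

```python
# Toy illustration of "maximize likely future states": on a grid,
# greedily move to the neighbor with the largest set of states
# reachable within a fixed horizon. All names here are mine.

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # includes "stay"

def neighbors(state, size):
    """Valid states reachable in one step on a size x size grid."""
    x, y = state
    for dx, dy in MOVES:
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield (nx, ny)

def reachable(state, size, horizon):
    """All states reachable from `state` within `horizon` steps (BFS)."""
    frontier, seen = {state}, {state}
    for _ in range(horizon):
        frontier = {n for s in frontier for n in neighbors(s, size)} - seen
        seen |= frontier
    return seen

def best_move(state, size, horizon=3):
    """Step to the neighbor that keeps the most future states open."""
    return max(neighbors(state, size),
               key=lambda s: len(reachable(s, size, horizon)))

# Started in a corner, the agent drifts to the center of a 7x7 grid,
# where the fewest futures are cut off by the walls, and stays there.
pos = (0, 0)
for _ in range(10):
    pos = best_move(pos, size=7)
print(pos)  # → (3, 3)
```

No reward, no goal: the emergent "seek the middle of the room" behavior falls out of nothing but counting futures, which is the flavor of result the paper apparently demonstrated on richer tasks.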

