It's been a decade since I have written a BF program, but if I'm not mistaken, t...

robert_tweed · on Jan 24, 2015

This appears to be GA rather than GP, but I did some experiments with GP a while ago and the most counterintuitive thing I discovered was how essential junk code turns out to be.

First, for anyone unaware of the difference, in Genetic Algorithms (GA) you have fixed-length strings of instructions. Genetic Programming (GP) is a superset in which you have ASTs, which can get arbitrarily large (which can be a problem if you're not careful). I find GP much more interesting than GA, but the practical issues with AST bloat are why you see a lot more GA examples.

Anyway, not wanting to run into overly large ASTs myself, I went ahead and prematurely optimised my GP implementation by having it always collapse functions into simpler forms where possible, e.g. (+ (+ 1 2) 3) becomes (+ 6).

It turns out that this causes huge problems, because all that "junk" code is actually useful during evolution. Each generation inherits parts of its parents (called "crossover"), but if you cut down the information available to inherit as-per the above optimisation, you get much less variety between the generations and become more reliant on random mutations. As a result, it's like falling back on bogosort to find your solution.

Once you get down into final generations you can start running optimisations like this and removing dead code, but if you do it too soon, you won't reach a solution convergence quickly, if at all.

Interestingly, this principle seems to apply to DNA too, but DNA doesn't come with an optimiser. Hence most DNA is "junk" but the junk is actually more important than it seems.

It also turns out that genetically optimising (as opposed to hand-coding the optimisation rules, like I did) is non-trivial too. You can either adjust your main cost function to account for code size/speed alongside the problem you're trying to solve, or you can do a second pass that aims to optimise the code after you've already found an acceptable solution. Either way, there's a certain amount of black magic involved to decide how many iterations to run, or what constitutes a "good enough" or "fast enough" or "small enough" solution so you can decide when to stop.

Code size is also irrelevant in GA because the code is always the same size, but different instructions might have different costs, with NOPs having the lowest cost. If you allow goto or recursion in your VM then you have to impose a runtime constraint to avoid infinite loops, and can also assign a cost per instruction executed or per microsecond of runtime.

Someone · on Jan 24, 2015

"Once you get down into final generations you can start running optimisations [...] but DNA doesn't come with an optimiser."

It better not, as DNA isn't planning to "get down into final generations". The Borg would be long dead if it ever stopped adapting.

LnxPrgr3 · on Jan 24, 2015

Stanford researchers did something similar for superoptimization: http://cs.stanford.edu/people/eschkufz/research/asplos291-sc...

They separate the problem into two phases: the first is only worried about correctness, then the second deals with optimization while (eventually) preserving correctness.

ctdonath · on Jan 24, 2015

I hope you've written this up somewhere for publication. You've solved the "junk DNA" issue with computer science.

Best post on HN ever.

robert_tweed · on Jan 24, 2015

It's already documented. I just didn't pay much attention and was thus able to verify what others have observed. I highly recommend the book "A Field Guide to Genetic Programming", which covers this and other associated topics. It's one of the best CS books I've read in the last few years.

r00nk · on Jan 24, 2015

Yeah, the next thing planned for it is user-settable hunger. Then, making more and more efficient code is simply a matter of following the unix philosophy

corvus -h 500 | corvus -h 300| corvus -h 200

So on and so forth as you get closer to [,.]

The dead code part was a organic artifact, and I actually don't think that I want to change that part. It sorta makes you wonder about how much dead code is in real DNA.

yzzxy · on Jan 24, 2015

A vast fraction of DNA is thought to be junk code, actually. Some of it is even believed to be remnants of viruses that inserted themselves into human DNA.

yuchi · on Jan 24, 2015

With genetic code you’ll also get junk code, exactly as it happens for DNA :)

yuchi · on Jan 24, 2015

(I mean, pieces of junk code)