In my artificial neural networks class, we had a guy who attempted to do just this...
in bash...
to generate C code...
using temp files to pass data around.
They didn't get far, but did actually manage to get something to print after a few hours, though it never seemed to get closer to "hello world" after that point. We figured it was just that the GA was a bit overly proud of its accomplishment, and wasn't interested in improvement. This is with random strings inside a main() wrapper with stdio, and nothing else. I forget what their fitness function was, but I'm fairly certain it wasn't too simple, and I think that shot them in the foot near the end.
I still boggle at the attempt; I built a maze solver in Ruby that handled changing mazes with ant-trail-like solving (I forget the term...) that only took me a handful of hours. Then I spent another 10 making the maze as simple as possible to modify, and I made a dozen or so wildly different mazes with moving walls, transporters, etc. during my presentation to demonstrate that it worked. I think that may have won me more points than the functioning code; lots of people's solvers barely worked, and nobody else had anything they could run in reasonable time during a presentation.
That's the one, thanks! And ooh, I hadn't thought of setting it up for a traveling salesman problem... that does seem like a good application. Never knew that was one of the first uses; the moment I grasped the principles, maze-solving seemed a logical use.
> This is with random strings inside a main() wrapper with stdio
I'm impressed it managed to get something which ran at all! Usually with GP you at least start with a set of valid operations and keywords (e.g. genes = ['function', 'printf', 'return', '...etc']) and mix those together randomly.
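A minimal sketch of that idea, assembling candidates from a pool of valid fragments so most random individuals at least parse (the gene pool and function names here are made up for illustration, not from the class project):

```javascript
// Build candidate C programs from a pool of syntactically valid fragments,
// instead of raw random characters. This is the usual GP starting point.
var genes = ['printf("hi");', 'return 0;', 'int x = 1;', 'x = x + 1;'];

function randomProgram(length) {
    var body = [];
    for (var i = 0; i < length; i++) {
        body.push(genes[Math.floor(Math.random() * genes.length)]);
    }
    return '#include <stdio.h>\nint main(void) {\n    ' +
           body.join('\n    ') + '\n}\n';
}
```

Every individual generated this way is at least a compilable shape, so the search spends its time on behaviour rather than on syntax.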
The neat part about it was that you could put in your own problem and have it solve it. I took a minute and slapped together the "Hello World!" version of this problem. You can copy it into the text box and play with the parameters (and the resulting graph output). The output isn't as pretty as the link and it runs slower, but it's still neat.
function salesman() {
    this.target = "hello world!";
    this.points = [ 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '!', ' ' ];
    // lower fitness is better here; 0 is a perfect match
    this.fitness = function(chromosome) {
        var f = 0; // accumulate the total character-code error
        for (var i = 0, c = this.target.length; i < c; i++) {
            f -= Math.abs(this.target.charCodeAt(i) - this.points[chromosome[i]].charCodeAt(0));
        }
        return Math.abs(f);
    };
    // the number of values that should be passed to fitness
    this.numberOfArgs = function() { return this.target.length; };
    // the max value needed for the arguments
    this.maxArg = function() { return this.points.length; };
    // convert a raw chromosome value (up to maxValue)
    // into an index fitness can use.
    this.getArg = function(value, maxValue) {
        return Math.round(value * (this.points.length - 1) / maxValue);
    };
    // Paint the solution onto bestimage
    this.paint = function(values) {
        var canvas = document.getElementById('bestimage');
        if (canvas.getContext) {
            var w = canvas.width;
            var h = canvas.height;
            var canvasContext = canvas.getContext('2d');
            canvasContext.clearRect(0, 0, w, h);
            canvasContext.font = "italic 200 12px/2 Unknown Font, sans-serif";
            canvasContext.strokeStyle = "blue";
            for (var i = 0; i < values.length; ++i) {
                canvasContext.strokeText(this.points[values[i]], i * 10, 10);
            }
        }
    };
}
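To see the fitness idea in action outside the demo page, here's a stripped-down, mutation-only loop; it's a hill-climber stand-in rather than the demo's full population-based GA, with the canvas painting and crossover omitted:

```javascript
// Same per-character fitness as the salesman object above:
// lower is better, 0 means the chromosome spells the target exactly.
var target = "hello world!";
var points = "abcdefghijklmnopqrstuvwxyz! ".split("");

function fitness(chrom) {
    var f = 0;
    for (var i = 0; i < target.length; i++) {
        f += Math.abs(target.charCodeAt(i) - points[chrom[i]].charCodeAt(0));
    }
    return f;
}

function randomChrom() {
    var c = [];
    for (var i = 0; i < target.length; i++) {
        c.push(Math.floor(Math.random() * points.length));
    }
    return c;
}

var best = randomChrom();
for (var gen = 0; gen < 5000; gen++) {
    // mutate one gene; keep the child only if it scores strictly better
    var child = best.slice();
    child[Math.floor(Math.random() * child.length)] =
        Math.floor(Math.random() * points.length);
    if (fitness(child) < fitness(best)) best = child;
}
```

Because the per-character gradient always points toward the target, even this degenerate (1+1) scheme closes in on "hello world!" quickly.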
Nice! Yeah I was thinking about trying to genericise it so you could plug your own fitness function in, but didn't get around to it.
I got an error when playing with the values though; pop 400, mutation 0.2, crossover 0.8: "When executing function:this.people[peopleSize - i] is undefined"
In mine the two options have to be less than 1.0 when combined.
So out of all genomes, 20% (.2) goes to mutation and 80% (.8) goes to crossover, and the rest live. The two numbers have to sum to less than 1.0 for some to remain alive from one round to the next. This gives the option to also specify how many survive from one round to the next, vs. (correct me if this is wrong) a random amount each time in your example.
For what it is worth, your graph looks much nicer :) Back when I first made mine the canvas tag wasn't around and JS was much slower. I really should remove the 'you might get an image!' text before the graph.
> This gives the option to also specify how many survive from one round to the next v.s (correct me if this is wrong) a random amount each time in your example.
in mine the population is constant. A set of two parents always have exactly two children. Breeding occurs until there are exactly as many children as there are parents, then the children mutate, and murder their parents. Genetics is fun!
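A sketch of that replacement scheme (the function names are mine; chromosomes are arrays, one-point crossover is assumed, and `mutate` is left to the caller):

```javascript
// Constant-population generational replacement: each pair of parents
// produces exactly two children via one-point crossover; the mutated
// children then replace ("murder") their parents wholesale.
function breed(a, b) {
    var cut = 1 + Math.floor(Math.random() * (a.length - 1));
    return [a.slice(0, cut).concat(b.slice(cut)),
            b.slice(0, cut).concat(a.slice(cut))];
}

function step(pop, mutate) {
    var children = [];
    for (var i = 0; i + 1 < pop.length; i += 2) {
        var kids = breed(pop[i], pop[i + 1]);
        children.push(mutate(kids[0]), mutate(kids[1]));
    }
    return children; // same size as pop, assuming pop.length is even
}
```

Two parents in, two children out, so the population size never drifts between generations.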
Yeah I'm really happy with the graph, thanks! Canvas API is a bit clunky, but powerful (especially now we're allowed to write text into it!)
Say 10% mutate and 60% crossover; that leaves 30% to live another round and be the parents for the crossover and mutation. After the fitness function is run, the results are sorted by fitness value. The lowest get to live and be part of the 30% that live/mutate/crossover.
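Roughly, that selection step might look like this; the rates match the example above, the helper names are illustrative, and `fitness` follows the demo's lower-is-better convention:

```javascript
// Sort by fitness (lower is better), keep the best 30% untouched, then
// refill the population: 60% by crossover of random survivors, 10% by
// mutating random survivors.
function nextGeneration(pop, fitness, crossover, mutate) {
    var mutateRate = 0.10, crossRate = 0.60;
    var sorted = pop.slice().sort(function (a, b) {
        return fitness(a) - fitness(b);
    });
    var keep = Math.max(1, Math.round(pop.length * (1 - mutateRate - crossRate)));
    var survivors = sorted.slice(0, keep);
    function pick() {
        return survivors[Math.floor(Math.random() * survivors.length)];
    }
    var next = survivors.slice();
    while (next.length < Math.round(pop.length * (1 - mutateRate))) {
        next.push(crossover(pick(), pick()));
    }
    while (next.length < pop.length) {
        next.push(mutate(pick()));
    }
    return next;
}
```

Keeping the survivor fraction explicit is what guarantees some genomes always carry over, which is exactly the property the 0.2 + 0.8 settings above violated.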
The author clearly put a lot of work into both the implementation and the presentation as well as the accompanying blog post (I love the quotes interspersed; reminds me of my college Abstract Algebra text by Gallian). However, I must say I was hoping for a GA that evolved a /program/ to print "Hello, world". Maybe in a follow-up post? :D
[EDIT: s/GE/GA/ ... not sure why I got that wrong...]
You've no idea how hard it is to find good quotes about mutation! (I did try to find a good TMNT quote at first...)
> I was hoping for a GA that evolved a /program/
That would be very cool. I think my next project is going to be a neural network script. But I may revisit genetic algorithms again too, and at that point I'll probably look into genetic programming.
The code's fairly messy, but I hope the blog post and online demo will be enough to give the absolute beginner a good idea of how GAs work.
One day I hope to apply machine learning to the art of predicting real-life events. This is one small step towards that project, and I'd love to hear your feedback! - But it's 1AM now so I'll be offline for the next few hours. Here's hoping I wake up to a million upvotes! :)
Then again, all ML is eventually used for real-life prediction, so you ought to be able to buy a lunch if you really want one. The GA here is a fun start that'll introduce a training algorithm; there are plenty of gradient-aligned steps forward.
Probably true. I'm not a big fan of the term ML due to the hype associated with it. Statistics however, which is pretty near the same thing most of the time, has proven itself quite often.
Right. I'll just give up then. It's a good job no-one wasted time trying to apply machine learning to speech recognition, cuz there's no free lunch you know. Or handwriting recognition. Or housing prices. Or game strategies. Thanks for letting us know that there's no point searching for a solution via ML. Phew!
Aren't genetic algorithms and machine learning distinct? I'm far from an expert, but my understanding is that with ML you're either trying to find a function (supervised) or clusters (unsupervised), whereas GAs are a generic optimisation technique.
Maybe you could generate a bunch of different versions of an ML algo (say, ANN with different numbers of hidden nodes and starting weights), and use GA to evolve the best one. Doesn't seem like the best technique, though.
I tried 3 different types of fitness functions: Per-Character error summation, Levenshtein Distance and Hamming Distance.
I found that Hamming Distance produced the quickest convergences.
The code for my project is "fully OOP" so you can switch implementations for crossover, mutation and selection operators and also for the fitness functions.
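For reference, here's what two of those measures look like (Levenshtein omitted for brevity); both return 0 for a perfect match, and lower is better:

```javascript
// Per-character error summation: sum of absolute character-code differences
// at each position. Partial credit for "close" letters.
function charError(guess, target) {
    var e = 0;
    for (var i = 0; i < target.length; i++) {
        e += Math.abs(guess.charCodeAt(i) - target.charCodeAt(i));
    }
    return e;
}

// Hamming distance: count of positions where the characters differ.
// No partial credit -- a character is either right or wrong.
function hamming(guess, target) {
    var d = 0;
    for (var i = 0; i < target.length; i++) {
        if (guess[i] !== target[i]) d++;
    }
    return d;
}
```

The all-or-nothing scoring of Hamming distance avoids rewarding near-miss characters, which plausibly explains the quicker convergence reported above.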
I don't think there are any local optima here. The gradient with respect to the individual characters always points you at the solution, from all states. Pathologically so, even. But it makes for a good learning problem, since the problem space actually fits in human heads and you can clearly see what the GA is doing. When you fire a GA at a real problem you won't have that luxury.
> I don't think there are any local optima here. The gradient always points you directly at the solution, at all times, in all states.
Well, there are no local optima but plenty of global optima other than the correct answer, I think: "Hello, World!" and "!dlroW ,olleH" have the same fitness for example. So no, the gradient won't necessarily lead you to the solution.
Now, suppose we defined the fitness function instead as the number of correct characters plus the number of characters in the right place, then we'd have local optima and a better chance of getting to the optimum.
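That alternative fitness might look like this (higher is better under this scoring; the function name is mine, and a shrinking pool prevents double-counting repeated letters):

```javascript
// Score = number of target characters present anywhere in the guess
//         + number of characters already in the right slot.
function altFitness(guess, target) {
    var pool = target.split("");
    var present = 0, placed = 0;
    for (var i = 0; i < guess.length; i++) {
        if (guess[i] === target[i]) placed++;
        var j = pool.indexOf(guess[i]);
        if (j !== -1) { present++; pool.splice(j, 1); }
    }
    return present + placed;
}
```

An anagram of the target scores full marks on the "present" half but not the "placed" half, which is where the local optima come from.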
There is only one global optimum with the defined fitness function, and it is the one with f=0. There are different local minima when there is no mutation (for example, if no chromosome in the initial population has the letter 'w', it will be impossible to create the target unless mutation is used).
The other answers that you are thinking of have a different fitness (the comparison is per position: target[i] vs. guess[i]).
In any case, I like these examples because they clearly show the process the GA uses to converge, in a way that's easy to see/understand. Of course, in real life it is most useful when there is no information about the function to minimize (optimize), or the function is not convex over the search domain, which can lead to getting stuck in a local minimum with other search methods.
I have always felt that the fancy name increased interest in the topic.
Update: Looks like the guy's description of his function is different from his actual function. You're right: there is a single global optimum and no local optima.
I guess "local optimum" wasn't the right word. I got stuck in a configuration in which my breed() method couldn't get closer to the goal, because it was just shuffling around characters that existed at the start. (I had started with 1001 randomly generated 13-byte sequences.)
With GAs, you're frequently better off allowing your breeding/mutation to make insane changes. Mimic reality: some changes flat-out kill the offspring immediately. Sure, you need letters... but allow breeding to slice the binary strings representing those letters. Otherwise you rely too heavily on mutation for changes, which is typically very slow.
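One way to sketch that bit-level slicing (assuming equal-length parents; children can end up with byte values neither parent had, including non-letters, which is exactly the point):

```javascript
// One-point crossover at a bit boundary inside the character codes,
// rather than at a character boundary.
function bitCrossover(a, b) {
    function toBits(s) {
        var bits = [];
        for (var i = 0; i < s.length; i++)
            for (var j = 7; j >= 0; j--)
                bits.push((s.charCodeAt(i) >> j) & 1);
        return bits;
    }
    function fromBits(bits) {
        var s = "";
        for (var i = 0; i < bits.length; i += 8) {
            var code = 0;
            for (var j = 0; j < 8; j++) code = (code << 1) | bits[i + j];
            s += String.fromCharCode(code);
        }
        return s;
    }
    var ba = toBits(a), bb = toBits(b);
    var cut = Math.floor(Math.random() * ba.length);
    return fromBits(ba.slice(0, cut).concat(bb.slice(cut)));
}
```

A cut mid-byte splices the high bits of one parent's character onto the low bits of the other's, so genuinely new characters can appear without relying on mutation at all.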