Hacker News
MailChimp's Project Omnivore: genetic algorithm predicts email abuse rates (mailchimp.com)
26 points by bentlegen on Jan 27, 2010 | 13 comments



But why genetic algorithms? It seems odd to start with something so complex and so slow instead of starting with something simple (e.g. generalized linear models) and working your way up if it isn't good enough. I suspect there are many classical statistical techniques and modern machine-learning algorithms that would give better results, be faster, and have well-understood principles that can be used to diagnose the model.


Genetic algorithms (and other evolutionary algorithms) aren't necessarily complex or slow. They can get faster and better results than traditional algorithms. And, best of all, you don't have to understand your problem domain to use them.

This last point is very important. You could just throw an EA at a problem without understanding it at all. Just give the EA the task with some variables to optimize and it'll come up with solutions.

You do have to come up with an automated way of judging which solution is best, but that's usually much easier than understanding the problem domain well enough to model it with a traditional technique.
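The workflow the comment describes (hand the EA some variables plus an automated judge, get solutions back) can be sketched in a few lines. This is an illustrative toy, not anything from Omnivore; the `evolve` and `score` names are made up for the example:

```python
import random

def evolve(num_vars, score, generations=200, pop_size=30):
    """Minimal evolutionary loop: the caller supplies only the number of
    variables and a black-box score() function -- no domain model needed."""
    pop = [[random.uniform(-1, 1) for _ in range(num_vars)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)      # higher score = fitter
        parents = pop[:pop_size // 2]          # selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(num_vars)
            child = a[:cut] + b[cut:]          # crossover
            i = random.randrange(num_vars)
            child[i] += random.gauss(0, 0.1)   # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=score)

# The loop never sees *why* a solution is good, only its score.
target = [0.5, -0.25, 0.75]
best = evolve(3, lambda v: -sum((x - t) ** 2 for x, t in zip(v, target)))
```

Note that the scoring function is the only place domain knowledge enters; everything else is generic selection, crossover, and mutation.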

Take a look at Monica Anderson's talks below for a high-level overview of some of the advantages of using EA's and other "model-free methods":

http://videos.syntience.com/


No, actually genetic algorithms are parametric models (http://en.wikipedia.org/wiki/Parametric_model). They cannot get results beyond what is permitted by their model class, and they are relatively poor optimization methods for exploring their parameter space. I cannot think of a single piece of genetic-algorithm work published recently in a top-notch machine learning conference (NIPS, ICML, etc.).


There's a difference between a human coming up with a model of the problem domain and having an algorithm do it for you. GA's and other EA's fall into the latter category.


Best of all you don't need to understand your problem domain?! This sounds like a recipe for disaster to me - how do you tell if your model is giving you nonsense or not?


We had a fairly conservative cross-validation function that tested the results of the genetic optimization for a first pass. Following that, Omnivore was put into "observation" mode, so we could verify that the results it generated were predictive going forward, and not just biased towards the test set. Over the course of the past few months, the model has held up quite successfully.

Since accusing a customer of being a spammer without non-heuristic proof can be an emotion-ridden experience for everyone involved, we were conservative and optimized for accuracy over efficiency at pretty much every step of the process.


I don't doubt the quality of your final model, I just think you might have found it easier to get there with a better method.


It's often easy to tell whether the answers you're getting are good without understanding the problem domain at all (or at least without needing that understanding to create a model).

For example, you could tell that some sounds emitted from a computer speaker sound good or bad without knowing the first thing about music theory. And evolutionary algorithms (EA's) can be used to evolve those sounds to sound better and better, without ever being given an explicit model of what "good" or "bad" music is. The solutions evolve merely on the principles of natural selection, by randomly combining solutions and testing how "good" or "bad" they are. This has been done. Just search Google for: evolving music genetic

The above problem is actually a little difficult for EA's to solve, since you need humans to manually evaluate each solution. That slows the EA down a lot, and it takes much longer for good solutions to evolve.

Fortunately, the solutions to many other problems can be evaluated algorithmically, by the computer itself. As a very general example, a solution that's trying to minimize some function can be evaluated via a mathematical comparison that a computer can handle just fine on its own.
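As a concrete sketch of that "mathematical comparison" idea: a toy (1+1) evolution strategy minimizing a made-up function, where evaluating a candidate is just a numeric `<` the computer performs on its own, with no human in the loop. The function and step size here are arbitrary choices for illustration:

```python
import random

def f(x):
    return (x - 3.0) ** 2 + 2.0        # toy objective: minimum at x = 3

x = random.uniform(-10.0, 10.0)        # random starting point
for _ in range(5000):
    candidate = x + random.gauss(0, 0.5)   # mutate the current solution
    if f(candidate) < f(x):                # automatic fitness comparison
        x = candidate                      # keep the improvement
```

Because the evaluation is fully automatic, the loop can run thousands of generations per second, unlike the human-judged music example above.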

For some more specific examples, see "36 Human-Competitive Results Produced by Genetic Programming": http://www.genetic-programming.com/humancompetitive.html

I'm sure you could find thousands more examples by searching for the results produced by other "model-free methods" like other EA's (genetic algorithms, for example), neural networks, ant and swarm optimization, etc.


We use MailChimp, and I'm impressed that they've managed to make such an awesome app for something totally un-sexy.


They also have a ton of easter eggs in there, which are awesome. For example, do a preview popup on any HTML email campaign and stretch the window beyond 800px wide. Do it slowly and watch both the ruler and the monkey at the top ;)



It's not a monkey, it's a chimp.


It's getting more and more difficult to send legitimate email to a list through your own server these days. MailChimp's free tools have been helpful, but I think we'll have to bite the bullet and go certified soon.



