Building a Recommendation Engine with NumPy

physcab · on Dec 22, 2010

I spent a great deal of time writing a recommendation engine for a major web property, so I'll just offer some brief pointers for anyone interested in this stuff:

- If you know nothing about how recommenders might work, a good introductory book is Programming Collective Intelligence.

- To do recommendations at scale, your best bet is to write them in Java and run them on Hadoop using MapReduce. I suppose you could also write them in Python and use Hadoop Streaming.

- NumPy is awesome. It's a great way to prototype your ideas, and if you come from a Matlab background (as I did), it's very similar. I have not yet run anything in production using NumPy, though.

- If you want a recommender that works out of the box, check out Apache Mahout (used with Hadoop) or the Weka project.
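A minimal sketch of what the Hadoop Streaming route mentioned above might look like: a mapper and reducer that count how often two items are consumed by the same user, a common first step toward item-to-item recommendations. The tab-separated input format (`user_id<TAB>item1,item2,...`), the `map`/`reduce` command-line arguments, and all function names are hypothetical; `run_locally` only simulates Hadoop's sort-and-shuffle so the logic can be checked without a cluster.

```python
import sys
from itertools import combinations, groupby


def mapper(lines):
    """Emit 'itemA,itemB<TAB>1' for every pair of items one user interacted with.

    Hypothetical input format: 'user_id<TAB>item1,item2,...'.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _user, items = line.split("\t", 1)
        # Dedupe and sort so (a, b) and (b, a) produce the same key.
        for a, b in combinations(sorted(set(items.split(","))), 2):
            yield f"{a},{b}\t1"


def reducer(lines):
    """Sum the counts per key.

    Hadoop Streaming hands the reducer its input sorted by key, so lines
    sharing a key are adjacent and can be grouped with groupby.
    """
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"


def run_locally(lines):
    """Simulate map -> sort/shuffle -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass `python cooc.py map` and `python cooc.py reduce`
    # to Hadoop Streaming as the mapper and reducer commands.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

The resulting pair counts can then be normalized (for example, into cosine or Jaccard scores) to rank "people who liked X also liked Y" candidates.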

samratjp · on Dec 22, 2010

+1. If anyone is interested in MapReduce solutions in Python, be sure to check out Yelp's mrjob (http://www.readwriteweb.com/cloud/2010/10/yelps-mrjob-poweri...)

withoutfriction · on Dec 22, 2010

Is Ruby appropriate for writing recommendation engines? Or is there far more support for writing them in Python?

drats · on Dec 23, 2010

As far as I am aware, Ruby has no real equivalent to NumPy and SciPy for Python. This is why I switched from Ruby to Python; the Ruby community gets extremely thin as soon as you stray from web-related things.

sandGorgon · on Dec 23, 2010

Have you looked into using Cascalog/Clojure with Hadoop?

stcredzero · on Dec 22, 2010

Others in these comments have discussed or questioned whether Python/NumPy is the best choice for recommendations, given scaling and speed concerns. What if there were a project to translate Python libraries to run on LuaJIT? I suspect a lot of the work could be done with syntax-directed automated translation, and there's also Lunatic Python as a fallback in case a needed Python library hasn't been translated yet.

Does this seem to be a worthwhile project?

rmc · on Dec 22, 2010

Is NumPy the best way to do recommendation engines in Python? Has anyone done a recommendation library?

jchonphoenix · on Dec 22, 2010

NumPy is used in rec engines because it has an extremely fast matrix library.

If you're building recommendation engines that don't use matrices (or that represent a matrix without a 2D array), then NumPy could be completely useless.
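To make the "fast matrix library" point concrete, here is a minimal sketch of one classic matrix-based technique, user-based kNN with cosine similarity, written in NumPy. It assumes a small dense user-by-item rating matrix where 0 means "unrated" and `k` is smaller than the number of users; the function name and parameters are hypothetical, and a production system would more likely work with sparse matrices.

```python
import numpy as np


def recommend(ratings, user, k=2, n=3):
    """User-based kNN: rank items `user` hasn't rated using the k most similar users.

    ratings: (n_users, n_items) array; 0 means "not rated" (an assumption).
    Returns up to n unrated item indices, best first.
    """
    ratings = np.asarray(ratings, dtype=float)
    # Cosine similarity of every user to `user`, computed with one matrix-vector product.
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    unit = ratings / np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    sims = unit @ unit[user]
    sims[user] = -np.inf  # never choose the user as their own neighbour
    neighbours = np.argsort(-sims)[:k]
    # Similarity-weighted average of the neighbours' ratings for every item.
    weights = sims[neighbours]
    scores = weights @ ratings[neighbours] / (np.abs(weights).sum() or 1.0)
    # Only recommend items the user hasn't rated yet.
    candidates = np.flatnonzero(ratings[user] == 0)
    ranked = candidates[np.argsort(-scores[candidates], kind="stable")]
    return ranked[:n].tolist()
```

For example, with users who rated items 0 and 1 highly, the items their nearest neighbours also liked float to the top. All of the heavy lifting (norms, similarities, weighted sums) is vectorized, which is where NumPy earns its place in prototypes like this.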

I'm doing research on this exact problem at Carnegie Mellon, and we are using a graph instead. We aren't using basic techniques like kNN, however, so that may have something to do with it. Instead, we have someone who has done extensive research on submodularity, and we're using an approximation algorithm for the submodular function optimization problem.
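For readers unfamiliar with submodularity: a set function is submodular if it has diminishing returns, meaning adding an item to a smaller set helps at least as much as adding it to a larger one. The textbook approximation for maximizing a monotone submodular function when you may pick at most k items is the greedy algorithm, which is guaranteed to reach at least (1 - 1/e), roughly 63%, of the optimum. The sketch below applies it to a toy "recommend k papers that together cover the most weighted topics" objective. The objective, data layout, and function names are illustrative assumptions, not the method used in the CMU work described above.

```python
def weighted_coverage(selected, papers, topic_weights):
    """f(S): total weight of topics covered by at least one selected paper.

    Weighted coverage functions like this are monotone and submodular.
    papers: dict mapping paper id -> set of topics (hypothetical layout).
    """
    covered = set().union(*(papers[p] for p in selected))
    return sum(topic_weights.get(t, 0.0) for t in covered)


def greedy_select(papers, topic_weights, k):
    """Pick k papers, each time adding the one with the largest marginal gain.

    For a monotone submodular objective under a cardinality constraint,
    this greedy rule achieves at least (1 - 1/e) of the optimal value.
    """
    selected = []
    current = 0.0
    for _ in range(min(k, len(papers))):
        best, best_gain = None, 0.0
        for p in papers:
            if p in selected:
                continue
            gain = weighted_coverage(selected + [p], papers, topic_weights) - current
            if best is None or gain > best_gain:
                best, best_gain = p, gain
        selected.append(best)
        current += best_gain
    return selected
```

Note how diminishing returns shows up: once a paper covering "ml" is chosen, another "ml"-only paper adds nothing, so the greedy step naturally favors diverse recommendations.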

calanya · on Dec 22, 2010

Sounds very interesting. Can you point to a paper in the area?

jchonphoenix · on Dec 22, 2010

I help Khalid and Carlos with their research (I'm an undergrad). Here's Khalid's thesis proposal. The paper recommendation problem is section 2.2 (Beyond Keyword Search).

http://www.cs.cmu.edu/~kbe/proposal.pdf

danbmil99 · on Dec 22, 2010

Please indicate that this is a video post.

J3L2404 · on Dec 22, 2010

This is their new layout, which I like very much, because you don't have to watch the video: you can just scroll down, with a screenshot of the code on the left and the transcript on the right. I hope more sites adopt this style.