Building a Recommendation Engine with NumPy (software-carpentry.org)
89 points by J3L2404 on Dec 22, 2010 | 12 comments



I spent a great deal of time writing a recommendation engine for a major web property, so I'll just offer some brief pointers for anyone interested in this stuff:

- If you know nothing about how recommenders might work, a good introductory book is Programming Collective Intelligence.

- To do recommendations at scale, your best bet is to write them in Java and run them on Hadoop using MapReduce. You could also write them in Python, I suppose, and use Hadoop Streaming.

- NumPy is awesome. It's a great way to prototype your ideas, and if you come from a MATLAB background (as I did), it's very similar. I have not yet run anything in production using NumPy, though.

- If you want a recommender that works out of the box, check out Apache Mahout (used with Hadoop) or the Weka project.
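
For anyone who hasn't seen the MapReduce shape mentioned above, here is a minimal sketch of counting item co-occurrences, a common first step for "people who liked X also liked Y". It is an assumption-laden toy: the map, shuffle, and reduce phases are simulated in one Python process rather than run on Hadoop, and the function names and the `user_items` data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def mapper(user, items):
    """Emit ((item_a, item_b), 1) for every pair of items one user touched."""
    for a, b in combinations(sorted(set(items)), 2):
        yield (a, b), 1


def reducer(pair, counts):
    """Sum the counts collected for one item pair."""
    return pair, sum(counts)


def run_job(user_items):
    """Simulate map -> shuffle -> reduce in a single process (no Hadoop)."""
    shuffled = defaultdict(list)
    for user, items in user_items.items():
        for key, value in mapper(user, items):
            shuffled[key].append(value)  # the "shuffle": group values by key
    return dict(reducer(key, values) for key, values in shuffled.items())


# Hypothetical toy data: user -> items they interacted with.
user_items = {
    "alice": ["book", "lamp", "mug"],
    "bob": ["book", "mug"],
    "carol": ["lamp", "mug"],
}
cooccurrence = run_job(user_items)
```

On a real cluster the mapper and reducer would be separate steps fed by the framework; the point here is only that each step sees one key at a time, which is what lets the work be split across machines.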


+1. If anyone is interested in MapReduce solutions in Python, be sure to check out Yelp's mrjob (http://www.readwriteweb.com/cloud/2010/10/yelps-mrjob-poweri...)


Is Ruby appropriate for writing recommendation engines? Or is there far more support for writing them in Python?


As far as I am aware, Ruby has no real equivalent to Python's NumPy and SciPy. This is why I switched from Ruby to Python; the Ruby community gets extremely thin as soon as you stray from web-related things.


Have you looked into using Cascalog/Clojure with Hadoop?


Others in these comments have discussed or questioned whether Python/NumPy is the best choice for recommendations, given scaling and speed concerns. What if there were a project to translate Python libraries to run on LuaJIT? I suspect a lot of the work could be done using syntax-directed automated translation, and there's also Lunatic Python as a fallback in case a needed Python library hasn't been translated yet.

Does this seem to be a worthwhile project?


Is NumPy the best way to do recommendation engines in Python? Has anyone done a recommendation library?


NumPy is used in rec engines because it has an extremely fast matrix library.

If you're building a recommendation engine that isn't matrix-based (or one that represents its matrix as something other than a 2D array), then NumPy could be completely useless.
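
To make the matrix case concrete, here is a rough sketch of item-based collaborative filtering with cosine similarity in NumPy. The `ratings` array and the helper names `item_similarity` and `recommend` are made up for illustration, and treating 0 as "unrated" is a simplifying assumption, not something from the linked article.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, 0 means unrated.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
])


def item_similarity(r):
    """Cosine similarity between every pair of item columns."""
    norms = np.linalg.norm(r, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for items nobody rated
    unit = r / norms         # scale each column to unit length
    return unit.T @ unit     # one matrix product gives all pairwise cosines


def recommend(r, user, k=1):
    """Return the indices of the user's top-k unrated items."""
    scores = item_similarity(r) @ r[user]  # similarity-weighted sum of ratings
    scores[r[user] > 0] = -np.inf          # never recommend already-rated items
    return np.argsort(scores)[::-1][:k].tolist()


print(recommend(ratings, user=0))
```

The whole similarity computation is a single matrix product, which is where the fast matrix routines earn their keep; a dense array like this stops being practical once the user-item matrix is large and mostly empty.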

I'm doing research on this exact problem at Carnegie Mellon, and we are using a graph instead. We aren't using basic techniques like kNN, however, so that may have something to do with it. Instead, we have someone who has done heavy research in submodularity, and we're using an approximation algorithm for the submodular function optimization problem.
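
For readers unfamiliar with submodular optimization, the textbook example is the greedy algorithm for maximum coverage: repeatedly pick the item with the largest marginal gain. For monotone submodular objectives under a cardinality constraint, that greedy rule is known to get within a factor of 1 - 1/e of the optimum. The sketch below is only that generic illustration, not the algorithm used in the research described above; the `papers` data and function name are hypothetical.

```python
def greedy_max_coverage(candidates, budget):
    """Pick up to `budget` candidates, each time taking the largest marginal gain.

    `candidates` maps a name to the set of topics it covers. Coverage is a
    monotone submodular function: the gain from adding a candidate can only
    shrink as more topics are already covered.
    """
    chosen, covered = [], set()
    for _ in range(budget):
        best, best_gain = None, 0
        for name, topics in candidates.items():
            if name in chosen:
                continue
            gain = len(topics - covered)  # marginal gain of adding this one
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # nothing left adds new coverage
            break
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered


# Hypothetical toy data: paper -> topics it covers.
papers = {
    "p1": {"ranking", "graphs", "sampling"},
    "p2": {"ranking", "graphs"},
    "p3": {"privacy", "sampling"},
    "p4": {"privacy", "hashing"},
}
chosen, covered = greedy_max_coverage(papers, budget=2)
```

Note how the second pick goes to the paper that adds the most new topics rather than the one most similar to the first pick, which is the diversity effect that makes this framing attractive for recommendations.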


Sounds very interesting. Can you point to a paper in the area?


I help Khalid and Carlos with their research (I'm an undergrad). Here's Khalid's thesis proposal. The paper recommendation problem is section 2.2 (Beyond Keyword Search).

http://www.cs.cmu.edu/~kbe/proposal.pdf


Please indicate when a post is a video.


This is their new layout, which I like very much, because you don't have to watch the video and can just scroll down for each screenshot of code on the left and the transcript on the right. I hope more sites adopt this style.



