I spent a great deal of time writing a recommendation engine for a major web property, so I'll just offer some brief pointers for anyone interested in this stuff:
- If you know nothing about how recommenders might work, a good introductory book is Programming Collective Intelligence.
- To do recommendations at scale, your best bet is to write them in Java and run them on Hadoop using MapReduce. I suppose you can also write them in Python and use Hadoop Streaming.
- NumPy is awesome. It's a great way to prototype your ideas, and if you come from a Matlab background (as I did), it's very similar. I have not yet run anything in production using NumPy though.
- If you want a recommender that works out of the box, check out Apache Mahout (used with Hadoop) or the Weka project.
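To make the Hadoop Streaming pointer above a bit more concrete, here is a minimal sketch of a mapper and reducer that count how often two items show up together, which is a common first step for "people who liked X also liked Y". The input format (`user<TAB>item1,item2,...`), the function names, and the `run_locally` helper are all assumptions for illustration. With real Hadoop Streaming the mapper and reducer would be separate scripts reading stdin and writing stdout, and Hadoop would do the sorting between them.

```python
from itertools import combinations, groupby


def map_pairs(lines):
    """Mapper: each line is 'user<TAB>item1,item2,...' (assumed format).
    Emits 'itemA|itemB<TAB>1' for every pair of items one user touched."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        _user, _, items = line.partition("\t")
        unique_items = sorted({i for i in items.split(",") if i})
        for a, b in combinations(unique_items, 2):
            yield f"{a}|{b}\t1"


def reduce_counts(lines):
    """Reducer: expects lines sorted by key, sums the counts per item pair."""
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate the job on one machine; sorted() stands in for Hadoop's shuffle."""
    return list(reduce_counts(sorted(map_pairs(lines))))
```

Running the same two functions locally on a small sample before submitting a job is a cheap way to catch parsing bugs.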
As far as I am aware, Ruby has no real equivalent to NumPy and SciPy for Python. This is why I swapped to Python from Ruby; the Ruby community gets extremely thin indeed as soon as you stray from web-related things.
Others in these comments have discussed or questioned whether Python/NumPy is the best choice for recommendations, due to scaling/speed. What if there were a project to translate Python libraries to run on LuaJIT? I suspect a lot of the work could be done using syntax-directed automated translation, and there's also Lunatic Python as a fallback, in case a needed Python library hasn't been translated yet.
NumPy is used in rec engines because it has an extremely fast matrix library.
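For anyone wondering what "matrix library" means in practice here, this is a rough sketch of item-based collaborative filtering with cosine similarity, where nearly all of the work is two matrix products. The ratings are made-up toy data and the function names are my own; a real system would need sparse matrices and a better treatment of missing ratings.

```python
import numpy as np

# Toy ratings (made-up data): rows are users, columns are items, 0 = unrated.
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
], dtype=float)


def item_similarity(r):
    """Cosine similarity between every pair of item columns."""
    norms = np.linalg.norm(r, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for items nobody rated
    normalized = r / norms
    return normalized.T @ normalized


def recommend(r, user, top_n=2):
    """Rank a user's unrated items by similarity-weighted ratings."""
    scores = item_similarity(r) @ r[user]
    scores[r[user] > 0] = -np.inf  # skip items the user already rated
    ranked = np.argsort(scores)[::-1]
    return [int(i) for i in ranked[:top_n] if np.isfinite(scores[i])]
```

With this toy data, `recommend(ratings, 0)` puts item 3 ahead of item 2, because item 3 shares raters with the two items user 0 already rated highly.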
If you're building a recommendation engine that isn't matrix-based (or that represents the matrix as something other than a 2D array), then NumPy could be completely useless.
I'm doing research on this exact problem at Carnegie Mellon, and we are using a graph instead. We aren't using basic techniques like kNN, however, so that may have something to do with it. Instead, we have someone who has done heavy research in submodularity, and we're using an approximation algorithm for the submodular function optimization problem.
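To give a flavor of submodular optimization for readers who haven't seen it: the sketch below is the generic textbook greedy algorithm for maximum coverage, which is a monotone submodular objective. It is not the algorithm from the research mentioned above, just an assumption-laden illustration, and the paper/topic data is invented. For monotone submodular functions under a cardinality constraint, this greedy rule is known to reach at least a (1 - 1/e) fraction of the optimum.

```python
def greedy_max_coverage(candidates, k):
    """Pick up to k candidates, each time taking the largest marginal gain.

    candidates: dict mapping a candidate id to the set of elements it covers.
    Returns the chosen ids in pick order and the set of covered elements.
    """
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for name in sorted(candidates):  # sorted so ties break deterministically
            if name in chosen:
                continue
            gain = len(candidates[name] - covered)  # newly covered elements only
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # nothing adds coverage any more
            break
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered


# Hypothetical data: which topics each paper covers.
papers = {
    "p1": {"graphs", "ranking"},
    "p2": {"ranking", "kNN", "matrices"},
    "p3": {"graphs"},
    "p4": {"submodularity", "graphs"},
}
```

The diminishing-returns property is visible in the gains: once `p2` is picked, `p1` is only worth one new topic instead of two.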
I help Khalid and Carlos with their research (I'm an undergrad). Here's Khalid's thesis proposal. The paper recommendation problem is section 2.2 (Beyond Keyword Search).
This is their new layout, which I like very much, because you don't have to watch the video and can just scroll down for each screenshot of code on the left and the transcript on the right. I hope more sites adopt this style.