Hey all, I'm one of the main devs of Mahout and saw this article and commentary. I think it's basically right. I'd like to add my own perspective.

I think Mahout has one key problem, and that's its purported scope. The committers' attitude for a long while, which I didn't like myself, was to ingest as many different algorithms as possible, as long as they had anything to do with large-scale machine learning.

The result is an impressive-looking array of algorithms. It creates a certain level of expectation about coverage. If there were no clustering algorithms at all, you wouldn't notice the lack of algorithm X or Y. But there are a few, so people complain that it doesn't support the one they're looking for.

But there's also large variation in quality. Some pieces of the project are quite literally a code dump from someone 2 years ago. Some of it, on the other hand, is quite excellent. But because there's a certain level of interest and hype and usage, finding anything a bit stale or buggy leaves a negative impression.

I do think Mahout is much, much better than nothing, at least. It is really the only game in town for "mainstream" distributed ML. Even if it is only a source of good ideas, and a framework to build on, then it's added a lot of value.

I also think that some corners of the project are quite excellent. The recommender portions are more mature, as they predate Mahout and have more active support. Naive Bayes, by contrast, I don't think has been touched in a while.

And I can tell you that Mahout is certainly really used by real companies to do real work! I doubt it solves everyone's problems, but it sure solves some problems better than they'd have solved them from scratch.

What I strongly agree with here is that you're never likely to find an ML system that works well out-of-the-box. It's always a matter of tuning, customizing for your domain, preparing input properly, etc. If that's true, then something like Mahout is never going to be satisfying, because any one system is going to be suboptimal as-is for any given problem.

And for the specialist, no system, including Mahout, is ever going to look as smart or sophisticated as what you know and have done. There are infinite variations, specializations, optimizations possible for any algorithm.

So I do see a lot of feedback from smart people that, hmm, I don't think this is all that great, and it's valid. For example, I wrote the recommender bits (mostly) and I think the ML implemented there is quite basic. But you see there's somehow a lot of enthusiasm for it, if only because it's managed to roughly bring together, simplify, and make practical the basic ML that people here take for granted. That's good!