Machine Learning as a Service

jgalt212 · on May 31, 2013

I dunno. Machine Learning as a Service seems like a tough thing to monetize, as most machine learning in practice involves a lot of tweaking which would then imply that practitioners would like to go further down the stack to work directly with an R, Python, ?? module, look at its code, see where it's failing, working etc.

I do like Machine Learning as a Service as a loss leader. e.g. customer walks up to the door, can't really get the problem cracked with an out of the box solution, but instead you sell/him her on an expensive long-term consulting project. i.e. the IBM Model.

Does anyone know how one of the pioneers in the segment, Numenta, is fairing? They've been around for a while and seem to have recently changed their name to Grok Solutions.

Deep Learning as a service seems like something that could work as their are less knobs for the user to fiddle with. That being said, it does not seem like Deep Learning is quite there yet.

hkmurakami · on May 31, 2013

I know of at least one Berkeley ML PhD. that was working on a startup that could have easily used the slogan, "Machine Learning as a Service".

I have to wonder how founders/founders-in-the-making react when faculty members from their alma mater, from their own department no less, enter their space. Must be a little bit like having Google enter your niche.

_pmf_ · on May 31, 2013

> I know of at least one Berkeley ML PhD. that was working on a startup that could have easily used the slogan

Out of interest: what is/was the companies name?

hkmurakami · on June 1, 2013

he was still building the product demo when I last talked to him so unfortunately I don't know the answer

dbecker · on May 31, 2013

Must be a little bit like having Google enter your niche.

Interesting analogy... given that Google is already in this niche (with their prediction API)

ozten · on May 31, 2013

To keep ML fast and cheap, you want to keep compute as close to data as possible.

I don't think a web service would be broadly applicable. Perhaps in certain domains it would make sense, but bandwidth costs and duration would be a huge factor in most solutions.

jaytaylor · on May 31, 2013

I completely disagree. I've used siftscience (http://siftscience.com) and the service is remarkably good. I admit there are some cases where it would be impractical. However, in my experience ML-as-a-service has been pragmatic and I've seen it lead to very favorable outcomes with minimal effort from the client.

raylu · on May 31, 2013

So... instead of disagreeing out of hand, how did you overcome the fact that you had to pay for bandwidth between your terabytes of data and the computation centers?

jaytaylor · on May 31, 2013

Terabytes are cheap compared to my time. I've also never required that much training data, so perhaps your use case does not lend itself to these services. I've built automated ML model training systems myself with TreeNet and Mahout, as well as used API-based ML systems, and I'm comfortable saying there is a strong case to be made for services which save much unnecessary effort.

raylu · on May 31, 2013

Storing terabytes is cheap. Terabytes of bandwidth is very very expensive.

neeleshs · on May 31, 2013

Some old links on the cost of terabytes of bandwidth http://josephscott.org/archives/2009/01/how-much-does-one-te...

raylu · on May 31, 2013

And that assumes you're willing to host with those providers, many of whom don't provide computing power. Also, the machine learning service you are using has to pay again for that bandwidth too and that cost gets passed along to you.

ivanist · on May 31, 2013

Transferring 1TB at a sustained rate of 1MB/s will take about 11.5 days.

lightcatcher · on May 31, 2013

One easy way to get around bandwidth costs is to just have the clients of your service communicate directly (via Javascript or other client-side code) with your "machine learning as a service" (MLaaS? :D) vendor. Of course this model only works for certain businesses and allows learning on certain things, but you can go a pretty long way with this.

fragmede · on May 31, 2013

Yes - but doesn't that just beg for putting it on S3 and spinning up EC2 instances on demand? Amazon already handles mailing hard drives to them and has the download bandwidth if you've got the upload speed.

The other option is doing it all in-house, the storage, the servers, oh yeah, and that whole ML thing. Not a bad thing, but not necessarily your core-competency.

jpalomaki · on May 31, 2013

If the data sets are already in the cloud, for example at AWS, you could skip the transfer costs if the machine learning SaaS operates on the same cloud.

In case dealing with terabytes of data it would probably make sense for the MLaaS operator to run points-of-presence in major clouds.

joeyrichar · on May 31, 2013

Ozten, we completely agree which is why we also provide an on premise version of our Machine Intelligence Engine. Our mission is to democratize ML and allow companies to easily deploy it in production.

Thanks for the interest and feedback!

Joey Richards, Chief Scientist, wise.io

mturmon · on May 31, 2013

That seems to be the approach of skytree, another entrant in this space.

curiousRevision · on May 31, 2013

What IP are they patenting? It sounds like they just implemented random forests in C++ in a way that avoided data copy, which is obviously what you'd want to do for performant non-parametric machine learning... So what's patentable here?

Groxx · on May 31, 2013

How does this compare to, say, the Google Prediction Service[1] ? The example use-cases sound similar-ish to my layman interpretation. I've never used a ML service though, so honestly I have no idea what to expect or look for.

Obviously this offers an on-site option that Google does not, which might open up other realms of options. I'm mostly curious in how / how well / what range of problems they're capable of.

[1] https://developers.google.com/prediction/

hayksaakian · on May 31, 2013

The OP's service has a better chance of staying around when it comes time for spring cleaning.

Groxx · on May 31, 2013

True :) Though IIRC Google's prediction API has been paid from day 1, so it might stick its neck out a bit less than e.g. Reader.

piokuc · on May 31, 2013

It would be interesting to know how popular Google's prediction API actually is. They don't show off any testimonials or customers' success stories. The traffic on their forum seems rather moderate: https://developers.google.com/prediction/docs/general_discus... Regardless, I do believe MLaaS makes sense, not necessarily as a fully automated black box thingy, it could and probably should be backed by a consulting service - this is, I think, where startups in this space will have an edge over Google.

dbecker · on May 31, 2013

I haven't used OP's service, but the results from Google's prediction service are surprisingly bad (in both my tests, and in the experience of others I have talked to.)

noelwelsh · on May 31, 2013

Couple of points:

- Benchmarking against Weka, R, and Python is not exactly pitting your product against stiff competition. Skytree (skytree.net) is another company in the same space with the same focus. Benchmarking against them would be interesting.

- I thought it was amusing that in a company of < 20 people the five founders thought it necessary to adopt such grandiose titles: CEO (fine), CTO, Director of Engineering, Chief Scientist, Director of Data Science (exactly how are the hairs split between these four?)

joeyrichar · on May 31, 2013

Thanks for the comments.

(1) We chose to do our original benchmarks against R, Weka and sklearn because these are the tools that the vast majority of people currently use. You'd be amazed how many companies use Weka! That said, we do benchmark favorably against the other competition. We will be publishing a series of blog posts with these benchmarks. Stay tuned!

(2) Our titles in fact do mark a clear delineation between our respective roles and responsibilities, and this is well understood within the company. Perhaps the titles are a bit grandiose, but we have a very big vision for this company.

piokuc · on May 31, 2013

I think the second point was completely unnecessary. I didn't find it amusing or funny, just sarcastic. Yes, sarcastic, but not funny. I guess you consider wise.io your competition, don't you?

hermaj · on May 31, 2013

Speaking as someone with some years experience in ML to me there is a world of difference between the front page for wise.io and mynaweb.com (presuming these are the competing sites).

The Myna page explicitly states it uses multi armed bandit algorithms, the algorithm fits the properties of the following claims. Wise's use of 'patent-pending', 'machine learning technology' and 'deploy machine intelligence' gives the impression of hiding shortcomings with jargon.

noelwelsh · on May 31, 2013

I don't think wise.io are competition for Myna. The tech does fairly different things, and I think we're targeting completely different markets.

As for whether my comment was misplaced or not -- I guess that's up to the community to decide if they care to do such a thing. I do admit it was a bit snarky, but I also genuinely do find the titles a bit amusing.

Tying in to another post on the front page right now, I do think that generalists are advantageous in early stage companies, and titles tend to be more appropriate with specialisation.

visarga · on May 31, 2013

Is this algorithm (a Random Forest variant named WiseRF) difficult to tune or easy? Does the user have to guess parameters, learning rates and such?

I've found that ML algorithms can be like race cars - you really have to know their quirks to get performance out of them. The opposite would be analogous to a luxury car - almost everything is taken care of by the computer - you don't shift gears and don't open the lid.

So, is this WiseRF a race car or a luxury car?

textminer · on May 31, 2013

Random Forests are nice in having few parameters-- number of trees, number of restricted features to sample on at each decision node.

It's the variations on the standard RF that have interested me. Rotation Forests, where features are partitioned randomly and rotated (by PCA or random projection) before decision boundaries are drawn. Extremely Randomized Forests, where the node splits are completely random and not based on best possible Gini/Entropy gain along some feature. There's even an interesting use of deterministic annealing out there for incorporating unlabeled data points in an attempt at semisupervised learning.

These each have their own parameters to tune, but have had slightly different performance on different problem domains. Even models require different data-- most decision trees have a quite natural way of imputing missing values, but something like a Rotation Forest can handle neither missing values nor categorical data (unless you map it to m binary features). And that complexity spooks me away from Machine Learning as a Service, where one could start failing to understand his or her models. (Plus then I'd probably be out of a job.)

vladoh · on May 31, 2013

I actually think that random forests are relatively difficult to tune in comparison to some Boosting methods for example.

The typical parameters are the following: - Number of trees - Percentage of data used to train each tree - Maximal depth of the tree - Minimum information gain (although this can usually be set to 0 and use only the depth) - Minimum sample size (same as the minimal information gain) - Number of thresholds to try for continuous data

Tuning is also difficult because of the non deterministic nature of the algorithm. If you compare to sets of parameters it requires more evaluation in order to be sure if one set is better because of the better choice of the parameters or because the algorithm chose the right data samples. This effect decreases with the number of trees, but the training time is increased in this case.

I think random forests are really good for very large datasets, but in my experience for smaller datasets boosting (for example JointBoost or GentleBoost) can give better results.

taliesinb · on May 31, 2013

Speaking of bagging: Geoff Hinton talks a bit about model averaging and has a very stimulating pair of hypotheses related to this, which are:

a) genetic recombination in the form of sex might be "bagging across genes" that prevents tight co-adaptation (= overfitting on the evolutionary timescale)

b) the brain uses noisy discrete firing instead of continuous communication because it allows for "bagging across network topologies".

The bulk of his talk is about how dropout in neural nets can be used to accomplish some of the same performance benefits that model averaging gives on other algorithms (at a much smaller relative computational cost).

Here's the talk: https://www.youtube.com/watch?v=DleXA5ADG78

joeyrichar · on May 31, 2013

As textminer says, RFs are really nice in that there are few parameters to tune, and the results typically are not that sensitive to the choice of those parameters (contrasted to, say, SVMs, where you can get killed in performance with a poor choice of tuning parameters).

With the MLaaS platform, all of the model optimization is taken care of under the hood (we also allow users to do their own parameter selection / tuning if desired). Our super fast implementation, WiseRF (10-100x faster than RF in sklearn or R) enables us to efficiently explore the hyperparameter space.

Thanks again for your questions and comments.

Joey Richards, Chief Scientist, wise.io

mbq · on May 31, 2013

Two more questions then; is this speed-up achieved on a single core or also due to parallelism? And have you compared WiseRF with Random Jungle (http://bioinformatics.oxfordjournals.org/content/26/14/1752....)?

joeyrichar · on May 31, 2013

(1) The speed-up is achieved on a single core (multithreaded). The tiny memory footprint enables us to do embedded learning (e.g., on an ARM chip). We also have a distributed version of WiseRF in development (stay tuned!).

(2) Soon, we'll be publishing a series of blog posts to benchmark WiseRF against competing implementations. Look for that next week.

Thanks for your interest!

gracenotes · on May 31, 2013

So, at a high level, some future version of this software will show intuitive visualizations of learned models of the data? Or currently, at a lower level, it seems to be implementations of standard machine algorithms with a Python API to use them.

There seems to be an emphasis on efficiency, although I don't think that most freely available machine learning libraries are fundamentally poorly implemented. One problem with these libraries is that documentation can sometimes be scarce. Another problem, for the 0.01% of companies which actually have "big data", is that they might not scale, whatever that means.

Regardless of the library used, one of the bigger problems may be that machine learning, if it's worth it at all, is inherently fickle and tricky. To make an overly broad conjecture: if an externally provided machine learning solution works well, either your data didn't require that much domain knowledge to understand (it was "obvious") or some external/outsourced firm has a deeper understanding of your data than you do. More of the former type of analysis might not necessarily be a bad thing, though.

hobbes78 · on May 31, 2013

One tiny bit of my MSc thesis was based on Google Prediction API and it worked fine, apart the non-conventional URLs used that caused problems with .NET networking.

Even in a project where ML is more important, I think it's usable...

Rickasaurus · on May 31, 2013

Interested in hearing some experience reports. Those stats look too good to be true, and they don't mention which algorithms.

damianeads · on May 31, 2013

Hi,

Thanks for your interest. You can try it yourself by downloading a trial edition of the software. The page http://about.wise.io/wiserf/ mentions the algorithm we're using.

As for the stats, you can reproduce the numbers yourself if you're interested. The ML codes are written in C++ with a very strong emphasis on performance and a low memory footprint.

Give it a try and let us know what you think.

Damian Eads Director of Engineering, wise.io

theschreon · on May 31, 2013

Will this software focus solely on Random Forests? Hoped to see e.g. Deep Convolutional Neural Networks as an option :(