Unsupervised Learning with Even Less Supervision Using Bayesian Optimization (sigopt.com)
85 points by Zephyr314 on March 11, 2016 | 21 comments



One of the co-founders of SigOpt (YC W15) here. I'm happy to answer any questions about this post or the methods used. More info on the Bayesian methods behind this can be found at sigopt.com/research as well!


Well, just my two cents: the title feels inaccurate. You're tuning hyperparameters with respect to the performance of the classification task, so the Bayesian optimization is really optimizing the unsupervised -> supervised pipeline. I was expecting Bayesian optimization of strictly unsupervised representation learning (e.g. we have an autoencoder and use Bayesian optimization to tune its hyperparameters to minimize reconstruction error). This is really just supervised learning with even less supervision (which is quite typical).
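
Something like this is what I had in mind, i.e. tuning the unsupervised piece purely on its own objective (a rough sketch; scikit-optimize stands in for the Bayesian optimizer here, and build_autoencoder / X_unlabeled are made-up placeholders):

    # Sketch: Bayesian optimization of a purely unsupervised objective.
    # build_autoencoder and X_unlabeled are illustrative placeholders.
    from skopt import gp_minimize

    def reconstruction_loss(params):
        n_hidden, learning_rate = int(params[0]), params[1]
        model = build_autoencoder(n_hidden=n_hidden, lr=learning_rate)
        model.fit(X_unlabeled)                 # no labels involved anywhere
        return model.reconstruction_error(X_unlabeled)

    result = gp_minimize(
        reconstruction_loss,
        dimensions=[(16, 512), (1e-4, 1e-1, "log-uniform")],
        n_calls=50,
    )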


Thanks for the note!

We're using Bayesian optimization to tune the hyperparameters of both the unsupervised model and the supervised model, but you're correct that they're tuned in unison with overall accuracy as the target. The lift you get from adding the unsupervised step (and tuning it) is quite substantial (and statistically significant).

The idea of tuning just the unsupervised part (or doing it independently) is great though. All the code for the post is available at https://github.com/sigopt/sigopt-examples/tree/master/unsupe.... It would be interesting to see if doing that would make for a better overall accuracy.
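
For context, the joint tuning in the post has roughly this shape (illustrative stand-ins only, not the actual repo code; MiniBatchKMeans and LogisticRegression substitute for whatever unsupervised/supervised pair you use, and the data splits are assumed):

    # Unsupervised and supervised hyperparameters tuned together,
    # with held-out accuracy as the single metric the optimizer sees.
    from sklearn.cluster import MiniBatchKMeans
    from sklearn.linear_model import LogisticRegression

    def pipeline_accuracy(n_clusters, C):
        features = MiniBatchKMeans(n_clusters=n_clusters).fit(X_unlabeled)
        X_train_f = features.transform(X_train)   # unsupervised representation
        X_valid_f = features.transform(X_valid)
        clf = LogisticRegression(C=C).fit(X_train_f, y_train)
        return clf.score(X_valid_f, y_valid)      # value reported to the optimizer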


Actually, out of curiosity, would there be some way to use the inverse coloring transform + a lil noise to generate some kind of equivalence class of free training examples, sort of a la skip-gram?


Author here. You can definitely augment your training data with slight transforms of the labelled set you have. Common strategies for images are adding noise, rotations, etc.
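
For an image array that can be as simple as (a quick sketch with numpy/scipy; the noise scale and rotation range are arbitrary):

    # Quick sketch of label-preserving augmentations for a single image array.
    import numpy as np
    from scipy.ndimage import rotate

    def augment(image, rng=np.random):
        noisy = image + rng.normal(0.0, 0.05, image.shape)                   # additive Gaussian noise
        rotated = rotate(image, angle=rng.uniform(-15, 15), reshape=False)   # small random rotation
        return noisy, rotated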


Interesting. I guess the question then becomes what constitutes a "big" transformation that preserves relevant invariants.


No constrained optimization?


SigOpt allows specifying constraint ranges for every parameter. If the parameter space isn't a tensor product, you can report constraint-violating suggestions as failures via our API [1] and SigOpt will take that into account as it optimizes.

SigOpt isn't a constrained optimization package for solving things like k-SAT, though, if that is what you were asking.

[1]: https://sigopt.com/docs/endpoints/observations/create
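
With the Python client that flow looks roughly like this (a sketch; the exact fields are in the endpoint docs above, and violates_constraints / evaluate_model / EXPERIMENT_ID are your own stand-ins):

    # Sketch of reporting a constraint-violating suggestion as a failure.
    from sigopt import Connection

    conn = Connection(client_token="YOUR_TOKEN")
    suggestion = conn.experiments(EXPERIMENT_ID).suggestions().create()

    if violates_constraints(suggestion.assignments):   # your own feasibility check
        conn.experiments(EXPERIMENT_ID).observations().create(
            suggestion=suggestion.id,
            failed=True,
        )
    else:
        conn.experiments(EXPERIMENT_ID).observations().create(
            suggestion=suggestion.id,
            value=evaluate_model(suggestion.assignments),
        )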


In addition to bounds, say I'd like to solve a problem where the variables must satisfy a linear or nonlinear system of equations. So overall there are fewer degrees of freedom than the number of variables, but the problem structure is such that it may be much more efficient to express the objective function and constraints in terms of the higher-dimensional space.


This is where the reporting of "failures" mentioned above can be helpful. Any suggestion that violates your system-of-equations constraints can be immediately reported as a "failure," and this will be taken into account as SigOpt converges to the best parameters.

You could also try to bake this into the objective function (with an L2 penalty for how badly it violates the constraints), depending on how hard the constraints of the problem actually are.
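
That alternative would look something like this (a sketch; g and true_objective are your own functions, and the penalty weight is up to you):

    # Fold the constraint violation into the reported value instead of failing
    # the observation. Subtract the penalty if maximizing, add it if minimizing.
    def penalized_objective(x, penalty_weight=10.0):
        residual = g(x)   # your system of equations, so residual == 0 when feasible
        return true_objective(x) - penalty_weight * sum(r * r for r in residual)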


Equality constraints mean you're working in a lower-dimensional subspace. I'd have to find out more about your algorithms, but I'd be surprised if you wound up evaluating many feasible points at all (especially with nonlinear equalities).

An L2 penalty would likely converge to an infeasible solution. An augmented Lagrangian would be better, but then you're making users handle dual updates. At that point I'd rather use an actual constrained optimization library that implements the algorithm carefully, and use a primal-dual interior point method. Not having this kind of thing built in counts as "no constrained optimization" IMO.


Is this the first OHAAS (Optimize Hyperparameters As A Service)?


We were the first company to launch (Whetlab was bought and shut down before they got out of private beta).

We're currently the only active company offering it as a service.

While hyperparameter optimization is one of the most common use cases of SigOpt right now, the general Bayesian Optimization As A Service we provide has also been used to tune simulations and even manufacturing and process engineering [1].

[1]: https://sigopt.com/cases/process_engineering


There is/was Whetlab, which got bought by Twitter if I remember rightly. It's a shame, as I was using them and wanted to do more with it.


We provide a very similar interface to Whetlab and I would be happy to get you set up on SigOpt. We offer a free trial to get started [1] and a free academic plan [2].

[1]: https://sigopt.com/get_started

[2]: https://sigopt.com/edu



We've found that SigOpt compares very well to spearmint, as well as MOE [1], which I wrote and open sourced around the same time spearmint was open sourced. We have a paper coming out soon comparing SigOpt rigorously to standard methods like random and grid search as well as other open source Bayesian methods like MOE [1], spearmint, HyperOpt [2], and SMAC [3] with good results.

[1]: https://github.com/Yelp/MOE

[2]: https://github.com/hyperopt/hyperopt

[3]: http://www.cs.ubc.ca/labs/beta/Projects/SMAC/


With spearmint I had the ability to modify the parameters of the MCMC sampling (e.g. burn-in iterations). Will SigOpt expose parameters for those of us who want to manipulate them? Will there be options to use different types of function estimators to model the mapping between hyperparameters and performance (i.e. what if I would like to use a neural network or a decision tree instead of Gaussian processes)?

I ask because, as someone who is active in machine learning, I often want to optimize hyperparameters. The kind of people who are serious about optimizing hyperparameters (i.e. people who may not want to use grid or random search) for a model are usually somewhat technical. Your product seems to be catered to those who may not be too technical (very simple interface, etc.). How will you balance what you expose in the future without giving away too much of your underlying algorithms?


As you pointed out, it is all about a balance, and every feature has different tradeoffs.

SigOpt was designed to unlock the power of Bayesian optimization for anyone doing machine learning. We believe you shouldn't need to be an expert, or spend countless hours on administration, to get great results for every model. We wrap an ensemble of the best Bayesian methods behind a simple interface [0] and constantly make improvements, so that people can focus on designing features and applying their domain expertise instead of building and maintaining their own hyperparameter optimization tools to see the benefit.

For experts who want to spend a lot of time and effort customizing, administering, updating, and maintaining a hyperparameter tuning solution I would recommend forking one of the open source packages out there like spearmint [1] or MOE [2] (disclaimer, I wrote MOE while working at Yelp).

[0]: https://sigopt.com/docs

[1]: https://github.com/JasperSnoek/spearmint

[2]: https://github.com/Yelp/MOE


Thanks for all the great responses!


Now just throw some compressive sensing at the problem ;)

