Understanding Machine Learning: From Theory to Algorithms (2014)

arbitrage314 · on Nov 18, 2015

I'm a math geek, but I'm also a mostly self-taught data scientist.

"The Elements of Statistical Learning" (https://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLI...) is far and away the best book I've seen.

It took me hundreds of hours to get through it, but if you're looking to understand things at a pretty deep level, I'd say it's well-worth it.

Even if you stop at chapter 3, you'll still know more than most people, and you'll have a great foundation.

Hope this helps!

reader5000 · on Nov 19, 2015

Having read significant chunks of both ESL and Understanding Machine Learning (albeit UML much more recently) I would argue that for many readers UML is superior.

ESL pays short shrift to the computational complexity of learning whereas UML explicitly handles both statistical and computational complexity concerns. It doesnt matter how statistically pure your algorithm is if its running time scales exponentially with your data.

All of UML's chapters are conceptually unified even when discussing different ML algorithms, with ESL being more of a grab-bag by chapter.

Still, both high quality and free!

arbitrage314 · on Nov 19, 2015

Thanks for the interesting comparison!

gajjanag · on Nov 19, 2015

I am a graduate student at MIT, and can second this recommendation. It is a fantastic book for machine learning and nothing else I have seen comes close.

nafizh · on Nov 19, 2015

You meant ESL or UML?

lakeeffect · on Nov 19, 2015

ESL. His post was an hour before the reader5000 uml post.

Thriptic · on Nov 19, 2015

Would you say elements is superior to Bishop's book?

pmelendez · on Nov 18, 2015

Just a curiosity: One of the authors also proposed Pegasos SVM [1] which is a nice online approximation to SVM and that can be written in 15 lines of code or so.

http://ttic.uchicago.edu/~nati/Publications/PegasosMPB.pdf

cdnsteve · on Nov 18, 2015

I feel like the barrier to machine learning for me, as I've seen in many tutorials and books and is an immediate discouragement, is the massive amount of math thrown in your face. Many of us didn't just graduate, need glasses and fall asleep at 8pm on the couch when the kids are in bed... Math is this distant fragment of memory buried under years of everything not Math.

It feels like machine learning is only taught by academia but the majority of the audience is for practical use by the average developer wanting to play with it today.

dxbydt · on Nov 18, 2015

I interview a lot of developers for ML positions at our company. The first red flag is always a lack of math. Candidates who come in with API-level competence ie. can implement an ML algo using this,that or the other API, without any understanding of some basic math behind it, always fare poorly. Atleast in ML, not having an understanding of math is pretty much like claiming expertise in riding a bicycle by watching 100s of bicycle videos on youtube without ever riding a bicycle yourself. This isn't the case with say, front-end web dev, where nobody really cares whether what's under the hood is php or jquery or elegantly handcrafted elm as long as the webpage looks good and functions as advertised. You can get a lot of mileage by hiring someone who can "make webpages" without knowing what tools they use to make that webpage. Once you get traction etc. you'll probably rewrite all of the cruft :) With ML, if you hire somebody who can "create decision tree in python mllib" without knowing the first thing about what entropy is & how to compute entropy ( a real candidate, unfortunately ), you are simply inflicting a lot of pain on yourself and your customers. Suppose such a person is deciding whether or not to give you a loan, and he decides to construct a decision tree. He'll happily throw in your zip code and credit card number into the mix, not realizing that those two features have super high entropy but the tree will have serious overfitting issues ie. the tree will simply not generalize to unseen data. He won't realize these things because he won't know what entropy is in the first place, since he only thinks of decision trees in terms of some black box that comes out of some ML api.

Hockenbrizzle · on Nov 18, 2015

A lack of mathematical intuition is a serious problem for many people from engineering to biology to economics. It certainly plagued me throughout my engineering bachelors studies and is something I continually work to get better at.

In my opinion, physics students learn the best framework for thinking and get a very good mathematical intuition. For example, here's a problem from an introductory QM book that really threw me for a loop when I was studying:

A needle of length L is dropped at random onto a sheet of paper ruled with parallel lines a distance L apart. What is the probability that the needle will cross a line?

openfuture · on Nov 19, 2015

Since the lines are parallel you can rephrase the problem:

A circle of radius L is centered x far away from a border 0<x<L. This is because one end of the needle will always end in some zone (the center) and the other end will be L far away (the circle).

How much of the 2pi boundary is outside the zone?

When x -> 0 then it's going to be 50% since one boundary line becomes a tangent and the other goes through the middle. When we move x by k (e.g. f(x+k)) then 2k new points will be added on the left side while 2*k points leave the boundary on the right side. When x=L/2 then the boundary lines will split the circle in four equal parts (since they're tangent to the radius at r/2 on both sides) so intuitively its 50%.

ihartley · on Nov 18, 2015

https://en.wikipedia.org/wiki/Buffon%27s_needle for anyone that's interested.

peruvian · on Nov 19, 2015

> A lack of mathematical intuition is a serious problem for many people from engineering to biology to economics.

This is true. I can't really say why, but after my Discrete Mathematics class a lot of my Computer Science problems became a lot easier to reason about.

onaclov2000 · on Nov 19, 2015

I will say that after my discrete math class we really only talked about how to write a proof, and well, it really didn't help for me. (I think we were supposed to get further into stuff, but well, the class wasn't paced well, new professor, etc).

graycat · on Nov 19, 2015

For such a problem, usually "at random" means a uniform distribution. But on the plane, there is no uniform distribution. So, the "paper" can't be the plane. So, it might be fair to ask the size of the paper and what happens with the needle near the edges? E.g., on a rectangular sheet of paper of finite size, the needle can land in a position so that it does not cross a line but would on a larger sheet of paper.

maged · on Nov 18, 2015

> A needle of length L is dropped at random onto a sheet of paper ruled with parallel lines a distance L apart. What is the probability that the needle will cross a line?

Thickness of line is needed right? Otherwise P approaches 100% as thickness approaches 0?

davmre · on Nov 18, 2015

The lines are infinitely thin. Equivalently you can imagine the paper is divided into regions of width L, and the question is whether the needle will cross a region boundary (https://en.wikipedia.org/wiki/Buffon's_needle).

nonbel · on Nov 19, 2015

I don't think that page explains it very well, but have poor math background. I imagined notebook paper with horizontal lines spaced L apart and then the needle dropping at any angle. When the needle is vertical the probability it cross a line is 1, when horizontal it is zero. The length of the needle L is the hypotenuse of a triangle. If we call the angle from horizontal x, the "height" of the needle can be anywhere within h=Lsin(x) for x between 0 and pi/2.

The "lines" are like a sample of a point from a uniform distribution U with width L, and h is an interval inside U. The probability a number sampled from a distribution of width L will fall within interval h is h/L. Substituting for h gives p(cross|x) = sin(x).

Then assuming the needle is equally likely to drop at any angle, for any one angle theta we get probability density p(theta=x) = 1/(pi/2-0)= 2/pi.

The probability the needle drops at angle x AND crosses a line is the product of p(theta=x)p(cross|x)= (2/pi)sin(x). As mentioned, x can range between 0 and pi/2. To get the probability the needle drops at angle x1 OR x2 OR x3, etc and cross we need to sum all these. So take the integral of (2/pi)sin(x) from 0:pi/2. This gives 2/pi.

theworstshill · on Nov 18, 2015

andreasvc · on Nov 19, 2015

Not to detract from your point that math is important, but in that example proper methodology (e.g., cross-validation), proper feature engineering, and especially domain knowledge are probably even more important. You can be aware of the strengths & weaknesses of different machine learning algorithms without being intimately familiar with the math. Ideally, ML methods are not treated as a "black boxes", but some aspects are inherently black box, even if you do know the math (e.g., parameter tuning).

mailshanx · on Nov 19, 2015

>> He'll happily throw in your zip code and credit card number into the mix, not realizing that those two features have super high entropy

You will be surprised: you can make significant gains by including the zip code - i've seen that happen in a competitive setting. Where you live probably contains some signal about your credit worthiness.

Having said that, of course it doesn't make sense to simply feed the raw zip code to the tree. An appropriate encoding (most people would use a one-hot encoding, though there exist better ones) of the zip code will be key to extracting signal in a robust way.

>> ... the tree will have serious overfitting issues

Isn't it an almost standard practise now to use an ensemble of trees, such as a random forest? Decision trees have long been known to be prone to overfitting.

NegatioN · on Nov 18, 2015

I get that math is an important aspect, but being able to realise that throwing in those variables would create entropy sounds like more of a common-sense thing, no?

I guess I should still brush up on math though, it seems.

jzymbaluk · on Nov 19, 2015

As an undergrad interested in machine learning, what areas of math specifically should I be focusing on?

dxbydt · on Nov 19, 2015

1.math prereqs at UW:http://courses.washington.edu/css490/2012.Winter/lecture_sli...

2. math prereqs at UCSC:http://people.ucsc.edu/~praman1/static/pub/math-for-ml.pdf

3. math prereqs at UMD: https://www.umiacs.umd.edu/~hal/courses/2013S_ML/math4ml.pdf

gtani · on Nov 19, 2015

here's a list of open content (free pdf) texts that have appendixes or review chapters

- UML

- http://goodfeli.github.io/dlbook/

- 3 books: Barber, MacKay, and Rasmussen/Williams from this list: https://www.reddit.com/r/MachineLearning/comments/1jeawf/mac...

visarga · on Nov 18, 2015

> the massive amount of math thrown in your face

I can skip the math part - we don't all invent new algorithms - but what I really need is a large enough & gradual set of problems to solve (datasets + verification scripts). I mean, start from the simplest and teach people how to use the already available software. Machine Learning should be assimilated practically, too much theory with too little application is useless. Most of us should focus on using existing software efficiently instead of being able to implement backprop.

spdustin · on Nov 18, 2015

Kaggle's "Titanic: Machine Learning from Disaster" [0] is a great place to start. If you're anxious to dive in, go straight to "Getting Started with Excel" [1] and move on to "Getting Started with Python" [2]

Note: I'd strongly recommend the Anaconda Python distribution [3], as it has pretty much everything you need. Also, for immediate feedback on what you're doing with Python, I've fallen in love with Jupyter Notebooks (formerly IPython Notebooks) [4], which you'll have as part of the Anaconda distribution, along with all the other popular Python packages for scientific work.

[0] https://www.kaggle.com/c/titanic

[1] https://www.kaggle.com/c/titanic/details/getting-started-wit...

[2] https://www.kaggle.com/c/titanic/details/getting-started-wit...

[3] https://www.continuum.io/downloads

[4] http://ipython.org

_5qp2 · on Nov 19, 2015

Try this :) https://github.com/hangtwenty/dive-into-machine-learning

ska · on Nov 18, 2015

Machine learning is fundamentally mathematical, so you can't expect to completely avoid it without remembering or learning at least a little bit of the maths. Trying to avoid it all won't do you any favors, it will just mean you can't understand what is going on.

On the other hand, the math you absolutely need to follow along is pretty straightforward, so you can hopefully find tutorials that emphasize the applications and graphic examples of what is going on.

davmre · on Nov 18, 2015

Andrew Ng's Coursera ML course is supposed to be pretty accessible. I've also heard good things about Machine Learning for Hackers (http://www.amazon.com/Machine-Learning-Hackers-Drew-Conway/d...).

Ultimately, ML is a mathematical discipline. You can ask for a gentle approach that gets you to the foot of the mountain, but "if you want to learn about nature, to appreciate nature, it is necessary to understand the language that she speaks in." If you want to be more than an amateur, there's not much substitute for getting comfortable with math at the level of, say, Kevin Murphy's book.

The good news is that the required math is fairly elementary - calculus, linear algebra, probability and statistics, all freshmen or maybe sophomore-level topics - so it shouldn't be beyond reach of a motivated developer able to set aside some time to learn. MOOCs and organizing study groups with friends/co-workers can help a lot here as well.

subnaught · on Nov 18, 2015

I'm currently taking Andrew Ng's Coursera course and I'd agree it's quite accessible. In fact, if you have a solid understanding of calculus and linear algebra, you might find it a bit slow at times.

dfan · on Nov 19, 2015

For people who are disappointed by the shallowness of it, I recommend supplementing it with the notes to his Stanford class: http://cs229.stanford.edu/materials.html. The combination worked well for me.

WaxProlix · on Nov 18, 2015

This was my problem; it was incredibly boring (at least for the first few classes) and the video-based nature of it meant that I had to skip around and try to pick up what was going on later, which seldom worked. I hear it's more self-paced now, so that might be a better option these days.

mindcrime · on Nov 19, 2015

so it shouldn't be beyond reach of a motivated developer able to set aside some time to learn. MOOCs and organizing study groups with friends/co-workers can help a lot here as well.

I'm in the middle of that process right now. I only took Calc I in college and that was 20 years ago, so I have decided to work my way through a Calc sequence, Differential Equations, Linear Algebra, and Probability and Statistics through a combination of MOOCs, "X For Dummies" books, Youtube videos (hello, Gilbert Strang!), Schaum's Outlines books, Khan Academy, a mammoth stack of college maths texts that I've picked up at used book stores, and questions on stats.stackexchange.com, math.stackexchange.com, learnmath.reddit.com, etc.

I'm doing the Ohio State MOOC on Calc I now on Coursera, and accompanying that with the Gilbert Strang "Highlights of Calculus" video series[1]. So far so good. I definitely think this stuff is learnable if one is willing to put in the time and work, even without going back to taking "on campus" classes at a university.

[1]: https://www.youtube.com/playlist?list=PLFW_V3qDH5jRyfpD9uiq6...

moinnadeem · on Nov 19, 2015

I'm a high school Senior (17) and I'm currently taking it. It is ridiculously understandable and I often see myself yearning for more. But I've also had Calc 3 + Linear Algebra by now, so its understandable that not everyone would get it. The intuition is simple, however.

fapjacks · on Nov 18, 2015

Yeah I'd like to second this. This course specifically is what opened the door for me.

alfalfasprout · on Nov 18, 2015

As others have said, you need to at least get the gist of what's going on mathematically to be able to make sensible decisions about model selection, structure, etc.

That said, I find a lot of the introductions to the theory behind ML techniques to be very poorly written. It's often worth giving a new student a conceptual simplification before introducing a rigorous definition.

Without linear algebra and basic probability/calculus though, forget it. Luckily there's great sources to brush up on it.

alfiedotwtf · on Nov 18, 2015

The problem is that machine learning is applied probability and statistics.

If you're interested, here's a gentle book that should give you enough background:

http://www.amazon.com/Modern-Introduction-Probability-Statis...

dtjones · on Nov 18, 2015

Applied machine learning without understanding of the fundamental mathematical assumptions can be a recipe for failure

That said, there are some graphical examples to help understand of how learning algorithms work in 2 dimensions.

kewlb · on Nov 19, 2015

Certainly understanding the math is very important but it is harder to get expertise on the pre-requisite math because the horizon is much bigger. I would recommend taking a case study approach and side by side learning the math stuff needed. If you are looking for an example then take a look at this, https://www.coursera.org/learn/ml-foundations/

tchalla · on Nov 18, 2015

I have seen many people make this observation. I'm just curious about your problem. Is your problem

1 : Mathematics in ML - proofs etc.

2 : Understanding the intuition behind ML algorithms without the requirement of higher order math?

Personally, I feel that 2 can be tackled quite easily. The core issue is that most people who teach want to stay on a "higher dimension". ;)

_5qp2 · on Nov 19, 2015

Not trying to detract from worthwhile discussion here ... but for a hack-first introduction, try this: https://github.com/hangtwenty/dive-into-machine-learning

Omnipresent · on Nov 18, 2015

I asked a related question [0] few weeks ago but didn't find a resource that talks about machine learning from real world example perspective.

[0] https://news.ycombinator.com/item?id=10356874

inglor · on Nov 19, 2015

I've read this book and warmly recommend it. It has a very pragmatic "no bullshit" approach and it's very mathematical and concise.

The neural networks chapter is tiny (but that's ok - that's not the focus) and some of the questions are really hard - but overall I've really enjoyed it.

stevenmays · on Nov 18, 2015

I am considering taking Udacity's machine learning nanodegree [0] with zero machine learning background. It seems interesting. Any thoughts?

[0] https://www.udacity.com/course/machine-learning-engineer-nan...

wodenokoto · on Nov 18, 2015

It's pretty cool that the book is not only free, but they link to courses that uses it.

If you speak Hebrew you can get two different professors take on how to teach the material in the book, as well as lecture notes from a total of 3 professors. That's pretty neat if there's a concept you are struggling with as a student!

enlightenedfool · on Nov 18, 2015

Glad to see another book to learn from for free. But my problem now is that there are so many books each with somewhat different approach and content for the same ML techniques. Not necessarily bad, but I get somewhat confused when trying to apply a method.

EDIT: I guess the focus on the theory might help me.

alvern · on Nov 18, 2015

My question for someone that has an intermediate level of skill in machine learning, what's the best way to dip your toes in? (Udacity, coursera, edx, PDFs, Talking Machines podcast, etc)

zintinio4 · on Nov 18, 2015

Find a problem to work on in a domain you find interesting. By reading published papers and trying to attack the problem, you'll be forced to pick up a lot of other knowledge not commonly discussed like: feature extraction and selection, dimensionality reduction, dealing with sparsity, common metrics for that problem, recent work, etc.

I was forced to learn a massive amount in a short period of time for work, but I'd previously watched Andrew Ng's lectures, as well as majored in Math/CS. I can also generally recommend Hinton's NN lectures, Socher's Deep learning for NLP, Andrew Ng's Machine Learning, and a few books.

michaelsbradley · on Nov 18, 2015

There's also the freely-accessible book A Course in Machine Learning:

http://ciml.info/

Omnipresent · on Nov 18, 2015

Related: Foundations of Data Science: http://www.cs.cornell.edu/jeh/nosolutions90413.pdf

sagik · on Nov 18, 2015

Took the course at the Hebrew University. Awesome course. Here are the slides (English) and videos(Hebrew): http://www.cs.huji.ac.il/~shais/IML2014.html

therobot24 · on Nov 18, 2015

over 30 chapters and the only reference to graphical models is naive bayes and EM

jjaredsimpson · on Nov 18, 2015

Top comment in another thread whines that nobody understands what the machinery of ML algorithms are really doing.

Top comment here whines that math is hard.

pinn42 · on Nov 18, 2015

I'm creating general intelligence in Java multicore.

Merkur · on Nov 18, 2015

great! thanx!