I don't find Khan Academy's videos that great for getting at the intuition behind probability and stats. They're good for reference and surface-level explanations.
This is our second Learning Path, on Data Analysis, built by the awesome Claudia Gold (MIT alum, self-taught data scientist, early at Airbnb). The aim is to list helpful resources in a sequence that a beginner can follow.
Once again, we realize this is a curriculum, not the best curriculum. We'd love your feedback on what we should change or add.
------
Edit: Since we have your attention, here are some other ways in which you can help us:
1. Tell us which new Learning Paths you'd like us to build.
2. Collaborate with us to build a Learning Path on a subject where you're an expert.
3. Request features that will help you take better advantage of Learning Paths.
I also got a 500 error when trying to sign up with Facebook. When asked about posting, I set the privacy to "Only Me". Maybe that had something to do with it.
A curriculum of books to go along with these would be great.
The thing with online courses / video series is that they can give you a great introduction to the topic, but it is often difficult to know where to go next for a deeper understanding!
Also useful for filling in gaps of prerequisite knowledge e.g. "before diving into this stats book, make sure you understand the concepts from this basic algebra book!"
1. I want a "build a language" learning path, updated for modern practice and with an emphasis on what to do after you get the AST (i.e., it's easy to find information about lexing, parsing and building an AST, but not about what to do with it).
Something that explains how to implement the small pieces (e.g., this is how you build assignment, how you build a function, how you check types, etc.).
It would be nice to have a learning path on Computer Engineering and Computer Science that can guide a beginner all the way up. Thanks for this one, I'll check it out.
Gonna toss my hat in here for the engineering bit. There's plenty of programming and math out there, probably owing to the fact that physical material is harder to work with virtually... but there doesn't seem to be much quality material on circuits and low-level stuff. The microcontroller class on edX that just finished is a major exception.
With so many "free" learning resources online, we end up "paying" through the mental churn and frustration of trying to separate the wheat from the chaff. This is a great step in truly making free resources more accessible and meaningful.
To me, the credentials of the "expert" are very important. They are the only indication I have going into an online course that the person who built it has any idea what they're talking about.
The awesome thing here is that experts in a particular area are ALREADY likely keeping tabs on the best resources available.
This could be a way for them to publicly share their favorite path, and their overall effort would be much lower than, say, building an online course from scratch themselves. It's a win for them: 1) personal branding, 2) credit for the curation.
Also - I'd love to see a learning path for information security / privacy.
Extensive industry experience or a PhD in the subject, the same as I would expect of someone teaching at a traditional university or community college.
Definitely industry experience over academic experience. We've all seen academics giving bad advice because they don't know what the industry does. IMO, online learning is all about practical knowledge.
I like the curation of free educational content in a specific area because it eliminates the guesswork, and duplicated effort, of filtering for high-quality resources. Thanks to Claudia Gold for the amazing amount of work she put into this. My main gripe comes with the majority of these data science courses/tracks.
It appears that no comprehensive treatment of applied data science exists. For the past few months, I've been searching high and low. I understand collaborative filtering; I've heard about the Netflix recommendation challenge ad nauseam; I grasp machine learning and Bayesian statistics (prior, posterior, conjugate prior distributions, etc.) on a superficial level. Conversationally, I can hold my own with practitioners, albeit at a beginner level.
But what I, and others, want to learn is how to apply these techniques in a scalable way on a real production system. Right now, it's easy to conjecture about what could or should be done, but there's a lack of confidence in how to achieve the goals. I'm experimenting with a collaborative filtering problem using Cassandra as the data store for thumbs-up/down ratings on products, and Hadoop for the MapReduce pipeline; it'd be great to have more visible examples available. Is there any place I could find detailed information on real, online machine learning/statistical inference systems?
Thanks for your comments! I completely agree about the lack of hands-on courses. I found the same thing when I was putting this together. The capstone project is our attempt at including something more practical, but it's self-directed, so that's not exactly what you are after. (Creating individual courses was outside the scope of this project.) However, I'm confident it will exist someday, given the current popularity of both data science and online courses. I assume you've also done some Kaggle challenges?
I agree with the suggestion that you should attend meetups and tech talks (or watch them online if there are none in your area). You'll hear more about real life examples and have a chance to ask questions.
The other main way to learn what you're asking is to get a job doing it! You have more than enough background (assuming you also have knowledge of tools) and you will learn more from others and as you need the information.
You know, I haven't had time to try any Kaggle challenges yet. I'll have to sit down and attempt one this weekend. I appreciate the advice from both you and Brenden. I'm going to look for more data science meetups and keep my ear to the grapevine for any exciting positions. Keep up the great work, Claudia.
I love the idea of expert-curated learning paths - this is so much needed with the proliferation of all the competing MOOCs. Thank you for putting this together.
I've noticed that there's a growing demand for performance and reliability engineering roles in tech. Could that become a learning path? The courses for it could be:
1. OS
2. Computer Networks
3. Distributed Systems
4. Intro to Algorithms
5. Intro to Statistics
6. <Some course on best practices of general systems-level troubleshooting?>
7. <Some course on best practices of software debugging?>
I know it sounds almost like a full-fledged MS program in CS. But this could be a great opportunity for those who are not enrolled in those programs but love systems in general and would like to make a career out of it. Apologies if this type of "learning path" makes no sense to most of the industry insiders.
We got asked this question before, and here's our analysis of the differences.
1. Coursera focuses solely on R for Data Science. SlideRule covers additional tools (e.g. Python, SQL) which a practicing data analyst will find handy. It seems there's a bit of an R vs Python debate in the data world, so we think it's useful for people to know both.
2. SlideRule's path has an (optional) "intro to programming" section for beginners. Coursera assumes some prior programming experience.
3. Most of the courses in the SlideRule path are "self-paced", so in theory someone studying this full-time could cover it in 4-6 weeks. Coursera has fixed start and end dates, so the fastest one could complete the track (accounting for interdependencies of courses) is ~24 weeks.
Thanks for the response, that definitely makes sense. I guess it really depends on the specific technology you want to learn and the type of learner you are.
The login page keeps redirecting me to the sign-up page. There I'm told I'm about to log in to the Django server (why mention the Django server? Just say I'm about to log in), but when I enter my email address, it says a user with that email already exists and I should try logging in instead. The cycle continues.
It would be great to add an elective course on Growth Hacking that assumes knowledge of data analysis and provides a survey of use cases showing effective ways of using analysis and other methods to inform product development and/or design.
Hi Claudia, thanks for putting this together, I've already found it very useful.
I didn't see any linear algebra anywhere here, and from my (probably naïve) understanding of data science, it seems to be core to a lot of the main ideas. Do you know of any good resources in the same vein as the rest of the track? I've been watching Coursera and edX, and their linear algebra offerings seem somewhat sporadic.
Hi, glad you're finding it helpful! The reason I didn't include linear algebra is that it's possible to do the day-to-day work of most entry-level data science jobs without it.
That said, it is great to know for a deeper understanding and if you are writing your own machine learning algorithms. This MIT OCW class provides a good introduction, with video lectures and problem sets: http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-...
Could we please reinstate the "built by a former Airbnb Data Scientist", though? That's material information, in that this is not just any curriculum, but one that's expert-curated. As people on this thread have indicated [1], the credentials of the person building a Learning Path are important.
I recommend Harvard stats 110 youtube videos: https://www.youtube.com/playlist?list=PL2SOU6wwxB0uwwH80KTQ6...
These videos are more focused on probability, but they contain a lot of great intuitions.