Thank you for creating these courses and putting them online! I started part 1 a few weeks ago. I couldn't agree more with your teaching style. I'm someone who started college as a computer science major and couldn't finish, because it felt like I was pounding my head on the wall trying to learn abstract math when what I really wanted to do was build stuff.
I finished my degree with economics, went to work in finance, and worked my way to being a developer by first doing advanced Excel, then Access, then database servers, then web development, then native development, and now more math heavy research and development. As the need arises, I keep going further down the stack and gaining more understanding. It's much more natural (for me) to learn that way.
> because it felt like I was pounding my head on the wall trying to learn abstract math when what I really wanted to do was build stuff.
But you do need to know abstract math to build stuff, particularly if what you want to build is deep learning stuff, both models and implementations.
For example, how do you expect to understand how to minimize a utility function if you have no idea what a gradient is, how to calculate it, or why you want to descend along it?
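The core idea is small enough to sketch in a few lines of plain Python (the quadratic function and learning rate here are purely illustrative, not from any course material):

```python
# Minimal gradient descent on f(w) = (w - 3)^2.
# The gradient is f'(w) = 2 * (w - 3); each step moves w
# a small amount in the direction opposite the gradient,
# so w converges toward the minimum at w = 3.

def grad(w):
    return 2 * (w - 3)

w = 0.0    # starting guess
lr = 0.1   # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)

print(round(w, 4))  # converges toward 3.0
```

Every deep learning optimizer is an elaboration of this loop, just with millions of weights and a far messier loss surface.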
> For example, how do you expect to understand how to minimize a utility function if you have no idea what a gradient is, how to calculate it, or why you want to descend along it?
The course teaches all those things - as the comment you're replying to states, you go deeper and deeper during the course to understand all the details.
There's been a lot of research into teaching strategies that shows that this is often a more effective approach for many people than the bottom up approach widely used in math and CS. It doesn't mean that you learn any less of the foundations - just that it's in a different order.
> There's been a lot of research into teaching strategies that shows that this is often a more effective approach for many people than the bottom up approach widely used in math and CS.
I seriously doubt that anyone can effectively learn linear algebra, multivariate calculus, optimization and regression models from an online tutorial on deep learning. These are subjects whose basics alone take multiple semester-long courses. If a top-down approach were remotely effective, no one would bother teaching the basics.
Do you think it's necessary to have a rigorous understanding of all of those topics before creating a machine learning model? And that you don't learn from interacting with it, even if you don't fully understand how it works? For machine learning in particular I think that's pretty ironic.
Hey! Thanks for these great courses and materials! How much additional math (beyond high school and introductory college courses) do these courses teach? For example, if I were to take both courses, would I be able to understand the papers published by Surya Ganguli (e.g., The Emergence of Spectral Universality in Deep Networks, Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net)?
Ganguli's papers are at the more "math-y" end of the spectrum when it comes to DL papers. So don't worry if you're finding them a bit unapproachable.
In part 2 of the course I provide quite a bit of advice about how to approach papers in general, and you'll get plenty of practice in reading and implementing papers - but we don't cover the specific math in these particular papers.
My view is that the best approach to the math in papers is generally to learn what you need as you get there. It's nearly impossible to know all the math behind every paper you'll come across, but if you learn the meta-skill of how to learn it on demand, then you'll be just fine! :)
Gram matrices, some non-linear optimization. Deep learning itself is simple; the complexity comes from the loss function, from adjusting the weights of under-represented categories, from the different LEGO blocks you assemble into your network, and from seeing whether a particular non-linear optimization works in your case or not. You can go super deep into state-of-the-art math research by reading about "why do we think Deep Learning works, when it shouldn't", which is mentioned by Jeremy.
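The Gram matrix mentioned here (used, for example, in style-transfer losses) is just the matrix of inner products between feature vectors. A tiny sketch in plain Python, with made-up feature values:

```python
# Gram matrix: G[i][j] is the dot product of feature vector i
# with feature vector j. In style transfer, each "feature vector"
# is a flattened CNN channel's activations, so G captures which
# channels fire together, independent of where they fire.

def gram(features):
    n = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j]))
             for j in range(n)]
            for i in range(n)]

feats = [[1.0, 2.0], [3.0, 4.0]]  # two toy feature vectors
G = gram(feats)
print(G)  # [[5.0, 11.0], [11.0, 25.0]]
```

Note that G is always symmetric, since the dot product is commutative.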
Hey Jeremy, thanks for making these courses! I was curious if you could offer any points of comparison between the fast.ai courses and the Udacity ML nanodegree course (or any other MOOCs you have opinions about).
I am not Jeremy, but I have taken part in the current course and also took some lessons by Siraj (who also does lessons at Udacity), Andrew Ng, and Karpathy. I think all of them are awesome teachers. Like Siraj's lessons, the fast.ai course is full of practical examples, whereas Ng's and Karpathy's courses are more theoretical. What makes the fast.ai course unique compared to Siraj (who covers a lot of topics very briefly) is that Jeremy goes into depth, taking a long time explaining things, so you have a lot of opportunities to understand what's going on.

In the 2018 course there is actually not that much new content compared to the 2017 course; rather, Jeremy took the time to make everything easier for everyone to understand, and built a better Python library. Another unique point about Jeremy is that he and Rachel want not only to teach people but to inspire a community of people working together in teams. The fast.ai library will not be just a project for the course; it will be something that a lot of people use and contribute to, making PyTorch better.
Dr Rachel Thomas, the co-founder of fast.ai, is the person who takes the highest voted questions from our 600+ in-person students and passes them on to me, as I asked her to do. At least 4 different people must have wanted the question asked before she asks it.
This approach helps ensure that if I haven't explained something clearly enough that I get another chance to do so, and, more importantly, keeps me fresh and energized throughout each lesson (I get stale and boring without some interaction).
I am a firm believer in candid feedback: I talked to multiple people taking fast.ai, and all of them felt that the questions disturbed the flow of the lectures. I really like fast.ai (it's an excellent hands-on course), so please don't get offended.
Since you appreciate candid feedback, I would like to explain why I down-voted you. Your post comes off as super rude and dismissive. You may have some valid criticism to give but, take more care in how you deliver it. It's a bad look for HN.
I appreciate your feedback! :) It's difficult to convey mental state over a single line of text, giving rise to various interpretations that weren't in the intent. Imagine I am your best friend and I say the same in a playful fashion - would that be more acceptable?
Absolutely! That's actually in the hacker news commenting guidelines: "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."
https://news.ycombinator.com/newsguidelines.html
So to answer your question: yes, it would be more acceptable if you were saying that to me in person. In fact, I agree with you about the flow of the lectures; I was looking for someone to bring that up, and I'm glad it's getting discussed here.
However, Jeremy and Rachel are both reading these comments and we should strive to provide thoughtful, fleshed out feedback. The work they've done on Fast.ai is a tremendous lift, and deserves more than a drive-by comment.
The thing is, you can visibly see how it interrupts Jeremy: he is in the middle of explaining something, then has the face of a surprised person, loses context for a few seconds, etc. IMO it would be better if he just finished a small section in its entirety and then did Q&A, instead of allowing himself to be interrupted all the time. And often those questions miss the point, which is to be expected with newbies, so the lecture loses efficiency. If I didn't know most of the basics already, I'd have problems following the lectures. I found just doing the exercises better for learning.
I also find the questions very helpful. I understand that for many people who already have a good understanding, these questions might interrupt the flow. Many times the questions his students raised were things I had not thought of; sometimes they even stumped Jeremy. Fortunately, in most cases Jeremy gives answers that vary in length, according to whether it's a concept he will clear up later, or an important concept that needs to be clarified then and there.

So I think this is a more effective way of learning/teaching, even though there is a loss in efficiency.
Rachel's mic is always off until I turn it on. So if she's asking a question, it's only after she's visually indicated that she wishes to do so, and I've found a time in my presentation that I'm ready to take it. So I'm literally never being interrupted, and can't be surprised by the fact that she's asking a question (although I may well be surprised by the content).
I do try to limit the time I wait to take a question, since I don't want to move on with a topic where I've failed to properly explain some foundational piece.
Same, there have been several times I have had a question and someone thankfully asks the exact one I had. And Jeremy does a great job explaining it. It really helps to get at some of the why's behind the magic.
You are missing the point: the questions are a deliberate part of the experience, opening it up to underrepresented groups and learning styles. It is human as well as efficient, the heart added to the mind.
I haven't taken the course - and I'm hopelessly biased anyway so you shouldn't trust my comments. However a lot of our students have done it. Here's a couple of their comments (search the linked thread if you want more):
fast.ai 1 is much better than Udacity's ML if you want to do purely Deep Learning. Udacity's DLF ND is IMO better than fast.ai 1. However, fast.ai 2 is way better than Udacity's DLF ND, and only their self-driving car has some Deep Learning models that are even more awesome (imagine driving using Deep Learning only). Take both fast.ai 1 and DLF ND if you are a beginner; after that jump on Stanford's CV & NLP courses; Oxford NLP might be a good idea as well. Then do UCL's Reinforcement Learning and you are set.
Thanks Jeremy and Rachel for the excellent materials. I am a beginner with ML and initially tried to start with fast.ai part-1. I struggled initially as I preferred to know some basics. So, I looked for a beginner friendly course and ended up with Udacity's AI programming with Python. It gives a pretty decent overview on the basics. Now, I am very excited to take up the fast.ai part-1 & part-2 courses.
I didn't take the Udacity ML Nanodegree, though I did do the Deep Learning Nanodegree.
I found it to be disappointing especially since they had hyped the collaboration with Siraj, which was nothing more than linking to certain YouTube videos.
The project feedback was sometimes helpful. I felt like most of the time though, the feedback was "you did this wrong, read this article" instead of something more personal like an elaborate explanation on why you should do things a certain way. I even once explained why I initialized a model a certain way and the reviewer ignored it when critiquing my model, which almost felt like "all students have to do it this way."
It wasn't all bad. My favorite parts were learning about GANs with videos and a notebook from Goodfellow. And when I was trying to build more intuition about CNNs, the videos with Vincent Vanhoucke were helpful.

But altogether I felt a little disappointed in the actual projects. Maybe it was because the math felt glossed over, and there were too many topics with shallow exploration for a single course. I actually wished that Udacity offered a single course for, say, CNNs and GANs, going very deep into the math and processes behind them.
I'm taking another Udacity course taught by Thrun (this time, it's free) and again, he kind of glosses over why certain mathematical operations are done, at which point I spent a lot of time watching lectures by other professors who spent more time explaining it.
I think that's my biggest criticism of MOOCs in general: they can be very hand-wavy about very important concepts that underlie a process. I've spent a great deal of time reading papers and course material from other colleges, writing throwaway code, and watching videos from other profs in order to shore up an intuition that simply wasn't built strongly by the MOOC.
> Udacity offered a single course for, say, CNNs and GANs, going very deep into the math and processes behind them.
I think that's what they are going to do next with their AI school. They announced separate nanodegrees for CV, NLP and RL. However, I hope they aren't going to be such massive disappointments as AI ND term 2, where they basically didn't deliver what they promised, castrated the projects, and one could finish each of the term 2 specializations in one weekend (e.g. removing the image captioning project, removing the real-world NLP project, etc.). A similar story happened with Robotics term 2, where instead of the real robot they promised, you received a standard discount on an NVIDIA TX2, and reinforcement learning was cut down from robot walking to robotic arm movement. So I am doubtful they can really live up to their promises with their current staff that keeps underdelivering. The only ND they made absolutely breathtaking, worth every penny, was the Self-Driving Car ND (IMO the best course I've ever taken, online or not, and I took many from top-10 universities).
Just wanted to ask a quick question if you recommend any prerequisites before taking this course? I am a second year Computer Science student looking to get into AI, and I was just wondering if this would be a good starting point for me, or should I take some other courses and learn something else before?
This linked course is part 2, so this isn't a good starting point. But part 1 is! :) Here's the link: http://course.fast.ai .
We assume you're familiar with python and somewhat familiar with numpy. If you're not sure if you're ready, you can always just start the first lesson and see how it goes. There are plenty of resources on the forum to help fill in specific skill gaps as they come up.
Thanks Jeremy (and Rachel) for everything y'all have done; I learned a lot just from watching part 1!
Somewhat tangential question, but how do you feel about sticking with PyTorch for future lessons/research work? What are your thoughts on the new Swift integration with TensorFlow?
We're so early in the software and tools cycle for DL that it's unlikely anything that exists today will remain relevant in its current form for very long. You can see that every couple of years the entire DL ecosystem is getting replaced - I don't think that will change.
I'm glad you asked about Swift. It's one of the two directions I'm most excited about for the next stage of DL software (the other being JavaScript!). Swift is a much better language than Python, and having autograd and tensor types built in will be quite transformative. But it has nothing like the same data science libraries and tooling, so it'll take time to become a real option for most people.
JavaScript has the potential to make DL accessible even to those without a discrete GPU, and lets people get started without installing a toolchain. It also allows more work to be pushed to the client, possibly even on mobile devices.
I might be helping author a textbook based on our upcoming introduction to machine learning MOOC. Which maybe one day would lead to a deep learning book - but that would be years away!
(BTW, I used to think I didn't like videos for learning, but actually I now think some material works best in this format. E.g. in this case we're walking through interactive notebooks, and you can follow along too. Some of the material is animated, and often I'm building up drawings as I talk. Maybe it won't be for you, but it might be worth giving it a go to see...)
Just another thought on this, as someone slowly trying to work his way through part 1 (and generally very happy with it, thanks for this amazing resource!): videos are great when they're really short. It's tough for working people to find enough unbroken time to watch an hour or two long video, like the ones in fast.ai (as amazing as they are), especially with frequent pauses for note-taking and such.
But it's much easier to handle smaller chunks. This is probably my only critique of/gripe about fast.ai: the videos pack a lot of topics in, and they could be broken at topic boundaries to make them easier to use.
Thanks for the feedback! One thing you may be interested in, if you haven't seen it, is that you can right-click the video and copy a link to the current time code, so you can always go back to where you were (or just pause the video in a tab and return later).
Personally, I really hate the Coursera approach of lots of separate short videos - I totally get that some people like it, and that's fine, but it's not something I want to do myself.
Another thing I used a ton when going through part 1: being signed into my Google account, Google kept track of my progress through the video, and if I exited out and came back on a different machine it would resume where I last left off. Then I would back up a few minutes to rebuild that context (or attempt to, anyway).
I'm traditionally a textbook person too, but I really like fast.ai. You get to see how to tinker with things in real time (which is often clunky in a textbook), and there's a lot of informal folklore that people don't normally write in academic textbooks. If you're dead set against video you could just work through the notebooks yourself and read the referenced papers on the website.
I can't say enough good things about the fast.ai courses. They may not be for everyone but they are definitely for me. The teaching style is a perfect match for my learning style. The style is to get some code working and then dive deeper and deeper to understand it better and better.
Note this link is for part 2 of the course, which is significantly more advanced than part 1. Start with part 1 if you are new.
Jeremy - First of all, I just want to say thanks to you and Rachel for the course(s). I worked through part 1 (2017 version) last year and it was fantastic.
I've got part 2 on my to-do list for this year. Looking at the 2018 part 1 course, quite a lot has changed (from Keras to PyTorch?) from a code perspective compared to the 2017 part 1 course. If I want to start part 2, does it make sense to go ahead with the 2018 version? Are there any 2018 part 1 lessons you recommend to bridge any gaps from 2017 part 1 before I start 2018 part 2?
Not Jeremy (obviously) but having taken both versions of the course I would definitely recommend jumping into the 2018 version. The core content is the same (image recognition, recommender systems, NLP) but the 2018 version uses more cutting edge tools and has a tighter feel (as a v2 of anything usually does).
The field is changing so quickly, but fast.ai is always ahead of the curve teaching modern techniques! Hats off to them for being able to keep up, AND share that with others.
I watched all of the 2018 Part 1 videos and it was not my style.
One thing I liked was the positive attitude of Jeremy and Rachel, but there was not much theory, and a lot of time was spent on questions and answers from the in-person participants that I don't think were necessary.
I wanted to see definitions and the "why", but the course spends too much time on the "how". Sometimes I do see the "why" in a lecture, but many times it's just an answer to a question from a random participant, and those answers seemed not well prepared, which makes it hard to deeply understand the content.
I think fast.ai (at least Part 1) may be good after taking other deep learning courses and before participating in Kaggle competitions, when you are already familiar with the theory.
Top-down vs bottom-up depends to some extent on what you enjoy, vs what you need to be patient about. With our top-down approach you do get to all of the 'why', but only after understanding 'how'. So it's good for people who want to get started doing stuff right away, and don't mind waiting a bit to understand the details. It means you can start experimenting and building a good intuition for training models, which I believe is the most important skill for a practitioner.
On the other hand, with the bottom-up approach of Andrew Ng in deeplearning.ai, you start with a lot of 'why', and later on get to 'how' (although in less detail and fewer best practices than we show). So it's good for people who want to understand the theory right away, and don't mind waiting a bit to understand how to use it.
A lot of our students did Andrew's course after ours, and many did it in the reverse order. All have reported finding the combination more helpful than either on their own. When we describe 'why' it's mainly with code, whereas with Andrew it's mainly with math - so which you prefer will also depend on which notation and framework you're more comfortable with.
(But I promise - you do get all the 'how' with us, particularly in part 2! Our students have gone on to OpenAI, Google Brain, and senior AI leadership positions at well known startups, as well as writing and implementing new papers. Here's an example of a student who just implemented a paper that was released within the last month: https://sgugger.github.io/deep-painterly-harmonization.html#... )
Granted, each person has their own learning style. Some do better with theory, some do better with practical application.
In my case, I stopped doing data MOOCs entirely once I realized I learned 10x faster just by reading documentation and working on a project from the bottom-up, since MOOCs rarely highlight the annoying gotchas present in real-world data applications.
I can see how the courses would be tough to digest without some theory from other sources, but I have found them to be invaluable for tips and tricks. They are loaded with practical information.
Much of Deep Learning is still experimental in nature and requires quite a bit of educated guessing. A number of times I have been stuck on a particular deep learning problem and a passing comment from one of the fast.ai videos has given me the perfect insight.
I'll have to agree with this one. I had watched, and done most of the assignments, in the 2017 part-I of the course. While I did get stuff working and got results, I was left unsatisfied.
I recently went through the videos of Andrew Ng's Coursera course to get a feel for the theory and intuition, and I'll be taking another stab at the fast.ai courses to compare their assignments with Andrew Ng's.
I have taken these classes and also did the Udacity nanodegree and deeplearning.ai. All of them help you in your journey of mastering DL. But fast.ai part 2 really takes you to the cutting edge of DL through a number of practical examples. Don't wait; join your fellow learners at forums.fast.ai
>> Welcome to the new 2018 edition of fast.ai's second 7 week course, Cutting Edge Deep Learning For Coders, Part 2, where you'll learn the latest developments in deep learning, how to read and implement new academic papers, and how to solve challenging end-to-end problems such as natural language translation.
I would really like to know how to solve natural language translation. I think everyone would. Many people have been trying to solve this devilishly hard problem for several decades and failed. So I'm really curious how fast.ai has finally managed to do it.
Well, I looked at the summary, and they're implementing a Seq2Seq model for this. It is what I think of as an archetype for machine translation and chat bot tasks.
Quite a few new network architectures in this space have been updates to this model, which uses an RNN encoder and a decoder, along with attention between them and a beam search for better results.
I wouldn't call this model a solution for natural language translation, nor would anyone else. But I think fast.ai meant that they're going to explain and go through this model, and how it's helped bring a new generation of models with good performance in this particular space.
Yup, it's a multi-layer bidirectional seq2seq with attention, and a few tricks like teacher forcing. Same as Google Translate. Their version takes a long time to train on a lot of GPUs, so we simplify it by using fewer layers and a smaller, simplified corpus (it only contains questions, and limits them to 30 words).
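For those curious, the attention step at the heart of that model can be sketched in a few lines of plain Python (toy vectors, not the actual fastai/PyTorch code):

```python
import math

# Dot-product attention: score each encoder state against the current
# decoder state, softmax the scores into weights, and return the
# weighted sum of encoder states (the "context vector") that the
# decoder uses to produce its next output word.

def attend(decoder_state, encoder_states):
    scores = [sum(d * e for d, e in zip(decoder_state, s))
              for s in encoder_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for stability
    total = sum(exps)
    weights = [x / total for x in exps]
    dim = len(encoder_states[0])
    context = [sum(w * s[k] for w, s in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context

# Toy example: the decoder state is most similar to the first
# encoder state, so that state dominates the context vector.
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

The same idea, applied per decoder timestep with learned projections, is what lets the decoder "look back" at the relevant source words instead of squeezing the whole sentence into one fixed vector.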
By "solve end-to-end problems" I only mean that we show how to do the whole process from beginning to end - I didn't mean to imply that the final model would be human-equivalent or perfect or anything like that.
Yeah, I understood the intention behind that statement. Great work with the course!
While you're here, what do you think about using temporal convolutions for sequence tasks? I've read a few articles (this one by my professor comes to mind [0]) which say CNNs could work extremely well for tasks traditionally done with RNNs. A recent paper by the people at Google Brain [1] mentioned that their CNN-with-attention network beats traditional RNN approaches. More surprising is that the network is 130+ layers deep, yet trains faster than RNNs. Do you think we can potentially switch most machine translation tasks to CNNs?
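(For anyone unfamiliar with the idea: the key property of a temporal convolution is that it is "causal", meaning output t only depends on inputs at positions ≤ t. A toy sketch in plain Python, with a made-up kernel:)

```python
# Causal 1D convolution: pad on the left so that output[t] is a
# function of x[t - k + 1 .. t] only - it never "sees the future".
# Stacking such layers (often with dilation) gives a long receptive
# field while training in parallel, unlike a step-by-step RNN.

def causal_conv1d(x, kernel):
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(x))]

x = [1.0, 2.0, 3.0, 4.0]
y = causal_conv1d(x, [0.5, 0.5])  # simple 2-tap smoothing kernel
print(y)  # [0.5, 1.5, 2.5, 3.5]
```

Each output here averages the current and previous input, and the first output only sees the zero padding plus x[0], confirming the causality property.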
>> By "solve end-to-end problems" I only mean that we show how to do the whole process from beginning to end (...)
Then why not write just that? What is the point of using language that implies you can teach people how to solve a very hard problem that nobody knows how to solve yet?
I find it extremely disreputable to claim to be able to accomplish feats that go far beyond the limits of current technology. That is the tactic of charlatans and snake oil salesmen, not of scientists and technologists.
Machine translation is not solved, but it's reached some surprisingly improved benchmarks for accuracy, so while it's a little presumptuous to call it solved, it's not the most egregious exaggeration I've heard about machine learning this week.
I have nothing to say other than that I loved DL1 and I'll dive into DL2 right away. I really like the overall philosophy ("see DL is approachable and doable, just do it") and I love that they encourage you to read papers even if they seem hard and to translate math to code etc. etc.
Fantastic job fast.ai team.
It's pretty damn amazing that they built a library (and also the new fastai.text) that goes one level of abstraction beyond PyTorch, with the goal of implementing interesting/helpful papers ASAP and making their teaching even more efficient.
Hi Jeremy, thanks again to you and Dr Rachel for these amazing courses and the support you give the community. My question is the following: are you thinking of an organic hub or foundry connected to fast.ai, acting as a distiller, that is, disseminating and implementing in / with / through the fast.ai library the most interesting research papers coming out day in, day out from arxiv.org and events? It is nearly impossible to follow the progress these days without a clear state-of-the-art perspective, which you have in abundance.
We're certainly looking for ways to help develop the community. Obviously http://forums.fast.ai is a good starting point, and is a very active and supportive community nowadays. But I'm interested in finding ways to support face-to-face communities too. Some students are already doing that - for instance AI Saturdays (https://nurture.ai/ai-saturdays) has study groups in 50 cities now, and I'm told ~5000 people have been involved in them.
I'm always open for ideas in other ways we can help people get involved and stay up to date. And I'm very interested in supporting students who are trying to build these things in their local community.
Question: What is the job market like for folks who spend two months learning deep learning, either through a class like fast.ai or a bootcamp like Insight AI? What sorts of companies hire these people and how much do they pay? I know it varies, but I'm just trying to get a sense to calibrate my expectations of the current market. Thanks for any perspective anyone can offer.
As an outsider and/or away from SV, where such an environment thrives and opportunities number in the dozens, your mileage may vary a lot. A personal network is always better to rely on than shots in the dark, though. In general, you might need to showcase your knowledge through marketable case studies in a sub-domain you are interested in, so that interviews come from a shared context.
Exactly right - it's all about your portfolio. We spend a lot of time throughout the course talking about ways to build that portfolio, and folks on the forums support each other with feedback and help on these projects.
Completing a MOOC isn't going to help you land an interview for a job, generally speaking (although if it's any good, it should certainly help you once you do get an interview!) However if you use your study time to, for example, create useful projects that you make available on github, web apps that you host on heroku, deep technical blog posts you post on medium, etc then you should definitely get plenty of interview opportunities.
I've seen many many students go through this process, so I know it works. That includes students that hadn't been getting interviews or job offers - until I convinced them to spend time building a portfolio, and then they got multiple offers from top companies.
(There are some companies - not too many thankfully - that strictly require a PhD for DL applicants. I've noticed that such companies, that focus on credentials over actual work output, seem to overlap a lot with companies that turn out to have toxic cultures. So I'm not sure you should worry too much about them...)
Let me know if you have any questions about the material or approach. There's also a discussion for the course here: http://forums.fast.ai/c/part2-v2