>> Welcome to the new 2018 edition of fast.ai's second 7 week course, Cutting Edge Deep Learning For Coders, Part 2, where you'll learn the latest developments in deep learning, how to read and implement new academic papers, and how to solve challenging end-to-end problems such as natural language translation.
I would really like to know how to solve natural language translation. I think everyone would. Many people have been trying to solve this devilishly hard problem for several decades and failed. So I'm really curious how fast.ai has finally managed to do it.
Well, I looked at the summary, and they're implementing a Seq2Seq model for this. It's what I think of as the archetypal model for machine translation and chatbot tasks.
Quite a few newer network architectures in this space are refinements of this model, which pairs an RNN encoder with a decoder, adds attention between them, and uses beam search for better results.
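For anyone who hasn't seen one before, here's roughly what that encoder/decoder-with-attention setup looks like. This is a minimal, single-layer PyTorch sketch with made-up sizes, not the course's actual code:

```python
# Minimal sketch of an RNN encoder plus a decoder with dot-product attention.
# Module names and dimensions are illustrative, not from the course notebooks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                       # src: (batch, src_len)
        out, hidden = self.rnn(self.emb(src))     # out: (batch, src_len, hid)
        return out, hidden

class AttnDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_token, hidden, enc_out):
        # Dot-product attention: score every encoder state against the
        # decoder's current hidden state, then take a weighted sum.
        query = hidden[-1].unsqueeze(1)                            # (batch, 1, hid)
        scores = torch.bmm(query, enc_out.transpose(1, 2))         # (batch, 1, src_len)
        context = torch.bmm(F.softmax(scores, dim=-1), enc_out)    # (batch, 1, hid)
        emb = self.emb(prev_token).unsqueeze(1)                    # (batch, 1, emb)
        out, hidden = self.rnn(torch.cat([emb, context], dim=-1), hidden)
        return self.out(out.squeeze(1)), hidden    # logits for the next target token
```

Decoding one token at a time like this is also where beam search slots in: instead of greedily taking the argmax at each step, you keep the few highest-scoring partial translations and expand them.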
I wouldn't call this model a solution for natural language translation, nor would anyone else. But I think fast.ai meant that they're going to explain and go through this model, and how it's helped bring a new generation of models with good performance in this particular space.
Yup, it's a multi-layer bidirectional seq2seq with attention, plus a few tricks like teacher forcing. Same as Google Translate. Their version takes a long time to train on a lot of GPUs, so we simplify it by using fewer layers and a smaller, simplified corpus (it only contains questions, and limits them to 30 words long).
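Roughly, teacher forcing means that during training the decoder is sometimes fed the ground-truth previous token instead of its own prediction, which keeps early training from going off the rails. A simplified sketch of one training step (not our actual notebook code, and reusing the hypothetical encoder/decoder modules sketched above):

```python
# Illustrative training step with teacher forcing; the 0.5 mixing ratio
# is just an example value, not a recommendation from the course.
import random
import torch.nn.functional as F

def train_step(encoder, decoder, src, trg, teacher_forcing_ratio=0.5):
    enc_out, hidden = encoder(src)
    prev_token = trg[:, 0]                      # usually the <sos> token
    loss = 0.0
    for t in range(1, trg.size(1)):
        logits, hidden = decoder(prev_token, hidden, enc_out)
        loss = loss + F.cross_entropy(logits, trg[:, t])
        if random.random() < teacher_forcing_ratio:
            prev_token = trg[:, t]              # feed the ground truth
        else:
            prev_token = logits.argmax(dim=-1)  # feed the model's own guess
    return loss / (trg.size(1) - 1)
```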
By "solve end-to-end problems" I only mean that we show how to do the whole process from beginning to end - I didn't mean to imply that the final model would be human-equivalent or perfect or anything like that.
Yeah, I understood the intention behind that statement. Great work with the course!
While you're here, what do you think about using temporal convolution for sequence tasks? I've read a few articles, this particular one by my professor comes to mind now [0], which say CNNs could work extremely well for the tasks traditionally done with RNNs. A recent paper by the people at Google Brain [1] mentioned that their CNN with attention network beats traditional RNN approaches. More surprising is that the network is 130+ layers deep, and yet trains faster than RNNs. Do you think we can potentially switch most machine translation tasks to CNNs?
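To make concrete what I mean by temporal convolution: the basic building block is a stack of dilated, causal 1-D convolutions, so each output position only sees the past and the receptive field grows exponentially with depth. A generic illustration (not the architecture from either paper):

```python
# Generic causal/dilated convolution block for sequence modelling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvBlock(nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        # Left-pad so the convolution never looks at future timesteps.
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):                        # x: (batch, channels, time)
        out = self.conv(F.pad(x, (self.pad, 0)))
        return self.relu(out) + x                # residual connection

# Doubling the dilation per block grows the receptive field exponentially.
tcn = nn.Sequential(*[CausalConvBlock(128, dilation=2 ** i) for i in range(6)])
x = torch.randn(4, 128, 30)                      # e.g. a batch of 30-step sequences
y = tcn(x)                                       # same shape, causally encoded
```

Unlike an RNN, every timestep in every layer can be computed in parallel, which is presumably why such deep stacks still train quickly.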
>> By "solve end-to-end problems" I only mean that we show how to do the whole process from beginning to end (...)
Then why not write just that? What is the point of using language that implies you can teach people how to solve a very hard problem that nobody knows how to solve yet?
I find it extremely disreputable to claim to be able to accomplish feats that go far beyond the limits of current technology. That is the tactic of charlatans and snake oil salesmen, not of scientists and technologists.
Machine translation is not solved, but its accuracy benchmarks have improved surprisingly quickly, so while it's a little presumptuous to call it solved, it's not the most egregious exaggeration I've heard about machine learning this week.