
I am one of the authors of "Outrageously Large Neural Networks". Yes, overfitting is a problem. We employed dropout to combat it. Even with dropout, we found that additional capacity provides diminishing returns once the capacity of the network exceeds the number of examples in the training data (see sec. 5.2). To demonstrate significant gains from really large networks, we had to use huge datasets of up to 100 billion words.
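
Roughly, the idea: a gating network selects the top-k of many expert feed-forward networks for each example, so only a small fraction of the parameters is active for any given input. Below is a minimal PyTorch sketch of that idea with dropout inside each expert. It is not the paper's implementation; it omits the noisy gating and load-balancing losses, and the names (MoE, n_experts, k, d_hidden) are just illustrative.

```python
# Minimal sketch of a sparsely-gated mixture-of-experts layer with top-k routing.
# Not the paper's code; names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoE(nn.Module):
    def __init__(self, d_model, d_hidden, n_experts=8, k=2, dropout=0.1):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # learned gating network
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Dropout(dropout),               # dropout to combat overfitting
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (batch, d_model)
        scores = self.gate(x)                      # (batch, n_experts)
        topk_vals, topk_idx = scores.topk(self.k, dim=-1)
        gates = torch.full_like(scores, float('-inf'))
        gates.scatter_(-1, topk_idx, topk_vals)
        gates = F.softmax(gates, dim=-1)           # only k nonzero weights per example
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            w = gates[:, e]
            mask = w > 0                           # examples routed to this expert
            if mask.any():
                out[mask] += w[mask].unsqueeze(-1) * expert(x[mask])
        return out

# e.g. layer = MoE(d_model=512, d_hidden=1024); y = layer(torch.randn(32, 512))
```

The point of the sparse routing is that each expert only processes the examples routed to it, which is why total capacity can grow far beyond what a dense layer of the same compute budget could hold.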



Impressive work, mate!

Does mixture-of-experts also work well the other way around, as a way to minimize power and hardware requirements on common-sized problems?

And would it work in low-precision networks, like BinaryConnect?



