I am one of the authors of "Outrageously Large Neural Networks". Yes - overfitting is a problem. We employed dropout to combat it. Even with dropout, we found that additional capacity provides diminishing returns once the capacity of the network exceeds the number of examples in the training data (see sec. 5.2). To demonstrate significant gains from really large networks, we had to use huge datasets, up to 100 billion words.
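
For readers less familiar with dropout: below is a minimal sketch of how dropout is typically inserted into a feed-forward layer like the experts in our model. This is illustrative PyTorch, not our actual TensorFlow implementation, and the layer sizes and drop probability are placeholder values, not the ones used in the paper.

```python
import torch
import torch.nn as nn

# Illustrative only: a single feed-forward "expert" with dropout applied to
# the hidden activations. Dimensions and p_drop are placeholders.
class Expert(nn.Module):
    def __init__(self, d_model=512, d_hidden=1024, p_drop=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Dropout(p=p_drop),   # randomly zeroes activations during training
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x):
        return self.net(x)

x = torch.randn(8, 512)   # a batch of 8 token representations
expert = Expert()
expert.train()            # dropout active while training
y_train = expert(x)
expert.eval()             # dropout disabled at inference time
y_eval = expert(x)
```

The point of the sketch is just that dropout regularizes by randomly zeroing activations during training; it helps, but it did not remove the need for data that scales with model capacity.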