>> Nah, if you can't handle the huge amount of data, it's possible to just switch to a sparse model or do MC-like sampling... That way the network doesn't scale linearly with the size of the domain being learned.
That's useful when your domain is finite, like in your example, Go. If you're dealing with a non-finite domain, like language, MC won't save you. When you sample from a huge but finite domain, you eventually get something manageable. When you sample from an infinite domain, you get back an infinite domain.
That's why approximating infinite processes is hard: because you can only approximate infinity with itself. And all the computing power in the world will not save you.
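To be clear about the part I do grant: here's a toy sketch of why sampling helps in the finite case (my own illustration, with a made-up domain size and scoring function, not anything from a real system). The work depends on how many samples you draw, not on how big the finite domain is. That's exactly the property that has nothing to grab onto when the domain isn't finite to begin with.

```python
# Minimal sketch (illustrative only): Monte Carlo estimation over a huge
# but *finite* domain. We estimate the average of a score function over
# 10**12 states by sampling a few thousand of them instead of enumerating
# the whole domain.
import random

DOMAIN_SIZE = 10**12   # huge, but finite
N_SAMPLES = 10_000     # cost is fixed by this, independent of DOMAIN_SIZE

def score(x: int) -> float:
    # Cheap per-state score; stands in for whatever the model evaluates.
    return (x % 1000) / 1000.0

estimate = sum(score(random.randrange(DOMAIN_SIZE))
               for _ in range(N_SAMPLES)) / N_SAMPLES
print(f"MC estimate of the mean score: {estimate:.3f}")
```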
>> It's been demonstrated that when a good benchmark appears, a lot of papers follow and top results improve massively.
Mnyeah, I don't know about that. It's useful to have a motivator, but on the other hand the competitions become self-fulfilling prophecies: the datasets come with biases that the real world has no obligation to abide by, and the competitors tend to optimise for beating the competition rather than solving the problem per se.
So you read about near-perfect results on a staple dataset, so good that it's meaningless to improve on them: 98.6% or something. Then you wait and wait to see the same results in everyday use, but when the systems are deployed in the real world their performance drops way down. You end up with a system that got 99-ish on the staple dataset but 60-ish in production, as many others did before it. What have we gained, in practice? We learned how to beat a competition. That's just a waste of time.
And it's even worse because it distracts everyone, just like you say: the press, researchers, grant money...
Well, OK, I'm not saying the competitions are a waste of time, as such. But overfitting to them is a big problem in practice.
>> A promising direction is extending neural networks with memory and attention
That's what I'm talking about, isn't it? Just raw computing power won't do anything. We need to get smarter. So I'm not disagreeing with you, I'm disagreeing with the tendency to throw a bunch of data at a bunch of GPUs and say we've made progress because the whole thing runs faster. You may run faster on a bike, but you won't outrun a horse.
(Oh dear, now someone's gonna point me to a video of a man on a bike outrunning a horse. Fine, internets. You win).
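Since memory and attention came up: this is roughly what a single attention read over a memory looks like, as a minimal sketch (my own toy example with made-up dimensions, not anyone's actual architecture). The point is that it's a different mechanism for using the same compute, not just more of it.

```python
# Toy sketch of one scaled dot-product attention read over a small memory
# (illustrative only; dimensions and names are made up).
import numpy as np

def attention(query, keys, values):
    # query: (d,), keys: (n, d), values: (n, d_v)
    scores = keys @ query / np.sqrt(query.shape[0])   # similarity to each memory slot
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over memory slots
    return weights @ values                           # weighted read from memory

rng = np.random.default_rng(0)
memory_keys = rng.standard_normal((5, 8))    # 5 memory slots, key dim 8
memory_vals = rng.standard_normal((5, 16))   # value dim 16
q = rng.standard_normal(8)
print(attention(q, memory_keys, memory_vals).shape)   # -> (16,)
```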