BTW, the research in one of these papers shows how they pass the Turing test. So there's that.
The quality of the generated answers of our mQA model on this dataset is evaluated by human judges through a Turing Test. Specifically, we mix the answers provided by humans and our model. The human judges need to distinguish our model from the human... The experiments show that in 64.7% of cases, the human judges cannot distinguish our model from humans.[1]
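For what it's worth, the evaluation protocol they describe is simple to state precisely: mix model and human answers, have judges label each one, and report the fraction of model answers judged to be human. A minimal Python sketch of that computation follows (function names and data are invented for illustration; this is not the paper's code):

    def turing_style_pass_rate(judgments):
        """judgments: list of (true_source, judged_source) pairs,
        where each source is 'human' or 'model'."""
        model_cases = [(t, j) for t, j in judgments if t == "model"]
        # A model answer "passes" when the judge labels it human.
        fooled = sum(1 for t, j in model_cases if j == "human")
        return fooled / len(model_cases)

    # Toy example: 3 of 4 model answers judged human -> 75% pass rate.
    example = [
        ("model", "human"),
        ("model", "human"),
        ("model", "model"),
        ("model", "human"),
        ("human", "human"),
    ]
    print(f"{turing_style_pass_rate(example):.1%}")  # 75.0%

Under this reading, the paper's 64.7% is the rate at which judges failed to flag the model's answers, not a claim about interactive conversation.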
Looks like we'll have to apply the first law of AI: whatever a computer can do is no longer AI.
Edit: To be clear, this is a Baidu NIPS paper, not a Google one.
Ah, but it is not the Turing test. The paper does call it a Turing Test, but the proper Turing test requires, among other things, live back-and-forth questioning, not evaluation on a static data set. Interesting research nevertheless.
A great interface for browsing this year's papers (& previous NIPS).
https://cs.stanford.edu/people/karpathy/nips2015/