I am wondering what their baseline is. They call it "Current Best Non-WaveNet". Quite frankly, Apple's most recent deep learning-based speech synthesis sounds superior, but there aren't enough samples for a proper comparison: https://machinelearning.apple.com/2017/08/06/siri-voices.htm...
It could just be a matter of opinion, but I prefer both Google's unit-selection synthesis and their WaveNet synthesis. The prosody in Apple's latest method is still annoying, nowhere near as good as the Google models of 2015 and 2016, and not remotely comparable to the WaveNet models.
Apple's change in voice talent is an improvement though, and they may have more units than before, which is helpful. I believe their model also works offline, which is a huge plus (though I think Google's prior model works offline as well).
I think the voice for the samples in your link still has the problems they talk about in that article.
There are noticeable blips in the speech that sound unnatural, particularly when certain sound combinations are used.
The very first sample with "Bruce Frederick" is clearly off. The intonation and timing between the end of "Bruce" and the beginning of "Frederick" is... mechanical.
There's a similar problem in the OP's link with the non-WaveNet English voice 1 when it says "WaveNet".
Those issues are much less apparent in the WaveNet voices: both the timing and intonation problems are far less noticeable.
Frankly, the voices there sound VERY good compared to anything else I've heard.
That said, I completely agree that there aren't enough samples there to make any real judgement.
I think I read "commercial" somewhere in there. So it'd be "the best you can buy", though not necessarily "the best other competing companies use" (i.e. Apple).
Still, they picked one that makes theirs look vastly superior.