Yes, and this is particularly problematic IMHO in sequence-to-sequence translators, as were used here. When fed enough training data, they do an amazing job of aping their training data, but they are prone to spouting utter nonsense in edge cases, and they don't really (to my knowledge) give any indication that one result should be trusted less than another.
So at best this system gives black-box answers that may or may not hold up under verification. That does not seem like a very useful way to do mathematical research.
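To be fair, the verification step can be cheap in this setting: if the task is something like symbolic integration (as I believe it was here), a proposed antiderivative can be checked mechanically by differentiating it back. A rough sketch with sympy, where the integrand and the "model output" are made up for illustration:

```python
# Minimal sketch: verifying a seq2seq model's proposed antiderivative.
# The problem and the proposed answer below are hypothetical examples,
# not actual model output.
import sympy as sp

x = sp.symbols('x')
problem = sp.sin(x) * sp.cos(x)   # integrand fed to the model
proposed = sp.sin(x)**2 / 2       # hypothetical answer the model emitted

# The model gives no confidence signal, but the answer is easy to check:
# differentiate the proposal and see whether it matches the integrand.
residual = sp.simplify(sp.diff(proposed, x) - problem)
print("verified" if residual == 0 else "model output is wrong")
```

That still leaves the black-box problem, of course: when the check fails, you learn nothing about why, and you get no proof or insight either way.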