If it wasn't an LLM that found the professor's mistake, then I think the point you're trying to make is, well, missing the point: a human made a mistake, and humans found and corrected it. The question is what happens when LLMs make mistakes. Will people still be careful enough to catch and correct them as often as we catch and correct our own mistakes? Or will our ability to do so be overwhelmed by the sheer rate at which LLMs can generate text?
My point was more meta - the experts don't have a fixed opinion on the translations of rare ancient texts in the first place, ergo there is nothing to train the LLM on.
Thanks for clarifying. But note that the same applies to modern translations: there is no agreement among experts (translators) about what makes a translation "good" or "bad", or anything in between.
So the done thing in machine translation is to choose some existing translation as a "gold standard" and use that, without any assumption about how good or bad it is. Sometimes there's an attempt to rate translations by polling humans, but that can't easily be done at scale, and certainly not at web scale, as with LLMs (the de facto modern standard for automatic translation as far as I can tell; I haven't been watching the field closely lately).
The same logic applies to metrics like BLEU and ROUGE, which are used to measure how good a translation is. The general idea is that you pick one translation as the "gold standard" and then compare the n-grams in the gold standard and in the automatic translation for overlap.
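To make the n-gram overlap idea concrete, here's a rough Python sketch of clipped n-gram precision, which is the core of BLEU-style scoring. It's a simplification for illustration only: whitespace tokenization, a single reference, unigrams and bigrams, no brevity penalty or smoothing, and the example sentences and function names are made up.

```python
# Minimal sketch of BLEU-style clipped n-gram precision (illustrative, not the full metric).
from collections import Counter

def ngrams(tokens, n):
    # Count all n-grams of length n in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_precision(candidate, reference, n):
    # Fraction of candidate n-grams that also appear in the reference,
    # clipped so repeated n-grams can't be over-counted.
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    if not cand:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / sum(cand.values())

reference = "the cat sat on the mat"   # the chosen "gold standard" translation
candidate = "the cat is on the mat"    # the automatic translation being scored
for n in (1, 2):
    print(f"{n}-gram precision: {ngram_precision(candidate, reference, n):.2f}")
```

Note that the score only says how much the candidate's surface form overlaps with one particular reference; a perfectly good translation that phrases things differently can still score badly.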
It's a very crude and imperfect measure, and it's one reason why, in practice, we have no idea how good automatic translations really are, especially now that translation systems are deployed and running every day on millions of input and output texts that nobody can reasonably be expected to evaluate.
In any case, just because there's no fixed opinion on what makes a good translation doesn't mean an LLM can't be used to produce one. It probably will be. Assuming I now understand your point better, I agree that this is going to cause trouble down the line.