> Few people would have thought that we can get this type of quality using just language modeling, so it's unclear how much further we can go before we have to start tackling those fundamental problems.
Yep, that seems to be the core question. This isn't going to be the road to AGI, but it's already ahead of what most people thought could be done without underlying referents. As you say, GPT-2 has some fundamental limitations.
One is 'cognitive': it doesn't have referents or make any kind of logical assessments, only symbolic ones. Even if it develops to the point of writing publication-worthy essays, they'll be syntheses of existing texts, and inventing a novel argument (except by chance) should be essentially impossible. Even cross-format transference is basically out of the question; a news story reading "the mouse ate the tiger" won't be ruled out based on dictionary entries about mice and tigers.
The other, though, is 'intentful'. As Nostalgebraist points out, GPT-2 is essentially a text predictor impersonating a text generator. That difference creates an intrinsic limitation, because GPT-2 isn't trying to create output interchangeable with its inputs. It actively moves away from "write a novel" towards "write the most common/formulaic elements of a novel". At the current level of quality, this might actually improve results; GPT is less likely to make jarring errors when emulating LoTR's walking-and-eating sections than when emulating Gandalf's death. But as it improves and is used to generate whole texts, the "most likely" model leaves it writing unsurprising, mid-corpus text indefinitely.
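To make the decoding-side point concrete, here's a minimal sketch (not anything from the original discussion) using the Hugging Face transformers library and GPT-2 weights, with a made-up prompt. Greedy decoding always takes the single most probable next token, which is the "most common/formulaic" behavior described above; sampling from the predicted distribution instead restores some of the variation of real prose.

```python
# Illustrative sketch, assuming the Hugging Face `transformers` package is installed.
# Greedy decoding = always pick the mode of the next-token distribution ("most likely" text).
# Sampling = draw from that distribution, tolerating less probable, more novel tokens.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hypothetical prompt, chosen only for illustration.
prompt = "The company of hobbits walked on through the morning,"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Greedy: deterministic, maximally "unsurprising" continuation.
greedy = model.generate(input_ids, max_length=60, do_sample=False)

# Sampling with temperature and top-k: keeps the model's distribution but stops
# collapsing onto the single most formulaic continuation.
sampled = model.generate(input_ids, max_length=60, do_sample=True,
                         temperature=0.9, top_k=50)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```

The greedy output tends to be the bland, repetitive "mid-corpus" text described above; the sampled output is livelier but also more likely to make the jarring errors that prediction-as-generation normally suppresses.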
Solving the cognitive limitation essentially requires AGI, and even adding crude referents in one domain would be quite tough. So I'll make a wild speculation that fixing the intentful limitation will be a big advance soon: training specifically for generation will produce models that more accurately recreate the structure and 'tempo' of human text, without being deterred by excess novelty.