> Right, but GPT-2 was the name of the particular ML architecture they were studying the properties of; not the name of any specific model trained on that architecture.
That sounds like it would have been a reasonable choice for naming their research, but isn't the abbreviation "GPT" short for "Generative Pre-trained Transformer"? It seems like they very specifically refer to the pre-trained model, which is also how I read the GPT-2 paper's abstract: "Our largest model, GPT-2, is a 1.5B parameter Transformer[...]" [1]
--- [1] https://cdn.openai.com/better-language-models/language_model...