
The term 'GPT' was first used in the BERT paper to refer to the Generative Pre-trained Transformer. [0]

> The fine-tuning approach, such as the Generative Pre-trained Transformer (OpenAI GPT) (Radford et al., 2018), introduces minimal task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pre-trained parameters.

[0] https://arxiv.org/abs/1810.04805

The BERT paper is from 11 Oct 2018, and the OpenAI paper it was referring to, "Improving Language Understanding by Generative Pre-Training", is from 11 Jun 2018: https://cdn.openai.com/research-covers/language-unsupervised...


Yes, but the paper you link to never calls it GPT or Generative Pre-trained Transformers. It talks about training Transformers with generative pre-training, both of which were pre-existing concepts at that point.

I also looked through an archived copy of the OpenAI blog from Sep 2018 and could find no reference to GPT or Generative Pre-trained Transformers, so I think the OP might be right about BERT using it first.

http://web.archive.org/web/20180923011305/https://blog.opena...
