The term 'GPT' was first used in the BERT paper to refer to the Generative Pre-trained Transformer. [0]
> The fine-tuning approach, such as the Generative Pre-trained Transformer (OpenAI GPT) (Radford et al., 2018), introduces minimal task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pre-trained parameters.
Yes, but the paper you link to never actually calls it GPT or Generative Pre-trained Transformer. It talks about training Transformers with generative pre-training, both of which were pre-existing concepts by that point.
I also looked on the OpenAI website in Sep 2018 and could find no reference to GPT or Generative Pretrained Transformers, so I think OP might be right about BERT using it first.
[0] https://arxiv.org/abs/1810.04805