ChatGPT is trained using a combination of supervised and unsupervised learning. For supervised learning, it is trained on a large dataset of human-generated text, such as dialogue data or online conversations. This allows it to learn the structure and style of natural language. For unsupervised learning, it is trained using a language modeling objective, which involves predicting the next word in a sequence of text. This allows it to learn the broader patterns and characteristics of language, and to generate text that is fluent and coherent.
ChatGPT and GPT-3 are both large language models trained by OpenAI, but they have some important differences. GPT-3 is a more general-purpose language model, which means it is trained on a broader range of data and can generate a wider range of responses. It is also much larger than ChatGPT, with 175 billion parameters compared to ChatGPT's 2.6 billion parameters. This makes GPT-3 more powerful and capable of generating more realistic and diverse text, but also makes it more expensive and resource-intensive to use.
In case you are curious, the above information was written entirely by ChatGPT when asked about itself.
ChatGPT is based on the InstructGPT weights, which are themselves based on the GPT-3 weights. As far as we can tell, it has roughly the same number of parameters as GPT-3.
The GPT-3 weights were obtained through unsupervised pre-training (hence GPT: generative pre-training): maximizing the likelihood the model assigns to the next word across a large dataset of human text.
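To make that objective concrete, here is a minimal sketch of next-word-prediction training in PyTorch. The tiny model, vocabulary size, and random token sequence are made up for illustration; the real GPT-3 uses a large transformer decoder trained on web-scale text.

```python
# Minimal sketch of the generative pre-training objective (next-token prediction).
# Toy sizes and random tokens stand in for a real transformer and real text.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32           # toy sizes, nothing like GPT-3's
model = nn.Sequential(                    # stand-in for a transformer decoder
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)

tokens = torch.randint(0, vocab_size, (1, 16))    # one fake token sequence
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict each next token

logits = model(inputs)                            # (batch, seq, vocab)
# Maximizing the likelihood of the next word == minimizing cross-entropy.
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size),
                                   targets.reshape(-1))
loss.backward()                                   # gradients for one update step
```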
The InstructGPT weights were obtained with supervised fine-tuning (SFT): the model generates text, and a human demonstrates a better completion (as described in the InstructGPT paper). Then, humans were also asked to rank multiple generated outputs, and those rankings were used as the supervised training target of a separate reward model. That small amount of ranked data unlocked the ability to score a much larger amount of generations through reinforcement learning with proximal policy optimization (PPO): the model generates an output, the reward model rates it, and the model weights are updated to achieve a higher reward.
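The sketch below illustrates the two losses involved, with made-up toy models and data: a pairwise ranking loss that teaches the reward model to score the human-preferred completion higher, and a clipped PPO-style update that pushes the policy toward higher-reward outputs. A real implementation (e.g. InstructGPT) works on full text sequences and adds a KL penalty, a value function, and many other details omitted here.

```python
# Simplified sketch of reward-model training and a PPO-style policy update.
# All models, shapes, and data are toy stand-ins for illustration only.
import torch
import torch.nn as nn

vocab_size, embed_dim, seq_len = 100, 32, 16

# 1) Reward model: trained on human rankings via a pairwise loss --
#    the preferred completion should receive a higher scalar score.
reward_model = nn.Sequential(nn.Embedding(vocab_size, embed_dim),
                             nn.Flatten(),
                             nn.Linear(seq_len * embed_dim, 1))
preferred = torch.randint(0, vocab_size, (4, seq_len))   # fake "better" outputs
rejected = torch.randint(0, vocab_size, (4, seq_len))    # fake "worse" outputs
ranking_loss = -nn.functional.logsigmoid(
    reward_model(preferred) - reward_model(rejected)).mean()
ranking_loss.backward()

# 2) PPO-style update: the policy generated `actions`, the (frozen) reward model
#    scored them, and the clipped objective nudges the policy toward higher
#    reward without straying too far from the old policy.
policy = nn.Sequential(nn.Embedding(vocab_size, embed_dim),
                       nn.Linear(embed_dim, vocab_size))
states = torch.randint(0, vocab_size, (4, seq_len))      # fake contexts
actions = torch.randint(0, vocab_size, (4, seq_len))     # fake sampled tokens
with torch.no_grad():
    advantages = reward_model(actions)                    # crude advantage stand-in
    old_logp = policy(states).log_softmax(-1).gather(
        -1, actions.unsqueeze(-1)).squeeze(-1)

new_logp = policy(states).log_softmax(-1).gather(
    -1, actions.unsqueeze(-1)).squeeze(-1)
ratio = (new_logp - old_logp).exp()                       # (4, seq_len)
ppo_loss = -torch.min(ratio * advantages,
                      ratio.clamp(0.8, 1.2) * advantages).mean()
ppo_loss.backward()
```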
The ChatGPT beta weights were obtained by doing that again, but asking the humans to make the completions conversational. Since they could only pay a few humans, they opened the beta to ask a much wider range of people to provide SFT data through the feedback feature, which will be used to train the final ChatGPT weights.
So, all in all, the parameter estimation is incorrect, the order of the training steps is not right, the description of the purpose of the supervised learning step is wrong, the defining part of the ChatGPT training process is not mentioned (because the InstructGPT paper came out in 2022, after the knowledge cut-off), and the description of the difference with GPT-3 is misleading.