Every single model does/did this. Initially, fine-tuning required expensive hand-labeled outputs for RLHF. Generating your training data from an already-aligned model instead inherently encodes that model's learned distribution and improves performance, hence why some models would call themselves ChatGPT despite not being OpenAI models.
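
For the curious, a minimal sketch of what that distillation step looks like in practice: sample responses from a teacher model and save (prompt, response) pairs for fine-tuning. The seed prompts, output filename, and teacher model name are placeholders; this assumes the OpenAI Python SDK's chat completions API.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

seed_prompts = [
    "Explain RLHF in two sentences.",
    "Write a haiku about gradient descent.",
]  # in practice, thousands of diverse prompts

with open("distilled_pairs.jsonl", "w") as f:
    for prompt in seed_prompts:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # hypothetical teacher model
            messages=[{"role": "user", "content": prompt}],
        )
        # Each pair implicitly carries the teacher's learned output
        # distribution -- including quirks like self-identifying as ChatGPT.
        f.write(json.dumps({
            "prompt": prompt,
            "response": resp.choices[0].message.content,
        }) + "\n")
```

Fine-tune a student model on that JSONL and it inherits the teacher's style and biases, no hand labeling needed.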