Every single model does/did this. Initially, fine-tuning required expensive hand-labeled outputs for RLHF. Generating your training data from an already-aligned model instead inherently encodes that model's learned distribution and improves performance, hence why some models would call themselves ChatGPT despite not being OpenAI models.
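
For the curious, a minimal sketch of what that distillation step looks like in practice: sample responses from a teacher model and save (prompt, response) pairs for fine-tuning. The seed prompts, output filename, and teacher model name are placeholders; this assumes the OpenAI Python SDK's chat completions API.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

seed_prompts = [
    "Explain RLHF in two sentences.",
    "Write a haiku about gradient descent.",
]  # in practice, thousands of diverse prompts

with open("distilled_pairs.jsonl", "w") as f:
    for prompt in seed_prompts:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # hypothetical teacher model
            messages=[{"role": "user", "content": prompt}],
        )
        # Each pair implicitly carries the teacher's learned output
        # distribution -- including quirks like self-identifying as ChatGPT.
        f.write(json.dumps({
            "prompt": prompt,
            "response": resp.choices[0].message.content,
        }) + "\n")
```

Fine-tune a student model on that JSONL and it inherits the teacher's style and biases, no hand labeling needed.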