
It's kind of interesting if you view this as part of RLHF:

By running the system prompt through the model and collecting the model's responses along with user signals, Anthropic can use that data for RLHF to actually "internalize" the system prompt's behavior in the model, without needing to specify it explicitly in the future.

Over time, as the model gets better at following this "internal system prompt" embedded in its weights/activation space, the explicit system prompt can be shortened.
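As a rough illustration of what that data-collection step might look like, here's a minimal sketch of turning logged interactions into RLHF preference pairs where the system prompt is deliberately dropped from the training input, so the rewarded behavior gets pushed into the weights (sometimes called context distillation). All names and the data shape are hypothetical, not Anthropic's actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    # Hypothetical log record; fields are illustrative, not a real API.
    system_prompt: str
    user_msg: str
    response: str
    user_signal: float  # e.g. +1.0 thumbs-up, -1.0 thumbs-down

def build_preference_pairs(logs):
    """Group interactions by user message and pair higher-rated responses
    against lower-rated ones. The system prompt is intentionally omitted
    from the training prompt, so preference tuning must internalize its
    behavior rather than rely on seeing it in context."""
    by_msg = {}
    for it in logs:
        by_msg.setdefault(it.user_msg, []).append(it)

    pairs = []
    for msg, items in by_msg.items():
        items.sort(key=lambda i: i.user_signal, reverse=True)
        for better, worse in zip(items, items[1:]):
            if better.user_signal > worse.user_signal:
                pairs.append({
                    "prompt": msg,  # note: no system prompt here
                    "chosen": better.response,
                    "rejected": worse.response,
                })
    return pairs
```

The prompt/chosen/rejected layout is the usual shape for preference-based fine-tuning; the key design choice is stripping the system prompt from `"prompt"`, which is what would let a shorter explicit prompt suffice later.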





