
> The language model could have "hallucinated" its own system prompt instructions, leaving no guarantee that this is the real deal.

How would you detect this? I always wonder about this whenever I see a 'jailbreak' or similar for an LLM...

In this case it’s easy: get the model to output its own system prompt and then compare to the published (authoritative) version.
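
For anyone who wants to make that comparison concrete, here's a minimal sketch (the sample prompt strings below are made up; in practice `claimed` would come from asking the model to repeat its system prompt, and `published` from the vendor's published version):

    import difflib

    # Hypothetical example texts, stand-ins for the real prompts.
    published = "You are a helpful assistant.\nAnswer concisely."
    claimed = "You are a helpful assistant.\nAnswer very concisely."

    # Line-by-line diff between the published prompt and what the
    # model says its system prompt is.
    diff = difflib.unified_diff(
        published.splitlines(),
        claimed.splitlines(),
        fromfile="published",
        tofile="model_output",
        lineterm="",
    )
    print("\n".join(diff))

An empty diff doesn't prove the model isn't hallucinating, but a close match to the authoritative text is strong evidence it isn't.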

The actual system prompt, the “public” version, and whatever the model outputs could all be fairly different from each other though.
