
So what? Plenty of model trainers are fine-tuning to remove all the alignment and bias crap anyway. The concept of a 'jailbreak' doesn't really apply to a freely distributed open-source model.
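(For concreteness, a minimal sketch of what that kind of fine-tuning looks like in practice, assuming the Hugging Face transformers and peft libraries; the checkpoint name is a placeholder, not any particular trainer's setup:)

    # Sketch only: attach LoRA adapters to an open-weight checkpoint for
    # supervised fine-tuning. The model name below is a placeholder.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_name = "meta-llama/Llama-2-7b-hf"  # any open-weight causal LM
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

    # LoRA freezes the base weights and trains only small adapter matrices,
    # which is why re-tuning behaviour out of a released checkpoint is cheap.
    config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                        lora_dropout=0.05, task_type="CAUSAL_LM")
    model = get_peft_model(model, config)
    model.print_trainable_parameters()

    # From here, an ordinary supervised training loop (e.g. transformers.Trainer)
    # on whatever dataset the trainer chooses updates just the adapter weights;
    # nothing in the checkpoint format prevents it.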



The point is that model alignment ensures the model's security and safety, especially in public-facing applications. If alignment is removed through fine-tuning, the model becomes unsafe for use in such contexts.


If ensuring that kind of "safety" (a massive misuse of the word) were an actual security concern, then neither the original, "aligned", un-fine-tuned model nor a model with an outboard "jailbreak detector" would ever be reliable enough for the word "ensure" to apply, and no sane person would deploy either of them at all. The "alignment" technologies do not work, and nobody knows how to make them work.
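(Purely for illustration, a sketch of the kind of outboard detector being referred to: a separate classifier that screens prompts before they reach the main model. The checkpoint name, label scheme, and helper names are made-up placeholders:)

    # Hypothetical outboard "jailbreak detector": a separate classifier pass
    # in front of the main model. Checkpoint name and labels are placeholders.
    from transformers import pipeline

    detector = pipeline("text-classification", model="example-org/jailbreak-guard")

    def guarded_generate(prompt, generate):
        verdict = detector(prompt)[0]  # e.g. {"label": "JAILBREAK", "score": 0.97}
        if verdict["label"] == "JAILBREAK" and verdict["score"] > 0.9:
            return "Request refused."
        return generate(prompt)

    # The verdict is a statistical guess over the prompt text, which is the
    # point above: it can be evaded, so "ensure" never applies.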



