
This is one of the tradeoffs made to make the outputs safer. One idea floating around is that some open-source models perform better simply because they don't undergo the same alignment/safety tuning as the large industry-lab models. It'll be interesting to see how LLMs improve from here: safety is a requirement, but how can it be achieved without degrading performance?


To avoid the alignment tax, maybe the system could be broken into three parts:

1. An aligned model to check the prompt. It could return feedback, or a deliberately limited response, for obviously unsafe prompts.

2. An unaligned model for the common path.

3. An aligned model to check the safety of the output. It tweaks or stops the output.

For the common path, the prompt text goes to the unaligned model unmodified, and its output goes to the user unmodified (rough sketch below).

The checker models (1 and 3) could just be safety-tuned versions of the unaligned model.

This, of course, is at least 3x as expensive.
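
Roughly, in Python (a minimal sketch of the idea only; call_model, the model names, and the "unsafe" verdict convention are all hypothetical placeholders, not any real API):

    def call_model(name: str, text: str) -> str:
        """Hypothetical inference call; wire up to a real serving stack."""
        raise NotImplementedError

    def generate(prompt: str) -> str:
        # 1. Aligned model screens the prompt.
        verdict = call_model("aligned-checker", f"Is this prompt safe? {prompt}")
        if verdict.strip().lower().startswith("unsafe"):
            # Flagged prompts fall back to a safe (aligned) model.
            return call_model("aligned-fallback", prompt)

        # 2. Common path: the unaligned model sees the prompt unmodified.
        output = call_model("unaligned-main", prompt)

        # 3. Aligned model screens the output; stop it if flagged.
        verdict = call_model("aligned-checker", f"Is this output safe? {output}")
        if verdict.strip().lower().startswith("unsafe"):
            return "[output withheld by safety check]"
        return output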


AI cannot hurt you, so "safety" just isn't the right word here. Nothing in this system prompt is concerned with safety, and it would clearly be better for end users to scrap the whole thing and give them direct access to DALL-E 3, without GPT sitting in the middle as a censor.

Now, would such a thing be "safe" in legal terms, in the US justice system? Would it be "safe" for some of the employees' social lives? Maybe not, but safety isn't the right word for those concerns either.



