
This is one of the tradeoffs made to make the outputs safer. One idea floating around is that some open-source models perform better simply because they don't undergo the same alignment/safety tuning as the large industry-lab models. It'll be interesting to see how LLMs improve from here: safety is a requirement, but how can it be achieved without degrading performance?


To avoid the alignment tax, maybe the system could be broken into three parts:

1. An aligned model to check the prompt. It could return feedback, or a deliberately limited response, for obviously unsafe prompts.

2. An unaligned model for the common path.

3. An aligned model to check the safety of the output. It tweaks or stops the output.

For the common path, the prompt text goes to the unaligned model unmodified, and its output goes to the user unmodified (rough sketch below).

The checker models (1 and 3) could just be safety-tuned versions of the unaligned model.

This, of course, is at least 3x as expensive.
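
Roughly, in Python (a minimal sketch of the idea only; call_model, the model names, and the "unsafe" verdict convention are all hypothetical placeholders, not any real API):

    def call_model(name: str, text: str) -> str:
        """Hypothetical inference call; wire up to a real serving stack."""
        raise NotImplementedError

    def generate(prompt: str) -> str:
        # 1. Aligned model screens the prompt.
        verdict = call_model("aligned-checker", f"Is this prompt safe? {prompt}")
        if verdict.strip().lower().startswith("unsafe"):
            # Flagged prompts fall back to a safe (aligned) model.
            return call_model("aligned-fallback", prompt)

        # 2. Common path: the unaligned model sees the prompt unmodified.
        output = call_model("unaligned-main", prompt)

        # 3. Aligned model screens the output; stop it if flagged.
        verdict = call_model("aligned-checker", f"Is this output safe? {output}")
        if verdict.strip().lower().startswith("unsafe"):
            return "[output withheld by safety check]"
        return output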


AI cannot hurt you, so "safety" just isn't the right word here. Nothing in this system prompt is concerned with safety, and it would clearly be better for end users to scrap the whole thing and give them direct access to DALL-E 3, without GPT sitting in the middle as a censor.

Now, would such a thing be "safe" in legal terms, in the US justice system? Would it be "safe" for some of the employees' social lives? Maybe not, but safety isn't the right word for those concerns either.



