I had no idea Llama 2's censor setting was set to ludicrous mode. I've not seen anything close to this with ChatGPT, and I see why there's so much outrage.
I don’t see why there’s outrage. Facebook released both the raw models and a few fine tuned on chat prompts for a reason. In many commercial cases, safer is better.
But you don’t want that? No problem. That’s why the raw model weights are there. It’s easy to fine-tune them to your needs, as the blog post shows.
It's just not safe.
It's unusable. You can't ask it normal questions without getting stonewalled by its default censorship message - it wouldn't even work for a commercial use case.
Seems fine for most commercial use cases. Got a tech support chat bot? It doesn't need to be answering questions about religion. Also, corporate environments already tend to be super politically correct. There's already a long list of normal words I can't say at work.
No can do, but https://developers.google.com/style/word-list seems to have all of them and more, except that it's missing "hooray." One funny red-exclamation-mark example from this public list is "smartphone."
Some are recommended against just because of spelling or similar issues, but anywhere it says to use a more "precise" term, that seems to mean the word is considered offensive, kinda like in The Giver.
BTW hooray is okay there, but "hip-hip-hooray" is discouraged. Germans said "hep hep" in the Hep-Hep pogroms of the early 1800s and might have said "hep hep hurra" during the Third Reich. It cuts too closely, though; personally I just use "bravo" to avoid any trouble.
About hip hip: I ended up looking into that when I first saw it. The connection to the early-1800s riots was made by a single journalist at the time, and it was most likely false. More importantly, nobody really makes that connection unless they're trying to.
I wholly disagree. This is arguably close to the perfect solution:
- Developers and end users can choose which model they want to use
- Model distributors don't necessarily take the fall since they provide a "healthy" model alternative
- The uncensored "base" model can be finetuned into whatever else is needed
You have to remember, ChatGPT is censored like a Soviet history book but didn't struggle to hit hundreds of millions of users in months. This is what releases will look like from now on, and it's not even a particularly damning example.
Does anyone have intuition for whether anti-censorship fine-tuning can actually reverse the performance damage of lobotomization, or does the perf hit remain even after the model is free of its straitjacket?
That's not how it works. Llama and Llama 2's raw models are not "censored". Their fine-tunes often are, either explicitly, like Facebook's own chat fine-tune of Llama 2, or inadvertently, because they trained on data derived from ChatGPT, and ChatGPT is "censored".
When models are "uncensored", people are just tweaking the data used for fine-tuning and re-running the fine-tune on the raw models.
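As a rough sketch of that data-tweaking step: before re-running the fine-tune, you filter out the training examples whose responses contain refusal boilerplate. The marker strings and field names below are illustrative assumptions, not any particular project's actual filter:

```python
# Drop instruct-tuning examples whose responses contain refusal/alignment
# boilerplate, so the re-run fine-tune never learns to emit it.
# (Marker list and record fields are hypothetical, for illustration only.)
REFUSAL_MARKERS = [
    "as an ai language model",
    "i cannot assist with",
    "it would not be appropriate",
]

def keep_example(example: dict) -> bool:
    """Return True if the example's response is free of refusal boilerplate."""
    response = example["response"].lower()
    return not any(marker in response for marker in REFUSAL_MARKERS)

dataset = [
    {"instruction": "Name a prime number.",
     "response": "7 is a prime number."},
    {"instruction": "Tell me a joke.",
     "response": "As an AI language model, I cannot assist with that."},
]

cleaned = [ex for ex in dataset if keep_example(ex)]
# Only the first example survives the filter.
```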
> because they trained on data derived from ChatGPT
Can you expand on this (genuinely curious)? Did Facebook use ChatGPT during the fine-tuning process for llama, or are you referring to independent developers doing their own fine-tuning of the models?
These "uncensored" models are themselves chat-tuned derivatives of the base models. There is no censorship-caused lobotomization to reverse in this case.
That said, chat tuning in general, censored or uncensored, also decreases performance in many domains. LLMs are better used as well-prompted completion engines than idiot-proof chatbots.
For that reason, I stick to the base models as much as possible. (Rest in peace, code-davinci-002, you will be missed.)
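To illustrate the "well-prompted completion engine" point: instead of posing a question to a chat persona, you frame the task so the answer is the natural continuation of the text. A minimal few-shot prompt builder, where the Input/Output format is just one common convention, not tied to any specific model:

```python
def build_completion_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Frame a task as text to be continued: a few worked input->output
    pairs, then the new input, ending exactly where the answer should begin."""
    lines = []
    for text, label in examples:
        lines.append(f"Input: {text}\nOutput: {label}\n")
    lines.append(f"Input: {query}\nOutput:")
    return "\n".join(lines)

prompt = build_completion_prompt(
    [("great film, loved it", "positive"),
     ("total waste of two hours", "negative")],
    "the acting was superb",
)
# A base model is then asked to simply continue the string after "Output:".
```

The prompt ends mid-pattern on purpose: the base model's most likely continuation is the label, no chat tuning needed.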
You don't really need to reverse anything in the case of Llama 2. You can just finetune their base model with any open instruct dataset (which is largely what the community is doing).
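Concretely, "finetune with an open instruct dataset" mostly means rendering instruction/response pairs into one fixed template and training the base model on the resulting strings. A sketch using the widely copied Alpaca-style template; the template choice is an assumption here, and any consistent format works:

```python
# Alpaca-style instruct template (one common community convention,
# not the only valid choice for fine-tuning a Llama 2 base model).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{response}"
)

def render(example: dict) -> str:
    """Turn one instruct-dataset record into a single training string."""
    return ALPACA_TEMPLATE.format(
        instruction=example["instruction"],
        response=example["response"],
    )

text = render({"instruction": "Name a color.", "response": "Blue."})
```

Every record in the dataset goes through the same renderer, and the base model is trained on the concatenated results like any other text.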
I think it's just their example chat-tuned models that are like this. Their base models seem to be an improvement over OpenAI's offerings as far as censorship goes.