Firstly, "R1 14b quantized"? You mean a quantized DeepSeek-R1-Distill-Qwen-14B? That's Qwen 2.5, not DeepSeek V3. Surely they didn't finetune Qwen to add more censorship.
Secondly, most of the censorship is a filter added on top of the model when run through chat.deepseek.com (and I've no idea about the system prompt); it is only partially due to the actual model's training data.
Also, I'd rather people didn't paste huge blocks of text into HN comments.
> Firstly, "R1 14b quantized"? You mean a quantized DeepSeek-R1-Distill-Qwen-14B? That's Qwen 2.5, not DeepSeek V3. Surely they didn't finetune Qwen to add more censorship.
As far as I know, the distillation process transfers 'knowledge' from the larger model to the smaller one. I could be wrong, but it seems easy enough to use this technique to distribute knowledge about Tiananmen Square.
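As a toy sketch of what distillation optimises (purely illustrative, not DeepSeek's actual training code): the student is trained to match the teacher's output distribution, so whatever the teacher knows, or refuses to say, tends to come along for the ride.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student's distribution q is from the teacher's p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical teacher (large model) and student (small model) logits
# for the same prompt; the numbers are made up for illustration.
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 1.0]

# Distillation minimises the divergence between the teacher's softened
# output distribution and the student's, nudging the student to imitate
# the teacher token by token.
t = 2.0  # temperature softens both distributions
loss = kl_divergence(softmax(teacher_logits, t), softmax(student_logits, t))
```

Minimising this loss over many prompts is the whole trick: no explicit list of facts is transferred, just the teacher's output behaviour.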
> Secondly, most of the censorship is a filter added on top of the model when run through chat.deepseek.com (and I've no idea about the system prompt); it is only partially due to the actual model's training data.
Great. I'm talking about the freely distributed model. This thread is about the freely distributed model, not the hosted version of it. Anyone can put any layer of censorship in front of a hosted model. The actual open-source model does not seem to be doing the censoring. Luckily, you or I can download the original, undistilled model and run it locally to verify that it will still talk about the same subjects. It will.
Yeah, on their official site it is blocked (ask anything in their chat about the events and it will just stop mid-sentence unless you mask the input in some way), but I don't think this is a thing intrinsic to the model (some of those censorships are; I've seen them in some videos). Censorship built directly into an LLM only works if the training data is mostly controlled by the censor. These models depend heavily on publicly available English web data that is not censored (and checking all that text is not an easy feat), so it tends to be a mix of light censorship and interface-based control.
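The interface-based control described above can be as crude as a blocklist wrapped around the model's output stream, with no retraining at all. A minimal sketch (the blocklist contents and cut-off behaviour are my guesses, not anything confirmed about DeepSeek's site):

```python
# Toy sketch of interface-level censorship: a filter sitting outside the
# model weights, applied to the token stream as it is generated.
BLOCKED = {"tiananmen"}  # illustrative blocklist

def filter_stream(tokens):
    """Pass tokens through until a blocked term appears, then stop,
    which would produce the mid-sentence cut-off users report."""
    out = []
    for tok in tokens:
        if tok.lower().strip(".,?!") in BLOCKED:
            break
        out.append(tok)
    return " ".join(out)

print(filter_stream("The 1989 protests in Tiananmen Square were".split()))
# prints "The 1989 protests in"
```

A filter like this also explains why masking the input (or output) defeats it: the underlying model never refused in the first place.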
> Firstly, "R1 14b quantized"? You mean a quantized DeepSeek-R1-Distill-Qwen-14B? That's Qwen 2.5, not DeepSeek V3. Surely they didn't finetune Qwen to add more censorship.
Qwen is a model from Alibaba. The whole stack is corporate Chinese.