Weird, every time I try asking what happened at Tiananmen Square, or why Xi is a...

JimDabell · on June 7, 2024

The alignment they’ve put in seems very weak. Very minor variations in the way you ask (i.e. not intentionally attempting to bypass the alignment) can result in it saying that it can’t comment on political topics; that a massacre happened; that controversial protests happened with loss of life; descriptions of how the Chinese government and the West disagree on how it is characterised; or that the Chinese government censors the topic. Sometimes it’s a mixture; for instance, it will sometimes tell you in a single response that there was a massacre by the Chinese government, that the Chinese government censors the information, and that the rest of the world sees it differently.

If the goal was to censor the topic, they’ve done a bad job. Seems more likely to me they put in minimal effort to pay lip service to the rules.

riku_iki · on June 6, 2024

I am wondering if such a moderated model could introduce a significant security risk; for example, it could generate exploitable code, or try to trigger some action based on some specific input.

brandall10 · on June 6, 2024

The 7B model gives a detailed and accurate account when run locally. Pretty sure this is just incidental load issues w/ Hugging Face.

lIIllIIllIIllII · on June 6, 2024

I set the system prompt to try to avoid censorship by substituting etc. etc. It didn't listen: it started generating a response and got as far as this before suddenly, [ERROR] :-)

>I'm sorry for any confusion, but it seems there might be some mix-up in your query. The year 1979 does not have a widely recognized event matching your description, and "TS" and "Cydonia" do not correspond to any known geopolitical locations or events in historical records. "Cydonia" is often associated with a region on Mars that has been subject to various conspiracy theories, but it doesn't relate to any political events or protests.

>If you're referring to a specific historical event, it could possibly be the Tian