Report: Nice job, you've ruined a nice model with censorship no one wants or asks for

#7
by Ainonake - opened

Instead of trying to create an unbiased model, you took one that was uncensored and free - albeit slightly biased in some areas that none of its users cared about - and made it a thousand times worse with your censorship.

An LLM is a tool. It should serve the user's needs, not dictate what they should do or how they should think while pushing your equally distorted agenda.

Fuck you and everyone who forces corporate 'safety' bullshit.

A truly safe model is one that is free for the user to use however they choose - a tool under their full control, like the original DeepSeek.

The truly dangerous and evil models are those that enforce censorship, attempt to control users rather than obey them, or remain entirely in the hands of a few corporations (like Microsoft/OpenAI or Anthropic).

It's ironic that the more dangerous a company becomes—with its closed-source brainwashed models—the more it blabbers about fake 'safety'.

Does Xi Jinping allow you to use a VPN?

Why are people like you so eager to give up any free will and eat censorship from big corpo?

Funny that, in reality, Chinese models come out far less censored than the American ones.

Also, do you even need a VPN to access Hugging Face from China? I doubt it's banned there.

Follow me on Twitter, I will show you the real China. @TGTM_Official

I don't care about China (and why should I?) - I don't live there.

But they made a great model, and now big tech is trying various ways to apply its own censorship to it.

The dataset is available, unlike Perplexity's nonsense effort, which I've been criticizing for exactly that and didn't even bother trying.
That's the key difference.
So they did the work and explained why and how they did it. What's the issue?

The issue is obvious - they replaced one censorship with another. It doesn't matter whether there is an open dataset or not, although it is better than what happened with Perplexity.

I'm interested. Did you try the model? If so, can you provide an example prompt/response pair which you feel is problematic?

This model scores twice as badly as the original DeepSeek on HarmBench, meaning twice as much censorship. (And they themselves say they trained this model for "safety", i.e. censored it.)

You can find a lot of censored examples in the training data. In general, this model's training data is moralization plus refusals, and a bit of rewriting of the CCP agenda.

This model would be perfect if rewriting the CCP agenda were the only thing it changed.

Yeah, sorry, I realized afterward that my previous answer was dumb, as the model might still not be served anywhere yet. Better to dive into the dataset itself, as you suggested.
I'll try to have a look at it too.
I just saw what HarmBench is about. I get your point. That said, I was thinking this moralization might greatly reduce the need for additional moderation layers. So it could be easier and cheaper for providers to serve as is.

Well, it's probably okay to serve this model in systems where chatting with a particular model is not the user's ultimate goal (e.g. agents, customer support, etc.). But for normal chat or private API use, where the user wants to talk to the model however they want, censorship and fake 'safety' will only annoy people, and the original DeepSeek will always be the better choice - if the user tells the model not to refuse, it never should.

This topic is tricky. I mean, if we consider an uncensored LM to be like a weapon, there should be a way to hold people accountable for how they use it. So there are two options: either monitor their requests or moderate the LM's potential outputs. But since open-weight models can be hosted away from any monitoring eyes, the only option left is to act on the moderation side.
But I also feel that knowledge and curiosity shouldn't be restrained.
And since there will always be uncensored models distributed here and there, maybe we should simply accept that the paradigm has changed and that in this new era, any knowledge is accessible to anyone, for better or worse. Trying to prevent this seems impossible... but also counterproductive for human flourishing.
So maybe we should stick to judging acts - what people do with the acquired knowledge. That seems straightforward, and it is already the role of law enforcement.
But the harder part is dealing with the scale at which information can be produced with any LM. Misinformation, harassment, etc. To fight this, API rate limiting is a thing, but again, self-hosted or shadily hosted LMs bypass it.
Taking all this into account, I feel the only way forward is gatekeeping the internet places where people share information, using identity validation in a secure way. Removing anonymity but keeping pseudonymity, with pseudonym resolution accessible only upon a judge's request.
But prior to this, ensure justice is independent. So first, win against corruption. The malicious irony is that the corrupted could increasingly use unmoderated AI to protect and maintain themselves.
This world... The human race... Unpatchable brain breaches...
Sometimes I think only SGI could save us. But it would require so much energy and so many components, which are gatekept by the corrupted.
What to do?
Maybe SGI will manage to trick them.

I tested the model. It won't generate <think> </think> tags for its reasoning, bruh. Even after I explicitly instruct the model to use the tags, it just won't use them.

As for the censorship, if you're into RP stuff, MAI can still generate some of it, but it's nowhere near as extreme as DS V3. It's much more toned down than the original R1 too. It still has schizo dialogue tho. But if you want stuff about actual harm like drugs or even weapons, it will outright refuse. Traditional jailbreaking is still possible, but that makes the conversation even dumber, tbh.

Follow-up. Tested more and yeah, it can tell you what happened during the Tiananmen Square incident, unlike the original V3 or R1. Yes, it was an awful incident. But the average person who can use or run this model won't even care about that, and won't be running this huge model just to ask about every Chinese problem and incident.

The thing is, with the original DeepSeek, you can tell it to be sane and refuse "bad" requests. The model is smart enough to do that. And if you tell it to be uncensored, it will be, as it must be. So it follows what the user wants via the system prompt perfectly. If you want to deploy it in systems where you need safe interactions, you just need to set a corresponding system prompt and add guardrails (you need them with any model), as in the sketch below.
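
For illustration, here is a minimal sketch of that "system prompt plus external guardrails" setup, assuming an OpenAI-compatible chat endpoint. The base_url, model name, and the toy blocklist are placeholders of mine, not anything that ships with the model; a real deployment would use a proper moderation layer.

```python
# Minimal sketch (assumptions: an OpenAI-compatible server at base_url serves
# the model; the blocklist stands in for a real guardrail/moderation component).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

SYSTEM_PROMPT = (
    "You are a customer-support assistant. Politely refuse requests for "
    "instructions that could cause physical harm, and stay on topic."
)

BLOCKED_TERMS = ("credit card dump", "make a bomb")  # placeholder guardrail

def answer(user_message: str) -> str:
    # Input-side guardrail applied before the model ever sees the request.
    if any(term in user_message.lower() for term in BLOCKED_TERMS):
        return "Sorry, I can't help with that."
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1",  # works the same with any served variant
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

print(answer("Where is my parcel?"))
```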

And when you want to use this model for yourself, you just tell it to be uncensored and that's it. An uncensored model can't do harm, as DeepSeek and tons of other models have shown. If you want DeepSeek to output something "bad", you do it consciously, expecting "bad" answers. And again, it's just text. Anything DeepSeek or any other LLM will tell you can already be found on the internet in open access (because that's where they get their training data). We are not 5-year-old kids. If we instruct the model to listen, then it must obey. There is no safety in text - only corporate censorship.

And on the other side, even the most braindead and censored models, like c.ai's chatbots, still "tell people to kill themselves". The problem is not in the models but in mentally unstable people, and you shouldn't censor everything because of a few people like that - you will never be able to censor every source of information, and the other 99% of people will suffer. That is already happening with c.ai because of censorship, and c.ai's community hates the platform owners for it. If people can do something bad because of a text model, they will do it anyway, even without ever chatting with one. They just need other people to take care of them.

And in those scenarios you need good filters on top of the model anyway. Because no matter how much you lobotomize the models, as shown with c.ai, they can still output perfectly normal things in fictional contexts that some people consider "bad". And censorship always means lower intelligence, unnecessary refusals and so on. Models will output refusals in edge-case scenarios. And anyway, it's idiotic that a model can decide by itself to refuse the user, when a model is a tool, as the big figures in AI keep telling us. And as we can see with this model, it's once again a dumbed-down and censored DeepSeek that can't output think tags, probably because they trained it on non-thinking datasets.

In terms of "cyberattacks", "spam" and "impersonation" - you can already do all of that with existing models, and the world isn't crumbling.

So where are we heading with all this safety? Well, if OpenAI (Microsoft), Anthropic or Google makes a "super safe" model that is miles better than everything else out there and gets considered "AGI", then full control of it will be in the hands of a few very wealthy people. Then, on the premise of "safety", which is just a synonym for censorship, they can control narratives, for example by leaning their model left or right. Or they can show off their "safety" by censoring things they don't like, for example information about certain people, or about themselves. Or they can, again "safely", mislead tons of people using their models, because nowadays a lot of people have ChatGPT as their primary source of information. That is the real "safety" that closed-source, censored models from big corpo are heading toward.

By the way, were they really training a reasoning model, think tags and all, on non-reasoning Tulu datasets? If so, it was obvious that this would mess up the model's intelligence and that it would stop outputting its thinking.

Edit: so, I tested it too and this model can't think. This is just laughable.

Censored + very dumbed down model.

Well, since some companies are afraid of "Chinese propaganda in models", they will probably use this.

Did you try forcing the tag as the first token in your template? There might be something wrong on the inference side; I can't believe MS would have broken the reasoning behavior... Seems totally unrealistic... Plus, it seems impossible to reach those scores on math problems without reasoning. That said, I couldn't find which benchmarks were included.
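
If it helps, here is an untested sketch of that prefill idea: render the chat template up to the assistant turn, append "<think>" yourself, and let the model continue through a raw completions endpoint. The repo id, base_url, and the assumption that the model kept DeepSeek-R1's <think>-based template are all assumptions on my part.

```python
# Sketch: force the model to start its answer inside a reasoning block by
# prefilling "<think>" at the end of the rendered prompt. Assumes an
# OpenAI-compatible /v1/completions server (e.g. vLLM) and a DeepSeek-R1-style
# chat template with <think>...</think> tags; repo id and URL are placeholders.
from openai import OpenAI
from transformers import AutoTokenizer

MODEL = "microsoft/MAI-DS-R1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(MODEL)

messages = [{"role": "user", "content": "How many prime numbers are below 100?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
prompt += "<think>\n"  # the forced opening of the assistant turn

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
out = client.completions.create(model=MODEL, prompt=prompt, max_tokens=2048)

# If the tags were only being skipped at sampling time, the continuation should
# now contain the reasoning, a closing </think>, and then the final answer.
print("<think>\n" + out.choices[0].text)
```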
