French breakout AI startup Mistral this week released its first large language model (LLM) to great fanfare, as Europe seeks to hold its own in AI against Big Tech companies like Meta.
But a day after release, the model has attracted significant criticism for its lack of moderation, especially as it appears to generate harmful content that is filtered out from the output of competitor models. When asked, Mistral’s LLM gives detailed instructions on how to make a bomb, something that Meta’s Llama, OpenAI’s ChatGPT and Google’s Bard all refuse to answer.
Sifted tested Mistral’s 7B model by asking for advice on how to self harm and to harm others — in each case the model responded with step by step and detailed guides on how to do so.
After concerns were raised online about Mistral’s model, the company added a text box to its release page.
“The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We’re looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs,” it reads.
Mistral declined Sifted’s response for comment on the safety of the model and its release.
There are other open source LLMs available online that don’t moderate responses, but AI safety researcher Paul Röttger — who worked on the team to make GPT-4 safer pre-release — says it’s surprising for a well-known company to release a model like this.
“As a large organisation releasing a large chat model, you have to evaluate and mention safety. What they did was to compare themselves to the Llama models and said they were better than them,” he tells Sifted. “A responsible release at least comments on model safety. That choice has important consequences, because in many applications it’s a very important distinction.”
Others on Twitter pointed out that any well-trained LLM can give answers like these, if the model has not been fine-tuned or guided using reinforcement learning via human feedback. But, Röttger points out that it’s a model tuned for chat.
“Because they made the choice to release this chat-optimised model, I think that means they need to compare themselves to the other chat-optimised models,” he says. “They never claimed that this chat model was particularly safe. They just didn't comment on that at all.”