Paris-based generative AI company Mistral has released its first large language model (LLM) — a significant milestone in Europe’s efforts to claim a piece of the GenAI market.
The company says the model outperforms comparable alternatives on the market — including Meta’s Llama 2 — and that it requires nearly 50% less computational power to run. It’s the first piece of technology the company has built since raising a €105m seed round in June.
As industry experts begin to question whether startups can compete with big tech when it comes to training large models, cofounder Arthur Mensch says Mistral's efficient model design, combined with its open source strategy, is how his company can prove the critics wrong.
“The foundational layer story isn't written yet. There's still many things to invent. And that's what we're starting doing,” he tells Sifted. “That's why we left our companies that weren't innovative enough — that’s why we started Mistral AI.”
New funding models required
Mistral isn’t disclosing how much this model cost to train, but did tell Sifted that it used around 200k GPU hours (a measure of how much computational power is consumed in AI training). To give a ballpark, NVIDIA’s latest chips cost around $2-2.50 per hour on the cloud, meaning Mistral’s model is likely to have cost $400k-500k in compute alone. The company is currently training larger models that will be more resource intensive.
GPT-4, a far larger model than Mistral’s first effort, cost more than $100m to train.
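That ballpark can be reproduced with a quick back-of-the-envelope calculation. The figures below are the article's rough numbers (~200k GPU hours, roughly $2-2.50 per GPU hour on the cloud), not quoted prices from any provider:

```python
# Back-of-the-envelope estimate of Mistral's training compute bill.
# Inputs are the article's approximate figures, not confirmed costs.
GPU_HOURS = 200_000           # reported GPU hours used in training
RATE_LOW, RATE_HIGH = 2.00, 2.50  # approximate USD per GPU hour on the cloud

cost_low = GPU_HOURS * RATE_LOW
cost_high = GPU_HOURS * RATE_HIGH

print(f"Estimated compute cost: ${cost_low:,.0f}-${cost_high:,.0f}")
# Estimated compute cost: $400,000-$500,000
```

Even at the top of that range, the figure is orders of magnitude below GPT-4's reported training cost.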
Mensch won’t reveal all of the secret sauce behind how Mistral trains more efficient models than its competitors, but he does say that his team has put a lot of effort “into the data side”, as well as into the algorithms used to train the models.
The capital-intensive process of training AI models has led some experts to ask whether using equity financing to pay for these compute costs is a smart use of dilutive funding — a question that Mensch says Mistral is also pondering.
“The economics of trading equity for compute are not great,” he says, hinting that Mistral might need to look at owning, or sharing, its own hardware resources in the future.
“A company like ours may need to move a little bit more on the infrastructure stack. This will likely involve some financial engineering, including equity and debt.”
The road to monetisation
Mensch says that Mistral will develop a “family of models” and that some will be fully open source and licensed for commercial use (like this first release), while others will be proprietary, meaning customers will pay to use them.
He says that the efficiency savings that come with Mistral’s best-performing models, along with operational and integration support to use them, will be among the reasons that customers will pay up rather than sticking with the company’s free options.
Mensch also says that the company’s open source strategy means that it will benefit from improvements and iterations on the model made by members of the AI and machine learning community.
So far, he says, the company is working with four clients and can publicly disclose that one of them is French healthtech scaleup Alan. For now, Mistral is focusing on applications like document querying, summarising company communications and generating personalised marketing material.
“These are workflows that we know LLMs work very well for. I think we will move to harder use cases and other use cases will require deeper integration,” he says.
Mensch says that Mistral is, for now, fully focused on building the best models in the business, though it may shift more of its focus to applications “if it turns out that the value is mostly located there”.