Interview

July 31, 2024

ElevenLabs’ Mati Staniszewski on tackling deepfakes, working with Disney and raising $101m in two years

The founder of the a16z and Sequoia-backed AI audio startup has brunch with Sifted


Zosia Wanat


ElevenLabs founder Mati Staniszewski

The first word that comes to mind every time I meet Mati Staniszewski, cofounder of the European audio AI startup ElevenLabs, is polished.

To some extent, it’s a pun on his origin — he’s lived in the UK for more than a decade but he was born and raised in Warsaw — but it’s also how he speaks and acts. In neat black or white shirts, always smiling, positive and modest, he gives rounded, politically correct answers that have certainly been approved by his PR department. He never crosses the line — at least while speaking to the press. 

But maybe it is exactly this polish that has helped the 29-year-old, along with his 30-year-old cofounder Piotr Dąbkowski, to create a global AI audio sensation, in less than two years. 


Since the beginning of 2023, when it struggled to raise a $2m pre-seed round, ElevenLabs — an AI audio translator and generator — has become an international phenomenon. It’s raised $101m from some of the world’s most renowned VCs, such as Andreessen Horowitz and Sequoia, become a unicorn at a (fitting) $1.1bn valuation — and some of its early investors have, on paper, seen a 100x markup on their initial investment.

It’s quite the journey. “We’re slowly realising that there are in fact very few companies in the audio world at this level, and we want to be that voice that delivers,” Staniszewski tells Sifted over brunch. “A good way to think about us: something comparable to OpenAI for audio.”

From dubbing to all audio content

The place we’re meeting — Nessa, a restaurant in the heart of London’s Soho — suits my interviewee well: it’s neat, ordered and pretty, with pastel seating and marble tables. 

It’s also just around the corner from ElevenLabs’ new office, which for decades was occupied by Technicolor Creative Studios, a French animation company. “You can feel the production element in the room,” says Staniszewski, who’s a big fan. 

Over the last month, ElevenLabs has opened three new physical offices — the one in London, another in Warsaw and one in New York. They’re set up to accommodate a fast-growing team: the startup now employs around 70 people — up from eight people in January 2023 — and Staniszewski hopes to reach 100 to 150 by the end of this year.

“It feels that the team is still too small to do all the things that emerge on the horizon,” he says.

ElevenLabs has a pretty convincing founding story: it all began with two Polish entrepreneurs who were sick of how movie voiceovers are done in the country. While in many other countries English films are dubbed by as many voices as characters, in Poland entire movies get one (male) voiceover, which is played on top of the original English speech. When you grow up with this on TV throughout your childhood, it’s no wonder why you’d try to fix it. 

The founders’ big dream was to make sure that film dubbing can be done in all languages, in the same voices as the original — and, in the long term, that their product could also do real-time translation. They’ve been getting there step-by-step: they created tools to convert text to speech, then audio to audio, then text to sound; plus others to clone voices and to remove background noise. All of this is done with their own research and models, so they can maintain the highest level of quality. 

The pace of development has been accelerated by the boom around large language models (LLMs) and generative AI — but also by the fact that there aren’t too many companies that focus solely on audio. 

“The goal hasn’t changed — the possibility to create content in every language, in every voice, in every sound,” says Staniszewski.


So many use cases

Staniszewski is not a big breakfast fan, he says — he tends to work late at night, and skips early meals. But it’s already 11am, so he can be persuaded to order some healthy granola. (I, on the other hand, am a huge breakfast fan, and end up choosing a massive, fluffy pancake with strawberries, cream, jam and crumble.)

Topic of conversation: ElevenLabs’ ‘use cases’ (a phrase he seems to love). There are some obvious and tested ones: creating audiobooks, dubbing for movies and for video games. But Staniszewski says those that make him the proudest are in healthtech and education. He recalls one user, a lawyer, who lost her voice due to cancer but could “regain” it through ElevenLabs’ technology in order to make her cases in court.

“So many use cases are coming up where clearly the solutions are missing, where we can do cool things,” he says. “What we’re trying to create now is a platform for the entire audio AI world which helps publishers and entertainment companies to create audio AI content. The tools they need are very different. We’re trying to get to the stage where they have easy access to all of them.”

Over time, the use cases that get prioritised have changed. “We look at where the research is ready; there are use cases that we would love to solve but we’re not ready yet — where it’s so difficult that the quality we want to deliver is not there yet… Another factor is whether the space, the client, really has that problem. Are we solving something real? Or are we chasing something more intuitively? We want to make sure that there’s a real problem. Third factor: where do we think we can create value in the long term? Where do we think there’ll be a competition and where will we keep delivering quality?” he says. 

Another need that’s emerging now is conversational AI. Staniszewski says that across different sectors — AI tutors, AI in healthcare, AI friends, AI customer care — this could be “the biggest use case in the upcoming years”. ElevenLabs already works with a medical company that uses AI to remind their patients to take their medicine and ask about their wellbeing, and in turn sends notes to doctors; and with an AI tutor which allows students to learn a language with an AI native speaker. 

“We don’t do the brain LLMs — the knowledge needed to have a medical chat — but we’re orchestrating the voice. AI needs to know when to start talking, to take a pause, to minimise the speed of the voice — and to make sure it’s all natural,” Staniszewski adds. 

While there are many startups and big tech companies that could work on this, Staniszewski only sees one competitor. 

“Our largest potential competition, less now, but we think it’s on the horizon, is OpenAI,” he says. “They have amazing research talent, they have amazing resources, their funding — I don’t even know how much bigger it is than ours, hundreds, if not thousands times bigger. They will probably also start to build more and more models in audio. We think they won’t be focused on it as a vertical but their quality will get better, and for some of the use cases it might be enough. In the use cases where there is less control or less quality, but the price is important, OpenAI will probably be a bigger competition.”

Defeating deepfakes

As we’re making our way through our meals, we get to the things that Staniszewski doesn’t want to talk about. One of them is data. Many AI companies, including OpenAI, are struggling with legally sourcing data for training their models. In the audio world, it can’t be much easier.

“We’re looking at this space, how this is going to evolve,” he says. “There are research elements that impact what kind of quality we can deliver and we decided that it’s better not to share [details].”  

Another one is the famous deepfake of US president Joe Biden. In January, an AI-generated robot caller impersonating Biden phoned some voters in New Hampshire telling them not to vote in the state’s primary election. Some audio detection experts said that the deepfake was likely created using ElevenLabs’ technology (ElevenLabs has never admitted this). That incident wasn’t a one-off: this year’s European elections and the ongoing US presidential campaign have also been poisoned by deepfakes and conspiracy theories — concerning ElevenLabs, and other platforms. Just last week, a video circulating on X claimed that Biden’s call to his vice-president Kamala Harris, in which he embraced her candidacy for US president, was also created by ElevenLabs (the company denied these allegations).

“When these cases have been emerging this year, there’s a big question mark over who created them. Even when sometimes we’re associated with these cases, it’s not us,” Staniszewski says, but he doesn’t want to speak about particular incidents.

It’s “a big problem on the horizon,” he says. “We think that the world of deepfakes or product scams is going to grow. As a company, we want to and we do take a lot of responsibility for how we can detect this content and prevent the bad actors from using it.”

There’s a long list of what the company is doing to stop scams: it tracks every file created by its tool; has released a tool that allows users to check online whether a voice has been cloned by its technology or not; introduced extra verification for those who want to produce high quality voice; created a list of voices that can’t be generated (for example of high-level politicians running in the elections); plus uses both automatic models and humans to check the content that people are creating. 

However, even with its best efforts, Staniszewski doesn’t think the problem is going to disappear. 

“This will be a problem for the whole space. It [the cloning] is possible in open source, in other commercial models from countries that don’t really care about the data or the rules — and these models already exist and are publicly distributed,” he says. “As ElevenLabs, we’re seeing less and less use cases, but in the industry… we see deepfakes. It’s one of the biggest question marks for the whole space: how can we counter those bad actors… The other big question is about fraud and scams at scale — how can we prevent that?” He adds that ElevenLabs is actively working with other businesses, academia and public bodies to solve these issues.

It might feel like tilting at windmills when other companies don’t always have such high standards — take actress Scarlett Johansson accusing OpenAI of using her voice for its audio tool (something Staniszewski also doesn’t want to comment on) or other companies that don’t ask for permission to create deepfakes of dead or famous people.

Staniszewski says that while businesses need to take responsibility for their tech and how it’s used, it’s the governments that should make sure that there’s no unfair advantage for those who don’t want to play by the rules. 

“If a company has growing influence, it has responsibility on how this technology is distributed,” he says. “But there are companies on the side of providers and distributors, there are companies from other countries that have no controls. How to create a level playing field? There should be a big input from regulations and public institutions.”

What's next

As we’re finishing our meal — and really running out of time — I manage to ask about Staniszewski’s plans for the future. 

Does he need more money to take ElevenLabs to the next level? In January he raised an $80m Series B — a substantial injection of money, but one that seems tiny in comparison to the sums raised by other European generative AI companies, such as Mistral and Aleph Alpha, not to mention those fundraising in the US.

Staniszewski says that the company has recently been picked for the Disney Accelerator — a programme that allows startups to gain access to expertise and resources of the Hollywood media giant to help them develop new entertainment-related innovations. He doesn’t want to disclose if Disney has invested in his startup — but says “the main element [of the deal] is to get closer to use cases, experimenting with use cases and see if our technology can deliver quality… one of them is dubbing. If there’s anyone who understands how to act in the creators’ space, it’s them.”

When it comes to a future round, he says that he needs to know what the money would be for.

“If we don’t have concrete goals, we don’t fundraise. Now we have three things that we want to achieve: we want to continue research on frontier audio; we want to develop our go-to-market in the US, in Europe, maybe in Asia; and we want to build these partnerships with the talent, with [industry], with [those preventing scams]. These are the three elements that we’ve been investing a lot of energy, time and funds into. If one of those three starts to scale and develop, then we could potentially consider a next round,” he adds. 

“We’ve managed to reach a wider scale pretty quickly but we’re so lucky to have this opportunity ahead of us that maybe we will build this company for the next few decades, or next few generations. We’re not there yet, but only the fact that we have this chance is something that’s changed in the recent six to twelve months,” he says, while running out for his next meeting. “So many use cases ahead of us!”


Zosia Wanat is a senior reporter at Sifted. She covers the CEE region and policy.