Some of the US’s leading investors are knocking down the doors of Europe’s generative AI startups this year. In May, NEA made a proactive offer to Stockholm-based Sana Labs, and last week Nvidia backed London-based Synthesia when the company “wasn’t actively looking for new investment”.
This week it’s the turn of London and New York-headquartered ElevenLabs, a startup building technology to generate synthetic voices, which has raised a $19m Series A round led by Andreessen Horowitz, former GitHub CEO Nat Friedman and Daniel Gross, a former machine learning director at Apple.
ElevenLabs cofounder Mati Staniszewski also tells Sifted that he wasn’t “looking actively” for investment but, after early conversations with Gross, Friedman and a16z, realised that these heavyweight backers would be able to help scale the business.
Building the B2B offering
Staniszewski says the money will be used to hire people in two areas: machine learning specialists who can help improve ElevenLabs’ core audio tech, and software engineers and product designers who can build out the company’s offering across three verticals.
These, he says, are publishing (audiobooks, newsletters and voiced-up articles that we’ve already experimented with on Sifted), gaming (AI-generated voices for in-game characters) and entertainment (AI dubbing for movies and TV).
Staniszewski says that, since the product launched earlier this year, the fastest uptake has been among individual publishers, such as YouTubers creating voiceovers or independent book authors for whom recording a voice-acted audiobook would be prohibitively expensive. He says that, depending on the length of the book, it generally costs no more than “a few hundred dollars” to generate an audiobook with an AI voice.
Staniszewski says that ElevenLabs is now focused on expanding its B2B business, and has signed partnerships with audiobook platform Storytel and digital media publisher TheSoul, both of which joined the round as strategic investors.
“When we launched the focus was all on creators — people creating voiceovers for YouTube or social media,” he explains. “In recent months it's shifted a lot to B2B.”
Staniszewski adds that he’ll soon be able to announce a partnership with a leading news organisation, which he says will be using the technology to automatically turn its articles into audio.
The original vision
While publishing appears to be ElevenLabs’ earliest sign of product-market fit, the startup is still working hard to realise its original vision for the company — a comprehensive AI film dubbing system.
Staniszewski and his cofounder Piotr Dabkowski are both Polish and started the company to try and build a solution for the low-quality film dubbing that was available in their native language.
There are still some challenges to overcome, and not just technical ones. Staniszewski explains that ElevenLabs’ models will have to account for the fact that certain languages tend to use more time and words to communicate the same message.
“When you translate something from English to Spanish, it's gonna be about 30-40% longer on average in Spanish,” he says. “You need to think not only about directly translating, but also paraphrasing to make sure that the length is the same.”
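To put rough numbers on that, here is a minimal sketch of the timing arithmetic, not ElevenLabs' actual method. The function name `dubbing_overrun` and the 10-second example line are hypothetical; only the 30-40% expansion figure comes from Staniszewski's quote.

```python
def dubbing_overrun(source_seconds: float, expansion: float) -> tuple[float, float, float]:
    """Estimate how far a translated line overruns the original timing.

    source_seconds: spoken length of the original line (hypothetical input).
    expansion: fractional growth after translation, e.g. 0.30 for 30% longer.
    Returns (translated_seconds, overrun_seconds, trim_fraction), where
    trim_fraction is the share of the translated line that paraphrasing
    would have to cut for it to fit the original slot.
    """
    if source_seconds <= 0:
        raise ValueError("source_seconds must be positive")
    if expansion < 0:
        raise ValueError("expansion must be non-negative")

    translated_seconds = source_seconds * (1.0 + expansion)
    overrun_seconds = translated_seconds - source_seconds
    trim_fraction = overrun_seconds / translated_seconds
    return translated_seconds, overrun_seconds, trim_fraction


if __name__ == "__main__":
    # Assumed 10-second English line, at both ends of the quoted range.
    for expansion in (0.30, 0.40):
        translated, overrun, trim = dubbing_overrun(10.0, expansion)
        print(f"{expansion:.0%} longer: {translated:.1f}s spoken, "
              f"{overrun:.1f}s over, trim {trim:.0%}")
```

On these assumptions, a 10-second English line would run to 13-14 seconds in Spanish, so a paraphrase would need to cut roughly 23-29% of the translated line to fit the original timing.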
As well as movie dubbing, ElevenLabs is also working on a new — slightly dystopian — application for synthetic voices: AI companion apps that turn chatbot responses into lifelike voices.
“We've seen that already in Asian countries like Korea or Japan where that's a popular thing,” he says.
To make this work, the startup is trying to cut the latency of the AI voice’s responses from the current 0.8 seconds to 0.5 seconds, so that interactions feel more natural.
Synthetic voice technology like this also opens the door to what Staniszewski describes as “nefarious use cases”, such as deepfakes or phone scams. He says the company is continually working on safeguards to minimise misuse of its platform, and recently released a tool that lets people check whether audio is AI-generated.
What’s the market like?
ElevenLabs is one of a number of startups using generative AI to make synthetic voice software. Others include Valencia-based Voicemod, which is focused on the gaming sector, and London-founded Sonantic, which was acquired by Spotify last year.
Big tech is also getting in on the action — Meta recently released its own voice generation tool — so startups will need to hustle hard to make sure they’re responsive to customer needs if they’re to stay in the race.
But now, with the backing of some of Silicon Valley’s most highly regarded investors, ElevenLabs has the capital and the guidance to have a real stab at it.