In January 2021, an AI-powered fairy called Moxi was getting pissed off. Life is tough when your maker designed you to be a digital companion for children, but didn’t give you a language model capable of understanding the colourful lexicon of 7- to 12-year-olds.
“Our systems just couldn't understand what the kids were trying to say when they said things like, ‘Bah bah bah, da da da.’ That caused a lot of problems,” says Elisabeth L’Orange, cofounder of Oxolo. “The fairy would get frustrated, the kids got frustrated. It was just a mess.”
Moxi was the third product idea from the Hamburg-based generative AI startup, which today uses the technology to create automated ecommerce videos for more than 160k users around the world. The company is now raising money and, as Sifted understands it, turning heads in VC land.
How L’Orange pivoted from fairy avatars to SaaS for online shops is a good lesson in how to build a useful — and lucrative — product with generative AI.
Bring out your dead
The company was born when fellow cofounder Heiko Hubertz visited the SXSW festival in Texas in 2019, where he saw genAI models being demoed for the first time and “realised this was the future”, says L’Orange.
At the time, both cofounders had parents who were seriously ill, and they thought that the kinds of generative models that can now be used to make deepfake videos could be used to create avatars that would allow their kids to meet their dead grandparents one day.
“The idea was that we would film that person and then we would create a whole language model around them,” she explains. “We’d then create an avatar of the person with a whole brain behind it, so you could have that person stay alive virtually.”
But after testing the waters and speaking to potential users, L’Orange and Hubertz realised that not everyone was as enthusiastic about this application of genAI as they were.
“We did proper market research and figured out that pretty much every one of the 500 people we quizzed thought it was an outrageously horrible idea,” L’Orange laughs.
Don’t hassle the Hoff
But the Oxolo team — then seven people — weren’t put off, and quickly began working on a new slant on the idea of a virtual avatar you could speak to, this time applying it to celebrities.
“We talked to David Hasselhoff’s management, but to license the faces of the celebrities three years ago was too early: they were either not willing to do it or were really expensive,” L’Orange explains.
By this point the Oxolo team had built an MVP based on the likeness of Barack Obama to test the idea, and realised there was another issue with a realistic video chatbot: the speed of the interactions.
“If you want to have a personalised chatbot with a face on it you're always going to have a three-second delay because of the rendering of the face and the lip sync,” says L’Orange. “Right now the computing power just simply isn't there to do this in real time. So latency kind of ruins the conversation a little bit.”
This was where Moxi the fairy came in. An animated character, the Oxolo team hoped, would be quicker and easier for the AI system to render.
Systematic market research
The team then built its kid-focused app, Tipy, over a year between spring 2021 and spring 2022, growing a 70k-strong user base. Then, when L’Orange and Hubertz decided the technical challenges of getting a language model to understand kids were too great, they set about finding a new application for the tech stack they’d built.
“We just systematically looked through different industries asking, ‘Where is generative AI going to have the biggest impact?’” she says.
The research led them to settle on creating videos for ecommerce — an industry where many smaller sellers find content creation prohibitively expensive.
Today Oxolo’s product combines seven different AI architectures: users input the URL of an ecommerce listing, and the system automatically generates a scripted product video, narrated by an actor, for any type of product. Users can then easily edit the structure and script, and analyse customer data showing which elements of the video are driving sales and which are turning customers away.
L’Orange says the company’s data shows that its videos increase sales conversion rates by as much as 20%, and they cost only $6 a pop.
The new SaaS
She adds that, while Oxolo is now adding 3,500 users per day and enjoying 10x revenue growth, the process of building the company has taught her some hard truths about making products in this space.
“It's difficult to build applications that actually generate money with AI. I think there's a lot of tech being built, which is beautiful, but doesn't really have a use case or purpose,” she says, adding that the well-worn wisdom of focusing on B2B rings as true as ever with generative AI.
“We heavily focused on building something that you can actually use… Once you integrate into the workflow of an enterprise it's difficult to get you out of there. That's why B2B SaaS has such high retention and is always favoured by the VCs.”
She thinks it’s also a space free from competition from Big Tech.
“None of these big companies really like to develop tech themselves,” she says. “They prefer to be platforms and charge for being on the app store, and they make tonnes more money doing that than developing their own solutions.”
L’Orange adds that Oxolo will soon release a Shopify-integrated widget that’ll allow sellers to generate videos from within the platform.
Today the startup employs more than 30 people, and is looking to top up its small reserves of angel funding. And if Oxolo’s story teaches us anything, it’s that if you want to make a generative AI omelette, you might have to kill a few fairies along the way.