Since ChatGPT’s launch last year, large language models (LLMs) have mostly drawn attention for helping non-engineers write code and explaining quantum computing to five-year-olds.
Less well-known is the fact that LLMs are already in wide use in healthcare — Sifted has identified a number of European startups whose generative AI-powered tools are already deployed in millions of real-life care situations. The tools range from analysing doctor-patient conversations to designing new drugs.
Wider use of AI in healthcare has been helped by better access to large volumes of structured data to train models on, and better hardware to train them with.
“Within a few years, we will have AI that is better at doing this groundwork [in diagnosing a patient]. That is where the generative models come in,” Matilda Andersson, a machine learning engineer active in healthcare, tells Sifted.
“A lot is about being able to do analyses that are too complex for humans. No human can go through 5m data points — we don't have that ability or the time,” she says. “Physicians are already under a huge time constraint.”
Eavesdropping on patients
One use case for models like these is to listen in on patient-doctor meetings and help physicians deliver the right diagnosis and treatment. Spoken interactions always carry a risk of misinterpretation, and depending on what patients tell medical staff and how it is understood, they can end up with the wrong treatment.
Danish AI startup Corti has focused on augmenting human conversation in healthcare with generative AI since 2018. The company started by learning from recordings of emergency medical phone calls involving cardiac arrest — now its technology is used by hospitals and other healthcare institutions in Scandinavia, the UK and the US in over 50m patient encounters per year.
“We've built on the thesis of building generative models from the very get-go of Corti’s history. We can take those large language models, then we fine-tune them with some different architectural choices to fit our vertical,” says Lars Maaløe, associate professor in machine learning and one of the cofounders of Corti.
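Corti hasn’t published the details of its models, but the general pattern Maaløe describes — taking a pretrained language model and continuing to train it on a vertical’s own data — can be sketched roughly as below. This is a generic illustration only: the base model (“gpt2”) and the transcript file are placeholders, not anything Corti actually uses.

```python
# Generic sketch: fine-tune a pretrained causal language model on
# domain-specific text (e.g. de-identified call transcripts).
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder base model
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical corpus: one transcript per line in a plain-text file.
dataset = load_dataset("text", data_files={"train": "call_transcripts.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-finetune",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # continued training on the vertical's own language
```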
GenAI for admin
UK home care platform Cera, which helps elderly patients and their carers manage at-home healthcare, is putting AI to use in simplifying care plan preparation. Cera’s care providers put these plans together when sending patients home from the hospital. Based on conversations between medical professionals and patients, plus a house visit, a plan can take as much as 12 hours in total to put together, according to Cera’s founder and CEO Ben Maruthappu. They’re also usually written by hand.
Cera plans to use AI to convert a conversation recorded on a care assessor’s phone into a care plan automatically, reducing the process to a couple of hours.
OpenAI backer Microsoft is providing the foundation models and applications for Cera, which plans to roll out the service to a few thousand patients in southeast England over the next few months. Foundation models, like GPT-3 and DALL-E, are trained on broad sets of unlabelled data; they can be applied to many different tasks and then fine-tuned for specific purposes such as healthcare.
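Cera’s actual pipeline with Microsoft isn’t public, but the basic shape — transcribe the recorded visit, then ask a foundation model to draft a structured plan for a human to review — can be sketched as below. The speech model, the hosted LLM and the prompt wording are all illustrative assumptions, not Cera’s implementation.

```python
# Minimal sketch: recorded home visit -> transcript -> draft care plan.
from transformers import pipeline
from openai import OpenAI

# 1. Speech-to-text on the care assessor's recording (placeholder file name).
asr = pipeline("automatic-speech-recognition",
               model="openai/whisper-small", chunk_length_s=30)
transcript = asr("home_visit_recording.wav")["text"]

# 2. Ask a hosted foundation model to turn the transcript into a draft plan,
#    which a human care assessor would then review and correct.
client = OpenAI()  # assumes an API key in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any instruction-tuned model would do
    messages=[
        {"role": "system",
         "content": ("Draft a home care plan with sections for mobility, "
                     "medication, nutrition and daily support tasks.")},
        {"role": "user", "content": transcript},
    ],
)
draft_care_plan = response.choices[0].message.content
print(draft_care_plan)
```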
Cera also plans to introduce an AI-powered task recommendation tool for care assistants this year, which will suggest treatments and support decision-making.
LLMs for drug discovery
Another major use case for LLMs in healthcare is drug discovery. Several European startups are working on it, including LabGenius and Healx in the UK and Cradle in the Netherlands.
Cradle’s CEO Stef van Grieken previously worked in the product leadership team at Google Research where he was involved in early work on large language models. Now, he’s helping the company improve protein design, which can be used, for example, in discovering new antibodies for pharmaceuticals.
“At Cradle, instead of applying these types of techniques to natural language, we try to use it to understand the language of biology,” van Grieken says. “Instead of ChatGPT, where you give it a prompt, and you get an answer, in our case, you say, ‘I would like to catalyse this chemical reaction’ or ‘I would like to bind to this thing’ or ‘I would like it to be more stable under a higher temperature'."
The proteins that Cradle generates can then be tested by other researchers in their laboratories.
“Maybe one in 1,000 sequences that [humans] try in the lab will do what they want or what they intended at the start,” van Grieken says. “By using these large language models to come up with much better alternatives, the number of experiments that you need to do in order to get to a working product goes down dramatically.”
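As a back-of-the-envelope illustration of that point (the hit rates below are assumptions, not Cradle’s figures): if each lab-tested sequence works with probability p, the expected number of experiments before the first hit is roughly 1/p, so lifting the hit rate from one in 1,000 to, say, one in ten cuts the expected experiment count from about 1,000 to about 10.

```python
# Illustrative arithmetic: how the hit rate of proposed sequences changes
# the number of wet-lab experiments needed. Rates are assumptions.
import math

def experiments_for_confidence(hit_rate, confidence=0.95):
    """Experiments needed to see at least one success with the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - hit_rate))

for label, hit_rate in [("human-designed, ~1 in 1,000", 0.001),
                        ("model-proposed, ~1 in 10 (assumed)", 0.1)]:
    print(f"{label}: ~{1 / hit_rate:.0f} experiments expected to first hit, "
          f"{experiments_for_confidence(hit_rate)} for 95% confidence")
```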
Healx is using LLMs to come up with new compounds — chemical structures that have the potential to one day become drugs. It does this by feeding the model everything Healx knows about a compound, including both public and proprietary data.
“One compound [we were investigating] was likely to be effective, and we understood which parts of the molecule were aiding treatment — but it was highly toxic,” says Bill Tatsis, a scientist at Healx. “The team used generative AI models to produce the molecule with similar efficacy with less toxicity.”
LabGenius, which optimises antibodies to specifically tackle cancer cells, is now looking into applying AI methods in the earlier drug discovery stages — specifically to find good binders for the drugs, says CTO Leonard Wossnig.
“Good binding molecules are something that we already have methods for finding today, but they are slow. I think gen AI will speed up these solutions,” he says.
The risks of using GenAI in healthcare
As with any new technology, some are worried about LLMs being used broadly in healthcare. Models are often only as good as the data they are trained on.
“Over-reliance [on AI] is a risk,” says Cera’s Maruthappu. “People can become over-reliant on tech, even if they have medical qualifications. That's why it’s important to have controls and checks to make sure people don’t take their hands off the steering wheel.”
Claire Novorol, cofounder and chief medical officer at Ada Health, worries about bias in the data that LLMs are trained on, as well as hallucinations. Hallucinations happen when the AI generates outputs that don’t reflect its input data, such as responses that are factually inaccurate.
“These hallucinations are basically providing responses, many of which might be quite appropriate, but some of which are completely inappropriate, completely untrue, completely fabricated but provided with the same level of extreme confidence — obviously, a non-expert would not be able to differentiate between the two,” Novorol says.
Rather than using a generative model, Ada has spent the last 10 years building a proprietary symptom assessment tool on a probabilistic model, developed in conjunction with a large team of medical doctors.
And although Ada is looking into how GenAI could be used within its triage system, and would consider using these models in the right setting, with the right training, constraints, validation and evidence, Novorol wouldn’t consider using one of the foundation models, such as GPT, in its current form, since she wouldn’t be confident in how it was built or how truthful its output would be.
“Increasingly there will be models that are specifically trained on more and more medical data, and I think a lot of the problems will be reduced. It is not yet known that they can be completely eliminated though. Those hallucinations are something kind of fundamental to the system. It's fundamentally linked to how these systems work and their strengths,” she says.
But according to van Grieken at Cradle, an AI’s outputs don’t always need to be on par with humans when it comes to truthfulness, given how poor human understanding of DNA is.
“In biology, we've compared against some of the methods that people are using in science today and we've been able to demonstrate that [our GenAI model] significantly outperforms what humans can do. And obviously, as more data comes in, and as we learn more about this problem, the better our models get,” he says.
And of course, molecules discovered using AI wouldn’t be rolled out straight from the lab to the pharmacy — they would go through rigorous testing and clinical trials before actually making it to market.
In addition, models have improved quickly in healthcare. An early version of Google’s medical LLM Med-PaLM only narrowly cleared the pass mark on US medical licensing exam-style questions — but a later version reached an “expert”-level score of around 85%.
Slow transformation but overhyped all the same
While generative AI has the potential to “transform” healthcare eventually, adoption across the wider healthcare sector will take time.
“In five to ten years, generative AI will be part of day-to-day medical practice, but we’re quite far from that,” Cera’s Maruthappu says. “Healthcare is regulated, which means that as we've seen with digital technologies and the use of data in healthcare in the past, it's not going to be a radical transformation that happens in a year.”
Others, though, think generative AI in healthcare is “overhyped”. One is Marta-Gaia Zanchi, partner at healthtech-focused VC Nina Capital, who thinks it’s becoming a buzzword for startups in the sector (and in other fields).
“Startups start pitching their generative AI-based solution and they don’t really have a good answer for why they need to use generative AI,” she tells Sifted.
“Applying generative AI without regard to what the right solution is means skipping the most important step in the innovation process and just building elaborate skyscrapers on a very weak foundation,” Zanchi says.