The latest AI models may look extremely powerful, but there's one thing they're really bad at: memory.
While humans have been known to memorise texts as long as the Qur'an, the most powerful generative models out there can only handle about 50 pages of text — or about 32k “tokens”.
But now some researchers have figured out a way to dramatically increase the quantity of data that transformer models (the architecture behind AI tools like GPT-4) can process. So, for example, Google's language model BERT could be augmented to process 2m tokens, more than the entire Harry Potter series (which runs to about 1.5m tokens).
The breakthrough comes from Russian researchers Aydar Bulatov, Yuri Kuratov and Mikhail Burtsev, who also run the open-source conversational AI project DeepPavlov.
It could supercharge generative AI’s abilities in fields like bioinformatics and finance, Burtsev tells Sifted, as models will be able to process long chains of information like genomic code or market data.
Until now, people building generative AI applications that need longer-term memory have relied on something called vector databases, Burtsev explains, giving the example of personal assistant apps.
In simple terms, this means that when a personal assistant needs to access past information, it has to retrieve it from an outside database where the information gets stored, because the model can’t hold that memory itself.
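For the technically curious, the retrieval pattern Burtsev describes can be sketched in a few lines of Python. This is a toy illustration: a bag-of-words count stands in for the learned embedding model, and a plain list stands in for the dedicated vector stores (such as FAISS) that real assistants use.

```python
import numpy as np

# Toy vector database: each stored note becomes an embedding, and a
# query retrieves the nearest stored note. Real assistants use a learned
# embedding model and a dedicated vector store; bag-of-words counts
# stand in for the embedding model here.

memories = [
    "the user's birthday is 12 march",
    "the user prefers window seats on flights",
]
vocab = {w: i for i, w in enumerate(sorted({w for m in memories for w in m.lower().split()}))}

def embed(text):
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

index = [(text, embed(text)) for text in memories]  # the "database"

def retrieve(query):
    # Cosine similarity against every stored embedding; return the best match.
    q = embed(query)
    return max(index, key=lambda pair: float(pair[1] @ q))[0]

print(retrieve("when is the user's birthday"))  # -> the user's birthday is 12 march
```

The key point is that the memory lives outside the model: the assistant must run a lookup like this every time it needs something from the past.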
The DeepPavlov team’s new research uses something called “recurrent memory”, which Burtsev likens to the way humans read books.
“When you open your book, you can see only two pages, and previously transformers have only been able to work with the information on those pages,” he says. “Recurrent memory allows you to accumulate what was important on the previous pages, and then you can combine this information with the current context.”
In other words, rather than having to constantly flick back to previous information, recurrent memory allows AI to accumulate information a bit like we do when reading a book, which Burtsev says will lead to “better quality of reasoning and of remembering”.
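The book-reading analogy can be sketched in code. This is a deliberately simplified stand-in: the researchers' actual method prepends learned memory tokens to a transformer's input, whereas here a running summary vector plays the role of memory, and the sizes are purely illustrative.

```python
import numpy as np

# A long input of 1,000 "token" embeddings, far bigger than the model's
# window; dimensions and segment size are illustrative.
rng = np.random.default_rng(0)
long_input = rng.normal(size=(1000, 16))
SEGMENT = 100  # how many tokens fit in the window at once

def read_segment(segment, memory):
    # Fold the carried-over memory and the current segment together,
    # then compress back down to a fixed-size memory vector.
    context = np.vstack([memory[None, :], segment])
    return context.mean(axis=0)

memory = np.zeros(16)  # empty memory before "page one"
for start in range(0, len(long_input), SEGMENT):
    memory = read_segment(long_input[start:start + SEGMENT], memory)

# The model only ever saw 100 tokens plus its memory at once, yet
# `memory` now reflects all 1,000 tokens.
print(memory.shape)  # (16,)
```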
Without getting into too much machine learning detail, he says that one of the key breakthroughs behind the research came from something called “curriculum learning”. This essentially means feeding inputs into a model “segment by segment” rather than all at once.
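A minimal sketch of what such a segment-by-segment curriculum might look like, assuming a simple doubling schedule (the paper's exact recipe may well differ):

```python
# A curriculum schedule: train on inputs of one segment first, and only
# move on to longer inputs once the model handles the current length.
# The doubling schedule is an assumption, not the researchers' recipe.

def curriculum(max_segments):
    """Yield (stage, n_segments): 1 segment, then 2, 4, ... up to the max."""
    stage, n = 0, 1
    while n <= max_segments:
        yield stage, n
        stage, n = stage + 1, n * 2

for stage, n in curriculum(8):
    print(f"stage {stage}: train on inputs of {n} segment(s)")
```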
If implemented by companies like OpenAI, DeepPavlov's research would allow people to feed much longer prompts into apps like ChatGPT, but it could also be used to improve the training of generative models, Burtsev says.
This is because generative models are fed training data in chunks of limited size. But, until now, if a dataset (a book, for example) has to be broken down into chunks, the model hasn't been able to find "dependencies", links between two pieces of information, that span different chunks.
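A tiny illustration of the problem, using characters as a stand-in for tokens: when a story is cut into fixed-size training chunks, a name and the pronoun that refers back to it can land in different chunks, so no single chunk reveals the link between them.

```python
# A short story cut into fixed-size training chunks. "Alice" and the
# "She" that refers back to her end up in different chunks, so a model
# trained one chunk at a time never sees the two together.
story = (
    "Alice buried the treasure on the beach. "
    + "Years passed and the tides shifted. " * 3
    + "She finally returned to dig it up."
)
CHUNK = 60  # characters per chunk, standing in for a token limit
chunks = [story[i:i + CHUNK] for i in range(0, len(story), CHUNK)]

print(any("Alice" in c and "She" in c for c in chunks))  # False
```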
Changing that would mean models could form a deeper understanding of information like DNA sequences, or time-series financial data (datasets used to track investment opportunities over time).
“We can train these models for much longer sequences and we’ll get a much better understanding of things like DNA sequences and their properties,” Burtsev says.
Such improvements could speed up exciting uses for generative AI like discovering new medicines, and could make models optimised for finance — like BloombergGPT — even smarter.
Limited memory is also one of the key reasons cited for why generative AI isn't heading towards superhuman capabilities any time soon. DeepPavlov's research may have radically shortened that horizon.