It’s no secret that AI has stolen tech headlines in 2023. And, as companies of all sizes race to build the next big thing, it’s created an unprecedented demand for access to the hardware used to train AI models — prompting UK Prime Minister Rishi Sunak to set aside £100m in taxpayer money to buy computer chips to help power a national AI supercomputer.
Large language models (LLMs) like OpenAI’s GPT-4 are trained on huge troves of data and run on chips called graphics processing units (GPUs). The largest models can use more than one million hours of GPU processing (known in the AI industry as “compute”).
Now, with startup challengers like Mistral, Stability AI and Aleph Alpha training LLMs of their own (in a bid to offer European alternatives to the US’s big tech companies), and a whole host of other startups building smaller, specialised models, more companies than ever need access to GPUs.
So, how does this new economy of compute work, and how is the surge in demand affecting startups?
Ways to get compute
Compute can essentially be accessed in two ways: buying GPUs from chip makers, or paying for rented access to them via cloud providers like Google, Amazon Web Services (AWS) or smaller companies like Silicon Valley-based Lambda.
For startups training large models, cloud providers are the preferred option, due to the high costs of building the infrastructure for a GPU cluster of thousands of chips.
Stability AI is well known to have access to a large number of AWS’s GPUs, and a leaked pitch deck from Mistral, seen by Sifted, stated that the company has “negotiated competitive deals” for renting compute from cloud providers. The company confirmed to Sifted that it is renting compute “for now.”
The equation is more complicated for smaller startups training specialised models, as they have less financial muscle to negotiate attractive deals. One example is BeyondMath, a Cambridge-based company training AI models on physics equations, which requires far less raw data than an LLM.
Startups like BeyondMath can get access to GPUs through Google and AWS’s startup programmes, explains the company’s cofounder Alan Patterson. These programmes offer around $250k in free compute credits to young AI companies (as long as they have a VC fund on their cap table).
But, once those run out, things get trickier. Patterson says that many other cloud providers are now “maxed out”, meaning some have no compute available for at least a month.
His cofounder Darren Garvey adds that, for companies training smaller models, it can make more sense to buy GPUs outright. That’s partly due to cost and partly due to the risk of not being able to get access in time.
“GPUs on the cloud are so expensive,” he says. “When we're costing up a project, the question is should we just acquire some of these (GPU) boxes for that project, to de-risk it not being available in the cloud? I think the cheaper route for us will be to acquire the hardware.”
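Garvey’s buy-or-rent question comes down to a simple break-even calculation. The sketch below is illustrative only: the function name and all figures are hypothetical, not BeyondMath’s numbers, and it ignores resale value, depreciation, financing and staff costs, as well as the availability risk he mentions.

```python
def break_even_months(purchase_cost, monthly_running_cost, monthly_cloud_cost):
    """Hypothetical helper: months of use after which owning GPUs
    becomes cheaper than renting equivalent capacity in the cloud."""
    # Each month of ownership avoids the cloud bill but still costs
    # power, hosting and maintenance.
    monthly_saving = monthly_cloud_cost - monthly_running_cost
    if monthly_saving <= 0:
        return None  # owning never pays off under these assumptions
    return purchase_cost / monthly_saving


# Hypothetical figures: $200k of hardware, $5k/month to run it,
# versus $25k/month to rent comparable cloud GPUs.
print(break_even_months(200_000, 5_000, 25_000))  # 10.0
```

Under these made-up numbers, a project that keeps the GPUs busy for more than ten months would be cheaper on owned hardware.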
The Nvidia monopoly
One reason for the current scarcity and expense of GPUs is that, for AI training, there is an effective monopoly on the market. US-based multinational Nvidia is the global go-to provider for AI training hardware, and earlier this month China ordered $5bn of chips from the company (putting the UK’s order into fairly stark perspective).
This reliance on one company is making it hard for people to get their hands on the best hardware, says Peter Sarlin, CEO and cofounder of Helsinki-based Silo AI. “At the moment you don't even really have access to the materials. Nvidia A100s (one of the company’s most advanced chips) have been really difficult to purchase on the market.”
Silo AI, which is preparing to train an LLM of its own, has circumvented the high demand for Nvidia GPUs (both to purchase and on the cloud) by making use of the European supercomputer LUMI.
LUMI doesn’t run on Nvidia hardware but on GPUs from rival AMD, so Silo AI has had to build its own bespoke software for AI training. AMD chips are less popular among AI researchers than Nvidia’s, partly because Nvidia’s software stack is regarded as the most well-developed in the industry.
“We've had to spend a lot of effort to actually be able to run LLMs on LUMI, because you don't have all of the software layers that you have with Nvidia-based supercomputers,” Sarlin says. “It has required quite a lot of investment, but our judgement has been that we'd rather do that than operate a supercomputer on our own.”
The founder adds that he does use more conventional cloud compute providers for inference (running a trained model to respond to individual prompts).
“It is a bit crazy”
The chips arms race has prompted some startups only weeks old to raise monster rounds to cover the tens of millions of euros needed to train a language model. Some investors are now beginning to question the wisdom of this.
Nathan Benaich, founder of London-based AI fund Air Street Capital, says that raising such large rounds — when much of it will go straight into the pockets of cloud compute providers — can be bad for both the investor and the startup.
“On the VC side, you put in significant cheques into companies — which implicitly drives valuations up, which implicitly reduces your overall multiple on invested capital quite a bit — just so companies can get resources to try and ship a product,” he says.
“For the company itself, it seems like a very coarse or blunt instrument to sell equity, almost dollar for dollar, to get access to compute… It's like using equity to finance CapEx, which is not the way that finance would typically do this.”
Rasmus Rothe, founder of Berlin-based AI investor Merantix, says that using equity to finance compute is also risky, because companies will likely have to train more than one model.
“It is a bit crazy that millions of VC dollars — expensive capital ultimately — are burned on hardware or training runs. You train your model once and in half a year somebody else has a better model and you need to retrain and that money is gone,” he says. “I think you need to think about what's the commercial value you can generate from this and that needs to be very large in order to justify the training run.”
Benaich adds that he sees large investors funnelling money into these kinds of startups partly as a symptom of their need for a new capital-hungry sector to bet on, after the decline of rapid grocery delivery companies.
“If you raised a megafund as a generalist VC manager and you'd bet on a few themes, and then those themes become very unpopular, then where are you going to put the money?”