Interview

March 29, 2023

'We are super, super fucked': Meet the man trying to stop an AI apocalypse

Connor Leahy reverse-engineered GPT-2 in his bedroom — and what he found scared him. Now, his startup Conjecture is trying to make AI safe


Tim Smith

6 min read

Sometimes it takes a maverick to stand up to the power of big corporations. In the case of then-24-year-old self-taught coder Connor Leahy, it took “a bunch of Ritalin” and two weeks of forced seclusion in a dorm room.

His goal? To reverse-engineer OpenAI’s latest large language model (LLM) in 2019 to work out what was going on under the hood.

This bootleg experiment marked the beginning of a journey that’s led him to launch his own startup, Conjecture, which is backed by some of the world’s most influential technologists. He’s focusing on AI alignment, or the task of making machine learning models controllable, and he makes no bones about the risks.


“If they [AI models] just get more and more powerful, without getting more controllable, we are super, super fucked. I will be very clear here. And by ‘we’ I mean all of us,” he says.

If Leahy’s to be believed, we’re currently all passengers on a Sam Altman-driven locomotive that’s accelerating into the blackness. Somewhere ahead lies a precipice — the point where machines can outsmart humans — that we won’t see until we’ve careered over it. Conjecture is frantically working to reroute the rails.

The alignment problem

Leahy isn’t alone in his concerns. Conjecture — which he founded in 2022 — has backing from investors including GitHub’s former CEO Nat Friedman; Daniel Gross, a former machine learning director at Apple; Tesla’s former head of AI Andrej Karpathy (who also worked as a researcher at DeepMind and OpenAI); and Stripe founders Patrick and John Collison.

While Leahy doesn’t believe it’s likely that GPT-4 represents an existential threat to humanity, he does say we could be dealing with “godlike-level AI” within five years and, perhaps most scarily, we won’t see it until it’s too late.

“These are black-box neural networks. Who knows what's inside of them? We don't know what they're thinking and we do not know how they work,” he explains. “If you keep building smarter systems, at some point you will have a system that can and will break out.”

Leahy speculates that, were a super-intelligent AI system to break out, it could start secretly running on its own servers, improving itself and amassing its own financial resources.

“Once we have systems that are as smart as humans, that also means they can do research. That means they can improve themselves,” he says. “So the thing can just run on a server somewhere, write some code, maybe gather some bitcoin and then it could buy some more servers.”

At this point, Leahy says an AI could potentially do anything from trying to build an army of killer drones to convincing different countries to go to war with each other. 

In short, the risk is essentially unfathomable.

How to stop super-human AI

Leahy says that Conjecture is currently working on something called AI “boundedness”. This avenue of research focuses on building AI models where humans know for certain, ahead of time, what the system can and can’t do.

But he says there’s no certainty it will work, and explains that it’s nearly impossible to code a non-mathematical idea like “benevolence” into an AI system. Nor is it as simple as training a system to distinguish good actions from bad ones.


“Let's say your model threatens the user, so you give it a thumbs down. This sends the model at least two signals. Signal number one: stop threatening users. Signal number two: don't get caught threatening users.”

As well as working on this very difficult research problem, Conjecture is building commercialisable tools — such as AI transcription software — to help generate cash. Eventually, though, Leahy believes that controllable AI will be a big money maker.

“Let's say I can offer you a GPT-4 that guaranteed never does anything bad — that would be the best product,” he says.

Leahy adds that while OpenAI says it cares about AI alignment, the pace at which it’s releasing stronger models isn’t giving researchers time to understand them and make them safe.

“GPT-4 should never have been released when it was,” he says. 

Sifted reached out to OpenAI for comment and the company shared links to pages outlining its approach to alignment and safety.

Where it all began

For Leahy, the writing has been on the wall ever since he first got his hands on OpenAI’s GPT-2 in 2019 (“Usually it made no sense, but there was something growing, something emergent,” he says).

Leahy says he quickly saw that as these LLMs became bigger, they would become more powerful — for the first time he could draw a line to a future where AI could be superhuman.

Leahy and his friends instantly started playing around with the model; they had a hackathon, they created a “cult” and “queried it as [their god]”. 

Soon, Leahy became frustrated that he only had access to a small version of the model — as the full version hadn’t been publicly released. He took matters into his own hands — hence the Ritalin and the two weeks shut inside. 

EleutherAI

His attempt to reverse-engineer GPT-2 sowed the seed for what would later become EleutherAI — the open source community behind some of the most-downloaded GPT-3-style models on AI platform Hugging Face.

He describes how during the Covid summer of 2020, “bored to tears and depressed” while stuck at his parents' place, he started plotting on a machine learning Discord server.

“GPT-3 was released and it was mind-blowing,” he says. “On a whim, I was like, ‘Hey, guys, let’s give this a shot like the good old times.’ So me and two other guys, Leo (Gao) and Sid (Black), started working on our own GPT-3, and the rest is history.”

Leahy says that the EleutherAI community was only able to achieve what it did thanks to “a small number of people working themselves to the bone”, and that the motivation was always to allow people to research and understand powerful LLMs.

“We said, ‘The risks are manageable at this moment, but understanding them is super important because in the future we expect this technology to scale to very dangerous technology,’” he explains.

Today, he says that the rapid acceleration in AI models that has led to GPT-4 highlights the big imbalance between the number of people working on powerful AI and the number working on safe AI.

“Currently things are really bad — there are thousands of people and billions of dollars working on making these things stronger. There are less than 100 people in the entire world working on the control problem. Which is insane,” he says.

“The actions that people such as Sam Altman take are obviously accelerationist in nature, based on justifications that speeding ahead on AI at the current pace is acceptable and even desirable. I disagree with this.” 

Tim Smith

Tim Smith is news editor at Sifted. He covers deeptech and AI, and produces Startup Europe — The Sifted Podcast. Follow him on X and LinkedIn