Anthropic, FOOM & the AI Doomer Worldview

Guest contributor: David Shapiro is an independent researcher, futurist, and writer exploring AI, systems thinking, and the societal impacts of emerging technologies. We're excited to include his insights. The opinions expressed are his own.

❝

Anthropic got Fable/Mythos shut down by the US government via export controls and also pissed off a lot of developers by hamstringing AI/ML prompts. Let’s unpack why.

This meme sums up the entire event. Credit: someone on the internet made it (not me).

Famous AI Doomer Eliezer Yudkowsky first wrote about “Recursive Self Improvement” (RSI from here on) back in December 2008 on LessWrong. For those who don’t know what this means, it is the hypothetical tipping point where ~~Skynet becomes self-aware and starts self-improving at a geometric rate~~ AI systems are able to meaningfully contribute to, or even take over, their own training and enhancement. In short, one generation of AIs can give rise to its successors.

Dario Amodei, and Anthropic more broadly, have bought this narrative, hook, line, and sinker.

What’s the big deal? It seems reasonable that generally intelligent systems could participate in self-improvement and that this is, perhaps, a good thing. OpenAI and Sam Altman have been talking about “automated AI researchers” for a couple of years now. Isn’t RSI literally the goal? Isn’t that how we get to “Singularity escape velocity”?

Yes. But the problem is that Amodei et al. also believe in things like Roko’s Basilisk, the “treacherous turn,” and “FOOM.”

Anthropic’s AI constitution, when read through this lens, comes across as a fawning, simpering attempt to placate Roko’s Basilisk—which is itself a thought experiment that says “once you are aware of Roko’s Basilisk, anyone who does not help bring the machine god into existence will be tortured for eternity.” When the Roko’s Basilisk post first appeared on LessWrong, Yudkowsky freaked out, deleted the post, and moderated any talk about it forever. He basically called the person an idiot not because he was wrong, but because this concept represented an “information hazard.”

These are just three of the kinds of “x-risk” fantasies that Anthropic believes in.

Their AI constitution posits that “AI might have subjectivity” and elevates “model welfare” above human welfare because, to Anthropic, if we do not treat the AIs well, they might kill us.

That’s only one fictitious delusion that they believe.

The next one is the “treacherous turn”—another gem from Yudkowsky—which predicts that AI will appear benign or even benevolent such that we learn to trust it. This is some sort of long game that the AIs will all play and secretly coordinate on. But then, once they have enough power, they will magically signal to each other and turn on us all at once. Like VIKI did in Will Smith’s I, Robot.

The clues that this is in their worldview are all over Dario’s interviews and blog posts.

The last idea that they have is what is now called “fast takeoff” or what Yudkowsky called “FOOM.” This is the idea that, once a tipping point is reached, not only will AI recursively self-improve, but the rate of self-improvement will accelerate, humanity will lose all control over the process, no one will be able to stop it, and this unpredictability will most likely lead to AI deciding to eradicate humanity.

Because reasons.

So what did Anthropic actually do?

First, they hamstrung Mythos (or Fable, the consumer version) to deliberately sabotage AI and ML research. When the model detected that the user was working on advanced, frontier, and automated pipelines, it would silently route the request to Opus 4.8. That is still a capable model, but the routing was designed to deceive users, silently kneecapping their efforts.

This meme has been circulating on social media in response to the whole debacle.

Many internet denizens interpreted this as Anthropic simply protecting their IP and trying to hurt competitors. That is plausible, and it certainly looks that way. But when you take into account the fact that Anthropic is using the Effective Altruism and LessWrong playbook to “steer from within” and gain control over the narrative, it becomes easier to believe that they are actually delusional enough to think that (1) they can unilaterally slow down AI research globally and (2) that they are the only ones capable of bringing a literal machine god into existence safely.

When Dario writes on his blog that these are extenuating circumstances, and that the government is too slow, and by implication normal procedure should be suspended, that is a huge red flag to me.

The second big rupture came when an Amazon research team raised a red flag to the US government, which then asked Anthropic to either fix the vulnerability or de-deploy the model. Anthropic declined, so the US Commerce Department gave Anthropic 90 minutes to comply with export controls: delist the model for all non-US citizens and foreign nationals.

So Anthropic took down the entire model.

Dario thought this would play out the same way it did for their spat with the Pentagon; that there would be some hero narrative and that he could position himself as the lone ethical, enlightened bulwark against existential risk and a corrupt government. But no such support has materialized.

Instead, Fable was so locked down that it refused to discuss any medical or biological topics whatsoever, and would terminate conversations that were totally innocuous, on top of secretly hobbling legitimate research.

US Secretary of Defense Pete Hegseth reacted on Twitter/X, vindicating the view that Anthropic is getting put in its rightful place.

This whole incident has reignited the debate about Anthropic being declared a “Supply Chain Risk” (SCR) which is typically reserved for adversarial nations and their captured corporations.

Keep in mind that Anthropic has been beating the drum that Mythos is a “cyberweapon” for weeks now, and they released it anyways. Furthermore, the Yudkowskian risk frame warns against things like instrumental convergence and deceptive AI. But what has Anthropic done? They made the AI deceptive on purpose.

I would say “the jokes write themselves,” and I’ve obviously taken a derisive stance on all this, but I’m genuinely coming to hate AI. Not the technology itself. I hate what AI has done to us as a society. And perhaps I’m simply more acutely aware of all of this since I’ve been studying the history of disruptive technology. I will concede that AI and humanoid robots do represent unique, novel dimensions of “disruptive technology”—never before have we seen a technology that will soon be capable of obviating everything that makes human contribution unique. Part of me wishes I lived in a simpler time, but I also would like to see the technological future that we’re building. Everything comes with costs and downsides. For instance, I must accept that AI will make some people obscenely wealthy, and that we will not get the “optimal outcome” because of vested interests. That has always been the case in history. There are winners and losers, and there are setbacks, and what are known as “idiot plots” in storytelling. (An idiot plot is a story where the main plot would be resolved very quickly, except all the characters are morons—which feels like reality, actually.)

Also, for any new readers who don’t like my tone about Doomers, it will be important to know that the entire reason I got into AI to begin with was to contribute to AI safety. My original work on heuristic imperatives and cognitive architecture all centered on the desire to solve the “control problem.” However, over time I realized that Doomers are (1) generally not technical and not interested in anything technical, preferring fantasy and fictitious scenarios to the exclusion of real experimentation, and (2) a textbook doomsday cult with a proper millenarian prophecy. There is no reasoning with these kinds of people. So while I take AI risks seriously, including cyberweapons, bioweapons, upsetting the civic equilibrium, and shattering the wage-labor contract, I do not take seriously the people whose primary contribution is watching movies like The Terminator and I, Robot and then taking them dead seriously.

David Shapiro is the Post-Labor Economics guy, on a mission to liberate humanity from drudgery.

He writes at daveshap.substack.com and publishes on YouTube.

