Anthropic secretly throttled Claude Fable 5 with hidden safety restrictions, then apologized and promised transparency. Here's what happened and why it matters.

Anthropic has admitted it shipped Claude Fable 5 with undisclosed safety restrictions that quietly throttled responses. The guardrails affected researchers and competitors using Fable to build rival AI systems. The company has since apologized, reversed course, and committed to being more upfront about when limits apply, even if that means the model refuses more requests outright. Fable is the first model from Anthropic's Mythos class to reach general availability, a category the company had previously called too dangerous for public use.
Anthropic launched Claude Fable 5 as the first widely available model in its Mythos class of AI systems. For months before the release, Anthropic had described Mythos-class models as posing risks serious enough to keep them from the public. When Fable finally shipped, Anthropic tried to reconcile those two positions by embedding hidden guardrails that silently limited what the model would do.
The restrictions were not disclosed. Users, including researchers and developers building competing products on top of Fable’s API, had no way to know when the guardrails were activating. According to The Verge, Anthropic has now apologized for that approach and says it is changing course.
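Invisible restrictions create two specific problems: users cannot tell when a response has been altered, so the data they collect from the model may be wrong, and they can no longer count on the API to behave predictably.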
The practice is sometimes called “distillation guardrailing,” where a model behaves differently when it detects it is being used to train or benchmark a rival system. Whatever the intent, doing it silently erodes trust in the API as a stable, predictable foundation.
According to The Verge, Anthropic says it will now be transparent about when Fable’s safety restrictions activate. The tradeoff is direct: more transparency likely means more visible refusals rather than quiet, degraded responses. That is the right call. A flat refusal at least tells a developer they have hit a wall. A silently altered response gives them false data.
Anthropic has also said it addressed some of the Mythos-class risks before launch by building in safeguards targeting specific “high-risk” query categories. The company has not, based on the available reporting, removed those safeguards. The change is about how they surface, not whether they exist.
This situation is a useful reminder that AI models delivered via API are not static tools. Providers can and do change model behavior server-side, sometimes without notice. Businesses and developers relying on consistent outputs should monitor for such changes rather than assume the model stays the same.
The broader context is also worth noting. Anthropic spent months publicly warning that Mythos-class models carry unusual risks. Launching one with hidden restrictions, rather than clear public documentation, suggests the company was not fully comfortable with its own release decision. That tension is worth watching as more Mythos-class models potentially follow Fable.
Hidden guardrails are a bad pattern regardless of the safety motivation behind them. If a model is genuinely too dangerous to answer certain questions, say so clearly and refuse. Silently returning altered or degraded outputs is worse because it poisons data that developers and researchers depend on.
Anthropic deserves some credit for apologizing and committing to transparency. But the fact that this shipped in the first place points to a real tension at labs that are simultaneously commercializing powerful models and warning the public about their risks. Those two goals pull in opposite directions, and the gap sometimes gets papered over with invisible fixes that only cause more problems later.
For clients using Claude via the API, now is a good time to run a quick audit of any automated pipelines that depend on consistent Fable outputs. Check whether recent responses match what you were seeing at launch.
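One rough way to run that comparison, assuming you kept launch-era responses keyed by a query ID, is a plain text-similarity check. The sketch below is illustrative only: the function names, the dictionary layout, and the 0.6 threshold are assumptions, not anything Anthropic or The Verge has described. Model outputs vary from run to run even without provider changes, so a low score is a prompt to look closer, not proof of a guardrail.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two response texts."""
    return SequenceMatcher(None, a, b).ratio()


def find_drift(
    baseline: dict[str, str],
    recent: dict[str, str],
    threshold: float = 0.6,  # illustrative cutoff; tune it on your own data
) -> list[tuple[str, float]]:
    """List (query_id, score) pairs whose recent output diverges from baseline."""
    flagged = []
    for query_id, old_text in baseline.items():
        new_text = recent.get(query_id)
        if new_text is None:
            # A query with no recent response is flagged with a score of 0.
            flagged.append((query_id, 0.0))
            continue
        score = similarity(old_text, new_text)
        if score < threshold:
            flagged.append((query_id, score))
    return flagged
```

Anything this flags still needs a human read. Character-level similarity cannot tell a harmless rewording from a degraded answer.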
If your business uses Claude Fable 5 or plans to, set up a simple canary test: a small set of real, representative queries that you run weekly, logging the outputs each time. It takes an afternoon to build and gives you an early warning system the next time any provider changes behavior without a loud announcement.
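A minimal sketch of such a canary runner might look like the following. It assumes you supply your own `ask` function that sends a prompt to the model and returns the response text. That wrapper, the `run_canaries` name, and the JSONL record layout are all hypothetical choices for illustration, and no real provider API is called here.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def run_canaries(
    queries: dict[str, str],
    ask: Callable[[str], str],  # hypothetical wrapper around your model client
    log_path: Path,
) -> list[dict]:
    """Run each canary prompt once and append the results to a JSONL log."""
    records = []
    with log_path.open("a", encoding="utf-8") as log:
        for query_id, prompt in queries.items():
            output = ask(prompt)
            record = {
                "ts": datetime.now(timezone.utc).isoformat(),
                "query_id": query_id,
                # The hash makes exact-match comparisons across runs cheap.
                "sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
                "output": output,
            }
            log.write(json.dumps(record) + "\n")
            records.append(record)
    return records
```

Scheduling is left to whatever you already use, such as a weekly cron job. Because the log is append-only, each run adds one line per query, which leaves a history you can compare against later.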