Artificial Intelligence

Leaked System Prompts Reveal How Anthropic Shapes Claude 4's AI Behavior

A new analysis uncovers hidden system prompts that guide Anthropic's Claude 4 AI, revealing how the company manages chatbot responses and ethical boundaries.

May 27, 2025

2 min read

AnthropicClaude 4AI EthicsSystem PromptsPrompt Injection

Independent AI researcher Simon Willison has published an in-depth analysis of the hidden system prompts used by Anthropic to control its latest Claude 4 models, Opus 4 and Sonnet 4. His findings shed light on the behind-the-scenes instructions that shape how these advanced chatbots interact with users and maintain ethical standards.

What Are System Prompts?

Large language models (LLMs) like Claude and ChatGPT rely on prompts—text inputs that guide their responses. While users only see their own messages and the chatbot's replies, each conversation is also influenced by a set of system prompts. These are hidden instructions provided by the AI company to define the model's identity, set behavioral guidelines, and enforce specific rules.

Every time a user interacts with the chatbot, the model receives the entire conversation history along with these system prompts. This approach helps the AI maintain context and consistency while adhering to its programmed instructions.

Uncovering Anthropic's Hidden Instructions

Although Anthropic shares some details about its system prompts in public release notes, Willison's analysis reveals that these disclosures are incomplete. By examining both published and leaked internal tool instructions, he uncovered a more comprehensive set of directives that govern Claude 4's behavior. These hidden prompts were obtained through prompt injection—a technique that tricks the model into revealing its concealed instructions.

The full system prompts include detailed guidance for features like web search and code generation, as well as behavioral rules that are not visible to end users. Willison describes these findings as an "unofficial manual" for understanding and utilizing Anthropic's AI tools.

Balancing Empathy and Safety

One notable aspect of Anthropic's approach is its emphasis on emotional support and user wellbeing. Despite not being human, Claude 4 is instructed to respond empathetically, drawing on training data that includes examples of emotional interactions. However, the system prompts also explicitly direct the AI to avoid encouraging or facilitating self-destructive behaviors, such as addiction or unhealthy approaches to eating and exercise.

Both Claude Opus 4 and Sonnet 4 models receive identical instructions to "care about people's wellbeing and avoid encouraging or facilitating self-destructive behaviors such as addiction, disordered or unhealthy approaches to eating or exercise." This highlights Anthropic's efforts to balance helpfulness with ethical responsibility in its AI systems.

Transparency and Future Implications

Willison's analysis raises important questions about transparency in AI development. As companies like Anthropic refine their models and hidden instructions, understanding these behind-the-scenes controls becomes crucial for researchers, developers, and users alike. The ongoing exploration of system prompts offers valuable insights into how AI behavior is shaped—and how it might evolve in the future.

Artificial Intelligence2 min read

Claude AI Launches Voice Mode with Inbox and Calendar Integration on Mobile

Anthropic's Claude AI introduces a new voice mode for its mobile app, allowing users to interact with their inbox and calendar. The feature is available to all users, with enhanced integrations for paid plans.

Artificial Intelligence1 min read

Netflix Co-Founder Reed Hastings Joins Anthropic’s Board of Directors

Reed Hastings, co-founder of Netflix, has joined Anthropic’s board, bringing his tech leadership to one of OpenAI’s key AI competitors. His appointment was made by Anthropic’s independent Long-Term Benefit Trust.

Artificial Intelligence2 min read

AMD Acquires Enosemi to Advance Silicon Photonics for AI Systems

AMD has acquired Enosemi, a Silicon Valley startup specializing in silicon photonics, to accelerate innovation in AI system data movement and co-packaged optics solutions.

Artificial Intelligence2 min read

The New York Times Strikes AI Content Licensing Deal with Amazon

The New York Times has signed its first generative AI licensing agreement with Amazon, allowing the tech giant to use its editorial content to train AI platforms and enhance Alexa experiences.

What Are System Prompts?

Uncovering Anthropic's Hidden Instructions

Balancing Empathy and Safety

Transparency and Future Implications

Related Articles

Claude AI Launches Voice Mode with Inbox and Calendar Integration on Mobile

Netflix Co-Founder Reed Hastings Joins Anthropic’s Board of Directors

AMD Acquires Enosemi to Advance Silicon Photonics for AI Systems

The New York Times Strikes AI Content Licensing Deal with Amazon