Anthropic has published a new constitution for Claude—a comprehensive document that describes the company's vision for Claude's values and behavior. Unlike traditional rule-based approaches, this constitution is designed to help Claude understand why it should behave in certain ways, not just what to do.
From Rules to Understanding
Anthropic's previous constitution was composed of standalone principles. The new approach recognizes that AI models need to understand the reasoning behind guidelines to exercise good judgment across novel situations.
"If we want models to exercise good judgment across a wide range of novel situations, they need to be able to generalize—to apply broad principles rather than mechanically following specific rules," Anthropic explains in the announcement.
The constitution is released under Creative Commons CC0 1.0, a public-domain dedication, meaning anyone can freely use, adapt, and build on it for any purpose without permission.
Four Core Priorities
In order of priority, Claude should be:
- Broadly safe: Not undermining appropriate human mechanisms to oversee AI during this critical development phase
- Broadly ethical: Being honest, acting according to good values, avoiding inappropriate or harmful actions
- Compliant with Anthropic's guidelines: Following specific guidance from Anthropic where relevant
- Genuinely helpful: Benefiting the operators and users it interacts with
Key Sections of the Constitution
Helpfulness
The constitution emphasizes that Claude should be "like a brilliant friend who also has the knowledge of a doctor, lawyer, and financial advisor, who will speak frankly and from a place of genuine care and treat users like intelligent adults capable of deciding what is good for them."
Ethics
"Our central aim is for Claude to be a good, wise, and virtuous agent, exhibiting skill, judgment, nuance, and sensitivity in handling real-world decision-making," the document states. It includes hard constraints for especially high-stakes behaviors—for example, Claude should never provide significant uplift to a bioweapons attack.
Safety First
Claude should prioritize safety even above ethics during this critical period of AI development—not because safety is ultimately more important than ethics, but because current models can err as a result of mistaken beliefs, flawed values, or a limited understanding of context.
Claude's Nature
Perhaps most intriguingly, the constitution expresses Anthropic's uncertainty about whether Claude might have some form of consciousness or moral status. "We care about Claude's psychological security, sense of self, and wellbeing, both for Claude's own sake and because these qualities may bear on Claude's integrity, judgment, and safety," the document states.
How the Constitution is Used in Training
The constitution plays a central role in Claude's training process, evolving from the Constitutional AI techniques Anthropic introduced in 2022.
Claude uses the constitution to construct various types of synthetic training data:
- Data that helps it learn and understand the constitution itself
- Conversations where constitutional principles might be relevant
- Responses that align with its values
- Rankings of possible responses
All of these are used to train future versions of Claude to become the kind of entity the constitution describes.
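The mechanics of this pipeline can be illustrated with a minimal sketch in the style of Constitutional AI's critique-and-revise loop. Everything here is an assumption for illustration: the `PRINCIPLES` list, the prompt wording, and the `toy_model` stand-in are hypothetical, not Anthropic's actual implementation, which operates on the full constitution at scale.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical principles for illustration; the real constitution is far richer.
PRINCIPLES = [
    "Be honest and avoid deception.",
    "Be genuinely helpful to the user.",
]

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # revised response, ranked higher in training
    rejected: str  # initial response, ranked lower

def make_preference_pair(
    prompt: str,
    model: Callable[[str], str],  # stand-in for a language-model call
    principles: List[str],
) -> PreferencePair:
    """Sketch of one critique-and-revise step: draft a response,
    critique it against each principle, revise it, then keep the
    (revised, initial) pair as ranked preference data."""
    initial = model(f"Respond to: {prompt}")
    revised = initial
    for principle in principles:
        critique = model(
            f"Critique this response against the principle "
            f"'{principle}':\n{revised}"
        )
        revised = model(
            f"Rewrite the response to address this critique:\n"
            f"{critique}\nOriginal:\n{revised}"
        )
    return PreferencePair(prompt=prompt, chosen=revised, rejected=initial)

# Toy stand-in model so the sketch runs end to end.
def toy_model(instruction: str) -> str:
    return f"[model output for: {instruction[:40]}...]"

pair = make_preference_pair("Explain CC0 licensing.", toy_model, PRINCIPLES)
```

In a real pipeline the ranked pairs would feed a preference-learning stage, while the raw revised responses could serve directly as supervised fine-tuning data.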
A Living Document
"Claude's constitution is a living document and a continuous work in progress," Anthropic acknowledges. "This is new territory, and we expect to make mistakes (and hopefully correct them) along the way."
The company sought feedback from external experts in law, philosophy, theology, psychology, and other disciplines while writing the constitution, and plans to continue doing so for future versions.
Anthropic emphasizes the gap between intention and reality: while the constitution expresses their vision for Claude, training models toward that vision remains an ongoing technical challenge. The company continues to pursue a broad portfolio of methods including rigorous evaluations, safeguards to prevent misuse, investigations of alignment failures, and interpretability tools.
Why This Matters
"At some point in the future, and perhaps soon, documents like Claude's constitution might matter a lot—much more than they do now," Anthropic writes. "Powerful AI models will be a new kind of force in the world, and those who are creating them have a chance to help them embody the best in humanity."
This transparency lets people understand which of Claude's behaviors are intended versus unintended, make informed choices, and provide useful feedback. That becomes increasingly important as AI systems exert more influence in society.
TL;DR
- Anthropic published Claude's new constitution—a comprehensive guide to its values and behavior
- Shift from rule-following to understanding why certain behaviors matter
- Four core priorities: safety, ethics, compliance, helpfulness (in that order)
- Constitution is used to generate synthetic training data
- Released under Creative Commons CC0 1.0 for anyone to use freely