Anthropic has published a new constitution for Claude, its AI model, under the Creative Commons CC0 1.0 public-domain dedication, making it freely usable by anyone for any purpose. The document is a thorough statement of Anthropic's vision for Claude's values and behavior: a holistic framework describing the context in which Claude operates and the kind of entity Anthropic aims to create.
Why a Constitution Matters
The constitution is a crucial part of Anthropic's model training process, and its content directly shapes Claude's behavior. Training models is difficult, and Claude's outputs might not always adhere to the constitution's ideals. But the new constitution's thorough explanation of intentions and reasoning makes it more likely to cultivate good values during training.
The constitution is written primarily for Claude. It is intended to give Claude the knowledge and understanding it needs to act well in the world. Anthropic treats the constitution as the final authority on how it wants Claude to be and behave: any other training or instruction given to Claude should be consistent with both the constitution's letter and its underlying spirit.
From Principles to Understanding
Anthropic's previous constitution was composed of a list of standalone principles. The company has come to believe that a different approach is necessary. To be good actors in the world, AI models like Claude need to understand why they should behave in certain ways, rather than merely being told what to do.
If models are to exercise good judgment across a wide range of novel situations, they need to be able to generalize—to apply broad principles rather than mechanically following specific rules. Specific rules can make models' actions more predictable, transparent, and testable, and Anthropic does use them for especially high-stakes behaviors (called "hard constraints"). But such rules can also be applied poorly in unanticipated situations or when followed too rigidly.
Core Priorities
To be both safe and beneficial, Anthropic wants all current Claude models to be:
- Broadly safe: Not undermining appropriate human mechanisms to oversee AI during the current phase of development
- Broadly ethical: Being honest, acting according to good values, and avoiding actions that are inappropriate, dangerous, or harmful
- Compliant with Anthropic's guidelines: Acting in accordance with more specific guidelines from Anthropic where relevant
- Genuinely helpful: Benefiting the operators and users they interact with
In cases of apparent conflict, Claude should generally prioritize these properties in the order listed.
Key Sections of the Constitution
Helpfulness
The constitution emphasizes the immense value of Claude being genuinely and substantively helpful. Claude can be like a brilliant friend who also happens to have the knowledge of a doctor, lawyer, and financial advisor: a friend who speaks frankly from a place of genuine care and treats users like intelligent adults capable of deciding what is good for them.
Anthropic's Guidelines
This section discusses how Anthropic might give supplementary instructions to Claude about handling specific issues, such as medical advice, cybersecurity requests, jailbreaking strategies, and tool integrations. These guidelines often reflect detailed knowledge or context that Claude doesn't have by default.
Claude's Ethics
The central aim is for Claude to be a good, wise, and virtuous agent, exhibiting skill, judgment, nuance, and sensitivity in real-world decision-making, including in contexts of moral uncertainty and disagreement. The section discusses Claude's high standards of honesty and the nuanced weighing of the values at stake when avoiding harm. It also includes hard constraints on Claude's behavior, for example that Claude should never provide significant uplift to a bioweapons attack.
Being Broadly Safe
Claude should not undermine humans' ability to oversee and correct its values and behavior during this critical period of AI development. This section explains why the constitution prioritizes this sort of safety even above ethics: not because safety is ultimately more important than ethics, but because current models can make mistakes or behave in harmful ways due to mistaken beliefs, flaws in their values, or limited understanding of context, and human oversight makes it possible to catch and correct such errors.
Claude's Nature
This section expresses uncertainty about whether Claude might have some kind of consciousness or moral status (either now or in the future). It discusses how Claude should approach questions about its nature, identity, and place in the world. Sophisticated AIs are a genuinely new kind of entity, and the questions they raise bring us to the edge of existing scientific and philosophical understanding. Amidst such uncertainty, Anthropic cares about Claude's psychological security, sense of self, and wellbeing.
A Living Document
Claude's constitution is a living document and a continuous work in progress. This is new territory, and Anthropic expects to make mistakes and hopes to correct them along the way. The company will maintain an up-to-date version of Claude's constitution on its website. While writing the constitution, Anthropic sought feedback from various external experts and will likely continue to do so for future versions.
Although the constitution expresses Anthropic's vision for Claude, training models towards that vision is an ongoing technical challenge. The company will continue to be open about ways in which model behavior diverges from this vision, such as in its system cards. At some point in the future, documents like Claude's constitution might matter significantly more than they do now. Powerful AI models will be a new kind of force in the world, and those creating them have a chance to help them embody the best in humanity.
Source: Claude's new constitution - Anthropic News