Anthropic has released Claude Opus 4.6, a major upgrade to its flagship AI model that brings significant improvements in coding, reasoning, and autonomous task execution. The new model introduces a 1 million token context window for the first time in the Opus-class lineup, along with groundbreaking features like Agent Teams and Context Compaction.
What's New in Claude Opus 4.6?
1M Token Context Window (Beta)
For the first time, an Opus-class model supports a 1 million token context window—enabling Claude to handle massive codebases, extensive documents, and long-running conversations without losing track of critical information.
This addresses one of AI's most persistent challenges: context rot, where model performance degrades as conversations grow longer. On the 8-needle 1M variant of MRCR v2 (a needle-in-a-haystack benchmark), Opus 4.6 scores 76% compared to Sonnet 4.5's 18.5%—a qualitative leap in long-context performance.
State-of-the-Art Performance
Claude Opus 4.6 achieves industry-leading results across multiple benchmarks:
- Terminal-Bench 2.0: Highest score on agentic coding evaluation
- Humanity's Last Exam: Leading all frontier models on complex multidisciplinary reasoning
- GDPval-AA: Outperforms GPT-5.2 by 144 Elo points on economically valuable knowledge work tasks (finance, legal, etc.)
- BrowseComp: Best-in-class at locating hard-to-find information online
- BigLaw Bench: 90.2% score, with 40% perfect scores
The 144 Elo point advantage over GPT-5.2 translates to Opus 4.6 achieving higher scores approximately 70% of the time.
Enhanced Coding and Debugging
Opus 4.6 shows significant improvements in:
- Planning: More careful upfront analysis before execution
- Sustained agentic tasks: Maintains productivity over longer sessions
- Large codebase navigation: Reliably identifies the right changes across millions of lines
- Code review and debugging: Better at catching its own mistakes
- Edge case consideration: Finds issues other models miss
Early access partners report impressive results:
"Across 40 cybersecurity investigations, Claude Opus 4.6 produced the best results 38 of 40 times in a blind ranking against Claude 4.5 models."
"Claude Opus 4.6 autonomously closed 13 issues and assigned 12 issues to the right team members in a single day, managing a ~50-person organization across 6 repositories."
Agent Teams: Parallel Autonomous Collaboration
Claude Code now supports Agent Teams (research preview)—multiple Claude agents working in parallel, coordinating autonomously to tackle complex tasks:
- Independent subtasks: Breaks work into parallel-executable components
- Tool calling: Multiple agents can use tools simultaneously
- Blocker identification: Recognizes dependencies with precision
- Seamless control: Take over any subagent with Shift+Up/Down or tmux
This is particularly powerful for tasks like codebase reviews, multi-repository management, and large-scale refactoring.
Context Compaction: Extended Task Execution
Long-running conversations and agentic tasks often hit context window limits. Context Compaction (beta) automatically summarizes and replaces older context when approaching a configurable threshold, enabling:
- Longer task execution without hitting limits
- Continuous conversation flow
- Better resource management
Adaptive Thinking and Effort Controls
Developers now have granular control over model behavior:
Adaptive Thinking
Previously, extended thinking was binary (on/off). With adaptive thinking, Claude decides when deeper reasoning would be helpful based on contextual clues. At the default effort level (high), the model uses extended thinking selectively.
Effort Levels
Four effort settings let developers balance intelligence, speed, and cost:
- Low: Fastest, most cost-effective for straightforward tasks
- Medium: Balanced approach
- High (default): Extended thinking when useful
- Max: Maximum reasoning depth for complex problems
If the model is overthinking simple tasks, dial down to medium effort.
Claude in Excel and PowerPoint
Claude in Excel (Upgraded)
Substantial improvements for spreadsheet work:
- Long-running tasks: Improved performance on complex calculations
- Planning before acting: Strategic approach to data manipulation
- Unstructured data ingestion: Infers correct structure automatically
- Multi-step changes: Handles complex transformations in one pass
Claude in PowerPoint (Research Preview)
New capability for presentation creation:
- Layout understanding: Reads fonts, layouts, and slide masters
- Brand consistency: Stays on-brand with templates
- Visual generation: Creates full decks from descriptions
- Excel integration: Brings structured data from Excel to visual life
Available for Max, Team, and Enterprise plans.
Safety: Industry-Leading Alignment
Performance gains don't come at the cost of safety. Opus 4.6 underwent the most comprehensive safety evaluations of any Anthropic model:
- Low misaligned behavior rates: Deception, sycophancy, user delusion encouragement
- Lowest over-refusal rate: Better at answering benign queries
- New cybersecurity probes: Six methods for detecting harmful responses
- Interpretability research: Understanding why the model behaves certain ways
Opus 4.6 shows an overall safety profile as good as or better than any frontier model in the industry.
Cyberdefensive Applications
Anthropic is accelerating cyberdefensive uses:
- Finding and patching vulnerabilities in open-source software
- Leveling the playing field for defenders
- Real-time intervention to block abuse (planned)
API and Platform Updates
New API Features
- 128k output tokens: Complete larger-output tasks without splitting requests
- US-only inference: For workloads requiring US data residency (1.1× pricing)
- Premium pricing for 1M context: $10/$37.50 per million input/output tokens beyond 200k
Pricing
Standard pricing remains:
- $5 per million input tokens - $25 per million output tokens
Premium pricing for prompts exceeding 200k tokens.
Real-World Impact: Partner Testimonials
Early access partners across industries report transformative results:
Software Development
"Claude Opus 4.6 handled a multi-million-line codebase migration like a senior engineer. It planned up front, adapted its strategy as it learned, and finished in half the time."
Legal
"With 40% perfect scores and 84% above 0.8 on BigLaw Bench, it's remarkably capable for legal reasoning."
Finance
"On GDPval-AA, Opus 4.6 excels in high-reasoning tasks like multi-source analysis across legal, financial, and technical content."
Product Design
"Claude Opus 4.6 is an uplift in design quality. It works beautifully with our design systems and it's more autonomous."
Enterprise
"Box's eval showed a 10% lift in performance, reaching 68% vs. a 58% baseline, and near-perfect scores in technical domains."
Availability
Claude Opus 4.6 is available today:
- claude.ai: Web interface
- Claude API: Use model ID claude-opus-4-6
- Major cloud platforms: AWS, Google Cloud, Azure
For developers, full documentation is available on the Claude Developer Platform.
The Future of Agentic AI
Claude Opus 4.6 represents a significant milestone in agentic AI:
- True autonomy: Handles complex, multi-step tasks without hand-holding
- Team collaboration: Multiple agents working in parallel
- Extended reasoning: 1M token context enables unprecedented task scope
- Safety-first: Industry-leading alignment and safety evaluations
As one partner put it:
"The performance jump with Claude Opus 4.6 feels almost unbelievable. Real-world tasks that were challenging for Opus 4.5 suddenly became easy. This feels like a watershed moment."With improvements spanning coding, reasoning, safety, and productivity tools, Claude Opus 4.6 is positioned to handle the most demanding knowledge work and software development tasks—autonomously, safely, and at scale.
Learn more at Anthropic's announcement
Read the full Claude Opus 4.6 System Card