OpenAI has unveiled GPT-5.3-Codex, marking a significant leap forward in AI-powered software development. This new model combines the frontier coding capabilities of GPT-5.2-Codex with the reasoning and professional knowledge of GPT-5.2, resulting in what OpenAI calls "the most capable agentic coding model to date."
Performance Breakthroughs
OpenAI reports that GPT-5.3-Codex sets new high scores on several industry benchmarks:
- SWE-Bench Pro: 56.8% success rate on real-world software engineering tasks spanning four programming languages
- Terminal-Bench 2.0: 77.3% on an evaluation of terminal skills
- OSWorld-Verified: 64.7% on agentic computer-use tasks
- Cybersecurity CTF: 77.6% on capture-the-flag challenges
The model achieves these results while using fewer tokens than previous models, enabling developers to build more with the same resources.
Real-World Capabilities
What sets GPT-5.3-Codex apart is its ability to handle long-running, complex tasks that mirror real-world software development. The model can autonomously build complete applications over millions of tokens, iterating and improving based on feedback.
OpenAI demonstrated this capability by having GPT-5.3-Codex build two complete games from scratch, a racing game and a diving game, steering it only with generic follow-up prompts like "fix the bug" or "improve the game." The results showcase the model's ability to understand intent, implement functionality, and refine output without constant human intervention.
Interactive Collaboration
Unlike previous coding assistants, GPT-5.3-Codex provides real-time updates and interactive collaboration. The model talks through its decisions, responds to feedback mid-task, and keeps users informed from start to finish. This shift transforms the experience from waiting for a final output to actively steering an intelligent collaborator.
"Much like a colleague, you can steer and interact with GPT-5.3-Codex while it's working, without losing context," OpenAI explains in its announcement.
Used to Build Itself
In a remarkable demonstration of capability, GPT-5.3-Codex was instrumental in creating itself. The Codex team used early versions of the model to:
- Debug its own training infrastructure
- Manage deployment processes
- Diagnose test results and evaluations
- Build data pipelines and visualization tools
- Optimize GPU cluster scaling
Team members report that their jobs have fundamentally changed in just two months, with GPT-5.3-Codex accelerating research, engineering, and product development across OpenAI.
Enhanced Cybersecurity Measures
GPT-5.3-Codex is the first model OpenAI classifies as "High capability" for cybersecurity-related tasks under its Preparedness Framework. The company is implementing comprehensive safety measures, including:
- Trusted Access for Cyber pilot program for defense research
- Expanded beta of Aardvark security research agent
- $10M in API credits for cybersecurity defense work
- Partnership with open-source maintainers for vulnerability scanning
Because cybersecurity capabilities are dual-use, OpenAI is taking a precautionary approach: supporting defenders while implementing safeguards against misuse.
Availability and Performance
GPT-5.3-Codex is available now on paid ChatGPT plans across the Codex app, CLI, IDE extension, and web interface. OpenAI says the model runs 25% faster than previous versions, thanks to infrastructure improvements and co-design with NVIDIA GB200 NVL72 systems.
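A "25% faster" figure can be read in two ways, and the text here does not say which is meant. As a rough illustration only, the Python sketch below compares both readings for a hypothetical 60-second task. The function names and the 60-second baseline are assumptions made for this example, not figures from OpenAI.

```python
def time_if_throughput_up(baseline_s: float, gain: float) -> float:
    """Wall-clock time if 'X% faster' means throughput rises by `gain`."""
    return baseline_s / (1.0 + gain)


def time_if_latency_down(baseline_s: float, gain: float) -> float:
    """Wall-clock time if 'X% faster' means task time falls by `gain`."""
    return baseline_s * (1.0 - gain)


if __name__ == "__main__":
    baseline = 60.0  # hypothetical task duration in seconds (assumption)
    gain = 0.25      # the 25% figure quoted in the article
    print(f"throughput reading: {time_if_throughput_up(baseline, gain):.1f} s")
    print(f"latency reading:    {time_if_latency_down(baseline, gain):.1f} s")
```

Under the first reading the hypothetical task takes 48 seconds, and under the second it takes 45. The gap is small, but it is worth keeping in mind when comparing speed claims.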
API access is being developed with a focus on safe deployment and is expected to be available soon.
Beyond Code Generation
GPT-5.3-Codex represents a shift from pure code generation to operating a computer end to end. The model supports the full software lifecycle: debugging, deploying, monitoring, writing documentation, conducting user research, and more. Its capabilities extend beyond software development to general professional knowledge work, including creating presentations, analyzing spreadsheets, and managing complex projects.
This evolution positions Codex not just as a coding assistant, but as a foundation for general-purpose collaboration on technical and professional tasks.
TL;DR
- GPT-5.3-Codex combines frontier coding with advanced reasoning, achieving state-of-the-art results on SWE-Bench Pro (56.8%) and Terminal-Bench 2.0 (77.3%)
- The model provides interactive, real-time collaboration and can autonomously build complex applications over millions of tokens
- OpenAI used early versions of the model to accelerate the model's own development: debugging training infrastructure, managing deployment, and building analysis tools
- Enhanced cybersecurity safeguards include the Trusted Access for Cyber program and $10M in API credits for defense research
- Available now on paid ChatGPT plans, running 25% faster than previous versions