Introducing GPT-5.3-Codex: The Most Capable Agentic Coding Model

OpenAI has unveiled GPT-5.3-Codex, marking a significant milestone in the evolution of AI-assisted software development. This new model represents the most capable agentic coding system to date, combining frontier coding performance with advanced reasoning capabilities to enable autonomous work on complex, long-running development tasks. Remarkably, GPT-5.3-Codex is also the first OpenAI model that was instrumental in creating itself.

A New Paradigm in AI-Assisted Development

GPT-5.3-Codex advances both the frontier coding performance of GPT-5.2-Codex and the reasoning and professional knowledge capabilities of GPT-5.2, integrating these strengths into a single model that is also 25% faster. This combination enables the system to undertake long-running tasks involving research, tool use, and complex execution—much like working with a human colleague.

Crucially, users can steer and interact with GPT-5.3-Codex while it's working, without losing context. This interactive capability transforms the agent from a fire-and-forget tool into a collaborative partner that can adapt its approach based on real-time feedback.

Self-Improvement: The Model That Built Itself

GPT-5.3-Codex is the first OpenAI model that was instrumental in creating itself: the Codex team used early versions of the model to debug its own training, manage its own deployment, and diagnose test results and evaluations. Team members reported being astonished by how much Codex accelerated its own development.

This is more than a technical curiosity: it shows that the model can handle the complex, multifaceted work of real software development, from debugging intricate systems to managing deployment pipelines.

Frontier Performance Across Benchmarks

GPT-5.3-Codex sets new highs on SWE-Bench Pro and Terminal-Bench 2.0 and performs strongly on OSWorld and GDPval:

SWE-Bench Pro: The model achieves state-of-the-art performance on SWE-Bench Pro, a rigorous evaluation of real-world software engineering. Unlike SWE-bench Verified, which only tests Python, SWE-Bench Pro spans four programming languages and is more contamination-resistant, challenging, diverse, and industry-relevant.

Terminal-Bench 2.0: GPT-5.3-Codex far exceeds previous state-of-the-art performance on Terminal-Bench 2.0, which measures the terminal skills that coding agents need. Notably, it achieves these results using fewer tokens than any prior model, allowing users to accomplish more within their usage limits.

OSWorld: The model demonstrates strong performance on OSWorld, an agentic computer-use benchmark where agents complete productivity tasks in a visual desktop environment. GPT-5.3-Codex shows far stronger computer use capabilities than previous GPT models.

GDPval: With custom skills, GPT-5.3-Codex matches GPT-5.2's performance on GDPval, an evaluation measuring model performance on well-specified knowledge-work tasks across 44 occupations, including creating presentations, spreadsheets, and other work products.

Beyond Code: Building Complete Applications

Together, frontier coding capabilities, improved aesthetics, and compaction allow the model to build highly functional, complex games and applications from scratch over the course of days. To test the model's web development and long-running agentic capabilities, OpenAI asked GPT-5.3-Codex to build two games: a racing game and a diving game.

Using generic follow-up prompts like "fix the bug" or "improve the game," GPT-5.3-Codex iterated on the games autonomously over millions of tokens, demonstrating sustained, purposeful work toward improving complex software systems.

Improved Understanding of User Intent

GPT-5.3-Codex understands user intent better than its predecessor when creating everyday websites. Simple or underspecified prompts now produce sites with more functionality and sensible defaults, providing a stronger starting canvas for bringing ideas to life.

For example, when asked to build landing pages, GPT-5.3-Codex automatically implements thoughtful UI decisions like showing yearly plans as discounted monthly prices and creating automatically transitioning testimonial carousels with multiple distinct user quotes. These enhancements result in pages that feel more complete and production-ready by default.
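The pricing decision described above comes down to simple arithmetic. The following is a minimal sketch of that calculation, not code produced by the model; the function name and the example prices are hypothetical and chosen only for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def yearly_as_monthly(monthly_price: Decimal, yearly_price: Decimal) -> tuple[Decimal, int]:
    """Return the yearly plan's effective monthly price and the percent saved.

    Both prices are hypothetical inputs; Decimal avoids float rounding on currency.
    """
    # Effective monthly cost when paying once per year, rounded to cents.
    effective = (yearly_price / 12).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    # Saving relative to twelve payments at the monthly rate, as a whole percent.
    saved = (1 - yearly_price / (monthly_price * 12)) * 100
    percent = int(saved.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    return effective, percent


# Example: a plan billed at 20 per month or 192 per year (illustrative numbers).
effective, percent = yearly_as_monthly(Decimal("20"), Decimal("192"))
print(f"{effective}/month, billed yearly (save {percent}%)")
```

A landing page would display the effective monthly figure next to the regular monthly price, so the two plans can be compared in the same unit.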

Supporting the Full Software Lifecycle

Software engineers, designers, product managers, and data scientists do far more than generate code. GPT-5.3-Codex is built to support all work in the software lifecycle: debugging, deploying, monitoring, writing product requirements documents, editing copy, user research, tests, metrics, and more.

The model's agentic capabilities extend beyond software development, helping users build whatever they want, from slide decks to data analysis in spreadsheets. That breadth makes GPT-5.3-Codex useful across diverse professional contexts.

Interactive Collaboration

As models become more capable, the challenge shifts from what agents can do to how easily humans can interact with, direct, and supervise agents working in parallel. GPT-5.3-Codex addresses this through enhanced interactivity.

The model provides frequent updates to keep users apprised of key decisions and progress as it works. Instead of waiting for a final output, users can interact in real time—asking questions, discussing approaches, and steering toward solutions. GPT-5.3-Codex talks through what it's doing, responds to feedback, and keeps users in the loop from start to finish.

Accelerating OpenAI's Own Development

The recent rapid Codex improvements build on research projects spanning months or years across OpenAI. These projects are being accelerated by Codex itself, with many researchers and engineers describing their work today as fundamentally different from just two months ago.

The research team used Codex to monitor and debug training runs, track patterns throughout training, analyze interaction quality, propose fixes, and build rich applications for understanding model behavior. The engineering team used Codex to optimize and adapt the harness, identify bugs, find the root cause of performance issues, and dynamically scale GPU clusters to adjust to traffic surges.

During alpha testing, researchers used GPT-5.3-Codex to create regex classifiers for analyzing session logs, run them at scale, and produce comprehensive reports—work that would have taken significantly longer through traditional methods.
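As a rough idea of what a regex classifier over session logs can look like, here is a minimal sketch. It is not OpenAI's implementation: the labels, the patterns, and the one-line-per-message log format are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical labels and patterns; a real classifier would be tuned on actual logs.
PATTERNS = {
    "error": re.compile(r"\b(error|exception|traceback)\b", re.IGNORECASE),
    "clarification": re.compile(r"\b(what do you mean|can you clarify)\b", re.IGNORECASE),
    "positive_feedback": re.compile(r"\b(thanks|great|perfect)\b", re.IGNORECASE),
}


def classify(line: str) -> list[str]:
    """Return every label whose pattern matches the line."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(line)]


def summarize(lines: Iterable[str]) -> Counter:
    """Count labels across all lines; lines with no match count as 'unlabeled'."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(classify(line) or ["unlabeled"])
    return counts


# Illustrative log lines, assumed to be one message per line.
sample = [
    "Traceback (most recent call last):",
    "Thanks, that works",
    "Can you clarify the spec?",
    "ok",
]
print(summarize(sample))
```

Because each line is classified independently, the same functions can be mapped over many log files in parallel, and the resulting counters can be added together to form a report.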

Cybersecurity Capabilities and Safeguards

Recent months have seen meaningful gains in model performance on cybersecurity tasks, benefiting both developers and security professionals. GPT-5.3-Codex is the first model OpenAI classifies as High capability for cybersecurity-related tasks under its Preparedness Framework, and the first directly trained to identify software vulnerabilities.

While OpenAI doesn't have definitive evidence that the model can automate cyber attacks end-to-end, the company is taking a precautionary approach by deploying its most comprehensive cybersecurity safety stack to date. Mitigations include safety training, automated monitoring, trusted access for advanced capabilities, and enforcement pipelines including threat intelligence.

Because cybersecurity is inherently dual-use, OpenAI is taking an evidence-based, iterative approach that accelerates defenders' ability to find and fix vulnerabilities while slowing misuse. As part of this effort, the company is launching Trusted Access for Cyber to accelerate cyber defense research.

Ecosystem Safeguards and Partnerships

OpenAI is investing in ecosystem safeguards, including expanding the private beta of Aardvark, its security research agent, as the first offering in its suite of Codex Security products. The company is also partnering with open-source maintainers to provide free codebase scanning for widely used projects.

Building on its $1 million Cybersecurity Grant Program launched in 2023, OpenAI is committing $10 million in API credits to accelerate cyber defense with its most capable models, especially for open source software and critical infrastructure systems.

Availability and Performance

GPT-5.3-Codex is available with paid ChatGPT plans across all Codex surfaces: the app, CLI, IDE extension, and web. OpenAI is working to safely enable API access soon.

With this update, GPT-5.3-Codex runs 25% faster for Codex users, thanks to improvements in infrastructure and the inference stack. The model was co-designed for, trained with, and served on NVIDIA GB200 NVL72 systems.

Looking Toward a General-Purpose Agent

With GPT-5.3-Codex, Codex is moving beyond writing code to using it as a tool to operate a computer and complete work end to end. By pushing the frontier of what a coding agent can do, OpenAI is also unlocking a broader class of knowledge work—from building and deploying software to researching, analyzing, and executing complex tasks.

What started as a focus on being the best coding agent has become the foundation for a more general collaborator on the computer, expanding both who can build and what's possible with Codex. As capabilities continue to advance, the vision is not just better coding tools, but a fundamental shift in how humans and AI systems work together to accomplish complex technical goals.

Source: Introducing GPT-5.3-Codex - OpenAI Blog