Understanding OpenAI's Agent Architecture

Codex CLI is OpenAI's cross-platform local software agent, designed to produce high-quality, reliable software changes while operating safely and efficiently on your machine. The team has gained deep insight into building world-class software agents, and it is sharing detailed technical posts about how Codex works internally.

The Agent Loop: Core Foundation

At the heart of every AI agent is the "agent loop." This represents the fundamental interaction pattern between user, model, and tools:

  1. User Input - The agent takes input from the user and prepares textual instructions (a prompt) for the model
  2. Model Inference - The model processes the prompt through tokenization and generates a response
  3. Tool Execution - If the model requests a tool call, the agent executes it
  4. Iteration - The agent appends tool output to the original prompt and re-queries the model
  5. Termination - The loop ends when the model produces an assistant message instead of requesting tools
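The five steps above can be sketched as a single loop. This is an illustrative toy, not Codex's actual code: `query_model` and `run_tool` are stand-ins for real model inference and tool execution.

```python
# Hypothetical agent-loop sketch; the model and tool are faked so the
# loop's control flow (the actual subject) is runnable end to end.

def query_model(transcript):
    """Fake model: requests one tool call, then produces a final answer."""
    if not any(item["type"] == "tool_output" for item in transcript):
        return {"type": "tool_call", "name": "shell", "args": {"cmd": "ls"}}
    return {"type": "assistant_message", "text": "Done: listed files."}

def run_tool(call):
    """Fake tool executor standing in for a real shell/tool runtime."""
    return {"type": "tool_output", "name": call["name"], "output": "README.md"}

def agent_loop(user_input):
    transcript = [{"type": "user_message", "text": user_input}]   # 1. user input
    while True:
        response = query_model(transcript)                        # 2. model inference
        transcript.append(response)
        if response["type"] == "tool_call":
            transcript.append(run_tool(response))                 # 3-4. execute tool, append output, re-query
        else:
            return response["text"]                               # 5. terminate on assistant message

print(agent_loop("List the files in this repo"))
```

The loop only exits when the model replies with an assistant message rather than another tool call, which mirrors the termination condition in step 5.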

How Codex Manages Context

Codex uses OpenAI's Responses API to run model inference. Each request to the Responses API initiates a "turn" in the conversation.
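Because each request is stateless, every turn resends the static content (instructions, tool schemas) plus the conversation so far. A minimal sketch of that pattern, assuming a hypothetical `build_request` helper rather than the real API client:

```python
# Illustrative only: keeping the static prefix byte-identical across
# turns is what makes prompt-cache hits possible.

STATIC_PREFIX = [
    {"role": "system", "content": "You are a coding agent."},  # assumed instructions
]

def build_request(history):
    # Stateless request: static prefix + full history, rebuilt every turn.
    return {"input": STATIC_PREFIX + history}

history = [{"role": "user", "content": "Fix the failing test"}]
first = build_request(history)

history.append({"role": "assistant", "content": "Running the test suite..."})
second = build_request(history)

# The static prefix is unchanged between turns, so it can be cached.
assert first["input"][: len(STATIC_PREFIX)] == second["input"][: len(STATIC_PREFIX)]
```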

The Three-Agent System

Codex uses a sophisticated multi-agent architecture that combines prompt caching, context window management, and tool integration to support reliable software development.
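One piece of that tool integration is presenting the model with a single tool list drawn from several sources. The sketch below is a hypothetical merge, with made-up tool names; it is not Codex's registry code.

```python
# Hedged sketch: merge tool definitions from multiple sources into one
# list handed to the model. All names here are illustrative.

openai_tools = [{"name": "web_search"}]             # hosted OpenAI tools (assumed)
native_tools = [{"name": "shell"}, {"name": "apply_patch"}]
mcp_tools = [{"name": "mcp.github.create_pr"}]      # from a user-defined MCP server

def merge_tools(*sources):
    seen, merged = set(), []
    for source in sources:
        for tool in source:
            if tool["name"] not in seen:            # earlier sources win on name clashes
                seen.add(tool["name"])
                merged.append(tool)
    return merged

tools = merge_tools(openai_tools, native_tools, mcp_tools)
assert [t["name"] for t in tools] == [
    "web_search", "shell", "apply_patch", "mcp.github.create_pr"]
```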

TL;DR

- Agent Loop Foundation: User input → model inference → tool execution → iteration → completion
- Context Management: Prompt caching, stateless requests, and automatic compaction prevent context-window overflows
- Multi-Source Tools: Codex integrates OpenAI tools, native tools, and user-defined MCP servers
- Performance Focus: Keeping static prompt content identical across turns enables prompt-cache hits
- Practical Output: Codex executes actual code changes via tool calls
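The "automatic compaction" item above can be illustrated with a toy sketch: when the transcript nears the context limit, older turns are folded into a summary. The token counter and summarizer here are crude stand-ins, not Codex's real mechanism.

```python
# Illustrative compaction sketch under assumed limits; not real Codex code.

LIMIT = 50  # pretend context window, measured in whitespace "tokens"

def count_tokens(items):
    return sum(len(item["content"].split()) for item in items)

def summarize(items):
    # Stand-in for a model-generated summary of the dropped turns.
    return {"role": "system", "content": f"[summary of {len(items)} earlier messages]"}

def compact(history, keep_last=2):
    if count_tokens(history) <= LIMIT:
        return history                      # still fits: no compaction needed
    head, tail = history[:-keep_last], history[-keep_last:]
    return [summarize(head)] + tail         # replace old turns with a summary

history = [{"role": "user", "content": " ".join(["tok"] * 30)} for _ in range(3)]
compacted = compact(history)
assert len(compacted) == 3
assert compacted[0]["content"].startswith("[summary")
```

The key property is that recent turns survive verbatim while older ones are collapsed, keeping the request under the window without losing the thread of the conversation.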

Source: OpenAI: Unrolling the Codex Agent Loop