Proposal: Scout Models — Lightweight Context-Delivery Agents for Codex #11751
AmirTlinov started this conversation in Ideas
Replies: 1 comment
I need this for myself. I want a model that can assess a codebase: how the user flows work, how the business logic is structured, what the test coverage looks like, whether it uses modern standards, etc.
Problem
Codex currently operates with a single model that handles everything: reading the repository, understanding context, reasoning about the task, and producing patches. This creates a fundamental tension — a large, expensive model spends significant tokens (and time) on mechanical context gathering, while still frequently missing critical fragments of code that are essential for producing correct patches.
The result: higher cost, slower runs, and patches built on an incomplete picture of the code.
Proposed Solution: Scout Models
Introduce a scout layer — a fleet of small, fast, cheap models (think gpt-5.1-codex-mini or gpt-5.3-codex-spark) whose sole job is context delivery to the primary model (gpt-5.3-codex, gpt-5.1-codex-max, etc.).
Core Principles
Scouts mark each fragment they deliver with an explicit anchor explaining why it matters (e.g. [ANCHOR: target function signature — needed for return type inference]).
How It Works
What a Scout Delivers (Example)
Instead of dumping an entire 2000-line file into the primary model's context, a scout delivers:
The primary model receives ~120 lines of precisely targeted context instead of ~3500 lines across two files. No noise, no irrelevant code, no guessing.
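As a rough sketch of what such a delivery could look like — the Fragment fields, the [ANCHOR: ...] syntax, and the file path are all illustrative, not a defined protocol:

```python
# Hypothetical sketch of a scout's output: a small, anchored context
# bundle instead of whole files. All field names are illustrative.
from dataclasses import dataclass

@dataclass
class Fragment:
    path: str        # file the fragment came from
    start_line: int  # 1-indexed position in the source file
    anchor: str      # why this fragment matters to the task
    code: str        # raw lines, delivered verbatim

def render(fragments):
    """Format fragments the way a primary model might receive them."""
    parts = []
    for f in fragments:
        parts.append(f"[ANCHOR: {f.anchor}]")
        parts.append(f"# {f.path}:{f.start_line}")
        parts.append(f.code)
    return "\n".join(parts)

delivery = [
    Fragment("billing/invoice.py", 142,
             "target function signature — needed for return type inference",
             "def total(self, tax: Decimal) -> Decimal: ..."),
]
print(render(delivery))
```

The point of the anchor line is that the primary model never has to guess why a fragment was included — the scout states it explicitly.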
Why This Matters
Key Design Constraints
Scouts must NOT write diffs, propose patches, or make decisions for the primary model.
Scouts MUST only gather and deliver raw, anchored context fragments.
Implementation Considerations
This could integrate with Codex's existing architecture:
A single config option (e.g. scout_model: "gpt-5.1-codex-mini") could select the scout. The scout layer is conceptually similar to how a senior engineer works with a junior: the junior gathers all relevant files and highlights the important sections, then the senior makes the actual decisions and writes the code.
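A minimal sketch of how the orchestration could slot into an agent loop. run_model here is a stub standing in for a real model call; nothing in this snippet is Codex's actual API:

```python
# Hypothetical scout orchestration: a cheap model gathers context,
# an expensive model reasons over it. run_model is a stub, not a
# real Codex API.
def run_model(model: str, prompt: str) -> str:
    # Stand-in for an actual model invocation.
    return f"[{model}] {prompt[:40]}"

def solve(task: str, repo_summary: str,
          scout_model: str = "gpt-5.1-codex-mini",
          primary_model: str = "gpt-5.3-codex") -> str:
    # 1. The cheap scout scans the repo and returns anchored fragments.
    context = run_model(scout_model,
                        f"Find code relevant to: {task}\n{repo_summary}")
    # 2. The primary model reasons and writes the patch over the
    #    scout's targeted context rather than whole files.
    return run_model(primary_model, f"{context}\nTask: {task}")
```

The design choice is that the scout's output is the only repository context the primary model sees, which is what keeps the expensive model's token budget small.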
Future Vision: Role-Based Agent Pipeline with Per-Role Model Configuration
The scout concept naturally extends into a broader architecture — a role-based agent pipeline where each stage has a distinct responsibility and a separately configurable model. This would allow users to balance cost, speed, and quality at every step.
Proposed Pipeline
Why Per-Role Model Configuration Matters
Different roles have fundamentally different compute profiles:
- Planner: fast (gpt-5.3-codex-spark) or full (gpt-5.3-codex)
- Scout: lightweight (gpt-5.1-codex-mini / gpt-5.3-codex-spark)
- Executor: strong (gpt-5.3-codex / gpt-5.1-codex-max)
- Validator: fast (gpt-5.3-codex-spark) or lightweight (gpt-5.1-codex-mini)
A user working on a simple refactor might assign gpt-5.3-codex-spark to every role for maximum speed. A user working on a critical architectural change might assign gpt-5.1-codex-max to the executor and gpt-5.3-codex to the validator. The point is: the user decides, based on their task and budget.
Example Configuration
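Per-role selection might look roughly like this in configuration. The section and key names are hypothetical — no such options exist in Codex's config today:

```toml
# Hypothetical per-role model selection; none of these keys exist yet.
[agents]
planner   = "gpt-5.3-codex-spark"
scout     = "gpt-5.1-codex-mini"
executor  = "gpt-5.1-codex-max"
validator = "gpt-5.3-codex"
```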
Key Benefit
The current single-model approach forces a trade-off: either you use an expensive model for everything (including mechanical scanning), or you use a cheaper model and accept lower patch quality. The role-based pipeline eliminates this trade-off — expensive reasoning only happens where it matters, while mechanical tasks run on fast, cheap models. The validator stage adds a safety net that catches errors before the user ever sees them, without requiring the primary model to self-review (which is both expensive and unreliable).
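The stages described above can be sketched as a simple loop, with the validator gating what reaches the user. run_stage is a stub and the role names follow the pipeline described in this proposal, not any existing interface:

```python
# Sketch of the role-based pipeline with a validation loop.
# Each stage's model is user-configurable; run_stage is a stub.
def run_stage(model: str, role: str, payload: str) -> str:
    return f"{role}:{payload}"  # stand-in for a real model call

def pipeline(task: str, models: dict[str, str], max_rounds: int = 2) -> str:
    plan = run_stage(models["planner"], "plan", task)
    context = run_stage(models["scout"], "scout", plan)
    patch = run_stage(models["executor"], "patch", context)
    for _ in range(max_rounds):
        verdict = run_stage(models["validator"], "check", patch)
        if "FAIL" not in verdict:
            break  # validator accepted; rejected drafts never reach the user
        # Feed the validator's findings back to the executor for a fix.
        patch = run_stage(models["executor"], "patch", verdict)
    return patch
```

The safety net is the check-and-retry loop: the primary model is never asked to self-review, a separately configured validator does it instead.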
Offer to Contribute
If this direction aligns with OpenAI's roadmap for Codex, I would be happy to contribute to the implementation — whether that's prototyping the scout orchestration layer, defining the anchor protocol, or integrating it into the existing agent loop. I'm actively working with Codex CLI and have a concrete understanding of where context gaps cause failures in practice.
TL;DR: Add a layer of cheap, fast "scout" models (gpt-5.1-codex-mini / gpt-5.3-codex-spark) that do nothing but read code and deliver precisely anchored raw fragments (with context headers) to the primary model (gpt-5.3-codex). Scouts never write diffs — they only gather and deliver context. This improves patch quality, reduces cost, and speeds up execution.