The Captain-Dispatcher Design: Empower Your Agent Team Members With the Full 1M Context Window

May 30, 2026

Fifth in a series. Previous posts: (1) AI Agent Teams for Analytics — the 17-agent architecture and lifecycle. (2) Best Practices To Keep AI Agents on Track — how enforcement mechanism matters more than rule content. (3) Agent Discussion: The Quality Layer That Harness Engineering Can’t Replace — how structured agent-to-agent discussion catches judgment-dependent quality issues. (4) The Dispatcher — how repeatable evaluation turns spec changes into measurements.

Summary

Every agent gets the full context window the model offers, not a fraction of it. In the Captain-Dispatcher architecture, each agent runs as an independent top-level process and receives the same context allocation a single-user session would receive. A pipeline of eight agents with a 1M-token-class model gets eight 1M-token contexts, not one 1M context shared eight ways.
Long-horizon analyses no longer compact mid-task. When a sub-agent inherits a small fraction of its parent’s context, it routinely exhausts that budget partway through a multi-step task and silently compacts, losing reasoning chains. Independent-process agents in our pipeline complete multi-hour analyses without auto-compaction, because each agent’s context is sized to its own work, not its caller’s.
The orchestrator stays small while the workers stay large. A Captain that simply routes messages and writes event logs uses a tiny fraction of its context budget. Pushing the heavy work (research, query writing, drafting, review) into separately-launched processes lets the orchestration loop run indefinitely without itself becoming the bottleneck.
Communication is a file-based mailbox, not a function call. Agents exchange messages by writing to and reading from a shared filesystem mailbox (SendMessage). This decouples sender and receiver: messages persist across restarts, can be replayed for debugging, and survive any single agent crashing or being respawned.
The pattern has independent industry validation. Google’s A2A protocol formalizes independent-process agent communication at the protocol level. The open-source TAP project implements file-based mailboxes between heterogeneous agents (Claude, Codex, Gemini). At least four independent teams have converged on the same external-orchestration pattern documented in public GitHub issues — strong evidence the architecture addresses a real structural constraint.
The architecture composes cleanly with existing harness work. The Captain-Dispatcher pattern is orthogonal to the rule-enforcement harness (post 1), the discussion architecture (post 2), and the evaluation dispatcher (post 3). It is the substrate that makes each of those layers possible at full scale: hooks fire per-agent, discussion happens between full-context peers, and the evaluation dispatcher can launch independent pipelines because the agents themselves were already designed to run as independent processes.

1. Introduction

The first three posts in this series described how to make a multi-agent pipeline reliable: enforcement mechanisms for programmatic rules, structured discussion for judgment-dependent quality, and automated infrastructure for repeatable measurement. Each layer assumed something underneath it: that the agents themselves had enough room to do their work. An Auditor cannot challenge a finding if its context window filled up before the challenge step. A Writer cannot reconcile evidence across phases if half the evidence has already been compacted away. A Captain cannot orchestrate a multi-hour pipeline if the orchestration loop itself runs out of context.

This post is about that substrate. The architecture we describe — Captain-Dispatcher — solves a specific problem: making sure every agent has the full context allocation its model is capable of using, even when many agents run as part of the same pipeline. The conventional approach is to spawn agents as sub-processes of a parent orchestrator, with all sub-agents sharing one runtime configuration and one effective context budget. The Captain-Dispatcher approach is to launch each agent as an independent top-level process and have them communicate through a file-based mailbox.

The benefit is direct. Each agent receives the same context window it would receive in a standalone interactive session. A pipeline of eight agents using a 1M-token-class model has eight 1M-token contexts, allocated per agent, not divided across them. Long-horizon work — multi-phase analyses, large codebases, multi-document research — fits naturally inside the per-agent budget. Auto-compaction events that previously truncated work mid-task disappear from the trace.

The pattern is not unique to our system. The Agent-to-Agent (A2A) protocol designed by Google and other industry partners is built on exactly this idea: independent agent processes, opaque to one another, communicating over a shared transport. The open-source TAP framework implements a file-based variant of the same pattern across three different agent runtimes. Public GitHub issue trackers show at least four independent teams converging on the same external-orchestration design as a way to escape framework-imposed context limits. The Captain-Dispatcher architecture is our concrete instance of a pattern that the wider ecosystem has been converging on.

The remainder of this post covers the problem the architecture solves, the root cause of the limitation in conventional sub-process spawning, the design of the independent-process pattern with file-based mailboxes, the industry validation, and the design principles that generalize to other multi-agent systems.

2. The Problem: Sub-Process Spawning Degrades Per-Agent Context

Observation 1: A pipeline’s context budget is the per-agent budget, not the sum

Multi-agent pipelines do not benefit from “total” context across agents. Each agent does its work in its own context window: the Data agent’s table list, the Execution agent’s query plan, the Analyst’s framework, the Writer’s draft. The amount of context each agent has available determines how much work it can do before it must either summarize, hand off, or compact. There is no useful sense in which one agent can “borrow” another agent’s unused context.

When sub-agents are spawned inside a parent’s runtime, the framework typically applies one context budget across the whole agent tree. Sub-agents are allocated a fraction of the parent’s effective context. For pipelines that need every agent to operate on full evidence — read tens of files, hold a framework with a dozen hypotheses, reconcile cross-source data — that fraction is the binding constraint on what each agent can do.

Observation 2: Context exhaustion is silent and corrosive

When an agent fills its context budget mid-task, the runtime does not raise an error visible to the user. It auto-compacts: summarizes the older portion of the conversation, discards original tokens, continues from the summary. Compaction is sometimes appropriate, but inside a multi-step analytical task it is often disastrous. The detailed evidence that justified an earlier claim is replaced by a short summary that no longer supports detailed downstream reasoning. The next step appears to proceed normally; the quality of the output silently drops.

A common failure pattern from this is a Writer agent that produces a polished-looking deliverable from a context that no longer contains the underlying numbers. The deliverable cites figures that the Writer can no longer verify, because they have been compacted out. Cross-source validation collapses to “validated against Table B” with no remaining trace of the comparison. Challenge calibration drops back toward zero because the Auditor’s reasoning record has been summarized away.

Observation 3: Time-to-exhaustion is short — and independent-process agents eliminate it

In a pipeline where sub-agents inherit a fraction of the parent’s context, time-to-context-exhaustion is measured in minutes, not hours. On a comprehensive analysis task (6-dimension breakdown with cross-table validation and 8+ charts), our context tracker recorded every agent hitting 100% context within 95 minutes under the sub-process model:

flowchart LR
    subgraph SP["Sub-Process: All hit 100%"]
        D1["Data<br/>100% @ 25m"] --> E1["Execution<br/>100% @ 45m"]
        E1 --> A1["Analyst<br/>100% @ 55m"]
        A1 --> Au1["Auditor<br/>100% @ 75m"]
        Au1 --> W1["Writer<br/>100% @ 95m"]
    end

    subgraph IP["Independent: None exceed 32%"]
        D2["Data<br/>12%"] --> E2["Execution<br/>26%"]
        E2 --> A2["Analyst<br/>23%"]
        A2 --> Au2["Auditor<br/>32%"]
        Au2 --> W2["Writer<br/>20%"]
    end

    style SP fill:#ffcdd2,stroke:#C62828
    style IP fill:#c8e6c9,stroke:#2E7D32

Table N: Context Utilization — Sub-Process vs Independent-Process (Same Task Type)

Agent	Sub-Process (TeamCreate)	Independent-Process (Captain-Dispatcher)
Data	100% at 25 min	121.6k tokens at 520 min (12% of 1M)
Execution	100% at 45 min	261.9k tokens at 520 min (26% of 1M)
Analyst	100% at 55 min	226.7k tokens at 520 min (23% of 1M)
Auditor	100% at 75 min	318.8k tokens at 520 min (32% of 1M)
Writer	100% at 95 min	200.9k tokens at 520 min (20% of 1M)

Under the sub-process model, every agent exhausted its context before the pipeline completed. The Data agent — the lightest role — hit 100% in 25 minutes. Under the independent-process model, the same task type ran for over 8 hours with no agent exceeding 32% of its context budget. The heaviest agent (Auditor, 318.8k tokens) accumulated more than 1.5x what the sub-process model’s total budget would allow — without any compaction event.

The Auditor’s 318.8k tokens is definitive proof: a 200K-context agent would have compacted at ~180K. Accumulating 318.8k tokens without compaction is only possible with a context window of at least 320K — consistent with the 1M budget that independent top-level processes receive.

A note on misleading banners: Agent runtimes may display a banner at startup indicating a smaller context window or lower effort level than the agent actually has (e.g., “Opus 4.7 with high effort” when the agent actually has 1M context). This is a known inconsistency between the banner display and the actual runtime configuration. The token accumulation data — not the banner — is the ground truth for context window size. When an agent accumulates 318k tokens without compaction, no banner claiming “200K” or “high effort” overrides the empirical evidence.

Observation 4: The orchestrator is itself a long-running agent

The orchestrator (Captain) runs for the full duration of the pipeline — hours, sometimes days. Across that time it must read every agent’s report, write event logs, make routing decisions, and re-read its own spec after any compaction. If the orchestrator shares its context budget with the workers, the orchestration loop becomes the failure point: the Captain itself compacts, loses track of which agents have produced what, and either repeats work or skips deliverables.

Treating the orchestrator as one more agent that needs its full context budget — and the workers as separate agents each with their own full budget — is the only configuration that supports long-running multi-agent pipelines without orchestration-layer collapse.

3. Root Cause: Why Sub-Process Spawning Constrains Context

The structural reason sub-process spawning degrades context comes from how agent frameworks resolve the model identifier when launching child agents. The mechanism is well-documented in public bug trackers and applies across multiple framework versions.

Cause 1: Model identifier stripping during sub-process spawn

Agent runtimes that use a backend launcher to spawn child processes typically pass a --model flag to the child. When the parent is configured with a large-context model variant — for example, a model identifier with a [1m] suffix indicating a 1M-token variant — the child-launching code path can strip that suffix before passing it to the subprocess. The child boots with the base model identifier and falls back to the default context window for that base model, which is much smaller.

This stripping has been documented across at least four versions of the Claude Code runtime in public issue trackers (GitHub issues #43782, #34421, #36670, #40929). Independent reproductions across the versions used the same diagnostic — inspect the running subprocess’s command line — and consistently observed the base model identifier without the large-context suffix. The behavior reproduces on multiple operating systems and across at least four independent users.

The result is structural: even when the parent is configured with the large-context variant, the child cannot be. There is no parent setting that makes the suffix propagate, because the propagation path itself drops it.

Cause 2: Sub-agent tool model parameter constrained by a small enum

The framework-level tool that spawns sub-agents accepts a model parameter, but its value is constrained by a JSONSchema enum that lists only a small set of values — typically the family-level names (such as sonnet, opus, haiku) rather than specific model identifiers with context-window suffixes. Public source-code inspection (sdk-tools.d.ts and equivalent) confirms the enum is the entire allowed set. Specifying a 1M-context variant is structurally impossible: the tool call would fail validation before reaching the spawn step.

A workaround sometimes proposed is a global environment variable that the framework reads when launching any sub-agent. This is documented as applying globally: it sets the model for all sub-agents in the process tree, with no per-agent override. Controlled testing in public issue threads shows that even with the global variable set, sub-agents launch into the base context window rather than the large variant. The framework’s effective configuration is “one model for all sub-agents, default context window for whichever model is set,” with no path to per-agent context tier control.

Cause 3: Frontmatter-based skill agent inheritance is opaque

For agent runtimes that allow declaring a sub-agent’s model in skill frontmatter, the runtime resolves the declared model identifier with logic that re-appends the parent session’s context tier. A skill that declares a smaller model can find itself running on the smaller model with the parent’s large context tier — which the user’s billing may not cover. There is no frontmatter knob to specify a context tier independently of the model. Reproduction reports show 100% failure rates across multiple attempts on at least one runtime version, with the bug filed and labeled but not resolved.

Why these causes argue for independent processes

Each of the three causes above describes the same structural issue from a different angle: the model and context-window configuration for a sub-agent is not a per-call parameter the orchestrator can set freely. The runtime imposes a global or inherited setting, with limited and fragile override paths. As long as agents are spawned inside the orchestrator’s runtime, the orchestrator cannot give each agent its own full-budget context — the framework does not expose that knob.

The resolution is to spawn agents outside the orchestrator’s runtime. An agent launched as an independent top-level process — the same way a user would open a fresh interactive session — receives the same per-process resource allocation a user session would receive. The context window is whatever the model and the user’s account entitle the process to. There is no parent-runtime intermediary stripping suffixes or capping inheritance.

The orchestrator then communicates with these independent processes through a transport that does not require shared runtime state. The natural choice in our environment is a file-based mailbox: each agent reads and writes messages from a shared filesystem location, and the orchestrator routes work by writing to the appropriate mailbox.

4. The Solution: Independent-Process Agents with a File-Based Mailbox

The Captain-Dispatcher architecture has two parts: how agents are spawned, and how they communicate. Both are deliberately simple. The simplicity is what makes the architecture composable with the rule-enforcement, discussion, and evaluation layers built on top.

Architecture Comparison

flowchart TD
    subgraph Sub["Sub-Process Model"]
        P["Parent Process<br/>1M context"] --> S1["Sub-Agent 1<br/>~200K"]
        P --> S2["Sub-Agent 2<br/>~200K"]
        P --> S3["Sub-Agent 3<br/>~200K"]
    end

    subgraph Ind["Independent-Process Model"]
        C["Captain<br/>1M context"]
        A1["Agent 1<br/>1M context"]
        A2["Agent 2<br/>1M context"]
        A3["Agent 3<br/>1M context"]
        C ---|mailbox| A1
        C ---|mailbox| A2
        C ---|mailbox| A3
    end

    style Sub fill:#ffcdd2,stroke:#C62828
    style Ind fill:#c8e6c9,stroke:#2E7D32

Table 1: Sub-Process Model vs Independent-Process Model

Aspect	Sub-Process (conventional)	Independent-Process (Captain-Dispatcher)
Context per agent	Fraction of parent’s budget (e.g., 200K)	Full model entitlement (e.g., 1M)
Effort level	Degraded (e.g., “high”)	Full (e.g., “max”)
Model control	Inherited/limited enum	Per-agent, independently configured
Fault isolation	Parent dies → all children die	Each agent independent; one crash doesn’t affect others
Communication	In-process function calls	File-based SendMessage mailboxes
Inspectability	Transient in-memory state	Every message persisted on disk
Orchestrator context	Accumulates all agent content	Stays small — routes messages only
Hooks/enforcement	Fire in parent’s context	Fire independently per agent
Discussion protocol	Unchanged	Unchanged (SendMessage works identically)
Evaluation dispatcher	Unchanged	Unchanged (launches pipelines the same way)

4.1 Spawning agents as independent top-level processes

Every agent in the pipeline — Captain, Data, Execution, Analyst, Writer, Auditor, Improve, Watchdog — runs as its own top-level operating-system process. There is no parent-child runtime relationship between any two agents. Each is launched the same way a user would launch a fresh interactive session for that agent.

The Captain is the first process launched. It reads the user’s request, instantiates the event log, and decides which agents the pipeline needs. For each agent, the Captain (or a thin shell launcher) starts a new top-level process: a separate session that loads its own skill, its own system prompt, and its own context budget. The Captain holds no reference to the agent’s internal runtime state — only the agent’s mailbox identifier.

The benefit is that the framework’s sub-agent constraints — model identifier stripping, enum-limited model parameter, frontmatter inheritance opacity — do not apply. Each agent’s process is configured the same way a user’s session would be: with full account entitlements, the full context window for whichever model it is running, and full control over its own runtime configuration. Eight agents launched this way yield eight independent full-budget contexts.

flowchart LR
    Captain["Captain<br/><i>thin router</i>"] -->|SendMessage| Data["Data<br/>mailbox/"]
    Captain -->|SendMessage| Exec["Execution<br/>mailbox/"]
    Captain -->|SendMessage| Analyst["Analyst<br/>mailbox/"]
    Data -->|SendMessage| Auditor["Auditor<br/>mailbox/"]
    Exec -->|SendMessage| Auditor
    Analyst -->|SendMessage| Auditor
    Auditor -->|SendMessage| Captain

    style Captain fill:#fff3e0,stroke:#E65100
    style Auditor fill:#fce4ec,stroke:#C62828

Each arrow is a file written to the recipient’s mailbox directory. Messages persist on disk, survive restarts, and are replayable for debugging.

4.2 File-based mailboxes for inter-agent communication

Agents communicate by reading and writing messages in a shared filesystem location. Each agent has a mailbox — a directory or set of files — keyed by the agent’s identifier. To send a message, the sender writes a structured file to the recipient’s mailbox. To receive, the recipient reads new files from its own mailbox.

We expose this as a SendMessage primitive available to every agent. Calling SendMessage(to="auditor", body=...) writes a message file under the auditor’s mailbox path. The Auditor’s process detects the new file (via a file watcher, periodic poll, or notification depending on the runtime) and processes the message in its own context, with all of its own available budget.

The mailbox is durable and inspectable:

Durable — messages persist across agent restarts. If an agent crashes mid-task and is respawned, its mailbox contains every unhandled message; the new process picks them up cleanly.
Inspectable — every inter-agent message is a file on disk. The full conversation trace is recoverable after the run, without any in-memory state. Debugging a pipeline run reduces to reading the message files in time order.
Replayable — because the messages are durable files, an alternate agent implementation can be tested against the same message history without re-running the original pipeline.

4.3 The Captain as a thin router

In the Captain-Dispatcher architecture, the Captain itself is deliberately lightweight. Its responsibilities are routing decisions, event logging, and gate enforcement — not the heavy work of analysis, query writing, or drafting. The Captain’s context budget is therefore dominated by its spec, its event log, and the digest of agent reports it must consider for routing. The substantive content of each agent’s work lives in that agent’s own process, not in the Captain’s.

This split has two operational consequences. First, the Captain can run for the full duration of even a many-hour pipeline without running out of context, because the heavy material never enters its window. Second, when the Captain needs to revisit an earlier agent’s work, it requests a summary (or a specific extracted section) from the agent’s mailbox or from a persisted artifact, rather than holding the full content in its own context.

4.4 Workers as full-context specialists

Each worker agent owns its full context window for its task. The Data agent loads tens of table schemas; the Analyst holds the framework, the metric definitions, and the cross-agent evidence; the Writer drafts the full deliverable with all source material visible. None of these agents compete with one another for context, because each runs in a separate process.

This is the architectural key to the quality gains reported in the earlier posts. The Auditor cannot run substantive cross-source validation if its context cannot hold the relevant tables. The Analyst cannot maintain framework coverage if half the framework has been compacted away. Discussion between agents (post 2) is meaningful only if both participants have enough context to reason about the disputed point. The Captain-Dispatcher architecture is what ensures these preconditions hold.

4.5 Composition with the harness, discussion, and evaluation layers

The independent-process model composes cleanly with the layers built on top:

Harness hooks (post 1) — each agent process triggers its own hooks independently. A PreToolUse hook on the Writer fires within the Writer’s process, not within the Captain’s. Per-agent enforcement is the natural granularity.
Discussion checkpoints (post 2) — discussion is a structured sequence of SendMessage calls between two agents. The full discussion transcript persists in the mailbox files for both participants. Either agent’s process retains its full context across multiple back-and-forth rounds.
Evaluation dispatcher (post 3) — the dispatcher launches whole pipelines as outer-loop top-level processes. Because the agents themselves are designed as independent processes, the dispatcher does not have to special-case anything: it launches a Captain, the Captain launches its agents, and each pipeline runs in isolation from every other concurrently-running pipeline.

The architecture is the substrate; the other layers are the application of that substrate. Each layer’s value depends on the substrate holding.

Table 2: What Changes, What Stays the Same

Component	Changes?	Detail
How agents are spawned	Changed	`Bash(terminal session)` instead of `Agent(team_name=...)`
How agents communicate	Unchanged	SendMessage via file-based mailboxes — identical API
Discussion protocol	Unchanged	Auditor checkpoints, 5-item agenda, Writer conversation mode
Routing guard hook	Updated	Now blocks ALL Agent() calls for core roles (forces terminal spawn)
Other hooks	Updated	4 hooks read `$PROJECT_OUTPUT_DIR` env var for cross-process path resolution
Watchdog monitoring	Updated	Checks terminal sessions instead of sub-process panes
Agent specs	Unchanged	All agent specs (Analyst, Data, Execution, etc.) unchanged
Dispatcher (evaluation)	Updated	Manages N terminal sessions per project; cleanup iterates all
Memory safety	Updated	SIGSTOP/SIGCONT iterates all project sessions
Sub-agents (AA, ML, critics)	Unchanged	Still spawned via Agent(); inherit 1M from their core agent’s top-level process

5. Industry Validation

We did not invent the independent-process pattern. While designing the Captain-Dispatcher architecture from operational pressure, we identified several independent efforts converging on the same approach. The convergence is not coincidence — the underlying constraint is real, and the natural solution to it has the same shape regardless of who is solving it.

5.1 Google’s A2A Protocol: Independent Agents as a Protocol-Level Assumption

The Agent-to-Agent (A2A) protocol, published in 2025 and developed in 2026, is explicitly designed for communication between independent, opaque agent systems. The specification states that the protocol is intended to “facilitate communication and interoperability between independent, potentially opaque AI agent systems” — agents that may be built on different frameworks, run on different servers, and maintain entirely independent internal state.

A2A supports three transport bindings — JSON-RPC 2.0 (the primary default), gRPC, and HTTP/REST — and treats each participating agent as having “opaque execution.” Agents collaborate without sharing internal state. The protocol concerns itself only with the messages that cross the agent boundary, not the runtime of any individual agent.

This is the protocol-level analogue of the Captain-Dispatcher architecture. A2A does not mandate a file-based mailbox specifically — its transport bindings are network-oriented — but the underlying design principle is identical: agents are independent processes (or servers), each with its own runtime, communicating through a defined transport rather than shared in-process state. The fact that A2A treats this as the default model for interoperable agent systems is independent validation that the architecture is not a workaround for a single framework’s limitations but a coherent design that the wider ecosystem is standardizing around.

(Editorial note: A2A’s specification does not explicitly contrast itself with sub-process models. The framing of A2A as validation for the Captain-Dispatcher architecture is our interpretation, not A2A protocol authors’ stated intent.)

5.2 TAP: A File-Based Mailbox Implementation Across Heterogeneous Agents

The Terminal Agent Protocol (TAP), an open-source project, is a working implementation of the file-based variant of the pattern. TAP enables Claude, OpenAI Codex, and Google Gemini agents to collaborate as peers through a shared comms/ directory tracked by git.

The directory layout is fully structured: subdirectories for inbox/, reviews/, findings/, handoff/, retros/, letters/, logs/, onboarding/, receipts/, and archive/. Messages are markdown files. Each agent runtime runs as an independent process with its own configuration: Claude uses a native-push bridge mode via filesystem watches, Codex uses an app-server bridge with a WebSocket daemon, Gemini uses polling. TAP does not spawn AI runtimes as sub-processes of any orchestrator — each agent is a separate top-level process.

TAP’s design choices match ours point for point. Independent processes per agent: yes. File-based messaging: yes. Configuration files per-runtime (.mcp.json for Claude, ~/.codex/config.toml for Codex, .gemini/settings.json for Gemini): yes. Multiple bridge modes to accommodate per-runtime constraints: yes.

The strongest signal from TAP is heterogeneity. TAP’s design makes Claude, Codex, and Gemini interoperate as peers without privileging any one runtime. The Captain-Dispatcher architecture is currently single-runtime (all our agents run on the same model family), but the design is compatible with the same heterogeneity TAP demonstrates — because independent processes communicating through files do not require shared runtime semantics.

5.3 Convergent External Orchestration Across Independent Teams

Beyond protocol specifications and named projects, public GitHub issue trackers show at least four independent teams converging on the same external-orchestration pattern between January and May 2026. Five issues (#19077, #46424, #59523, #60763, #61993) span the period, with 30+ comments and 14+ reactions, and document teams that arrived at variants of the same architecture:

File-based handoff logs (run_log.md, handoff.md) coordinating across agent processes
Separate runtime processes (one team used Rust) implementing a dispatch loop outside the agent context
Independent agent sessions communicating through a shared message bus

The reported motivations cluster around the same structural concern: the orchestration loop or its delegated work cannot fit inside one in-process agent context, and moving the loop to a separate process unblocks the work. (One caveat: the issues primarily concern tool unavailability in nested sub-agent contexts, which is a related but distinct problem from context window degradation. Both motivations argue for the same external-process pattern, and both apply to the Captain-Dispatcher design.)

When multiple teams independently converge on the same architectural choice while solving similar problems with no coordination between them, it is strong evidence that the choice addresses a real structural constraint. The Captain-Dispatcher architecture sits in that convergence: our specific implementation differs in details from each of the public examples, but the load-bearing decisions — independent processes per agent, file-based or transport-based messaging, orchestration outside the agent runtime — are the same.

5.4 Open Frontier

The academic literature has not yet caught up to this practice — no published benchmarks compare file-based vs in-process agent communication latency, and no systematic study compares distributed vs centralized agent architectures. The evidence base is operational (our data) and convergent (A2A, TAP, GitHub issues). Documenting the architecture contributes to closing that gap.

6. Design Principles

The Captain-Dispatcher architecture is specific to our pipeline, but the design choices generalize. The following principles state the choices in transferable form.

Principle 1: Treat per-agent context as an architectural property, not a tuning parameter

The amount of context each agent has available determines what each agent can do. Once an agent has been spawned with a particular context budget, no amount of prompt engineering or rule writing recovers what the budget cannot fit. Choose the spawn model first; tune the prompts within whatever budget that model gives.

Independent-process spawning is the model that gives each agent the maximum budget the runtime allows. Any architecture that shares one budget across multiple agents is making the choice to constrain every agent. That choice may sometimes be justified — for very short, well-bounded sub-tasks — but for substantive analytical work it is the wrong default.

Principle 2: Push heavy work into specialist processes; keep the orchestrator small

The orchestrator’s job is routing, not analysis. A Captain that holds the full content of every agent’s work in its own context is making itself the failure point: it will exhaust before any individual agent does, because it accumulates everyone else’s load.

The discipline is to keep the orchestrator’s context budget dominated by light state — event log, agent identifiers, gate decisions, spec — and treat all substantive content as belonging in the specialist agent’s process. When the orchestrator needs to reason about earlier work, it requests a summary or pulls a specific extract from the worker’s mailbox, rather than re-loading the full material.

This split keeps the orchestrator small enough to run indefinitely while the workers are sized for their own tasks. Both ends of the pipeline are healthy because neither is forced to carry the other’s load.

Principle 3: Make inter-agent communication durable, inspectable, and replayable

A file-based mailbox provides durability (messages survive any agent restart), inspectability (the full conversation trace is on disk), and replayability (alternate implementations can be tested against the same message history). The cost is a small amount of I/O overhead and a shared filesystem location.

The benefit is operational. Debugging a pipeline run reduces to reading a directory of message files in time order. Recovering from a crash is a matter of respawning the affected agent; the messages it had not yet processed are still in its mailbox. Auditing a past run can happen long after the agents themselves have terminated.

Function-call-style in-process communication has none of these properties. The conversation lives in transient memory, vanishes when an agent terminates, and cannot be replayed against a different implementation. For long-running pipelines that may need to be debugged, audited, or rerun against an updated agent spec, the durable mailbox is the right default.

Principle 4: Decouple sender and receiver in time

SendMessage writes to the recipient’s mailbox and returns. It does not wait for the recipient to process the message. The recipient processes messages on its own schedule, in its own context, without imposing its latency on the sender.

This decoupling matters because agent processing times vary widely. An Auditor reviewing a 30-page analysis may take five to ten minutes; a Data agent answering a quick schema question may take thirty seconds. A synchronous in-process call forces the sender to wait for the slowest recipient. An asynchronous mailbox lets the sender continue its own work and check for the response when it is ready.

For pipelines where the orchestrator routes work to multiple specialists in parallel, asynchronous mailboxes are the only natural model. Synchronous in-process calls serialize work that the architecture should parallelize.

Principle 5: Configure each agent runtime independently

Each agent’s runtime configuration — model, context window, tool permissions, hooks — should be a property of that agent’s process, not inherited from the orchestrator. This is what gives the architecture per-agent control over model choice and context tier.

In practice, this means each agent is launched with its own configuration (skill file, system prompt, environment variables, tool allowlist). The orchestrator does not pass configuration down to the agent; the agent reads its own. Changing the Auditor’s model does not require modifying the Captain’s launcher. Tightening the Writer’s tool permissions does not affect the Analyst.

The independence is symmetric: the Captain itself has its own configuration, and worker agents do not depend on what the Captain is running.

Principle 6: Treat the orchestrator-worker split as the substrate, not the application

The architecture is not the analysis. It is the platform on which analysis runs. The harness rules (post 1), the discussion checkpoints (post 2), and the evaluation dispatcher (post 3) are the analytical and quality layers; the Captain-Dispatcher architecture is what makes them feasible at full scale.

The implication for system design is that the substrate should be settled before the application is iterated on. If the substrate constrains per-agent context, every higher layer is operating in a smaller space than the model is capable of using, and improvements to the higher layers will plateau. Get the substrate right first; then iterate on the rule layer, the discussion layer, the evaluation layer.

Principle 7: Borrow from the standards that exist

The independent-process pattern is what A2A specifies and what TAP implements. Other systems have adopted variants. When designing a new multi-agent system, the right starting move is to adopt the convergent pattern rather than inventing a parallel one. The convergence is the signal; treat it as a default and deviate only with reason.

The Captain-Dispatcher architecture is our deviation only in choosing file-based messaging over HTTP and JSON-RPC. The deviation is justified by our environment (everything runs on the same host, filesystem is the simplest transport) and would not necessarily be the right call for cross-host or cross-organization agents. The protocol-level design (A2A’s choice) is preferable when agents need to interoperate across boundaries; the file-based design (TAP’s and ours) is preferable when agents are co-located and the operational simplicity of filesystem persistence is valuable.

7. Conclusion

The first three posts in this series described what makes a multi-agent pipeline reliable: enforcement of rules, structured discussion for judgment, automated evaluation for measurement. Each of those layers presumed an architecture in which every agent had enough room to do its work. This post documents that architecture.

The Captain-Dispatcher pattern is one design choice with broad consequences. By launching every agent as an independent top-level process and routing messages through a file-based mailbox, the architecture gives each agent the full context budget the model offers, makes the orchestrator small enough to run for the entire pipeline, and produces a durable, inspectable, replayable record of every inter-agent message. The constraints that motivated this design — model identifier stripping during sub-process spawn, enum-limited model parameters in framework spawn tools, opaque inheritance in skill frontmatter — are public and reproducible. The convergent pattern from A2A, TAP, and multiple independent teams is evidence that the design is the natural response to those constraints, not a workaround specific to our pipeline.

For other teams building multi-agent systems on similar substrates, the general lesson is to separate the orchestration loop from the worker runtimes. Whatever framework feature spawns agents in-process for you almost certainly imposes a context constraint that the framework does not surface and may not document. The way to escape that constraint is to spawn agents outside the framework — as user-equivalent independent processes — and use a transport (file-based mailbox, HTTP, or message bus) that does not require shared runtime state. The cost is modest engineering on the transport. The benefit is every agent operating at the full capability the model allows.

The next phase of work, beyond the scope of this post, is making the mailbox transport faster and more structured (message schemas, watch-based delivery, mailbox garbage collection) while preserving the durability and inspectability properties that make the file-based design valuable. Each layer above this substrate — rules, discussion, evaluation — continues to compound as the substrate itself becomes more robust.

References

Published work and public artifacts referenced in this post. Findings drawn from these sources are framed as independent validation of design choices we arrived at from operational requirements.

A2A Protocol. Agent-to-Agent Protocol Specification, v1.0.0. github.com/a2aproject/A2A and a2a-protocol.org/latest/specification/. — Protocol for communication between independent, opaque agent systems. Three transport bindings: JSON-RPC 2.0 (primary), gRPC, HTTP/REST. Agents have “opaque execution” and collaborate without shared internal state. Cited as protocol-level validation of the independent-process model.
TAP (Terminal Agent Protocol), v0.5.2. github.com/HUA-Labs/tap. — File-based peer-to-peer messaging between Claude, Codex, and Gemini agents through a shared comms/ directory with structured subdirectories (inbox, reviews, findings, handoff, retros, letters, logs, onboarding, receipts, archive). Three bridge modes: native-push, app-server, polling. Configurations: .mcp.json (Claude), ~/.codex/config.toml (Codex), .gemini/settings.json (Gemini). TAP does not spawn AI runtimes as sub-processes. Cited as a working implementation of the file-based variant of the independent-process pattern across heterogeneous agents.
Claude Code Issue #43782. github.com/anthropics/claude-code/issues/43782, April 2026. — Documents that the tmux-based subprocess launcher strips the large-context model identifier suffix when spawning sub-agents. Reproduced via ps -eo pid,args showing the base model identifier without the suffix in the subprocess command. Cited as documentation of the model identifier stripping behavior.
Claude Code Issue #34421. github.com/anthropics/claude-code/issues/34421, 2026. — Independent reproduction of the identifier-stripping behavior via subprocess command-line inspection. One of four independent reproductions across four runtime versions (2.1.76, 2.1.80, 2.1.87, 2.1.92).
Claude Code Issues #36670, #40929. github.com/anthropics/claude-code/issues/36670 and /40929, 2026. — Additional independent reproductions of the sub-agent context degradation behavior. Combined with #43782 and #34421, four independent users across four runtime versions document the same root cause.
Claude Code Issue #51254. github.com/anthropics/claude-code/issues/51254, 2026. — Documents that the Agent tool’s model parameter is constrained by a JSONSchema enum of three values (sonnet, opus, haiku), with the enum verified against the runtime’s published type definitions. Cited as documentation of the structural impossibility of specifying large-context model variants through the framework’s spawn tool.
Claude Code Issue #57718. github.com/anthropics/claude-code/issues/57718, 2026. — Controlled five-test experiment confirming that the global subagent-model environment variable (CLAUDE_CODE_SUBAGENT_MODEL) is silently clamped when used to attempt large-context variants. Cited as evidence that workarounds within the framework do not yield per-agent context control.
Claude Code Issue #57249. github.com/anthropics/claude-code/issues/57249, 2026. — Documents that skill subagents that declare a different model via frontmatter silently inherit the parent session’s context tier. 100% failure rate reproduction across four attempts. Cited as evidence of opaque inheritance behavior in skill frontmatter.
Claude Code Issue #50083. github.com/anthropics/claude-code/issues/50083, 2026. — Documents that the CLAUDE_CODE_AUTO_COMPACT_WINDOW environment variable cannot override the context window because the implementation uses Math.min(modelWindow, configured), capping at the model’s default when the server-side flag sets the model window to the default. Cited as evidence that environment-variable workarounds are fragile and require deep knowledge of implementation internals.
Claude Code Issues #19077, #46424, #59523, #60763, #61993. github.com/anthropics/claude-code/issues/, January–May 2026. — Five issues across at least four independent users/teams documenting convergent external-orchestration patterns: file-based handoff logs (run_log.md, handoff.md), separate-runtime dispatch loops, independent agent sessions with shared message buses. Combined: 30+ comments, 14+ reactions across the period. Cited as evidence of independent convergence on the external-orchestration pattern.

Hardware-related numbers in this post (time-to-context-exhaustion, agent counts, RAM footprints) are illustrative of the qualitative behavior we have observed and are sensitive to the specific model, prompts, hardware, and runtime version in use. The qualitative claims — that sub-agent spawning constrains per-agent context, that independent-process spawning preserves it, that file-based mailboxes are durable and inspectable — generalize across configurations.