Agent Design 101: When You Want to Add 1000 Rules to Your Agent
Designing an agent is not just writing instructions — it is choosing, for every rule, the channel that makes the agent actually follow it.
This is the seventh post in a series about building a multi-agent system for complex analytics. Post 2 (“Rule Compliance Through Harness Engineering”) established the foundations this post builds on. This post is the broader companion: the full menu of enforcement channels, what to do when the strongest channel is the wrong answer, and — most importantly — the decision of whether to add anything at all.
Summary
(a) Why a spectrum — not all-text, not all-hooks
- Post 2’s punchline: automate enforcement and you reach ~99% compliance. So why not make everything a hook?
- A hook’s cost is paid continuously — false-blocks on legitimate work, deadlocks with its own clear-action, maintenance as the system evolves.
- About four in five hooks we audited shipped silently broken.
- Large classes of rule can’t be mechanized at all — anything needing judgment is a category error for a string-matching hook.
- So the move isn’t “automate everything.” It’s match each rule to the cheapest channel that holds — and earn a stricter one only on documented recurrence.
(b) The fourteen channels
Eight Tier-A building blocks (the first six ordered weakest → strongest in enforcement; the last two sit off that ladder):
- Memory note — durable cross-session fact, one per file.
- Spec rule — judgment text inline at the decision point.
- Warm reminder — one-line per-task recall trigger for a persistent agent.
- Reviewer check — blocking check by a second agent, for semantic properties.
- Linter rule — deterministic, content-blind check on a produced artifact.
- Harness hook — harness-level block on a tool-call event, for skippable process rules.
- Toolization — make the bad output unrepresentable by construction, then retire the gate.
- Liveness infra — daemon + proactive nudge, for stalls and resource ceilings.
Six Tier-B composite patterns for changes in the coupling across agents:
- Restructure — reorganizing existing structure on an accumulation signal.
- Orchestration flow — the control-flow graph: a step’s existence, order, or authority.
- Shared schema — the shape of an artifact many parties read and write.
- Agent roster — the set of agents and their lifecycle.
- Security boundary — a sensitive-data control where both error directions fail.
- Knowledge library — a reusable, durable cross-project fact.
(c) Core principles
- Every MANDATORY rule needs a fallback chain. “Try X → if it fails, try Y → else escalate.” A bare “you MUST do X” is a dead end when the precondition fails; each run then improvises a different workaround. This was the single largest source of run-to-run variance observed.
- Prefer AVOID — lead with the brakes. At scale, the dominant defect is too much enforcement added too eagerly, not too little.
- Earn the block. Start at the cheapest tier; climb only on a documented recurrence.
- Ship the full change-bundle. CREATE → ADOPT → GUARANTEE → PROPAGATE → DEMOTE. A primitive nobody is wired to call is dead.
- Gate on the real artifact, never on intent. A gate that polices what the agent meant is worse than no gate.
- Self-graded ≠ validated. A green self-test is not evidence of soundness.
- Match the channel to the property type. Mechanical → linter. Semantic → reviewer. Constructible → toolization. Environmental → liveness infra.
1. Introduction
You have just watched your agent skip the same rule for the third time. Your instinct, naturally, is to write the rule down. Again. Maybe louder, maybe at the top of the file, maybe with bold and capitals. This post is about why that instinct is usually wrong — and what to do instead.
What we learned in Post 2 (“Rule Compliance Through Harness Engineering”)
The foundations this post builds on (curves + evidence live in Post 2):
- Mechanism beats content — the same rule lands ~50% to ~99% compliance depending on the channel.
- The dead zone — rules buried mid-file are followed far less reliably than rules near the top or bottom.
- Lifetime decay — a persistent agent starts near full compliance at spawn and drifts down over many tasks as context fills and compacts.
- The four-condition framework — a rule sticks only when it is in context, triggered, actionable (a concrete step, not a principle), and verified.
- The three-layer defense — persistent agents are kept fresh by harness hooks + warm-rule re-injection per task + a periodic full re-read of the spec.
What it got right, and what it got wrong
Post 2’s headline still holds: hooks reach ~99% compliance, text rules drift to 50–70%, mechanism beats content. Two corrections emerged after several hundred more revisions:
Correction 1 (technical): the ~99% number was measured against each hook’s own self-test, which verified only that it fired — not that it passed the right cases. Audited adversarially, about four in five hooks were silently broken — they encoded the happy path. “If you can hook it, hook it” needs a soundness test: trigger on the real thing, clear via a token the system actually produces, cover every transport.
Correction 2 (the bigger inversion): early in a system’s life, the dominant defect is “too few rules, not enforced hard enough.” Past a few hundred revisions, that flips. The corpus fills with the opposite — heuristic gates over-blocking legitimate work, hooks shipped without their clear-emitter, redundant checks, self-graded confidence. The discipline that scales is not “automate everything.” It is AVOID first — earn the block.
2. What if most of the rules you want to add should never get added?
Every guide to agent design tells you how to add rules. The more important skill is not adding them. You wanted to add a thousand rules to your agent; most of them should die at stage 0 — adding enforcement is not free, and text is the most expensive thing to add (see the text budget below). The decision starts with a filter, not a menu.
flowchart TD
Start["A rule was violated,<br/>or a change is requested"] --> S0{"STAGE 0 — AVOID<br/>Is it needed at all?"}
S0 -->|"instance of an existing rule<br/>/ first occurrence to watch<br/>/ not actually valuable"| Nothing["Add NOTHING<br/>(escalate only on recurrence)"]
S0 -->|"a user correction or preference"| Backlog["Route to the backlog<br/>(log now, batch later)"]
S0 -->|"survives"| S1{"STAGE 1 — one primitive,<br/>or a coupling?"}
S1 -->|"enforces ONE property of<br/>one step or one artifact"| TierA["TIER A —<br/>the 8 building blocks"]
S1 -->|"lives in the COUPLING across<br/>producers / consumers / agents"| TierB["TIER B —<br/>the 6 composite patterns"]
style Nothing fill:#c8e6c9,stroke:#2E7D32
style Backlog fill:#fff9c4,stroke:#F9A825
style TierA fill:#e3f2fd,stroke:#1565C0
style TierB fill:#e1bee7,stroke:#6A1B9A
The four AVOID tests
Before routing anything, ask in order:
- Is it just an instance of an existing rule? → Add nothing.
- Is it a first occurrence to watch, not a rule to write? → Add nothing; escalate only on recurrence. Worked borderline example: an agent violated a low-impact formatting habit twice in a single run — recurrence on paper, but the blast radius is a couple of bullets in one section, easily fixed in revision. Still watch, don’t gate yet; the cost of a new check exceeds the cost of the misses.
- Is it not actually valuable? → Add nothing. (Crucial distinction: low compliance ≠ delete. A low-compliance rule may be an iteration opportunity, not dead weight. Distinguish not-valuable from not-followed.)
- Is it a user correction or preference? → Route it to an improvement backlog and handle it in a batch. A live hot-patch mid-project is how contradictory layers accumulate.
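The four tests above can be read as an ordered filter. The sketch below is one hypothetical encoding, not the system's actual code: the `Candidate` fields and `Route` names are invented for illustration, and each boolean is a human judgment made before the function is called.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    ADD_NOTHING = "add nothing (escalate only on recurrence)"
    BACKLOG = "route to the improvement backlog"
    CONTINUE = "survives stage 0; go to stage 1"


@dataclass
class Candidate:
    # Hypothetical fields: each one is a human answer to one AVOID test.
    instance_of_existing_rule: bool = False
    first_occurrence: bool = False
    valuable: bool = True
    user_correction_or_preference: bool = False


def avoid_filter(c: Candidate) -> Route:
    """Apply the four AVOID tests in the order the post lists them."""
    if c.instance_of_existing_rule:
        return Route.ADD_NOTHING
    if c.first_occurrence:
        return Route.ADD_NOTHING
    if not c.valuable:  # not-valuable, which is different from not-followed
        return Route.ADD_NOTHING
    if c.user_correction_or_preference:
        return Route.BACKLOG
    return Route.CONTINUE
```

Note that the borderline formatting case above (two misses, tiny blast radius) would still be entered as "watch, don't gate"; the filter does not replace that judgment, it only records the order in which the questions are asked.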
The cost-of-adding ladder
If a candidate survives the AVOID tests, walk this ladder top-down and stop at the first rung that holds:
| Rung | Action | Why it comes first |
|---|---|---|
| 0 | AVOID | Most candidates die here. |
| 1 | Migrate into an existing mechanism | Reuse beats build-new; a new linter is debt. |
| 2 | Displace / consolidate | If you must add, remove or merge a weaker rule to make room — don’t grow net lines. |
| 3 | Toolize | If a construction defect can be built away, prevent it rather than detect it. |
| 4 | Add a new rule/gate | Only now — at the cheapest tier that holds, with an escalation trigger planned. |
The text budget
Text is a scarce, shared, finite-attention resource — every addition taxes every existing rule. The caps below operationalize Post 2’s dead-zone and lifetime-decay results.
| Tier | Cap | Discipline |
|---|---|---|
| Spec (the agent’s instruction file) | ≈1,200 lines | crossing it is a restructure trigger, surfaced to a human — not worked around |
| Warm-rules (injected every task) | 15 rules | displace, don’t append; one line each |
| Memory (durable facts) | 1 fact per file | + a relevance description; never duplicate the codebase or git |
A hard rule across all three: operative content only — no version numbers, no changelogs, no “why we added this.” Rationale belongs in design docs, not the agent’s working context.
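A budget only holds if something counts against it. As a minimal sketch, assuming the spec is available as a string and the warm-rules as a list (both input shapes are assumptions, as is the function name), a check for the first two caps might look like this. The spec cap is approximate in the post, so the constant is a stand-in.

```python
SPEC_LINE_CAP = 1200   # the post gives this as approximate ("≈1,200 lines")
WARM_RULE_CAP = 15


def check_text_budget(spec_text: str, warm_rules: list[str]) -> list[str]:
    """Return human-readable findings; an empty list means within budget."""
    findings = []

    spec_lines = len(spec_text.splitlines())
    if spec_lines > SPEC_LINE_CAP:
        # Crossing the cap is a restructure trigger for a human, not an error
        # to be worked around automatically.
        findings.append(
            f"spec has {spec_lines} lines (cap {SPEC_LINE_CAP}): restructure trigger"
        )

    if len(warm_rules) > WARM_RULE_CAP:
        findings.append(
            f"{len(warm_rules)} warm rules (cap {WARM_RULE_CAP}): displace, don't append"
        )

    for i, rule in enumerate(warm_rules, start=1):
        if "\n" in rule.strip():
            findings.append(f"warm rule {i} spans more than one line")

    return findings
```

The check is categorical on purpose: it reports that a cap was crossed and leaves the restructure decision to a person.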
3. If you must add something, how do you pick the strength?
On the absence of a magic recurrence threshold. There is no number — no “block at the 4th miss” constant — and this post deliberately does not give one. Per its own anti-pattern about guessed numerics on a moving signal, hard-coding a threshold here is exactly the trap. Post 2’s 1st/2nd/3rd cadence is the shape (watch → log → escalate); blast radius and reachability override the count. Premature strength is the expensive mistake — a weak tier occasionally missing is a one-time cost; a strong tier false-blocks every run.
flowchart BT
subgraph LADDER["The block ladder (climb only on documented recurrence)"]
direction BT
M["1. Memory note <i>(text)</i>"] --> S["2. Spec rule <i>(text)</i>"]
S --> W["3. Warm reminder <i>(text)</i>"]
W --> R["4. Reviewer check <b>BLOCKS (semantic)</b>"]
R --> L["5. Linter rule <b>BLOCKS (mechanical)</b>"]
L --> H["6. Harness hook <b>BLOCKS (ambient, all transports)</b>"]
end
OFF1["TOOLIZATION — off-ladder, the EXIT<br/>Make the bad output unrepresentable;<br/>~100% via PREVENTION, not stricter blocking"]
OFF2["LIVENESS INFRA — fully off-axis<br/>Daemons + nudges for stalls / dead agents /<br/>resource ceilings; not about rule-following"]
H -.->|"defect is constructible?<br/>EXIT the ladder"| OFF1
style M fill:#fff9c4,stroke:#F9A825
style S fill:#fff9c4,stroke:#F9A825
style W fill:#fff9c4,stroke:#F9A825
style R fill:#c8e6c9,stroke:#2E7D32
style L fill:#a5d6a7,stroke:#2E7D32
style H fill:#81c784,stroke:#2E7D32
style OFF1 fill:#e1bee7,stroke:#6A1B9A
style OFF2 fill:#bbdefb,stroke:#1565C0
The master picture. The bottom three rungs are text — they don’t block; only Reviewer onward actually blocks. Toolization is the exit when a defect is constructible; liveness is fully off-axis.
Forcing questions decide how high to climb: Is a miss unacceptable? → at least reviewer/linter. Can the agent skip the check, and is that unacceptable? → hook. Must you stop the action before it happens? → hook. Is the defect constructible — is there a buildable artifact whose existence makes the bad output impossible? → toolization, not a harder block. If none is a hard “yes,” stay lower.
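The forcing questions can be written down as a small decision function. This is a sketch under assumptions: the parameter names are hypothetical, the order of checks (constructible first) is our reading of "toolization, not a harder block", and the `mechanical` flag borrows the property-type split from the core principles to choose between linter and reviewer.

```python
def pick_strength(
    *,
    miss_unacceptable: bool,
    skippable_and_unacceptable: bool,
    must_stop_before_action: bool,
    constructible: bool,
    mechanical: bool,
) -> str:
    """Map hard yes/no answers to the lowest channel that holds."""
    if constructible:
        # A buildable artifact makes the bad output impossible: exit the ladder.
        return "toolization"
    if skippable_and_unacceptable or must_stop_before_action:
        return "harness hook"
    if miss_unacceptable:
        return "linter rule" if mechanical else "reviewer check"
    # No hard "yes" anywhere: stay on a text rung.
    return "stay at a text tier"
```

For example, a semantic property where a miss is unacceptable but nothing needs stopping in advance lands on "reviewer check", not on a hook.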
4. The Six Block-Ladder Channels — earn each rung
For each rung: what it is, how it works, what fails without it, why the rung below does not hold for this case, and why the rung above is unearned overkill.
Compliance percentages are measured from operating the system but are undated and partly soft — treat them as the relative ordering of channels, not precise constants.
4.1 Memory note — the floor
- What it is: the weakest channel — a durable cross-session fact or preference, one fact per file, loaded at session start only (so it still decays mid-session).
- How it works: one fact per file + a short relevance description; never duplicate what the codebase or git records; verify on recall.
- Rung logic: Without it, cross-session knowledge evaporates. Spec above is unearned when the fact only matters across sessions, not at any in-task decision. Inverse failure: parking a load-bearing rule only in memory leaves it at ~60% forever; on recurrence it must hand off upward.
- Real example (genericized): the user prefers related bullets grouped under a bold parent headline rather than left flat. The preference lives as a one-line memory note so it survives across projects; on recurrence it was escalated into a formatting linter.
4.2 Spec rule — inline judgment text
- What it is: an analytical or judgment rule written into the agent’s instruction file, applied while reasoning, that cannot be mechanized.
- How it works: placement dominates (Post 2). Inline at the decision point ≫ buried “Rules” section ≫ a pointer to another file. Operative-only; under the line budget. Spec text alone is documentation, not enforcement, so load-bearing rules need a reviewer backstop.
- Rung logic: Memory doesn’t hold when the rule fires at a specific in-task decision. Warm is unearned when the agent is single-shot (no lifetime decay to defeat).
- Real example (genericized): “When the requested metric cannot be reproduced, compute two or three adjacent proxy metrics and present both the literal answer and the closest proxy” — stated inline at the synthesis step, backstopped by a reviewer.
4.3 Warm reminder — a per-task recall trigger
- What it is: a one-line reminder injected alongside every new task, to defeat the lifetime decay that hits long-lived agents (curve quantified in Post 2).
- How it works: a self-contained trigger in a warm-rules file the harness re-injects each task; the rule body stays in the spec. Cap the set (we use 15), displace rather than append. Inline text — never a “see file X” pointer; pointer-style warm-rules measured 10–27% compliance versus 90%+ for inline.
- Rung logic: Spec doesn’t hold under decay across many tasks — fine at task 1, gone by task 5. Reviewer is overkill when the agent can do it once reminded.
- Real example (genericized): the most-violated formatting rule for the document-writing agent — “only the header row of a table is bold; data cells are plain” — is injected as a one-line warm reminder each task.
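The cap-and-displace discipline is easy to state and easy to erode, so it helps to make the container enforce it. Below is a minimal sketch; the `WarmRules` class, its method names, and the prompt layout in `inject` are all hypothetical, and a real harness would do the re-injection itself.

```python
from typing import Optional


class WarmRules:
    """A capped set of one-line reminders re-injected with every task."""

    def __init__(self, cap: int = 15):
        self.cap = cap
        self.rules: list[str] = []

    def add(self, rule: str, displace: Optional[str] = None) -> None:
        rule = rule.strip()
        if not rule or "\n" in rule:
            raise ValueError("a warm rule is exactly one non-empty line")
        if displace is not None:
            self.rules.remove(displace)  # raises ValueError if it is not present
        elif len(self.rules) >= self.cap:
            raise ValueError("set is full: name an existing rule to displace")
        self.rules.append(rule)

    def inject(self, task: str) -> str:
        # Inline text only: the reminder itself, never a "see file X" pointer.
        reminders = "\n".join(f"- {r}" for r in self.rules)
        return f"Reminders:\n{reminders}\n\nTask:\n{task}"
```

Refusing a sixteenth rule outright, rather than silently dropping the oldest, forces whoever adds a rule to decide what it displaces.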
4.4 Reviewer check — semantic / judgment enforcement
- What it is: a blocking check by a second agent, for any property where “right” requires understanding meaning — is the metric faithful? is the claim supported? is the proxy justified? is the premise verified?
- Asks questions only meaning can answer. Example check: “Is this conclusion supported by the data shown?” No string-match can decide that.
- How it works: a dedicated reviewer (and on dispute, an arbiter) challenges the output and emits a PASS sign-off. Quality ladder: self-review → reviewer → bounded discussion → arbiter. Hard boundary: never mechanize the judgment in a linter — a string-matcher false-positives on faithful work.
- Rung logic: Without it, a defensible-but-wrong answer ships (correlation as causation, wrong denominator inflating a rate, speculative number as fact). Warm doesn’t hold when the property requires interpreting the output; linter above is a channel-class mismatch.
- Real example (genericized): a reviewer agent independently re-derives a headline number and challenges an unsupported causal claim before publication. Its PASS sign-off is gated by a linter that checks the sign-off exists and is substantive: non-empty + contains the required structural fields (names the claim it checked + the check it applied) — never a judgment of whether the content is correct (that’s the reviewer’s job).
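The sign-off linter in that example stays content-blind: it checks that the sign-off exists and has the right shape, and nothing else. A minimal sketch, assuming the sign-off is a JSON object with three hypothetical field names (`verdict`, `claim_checked`, `check_applied`); the real artifact format is not specified in this post.

```python
import json

# Hypothetical field names, chosen for illustration only.
REQUIRED_FIELDS = ("verdict", "claim_checked", "check_applied")


def lint_signoff(raw: str) -> list[str]:
    """Check presence and structure of a reviewer sign-off, never its correctness."""
    if not raw.strip():
        return ["sign-off is empty"]
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["sign-off is not valid JSON"]
    if not isinstance(data, dict):
        return ["sign-off must be a JSON object"]

    problems = []
    for name in REQUIRED_FIELDS:
        value = data.get(name)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {name}")
    if not problems and data["verdict"] != "PASS":
        problems.append("verdict is not PASS")
    return problems
```

Whether the named claim was actually checked well remains the reviewer's job; the linter only guarantees the reviewer said what it checked and how.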
4.5 Linter rule — a mechanical, content-blind check
- What it is: a deterministic check of a property decidable with no understanding of meaning — format, structure, presence, path, count, ordering.
- Asks questions any `grep` can answer. Example check: “Does this file have the required heading, and are the bullets nested at the correct depths?” No model needed.
- How it works: mechanical only, and categorical, not a guessed numeric threshold (guessed numbers on a moving dataset get disproved and start false-blocking). Second legitimate role: gating the presence and substance of a reviewer’s sign-off. When a linter misfires, fix the tool, never add an override.
- Rung logic: Without it, mechanical defects ship (broken nesting, missing sections, placeholder text). Reviewer below is overkill on a check `grep` decides; hook above is unearned when the agent runs the linter anyway.
War-story — “The gate that cried wolf.” An early gate guessed a numeric threshold to estimate whether the agent had been “thorough enough” by counting label tokens on a moving dataset. It false-blocked legitimate deliverables dozens of times before analysts started overriding it to ship. The fix wasn’t to escalate the gate; it was to make the rule categorical (presence/absence of a structural element) and delete the guessed number. After the rewrite, the formatting linter (bullet-nesting depth, blank-line spacing before headings) stuck at 95%+. Lesson: never key a gate on what the agent meant — only on the real artifact.
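For contrast with the guessed threshold, here is what a categorical check can look like. It is a sketch, not the linter we run: the function name, the two-space indent, and the "at most one level deeper than the previous bullet" rule are assumptions. Every finding is a yes/no fact about the artifact, with no tunable number that a moving dataset could disprove.

```python
import re

BULLET = re.compile(r"^( *)[-*] ")


def lint_structure(text: str, required_heading: str, indent: int = 2) -> list[str]:
    """Presence of a heading plus well-formed bullet nesting; no judgment involved."""
    problems = []
    lines = text.splitlines()

    if required_heading not in (ln.strip() for ln in lines):
        problems.append(f"missing heading: {required_heading}")

    prev_depth = -1  # -1 means "not inside a list"
    for n, ln in enumerate(lines, start=1):
        m = BULLET.match(ln)
        if not m:
            if ln.strip():
                prev_depth = -1  # non-bullet text ends the current list
            continue
        spaces = len(m.group(1))
        if spaces % indent:
            problems.append(f"line {n}: indent {spaces} is not a multiple of {indent}")
            continue
        depth = spaces // indent
        if depth > prev_depth + 1:
            problems.append(f"line {n}: bullet skips a nesting level")
        prev_depth = depth
    return problems
```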
4.6 Harness hook — a harness-level block
- What it is: a check wired into the harness itself that fires ambiently on a tool-call event, intercepting across every transport — the strongest single block on the ladder.
- How it works: must pass a three-part soundness test. P1: triggers on the real thing (only path, not a heuristic for intent). P2: clears via a produced token — a nonce-bearing receipt on a durable artifact, re-read fresh, never an ephemeral event the harness might dedupe. P3: cannot be reached another way (covers direct writes, shell writes, tool-API writes, renames). Ships with three mandatory companions: the emitter that writes the token, a false-positive test on real legitimate inputs, and deadlock-safety.
- Rung logic: Linter doesn’t hold because the agent can avoid it (skip, different transport, fail-open). Above the hook there is no stricter rung — the question becomes “is the defect constructible — exit to toolization?” A process rule the agent can-but-doesn’t follow will keep being skipped — “a sixth text reminder is the definition of insanity”; only an unskippable harness block holds.
War-story callout — “The hook that blocked everything.” A block-until-receipt hook shipped without its companion emitter. The system meant to write the receipt was never wired in; the token was never produced; every guarded write blocked — production deliverables, internal drafts, exploratory work, all of it. The cascade was caught only because all writes failed at once (the more dangerous failure is when only legitimate work blocks). Lesson: a hook is at minimum a bundle (hook + emitter + clear-path test) shipped atomically, or it is worse than nothing.
War-story callout — “All green, four in five broken.” A batch of hooks all passed their own self-tests; we were quoting the 99% number out of exactly those tests. An adversarial harness — real violations + legitimate inputs — found about four in five failed at least one direction. They had encoded the happy path; they had never been challenged on the unhappy one. Lesson: every enforcement ships a false-positive test on real legitimate inputs and an independent check that it blocks the real violation.
- Real example (genericized): a hook blocks publishing a deliverable until the quality-review pipeline has produced its sign-off receipt — keyed on the real receipt artifact (P2), covering every write transport (P3), shipped together with its emitter, FP-test, and deadlock-safety companion.
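To make the P2 and P3 requirements concrete, here is a small sketch of a block-until-receipt hook together with its companion emitter. Everything named here is hypothetical: the event names, the file layout under `state_dir`, and the function names are illustrations, not any real harness's API. The sketch fails closed, so the deadlock-safety companion (a way out when the emitter is down) is deliberately not shown and would still be required.

```python
import json
import secrets
from pathlib import Path

# P3: every transport that can produce the guarded write (hypothetical names).
GUARDED_EVENTS = {"write_file", "shell_write", "api_write", "rename"}


def issue_nonce(state_dir: Path, deliverable: str) -> str:
    """Gate side: mint a nonce the review pipeline must echo in its receipt."""
    nonce = secrets.token_hex(16)
    (state_dir / f"{deliverable}.nonce").write_text(nonce)
    return nonce


def emit_receipt(state_dir: Path, deliverable: str, nonce: str) -> None:
    """Companion emitter: the review pipeline writes the durable receipt."""
    receipt = {"deliverable": deliverable, "nonce": nonce, "verdict": "PASS"}
    (state_dir / f"{deliverable}.receipt.json").write_text(json.dumps(receipt))


def pre_tool_hook(event: str, deliverable: str, state_dir: Path) -> bool:
    """Return True to allow the tool call, False to block it."""
    if event not in GUARDED_EVENTS:
        return True
    try:
        # P2: re-read both artifacts fresh on every call; nothing is cached.
        expected = (state_dir / f"{deliverable}.nonce").read_text()
        receipt = json.loads((state_dir / f"{deliverable}.receipt.json").read_text())
    except (OSError, json.JSONDecodeError):
        return False
    return (
        isinstance(receipt, dict)
        and receipt.get("nonce") == expected
        and receipt.get("verdict") == "PASS"
    )
```

Shipping `pre_tool_hook` without `emit_receipt` reproduces "the hook that blocked everything": every guarded event returns False forever.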
The six at a glance
| # | Channel | Use it for | Compliance | Must be wired with |
|---|---|---|---|---|
| 1 | Memory note | a durable cross-session fact/preference | ~60% | the escalation ladder (escalate on recurrence) |
| 2 | Spec rule | a judgment rule applied while reasoning | 50–70% | a reviewer backstop, if load-bearing |
| 3 | Warm reminder | a real rule a persistent agent forgets | ~90% | an existing referent + harness re-injection |
| 4 | Reviewer check | a semantic/judgment property | 70–80% | a spec rule + a gate on the sign-off artifact |
| 5 | Linter rule | a mechanical/structural property | 95%+ | (stands alone) — categorical, not numeric |
| 6 | Harness hook | a recurring skippable process rule | 99%+ | a companion token-emit + FP-test + deadlock-safety |
Soft and undated — treat as a relative ordering, not precise constants.
5. The two channels that sit OFF the block ladder
Two more Tier-A primitives. The most common design error is treating them as “rungs above the hook.” They are not — they answer different questions entirely.
5.1 Toolization — prevent-by-construction (the ladder’s EXIT)
- What it is: instead of letting the agent perform an error-prone step and catching the error, you build a tool that makes the bad output unrepresentable and the sole path — then retire the gate (poka-yoke, “make illegal states unrepresentable” — see References).
- Why this is OFF the block ladder, not above it: a hook is the strongest block. Toolization is not a harder block — it is prevention, which makes blocking unnecessary. The ladder is climbed by “how strictly do I block?”; toolization is reached by a different question: is the defect constructible? If yes, exit.
- How it works (the build sequence; the last steps are load-bearing because you are removing a safety net): find the bar (the last point of choice) → build so the bad output cannot be expressed → make the tool the sole path → prove it correct on real artifacts first → ship an escape hatch (e.g., a `--fallback` flag that reverts to the old path, or a logged manual override the on-call sees) → and only then demote the now-redundant gate. Litmus test: after the change, does the agent still perform the error-prone step? If yes, you only handed it a standard, so keep the gate.
- Real example (genericized): rather than repeatedly linting hand-authored document markup for broken nested bullets, all publishing routes through a single emitter that converts agent markdown to correct nested markup by construction. Once the emitter is the sole path, the nesting linter is demoted, but only after it is proven on real documents; otherwise the tool’s own bug becomes universal.
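A toy version of "unrepresentable by construction" shows why the linter becomes redundant. This sketch simplifies the real example: instead of parsing agent markdown, it assumes the agent hands over a tree, and the `Bullet` type and `emit` function are hypothetical. Because indentation is computed from tree depth, there is no input that yields a skipped level or an odd indent.

```python
from dataclasses import dataclass, field


@dataclass
class Bullet:
    text: str
    children: list["Bullet"] = field(default_factory=list)


def emit(bullets: list[Bullet], depth: int = 0, indent: int = 2) -> str:
    """Render a bullet tree; nesting depth comes from the tree, not the author."""
    lines = []
    for b in bullets:
        text = " ".join(b.text.split())  # collapse newlines so one bullet is one line
        lines.append(f"{' ' * (indent * depth)}- {text}")
        if b.children:
            lines.append(emit(b.children, depth + 1, indent))
    return "\n".join(lines)
```

The litmus test applies directly: the agent no longer types indentation at all, so the error-prone step is gone rather than policed.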
5.2 Liveness infra — environmental, fully off-axis
- What it is: the channel for failures an agent cannot self-enforce and text cannot make happen — stalls (dropped handoff, idle orchestrator, silently-dead agent, lost sub-agent output) or resource ceilings (context-window crash, memory leak, OOM kill).
- Why this is FULLY off-axis: liveness has nothing to do with rule-following. The agent is stuck, dead, or out of room — no rule to enforce, no agent in a state to enforce it.
- How it works: for stalls, a daemon (polling file modification times or heartbeats) plus a proactive nudge — never a text rule (cannot fire when the agent is idle or dead) and never a cron timer (documented dead-end: pure noise). For resource ceilings, re-architect the environment. The monitor needs its own monitor.
- Real example (genericized): a watchdog daemon polls each agent’s output-file modification time; if an agent goes silent at a handoff past a threshold, it nudges the orchestrator. The watchdog self-checks and can be respawned, so the monitor cannot quietly fail.
- Pairs with Orchestration flow (Tier B). A dropped-handoff stall has two fixes shipped in order, not as alternatives: the liveness daemon detects and recovers the stall now; the orchestration fix (next step reads a produced artifact, not a live message to a peer who may already be gone) prevents the drop structurally. Ship the daemon first; queue the orchestration change behind it.
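The watchdog described above reduces to a small polling loop. The sketch below is illustrative only: the function names, the heartbeat file, and the choice to treat a missing output file as stalled are our assumptions, and `nudge` stands in for whatever wakes the orchestrator in a real system.

```python
import os
import time
from pathlib import Path
from typing import Callable, Optional


def find_stalled(
    outputs: dict[str, Path], max_silence_s: float, now: Optional[float] = None
) -> list[str]:
    """Return agents whose output file is missing or older than the threshold."""
    now = time.time() if now is None else now
    stalled = []
    for agent, path in outputs.items():
        try:
            age = now - path.stat().st_mtime
        except FileNotFoundError:
            stalled.append(agent)  # assumption: no output at all counts as stalled
            continue
        if age > max_silence_s:
            stalled.append(agent)
    return stalled


def watchdog_loop(
    outputs: dict[str, Path],
    max_silence_s: float,
    nudge: Callable[[str], None],
    heartbeat: Path,
    poll_s: float = 30.0,
    max_polls: Optional[int] = None,
) -> None:
    """Poll, nudge on stalls, and touch a heartbeat so the monitor can be monitored."""
    polls = 0
    while max_polls is None or polls < max_polls:
        for agent in find_stalled(outputs, max_silence_s):
            nudge(agent)
        heartbeat.touch()  # a second process can watch this file's mtime
        polls += 1
        time.sleep(poll_s)
```

The heartbeat is what lets "the monitor needs its own monitor" be implemented with the same `find_stalled` check pointed at the watchdog itself.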
6. The Cross-Cutting Machinery
Three bodies of machinery span the whole menu — what separates a working enforcement architecture from a pile of rules.
6.1 The change-bundle: CREATE → ADOPT → GUARANTEE → PROPAGATE → DEMOTE
Every change ships as a five-step bundle, atomically. A primitive nobody is wired to call or read is worse than absent — it gives false safety.
flowchart LR
C["CREATE"] --> A["ADOPT ⚠<br/>wire it to be called / read"]
A --> G["GUARANTEE<br/>FP-test + deadlock-safety<br/>+ unforgeable clear"]
G --> P["PROPAGATE ⚠<br/>push to EVERY producer,<br/>consumer, validator, copy"]
P --> D["DEMOTE<br/>retire the weaker gate (last)"]
style C fill:#e3f2fd,stroke:#1565C0
style A fill:#ffcdd2,stroke:#C62828
style G fill:#fff9c4,stroke:#F9A825
style P fill:#ffcdd2,stroke:#C62828
style D fill:#c8e6c9,stroke:#2E7D32
ADOPT and PROPAGATE (red) are the two most-skipped steps — give them equal billing to CREATE. The recurring failure is shipping CREATE alone (the “approved-but-not-wired” hook, the tool nobody calls). A wired hook is also not a firing hook: verify event, matcher, actor. DEMOTE is first-class but happens last, after the new path is proven on a real artifact — or you universalize an unfixed bug.
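One lightweight way to keep the bundle honest is to track which steps have evidence and refuse DEMOTE until the rest are in. This is a hypothetical checklist helper, not part of any tooling described here; what counts as "done" for each step is still decided by a person.

```python
STEPS = ("CREATE", "ADOPT", "GUARANTEE", "PROPAGATE", "DEMOTE")


def missing_steps(done: set[str]) -> list[str]:
    """Return the bundle steps not yet evidenced, in bundle order."""
    unknown = done - set(STEPS)
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")
    return [s for s in STEPS if s not in done]


def may_demote(done: set[str]) -> bool:
    """DEMOTE happens last: only once the four earlier steps are all done."""
    return set(STEPS[:-1]) <= done
```

A change that reports `missing_steps({"CREATE"})` is exactly the "approved-but-not-wired" case: four steps outstanding, including both of the commonly skipped ones.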
6.2 Linter vs. gate vs. hook — the vocabulary, and the one dial
- Gate — a posture: the check blocks (vs. warns or logs). Not a mechanism.
- Linter — a mechanism: a deterministic check invoked at a step, on a produced artifact.
- Hook — a mechanism: wired into the harness, fires ambiently on a tool-call event.
A blocking linter and a blocking hook are both “gates”; the meaningful choice is the one dial:
flowchart TD
Q{"Does the check sit BETWEEN<br/>the agent and its tools?"}
Q -->|"NO — invoked on an artifact<br/>after the fact"| LIN["LINTER<br/>bypassable + deadlock-PROOF<br/>(cheap)"]
Q -->|"YES — intercepts the<br/>action itself"| HOOK["HOOK<br/>unbypassable + deadlock-PRONE<br/>(expensive)"]
style LIN fill:#c8e6c9,stroke:#2E7D32
style HOOK fill:#fff9c4,stroke:#F9A825
6.3 The two-axis defeat framework
When a gate keeps failing, there are exactly two ways an agent defeats it — and they need different fixes. Confusing them is why teams escalate to a hook and the problem persists.
flowchart TD
G["A gate keeps getting defeated"] --> A1{"Axis 1 — does the agent<br/>AVOID the check?"}
G --> A2{"Axis 2 — does the agent<br/>FAKE the clear?"}
A1 -->|"skip / wrong transport / fail-open"| F1["Cure: escalate to a HOOK<br/>(unskippable, all-transport)"]
A2 -->|"self-asserts 'done'<br/>without verifying"| F2["Cure: TRUST-ROOT<br/>recompute from ground truth,<br/>or a gate-issued nonce"]
F1 --> R["Robust gate =<br/>UNSKIPPABLE and UNFORGEABLE<br/>(two separate fixes)"]
F2 --> R
style F1 fill:#fff9c4,stroke:#F9A825
style F2 fill:#fff9c4,stroke:#F9A825
style R fill:#c8e6c9,stroke:#2E7D32
Forgeability bites linters and hooks alike — a hook with a forgeable clear is just as fakeable (echo > token-file). Robust = unskippable and unforgeable, two independent pieces of work.
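One way to build an unforgeable clear is to bind the token to the artifact's actual bytes with a key the agent cannot read. The sketch below uses an HMAC for that; this is our illustration of a trust root, not necessarily the mechanism used in the system described, and key storage (which is the hard part) is assumed to be handled elsewhere.

```python
import hashlib
import hmac


def sign_clear(key: bytes, artifact: bytes) -> str:
    """Trusted side: issue a token bound to this exact artifact content."""
    digest = hashlib.sha256(artifact).digest()
    return hmac.new(key, digest, hashlib.sha256).hexdigest()


def verify_clear(key: bytes, artifact: bytes, token: str) -> bool:
    """Gate side: recompute from the real artifact; never trust the token alone."""
    expected = sign_clear(key, artifact)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), token.encode())
```

With this shape, writing an arbitrary string into the token file fails verification, and so does reusing a valid token after the artifact has changed.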
7. Composite Patterns (Tier B): When the Change Lives in the Coupling
This section is a labeled reference catalog, not a §4-depth tour. Each of the six patterns below is a discipline in its own right, deeper than one table row; what follows is the map, plus the single rule that governs them all: ship every step of the wiring, or the change breaks. The dividing test for routing into Tier B: does the change live in one primitive, or in the coupling across several?
Worked example — Orchestration flow. Adding a new “review” step between two existing agents is not “write a reviewer rule.” It is at minimum three pieces wired atomically: the step node itself (the new agent invocation in the control-flow graph), a gate on it (the next step does not run until the review artifact exists), and a handoff artifact (the next step reads that produced file, not a live message to a peer who may already be gone). Ship one of the three without the other two and the addition is worse than absent — silently skipped, silently raced, or quietly losing work.
| Pattern | When | Must be wired with (miss one → broken) |
|---|---|---|
| Restructure | reorganizing existing structure on an accumulation signal (drift across copies, a >1,200-line spec, redundant surfaces, a chronically-failing rule) | map every reference → rewrite all → verify identical → defer the delete behind a human gate → end with a single-source guard |
| Orchestration flow | changing the control-flow graph itself — a step’s existence/order, a dependency, decision authority, or spawn topology | a gate per new step + handoff artifacts (the next step reads a produced artifact, never a live message to a peer who may be gone) |
| Shared schema | changing the shape of an artifact that multiple producers, consumers, and validators agree on | one schema owned by a sole-path emitter, read by every enforcer; a change is a migration to all consumers |
| Agent roster | adding, removing, or merging an agent, or changing its spawn/teardown lifecycle | the full coupled set: role token → spec → guarded spawn path → orchestration wire + teardown → version row → registry |
| Security boundary | changing how sensitive data is handled, where both a false-negative and a false-positive are failures | guard both directions; sensitive-mode via environment variables, never CLI flags (flags suppress the very controls); never silently disable a control |
| Knowledge library | routing a reusable, verified, durable, worthwhile cross-project fact into shared stores | a single writer + a human-approval gate + validate-on-use (the consumer re-checks before relying) |
The thread through all six is the change-bundle from §6.1 — and the step most often skipped is PROPAGATE.
8. Industry Landscape
The channels here are not exotic; the contribution is the unified menu plus the discipline of when not to use it. Post 2 compares the multi-agent and guardrail frameworks themselves; this table maps the new concepts in this post to their closest prior art.
| Concept here (new in this post) | Closest established work | What this adds |
|---|---|---|
| AVOID-first / dominant-risk inversion | YAGNI [Beck, 2000]; NIOSH Hierarchy of Controls | the empirical inversion: at scale, over-enforcement is the dominant defect, not under-enforcement |
| Toolization / prevent-by-construction (off the block ladder) | poka-yoke [Shingo, 1986]; “make illegal states unrepresentable” [Minsky, 2011] | framing prevention as the exit from the enforcement ladder, with a build sequence that ends in demoting the gate |
| Two-axis defeat framework (reachability vs. forgeability) | Swiss Cheese Model [Reason, 1990; Shamsujjoha et al., 2025] | the operational diagnostic: separating the two failure modes prevents the “escalate to hook and the problem persists” trap |
| Change-bundle (CREATE → ADOPT → GUARANTEE → PROPAGATE → DEMOTE) | software change management; deployment pipelines | atomicity as the unit: a primitive without ADOPT and PROPAGATE is worse than absent |
| Earn-the-block escalation | implementation intentions [Gollwitzer & Sheeran, 2006]; AgentSpec [Wang et al., ICSE 2026] | a menu of channels behind the trigger-action-verification shape, ordered by cost, climbed on recurrence |
| Soundness-test discipline for hooks | property-based testing; adversarial evaluation | the P1/P2/P3 test + companion-bundle (emitter, FP-test, deadlock-safety) shipped atomically with every hook |
The gap no existing framework fills: most guardrail systems are binary safety classifiers, and most agent frameworks write instructions once at spawn. None offers a graduated, cost-aware menu of enforcement channels with an explicit “earn the block” escalation rule, an explicit exit to prevent-by-construction, and a separate off-axis channel for liveness — which is precisely where the over-enforcement failures at scale come from.
9. Principles for Designing an Agent
Ordered by impact (most important first).
Principle 1: Every MANDATORY rule needs a fallback chain
A “you MUST do X” rule that states only the happy path leaves the agent at a dead end when its precondition fails — and each run then improvises a different workaround, producing wildly inconsistent results. This was the single largest source of run-to-run variance observed. Write every mandatory rule as “try X → if it fails, try Y → else escalate with context.” A rule without a fallback is not really a rule; it is a wish, and the agent will invent the path.
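The "try X → if it fails, try Y → else escalate with context" shape is the same whether it is written as spec text or as code. As a sketch of the shape only (the `run_with_fallbacks` helper and `Escalation` exception are hypothetical names), here it is as a function that tries each step in order and escalates with every failure attached.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class Escalation(Exception):
    """Raised when every step failed; carries the per-step context."""

    def __init__(self, failures: list[tuple[str, str]]):
        super().__init__("; ".join(f"{name}: {err}" for name, err in failures))
        self.failures = failures


def run_with_fallbacks(steps: Sequence[tuple[str, Callable[[], T]]]) -> T:
    """Try each named step in order; return the first success, else escalate."""
    failures = []
    for name, step in steps:
        try:
            return step()
        except Exception as exc:
            failures.append((name, repr(exc)))
    raise Escalation(failures)
```

The useful property is that the order of fallbacks is fixed in advance, so two runs that hit the same failed precondition take the same path instead of improvising different ones.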
Principle 2: Prefer AVOID — lead with the brakes
At scale the dominant risk inverts: it is too much enforcement, added too eagerly, not too little. Most candidate rules should die at the four AVOID tests. Adding is never free; text is the most expensive thing to add.
Principle 3: Earn the block — start cheap, escalate on recurrence
The ladder is text → reviewer/linter → hook, with toolization as the exit and liveness off-axis. Climb only on a documented recurrence. Every over-enforcement scar came from gating or hooking on day one.
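One way to make “climb only on a documented recurrence” concrete is a small ledger that refuses to escalate without evidence. The sketch below is illustrative only: `RuleLedger` and its methods are hypothetical, and it assumes every documented violation at the current rung counts as one recurrence.

```python
# The block ladder, cheapest first. Toolization and liveness are deliberately
# absent: they are the exit and the off-axis channel, not rungs.
LADDER = ["text", "reviewer/linter", "hook"]


class RuleLedger:
    """Tracks which rung each rule sits on and the evidence behind each climb."""

    def __init__(self) -> None:
        self.rung: dict[str, int] = {}
        self.evidence: dict[str, list[str]] = {}

    def add_rule(self, rule: str) -> str:
        self.rung[rule] = 0  # every rule starts on the cheapest rung
        self.evidence[rule] = []
        return LADDER[0]

    def record_violation(self, rule: str, evidence: str) -> str:
        """Climb exactly one rung per documented recurrence; never skip rungs."""
        if not evidence:
            raise ValueError("a recurrence must be documented to count")
        self.evidence[rule].append(evidence)
        self.rung[rule] = min(self.rung[rule] + 1, len(LADDER) - 1)
        return LADDER[self.rung[rule]]


ledger = RuleLedger()
start = ledger.add_rule("cite sources")
first_climb = ledger.record_violation("cite sources", "run 41 log")
second_climb = ledger.record_violation("cite sources", "run 57 log")
```

Here a rule cannot start as a hook: it reaches the top rung only after two separate pieces of evidence have been recorded against it.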
Principle 4: Ship the full change-bundle
CREATE → ADOPT → GUARANTEE → PROPAGATE → DEMOTE, atomically. A primitive nobody is wired to call is dead — and worse than absent, because it gives false safety. ADOPT and PROPAGATE are the steps you will be tempted to skip.
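Atomicity is easier to hold when the bundle is a checkable object rather than a habit. A minimal sketch, assuming each stage is closed by a short piece of evidence; `ChangeBundle`, the primitive name, and the evidence strings are hypothetical.

```python
from dataclasses import dataclass, field

STAGES = ("CREATE", "ADOPT", "GUARANTEE", "PROPAGATE", "DEMOTE")


@dataclass
class ChangeBundle:
    primitive: str
    done: dict[str, str] = field(default_factory=dict)  # stage -> evidence

    def mark(self, stage: str, evidence: str) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.done[stage] = evidence

    def missing(self) -> list[str]:
        """Stages with no evidence yet, in bundle order."""
        return [s for s in STAGES if s not in self.done]

    def shippable(self) -> bool:
        return not self.missing()


bundle = ChangeBundle("dedupe_rows tool")
bundle.mark("CREATE", "tool merged")
bundle.mark("GUARANTEE", "test added")
gaps = bundle.missing()
```

In this example the primitive exists and is tested, yet the bundle is not shippable, because the stages most often skipped are exactly the ones still listed in `gaps`.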
Principle 5: Gate on the real artifact, never on intent
A gate must key on the real produced artifact or the real execution — never a text-match, a message’s content, or an estimator of what the agent meant. A gate that polices intent is worse than no gate; it false-blocks legitimate work far more often than it catches a real violation.
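The difference between the two kinds of gate is easy to see side by side. In this sketch, `intent_gate` is the anti-pattern (it reads the agent's message) and `artifact_gate` keys on the produced file. Both functions, the JSON report format, and the required keys are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def intent_gate(agent_message: str) -> bool:
    """Anti-pattern: passes based on what the agent says it did."""
    return "report written" in agent_message.lower()


def artifact_gate(report_path: Path, required_keys: set[str]) -> bool:
    """Keys on the real file: it must exist, parse as JSON, and hold the required keys."""
    try:
        data = json.loads(report_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False
    return isinstance(data, dict) and required_keys.issubset(data)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.json"
    # The message claims success, but no file has been produced yet.
    claimed = intent_gate("Report written and verified.")
    before = artifact_gate(path, {"summary", "rows"})
    path.write_text(json.dumps({"summary": "ok", "rows": 3}), encoding="utf-8")
    after = artifact_gate(path, {"summary", "rows"})
```

The text-matching gate passes on a claim alone, while the artifact gate only flips once the file actually exists with the expected shape.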
Principle 6: Self-graded ≠ validated
A green self-test is not evidence of soundness: roughly four in five audited hooks passed their own self-test yet were broken, because self-tests hard-code the happy path and encode the old, broken contract. Every enforcement mechanism should ship with a false-positive test on real legitimate inputs plus an independent review, and every “already done” claim should be verified against live code in a clean environment.
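A toy example shows how a hook can pass its own self-test and still be broken. Everything here is hypothetical: `naive_hook` is a deliberately bad string-matching hook, and `LEGITIMATE` stands in for inputs that would, in practice, be sampled from real legitimate work.

```python
from typing import Callable


def naive_hook(command: str) -> bool:
    """Hypothetical hook: returns True to block. String-matches on 'rm'."""
    return "rm" in command


def self_test(hook: Callable[[str], bool]) -> bool:
    """Happy-path self-test: only checks that the known-bad input is blocked."""
    return hook("rm -rf /data")


def false_positives(hook: Callable[[str], bool], legitimate: list[str]) -> list[str]:
    """FP-test: every legitimate input the hook wrongly blocks."""
    return [cmd for cmd in legitimate if hook(cmd)]


# Stand-ins for real legitimate inputs.
LEGITIMATE = ["python format_report.py", "ls reports/", "git status"]

passed_self_test = self_test(naive_hook)
wrongly_blocked = false_positives(naive_hook, LEGITIMATE)
```

The self-test is green, yet the hook blocks `python format_report.py` because “format” contains the substring it matches on. Only the FP-test surfaces that.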
Principle 7: Match the channel to the property type
Mechanical → linter. Semantic/judgment → reviewer (never a linter — it will false-positive on faithful work). Constructible → toolization. Environmental → liveness infra. Crossing these boundaries is a recurring, predictable mistake. (Placement and inline-vs-pointer discipline within text channels is governed by post 2.)
10. Conclusion
- Designing an agent is choosing enforcement channels, not writing prose. The same rule lands at very different compliance depending entirely on the mechanism behind it.
- The instruction file teaches an agent what to do. The channel you choose determines whether it does.
Appendix
A. The Fourteen Channels at a Glance
| # | Channel | Tier | Role | Use it for |
|---|---|---|---|---|
| 1 | Memory note | A | block ladder rung 1 | a durable cross-session fact/preference |
| 2 | Spec rule | A | block ladder rung 2 | a judgment rule applied while reasoning |
| 3 | Warm reminder | A | block ladder rung 3 | a real rule a persistent agent forgets |
| 4 | Reviewer check | A | block ladder rung 4 | a semantic/judgment property |
| 5 | Linter rule | A | block ladder rung 5 | a mechanical/structural property |
| 6 | Harness hook | A | block ladder rung 6 (top) | a recurring skippable process rule |
| 7 | Toolization | A | OFF-LADDER — the exit | a buildable-away construction defect |
| 8 | Liveness infra | A | OFF-LADDER — off-axis | a stall, dead agent, or resource ceiling |
| 9 | Restructure | B | coupling | reorganizing structure on an accumulation signal |
| 10 | Orchestration flow | B | coupling | the control-flow graph (steps/order/authority) |
| 11 | Shared schema | B | coupling | a contract many parties read and write |
| 12 | Agent roster | B | coupling | the set of agents and their lifecycle |
| 13 | Security boundary | B | coupling | a sensitive-data control (both error directions fail) |
| 14 | Knowledge library | B | coupling | a reusable cross-project fact |
B. The Decision Tree (condensed)
STAGE 0: AVOID (most candidates die here)
  • existing rule / first occurrence to watch / not valuable → ADD NOTHING
  • user correction or preference → backlog (batch later)
  • can it be a mechanism instead of text? → migrate into existing tool/linter/hook
STAGE 1: one primitive (Tier A) or a coupling (Tier B)?
  TIER A: classify the property
    ENVIRONMENTAL (stall / dead / limit-hit) → Liveness infra (OFF-AXIS: daemon + proactive)
    CONSTRUCTIBLE (can build the defect away) → Toolization (OFF-LADDER EXIT: sole-path → prove → escape hatch → demote)
    EVERYTHING ELSE: climb the block ladder weakest→strongest, stop at the rung that holds
      durable cross-session fact → Memory note (rung 1; floor)
      in-task judgment rule → Spec rule (rung 2; inline at the decision point)
      persistent agent, decays → Warm reminder (rung 3; 1-line inline; re-injected each task)
      semantic / judgment property → Reviewer check (rung 4; + spec rule; gate sign-off, never content)
      mechanical / structural → Linter rule (rung 5; categorical, not a guessed number)
      recurred + skippable + binary → Harness hook (rung 6; P1/P2/P3 + emitter + FP-test + deadlock-safety)
  TIER B: which coupling?
    steps/order/authority → Orchestration flow
    shape read+written by many → Shared schema
    agents or their lifecycle → Agent roster
    sensitive-data boundary → Security boundary
    reusable cross-project fact → Knowledge library
    reorganizing existing structure → Restructure
STILL failing? Run the two-axis defeat diagnostic: the agent AVOIDS the gate (reachability) → hook; the agent FAKES the clear (forgeability) → trust-root (recompute or nonce).
Robust = unskippable AND unforgeable (two separate fixes).
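The Stage 0 and Tier A branches of the tree are mechanical enough to write as a lookup. The sketch below is one possible encoding, not a tool from the series: `Prop`, `stage0`, `choose_channel`, and `defeat_fixes` are hypothetical names, and Tier B is left out because choosing a coupling is a judgment call rather than a table lookup.

```python
from enum import Enum
from typing import Optional


class Prop(Enum):
    ENVIRONMENTAL = "stall / dead / limit-hit"
    CONSTRUCTIBLE = "can build the defect away"
    CROSS_SESSION_FACT = "durable cross-session fact"
    IN_TASK_JUDGMENT = "in-task judgment rule"
    DECAYS_IN_PERSISTENT_AGENT = "persistent agent, decays"
    SEMANTIC = "semantic / judgment property"
    MECHANICAL = "mechanical / structural"
    RECURRED_SKIPPABLE_BINARY = "recurred + skippable + binary"


TIER_A = {
    Prop.ENVIRONMENTAL: "Liveness infra",
    Prop.CONSTRUCTIBLE: "Toolization",
    Prop.CROSS_SESSION_FACT: "Memory note",
    Prop.IN_TASK_JUDGMENT: "Spec rule",
    Prop.DECAYS_IN_PERSISTENT_AGENT: "Warm reminder",
    Prop.SEMANTIC: "Reviewer check",
    Prop.MECHANICAL: "Linter rule",
    Prop.RECURRED_SKIPPABLE_BINARY: "Harness hook",
}


def stage0(no_action_needed: bool, is_preference: bool,
           fits_existing_mechanism: bool) -> Optional[str]:
    """Return the AVOID outcome, or None if the candidate survives Stage 0."""
    if no_action_needed:  # existing rule, first occurrence, or not valuable
        return "ADD NOTHING"
    if is_preference:
        return "backlog"
    if fits_existing_mechanism:
        return "migrate into existing tool/linter/hook"
    return None


def choose_channel(prop: Prop, *, no_action_needed: bool = False,
                   is_preference: bool = False,
                   fits_existing_mechanism: bool = False) -> str:
    """Stage 0 runs first; only survivors are classified into a Tier A channel."""
    avoided = stage0(no_action_needed, is_preference, fits_existing_mechanism)
    return avoided if avoided is not None else TIER_A[prop]


def defeat_fixes(avoids_gate: bool, fakes_clear: bool) -> list[str]:
    """Two-axis defeat: each failure mode gets its own, separate fix."""
    fixes = []
    if avoids_gate:
        fixes.append("hook")
    if fakes_clear:
        fixes.append("trust-root (recompute or nonce)")
    return fixes
```

For example, `choose_channel(Prop.SEMANTIC)` returns `"Reviewer check"`, while the same property with `is_preference=True` never reaches classification and returns `"backlog"`.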
C. The Anti-Pattern Catalog (what each channel prevents)
| Anti-pattern (do NOT build this) | The rule |
|---|---|
| Spec/self-check text for a process rule the agent skips | an agent that can-but-doesn’t → go straight to a gated step |
| A cron timer or text rule for a liveness problem | liveness needs a daemon + proactive check |
| A guessed numeric threshold on a moving dataset | make the rule categorical; if you “need a number,” fix the mechanism |
| An intent-heuristic gate | key on the real artifact/execution only |
| A hook without its companion token-emit | ship the hook + emitter + loophole-close + verify, atomically |
| Overriding a misfiring linter | fix the tool, never add an override |
| Self-graded “perfect” / self-test as validation | require an independent adversarial test + an FP-test |
| Editing one side of a shared contract | a change is a migration: propagate to every consumer |
| A MANDATORY rule with no fallback chain | every MUST gets “try X → else Y → else escalate” |
| Treating toolization as “a stricter hook” | toolization is the ladder’s EXIT — DEMOTE the gate once the bad output is impossible |
D. References
Instruction Following & Agent Behavior
- [Zhou et al., 2023] “Instruction-Following Evaluation (IFEval).”
- [AGENTIF, 2025] “Agentic Instruction Following Benchmark.” Tsinghua University.
- [Wang et al., 2026] “AgentSpec: Runtime Enforcement for LLM Agent Systems.” ICSE 2026.
Enforcement, Guardrails & Safety
- [Rebedea et al., 2023] “NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications.”
- [OpenAI, 2025] “Agents SDK.” Mar 2025.
- [Reason, 1990] “Human Error.” Cambridge University Press. (Swiss Cheese Model.)
- [Shamsujjoha et al., 2025] “Swiss Cheese Model for AI Agent Safety.” IEEE ICSA 2025.
Mistake-Proofing & Software Design
- [Shingo, 1986] “Zero Quality Control: Source Inspection and the Poka-Yoke System.” (Mistake-proofing.)
- [Minsky, 2011] “Effective ML / Make Illegal States Unrepresentable.”
- [Beck, 2000] “Extreme Programming Explained.” (YAGNI.)
- [NIOSH] “Hierarchy of Controls.” National Institute for Occupational Safety and Health.
Cognitive Science & Human Factors
- [Gollwitzer & Sheeran, 2006] “Implementation Intentions and Goal Achievement: A Meta-Analysis.” Advances in Experimental Social Psychology.
This is the seventh post in a series. The first post covers the multi-agent architecture and analysis lifecycle; the second covers rule compliance through harness engineering (the dead-zone, lifetime decay, four-condition framework, and three-layer defense this post builds on); later posts cover agent discussion, automated evaluation infrastructure, and the watchdog liveness layer. Together they describe one system’s answer to a single question: how do you make a team of AI agents reliably do the right thing?