Claude Code Subagents: Parallel Work Without Chaos

By Lazar Milicevic · Published June 13, 2026 · 12 min read

Developer working at a laptop with multiple terminal windows open, illustrating parallel Claude Code subagent workflows

You opened Claude Code to refactor an auth module, ended up two hours deep in a session that burned 400k tokens, and the diff is still half-broken. The model bounced between reading files, planning, and editing — losing context every time it pulled in another 800-line file. Subagents fix this, but only if you scope them like you'd scope a Jira ticket: tight inputs, tight outputs, no ambiguity about who owns what.

What a subagent actually is

A subagent is a separate Claude invocation spawned by your main session with its own context window, its own tool access, and a single defined task. The parent session sends a prompt, the subagent runs to completion, and only the final summary comes back into the parent's context. Everything the subagent read, every dead end it explored, every intermediate tool call — none of it pollutes the parent.

That last sentence is the whole reason subagents exist. Context windows are the bottleneck on long sessions. A 50-file codebase audit will blow past your parent context if done inline. Done as a subagent, the parent sees a 2,000-token summary instead of 200,000 tokens of raw file content.

Two practical implications:

Subagents are cheap on parent context, expensive on tokens. You pay for every token the subagent reads. Fan-out multiplies that cost.
Subagents can't ask follow-up questions. Once spawned, they execute until they finish or fail. If your prompt is ambiguous, you get ambiguous output.
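To make that trade-off concrete, here is a back-of-the-envelope sketch using the numbers from the audit example above. The 4,000 tokens per file figure is an assumption (picked so that 50 files come to 200,000 tokens), the function names are hypothetical, and the model ignores per-spawn overhead such as system prompts.

```python
def inline_cost(files: int, tokens_per_file: int) -> dict:
    """Inline: every file lands in the parent's context."""
    raw = files * tokens_per_file
    return {"parent_context": raw, "total_read": raw}


def subagent_cost(files: int, tokens_per_file: int, summary_tokens: int,
                  subagents: int = 1, overlap: float = 0.0) -> dict:
    """Delegated: the parent only sees one summary per subagent.

    `overlap` is the fraction of files that every subagent re-reads
    (0.0 means perfectly disjoint slices). This is an assumed, simplified model.
    """
    raw = files * tokens_per_file
    shared = raw * overlap
    # Shared files are read once per subagent; the rest are read once in total.
    total_read = int(shared * subagents + (raw - shared))
    return {"parent_context": summary_tokens * subagents, "total_read": total_read}


inline = inline_cost(files=50, tokens_per_file=4_000)
single = subagent_cost(50, 4_000, summary_tokens=2_000)
sloppy = subagent_cost(50, 4_000, summary_tokens=2_000, subagents=3, overlap=1.0)

print(inline)  # the parent context holds all the raw file content
print(single)  # the parent context holds only the summary
print(sloppy)  # three fully overlapping subagents triple the tokens read
```

The parent-side saving is the same whether the subagents are disciplined or not. The difference shows up in `total_read`, which is the number you are billed for.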

When to fan out, when to stay sequential

Subagents are not free. Every spawn is a new context, a new system prompt, and a new round of tool discovery. For trivial work, inline is faster and cheaper.

Here's the heuristic I use:

Task shape	Approach	Why
Single-file edit, <200 lines	Inline	Spawning overhead exceeds the work
Read 5+ files to answer one question	Single subagent	Keeps parent context clean
Independent searches across modules	Parallel subagents	Wall-clock wins, no shared state
Plan → implement → test pipeline	Sequential subagents	Each stage hands off a contract
"Explore the codebase and figure out X"	Single research subagent	Open-ended work needs one owner
Refactor that touches 3 coupled files	Inline	Coupling means shared context matters
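The table can be read as a small decision function. The sketch below is one way to encode it. The `Task` fields and the order of the checks are my assumptions, not anything Claude Code defines, and the thresholds come straight from the rows above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    files_to_read: int       # files needed to answer the question
    lines_to_edit: int       # 0 for read-only work
    coupled_edits: bool      # edits span files that must change together
    independent_slices: int  # disjoint areas that can be searched separately
    staged: bool             # plan -> implement -> test handoffs


def choose_approach(task: Task) -> str:
    """Map a task shape to an approach, checking the inline cases first."""
    if task.coupled_edits:
        return "inline"  # coupling means shared context matters
    if task.files_to_read <= 1 and task.lines_to_edit < 200:
        return "inline"  # spawning overhead exceeds the work
    if task.staged:
        return "sequential subagents"
    if task.independent_slices > 1:
        return "parallel subagents"
    if task.files_to_read >= 5:
        return "single subagent"
    return "inline"


print(choose_approach(Task(1, 40, False, 1, False)))
print(choose_approach(Task(30, 0, False, 3, False)))
```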

The trap is fanning out work that has hidden dependencies. If subagent A renames a function and subagent B is editing a caller of that function in parallel, you get a merge conflict the parent has to resolve — except the parent doesn't know about the rename until A reports back. Now you're debugging a problem you created.

Rule: parallelize work that doesn't share state. Serialize work that does.
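If the parent can list which files each task reads and writes before spawning, the rule can be checked mechanically. This is a minimal sketch under that assumption. The task names and file paths are hypothetical, and it only catches file-level conflicts, not semantic ones.

```python
from itertools import combinations


def conflicts(a: dict, b: dict) -> set[str]:
    """Files that make it unsafe to run tasks a and b at the same time."""
    return (a["writes"] & (b["reads"] | b["writes"])) | (b["writes"] & a["reads"])


def unsafe_pairs(tasks: dict[str, dict]) -> list[tuple[str, str, set[str]]]:
    """Every pair of tasks that shares state through at least one written file."""
    out = []
    for (name_a, a), (name_b, b) in combinations(tasks.items(), 2):
        shared = conflicts(a, b)
        if shared:
            out.append((name_a, name_b, shared))
    return out


tasks = {
    # A renames a function in token.ts while B edits a caller that reads token.ts
    "A_rename": {"reads": {"src/auth/token.ts"}, "writes": {"src/auth/token.ts"}},
    "B_caller": {"reads": {"src/auth/token.ts", "src/api/login.ts"},
                 "writes": {"src/api/login.ts"}},
    "C_audit": {"reads": {"src/billing/index.ts"}, "writes": set()},
}
print(unsafe_pairs(tasks))
```

Any pair that shows up here gets serialized. Everything else is a candidate for fan-out.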

Scoping the prompt: the contract pattern

A subagent prompt is a contract. You define inputs, outputs, allowed tools, and acceptance criteria. Treat it like writing a function signature for a junior engineer who will not be in the room to ask questions.

A bad prompt:

Look at the auth module and find any security issues.

This will burn tokens on unfocused, repeated reads, produce a wandering report, and miss things because "security issues" is undefined.

A scoped prompt:

Audit src/auth/ for these specific issues:

1. Routes that read req.body without schema validation
2. JWT verification that skips the `exp` or `iss` check
3. Password comparison that uses === instead of constant-time compare
4. Session cookies missing httpOnly or secure flags

For each finding, return:
- File path and line number
- The exact problematic code (3-line snippet)
- The fix as a unified diff

Do not modify files. Do not audit anything outside src/auth/.
Return findings as JSON matching this schema:

{
  "findings": [
    {
      "category": "string",
      "file": "string",
      "line": "number",
      "snippet": "string",
      "fix_diff": "string"
    }
  ]
}

If you find zero issues, return {"findings": []}. Do not pad.

The second prompt does four things the first doesn't: it enumerates the categories so the model can't drift, it bounds the search space (src/auth/ only), it forbids edits (a tool constraint that also makes this subagent safe to parallelize with others), and it locks the return format so the parent can parse the output deterministically.

This is the pattern: enumerate, bound, constrain tools, lock the schema.
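Locking the schema only pays off if the parent actually enforces it. Below is a minimal sketch of a parent-side check for the findings format in the scoped prompt. The function name and the decision to reject out-of-scope files are my assumptions, and it uses only the standard library rather than a schema validator.

```python
import json

REQUIRED = {"category": str, "file": str, "line": int, "snippet": str, "fix_diff": str}


def parse_findings(raw: str, scope: str = "src/auth/") -> list[dict]:
    """Parse a subagent reply and reject anything that breaks the contract."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on prose
    if (not isinstance(data, dict) or set(data) != {"findings"}
            or not isinstance(data["findings"], list)):
        raise ValueError("expected an object with a single 'findings' list")
    for i, item in enumerate(data["findings"]):
        if not isinstance(item, dict) or set(item) != set(REQUIRED):
            raise ValueError(f"finding {i}: wrong keys")
        for key, expected in REQUIRED.items():
            value = item[key]
            # bool is a subclass of int, so rule it out explicitly
            if not isinstance(value, expected) or isinstance(value, bool):
                raise ValueError(f"finding {i}: {key} must be {expected.__name__}")
        if not item["file"].startswith(scope):
            raise ValueError(f"finding {i}: {item['file']} is outside {scope}")
    return data["findings"]


print(parse_findings('{"findings": []}'))
```

A reply that fails this check is a signal to re-run the subagent with a tighter prompt, not something to patch up by hand in the parent.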

The fan-out pattern: parallel research

The highest-leverage use of subagents is parallel research on a large codebase. You're trying to answer a question that requires reading a lot of code, and you want the answer fast without melting your parent context.

Example: you're about to migrate from REST to gRPC and need an inventory of every external API surface you expose. Inline, you'd grep, then read 30 files, then summarize, and you'd lose the thread halfway through. With subagents:

# Pseudo-config to show the shape of the fan-out; the field names are illustrative
parent_task: "Inventory all external HTTP API endpoints we expose"

subagents:
  - name: routes_express
    scope: "src/routes/**/*.ts"
    task: "List every Express route handler with method, path, request schema, response schema"
    output: routes_express.json

  - name: routes_fastify
    scope: "src/services/**/fastify-*.ts"
    task: "Same inventory for Fastify handlers"
    output: routes_fastify.json

  - name: graphql_resolvers
    scope: "src/graphql/**/*.ts"
    task: "List every GraphQL query and mutation resolver with input/output types"
    output: graphql.json

merge_step: |
  Read all three JSON files.
  Produce a unified table grouped by domain (auth, billing, users, ...).
  Flag any endpoint that appears in two systems (duplicate surface).

Three subagents run in parallel. Each touches a disjoint part of the tree. None of them edit files. The merge step is small, runs in the parent, and produces the artifact you actually wanted.

Wall-clock time on a real codebase: roughly 1/3 the inline equivalent when the three slices are similar in size, because the three reads happen concurrently and the parent never has to load any source file into its own context. The slowest subagent sets the floor.
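The concurrency itself is the easy part. Here is a sketch of the fan-out and merge in plain asyncio, where `run_subagent` is a hypothetical stand-in for however you actually dispatch a subagent. It sleeps and returns canned data so the example runs on its own.

```python
import asyncio
import json


async def run_subagent(name: str, scope: str, task: str) -> dict:
    """Hypothetical stand-in for a real subagent call.

    It sleeps to simulate work and returns a canned, structured result.
    """
    await asyncio.sleep(0.1)
    return {"name": name, "scope": scope, "endpoints": [f"{name}:example"]}


async def inventory() -> dict:
    specs = [
        ("routes_express", "src/routes/**/*.ts", "List every Express route handler"),
        ("routes_fastify", "src/services/**/fastify-*.ts", "Same inventory for Fastify"),
        ("graphql_resolvers", "src/graphql/**/*.ts", "List every GraphQL resolver"),
    ]
    # gather() runs the three coroutines concurrently and preserves input order
    results = await asyncio.gather(*(run_subagent(*spec) for spec in specs))
    # The merge stays small: the parent only touches the structured summaries
    return {r["name"]: r["endpoints"] for r in results}


merged = asyncio.run(inventory())
print(json.dumps(merged, indent=2))
```

Total wall-clock time here is roughly one sleep, not three, which is the same effect the parallel reads have on a real codebase.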

The pipeline pattern: research → plan → implement

Sequential subagents work when each stage produces a tight artifact that the next stage consumes. The classic shape is research → plan → implement → verify.

# Stage 1: research subagent
# Input: feature request
# Output: research.md — what exists, what's missing, constraints

# Stage 2: planning subagent
# Input: research.md
# Output: plan.md — ordered list of file edits with rationale

# Stage 3: implementation subagent
# Input: plan.md + access to edit tools
# Output: actual diff applied to the working tree

# Stage 4: verification subagent
# Input: the diff
# Output: test results, lint results, list of concerns

The discipline that makes this work: each stage writes its output to a file, and the next stage reads that file as its sole input. The parent isn't holding the research in its context window while implementation runs. The parent holds four pointers and four short status summaries.

Why this matters for cost and reliability: the implementation subagent does not need to see the original feature request, the user's earlier messages, or the failed exploration the research subagent went through. It needs plan.md and write access. That's it. Smaller context → faster responses, fewer hallucinations, lower token spend.
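The file-handoff discipline is simple enough to sketch. In the example below each stage is a plain function standing in for a subagent call (an assumption made so the code runs without any model), and the file names diff.patch and verify.log are illustrative. The point is the shape: one input file, one output file, one status line back to the parent.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_stage(name: str, work: Callable[[str], str],
              src: Optional[Path], dst: Path) -> str:
    """Run one stage: read only `src`, write only `dst`, return a short status."""
    stage_input = src.read_text() if src else ""
    dst.write_text(work(stage_input))
    return f"{name}: wrote {dst.name}"


workdir = Path(tempfile.mkdtemp())

# (stage name, stand-in for the subagent, input file, output file)
stages = [
    ("research", lambda _: "exists: src/webhooks/\nmissing: payment_failed handler\n",
     None, "research.md"),
    ("plan", lambda r: f"1. add handler ({len(r.splitlines())} research notes read)\n",
     "research.md", "plan.md"),
    ("implement", lambda p: f"+ handler stub, following: {p}",
     "plan.md", "diff.patch"),
    ("verify", lambda d: "PASS\n" if d.startswith("+") else "FAIL\n",
     "diff.patch", "verify.log"),
]

# The parent keeps one short status line per stage, never the artifacts themselves
statuses = [
    run_stage(name, work, workdir / src if src else None, workdir / dst)
    for name, work, src, dst in stages
]
print(statuses)
```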

Where subagents burn tokens on chaos

Three failure modes I see repeatedly:

1. Unscoped exploration. "Figure out how the billing system works" with no file boundary will read every file the subagent can find that mentions billing, payments, invoices, charges, subscriptions, plans, or money. Token cost on a mid-size monorepo: easily 100k. Fix: give it a starting file and a depth budget. "Start at src/billing/index.ts. Follow imports up to 2 levels deep. Do not read test files."

2. Overlapping subagents. You spawn three subagents to "investigate the slow checkout flow." All three end up reading the same controller, the same service, and the same DB layer. You paid 3x for the same context. Fix: pre-partition the search space in the parent. The parent should know enough about the topology to assign disjoint slices.

3. Implementation subagents without acceptance criteria. "Add rate limiting to the API" produces a different implementation every run, and you have no way to tell the subagent it got it wrong without reading the entire diff. Fix: define the acceptance test in the prompt. "When complete, npm run test:rate-limit must pass. The test file already exists at tests/rate-limit.test.ts. Do not modify the test."
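The depth budget from the first fix is a bounded breadth-first walk over the import graph. The sketch below assumes the parent already has that graph as a dictionary (how you extract it from a TypeScript repo is out of scope), and every file name other than src/billing/index.ts is hypothetical.

```python
from collections import deque


def files_within_budget(graph: dict[str, list[str]], start: str,
                        max_depth: int) -> list[str]:
    """Breadth-first walk of an import graph, stopping at max_depth levels."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        path, depth = queue.popleft()
        if depth == max_depth:
            continue  # budget spent: do not follow this file's imports
        for dep in graph.get(path, []):
            if dep in seen or ".test." in dep:
                continue  # skip repeats and test files
            seen.add(dep)
            order.append(dep)
            queue.append((dep, depth + 1))
    return order


imports = {
    "src/billing/index.ts": ["src/billing/invoice.ts", "src/billing/index.test.ts"],
    "src/billing/invoice.ts": ["src/billing/tax.ts"],
    "src/billing/tax.ts": ["src/shared/money.ts"],
}
print(files_within_budget(imports, "src/billing/index.ts", max_depth=2))
```

With a budget of 2, the walk stops before src/shared/money.ts, which sits three levels down. That list of files is what goes into the subagent's scope.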

The pattern across all three: the parent did not do its job of scoping before spawning. Subagents amplify whatever discipline (or lack of it) you put into the prompt.
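For the third failure mode, the parent does not have to take the subagent's word that the acceptance test passed. A minimal sketch of a gate that re-runs the command and trusts only the exit code follows. The function name and timeout are my choices, and a Python one-liner stands in for the npm command so the example runs anywhere.

```python
import subprocess
import sys


def accepted(command: list[str], timeout_s: int = 600) -> bool:
    """Return True only if the acceptance command exits with status 0."""
    try:
        result = subprocess.run(command, capture_output=True, text=True,
                                timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        return False  # a missing binary or a hung run counts as a failure
    return result.returncode == 0


# In the rate-limiting example the command would be ["npm", "run", "test:rate-limit"].
print(accepted([sys.executable, "-c", "raise SystemExit(0)"]))  # passing run
print(accepted([sys.executable, "-c", "raise SystemExit(1)"]))  # failing run
```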

Merging results without losing fidelity

The hardest part of subagent workflows isn't the spawn — it's the merge. When three research subagents come back with three reports, the parent has to reconcile them into one artifact. Done badly, you lose information. Done well, you compress without dropping signal.

Two techniques that work:

Structured returns over prose. If subagents return JSON matching a shared schema, the merge step is a deterministic concatenation plus a small reasoning step. If they return Markdown essays, the merge step is itself an LLM call that may lose details.

Explicit conflict surfacing. Tell the merge step what disagreement looks like. "If two subagents report different line numbers for the same finding, list both and flag for human review." Don't let the model paper over inconsistencies — those inconsistencies are usually the interesting signal.
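Both techniques fit in one small merge function when the returns are structured. This sketch assumes each finding carries file, category, and line, and that a file has at most one finding per category. That second assumption is a simplification I'm making to keep the key short, so adjust it for your schema.

```python
from collections import defaultdict


def merge_findings(reports: dict[str, list[dict]]) -> dict:
    """Merge per-subagent findings and surface disagreements instead of hiding them."""
    by_key = defaultdict(dict)  # (file, category) -> {line: [subagent names]}
    for agent, findings in reports.items():
        for f in findings:
            by_key[(f["file"], f["category"])].setdefault(f["line"], []).append(agent)

    merged, conflicts = [], []
    for (file, category), lines in sorted(by_key.items()):
        if len(lines) == 1:
            line, agents = next(iter(lines.items()))
            merged.append({"file": file, "category": category,
                           "line": line, "reported_by": agents})
        else:
            # Different line numbers for the same finding: list all, flag for review
            conflicts.append({"file": file, "category": category,
                              "lines": dict(sorted(lines.items())),
                              "needs_human_review": True})
    return {"findings": merged, "conflicts": conflicts}


reports = {
    "agent_a": [{"file": "src/auth/jwt.ts", "category": "jwt", "line": 42},
                {"file": "src/auth/login.ts", "category": "cookie", "line": 10}],
    "agent_b": [{"file": "src/auth/jwt.ts", "category": "jwt", "line": 45},
                {"file": "src/auth/login.ts", "category": "cookie", "line": 10}],
}
result = merge_findings(reports)
print(result["conflicts"])
```

No model call is involved, so nothing gets smoothed over. The agreed finding is deduplicated and the disputed one goes to a human with both line numbers attached.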

For the implementation pipeline, the "merge" is really a verification step: does the diff produced by stage 3 satisfy the plan from stage 2? A subagent dedicated to that comparison is more reliable than asking the implementation subagent to self-verify, because self-verification has the same blind spots as the original work.

A worked example: adding a new webhook handler

Concrete walkthrough of how I'd structure this with subagents.

Task: add a Stripe webhook handler for invoice.payment_failed that retries the customer's default payment method once, then sends a dunning email.

# In the parent session (pseudocode: "spawn" and its flags are shorthand
# for the delegation, not literal commands):

# Subagent 1 (research, read-only, parallel with #2)
spawn research_existing_webhooks \
  --scope "src/webhooks/**" \
  --task "Summarize how existing webhook handlers are structured: \
          file layout, signature verification, error handling, \
          idempotency strategy. Return as JSON." \
  --output webhooks_pattern.json

# Subagent 2 (research, read-only, parallel with #1)
spawn research_email_system \
  --scope "src/notifications/**" \
  --task "How are transactional emails sent? Template system, \
          queue mechanism, retry policy. Return as JSON." \
  --output email_pattern.json

# Parent waits for both, then:

# Subagent 3 (plan, depends on 1 and 2)
spawn plan_implementation \
  --inputs webhooks_pattern.json,email_pattern.json \
  --task "Produce a step-by-step plan: files to create, files to \
          modify, exact function signatures, where idempotency lives, \
          how retries are bounded. No code yet." \
  --output plan.md

# Human reviews plan.md here. This is the cheap checkpoint.

# Subagent 4 (implement, depends on plan)
spawn implement \
  --input plan.md \
  --tools edit,bash \
  --task "Execute the plan. Run npm test after each file change. \
          Stop and report if any test fails."

# Subagent 5 (verify, depends on implement)
spawn verify \
  --task "Run the full test suite. Run typecheck. Run lint. \
          Confirm the new handler matches the patterns in \
          webhooks_pattern.json. Report any deviations."

Five subagents, two of them parallel, one human review gate after the plan. The parent context never contains the full source of any webhook file — it only ever holds the JSON summaries, the plan, and the verification report.
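The ordering in that walkthrough falls out of the dependency graph. Here is a sketch using Python's standard graphlib (3.9+) to group the five subagents into waves, where everything in a wave can run in parallel. The subagent names come from the walkthrough, and the dictionary layout is my own.

```python
from graphlib import TopologicalSorter

# subagent -> the subagents whose output it needs
depends_on = {
    "research_existing_webhooks": set(),
    "research_email_system": set(),
    "plan_implementation": {"research_existing_webhooks", "research_email_system"},
    "implement": {"plan_implementation"},
    "verify": {"implement"},
}


def waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into waves; everything in one wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    out = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        out.append(ready)
        sorter.done(*ready)
    return out


for number, wave in enumerate(waves(depends_on), start=1):
    print(number, wave)
```

The human review of plan.md sits between the second and third wave. It is a gate in the process, not a node in the graph.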

What this costs vs. inline: more total tokens (you're paying for the parallel research), but dramatically less wall-clock time, a cleaner audit trail (every artifact is a file you can read), and a context budget that doesn't collapse halfway through.

How BizFlowAI approaches this

We run subagent patterns daily on client codebases — typically mid-size TypeScript or Python monorepos where inline exploration would burn through context before the first useful edit. The standard shape is a parallel research fan-out to map the relevant surface, a human-reviewed plan artifact, then a serialized implement-and-verify pair. The artifacts (research.json, plan.md, diff.patch, verify.log) become the deliverable, not just the code change — clients see exactly what was looked at, what was decided, and what was checked.

The delivery-time win comes from two places: parallelism on the read-heavy stages, and the plan checkpoint catching scope errors before the expensive implementation pass. If you want to see how this maps onto a specific task in your own stack, a discovery call walks through one of your real backlog items end-to-end and shows where the delegation cuts hours out.

Practical defaults to start with

If you're adding subagents to your workflow this week, start with these defaults and adjust:

Always file-bound your subagents. Give them an explicit scope path. "Anywhere in the repo" is a token bomb.
Read-only by default. Only the implementation subagent gets edit tools. Research and verification stay read-only.
Structured output over prose. JSON schema in the prompt, JSON in the return.
Artifact the handoffs. Stage N writes a file. Stage N+1 reads that file. The parent doesn't hold intermediate state.
One human checkpoint after planning. This is the cheapest place to catch direction errors.
Cap fan-out at 3-4 parallel subagents. Beyond that, merging gets harder than the work saved.
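Most of these defaults can be checked before anything is spawned. The sketch below lints a batch of parallel subagent specs against them. The spec fields (name, role, scope, tools, output_schema, output_file) are hypothetical, invented for this example rather than taken from any Claude Code format.

```python
MAX_FAN_OUT = 4


def lint_specs(specs: list[dict]) -> list[str]:
    """Check a batch of parallel subagent specs against the defaults above."""
    problems = []
    if len(specs) > MAX_FAN_OUT:
        problems.append(f"fan-out of {len(specs)} exceeds cap of {MAX_FAN_OUT}")
    for spec in specs:
        name = spec.get("name", "<unnamed>")
        if not spec.get("scope") or spec["scope"] in {".", "/", "**", "**/*"}:
            problems.append(f"{name}: no explicit scope path")
        if "edit" in spec.get("tools", []) and spec.get("role") != "implement":
            problems.append(f"{name}: only the implementation subagent gets edit tools")
        if spec.get("role") != "implement" and not spec.get("output_schema"):
            problems.append(f"{name}: no JSON schema for the return value")
        if not spec.get("output_file"):
            problems.append(f"{name}: handoff is not written to a file")
    return problems


specs = [
    {"name": "research_webhooks", "role": "research", "scope": "src/webhooks/**",
     "tools": ["read"], "output_schema": {"type": "object"},
     "output_file": "webhooks_pattern.json"},
    {"name": "explore", "role": "research", "scope": "**",
     "tools": ["read", "edit"], "output_file": "notes.md"},
]
problems = lint_specs(specs)
for problem in problems:
    print(problem)
```

The human checkpoint is the one default that can't be linted. It has to be a habit.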

Subagents aren't magic. They're a way to keep your parent session's context budget alive across long tasks, and to parallelize the parts of the work that don't share state. The discipline is in the prompt: enumerate, bound, constrain tools, lock the schema. Do that, and you get faster delivery and cleaner diffs. Skip it, and you get an expensive way to produce the same mess you would have produced inline.

Frequently asked questions

What is a Claude Code subagent?

A Claude Code subagent is a separate Claude invocation spawned by a parent session with its own context window, tool access, and a single defined task. The parent sends a prompt, the subagent runs to completion, and only the final summary returns to the parent's context. This keeps the parent's context clean because none of the files read or intermediate steps pollute it. Subagents are the main mechanism for handling large codebase work without exhausting the parent context window.

When should I use a subagent instead of working inline in Claude Code?

Use a subagent when a task requires reading five or more files to answer one question, when independent searches can run across separate modules, or when you need a research-plan-implement pipeline. Stay inline for single-file edits under 200 lines or refactors that touch tightly coupled files where shared context matters. The rule of thumb is parallelize work that doesn't share state and serialize work that does. Spawning overhead makes subagents wasteful for trivial tasks.

How do I write a good subagent prompt?

Treat the prompt as a contract with a junior engineer who cannot ask follow-up questions. Enumerate the specific checks or items to look for, bound the search space to specific directories or files, restrict which tools can be used, and lock the return format to a strict JSON schema. Forbid edits if the subagent will run in parallel with others. This pattern (enumerate, bound, constrain tools, lock the schema) prevents drift and makes output deterministically parseable.

Why do subagents burn so many tokens?

The three common failure modes are unscoped exploration (reading every file that mentions a topic), overlapping subagents that all read the same files, and implementation subagents without acceptance criteria that produce different output every run. Each problem stems from the parent failing to scope work before spawning. Fixes include giving a starting file with a depth budget, pre-partitioning the search space across subagents, and defining a concrete test command that must pass for completion.

What is the research-plan-implement pipeline pattern in Claude Code?

It's a sequential subagent workflow where each stage writes its output to a file and the next stage reads that file as its sole input. Stage one researches and writes research.md, stage two reads it and produces plan.md, stage three implements the plan as a diff, and stage four verifies with tests and lints. The parent only holds four short status summaries instead of the full research context. This reduces token cost, speeds up responses, and lowers hallucinations because each stage receives only what it needs.
