Learn 95% of the Claude Agent SDK in Under 14 Minutes

By Lazar Milicevic · Published June 14, 2026 · 9 min read

If you've been pasting prompts into a browser and calling it automation, you're using maybe 5% of what Claude can do. The Agent SDK is the same model — but wrapped in tools, file access, and autonomous loops that finish work while you're asleep. Here's the exact mental model and the five primitives I use in every production agent I ship.

Primitive 1: The Query Loop Replaces the Human

In the chat app, the loop is: you type, Claude answers, you type again. You are the loop. The SDK replaces you with code.

from claude_agent_sdk import query

async for message in query(
    prompt="Process all unread invoices in the inbox.",
    options={"allowed_tools": ["read_new_emails", "create_invoice"]}
):
    print(message)

Six lines. But inside that single query call, Claude can call a tool, read the result, think, call another tool, and only then return. The whole act-observe-decide cycle happens autonomously.

The mental shift: stop thinking "I'm asking Claude a question." Start thinking "I'm hiring a worker and giving them a task." That worker keeps going until the task is done or it hits a stop condition you defined.

For a small business this is the difference between:

An AI that drafts an email for you to send, and
An AI that reads the inbox, classifies what needs a reply, drafts each one, queues them, and pings you on Telegram about the three it wasn't sure about — all between 2 AM and 6 AM.

Same model. Different wrapper.

Primitive 2: Tool Definitions in Three Categories

A tool is just a function you let Claude call. You describe it in plain English — name, purpose, arguments — and the SDK handles the protocol. Claude emits a tool-use message, your code runs the function, you return the result, Claude continues.

The pattern I use on almost every client build: three categories.

The three tool categories

Read tools — fetch emails, query the CRM, read a file, hit an API.
Write tools — send a reply, create an invoice, update a row, post a message.
Escalate tools — ping the owner on Telegram when something looks off.

That third category is what separates agents that work from agents that either spam customers or hang forever. Most tutorials skip it. Always give the agent a way to ask for help.

For a small invoicing business I built for, the tool list was literally five functions:

@tool
def read_new_emails() -> list[Email]: ...

@tool
def extract_invoice_data(email_id: str) -> InvoiceData: ...

@tool
def create_invoice_in_system(data: InvoiceData) -> str: ...

@tool
def send_invoice_pdf(invoice_id: str, recipient: str) -> bool: ...

@tool
def notify_owner_on_exception(reason: str, context: dict) -> None: ...

Five tools. One agent. Around 200 invoices a week handled without anyone touching the keyboard. The owner gets a Telegram ping maybe twice a week when something genuinely needs eyes — usually a new client whose tax ID doesn't match the format the system expects.

Primitive 3: System Prompts as Job Descriptions, Not Vibes

This is where most builds quietly go wrong. People write system prompts like they're chatting: "You are a helpful assistant that processes invoices." That's not a job description, that's a vibe.

Write the system prompt the way you'd write a contract for someone you'll never meet. Here's the template I paste into every new agent:

ROLE
One sentence. What is this agent.

INPUTS
What will arrive. In what format. From where.

OUTPUTS
What does "done" look like. What artifacts must exist.

RULES
Hard constraints. Things that must never happen.

ESCALATION
Exact conditions to stop and call notify_owner.

The rules section is almost always the longest, because rules are how you prevent the embarrassing failures:

Don't send emails to addresses not on the approved list.
Don't create invoices over €5,000 without explicit confirmation.
Don't reply in a language different from the incoming message.
Don't retry a failing API call more than 3 times.

Treat the system prompt like an employee handbook, not a greeting. The chat-app habit of being chatty in the prompt is the single most common reason a 6-tool agent does something stupid at 3 AM.

Primitive 4: Session Memory and Context Compaction

This is where long-running agents either survive or die. Every message, every tool call, every JSON result piles into the context window. Run an agent for two hours of real work and you'll blow past the limit — speed drops, costs climb, quality decays.

The SDK gives you two patterns:

Pattern	When to use	Tradeoff
Session resumption	Job pauses and resumes (overnight cron, queue worker)	Easy; doesn't fix bloat within a single run
Context compaction	Single long-running task (research, multi-stage processing)	Requires checkpoint logic; massively cheaper

The pattern I use on overnight jobs: every 20 tool calls, compact. Keep the system prompt. Keep the current task. Keep a one-paragraph summary of progress. Throw away the raw tool outputs that have already been processed.

if tool_call_count % 20 == 0:
    summary = await summarize_progress(history)
    history = [system_prompt, current_task, summary]

An agent that would have died after 30 minutes now runs for six hours on a few cents of API spend. For real numbers: a typical client agent I ship runs around 40 invocations a day and costs roughly 18 cents in API spend. Not a typo. The expensive part of automation was never the model — it was developer time, and the SDK collapses that.

Primitive 5: Subagent Orchestration

This is the one that changes the shape of what you can build. One parent agent that spawns specialized children. The parent doesn't do the work — it decides what work needs doing and dispatches it. Think project manager, not contractor.

Concrete example from a lead-gen workflow I ship:

Parent agent (orchestrator)
├── reads lead list from CRM
├── for each lead → spawns Research subagent
│     • tools: web_search, scrape_site, read_linkedin
│     • returns: {company_summary, qualified: bool}
├── for each qualified lead → spawns Drafting subagent
│     • tools: read_research, draft_email
│     • returns: draft_text
└── for each draft → spawns Sending subagent
      • tools: send_email, log_to_crm
      • returns: send_status

Each subagent has:

A tiny, focused system prompt (10-20 lines, not 200)
A small tool set (2-4 tools, not 15)
A short context window because it dies after one task

Why this matters in practice:

What subagents fix

Context bloat — each child starts clean; the parent only keeps summaries.
Tool confusion — Claude picks better tools when there are 3 to choose from instead of 15.
Cost — small focused contexts cost a fraction of one giant agent's context.
Failure isolation — one subagent failing doesn't poison the rest of the run.

The orchestrator pattern is also how you stop hitting rate limits — you can throttle the parent's dispatch logic without rewriting any of the worker agents.

Putting the Five Together

Here's the mental model in one table. Every production agent I ship maps to it:

Primitive	What it replaces	What breaks without it
Query loop	You sitting in front of a chat tab	No autonomy — still manual
Tool definitions (R/W/E)	Copy-pasting between apps	Agent can think but not act
System prompt as JD	"You are a helpful assistant"	Embarrassing edge-case failures
Memory & compaction	Restarting the chat every hour	Agent dies on long jobs
Subagent orchestration	One monolithic mega-prompt	Context bloat, tool confusion, cost spikes

Build a real agent and you'll touch all five within the first day. Skip any one of them and you'll feel exactly where the seam tears.

Why bizflowai.io helps with this

A lot of small businesses I work with don't want to learn the SDK — they want the invoice agent, the lead-research agent, the inbox-triage agent already running on their data. At bizflowai.io I build these as scoped engagements: I map the workflow, define the read/write/escalate tools against the systems you already use (Gmail, your CRM, your invoicing software, Telegram for escalations), write the system prompts as proper job descriptions, and deploy the agent to a home server or a small VPS so the monthly cost stays in the cents-per-day range. You get the autonomous worker; I take care of the five primitives behind it.

Frequently asked questions

What is the query loop in the Claude Agent SDK?

The query loop is the core primitive of the Claude Agent SDK that replaces a human-in-the-loop chat with code. You call query with a prompt and get back a stream of messages. Within a single query call, Claude can decide to call a tool, get a result, reason about it, call another tool, and then answer—turning Claude from a chatbot into an autonomous worker that completes tasks.

How do I define tools for a Claude agent?

A tool is a function you let Claude call, described in plain English with a name, purpose, and arguments. Organize tools into three categories: read tools (fetch emails, query a CRM), write tools (send replies, create invoices), and escalate tools (notify a human via Telegram). When Claude emits a tool-use message, your code runs the function and passes the result back so Claude can continue.

How should I write a system prompt for a Claude agent?

Treat the system prompt as a job description, not a greeting. Include five sections: Role (one sentence describing the agent), Inputs (what arrives and in what format), Outputs (what 'done' looks like), Rules (what must never happen, usually the longest section), and Escalation (when to stop and ask a human). This prevents embarrassing failures like sending emails to unapproved addresses or creating oversized invoices.

Why does context compaction matter for long-running agents?

Every message, tool call, and result accumulates in the context window. After hours of running, agents blow past the limit, slow down, get more expensive, and lose quality. Context compaction solves this: at checkpoints, the agent summarizes its progress, drops raw tool outputs, and continues with a clean compressed history. A common pattern is compacting every twenty tool calls while keeping the system prompt and current task intact.

When should I use session resumption vs context compaction?

Use session resumption when you need to pause and restart an agent later—you save the conversation and pick up where you left off. Use context compaction for long, continuous runs like overnight jobs where the context window would otherwise overflow. Compaction has the agent summarize progress at checkpoints and drop raw tool outputs, keeping the agent fast and cheap while session resumption simply preserves state between runs.

Want more like this?

I publish practical AI automation, GenAI engineering, and faceless content workflows on YouTube every week.

Subscribe to bizflowai.io on YouTube — never miss a new tutorial.

Planning an AI automation project or need a second opinion on your architecture?

Connect with me on LinkedIn — Lazar Milicevic, GenAI Engineer & bizflowai.io Founder.