What is Claude Code? I Use It to Do a Bookkeeper's Job

Abstract ledger pages flowing into Claude Code's terminal interface, automating a bookkeeper's daily workflow

One of my clients used to pay someone three hours a month to retype vendor invoices from Gmail into their billing database. I rebuilt that job as a Claude Code agent. It now runs every morning at 6am, costs roughly 40 cents in API credits per month, and the client hasn't touched it in eight weeks.

If you've been told Claude Code is a coding tool, you've been sold the wrong story. Here's exactly how the non-coding workflow runs.

What Claude Code actually is when you stop calling it a coding tool

Forget the name for a second. Claude Code is an AI agent that lives in your terminal. It can read files, write files, run shell commands, hit APIs, log into services with credentials you hand it, and — the part that matters — it can take a goal written in plain English and figure out the steps to reach it.

Think about what a junior employee does on a computer all day:

  • Opens Gmail, downloads attachments
  • Copies numbers from a PDF into a spreadsheet or billing system
  • Sends a confirmation email
  • Marks the item as done somewhere

None of that is coding. All of it is reading, writing, and following a checklist. That's exactly the shape of work Claude Code is built for. The fact that it can also write a React app is a side effect, not the main feature.

The actual workflow: Gmail in, Postgres rows out

The client receives about 40 vendor invoices a month from roughly 20 recurring senders. Before automation, someone in the office:

  1. Searched the inbox for vendor emails
  2. Downloaded the attached PDF
  3. Read invoice number, date, amount, tax, line items
  4. Typed all of it into the internal billing database
  5. Filed the email

Three hours of work. Easy to mess up at invoice 35 when you're tired. Easy to fall a month behind. The kind of work that quietly eats a small business alive.

The rebuilt version lives in a single folder on a small home server:

~/agents/invoice-bot/
├── CLAUDE.md          # the spec (plain English)
├── .env               # gmail + postgres credentials
├── sandbox.sql        # safe copy of the invoices table
└── logs/              # daily run logs

No application code. No custom scraper. No PDF-parsing library I had to maintain. The folder is the agent.

CLAUDE.md is the whole product

The file that does the heavy lifting is not code. It's a plain text file called CLAUDE.md that tells the agent, in English, what its job is. This is the actual structure of mine (sanitized):

# Role
You are an invoice intake agent. Run once per day.

# Inputs
- Gmail account: ops@clientdomain.com (creds in .env)
- Look at: unread messages from the last 30 days
- Only from these vendor domains: [list of 20]

# For each matching email
1. Download any PDF attachment to ./tmp/
2. Extract these fields from the PDF:
   - invoice_number (string)
   - invoice_date (YYYY-MM-DD)
   - vendor_name (string)
   - subtotal (decimal)
   - tax (decimal)
   - total (decimal)
   - line_items (JSON array of {description, qty, unit_price})
3. Validate: subtotal + tax == total (tolerance 0.02)
4. If valid: INSERT into invoices table (schema below)
5. Apply Gmail label "processed" to the email
6. If invalid: apply label "review", do NOT insert

# Vendor-specific rules
- Vendor X: invoice number is in the bottom-left footer, not the header
- Vendor Y: tax line is labeled "VAT" not "Tax"
- Vendor Z: dates are DD/MM/YYYY, convert before insert

# Database
Table: invoices
Columns: id, vendor_name, invoice_number, invoice_date, subtotal, tax, total, line_items, source_email_id, created_at

# Hard rules
- NEVER delete emails
- NEVER modify existing rows
- If unsure about a field, label "review" and skip

That's the entire brief. The vendor-specific rules section started empty and grew over the first week as I caught edge cases in the logs.

The first run: watch it, don't trust it

This is the part most tutorials skip. You do not point an agent at your production database and walk away. The first run goes against a sandbox.

My setup for the first dozen runs:

# point the agent at a copy of the real table
export DATABASE_URL="postgres://localhost/invoices_sandbox"

# run claude code in interactive mode, watch every action
claude

# inside the session:
> Read CLAUDE.md and execute the daily run. 
> Show me each step before you take destructive actions.

I sat there and watched the terminal. On the first run it tripped on exactly the kind of thing you'd expect: one vendor's PDF had the invoice number in the footer instead of the header, and the agent grabbed the customer reference number by mistake. I stopped it, added one line to CLAUDE.md (Vendor X: invoice number is in the bottom-left footer), and ran it again. Clean.

A few things I learned the hard way during the first week:

  • Vague specs produce creative improvisation. "Extract the invoice details" gets you garbage. "Extract these 7 named fields, in this format, with this validation rule" gets you rows you can trust.
  • Always give it a "review" escape hatch. Better to skip a weird invoice and label it than guess and corrupt the table.
  • Read the logs every morning for at least a week. Edge cases you didn't predict will show up. Update the spec, not the code.

Scheduling it: cron + headless mode

Once the spec was stable (about day 5), I switched from interactive runs to a scheduled headless run. The whole scheduling layer is one cron line:

# /etc/cron.d/invoice-bot
0 6 * * * lazar cd /home/lazar/agents/invoice-bot && \
  claude -p "Read CLAUDE.md and execute the daily run." \
  --output-format stream-json \
  >> logs/$(date +\%Y-\%m-\%d).log 2>&1

Every morning at 06:00 the agent wakes up, reads its instructions, processes whatever's new in the inbox, writes to Postgres, applies Gmail labels, and goes back to sleep. The client opens their billing system and yesterday's invoices are just there.

The numbers from the last month, measured not estimated:

  • Invoices processed: 40
  • Manual time before: ~180 minutes
  • Agent runtime across the month: ~6 minutes total
  • Anthropic API cost: $0.41
  • Errors requiring human review: 2 (both correctly flagged, not silently wrong)

The error rate is lower than the human's was, because the agent doesn't get tired at invoice 35 and transpose digits.

The limits, because I'm not selling you anything

Claude Code is not magic. Things it will not do well if you're sloppy:

  • Vague specs → improvisation you don't want. Named fields, concrete examples, hard rules. Treat the spec like a contract.
  • No sandbox → corrupted data. First 10–20 runs go against a copy. Always.
  • No guardrails → expensive mistakes. Anything that costs money, sends external messages, or deletes data needs an explicit allow-list in the spec.
  • No log review → silent drift. Read the logs for the first week. Vendors change PDF templates. Specs need updates.

Treat it like a smart new hire on day one, not a finished product. Spend a few hours writing the spec properly and you get an operator that runs forever. Skip that step and you get a very expensive way to corrupt your own database.

The real unlock for a 1-10 person business

Bookkeeping is just the obvious example because every business has it and every business hates it. The actual unlock is broader: Claude Code can operate any system that exposes a terminal, an API, or a file on disk.

That covers, off the top of my head for the clients I work with:

  • Inbox triage and routing
  • CRM updates from email replies
  • Accounting software entries
  • E-invoicing portal submissions (the Serbian SEF portal, for my Fakturko clients)
  • Spreadsheet exports and reconciliations
  • Weekly reporting pulled from 4 different dashboards into one PDF

Anywhere a junior employee currently clicks, types, and copies, you can point an agent at the same task. The bottleneck isn't the technology. It's whether you can write down what the job actually is, step by step, in a markdown file.

Why bizflowai.io helps with this

This invoice intake agent is one of the standard workflows I deploy for small business clients through bizflowai.io — the others being Gmail triage, lead follow-up sequencing, and recurring report generation. Every deployment ships with a versioned CLAUDE.md, a sandbox-first rollout, daily log digests sent to the owner, and explicit guardrails on anything that writes to a production system. The point is not to sell you an agent. The point is to install one that runs for months without you thinking about it.

Frequently asked questions

What is Claude Code?

Claude Code is an AI agent that runs in your terminal. It can read and write files, run commands, hit APIs, and log into services using credentials you provide. Given a goal described in plain English, it figures out the steps to reach it. Despite the name, it isn't only for programmers, it can handle any task built on reading, writing, and following a checklist.

How do I automate invoice processing with Claude Code?

Create a folder containing a CLAUDE.md spec file written in plain English. Specify the Gmail account, vendor domains to watch, which PDF fields to extract, validation rules, and the database table to update. Test it on a sandbox first, refine the spec when it makes mistakes, then schedule it with a cron job to run headless on a server each morning.

Why does the CLAUDE.md file matter for Claude Code automation?

CLAUDE.md is the plain-English spec that tells Claude Code exactly what its job is, including credentials, sources, fields to extract, validation rules, and output destinations. If the spec is vague, the agent will improvise unpredictably. Clear rules, named fields, and concrete examples (like noting that one vendor's invoice number sits in the footer) keep the agent reliable and prevent errors.

How much does running Claude Code for invoice automation cost?

In a real client workflow processing about 40 vendor invoices per month, Claude Code used roughly six minutes of total agent runtime and cost around 40 cents in Anthropic API credits. This replaced about three hours of monthly manual data entry, with a lower error rate than the human had because the agent doesn't fatigue or transpose digits late in a batch.

When should I let Claude Code run unsupervised versus watching it?

Always watch Claude Code during initial runs on a sandbox copy, never a production database. Observe every terminal action, stop it when it does something wrong, and update CLAUDE.md with clearer rules. Only schedule unsupervised runs once the spec is stable, and even then keep guardrails on anything that costs money or sends external messages, and review logs for the first week.


Want more like this?

I publish practical AI automation, GenAI engineering, and faceless content workflows on YouTube every week.

Subscribe to bizflowai.io on YouTube — never miss a new tutorial.

Planning an AI automation project or need a second opinion on your architecture?

Connect with me on LinkedIn — Lazar Milicevic, GenAI Engineer & bizflowai.io Founder.

Visit bizflowai.io for our services, case studies, and AI consulting.

Frequently asked questions

What is Claude Code?

Claude Code is an AI agent that runs in your terminal. It can read and write files, run commands, hit APIs, and log into services using credentials you provide. Given a goal described in plain English, it figures out the steps to reach it. Despite the name, it isn't only for programmers, it can handle any task built on reading, writing, and following a checklist.

How do I automate invoice processing with Claude Code?

Create a folder containing a CLAUDE.md spec file written in plain English. Specify the Gmail account, vendor domains to watch, which PDF fields to extract, validation rules, and the database table to update. Test it on a sandbox first, refine the spec when it makes mistakes, then schedule it with a cron job to run headless on a server each morning.

Why does the CLAUDE.md file matter for Claude Code automation?

CLAUDE.md is the plain-English spec that tells Claude Code exactly what its job is, including credentials, sources, fields to extract, validation rules, and output destinations. If the spec is vague, the agent will improvise unpredictably. Clear rules, named fields, and concrete examples (like noting that one vendor's invoice number sits in the footer) keep the agent reliable and prevent errors.

How much does running Claude Code for invoice automation cost?

In a real client workflow processing about 40 vendor invoices per month, Claude Code used roughly six minutes of total agent runtime and cost around 40 cents in Anthropic API credits. This replaced about three hours of monthly manual data entry, with a lower error rate than the human had because the agent doesn't fatigue or transpose digits late in a batch.

When should I let Claude Code run unsupervised versus watching it?

Always watch Claude Code during initial runs on a sandbox copy, never a production database. Observe every terminal action, stop it when it does something wrong, and update CLAUDE.md with clearer rules. Only schedule unsupervised runs once the spec is stable, and even then keep guardrails on anything that costs money or sends external messages, and review logs for the first week.