Per-Page PDF Chunking Cut My Claude Bill By 6x (Same

Uploading PDFs straight to Claude was quietly burning 2,800 tokens per page on my invoicing pipeline, even when the page was 80% whitespace. If you're processing client contracts, supplier invoices, or any text-native PDF at volume, you're paying a vision-token tax for nothing. Here's the exact pre-chunking trick that dropped me to 450 tokens per page with identical extraction accuracy — and in some cases, better.

The hidden cost of dragging PDFs into Claude

The default workflow is seductive. Drag a 12-page supplier invoice into the Claude UI (or attach it via the API), ask "extract every line item as JSON," and it works. Looks like magic. But under the hood, every PDF upload routes through the vision pipeline. Each page gets rasterized into an image, and you pay a per-image token cost regardless of how much actual content sits on that page.

I measured this on a real digitally-generated invoice last week — boring B2B supplier doc, 12 pages, lots of margins and repeated header bars:

  • Average tokens consumed per page: ~2,800
  • Total input tokens for the document: ~33,600
  • Cost on Claude Sonnet 4.5 at $3/M input tokens: ~$0.10 per invoice

That sounds cheap until you scale it. One of my clients runs roughly 500 PDFs/month through their AP automation. At that volume, the vision route costs around $50/month in input tokens alone — and that's before output tokens, before retries, before the second pass I sometimes do to reconcile totals.

The kicker: most of those tokens are paying for whitespace, footer disclaimers, and the company logo at the top of every page. None of which I need to extract a line item.

Eight lines of Python that fix it

If the PDF is text-native (more on that test in a minute), you don't need vision at all. pdfplumber reads the text layer directly. No rasterization, no image tokens, no paying for the colored header bar.

import pdfplumber

def chunk_pdf_by_page(path: str) -> list[str]:
    pages = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            text = page.extract_text() or ""
            pages.append(text.strip())
    return pages

That's it. You get back a list of strings, one per page. From there, the extraction pipeline becomes a normal text-in / JSON-out loop:

import anthropic, json

client = anthropic.Anthropic()

PROMPT = """Extract every line item from this invoice page as JSON.
Schema: [{"description": str, "quantity": number, "unit_price": number, "total": number}]
If the page has no line items, return [].
Return ONLY valid JSON, no prose."""

def extract_page(page_text: str) -> list[dict]:
    msg = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"{PROMPT}\n\n---PAGE---\n{page_text}"
        }],
    )
    return json.loads(msg.content[0].text)

def extract_invoice(path: str) -> list[dict]:
    items = []
    for page_text in chunk_pdf_by_page(path):
        if not page_text:
            continue
        items.extend(extract_page(page_text))
    return items
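
One fragile spot in the loop above is the bare json.loads call: if the model wraps its answer in a Markdown code fence or adds a stray sentence, parsing fails. Here is a minimal sketch of a more tolerant parser, assuming the reply contains at most one JSON array. The helper name parse_json_array is hypothetical and not part of the Anthropic SDK.

```python
import json
import re

# An opening or closing Markdown code fence (three backticks, optional "json" tag).
_FENCE = re.compile(r"^\s*`{3}(?:json)?\s*|\s*`{3}\s*$", re.IGNORECASE)


def parse_json_array(reply: str) -> list[dict]:
    """Parse a model reply that is expected to be a JSON array of line items."""
    cleaned = _FENCE.sub("", reply.strip())
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to the outermost [...] span when prose surrounds the array.
        start, end = cleaned.find("["), cleaned.rfind("]")
        if start == -1 or end <= start:
            raise
        data = json.loads(cleaned[start : end + 1])
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of line items")
    return data
```

Swapping it in would be a one-line change in extract_page: return parse_json_array(msg.content[0].text). A reply that still fails to parse raises a ValueError, which is a natural place to hang a retry.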

Same 12-page invoice through this pipeline:

  • Average tokens per page: ~450
  • Total input tokens: ~5,400
  • Cost per invoice: ~$0.016

That's 6.2x cheaper. Across 500 PDFs/month, input cost falls from about $50 to about $8, a saving of roughly $42/month on a single workflow. Run that across three or four AP/contract workflows for a mid-size SMB client and you're back to honest unit economics.
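
For anyone who wants to plug in their own numbers, the arithmetic can be sketched as below. It assumes every invoice has 12 pages, counts input tokens only, and uses the $3 per million token rate quoted earlier. The function name input_cost is illustrative.

```python
PRICE_PER_MTOK = 3.00  # USD per million input tokens (the Sonnet rate quoted above)


def input_cost(tokens_per_page: int, pages: int,
               price_per_mtok: float = PRICE_PER_MTOK) -> float:
    """Input-token cost in USD for one document."""
    return tokens_per_page * pages * price_per_mtok / 1_000_000


vision = input_cost(2_800, 12)          # direct upload
text = input_cost(450, 12)              # pdfplumber text path
monthly_saving = (vision - text) * 500  # assumed volume: 500 invoices per month

print(f"vision: ${vision:.4f}  text: ${text:.4f}  ratio: {vision / text:.1f}x")
# vision: $0.1008  text: $0.0162  ratio: 6.2x
print(f"monthly saving at 500 invoices: ${monthly_saving:.2f}")
# monthly saving at 500 invoices: $42.30
```

Output tokens and retries are the same on both paths, so they drop out of the comparison.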

Why the savings are this big

  • Vision pages bill on rendered pixels, not content density — a near-empty page still costs ~1,500-2,500 tokens.
  • Text extraction strips formatting, repeated headers, and decorative whitespace by default.
  • Per-page chunking caps the context window per call, so you never get charged for the whole document in one shot when you only needed page 7.

The accuracy result nobody expects

I expected the cost to drop. I did not expect accuracy to go up.

When Claude processes a rendered page image, it has to infer column structure from pixel positions. On multi-column line-item tables — the kind every supplier invoice on Earth uses — it occasionally drifts a row, attaching a unit price from row 4 to the description on row 5. Not often, but enough that I had to build a reconciliation step that re-summed line totals against the invoice grand total.

Raw text from pdfplumber preserves the reading order the PDF generator wrote. There's no column-alignment guessing. On a sample of 40 invoices I re-ran both ways:

Metric                      | Vision (direct upload) | Text chunking
Line items extracted        | 612                    | 618
Column-misalignment errors  | 7                      | 0
Avg tokens / page           | 2,810                  | 451
Reconciliation failures     | 3                      | 0
Cost per 100 invoices       | $4.20                  | $0.68

Same prompt, same model, same documents. The text path was cheaper and cleaner. The only thing I gave up was direct visual reasoning about logos or stamps — irrelevant for line-item extraction.
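
The reconciliation step mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the exact production check: it assumes each item follows the schema from the prompt, that the grand total is obtained separately (the grand_total parameter is hypothetical), and that there are no per-line discounts or taxes.

```python
from decimal import Decimal


def misaligned_rows(items: list[dict], tolerance: str = "0.01") -> list[int]:
    """Indices of rows where quantity * unit_price does not match total."""
    bad = []
    for i, item in enumerate(items):
        # Decimal(str(...)) avoids binary float drift on currency values.
        expected = Decimal(str(item["quantity"])) * Decimal(str(item["unit_price"]))
        if abs(expected - Decimal(str(item["total"]))) > Decimal(tolerance):
            bad.append(i)
    return bad


def reconcile(items: list[dict], grand_total: str, tolerance: str = "0.01") -> bool:
    """True if the extracted line totals sum to the invoice grand total."""
    line_sum = sum((Decimal(str(item["total"])) for item in items), Decimal("0"))
    return abs(line_sum - Decimal(grand_total)) <= Decimal(tolerance)
```

A row flagged by misaligned_rows is the typical symptom of the drift described earlier, where a unit price from one row lands on the description of the next.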

The one-line test that decides everything

This whole approach has one hard prerequisite: the PDF must be text-native. If it's a scan or a photo of a printed invoice, extract_text() returns empty strings and you're back to OCR or vision.

The dumb-simple human test: open the PDF, try to select a line of text with your cursor. If it highlights, you can chunk it. If it doesn't, you can't.

In code:

def is_text_native(path: str, min_chars_per_page: int = 50) -> bool:
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages[:3]:  # sample first 3 pages
            text = page.extract_text() or ""
            if len(text.strip()) >= min_chars_per_page:
                return True
    return False

Branch your pipeline on that check:

def process(path: str) -> list[dict]:
    if is_text_native(path):
        return extract_invoice(path)            # cheap text path
    else:
        return extract_invoice_via_vision(path) # fallback OCR/vision path
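
The fallback function is not shown in this post. One way to sketch it, assuming the Messages API's base64 document content block for PDFs, is below. The helper build_pdf_message and the extra client and prompt parameters are illustrative choices made to keep the example self-contained. In the pipeline above they would be the module-level client and PROMPT.

```python
import base64
import json


def build_pdf_message(pdf_bytes: bytes, prompt: str) -> list[dict]:
    """Build a messages payload that attaches the whole PDF as a document block."""
    encoded = base64.standard_b64encode(pdf_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": encoded,
                },
            },
            {"type": "text", "text": prompt},
        ],
    }]


def extract_invoice_via_vision(path: str, client, prompt: str) -> list[dict]:
    """Fallback path: send the PDF itself instead of extracted text."""
    with open(path, "rb") as f:
        messages = build_pdf_message(f.read(), prompt)
    msg = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=4096,  # assumed budget for a whole-document reply
        messages=messages,
    )
    return json.loads(msg.content[0].text)
```

To keep the one-argument call in process(), the extra parameters could be bound once with functools.partial.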

In production I see roughly this split on US SMB document flows:

  • Digitally generated (QuickBooks, Xero, NetSuite, Bill.com, supplier portals, e-signed contracts): ~85-90% — these all chunk cleanly.
  • Scanned or photographed (receipts users snap on their phone, old vendor faxes, IRS forms saved as image PDFs): ~10-15% — these need OCR (Tesseract, AWS Textract, or vision).

That means the cheap path covers the overwhelming majority of real volume. The expensive fallback only runs when it has to.

Edge cases I had to handle in production

Eight lines get you the demo. Production needs a few more guards.

Things that will bite you

  • Tables that span page boundaries. A line item starting on page 3 and continuing on page 4 will get split into two extractions. I keep a small "carry-over" buffer that re-feeds the last 200 characters of page N into page N+1's prompt when the page ends mid-row.
  • Headers and footers repeated on every page. Strip them before sending to the model — saves another 15-20% tokens. pdfplumber exposes bounding boxes; drop anything in the top 8% or bottom 8% of the page.
  • JSON parse failures. Wrap json.loads in a retry that asks Claude to re-emit valid JSON. Happens on maybe 1 in 300 pages.
  • Empty pages. Skip them entirely. Don't pay for a model call to confirm a blank page is blank.
  • Concurrency. Per-page chunking is embarrassingly parallel. I run pages through an async semaphore at 10 concurrent calls, so a 30-page contract finishes in under 4 seconds instead of 30.

import asyncio
from anthropic import AsyncAnthropic

aclient = AsyncAnthropic()
sem = asyncio.Semaphore(10)

async def extract_page_async(text: str) -> list[dict]:
    async with sem:
        msg = await aclient.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            messages=[{"role": "user", "content": f"{PROMPT}\n\n{text}"}],
        )
    return json.loads(msg.content[0].text)

async def extract_invoice_async(path: str) -> list[dict]:
    pages = [p for p in chunk_pdf_by_page(path) if p]
    results = await asyncio.gather(*(extract_page_async(p) for p in pages))
    return [item for page_items in results for item in page_items]

That single change — async + semaphore — is the difference between an AP pipeline that runs nightly and one that runs in real time as invoices land in the inbox.
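
Two of the other guards from the list above, header/footer stripping and the carry-over buffer, can be sketched as follows. The function names are illustrative. The sketch assumes the page object is a pdfplumber page (it relies on page.bbox, page.crop and extract_text), and it simplifies the carry-over by prefixing every page with the previous page's tail instead of detecting a mid-row break.

```python
def body_bbox(page_bbox: tuple, margin: float = 0.08) -> tuple:
    """Shrink a page bounding box by `margin` of its height at top and bottom."""
    x0, top, x1, bottom = page_bbox
    trim = (bottom - top) * margin
    return (x0, top + trim, x1, bottom - trim)


def extract_body_text(page, margin: float = 0.08) -> str:
    """Extract text from a pdfplumber page, ignoring header and footer bands."""
    body = page.crop(body_bbox(page.bbox, margin))
    return (body.extract_text() or "").strip()


def with_carry_over(pages: list[str], tail_chars: int = 200) -> list[str]:
    """Prefix each page after the first with the tail of the previous page."""
    out = []
    for i, text in enumerate(pages):
        if i == 0 or not pages[i - 1]:
            out.append(text)  # nothing to carry over
        else:
            tail = pages[i - 1][-tail_chars:]
            out.append(f"[END OF PREVIOUS PAGE]\n{tail}\n[CURRENT PAGE]\n{text}")
    return out
```

Because the tail is repeated, the prompt would need to say that the previous-page text is context only. Otherwise the last row of page N can be extracted twice.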

When you should still use direct PDF upload

I'm not anti-vision. There are real cases where dragging the PDF in is the right call:

  • One-off analysis. You're looking at a single document, not building a pipeline. The 5-minute setup cost of pdfplumber isn't worth it.
  • Layout matters. Floor plans, org charts, marketing decks, anything where the visual structure carries meaning.
  • Stamped or handwritten annotations. "Approved" stamps, signature blocks, margin notes — these live in pixels, not text.
  • Mixed-content reports. Charts and graphs the model needs to read.

The decision rule I use: if you're processing fewer than 50 PDFs/month and you need visual reasoning, upload directly. If you're processing more than 50/month and you only need text content, chunk it. Volume changes the math.

Why bizflowai.io helps with this

Most of the document-automation pipelines I build for clients on bizflowai.io live exactly in this territory — supplier invoices flowing in from email, contracts hitting Google Drive, receipts arriving via webhook. The pre-chunking decision tree (text-native vs scanned, header stripping, per-page extraction, async batching, JSON reconciliation) is baked into the standard ingestion module so clients pay text-path prices on 85%+ of their volume and only fall back to vision when the document actually needs it.

Frequently asked questions

Why does uploading PDFs directly to Claude use so many tokens?

When you upload a PDF to Claude, it routes through the vision pipeline, which rasterizes every page into an image. That means white space, headers, footers, and blank margins all consume tokens. On a tested digitally-generated invoice, this came out to roughly 2,800 tokens per page, making high-volume PDF processing expensive even when the underlying content is sparse.

How do I reduce token costs when extracting data from PDFs with Claude?

Use pdfplumber in Python to pre-chunk the PDF into plain text before sending it to Claude. Open the PDF, loop over pages, and call extract_text on each to get one text string per page. Send each page individually with a stable extraction prompt, then aggregate the JSON outputs. This dropped costs from 2,800 to 450 tokens per page — about 6.2 times cheaper.

When should I use pdfplumber vs Claude's vision pipeline for PDFs?

Use pdfplumber for text-native PDFs — digitally generated invoices, contracts, or exports from accounting software where you can highlight text with your cursor. Use Claude's vision pipeline (or OCR first) for scanned documents or photos, where pdfplumber returns empty strings. Quick test: open the PDF and try selecting a line. If it highlights, chunk it. If not, you need vision.

Does pre-chunking PDF text hurt extraction accuracy?

No — accuracy actually improves. When Claude processes raw text instead of a rendered image, it stops hallucinating column alignment on multi-column tables. There's no more guessing which price belongs to which row, since the text structure is preserved directly rather than inferred from pixels. You get both lower token costs and more reliable line-item extraction.

How much money can pre-chunking PDFs save at scale?

On Claude Sonnet, processing 500 PDFs per month with pdfplumber pre-chunking instead of direct upload saves roughly 42 dollars monthly in input tokens from a single workflow (about $0.10 versus $0.016 per 12-page invoice). The savings come from dropping token usage from around 2,800 per page to 450 per page, a 6.2x reduction, by skipping the vision pipeline's rasterization of whitespace, headers, and margins.


Want more like this?

I publish practical AI automation, GenAI engineering, and faceless content workflows on YouTube every week.

Subscribe to bizflowai.io on YouTube — never miss a new tutorial.

Planning an AI automation project or need a second opinion on your architecture?

Connect with me on LinkedIn — Lazar Milicevic, GenAI Engineer & bizflowai.io Founder.

Visit bizflowai.io for our services, case studies, and AI consulting.
