I Sent 200 PDFs to Claude as Images. It Cost Me $280.

Abstract tech illustration: I Sent 200 PDFs to Claude as Images. It Cost Me $280.

Last month I ran a batch of 200 vendor invoices through Claude vision. The bill came back at $280. It should have been $44. If you're following the popular advice and rasterizing every PDF page before sending it to Claude, you're burning money on pages that don't need it — and I'll show you the exact page-level router I run in production to fix it.

Why the "rasterize everything" advice quietly breaks

You've seen this pattern in every tutorial: take a PDF, convert each page to a PNG, send the images to Claude vision, let the model read the layout. It's not wrong. It handles scans. It handles weird multi-column tables. It handles forms where the text layer is junk Unicode soup.

The reason creators recommend it is honest — it works in a demo with one clean PDF. The problem shows up on invoice number fifty in a real pipeline.

Picture a small business or an automation you're building for one. Vendor invoices land in an inbox every day:

  • Some are clean digital PDFs exported from QuickBooks, Xero, or NetSuite.
  • Some are scans — a PDF wrapper around a 200 DPI image of paper.
  • Most are mixed. Page one is a scanned letterhead with the vendor logo. Pages two and three are crisp digital line items exported straight from their billing system.

If you treat them all the same way — convert to image, send to vision, extract fields — everything appears to work. Until you look at the invoice from Anthropic.

The real numbers from a 200-invoice batch

I ran a production batch of 200 vendor invoices through the image-only route. Here's what came back:

Metric Image-only route
Cost per document $1.40
Latency per document 11.0 seconds
Total batch cost $280
Total wall time ~37 minutes
Field extraction accuracy 94%

That batch should have cleared in five minutes for under $50. About 80% of those pages were clean digital text — Claude vision was reading pixels of text that the PDF was already handing me as a literal Python string. I was paying premium vision tokens to do OCR on files that didn't need OCR.

The fix isn't a smarter model or a cheaper one. It's not sending the wrong work to the model in the first place.

The page-level router (not document-level)

Here's the part most tutorials skip. The router has to operate at the page level, not the document level.

Why this matters: most real invoices are mixed. If you route at the document level, one scanned cover page forces the entire 4-page invoice into the expensive vision path. Route per page, and only that single cover page pays the vision cost. The line items, totals, and tax breakdown flow as cheap text.

For each page, the router measures two things:

  1. Text density — characters of extracted text divided by page area.
  2. Table zones vs. image regions — does the page contain tables that overlap with embedded image regions?

If text density is above threshold and tables sit in clean zones → send the page as plain text in the prompt. If density is below threshold, or I detect a scan, or table bounding boxes overlap image regions → that single page gets rasterized and sent as an image.

The density threshold I landed on after tuning against a labeled set is ~1,200 characters per standard US Letter page. Tune yours on your own corpus — contracts, shipping manifests, and invoices all have different baselines.

The router, in code

Here's the core of what runs in production. Stripped down, but accurate to the logic:

import pdfplumber
from pathlib import Path

DENSITY_THRESHOLD = 1200  # chars per standard page
EXPECTED_TOKENS = ("
quot;, "total", "invoice") # invoice sanity check def route_page(page) -> str: """Return 'text' or 'image' for a single PDF page.""" text = page.extract_text() or "" area = page.width * page.height # in PDF points # Normalize to a US Letter reference (612 x 792 = 484,704) density = (len(text) / area) * 484_704 # Sanity check: does the extracted text look like an invoice? text_lower = text.lower() looks_valid = any(tok in text_lower for tok in EXPECTED_TOKENS) # Check table / image overlap tables = page.find_tables() images = page.images overlap = any( _bbox_overlap(t.bbox, (im["x0"], im["top"], im["x1"], im["bottom"])) for t in tables for im in images ) if density >= DENSITY_THRESHOLD and looks_valid and not overlap: return "text" return "image" def _bbox_overlap(a, b) -> bool: ax0, ay0, ax1, ay1 = a bx0, by0, bx1, by1 = b return not (ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0) def route_pdf(pdf_path: Path): decisions = [] with pdfplumber.open(pdf_path) as pdf: for i, page in enumerate(pdf.pages): decisions.append({ "page": i + 1, "route": route_page(page), }) return decisions

The downstream call site is just a switch: pages flagged text get concatenated into the prompt with page boundaries, pages flagged image get rasterized at 200 DPI and attached as vision content. One Claude request per document, mixed content blocks.

What the sanity check buys you

  • pdfplumber sometimes returns a wall of garbled text from PDFs with a corrupted text layer. Density looks great. Extraction silently fails.
  • Checking for an expected token (currency symbol, the word "total" for invoices, "agreement" for contracts) catches this.
  • If the sanity check fails, fall back to image even when density is high.

What changed after the router went live

Same 200-invoice batch, re-run through the hybrid router:

Metric Image-only Hybrid router Delta
Cost per document $1.40 $0.22 6.4× cheaper
Latency per document 11.0s 3.4s 3.2× faster
Batch cost $280 $44 -$236
Batch wall time ~37 min ~11 min -26 min
Extraction accuracy (30 fields) 94% 94% unchanged

Accuracy was measured against a hand-labeled ground truth of ~30 fields per invoice — vendor name, invoice number, issue date, due date, subtotal, tax, total, every line item description, quantity, unit price, line total. Same number. Six times cheaper. Three times faster.

The reason accuracy holds: when the text layer is good, it's better than vision OCR. The model receives the exact characters the PDF generator wrote, not a model's best guess at pixels.

Gotchas before you ship this

A few things I learned the expensive way.

Things that will bite you

  • Your density threshold is document-type specific. 1,200 chars/page works for US Letter invoices. Long-form contracts run 2,500+. Shipping manifests with sparse tables can come in at 600 and still be perfectly clean text. Measure your own corpus before copying a number.
  • Log every routing decision per page. Page number, density value, route taken, downstream extraction confidence. After a week of production traffic you'll see exactly where to retune — usually one or two vendor templates that sit right on the boundary.
  • Watch for "clean-looking" PDFs with broken text layers. Some accounting tools emit PDFs where the text layer is positioned correctly visually but contains junk characters (CID font issues, custom encodings). The sanity-check token list is your guardrail.
  • Rasterize at 200 DPI, not 300. Going from 300 to 200 DPI cut vision token cost roughly 40% on the pages that did need imaging, with no accuracy loss on invoice fields.
  • Cache routing decisions by file hash. If the same vendor sends the same template every month, you don't need to re-evaluate density on every run.

Why bizflowai.io helps with this

This is exactly the kind of work I build into client systems at bizflowai.io — document pipelines that route intelligently instead of brute-forcing every page through the most expensive endpoint. For invoice processing, contract review, and inbound document triage, the routers, OCR fallbacks, and field-level extraction validators are already wired up. Clients get the accuracy of vision on the pages that need it, and the cost of plain text on the 80% that don't.

The takeaway

The default "rasterize everything" advice optimizes for a demo, not a production bill. The fix isn't complicated — it's a per-page check on text density, a sanity token, and a table-vs-image overlap test. About 60 lines of Python sitting between your PDFs and Claude.

On a 200-document batch that saved $236 and 26 minutes of wall time, with no drop in field accuracy. Scale that to a small business processing 1,000 vendor invoices a month and you're looking at over $14,000 a year in API spend you weren't supposed to be paying.

Build the router. Log every decision. Tune your threshold against ground truth. Then stop paying premium vision tokens to OCR text that was never an image in the first place.

Frequently asked questions

What is a page-level PDF router for Claude vision?

A page-level router inspects each page of a PDF individually and decides whether to send it to Claude as plain text or as a rasterized image. Pages with high extracted-text density go through as cheap text, while scanned or image-heavy pages get sent to Claude vision. This avoids forcing an entire document into the expensive image path just because one page is scanned.

How do I reduce Claude vision costs when extracting data from PDF invoices?

Stop sending every page to vision. Use pdfplumber to extract text per page, measure text density (characters divided by page area), and route only low-density or scanned pages to Claude as images. Send clean digital pages as plain text in the prompt. On a 200-invoice batch this dropped cost from $1.40 to $0.22 per document and latency from 11 seconds to 3.4 seconds.

Why does page-level routing matter for mixed PDFs?

Real-world invoices are often mixed: page one is a scanned letterhead, while pages two and three are clean digital exports. Document-level routing forces the entire file into the expensive vision path because of one scanned page. Page-level routing isolates the cost, so only the scanned header pays vision pricing while line items, totals, and tax breakdowns flow as cheap text.

When should I use text extraction vs Claude vision for a PDF page?

Use text extraction when pdfplumber returns high character density (around 1,200+ characters per standard page) and tables don't overlap with image regions. Use Claude vision when density is below threshold, the page is a scan, tables sit inside image zones, or a sanity check (like missing currency symbols or the word 'total') suggests the text layer is corrupted.

What text density threshold should I use for routing PDF pages?

Roughly 1,200 characters per standard page worked for invoices after tuning against ground truth, but the threshold is document-type specific. Invoices, contracts, and shipping manifests have different baselines, so measure your own document set. Also log every routing decision with its density value, since a week of production traffic will reveal exactly where to retune.


Want more like this?

I publish practical AI automation, GenAI engineering, and faceless content workflows on YouTube every week.

Subscribe to bizflowai.io on YouTube — never miss a new tutorial.

Planning an AI automation project or need a second opinion on your architecture?

Connect with me on LinkedIn — Lazar Milicevic, GenAI Engineer & bizflowai.io Founder.

Visit bizflowai.io for our services, case studies, and AI consulting.

Frequently asked questions

What is a page-level PDF router for Claude vision?

A page-level router inspects each page of a PDF individually and decides whether to send it to Claude as plain text or as a rasterized image. Pages with high extracted-text density go through as cheap text, while scanned or image-heavy pages get sent to Claude vision. This avoids forcing an entire document into the expensive image path just because one page is scanned.

How do I reduce Claude vision costs when extracting data from PDF invoices?

Stop sending every page to vision. Use pdfplumber to extract text per page, measure text density (characters divided by page area), and route only low-density or scanned pages to Claude as images. Send clean digital pages as plain text in the prompt. On a 200-invoice batch this dropped cost from $1.40 to $0.22 per document and latency from 11 seconds to 3.4 seconds.

Why does page-level routing matter for mixed PDFs?

Real-world invoices are often mixed: page one is a scanned letterhead, while pages two and three are clean digital exports. Document-level routing forces the entire file into the expensive vision path because of one scanned page. Page-level routing isolates the cost, so only the scanned header pays vision pricing while line items, totals, and tax breakdowns flow as cheap text.

When should I use text extraction vs Claude vision for a PDF page?

Use text extraction when pdfplumber returns high character density (around 1,200+ characters per standard page) and tables don't overlap with image regions. Use Claude vision when density is below threshold, the page is a scan, tables sit inside image zones, or a sanity check (like missing currency symbols or the word 'total') suggests the text layer is corrupted.

What text density threshold should I use for routing PDF pages?

Roughly 1,200 characters per standard page worked for invoices after tuning against ground truth, but the threshold is document-type specific. Invoices, contracts, and shipping manifests have different baselines, so measure your own document set. Also log every routing decision with its density value, since a week of production traffic will reveal exactly where to retune.