Stop Uploading Scanned PDFs to Claude — Do This Instead

By Lazar Milicevic · Published June 20, 2026 · 9 min read

A client was burning 4,800 tokens per invoice on Claude's native PDF upload. Same accuracy after the fix: 950 tokens. The reason isn't in the docs anywhere obvious, and it only shows up when you process real-world scanned documents at volume.

The Problem Nobody Talks About

The setup: a procurement team uploads about 200 supplier invoices per week. Almost all of them are scans or phone photos saved as PDF. The extraction job is twelve fields per invoice: vendor, date, totals, tax breakdown, line items, the usual.

First version was the obvious one. Pipe every PDF straight into Claude's native PDF endpoint, prompt for the twelve fields, parse the JSON. It worked. Accuracy hit 94% on the schema. Then the bill came.

4,800 tokens per invoice. Average. For a one-page document.

The founder asked the right question: why is one page costing this much?

Here's what's actually happening, and I had to dig to find it. When a PDF sent to Claude has a real text layer, that extracted text carries the content and the cost stays modest. When the PDF is a scan, a phone photo wrapped in a PDF, or anything else image-based, there is no text to lean on: Claude silently rasterizes every page at a resolution you don't control and processes it as vision input. You pay vision-token costs on images larger than you would ever choose if you were sending them yourself.

You don't get warned. You don't get a flag back saying "hey, this was a scan." You just get the bill.

Most tutorials don't catch this because they test on clean text PDFs generated from Word or InDesign. The vision path never fires. In production, on real procurement workflows, it fires constantly.

Detect Image-Based PDFs Before They Hit Claude

The fix is upstream classification. Decide whether a PDF is text or image before you call the API, then route accordingly.

Three libraries do the whole job:

pip install pdfplumber pdf2image Pillow

pdfplumber — opens a PDF and tells you whether there is a real, extractable text layer
pdf2image — rasterizes PDF pages into actual image files (wraps poppler)
Pillow — compresses those images down to a size that's cheap but still readable
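pdf2image needs poppler's command-line tools on the machine. `pip install` does not bring them in, and a missing install only surfaces when the first conversion runs. A minimal startup check, assuming pdf2image's default `pdftoppm` backend (the helper name `poppler_available` is hypothetical):

```python
import shutil


def poppler_available() -> bool:
    """Return True if poppler's pdftoppm binary is on PATH.

    pdf2image shells out to pdftoppm by default, so checking once at
    startup turns a confusing mid-pipeline failure into an early error.
    """
    return shutil.which("pdftoppm") is not None
```

On Debian/Ubuntu poppler comes from the `poppler-utils` package, on macOS from Homebrew's `poppler`. If the binaries live outside PATH, `convert_from_path` accepts a `poppler_path` argument, and this PATH-only check would report a false negative.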

The classification logic is one function:

import pdfplumber

def is_text_pdf(path: str, min_chars: int = 50) -> bool:
    with pdfplumber.open(path) as pdf:
        first_page_text = pdf.pages[0].extract_text() or ""
        return len(first_page_text.strip()) >= min_chars

If extract_text() gives you back fifty or more characters of actual content from page one, it's a text PDF. Send it to Claude's PDF endpoint as normal — native upload is cheap and accurate on real text.
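For the text branch, native upload means sending the file itself as a `document` content block in the Messages API. A minimal sketch of building that message content, assuming base64 upload rather than the Files API (the helper names are hypothetical):

```python
import base64


def pdf_document_block(pdf_bytes: bytes) -> dict:
    """Wrap raw PDF bytes as a Messages API document content block."""
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.b64encode(pdf_bytes).decode("ascii"),
        },
    }


def native_pdf_content(pdf_bytes: bytes, prompt: str) -> list[dict]:
    """User-message content for the text-PDF path: document first, prompt last."""
    return [pdf_document_block(pdf_bytes), {"type": "text", "text": prompt}]
```

Pass the result as the `content` of a user message with the same `client.messages.create` call used for the image path later in this post.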

If you get back an empty string, a couple of stray ligatures, or pure noise, it's image-based. Now you take control of the rasterization yourself instead of letting Claude do it at whatever resolution it wants.
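A bare character count can be fooled by exactly that kind of junk, since a page of stray symbols still has length. One hedged refinement, with illustrative thresholds rather than tuned ones, is to also require that most non-whitespace characters be letters or digits (`looks_like_real_text` is a hypothetical helper):

```python
def looks_like_real_text(text: str, min_chars: int = 50,
                         min_alnum_ratio: float = 0.6) -> bool:
    """Heuristic: is extracted text real content rather than OCR noise?

    Counts only non-whitespace characters (unlike the strip()-based
    length check in is_text_pdf), then requires a minimum share of them
    to be letters or digits. Both thresholds are illustrative defaults.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < min_chars:
        return False
    alnum = sum(1 for c in chars if c.isalnum())
    return alnum / len(chars) >= min_alnum_ratio
```

Swap it in for the length check inside `is_text_pdf` only if your corpus actually produces noisy text layers.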

The Conversion Path That Cuts Tokens 5x

When is_text_pdf returns False, here's the path. Three settings matter, and they're not arbitrary: I tuned them against the actual 200-invoice corpus until accuracy held at 94%.

from pdf2image import convert_from_path
import io, base64

def pdf_to_compressed_jpegs(path: str) -> list[bytes]:
    # Render every page at 150 DPI; pdf2image returns PIL images.
    pages = convert_from_path(path, dpi=150)
    out = []
    for page in pages:
        # Shrink in place so the longest edge is at most 1024 px.
        # Aspect ratio is preserved and smaller pages are never upscaled.
        page.thumbnail((1024, 1024))
        buf = io.BytesIO()
        page.save(buf, format="JPEG", quality=85)
        out.append(buf.getvalue())
    return out

Why these specific numbers:

DPI 150: enough for OCR-quality digit recognition on printed invoices. At 200+ DPI pdf2image renders roughly double the pixels, most of which the 1024 px cap then throws away, for zero accuracy gain. At 100 DPI Claude starts misreading tax amounts with similar-looking digits.
1024 px longest edge: Claude's vision token cost scales with pixel area (Anthropic's docs estimate roughly width × height / 750 tokens per image), so capping the longest edge keeps each page cheap while preserving enough detail that every digit is sharp.
JPEG quality 85: the sweet spot. Quality 95 doubles file size for no accuracy improvement (file size affects the upload payload, not the token count). Quality 70 starts smearing thin fonts on cheap thermal-printed receipts.
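The cost side of these settings can be checked with arithmetic. The sketch below applies Anthropic's documented rule of thumb (about width × height / 750 tokens per image) to an approximation of what `thumbnail((1024, 1024))` does to a page rendered at a given DPI. US Letter (8.5 × 11 in) is an assumed page size, and all three helpers are hypothetical:

```python
def page_pixels_at_dpi(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Approximate pixel size of a page rendered at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def capped_size(width: int, height: int, max_edge: int = 1024) -> tuple[int, int]:
    """Approximate result of PIL's Image.thumbnail((max_edge, max_edge)).

    Keeps the aspect ratio, caps the longest edge at max_edge, and never
    upscales. PIL's own rounding may differ by a pixel.
    """
    scale = min(1.0, max_edge / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def estimated_image_tokens(width: int, height: int) -> int:
    """Rule-of-thumb vision token estimate: width * height / 750."""
    return round(width * height / 750)


for dpi in (100, 150, 200):
    rendered = page_pixels_at_dpi(8.5, 11, dpi)  # assumed US Letter page
    sent = capped_size(*rendered)
    print(dpi, rendered, sent, estimated_image_tokens(*sent))
```

For a Letter page, all three DPIs land on the same capped size, so DPI decides how much detail survives the downsampling, not how many tokens the page costs.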

Then send them as vision inputs:

import base64

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def extract_fields(jpegs: list[bytes], prompt: str) -> str:
    # One base64 image block per page, followed by the extraction prompt.
    content = [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": base64.b64encode(j).decode(),
            },
        }
        for j in jpegs
    ]
    content.append({"type": "text", "text": prompt})

    msg = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": content}],
    )
    # The first content block of the reply holds the model's text output.
    return msg.content[0].text

That's the entire router. Classify, branch, convert if needed, send.
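The branch itself fits in a few lines. This sketch takes the classifier, converter, and senders as arguments so it can be tested without network calls; `send_native_pdf` is a hypothetical helper for the text branch, and the route labels are made-up names for logging:

```python
from typing import Callable


def route_invoice(
    path: str,
    prompt: str,
    is_text_pdf: Callable[[str], bool],
    send_native_pdf: Callable[[str, str], str],
    pdf_to_jpegs: Callable[[str], list[bytes]],
    send_images: Callable[[list[bytes], str], str],
) -> tuple[str, str]:
    """Classify, branch, convert if needed, send. Returns (route, model_output)."""
    if is_text_pdf(path):
        # Real text layer: native PDF upload is the cheap path.
        return "native_pdf", send_native_pdf(path, prompt)
    # Image-based: rasterize and compress ourselves, then send as vision input.
    jpegs = pdf_to_jpegs(path)
    return "compressed_jpeg", send_images(jpegs, prompt)
```

In production the call would look like `route_invoice(path, prompt, is_text_pdf, send_native_pdf, pdf_to_compressed_jpegs, extract_fields)`.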

The Numbers, Side by Side

Same 200 invoices, same twelve-field schema, same prompt, same model. Run both paths back-to-back, log the token counts.

Path	Avg tokens / invoice	Accuracy (12 fields)	Relative cost
Native PDF upload (everything)	4,800	94%	1.0x
Router (text → native, scan → compressed JPEG)	950	94%	0.20x

Five times cheaper. Identical extraction quality. Same prompt, same model, same downstream parser.
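Claiming identical quality needs an answer key. A minimal sketch, assuming field-level accuracy over a hand-labelled set where every (invoice, field) pair counts once and both paths are scored against the same key; field names are hypothetical and the normalization is deliberately crude:

```python
def field_accuracy(predicted: list[dict], expected: list[dict], fields: list[str]) -> float:
    """Share of (document, field) pairs where the extraction matches ground truth.

    Values are compared after str() + strip() + casefold(). Real pipelines
    usually need field-specific normalization (dates, amounts, currencies).
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must be the same length")
    total = hits = 0
    for pred, gold in zip(predicted, expected):
        for field in fields:
            total += 1
            a = str(pred.get(field, "")).strip().casefold()
            b = str(gold.get(field, "")).strip().casefold()
            hits += a == b
    return hits / total if total else 0.0
```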

The interesting part is why the savings are this clean. On a scanned invoice, Claude's internal rasterization was producing what looked like ~2000px vision inputs per page. My pipeline gives it a 1024px JPEG at quality 85. Claude reads every digit on both. The smaller image simply has fewer pixels to bill for.

At 200 invoices per week, the dollar delta isn't enormous in absolute terms — but the principle scales. The moment that client hits 2,000 invoices per week, or layers on contract review, or adds receipt processing, the savings compound and the router pays for itself on day one.
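To see how the per-invoice gap scales, multiply it out. This uses only the averages from the table; multiply the token totals by your model's current per-token price to get dollars:

```python
def weekly_tokens(invoices_per_week: int, tokens_per_invoice: int) -> int:
    """Total tokens for a week's volume at a given per-invoice average."""
    return invoices_per_week * tokens_per_invoice


NATIVE_AVG = 4_800  # native PDF upload, average tokens per invoice (from the table)
ROUTED_AVG = 950    # router, average tokens per invoice (from the table)

for volume in (200, 2_000):
    native = weekly_tokens(volume, NATIVE_AVG)
    routed = weekly_tokens(volume, ROUTED_AVG)
    print(f"{volume:>5,} invoices/week: native {native:,} vs routed {routed:,} "
          f"tokens, saving {native - routed:,}")
```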

The Mixed-PDF Edge Case

There's one case the simple router misses: PDFs where some pages have a real text layer and other pages are embedded scans. The most common offender is contracts with a photographed signature page stapled in at the end.

Fix: loop per page, not per document. Tag each page, split the routing per page, stitch results back together.

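import io

import pdfplumber
from pdf2image import convert_from_path

def classify_pages(path: str, min_chars: int = 50) -> list[str]:
    # Tag every page "text" or "image" with the same threshold as is_text_pdf.
    tags = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            text = page.extract_text() or ""
            tags.append("text" if len(text.strip()) >= min_chars else "image")
    return tags

def route_mixed_pdf(path: str, prompt: str) -> list[str]:
    tags = classify_pages(path)
    text_pages = [i for i, t in enumerate(tags) if t == "text"]
    image_pages = [i for i, t in enumerate(tags) if t == "image"]

    results = {}

    if text_pages:
        # Text pages: pull the text layer directly with pdfplumber.
        # This plain text can go to Claude as an ordinary text block.
        with pdfplumber.open(path) as pdf:
            for i in text_pages:
                results[i] = pdf.pages[i].extract_text()

    for i in image_pages:
        # Rasterize only this page (pdf2image page numbers are 1-based),
        # then compress it the same way pdf_to_compressed_jpegs does.
        page = convert_from_path(path, dpi=150, first_page=i + 1, last_page=i + 1)[0]
        page.thumbnail((1024, 1024))
        buf = io.BytesIO()
        page.save(buf, format="JPEG", quality=85)
        # extract_fields() is the vision helper defined earlier in this post.
        results[i] = extract_fields([buf.getvalue()], prompt)

    # Reassemble in original page order.
    return [results[i] for i in sorted(results)]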

Slightly more code, same principle. You only pay vision rates on the pages that actually need it.
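`route_mixed_pdf` returns one result per page in page order, so the last step is merging them into a single record. A minimal sketch, assuming each page's output has already been parsed into a dict of fields; the merge policy (first non-empty scalar wins, lists such as line items are concatenated) and the helper name are illustrative choices:

```python
def stitch_fields(per_page: list[dict]) -> dict:
    """Merge per-page field dicts, given in document order.

    Scalar fields: the first non-empty value wins, so header data from
    early pages takes precedence. List fields (e.g. line items) are
    concatenated across pages.
    """
    merged: dict = {}
    for page_fields in per_page:
        for key, value in page_fields.items():
            if isinstance(value, list):
                existing = merged.get(key)
                if existing is None:
                    merged[key] = list(value)
                elif isinstance(existing, list):
                    existing.extend(value)
            elif value not in (None, "") and key not in merged:
                merged[key] = value
    return merged
```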

Things worth checking on your own corpus

Run is_text_pdf against a sample of 50 documents and eyeball the classification: some scanners embed a thin or noisy OCR layer that can land on the wrong side of the 50-character threshold
Spot-check three or four scanned invoices at DPI 150 vs DPI 200 to confirm digit accuracy on your specific fonts
Log token counts per invoice for the first week so you can prove the savings to whoever's paying the bill
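For the token log, the Python SDK's response object reports `usage.input_tokens` and `usage.output_tokens` on every call. `extract_fields` as written returns only the text, so you would need to return the message (or its usage) as well. A minimal stdlib sketch of the log and a per-route average, with hypothetical helper names and CSV layout:

```python
import csv
import os
from collections import defaultdict


def log_usage(csv_path: str, invoice_id: str, route: str,
              input_tokens: int, output_tokens: int) -> None:
    """Append one row per API call, writing a header if the file is new."""
    new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["invoice_id", "route", "input_tokens", "output_tokens"])
        writer.writerow([invoice_id, route, input_tokens, output_tokens])


def average_tokens_by_route(csv_path: str) -> dict[str, float]:
    """Average (input + output) tokens per logged call, grouped by route."""
    totals: dict[str, int] = defaultdict(int)
    counts: dict[str, int] = defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["route"]] += int(row["input_tokens"]) + int(row["output_tokens"])
            counts[row["route"]] += 1
    return {route: totals[route] / counts[route] for route in totals}
```

With one call per invoice, the per-route averages are directly comparable to the table earlier in the post; mixed PDFs that make several calls need their rows summed per invoice first.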

The Bigger Lesson About Managed Endpoints

This isn't really about PDFs. It's that managed AI endpoints make convenient defaults that quietly cost you money on edge cases. Native PDF upload is the right choice for clean text documents. It's the wrong choice for scans, and the API gives you no signal about which one it just did.

The same pattern shows up elsewhere:

File-upload endpoints that re-encode audio at higher bitrates than your source
Vision endpoints that accept any image size and bill at the highest tier the dimensions fall into
Embedding endpoints that don't deduplicate near-identical chunks before billing

If you're processing real-world documents at any volume — invoices, receipts, contracts, forms, ID cards — assume the default path is leaving money on the table and check before you scale. The fix is almost always a small upstream router, not a model change.

Why bizflowai.io helps with this

We build document-processing pipelines for small businesses that already look like this under the hood — a thin classifier in front of the LLM, vision inputs sized for the actual job, per-page routing for mixed documents, and token logging so the savings are provable instead of theoretical. The 200-invoice procurement workflow in this post is a redacted version of a live client system. If you're processing scans at volume and the bill feels off, the fix is usually 20 lines of routing code, not a model swap.

Frequently asked questions

Why is Claude's native PDF upload so expensive for scanned invoices?

When a PDF has no extractable text layer (a scan, a phone photo, or another image-based file), Claude silently rasterizes every page at a resolution you don't control and processes it as vision input. You pay vision-token costs on oversized images without any warning. In one real case, this averaged 4,800 tokens per single-page invoice.

How do I cut Claude PDF processing costs for image-based invoices?

Build an upstream router using pdfplumber, pdf2image, and Pillow. Use pdfplumber to check if a PDF has extractable text — if yes, send it to Claude's native PDF endpoint. If extract_text returns empty or junk, convert pages with pdf2image at 150 DPI, resize the longest edge to 1024 pixels with Pillow, save as JPEG at quality 85, and send as vision inputs. This cut costs from 4,800 to 950 tokens per invoice.

What DPI and image size should I use for Claude vision OCR on invoices?

Use 150 DPI when converting PDF pages to images with pdf2image — enough resolution for OCR-quality field extraction without overkill. Then resize the longest edge to 1024 pixels using Pillow's thumbnail method and save as JPEG at quality 85. This combination is small enough to be cheap but sharp enough that Claude reads every digit correctly, maintaining 94% accuracy on a twelve-field invoice schema.

How do I handle mixed PDFs with both text pages and scanned pages?

Loop pdfplumber per page rather than per document. Tag each page as text-based or image-based, then route each page down the correct path: text extracted with pdfplumber for text pages, image conversion and vision input for scanned pages. Stitch the extracted fields back together at the end. This handles common cases like contracts with photographed signature pages stapled in.

When should I bypass managed AI endpoint defaults?

Whenever you process real-world documents at volume — invoices, receipts, contracts, or forms — assume the default path is leaving money on the table. Managed AI endpoints offer convenient defaults that quietly cost more on edge cases like image-based PDFs. Check what's actually happening under the hood and handle edge cases upstream yourself. In one invoice pipeline, this produced a 5x cost reduction with identical 94% accuracy.

Want more like this?

I publish practical AI automation, GenAI engineering, and faceless content workflows on YouTube every week.

Subscribe to bizflowai.io on YouTube — never miss a new tutorial.

Planning an AI automation project or need a second opinion on your architecture?

Connect with me on LinkedIn — Lazar Milicevic, GenAI Engineer & bizflowai.io Founder.