Stop Uploading Scanned PDFs to Claude — Do This Instead


You built an invoice automation. It worked on three clean PDFs. Then real documents started arriving — phone photos, crumpled scans, faxed delivery notes — and your accuracy quietly collapsed without a single error log to warn you. This is the part of document AI nobody benchmarks.

The silent failure mode in Claude PDF uploads

Here's the scenario I see almost every week with clients. Someone wires up an automation that reads supplier invoices, extracts the total, tax ID, and supplier name, and dumps them into their accounting system. Tested on a handful of clean PDFs exported from Word or an invoicing tool. Ships to production. Six weeks later, the books don't reconcile and nobody knows why.

On a real client pipeline, I ran 250 scanned Serbian SMB invoices through the standard "upload PDF to Claude" workflow. The result:

  • 29% of extracted fields were wrong. Totals off by a digit. Tax IDs missing one character. Supplier names truncated mid-word.
  • Zero errors raised. The system always returned a number. It was just the wrong number.
  • At a modest 500 invoices/month that's roughly 145 broken accounting entries every month, slipping into the ledger silently.

The reason this happens is mechanical, not magical. According to Anthropic's documentation, PDF upload extracts the text from each page and also renders each page as an image, and Claude reads both. A PDF exported from a digital source has a perfect text layer, so accuracy is excellent. But a scanned or phone-photographed invoice has no text layer at all. It's an image wrapped in a PDF container. Claude is left with only the rendered page image, produced by a rasterization step you don't control, and on noisy real-world scans with stamps, blue ink signatures, thermal receipt fade, handwriting, or Cyrillic characters, that path struggles.

The default documentation example never shows you this because the docs test on clean digital PDFs.
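If you want to know which of your incoming PDFs are scans, one option is to check for a text layer up front. The sketch below is an assumption, not part of the pipeline described here: `has_text_layer`, `choose_path`, and the `min_chars` threshold are hypothetical names, and the per-page strings are assumed to come from a PDF library (for example pypdf's `PdfReader(path).pages[i].extract_text()`).

```python
def has_text_layer(page_texts: list, min_chars: int = 20) -> bool:
    """Return True if any page carries a meaningful amount of embedded text.

    page_texts holds the text a PDF library extracted from each page.
    min_chars is a hypothetical threshold that ignores stray characters.
    """
    return any(len((text or "").strip()) >= min_chars for text in page_texts)


def choose_path(page_texts: list) -> str:
    """Label a document 'digital' (has a text layer) or 'scan' (image only)."""
    return "digital" if has_text_layer(page_texts) else "scan"


digital = ["Invoice 2024-0117\nSupplier: Example d.o.o.\nTotal: 12,400.00 RSD"]
scanned = ["", "  "]
print(choose_path(digital))  # digital
print(choose_path(scanned))  # scan
```

Counting how many documents land in the "scan" bucket gives a rough idea of how exposed an intake pipeline is before any accuracy testing.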

Skip the PDF. Send images directly to vision.

The fix is almost embarrassingly simple: stop sending PDFs. Convert each page to a high-resolution image, then send it to Claude as a vision input with a structured output schema.

In Python it's about fifty lines, most of them the request payload:

import base64
import json
from pdf2image import convert_from_path
from io import BytesIO
from anthropic import Anthropic

client = Anthropic()

SCHEMA = """
{
  "invoice_number": "string",
  "supplier_name": "string",
  "tax_id": "string",
  "issue_date": "YYYY-MM-DD",
  "total": "number",
  "currency": "string"
}
"""

def extract_invoice(pdf_path: str) -> dict:
    # Rasterize only the first page; it is enough for 95% of invoices.
    # pdf2image needs the poppler utilities installed on the system.
    pages = convert_from_path(pdf_path, dpi=200, first_page=1, last_page=1)
    page = pages[0]

    # Encode the page as a base64 PNG for the image content block.
    buf = BytesIO()
    page.save(buf, format="PNG")
    img_b64 = base64.standard_b64encode(buf.getvalue()).decode()

    msg = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": img_b64,
                    },
                },
                {
                    "type": "text",
                    "text": f"Extract this invoice as JSON matching schema:\n{SCHEMA}\nReturn ONLY JSON.",
                },
            ],
        }],
    )
    # Raises json.JSONDecodeError if the reply is not bare JSON.
    return json.loads(msg.content[0].text)

That's the whole pipeline. The two parts that matter:

  • dpi=200 — high enough that 6pt tax IDs and stamped reference numbers stay legible, low enough that you don't blow your token budget. I tested 150, 200, 300. 200 is the sweet spot; 300 added 40% to image tokens without meaningful accuracy gain.
  • Schema in the prompt — Claude follows JSON schemas reliably when you give it explicit field types. Don't ask for "the important data." Name every field.
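Naming every field also makes the reply checkable. The following is a minimal sketch, not part of the original pipeline: `parse_reply` and `schema_problems` are hypothetical helpers, and the fence-stripping step assumes the model may occasionally wrap its JSON in a Markdown code fence despite the "Return ONLY JSON" instruction.

```python
import json
import re

# Expected Python type for each field named in the prompt schema.
REQUIRED_FIELDS = {
    "invoice_number": str,
    "supplier_name": str,
    "tax_id": str,
    "issue_date": str,
    "total": (int, float),
    "currency": str,
}


def parse_reply(text: str) -> dict:
    """Parse a model reply as JSON, tolerating a surrounding code fence."""
    cleaned = text.strip()
    # `{3} matches three backticks, optionally followed by a "json" tag.
    match = re.match(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", cleaned, re.DOTALL)
    if match:
        cleaned = match.group(1)
    return json.loads(cleaned)


def schema_problems(data: dict) -> list:
    """Return a list of human-readable problems; empty means the shape is OK."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing: {name}")
        elif isinstance(data[name], bool) or not isinstance(data[name], expected):
            problems.append(f"wrong type: {name}")
    date = data.get("issue_date")
    if isinstance(date, str) and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        problems.append("bad date format: issue_date")
    return problems


reply = '{"invoice_number": "INV-001", "supplier_name": "Example d.o.o.", ' \
        '"tax_id": "100000001", "issue_date": "2024-03-01", ' \
        '"total": "1200", "currency": "RSD"}'
print(schema_problems(parse_reply(reply)))  # ['wrong type: total']
```

A shape check like this only catches structural problems such as a total returned as a string. It says nothing about whether the digits are right.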

Real numbers on 250 invoices

Same 250 documents. Same model. Only the input format changed.

Metric                     | PDF upload (default) | Image + vision (this approach)
Field-level accuracy       | 71%                  | 93%
Cost per document          | $0.018               | $0.011
Latency per invoice        | 4.2 s                | 2.6 s
Silent errors / 1000 docs  | ~290                 | ~70

Three wins on one swap. Cost dropped because the image path sends a single rasterized page, while PDF upload processes every page of the document, and per Anthropic's documentation each page is counted as both extracted text and a page image. Latency dropped for the same reason: one image is faster to process than a multi-page PDF.

The accuracy jump from 71% to 93% is where the business value lives. At 500 invoices/month, that's the difference between 145 broken entries and 35 — and the 35 are mostly handwritten amounts on thermal receipts, which we can catch with one more guardrail.
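The arithmetic behind those figures is short enough to write down. This sketch makes one simplifying assumption that the numbers above also make: the field-level error rate is applied as if each invoice produced one accounting entry. `broken_entries` is a hypothetical helper name.

```python
def broken_entries(docs: int, accuracy: float) -> int:
    """Expected wrong entries, treating accuracy as a per-entry success rate."""
    return round(docs * (1 - accuracy))


# Monthly volume of 500 invoices.
print(broken_entries(500, 0.71))   # 145
print(broken_entries(500, 0.93))   # 35

# The same rates per 1000 documents, as in the table.
print(broken_entries(1000, 0.71))  # 290
print(broken_entries(1000, 0.93))  # 70
```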

The confidence-score guardrail that makes it production-safe

93% is great. 93% is not good enough to push straight into accounting software without review. So in production I add one more step: ask the model to return a confidence score per field, and route anything below a threshold to a human review queue.

SCHEMA_WITH_CONFIDENCE = """
{
  "invoice_number": {"value": "string", "confidence": 0.0-1.0},
  "supplier_name":  {"value": "string", "confidence": 0.0-1.0},
  "tax_id":         {"value": "string", "confidence": 0.0-1.0},
  "issue_date":     {"value": "YYYY-MM-DD", "confidence": 0.0-1.0},
  "total":          {"value": "number", "confidence": 0.0-1.0},
  "currency":       {"value": "string", "confidence": 0.0-1.0}
}
"""

CRITICAL_FIELDS = {"total", "tax_id"}
THRESHOLD = 0.85

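def needs_review(extraction: dict) -> bool:
    # Flag the document if a critical field is missing, has no numeric
    # confidence, or scores below the threshold.
    for field in CRITICAL_FIELDS:
        entry = extraction.get(field)
        confidence = entry.get("confidence") if isinstance(entry, dict) else None
        if not isinstance(confidence, (int, float)) or confidence < THRESHOLD:
            return True
    return False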

The honest gotcha: handwritten amounts, especially on small thermal receipts where the cashier scribbled the total in pen, still trip the model maybe one time in fifteen. The confidence guardrail catches almost all of them. In my tests, Claude's self-reported scores were reasonably well-calibrated when asked for explicitly: it didn't return 0.99 on a field it couldn't actually read. These scores are the model's own estimate, not a measured probability, so check them against a labeled sample before you settle on a threshold.

In practice this routes about 8–12% of invoices to a human queue. On 500 docs/month, that's roughly 50 documents for a human to glance at — two minutes of work per day to keep the books clean. That's the difference between an automation you trust and one that silently corrupts your ledger.
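To show how the routing might look over a batch, here is a minimal sketch. The document IDs and values are made-up sample data, and `low_confidence_fields` and `route` are hypothetical helpers that reuse the same critical fields and 0.85 threshold as above. A field that is missing or has no numeric confidence is treated as needing review.

```python
CRITICAL_FIELDS = ("total", "tax_id")
THRESHOLD = 0.85


def low_confidence_fields(extraction: dict) -> list:
    """Return critical fields whose confidence is missing or below threshold."""
    flagged = []
    for name in CRITICAL_FIELDS:
        entry = extraction.get(name)
        confidence = entry.get("confidence") if isinstance(entry, dict) else None
        if not isinstance(confidence, (int, float)) or confidence < THRESHOLD:
            flagged.append(name)
    return flagged


def route(extractions: dict) -> tuple:
    """Split a batch into (auto_post, review_queue)."""
    auto_post, review_queue = [], []
    for doc_id, extraction in extractions.items():
        flagged = low_confidence_fields(extraction)
        if flagged:
            review_queue.append((doc_id, flagged))
        else:
            auto_post.append(doc_id)
    return auto_post, review_queue


# Made-up sample batch for illustration only.
batch = {
    "inv-001": {"total": {"value": 1200.0, "confidence": 0.97},
                "tax_id": {"value": "100000001", "confidence": 0.95}},
    "inv-002": {"total": {"value": 845.5, "confidence": 0.62},
                "tax_id": {"value": "100000002", "confidence": 0.91}},
    "inv-003": {"total": {"value": 90.0, "confidence": 0.90}},  # tax_id missing
}

auto_post, review_queue = route(batch)
print(auto_post)     # ['inv-001']
print(review_queue)  # [('inv-002', ['total']), ('inv-003', ['tax_id'])]
```

Keeping the list of flagged fields with each queued document lets the reviewer look at one or two values instead of re-keying the whole invoice.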

What I'd add if you're scaling past 1000 docs/month

  • Page rotation detection (Pillow's ImageOps.exif_transpose plus a quick orientation check) — phone photos arrive sideways constantly
  • A second pass on review-queue items with a different prompt that focuses only on the failed fields
  • Logging the raw image + extraction to S3 or local storage so you can replay failed cases when you tune prompts
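The second-pass idea can be sketched as a small prompt builder. This is an illustration under assumptions, not the prompt used in production: `second_pass_prompt` is a hypothetical helper and the wording of the instructions is a placeholder to tune against your own review-queue cases.

```python
def second_pass_prompt(failed_fields: list) -> str:
    """Build a prompt that asks only about the fields that failed review."""
    if not failed_fields:
        raise ValueError("second pass needs at least one failed field")
    lines = ["Re-read the attached invoice image and extract ONLY these fields:"]
    lines.extend(f"- {name}" for name in failed_fields)
    lines.append(
        "Read each character individually. If a character is unreadable, "
        "return null for that field instead of guessing."
    )
    lines.append(
        'Return ONLY JSON of the form '
        '{"<field>": {"value": ..., "confidence": 0.0-1.0}}.'
    )
    return "\n".join(lines)


print(second_pass_prompt(["total", "tax_id"]))
```

The returned string would replace the text block in the original request, with the same page image attached.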

Why most tutorials miss this

The default workflow in any tool's documentation is built for the clean demo case. The Anthropic docs show PDF upload because for the demo PDFs they ship with, it works perfectly. The example invoices in every "build an AI invoice processor" tutorial are crisp digital exports from QuickBooks.

Real business documents are messy. They're photographed in bad office lighting, rotated 90 degrees, stained with coffee, printed on a 2009 LaserJet that's running out of toner, mixing Cyrillic and Latin characters, with handwritten notes in the margin. The default path will silently fail on these, and you won't know until your accountant calls.

The rule I now apply to every document intake project:

  • Assume the input is the worst version of itself
  • Design for the messy case from day one, not the demo case
  • Always return a confidence signal, not just a value
  • Route low-confidence outputs to humans before they hit a system of record

This isn't specific to invoices. The same pattern — image-first input, schema-bound output, per-field confidence, human review queue — applies to contracts, delivery notes, KYC documents, medical forms, anything where the source is paper or a phone photo and the destination is a database that has to be right.

Why bizflowai.io helps with this

This image-first vision pipeline with confidence-routed human review is exactly what I deploy for clients running invoice intake, receipt processing, and supplier document workflows through bizflowai.io. The setup includes the rasterization step, the structured schema, the confidence guardrail, and the routing logic into whatever accounting system you already use, so you get the 93% field-level accuracy and the review queue without rebuilding the plumbing from scratch.

Frequently asked questions

Why does Claude's default PDF upload fail on scanned invoices?

According to Anthropic's documentation, Claude's PDF upload extracts each page's text and also renders each page as an image. That works well for clean digital PDFs from Word or invoicing tools, where the text layer is perfect. But scanned or photographed invoices have no text layer: they're images wrapped in a PDF container. Claude is left with only the rendered page image, which struggles with stamps, handwriting, noisy scans, or Cyrillic characters, producing wrong-but-plausible values like totals off by a digit.

How do I improve Claude's accuracy on scanned invoice extraction?

Skip the PDF entirely. Convert each page to a high-resolution image using pdf2image at 200 DPI, base64-encode the PNG, and send it to Claude's messages API as a vision input with image content blocks. Define a structured JSON schema (invoice_number, supplier_name, tax_id, total, currency, issue_date) and ask Claude to return JSON matching it. The full implementation is about fifty lines of Python.

How much does converting PDFs to images improve Claude extraction accuracy?

On a test of 250 scanned Serbian invoices, switching from PDF upload to page-by-page vision input moved field accuracy from 71% to 93%. Cost per document dropped from 1.8 cents to 1.1 cents because the image path sends a single rasterized page instead of having every page of the PDF processed. Latency also dropped from 4.2 seconds to 2.6 seconds per invoice.

How do I handle handwritten amounts in automated invoice processing?

Handwritten amounts on small thermal receipts still trip the model roughly one time in fifteen. The production fix is a confidence check: ask Claude to return a confidence score per field, and route any document with a score below 0.85 on the total or tax ID to a human review queue instead of pushing it directly to accounting. This guardrail prevents silent data corruption.

When should I trust a tool's default document workflow versus building a custom path?

Default workflows in tool documentation are built for the clean demo case — digital PDFs, perfect text layers, single language. If you're automating intake of real business documents like invoices, contracts, delivery notes, receipts, or KYC files, assume the default path will silently fail on photographed, rotated, stained, or multilingual inputs. Design for the messy case from day one rather than discovering errors in production.


Want more like this?

I publish practical AI automation, GenAI engineering, and faceless content workflows on YouTube every week.

Subscribe to bizflowai.io on YouTube — never miss a new tutorial.

Planning an AI automation project or need a second opinion on your architecture?

Connect with me on LinkedIn — Lazar Milicevic, GenAI Engineer & bizflowai.io Founder.

Visit bizflowai.io for our services, case studies, and AI consulting.
