PDF Upload to Claude: 11.4s vs 2.8s — The Markdown Swap

By Lazar Milicevic · Published June 12, 2026 · 6 min read

You've got a folder of supplier invoices piping into Claude for extraction. It works, but a 3-page invoice somehow eats 14,000 input tokens and takes 11 seconds. I ran 1,200 documents through two pipelines last month and the fix turned out to be 40 lines of Python.

What Claude actually does when you upload a PDF

When you attach a PDF to a Claude message, the model doesn't just read the text. Each page is converted to an image, the page text is extracted alongside it, and both go to the model, so you're billed for image tokens on top of the text. That's why a clean 3-page invoice (maybe 800 words of actual content) shows up as roughly 14,000 input tokens in your logs.

On native PDFs with embedded text, this is wasteful. On scans, it's worse: you're paying for an OCR step you can't tune or inspect. And on multi-column layouts (think two-column statements, invoices with sidebar metadata), Claude sometimes hallucinates structure that isn't there because the visual model guessed wrong about reading order.

The pricing model isn't broken. The workflow is. You're paying Claude to do a job your laptop can do for free.

The benchmark: 1,200 invoices, two pipelines, same prompt

Real numbers, no estimates. Last 30 days of production logs on a client invoice extraction job:

Pipeline A — Raw PDF upload to Claude, standard extraction prompt
Pipeline B — PDF → Markdown preprocessor → Claude with the Markdown as plain text content, same prompt

Same model. Same prompt. Same 1,200-document batch (native PDFs, scans, multi-column — the real-world mix, not a curated demo set).

Results

Pipeline A: 11.4s avg latency, 14,200 input tokens/doc
Pipeline B: 2.8s avg latency, 3,100 input tokens/doc
Speedup: 4.07×
Token reduction: 78%
Accuracy on structured fields: identical
Accuracy on the scanned subset: slightly better with Pipeline B, because the OCR step was explicit and tunable instead of hidden inside Claude

That last point matters. When OCR lives inside the model, you can't fix bad output. When it lives in your preprocessor, you swap engines, tune DPI, add a confidence threshold. You own the failure mode.
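A confidence threshold can be a few lines. This is a minimal sketch, not the production version: it assumes the word-level dictionary that pytesseract.image_to_data returns with output_type=Output.DICT (parallel "text" and "conf" lists), and the function names and the 70.0 cutoff are placeholders I picked for illustration.

```python
from statistics import mean


def mean_word_confidence(ocr_data: dict) -> float:
    """Average Tesseract confidence (0-100) over recognised words.

    Assumes `ocr_data` has parallel "text" and "conf" lists, the shape
    returned by pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    confidences = []
    for word, conf in zip(ocr_data["text"], ocr_data["conf"]):
        conf = float(conf)  # some pytesseract versions return strings
        # Tesseract reports -1 for layout rows that carry no word
        if conf >= 0 and word.strip():
            confidences.append(conf)
    return mean(confidences) if confidences else 0.0


def needs_review(ocr_data: dict, threshold: float = 70.0) -> bool:
    """Flag a page whose average confidence is below the (hypothetical) cutoff."""
    return mean_word_confidence(ocr_data) < threshold
```

In the scan path you would get the input with something like pytesseract.image_to_data(images[0], lang="eng", output_type=pytesseract.Output.DICT), then route flagged pages to a second engine or a human queue instead of sending them to Claude.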

The 40-line preprocessor

Here's the core of it. Native PDFs go through pdfplumber for text + basic layout. Pages with no extractable text fall back to Tesseract OCR.

import pdfplumber
import pytesseract
from pdf2image import convert_from_path
from pathlib import Path

def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a PDF (native or scanned) to clean Markdown."""
    pdf_path = Path(pdf_path)
    pages_md = []

    with pdfplumber.open(pdf_path) as pdf:
        for i, page in enumerate(pdf.pages, start=1):
            text = page.extract_text() or ""

            # Native PDF path: text extracted cleanly
            if len(text.strip()) > 40:
                pages_md.append(f"## Page {i}\n\n{text.strip()}")
                continue

            # Scan path: fall back to OCR for this page only
            images = convert_from_path(
                str(pdf_path),
                first_page=i, last_page=i, dpi=300
            )
            ocr_text = pytesseract.image_to_string(images[0], lang="eng")
            pages_md.append(f"## Page {i}\n\n{ocr_text.strip()}")

    return "\n\n---\n\n".join(pages_md)

Then in your Claude call, the Markdown goes in as plain text:

import anthropic

client = anthropic.Anthropic()
markdown = pdf_to_markdown("invoice_2024_11_03.pdf")

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            "Extract invoice_number, vendor, total, line_items "
            "as JSON from this document:\n\n" + markdown
        )
    }]
)

No PDF attachment. No document content block. Plain text. That's the whole swap.
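The prompt asks for JSON, but the reply still arrives as text. Here is a minimal sketch of parsing it and checking that the four requested fields came back. It assumes the reply contains a single JSON object, possibly surrounded by a sentence or two; parse_invoice_json and REQUIRED_FIELDS are hypothetical helper names, not part of the anthropic SDK.

```python
import json

# The four fields the extraction prompt asks for
REQUIRED_FIELDS = ("invoice_number", "vendor", "total", "line_items")


def parse_invoice_json(reply_text: str) -> dict:
    """Pull the JSON object out of the model's reply and check its keys."""
    # Assumption: the reply holds one JSON object, maybe with prose around it
    start, end = reply_text.find("{"), reply_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply_text[start:end + 1])  # raises ValueError if malformed
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

With the call above, the input would be response.content[0].text. Failing loudly on a missing field is a deliberate choice here: a retry is cheaper than a silent null landing in the database.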

Production notes

Use marker instead of pdfplumber if you need better table layout preservation — slower but cleaner output
For high-volume cloud OCR, swap Tesseract for Google Document AI or AWS Textract on the scan fallback
Cache the Markdown output keyed by file hash — reruns become free
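The caching note can be sketched in a dozen lines. This is one possible shape, assuming a local directory is an acceptable cache; cached_markdown, the convert parameter, and the .md_cache default are names I made up for the example.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_markdown(pdf_path: str, convert: Callable[[str], str],
                    cache_dir: str = ".md_cache") -> str:
    """Return Markdown for a PDF, running the converter only on a cache miss."""
    # Key on file contents, so a renamed copy of the same invoice still hits
    digest = hashlib.sha256(Path(pdf_path).read_bytes()).hexdigest()
    cache_file = Path(cache_dir) / f"{digest}.md"
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    markdown = convert(pdf_path)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(markdown, encoding="utf-8")
    return markdown
```

Usage would be cached_markdown("invoice.pdf", pdf_to_markdown). Passing the converter in as an argument keeps the cache independent of whether pdfplumber, marker, or a cloud OCR produced the text.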

Pre-filtering: another 20% off the token bill

Because you control the Markdown step, you can drop content before Claude ever sees it. Headers, footers, page numbers, repeated boilerplate ("Thank you for your business" on every page), legal fine print that never contains extractable fields.

import re

BOILERPLATE_PATTERNS = [
    r"Page \d+ of \d+",
    r"Confidential.*?\n",
    r"Thank you for your business\.?",
    r"www\.[a-z0-9-]+\.[a-z]{2,}/terms",
]

def clean_markdown(md: str) -> str:
    for pattern in BOILERPLATE_PATTERNS:
        md = re.sub(pattern, "", md, flags=re.IGNORECASE)
    # collapse 3+ newlines to 2
    md = re.sub(r"\n{3,}", "\n\n", md)
    return md.strip()

On the invoice batch, this knocked another ~20% off the token count with zero accuracy loss. By the time Claude sees the document, it's already focused on the rows that matter.
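Fixed patterns only catch boilerplate you already know about. A complementary sketch, under the assumption that pages are joined with the same separator pdf_to_markdown uses, drops any line that recurs on most pages. drop_repeated_lines and the 0.8 share are illustrative choices, not something tuned on the invoice batch.

```python
from collections import Counter

# Assumption: same page separator that pdf_to_markdown joins pages with
PAGE_SEPARATOR = "\n\n---\n\n"


def drop_repeated_lines(md: str, min_share: float = 0.8) -> str:
    """Remove lines that recur on at least `min_share` of the pages."""
    pages = md.split(PAGE_SEPARATOR)
    if len(pages) < 3:  # too few pages to call anything "repeated"
        return md
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    repeated = {line for line, n in counts.items() if n / len(pages) >= min_share}
    cleaned = [
        "\n".join(line for line in page.splitlines() if line.strip() not in repeated)
        for page in pages
    ]
    return PAGE_SEPARATOR.join(cleaned)
```

One caveat: a table header that legitimately repeats on every page will be dropped too, so check a sample of outputs before trusting it on a new document type.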

When to skip this entirely

The setup cost isn't free. Skip the preprocessor if:

You're having a one-off conversation about a single contract — just upload the PDF
You process fewer than 5 documents a day and latency doesn't stack
The documents are highly visual (architectural drawings, complex charts) where layout is the data

The swap pays off when you're running volume: anything north of 50 documents/day, or any workflow where one document blocks the next in a queue. At that point the 4× speedup turns into hours of wall-clock time back per week, and the 78% token cut shows up on your monthly invoice.
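To see where the break-even sits for your own volume, the benchmark averages can be plugged into a few lines of arithmetic. This sketch assumes documents are processed one after another, seven days a week; weekly_savings is a hypothetical helper, and it reports tokens rather than dollars because pricing depends on the model.

```python
# Benchmark averages from the results above
SECONDS_A, SECONDS_B = 11.4, 2.8     # avg latency per document
TOKENS_A, TOKENS_B = 14_200, 3_100   # avg input tokens per document


def weekly_savings(docs_per_day: int, days_per_week: int = 7) -> dict:
    """Time and input tokens saved per week, assuming sequential processing."""
    docs = docs_per_day * days_per_week
    return {
        "minutes_saved": round(docs * (SECONDS_A - SECONDS_B) / 60, 1),
        "input_tokens_saved": docs * (TOKENS_A - TOKENS_B),
    }


print(round(SECONDS_A / SECONDS_B, 2))     # 4.07 (the speedup)
print(round(1 - TOKENS_B / TOKENS_A, 2))   # 0.78 (the token reduction)
print(weekly_savings(50))    # {'minutes_saved': 50.2, 'input_tokens_saved': 3885000}
print(weekly_savings(500))   # {'minutes_saved': 501.7, 'input_tokens_saved': 38850000}
```

At exactly 50 documents a day the time saved is a little under an hour a week; it reaches hours as volume climbs into the hundreds, while the token savings scale linearly from the first document.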

On the client setup I deployed last month, the preprocessor runs on a small server with a cron job. Invoices land in a watched folder during the day, get processed at 2 AM, and the team wakes up to structured rows in Postgres. Nobody uploads anything by hand anymore.

Why bizflowai.io helps with this

This is the kind of pipeline I build for clients every week — document intake, OCR fallback, structured extraction, database write, monitoring. The preprocessor above is the shape of it, but the production version includes retry logic, per-page confidence scoring, schema validation on the JSON Claude returns, and alerting when extraction drifts. If you're running invoice, contract, or statement workflows at volume, bizflowai.io is where I package these as end-to-end systems instead of one-off scripts.

Frequently asked questions

Why does uploading PDFs directly to Claude use so many tokens?

When you upload a PDF directly to Claude, each page is processed as an image plus extracted text, and you're charged input tokens for both. On a 3-page invoice, this can consume around 14,000 input tokens. That's especially wasteful on native PDFs with clean text, and on multi-column layouts the model can hallucinate structure when it misjudges the reading order.

How do I reduce token costs when extracting data from PDFs with Claude?

Convert the PDF to clean Markdown on your own machine before sending it to Claude. Use a roughly 40-line Python preprocessor with pdfplumber or marker for native PDFs, and Tesseract or a cloud OCR as a fallback for scanned pages. Send the resulting Markdown string to Claude as plain text content. In a 1,200-invoice test, this cut input tokens by 78% and made processing 4x faster.

When should I use Markdown preprocessing vs uploading PDFs directly to Claude?

Use direct PDF upload for one-off chats or when you process fewer than five documents a day and latency doesn't stack; the setup cost isn't worth it. Switch to Markdown preprocessing when running volume: north of 50 documents a day, or workflows where latency stacks because one document blocks the next. At that scale, the 4x speedup saves hours and the token reduction shows up clearly on your monthly invoice.

Does Markdown preprocessing hurt extraction accuracy compared to raw PDF upload?

No. In a test on 1,200 mixed invoices including native PDFs, scanned PDFs, and multi-column layouts, accuracy on structured fields was the same. Accuracy was actually slightly better on the scanned subset because the OCR step was explicit and tunable rather than hidden inside Claude. Both pipelines used the same model and the same extraction prompt.

How can I further reduce tokens beyond converting PDFs to Markdown?

Because you control the Markdown preprocessing step, you can pre-filter the document before Claude sees it. Strip headers and footers, drop page numbers, and remove repeated boilerplate so the model focuses only on the content that matters. On the 1,200-invoice batch, this filtering cut roughly another 20% off the token count, on top of the 78% reduction already gained from converting PDFs to Markdown.
