Claude 429'd at PDF #31 — Here's the 12-Line Fix

By Lazar Milicevic · Published June 18, 2026 · 8 min read

A small invoicing client sends us 200 supplier PDFs every morning. The first version of the pipeline died at file 31 with a 429, burned 14 minutes 22 seconds, and left 169 invoices untouched. The fix was twelve lines of Python. Here's exactly what broke, why it broke, and the pattern I use in production.

The pipeline that 429'd before lunch

The job spec was boring: 200 supplier PDFs a day, mixed vendors, mixed layouts, half with tables, some scanned, some text-native. Extract line items, totals, VAT, dates, vendor name. Push structured JSON into the client's accounting system.

The first version was the obvious version — loop the folder, attach each PDF to a Claude request, ask for JSON back:

# v1 — the demo pattern that dies in production
import anthropic, base64, pathlib, json

client = anthropic.Anthropic()

for pdf_path in pathlib.Path("invoices").glob("*.pdf"):
    pdf_b64 = base64.standard_b64encode(pdf_path.read_bytes()).decode()
    resp = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": [
                {"type": "document", "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": pdf_b64,
                }},
                {"type": "text", "text": "Return invoice fields as JSON."},
            ],
        }],
    )
    json.loads(resp.content[0].text)  # often doesn't even reach here

Worked beautifully on the first ten files. By file 31 — 429. Rate limit. Batch dead. 14:22 wasted. The client's morning reconciliation didn't happen.

That's the part nobody warns you about in the quickstart docs.

Why direct PDF upload destroys your token budget

When you upload a PDF directly to Claude, you are not sending plain text. You are sending a file that gets a vision pass, page by page, whether it is text-native or a scan: every page is rendered and read as an image, on top of whatever text can be pulled out of it.

That vision pass costs roughly 3–5× the tokens of the actual extracted text. Multiply that across 200 documents fired back-to-back and you're not just paying more — you're slamming into the per-minute input-token ceiling on your tier.

Rough numbers from the broken run vs. the fixed run on the same 200 files:

Metric	Direct PDF upload	Pre-extracted text
Tokens per invoice (avg)	15,000–20,000	2,000–4,000
Total runtime	14:22 (died at #31)	3:51 (all 200)
429 errors	yes, batch killed	0
Token usage vs baseline	100%	~22%
Line-item accuracy	inconsistent (columns bled)	consistent

The accuracy delta surprised me. The vision pass occasionally fused adjacent table columns: on dense layouts, quantity ended up concatenated with unit price. Pre-extracted text with preserved column structure parsed more cleanly.

One PDF at a time in a chat window? Fine. Two hundred in a batch script? You will hit the wall, and you'll hit it fast.
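To make the wall concrete, here is a back-of-the-envelope sketch. The 200,000 tokens-per-minute limit is a hypothetical placeholder, not the limit from the run above, so substitute the input-token limit for your own tier. The per-invoice figures are the midpoints of the ranges in the table, and the functions ignore latency, output tokens, and retries, so treat the results as lower bounds.

```python
def min_batch_minutes(n_docs: int, tokens_per_doc: int, tokens_per_minute: int) -> float:
    """Lower bound on runtime if the per-minute input-token limit is the only constraint."""
    return n_docs * tokens_per_doc / tokens_per_minute


def docs_per_minute(tokens_per_doc: int, tokens_per_minute: int) -> int:
    """How many whole documents fit inside one minute's token budget."""
    return tokens_per_minute // tokens_per_doc


ASSUMED_LIMIT = 200_000    # hypothetical tokens-per-minute limit, NOT from the article
DIRECT_UPLOAD = 17_500     # midpoint of 15,000-20,000 tokens per invoice
PRE_EXTRACTED = 3_000      # midpoint of 2,000-4,000 tokens per invoice

for label, tokens in [("direct upload", DIRECT_UPLOAD), ("pre-extracted", PRE_EXTRACTED)]:
    print(
        label,
        docs_per_minute(tokens, ASSUMED_LIMIT), "docs/min,",
        min_batch_minutes(200, tokens, ASSUMED_LIMIT), "min minimum for 200 docs",
    )
```

Whatever limit you plug in, the ratio between the two patterns stays the same: the smaller payload fits several times as many invoices into each minute's budget.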

The 12-line swap

Before Claude ever sees the document, extract it locally with pdfplumber. Pull text, pull tables, serialize tables as pipe-delimited rows so the column structure survives the trip through the LLM.

import pdfplumber

def pdf_to_text(path: str) -> str:
    chunks = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            text = page.extract_text() or ""
            chunks.append(text)
            for table in page.extract_tables() or []:
                for row in table:
                    chunks.append(" | ".join((c or "").strip() for c in row))
    return "\n".join(chunks)

That's it. That string is what goes to Claude — not the file.

The request body becomes plain text plus a system prompt that nails down the JSON schema:

SYSTEM = """You extract invoice data. Return ONLY valid JSON, no prose, no markdown fences.
Schema:
{
  "vendor": str, "invoice_number": str,
  "issue_date": "YYYY-MM-DD", "due_date": "YYYY-MM-DD",
  "line_items": [{"description": str, "qty": float, "unit_price": float, "total": float}],
  "subtotal": float, "vat": float, "total": float
}
If a field is missing, return null."""

resp = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=2000,
    system=SYSTEM,
    messages=[{"role": "user", "content": pdf_to_text(pdf_path)}],
)

Same 200-file batch. Rerun. Claude doesn't see a single PDF. It sees clean text payloads, one per invoice, each 2,000–4,000 tokens instead of 15,000–20,000. The whole batch finishes in 3 minutes 51 seconds. Zero 429s. Token usage dropped ~78% versus the direct-upload run.

The four things that bite you at scale

The 12-line swap is the headline. These four details are what separate a script that works on your laptop from a pipeline that runs every morning for a paying client.

Production checklist

Scanned PDFs return empty strings from pdfplumber because there is no text layer to extract. Detect this per page (len(text.strip()) < 50 is a decent heuristic), route those pages through Tesseract OCR, then feed the OCR output back into the same text pipeline.
Chunk anything over ~30,000 characters into separate calls. One invoice per call is the cleanest unit anyway; multi-invoice PDFs should be split on the page break that separates them.
Validate every response with a strict schema before writing to the database. Roughly 1 in 50 responses comes back with a null field or a malformed date. A 10-line pydantic model catches it.
Failed parses go to a retry queue, not a crash. One bad PDF shouldn't kill the batch. In the pipeline I run for this client (thousands of docs per week) the retry queue rarely has more than two or three documents in it on any given day.
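The chunking rule in the checklist can be sketched as a small helper. This is one possible approach, not the exact code from the client pipeline: split_text is a hypothetical name, and breaking on line boundaries is my assumption, chosen so that a pipe-delimited table row is never cut in half.

```python
def split_text(text: str, max_chars: int = 30_000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at line boundaries where possible."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit has to be hard-split.
        while len(line) > max_chars:
            if current:
                chunks.append("".join(current))
                current, size = [], 0
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Close the current chunk if this line would push it over the limit.
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks
```

Each chunk then goes out as its own request. For multi-invoice PDFs, splitting per page before extraction is still the cleaner option, since a character limit knows nothing about where one invoice ends and the next begins.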

Here's the validation layer I drop into every one of these pipelines:

from pydantic import BaseModel, ValidationError
from datetime import date

class LineItem(BaseModel):
    description: str
    qty: float
    unit_price: float
    total: float

class Invoice(BaseModel):
    vendor: str
    invoice_number: str
    issue_date: date | None
    due_date: date | None
    line_items: list[LineItem]
    subtotal: float
    vat: float
    total: float

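def parse_or_queue(raw: str, source_path: str, retry_queue: list) -> Invoice | None:
    """Return a validated Invoice, or queue the failure for a later retry and return None."""
    try:
        # Parses the JSON string and validates it against the schema in one step
        return Invoice.model_validate_json(raw)
    except (ValidationError, ValueError) as e:
        # Keep the raw response so the failure can be inspected or re-prompted
        retry_queue.append({"path": source_path, "error": str(e), "raw": raw})
        return None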
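Here is a minimal sketch of how the retry queue can wrap the whole batch so that one bad document never stops the loop. The run_batch function and its three callable parameters are hypothetical names of mine; in practice extract would be the text-extraction function, call_model a thin wrapper around the client.messages.create call that returns the response text, and parse the parse_or_queue function above.

```python
from typing import Callable


def run_batch(
    paths: list[str],
    extract: Callable[[str], str],
    call_model: Callable[[str], str],
    parse: Callable[[str, str, list], object],
) -> tuple[list, list]:
    """Process every path; failures land in the retry queue instead of raising."""
    results: list = []
    retry_queue: list = []
    for path in paths:
        try:
            raw = call_model(extract(path))
        except Exception as exc:
            # Extraction or API failure: record it and move on to the next file.
            retry_queue.append({"path": path, "error": str(exc), "raw": None})
            continue
        parsed = parse(raw, path, retry_queue)  # parse appends to the queue itself on failure
        if parsed is not None:
            results.append(parsed)
    return results, retry_queue
```

Catching a broad Exception is deliberate in this sketch because the goal is that the batch always finishes. A stricter version would catch only the extraction and API errors you expect and let everything else surface.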

Scanned-PDF fallback is just as boring and just as essential:

import pytesseract
from pdf2image import convert_from_path

def pdf_to_text_with_ocr_fallback(path: str) -> str:
    text = pdf_to_text(path)
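    if len(text.strip()) < 100:  # coarse whole-document check: almost no text layer, likely a scan
        # Render each page to an image, then OCR it (needs poppler and the tesseract binary installed)
        images = convert_from_path(path, dpi=200)
        text = "\n".join(pytesseract.image_to_string(img) for img in images)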
    return text

OCR is slower — figure 2–5 seconds per page versus 50–100 ms for pdfplumber — so you only want to fall back when you have to.
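The fallback above checks the document as a whole, so a mostly text-native PDF with one scanned page slips through. A per-page version of the heuristic might look like the sketch below. The function name pages_needing_ocr and the sample inputs are hypothetical; the input is assumed to be a list with one extracted string per page, for example built from page.extract_text() or "" for each page.

```python
def pages_needing_ocr(page_texts: list[str], min_chars: int = 50) -> list[int]:
    """Return zero-based indices of pages with too little extracted text to trust."""
    return [
        index
        for index, text in enumerate(page_texts)
        if len((text or "").strip()) < min_chars
    ]


# Hypothetical extraction results for a four-page PDF
pages = [
    "Invoice 2024-118 " + "line item text " * 10,  # text-native page
    "",                                            # scanned page, no text layer
    "  \n ",                                       # whitespace only
    "Page 4",                                      # stray header, effectively a scan
]
print(pages_needing_ocr(pages))  # [1, 2, 3]
```

Only the flagged pages then need the slow path. pdf2image's convert_from_path accepts first_page and last_page arguments (one-based), which should let you render just those pages instead of the whole file.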

When direct PDF upload is actually fine

I'm not saying never send PDFs to Claude. The vision pass is genuinely useful when:

The document is a one-off, not a batch (single contract review, ad-hoc question).
Layout itself carries meaning you can't recover from text (stamps, signatures, handwritten annotations, form checkboxes positioned on a page).
You're under maybe 10–15 documents in a session and not hitting rate limits.

The moment volume goes up or the workload is repeating daily, pre-extraction is not an optimization. It's the difference between a pipeline that ships and a pipeline that 429s before breakfast.

Quick decision table I use:

Situation	Pattern
1–15 docs, ad-hoc	Direct PDF upload, fine
Batch >15 docs, text-native	pdfplumber → text → Claude
Batch with mixed scans	pdfplumber → OCR fallback → Claude
Layout-critical (forms, stamps)	Direct PDF upload, accept the token cost
Daily recurring job	Always pre-extract
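The same table can be written as a small routing function. The name choose_pattern and its parameters are hypothetical, and the table does not say which row wins when several apply, so the precedence here (layout-critical first, then volume or recurrence) is my assumption.

```python
def choose_pattern(
    n_docs: int,
    has_scans: bool = False,
    layout_critical: bool = False,
    recurring: bool = False,
) -> str:
    """Pick an extraction pattern from the decision table (precedence is an assumption)."""
    if layout_critical:
        # Stamps, signatures, checkboxes: the image carries the meaning.
        return "direct PDF upload"
    if recurring or n_docs > 15:
        if has_scans:
            return "pdfplumber -> OCR fallback -> Claude"
        return "pdfplumber -> text -> Claude"
    # Small ad-hoc job: the token cost is not worth engineering around.
    return "direct PDF upload"


print(choose_pattern(200, has_scans=True, recurring=True))
# pdfplumber -> OCR fallback -> Claude
```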

Why bizflowai.io helps with this

Batch document pipelines — invoices, contracts, supplier statements, weekly reports — are most of what I build for clients through bizflowai.io. The pattern in this post (pdfplumber + schema-locked JSON + pydantic validation + retry queue) is the default starting template for any document-extraction workflow we ship. If you're processing dozens or hundreds of recurring documents and your current setup is either a person doing data entry or a script that occasionally 429s, that's the gap we close.

Frequently asked questions

Why does uploading PDFs directly to Claude cause rate limit errors in batch jobs?

When you attach a PDF to a Claude request, every page goes through a vision pass, whether the file is text-native or scanned. That vision pass costs three to five times the tokens of the extracted text. In the 200-document batch described here, that meant 15,000-20,000 tokens per file, which quickly hit the per-minute token ceiling on the account's tier and triggered 429 rate limit errors.

How do I extract text from PDFs before sending them to Claude?

Use pdfplumber in Python. Open the file, loop through pages, call page.extract_text() for body text, and page.extract_tables() for any tables, serializing rows as pipe-delimited strings to preserve structure. Concatenate everything into one string and send that text to Claude instead of the PDF file. This swap is roughly twelve lines of code and replaces the vision pass entirely.

How much does pre-extracting PDF text reduce Claude token usage?

In a 200-invoice batch test, pre-extracting text with pdfplumber dropped token usage by 78% compared to direct PDF upload. Each invoice went from 15,000-20,000 tokens down to 2,000-4,000 tokens. The batch also completed in 3 minutes 51 seconds with zero 429 errors, versus the direct-upload run that died after 14 minutes 22 seconds with 169 invoices unprocessed.

When should I use OCR instead of pdfplumber for PDF extraction?

Use OCR when PDFs are scanned images rather than text-native files. For scanned PDFs, pdfplumber returns nothing because there is no embedded text to extract. Route those documents through Tesseract OCR first to generate text, then feed that text into the same Claude pipeline. Text-native PDFs should go straight through pdfplumber without OCR.

What are best practices for production PDF-to-JSON pipelines with Claude?

Four practices: chunk anything over roughly 30,000 characters into separate calls, ideally one invoice per call; require structured JSON output with a strict schema defined in the system prompt; validate every response with pydantic or JSON schema checks before writing to your database; and log failed parses to a retry queue rather than crashing the batch. This catches the roughly 1-in-50 malformed responses.
