5 Bugs That Broke My SaaS in Week 1 (Claude Code Caught

By Lazar Milicevic · Published June 17, 2026 · 9 min read

Every Claude Code tutorial ends at "and now you have an app." That's not the finish line. That's day zero. Here are the five specific production bugs that hit Fakturko — my Serbian invoicing SaaS — in its first week with paying customers, and the exact slash-command ops layer that now catches every one of them before I open the inbox.

Bug 1: VAT rounding that the tax authority disagreed with

Serbian VAT is 20%. Easy. Until you have an invoice with seven line items, each with its own quantity, unit price, and discount. The question is whether you round per line, then sum — or sum first, then round once. Those two paths produce different totals by a few dinars, and the Serbian tax authority has a strong opinion about which one is correct (per-line, rounded to two decimals, summed).

The first customer who noticed sent a screenshot at 11 PM on a Saturday. His accountant had recalculated the totals by hand and the invoice was off by 3 RSD. To a Western reader that sounds petty. To a Serbian business owner whose accountant has to file that number with the Poreska Uprava, it's the difference between an accepted filing and a rejected one.

The fix was twenty minutes. The lesson was bigger: every locale has tax rules that look obvious until they aren't.

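from decimal import Decimal, ROUND_HALF_UP

# Decimal instead of float; half-up rounding is an assumption, confirm the mode with your accountant
VAT = Decimal("0.20")
CENT = Decimal("0.01")  # items: (qty, price) pairs of Decimal

# Wrong: round the total
total = sum(qty * price * (1 + VAT) for qty, price in items)
total = total.quantize(CENT, rounding=ROUND_HALF_UP)

# Right (per Serbian rules): round each line, then sum
lines = [(qty * price).quantize(CENT, rounding=ROUND_HALF_UP) for qty, price in items]
vat   = (sum(lines) * VAT).quantize(CENT, rounding=ROUND_HALF_UP)
total = sum(lines) + vat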
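
To see the gap concretely, here is a small sketch with three made-up line items. The quantities and prices are illustrative, not from a real Fakturko invoice, and half-up rounding is an assumption about the rounding mode:

```python
from decimal import Decimal, ROUND_HALF_UP

VAT = Decimal("0.20")
CENT = Decimal("0.01")

def round_money(amount):
    # Half-up rounding to two decimals; the mode is an assumption
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)

def total_round_once(items):
    # Sum gross amounts at full precision, round a single time
    return round_money(sum(qty * price * (1 + VAT) for qty, price in items))

def total_round_per_line(items):
    # Round each line's net amount, then add VAT on the rounded sum
    lines = [round_money(qty * price) for qty, price in items]
    net = sum(lines)
    return net + round_money(net * VAT)

items = [
    (Decimal("1.5"), Decimal("33.33")),
    (Decimal("2.5"), Decimal("10.01")),
    (Decimal("0.5"), Decimal("99.99")),
]

print(total_round_once(items))      # 150.02
print(total_round_per_line(items))  # 150.04
```

Three lines are enough to drift by 0.02. A seven-line invoice drifting by 3 RSD is the same effect with bigger numbers.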

What I learned to check on day zero

Per-line vs. per-total rounding rules for every tax jurisdiction you ship to
Discount application order (before or after VAT — both exist in the wild)
How your DB stores money (decimal, never float — I use NUMERIC(12,2) in Postgres)
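
The discount-order point is easiest to see with a fixed-amount discount. A minimal sketch, with hypothetical function names and made-up amounts:

```python
from decimal import Decimal

VAT = Decimal("0.20")

def gross_discount_before_vat(net, discount):
    # Discount reduces the taxable base, then VAT is applied
    return (net - discount) * (1 + VAT)

def gross_discount_after_vat(net, discount):
    # VAT is applied first, then the discount comes off the gross amount
    return net * (1 + VAT) - discount

net, discount = Decimal("1000.00"), Decimal("100.00")
print(gross_discount_before_vat(net, discount))  # 1080.0000
print(gross_discount_after_vat(net, discount))   # 1100.0000
```

Same invoice, same discount, 20 RSD apart. Which order is correct depends on the jurisdiction and the kind of discount, so it is something to confirm rather than assume.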

Bug 2: PDFs that rendered Cyrillic as question marks

The invoice generator passed every test. Because I had tested it with Latin characters. The first customer who entered their company name in Cyrillic got a PDF where their company name was ??????? ??? ??????.

To a Serbian business, an invoice with question marks where the company name should be is not an invoice. It's an insult, and a document their bank will reject.

The bug was font embedding. The default PDF library was using a font without Cyrillic glyphs, and silently substituting question marks instead of failing loudly. Three hours to find at midnight. Two-line fix:

from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

pdfmetrics.registerFont(TTFont("DejaVu", "fonts/DejaVuSans.ttf"))
pdfmetrics.registerFont(TTFont("DejaVu-Bold", "fonts/DejaVuSans-Bold.ttf"))
# Then in your style: fontName="DejaVu"

The deeper lesson: your test fixtures lie. If your customers will type non-ASCII, your fixtures must contain non-ASCII from the first commit. I now seed the test database with names like Предузеће "Београд" д.о.о. and Müller GmbH so any encoding regression breaks the build.
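
One cheap guard along those lines is a check that flags any text the built-in PDF fonts are unlikely to cover. This is only a sketch: it assumes the standard fonts are limited to roughly the Windows-1252 character set, and `needs_unicode_font` is a hypothetical helper name, not part of ReportLab.

```python
FIXTURE_NAMES = [
    'Предузеће "Београд" д.о.о.',  # Cyrillic
    "Müller GmbH",                 # Latin with diacritics
    "Acme Ltd",                    # plain ASCII (made-up name)
]

def needs_unicode_font(text):
    # Assumption: anything outside cp1252 needs an embedded TrueType font
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        return True
    return False

# The fixture set should always exercise the Unicode path
assert any(needs_unicode_font(name) for name in FIXTURE_NAMES)
```

If someone later trims the fixtures back to ASCII-only names, the final assertion fails and the build breaks before a customer sees question marks.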

Bug 3: Stripe webhooks that weren't idempotent

Stripe will retry a failed webhook for up to 3 days, with exponential backoff. If your endpoint returns a 500 because your database was briefly slow, Stripe retries. If your handler isn't idempotent, you've just created two invoice records, sent two emails, and put two line items in the customer's books for one payment.

This one cost real money before I caught it — one customer got two invoices for the same payment, and the refund + apology took an hour and a chunk of trust.

The fix is the standard pattern, but a lot of solo founders skip it because the happy path works in testing:

@app.post("/webhooks/stripe")
async def stripe_webhook(req: Request):
    # Verify the signature against the raw body before trusting the payload
    event = stripe.Webhook.construct_event(
        await req.body(),
        req.headers["stripe-signature"],
        WEBHOOK_SECRET,
    )

    async with db.transaction():
        # Idempotency: every Stripe event has a unique id. Claim it first, so two
        # concurrent deliveries cannot both pass a "have we seen this?" check.
        # Needs a UNIQUE constraint on processed_events.event_id.
        claimed = await db.fetchval(
            "INSERT INTO processed_events(event_id, processed_at) "
            "VALUES ($1, NOW()) ON CONFLICT (event_id) DO NOTHING RETURNING 1",
            event["id"],
        )
        if claimed is None:
            return {"status": "already_processed"}
        await handle_event(event)
    return {"status": "ok"}

The processed_events table plus the transaction is the whole pattern. If the handler crashes, the INSERT rolls back, Stripe retries, and the work runs cleanly. If it succeeds, the next retry is a no-op.
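
The crash-then-retry behaviour can be checked locally without Stripe or Postgres. A minimal sketch using the standard library's sqlite3 as a stand-in database; `make_db`, `process_once`, and the `invoices` table are hypothetical names for illustration:

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE invoices (payment_id TEXT)")
    conn.commit()
    return conn

def process_once(conn, event, handler):
    """Run handler(conn, event) at most once per event id."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "INSERT OR IGNORE INTO processed_events(event_id) VALUES (?)",
                (event["id"],),
            )
            if cur.rowcount == 0:
                return "already_processed"  # id was claimed by an earlier delivery
            handler(conn, event)
        return "ok"
    except Exception:
        return "failed"  # the provider would see an error and retry later

def create_invoice(conn, event):
    conn.execute("INSERT INTO invoices(payment_id) VALUES (?)", (event["payment_id"],))
```

Calling `process_once` three times with the same event and a handler that fails on the first attempt should give `failed`, `ok`, `already_processed`, with exactly one row in `invoices` at the end.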

Bug 4: UTC server, CET customer, wrong invoice date

The server runs in UTC. The customer is in Central European Time. An invoice created at 1 AM local time was dated the previous day on the PDF, because datetime.now() returned the UTC date.

For a tutorial app, that's cosmetic. For a business whose invoices have to match their accounting period exactly, that's a compliance problem. An invoice dated March 31 vs. April 1 lands in a different VAT filing period.

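from datetime import datetime
from zoneinfo import ZoneInfo

# Wrong: takes the calendar date in UTC
invoice.issue_date = datetime.utcnow().date()

# Right: takes the calendar date where the customer is
invoice.issue_date = datetime.now(ZoneInfo("Europe/Belgrade")).date()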

The rule I now follow: store UTC, render in the user's locale, and the user's locale is a per-tenant setting, not an environment variable. A SaaS that serves multiple time zones cannot hardcode one.
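
In code, the per-tenant version might look like the sketch below. `issue_date_for` and the idea of storing an IANA zone name per tenant are assumptions for illustration, not Fakturko's actual schema:

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

def issue_date_for(created_at_utc: datetime, tenant_tz: str) -> date:
    # The stored timestamp must be timezone-aware, otherwise the conversion is a guess
    if created_at_utc.tzinfo is None:
        raise ValueError("created_at_utc must be timezone-aware")
    # Convert the stored UTC instant to the tenant's zone, then take the calendar date
    return created_at_utc.astimezone(ZoneInfo(tenant_tz)).date()

created = datetime(2026, 3, 31, 23, 0, tzinfo=timezone.utc)
print(issue_date_for(created, "Europe/Belgrade"))  # 2026-04-01
print(issue_date_for(created, "UTC"))              # 2026-03-31
```

The same stored instant lands in March for one tenant and April for another, which is exactly the filing-period problem described above.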

Bug 5: Silent email bounces

Invoice emails were going out. Some were bouncing. SendGrid was reporting it on a dashboard I never looked at. Customers assumed their invoice had been delivered. Their clients never received it.

I only found out when a customer asked why his client hadn't paid an invoice that, technically, had never arrived. Twelve days late.

The fix is a webhook from your email provider into your own system, with a per-invoice delivery state:

State	Meaning	Action
`queued`	Handed to provider	—
`delivered`	Accepted by recipient MTA	Mark on invoice
`bounced`	Hard bounce	Flag customer, draft notification
`deferred`	Soft bounce, retrying	Watch for 24h
`spam`	Marked as spam	Alert immediately

Now every invoice has a delivery status visible to the customer, and any bounce triggers a Telegram ping to me within 60 seconds.
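
The state table translates into a small lookup. In this sketch the event names on the left follow SendGrid's Event Webhook as I understand it, so check them against the current docs; the action labels and `handle_delivery_event` are hypothetical:

```python
# Provider event type -> per-invoice delivery state
EVENT_TO_STATE = {
    "processed": "queued",
    "delivered": "delivered",
    "bounce": "bounced",
    "deferred": "deferred",
    "spamreport": "spam",
}

# Delivery state -> follow-up action (None means nothing to do)
STATE_ACTION = {
    "queued": None,
    "delivered": "mark_on_invoice",
    "bounced": "flag_customer_and_draft_notification",
    "deferred": "watch_24h",
    "spam": "alert_immediately",
}

def handle_delivery_event(invoice, event):
    state = EVENT_TO_STATE.get(event.get("event"))
    if state is None:
        return None  # opens, clicks, and other events do not change delivery state
    invoice["delivery_state"] = state
    return STATE_ACTION[state]
```

This version trusts events to arrive in order. A production handler would also compare event timestamps so a late `processed` cannot overwrite `delivered`.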

The ops layer: four slash commands, not thirty subagents

Here's the part most tutorials skip. The fix for these isn't "more testing." Tests can't catch a bounced email three days after deploy. The fix is an ops layer that runs continuously, reads production signals, and either patches the problem or drafts a response before I've finished my coffee.

In Claude Code, this lives in a CLAUDE.md at the project root plus a .claude/commands/ folder. Not thirty subagents. Not multi-agent orchestration theater. Four well-scoped commands that each do one job.

# .claude/commands/triage.md
Read the last 24h of:
  - logs/app.log (filter level >= WARNING)
  - Stripe events via `stripe events list --limit 50`
  - unread emails in support@ via the IMAP helper

Correlate by invoice_id and customer_email. If a customer email
mentions an invoice number AND a webhook for that invoice failed
in the same window, treat as ONE incident.

For each incident, output:
  - root cause (one sentence)
  - proposed fix (patch or response)
  - confidence (high/medium/low)

If confidence=high and patch is < 30 lines, open a PR with a test.
Otherwise, queue for review.
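
The correlation step is the part worth understanding, because it is what turns three noisy feeds into one incident. A rough sketch of the idea in plain Python, assuming invoice numbers look like `INV-1042` (a hypothetical format) and that failed webhooks already carry an `invoice_id`:

```python
import re
from collections import defaultdict

INVOICE_RE = re.compile(r"\bINV-\d+\b")  # hypothetical invoice number format

def correlate(emails, failed_webhooks):
    # Group everything that mentions the same invoice under one key
    by_invoice = defaultdict(lambda: {"emails": [], "webhooks": []})
    for hook in failed_webhooks:
        by_invoice[hook["invoice_id"]]["webhooks"].append(hook)
    for mail in emails:
        for invoice_id in set(INVOICE_RE.findall(mail["body"])):
            by_invoice[invoice_id]["emails"].append(mail)
    return dict(by_invoice)
```

An invoice that shows up with both an email and a failed webhook is one incident, not two tickets.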

The four commands I actually use daily:

/triage — correlates logs, Stripe events, and inbox into single incidents. Catches Bugs 3 and 5 automatically. Reviewing its output takes about 4 minutes a day.
/refund — reads the customer message, pulls the Stripe charge, checks docs/refund-policy.md, and either drafts the refund with a one-tap approval or drafts a polite decline with an alternative. Reply time went from 2 days to under 10 minutes.
/vat-question — Serbian small business owners ask the same 5-6 VAT questions. Claude has a short verified reference doc with source citations. Drafts the answer, links the source, I glance, send.
/deploy-check — before any push to production, reads the diff, runs targeted tests against the touched modules, checks for new untranslated strings, checks that any new money math uses Decimal, and checks that any new date handling uses an explicit timezone. Blocks the deploy if any of those fail.
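
The money and date checks in /deploy-check are simple enough to approximate with a script. The sketch below is not the command itself, just a regex pass over the added lines of a unified diff; the rule names and `check_diff` are hypothetical, and a regex will miss cases a real review would catch:

```python
import re

RULES = [
    # datetime.utcnow() or datetime.now() with no timezone argument
    ("naive datetime", re.compile(r"datetime\.(utcnow|now)\(\s*\)")),
    # float() conversions in new code are a hint that money math may be using floats
    ("float money math", re.compile(r"\bfloat\(")),
]

def check_diff(diff_text):
    problems = []
    for line in diff_text.splitlines():
        # Only inspect added lines; skip the "+++ b/file" header
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in RULES:
            if pattern.search(line):
                problems.append((name, line[1:].strip()))
    return problems
```

An empty list means the diff passes these two checks. Anything else is a reason to block the push and look.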

The whole loop — triage in the morning, refund/VAT replies through the day, deploy-check on every push — is about 15 minutes of my attention. Before this layer existed, the same work was eating 2-3 hours a day and I was still missing things.

Why this works better than "more tests"

Tests verify what you thought to check. Production verifies what your customers actually do. The five bugs above all passed tests. They failed reality because reality contained Cyrillic characters, network blips, customers in different time zones, recipient mailservers that bounce, and accountants with strong opinions about rounding.

The ops layer doesn't replace tests. It assumes tests will be incomplete and treats production signals — logs, webhook events, support emails, delivery webhooks — as the real test suite. Claude Code is good at this specific job because it can hold the whole context (codebase + recent logs + recent events + the open email) in one head, which is exactly the correlation work a solo founder does manually at 11 PM on Saturday.

Why bizflowai.io helps with this

This same pattern — a thin ops layer of well-scoped slash commands sitting on top of a small SaaS — is what we deploy for clients at bizflowai.io. The work isn't "build you another dashboard." It's connecting the signals you already have (Stripe, your error logs, your support inbox, your email provider's bounce webhooks) into a single triage loop, plus the two or three response drafters that handle the 80% of repetitive customer messages a solo operator drowns in. Same architecture as Fakturko's, adapted to whichever stack the client already runs on.

Frequently asked questions

Why do most one-person SaaS products fail in month two?

Most solo SaaS products fail in month two because the founder gets buried in operational work after launch: refund requests, VAT questions, silently failed Stripe webhooks, and rendering bugs affecting real customers. This 'build-versus-operate gap' isn't about shipping features—it's about handling the boring, continuous production issues that nobody teaches. Founders quietly give up when operations overwhelm building.

What common bugs hit a SaaS in its first week with paying customers?

Five recurring bugs hit early-stage SaaS: VAT rounding errors (per-line vs total rounding changes the final amount), PDF encoding failures for non-Latin characters like Cyrillic, non-idempotent Stripe webhook handlers that create duplicate invoices when Stripe retries, timezone mismatches between UTC servers and local customer time on invoice dates, and silent email bounces where invoices never reach recipients but appear sent.

Why do Stripe webhooks need to be idempotent?

Stripe retries failed webhooks for up to three days. If your endpoint isn't idempotent, a single payment event processed multiple times can create duplicate invoices, send duplicate emails, and force you to issue refunds and apologies. Idempotency ensures the same webhook event produces the same result no matter how many times it's delivered, preventing real financial damage.

How do I use Claude Code to handle SaaS operations automatically?

Define a small set of slash commands as markdown files in a .claude/commands/ folder, alongside a CLAUDE.md file at your project root. Skip complex multi-agent orchestration; a handful of well-scoped commands is enough. For example, a /triage command reads the last 24 hours of error logs, Stripe webhook events, and support inbox messages, then correlates them into single incidents and proposes fixes before you've finished your coffee.

Why does PDF Cyrillic encoding matter for international SaaS customers?

PDF generators often work in testing with Latin characters but render Cyrillic, Arabic, or other scripts as question marks due to font embedding issues. For a Serbian business, an invoice displaying question marks instead of their company name isn't just broken—it's unusable and unprofessional. The fix is typically two lines once identified, but can take hours to diagnose under production pressure.

