Germany Just Made Google Legally Liable For AI Overview Lies

A German court just decided that when Google's AI Overview writes a sentence, Google said it — not the sites it summarized. If you run a support bot, a quoting assistant, or any LLM feature that talks to customers in the EU, that same logic now points at you. Here's what actually changes at the code level, and what to ship this week.

The ruling in one paragraph, no legal fluff

A German court held that Google's AI Overviews are Google's own speech. Not a neutral aggregation. Not a search index pointing at third parties. The model generated the text, Google operates the model, Google is the publisher. The case was brought by a business that got described inaccurately by a hallucinated summary, and Google's defense — "it's synthesized from third-party sources" — was rejected. The court also addressed the "AI may make mistakes" disclaimer directly: a disclaimer does not convert a defamatory statement into a non-defamatory one.

The reason this matters outside Google's legal department: the logic generalizes cleanly. Any deployer of an LLM that displays generated text to a user is, under the same reasoning, the publisher of that text. Not OpenAI. Not Anthropic. You. Your terms of service do not override EU consumer protection law, and "we just call the API" is not a defense any more than "we just run the model" worked for Google.

Step one: actually list every place your product generates text

Most teams I work with cannot produce this list when I ask for it. They know about the chatbot. They forget about the seven other surfaces where an LLM writes to a human.

Sit down for thirty minutes and write the inventory. A useful starter:

  • Customer-facing chat (support widget, in-app assistant)
  • Email auto-replies and email drafts shown to staff before send
  • Generated product descriptions, SEO copy, category text
  • Sales/quoting tools that produce pricing or terms in natural language
  • Internal copilots that summarize contracts, tickets, or call transcripts
  • Anything that auto-fills a form field a customer will see

For each row, write down four columns: surface, model, inputs (what the prompt sees), grounding source (or "none"). If the grounding column says "none" anywhere a customer sees the output, that's your highest-risk surface. Fix that one first.
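One way to keep that inventory honest is to store it as data instead of in a doc nobody opens. A minimal sketch, assuming a hypothetical `Surface` record with the four columns above plus a `customer_facing` flag (my addition, not one of the four columns). The example rows and model names are made up:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Surface:
    """One row of the inventory. Field names are illustrative."""
    name: str
    model: str
    inputs: str
    grounding: str          # e.g. "kb_articles", or "none"
    customer_facing: bool   # assumption: extra flag, not one of the four columns


def highest_risk(surfaces: list[Surface]) -> list[Surface]:
    """Customer-facing surfaces with no grounding source: fix these first."""
    return [
        s for s in surfaces
        if s.customer_facing and s.grounding.strip().lower() == "none"
    ]


# Hypothetical example rows
INVENTORY = [
    Surface("support_chat", "model-a", "user message + order history", "kb_articles", True),
    Surface("quote_gen", "model-a", "CRM record + free-text request", "none", True),
    Surface("ticket_summary", "model-b", "ticket thread", "none", False),
]

for s in highest_risk(INVENTORY):
    print(f"ungrounded and customer-facing: {s.name}")
```

Keeping the list in the repo also means a new LLM surface shows up in code review, not six months later in a complaint.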

Step two: stop letting the model answer from its weights

The court's logic punishes free-form generation. The defense — when it eventually works — will be "the system retrieved verified facts from our database and quoted them, with citations, and refused when confidence was low." That's RAG. It's boring, it's been the correct architecture for two years, and most shipped chatbots still don't do it properly.

Concrete pattern. The model gets a question, retrieves from your verified source, and is instructed to refuse if the retrieval is weak:

# NOTE: vector_store and llm are placeholders for your own retrieval and
# model clients; swap in whatever you actually run.
MIN_SCORE = 0.72  # tune against your own retrieval evals
REFUSAL = "I don't have a verified answer for that. A human will follow up."


def answer(question: str) -> dict:
    hits = vector_store.search(question, k=5)

    # Refuse instead of guessing when retrieval is empty or weak
    if not hits or hits[0].score < MIN_SCORE:
        return {"answer": REFUSAL, "sources": [], "refused": True}

    context = "\n\n".join(f"[{i}] {h.text}" for i, h in enumerate(hits))
    prompt = f"""Answer ONLY using the sources below.
If the sources do not contain the answer, reply exactly: NO_ANSWER.
Cite sources inline as [0], [1], etc.

Sources:
{context}

Question: {question}"""

    out = llm.complete(prompt, temperature=0)

    # The model can still decide the sources don't cover the question.
    # Return the same refusal text so the user never sees an empty reply.
    if "NO_ANSWER" in out:
        return {"answer": REFUSAL, "sources": [], "refused": True}

    return {
        "answer": out,
        "sources": [h.url for h in hits],  # same order as the [i] markers
        "refused": False,
    }

A few things this does that a naked wrapper doesn't:

  • Hard refusal path when retrieval is weak: no "make something up because the user asked nicely."
  • Temperature 0: this makes output as repeatable as the provider allows. Hosted models are not guaranteed to be bit-for-bit deterministic even at 0, so keep the raw output for audit instead of relying on replay.
  • Inline citations: a human reviewer (or a judge) can trace each claim back to a source document.
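The inline markers are only useful if something maps them back to documents. A small sketch of that step, assuming the model cites with the `[0]`, `[1]` convention from the prompt above. `cited_sources` and the `Hit` type are hypothetical names for illustration, not part of any library:

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    """Stand-in for a retrieval result; real clients will differ."""
    url: str
    text: str
    score: float


CITATION = re.compile(r"\[(\d+)\]")


def cited_sources(output: str, hits: list[Hit]) -> tuple[list[str], list[int]]:
    """Return (URLs actually cited, marker numbers that point at no hit)."""
    cited, invalid = [], []
    for raw in CITATION.findall(output):
        idx = int(raw)
        if idx >= len(hits):
            # Marker refers to a source that was never retrieved
            if idx not in invalid:
                invalid.append(idx)
        elif hits[idx].url not in cited:
            cited.append(hits[idx].url)
    return cited, invalid


hits = [
    Hit("https://example.com/kb/shipping-eu", "We ship to all EU countries...", 0.89),
    Hit("https://example.com/kb/shipping-austria", "Austria: 3-5 business days...", 0.81),
]
urls, bad = cited_sources("Yes, we ship to Austria in 3-5 business days [1].", hits)
print(urls, bad)
```

An empty first list or a non-empty second list is a reasonable trigger for the refusal path, since either means the answer cannot be traced to a retrieved document.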

This won't make you bulletproof. It will make you defensible, which is the actual goal.

Step three: log everything, or you have no defense

Post-ruling, "the model said something weird" complaints become legal letters. If you can't reconstruct what the system saw, what it retrieved, and what it returned, you have no story to tell. No logs, no defense.

Minimum schema for every generated response:

request_id: uuid
timestamp: 2025-01-15T14:22:01Z
user_id: hashed_id
surface: support_chat        # or email_draft, quote_gen, etc.
model: claude-sonnet-4-5
model_version: "20250114"
prompt_template_hash: sha256:8f3a...
user_input: "Do you ship to Austria?"
retrieved_chunks:
  - source_id: kb_shipping_eu
    score: 0.89
    text: "We ship to all EU countries..."
  - source_id: kb_shipping_austria
    score: 0.81
    text: "Austria: 3-5 business days, EUR 9.90..."
final_prompt_sha256: "a1b2c3..."
raw_output: "Yes, we ship to Austria in 3-5 business days for EUR 9.90 [1]."
displayed_to_user: true
refused: false
latency_ms: 1840

Two things worth doing on top of this:

  • Retention policy that survives a complaint window. Limitation periods for these claims are set by national law, not at EU level, and they run in years, not 30 days. Plan for 1-3 years of retention on at least the metadata + hashes, even if you rotate the full text out of hot storage.
  • Prompt template versioning. When the prompt changes, the hash changes. When a regression happens, you can say exactly which version was live on the date of the incident.
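A sketch of how a record like the one above could be assembled with template and prompt hashes, using only the standard library. The function name `build_log_record`, the template text, and the shape of `chunks` are assumptions for illustration:

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def sha256_hex(text: str) -> str:
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical template; in practice load it from version control
PROMPT_TEMPLATE = (
    "Answer ONLY using the sources below.\n\n"
    "Sources:\n{context}\n\nQuestion: {question}"
)


def build_log_record(surface, model, user_input, chunks, raw_output, refused):
    """Assemble one audit record. `chunks` is a list of dicts with
    source_id, score and text keys (an assumed shape)."""
    context = "\n\n".join(f"[{i}] {c['text']}" for i, c in enumerate(chunks))
    final_prompt = PROMPT_TEMPLATE.format(context=context, question=user_input)
    return {
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "surface": surface,
        "model": model,
        # Hash of the template alone: changes only when the wording changes
        "prompt_template_hash": sha256_hex(PROMPT_TEMPLATE),
        # Hash of the rendered prompt: lets you verify later what was sent
        "final_prompt_sha256": sha256_hex(final_prompt),
        "user_input": user_input,
        "retrieved_chunks": chunks,
        "raw_output": raw_output,
        "refused": refused,
    }


record = build_log_record(
    surface="support_chat",
    model="example-model",
    user_input="Do you ship to Austria?",
    chunks=[{"source_id": "kb_shipping_austria", "score": 0.81,
             "text": "Austria: 3-5 business days, EUR 9.90..."}],
    raw_output="Yes, in 3-5 business days for EUR 9.90 [0].",
    refused=False,
)
print(json.dumps(record, indent=2))
```

Hashing the template separately from the rendered prompt is a design choice: the first answers "which version was live", the second answers "what exactly did the model see for this request".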

The disclaimer trap, and what to do instead

The court explicitly addressed the "AI may make mistakes" footer. It doesn't work. It cannot convert false speech into non-false speech, and EU consumer protection doesn't let you contract your way out of harm.

What actually reduces risk:

  • Scope limits in the prompt. The bot answers questions about your product, shipping, pricing, returns. For anything else, it refuses. Narrow scope = fewer surfaces to hallucinate on.
  • Human-in-the-loop on high-stakes surfaces. Quotes, contracts, anything legal or financial — the LLM drafts, a human sends. Use the model to save typing, not to make commitments.
  • Citation-required UX. If the answer doesn't have a citation rendered next to it, the UI doesn't show it. The model literally cannot ship an uncited claim to the user because your frontend filters it out.
  • Named-entity guardrails. If the user's question or the model's draft answer contains a third-party brand or person's name, route to a stricter path or refuse. Defamation cases almost always involve naming someone.
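The rules in this list can sit in one small gate between the model and the UI. A minimal sketch, assuming a fixed watch list and hypothetical surface names; a production system would more likely use an NER model than substring matching, and `gate` is an illustrative name, not an existing API:

```python
import re

CITATION = re.compile(r"\[\d+\]")

# Hypothetical watch list of third-party names
WATCHED_NAMES = {"acme corp", "jane doe"}

# Surfaces where a human sends the final text (assumed names)
HUMAN_SEND_SURFACES = {"quote_gen", "contract_summary"}


def gate(surface: str, question: str, draft: str) -> str:
    """Decide what happens to a draft: 'refuse', 'human_review' or 'show'."""
    text = f"{question} {draft}".lower()

    # Named-entity guardrail: third-party names go to the strict path
    if any(name in text for name in WATCHED_NAMES):
        return "human_review"

    # High-stakes surfaces: the model drafts, a person sends
    if surface in HUMAN_SEND_SURFACES:
        return "human_review"

    # Citation-required: an uncited claim never reaches the user
    if not CITATION.search(draft):
        return "refuse"

    return "show"


print(gate("support_chat", "Do you ship to Austria?", "Yes, in 3-5 business days [1]."))
```

Running the gate on the server, before anything is sent to the frontend, keeps the rule in one place and means the decision can be logged alongside the response.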

None of this is exotic. It's the boring infrastructure that the spray-and-pray crowd skipped to ship faster.

Why this ruling sorts the market

For two years the playbook has been: bolt an LLM onto the product, ship, deal with hallucinations later. Later just arrived, and it's a German courtroom. One national ruling doesn't bind courts in other member states, but the reasoning is easy to reuse, so treat it as a preview, not a local quirk.

The teams who built retrieval, citations, refusal paths, and audit trails are about to look like they were paranoid for good reason. The teams who shipped a naked GPT wrapper with a footer disclaimer are about to find out that their ToS doesn't beat consumer protection law. If you're in the second group, you have maybe a few quarters before someone tests the precedent against a smaller defendant. Use that time.

Why bizflowai.io helps with this

Most of what I ship for clients (support assistants, quoting tools, internal copilots) is already built on this pattern by default: grounded retrieval from the client's verified data, refusal when confidence is low, full prompt/response/source logging, and citation-required output. Not because I predicted a German court ruling, but because shipping an LLM feature that can confidently lie to a paying customer was always a bad business decision. The ruling just made the bill come due faster.

Frequently asked questions

What did the German court rule about Google's AI Overview?

A German court ruled that text generated by Google's AI Overview is Google's own speech, not a neutral aggregation of third-party sources. This means if the overview makes a false or defamatory statement, Google is the publisher and is legally liable. The case was brought by a company harmed by a hallucinated summary, and the court rejected Google's defense that the output was merely synthesized from other sources.

Why does the German AI Overview ruling matter for startups and founders?

The legal logic generalizes beyond Google. If your company ships a customer-facing AI feature (a support bot, sales assistant, or contract summarizer), the same reasoning makes you the publisher of whatever text it generates, not OpenAI or Anthropic. A German court has now shown it is comfortable assigning liability to the deployer of the AI system, which leaves founders directly exposed to defamation and consumer protection claims for their AI's outputs.

Does an 'AI may make mistakes' disclaimer protect me from liability?

No. The German court explicitly addressed disclaimers and ruled they do not convert a defamatory statement into a non-defamatory one. Adding a notice that AI output may be inaccurate does not shield the deployer from liability when the system generates false or harmful content about a person or business. Terms of service also do not override consumer protection law in the EU.

How do I reduce AI liability risk in a customer-facing product?

Take three steps: First, audit every place your AI writes to customers — emails, chat answers, quotes, product descriptions. Second, add a grounding layer using retrieval-augmented generation (RAG) so the model answers from verified sources like your product database or pricing sheet, and refuses when retrieval confidence is low. Third, log every prompt, output, and source document used so you can prove what the system saw if complaints arise.

When should I use RAG instead of letting an LLM answer from its weights?

Use retrieval-augmented generation any time a wrong answer could cause harm — including pricing, product details, contract terms, or factual claims about people or businesses. Letting the LLM answer from its training weights invites hallucinations, which post-ruling expose you as the publisher. RAG retrieves from verified sources and should be paired with a prompt that refuses to answer when retrieval confidence is low.


Want more like this?

I publish practical AI automation, GenAI engineering, and faceless content workflows on YouTube every week.

Subscribe to bizflowai.io on YouTube — never miss a new tutorial.

Planning an AI automation project or need a second opinion on your architecture?

Connect with me on LinkedIn — Lazar Milicevic, GenAI Engineer & bizflowai.io Founder.

Visit bizflowai.io for our services, case studies, and AI consulting.
