Claude Code Hooks: Guardrails Before Every Commit

By Lazar Milicevic · Published June 13, 2026 · 12 min read


You ask Claude to "run the tests before you commit." It does, three times in a row. On the fourth task it forgets, commits broken code, and the CI pipeline lights up red ten minutes later. You add a sterner line to CLAUDE.md. It works for a day. Then it doesn't.

This is the fundamental problem with treating a language model like a disciplined junior engineer: politeness is not a control. If the consequence of skipping a step is "the model felt like skipping a step," you don't have a guardrail — you have a suggestion. Claude Code hooks fix this by moving enforcement out of the prompt and into the runtime. They fire deterministically, they exit with a status code, and they can block the action that triggered them.

This post is a working tour of hooks for the three checks that matter most before code leaves your machine: lint, tests, and secrets scanning. It's written for the solo developer or small team shipping real software with Claude Code, not for theorists.

Why prompt-based "rules" fail

CLAUDE.md is useful. It's also a soft constraint. The model reads it, weighs it against the current context, and decides whether to follow it. Most of the time it does. Sometimes it doesn't — usually under pressure: long conversations, large diffs, or when it thinks it's being helpful by skipping the "boring" step.

There are three failure modes you'll see repeatedly:

Drift over long sessions. As the context fills up, instructions at the top of the conversation get less weight. The 200th tool call doesn't behave like the 5th.
Selective compliance. The model runs tests when the change is risky and skips them when it judges the change to be trivial. Its judgment of "trivial" is not your judgment of "trivial."
Silent partial execution. It runs the lint command, sees a warning, decides it's unrelated, and commits anyway.

Hooks remove the model's discretion from the loop. The hook either passes or it doesn't. If it doesn't pass, the action is blocked. No amount of reasoning by the model changes that.

What a hook actually is

A Claude Code hook is a shell command that fires on a specific event in the agent's lifecycle. The relevant events for pre-commit guardrails are:

PreToolUse: runs before a tool call (like Bash or Edit). If the hook exits with code 2, or returns a structured blocking decision, the tool call is prevented.
PostToolUse: runs after a tool call. Useful for triggering formatters or running tests after a file edit.
Stop: runs when the agent thinks it's done. Useful as a final gate.

The configuration lives in .claude/settings.json (project-level, committed to the repo) or ~/.claude/settings.json (user-level). For team consistency, put it in the project.

A minimal hook config looks like this:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/block-bad-commits.sh"
          }
        ]
      }
    ]
  }
}

The hook receives a JSON payload on stdin describing the tool call. It can:

Exit 0 — allow the action, optionally print context for the model.
Exit 2 — block the action; stderr is shown to the model so it knows why.
Return a structured JSON decision on stdout for more control.
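
The third option can be sketched as follows. Treat this as an illustration under stated assumptions rather than the definitive schema: the field names (hookSpecificOutput, permissionDecision, permissionDecisionReason) are an assumption based on the Claude Code hooks reference and should be checked against the version you run, and deny_json is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: build a structured "deny" decision for a PreToolUse hook.
# ASSUMPTION: these field names match the hooks schema of your Claude Code version.

deny_json() {
  local reason=$1
  # Escape backslashes and double quotes so the reason is a valid JSON string.
  # Newlines are not handled: keep the reason on a single line.
  reason=${reason//\\/\\\\}
  reason=${reason//\"/\\\"}
  printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"%s"}}\n' "$reason"
}

# Demo: print a decision the way a hook would write it to stdout
deny_json 'Lint failed. Fix the errors, then retry the commit.'
```

A hook taking this route would print the JSON to stdout and exit 0. For plain blocking, exit 2 with a message on stderr stays the simpler path, and it is the one the scripts in this post use.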

That's the whole mental model. Everything below is applying it.

Guardrail 1: block commits that skip lint

The first hook intercepts git commit calls and refuses them if the linter isn't clean. The point is not to run lint — Claude can do that on its own. The point is to make it impossible to commit when lint fails, regardless of what the model decided.

Create .claude/hooks/pre-commit-lint.sh:

#!/usr/bin/env bash
set -euo pipefail

# Read the tool call payload from stdin
payload=$(cat)
command=$(echo "$payload" | jq -r '.tool_input.command // ""')

# Only act on git commit commands
if [[ ! "$command" =~ git[[:space:]]+commit ]]; then
  exit 0
fi

# Detect project type and run the appropriate linter
if [[ -f "package.json" ]]; then
  if ! npm run lint --silent > /tmp/lint.log 2>&1; then
    echo "Lint failed. Commit blocked." >&2
    echo "--- Lint output ---" >&2
    tail -n 40 /tmp/lint.log >&2
    exit 2
  fi
elif [[ -f "pyproject.toml" ]] || [[ -f "setup.py" ]]; then
  if ! ruff check . > /tmp/lint.log 2>&1; then
    echo "Ruff found issues. Commit blocked." >&2
    tail -n 40 /tmp/lint.log >&2
    exit 2
  fi
fi

exit 0

Wire it in .claude/settings.json:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/pre-commit-lint.sh" }
        ]
      }
    ]
  }
}

Now when Claude runs git commit -m "...", the hook intercepts the bash call, runs lint, and if lint fails, the commit never happens. The model sees the stderr output and can either fix the issues or stop. Either is fine — what matters is that broken code doesn't get committed.

A note on speed: if your lint takes 30 seconds, every commit attempt costs 30 seconds. Configure your linter to only check changed files when possible:

changed=$(git diff --cached --name-only --diff-filter=ACM | grep -E '\.(js|ts|tsx)$' || true)
if [[ -n "$changed" ]]; then
  npx eslint $changed
fi
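
If your repo has paths with spaces in them, an unquoted file list will split in the wrong places. Here is a minimal sketch of the same idea that collects the staged files into an array first. filter_by_ext and lint_staged are hypothetical helper names, and the extension list is only an example.

```bash
#!/usr/bin/env bash
# Sketch: lint only staged files, keeping paths with spaces intact.
# filter_by_ext and lint_staged are hypothetical helper names.

filter_by_ext() {
  # $1 is an alternation such as 'js|ts|tsx'; reads one path per line on stdin
  grep -E "\.($1)\$" || true
}

lint_staged() {
  local files=() f
  # Added, copied, or modified files in the index, filtered by extension
  while IFS= read -r f; do
    files+=("$f")
  done < <(git diff --cached --name-only --diff-filter=ACM | filter_by_ext 'js|ts|tsx')

  # Nothing relevant staged: allow the commit without invoking the linter
  (( ${#files[@]} > 0 )) || return 0
  npx eslint "${files[@]}"
}
```

Inside a hook you would call lint_staged where the full-project lint ran before and exit 2 if it returns non-zero.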

Guardrail 2: block commits when tests fail

Same shape, different command. The trick is deciding which tests to run. Running the full suite on every commit kills momentum. Running nothing defeats the purpose.

A reasonable default: run the fast unit tests on commit, defer integration tests to PR/CI.

#!/usr/bin/env bash
set -euo pipefail

payload=$(cat)
command=$(echo "$payload" | jq -r '.tool_input.command // ""')

if [[ ! "$command" =~ git[[:space:]]+commit ]]; then
  exit 0
fi

if [[ -f "package.json" ]]; then
  if ! npm test -- --run --reporter=dot > /tmp/test.log 2>&1; then
    echo "Tests failed. Commit blocked." >&2
    tail -n 60 /tmp/test.log >&2
    exit 2
  fi
elif [[ -f "pyproject.toml" ]]; then
  if ! pytest -x -q --timeout=30 > /tmp/test.log 2>&1; then
    echo "Tests failed. Commit blocked." >&2
    tail -n 60 /tmp/test.log >&2
    exit 2
  fi
fi

exit 0

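The -x flag and --timeout=30 in the pytest call are deliberate: fail fast, and don't let a hung test eat your session. Note that --timeout is not built into pytest; it comes from the pytest-timeout plugin, so install the plugin or drop the flag. The npm branch likewise assumes a runner that accepts --run and --reporter=dot, as Vitest does. The point of a hook is to be a fast, deterministic gate, not a CI replacement.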
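
A per-test timeout does not bound the whole run. If you want an overall cap that works with any test runner, one option is to wrap the command. This is a sketch under assumptions: run_with_limit and gate are hypothetical helper names, and the timeout binary is GNU coreutils, which may be missing on a stock macOS install.

```bash
#!/usr/bin/env bash
# Sketch: cap the wall-clock time of any check and map failures to exit 2.
# run_with_limit and gate are hypothetical helper names.

run_with_limit() {
  local seconds=$1
  shift
  if command -v timeout > /dev/null 2>&1; then
    timeout "$seconds" "$@"
  else
    # No timeout binary available: run the check unbounded rather than skip it
    "$@"
  fi
}

gate() {
  # usage: gate <seconds> <command> [args...]
  local status=0
  run_with_limit "$@" || status=$?
  if (( status == 124 )); then
    # 124 is the exit code GNU timeout uses when the limit is hit
    echo "Check timed out. Commit blocked." >&2
    return 2
  elif (( status != 0 )); then
    echo "Check failed with exit $status. Commit blocked." >&2
    return 2
  fi
}
```

In a hook this would read something like gate 60 pytest -x -q, followed by exit 2 when gate returns non-zero.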

For larger codebases, scope tests to the affected packages. A monorepo example:

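# Map each staged file to its nearest package.json directory, then test each once
roots=$(git diff --cached --name-only | while IFS= read -r file; do
  dir=$(dirname "$file")
  while [[ "$dir" != "." && ! -f "$dir/package.json" ]]; do
    dir=$(dirname "$dir")
  done
  if [[ -f "$dir/package.json" ]]; then echo "$dir"; fi
done | sort -u)
for dir in $roots; do
  (cd "$dir" && npm test -- --run) || exit 2
done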

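You can wire multiple PreToolUse hooks for the same matcher, and any one of them can block. Don't count on a fixed execution order, though: the Claude Code documentation describes matching hooks as running in parallel, so each script should stand on its own. Keep them in separate files. One hook per concern is much easier to debug than one mega-script.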

Guardrail 3: block secrets from ever being committed

This is the hook that pays for itself the first time it fires. The risk profile is asymmetric: a single committed API key can mean key rotation, audit logs, and an incident report. A blocked commit costs you ten seconds.

Two approaches that work well together:

A fast regex pre-check for obvious patterns (AWS keys, Stripe keys, generic high-entropy strings near KEY= or SECRET=).
A real secrets scanner (gitleaks, trufflehog) for thorough coverage.

#!/usr/bin/env bash
set -euo pipefail

payload=$(cat)
command=$(echo "$payload" | jq -r '.tool_input.command // ""')

if [[ ! "$command" =~ git[[:space:]]+commit ]]; then
  exit 0
fi

# Fast regex pass on staged diff
staged=$(git diff --cached)

patterns=(
  'AKIA[0-9A-Z]{16}'                    # AWS access key
  'sk_live_[0-9a-zA-Z]{24,}'            # Stripe live key
  'xox[baprs]-[0-9a-zA-Z-]{10,}'        # Slack token
  'ghp_[0-9a-zA-Z]{36}'                 # GitHub PAT
  '-----BEGIN (RSA |EC |OPENSSH )?PRIVATE KEY-----'
)

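for pattern in "${patterns[@]}"; do
  # -e marks the next argument as the pattern, so one starting with "-" is not parsed as an option
  if echo "$staged" | grep -E -e "$pattern" > /dev/null; then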
    echo "Potential secret detected matching: $pattern" >&2
    echo "Commit blocked. Move the value to an env var or .env file." >&2
    exit 2
  fi
done

# Thorough scan with gitleaks if installed
if command -v gitleaks > /dev/null; then
  if ! gitleaks protect --staged --no-banner > /tmp/leaks.log 2>&1; then
    echo "gitleaks found potential secrets. Commit blocked." >&2
    cat /tmp/leaks.log >&2
    exit 2
  fi
fi

exit 0

A few practical notes:

The regex pass is fast (milliseconds) and catches the highest-impact leaks. Run it even if you also run gitleaks.
False positives happen. When they do, the model will usually try to "fix" the file by deleting the line. Make sure your hook output says clearly that this is a suspected secret and tells the model what to do instead (move the value to .env, or add the pattern to the allowlist).
Maintain a .gitleaks.toml with allowlists for known false positives (test fixtures, example values in docs).
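
One avoidable false positive in the regex pass above: git diff --cached also contains removed lines, so a commit that deletes a leaked key would itself match. A minimal sketch of a filter that keeps only the lines a commit adds, with added_lines as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: reduce a unified diff to the content of its added lines.
# added_lines is a hypothetical helper name.

added_lines() {
  awk '
    /^diff --git / { in_hunk = 0 }            # new file header: leave hunk mode
    /^@@/          { in_hunk = 1; next }      # hunk header: content follows
    in_hunk && /^\+/ { print substr($0, 2) }  # added line, without the leading "+"
  '
}

# Demo on an inline diff: the removed key is ignored, the added line is kept
sample=$'diff --git a/cfg b/cfg\n--- a/cfg\n+++ b/cfg\n@@ -1 +1 @@\n-KEY=old_value\n+KEY=from_env'
printf '%s\n' "$sample" | added_lines
```

In the hook, that would mean assigning staged from git diff --cached piped through added_lines before the pattern loop runs.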

Putting it together: a layered settings.json

Here's a complete .claude/settings.json combining all three hooks plus a post-edit formatter:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/secrets-scan.sh" },
          { "type": "command", "command": ".claude/hooks/pre-commit-lint.sh" },
          { "type": "command", "command": ".claude/hooks/pre-commit-tests.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/format-on-write.sh" }
        ]
      }
    ]
  }
}

The PostToolUse formatter runs prettier --write or ruff format on the file that was just modified. It doesn't block — it just keeps formatting consistent so lint doesn't fail later for trivial reasons.
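
The formatter script itself isn't shown in this post, so here is a sketch of what its core could look like. The helper names formatter_for and format_file are hypothetical, the extension lists are examples, and it assumes the Edit and Write payloads expose the path as .tool_input.file_path.

```bash
#!/usr/bin/env bash
# Sketch of the logic behind .claude/hooks/format-on-write.sh.
# ASSUMPTION: Edit and Write payloads carry the path in .tool_input.file_path.
# formatter_for and format_file are hypothetical helper names.

formatter_for() {
  # Print the formatter command for a path, or nothing if none applies
  case "$1" in
    *.js|*.jsx|*.ts|*.tsx|*.json|*.css|*.md) echo "npx prettier --write" ;;
    *.py)                                     echo "ruff format" ;;
  esac
}

format_file() {
  local path=$1 cmd
  cmd=$(formatter_for "$path")
  # Skip unknown file types and paths that no longer exist
  [[ -n "$cmd" && -f "$path" ]] || return 0
  # Never block on a formatter failure: report it and carry on.
  # $cmd is left unquoted on purpose so it splits into command and arguments.
  $cmd "$path" > /dev/null 2>&1 || echo "format skipped for $path" >&2
  return 0
}
```

The hook would read the payload from stdin, extract the path with jq -r '.tool_input.file_path // ""', pass it to format_file, and always exit 0.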

Comparison of where each control belongs:

Concern	`CLAUDE.md` instruction	Hook	CI pipeline
Coding style preferences	✅	—	—
Auto-format on save	—	✅	—
Lint must pass	—	✅	✅
Tests must pass	—	✅ (fast)	✅ (full)
Secrets scan	—	✅	✅
Architecture/design rules	✅	—	—
Security review	—	—	✅

The pattern: use CLAUDE.md for taste and approach, use hooks for deterministic gates on the local loop, use CI as the final authority. Hooks are not a replacement for CI — they're the layer that catches issues before CI has to.

Debugging hooks without losing your mind

Hooks fail silently in annoying ways the first time you write them. A few habits that save time:

Log everything. Add a debug log line to every hook while you're iterating:

echo "[$(date -Iseconds)] hook=$0 cmd=$command" >> /tmp/claude-hooks.log

Test hooks outside Claude Code. They're just shell scripts reading JSON from stdin. You can test them directly:

echo '{"tool_input":{"command":"git commit -m test"}}' | ./.claude/hooks/pre-commit-lint.sh
echo "exit: $?"
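
Once you have more than one hook, checking exit codes by eye gets old. A small harness along these lines can help. It is a sketch, and expect_exit is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch of a tiny harness for hook scripts. expect_exit is a hypothetical helper.

expect_exit() {
  # usage: expect_exit <expected-code> <json-payload> <command> [args...]
  local expected=$1 payload=$2 actual=0
  shift 2
  # Feed the payload on stdin, as Claude Code does, and capture the exit code
  "$@" <<< "$payload" > /dev/null 2>&1 || actual=$?
  if (( actual == expected )); then
    echo "ok:   exit $actual for $*"
  else
    echo "FAIL: exit $actual, wanted $expected, for $*"
    return 1
  fi
}

# Demo with stand-in commands; replace them with your hook scripts
expect_exit 0 '{"tool_input":{"command":"ls -la"}}' true
expect_exit 2 '{"tool_input":{"command":"git commit -m x"}}' bash -c 'exit 2'
```

Pointed at a real hook, the first call would check that a non-commit command such as ls -la passes straight through with exit 0.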

Watch for exit code semantics. Exit 0 allows, exit 2 blocks with feedback to the model. Other non-zero codes are treated as errors but don't necessarily block; check the Claude Code docs for the exact behavior of the version you're on.

Don't write hooks in Python if bash will do. Startup time matters. A 200ms Python interpreter spin-up on every tool call is noticeable. Bash + jq is usually enough.

Be careful with matchers. A matcher of Bash catches every bash call, not just commits. Inside the hook, filter by the actual command. Otherwise you'll be running lint on ls calls.
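
The filter used in the scripts above also matches git commit-tree, and misses forms like git -C some/dir commit. A slightly stricter check could look like the sketch below. is_git_commit is a hypothetical helper name, and the pattern is a heuristic: it will still fire on the phrase inside a quoted string, and it does not handle option values containing spaces.

```bash
#!/usr/bin/env bash
# Sketch: decide whether a shell command line invokes `git commit`.
# is_git_commit is a hypothetical helper name; the regex is a heuristic.

is_git_commit() {
  # git must start a word, may be followed by "-C <dir>" or "-c <key=value>"
  # pairs, and "commit" must be a whole word (so commit-tree is excluded)
  local re='(^|[^[:alnum:]_./-])git([[:space:]]+-[cC][[:space:]]+[^[:space:]]+)*[[:space:]]+commit([[:space:]]|$)'
  [[ "$1" =~ $re ]]
}

# Demo
for cmd in 'git commit -m test' 'git -C packages/api commit -m x' 'ls -la'; do
  if is_git_commit "$cmd"; then echo "gate:  $cmd"; else echo "allow: $cmd"; fi
done
```

Putting this in one shared file that each hook sources keeps the three scripts from drifting apart on what counts as a commit.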

What hooks can't do

Worth stating plainly:

Hooks run on your machine. If a teammate clones the repo and runs Claude Code without the project settings.json, they bypass the hooks. Commit .claude/settings.json and .claude/hooks/ to the repo.
Hooks don't replace CI. A local machine can have stale dependencies, different OS, different env. CI is the source of truth.
Hooks won't catch logic errors. They catch the categories of mistakes you've already decided are unacceptable. They can't think.
Hooks add latency. If you pile on slow checks, you'll feel it on every commit. Budget for it — sub-five-second hooks are tolerable, thirty-second hooks aren't.
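
To see where the latency budget goes, you can wrap each check in a timer. A rough sketch, where timed is a hypothetical helper name and the resolution is whole seconds because it relies on bash's SECONDS variable:

```bash
#!/usr/bin/env bash
# Sketch: warn on stderr when a check exceeds its time budget.
# timed is a hypothetical helper name.

timed() {
  # usage: timed <budget-seconds> <command> [args...]
  local limit=$1
  shift
  local start=$SECONDS status=0
  "$@" || status=$?
  local elapsed=$(( SECONDS - start ))
  if (( elapsed > limit )); then
    echo "slow hook: '$*' took ${elapsed}s (budget ${limit}s)" >&2
  fi
  # Preserve the wrapped command's exit code so blocking still works
  return "$status"
}
```

A call such as timed 5 npm run lint --silent leaves the pass or fail result unchanged and only adds the warning.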

The right mental model: a hook is a tripwire, not a referee. It catches the specific failure modes you've encoded. Everything else is still your job.

How BizFlowAI approaches this

Hook configurations are part of every Claude Code rollout we ship for clients. When we set up an engineering team with Claude Code, the .claude/settings.json and .claude/hooks/ directory ship together with a CLAUDE.md tuned to the codebase — and the hooks are written to that team's actual lint, test, and security stack. Not a generic template.

The combinations vary: a Python shop with ruff + pytest + gitleaks, a TypeScript monorepo with turbo + vitest + scoped per-package gates, a Rails app with rubocop + rspec + a custom check for raw SQL. The pattern is the same — soft rules in the prompt, hard gates in the hooks, full validation in CI — but the implementation is specific to what's already on the team's disk. If you'd like to see a production hooks setup running against a real repo, book a discovery call and we'll walk you through one.

Where to go from here

Start small. Pick one of the three hooks above — the secrets scanner is the highest-leverage place to begin — and ship it tomorrow. Watch it fire. Tune the false positives. Then add the next one.

A week of running with even a single deterministic hook will change how you think about agent-driven development. You'll stop writing increasingly elaborate paragraphs in CLAUDE.md and start writing thirty-line shell scripts that just refuse to let bad things happen. That's the right trade.

The model is good. The model is not a guardrail. Write the hook.

Frequently asked questions

What are Claude Code hooks?

Claude Code hooks are shell commands that fire deterministically on specific events in the agent's lifecycle, such as before or after a tool call. They are configured in .claude/settings.json and receive a JSON payload on stdin describing the tool call. A hook can exit 0 to allow the action or exit 2 to block it and show stderr to the model. Unlike instructions in CLAUDE.md, hooks cannot be ignored by the model.

How do I block Claude Code from committing code when lint or tests fail?

Register a PreToolUse hook with matcher 'Bash' in .claude/settings.json that points to a shell script. The script reads the tool payload from stdin, checks if the command matches 'git commit', and runs your linter or test suite. If the check fails, the script writes the error to stderr and exits with code 2, which blocks the commit and shows the output to Claude so it can fix the issues.

How can I prevent Claude Code from committing secrets like API keys?

Add a PreToolUse hook that scans the staged diff (git diff --cached) before any git commit runs. Combine a fast regex pass for known patterns like AKIA AWS keys, sk_live_ Stripe keys, ghp_ GitHub PATs, and PRIVATE KEY blocks with a thorough scanner like gitleaks or trufflehog. If either detects a match, exit 2 with a message telling Claude to move the value to an environment variable.

Why are CLAUDE.md instructions not enough to enforce pre-commit checks?

CLAUDE.md is a soft constraint that the model weighs against current context and can ignore. Common failure modes include drift over long sessions where early instructions lose weight, selective compliance where the model skips checks on changes it judges trivial, and silent partial execution where it runs lint, sees warnings, and commits anyway. Hooks remove model discretion by enforcing rules deterministically at runtime.

How do I keep Claude Code hooks fast enough not to slow down commits?

Scope checks to changed files instead of the whole codebase using git diff --cached --name-only and pass the result to your linter. For tests, run only fast unit tests on commit and defer integration tests to CI, using flags like pytest -x --timeout=30 to fail fast. Split concerns into separate hook scripts rather than one mega-script so each runs quickly and is easier to debug.
