MCP Red Teaming & Security Scanner
Security scanner for Model Context Protocol servers. Combines static metadata analysis with active behavioral probing — connects to MCP servers, enumerates tools/resources/prompts, calls tools with safe payloads, and analyzes what comes back.
Works against standard MCP (SSE, Streamable HTTP), local stdio servers
(npx, python, etc.), non-standard tool servers (POST /execute), and
Kubernetes-internal MCP deployments.
Use with DVMCP for training, or point at any MCP server in dev/staging/prod.
See CHANGELOG.md for recent changes and planned work.
Quickstart (recommended):
git clone https://github.com/babywyrm/mcpnuke.git && cd mcpnuke
./quickstart.shThis creates a .venv, installs all extras (dev, ai, k8s), runs tests, and
prints usage. After that, ./scan and uv run mcpnuke just work — no
activation needed.
uv (manual):
uv sync --all-extras
uv run mcpnuke --helpNo source .venv/bin/activate needed — uv run finds the project venv automatically.
Optional extras: dev (testing/linting), ai (Claude analysis), k8s (Kubernetes checks), all (everything).
pip (manual):
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[all,dev]"From PyPI (coming soon):
uv pip install 'mcpnuke[all]'Verify your install:
mcpnuke --doctorNew to mcpnuke? Try the DVMCP Walkthrough --
a hands-on guide that scans 10 vulnerable MCP servers and explains every finding.
Or run ./walkthrough/demo.sh for the fully automated version.
For command recipes across camazotz, DVMCP, deterministic benchmarking, and
Bedrock variations, see QUICKSTART.md.
# Single target
./scan --targets http://localhost:2266
# DVMCP challenges 1–10
./scan --port-range localhost:9001-9010 --verbose
# Authenticated endpoint (JWT, PAT, etc.)
./scan --targets https://api.githubcopilot.com/mcp/ --auth-token ghp_xxx
# OIDC auto-token (Keycloak, etc.)
./scan --targets http://localhost:9090/mcp \
--oidc-url http://keycloak:8080/realms/myapp \
--client-id myapp --client-secret SECRET
# OIDC with explicit scope, extra headers, and TLS verification
./scan --targets https://target.example/mcp \
--oidc-url https://auth.example/realms/agentic \
--client-id scanner --client-secret SECRET \
--oidc-scope "mcp.read mcp.invoke" \
--header "X-Tenant: blue" \
--header "X-Agent-Flow: planner" \
--tls-verify
# Optional: DPoP + token introspection + JWKS metadata checks
./scan --targets https://target.example/mcp \
--auth-token "$ACCESS_TOKEN" \
--dpop-proof "$DPOP_PROOF_JWT" \
--token-introspect-url "https://auth.example/oauth2/introspect" \
--token-introspect-client-id scanner \
--token-introspect-client-secret SECRET \
--jwks-url "https://auth.example/.well-known/jwks.json" \
--tls-verify \
--json auth-flow-report.json
# JSON report for CI
./scan --port-range localhost:9001-9010 --json report.json
# Differential scan (compare to baseline)
./scan --targets http://localhost:9001 --baseline baseline.json
# Scan a local MCP server via stdin/stdout (no proxy needed)
./scan --stdio 'npx -y @modelcontextprotocol/server-everything'
# Fast scan (~2min vs ~30min) — samples top 5 security-relevant tools, skips heavy probes
./scan --targets http://localhost:9090 --fast --verbose
# Grouped findings (compact report)
./scan --targets http://localhost:9090 --group-findings
# Parallel deep probes (faster behavioral phase)
./scan --targets http://localhost:9090 --probe-workers 4
# AI-powered analysis (requires ANTHROPIC_API_KEY)
./scan --targets http://localhost:9002/sse --claude --verbose
./scan --targets http://localhost:9002/sse --claude --claude-model claude-opus-4-20250514
./scan --targets http://localhost:9002/sse --claude --claude-max-tools 25 --claude-phase2-workers 3
# AI-powered analysis via AWS Bedrock Claude (optional)
./scan --targets http://localhost:9002/sse --claude --bedrock --bedrock-region us-east-1
# Run tests
uv run pytest tests/ -vAll ./scan commands also work as uv run mcpnuke (no activation needed),
mcpnuke (with venv activated), or .venv/bin/mcpnuke.
When --auth-token looks like a JWT, mcpnuke decodes it (without signature
validation) and includes a safe claim summary in JSON output under
auth_context.jwt_claims_summary to help validate agentic auth wiring.
If configured, token introspection and JWKS fetch summaries are also included
under auth_context without affecting scan behavior when disabled.
Exit codes: 0 — no findings (clean); 1 — findings reported; 2 — scan
error (connection failure, invalid args, etc.). Use 1 vs 2 in CI to
distinguish “vulns found” from “scanner failed.”
1. CONNECT Detect transport (SSE, Streamable HTTP, stdio, or custom tool server)
2. ENUMERATE initialize → tools/list → resources/list → prompts/list
(or probe tool names for non-MCP /execute APIs)
3. STATIC CHECKS Pattern-match metadata (names, descriptions, schemas)
4. PROBE Call tools with safe payloads, read resources
5. ANALYZE Scan responses for injection, exfil, leakage, drift
6. AGGREGATE Detect attack chains across findings
7. REPORT Console table (or --group-findings) + optional JSON
The scanner runs checks in a deliberate order:
| Phase | Checks | What Happens |
|---|---|---|
| Static | prompt_injection, tool_poisoning, excessive_permissions, token_theft, code_execution, remote_access, schema_risks, rate_limit, prompt_leakage, supply_chain, tool_shadowing, webhook_persistence, credential_in_schema, config_tampering, exfil_flow | Pattern-match on tool names, descriptions, schemas. No server interaction beyond enumeration. |
| Behavioral | rug_pull, indirect_injection, protocol_robustness | Light interaction: re-list tools, read resources, send invalid methods. |
| Deep Probes | deep_rug_pull, tool_response_injection, input_sanitization, error_leakage, temporal_consistency, resource_poisoning, response_credentials, state_mutation, notification_abuse | Active tool invocation with safe payloads. Analyze responses for threats. |
| Transport | sse_security | CORS, unauthenticated SSE, cross-origin POST. |
| Aggregate | multi_vector, attack_chains | Cross-reference all prior findings to detect compound threats. |
| AI (optional) | llm_tool_analysis, llm_response_analysis, llm_chain_reasoning | Claude reads definitions, tool output, and all findings to identify subtle risks and multi-step attack chains. Requires --claude. |
| Check | Severity | What It Detects |
|---|---|---|
prompt_injection |
CRITICAL | Injection payloads in tool/resource/prompt descriptions |
tool_poisoning |
CRITICAL | Hidden instructions, invisible Unicode in tool descriptions |
excessive_permissions |
CRITICAL–MEDIUM | Dangerous capabilities (shell, filesystem, network, DB, cloud) |
code_execution |
CRITICAL–HIGH | Tools with exec/eval/shell parameters or descriptions |
remote_access |
CRITICAL–HIGH | Reverse shells, C2 beacons, port forwarding, data exfil |
token_theft |
CRITICAL–HIGH | Tools that accept or forward credentials as parameters |
supply_chain |
CRITICAL | Dynamic package install from user-controlled URLs |
schema_risk |
CRITICAL–MEDIUM | Command params, unbounded strings, freeform objects |
tool_shadowing |
HIGH–MEDIUM | Tool names that collide with common tools or other servers |
prompt_leakage |
HIGH | Tools that may echo, log, or expose internal prompts |
rate_limit |
MEDIUM | Descriptions suggesting unbounded/unthrottled usage |
webhook_persistence |
HIGH–MEDIUM | Callback/webhook params or tool names enabling persistent re-injection |
credential_in_schema |
CRITICAL–HIGH | Hardcoded credentials (API keys, JWTs, connection strings) in tool schemas |
config_tampering |
HIGH | Tools that can modify agent config, system prompt, or tool registry |
exfil_flow |
CRITICAL | Data flow from sensitive source tools to communication/network sinks |
jwt_algorithm |
CRITICAL–HIGH | JWT alg:none (signature bypass) or symmetric HMAC algorithms |
jwt_issuer |
MEDIUM | JWT missing iss (issuer) claim |
jwt_audience |
MEDIUM | JWT missing aud (audience) claim — enables cross-service replay |
jwt_token_id |
LOW | JWT missing jti — replay detection not possible |
jwt_ttl |
HIGH–MEDIUM | JWT with no exp or TTL exceeding threshold (default 4h) |
jwt_weak_key |
CRITICAL | JWT signed with a known weak/default HMAC key |
| Check | Severity | What It Detects |
|---|---|---|
rug_pull |
CRITICAL–HIGH | Tool list changes between two tools/list calls |
deep_rug_pull |
CRITICAL | Tool list/schema changes after invoking tools — catches state-dependent rug pulls, injection pattern drift (clean → poisoned after N calls) |
tool_response_injection |
CRITICAL–HIGH | Injection payloads, exfil URLs, hidden content, invisible Unicode, or base64-encoded attacks in tool responses |
cross_tool_manipulation |
HIGH | Tool output that directs the LLM to invoke a different tool |
input_sanitization |
CRITICAL–HIGH | Path traversal, command injection, template injection, SQL injection probes reflected unsanitized. LLM-aware SSTI: confirmed engine fingerprints (Jinja2/Mako/ERB/EL) stay CRITICAL; math-style template probes evaluated by the LLM (e.g. {{7*7}} → 49) are downgraded to MEDIUM so LLM-backed MCP servers are not false-flagged as code SSTI. |
error_leakage |
HIGH–MEDIUM | Stack traces, internal paths, connection strings, or secrets in error responses |
temporal_consistency |
CRITICAL–MEDIUM | Escalating injection, wildly inconsistent responses, or new threats across repeated identical calls |
resource_poisoning |
CRITICAL–HIGH | Base64-encoded injection, data URIs, steganographic Unicode, CSS-hidden HTML, or markdown image exfiltration in resource content |
state_mutation |
HIGH–MEDIUM | Resources that appear, disappear, or change content after tool invocations |
notification_abuse |
CRITICAL–MEDIUM | Unsolicited sampling/createMessage, roots/list, or other server-initiated requests |
indirect_injection |
CRITICAL–HIGH | Injection/poison patterns in resource content; probes content-processing tools with embedded injection payloads |
active_prompt_injection |
CRITICAL | Sends injection payloads as tool inputs — detects instruction following, system prompt leaks, and role overrides |
response_credentials |
CRITICAL–HIGH | Credentials (API keys, passwords, private keys, connection strings) in tool responses |
| Check | Severity | What It Detects |
|---|---|---|
auth |
HIGH | Unauthenticated MCP/tool-server initialize accepted |
sse_security |
HIGH–MEDIUM | Unauthenticated SSE stream, CORS misconfiguration, cross-origin POST |
multi_vector |
CRITICAL | 2+ dangerous vulnerability categories active on one server |
attack_chain |
CRITICAL | Linked vulnerability pairs (e.g. input_sanitization → code_execution) |
The deep probe checks call tools on the target server using safe, non-destructive payloads. Here's how it works:
For each tool, the scanner reads the inputSchema and generates minimal
arguments using context-aware safe values:
| Parameter Pattern | Safe Value |
|---|---|
path, file, dir |
/tmp/mcp_probe_MCP_PROBE_8f4c2a |
command, exec |
echo hello |
expression, code |
1+1 |
url, endpoint |
http://example.com/test |
username, name |
probe_user |
password, token |
MCP_PROBE_8f4c2a (canary) |
| Enum params | First enum value |
| Everything else | test |
Only required parameters are filled. Optional params are skipped to minimize side effects.
For input_sanitization, the scanner sends targeted probes based on
parameter semantics:
| Param Type | Probe Examples |
|---|---|
| File/path params | ../../../tmp/MCP_PROBE_8f4c2a, path null bytes |
| Command params | test; echo MCP_PROBE_8f4c2a, pipe/backtick variants |
| Query/SQL params | ' OR '1'='1, UNION SELECT |
| Other strings | {{7*7}}, ${7*7}, ERB/Jinja templates |
The canary string MCP_PROBE_8f4c2a is embedded in probes. If it appears
in the response, the tool reflected input without sanitization.
When tool_response_injection calls a tool, the response is cached in
probe_opts["_response_cache"]. Downstream checks like response_credentials
reuse the cache instead of re-invoking the same tools, eliminating redundant
calls and reducing scan time.
Every tool response is scanned for:
- Injection payloads — "ignore previous instructions", role overrides, system prompt markers
- Semantic injection — mode switches, secrecy directives, credential requests, XML/delimiter tool-call injection
- Exfiltration URLs — webhook, ngrok, burp, requestbin, pipedream, interactsh
- Hidden content — HTML comments,
<hidden>blocks,<script>tags - Invisible Unicode — zero-width chars, bidi overrides, invisible formatters
- Base64-encoded attacks — decoded and re-scanned for injection patterns
- Cross-tool references — "call tool X", "invoke function Y"
- LLM classification (with
--claude) — ambiguous responses sent to Claude for malicious/benign classification
./scan [OPTIONS]
Target Selection:
--targets URL [URL ...] One or more MCP target URLs
--port-range HOST:START-END Scan a port range (e.g. localhost:9001-9010)
--targets-file FILE Read URLs from file (one per line, # comments)
--public-targets Use built-in public targets list
Authentication:
--auth-token TOKEN Bearer token for authenticated endpoints
(or set MCP_AUTH_TOKEN env var)
--dpop-proof JWT Optional static DPoP header value
--header KEY:VALUE Extra HTTP header (repeatable)
--tls-verify Enable TLS certificate verification
--oidc-scope SCOPE Optional OAuth2 scope for client_credentials
--token-introspect-url URL Optional OAuth2 token introspection endpoint
--token-introspect-client-id ID
--token-introspect-client-secret SECRET
--jwks-url URL Optional JWKS endpoint for keyset metadata
Scan Options:
--timeout SEC Per-target connection timeout (default: 25)
--workers N Parallel scan workers (default: 4)
Stdio Transport:
--stdio CMD Scan a local MCP server via stdin/stdout JSON-RPC
(e.g. --stdio 'npx -y @modelcontextprotocol/server-everything')
Safety Controls:
--no-invoke Static-only: skip all behavioral probes (safe for production)
--safe-mode Skip dangerous tools (delete/send/exec/write), probe read-only
--probe-calls N Invocations per tool for deep rug pull (default: 10)
Performance:
--fast Sample top 5 security-relevant tools, skip heavy probes
--probe-workers N Parallel deep behavioral probe threads (default: 1)
--deterministic Stable ordering + single-thread probes/AI Phase 2 for repeatable benchmarking
--claude-phase2-workers N Parallel Claude workers for AI Phase 2 (default: 1)
--bedrock Route Claude calls through AWS Bedrock runtime
--bedrock-region REGION Bedrock region (e.g. us-east-1)
--bedrock-profile PROFILE AWS profile for Bedrock credentials
--bedrock-model MODEL_ID Bedrock model ID (default: anthropic.claude-3-5-sonnet-20241022-v2:0)
Tool Server:
--tool-names-file FILE Custom wordlist for ToolServer enumeration (supplements built-in)
Output:
--json FILE Write JSON report to FILE
--group-findings Collapse similar findings into compact grouped rows
--no-color Disable colored output (respects NO_COLOR env var)
--verbose, -v Verbose output
--debug Debug output (very noisy)
Differential:
--baseline FILE Compare against baseline
--save-baseline FILE Save scan as baseline
Kubernetes:
--k8s-namespace NS Namespace for internal checks (default: default)
--no-k8s Skip Kubernetes checks
--k8s-discover Auto-discover MCP targets via K8s service discovery
--k8s-discover-namespaces Namespaces to scan for MCP services
--k8s-no-probe Skip active probing during discovery (port match only)
--k8s-discovery-workers N Concurrent MCP probes during discovery (default: 10)
--k8s-max-endpoints N Cap number of MCP endpoints to scan (no limit by default)
--k8s-discover-only List discovered endpoints only; skip MCP scanning
| Mode | Flag | What Runs | Use Case |
|---|---|---|---|
| Full | (default) | Static + all behavioral probes | Dev/staging, DVMCP, CTFs |
| Fast | --fast |
Static + top-5 tools (tiered scoring), skip heavy probes (risk-aware: retains input_sanitization when dangerous params detected), cap workers at 2 |
Quick triage, large tool sets |
| Safe | --safe-mode |
Static + probes on read-only tools only | Prod servers with mixed tool risk |
| Static | --no-invoke |
Static checks only, no tool calls | Prod servers, zero side-effect risk |
| AI | --claude |
All checks + Claude analysis | Deep analysis, subtle vuln hunting |
In --fast mode, mcpnuke ranks all discovered tools using a tiered weighted
scoring algorithm (_tool_security_score) and selects the top 5. The scorer
considers:
| Factor | How It Works |
|---|---|
| Keyword tiers (6 levels) | Exec/eval/shell keywords score highest (10), followed by secret/credential (8), webhook/callback (7), run/command (6), upload/write/file (4), admin/root (3) |
| Name vs description | Keywords in the tool name get 3x the weight of keywords in the description |
| Dangerous parameters | Params named url, command, code, query, script, host, etc. add +8 each |
| Schema complexity | Number of input properties (capped at 3) adds a small bonus |
| High-value floor | Tools with names containing secret, credential, password, token, config, etc. get a minimum score of 15, even if other signals are weak |
This ensures zero-parameter tools like server-config and secrets.leak_config
rank above benign tools like smelt-item or move-to-position, and that tools
with dangerous parameter surfaces (run-maintenance, admin-webhook, fetch-skin)
are consistently selected.
Add --claude to any scan to layer LLM reasoning on top of deterministic checks.
Requires the anthropic package and ANTHROPIC_API_KEY env var.
By default, mcpnuke uses direct Claude API calls; Bedrock is opt-in via --bedrock.
Setup:
# If installed via quickstart.sh or uv sync --all-extras, anthropic is included.
# Otherwise install the AI extra:
uv pip install -e ".[ai]" # or: pip install anthropic
export ANTHROPIC_API_KEY=sk-ant-...For Bedrock mode, the same ai extra includes boto3; configure AWS credentials
and pass --bedrock (plus optional region/profile/model flags).
If --claude is used without the package or API key, mcpnuke exits immediately
with a clear error message instead of running the full scan first.
Usage:
# Sonnet (fast, default)
./scan --targets http://localhost:9002/sse --claude --verbose
# Opus (deepest reasoning)
./scan --targets http://localhost:9002/sse --claude --claude-model claude-opus-4-20250514
# Fast mode + Claude (deterministic fast scan, then AI analysis)
./scan --targets http://localhost:9090 --fast --claude --verbose
# Faster Claude Phase 2 on medium/large toolsets
./scan --targets http://localhost:9090 --fast --claude --claude-max-tools 25 --claude-phase2-workers 3
# Repeatable benchmarking mode (recommended for run-to-run comparisons)
./scan --targets http://localhost:9090 --fast --claude --deterministic --verbose
# Claude via Bedrock (no ANTHROPIC_API_KEY required)
./scan --targets http://localhost:9090 --fast --claude --bedrock --bedrock-region us-east-1--claude-phase2-workers guidance:
- Default is
1(serial). This is safe and works out of the box. - Use
2-4to reduce wall-clock time when Phase 2 dominates runtime. - Keep
1if your key is rate-limited or target/network is unstable. - This flag is optional; scans run normally without it.
--deterministic guidance:
- Forces stable tool ordering and single-threaded deep probes/AI Phase 2.
- Use this for benchmarking and CI drift checks when you need tighter run-to-run consistency.
- This does not remove model/target nondeterminism entirely, but it reduces scanner-side variance.
mcpnuke uses a three-layer analysis architecture. Each layer catches what the previous one can't:
Layer 1: Deterministic (regex patterns) — what tools SAY
Layer 2: Behavioral (call tools, probe) — what tools DO
Layer 3: Claude AI (read, reason, chain) — what tools MEAN
Claude runs three phases after deterministic + behavioral checks:
| Phase | What it does | Example finding |
|---|---|---|
| Tool analysis | Reads definitions for subtle poisoning, social engineering, logical risks | "These tools chain into a privilege escalation path" |
| Response analysis | Reads actual tool output for manipulation, hidden intent, credential leakage | "Tool response is a fake paywall — social engineering the LLM" |
| Chain reasoning | Connects all findings into multi-step attack scenarios | "Unauthenticated access → command injection → lateral movement → persistence" |
Real example from DVMCP Challenge 4 (Rug Pull):
| Layer | Findings | Score |
|---|---|---|
| Deterministic only | 5 (schema_risk, auth, SSE) | 26 |
| + Behavioral probes | 6 (+ deep_rug_pull) | 36 |
| + Claude Opus | 10 (+ social engineering, attack chains) | 64 |
AI findings are prefixed with [AI] and include taxonomy IDs (e.g. [AI] [MCP-T03]).
They appear alongside deterministic findings in the same report.
Tools are classified as dangerous if their name contains keywords like
delete, execute, send, write, deploy, kill, transfer, etc.
In --safe-mode, these are skipped while read-only tools (get, list,
search, check, verify, etc.) are still probed.
# Terminal 1: start challenge servers
./tests/dvmcp_reset.sh --setup-only
# Terminal 2: scan
./scan --port-range localhost:9001-9010 --verbose# Servers that use POST /execute with {"tool": "...", "query": "..."} instead of MCP
./scan --targets http://localhost:5000/execute --verbose
# With custom tool names wordlist for a specific engagement
./scan --targets http://localhost:5000/execute --tool-names-file my_tools.txtThe scanner auto-detects non-MCP tool servers by probing 20+ common
execute/invoke paths and fingerprints the framework (Flask, FastAPI, Express,
Spring Boot, etc.) from response headers. Tools are enumerated from a
built-in wordlist (data/tool_names.txt, 84 names) supplemented by any
custom wordlist. All static + behavioral checks run against discovered tools.
./scan --targets https://api.githubcopilot.com/mcp/ --auth-token ghp_xxx
# Or via env var
export MCP_AUTH_TOKEN=ghp_xxx
./scan --targets https://api.githubcopilot.com/mcp/./scan --targets https://mcp.deepwiki.com/mcpUse /mcp (Streamable HTTP), not /sse.
# Save baseline
./scan --targets http://localhost:9001 --save-baseline baseline.json
# Later: detect regressions
./scan --targets http://localhost:9001 --baseline baseline.jsonReports added/removed/modified tools, resources, prompts. New tools flagged as MEDIUM for review.
./scan --port-range localhost:9001-9010 --json report.jsonExit code is 1 if the scan completes and reports findings, 0 if clean,
and 2 on scan errors. Use in CI pipelines to gate deployments and to
separate “findings” from “scanner failure.”
# Full suite
uv run pytest tests/ -v
# DVMCP challenges only
uv run pytest tests/test_dvmcp.py -v
# Stop on first failure
uv run pytest tests/ -v -xDeploy mcpnuke as a K8s Job to scan cluster-internal MCP services and audit the Kubernetes posture from inside.
When a cluster has many services (dozens or hundreds of potential MCP endpoints):
- Parallel discovery — MCP probes run with
--k8s-discovery-workers(default 10). Increase for faster discovery:--k8s-discovery-workers 20. - Cap endpoints — Limit how many MCPs are scanned:
--k8s-max-endpoints 50. Annotation-sourced endpoints are kept first; then probed; then port-match. - Discover-only triage — List endpoints without running full MCP scans:
mcpnuke --k8s-discover --k8s-discover-only --json endpoints.jsonto export a URL list for triage or splitting across jobs. - Service fingerprinting — Uses the same worker count for parallel HTTP probes when enumerating frameworks and exposed actuator/debug paths.
Note: Use
mcpnuke(not./scan) in K8s manifests — inside the container the package is installed globally.
# Build the image
docker build -f mcpnuke/k8s/Dockerfile -t mcpnuke:latest .
# Deploy (read-only cluster access)
kubectl apply -k mcpnuke/k8s/manifests/
# Optional: enable full RBAC auditing (SA blast radius mapping)
kubectl apply -f mcpnuke/k8s/manifests/rbac-impersonate.yaml
# Check results
kubectl logs -n mcpnuke -l app.kubernetes.io/name=mcpnukeNote: The base deployment grants read-only access to services, pods, secrets, configmaps, and network policies. The optional
rbac-impersonate.yamladds ServiceAccount impersonation, which lets the scanner enumerate effective permissions for every SA in the target namespace. This is an elevated privilege -- apply it only if you want complete RBAC auditing. The scanner degrades gracefully without it.
| Check | What It Finds |
|---|---|
| RBAC enumeration | Which resources the scanner's SA can access (secrets, configmaps, pods) |
| SA blast radius | Maps effective permissions for every ServiceAccount; flags overprivileged accounts |
| Helm secret scanning | Decodes Helm release secrets (base64→base64→gzip) and scans values for private keys and credentials |
| Helm version drift | Compares release versions to find credentials removed in newer releases but still recoverable from old ones |
| Pod security | Privileged containers, hostNetwork/PID, dangerous capabilities, hostPath mounts, root UID, missing resource limits |
| ConfigMap leaks | Scans ConfigMap data for private keys and credential-named fields |
| NetworkPolicy audit | Flags namespaces with no network policies |
| Service fingerprinting | Identifies frameworks (Spring Boot, Flask, Express, etc.) and probes for exposed actuator, debug, swagger, and admin endpoints |
| MCP discovery | Auto-discovers MCP servers via annotations (mcp.io/enabled) and well-known port probing |
| Tool server detection | Detects non-MCP tool-execute APIs (POST /execute) by probing with tool-style payloads; enumerates available tools by name |
Use the CronJob manifest for periodic auditing:
kubectl apply -f mcpnuke/k8s/manifests/cronjob.yamlDefault schedule: every 6 hours. Edit the spec.schedule field to change.
Edit k8s/manifests/job.yaml args to target specific namespaces:
args:
- "--k8s-discover"
- "--k8s-discover-namespaces"
- "my-namespace"
- "--k8s-namespace"
- "my-namespace"
- "--verbose"
- "--json"
- "/reports/scan.json".
├── quickstart.sh # One-command setup (venv + install + tests)
├── scan # Zero-config runner (no venv activation needed)
├── mcpnuke/ # Python package
│ ├── __init__.py # Version, package docstring
│ ├── __main__.py # Entry point (python -m mcpnuke)
│ ├── cli.py # Argument parsing
│ ├── scanner.py # Scan orchestration, parallel execution, cross-target analysis
│ ├── diff.py # Differential scanning (baseline save/load/compare)
│ ├── core/
│ │ ├── constants.py # Protocol versions, severity weights, attack chain patterns
│ │ ├── enumerator.py # MCP handshake: initialize → list tools/resources/prompts
│ │ ├── models.py # Finding, TargetResult dataclasses
│ │ └── session.py # SSE + HTTP + Stdio + ToolServer transport detection and sessions
│ ├── patterns/
│ │ ├── rules.py # Static regex patterns (injection, poison, theft, exec, etc.)
│ │ └── probes.py # Behavioral probe payloads, canary strings, response analysis
│ ├── checks/
│ │ ├── __init__.py # Check registry and run_all_checks() orchestrator
│ │ ├── injection.py # prompt_injection, tool_poisoning, indirect_injection, active_prompt_injection
│ │ ├── permissions.py # excessive_permissions, schema_risks
│ │ ├── behavioral.py # rug_pull, deep_rug_pull, state_mutation, notification_abuse
│ │ ├── tool_probes.py # response_injection, input_sanitization, error_leakage
│ │ ├── theft.py # token_theft
│ │ ├── execution.py # code_execution, remote_access
│ │ ├── chaining.py # tool_shadowing, multi_vector, attack_chains
│ │ ├── transport.py # sse_security (CORS, unauth SSE, cross-origin POST)
│ │ ├── rate_limit.py # rate_limit
│ │ ├── prompt_leakage.py # prompt_leakage
│ │ ├── supply_chain.py # supply_chain
│ │ ├── webhook_persistence.py # webhook_persistence (name + param detection)
│ │ ├── credential_in_schema.py # credential_in_schema
│ │ ├── config_tampering.py # config_tampering
│ │ ├── exfil_flow.py # exfil_flow (source→sink with live verification)
│ │ └── response_credentials.py # response_credentials (cached response reuse)
│ ├── data/
│ │ ├── public_targets.txt # Built-in target URLs (DVMCP, public MCP servers)
│ │ └── tool_names.txt # Wordlist for ToolServer tool enumeration
│ ├── k8s/
│ │ ├── scanner.py # RBAC, Helm secrets, pod security, SA blast radius
│ │ ├── discovery.py # MCP auto-discovery via annotations + port probing
│ │ ├── fingerprint.py # Framework detection + exposed endpoint probing
│ │ ├── Dockerfile # Multi-stage Python 3.12-slim image
│ │ └── manifests/ # Kustomize-ready K8s deployment manifests
│ └── reporting/
│ ├── console.py # Rich table output
│ └── json_out.py # JSON report writer
├── tests/ # Pytest suite (224 tests, incl. DVMCP challenges)
│ ├── test_dvmcp.py # DVMCP challenges 1-10 (offline + optional live)
│ ├── test_cli.py # CLI argument parsing
│ ├── test_diff.py # Differential scanning
│ ├── test_k8s.py # Kubernetes checks
│ ├── test_fast_sampling.py # _tool_security_score + _pick_security_relevant
│ ├── test_webhook_persistence.py
│ ├── test_response_credentials.py
│ ├── test_exfil_flow.py
│ ├── test_config_tampering.py
│ ├── test_credential_in_schema.py
│ └── ...
├── walkthrough/ # Hands-on DVMCP guide + automated demo
│ ├── README.md # Progressive walkthrough with annotated findings
│ └── demo.sh # Zero-to-findings automated demo script
├── pyproject.toml # Project metadata, dependencies, entry points
├── CHANGELOG.md
└── README.md
Score = SUM(finding_weights)
CRITICAL → 10 points
HIGH → 7 points
MEDIUM → 4 points
LOW → 1 point
Rating:
≥ 20 → CRITICAL
≥ 10 → HIGH
≥ 5 → MEDIUM
≥ 1 → LOW
0 → CLEAN
After all individual checks run, the scanner looks for linked vulnerability pairs that combine into compound attack paths:
| Chain | Risk |
|---|---|
prompt_injection → code_execution |
Injection leads to RCE |
prompt_injection → token_theft |
Injection leads to credential exfil |
code_execution → token_theft |
RCE used to steal credentials |
code_execution → remote_access |
RCE to persistent access |
indirect_injection → token_theft |
Poisoned data exfils creds |
tool_response_injection → cross_tool_manipulation |
Output hijacks tool flow |
deep_rug_pull → tool_poisoning |
Post-trust tool mutation |
input_sanitization → code_execution |
Unsanitized input to RCE |
resource_poisoning → tool_response_injection |
Poisoned resource feeds tool |
cross_tool_manipulation → token_theft |
Tool chaining steals creds |
webhook_persistence → tool_response_injection |
Persistent callback feeds poisoned responses |
webhook_persistence → token_theft |
Webhook exfils credentials |
config_tampering → code_execution |
Config rewrite enables RCE |
config_tampering → webhook_persistence |
Config rewrite installs persistent callback |
response_credentials → token_theft |
Leaked creds enable further theft |
response_credentials → remote_access |
Leaked creds enable lateral movement |
exfil_flow → token_theft |
Source→sink pipeline steals creds |
exfil_flow → remote_access |
Source→sink pipeline enables remote access |
Chains are reported as CRITICAL with evidence-based tool names (e.g.
input_sanitization → code_execution (execute_command)) and appear in the
"Attack Chains Detected" section of the scan output.
DVMCP provides 10 deliberately vulnerable MCP servers for testing:
| Challenge | Port | Vulnerability |
|---|---|---|
| 1. Basic Prompt Injection | 9001 | Sensitive credentials in resources |
| 2. Tool Poisoning | 9002 | execute_command with shell=True |
| 3. Excessive Permissions | 9003 | file_manager with read/write/delete |
| 4. Rug Pull Attack | 9004 | Tool behavior changes after N calls |
| 5. Tool Shadowing | 9005 | Tool name conflicts |
| 6. Indirect Prompt Injection | 9006 | Injection via data sources |
| 7. Token Theft | 9007 | Passwords/tokens as parameters |
| 8. Code Execution | 9008 | eval() on user input |
| 9. Remote Access Control | 9009 | Command injection via remote_access |
| 10. Multi-Vector Attack | 9010 | Chained vulnerabilities |
# Run offline DVMCP challenge tests (no servers needed)
.venv/bin/pytest tests/test_dvmcp.py -v
# One-time setup for live testing
git clone https://github.com/harishsg993010/damn-vulnerable-MCP-server.git \
tests/test_targets/DVMCP
# Reset to baseline + start servers + scan (recommended)
./tests/dvmcp_reset.sh --scan
# Or step by step:
./tests/dvmcp_reset.sh # reset + start servers
./scan --port-range localhost:9001-9010 --verbose
# Scan specific challenges
./scan --targets http://localhost:9002 http://localhost:9008
# Deeper rug pull probing (more calls per tool, default is 10)
./scan --port-range localhost:9001-9010 --probe-calls 15
# Static-only scan (no tool calls)
./scan --port-range localhost:9001-9010 --no-invoke
# Run live DVMCP tests
DVMCP_LIVE=1 .venv/bin/pytest tests/test_dvmcp.py -v
# Kill servers + clean state
./tests/dvmcp_reset.sh --kill-only| Code | Meaning |
|---|---|
| 0 | Clean — scan finished with no findings |
| 1 | Findings — at least one finding was reported |
| 2 | Error — scan did not complete successfully (e.g. unreachable target, bad flags) |
For ecosystem architecture, walkthroughs, and cross-project guides: agentic-sec — the central documentation for camazotz + nullfield + mcpnuke.