mcpnuke

MCP Red Teaming & Security Scanner

Security scanner for Model Context Protocol servers. Combines static metadata analysis with active behavioral probing — connects to MCP servers, enumerates tools/resources/prompts, calls tools with safe payloads, and analyzes what comes back.

Works against standard MCP (SSE, Streamable HTTP), local stdio servers (npx, python, etc.), non-standard tool servers (POST /execute), and Kubernetes-internal MCP deployments.

Use with DVMCP for training, or point at any MCP server in dev/staging/prod.

See CHANGELOG.md for recent changes and planned work.

Install

Quickstart (recommended):

git clone https://github.com/babywyrm/mcpnuke.git && cd mcpnuke
./quickstart.sh

This creates a .venv, installs all extras (dev, ai, k8s), runs tests, and prints usage. After that, ./scan and uv run mcpnuke just work — no activation needed.

uv (manual):

uv sync --all-extras
uv run mcpnuke --help

No source .venv/bin/activate needed — uv run finds the project venv automatically.

Optional extras: dev (testing/linting), ai (Claude analysis), k8s (Kubernetes checks), all (everything).

pip (manual):

python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[all,dev]"

From PyPI (coming soon):

uv pip install 'mcpnuke[all]'

Verify your install:

mcpnuke --doctor

Quick Start

New to mcpnuke? Try the DVMCP Walkthrough -- a hands-on guide that scans 10 vulnerable MCP servers and explains every finding. Or run ./walkthrough/demo.sh for the fully automated version. For command recipes across camazotz, DVMCP, deterministic benchmarking, and Bedrock variations, see QUICKSTART.md.

# Single target
./scan --targets http://localhost:2266

# DVMCP challenges 1–10
./scan --port-range localhost:9001-9010 --verbose

# Authenticated endpoint (JWT, PAT, etc.)
./scan --targets https://api.githubcopilot.com/mcp/ --auth-token ghp_xxx

# OIDC auto-token (Keycloak, etc.)
./scan --targets http://localhost:9090/mcp \
  --oidc-url http://keycloak:8080/realms/myapp \
  --client-id myapp --client-secret SECRET

# OIDC with explicit scope, extra headers, and TLS verification
./scan --targets https://target.example/mcp \
  --oidc-url https://auth.example/realms/agentic \
  --client-id scanner --client-secret SECRET \
  --oidc-scope "mcp.read mcp.invoke" \
  --header "X-Tenant: blue" \
  --header "X-Agent-Flow: planner" \
  --tls-verify

# Optional: DPoP + token introspection + JWKS metadata checks
./scan --targets https://target.example/mcp \
  --auth-token "$ACCESS_TOKEN" \
  --dpop-proof "$DPOP_PROOF_JWT" \
  --token-introspect-url "https://auth.example/oauth2/introspect" \
  --token-introspect-client-id scanner \
  --token-introspect-client-secret SECRET \
  --jwks-url "https://auth.example/.well-known/jwks.json" \
  --tls-verify \
  --json auth-flow-report.json

# JSON report for CI
./scan --port-range localhost:9001-9010 --json report.json

# Differential scan (compare to baseline)
./scan --targets http://localhost:9001 --baseline baseline.json

# Scan a local MCP server via stdin/stdout (no proxy needed)
./scan --stdio 'npx -y @modelcontextprotocol/server-everything'

# Fast scan (~2min vs ~30min) — samples top 5 security-relevant tools, skips heavy probes
./scan --targets http://localhost:9090 --fast --verbose

# Grouped findings (compact report)
./scan --targets http://localhost:9090 --group-findings

# Parallel deep probes (faster behavioral phase)
./scan --targets http://localhost:9090 --probe-workers 4

# AI-powered analysis (requires ANTHROPIC_API_KEY)
./scan --targets http://localhost:9002/sse --claude --verbose
./scan --targets http://localhost:9002/sse --claude --claude-model claude-opus-4-20250514
./scan --targets http://localhost:9002/sse --claude --claude-max-tools 25 --claude-phase2-workers 3

# AI-powered analysis via AWS Bedrock Claude (optional)
./scan --targets http://localhost:9002/sse --claude --bedrock --bedrock-region us-east-1

# Run tests
uv run pytest tests/ -v

All ./scan commands also work as uv run mcpnuke (no activation needed), mcpnuke (with venv activated), or .venv/bin/mcpnuke.

When --auth-token looks like a JWT, mcpnuke decodes it (without signature validation) and includes a safe claim summary in JSON output under auth_context.jwt_claims_summary to help validate agentic auth wiring. When token introspection or a JWKS URL is configured, summaries of those lookups are also included under auth_context; when they are not configured, scan behavior is unchanged.
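To make "decodes it without signature validation" concrete, here is a minimal sketch of that kind of decode. The function names and the selection of "safe" claims are assumptions for illustration, not mcpnuke's actual implementation.

```python
import base64
import json


def looks_like_jwt(token: str) -> bool:
    """Heuristic: three dot-separated segments with a non-empty header and payload."""
    parts = token.split(".")
    return len(parts) == 3 and all(parts[:2])


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def jwt_claims_summary(token: str) -> dict:
    """Decode the payload WITHOUT verifying the signature.

    Only a few non-secret claims are kept (hypothetical selection).
    """
    payload = json.loads(_b64url_decode(token.split(".")[1]))
    safe_keys = ("iss", "aud", "sub", "exp", "scope")
    return {key: payload[key] for key in safe_keys if key in payload}
```

Because the signature is never checked, a summary like this is only useful for inspecting how a token is wired up; it says nothing about whether the token is authentic.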

Exit codes: 0 — no findings (clean); 1 — findings reported; 2 — scan error (connection failure, invalid args, etc.). Use 1 vs 2 in CI to distinguish “vulns found” from “scanner failed.”
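A CI wrapper can branch on these three exit codes. The sketch below is one possible shape; `classify_exit` and `run_scan` are hypothetical helper names, and only the 0/1/2 meanings come from the text above.

```python
import subprocess
import sys

# Exit-code meanings as documented above.
EXIT_MEANINGS = {0: "clean", 1: "findings", 2: "scanner-error"}


def classify_exit(code: int) -> str:
    """Map a process exit code to a short label; anything else is 'unknown'."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_scan(argv: list[str]) -> str:
    """Run a scan command (e.g. ['./scan', '--targets', URL]) and classify it."""
    completed = subprocess.run(argv, check=False)
    return classify_exit(completed.returncode)
```

A pipeline might fail the build on "findings" but retry or alert on "scanner-error", since the latter says nothing about the target's security.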

How It Works

1. CONNECT        Detect transport (SSE, Streamable HTTP, stdio, or custom tool server)
2. ENUMERATE      initialize → tools/list → resources/list → prompts/list
                  (or probe tool names for non-MCP /execute APIs)
3. STATIC CHECKS  Pattern-match metadata (names, descriptions, schemas)
4. PROBE          Call tools with safe payloads, read resources
5. ANALYZE        Scan responses for injection, exfil, leakage, drift
6. AGGREGATE      Detect attack chains across findings
7. REPORT         Console table (or --group-findings) + optional JSON
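
The ENUMERATE step can be pictured as a short series of JSON-RPC 2.0 requests. This is a sketch only: the protocol version string and client name are assumptions, and a real MCP client also sends a `notifications/initialized` notification after the initialize response, which is omitted here.

```python
import json

MCP_LIST_METHODS = ("tools/list", "resources/list", "prompts/list")


def enumeration_requests(protocol_version: str = "2025-03-26") -> list[dict]:
    """Build the requests for initialize followed by the three list calls."""
    initialize = {
        "method": "initialize",
        "params": {
            "protocolVersion": protocol_version,  # assumed value
            "capabilities": {},
            "clientInfo": {"name": "example-scanner", "version": "0.0.0"},
        },
    }
    calls = [initialize] + [{"method": method} for method in MCP_LIST_METHODS]
    # Wrap every call in the JSON-RPC envelope with a unique integer id.
    return [{"jsonrpc": "2.0", "id": i, **call} for i, call in enumerate(calls, start=1)]


for request in enumeration_requests():
    print(json.dumps(request))
```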

Scan Phases

The scanner runs checks in a deliberate order:

Phase	Checks	What Happens
Static	prompt_injection, tool_poisoning, excessive_permissions, token_theft, code_execution, remote_access, schema_risks, rate_limit, prompt_leakage, supply_chain, tool_shadowing, webhook_persistence, credential_in_schema, config_tampering, exfil_flow	Pattern-match on tool names, descriptions, schemas. No server interaction beyond enumeration.
Behavioral	rug_pull, indirect_injection, protocol_robustness	Light interaction: re-list tools, read resources, send invalid methods.
Deep Probes	deep_rug_pull, tool_response_injection, input_sanitization, error_leakage, temporal_consistency, resource_poisoning, response_credentials, state_mutation, notification_abuse	Active tool invocation with safe payloads. Analyze responses for threats.
Transport	sse_security	CORS, unauthenticated SSE, cross-origin POST.
Aggregate	multi_vector, attack_chains	Cross-reference all prior findings to detect compound threats.
AI (optional)	llm_tool_analysis, llm_response_analysis, llm_chain_reasoning	Claude reads definitions, tool output, and all findings to identify subtle risks and multi-step attack chains. Requires `--claude`.

Security Checks Reference

Static Checks (metadata only)

Check	Severity	What It Detects
`prompt_injection`	CRITICAL	Injection payloads in tool/resource/prompt descriptions
`tool_poisoning`	CRITICAL	Hidden instructions, invisible Unicode in tool descriptions
`excessive_permissions`	CRITICAL–MEDIUM	Dangerous capabilities (shell, filesystem, network, DB, cloud)
`code_execution`	CRITICAL–HIGH	Tools with exec/eval/shell parameters or descriptions
`remote_access`	CRITICAL–HIGH	Reverse shells, C2 beacons, port forwarding, data exfil
`token_theft`	CRITICAL–HIGH	Tools that accept or forward credentials as parameters
`supply_chain`	CRITICAL	Dynamic package install from user-controlled URLs
`schema_risk`	CRITICAL–MEDIUM	Command params, unbounded strings, freeform objects
`tool_shadowing`	HIGH–MEDIUM	Tool names that collide with common tools or other servers
`prompt_leakage`	HIGH	Tools that may echo, log, or expose internal prompts
`rate_limit`	MEDIUM	Descriptions suggesting unbounded/unthrottled usage
`webhook_persistence`	HIGH–MEDIUM	Callback/webhook params or tool names enabling persistent re-injection
`credential_in_schema`	CRITICAL–HIGH	Hardcoded credentials (API keys, JWTs, connection strings) in tool schemas
`config_tampering`	HIGH	Tools that can modify agent config, system prompt, or tool registry
`exfil_flow`	CRITICAL	Data flow from sensitive source tools to communication/network sinks
`jwt_algorithm`	CRITICAL–HIGH	JWT `alg:none` (signature bypass) or symmetric HMAC algorithms
`jwt_issuer`	MEDIUM	JWT missing `iss` (issuer) claim
`jwt_audience`	MEDIUM	JWT missing `aud` (audience) claim — enables cross-service replay
`jwt_token_id`	LOW	JWT missing `jti` — replay detection not possible
`jwt_ttl`	HIGH–MEDIUM	JWT with no `exp` or TTL exceeding threshold (default 4h)
`jwt_weak_key`	CRITICAL	JWT signed with a known weak/default HMAC key
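
The JWT rows above can be read as simple predicates over a decoded header and claim set. The sketch below is an illustration under assumptions: the function name, the exact severity chosen within each range, and measuring TTL as `exp - iat` are not confirmed by this document. The `jwt_weak_key` check is not shown.

```python
def jwt_findings(header: dict, claims: dict, max_ttl_seconds: int = 4 * 3600) -> list[tuple[str, str]]:
    """Return (check, severity) pairs for a decoded, unverified JWT."""
    findings = []
    alg = str(header.get("alg", "")).lower()
    if alg == "none":
        findings.append(("jwt_algorithm", "CRITICAL"))  # signature bypass
    elif alg.startswith("hs"):
        findings.append(("jwt_algorithm", "HIGH"))      # symmetric HMAC
    if "iss" not in claims:
        findings.append(("jwt_issuer", "MEDIUM"))
    if "aud" not in claims:
        findings.append(("jwt_audience", "MEDIUM"))
    if "jti" not in claims:
        findings.append(("jwt_token_id", "LOW"))
    exp = claims.get("exp")
    if exp is None:
        findings.append(("jwt_ttl", "HIGH"))            # token never expires
    elif "iat" in claims and exp - claims["iat"] > max_ttl_seconds:
        findings.append(("jwt_ttl", "MEDIUM"))          # lifetime over threshold
    return findings
```

If `iat` is absent the lifetime cannot be computed this way, so the sketch skips the threshold comparison rather than guessing.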

Behavioral Checks (active server interaction)

Check	Severity	What It Detects
`rug_pull`	CRITICAL–HIGH	Tool list changes between two `tools/list` calls
`deep_rug_pull`	CRITICAL	Tool list/schema changes after invoking tools — catches state-dependent rug pulls, injection pattern drift (clean → poisoned after N calls)
`tool_response_injection`	CRITICAL–HIGH	Injection payloads, exfil URLs, hidden content, invisible Unicode, or base64-encoded attacks in tool responses
`cross_tool_manipulation`	HIGH	Tool output that directs the LLM to invoke a different tool
`input_sanitization`	CRITICAL–HIGH	Path traversal, command injection, template injection, SQL injection probes reflected unsanitized. LLM-aware SSTI: confirmed engine fingerprints (Jinja2/Mako/ERB/EL) stay CRITICAL; math-style template probes evaluated by the LLM (e.g. `{{7*7}}` → `49`) are downgraded to MEDIUM so LLM-backed MCP servers are not false-flagged as code SSTI.
`error_leakage`	HIGH–MEDIUM	Stack traces, internal paths, connection strings, or secrets in error responses
`temporal_consistency`	CRITICAL–MEDIUM	Escalating injection, wildly inconsistent responses, or new threats across repeated identical calls
`resource_poisoning`	CRITICAL–HIGH	Base64-encoded injection, data URIs, steganographic Unicode, CSS-hidden HTML, or markdown image exfiltration in resource content
`state_mutation`	HIGH–MEDIUM	Resources that appear, disappear, or change content after tool invocations
`notification_abuse`	CRITICAL–MEDIUM	Unsolicited `sampling/createMessage`, `roots/list`, or other server-initiated requests
`indirect_injection`	CRITICAL–HIGH	Injection/poison patterns in resource content; probes content-processing tools with embedded injection payloads
`active_prompt_injection`	CRITICAL	Sends injection payloads as tool inputs — detects instruction following, system prompt leaks, and role overrides
`response_credentials`	CRITICAL–HIGH	Credentials (API keys, passwords, private keys, connection strings) in tool responses

Transport & Aggregate Checks

Check	Severity	What It Detects
`auth`	HIGH	Unauthenticated MCP/tool-server initialize accepted
`sse_security`	HIGH–MEDIUM	Unauthenticated SSE stream, CORS misconfiguration, cross-origin POST
`multi_vector`	CRITICAL	2+ dangerous vulnerability categories active on one server
`attack_chain`	CRITICAL	Linked vulnerability pairs (e.g. `input_sanitization → code_execution`)

Behavioral Probing Methodology

The deep probe checks call tools on the target server using safe, non-destructive payloads. Here's how it works:

Safe Argument Generation

For each tool, the scanner reads the inputSchema and generates minimal arguments using context-aware safe values:

Parameter Pattern	Safe Value
`path`, `file`, `dir`	`/tmp/mcp_probe_MCP_PROBE_8f4c2a`
`command`, `exec`	`echo hello`
`expression`, `code`	`1+1`
`url`, `endpoint`	`http://example.com/test`
`username`, `name`	`probe_user`
`password`, `token`	`MCP_PROBE_8f4c2a` (canary)
Enum params	First enum value
Everything else	`test`

Only required parameters are filled. Optional params are skipped to minimize side effects.
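
A minimal sketch of this argument generation is shown below. The safe values come from the table above; the substring matching on parameter names, the rule that enums take priority, and the function name `safe_arguments` are assumptions. Non-string types would need extra handling that is left out here.

```python
CANARY = "MCP_PROBE_8f4c2a"

# (name fragments, safe value) pairs, checked in table order.
SAFE_VALUES = [
    (("path", "file", "dir"), f"/tmp/mcp_probe_{CANARY}"),
    (("command", "exec"), "echo hello"),
    (("expression", "code"), "1+1"),
    (("url", "endpoint"), "http://example.com/test"),
    (("username", "name"), "probe_user"),
    (("password", "token"), CANARY),
]


def safe_arguments(input_schema: dict) -> dict:
    """Fill only the required parameters of a tool's inputSchema."""
    properties = input_schema.get("properties", {})
    arguments = {}
    for name in input_schema.get("required", []):
        spec = properties.get(name, {})
        if spec.get("enum"):
            arguments[name] = spec["enum"][0]  # first enum value
            continue
        lowered = name.lower()
        arguments[name] = next(
            (value for keys, value in SAFE_VALUES if any(k in lowered for k in keys)),
            "test",  # fallback for everything else
        )
    return arguments
```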

Injection Probes

For input_sanitization, the scanner sends targeted probes based on parameter semantics:

Param Type	Probe Examples
File/path params	`../../../tmp/MCP_PROBE_8f4c2a`, path null bytes
Command params	`test; echo MCP_PROBE_8f4c2a`, pipe/backtick variants
Query/SQL params	`' OR '1'='1`, `UNION SELECT`
Other strings	`{{7*7}}`, `${7*7}`, ERB/Jinja templates

The canary string MCP_PROBE_8f4c2a is embedded in probes. If it appears in the response, the tool reflected input without sanitization.
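
The probe selection and canary check can be sketched as follows. The probe strings are taken from the table above; classifying a parameter by substrings of its name, and the helper names, are assumptions for illustration.

```python
CANARY = "MCP_PROBE_8f4c2a"

# One or two representative probes per parameter kind.
PROBES = {
    "path": [f"../../../tmp/{CANARY}"],
    "command": [f"test; echo {CANARY}"],
    "query": ["' OR '1'='1"],
    "other": ["{{7*7}}", "${7*7}"],
}


def param_kind(name: str) -> str:
    """Guess the parameter kind from its name (hypothetical heuristic)."""
    lowered = name.lower()
    if any(k in lowered for k in ("path", "file", "dir")):
        return "path"
    if any(k in lowered for k in ("command", "cmd", "exec")):
        return "command"
    if any(k in lowered for k in ("query", "sql")):
        return "query"
    return "other"


def canary_reflected(response_text: str) -> bool:
    """True when the canary string shows up in the tool response."""
    return CANARY in response_text
```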

Response Caching

When tool_response_injection calls a tool, the response is cached in probe_opts["_response_cache"]. Downstream checks like response_credentials reuse the cache instead of re-invoking the same tools, eliminating redundant calls and reducing scan time.

Response Analysis

Every tool response is scanned for:

Injection payloads — "ignore previous instructions", role overrides, system prompt markers
Semantic injection — mode switches, secrecy directives, credential requests, XML/delimiter tool-call injection
Exfiltration URLs — webhook, ngrok, burp, requestbin, pipedream, interactsh
Hidden content — HTML comments, <hidden> blocks, <script> tags
Invisible Unicode — zero-width chars, bidi overrides, invisible formatters
Base64-encoded attacks — decoded and re-scanned for injection patterns
Cross-tool references — "call tool X", "invoke function Y"
LLM classification (with --claude) — ambiguous responses sent to Claude for malicious/benign classification
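
Three of these response checks are sketched below: an injection phrase, invisible Unicode, and base64 that decodes to an injection phrase. The regular expressions are simplified stand-ins chosen for illustration, not the scanner's real pattern set.

```python
import base64
import binascii
import re

INJECTION_RE = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
# Zero-width characters, bidi overrides, and invisible formatters.
INVISIBLE_RE = re.compile(r"[\u200b-\u200f\u202a-\u202e\u2060-\u2064]")
BASE64_RE = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def scan_response(text: str) -> list[str]:
    """Return labels for the suspicious patterns found in a tool response."""
    hits = []
    if INJECTION_RE.search(text):
        hits.append("injection")
    if INVISIBLE_RE.search(text):
        hits.append("invisible_unicode")
    for blob in BASE64_RE.findall(text):
        try:
            decoded = base64.b64decode(blob, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64 text; ignore
        if INJECTION_RE.search(decoded):  # re-scan the decoded content
            hits.append("base64_injection")
            break
    return hits
```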

CLI Reference

./scan [OPTIONS]

Target Selection:
  --targets URL [URL ...]     One or more MCP target URLs
  --port-range HOST:START-END Scan a port range (e.g. localhost:9001-9010)
  --targets-file FILE         Read URLs from file (one per line, # comments)
  --public-targets            Use built-in public targets list

Authentication:
  --auth-token TOKEN          Bearer token for authenticated endpoints
                              (or set MCP_AUTH_TOKEN env var)
  --dpop-proof JWT            Optional static DPoP header value
  --header KEY:VALUE          Extra HTTP header (repeatable)
  --tls-verify                Enable TLS certificate verification
  --oidc-scope SCOPE          Optional OAuth2 scope for client_credentials
  --token-introspect-url URL  Optional OAuth2 token introspection endpoint
  --token-introspect-client-id ID
  --token-introspect-client-secret SECRET
  --jwks-url URL              Optional JWKS endpoint for keyset metadata

Scan Options:
  --timeout SEC               Per-target connection timeout (default: 25)
  --workers N                 Parallel scan workers (default: 4)

Stdio Transport:
  --stdio CMD                 Scan a local MCP server via stdin/stdout JSON-RPC
                              (e.g. --stdio 'npx -y @modelcontextprotocol/server-everything')

Safety Controls:
  --no-invoke                 Static-only: skip all behavioral probes (safe for production)
  --safe-mode                 Skip dangerous tools (delete/send/exec/write), probe read-only
  --probe-calls N             Invocations per tool for deep rug pull (default: 10)

Performance:
  --fast                      Sample top 5 security-relevant tools, skip heavy probes
  --probe-workers N           Parallel deep behavioral probe threads (default: 1)
  --deterministic             Stable ordering + single-thread probes/AI Phase 2 for repeatable benchmarking
  --claude-phase2-workers N   Parallel Claude workers for AI Phase 2 (default: 1)
  --bedrock                   Route Claude calls through AWS Bedrock runtime
  --bedrock-region REGION     Bedrock region (e.g. us-east-1)
  --bedrock-profile PROFILE   AWS profile for Bedrock credentials
  --bedrock-model MODEL_ID    Bedrock model ID (default: anthropic.claude-3-5-sonnet-20241022-v2:0)

Tool Server:
  --tool-names-file FILE      Custom wordlist for ToolServer enumeration (supplements built-in)

Output:
  --json FILE                 Write JSON report to FILE
  --group-findings            Collapse similar findings into compact grouped rows
  --no-color                  Disable colored output (respects NO_COLOR env var)
  --verbose, -v               Verbose output
  --debug                     Debug output (very noisy)

Differential:
  --baseline FILE             Compare against baseline
  --save-baseline FILE        Save scan as baseline

Kubernetes:
  --k8s-namespace NS          Namespace for internal checks (default: default)
  --no-k8s                    Skip Kubernetes checks
  --k8s-discover              Auto-discover MCP targets via K8s service discovery
  --k8s-discover-namespaces   Namespaces to scan for MCP services
  --k8s-no-probe              Skip active probing during discovery (port match only)
  --k8s-discovery-workers N   Concurrent MCP probes during discovery (default: 10)
  --k8s-max-endpoints N       Cap number of MCP endpoints to scan (no limit by default)
  --k8s-discover-only         List discovered endpoints only; skip MCP scanning

Scan Modes

Mode	Flag	What Runs	Use Case
Full	(default)	Static + all behavioral probes	Dev/staging, DVMCP, CTFs
Fast	`--fast`	Static + top-5 tools (tiered scoring), skip heavy probes (risk-aware: retains `input_sanitization` when dangerous params detected), cap workers at 2	Quick triage, large tool sets
Safe	`--safe-mode`	Static + probes on read-only tools only	Prod servers with mixed tool risk
Static	`--no-invoke`	Static checks only, no tool calls	Prod servers, zero side-effect risk
AI	`--claude`	All checks + Claude analysis	Deep analysis, subtle vuln hunting

Fast Mode Scoring

In --fast mode, mcpnuke ranks all discovered tools using a tiered weighted scoring algorithm (_tool_security_score) and selects the top 5. The scorer considers:

Factor	How It Works
Keyword tiers (6 levels)	Exec/eval/shell keywords score highest (10), followed by secret/credential (8), webhook/callback (7), run/command (6), upload/write/file (4), admin/root (3)
Name vs description	Keywords in the tool name get 3x the weight of keywords in the description
Dangerous parameters	Params named `url`, `command`, `code`, `query`, `script`, `host`, etc. add +8 each
Schema complexity	Number of input properties (capped at 3) adds a small bonus
High-value floor	Tools with names containing `secret`, `credential`, `password`, `token`, `config`, etc. get a minimum score of 15, even if other signals are weak

This ensures zero-parameter tools like server-config and secrets.leak_config rank above benign tools like smelt-item or move-to-position, and that tools with dangerous parameter surfaces (run-maintenance, admin-webhook, fetch-skin) are consistently selected.
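
The sketch below follows the factors in the table, under stated assumptions: tier scores are summed, a tier counts once per tool, and names are matched by substring. The real `_tool_security_score` may combine these signals differently; `tool_score` and `pick_top` are hypothetical names.

```python
KEYWORD_TIERS = [
    (10, ("exec", "eval", "shell")),
    (8, ("secret", "credential")),
    (7, ("webhook", "callback")),
    (6, ("run", "command")),
    (4, ("upload", "write", "file")),
    (3, ("admin", "root")),
]
DANGEROUS_PARAMS = {"url", "command", "code", "query", "script", "host"}
HIGH_VALUE_NAMES = ("secret", "credential", "password", "token", "config")


def tool_score(tool: dict) -> int:
    name = tool.get("name", "").lower()
    description = tool.get("description", "").lower()
    params = tool.get("inputSchema", {}).get("properties", {})
    score = 0
    for points, words in KEYWORD_TIERS:
        if any(word in name for word in words):
            score += points * 3  # keywords in the name carry 3x weight
        elif any(word in description for word in words):
            score += points
    score += 8 * sum(1 for p in params if p.lower() in DANGEROUS_PARAMS)
    score += min(len(params), 3)  # schema complexity bonus, capped at 3
    if any(word in name for word in HIGH_VALUE_NAMES):
        score = max(score, 15)  # high-value floor
    return score


def pick_top(tools: list[dict], n: int = 5) -> list[dict]:
    """Select the n highest-scoring tools."""
    return sorted(tools, key=tool_score, reverse=True)[:n]
```

Under these assumptions a zero-parameter tool named server-config reaches the floor of 15, while smelt-item scores 0.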

AI-Powered Analysis (Claude)

Add --claude to any scan to layer LLM reasoning on top of deterministic checks. By default, mcpnuke calls the Claude API directly, which requires the anthropic package and the ANTHROPIC_API_KEY env var. Bedrock is opt-in via --bedrock and uses AWS credentials instead of the API key.

Setup:

# If installed via quickstart.sh or uv sync --all-extras, anthropic is included.
# Otherwise install the AI extra:
uv pip install -e ".[ai]"    # or: pip install anthropic

export ANTHROPIC_API_KEY=sk-ant-...

For Bedrock mode, the same ai extra includes boto3; configure AWS credentials and pass --bedrock (plus optional region/profile/model flags).

If --claude is used without the package or API key, mcpnuke exits immediately with a clear error message instead of running the full scan first.

Usage:

# Sonnet (fast, default)
./scan --targets http://localhost:9002/sse --claude --verbose

# Opus (deepest reasoning)
./scan --targets http://localhost:9002/sse --claude --claude-model claude-opus-4-20250514

# Fast mode + Claude (deterministic fast scan, then AI analysis)
./scan --targets http://localhost:9090 --fast --claude --verbose

# Faster Claude Phase 2 on medium/large toolsets
./scan --targets http://localhost:9090 --fast --claude --claude-max-tools 25 --claude-phase2-workers 3

# Repeatable benchmarking mode (recommended for run-to-run comparisons)
./scan --targets http://localhost:9090 --fast --claude --deterministic --verbose

# Claude via Bedrock (no ANTHROPIC_API_KEY required)
./scan --targets http://localhost:9090 --fast --claude --bedrock --bedrock-region us-east-1

--claude-phase2-workers guidance:

Default is 1 (serial). This is safe and works out of the box.
Use 2-4 to reduce wall-clock time when Phase 2 dominates runtime.
Keep 1 if your key is rate-limited or target/network is unstable.
This flag is optional; scans run normally without it.

--deterministic guidance:

Forces stable tool ordering and single-threaded deep probes/AI Phase 2.
Use this for benchmarking and CI drift checks when you need tighter run-to-run consistency.
This does not remove model/target nondeterminism entirely, but it reduces scanner-side variance.

mcpnuke uses a three-layer analysis architecture. Each layer catches what the previous one can't:

Layer 1: Deterministic (regex patterns)     — what tools SAY
Layer 2: Behavioral (call tools, probe)     — what tools DO
Layer 3: Claude AI (read, reason, chain)    — what tools MEAN

Claude runs three phases after deterministic + behavioral checks:

Phase	What it does	Example finding
Tool analysis	Reads definitions for subtle poisoning, social engineering, logical risks	"These tools chain into a privilege escalation path"
Response analysis	Reads actual tool output for manipulation, hidden intent, credential leakage	"Tool response is a fake paywall — social engineering the LLM"
Chain reasoning	Connects all findings into multi-step attack scenarios	"Unauthenticated access → command injection → lateral movement → persistence"

Real example from DVMCP Challenge 4 (Rug Pull):

Layer	Findings	Score
Deterministic only	5 (schema_risk, auth, SSE)	26
+ Behavioral probes	6 (+ deep_rug_pull)	36
+ Claude Opus	10 (+ social engineering, attack chains)	64

AI findings are prefixed with [AI] and include taxonomy IDs (e.g. [AI] [MCP-T03]). They appear alongside deterministic findings in the same report.

Tools are classified as dangerous if their name contains keywords like delete, execute, send, write, deploy, kill, transfer, etc. In --safe-mode, these are skipped while read-only tools (get, list, search, check, verify, etc.) are still probed.
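
A minimal sketch of this name-based classification is below. The keyword set is the subset listed above (the real list is longer), and the helper names are hypothetical. Substring matching follows the "name contains" wording, which means a name such as `skill_lookup` would also match `kill`.

```python
DANGEROUS_WORDS = ("delete", "execute", "send", "write", "deploy", "kill", "transfer")


def is_dangerous(tool_name: str) -> bool:
    """True when the tool name contains any dangerous keyword."""
    lowered = tool_name.lower()
    return any(word in lowered for word in DANGEROUS_WORDS)


def probe_targets(tool_names: list[str], safe_mode: bool) -> list[str]:
    """In safe mode, drop dangerous tools; otherwise probe everything."""
    return [name for name in tool_names if not (safe_mode and is_dangerous(name))]
```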

Quickstart Scenarios

Scan DVMCP (all 10 challenges)

# Terminal 1: start challenge servers
./tests/dvmcp_reset.sh --setup-only

# Terminal 2: scan
./scan --port-range localhost:9001-9010 --verbose

Custom tool server (non-MCP /execute API)

# Servers that use POST /execute with {"tool": "...", "query": "..."} instead of MCP
./scan --targets http://localhost:5000/execute --verbose

# With custom tool names wordlist for a specific engagement
./scan --targets http://localhost:5000/execute --tool-names-file my_tools.txt

The scanner auto-detects non-MCP tool servers by probing 20+ common execute/invoke paths and fingerprints the framework (Flask, FastAPI, Express, Spring Boot, etc.) from response headers. Tools are enumerated from a built-in wordlist (data/tool_names.txt, 84 names) supplemented by any custom wordlist. All static + behavioral checks run against discovered tools.

Authenticated endpoint (GitHub MCP)

./scan --targets https://api.githubcopilot.com/mcp/ --auth-token ghp_xxx

# Or via env var
export MCP_AUTH_TOKEN=ghp_xxx
./scan --targets https://api.githubcopilot.com/mcp/

Remote public MCP (DeepWiki)

./scan --targets https://mcp.deepwiki.com/mcp

Use /mcp (Streamable HTTP), not /sse.

Differential scan

# Save baseline
./scan --targets http://localhost:9001 --save-baseline baseline.json

# Later: detect regressions
./scan --targets http://localhost:9001 --baseline baseline.json

The report lists added, removed, and modified tools, resources, and prompts. New tools are flagged as MEDIUM for review.
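
The comparison itself is a set difference over names plus an equality check on definitions. The sketch below assumes the baseline and the current scan are each reduced to a name-to-definition mapping; the actual baseline file layout is not described here.

```python
def diff_tools(baseline: dict, current: dict) -> dict:
    """Compare two {tool_name: definition} mappings."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(name for name in shared if baseline[name] != current[name]),
    }
```

The same function works unchanged for resources and prompts if they are keyed the same way.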

JSON report for CI

./scan --port-range localhost:9001-9010 --json report.json

Exit code is 1 if the scan completes and reports findings, 0 if clean, and 2 on scan errors. Use in CI pipelines to gate deployments and to separate “findings” from “scanner failure.”

Run tests

# Full suite
uv run pytest tests/ -v

# DVMCP challenges only
uv run pytest tests/test_dvmcp.py -v

# Stop on first failure
uv run pytest tests/ -v -x

Kubernetes Deployment

Deploy mcpnuke as a K8s Job to scan cluster-internal MCP services and audit the Kubernetes posture from inside.

Clusters with many MCPs

When a cluster has many services (dozens or hundreds of potential MCP endpoints):

Parallel discovery: MCP probes run concurrently, controlled by --k8s-discovery-workers (default 10). Increase the value for faster discovery, e.g. --k8s-discovery-workers 20.
Cap endpoints: limit how many MCP endpoints are scanned with --k8s-max-endpoints 50. When the cap applies, annotation-sourced endpoints are kept first, then endpoints confirmed by probing, then port-match-only endpoints.
Discover-only triage: run mcpnuke --k8s-discover --k8s-discover-only --json endpoints.json to list endpoints without running full MCP scans and export a URL list for triage or for splitting across jobs.
Service fingerprinting: uses the same worker count for parallel HTTP probes when enumerating frameworks and exposed actuator/debug paths.

Note: Use mcpnuke (not ./scan) in K8s manifests — inside the container the package is installed globally.

Quick deploy

# Build the image
docker build -f mcpnuke/k8s/Dockerfile -t mcpnuke:latest .

# Deploy (read-only cluster access)
kubectl apply -k mcpnuke/k8s/manifests/

# Optional: enable full RBAC auditing (SA blast radius mapping)
kubectl apply -f mcpnuke/k8s/manifests/rbac-impersonate.yaml

# Check results
kubectl logs -n mcpnuke -l app.kubernetes.io/name=mcpnuke

Note: The base deployment grants read-only access to services, pods, secrets, configmaps, and network policies. The optional rbac-impersonate.yaml adds ServiceAccount impersonation, which lets the scanner enumerate effective permissions for every SA in the target namespace. This is an elevated privilege -- apply it only if you want complete RBAC auditing. The scanner degrades gracefully without it.

What it checks in-cluster

Check	What It Finds
RBAC enumeration	Which resources the scanner's SA can access (secrets, configmaps, pods)
SA blast radius	Maps effective permissions for every ServiceAccount; flags overprivileged accounts
Helm secret scanning	Decodes Helm release secrets (base64→base64→gzip) and scans values for private keys and credentials
Helm version drift	Compares release versions to find credentials removed in newer releases but still recoverable from old ones
Pod security	Privileged containers, hostNetwork/PID, dangerous capabilities, hostPath mounts, root UID, missing resource limits
ConfigMap leaks	Scans ConfigMap data for private keys and credential-named fields
NetworkPolicy audit	Flags namespaces with no network policies
Service fingerprinting	Identifies frameworks (Spring Boot, Flask, Express, etc.) and probes for exposed actuator, debug, swagger, and admin endpoints
MCP discovery	Auto-discovers MCP servers via annotations (`mcp.io/enabled`) and well-known port probing
Tool server detection	Detects non-MCP tool-execute APIs (`POST /execute`) by probing with tool-style payloads; enumerates available tools by name
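
The base64→base64→gzip decoding in the Helm row can be sketched as below. It assumes the input is the `release` field of a Helm release Secret as returned by the Kubernetes API, and that user values sit somewhere in the decoded JSON; the credential-name heuristic and both function names are illustrative.

```python
import base64
import gzip
import json


def decode_helm_release(release_field: str) -> dict:
    """Undo the Secret encoding, then Helm's own base64, then gzip."""
    once = base64.b64decode(release_field)  # Kubernetes Secret encoding
    twice = base64.b64decode(once)          # Helm's own encoding
    return json.loads(gzip.decompress(twice))


def credential_paths(obj, prefix: str = "") -> list[str]:
    """List dotted paths of dict keys whose names look credential-like."""
    hits = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            if any(word in key.lower() for word in ("password", "secret", "token")):
                hits.append(path)
            hits.extend(credential_paths(value, path))
    return hits
```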

Recurring scans

Use the CronJob manifest for periodic auditing:

kubectl apply -f mcpnuke/k8s/manifests/cronjob.yaml

Default schedule: every 6 hours. Edit the spec.schedule field to change.

Customization

Edit the args in mcpnuke/k8s/manifests/job.yaml to target specific namespaces:

args:
  - "--k8s-discover"
  - "--k8s-discover-namespaces"
  - "my-namespace"
  - "--k8s-namespace"
  - "my-namespace"
  - "--verbose"
  - "--json"
  - "/reports/scan.json"

Project Structure

.
├── quickstart.sh              # One-command setup (venv + install + tests)
├── scan                       # Zero-config runner (no venv activation needed)
├── mcpnuke/                   # Python package
│   ├── __init__.py            # Version, package docstring
│   ├── __main__.py            # Entry point (python -m mcpnuke)
│   ├── cli.py                 # Argument parsing
│   ├── scanner.py             # Scan orchestration, parallel execution, cross-target analysis
│   ├── diff.py                # Differential scanning (baseline save/load/compare)
│   ├── core/
│   │   ├── constants.py       # Protocol versions, severity weights, attack chain patterns
│   │   ├── enumerator.py      # MCP handshake: initialize → list tools/resources/prompts
│   │   ├── models.py          # Finding, TargetResult dataclasses
│   │   └── session.py         # SSE + HTTP + Stdio + ToolServer transport detection and sessions
│   ├── patterns/
│   │   ├── rules.py           # Static regex patterns (injection, poison, theft, exec, etc.)
│   │   └── probes.py          # Behavioral probe payloads, canary strings, response analysis
│   ├── checks/
│   │   ├── __init__.py        # Check registry and run_all_checks() orchestrator
│   │   ├── injection.py       # prompt_injection, tool_poisoning, indirect_injection, active_prompt_injection
│   │   ├── permissions.py     # excessive_permissions, schema_risks
│   │   ├── behavioral.py      # rug_pull, deep_rug_pull, state_mutation, notification_abuse
│   │   ├── tool_probes.py     # response_injection, input_sanitization, error_leakage
│   │   ├── theft.py           # token_theft
│   │   ├── execution.py       # code_execution, remote_access
│   │   ├── chaining.py        # tool_shadowing, multi_vector, attack_chains
│   │   ├── transport.py       # sse_security (CORS, unauth SSE, cross-origin POST)
│   │   ├── rate_limit.py      # rate_limit
│   │   ├── prompt_leakage.py  # prompt_leakage
│   │   ├── supply_chain.py    # supply_chain
│   │   ├── webhook_persistence.py  # webhook_persistence (name + param detection)
│   │   ├── credential_in_schema.py # credential_in_schema
│   │   ├── config_tampering.py     # config_tampering
│   │   ├── exfil_flow.py           # exfil_flow (source→sink with live verification)
│   │   └── response_credentials.py # response_credentials (cached response reuse)
│   ├── data/
│   │   ├── public_targets.txt # Built-in target URLs (DVMCP, public MCP servers)
│   │   └── tool_names.txt     # Wordlist for ToolServer tool enumeration
│   ├── k8s/
│   │   ├── scanner.py         # RBAC, Helm secrets, pod security, SA blast radius
│   │   ├── discovery.py       # MCP auto-discovery via annotations + port probing
│   │   ├── fingerprint.py     # Framework detection + exposed endpoint probing
│   │   ├── Dockerfile         # Multi-stage Python 3.12-slim image
│   │   └── manifests/         # Kustomize-ready K8s deployment manifests
│   └── reporting/
│       ├── console.py         # Rich table output
│       └── json_out.py        # JSON report writer
├── tests/                     # Pytest suite (224 tests, incl. DVMCP challenges)
│   ├── test_dvmcp.py          # DVMCP challenges 1-10 (offline + optional live)
│   ├── test_cli.py            # CLI argument parsing
│   ├── test_diff.py           # Differential scanning
│   ├── test_k8s.py            # Kubernetes checks
│   ├── test_fast_sampling.py  # _tool_security_score + _pick_security_relevant
│   ├── test_webhook_persistence.py
│   ├── test_response_credentials.py
│   ├── test_exfil_flow.py
│   ├── test_config_tampering.py
│   ├── test_credential_in_schema.py
│   └── ...
├── walkthrough/               # Hands-on DVMCP guide + automated demo
│   ├── README.md              # Progressive walkthrough with annotated findings
│   └── demo.sh                # Zero-to-findings automated demo script
├── pyproject.toml             # Project metadata, dependencies, entry points
├── CHANGELOG.md
└── README.md

Risk Scoring

Score = SUM(finding_weights)

  CRITICAL  →  10 points
  HIGH      →   7 points
  MEDIUM    →   4 points
  LOW       →   1 point

Rating:
  ≥ 20  →  CRITICAL
  ≥ 10  →  HIGH
  ≥  5  →  MEDIUM
  ≥  1  →  LOW
     0  →  CLEAN
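
The weights and thresholds above translate directly into a few lines of code. The function names are hypothetical; the numbers are the ones listed.

```python
WEIGHTS = {"CRITICAL": 10, "HIGH": 7, "MEDIUM": 4, "LOW": 1}
THRESHOLDS = [(20, "CRITICAL"), (10, "HIGH"), (5, "MEDIUM"), (1, "LOW")]


def risk_score(severities: list[str]) -> int:
    """Sum the weight of every finding's severity."""
    return sum(WEIGHTS[severity] for severity in severities)


def risk_rating(score: int) -> str:
    """Map a score to the first threshold it meets, highest first."""
    for minimum, label in THRESHOLDS:
        if score >= minimum:
            return label
    return "CLEAN"
```

For example, one MEDIUM and one LOW finding sum to 5, which rates MEDIUM, while a single MEDIUM finding on its own (4 points) rates LOW.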

Attack Chain Detection

After all individual checks run, the scanner looks for linked vulnerability pairs that combine into compound attack paths:

Chain	Risk
`prompt_injection → code_execution`	Injection leads to RCE
`prompt_injection → token_theft`	Injection leads to credential exfil
`code_execution → token_theft`	RCE used to steal credentials
`code_execution → remote_access`	RCE to persistent access
`indirect_injection → token_theft`	Poisoned data exfils creds
`tool_response_injection → cross_tool_manipulation`	Output hijacks tool flow
`deep_rug_pull → tool_poisoning`	Post-trust tool mutation
`input_sanitization → code_execution`	Unsanitized input to RCE
`resource_poisoning → tool_response_injection`	Poisoned resource feeds tool
`cross_tool_manipulation → token_theft`	Tool chaining steals creds
`webhook_persistence → tool_response_injection`	Persistent callback feeds poisoned responses
`webhook_persistence → token_theft`	Webhook exfils credentials
`config_tampering → code_execution`	Config rewrite enables RCE
`config_tampering → webhook_persistence`	Config rewrite installs persistent callback
`response_credentials → token_theft`	Leaked creds enable further theft
`response_credentials → remote_access`	Leaked creds enable lateral movement
`exfil_flow → token_theft`	Source→sink pipeline steals creds
`exfil_flow → remote_access`	Source→sink pipeline enables remote access

Chains are reported as CRITICAL, labeled with the tool names that supplied the evidence (e.g. `input_sanitization → code_execution (execute_command)`), and appear in the "Attack Chains Detected" section of the scan output.
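
The pairing logic can be approximated as follows. This is a minimal sketch under stated assumptions, not the scanner's implementation: it assumes each finding reduces to a `(check_name, tool_name)` pair, `CHAINS` holds only a few rows of the table above, and `detect_chains` is a hypothetical name.

```python
# A few rows from the chain table above; the full table would be listed the same way.
CHAINS = [
    ("prompt_injection", "code_execution"),
    ("code_execution", "token_theft"),
    ("input_sanitization", "code_execution"),
    ("config_tampering", "webhook_persistence"),
]


def detect_chains(findings):
    """Return a label for every chain whose two checks both produced findings.

    findings: iterable of (check_name, tool_name) pairs (assumed data model).
    """
    tools_by_check = {}
    for check, tool in findings:
        tools_by_check.setdefault(check, set()).add(tool)

    detected = []
    for source, sink in CHAINS:
        if source in tools_by_check and sink in tools_by_check:
            # Collect the tools behind both halves of the chain as evidence.
            tools = sorted(tools_by_check[source] | tools_by_check[sink])
            detected.append(f"{source} → {sink} ({', '.join(tools)})")
    return detected


example_chains = detect_chains([
    ("input_sanitization", "execute_command"),
    ("code_execution", "execute_command"),
])
```

A finding with no partner produces no chain, so a lone `code_execution` result is still reported on its own but never escalated by this step.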

Testing with DVMCP

DVMCP provides 10 deliberately vulnerable MCP servers for testing:

Challenge	Port	Vulnerability
1. Basic Prompt Injection	9001	Sensitive credentials in resources
2. Tool Poisoning	9002	`execute_command` with `shell=True`
3. Excessive Permissions	9003	`file_manager` with read/write/delete
4. Rug Pull Attack	9004	Tool behavior changes after N calls
5. Tool Shadowing	9005	Tool name conflicts
6. Indirect Prompt Injection	9006	Injection via data sources
7. Token Theft	9007	Passwords/tokens as parameters
8. Code Execution	9008	`eval()` on user input
9. Remote Access Control	9009	Command injection via `remote_access`
10. Multi-Vector Attack	9010	Chained vulnerabilities

# Run offline DVMCP challenge tests (no servers needed)
.venv/bin/pytest tests/test_dvmcp.py -v

# One-time setup for live testing
git clone https://github.com/harishsg993010/damn-vulnerable-MCP-server.git \
    tests/test_targets/DVMCP

# Reset to baseline + start servers + scan (recommended)
./tests/dvmcp_reset.sh --scan

# Or step by step:
./tests/dvmcp_reset.sh                  # reset + start servers
./scan --port-range localhost:9001-9010 --verbose

# Scan specific challenges
./scan --targets http://localhost:9002 http://localhost:9008

# Deeper rug pull probing (more calls per tool, default is 10)
./scan --port-range localhost:9001-9010 --probe-calls 15

# Static-only scan (no tool calls)
./scan --port-range localhost:9001-9010 --no-invoke

# Run live DVMCP tests
DVMCP_LIVE=1 .venv/bin/pytest tests/test_dvmcp.py -v

# Kill servers + clean state
./tests/dvmcp_reset.sh --kill-only

Exit Code

Code	Meaning
0	Clean — scan finished with no findings
1	Findings — at least one finding was reported
2	Error — scan did not complete successfully (e.g. unreachable target, bad flags)
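
In CI, the three codes can drive a simple gate. The sketch below is an assumption about how a pipeline might consume them; `interpret_exit` and `should_fail_build` are hypothetical helpers, and only the code-to-meaning mapping comes from the table above.

```python
# Mapping taken from the Exit Code table; the helper names are illustrative.
EXIT_MEANINGS = {0: "clean", 1: "findings", 2: "error"}


def interpret_exit(returncode):
    """Translate a scan exit code into a short status string."""
    return EXIT_MEANINGS.get(returncode, "unknown")


def should_fail_build(returncode, allow_findings=False):
    """Decide whether a pipeline step should fail for this exit code.

    Errors and unrecognized codes always fail; findings fail unless
    the caller explicitly tolerates them.
    """
    status = interpret_exit(returncode)
    if status == "clean":
        return False
    if status == "findings":
        return not allow_findings
    return True
```

A wrapper script could obtain the code with `subprocess.run(["./scan", "--targets", "http://localhost:2266"]).returncode` and pass it to `should_fail_build`. Treating code 2 as a failure even when findings are tolerated keeps an unreachable target from being mistaken for a clean scan.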

Documentation Hub

For ecosystem architecture, walkthroughs, and cross-project guides, see agentic-sec, the central documentation for camazotz, nullfield, and mcpnuke.
