sitepulse is a Rust-based CLI and MCP-enabled site intelligence tool for technical SEO, sitemap health checks, and AI agent readiness audits.
It discovers URLs from a sitemap.xml, checks each page's HTTP status, response time, redirect state, final URL, and optional metadata, then produces terminal, CSV, JSON, HTML, JUnit, and SARIF reports. It also includes an --agent-ready audit inspired by emerging agent-web standards such as llms.txt, AI crawler rules, discovery headers, protocol discovery, structured data, DNS-AID, and agentic commerce signals.
For AI-native workflows, sitepulse can run as a local Model Context Protocol (MCP) server via sitepulse mcp, allowing Codex-compatible apps and other MCP clients to call sitemap checks, agent readiness audits, and config validation as structured tools.
The project is designed for WordPress, WooCommerce, e-commerce, publisher, and SaaS websites that need to detect broken links, 404/500 errors, redirect issues, slow pages, metadata gaps, and whether the site is ready for AI agents and crawlers.
The first working version has been implemented.
Current features:
- `sitepulse check <SITEMAP_URL>` command
- `sitepulse mcp` command for Model Context Protocol integrations
- Standard sitemap parsing
- Sitemap index support
- Gzip sitemap support (`.xml.gz`)
- Maximum sitemap index depth: `2`
- Extract URLs from `<loc>...</loc>` entries
- Deduplicate repeated URLs
- HTTP status code reporting
- Response time measurement
- Redirect following
- Final URL reporting
- Timeout support
- Custom User-Agent support
- Concurrency support
- Per-request delay support for politeness/rate limiting
- Option to show only errors
- Summary-only output option
- Retry support for network errors and `5xx` responses
- GET/HEAD check method selection
- Optional title, meta description, and canonical URL extraction
- Same-host filtering option
- Optional robots.txt filtering
- Initial agent readiness audit (`--agent-ready`)
- CI-friendly agent readiness score threshold
- Maximum URL limit option
- Dry-run discovery mode
- CSV export
- JSON export
- HTML report export
- CI-friendly non-zero exit option
- Summary report
- Top 10 slowest URLs
Default User-Agent:
sitepulse/0.1 (+https://example.local)
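
The `<loc>` extraction and deduplication steps from the feature list can be illustrated with a minimal sketch. This is not the project's actual parser: the function name `extract_locs` is hypothetical, and the sketch uses a plain string scan instead of a real XML parser, so it ignores namespaces, CDATA, and entity escapes.

```rust
use std::collections::HashSet;

/// Collect the text of every `<loc>...</loc>` element in document order,
/// dropping repeated URLs. Plain string scan, not a real XML parser.
fn extract_locs(xml: &str) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut urls = Vec::new();
    let mut rest = xml;

    while let Some(start) = rest.find("<loc>") {
        let after_open = &rest[start + "<loc>".len()..];
        let Some(end) = after_open.find("</loc>") else {
            break; // unterminated element: stop scanning
        };
        let url = after_open[..end].trim();
        // `insert` returns false when the URL was already seen.
        if !url.is_empty() && seen.insert(url.to_string()) {
            urls.push(url.to_string());
        }
        rest = &after_open[end + "</loc>".len()..];
    }
    urls
}
```

Keeping a `HashSet` next to the output `Vec` preserves the order in which URLs first appear in the sitemap while still rejecting duplicates in constant time.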
Requirements:
- Rust stable
- Cargo
Build the project:
cargo build
Build a release binary:
cargo build --release
Generated binary:
./target/release/sitepulse
Basic usage:
cargo run -- check https://example.com/sitemap.xml
Using the compiled binary:
sitepulse check https://example.com/sitemap.xml
Commands:
sitepulse check <SITEMAP_URL> [OPTIONS]
sitepulse config validate <FILE>
Options:
| Option | Description | Default |
|---|---|---|
| `--config <FILE>` | Load check options from a JSON config file | None |
| `--concurrency <N>` | Number of concurrent HTTP checks | 10 |
| `--delay-ms <MS>` | Delay before each URL check request in milliseconds | 0 |
| `--timeout <SECONDS>` | Request timeout in seconds | 10 |
| `--user-agent <VALUE>` | Custom User-Agent for all HTTP requests | sitepulse/0.1 (+https://example.local) |
| `--method <METHOD>` | HTTP method for URL checks: `get` or `head` | get |
| `--analyze-meta` | Extract page title, meta description, and canonical URL. Uses GET even with `--method=head` | Disabled |
| `--only-errors` | Show only network errors and 4xx/5xx responses | Disabled |
| `--summary-only` | Print only the summary, without the per-URL result table | Disabled |
| `--export <FILE>` | Write results to a CSV file | None |
| `--export-json <FILE>` | Write results to a JSON file | None |
| `--export-html <FILE>` | Write an HTML report | None |
| `--export-junit <FILE>` | Write URL check results as JUnit XML for CI systems | None |
| `--export-sarif <FILE>` | Write URL check findings as SARIF for code scanning systems | None |
| `--fail-on-errors` | Exit with code 2 if any 4xx, 5xx, timeout, or network error is found | Disabled |
| `--retries <N>` | Retry failed URL checks and 5xx responses | 0 |
| `--sitemap-retries <N>` | Retry sitemap downloads before failing | 2 |
| `--max-urls <N>` | Limit how many discovered URLs are checked | None |
| `--dry-run` | Discover and filter URLs without running HTTP checks | Disabled |
| `--same-host-only` | Only check URLs whose host matches the sitemap URL host | Disabled |
| `--respect-robots` | Filter out URLs disallowed by robots.txt | Disabled |
| `--agent-ready` | Run an agent readiness audit for the sitemap host | Disabled |
| `--agent-ready-export-json <FILE>` | Write agent readiness results to a JSON file | None |
| `--agent-ready-export-html <FILE>` | Write agent readiness results to an HTML file | None |
| `--agent-ready-fail-under <PERCENT>` | Exit with code 3 if agent readiness score is below the threshold | None |
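
The two CI-oriented exit codes in the table (2 for `--fail-on-errors`, 3 for `--agent-ready-fail-under`) can be pictured with the sketch below. The type and function names are hypothetical, and two points are assumptions rather than documented behavior: that a clean run exits with 0, and that URL errors take precedence when both conditions hold.

```rust
/// Outcome of one run, reduced to what the exit-code decision needs.
struct RunOutcome {
    /// True if any 4xx, 5xx, timeout, or network error was found.
    has_url_errors: bool,
    /// Agent readiness score as a percentage, if the audit ran.
    agent_score: Option<u8>,
}

/// Hypothetical option subset mirroring `--fail-on-errors` and
/// `--agent-ready-fail-under`.
struct FailOptions {
    fail_on_errors: bool,
    agent_ready_fail_under: Option<u8>,
}

fn exit_code(outcome: &RunOutcome, opts: &FailOptions) -> i32 {
    // Assumption: URL errors win when both failure conditions hold.
    if opts.fail_on_errors && outcome.has_url_errors {
        return 2;
    }
    if let (Some(score), Some(threshold)) = (outcome.agent_score, opts.agent_ready_fail_under) {
        if score < threshold {
            return 3;
        }
    }
    0 // Assumption: success is reported as exit code 0.
}
```

Distinct codes let a CI job tell a broken page apart from a readiness score that fell under the threshold without parsing the report.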
Examples:
cargo run -- check https://example.com/sitemap.xml --concurrency 20
cargo run -- check https://example.com/sitemap.xml --timeout 15
cargo run -- check https://example.com/sitemap.xml --method head
cargo run -- check https://example.com/sitemap.xml --analyze-meta
cargo run -- check https://example.com/sitemap.xml --only-errors
cargo run -- check https://example.com/sitemap.xml --export report.csv
cargo run -- check https://example.com/sitemap.xml --retries 2
cargo run -- check https://example.com/sitemap.xml --max-urls 100
cargo run -- check https://example.com/sitemap.xml --same-host-only
cargo run -- check https://example.com/sitemap.xml --respect-robots
cargo run -- check https://example.com/sitemap.xml --agent-ready
cargo run -- check https://example.com/sitemap.xml --sitemap-retries 3
cargo run -- check https://example.com/sitemap.xml \
--agent-ready \
--agent-ready-export-json agent-ready.json \
--agent-ready-export-html agent-ready.html \
--agent-ready-fail-under 80
Multiple options can be used together:
cargo run -- check https://example.com/sitemap.xml \
--concurrency 20 \
--timeout 10 \
--method head \
--analyze-meta \
--retries 2 \
--sitemap-retries 3 \
--max-urls 1000 \
--same-host-only \
--respect-robots \
--only-errors \
--export report.csv \
--export-json report.json \
--export-html report.html \
--agent-ready \
--agent-ready-export-json agent-ready.json \
--agent-ready-export-html agent-ready.html
Example output:
Checking sitemap: https://example.com/sitemap.xml
Concurrency: 20
Timeout: 10s
User-Agent: sitepulse/0.1 (+https://example.local)
Method: HEAD
Analyze meta: yes
Retries: 2
Sitemap retries: 2
Discovered URLs: 1240
STATUS  TIME   ATTEMPTS  METHOD  REDIRECT  ERROR  URL
------------------------------------------------------------------------------------------
200     184ms  1         HEAD    no        no     https://example.com/
301     96ms   1         HEAD    yes       no     https://example.com/old -> https://example.com/new
404     121ms  1         HEAD    no        no     https://example.com/missing-page
500     430ms  3         HEAD    no        no     https://example.com/broken
Summary:
Total: 1240
2xx: 1190
3xx: 22
4xx: 20
5xx: 4
Errors: 4
Average response time: 218ms
Slowest URLs:
1. 3820ms https://example.com/category/electronics
2. 2910ms https://example.com/product/example
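
The summary figures in the output above can be derived from the per-URL results roughly as follows. This is a sketch under assumptions, not the project's `report.rs`: the `CheckResult` and `Summary` types are hypothetical, and results without an HTTP status (timeouts, network errors) are assumed to be counted under `Errors`, which matches the sample where the five buckets add up to the total.

```rust
/// One checked URL: `status` is `None` for timeouts and network errors.
struct CheckResult {
    url: String,
    status: Option<u16>,
    time_ms: u64,
}

#[derive(Debug, Default, PartialEq)]
struct Summary {
    total: usize,
    ok_2xx: usize,
    redirect_3xx: usize,
    client_4xx: usize,
    server_5xx: usize,
    errors: usize,
    average_ms: u64,
}

fn summarize(results: &[CheckResult]) -> Summary {
    let mut s = Summary { total: results.len(), ..Summary::default() };
    for r in results {
        match r.status {
            Some(200..=299) => s.ok_2xx += 1,
            Some(300..=399) => s.redirect_3xx += 1,
            Some(400..=499) => s.client_4xx += 1,
            Some(500..=599) => s.server_5xx += 1,
            // Assumption: no response or an unexpected code counts as an error.
            _ => s.errors += 1,
        }
    }
    if !results.is_empty() {
        let total_ms: u64 = results.iter().map(|r| r.time_ms).sum();
        s.average_ms = total_ms / results.len() as u64; // integer average
    }
    s
}

/// Return up to `n` results ordered from slowest to fastest.
fn slowest(results: &[CheckResult], n: usize) -> Vec<&CheckResult> {
    let mut sorted: Vec<&CheckResult> = results.iter().collect();
    sorted.sort_by(|a, b| b.time_ms.cmp(&a.time_ms));
    sorted.truncate(n);
    sorted
}
```

Calling `slowest(&results, 10)` would produce the input for a "Top 10 slowest URLs" list like the one shown in the summary.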
Export to CSV:
cargo run -- check https://example.com/sitemap.xml --export report.csv
Export to JSON:
cargo run -- check https://example.com/sitemap.xml --export-json report.json
Export to HTML:
cargo run -- check https://example.com/sitemap.xml --export-html report.html
CSV, JSON, and HTML result fields include:
- `url`
- `status`
- `time_ms`
- `redirected`
- `final_url`
- `error`
- `attempts`
- `method`
- `title`
- `meta_description`
- `canonical_url`
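
To make the export fields concrete, here is a sketch of how one result could be written as a CSV line. The field names and their order come from the list above; everything else is an assumption for illustration, including the `Row` type, the field types, the `true`/`false` encoding of `redirected`, and empty strings for missing values. The real exporter may differ.

```rust
/// Hypothetical row type using the documented field names.
struct Row<'a> {
    url: &'a str,
    status: Option<u16>,
    time_ms: u64,
    redirected: bool,
    final_url: &'a str,
    error: &'a str,
    attempts: u32,
    method: &'a str,
    title: &'a str,
    meta_description: &'a str,
    canonical_url: &'a str,
}

const HEADER: &str = "url,status,time_ms,redirected,final_url,error,attempts,method,title,meta_description,canonical_url";

/// Quote a field when it contains a comma, quote, or line break,
/// doubling embedded quotes (the RFC 4180 convention).
fn csv_escape(field: &str) -> String {
    if field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

fn to_csv_line(row: &Row<'_>) -> String {
    // Assumption: a missing status is written as an empty field.
    let status = row.status.map(|s| s.to_string()).unwrap_or_default();
    let fields = [
        row.url.to_string(),
        status,
        row.time_ms.to_string(),
        row.redirected.to_string(),
        row.final_url.to_string(),
        row.error.to_string(),
        row.attempts.to_string(),
        row.method.to_string(),
        row.title.to_string(),
        row.meta_description.to_string(),
        row.canonical_url.to_string(),
    ];
    fields.iter().map(|f| csv_escape(f)).collect::<Vec<_>>().join(",")
}
```

Escaping matters mostly for `title` and `meta_description`, which are free text and often contain commas or quotes.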
src/
main.rs # Application entry point
cli.rs # CLI arguments and command definitions
sitemap.rs # Sitemap download, parsing, and discovery
checker.rs # URL HTTP checks
report.rs # Terminal output and summary report
export.rs # CSV, JSON, and HTML export
models.rs # Shared data models
examples/
sitemap.xml # Example sitemap for testing
--config accepts a JSON file with check options. Example:
{
"concurrency": 5,
"timeout": 15,
"method": "head",
"analyze_meta": true,
"same_host_only": true,
"respect_robots": true,
"agent_ready": true,
"agent_ready_fail_under": 70
}
Command-line options are parsed first, then config values are applied. For repeated audits, keep shared defaults in a config file and pass target-specific values such as the sitemap URL on the command line.
Format code:
cargo fmt
Run compile checks:
cargo check
Run tests:
cargo test
Completed:
- Project skeleton
- `Cargo.toml`
- CLI command
- Sitemap download
- URL parsing
- HTTP checks
- Concurrency
- Per-request delay support for politeness/rate limiting
- Timeout
- Custom User-Agent support
- `--only-errors`
- `--summary-only`
- Retry support
- Sitemap download retry support
- GET/HEAD check method selection
- Optional title, meta description, and canonical URL extraction
- Same-host filtering option
- Optional robots.txt filtering
- Initial agent readiness audit (`--agent-ready`)
- CI-friendly agent readiness score threshold
- Maximum URL limit option
- Dry-run discovery mode
- CSV export
- JSON export
- HTML report export
- CI-friendly `--fail-on-errors` option
- Sitemap index support
- Gzip sitemap support
- Slow URL list
- README
- Integration tests with a local HTTP server
- Expanded agent readiness audit (`--agent-ready`)
  - Discoverability checks: `robots.txt`, sitemap directives, `Link` headers, DNS-AID
  - Content accessibility checks: `llms.txt`, `llms-full.txt`, Markdown negotiation
  - Bot access control checks: AI bot rules, allow/block detection, Content Signals, Web Bot Auth
  - Protocol discovery checks: MCP, Agent Skills, WebMCP, A2A, API catalog, OAuth, `auth.md`
  - Page intelligence checks: title, meta description, canonical URL, OpenGraph, JSON-LD, semantic HTML
  - Commerce readiness checks: x402, MPP, UCP, ACP
  - Scoring/reporting: score, PASS/WARN/FAIL checklist, JSON/HTML exports
- Add GitHub release workflow for tagged binary releases
- Automated versioning with release-plz
- Add configuration file support for repeated audits
- Add basic per-request politeness delay
- Add JUnit and SARIF CI exports
- Richer structured data validation for JSON-LD schema types
- Per-host concurrency controls
- Add Homebrew tap formula draft
- Add `v0.1.0` release notes draft
- Add advanced per-host rate window controls
- Publish GitHub release notes and binaries for `v0.1.0`
- Publish prebuilt release binaries
Notes on current behavior:
- HTTP errors do not crash the program; they are reported per URL.
- If the sitemap cannot be downloaded or the XML is invalid, the program returns a clear error.
- Redirects are followed and the final URL is recorded.
- Duplicate URLs are deduplicated.
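
The first note, together with the retry behavior described for `--retries`, can be sketched as a loop that records an outcome for each URL instead of aborting. The `Outcome` type and `check_with_retries` function are hypothetical, and the `attempt` closure stands in for a real HTTP request; the sketch also omits any delay between attempts.

```rust
/// Hypothetical per-URL outcome: either an HTTP status or a transport error.
#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    Status(u16),
    NetworkError(String),
}

/// Run `attempt` up to `1 + retries` times. Network errors and 5xx responses
/// are retried; anything else is returned immediately.
/// Returns the last outcome together with the number of attempts made.
fn check_with_retries<F>(retries: u32, mut attempt: F) -> (Outcome, u32)
where
    F: FnMut() -> Outcome,
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        let outcome = attempt();
        let retryable = match &outcome {
            Outcome::Status(code) => (500..=599).contains(code),
            Outcome::NetworkError(_) => true,
        };
        // Stop on a non-retryable outcome or once the retry budget is spent.
        if !retryable || attempts > retries {
            return (outcome, attempts);
        }
    }
}
```

Because the failure is returned as a value, a caller can store it in the per-URL result and move on to the next URL. With `retries` set to 2, a URL that keeps returning 500 ends with three attempts.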
This project is licensed under the MIT License. See LICENSE for details.
Please see CONTRIBUTING.md for development setup, validation commands, and pull request guidelines.
Please see SECURITY.md for vulnerability reporting guidelines.
Please see CHANGELOG.md for release history.
Additional documentation:
- Usage guide
- Configuration
- Export formats
- Agent readiness audit
- MCP support
- CI guide
- Release process
- WordPress and WooCommerce guide
Versioning is automated with release-plz: https://release-plz.ieni.dev/. On pushes to main, the Release PR workflow analyzes conventional commits, updates Cargo.toml and CHANGELOG.md, and opens or updates a release pull request.
Recommended commit prefixes:
- feat: for new features
- fix: for bug fixes
- perf: for performance improvements
- docs: for documentation-only changes
- refactor: for internal changes
- ci: for CI changes
When the release PR is merged, release-plz can create the Git tag and GitHub Release. The existing Release workflow then builds and uploads prebuilt binaries for that tag.