sitepulse

sitepulse is a Rust-based CLI and MCP-enabled site intelligence tool for technical SEO, sitemap health checks, and AI agent readiness audits.

It discovers URLs from a sitemap.xml, checks each page's HTTP status, response time, redirect state, final URL, and optional metadata, then produces terminal, CSV, JSON, HTML, JUnit, and SARIF reports. It also includes an --agent-ready audit inspired by emerging agent-web standards such as llms.txt, AI crawler rules, discovery headers, protocol discovery, structured data, DNS-AID, and agentic commerce signals.

For AI-native workflows, sitepulse can run as a local Model Context Protocol (MCP) server via sitepulse mcp, allowing Codex-compatible apps and other MCP clients to call sitemap checks, agent readiness audits, and config validation as structured tools.

The project is designed for WordPress, WooCommerce, e-commerce, publisher, and SaaS websites that need to detect broken links, 404/500 errors, redirect issues, slow pages, and metadata gaps, and to assess whether the site is ready for AI agents and crawlers.

Status

The first working version has been implemented.

Current features:

sitepulse check <SITEMAP_URL> command
sitepulse mcp command for Model Context Protocol integrations
Standard sitemap parsing
Sitemap index support
Gzip sitemap support (.xml.gz)
Maximum sitemap index depth: 2
Extract URLs from <loc>...</loc> entries
Deduplicate repeated URLs
HTTP status code reporting
Response time measurement
Redirect following
Final URL reporting
Timeout support
Custom User-Agent support
Concurrency support
Per-request delay support for politeness/rate limiting
Option to show only errors
Summary-only output option
Retry support for network errors and 5xx responses
GET/HEAD check method selection
Optional title, meta description, and canonical URL extraction
Same-host filtering option
Optional robots.txt filtering
Initial agent readiness audit (--agent-ready)
CI-friendly agent readiness score threshold
Maximum URL limit option
Dry-run discovery mode
CSV export
JSON export
HTML report export
CI-friendly non-zero exit option
Summary report
Top 10 slowest URLs
Default User-Agent:

sitepulse/0.1 (+https://example.local)

Installation

Requirements:

Rust stable
Cargo

Build the project:

cargo build

Build a release binary:

cargo build --release

Generated binary:

./target/release/sitepulse

Usage

Basic usage:

cargo run -- check https://example.com/sitemap.xml

Using the compiled binary:

sitepulse check https://example.com/sitemap.xml

CLI options

sitepulse check <SITEMAP_URL> [OPTIONS]

sitepulse config validate <FILE>

Options:

Option	Description	Default
`--config <FILE>`	Load check options from a JSON config file	None
`--concurrency <N>`	Number of concurrent HTTP checks	`10`
`--delay-ms <MS>`	Delay before each URL check request in milliseconds	`0`
`--timeout <SECONDS>`	Request timeout in seconds	`10`
`--user-agent <VALUE>`	Custom User-Agent for all HTTP requests	`sitepulse/0.1 (+https://example.local)`
`--method <METHOD>`	HTTP method for URL checks: `get` or `head`	`get`
`--analyze-meta`	Extract page title, meta description, and canonical URL. Uses GET even with `--method=head`	Disabled
`--only-errors`	Show only network errors and `4xx`/`5xx` responses	Disabled
`--summary-only`	Print only the summary, without the per-URL result table	Disabled
`--export <FILE>`	Write results to a CSV file	None
`--export-json <FILE>`	Write results to a JSON file	None
`--export-html <FILE>`	Write an HTML report	None
`--export-junit <FILE>`	Write URL check results as JUnit XML for CI systems	None
`--export-sarif <FILE>`	Write URL check findings as SARIF for code scanning systems	None
`--fail-on-errors`	Exit with code `2` if any `4xx`, `5xx`, timeout, or network error is found	Disabled
`--retries <N>`	Retry failed URL checks and `5xx` responses	`0`
`--sitemap-retries <N>`	Retry sitemap downloads before failing	`2`
`--max-urls <N>`	Limit how many discovered URLs are checked	None
`--dry-run`	Discover and filter URLs without running HTTP checks	Disabled
`--same-host-only`	Only check URLs whose host matches the sitemap URL host	Disabled
`--respect-robots`	Filter out URLs disallowed by robots.txt	Disabled
`--agent-ready`	Run an agent readiness audit for the sitemap host	Disabled
`--agent-ready-export-json <FILE>`	Write agent readiness results to a JSON file	None
`--agent-ready-export-html <FILE>`	Write agent readiness results to an HTML file	None
`--agent-ready-fail-under <PERCENT>`	Exit with code `3` if agent readiness score is below the threshold	None
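The two CI-oriented options in the table, `--fail-on-errors` (exit code `2`) and `--agent-ready-fail-under` (exit code `3`), can be pictured as a small decision function. The sketch below is illustrative only: `RunOutcome`, `exit_code`, and the rule that URL failures win when both conditions hold are assumptions, not taken from the sitepulse source.

```rust
/// Outcome of a finished run, reduced to what the exit code depends on.
/// Hypothetical type; only the codes 2 and 3 come from the options table.
struct RunOutcome {
    /// Number of 4xx, 5xx, timeout, and network-error results.
    failed_urls: usize,
    /// Agent readiness score in percent, present when the audit was run.
    agent_score: Option<f64>,
}

/// Map a run outcome and the two CI flags to a process exit code.
fn exit_code(outcome: &RunOutcome, fail_on_errors: bool, fail_under: Option<f64>) -> i32 {
    // Assumption: URL failures take precedence when both conditions hold.
    if fail_on_errors && outcome.failed_urls > 0 {
        return 2;
    }
    // The threshold only applies when a score exists and a threshold was given.
    if let (Some(score), Some(threshold)) = (outcome.agent_score, fail_under) {
        if score < threshold {
            return 3;
        }
    }
    0
}
```

In a CI job this means a run without either flag always exits with `0`, even when failing URLs are reported.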

Examples:

cargo run -- check https://example.com/sitemap.xml --concurrency 20

cargo run -- check https://example.com/sitemap.xml --timeout 15

cargo run -- check https://example.com/sitemap.xml --method head

cargo run -- check https://example.com/sitemap.xml --analyze-meta

cargo run -- check https://example.com/sitemap.xml --only-errors

cargo run -- check https://example.com/sitemap.xml --export report.csv

cargo run -- check https://example.com/sitemap.xml --retries 2

cargo run -- check https://example.com/sitemap.xml --max-urls 100

cargo run -- check https://example.com/sitemap.xml --same-host-only

cargo run -- check https://example.com/sitemap.xml --respect-robots

cargo run -- check https://example.com/sitemap.xml --agent-ready

cargo run -- check https://example.com/sitemap.xml --sitemap-retries 3

cargo run -- check https://example.com/sitemap.xml \
  --agent-ready \
  --agent-ready-export-json agent-ready.json \
  --agent-ready-export-html agent-ready.html \
  --agent-ready-fail-under 80

Multiple options can be used together:

cargo run -- check https://example.com/sitemap.xml \
  --concurrency 20 \
  --timeout 10 \
  --method head \
  --analyze-meta \
  --retries 2 \
  --sitemap-retries 3 \
  --max-urls 1000 \
  --same-host-only \
  --respect-robots \
  --only-errors \
  --export report.csv \
  --export-json report.json \
  --export-html report.html \
  --agent-ready \
  --agent-ready-export-json agent-ready.json \
  --agent-ready-export-html agent-ready.html
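The `--retries` option retries network errors and `5xx` responses. A minimal sketch of that rule follows, assuming `--retries N` allows up to `N + 1` attempts in total and that `4xx` responses are not retried. `Attempt`, `is_retryable`, and `check_with_retries` are hypothetical names, and the `fetch` closure stands in for the real HTTP request.

```rust
/// Result of one HTTP attempt (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
enum Attempt {
    Status(u16),
    NetworkError(String),
}

/// Network errors and 5xx responses are worth retrying; everything else is final.
fn is_retryable(attempt: &Attempt) -> bool {
    match attempt {
        Attempt::Status(code) => (500..=599).contains(code),
        Attempt::NetworkError(_) => true,
    }
}

/// Call `fetch` until it yields a non-retryable result or the retry budget
/// is used up. Returns the last result and the number of attempts made.
fn check_with_retries<F>(retries: u32, mut fetch: F) -> (Attempt, u32)
where
    F: FnMut() -> Attempt,
{
    let mut attempts = 0;
    loop {
        let result = fetch();
        attempts += 1;
        if !is_retryable(&result) || attempts > retries {
            return (result, attempts);
        }
    }
}
```

Under these assumptions, a URL that keeps returning `500` with `--retries 2` is requested three times, while a `404` is reported after a single attempt.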

Example terminal output

Checking sitemap: https://example.com/sitemap.xml
Concurrency: 20
Timeout: 10s
User-Agent: sitepulse/0.1 (+https://example.local)
Method: HEAD
Analyze meta: yes
Retries: 2
Sitemap retries: 2

Discovered URLs: 1240

STATUS      TIME ATTEMPTS  METHOD  REDIRECT    ERROR URL
------------------------------------------------------------------------------------------
200        184ms        1     HEAD        no       no https://example.com/
301         96ms        1     HEAD       yes       no https://example.com/old -> https://example.com/new
404        121ms        1     HEAD        no       no https://example.com/missing-page
500        430ms        3     HEAD        no       no https://example.com/broken

Summary:
Total: 1240
2xx: 1190
3xx: 22
4xx: 20
5xx: 4
Errors: 4
Average response time: 218ms

Slowest URLs:
1. 3820ms https://example.com/category/electronics
2. 2910ms https://example.com/product/example
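The summary above groups results by status class, and the five counts add up to the total (1190 + 22 + 20 + 4 + 4 = 1240). The sketch below shows one way such a summary and the slowest-URL list could be computed. The `Summary` type, the function names, and the choice to count results without an HTTP status as `errors` and to include them in the average are assumptions.

```rust
/// Aggregated counts for one run (hypothetical type for illustration).
#[derive(Debug, Default, PartialEq)]
struct Summary {
    total: usize,
    count_2xx: usize,
    count_3xx: usize,
    count_4xx: usize,
    count_5xx: usize,
    errors: usize,
    average_ms: u64,
}

/// Each input is `(status, time_ms)`; a `None` status means no HTTP response.
fn summarize(results: &[(Option<u16>, u64)]) -> Summary {
    let mut s = Summary { total: results.len(), ..Summary::default() };
    let mut total_ms: u64 = 0;
    for (status, time_ms) in results {
        total_ms += *time_ms;
        match *status {
            Some(200..=299) => s.count_2xx += 1,
            Some(300..=399) => s.count_3xx += 1,
            Some(400..=499) => s.count_4xx += 1,
            Some(500..=599) => s.count_5xx += 1,
            // Assumption: missing or unexpected statuses count as errors.
            _ => s.errors += 1,
        }
    }
    if !results.is_empty() {
        // Integer average, rounded down.
        s.average_ms = total_ms / results.len() as u64;
    }
    s
}

/// Return the `n` slowest `(url, time_ms)` pairs, slowest first.
fn slowest<'a>(results: &[(&'a str, u64)], n: usize) -> Vec<(&'a str, u64)> {
    let mut sorted = results.to_vec();
    sorted.sort_by(|a, b| b.1.cmp(&a.1));
    sorted.truncate(n);
    sorted
}
```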

Export

Export to CSV:

cargo run -- check https://example.com/sitemap.xml --export report.csv

Export to JSON:

cargo run -- check https://example.com/sitemap.xml --export-json report.json

Export to HTML:

cargo run -- check https://example.com/sitemap.xml --export-html report.html

CSV, JSON, and HTML result fields include:

url
status
time_ms
redirected
final_url
error
attempts
method
title
meta_description
canonical_url
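To make the field list concrete, here is a sketch of a record type with those eleven fields and a function that renders one CSV row. The field names come from the list above; the Rust types, the `true`/`false` rendering of `redirected`, the empty-string rendering of missing values, and the names `UrlResult`, `CSV_HEADER`, `csv_field`, and `to_csv_row` are assumptions and may differ from the actual export code.

```rust
/// One checked URL. Field names follow the export field list; types are assumed.
#[derive(Debug, Clone, Default)]
struct UrlResult {
    url: String,
    status: Option<u16>,
    time_ms: u64,
    redirected: bool,
    final_url: Option<String>,
    error: Option<String>,
    attempts: u32,
    method: String,
    title: Option<String>,
    meta_description: Option<String>,
    canonical_url: Option<String>,
}

const CSV_HEADER: &str =
    "url,status,time_ms,redirected,final_url,error,attempts,method,title,meta_description,canonical_url";

/// Quote a field when it contains a comma, quote, or line break (RFC 4180 style).
fn csv_field(value: &str) -> String {
    if value.chars().any(|c| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

/// Render one result as a CSV row; missing optional values become empty fields.
fn to_csv_row(r: &UrlResult) -> String {
    let opt = |v: &Option<String>| v.as_deref().map(csv_field).unwrap_or_default();
    let fields = [
        csv_field(&r.url),
        r.status.map(|s| s.to_string()).unwrap_or_default(),
        r.time_ms.to_string(),
        r.redirected.to_string(),
        opt(&r.final_url),
        opt(&r.error),
        r.attempts.to_string(),
        csv_field(&r.method),
        opt(&r.title),
        opt(&r.meta_description),
        opt(&r.canonical_url),
    ];
    fields.join(",")
}
```

Quoting matters mostly for `title` and `meta_description`, which often contain commas or quotation marks.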

Project structure

src/
  main.rs      # Application entry point
  cli.rs       # CLI arguments and command definitions
  sitemap.rs   # Sitemap download, parsing, and discovery
  checker.rs   # URL HTTP checks
  report.rs    # Terminal output and summary report
  export.rs    # CSV, JSON, and HTML export
  models.rs    # Shared data models

examples/
  sitemap.xml  # Example sitemap for testing

Configuration file

--config accepts a JSON file with check options. Example:

{
  "concurrency": 5,
  "timeout": 15,
  "method": "head",
  "analyze_meta": true,
  "same_host_only": true,
  "respect_robots": true,
  "agent_ready": true,
  "agent_ready_fail_under": 70
}

Command-line options are parsed first, then config values are applied. For repeated audits, keep shared defaults in a config file and pass target-specific values such as the sitemap URL on the command line.
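As an illustration of how file values relate to the built-in defaults, the sketch below starts from the defaults listed in the CLI options table and replaces each one for which the config file supplies a key. `CheckOptions`, `FileConfig`, and `apply_config` are hypothetical names, only a subset of the options is modelled, and JSON parsing is left out. How an explicitly passed command-line flag interacts with a file value for the same key is deliberately not modelled here.

```rust
/// Effective options for a run (hypothetical type, subset of the real options).
#[derive(Debug, Clone, PartialEq)]
struct CheckOptions {
    concurrency: usize,
    delay_ms: u64,
    timeout_secs: u64,
    method: String,
    retries: u32,
    sitemap_retries: u32,
    analyze_meta: bool,
}

impl Default for CheckOptions {
    fn default() -> Self {
        // Values mirror the "Default" column of the CLI options table.
        CheckOptions {
            concurrency: 10,
            delay_ms: 0,
            timeout_secs: 10,
            method: "get".to_string(),
            retries: 0,
            sitemap_retries: 2,
            analyze_meta: false,
        }
    }
}

/// Keys read from a config file; `None` means the key was absent.
#[derive(Debug, Default)]
struct FileConfig {
    concurrency: Option<usize>,
    timeout: Option<u64>,
    method: Option<String>,
    analyze_meta: Option<bool>,
}

/// Replace each option for which the file supplies a value.
fn apply_config(mut opts: CheckOptions, cfg: &FileConfig) -> CheckOptions {
    if let Some(v) = cfg.concurrency {
        opts.concurrency = v;
    }
    if let Some(v) = cfg.timeout {
        opts.timeout_secs = v;
    }
    if let Some(v) = &cfg.method {
        opts.method = v.clone();
    }
    if let Some(v) = cfg.analyze_meta {
        opts.analyze_meta = v;
    }
    opts
}
```

With the example file above, this would yield a concurrency of 5, a 15-second timeout, and the `head` method, while `retries` and `sitemap_retries` keep their defaults of 0 and 2.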

Development

Format code:

cargo fmt

Run compile checks:

cargo check

Run tests:

cargo test

Roadmap

Completed:

Potential next improvements:

Notes

HTTP errors do not crash the program; they are reported per URL.
If the sitemap cannot be downloaded or the XML is invalid, the program returns a clear error.
Redirects are followed and the final URL is recorded.
Duplicate URLs are deduplicated.
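The discovery step behind the last note, extracting URLs from `<loc>...</loc>` entries and dropping repeats, can be sketched with the standard library alone. This is a simplified illustration rather than the project's parser: it uses plain string search, ignores XML namespaces, CDATA, and entities, and the function names `extract_locs` and `dedupe` are hypothetical.

```rust
use std::collections::HashSet;

/// Collect the text of every `<loc>...</loc>` element, trimmed of whitespace.
/// Stops at the first `<loc>` that has no closing tag.
fn extract_locs(xml: &str) -> Vec<String> {
    let mut urls = Vec::new();
    let mut rest = xml;
    while let Some(start) = rest.find("<loc>") {
        let after = &rest[start + "<loc>".len()..];
        match after.find("</loc>") {
            Some(end) => {
                let url = after[..end].trim();
                if !url.is_empty() {
                    urls.push(url.to_string());
                }
                rest = &after[end + "</loc>".len()..];
            }
            None => break,
        }
    }
    urls
}

/// Remove repeated URLs while keeping first-seen order.
fn dedupe(urls: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    // `insert` returns false for values already in the set.
    urls.into_iter().filter(|u| seen.insert(u.clone())).collect()
}
```

Keeping first-seen order makes the output stable between runs, which helps when comparing reports.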

License

This project is licensed under the MIT License. See LICENSE for details.

Contributing

Please see CONTRIBUTING.md for development setup, validation commands, and pull request guidelines.

Security

Please see SECURITY.md for vulnerability reporting guidelines.

Changelog

Please see CHANGELOG.md for release history.

Additional documentation

Versioning automation

Versioning is automated with release-plz: https://release-plz.ieni.dev/. On pushes to main, the Release PR workflow analyzes conventional commits, updates Cargo.toml and CHANGELOG.md, and opens or updates a release pull request.

Recommended commit prefixes:

feat: for new features
fix: for bug fixes
perf: for performance improvements
docs: for documentation-only changes
refactor: for internal changes
ci: for CI changes

When the release PR is merged, release-plz can create the Git tag and GitHub Release. The existing Release workflow then builds and uploads prebuilt binaries for that tag.
