A CLI tool to mirror websites for offline browsing using Playwright.
```bash
# Install globally
npm install -g site-mirror

# Or use directly via npx
npx site-mirror --help
```

```bash
# Download a single page with all its assets (no config needed!)
site-mirror run --start https://www.apple.com/iphone/ --singlePage

# Crawl an entire site
site-mirror run --start https://example.com/

# Or use the interactive, config-based workflow:
site-mirror init    # Interactive prompts to create site-mirror.config.json
site-mirror run     # Runs the mirror using the config
site-mirror serve   # Serves the mirror locally on port 8080
```

| Command | Description |
|---|---|
| `site-mirror init` | Interactive setup; creates `site-mirror.config.json` |
| `site-mirror run` | Run the mirror (reads config + CLI overrides) |
| `site-mirror serve` | Serve the `./offline` folder locally |
| `site-mirror serve 3000` | Serve on a custom port |
| Option | Description | Default |
|---|---|---|
| `--start <url>` | Start URL (required if not in config) | - |
| `--out <dir>` | Output directory | `./offline` |
| `--maxPages <n>` | Max pages to crawl (0 = unlimited) | `0` |
| `--maxDepth <n>` | Max link depth (0 = unlimited) | `0` |
| `--sameOriginOnly` | Only crawl same-origin pages | `true` |
| `--seedSitemaps` | Seed URLs from sitemap.xml/robots.txt | `false` |
| `--singlePage` | Download only this page and all its assets | `false` |
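The crawl limits in the options table suggest a bounded breadth-first crawl. The sketch below shows one way `--maxPages`, `--maxDepth` (0 = unlimited) and `--sameOriginOnly` could interact; it is not site-mirror's actual source. `planCrawl`, `CrawlOptions` and the `getLinks` callback are hypothetical: `getLinks` stands in for loading a page in Playwright and collecting its `<a href>` values, the start page is assumed to be depth 0, and `#fragments` are assumed not to create new pages. `--seedSitemaps` is not modeled.

```typescript
// Hypothetical options mirroring the CLI flags (0 = unlimited).
interface CrawlOptions {
  maxPages: number;
  maxDepth: number;
  sameOriginOnly: boolean;
}

// Resolve an href against its page; null if it is not a valid URL.
function tryResolve(href: string, base: string): URL | null {
  try {
    return new URL(href, base);
  } catch {
    return null;
  }
}

// Drop the #fragment so /about/ and /about/#team count as one page.
function normalize(url: URL): string {
  const copy = new URL(url.href);
  copy.hash = "";
  return copy.href;
}

// Breadth-first crawl order. `getLinks` stands in for loading a page
// (in site-mirror, via Playwright) and collecting its <a href> values.
function planCrawl(
  start: string,
  getLinks: (url: string) => string[],
  opts: CrawlOptions,
): string[] {
  const startUrl = new URL(start);
  const first = normalize(startUrl);
  const visited: string[] = [];
  const seen = new Set<string>([first]);
  const queue: Array<{ url: string; depth: number }> = [{ url: first, depth: 0 }];

  while (queue.length > 0) {
    // Stop once maxPages pages have been visited (0 = no cap).
    if (opts.maxPages > 0 && visited.length >= opts.maxPages) break;
    const { url, depth } = queue.shift()!;
    visited.push(url);

    // Pages at maxDepth are saved, but their links are not followed.
    if (opts.maxDepth > 0 && depth >= opts.maxDepth) continue;

    for (const href of getLinks(url)) {
      const next = tryResolve(href, url);
      if (next === null) continue;
      // Skip mailto:, javascript:, etc.
      if (next.protocol !== "http:" && next.protocol !== "https:") continue;
      if (opts.sameOriginOnly && next.origin !== startUrl.origin) continue;
      const key = normalize(next);
      if (!seen.has(key)) {
        seen.add(key);
        queue.push({ url: key, depth: depth + 1 });
      }
    }
  }
  return visited;
}
```

With a link graph of `/ -> /about/, /blog/` and `/about/ -> /team/`, a `maxDepth` of 1 visits `/`, `/about/` and `/blog/` but never reaches `/team/`.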
The config file, `site-mirror.config.json`, is created by `site-mirror init` (interactive) or can be written by hand:

```json
{
  "start": "https://example.com/",
  "out": "./offline",
  "singlePage": false,
  "maxPages": 200,
  "maxDepth": 6,
  "sameOriginOnly": true,
  "seedSitemaps": false
}
```

CLI options override config file settings.
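A minimal sketch of that precedence rule, assuming defaults come from the options table, then the config file, then CLI flags, and that an unset CLI flag arrives as `undefined` rather than as its default. `MirrorConfig`, `resolveConfig` and `dropUndefined` are illustrative names, not part of site-mirror.

```typescript
// Hypothetical shape of site-mirror.config.json, based on the example above.
interface MirrorConfig {
  start?: string;
  out: string;
  singlePage: boolean;
  maxPages: number;
  maxDepth: number;
  sameOriginOnly: boolean;
  seedSitemaps: boolean;
}

// Defaults taken from the options table.
const DEFAULTS: MirrorConfig = {
  out: "./offline",
  singlePage: false,
  maxPages: 0,
  maxDepth: 0,
  sameOriginOnly: true,
  seedSitemaps: false,
};

// Remove keys whose value is undefined, so an unset CLI flag
// cannot overwrite a value that came from the config file.
function dropUndefined<T extends object>(obj: Partial<T>): Partial<T> {
  const out: Partial<T> = {};
  for (const key of Object.keys(obj) as Array<keyof T>) {
    if (obj[key] !== undefined) out[key] = obj[key];
  }
  return out;
}

// Precedence: defaults < config file < CLI flags.
function resolveConfig(
  fileConfig: Partial<MirrorConfig>,
  cliFlags: Partial<MirrorConfig>,
): MirrorConfig {
  const merged: MirrorConfig = {
    ...DEFAULTS,
    ...dropUndefined<MirrorConfig>(fileConfig),
    ...dropUndefined<MirrorConfig>(cliFlags),
  };
  if (!merged.start) {
    throw new Error("--start <url> is required if it is not in the config file");
  }
  return merged;
}
```

For example, the config above run with `--maxPages 10` would resolve to `maxPages: 10` while keeping `maxDepth: 6` from the file.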
Pages and assets are written to the output directory with this layout:

```text
./offline/
├── index.html          # Homepage
├── about/
│   └── index.html      # /about/ page
├── _next/              # Same-origin assets
│   └── static/
├── _external/          # Cross-origin assets
│   └── cdn.example.com/
│       └── script.js
```
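The layout implies a simple URL-to-file mapping. This sketch reproduces it under stated assumptions: directory-style URLs get `index.html`, extensionless page URLs are treated as directories, query strings and fragments are ignored (so `?v=1` and `?v=2` would collide), and cross-origin files go under `_external/<host>/`. `localPathFor` is a hypothetical helper, not the tool's implementation.

```typescript
// Hypothetical URL -> file mapping inferred from the ./offline layout.
// Query strings and fragments are ignored (an assumption).
function localPathFor(url: string, startOrigin: string, isPage: boolean): string {
  const u = new URL(url);
  let path = u.pathname.replace(/^\/+/, ""); // "/about/" -> "about/"

  if (path === "" || path.endsWith("/")) {
    // Directory-style URL: "/" -> "index.html", "/about/" -> "about/index.html"
    path += "index.html";
  } else if (isPage && !(path.split("/").pop() ?? "").includes(".")) {
    // Extensionless page URL: "/about" -> "about/index.html"
    path += "/index.html";
  }

  // Same-origin files keep their path; cross-origin ones go under _external/<host>/.
  return u.origin === startOrigin ? path : `_external/${u.host}/${path}`;
}
```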
How it works:

- Launches headless Chromium via Playwright
- Navigates to each page and waits for network idle
- Captures all static assets (CSS, JS, images, fonts, videos)
- Rewrites absolute same-origin URLs to relative paths
- Injects a script to handle SPA-style navigation offline
- Discovers new pages via `<a href>` links
- Saves everything to the output directory
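Rewriting an absolute same-origin URL comes down to walking from the page's own directory to the target file. A minimal sketch, assuming the target file is the URL's pathname (plus `index.html` for directory URLs), that query strings are dropped, and that relative or cross-origin hrefs are left untouched; `relativeHref`, `parseAbsolute` and `rewriteSameOrigin` are hypothetical names.

```typescript
// Relative path from one mirrored file to another, e.g.
// ("about/index.html", "_next/static/app.css") -> "../_next/static/app.css".
function relativeHref(fromFile: string, toFile: string): string {
  const fromDir = fromFile.split("/").slice(0, -1); // directories of the page
  const target = toFile.split("/");
  // Count shared leading directories (never match the target's file name).
  let common = 0;
  while (
    common < fromDir.length &&
    common < target.length - 1 &&
    fromDir[common] === target[common]
  ) {
    common++;
  }
  // One ".." for each page directory not shared with the target.
  const ups: string[] = [];
  for (let i = common; i < fromDir.length; i++) ups.push("..");
  return ups.concat(target.slice(common)).join("/");
}

// Parse an absolute URL; null for relative or malformed hrefs.
function parseAbsolute(href: string): URL | null {
  try {
    return new URL(href);
  } catch {
    return null;
  }
}

// Rewrite an absolute same-origin href found in `pageFile`; leave anything else as-is.
function rewriteSameOrigin(pageFile: string, href: string, origin: string): string {
  const u = parseAbsolute(href);
  if (u === null || u.origin !== origin) return href;
  let file = u.pathname.replace(/^\/+/, "");
  if (file === "" || file.endsWith("/")) file += "index.html"; // directory URLs
  // Query strings are dropped (assumed not to be part of saved file names).
  return relativeHref(pageFile, file) + u.hash;
}
```

Computing the path from the page's own directory, rather than emitting root-relative `/...` paths, keeps links valid when the mirror is opened straight from disk.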
Limitations and caveats:

- XHR/fetch API responses are not saved (only the rendered HTML and static assets are)
- Some interactive features that depend on live APIs won't work offline
- Be mindful of the target site's Terms of Service and robots.txt
License: MIT