Fix dependency alerts + 6 major enhancements #23

Merged
MALathon merged 6 commits into main from claude/fix-dependency-alerts-pXz6A on Apr 8, 2026

@MALathon MALathon commented Apr 8, 2026

Summary

  • Fix dependency vulnerabilities: Update certifi, urllib3, idna, requests, Pygments and all other deps to latest
  • Enhancement 1 — Concurrent Downloads: Parallel PDF downloading via ThreadPoolExecutor (--concurrent --max-workers N)
  • Enhancement 2 — Persistent Document Catalog: SQLite-backed tracking of every PDF across runs with content-hash change detection, deduplication, metadata extraction, JSON/CSV export
  • Enhancement 3 — Watch Mode: One-shot diff (fetcharoo diff) and continuous monitoring (fetcharoo watch) with stdout/JSON/webhook/command notifications
  • Enhancement 4 — MCP Server: Exposes all fetcharoo tools to AI agents via Model Context Protocol
  • Enhancement 5 — Site Schemas Registry: Pre-built configs for arXiv, IETF RFCs, SEC EDGAR, W3C, Federal Register with auto-detection
  • Enhancement 6 — MCP Caching Proxy + Snapshot Monitoring: Wraps any upstream MCP server (e.g., trial-guide) with SQLite caching and change tracking. Single unified server: fetcharoo mcp serve --upstream "npx trial-guide" --ttl 3600

Test plan

  • All 372 tests pass (276 existing + 96 new)
  • No new dependencies required (SQLite is stdlib, threads are stdlib)
  • MCP features are optional — fetcharoo works without mcp package installed
  • Full backward compatibility — all existing CLI and API behavior preserved

https://claude.ai/code/session_01EFk8Enntgip8z3nqk1ppkA

claude added 6 commits April 7, 2026 23:04
Bump certifi (2023.5.7 → 2026.2.25), idna (3.4 → 3.11), and
urllib3 (2.0.2 → 2.6.3) to resolve known CVEs including
CVE-2023-37920, CVE-2024-39689, CVE-2024-3651, CVE-2023-43804,
CVE-2023-45803, and CVE-2024-37891.

https://claude.ai/code/session_01EFk8Enntgip8z3nqk1ppkA
Bump charset-normalizer (3.1.0 → 3.4.7), requests (2.32.5 → 2.33.1),
soupsieve (2.4.1 → 2.8.3), pymupdf (1.26.7 → 1.27.2.2),
packaging (23.1 → 26.0), and others to resolve remaining
dependency alerts.

https://claude.ai/code/session_01EFk8Enntgip8z3nqk1ppkA
…MCP server, site schemas

Enhancement 1 - Concurrent Downloads:
  - New async_downloader.py with ThreadPoolExecutor-based parallel downloads
  - Thread-safe rate limiter shared across workers
  - --concurrent and --max-workers CLI flags
  - Backward compatible: sequential remains default
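
The thread-pool approach with a shared rate limiter described above can be sketched roughly as follows. This is a minimal illustration, not the contents of async_downloader.py: the names `RateLimiter`, `download_all`, and `fake_fetch` are hypothetical, and the fetch function is a stand-in so the sketch runs without network access.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Hypothetical limiter: spaces out request starts across all worker threads."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it
        # so other workers can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def download_all(urls, fetch, max_workers=4, min_interval=0.0):
    """Run fetch(url) for each URL in a thread pool, sharing one limiter."""
    limiter = RateLimiter(min_interval)

    def task(url):
        limiter.wait()
        return fetch(url)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        return list(pool.map(task, urls))


def fake_fetch(url):
    # Stand-in for a real HTTP download (assumption for this sketch).
    return f"bytes-for-{url}".encode()


results = download_all(
    ["a.pdf", "b.pdf", "c.pdf"], fake_fetch, max_workers=2, min_interval=0.01
)
```

Sleeping outside the lock keeps the limiter from serializing the workers: the lock only guards the slot bookkeeping.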

Enhancement 2 - Persistent Document Catalog:
  - New catalog.py with SQLite-backed DocumentCatalog
  - Content-hash-based change detection and cross-URL deduplication
  - Run history tracking with diff summaries
  - Export as JSON/CSV, search by URL/filename
  - CLI: fetcharoo catalog {show|export|search|runs|duplicates}
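
Content-hash change detection and cross-URL deduplication on top of SQLite might look something like the sketch below. The table layout and the helper names (`open_catalog`, `record`, `duplicates`) are assumptions made for illustration and are not taken from catalog.py.

```python
import hashlib
import sqlite3
from collections import defaultdict


def open_catalog(path=":memory:"):
    """Open (or create) a tiny catalog with one row per URL."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "url TEXT PRIMARY KEY, content_hash TEXT NOT NULL)"
    )
    return conn


def record(conn, url, content):
    """Store the content hash and report 'new', 'changed', or 'unchanged'."""
    digest = hashlib.sha256(content).hexdigest()
    row = conn.execute(
        "SELECT content_hash FROM documents WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        status = "new"
    elif row[0] != digest:
        status = "changed"
    else:
        status = "unchanged"
    conn.execute(
        "INSERT OR REPLACE INTO documents (url, content_hash) VALUES (?, ?)",
        (url, digest),
    )
    conn.commit()
    return status


def duplicates(conn):
    """Group URLs that share the same content hash."""
    groups = defaultdict(list)
    rows = conn.execute("SELECT url, content_hash FROM documents ORDER BY url")
    for url, digest in rows:
        groups[digest].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


conn = open_catalog()
first = record(conn, "https://example.com/a.pdf", b"v1")
again = record(conn, "https://example.com/a.pdf", b"v1")
changed = record(conn, "https://example.com/a.pdf", b"v2")
mirror = record(conn, "https://mirror.example.com/a.pdf", b"v2")
```

Hashing the bytes rather than comparing filenames or timestamps means a re-uploaded but identical PDF is reported as unchanged, and the same file served from two URLs shows up as a duplicate.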

Enhancement 3 - Watch Mode:
  - New watcher.py and notifications.py
  - One-shot diff: fetcharoo diff <url> (cron-friendly)
  - Continuous watch: fetcharoo watch <url> --interval 3600
  - Notifications: stdout, JSON, webhook, shell command
  - Git-like diff output: + new, ~ changed, - removed
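
The git-like output format can be illustrated with a small sketch that compares two snapshots of a page, each a mapping from PDF URL to content hash. The function name `diff_snapshots` and the snapshot shape are hypothetical; the actual watcher.py may structure this differently.

```python
def diff_snapshots(old, new):
    """Return '+ url', '~ url', and '- url' lines for new, changed, removed PDFs."""
    lines = []
    for url in sorted(new.keys() - old.keys()):
        lines.append(f"+ {url}")
    for url in sorted(old.keys() & new.keys()):
        if old[url] != new[url]:
            lines.append(f"~ {url}")
    for url in sorted(old.keys() - new.keys()):
        lines.append(f"- {url}")
    return lines


# Hashes are shortened placeholders, not real digests.
yesterday = {"a.pdf": "h1", "b.pdf": "h2", "c.pdf": "h3"}
today = {"a.pdf": "h1", "b.pdf": "h2-edited", "d.pdf": "h4"}
report = diff_snapshots(yesterday, today)
```

An empty list means nothing changed, which makes the one-shot mode easy to use from cron: a wrapper can stay silent unless the list is non-empty.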

Enhancement 4 - MCP Server:
  - New mcp_server.py exposing stateful tools via FastMCP
  - Tools: discover_pdfs, download_pdfs, catalog_query, catalog_diff,
    catalog_search, get_document_metadata, find_duplicate_documents
  - Optional dependency: pip install fetcharoo[mcp]
  - CLI: fetcharoo mcp serve

Enhancement 5 - Community Site Schemas:
  - 5 built-in schemas: arxiv, ietf_rfc, sec_edgar, w3c, federal_register
  - Auto-detection: --schema auto matches URL to schema
  - find_schema() and list_schemas() API
  - CLI: fetcharoo schemas {list|match}
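
Auto-detection of a schema from a URL could work along these lines. This sketch does not reproduce the real find_schema() API: `match_schema` and the `SCHEMA_DOMAINS` table are hypothetical, and the domain lists are assumptions about which hosts each built-in schema would cover.

```python
from urllib.parse import urlparse

# Schema names come from the list above; the domains are assumed for illustration.
SCHEMA_DOMAINS = {
    "arxiv": ("arxiv.org",),
    "ietf_rfc": ("rfc-editor.org", "ietf.org"),
    "sec_edgar": ("sec.gov",),
    "w3c": ("w3.org",),
    "federal_register": ("federalregister.gov",),
}


def match_schema(url):
    """Return the first schema whose domain equals or is a parent of the URL host."""
    host = (urlparse(url).hostname or "").lower()
    for name, domains in SCHEMA_DOMAINS.items():
        if any(host == d or host.endswith("." + d) for d in domains):
            return name
    return None
```

Matching on the parsed hostname, rather than searching the whole URL string, avoids false positives such as a query parameter that happens to contain `arxiv.org`.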

All 337 tests pass (276 existing + 61 new).

https://claude.ai/code/session_01EFk8Enntgip8z3nqk1ppkA
Rewrite README to document the 5 new enhancements: concurrent
downloads, persistent document catalog, watch mode, MCP server,
and community site schemas. Includes CLI examples, Python API
usage, and MCP server configuration.

https://claude.ai/code/session_01EFk8Enntgip8z3nqk1ppkA
MCP Caching Proxy (mcp_proxy.py):
  - Wraps any upstream MCP server as a caching layer (Redis for MCP)
  - SQLite-backed ToolCache with TTL-based freshness
  - Content-hash change detection across cached calls
  - Meta-tools: _proxy_call, _cache_status, _cache_history,
    _cache_refresh, _cache_clear
  - CLI: fetcharoo proxy --server "npx trial-guide" --ttl 3600
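
As a rough illustration of TTL-based freshness over a SQLite store, the sketch below caches the result of a tool call keyed by tool name and arguments. `TtlCache`, its table layout, and the `upstream` callable are assumptions for this example and do not describe the actual ToolCache in mcp_proxy.py.

```python
import hashlib
import json
import sqlite3
import time


class TtlCache:
    """Hypothetical SQLite-backed cache with time-to-live freshness."""

    def __init__(self, ttl_seconds, path=":memory:", clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, result TEXT NOT NULL, stored_at REAL NOT NULL)"
        )

    @staticmethod
    def _key(tool, args):
        # Sorting keys makes logically equal argument dicts hash identically.
        payload = json.dumps([tool, args], sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def call(self, tool, args, upstream):
        """Return (result, from_cache); call upstream only on a miss or stale entry."""
        key = self._key(tool, args)
        now = self.clock()
        row = self.conn.execute(
            "SELECT result, stored_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None and now - row[1] < self.ttl:
            return json.loads(row[0]), True
        result = upstream(tool, args)
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, result, stored_at) VALUES (?, ?, ?)",
            (key, json.dumps(result), now),
        )
        self.conn.commit()
        return result, False


calls = []


def fake_upstream(tool, args):
    # Stand-in for a real upstream MCP tool call.
    calls.append(tool)
    return {"tool": tool, "args": args}


fake_now = [1000.0]
cache = TtlCache(ttl_seconds=3600, clock=lambda: fake_now[0])
first = cache.call("search", {"q": "x"}, fake_upstream)
second = cache.call("search", {"q": "x"}, fake_upstream)
fake_now[0] += 3601  # move past the TTL
third = cache.call("search", {"q": "x"}, fake_upstream)
```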

Snapshot Monitoring (mcp_monitor.py):
  - SnapshotStore for tracking MCP tool outputs over time
  - Content-hash diffing: new/changed/removed/unchanged records
  - Works with any data source (MCP servers, REST APIs, files)
  - Nested field extraction via dot notation for record IDs
  - CLI: fetcharoo monitor {snapshot|sources|history|search}
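
Dot-notation ID extraction combined with content-hash diffing might be sketched as below. The helpers `get_path`, `record_hash`, and `diff_records` are hypothetical, the record shape loosely follows the ClinicalTrials.gov API v2 layout as an assumption, and the NCT IDs are made up.

```python
import hashlib
import json


def get_path(record, dotted):
    """Follow a dotted path such as 'a.b.c' through nested dicts."""
    value = record
    for part in dotted.split("."):
        value = value[part]
    return value


def record_hash(record):
    """Hash a record's canonical JSON form so key order does not matter."""
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def diff_records(previous, current, id_path):
    """Classify record IDs as new, changed, removed, or unchanged."""
    old = {get_path(r, id_path): record_hash(r) for r in previous}
    new = {get_path(r, id_path): record_hash(r) for r in current}
    shared = old.keys() & new.keys()
    return {
        "new": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in shared if old[k] != new[k]),
        "removed": sorted(old.keys() - new.keys()),
        "unchanged": sorted(k for k in shared if old[k] == new[k]),
    }


def trial(nct_id, status):
    # Minimal made-up record for illustration only.
    return {
        "protocolSection": {
            "identificationModule": {"nctId": nct_id},
            "statusModule": {"overallStatus": status},
        }
    }


ID_PATH = "protocolSection.identificationModule.nctId"
before = [trial("NCT00000001", "RECRUITING"), trial("NCT00000002", "RECRUITING")]
after = [trial("NCT00000001", "COMPLETED"), trial("NCT00000003", "RECRUITING")]
summary = diff_records(before, after, ID_PATH)
```

Because the diff only needs an ID path and a JSON-serializable record, the same approach applies to REST responses or files, not just MCP tool outputs.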

Clinical Trials Preset (presets/clinical_trials.py):
  - Pre-configured for ClinicalTrials.gov API v2 data model
  - NCT ID extraction, human-readable formatting
  - Works with trial-guide and other clinical trials MCP servers

MCP Server updates:
  - Added snapshot_monitor, snapshot_query, snapshot_history,
    snapshot_sources, snapshot_search tools
  - AI agents get persistent change tracking for any data

All 372 tests pass (337 existing + 35 new).

https://claude.ai/code/session_01EFk8Enntgip8z3nqk1ppkA
Merge the separate MCP server and proxy into one server:
  - fetcharoo mcp serve              (standalone)
  - fetcharoo mcp serve --upstream X (with caching proxy)

When --upstream is provided, upstream_call, upstream_refresh,
cache_status, and cache_clear tools are added alongside the
existing PDF + snapshot tools. No separate proxy command needed.
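
The conditional registration described above can be pictured with a small sketch. The tool names are the ones listed in the commit messages; the function `tool_names` and the idea of returning a plain list are simplifications for illustration, not the real registration code in mcp_server.py.

```python
PDF_TOOLS = [
    "discover_pdfs", "download_pdfs", "catalog_query", "catalog_diff",
    "catalog_search", "get_document_metadata", "find_duplicate_documents",
]
SNAPSHOT_TOOLS = [
    "snapshot_monitor", "snapshot_query", "snapshot_history",
    "snapshot_sources", "snapshot_search",
]
UPSTREAM_TOOLS = ["upstream_call", "upstream_refresh", "cache_status", "cache_clear"]


def tool_names(upstream=None):
    """List the tools one server would expose, with or without an upstream."""
    names = PDF_TOOLS + SNAPSHOT_TOOLS
    if upstream:
        # Proxy tools only make sense when there is an upstream server to wrap.
        names = names + UPSTREAM_TOOLS
    return names


standalone = tool_names()
proxied = tool_names(upstream="npx trial-guide")
```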

https://claude.ai/code/session_01EFk8Enntgip8z3nqk1ppkA
@MALathon MALathon merged commit 6b18941 into main Apr 8, 2026
3 of 4 checks passed