Fix dependency alerts + 6 major enhancements #23
Merged
Bump certifi (2023.5.7 → 2026.2.25), idna (3.4 → 3.11), and urllib3 (2.0.2 → 2.6.3) to resolve known CVEs including CVE-2023-37920, CVE-2024-39689, CVE-2024-3651, CVE-2023-43804, CVE-2023-45803, and CVE-2024-37891. https://claude.ai/code/session_01EFk8Enntgip8z3nqk1ppkA
Bump charset-normalizer (3.1.0 → 3.4.7), requests (2.32.5 → 2.33.1), soupsieve (2.4.1 → 2.8.3), pymupdf (1.26.7 → 1.27.2.2), packaging (23.1 → 26.0), and others to resolve remaining dependency alerts.
…MCP server, site schemas
Enhancement 1 - Concurrent Downloads:
- New async_downloader.py with ThreadPoolExecutor-based parallel downloads
- Thread-safe rate limiter shared across workers
- --concurrent and --max-workers CLI flags
- Backward compatible: sequential remains default
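The shared rate limiter described above can be sketched roughly as follows. This is not fetcharoo's async_downloader.py: the names RateLimiter, download_all, min_interval and the fetch callable are hypothetical, and this limiter simply spaces request starts evenly, which may differ from the real implementation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Hypothetical limiter: spaces request starts >= min_interval seconds apart across threads."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_slot = 0.0  # time.monotonic() value of the next free slot

    def wait(self) -> None:
        # Reserve a slot while holding the lock, then sleep outside it so
        # other workers can reserve their own (later) slots meanwhile.
        with self._lock:
            slot = max(time.monotonic(), self._next_slot)
            self._next_slot = slot + self.min_interval
        delay = slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)


def download_all(urls, fetch, max_workers=4, min_interval=0.5):
    """Call fetch(url) for every URL on a thread pool sharing one limiter.

    Returns {url: result}; an exception raised by fetch propagates.
    """
    limiter = RateLimiter(min_interval)

    def worker(url):
        limiter.wait()
        return url, fetch(url)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(worker, urls))
```

A caller could pass, for example, `lambda u: requests.get(u, timeout=30).content` as `fetch`. Reserving the slot under the lock but sleeping outside it keeps one sleeping worker from blocking the others.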
Enhancement 2 - Persistent Document Catalog:
- New catalog.py with SQLite-backed DocumentCatalog
- Content-hash-based change detection and cross-URL deduplication
- Run history tracking with diff summaries
- Export as JSON/CSV, search by URL/filename
- CLI: fetcharoo catalog {show|export|search|runs|duplicates}
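A minimal sketch of content-hash change detection and cross-URL deduplication on SQLite, assuming SHA-256 over the raw bytes and a single documents table. MiniCatalog and its methods are hypothetical stand-ins, not the DocumentCatalog API, and run history is omitted.

```python
import hashlib
import sqlite3


class MiniCatalog:
    """Hypothetical, stripped-down SQLite catalog keyed by URL."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS documents ("
            " url TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
        )

    def record(self, url: str, content: bytes) -> str:
        """Store the content hash for url; return 'new', 'changed' or 'unchanged'."""
        digest = hashlib.sha256(content).hexdigest()
        row = self.db.execute(
            "SELECT sha256 FROM documents WHERE url = ?", (url,)
        ).fetchone()
        if row is None:
            status = "new"
        elif row[0] != digest:
            status = "changed"
        else:
            status = "unchanged"
        # url is the primary key, so this replaces any previous hash.
        self.db.execute(
            "INSERT OR REPLACE INTO documents (url, sha256) VALUES (?, ?)",
            (url, digest),
        )
        self.db.commit()
        return status

    def duplicates(self) -> dict:
        """Group URLs whose stored content hashes are identical."""
        groups = {}
        for url, sha in self.db.execute(
            "SELECT url, sha256 FROM documents ORDER BY url"
        ):
            groups.setdefault(sha, []).append(url)
        return {sha: urls for sha, urls in groups.items() if len(urls) > 1}
```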
Enhancement 3 - Watch Mode:
- New watcher.py and notifications.py
- One-shot diff: fetcharoo diff <url> (cron-friendly)
- Continuous watch: fetcharoo watch <url> --interval 3600
- Notifications: stdout, JSON, webhook, shell command
- Git-like diff output: + new, ~ changed, - removed
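The git-like diff markers can be illustrated with a small comparison of two {url: content_hash} maps. diff_snapshots is a hypothetical helper, and the sketch assumes the previous state is already available; it says nothing about how watcher.py actually stores or loads it.

```python
def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Hypothetical helper: compare {url: content_hash} maps, render git-like lines.

    '+' marks URLs only in new, '~' marks URLs in both whose hash differs,
    '-' marks URLs only in old. Unchanged URLs produce no line.
    """
    lines = []
    for url in sorted(new.keys() - old.keys()):
        lines.append(f"+ {url}")
    for url in sorted(old.keys() & new.keys()):
        if old[url] != new[url]:
            lines.append(f"~ {url}")
    for url in sorted(old.keys() - new.keys()):
        lines.append(f"- {url}")
    return lines
```

A cron job could print these lines and treat a non-empty list as "something changed"; whether fetcharoo diff signals this through its exit code is not stated in this PR.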
Enhancement 4 - MCP Server:
- New mcp_server.py exposing stateful tools via FastMCP
- Tools: discover_pdfs, download_pdfs, catalog_query, catalog_diff,
catalog_search, get_document_metadata, find_duplicate_documents
- Optional dependency: pip install fetcharoo[mcp]
- CLI: fetcharoo mcp serve
Enhancement 5 - Community Site Schemas:
- 5 built-in schemas: arxiv, ietf_rfc, sec_edgar, w3c, federal_register
- Auto-detection: --schema auto matches URL to schema
- find_schema() and list_schemas() API
- CLI: fetcharoo schemas {list|match}
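URL-based auto-detection can be sketched as an ordered list of regular expressions where the first match wins. The SiteSchema dataclass, match_url_to_schema, schema_names and every URL pattern below are illustrative assumptions; they are not the signatures of find_schema() or list_schemas(), nor the real built-in schema rules.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteSchema:
    """Hypothetical schema record: a name plus the URL regex it applies to."""
    name: str
    url_pattern: str


# Illustrative patterns only; the real built-in schemas may match differently.
SCHEMAS = [
    SiteSchema("arxiv", r"^https?://(www\.)?arxiv\.org/"),
    SiteSchema("ietf_rfc", r"^https?://(www\.rfc-editor\.org|datatracker\.ietf\.org)/"),
    SiteSchema("sec_edgar", r"^https?://(www\.)?sec\.gov/"),
    SiteSchema("w3c", r"^https?://(www\.)?w3\.org/"),
    SiteSchema("federal_register", r"^https?://(www\.)?federalregister\.gov/"),
]


def schema_names() -> list[str]:
    return [s.name for s in SCHEMAS]


def match_url_to_schema(url: str) -> Optional[SiteSchema]:
    """Return the first schema whose pattern matches url, else None."""
    for schema in SCHEMAS:
        if re.match(schema.url_pattern, url):
            return schema
    return None
```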
All 337 tests pass (276 existing + 61 new).
Rewrite README to document the 5 new enhancements: concurrent downloads, persistent document catalog, watch mode, MCP server, and community site schemas. Includes CLI examples, Python API usage, and MCP server configuration.
MCP Caching Proxy (mcp_proxy.py):
- Wraps any upstream MCP server as a caching layer (Redis for MCP)
- SQLite-backed ToolCache with TTL-based freshness
- Content-hash change detection across cached calls
- Meta-tools: _proxy_call, _cache_status, _cache_history,
_cache_refresh, _cache_clear
- CLI: fetcharoo proxy --server "npx trial-guide" --ttl 3600
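The TTL and content-hash behavior of the caching layer might look like this in miniature. ToolCacheSketch is hypothetical: the cache key is assumed to be SHA-256 over canonical JSON of the tool name and arguments, and upstream is a plain callable standing in for a real MCP client session.

```python
import hashlib
import json
import sqlite3
import time


class ToolCacheSketch:
    """Hypothetical SQLite cache for tool-call results with TTL freshness."""

    def __init__(self, ttl: float, path: str = ":memory:", clock=time.time) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            " key TEXT PRIMARY KEY, result TEXT, content_hash TEXT, stored_at REAL)"
        )

    @staticmethod
    def key(tool: str, args: dict) -> str:
        # Canonical JSON so {"a": 1, "b": 2} and {"b": 2, "a": 1} share one key.
        payload = json.dumps([tool, args], sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def call(self, tool: str, args: dict, upstream) -> tuple[str, bool, bool]:
        """Return (result, from_cache, changed) for one tool call.

        upstream(tool, args) must return a string; it is invoked only when
        there is no entry or the entry is at least ttl seconds old.
        """
        k = self.key(tool, args)
        row = self.db.execute(
            "SELECT result, content_hash, stored_at FROM cache WHERE key = ?", (k,)
        ).fetchone()
        now = self.clock()
        if row is not None and now - row[2] < self.ttl:
            return row[0], True, False
        result = upstream(tool, args)
        digest = hashlib.sha256(result.encode()).hexdigest()
        changed = row is not None and row[1] != digest
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?, ?)",
            (k, result, digest, now),
        )
        self.db.commit()
        return result, False, changed
```

Eviction, size limits and concurrent access are left out of this sketch.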
Snapshot Monitoring (mcp_monitor.py):
- SnapshotStore for tracking MCP tool outputs over time
- Content-hash diffing: new/changed/removed/unchanged records
- Works with any data source (MCP servers, REST APIs, files)
- Nested field extraction via dot notation for record IDs
- CLI: fetcharoo monitor {snapshot|sources|history|search}
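Dot-notation record IDs and the four-way new/changed/removed/unchanged split can be sketched with plain dictionaries. get_path, record_hash and classify are hypothetical names; the sketch assumes each record is JSON-serializable and that the previous snapshot is a {record_id: hash} map, which may not match how SnapshotStore persists data.

```python
import hashlib
import json


def get_path(record: dict, path: str):
    """Follow a dot-separated path such as 'meta.id' into nested dicts."""
    value = record
    for part in path.split("."):
        value = value[part]  # raises KeyError if a segment is missing
    return value


def record_hash(record) -> str:
    # Canonical JSON so key order does not change the hash.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


def classify(previous: dict[str, str], records: list[dict], id_path: str):
    """Compare fresh records with a stored {record_id: hash} snapshot.

    Returns (buckets, current): buckets maps 'new', 'changed', 'removed',
    'unchanged' to sorted ID lists; current is the snapshot to store next.
    """
    current = {str(get_path(r, id_path)): record_hash(r) for r in records}
    buckets = {"new": [], "changed": [], "removed": [], "unchanged": []}
    for rid, h in current.items():
        if rid not in previous:
            buckets["new"].append(rid)
        elif previous[rid] != h:
            buckets["changed"].append(rid)
        else:
            buckets["unchanged"].append(rid)
    buckets["removed"] = [rid for rid in previous if rid not in current]
    for ids in buckets.values():
        ids.sort()
    return buckets, current
```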
Clinical Trials Preset (presets/clinical_trials.py):
- Pre-configured for ClinicalTrials.gov API v2 data model
- NCT ID extraction, human-readable formatting
- Works with trial-guide and other clinical trials MCP servers
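NCT ID extraction and one-line formatting could look like the following. The field paths (protocolSection.identificationModule.nctId and briefTitle, protocolSection.statusModule.overallStatus) are assumed to follow the ClinicalTrials.gov API v2 study JSON layout; extract_nct_ids and format_study are hypothetical, and the preset's real output format is not shown in this PR.

```python
import re

# NCT identifiers are "NCT" followed by eight digits.
NCT_RE = re.compile(r"\bNCT\d{8}\b")


def extract_nct_ids(text: str) -> list[str]:
    """Return unique NCT IDs in order of first appearance."""
    return list(dict.fromkeys(NCT_RE.findall(text)))


def format_study(study: dict) -> str:
    """Hypothetical one-line summary of an API v2 study record."""
    proto = study.get("protocolSection", {})
    ident = proto.get("identificationModule", {})
    status = proto.get("statusModule", {})
    return "{} [{}] {}".format(
        ident.get("nctId", "?"),
        status.get("overallStatus", "UNKNOWN"),
        ident.get("briefTitle", "(untitled)"),
    )
```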
MCP Server updates:
- Added snapshot_monitor, snapshot_query, snapshot_history,
snapshot_sources, snapshot_search tools
- AI agents get persistent change tracking for any data
All 372 tests pass (337 existing + 35 new).
Merge the separate MCP server and proxy into one server:
- fetcharoo mcp serve (standalone)
- fetcharoo mcp serve --upstream X (with caching proxy)
When --upstream is provided, the upstream_call, upstream_refresh, cache_status, and cache_clear tools are added alongside the existing PDF + snapshot tools. No separate proxy command needed.
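The conditional tool set can be sketched with argparse. build_parser, tools_for and the handling of --ttl (default None here, since no default is stated) are assumptions; only the command shape (fetcharoo mcp serve [--upstream X]) and the tool names come from the commit messages.

```python
import argparse

BASE_TOOLS = [
    "discover_pdfs", "download_pdfs", "catalog_query", "catalog_diff",
    "catalog_search", "get_document_metadata", "find_duplicate_documents",
    "snapshot_monitor", "snapshot_query", "snapshot_history",
    "snapshot_sources", "snapshot_search",
]
PROXY_TOOLS = ["upstream_call", "upstream_refresh", "cache_status", "cache_clear"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for: fetcharoo mcp serve [--upstream CMD] [--ttl N]."""
    parser = argparse.ArgumentParser(prog="fetcharoo")
    sub = parser.add_subparsers(dest="command", required=True)
    mcp = sub.add_parser("mcp")
    mcp_sub = mcp.add_subparsers(dest="mcp_command", required=True)
    serve = mcp_sub.add_parser("serve")
    serve.add_argument("--upstream", help="command that launches an upstream MCP server")
    serve.add_argument("--ttl", type=int, default=None, help="cache TTL in seconds")
    return parser


def tools_for(args: argparse.Namespace) -> list[str]:
    # Proxy tools are registered only when an upstream server is configured.
    return BASE_TOOLS + (PROXY_TOOLS if args.upstream else [])
```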
Summary
- Concurrent downloads (--concurrent --max-workers N)
- Watch mode: one-shot diff (fetcharoo diff) and continuous monitoring (fetcharoo watch) with stdout/JSON/webhook/command notifications
- MCP server with optional caching proxy: fetcharoo mcp serve --upstream "npx trial-guide" --ttl 3600

Test plan
- … mcp package installed