Add robots.txt checking option #1
Open
Labels
enhancement: New feature or request
Description
Feature Request
Currently markgrab does not check robots.txt before fetching. Add an optional respect_robots=True parameter that:
- Fetches and parses robots.txt for the target domain
- Checks if the URL path is allowed for the configured user agent
- Raises RobotsDisallowed or silently skips if disallowed
This should be opt-in (default False) to maintain backward compatibility.
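The three steps above could be sketched roughly as follows, using only Python's standard-library urllib.robotparser. This is an illustration under assumptions, not markgrab's actual code: the helper names robots_url_for and check_robots and the raise_on_disallow flag are hypothetical, and the robots.txt body is passed in as text so that the fetching step can stay with whatever HTTP client the library already uses.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class RobotsDisallowed(Exception):
    """Raised when robots.txt forbids a URL (exception name taken from this issue)."""


def robots_url_for(url: str) -> str:
    """Return the robots.txt location for the domain of `url` (hypothetical helper)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def check_robots(url: str, robots_txt: str, user_agent: str = "*",
                 raise_on_disallow: bool = True) -> bool:
    """Check `url` against already-fetched robots.txt text (hypothetical helper).

    Returns True if allowed. If disallowed, raises RobotsDisallowed, or
    returns False when raise_on_disallow is False (the "silently skip" case).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    if not allowed and raise_on_disallow:
        raise RobotsDisallowed(f"{url} is disallowed for user agent {user_agent!r}")
    return allowed
```

With respect_robots left at its default of False, none of this would run, so existing callers would see no change in behavior.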
Motivation
Legal compliance for production deployments. Respecting robots.txt is currently documented in the Disclaimer but not enforced by the library.