v0.5.0a1 The Sentinel: plugin-enabled scanning, adaptive engine hardening, and preflight green #21
PythonWoods-Dev wants to merge 1 commit into main
Commit body:
- enable plugin-driven scanning with safe-harbor allowlist via plugins config
- keep core rules always active with robust fallback when entry points are unavailable
- optimize parallel Phase B by skipping rule engine re-run during link collection
- validate workers input early with clear error messages and aligned docstrings
- add and update tests for plugin loading, config parsing, and worker validation
- fix mypy typing issue in scanner to pass nox preflight
- verify examples matrix: positive fixtures pass, negative security/broken fixtures fail as expected
Pull request overview
This PR finalizes the v0.5.0a1 “Sentinel” milestone by introducing a plugin-enabled rule system (with an explicit allowlist), unifying sequential/parallel scanning into a single adaptive entry point, and extending configuration loading to support [tool.zenzic] in pyproject.toml.
Changes:
- Replaced the legacy `RuleEngine` and separate scan entry points with `AdaptiveRuleEngine` and a unified `scan_docs_references(...) -> (reports, link_errors)` API (adaptive sequential/parallel + optional link validation + telemetry).
- Added plugin discovery/loading via the `zenzic.rules` entry-point group, plus `plugins = [...]` allowlisting in config and a new `zenzic plugins list` CLI command.
- Added `pyproject.toml` config fallback (`[tool.zenzic]`) and updated docs/tests/changelog for the new behavior and breaking API changes.
Reviewed changes
Copilot reviewed 33 out of 34 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| uv.lock | Bumps project version and constrains httpx to <1.0. |
| tests/test_rules.py | Updates rule engine tests for AdaptiveRuleEngine, adds plugin/contract coverage. |
| tests/test_references.py | Updates tests for new scan_docs_references tuple return and unified link-validation path. |
| tests/test_parallel.py | Migrates parallel tests to unified adaptive API; adds workers validation coverage. |
| tests/test_integration_finale.py | Adds integration coverage for plugin listing CLI and telemetry emission. |
| tests/test_config.py | Adds config parsing tests for plugins and pyproject.toml support. |
| tests/test_cli.py | Updates CLI tests to patch the unified scan entry point. |
| src/zenzic/models/references.py | Updates documentation text to reference AdaptiveRuleEngine. |
| src/zenzic/models/config.py | Adds plugins field and implements pyproject.toml fallback loading. |
| src/zenzic/main.py | Registers new plugins Typer sub-app. |
| src/zenzic/core/scanner.py | Unifies scan entry point, adds adaptive parallel mode, telemetry, plugin-aware rule engine construction. |
| src/zenzic/core/rules.py | Introduces AdaptiveRuleEngine, eager pickle validation, and plugin registry/discovery helpers. |
| src/zenzic/core/exceptions.py | Adds PluginContractError. |
| src/zenzic/cli.py | Switches CLI to unified scan API; adds zenzic plugins list command. |
| src/zenzic/__init__.py | Version bump to 0.5.0a1. |
| README.md | Updates highlights and configuration-loading documentation for v0.5.0a1. |
| README.it.md | Mirrors the v0.5.0a1 highlights and release-track messaging (Italian). |
| pyproject.toml | Version bump; adds zenzic.rules entry point for broken-links; constrains httpx. |
| docs/usage/commands.md | Documents zenzic plugins list. |
| docs/usage/advanced.md | Updates programmatic API docs for unified scan entry point + hybrid adaptive engine. |
| docs/it/usage/commands.md | Documents zenzic plugins list (Italian). |
| docs/it/usage/advanced.md | Updates advanced usage docs for unified scan entry point (Italian). |
| docs/it/developers/plugins.md | Adds Italian plugin-authoring contract documentation. |
| docs/it/developers/index.md | Adds link to plugin-authoring docs (Italian). |
| docs/it/configuration/index.md | Documents config priority chain incl. pyproject.toml fallback (Italian). |
| docs/it/architecture.md | Documents hybrid adaptive engine behavior/diagram (Italian). |
| docs/it/about/index.md | Adds changelog link and release-track messaging (Italian). |
| docs/developers/plugins.md | Adds plugin-authoring contract documentation. |
| docs/developers/index.md | Adds link to plugin-authoring docs. |
| docs/configuration/index.md | Documents config priority chain incl. pyproject.toml fallback. |
| docs/architecture.md | Replaces old parallel-scan section with hybrid adaptive engine behavior/diagram. |
| docs/about/index.md | Adds changelog link and release-track messaging. |
| CHANGELOG.md | Adds v0.5.0a1 entry, including breaking change notes and new features. |
| CHANGELOG.it.md | Removes the Italian changelog file. |

            source="broken-links",
            origin="zenzic",
        )
    )

PluginRegistry.list_rules() documents/assumes results are sorted by entry-point name (and tests assert this), but when the core fallback "broken-links" is appended it is not re-sorted. If entry-points are discoverable but Zenzic’s own distribution metadata isn’t (e.g., running from source while third-party plugins are installed), this will return an unsorted list and make CLI output / API ordering non-deterministic. Consider sorting again before returning (or inserting the fallback in order).
Suggested change:

    - )
    + )
    + # Ensure deterministic ordering: sorted by source name, as documented.
    + results.sort(key=lambda r: r.source)

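The ordering concern raised in this comment can be reproduced with a small stand-alone sketch. `RuleInfo` and `list_rules` here are simplified, hypothetical stand-ins for the registry code under review, not the actual implementation; the point is only that appending a fallback entry after sorting breaks the order unless the list is sorted again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleInfo:
    """Illustrative stand-in for a discovered rule record."""

    source: str
    origin: str


def list_rules(discovered: list[RuleInfo], core_found: bool) -> list[RuleInfo]:
    results = sorted(discovered, key=lambda r: r.source)
    if not core_found:
        # Core fallback appended when the core entry point was not discovered.
        results.append(RuleInfo(source="broken-links", origin="zenzic"))
    # Re-sort so the fallback lands in its documented position.
    results.sort(key=lambda r: r.source)
    return results
```

Given third-party sources `zeta` and `acme` and no discovered core rule, the result is ordered `acme`, `broken-links`, `zeta`; without the final sort, `broken-links` would trail `zeta`.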

      def _build_rule_engine(config: ZenzicConfig) -> AdaptiveRuleEngine | None:
          """Construct a :class:`~zenzic.core.rules.AdaptiveRuleEngine` from the config.

    -     Returns ``None`` when no custom rules are configured, avoiding the
    -     overhead of engine construction on projects that do not use the feature.
    +     Load order is deterministic:
    +
    +     1. Core rules registered by Zenzic itself (always enabled).
    +     2. Regex rules from ``[[custom_rules]]``.
    +     3. External plugin rules explicitly listed in ``plugins = [...]``.
    +
    +     Returns ``None`` when no rules are available.
          """
    -     from zenzic.core.rules import CustomRule  # deferred to keep import graph clean
    +     from zenzic.core.rules import CustomRule, PluginRegistry  # deferred to keep import graph clean

    -     if not config.custom_rules:
    -         return None
    -     rules = [
    +     registry = PluginRegistry()
    +     rules = registry.load_core_rules()
    +     rules.extend(
              CustomRule(
                  id=cr.id,
                  pattern=cr.pattern,
                  message=cr.message,
                  severity=cr.severity,
              )
              for cr in config.custom_rules
    -     ]
    -     return RuleEngine(rules)
    -
    +     )
    +     rules.extend(registry.load_selected_rules(config.plugins))

_build_rule_engine() always loads core rules, and currently the only core rule is VSMBrokenLinkRule whose check() is a no-op (it only works via check_vsm). With no custom_rules/plugins configured, this still constructs an AdaptiveRuleEngine and causes _scan_single_file() to do an extra md_file.read_text() + engine.run() for every file, but it can never produce any findings. Consider returning None when there are no non-VSM rules to run in this pipeline (e.g., when config.custom_rules and config.plugins are empty), or alternatively running VSM-aware rules via run_vsm in a VSM-capable pipeline so core rules provide value.

    **O(N) reads:** each file is read exactly once in sequential mode.  In
    parallel mode external URL registration runs a lightweight sequential pass
    in the main process after workers complete (workers discard scanners).

scan_docs_references() docstring claims each file is read exactly once in sequential mode, but ReferenceScanner.harvest() already performs two independent passes over the file (Shield stream + content stream), and rule execution can add an additional read_text() pass. This docstring should be updated to reflect the actual read behaviour so users can reason about performance accurately (especially for the parallel+validate_links path which adds another scan pass).
Suggested change:

    - **O(N) reads:** each file is read exactly once in sequential mode.  In
    - parallel mode external URL registration runs a lightweight sequential pass
    - in the main process after workers complete (workers discard scanners).
    + **Read behaviour:** total I/O remains :math:`O(N)` in the number of files,
    + but individual files may be read multiple times. In sequential mode the
    + scanner typically performs separate Shield and content passes, and some
    + rules may trigger an additional ``read_text()`` call. In parallel mode the
    + same per-worker behaviour applies; when ``validate_links=True`` an extra
    + lightweight sequential pass in the main process registers external URLs
    + after workers complete (workers discard scanners).

Superseded by release/v0.5.0a1-final-v2.
Summary
This PR finalizes the Sentinel milestone for v0.5.0a1 by closing the remaining blocker and stabilizing runtime, performance, and release quality gates.
What was delivered
Plugin-enabled scanning (Safe Harbor fully implemented)
- Added explicit plugin allowlist support in project configuration.
- Rule loading is now deterministic:
  1. Core rules are always active.
  2. Custom regex rules are loaded from configuration.
  3. External plugin rules are loaded only when explicitly listed.
- Added robust core fallback behavior for environments where entry-point metadata is not available.
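The allowlist behavior described above can be sketched with the standard `importlib.metadata` entry-point API. This is an illustration under stated assumptions, not the PR's `PluginRegistry` code: the helper name `select_plugin_rules` and its `candidates` parameter are hypothetical, and only the `zenzic.rules` group name comes from the PR. It assumes Python 3.10 or newer for `entry_points(group=...)`.

```python
from importlib.metadata import EntryPoint, entry_points

GROUP = "zenzic.rules"  # entry-point group named in this PR


def select_plugin_rules(allowlist, candidates=None):
    """Load only entry points whose names are explicitly allowlisted.

    ``candidates`` lets callers inject entry points (useful in tests);
    by default they are discovered from installed distributions.
    """
    if candidates is None:
        candidates = entry_points(group=GROUP)
    allowed = set(allowlist)
    selected = {}
    # Iterate in name order so the result is deterministic.
    for ep in sorted(candidates, key=lambda e: e.name):
        if ep.name in allowed:
            selected[ep.name] = ep.load()
    return selected
```

An entry point that is installed but not listed is never imported, which is the property that gives the user control over third-party rule activation.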
Parallel Phase B optimization
- In the parallel + validate links path, Phase B now performs link collection without re-running rule checks.
- This removes redundant compute and reduces overhead on larger documentation trees.
Robustness and UX improvements
- Added clear fail-fast validation for worker values.
- Updated scanner documentation text to match actual execution behavior and constraints.
- Fixed final typing issue detected by mypy in preflight.
Quality and release readiness
- Full preflight pipeline passes.
- Test suite passes with coverage above required threshold.
- Examples were verified end-to-end and classified by expected behavior.
- Changelog and release-track messaging were aligned:
  - 0.4.x documented as abandoned exploratory line.
  - 0.5.x documented as active stabilization line.
Configuration and behavior impact
- New plugin allowlist behavior enables strict user control over third-party rule activation.
- Existing projects without plugin configuration continue to work with core rules enabled.
- Invalid worker values now return immediate, clear validation errors.
Validation performed
- nox preflight: pass
- ruff check and format check: pass
- mypy: pass
- pytest: pass
- coverage threshold: pass
Examples verification matrix
| Example | Result | Expected | Notes |
|---|---|---|---|
| examples/broken-docs | Fail | Fail | Negative fixture with intentionally broken and unsafe links |
| examples/i18n-standard | Pass | Pass | Healthy i18n fixture |
| examples/mkdocs-basic | Pass | Pass | Healthy MkDocs baseline fixture |
| examples/security_lab | Fail | Fail | Security fixture with traversal and absolute-path violations |
| examples/vanilla | Pass | Pass | Healthy engine-agnostic fixture |
| examples/zensical-basic | Pass | Pass | Healthy Zensical fixture |
Examples status summary:
- Expected Pass confirmed: 4 out of 4
- Expected Fail confirmed: 2 out of 2
Why this PR matters
This change set converts the Sentinel release promises into executable behavior:
- Safe Harbor is real, not only documented.
- Parallel scanning is leaner under link validation.
- Error handling is clearer for users and CI operators.
- Release gates are green with verified fixture behavior across examples.