10 changes: 10 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,11 @@ Versions follow [Semantic Versioning](https://semver.org/).
(Priority 2) → built-in defaults (Priority 3). If both files exist,
`zenzic.toml` wins unconditionally.

- **`plugins` config key** (`zenzic.toml` / `[tool.zenzic]`) —
`ZenzicConfig.plugins` now exposes an explicit allow-list of external
rule plugin entry-point names to activate during scanning. Core rules
remain always enabled.

- **`scan_docs_references` `verbose` flag** — new keyword-only parameter
`verbose: bool = False`. When `True`, prints a one-line performance
telemetry summary to stderr after the scan: engine mode (Sequential or
Expand Down Expand Up @@ -143,6 +148,11 @@ Versions follow [Semantic Versioning](https://semver.org/).

---

## 0.4.x (abandoned)

This release cycle was exploratory and included multiple breaking changes.
It has been superseded by the 0.5.x stabilization cycle.

## [0.4.0-rc4] — 2026-04-01 — Ghost Route Support, VSM Rule Engine & Content-Addressable Cache

## [0.4.0-rc5] — 2026-04-01 — The Sync Sprint: Zensical v0.0.31+ & Parallel API
Expand Down
4 changes: 4 additions & 0 deletions README.it.md
Expand Up @@ -262,6 +262,10 @@ non segnalare mai i file tradotti come orfani.
> Il changelog è ora mantenuto in un unico file inglese (`CHANGELOG.md`).
> Questa scelta segue gli standard dell'ecosistema Python open source:
> la cronologia delle versioni è documentazione tecnica, non interfaccia utente.
>
> Nota sul ciclo release: la linea `0.4.x` è stata abbandonata (fase
> esplorativa con breaking changes multipli); la linea attiva di
> stabilizzazione è `0.5.x`.

---

Expand Down
3 changes: 3 additions & 0 deletions README.md
Expand Up @@ -63,6 +63,9 @@ absolute links are a hard error, and if you declare `engine = "zensical"` you mu
- **`PluginContractError`**: new exception for rule contract violations.
- **Plugin documentation**: `docs/developers/plugins.md` (EN + IT) — full
contract, packaging instructions, and `pyproject.toml` registration examples.
- **Release-track clarification**: the 0.4.x cycle is considered abandoned
(exploratory with repeated breaking changes); 0.5.x is the active
stabilization line.

---

Expand Down
9 changes: 9 additions & 0 deletions docs/about/index.md
Expand Up @@ -47,4 +47,13 @@ Built by [PythonWoods](https://github.com/PythonWoods), it is designed to run in

[:lucide-arrow-right: Open](https://github.com/PythonWoods/zenzic)

- :lucide-history:   __Changelog__

---

Full release history and current release track policy.
The 0.4.x line is abandoned; 0.5.x is the active stabilization cycle.

[:lucide-arrow-right: Read](https://github.com/PythonWoods/zenzic/blob/main/CHANGELOG.md)

</div>
9 changes: 9 additions & 0 deletions docs/it/about/index.md
Expand Up @@ -47,4 +47,13 @@ Sviluppato da [PythonWoods](https://github.com/PythonWoods), è progettato per e

[:lucide-arrow-right: Apri](https://github.com/PythonWoods/zenzic)

- :lucide-history: &nbsp; __Changelog__

---

Storico completo delle release e policy della linea attiva.
La linea 0.4.x e stata abbandonata; la stabilizzazione attiva e 0.5.x.

Copilot AI Apr 2, 2026

Italian copy: "La linea 0.4.x e stata abbandonata; la stabilizzazione attiva e 0.5.x." is missing accents/grammar (e.g., "è stata", "è 0.5.x" / "è la 0.5.x"). Please correct to avoid typos in published docs.

Suggested change
- La linea 0.4.x e stata abbandonata; la stabilizzazione attiva e 0.5.x.
+ La linea 0.4.x è stata abbandonata; la stabilizzazione attiva è la 0.5.x.

[:lucide-arrow-right: Leggi](https://github.com/PythonWoods/zenzic/blob/main/CHANGELOG.md)

</div>
141 changes: 116 additions & 25 deletions src/zenzic/core/rules.py
Expand Up @@ -71,6 +71,8 @@


if TYPE_CHECKING:
from importlib.metadata import EntryPoint

from zenzic.models.vsm import VSM, Route


Expand Down Expand Up @@ -684,10 +686,7 @@ def _to_canonical_url(href: str) -> str | None:
# ─── Plugin discovery ─────────────────────────────────────────────────────────


from dataclasses import dataclass as _dc # noqa: E402 — module-level, after all classes


@_dc
@dataclass(slots=True)
class PluginRuleInfo:
"""Metadata about a discovered plugin rule.

Expand All @@ -705,6 +704,118 @@ class PluginRuleInfo:
origin: str


class PluginRegistry:
    """Registry wrapper around ``importlib.metadata`` entry-points.

    Provides read-only discovery for the CLI and explicit rule loading for the
    scanner. Discovery is best-effort; loading configured plugins is strict.
    """

    def __init__(self, group: str = "zenzic.rules") -> None:
        self._group = group

    def _entry_points(self) -> list[EntryPoint]:
        """Return sorted entry-points for the configured group."""
        from importlib.metadata import entry_points

        return sorted(entry_points(group=self._group), key=lambda ep: ep.name)

    def list_rules(self) -> list[PluginRuleInfo]:
        """Discover all plugin rules as metadata for CLI inspection."""
        results: list[PluginRuleInfo] = []
        for ep in self._entry_points():
            try:
                cls = ep.load()
                instance = cls()
                if not isinstance(instance, BaseRule):
                    continue
            except Exception:  # noqa: BLE001
                continue
            dist_name = ep.dist.name if ep.dist is not None else "zenzic"
            results.append(
                PluginRuleInfo(
                    rule_id=instance.rule_id,
                    class_name=f"{cls.__module__}.{cls.__qualname__}",
                    source=ep.name,
                    origin=dist_name,
                )
            )
        if not any(r.source == "broken-links" for r in results):
            results.append(
                PluginRuleInfo(
                    rule_id=VSMBrokenLinkRule().rule_id,
                    class_name=f"{VSMBrokenLinkRule.__module__}.{VSMBrokenLinkRule.__qualname__}",
                    source="broken-links",
                    origin="zenzic",
                )
            )
        # Keep ordering deterministic regardless of fallback insertion order.
        results.sort(key=lambda r: r.source)
        return results

    def load_core_rules(self) -> list[BaseRule]:
        """Load core rules registered by the ``zenzic`` distribution."""
        core_eps = [
            ep for ep in self._entry_points() if ep.dist is not None and ep.dist.name == "zenzic"
        ]
        loaded = [self._load_entry_point(ep) for ep in core_eps]
        if not any(rule.rule_id == "Z001" for rule in loaded):
            loaded.append(VSMBrokenLinkRule())
        return loaded

    def load_selected_rules(self, plugin_ids: Sequence[str]) -> list[BaseRule]:
        """Load only the configured plugin IDs from the entry-point group.

        Args:
            plugin_ids: Entry-point names declared in ``config.plugins``.

        Raises:
            PluginContractError: If a configured plugin is missing or invalid.
        """
        from zenzic.core.exceptions import PluginContractError  # deferred: avoid circular import

        requested = [pid.strip() for pid in plugin_ids if pid.strip()]

Copilot AI Apr 2, 2026

load_selected_rules() preserves duplicates from plugin_ids (e.g., plugins = ["acme", "acme"]), which will load/instantiate the same entry point multiple times and can introduce unnecessary overhead or duplicated side effects. Consider de-duplicating requested while preserving order before resolving/loading entry points.

Suggested change
- requested = [pid.strip() for pid in plugin_ids if pid.strip()]
+ requested: list[str] = []
+ seen: set[str] = set()
+ for pid in plugin_ids:
+     cleaned = pid.strip()
+     if cleaned and cleaned not in seen:
+         seen.add(cleaned)
+         requested.append(cleaned)

        if not requested:
            return []

        eps_by_name = {ep.name: ep for ep in self._entry_points()}
        if "broken-links" in requested and "broken-links" not in eps_by_name:
            requested = [pid for pid in requested if pid != "broken-links"]
            return [VSMBrokenLinkRule(), *self.load_selected_rules(requested)]

        missing = sorted(set(requested) - set(eps_by_name))
        if missing:
            raise PluginContractError(
                "Configured plugin rule IDs were not found in the 'zenzic.rules' "
                f"entry-point group: {', '.join(missing)}"
            )

        loaded: list[BaseRule] = []
        for pid in requested:
            loaded.append(self._load_entry_point(eps_by_name[pid]))
        return loaded

    @staticmethod
    def _load_entry_point(ep: EntryPoint) -> BaseRule:
        """Load and instantiate one entry-point as a :class:`BaseRule`."""
        from zenzic.core.exceptions import PluginContractError  # deferred: avoid circular import

        try:
            cls = ep.load()
            instance = cls()
        except Exception as exc:  # noqa: BLE001
            raise PluginContractError(
                f"Failed to load plugin rule '{ep.name}': {type(exc).__name__}: {exc}"
            ) from exc

        if not isinstance(instance, BaseRule):
            raise PluginContractError(
                f"Plugin rule '{ep.name}' must instantiate a BaseRule, got "
                f"{type(instance).__qualname__}."
            )
        return instance


def list_plugin_rules() -> list[PluginRuleInfo]:
"""Return metadata for every rule registered in the ``zenzic.rules`` group.

Expand All @@ -717,24 +828,4 @@ def list_plugin_rules() -> list[PluginRuleInfo]:
Returns:
Sorted list of :class:`PluginRuleInfo`, ordered by ``source`` name.
"""
from importlib.metadata import entry_points

results: list[PluginRuleInfo] = []
eps = entry_points(group="zenzic.rules")
for ep in eps:
try:
cls = ep.load()
instance: BaseRule = cls()
rid = instance.rule_id
except Exception: # noqa: BLE001
continue
dist_name = ep.dist.name if ep.dist is not None else "zenzic"
results.append(
PluginRuleInfo(
rule_id=rid,
class_name=f"{cls.__module__}.{cls.__qualname__}",
source=ep.name,
origin=dist_name,
)
)
return sorted(results, key=lambda r: r.source)
return PluginRegistry().list_rules()
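
The `PluginRegistry` docstring says that discovery is best-effort while loading configured plugins is strict. The sketch below illustrates that contrast in isolation. It does not use zenzic or real `importlib.metadata` entry points: `FakeEntryPoint`, `GoodRule`, `discover`, and `load_strict` are hypothetical names, and the local `BaseRule` and `PluginContractError` are stand-ins that only borrow the names from the diff.

```python
from dataclasses import dataclass
from typing import Callable, List


class PluginContractError(Exception):
    """Stand-in for zenzic's exception of the same name (assumed shape)."""


class BaseRule:
    """Stand-in for zenzic's BaseRule; only ``rule_id`` is modelled."""

    rule_id = "BASE"


class GoodRule(BaseRule):
    rule_id = "ACME001"  # hypothetical plugin rule ID


@dataclass
class FakeEntryPoint:
    """Hypothetical substitute for importlib.metadata.EntryPoint."""

    name: str
    loader: Callable[[], type]

    def load(self) -> type:
        return self.loader()


def _broken_loader() -> type:
    raise ImportError("module not installed")


def discover(eps: List[FakeEntryPoint]) -> List[str]:
    """Best-effort: silently skip anything that fails or is not a BaseRule."""
    found = []
    for ep in sorted(eps, key=lambda e: e.name):
        try:
            instance = ep.load()()
        except Exception:
            continue
        if isinstance(instance, BaseRule):
            found.append(instance.rule_id)
    return found


def load_strict(ep: FakeEntryPoint) -> BaseRule:
    """Strict: every failure is surfaced as a PluginContractError."""
    try:
        instance = ep.load()()
    except Exception as exc:
        raise PluginContractError(
            f"Failed to load plugin rule '{ep.name}': {type(exc).__name__}: {exc}"
        ) from exc
    if not isinstance(instance, BaseRule):
        raise PluginContractError(f"Plugin rule '{ep.name}' must instantiate a BaseRule.")
    return instance


eps = [
    FakeEntryPoint("acme", lambda: GoodRule),
    FakeEntryPoint("broken", _broken_loader),
    FakeEntryPoint("not-a-rule", lambda: dict),
]
discovered = discover(eps)  # only the well-behaved rule survives discovery
```

In this sketch the listing path tolerates broken or unrelated entry points, whereas asking `load_strict` for the same broken entry raises immediately. That mirrors the split between the CLI listing and the scanner's handling of explicitly configured plugins.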
64 changes: 47 additions & 17 deletions src/zenzic/core/scanner.py
Expand Up @@ -24,7 +24,7 @@
from urllib.parse import unquote

from zenzic.core.adapter import get_adapter
from zenzic.core.rules import AdaptiveRuleEngine
from zenzic.core.rules import AdaptiveRuleEngine, BaseRule
from zenzic.core.shield import SecurityFinding, scan_line_for_secrets, scan_url_for_secrets
from zenzic.core.validator import LinkValidator
from zenzic.models.config import ZenzicConfig
Expand Down Expand Up @@ -743,23 +743,47 @@ def _iter_md_files(
def _build_rule_engine(config: ZenzicConfig) -> AdaptiveRuleEngine | None:
"""Construct a :class:`~zenzic.core.rules.AdaptiveRuleEngine` from the config.

Returns ``None`` when no custom rules are configured, avoiding the
overhead of engine construction on projects that do not use the feature.
Load order is deterministic:

1. Core rules registered by Zenzic itself (always enabled).
2. Regex rules from ``[[custom_rules]]``.
3. External plugin rules explicitly listed in ``plugins = [...]``.

Returns ``None`` when no rules are available.
"""
from zenzic.core.rules import CustomRule # deferred to keep import graph clean
from zenzic.core.rules import CustomRule, PluginRegistry # deferred to keep import graph clean

if not config.custom_rules:
# In this per-file pipeline, core VSM-only rules are no-op. Avoid building
# an engine (and avoid extra read_text calls) when no effective rules exist.
if not config.custom_rules and not config.plugins:
return None
rules = [

registry = PluginRegistry()
rules = registry.load_core_rules()
rules.extend(
CustomRule(
id=cr.id,
pattern=cr.pattern,
message=cr.message,
severity=cr.severity,
)
for cr in config.custom_rules
]
return AdaptiveRuleEngine(rules)
)
rules.extend(registry.load_selected_rules(config.plugins))

# Deduplicate by rule_id while preserving declaration priority.
deduped: list[BaseRule] = []
seen: set[str] = set()
for rule in rules:
rid = rule.rule_id
if rid in seen:
continue
seen.add(rid)
deduped.append(rule)

if not deduped:
return None
return AdaptiveRuleEngine(deduped)
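
The load order and the first-wins deduplication in `_build_rule_engine` can be sketched without any zenzic imports. The `Rule` dataclass, the `origin` labels, and the `TODO` and `ACME001` IDs are hypothetical and exist only for illustration; `Z001` is the core rule ID referenced in `load_core_rules`.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Rule:
    """Hypothetical minimal rule: an ID plus a label for where it came from."""

    rule_id: str
    origin: str


def dedupe_by_rule_id(rules: List[Rule]) -> List[Rule]:
    """Keep the first rule seen for each rule_id, preserving input order."""
    seen: Set[str] = set()
    kept: List[Rule] = []
    for rule in rules:
        if rule.rule_id in seen:
            continue
        seen.add(rule.rule_id)
        kept.append(rule)
    return kept


# Concatenation order encodes priority: core, then custom rules, then plugins.
core = [Rule("Z001", "core")]
custom = [Rule("TODO", "custom_rules")]
plugins = [Rule("Z001", "plugin"), Rule("ACME001", "plugin")]

merged = dedupe_by_rule_id([*core, *custom, *plugins])
```

Because the core rules come first in the concatenated list, a plugin that reuses `Z001` is dropped rather than overriding the built-in rule.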


def _emit_telemetry(*, mode: str, workers: int, n_files: int, elapsed: float) -> None:
Expand Down Expand Up @@ -826,8 +850,8 @@ def scan_docs_references(

The threshold default (50 files) is a conservative heuristic: below it,
``ProcessPoolExecutor`` spawn overhead (~200–400 ms on a cold interpreter)
exceeds the parallelism benefit. Override with ``workers=N`` to force a
specific pool size regardless of file count.
exceeds the parallelism benefit. Override with ``workers=N`` to select a
specific pool size when parallel mode is active.

**Determinism guarantee:** results are always sorted by ``file_path``
regardless of execution mode.
Expand All @@ -836,9 +860,13 @@ def scan_docs_references(
sequential mode. Files with security findings are excluded from link
validation in both modes.

**O(N) reads:** each file is read exactly once in sequential mode. In
parallel mode external URL registration runs a lightweight sequential pass
in the main process after workers complete (workers discard scanners).
**Read behaviour:** total I/O remains :math:`O(N)` in the number of files,
but individual files may be read multiple times. In sequential mode the
scanner typically performs separate Shield and content passes, and some
rules may trigger an additional ``read_text()`` call. In parallel mode the
same per-worker behaviour applies; when ``validate_links=True`` an extra
lightweight sequential pass in the main process registers external URLs
after workers complete (workers discard scanners).

Args:
repo_root: Repository root (must contain ``docs/``).
Expand All @@ -849,9 +877,8 @@ def scan_docs_references(
workers: Number of worker processes for parallel mode.
``1`` (default) always uses sequential execution.
``None`` lets ``ProcessPoolExecutor`` pick based on
``os.cpu_count()``. Any value other than ``1``
activates parallel mode when the file count is at or
above :data:`ADAPTIVE_PARALLEL_THRESHOLD`.
``os.cpu_count()``. Values must be ``None`` or
greater than or equal to ``1``.
verbose: When ``True``, print a single telemetry line to stderr
after the scan completes. Shows the engine mode, worker
count, elapsed time, and estimated speedup (parallel
Expand All @@ -867,6 +894,9 @@ def scan_docs_references(
"""
import time

if workers is not None and workers < 1:
raise ValueError("workers must be None or an integer >= 1")

if config is None:
config, _ = ZenzicConfig.load(repo_root)
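
A rough sketch of how the `workers` validation and the adaptive threshold might combine into a mode decision. `choose_mode` is a hypothetical helper, not part of zenzic. The threshold value of 50 comes from the docstring above. The rule that parallel mode needs both `workers != 1` and a file count at or above the threshold is an assumption based on the docstring wording, not a copy of the real control flow.

```python
from typing import Optional

ADAPTIVE_PARALLEL_THRESHOLD = 50  # default cited in the docstring


def choose_mode(workers: Optional[int], n_files: int) -> str:
    """Return "sequential" or "parallel" for a given request (illustrative only)."""
    # Same guard as in the diff: reject zero and negative pool sizes up front.
    if workers is not None and workers < 1:
        raise ValueError("workers must be None or an integer >= 1")
    # workers=1 always means sequential; small repositories stay sequential too,
    # because process-pool startup would cost more than it saves.
    if workers == 1 or n_files < ADAPTIVE_PARALLEL_THRESHOLD:
        return "sequential"
    return "parallel"
```

Under these assumptions, `choose_mode(4, 10)` stays sequential because the file count is below the threshold, which matches the revised wording that `workers=N` selects a pool size only when parallel mode is active.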

Expand Down Expand Up @@ -912,7 +942,7 @@ def scan_docs_references(
# Shield-as-firewall guarantee (no URLs from compromised files).
secure_scanners_b: list[ReferenceScanner] = []
for md_file in md_files:
_report_b, secure_scanner_b = _scan_single_file(md_file, config, rule_engine)
_report_b, secure_scanner_b = _scan_single_file(md_file, config, None)
if secure_scanner_b is not None:
secure_scanners_b.append(secure_scanner_b)
validator_b = LinkValidator()
Expand Down
8 changes: 8 additions & 0 deletions src/zenzic/models/config.py
Expand Up @@ -206,6 +206,14 @@ class ZenzicConfig(BaseModel):
"message='Remove before publish.' severity='warning'"
),
)
plugins: list[str] = Field(
default_factory=list,
description=(
"Explicit allow-list of external rule plugins to activate from the "
"'zenzic.rules' entry-point group. Core rules shipped by Zenzic are "
"always enabled."
),
)
# Pre-compiled regex patterns for placeholder detection.
# Populated automatically from placeholder_patterns in model_post_init.
# Excluded from serialisation — never written to or read from TOML.
Expand Down