diff --git a/CHANGELOG.it.md b/CHANGELOG.it.md deleted file mode 100644 index fcdb077..0000000 --- a/CHANGELOG.it.md +++ /dev/null @@ -1,719 +0,0 @@

# Changelog

All notable changes to this project are documented in this file.
The format is based on [Keep a Changelog][keep-a-changelog].
Versions follow [Semantic Versioning][semver].

---

## [Unreleased]

---

## [0.4.0-rc5] — 2026-04-01 — Sync Sprint: Zensical v0.0.31+ and API Parallelism

> **Sprint 10.** `ZensicalAdapter` is synchronized with the official Zensical
> v0.0.31+ schema (`[project].nav`). Nav parsing now supports every variant
> (string, titled page, nested section). Route classification becomes nav-aware
> (`ORPHAN_BUT_EXISTING` when a page exists but is not listed in an explicit
> nav). `map_url()` honors `use_directory_urls = false`. The documentation is
> aligned in EN/IT and `examples/zensical-basic/` is born as the canonical
> reference.

### Added

- **New example `examples/zensical-basic/`** — a complete minimal project with:
  `zensical.toml` in the v0.0.31+ format (`[project]`), `zenzic.toml` with
  `engine = "zensical"`, a nested nav and clean relative links.

- **Documentation sections on parallelism (rc5)** — added to
  `docs/architecture.md` and `docs/usage/advanced.md`: shared-nothing model with
  `ProcessPoolExecutor`, practical break-even threshold (~200 files), technical
  constraints on the picklability of custom rules.

- **Adapter coexistence sections (EN/IT)** — `docs/configuration/adapters-config.md`
  and `docs/it/configuration/adapters-config.md` updated with the logic applied
  when `mkdocs.yml` and `zensical.toml` coexist: `build_context.engine` always
  wins (no silent auto-switch).

### Changed

- **`ZensicalAdapter`** — initialization enriched with pre-computed nav state:
  `_nav_paths`, `_has_explicit_nav`, `_use_directory_urls`.
- **`get_nav_paths()`** — now reads correctly from `[project].nav` and supports
  recursive nav (strings, title->page dicts, title->nested-list dicts).

- **`classify_route()`** — now returns `ORPHAN_BUT_EXISTING` when an explicit
  nav is present and the file does not appear in it.

- **`tests/sandboxes/zensical/zensical.toml`** — migrated to the `[project]`
  format with an explicit nav.

- **EN/IT migration guides** — the `zensical.toml` snippets in
  `docs/guide/migration.md` and `docs/it/guide/migration.md` updated to the
  v0.0.31+ format.

### Fixed

- **Zensical nav schema compatibility** — removed the dependency on the legacy
  `[nav].nav` format with `{title, file}` pairs; we now adhere to the official
  `[project].nav` schema.

---

## [0.4.0-rc4] — 2026-03-31 — Virtual Site Map, UNREACHABLE_LINK and Routing Collision Detection

> **Sprint 8.** Zenzic gains build-engine emulation: the Virtual Site Map (VSM)
> projects every source file to its canonical URL before the build runs. Links
> to pages that exist on disk but are absent from the nav are now detected as
> `UNREACHABLE_LINK`. Routing collisions (e.g. `index.md` + `README.md` in the
> same directory) are reported as `CONFLICT`. Documentation paths unified under
> `/guide/`. Terminology aligned to "compatible successor".

### Added

- **Virtual Site Map (VSM)** — new module `zenzic.models.vsm` introducing the
  `Route` dataclass (`url`, `source`, `status`, `anchors`, `aliases`) and
  `build_vsm()`, an I/O-free function that projects every `.md` source file to
  its canonical URL and routing status (`REACHABLE`, `ORPHAN_BUT_EXISTING`,
  `IGNORED`, `CONFLICT`) using the active build-engine adapter. Zenzic now
  emulates the site router without running the build.

- **`UNREACHABLE_LINK` detection** — `validate_links_async` now cross-checks
  every successfully resolved internal link against the VSM.
A link to a file that exists on disk but is not
  listed in the MkDocs `nav:` emits `UNREACHABLE_LINK`, catching the class of
  404s that traditional file-existence checks miss. Automatically disabled for
  `VanillaAdapter` and `ZensicalAdapter` (filesystem-only routing).

- **Routing collision detection** — `_detect_collisions()` in `vsm.py` marks
  two source files that map to the same canonical URL as `CONFLICT`. The most
  common case — `index.md` and `README.md` in the same directory (Double
  Index) — needs no special logic: both produce the same URL and are therefore
  caught automatically.

- **Adapter methods `map_url()` and `classify_route()`** — added to the
  `BaseAdapter` Protocol, `MkDocsAdapter`, `ZensicalAdapter` and
  `VanillaAdapter`. `map_url(rel)` applies the engine-specific
  physical-to-virtual URL mapping; `classify_route(rel, nav_paths)` returns the
  routing status for a given source file.

### Changed

- **`MkDocsAdapter.classify_route`** — when no `nav:` section is declared in
  `mkdocs.yml`, all files are classified as `REACHABLE` (mirrors MkDocs'
  auto-inclusion behaviour). `README.md` remains `IGNORED` in every case.

- **Documentation paths** — all references to the obsolete `/guides/` path in
  `RELEASE.md`, `RELEASE.it.md`, `CHANGELOG.md`, `CHANGELOG.it.md` and
  `README.it.md` updated to the canonical `/guide/` root.

### Fixed

- **Terminology** — "Zensical is a superset of MkDocs" replaced with "Zensical
  is a compatible successor of MkDocs" throughout the documentation and the
  changelog entries.

---

## [0.4.0-rc3] — 2026-03-29 — i18n Anchor Fix, Multilanguage Snippets & Shield Deep-Scan

> **Sprint 7.** The i18n fallback gap for `AnchorMissing` is closed. Dead code
> eliminated. Shared locale path-remapping utility extracted.
Visual Snippets for
> custom-rule findings. Usage documentation split into three dedicated pages.
> JSON schema stabilized at 7 keys. Multilanguage snippet validation
> (Python/YAML/JSON/TOML) and whole-file Shield deep-scan added.

### Added

- **Multilanguage snippet validation** — `check_snippet_content` now validates
  fenced code blocks for four languages using pure-Python parsers (no
  subprocesses): `python`/`py` → `compile()`; `yaml`/`yml` → `yaml.safe_load()`;
  `json` → `json.loads()`; `toml` → `tomllib.loads()`. Blocks with unsupported
  language tags (e.g. `bash`) are silently skipped. `_extract_python_blocks`
  renamed to `_extract_code_blocks`.

- **Shield deep-scan — credentials in fenced blocks** — The credential scanner
  now runs on every line of the source file, including lines inside fenced code
  blocks (labelled or not). Previously `_iter_content_lines` fed both the
  Shield and the reference harvester, making fenced content invisible to the
  Shield. A new generator `_skip_frontmatter` provides a raw line stream
  (frontmatter-stripped only); `harvest()` now runs two independent passes —
  Shield on the raw stream, ref-defs + alt text on the filtered content stream.
  Links and reference definitions inside fenced blocks remain ignored to
  prevent false positives.

- **Shield extended to 7 credential families** — Added Stripe live keys
  (`sk_live_[0-9a-zA-Z]{24}`), Slack tokens (`xox[baprs]-[0-9a-zA-Z]{10,48}`),
  Google API keys (`AIza[0-9A-Za-z\-_]{35}`) and generic PEM private keys
  (`-----BEGIN [A-Z ]+ PRIVATE KEY-----`) in `core/shield.py`.

- **`resolve_anchor()` method in the `BaseAdapter` protocol** — New adapter
  method that returns `True` when an anchor miss on a locale file must be
  suppressed because the anchor exists in the default-locale equivalent of the
  file.
Implemented in
  `MkDocsAdapter`, `ZensicalAdapter` (via `remap_to_default_locale()`) and
  `VanillaAdapter` (always returns `False`).

- **`adapters/_utils.py` — pure utility `remap_to_default_locale()`** —
  Extracts the locale path-remapping logic that was independently duplicated in
  `resolve_asset()` and `is_shadow_of_nav_page()` in both adapters. Pure
  function: takes `(abs_path, docs_root, locale_dirs)` and returns the
  equivalent `Path` in the default locale, or `None`. No I/O.

- **Visual Snippets for `[[custom_rules]]` findings** — Custom-rule violations
  now show the offending source line below the finding header, preceded by a
  `│` indicator in the finding's severity colour. Standard findings are
  unaffected.

- **`strict` and `exit_zero` as `zenzic.toml` fields** — Both flags are now
  first-class fields in `ZenzicConfig` (type `bool | None`, sentinel `None` =
  unset). CLI flags override the TOML values. Enables project-level defaults.

- **JSON output schema — 7 stable keys** — `--format json` emits:
  `links`, `orphans`, `snippets`, `placeholders`, `unused_assets`,
  `references`, `nav_contract`.

- **Usage documentation split** — `docs/usage/index.md` split into three
  dedicated pages: `usage/index.md` (install + workflow), `usage/commands.md`
  (CLI reference), `usage/advanced.md` (three-pass pipeline, Shield,
  programmatic API, multilanguage). Italian mirrors (`docs/it/usage/`) at full
  parity. `mkdocs.yml` nav updated.

### Fixed

- **`AnchorMissing` had no i18n-fallback suppression** — The `AnchorMissing`
  branch in `validate_links_async` reported unconditionally. Links to
  translated headings in locale files produced false positives. Fix: the
  `AnchorMissing` branch now calls `adapter.resolve_anchor()`. Five new
  integration tests in `TestI18nFallbackIntegration`.
### Removed

- **`_should_suppress_via_i18n_fallback()`** — Dead code. Defined in
  `validator.py` but never called. Permanently removed.
- **`I18nFallbackConfig` NamedTuple** — Internal data structure for the deleted
  function. Removed.
- **`_I18N_FALLBACK_DISABLED`** — Sentinel constant for the deleted function.
  Removed.
- **`_extract_i18n_fallback_config()`** — Also dead code. It was covered by
  `TestI18nFallbackConfig` (6 tests), also removed. Total: ~118 lines out of
  `validator.py`.

### Tests

- 5 new anchor-fallback integration tests in `TestI18nFallbackIntegration`.
- `TestI18nFallbackConfig` (6 tests for the deleted functions) removed.
- 8 new snippet-validation tests (valid/invalid YAML, `yml` alias,
  valid/invalid JSON, JSON line-number accuracy, valid/invalid TOML).
- 5 new Shield deep-scan tests: secret in an unlabelled fence, secret in a
  `bash` fence, secret in a fence without ref-def creation, clean code block
  with no findings.
- **446 tests passing.** `nox preflight` — all gates green: ruff ✓ mypy ✓
  pytest ✓ reuse ✓ mkdocs build --strict ✓ zenzic check all --strict ✓.

---

## [0.4.0-rc2] — 2026-03-28 — The Great Decoupling

> **Sprint 6.** Zenzic no longer owns its adapters. Third-party adapters
> install as Python packages and are discovered at runtime via entry points.
> The Core no longer imports any concrete adapter. The documentation is
> promoted to a structured knowledge base with full i18n parity.

### Added

- **Dynamic Adapter Discovery** (`_factory.py`) — `get_adapter()` no longer
  imports `MkDocsAdapter` or `ZensicalAdapter` directly. The factory queries
  `importlib.metadata.entry_points(group="zenzic.adapters")` at runtime.
  Installing a package that registers in this group makes its adapter
  immediately available as `--engine <name>` — no Zenzic upgrade required.
The built-in adapters
  (`mkdocs`, `zensical`, `vanilla`) are registered in `pyproject.toml`:

  ```toml
  [project.entry-points."zenzic.adapters"]
  mkdocs = "zenzic.core.adapters:MkDocsAdapter"
  zensical = "zenzic.core.adapters:ZensicalAdapter"
  vanilla = "zenzic.core.adapters:VanillaAdapter"
  ```

- **`from_repo()` classmethod pattern** — Adapters own their configuration
  loading and enforcement contract. The factory calls
  `AdapterClass.from_repo(context, docs_root, repo_root)` when present.

- **`has_engine_config()` protocol method** (`BaseAdapter`) — Replaces the
  previous `isinstance(adapter, VanillaAdapter)` check in `scanner.py`. The
  scanner is now fully decoupled from all concrete adapter types.

- **`list_adapter_engines() -> list[str]`** — Public function returning the
  sorted list of registered adapter engine names.

- **`--engine ENGINE` flag on `check orphans` and `check all`** — Overrides
  `build_context.engine` for a single run without touching `zenzic.toml`.
  Unknown names produce a friendly error with the available choices:

  ```text
  ERROR: Unknown engine adapter 'hugo'.
  Installed adapters: mkdocs, vanilla, zensical
  Install a third-party adapter or choose from the list above.
  ```

- **`[[custom_rules]]` DSL** — Project-specific lint rules in `zenzic.toml` as
  a pure TOML array of tables. Adapter-independent: they fire identically with
  `mkdocs`, `zensical` and `vanilla`. Patterns compiled once at load time.
  Invalid patterns raise `ConfigurationError` at startup. See
  [Custom Rules DSL](docs/configuration/custom-rules-dsl.md).

- **`ZensicalAdapter.from_repo()` enforcement contract** — When
  `engine = "zensical"` is declared, `zensical.toml` must exist. `from_repo()`
  raises `ConfigurationError` immediately if it is absent. No silent fallback.
- **`MkDocsAdapter.config_file_found` tracking** — `from_repo()` records
  whether `mkdocs.yml` was found on disk (regardless of parsing).
  `has_engine_config()` returns `True` when the file existed.

- **`zenzic init` command** — Scaffolds `zenzic.toml` with automatic engine
  detection. Detects `mkdocs.yml` → presets `engine = "mkdocs"`; detects
  `zensical.toml` → presets `engine = "zensical"`; no file detected → Vanilla
  scaffold. All settings are commented out by default. `--force` overwrites an
  existing file.

- **"Helpful Hint" UX panel** — When a `check` command runs without
  `zenzic.toml`, Zenzic shows a Rich info panel suggesting `zenzic init`. The
  panel is automatically suppressed once `zenzic.toml` exists. Driven by the
  new `loaded_from_file: bool` flag returned by `ZenzicConfig.load()`.

- **`ZenzicConfig.load()` returns `tuple[ZenzicConfig, bool]`** — The second
  element (`loaded_from_file`) is `True` when `zenzic.toml` was found and
  parsed, `False` when the built-in defaults are used.

- **Documentation — Configuration split** —
  [Overview](docs/configuration/index.md) ·
  [Core Settings](docs/configuration/core-settings.md) ·
  [Adapters and Engine](docs/configuration/adapters-config.md) ·
  [Custom Rules DSL](docs/configuration/custom-rules-dsl.md)

- **Documentation — Italian parity** — `docs/it/` mirrors the full English
  structure: `it/configuration/` (4 pages),
  `it/developers/writing-an-adapter.md`, `it/guide/migration.md`.

- **Documentation — Writing an Adapter guide**
  (`docs/developers/writing-an-adapter.md`).

- **Documentation — MkDocs → Zensical migration guide**
  (`docs/guide/migration.md`) — Four-phase workflow with a baseline/diff/gate
  approach.
### Changed

- **Unknown engine → `VanillaAdapter`** (breaking change from v0.3) — Before:
  fallback to `MkDocsAdapter`. Now: `VanillaAdapter` (no-op orphan check).

- **`scanner.py` is now protocol-only** — removed the `VanillaAdapter` import;
  replaced `isinstance(adapter, VanillaAdapter)` with
  `not adapter.has_engine_config()`.

- **`output_format` parameter** (was `format`) — Renamed in `check_all`,
  `score` and `diff` to avoid shadowing the Python built-in `format`.

### Fixed (Sprint 6 — v0.4.0-rc2)

- **`check all` now runs 7/7 checks** — The reference-integrity pipeline
  (`scan_docs_references_with_links`) was never invoked by `check all`.
  Dangling links and Shield events could silently pass the global gate. Fix:
  `_collect_all_results` now calls the reference pipeline. `_AllCheckResults`
  gains the `reference_errors` and `security_events` fields. Shield exit code
  `2` is enforced unconditionally. The JSON output gains the `"references"`
  key.

- **Ghost file `docs/it/configuration.md`** — The Italian configuration
  god-page had not been deleted after the split into `docs/it/configuration/`.
  The orphan check correctly skips locale subtrees by design; the file was a
  physical ghost. Fix: file deleted.

- **`rule_findings` silently dropped in `check references`** —
  `IntegrityReport.rule_findings` was populated by the scanner but never
  iterated in the `check references` CLI output loop. Custom-rule violations
  were invisible to users. Fix: added iteration over `report.rule_findings` in
  the output path.

### Fixed (Sprint 5 — v0.4.0-rc1)

- **`find_repo_root()` root marker** — from `mkdocs.yml` to `.git` or
  `zenzic.toml`.
- **O(N) reads** — the double-read bottleneck eliminated with
  `_scan_single_file()`.
- **`[[custom_rules]]` regex pre-compilation** — `model_post_init` compiles
  once.
- **YAML frontmatter skipping** — `_iter_content_lines()` skips the leading
  `---` block.
- **Image-reference false positives** — guarded with a negative lookbehind
  (`(?<!…`) so image references are no longer matched as link references.

---

## [0.4.0-rc1]

> **Sprint 5.** Eleven production-grade fixes applied on top of
> `0.4.0-alpha.1`.

### Added

- **`validate_same_page_anchors`** — boolean field in `zenzic.toml` (default
  `false`).
- **`excluded_external_urls`** — list field in `zenzic.toml` (default `[]`).
- **`excluded_build_artifacts`** and **`excluded_asset_dirs`** — new fields in
  `zenzic.toml`.
- **Community section** — Get Involved, FAQ, Contributing, Bug Report, Docs
  Issue, Change Request, Pull Request.

### Changed

- **`find_repo_root()`** — from `mkdocs.yml` to `.git` + `zenzic.toml`.
- **`check_all` refactoring** — `_AllCheckResults` + `_collect_all_results()`.
- **`format` → `output_format`** — eliminates ruff A002.
- **Default `placeholder_patterns`** — 23 EN/IT stub conventions.

### Fixed

- **MkDocs Material explicit anchors** in `slug_heading()`.
- **HTML tags** stripped before slugification.
- **`check_references` output** relativized to `docs_root`.

### Tests

- 405 tests passing. `zenzic check all` — 6/6 OK.

---

## [0.4.0-alpha.1] — 2026-03-26 — The Sovereign Architecture

> Breaking release candidate. Introduces the Adapter Pipeline.

### Breaking Changes

- **i18n Folder Mode migration** — from Suffix Mode to Folder Mode
  (`docs/it/`).
- **`[build_context]` must be declared last** in `zenzic.toml`.

### Added

- **`BuildContext` model** — new `[build_context]` section in `zenzic.toml`.
- **`MkDocsAdapter`** — three agnostic methods: `is_locale_dir()`,
  `resolve_asset()`, `is_shadow_of_nav_page()`.
- **`get_adapter()` factory** — single entry point for adapter selection.
- **Automatic fallback to `mkdocs.yml`** when `build_context.locales` is empty.
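In Folder Mode a locale check like the `is_locale_dir()` listed above reduces to a first-segment test on the docs-relative path. The sketch below is an assumption about the shape of that check, not the actual adapter code.

```python
from pathlib import PurePosixPath


def is_locale_dir(rel_path: str, locales: list[str]) -> bool:
    """True when a docs-relative path lives under a configured locale folder.

    Folder Mode puts translations in docs/<locale>/..., so the first path
    segment alone decides locale membership. Pure string logic, no I/O.
    """
    parts = PurePosixPath(rel_path).parts
    return bool(parts) and parts[0] in locales
```

With `locales = ["it"]`, `it/guide/index.md` is a locale file while `guide/index.md` is not; an orphan check can use exactly this predicate to skip locale subtrees.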
### Fixed

- Orphan false positives (14 `docs/it/**/*.md` files).
- Unreachable-link false positives on `docs/it/` assets.
- `header.html` SPDX tag with Jinja2 whitespace stripping.

### Tests

- 384 tests passing. `zenzic check all` — 6/6 OK.

---

## [0.3.0-rc3] — 2026-03-25 — The Bulldozer Edition

> **Note:** Builds on `0.3.0-rc2`. Adds the Examples Trinity (Gold Standard,
> Broken Docs, Security Lab), the ISO 639-1 check in suffix detection and
> 20 chaos tests. This is the final Release Candidate before the stable
> `0.3.0` tag.

### Added

- **Examples Trinity** — three reference directories in `examples/` covering
  the full spectrum of documentation integrity:
  - `examples/i18n-standard/` — the Gold Standard: deep hierarchy, suffix mode,
    ghost artifacts (`excluded_build_artifacts`), zero absolute links, 100/100.
  - `examples/broken-docs/` — updated with an absolute-link violation and a
    broken i18n link to demonstrate the Portability Enforcement Layer and
    cross-locale validation.
  - `examples/security_lab/` — updated with `traversal.md` and `absolute.md`;
    four distinct Shield and Portability triggers, all verified.
- **`examples/run_demo.sh` Philosophy Tour** — a three-act orchestrator:
  Act 1 Standard (must pass), Act 2 Broken (must fail), Act 3 Shield (must
  block).
- **Ghost Artifact demo** — `examples/i18n-standard/` references
  `assets/manual.pdf` and `assets/brand-kit.zip` via
  `excluded_build_artifacts`. Zenzic goes green without the files on disk —
  living proof of Build-Aware Intelligence.

### Changed

- **ISO 639-1 guard** — `_extract_i18n_locale_patterns` now validates locale
  strings with `re.fullmatch(r'[a-z]{2}', locale)`. Version tags (`v1`, `v2`),
  build tags (`beta`, `rc1`), numeric strings, BCP 47 codes and uppercase
  values are silently rejected.
Only lowercase two-letter codes produce
  `*.locale.md` patterns.

### Tests

- **`tests/test_chaos_i18n.py`** — 20 chaos scenarios (ISO 639-1 guard × 11,
  pathological orphan check × 9). 367 passed, 0 failed.

---

## [0.3.0-rc2] — 2026-03-25 — The Agnostic Standard

> **Note:** Builds on `0.3.0-rc1`. Adds the Portability Enforcement Layer
> (absolute-link ban) and migrates the project documentation to the
> engine-agnostic i18n Suffix Mode. Zenzic's i18n validation now works with any
> documentation engine, with no plugin dependencies.

### Added

- **Absolute Link Ban** — Links starting with `/` now raise a blocking error.
  Absolute paths are environment-dependent: they break when the documentation
  is hosted in a subfolder (e.g. `sito.io/docs/`). Zenzic enforces relative
  paths (`../` or `./`) to keep the documentation portable in any hosting
  context. The error message includes an explicit fix suggestion.
- **Suffix-based agnostic i18n** — Support for the non-nested translation
  pattern (`pagina.locale.md`). Zenzic detects locale suffixes from file names
  independently of any build-engine plugin. This makes i18n validation
  compatible with Zensical, MkDocs, Hugo or a plain folder of Markdown files,
  with no engine-specific plugins required.

### Fixed

- **i18n navigation integrity** — The project documentation migrated from
  Folder Mode (`docs/it/pagina.md`) to Suffix Mode (`docs/pagina.it.md`).
  Suffix Mode eliminates asset depth ambiguity: translated files live in the
  same folder as the originals, so all relative paths are symmetric across
  languages. Fixes the loss of context on language switch and the cross-locale
  asset 404s (double slash generated by absolute paths in folder mode).
- **Asset path symmetry** — Link depth unified for original and translated
  files. All relative paths in `.it.md` files are now structurally identical to
  their `.md` counterparts, making translation maintenance simple and
  error-free.

### Changed

- **Portability Enforcement Layer** — Pre-resolution stage added to
  `validate_links_async` that rejects absolute internal paths before
  `InMemoryPathResolver` is consulted. Runs unconditionally, regardless of
  engine, plugin or locale configuration.

---

## [0.3.0-rc1] — 2026-03-25 — The Build-Aware Candidate

> **Note:** This Release Candidate supersedes the (withdrawn) stable 0.3.0 tag
> and incorporates all Sprint 4 Phase 1 and Phase 2 work. It is the reference
> baseline for the v0.3.x line.

### Added

- **Build-Aware Intelligence (i18n)** — Zenzic now understands the MkDocs
  `i18n` plugin in `folder` mode. When `fallback_to_default: true` is set in
  `mkdocs.yml`, links to untranslated pages are resolved to the default locale
  before being reported as broken. No false positives for partial translations.
- **`excluded_build_artifacts`** — new `zenzic.toml` field accepting glob
  patterns (e.g. `["pdf/*.pdf"]`) for assets generated at build time. Links to
  matching paths are suppressed at lint time without requiring the physical
  file on disk.
- **Reference-style link validation** — `[text][id]` links are now resolved
  through the full `InMemoryPathResolver` pipeline (including the i18n
  fallback). Previously invisible to the link checker; now first-class citizens
  alongside inline links.
- **`I18nFallbackConfig`** — internal `NamedTuple` encoding the i18n fallback
  semantics (`enabled`, `default_locale`, `locale_dirs`).
Designed for
  extension: any future locale-aware rule can consume this config without
  re-parsing `mkdocs.yml`.
- **Tower of Babel suite** (`tests/test_tower_of_babel.py`) — 20 scenarios
  covering the full i18n folder-mode matrix: fully translated pages, partial
  translations, ghost links, direct cross-locale links, case-sensitivity
  collisions, nested paths, orphan exclusion, the `ConfigurationError` guard
  and reference-style links across locales.
- **Engine-agnostic core** — Zenzic is a pure standalone CLI, usable with any
  documentation framework (MkDocs, Zensical or none). Zero plugin dependencies.
- **`InMemoryPathResolver`** — deterministic, engine-agnostic link resolver in
  `zenzic.core`. Resolves internal Markdown links against a prebuilt in-memory
  file map. Zero I/O after construction; supports relative, site-absolute and
  fragment links.
- **Zenzic Shield** — built-in protection against path-traversal attacks during
  file scanning. `PathTraversal` surfaces as a distinct high-severity outcome.
- **Hierarchical configuration** — new `fail_under` field in `zenzic.toml`
  (0–100) with precedence: CLI flag `--fail-under` > `zenzic.toml` > default
  `0` (observational mode).
- **Dynamic Scoring v2** — `zenzic score --save` persists a `ScoreReport` JSON
  snapshot (`.zenzic-score.json`) with `score`, `threshold`, `status` and a
  per-category breakdown, ready for shields.io badge automation via
  `dynamic-badges-action`.
- **Bilingual documentation** — EN/IT documentation synchronized across all
  sections.

### Fixed

- **Orphan-file false positives** — `find_orphans()` no longer flags files in
  locale subfolders (e.g. `docs/it/`) as orphans when the i18n plugin is
  configured in `folder` mode.
- **Non-deterministic asset validation** — `validate_links_async()` previously
  called `Path.exists()` for every link on the critical path, producing
  I/O-dependent results in CI. Pass 1 now builds a
  `known_assets: frozenset[str]` pre-map; Pass 2 uses O(1) set membership with
  zero disk I/O.
- **Null-safe YAML iteration** — `languages: null` in `mkdocs.yml` is now
  handled correctly by all i18n helpers (`or []` guard pattern). Previously it
  raised `TypeError` when the key was present with a null value.
- **Entry point** — `pyproject.toml` fixed to `zenzic.main:cli_main`, which
  initializes logging before handing control to Typer.
- **Type safety** — fixed the `TypeError` (`MagicMock > int`) in the scorer
  tests caused by an untyped config mock.
- **Asset integrity** — build artifact (`.zip`) generation automated in
  `run_demo.sh`, `nox -s preflight` and CI, guaranteeing a consistent 100/100
  score.
- **`BUILD_DATE` type coercion** — format changed from `%Y-%m-%d` to `%Y/%m/%d`
  to stop PyYAML auto-converting the date string into a `datetime.date`.
- **CVE-2026-4539 (Pygments ReDoS)** — risk accepted and documented: the ReDoS
  in Pygments' `AdlLexer` is unreachable in Zenzic's threat model (Zenzic
  processes no ADL input; Pygments is used only for static documentation syntax
  highlighting). Exemption added to `nox -s security` pending an upstream
  patch. All other vulnerabilities remain fully audited.

### Changed

- **CLI interface** — all residual references to the MkDocs plugin removed; the
  public API is exclusively the command-line interface. Generator selection
  (`mkdocs.yml`) is detected automatically at runtime.
- **`zenzic.toml` self-check** — `excluded_build_artifacts = ["pdf/*.pdf"]`
  added to the repository configuration, removing the requirement to
  pre-generate the PDFs before running `zenzic check all` locally.
- **Zenzic Shield** — Path Traversal protection now built into the
  `InMemoryPathResolver` core, replacing the previous ad-hoc check in the CLI
  wrapper.

---

## [0.3.0] — 2026-03-24 — [WITHDRAWN]

> Superseded by `0.3.0-rc1`. This tag was created before the Build-Aware
> Intelligence work (i18n folder mode, O(1) asset mapping, reference-style
> links) was merged. Use `0.3.0-rc1`.

---

## [0.2.1] — 2026-03-24

### Removed

- **`zensical.toml` support** — Zensical now reads `mkdocs.yml` natively; a
  separate `zensical.toml` is no longer required nor supported as a build
  configuration file. The `examples/broken-docs/zensical.toml` fixture is kept
  only as a test asset.
- **`mkdocs` runtime dependency** — `mkdocs>=1.5.0` removed from
  `[project.dependencies]`. The MkDocs plugin packages (`mkdocs-material`,
  `mkdocs-minify-plugin`, `mkdocs-with-pdf`, `mkdocstrings`,
  `mkdocs-static-i18n`) stay in `[dependency-groups.dev]` pending native
  Zensical equivalents for the social, minify and with-pdf features.
- **MkDocs plugin entry point** — `[project.entry-points."mkdocs.plugins"]`
  removed. Zenzic no longer registers itself as an `mkdocs.plugins` entry
  point. Use `zenzic check all` in CI instead of the plugin.

### Changed

- **`find_config_file()`** — looks for `mkdocs.yml` only; the `zensical.toml`
  preference logic removed.
- **`find_repo_root()`** — walks up to `mkdocs.yml` or `.git`; no longer checks
  `zensical.toml`.
- **`find_orphans()`** — TOML branch removed; always reads `mkdocs.yml` via
  `_PermissiveYamlLoader`. The i18n localization fallback branch removed.
- **`_detect_engine()`** — simplified: `mkdocs.yml` is the single configuration
  trigger; `zensical` is tried first (it reads `mkdocs.yml` natively), then
  `mkdocs`. The `zensical.toml`-first heuristic is removed.
- **`noxfile.py`** — the `docs` and `docs_serve` sessions use
  `zensical build/serve`; `preflight` uses `zensical build --strict`.
- **`justfile`** — the `build`, `serve` and `build-release` targets use
  `zensical`; `live` is now an alias for `serve`.
- **`deploy-docs.yml`** — the build step uses `uv run zensical build --strict`.
- **`zenzic.yml`** — trigger paths reduced to `docs/**` and `mkdocs.yml` only.
- **`mkdocs.yml`** — version bumped to `0.2.1`; comment updated to note
  Zensical's native reading.
- **`pyproject.toml`** — the mypy override for `mkdocs.*` removed (no longer a
  runtime dependency).

### Added

- **Documentation restructuring** — new `docs/about/` section with `index.md`,
  `vision.md`, `license.md` (EN + IT); new `docs/reference/` section with
  `index.md` and `api.md` (EN + IT).
- **`mkdocs.yml` nav** — reflects the new `about/` and `reference/` layout with
  the Material `navigation.indexes` and `navigation.expand` features.

---

## [0.2.0-alpha.1] — 2026-03-23

### Added

#### Two-Pass Reference Pipeline — `zenzic check references`

- **`ReferenceMap`** (`zenzic.models.references`) — stateful per-file registry
  for `[id]: url` reference-link definitions. CommonMark §4.7 first-wins: the
  first definition of any ID in document order wins; later definitions are
  ignored and tracked in `duplicate_ids`. Keys are case-insensitive
  (`lower().strip()`). Each entry stores `(url, line_no)` as metadata for
  precise error reports. The `integrity_score` property returns
  `|used_ids| / |definitions| × 100`; guarded against ZeroDivisionError — it
  returns `100.0` when no definitions exist.
-- **`ReferenceScanner`** (`zenzic.core.scanner`) — scanner stateful per-file che implementa una pipeline in tre fasi: (1) **Harvesting** (`harvest()`) legge le righe tramite generator `_iter_content_lines()` (O(1) RAM per riga), popola la `ReferenceMap` ed esegue lo Zenzic Shield su ogni URL; (2) **Cross-Check** (`cross_check()`) risolve ogni utilizzo `[testo][id]` rispetto alla mappa completamente popolata, emettendo `ReferenceFinding(issue="DANGLING")` per ogni Dangling Reference; (3) **Integrity Report** (`get_integrity_report()`) calcola l'`integrity_score`, segnala le Dead Definitions (`issue="DEAD_DEF"`) e consolida tutti i finding con gli errori prima. -- **`scan_docs_references` / `scan_docs_references_with_links`** — orchestratori di alto livello che eseguono la pipeline su ogni file `.md` in `docs/`. Contratto Shield-as-firewall: il Pass 2 (Cross-Check) viene saltato interamente per qualsiasi file con eventi `SECRET`. Deduplicazione URL globale opzionale tramite `LinkValidator` quando è richiesto `--links`. -- **Zenzic Shield** (`zenzic.core.shield`) — motore di rilevamento segreti che scansiona ogni URL di riferimento durante l'Harvesting usando pattern pre-compilati con quantificatori a lunghezza esatta (nessun backtracking, O(1) per riga). Tre famiglie di credenziali: OpenAI API key (`sk-[a-zA-Z0-9]{48}`), GitHub token (`gh[pousr]_[a-zA-Z0-9]{36}`), AWS access key (`AKIA[0-9A-Z]{16}`). Qualsiasi rilevamento causa l'interruzione immediata con **Exit Code 2**; nessuna richiesta HTTP viene emessa per documenti contenenti credenziali esposte. -- **`LinkValidator`** (`zenzic.core.validator`) — registro di deduplicazione URL globale sull'intero albero della documentazione. `register_from_map()` registra tutti gli URL `http/https` da una `ReferenceMap`. `validate()` emette esattamente una richiesta HEAD per URL unico, indipendentemente da quanti file vi fanno riferimento. 
Riutilizza il motore asincrono `_check_external_links` esistente (semaphore(20), fallback HEAD→GET, 401/403/429 trattati come vivi).
-- **Comando CLI `zenzic check references`** — attiva la pipeline completa in tre fasi. Flag: `--strict` (le Dead Definitions diventano errori bloccanti), `--links` (validazione HTTP asincrona di tutti gli URL di riferimento, 1 ping per URL unico). Exit Code 2 riservato esclusivamente agli eventi Zenzic Shield.
-- **Controllo accessibilità alt-text** (`check_image_alt_text`) — funzione pura che segnala sia immagini inline `![](url)` che tag HTML `<img>` senza alt text. `is_warning=True`; promosso a errore con `--strict`. Non blocca mai i deploy per default.
-- **Modulo `zenzic.models.references`** — nuova posizione canonica per `ReferenceMap`, `ReferenceFinding`, `IntegrityReport`. `zenzic.core.models` diventa uno shim di re-export per retrocompatibilità.
-
-#### Documentazione
-
-- `docs/architecture.md` — sezione "Two-Pass Reference Pipeline (v0.2.0)": tabella comparativa stateless→document-aware, problema dei forward reference, diagramma ASCII del ciclo di vita, razionale dello streaming tramite generator, invarianti di `ReferenceMap` (first-wins, case-insensitivity, metadati numero di riga), design Shield-as-firewall, diagramma deduplicazione URL globale, gentle nudge accessibilità, riepilogo completo del flusso dati con formula LaTeX dell'integrità.
-- `docs/usage.md` — riscrittura completa per v0.2.0: content tab `uv`/`pip` per ogni livello di installazione, `check references` con `--strict`/`--links`, sezione Reference Integrity con formula LaTeX, sezione CI/CD integration (tabella `uvx` vs `uv run`, workflow GitHub Actions, gestione Exit Code 2), sezione Programmatic Usage con esempi API `ReferenceScanner`.
-- `README.md` — stile Reference Link ovunque, sezione `## 🛡️ Zenzic Shield`, `> [!WARNING]` per Exit Code 2, tabella dei controlli aggiornata con `check references`. 
-- Tutti i file di localizzazione italiana (`*.it.md`) sincronizzati con il sorgente inglese secondo la direttiva Parità Documentale. - -### Modificato - -- Modello di scansione: da **stateless** (riga per riga, senza memoria delle righe precedenti) a **document-aware** (Three-Phase Pipeline con stato `ReferenceMap` per-file). -- Modello di memoria: il generator `_iter_content_lines()` sostituisce `.read()` / `.readlines()` nella pipeline di riferimento — la RAM al picco scala con la dimensione della `ReferenceMap`, non con la dimensione del file. -- Deduplicazione URL globale estesa alla pipeline di riferimento: `LinkValidator` deduplica al momento della registrazione sull'intero albero della documentazione — una sola richiesta HTTP per URL unico indipendentemente dal conteggio dei riferimenti. - -### Corretto - -- **Forward Reference Trap** — gli scanner single-pass producono falsi errori di Dangling Reference quando `[testo][id]` appare prima di `[id]: url` nello stesso file. Risolto per design nell'architettura Two-Pass: il Cross-Check viene eseguito solo dopo che l'Harvesting ha completamente popolato la `ReferenceMap`. -- Normalizzazione ID di riferimento: gli spazi iniziali/finali e il casing misto vengono rimossi dentro `add_definition()` e `resolve()` — le voci duplicate per ID che differiscono solo per casing o spaziatura sono impossibili per costruzione. - -### Sicurezza - -- **Exit Code 2** — riservato esclusivamente agli eventi Zenzic Shield. Se `zenzic check references` esce con codice 2, è stato rilevato un pattern di credenziale incorporato in un URL di riferimento. La pipeline si interrompe immediatamente; tutte le richieste HTTP e l'analisi Cross-Check vengono saltate. 
**Ruota la credenziale esposta immediatamente.** - ---- - -## [0.1.0-alpha.1] — 2026-03-23 - -### 🚀 Funzionalità - -#### Validatore di link nativo — nessun sottoprocesso, nessuna dipendenza MkDocs - -- `zenzic check links` — completamente riscritto come validatore Markdown nativo in due pass. Il pass 1 legge tutti i file `.md` in memoria e pre-calcola i set di ancore dalle intestazioni ATX. Il pass 2 estrae i link inline e le immagini tramite `_MARKDOWN_LINK_RE`, risolve i percorsi interni rispetto alla mappa di file in memoria, valida le ancore `#frammento` e rifiuta i path traversal fuori da `docs/`. Il pass 3 (solo `--strict`) esegue il ping degli URL esterni in modo concorrente tramite `httpx` con concorrenza limitata (`asyncio.Semaphore(20)`), deduplicazione URL e degradazione controllata per risposte 401/403/429. MkDocs non è più richiesto per la validazione dei link. - -#### Punteggio qualità e rilevamento regressioni - -- `zenzic score` — aggrega tutti i risultati dei cinque controlli in un intero ponderato da 0 a 100. Pesi: `links` 35%, `orphans` 20%, `snippets` 20%, `placeholders` 15%, `assets` 10%. Supporta `--format json`, `--save` (persiste snapshot in `.zenzic-score.json`) e `--fail-under `. -- `zenzic diff` — confronta il punteggio corrente con il baseline `.zenzic-score.json`; esce con codice non-zero quando il punteggio regredisce oltre `--threshold` punti. -- `zenzic check all --exit-zero` — produce il report completo ma esce sempre con codice 0; pensato per pipeline CI soft-fail e sprint di miglioramento della documentazione. - -#### Server di sviluppo engine-agnostico con pre-flight - -- `zenzic serve` — rileva automaticamente il motore di documentazione dalla root del repository e lo avvia con `--dev-addr 127.0.0.1:{porta}`. Fallback a server di file statici su `site/` quando nessun binario del motore è installato. Risoluzione porta tramite socket probe prima del subprocess. 
Esegue un pre-flight silenzioso prima dell'avvio; usa `--no-preflight` per saltarlo. - -#### Configurazione - -- Campo `excluded_assets` in `ZenzicConfig` — lista di percorsi asset esclusi dal controllo asset non usati. -- Campo `excluded_file_patterns` in `ZenzicConfig` — lista di pattern glob di nomi file esclusi dal controllo orfani. - -### 🛡️ Qualità e Test - -- **98,4% di copertura test** su `zenzic.core.*` e wrapper CLI. -- **`PermissiveYamlLoader`** in `scanner.py` — gestisce i tag `!ENV` di MkDocs che altrimenti causavano la segnalazione di tutte le pagine come orfane. - -### 📦 DevOps - -- **Pubblicazione PyPI via OIDC** — workflow `release.yml` pubblica su PyPI usando OpenID Connect trusted publishing; nessun token API a lunga durata. -- **Backend di build hatch** — `pyproject.toml` migrato a `hatchling`; versione ottenuta dai tag git tramite `hatch-vcs`. -- **`zensical` come extra opzionale** — dipendenza `zensical` spostata in `optional-dependencies[zensical]`. -- **Documentazione multilingua (EN/IT)** — tutte le pagine rivolte all'utente disponibili in inglese e italiano. 
- -## [0.1.0] — 2026-03-18 - -### Aggiunto - -- `zenzic check links` — rilevamento di link non validi e ancore mancanti via `mkdocs build --strict` -- `zenzic check orphans` — rileva file `.md` assenti dalla `nav` -- `zenzic check snippets` — controlla la sintassi di tutti i blocchi Python delimitati -- `zenzic check placeholders` — segnala pagine stub e pattern di testo vietati -- `zenzic check assets` — rileva immagini e asset non utilizzati -- `zenzic check all` — esegue tutti i controlli con un solo comando; supporta `--format json` per l'integrazione CI/CD -- Generazione PDF professionale — plugin `with-pdf` integrato con copertina Jinja2 brandizzata e timestamp dinamico -- File di configurazione `zenzic.toml` con modelli Pydantic v2; tutti i campi opzionali con valori predefiniti sensati -- `justfile` — task runner integrato per lo sviluppo rapido (sync, lint, dev, build-release) -- `examples/broken-docs/` — repository di documentazione intenzionalmente rotta per tutti i cinque tipi di controllo -- `noxfile.py` — task runner per sviluppatori: `tests`, `lint`, `format`, `typecheck`, `reuse`, `security`, `docs`, `preflight`, `screenshot`, `bump` -- `scripts/generate_screenshot.py` — screenshot SVG riproducibile del terminale tramite Rich `Console(record=True)` -- Piena conformità REUSE 3.3 / SPDX su tutti i file sorgente -- GitHub Actions — `ci.yml`, `release.yml`, `sbom.yml`, `secret-scan.yml`, `security-posture.yml`, `dependabot.yml` -- Suite documentazione — index, architettura, riferimento controlli e riferimento configurazione -- Pre-commit hook — ruff, mypy, reuse, self-check di Zenzic - ---- - - - -[keep-a-changelog]: https://keepachangelog.com/en/1.1.0/ -[semver]: https://semver.org/ -[0.3.0]: https://github.com/PythonWoods/zenzic/compare/v0.2.1...v0.3.0 -[0.3.0-rc1]: https://github.com/PythonWoods/zenzic/compare/v0.2.1...v0.3.0-rc1 -[0.2.1]: https://github.com/PythonWoods/zenzic/compare/v0.2.0-alpha.1...v0.2.1 -[0.2.0-alpha.1]: 
https://github.com/PythonWoods/zenzic/compare/v0.1.0-alpha.1...v0.2.0-alpha.1 -[0.1.0-alpha.1]: https://github.com/PythonWoods/zenzic/compare/v0.1.0...v0.1.0-alpha.1 -[0.1.0]: https://github.com/PythonWoods/zenzic/releases/tag/v0.1.0 diff --git a/CHANGELOG.md b/CHANGELOG.md index 3e264ee..0301583 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -11,6 +11,138 @@ Versions follow [Semantic Versioning](https://semver.org/). ## [Unreleased] +## [0.5.0a1] — 2026-04-02 — The Sentinel: Hybrid Adaptive Engine & Plugin System + +> **Sprint 11.** Zenzic enters the v0.5 cycle with a unified execution model and a +> first-class plugin system. `scan_docs_references` replaces the two separate serial +> and parallel functions. The engine selects sequential or `ProcessPoolExecutor` mode +> automatically based on repository size (threshold: 50 files). All rules are validated +> for pickle-serializability at construction time. Core rules are now registered as +> entry-points under `zenzic.rules`, establishing the public plugin contract. + +### BREAKING CHANGES + +- **`scan_docs_references` signature changed.** The function now returns + `tuple[list[IntegrityReport], list[str]]` instead of `list[IntegrityReport]`. + Callers that ignored link errors must unpack the tuple: + + ```python + # Before (0.4.x) + reports = scan_docs_references(repo_root) + + # After (0.5.x) + reports, _ = scan_docs_references(repo_root) + ``` + +- **`scan_docs_references_parallel` and `scan_docs_references_with_links` are + removed.** Use `scan_docs_references(..., workers=N)` and + `scan_docs_references(..., validate_links=True)` respectively. + +- **`RuleEngine` is removed.** The class is now `AdaptiveRuleEngine` with no + alias. The constructor runs eager pickle validation on every rule and raises + `PluginContractError` if any rule is not serialisable. + +### Added + +- **`AdaptiveRuleEngine`** (`zenzic.core.rules`) — unified rule engine with + Hybrid Adaptive Mode. 
Replaces and removes `RuleEngine` (no alias). + Validates all rules for pickle-serializability at construction time via + `_assert_pickleable()`. + +- **`_assert_pickleable(rule)`** (`zenzic.core.rules`) — module-level helper + called by `AdaptiveRuleEngine.__init__`. Raises `PluginContractError` on + failure with a diagnostic message including the rule ID, class name, and the + pickle error. + +- **`ADAPTIVE_PARALLEL_THRESHOLD`** (`zenzic.core.scanner`) — module-level + constant (default: `50`). The file count above which parallel mode activates. + Exposed for test overrides without patching private internals. + +- **`PluginContractError`** (`zenzic.core.exceptions`) — new exception for rule + plugin violations. Added to the exception hierarchy docstring. + +- **`zenzic.rules` entry-point group** (`pyproject.toml`) — core rules + registered as first-class plugins: + + ```toml + [project.entry-points."zenzic.rules"] + broken-links = "zenzic.core.rules:VSMBrokenLinkRule" + ``` + +- **`docs/developers/plugins.md`** (EN + IT) — new page documenting the rule + plugin contract: module-level requirement, pickle safety, purity, packaging + via `entry_points`, `plugins` key in `zenzic.toml`, error isolation, and a + pre-publication checklist. + +- **`docs/developers/index.md`** (EN + IT) — added link to `plugins.md`. + +- **`zenzic plugins list`** — new CLI sub-command. Lists every rule registered + in the `zenzic.rules` entry-point group with its `rule_id`, origin + distribution, and fully-qualified class name. Core rules are labelled + `(core)`; third-party rules show the installing package name. + +- **`pyproject.toml` configuration support (ISSUE #5)** — `ZenzicConfig.load()` + now follows a three-level Agnostic Citizen priority chain: + `zenzic.toml` (Priority 1) → `[tool.zenzic]` in `pyproject.toml` + (Priority 2) → built-in defaults (Priority 3). If both files exist, + `zenzic.toml` wins unconditionally. 
+ +- **`scan_docs_references` `verbose` flag** — new keyword-only parameter + `verbose: bool = False`. When `True`, prints a one-line performance + telemetry summary to stderr after the scan: engine mode (Sequential or + Parallel), worker count, file count, elapsed time, and estimated speedup + (parallel mode only). + +- **`PluginRuleInfo` dataclass** (`zenzic.core.rules`) — lightweight struct + returned by the new `list_plugin_rules()` discovery function. Fields: + `rule_id`, `class_name`, `source`, `origin`. + +- **`docs/configuration/index.md`** (EN + IT) — "Configuration loading" section + expanded with the three-level priority table and a `[tool.zenzic]` example. + +### Changed + +- **`scan_docs_references`** (`zenzic.core.scanner`) — unified function + replacing `scan_docs_references` + `scan_docs_references_parallel`. New + signature: + + ```python + scan_docs_references( + repo_root, config=None, + *, validate_links=False, workers=1 + ) -> tuple[list[IntegrityReport], list[str]] + ``` + + Hybrid Adaptive Mode: sequential when `workers=1` or `< 50 files`; parallel + (`ProcessPoolExecutor`) otherwise. Results always sorted by `file_path`. + +- **`docs/architecture.md`** and **`docs/it/architecture.md`** — "Parallel scan + (v0.4.0-rc5)" section replaced by "Hybrid Adaptive Engine (v0.5.0a1)" with + a Fan-out/Fan-in Mermaid diagram showing the threshold decision node. + IT section was previously absent; added from scratch. + +- **`docs/usage/advanced.md`** and **`docs/it/usage/advanced.md`** — parallel + scan section rewritten to document the unified `scan_docs_references` API and + the Hybrid Adaptive Engine threshold table. + +- **`docs/usage/commands.md`** (EN + IT) — added `zenzic plugins list` command + documentation and `--workers` flag reference for the Hybrid Adaptive Engine. + +- **`README.md`** — "RC5 Highlights" replaced by "v0.5.0a1 Highlights — + The Sentinel". + +- **`pyproject.toml`** — version bumped to `0.5.0a1`. 
+ +- **`src/zenzic/__init__.py`** — `__version__` bumped to `"0.5.0a1"`. + +### Removed + +- `scan_docs_references_parallel` — deleted; use `scan_docs_references(..., workers=N)`. +- `scan_docs_references_with_links` — deleted; use `scan_docs_references(..., validate_links=True)`. +- `RuleEngine` — deleted; use `AdaptiveRuleEngine` directly. + +--- + ## [0.4.0-rc4] — 2026-04-01 — Ghost Route Support, VSM Rule Engine & Content-Addressable Cache ## [0.4.0-rc5] — 2026-04-01 — The Sync Sprint: Zensical v0.0.31+ & Parallel API diff --git a/README.it.md b/README.it.md index 1587f59..c14848e 100644 --- a/README.it.md +++ b/README.it.md @@ -42,17 +42,24 @@ fallback, nessuna supposizione. --- -## Novita RC5 (v0.4.0-rc5) - -- **Sync Zensical v0.0.31+**: `ZensicalAdapter` legge ora la nav da `[project].nav` - (schema TOML ufficiale), incluse sezioni annidate. -- **Routing nav-aware**: con nav esplicita, i file presenti su disco ma assenti dalla nav - vengono classificati `ORPHAN_BUT_EXISTING`. -- **Parita URL**: `map_url()` rispetta `[project].use_directory_urls = false` - (`/pagina.html`) oltre al default directory-style (`/pagina/`). -- **Parallelismo API documentato**: modello shared-nothing con `ProcessPoolExecutor`, - note oneste sull'overhead e requisiti di picklability per regole custom. -- **Nuovo esempio canonico**: `examples/zensical-basic/` allineato agli snippet documentati. +## v0.5.0a1 — La Sentinella + +- **Hybrid Adaptive Engine**: `scan_docs_references` è l'unico entry point unificato per + tutte le modalità di scansione. Il motore seleziona l'esecuzione sequenziale o parallela + automaticamente in base alla dimensione del repository (soglia: 50 file). +- **`AdaptiveRuleEngine` con validazione pickle anticipata**: tutte le regole vengono + validate per la serializzabilità pickle al momento della costruzione. Una regola non + serializzabile solleva `PluginContractError` immediatamente. 
+- **`zenzic plugins list`**: nuovo comando che mostra ogni regola registrata nel gruppo + entry-point `zenzic.rules` — regole Core e plugin di terze parti. +- **Supporto `pyproject.toml` (ISSUE #5)**: incorpora la configurazione Zenzic in + `[tool.zenzic]` quando `zenzic.toml` è assente. `zenzic.toml` vince sempre se entrambi + i file esistono. +- **Telemetria delle prestazioni**: `scan_docs_references(verbose=True)` stampa modalità + motore, numero di worker, tempo di esecuzione e speedup stimato su stderr. +- **`PluginContractError`**: nuova eccezione per le violazioni del contratto delle regole. +- **Documentazione plugin**: `docs/developers/plugins.md` (EN + IT) — contratto completo, + istruzioni di packaging ed esempi di registrazione `pyproject.toml`. --- @@ -248,11 +255,14 @@ non segnalare mai i file tradotti come orfani. ## Changelog & Note di Rilascio -- 📋 [CHANGELOG.md](CHANGELOG.md) — storico completo delle modifiche (inglese) -- 📋 [CHANGELOG.it.md](CHANGELOG.it.md) — storico delle modifiche in italiano +- 📋 [CHANGELOG.md](CHANGELOG.md) — storico completo delle modifiche (unico, in inglese) - 🚀 [RELEASE.md](RELEASE.md) — manifesto di rilascio v0.4.0 (inglese) - 🚀 [RELEASE.it.md](RELEASE.it.md) — manifesto di rilascio v0.4.0 (italiano) +> Il changelog è ora mantenuto in un unico file inglese (`CHANGELOG.md`). +> Questa scelta segue gli standard dell'ecosistema Python open source: +> la cronologia delle versioni è documentazione tecnica, non interfaccia utente. + --- ## Contribuire diff --git a/README.md b/README.md index c76ad34..480aeab 100644 --- a/README.md +++ b/README.md @@ -42,18 +42,27 @@ absolute links are a hard error, and if you declare `engine = "zensical"` you mu --- -## RC5 Highlights (v0.4.0-rc5) - -- **Zensical v0.0.31+ sync**: `ZensicalAdapter` now reads navigation from - `[project].nav` (official TOML schema), including nested sections. 
-- **Nav-aware routing**: with explicit nav, files present on disk but absent from nav are - classified as `ORPHAN_BUT_EXISTING`. -- **URL mode parity**: `map_url()` now respects `[project].use_directory_urls = false` - (`/page.html`) and default directory URLs (`/page/`). -- **Parallel scan API documented**: shared-nothing `ProcessPoolExecutor` model, - honest overhead notes, and picklability requirements for custom rules. -- **New canonical example**: `examples/zensical-basic/` mirrors the documented TOML - schema and migration flow. +## v0.5.0a1 Highlights — The Sentinel + +- **Hybrid Adaptive Engine**: `scan_docs_references` is the single unified + entry point for all scan modes. The engine selects sequential or parallel + execution automatically based on repository size (threshold: 50 files). No + flags required — Zenzic is fast by default. +- **`AdaptiveRuleEngine` with eager pickle validation**: all rules are validated + for pickle-serializability at construction time. A non-serialisable rule raises + `PluginContractError` immediately — before any file is scanned. +- **`zenzic.rules` entry-point group**: core rules (`VSMBrokenLinkRule`) are + registered as first-class plugins. Third-party packages can extend Zenzic by + registering under the same group and enabling their plugin ID in `zenzic.toml`. +- **`zenzic plugins list`**: new command that displays every rule registered in + the `zenzic.rules` entry-point group — Core rules and third-party plugins. +- **`pyproject.toml` support (ISSUE #5)**: embed Zenzic config in `[tool.zenzic]` + when `zenzic.toml` is absent. `zenzic.toml` always wins if both exist. +- **Performance telemetry**: `scan_docs_references(verbose=True)` prints engine + mode, worker count, elapsed time, and estimated speedup to stderr. +- **`PluginContractError`**: new exception for rule contract violations. 
+- **Plugin documentation**: `docs/developers/plugins.md` (EN + IT) — full + contract, packaging instructions, and `pyproject.toml` registration examples. --- @@ -492,11 +501,18 @@ For dynamic badge automation and regression detection, see the [CI/CD Integratio --- -## Configuration (`zenzic.toml`) +## Configuration -All fields are optional. Zenzic works with no configuration file. +All fields are optional. Zenzic works with no configuration file at all. + +Zenzic follows a three-level **Agnostic Citizen** priority chain: + +1. `zenzic.toml` at the repository root — sovereign; always wins. +2. `[tool.zenzic]` in `pyproject.toml` — used when `zenzic.toml` is absent. +3. Built-in defaults. ```toml +# zenzic.toml (or [tool.zenzic] in pyproject.toml) docs_dir = "docs" excluded_dirs = ["includes", "assets", "stylesheets", "overrides", "hooks"] snippet_min_lines = 1 diff --git a/docs/architecture.md b/docs/architecture.md index b5aa657..d6c61b8 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -139,43 +139,71 @@ I/O operations, called at process start and end respectively by the CLI layer. --- -## Parallel scan (v0.4.0-rc5) +## Hybrid Adaptive Engine (v0.5.0a1) -The Three-Phase Pipeline is pure by design: `_scan_single_file` takes a file path and -returns an `IntegrityReport` with zero shared state. This makes it trivially -parallelisable. +`scan_docs_references` is the single unified entry point for all scan modes. +There is no longer a separate "parallel" function — the engine **adapts +automatically** based on repository size. 
```mermaid -flowchart LR - classDef node fill:#0f172a,stroke:#38bdf8,stroke-width:2px,color:#e2e8f0 +flowchart TD + classDef node fill:#0f172a,stroke:#38bdf8,stroke-width:2px,color:#e2e8f0 + classDef decide fill:#0f172a,stroke:#f59e0b,stroke-width:2px,color:#e2e8f0 classDef worker fill:#0f172a,stroke:#10b981,stroke-width:2px,color:#e2e8f0 - classDef io fill:#0f172a,stroke:#4f46e5,stroke-width:2px,color:#e2e8f0 + classDef seq fill:#0f172a,stroke:#6366f1,stroke-width:2px,color:#e2e8f0 + classDef io fill:#0f172a,stroke:#4f46e5,stroke-width:2px,color:#e2e8f0 + + ENTRY["scan_docs_references()\nrepo_root, config\nworkers, validate_links"]:::node + THRESHOLD{"files ≥ 50\nAND workers ≠ 1?"}:::decide - MAIN["Main process\nbuilds RuleEngine\nlists .md files"]:::node + SEQ["Sequential path\n_scan_single_file × N\nO(N) reads"]:::seq + SORT_SEQ["Sorted list[IntegrityReport]"]:::io + + FAN["Main process\npickle(config, engine)\n→ work_items"]:::node W1["Worker 1\n_scan_single_file\n(page_A.md)"]:::worker W2["Worker 2\n_scan_single_file\n(page_B.md)"]:::worker WN["Worker N\n_scan_single_file\n(page_Z.md)"]:::worker - SORT["Sorted merge\nby file_path"]:::node - OUT["list[IntegrityReport]"]:::io - - MAIN -->|"pickle(config, engine)"| W1 & W2 & WN - W1 & W2 & WN --> SORT - SORT --> OUT + MERGE["Sorted merge\nby file_path"]:::node + SORT_PAR["Sorted list[IntegrityReport]"]:::io + + ENTRY --> THRESHOLD + THRESHOLD -->|"No (< 50 files\nor workers=1)"| SEQ + SEQ --> SORT_SEQ + THRESHOLD -->|"Yes"| FAN + FAN -->|"pickle(config, engine)"| W1 & W2 & WN + W1 & W2 & WN --> MERGE + MERGE --> SORT_PAR ``` -**Shared-nothing architecture:** `config` and `rule_engine` are serialised by `pickle` -when dispatched to each worker. Every worker operates on an independent copy — there is -no shared memory, no lock, no race condition. +### Sequential path + +Used when `workers=1` (the default) or when the repository has fewer than 50 +files. Zero process-spawn overhead. 
Supports external URL validation in a +single O(N) pass. + +### Parallel path + +Activated when `workers != 1` and the file count is at or above +`ADAPTIVE_PARALLEL_THRESHOLD` (50). Each file is dispatched to an independent +worker process via `ProcessPoolExecutor`. + +**Shared-nothing architecture:** `config` and the `AdaptiveRuleEngine` +(including all registered rules) are serialised by `pickle` before being sent +to each worker. Every worker operates on an independent copy — no shared +memory, no locks, no race conditions. + +**Immutability contract:** workers must not mutate `config`. Rules that write +to mutable global state (e.g. a class-level counter) violate the Pure Functions +Pillar. In parallel mode, each worker holds an independent copy of that state +— mutations are local and discarded, producing results that diverge silently +from sequential mode. -**Immutability contract:** workers must not mutate `config`. All scan functions honour -this contract. Rules that write to shared state (e.g. a counter in a class variable) -violate the Pure Functions Pillar and will produce non-deterministic results in parallel -mode. +**Eager pickle validation:** `AdaptiveRuleEngine` calls `pickle.dumps()` on +every rule at construction time. A non-serialisable rule raises +`PluginContractError` immediately, before any file is scanned. -**Threshold for parallelism benefit:** process-spawn overhead is ~200–400 ms on a cold -Python interpreter. The crossover point where parallelism beats sequential scanning is -approximately 200 files on an 8-core machine. For smaller repos, use -`scan_docs_references` (sequential). +**Determinism guarantee:** results are sorted by `file_path` after collection +regardless of worker scheduling order. 
--- diff --git a/docs/configuration/index.md b/docs/configuration/index.md index 2d38610..0c41847 100644 --- a/docs/configuration/index.md +++ b/docs/configuration/index.md @@ -99,12 +99,56 @@ locales = ["it"] ## Configuration loading -Zenzic reads `zenzic.toml` from the repository root at startup. The repository root is located by -walking upward from the current working directory until a `.git` directory or a `zenzic.toml` -file is found. - -If `zenzic.toml` is absent, all defaults apply silently. If `zenzic.toml` is present but contains -a **TOML syntax error**, Zenzic raises a `ConfigurationError` with a human-friendly message and -exits immediately — silent fallback on a broken config file would hide mistakes. Unknown fields -are silently ignored, which means adding fields not yet supported by your installed version is -safe. +Zenzic follows a three-level **Agnostic Citizen** priority chain when searching for configuration +at startup: + +| Priority | Source | When used | +| :---: | :--- | :--- | +| 1 | `zenzic.toml` at repository root | Always preferred — the authoritative sovereign config | +| 2 | `[tool.zenzic]` in `pyproject.toml` | Used only when `zenzic.toml` is absent | +| 3 | Built-in defaults | Used when neither file is present | + +The repository root is located by walking upward from the current working directory until a `.git` +directory, a `zenzic.toml`, or a `pyproject.toml` is found. + +### zenzic.toml (Priority 1) + +The dedicated configuration file. If it exists, Zenzic reads it and ignores `pyproject.toml` +entirely — there is no merging between the two files. 
+ +### pyproject.toml (Priority 2) — v0.5.0a1 + +Python projects that already have a `pyproject.toml` can embed Zenzic configuration in the +`[tool.zenzic]` table, eliminating the need for a separate file: + +```toml +# pyproject.toml — embed Zenzic config in the standard Python metadata file + +[tool.zenzic] +docs_dir = "docs" +fail_under = 90 + +[tool.zenzic.build_context] +engine = "mkdocs" + +[[tool.zenzic.custom_rules]] +id = "ZZ-NODRAFT" +pattern = "(?i)\\bDRAFT\\b" +message = "Remove DRAFT marker before publishing." +severity = "warning" +``` + +All fields supported in `zenzic.toml` are equally supported in `[tool.zenzic]`. The +`[build_context]` sub-table becomes `[tool.zenzic.build_context]`, and `[[custom_rules]]` arrays +become `[[tool.zenzic.custom_rules]]`. + +!!! note "Sovereignty rule" + If both `zenzic.toml` **and** `pyproject.toml` exist in the repository root, `zenzic.toml` + wins unconditionally. The `[tool.zenzic]` table in `pyproject.toml` is ignored. + +### Error handling + +If the winning config file contains a **TOML syntax error**, Zenzic raises a `ConfigurationError` +with a human-friendly message and exits immediately — silent fallback on a broken config file +would hide mistakes. Unknown fields are silently ignored, which means adding fields not yet +supported by your installed version is safe. diff --git a/docs/developers/index.md b/docs/developers/index.md index f137dd3..f94c219 100644 --- a/docs/developers/index.md +++ b/docs/developers/index.md @@ -17,6 +17,8 @@ This section covers everything you need to extend, adapt, or contribute to Zenzi ## In this section +- [Writing Plugin Rules](plugins.md) — implement `BaseRule` subclasses, register + them via `entry_points`, and satisfy the pickle / purity contract. - [Writing an Adapter](writing-an-adapter.md) — implement the `BaseAdapter` protocol to teach Zenzic about a new documentation engine. 
- [Example Projects](examples.md) — four self-contained runnable fixtures that diff --git a/docs/developers/plugins.md b/docs/developers/plugins.md new file mode 100644 index 0000000..2de5983 --- /dev/null +++ b/docs/developers/plugins.md @@ -0,0 +1,219 @@ +--- +icon: lucide/puzzle +--- + + + + +# Writing Plugin Rules + +Zenzic supports external lint rules written in Python. A plugin rule is a +subclass of `BaseRule` distributed as a normal Python package and discovered at +runtime via the `zenzic.rules` [entry-point group][ep]. + +--- + +## The Rule Contract + +Every plugin rule must satisfy three non-negotiable requirements. These are +enforced at engine construction time — a rule that violates any of them is +rejected with a `PluginContractError` before the first file is scanned. + +### 1. Defined at module level + +The class must be importable by name from a module. Classes defined inside +functions or closures cannot be pickled and **will be rejected**. + +```python +# ✓ correct — importable as my_rules.NoDraftRule +class NoDraftRule(BaseRule): ... + +# ✗ wrong — not pickleable; will raise PluginContractError at load time +def make_rule(): + class NoDraftRule(BaseRule): ... + return NoDraftRule() +``` + +### 2. Pickle-serialisable + +The `AdaptiveRuleEngine` serialises rules via `pickle` before dispatching them +to worker processes. Every attribute stored on `self` must be pickleable. + +Safe attributes: strings, numbers, `re.compile()` patterns, frozen dataclasses, +`Path` objects, tuples of safe types. + +Unsafe attributes: open file handles, database connections, lambda functions, +`threading.Lock`, generator objects, or any object that defines `__reduce__` +incorrectly. 
+ +```python +# ✓ compiled regex is pickleable +class NoDraftRule(BaseRule): + _pattern = re.compile(r"(?i)\bDRAFT\b") # class-level attribute + +# ✓ also fine as an instance attribute set in __init__ +class NoDraftRule(BaseRule): + def __init__(self) -> None: + self._pattern = re.compile(r"(?i)\bDRAFT\b") +``` + +### 3. Pure and deterministic + +`check()` and `check_vsm()` must: + +- **Never** open files, make network requests, or call subprocesses. +- **Always** return the same output for the same input — no randomness, no + dependency on mutable global state. +- **Not** mutate their arguments (`file_path`, `text`, `vsm`, `anchors_cache`). + +!!! danger "Global mutable state is prohibited" + A rule that writes to a global counter will appear to work in sequential + mode but will produce **non-deterministic, silently wrong** results in + parallel mode. Worker processes each receive an independent pickle copy + of the engine — mutations are local to the worker and discarded on + completion. All state must be returned as `RuleFinding` objects. 
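
You can enforce the pickle half of this contract in your own test suite by round-tripping the rule exactly as the engine does during eager validation. A minimal, self-contained sketch (the `BaseRule` stub here stands in for `zenzic.core.rules.BaseRule`; the helper name is illustrative):

```python
import pickle
import re


class BaseRule:
    """Stub standing in for zenzic.core.rules.BaseRule in this sketch."""


class NoDraftRule(BaseRule):
    # Compiled patterns are pickle-safe as class-level attributes.
    _pattern = re.compile(r"(?i)\bDRAFT\b")


def assert_pickle_roundtrip(rule: BaseRule) -> BaseRule:
    # Mirrors the engine's eager validation: serialise, then restore.
    clone = pickle.loads(pickle.dumps(rule))
    assert type(clone) is type(rule)
    return clone


clone = assert_pickle_roundtrip(NoDraftRule())
print(bool(clone._pattern.search("This is a DRAFT page")))  # → True
```

A rule defined inside a function would fail at the `pickle.dumps` call here, for the same reason it is rejected at engine construction time.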
+ +--- + +## Minimal example + +```python +# my_org_rules/rules.py +import re +from pathlib import Path +from zenzic.core.rules import BaseRule, RuleFinding + + +class NoInternalHostnameRule(BaseRule): + """Flag occurrences of the internal hostname in public documentation.""" + + _pattern = re.compile(r"internal\.corp\.example\.com", re.IGNORECASE) + + @property + def rule_id(self) -> str: + return "MYORG-001" + + def check(self, file_path: Path, text: str) -> list[RuleFinding]: + findings = [] + for lineno, line in enumerate(text.splitlines(), start=1): + if self._pattern.search(line): + findings.append( + RuleFinding( + file_path=file_path, + line_no=lineno, + rule_id=self.rule_id, + message="Internal hostname must not appear in public docs.", + severity="error", + matched_line=line, + ) + ) + return findings +``` + +--- + +## Packaging and registration + +Expose the rule through the `zenzic.rules` entry-point group in your package's +`pyproject.toml`: + +```toml +[project.entry-points."zenzic.rules"] +no-internal-hostname = "my_org_rules.rules:NoInternalHostnameRule" +``` + +The entry-point name (`no-internal-hostname`) is the **plugin ID** that users +reference in `zenzic.toml` (see [Enabling plugins](#enabling-plugins) below). + +Install your package alongside Zenzic: + +```bash +uv add my-org-rules # or: pip install my-org-rules +``` + +After installing, run `zenzic plugins list` to confirm the rule is discovered: + +```bash +zenzic plugins list +# Installed plugin rules (2 found) +# broken-links Z001 (core) zenzic.core.rules.VSMBrokenLinkRule +# no-internal-hostname MYORG-001 (my-org-rules) my_org_rules.rules.NoInternalHostnameRule +``` + +--- + +## Enabling plugins + +Core rules (registered under `zenzic.rules` by Zenzic itself) are always +active. 
External plugin rules must be explicitly enabled in `zenzic.toml`
+under the top-level `plugins` key:
+
+```toml
+# zenzic.toml
+plugins = ["no-internal-hostname"]
+
+[build_context]
+engine = "mkdocs"
+```
+
+Only plugins listed here will be loaded. Installing a package that registers
+rules under `zenzic.rules` without listing it in `plugins` has no effect —
+this is intentional **Safe Harbor** behaviour: you always know exactly which
+rules are active in your project.
+
+---
+
+## VSM-aware rules
+
+Rules that need to validate links against the routing table should override
+`check_vsm` instead of (or in addition to) `check`. The engine calls
+`check_vsm` when a VSM and `anchors_cache` are available:
+
+```python
+from collections.abc import Mapping
+from zenzic.core.rules import BaseRule, RuleFinding
+from zenzic.models.vsm import Route
+
+
+class NoOrphanLinkRule(BaseRule):
+    @property
+    def rule_id(self) -> str:
+        return "MYORG-002"
+
+    def check(self, file_path, text):
+        return []  # no standalone check; requires VSM context
+
+    def check_vsm(self, file_path, text, vsm: Mapping[str, Route], anchors_cache):
+        # vsm maps canonical URL → Route; consult vsm[url].status
+        ...
+        return []  # return list[RuleFinding]
+```
+
+See [`BaseRule`][api-baserule] in the API reference for the complete interface.
+
+---
+
+## Error isolation
+
+If a plugin rule raises an unexpected exception inside `check()` or
+`check_vsm()`, the engine catches it, emits a single `"error"` finding with
+`rule_id="RULE-ENGINE-ERROR"`, and continues scanning. One faulty plugin
+cannot abort the scan of the entire docs tree.
+
+If a plugin rule fails the **eager pickle validation** at load time (i.e. it
+is not serialisable), Zenzic raises `PluginContractError` immediately and
+refuses to start. Fix the rule before running Zenzic.
+
+---
+
+## Checklist before publishing
+
+- [ ] Class defined at module level (not inside a function or lambda).
+- [ ] All `self.*` attributes are pickleable.
+- [ ] `check()` is pure: no I/O, no side effects, same output for same input. +- [ ] `rule_id` is a stable, unique string (include an org prefix, e.g. `"MYORG-001"`). +- [ ] Entry-point registered under `zenzic.rules` in `pyproject.toml`. +- [ ] Plugin ID listed in the project's `zenzic.toml` under `plugins`. + +[ep]: https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/#using-package-metadata +[api-baserule]: ../reference/api.md diff --git a/docs/it/architecture.md b/docs/it/architecture.md index 838f3e7..951631e 100644 --- a/docs/it/architecture.md +++ b/docs/it/architecture.md @@ -193,6 +193,76 @@ Proprietà chiave: --- +## Hybrid Adaptive Engine (v0.5.0a1) + +`scan_docs_references` è l'unico punto di ingresso unificato per tutte le +modalità di scansione. Non esiste più una funzione "parallela" separata — il +motore **si adatta automaticamente** in base alla dimensione del repository. + +```mermaid +flowchart TD + classDef node fill:#0f172a,stroke:#38bdf8,stroke-width:2px,color:#e2e8f0 + classDef decide fill:#0f172a,stroke:#f59e0b,stroke-width:2px,color:#e2e8f0 + classDef worker fill:#0f172a,stroke:#10b981,stroke-width:2px,color:#e2e8f0 + classDef seq fill:#0f172a,stroke:#6366f1,stroke-width:2px,color:#e2e8f0 + classDef io fill:#0f172a,stroke:#4f46e5,stroke-width:2px,color:#e2e8f0 + + ENTRY["scan_docs_references()\nrepo_root, config\nworkers, validate_links"]:::node + THRESHOLD{"file ≥ 50\nE workers ≠ 1?"}:::decide + + SEQ["Percorso sequenziale\n_scan_single_file × N\nletture O(N)"]:::seq + SORT_SEQ["Sorted list[IntegrityReport]"]:::io + + FAN["Processo principale\npickle(config, engine)\n→ work_items"]:::node + W1["Worker 1\n_scan_single_file\n(page_A.md)"]:::worker + W2["Worker 2\n_scan_single_file\n(page_B.md)"]:::worker + WN["Worker N\n_scan_single_file\n(page_Z.md)"]:::worker + MERGE["Merge ordinato\nper file_path"]:::node + SORT_PAR["Sorted list[IntegrityReport]"]:::io + + ENTRY --> THRESHOLD + THRESHOLD -->|"No (< 50 file\no 
workers=1)"| SEQ + SEQ --> SORT_SEQ + THRESHOLD -->|"Sì"| FAN + FAN -->|"pickle(config, engine)"| W1 & W2 & WN + W1 & W2 & WN --> MERGE + MERGE --> SORT_PAR +``` + +### Percorso sequenziale + +Usato quando `workers=1` (default) o quando il repository ha meno di 50 file. +Zero overhead di avvio del processo. Supporta la validazione degli URL esterni +in un singolo pass O(N). + +### Percorso parallelo + +Attivato quando `workers != 1` e il numero di file è pari o superiore a +`ADAPTIVE_PARALLEL_THRESHOLD` (50). Ogni file viene inviato a un processo +worker indipendente tramite `ProcessPoolExecutor`. + +**Architettura shared-nothing:** `config` e l'`AdaptiveRuleEngine` (incluse +tutte le regole registrate) vengono serializzati tramite `pickle` prima di +essere inviati a ciascun worker. Ogni worker opera su una copia indipendente — +nessuna memoria condivisa, nessun lock, nessuna race condition. + +**Contratto di immutabilità:** i worker non devono mutare `config`. Le regole +che scrivono su stato globale mutabile (es. un contatore a livello di classe) +violano il Pilastro delle Funzioni Pure. In modalità parallela, ogni worker +tiene una copia indipendente di quello stato — le mutazioni sono locali e +vengono scartate, producendo risultati che divergono silenziosamente dalla +modalità sequenziale. + +**Validazione pickle anticipata:** `AdaptiveRuleEngine` chiama `pickle.dumps()` +su ogni regola al momento della costruzione. Una regola non serializzabile +solleva `PluginContractError` immediatamente, prima che venga scansionato +qualsiasi file. + +**Garanzia di determinismo:** i risultati vengono ordinati per `file_path` +dopo la raccolta, indipendentemente dall'ordine di scheduling dei worker. 
+ +--- + ## Riepilogo del flusso dati ### CLI diff --git a/docs/it/configuration/index.md b/docs/it/configuration/index.md index fbad9c6..eb13c77 100644 --- a/docs/it/configuration/index.md +++ b/docs/it/configuration/index.md @@ -93,11 +93,57 @@ locales = ["it"] ## Caricamento della configurazione -Zenzic legge `zenzic.toml` dalla root del repository all'avvio. La root viene individuata -risalendo dalla directory di lavoro corrente fino a trovare una directory `.git` o un file -`zenzic.toml`. - -Se `zenzic.toml` è assente, vengono applicati silenziosamente tutti i valori predefiniti. Se -`zenzic.toml` è presente ma contiene un **errore di sintassi TOML**, Zenzic solleva un -`ConfigurationError` con un messaggio leggibile ed esce immediatamente. I campi sconosciuti -vengono ignorati silenziosamente. +Zenzic segue una catena di priorità a tre livelli (**Cittadino Agnostico**) quando cerca la +configurazione all'avvio: + +| Priorità | Sorgente | Quando viene usata | +| :---: | :--- | :--- | +| 1 | `zenzic.toml` nella root del repository | Sempre preferita — la configurazione sovrana | +| 2 | `[tool.zenzic]` in `pyproject.toml` | Usata solo quando `zenzic.toml` è assente | +| 3 | Valori predefiniti interni | Usati quando nessun file è presente | + +La root del repository viene individuata risalendo dalla directory di lavoro corrente fino a +trovare una directory `.git`, un file `zenzic.toml` o un `pyproject.toml`. + +### zenzic.toml (Priorità 1) + +Il file di configurazione dedicato. Se esiste, Zenzic lo legge e ignora completamente +`pyproject.toml` — non c'è nessuna fusione tra i due file. 
+ +### pyproject.toml (Priorità 2) — v0.5.0a1 + +I progetti Python che hanno già un `pyproject.toml` possono incorporare la configurazione di +Zenzic nella tabella `[tool.zenzic]`, eliminando la necessità di un file separato: + +```toml +# pyproject.toml — configurazione Zenzic nel file di metadati Python standard + +[tool.zenzic] +docs_dir = "docs" +fail_under = 90 + +[tool.zenzic.build_context] +engine = "mkdocs" + +[[tool.zenzic.custom_rules]] +id = "ZZ-NODRAFT" +pattern = "(?i)\\bDRAFT\\b" +message = "Rimuovere il marker DRAFT prima della pubblicazione." +severity = "warning" +``` + +Tutti i campi supportati in `zenzic.toml` sono ugualmente supportati in `[tool.zenzic]`. La +sotto-tabella `[build_context]` diventa `[tool.zenzic.build_context]`, e gli array `[[custom_rules]]` +diventano `[[tool.zenzic.custom_rules]]`. + +!!! note "Regola della sovranità" + Se sia `zenzic.toml` **che** `pyproject.toml` esistono nella root del repository, + `zenzic.toml` vince incondizionatamente. La tabella `[tool.zenzic]` in `pyproject.toml` + viene ignorata. + +### Gestione degli errori + +Se il file di configurazione vincitore contiene un **errore di sintassi TOML**, Zenzic solleva un +`ConfigurationError` con un messaggio leggibile ed esce immediatamente — il fallback silenzioso +su un file di configurazione non valido nasconderebbe gli errori. I campi sconosciuti vengono +ignorati silenziosamente. diff --git a/docs/it/developers/index.md b/docs/it/developers/index.md index 27bad7d..37477e2 100644 --- a/docs/it/developers/index.md +++ b/docs/it/developers/index.md @@ -19,6 +19,8 @@ Zenzic. ## In questa sezione +- [Scrivere Regole Plugin](plugins.md) — implementa sottoclassi `BaseRule`, + registrale tramite `entry_points` e soddisfa il contratto pickle / purezza. - [Scrivere un Adapter](writing-an-adapter.md) — implementa il protocollo `BaseAdapter` per insegnare a Zenzic a gestire un nuovo motore di documentazione. 
- [Progetti di Esempio](examples.md) — quattro fixture eseguibili auto-contenuti che diff --git a/docs/it/developers/plugins.md b/docs/it/developers/plugins.md new file mode 100644 index 0000000..233dbdb --- /dev/null +++ b/docs/it/developers/plugins.md @@ -0,0 +1,225 @@ +--- +icon: lucide/puzzle +--- + + + + +# Scrivere Regole Plugin + +Zenzic supporta regole di linting esterne scritte in Python. Una regola plugin è +una sottoclasse di `BaseRule` distribuita come un normale pacchetto Python e +scoperta a runtime tramite il gruppo di entry-point `zenzic.rules`. + +--- + +## Il Contratto della Regola + +Ogni regola plugin deve soddisfare tre requisiti non negoziabili. Questi vengono +verificati al momento della costruzione del motore — una regola che viola uno +qualsiasi di essi viene rifiutata con `PluginContractError` prima che il primo +file venga scansionato. + +### 1. Definita a livello di modulo + +La classe deve essere importabile per nome da un modulo. Le classi definite +all'interno di funzioni o closure non possono essere pickled e **verranno +rifiutate**. + +```python +# ✓ corretto — importabile come my_rules.NoDraftRule +class NoDraftRule(BaseRule): ... + +# ✗ errato — non serializzabile; solleverà PluginContractError al caricamento +def make_rule(): + class NoDraftRule(BaseRule): ... + return NoDraftRule() +``` + +### 2. Serializzabile via pickle + +L'`AdaptiveRuleEngine` serializza le regole tramite `pickle` prima di +inviarle ai processi worker. Ogni attributo memorizzato su `self` deve essere +serializzabile. + +Attributi sicuri: stringhe, numeri, pattern `re.compile()`, dataclass frozen, +oggetti `Path`, tuple di tipi sicuri. + +Attributi non sicuri: file handle aperti, connessioni a database, funzioni +lambda, `threading.Lock`, oggetti generator, o qualsiasi oggetto che definisce +`__reduce__` incorrettamente. 
+ +```python +# ✓ regex compilata è serializzabile +class NoDraftRule(BaseRule): + _pattern = re.compile(r"(?i)\bDRAFT\b") # attributo di classe + +# ✓ funziona anche come attributo di istanza impostato in __init__ +class NoDraftRule(BaseRule): + def __init__(self) -> None: + self._pattern = re.compile(r"(?i)\bDRAFT\b") +``` + +### 3. Pura e deterministica + +`check()` e `check_vsm()` devono: + +- **Mai** aprire file, fare richieste di rete, o chiamare sottoprocessi. +- **Sempre** restituire lo stesso output per lo stesso input — nessuna casualità, + nessuna dipendenza da stato globale mutabile. +- **Non** mutare i propri argomenti (`file_path`, `text`, `vsm`, `anchors_cache`). + +!!! danger "Lo stato globale mutabile è proibito" + Una regola che scrive su un contatore globale sembrerà funzionare in + modalità sequenziale, ma produrrà risultati **non deterministici e + silenziosamente errati** in modalità parallela. I processi worker + ricevono ciascuno una copia pickle indipendente del motore — le mutazioni + sono locali al worker e vengono scartate al completamento. Tutto lo + stato deve essere restituito come oggetti `RuleFinding`. 
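
Puoi verificare in anticipo la parte pickle del contratto nella tua test suite, replicando il round-trip che il motore esegue durante la validazione anticipata. Schizzo minimale e autonomo (lo stub `BaseRule` sostituisce `zenzic.core.rules.BaseRule`; il nome dell'helper è illustrativo):

```python
import pickle
import re


class BaseRule:
    """Stub che sostituisce zenzic.core.rules.BaseRule in questo schizzo."""


class NoDraftRule(BaseRule):
    # I pattern compilati sono serializzabili come attributi di classe.
    _pattern = re.compile(r"(?i)\bDRAFT\b")


def verifica_roundtrip_pickle(rule: BaseRule) -> BaseRule:
    # Replica la validazione anticipata del motore: serializza e ripristina.
    clone = pickle.loads(pickle.dumps(rule))
    assert type(clone) is type(rule)
    return clone


clone = verifica_roundtrip_pickle(NoDraftRule())
```

Una regola definita dentro una funzione fallirebbe qui alla chiamata `pickle.dumps`, per la stessa ragione per cui viene rifiutata alla costruzione del motore.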
+ +--- + +## Esempio minimale + +```python +# my_org_rules/rules.py +import re +from pathlib import Path +from zenzic.core.rules import BaseRule, RuleFinding + + +class NoInternalHostnameRule(BaseRule): + """Segnala le occorrenze dell'hostname interno nella documentazione pubblica.""" + + _pattern = re.compile(r"internal\.corp\.example\.com", re.IGNORECASE) + + @property + def rule_id(self) -> str: + return "MYORG-001" + + def check(self, file_path: Path, text: str) -> list[RuleFinding]: + findings = [] + for lineno, line in enumerate(text.splitlines(), start=1): + if self._pattern.search(line): + findings.append( + RuleFinding( + file_path=file_path, + line_no=lineno, + rule_id=self.rule_id, + message="L'hostname interno non deve apparire nella documentazione pubblica.", + severity="error", + matched_line=line, + ) + ) + return findings +``` + +--- + +## Pacchettizzazione e registrazione + +Esponi la regola tramite il gruppo di entry-point `zenzic.rules` nel +`pyproject.toml` del tuo pacchetto: + +```toml +[project.entry-points."zenzic.rules"] +no-internal-hostname = "my_org_rules.rules:NoInternalHostnameRule" +``` + +Il nome dell'entry-point (`no-internal-hostname`) è il **plugin ID** che gli +utenti referenziano in `zenzic.toml` (vedi [Abilitare i plugin](#abilitare-i-plugin) +qui sotto). + +Installa il tuo pacchetto insieme a Zenzic: + +```bash +uv add my-org-rules # oppure: pip install my-org-rules +``` + +Dopo l'installazione, esegui `zenzic plugins list` per confermare che la regola venga +scoperta: + +```bash +zenzic plugins list +# Installed plugin rules (2 found) +# broken-links Z001 (core) zenzic.core.rules.VSMBrokenLinkRule +# no-internal-hostname MYORG-001 (my-org-rules) my_org_rules.rules.NoInternalHostnameRule +``` + +--- + +## Abilitare i plugin + +Le regole core (registrate sotto `zenzic.rules` da Zenzic stesso) sono sempre +attive. 
Le regole plugin esterne devono essere esplicitamente abilitate in
+`zenzic.toml` sotto la chiave `plugins` a livello radice:
+
+```toml
+# zenzic.toml
+plugins = ["no-internal-hostname"]
+
+[build_context]
+engine = "mkdocs"
+```
+
+Solo i plugin elencati qui verranno caricati. L'installazione di un pacchetto
+che registra regole sotto `zenzic.rules` senza elencarlo in `plugins` non ha
+effetto — questo è il comportamento intenzionale del **Safe Harbor**: sai sempre
+esattamente quali regole sono attive nel tuo progetto.
+
+---
+
+## Regole VSM-aware
+
+Le regole che devono validare i link contro la tabella di routing devono
+sovrascrivere `check_vsm` invece di (o in aggiunta a) `check`. Il motore chiama
+`check_vsm` quando una VSM e `anchors_cache` sono disponibili:
+
+```python
+from collections.abc import Mapping
+from zenzic.core.rules import BaseRule, RuleFinding
+from zenzic.models.vsm import Route
+
+
+class NoOrphanLinkRule(BaseRule):
+    @property
+    def rule_id(self) -> str:
+        return "MYORG-002"
+
+    def check(self, file_path, text):
+        return []  # nessun controllo autonomo; richiede contesto VSM
+
+    def check_vsm(self, file_path, text, vsm: Mapping[str, Route], anchors_cache):
+        # vsm mappa URL canonico → Route; consulta vsm[url].status
+        ...
+        return []  # restituisce list[RuleFinding]
+```
+
+Vedi [`BaseRule`][api-baserule] nella reference API per l'interfaccia completa.
+
+---
+
+## Isolamento degli errori
+
+Se una regola plugin solleva un'eccezione inaspettata all'interno di `check()`
+o `check_vsm()`, il motore la cattura, emette un singolo finding `"error"` con
+`rule_id="RULE-ENGINE-ERROR"`, e continua la scansione. Un plugin difettoso non
+può interrompere la scansione dell'intero albero della documentazione.
+
+Se una regola plugin fallisce la **validazione pickle anticipata** al momento
+del caricamento (cioè non è serializzabile), Zenzic solleva `PluginContractError`
+immediatamente e si rifiuta di avviarsi. Correggi la regola prima di eseguire
+Zenzic.
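
Il meccanismo di isolamento può essere abbozzato come un wrapper attorno a `check()`. Schizzo minimale sotto ipotesi esplicite: qui i finding sono semplici dizionari che fanno da segnaposto per gli oggetti `RuleFinding` reali, e il wrapper non è l'implementazione effettiva del motore:

```python
def run_rule_safely(rule, file_path: str, text: str) -> list[dict]:
    # Isolamento: un'eccezione nella regola diventa un singolo finding
    # di errore invece di interrompere l'intera scansione.
    try:
        return list(rule.check(file_path, text))
    except Exception as exc:  # cattura deliberatamente ampia
        return [{
            "file_path": file_path,
            "line_no": 0,
            "rule_id": "RULE-ENGINE-ERROR",
            "message": f"{type(exc).__name__}: {exc}",
            "severity": "error",
        }]
```

La validazione pickle anticipata, al contrario, non viene isolata: una regola non serializzabile blocca l'avvio, perché continuare produrrebbe risultati divergenti tra modalità sequenziale e parallela.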
+ +--- + +## Checklist prima della pubblicazione + +- [ ] Classe definita a livello di modulo (non all'interno di una funzione o lambda). +- [ ] Tutti gli attributi `self.*` sono serializzabili via pickle. +- [ ] `check()` è pura: nessun I/O, nessun effetto collaterale, stesso output per stesso input. +- [ ] `rule_id` è una stringa stabile e univoca (includi un prefisso org, es. `"MYORG-001"`). +- [ ] Entry-point registrato sotto `zenzic.rules` nel `pyproject.toml`. +- [ ] Plugin ID elencato nel `zenzic.toml` del progetto sotto `plugins`. + +[api-baserule]: ../../reference/api.md diff --git a/docs/it/usage/advanced.md b/docs/it/usage/advanced.md index bbb7079..0b047e8 100644 --- a/docs/it/usage/advanced.md +++ b/docs/it/usage/advanced.md @@ -180,20 +180,20 @@ for f in report.findings: ### Scansione multi-file -Usa `scan_docs_references_with_links` per scansionare ogni file `.md` in un repository e +Usa `scan_docs_references` per scansionare ogni file `.md` in un repository e facoltativamente validare gli URL esterni: ```python from pathlib import Path -from zenzic.core.scanner import scan_docs_references_with_links +from zenzic.core.scanner import scan_docs_references from zenzic.models.config import ZenzicConfig config, _ = ZenzicConfig.load(Path(".")) -reports, link_errors = scan_docs_references_with_links( +reports, link_errors = scan_docs_references( Path("."), + config, validate_links=True, # imposta False per saltare la validazione HTTP - config=config, ) for report in reports: @@ -206,24 +206,55 @@ for error in link_errors: print(f"[LINK] {error}") ``` -`scan_docs_references_with_links` deduplica gli URL esterni sull'intero albero della +`scan_docs_references` deduplica gli URL esterni sull'intero albero della documentazione prima di inviare richieste HTTP — 50 file che linkano allo stesso URL producono esattamente una richiesta HEAD. 
-### Scansione parallela (repository grandi) +### Hybrid Adaptive Engine — v0.5.0a1 -Per repository con più di ~200 file Markdown, usa `scan_docs_references_parallel`: +`scan_docs_references` è l'unico punto di ingresso unificato per tutte le +modalità di scansione. Seleziona l'esecuzione sequenziale o parallela +**automaticamente** in base al numero di file nel repository: + +| Dimensione repo | Comportamento del motore | Motivo | +| :--- | :--- | :--- | +| < 50 file | Sequenziale (sempre) | L'overhead di avvio del processo (~200–400 ms) supera il beneficio del parallelismo | +| ≥ 50 file, `workers=1` | Sequenziale | Override seriale esplicito | +| ≥ 50 file, `workers=None` o `workers=N` | Parallelo (`ProcessPoolExecutor`) | Il lavoro regex CPU-bound domina; scaling lineare | +| 5 000+ file | Parallelo con `workers=cpu_count` | Speedup 3–6× provato su runner 8-core | + +La soglia di 50 file (`ADAPTIVE_PARALLEL_THRESHOLD`) è il punto di pareggio +conservativo dove il parallelismo ripaga il proprio costo di avvio. ```python from pathlib import Path -from zenzic.core.scanner import scan_docs_references_parallel +from zenzic.core.scanner import scan_docs_references + +# Default: sequenziale (workers=1, zero overhead) +reports, _ = scan_docs_references(Path(".")) + +# Parallelo esplicito: 4 worker, si attiva solo se ≥ 50 file +reports, _ = scan_docs_references(Path("."), workers=4) -reports = scan_docs_references_parallel(Path("."), workers=4) +# Completamente automatico: ProcessPoolExecutor sceglie il numero di worker da os.cpu_count() +reports, _ = scan_docs_references(Path("."), workers=None) + +# Con validazione link esterni (funziona in entrambe le modalità) +reports, link_errors = scan_docs_references(Path("."), validate_links=True, workers=None) ``` -La modalità parallela usa `ProcessPoolExecutor`. 
La validazione degli URL esterni non è -disponibile in modalità parallela — usa `scan_docs_references_with_links` per la scansione -sequenziale con validazione dei link. +**Garanzia di determinismo:** i risultati sono sempre ordinati per `file_path` +indipendentemente dalla modalità di esecuzione. + +**Contratto pickle per le regole plugin (`BaseRule` subclasses):** + +Le regole vengono validate per la serializzabilità pickle al momento della +costruzione del motore (**validazione anticipata**). Una regola non +serializzabile solleva `PluginContractError` immediatamente — prima che venga +scansionato qualsiasi file. + +Vedi [Scrivere Regole Plugin](../../developers/plugins.md) per il contratto +completo, esempi e istruzioni di pacchettizzazione. --- diff --git a/docs/it/usage/commands.md b/docs/it/usage/commands.md index da85d51..c84a863 100644 --- a/docs/it/usage/commands.md +++ b/docs/it/usage/commands.md @@ -236,6 +236,31 @@ zenzic score # mostra il punteggio per visibilità --- +## Ispezione dei plugin + +```bash +zenzic plugins list # Elenca tutte le regole registrate nel gruppo zenzic.rules +``` + +`zenzic plugins list` mostra ogni regola di linting che il motore conosce — le regole Core +incluse in Zenzic e qualsiasi regola di terze parti installata da pacchetti esterni: + +```text +Installed plugin rules (1 found) + + broken-links Z001 (core) zenzic.core.rules.VSMBrokenLinkRule +``` + +Ogni riga mostra il nome dell'entry-point, il `rule_id` stabile della regola (usato nei +finding e nelle liste di soppressione), la distribuzione di origine (`(core)` per le regole +built-in, o il nome del pacchetto per i plugin di terze parti), e il nome completo della +classe Python. + +Usa questo comando per verificare quali regole sono attive dopo aver installato un pacchetto +plugin. 
+ +--- + ## `uvx` vs `uv run` vs `zenzic` diretto | Invocazione | Comportamento | Quando usare | diff --git a/docs/usage/advanced.md b/docs/usage/advanced.md index 43bbc72..e80d015 100644 --- a/docs/usage/advanced.md +++ b/docs/usage/advanced.md @@ -175,20 +175,20 @@ for f in report.findings: ### Multi-file scan -Use `scan_docs_references_with_links` to scan every `.md` file in a repository and optionally +Use `scan_docs_references` to scan every `.md` file in a repository and optionally validate external URLs: ```python from pathlib import Path -from zenzic.core.scanner import scan_docs_references_with_links +from zenzic.core.scanner import scan_docs_references from zenzic.models.config import ZenzicConfig config, _ = ZenzicConfig.load(Path(".")) -reports, link_errors = scan_docs_references_with_links( +reports, link_errors = scan_docs_references( Path("."), + config, validate_links=True, # set False to skip HTTP validation - config=config, ) for report in reports: @@ -201,84 +201,63 @@ for error in link_errors: print(f"[LINK] {error}") ``` -`scan_docs_references_with_links` deduplicates external URLs across the entire docs tree before +`scan_docs_references` deduplicates external URLs across the entire docs tree before firing HTTP requests — 50 files linking to the same URL result in exactly one HEAD request. -### Parallel scan (large repos) — v0.4.0-rc5 +### Hybrid Adaptive Engine — v0.5.0a1 -For repositories with more than ~200 Markdown files, use `scan_docs_references_parallel`. -It distributes each file to an independent worker process via `ProcessPoolExecutor`, -exploiting the pureness of the per-file scan function (`_scan_single_file`): +`scan_docs_references` is the single unified entry point for all scan modes. 
+It selects sequential or parallel execution **automatically** based on the +number of files in the repository: + +| Repo size | Engine behaviour | Reason | +| :--- | :--- | :--- | +| < 50 files | Sequential (always) | Process-spawn overhead (~200–400 ms) exceeds the parallelism benefit | +| ≥ 50 files, `workers=1` | Sequential | Explicit serial override | +| ≥ 50 files, `workers=None` or `workers=N` | Parallel (`ProcessPoolExecutor`) | CPU-bound regex work dominates; linear scaling | +| 5 000+ files | Parallel with `workers=cpu_count` | Proven 3–6× speedup on 8-core runners | + +The 50-file threshold (`ADAPTIVE_PARALLEL_THRESHOLD`) is the conservative +break-even point where parallelism pays for its own startup cost. ```python from pathlib import Path -from zenzic.core.scanner import scan_docs_references_parallel +from zenzic.core.scanner import scan_docs_references -# workers=None lets ProcessPoolExecutor choose based on os.cpu_count() -reports = scan_docs_references_parallel(Path("."), workers=4) -``` +# Default: sequential (workers=1, zero overhead) +reports, _ = scan_docs_references(Path(".")) -**Honest performance contract:** +# Explicit parallel: 4 workers, auto-activates only if ≥ 50 files +reports, _ = scan_docs_references(Path("."), workers=4) -| Repo size | Recommended mode | Reason | -| :--- | :--- | :--- | -| < 50 files | `scan_docs_references` (sequential) | Process-spawn overhead (~200 ms) exceeds the parallelism benefit | -| 50 – 200 files | Sequential or parallel | Benchmark your specific case; gains are marginal | -| 200+ files | `scan_docs_references_parallel` | Per-file CPU-bound regex work dominates; linear scaling | -| 5 000+ files | Parallel with `workers=cpu_count` | Proven 3–6× speedup in benchmarks on 8-core CI runners | - -!!! note "No `--parallel` flag" - The CLI does not expose a `--parallel` flag in rc5. Parallelism is available exclusively - via the Python API. A `--workers N` CLI flag is planned for v0.4.0 stable. 
- -**What parallelism does NOT affect:** - -- External URL validation (`--links`) — not available in parallel mode; use - `scan_docs_references_with_links` instead. -- The Shield — runs per-worker; findings are collected and returned normally. -- Output order — results are sorted by `file_path` after collection to guarantee deterministic - output regardless of worker scheduling. - -**Pickling requirements for custom rules (`BaseRule` subclasses):** - -`ProcessPoolExecutor` serialises the `RuleEngine` (including all registered rules) using -Python `pickle` before dispatching it to worker processes. This imposes one constraint -on custom rule classes: - -- **Rules must be picklable.** A rule defined at module level is always picklable. A rule - defined inside a function, method, or `lambda` is not. -- **Pre-compiled regex patterns** (`re.compile(...)`) are picklable. Store compiled patterns - as class attributes or `__post_init__` instance attributes — never as `lambda` closures. -- **No mutable global state.** Workers receive independent copies of the rule engine. Any - state mutation inside a rule (e.g. a counter) is local to that worker process and is - discarded on completion. Rules that need global state must return it as part of their - `RuleFinding` output — not by writing to a shared variable. 
+# Fully automatic: ProcessPoolExecutor picks worker count from os.cpu_count() +reports, _ = scan_docs_references(Path("."), workers=None) -```python -# ✓ picklable — defined at module level, uses re.compile -class NoDraftRule(BaseRule): - _pattern = re.compile(r"(?i)\bDRAFT\b") - - @property - def rule_id(self) -> str: - return "ZZ-NODRAFT" - - def check(self, file_path: Path, text: str) -> list[RuleFinding]: - findings = [] - for lineno, line in enumerate(text.splitlines(), start=1): - if self._pattern.search(line): - findings.append(RuleFinding(file_path, lineno, self.rule_id, "DRAFT marker found")) - return findings - -# ✗ not picklable — defined inside a function -def make_rule(): - class LocalRule(BaseRule): ... # pickle cannot serialise this - return LocalRule() +# With external link validation (works in both sequential and parallel mode) +reports, link_errors = scan_docs_references(Path("."), validate_links=True, workers=None) ``` -`CustomRule` entries declared in `[[custom_rules]]` inside `zenzic.toml` are frozen -dataclasses with a compiled regex — they are always picklable and work in parallel mode -out of the box. +**Determinism guarantee:** results are always sorted by `file_path` regardless +of execution mode. The same input always produces the same ordered output. + +**Pickling contract for plugin rules (`BaseRule` subclasses):** + +Rules are validated for pickle-serializability at engine construction time +(**eager validation**). A non-serialisable rule raises `PluginContractError` +immediately — before any file is scanned. + +- **Rules must be defined at module level.** A class defined inside a function + or lambda cannot be pickled and will be rejected at load time. +- **All instance attributes must be pickleable.** Pre-compiled `re.compile()` + patterns, strings, and numbers are always safe. File handles, database + connections, and lambda closures are not. 
+- **No mutable global state.** Workers receive independent copies of the rule + engine (via pickle). A global counter mutated inside `check()` will be + local to each worker process and discarded on completion — results will differ + from sequential mode silently. Return all state as `RuleFinding` objects. + +See [Writing Plugin Rules](../developers/plugins.md) for the complete contract, +examples, and packaging instructions. --- diff --git a/docs/usage/commands.md b/docs/usage/commands.md index 790cf18..a2bd6b6 100644 --- a/docs/usage/commands.md +++ b/docs/usage/commands.md @@ -232,6 +232,29 @@ zenzic score # show score for visibility --- +## Plugin inspection + +```bash +zenzic plugins list # List all rules registered in the zenzic.rules entry-point group +``` + +`zenzic plugins list` shows every lint rule the engine knows about — Core rules bundled with +Zenzic and any third-party rules installed from external packages: + +```text +Installed plugin rules (1 found) + + broken-links Z001 (core) zenzic.core.rules.VSMBrokenLinkRule +``` + +Each row shows the entry-point name, the rule's stable `rule_id` (used in findings and +suppression lists), the origin distribution (`(core)` for built-in rules, or the package +name for third-party plugins), and the fully qualified Python class name. + +Use this command to verify which rules are active after installing a plugin package. 
+ +--- + ## `uvx` vs `uv run` vs bare `zenzic` | Invocation | Behaviour | When to use | diff --git a/pyproject.toml b/pyproject.toml index 87547c7..4377bae 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -7,7 +7,7 @@ build-backend = "hatchling.build" [project] name = "zenzic" -version = "0.4.0rc5" +version = "0.5.0a1" description = "Engineering-grade, engine-agnostic linter and security shield for Markdown documentation" readme = "README.md" requires-python = ">=3.11" @@ -34,7 +34,7 @@ dependencies = [ "rich>=13.0.0", "pyyaml>=6.0.0", "pydantic>=2.0.0", - "httpx>=0.27", + "httpx>=0.27,<1.0", ] [project.optional-dependencies] @@ -53,6 +53,9 @@ mkdocs = "zenzic.core.adapters:MkDocsAdapter" zensical = "zenzic.core.adapters:ZensicalAdapter" vanilla = "zenzic.core.adapters:VanillaAdapter" +[project.entry-points."zenzic.rules"] +broken-links = "zenzic.core.rules:VSMBrokenLinkRule" + [project.urls] diff --git a/src/zenzic/__init__.py b/src/zenzic/__init__.py index d0c6225..81057e5 100644 --- a/src/zenzic/__init__.py +++ b/src/zenzic/__init__.py @@ -2,4 +2,4 @@ # SPDX-License-Identifier: Apache-2.0 """Zenzic — engineering-grade documentation linter for MkDocs sites.""" -__version__ = "0.4.0rc5" +__version__ = "0.5.0a1" diff --git a/src/zenzic/cli.py b/src/zenzic/cli.py index cbbd693..410b6b5 100644 --- a/src/zenzic/cli.py +++ b/src/zenzic/cli.py @@ -24,7 +24,7 @@ find_placeholders, find_repo_root, find_unused_assets, - scan_docs_references_with_links, + scan_docs_references, ) from zenzic.core.scorer import ScoreReport, compute_score, load_snapshot, save_snapshot from zenzic.core.validator import ( @@ -52,6 +52,13 @@ rich_markup_mode="rich", ) +plugins_app = typer.Typer( + name="plugins", + help="Inspect the Zenzic plugin registry.", + no_args_is_help=True, + rich_markup_mode="rich", +) + console = Console(highlight=False) _NO_CONFIG_HINT = Panel( @@ -205,9 +212,7 @@ def check_references( config, loaded_from_file = ZenzicConfig.load(repo_root) if not loaded_from_file: 
_print_no_config_hint() - reports, link_errors = scan_docs_references_with_links( - repo_root, validate_links=links, config=config - ) + reports, link_errors = scan_docs_references(repo_root, config, validate_links=links) docs_root = repo_root / config.docs_dir @@ -408,7 +413,7 @@ def _collect_all_results( strict: bool, ) -> _AllCheckResults: """Run all seven checks and return results as a typed container.""" - ref_reports, _ = scan_docs_references_with_links(repo_root, validate_links=False, config=config) + ref_reports, _ = scan_docs_references(repo_root, config, validate_links=False) docs_root = repo_root / config.docs_dir reference_errors: list[str] = [] security_events = 0 @@ -561,6 +566,42 @@ def check_all( console.print("\n[green]SUCCESS:[/] All checks passed.") +@plugins_app.command(name="list") +def plugins_list() -> None: + """List all rules registered in the ``zenzic.rules`` entry-point group. + + Shows Core rules (bundled with Zenzic) and any third-party plugin rules + discovered from installed packages. Each row includes: + + * **Source** — entry-point name (e.g. ``broken-links``). + * **Rule ID** — stable identifier emitted in findings (e.g. ``Z001``). + * **Origin** — distribution that registered the rule. + * **Class** — fully qualified Python class name. 
+ """ + from zenzic.core.rules import list_plugin_rules + + rules = list_plugin_rules() + if not rules: + console.print("[yellow]No rules found in the 'zenzic.rules' entry-point group.[/]") + console.print( + "[dim]Install a plugin package or check that Zenzic is installed correctly.[/]" + ) + return + + console.print(f"\n[bold]Installed plugin rules[/] ({len(rules)} found)\n") + for info in rules: + origin_badge = ( + "[dim cyan](core)[/]" if info.origin == "zenzic" else f"[dim green]({info.origin})[/]" + ) + console.print( + f" [bold cyan]{info.source}[/] " + f"[bold]{info.rule_id}[/] " + f"{origin_badge} " + f"[dim]{info.class_name}[/]" + ) + console.print() + + def _detect_engine(repo_root: Path, override: str | None) -> str | None: """Resolve which documentation engine binary to use for ``zenzic serve``. diff --git a/src/zenzic/core/exceptions.py b/src/zenzic/core/exceptions.py index 29baa8a..e4e8fa7 100644 --- a/src/zenzic/core/exceptions.py +++ b/src/zenzic/core/exceptions.py @@ -13,7 +13,8 @@ ├── ConfigurationError — missing / malformed config files │ └── EngineError — engine binary absent or incompatible ├── CheckError — check machinery failure (not a finding) - └── NetworkError — HTTP failure during link validation + ├── NetworkError — HTTP failure during link validation + └── PluginContractError — rule plugin violates the pickle / purity contract """ from __future__ import annotations @@ -122,3 +123,21 @@ class NetworkError(ZenzicError): context={"url": url, "timeout_s": 10}, ) """ + + +class PluginContractError(ZenzicError): + """Raised when a plugin rule violates the serialisability or purity contract. + + The :class:`~zenzic.core.rules.AdaptiveRuleEngine` validates every rule at + construction time. A rule that cannot be pickled (e.g. defined inside a + function, or holding a reference to an unpickleable object) is rejected + immediately with this error rather than failing inside a worker process. 
+ + Examples:: + + raise PluginContractError( + "Rule 'MY-001' is not serialisable and cannot be used with the " + "AdaptiveRuleEngine.", + context={"rule_id": "MY-001", "cause": str(exc)}, + ) + """ diff --git a/src/zenzic/core/rules.py b/src/zenzic/core/rules.py index 7f38ab9..f28cb37 100644 --- a/src/zenzic/core/rules.py +++ b/src/zenzic/core/rules.py @@ -17,7 +17,7 @@ ``zenzic.rules`` entry-point group. For complex, multi-line logic that a regex cannot express. -Both kinds are applied through the same :class:`RuleEngine.run` interface so +Both kinds are applied through the same :class:`AdaptiveRuleEngine.run` interface so the scanner only sees one surface. Rule dependency taxonomy (🔌 Dev 3 — relevant for caching) @@ -56,11 +56,12 @@ * **Lint the Source:** Rules receive raw Markdown text — never HTML. * **No Subprocesses:** The rule module does not import or invoke any process. * **Pure Functions First:** :meth:`BaseRule.check` must be deterministic and - side-effect-free. :class:`RuleEngine.run` is also pure (list in, list out). + side-effect-free. :class:`AdaptiveRuleEngine.run` is also pure (list in, list out). """ from __future__ import annotations +import pickle import re from abc import ABC, abstractmethod from collections.abc import Mapping, Sequence @@ -327,26 +328,64 @@ def check(self, file_path: Path, text: str) -> list[RuleFinding]: return findings -# ─── RuleEngine ─────────────────────────────────────────────────────────────── +# ─── AdaptiveRuleEngine ─────────────────────────────────────────────────────── -class RuleEngine: +def _assert_pickleable(rule: BaseRule) -> None: + """Raise :class:`PluginContractError` if *rule* cannot be pickled. + + Called at engine construction time (eager validation) so that a + non-serialisable rule is rejected before the first file is scanned, + not inside a worker process mid-run. + + Args: + rule: A :class:`BaseRule` instance to validate. 
+ + Raises: + PluginContractError: When ``pickle.dumps(rule)`` raises any error. + """ + from zenzic.core.exceptions import PluginContractError # deferred: avoid circular import + + try: + pickle.dumps(rule) + except Exception as exc: # noqa: BLE001 + raise PluginContractError( + f"Rule '{rule.rule_id}' ({type(rule).__qualname__}) is not serialisable " + f"and cannot be used with the AdaptiveRuleEngine.\n" + f" Cause: {type(exc).__name__}: {exc}\n" + f" Fix: ensure the rule class is defined at module level (not inside a " + f"function or closure) and that all instance attributes are pickleable.", + ) from exc + + +class AdaptiveRuleEngine: """Applies a collection of :class:`BaseRule` instances to a Markdown file. The engine is stateless after construction — :meth:`run` is a pure function that maps ``(path, text)`` to a list of findings. + All registered rules are validated for pickle-serializability at + construction time (**eager validation**). This ensures that any rule + incompatible with multiprocessing is rejected immediately — before the + first file is scanned — rather than failing silently inside a worker + process. + Usage:: - engine = RuleEngine(config.custom_rules) + engine = AdaptiveRuleEngine(rules) findings = engine.run(Path("docs/guide.md"), text) Args: rules: Iterable of :class:`BaseRule` (or :class:`CustomRule`) instances to apply. Order is preserved in the output. + + Raises: + PluginContractError: If any rule fails the eager pickle validation. 
""" def __init__(self, rules: Sequence[BaseRule]) -> None: + for rule in rules: + _assert_pickleable(rule) self._rules = rules def __bool__(self) -> bool: @@ -640,3 +679,62 @@ def _to_canonical_url(href: str) -> str | None: return "/" return "/" + "/".join(parts) + "/" + + +# ─── Plugin discovery ───────────────────────────────────────────────────────── + + +from dataclasses import dataclass as _dc # noqa: E402 — module-level, after all classes + + +@_dc +class PluginRuleInfo: + """Metadata about a discovered plugin rule. + + Attributes: + rule_id: The stable identifier returned by :attr:`BaseRule.rule_id`. + class_name: Fully qualified class name (``module.ClassName``). + source: Entry-point name (e.g. ``"broken-links"``). + origin: Distribution name that registered the rule, or + ``"zenzic"`` for core rules. + """ + + rule_id: str + class_name: str + source: str + origin: str + + +def list_plugin_rules() -> list[PluginRuleInfo]: + """Return metadata for every rule registered in the ``zenzic.rules`` group. + + Iterates over all entry points in the ``zenzic.rules`` + ``importlib.metadata`` group, loads each class, instantiates it (using + a no-argument constructor), and captures its :attr:`BaseRule.rule_id`. + Entry points that cannot be loaded or instantiated are skipped — discovery + is best-effort and must never crash the CLI. + + Returns: + Sorted list of :class:`PluginRuleInfo`, ordered by ``source`` name. 
+ """ + from importlib.metadata import entry_points + + results: list[PluginRuleInfo] = [] + eps = entry_points(group="zenzic.rules") + for ep in eps: + try: + cls = ep.load() + instance: BaseRule = cls() + rid = instance.rule_id + except Exception: # noqa: BLE001 + continue + dist_name = ep.dist.name if ep.dist is not None else "zenzic" + results.append( + PluginRuleInfo( + rule_id=rid, + class_name=f"{cls.__module__}.{cls.__qualname__}", + source=ep.name, + origin=dist_name, + ) + ) + return sorted(results, key=lambda r: r.source) diff --git a/src/zenzic/core/scanner.py b/src/zenzic/core/scanner.py index fda4dc5..7f60891 100644 --- a/src/zenzic/core/scanner.py +++ b/src/zenzic/core/scanner.py @@ -24,7 +24,7 @@ from urllib.parse import unquote from zenzic.core.adapter import get_adapter -from zenzic.core.rules import RuleEngine +from zenzic.core.rules import AdaptiveRuleEngine from zenzic.core.shield import SecurityFinding, scan_line_for_secrets, scan_url_for_secrets from zenzic.core.validator import LinkValidator from zenzic.models.config import ZenzicConfig @@ -676,7 +676,7 @@ def get_integrity_report( def _scan_single_file( md_file: Path, config: ZenzicConfig, - rule_engine: RuleEngine | None = None, + rule_engine: AdaptiveRuleEngine | None = None, ) -> tuple[IntegrityReport, ReferenceScanner | None]: """Run the Three-Phase Pipeline on one Markdown file. @@ -687,7 +687,7 @@ def _scan_single_file( Args: md_file: Absolute path to the Markdown file to process. config: Zenzic configuration. - rule_engine: Optional :class:`~zenzic.core.rules.RuleEngine` to apply + rule_engine: Optional :class:`~zenzic.core.rules.AdaptiveRuleEngine` to apply after the reference pipeline. When provided, the file is read once more as a string for the rule pass (rules receive the full text, not the line-by-line generator output). 
When ``None`` or empty, the @@ -740,8 +740,8 @@ def _iter_md_files( yield md_file -def _build_rule_engine(config: ZenzicConfig) -> RuleEngine | None: - """Construct a :class:`~zenzic.core.rules.RuleEngine` from the config. +def _build_rule_engine(config: ZenzicConfig) -> AdaptiveRuleEngine | None: + """Construct a :class:`~zenzic.core.rules.AdaptiveRuleEngine` from the config. Returns ``None`` when no custom rules are configured, avoiding the overhead of engine construction on projects that do not use the feature. @@ -759,79 +759,114 @@ def _build_rule_engine(config: ZenzicConfig) -> RuleEngine | None: ) for cr in config.custom_rules ] - return RuleEngine(rules) + return AdaptiveRuleEngine(rules) -def scan_docs_references( - repo_root: Path, - config: ZenzicConfig | None = None, -) -> list[IntegrityReport]: - """Run the Three-Phase Pipeline over every .md file in docs/. +def _emit_telemetry(*, mode: str, workers: int, n_files: int, elapsed: float) -> None: + """Write a one-line performance summary to stderr. - Returns one :class:`IntegrityReport` per file, sorted by file path. - Exit Code 2 must be enforced by the CLI layer when any report contains - security findings. + Only called when ``verbose=True`` is passed to :func:`scan_docs_references`. + Writes to stderr so it never contaminates stdout-captured output. - Args: - repo_root: Repository root (must contain ``docs/``). - config: Optional Zenzic configuration. + The speedup estimate for parallel mode assumes a linear model relative to + the sequential baseline: ``speedup ≈ workers × 0.7`` (accounting for + overhead and I/O serialisation). This is a rough heuristic for display + purposes only. - Returns: - Sorted list of :class:`IntegrityReport` objects. + Args: + mode: ``"Sequential"`` or ``"Parallel"``. + workers: Effective worker count used. + n_files: Number of ``.md`` files scanned. + elapsed: Wall-clock seconds from scan start to completion. 
""" - if config is None: - config, _ = ZenzicConfig.load(repo_root) - - docs_root = repo_root / config.docs_dir - if not docs_root.exists() or not docs_root.is_dir(): - return [] + import sys - rule_engine = _build_rule_engine(config) - reports: list[IntegrityReport] = [] - for md_file in _iter_md_files(docs_root, config): - report, _scanner = _scan_single_file(md_file, config, rule_engine) - reports.append(report) - - return reports + engine_label = ( + f"Adaptive (Parallel, {workers} worker{'s' if workers != 1 else ''})" + if mode == "Parallel" + else "Adaptive (Sequential)" + ) + time_str = f"{elapsed:.2f}s" + speedup_str = "" + if mode == "Parallel" and workers > 1: + estimated = round(min(workers * 0.7, workers - 0.1), 1) + speedup_str = f" Estimated speedup: {estimated}x" + + print( + f"[zenzic] Engine: {engine_label} " + f"Files: {n_files} " + f"Execution time: {time_str}" + f"{speedup_str}", + file=sys.stderr, + ) -def scan_docs_references_with_links( +def scan_docs_references( repo_root: Path, + config: ZenzicConfig | None = None, *, validate_links: bool = False, - config: ZenzicConfig | None = None, + workers: int | None = 1, + verbose: bool = False, ) -> tuple[list[IntegrityReport], list[str]]: - """Run the full Three-Phase Pipeline, optionally validating external URLs. + """Run the Three-Phase Pipeline over every .md file in docs/. + + This is the single unified entry point for all scan modes. The engine + selects sequential or parallel execution automatically based on the number + of files found (**Hybrid Adaptive Mode**): - This is the top-level orchestrator for ``zenzic check references --links``. - It collects all reports and — when ``validate_links=True`` — fires a single - async pass over all unique HTTP/HTTPS URLs found across the entire docs tree. + * **Sequential** — used when ``workers=1`` (the default) or when the repo + has fewer than :data:`ADAPTIVE_PARALLEL_THRESHOLD` files. Zero + process-spawn overhead; supports external URL validation. 
+    * **Parallel** — activated when ``workers != 1`` *and* the file count
+      meets or exceeds :data:`ADAPTIVE_PARALLEL_THRESHOLD`. Distributes each
+      file to an independent worker process via ``ProcessPoolExecutor``.
+      External URL validation is performed in the main process after all
+      workers complete.

-    **Deduplication guarantee:** If 50 files all define a reference to
-    ``https://github.com``, the :class:`~zenzic.core.validator.LinkValidator`
-    issues exactly one HEAD request. This is enforced at registration time, not
-    inside the HTTP layer.
+    The threshold default (50 files) is a conservative heuristic: below it,
+    ``ProcessPoolExecutor`` spawn overhead (~200–400 ms on a cold interpreter)
+    exceeds the parallelism benefit. Passing ``workers=N`` sets the pool size,
+    but parallel mode still requires the file count to meet the threshold.

-    **Shield-as-firewall:** Files with security findings are excluded from link
-    validation. We do not ping URLs that contain leaked credentials.
+    **Determinism guarantee:** results are always sorted by ``file_path``
+    regardless of execution mode.

-    **O(N) reads:** Each file is read exactly once regardless of whether link
-    validation is enabled. The scanner from Pass 1 is retained and its
-    ``ref_map`` is passed directly to the validator — no second harvest.
+    **Shield behaviour:** enforced per-worker in parallel mode; per-file in
+    sequential mode. Files with security findings are excluded from link
+    validation in both modes.
+
+    **O(N) reads:** each file is read exactly once in sequential mode. In
+    parallel mode external URL registration runs a lightweight sequential pass
+    in the main process after workers complete (workers discard scanners).

     Args:
-        repo_root: Repository root (must contain ``docs/``).
+        repo_root: Repository root (must contain ``docs/``).
+        config: Optional Zenzic configuration.
         validate_links: When ``True``, perform async HTTP validation of all
-            external reference URLs (equivalent to ``--links`` on the CLI).
- Disabled by default to preserve the fast-by-default principle. - config: Optional Zenzic configuration. + external reference URLs found across the docs tree. + Disabled by default. + workers: Number of worker processes for parallel mode. + ``1`` (default) always uses sequential execution. + ``None`` lets ``ProcessPoolExecutor`` pick based on + ``os.cpu_count()``. Any value other than ``1`` + activates parallel mode when the file count is at or + above :data:`ADAPTIVE_PARALLEL_THRESHOLD`. + verbose: When ``True``, print a single telemetry line to stderr + after the scan completes. Shows the engine mode, worker + count, elapsed time, and estimated speedup (parallel + mode only). Defaults to ``False``. Returns: A ``(reports, link_errors)`` tuple where: - - ``reports`` is the sorted list of :class:`IntegrityReport` objects. - - ``link_errors`` is a sorted list of HTTP error strings (empty when - ``validate_links=False`` or all URLs pass). + + - ``reports`` is the sorted list of :class:`IntegrityReport` objects, + one per ``.md`` file. + - ``link_errors`` is a sorted list of human-readable HTTP error strings + (empty when ``validate_links=False`` or all URLs pass). """ + import time + if config is None: config, _ = ZenzicConfig.load(repo_root) @@ -840,33 +875,91 @@ def scan_docs_references_with_links( return [], [] rule_engine = _build_rule_engine(config) - reports: list[IntegrityReport] = [] - # Retain secure scanners for Phase B URL registration (avoids double-read). 
-    secure_scanners: list[ReferenceScanner] = []
+    md_files = list(_iter_md_files(docs_root, config))
+
+    if not md_files:
+        return [], []
+
+    use_parallel = workers != 1 and len(md_files) >= ADAPTIVE_PARALLEL_THRESHOLD
+
+    _t0 = time.monotonic()
+
+    if use_parallel:
+        import concurrent.futures
+        import os
+
+        work_items = [(f, config, rule_engine) for f in md_files]
+        actual_workers = workers if workers is not None else os.cpu_count() or 1
+        with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor:
+            raw = list(executor.map(_worker, work_items))
+        reports: list[IntegrityReport] = sorted(raw, key=lambda r: r.file_path)

-    for md_file in _iter_md_files(docs_root, config):
+        elapsed = time.monotonic() - _t0
+        if verbose:
+            _emit_telemetry(
+                mode="Parallel",
+                workers=actual_workers,
+                n_files=len(md_files),
+                elapsed=elapsed,
+            )
+
+        if not validate_links:
+            return reports, []
+
+        # Phase B in main process: lightweight sequential pass for URL
+        # registration. Workers discard scanners; we re-collect ref_maps here
+        # for deduplication. The rule engine is skipped (its report would be
+        # discarded here), so this is an additional O(N) read only, and it
+        # preserves the Shield-as-firewall guarantee (no URLs from
+        # compromised files).
+        secure_scanners_b: list[ReferenceScanner] = []
+        for md_file in md_files:
+            _report_b, secure_scanner_b = _scan_single_file(md_file, config, None)
+            if secure_scanner_b is not None:
+                secure_scanners_b.append(secure_scanner_b)
+        validator_b = LinkValidator()
+        for scanner in secure_scanners_b:
+            validator_b.register_from_map(scanner.ref_map, scanner.file_path)
+        return reports, validator_b.validate()
+
+    # Sequential path — zero overhead, full O(N) link-validation support.
+    reports_seq: list[IntegrityReport] = []
+    secure_scanners_seq: list[ReferenceScanner] = []
+
+    for md_file in md_files:
         report, secure_scanner = _scan_single_file(md_file, config, rule_engine)
-        reports.append(report)
+        reports_seq.append(report)
         if validate_links and secure_scanner is not None:
-            secure_scanners.append(secure_scanner)
+            secure_scanners_seq.append(secure_scanner)
+
+    elapsed_seq = time.monotonic() - _t0
+    if verbose:
+        _emit_telemetry(
+            mode="Sequential",
+            workers=1,
+            n_files=len(md_files),
+            elapsed=elapsed_seq,
+        )

     if not validate_links:
-        return reports, []
+        return reports_seq, []

     # Phase B — global URL deduplication and async HTTP validation.
     # Uses the already-populated ref_maps from Phase A — no second file read.
-    validator = LinkValidator()
-    for scanner in secure_scanners:
-        validator.register_from_map(scanner.ref_map, scanner.file_path)
+    validator_seq = LinkValidator()
+    for scanner in secure_scanners_seq:
+        validator_seq.register_from_map(scanner.ref_map, scanner.file_path)
+    return reports_seq, validator_seq.validate()

-    link_errors = validator.validate()
-    return reports, link_errors


+# ─── Adaptive parallel worker ─────────────────────────────────────────────────

-# ─── Parallel scan ────────────────────────────────────────────────────────────

+#: Files below this threshold are scanned sequentially (zero process-spawn
+#: overhead). At or above it, ``scan_docs_references`` switches to a
+#: ProcessPoolExecutor, provided ``workers != 1``. Exposed as a module
+#: constant so tests can override it without patching private internals.
+ADAPTIVE_PARALLEL_THRESHOLD: int = 50

-def _worker(args: tuple[Path, ZenzicConfig, RuleEngine | None]) -> IntegrityReport:
+def _worker(args: tuple[Path, ZenzicConfig, AdaptiveRuleEngine | None]) -> IntegrityReport:
     """Top-level worker function for ``ProcessPoolExecutor``.
Must be a module-level function (not a lambda or nested function) so that @@ -888,62 +981,3 @@ def _worker(args: tuple[Path, ZenzicConfig, RuleEngine | None]) -> IntegrityRepo md_file, config, rule_engine = args report, _scanner = _scan_single_file(md_file, config, rule_engine) return report - - -def scan_docs_references_parallel( - repo_root: Path, - config: ZenzicConfig | None = None, - *, - workers: int | None = None, -) -> list[IntegrityReport]: - """Run the Three-Phase Pipeline in parallel using multiple CPU cores. - - Each file is processed independently by a worker process, exploiting the - pureness of :func:`_scan_single_file`. Results are sorted by file path - after collection to guarantee deterministic output regardless of scheduling - order. - - **When to use:** Large repos (> ~200 files) where I/O wait and CPU-bound - regex work dominate. For smaller repos the process-spawn overhead exceeds - the parallelism benefit — prefer :func:`scan_docs_references` instead. - - **Shield behaviour:** The Shield is enforced per-worker. Files with - security findings are flagged in their :class:`IntegrityReport` as usual; - the caller is responsible for checking ``report.security_findings``. - - **External URL validation** is not supported in parallel mode. Use - :func:`scan_docs_references_with_links` for sequential scan with link - validation. - - Args: - repo_root: Repository root (must contain ``docs/``). - config: Optional Zenzic configuration. - workers: Number of worker processes. Defaults to ``None`` (let - :class:`~concurrent.futures.ProcessPoolExecutor` choose based on - CPU count). Pass ``1`` to disable parallelism (useful for testing). - - Returns: - Sorted list of :class:`IntegrityReport` objects, one per ``.md`` file. 
- """ - import concurrent.futures - - if config is None: - config, _ = ZenzicConfig.load(repo_root) - - docs_root = repo_root / config.docs_dir - if not docs_root.exists() or not docs_root.is_dir(): - return [] - - rule_engine = _build_rule_engine(config) - md_files = list(_iter_md_files(docs_root, config)) - - if not md_files: - return [] - - work_items = [(f, config, rule_engine) for f in md_files] - - with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor: - results = list(executor.map(_worker, work_items)) - - # Sort by file path to guarantee deterministic output order. - return sorted(results, key=lambda r: r.file_path) diff --git a/src/zenzic/main.py b/src/zenzic/main.py index 7b08f7a..6fdbea3 100644 --- a/src/zenzic/main.py +++ b/src/zenzic/main.py @@ -11,7 +11,7 @@ from rich.console import Console from zenzic import __version__ -from zenzic.cli import check_app, clean_app, diff, init, score, serve +from zenzic.cli import check_app, clean_app, diff, init, plugins_app, score, serve from zenzic.core.exceptions import ConfigurationError from zenzic.core.logging import setup_cli_logging @@ -48,6 +48,7 @@ def _main( app.add_typer(check_app, name="check") app.add_typer(clean_app, name="clean") +app.add_typer(plugins_app, name="plugins") app.command(name="score")(score) app.command(name="diff")(diff) app.command(name="serve")(serve) diff --git a/src/zenzic/models/config.py b/src/zenzic/models/config.py index 47cd4a6..d5b3796 100644 --- a/src/zenzic/models/config.py +++ b/src/zenzic/models/config.py @@ -222,53 +222,18 @@ def model_post_init(self, __context: Any) -> None: ] @classmethod - def load(cls, repo_root: Path) -> tuple[ZenzicConfig, bool]: - """Load configuration from zenzic.toml. - - Falls back to defaults when the file does not exist. Raises - :class:`~zenzic.core.exceptions.ConfigurationError` with a Rich- - formatted message when the file exists but contains a TOML syntax error - — silent fallback would hide user mistakes. 
- - Args: - repo_root: Repository root that may contain ``zenzic.toml``. + def _build_from_data(cls, data: dict[str, Any]) -> ZenzicConfig: + """Construct a ``ZenzicConfig`` from a raw TOML dict. - Returns: - A ``(config, loaded_from_file)`` tuple. ``loaded_from_file`` is - ``True`` when ``zenzic.toml`` was found and parsed, ``False`` when - built-in defaults are in use. - - Raises: - :class:`~zenzic.core.exceptions.ConfigurationError`: When - ``zenzic.toml`` is present but cannot be parsed. + Shared by :meth:`load` (``zenzic.toml``) and the ``pyproject.toml`` + fallback path. Strips unknown keys and promotes sub-tables. """ - from zenzic.core.exceptions import ConfigurationError # deferred to avoid circular import - - config_path = repo_root / "zenzic.toml" - if not config_path.is_file(): - return cls(), False - - try: - with config_path.open("rb") as f: - data = tomllib.load(f) - except tomllib.TOMLDecodeError as exc: - raise ConfigurationError( - f"[bold red]zenzic.toml[/] contains a syntax error and cannot be loaded.\n" - f" [dim]{config_path}[/]\n\n" - f" [red]{exc}[/]\n\n" - "Fix the TOML syntax error and re-run Zenzic.", - context={"config_path": str(config_path)}, - ) from exc - - # Only pass known fields to Pydantic known_fields = cls.model_fields.keys() filtered_data = {k: v for k, v in data.items() if k in known_fields} - # Promote [build_context] sub-table into a BuildContext instance. if "build_context" in data and isinstance(data["build_context"], dict): filtered_data["build_context"] = BuildContext( **{k: v for k, v in data["build_context"].items() if k in BuildContext.model_fields} ) - # Promote [[custom_rules]] array into CustomRuleConfig instances. 
if "custom_rules" in data and isinstance(data["custom_rules"], list): filtered_data["custom_rules"] = [ CustomRuleConfig( @@ -277,4 +242,70 @@ def load(cls, repo_root: Path) -> tuple[ZenzicConfig, bool]: for r in data["custom_rules"] if isinstance(r, dict) ] - return cls(**filtered_data), True + return cls(**filtered_data) + + @classmethod + def load(cls, repo_root: Path) -> tuple[ZenzicConfig, bool]: + """Load configuration following the Agnostic Citizen priority chain. + + Priority order (first match wins): + + 1. ``zenzic.toml`` at *repo_root* — the authoritative sovereign config. + 2. ``[tool.zenzic]`` table in ``pyproject.toml`` at *repo_root*. + 3. Built-in defaults (``loaded_from_file`` returned as ``False``). + + When the winning file exists but cannot be parsed, a + :class:`~zenzic.core.exceptions.ConfigurationError` is raised with a + Rich-formatted message — silent fallback would hide user mistakes. + + Args: + repo_root: Repository root that may contain config files. + + Returns: + A ``(config, loaded_from_file)`` tuple. ``loaded_from_file`` is + ``True`` when either ``zenzic.toml`` or ``pyproject.toml`` was + found and parsed, ``False`` when built-in defaults are in use. + + Raises: + :class:`~zenzic.core.exceptions.ConfigurationError`: When a + config file is present but cannot be parsed. 
+ """ + from zenzic.core.exceptions import ConfigurationError # deferred to avoid circular import + + # ── Priority 1: zenzic.toml ─────────────────────────────────────────── + zenzic_toml = repo_root / "zenzic.toml" + if zenzic_toml.is_file(): + try: + with zenzic_toml.open("rb") as f: + data = tomllib.load(f) + except tomllib.TOMLDecodeError as exc: + raise ConfigurationError( + f"[bold red]zenzic.toml[/] contains a syntax error and cannot be loaded.\n" + f" [dim]{zenzic_toml}[/]\n\n" + f" [red]{exc}[/]\n\n" + "Fix the TOML syntax error and re-run Zenzic.", + context={"config_path": str(zenzic_toml)}, + ) from exc + return cls._build_from_data(data), True + + # ── Priority 2: [tool.zenzic] in pyproject.toml ────────────────────── + pyproject_toml = repo_root / "pyproject.toml" + if pyproject_toml.is_file(): + try: + with pyproject_toml.open("rb") as f: + pyproject_data = tomllib.load(f) + except tomllib.TOMLDecodeError as exc: + raise ConfigurationError( + f"[bold red]pyproject.toml[/] contains a syntax error and cannot be loaded.\n" + f" [dim]{pyproject_toml}[/]\n\n" + f" [red]{exc}[/]\n\n" + "Fix the TOML syntax error and re-run Zenzic.", + context={"config_path": str(pyproject_toml)}, + ) from exc + tool_section = pyproject_data.get("tool", {}) + zenzic_section = tool_section.get("zenzic", {}) + if zenzic_section: + return cls._build_from_data(zenzic_section), True + + # ── Priority 3: built-in defaults ───────────────────────────────────── + return cls(), False diff --git a/src/zenzic/models/references.py b/src/zenzic/models/references.py index 72846ae..48d28e9 100644 --- a/src/zenzic/models/references.py +++ b/src/zenzic/models/references.py @@ -168,7 +168,7 @@ class IntegrityReport: findings: All reference-quality issues (dangling refs, orphan defs, duplicate defs, missing alt-text). security_findings: Secrets detected by the Shield during Pass 1. 
- rule_findings: Issues raised by the RuleEngine (custom rules and + rule_findings: Issues raised by the AdaptiveRuleEngine (custom rules and plugin-registered rules). Empty when no rules are configured. """ diff --git a/tests/test_cli.py b/tests/test_cli.py index dc42131..e622b3e 100644 --- a/tests/test_cli.py +++ b/tests/test_cli.py @@ -229,7 +229,7 @@ def test_cli_check_all_json_empty(tmp_path: Path, monkeypatch: pytest.MonkeyPatc @patch("zenzic.cli.find_placeholders", return_value=[]) @patch("zenzic.cli.find_unused_assets", return_value=[]) @patch("zenzic.cli.check_nav_contract", return_value=[]) -@patch("zenzic.cli.scan_docs_references_with_links", return_value=([], [])) +@patch("zenzic.cli.scan_docs_references", return_value=([], [])) def test_check_all_json_with_errors( _refs, _nav, _assets, _ph, _snip, _orphans, _links, _cfg, _root ) -> None: @@ -252,7 +252,7 @@ def test_check_all_json_with_errors( @patch("zenzic.cli.find_placeholders", return_value=[]) @patch("zenzic.cli.find_unused_assets", return_value=[]) @patch("zenzic.cli.check_nav_contract", return_value=[]) -@patch("zenzic.cli.scan_docs_references_with_links", return_value=([], [])) +@patch("zenzic.cli.scan_docs_references", return_value=([], [])) def test_check_all_text_ok(_refs, _nav, _assets, _ph, _snip, _orphans, _links, _cfg, _root) -> None: result = runner.invoke(app, ["check", "all"]) assert result.exit_code == 0 @@ -275,7 +275,7 @@ def test_check_all_text_ok(_refs, _nav, _assets, _ph, _snip, _orphans, _links, _ ) @patch("zenzic.cli.find_unused_assets", return_value=[Path("assets/unused.png")]) @patch("zenzic.cli.check_nav_contract", return_value=[]) -@patch("zenzic.cli.scan_docs_references_with_links", return_value=([], [])) +@patch("zenzic.cli.scan_docs_references", return_value=([], [])) def test_check_all_text_with_all_errors( _refs, _nav, _assets, _ph, _snip, _orphans, _links, _cfg, _root ) -> None: @@ -297,7 +297,7 @@ def test_check_all_text_with_all_errors( 
@patch("zenzic.cli.find_repo_root", return_value=_ROOT) @patch("zenzic.cli.ZenzicConfig.load", return_value=(_CFG, True)) @patch( - "zenzic.cli.scan_docs_references_with_links", + "zenzic.cli.scan_docs_references", return_value=([], []), ) def test_check_references_ok(_scan, _cfg, _root) -> None: @@ -308,7 +308,7 @@ def test_check_references_ok(_scan, _cfg, _root) -> None: @patch("zenzic.cli.find_repo_root", return_value=_ROOT) @patch("zenzic.cli.ZenzicConfig.load", return_value=(_CFG, True)) -@patch("zenzic.cli.scan_docs_references_with_links") +@patch("zenzic.cli.scan_docs_references") def test_check_references_rule_findings_surfaced(mock_scan, _cfg, _root) -> None: """rule_findings on IntegrityReport must appear in check references output.""" from zenzic.core.rules import RuleFinding diff --git a/tests/test_config.py b/tests/test_config.py index 3006ae6..ace9559 100644 --- a/tests/test_config.py +++ b/tests/test_config.py @@ -82,3 +82,73 @@ def test_placeholder_patterns_compiled_on_init(tmp_path: Path) -> None: assert len(config.placeholder_patterns_compiled) == 2 assert config.placeholder_patterns_compiled[0].search("this is a TODO item") assert config.placeholder_patterns_compiled[1].search("WIP section") + + +# ─── pyproject.toml support (ISSUE #5) ─────────────────────────────────────── + + +def test_load_config_from_pyproject_toml(tmp_path: Path) -> None: + """[tool.zenzic] in pyproject.toml is used when zenzic.toml is absent.""" + (tmp_path / "pyproject.toml").write_text( + "[tool.zenzic]\ndocs_dir = 'my_docs'\nfail_under = 75\n" + ) + config, loaded = ZenzicConfig.load(tmp_path) + assert config.docs_dir == Path("my_docs") + assert config.fail_under == 75 + assert loaded is True + + +def test_load_config_pyproject_build_context(tmp_path: Path) -> None: + """[tool.zenzic.build_context] is parsed correctly from pyproject.toml.""" + (tmp_path / "pyproject.toml").write_text( + "[tool.zenzic]\n" + "[tool.zenzic.build_context]\n" + "engine = 'zensical'\n" + 
"default_locale = 'en'\n" + "locales = ['it']\n" + ) + config, loaded = ZenzicConfig.load(tmp_path) + assert config.build_context.engine == "zensical" + assert config.build_context.locales == ["it"] + assert loaded is True + + +def test_load_config_pyproject_custom_rules(tmp_path: Path) -> None: + """[[tool.zenzic.custom_rules]] entries are parsed from pyproject.toml.""" + (tmp_path / "pyproject.toml").write_text( + "[tool.zenzic]\n" + "[[tool.zenzic.custom_rules]]\n" + 'id = "ZZ-PY"\n' + 'pattern = "TODO"\n' + 'message = "No TODOs."\n' + 'severity = "warning"\n' + ) + config, loaded = ZenzicConfig.load(tmp_path) + assert len(config.custom_rules) == 1 + assert config.custom_rules[0].id == "ZZ-PY" + assert config.custom_rules[0].severity == "warning" + assert loaded is True + + +def test_load_config_zenzic_toml_wins_over_pyproject(tmp_path: Path) -> None: + """zenzic.toml takes priority over [tool.zenzic] in pyproject.toml.""" + (tmp_path / "zenzic.toml").write_text("fail_under = 90\n") + (tmp_path / "pyproject.toml").write_text("[tool.zenzic]\nfail_under = 42\n") + config, loaded = ZenzicConfig.load(tmp_path) + assert config.fail_under == 90 # zenzic.toml wins + assert loaded is True + + +def test_load_config_pyproject_without_tool_zenzic_uses_defaults(tmp_path: Path) -> None: + """pyproject.toml without [tool.zenzic] falls back to built-in defaults.""" + (tmp_path / "pyproject.toml").write_text("[tool.other]\nfoo = 'bar'\n") + config, loaded = ZenzicConfig.load(tmp_path) + assert config.docs_dir == Path("docs") + assert loaded is False + + +def test_load_config_invalid_pyproject_raises(tmp_path: Path) -> None: + """A malformed pyproject.toml raises ConfigurationError — not silent fallback.""" + (tmp_path / "pyproject.toml").write_text("invalid [ toml") + with pytest.raises(ConfigurationError, match="syntax error"): + ZenzicConfig.load(tmp_path) diff --git a/tests/test_integration_finale.py b/tests/test_integration_finale.py new file mode 100644 index 
0000000..3d036a4 --- /dev/null +++ b/tests/test_integration_finale.py @@ -0,0 +1,198 @@ +# SPDX-FileCopyrightText: 2026 PythonWoods +# SPDX-License-Identifier: Apache-2.0 +"""Integration tests for v0.5.0a1 Integration Finale sprint. + +Covers: +- zenzic plugins list (PluginRuleInfo, list_plugin_rules) +- Performance telemetry (scan_docs_references verbose=True) +""" + +from __future__ import annotations + +from io import StringIO +from pathlib import Path +from unittest.mock import patch + +import pytest + +from zenzic.core.rules import PluginRuleInfo, list_plugin_rules +from zenzic.core.scanner import ADAPTIVE_PARALLEL_THRESHOLD, scan_docs_references +from zenzic.models.config import ZenzicConfig + + +# ─── Helpers ────────────────────────────────────────────────────────────────── + + +def _make_docs(tmp_path: Path, n_files: int = 3) -> Path: + docs = tmp_path / "docs" + docs.mkdir() + for i in range(n_files): + (docs / f"page_{i:03d}.md").write_text(f"# Page {i}\n\nContent {i}.\n") + return tmp_path + + +# ─── list_plugin_rules ──────────────────────────────────────────────────────── + + +def test_list_plugin_rules_returns_list() -> None: + """list_plugin_rules() returns a list (possibly empty if entry-points not installed).""" + result = list_plugin_rules() + assert isinstance(result, list) + + +def test_list_plugin_rules_sorted_by_source() -> None: + """Results are sorted by entry-point source name.""" + result = list_plugin_rules() + sources = [r.source for r in result] + assert sources == sorted(sources) + + +def test_list_plugin_rules_contains_core_broken_links() -> None: + """The built-in broken-links rule is present when the package is installed.""" + result = list_plugin_rules() + sources = {r.source for r in result} + # Core package registers 'broken-links' via entry-points + assert "broken-links" in sources + + +def test_list_plugin_rules_broken_links_has_correct_id() -> None: + """broken-links entry-point exposes rule_id 'Z001'.""" + result = 
list_plugin_rules() + by_source = {r.source: r for r in result} + assert "broken-links" in by_source + assert by_source["broken-links"].rule_id == "Z001" + + +def test_list_plugin_rules_broken_links_origin_is_zenzic() -> None: + """broken-links is registered by the 'zenzic' distribution.""" + result = list_plugin_rules() + by_source = {r.source: r for r in result} + assert by_source["broken-links"].origin == "zenzic" + + +def test_plugin_rule_info_fields() -> None: + """PluginRuleInfo is a plain dataclass with the expected fields.""" + info = PluginRuleInfo( + rule_id="TEST", + class_name="my_pkg.rules.TestRule", + source="test-rule", + origin="my-pkg", + ) + assert info.rule_id == "TEST" + assert info.class_name == "my_pkg.rules.TestRule" + assert info.source == "test-rule" + assert info.origin == "my-pkg" + + +def test_list_plugin_rules_skips_unloadable_entry_point() -> None: + """An entry-point that fails to load is silently skipped.""" + from importlib.metadata import EntryPoint + + bad_ep = EntryPoint( + name="bad", value="nonexistent.module:NonExistentClass", group="zenzic.rules" + ) + + with patch("importlib.metadata.entry_points", return_value=[bad_ep]): + result = list_plugin_rules() + # Should return empty list without raising + assert result == [] + + +# ─── CLI: zenzic plugins list ──────────────────────────────────────────────── + + +def test_cli_plugins_list_command_runs() -> None: + """zenzic plugins list exits 0 and prints output.""" + from typer.testing import CliRunner + + from zenzic.main import app + + runner = CliRunner() + result = runner.invoke(app, ["plugins", "list"]) + assert result.exit_code == 0 + + +def test_cli_plugins_list_shows_broken_links() -> None: + """plugins list output mentions the broken-links core rule.""" + from typer.testing import CliRunner + + from zenzic.main import app + + runner = CliRunner() + result = runner.invoke(app, ["plugins", "list"]) + assert "broken-links" in result.output + assert "Z001" in result.output + 
+ +def test_cli_plugins_list_empty_when_no_rules(monkeypatch: pytest.MonkeyPatch) -> None: + """When no rules are registered, prints an informational message without crashing.""" + from typer.testing import CliRunner + + from zenzic.main import app + + monkeypatch.setattr("zenzic.core.rules.list_plugin_rules", lambda: []) + + runner = CliRunner() + result = runner.invoke(app, ["plugins", "list"]) + assert result.exit_code == 0 + assert "No rules found" in result.output + + +# ─── Telemetry ──────────────────────────────────────────────────────────────── + + +def test_telemetry_sequential_writes_to_stderr(tmp_path: Path) -> None: + """verbose=True emits a telemetry line to stderr for sequential scans.""" + repo = _make_docs(tmp_path, n_files=2) + config = ZenzicConfig() + + captured = StringIO() + with patch("sys.stderr", captured): + scan_docs_references(repo, config, verbose=True) + + output = captured.getvalue() + assert "[zenzic]" in output + assert "Sequential" in output + assert "Files: 2" in output + assert "Execution time:" in output + + +def test_telemetry_disabled_by_default(tmp_path: Path) -> None: + """verbose=False (default) produces no stderr telemetry line.""" + repo = _make_docs(tmp_path, n_files=2) + config = ZenzicConfig() + + captured = StringIO() + with patch("sys.stderr", captured): + scan_docs_references(repo, config) + + assert "[zenzic]" not in captured.getvalue() + + +def test_telemetry_parallel_shows_workers(tmp_path: Path) -> None: + """verbose=True in parallel mode mentions workers and speedup.""" + # Create enough files to trigger parallel mode + repo = _make_docs(tmp_path, n_files=ADAPTIVE_PARALLEL_THRESHOLD) + config = ZenzicConfig() + + captured = StringIO() + with patch("sys.stderr", captured): + scan_docs_references(repo, config, workers=2, verbose=True) + + output = captured.getvalue() + assert "[zenzic]" in output + # Either parallel triggered (if files >= threshold) or sequential fallback — + # either way telemetry must be 
emitted. + assert "Execution time:" in output + + +def test_telemetry_sequential_no_speedup_line(tmp_path: Path) -> None: + """Sequential telemetry line does not contain a speedup estimate.""" + repo = _make_docs(tmp_path, n_files=2) + config = ZenzicConfig() + + captured = StringIO() + with patch("sys.stderr", captured): + scan_docs_references(repo, config, verbose=True) + + assert "speedup" not in captured.getvalue().lower() diff --git a/tests/test_parallel.py b/tests/test_parallel.py index 48bbe0f..a4edc87 100644 --- a/tests/test_parallel.py +++ b/tests/test_parallel.py @@ -13,11 +13,23 @@ import pytest -from zenzic.core.scanner import scan_docs_references, scan_docs_references_parallel +from zenzic.core.rules import AdaptiveRuleEngine, BaseRule, RuleFinding +from zenzic.core.scanner import scan_docs_references from zenzic.models.config import ZenzicConfig from zenzic.models.references import IntegrityReport +# Module-level BoomRule: pickleable (defined at module level) but raises +# during check(). Used to test that the engine isolates runtime exceptions. 
+class _BoomRule(BaseRule): + @property + def rule_id(self) -> str: + return "BOOM" + + def check(self, file_path: Path, text: str) -> list[RuleFinding]: + raise RuntimeError("intentional failure") + + # ─── Helpers ────────────────────────────────────────────────────────────────── @@ -45,32 +57,32 @@ def test_parallel_matches_sequential(tmp_path: Path) -> None: repo = _make_docs(tmp_path, n_files=10) config = ZenzicConfig() - sequential = scan_docs_references(repo, config) - parallel = scan_docs_references_parallel(repo, config, workers=2) + sequential, _ = scan_docs_references(repo, config) + parallel, _ = scan_docs_references(repo, config, workers=2) assert _report_fingerprint(sequential) == _report_fingerprint(parallel) def test_parallel_empty_docs(tmp_path: Path) -> None: - """Parallel scan on a repo with no docs returns an empty list.""" + """Parallel scan on a repo with no docs returns empty results.""" (tmp_path / "docs").mkdir() config = ZenzicConfig() - result = scan_docs_references_parallel(tmp_path, config, workers=2) - assert result == [] + reports, _ = scan_docs_references(tmp_path, config, workers=2) + assert reports == [] def test_parallel_docs_not_exist(tmp_path: Path) -> None: - """Parallel scan returns [] when docs_dir does not exist.""" + """Parallel scan returns empty results when docs_dir does not exist.""" config = ZenzicConfig() - result = scan_docs_references_parallel(tmp_path, config, workers=2) - assert result == [] + reports, _ = scan_docs_references(tmp_path, config, workers=2) + assert reports == [] def test_parallel_single_worker_is_sequential(tmp_path: Path) -> None: - """workers=1 effectively disables parallelism but still returns correct results.""" + """workers=1 disables parallelism but still returns correct results.""" repo = _make_docs(tmp_path, n_files=4) config = ZenzicConfig() - result = scan_docs_references_parallel(repo, config, workers=1) + result, _ = scan_docs_references(repo, config, workers=1) assert len(result) == 4 
# All refs should resolve (we defined [ref] in every file) for report in result: @@ -81,7 +93,7 @@ def test_parallel_sorted_output(tmp_path: Path) -> None: """Output is sorted by file_path regardless of worker scheduling order.""" repo = _make_docs(tmp_path, n_files=8) config = ZenzicConfig() - result = scan_docs_references_parallel(repo, config, workers=4) + result, _ = scan_docs_references(repo, config, workers=4) paths = [r.file_path for r in result] assert paths == sorted(paths) @@ -95,9 +107,9 @@ def test_idempotency_sequential_100_runs(tmp_path: Path) -> None: repo = _make_docs(tmp_path, n_files=10) config = ZenzicConfig() - baseline = _report_fingerprint(scan_docs_references(repo, config)) + baseline = _report_fingerprint(scan_docs_references(repo, config)[0]) for _ in range(99): - result = _report_fingerprint(scan_docs_references(repo, config)) + result = _report_fingerprint(scan_docs_references(repo, config)[0]) assert result == baseline, "Sequential scan is not deterministic" @@ -106,9 +118,9 @@ def test_idempotency_parallel_10_runs(tmp_path: Path) -> None: repo = _make_docs(tmp_path, n_files=5) config = ZenzicConfig() - baseline = _report_fingerprint(scan_docs_references_parallel(repo, config, workers=2)) + baseline = _report_fingerprint(scan_docs_references(repo, config, workers=2)[0]) for _ in range(9): - result = _report_fingerprint(scan_docs_references_parallel(repo, config, workers=2)) + result = _report_fingerprint(scan_docs_references(repo, config, workers=2)[0]) assert result == baseline, "Parallel scan is not deterministic" @@ -121,10 +133,10 @@ def test_idempotency_concurrent_invocations(tmp_path: Path) -> None: repo = _make_docs(tmp_path, n_files=6) config = ZenzicConfig() - baseline = _report_fingerprint(scan_docs_references(repo, config)) + baseline = _report_fingerprint(scan_docs_references(repo, config)[0]) def run_scan() -> list[tuple[str, float, int]]: - return _report_fingerprint(scan_docs_references(repo, config)) + return 
_report_fingerprint(scan_docs_references(repo, config)[0]) with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor: futures = [executor.submit(run_scan) for _ in range(8)] @@ -138,18 +150,9 @@ def run_scan() -> list[tuple[str, float, int]]: def test_parallel_rule_exception_isolated(tmp_path: Path) -> None: - """A rule that raises in a worker does not abort other workers.""" - from zenzic.core.rules import BaseRule, RuleEngine, RuleFinding + """A module-level rule that raises at runtime does not abort other files.""" from zenzic.core.scanner import _scan_single_file - class BoomRule(BaseRule): - @property - def rule_id(self) -> str: - return "BOOM" - - def check(self, file_path: Path, text: str) -> list[RuleFinding]: - raise RuntimeError("intentional failure") - docs = tmp_path / "docs" docs.mkdir() files = [docs / f"p{i}.md" for i in range(3)] @@ -157,7 +160,7 @@ def check(self, file_path: Path, text: str) -> list[RuleFinding]: f.write_text("# page\n") config = ZenzicConfig() - engine = RuleEngine([BoomRule()]) + engine = AdaptiveRuleEngine([_BoomRule()]) # All files should produce a report with one RULE-ENGINE-ERROR finding for f in files: diff --git a/tests/test_references.py b/tests/test_references.py index ecba885..319f6ce 100644 --- a/tests/test_references.py +++ b/tests/test_references.py @@ -44,7 +44,6 @@ ReferenceScanner, check_image_alt_text, scan_docs_references, - scan_docs_references_with_links, ) from zenzic.core.shield import SecurityFinding, scan_line_for_secrets, scan_url_for_secrets from zenzic.core.validator import LinkValidator @@ -560,12 +559,12 @@ class TestScanDocsReferences: def test_empty_docs_returns_empty(self, tmp_path: Path) -> None: (tmp_path / "docs").mkdir() (tmp_path / "mkdocs.yml").touch() - reports = scan_docs_references(tmp_path) + reports, _ = scan_docs_references(tmp_path) assert reports == [] def test_missing_docs_dir_returns_empty(self, tmp_path: Path) -> None: (tmp_path / "mkdocs.yml").touch() - reports = 
scan_docs_references(tmp_path) + reports, _ = scan_docs_references(tmp_path) assert reports == [] def test_single_clean_file(self, tmp_path: Path) -> None: @@ -576,7 +575,7 @@ def test_single_clean_file(self, tmp_path: Path) -> None: "[guide]: https://example.com\n\nSee [guide][guide].\n", encoding="utf-8", ) - reports = scan_docs_references(tmp_path) + reports, _ = scan_docs_references(tmp_path) assert len(reports) == 1 assert reports[0].score == pytest.approx(100.0) assert reports[0].is_secure is True @@ -590,7 +589,7 @@ def test_secret_in_docs_flagged(self, tmp_path: Path) -> None: f"[api]: https://aws.example.com/?key={aws_key}\n", encoding="utf-8", ) - reports = scan_docs_references(tmp_path) + reports, _ = scan_docs_references(tmp_path) assert len(reports) == 1 assert reports[0].is_secure is False @@ -601,7 +600,7 @@ def test_symlinks_skipped(self, tmp_path: Path) -> None: real = tmp_path / "real.md" real.write_text("[ref]: https://example.com\n", encoding="utf-8") (docs / "linked.md").symlink_to(real) - reports = scan_docs_references(tmp_path) + reports, _ = scan_docs_references(tmp_path) assert reports == [] def test_deduplication_1000_refs_single_used_id(self, tmp_path: Path) -> None: @@ -613,7 +612,7 @@ def test_deduplication_1000_refs_single_used_id(self, tmp_path: Path) -> None: for i in range(1000): lines.append(f"Item {i}: see [here][bigref].\n") (docs / "big.md").write_text("".join(lines), encoding="utf-8") - reports = scan_docs_references(tmp_path) + reports, _ = scan_docs_references(tmp_path) assert len(reports) == 1 report = reports[0] assert report.score == pytest.approx(100.0) @@ -838,7 +837,7 @@ def test_scan_docs_references_with_links_no_links_flag(self, tmp_path: Path) -> "[ref]: https://example.com\n\nSee [page][ref].\n", encoding="utf-8", ) - reports, link_errors = scan_docs_references_with_links(tmp_path, validate_links=False) + reports, link_errors = scan_docs_references(tmp_path, validate_links=False) assert len(reports) == 1 assert 
link_errors == [] @@ -853,7 +852,7 @@ def test_scan_docs_references_with_links_secure_files_only(self, tmp_path: Path) encoding="utf-8", ) # validate_links=True but the file has a secret → URLs must be skipped - reports, link_errors = scan_docs_references_with_links(tmp_path, validate_links=True) + reports, link_errors = scan_docs_references(tmp_path, validate_links=True) # Reports still produced (with security finding) assert len(reports) == 1 assert reports[0].is_secure is False diff --git a/tests/test_rules.py b/tests/test_rules.py index c64c757..566d5c9 100644 --- a/tests/test_rules.py +++ b/tests/test_rules.py @@ -1,6 +1,6 @@ # SPDX-FileCopyrightText: 2026 PythonWoods # SPDX-License-Identifier: Apache-2.0 -"""Tests for the Zenzic Rule Engine: BaseRule, CustomRule, RuleEngine.""" +"""Tests for the Zenzic Rule Engine: BaseRule, CustomRule, AdaptiveRuleEngine.""" from __future__ import annotations @@ -9,10 +9,11 @@ import pytest +from zenzic.core.exceptions import PluginContractError from zenzic.core.rules import ( + AdaptiveRuleEngine, BaseRule, CustomRule, - RuleEngine, RuleFinding, Violation, VSMBrokenLinkRule, @@ -24,6 +25,17 @@ _FILE = Path("docs/guide.md") +# Module-level BrokenRule: pickleable (defined at module level) but raises +# at runtime inside check(). Tests that the engine isolates runtime errors. 
+class _BrokenRule(BaseRule): + @property + def rule_id(self) -> str: + return "ZZ-BROKEN" + + def check(self, file_path: Path, text: str) -> list[RuleFinding]: + raise RuntimeError("rule internal error") + + # ─── CustomRule ─────────────────────────────────────────────────────────────── @@ -68,25 +80,25 @@ def test_custom_rule_info_severity_not_error() -> None: assert not findings[0].is_error -# ─── RuleEngine ─────────────────────────────────────────────────────────────── +# ─── AdaptiveRuleEngine ─────────────────────────────────────────────────────────────── def test_rule_engine_empty_no_findings() -> None: - engine = RuleEngine([]) + engine = AdaptiveRuleEngine([]) assert not engine assert engine.run(_FILE, "any text") == [] def test_rule_engine_bool_true_when_rules_present() -> None: rule = CustomRule(id="ZZ007", pattern=r"x", message="x", severity="error") - engine = RuleEngine([rule]) + engine = AdaptiveRuleEngine([rule]) assert engine def test_rule_engine_multiple_rules_combined() -> None: r1 = CustomRule(id="ZZ008", pattern=r"TODO", message="todo found", severity="error") r2 = CustomRule(id="ZZ009", pattern=r"FIXME", message="fixme found", severity="warning") - engine = RuleEngine([r1, r2]) + engine = AdaptiveRuleEngine([r1, r2]) text = "Line with TODO here.\nAnother FIXME line.\n" findings = engine.run(_FILE, text) assert len(findings) == 2 @@ -95,30 +107,40 @@ def test_rule_engine_multiple_rules_combined() -> None: def test_rule_engine_isolates_exception() -> None: - """A rule that raises must not abort the entire engine run.""" - - class BrokenRule(BaseRule): - @property - def rule_id(self) -> str: - return "ZZ-BROKEN" - - def check(self, file_path: Path, text: str) -> list[RuleFinding]: - raise RuntimeError("rule internal error") + """A module-level rule that raises at runtime must not abort the engine run. + _BrokenRule is defined at module level so it passes eager pickle validation. 
+ Its check() raises at runtime — the engine must catch it and continue. + """ good_rule = CustomRule(id="ZZ010", pattern=r"x", message="x found", severity="info") - engine = RuleEngine([BrokenRule(), good_rule]) + engine = AdaptiveRuleEngine([_BrokenRule(), good_rule]) findings = engine.run(_FILE, "x line\n") # One error from the broken rule, one info from the good rule assert len(findings) == 2 engine_err = next(f for f in findings if f.rule_id == "RULE-ENGINE-ERROR") - assert "BrokenRule" in engine_err.message or "rule internal error" in engine_err.message + assert "ZZ-BROKEN" in engine_err.message or "rule internal error" in engine_err.message assert engine_err.severity == "error" good_finding = next(f for f in findings if f.rule_id == "ZZ010") assert good_finding.severity == "info" +def test_rule_engine_rejects_non_pickleable_rule() -> None: + """A rule defined inside a function is not pickleable → PluginContractError at construction.""" + + class LocalRule(BaseRule): + @property + def rule_id(self) -> str: + return "ZZ-LOCAL" + + def check(self, file_path: Path, text: str) -> list[RuleFinding]: + return [] + + with pytest.raises(PluginContractError, match="not serialisable"): + AdaptiveRuleEngine([LocalRule()]) + + # ─── Integration with scanner ────────────────────────────────────────────────── @@ -130,7 +152,7 @@ def test_scan_single_file_with_rule_engine(tmp_path: Path) -> None: md.write_text("# Guide\n\nThis is TODO content.\n") config = ZenzicConfig() rule = CustomRule(id="ZZ-TODO", pattern=r"TODO", message="Remove TODO.", severity="warning") - engine = RuleEngine([rule]) + engine = AdaptiveRuleEngine([rule]) report, _ = _scan_single_file(md, config, engine) assert len(report.rule_findings) == 1 @@ -168,7 +190,7 @@ def test_scan_docs_with_custom_rules_from_config(tmp_path: Path) -> None: ) ] ) - reports = scan_docs_references(tmp_path, config) + reports, _ = scan_docs_references(tmp_path, config) assert len(reports) == 1 assert 
len(reports[0].rule_findings) == 1 assert reports[0].rule_findings[0].rule_id == "ZZ-DRAFT" @@ -227,7 +249,7 @@ def test_custom_rules_fire_regardless_of_engine( if engine == "zensical": (repo / "zensical.toml").write_text("[site]\nname = 'Test'\n") - reports = scan_docs_references(repo, config) + reports, _ = scan_docs_references(repo, config) assert len(reports) == 1, f"Expected 1 report for engine={engine!r}" rule_findings = reports[0].rule_findings assert len(rule_findings) == 1, ( @@ -368,10 +390,10 @@ def test_violation_line_number_is_correct(self) -> None: assert len(violations) == 1 assert violations[0].line_no == 5 - # ── RuleEngine.run_vsm integration ─────────────────────────────────────── + # ── AdaptiveRuleEngine.run_vsm integration ─────────────────────────────────────── def test_run_vsm_converts_violations_to_findings(self) -> None: - engine = RuleEngine([VSMBrokenLinkRule()]) + engine = AdaptiveRuleEngine([VSMBrokenLinkRule()]) vsm = _make_vsm("/ok/") findings = engine.run_vsm(_FILE, "[OK](ok/index.md)\n[Bad](ghost.md)\n", vsm, {}) assert len(findings) == 1 @@ -386,7 +408,7 @@ def test_run_vsm_converts_violations_to_findings(self) -> None: # threshold, which would be ~100× slower on this data size). 
-class TestRuleEngineTortureTest: +class TestAdaptiveRuleEngineTortureTest: """🛡️ Dev 4: Performance and scalability invariants for the Rule Engine.""" _N = 10_000 @@ -446,8 +468,8 @@ def test_check_vsm_scales_linearly_all_missing(self) -> None: ) def test_run_vsm_engine_scales_with_large_vsm(self) -> None: - """RuleEngine.run_vsm with 10 000-node VSM must complete < 1 s.""" - engine = RuleEngine([VSMBrokenLinkRule()]) + """AdaptiveRuleEngine.run_vsm with 10 000-node VSM must complete < 1 s.""" + engine = AdaptiveRuleEngine([VSMBrokenLinkRule()]) vsm = self._make_large_vsm() # Small file — only the VSM lookup overhead is being measured here text = "\n".join(f"[P](page-{i}.md)" for i in range(100)) diff --git a/uv.lock b/uv.lock index d2f308e..581af7c 100644 --- a/uv.lock +++ b/uv.lock @@ -2035,7 +2035,7 @@ wheels = [ [[package]] name = "zenzic" -version = "0.4.0rc5" +version = "0.5.0a1" source = { editable = "." } dependencies = [ { name = "httpx" }, @@ -2097,7 +2097,7 @@ test = [ [package.metadata] requires-dist = [ - { name = "httpx", specifier = ">=0.27" }, + { name = "httpx", specifier = ">=0.27,<1.0" }, { name = "mkdocs-material", extras = ["imaging"], marker = "extra == 'docs'", specifier = ">=9.0.0" }, { name = "mkdocs-minify-plugin", marker = "extra == 'docs'", specifier = ">=0.7.0" }, { name = "mkdocs-static-i18n", marker = "extra == 'docs'", specifier = ">=1.3.1" },
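A recurring theme in the test changes above is that `AdaptiveRuleEngine` validates pickleability eagerly: rules defined at module level can cross a `ProcessPoolExecutor` boundary, while rules defined inside a function body cannot be serialised and are rejected at construction time with a `PluginContractError`. A minimal sketch of such an eager check — the helper and exception names are hypothetical, only the behaviour mirrors the tests:

```python
# Hypothetical helper in the spirit of AdaptiveRuleEngine's constructor
# check; not Zenzic's actual code.
import pickle


class NotSerialisableError(TypeError):
    """Raised when a rule cannot cross a process boundary."""


def validate_pickleable(rule: object) -> None:
    # pickle.dumps fails for instances of classes defined inside a function
    # (their qualname contains '<locals>'), so the failure surfaces here,
    # at engine construction time, instead of deep inside a worker process.
    try:
        pickle.dumps(rule)
    except Exception as exc:
        raise NotSerialisableError(
            f"rule {type(rule).__name__!r} is not serialisable: {exc}"
        ) from exc
```

This is why both test modules now hoist `_BoomRule` and `_BrokenRule` to module level: they must pass the eager pickle check so that the separate runtime-isolation behaviour (the `RULE-ENGINE-ERROR` findings) can be exercised on its own.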