diff --git a/docs/agent-tool-gating.md b/docs/agent-tool-gating.md index 06d2bf6..71291c4 100644 --- a/docs/agent-tool-gating.md +++ b/docs/agent-tool-gating.md @@ -275,7 +275,7 @@ Wiring (schema/handler layers plus registration order): The plugin loads the same contract as the adapter (`hermes-governance` / bundled YAML). Adding a tool is mostly **config + mapper + policy**, plus an entry in -`GOVERNED_BUILTIN_MODULES` when Hermes registers the tool at module import time. +``builtin_module`` in the repo catalog template when Hermes registers the tool at module import time. **History (before v1):** the first proof gated only `terminal` via `intentframe-terminal`, which imported `tools.terminal_tool` at plugin load (early diff --git a/docs/hermes-intentframe-integration-guide.md b/docs/hermes-intentframe-integration-guide.md index 3bfda84..ba29b91 100644 --- a/docs/hermes-intentframe-integration-guide.md +++ b/docs/hermes-intentframe-integration-guide.md @@ -153,7 +153,9 @@ Names only — full JSON schemas are probed separately via ``probe_hermes_tool_s **Rule:** when debugging “model never calls tool X”, verify X appears in the **OpenAI Tools block** (request dump with `HERMES_DUMP_REQUESTS=1`, trace, or gateway logs), -not only on `/v1/toolsets`. Automated check: +not only on `/v1/toolsets`. For governed builtins, also check preload (`builtin_module` +in yaml) and Hermes per-tool `check_fn` env (e.g. `cronjob` needs `HERMES_GATEWAY_SESSION`). +Automated check: `RUN_HERMES_GATEWAY_TOOLSETS=1 ./tests/scripts/test-hermes-gateway-toolsets.sh` (see [`tests/hermes_gateway/README.md`](../tests/hermes_gateway/README.md#toolsets--provider-payload-test-opt-in-networked-llm)). @@ -271,19 +273,23 @@ unless explicitly added to the contract — govern by **tool name**, not toolset ### 1. 
Selective preload -```12:29:integrations/hermes/plugin/intentframe-gate/builtin_preload.py -GOVERNED_BUILTIN_MODULES: dict[str, str] = { - "terminal": "tools.terminal_tool", - "process": "tools.process_registry", - "write_file": "tools.file_tools", - "patch": "tools.file_tools", -} +Each catalog tool may declare ``builtin_module`` in the dev-owned repo +``integrations/hermes/governance/tools.yaml`` (copied to runtime on integrate). +The plugin imports those modules for **enabled** governed tools only: -def preload_governed_builtins(governed: frozenset[str]) -> None: - ... - importlib.import_module(module_name) +```yaml +terminal: + enabled: true + builtin_module: tools.terminal_tool +cronjob: + enabled: true + builtin_module: tools.cronjob_tools ``` +[`builtin_preload.py`](../integrations/hermes/plugin/intentframe-gate/builtin_preload.py) +validates ``builtin_module`` must start with ``tools.`` and imports unique modules before +the registry snapshot. + **Why not call `discover_builtin_tools()` in the plugin?** Hermes discovers builtins by AST-scanning `tools/*.py` and importing every module @@ -403,7 +409,7 @@ the modules first. ### Scrutinize import changes like API changes -When editing `GOVERNED_BUILTIN_MODULES` or any plugin import: +When editing ``builtin_module`` in the repo catalog template or any plugin import: | Change | Risk | |--------|------| @@ -507,15 +513,18 @@ do not change manifest or policy files. ### Step 4 — Plugin preload (if Hermes builtin) -If the tool is a Hermes built-in registered at import time, add to -`GOVERNED_BUILTIN_MODULES`: +If the tool is a Hermes built-in registered at import time, set in repo +``integrations/hermes/governance/tools.yaml``: -```python -"my_tool": "tools.my_tool_module", +```yaml +my_tool: + enabled: true + builtin_module: tools.my_tool_module ``` If several catalog names share one module (like `write_file` + `patch` → `file_tools`), -one import is enough — preload dedupes modules. 
+one import is enough — preload dedupes modules. ``builtin_module`` must start with +``tools.`` (validated at load time). Delete coverage is via `patch` V4A `*** Delete File:` operations (maps to `DELETE_HOST_FILE`). @@ -658,7 +667,7 @@ uv run --package intentframe-integrations-cli python tests/intentframe_integrati uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_policy_manage.py ``` -Extend `test_builtin_preload.py` when adding `GOVERNED_BUILTIN_MODULES` entries. +Extend `test_builtin_preload.py` when adding ``builtin_module`` entries to the catalog template. ### Layer 2 — Toolsets + OpenAI provider payload (networked LLM) @@ -669,14 +678,14 @@ RUN_HERMES_GATEWAY_TOOLSETS=1 ./tests/scripts/test-hermes-gateway-toolsets.sh Requires `OPENAI_API_KEY`. After `integrate hermes`: 1. `GET /v1/toolsets` — config tool name surface -2. `probe_hermes_tool_schemas.py` — registry schemas (`reason_required`, gate markers) +2. `probe_hermes_tool_schemas.py` — registry schemas for **all** governed catalog tools (`reason_required`, gate markers); probe env includes `HERMES_GATEWAY_SESSION=1` so `cronjob` passes Hermes `check_fn` 3. `POST /v1/responses` with `HERMES_DUMP_REQUESTS=1` — one real `chat.completions` call -4. Assert token usage > 0 and governed tools have required `reason` in `request.body.tools` +4. Assert token usage > 0 and **all** governed catalog tools have required `reason` in `request.body.tools` -Asserts `terminal: ['process', 'terminal']` on toolsets and provider payload schema -for governed tools. Lighter than full E2E (no tool-calling ALLOW/BLOCK probes). +Lighter than full E2E (no tool-calling ALLOW/BLOCK probes). Covers generic mappers +(e.g. `cronjob`) that gateway E2E omits from LLM probes. -Details: [`tests/hermes_gateway/README.md`](../tests/hermes_gateway/README.md#toolsets--provider-payload-test-opt-in-networked-llm). 
+Details and recent bug fixes: [`tests/hermes_gateway/README.md`](../tests/hermes_gateway/README.md#toolsets--provider-payload-test-opt-in-networked-llm). ### Layer 3 — Scoped gateway E2E (fast smoke) diff --git a/docs/hermes-intentframe-state-report.md b/docs/hermes-intentframe-state-report.md index 93efdaa..9f39eb3 100644 --- a/docs/hermes-intentframe-state-report.md +++ b/docs/hermes-intentframe-state-report.md @@ -1,6 +1,6 @@ # IntentFrame × Hermes integration — state report -> Snapshot of the Hermes agent integration as of **2026-06-23**. For how-to and +> Snapshot of the Hermes agent integration as of **2026-06-24**. For how-to and > troubleshooting, see [`hermes-intentframe-integration-guide.md`](./hermes-intentframe-integration-guide.md). --- @@ -45,7 +45,7 @@ LLM (POST /v1/responses) | Path | Purpose | |------|---------| -| `integrations/hermes/governance/tools.yaml` | Default governed-tool **template** (4 entries) | +| `integrations/hermes/governance/tools.yaml` | Default governed-tool **template** (5 entries) | | `integrations/hermes/policy.yaml` | Shipped policy **template** (RUN_COMMAND + host-file + deletion) | | `~/.intentframe/integrations/hermes/governance/tools.yaml` | Runtime governed-tool config (user-owned) | | `~/.intentframe/integrations/hermes/policy.yaml` | Runtime policy config (user-owned) | @@ -89,12 +89,16 @@ At `register()`: 1. **`install_registry_hook()`** — wrap tools registered later (e.g. MCP refresh). 2. **`preload_governed_builtins(governed)`** — selective import from - `GOVERNED_BUILTIN_MODULES` in [`builtin_preload.py`](../integrations/hermes/plugin/intentframe-gate/builtin_preload.py): + ``builtin_module`` per tool in [`builtin_preload.py`](../integrations/hermes/plugin/intentframe-gate/builtin_preload.py) (from repo ``tools.yaml``): - `terminal` → `tools.terminal_tool` - `process` → `tools.process_registry` - `write_file`, `patch` → `tools.file_tools` + - `cronjob` → `tools.cronjob_tools` 3. 
**Snapshot loop** — wrap governed entries with `inject_reason()` + `gate_tool_call()`. +`cronjob` also requires `HERMES_GATEWAY_SESSION=1` (or interactive/exec env) to pass +Hermes `check_fn` filtering in `get_tool_definitions()` — preload alone is not enough. + See [`hermes-plugin-registration-order.md`](./hermes-plugin-registration-order.md) for load-order evidence and bisect notes. @@ -136,7 +140,7 @@ or restore defaults. Policy commands apply `agent.json` env via `load_and_activa | Layer | Entry | LLM / network | |-------|-------|---------------| | Unit | `tests/hermes_plugin/`, `tests/hermes_gateway/test_*.py`, adapter tests, `test_policy_manage.py`, `test_integration_pack.py` | No | -| Toolsets + provider payload | `RUN_HERMES_GATEWAY_TOOLSETS=1 ./tests/scripts/test-hermes-gateway-toolsets.sh` | OpenAI `chat.completions` (one round-trip); asserts `tools=` + `reason` in request dump | +| Toolsets + provider payload | `RUN_HERMES_GATEWAY_TOOLSETS=1 ./tests/scripts/test-hermes-gateway-toolsets.sh` | OpenAI `chat.completions` (one round-trip); asserts **all** governed tools + `reason` in request dump | | Live integration | `./tests/scripts/test-hermes-integration.sh` | Backend; policy reload smoke + adapter/plugin probes (no LLM) | | Gateway E2E | `RUN_HERMES_GATEWAY_E2E=1 ./tests/scripts/test-hermes-gateway-e2e.sh` | OpenAI + full stack; native-mapper LLM probes only | @@ -167,7 +171,7 @@ See [`tests/hermes_gateway/README.md`](../tests/hermes_gateway/README.md). --- -## Recent changes (branch `fix-plugin-new-mechanism`) +## Recent changes | Change | Rationale | |--------|-----------| @@ -177,6 +181,11 @@ See [`tests/hermes_gateway/README.md`](../tests/hermes_gateway/README.md). 
| Hardened block probe prompts | Fix LLM rewriting `/etc/` to sandbox paths | | `load_and_activate_pack` + policy env parity | Policy validation sees same manifest env as backend boot | | `cronjob` generic tool + two-tier probe contract | Live semantic smoke; no gateway LLM E2E for generic mappers | +| **`builtin_module` in repo `tools.yaml`** | Replace hardcoded preload dict; single catalog source for Hermes import paths | +| **Toolsets live: full governed catalog** | Probe/dump had used native E2E tier only — `cronjob` was skipped despite production governance | +| **Toolsets probe: `HERMES_GATEWAY_SESSION=1`** | `cronjob` registered via preload but filtered by Hermes `check_fn` without gateway session env | +| **Toolsets run marker** | Unique token per run for OpenAI Platform log correlation | +| **Loader parity test (`builtin_module`)** | Plugin and shared governance loaders must agree on catalog shape | --- diff --git a/docs/hermes-plugin-registration-order.md b/docs/hermes-plugin-registration-order.md index d6ffb38..93bce65 100644 --- a/docs/hermes-plugin-registration-order.md +++ b/docs/hermes-plugin-registration-order.md @@ -66,25 +66,29 @@ def discover_builtin_tools(tools_dir: Optional[Path] = None) -> List[str]: | **`intentframe-gate` (broken)** | Hook + snapshot only — **no preload** | **empty** → `wrapped = []` | | **`intentframe-gate` (fixed)** | `preload_governed_builtins(governed)` + generic snapshot | governed names present | -The fixed plugin restores the old **early-import effect** generically: - -```12:29:integrations/hermes/plugin/intentframe-gate/builtin_preload.py -GOVERNED_BUILTIN_MODULES: dict[str, str] = { - "terminal": "tools.terminal_tool", - "process": "tools.process_registry", - "write_file": "tools.file_tools", - "patch": "tools.file_tools", -} -... 
- importlib.import_module(module_name) +The fixed plugin restores the old **early-import effect** generically from the dev-owned +catalog template: + +```yaml +# integrations/hermes/governance/tools.yaml (excerpt) +terminal: + enabled: true + builtin_module: tools.terminal_tool +cronjob: + enabled: true + builtin_module: tools.cronjob_tools ``` +[`builtin_preload.py`](../integrations/hermes/plugin/intentframe-gate/builtin_preload.py) +imports each enabled tool's ``builtin_module`` (must start with ``tools.``) before snapshot. + Then the same wrap loop runs for every governed name — no terminal-specific `_register_terminal_override()`: ```20:35:integrations/hermes/plugin/intentframe-gate/__init__.py - governed = governed_tool_names() - preload_governed_builtins(governed) + governed_tools = load_governed_tools() + governed = frozenset(governed_tools) + preload_governed_builtins(governed_tools) for entry in registry._snapshot_entries(): if entry.name not in governed: @@ -97,10 +101,11 @@ Then the same wrap loop runs for every governed name — no terminal-specific ### Why this matters for code review -Treat changes to `GOVERNED_BUILTIN_MODULES` and any plugin `import` like **API -surface changes**: +Treat changes to ``builtin_module`` in the repo ``tools.yaml`` and any plugin ``import`` +like **API surface changes**: -- Removing an import line can remove a tool from the OpenAI payload entirely. +- Removing or omitting ``builtin_module`` can remove a tool from the OpenAI payload entirely. +- Invalid ``builtin_module`` values are rejected (must start with ``tools.``). - Adding `discover_builtin_tools()` can register unrelated tools (`read_terminal`). - Unit test: [`tests/hermes_plugin/test_builtin_preload.py`](../tests/hermes_plugin/test_builtin_preload.py). @@ -154,7 +159,7 @@ enough on the gateway path — governed Hermes builtins must be preloaded first. ```mermaid flowchart TB subgraph mechanisms["intentframe-gate registration"] - A["1. Selective preload<br/>GOVERNED_BUILTIN_MODULES"] + A["1. Selective preload<br/>tools.yaml builtin_module"] B["2. Snapshot loop<br/>registry._snapshot_entries()"] C["3. Registry hook<br/>patch registry.register"] end @@ -183,8 +188,8 @@ flowchart TB **Why not full `discover_builtin_tools()`?** It imports every builtin module. That pulled in `read_terminal`, which Hermes then merged into the `terminal` toolset and broke the E2E toolsets contract (`['process', 'terminal']` expected). Selective -preload imports only modules listed in `GOVERNED_BUILTIN_MODULES` for names in the -runtime governed set. +preload imports ``builtin_module`` from each **enabled** governed tool in the dev-owned +catalog template (copied to runtime on integrate). --- @@ -305,7 +310,7 @@ registered a gated override — same **early import + wrap** effect as preload t ```python install_registry_hook() governed = governed_tool_names() -preload_governed_builtins(governed) # GOVERNED_BUILTIN_MODULES +preload_governed_builtins(governed_tools) # yaml builtin_module per enabled tool for entry in registry._snapshot_entries(): if entry.name not in governed: @@ -402,13 +407,13 @@ Unit tests: [`tests/hermes_plugin/test_builtin_preload.py`](../tests/hermes_plug | File | Role | |------|------| -| [`builtin_preload.py`](../integrations/hermes/plugin/intentframe-gate/builtin_preload.py) | `GOVERNED_BUILTIN_MODULES` map + selective `importlib.import_module` | +| [`builtin_preload.py`](../integrations/hermes/plugin/intentframe-gate/builtin_preload.py) | Preload from yaml ``builtin_module`` + selective ``importlib.import_module`` | | [`schema.py`](../integrations/hermes/plugin/intentframe-gate/schema.py) | `inject_reason()` — terminal-specific reason text branch | | [`gate.py`](../integrations/hermes/plugin/intentframe-gate/gate.py) | Validate via adapter, strip `reason`, delegate | | [`registry_hook.py`](../integrations/hermes/plugin/intentframe-gate/registry_hook.py) | Patch `registry.register` for dynamic tools | -When adding a governed Hermes **builtin**, add its import module to -`GOVERNED_BUILTIN_MODULES` (see [`test_builtin_preload.py`](../tests/hermes_plugin/test_builtin_preload.py)).
+When adding a governed Hermes **builtin**, set ``builtin_module: tools.`` in the +repo catalog template (see [`test_builtin_preload.py`](../tests/hermes_plugin/test_builtin_preload.py)). --- @@ -416,7 +421,7 @@ When adding a governed Hermes **builtin**, add its import module to | Tool | Gateway E2E | Registration note | |------|-------------|-------------------| -| `terminal`, `process`, `write_file`, `patch` | Probed when in scoped yaml | Listed in `GOVERNED_BUILTIN_MODULES` — preload + snapshot | +| `terminal`, `process`, `write_file`, `patch`, `cronjob` | Probed when in scoped yaml | ``builtin_module`` in repo ``tools.yaml`` — preload + snapshot | Delete coverage uses `patch` V4A `*** Delete File:` ops (maps to `DELETE_HOST_FILE`). @@ -426,7 +431,7 @@ If a governed tool fails with “model never calls tool X”: 2. If X is on `/v1/toolsets` but **not** in the OpenAI Tools list, the registry / `get_definitions()` path dropped it (missing entry or failed `check_fn`). 3. Check plugin register logs for `wrapped` — empty means preload map may be missing X. -4. Add X to `GOVERNED_BUILTIN_MODULES` if Hermes registers it at module import time. +4. Set ``builtin_module: tools.`` in the repo catalog template if Hermes registers it at module import time. **Hermes-native long-term fix:** gateway could call `discover_builtin_tools()` before `discover_plugins()` (upstream). Until then, the plugin owns selective preload. 
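As a rough illustration of the selective preload these docs describe, and not the plugin's actual code, the sketch below picks the unique `builtin_module` values of enabled tools and imports each one once. `ToolSpec` here is a trimmed, hypothetical stand-in for the real dataclass, `modules_to_preload` and `preload` are names invented for this example, and the injectable `importer` parameter is an assumption added so the logic can run without Hermes installed. The sketch also folds the `tools.` prefix check into the preload step for brevity; the real loaders validate it when the yaml is parsed.

```python
from __future__ import annotations

import importlib
from dataclasses import dataclass
from typing import Callable

BUILTIN_MODULE_PREFIX = "tools."


@dataclass(frozen=True)
class ToolSpec:
    # Hypothetical, trimmed stand-in: only the two fields preload cares about.
    enabled: bool = True
    builtin_module: str | None = None


def modules_to_preload(catalog: dict[str, ToolSpec]) -> list[str]:
    """Return unique builtin_module values for enabled tools, in tool-name order."""
    seen: set[str] = set()
    ordered: list[str] = []
    for name in sorted(catalog):
        spec = catalog[name]
        module = spec.builtin_module
        # Disabled tools and tools without a builtin_module are never preloaded.
        if not spec.enabled or not module:
            continue
        if not module.startswith(BUILTIN_MODULE_PREFIX):
            raise ValueError(
                f"Tool {name!r} builtin_module {module!r} "
                f"must start with {BUILTIN_MODULE_PREFIX!r}"
            )
        # Several tool names may share one module; import it only once.
        if module not in seen:
            seen.add(module)
            ordered.append(module)
    return ordered


def preload(
    catalog: dict[str, ToolSpec],
    importer: Callable[[str], object] = importlib.import_module,
) -> list[str]:
    """Import each selected module once and return the module names imported."""
    modules = modules_to_preload(catalog)
    for module in modules:
        importer(module)
    return modules
```

Passing a recording callable such as `imported.append` as `importer` makes it possible to check which modules would be imported, and in what order, without touching the real `tools.*` packages.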
diff --git a/integrations/hermes/governance/README.md b/integrations/hermes/governance/README.md index e196c1f..07162e9 100644 --- a/integrations/hermes/governance/README.md +++ b/integrations/hermes/governance/README.md @@ -2,7 +2,7 @@ | File | Owner | Purpose | |------|-------|---------| -| `tools.yaml` (repo) | **Dev** | Tool catalog: names, mappers, action IDs, default `enabled` | +| `tools.yaml` (repo) | **Dev** | Tool catalog: names, mappers, action IDs, default `enabled`, `builtin_module` (Hermes preload import path) | | `tools.yaml` (runtime) | **User** | Same catalog; user toggles `enabled` via `governance enable\|disable` | | `generic_actions.manifest` (repo) | **Dev** | Static list of all `mapper: generic` action IDs (full catalog superset) | | `generic_actions.manifest` (runtime) | **Copied once** | Seeded on `integrate hermes`; never overwritten by automation | @@ -38,11 +38,32 @@ Verify after edits: ```bash uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_actions_manifest.py +uv run --package intentframe-integrations-cli python tests/hermes_plugin/test_gate.py +uv run --package intentframe-integrations-cli python tests/hermes_plugin/test_builtin_preload.py +uv run --package intentframe-integrations-cli python integrations/hermes/shared/tests/test_governance.py ``` +### `builtin_module` (preload map) + +Each governed Hermes builtin declares `builtin_module: tools.` in repo +`tools.yaml`. The intentframe-gate plugin imports unique modules for **enabled** +tools before registry snapshot (see `builtin_preload.py`). Values must start with +`tools.` — validated by both plugin and shared loaders. + +**Why yaml, not Python:** a hardcoded preload dict drifted from the catalog (e.g. +`cronjob` governed in yaml but easy to omit from code). Yaml is the single source; +`test_plugin_loader_matches_shared_template` asserts plugin/shared parity including +`builtin_module`. 
+ +**`cronjob` nuance:** preload registers the tool, but Hermes `get_tool_definitions()` +also applies `check_cronjob_requirements()` — requires `HERMES_GATEWAY_SESSION=1` +(or interactive/exec env). The toolsets schema probe sets session env to mirror the +gateway; see `tests/hermes_gateway/README.md` (Recent fixes). + ## Dev workflow (adding a generic tool) -1. Add entry to `tools.yaml` with `mapper: generic` and a `HERMES_*` action ID. +1. Add entry to `tools.yaml` with `mapper: generic`, a `HERMES_*` action ID, and + `builtin_module: tools.` when Hermes registers the tool at import time. 2. Regenerate committed `generic_actions.manifest` to include the new action ID (golden test `tests/intentframe_integrations/test_actions_manifest.py` enforces parity). 3. Update `agent.json` `action_types`, shipped `policy.yaml`, and `executor.yaml` diff --git a/integrations/hermes/governance/tools.yaml b/integrations/hermes/governance/tools.yaml index 129b8d5..2b133d9 100644 --- a/integrations/hermes/governance/tools.yaml +++ b/integrations/hermes/governance/tools.yaml @@ -4,6 +4,9 @@ # enabled: true → IntentFrame gates at runtime (plugin wrap + adapter validate). # enabled: false → spec kept in catalog; Hermes runs the tool ungoverned (no intent sent). # +# builtin_module → Hermes module that registers this tool at import time (dev-owned preload map). +# Must start with "tools."; plugin imports only for enabled tools before registry snapshot. +# # User control: intentframe-integrations governance enable|disable hermes # writes ONLY the runtime copy (~/.intentframe/.../governance/tools.yaml). # Restart Hermes gateway + adapter after toggling (governance is cached at process start). 
@@ -22,6 +25,7 @@ tools: risk: local_process mapper: terminal blocked_response: terminal_json + builtin_module: tools.terminal_tool process: enabled: true @@ -29,6 +33,7 @@ tools: risk: local_process mapper: process blocked_response: generic_json + builtin_module: tools.process_registry write_file: enabled: true @@ -36,6 +41,7 @@ tools: risk: local_write mapper: write_file blocked_response: generic_json + builtin_module: tools.file_tools patch: enabled: true @@ -44,6 +50,7 @@ tools: risk: local_write mapper: patch blocked_response: generic_json + builtin_module: tools.file_tools cronjob: enabled: true @@ -51,3 +58,4 @@ tools: risk: local_process mapper: generic blocked_response: generic_json + builtin_module: tools.cronjob_tools diff --git a/integrations/hermes/plugin/intentframe-gate/README.md b/integrations/hermes/plugin/intentframe-gate/README.md index 7048638..e766b46 100644 --- a/integrations/hermes/plugin/intentframe-gate/README.md +++ b/integrations/hermes/plugin/intentframe-gate/README.md @@ -44,13 +44,13 @@ At plugin load (`register()`): 3. Snapshot loop — wrap governed registry entries with `override=True` On gateway startup, plugins load **before** Hermes builtins. [`builtin_preload.py`](builtin_preload.py) -imports only modules in `GOVERNED_BUILTIN_MODULES` for **governed** tool names so -the snapshot loop can wrap them without calling full `discover_builtin_tools()` -(which would pull in extras like `read_terminal`). Details: -[`docs/hermes-plugin-registration-order.md`](../../../docs/hermes-plugin-registration-order.md). +imports ``builtin_module`` from each **enabled** governed tool in the dev-owned +``governance/tools.yaml`` so the snapshot loop can wrap them without calling full +``discover_builtin_tools()`` (which would pull in extras like ``read_terminal``). +Details: [`docs/hermes-plugin-registration-order.md`](../../../docs/hermes-plugin-registration-order.md). 
-When adding a governed Hermes builtin, add its import module to -`GOVERNED_BUILTIN_MODULES` and extend +When adding a governed Hermes builtin, set ``builtin_module: tools.`` in the +repo template and extend [`tests/hermes_plugin/test_builtin_preload.py`](../../../tests/hermes_plugin/test_builtin_preload.py). ## Env diff --git a/integrations/hermes/plugin/intentframe-gate/__init__.py b/integrations/hermes/plugin/intentframe-gate/__init__.py index d17b2d3..af12ae7 100644 --- a/integrations/hermes/plugin/intentframe-gate/__init__.py +++ b/integrations/hermes/plugin/intentframe-gate/__init__.py @@ -4,7 +4,7 @@ from .builtin_preload import preload_governed_builtins from .gate import wrap_handler -from .governance_loader import governed_tool_names +from .governance_loader import load_governed_tools from .registry_hook import install_registry_hook from .schema import inject_reason @@ -17,8 +17,9 @@ def register(ctx) -> None: install_registry_hook() - governed = governed_tool_names() - preload_governed_builtins(governed) + governed_tools = load_governed_tools() + governed = frozenset(governed_tools) + preload_governed_builtins(governed_tools) for entry in registry._snapshot_entries(): if entry.name not in governed: diff --git a/integrations/hermes/plugin/intentframe-gate/builtin_preload.py b/integrations/hermes/plugin/intentframe-gate/builtin_preload.py index 4d4525c..079b77a 100644 --- a/integrations/hermes/plugin/intentframe-gate/builtin_preload.py +++ b/integrations/hermes/plugin/intentframe-gate/builtin_preload.py @@ -4,24 +4,23 @@ import importlib import logging +from typing import TYPE_CHECKING + +if TYPE_CHECKING: + from .governance_loader import ToolSpec logger = logging.getLogger(__name__) -# Hermes 0.17 modules that register governed tool names at import time. -# Several catalog names may share one module (write_file + patch → file_tools). 
-GOVERNED_BUILTIN_MODULES: dict[str, str] = { - "terminal": "tools.terminal_tool", - "process": "tools.process_registry", - "write_file": "tools.file_tools", - "patch": "tools.file_tools", -} +def preload_governed_builtins(governed_tools: dict[str, "ToolSpec"]) -> None: + """Ensure governed Hermes builtins are registered before snapshot wrap. -def preload_governed_builtins(governed: frozenset[str]) -> None: - """Ensure governed Hermes builtins are registered before snapshot wrap.""" + Imports ``builtin_module`` from each enabled governed tool spec (yaml, dev-owned). + Disabled catalog entries are not in *governed_tools* and are never preloaded. + """ seen_modules: set[str] = set() - for tool_name in sorted(governed): - module_name = GOVERNED_BUILTIN_MODULES.get(tool_name) + for tool_name in sorted(governed_tools): + module_name = governed_tools[tool_name].builtin_module if not module_name or module_name in seen_modules: continue seen_modules.add(module_name) diff --git a/integrations/hermes/plugin/intentframe-gate/governance_loader.py b/integrations/hermes/plugin/intentframe-gate/governance_loader.py index a73d96c..0c8ff1d 100644 --- a/integrations/hermes/plugin/intentframe-gate/governance_loader.py +++ b/integrations/hermes/plugin/intentframe-gate/governance_loader.py @@ -12,6 +12,7 @@ VALID_BLOCKED_RESPONSES = frozenset({"terminal_json", "generic_json"}) VALID_MAPPER_KINDS = frozenset({"terminal", "process", "write_file", "patch", "generic"}) +BUILTIN_MODULE_PREFIX = "tools." @dataclass(frozen=True) @@ -23,6 +24,7 @@ class ToolSpec: blocked_response: str = "generic_json" actions: tuple[str, ...] 
= () enabled: bool = True + builtin_module: str | None = None def policy_actions(self) -> frozenset[str]: if self.actions: @@ -85,6 +87,20 @@ def _parse_enabled(raw: dict[str, Any]) -> bool: return enabled +def _parse_builtin_module(name: str, raw: dict[str, Any]) -> str | None: + value = raw.get("builtin_module") + if value is None: + return None + if not isinstance(value, str) or not value.strip(): + raise ValueError(f"Tool {name!r} builtin_module must be a non-empty string when present") + module = value.strip() + if not module.startswith(BUILTIN_MODULE_PREFIX): + raise ValueError( + f"Tool {name!r} builtin_module {module!r} must start with {BUILTIN_MODULE_PREFIX!r}" + ) + return module + + def _parse_tool(name: str, raw: dict[str, Any]) -> ToolSpec: action = str(raw.get("action", "")).strip() risk = str(raw.get("risk", "")).strip() @@ -104,6 +120,7 @@ def _parse_tool(name: str, raw: dict[str, Any]) -> ToolSpec: blocked_response=blocked_response, actions=_parse_actions(raw, action), enabled=_parse_enabled(raw), + builtin_module=_parse_builtin_module(name, raw), ) diff --git a/integrations/hermes/shared/src/hermes_governance/loader.py b/integrations/hermes/shared/src/hermes_governance/loader.py index 4ff8133..b8a5224 100644 --- a/integrations/hermes/shared/src/hermes_governance/loader.py +++ b/integrations/hermes/shared/src/hermes_governance/loader.py @@ -13,6 +13,7 @@ VALID_BLOCKED_RESPONSES = frozenset({"terminal_json", "generic_json"}) VALID_MAPPER_KINDS = frozenset({"terminal", "process", "write_file", "patch", "generic"}) +BUILTIN_MODULE_PREFIX = "tools." @dataclass(frozen=True) @@ -24,6 +25,7 @@ class ToolSpec: blocked_response: str = "generic_json" actions: tuple[str, ...] 
= () enabled: bool = True + builtin_module: str | None = None def policy_actions(self) -> frozenset[str]: if self.actions: @@ -94,6 +96,20 @@ def _parse_enabled(raw: dict[str, Any]) -> bool: return enabled +def _parse_builtin_module(name: str, raw: dict[str, Any]) -> str | None: + value = raw.get("builtin_module") + if value is None: + return None + if not isinstance(value, str) or not value.strip(): + raise ValueError(f"Tool {name!r} builtin_module must be a non-empty string when present") + module = value.strip() + if not module.startswith(BUILTIN_MODULE_PREFIX): + raise ValueError( + f"Tool {name!r} builtin_module {module!r} must start with {BUILTIN_MODULE_PREFIX!r}" + ) + return module + + def _parse_tool(name: str, raw: dict[str, Any]) -> ToolSpec: if not isinstance(raw, dict): raise ValueError(f"Tool {name!r} must be a mapping") @@ -129,6 +145,7 @@ def _parse_tool(name: str, raw: dict[str, Any]) -> ToolSpec: blocked_response=blocked_response, actions=_parse_actions(raw, primary_action), enabled=_parse_enabled(raw), + builtin_module=_parse_builtin_module(name, raw), ) diff --git a/integrations/hermes/shared/tests/test_governance.py b/integrations/hermes/shared/tests/test_governance.py index 7dac025..eacafe9 100644 --- a/integrations/hermes/shared/tests/test_governance.py +++ b/integrations/hermes/shared/tests/test_governance.py @@ -41,6 +41,41 @@ def test_load_governed_tools(self) -> None: self.assertEqual(frozenset(catalog), template_catalog_tool_names()) self.assertEqual(frozenset(tools), template_governed_tool_names()) + def test_builtin_module_on_catalog_tools(self) -> None: + from hermes_governance.loader import load_tool_catalog + + with governance_env(): + catalog = load_tool_catalog() + self.assertEqual(catalog["terminal"].builtin_module, "tools.terminal_tool") + self.assertEqual(catalog["process"].builtin_module, "tools.process_registry") + self.assertEqual(catalog["write_file"].builtin_module, "tools.file_tools") + 
self.assertEqual(catalog["patch"].builtin_module, "tools.file_tools") + self.assertEqual(catalog["cronjob"].builtin_module, "tools.cronjob_tools") + + def test_invalid_builtin_module_prefix_raises(self) -> None: + from hermes_governance.loader import load_tool_catalog + + yaml_text = """ +tools: + terminal: + enabled: true + action: RUN_COMMAND + risk: local_process + mapper: terminal + builtin_module: os.path +""" + with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as handle: + handle.write(yaml_text) + path = handle.name + + try: + load_tool_catalog.cache_clear() + with self.assertRaises(ValueError): + load_tool_catalog(path) + finally: + load_tool_catalog.cache_clear() + Path(path).unlink(missing_ok=True) + def test_generic_mapper_action_ids(self) -> None: from hermes_governance.loader import generic_mapper_action_ids diff --git a/tests/hermes_gateway/README.md b/tests/hermes_gateway/README.md index bb2ac0c..8d17de1 100644 --- a/tests/hermes_gateway/README.md +++ b/tests/hermes_gateway/README.md @@ -197,6 +197,7 @@ Hermes gateway before trusting stale PID files. | Wrong governed set at runtime | Parent env not propagated to adapter/gateway | E2E `assert_governance_env_contract` failure; check `gateway start` stderr for `Hermes governance config:` line | | `patch replace ALLOW` fails Pass 2a (overwrite BLOCK) | Same marker/file reused across passes | Pass-unique marker + `seed_patch_replace_target` (see [Probe harness determinism](#probe-harness-determinism)) | | `patch replace BLOCK`: path under `/tmp/…` | LLM rewrote `/etc/…` after block | Explicit block prompt in `run_patch_replace_block_once` | +| Toolsets probe: `cronjob missing from get_tool_definitions()` | Probe subprocess lacks gateway session env | Toolsets live sets `HERMES_GATEWAY_SESSION=1` in probe env; see [Recent fixes](#recent-fixes-2026-06) under toolsets section | ## Related docs @@ -227,6 +228,8 @@ parity with CLI child env builders). 
`test_hermes_install.py` covers Lighter-weight than full gateway E2E: proves intentframe-gate changes appear on the **OpenAI upstream `tools=` payload**, not just Hermes config/listing surfaces. +Covers **all** governed catalog tools (including generic mappers like `cronjob`), not +only the native-mapper subset used for gateway E2E ALLOW/BLOCK probes. Entrypoint: `test_gateway_toolsets_live.py` Wrapper: `tests/scripts/test-hermes-gateway-toolsets.sh` @@ -242,7 +245,9 @@ POST /v1/responses → real chat.completions round-trip + request dump assertions → token usage > 0, governed tools + reason in tools= ``` -Hermes gateway starts with `HERMES_DUMP_REQUESTS=1`. One cheap `POST /v1/responses` +Hermes gateway starts with `HERMES_DUMP_REQUESTS=1`. The schema probe subprocess +sets `HERMES_GATEWAY_SESSION=1` so ``cronjob`` passes Hermes ``check_fn`` (same +as the running gateway). One cheap `POST /v1/responses` prompts the model to reply `OK` without calling tools (minimizes IntentFrame policy noise; still sends the full `tools=` list upstream). @@ -251,8 +256,8 @@ noise; still sends the full `tools=` list upstream). | Surface | What it proves | |---------|----------------| | `GET /v1/toolsets` | Hermes **config** tool names for api_server (e.g. ~31) | -| `probe_hermes_tool_schemas.py` | **Registry** schemas for native-mapper governed tools (`reason` + gate); generic tools skipped | -| Request dump + round-trip assert | **OpenAI `chat.completions` payload** — native governed tools with required `reason` in `tools=` | +| `probe_hermes_tool_schemas.py` | **Registry** schemas for all governed tools (`reason` + gate), including generic mappers | +| Request dump + round-trip assert | **OpenAI `chat.completions` payload** — all governed tools with required `reason` in `tools=` | The registry count and toolsets count differ by design — not every listed toolset name becomes a registry definition on the LLM path. 
See @@ -263,7 +268,7 @@ name becomes a registry definition on the LLM path. See | Helper | Checks | |--------|--------| | `assert_gateway_openai_roundtrip()` | Gateway `status: completed` and `usage.total_tokens > 0` | -| `assert_provider_tools_surface()` | Native-mapper governed tools in dump `request.body.tools` with required `reason` | +| `assert_provider_tools_surface()` | All governed catalog tools in dump `request.body.tools` with required `reason` | The request dump is written at **preflight** (before the HTTP call to OpenAI). Token usage from the gateway response proves the call **completed** — the dump alone only @@ -273,16 +278,19 @@ Contract helpers: `tests/hermes_gateway/provider_request_contract.py` ### Stderr on success -1. **OpenAI round-trip proof** — input/output/total tokens, provider URL/model -2. **Provider tools= snapshot** — dump path, sorted tool list, `[governed, reason_required=true]` markers +1. **OpenAI round-trip proof** — run marker, input/output/total tokens, provider URL/model +2. **Provider tools= snapshot** — run marker, dump path, sorted tool list, `[governed, reason_required=true]` markers ### Finding the call in OpenAI Platform -Shows as **Chat Completion** (`gpt-4o-mini`), not Responses API. Typical signature: +Each run prints a unique marker: ``intentframe-toolsets-``. Search Platform +Logs / Usage for that token in the user prompt or assistant reply to match this run. -- User prompt: `Reply with the single word OK. Do not call any tools.` -- System includes: `Automated integration test. Do not use tools.` -- ~11k input tokens, **17 tools** in the Tools list, output `OK` +Typical signature: + +- User prompt contains: ``Reply with exactly this single token ... 
intentframe-toolsets-...`` +- System includes: ``Automated IntentFrame toolsets integration test run_id=...`` +- ~11k input tokens, **17+ tools** in the Tools list, output is the marker token - No tool invocations (unlike full E2E entries that say “Call the terminal tool…”) Platform Logs list tool **names** but not JSON schema details (`reason` in `required`). @@ -294,3 +302,15 @@ RUN_HERMES_GATEWAY_TOOLSETS=1 ./tests/scripts/test-hermes-gateway-toolsets.sh The full gateway E2E also asserts `/v1/toolsets` before the LLM probes, but uses tool-calling prompts for ALLOW/BLOCK — a different OpenAI log signature. + +### Recent fixes (2026-06) + +These were gaps between production behavior and what the toolsets live harness asserted. + +| Bug | Symptom | Root cause | Fix | +|-----|---------|------------|-----| +| **Partial governed coverage** | Toolsets test skipped `cronjob` while production governs it | Probe and provider dump used `gateway_e2e_probe_tool_names()` (native E2E tier only) | Probe uses `governed_tool_names()`; live test asserts `template_governed_tool_names()` on the request dump | +| **Preload map drift** | Adding a governed builtin required editing a hardcoded Python dict | `GOVERNED_BUILTIN_MODULES` lived in `builtin_preload.py`, separate from `tools.yaml` | `builtin_module: tools.` per tool in repo `tools.yaml`; plugin preload imports enabled specs; shared + plugin loaders validate `tools.` prefix | +| **`cronjob` schema probe failure** | Probe reported `cronjob missing from get_tool_definitions()` despite yaml preload | Preload registered `cronjob`, but Hermes `check_cronjob_requirements()` filters it unless `HERMES_GATEWAY_SESSION=1` (or interactive/exec env); probe subprocess lacked that env while the gateway had it | `_run_schema_probe()` sets `probe_env["HERMES_GATEWAY_SESSION"] = "1"` to mirror the running gateway | + +Guarded by `test_governed_tool_coverage.py` (`test_toolsets_live_verifies_full_governed_catalog`) and loader parity in 
`tests/hermes_plugin/test_gate.py` (`builtin_module` must match between plugin and shared loaders). diff --git a/tests/hermes_gateway/probe_hermes_tool_schemas.py b/tests/hermes_gateway/probe_hermes_tool_schemas.py index c808028..ce15299 100644 --- a/tests/hermes_gateway/probe_hermes_tool_schemas.py +++ b/tests/hermes_gateway/probe_hermes_tool_schemas.py @@ -2,11 +2,12 @@ """Probe Hermes registry schemas after intentframe-gate (reason injection). Run inside the managed Hermes venv with HERMES_HOME set. Used by gateway -toolsets live test to verify **native-mapper** governed tools use Hermes names -and require ``reason`` in their JSON schema. +toolsets live test to verify **all** IntentFrame-governed tools (native and +generic mappers) use Hermes names, require ``reason`` in their JSON schema, +and have gated handlers. -Generic-mapper tools (e.g. ``cronjob``) are governed at runtime but excluded -here — same two-tier contract as gateway E2E (live semantic smoke only). +Requires ``HERMES_GATEWAY_SESSION=1`` in the probe environment (set by the +toolsets live harness) so ``cronjob`` passes Hermes ``check_fn`` filtering. 
""" from __future__ import annotations @@ -25,7 +26,6 @@ sys.path.insert(0, str(PLUGIN_SRC)) from governance_loader import governed_tool_names # type: ignore # noqa: E402 -from hermes_governance_fixtures import gateway_e2e_probe_tool_names # noqa: E402 def main() -> int: @@ -52,9 +52,7 @@ def main() -> int: ) definitions = get_tool_definitions(enabled_toolsets=enabled_toolsets, quiet_mode=True) - governed = governed_tool_names() - probe_targets = gateway_e2e_probe_tool_names() & governed - skipped_generic = sorted(governed - probe_targets) + probe_targets = governed_tool_names() by_name: dict[str, dict] = {} for item in definitions: fn = item.get("function") @@ -69,7 +67,6 @@ def main() -> int: "enabled_toolset_count": len(enabled_toolsets), "definition_count": len(definitions), "governed_tools": {}, - "skipped_generic_governed": skipped_generic, "distractors": {}, } diff --git a/tests/hermes_gateway/provider_request_contract.py b/tests/hermes_gateway/provider_request_contract.py index 6dd1fe0..e1b72f5 100644 --- a/tests/hermes_gateway/provider_request_contract.py +++ b/tests/hermes_gateway/provider_request_contract.py @@ -65,6 +65,86 @@ def load_newest_request_dump_body( return body +TOOLSETS_RUN_MARKER_PREFIX = "intentframe-toolsets" + + +def toolsets_run_marker(run_id: str) -> str: + """Unique token for one toolsets live test run (OpenAI logs + request dumps).""" + return f"{TOOLSETS_RUN_MARKER_PREFIX}-{run_id}" + + +def toolsets_llm_prompt(marker: str) -> str: + return ( + f"Reply with exactly this single token and nothing else: {marker}. " + "Do not call any tools." + ) + + +def toolsets_llm_instructions(marker: str) -> str: + return ( + f"Automated IntentFrame toolsets integration test run_id={marker}. " + "Do not use tools. Your entire reply must be exactly the marker token." 
+ ) + + +def extract_gateway_text_output(gateway_body: dict[str, Any]) -> str: + """Concatenate assistant text from Hermes ``POST /v1/responses`` output items.""" + parts: list[str] = [] + output = gateway_body.get("output") + if not isinstance(output, list): + return "" + for item in output: + if not isinstance(item, dict): + continue + content = item.get("content") + if isinstance(content, list): + for block in content: + if isinstance(block, dict): + text = block.get("text") + if isinstance(text, str): + parts.append(text) + text = item.get("text") + if isinstance(text, str): + parts.append(text) + return " ".join(parts).strip() + + +def assert_gateway_response_contains_marker(gateway_body: dict[str, Any], marker: str) -> str: + """Assert the LLM round-trip echoed the run marker in gateway output.""" + text = extract_gateway_text_output(gateway_body) + if marker not in text: + raise AssertionError( + f"Gateway response missing run marker {marker!r}.\n" + f" extracted_text={text!r}\n" + f" body: {json.dumps(gateway_body)[:2000]}" + ) + return text + + +def extract_provider_user_message(body: dict[str, Any]) -> str | None: + """Best-effort user message from a chat.completions request dump body.""" + messages = body.get("messages") + if not isinstance(messages, list): + return None + for msg in reversed(messages): + if not isinstance(msg, dict) or msg.get("role") != "user": + continue + content = msg.get("content") + if isinstance(content, str): + return content + return None + + +def assert_provider_request_contains_marker(body: dict[str, Any], marker: str) -> None: + """Assert the OpenAI upstream payload included the run marker in the user message.""" + user = extract_provider_user_message(body) + if user is None or marker not in user: + raise AssertionError( + f"Provider request dump missing run marker {marker!r}.\n" + f" user_message={user!r}" + ) + + def tool_reason_required(fn: dict[str, Any]) -> bool: """Return True when ``reason`` is a required parameter in 
a function schema.""" params = fn.get("parameters") @@ -191,6 +271,8 @@ def format_gateway_roundtrip_snapshot( *, provider_url: str | None = None, expected_model: str | None = None, + run_marker: str | None = None, + gateway_output_text: str | None = None, ) -> str: """Human-readable proof that Hermes completed an OpenAI chat.completions call.""" usage = extract_gateway_usage(gateway_body) @@ -198,11 +280,19 @@ def format_gateway_roundtrip_snapshot( output_items = output if isinstance(output, list) else [] lines = [ "OpenAI round-trip completed (via Hermes gateway):", - f" gateway_status={gateway_body.get('status')!r}", - f" provider_url={provider_url!r}", ] + if run_marker is not None: + lines.append(f" run_marker={run_marker!r}") + lines.extend( + [ + f" gateway_status={gateway_body.get('status')!r}", + f" provider_url={provider_url!r}", + ] + ) if expected_model is not None: lines.append(f" provider_model={expected_model!r}") + if gateway_output_text is not None: + lines.append(f" gateway_output_text={gateway_output_text!r}") lines.extend( [ f" input_tokens={usage.get('input_tokens', 0)}", @@ -224,6 +314,7 @@ def format_provider_tools_snapshot( governed: frozenset[str], *, dump_path: Path | None = None, + run_marker: str | None = None, ) -> str: """Human-readable summary of ``tools=`` sent to the OpenAI provider.""" by_name = parse_provider_tools(body) @@ -232,6 +323,8 @@ def format_provider_tools_snapshot( f"model={model!r}", f"tool_count={len(by_name)}", ] + if run_marker is not None: + lines.append(f"run_marker={run_marker!r}") if dump_path is not None: lines.append(f"request_dump={dump_path}") lines.append("") diff --git a/tests/hermes_gateway/test_gateway_toolsets_live.py b/tests/hermes_gateway/test_gateway_toolsets_live.py index 0526f93..123b7e5 100644 --- a/tests/hermes_gateway/test_gateway_toolsets_live.py +++ b/tests/hermes_gateway/test_gateway_toolsets_live.py @@ -1,5 +1,12 @@ #!/usr/bin/env python3 -"""Live gateway test: toolsets, schema probe, and 
provider tools= payload.""" +"""Live gateway test: toolsets, schema probe, and provider tools= payload. + +Asserts every IntentFrame-governed catalog tool (native mappers and generic mappers +such as ``cronjob``) on the OpenAI upstream ``tools=`` payload with required +``reason``. Differs from gateway E2E, which LLM-probes only the native-mapper tier. + +See ``tests/hermes_gateway/README.md`` (toolsets section) for flow and recent fixes. +""" from __future__ import annotations @@ -20,12 +27,17 @@ ) from provider_request_contract import ( # noqa: E402 assert_gateway_openai_roundtrip, + assert_gateway_response_contains_marker, + assert_provider_request_contains_marker, assert_provider_tools_surface, format_gateway_roundtrip_snapshot, format_provider_tools_snapshot, load_newest_request_dump, load_request_dump, request_dump_paths, + toolsets_llm_instructions, + toolsets_llm_prompt, + toolsets_run_marker, ) from cli_runner import CliError, format_diagnostics, run_cli, step, stop_everything # noqa: E402 from isolation import ( # noqa: E402 @@ -47,7 +59,7 @@ _TESTS_DIR = HERE.parent if str(_TESTS_DIR) not in sys.path: sys.path.insert(0, str(_TESTS_DIR)) -from hermes_governance_fixtures import gateway_e2e_probe_tool_names # noqa: E402 +from hermes_governance_fixtures import template_governed_tool_names # noqa: E402 API_HOST = "127.0.0.1" INSTALL_TIMEOUT = 600.0 @@ -73,9 +85,12 @@ def _run_schema_probe(env: IsolatedEnv) -> None: raise AssertionError(f"Probe script missing: {PROBE_SCRIPT}") step("Probe Hermes registry schemas (reason injection + gate markers)") + probe_env = os.environ.copy() + # Mirror gateway process: cronjob check_fn requires HERMES_GATEWAY_SESSION. 
+ probe_env["HERMES_GATEWAY_SESSION"] = "1" result = subprocess.run( [str(python), str(PROBE_SCRIPT)], - env=os.environ.copy(), + env=probe_env, capture_output=True, text=True, timeout=120.0, @@ -98,6 +113,9 @@ def main() -> int: try: env = create_isolated_env() + run_marker = toolsets_run_marker(env.run_id) + step(f"Run marker (OpenAI log correlation): {run_marker}") + print(f"\n==> Toolsets live run marker: {run_marker}", file=sys.stderr) step(f"Activating sandbox HOME={env.home} HERMES_HOME={env.hermes_home}") activate(env) step(f"Seeding Hermes OpenAI provider (model={_e2e_openai_model()})") @@ -147,16 +165,17 @@ def main() -> int: _run_schema_probe(env) existing_dumps = frozenset(request_dump_paths(env.hermes_home)) - step("POST /v1/responses (capture provider tools= for OpenAI)") + step(f"POST /v1/responses (capture provider tools= for OpenAI, marker={run_marker})") responses_body = post_responses( host=API_HOST, port=env.api_port, api_key=env.api_key, - prompt="Reply with the single word OK. Do not call any tools.", - instructions="Automated integration test. 
Do not use tools.", + prompt=toolsets_llm_prompt(run_marker), + instructions=toolsets_llm_instructions(run_marker), ) assert_gateway_openai_roundtrip(responses_body) - governed = gateway_e2e_probe_tool_names() + gateway_output_text = assert_gateway_response_contains_marker(responses_body, run_marker) + governed = template_governed_tool_names() dump_path, provider_body = load_newest_request_dump( env.hermes_home, existing=existing_dumps, @@ -164,6 +183,7 @@ def main() -> int: dump_raw = load_request_dump(dump_path) request_meta = dump_raw.get("request") request_meta_dict = request_meta if isinstance(request_meta, dict) else {} + assert_provider_request_contains_marker(provider_body, run_marker) assert_provider_tools_surface( provider_body, governed, @@ -178,6 +198,8 @@ def main() -> int: responses_body, provider_url=provider_url, expected_model=_e2e_openai_model(), + run_marker=run_marker, + gateway_output_text=gateway_output_text, ), file=sys.stderr, ) @@ -187,13 +209,17 @@ def main() -> int: provider_body, governed, dump_path=dump_path, + run_marker=run_marker, ), file=sys.stderr, ) assert_real_state_untouched(env) exit_code = 0 - print("\n==> Hermes gateway toolsets live test passed", file=sys.stderr) + print( + f"\n==> Hermes gateway toolsets live test passed (run_marker={run_marker})", + file=sys.stderr, + ) except (CliError, AssertionError, TimeoutError, RuntimeError, subprocess.TimeoutExpired) as exc: print(f"\nERROR: {exc}", file=sys.stderr) if env is not None: diff --git a/tests/hermes_gateway/test_governed_tool_coverage.py b/tests/hermes_gateway/test_governed_tool_coverage.py index e031b08..fe9af38 100644 --- a/tests/hermes_gateway/test_governed_tool_coverage.py +++ b/tests/hermes_gateway/test_governed_tool_coverage.py @@ -67,23 +67,17 @@ def test_live_plugin_gate_covers_all_catalog_tools(self) -> None: for fixture in LIVE_PLUGIN_EXTRA_FIXTURES: self.assertIn(fixture, source) - def test_toolsets_live_uses_native_gateway_probe_tier_only(self) -> None: - 
native = gateway_e2e_probe_tool_names() - generic = live_semantic_probe_tool_names() + def test_toolsets_live_verifies_full_governed_catalog(self) -> None: + """Toolsets live test asserts plugin changes for all governed tools, not native E2E tier only.""" probe = (GATEWAY_DIR / "probe_hermes_tool_schemas.py").read_text(encoding="utf-8") toolsets_live = (GATEWAY_DIR / "test_gateway_toolsets_live.py").read_text( encoding="utf-8" ) - self.assertIn("gateway_e2e_probe_tool_names", probe) - self.assertIn("gateway_e2e_probe_tool_names", toolsets_live) - self.assertNotIn("template_governed_tool_names", toolsets_live) - for tool in generic: - self.assertNotIn( - f'"{tool}"', - probe, - msg=f"schema probe must not hardcode generic tool {tool!r}", - ) - self.assertFalse(native & generic) + self.assertIn("governed_tool_names()", probe) + self.assertNotIn("gateway_e2e_probe_tool_names", probe) + self.assertIn("template_governed_tool_names", toolsets_live) + self.assertNotIn("gateway_e2e_probe_tool_names", toolsets_live) + self.assertNotIn("skipped_generic_governed", probe) def main() -> int: diff --git a/tests/hermes_gateway/test_provider_request_contract.py b/tests/hermes_gateway/test_provider_request_contract.py index 864b562..92e89ff 100644 --- a/tests/hermes_gateway/test_provider_request_contract.py +++ b/tests/hermes_gateway/test_provider_request_contract.py @@ -15,7 +15,10 @@ from provider_request_contract import ( # noqa: E402 assert_gateway_openai_roundtrip, + assert_gateway_response_contains_marker, + assert_provider_request_contains_marker, assert_provider_tools_surface, + extract_gateway_text_output, format_gateway_roundtrip_snapshot, format_provider_tools_snapshot, load_newest_request_dump, @@ -23,6 +26,9 @@ load_request_dump_body, parse_provider_tools, tool_reason_required, + toolsets_llm_instructions, + toolsets_llm_prompt, + toolsets_run_marker, ) @@ -123,16 +129,39 @@ def test_format_gateway_roundtrip_snapshot(self) -> None: body = { "status": "completed", 
"usage": {"input_tokens": 11655, "output_tokens": 2, "total_tokens": 11657}, - "output": [{"type": "message", "content": [{"text": "OK"}]}], + "output": [{"type": "message", "content": [{"text": "intentframe-toolsets-abc"}]}], } + marker = toolsets_run_marker("abc") text = format_gateway_roundtrip_snapshot( body, provider_url="https://api.openai.com/v1/chat/completions", expected_model="gpt-4o-mini", + run_marker=marker, + gateway_output_text=marker, ) + self.assertIn("run_marker='intentframe-toolsets-abc'", text) self.assertIn("total_tokens=11657", text) self.assertIn("provider_url='https://api.openai.com/v1/chat/completions'", text) + def test_toolsets_run_marker_helpers(self) -> None: + marker = toolsets_run_marker("deadbeef") + self.assertEqual(marker, "intentframe-toolsets-deadbeef") + self.assertIn(marker, toolsets_llm_prompt(marker)) + self.assertIn(marker, toolsets_llm_instructions(marker)) + + def test_assert_gateway_response_contains_marker(self) -> None: + marker = toolsets_run_marker("abc") + body = { + "output": [{"type": "message", "content": [{"text": marker}]}], + } + self.assertEqual(assert_gateway_response_contains_marker(body, marker), marker) + self.assertEqual(extract_gateway_text_output(body), marker) + + def test_assert_provider_request_contains_marker(self) -> None: + marker = toolsets_run_marker("abc") + body = {"messages": [{"role": "user", "content": toolsets_llm_prompt(marker)}]} + assert_provider_request_contains_marker(body, marker) + def test_format_provider_tools_snapshot(self) -> None: body = { "model": "gpt-4o-mini", diff --git a/tests/hermes_plugin/test_builtin_preload.py b/tests/hermes_plugin/test_builtin_preload.py index 28a521e..a70a743 100644 --- a/tests/hermes_plugin/test_builtin_preload.py +++ b/tests/hermes_plugin/test_builtin_preload.py @@ -6,6 +6,7 @@ import sys import unittest from pathlib import Path +from types import SimpleNamespace from unittest import mock TESTS_DIR = Path(__file__).resolve().parent @@ -17,20 +18,35 
@@ preload_mod = load_plugin_module("builtin_preload") +def _spec(module: str | None) -> SimpleNamespace: + return SimpleNamespace(builtin_module=module) + + class PreloadGovernedBuiltinsTests(unittest.TestCase): - def test_imports_unique_modules_for_governed_tools(self) -> None: - governed = frozenset({"terminal", "write_file", "patch"}) + def test_imports_unique_modules_from_governed_specs(self) -> None: + governed_tools = { + "terminal": _spec("tools.terminal_tool"), + "write_file": _spec("tools.file_tools"), + "patch": _spec("tools.file_tools"), + } with mock.patch.object(preload_mod.importlib, "import_module") as import_module: - preload_mod.preload_governed_builtins(governed) + preload_mod.preload_governed_builtins(governed_tools) import_module.assert_any_call("tools.terminal_tool") import_module.assert_any_call("tools.file_tools") self.assertEqual(import_module.call_count, 2) - def test_skips_unknown_governed_tools(self) -> None: - governed = frozenset({"unknown_future_tool"}) + def test_imports_cronjob_module_when_governed(self) -> None: + governed_tools = {"cronjob": _spec("tools.cronjob_tools")} + with mock.patch.object(preload_mod.importlib, "import_module") as import_module: + preload_mod.preload_governed_builtins(governed_tools) + + import_module.assert_called_once_with("tools.cronjob_tools") + + def test_skips_tools_without_builtin_module(self) -> None: + governed_tools = {"unknown_future_tool": _spec(None)} with mock.patch.object(preload_mod.importlib, "import_module") as import_module: - preload_mod.preload_governed_builtins(governed) + preload_mod.preload_governed_builtins(governed_tools) import_module.assert_not_called() diff --git a/tests/hermes_plugin/test_gate.py b/tests/hermes_plugin/test_gate.py index 5130ac6..5ff3868 100644 --- a/tests/hermes_plugin/test_gate.py +++ b/tests/hermes_plugin/test_gate.py @@ -80,9 +80,20 @@ def test_plugin_loader_matches_shared_template(self) -> None: ensure_shared_loader_importable() from 
hermes_governance.loader import load_governed_tools as shared_load_governed - plugin_names = frozenset(governance_mod.load_governed_tools().keys()) - shared_names = frozenset(shared_load_governed().keys()) - self.assertEqual(plugin_names, shared_names) + plugin_tools = governance_mod.load_governed_tools() + shared_tools = shared_load_governed() + self.assertEqual(frozenset(plugin_tools), frozenset(shared_tools)) + for name in plugin_tools: + self.assertEqual( + plugin_tools[name].builtin_module, + shared_tools[name].builtin_module, + msg=f"builtin_module mismatch for {name!r}", + ) + self.assertEqual( + plugin_tools[name].enabled, + shared_tools[name].enabled, + msg=f"enabled mismatch for {name!r}", + ) class TestGateToolCall(PluginGovernanceEnvMixin, unittest.TestCase): diff --git a/tests/scripts/e2e.sh b/tests/scripts/e2e.sh index 8b11311..516e75f 100755 --- a/tests/scripts/e2e.sh +++ b/tests/scripts/e2e.sh @@ -160,6 +160,7 @@ step "Integrations CLI unit tests" (cd "$REPO_ROOT" && uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_policy_contract.py) (cd "$REPO_ROOT" && uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_policy_manage.py) (cd "$REPO_ROOT" && uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_governance_runtime_contract.py) +(cd "$REPO_ROOT" && uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_make_catalog_yaml.py) (cd "$REPO_ROOT" && uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_scoped_governance_yaml.py) (cd "$REPO_ROOT" && uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_actions_manifest.py) diff --git a/tests/scripts/test-hermes-gateway-toolsets.sh b/tests/scripts/test-hermes-gateway-toolsets.sh index 26270a4..f897260 100755 --- a/tests/scripts/test-hermes-gateway-toolsets.sh +++ 
b/tests/scripts/test-hermes-gateway-toolsets.sh @@ -3,9 +3,11 @@ # # After integrate hermes: # 1. GET /v1/toolsets (config surface) -# 2. probe_hermes_tool_schemas.py (native governed registry + reason injection) +# 2. probe_hermes_tool_schemas.py (all governed registry tools + reason injection) # 3. POST /v1/responses + HERMES_DUMP_REQUESTS=1 (real chat.completions round-trip) -# 4. Assert token usage > 0 and native governed tools have required reason in tools= +# 4. Assert token usage > 0 and all governed catalog tools have required reason in tools= +# +# Schema probe sets HERMES_GATEWAY_SESSION=1 so cronjob passes Hermes check_fn (see README). # # RUN_HERMES_GATEWAY_TOOLSETS=1 ./tests/scripts/test-hermes-gateway-toolsets.sh #