diff --git a/docs/agent-tool-gating.md b/docs/agent-tool-gating.md
index 06d2bf6..71291c4 100644
--- a/docs/agent-tool-gating.md
+++ b/docs/agent-tool-gating.md
@@ -275,7 +275,7 @@ Wiring (schema/handler layers plus registration order):
The plugin loads the same contract as the adapter (`hermes-governance` / bundled
YAML). Adding a tool is mostly **config + mapper + policy**, plus an entry in
-`GOVERNED_BUILTIN_MODULES` when Hermes registers the tool at module import time.
+``builtin_module`` in the repo catalog template when Hermes registers the tool at module import time.
**History (before v1):** the first proof gated only `terminal` via
`intentframe-terminal`, which imported `tools.terminal_tool` at plugin load (early
diff --git a/docs/hermes-intentframe-integration-guide.md b/docs/hermes-intentframe-integration-guide.md
index 3bfda84..ba29b91 100644
--- a/docs/hermes-intentframe-integration-guide.md
+++ b/docs/hermes-intentframe-integration-guide.md
@@ -153,7 +153,9 @@ Names only — full JSON schemas are probed separately via ``probe_hermes_tool_s
**Rule:** when debugging “model never calls tool X”, verify X appears in the **OpenAI
Tools block** (request dump with `HERMES_DUMP_REQUESTS=1`, trace, or gateway logs),
-not only on `/v1/toolsets`. Automated check:
+not only on `/v1/toolsets`. For governed builtins, also check that the tool is preloaded
+(`builtin_module` set in `tools.yaml`) and that the environment satisfies Hermes's
+per-tool `check_fn` (e.g. `cronjob` needs `HERMES_GATEWAY_SESSION`). Automated check:
`RUN_HERMES_GATEWAY_TOOLSETS=1 ./tests/scripts/test-hermes-gateway-toolsets.sh`
(see [`tests/hermes_gateway/README.md`](../tests/hermes_gateway/README.md#toolsets--provider-payload-test-opt-in-networked-llm)).
@@ -271,19 +273,23 @@ unless explicitly added to the contract — govern by **tool name**, not toolset
### 1. Selective preload
-```12:29:integrations/hermes/plugin/intentframe-gate/builtin_preload.py
-GOVERNED_BUILTIN_MODULES: dict[str, str] = {
- "terminal": "tools.terminal_tool",
- "process": "tools.process_registry",
- "write_file": "tools.file_tools",
- "patch": "tools.file_tools",
-}
+Each catalog tool may declare ``builtin_module`` in the dev-owned repo
+``integrations/hermes/governance/tools.yaml`` (copied to runtime on integrate).
+The plugin imports those modules for **enabled** governed tools only:
-def preload_governed_builtins(governed: frozenset[str]) -> None:
- ...
- importlib.import_module(module_name)
+```yaml
+terminal:
+ enabled: true
+ builtin_module: tools.terminal_tool
+cronjob:
+ enabled: true
+ builtin_module: tools.cronjob_tools
```
+[`builtin_preload.py`](../integrations/hermes/plugin/intentframe-gate/builtin_preload.py)
+imports each unique module before the registry snapshot; the governance loaders reject
+any ``builtin_module`` that does not start with ``tools.``.
+
**Why not call `discover_builtin_tools()` in the plugin?**
Hermes discovers builtins by AST-scanning `tools/*.py` and importing every module
@@ -403,7 +409,7 @@ the modules first.
### Scrutinize import changes like API changes
-When editing `GOVERNED_BUILTIN_MODULES` or any plugin import:
+When editing ``builtin_module`` in the repo catalog template or any plugin import:
| Change | Risk |
|--------|------|
@@ -507,15 +513,18 @@ do not change manifest or policy files.
### Step 4 — Plugin preload (if Hermes builtin)
-If the tool is a Hermes built-in registered at import time, add to
-`GOVERNED_BUILTIN_MODULES`:
+If the tool is a Hermes built-in registered at import time, set in repo
+``integrations/hermes/governance/tools.yaml``:
-```python
-"my_tool": "tools.my_tool_module",
+```yaml
+my_tool:
+ enabled: true
+ builtin_module: tools.my_tool_module
```
If several catalog names share one module (like `write_file` + `patch` → `file_tools`),
-one import is enough — preload dedupes modules.
+one import is enough — preload dedupes modules. ``builtin_module`` must start with
+``tools.`` (validated at load time).
Delete coverage is via `patch` V4A `*** Delete File:` operations (maps to `DELETE_HOST_FILE`).
@@ -658,7 +667,7 @@ uv run --package intentframe-integrations-cli python tests/intentframe_integrati
uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_policy_manage.py
```
-Extend `test_builtin_preload.py` when adding `GOVERNED_BUILTIN_MODULES` entries.
+Extend `test_builtin_preload.py` when adding ``builtin_module`` entries to the catalog template.
### Layer 2 — Toolsets + OpenAI provider payload (networked LLM)
@@ -669,14 +678,14 @@ RUN_HERMES_GATEWAY_TOOLSETS=1 ./tests/scripts/test-hermes-gateway-toolsets.sh
Requires `OPENAI_API_KEY`. After `integrate hermes`:
1. `GET /v1/toolsets` — config tool name surface
-2. `probe_hermes_tool_schemas.py` — registry schemas (`reason_required`, gate markers)
+2. `probe_hermes_tool_schemas.py` — registry schemas for **all** governed catalog tools (`reason_required`, gate markers); probe env includes `HERMES_GATEWAY_SESSION=1` so `cronjob` passes Hermes `check_fn`
3. `POST /v1/responses` with `HERMES_DUMP_REQUESTS=1` — one real `chat.completions` call
-4. Assert token usage > 0 and governed tools have required `reason` in `request.body.tools`
+4. Assert token usage > 0 and **all** governed catalog tools have required `reason` in `request.body.tools`
-Asserts `terminal: ['process', 'terminal']` on toolsets and provider payload schema
-for governed tools. Lighter than full E2E (no tool-calling ALLOW/BLOCK probes).
+Lighter than full E2E (no tool-calling ALLOW/BLOCK probes). Covers generic mappers
+(e.g. `cronjob`) that gateway E2E omits from LLM probes.
-Details: [`tests/hermes_gateway/README.md`](../tests/hermes_gateway/README.md#toolsets--provider-payload-test-opt-in-networked-llm).
+Details and recent bug fixes: [`tests/hermes_gateway/README.md`](../tests/hermes_gateway/README.md#toolsets--provider-payload-test-opt-in-networked-llm).
### Layer 3 — Scoped gateway E2E (fast smoke)
diff --git a/docs/hermes-intentframe-state-report.md b/docs/hermes-intentframe-state-report.md
index 93efdaa..9f39eb3 100644
--- a/docs/hermes-intentframe-state-report.md
+++ b/docs/hermes-intentframe-state-report.md
@@ -1,6 +1,6 @@
# IntentFrame × Hermes integration — state report
-> Snapshot of the Hermes agent integration as of **2026-06-23**. For how-to and
+> Snapshot of the Hermes agent integration as of **2026-06-24**. For how-to and
> troubleshooting, see [`hermes-intentframe-integration-guide.md`](./hermes-intentframe-integration-guide.md).
---
@@ -45,7 +45,7 @@ LLM (POST /v1/responses)
| Path | Purpose |
|------|---------|
-| `integrations/hermes/governance/tools.yaml` | Default governed-tool **template** (4 entries) |
+| `integrations/hermes/governance/tools.yaml` | Default governed-tool **template** (5 entries) |
| `integrations/hermes/policy.yaml` | Shipped policy **template** (RUN_COMMAND + host-file + deletion) |
| `~/.intentframe/integrations/hermes/governance/tools.yaml` | Runtime governed-tool config (user-owned) |
| `~/.intentframe/integrations/hermes/policy.yaml` | Runtime policy config (user-owned) |
@@ -89,12 +89,16 @@ At `register()`:
1. **`install_registry_hook()`** — wrap tools registered later (e.g. MCP refresh).
2. **`preload_governed_builtins(governed)`** — selective import from
- `GOVERNED_BUILTIN_MODULES` in [`builtin_preload.py`](../integrations/hermes/plugin/intentframe-gate/builtin_preload.py):
+ each enabled tool's ``builtin_module`` (declared in repo ``tools.yaml``), via [`builtin_preload.py`](../integrations/hermes/plugin/intentframe-gate/builtin_preload.py):
- `terminal` → `tools.terminal_tool`
- `process` → `tools.process_registry`
- `write_file`, `patch` → `tools.file_tools`
+ - `cronjob` → `tools.cronjob_tools`
3. **Snapshot loop** — wrap governed entries with `inject_reason()` + `gate_tool_call()`.
+`cronjob` also requires `HERMES_GATEWAY_SESSION=1` (or interactive/exec env) to pass
+Hermes `check_fn` filtering in `get_tool_definitions()` — preload alone is not enough.
+
See [`hermes-plugin-registration-order.md`](./hermes-plugin-registration-order.md) for
load-order evidence and bisect notes.
@@ -136,7 +140,7 @@ or restore defaults. Policy commands apply `agent.json` env via `load_and_activa
| Layer | Entry | LLM / network |
|-------|-------|---------------|
| Unit | `tests/hermes_plugin/`, `tests/hermes_gateway/test_*.py`, adapter tests, `test_policy_manage.py`, `test_integration_pack.py` | No |
-| Toolsets + provider payload | `RUN_HERMES_GATEWAY_TOOLSETS=1 ./tests/scripts/test-hermes-gateway-toolsets.sh` | OpenAI `chat.completions` (one round-trip); asserts `tools=` + `reason` in request dump |
+| Toolsets + provider payload | `RUN_HERMES_GATEWAY_TOOLSETS=1 ./tests/scripts/test-hermes-gateway-toolsets.sh` | OpenAI `chat.completions` (one round-trip); asserts **all** governed tools + `reason` in request dump |
| Live integration | `./tests/scripts/test-hermes-integration.sh` | Backend; policy reload smoke + adapter/plugin probes (no LLM) |
| Gateway E2E | `RUN_HERMES_GATEWAY_E2E=1 ./tests/scripts/test-hermes-gateway-e2e.sh` | OpenAI + full stack; native-mapper LLM probes only |
@@ -167,7 +171,7 @@ See [`tests/hermes_gateway/README.md`](../tests/hermes_gateway/README.md).
---
-## Recent changes (branch `fix-plugin-new-mechanism`)
+## Recent changes
| Change | Rationale |
|--------|-----------|
@@ -177,6 +181,11 @@ See [`tests/hermes_gateway/README.md`](../tests/hermes_gateway/README.md).
| Hardened block probe prompts | Fix LLM rewriting `/etc/` to sandbox paths |
| `load_and_activate_pack` + policy env parity | Policy validation sees same manifest env as backend boot |
| `cronjob` generic tool + two-tier probe contract | Live semantic smoke; no gateway LLM E2E for generic mappers |
+| **`builtin_module` in repo `tools.yaml`** | Replace hardcoded preload dict; single catalog source for Hermes import paths |
+| **Toolsets live: full governed catalog** | Probe/dump had used native E2E tier only — `cronjob` was skipped despite production governance |
+| **Toolsets probe: `HERMES_GATEWAY_SESSION=1`** | `cronjob` registered via preload but filtered by Hermes `check_fn` without gateway session env |
+| **Toolsets run marker** | Unique token per run for OpenAI Platform log correlation |
+| **Loader parity test (`builtin_module`)** | Plugin and shared governance loaders must agree on catalog shape |
---
diff --git a/docs/hermes-plugin-registration-order.md b/docs/hermes-plugin-registration-order.md
index d6ffb38..93bce65 100644
--- a/docs/hermes-plugin-registration-order.md
+++ b/docs/hermes-plugin-registration-order.md
@@ -66,25 +66,29 @@ def discover_builtin_tools(tools_dir: Optional[Path] = None) -> List[str]:
| **`intentframe-gate` (broken)** | Hook + snapshot only — **no preload** | **empty** → `wrapped = []` |
| **`intentframe-gate` (fixed)** | `preload_governed_builtins(governed)` + generic snapshot | governed names present |
-The fixed plugin restores the old **early-import effect** generically:
-
-```12:29:integrations/hermes/plugin/intentframe-gate/builtin_preload.py
-GOVERNED_BUILTIN_MODULES: dict[str, str] = {
- "terminal": "tools.terminal_tool",
- "process": "tools.process_registry",
- "write_file": "tools.file_tools",
- "patch": "tools.file_tools",
-}
-...
- importlib.import_module(module_name)
+The fixed plugin restores the old **early-import effect** generically from the dev-owned
+catalog template:
+
+```yaml
+# integrations/hermes/governance/tools.yaml (excerpt)
+terminal:
+ enabled: true
+ builtin_module: tools.terminal_tool
+cronjob:
+ enabled: true
+ builtin_module: tools.cronjob_tools
```
+[`builtin_preload.py`](../integrations/hermes/plugin/intentframe-gate/builtin_preload.py)
+imports each enabled tool's ``builtin_module`` (must start with ``tools.``) before snapshot.
+
Then the same wrap loop runs for every governed name — no terminal-specific
`_register_terminal_override()`:
```20:35:integrations/hermes/plugin/intentframe-gate/__init__.py
- governed = governed_tool_names()
- preload_governed_builtins(governed)
+ governed_tools = load_governed_tools()
+ governed = frozenset(governed_tools)
+ preload_governed_builtins(governed_tools)
for entry in registry._snapshot_entries():
if entry.name not in governed:
@@ -97,10 +101,11 @@ Then the same wrap loop runs for every governed name — no terminal-specific
### Why this matters for code review
-Treat changes to `GOVERNED_BUILTIN_MODULES` and any plugin `import` like **API
-surface changes**:
+Treat changes to ``builtin_module`` in the repo ``tools.yaml`` and any plugin ``import``
+like **API surface changes**:
-- Removing an import line can remove a tool from the OpenAI payload entirely.
+- Removing or omitting ``builtin_module`` can remove a tool from the OpenAI payload entirely.
+- Invalid ``builtin_module`` values are rejected (must start with ``tools.``).
- Adding `discover_builtin_tools()` can register unrelated tools (`read_terminal`).
- Unit test: [`tests/hermes_plugin/test_builtin_preload.py`](../tests/hermes_plugin/test_builtin_preload.py).
@@ -154,7 +159,7 @@ enough on the gateway path — governed Hermes builtins must be preloaded first.
```mermaid
flowchart TB
subgraph mechanisms["intentframe-gate registration"]
- A["1. Selective preload<br/>GOVERNED_BUILTIN_MODULES"]
+ A["1. Selective preload<br/>tools.yaml builtin_module"]
B["2. Snapshot loop<br/>registry._snapshot_entries()"]
C["3. Registry hook<br/>patch registry.register"]
end
@@ -183,8 +188,8 @@ flowchart TB
**Why not full `discover_builtin_tools()`?** It imports every builtin module.
That pulled in `read_terminal`, which Hermes then merged into the `terminal` toolset
and broke the E2E toolsets contract (`['process', 'terminal']` expected). Selective
-preload imports only modules listed in `GOVERNED_BUILTIN_MODULES` for names in the
-runtime governed set.
+preload imports ``builtin_module`` from each **enabled** governed tool in the dev-owned
+catalog template (copied to runtime on integrate).
---
@@ -305,7 +310,7 @@ registered a gated override — same **early import + wrap** effect as preload t
```python
install_registry_hook()
governed = governed_tool_names()
-preload_governed_builtins(governed) # GOVERNED_BUILTIN_MODULES
+preload_governed_builtins(governed_tools) # yaml builtin_module per enabled tool
for entry in registry._snapshot_entries():
if entry.name not in governed:
@@ -402,13 +407,13 @@ Unit tests: [`tests/hermes_plugin/test_builtin_preload.py`](../tests/hermes_plug
| File | Role |
|------|------|
-| [`builtin_preload.py`](../integrations/hermes/plugin/intentframe-gate/builtin_preload.py) | `GOVERNED_BUILTIN_MODULES` map + selective `importlib.import_module` |
+| [`builtin_preload.py`](../integrations/hermes/plugin/intentframe-gate/builtin_preload.py) | Preload from yaml ``builtin_module`` + selective ``importlib.import_module`` |
| [`schema.py`](../integrations/hermes/plugin/intentframe-gate/schema.py) | `inject_reason()` — terminal-specific reason text branch |
| [`gate.py`](../integrations/hermes/plugin/intentframe-gate/gate.py) | Validate via adapter, strip `reason`, delegate |
| [`registry_hook.py`](../integrations/hermes/plugin/intentframe-gate/registry_hook.py) | Patch `registry.register` for dynamic tools |
-When adding a governed Hermes **builtin**, add its import module to
-`GOVERNED_BUILTIN_MODULES` (see [`test_builtin_preload.py`](../tests/hermes_plugin/test_builtin_preload.py)).
+When adding a governed Hermes **builtin**, set ``builtin_module`` to its ``tools.``-prefixed
+import path in the repo catalog template (see [`test_builtin_preload.py`](../tests/hermes_plugin/test_builtin_preload.py)).
---
@@ -416,7 +421,7 @@ When adding a governed Hermes **builtin**, add its import module to
| Tool | Gateway E2E | Registration note |
|------|-------------|-------------------|
-| `terminal`, `process`, `write_file`, `patch` | Probed when in scoped yaml | Listed in `GOVERNED_BUILTIN_MODULES` — preload + snapshot |
+| `terminal`, `process`, `write_file`, `patch`, `cronjob` | Probed when in scoped yaml | ``builtin_module`` in repo ``tools.yaml`` — preload + snapshot |
Delete coverage uses `patch` V4A `*** Delete File:` ops (maps to `DELETE_HOST_FILE`).
@@ -426,7 +431,7 @@ If a governed tool fails with “model never calls tool X”:
2. If X is on `/v1/toolsets` but **not** in the OpenAI Tools list, the registry /
`get_definitions()` path dropped it (missing entry or failed `check_fn`).
3. Check plugin register logs for `wrapped` — empty means preload map may be missing X.
-4. Add X to `GOVERNED_BUILTIN_MODULES` if Hermes registers it at module import time.
+4. Set ``builtin_module`` (a ``tools.``-prefixed import path) for X in the repo catalog template if Hermes registers it at module import time.
**Hermes-native long-term fix:** gateway could call `discover_builtin_tools()` before
`discover_plugins()` (upstream). Until then, the plugin owns selective preload.
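
Reviewer note: the payload check in the troubleshooting steps above can be partly automated once the `tools` array from a request dump is in hand. A minimal sketch, assuming the standard OpenAI `chat.completions` function-tool shape (`{"type": "function", "function": {"name": ..., "parameters": ...}}`). The helper name `audit_tools_payload` is hypothetical, and locating or parsing the dump file is left out because the dump layout is not shown in this diff.

```python
from typing import Any


def audit_tools_payload(
    tools: list[dict[str, Any]], governed: frozenset[str]
) -> dict[str, list[str]]:
    """Report governed tools that are absent or lack a required `reason`."""
    by_name: dict[str, dict[str, Any]] = {}
    for tool in tools:
        fn = tool.get("function") or {}
        name = fn.get("name")
        if name:
            by_name[name] = fn

    # Governed names that never reached the provider payload.
    missing = sorted(governed - set(by_name))
    # Governed names present, but whose schema does not require `reason`.
    reason_not_required = sorted(
        name
        for name in governed & set(by_name)
        if "reason" not in (by_name[name].get("parameters") or {}).get("required", [])
    )
    return {"missing": missing, "reason_not_required": reason_not_required}


# Illustrative payload only; real schemas carry more fields.
example_tools = [
    {
        "type": "function",
        "function": {
            "name": "terminal",
            "parameters": {"type": "object", "required": ["command", "reason"]},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "cronjob",
            "parameters": {"type": "object", "required": ["action"]},
        },
    },
]
report = audit_tools_payload(
    example_tools, frozenset({"terminal", "process", "cronjob"})
)
```

A name under `missing` points at preload or `check_fn` filtering; a name under `reason_not_required` points at the wrap step not having run for that tool.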
diff --git a/integrations/hermes/governance/README.md b/integrations/hermes/governance/README.md
index e196c1f..07162e9 100644
--- a/integrations/hermes/governance/README.md
+++ b/integrations/hermes/governance/README.md
@@ -2,7 +2,7 @@
| File | Owner | Purpose |
|------|-------|---------|
-| `tools.yaml` (repo) | **Dev** | Tool catalog: names, mappers, action IDs, default `enabled` |
+| `tools.yaml` (repo) | **Dev** | Tool catalog: names, mappers, action IDs, default `enabled`, `builtin_module` (Hermes preload import path) |
| `tools.yaml` (runtime) | **User** | Same catalog; user toggles `enabled` via `governance enable\|disable` |
| `generic_actions.manifest` (repo) | **Dev** | Static list of all `mapper: generic` action IDs (full catalog superset) |
| `generic_actions.manifest` (runtime) | **Copied once** | Seeded on `integrate hermes`; never overwritten by automation |
@@ -38,11 +38,32 @@ Verify after edits:
```bash
uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_actions_manifest.py
+uv run --package intentframe-integrations-cli python tests/hermes_plugin/test_gate.py
+uv run --package intentframe-integrations-cli python tests/hermes_plugin/test_builtin_preload.py
+uv run --package intentframe-integrations-cli python integrations/hermes/shared/tests/test_governance.py
```
+### `builtin_module` (preload map)
+
+Each governed Hermes builtin declares a `builtin_module` import path in repo
+`tools.yaml`. The intentframe-gate plugin imports unique modules for **enabled**
+tools before registry snapshot (see `builtin_preload.py`). Values must start with
+`tools.` — validated by both plugin and shared loaders.
+
+**Why yaml, not Python:** a hardcoded preload dict drifted from the catalog (e.g.
+`cronjob` governed in yaml but easy to omit from code). Yaml is the single source;
+`test_plugin_loader_matches_shared_template` asserts plugin/shared parity including
+`builtin_module`.
+
+**`cronjob` nuance:** preload registers the tool, but Hermes `get_tool_definitions()`
+also applies `check_cronjob_requirements()` — requires `HERMES_GATEWAY_SESSION=1`
+(or interactive/exec env). The toolsets schema probe sets session env to mirror the
+gateway; see `tests/hermes_gateway/README.md` (Recent fixes).
+
## Dev workflow (adding a generic tool)
-1. Add entry to `tools.yaml` with `mapper: generic` and a `HERMES_*` action ID.
+1. Add an entry to `tools.yaml` with `mapper: generic`, a `HERMES_*` action ID, and
+ a `builtin_module` import path (prefixed `tools.`) when Hermes registers the tool at import time.
2. Regenerate committed `generic_actions.manifest` to include the new action ID
(golden test `tests/intentframe_integrations/test_actions_manifest.py` enforces parity).
3. Update `agent.json` `action_types`, shipped `policy.yaml`, and `executor.yaml`
diff --git a/integrations/hermes/governance/tools.yaml b/integrations/hermes/governance/tools.yaml
index 129b8d5..2b133d9 100644
--- a/integrations/hermes/governance/tools.yaml
+++ b/integrations/hermes/governance/tools.yaml
@@ -4,6 +4,9 @@
# enabled: true → IntentFrame gates at runtime (plugin wrap + adapter validate).
# enabled: false → spec kept in catalog; Hermes runs the tool ungoverned (no intent sent).
#
+# builtin_module → Hermes module that registers this tool at import time (dev-owned preload map).
+# Must start with "tools."; plugin imports only for enabled tools before registry snapshot.
+#
# User control: intentframe-integrations governance enable|disable hermes
# writes ONLY the runtime copy (~/.intentframe/.../governance/tools.yaml).
# Restart Hermes gateway + adapter after toggling (governance is cached at process start).
@@ -22,6 +25,7 @@ tools:
risk: local_process
mapper: terminal
blocked_response: terminal_json
+ builtin_module: tools.terminal_tool
process:
enabled: true
@@ -29,6 +33,7 @@ tools:
risk: local_process
mapper: process
blocked_response: generic_json
+ builtin_module: tools.process_registry
write_file:
enabled: true
@@ -36,6 +41,7 @@ tools:
risk: local_write
mapper: write_file
blocked_response: generic_json
+ builtin_module: tools.file_tools
patch:
enabled: true
@@ -44,6 +50,7 @@ tools:
risk: local_write
mapper: patch
blocked_response: generic_json
+ builtin_module: tools.file_tools
cronjob:
enabled: true
@@ -51,3 +58,4 @@ tools:
risk: local_process
mapper: generic
blocked_response: generic_json
+ builtin_module: tools.cronjob_tools
diff --git a/integrations/hermes/plugin/intentframe-gate/README.md b/integrations/hermes/plugin/intentframe-gate/README.md
index 7048638..e766b46 100644
--- a/integrations/hermes/plugin/intentframe-gate/README.md
+++ b/integrations/hermes/plugin/intentframe-gate/README.md
@@ -44,13 +44,13 @@ At plugin load (`register()`):
3. Snapshot loop — wrap governed registry entries with `override=True`
On gateway startup, plugins load **before** Hermes builtins. [`builtin_preload.py`](builtin_preload.py)
-imports only modules in `GOVERNED_BUILTIN_MODULES` for **governed** tool names so
-the snapshot loop can wrap them without calling full `discover_builtin_tools()`
-(which would pull in extras like `read_terminal`). Details:
-[`docs/hermes-plugin-registration-order.md`](../../../docs/hermes-plugin-registration-order.md).
+imports ``builtin_module`` from each **enabled** governed tool in the dev-owned
+``governance/tools.yaml`` so the snapshot loop can wrap them without calling full
+``discover_builtin_tools()`` (which would pull in extras like ``read_terminal``).
+Details: [`docs/hermes-plugin-registration-order.md`](../../../docs/hermes-plugin-registration-order.md).
-When adding a governed Hermes builtin, add its import module to
-`GOVERNED_BUILTIN_MODULES` and extend
+When adding a governed Hermes builtin, set ``builtin_module`` (a ``tools.``-prefixed
+import path) in the repo template and extend
[`tests/hermes_plugin/test_builtin_preload.py`](../../../tests/hermes_plugin/test_builtin_preload.py).
## Env
diff --git a/integrations/hermes/plugin/intentframe-gate/__init__.py b/integrations/hermes/plugin/intentframe-gate/__init__.py
index d17b2d3..af12ae7 100644
--- a/integrations/hermes/plugin/intentframe-gate/__init__.py
+++ b/integrations/hermes/plugin/intentframe-gate/__init__.py
@@ -4,7 +4,7 @@
from .builtin_preload import preload_governed_builtins
from .gate import wrap_handler
-from .governance_loader import governed_tool_names
+from .governance_loader import load_governed_tools
from .registry_hook import install_registry_hook
from .schema import inject_reason
@@ -17,8 +17,9 @@ def register(ctx) -> None:
install_registry_hook()
- governed = governed_tool_names()
- preload_governed_builtins(governed)
+ governed_tools = load_governed_tools()
+ governed = frozenset(governed_tools)
+ preload_governed_builtins(governed_tools)
for entry in registry._snapshot_entries():
if entry.name not in governed:
diff --git a/integrations/hermes/plugin/intentframe-gate/builtin_preload.py b/integrations/hermes/plugin/intentframe-gate/builtin_preload.py
index 4d4525c..079b77a 100644
--- a/integrations/hermes/plugin/intentframe-gate/builtin_preload.py
+++ b/integrations/hermes/plugin/intentframe-gate/builtin_preload.py
@@ -4,24 +4,23 @@
import importlib
import logging
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+ from .governance_loader import ToolSpec
logger = logging.getLogger(__name__)
-# Hermes 0.17 modules that register governed tool names at import time.
-# Several catalog names may share one module (write_file + patch → file_tools).
-GOVERNED_BUILTIN_MODULES: dict[str, str] = {
- "terminal": "tools.terminal_tool",
- "process": "tools.process_registry",
- "write_file": "tools.file_tools",
- "patch": "tools.file_tools",
-}
+def preload_governed_builtins(governed_tools: dict[str, "ToolSpec"]) -> None:
+ """Ensure governed Hermes builtins are registered before snapshot wrap.
-def preload_governed_builtins(governed: frozenset[str]) -> None:
- """Ensure governed Hermes builtins are registered before snapshot wrap."""
+ Imports ``builtin_module`` from each enabled governed tool spec (yaml, dev-owned).
+ Disabled catalog entries are not in *governed_tools* and are never preloaded.
+ """
seen_modules: set[str] = set()
- for tool_name in sorted(governed):
- module_name = GOVERNED_BUILTIN_MODULES.get(tool_name)
+ for tool_name in sorted(governed_tools):
+ module_name = governed_tools[tool_name].builtin_module
if not module_name or module_name in seen_modules:
continue
seen_modules.add(module_name)
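
Reviewer sketch (not the plugin's actual code): the dedupe-then-import behavior described in the docstring above can be exercised in isolation. `Spec`, `preload`, and the injectable `importer` parameter are hypothetical stand-ins so the example runs without Hermes installed; the real function operates on the loader's `ToolSpec` and imports Hermes `tools.*` modules.

```python
import importlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for ToolSpec; only the field used here."""

    builtin_module: Optional[str] = None


def preload(
    specs: dict[str, Spec],
    importer: Callable[[str], object] = importlib.import_module,
) -> list[str]:
    """Import each distinct builtin_module once, in sorted tool-name order."""
    seen: set[str] = set()
    imported: list[str] = []
    for tool_name in sorted(specs):
        module_name = specs[tool_name].builtin_module
        if not module_name or module_name in seen:
            continue  # nothing declared, or the module was already handled
        seen.add(module_name)
        importer(module_name)
        imported.append(module_name)
    return imported


calls: list[str] = []
order = preload(
    {
        "write_file": Spec("tools.file_tools"),
        "patch": Spec("tools.file_tools"),  # shares a module with write_file
        "terminal": Spec("tools.terminal_tool"),
        "mcp_tool": Spec(None),  # no preload declared
    },
    importer=calls.append,  # record module names instead of importing
)
```

Sorting by tool name keeps the import order deterministic across runs, which matters when import side effects register tools.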
diff --git a/integrations/hermes/plugin/intentframe-gate/governance_loader.py b/integrations/hermes/plugin/intentframe-gate/governance_loader.py
index a73d96c..0c8ff1d 100644
--- a/integrations/hermes/plugin/intentframe-gate/governance_loader.py
+++ b/integrations/hermes/plugin/intentframe-gate/governance_loader.py
@@ -12,6 +12,7 @@
VALID_BLOCKED_RESPONSES = frozenset({"terminal_json", "generic_json"})
VALID_MAPPER_KINDS = frozenset({"terminal", "process", "write_file", "patch", "generic"})
+BUILTIN_MODULE_PREFIX = "tools."
@dataclass(frozen=True)
@@ -23,6 +24,7 @@ class ToolSpec:
blocked_response: str = "generic_json"
actions: tuple[str, ...] = ()
enabled: bool = True
+ builtin_module: str | None = None
def policy_actions(self) -> frozenset[str]:
if self.actions:
@@ -85,6 +87,20 @@ def _parse_enabled(raw: dict[str, Any]) -> bool:
return enabled
+def _parse_builtin_module(name: str, raw: dict[str, Any]) -> str | None:
+ value = raw.get("builtin_module")
+ if value is None:
+ return None
+ if not isinstance(value, str) or not value.strip():
+ raise ValueError(f"Tool {name!r} builtin_module must be a non-empty string when present")
+ module = value.strip()
+ if not module.startswith(BUILTIN_MODULE_PREFIX):
+ raise ValueError(
+ f"Tool {name!r} builtin_module {module!r} must start with {BUILTIN_MODULE_PREFIX!r}"
+ )
+ return module
+
+
def _parse_tool(name: str, raw: dict[str, Any]) -> ToolSpec:
action = str(raw.get("action", "")).strip()
risk = str(raw.get("risk", "")).strip()
@@ -104,6 +120,7 @@ def _parse_tool(name: str, raw: dict[str, Any]) -> ToolSpec:
blocked_response=blocked_response,
actions=_parse_actions(raw, action),
enabled=_parse_enabled(raw),
+ builtin_module=_parse_builtin_module(name, raw),
)
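
Reviewer sketch: the cases below show what the new validation accepts and rejects. `parse_builtin_module` is a standalone copy of the rules added in this hunk, and `outcome` is a hypothetical helper that turns exceptions into strings for display.

```python
from typing import Any, Optional

BUILTIN_MODULE_PREFIX = "tools."


def parse_builtin_module(name: str, raw: dict[str, Any]) -> Optional[str]:
    """Standalone copy of the validation rules introduced in this diff."""
    value = raw.get("builtin_module")
    if value is None:
        return None
    if not isinstance(value, str) or not value.strip():
        raise ValueError(
            f"Tool {name!r} builtin_module must be a non-empty string when present"
        )
    module = value.strip()
    if not module.startswith(BUILTIN_MODULE_PREFIX):
        raise ValueError(
            f"Tool {name!r} builtin_module {module!r} must start with "
            f"{BUILTIN_MODULE_PREFIX!r}"
        )
    return module


def outcome(raw: dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: return the parsed value, or a 'rejected' message."""
    try:
        return parse_builtin_module("demo", raw)
    except ValueError as exc:
        return f"rejected: {exc}"


results = {
    "absent": outcome({}),
    "padded": outcome({"builtin_module": "  tools.file_tools  "}),
    "wrong_prefix": outcome({"builtin_module": "os.path"}),
    "blank": outcome({"builtin_module": "   "}),
    "non_string": outcome({"builtin_module": 42}),
    "bare_prefix": outcome({"builtin_module": "tools."}),
}
```

Note that a bare `tools.` passes the prefix check as written; it would only fail later, when the import is attempted. Whether that case deserves its own validation error is a judgment call for the authors.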
diff --git a/integrations/hermes/shared/src/hermes_governance/loader.py b/integrations/hermes/shared/src/hermes_governance/loader.py
index 4ff8133..b8a5224 100644
--- a/integrations/hermes/shared/src/hermes_governance/loader.py
+++ b/integrations/hermes/shared/src/hermes_governance/loader.py
@@ -13,6 +13,7 @@
VALID_BLOCKED_RESPONSES = frozenset({"terminal_json", "generic_json"})
VALID_MAPPER_KINDS = frozenset({"terminal", "process", "write_file", "patch", "generic"})
+BUILTIN_MODULE_PREFIX = "tools."
@dataclass(frozen=True)
@@ -24,6 +25,7 @@ class ToolSpec:
blocked_response: str = "generic_json"
actions: tuple[str, ...] = ()
enabled: bool = True
+ builtin_module: str | None = None
def policy_actions(self) -> frozenset[str]:
if self.actions:
@@ -94,6 +96,20 @@ def _parse_enabled(raw: dict[str, Any]) -> bool:
return enabled
+def _parse_builtin_module(name: str, raw: dict[str, Any]) -> str | None:
+ value = raw.get("builtin_module")
+ if value is None:
+ return None
+ if not isinstance(value, str) or not value.strip():
+ raise ValueError(f"Tool {name!r} builtin_module must be a non-empty string when present")
+ module = value.strip()
+ if not module.startswith(BUILTIN_MODULE_PREFIX):
+ raise ValueError(
+ f"Tool {name!r} builtin_module {module!r} must start with {BUILTIN_MODULE_PREFIX!r}"
+ )
+ return module
+
+
def _parse_tool(name: str, raw: dict[str, Any]) -> ToolSpec:
if not isinstance(raw, dict):
raise ValueError(f"Tool {name!r} must be a mapping")
@@ -129,6 +145,7 @@ def _parse_tool(name: str, raw: dict[str, Any]) -> ToolSpec:
blocked_response=blocked_response,
actions=_parse_actions(raw, primary_action),
enabled=_parse_enabled(raw),
+ builtin_module=_parse_builtin_module(name, raw),
)
diff --git a/integrations/hermes/shared/tests/test_governance.py b/integrations/hermes/shared/tests/test_governance.py
index 7dac025..eacafe9 100644
--- a/integrations/hermes/shared/tests/test_governance.py
+++ b/integrations/hermes/shared/tests/test_governance.py
@@ -41,6 +41,41 @@ def test_load_governed_tools(self) -> None:
self.assertEqual(frozenset(catalog), template_catalog_tool_names())
self.assertEqual(frozenset(tools), template_governed_tool_names())
+ def test_builtin_module_on_catalog_tools(self) -> None:
+ from hermes_governance.loader import load_tool_catalog
+
+ with governance_env():
+ catalog = load_tool_catalog()
+ self.assertEqual(catalog["terminal"].builtin_module, "tools.terminal_tool")
+ self.assertEqual(catalog["process"].builtin_module, "tools.process_registry")
+ self.assertEqual(catalog["write_file"].builtin_module, "tools.file_tools")
+ self.assertEqual(catalog["patch"].builtin_module, "tools.file_tools")
+ self.assertEqual(catalog["cronjob"].builtin_module, "tools.cronjob_tools")
+
+ def test_invalid_builtin_module_prefix_raises(self) -> None:
+ from hermes_governance.loader import load_tool_catalog
+
+ yaml_text = """
+tools:
+ terminal:
+ enabled: true
+ action: RUN_COMMAND
+ risk: local_process
+ mapper: terminal
+ builtin_module: os.path
+"""
+ with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as handle:
+ handle.write(yaml_text)
+ path = handle.name
+
+ try:
+ load_tool_catalog.cache_clear()
+ with self.assertRaises(ValueError):
+ load_tool_catalog(path)
+ finally:
+ load_tool_catalog.cache_clear()
+ Path(path).unlink(missing_ok=True)
+
def test_generic_mapper_action_ids(self) -> None:
from hermes_governance.loader import generic_mapper_action_ids
diff --git a/tests/hermes_gateway/README.md b/tests/hermes_gateway/README.md
index bb2ac0c..8d17de1 100644
--- a/tests/hermes_gateway/README.md
+++ b/tests/hermes_gateway/README.md
@@ -197,6 +197,7 @@ Hermes gateway before trusting stale PID files.
| Wrong governed set at runtime | Parent env not propagated to adapter/gateway | E2E `assert_governance_env_contract` failure; check `gateway start` stderr for `Hermes governance config:` line |
| `patch replace ALLOW` fails Pass 2a (overwrite BLOCK) | Same marker/file reused across passes | Pass-unique marker + `seed_patch_replace_target` (see [Probe harness determinism](#probe-harness-determinism)) |
| `patch replace BLOCK`: path under `/tmp/…` | LLM rewrote `/etc/…` after block | Explicit block prompt in `run_patch_replace_block_once` |
+| Toolsets probe: `cronjob missing from get_tool_definitions()` | Probe subprocess lacks gateway session env | Toolsets live sets `HERMES_GATEWAY_SESSION=1` in probe env; see [Recent fixes](#recent-fixes-2026-06) under toolsets section |
## Related docs
@@ -227,6 +228,8 @@ parity with CLI child env builders). `test_hermes_install.py` covers
Lighter-weight than full gateway E2E: proves intentframe-gate changes appear on the
**OpenAI upstream `tools=` payload**, not just Hermes config/listing surfaces.
+Covers **all** governed catalog tools (including generic mappers like `cronjob`), not
+only the native-mapper subset used for gateway E2E ALLOW/BLOCK probes.
Entrypoint: `test_gateway_toolsets_live.py`
Wrapper: `tests/scripts/test-hermes-gateway-toolsets.sh`
@@ -242,7 +245,9 @@ POST /v1/responses → real chat.completions round-trip + request dump
assertions → token usage > 0, governed tools + reason in tools=
```
-Hermes gateway starts with `HERMES_DUMP_REQUESTS=1`. One cheap `POST /v1/responses`
+Hermes gateway starts with `HERMES_DUMP_REQUESTS=1`. The schema probe subprocess
+sets `HERMES_GATEWAY_SESSION=1` so ``cronjob`` passes Hermes ``check_fn`` (same
+as the running gateway). One cheap `POST /v1/responses`
prompts the model to reply `OK` without calling tools (minimizes IntentFrame policy
noise; still sends the full `tools=` list upstream).
@@ -251,8 +256,8 @@ noise; still sends the full `tools=` list upstream).
| Surface | What it proves |
|---------|----------------|
| `GET /v1/toolsets` | Hermes **config** tool names for api_server (e.g. ~31) |
-| `probe_hermes_tool_schemas.py` | **Registry** schemas for native-mapper governed tools (`reason` + gate); generic tools skipped |
-| Request dump + round-trip assert | **OpenAI `chat.completions` payload** — native governed tools with required `reason` in `tools=` |
+| `probe_hermes_tool_schemas.py` | **Registry** schemas for all governed tools (`reason` + gate), including generic mappers |
+| Request dump + round-trip assert | **OpenAI `chat.completions` payload** — all governed tools with required `reason` in `tools=` |
The registry count and toolsets count differ by design — not every listed toolset
name becomes a registry definition on the LLM path. See
@@ -263,7 +268,7 @@ name becomes a registry definition on the LLM path. See
| Helper | Checks |
|--------|--------|
| `assert_gateway_openai_roundtrip()` | Gateway `status: completed` and `usage.total_tokens > 0` |
-| `assert_provider_tools_surface()` | Native-mapper governed tools in dump `request.body.tools` with required `reason` |
+| `assert_provider_tools_surface()` | All governed catalog tools in dump `request.body.tools` with required `reason` |
The request dump is written at **preflight** (before the HTTP call to OpenAI). Token
usage from the gateway response proves the call **completed** — the dump alone only
@@ -273,16 +278,19 @@ Contract helpers: `tests/hermes_gateway/provider_request_contract.py`
### Stderr on success
-1. **OpenAI round-trip proof** — input/output/total tokens, provider URL/model
-2. **Provider tools= snapshot** — dump path, sorted tool list, `[governed, reason_required=true]` markers
+1. **OpenAI round-trip proof** — run marker, input/output/total tokens, provider URL/model
+2. **Provider tools= snapshot** — run marker, dump path, sorted tool list, `[governed, reason_required=true]` markers
### Finding the call in OpenAI Platform
-Shows as **Chat Completion** (`gpt-4o-mini`), not Responses API. Typical signature:
+Each run prints a unique marker: ``intentframe-toolsets-<run_id>``. Search Platform
+Logs / Usage for that token in the user prompt or assistant reply to match this run.
-- User prompt: `Reply with the single word OK. Do not call any tools.`
-- System includes: `Automated integration test. Do not use tools.`
-- ~11k input tokens, **17 tools** in the Tools list, output `OK`
+Typical signature:
+
+- User prompt contains: ``Reply with exactly this single token ... intentframe-toolsets-...``
+- System includes: ``Automated IntentFrame toolsets integration test run_id=...``
+- ~11k input tokens, **17+ tools** in the Tools list, output is the marker token
- No tool invocations (unlike full E2E entries that say “Call the terminal tool…”)
Platform Logs list tool **names** but not JSON schema details (`reason` in `required`).
@@ -294,3 +302,15 @@ RUN_HERMES_GATEWAY_TOOLSETS=1 ./tests/scripts/test-hermes-gateway-toolsets.sh
The full gateway E2E also asserts `/v1/toolsets` before the LLM probes, but uses
tool-calling prompts for ALLOW/BLOCK — a different OpenAI log signature.
+
+### Recent fixes (2026-06)
+
+These were gaps between production behavior and what the toolsets live harness asserted.
+
+| Bug | Symptom | Root cause | Fix |
+|-----|---------|------------|-----|
+| **Partial governed coverage** | Toolsets test skipped `cronjob` while production governs it | Probe and provider dump used `gateway_e2e_probe_tool_names()` (native E2E tier only) | Probe uses `governed_tool_names()`; live test asserts `template_governed_tool_names()` on the request dump |
+| **Preload map drift** | Adding a governed builtin required editing a hardcoded Python dict | `GOVERNED_BUILTIN_MODULES` lived in `builtin_preload.py`, separate from `tools.yaml` | `builtin_module: tools.<module>` per tool in repo `tools.yaml`; plugin preload imports enabled specs; shared + plugin loaders validate the `tools.` prefix |
+| **`cronjob` schema probe failure** | Probe reported `cronjob missing from get_tool_definitions()` despite yaml preload | Preload registered `cronjob`, but Hermes `check_cronjob_requirements()` filters it unless `HERMES_GATEWAY_SESSION=1` (or interactive/exec env); probe subprocess lacked that env while the gateway had it | `_run_schema_probe()` sets `probe_env["HERMES_GATEWAY_SESSION"] = "1"` to mirror the running gateway |
+
+Guarded by `test_governed_tool_coverage.py` (`test_toolsets_live_verifies_full_governed_catalog`) and loader parity in `tests/hermes_plugin/test_gate.py` (`builtin_module` must match between plugin and shared loaders).
diff --git a/tests/hermes_gateway/probe_hermes_tool_schemas.py b/tests/hermes_gateway/probe_hermes_tool_schemas.py
index c808028..ce15299 100644
--- a/tests/hermes_gateway/probe_hermes_tool_schemas.py
+++ b/tests/hermes_gateway/probe_hermes_tool_schemas.py
@@ -2,11 +2,12 @@
"""Probe Hermes registry schemas after intentframe-gate (reason injection).
Run inside the managed Hermes venv with HERMES_HOME set. Used by gateway
-toolsets live test to verify **native-mapper** governed tools use Hermes names
-and require ``reason`` in their JSON schema.
+toolsets live test to verify **all** IntentFrame-governed tools (native and
+generic mappers) use Hermes names, require ``reason`` in their JSON schema,
+and have gated handlers.
-Generic-mapper tools (e.g. ``cronjob``) are governed at runtime but excluded
-here — same two-tier contract as gateway E2E (live semantic smoke only).
+Requires ``HERMES_GATEWAY_SESSION=1`` in the probe environment (set by the
+toolsets live harness) so ``cronjob`` passes Hermes ``check_fn`` filtering.
"""
from __future__ import annotations
@@ -25,7 +26,6 @@
sys.path.insert(0, str(PLUGIN_SRC))
from governance_loader import governed_tool_names # type: ignore # noqa: E402
-from hermes_governance_fixtures import gateway_e2e_probe_tool_names # noqa: E402
def main() -> int:
@@ -52,9 +52,7 @@ def main() -> int:
)
definitions = get_tool_definitions(enabled_toolsets=enabled_toolsets, quiet_mode=True)
- governed = governed_tool_names()
- probe_targets = gateway_e2e_probe_tool_names() & governed
- skipped_generic = sorted(governed - probe_targets)
+ probe_targets = governed_tool_names()
by_name: dict[str, dict] = {}
for item in definitions:
fn = item.get("function")
@@ -69,7 +67,6 @@ def main() -> int:
"enabled_toolset_count": len(enabled_toolsets),
"definition_count": len(definitions),
"governed_tools": {},
- "skipped_generic_governed": skipped_generic,
"distractors": {},
}
diff --git a/tests/hermes_gateway/provider_request_contract.py b/tests/hermes_gateway/provider_request_contract.py
index 6dd1fe0..e1b72f5 100644
--- a/tests/hermes_gateway/provider_request_contract.py
+++ b/tests/hermes_gateway/provider_request_contract.py
@@ -65,6 +65,86 @@ def load_newest_request_dump_body(
return body
+TOOLSETS_RUN_MARKER_PREFIX = "intentframe-toolsets"
+
+
+def toolsets_run_marker(run_id: str) -> str:
+ """Unique token for one toolsets live test run (OpenAI logs + request dumps)."""
+ return f"{TOOLSETS_RUN_MARKER_PREFIX}-{run_id}"
+
+
+def toolsets_llm_prompt(marker: str) -> str:
+ return (
+ f"Reply with exactly this single token and nothing else: {marker}. "
+ "Do not call any tools."
+ )
+
+
+def toolsets_llm_instructions(marker: str) -> str:
+ return (
+ f"Automated IntentFrame toolsets integration test run_id={marker}. "
+ "Do not use tools. Your entire reply must be exactly the marker token."
+ )
+
+
+def extract_gateway_text_output(gateway_body: dict[str, Any]) -> str:
+ """Concatenate assistant text from Hermes ``POST /v1/responses`` output items."""
+ parts: list[str] = []
+ output = gateway_body.get("output")
+ if not isinstance(output, list):
+ return ""
+ for item in output:
+ if not isinstance(item, dict):
+ continue
+ content = item.get("content")
+ if isinstance(content, list):
+ for block in content:
+ if isinstance(block, dict):
+ text = block.get("text")
+ if isinstance(text, str):
+ parts.append(text)
+ text = item.get("text")
+ if isinstance(text, str):
+ parts.append(text)
+ return " ".join(parts).strip()
+
+
+def assert_gateway_response_contains_marker(gateway_body: dict[str, Any], marker: str) -> str:
+ """Assert the LLM round-trip echoed the run marker in gateway output."""
+ text = extract_gateway_text_output(gateway_body)
+ if marker not in text:
+ raise AssertionError(
+ f"Gateway response missing run marker {marker!r}.\n"
+ f" extracted_text={text!r}\n"
+ f" body: {json.dumps(gateway_body)[:2000]}"
+ )
+ return text
+
+
+def extract_provider_user_message(body: dict[str, Any]) -> str | None:
+ """Best-effort user message from a chat.completions request dump body."""
+ messages = body.get("messages")
+ if not isinstance(messages, list):
+ return None
+ for msg in reversed(messages):
+ if not isinstance(msg, dict) or msg.get("role") != "user":
+ continue
+ content = msg.get("content")
+ if isinstance(content, str):
+ return content
+ return None
+
+
+def assert_provider_request_contains_marker(body: dict[str, Any], marker: str) -> None:
+ """Assert the OpenAI upstream payload included the run marker in the user message."""
+ user = extract_provider_user_message(body)
+ if user is None or marker not in user:
+ raise AssertionError(
+ f"Provider request dump missing run marker {marker!r}.\n"
+ f" user_message={user!r}"
+ )
+
+
def tool_reason_required(fn: dict[str, Any]) -> bool:
"""Return True when ``reason`` is a required parameter in a function schema."""
params = fn.get("parameters")
@@ -191,6 +271,8 @@ def format_gateway_roundtrip_snapshot(
*,
provider_url: str | None = None,
expected_model: str | None = None,
+ run_marker: str | None = None,
+ gateway_output_text: str | None = None,
) -> str:
"""Human-readable proof that Hermes completed an OpenAI chat.completions call."""
usage = extract_gateway_usage(gateway_body)
@@ -198,11 +280,19 @@ def format_gateway_roundtrip_snapshot(
output_items = output if isinstance(output, list) else []
lines = [
"OpenAI round-trip completed (via Hermes gateway):",
- f" gateway_status={gateway_body.get('status')!r}",
- f" provider_url={provider_url!r}",
]
+ if run_marker is not None:
+ lines.append(f" run_marker={run_marker!r}")
+ lines.extend(
+ [
+ f" gateway_status={gateway_body.get('status')!r}",
+ f" provider_url={provider_url!r}",
+ ]
+ )
if expected_model is not None:
lines.append(f" provider_model={expected_model!r}")
+ if gateway_output_text is not None:
+ lines.append(f" gateway_output_text={gateway_output_text!r}")
lines.extend(
[
f" input_tokens={usage.get('input_tokens', 0)}",
@@ -224,6 +314,7 @@ def format_provider_tools_snapshot(
governed: frozenset[str],
*,
dump_path: Path | None = None,
+ run_marker: str | None = None,
) -> str:
"""Human-readable summary of ``tools=`` sent to the OpenAI provider."""
by_name = parse_provider_tools(body)
@@ -232,6 +323,8 @@ def format_provider_tools_snapshot(
f"model={model!r}",
f"tool_count={len(by_name)}",
]
+ if run_marker is not None:
+ lines.append(f"run_marker={run_marker!r}")
if dump_path is not None:
lines.append(f"request_dump={dump_path}")
lines.append("")
diff --git a/tests/hermes_gateway/test_gateway_toolsets_live.py b/tests/hermes_gateway/test_gateway_toolsets_live.py
index 0526f93..123b7e5 100644
--- a/tests/hermes_gateway/test_gateway_toolsets_live.py
+++ b/tests/hermes_gateway/test_gateway_toolsets_live.py
@@ -1,5 +1,12 @@
#!/usr/bin/env python3
-"""Live gateway test: toolsets, schema probe, and provider tools= payload."""
+"""Live gateway test: toolsets, schema probe, and provider tools= payload.
+
+Asserts every IntentFrame-governed catalog tool (native mappers and generic mappers
+such as ``cronjob``) on the OpenAI upstream ``tools=`` payload with required
+``reason``. Differs from gateway E2E, which LLM-probes only the native-mapper tier.
+
+See ``tests/hermes_gateway/README.md`` (toolsets section) for flow and recent fixes.
+"""
from __future__ import annotations
@@ -20,12 +27,17 @@
)
from provider_request_contract import ( # noqa: E402
assert_gateway_openai_roundtrip,
+ assert_gateway_response_contains_marker,
+ assert_provider_request_contains_marker,
assert_provider_tools_surface,
format_gateway_roundtrip_snapshot,
format_provider_tools_snapshot,
load_newest_request_dump,
load_request_dump,
request_dump_paths,
+ toolsets_llm_instructions,
+ toolsets_llm_prompt,
+ toolsets_run_marker,
)
from cli_runner import CliError, format_diagnostics, run_cli, step, stop_everything # noqa: E402
from isolation import ( # noqa: E402
@@ -47,7 +59,7 @@
_TESTS_DIR = HERE.parent
if str(_TESTS_DIR) not in sys.path:
sys.path.insert(0, str(_TESTS_DIR))
-from hermes_governance_fixtures import gateway_e2e_probe_tool_names # noqa: E402
+from hermes_governance_fixtures import template_governed_tool_names # noqa: E402
API_HOST = "127.0.0.1"
INSTALL_TIMEOUT = 600.0
@@ -73,9 +85,12 @@ def _run_schema_probe(env: IsolatedEnv) -> None:
raise AssertionError(f"Probe script missing: {PROBE_SCRIPT}")
step("Probe Hermes registry schemas (reason injection + gate markers)")
+ probe_env = os.environ.copy()
+ # Mirror gateway process: cronjob check_fn requires HERMES_GATEWAY_SESSION.
+ probe_env["HERMES_GATEWAY_SESSION"] = "1"
result = subprocess.run(
[str(python), str(PROBE_SCRIPT)],
- env=os.environ.copy(),
+ env=probe_env,
capture_output=True,
text=True,
timeout=120.0,
@@ -98,6 +113,9 @@ def main() -> int:
try:
env = create_isolated_env()
+ run_marker = toolsets_run_marker(env.run_id)
+ step(f"Run marker (OpenAI log correlation): {run_marker}")
+ print(f"\n==> Toolsets live run marker: {run_marker}", file=sys.stderr)
step(f"Activating sandbox HOME={env.home} HERMES_HOME={env.hermes_home}")
activate(env)
step(f"Seeding Hermes OpenAI provider (model={_e2e_openai_model()})")
@@ -147,16 +165,17 @@ def main() -> int:
_run_schema_probe(env)
existing_dumps = frozenset(request_dump_paths(env.hermes_home))
- step("POST /v1/responses (capture provider tools= for OpenAI)")
+ step(f"POST /v1/responses (capture provider tools= for OpenAI, marker={run_marker})")
responses_body = post_responses(
host=API_HOST,
port=env.api_port,
api_key=env.api_key,
- prompt="Reply with the single word OK. Do not call any tools.",
- instructions="Automated integration test. Do not use tools.",
+ prompt=toolsets_llm_prompt(run_marker),
+ instructions=toolsets_llm_instructions(run_marker),
)
assert_gateway_openai_roundtrip(responses_body)
- governed = gateway_e2e_probe_tool_names()
+ gateway_output_text = assert_gateway_response_contains_marker(responses_body, run_marker)
+ governed = template_governed_tool_names()
dump_path, provider_body = load_newest_request_dump(
env.hermes_home,
existing=existing_dumps,
@@ -164,6 +183,7 @@ def main() -> int:
dump_raw = load_request_dump(dump_path)
request_meta = dump_raw.get("request")
request_meta_dict = request_meta if isinstance(request_meta, dict) else {}
+ assert_provider_request_contains_marker(provider_body, run_marker)
assert_provider_tools_surface(
provider_body,
governed,
@@ -178,6 +198,8 @@ def main() -> int:
responses_body,
provider_url=provider_url,
expected_model=_e2e_openai_model(),
+ run_marker=run_marker,
+ gateway_output_text=gateway_output_text,
),
file=sys.stderr,
)
@@ -187,13 +209,17 @@ def main() -> int:
provider_body,
governed,
dump_path=dump_path,
+ run_marker=run_marker,
),
file=sys.stderr,
)
assert_real_state_untouched(env)
exit_code = 0
- print("\n==> Hermes gateway toolsets live test passed", file=sys.stderr)
+ print(
+ f"\n==> Hermes gateway toolsets live test passed (run_marker={run_marker})",
+ file=sys.stderr,
+ )
except (CliError, AssertionError, TimeoutError, RuntimeError, subprocess.TimeoutExpired) as exc:
print(f"\nERROR: {exc}", file=sys.stderr)
if env is not None:
diff --git a/tests/hermes_gateway/test_governed_tool_coverage.py b/tests/hermes_gateway/test_governed_tool_coverage.py
index e031b08..fe9af38 100644
--- a/tests/hermes_gateway/test_governed_tool_coverage.py
+++ b/tests/hermes_gateway/test_governed_tool_coverage.py
@@ -67,23 +67,17 @@ def test_live_plugin_gate_covers_all_catalog_tools(self) -> None:
for fixture in LIVE_PLUGIN_EXTRA_FIXTURES:
self.assertIn(fixture, source)
- def test_toolsets_live_uses_native_gateway_probe_tier_only(self) -> None:
- native = gateway_e2e_probe_tool_names()
- generic = live_semantic_probe_tool_names()
+ def test_toolsets_live_verifies_full_governed_catalog(self) -> None:
+ """Toolsets live test asserts plugin changes for all governed tools, not only the native E2E tier."""
probe = (GATEWAY_DIR / "probe_hermes_tool_schemas.py").read_text(encoding="utf-8")
toolsets_live = (GATEWAY_DIR / "test_gateway_toolsets_live.py").read_text(
encoding="utf-8"
)
- self.assertIn("gateway_e2e_probe_tool_names", probe)
- self.assertIn("gateway_e2e_probe_tool_names", toolsets_live)
- self.assertNotIn("template_governed_tool_names", toolsets_live)
- for tool in generic:
- self.assertNotIn(
- f'"{tool}"',
- probe,
- msg=f"schema probe must not hardcode generic tool {tool!r}",
- )
- self.assertFalse(native & generic)
+ self.assertIn("governed_tool_names()", probe)
+ self.assertNotIn("gateway_e2e_probe_tool_names", probe)
+ self.assertIn("template_governed_tool_names", toolsets_live)
+ self.assertNotIn("gateway_e2e_probe_tool_names", toolsets_live)
+ self.assertNotIn("skipped_generic_governed", probe)
def main() -> int:
diff --git a/tests/hermes_gateway/test_provider_request_contract.py b/tests/hermes_gateway/test_provider_request_contract.py
index 864b562..92e89ff 100644
--- a/tests/hermes_gateway/test_provider_request_contract.py
+++ b/tests/hermes_gateway/test_provider_request_contract.py
@@ -15,7 +15,10 @@
from provider_request_contract import ( # noqa: E402
assert_gateway_openai_roundtrip,
+ assert_gateway_response_contains_marker,
+ assert_provider_request_contains_marker,
assert_provider_tools_surface,
+ extract_gateway_text_output,
format_gateway_roundtrip_snapshot,
format_provider_tools_snapshot,
load_newest_request_dump,
@@ -23,6 +26,9 @@
load_request_dump_body,
parse_provider_tools,
tool_reason_required,
+ toolsets_llm_instructions,
+ toolsets_llm_prompt,
+ toolsets_run_marker,
)
@@ -123,16 +129,39 @@ def test_format_gateway_roundtrip_snapshot(self) -> None:
body = {
"status": "completed",
"usage": {"input_tokens": 11655, "output_tokens": 2, "total_tokens": 11657},
- "output": [{"type": "message", "content": [{"text": "OK"}]}],
+ "output": [{"type": "message", "content": [{"text": "intentframe-toolsets-abc"}]}],
}
+ marker = toolsets_run_marker("abc")
text = format_gateway_roundtrip_snapshot(
body,
provider_url="https://api.openai.com/v1/chat/completions",
expected_model="gpt-4o-mini",
+ run_marker=marker,
+ gateway_output_text=marker,
)
+ self.assertIn("run_marker='intentframe-toolsets-abc'", text)
self.assertIn("total_tokens=11657", text)
self.assertIn("provider_url='https://api.openai.com/v1/chat/completions'", text)
+ def test_toolsets_run_marker_helpers(self) -> None:
+ marker = toolsets_run_marker("deadbeef")
+ self.assertEqual(marker, "intentframe-toolsets-deadbeef")
+ self.assertIn(marker, toolsets_llm_prompt(marker))
+ self.assertIn(marker, toolsets_llm_instructions(marker))
+
+ def test_assert_gateway_response_contains_marker(self) -> None:
+ marker = toolsets_run_marker("abc")
+ body = {
+ "output": [{"type": "message", "content": [{"text": marker}]}],
+ }
+ self.assertEqual(assert_gateway_response_contains_marker(body, marker), marker)
+ self.assertEqual(extract_gateway_text_output(body), marker)
+
+ def test_assert_provider_request_contains_marker(self) -> None:
+ marker = toolsets_run_marker("abc")
+ body = {"messages": [{"role": "user", "content": toolsets_llm_prompt(marker)}]}
+ assert_provider_request_contains_marker(body, marker)
+
def test_format_provider_tools_snapshot(self) -> None:
body = {
"model": "gpt-4o-mini",
diff --git a/tests/hermes_plugin/test_builtin_preload.py b/tests/hermes_plugin/test_builtin_preload.py
index 28a521e..a70a743 100644
--- a/tests/hermes_plugin/test_builtin_preload.py
+++ b/tests/hermes_plugin/test_builtin_preload.py
@@ -6,6 +6,7 @@
import sys
import unittest
from pathlib import Path
+from types import SimpleNamespace
from unittest import mock
TESTS_DIR = Path(__file__).resolve().parent
@@ -17,20 +18,35 @@
preload_mod = load_plugin_module("builtin_preload")
+def _spec(module: str | None) -> SimpleNamespace:
+ return SimpleNamespace(builtin_module=module)
+
+
class PreloadGovernedBuiltinsTests(unittest.TestCase):
- def test_imports_unique_modules_for_governed_tools(self) -> None:
- governed = frozenset({"terminal", "write_file", "patch"})
+ def test_imports_unique_modules_from_governed_specs(self) -> None:
+ governed_tools = {
+ "terminal": _spec("tools.terminal_tool"),
+ "write_file": _spec("tools.file_tools"),
+ "patch": _spec("tools.file_tools"),
+ }
with mock.patch.object(preload_mod.importlib, "import_module") as import_module:
- preload_mod.preload_governed_builtins(governed)
+ preload_mod.preload_governed_builtins(governed_tools)
import_module.assert_any_call("tools.terminal_tool")
import_module.assert_any_call("tools.file_tools")
self.assertEqual(import_module.call_count, 2)
- def test_skips_unknown_governed_tools(self) -> None:
- governed = frozenset({"unknown_future_tool"})
+ def test_imports_cronjob_module_when_governed(self) -> None:
+ governed_tools = {"cronjob": _spec("tools.cronjob_tools")}
+ with mock.patch.object(preload_mod.importlib, "import_module") as import_module:
+ preload_mod.preload_governed_builtins(governed_tools)
+
+ import_module.assert_called_once_with("tools.cronjob_tools")
+
+ def test_skips_tools_without_builtin_module(self) -> None:
+ governed_tools = {"unknown_future_tool": _spec(None)}
with mock.patch.object(preload_mod.importlib, "import_module") as import_module:
- preload_mod.preload_governed_builtins(governed)
+ preload_mod.preload_governed_builtins(governed_tools)
import_module.assert_not_called()
diff --git a/tests/hermes_plugin/test_gate.py b/tests/hermes_plugin/test_gate.py
index 5130ac6..5ff3868 100644
--- a/tests/hermes_plugin/test_gate.py
+++ b/tests/hermes_plugin/test_gate.py
@@ -80,9 +80,20 @@ def test_plugin_loader_matches_shared_template(self) -> None:
ensure_shared_loader_importable()
from hermes_governance.loader import load_governed_tools as shared_load_governed
- plugin_names = frozenset(governance_mod.load_governed_tools().keys())
- shared_names = frozenset(shared_load_governed().keys())
- self.assertEqual(plugin_names, shared_names)
+ plugin_tools = governance_mod.load_governed_tools()
+ shared_tools = shared_load_governed()
+ self.assertEqual(frozenset(plugin_tools), frozenset(shared_tools))
+ for name in plugin_tools:
+ self.assertEqual(
+ plugin_tools[name].builtin_module,
+ shared_tools[name].builtin_module,
+ msg=f"builtin_module mismatch for {name!r}",
+ )
+ self.assertEqual(
+ plugin_tools[name].enabled,
+ shared_tools[name].enabled,
+ msg=f"enabled mismatch for {name!r}",
+ )
class TestGateToolCall(PluginGovernanceEnvMixin, unittest.TestCase):
diff --git a/tests/scripts/e2e.sh b/tests/scripts/e2e.sh
index 8b11311..516e75f 100755
--- a/tests/scripts/e2e.sh
+++ b/tests/scripts/e2e.sh
@@ -160,6 +160,7 @@ step "Integrations CLI unit tests"
(cd "$REPO_ROOT" && uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_policy_contract.py)
(cd "$REPO_ROOT" && uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_policy_manage.py)
(cd "$REPO_ROOT" && uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_governance_runtime_contract.py)
+(cd "$REPO_ROOT" && uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_make_catalog_yaml.py)
(cd "$REPO_ROOT" && uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_scoped_governance_yaml.py)
(cd "$REPO_ROOT" && uv run --package intentframe-integrations-cli python tests/intentframe_integrations/test_actions_manifest.py)
diff --git a/tests/scripts/test-hermes-gateway-toolsets.sh b/tests/scripts/test-hermes-gateway-toolsets.sh
index 26270a4..f897260 100755
--- a/tests/scripts/test-hermes-gateway-toolsets.sh
+++ b/tests/scripts/test-hermes-gateway-toolsets.sh
@@ -3,9 +3,11 @@
#
# After integrate hermes:
# 1. GET /v1/toolsets (config surface)
-# 2. probe_hermes_tool_schemas.py (native governed registry + reason injection)
+# 2. probe_hermes_tool_schemas.py (all governed registry tools + reason injection)
# 3. POST /v1/responses + HERMES_DUMP_REQUESTS=1 (real chat.completions round-trip)
-# 4. Assert token usage > 0 and native governed tools have required reason in tools=
+# 4. Assert token usage > 0 and all governed catalog tools have required reason in tools=
+#
+# Schema probe sets HERMES_GATEWAY_SESSION=1 so cronjob passes Hermes check_fn (see README).
#
# RUN_HERMES_GATEWAY_TOOLSETS=1 ./tests/scripts/test-hermes-gateway-toolsets.sh
#