fix(tools): whole-sentence compact descriptions + front-loaded behavior disclosure #295
Merged
Glama's TDQS scored the catalog C (avg 3.5/5, lowest 1.9/5) and the listing's
quality badge B. Root cause: compactDescription() kept the first sentence but
hard-capped it at 80 chars with slice(0,77)+"..." — every tool whose first
sentence ran past 80 shipped mid-word-truncated prose over the wire ("ending
with '...', indicating truncation" per the judge), and several short
descriptions disclosed nothing about behavior.
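To make the failure mode concrete, here is a minimal sketch of the old cap. Only the `slice(0,77)+"..."` step comes from the description above; the function name `oldCompactDescription` and the first-sentence regex are assumptions for illustration, not the removed code.

```typescript
// Hypothetical reconstruction of the OLD behaviour (illustration only).
// Assumption: "first sentence" = text up to the first ., ! or ? that is
// followed by whitespace or end of string.
function oldCompactDescription(description: string): string {
  const match = description.match(/^.*?[.!?](?=\s|$)/);
  const first = match ? match[0] : description;
  // Hard cap from the PR text: keep 77 chars and append "...",
  // with no regard for word boundaries.
  return first.length > 80 ? first.slice(0, 77) + "..." : first;
}
```

Any first sentence longer than 80 characters comes out as exactly 80 characters ending in "...", which is the truncation the judge flagged.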
Mechanism (src/shared/tool-filter.ts, mirror synced in
scripts/measure-tool-tokens.mjs): keep as many leading COMPLETE sentences as
fit a 160-char budget; a boundary is sentence punctuation followed by
whitespace, so "etc.)", "e.g.," and version numbers don't split; only a first
sentence longer than the whole budget falls back to a word-boundary cut with
an ellipsis. Compact mode stays ON by default; AIRMCP_COMPACT_TOOLS=false
still returns full descriptions.
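The mechanism above can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the code in src/shared/tool-filter.ts: the signature, the `budget` parameter, and the exact boundary regex are hypothetical; only the 160-char budget, the "punctuation followed by whitespace" rule, and the word-boundary fallback come from the description.

```typescript
// Sketch only: keeps as many leading complete sentences as fit the budget.
function compactDescription(description: string, budget = 160): string {
  const text = description.trim();
  if (text.length <= budget) return text;

  // A boundary is ., ! or ? followed by whitespace, so "etc.)", "e.g.,"
  // and "v1.2" never match (the next char is not whitespace).
  const boundary = /[.!?](?=\s)/g;
  let end = 0; // exclusive end index of the last complete sentence that fits
  let m: RegExpExecArray | null;
  while ((m = boundary.exec(text)) !== null) {
    const candidate = m.index + 1;
    if (candidate > budget) break;
    end = candidate;
  }
  if (end > 0) return text.slice(0, end);

  // Fallback: the first sentence alone exceeds the budget, so cut at the
  // last space that leaves room for a 3-char ellipsis.
  const room = budget - 3;
  const lastSpace = text.slice(0, room + 1).lastIndexOf(" ");
  const cut = lastSpace > 0 ? lastSpace : room; // no space at all: hard cut
  return text.slice(0, cut).trimEnd() + "...";
}
```

The fallback only treats a plain space as a word boundary; a real implementation may need to handle tabs or newlines as well.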
Content: front-load purpose + behavior in sentence one for the 14 lowest-
scoring definitions — 6 skill YAMLs (read/write nature + HITL gating now
disclosed up front instead of a tail "DESTRUCTIVE:" note the old cap never
let clients see) and 8 tools (update_reminder / create_folder /
capture_screenshot / create_shortcut / get_hourly_forecast enriched from
bare one-liners; semantic_index now front-loads "replaces any existing
index" and the real embedding requirement — GEMINI_API_KEY or the Swift
bridge, not bridge-only; find_related / summarize_context disclose read-only
+ prerequisites). event_subscribe and spotlight_clear needed no text change —
the mechanism fix alone un-truncates them.
Verified on a clean-room boot of the rebuilt dist: all 111 starter wire
descriptions are complete sentences <=160 chars, zero "..." endings.
New tests/tool-filter.test.js pins the wire contract (whole sentences,
word-boundary fallback, abbreviation safety, no mid-word cuts); the full
suite reports 1929 passing. docs/tool-manifest.json + MCPIntents.swift
regenerated; counts unchanged (272 registerTool / 285 manifest / 232
intents; stats:check clean).
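The wire contract verified above (whole sentences, at most 160 chars, no "..." endings) could be checked with a helper along these lines. This is a hedged sketch, not the contents of tests/tool-filter.test.js: the `WireTool` shape, the function name `findContractViolations`, and the "ends on sentence punctuation" heuristic are all assumptions.

```typescript
// Hypothetical shape of a tool as seen over the wire.
interface WireTool {
  name: string;
  description: string;
}

// Returns one human-readable message per contract violation.
function findContractViolations(tools: WireTool[], budget = 160): string[] {
  const problems: string[] = [];
  for (const tool of tools) {
    const d = tool.description;
    if (d.length > budget) {
      problems.push(`${tool.name}: ${d.length} chars exceeds ${budget}`);
    }
    if (d.endsWith("...")) {
      problems.push(`${tool.name}: ends with "..."`);
    } else if (!/[.!?][)"']?$/.test(d)) {
      // Heuristic: a complete sentence ends in ., ! or ?, optionally
      // followed by a closing parenthesis or quote.
      problems.push(`${tool.name}: does not end on sentence punctuation`);
    }
  }
  return problems;
}
```

An empty result for the starter catalog would correspond to the "zero '...' endings" claim; note that a description hitting the word-boundary fallback would legitimately end in an ellipsis and be reported here.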