
fix(scenarios): correct vibration_utterance.json IDs 304 and 306 #325

Open: iksnerd wants to merge 1 commit into `IBM:main` from `iksnerd:fix/vibration-utterance-304-306`

iksnerd commented May 24, 2026

Description

Fixes #323. Two `characteristic_form` values in `src/scenarios/local/vibration_utterance.json` disagreed with what the `vibration` MCP server actually returns, so the LLM judge would mark a correct agent answer as wrong. Both updates align the expected behavior with live tool output captured while preparing this PR.

Fix Details

ID 304 — Bearing Analysis (6205 @ 1800 RPM)

Before:

...6205 geometry (9 balls, ball_dia=7.94 mm, pitch_dia=39.04 mm) at 1800 RPM.

After:

...6205 geometry (9 balls, ball_dia=7.938 mm, pitch_dia=38.5 mm) at 1800 RPM.

Why: 39.04 mm is the 6305 bearing's pitch diameter, not the 6205's. Verified by calling list_known_bearings:

```json
{ "designation": "6205", "n_balls": 9, "ball_dia_mm": 7.938, "pitch_dia_mm": 38.5, "contact_angle_deg": 0 }
{ "designation": "6305", "n_balls": 8, "ball_dia_mm": 10.319, "pitch_dia_mm": 39.04, "contact_angle_deg": 0 }
```

Also tightened `ball_dia=7.94 mm` → `7.938 mm` to match the bearing database entry exactly (the rounding was inconsistent with the precision used elsewhere in the file).
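To illustrate why the pitch diameter matters for a bearing-analysis scenario, here is a minimal sketch that computes the textbook rolling-element defect frequencies (FTF, BPFO, BPFI, BSF) for the 6205 at 1800 RPM, using the geometry returned by `list_known_bearings`. It assumes the standard kinematic formulas for a bearing with zero contact angle; whether the `vibration` MCP server uses exactly these formulas is an assumption, and `bearing_defect_frequencies` is a hypothetical helper, not part of that server.

```python
import math


def bearing_defect_frequencies(n_balls, ball_dia_mm, pitch_dia_mm, rpm, contact_angle_deg=0.0):
    """Textbook rolling-element bearing defect frequencies in Hz (hypothetical helper)."""
    fr = rpm / 60.0  # shaft rotation frequency, Hz
    # d/D * cos(contact angle); cos(0) == 1 for deep-groove ball bearings like the 6205
    ratio = (ball_dia_mm / pitch_dia_mm) * math.cos(math.radians(contact_angle_deg))
    return {
        "FTF": fr / 2 * (1 - ratio),            # cage (fundamental train) frequency
        "BPFO": n_balls * fr / 2 * (1 - ratio),  # ball pass frequency, outer race
        "BPFI": n_balls * fr / 2 * (1 + ratio),  # ball pass frequency, inner race
        "BSF": pitch_dia_mm / (2 * ball_dia_mm) * fr * (1 - ratio ** 2),  # ball spin frequency
    }


# 6205 geometry from list_known_bearings vs. the old text (6305 pitch diameter).
correct = bearing_defect_frequencies(9, 7.938, 38.5, 1800)
old = bearing_defect_frequencies(9, 7.94, 39.04, 1800)
for name in correct:
    print(f"{name}: {correct[name]:.2f} Hz (old text: {old[name]:.2f} Hz)")
```

Every frequency depends on the d/D ratio, so an expected answer built on the 6305 pitch diameter shifts all four values slightly away from what the tool-backed agent would report.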

ID 306 — Condition Assessment (4.5 mm/s @ group2)

Before:

...classify 4.5 mm/s as ISO 10816 Zone B (acceptable) for a group2 machine, with thresholds context.

After:

...classify 4.5 mm/s as ISO 10816 Zone C (Alarm - not suitable for long-term operation) for a group2 machine, with thresholds context (A=1.4, B=2.8, C=7.1 mm/s).

Why: Verified by calling assess_vibration_severity(rms_velocity_mm_s=4.5, machine_group="group2"):

```json
{
  "rms_velocity_mm_s": 4.5,
  "iso_zone": "C",
  "description": "Alarm - not suitable for long-term operation",
  "machine_group": "group2",
  "thresholds": { "A_good": 1.4, "B_acceptable": 2.8, "C_alarm": 7.1 }
}
```

4.5 mm/s exceeds the B/C boundary at 2.8 mm/s and falls below the C/D boundary at 7.1 mm/s, so it lands unambiguously in Zone C. The new wording also surfaces the threshold values inline so the LLM judge has context for partial-credit grading.
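The zone lookup reduces to a threshold comparison. The sketch below classifies an RMS velocity against the group2 thresholds reported by `assess_vibration_severity`; `classify_iso_zone` is a hypothetical helper, and treating each threshold as the inclusive upper bound of its zone is an assumption (the tool's behavior exactly at a boundary was not checked here).

```python
# Group2 thresholds as returned by assess_vibration_severity (mm/s RMS).
GROUP2_THRESHOLDS = {"A_good": 1.4, "B_acceptable": 2.8, "C_alarm": 7.1}


def classify_iso_zone(rms_velocity_mm_s, thresholds=GROUP2_THRESHOLDS):
    """Return the ISO 10816 zone letter (hypothetical helper).

    Assumes each threshold is the inclusive upper bound of its zone;
    anything above the C bound falls in Zone D.
    """
    if rms_velocity_mm_s <= thresholds["A_good"]:
        return "A"
    if rms_velocity_mm_s <= thresholds["B_acceptable"]:
        return "B"
    if rms_velocity_mm_s <= thresholds["C_alarm"]:
        return "C"
    return "D"


print(classify_iso_zone(4.5))  # → C
```

Since 4.5 mm/s is well clear of both the 2.8 and 7.1 mm/s boundaries, the inclusive/exclusive choice does not affect the ID 306 result.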

Impact on Benchmarking

  • Baseline change: This fix corrects a scoring error.

Before vs. After expectation:

| ID | Old `characteristic_form` | New `characteristic_form` |
|---|---|---|
| 304 | Marks `pitch_dia=38.5 mm` answers wrong | Marks `pitch_dia=38.5 mm` answers correct (matches live tool) |
| 306 | Marks Zone C answers wrong | Marks Zone C answers correct (matches live tool) |

Any baseline runs against IDs 304 and 306 should be re-scored to reflect the corrected expected behavior. Only two scenarios in the local vibration corpus (24+ utterances) are affected, so the impact on aggregate scores depends on how heavily those two rows weighted prior reports.

Related Issues

Verification Steps

  1. JSON valid: python -m json.tool src/scenarios/local/vibration_utterance.json > /dev/null → clean parse.

  2. Diff scope: `git diff --stat` → 1 file changed, 2 insertions(+), 2 deletions(-). Only IDs 304 and 306 are touched.

  3. Live-tool re-verification of both new strings (captured during this PR's prep):

    ```
    list_known_bearings → 6205 = {n_balls: 9, ball_dia: 7.938, pitch_dia: 38.5}
    assess_vibration_severity(4.5, group2) → {iso_zone: "C", description: "Alarm - not suitable for long-term operation", thresholds: A=1.4, B=2.8, C=7.1}
    ```
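The manual checks above could be scripted so that a future edit reintroducing the old values is caught automatically. This is a minimal sketch under assumptions: it supposes the scenario file is a JSON array of objects with `id` (integer) and `characteristic_form` keys, which this PR does not show, and `check_expected_forms` / `check_file` are hypothetical helpers.

```python
import json

# Substrings each corrected characteristic_form should (and should not) contain.
EXPECTED = {
    304: {"must": ["ball_dia=7.938 mm", "pitch_dia=38.5 mm"], "must_not": ["39.04"]},
    306: {"must": ["Zone C", "A=1.4, B=2.8, C=7.1"], "must_not": ["Zone B"]},
}


def check_expected_forms(entries):
    """Return a list of problems found in the given scenario entries.

    Assumes each entry is a dict with 'id' and 'characteristic_form' keys.
    """
    by_id = {entry["id"]: entry.get("characteristic_form", "") for entry in entries}
    problems = []
    for scenario_id, rules in EXPECTED.items():
        if scenario_id not in by_id:
            problems.append(f"ID {scenario_id}: missing")
            continue
        text = by_id[scenario_id]
        for needle in rules["must"]:
            if needle not in text:
                problems.append(f"ID {scenario_id}: expected {needle!r}")
        for needle in rules["must_not"]:
            if needle in text:
                problems.append(f"ID {scenario_id}: unexpected {needle!r}")
    return problems


def check_file(path):
    """Load a scenario file and check it (hypothetical helper)."""
    with open(path) as f:
        return check_expected_forms(json.load(f))
```

If the schema assumption holds, `check_file("src/scenarios/local/vibration_utterance.json")` should return an empty list on this branch.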

Checklist

Two `characteristic_form` values disagreed with the live `vibration` MCP
server's output, so an LLM judge would penalize a correct agent answer.
Both were verified against the tool that produces the ground truth.

- **ID 304** (Bearing Analysis, 6205 @ 1800 RPM) — referenced
  `pitch_dia=39.04 mm` (which is actually the 6305's pitch diameter, per
  `list_known_bearings`). Corrected to `pitch_dia=38.5 mm` and also
  tightened `ball_dia=7.94 mm` → `7.938 mm` to match the bearing
  database entry exactly.

- **ID 306** (Condition Assessment, 4.5 mm/s @ group2) — classified
  4.5 mm/s as `Zone B (acceptable)`, but `assess_vibration_severity`
  returns `Zone C (Alarm - not suitable for long-term operation)`.
  Group2 thresholds are A=1.4 / B=2.8 / C=7.1 mm/s, so 4.5 lands in
  Zone C. Corrected the zone and included threshold context.

Fixes IBM#323.

Signed-off-by: iksnerd <bdrensk@me.com>


Linked issue (closed by merging this PR): Two ground-truth errors in `src/scenarios/local/vibration_utterance.json` (IDs 304, 306)
