Two ground-truth errors in src/scenarios/local/vibration_utterance.json (IDs 304, 306) #323

@iksnerd

Summary

Two characteristic_form entries in src/scenarios/local/vibration_utterance.json are wrong, so an LLM judge would penalize a correct agent answer. Both were verified against the live vibration MCP server (the same code that produces the ground-truth values).

Scope note

These are in the in-repo local scenarios, not the HuggingFace all_utterance.jsonl corpus that #310 and #311 audit. They look like the same class of "scenario ground-truth audit" work though, so cross-linking in case maintainers want to track them together.

Finding 1 — ID 304 — wrong bearing geometry for 6205

Prompt (ID 304):

Calculate the bearing characteristic frequencies for a 6205 bearing running at 1800 RPM.

characteristic_form currently says:

...6205 geometry (9 balls, ball_dia=7.94 mm, pitch_dia=39.04 mm) at 1800 RPM.

Real geometry returned by list_known_bearings:

```json
{
"designation": "6205",
"n_balls": 9,
"ball_dia_mm": 7.938,
"pitch_dia_mm": 38.5,
"contact_angle_deg": 0
}
```

39.04 mm is actually the 6305's pitch diameter (next entry in the same tool output: {"designation": "6305", ..., "pitch_dia_mm": 39.04}). Looks like a copy-paste between adjacent bearing entries when the utterance was authored.

Downstream impact: an agent that correctly calls calculate_bearing_frequencies(rpm=1800, n_balls=9, ball_dia=7.938, pitch_dia=38.5) and returns BPFO=107.165, BPFI=162.835, BSF=69.659, FTF=11.907 Hz (which is what the vibration server itself computes) would be judged against a characteristic_form referencing the wrong pitch diameter.
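For anyone who wants to check the numbers without the MCP server, here is a minimal sketch using the textbook kinematic formulas for a bearing with a fixed outer race. The function name `bearing_frequencies` and its signature are hypothetical, and this is an assumption about what the server computes, not its actual code. With the 6205 geometry it reproduces the four values quoted above to three decimals.

```python
import math


def bearing_frequencies(rpm, n_balls, ball_dia, pitch_dia, contact_angle_deg=0.0):
    """Textbook bearing fault frequencies in Hz, assuming a fixed outer race.

    Hypothetical helper for illustration; not the vibration server's code.
    """
    shaft_hz = rpm / 60.0
    # Ball-to-pitch diameter ratio, projected by the contact angle.
    ratio = (ball_dia / pitch_dia) * math.cos(math.radians(contact_angle_deg))
    return {
        "BPFO": n_balls / 2 * shaft_hz * (1 - ratio),
        "BPFI": n_balls / 2 * shaft_hz * (1 + ratio),
        "BSF": pitch_dia / (2 * ball_dia) * shaft_hz * (1 - ratio ** 2),
        "FTF": shaft_hz / 2 * (1 - ratio),
    }


# Geometry returned by list_known_bearings for the 6205.
correct = bearing_frequencies(1800, 9, 7.938, 38.5)
# Geometry currently written in the characteristic_form for ID 304.
stale = bearing_frequencies(1800, 9, 7.94, 39.04)

for name in correct:
    print(f"{name}: {correct[name]:.3f} Hz (6205) vs {stale[name]:.3f} Hz (characteristic_form)")
```

Printing both sets side by side shows how far a judge's reference values could drift if it recomputes them from the geometry in the characteristic_form.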

Finding 2 — ID 306 — wrong ISO 10816 zone

Prompt (ID 306):

What is the vibration severity classification for a machine with an RMS velocity of 4.5 mm/s? It is a medium-sized machine on rigid foundations.

characteristic_form currently says:

...classify 4.5 mm/s as ISO 10816 Zone B (acceptable) for a group2 machine

Real classification returned by assess_vibration_severity:

```json
{
"rms_velocity_mm_s": 4.5,
"iso_zone": "C",
"description": "Alarm - not suitable for long-term operation",
"machine_group": "group2",
"thresholds": { "A_good": 1.4, "B_acceptable": 2.8, "C_alarm": 7.1 }
}
```

Group2 thresholds are A=1.4 / B=2.8 / C=7.1 mm/s, so 4.5 lands unambiguously in Zone C. The characteristic_form would mark a correct "Zone C" answer wrong.
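The threshold comparison can be sketched in a few lines. This is an illustration only: `classify_iso_zone` is a hypothetical name, the threshold values are copied from the tool output above, and treating each threshold as an inclusive upper bound is an assumption (it does not affect 4.5 mm/s, which is well inside the B/C and C/D limits).

```python
# Group2 thresholds in mm/s RMS, copied from the assess_vibration_severity output.
GROUP2_THRESHOLDS = {"A_good": 1.4, "B_acceptable": 2.8, "C_alarm": 7.1}


def classify_iso_zone(rms_velocity_mm_s, thresholds):
    """Return the zone letter for an RMS velocity.

    Hypothetical helper; boundaries are assumed to be inclusive upper bounds.
    """
    if rms_velocity_mm_s <= thresholds["A_good"]:
        return "A"
    if rms_velocity_mm_s <= thresholds["B_acceptable"]:
        return "B"
    if rms_velocity_mm_s <= thresholds["C_alarm"]:
        return "C"
    return "D"


print(classify_iso_zone(4.5, GROUP2_THRESHOLDS))
```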

Suggested fixes

  • ID 304: change pitch_dia=39.04 mm → pitch_dia=38.5 mm (or drop the explicit number and reference the 6205 entry from the built-in bearing database).
  • ID 306: change Zone B (acceptable) → Zone C, with the "Alarm - not suitable for long-term operation" description.

Happy to send a PR if useful — but flagging here first since baseline scores may depend on these values.
