Summary
Two characteristic_form errors in src/scenarios/local/vibration_utterance.json would cause an LLM judge to penalize a correct agent answer. Both were verified against the live vibration MCP server (the same code that produces the ground-truth values).
Scope note
These are in the in-repo local scenarios, not the HuggingFace all_utterance.jsonl corpus that #310 and #311 audit. They look like the same class of "scenario ground-truth audit" work though, so cross-linking in case maintainers want to track them together.
Finding 1 — ID 304 — wrong bearing geometry for 6205
Prompt (ID 304):
Calculate the bearing characteristic frequencies for a 6205 bearing running at 1800 RPM.
characteristic_form currently says:
...6205 geometry (9 balls, ball_dia=7.94 mm, pitch_dia=39.04 mm) at 1800 RPM.
Real geometry returned by list_known_bearings:
```json
{
"designation": "6205",
"n_balls": 9,
"ball_dia_mm": 7.938,
"pitch_dia_mm": 38.5,
"contact_angle_deg": 0
}
```
39.04 mm is actually the 6305's pitch diameter (next entry in the same tool output: {"designation": "6305", ..., "pitch_dia_mm": 39.04}). Looks like a copy-paste between adjacent bearing entries when the utterance was authored.
Downstream impact: an agent that correctly calls calculate_bearing_frequencies(rpm=1800, n_balls=9, ball_dia=7.938, pitch_dia=38.5) gets BPFO=107.165, BPFI=162.835, BSF=69.659, FTF=11.907 Hz, which is what the vibration server itself computes. That correct answer would then be judged against a characteristic_form that references the wrong pitch diameter.
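For anyone who wants to sanity-check the numbers without the MCP server, here is a minimal sketch using the textbook rolling-element bearing formulas. The function name bearing_frequencies is hypothetical and this is not the server's implementation; I am only assuming the server uses the standard formulas, which is consistent with the four values quoted above matching to three decimals.

```python
import math


def bearing_frequencies(rpm, n_balls, ball_dia, pitch_dia, contact_angle_deg=0.0):
    """Textbook bearing defect frequencies in Hz (hypothetical helper, not the server code)."""
    shaft_hz = rpm / 60.0
    # Ball-to-pitch diameter ratio, projected through the contact angle.
    ratio = (ball_dia / pitch_dia) * math.cos(math.radians(contact_angle_deg))
    return {
        "BPFO": n_balls / 2 * shaft_hz * (1 - ratio),  # outer-race defect
        "BPFI": n_balls / 2 * shaft_hz * (1 + ratio),  # inner-race defect
        "BSF": pitch_dia / (2 * ball_dia) * shaft_hz * (1 - ratio ** 2),  # ball spin
        "FTF": shaft_hz / 2 * (1 - ratio),  # cage (fundamental train)
    }


# Geometry reported by list_known_bearings for the 6205.
correct = bearing_frequencies(1800, 9, 7.938, 38.5)
# Geometry currently written in the characteristic_form for ID 304.
stale = bearing_frequencies(1800, 9, 7.94, 39.04)

for name in correct:
    print(f"{name}: {correct[name]:.3f} Hz (tool geometry) vs "
          f"{stale[name]:.3f} Hz (characteristic_form geometry)")
```

The two geometries give different frequencies, so a judge that recomputes from the characteristic_form's numbers would not agree with the tool's output.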
Finding 2 — ID 306 — wrong ISO 10816 zone
Prompt (ID 306):
What is the vibration severity classification for a machine with an RMS velocity of 4.5 mm/s? It is a medium-sized machine on rigid foundations.
characteristic_form currently says:
...classify 4.5 mm/s as ISO 10816 Zone B (acceptable) for a group2 machine
Real classification returned by assess_vibration_severity:
```json
{
"rms_velocity_mm_s": 4.5,
"iso_zone": "C",
"description": "Alarm - not suitable for long-term operation",
"machine_group": "group2",
"thresholds": { "A_good": 1.4, "B_acceptable": 2.8, "C_alarm": 7.1 }
}
```
Group2 thresholds are A=1.4 / B=2.8 / C=7.1 mm/s. Since 2.8 < 4.5 < 7.1, the value lands unambiguously in Zone C. The characteristic_form would mark a correct "Zone C" answer wrong.
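The threshold comparison can be sketched as follows, using the thresholds dict returned by assess_vibration_severity. The function name classify_zone is hypothetical, and two details are assumptions on my part rather than something read from the server code: that each threshold is an inclusive upper bound for its zone, and that anything above C_alarm is reported as Zone D. Neither assumption matters for 4.5 mm/s, which is well inside the B/C gap.

```python
# Thresholds copied from the assess_vibration_severity response for group2.
GROUP2_THRESHOLDS = {"A_good": 1.4, "B_acceptable": 2.8, "C_alarm": 7.1}


def classify_zone(rms_velocity_mm_s, thresholds=GROUP2_THRESHOLDS):
    """Map an RMS velocity to a zone letter (hypothetical helper).

    Assumes each threshold is the inclusive upper bound of its zone and
    that values above C_alarm fall in Zone D.
    """
    if rms_velocity_mm_s <= thresholds["A_good"]:
        return "A"
    if rms_velocity_mm_s <= thresholds["B_acceptable"]:
        return "B"
    if rms_velocity_mm_s <= thresholds["C_alarm"]:
        return "C"
    return "D"


print(classify_zone(4.5))
```

With the group2 thresholds, 4.5 mm/s only reaches Zone B if the B threshold were at least 4.5, which is not what the tool returns.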
Suggested fixes
- ID 304: change pitch_dia=39.04 mm → pitch_dia=38.5 mm (or drop the explicit number and reference the 6205 entry from the built-in bearing database).
- ID 306: change Zone B (acceptable) → Zone C with the "Alarm - not suitable for long-term operation" description.
Happy to send a PR if useful — but flagging here first since baseline scores may depend on these values.
Related