feat(hw_ci): fail infra boot-smoke on NO_MATCH, keep UNAVAILABLE a skip #110
Merged
A live place that preflight discovered but that `adi-lg request` can't match (exit 10, NO_MATCH: an uncatalogued or stale part, e.g. a swapped daughter-board missing from the coordinator catalog) was being lumped in with contention and silently skipped, so a real config gap went unsurfaced (it showed as a neutral "mixed" run in Prism). Make NO_MATCH (10) fail while UNAVAILABLE (11, board just busy) stays a neutral skip. The rc->status mapping is now a single tested function (`boot_status_for_rc`) exposed as `adi-lg-hw-ci boot-status --rc`, which the infra-smoke leg calls instead of a bash case.

Claude-Session: https://claude.ai/code/session_01BuMAqiic68LrMr6wWC7NCe
What
The infra boot-smoke leg now fails on `adi-lg request` exit `10` (NO_MATCH) while keeping exit `11` (UNAVAILABLE) a neutral skip.

Why
Found via the daily job: mini2's daughter-board was swapped to `ad9081`, which isn't in the coordinator's board catalog (`/api/match` → "unknown part"). That returns NO_MATCH (10), which the leg was lumping in with contention and silently skipping, so a real config gap (an uncatalogued or stale part on a live board) went unsurfaced and just showed as a neutral "mixed" run in Prism. NO_MATCH for a place that preflight discovered almost always means a catalog/config problem of the kind this job exists to catch, so it should redden the run, not hide it.

- `10` NO_MATCH ("request can never be satisfied — unknown part / no such board") → fail
- `11` UNAVAILABLE ("board(s) exist but none free within --wait", i.e. genuine contention) → skip (unchanged)
- `0` → pass; `12` and any other non-zero → fail (unchanged)

How
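As a rough illustration of the exit-code handling listed above, here is a minimal sketch of such a mapping and a thin CLI wrapper around it. This is not the code from the PR: the constant names, the `"pass"`/`"skip"`/`"fail"` return strings, and the `argparse` layout are assumptions made for illustration. Only the function name `boot_status_for_rc`, the `boot-status --rc` subcommand, and the code-to-status rules come from the description.

```python
import argparse

# Exit codes of `adi-lg request` as described in this PR.
# The constant NAMES are hypothetical; the PR only mentions "named exit-code constants".
EXIT_OK = 0
EXIT_NO_MATCH = 10     # request can never be satisfied (unknown part / no such board)
EXIT_UNAVAILABLE = 11  # board(s) exist but none free within --wait (contention)


def boot_status_for_rc(rc: int) -> str:
    """Map an `adi-lg request` exit code to a boot-smoke status string (assumed values)."""
    if rc == EXIT_OK:
        return "pass"
    if rc == EXIT_UNAVAILABLE:
        # Genuine contention: the board is just busy, so stay neutral.
        return "skip"
    # NO_MATCH (10), 12, and every other non-zero code redden the run.
    return "fail"


def main(argv=None) -> int:
    """Hypothetical CLI shape for `adi-lg-hw-ci boot-status --rc <n>`."""
    parser = argparse.ArgumentParser(prog="adi-lg-hw-ci")
    sub = parser.add_subparsers(dest="command", required=True)
    boot = sub.add_parser("boot-status", help="print the status for an exit code")
    boot.add_argument("--rc", type=int, required=True)
    args = parser.parse_args(argv)
    print(boot_status_for_rc(args.rc))
    return 0
```

Keeping `skip` as the single special case and letting everything else fall through to `fail` means an exit code nobody anticipated is loud by default rather than silently neutral.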
The rc→status mapping is now a single unit-tested function, `boot_junit.boot_status_for_rc()` (using the named exit-code constants), exposed as `adi-lg-hw-ci boot-status --rc <n>`. The `infra-smoke.yml` leg calls it instead of an untestable bash `case`, so there's one tested source of truth.

Tests
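For a sense of what table-driven checks of this mapping can look like, here is a small sketch. It is not the PR's test file: it uses the standard-library `unittest` with `subTest` in place of whatever parametrization the real suite uses, `boot_status_for_rc` is a local stand-in rather than the `boot_junit` implementation, and the status strings and the sample of "other" exit codes are assumptions.

```python
import unittest


def boot_status_for_rc(rc: int) -> str:
    """Local stand-in for the mapping under test (assumed return strings)."""
    if rc == 0:
        return "pass"
    if rc == 11:  # UNAVAILABLE: contention, neutral skip
        return "skip"
    return "fail"  # NO_MATCH (10), 12, and any other non-zero code


class BootStatusForRcTest(unittest.TestCase):
    def test_named_codes(self):
        # The four cases the PR description calls out explicitly.
        expected = {0: "pass", 11: "skip", 10: "fail", 12: "fail"}
        for rc, status in expected.items():
            with self.subTest(rc=rc):
                self.assertEqual(boot_status_for_rc(rc), status)

    def test_other_nonzero_fails(self):
        # An arbitrary sample of other non-zero exit codes; all should fail.
        for rc in (1, 2, 9, 13, 127, 255):
            with self.subTest(rc=rc):
                self.assertEqual(boot_status_for_rc(rc), "fail")
```

Saved as a module, this can be run with `python -m unittest`.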
`boot_status_for_rc` unit tests (0→pass, 11→skip, 10→fail, 12→fail, parametrized other-nonzero→fail) + a CLI test for `boot-status`. Full `tests/hw_ci/` green (248), lint/format clean, YAML valid.

Follow-on (yours)
Add an `ad9081` entry to the coordinator `board_catalog.yaml` + redeploy so mini2 boots for real. After that + this change, an uncatalogued live board would turn the daily red (loud) instead of skipping.

https://claude.ai/code/session_01BuMAqiic68LrMr6wWC7NCe