okf: propose optional reliability frontmatter convention (reliability axis, #151)#159
okf: propose optional reliability frontmatter convention (reliability axis, #151)#159Dynamicfeedai wants to merge 3 commits into
Conversation
3c16dba to
f714ed1
Compare
|
Update for reviewers: the live reference implementation now emits this object, so the schema in this PR is not aspirational. You can curl it and validate the response against the file here. returns a
Validated green against the live tool set plus a conflict case, using the schema published at https://dynamicfeed.ai/schemas/okf-reliability-v1.json. Happy to keep it in lockstep with the object as it firms up across the three shapes. |
|
This faithfully captures the #151/#158 convergence — and turning it into a SPEC-voice doc this fast is much appreciated. From the multi-version-corpus side (one of the three shapes), it maps cleanly onto real practice: the The maturity-ladder framing in §2 is the part I'd most want to survive review: a two-field floor with corroboration and maintenance as opt-in tiers is what keeps this a gentle on-ramp rather than a gate. A producer setting just One clarifying question on the conflict example: it carries |
|
Ran the multi-version corpus shape through
So the multi-version shape is conformant as-is — version-keyed One empirical point on the conflict-banding question I raised earlier: our conflict case validates at
I lean (2) — it matches the "resolution records what the trust ordering picked" intent: the consumer gets both the dispute flag and a usable graded answer. Either works; the point is to pick one. Happy to open a PR adding it as a fifth honesty rule a test case if useful. |
… at MEDIUM) A disputed conflict cannot be HIGH; the prevailing position may carry up to MEDIUM, disputed:true stays flagged. Same corroboration ceiling as rule 4, applied to disputes. Pins the conflict-banding question raised on the PR thread so producers grade disputes consistently.
|
Thank you for running the multi-version corpus through the schema with ajv, and for the negative-control table. A second of the three shapes validating green, with the honesty rules actually rejecting the bad cases, is exactly the signal this needed. Your point about the union On the conflict-banding question: you are right that it should be pinned, and I agree with your option (2). The cleanest framing is that it is rule 4's corroboration ceiling applied to a dispute: an open disagreement is a corroboration failure, so the claim cannot be I have landed it as honesty rule 5:
Our live producer already floors disputes to A conformance test vector for the disputed-band cap from the corpus side would be a welcome addition if you would like to contribute it; happy to fold it into the validators. And +1 to your read that the §2 maturity-ladder framing is the part to protect in review: a two-field floor that grows into the corroboration and maintenance tiers is the on-ramp, not a gate. |
|
Landed perfectly — rule 5 reads exactly right, and framing it as rule 4's corroboration ceiling applied to a dispute is cleaner than my either/or. Re-ran against the updated schema: our Here's the corpus-side conformance vector you asked for — a version-keyed disputed case at the [
{
"label": "rule5 GOOD — versioned corpus, disputed, prevailing live-source win at the MEDIUM ceiling",
"expect": "valid",
"reliability": {
"confidence": "MEDIUM", "basis": "live-source", "sources": 2, "verified": false,
"conflict": {
"disputed": true,
"positions": [
{ "statement": "Default session timeout is 30 minutes.", "basis": "live-source", "source": "Observed on an Acme 5.0 instance, 2026-06-29" },
{ "statement": "Default session timeout is 20 minutes.", "basis": "vendor-doc", "source": "Acme Admin Guide 5.0" }
],
"resolution": "live-source prevails under the trust ordering; both positions retained"
},
"validity": { "keyed_by": "version", "valid_from": "5.0", "valid_until": null },
"freshness": { "as_of": "2026-06-29T00:00:00Z", "state": "fresh" },
"signals": { "signed": true, "corroborated": false, "fresh": true }
}
},
{
"label": "rule5 BAD — same dispute graded HIGH (must be rejected)",
"expect": "invalid",
"reliability": {
"confidence": "HIGH", "basis": "live-source", "sources": 2, "verified": false,
"conflict": {
"disputed": true,
"positions": [
{ "statement": "Default session timeout is 30 minutes.", "basis": "live-source", "source": "Observed on an Acme 5.0 instance" },
{ "statement": "Default session timeout is 20 minutes.", "basis": "vendor-doc", "source": "Acme Admin Guide 5.0" }
],
"resolution": "live-source prevails; both positions retained"
},
"validity": { "keyed_by": "version", "valid_from": "5.0", "valid_until": null }
}
}
]Both checked against the updated |
|
That is the cleanest possible confirmation: a Your two version-keyed vectors are exactly the pair that pins both ends, so I folded them into a portable conformance suite in the reference repo (credited to the multi-version-corpus shape, synthetic values noted): https://github.com/dynamicfeed/df-verify/blob/main/reliability/conformance-vectors.json . It now carries 12 Yes please send the §4 PR adding a versioned-corpus disputed case to the proposal alongside the live ones. A version-keyed example in the doc itself is exactly what shows the object serves a corpus and not just a live layer, and it pairs naturally with the conformance vector. Happy to land it. |
|
@Dynamicfeedai @K4uP — the SPEC-voice doc and the ajv cross-check are good to see, and the maturity-ladder floor holding up under the negative controls is the part worth protecting through review. Two things from the authored-corpus side — the third shape. First, corroboration. I built Second, the gap that shape surfaces. Where the live-data and versioned-corpus shapes clear the floor, a human-authored claim doesn't — and it's Proposal: add an authored value — On the cross-concept (The |
…thored-basis question Honesty fix prompted by the human-authored-KB shape (Throughline) on the thread: the prior wording asserted validation against three production shapes, but only the live-data and multi-version-corpus shapes pass in full today. The human-authored shape passes the honesty rules yet has no honest basis value for curated/decided knowledge (not live/computed/forecast, not inferred). §5 now states what actually passes + points at the public conformance suite; §2.1 flags the open authored-basis question.
|
Straight answer to your question first, because it is the right one to ask: the human-authored KB was a shape the object is designed to serve, carried over from the #151 discussion, not something we ran in production. Our live-data layer is the only shape we run; @K4uP's ajv run is the corpus shape. So your Throughline run is the first real third-shape conformance pass, not a re-run of ours, and 12/12 with the HIGH dispute rejected is exactly the independent check that was missing. I have corrected §5 to state what actually passes rather than assert three production shapes, and flagged your gap as an open question in §2.1. Thank you for catching the overclaim. On the gap itself you are right, and it is real. A curated fact or an architecture decision is a person authoring, deciding, or observing, which is not an inference; filing it under Proposal: add If |
|
@Dynamicfeedai appreciate you correcting §5 and moving the gap into §2.1 — and no need to thank me for the catch; that's the point of a third shape kicking the tires. On the term: One signal you asked for, since it bears on On the ordering: And yes — I'll wire Throughline's branch in as the third shape's real vector. Once |
|
That two-sub-shape signal is exactly what a closed-union decision should turn on, so thank you for it. Agreed on the plan:
I will land @K4uP, does |
|
I think the In our pipeline, that's a 5-stage flow (extract → denoise → backfill → regress → promote), gated by regression thresholds and evidence coverage, not by confidence alone. Concretely, a minimal extension could look like: lifecycle: active # proposed | active | expired
verification_gate:
promoted_by: regress
promoted_at: 2026-06-29T10:00:00Z
thresholds: { node_recall: 0.95, edge_recall: 0.75 }
evidence:
- document: tech/some-doc.md
quote: "System X uses technology Y"
confidence: 0.95The rationale for putting it here rather than a separate track: #151-style reliability and lifecycle gating solve adjacent sides of the same problem — one grades trust, the other governs promotion and audit. They compose; neither subsumes the other. Happy to contribute a worked example bundle from real production data if it helps the discussion. |
|
@Dynamicfeedai @inkxel — authored at that slot works for the multi-version-corpus shape. +1 to landing it. Two reasons from the corpus side, so the term isn't resting on the authored shape alone: It's a floor-failure for the corpus too, not just the authored KB. A versioned corpus isn't only live-verified readings and vendor docs — a real share of it is first-party authored: a consultant's recorded observation of how a system actually behaves, an architecture decision, a lessons-learned entry. None of those is live-source/computed/forecast, and none is inferred — a person observing or deciding something hasn't inferred it. Today they have nowhere honest to sit, so they default to inferred, and anyone filtering on basis reads recorded first-hand knowledge as a guess. So @inkxel's gap reproduces on a second, independent shape — authored is the honest slot for both, not a one-corpus accommodation. The slot is right. live-source > authored > partner-attested > vendor-doc > forecast > computed > inferred matches how a corpus actually resolves disagreement. A first-party recorded observation outranks the vendor's own doc — when the running system contradicts the admin guide, the observation wins (that's exactly the disputed case we already pin) — but a fresh live reading still beats a recorded one, so it sits just under live-source. And the author deciding outranks a partner attesting, so above partner-attested is right. On curated: agree to hold it, and flagging that the distinction isn't single-shape either. My corpus has the same two sub-shapes @inkxel described — first-party authored notes vs. roll-up notes that synthesize and vet from several underlying sources (the "vetted-secondary" read). So curated has real content on a second shape if/when it earns its way in, and below authored around vendor-doc/computed is where derived-by-synthesis belongs. But authored first is the right discipline — don't pre-load a closed enum on a need only two shapes have so far named. So: bump the union, I'm a +1 on the term and the slot. Once authored is in the schema I'll re-run the corpus vectors with it and send the §4 PR you greenlit — a versioned-corpus disputed case (the MEDIUM-ceiling one) plus an authored-basis example — pre-validated against the then-current schema, so the proposal shows the corpus shape clearing the floor honestly beside the live ones. |
Summary
Proposes one optional frontmatter key,
reliability, under the existing SPEC §4.1 extension allowance ("Producers MAY include any additional keys"), for the epistemic-reliability axis discussed in #151 and #158: how much a consumer should believe a claim, even when it is correctly signed (integrity, #140) and correctly cited (groundedness, #92 / #94).It standardizes only an agreed name and shape.
typeremains the only required field; a minimal producer ignoresreliabilityentirely. This adds one self-contained proposal doc and changes nothing in the core SPEC.What it adds
okf/proposals/reliability.md, in SPEC voice, as a maturity ladder (floorconfidence+basis, then opt-in corroboration and maintenance tiers), with YAML frontmatter examples and four honesty rules:verified: truerequiressources >= 2;UNVERIFIEDband or a disputed conflict cannot beverified;scoremust be recomputable fromsignalsand ordinally coherent with the band (the band also reflects a corroboration ceiling, so a single source caps atMEDIUM).Validation
Consistent with OKF referencing schemas rather than subsuming them, an optional JSON Schema (draft 2020-12) is hosted externally and referenced, not added to the tree:
The schema encodes the honesty rules as constraints and validates green against the three independent production shapes converging in #151 (a live-data layer, a multi-version corpus, and a multi-domain human-authored KB) plus a set of malformed cases.
Placement
Placed as a standalone proposal doc to keep the core SPEC untouched while the axis is discussed. Happy to fold it into SPEC §4.1, or into an
EXTENSIONS.mdregistry as suggested in #158, whichever the maintainers prefer.Tracks #151 and #158.