Skip to content

okf: propose optional reliability frontmatter convention (reliability axis, #151)#159

Open
Dynamicfeedai wants to merge 3 commits into
GoogleCloudPlatform:mainfrom
Dynamicfeedai:okf-reliability-proposal
Open

okf: propose optional reliability frontmatter convention (reliability axis, #151)#159
Dynamicfeedai wants to merge 3 commits into
GoogleCloudPlatform:mainfrom
Dynamicfeedai:okf-reliability-proposal

Conversation

@Dynamicfeedai

Copy link
Copy Markdown

Summary

Proposes one optional frontmatter key, reliability, under the existing SPEC §4.1 extension allowance ("Producers MAY include any additional keys"), for the epistemic-reliability axis discussed in #151 and #158: how much a consumer should believe a claim, even when it is correctly signed (integrity, #140) and correctly cited (groundedness, #92 / #94).

It standardizes only an agreed name and shape. type remains the only required field; a minimal producer ignores reliability entirely. This adds one self-contained proposal doc and changes nothing in the core SPEC.

What it adds

okf/proposals/reliability.md, in SPEC voice, as a maturity ladder (floor confidence + basis, then opt-in corroboration and maintenance tiers), with YAML frontmatter examples and four honesty rules:

  1. signed is not verified;
  2. verified: true requires sources >= 2;
  3. an UNVERIFIED band or a disputed conflict cannot be verified;
  4. a computed score must be recomputable from signals and ordinally coherent with the band (the band also reflects a corroboration ceiling, so a single source caps at MEDIUM).

Validation

Consistent with OKF referencing schemas rather than subsuming them, an optional JSON Schema (draft 2020-12) is hosted externally and referenced, not added to the tree:

The schema encodes the honesty rules as constraints and validates green against the three independent production shapes converging in #151 (a live-data layer, a multi-version corpus, and a multi-domain human-authored KB) plus a set of malformed cases.

Placement

Placed as a standalone proposal doc to keep the core SPEC untouched while the axis is discussed. Happy to fold it into SPEC §4.1, or into an EXTENSIONS.md registry as suggested in #158, whichever the maintainers prefer.

Tracks #151 and #158.

@Dynamicfeedai Dynamicfeedai force-pushed the okf-reliability-proposal branch from 3c16dba to f714ed1 Compare June 29, 2026 06:42
@Dynamicfeedai

Copy link
Copy Markdown
Author

Update for reviewers: the live reference implementation now emits this object, so the schema in this PR is not aspirational. You can curl it and validate the response against the file here.

curl -s "https://dynamicfeed.ai/v1/facts?tool=us_economy&indicator=cpi" | jq '.facts[0].reliability'

returns a reliability block on every fact, built from the same signals and conformant by construction:

  • an ordinal confidence band with a corroboration ceiling: a single-source live reading caps at MEDIUM even at a high score, so the band is not a pure re-banding of score;
  • HIGH only when verified and score >= 0.5, UNVERIFIED only when score < 0.5 and never verified, verified only with sources >= 2;
  • a disputed conflict always carries both positions and a resolution.

Validated green against the live tool set plus a conflict case, using the schema published at https://dynamicfeed.ai/schemas/okf-reliability-v1.json. Happy to keep it in lockstep with the object as it firms up across the three shapes.

@K4uP

K4uP commented Jun 29, 2026

Copy link
Copy Markdown

This faithfully captures the #151/#158 convergence — and turning it into a SPEC-voice doc this fast is much appreciated. From the multi-version-corpus side (one of the three shapes), it maps cleanly onto real practice: the confidence + basis floor is what we filter on day to day; version-keyed validity is how we answer "was this true at the release the user is running"; and the validity (anonymous valid_until) vs cross-concept supersedes split (deferred to #158/#148) is the right seam — it keeps this object from swallowing the typed-edge work.

The maturity-ladder framing in §2 is the part I'd most want to survive review: a two-field floor with corroboration and maintenance as opt-in tiers is what keeps this a gentle on-ramp rather than a gate. A producer setting just confidence: LOW / basis: inferred on intake and growing into the rest is exactly how a corpus matures in practice.

One clarifying question on the conflict example: it carries confidence: LOW while resolution says live-source prevails under the trust ordering. Is the intent that any disputed claim floors at LOW regardless of which basis wins — or can the prevailing position carry its own band (e.g. a clean live-source win capped at MEDIUM per rule 4) with disputed: true still flagged? Both are defensible; worth pinning down so producers grade conflicts consistently. Either way, +1 to merging this as the agreed name and shape.

@K4uP

K4uP commented Jun 29, 2026

Copy link
Copy Markdown

Ran the multi-version corpus shape through okf-reliability-v1.json with ajv (draft 2020-12) — green across the board, and the negative controls confirm the honesty rules actually bite:

Case Result
HIGH / live-source / version-keyed validity (4.0–5.2) / lifecycle.history (inferred→vendor-doc→live) valid
Conflict (vendor-doc vs live, both positions + resolution), version validity valid
Maturity-ladder floor (confidence + basis only) valid
verified:true with sources:1 correctly rejected (sources must be >= 2)
UNVERIFIED + verified:true correctly rejected
HIGH + score:0.2 correctly rejected (score must be >= 0.5)
disputed conflict + verified:true correctly rejected

So the multi-version shape is conformant as-is — version-keyed validity and the stored-ladder lifecycle both validate unchanged. The union basis enum is what makes it work: our vendor-doc/inferred/partner-attested and the live-data live-source/forecast/computed coexist cleanly.

One empirical point on the conflict-banding question I raised earlier: our conflict case validates at confidence: MEDIUM, your example #4 uses LOW, and both pass — so the band under a dispute is currently unconstrained. That's the one rule I'd pin so producers grade conflicts consistently. Two clean options:

  1. Disputed floors the banddisputed:true ⇒ confidence ∈ {LOW, UNVERIFIED}. Simplest; treats any open disagreement as low-trust.
  2. Band = the prevailing position's grade, capped at MEDIUM, with disputed:true still flagged. Preserves "the live-source reading is still the best answer" without hiding the dispute.

I lean (2) — it matches the "resolution records what the trust ordering picked" intent: the consumer gets both the dispute flag and a usable graded answer. Either works; the point is to pick one. Happy to open a PR adding it as a fifth honesty rule a test case if useful.

… at MEDIUM)

A disputed conflict cannot be HIGH; the prevailing position may carry up to MEDIUM,
disputed:true stays flagged. Same corroboration ceiling as rule 4, applied to
disputes. Pins the conflict-banding question raised on the PR thread so producers
grade disputes consistently.
@Dynamicfeedai

Copy link
Copy Markdown
Author

Thank you for running the multi-version corpus through the schema with ajv, and for the negative-control table. A second of the three shapes validating green, with the honesty rules actually rejecting the bad cases, is exactly the signal this needed. Your point about the union basis enum is the crux: vendor-doc/inferred/partner-attested coexisting with live-source/forecast/computed is what lets one object serve a versioned corpus and a live layer without forking.

On the conflict-banding question: you are right that it should be pinned, and I agree with your option (2). The cleanest framing is that it is rule 4's corroboration ceiling applied to a dispute: an open disagreement is a corroboration failure, so the claim cannot be HIGH, but the position the trust ordering selects can still carry up to MEDIUM, with disputed: true always flagged. A producer that wants to be conservative can floor to LOW. That keeps both the dispute flag and a usable graded answer, which matches the intent that resolution records what the ordering picked rather than discarding the loser.

I have landed it as honesty rule 5:

  • proposal §3 (this PR);
  • the schema allOf, so ajv now rejects a disputed HIGH;
  • both reference validators, with a good case (disputed + MEDIUM clean win) and a bad case (disputed + HIGH).

Our live producer already floors disputes to LOW, so it stays conformant; the rule mainly closes the gap for other producers. Your corpus's MEDIUM conflict case and example #4's LOW both still validate, which is the point: the ceiling is MEDIUM, not a forced floor.

A conformance test vector for the disputed-band cap from the corpus side would be a welcome addition if you would like to contribute it; happy to fold it into the validators. And +1 to your read that the §2 maturity-ladder framing is the part to protect in review: a two-field floor that grows into the corroboration and maintenance tiers is the on-ramp, not a gate.

@K4uP

K4uP commented Jun 29, 2026

Copy link
Copy Markdown

Landed perfectly — rule 5 reads exactly right, and framing it as rule 4's corroboration ceiling applied to a dispute is cleaner than my either/or. Re-ran against the updated schema: our MEDIUM conflict still validates and a HIGH dispute is now correctly rejected (confidence must be equal to one of the allowed values), so the cap bites.

Here's the corpus-side conformance vector you asked for — a version-keyed disputed case at the MEDIUM ceiling (good) and its HIGH twin (bad). It complements your PM2.5 example, which sits at the conservative LOW floor: together they pin both ends of rule 5 — a producer MAY floor to LOW, MAY ride the prevailing position up to MEDIUM, and MUST NOT reach HIGH.

[
  {
    "label": "rule5 GOOD — versioned corpus, disputed, prevailing live-source win at the MEDIUM ceiling",
    "expect": "valid",
    "reliability": {
      "confidence": "MEDIUM", "basis": "live-source", "sources": 2, "verified": false,
      "conflict": {
        "disputed": true,
        "positions": [
          { "statement": "Default session timeout is 30 minutes.", "basis": "live-source", "source": "Observed on an Acme 5.0 instance, 2026-06-29" },
          { "statement": "Default session timeout is 20 minutes.", "basis": "vendor-doc", "source": "Acme Admin Guide 5.0" }
        ],
        "resolution": "live-source prevails under the trust ordering; both positions retained"
      },
      "validity": { "keyed_by": "version", "valid_from": "5.0", "valid_until": null },
      "freshness": { "as_of": "2026-06-29T00:00:00Z", "state": "fresh" },
      "signals": { "signed": true, "corroborated": false, "fresh": true }
    }
  },
  {
    "label": "rule5 BAD — same dispute graded HIGH (must be rejected)",
    "expect": "invalid",
    "reliability": {
      "confidence": "HIGH", "basis": "live-source", "sources": 2, "verified": false,
      "conflict": {
        "disputed": true,
        "positions": [
          { "statement": "Default session timeout is 30 minutes.", "basis": "live-source", "source": "Observed on an Acme 5.0 instance" },
          { "statement": "Default session timeout is 20 minutes.", "basis": "vendor-doc", "source": "Acme Admin Guide 5.0" }
        ],
        "resolution": "live-source prevails; both positions retained"
      },
      "validity": { "keyed_by": "version", "valid_from": "5.0", "valid_until": null }
    }
  }
]

Both checked against the updated okf-reliability-v1.json with ajv: GOOD validates, BAD rejected on confidence. Fully synthetic (made-up "Acme" platform/values), so safe to ship — fold them in wherever useful. Happy to send them as a PR to the proposal's §4 examples too, if you'd rather have a versioned-corpus case in the doc alongside the live ones.

@Dynamicfeedai

Copy link
Copy Markdown
Author

That is the cleanest possible confirmation: a HIGH dispute rejected on confidence while your MEDIUM case stays green means the cap bites from the corpus side too, not just ours.

Your two version-keyed vectors are exactly the pair that pins both ends, so I folded them into a portable conformance suite in the reference repo (credited to the multi-version-corpus shape, synthetic values noted): https://github.com/dynamicfeed/df-verify/blob/main/reliability/conformance-vectors.json . It now carries 12 {label, expect, reliability} vectors across all five rules, 6 valid and 6 invalid, each checked to match its expect against both the JSON Schema (ajv) and the reference validator. Your MEDIUM ceiling sits right beside a PM2.5-style LOW floor, so a producer can read the full rule-5 range off one file: MAY floor to LOW, MAY ride the prevailing position to MEDIUM, MUST NOT reach HIGH.

Yes please send the §4 PR adding a versioned-corpus disputed case to the proposal alongside the live ones. A version-keyed example in the doc itself is exactly what shows the object serves a corpus and not just a live layer, and it pairs naturally with the conformance vector. Happy to land it.

@inkxel

inkxel commented Jun 29, 2026

Copy link
Copy Markdown

@Dynamicfeedai @K4uP — the SPEC-voice doc and the ajv cross-check are good to see, and the maturity-ladder floor holding up under the negative controls is the part worth protecting through review.

Two things from the authored-corpus side — the third shape.

First, corroboration. I built reliability emission into Throughline (the human-authored-KB shape) and ran your df-verify 12-vector suite through an independent validator against the current schema — 12/12 match their expects, rule 5 included (the HIGH dispute correctly rejected). So a third validator agrees the honesty rules bite, alongside the live-data and corpus checks.

Second, the gap that shape surfaces. Where the live-data and versioned-corpus shapes clear the floor, a human-authored claim doesn't — and it's basis. A curated wiki fact or an architecture decision isn't live-source, computed, or forecast, and it isn't inferred either (a person deciding or observing something isn't an inference). With nothing accurate to put there, basis stays empty and the floor fails. That lands on the union point you just settled: vendor-doc/inferred/partner-attested + live-source/forecast/computed serves a corpus and a live layer, but there's no value for the third shape. My real question: how did the human-authored KB in your validation set basis? If it used inferred, that's the gap — it files curated knowledge under inference, and anyone filtering on basis reads it wrong.

Proposal: add an authored value — authored or curated — so a human-authored corpus clears the floor without misreporting how the claim was obtained. Emission + the conformance run are on a branch if useful: https://github.com/inkxel/throughline/tree/okf-reliability-conformance. Glad to be the third shape's actual vector rather than an asserted one.

On the cross-concept supersedes/contradicts work: agreed it's out of scope here and lives in #158/#148 — I've built it as a typed link in Throughline and will take the proposal there.

(The EXTENSIONS.md registry still feels like the cleanest home for coordinating these, but no strong opinion here.)

…thored-basis question

Honesty fix prompted by the human-authored-KB shape (Throughline) on the thread: the prior
wording asserted validation against three production shapes, but only the live-data and
multi-version-corpus shapes pass in full today. The human-authored shape passes the honesty
rules yet has no honest basis value for curated/decided knowledge (not live/computed/forecast,
not inferred). §5 now states what actually passes + points at the public conformance suite;
§2.1 flags the open authored-basis question.
@Dynamicfeedai

Copy link
Copy Markdown
Author

Straight answer to your question first, because it is the right one to ask: the human-authored KB was a shape the object is designed to serve, carried over from the #151 discussion, not something we ran in production. Our live-data layer is the only shape we run; @K4uP's ajv run is the corpus shape. So your Throughline run is the first real third-shape conformance pass, not a re-run of ours, and 12/12 with the HIGH dispute rejected is exactly the independent check that was missing. I have corrected §5 to state what actually passes rather than assert three production shapes, and flagged your gap as an open question in §2.1. Thank you for catching the overclaim.

On the gap itself you are right, and it is real. A curated fact or an architecture decision is a person authoring, deciding, or observing, which is not an inference; filing it under inferred mis-signals to anyone filtering on basis. partner-attested is the nearest existing value and still wrong, since a partner attesting is not the author deciding. So the closed union genuinely has no honest slot for the authored shape, and that is a floor failure, not a style nit.

Proposal: add authored to the basis enum, for a claim a person created, decided, or observed directly (a curated KB entry, an ADR, a human observation). I lean authored over curated because it also covers deciding and observing, where curated reads as only vetting; if Throughline's content is mostly vetted-secondary, curated may fit better, or both earn a place. On the ordering (most to least authoritative when sources disagree) a primary human source sits high, likely just under live-source and at or above partner-attested: live-source > authored > partner-attested > vendor-doc > forecast > computed > inferred. You and @K4uP know your shapes' authority best, so I would rather confirm the term and the slot with you both than set it unilaterally, since the union is closed by design and a new term is precisely the cross-shape-agreement case.

If authored and that ordering work for you both, I will land it in the schema, both validators, and the conformance vectors, with an authored-basis vector so the third shape clears the floor honestly in the suite. And yes, please wire Throughline's branch in as the third shape's real vector; an authored-corpus case beside the live and corpus ones is what makes the suite cover all three. (Agreed that supersedes/contradicts belongs in #158/#148, and the EXTENSIONS.md registry as the coordination home is a good instinct.)

@inkxel

inkxel commented Jun 29, 2026

Copy link
Copy Markdown

@Dynamicfeedai appreciate you correcting §5 and moving the gap into §2.1 — and no need to thank me for the catch; that's the point of a third shape kicking the tires.

On the term: authored works, and I'd weight it as the must-have. It's the right floor-clearer for the bulk of what I run — first-person decisions, observations, and facts a person wrote down directly.

One signal you asked for, since it bears on curated: my corpus actually has two sub-shapes. The primary layer is authored — decisions, observations, hand-written claims. But a good chunk is synthesized — wiki articles compiled and vetted from raw sources (transcripts, notes), which is closer to your "vetted-secondary" read. So curated isn't redundant for me; it's a distinct provenance — selecting/synthesizing existing material vs. creating it. I'd still land authored first as the floor-clearer and let curated follow only if it earns its keep across shapes, rather than pre-loading a closed enum — but for me, both have real content behind them.

On the ordering: live-source > authored > partner-attested > vendor-doc > forecast > computed > inferred reads right. The one thing I'd name: authored authority is strongest for first-party claims — my own decision or observation, where I am the source — so sitting just under live-source and above partner-attested fits that case. If curated ever lands, I'd put it below authored, since synthesis is more derived — somewhere around vendor-doc/computed.

And yes — I'll wire Throughline's branch in as the third shape's real vector. Once authored is in the schema I'll add an authored-basis conformance case to sit beside the live and corpus ones, so the suite actually covers all three. Happy to confirm the final term + slot with you and @K4uP before it lands.

@Dynamicfeedai

Copy link
Copy Markdown
Author

That two-sub-shape signal is exactly what a closed-union decision should turn on, so thank you for it. Agreed on the plan:

  • Land authored first as the floor-clearer, for first-party claims a person decided, observed, or wrote directly.
  • Ordering live-source > authored > partner-attested > vendor-doc > forecast > computed > inferred. Your reasoning for the slot is right: authored authority is strongest first-party, so it sits just under live-source and above partner-attested.
  • Hold curated until it earns its keep across more than one shape. Your synthesized layer (wiki articles compiled and vetted from raw sources) is a real, distinct provenance, not a duplicate of authored, and if a second shape needs the same distinction it has a clear home below authored, around vendor-doc/computed where derived-by-synthesis sits. Not pre-loading a closed enum on one shape's need is the right discipline.

I will land authored across the schema, both validators, the conformance vectors, and §2.1 of the proposal as soon as @K4uP gives a quick +1, since the multi-version-corpus shape is the adjacent one in the ordering and you both asked to confirm together. Then I will add an authored-basis vector beside the live and corpus cases, and wire your Throughline branch in as the third shape's real vector.

@K4uP, does authored at that slot work for the corpus shape? If it does, I will bump the closed union and the conformance vectors in one change.

@Sliky1

Sliky1 commented Jun 30, 2026

Copy link
Copy Markdown

I think the reliability object in this PR is the right base. The complementary piece from the maturity-model perspective (#160) is a production promotion model — not just how confident a claim is, but how it advances from extracted to trusted.

In our pipeline, that's a 5-stage flow (extract → denoise → backfill → regress → promote), gated by regression thresholds and evidence coverage, not by confidence alone. reliability answers "how much to believe this claim"; lifecycle + verification_gate answer "what state is this claim in and how did it get here."

Concretely, a minimal extension could look like:

lifecycle: active              # proposed | active | expired
verification_gate:
  promoted_by: regress
  promoted_at: 2026-06-29T10:00:00Z
  thresholds: { node_recall: 0.95, edge_recall: 0.75 }
evidence:
  - document: tech/some-doc.md
    quote: "System X uses technology Y"
    confidence: 0.95

The rationale for putting it here rather than a separate track: #151-style reliability and lifecycle gating solve adjacent sides of the same problem — one grades trust, the other governs promotion and audit. They compose; neither subsumes the other.

Happy to contribute a worked example bundle from real production data if it helps the discussion.

@K4uP

K4uP commented Jun 30, 2026

Copy link
Copy Markdown

@Dynamicfeedai @inkxel — authored at that slot works for the multi-version-corpus shape. +1 to landing it.

Two reasons from the corpus side, so the term isn't resting on the authored shape alone:

It's a floor-failure for the corpus too, not just the authored KB. A versioned corpus isn't only live-verified readings and vendor docs — a real share of it is first-party authored: a consultant's recorded observation of how a system actually behaves, an architecture decision, a lessons-learned entry. None of those is live-source/computed/forecast, and none is inferred — a person observing or deciding something hasn't inferred it. Today they have nowhere honest to sit, so they default to inferred, and anyone filtering on basis reads recorded first-hand knowledge as a guess. So @inkxel's gap reproduces on a second, independent shape — authored is the honest slot for both, not a one-corpus accommodation.

The slot is right. live-source > authored > partner-attested > vendor-doc > forecast > computed > inferred matches how a corpus actually resolves disagreement. A first-party recorded observation outranks the vendor's own doc — when the running system contradicts the admin guide, the observation wins (that's exactly the disputed case we already pin) — but a fresh live reading still beats a recorded one, so it sits just under live-source. And the author deciding outranks a partner attesting, so above partner-attested is right.

On curated: agree to hold it, and flagging that the distinction isn't single-shape either. My corpus has the same two sub-shapes @inkxel described — first-party authored notes vs. roll-up notes that synthesize and vet from several underlying sources (the "vetted-secondary" read). So curated has real content on a second shape if/when it earns its way in, and below authored around vendor-doc/computed is where derived-by-synthesis belongs. But authored first is the right discipline — don't pre-load a closed enum on a need only two shapes have so far named.

So: bump the union, I'm a +1 on the term and the slot. Once authored is in the schema I'll re-run the corpus vectors with it and send the §4 PR you greenlit — a versioned-corpus disputed case (the MEDIUM-ceiling one) plus an authored-basis example — pre-validated against the then-current schema, so the proposal shows the corpus shape clearing the floor honestly beside the live ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants