Context
Natural-language guidance is a soft, cheap experimentation layer. Its usefulness should be observed, not mistaken for deterministic enforcement.
The first real loop test showed both sides:
- approved guidance can influence later output;
- it can also fail to manifest, partially manifest, or conflict with other instructions.
v0.8 should measure this behavior without pretending memory is a contract or gate.
Problem
Approved guidance currently has limited post-run observability:
- operators cannot easily see whether a guidance entry manifested;
- failures may be misread as bugs in the preference layer;
- useful regressions may be missed;
- there is no structured way to learn which kinds of guidance should be promoted into templates, contracts, or gates.
Scope
Add a manifestation report for approved guidance.
For each approved guidance entry in the run snapshot, record an observable status such as:
explicitly_reflected
partially_reflected
contradicted
not_observable
Optionally record supporting notes or artifact references.
Boundary
This report is diagnostic. It must not automatically:
- mutate guidance;
- approve new guidance;
- reject old guidance;
- rewrite the brief;
- turn preference memory into a quality gate.
Acceptance criteria
- Each run can report whether approved guidance from the snapshot appears to manifest.
- Report distinguishes full manifestation, partial manifestation, contradiction, and not-observable cases.
- Report is clearly documented as diagnostic, not enforcement.
- At least one evaluation or fixture demonstrates a guidance entry that is not observable.
- The report can inform future promotion of stable preferences into templates/contracts/gates, but does not do that automatically.
Non-goals
- No autonomous memory updates.
- No preference compiler.
- No hard guarantee that natural-language guidance will be followed.
- No current-run hot reload.
- No automatic quality score.
Context
Natural-language guidance is a soft, cheap experimentation layer. Its usefulness should be observed, not mistaken for deterministic enforcement.
The first real loop test showed both sides:
v0.8 should measure this behavior without pretending memory is a contract or gate.
Problem
Approved guidance currently has limited post-run observability:
Scope
Add a manifestation report for approved guidance.
For each approved guidance entry in the run snapshot, record an observable status such as:
explicitly_reflectedpartially_reflectedcontradictednot_observableOptionally record supporting notes or artifact references.
Boundary
This report is diagnostic. It must not automatically:
Acceptance criteria
Non-goals