feat(observability): add structured digest.* event family to discussion-digest, the unobserved hot-path LLM call

## Finding

`src/workflows/discussion-digest.ts` is the **only** LLM-touching subsystem in the bot that ships without a structured `event: "..."` log family, and it runs on the hot path of every comment-aware workflow (`triage`, `plan`, `implement`, `review`, `resolve`, `remember`, see `src/workflows/handlers/triage.ts:115`, `plan.ts:41`, `implement.ts:54`, `review.ts:62`, `resolve.ts:100`, `remember.ts:47`). Triage, dispatcher, retry, secret-guard, MCP, fleet, and scheduler each get their own table in `docs/operate/observability.md` (see `## Dispatcher log fields` at line 135 and `## Retry log fields` at line 83); digest does not. The only digest emissions in the entire file are five free-form strings (`src/workflows/discussion-digest.ts:329` LLM failure, `:334` structured-output rejection, `:360` directives dropped, `:411` map-reduce start, `:567` fetch failure). No completion event exists, so a successful digest call is invisible to operators.

This silence hides four operationally relevant signals. **(1) Skip rate.** `buildDiscussionDigest` returns `{ ok: false, reason: "no-comments" }` at `src/workflows/discussion-digest.ts:380` when every comment was authored by a bot, with no log line. An operator who sees the workflow run without a "Maintainer guidance" section in the prompt cannot distinguish "no human commented" from "LLM failed" from "all owner directives got dropped by `enforceOwnerDirectives` at `:351`". **(2) Token usage.** `runDigestCall` at `:313` discards `response.usage` even though `LLMUsage { inputTokens, outputTokens }` is on the response object (`src/ai/llm-client.ts:158`). The digest is wired to `config.digestModel` (default Sonnet 4.6) and runs on every structured workflow event, so it materially contributes to spend — yet a fleet operator cannot answer "how much is the digest costing me per workflow" without instrumenting it ad-hoc. **(3) Latency.** No `latencyMs` field is captured for any of the 1-to-N LLM calls in single-pass or map-reduce; under load this is the prime suspect when a workflow's perceived start-to-first-comment delay drifts. **(4) Trust-boundary effectiveness.** `enforceOwnerDirectives` (`:351`) only logs when `dropped > 0`. The `dropped: 0` case — the steady-state, attestation-worthy event — is unobservable, so an attacker successfully extracting a directive is the same log shape as no attack at all.

The fix is a small, contained structured-event family (`digest.skipped`, `digest.call.completed`, `digest.completed`, `digest.failed`) emitted from the existing chokepoints, mirroring the pattern documented at `docs/operate/observability.md:135` (dispatcher) and `:83` (retry). No new dependencies; `response.usage` already carries the token counts, `parseStructuredResponse(...).strategy` (`src/ai/structured-output.ts:57`) already carries strict-vs-tolerant for free.

## Diagram

```mermaid
flowchart TD
 Caller["Workflow handler triage / plan / implement review / resolve / remember"]
 Fetch["fetchAndBuildDigest discussion-digest.ts:541"]
 Build["buildDiscussionDigest discussion-digest.ts:370"]
 Classify["classifyComments partitions owner/other/bot"]
 Skip["humanCount === 0 return no-comments discussion-digest.ts:378"]
 Pack["estimate tokens chunk if over budget"]
 Single["Single-pass LLM call runDigestCall :313"]
 Map["Map: N LLM calls"]
 Reduce["Reduce: 1 LLM call"]
 Enforce["enforceOwnerDirectives drop forged authors :351"]
 Done["return ok=true"]

 Caller --> Fetch
 Fetch -.catch warn.-> FetchFail["log.warn comment fetch failed :567"]
 Fetch --> Build
 Build --> Classify
 Classify --> Skip
 Classify --> Pack
 Pack -- one chunk --> Single
 Pack -- many chunks --> Map
 Map --> Reduce

 Single -.catch warn.-> CallFail["log.warn LLM call failed :329 log.warn parse rejected :334"]
 Map -.catch warn.-> CallFail
 Reduce -.catch warn.-> CallFail

 Single --> Enforce
 Reduce --> Enforce
 Map -.info: chunkCount only.-> MapLog["log.info map-reduce :411"]
 Enforce -.only when dropped greater than 0.-> Drop["log.warn dropped directives :360"]
 Enforce --> Done

 classDef gap fill:#922b21,color:#ffffff,stroke:#641e16
 classDef ok fill:#196f3d,color:#ffffff,stroke:#0e4d2a
 classDef path fill:#1f4e79,color:#ffffff,stroke:#11314d

 class Skip,Done,Single,Map,Reduce gap
 class FetchFail,CallFail,Drop,MapLog ok
 class Caller,Fetch,Build,Classify,Pack,Enforce path
```

Red nodes = chokepoints emitting no structured event today. Green nodes = the five existing free-form warn/info call sites. Blue = unobserved code path.

## Rationale

The digest is on the **forced** prefix of six structured workflows. Each workflow event therefore costs ≥1 digest LLM call (single-pass) or N+1 calls (map-reduce on large threads). With `config.digestModel` defaulting to Sonnet 4.6, the digest can comfortably account for 5-30% of a workflow's per-event spend, and it is the dominant contributor to the wall-clock between webhook receipt and first agent token on a comment-heavy thread (the synchronous `await fetchAndBuildDigest(...)` blocks `postStartingComment`, see e.g. `src/workflows/handlers/triage.ts:115-127`). Yet none of these costs are pin-pointable from logs today.

This matters for three operational scenarios the project already invests in. **(a)** The existing token-usage instrumentation that `feat(observability): log + persist SDK token usage on executions` (commit `5407dcd`) added persists `input_tokens`/`output_tokens` for the **agent** SDK call only; the digest's tokens roll up under nothing. An operator running `SELECT sum(input_tokens) FROM executions WHERE workflow = 'triage'` undercounts true triage spend by the digest portion. **(b)** Issue `feat(observability): add scheduler.scan.* lifecycle events` (open, area: observability) explicitly models the pattern this proposal would reuse — a tick-and-summary event family — so adding `digest.*` keeps the codebase's observability surface uniform rather than divergent. **(c)** `enforceOwnerDirectives` is a load-bearing security boundary (`CLAUDE.md` § *Comment-aware workflows* — "Directives are re-checked post-parse against the classified owner authors, so the boundary does not depend on the model"). The current `dropped > 0`-only log means an SRE has no positive-control evidence the boundary fires when it should, and no baseline to alert on a sudden spike in drops (which is the signature of a prompt-injection or model-regression event the design exists to catch).

The implementation cost is small: ~15 lines of new logging in `discussion-digest.ts`, one event-key constants block, one Zod discriminated union pinning the field shape (mirror `src/orchestrator/log-fields.ts:55#DispatcherOfferLogSchema`), and a new `## Discussion digest log fields` table in `docs/operate/observability.md`. No schema, no new dep, no behaviour change. Compare: the dispatcher `digest.*` analogue (`feat(observability): structured dispatcher.offer.{sent,accepted,rejected,timed_out} events`, closed) was a similarly contained PR.

## References

**Internal**:

- `src/workflows/discussion-digest.ts:313` — `runDigestCall`, the LLM-call chokepoint that drops `response.usage` and emits no completion event.
- `src/workflows/discussion-digest.ts:351` — `enforceOwnerDirectives`, the trust boundary that only logs the `dropped > 0` minority case.
- `src/workflows/discussion-digest.ts:370` — `buildDiscussionDigest`, where the `no-comments` return at `:380` is silent.
- `src/ai/llm-client.ts:158` — `LLMResponse.usage: LLMUsage` carrying `inputTokens` + `outputTokens`, already populated, currently discarded by the digest call site.
- `src/ai/structured-output.ts:57` — `StructuredResult.strategy: "strict" | "tolerant"`, the same JSON-quality signal the closed `triage response recovered via tolerant parser` line at `src/orchestrator/triage.ts:356` already surfaces; digest discards it.
- `docs/operate/observability.md:135` — `## Dispatcher log fields`, the pattern to mirror (event-key constants + discriminated union + test).
- `src/orchestrator/log-fields.ts:55#DispatcherOfferLogSchema` — the strict-Zod-per-event-shape template.
- `src/workflows/handlers/triage.ts:115`, `plan.ts:41`, `implement.ts:54`, `review.ts:62`, `resolve.ts:100`, `remember.ts:47` — the six consumers that synchronously block on the un-observed `await fetchAndBuildDigest(...)`.

**External**:

- [OpenTelemetry Semantic Conventions for GenAI events](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-events/) — the standard schema for `gen_ai.client.token.usage`, `gen_ai.request.duration`, and per-call event attributes the proposed `digest.*` events would be a Pino-side analogue of.
- [LaunchDarkly — LLM Observability: Tutorial & Best Practices](https://launchdarkly.com/blog/llm-observability/) — covers the per-call latency-p50/p90/p99 + token-usage + cost-attribution trio this proposal makes recoverable for the digest path.

## Suggested Next Steps

1. Add `src/workflows/digest-log-fields.ts` with a `DIGEST_LOG_EVENTS` constant (`"digest.skipped" | "digest.call.completed" | "digest.completed" | "digest.failed"`) and a `z.discriminatedUnion("event", ...)` schema mirroring `src/orchestrator/log-fields.ts:55#DispatcherOfferLogSchema`. Co-locate a test that fails CI on event-key drift.
2. In `runDigestCall` (`src/workflows/discussion-digest.ts:313`), capture `performance.now()` deltas and emit `digest.call.completed` with `{ phase: "extract" | "reduce", input_tokens, output_tokens, latency_ms, strategy }` on the success path, where `strategy` is `parsed.strategy`. Reuse `response.usage.inputTokens` / `response.usage.outputTokens` from the existing `client.create` result.
3. In `buildDiscussionDigest` (`:370`), emit `digest.skipped` at the `humanCount === 0` early-return (`:378`) with `{ comment_counts: { owner, other, bot } }`, and `digest.completed` at the final success return with `{ chunks, total_latency_ms, directives_kept, directives_dropped, has_prior_bot_output, untrusted_context_count, conversation_summary_chars }`.
4. In `fetchAndBuildDigest` (`:541`), emit `digest.failed` at the fetch catch (`:567`) and on every `ok: false` propagated return, with the existing `DigestResult.reason` enum (`no-comments | llm-error | parse-error`) carried as a field.
5. Add a `## Discussion digest log fields` section to `docs/operate/observability.md` documenting the four event keys, fields, and at least one suggested alert (e.g. `count(event = "digest.failed" AND reason = "parse-error") by repo` over 15-minute windows as a model-quality regression alarm).
6. (Optional follow-up issue) Persist digest token counts to `executions.input_tokens` / `output_tokens` by summing the agent SDK and digest counters at workflow-end so the existing `SELECT sum(input_tokens) FROM executions` queries account for digest spend without per-handler arithmetic.

## Areas Evaluated

- Read `src/workflows/discussion-digest.ts` end-to-end (629 lines) and inventoried all five log call sites.
- Verified every consumer handler (`triage`, `plan`, `implement`, `review`, `resolve`, `remember`) calls `fetchAndBuildDigest` synchronously before any user-facing comment is posted, via `grep -rn "fetchAndBuildDigest"` against `src/`.
- Cross-checked the existing observability surface in `docs/operate/observability.md` (sections at lines 83, 110, 122, 135, 151, 163, 220), `src/orchestrator/log-fields.ts` (dispatcher), `src/utils/retry-log-fields.ts` (retry), `src/orchestrator/triage.ts` (triage) — none cover discussion-digest.
- Confirmed `LLMResponse.usage` is already populated by `src/ai/llm-client.ts:158`, and `parseStructuredResponse(...).strategy` is already in scope (`src/ai/structured-output.ts:57`), so adding the proposed fields requires no upstream-API changes.
- Reviewed every open + closed `research`-labelled issue (29 total): no overlap with this proposal. The closest neighbours (`feat(observability): structured dispatcher.offer.*`, closed; `feat(observability): periodic fleet-state gauge snapshot`, closed; `feat(observability): add scheduler.scan.* lifecycle events`, open) cover different chokepoints.
- Verified `git log --oneline -20`: the most recent observability commit (`6713cbf feat(observability): add structured retry.* events`) follows the exact event-family pattern this proposal would extend to the digest, confirming the codebase is actively converging on structured `noun.verb` events.

*Generated by the scheduled research action on 2026-06-13*

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(observability): add structured digest.* event family to discussion-digest, the unobserved hot-path LLM call #228

Finding

Diagram

Rationale

References

Suggested Next Steps

Areas Evaluated

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

feat(observability): add structured digest.* event family to discussion-digest, the unobserved hot-path LLM call #228

Description

Finding

Diagram

Rationale

References

Suggested Next Steps

Areas Evaluated

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions