You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
src/workflows/discussion-digest.ts is the only LLM-touching subsystem in the bot that ships without a structured event: "..." log family, and it runs on the hot path of every comment-aware workflow (triage, plan, implement, review, resolve, remember, see src/workflows/handlers/triage.ts:115, plan.ts:41, implement.ts:54, review.ts:62, resolve.ts:100, remember.ts:47). Triage, dispatcher, retry, secret-guard, MCP, fleet, and scheduler each get their own table in docs/operate/observability.md (see ## Dispatcher log fields at line 135 and ## Retry log fields at line 83); digest does not. The only digest emissions in the entire file are five free-form strings (src/workflows/discussion-digest.ts:329 LLM failure, :334 structured-output rejection, :360 directives dropped, :411 map-reduce start, :567 fetch failure). No completion event exists, so a successful digest call is invisible to operators.
This silence hides four operationally relevant signals. (1) Skip rate.buildDiscussionDigest returns { ok: false, reason: "no-comments" } at src/workflows/discussion-digest.ts:380 when every comment was authored by a bot, with no log line. An operator who sees the workflow run without a "Maintainer guidance" section in the prompt cannot distinguish "no human commented" from "LLM failed" from "all owner directives got dropped by enforceOwnerDirectives at :351". (2) Token usage.runDigestCall at :313 discards response.usage even though LLMUsage { inputTokens, outputTokens } is on the response object (src/ai/llm-client.ts:158). The digest is wired to config.digestModel (default Sonnet 4.6) and runs on every structured workflow event, so it materially contributes to spend — yet a fleet operator cannot answer "how much is the digest costing me per workflow" without instrumenting it ad-hoc. (3) Latency. No latencyMs field is captured for any of the 1-to-N LLM calls in single-pass or map-reduce; under load this is the prime suspect when a workflow's perceived start-to-first-comment delay drifts. (4) Trust-boundary effectiveness.enforceOwnerDirectives (:351) only logs when dropped > 0. The dropped: 0 case — the steady-state, attestation-worthy event — is unobservable, so an attacker successfully extracting a directive is the same log shape as no attack at all.
The fix is a small, contained structured-event family (digest.skipped, digest.call.completed, digest.completed, digest.failed) emitted from the existing chokepoints, mirroring the pattern documented at docs/operate/observability.md:135 (dispatcher) and :83 (retry). No new dependencies; response.usage already carries the token counts, parseStructuredResponse(...).strategy (src/ai/structured-output.ts:57) already carries strict-vs-tolerant for free.
Diagram
flowchart TD
Caller["Workflow handler<br/>triage / plan / implement<br/>review / resolve / remember"]
Fetch["fetchAndBuildDigest<br/>discussion-digest.ts:541"]
Build["buildDiscussionDigest<br/>discussion-digest.ts:370"]
Classify["classifyComments<br/>partitions owner/other/bot"]
Skip["humanCount === 0<br/>return no-comments<br/>discussion-digest.ts:378"]
Pack["estimate tokens<br/>chunk if over budget"]
Single["Single-pass LLM call<br/>runDigestCall :313"]
Map["Map: N LLM calls"]
Reduce["Reduce: 1 LLM call"]
Enforce["enforceOwnerDirectives<br/>drop forged authors :351"]
Done["return ok=true"]
Caller --> Fetch
Fetch -.catch warn.-> FetchFail["log.warn comment fetch failed<br/>:567"]
Fetch --> Build
Build --> Classify
Classify --> Skip
Classify --> Pack
Pack -- one chunk --> Single
Pack -- many chunks --> Map
Map --> Reduce
Single -.catch warn.-> CallFail["log.warn LLM call failed :329<br/>log.warn parse rejected :334"]
Map -.catch warn.-> CallFail
Reduce -.catch warn.-> CallFail
Single --> Enforce
Reduce --> Enforce
Map -.info: chunkCount only.-> MapLog["log.info map-reduce :411"]
Enforce -.only when dropped greater than 0.-> Drop["log.warn dropped directives :360"]
Enforce --> Done
classDef gap fill:#922b21,color:#ffffff,stroke:#641e16
classDef ok fill:#196f3d,color:#ffffff,stroke:#0e4d2a
classDef path fill:#1f4e79,color:#ffffff,stroke:#11314d
class Skip,Done,Single,Map,Reduce gap
class FetchFail,CallFail,Drop,MapLog ok
class Caller,Fetch,Build,Classify,Pack,Enforce path
Loading
Red nodes = chokepoints emitting no structured event today. Green nodes = the five existing free-form warn/info call sites. Blue = unobserved code path.
Rationale
The digest is on the forced prefix of six structured workflows. Each workflow event therefore costs ≥1 digest LLM call (single-pass) or N+1 calls (map-reduce on large threads). With config.digestModel defaulting to Sonnet 4.6, the digest can comfortably account for 5-30% of a workflow's per-event spend, and it is the dominant contributor to the wall-clock between webhook receipt and first agent token on a comment-heavy thread (the synchronous await fetchAndBuildDigest(...) blocks postStartingComment, see e.g. src/workflows/handlers/triage.ts:115-127). Yet none of these costs are pin-pointable from logs today.
This matters for three operational scenarios the project already invests in. (a) The existing token-usage instrumentation that feat(observability): log + persist SDK token usage on executions (commit 5407dcd) added persists input_tokens/output_tokens for the agent SDK call only; the digest's tokens roll up under nothing. An operator running SELECT sum(input_tokens) FROM executions WHERE workflow = 'triage' undercounts true triage spend by the digest portion. (b) Issue feat(observability): add scheduler.scan.* lifecycle events (open, area: observability) explicitly models the pattern this proposal would reuse — a tick-and-summary event family — so adding digest.* keeps the codebase's observability surface uniform rather than divergent. (c)enforceOwnerDirectives is a load-bearing security boundary (CLAUDE.md § Comment-aware workflows — "Directives are re-checked post-parse against the classified owner authors, so the boundary does not depend on the model"). The current dropped > 0-only log means an SRE has no positive-control evidence the boundary fires when it should, and no baseline to alert on a sudden spike in drops (which is the signature of a prompt-injection or model-regression event the design exists to catch).
The implementation cost is small: ~15 lines of new logging in discussion-digest.ts, one event-key constants block, one Zod discriminated union pinning the field shape (mirror src/orchestrator/log-fields.ts:55#DispatcherOfferLogSchema), and a new ## Discussion digest log fields table in docs/operate/observability.md. No schema, no new dep, no behaviour change. Compare: the dispatcher digest.* analogue (feat(observability): structured dispatcher.offer.{sent,accepted,rejected,timed_out} events, closed) was a similarly contained PR.
References
Internal:
src/workflows/discussion-digest.ts:313 — runDigestCall, the LLM-call chokepoint that drops response.usage and emits no completion event.
src/workflows/discussion-digest.ts:351 — enforceOwnerDirectives, the trust boundary that only logs the dropped > 0 minority case.
src/workflows/discussion-digest.ts:370 — buildDiscussionDigest, where the no-comments return at :380 is silent.
src/ai/llm-client.ts:158 — LLMResponse.usage: LLMUsage carrying inputTokens + outputTokens, already populated, currently discarded by the digest call site.
src/ai/structured-output.ts:57 — StructuredResult.strategy: "strict" | "tolerant", the same JSON-quality signal the closed triage response recovered via tolerant parser line at src/orchestrator/triage.ts:356 already surfaces; digest discards it.
docs/operate/observability.md:135 — ## Dispatcher log fields, the pattern to mirror (event-key constants + discriminated union + test).
src/orchestrator/log-fields.ts:55#DispatcherOfferLogSchema — the strict-Zod-per-event-shape template.
src/workflows/handlers/triage.ts:115, plan.ts:41, implement.ts:54, review.ts:62, resolve.ts:100, remember.ts:47 — the six consumers that synchronously block on the un-observed await fetchAndBuildDigest(...).
External:
OpenTelemetry Semantic Conventions for GenAI events — the standard schema for gen_ai.client.token.usage, gen_ai.request.duration, and per-call event attributes the proposed digest.* events would be a Pino-side analogue of.
Add src/workflows/digest-log-fields.ts with a DIGEST_LOG_EVENTS constant ("digest.skipped" | "digest.call.completed" | "digest.completed" | "digest.failed") and a z.discriminatedUnion("event", ...) schema mirroring src/orchestrator/log-fields.ts:55#DispatcherOfferLogSchema. Co-locate a test that fails CI on event-key drift.
In runDigestCall (src/workflows/discussion-digest.ts:313), capture performance.now() deltas and emit digest.call.completed with { phase: "extract" | "reduce", input_tokens, output_tokens, latency_ms, strategy } on the success path, where strategy is parsed.strategy. Reuse response.usage.inputTokens / response.usage.outputTokens from the existing client.create result.
In buildDiscussionDigest (:370), emit digest.skipped at the humanCount === 0 early-return (:378) with { comment_counts: { owner, other, bot } }, and digest.completed at the final success return with { chunks, total_latency_ms, directives_kept, directives_dropped, has_prior_bot_output, untrusted_context_count, conversation_summary_chars }.
In fetchAndBuildDigest (:541), emit digest.failed at the fetch catch (:567) and on every ok: false propagated return, with the existing DigestResult.reason enum (no-comments | llm-error | parse-error) carried as a field.
Add a ## Discussion digest log fields section to docs/operate/observability.md documenting the four event keys, fields, and at least one suggested alert (e.g. count(event = "digest.failed" AND reason = "parse-error") by repo over 15-minute windows as a model-quality regression alarm).
(Optional follow-up issue) Persist digest token counts to executions.input_tokens / output_tokens by summing the agent SDK and digest counters at workflow-end so the existing SELECT sum(input_tokens) FROM executions queries account for digest spend without per-handler arithmetic.
Areas Evaluated
Read src/workflows/discussion-digest.ts end-to-end (629 lines) and inventoried all five log call sites.
Verified every consumer handler (triage, plan, implement, review, resolve, remember) calls fetchAndBuildDigest synchronously before any user-facing comment is posted, via grep -rn "fetchAndBuildDigest" against src/.
Cross-checked the existing observability surface in docs/operate/observability.md (sections at lines 83, 110, 122, 135, 151, 163, 220), src/orchestrator/log-fields.ts (dispatcher), src/utils/retry-log-fields.ts (retry), src/orchestrator/triage.ts (triage) — none cover discussion-digest.
Confirmed LLMResponse.usage is already populated by src/ai/llm-client.ts:158, and parseStructuredResponse(...).strategy is already in scope (src/ai/structured-output.ts:57), so adding the proposed fields requires no upstream-API changes.
Reviewed every open + closed research-labelled issue (29 total): no overlap with this proposal. The closest neighbours (feat(observability): structured dispatcher.offer.*, closed; feat(observability): periodic fleet-state gauge snapshot, closed; feat(observability): add scheduler.scan.* lifecycle events, open) cover different chokepoints.
Verified git log --oneline -20: the most recent observability commit (6713cbf feat(observability): add structured retry.* events) follows the exact event-family pattern this proposal would extend to the digest, confirming the codebase is actively converging on structured noun.verb events.
Generated by the scheduled research action on 2026-06-13
Finding
src/workflows/discussion-digest.tsis the only LLM-touching subsystem in the bot that ships without a structuredevent: "..."log family, and it runs on the hot path of every comment-aware workflow (triage,plan,implement,review,resolve,remember, seesrc/workflows/handlers/triage.ts:115,plan.ts:41,implement.ts:54,review.ts:62,resolve.ts:100,remember.ts:47). Triage, dispatcher, retry, secret-guard, MCP, fleet, and scheduler each get their own table indocs/operate/observability.md(see## Dispatcher log fieldsat line 135 and## Retry log fieldsat line 83); digest does not. The only digest emissions in the entire file are five free-form strings (src/workflows/discussion-digest.ts:329LLM failure,:334structured-output rejection,:360directives dropped,:411map-reduce start,:567fetch failure). No completion event exists, so a successful digest call is invisible to operators.This silence hides four operationally relevant signals. (1) Skip rate.
buildDiscussionDigestreturns{ ok: false, reason: "no-comments" }atsrc/workflows/discussion-digest.ts:380when every comment was authored by a bot, with no log line. An operator who sees the workflow run without a "Maintainer guidance" section in the prompt cannot distinguish "no human commented" from "LLM failed" from "all owner directives got dropped byenforceOwnerDirectivesat:351". (2) Token usage.runDigestCallat:313discardsresponse.usageeven thoughLLMUsage { inputTokens, outputTokens }is on the response object (src/ai/llm-client.ts:158). The digest is wired toconfig.digestModel(default Sonnet 4.6) and runs on every structured workflow event, so it materially contributes to spend — yet a fleet operator cannot answer "how much is the digest costing me per workflow" without instrumenting it ad-hoc. (3) Latency. NolatencyMsfield is captured for any of the 1-to-N LLM calls in single-pass or map-reduce; under load this is the prime suspect when a workflow's perceived start-to-first-comment delay drifts. (4) Trust-boundary effectiveness.enforceOwnerDirectives(:351) only logs whendropped > 0. Thedropped: 0case — the steady-state, attestation-worthy event — is unobservable, so an attacker successfully extracting a directive is the same log shape as no attack at all.The fix is a small, contained structured-event family (
digest.skipped,digest.call.completed,digest.completed,digest.failed) emitted from the existing chokepoints, mirroring the pattern documented atdocs/operate/observability.md:135(dispatcher) and:83(retry). No new dependencies;response.usagealready carries the token counts,parseStructuredResponse(...).strategy(src/ai/structured-output.ts:57) already carries strict-vs-tolerant for free.Diagram
flowchart TD Caller["Workflow handler<br/>triage / plan / implement<br/>review / resolve / remember"] Fetch["fetchAndBuildDigest<br/>discussion-digest.ts:541"] Build["buildDiscussionDigest<br/>discussion-digest.ts:370"] Classify["classifyComments<br/>partitions owner/other/bot"] Skip["humanCount === 0<br/>return no-comments<br/>discussion-digest.ts:378"] Pack["estimate tokens<br/>chunk if over budget"] Single["Single-pass LLM call<br/>runDigestCall :313"] Map["Map: N LLM calls"] Reduce["Reduce: 1 LLM call"] Enforce["enforceOwnerDirectives<br/>drop forged authors :351"] Done["return ok=true"] Caller --> Fetch Fetch -.catch warn.-> FetchFail["log.warn comment fetch failed<br/>:567"] Fetch --> Build Build --> Classify Classify --> Skip Classify --> Pack Pack -- one chunk --> Single Pack -- many chunks --> Map Map --> Reduce Single -.catch warn.-> CallFail["log.warn LLM call failed :329<br/>log.warn parse rejected :334"] Map -.catch warn.-> CallFail Reduce -.catch warn.-> CallFail Single --> Enforce Reduce --> Enforce Map -.info: chunkCount only.-> MapLog["log.info map-reduce :411"] Enforce -.only when dropped greater than 0.-> Drop["log.warn dropped directives :360"] Enforce --> Done classDef gap fill:#922b21,color:#ffffff,stroke:#641e16 classDef ok fill:#196f3d,color:#ffffff,stroke:#0e4d2a classDef path fill:#1f4e79,color:#ffffff,stroke:#11314d class Skip,Done,Single,Map,Reduce gap class FetchFail,CallFail,Drop,MapLog ok class Caller,Fetch,Build,Classify,Pack,Enforce pathRed nodes = chokepoints emitting no structured event today. Green nodes = the five existing free-form warn/info call sites. Blue = unobserved code path.
Rationale
The digest is on the forced prefix of six structured workflows. Each workflow event therefore costs ≥1 digest LLM call (single-pass) or N+1 calls (map-reduce on large threads). With
config.digestModeldefaulting to Sonnet 4.6, the digest can comfortably account for 5-30% of a workflow's per-event spend, and it is the dominant contributor to the wall-clock between webhook receipt and first agent token on a comment-heavy thread (the synchronousawait fetchAndBuildDigest(...)blockspostStartingComment, see e.g.src/workflows/handlers/triage.ts:115-127). Yet none of these costs are pin-pointable from logs today.This matters for three operational scenarios the project already invests in. (a) The existing token-usage instrumentation that
feat(observability): log + persist SDK token usage on executions(commit5407dcd) added persistsinput_tokens/output_tokensfor the agent SDK call only; the digest's tokens roll up under nothing. An operator runningSELECT sum(input_tokens) FROM executions WHERE workflow = 'triage'undercounts true triage spend by the digest portion. (b) Issuefeat(observability): add scheduler.scan.* lifecycle events(open, area: observability) explicitly models the pattern this proposal would reuse — a tick-and-summary event family — so addingdigest.*keeps the codebase's observability surface uniform rather than divergent. (c)enforceOwnerDirectivesis a load-bearing security boundary (CLAUDE.md§ Comment-aware workflows — "Directives are re-checked post-parse against the classified owner authors, so the boundary does not depend on the model"). The currentdropped > 0-only log means an SRE has no positive-control evidence the boundary fires when it should, and no baseline to alert on a sudden spike in drops (which is the signature of a prompt-injection or model-regression event the design exists to catch).The implementation cost is small: ~15 lines of new logging in
discussion-digest.ts, one event-key constants block, one Zod discriminated union pinning the field shape (mirrorsrc/orchestrator/log-fields.ts:55#DispatcherOfferLogSchema), and a new## Discussion digest log fieldstable indocs/operate/observability.md. No schema, no new dep, no behaviour change. Compare: the dispatcherdigest.*analogue (feat(observability): structured dispatcher.offer.{sent,accepted,rejected,timed_out} events, closed) was a similarly contained PR.References
Internal:
src/workflows/discussion-digest.ts:313—runDigestCall, the LLM-call chokepoint that dropsresponse.usageand emits no completion event.src/workflows/discussion-digest.ts:351—enforceOwnerDirectives, the trust boundary that only logs thedropped > 0minority case.src/workflows/discussion-digest.ts:370—buildDiscussionDigest, where theno-commentsreturn at:380is silent.src/ai/llm-client.ts:158—LLMResponse.usage: LLMUsagecarryinginputTokens+outputTokens, already populated, currently discarded by the digest call site.src/ai/structured-output.ts:57—StructuredResult.strategy: "strict" | "tolerant", the same JSON-quality signal the closedtriage response recovered via tolerant parserline atsrc/orchestrator/triage.ts:356already surfaces; digest discards it.docs/operate/observability.md:135—## Dispatcher log fields, the pattern to mirror (event-key constants + discriminated union + test).src/orchestrator/log-fields.ts:55#DispatcherOfferLogSchema— the strict-Zod-per-event-shape template.src/workflows/handlers/triage.ts:115,plan.ts:41,implement.ts:54,review.ts:62,resolve.ts:100,remember.ts:47— the six consumers that synchronously block on the un-observedawait fetchAndBuildDigest(...).External:
gen_ai.client.token.usage,gen_ai.request.duration, and per-call event attributes the proposeddigest.*events would be a Pino-side analogue of.Suggested Next Steps
src/workflows/digest-log-fields.tswith aDIGEST_LOG_EVENTSconstant ("digest.skipped" | "digest.call.completed" | "digest.completed" | "digest.failed") and az.discriminatedUnion("event", ...)schema mirroringsrc/orchestrator/log-fields.ts:55#DispatcherOfferLogSchema. Co-locate a test that fails CI on event-key drift.runDigestCall(src/workflows/discussion-digest.ts:313), captureperformance.now()deltas and emitdigest.call.completedwith{ phase: "extract" | "reduce", input_tokens, output_tokens, latency_ms, strategy }on the success path, wherestrategyisparsed.strategy. Reuseresponse.usage.inputTokens/response.usage.outputTokensfrom the existingclient.createresult.buildDiscussionDigest(:370), emitdigest.skippedat thehumanCount === 0early-return (:378) with{ comment_counts: { owner, other, bot } }, anddigest.completedat the final success return with{ chunks, total_latency_ms, directives_kept, directives_dropped, has_prior_bot_output, untrusted_context_count, conversation_summary_chars }.fetchAndBuildDigest(:541), emitdigest.failedat the fetch catch (:567) and on everyok: falsepropagated return, with the existingDigestResult.reasonenum (no-comments | llm-error | parse-error) carried as a field.## Discussion digest log fieldssection todocs/operate/observability.mddocumenting the four event keys, fields, and at least one suggested alert (e.g.count(event = "digest.failed" AND reason = "parse-error") by repoover 15-minute windows as a model-quality regression alarm).executions.input_tokens/output_tokensby summing the agent SDK and digest counters at workflow-end so the existingSELECT sum(input_tokens) FROM executionsqueries account for digest spend without per-handler arithmetic.Areas Evaluated
src/workflows/discussion-digest.tsend-to-end (629 lines) and inventoried all five log call sites.triage,plan,implement,review,resolve,remember) callsfetchAndBuildDigestsynchronously before any user-facing comment is posted, viagrep -rn "fetchAndBuildDigest"againstsrc/.docs/operate/observability.md(sections at lines 83, 110, 122, 135, 151, 163, 220),src/orchestrator/log-fields.ts(dispatcher),src/utils/retry-log-fields.ts(retry),src/orchestrator/triage.ts(triage) — none cover discussion-digest.LLMResponse.usageis already populated bysrc/ai/llm-client.ts:158, andparseStructuredResponse(...).strategyis already in scope (src/ai/structured-output.ts:57), so adding the proposed fields requires no upstream-API changes.research-labelled issue (29 total): no overlap with this proposal. The closest neighbours (feat(observability): structured dispatcher.offer.*, closed;feat(observability): periodic fleet-state gauge snapshot, closed;feat(observability): add scheduler.scan.* lifecycle events, open) cover different chokepoints.git log --oneline -20: the most recent observability commit (6713cbf feat(observability): add structured retry.* events) follows the exact event-family pattern this proposal would extend to the digest, confirming the codebase is actively converging on structurednoun.verbevents.Generated by the scheduled research action on 2026-06-13