You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Every webhook handler reaches GitHub at least twice (GraphQL fetch + tracking-comment update) and most paginated PR fetches in src/core/fetcher.ts fan out to four parallel paginate(...) calls, but the operator has zero per-request latency signal. The single octokit observability chokepoint, installRateLimitHooks in src/utils/octokit-observability.ts:104-138, registers hook.after("request", ...) and hook.error("request", ...) only: neither closes over a start timestamp, neither emits a duration_ms, and the .strict()GithubApiLogFieldsSchema at src/utils/octokit-observability.ts:39-54 explicitly forbids any new field on the line, so a drop-in addition fails the co-located strict-schema test at test/utils/octokit-observability.test.ts:109-118.
The result is a latency blind spot at the layer that dominates end-to-end webhook lag. When core/fetcher.ts GraphQL pagination, core/tracking-comment.ts REST writes, or the proposal-poller reaction scans get slow, the existing github.api.request line still reads status: 200 and the existing dispatcher.offer.acceptedoffer_latency_ms (docs/operate/observability.md:121-126) keeps reading healthy: the slow leg is the GitHub call, not the dispatcher round-trip, but neither event surfaces it. A hook.wrap("request", ...) closure that captures start = Date.now() and threads duration_ms = Date.now() - start into both hook.after and hook.error log lines closes the gap, mirrors the existing offer_latency_ms / queue_wait_ms_ms naming idiom, and extends (not replaces) the installRateLimitHooks plugin so the single observableOctokit() subclass wired in src/app.ts:82-89 keeps owning every emitted line.
The same closure can fire a new github.api.slow warn event when duration_ms >= SLOW_REQUEST_MS (start at 3000ms — about 3x typical REST latency, well below the abuse-detection 10s search-API ceiling), giving an actionable alert on GitHub-side degradations without dialing every request to info volume.
GitHub-side latency is the single largest variable component of runPipeline end-to-end time. src/core/fetcher.ts issues three parallel paginate(...) calls per PR (PR_FIRST_QUERY, PR_COMMENTS_QUERY, PR_REVIEWS_QUERY) plus a per-review REVIEW_COMMENTS_QUERY fan-out capped at REVIEW_OVERFLOW_CONCURRENCY = 5 (src/core/fetcher.ts:13-16); a single slow page from any of them stalls the whole handler before the agent ever runs, but today the operator sees only the per-stage pipeline.stage line and has to guess whether the stall was network, GraphQL slowness, or local CPU. Adding duration_ms to the existing line — no new sink, no new schema, no extra log volume at default info — makes p50 / p95 / p99 per-route directly queryable in the existing log shipper.
The github.api.slow warn variant is the operational lever. It mirrors how github.api.rate_limit_low already converts a per-request debug line into a warn once remaining crosses RATE_LIMIT_LOW_WATER (src/utils/octokit-observability.ts:86-92), reusing a pattern operators already know rather than introducing a new alert taxonomy. The pre-existing dispatcher.offer.acceptedoffer_latency_ms documentation block (docs/operate/observability.md:234) already calls out p99 tracking as the operating signal for slow daemon responses; symmetric coverage on GitHub's leg means an operator chasing a webhook-to-tracking-comment lag regression can finally tell whether the bottleneck is the daemon (offer_latency_ms) or GitHub (duration_ms) without ad-hoc instrumentation.
Cost is bounded: hook.wrap is the octokit-blessed wrapper hook (see octokit/core.js docs); Date.now() is constant-time; the per-request log line already exists, only the field set grows by one integer. The closed sibling issue #170 (feat(observability): log octokit rate-limit headers via hook.after for per-installation quota visibility) added the rate-limit half; the latency half is the natural follow-on within the same plugin.
References
Internal:
src/utils/octokit-observability.ts:39-54 — .strict()GithubApiLogFieldsSchema, must gain a duration_ms: z.number().int().nonnegative() field.
src/utils/octokit-observability.ts:75-98 — rateLimitFields(), the pure builder that becomes the natural seam for accepting durationMs and re-emitting it on every event variant.
src/utils/octokit-observability.ts:104-138 — installRateLimitHooks(), the only place a hook.wrap / hook.before belongs (single chokepoint).
src/app.ts:82-89 — App({ Octokit: observableOctokit() }), the wiring that propagates the plugin to every app.octokit + getInstallationOctokit().
src/orchestrator/connection-handler.ts:8 — second consumer of observableOctokit(); both pick up the new field for free.
src/core/fetcher.ts:13-78 — the four parallel paginated GraphQL calls per PR that this measurement is for.
test/utils/octokit-observability.test.ts:89-118 — strict-schema drift test that pins the schema; will need a duration_ms case alongside.
docs/operate/observability.md:73-75 — the github.api.* event table that documents the field set today; must list duration_ms + the new github.api.slow event.
External:
octokit/core.js hooks documentation: https://github.com/octokit/core.js#hooks — hook.wrap("request", async (request, options) => {...}) is the documented pattern for measuring request duration.
before-after-hook README (the library backing octokit.hook): https://github.com/gr2m/before-after-hook — distinguishes before / wrap / after / error semantics; wrap is the recommended pattern when the before-side state (start time) must be visible to the after-side.
Extend GithubApiLogFieldsSchema (src/utils/octokit-observability.ts:39-54) with duration_ms: z.number().int().nonnegative() and add GITHUB_API_LOG_EVENTS.slow = "github.api.slow".
Replace hook.after("request", ...) with hook.wrap("request", ...) in installRateLimitHooks (src/utils/octokit-observability.ts:104-138): capture start = Date.now() before await request(options), compute duration_ms after, thread it through rateLimitFields (rename internal arg or add an overload). Mirror the same delta capture inside hook.error("request", ...) so error lines carry latency too.
Add a SLOW_REQUEST_MS constant (start at 3000) and emit github.api.slow at warn when duration_ms >= SLOW_REQUEST_MS, taking precedence over github.api.request but yielding to github.api.rate_limit_low when both apply (or emit both — both are useful operational signals; document the precedence).
Update test/utils/octokit-observability.test.ts:32-118 with two new cases: (a) rateLimitFields accepts and round-trips durationMs, (b) strict schema accepts a github.api.slow line and rejects a latency_ms typo.
Update docs/operate/observability.md:73-75 event table with the new duration_ms field on every existing event + a row for github.api.slow. Add a one-line operations note mirroring the existing offer_latency_ms p99 guidance.
Areas Evaluated
src/utils/octokit-observability.ts — the only octokit observability chokepoint; confirmed hook.after and hook.error only, no timer / no duration_ms.
src/app.ts:82-89 + src/orchestrator/connection-handler.ts:8 — the two callers of observableOctokit(); confirmed both inherit any plugin change for free.
src/logger.ts — confirmed duration_ms would flow through the existing REDACT_PATHS + errSerializer redaction pipeline without modification.
src/core/fetcher.ts — confirmed the per-handler GraphQL call volume that justifies the per-request granularity over a single coarse pipeline-stage metric.
docs/operate/observability.md:73-75,121-126,234 — confirmed the existing _ms naming convention (offer_latency_ms, queue_wait_ms) and the symmetric latency story this finding completes.
test/utils/octokit-observability.test.ts — confirmed the existing strict-schema test is the drift gate the new field must pass.
Finding
Every webhook handler reaches GitHub at least twice (GraphQL fetch + tracking-comment update) and most paginated PR fetches in
src/core/fetcher.tsfan out to four parallelpaginate(...)calls, but the operator has zero per-request latency signal. The single octokit observability chokepoint,installRateLimitHooksinsrc/utils/octokit-observability.ts:104-138, registershook.after("request", ...)andhook.error("request", ...)only: neither closes over a start timestamp, neither emits aduration_ms, and the.strict()GithubApiLogFieldsSchemaatsrc/utils/octokit-observability.ts:39-54explicitly forbids any new field on the line, so a drop-in addition fails the co-located strict-schema test attest/utils/octokit-observability.test.ts:109-118.The result is a latency blind spot at the layer that dominates end-to-end webhook lag. When
core/fetcher.tsGraphQL pagination,core/tracking-comment.tsREST writes, or theproposal-pollerreaction scans get slow, the existinggithub.api.requestline still readsstatus: 200and the existingdispatcher.offer.acceptedoffer_latency_ms(docs/operate/observability.md:121-126) keeps reading healthy: the slow leg is the GitHub call, not the dispatcher round-trip, but neither event surfaces it. Ahook.wrap("request", ...)closure that capturesstart = Date.now()and threadsduration_ms = Date.now() - startinto bothhook.afterandhook.errorlog lines closes the gap, mirrors the existingoffer_latency_ms/queue_wait_ms_msnaming idiom, and extends (not replaces) theinstallRateLimitHooksplugin so the singleobservableOctokit()subclass wired insrc/app.ts:82-89keeps owning every emitted line.The same closure can fire a new
github.api.slowwarn event whenduration_ms >= SLOW_REQUEST_MS(start at 3000ms — about 3x typical REST latency, well below the abuse-detection 10s search-API ceiling), giving an actionable alert on GitHub-side degradations without dialing every request to info volume.Diagram
flowchart LR Caller["pipeline / handler<br/>calls octokit.request or graphql"] Wrap["hook.wrap request<br/>start = Date.now"] Fetch["GitHub HTTP fetch<br/>REST / GraphQL"] After["hook.after request<br/>compute duration_ms<br/>= Date.now - start"] Schema["GithubApiLogFieldsSchema strict<br/>+ duration_ms"] Decide{"duration_ms<br/>>= SLOW_REQUEST_MS<br/>or remaining<br/>< RATE_LIMIT_LOW_WATER?"} Slow["warn github.api.slow<br/>route, status, duration_ms,<br/>rate_limit fields"] Low["warn github.api.rate_limit_low<br/>route, status, duration_ms,<br/>rate_limit fields"] Req["debug github.api.request<br/>route, status, duration_ms,<br/>rate_limit fields"] ErrHook["hook.error request<br/>compute duration_ms<br/>on rate-limit error"] Warn["warn github.api.rate_limit_warning<br/>route, status, retry_after_s,<br/>duration_ms"] Caller --> Wrap Wrap --> Fetch Fetch -->|2xx 3xx 4xx response| After Fetch -->|throw 429 or 403 secondary| ErrHook After --> Schema Schema --> Decide Decide -->|slow| Slow Decide -->|low quota| Low Decide -->|healthy| Req ErrHook --> Warn Wrap:::added Slow:::added Decide:::added After:::current ErrHook:::current Req:::current Low:::current Warn:::current Schema:::current classDef added fill:#196f3d,color:#ffffff,stroke:#0e4d27 classDef current fill:#1f4e79,color:#ffffff,stroke:#143d61Rationale
GitHub-side latency is the single largest variable component of
runPipelineend-to-end time.src/core/fetcher.tsissues three parallelpaginate(...)calls per PR (PR_FIRST_QUERY,PR_COMMENTS_QUERY,PR_REVIEWS_QUERY) plus a per-reviewREVIEW_COMMENTS_QUERYfan-out capped atREVIEW_OVERFLOW_CONCURRENCY = 5(src/core/fetcher.ts:13-16); a single slow page from any of them stalls the whole handler before the agent ever runs, but today the operator sees only the per-stagepipeline.stageline and has to guess whether the stall was network, GraphQL slowness, or local CPU. Addingduration_msto the existing line — no new sink, no new schema, no extra log volume at defaultinfo— makesp50/p95/p99per-route directly queryable in the existing log shipper.The
github.api.slowwarn variant is the operational lever. It mirrors howgithub.api.rate_limit_lowalready converts a per-request debug line into a warn onceremainingcrossesRATE_LIMIT_LOW_WATER(src/utils/octokit-observability.ts:86-92), reusing a pattern operators already know rather than introducing a new alert taxonomy. The pre-existingdispatcher.offer.acceptedoffer_latency_msdocumentation block (docs/operate/observability.md:234) already calls out p99 tracking as the operating signal for slow daemon responses; symmetric coverage on GitHub's leg means an operator chasing a webhook-to-tracking-comment lag regression can finally tell whether the bottleneck is the daemon (offer_latency_ms) or GitHub (duration_ms) without ad-hoc instrumentation.Cost is bounded:
hook.wrapis the octokit-blessed wrapper hook (see octokit/core.js docs);Date.now()is constant-time; the per-request log line already exists, only the field set grows by one integer. The closed sibling issue #170 (feat(observability): log octokit rate-limit headers via hook.after for per-installation quota visibility) added the rate-limit half; the latency half is the natural follow-on within the same plugin.References
Internal:
src/utils/octokit-observability.ts:39-54—.strict()GithubApiLogFieldsSchema, must gain aduration_ms: z.number().int().nonnegative()field.src/utils/octokit-observability.ts:75-98—rateLimitFields(), the pure builder that becomes the natural seam for acceptingdurationMsand re-emitting it on every event variant.src/utils/octokit-observability.ts:104-138—installRateLimitHooks(), the only place ahook.wrap/hook.beforebelongs (single chokepoint).src/app.ts:82-89—App({ Octokit: observableOctokit() }), the wiring that propagates the plugin to everyapp.octokit+getInstallationOctokit().src/orchestrator/connection-handler.ts:8— second consumer ofobservableOctokit(); both pick up the new field for free.src/core/fetcher.ts:13-78— the four parallel paginated GraphQL calls per PR that this measurement is for.test/utils/octokit-observability.test.ts:89-118— strict-schema drift test that pins the schema; will need aduration_mscase alongside.docs/operate/observability.md:73-75— thegithub.api.*event table that documents the field set today; must listduration_ms+ the newgithub.api.slowevent.External:
hook.wrap("request", async (request, options) => {...})is the documented pattern for measuring request duration.Date.now()delta pattern explicitly applied via abefore-after-hookwrap.octokit.hook): https://github.com/gr2m/before-after-hook — distinguishesbefore/wrap/after/errorsemantics;wrapis the recommended pattern when the before-side state (start time) must be visible to the after-side.x-ratelimit-*headers the existing plugin already reads.Suggested Next Steps
GithubApiLogFieldsSchema(src/utils/octokit-observability.ts:39-54) withduration_ms: z.number().int().nonnegative()and addGITHUB_API_LOG_EVENTS.slow = "github.api.slow".hook.after("request", ...)withhook.wrap("request", ...)ininstallRateLimitHooks(src/utils/octokit-observability.ts:104-138): capturestart = Date.now()beforeawait request(options), computeduration_msafter, thread it throughrateLimitFields(rename internal arg or add an overload). Mirror the same delta capture insidehook.error("request", ...)so error lines carry latency too.SLOW_REQUEST_MSconstant (start at3000) and emitgithub.api.slowatwarnwhenduration_ms >= SLOW_REQUEST_MS, taking precedence overgithub.api.requestbut yielding togithub.api.rate_limit_lowwhen both apply (or emit both — both are useful operational signals; document the precedence).test/utils/octokit-observability.test.ts:32-118with two new cases: (a)rateLimitFieldsaccepts and round-tripsdurationMs, (b) strict schema accepts agithub.api.slowline and rejects alatency_mstypo.docs/operate/observability.md:73-75event table with the newduration_msfield on every existing event + a row forgithub.api.slow. Add a one-line operations note mirroring the existingoffer_latency_msp99 guidance.Areas Evaluated
src/utils/octokit-observability.ts— the only octokit observability chokepoint; confirmedhook.afterandhook.erroronly, no timer / noduration_ms.src/app.ts:82-89+src/orchestrator/connection-handler.ts:8— the two callers ofobservableOctokit(); confirmed both inherit any plugin change for free.src/logger.ts— confirmedduration_mswould flow through the existingREDACT_PATHS+errSerializerredaction pipeline without modification.src/core/fetcher.ts— confirmed the per-handler GraphQL call volume that justifies the per-request granularity over a single coarse pipeline-stage metric.docs/operate/observability.md:73-75,121-126,234— confirmed the existing_msnaming convention (offer_latency_ms,queue_wait_ms) and the symmetric latency story this finding completes.test/utils/octokit-observability.test.ts— confirmed the existing strict-schema test is the drift gate the new field must pass.research-labelled issues — confirmed no overlap: the closed sibling feat(observability): log octokit rate-limit headers via hook.after for per-installation quota visibility #170 (rate-limit headers) covers quota state; the open observability issues (daemon.connection.*,scheduler.scan.*,circuit.*triage,retry.*) live in unrelated subsystems.Generated by the scheduled research action on 2026-06-09