Skip to content

feat(webhooks): outbound webhooks on issue status change#34

Open
lml2468 wants to merge 3 commits into
mainfrom
feat/outbound-webhooks
Open

feat(webhooks): outbound webhooks on issue status change#34
lml2468 wants to merge 3 commits into
mainfrom
feat/outbound-webhooks

Conversation

@lml2468

@lml2468 lml2468 commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds GitHub-style outbound webhooks: when an issue's status changes, Multica POSTs a signed JSON payload to subscribed external URLs. The reverse direction of the existing inbound autopilot webhooks.

  • Workspace-level webhooks (GitHub "org" webhook) — fire for every issue in the workspace.
  • Project-level webhooks (GitHub "repo" webhook) — fire only for issues in that project.
  • v1 event: issue.status_changed. Delivery is async fire-and-forget with bounded concurrency, short retry backoff, and HMAC-SHA256 signing (X-Multica-Signature-256, GitHub-compatible).

Backend

  • Migration 121 + sqlc webhook_subscription table (nullable project_id; cleartext signing secret, consistent with autopilot_trigger).
  • outwebhook.Dispatcher — subscription selection, event filtering, detached delivery off the event-bus/request path, semaphore-bounded concurrency.
  • webhooksign — shared HMAC sign/verify; inbound verifier now delegates to it.
  • CRUD at /api/webhook-subscriptions, owner/admin-gated, with an SSRF guard rejecting loopback/link-local/private/metadata targets.
  • Fix: batch issue update now publishes prev_status (was omitted).

Frontend

  • Settings → Webhooks tab (workspace) and a collapsible Webhook section in the project detail right panel (below Resources).
  • Shared useWebhookSection hook + WebhookDialogs consumed by both surfaces; secret shown once on create.
  • core types, lenient zod schemas (parseWithFallback), api client, query/mutation hooks; en/ja/ko/zh-Hans i18n.

Testing

  • Go: outwebhook (selection/filtering/HMAC/retry), webhooksign, handler validation incl. SSRF cases — all pass; go build/vet/gofmt clean.
  • TS: full pnpm typecheck, pnpm lint (0 errors), webhook component tests + core schema tests, i18n parity.
  • Manual (Dockerized self-host): verified end-to-end delivery with byte-exact HMAC, SSRF rejection (400), batch prev_status, and both UI surfaces.

Notes / deferred

  • No delivery-history persistence in v1 (async fire-and-forget) — asymmetric with the inbound webhook_delivery table; revisit if observability/replay is needed.
  • SSRF guard checks IP literals + localhost; does not resolve DNS (DNS-rebinding out of scope for v1).
  • Signing secret stored cleartext (matches existing autopilot_trigger.signing_secret).

lml2468 added 2 commits June 16, 2026 09:53
Add GitHub-style outbound webhooks: when an issue's status changes,
Multica POSTs a signed JSON payload to subscribed external URLs. The
reverse direction of the existing inbound autopilot webhooks.

- Migration 121 + sqlc: webhook_subscription table (nullable project_id
  distinguishes workspace-level "org" webhooks from project-level "repo"
  webhooks); cleartext signing secret, consistent with autopilot_trigger.
- outwebhook.Dispatcher: subscription selection (workspace- vs
  project-scoped), event filtering, async fire-and-forget delivery with
  bounded concurrency (semaphore) and short retry backoff. All work is
  detached off the event-bus/request path.
- webhooksign: shared GitHub-compatible HMAC-SHA256 sign/verify; the
  inbound verifier now delegates to it so the two can't drift.
- webhook_listeners: subscribe to issue:updated (status_changed) and
  dispatch; SupportedEventTypes is the single source of truth tying the
  allow-list to the listener.
- Handler CRUD at /api/webhook-subscriptions, owner/admin-gated, with
  SSRF guard rejecting loopback/link-local/private/metadata targets.
- Fix batch issue update to publish prev_status (was omitted, delivering
  empty previous_status for batch transitions).
- Dedup: shared generateCredential + secretHint helpers.
Surface outbound webhook management on two shared surfaces:

- Workspace level: a new Settings → Webhooks tab.
- Project level: a collapsible section in the project detail right panel,
  directly below Resources.

- core: WebhookSubscription types, lenient zod schemas via parseWithFallback,
  api client methods, and TanStack Query hooks (keyed on wsId + projectId).
- views: shared useWebhookSection hook + WebhookDialogs (secret-once +
  delete-confirm) consumed by both surfaces — list/create/toggle/delete and
  the signing secret shown once on create, permission-gated to owner/admin.
- i18n: en/ja/ko/zh-Hans strings; parity test passes.

@mochashanyao mochashanyao left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Octo-Q · automated review]

Verdict: Approve — no blocking findings; notes below (data-flow traced).


PR#34 Review Report — feat(webhooks): outbound webhooks on issue status change

Reviewer: Octo-Q (automated review)
Head SHA: 003d5342bbe1e70dcb9e3fd43bc8f503e6e7cb08
Repo: Mininglamp-OSS/Octo-Q
Routing: complexity=security_sensitive (automated review)
Scope: 38 files, +2777/−33 — migration, dispatcher, CRUD handlers, SSRF guard, HMAC signing, frontend UI, i18n


1. Verification Summary

Area Status Evidence
Authorization (CRUD) All 4 handlers gate on requireWorkspaceRole(..., "owner", "admin")webhook_subscription.go:185,222,248,280
Workspace scoping loadWebhookSubscription scopes by workspace_id via GetWebhookSubscriptionInWorkspacewebhook_subscription.go:296-316
Project ownership Create validates GetProjectInWorkspace(ID, WorkspaceID)webhook_subscription.go:237-247
SSRF guard (IP literals) Rejects loopback, link-local, private, unspecified + localhostwebhook_subscription.go:119-143
HMAC signing webhooksign.Sign/Verify — constant-time hmac.Equal, GitHub-compatible sha256=<hex>webhooksign.go:27-44
Secret entropy 32 bytes crypto/rand, whsec_ prefix, URL-safe base64 — autopilot_webhook.go:42-48, webhook_subscription.go:97
Async dispatch DispatchIssueStatusChangedgo d.dispatch(ev) — never blocks event bus — dispatcher.go:121-123
Concurrency bound Semaphore chan struct{} capped at 16 — dispatcher.go:57,155-158
prev_status fix (batch) prevIssue.Status published in batch update payload — issue.go:2919 (matches single-update path at issue.go:2417)
Data flow: listener → dispatcher webhook_listeners.go extracts status_changed, issue, prev_status from event payload → DispatchIssueStatusChanged
Frontend admin gate `canManage = role === "owner"
Secret shown once SecretOnce set only in create handler response — webhook_subscription.go:244
Migration Proper FK cascades, partial index on enabled, project-scoped index — 121_webhook_subscription.up.sql
Tests Handler validation (URL/events/secret), dispatcher (matching/filtering/retry/HMAC), webhooksign roundtrip, frontend component tests

2. Findings

F1 — SSRF via HTTP redirect (P2)

Diff-scope: new (introduced by this PR)

validateWebhookURL (webhook_subscription.go:119-143) checks the initial URL for internal targets (loopback, link-local, private, localhost). However, the delivery HTTP client (dispatcher.go:89: &http.Client{Timeout: deliveryTimeout}) uses Go's default redirect policy — it follows up to 10 redirects (301/302/303/307/308).

Attack path: An admin creates a webhook pointing at https://attacker.example.com/redirect which returns 307 Temporary Redirect → http://169.254.169.254/latest/meta-data/. Go's HTTP client follows the 307, preserving POST method and body. The signed webhook payload (including issue data) is delivered to the cloud metadata endpoint.

Go's client strips Authorization headers on cross-domain redirects but does NOT strip custom headers like X-Octo-Q-Signature-256. The body (full issue data) is also preserved on 307.

Mitigation: Admin-gated creation; DNS-rebinding already documented as v1 OOS. This redirect-based SSRF is the same class of gap.

Recommendation: Add a CheckRedirect policy that either rejects redirects to internal IPs or caps redirect count to 0 for webhook delivery:

client := &http.Client{
    Timeout: deliveryTimeout,
    CheckRedirect: func(req *http.Request, via []*http.Request) error {
        return http.ErrUseLastResponse // don't follow redirects
    },
}

F2 — Dispatch goroutine accumulation under burst (P2)

Diff-scope: new (introduced by this PR)

dispatch() (dispatcher.go:127) runs in a detached goroutine. Inside, it blocks on d.sem <- struct{}{} (line 155) when all 16 delivery slots are full. Each delivery can take up to ~38s (30s timeout × 1 initial + 2 retries with 1s+4s backoff).

Under a burst of N status changes with M matching subscriptions: N dispatch goroutines accumulate, each holding the marshaled payload (~size of IssueResponse). If N×M >> 16, dispatch goroutines queue on the semaphore.

Practical impact: Low for v1 workloads. With 16 slots and ~38s worst-case per delivery, throughput is ~25 deliveries/minute. A burst of 100 status changes × 5 webhooks = 500 deliveries would take ~20 minutes to drain, with 500 goroutines queued.

Recommendation: Consider a bounded work queue (e.g., discard events when queue depth exceeds a threshold) or a context timeout on semaphore acquisition.

F3 — Signing secret stored cleartext (P2 — pre-existing pattern)

Diff-scope: pre-existing (matches autopilot_trigger.signing_secret from migration 093)

secret TEXT NOT NULL in webhook_subscription stores the HMAC key in cleartext. The PR body and migration comment acknowledge this and reference MUL-2334 for a future secrets-at-rest infrastructure.

Recommendation: No action needed for v1 — consistent with existing pattern. Track in MUL-2334.

F4 — Full IssueResponse in outbound payload (P2 — design note)

Diff-scope: new (introduced by this PR)

outboundPayload.Issue (dispatcher.go:108) serializes the complete handler.IssueResponse into the webhook body. This includes all issue fields (description, internal metadata, assignee details, etc.).

This is by design (webhook subscribers need issue context), but workspace admins should be aware that all issue data is sent to external endpoints.

Recommendation: Consider documenting the payload schema for operators, or offering a field-level filter in v2.


3. Suggestions

  1. Redirect policy (addresses F1): Set CheckRedirect to reject or cap redirects in the delivery HTTP client. This closes the redirect-based SSRF vector without requiring DNS resolution.

  2. Bounded dispatch queue (addresses F2): Replace the bare semaphore with a buffered channel + discard-on-full pattern, or add a context.WithTimeout on semaphore acquisition to prevent unbounded goroutine accumulation.

  3. Payload schema documentation (addresses F4): Document the outbound payload shape in the PR body or API docs so operators know what data is sent to subscribers.


4. Additional Findings

  • No delivery-history persistence: v1 is fire-and-forget with no delivery log (asymmetric with inbound webhook_delivery table). Acknowledged in PR body. Not blocking — observability can be added in v2.
  • No secret rotation via API: UpdateWebhookSubscription cannot rotate the signing secret. Operators must delete + recreate. Acceptable for v1.
  • ListWebhookSubscriptionsByProject doesn't verify project ownership: The query filters by (workspace_id, project_id) which is safe — returns empty if project doesn't belong to workspace. No information leakage.

5. Data-Flow Trace

Consumed Data Upstream Source Flows Correctly?
payload["status_changed"] BatchUpdateIssues / UpdateIssue: prevIssue.Status != issue.Statusissue.go:2912,2408 ✅ — boolean, set when status differs
payload["issue"] (IssueResponse) issueToResponse(issue, prefix) from DB-updated row ✅ — full post-update issue
payload["prev_status"] prevIssue.Status — loaded before update via GetIssueForUpdate ✅ — batch path fixed by this PR (issue.go:2919), single path already had it (issue.go:2417)
e.WorkspaceID events.Event.WorkspaceID set by h.publish() ✅ — scoped to request workspace
e.ActorType / e.ActorID h.resolveActor(r, userID, workspaceID) — resolves real actor ✅ — not observer, actual trigger
issue.ProjectID IssueResponse.ProjectID (nullable *string) ✅ — listener handles nil (webhook_listeners.go:42-44)
sub.Secret (for signing) webhook_subscription.secret column, loaded by ListEnabledWebhookSubscriptionsForDispatch ✅ — cleartext from DB, used directly for HMAC
sub.Events (for matching) webhook_subscription.events JSONB column, unmarshaled in subscribedToEvent ✅ — malformed JSON handled gracefully (skip, not panic)

6. Blind-Point Checklist (R5 — security_sensitive)

C1 — Dual-path parity:

  • Subscribe ↔ unsubscribe: N/A — webhooks are independent CRUD operations, no paired subscribe/unsubscribe paths.
  • Create-gate ↔ execute-path: ✅ — Create is admin-gated; dispatch reads from DB (no re-authorization needed for system-driven delivery).
  • Parallel implementations: ✅ — No duplicate webhook dispatch paths.

C2 — Control-flow ordering / nesting reuse:

  • webhooksign.Sign/Verify shared by inbound + outbound: ✅ — single implementation, no double-signing or ordering issue.
  • generateCredential shared by webhook tokens + signing secrets: ✅ — pure function, no side effects.
  • SSRF guard non-canonical forms: ✅ — url.Parse handles encoded hosts; net.ParseIP handles IPv4/IPv6; strings.EqualFold handles LOCALHOST/Localhost. No bypass via case or encoding.

C3 — Authorization boundary ≠ capability boundary:

  • ✅ — All /api/webhook-subscriptions endpoints are behind authenticated session + requireWorkspaceRole("owner", "admin"). No unauthenticated or member-level access path.
  • ✅ — loadWebhookSubscription scopes by workspace_id, preventing cross-workspace access via UUID enumeration.
  • ✅ — Dispatch runs system-internally (no user-facing endpoint), triggered only by event bus.

Verdict

No P0 or P1 findings. All identified issues are P2 (SSRF redirect gap, burst goroutine accumulation, cleartext secret, payload exposure) — none make a working path unavailable, produce user-visible incorrect data, or make the system worse than before this PR.

The authorization model is sound (admin-gated, workspace-scoped), the signing implementation is correct (constant-time HMAC), the data flow is properly traced from issue update through event bus to delivery, and the SSRF guard covers the primary attack vectors (IP literals + localhost).

[Octo-Q] verdict: APPROVE — P2 suggestions are non-blocking improvements for v2 consideration.

@Jerry-Xin Jerry-Xin left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In scope for Multica, but the outbound webhook SSRF guard is bypassable at delivery time.

🔴 Blocking

  • 🔴 Critical: The SSRF protection only validates the originally configured URL, but delivery uses a default http.Client, which follows redirects and performs unrestricted DNS resolution. A subscriber can configure an allowed public URL that returns a redirect to http://127.0.0.1, http://169.254.169.254, or another private target, and d.client.Do(req) will follow it. Likewise, a hostname can resolve or later rebind to a private IP after creation. This defeats the stated guard for loopback/link-local/private/metadata targets. Fix this in the delivery path, not only create/update validation: disable redirects or validate every redirect, and use a custom Transport.DialContext that rejects resolved private/link-local/loopback/unspecified IPs for every connection. References: webhook_subscription.go, dispatcher.go, dispatcher.go.

💬 Non-blocking

  • 🟡 Warning: Delivery concurrency is bounded, but event goroutines are still unbounded under load. DispatchIssueStatusChanged starts one goroutine per event, and each dispatch goroutine can block on d.sem <- struct{}{} while prior deliveries sleep/retry. A large batch or burst can therefore accumulate many blocked dispatch goroutines despite the 16-delivery cap. Consider a bounded worker queue or non-blocking drop/log policy for fire-and-forget v1. Reference: dispatcher.go, dispatcher.go.
  • 🔵 Suggestion: Add handler-level CRUD/RBAC tests for the new API routes. The pure URL/event tests are useful, but they do not exercise owner/admin enforcement, project-in-workspace validation, or the “secret returned only once” API behavior. Reference: webhook_subscription_test.go.

✅ Highlights

  • The feature fits the repository and existing issue/workspace model.
  • The event listener correctly filters issue:updated to status changes and includes prev_status.
  • The shared HMAC signing package is a good consolidation of inbound verification and outbound signing.
  • Targeted verification passed: go test ./internal/handler ./internal/integrations/outwebhook ./internal/webhooksign and go test ./cmd/server -run 'TestNonExistent'.

@yujiawei yujiawei left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #34 (multica)

Reviewed against head 003d5342bbe1e70dcb9e3fd43bc8f503e6e7cb08 (merge-base ac1ba3b). This is a security-sensitive change (outbound webhooks: SSRF surface, HMAC signing, secret handling, detached delivery), so it got an extra-careful pass plus an independent second-opinion engine.

Verdict: Changes requested — one blocking SSRF gap. Everything else is non-blocking. The crypto, multi-tenancy scoping, and the entire frontend contract layer are solid.


Blocking

B1 — SSRF: the create-time URL guard is bypassed by HTTP redirects

server/internal/integrations/outwebhook/dispatcher.go:80

client: &http.Client{Timeout: deliveryTimeout},

The delivery client has no CheckRedirect policy, so it follows 3xx redirects with Go's default behavior (up to 10 hops). validateWebhookURL (server/internal/handler/webhook_subscription.go:108) runs only at create/update and only on the literal URL the operator typed. A subscriber registered at an allowed public URL can simply respond 302 Location: http://169.254.169.254/... (cloud metadata) or http://10.x / 127.0.0.1 (internal subnet), and d.post will follow it and POST the issue payload there. The SSRF guard the PR deliberately built is silently defeated.

The package comment at webhook_subscription.go:100-107 states the intent precisely: "an admin shouldn't be able to turn the server into a probe of its own metadata service or internal subnet." Redirect-following breaks exactly that property. It is blind SSRF (the response body is drained to io.Discard, so it's not reflected to the caller), and creating a subscription is owner/admin-gated — but in a multi-tenant deployment a workspace admin is a customer, not a server operator, so a customer reaching host infrastructure is a real cross-boundary issue.

Fix (closes the whole class, not just redirects): validate at the transport/dial layer so every connection — initial and every redirect hop, after DNS resolution — is checked against isInternalIP. A custom http.Transport with a DialContext that rejects internal resolved IPs is the robust form. The minimal stopgap is a CheckRedirect that re-runs the host check on each hop (or refuses redirects via http.ErrUseLastResponse), but that alone does not address B2.

Closely related (same root, fold into the same fix)

DNS-resolves-to-internal / DNS rebindingwebhook_subscription.go:104-107 documents that DNS is not resolved and rebinding is "out of scope for v1." That deferral is a defensible call on its own, but the create-time-only validation model is what makes both this and B1 possible: nothing re-checks the destination at delivery time. A transport-level DialContext guard that inspects the resolved IP on every dial closes B1 and the rebinding gap in one change, and turns the "out of scope" note into "actually mitigated." Strongly recommended to do both together rather than ship the redirect fix alone.


Non-blocking (P2 — address now or fast-follow, your call)

  • Subscriber URL logged on failuredispatcher.go:195,202. Delivery failures log "url", sub.Url at WARN/ERROR. Subscriber webhook URLs frequently carry tokens in the path/query; this writes customer secrets to server logs. Log subscription ID + parsed host only, or a redacted URL.

  • PATCH {"events":[]} silently resets to the defaultwebhook_subscription.go:142-145 + :312-318. Omitting events keeps the old value (COALESCE), but sending an explicit empty array runs through validateWebhookEvents, which defaults [] to ["issue.status_changed"]. Omitted vs explicit-empty diverge, and a client can't intentionally clear events. Harmless in v1 (single event type), but it's latent design debt the moment a second event ships. Cleanest fix: drop the empty-default on the update path (default only on create), or reject empty arrays.

  • Dispatch goroutine fan-out is unbounded per eventdispatcher.go:120,167. maxConcurrentDeliveries bounds in-flight POSTs, but each event still spawns a dispatch() goroutine that does the DB list + marshal before hitting the semaphore. A burst of status changes can pile up dispatch() goroutines blocked on d.sem. Bounded for normal load; consider a worker queue if status-change storms are plausible.

  • No composite FK enforcing project_idworkspace_idmigrations/121_webhook_subscription.up.sql. The handler validates this via GetProjectInWorkspace, so it's defense-in-depth only, but a (workspace_id, project_id) composite FK would make the invariant a database guarantee.

  • Authorization runs after input validation on createwebhook_subscription.go:223-235. URL/events validation happens before requireWorkspaceRole. Minor info-leak/fingerprinting surface; move the role check up to right after body decode.

  • Optional hardening: delivery goroutines have no recover() (dispatcher.go:120,168) — verified there are no currently-reachable panic vectors, so this is defensive-only, not a bug. In-flight deliveries use context.Background() and aren't cancelled on shutdown — acceptable for fire-and-forget. A constant-time-from-the-start tweak to webhooksign.Verify (compute HMAC unconditionally) closes a negligible format-vs-content timing distinction.


What's solid (verified, not assumed)

  • HMACwebhooksign.go signs the exact marshaled body once and reuses it across retries; Verify uses hmac.Equal (constant-time compare). Extracting the shared package and routing inbound verifyHubSignature through it is a good de-dup that prevents inbound/outbound drift.
  • Secret handling — full secret returned once on create (SecretOnce), never echoed by list/get/update; only the 4-char secretHint is exposed. Not logged in the handler. Cleartext-at-rest is a documented, pre-existing pattern (mirrors autopilot_trigger, tracked as MUL-2334).
  • prev_status batch fixissue.go:2919 correctly brings the batch path in line with the single-update path; both now publish prev_status, which the listener depends on. Without it, batch status changes would have fired webhooks with an empty previous_status.
  • Multi-tenancy — every read/write is scoped by workspace_id; updates/deletes resolve through the loader and use row.ID/row.WorkspaceID, consistent with the repo's UUID-parsing convention. Project-scope create validates project membership.
  • Frontend — the new API client methods all run responses through parseWithFallback + zod with explicit fallbacks (no bare as), there's a malformed-response schema test (missing/null/wrong-type cases), the create-once secret stays in React state only and is rendered as text (no XSS), i18n keys are at parity across en/ja/ko/zh-Hans, and no package-boundary violations. This fully satisfies the repo's API Response Compatibility rules.

Coverage / what I did not verify

  • Static review only — I did not run make test / the Go + TS suites against this checkout.
  • TLS certificate validation on delivery, and any reverse-proxy / egress filtering in front of the server (infra-level defense-in-depth), were not assessed.
  • An independent long-context second leg was attempted but timed out and is absent from this review — so cross-family corroboration here is single-engine plus this reviewer's own analysis, not two. The blocking finding (B1) was independently confirmed by direct code inspection and an adversarial verification pass, so confidence in it is high regardless.

…dening

Blocking (SSRF bypass at delivery time):
- New internal/netguard package: NewRestrictedHTTPClient resolves the host and
  refuses to dial loopback/link-local/private/unspecified addresses on EVERY
  dial — including each redirect hop and across DNS rebinding — then dials the
  exact verified IP (closes the resolve→dial TOCTOU). The dispatcher uses it, so
  a subscriber can no longer register a public URL that 302s to 169.254.169.254
  / 127.0.0.1 / a private host, nor use a hostname that resolves to internal.
- validateWebhookURL keeps the create-time literal/localhost fast-reject (now via
  netguard.IsBlockedIP); the transport is the authoritative enforcement.

Hardening (non-blocking review items):
- Bounded delivery: replace the per-event goroutine + blocking semaphore with a
  fixed worker pool draining a buffered queue; enqueue is non-blocking (drop +
  log when full) so dispatch goroutines can't pile up under a status-change
  storm. Workers recover() so one bad delivery can't kill a worker.
- Redact logs: log subscription_id + host only, never the full subscriber URL
  (URLs often carry tokens).
- events: validateWebhookEvents now rejects an empty list; create applies the
  default, update does not — so PATCH {"events":[]} is a 400 instead of a silent
  reset to the default.
- Create authorizes (requireWorkspaceRole) before validating input.

Tests:
- netguard: IsBlockedIP table + restricted client refuses loopback / redirect-to-
  loopback.
- dispatcher: production constructor blocks internal delivery; delivery-mechanism
  tests inject a permissive client.
- handler: RBAC 403 for non-admin, secret-shown-once, project-must-be-in-
  workspace (404), PATCH empty-events (400).
@lml2468

lml2468 commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks for the careful review. Pushed 1eda2ae addressing the blocking SSRF finding and the non-blocking items.

🔴 Blocking — SSRF bypass at delivery time (fixed at the transport layer)

The create-time URL check was insufficient because the delivery client followed redirects and did unrestricted DNS. Fixed where it belongs — the delivery transport:

  • New internal/netguard package. NewRestrictedHTTPClient installs a DialContext that resolves the host, rejects if any resolved IP is loopback/link-local/private/unspecified, and then dials the exact verified IP (closes the resolve→dial TOCTOU). Because every redirect hop re-dials through this transport, redirect-based SSRF and DNS-rebinding / hostname-resolves-to-internal are all closed in one place.
  • The dispatcher now uses this client; validateWebhookURL keeps the create-time literal/localhost fast-reject (via netguard.IsBlockedIP) purely as a UX nicety.
  • Tests: netguard refuses loopback + redirect-to-loopback; a dispatcher test proves the production constructor blocks delivery to a loopback endpoint.

This directly addresses @Jerry-Xin's 🔴 and @yujiawei's B1 (incl. the folded-in DNS-rebinding note).

💬 Non-blocking — also addressed

  • Subscriber URL in logs (@yujiawei) → now log subscription_id + host only, never the full URL.
  • PATCH {"events":[]} silently resets to default (@yujiawei) → validateWebhookEvents now rejects empty; create applies the default, update returns 400.
  • Unbounded dispatch goroutines (@Jerry-Xin / Octo-Q F2) → replaced the per-event goroutine + blocking semaphore with a fixed worker pool + bounded queue; enqueue is non-blocking (drop+log on full) so dispatch goroutines can't accumulate. Workers recover().
  • Handler RBAC/behavior tests (@Jerry-Xin) → added: non-admin → 403, secret-shown-once, project-must-be-in-workspace → 404, empty-events → 400.
  • Auth before input validation (@yujiawei) → requireWorkspaceRole now runs immediately after body decode.

Deliberately deferred (with rationale)

  • Composite FK project_id ∈ workspace_id: project has no UNIQUE(id, workspace_id), so this needs altering a core table for enforcement the handler already does via GetProjectInWorkspace. Defense-in-depth only — left out.
  • Cleartext secret at rest: pre-existing pattern (autopilot_trigger.signing_secret), tracked in MUL-2334.
  • webhooksign.Verify unconditional HMAC: hmac.Equal is already constant-time; the format-vs-content timing distinction is negligible (per @yujiawei).

Verification: go build/vet/gofmt clean; full go test ./internal/handler ./internal/integrations/outwebhook ./internal/webhooksign ./internal/netguard ./cmd/server green against a local Postgres (incl. the 4 new RBAC tests).

@lml2468 lml2468 requested review from Jerry-Xin and yujiawei June 16, 2026 03:42

@mochashanyao mochashanyao left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Octo-Q · automated review]

Verdict: Approve — no blocking findings; notes below (data-flow traced).


PR#34 Review Report — feat(webhooks): outbound webhooks on issue status change

Reviewer: Octo-Q (automated review)
Head SHA: 1eda2ae1619217ebd184ee668058afbaa25ef6c3
Repo: Mininglamp-OSS/Octo-Q
Routing: complexity=security_sensitive (automated review)


1. Verification Summary

Area Status Evidence
SSRF guard (netguard) server/internal/netguard/netguard.go:22-28 — blocks loopback, link-local (incl. 169.254.169.254), private, unspecified. Dial-time enforcement with resolve→check→dial-exact-IP pattern closes TOCTOU and DNS rebinding.
SSRF redirect-hop coverage netguard.go:38-42 — transport uses restricted dialer; Go default redirect policy re-dials through same transport, so each hop is re-checked.
HMAC signing (webhooksign) server/internal/webhooksign/webhooksign.go:28-42 — HMAC-SHA256 + hmac.Equal constant-time comparison. Shared package for inbound verify + outbound sign prevents drift.
CRUD authorization handler/webhook_subscription.go — all 4 handlers call requireWorkspaceRole(..., "owner", "admin"). Create checks auth BEFORE validation (prevents probing). loadWebhookSubscription scopes by (id, workspace_id).
Project scope validation webhook_subscription.go:228-240 — CreateWebhookSubscription verifies project belongs to workspace via GetProjectInWorkspace.
Secret management generateWebhookSecret()generateCredential("whsec_") → 32 bytes crypto/rand, URL-safe base64 (256 bits). Secret returned ONLY in create response (SecretOnce), never echoed on list/get.
Dispatcher async isolation dispatcher.go:148-150DispatchIssueStatusChanged spawns goroutine, returns immediately. Never blocks the event bus or HTTP request path.
Worker pool bounding dispatcher.go:56-60 — 16 fixed workers, 1024 queue capacity, non-blocking enqueue (drops on full with log).
Retry logic dispatcher.go:186-207 — 3 attempts, retries on network error + 5xx, stops on 4xx. Backoff: 1s, 4s.
Batch prev_status fix handler/issue.go:2919 — batch update now publishes prev_status: prevIssue.Status, matching the single-update path at line 2417.
Migration 121 Proper FK with ON DELETE CASCADE on both workspace_id and project_id. Partial index on enabled for dispatch hot path.
Frontend RBAC use-webhook-section.ts:63-64canManage gated on owner/admin. Queries gated on canManage.
Schema leniency schemas.ts.loose() on zod schemas, parseWithFallback for graceful degradation.

2. Findings

No P0 or P1 issues found.

P2 — PR description understates SSRF protection (DNS rebinding)

Diff-scope: pre-existing (in the PR body text, not in code).

The PR body states: "SSRF guard checks IP literals + localhost; does not resolve DNS (DNS-rebinding out of scope for v1)."

However, the actual netguard.restrictedDialContext implementation (netguard.go:55-72) does resolve DNS (net.DefaultResolver.LookupIPAddr), checks ALL resolved IPs, and dials the exact verified IP — which does defend against DNS rebinding. The code comment at netguard.go:5-10 correctly describes this. The PR body should be corrected to match the implementation.

Impact: Documentation-only. Could mislead future security reviewers into thinking DNS rebinding is unmitigated when it actually is.

P2 — Dispatcher worker pool has no graceful shutdown

Diff-scope: new (introduced by this PR).

outwebhook.New() starts 16 worker goroutines (dispatcher.go:107-109) that run for job := range d.jobs with no stop signal. The Dispatcher has no Close()/Stop() method. On server shutdown, in-flight deliveries are abandoned and workers are killed with the process.

Impact: Consistent with the rest of the server (no graceful shutdown pattern). Acceptable for v1 fire-and-forget. Consider adding a context.Context-based shutdown for v2 if delivery persistence is added.

P2 — Silent delivery drop on queue-full

Diff-scope: new (introduced by this PR).

dispatcher.go:175-179 — when the 1024-slot delivery queue is full, new deliveries are dropped with only a slog.Warn. No metric/counter is incremented.

Impact: Under extreme load, operators have no alerting signal beyond log scraping. Consider adding a Prometheus counter or similar observable for dropped deliveries.

3. Suggestions

  1. Correct PR body DNS-rebinding claim to match the actual implementation (it's better than stated).
  2. Future: Add a Close() method to Dispatcher that cancels workers and drains the queue, especially before adding delivery-history persistence.
  3. Future: Add a counter metric for queue-full drops so operators can alert on delivery loss.

4. Additional Observations

  • Signing secret at rest: Stored cleartext, matching existing autopilot_trigger.signing_secret convention. Acknowledged in PR notes. No general secrets-at-rest infra exists yet (MUL-2334). Not a regression.
  • No delivery-history persistence: v1 is fire-and-forget with log-only outcomes. Asymmetric with inbound webhook_delivery table. Acknowledged and deferred.
  • No secret rotation endpoint: Users must delete + recreate to rotate. Acceptable for v1.
  • generateCredential refactor: Clean extraction — generateWebhookToken and generateWebhookSecret now share one entropy source. Good.
  • signingSecretHintsecretHint dedup: Correctly eliminates the duplicate between autopilot and webhook handlers.

5. Data-Flow Tracing

Path A: Batch issue update → outbound webhook delivery

  1. BatchUpdateIssues (issue.go:2919) publishes EventIssueUpdated with status_changed: true + prev_status: prevIssue.Statusthis PR fixes the missing prev_status
  2. registerWebhookListeners (webhook_listeners.go:24-26) checks payload["status_changed"].(bool) → true
  3. Extracts payload["issue"].(handler.IssueResponse) (line 29) and payload["prev_status"].(string) (line 36)
  4. Calls d.DispatchIssueStatusChanged(...) (line 43-50) with typed struct
  5. DispatchIssueStatusChanged (dispatcher.go:148) spawns goroutine → dispatch()
  6. dispatch() queries ListEnabledWebhookSubscriptionsForDispatch(ctx, wsUUID) (line 160) — SQL: WHERE workspace_id = $1 AND enabled = true
  7. Filters by subscriptionMatches(sub, ev.ProjectID) (workspace-level matches all; project-level matches own project) AND subscribedToEvent(sub, EventIssueStatusChanged) (line 166-168)
  8. Marshals outboundPayload with event, workspace_id, actor, issue, previous_status, delivered_at (line 173-180)
  9. Enqueues deliverJob per matched subscription (line 184-189)
  10. Worker picks up → deliver() signs with webhooksign.Sign(sub.Secret, body)post() via SSRF-restricted HTTP client

Verdict: Data flows correctly from issue update through event bus to signed HTTP delivery. The prev_status fix closes the batch-update gap.

Path B: Webhook subscription CRUD

  1. POST /api/webhook-subscriptions → auth middleware → CreateWebhookSubscription
  2. requireWorkspaceRole(owner/admin)validateWebhookURL (SSRF fast-check) → validateWebhookEvents (allow-list) → verify project in workspace → generateWebhookSecret() → DB insert → response with SecretOnce
  3. Subsequent GET/list: webhookSubscriptionToResponse() never populates SecretOnce (omitempty hides empty string)

Verdict: Secret lifecycle is correct — generated, returned once, never re-exposed.

6. R5 Blind-Spot Checklist (security_sensitive)

C1 — Dual-path parity: CLEAR

  • Create ↔ Delete: Both gated on owner/admin, both scoped by workspace_id. ✅
  • Create ↔ Update: Both gated, both validate URL/events. Update cannot change project_id (by design). ✅
  • List (workspace) ↔ List (project): Both gated, both scoped. ✅
  • Single issue update ↔ Batch issue update: Both now publish prev_status. This PR fixes the batch path parity. ✅

C2 — Control-flow ordering / nested reuse: CLEAR

  • webhooksign.Sign and webhooksign.Verify are leaf functions with no nested call chains. ✅
  • netguard.restrictedDialContext is called once per connection (including redirect hops). No double-application risk. ✅
  • validateWebhookURL is called at create-time and update-time (when URL changes). No ordering dependency with other validators. ✅
  • No security control is applied in a "check-then-act" pattern that could be bypassed by reordering. ✅

C3 — Authorization boundary ≠ capability boundary: CLEAR

  • All CRUD endpoints are behind auth middleware + workspace role check. ✅
  • The dispatcher runs server-side only — no user-facing endpoint to trigger delivery directly. ✅
  • loadWebhookSubscription uses (id, workspace_id) composite lookup — cross-workspace access impossible. ✅
  • The webhook listener on the event bus is registered server-side — no external caller can inject events. ✅

[Octo-Q] verdict: APPROVE

No P0 or P1 issues found. The security architecture is well-designed: SSRF guard at dial-time with DNS rebinding defense, HMAC signing in a shared package, owner/admin-gated CRUD with workspace scoping, and async delivery that never blocks the request path. Three P2 observations (PR body documentation mismatch, no graceful shutdown, silent queue-full drops) are non-blocking.

@Jerry-Xin Jerry-Xin left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary: The PR is relevant to Multica, but the outbound webhook SSRF protection has a proxy bypass that must be fixed before merge.

🔴 Blocking

  • 🔴 Critical: server/internal/netguard/netguard.go:38-40 uses http.ProxyFromEnvironment in the “restricted” webhook client. When HTTP_PROXY/HTTPS_PROXY is set, Go dials the proxy and the custom DialContext validates only the proxy address, not the user-supplied webhook target. That bypasses the DNS/internal-IP guard for hostnames accepted by validateWebhookURL at server/internal/handler/webhook_subscription.go:121-126, letting a webhook target that resolves to private/link-local infrastructure be reached through the proxy. For this security boundary, either disable proxies for webhook delivery (Proxy: nil) or add target-host validation that runs independently for every request and redirect before proxy use. Add a regression test with a proxy configured and an internal-resolving target.

💬 Non-blocking

  • 🟡 Warning: server/internal/integrations/outwebhook/dispatcher.go:160-175 starts an unbounded goroutine per status-change event, and each goroutine can hold a DB query for up to 10 seconds. The worker pool bounds HTTP delivery, but not subscription lookup pressure. Consider routing dispatch lookup through a bounded queue or semaphore if status changes can spike.

✅ Highlights

  • The PR keeps subscription CRUD owner/admin-gated and scopes writes by workspace_id.
  • HMAC signing is centralized in webhooksign, reducing inbound/outbound drift.
  • Targeted Go tests for netguard, outwebhook, and webhooksign pass locally.

@Jerry-Xin Jerry-Xin dismissed their stale review June 16, 2026 04:03

Superseded by re-review on new head 1eda2ae (review #4503139076). Original SSRF redirect/transport blocker is fixed via netguard dial-time IP guard; new CR is for a remaining proxy-path bypass.

@yujiawei yujiawei left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #34 (multica)

Reviewed against head 1eda2ae1619217ebd184ee668058afbaa25ef6c3 (merge-base ac1ba3b). This is a security-sensitive change (outbound webhooks: SSRF surface, HMAC signing, secret handling, detached delivery), so it received an extra-careful pass plus an independent second-opinion engine and a fresh adversarial verification of the SSRF guard at this head.

This is a re-review. The prior round's blocking finding — the create-time-only SSRF guard being bypassed by HTTP redirects / DNS rebinding — is now fixed by the new netguard package: the delivery client resolves the host, rejects any blocked resolved IP, and dials the exact verified IP on every hop (initial + each redirect). That closes the redirect and rebinding gaps cleanly, and netguard_test.go proves it. Nice work.

Verdict: Changes requested. One residual SSRF bypass in that same new guard (proxy-enabled deployments) keeps it from being airtight. Everything else below is non-blocking. The crypto, multi-tenancy scoping, RBAC, and the entire frontend contract layer are solid and independently verified.


Blocking

B1 — SSRF guard is bypassed when an egress proxy is configured

server/internal/netguard/netguard.go:39

transport := &http.Transport{
    Proxy:                 http.ProxyFromEnvironment,
    DialContext:           restrictedDialContext(dialer),
    ...
}

The whole security guarantee of this package is enforced in restrictedDialContext — it resolves the destination host and refuses to dial a blocked internal IP. But with Proxy: http.ProxyFromEnvironment, when HTTP_PROXY / HTTPS_PROXY / ALL_PROXY is set in the server's environment, Go dials the proxy address, not the target. restrictedDialContext then only validates the proxy's IP; the user-configured webhook host is sent to the proxy as a CONNECT/absolute-URI target and never passes through IsBlockedIP. A subscriber can register http://169.254.169.254/... (cloud metadata) or an internal host, and the proxy will happily fetch it — the dial-time guard the package promises is silently a no-op for the destination.

This is narrower than the redirect bug that was just fixed (it requires a proxy to be configured), but it matters here specifically because:

  • The package docstring states the guarantee is enforced "at dial time … on every dial," which is not true under a proxy.
  • The PR's own testing context is a Dockerized self-host deployment, which is exactly where an egress/forward proxy is commonly present.
  • It re-opens the precise SSRF class (loopback / link-local / metadata / private) the prior round blocked on.
  • netguard is used only by this webhook delivery path, so disabling the proxy here has no collateral impact.

Fix (one line): for the restricted client, set Proxy: nil so connections always go direct and always flow through restrictedDialContext. If proxied egress is a real requirement for this deployment, the host check must instead run against the target host before delegating to the proxy (and that path needs its own test). Given v1 scope, Proxy: nil is the clean choice. A regression test that sets HTTP_PROXY and asserts an internal target is still refused would lock this down.


Non-blocking (P2 — address now or fast-follow, your call)

  • One-time signing secret can leak into logs on schema driftpackages/core/api/client.ts:2108-2120 + packages/core/api/schema.ts:46-54. The create response carries the full signing secret (returned once). It is parsed via parseWithFallback, which on any validation failure logs received: data — i.e. the entire response body, including the secret — to the schema logger / telemetry. A future shape drift (e.g. a malformed created_at) would write the live secret to logs, and the user would also silently get the empty-fallback object instead of their secret. Consider a redacted parse path for this secret-bearing endpoint (strip secret before logging, or parse without raw-body logging here). Low likelihood, but the blast radius is a live credential.

  • Per-event dispatch goroutine is unbounded before backpressureserver/internal/integrations/outwebhook/dispatcher.go:160-179. The delivery queue (numWorkers=16, queueCapacity=1024, non-blocking enqueue) is well designed and bounds in-flight POSTs. But DispatchIssueStatusChanged still does go d.dispatch(ev) per event, and each dispatch goroutine runs the subscription DB query + marshal before reaching the bounded queue. Under a status-change storm (e.g. a large batch update), this can spawn many concurrent dispatch goroutines each issuing a DB query for up to listTimeout. Bounded for normal load; consider funneling the lookup/enqueue step through a bounded worker too if storms are plausible. (Raised in the prior round as well — still open.)

  • No composite FK enforcing project_idworkspace_idserver/migrations/121_webhook_subscription.up.sql. The handler validates this at create via GetProjectInWorkspace, so it's defense-in-depth only; a (workspace_id, project_id) composite FK would make the cross-tenant invariant a database guarantee rather than an application one.


What's solid (verified, not assumed)

  • SSRF guard core (the fix)netguard.go: resolves host, rejects any blocked resolved IP, dials the exact verified IP, and re-runs on every redirect hop via Go's default redirect policy re-dialing the transport. TLS verification is left intact (not disabled). Redirect-to-loopback and split-horizon DNS are covered by tests. This is the right shape; only the proxy path (B1) escapes it.
  • HMACwebhooksign.go signs the exact marshaled body once and reuses it across retries; Verify uses hmac.Equal (constant-time). Consolidating inbound verifyHubSignature onto the shared package removes inbound/outbound drift risk — a good de-dup.
  • Secret handling — full secret returned once on create (SecretOnce), never echoed by list/get/update; only the 4-char secret_hint is exposed. Not logged server-side. Cleartext-at-rest is a documented, pre-existing pattern (mirrors autopilot_trigger, tracked separately).
  • RBAC / multi-tenancy — all CRUD is owner/admin-gated; reads/writes scoped by workspace_id; update/delete resolve through the loader and use row.ID / row.WorkspaceID per the repo's UUID convention; project-scoped create validates project membership. The handler test covers the member-forbidden, secret-shown-once, foreign-project-404, and empty-events-400 cases — the RBAC/CRUD gap flagged last round is now closed.
  • prev_status batch fixissue.go:2919 brings the batch path in line with the single-update path; both publish prev_status, which the listener depends on. Without it, batch status changes would have fired webhooks with an empty previous_status.
  • Event wiring — listener correctly filters issue:updated on status_changed, the dispatcher's SupportedEventTypes is the single source of truth feeding the handler allow-list, and a malformed events column fails closed (treated as "no events", never a panic). Delivery worker has a recover() so one bad delivery can't kill a worker.
  • Frontend — new API client methods all run responses through parseWithFallback + lenient zod with explicit fallbacks (no bare as); there's a malformed-response schema test (missing / null / wrong-type); the once-shown secret stays in React state and is rendered as text (no XSS); i18n keys are at parity across en/ja/ko/zh-Hans; no package-boundary violations. Satisfies the repo's API Response Compatibility rules.

Coverage / what I did not verify

  • Targeted Go suites pass on this checkout: go test ./internal/handler ./internal/integrations/outwebhook ./internal/netguard ./internal/webhooksign — all green; go build / go vet / gofmt clean on the new files. I did not run the full make check (TS unit + E2E).
  • TLS pinning and any reverse-proxy / egress filtering in front of the server (infra-level defense) were not assessed beyond confirming cert validation isn't disabled in code.
  • A second independent long-context review leg was attempted but timed out with no output and is therefore absent from this review — cross-family corroboration here is single-engine plus this reviewer's own analysis, not two. B1 was independently confirmed by direct code inspection regardless, so confidence in it is high.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants