Skip to content

fix(contract): honor webhook timeout, bound shutdown, unify unsupported-op 501, validate engine/storage env#350

Closed
rmyndharis wants to merge 1 commit into
mainfrom
fix/contract-config-hardening
Closed

fix(contract): honor webhook timeout, bound shutdown, unify unsupported-op 501, validate engine/storage env#350
rmyndharis wants to merge 1 commit into
mainfrom
fix/contract-config-hardening

Conversation

@rmyndharis

@rmyndharis rmyndharis commented Jun 19, 2026

Copy link
Copy Markdown
Owner

Eight small contract / lifecycle / config / doc fixes. No breaking changes; defaults unchanged.

Fixed

  • WEBHOOK_TIMEOUT is honored on the primary (queued) path and the test endpoint. The configured timeout was applied only on the deprecated direct path; the BullMQ webhook processor and POST /webhooks/:id/test hardcoded 10s, so raising WEBHOOK_TIMEOUT had no effect when QUEUE_ENABLED=true (the recommended production path). All three sites now read webhook.timeout. (The processor gains a ConfigService constructor dependency; ConfigModule is global so DI resolves it.)
  • Graceful shutdown is bounded. CacheService.onModuleDestroy awaited redis.quit() with no deadline; on a half-open socket the QUIT reply never arrives, blocking app.close() until the orchestrator SIGKILLs the process. It now force-disconnects after a short timeout (timer cleared on a clean quit) so shutdown always completes.
  • Unsupported status/catalog operations return a consistent 501. Six whatsapp-web.js stubs threw a raw Error → HTTP 500, while Baileys returned 501 for the same ops. They now throw EngineNotSupportedError (501) uniformly. (The silent return null/[] read stubs are intentionally left for a capability-gated follow-up — flipping a 200-empty to 501 would change behavior for clients that treat empty as success.)
  • A misconfigured ENGINE_TYPE / STORAGE_TYPE fails fast at boot. A typo (ENGINE_TYPE=bailys) was swallowed and silently fell back to the legacy engine; validateEnv now whitelists the registered ids.

Changed

  • /api/metrics is memoized for a few seconds. Each scrape ran a full session scan + several aggregate queries; back-to-back scrapes (or multiple Prometheus replicas) now share a short-lived rendered result. Stale-by-a-few-seconds metrics are expected for Prometheus. The METRICS_TOKEN gate and @SkipThrottle are unchanged.
  • Internal: removed a dead if (!validKey) branch in the WebSocket connect handler — validateApiKey always throws on failure, so the surrounding catch is the real rejection path. The load-bearing handleSubscribe null-branch (revoked-mid-connection) is untouched.

Documentation

  • Documented the webhook idempotencyKey / deliveryId fields (body + X-OpenWA-Idempotency-Key / X-OpenWA-Delivery-Id headers) and the dedup-on-idempotencyKey rule in the n8n integration guide.
  • Corrected .env.example rate-limit variables to the names the app actually reads (RATE_LIMIT_MEDIUM_TTL / RATE_LIMIT_MEDIUM_LIMIT, in milliseconds); the advertised RATE_LIMIT_TTL / RATE_LIMIT_MAX were read by nothing.

Tests

New/updated unit tests: webhook timeout flows from config on the processor and both webhook.service sites (AbortSignal.timeout spy, distinct value); cache shutdown force-disconnects when quit() hangs and doesn't when it's clean; validateEnv accepts/rejects engine & storage ids; metrics render() is memoized within the TTL and recomputes after; the WS connect rejection asserts the real throw contract. Full backend 736/736; dashboard build + lint clean.

An adversarial review found no must-fix issues; its one nice-to-have (the two webhook.service timeout sites weren't value-asserted) was closed before opening this PR.

Deferred (own follow-up)

  • Settings PUT: PUT /settings mutates an in-memory field that's lost on restart and never re-applied to runtime config. Making it honest is an API-contract decision (persist + re-apply the truly-mutable values vs. return 501/read-only for env-derived ones), so it's out of scope here.

…ed-op 501, validate engine/storage env

- WEBHOOK_TIMEOUT is now read on the primary (queued) webhook processor and the test endpoint, not just the deprecated direct path (both hardcoded 10s, so raising the timeout had no effect with QUEUE_ENABLED=true).

- CacheService.onModuleDestroy bounds redis.quit() with a force-disconnect deadline so a half-open socket can't block app.close() until SIGKILL.

- whatsapp-web.js status/catalog stubs that threw a raw Error (HTTP 500) now throw EngineNotSupportedError (501), matching Baileys; the silent return-null/[] read stubs are left for capability-gated follow-up.

- validateEnv fails fast on an unknown ENGINE_TYPE / STORAGE_TYPE instead of silently falling back to the default.

- /api/metrics render() is memoized for a few seconds so back-to-back scrapes don't each run a full session scan + aggregates.

- Removed a dead if(!validKey) branch in the WS connect handler (validateApiKey always throws on failure).

- Docs: documented webhook idempotencyKey/deliveryId + X-OpenWA-* headers (n8n dedup); corrected .env.example rate-limit var names (RATE_LIMIT_MEDIUM_TTL/LIMIT, ms).
rmyndharis added a commit that referenced this pull request Jun 19, 2026
* fix(webhook): deliver session lifecycle events and key webhook hardening (#335)

* fix(security): pin outbound webhook and media fetches to validated IP (#338)

* fix(plugins): persist plugin enable/config and restore (#339)

* fix(message): persist bulk-sent messages, sanitize SSRF (#340)

* fix(engine): return the real id for forwarded messages (#341)

* fix(security): harden outbound requests, IPv6 SSRF (#344)

* fix(security): secret-file perms, key pepper, allowedIps (#345)

* fix(storage): bound tar imports, contain storage keys (#346)

* fix(session): reconcile late acks, serialize reactions (#348)

* fix(contract): webhook timeout, bounded shutdown, 501 (#350)

* feat(session): force-kill a stuck session (#352)

* merge #343

* merge #351

* chore(release): v0.4.3 — CHANGELOG, version bump (package.json/dashboard/swagger), README + docs
@rmyndharis

Copy link
Copy Markdown
Owner Author

Shipped in v0.4.3 (integrated via the release PR #354 and tagged v0.4.3). Thanks! 🎉

@rmyndharis rmyndharis closed this Jun 19, 2026
@rmyndharis rmyndharis deleted the fix/contract-config-hardening branch June 19, 2026 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant