Skip to content

fix(engine): Baileys post-review hardening — lazy-load (no Node floor), full inbound, lifecycle + store fixes#312

Merged
rmyndharis merged 7 commits into
mainfrom
feat/baileys-review-hardening
Jun 18, 2026
Merged

fix(engine): Baileys post-review hardening — lazy-load (no Node floor), full inbound, lifecycle + store fixes#312
rmyndharis merged 7 commits into
mainfrom
feat/baileys-review-hardening

Conversation

@rmyndharis

Copy link
Copy Markdown
Owner

Summary

Post-review hardening of the Baileys engine, resolving 12 findings from a deep source-traced review of the 5-slice stack (#299#310). Stacked on #310 (base feat/baileys-contacts-chats); the diff here is the hardening only.

Why this PR exists

A pre-merge multi-agent review found 2 Critical + 10 Important defects in the engine — concentrated in the inbound path, the reconnect lifecycle, the SQLite message store, and the Node-floor decision. Rather than ship the stack with them, this PR fixes all 12 (each adversarially verified, then re-verified by an Opus whole-branch review).

What changed

Engine loading — Approach B (no global Node floor)

  • Lazy-load @whiskeysockets/baileys via a cached dynamic import() in the adapter + message-store service, so the ESM-only package loads only when the Baileys engine is actually used. The default whatsapp-web.js engine no longer boot-loads it.
  • Removed the engines.node >= 20.19 floor and the boot-time Node guard entirely — wwjs-only operators keep any Node version. (Build verified: dist emits a dynamic import(, zero eager require("@whiskeysockets/baileys").)

Inbound pipeline (full parity with wwjs)

  • Incoming media downloaded to base64 (downloadMediaMessage, with normalizeMessageContent unwrapping documentWithCaption/view-once/ephemeral), caption → body, location, and quoted message now populated — previously these arrived blank.
  • Remote delete-for-everyone (protocolMessage REVOKE) → onMessageRevoked; reactions (reactionMessage) → onMessageReaction — instead of being mis-persisted as spurious type:'unknown' message.received rows.
  • Incoming status@broadcast is dropped in session.service.onMessage (mirrors the existing onMessageCreate guard) — no longer pollutes the message list/webhooks.
  • The inbound handler is fully guarded (a malformed message can never crash the event loop).

Connection lifecycle

  • Stop-race guard: connect() re-checks intentionalClose after its awaits, so a disconnect()/logout()/destroy() during connection setup can't resurrect a stopped session with a live socket.
  • Capped backoff reconnect: an in-flight guard + exponential backoff (max 30s) + a 5-attempt cap → FAILED + onError, replacing the previous unbounded tight reconnect loop; the counter resets on a successful connection and the pending timer is cleared on stop.
  • First-connect errors surface as FAILED + onError (mirrors the wwjs adapter) instead of a stuck INITIALIZING.

Persisted message store

  • C1 (Critical): the SQLite eviction wiped the store to ~0 (second-precision datetime('now') vs the millisecond-precision bound param → string-compare matched every same-second row). put() now writes a millisecond-precise createdAt so eviction keeps exactly the cap. A regression test drives real put() (would return 0 under the bug, returns the cap with the fix).
  • I6: added a @ManyToOne(() => Session, { onDelete: 'CASCADE' }) relation so the default SQLite synchronize path emits the FK CASCADE (the migration already had it for Postgres) — deleting a session no longer orphans its stored messages.

Testing

  • 642 unit + 6 e2e green; backend lint 0 / build clean; dashboard build clean / lint 0 errors.
  • New coverage: inbound media/location/quoted/documentWithCaption, revoke/reaction classification (asserts no spurious row), status filter, the stop-race + reconnect-cap (proves cap=5) + first-connect-error lifecycle tests, and the SQLite eviction regression test.
  • Reviewed via per-task spec+quality gates and a final Opus whole-branch review: all 12 findings verified resolved, no blocking regressions.

Non-blocking follow-ups (deferred)

  • Within one messages.upsert batch, webhook/ws emission order is no longer strictly arrival-order (each message carries its own timestamp; consumers shouldn't rely on intra-batch order).
  • A CI grep asserting dist/**/baileys*.js contains import( and no eager require("@whiskeysockets/baileys") would guard the lazy-load invariant against future regressions.

…(Approach B)

Switch @whiskeysockets/baileys from static value-imports to a private cached
dynamic import() in both BaileysAdapter and BaileysMessageStoreService.  The
package now loads only when ENGINE_TYPE=baileys is selected at runtime; the
default whatsapp-web.js path never touches it.

Changes:
- baileys.adapter.ts: remove static value-import; add private `lib` field +
  `loadLib()` async loader; update connect() to call loadLib() and reference
  all Baileys values (default/useMultiFileAuthState/fetchLatestBaileysVersion/
  DisconnectReason/getContentType) via the loaded module.
- baileys-message-store.service.ts: remove static BufferJSON import; add
  `baileysLib` + `loadLib()` loader; call loadLib() inside put()/getMessage().
- package.json: remove the engines.node >=20.19.0 floor (no longer required;
  Baileys loads lazily so there is no boot-time ESM require() constraint).
- package.json jest.transform: override ts-jest tsconfig to CommonJS so
  dynamic import() is downleveled for the CJS jest runner, making
  jest.mock/@whiskeysockets/baileys resolve correctly in unit tests.
- CHANGELOG.md: replace the Node-floor warning with a lazy-load note.

Verified: dist files contain import('@whiskeysockets/baileys') not
require(), 627 unit tests pass, 6 e2e tests pass, lint 0 errors.
…pressions)

Replace `private lib: any` / `private baileysLib: any` with `typeof BaileysLib`
(namespace import type — fully erased at compile time, zero dist require).
Loader return types updated to `Promise<typeof BaileysLib>`. Removes all
`// eslint-disable-next-line @typescript-eslint/no-unsafe-*` suppressions that
the `any` fields required. Wraps `saveCreds` in `() => void` to satisfy the
ev.on void-return constraint now that the call site is typed.
…on + status filter

- handleMessagesUpsert fans out to async processInboundMessage per message
- protocolMessage REVOKE → onMessageRevoked (no onMessage, no spurious DB row)
- reactionMessage → onMessageReaction (no onMessage)
- mapMessage (now async): caption falls back to body; image/video/audio/document/sticker
  downloads via downloadMediaMessage (guarded try/catch — failure logs + omits media,
  never throws into event loop); locationMessage populates location coords + name/address;
  contextInfo.quotedMessage populates quotedMessage {id, body}
- BaileysIncomingFields gains media/location/quotedMessage; buildIncomingMessageFromBaileys
  passes them through to the neutral IncomingMessage
- session.service.ts onMessage: guard isStatusBroadcast at the top (mirrors onMessageCreate)
  so incoming status@broadcast posts are never persisted or webhooked
- Tests: all 634 unit + 6 e2e pass; lint clean; build clean
Wrap the entire processInboundMessage body in try/catch so a throw
from any path logs-and-drops instead of becoming an unhandled rejection.

Use normalizeMessageContent (available in @whiskeysockets/baileys@6.7.23)
to unwrap documentWithCaptionMessage/viewOnceMessage/ephemeralMessage
before extracting subMessage, so documentWithCaptionMessage now yields
a non-empty mimetype and fileName.

Replace the (qm.conversation as unknown as string) double-cast in the
quoted-message body extraction with a properly typed intermediate.

Tests: add fakeStore.put not.toHaveBeenCalled assertions to REVOKE and
reaction tests; add normalizeMessageContent identity mock; add fixture
test proving documentWithCaptionMessage extracts non-empty mimetype.
…K CASCADE on session delete

C1: set createdAt: new Date() explicitly in put() upsert so the stored value carries
millisecond precision matching the :createdAt bound param in enforceLimit(). Without
this, SQLite's datetime('now') stores second-precision (e.g. '…:11') while the JS
Date bound serializes as '…:11.000' — SQLite string-compares '…:11' < '…:11.000' = TRUE,
so every same-second row matches the eviction predicate, wiping the store to ~0.
Reply/forward/react/delete then fail with "message not found".

Add a real put()-driven eviction regression test (6 msgs, cap=3, same wall-clock second)
that must fail on old code (count=0) and pass with the fix (count=3, C4/C5/C6 kept).

I6: add @manytoone(() => Session, { onDelete: 'CASCADE' }) to BaileysStoredMessage so the
synchronize:true SQLite path emits the CASCADE FK (the migration path already had it).
Deleting a session now cascade-deletes its stored messages on both paths. Session entity
added to the unit-test DataSource; session rows seeded before put() calls.

SQLite FK enforcement in the unit test: TypeORM sqlite driver does not run PRAGMA
foreign_keys=ON by default, so FKs are declared but not enforced — existing put() tests
pass without a parent sessions row pre-existing. Session rows are seeded anyway for
correctness (and to be safe if FK enforcement is ever enabled).
Base automatically changed from feat/baileys-contacts-chats to main June 18, 2026 08:37
@rmyndharis rmyndharis merged commit 937aa9e into main Jun 18, 2026
@rmyndharis rmyndharis deleted the feat/baileys-review-hardening branch June 18, 2026 08:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant