
feat(event-bus): restore event-bus plugin with W3C OTel traceparent propagation#5

Closed
sumitjha-arch wants to merge 1 commit into main from fix/otel-gcp-pubsub-traceparent-propagation

@sumitjha-arch

Summary

  • Restores the event-bus plugin (removed in 2.14.0), because WMS production services depend on fp-plugins@2.12.0, which includes it
  • Fixes broken distributed traces across the GCP Pub/Sub async boundary: traceparent was never injected on publish or extracted on consume, so every incoming Pub/Sub message started a disconnected root span in Jaeger
  • Fixes TypeScript 6.0 compilation errors in azure-servicebus.ts, rabbitmq.ts, and commons.ts. The TS 6.0 upgrade landed in the same change that removed this code, so the restored files had never been compiled under TS 6.0

Root cause

publishToPubSub() set attrs = { event } and called topic.publishMessage() without a traceparent attribute. The push handler at /gcp-pubsub/process-message read the message but never called propagation.extract(). As a result, every Pub/Sub-triggered handler ran under a new root span, which never appeared in Jaeger alongside the trace of the originating HTTP request.

What changed

src/event-bus/gcp-pubsub.ts

On publish:

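```typescript
// NEW: inject the active span context (traceparent/tracestate) into the Pub/Sub message attributes
const attrs: Record<string, string> = { event };
if (otel) {
  otel.propagation.inject(otel.context.active(), attrs);
}
await topic.publishMessage({ json: { event, payload }, attributes: attrs });
```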
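The sketch below shows what `propagation.inject()` actually writes. It reproduces, without dependencies, the W3C Trace Context encoding that a W3C propagator places in the attributes map. It does not use the OTel API. `SpanContextLike` and `injectTraceContext` are hypothetical stand-ins, and the event name `order.created` is illustrative.

```typescript
// Hypothetical stand-in for the active span's context; the real type is
// SpanContext from @opentelemetry/api.
interface SpanContextLike {
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters
  sampled: boolean;
  traceState?: string;
}

// Writes W3C Trace Context fields into a Pub/Sub-style attributes map
// (string -> string), using the same keys a W3C propagator's inject() sets.
function injectTraceContext(
  ctx: SpanContextLike | undefined,
  attrs: Record<string, string>,
): void {
  if (!ctx) return; // no active span: nothing to propagate
  const flags = ctx.sampled ? "01" : "00";
  attrs.traceparent = `00-${ctx.traceId}-${ctx.spanId}-${flags}`;
  if (ctx.traceState) attrs.tracestate = ctx.traceState;
}

// Attributes that would be passed to topic.publishMessage({ ..., attributes: attrs }).
const attrs: Record<string, string> = { event: "order.created" };
injectTraceContext(
  { traceId: "89adb2d60dd4fd1e8c2d1bafd228e468", spanId: "9f94e3df4a135bc3", sampled: true },
  attrs,
);
console.log(attrs.traceparent);
// prints: 00-89adb2d60dd4fd1e8c2d1bafd228e468-9f94e3df4a135bc3-01
```

Because Pub/Sub attributes are a flat string map, they can serve directly as the propagation carrier, with no custom getter or setter needed.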

On consume (push handler):

```typescript
// NEW: extract the parent context, start a CONSUMER span, run handlers inside context.with()
const parentCtx = otel.propagation.extract(otel.context.active(), attrs);
const span = tracer.startSpan(`pubsub.consume.${eventMsg.event}`, { kind: otel.SpanKind.CONSUMER }, parentCtx);
const handlerCtx = otel.trace.setSpan(parentCtx, span);
try {
  await otel.context.with(handlerCtx, runHandlers);
} finally {
  span.end(); // end the span even if a handler throws
}
```
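The effect of the extract step can also be sketched without OTel. In this hypothetical model, a valid `traceparent` yields a child span in the publisher's trace, while a missing one produces a fresh root trace, which is the disconnected-trace symptom described under Root cause. `extractParent`, `startConsumerSpan` and `SpanRecord` are illustrative names, not part of the plugin or the OTel API.

```typescript
import { randomBytes } from "node:crypto";

// Minimal record of a started span (hypothetical; not the OTel Span interface).
interface SpanRecord {
  name: string;
  traceId: string;
  spanId: string;
  parentSpanId: string | undefined; // undefined means a root span
}

// Version-00 traceparent: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>.
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}$/;

// Returns the remote parent encoded in the attributes, or undefined if the
// traceparent is absent, malformed, or uses an all-zero (invalid) ID.
function extractParent(
  attrs: Record<string, string | undefined>,
): { traceId: string; spanId: string } | undefined {
  const m = TRACEPARENT_RE.exec(attrs.traceparent ?? "");
  if (!m) return undefined;
  const traceId = m[1];
  const spanId = m[2];
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  return { traceId, spanId };
}

// Starts a CONSUMER-style span: a child of the extracted parent when one exists,
// otherwise a brand-new root trace (the pre-fix behaviour).
function startConsumerSpan(name: string, attrs: Record<string, string | undefined>): SpanRecord {
  const parent = extractParent(attrs);
  return {
    name,
    traceId: parent?.traceId ?? randomBytes(16).toString("hex"),
    spanId: randomBytes(8).toString("hex"),
    parentSpanId: parent?.spanId,
  };
}

const linked = startConsumerSpan("pubsub.consume.order.created", {
  event: "order.created",
  traceparent: "00-89adb2d60dd4fd1e8c2d1bafd228e468-9f94e3df4a135bc3-01",
});
const orphan = startConsumerSpan("pubsub.consume.order.created", { event: "order.created" });
console.log(linked.traceId, linked.parentSpanId);
// prints: 89adb2d60dd4fd1e8c2d1bafd228e468 9f94e3df4a135bc3
console.log(orphan.parentSpanId);
// prints: undefined
```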

src/event-bus/event-consumer/gcp-pubsub.ts

Pull consumer now extracts traceparent from message attributes, starts a pubsub.pull.process CONSUMER span, and re-injects the context as HTTP headers into the internal instance.inject() call so the push handler receives the correct trace context.
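The sketch below roughly illustrates the re-injection step, assuming `instance` is a Fastify instance whose `inject()` accepts `method`, `url`, `headers` and `payload`. `buildPushInjectOptions`, the message shape and the push-style envelope are assumptions made for illustration. The key point is that the `traceparent` header carries the `pubsub.pull.process` span's ID, so anything started from these headers nests under the pull span.

```typescript
// Hypothetical shapes: only the fields this sketch needs.
interface PulledMessage {
  id: string;
  data: Buffer;
  attributes: Record<string, string>;
}
interface PullSpanContext {
  traceId: string;
  spanId: string;
  sampled: boolean;
}
interface InjectOptions {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  payload: unknown;
}

// Builds options for the internal instance.inject() call. The traceparent header
// encodes the pubsub.pull.process span (not the raw incoming attribute), so a
// span started from these headers becomes a child of the pull span.
function buildPushInjectOptions(msg: PulledMessage, pull: PullSpanContext): InjectOptions {
  const headers: Record<string, string> = {
    "content-type": "application/json",
    traceparent: `00-${pull.traceId}-${pull.spanId}-${pull.sampled ? "01" : "00"}`,
  };
  // tracestate carries vendor data; forward it unchanged if the publisher set one.
  const tracestate = msg.attributes.tracestate;
  if (tracestate) headers.tracestate = tracestate;
  return {
    method: "POST",
    url: "/gcp-pubsub/process-message",
    headers,
    // Assumed envelope, modelled on the Pub/Sub push delivery format.
    payload: {
      message: {
        messageId: msg.id,
        data: msg.data.toString("base64"),
        attributes: msg.attributes,
      },
    },
  };
}

const opts = buildPushInjectOptions(
  {
    id: "1",
    data: Buffer.from(JSON.stringify({ event: "order.created", payload: {} })),
    attributes: {
      event: "order.created",
      traceparent: "00-89adb2d60dd4fd1e8c2d1bafd228e468-9f94e3df4a135bc3-01",
    },
  },
  { traceId: "89adb2d60dd4fd1e8c2d1bafd228e468", spanId: "b7ad6b7169203331", sampled: true },
);
console.log(opts.headers.traceparent);
// prints: 00-89adb2d60dd4fd1e8c2d1bafd228e468-b7ad6b7169203331-01
```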

OTel API loading

Both files use a lazy getOtelApi() helper. It first tries to load @opentelemetry/api via NODE_PATH (where the Kubernetes OTel operator makes it available), then falls back to the pnpm content-addressed path. If neither resolves, it returns null instead of throwing, so the plugin is safe to deploy without OTel.
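The sketch below shows one way such a lazy, fail-soft loader could look. `makeGetOtelApi` and its shape check are hypothetical. The pnpm fallback is a placeholder constant because the PR does not give the content-addressed path. Resolution goes through Node's `createRequire`, whose lookup also covers directories listed in `NODE_PATH`.

```typescript
import { createRequire } from "node:module";
import { join } from "node:path";

// Only the parts of @opentelemetry/api this sketch checks for.
interface OtelApiLike {
  propagation: unknown;
  context: unknown;
  trace: unknown;
}

// require() anchored at the working directory (the file need not exist).
const defaultLoad = createRequire(join(process.cwd(), "noop.js"));

// Returns a lazy, memoised getter. `candidates` are tried in order; the first
// one that loads and looks like the OTel API wins. Any failure yields null.
function makeGetOtelApi(
  candidates: string[],
  load: (id: string) => unknown = defaultLoad,
): () => OtelApiLike | null {
  let cached: OtelApiLike | null | undefined; // undefined = not attempted yet
  return () => {
    if (cached !== undefined) return cached;
    let found: OtelApiLike | null = null;
    for (const id of candidates) {
      try {
        const mod = load(id) as Partial<OtelApiLike> | undefined;
        if (mod?.propagation && mod.context && mod.trace) {
          found = mod as OtelApiLike;
          break;
        }
      } catch {
        // Not resolvable here; fall through to the next candidate.
      }
    }
    cached = found;
    return found;
  };
}

// Placeholder only: the real pnpm content-addressed path is not given in the PR.
const PNPM_FALLBACK_PATH = "/path/to/pnpm/store/@opentelemetry/api";
const getOtelApi = makeGetOtelApi(["@opentelemetry/api", PNPM_FALLBACK_PATH]);
const otel = getOtelApi(); // null when OTel is not installed; callers skip tracing
console.log(otel === null ? "OTel unavailable" : "OTel loaded");
```

Memoising the result means the lookup cost, including failed `require` attempts, is paid once per process rather than on every publish or consume.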

Test evidence

Verified in a kind cluster with a GCP Pub/Sub emulator, comparing side-by-side deployments:

| Scenario | message.attributes.traceparent | Jaeger result |
|---|---|---|
| No fix | ABSENT | 3 disconnected trace trees |
| With fix | 00-89adb2d60dd4fd1e8c2d1bafd228e468-9f94e3df4a135bc3-01 | Single trace tree: HTTP → outbox → consumer |

The trace ID is preserved end to end: from the original HTTP request, through the pg-boss outbox worker, to the Pub/Sub consumer handler.
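The table compares one connected trace tree against three disconnected ones. That comparison can be expressed as a small helper over exported spans. This is an illustrative sketch, not part of the PR's test suite. `countTraceTrees` and `ExportedSpan` are hypothetical, and the span names and span IDs below are made up.

```typescript
// Hypothetical minimal view of an exported span (e.g. from a Jaeger or OTLP export).
interface ExportedSpan {
  name: string;
  traceId: string;
  spanId: string;
  parentSpanId?: string;
}

// Counts disconnected trees: a span is a tree root if it has no parent, or if
// its parent is not among the collected spans of the same trace.
function countTraceTrees(spans: ExportedSpan[]): number {
  const known = new Set(spans.map((s) => `${s.traceId}/${s.spanId}`));
  return spans.filter(
    (s) => !s.parentSpanId || !known.has(`${s.traceId}/${s.parentSpanId}`),
  ).length;
}

// Illustrative data: span names and span IDs are invented; the trace ID is the
// one from the "With fix" row of the table above.
const T = "89adb2d60dd4fd1e8c2d1bafd228e468";
const withFix: ExportedSpan[] = [
  { name: "HTTP POST /orders", traceId: T, spanId: "00000000000000a1" },
  { name: "outbox publish", traceId: T, spanId: "00000000000000a2", parentSpanId: "00000000000000a1" },
  { name: "pubsub.consume.order.created", traceId: T, spanId: "00000000000000a3", parentSpanId: "00000000000000a2" },
];
// Without the fix, each hop starts its own trace, so nothing links them.
const noFix: ExportedSpan[] = withFix.map((s, i) => ({
  name: s.name,
  traceId: String(i + 1).padStart(32, "0"),
  spanId: s.spanId,
}));
console.log(countTraceTrees(noFix), countTraceTrees(withFix));
// prints: 3 1
```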

Test plan

  • pnpm run build passes (TypeScript compilation clean)
  • pnpm test passes (unit tests)
  • Deploy to a staging WMS environment, trigger an event, confirm single trace in SigNoz/Jaeger with pubsub.consume.<event> span as child of the originating HTTP span

🤖 Generated with Claude Code

…ropagation

Re-introduces the event-bus plugin (removed in 2.14.0) with a critical fix:
W3C traceparent is now propagated across the GCP Pub/Sub async boundary so
distributed traces remain connected end-to-end.

### What was broken
`publishToPubSub()` never called `propagation.inject()`, so Pub/Sub message
attributes contained no traceparent. The push/pull consumer never called
`propagation.extract()`, so every incoming message started a new root span,
completely disconnected from the trace tree of the originating HTTP request.

### What is fixed (gcp-pubsub.ts)
- PUBLISH: `otel.propagation.inject(otel.context.active(), attrs)` injects the
  active span context as `traceparent`/`tracestate` into Pub/Sub message
  attributes before every `topic.publishMessage()` call.
- CONSUME (push): `otel.propagation.extract(otel.context.active(), attrs)`
  reconstructs the parent context from the incoming message attributes, starts
  a `pubsub.consume.<event>` CONSUMER span as its child, and runs all handlers
  inside `otel.context.with()` so every DB query and further publish is linked
  to the original trace.
- Logs `EVENT_TRACEPARENT_INJECT` on publish and `traceparent_extracted` field
  on consume for observability.

### What is fixed (event-consumer/gcp-pubsub.ts)
- Pull consumer extracts the traceparent from message attributes, starts a
  `pubsub.pull.process` CONSUMER span, and re-injects the context as HTTP
  headers into the internal `instance.inject()` call so the push handler
  receives the correct trace context.

### OTel API resolution
Both files use a lazy `getOtelApi()` helper that tries `@opentelemetry/api`
first (available via NODE_PATH when using the OTel Kubernetes operator) and
falls back to the pnpm content-addressed path. It returns null if neither is
available, so no error is thrown in environments without OTel.

### TypeScript 6 compatibility fixes
Fixed implicit-`any` and `unknown` catch-variable (`err`) errors in
azure-servicebus.ts, rabbitmq.ts, and commons.ts introduced by the TypeScript
6.0 upgrade.
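A hypothetical handler can illustrate the two error classes. The PR does not show the actual changes to azure-servicebus.ts, rabbitmq.ts and commons.ts, so this only sketches the usual pattern: annotate parameters explicitly and narrow `unknown` catch variables before use.

```typescript
// Under `noImplicitAny` and `useUnknownInCatchVariables` (both enabled by `strict`),
// code like the following fails to compile:
//   function handle(msg) { ... }               // parameter implicitly has an 'any' type
//   catch (err) { console.error(err.message) } // 'err' is of type 'unknown'

// Hypothetical message shape, standing in for the broker-specific types.
interface BusMessage {
  event: string;
  body: unknown;
}

// Narrows an unknown catch value before reading .message.
function errorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Explicitly typed parameter instead of an implicit any.
function serialize(msg: BusMessage): string {
  try {
    return JSON.stringify(msg.body);
  } catch (err) {
    // err is `unknown` here; errorMessage() narrows it safely.
    return `failed to serialize ${msg.event}: ${errorMessage(err)}`;
  }
}

console.log(serialize({ event: "order.created", body: { id: 1 } }));
// prints: {"id":1}
```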

Verified end-to-end in a kind cluster with a Pub/Sub emulator:
- NO-FIX: traceparent ABSENT in message attributes → broken trace tree
- FIX: traceparent present → same traceID from HTTP request through pg-boss
  outbox publish through Pub/Sub consumer handler

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sumitjha-arch

Closing — will be handled differently

sumitjha-arch deleted the fix/otel-gcp-pubsub-traceparent-propagation branch on June 1, 2026 13:54