feat: native OpenTelemetry trace export #51

Open
iliassjabali wants to merge 15 commits into main from feat/native-otel-trace-export

@iliassjabali
Collaborator

Summary

Adds native OpenTelemetry trace export to both the AgentSpec sidecar and SDK reporter. A new shared @agentspec/otel package provides TracerProvider setup, OTLP exporters, and W3C trace context propagation. Both layers emit spans linked via traceparent headers for full distributed tracing.

  • New @agentspec/otel package with TracerProvider, OTLP exporters (HTTP/protobuf + gRPC), rate-based sampler, W3C propagation
  • Sidecar proxy instrumentation: root spans per request, traceparent injection into upstream, OPA block spans
  • Sidecar control plane instrumentation: spans wrapping all diagnostic endpoints
  • Sidecar event ingestion + explain handler spans
  • Sidecar startup reads spec.observability.tracing from manifest, auto-initializes when backend: otel
  • SDK reporter spans for health refresh and heartbeat push, graceful no-op when OTel not configured
  • Concept doc + step-by-step guide with Jaeger/Tempo verification examples

Architecture

agent.yaml
  spec.observability.tracing.backend: otel
                    |
                    v
            @agentspec/otel
       (shared TracerProvider + OTLP exporter)
          /                          \
         v                            v
   Sidecar (port 4000/4001)     SDK Reporter (agent process)
   - proxy:request spans        - reporter:health-refresh spans
   - cp:* endpoint spans        - reporter:heartbeat spans
   - events:ingest spans
   - explain:generate spans
         |                            |
         +------- traceparent ------->+
         |        (W3C header)        |
         v                            v
      OTLP Exporter ---------> Jaeger / Grafana / Datadog
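
The `traceparent` link in the diagram above is a single W3C Trace Context header. As a rough illustration of what the sidecar injects and the agent process reads, here is a minimal, dependency-free sketch of the version `00` format. The names `TraceParent`, `formatTraceParent`, and `parseTraceParent` are hypothetical and are not part of `@agentspec/otel`, which per this PR wraps the W3CTraceContextPropagator instead of hand-parsing.

```typescript
// Sketch of the W3C Trace Context `traceparent` header, version 00:
//   00-<32 hex trace-id>-<16 hex parent-id>-<2 hex trace-flags>
// All names here are illustrative, not the @agentspec/otel API.
interface TraceParent {
  traceId: string;  // 32 lowercase hex chars, must not be all zeros
  spanId: string;   // 16 lowercase hex chars, must not be all zeros
  sampled: boolean; // lowest bit of trace-flags
}

const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Serialize a span context into the header value sent upstream.
// Only the "sampled" flag is emitted; other flag bits are dropped.
function formatTraceParent(tp: TraceParent): string {
  return `00-${tp.traceId}-${tp.spanId}-${tp.sampled ? "01" : "00"}`;
}

// Parse an incoming header; returns undefined for anything malformed,
// so the caller can start a fresh trace instead.
function parseTraceParent(header: string): TraceParent | undefined {
  const m = TRACEPARENT_RE.exec(header.trim());
  if (!m) return undefined;
  const traceId = m[1] as string;
  const spanId = m[2] as string;
  const flags = m[3] as string;
  // All-zero IDs are invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}
```

A child span created in the agent process would reuse the parsed `traceId` and record the parsed `spanId` as its parent, which is what lets a backend stitch both sides into one trace.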

Package dependency graph

@agentspec/otel (new)
  deps: @opentelemetry/api, sdk-trace-base, exporter-otlp-http,
        exporter-otlp-grpc, resources, semantic-conventions

@agentspec/sidecar
  deps: @agentspec/otel (hard), @opentelemetry/api (runtime)

@agentspec/sdk
  deps: @opentelemetry/api (runtime, ~50KB)
  optionalDeps: @agentspec/otel

Span hierarchy

Sidecar proxy request:
  [proxy:request] POST /chat            (root span, sidecar)
    |-- traceparent injected into upstream headers
    |
    +-- [reporter:health-refresh]       (child span, agent process)
    +-- [reporter:heartbeat]            (child span, agent process)

Sidecar control plane:
  [cp:GET /health/ready]                (root span)
  [cp:GET /explore]                     (root span)
  [cp:GET /gap]                         (root span)

Sidecar internals:
  [events:ingest]                       (root span)
  [explain:generate]                    (root span)

Configuration flow

agent.yaml                    Environment
  |                              |
  |  tracing.endpoint            |  OTEL_EXPORTER_OTLP_ENDPOINT
  |  tracing.sampleRate          |  OTEL_EXPORTER_OTLP_PROTOCOL
  |  metrics.serviceName         |
  v                              v
resolveOtelEndpoint() -----> initTracing(OtelConfig)
                                |
                                v
                          TracerProvider
                            + BatchSpanProcessor
                            + OTLPHttpExporter (default)
                              or OTLPGrpcExporter
                            + TraceIdRatioBasedSampler
                            + W3CTraceContextPropagator
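
To make the role of `tracing.sampleRate` concrete, the following is a simplified sketch of a trace-ID ratio decision. It is an assumption-laden illustration, not the OpenTelemetry SDK's exact algorithm and not the code in `packages/otel/src/sampler.ts`: it maps the first 32 bits of the trace ID onto [0, 2^32) and samples when that value falls below the configured ratio. `clampSampleRate` and `shouldSample` are hypothetical names.

```typescript
// Simplified ratio-based sampling sketch (not the SDK's exact algorithm).
// Deriving the decision from the trace ID means every service that sees
// the same trace ID with the same rate reaches the same decision.

// Clamp a manifest sampleRate into [0, 1]; NaN is treated as 0 here,
// which is an assumption of this sketch.
function clampSampleRate(rate: number): number {
  if (Number.isNaN(rate)) return 0;
  return Math.min(1, Math.max(0, rate));
}

function shouldSample(traceId: string, sampleRate: number): boolean {
  const ratio = clampSampleRate(sampleRate);
  // First 8 hex chars as an unsigned 32-bit value in [0, 2^32).
  const bucket = parseInt(traceId.slice(0, 8), 16);
  // A malformed trace ID yields NaN, and NaN < x is false: never sampled.
  return bucket < ratio * 2 ** 32;
}
```

With this scheme a rate of `1` samples every valid trace and a rate of `0` samples none, so the decision needs no shared state between the sidecar and the agent process.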

Graceful degradation

OTel configured?
  |
  +-- YES: initTracing() -> real spans exported via OTLP
  |
  +-- NO:  @opentelemetry/api returns no-op tracers
           -> zero overhead, no spans created
           -> SDK works without @agentspec/otel installed
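
The degradation path above can be pictured with a small stand-in for the no-op behaviour of `@opentelemetry/api`. This is a sketch under stated assumptions: `SpanLike`, `TracerLike`, `registerTracer`, and `withSpan` are hypothetical names that mimic the pattern, not the real API surface. Instrumented code always asks for a tracer; until a provider is registered it gets one whose spans do nothing.

```typescript
// Stand-in for the no-op tracer pattern; names are illustrative only.
interface SpanLike {
  end(): void;
}
interface TracerLike {
  startSpan(name: string): SpanLike;
}

// Shared no-op objects: nothing is recorded and nothing is exported.
const NOOP_SPAN: SpanLike = { end: () => {} };
const NOOP_TRACER: TracerLike = { startSpan: () => NOOP_SPAN };

let activeTracer: TracerLike = NOOP_TRACER;

// Called once at startup when tracing is configured (e.g. backend: otel).
function registerTracer(tracer: TracerLike): void {
  activeTracer = tracer;
}

// Wrap a unit of work in a span. try/finally ends the span even when
// fn throws, and the caller behaves the same whether tracing is on or off.
function withSpan<T>(name: string, fn: () => T): T {
  const span = activeTracer.startSpan(name);
  try {
    return fn();
  } finally {
    span.end();
  }
}
```

Because call sites never branch on whether tracing is enabled, a reporter written this way keeps working when no provider has been initialised.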

New files

File                               Purpose
packages/otel/src/provider.ts      TracerProvider + OTLP exporter setup
packages/otel/src/sampler.ts       Rate-based sampler from manifest sampleRate
packages/otel/src/propagation.ts   W3C traceparent inject/extract
packages/otel/src/index.ts         Public API re-exports
docs/concepts/observability.md     Concept doc: architecture, span reference
docs/guides/add-observability.md   Step-by-step OTel setup guide

Modified files

File                                            Change
packages/sidecar/src/proxy.ts                   Proxy span creation + traceparent injection
packages/sidecar/src/control-plane/index.ts     CP endpoint span hooks
packages/sidecar/src/control-plane/events.ts    Event ingestion span
packages/sidecar/src/control-plane/explain.ts   Explain handler span
packages/sidecar/src/index.ts                   OTel init at startup + shutdown flush
packages/sidecar/src/config.ts                  otelEndpoint env var config
packages/sdk/src/agent/reporter.ts              Health refresh + heartbeat spans
docs/.vitepress/config.mts                      Sidebar nav entries
CLAUDE.md                                       Key Files table entry

Test plan

  • @agentspec/otel: 15 unit tests (provider, sampler, propagation)
  • @agentspec/sidecar: 254 tests (9 new OTel tests, 0 regressions)
  • @agentspec/sdk: 288 tests (3 new OTel tests, 0 regressions)
  • All packages build and typecheck cleanly
  • Proxy OTel test verifies traceparent header reaches upstream
  • Reporter OTel test verifies graceful degradation without provider
  • E2E: start Jaeger, run sidecar with backend: otel, verify spans in UI

Adds injectContext/extractContext/setupPropagation helpers wrapping the
W3CTraceContextPropagator from @opentelemetry/core. Adds @opentelemetry/core
as an explicit dependency since it is not re-exported by sdk-trace-base.
- Move @opentelemetry/api from devDependencies to dependencies (runtime import)
- Guard proxy span end in onResponse to prevent double-end on OPA block path
- Wrap explain handler in try/finally for reliable span cleanup
- Remove noisy audit-ring:push span (O(1) op, already covered by proxy:request)
- Use SDK resolveRef() for OTel endpoint resolution (handles $env:/$secret:/$file:)
- Add @agentspec/otel to Key Files in CLAUDE.md
- concepts/observability.md: architecture, span reference, config, degradation
- guides/add-observability.md: step-by-step setup with Jaeger/Tempo examples
- Add both to VitePress sidebar navigation
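
The double-end guard mentioned in the fix list above (the OPA block path ending the proxy span before `onResponse` runs) can be sketched as an idempotent end function. This is one plausible shape, not necessarily what `packages/sidecar/src/proxy.ts` does; `EndableSpan` and `endOnce` are hypothetical names.

```typescript
// Sketch of an "end at most once" guard; names are illustrative only.
interface EndableSpan {
  end(): void;
}

// Returns a function that ends the span on the first call and is a
// no-op afterwards, so an early-exit path (such as a policy block) and
// a later response hook can both call it safely.
function endOnce(span: EndableSpan): () => void {
  let ended = false;
  return () => {
    if (ended) return;
    ended = true;
    span.end();
  };
}
```

Each request would create its own guard, so the flag is scoped to a single span and needs no cleanup.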
@iliassjabali iliassjabali marked this pull request as draft April 12, 2026 15:17
@iliassjabali iliassjabali self-assigned this Apr 12, 2026
- Bundle @agentspec/otel into sidecar dist via tsup noExternal
- Update Dockerfile to copy otel package, build it, and strip workspace dep
- Add 3 E2E tests: traceparent injection, distinct trace IDs, client trace propagation
- Update mock-agent to echo traceparent header in responses
- Enable OTel tracing in E2E test agent.yaml
The @opentelemetry/exporter-trace-otlp-grpc package depends on @grpc/grpc-js
which uses dynamic requires that break when bundled by esbuild/tsup. This
caused the sidecar Docker container to crash on startup.

- Make gRPC exporter a dynamic import (only loaded when protocol is 'grpc')
- Make initTracing() async to support the lazy import
- Mark gRPC exporter as external in sidecar tsup config
- Update provider tests for async initTracing
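
The lazy-loading change described above can be sketched as a table of async loaders keyed by protocol, where only the selected loader ever runs. This is a hedged illustration rather than the code in `packages/otel/src/provider.ts`: `OtlpProtocol`, `ExporterLike`, `ExporterLoader`, and `createExporter` are hypothetical names, and the comment shows where a real dynamic `import()` would go.

```typescript
// Sketch of protocol-gated lazy loading; names are illustrative only.
type OtlpProtocol = "http/protobuf" | "grpc";

interface ExporterLike {
  readonly kind: OtlpProtocol;
}

type ExporterLoader = () => Promise<ExporterLike>;

// Only the loader for the requested protocol is invoked, so the module
// behind the other loader is never evaluated. In real code the grpc
// loader would be roughly:
//   async () => {
//     const mod = await import("@opentelemetry/exporter-trace-otlp-grpc");
//     return new mod.OTLPTraceExporter(/* options */);
//   }
// which keeps @grpc/grpc-js out of the startup path for HTTP users.
async function createExporter(
  protocol: OtlpProtocol,
  loaders: Record<OtlpProtocol, ExporterLoader>,
): Promise<ExporterLike> {
  return loaders[protocol]();
}
```

Since the loader returns a promise, any function that builds the exporter has to be async too, which matches the note above about `initTracing()` becoming async.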
@iliassjabali iliassjabali marked this pull request as ready for review April 12, 2026 16:01
@iliassjabali iliassjabali requested a review from skokaina April 12, 2026 16:20
@iliassjabali iliassjabali removed their assignment Apr 12, 2026