[BUG]: Downstream spans are lost when nginx returns 499 (client closed connection) #310

@jBouyoud

Module Version(s)

1.12.0

Bug Report

Description

When using nginx-datadog for head-based sampling on an nginx ingress, downstream spans are silently dropped after nginx records a 499 (client closed connection) status. The trace becomes incomplete even though downstream services actually processed the request (confirmed by correlated logs).

Environment

  • Architecture: Kubernetes with nginx Ingress Controller + nginx-datadog module
  • Sampling: Head-based sampling at the nginx (ingress) level
  • Tracing propagation: Trace context is correctly propagated through the full call chain
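
One way to double-check propagation at each hop is to inspect the incoming request headers. The sketch below assumes Datadog-style propagation headers (`x-datadog-trace-id`, `x-datadog-parent-id`, `x-datadog-sampling-priority`); if the setup uses another propagation style such as W3C `traceparent`, the header list would differ. The helper name `missing_datadog_headers` is hypothetical and not part of nginx-datadog.

```python
# Hypothetical helper: reports which Datadog propagation headers are absent
# from a request. Assumes Datadog-style propagation is in use.
DATADOG_HEADERS = (
    "x-datadog-trace-id",
    "x-datadog-parent-id",
    "x-datadog-sampling-priority",  # carries the head-based sampling decision
)


def missing_datadog_headers(headers):
    """Return the expected propagation headers not found in `headers`.

    `headers` is any mapping of header name to value; names are compared
    case-insensitively, as HTTP header names are case-insensitive.
    """
    present = {name.lower() for name in headers}
    return [name for name in DATADOG_HEADERS if name not in present]


if __name__ == "__main__":
    incoming = {"X-Datadog-Trace-Id": "123", "X-Datadog-Parent-Id": "456"}
    print(missing_datadog_headers(incoming))
```

Logging the result of such a check in service-A and service-B would show whether the sampling decision made at ingress-public actually reaches the services behind ingress-private.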

Call chain

ingress-public (nginx) → service-A → ingress-private (nginx) → service-B

Reproduction Code

Steps to reproduce

  1. service-A sends an HTTP request through ingress-private to service-B
  2. service-A has an HTTP client timeout of 800ms
  3. service-B takes longer than 800ms to respond
  4. service-A closes the connection after its timeout
  5. ingress-private (nginx) records a 499 status
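
The timing in the steps above can be simulated locally with a minimal sketch. It does not include nginx: a slow HTTP server stands in for service-B and a client with a short timeout stands in for service-A. All names (`run_repro`, `make_slow_handler`) and the default delay of 1.2s are illustrative assumptions; to observe the actual 499, the client would have to be pointed at an nginx instance proxying to the slow server.

```python
import http.client
import socket
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_slow_handler(delay_s, processed):
    """Build a handler that answers after `delay_s` seconds (stand-in for service-B)."""

    class SlowHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            time.sleep(delay_s)
            processed.set()  # the work completed even if the client has gone away
            try:
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"done")
            except (BrokenPipeError, ConnectionResetError):
                pass  # client already closed the connection

        def log_message(self, *args):
            pass  # keep the output quiet

    return SlowHandler


def run_repro(delay_s=1.2, timeout_s=0.8):
    """Return (client_timed_out, server_processed_request)."""
    processed = threading.Event()
    server = ThreadingHTTPServer(("127.0.0.1", 0), make_slow_handler(delay_s, processed))
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # Stand-in for service-A: an HTTP client with a fixed timeout.
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=timeout_s)
    timed_out = False
    try:
        conn.request("GET", "/")
        conn.getresponse().read()
    except socket.timeout:
        timed_out = True
    finally:
        conn.close()  # closing here is what a proxy in between would see as a client abort

    processed.wait(delay_s + 1.0)  # give the server time to finish its work
    server.shutdown()
    server.server_close()
    return timed_out, processed.is_set()


if __name__ == "__main__":
    print(run_repro())
```

With the defaults, the client gives up at 800ms while the server still finishes the request, which mirrors the situation where service-B keeps processing after service-A has disconnected.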

Observed behavior

  • The trace only contains spans from ingress-public (nginx) and service-A, up to the time of the 499
  • No spans are reported after the 499 timestamp (i.e., spans from ingress-private (nginx) and service-B are missing)
  • However, logs with the same trace_id exist for ingress-private (nginx) and service-B, proving the request was received and processed downstream
  • The trace appears truncated/incomplete in Datadog APM

Expected behavior

All spans from the full call chain should be reported to Datadog, regardless of whether the originating client closed the connection. The 499 on nginx should not prevent downstream spans from being flushed.

Hypothesis

When nginx handles a 499, it appears that:

  1. nginx aborts the upstream connection to service-B
  2. The nginx-datadog module may stop propagating or flushing spans for that request upon client disconnection
  3. Even if service-B continues to run and generates spans, the trace context chain is broken at the nginx level

Questions

  1. Does the nginx-datadog module explicitly handle the 499 case? Does it flush its own span before aborting?
  2. Is there a way to ensure that spans generated by downstream services (after the 499) are still correctly associated with the original trace?
  3. Could the module be enhanced to always flush/finalize spans on client disconnection rather than silently dropping them?

Error Logs

No response

Operating System

No response
