fix(trace): LLM_TTFT parent node pointing to wrong span + missing ERROR/CANCELLED status colors #40

Open
Dasooul03 wants to merge 1 commit into nageoffer:main from Dasooul03:fix/llm-ttft-parent-node-and-status-colors

@Dasooul03

Summary

  • Fix the llm-first-packet (LLM_TTFT) node's parentNodeId, which pointed to llm-stream-routing instead of the specific provider stream node
  • Fix the frontend trace detail page crashing in failover scenarios because ERROR/CANCELLED statuses were not mapped

Problem

Backend: LLM_TTFT parent node wrong

AbstractOpenAIStyleChatClient.doStreamChat() detached the provider span in a finally block before returning. RoutingLLMService.awaitFirstPacket() (annotated with @RagTraceNode("llm-first-packet")) runs only after that, so when it created the TTFT node the provider span was no longer current, and the node was attached to llm-stream-routing instead.

Expected trace tree:

llm-stream-routing
└── bailian-stream-chat (LLM_PROVIDER)
    └── llm-first-packet (LLM_TTFT)

Actual trace tree (before fix):

llm-stream-routing
├── bailian-stream-chat (LLM_PROVIDER)
└── llm-first-packet (LLM_TTFT)
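One way to see why the detach timing matters is to model the tracer as a stack of open spans, where each new node takes the span on top of the stack as its parent. The TypeScript sketch below assumes that model; TraceContext, enter, detach, and simulate are hypothetical names, not the project's Java tracing API. Detaching the provider span before the TTFT node opens reproduces the flat tree above, while deferring the detach yields the expected nesting.

```typescript
// Hypothetical model of a span-context stack; not the project's real tracing API.
interface TraceNode {
  name: string;
  parentName: string | null;
}

class TraceContext {
  private readonly stack: string[] = [];
  readonly nodes: TraceNode[] = [];

  // Open a span: record it with the current top of the stack as parent, then make it current.
  enter(name: string): void {
    const parent = this.stack.length > 0 ? this.stack[this.stack.length - 1] : null;
    this.nodes.push({ name, parentName: parent });
    this.stack.push(name);
  }

  // Close the current span so nodes created later no longer see it as their parent.
  detach(): void {
    this.stack.pop();
  }

  parentOf(name: string): string | null {
    const node = this.nodes.find((n) => n.name === name);
    return node ? node.parentName : null;
  }
}

// Replays the routing flow; detachBeforeFirstPacket toggles old vs. fixed ordering.
function simulate(detachBeforeFirstPacket: boolean): TraceContext {
  const ctx = new TraceContext();
  ctx.enter("llm-stream-routing");
  ctx.enter("bailian-stream-chat"); // provider stream span opened by doStreamChat()
  if (detachBeforeFirstPacket) {
    ctx.detach(); // old behavior: the finally block in doStreamChat() detaches here
  }
  ctx.enter("llm-first-packet"); // awaitFirstPacket() creates the TTFT node
  ctx.detach(); // TTFT node ends once the first packet arrives
  if (!detachBeforeFirstPacket) {
    ctx.detach(); // fixed behavior: the routing code calls handle.detach() afterwards
  }
  ctx.detach(); // llm-stream-routing ends
  return ctx;
}

console.log(simulate(true).parentOf("llm-first-packet"));  // llm-stream-routing
console.log(simulate(false).parentOf("llm-first-packet")); // bailian-stream-chat
```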

Frontend: ERROR/CANCELLED statuses not mapped

STATUS_COLORS only mapped success, failed, running, and default. In failover scenarios, provider nodes end with CANCELLED or ERROR status, which had no entry in STATUS_COLORS, so the color lookup failed and the page crashed.

Fix

Backend

  • StreamCancellationHandle: added a no-op default void detach() {} so existing implementations stay compatible while callers gain control over when the span is detached
  • AbstractOpenAIStyleChatClient.doStreamChat(): no longer detaches the span in a finally block; the returned handle now carries the detach responsibility
  • RoutingLLMService.streamChat(): calls handle.detach() after awaitFirstPacket completes, wrapped in try-finally to guarantee detach on both success and failover paths
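The per-candidate try/finally described above can be sketched as follows. This is a TypeScript model of the Java flow, not the actual RoutingLLMService code: Candidate, makeCandidate, the boolean-returning awaitFirstPacket, and the cancel-then-try-next failover are assumptions made for illustration. The optional detach?() mirrors the no-op default method, so a handle that never implements it still works.

```typescript
// Hypothetical sketch of the per-candidate try/finally pattern; names are illustrative.
interface StreamCancellationHandle {
  cancel(): void;
  // Mirrors the Java `default void detach() {}`: optional, treated as a no-op when absent.
  detach?(): void;
}

interface Candidate {
  name: string;
  startStream(): StreamCancellationHandle; // stands in for doStreamChat()
  firstPacketArrives: boolean;             // knob to simulate success or first-packet timeout
}

// Stand-in for awaitFirstPacket(): true if the provider produced its first packet in time.
function awaitFirstPacket(candidate: Candidate): boolean {
  return candidate.firstPacketArrives;
}

function streamChat(candidates: Candidate[]): string {
  for (const candidate of candidates) {
    const handle = candidate.startStream();
    try {
      // The provider span is still current here, so the TTFT node nests under it.
      if (awaitFirstPacket(candidate)) {
        return candidate.name;
      }
      handle.cancel(); // timeout: cancel this provider and fall through to the next one
    } finally {
      handle.detach?.(); // runs on both the success (return) and failover paths
    }
  }
  throw new Error("all candidates failed");
}

// Hypothetical candidate factory that records cancel/detach calls into a shared log.
function makeCandidate(name: string, ok: boolean, log: string[]): Candidate {
  return {
    name,
    firstPacketArrives: ok,
    startStream: () => ({
      cancel: () => { log.push(`cancelled ${name}`); },
      detach: () => { log.push(`detached ${name}`); },
    }),
  };
}

const log: string[] = [];
const winner = streamChat([makeCandidate("primary", false, log), makeCandidate("fallback", true, log)]);
console.log(winner); // fallback
console.log(log);    // [ 'cancelled primary', 'detached primary', 'detached fallback' ]
```

Putting the detach in finally rather than after the if means a single call site covers both exits, which is the property the PR relies on to keep failover traces well-formed.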

Frontend

  • STATUS_COLORS: added error (red) and cancelled (gray) status mappings
  • getStatusColors: added ?? STATUS_COLORS.default fallback to prevent crashes on unmapped statuses
  • traceUtils.ts: added error/cancelled to statusLabel and statusBadgeVariant
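A minimal sketch of the frontend fallback, assuming status strings arrive uppercase from the backend and are lowercased before lookup. The color classes, label strings, and badge variant names below are hypothetical placeholders; only the error/cancelled keys and the ?? STATUS_COLORS.default fallback come from this PR.

```typescript
// Hypothetical shapes: the real STATUS_COLORS values and badge variants may differ.
type StatusColor = { dot: string; text: string };

const STATUS_COLORS: Record<string, StatusColor> = {
  success: { dot: "bg-green-500", text: "text-green-700" },
  failed: { dot: "bg-red-500", text: "text-red-700" },
  running: { dot: "bg-blue-500", text: "text-blue-700" },
  error: { dot: "bg-red-500", text: "text-red-700" },       // added: red
  cancelled: { dot: "bg-gray-400", text: "text-gray-600" }, // added: gray
  default: { dot: "bg-gray-300", text: "text-gray-500" },
};

// Normalize case, then fall back to `default` for any unmapped or missing status.
function getStatusColors(status?: string | null): StatusColor {
  return STATUS_COLORS[(status ?? "").toLowerCase()] ?? STATUS_COLORS.default;
}

const STATUS_LABELS: Record<string, string> = {
  success: "Success",
  failed: "Failed",
  running: "Running",
  error: "Error",
  cancelled: "Cancelled",
};

// Unknown statuses are shown verbatim instead of rendering an empty label.
function statusLabel(status: string): string {
  return STATUS_LABELS[status.toLowerCase()] ?? status;
}

type BadgeVariant = "default" | "secondary" | "destructive" | "outline";

function statusBadgeVariant(status: string): BadgeVariant {
  switch (status.toLowerCase()) {
    case "success":
      return "default";
    case "failed":
    case "error":
      return "destructive";
    case "cancelled":
      return "secondary";
    default:
      return "outline";
  }
}

console.log(getStatusColors("ERROR").dot);                          // bg-red-500
console.log(getStatusColors("TIMEOUT") === STATUS_COLORS.default); // true
```

Because the fallback lives in getStatusColors itself, a status the backend introduces later renders with default colors instead of breaking the page.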

Test Plan

  • Normal chat: verify that the llm-first-packet node's parentNodeId points to the specific provider stream node
  • Failover: simulate a first-packet timeout that triggers a provider switch, and verify the trace tree structure after fallback
  • Frontend trace detail page: verify ERROR/CANCELLED status nodes render correctly without crashing

Closes #35

🤖 Generated with Claude Code

…OR/CANCELLED status colors

- Add detach() to StreamCancellationHandle interface for span lifecycle control
- Defer span detach in AbstractOpenAIStyleChatClient from finally block to returned handle
- Call handle.detach() after awaitFirstPacket in RoutingLLMService, ensuring
  llm-first-packet (LLM_TTFT) node correctly nests under the provider stream node
- Use try-finally per candidate to guarantee detach on both success and failover paths
- Frontend: add error/cancelled mappings to STATUS_COLORS, statusLabel, and statusBadgeVariant
- Add fallback ?? STATUS_COLORS.default in getStatusColors to prevent crash on unmapped statuses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>