feat: support Langfuse trace grouping via LiteLLM proxy metadata#707
feat: support Langfuse trace grouping via LiteLLM proxy metadata#707DanielMaly wants to merge 4 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
WalkthroughAdds an optional ChangesLangfuse Session ID Tracking
Sequence DiagramsequenceDiagram
participant EntryPoint as Entry Point (deriver/dialectic/dreamer)
participant Executor as LLM Executor
participant ToolLoop as Tool Loop
participant Backend as OpenAIBackend
participant Langfuse as LiteLLM/Langfuse
EntryPoint->>Executor: provide LLMTelemetryContext(langfuse_session_id)
Executor->>Executor: copy langfuse_session_id -> call_extras
Executor->>ToolLoop: pass telemetry for iteration
ToolLoop->>ToolLoop: clone telemetry preserving langfuse_session_id
Executor->>Backend: execute_* with extra_params(langfuse_session_id)
Backend->>Backend: read extra_params.langfuse_session_id
Backend->>Langfuse: include extra_body.metadata.session_id in request
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
When LLM calls are routed through a LiteLLM proxy with Langfuse callback enabled, each completion request creates a separate trace with no correlation between them. A single dream cycle (potentially 10-20 LLM calls) produces 10-20 orphaned traces. This change adds langfuse_session_id to LLMTelemetryContext so agent entry points can set a session ID that flows through to LiteLLM as metadata.session_id. LiteLLM's Langfuse callback uses this to group traces from the same agent operation under one session. Changes: - Add langfuse_session_id field to LLMTelemetryContext - Thread it through honcho_llm_call_inner -> extra_params -> backend - OpenAI backend emits extra_body with metadata.session_id - Dreamer specialists set langfuse_session_id from run_id - Dialectic sets langfuse_session_id from run_id - Deriver sets langfuse_session_id from workspace and observed peer - Tool loop iteration context preserves langfuse_session_id - Tests: session_id present and absent cases
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/llm/test_backends/test_openai.py (1)
279-353: ⚡ Quick winUse shared pytest fixtures instead of inline client setup in these new tests.
Both new tests duplicate mock setup; please switch to fixtures from
tests/conftest.pyfor setup/teardown consistency and reduced repetition.As per coding guidelines, "tests/**/*.py: Use pytest fixtures from tests/conftest.py for test setup and teardown".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/llm/test_backends/test_openai.py` around lines 279 - 353, Both tests (test_openai_backend_includes_langfuse_session_id_in_extra_body and test_openai_backend_omits_extra_body_without_langfuse_session_id) inline the same Mock client setup; replace that duplicated setup by using the shared pytest fixture(s) defined in tests/conftest.py (e.g., the mock OpenAI client fixture used elsewhere) instead of creating client = Mock() and assigning client.chat.completions.create directly; update the test signatures to accept the fixture (and adjust any return_value behavior via the fixture or a helper on the fixture) and instantiate OpenAIBackend with that fixture so the tests use the centralized setup/teardown and avoid duplication when calling OpenAIBackend(...) and asserting against client.chat.completions.create.await_args.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/llm/backends/openai.py`:
- Around line 327-330: The code currently sets params["extra_body"] whenever
"langfuse_session_id" exists in extra_params, even if its value is None or an
empty string; change the guard to only set params["extra_body"] when
extra_params.get("langfuse_session_id") is a non-empty string (e.g., value is
not None, is instance of str, and value.strip() is not empty) so that metadata:
{"session_id": ...} is emitted only for real session IDs; update the conditional
around params["extra_body"] accordingly and leave other behavior unchanged.
---
Nitpick comments:
In `@tests/llm/test_backends/test_openai.py`:
- Around line 279-353: Both tests
(test_openai_backend_includes_langfuse_session_id_in_extra_body and
test_openai_backend_omits_extra_body_without_langfuse_session_id) inline the
same Mock client setup; replace that duplicated setup by using the shared pytest
fixture(s) defined in tests/conftest.py (e.g., the mock OpenAI client fixture
used elsewhere) instead of creating client = Mock() and assigning
client.chat.completions.create directly; update the test signatures to accept
the fixture (and adjust any return_value behavior via the fixture or a helper on
the fixture) and instantiate OpenAIBackend with that fixture so the tests use
the centralized setup/teardown and avoid duplication when calling
OpenAIBackend(...) and asserting against
client.chat.completions.create.await_args.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 348202a4-5320-496b-aac8-f5d4cc1445ae
📒 Files selected for processing (8)
src/deriver/deriver.pysrc/dialectic/core.pysrc/dreamer/specialists.pysrc/llm/backends/openai.pysrc/llm/executor.pysrc/llm/tool_loop.pysrc/llm/types.pytests/llm/test_backends/test_openai.py
…kspace-observed pair
The previous deriver-{workspace}-{observed} key created an
ever-growing session that accumulated all deriver calls for a peer
pair. Now generates a batch_id nanoid per batch for properly scoped
sessions: deriver-{batch_id}
Addresses CodeRabbit review: only emit extra_body when langfuse_session_id is a non-empty string, not when it is None or empty. Adds test for the explicit-None case.
|
Is the idea here to add support so that you can centralize the tracing via a litellm proxy rather than via Honcho? A bit confused on the goal if Honcho already has langfuse tracing built into it. Would this work without litellm? |
The issue here is that if traces go from Honcho directly to Langfuse, any routing and cost-tracking information from LiteLLM (or any other proxy) is lost because the trace bypasses it. This change allows a session id to propagate through the OpenAI-compatible endpoint which gives a lot more flexibility in observability integrations. |
Summary
Support Langfuse trace grouping for Honcho's agent loops by passing
session_idthrough to LiteLLM proxy via request metadata. When LLM calls are routed through a LiteLLM proxy withcallbacks: ["langfuse"], all calls from the same agent operation (dream cycle, dialectic request, deriver batch) are now grouped under one Langfuse session.Problem
When Honcho's LLM calls are routed through a LiteLLM proxy with
callbacks: ["langfuse"], each completion request creates a separate Langfuse trace with no correlation between them. A single dream cycle (deduction specialist → induction specialist, potentially 10–20 LLM calls) produces 10–20 orphaned traces. The same applies to dialectic tool loops and deriver batches.This makes it difficult to:
Solution
Add
langfuse_session_idtoLLMTelemetryContext. Agent entry points set it, and the OpenAI backend passes it asextra_body={metadata: {session_id: ...}}to LiteLLM. LiteLLM's Langfuse callback readsmetadata.session_idto group traces under one session.Data flow
Session ID format
dream-{run_id}dream-V1StGXR8dialectic-{run_id}dialectic-3KjR9mN2deriver-{batch_id}deriver-K7pWxQ4zEach agent already generates a unique ID for internal telemetry (the dreamer and dialectic use
run_idfrom nanoid). The deriver now generates abatch_idnanoid per batch inprocess_representation_tasks_batch, matching the same pattern.Changes
src/llm/types.py: Addlangfuse_session_id: str | None = NonetoLLMTelemetryContextsrc/llm/executor.py: Propagatelangfuse_session_idfrom telemetry toextra_paramsdictsrc/llm/backends/openai.py: Whenextra_paramscontainslangfuse_session_id, emitextra_bodywithmetadata.session_idsrc/dreamer/specialists.py: Setlangfuse_session_idfromrun_idsrc/dialectic/core.py: Setlangfuse_session_idfromrun_idsrc/deriver/deriver.py: Generatebatch_idnanoid per batch; setlangfuse_session_idfrom itsrc/llm/tool_loop.py: Preservelangfuse_session_idin per-iteration telemetry context copiesextra_bodyis present whenlangfuse_session_idis set, absent when notWhy session_id, not trace_id?
LiteLLM supports both
session_id(groups traces into a session) andtrace_id/existing_trace_id(merges calls into a single trace). I chosesession_idbecause:existing_trace_id, orphaned spans result if the parent trace was never created.session_id.If tighter single-trace grouping is desired in the future, the same plumbing can carry
trace_id/existing_trace_idwith minimal changes.Complements #693
This PR is complementary to #693 (Langfuse generation tracing). That PR makes individual LLM calls visible as proper generation observations via
@conditional_observe. This PR ensures that when those calls are routed through a LiteLLM proxy, they're also grouped together by agent operation.Related
Summary by CodeRabbit
New Features
Tests