Skip to content

feat: support Langfuse trace grouping via LiteLLM proxy metadata#707

Open
DanielMaly wants to merge 4 commits into
plastic-labs:mainfrom
DanielMaly:feat/langfuse-trace-grouping-via-litellm-metadata
Open

feat: support Langfuse trace grouping via LiteLLM proxy metadata#707
DanielMaly wants to merge 4 commits into
plastic-labs:mainfrom
DanielMaly:feat/langfuse-trace-grouping-via-litellm-metadata

Conversation

@DanielMaly
Copy link
Copy Markdown

@DanielMaly DanielMaly commented May 21, 2026

Summary

Support Langfuse trace grouping for Honcho's agent loops by passing session_id through to LiteLLM proxy via request metadata. When LLM calls are routed through a LiteLLM proxy with callbacks: ["langfuse"], all calls from the same agent operation (dream cycle, dialectic request, deriver batch) are now grouped under one Langfuse session.

Problem

When Honcho's LLM calls are routed through a LiteLLM proxy with callbacks: ["langfuse"], each completion request creates a separate Langfuse trace with no correlation between them. A single dream cycle (deduction specialist → induction specialist, potentially 10–20 LLM calls) produces 10–20 orphaned traces. The same applies to dialectic tool loops and deriver batches.

This makes it difficult to:

  • Understand the full lifecycle of an agent operation
  • Attribute cost and latency to a specific dream/dialectic/deriver run
  • Debug multi-step agent behaviour in context

Solution

Add langfuse_session_id to LLMTelemetryContext. Agent entry points set it, and the OpenAI backend passes it as extra_body={metadata: {session_id: ...}} to LiteLLM. LiteLLM's Langfuse callback reads metadata.session_id to group traces under one session.

Data flow

Agent entry point (e.g. run_dream)
  → LLMTelemetryContext(langfuse_session_id="dream-abc123")
    → honcho_llm_call(telemetry=ctx)
      → honcho_llm_call_inner
        → extra_params["langfuse_session_id"] = "dream-abc123"
          → execute_completion / execute_stream
            → OpenAIBackend._build_params
              → params["extra_body"] = {"metadata": {"session_id": "dream-abc123"}}
                → LiteLLM proxy
                  → Langfuse callback: trace.session_id = "dream-abc123"

Session ID format

Agent Format Example Scope
Dreamer specialist dream-{run_id} dream-V1StGXR8 One dream cycle
Dialectic dialectic-{run_id} dialectic-3KjR9mN2 One chat request
Deriver deriver-{batch_id} deriver-K7pWxQ4z One message batch

Each agent already generates a unique ID for internal telemetry (the dreamer and dialectic use run_id from nanoid). The deriver now generates a batch_id nanoid per batch in process_representation_tasks_batch, matching the same pattern.

Changes

  • src/llm/types.py: Add langfuse_session_id: str | None = None to LLMTelemetryContext
  • src/llm/executor.py: Propagate langfuse_session_id from telemetry to extra_params dict
  • src/llm/backends/openai.py: When extra_params contains langfuse_session_id, emit extra_body with metadata.session_id
  • src/dreamer/specialists.py: Set langfuse_session_id from run_id
  • src/dialectic/core.py: Set langfuse_session_id from run_id
  • src/deriver/deriver.py: Generate batch_id nanoid per batch; set langfuse_session_id from it
  • src/llm/tool_loop.py: Preserve langfuse_session_id in per-iteration telemetry context copies
  • Tests: Verify extra_body is present when langfuse_session_id is set, absent when not

Why session_id, not trace_id?

LiteLLM supports both session_id (groups traces into a session) and trace_id/existing_trace_id (merges calls into a single trace). I chose session_id because:

  1. Robustness: If the first LLM call fails, subsequent calls still appear under the session. With existing_trace_id, orphaned spans result if the parent trace was never created.
  2. Simplicity: No need to distinguish "first call" vs "subsequent calls" — every call just carries the same session_id.
  3. Semantics: A dream cycle or dialectic request IS a session — multiple related traces that should be viewed together.

If tighter single-trace grouping is desired in the future, the same plumbing can carry trace_id/existing_trace_id with minimal changes.

Complements #693

This PR is complementary to #693 (Langfuse generation tracing). That PR makes individual LLM calls visible as proper generation observations via @conditional_observe. This PR ensures that when those calls are routed through a LiteLLM proxy, they're also grouped together by agent operation.

Related

Summary by CodeRabbit

  • New Features

    • Added per-operation session IDs across components so related interactions are consistently tagged and grouped in observability traces and propagated to upstream requests for improved trace correlation.
  • Tests

    • Added tests ensuring session IDs are included in backend requests when provided and omitted when not present.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ea8a74b3-9a0f-4beb-a3bf-19461b9f7a52

📥 Commits

Reviewing files that changed from the base of the PR and between 5630699 and f984737.

📒 Files selected for processing (3)
  • src/llm/backends/openai.py
  • src/llm/executor.py
  • tests/llm/test_backends/test_openai.py

Walkthrough

Adds an optional langfuse_session_id to LLMTelemetryContext, generates session IDs at entry points (deriver/dialectic/dreamer), preserves and forwards the field through executor/tool loop, and injects it into OpenAI requests as extra_body.metadata.session_id; tests cover presence/absence behavior.

Changes

Langfuse Session ID Tracking

Layer / File(s) Summary
Type contract
src/llm/types.py
LLMTelemetryContext gains optional langfuse_session_id: str | None field documented for Langfuse trace grouping via LiteLLM proxy metadata.
Session ID generation at entry points
src/deriver/deriver.py, src/dialectic/core.py, src/dreamer/specialists.py
Deriver generates a per-batch nanoid and sets langfuse_session_id="deriver-{batch_id}"; dialectic sets langfuse_session_id="dialectic-{run_id}"; dreamer sets langfuse_session_id="dream-{run_id}" (or None when run id falsy).
Execution pipeline propagation
src/llm/executor.py, src/llm/tool_loop.py
Executor conditionally copies telemetry.langfuse_session_id into per-call call_extras (extra_params), and tool loop preserves the field when cloning telemetry for iterations.
OpenAI backend integration
src/llm/backends/openai.py
_build_params reads extra_params["langfuse_session_id"] and injects it into the OpenAI request payload via extra_body.metadata.session_id when present and non-empty.
Backend integration tests
tests/llm/test_backends/test_openai.py
Async tests added to verify OpenAIBackend.complete includes extra_body.metadata.session_id when provided and omits extra_body when langfuse_session_id is absent or None.

Sequence Diagram

sequenceDiagram
  participant EntryPoint as Entry Point (deriver/dialectic/dreamer)
  participant Executor as LLM Executor
  participant ToolLoop as Tool Loop
  participant Backend as OpenAIBackend
  participant Langfuse as LiteLLM/Langfuse

  EntryPoint->>Executor: provide LLMTelemetryContext(langfuse_session_id)
  Executor->>Executor: copy langfuse_session_id -> call_extras
  Executor->>ToolLoop: pass telemetry for iteration
  ToolLoop->>ToolLoop: clone telemetry preserving langfuse_session_id
  Executor->>Backend: execute_* with extra_params(langfuse_session_id)
  Backend->>Backend: read extra_params.langfuse_session_id
  Backend->>Langfuse: include extra_body.metadata.session_id in request
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit taps the telemetry bell,
Small IDs that group each tell—
From batches, dreams, and dialogues spun,
One tiny string ties traces to one. 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately summarizes the main change: adding support for Langfuse trace grouping via LiteLLM proxy metadata, which is the core functionality implemented across all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

When LLM calls are routed through a LiteLLM proxy with Langfuse
callback enabled, each completion request creates a separate trace
with no correlation between them. A single dream cycle (potentially
10-20 LLM calls) produces 10-20 orphaned traces.

This change adds langfuse_session_id to LLMTelemetryContext so
agent entry points can set a session ID that flows through to LiteLLM
as metadata.session_id. LiteLLM's Langfuse callback uses this to
group traces from the same agent operation under one session.

Changes:
- Add langfuse_session_id field to LLMTelemetryContext
- Thread it through honcho_llm_call_inner -> extra_params -> backend
- OpenAI backend emits extra_body with metadata.session_id
- Dreamer specialists set langfuse_session_id from run_id
- Dialectic sets langfuse_session_id from run_id
- Deriver sets langfuse_session_id from workspace and observed peer
- Tool loop iteration context preserves langfuse_session_id
- Tests: session_id present and absent cases
@DanielMaly DanielMaly marked this pull request as ready for review May 21, 2026 09:43
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/llm/test_backends/test_openai.py (1)

279-353: ⚡ Quick win

Use shared pytest fixtures instead of inline client setup in these new tests.

Both new tests duplicate mock setup; please switch to fixtures from tests/conftest.py for setup/teardown consistency and reduced repetition.

As per coding guidelines, "tests/**/*.py: Use pytest fixtures from tests/conftest.py for test setup and teardown".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llm/test_backends/test_openai.py` around lines 279 - 353, Both tests
(test_openai_backend_includes_langfuse_session_id_in_extra_body and
test_openai_backend_omits_extra_body_without_langfuse_session_id) inline the
same Mock client setup; replace that duplicated setup by using the shared pytest
fixture(s) defined in tests/conftest.py (e.g., the mock OpenAI client fixture
used elsewhere) instead of creating client = Mock() and assigning
client.chat.completions.create directly; update the test signatures to accept
the fixture (and adjust any return_value behavior via the fixture or a helper on
the fixture) and instantiate OpenAIBackend with that fixture so the tests use
the centralized setup/teardown and avoid duplication when calling
OpenAIBackend(...) and asserting against
client.chat.completions.create.await_args.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/llm/backends/openai.py`:
- Around line 327-330: The code currently sets params["extra_body"] whenever
"langfuse_session_id" exists in extra_params, even if its value is None or an
empty string; change the guard to only set params["extra_body"] when
extra_params.get("langfuse_session_id") is a non-empty string (e.g., value is
not None, is instance of str, and value.strip() is not empty) so that metadata:
{"session_id": ...} is emitted only for real session IDs; update the conditional
around params["extra_body"] accordingly and leave other behavior unchanged.

---

Nitpick comments:
In `@tests/llm/test_backends/test_openai.py`:
- Around line 279-353: Both tests
(test_openai_backend_includes_langfuse_session_id_in_extra_body and
test_openai_backend_omits_extra_body_without_langfuse_session_id) inline the
same Mock client setup; replace that duplicated setup by using the shared pytest
fixture(s) defined in tests/conftest.py (e.g., the mock OpenAI client fixture
used elsewhere) instead of creating client = Mock() and assigning
client.chat.completions.create directly; update the test signatures to accept
the fixture (and adjust any return_value behavior via the fixture or a helper on
the fixture) and instantiate OpenAIBackend with that fixture so the tests use
the centralized setup/teardown and avoid duplication when calling
OpenAIBackend(...) and asserting against
client.chat.completions.create.await_args.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 348202a4-5320-496b-aac8-f5d4cc1445ae

📥 Commits

Reviewing files that changed from the base of the PR and between b5f24a6 and 33cc2c5.

📒 Files selected for processing (8)
  • src/deriver/deriver.py
  • src/dialectic/core.py
  • src/dreamer/specialists.py
  • src/llm/backends/openai.py
  • src/llm/executor.py
  • src/llm/tool_loop.py
  • src/llm/types.py
  • tests/llm/test_backends/test_openai.py

Comment thread src/llm/backends/openai.py Outdated
…kspace-observed pair

The previous deriver-{workspace}-{observed} key created an
ever-growing session that accumulated all deriver calls for a peer
pair. Now generates a batch_id nanoid per batch for properly scoped
sessions: deriver-{batch_id}
Addresses CodeRabbit review: only emit extra_body when
langfuse_session_id is a non-empty string, not when it is
None or empty. Adds test for the explicit-None case.
@VVoruganti
Copy link
Copy Markdown
Collaborator

Is the idea here to add support so that you can centralize the tracing via a litellm proxy rather than via Honcho? A bit confused on the goal if Honcho already has langfuse tracing built into it.

Would this work without litellm?

@DanielMaly
Copy link
Copy Markdown
Author

DanielMaly commented May 21, 2026

@VVoruganti

Is the idea here to add support so that you can centralize the tracing via a litellm proxy rather than via Honcho? A bit confused on the goal if Honcho already has langfuse tracing built into it.

Would this work without litellm?

The issue here is that if traces go from Honcho directly to Langfuse, any routing and cost-tracking information from LiteLLM (or any other proxy) is lost because the trace bypasses it. This change allows a session id to propagate through the OpenAI-compatible endpoint which gives a lot more flexibility in observability integrations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants