feat: support Langfuse trace grouping via LiteLLM proxy metadata by DanielMaly · Pull Request #707 · plastic-labs/honcho

DanielMaly · 2026-05-21T09:35:33Z

Summary

Support Langfuse trace grouping for Honcho's agent loops by passing session_id through to LiteLLM proxy via request metadata. When LLM calls are routed through a LiteLLM proxy with callbacks: ["langfuse"], all calls from the same agent operation (dream cycle, dialectic request, deriver batch) are now grouped under one Langfuse session.

Problem

When Honcho's LLM calls are routed through a LiteLLM proxy with callbacks: ["langfuse"], each completion request creates a separate Langfuse trace with no correlation between them. A single dream cycle (deduction specialist → induction specialist, potentially 10–20 LLM calls) produces 10–20 orphaned traces. The same applies to dialectic tool loops and deriver batches.

This makes it difficult to:

Understand the full lifecycle of an agent operation
Attribute cost and latency to a specific dream/dialectic/deriver run
Debug multi-step agent behaviour in context

Solution

Add langfuse_session_id to LLMTelemetryContext. Agent entry points set it, and the OpenAI backend passes it as extra_body={metadata: {session_id: ...}} to LiteLLM. LiteLLM's Langfuse callback reads metadata.session_id to group traces under one session.

Data flow

Agent entry point (e.g. run_dream)
  → LLMTelemetryContext(langfuse_session_id="dream-abc123")
    → honcho_llm_call(telemetry=ctx)
      → honcho_llm_call_inner
        → extra_params["langfuse_session_id"] = "dream-abc123"
          → execute_completion / execute_stream
            → OpenAIBackend._build_params
              → params["extra_body"] = {"metadata": {"session_id": "dream-abc123"}}
                → LiteLLM proxy
                  → Langfuse callback: trace.session_id = "dream-abc123"

Session ID format

Agent	Format	Example	Scope
Dreamer specialist	`dream-{run_id}`	`dream-V1StGXR8`	One dream cycle
Dialectic	`dialectic-{run_id}`	`dialectic-3KjR9mN2`	One chat request
Deriver	`deriver-{batch_id}`	`deriver-K7pWxQ4z`	One message batch

Each agent already generates a unique ID for internal telemetry (the dreamer and dialectic use run_id from nanoid). The deriver now generates a batch_id nanoid per batch in process_representation_tasks_batch, matching the same pattern.

Changes

src/llm/types.py: Add langfuse_session_id: str | None = None to LLMTelemetryContext
src/llm/executor.py: Propagate langfuse_session_id from telemetry to extra_params dict
src/llm/backends/openai.py: When extra_params contains langfuse_session_id, emit extra_body with metadata.session_id
src/dreamer/specialists.py: Set langfuse_session_id from run_id
src/dialectic/core.py: Set langfuse_session_id from run_id
src/deriver/deriver.py: Generate batch_id nanoid per batch; set langfuse_session_id from it
src/llm/tool_loop.py: Preserve langfuse_session_id in per-iteration telemetry context copies
Tests: Verify extra_body is present when langfuse_session_id is set, absent when not

Why session_id, not trace_id?

LiteLLM supports both session_id (groups traces into a session) and trace_id/existing_trace_id (merges calls into a single trace). I chose session_id because:

Robustness: If the first LLM call fails, subsequent calls still appear under the session. With existing_trace_id, orphaned spans result if the parent trace was never created.
Simplicity: No need to distinguish "first call" vs "subsequent calls" — every call just carries the same session_id.
Semantics: A dream cycle or dialectic request IS a session — multiple related traces that should be viewed together.

If tighter single-trace grouping is desired in the future, the same plumbing can carry trace_id/existing_trace_id with minimal changes.

Complements #693

This PR is complementary to #693 (Langfuse generation tracing). That PR makes individual LLM calls visible as proper generation observations via @conditional_observe. This PR ensures that when those calls are routed through a LiteLLM proxy, they're also grouped together by agent operation.

feat: Langfuse LLM observability — generation tracing with token usage reporting #693 — Langfuse generation tracing (active PR)
fix: add namespace, model, and provider to langfuse metadata so we ca… #565 — Langfuse metadata (namespace, model, provider)
feat: rework langfuse setup to work more cleanly; fix bug in dream scheduling #253 — Langfuse rework introducing conditional_observe

Summary by CodeRabbit

New Features
- Added per-operation session IDs across components so related interactions are consistently tagged and grouped in observability traces and propagated to upstream requests for improved trace correlation.
Tests
- Added tests ensuring session IDs are included in backend requests when provided and omitted when not present.

coderabbitai · 2026-05-21T09:35:44Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ea8a74b3-9a0f-4beb-a3bf-19461b9f7a52

📥 Commits

Reviewing files that changed from the base of the PR and between 5630699 and f984737.

📒 Files selected for processing (3)

src/llm/backends/openai.py
src/llm/executor.py
tests/llm/test_backends/test_openai.py

Walkthrough

Adds an optional langfuse_session_id to LLMTelemetryContext, generates session IDs at entry points (deriver/dialectic/dreamer), preserves and forwards the field through executor/tool loop, and injects it into OpenAI requests as extra_body.metadata.session_id; tests cover presence/absence behavior.

Changes

Langfuse Session ID Tracking

Layer / File(s)	Summary
Type contract `src/llm/types.py`	`LLMTelemetryContext` gains optional `langfuse_session_id: str \| None` field documented for Langfuse trace grouping via LiteLLM proxy metadata.
Session ID generation at entry points `src/deriver/deriver.py`, `src/dialectic/core.py`, `src/dreamer/specialists.py`	Deriver generates a per-batch nanoid and sets `langfuse_session_id="deriver-{batch_id}"`; dialectic sets `langfuse_session_id="dialectic-{run_id}"`; dreamer sets `langfuse_session_id="dream-{run_id}"` (or `None` when run id falsy).
Execution pipeline propagation `src/llm/executor.py`, `src/llm/tool_loop.py`	Executor conditionally copies `telemetry.langfuse_session_id` into per-call `call_extras` (`extra_params`), and tool loop preserves the field when cloning telemetry for iterations.
OpenAI backend integration `src/llm/backends/openai.py`	`_build_params` reads `extra_params["langfuse_session_id"]` and injects it into the OpenAI request payload via `extra_body.metadata.session_id` when present and non-empty.
Backend integration tests `tests/llm/test_backends/test_openai.py`	Async tests added to verify `OpenAIBackend.complete` includes `extra_body.metadata.session_id` when provided and omits `extra_body` when `langfuse_session_id` is absent or `None`.

Sequence Diagram

sequenceDiagram
  participant EntryPoint as Entry Point (deriver/dialectic/dreamer)
  participant Executor as LLM Executor
  participant ToolLoop as Tool Loop
  participant Backend as OpenAIBackend
  participant Langfuse as LiteLLM/Langfuse

  EntryPoint->>Executor: provide LLMTelemetryContext(langfuse_session_id)
  Executor->>Executor: copy langfuse_session_id -> call_extras
  Executor->>ToolLoop: pass telemetry for iteration
  ToolLoop->>ToolLoop: clone telemetry preserving langfuse_session_id
  Executor->>Backend: execute_* with extra_params(langfuse_session_id)
  Backend->>Backend: read extra_params.langfuse_session_id
  Backend->>Langfuse: include extra_body.metadata.session_id in request

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit taps the telemetry bell,
Small IDs that group each tell—
From batches, dreams, and dialogues spun,
One tiny string ties traces to one. 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately summarizes the main change: adding support for Langfuse trace grouping via LiteLLM proxy metadata, which is the core functionality implemented across all modified files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

When LLM calls are routed through a LiteLLM proxy with Langfuse callback enabled, each completion request creates a separate trace with no correlation between them. A single dream cycle (potentially 10-20 LLM calls) produces 10-20 orphaned traces. This change adds langfuse_session_id to LLMTelemetryContext so agent entry points can set a session ID that flows through to LiteLLM as metadata.session_id. LiteLLM's Langfuse callback uses this to group traces from the same agent operation under one session. Changes: - Add langfuse_session_id field to LLMTelemetryContext - Thread it through honcho_llm_call_inner -> extra_params -> backend - OpenAI backend emits extra_body with metadata.session_id - Dreamer specialists set langfuse_session_id from run_id - Dialectic sets langfuse_session_id from run_id - Deriver sets langfuse_session_id from workspace and observed peer - Tool loop iteration context preserves langfuse_session_id - Tests: session_id present and absent cases

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/llm/test_backends/test_openai.py (1)

279-353: ⚡ Quick win

Use shared pytest fixtures instead of inline client setup in these new tests.

Both new tests duplicate mock setup; please switch to fixtures from tests/conftest.py for setup/teardown consistency and reduced repetition.

As per coding guidelines, "tests/**/*.py: Use pytest fixtures from tests/conftest.py for test setup and teardown".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llm/test_backends/test_openai.py` around lines 279 - 353, Both tests
(test_openai_backend_includes_langfuse_session_id_in_extra_body and
test_openai_backend_omits_extra_body_without_langfuse_session_id) inline the
same Mock client setup; replace that duplicated setup by using the shared pytest
fixture(s) defined in tests/conftest.py (e.g., the mock OpenAI client fixture
used elsewhere) instead of creating client = Mock() and assigning
client.chat.completions.create directly; update the test signatures to accept
the fixture (and adjust any return_value behavior via the fixture or a helper on
the fixture) and instantiate OpenAIBackend with that fixture so the tests use
the centralized setup/teardown and avoid duplication when calling
OpenAIBackend(...) and asserting against
client.chat.completions.create.await_args.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/llm/backends/openai.py`:
- Around line 327-330: The code currently sets params["extra_body"] whenever
"langfuse_session_id" exists in extra_params, even if its value is None or an
empty string; change the guard to only set params["extra_body"] when
extra_params.get("langfuse_session_id") is a non-empty string (e.g., value is
not None, is instance of str, and value.strip() is not empty) so that metadata:
{"session_id": ...} is emitted only for real session IDs; update the conditional
around params["extra_body"] accordingly and leave other behavior unchanged.

---

Nitpick comments:
In `@tests/llm/test_backends/test_openai.py`:
- Around line 279-353: Both tests
(test_openai_backend_includes_langfuse_session_id_in_extra_body and
test_openai_backend_omits_extra_body_without_langfuse_session_id) inline the
same Mock client setup; replace that duplicated setup by using the shared pytest
fixture(s) defined in tests/conftest.py (e.g., the mock OpenAI client fixture
used elsewhere) instead of creating client = Mock() and assigning
client.chat.completions.create directly; update the test signatures to accept
the fixture (and adjust any return_value behavior via the fixture or a helper on
the fixture) and instantiate OpenAIBackend with that fixture so the tests use
the centralized setup/teardown and avoid duplication when calling
OpenAIBackend(...) and asserting against
client.chat.completions.create.await_args.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 348202a4-5320-496b-aac8-f5d4cc1445ae

📥 Commits

Reviewing files that changed from the base of the PR and between b5f24a6 and 33cc2c5.

📒 Files selected for processing (8)

src/deriver/deriver.py
src/dialectic/core.py
src/dreamer/specialists.py
src/llm/backends/openai.py
src/llm/executor.py
src/llm/tool_loop.py
src/llm/types.py
tests/llm/test_backends/test_openai.py

…kspace-observed pair The previous deriver-{workspace}-{observed} key created an ever-growing session that accumulated all deriver calls for a peer pair. Now generates a batch_id nanoid per batch for properly scoped sessions: deriver-{batch_id}

Addresses CodeRabbit review: only emit extra_body when langfuse_session_id is a non-empty string, not when it is None or empty. Adds test for the explicit-None case.

VVoruganti · 2026-05-21T21:02:59Z

Is the idea here to add support so that you can centralize the tracing via a litellm proxy rather than via Honcho? A bit confused on the goal if Honcho already has langfuse tracing built into it.

Would this work without litellm?

DanielMaly · 2026-05-21T21:30:22Z

@VVoruganti

Is the idea here to add support so that you can centralize the tracing via a litellm proxy rather than via Honcho? A bit confused on the goal if Honcho already has langfuse tracing built into it.

Would this work without litellm?

The issue here is that if traces go from Honcho directly to Langfuse, any routing and cost-tracking information from LiteLLM (or any other proxy) is lost because the trace bypasses it. This change allows a session id to propagate through the OpenAI-compatible endpoint which gives a lot more flexibility in observability integrations.

feat: support Langfuse trace grouping via LiteLLM proxy metadata

df480a0

DanielMaly marked this pull request as ready for review May 21, 2026 09:43

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Comment thread src/llm/backends/openai.py Outdated

DanielMaly added 2 commits May 21, 2026 11:51

fix: guard langfuse_session_id against None/empty values

f984737

Addresses CodeRabbit review: only emit extra_body when langfuse_session_id is a non-empty string, not when it is None or empty. Adds test for the explicit-None case.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: support Langfuse trace grouping via LiteLLM proxy metadata#707

feat: support Langfuse trace grouping via LiteLLM proxy metadata#707
DanielMaly wants to merge 4 commits into
plastic-labs:mainfrom
DanielMaly:feat/langfuse-trace-grouping-via-litellm-metadata

DanielMaly commented May 21, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 21, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

VVoruganti commented May 21, 2026

Uh oh!

DanielMaly commented May 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

DanielMaly commented May 21, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Data flow

Session ID format

Changes

Why session_id, not trace_id?

Complements #693

Related

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

VVoruganti commented May 21, 2026

Uh oh!

DanielMaly commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

DanielMaly commented May 21, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 21, 2026 •

edited

Loading

DanielMaly commented May 21, 2026 •

edited

Loading