Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
109 changes: 109 additions & 0 deletions .claude/skills/buttercup-langfuse/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
---
name: buttercup-langfuse
description: Buttercup-specific Langfuse wiring. Use when adding LLM tracing to a new LangChain/LangGraph entry point in this repo, attaching tags/metadata to existing chains, troubleshooting "no traces in Langfuse" against this codebase, or touching the Helm/values surface (`global.langfuse.*`, `langfuse-secret.yaml`, `common-env.yaml`). For platform-level Langfuse usage (CLI, instrumentation patterns generally, prompt migration, SDK upgrades), use the `langfuse` skill instead.
---

# Langfuse wiring in Buttercup

How *this repo* integrates Langfuse. For general Langfuse platform/CLI/instrumentation guidance, defer to the `langfuse` skill — this skill only covers Buttercup-local conventions.

Tracing is always optional at runtime: if the env vars are unset or the host is unreachable, callbacks resolve to an empty list and code paths are unaffected. Follow the existing pattern — do not introduce a parallel integration.

## The one helper to use

`buttercup.common.llm.get_langfuse_callbacks() -> list[BaseCallbackHandler]`

- Defined in `common/src/buttercup/common/llm.py`
- `@functools.cache`d — cheap to call repeatedly, but the auth/connectivity probe runs only once per process
- Returns `[]` when Langfuse is disabled, misconfigured, or unreachable
- Internally: runs `is_langfuse_available()` (checks `LANGFUSE_HOST`, then HTTP-probes `/api/public/ingestion`), constructs `langfuse.langchain.CallbackHandler()`, then verifies credentials via `langfuse_auth_check()`

Do **not** instantiate `CallbackHandler` directly or read the env vars yourself. Always go through `get_langfuse_callbacks()`.

## Standard wiring pattern

Attach the callbacks via `RunnableConfig` on the compiled chain/graph. Canonical examples in the repo:

- `patcher/src/buttercup/patcher/agents/leader.py` — LangGraph with tags + metadata
- `seed-gen/src/buttercup/seed_gen/seed_explore.py` — minimal LangGraph
- `seed-gen/src/buttercup/seed_gen/task.py`, `vuln_base_task.py`, `seed_init.py` — variants

Minimal form:

```python
from langchain_core.runnables import RunnableConfig
from buttercup.common.llm import get_langfuse_callbacks

llm_callbacks = get_langfuse_callbacks()
chain = workflow.compile().with_config(
RunnableConfig(tags=["<short-task-name>"], callbacks=llm_callbacks),
)
chain.invoke(state)
```

Rich form (patcher-style) — `tags` for filtering, `metadata` for searchable fields in the Langfuse UI:

```python
chain = patch_team.compile().with_config(
RunnableConfig(
callbacks=llm_callbacks,
tags=["patch_team", challenge.name, task_id, internal_patch_id],
metadata={
"task_id": task_id,
"internal_patch_id": internal_patch_id,
"challenge_project_name": challenge.name,
},
recursion_limit=RECURSION_LIMIT,
configurable={...},
),
)
```

Conventions:
- First tag is the workflow/team name (`"patch_team"`, `"seed-explore"`, …). Identifies the *kind* of run in the Langfuse UI.
- Subsequent tags are high-cardinality identifiers (task_id, challenge name, patch id) for drill-down.
- `metadata` mirrors the identifier tags as structured fields — tags are best for filtering, metadata for inspection.
- Don't reach for `langfuse.observe()` decorators or other SDK entry points — this codebase routes everything through the LangChain callback handler.

## When you're adding a NEW LangChain/LangGraph entry point

1. Import `get_langfuse_callbacks` from `buttercup.common.llm`.
2. Call it once near where you compile the chain/graph.
3. Pass through `RunnableConfig(callbacks=..., tags=[...], metadata={...})`.
4. Pick a workflow tag that doesn't collide with existing ones (`patch_team`, `seed-explore`, `seed-init`, `vuln-discovery`, …).
5. If the component runs in k8s, confirm its Deployment template pulls in the Langfuse env block (see below). Most existing components already do.

## Deployment / config surface

Env vars (consumed by `get_langfuse_callbacks()`):
- `LANGFUSE_HOST` — base URL, e.g. `https://cloud.langfuse.com`
- `LANGFUSE_PUBLIC_KEY` (`pk-lf-...`)
- `LANGFUSE_SECRET_KEY` (`sk-lf-...`)

Where they're set:
- Local docker-compose: `dev/docker-compose/env.template`
- Local make-based deploy: `deployment/env.template` (gated by `LANGFUSE_ENABLED`)
- Kubernetes: `deployment/k8s/values.yaml` under `global.langfuse.{enabled,host,publicKey,secretKey}`
- Rendered into the `<release>-langfuse-secrets` Secret by `deployment/k8s/templates/langfuse-secret.yaml`
- Injected into pods via the `buttercup.env.langfuse` helper in `deployment/k8s/templates/common-env.yaml`
- A new component's Deployment template must include that helper to get traces

Python dependency: `langfuse ~=4.0.1`, declared under `[project.optional-dependencies] full` in `common/pyproject.toml`. Provided via the `[full]` extra of `common`. Components that emit traces depend on `common[full]` (as patcher and seed-gen already do). Don't pin `langfuse` independently — re-use the extra.

## Debugging "I don't see traces"

Walk the chain top-to-bottom; the first failing step short-circuits everything downstream:

1. **`LANGFUSE_HOST` unset.** Logs: `"LangFuse not configured"`. Set the env var.
2. **Host unreachable / not Langfuse.** `is_langfuse_available()` POSTs to `/api/public/ingestion` and expects HTTP 401 (unauthenticated). Anything else → `False`. Check network reachability from the pod and that the URL is correct.
3. **Bad keys.** Logs: `"LangFuse authentication failed"`. `langfuse_auth_check()` POSTs the same endpoint with basic auth and expects HTTP 400 (authenticated but bad payload). 401 means wrong keys. Rotate the secret and re-apply.
4. **Callbacks not attached.** `get_langfuse_callbacks()` returned a handler, but the runnable was invoked without `RunnableConfig(callbacks=...)`. Grep the call site — easy to miss when refactoring.
5. **Wrong runnable.** Only LangChain/LangGraph runnables route through callbacks. Direct HTTP calls to OpenAI/LiteLLM bypass Langfuse entirely. If the new code path uses `httpx.post(...)` or similar, it won't trace — that's expected.
6. **Cache staleness.** All three helpers are `@functools.cache`d per process. If env vars change at runtime, the process must restart. In k8s this means rolling the pod after a secret change.

## What NOT to do

- Don't add a new `langfuse` direct dependency in a sibling component's `pyproject.toml` — depend on `common[full]` instead (as patcher and seed-gen do) to pull it in via the extra.
- Don't gate code paths on `is_langfuse_available()` — the empty-callback-list pattern already makes Langfuse a no-op when disabled.
- Don't pass the callback list into individual `llm.invoke()` calls when there's a parent chain — attach at the highest reasonable scope (the compiled workflow) so child runs nest correctly in the Langfuse UI.
- Don't conflate Langfuse with OpenTelemetry. They coexist: Langfuse for LLM trace nesting/prompt inspection, OTel (`buttercup.common.telemetry`) for system-level spans. `seed_explore.py` shows both wrapped around the same `chain.invoke(...)`.
140 changes: 140 additions & 0 deletions .claude/skills/langfuse/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,140 @@
---
name: langfuse
description: Interact with Langfuse and access its documentation. Use when needing to (1) query or modify Langfuse data programmatically via the CLI — traces, prompts, datasets, scores, sessions, and any other API resource, (2) look up Langfuse documentation, concepts, integration guides, or SDK usage, or (3) understand how any Langfuse feature works. This skill covers CLI-based API access (via npx) and multiple documentation retrieval methods.
allowed-tools:
- WebFetch(domain:langfuse.com)
- Bash(curl *langfuse.com/*)
- Bash(npx langfuse-cli api __schema *)
- Bash(npx langfuse-cli api * --help *)
- Bash(npx langfuse-cli api * list *)
- Bash(npx langfuse-cli api * get *)
- Bash(bunx langfuse-cli api __schema *)
- Bash(bunx langfuse-cli api * --help *)
- Bash(bunx langfuse-cli api * list *)
- Bash(bunx langfuse-cli api * get *)
---

# Langfuse

This skill helps you use Langfuse effectively across all common workflows: instrumenting applications, migrating prompts, debugging traces, and accessing data programmatically.

## Core Principles

Follow these principles for ALL Langfuse work:

1. **Documentation First**: NEVER implement based on memory. Always fetch current docs before writing code (Langfuse updates frequently) See the section below on how to access documentation.
2. **CLI for Data Access**: Use `langfuse-cli` when querying/modifying Langfuse data. See the section below on how to use the CLI.
3. **Best Practices by Use Case**: Check the relevant reference file below for use-case-specific guidelines before implementing
4. **Use latest Langfuse versions**: Unless the user specified otherwise or there's a good reason, always use the latest version of Langfuse SDKs/APIs.


## Use case specific references

- instrumenting an existing function/application: references/instrumentation.md
- migrating prompts from a codebase into Langfuse: references/prompt-migration.md
- capturing user feedback (thumbs, ratings, implicit signals) as scores on traces: references/user-feedback.md
- further tips on using the Langfuse CLI: references/cli.md
- upgrading or migrating Langfuse SDKs to the latest version: references/sdk-upgrade.md
- systematic error analysis — reading traces, building failure taxonomy, deciding what to fix: references/error-analysis.md
- submitting feedback about this skill: references/skill-feedback.md

## 1. Langfuse API via CLI

Use the `langfuse-cli` to interact with the full Langfuse REST API from the command line. Run via npx (no install required):

Start by discovering the schema and available arguments:

```bash
# Discover all available resources
npx langfuse-cli api __schema

# List actions for a resource
npx langfuse-cli api <resource> --help

# Show args/options for a specific action
npx langfuse-cli api <resource> <action> --help
```

### Credentials

Set environment variables before making calls:

```bash
export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...
export LANGFUSE_HOST=https://cloud.langfuse.com # example for EU cloud. For US cloud it's us.cloud.langfuse.com, and can also be a self-hosted URL. The server must always be specified in order to access Langfuse.
```

If not set, ask the user to set them in their shell or a `.env` file (do not ask them to paste keys into chat for security reasons). Keys are found in Langfuse UI → Settings → API Keys.

### Detailed CLI Reference

For common workflows, tips, and full usage patterns, see [references/cli.md](references/cli.md).

## 2. Langfuse Documentation

Three methods to access Langfuse docs, in order of preference. **Always prefer your application's native web fetch and search tools** (e.g., `WebFetch`, `WebSearch`, `mcp_fetch`, etc.) over `curl` when available. The URLs and patterns below work with any fetching method — the `curl` examples are just illustrative.

### 2a. Documentation Index (llms.txt)

Fetch the full index of all documentation pages:

```bash
curl -s https://langfuse.com/llms.txt
```

Returns a structured list of every doc page with titles and URLs. Use this to discover the right page for a topic, then fetch that page directly.

Alternatively, you can start on `https://langfuse.com/docs` and explore the site to find the page you need.

### 2b. Fetch Individual Pages as Markdown

Any page listed in llms.txt can be fetched as markdown by appending `.md` to its path or by using `Accept: text/markdown` in the request headers. Use this when you know which page contains the information needed. Returns clean markdown with code examples and configuration details.

```bash
curl -s "https://langfuse.com/docs/observability/overview.md"
curl -s "https://langfuse.com/docs/observability/overview" -H "Accept: text/markdown"
```

### 2c. Search Documentation

When you need to find information across all docs and github issues/discussions without knowing the specific page:

```bash
curl -s "https://langfuse.com/api/search-docs?query=<url-encoded-query>"
```

Example:

```bash
curl -s "https://langfuse.com/api/search-docs?query=How+do+I+trace+LangGraph+agents"
```

Returns a JSON response with:

- `query`: the original query
- `answer`: a JSON string containing an array of matching documents, each with:
- `url`: link to the doc page
- `title`: page title
- `source.content`: array of relevant text excerpts from the page

Search is a great fallback if you cannot find the relevant pages or need more context. Especially useful when debugging issues as all GitHub Issues and Discussions are also indexed. Responses can be large — extract only the relevant portions.

### Documentation Workflow

1. Start with **llms.txt** to orient — scan for relevant page titles
2. **Fetch specific pages** when you identify the right one
3. Fall back to **search** when the topic is unclear and you want more context

## Skill Feedback

When the user expresses that something about this skill is not working as expected, gives incorrect guidance, is missing information, or could be improved — offer to submit feedback to the Langfuse skill maintainers. This includes when:

- The skill gave wrong or outdated instructions
- A workflow didn't produce the expected result
- The user wishes the skill covered something it doesn't
- The user explicitly says something like "this should work differently" or "this is wrong"

**Do NOT trigger this** for issues with Langfuse itself (the product) — only for issues with this skill's instructions and behavior.

When triggered, follow the process in [references/skill-feedback.md](references/skill-feedback.md).
51 changes: 51 additions & 0 deletions .claude/skills/langfuse/references/cli.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
# Langfuse CLI Reference

Documentation: https://langfuse.com/docs/api-and-data-platform/features/cli

## Install

```bash
# Run directly (recommended)
npx langfuse-cli api <resource> <action>
bunx langfuse-cli api <resource> <action>

# Or install globally
npm i -g langfuse-cli
langfuse api <resource> <action>
```

## Discovery

```bash
# List all resources and auth info
langfuse api __schema

# List actions for a resource
langfuse api <resource> --help

# Show args/options for a specific action
langfuse api <resource> <action> --help

# Preview the curl command without executing
langfuse api <resource> <action> --curl
```

## Credentials

Set environment variables:

```bash
export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...
export LANGFUSE_HOST=https://cloud.langfuse.com
```

## Tips

- Use `--json` for machine-readable JSON output
- Use `--curl` to preview the HTTP request without executing
- Pagination: use `--limit` and `--page` on list endpoints
- All list commands support filtering — check `<resource> <action> --help` for available options
- Prefer `observations-v2s` over `observations` — the v2 endpoint returns richer data
- Prefer `metrics-v2s` over `metrics` — the v2 endpoint returns richer data
- Prefer `score-v2s` over `scores` — the v1 `scores` resource only supports create/delete; use `score-v2s` for list and get operations
Loading