204 changes: 204 additions & 0 deletions .agents/skills/sdk-integrations/SKILL.md
@@ -0,0 +1,204 @@
---
name: sdk-integrations
description: Create or update a Braintrust Python SDK integration using the integrations API. Use when asked to add an integration, update an existing integration, add or update patchers, update auto_instrument, add integration tests, or work in py/src/braintrust/integrations/.
---

# SDK Integrations

SDK integrations define how Braintrust discovers a provider, patches it safely, and keeps provider-specific tracing local to that integration. Read the existing integration closest to your task before writing a new one. If there is no closer example, `py/src/braintrust/integrations/anthropic/` is a useful reference implementation.

## Workflow

1. Read the shared integration primitives and the closest provider example.
2. Choose the task shape: new provider, existing provider update, or `auto_instrument()` update.
3. Implement the smallest integration, patcher, tracing, and export changes needed.
4. Add or update VCR-backed integration tests and only re-record cassettes when behavior changed intentionally.
5. Run the narrowest provider session first, then expand to shared validation only if the change touched shared code.

## Commands

```bash
cd py && nox -s "test_<provider>(latest)"
cd py && nox -s "test_<provider>(latest)" -- -k "test_name"
cd py && nox -s "test_<provider>(latest)" -- --vcr-record=all -k "test_name"
cd py && make test-core
cd py && make lint
```

## Creating or Updating an Integration

### 1. Read the nearest existing implementation

Always inspect these first:

- `py/src/braintrust/integrations/base.py`
- `py/src/braintrust/integrations/runtime.py`
- `py/src/braintrust/integrations/versioning.py`
- `py/src/braintrust/integrations/config.py`

Relevant example implementation:

- `py/src/braintrust/integrations/anthropic/`

Read these additional files only when the task needs them:

- changing `auto_instrument()`: `py/src/braintrust/auto.py` and `py/src/braintrust/auto_test_scripts/test_auto_anthropic_patch_config.py`
- adding or updating VCR tests: `py/src/braintrust/conftest.py` and `py/src/braintrust/integrations/anthropic/test_anthropic.py`

Then choose the path that matches the task:

- new provider: create `py/src/braintrust/integrations/<provider>/`
- existing provider: read the provider package first and change only the affected patchers, tracing, tests, or exports
- `auto_instrument()` only: keep the integration package unchanged unless the option shape or patcher surface also changed

### 2. Create or extend the integration module

For a new provider, create a package under `py/src/braintrust/integrations/<provider>/`.

For an existing provider, keep the module layout unless the current structure is actively causing problems.

Typical files:

- `__init__.py`: public exports for the integration type and any public helpers
- `integration.py`: the `BaseIntegration` subclass, patcher registration, and high-level orchestration
- `patchers.py`: one patcher per patch target, with version gating and existence checks close to the patch
- `tracing.py`: provider-specific span creation, metadata extraction, stream handling, and output normalization
- `test_<provider>.py`: integration tests for `wrap(...)`, `setup()`, sync/async behavior, streaming, and error handling
- `cassettes/`: recorded provider traffic for VCR-backed integration tests when the provider uses HTTP

### 3. Define the integration class

Implement a `BaseIntegration` subclass in `integration.py`.

Set:

- `name`
- `import_names`
- `min_version` and `max_version` only when needed
- `patchers`

Keep the class focused on orchestration. Provider-specific tracing logic should stay in `tracing.py`.
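
The class shape described above can be sketched roughly as follows. The `BaseIntegration` and `Patcher` classes here are hypothetical stand-ins so the example runs on its own; the real base class in `py/src/braintrust/integrations/base.py` may declare these attributes differently, and the provider name and patcher ids are made up for illustration.

```python
from dataclasses import dataclass
from typing import ClassVar, Optional, Tuple


@dataclass(frozen=True)
class Patcher:
    """Stand-in for a patcher; only the stable id matters in this sketch."""

    name: str


class BaseIntegration:
    """Hypothetical stand-in for the shared base class."""

    name: ClassVar[str]
    import_names: ClassVar[Tuple[str, ...]]
    min_version: ClassVar[Optional[str]] = None
    max_version: ClassVar[Optional[str]] = None
    patchers: ClassVar[Tuple[Patcher, ...]] = ()


class ExampleProviderIntegration(BaseIntegration):
    # Orchestration only: span creation and stream handling live in tracing.py.
    name = "example_provider"
    import_names = ("example_provider",)
    min_version = "1.2.0"  # set only when an older release is known to break
    patchers = (
        Patcher(name="example_provider.init.sync"),
        Patcher(name="example_provider.init.async"),
    )


patcher_ids = [p.name for p in ExampleProviderIntegration.patchers]
```

Leaving `max_version` unset keeps the integration open to future provider releases until a breaking one is actually observed.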

### 4. Add one patcher per coherent patch target

Put patchers in `patchers.py`.

Use `FunctionWrapperPatcher` when patching a single import path with `wrapt.wrap_function_wrapper`. Good examples:

- constructor patchers like `ProviderClient.__init__`
- single API surfaces like `client.responses.create`
- one sync and one async constructor patcher instead of one patcher doing both

Keep patchers narrow. If you need to patch multiple unrelated targets, create multiple patchers rather than one large patcher.

Patchers are responsible for:

- stable patcher ids via `name`
- optional version gating
- existence checks
- idempotence through the base patcher marker
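
As a rough illustration of the existence check and marker-based idempotence listed above, here is a standard-library sketch. It is not the real `FunctionWrapperPatcher`: it uses plain attribute assignment instead of `wrapt.wrap_function_wrapper`, and the marker name and `patch_function` helper are assumptions made for this example.

```python
import functools
import types

_MARKER = "__example_patched__"  # hypothetical marker attribute


def patch_function(owner, attr, wrapper):
    """Wrap owner.attr once; return True if the patch is in place afterwards."""
    original = getattr(owner, attr, None)
    if original is None:
        # Existence check: the target is missing in this provider version.
        return False
    if getattr(original, _MARKER, False):
        # Idempotence: a second call must not wrap the wrapper again.
        return True

    @functools.wraps(original)
    def patched(*args, **kwargs):
        return wrapper(original, args, kwargs)

    setattr(patched, _MARKER, True)
    setattr(owner, attr, patched)
    return True


# A fake provider surface so the sketch runs without any SDK installed.
provider = types.SimpleNamespace(create=lambda prompt: f"echo:{prompt}")
calls = []


def record_call(wrapped, args, kwargs):
    calls.append(args)
    return wrapped(*args, **kwargs)


first = patch_function(provider, "create", record_call)
second = patch_function(provider, "create", record_call)  # no double wrapping
missing = patch_function(provider, "does_not_exist", record_call)
result = provider.create("hi")
```

Because the second call is a no-op, `calls` records exactly one entry per provider call even if setup runs twice.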

### 5. Keep tracing provider-local

Put span creation, metadata extraction, stream aggregation, error logging, and output normalization in `tracing.py`.

This layer should:

- preserve provider behavior
- support sync, async, and streaming paths as needed
- avoid raising from tracing-only code when that would break the provider call

If the provider has complex streaming internals, keep that logic local instead of forcing it into shared abstractions.
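
One way to keep tracing-only failures away from the provider call is sketched below. The `traced_call` and `_safe_log` helpers and the `log_span` callback are hypothetical names for this example, not part of the Braintrust API.

```python
import logging

logger = logging.getLogger("example.tracing")


def _safe_log(log_span, event):
    """Send an event to the span backend without ever raising."""
    try:
        log_span(event)
    except Exception:
        # Tracing-only failure: note it and leave the provider call intact.
        logger.debug("tracing failed", exc_info=True)


def traced_call(fn, log_span, *args, **kwargs):
    """Call fn and report the outcome; provider errors propagate unchanged."""
    try:
        output = fn(*args, **kwargs)
    except Exception as exc:
        _safe_log(log_span, {"error": repr(exc)})
        raise
    _safe_log(log_span, {"output": output})
    return output


def broken_logger(event):
    raise RuntimeError("span backend unavailable")


events = []
ok = traced_call(lambda x: x * 2, events.append, 21)
still_ok = traced_call(lambda x: x * 2, broken_logger, 21)
```

The provider's own exceptions are re-raised as-is, so callers see the same behavior with or without tracing.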

### 6. Wire public exports

Update public exports only as needed:

- `py/src/braintrust/integrations/__init__.py`
- `py/src/braintrust/__init__.py`

### 7. Update auto_instrument only if this integration should be auto-patched

If the provider belongs in `braintrust.auto.auto_instrument()`, add a branch in `py/src/braintrust/auto.py`.

Match the current pattern:

- plain `bool` options for simple on/off integrations
- `IntegrationPatchConfig` only when users need patcher-level selection
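
The two option shapes can be normalized along these lines. This is a self-contained sketch: the `IntegrationPatchConfig` dataclass is a stand-in whose field types are assumptions (only the `enabled_patchers` and `disabled_patchers` attribute names come from this change), and the patcher id is invented.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple, Union


@dataclass(frozen=True)
class IntegrationPatchConfig:
    """Stand-in for the real config class; field types are assumptions."""

    enabled_patchers: Optional[FrozenSet[str]] = None
    disabled_patchers: Optional[FrozenSet[str]] = None


InstrumentOption = Union[bool, IntegrationPatchConfig]


def normalize_option(
    name: str, option: InstrumentOption
) -> Tuple[bool, Optional[IntegrationPatchConfig]]:
    """Return (enabled, config) for a bool or patch-config option."""
    if isinstance(option, bool):
        return option, None
    if isinstance(option, IntegrationPatchConfig):
        # Passing a config implies the integration is enabled.
        return True, option
    raise TypeError(
        f"auto_instrument option {name!r} must be a bool or "
        f"IntegrationPatchConfig, got {type(option).__name__}"
    )


config = IntegrationPatchConfig(disabled_patchers=frozenset({"example.init.async"}))
plain = normalize_option("anthropic", False)
selective = normalize_option("anthropic", config)
```

Rejecting other types with `TypeError` avoids silently treating a truthy value such as a string as "enabled".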

## Tests

Keep integration tests with the integration package.

Provider behavior tests should use `@pytest.mark.vcr` whenever the provider makes network calls. Avoid mocks and fakes except in the narrow cases listed at the end of this section.

Cover:

- direct `wrap(...)` behavior
- `setup()` patching new clients
- sync behavior
- async behavior
- streaming behavior
- idempotence
- failure/error logging
- patcher selection if using `IntegrationPatchConfig`

Preferred locations:

- provider behavior tests: `py/src/braintrust/integrations/<provider>/test_<provider>.py`
- version helper tests: `py/src/braintrust/integrations/test_versioning.py`
- auto-instrument subprocess tests: `py/src/braintrust/auto_test_scripts/`

If the provider uses VCR, keep cassettes next to the integration test file under `py/src/braintrust/integrations/<provider>/cassettes/`.

Only re-record cassettes when the behavior change is intentional.

Use mocks or fakes only for cases that are hard to drive through recorded provider traffic, such as narrowly scoped error injection, local version-routing logic, or patcher existence checks.

## Patterns

### Constructor patching

If the goal is to instrument clients that the provider SDK creates after setup, patch constructors and attach traced surfaces after the real constructor runs. The Anthropic integration is an example of this pattern.
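
A minimal sketch of constructor patching, using fake classes in place of a real provider SDK. `FakeClient`, `TracedMessages`, `patch_constructor`, and the marker attribute are all hypothetical; the Anthropic integration in this repository is the authoritative example.

```python
class FakeMessages:
    def create(self, prompt):
        return f"reply:{prompt}"


class FakeClient:
    """Stand-in for a provider client; not a real SDK class."""

    def __init__(self, api_key):
        self.api_key = api_key
        self.messages = FakeMessages()


class TracedMessages:
    """Wraps the real surface and records each call."""

    def __init__(self, inner, sink):
        self._inner = inner
        self._sink = sink

    def create(self, prompt):
        output = self._inner.create(prompt)
        self._sink.append({"input": prompt, "output": output})
        return output


def patch_constructor(cls, sink):
    original_init = cls.__init__
    if getattr(original_init, "__example_patched__", False):
        return  # already patched; keep setup idempotent

    def patched_init(self, *args, **kwargs):
        original_init(self, *args, **kwargs)  # real constructor runs first
        self.messages = TracedMessages(self.messages, sink)  # then attach tracing

    patched_init.__example_patched__ = True
    cls.__init__ = patched_init


spans = []
patch_constructor(FakeClient, spans)
patch_constructor(FakeClient, spans)  # second call is a no-op
client = FakeClient(api_key="test")
reply = client.messages.create("hello")
```

Running the real constructor first means the traced surface wraps a fully initialized client rather than guessing at its internals.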

### Patcher selection

Use `IntegrationPatchConfig` only when users benefit from enabling or disabling specific patchers. Validate unknown patcher ids through `BaseIntegration.resolve_patchers()` instead of silently ignoring them.
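
Validation of patcher ids might look like the sketch below. The free function and its arguments are assumptions for illustration; the real `BaseIntegration.resolve_patchers()` may have a different signature.

```python
def resolve_patchers(available, enabled=None, disabled=None):
    """Return the patcher ids to apply, rejecting ids that do not exist."""
    enabled = set(available) if enabled is None else set(enabled)
    disabled = set(disabled or ())
    unknown = sorted((enabled | disabled) - set(available))
    if unknown:
        raise ValueError(f"unknown patcher ids: {', '.join(unknown)}")
    # Preserve registration order so patching stays deterministic.
    return [pid for pid in available if pid in enabled and pid not in disabled]


AVAILABLE = ["client.init.sync", "client.init.async"]  # hypothetical ids
only_sync = resolve_patchers(AVAILABLE, disabled=["client.init.async"])

try:
    resolve_patchers(AVAILABLE, enabled=["client.init.snyc"])  # typo on purpose
    error = None
except ValueError as exc:
    error = str(exc)
```

Failing loudly on a misspelled id surfaces configuration mistakes that would otherwise leave a patcher silently disabled.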

### Versioning

Prefer feature detection first and version checks second.

Use:

- `detect_module_version(...)`
- `version_in_range(...)`
- `version_matches_spec(...)`

Do not add `packaging` just for integration routing.
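
The "feature detection first, version check second" order can be sketched with the standard library alone. The helpers below are hypothetical and are not the real `detect_module_version`, `version_in_range`, or `version_matches_spec` functions, whose signatures are not shown here; the attribute name and version threshold are invented.

```python
import re
import types


def parse_release(text):
    """Leading numeric release as a tuple, ignoring suffixes such as rc1."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def at_least(installed, minimum):
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so "1.10" and "1.10.0" compare as equal.
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def pick_patch_target(module, installed_version):
    # 1. Feature detection: trust what the module actually exposes.
    if hasattr(module, "responses"):
        return "responses.create"
    # 2. Version check: only when attributes cannot tell releases apart.
    if at_least(installed_version, "1.10.0"):
        return "chat.create"
    return None


new_sdk = types.SimpleNamespace(responses=object())
older_sdk = types.SimpleNamespace()
targets = [
    pick_patch_target(new_sdk, "2.0.0"),
    pick_patch_target(older_sdk, "1.10.0rc1"),
    pick_patch_target(older_sdk, "1.9.3"),
]
```

Comparing integer tuples avoids the string-comparison trap where `"1.9"` sorts after `"1.10"`. Treating a pre-release like the final release is a simplification of this sketch.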

## Validation

- Run the narrowest provider session first.
- Run `cd py && make test-core` if you changed shared integration code.
- Run `cd py && make lint` before handing off broader integration changes.
- If you changed `auto_instrument()`, run the relevant subprocess auto-instrument tests.

## Done When

- the provider package contains only the integration, patcher, tracing, export, and test changes required by the task
- provider behavior tests use VCR unless recorded traffic cannot cover the behavior
- cassette changes are present only when provider behavior changed intentionally
- the narrowest affected provider session passes
- `cd py && make test-core` has been run if shared integration code changed
- `cd py && make lint` has been run before handoff

## Common Pitfalls

- Leaving provider behavior in `BaseIntegration` instead of the provider package.
- Combining multiple unrelated patch targets into one patcher.
- Forgetting async or streaming coverage.
- Defaulting to mocks or fakes when the provider flow can be covered with VCR.
- Moving tests but not moving their cassettes.
- Adding patcher selection without tests for enabled and disabled cases.
- Editing `auto_instrument()` in a way that implies a registry exists when it does not.
1 change: 1 addition & 0 deletions AGENTS.md
@@ -144,3 +144,4 @@ Avoid editing `py/src/braintrust/version.py` while also running build commands.
- Reuse existing fixtures and cassette patterns.
- If a change affects examples or integrations, update the nearest example or focused test.
- For CLI/devserver changes, consider whether wheel-mode behavior also needs coverage.
- Do **not** add `from __future__ import annotations` unless it is absolutely required (e.g., a genuine forward-reference that cannot be resolved any other way). This import changes annotation evaluation semantics at runtime and can silently break `get_type_hints()`, Pydantic models, and other runtime introspection. Prefer quoted string literals (`"MyClass"`) or `TYPE_CHECKING` guards for forward references instead.
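
A small illustration of the quoted-string alternative mentioned in the guideline above. The `Node` class is invented for this example; it shows that a quoted forward reference still resolves through `get_type_hints()` without the `__future__` import.

```python
from typing import Optional, get_type_hints


class Node:
    # "Optional[Node]" is quoted because Node is not fully defined yet.
    def __init__(self, value: int, parent: "Optional[Node]" = None) -> None:
        self.value = value
        self.parent = parent


# Only the quoted annotation is stored as a string; get_type_hints resolves it.
hints = get_type_hints(Node.__init__)
```

Here only the one annotation that needs deferral is a string, while `value: int` stays a real type object at runtime.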
10 changes: 9 additions & 1 deletion py/noxfile.py
@@ -40,6 +40,9 @@ def _pinned_python_version():

 SRC_DIR = "braintrust"
 WRAPPER_DIR = "braintrust/wrappers"
+INTEGRATION_DIR = "braintrust/integrations"
+INTEGRATION_AUTO_TEST_DIR = "braintrust/integrations/auto_test_scripts"
+ANTHROPIC_INTEGRATION_DIR = "braintrust/integrations/anthropic"
 CONTRIB_DIR = "braintrust/contrib"
 DEVSERVER_DIR = "braintrust/devserver"

@@ -176,6 +179,7 @@ def test_anthropic(session, version):
     _install_test_deps(session)
     _install(session, "anthropic", version)
     _run_tests(session, f"{WRAPPER_DIR}/test_anthropic.py")
+    _run_tests(session, f"{INTEGRATION_DIR}/anthropic/test_anthropic.py")
     _run_core_tests(session)


@@ -400,7 +404,11 @@ def _get_braintrust_wheel():

 def _run_core_tests(session):
     """Run all tests which don't require optional dependencies."""
-    _run_tests(session, SRC_DIR, ignore_paths=[WRAPPER_DIR, CONTRIB_DIR, DEVSERVER_DIR])
+    _run_tests(
+        session,
+        SRC_DIR,
+        ignore_paths=[WRAPPER_DIR, INTEGRATION_AUTO_TEST_DIR, ANTHROPIC_INTEGRATION_DIR, CONTRIB_DIR, DEVSERVER_DIR],
+    )


 def _run_tests(session, test_path, ignore_path="", ignore_paths=None, env=None):
1 change: 1 addition & 0 deletions py/setup.py
@@ -19,6 +19,7 @@
 "tqdm",
 "exceptiongroup>=1.2.0",
 "jsonschema",
+"packaging",
 "python-dotenv",
 "sseclient-py",
 "python-slugify",
7 changes: 4 additions & 3 deletions py/src/braintrust/__init__.py
@@ -63,13 +63,17 @@ def is_equal(expected, output):

 from .audit import *
 from .auto import (
+    IntegrationPatchConfig, # noqa: F401 # type: ignore[reportUnusedImport]
     auto_instrument, # noqa: F401 # type: ignore[reportUnusedImport]
 )
 from .framework import *
 from .framework2 import *
 from .functions.invoke import *
 from .functions.stream import *
 from .generated_types import *
+from .integrations.anthropic import (
+    wrap_anthropic, # noqa: F401 # type: ignore[reportUnusedImport]
+)
 from .logger import *
 from .logger import (
     _internal_get_global_state, # noqa: F401 # type: ignore[reportUnusedImport]
@@ -89,9 +93,6 @@ def is_equal(expected, output):
     BT_IS_ASYNC_ATTRIBUTE, # noqa: F401 # type: ignore[reportUnusedImport]
     MarkAsyncWrapper, # noqa: F401 # type: ignore[reportUnusedImport]
 )
-from .wrappers.anthropic import (
-    wrap_anthropic, # noqa: F401 # type: ignore[reportUnusedImport]
-)
 from .wrappers.litellm import (
     wrap_litellm, # noqa: F401 # type: ignore[reportUnusedImport]
 )
66 changes: 50 additions & 16 deletions py/src/braintrust/auto.py
@@ -9,10 +9,13 @@
 import logging
 from contextlib import contextmanager

+from braintrust.integrations import AnthropicIntegration, IntegrationPatchConfig
+

 __all__ = ["auto_instrument"]

 logger = logging.getLogger(__name__)
+InstrumentOption = bool | IntegrationPatchConfig


@contextmanager
@@ -29,7 +32,7 @@ def _try_patch():
 def auto_instrument(
     *,
     openai: bool = True,
-    anthropic: bool = True,
+    anthropic: InstrumentOption = True,
     litellm: bool = True,
     pydantic_ai: bool = True,
     google_genai: bool = True,
@@ -49,7 +52,8 @@ def auto_instrument(

     Args:
         openai: Enable OpenAI instrumentation (default: True)
-        anthropic: Enable Anthropic instrumentation (default: True)
+        anthropic: Enable Anthropic instrumentation (default: True), or pass an
+            IntegrationPatchConfig to select Anthropic patchers explicitly.
         litellm: Enable LiteLLM instrumentation (default: True)
         pydantic_ai: Enable Pydantic AI instrumentation (default: True)
         google_genai: Enable Google GenAI instrumentation (default: True)
@@ -104,23 +108,33 @@ def auto_instrument(
     """
     results = {}

-    if openai:
+    openai_enabled = _normalize_bool_option("openai", openai)
+    anthropic_enabled, anthropic_config = _normalize_anthropic_option(anthropic)
+    litellm_enabled = _normalize_bool_option("litellm", litellm)
+    pydantic_ai_enabled = _normalize_bool_option("pydantic_ai", pydantic_ai)
+    google_genai_enabled = _normalize_bool_option("google_genai", google_genai)
+    agno_enabled = _normalize_bool_option("agno", agno)
+    claude_agent_sdk_enabled = _normalize_bool_option("claude_agent_sdk", claude_agent_sdk)
+    dspy_enabled = _normalize_bool_option("dspy", dspy)
+    adk_enabled = _normalize_bool_option("adk", adk)
+
+    if openai_enabled:
         results["openai"] = _instrument_openai()
-    if anthropic:
-        results["anthropic"] = _instrument_anthropic()
-    if litellm:
+    if anthropic_enabled:
+        results["anthropic"] = _instrument_integration(AnthropicIntegration, patch_config=anthropic_config)
+    if litellm_enabled:
         results["litellm"] = _instrument_litellm()
-    if pydantic_ai:
+    if pydantic_ai_enabled:
         results["pydantic_ai"] = _instrument_pydantic_ai()
-    if google_genai:
+    if google_genai_enabled:
         results["google_genai"] = _instrument_google_genai()
-    if agno:
+    if agno_enabled:
         results["agno"] = _instrument_agno()
-    if claude_agent_sdk:
+    if claude_agent_sdk_enabled:
         results["claude_agent_sdk"] = _instrument_claude_agent_sdk()
-    if dspy:
+    if dspy_enabled:
         results["dspy"] = _instrument_dspy()
-    if adk:
+    if adk_enabled:
         results["adk"] = _instrument_adk()

     return results
@@ -134,14 +148,34 @@ def _instrument_openai() -> bool:
     return False


-def _instrument_anthropic() -> bool:
+def _instrument_integration(integration, *, patch_config: IntegrationPatchConfig | None = None) -> bool:
     with _try_patch():
-        from braintrust.wrappers.anthropic import patch_anthropic
-
-        return patch_anthropic()
+        return integration.setup(
+            enabled_patchers=patch_config.enabled_patchers if patch_config is not None else None,
+            disabled_patchers=patch_config.disabled_patchers if patch_config is not None else None,
+        )
     return False


+def _normalize_bool_option(name: str, option: bool) -> bool:
+    if isinstance(option, bool):
+        return option
+
+    raise TypeError(f"auto_instrument option {name!r} must be a bool, got {type(option).__name__}")
+
+
+def _normalize_anthropic_option(option: InstrumentOption) -> tuple[bool, IntegrationPatchConfig | None]:
+    if isinstance(option, bool):
+        return option, None
+
+    if isinstance(option, IntegrationPatchConfig):
+        return True, option
+
+    raise TypeError(
+        f"auto_instrument option 'anthropic' must be a bool or IntegrationPatchConfig, got {type(option).__name__}"
+    )
+
+
 def _instrument_litellm() -> bool:
     with _try_patch():
         from braintrust.wrappers.litellm import patch_litellm
5 changes: 5 additions & 0 deletions py/src/braintrust/integrations/__init__.py
@@ -0,0 +1,5 @@
from .anthropic import AnthropicIntegration
from .base import IntegrationPatchConfig


__all__ = ["AnthropicIntegration", "IntegrationPatchConfig"]