This is the official Ruby SDK for Braintrust, for tracing and evaluating your AI applications.
NOTE: This SDK is currently in BETA status and APIs may change between minor versions.
Add to your Gemfile:

```ruby
gem "braintrust", require: "braintrust/setup"
```

Set your API key and install:

```shell
export BRAINTRUST_API_KEY="your-api-key"
bundle install
```

Your LLM calls are now automatically traced. View them at braintrust.dev.
The SDK also offers additional setup options for a variety of applications.
| Setup option | What it looks like | When to use |
|---|---|---|
| Setup script | `require 'braintrust/setup'` | You'd like to automatically set up the SDK at load time. |
| CLI command | `braintrust exec -- ruby app.rb` | You prefer not to modify the application source code. |
| `Braintrust.init` | Call `Braintrust.init` in your code | You need to control when setup occurs, or require customized configuration. |
See our examples for more detail.
For most applications, we recommend adding `require: "braintrust/setup"` to your Gemfile or an initializer file in your Ruby application to automatically set up the SDK. This automatically applies instrumentation to all available LLM libraries.
You can use environment variables to configure behavior.
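For example, configuration can be supplied entirely through the environment before your application boots. The values below are illustrative; the full list of recognized variables is documented later in this README:

```shell
# Required
export BRAINTRUST_API_KEY="your-api-key"

# Optional (illustrative values)
export BRAINTRUST_DEFAULT_PROJECT="my-project"
export BRAINTRUST_INSTRUMENT_ONLY="openai"
export BRAINTRUST_DEBUG="true"
```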
You can use this CLI command to instrument any Ruby application without modifying the source code.
First, make sure the gem is installed on the system:

```shell
gem install braintrust
```

Then wrap the startup command of any Ruby application:

```shell
braintrust exec -- ruby app.rb
braintrust exec -- bundle exec rails server
braintrust exec --only openai -- ruby app.rb
```

You can use environment variables to configure behavior.
NOTE: Installing a package at the system level does not guarantee compatibility with all Ruby applications on that system; conflicts with dependencies can arise.
For stronger assurance of compatibility, we recommend either:
- Installing via the application's `Gemfile` and `bundle install` when possible.
- For OCI/Docker deployments, running `gem install braintrust` when building images in your CI/CD pipeline (and verifying they function correctly).
For more control over when auto-instrumentation is applied:

```ruby
require "braintrust"

Braintrust.init
```

Options:
| Option | Default | Description |
|---|---|---|
| `api_key` | `ENV['BRAINTRUST_API_KEY']` | API key |
| `auto_instrument` | `true` | `true`, `false`, or a Hash with `:only`/`:except` keys to filter integrations |
| `blocking_login` | `false` | Block until login completes (async login when `false`) |
| `default_project` | `ENV['BRAINTRUST_DEFAULT_PROJECT']` | Default project for spans |
| `enable_tracing` | `true` | Enable OpenTelemetry tracing |
| `filter_ai_spans` | `ENV['BRAINTRUST_OTEL_FILTER_AI_SPANS']` | Only export AI-related spans |
| `org_name` | `ENV['BRAINTRUST_ORG_NAME']` | Organization name |
| `set_global` | `true` | Set as global state; set to `false` for isolated instances |
Example with options:

```ruby
Braintrust.init(
  default_project: "my-project",
  auto_instrument: { only: [:openai] }
)
```

The following environment variables are recognized:

| Variable | Description |
|---|---|
| `BRAINTRUST_API_KEY` | Required. Your Braintrust API key |
| `BRAINTRUST_API_URL` | Braintrust API URL (default: `https://api.braintrust.dev`) |
| `BRAINTRUST_APP_URL` | Braintrust app URL (default: `https://www.braintrust.dev`) |
| `BRAINTRUST_AUTO_INSTRUMENT` | Set to `false` to disable auto-instrumentation |
| `BRAINTRUST_DEBUG` | Set to `true` to enable debug logging |
| `BRAINTRUST_DEFAULT_PROJECT` | Default project for spans |
| `BRAINTRUST_FLUSH_ON_EXIT` | Set to `false` to disable automatic span flushing on program exit |
| `BRAINTRUST_INSTRUMENT_EXCEPT` | Comma-separated list of integrations to skip |
| `BRAINTRUST_INSTRUMENT_ONLY` | Comma-separated list of integrations to enable (e.g., `openai,anthropic`) |
| `BRAINTRUST_ORG_NAME` | Organization name |
| `BRAINTRUST_OTEL_FILTER_AI_SPANS` | Set to `true` to only export AI-related spans |
The SDK automatically instruments these LLM libraries:
| Provider | Gem | Versions | Integration Name | Examples |
|---|---|---|---|---|
| Anthropic | `anthropic` | >= 0.3.0 | `:anthropic` | Link |
| OpenAI | `openai` | >= 0.1.0 | `:openai` | Link |
| | `ruby-openai` | >= 7.0.0 | `:ruby_openai` | Link |
| Multiple | `ruby_llm` | >= 1.8.0 | `:ruby_llm` | Link |
For fine-grained control, disable auto-instrumentation and instrument specific clients:
```ruby
require "braintrust"
require "openai"

Braintrust.init(auto_instrument: false) # Or BRAINTRUST_AUTO_INSTRUMENT=false

# Instrument all OpenAI clients
Braintrust.instrument!(:openai)

# OR instrument a single client
client = OpenAI::Client.new
Braintrust.instrument!(:openai, target: client)
```

Wrap business logic in spans to see it in your traces:
```ruby
tracer = OpenTelemetry.tracer_provider.tracer("my-app")

tracer.in_span("process-request") do |span|
  span.set_attribute("user.id", user_id)

  # LLM calls inside here are automatically nested under this span
  response = client.chat.completions.create(...)
end
```

Log binary data (images, PDFs, audio) in your traces:
```ruby
require "braintrust/trace/attachment"

att = Braintrust::Trace::Attachment.from_file("image/png", "./photo.png")

# Use in messages (OpenAI/Anthropic format)
messages = [
  {
    role: "user",
    content: [
      {type: "text", text: "What's in this image?"},
      att.to_h
    ]
  }
]

# Log to span
span.set_attribute("braintrust.input_json", JSON.generate(messages))
```

Create attachments from various sources:

```ruby
Braintrust::Trace::Attachment.from_bytes("image/jpeg", image_data)
Braintrust::Trace::Attachment.from_file("application/pdf", "./doc.pdf")
Braintrust::Trace::Attachment.from_url("https://example.com/image.png")
```

See example: trace_attachments.rb
Get a permalink to any span:
```ruby
tracer = OpenTelemetry.tracer_provider.tracer("my-app")

tracer.in_span("my-operation") do |span|
  # your code here
  puts "View trace at: #{Braintrust::Trace.permalink(span)}"
end
```

Run evaluations against your AI systems:
```ruby
require "braintrust"

Braintrust.init

Braintrust::Eval.run(
  project: "my-project",
  experiment: "classifier-v1",
  cases: [
    {input: "apple", expected: "fruit"},
    {input: "carrot", expected: "vegetable"}
  ],
  task: ->(input:) { classify(input) },
  scorers: [
    ->(expected:, output:) { output == expected ? 1.0 : 0.0 }
  ]
)
```

See eval.rb for a full example.
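Because inline scorers are plain callables with keyword arguments, you can sanity-check one in isolation before passing it to `Braintrust::Eval.run`. A minimal sketch with no SDK dependency:

```ruby
# An inline scorer is just a lambda taking keyword arguments.
exact_match = ->(expected:, output:) { output == expected ? 1.0 : 0.0 }

# Call it directly, outside of Eval.run:
puts exact_match.call(expected: "fruit", output: "fruit")     # => 1.0
puts exact_match.call(expected: "fruit", output: "vegetable") # => 0.0
```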
Use test cases from a Braintrust dataset:
```ruby
Braintrust::Eval.run(
  project: "my-project",
  dataset: "my-dataset",
  task: ->(input:) { classify(input) },
  scorers: [...]
)
```

Or define test cases inline with metadata and tags:

```ruby
Braintrust::Eval.run(
  project: "my-project",
  experiment: "classifier-v1",
  cases: [
    {input: "apple", expected: "fruit", tags: ["produce"], metadata: {difficulty: "easy"}},
    {input: "salmon", expected: "protein", tags: ["seafood"], metadata: {difficulty: "medium"}}
  ],
  task: ->(input:) { classify(input) },
  scorers: [...]
)
```

See dataset.rb for a full example.
Use scoring functions defined in Braintrust:
```ruby
Braintrust::Eval.run(
  project: "my-project",
  cases: [...],
  task: ->(input:) { ... },
  scorers: ["accuracy-scorer"]
)
```

Or define scorers inline with `Scorer.new`:

```ruby
Braintrust::Eval.run(
  project: "my-project",
  cases: [...],
  task: ->(input:) { ... },
  scorers: [
    Braintrust::Scorer.new("exact_match") do |expected:, output:|
      output == expected ? 1.0 : 0.0
    end
  ]
)
```

See remote_functions.rb for a full example.
Scorers can return a Hash with `:score` and `:metadata` to attach structured context to the score. The metadata is logged on the scorer's span and is visible in the Braintrust UI for debugging and filtering:
```ruby
Braintrust::Scorer.new("translation") do |expected:, output:|
  common_words = output.downcase.split & expected.downcase.split
  overlap = common_words.size.to_f / expected.split.size
  {
    score: overlap,
    metadata: {word_overlap: common_words.size, missing_words: expected.downcase.split - output.downcase.split}
  }
end
```

See scorer_metadata.rb for a full example.
When several scores can be computed together (e.g., in one LLM call), you can return an Array of score Hashes instead of a single value. Each metric appears as a separate score column in the Braintrust UI:
```ruby
Braintrust::Scorer.new("summary_quality") do |output:, expected:|
  words = output.downcase.split
  key_terms = expected[:key_terms]
  covered = key_terms.count { |t| words.include?(t) }
  [
    {name: "coverage", score: covered.to_f / key_terms.size, metadata: {missing: key_terms - words}},
    {name: "conciseness", score: words.size <= expected[:max_words] ? 1.0 : 0.0}
  ]
end
```

`name` and `score` are required; `metadata` is optional.
See multi_score.rb for a full example.
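To see the array-of-hashes contract concretely, the same logic can be exercised as a bare lambda in plain Ruby (no SDK required; the inputs here are made up for illustration):

```ruby
# The multi-score body as a plain lambda, called outside of Eval.run.
summary_quality = lambda do |output:, expected:|
  words = output.downcase.split
  key_terms = expected[:key_terms]
  covered = key_terms.count { |t| words.include?(t) }
  [
    {name: "coverage", score: covered.to_f / key_terms.size, metadata: {missing: key_terms - words}},
    {name: "conciseness", score: words.size <= expected[:max_words] ? 1.0 : 0.0}
  ]
end

scores = summary_quality.call(
  output: "Solar panels convert sunlight into power",
  expected: {key_terms: ["solar", "sunlight", "grid"], max_words: 10}
)
# coverage is 2/3 (only "grid" is missing); conciseness is 1.0 (6 words <= 10)
```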
Scorers can access the full evaluation trace (all spans generated by the task) by declaring a trace: keyword parameter. This is useful for inspecting intermediate LLM calls, validating tool usage, or checking the message thread:
```ruby
Braintrust::Eval.run(
  project: "my-project",
  cases: [{input: "What is 2+2?", expected: "4"}],
  task: Braintrust::Task.new { |input:| my_llm_pipeline(input) },
  scorers: [
    # Access the full trace to inspect LLM spans
    Braintrust::Scorer.new("uses_system_prompt") do |output:, trace:|
      messages = trace.thread # reconstructed message thread from LLM spans
      messages.any? { |m| m["role"] == "system" } ? 1.0 : 0.0
    end,

    # Filter spans by type
    Braintrust::Scorer.new("single_llm_call") do |output:, trace:|
      trace.spans(span_type: "llm").length == 1 ? 1.0 : 0.0
    end,

    # Scorers without trace: still work; the parameter is filtered out automatically
    Braintrust::Scorer.new("exact_match") do |output:, expected:|
      output == expected ? 1.0 : 0.0
    end
  ]
)
```

See trace_scoring.rb for a full example.
Run evaluations from the Braintrust web UI against code in your own application.
Define evaluators, pass them to the dev server, and start serving:
```ruby
# eval_server.ru
require "braintrust/eval"
require "braintrust/server"

# Define evaluators. These can reference your application code (models, services, etc.)
food_classifier = Braintrust::Eval::Evaluator.new(
  task: ->(input:) { FoodClassifier.classify(input) },
  scorers: [
    Braintrust::Scorer.new("exact_match") { |expected:, output:| output == expected ? 1.0 : 0.0 }
  ]
)

# Initialize Braintrust (requires BRAINTRUST_API_KEY)
Braintrust.init(blocking_login: true)

# Start the server
run Braintrust::Server::Rack.app(
  evaluators: {
    "food-classifier" => food_classifier
  }
)
```

Add your Rack server to your Gemfile:

```ruby
gem "rack"
gem "puma" # recommended
```

Then start the server:

```shell
bundle exec rackup eval_server.ru -p 8300 -o 0.0.0.0
```

See example: server/eval.ru
Custom evaluators
Evaluators can also be defined as subclasses:
```ruby
class FoodClassifier < Braintrust::Eval::Evaluator
  def task
    ->(input:) { classify(input) }
  end

  def scorers
    [Braintrust::Scorer.new("exact_match") { |expected:, output:| output == expected ? 1.0 : 0.0 }]
  end
end
```

Use the Rails engine when your evaluators live inside an existing Rails app and you want to mount the Braintrust eval server into that application.
Define each evaluator in its own file, for example under app/evaluators/:
```ruby
# app/evaluators/food_classifier.rb
class FoodClassifier < Braintrust::Eval::Evaluator
  def task
    ->(input:) { classify(input) }
  end

  def scorers
    [Braintrust::Scorer.new("exact_match") { |expected:, output:| output == expected ? 1.0 : 0.0 }]
  end
end
```

Then generate the Braintrust initializer:

```shell
bin/rails generate braintrust:eval_server
```

Mount the engine in your routes:

```ruby
# config/routes.rb
Rails.application.routes.draw do
  mount Braintrust::Contrib::Rails::Engine, at: "/braintrust"
end
```

The generator writes config/initializers/braintrust_server.rb, where you can review or customize the slug-to-evaluator mapping it discovers from app/evaluators/**/*.rb and evaluators/**/*.rb.
See example: contrib/rails/eval.rb
Developing locally
If you want to skip authentication on incoming eval requests while developing locally:
- For Rack: pass `auth: :none` to `Braintrust::Server::Rack.app(...)`
- For Rails: set `config.auth = :none` in `config/initializers/braintrust_server.rb`
NOTE: Setting `:none` disables authentication on incoming requests to your server; executing evals still requires a `BRAINTRUST_API_KEY` to fetch resources.
Supported web servers
The dev server requires the rack gem and a Rack-compatible web server.
| Server | Version Supported | Notes |
|---|---|---|
| Puma | 6.x | |
| Falcon | 0.x | |
| Passenger | 6.x | |
| WEBrick | Not supported | Does not support server-sent events. |
See examples: server/eval.ru,
First, verify there are no errors in your logs after running with BRAINTRUST_DEBUG=true set.
Your application needs the following for this to work:

```ruby
require 'bundler/setup'
Bundler.require
```

This is present by default in Rails applications, but may not be in Sinatra, Rack, or other applications.
Alternatively, you can add `require 'braintrust/setup'` to your application initialization files.
See CONTRIBUTING.md for development setup and guidelines.
Apache License 2.0 - see LICENSE.