🚀 New in v0.1.0: Enhanced tracing, custom evaluators, and production monitoring features.
LangSmith is a platform for developing, monitoring, and improving LLM applications in production. It provides the tooling you need to debug, evaluate, and observe your applications with confidence.
- Debugging: Trace and visualize complex LLM calls and chains
- Evaluation: Measure and improve model performance with custom metrics
- Monitoring: Track production performance and get alerts for issues
- Collaboration: Share and compare results across your team
- Optimization: Identify and fix performance bottlenecks
pip install langsmith

npm install @langchain/langgraph @langchain/core

import os
from langchain.smith import RunEvalConfig, run_on_dataset
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain
# Set your API keys
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["LANGCHAIN_API_KEY"] = "your-langchain-api-key"
os.environ["LANGCHAIN_TRACING_V2"] = "true"
# Initialize your model
llm = ChatOpenAI(temperature=0.7, model_name="gpt-4")

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { LLMChain } from "langchain/chains";
// Set your API keys
process.env.OPENAI_API_KEY = "your-openai-key";
process.env.LANGCHAIN_API_KEY = "your-langchain-api-key";
process.env.LANGCHAIN_TRACING_V2 = "true";
// Initialize your model
const llm = new ChatOpenAI({
temperature: 0.7,
modelName: "gpt-4",
});

# Define a prompt template
prompt = ChatPromptTemplate.from_template(
"You are a helpful assistant. Answer the following question: {question}"
)
# Create a chain
chain = LLMChain(llm=llm, prompt=prompt)
# Test the chain
response = chain.run(question="What is LangSmith?")
print(response)

// Define a prompt template
const prompt = ChatPromptTemplate.fromTemplate(
"You are a helpful assistant. Answer the following question: {question}"
);
// Create a chain
const chain = new LLMChain({
llm,
prompt,
});
// Test the chain
const response = await chain.call({
question: "What is LangSmith?",
});
console.log(response);

# Enable tracing (already set in environment variables above)
# All subsequent chain runs will be traced automatically
from langchain.callbacks.manager import tracing_v2_enabled

# Run your chain with tracing
with tracing_v2_enabled():
    result = chain.run(question="How does LangSmith help with LLM development?")
    print(result)  # view the full trace in the LangSmith UI at https://smith.langchain.com

import { CallbackManager } from "@langchain/core/callbacks/manager";
// Enable tracing (already set in environment variables)
// All subsequent chain runs will be traced automatically
// Run your chain with tracing
const result = await chain.call(
{ question: "How does LangSmith help with LLM development?" },
{
callbacks: CallbackManager.fromHandlers({
handleChainEnd: (outputs, runId, parentRunId) => {
console.log(`Trace URL: https://smith.langchain.com/o/${process.env.LANGCHAIN_PROJECT}/runs/${runId}`);
},
}),
}
);
console.log(result);

from langsmith import Client
client = Client()
dataset_name = "example-qa-dataset"
try:
    dataset = client.read_dataset(dataset_name=dataset_name)
except Exception:
    # Create a new dataset if it doesn't exist
    dataset = client.create_dataset(
        dataset_name=dataset_name,
        description="Example QA dataset for testing"
    )
# Add examples
examples = [
({"question": "What is LangSmith?"}, {"answer": "LangSmith is a platform for developing and monitoring LLM applications."}),
({"question": "How does tracing work?"}, {"answer": "Tracing captures the execution of LLM calls and chains for debugging and analysis."}),
]
client.create_examples(
inputs=[e[0] for e in examples],
outputs=[e[1] for e in examples],
dataset_id=dataset.id
)

import { Client } from "langsmith";
const client = new Client();
const datasetName = "example-qa-dataset";
async function setupDataset() {
let dataset;
try {
dataset = await client.readDataset({ datasetName });
} catch (e) {
// Create a new dataset if it doesn't exist
dataset = await client.createDataset(datasetName, {
description: "Example QA dataset for testing",
});
// Add examples
const examples = [
[
{ question: "What is LangSmith?" },
{ answer: "LangSmith is a platform for developing and monitoring LLM applications." },
],
[
{ question: "How does tracing work?" },
{ answer: "Tracing captures the execution of LLM calls and chains for debugging and analysis." },
],
];
await client.createExamples({
inputs: examples.map(([input]) => input),
outputs: examples.map(([_, output]) => output),
datasetId: dataset.id,
});
}
return dataset;
}
// Run the setup
setupDataset().then(dataset => {
console.log(`Dataset ready: ${dataset.name} (${dataset.id})`);
});
### 6. Run Evaluation
```python
# Define evaluation criteria
eval_config = RunEvalConfig(
evaluators=[
"qa", # Built-in QA evaluator
{
"criteria": {
"helpfulness": "How helpful is the response?",
"relevance": "How relevant is the response to the question?",
"conciseness": "How concise is the response?",
}
},
        # Custom evaluator functions (plain callables taking a run and an
        # example) are passed via custom_evaluators -- see the Custom
        # Evaluators section below
    ],
    # LLM used to grade the criteria-based evaluators
    eval_llm=ChatOpenAI(temperature=0, model_name="gpt-4"),
)
# Run evaluation
results = run_on_dataset(
dataset_name=dataset_name,
llm_or_chain_factory=lambda: chain, # Your chain
evaluation=eval_config,
verbose=True,
project_name="my-first-eval"
)
print(f"Evaluation complete. Results: {results}")
```
Tracing allows you to visualize and debug the execution of your LLM applications.
from langsmith import traceable
from langchain.callbacks.manager import tracing_v2_enabled
@traceable
def process_query(question: str) -> str:
    """Process a user question and return a response."""
    # Your LLM chain or processing logic here
    return chain.run(question=question)

# Enable tracing for this block
with tracing_v2_enabled(project_name="my-llm-app"):
    result = process_query("What is LangSmith?")
    print(result)  # view the full trace in the LangSmith UI at https://smith.langchain.com

import { traceable } from "langsmith";

// traceable wraps a function so each call is logged as a LangSmith run;
// project_name routes traces to the given project
const processQuery = traceable(async (question: string): Promise<string> => {
  // Your LLM chain or processing logic here
  const result = await chain.call({ question });
  return result.text;
}, { name: "process_query", project_name: "my-llm-app" });

// Run the traced function
(async () => {
  const result = await processQuery("What is LangSmith?");
  console.log(result);
})();

from langsmith import traceable
from langchain.callbacks.manager import tracing_v2_enabled
def retrieve_context(question: str) -> str:
    """Retrieve relevant context for a question."""
    # This would typically call a vector store or other data source
    return "LangSmith is a platform for developing and monitoring LLM applications."

@traceable
def generate_response(question: str, context: str) -> str:
    """Generate a response using the provided context."""
    prompt = f"""Answer the question based on the following context:
{context}
Question: {question}"""
    return llm.predict(prompt)

@traceable
def answer_question(question: str) -> str:
    """End-to-end question answering."""
    context = retrieve_context(question)
    return generate_response(question, context)

# All nested calls will be traced
with tracing_v2_enabled(project_name="nested-tracing"):
    response = answer_question("What is LangSmith?")
    print(response)

import { traceable } from "langsmith";

async function retrieveContext(question: string): Promise<string> {
  // This would typically call a vector store or other data source
  return "LangSmith is a platform for developing and monitoring LLM applications.";
}

const generateResponse = traceable(async (question: string, context: string): Promise<string> => {
  const prompt = `Answer the question based on the following context:
${context}
Question: ${question}`;
  const result = await llm.invoke(prompt);
  return result.content as string;
}, { name: "generate_response" });

const answerQuestion = traceable(async (question: string): Promise<string> => {
  const context = await retrieveContext(question);
  return generateResponse(question, context);
}, { name: "answer_question", project_name: "nested-tracing" });

// All nested calls appear under the parent run in the trace tree
(async () => {
  const response = await answerQuestion("What is LangSmith?");
  console.log(response);
})();

LangSmith provides powerful tools for evaluating your LLM applications.
from typing import Dict, Any
from langchain.evaluation import load_evaluator
def custom_evaluator(run, example) -> Dict[str, Any]:
    """Custom evaluator that checks response length and content."""
    prediction = run.outputs["output"]

    # Initialize evaluators
    fact_evaluator = load_evaluator("criteria", criteria="factuality")

    # Run evaluations
    fact_result = fact_evaluator.evaluate_strings(
        prediction=prediction,
        input=example.inputs["question"]
    )

    # Calculate custom metrics
    word_count = len(prediction.split())

    return {
        "fact_score": fact_result["score"],
        "word_count": word_count,
        "is_too_short": word_count < 5,
        "feedback": fact_result["reasoning"]
    }

# Use the custom evaluator
eval_config = RunEvalConfig(
    custom_evaluators=[custom_evaluator],
    eval_llm=ChatOpenAI(temperature=0, model="gpt-4")
)

import { RunEvalConfig, loadEvaluator } from "langchain/evaluation";
import { ChatOpenAI } from "@langchain/openai";
interface EvaluationResult {
fact_score: number;
word_count: number;
is_too_short: boolean;
feedback: string;
}
async function customEvaluator(run: any, example: any): Promise<EvaluationResult> {
const prediction = run.outputs.output;
// Initialize evaluators
const factEvaluator = await loadEvaluator("criteria", {
criteria: "factuality",
llm: new ChatOpenAI({ temperature: 0 })
});
// Run evaluations
const factResult = await factEvaluator.evaluateStrings({
prediction,
input: example.inputs.question
});
// Calculate custom metrics
const wordCount = prediction.split(/\s+/).length;
return {
fact_score: factResult.score,
word_count: wordCount,
is_too_short: wordCount < 5,
feedback: factResult.reasoning || ""
};
}
// Use the custom evaluator
const evalConfig: RunEvalConfig = {
customEvaluators: [customEvaluator],
evalLLM: new ChatOpenAI({
temperature: 0,
modelName: "gpt-4"
})
};

from langsmith import Client
from langchain.callbacks import get_openai_callback
client = Client()
# Record human feedback
def record_feedback(run_id: str, score: int, comment: str = ""):
    client.create_feedback(
        run_id,
        key="human_rating",
        score=score,  # 1-5 scale
        comment=comment,
    )
# Example usage
with get_openai_callback() as cb:
    result = chain.run(question="What is LangSmith?")
    print(f"Generated response: {result}")
    print(f"Tokens used: {cb.total_tokens}")

# Look up the most recent run so a human can review it and attach feedback
runs = client.list_runs(project_name="my-llm-app", limit=1)
latest_run = next(iter(runs))
print(f"Run ID for review: {latest_run.id}")

# Simulate human feedback (in a real app, this would come from a UI)
record_feedback(
    run_id=str(latest_run.id),
    score=4,
    comment="Good response, but could be more detailed."
)

import { Client } from "langsmith";
const client = new Client();

// Record human feedback
async function recordFeedback(runId: string, score: number, comment: string = ""): Promise<void> {
  await client.createFeedback(runId, "human_rating", {
    score, // 1-5 scale
    comment,
  });
}

// Example usage
(async () => {
  const result = await chain.call({ question: "What is LangSmith?" });
  console.log(`Generated response: ${result.text}`);

  // Look up the most recent run so a human can review it and attach feedback
  for await (const run of client.listRuns({ projectName: "my-llm-app", limit: 1 })) {
    console.log(`Run ID for review: ${run.id}`);
    // Simulate human feedback (in a real app, this would come from a UI)
    await recordFeedback(run.id, 4, "Good response, but could be more detailed.");
  }
})();

Monitor your LLM applications in production with real-time metrics and alerts.
from langsmith import Client
from datetime import datetime, timedelta
client = Client()
# Define metrics to track
metrics = [
"latency",
"token_usage",
"feedback.human_rating",
"evaluation.fact_score"
]
# Get metrics for the last 24 hours
end_time = datetime.utcnow()
start_time = end_time - timedelta(days=1)
metrics_data = client.read_metrics(
project_name="my-llm-app",
metrics=metrics,
start_time=start_time,
end_time=end_time,
group_by=["model", "prompt_version"]
)
# Analyze metrics
print(f"Average latency: {metrics_data['latency'].mean()}s")
print(f"Total tokens used: {metrics_data['token_usage'].sum()}")

import { Client } from "langsmith";
import { subDays } from "date-fns";
const client = new Client();
// Define metrics to track
const metrics = [
"latency",
"token_usage",
"feedback.human_rating",
"evaluation.fact_score"
] as const;
// Get metrics for the last 24 hours
const endTime = new Date();
const startTime = subDays(endTime, 1);
async function getMetrics() {
const metricsData = await client.readMetrics({
projectName: "my-llm-app",
metrics,
startTime,
endTime,
groupBy: ["model", "prompt_version"]
});
// Analyze metrics
const avgLatency = metricsData.latency.reduce((a, b) => a + b, 0) / metricsData.latency.length;
const totalTokens = metricsData.token_usage.reduce((a, b) => a + b, 0);
console.log(`Average latency: ${avgLatency.toFixed(2)}s`);
console.log(`Total tokens used: ${totalTokens}`);
return metricsData;
}
getMetrics().catch(console.error);

# Create an alert for high latency
alert_config = {
"name": "High Latency Alert",
"description": "Alert when average latency exceeds threshold",
"metric": "latency",
"condition": ">",
"threshold": 5.0, # seconds
"window": "1h", # 1-hour rolling window
"notification_channels": ["email:your-email@example.com"],
"severity": "high"
}
client.create_alert(
project_name="my-llm-app",
**alert_config
)

// Create an alert for high latency
const alertConfig = {
name: "High Latency Alert",
description: "Alert when average latency exceeds threshold",
metric: "latency" as const,
condition: ">" as const,
threshold: 5.0, // seconds
window: "1h", // 1-hour rolling window
notificationChannels: ["email:your-email@example.com"],
severity: "high" as const
};
async function createAlert() {
await client.createAlert({
projectName: "my-llm-app",
...alertConfig
});
console.log("Alert created successfully");
}
createAlert().catch(console.error);

from typing import List, Dict, Any
from langchain.schema import SystemMessage, HumanMessage
class SupportBot:
    def __init__(self):
        self.llm = ChatOpenAI(temperature=0.7, model_name="gpt-4")
        self.context = []

    @traceable
    def generate_response(self, user_input: str) -> str:
        """Generate a response to a user's support request."""
        # Add to conversation history
        self.context.append(HumanMessage(content=user_input))

        # Create prompt with context
        messages = [
            SystemMessage(content="You are a helpful customer support agent."),
            *self.context[-6:]  # Last 3 exchanges (user + assistant)
        ]

        # Generate response
        response = self.llm.invoke(messages)

        # Add to context
        self.context.append(response)
        return response.content

# Initialize bot
bot = SupportBot()

# Example conversation
with tracing_v2_enabled(project_name="support-bot"):
    print(bot.generate_response("I can't log into my account."))
    print(bot.generate_response("I've tried resetting my password but it's not working."))

import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";
import { traceable } from "langsmith";

class SupportBot {
  private llm: ChatOpenAI;
  private context: (HumanMessage | any)[] = [];

  constructor() {
    this.llm = new ChatOpenAI({
      temperature: 0.7,
      modelName: "gpt-4",
    });
  }

  // traceable wraps the handler so each call is logged as a LangSmith run
  generateResponse = traceable(async (userInput: string): Promise<string> => {
    // Add to conversation history
    this.context.push(new HumanMessage(userInput));

    // Create prompt with context
    const messages = [
      new SystemMessage("You are a helpful customer support agent."),
      ...this.context.slice(-6), // Last 3 exchanges (user + assistant)
    ];

    // Generate response
    const response = await this.llm.invoke(messages);

    // Add to context
    this.context.push(response);
    return response.content as string;
  }, { name: "generate_response", project_name: "support-bot" });
}

// Example usage
(async () => {
  const bot = new SupportBot();
  console.log(await bot.generateResponse("I can't log into my account."));
  console.log(await bot.generateResponse("I've tried resetting my password but it's not working."));
})();

from typing import List
from pydantic import BaseModel
class ModerationResult(BaseModel):
    is_safe: bool
    reason: str
    confidence: float
    flagged_categories: List[str]
    explanation: str

class ContentModerator:
    def __init__(self):
        self.llm = ChatOpenAI(temperature=0, model_name="gpt-4")

    @traceable
    def moderate_content(self, text: str) -> ModerationResult:
        """Check if content violates moderation policies."""
        prompt = f"""Analyze the following content for policy violations:
{text}
Check for:
- Hate speech or discrimination
- Harassment or bullying
- Violence or harmful content
- Sexual content
- Personal information
- Spam or scams
Return a JSON object with:
- is_safe (boolean)
- reason (string)
- confidence (float 0-1)
- flagged_categories (list of strings)
- explanation (string)"""
        response = self.llm.predict(prompt)
        return ModerationResult.parse_raw(response)

# Example usage
moderator = ContentModerator()
with tracing_v2_enabled(project_name="content-moderation"):
    result = moderator.moderate_content("This is a test message with no issues.")
    print(f"Is safe: {result.is_safe}")
    print(f"Reason: {result.reason}")

import { ChatOpenAI } from "@langchain/openai";
import { traceable } from "langsmith";

interface ModerationResult {
  is_safe: boolean;
  reason: string;
  confidence: number;
  flagged_categories: string[];
  explanation: string;
}

class ContentModerator {
  private llm: ChatOpenAI;

  constructor() {
    this.llm = new ChatOpenAI({
      temperature: 0,
      modelName: "gpt-4",
    });
  }

  // traceable wraps the handler so each call is logged as a LangSmith run
  moderateContent = traceable(async (text: string): Promise<ModerationResult> => {
    const prompt = `Analyze the following content for policy violations:
${text}
Check for:
- Hate speech or discrimination
- Harassment or bullying
- Violence or harmful content
- Sexual content
- Personal information
- Spam or scams
Return a JSON object with:
- is_safe (boolean)
- reason (string)
- confidence (float 0-1)
- flagged_categories (list of strings)
- explanation (string)`;
    const response = await this.llm.invoke(prompt);
    return JSON.parse(response.content as string);
  }, { name: "moderate_content", project_name: "content-moderation" });
}

// Example usage
(async () => {
  const moderator = new ContentModerator();
  const result = await moderator.moderateContent("This is a test message with no issues.");
  console.log(`Is safe: ${result.is_safe}`);
  console.log(`Reason: ${result.reason}`);
})();

- Use meaningful names: Give your traces and spans descriptive names
- Add metadata: Include relevant context in your traces
- Handle errors: Use try/except blocks and log errors appropriately
- Use spans: Group related operations together
from datetime import datetime
from langsmith import trace

def process_document(document: str) -> dict:
    """Process a document through multiple steps."""
    # trace() opens a span; nested trace() calls become child spans
    with trace("document_processing", metadata={
        "document_length": len(document),
        "processing_start_time": datetime.utcnow().isoformat()
    }):
        try:
            # Step 1: Extract text
            with trace("text_extraction"):
                text = extract_text(document)
            # Step 2: Analyze sentiment
            with trace("sentiment_analysis"):
                sentiment = analyze_sentiment(text)
            # Step 3: Generate summary
            with trace("summarization"):
                summary = generate_summary(text)
            return {
                "text": text,
                "sentiment": sentiment,
                "summary": summary,
                "status": "success"
            }
        except Exception:
            # The exception is recorded on the span when the context
            # manager exits, so just re-raise
            raise

- Define clear criteria: Be specific about what makes a good response
- Use multiple evaluators: Combine automated and human evaluation
- Test edge cases: Include challenging examples in your test sets
- Iterate: Use evaluation results to improve your prompts and models
- Set up alerts: Get notified of issues in real-time
- Track key metrics: Monitor latency, token usage, and quality scores
- A/B test: Compare different model versions or prompts
- Retain data: Keep enough history to identify trends and patterns
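To make the A/B-testing bullet concrete: after evaluating two prompt or model versions on the same dataset, you can compare their per-example scores locally. This is a hedged sketch with illustrative data; `compare_experiments` and the result-dict shape are not LangSmith APIs.

```python
def compare_experiments(baseline, candidate, metric):
    """Compare the mean of `metric` across two sets of per-example results."""
    def mean(results):
        scores = [r[metric] for r in results if metric in r]
        return sum(scores) / len(scores) if scores else 0.0

    baseline_mean = mean(baseline)
    candidate_mean = mean(candidate)
    return {
        "baseline": round(baseline_mean, 4),
        "candidate": round(candidate_mean, 4),
        "delta": round(candidate_mean - baseline_mean, 4),
        "winner": "candidate" if candidate_mean > baseline_mean else "baseline",
    }

# Per-example scores collected from two eval runs (illustrative values)
prompt_v1 = [{"fact_score": 0.7}, {"fact_score": 0.8}]
prompt_v2 = [{"fact_score": 0.9}, {"fact_score": 0.8}]
print(compare_experiments(prompt_v1, prompt_v2, "fact_score"))
# {'baseline': 0.75, 'candidate': 0.85, 'delta': 0.1, 'winner': 'candidate'}
```

In practice the per-example scores would come out of an evaluation run like the one in the Run Evaluation section, with each run tagged by prompt version.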
- Missing Traces
  - Verify `LANGCHAIN_TRACING_V2` is set to "true"
  - Check your API key has the correct permissions
  - Ensure your code is running in a traced context
- Evaluation Errors
  - Check that your dataset format matches expected input
  - Verify evaluator requirements are met
  - Ensure your API keys have the necessary permissions
- Performance Issues
  - Check for rate limiting
  - Optimize your prompts to reduce token usage
  - Consider batching requests when possible
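For the dataset-format issue above, a cheap pre-flight check catches malformed examples before they are uploaded with `create_examples`. This is a local sketch; the required key names are assumptions based on the QA dataset used earlier in this guide.

```python
# Keys your chain and evaluators expect in each example (assumed here)
REQUIRED_INPUT_KEYS = {"question"}
REQUIRED_OUTPUT_KEYS = {"answer"}

def validate_examples(examples):
    """Return a list of human-readable problems; empty means the set is valid."""
    problems = []
    for i, (inputs, outputs) in enumerate(examples):
        missing_in = REQUIRED_INPUT_KEYS - inputs.keys()
        missing_out = REQUIRED_OUTPUT_KEYS - outputs.keys()
        if missing_in:
            problems.append(f"example {i}: missing input keys {sorted(missing_in)}")
        if missing_out:
            problems.append(f"example {i}: missing output keys {sorted(missing_out)}")
    return problems

good = [({"question": "What is LangSmith?"}, {"answer": "A platform for LLM apps."})]
bad = [({"query": "oops"}, {"answer": "..."})]
print(validate_examples(good))  # []
print(validate_examples(bad))   # ["example 0: missing input keys ['question']"]
```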
- `tracing_v2_enabled()`: Context manager for enabling tracing
- `traceable`: Decorator for tracing functions
- `RunEvalConfig`: Configuration for evaluation runs
- `Client`: Main client for interacting with the LangSmith API
- `create_dataset()`: Create a new dataset
- `create_examples()`: Add examples to a dataset
- `run_on_dataset()`: Run evaluation on a dataset
- `read_metrics()`: Read metrics for a project
- `create_alert()`: Create a monitoring alert
- Explore the UI: Visit LangSmith Dashboard to view your traces and metrics
- Join the Community: Get help and share your experiences in the LangChain Community
- Read the Docs: Check out the official documentation for more details
- Try Examples: Experiment with the example notebooks in the LangSmith Examples repository