LangSmith: The Complete Guide

🚀 New in v0.1.0: Enhanced tracing, custom evaluators, and production monitoring features.

Introduction

LangSmith is a powerful platform for developing, monitoring, and improving LLM applications in production. It provides the tools you need to build reliable, high-performing LLM applications with confidence.

Key Benefits

  • Debugging: Trace and visualize complex LLM calls and chains
  • Evaluation: Measure and improve model performance with custom metrics
  • Monitoring: Track production performance and get alerts for issues
  • Collaboration: Share and compare results across your team
  • Optimization: Identify and fix performance bottlenecks

Quick Start

1. Installation

Python

pip install langsmith

TypeScript

npm install langsmith

2. Set Up Your Environment

Python

import os
from langchain.smith import RunEvalConfig, run_on_dataset
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain

# Set your API keys
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["LANGCHAIN_API_KEY"] = "your-langchain-api-key"
os.environ["LANGCHAIN_TRACING_V2"] = "true"

# Initialize your model
llm = ChatOpenAI(temperature=0.7, model_name="gpt-4")

TypeScript

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { LLMChain } from "langchain/chains";

// Set your API keys
process.env.OPENAI_API_KEY = "your-openai-key";
process.env.LANGCHAIN_API_KEY = "your-langchain-api-key";
process.env.LANGCHAIN_TRACING_V2 = "true";

// Initialize your model
const llm = new ChatOpenAI({
  temperature: 0.7,
  modelName: "gpt-4",
});

3. Create a Simple Chain

Python

# Define a prompt template
prompt = ChatPromptTemplate.from_template(
    "You are a helpful assistant. Answer the following question: {question}"
)

# Create a chain
chain = LLMChain(llm=llm, prompt=prompt)

# Test the chain
response = chain.run(question="What is LangSmith?")
print(response)

TypeScript

// Define a prompt template
const prompt = ChatPromptTemplate.fromTemplate(
  "You are a helpful assistant. Answer the following question: {question}"
);

// Create a chain
const chain = new LLMChain({
  llm,
  prompt,
});

// Test the chain
const response = await chain.call({
  question: "What is LangSmith?",
});
console.log(response);

4. Set Up Tracing

Python

from langchain.callbacks.manager import tracing_v2_enabled

# Tracing is already enabled via the environment variables set above,
# so all subsequent chain runs are traced automatically.

# Run your chain inside an explicit tracing context to capture the trace URL
with tracing_v2_enabled() as cb:
    result = chain.run(question="How does LangSmith help with LLM development?")
    print(f"Trace URL: {cb.get_run_url()}")

TypeScript

import { CallbackManager } from "@langchain/core/callbacks/manager";

// Enable tracing (already set in environment variables)
// All subsequent chain runs will be traced automatically

// Run your chain with tracing
const result = await chain.call(
  { question: "How does LangSmith help with LLM development?" },
  {
    callbacks: CallbackManager.fromHandlers({
      handleChainEnd: (outputs, runId, parentRunId) => {
        console.log(`Trace URL: https://smith.langchain.com/o/${process.env.LANGCHAIN_PROJECT}/runs/${runId}`);
      },
    }),
  }
);
console.log(result);

5. Create a Test Dataset

Python

from langsmith import Client

client = Client()

dataset_name = "example-qa-dataset"
try:
    dataset = client.read_dataset(dataset_name=dataset_name)
except Exception:
    # Create a new dataset if it doesn't exist
    dataset = client.create_dataset(
        dataset_name=dataset_name,
        description="Example QA dataset for testing"
    )
    
    # Add examples
    examples = [
        ({"question": "What is LangSmith?"}, {"answer": "LangSmith is a platform for developing and monitoring LLM applications."}),
        ({"question": "How does tracing work?"}, {"answer": "Tracing captures the execution of LLM calls and chains for debugging and analysis."}),
    ]
    
    client.create_examples(
        inputs=[e[0] for e in examples],
        outputs=[e[1] for e in examples],
        dataset_id=dataset.id
    )

TypeScript

import { Client } from "langsmith";

const client = new Client();

const datasetName = "example-qa-dataset";

async function setupDataset() {
  let dataset;
  try {
    dataset = await client.readDataset({ datasetName });
  } catch (e) {
    // Create a new dataset if it doesn't exist
    dataset = await client.createDataset(datasetName, {
      description: "Example QA dataset for testing",
    });
    
    // Add examples
    const examples = [
      [
        { question: "What is LangSmith?" },
        { answer: "LangSmith is a platform for developing and monitoring LLM applications." },
      ],
      [
        { question: "How does tracing work?" },
        { answer: "Tracing captures the execution of LLM calls and chains for debugging and analysis." },
      ],
    ];
    
    await client.createExamples({
      inputs: examples.map(([input]) => input),
      outputs: examples.map(([_, output]) => output),
      datasetId: dataset.id,
    });
  }
  return dataset;
}

// Run the setup
setupDataset().then(dataset => {
  console.log(`Dataset ready: ${dataset.name} (${dataset.id})`);
});

6. Run Evaluation

Python
# Define evaluation criteria
eval_config = RunEvalConfig(
    evaluators=[
        "qa",  # Built-in QA evaluator
        RunEvalConfig.Criteria(
            {
                "helpfulness": "How helpful is the response?",
                "relevance": "How relevant is the response to the question?",
                "conciseness": "How concise is the response?",
            }
        ),
    ],
    # Custom evaluators are plain callables that receive the run and the example
    custom_evaluators=[
        lambda run, example: {
            "custom_score": 0.95,
            "reasoning": "The response is well-structured and informative.",
        }
    ],
    # LLM used to power the built-in evaluators
    eval_llm=ChatOpenAI(temperature=0, model_name="gpt-4"),
)

# Run evaluation
results = run_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain_factory=lambda: chain,  # Your chain
    evaluation=eval_config,
    verbose=True,
    project_name="my-first-eval"
)

print(f"Evaluation complete. Results: {results}")

Core Concepts

1. Tracing

Tracing allows you to visualize and debug the execution of your LLM applications.

Basic Tracing

from langsmith import traceable
from langchain.callbacks.manager import tracing_v2_enabled

@traceable
def process_query(question: str) -> str:
    """Process a user question and return a response."""
    # Your LLM chain or processing logic here
    return chain.run(question=question)

# Enable tracing for this block
with tracing_v2_enabled(project_name="my-llm-app"):
    result = process_query("What is LangSmith?")
    print(f"Trace URL: {tracing.get_trace_url()}")

TypeScript

import { traceable, tracing_v2_enabled } from "langsmith";

const processQuery = traceable(async (question: string): Promise<string> => {
  // Your LLM chain or processing logic here
  return chain.call({ question });
}, { name: "process_query" });

// Enable tracing for this block
(async () => {
  await tracing_v2_enabled({ projectName: "my-llm-app" }, async () => {
    const result = await processQuery("What is LangSmith?");
    console.log(`Trace URL: https://smith.langchain.com/o/${process.env.LANGCHAIN_PROJECT}/runs/${tracing.get_current_run_id()}`);
  });
})();

Nested Tracing

from langsmith import traceable
from langchain.callbacks.manager import tracing_v2_enabled

@traceable
def retrieve_context(question: str) -> str:
    """Retrieve relevant context for a question."""
    # This would typically call a vector store or other data source
    return "LangSmith is a platform for developing and monitoring LLM applications."

@traceable
def generate_response(question: str, context: str) -> str:
    """Generate a response using the provided context."""
    prompt = f"""Answer the question based on the following context:
    
    {context}
    
    Question: {question}"""
    return llm.predict(prompt)

@traceable
def answer_question(question: str) -> str:
    """End-to-end question answering."""
    context = retrieve_context(question)
    return generate_response(question, context)

# All nested calls will be traced
with tracing_v2_enabled(project_name="nested-tracing"):
    response = answer_question("What is LangSmith?")
    print(response)

TypeScript

import { traceable, tracing_v2_enabled } from "langsmith";

const retrieveContext = traceable(async (question: string): Promise<string> => {
  // This would typically call a vector store or other data source
  return "LangSmith is a platform for developing and monitoring LLM applications.";
}, { name: "retrieve_context" });

const generateResponse = traceable(async (question: string, context: string): Promise<string> => {
  const prompt = `Answer the question based on the following context:

${context}

Question: ${question}`;
  const response = await llm.invoke(prompt);
  return response.content as string;
}, { name: "generate_response" });

const answerQuestion = traceable(async (question: string): Promise<string> => {
  const context = await retrieveContext(question);
  return generateResponse(question, context);
}, { name: "answer_question" });

// All nested calls will be traced (tracing is enabled via environment variables;
// set LANGCHAIN_PROJECT to group these runs under a project)
(async () => {
  const response = await answerQuestion("What is LangSmith?");
  console.log(response);
})();

2. Evaluation

LangSmith provides powerful tools for evaluating your LLM applications.

Custom Evaluators

from typing import Dict, Any
from langchain.evaluation import load_evaluator

def custom_evaluator(run, example) -> Dict[str, Any]:
    """Custom evaluator that checks response length and content."""
    prediction = run.outputs["output"]
    
    # Initialize an LLM-backed criteria evaluator (custom "factuality" criterion)
    fact_evaluator = load_evaluator(
        "criteria",
        criteria={"factuality": "Is the response factually accurate and free of unsupported claims?"},
        llm=ChatOpenAI(temperature=0, model_name="gpt-4"),
    )
    
    # Run evaluations
    fact_result = fact_evaluator.evaluate_strings(
        prediction=prediction,
        input=example.inputs["question"]
    )
    
    # Calculate custom metrics
    word_count = len(prediction.split())
    
    return {
        "fact_score": fact_result["score"],
        "word_count": word_count,
        "is_too_short": word_count < 5,
        "feedback": fact_result["reasoning"]
    }

# Use the custom evaluator
eval_config = RunEvalConfig(
    custom_evaluators=[custom_evaluator],
    eval_llm=ChatOpenAI(temperature=0, model_name="gpt-4")
)

TypeScript

import { RunEvalConfig, loadEvaluator } from "langchain/evaluation";
import { ChatOpenAI } from "@langchain/openai";

interface EvaluationResult {
  fact_score: number;
  word_count: number;
  is_too_short: boolean;
  feedback: string;
}

async function customEvaluator(run: any, example: any): Promise<EvaluationResult> {
  const prediction = run.outputs.output;
  
  // Initialize an LLM-backed criteria evaluator (custom "factuality" criterion)
  const factEvaluator = await loadEvaluator("criteria", {
    criteria: { factuality: "Is the response factually accurate and free of unsupported claims?" },
    llm: new ChatOpenAI({ temperature: 0 })
  });
  
  // Run evaluations
  const factResult = await factEvaluator.evaluateStrings({
    prediction,
    input: example.inputs.question
  });
  
  // Calculate custom metrics
  const wordCount = prediction.split(/\s+/).length;
  
  return {
    fact_score: factResult.score,
    word_count: wordCount,
    is_too_short: wordCount < 5,
    feedback: factResult.reasoning || ""
  };
}

// Use the custom evaluator
const evalConfig: RunEvalConfig = {
  customEvaluators: [customEvaluator],
  evalLLM: new ChatOpenAI({
    temperature: 0,
    modelName: "gpt-4"
  })
};

Human Feedback

from langsmith import Client
from langchain.callbacks.manager import tracing_v2_enabled

client = Client()

# Record human feedback
def record_feedback(run_id: str, score: int, comment: str = ""):
    client.create_feedback(
        run_id,
        key="human_rating",
        score=score,  # 1-5 scale
        comment=comment,
    )

# Example usage: trace a run, then attach human feedback to it
with tracing_v2_enabled(project_name="my-llm-app") as cb:
    result = chain.run(question="What is LangSmith?")
    print(f"Generated response: {result}")
    
    # Get the trace URL for human review
    print(f"Review at: {cb.get_run_url()}")
    
    # Simulate human feedback (in a real app, this would come from a UI)
    record_feedback(
        run_id=str(cb.latest_run.id),
        score=4,
        comment="Good response, but could be more detailed."
    )

TypeScript

import { Client } from "langsmith";
import { getOpenAICallback } from "langchain/callbacks";

const client = new Client();

// Record human feedback
async function recordFeedback(runId: string, score: number, comment: string = ""): Promise<void> {
  await client.createFeedback(runId, "human_rating", {
    score,  // 1-5 scale
    comment,
  });
}

// Example usage: capture the run ID from the traced chain call, then attach feedback
(async () => {
  let chainRunId: string | undefined;

  const result = await chain.call(
    { question: "What is LangSmith?" },
    {
      callbacks: CallbackManager.fromHandlers({
        // Capture the run ID so feedback can be attached to this trace
        handleChainEnd: (_outputs, runId) => {
          chainRunId = runId;
        },
      }),
    }
  );
  console.log(`Generated response: ${result.text}`);

  if (chainRunId) {
    // Simulate human feedback (in a real app, this would come from a UI)
    await recordFeedback(chainRunId, 4, "Good response, but could be more detailed.");
  }
})();

3. Monitoring

Monitor your LLM applications in production with real-time metrics and alerts.

Setting Up Monitoring

from langsmith import Client
from datetime import datetime, timedelta

client = Client()

# Pull all chain runs from the last 24 hours
start_time = datetime.utcnow() - timedelta(days=1)

runs = list(client.list_runs(
    project_name="my-llm-app",
    run_type="chain",
    start_time=start_time,
))

# Compute simple aggregate metrics from the runs
latencies = [
    (run.end_time - run.start_time).total_seconds()
    for run in runs
    if run.end_time is not None
]
total_tokens = sum(run.total_tokens or 0 for run in runs)

# Feedback scores (e.g. "human_rating") can be pulled with client.list_feedback(run_ids=[...])

print(f"Runs in the last 24 hours: {len(runs)}")
if latencies:
    print(f"Average latency: {sum(latencies) / len(latencies):.2f}s")
print(f"Total tokens used: {total_tokens}")

TypeScript

import { Client } from "langsmith";
import { subDays } from "date-fns";

const client = new Client();

// Define metrics to track
const metrics = [
  "latency",
  "token_usage",
  "feedback.human_rating",
  "evaluation.fact_score"
] as const;

// Get metrics for the last 24 hours
const endTime = new Date();
const startTime = subDays(endTime, 1);

async function getMetrics() {
  const metricsData = await client.readMetrics({
    projectName: "my-llm-app",
    metrics,
    startTime,
    endTime,
    groupBy: ["model", "prompt_version"]
  });

  // Analyze metrics
  const avgLatency = metricsData.latency.reduce((a, b) => a + b, 0) / metricsData.latency.length;
  const totalTokens = metricsData.token_usage.reduce((a, b) => a + b, 0);
  
  console.log(`Average latency: ${avgLatency.toFixed(2)}s`);
  console.log(`Total tokens used: ${totalTokens}`);
  
  return metricsData;
}

getMetrics().catch(console.error);

Setting Up Alerts

# Create an alert for high latency
alert_config = {
    "name": "High Latency Alert",
    "description": "Alert when average latency exceeds threshold",
    "metric": "latency",
    "condition": ">",
    "threshold": 5.0,  # seconds
    "window": "1h",    # 1-hour rolling window
    "notification_channels": ["email:your-email@example.com"],
    "severity": "high"
}

client.create_alert(
    project_name="my-llm-app",
    **alert_config
)

TypeScript

// Create an alert for high latency
const alertConfig = {
  name: "High Latency Alert",
  description: "Alert when average latency exceeds threshold",
  metric: "latency" as const,
  condition: ">" as const,
  threshold: 5.0,  // seconds
  window: "1h",    // 1-hour rolling window
  notificationChannels: ["email:your-email@example.com"],
  severity: "high" as const
};

async function createAlert() {
  await client.createAlert({
    projectName: "my-llm-app",
    ...alertConfig
  });
  console.log("Alert created successfully");
}

createAlert().catch(console.error);

Real-World Use Cases

1. Customer Support Chatbot

from typing import List, Dict, Any
from langchain.schema import SystemMessage, HumanMessage

class SupportBot:
    def __init__(self):
        self.llm = ChatOpenAI(temperature=0.7, model_name="gpt-4")
        self.context = []
        
    @traceable
    def generate_response(self, user_input: str) -> str:
        """Generate a response to a user's support request."""
        # Add to conversation history
        self.context.append(HumanMessage(content=user_input))
        
        # Create prompt with context
        messages = [
            SystemMessage(content="You are a helpful customer support agent."),
            *self.context[-6:]  # Last 3 exchanges (user + assistant)
        ]
        
        # Generate response
        response = self.llm(messages)
        
        # Add to context
        self.context.append(response)
        
        return response.content

# Initialize bot
bot = SupportBot()

# Example conversation
with tracing_v2_enabled(project_name="support-bot"):
    print(bot.generate_response("I can't log into my account."))
    print(bot.generate_response("I've tried resetting my password but it's not working."))

TypeScript

import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";
import { traceable } from "langsmith";
import { tracing_v2_enabled } from "langsmith/trace";

class SupportBot {
  private llm: ChatOpenAI;
  private context: (HumanMessage | any)[] = [];

  constructor() {
    this.llm = new ChatOpenAI({
      temperature: 0.7,
      modelName: "gpt-4",
    });
  }

  // traceable wraps the function so each call is recorded as a run
  generateResponse = traceable(async (userInput: string): Promise<string> => {
    // Add to conversation history
    this.context.push(new HumanMessage(userInput));
    
    // Create prompt with context
    const messages = [
      new SystemMessage("You are a helpful customer support agent."),
      ...this.context.slice(-6) // Last 3 exchanges (user + assistant)
    ];
    
    // Generate response
    const response = await this.llm.invoke(messages);
    
    // Add to context
    this.context.push(response);
    
    return response.content as string;
  }, { name: "generate_response" });
}

// Example usage (tracing is enabled via the environment variables;
// set LANGCHAIN_PROJECT="support-bot" to group these runs under a project)
(async () => {
  const bot = new SupportBot();
  
  console.log(await bot.generateResponse("I can't log into my account."));
  console.log(await bot.generateResponse("I've tried resetting my password but it's not working."));
})();

2. Content Moderation Pipeline

from enum import Enum
from pydantic import BaseModel

class ModerationResult(BaseModel):
    is_safe: bool
    reason: str
    confidence: float
    flagged_categories: List[str]
    explanation: str

class ContentModerator:
    def __init__(self):
        self.llm = ChatOpenAI(temperature=0, model_name="gpt-4")
        
    @traceable
    def moderate_content(self, text: str) -> ModerationResult:
        """Check if content violates moderation policies."""
        prompt = f"""Analyze the following content for policy violations:
        
        {text}
        
        Check for:
        - Hate speech or discrimination
        - Harassment or bullying
        - Violence or harmful content
        - Sexual content
        - Personal information
        - Spam or scams
        
        Return a JSON object with:
        - is_safe (boolean)
        - reason (string)
        - confidence (float 0-1)
        - flagged_categories (list of strings)
        - explanation (string)"""
        
        response = self.llm.predict(prompt)
        return ModerationResult.parse_raw(response)

# Example usage
moderator = ContentModerator()

with tracing_v2_enabled(project_name="content-moderation"):
    result = moderator.moderate_content("This is a test message with no issues.")
    print(f"Is safe: {result.is_safe}")
    print(f"Reason: {result.reason}")

TypeScript

import { ChatOpenAI } from "@langchain/openai";
import { traceable } from "langsmith";
import { tracing_v2_enabled } from "langsmith/trace";

interface ModerationResult {
  is_safe: boolean;
  reason: string;
  confidence: number;
  flagged_categories: string[];
  explanation: string;
}

class ContentModerator {
  private llm: ChatOpenAI;

  constructor() {
    this.llm = new ChatOpenAI({
      temperature: 0,
      modelName: "gpt-4",
    });
  }

  // traceable wraps the function so each moderation call is recorded as a run
  moderateContent = traceable(async (text: string): Promise<ModerationResult> => {
    const prompt = `Analyze the following content for policy violations:

${text}

Check for:
- Hate speech or discrimination
- Harassment or bullying
- Violence or harmful content
- Sexual content
- Personal information
- Spam or scams

Return a JSON object with:
- is_safe (boolean)
- reason (string)
- confidence (float 0-1)
- flagged_categories (list of strings)
- explanation (string)`;

    const response = await this.llm.invoke(prompt);
    return JSON.parse(response.content as string) as ModerationResult;
  }, { name: "moderate_content" });
}

// Example usage (tracing is enabled via the environment variables;
// set LANGCHAIN_PROJECT="content-moderation" to group these runs under a project)
(async () => {
  const moderator = new ContentModerator();
  
  const result = await moderator.moderateContent("This is a test message with no issues.");
  console.log(`Is safe: ${result.is_safe}`);
  console.log(`Reason: ${result.reason}`);
})();

Best Practices

1. Effective Tracing

  • Use meaningful names: Give your traces and spans descriptive names
  • Add metadata: Include relevant context in your traces
  • Handle errors: Use try/except blocks and log errors appropriately
  • Use spans: Group related operations together
from datetime import datetime
from langsmith import trace

def process_document(document: str) -> dict:
    """Process a document through multiple steps."""
    with trace(
        name="document_processing",
        inputs={"document_length": len(document)},
        metadata={"processing_start_time": datetime.utcnow().isoformat()},
    ) as span:
        try:
            # Step 1: Extract text
            with trace(name="text_extraction"):
                text = extract_text(document)
                
            # Step 2: Analyze sentiment
            with trace(name="sentiment_analysis"):
                sentiment = analyze_sentiment(text)
                
            # Step 3: Generate summary
            with trace(name="summarization"):
                summary = generate_summary(text)
                
            result = {
                "text": text,
                "sentiment": sentiment,
                "summary": summary,
                "status": "success"
            }
            span.end(outputs=result)
            return result
            
        except Exception as e:
            # Record the error on the run before re-raising
            span.end(error=str(e))
            raise

2. Effective Evaluation

  • Define clear criteria: Be specific about what makes a good response
  • Use multiple evaluators: Combine automated and human evaluation (see the sketch after this list)
  • Test edge cases: Include challenging examples in your test sets
  • Iterate: Use evaluation results to improve your prompts and models
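
Putting these practices together, here is a minimal sketch (the criteria wording, the heuristic evaluator, and the edge-case example are illustrative; it assumes the imports, chain, client, and dataset from the earlier sections):

# Heuristic evaluator: penalize vague, non-committal answers
def no_hedging_evaluator(run, example):
    prediction = run.outputs["output"].lower()
    hedged = any(phrase in prediction for phrase in ("i'm not sure", "it depends"))
    return {"no_hedging": 0.0 if hedged else 1.0}

eval_config = RunEvalConfig(
    evaluators=[
        "qa",  # automated QA evaluator
        RunEvalConfig.Criteria(
            {"specificity": "Does the answer cite concrete capabilities rather than generalities?"}
        ),
    ],
    custom_evaluators=[no_hedging_evaluator],  # combine LLM-judged and heuristic checks
    eval_llm=ChatOpenAI(temperature=0, model_name="gpt-4"),
)

# Include challenging edge cases alongside the happy-path examples
client.create_examples(
    inputs=[{"question": "Does LangSmith only work with LangChain?"}],
    outputs=[{"answer": "No. The langsmith SDK can trace any LLM application, with or without LangChain."}],
    dataset_id=dataset.id,
)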

3. Production Monitoring

  • Set up alerts: Get notified of issues in real-time
  • Track key metrics: Monitor latency, token usage, and quality scores
  • A/B test: Compare different model versions or prompts (see the sketch after this list)
  • Retain data: Keep enough history to identify trends and patterns
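
As a sketch of the A/B-testing point above (the prompt variants and project names are illustrative; llm, eval_config, and the dataset are assumed from the earlier sections):

prompt_v1 = ChatPromptTemplate.from_template(
    "You are a helpful assistant. Answer the following question: {question}"
)
prompt_v2 = ChatPromptTemplate.from_template(
    "You are a concise expert. Answer in two sentences or fewer: {question}"
)

# Evaluate both variants against the same dataset; separate project names
# make the two result sets easy to compare side by side in the LangSmith UI
for version, prompt in [("prompt-v1", prompt_v1), ("prompt-v2", prompt_v2)]:
    run_on_dataset(
        dataset_name="example-qa-dataset",
        llm_or_chain_factory=lambda prompt=prompt: LLMChain(llm=llm, prompt=prompt),
        evaluation=eval_config,
        project_name=f"ab-test-{version}",
    )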

Troubleshooting

Common Issues

  1. Missing Traces

    • Verify LANGCHAIN_TRACING_V2 is set to "true"
    • Check your API key has the correct permissions
    • Ensure your code is running in a traced context (a quick configuration check is sketched after this list)
  2. Evaluation Errors

    • Check that your dataset format matches expected input
    • Verify evaluator requirements are met
    • Ensure your API keys have the necessary permissions
  3. Performance Issues

    • Check for rate limiting
    • Optimize your prompts to reduce token usage
    • Consider batching requests when possible
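
For the missing-traces checklist above, a quick sanity check might look like this (it assumes the environment variables from the Quick Start and only verifies configuration and connectivity):

import os
from langsmith import Client

assert os.environ.get("LANGCHAIN_TRACING_V2") == "true", "LANGCHAIN_TRACING_V2 must be 'true'"
assert os.environ.get("LANGCHAIN_API_KEY"), "LANGCHAIN_API_KEY is not set"

# An authenticated request confirms the API key is valid and has read access
client = Client()
next(client.list_datasets(), None)
print("Tracing is configured and the LangSmith API is reachable.")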

API Reference

Core Functions

  • tracing_v2_enabled(): Context manager for enabling tracing (langchain.callbacks.manager)
  • traceable: Decorator for tracing functions (langsmith)
  • run_on_dataset(): Run a chain or LLM over a dataset with evaluators (langchain.smith)
  • RunEvalConfig: Configuration for evaluation runs (langchain.smith)
  • Client: Main client for interacting with the LangSmith API (langsmith)

Client Methods

  • create_dataset(): Create a new dataset
  • create_examples(): Add examples to a dataset
  • create_feedback(): Record feedback on a run
  • list_runs(): List runs for a project
  • create_alert(): Create a monitoring alert

Next Steps

  1. Explore the UI: Visit LangSmith Dashboard to view your traces and metrics
  2. Join the Community: Get help and share your experiences in the LangChain Community
  3. Read the Docs: Check out the official documentation for more details
  4. Try Examples: Experiment with the example notebooks in the LangSmith Examples repository