SimKit

🤖 A TypeScript simulation framework for testing and running AI agents

What is SimKit?

SimKit lets you build, test, and run AI agents in your own custom simulated environments. It gives you a simple game loop for running agents step by step, supports multiple agents, and includes built-in OpenTelemetry (OTEL) instrumentation for tracking what happens during your simulations.

Agent Agnostic & No Vendor Lock-in

SimKit works with any AI agent or LLM, with no lock-in. Use your own models and run everything locally. OTEL logs can be saved to a local file or sent to a remote server.

Why Use Simulations?

Simulations let you see how your AI agents perform on real-world tasks, step by step, in a safe and controlled way.

Traditional evals are great for simple tasks, but they don't give you the full picture. You can't see how your agents handle:

🎯 Multi-step tasks that need planning and memory
🛠️ Lots of different tools and actions
🌍 Realistic data and changing situations
⚡ Decisions that matter over time
🔄 Long-term planning and decision-making
📚 Processing and reasoning over large amounts of context and information

In practice, many AI agents start to fail once they are asked to do more than a few simple tasks.

🔄 Core: The Simulation Loop

SimKit's heart is a simple but powerful tick-based loop:

import { createSimulation, type LoopState } from "@fallom/simkit/simulation";

interface SupportTestState extends LoopState {
  totalIssues: number;
  resolvedIssues: number;
  averageResponseTime: number;
  satisfactionScores: number[];
}

const customerIssues = [
  "My account is locked and I can't access my files",
  "Billing error - charged twice for same month", 
  "App crashes every time I try to upload",
  "Can't find my downloaded files anywhere"
];

const simulation = createSimulation<SupportTestState>({
  maxTicks: 10,
  initialState: { totalIssues: 0, resolvedIssues: 0, averageResponseTime: 0, satisfactionScores: [] },
  
  onTick: async (state) => {
    // Get today's customer issues.
    // getRandomIssues, supportAgent, and scoreResponse are your own helpers, not SimKit exports.
    const dailyIssues = getRandomIssues(customerIssues, 2);
    
    for (const issue of dailyIssues) {
      const startTime = Date.now();
      
      // Test your AI support agent
      const agentResponse = await supportAgent.handle(issue);
      
      const responseTime = Date.now() - startTime;
      const satisfaction = scoreResponse(agentResponse, issue);
      
      state.totalIssues++;
      if (satisfaction > 7) state.resolvedIssues++;
      state.satisfactionScores.push(satisfaction);
      
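      // Update running averages (incremental mean for response time)
      state.averageResponseTime += (responseTime - state.averageResponseTime) / state.totalIssues;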
      const avgSatisfaction = state.satisfactionScores.reduce((a,b) => a+b, 0) / state.satisfactionScores.length;
      const resolutionRate = (state.resolvedIssues / state.totalIssues) * 100;
      
      console.log(`Resolution Rate: ${resolutionRate.toFixed(1)}% | Avg Satisfaction: ${avgSatisfaction.toFixed(1)}/10`);
    }
    
    return state.tick < 9; // Test for 10 days
  },
  
  onEnd: (state) => {
    const finalSatisfaction = state.satisfactionScores.reduce((a,b) => a+b, 0) / state.satisfactionScores.length;
    console.log(`🎯 Final Results: ${((state.resolvedIssues/state.totalIssues)*100).toFixed(1)}% resolution rate, ${finalSatisfaction.toFixed(1)}/10 satisfaction`);
  }
});

await simulation.run();

What's happening here? Each tick simulates a day of customer support. The loop feeds random issues to your AI agent, measures response quality and speed, and tracks KPIs over time. This setup works well for A/B testing different models, regression testing after prompt changes, or measuring performance before production deployment.
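To make the tick mechanics concrete, here is a minimal sketch of how a tick-based loop of this shape could work. It is not SimKit's source code: `runSketchLoop`, `SketchOptions`, and `Ticked` are hypothetical names, and the assumption that returning `false` from `onTick` stops the run early is inferred from the example above.

```typescript
// Illustrative sketch only; names and behavior are assumptions, not SimKit's API.
type Ticked<S> = S & { tick: number };

interface SketchOptions<S extends object> {
  maxTicks: number;
  initialState: S;
  // Return false to stop early, true to continue to the next tick.
  onTick: (state: Ticked<S>) => boolean | Promise<boolean>;
  onEnd?: (state: Ticked<S>) => void;
}

async function runSketchLoop<S extends object>(
  options: SketchOptions<S>
): Promise<Ticked<S>> {
  // Start at tick 0 with a shallow copy of the caller's initial state.
  const state: Ticked<S> = { ...options.initialState, tick: 0 };

  while (state.tick < options.maxTicks) {
    const keepGoing = await options.onTick(state);
    if (!keepGoing) break;
    state.tick++;
  }

  // Always report final state, whether the loop hit maxTicks or stopped early.
  options.onEnd?.(state);
  return state;
}

// Usage: allow up to 5 ticks, but stop once 3 issues are resolved.
async function demo(): Promise<number> {
  const finalState = await runSketchLoop({
    maxTicks: 5,
    initialState: { resolved: 0 },
    onTick: (state) => {
      state.resolved++;
      return state.resolved < 3;
    },
  });
  return finalState.resolved;
}
```

The state object is mutated in place on every tick, which is why the handlers above can update counters without returning a new state.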

🤖 Built for AI Agents

Global State Access

AI agents need access to simulation state from anywhere:

import { setSimState, getSimState } from "@fallom/simkit/state";

// In your simulation loop
setSimState(state);

// In your AI tools
const currentState = getSimState<MyState>();
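The pattern behind this can be sketched as a module-level holder that the loop writes to and tools read from. This is an assumption about the general technique, not SimKit's implementation: `setSketchState`, `getSketchState`, `ShopState`, and `checkBalanceTool` are hypothetical names used only for illustration.

```typescript
// Illustrative module-level state holder (not SimKit's actual code).
let currentState: unknown;

function setSketchState<S>(state: S): void {
  currentState = state;
}

function getSketchState<S>(): S {
  if (currentState === undefined) {
    throw new Error("Simulation state has not been set yet");
  }
  // Unchecked cast: the caller is responsible for asking for the right type.
  return currentState as S;
}

// Hypothetical state shape for the example.
interface ShopState {
  tick: number;
  cash: number;
}

// A tool an agent might call. It reads state without receiving it as a parameter.
function checkBalanceTool(): string {
  const state = getSketchState<ShopState>();
  return `Tick ${state.tick}: balance is ${state.cash}`;
}

// In the simulation loop: publish the state, then let the agent call tools.
setSketchState<ShopState>({ tick: 3, cash: 120 });
const balanceMessage = checkBalanceTool();
```

Because the holder stores a reference rather than a copy, a tool that mutates the returned object changes the same state the loop sees on the next tick.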

Deterministic Testing

Reproduce exact scenarios with seeded randomness - perfect for fair model comparisons:

import { initializeRandom, choice, shuffle } from "@fallom/simkit/random";

// testSupportAgent, modelA, and modelB are placeholders for your own harness and models.

// Test Model A
initializeRandom(12345); // Same seed = same test scenarios
const modelAResults = await testSupportAgent(modelA);

// Test Model B with identical scenarios
initializeRandom(12345); // Reset to same seed
const modelBResults = await testSupportAgent(modelB);

// Now you can fairly compare: both models faced the exact same issues
console.log(`Model A: ${modelAResults.satisfaction}/10`);
console.log(`Model B: ${modelBResults.satisfaction}/10`);

Why this matters: Without seeded randomness, Model A might get easy customer issues while Model B gets hard ones, making the comparison meaningless. With the same seed, every model faces identical test scenarios.
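For readers curious how seeding produces repeatable scenarios, the idea can be sketched with a small seeded generator and a shuffle driven by it. This is not how `@fallom/simkit/random` is implemented (the source does not say); it uses mulberry32, a widely used 32-bit generator, and `createSeededRandom`, `seededShuffle`, and `pickIssues` are hypothetical names.

```typescript
// mulberry32: a small seeded PRNG returning floats in [0, 1). Illustration only.
function createSeededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded generator; returns a new array.
function seededShuffle<T>(items: readonly T[], random: () => number): T[] {
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Pick `count` issues in an order fully determined by `seed`.
function pickIssues(seed: number, issues: readonly string[], count: number): string[] {
  return seededShuffle(issues, createSeededRandom(seed)).slice(0, count);
}

const sampleIssues = ["locked account", "double billing", "upload crash", "missing files"];

// Two runs with the same seed select the same issues in the same order.
const runA = pickIssues(12345, sampleIssues, 2);
const runB = pickIssues(12345, sampleIssues, 2);
```

Creating a fresh generator per run, rather than sharing one across runs, is what lets each model start from the same point in the random sequence.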

📊 OpenTelemetry Integration

Built-in observability for AI agent debugging with zero vendor lock-in:

import { trace } from "@opentelemetry/api";

// SimKit captures spans automatically; you can also add custom spans with the OpenTelemetry API
const tracer = trace.getTracer("my-simulation");
const span = tracer.startSpan("agent-decision");
span.setAttributes({
  "agent.action": "support_response",
  "simulation.tick": state.tick,
  "response.satisfaction": 8.5
});
span.end();

Send telemetry anywhere: Export to your own servers, store in local files, or pipe to any OpenTelemetry-compatible service. No vendor lock-in - you own your data.

✨ Key Features

Feature	Why It Matters for AI
🔄 Tick-Based Loop	Step-by-step agent execution with full control
📊 OpenTelemetry	Track agent decisions and debug complex behaviors
🎲 Seeded Random	Reproduce exact scenarios for testing and validation
🏗️ Global State	AI tools can access simulation state from anywhere
🔧 TypeScript	Full type safety for complex agent interactions
⚡ Bun Optimized	Fast execution for compute-intensive agent simulations

📦 Installation

npm install @fallom/simkit
# or
bun add @fallom/simkit

🎮 Examples

🚀 Getting Started: Energy AI

Simple agent making strategic decisions

cd apps/examples/energy-ai
bun install && bun run start

A straightforward example showing:

AI agent with tool calling
Basic state management
OpenTelemetry integration

🏆 Advanced: Pawn Shop Simulation

Complex multi-agent economic simulation

A comprehensive example demonstrating SimKit's full capabilities:

Multi-agent system - Shop owner + customer agents
Complex state management - Inventory, trades, conversations
Deterministic scenarios - Seeded randomness for testing
Rich telemetry - Custom spans and detailed logging
Tool ecosystem - AI agents with 10+ specialized tools

Perfect for understanding how to build production-grade agent simulations.

🚀 Why SimKit for AI Development?

Traditional Approach	With SimKit
❌ Manual loop management	✅ Built-in tick-based execution
❌ No observability	✅ OpenTelemetry integration
❌ Non-deterministic testing	✅ Seeded randomness
❌ Complex state sharing	✅ Global state management
❌ Manual telemetry setup	✅ Automatic span collection

📖 Learn More

📦 Core Package Docs - Full API reference
🚀 Energy AI Tutorial - Simple getting started guide
🏆 Pawn Shop Deep Dive - Advanced multi-agent patterns

🏗️ Development

# Install dependencies
bun install

# Build all packages  
bun run build

# Format code
bun run format

🚀 Built for the AI simulation community

📖 Documentation • 🎮 Examples • 🐛 Issues
