Local-first trace inspector for LLM agents and tool-driven workflows.
Load a trace. See what happened. Understand why.
Steplight is a terminal-native tool for inspecting LLM runs, tool calls, retries, token usage, errors, and execution bottlenecks without sending trace data to a third-party platform.
It is designed for developers building with OpenAI-style workflows, LangChain, MCP tooling, or custom JSON/YAML traces who want a fast local debugging loop and a shareable HTML report.
- Understand what your agent actually did, step by step
- Spot slow tools, retry loops, and cost spikes quickly
- Keep sensitive traces on your own machine
- Stay in the terminal instead of digging through raw JSON by hand
- Share a polished HTML report when you need feedback from teammates
- Interactive Textual inspector with color-coded timeline, scrollable details, and command palette (F1)
- Trace comparison (
slt diff) to spot regressions in duration, tokens, cost, and step composition - Rich terminal summary with JSON output (
--format json) for CI pipelines - Non-zero exit codes on diagnostic thresholds (
--fail-on warning) - Static HTML export for sharing a run with other people
- Built-in diagnostics for common agent failure patterns (bottlenecks, retry loops, cost spikes, context growth, tool concentration, and more)
- Per-tool runtime breakdown to spot which tool is burning the most time
- Support for OpenAI-style traces, Anthropic Messages API, LangChain callbacks, MCP logs, and generic JSON/YAML
- Extensible adapters and rules so you can grow it with your workflow
pip install steplightThe package installs both steplight and the shorter slt alias. The examples below use slt to avoid PowerShell's reserved sl alias.
slt summary sample_traces/agent_with_tools.json
slt summary sample_traces/agent_with_tools.json --format json
slt summary sample_traces/agent_with_tools.json --fail-on warning
slt diff sample_traces/agent_with_tools.json sample_traces/expensive_run.json
slt validate sample_traces/simple_qa.json
slt export sample_traces/expensive_run.json -o report.html
slt inspect sample_traces/agent_with_tools.json+------------------------ Steplight Summary ------------------------+
| Run: Find compliance gaps in policy |
| Source: openai |
| Overview: Duration: 14.0s | Steps: 4 | Tool calls: 2 | Retries: 0 |
| Tokens: 4,230 in / 670 out | Est. cost: $0.0028 |
+-------------------------------------------------------------------+
Diagnostics
- WARNING: Step 'web_search' took 68.6% of total runtime.
- WARNING: Tool 'web_search' took 9.6s. Check timeouts, caching, or external latency.
- INFO: Input tokens grew 4.2x between 'Draft plan' and 'Final answer'.
Bottleneck: web_search (68.6% of total runtime)
Comparing: Find compliance gaps in policy -> Long form research synthesis
Stats
Metric | Base | Head | Change
-----------+---------+---------+---------------------
Duration | 14.00s | 40.00s | +26.00s (+185.7%)
Steps | 4 | 7 | +3 (+75.0%)
Tool calls | 2 | 4 | +2 (+100.0%)
Retries | 0 | 0 | 0
Tokens in | 4,230 | 5,100 | +870 (+20.6%)
Tokens out | 670 | 1,800 | +1,130 (+168.7%)
Cost | $0.0028 | $0.4500 | +$0.4472 (+16180.8%)
Step type changes
prompt 0 -> 1 +1
tool_call 2 -> 4 +2
- OpenAI-style step traces
- Anthropic Messages API conversation traces (text + tool_use + tool_result blocks)
- LangChain callback event exports
- MCP tool call logs
- Generic JSON or YAML with an optional
steplight.yamlmapping file
Steplight currently flags:
- bottlenecks
- retry loops
- context growth
- repeated tool calls
- tool concentration (one tool dominates runtime)
- silent errors
- slow tools
- high-cost runs
- empty completions
mapping:
steps_path: "$.events"
timestamp_field: "ts"
type_field: "event_type"
type_values:
prompt: "llm_start"
completion: "llm_end"
tool_call: "tool_start"
tool_result: "tool_end"slt inspect <file>opens the interactive Textual UIslt summary <file>prints a non-interactive terminal summaryslt summary <file> --format jsonoutputs machine-readable JSONslt summary <file> --fail-on warningexits with code 1 if diagnostics fireslt diff <base> <head>compares two traces side-by-sideslt export <file> -o report.htmlcreates a static HTML reportslt validate <file>checks whether a trace can be parsed successfully
steplight/
cli/
core/
adapters/
tui/
export/
sample_traces/
tests/
python -m pip install -e .[dev]
pytest
python -m steplight.cli.main --helpReleases are fully automated. Push to main using conventional commit messages (feat:, fix:, etc.) and the pipeline handles the rest:
- CI runs tests on Python 3.11, 3.12, and 3.13
- Release Please creates a version bump PR with changelog
- The PR auto-merges after CI passes
- A GitHub Release is created and the package is published to PyPI
Steplight is currently an alpha-stage project focused on a clean local-first inspection experience, a solid parser surface, and practical diagnostics.
MIT