Local-first observability dashboard for agentic research loops.
AutoResearchUI is an open-source sidecar application that gives your AI-driven research loop a dedicated real-time interface. It watches your research repository, streams live metrics, tracks experiment outcomes, shows code diffs, monitors hardware, and lets you review and compare runs over time — all without touching your existing research code.
An agentic research loop is a pattern popularised by Andrej Karpathy where a language model autonomously runs experiments:
- The agent reads `program.md`, a file describing the research goal and current plan
- The agent edits `train.py` (or equivalent), modifying hyperparameters, architecture, or training logic
- The agent runs the experiment: `uv run train.py > run.log 2>&1`
- The experiment writes results: metrics are appended to `results.tsv` or similar
- The agent evaluates: if the metric improved, the code change is committed ("ratchet"); if not, it is discarded
- The loop repeats — the agent proposes the next hypothesis and edits the code again
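The keep-or-discard step of the loop can be sketched as follows. This is a minimal illustration, not code from AutoResearchUI or from any particular agent; the function names `is_improvement` and `ratchet` are hypothetical, and the first experiment is assumed to always be kept.

```python
from typing import Iterable, List, Optional, Tuple


def is_improvement(candidate: float, best: Optional[float], goal: str = "minimize") -> bool:
    """Return True when `candidate` beats `best` under the given optimization goal."""
    if goal not in ("minimize", "maximize"):
        raise ValueError(f"unknown goal: {goal!r}")
    if best is None:
        # Assumption: the first experiment always sets the initial best.
        return True
    return candidate < best if goal == "minimize" else candidate > best


def ratchet(metrics: Iterable[float], goal: str = "minimize") -> Tuple[Optional[float], List[str]]:
    """Label each result KEPT or DISCARDED, carrying the best value forward."""
    best: Optional[float] = None
    labels: List[str] = []
    for value in metrics:
        if is_improvement(value, best, goal):
            best = value
            labels.append("KEPT")
        else:
            labels.append("DISCARDED")
    return best, labels
```

Applied to the metric values 1.42, 1.31, 1.35 with a minimize goal, the third value is worse than the running best and is labeled `DISCARDED`.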
This process can run for hours or days. AutoResearchUI makes that process visible — showing you exactly what the agent is doing, what metrics it is achieving, and what code it is writing, all streamed live to a browser dashboard.
- It is not an agent itself. It observes agents, it does not replace them.
- It does not modify your research repo. It only reads files and watches for changes.
- It is not a hosted service. It runs entirely on your local machine.
- It does not require changes to your research code. Point it at an existing repo and start watching.
| Concept | What It Means |
|---|---|
| Project Root | The directory of your research repo — the folder AutoResearchUI watches |
| Script to Watch | The file the agent edits most (usually train.py) — diffs are tracked against this file |
| Log File | The structured metrics file the experiment writes to (CSV, TSV, JSONL, JSON) |
| Primary Metric | The column from the log file to visualize — e.g. val_bpb, loss, accuracy |
| Optimization Goal | Whether lower (minimize) or higher (maximize) is better |
| Research Command | The shell command that runs one full experiment — e.g. uv run train.py > run.log 2>&1 |
| Session | One continuous monitoring session. Sessions are stored in SQLite and can be compared later. |
| Ratchet | An experiment result labeled KEPT — a commit that improved the best metric |
- Real-time metric chart with WebSocket streaming (no page refresh needed)
- Experiment feed showing `KEPT`, `DISCARDED`, and `INFO` outcomes
- Side-by-side code diff viewer for the watched script
- Live stdout / stderr terminal panel
- Process lifecycle controls: Start, Stop, Restart
- Every session is persisted to a local SQLite database (`autoresearch.db`)
- Load any previous session as a baseline from the sidebar
- The chart overlays current run vs. baseline with a dashed line
- A banner shows `IMPROVED +X%` or `REGRESSED -X%` compared to the baseline best
- The Timeline view shows a chart of best metric per commit across the last 30 commits
- Each commit is labeled with the best metric it achieved and how many iterations ran at that point
- A Rollback button on each commit lets you instantly revert the repo to any prior state (uses `git checkout`, which puts the repo in detached HEAD)
- If the research repo contains a `program.md` (or a readme with checkboxes), AutoResearchUI parses it
- The right sidebar shows the agent's current plan as a checklist, with completed items crossed out
- Helps you track whether the agent is following its intended research protocol
- Polls `nvidia-smi` every 5 seconds (silently skipped if no NVIDIA GPU is detected)
- Right sidebar shows per-GPU: utilization, VRAM used/total, temperature, power draw
- Progress bars update in real time during active training runs
- The Activity view includes an "Ask AI" panel backed by Ollama
- Sends recent experiment context (metrics, hypotheses, outcomes) to a local LLM
- Returns a natural-language explanation of why experiments succeeded or failed
- Works with any model available in your local Ollama installation (default: `llama3.2`)
- `.md`: Markdown research summary with experiment table and best metric
- `.html`: Standalone interactive HTML with a Chart.js chart, experiment table, and GPU info; zero external dependencies, sharable as a single file
- `.svg`: Raw SVG of the metric chart for embedding in papers or notebooks
AutoResearchUI has five views, accessible from the left sidebar:
The primary monitoring screen.
- 4 KPI cards at the top: Current Metric, Best Metric, Total Experiments, Keep Rate
- Baseline comparison banner when a prior session is loaded for comparison
- Live metric chart — blue line for current run, dashed gray line for baseline
- Experiment records table — every iteration with status badge, metric value, and truncated hypothesis
- Right panel — GPU stats, recent git commits, agent plan checklist, process status
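The four KPI cards can be derived from the experiment rows alone. The sketch below is an assumption about how such values might be computed, not the dashboard's actual code: `kpi_cards` is a hypothetical name, and the keep rate is taken to be the share of all rows labeled `KEPT` (whether `INFO` rows count toward the total is not stated in this document).

```python
from typing import Dict, List, Optional


def kpi_cards(rows: List[dict], metric: str, goal: str = "minimize") -> Dict[str, Optional[float]]:
    """Compute current metric, best metric, total experiments, and keep rate."""
    values = []
    for row in rows:
        try:
            values.append(float(row[metric]))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable metric value

    pick = min if goal == "minimize" else max
    kept = sum(1 for row in rows if row.get("status") == "KEPT")

    return {
        "current": values[-1] if values else None,
        "best": pick(values) if values else None,
        "total": len(rows),
        # Assumption: keep rate = KEPT rows / all rows.
        "keep_rate": kept / len(rows) if rows else 0.0,
    }
```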
Track how the best metric has evolved across git commits.
- Line chart with commits on the X axis and best metric on the Y axis
- Per-commit metadata: short SHA, commit message, date, best metric reached, iteration count
- Hover on any row and a Rollback button appears — instantly reverts to that commit
Browse and compare previous research sessions.
- Session list showing metric name, optimization goal, project path, and start time
- Click any session to see its best metric, experiment count, and outcome log
- Use the sidebar Baseline Session selector to overlay any past session on the Overview chart
Detailed per-iteration logs and terminal output.
- Hypothesis feed — every experiment's hypothesis, status badge, metric value, and timestamp
- Terminal panel — live stdout/stderr stream in a dark terminal window
- AI Research Assistant — natural-language analysis of your run via Ollama
Side-by-side diff of the watched script.
- Shows the before/after state of the file the agent is editing
- Linked to the latest Git commit when one is detected
- Updates in real time as the agent modifies the file
| Dependency | Minimum Version |
|---|---|
| Python | 3.10 |
| Node.js | 18 |
| npm | 9 |
| Git | any (required for timeline and rollback features) |
Optional for full feature set:
| Optional | Purpose |
|---|---|
| NVIDIA GPU + `nvidia-smi` | GPU monitoring panel |
| Ollama | AI Research Assistant |

    git clone https://github.com/your-org/AutoResearchUI.git
    cd AutoResearchUI
    python -m pip install -e .

This installs the `autoresearchui` CLI command into your active Python environment.

    npm install

That is everything. AutoResearchUI is now ready to use.
Open a terminal in your research repository (not the AutoResearchUI folder):

    cd /path/to/your-research-repo
    autoresearchui

AutoResearchUI will:
- Detect your Git repository
- Scan for likely script files, log files, and metric columns
- Auto-map the configuration (or prompt you if `--interactive-mapping` is set)
- Start the FastAPI backend on `http://127.0.0.1:8000`
- Start the Next.js dashboard on `http://127.0.0.1:3000`
- Open your browser automatically
The first start is the slowest because Next.js has to build the frontend; expect anywhere from about 5 to 30 seconds depending on your machine. Subsequent starts take under 2 seconds.
When you open the dashboard, you will see a dark sidebar on the left. If the mapping was not fully auto-detected, expand the Configuration section to set:
| Field | What to Enter |
|---|---|
| Workspace | Path to your research repo root. Click the folder icon to scan. |
| Target Script | The .py file the agent edits — usually train.py. |
| Log File | The metrics file — usually results.tsv, results.csv, or a .jsonl file. |
| Primary Metric | The column to plot — e.g. val_bpb, loss, accuracy, reward. |
| Optimization | Min if lower is better (loss, BPB), Max if higher is better (accuracy, reward). |
| Runtime Command | The shell command that runs one experiment. |
Click Apply Configuration to activate the mapping and begin tracking.
To start the research loop from the UI, click Start in the top header bar.
If your repo follows the canonical autoresearch layout:
research-repo/
├── train.py # Model, optimizer, main loop — the file the agent edits
├── prepare.py # Data prep, constants, evaluation helpers — rarely touched
├── program.md # Agent instructions, research goal, expected output files
├── pyproject.toml # Dependencies
├── results.tsv # Appended by each experiment: iteration, metric, hypothesis, status
└── run.log # Raw stdout from each run
AutoResearchUI will auto-detect this layout and suggest:
- `train.py` as Script to Watch
- `results.tsv` as Log File
- `val_bpb`, `loss`, or whichever metric column it finds as Primary Metric
- `minimize` as Optimization Goal (for loss-style metrics)
- `uv run train.py > run.log 2>&1` as Runtime Command (if found in `program.md`)
It also parses program.md for - [ ] / - [x] checkbox lines and displays them in the sidebar as the Agent Plan checklist.
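Checkbox extraction of this kind can be sketched with a single regular expression. The pattern and the `parse_plan` function below are illustrative assumptions; the actual parser may accept a different set of list markers or nesting rules.

```python
import re
from typing import Dict, List

# Matches "- [ ] text" and "- [x] text" (also "*" bullets and an uppercase X).
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*\S)\s*$")


def parse_plan(markdown_text: str) -> List[Dict[str, object]]:
    """Return checklist items as {"text": ..., "done": ...} dictionaries."""
    items = []
    for line in markdown_text.splitlines():
        match = CHECKBOX.match(line)
        if match:
            items.append({
                "text": match.group(2),
                "done": match.group(1).lower() == "x",
            })
    return items
```

Lines that are not checkboxes (headings, prose, plain bullets) are ignored, so the same function can be pointed at a readme as well as at `program.md`.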
Expected results.tsv format:

    iteration timestamp val_bpb train_loss hypothesis status
    1 2026-03-12T09:00:00Z 1.42 1.51 try wider hidden layer KEPT
    2 2026-03-12T09:04:00Z 1.31 1.39 raise learning rate KEPT
    3 2026-03-12T09:08:00Z 1.35 1.41 add weight decay DISCARDED

AutoResearchUI reads structured experiment logs incrementally: only new lines are parsed on each update, so large logs do not slow it down.
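Incremental reading is commonly implemented by remembering a byte offset between updates. The class below is a minimal sketch of that technique under stated assumptions: `IncrementalReader` is a hypothetical name, the log is UTF-8 and line-oriented, and a file that shrinks is treated as rewritten and read again from the start.

```python
import os
from typing import List


class IncrementalReader:
    """Return only the lines appended to a file since the previous call."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.offset = 0  # byte position of the first unread byte

    def read_new_lines(self) -> List[str]:
        if not os.path.exists(self.path):
            return []
        if os.path.getsize(self.path) < self.offset:
            # The file was truncated or replaced; start over.
            self.offset = 0
        with open(self.path, "rb") as handle:
            handle.seek(self.offset)
            chunk = handle.read()
        # Consume only up to the last newline so a half-written row
        # is picked up on the next call instead of being parsed early.
        end = chunk.rfind(b"\n")
        if end == -1:
            return []
        self.offset += end + 1
        return chunk[: end + 1].decode("utf-8").splitlines()
```

Holding back the trailing partial line matters for append-only logs, where the watcher can fire while the experiment is still in the middle of writing a row.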
CSV:

    iteration,timestamp,loss,accuracy,hypothesis,status
    1,2026-03-12T09:00:00Z,1.42,0.55,"try wider hidden layer",KEPT
    2,2026-03-12T09:04:00Z,1.31,0.58,"raise learning rate slightly",KEPT

TSV: same as CSV but tab-separated. The most common format for autoresearch-style repos.

    iteration timestamp val_bpb status hypothesis
    1 2026-03-12T09:00:00Z 1.42 KEPT try wider hidden layer

JSONL: one JSON object per line, each object being one experiment row.

    {"iteration": 1, "timestamp": "2026-03-12T09:00:00Z", "bpb": 1.42, "hypothesis": "wider layer", "status": "KEPT"}
    {"iteration": 2, "timestamp": "2026-03-12T09:04:00Z", "bpb": 1.31, "hypothesis": "higher lr", "status": "KEPT"}

JSON: a JSON array of objects where each object is one experiment row.

    [
      {"iteration": 1, "loss": 1.42, "status": "KEPT"},
      {"iteration": 2, "loss": 1.31, "status": "KEPT"}
    ]

Recommended columns (flexible; only `iteration` or a numeric metric column is strictly required):
| Column | Type | Purpose |
|---|---|---|
| `iteration` | integer | Row number / experiment index |
| `timestamp` | ISO 8601 string | When this experiment completed |
| `val_bpb` / `loss` / `accuracy` / ... | float | The primary optimization metric |
| `hypothesis` | string | Agent's reasoning for this change |
| `status` | `KEPT` \| `DISCARDED` \| `INFO` | Whether the result was accepted |
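The four formats can all be normalized into a list of row dictionaries. The following is a rough sketch using only the standard library; `parse_log` and `metric_series` are hypothetical helpers, and choosing the parser by file suffix is an assumption rather than documented behavior.

```python
import csv
import io
import json
from typing import List


def parse_log(text: str, suffix: str) -> List[dict]:
    """Parse CSV, TSV, JSONL, or JSON log text into a list of row dicts."""
    suffix = suffix.lower()
    if suffix in (".csv", ".tsv"):
        delimiter = "\t" if suffix == ".tsv" else ","
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    if suffix == ".jsonl":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if suffix == ".json":
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of experiment rows")
        return data
    raise ValueError(f"unsupported log format: {suffix!r}")


def metric_series(rows: List[dict], metric: str) -> List[float]:
    """Extract the chosen metric column as floats, skipping unusable rows."""
    values = []
    for row in rows:
        try:
            values.append(float(row[metric]))
        except (KeyError, TypeError, ValueError):
            continue
    return values
```

CSV and TSV values arrive as strings while JSON values are already numeric, which is why `metric_series` converts everything through `float`.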
autoresearchui [OPTIONS]
| Flag | Default | Description |
|---|---|---|
| `--project-root PATH` | current Git repo | Path to the research repo to monitor |
| `--interactive-mapping` | off | Prompt for script, log, metric, goal, and command in the terminal before starting |
| `--non-interactive` | on | Accept auto-detected mapping without prompting (this is the default) |
| `--no-open-browser` | off | Do not automatically open the dashboard in a browser |
| `--backend-only` | off | Start only the FastAPI backend (port 8000); skip the Next.js frontend |
| `--frontend-only` | off | Start only the Next.js frontend (port 3000); expect an external backend |
| `--skip-install` | off | Do not auto-install missing Python or npm dependencies |
| `--bootstrap-project` | off | Attempt to install the dependencies of the detected target research repo |
| `--backend-host HOST` | `127.0.0.1` | Host for the backend server |
| `--backend-port PORT` | `8000` | Port for the backend server |
| `--frontend-host HOST` | `127.0.0.1` | Host for the frontend server |
| `--frontend-port PORT` | `3000` | Port for the frontend server |
Standard: point at the current directory.

    cd /path/to/your-research-repo
    autoresearchui

Explicit path: point at a different repo.

    autoresearchui --project-root /path/to/research-repo

Interactive mode: choose every mapping field via terminal prompts.

    autoresearchui --interactive-mapping

Headless / CI: start without opening a browser.

    autoresearchui --no-open-browser

Backend only: useful when the frontend is already running.

    autoresearchui --backend-only

Every time you start AutoResearchUI and run experiments, the session is saved to `autoresearch.db` (a local SQLite file in the AutoResearchUI folder).
To compare a current run against a past session:
- Open Configuration in the left sidebar
- Under Baseline Session, select any previous session from the dropdown
- Return to Overview — the chart now shows both runs, with a difference banner
The banner reads:
- `IMPROVED +X%` in green if the current best is better than the baseline best
- `REGRESSED -X%` in red if the current run performs worse
Baselines are session-scoped — they do not affect your research repo in any way.
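This document does not spell out how the banner percentage is computed. One plausible reading, sketched below, is the relative change of the current best against the baseline best, with the sign flipped for minimize goals so that an improvement is always positive. The function name `baseline_banner`, the one-decimal formatting, and the `UNCHANGED` label for ties are all assumptions.

```python
def baseline_banner(current_best: float, baseline_best: float, goal: str = "minimize") -> str:
    """Format an IMPROVED/REGRESSED banner from two best-metric values."""
    if baseline_best == 0:
        raise ValueError("baseline best is zero; relative change is undefined")

    change = (current_best - baseline_best) / abs(baseline_best) * 100.0
    if goal == "minimize":
        change = -change  # a lower value counts as an improvement

    if change > 0:
        return f"IMPROVED +{change:.1f}%"
    if change < 0:
        return f"REGRESSED -{abs(change):.1f}%"
    return "UNCHANGED"  # hypothetical label; not part of the documented banner
```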
The Git Timeline view (GitBranch icon in the sidebar) shows how your best metric evolved commit-by-commit.
How it works:
AutoResearchUI correlates the metric data stored in SQLite with Git commit history. For each commit, it finds the best metric value recorded while that commit was the HEAD of the branch.
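The per-commit aggregation can be sketched as a simple grouping step. The example assumes each stored experiment already carries the SHA that was at HEAD when it ran; that record shape and the name `best_per_commit` are hypothetical, and the real schema in `autoresearch.db` may differ.

```python
from typing import Dict, Iterable, Tuple


def best_per_commit(records: Iterable[Tuple[str, float]], goal: str = "minimize") -> Dict[str, dict]:
    """Group (sha, metric) pairs into best metric and iteration count per commit."""
    pick = min if goal == "minimize" else max
    summary: Dict[str, dict] = {}
    for sha, value in records:
        entry = summary.setdefault(sha, {"best": value, "iterations": 0})
        entry["best"] = pick(entry["best"], value)
        entry["iterations"] += 1
    return summary
```

The result keeps commits in first-seen order, which maps directly onto the per-commit best metric and iteration count shown in the timeline.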
To use Rollback:
- Go to the Git Timeline view
- Hover over any commit row
- Click the Rollback button that appears
- Confirm the dialog
This runs git checkout <sha> on your research repo, putting it in detached HEAD state at that commit. Your working tree will match the state of the code at that point. From there you can branch or continue experimenting.
Note: Rollback does not delete any commits; it is non-destructive. You can return to the tip of your branch at any time with `git checkout main` (or whatever your branch is called).
If an NVIDIA GPU is present, AutoResearchUI polls nvidia-smi every 5 seconds and shows:
- GPU name and index
- Utilization percentage (progress bar)
- VRAM used and total (progress bar, shown in GB)
- Temperature in Celsius
- Power draw in Watts
The GPU panel only appears when at least one NVIDIA GPU is detected. On systems without nvidia-smi (AMD, Apple Silicon, CPU-only), the panel is silently hidden.
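A poller along these lines could collect the listed values with `nvidia-smi --query-gpu`. This is a sketch under assumptions: the exact query AutoResearchUI issues is not documented here, the helper names are hypothetical, and memory is converted from the MiB that `nvidia-smi` reports to GiB.

```python
import shutil
import subprocess
from typing import List, Optional

QUERY_FIELDS = [
    "index", "name", "utilization.gpu", "memory.used",
    "memory.total", "temperature.gpu", "power.draw",
]


def _to_float(text: str) -> Optional[float]:
    """Convert a field to float; nvidia-smi prints placeholders such as [N/A]."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_gpu_csv(output: str) -> List[dict]:
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for line in output.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != len(QUERY_FIELDS):
            continue  # skip malformed rows
        index, name, util, used, total, temp, power = parts
        used_mib, total_mib = _to_float(used), _to_float(total)
        gpus.append({
            "index": int(index),
            "name": name,
            "utilization_pct": _to_float(util),
            "vram_used_gib": None if used_mib is None else used_mib / 1024,
            "vram_total_gib": None if total_mib is None else total_mib / 1024,
            "temperature_c": _to_float(temp),
            "power_w": _to_float(power),
        })
    return gpus


def query_gpus() -> List[dict]:
    """Return GPU stats, or an empty list when nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    command = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=5)
    except (OSError, subprocess.TimeoutExpired):
        return []
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []
```

Returning an empty list on any failure matches the documented behavior of hiding the panel silently on machines without `nvidia-smi`.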
The Activity view includes an AI panel backed by Ollama running locally.
To use it:
- Install Ollama: https://ollama.ai
- Pull a model: `ollama pull llama3.2`
- Ensure Ollama is running: `ollama serve`
- In the Activity view, set the Model input to your model name (default: `llama3.2`)
- Click Ask AI
AutoResearchUI sends the last 10 experiment outcomes (hypothesis, metric value, status) to the model and asks it to explain what is working, what is not, and what to try next.
You can use any model available in Ollama — llama3.1, mistral, gemma2, phi3, etc. Larger models give better analysis.
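The request to Ollama can be approximated with its `/api/generate` endpoint. The sketch below only illustrates the idea: the prompt wording, the experiment record keys (`status`, `value`, `hypothesis`), and the helper names are assumptions, not the text or schema AutoResearchUI actually uses.

```python
import json
import urllib.request
from typing import List


def build_prompt(experiments: List[dict], metric: str, limit: int = 10) -> str:
    """Summarize the most recent experiments into a single analysis prompt."""
    recent = experiments[-limit:]
    lines = [
        f"- [{exp['status']}] {metric}={exp['value']}: {exp['hypothesis']}"
        for exp in recent
    ]
    return (
        "You are reviewing an automated research loop.\n"
        "Recent experiments:\n"
        + "\n".join(lines)
        + "\n\nExplain what is working, what is not, and what to try next."
    )


def ask_ollama(prompt: str, model: str = "llama3.2",
               host: str = "http://localhost:11434") -> str:
    """Send a non-streaming generate request to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

Calling `ask_ollama(build_prompt(experiments, "val_bpb"))` requires `ollama serve` to be running and the model to be pulled; otherwise `urlopen` raises an error that the caller should handle.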
Three export formats are available from the header bar on any view:
| Button | Output | Contents |
|---|---|---|
| `.md` | `research_summary.md` | Markdown file with project info, best metric, and full experiment table |
| `.html` | `research_summary.html` | Standalone HTML with an interactive Chart.js chart, experiment table, and GPU info; no internet required |
| `.svg` | `autoresearchui-<metric>-graph.svg` | SVG image of the current metric chart, suitable for papers, notebooks, or reports |
AutoResearchUI/
├── app_backend.py FastAPI server — WebSocket streaming, file watching, log parsing,
│ subprocess management, SQLite persistence, Git integration
├── autoresearchui_cli.py CLI launcher — repo detection, auto-mapping, service orchestration
├── Dashboard.tsx Next.js client component — the full dashboard UI
├── ProjectConfig.ts Shared TypeScript type definitions
├── app/
│ ├── page.tsx Next.js page (re-exports Dashboard)
│ ├── layout.tsx Root layout
│ └── globals.css Global styles
├── autoresearch.db SQLite database (created at runtime, stores sessions/experiments)
├── pyproject.toml Python package manifest
└── package.json Node.js package manifest
Your Research Repo
│
│ file events (watchdog)
│ git history (GitPython)
▼
app_backend.py ──── WebSocket ────► Dashboard.tsx
│ (browser)
│ SQLite (autoresearch.db)
│ nvidia-smi subprocess
│ Ollama HTTP (optional)
▼
/api/* REST endpoints
| Method | Path | Description |
|---|---|---|
| `GET` | `/api/health` | Service health, config status, watcher state |
| `GET` | `/api/state` | Full application snapshot (config, metrics, experiments, diff, stdout) |
| `GET` | `/api/discovery` | Scan project root for scripts, logs, metrics, commits, plan |
| `POST` | `/api/config` | Apply a new mapping configuration |
| `POST` | `/api/process/start` | Start the research command |
| `POST` | `/api/process/stop` | Stop the research command |
| `POST` | `/api/process/restart` | Restart the research command |
| `WS` | `/ws` | WebSocket stream; pushes `AppSnapshot` on every file change |
| `GET` | `/api/sessions` | List all past sessions from SQLite |
| `GET` | `/api/sessions/{id}` | Metric points and experiments for a specific session |
| `GET` | `/api/git/timeline` | Best metric per commit for the last 30 commits |
| `POST` | `/api/git/rollback?sha=<sha>` | `git checkout <sha>` on the project root |
| `GET` | `/api/export/markdown` | Download `research_summary.md` |
| `GET` | `/api/export/html` | Download standalone `research_summary.html` |
| `GET` | `/api/gpu` | Current GPU stats from `nvidia-smi` |
| `POST` | `/api/llm/analyze?model=<name>` | Analyze experiments with a local Ollama model |
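The REST endpoints can be called from any HTTP client. Below is a small standard-library sketch; the helper names are hypothetical, the default backend address from this document is assumed, and the response bodies are treated as JSON without assuming any particular fields, since their shape is not documented here.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

BASE_URL = "http://127.0.0.1:8000"  # default backend address


def api_url(path: str, **params: str) -> str:
    """Build a backend URL, appending query parameters when given."""
    query = urllib.parse.urlencode(params)
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def get_json(path: str, **params: str) -> Any:
    """Issue a GET request and decode the JSON response."""
    with urllib.request.urlopen(api_url(path, **params), timeout=10) as response:
        return json.load(response)


def post_json(path: str, body: Optional[dict] = None, **params: str) -> Any:
    """Issue a POST request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else b""
    request = urllib.request.Request(
        api_url(path, **params),
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the backend running, `get_json("/api/health")` checks service status and `post_json("/api/git/rollback", sha="<sha>")` triggers a rollback to the given commit.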
Run the services individually during development:
Backend:

    uvicorn app_backend:app --reload --host 127.0.0.1 --port 8000

Frontend:

    npm run dev

Open http://127.0.0.1:3000 in your browser.

Type-check the frontend:

    npx tsc --noEmit

Syntax-check the backend:

    python -m py_compile app_backend.py

Run tests:

    python -m pytest tests/

"Failed to load backend state" on first load
The backend is not yet running. Make sure uvicorn or autoresearchui started successfully. Check that port 8000 is free: netstat -an | grep 8000.
The chart shows no data
The log file has not been written yet, or the mapping points to the wrong file. Open the sidebar, verify Log File points to the correct path, and check that Primary Metric matches a column in that file. Use --interactive-mapping to review these in the terminal.
Diff view is empty
Diff tracking requires a Git repository. Make sure the project root is inside a Git repo and at least one commit exists.
GPU panel does not appear
NVIDIA GPU monitoring requires nvidia-smi to be on the system PATH. On non-NVIDIA systems (AMD, Apple Silicon, CPU-only), the panel is intentionally hidden.
AI Assistant returns "LLM unavailable"
Check that Ollama is installed and running (ollama serve). The model must be pulled first (ollama pull llama3.2). The endpoint defaults to http://localhost:11434.
The frontend takes a long time on first start
Next.js compiles the frontend on the first run. This is a one-time step that takes 15–30 seconds. Subsequent starts are fast.
AutoResearchUI is a local-only tool. It does not send data to any external service except Ollama (which also runs locally).
- The backend only watches files within the configured project root
- Process start requires a Git repository to be present (safety gate against accidental execution)
- WebSocket connections are origin-restricted to `127.0.0.1` and `localhost` by default
- SQLite data stays on your machine in `autoresearch.db`
- Rollback operations only run `git checkout`; they never delete branches or commits
Released under the MIT License. See LICENSE.