AutoResearchUI

Local-first observability dashboard for agentic research loops.

AutoResearchUI is an open-source sidecar application that gives your AI-driven research loop a dedicated real-time interface. It watches your research repository, streams live metrics, tracks experiment outcomes, shows code diffs, monitors hardware, and lets you review and compare runs over time — all without touching your existing research code.

What Is an "Agentic Research Loop"?

An agentic research loop is a pattern popularised by Andrej Karpathy where a language model autonomously runs experiments:

The agent reads program.md — a file describing the research goal and current plan
The agent edits train.py (or equivalent) — modifying hyperparameters, architecture, or training logic
The agent runs the experiment — uv run train.py > run.log 2>&1
The experiment writes results — metrics appended to results.tsv or similar
The agent evaluates — if the metric improved, the code change is committed ("ratchet"); if not, it is discarded
The loop repeats — the agent proposes the next hypothesis and edits the code again

This process can run for hours or days. AutoResearchUI makes that process visible — showing you exactly what the agent is doing, what metrics it is achieving, and what code it is writing, all streamed live to a browser dashboard.

What AutoResearchUI Is Not

It is not an agent itself. It observes agents, it does not replace them.
It does not modify your research repo on its own. It only reads files and watches for changes; the repo changes only when you explicitly start the research command or use Rollback.
It is not a hosted service. It runs entirely on your local machine.
It does not require changes to your research code. Point it at an existing repo and start watching.

Key Concepts

Concept	What It Means
Project Root	The directory of your research repo — the folder AutoResearchUI watches
Script to Watch	The file the agent edits most (usually `train.py`) — diffs are tracked against this file
Log File	The structured metrics file the experiment writes to (CSV, TSV, JSONL, JSON)
Primary Metric	The column from the log file to visualize — e.g. `val_bpb`, `loss`, `accuracy`
Optimization Goal	Whether lower (`minimize`) or higher (`maximize`) is better
Research Command	The shell command that runs one full experiment — e.g. `uv run train.py > run.log 2>&1`
Session	One continuous monitoring session. Sessions are stored in SQLite and can be compared later.
Ratchet	An experiment result labeled `KEPT` — a commit that improved the best metric

Features

Core Monitoring

Real-time metric chart with WebSocket streaming (no page refresh needed)
Experiment feed showing KEPT, DISCARDED, and INFO outcomes
Side-by-side code diff viewer for the watched script
Live stdout / stderr terminal panel
Process lifecycle controls: Start, Stop, Restart

Multi-Run Benchmarking

Every session is persisted to a local SQLite database (autoresearch.db)
Load any previous session as a baseline from the sidebar
The chart overlays current run vs. baseline with a dashed line
A banner shows IMPROVED +X% or REGRESSED -X% compared to the baseline best

Git Ratchet Timeline

The Timeline view shows a chart of best metric per commit across the last 30 commits
Each commit is labeled with the best metric it achieved and how many iterations ran at that point
A Rollback button on each commit lets you instantly revert the repo to any prior state (uses git checkout, puts repo in detached HEAD)

Agent Plan Tracker

If the research repo contains a program.md (or a README with checkboxes), AutoResearchUI parses it
The right sidebar shows the agent's current plan as a checklist, with completed items crossed out
Helps you track whether the agent is following its intended research protocol

GPU & Hardware Monitoring

Polls nvidia-smi every 5 seconds (silently skipped if no NVIDIA GPU is detected)
Right sidebar shows per-GPU: utilization, VRAM used/total, temperature, power draw
Progress bars update in real time during active training runs

AI Research Assistant

The Activity view includes an "Ask AI" panel backed by Ollama
Sends recent experiment context (metrics, hypotheses, outcomes) to a local LLM
Returns a natural-language explanation of why experiments succeeded or failed
Works with any model available in your local Ollama installation (default: llama3.2)

Export

.md — Markdown research summary with experiment table and best metric
.html — Standalone interactive HTML with a Chart.js chart, experiment table, and GPU info; zero external dependencies, sharable as a single file
.svg — Raw SVG of the metric chart for embedding in papers or notebooks

The Dashboard

AutoResearchUI has five views, accessible from the left sidebar:

Overview

The primary monitoring screen.

4 KPI cards at the top: Current Metric, Best Metric, Total Experiments, Keep Rate
Baseline comparison banner when a prior session is loaded for comparison
Live metric chart — blue line for current run, dashed gray line for baseline
Experiment records table — every iteration with status badge, metric value, and truncated hypothesis
Right panel — GPU stats, recent git commits, agent plan checklist, process status

Git Timeline

Track how the best metric has evolved across git commits.

Line chart with commits on the X axis and best metric on the Y axis
Per-commit metadata: short SHA, commit message, date, best metric reached, iteration count
Hover over any row to reveal a Rollback button that checks out that commit

Run History

Browse and compare previous research sessions.

Session list showing metric name, optimization goal, project path, and start time
Click any session to see its best metric, experiment count, and outcome log
Use the sidebar Baseline Session selector to overlay any past session on the Overview chart

Activity

Detailed per-iteration logs and terminal output.

Hypothesis feed — every experiment's hypothesis, status badge, metric value, and timestamp
Terminal panel — live stdout/stderr stream in a dark terminal window
AI Research Assistant — natural-language analysis of your run via Ollama

Code Diff

Side-by-side diff of the watched script.

Shows the before/after state of the file the agent is editing
Linked to the latest Git commit when one is detected
Updates in real time as the agent modifies the file

Installation

Requirements

Dependency	Minimum Version
Python	3.10
Node.js	18
npm	9
Git	any (required for timeline and rollback features)

Optional for full feature set:

Optional	Purpose
NVIDIA GPU + `nvidia-smi`	GPU monitoring panel
Ollama	AI Research Assistant

Step 1 — Clone AutoResearchUI

git clone https://github.com/your-org/AutoResearchUI.git
cd AutoResearchUI

Step 2 — Install Python Package

python -m pip install -e .

This installs the autoresearchui CLI command into the active Python environment.

Step 3 — Install Frontend Dependencies

npm install

That is everything. AutoResearchUI is now ready to use.

Quick Start

Open a terminal in your research repository (not the AutoResearchUI folder):

cd /path/to/your-research-repo
autoresearchui

AutoResearchUI will:

Detect your Git repository
Scan for likely script files, log files, and metric columns
Auto-map the configuration (or prompt you if --interactive-mapping is set)
Start the FastAPI backend on http://127.0.0.1:8000
Start the Next.js dashboard on http://127.0.0.1:3000
Open your browser automatically

The first start is slower because Next.js compiles the frontend (see "The frontend takes a long time on first start" under Troubleshooting); subsequent starts take under 2 seconds.

Configuration Walkthrough

When you open the dashboard, you will see a dark sidebar on the left. If the mapping was not fully auto-detected, expand the Configuration section to set:

Field	What to Enter
Workspace	Path to your research repo root. Click the folder icon to scan.
Target Script	The `.py` file the agent edits — usually `train.py`.
Log File	The metrics file — usually `results.tsv`, `results.csv`, or a `.jsonl` file.
Primary Metric	The column to plot — e.g. `val_bpb`, `loss`, `accuracy`, `reward`.
Optimization	`Min` if lower is better (loss, BPB), `Max` if higher is better (accuracy, reward).
Runtime Command	The shell command that runs one experiment.

Click Apply Configuration to activate the mapping and begin tracking.

To start the research loop from the UI, click Start in the top header bar.

Karpathy-Style `autoresearch` Repo

If your repo follows the canonical autoresearch layout:

research-repo/
├── train.py          # Model, optimizer, main loop — the file the agent edits
├── prepare.py        # Data prep, constants, evaluation helpers — rarely touched
├── program.md        # Agent instructions, research goal, expected output files
├── pyproject.toml    # Dependencies
├── results.tsv       # Appended by each experiment: iteration, metric, hypothesis, status
└── run.log           # Raw stdout from each run

AutoResearchUI will auto-detect this layout and suggest:

train.py as Script to Watch
results.tsv as Log File
val_bpb, loss, or whichever metric column it finds as Primary Metric
minimize as Optimization Goal (for loss-style metrics)
uv run train.py > run.log 2>&1 as Runtime Command (if found in program.md)

It also parses program.md for - [ ] / - [x] checkbox lines and displays them in the sidebar as the Agent Plan checklist.
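
A checklist parser of this kind can be quite small. The sketch below shows one way to extract `- [ ]` / `- [x]` lines with a regular expression; the name `parse_plan` and the choice to also accept `*` bullets and an uppercase `X` are assumptions made for illustration, not a description of AutoResearchUI's actual parser.

```python
import re
from typing import List, Tuple

# Matches "- [ ] task" and "- [x] task" (assumed: "*" bullets and "X" also count).
CHECKBOX = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.*\S)\s*$")


def parse_plan(markdown: str) -> List[Tuple[bool, str]]:
    """Return (done, text) pairs for every checkbox line, in document order."""
    items = []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match:
            items.append((match.group(1).lower() == "x", match.group(2)))
    return items


sample = """# Plan
- [x] Establish baseline val_bpb
- [ ] Try wider hidden layer
Some prose that is ignored.
"""
print(parse_plan(sample))
# [(True, 'Establish baseline val_bpb'), (False, 'Try wider hidden layer')]
```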

Expected results.tsv format:

iteration	timestamp	val_bpb	train_loss	hypothesis	status
1	2026-03-12T09:00:00Z	1.42	1.51	try wider hidden layer	KEPT
2	2026-03-12T09:04:00Z	1.31	1.39	raise learning rate	KEPT
3	2026-03-12T09:08:00Z	1.35	1.41	add weight decay	DISCARDED

Supported Log Formats

AutoResearchUI reads structured experiment logs incrementally: only new lines are parsed on each update, so a large log is not re-read in full every time it changes.
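
One common way to read a log incrementally is to remember a byte offset and, on each update, read only what lies beyond it. The sketch below illustrates that technique under stated assumptions: the class name `IncrementalLineReader` is hypothetical, and holding back a trailing partial line and resetting on truncation are design choices of this example rather than documented AutoResearchUI behavior.

```python
import os
import tempfile
from typing import List


class IncrementalLineReader:
    """Remember a byte offset and return only the complete lines added since."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.offset = 0

    def read_new_lines(self) -> List[str]:
        if os.path.getsize(self.path) < self.offset:
            self.offset = 0  # file was truncated or replaced; start over
        with open(self.path, "rb") as handle:
            handle.seek(self.offset)
            chunk = handle.read()
        # Hold back a trailing partial line: the writer may still be mid-row.
        end = chunk.rfind(b"\n") + 1
        self.offset += end
        return chunk[:end].decode("utf-8").splitlines()


with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "results.tsv")
    with open(log, "w", encoding="utf-8") as f:
        f.write("iteration\tval_bpb\n1\t1.42\n")
    reader = IncrementalLineReader(log)
    first = reader.read_new_lines()   # header plus the first row
    with open(log, "a", encoding="utf-8") as f:
        f.write("2\t1.31\n3\t1.")     # the last row is still being written
    second = reader.read_new_lines()  # only the completed row 2
    print(first, second)
```

Working in bytes keeps the stored offset valid for `seek`, which would not be guaranteed if the file were opened in text mode.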

CSV

iteration,timestamp,loss,accuracy,hypothesis,status
1,2026-03-12T09:00:00Z,1.42,0.55,"try wider hidden layer",KEPT
2,2026-03-12T09:04:00Z,1.31,0.58,"raise learning rate slightly",KEPT

TSV

Same as CSV but tab-separated. The most common format for autoresearch-style repos.

iteration	timestamp	val_bpb	status	hypothesis
1	2026-03-12T09:00:00Z	1.42	KEPT	try wider hidden layer

JSON Lines (`.jsonl` / `.ndjson`)

One JSON object per line, each object being one experiment row.

{"iteration": 1, "timestamp": "2026-03-12T09:00:00Z", "bpb": 1.42, "hypothesis": "wider layer", "status": "KEPT"}
{"iteration": 2, "timestamp": "2026-03-12T09:04:00Z", "bpb": 1.31, "hypothesis": "higher lr", "status": "KEPT"}

Plain JSON

A JSON array of objects where each object is one experiment row.

[
  {"iteration": 1, "loss": 1.42, "status": "KEPT"},
  {"iteration": 2, "loss": 1.31, "status": "KEPT"}
]

Recommended columns (flexible — only iteration or a numeric metric column is strictly required):

Column	Type	Purpose
`iteration`	integer	Row number / experiment index
`timestamp`	ISO 8601 string	When this experiment completed
`val_bpb` / `loss` / `accuracy` / ...	float	The primary optimization metric
`hypothesis`	string	Agent's reasoning for this change
`status`	`KEPT` | `DISCARDED` | `INFO`	Whether the result was accepted
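
To make the four formats concrete, here is a minimal sketch of a loader that dispatches on the file suffix and then pulls out one metric column. The function names `parse_rows` and `metric_series` are hypothetical, and skipping rows without a usable metric value is an assumption of this example; the real parser may behave differently.

```python
import csv
import io
import json
from typing import Dict, List


def parse_rows(text: str, suffix: str) -> List[Dict[str, object]]:
    """Parse log text into row dictionaries, dispatching on the file suffix."""
    suffix = suffix.lower()
    if suffix in (".csv", ".tsv"):
        delimiter = "\t" if suffix == ".tsv" else ","
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    if suffix in (".jsonl", ".ndjson"):
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if suffix == ".json":
        return json.loads(text)
    raise ValueError(f"unsupported log format: {suffix}")


def metric_series(rows: List[Dict[str, object]], metric: str) -> List[float]:
    """Extract one metric column as floats, skipping rows without a usable value."""
    values = []
    for row in rows:
        try:
            values.append(float(row[metric]))
        except (KeyError, TypeError, ValueError):
            continue
    return values


tsv = "iteration\tval_bpb\tstatus\n1\t1.42\tKEPT\n2\t1.31\tKEPT\n"
jsonl = '{"iteration": 1, "bpb": 1.42}\n{"iteration": 2, "bpb": 1.31}\n'
print(metric_series(parse_rows(tsv, ".tsv"), "val_bpb"))   # [1.42, 1.31]
print(metric_series(parse_rows(jsonl, ".jsonl"), "bpb"))   # [1.42, 1.31]
```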

CLI Reference

autoresearchui [OPTIONS]

Flag	Default	Description
`--project-root PATH`	current Git repo	Path to the research repo to monitor
`--interactive-mapping`	off	Prompt for script, log, metric, goal, and command in the terminal before starting
`--non-interactive`	on	Accept auto-detected mapping without prompting (this is the default)
`--no-open-browser`	off	Do not automatically open the dashboard in a browser
`--backend-only`	off	Start only the FastAPI backend (port 8000); skip the Next.js frontend
`--frontend-only`	off	Start only the Next.js frontend (port 3000); expect an external backend
`--skip-install`	off	Do not auto-install missing Python or npm dependencies
`--bootstrap-project`	off	Attempt to install the dependencies of the detected target research repo
`--backend-host HOST`	`127.0.0.1`	Host for the backend server
`--backend-port PORT`	`8000`	Port for the backend server
`--frontend-host HOST`	`127.0.0.1`	Host for the frontend server
`--frontend-port PORT`	`3000`	Port for the frontend server

Common Usage Patterns

Standard — point at the current directory:

cd /path/to/your-research-repo
autoresearchui

Explicit path — point at a different repo:

autoresearchui --project-root /path/to/research-repo

Interactive mode — choose every mapping field via terminal prompts:

autoresearchui --interactive-mapping

Headless / CI — start without opening a browser:

autoresearchui --no-open-browser

Backend only — useful when the frontend is already running:

autoresearchui --backend-only

Multi-Run Benchmarking

Every time you start AutoResearchUI and run experiments, the session is saved to autoresearch.db (a local SQLite file in the AutoResearchUI folder).

To compare a current run against a past session:

Open Configuration in the left sidebar
Under Baseline Session, select any previous session from the dropdown
Return to Overview — the chart now shows both runs, with a difference banner

The banner reads:

IMPROVED +X% in green if the current best is better than the baseline best
REGRESSED -X% in red if the current run performs worse

Baselines are session-scoped — they do not affect your research repo in any way.
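
The percentage in the banner depends on the optimization goal, since a lower value is an improvement for `minimize` and a regression for `maximize`. The sketch below shows one plausible way to compute it, as a relative change against the baseline best, signed so that positive always means better. The function name `compare_to_baseline`, the one-decimal rounding, and the `UNCHANGED` label are assumptions of this example; the README does not specify the exact formula AutoResearchUI uses.

```python
def compare_to_baseline(current_best: float, baseline_best: float,
                        goal: str = "minimize") -> str:
    """Format a banner label from the relative change; positive means better."""
    if baseline_best == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    change = (current_best - baseline_best) / abs(baseline_best) * 100.0
    if goal == "minimize":
        change = -change  # a lower value is an improvement
    if change > 0:
        return f"IMPROVED +{change:.1f}%"
    if change < 0:
        return f"REGRESSED {change:.1f}%"
    return "UNCHANGED"  # hypothetical label for an exact tie


print(compare_to_baseline(1.31, 1.42, goal="minimize"))  # IMPROVED +7.7%
print(compare_to_baseline(0.55, 0.58, goal="maximize"))  # REGRESSED -5.2%
```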

Git Timeline and Rollback

The Git Timeline view (GitBranch icon in the sidebar) shows how your best metric evolved commit-by-commit.

How it works:

AutoResearchUI correlates the metric data stored in SQLite with Git commit history. For each commit, it finds the best metric value recorded while that commit was the HEAD of the branch.
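
The correlation step amounts to grouping experiment records by commit and keeping the best value in each group. The following sketch assumes each stored record carries the SHA that was HEAD when the metric was recorded; that record shape, the function name `best_metric_per_commit`, and the SHAs in the example are all hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_metric_per_commit(
    records: Iterable[Tuple[str, float]], goal: str = "minimize"
) -> Dict[str, Tuple[float, int]]:
    """Map each commit SHA to (best metric, iteration count)."""
    pick = min if goal == "minimize" else max
    summary: Dict[str, Tuple[float, int]] = {}
    for sha, metric in records:
        if sha in summary:
            best, count = summary[sha]
            summary[sha] = (pick(best, metric), count + 1)
        else:
            summary[sha] = (metric, 1)
    return summary


# Made-up (sha, metric) pairs: two iterations on one commit, one on the next.
records = [("a1b2c3d", 1.42), ("a1b2c3d", 1.45), ("e4f5a6b", 1.31)]
print(best_metric_per_commit(records))
# {'a1b2c3d': (1.42, 2), 'e4f5a6b': (1.31, 1)}
```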

To use Rollback:

Go to the Git Timeline view
Hover over any commit row
Click the Rollback button that appears
Confirm the dialog

This runs git checkout <sha> on your research repo, putting it in detached HEAD state at that commit. Your working tree will match the state of the code at that point. From there you can branch or continue experimenting.

Note: Rollback does not delete any commits. It is non-destructive. You can return to the tip of your branch at any time with git checkout main (or whatever your branch is called).
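
For readers who want to script the same operation, a rollback can be sketched as a thin wrapper around `git checkout`. The helper names `rollback_command` and `rollback` are hypothetical, and validating the SHA before calling Git is a precaution added for this example, not a documented part of AutoResearchUI.

```python
import re
import subprocess
from typing import List

# Accept abbreviated or full hexadecimal SHAs only.
SHA_PATTERN = re.compile(r"[0-9a-fA-F]{7,40}")


def rollback_command(project_root: str, sha: str) -> List[str]:
    """Build the git invocation, rejecting anything that is not a plain hex SHA."""
    if not SHA_PATTERN.fullmatch(sha):
        raise ValueError(f"not a commit SHA: {sha!r}")
    # "git -C <dir>" runs the command as if git had been started in <dir>.
    return ["git", "-C", project_root, "checkout", sha]


def rollback(project_root: str, sha: str) -> None:
    """Check out `sha`, leaving the repo in detached HEAD state."""
    subprocess.run(rollback_command(project_root, sha), check=True)


print(rollback_command("/path/to/research-repo", "a1b2c3d"))
```

Passing the command as a list avoids shell interpretation of the SHA. Note that `git checkout` stops with an error instead of overwriting uncommitted changes that conflict with the target commit, so commit or stash local edits first.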

GPU Monitoring

If an NVIDIA GPU is present, AutoResearchUI polls nvidia-smi every 5 seconds and shows:

GPU name and index
Utilization percentage (progress bar)
VRAM used and total (progress bar, shown in GB)
Temperature in Celsius
Power draw in Watts

The GPU panel only appears when at least one NVIDIA GPU is detected. On systems without nvidia-smi (AMD, Apple Silicon, CPU-only), the panel is silently hidden.
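
The polling step can be approximated with `nvidia-smi`'s CSV query mode. The sketch below is an assumption about how such a poll might look, not AutoResearchUI's actual implementation: the helper names are hypothetical and the sample line is made up. With `nounits`, memory values are reported in MiB, so a dashboard showing GB would still need to convert them.

```python
import shutil
import subprocess
from typing import Dict, List

FIELDS = ["index", "name", "utilization.gpu", "memory.used",
          "memory.total", "temperature.gpu", "power.draw"]


def parse_gpu_csv(output: str) -> List[Dict[str, str]]:
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for line in output.strip().splitlines():
        values = [part.strip() for part in line.split(",")]
        if len(values) == len(FIELDS):
            gpus.append(dict(zip(FIELDS, values)))
    return gpus


def query_gpus() -> List[Dict[str, str]]:
    """Return GPU stats, or an empty list when nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


# Illustrative line in the shape nvidia-smi prints (values are invented).
sample = "0, NVIDIA GeForce RTX 4090, 87, 10240, 24564, 65, 310.25\n"
print(parse_gpu_csv(sample)[0]["memory.used"])  # 10240
```

Returning an empty list when the tool is missing mirrors the "silently hidden" behavior described above.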

AI Research Assistant

The Activity view includes an AI panel backed by Ollama running locally.

To use it:

Install Ollama: https://ollama.ai
Pull a model: ollama pull llama3.2
Ensure Ollama is running: ollama serve
In the Activity view, set the Model input to your model name (default: llama3.2)
Click Ask AI

AutoResearchUI sends the last 10 experiment outcomes (hypothesis, metric value, status) to the model and asks it to explain what is working, what is not, and what to try next.
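
A request of this kind can be sketched against Ollama's `/api/generate` endpoint. The prompt wording, the record keys (`status`, `metric`, `hypothesis`), and the function names below are assumptions made for illustration; only the endpoint, the `model` / `prompt` / `stream` request fields, and the `response` field of the reply come from Ollama's HTTP API.

```python
import json
import urllib.request
from typing import Dict, List


def build_prompt(experiments: List[Dict[str, object]], limit: int = 10) -> str:
    """Summarize the most recent experiments as plain text for the model."""
    lines = [
        f"- {e['status']}: metric={e['metric']} hypothesis={e['hypothesis']}"
        for e in experiments[-limit:]
    ]
    return (
        "These are recent results from an automated research loop:\n"
        + "\n".join(lines)
        + "\nExplain what is working, what is not, and what to try next."
    )


def ask_ollama(prompt: str, model: str = "llama3.2",
               host: str = "http://localhost:11434") -> str:
    """Send one non-streaming request to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


history = [
    {"status": "KEPT", "metric": 1.31, "hypothesis": "raise learning rate"},
    {"status": "DISCARDED", "metric": 1.35, "hypothesis": "add weight decay"},
]
print(build_prompt(history))
```

With Ollama running, `ask_ollama(build_prompt(history))` returns the model's answer as a string.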

You can use any model available in Ollama, such as llama3.1, mistral, gemma2, or phi3. Larger models generally give better analysis but respond more slowly.

Export

Three export formats are available from the header bar on any view:

Button	Output	Contents
`.md`	`research_summary.md`	Markdown file with project info, best metric, and full experiment table
`.html`	`research_summary.html`	Standalone HTML with an interactive Chart.js chart, experiment table, and GPU info — no internet required
`.svg`	`autoresearchui-<metric>-graph.svg`	SVG image of the current metric chart — suitable for papers, notebooks, or reports

Architecture

AutoResearchUI/
├── app_backend.py          FastAPI server — WebSocket streaming, file watching, log parsing,
│                           subprocess management, SQLite persistence, Git integration
├── autoresearchui_cli.py   CLI launcher — repo detection, auto-mapping, service orchestration
├── Dashboard.tsx           Next.js client component — the full dashboard UI
├── ProjectConfig.ts        Shared TypeScript type definitions
├── app/
│   ├── page.tsx            Next.js page (re-exports Dashboard)
│   ├── layout.tsx          Root layout
│   └── globals.css         Global styles
├── autoresearch.db         SQLite database (created at runtime, stores sessions/experiments)
├── pyproject.toml          Python package manifest
└── package.json            Node.js package manifest

Data Flow

Your Research Repo
       │
       │  file events (watchdog)
       │  git history (GitPython)
       ▼
app_backend.py  ──── WebSocket ────►  Dashboard.tsx
       │                               (browser)
       │  SQLite (autoresearch.db)
       │  nvidia-smi subprocess
       │  Ollama HTTP (optional)
       ▼
 /api/* REST endpoints

All API Endpoints

Method	Path	Description
`GET`	`/api/health`	Service health, config status, watcher state
`GET`	`/api/state`	Full application snapshot (config, metrics, experiments, diff, stdout)
`GET`	`/api/discovery`	Scan project root for scripts, logs, metrics, commits, plan
`POST`	`/api/config`	Apply a new mapping configuration
`POST`	`/api/process/start`	Start the research command
`POST`	`/api/process/stop`	Stop the research command
`POST`	`/api/process/restart`	Restart the research command
`WS`	`/ws`	WebSocket stream — pushes `AppSnapshot` on every file change
`GET`	`/api/sessions`	List all past sessions from SQLite
`GET`	`/api/sessions/{id}`	Metric points and experiments for a specific session
`GET`	`/api/git/timeline`	Best metric per commit for the last 30 commits
`POST`	`/api/git/rollback?sha=<sha>`	`git checkout <sha>` on the project root
`GET`	`/api/export/markdown`	Download `research_summary.md`
`GET`	`/api/export/html`	Download standalone `research_summary.html`
`GET`	`/api/gpu`	Current GPU stats from `nvidia-smi`
`POST`	`/api/llm/analyze?model=<name>`	Analyze experiments with a local Ollama model
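
These endpoints can be called from a script using only the Python standard library. The sketch below assumes the default backend address and that responses are JSON; the response shapes are not documented here, so the example decodes them without interpreting any fields. The helper names `api_url` and `call` are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

BASE_URL = "http://127.0.0.1:8000"  # default backend host and port


def api_url(path: str, **params: str) -> str:
    """Join the base URL, an endpoint path, and optional query parameters."""
    query = f"?{urllib.parse.urlencode(params)}" if params else ""
    return f"{BASE_URL}{path}{query}"


def call(method: str, path: str, **params: str) -> Any:
    """Issue a request and decode the body, assuming it is JSON."""
    request = urllib.request.Request(api_url(path, **params), method=method)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())


print(api_url("/api/health"))
print(api_url("/api/llm/analyze", model="llama3.2"))
```

With the backend running, `call("GET", "/api/health")` checks the service and `call("GET", "/api/sessions")` lists past sessions.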

Development

Run the services individually during development:

Backend:

uvicorn app_backend:app --reload --host 127.0.0.1 --port 8000

Frontend:

npm run dev

Open http://127.0.0.1:3000 in your browser.

Type-check the frontend:

npx tsc --noEmit

Syntax-check the backend:

python -m py_compile app_backend.py

Run tests:

python -m pytest tests/

Troubleshooting

"Failed to load backend state" on first load

The backend is not yet running. Make sure uvicorn or autoresearchui started successfully. Check that port 8000 is free: netstat -an | grep 8000.

The chart shows no data

The log file has not been written yet, or the mapping points to the wrong file. Open the sidebar, verify Log File points to the correct path, and check that Primary Metric matches a column in that file. Use --interactive-mapping to review these in the terminal.

Diff view is empty

Diff tracking requires a Git repository. Make sure the project root is inside a Git repo and at least one commit exists.

GPU panel does not appear

NVIDIA GPU monitoring requires nvidia-smi to be on the system PATH. On non-NVIDIA systems (AMD, Apple Silicon, CPU-only), the panel is intentionally hidden.

AI Assistant returns "LLM unavailable"

Check that Ollama is installed and running (ollama serve). The model must be pulled first (ollama pull llama3.2). The endpoint defaults to http://localhost:11434.

The frontend takes a long time on first start

Next.js compiles the frontend on the first run. This is a one-time step that takes 15–30 seconds. Subsequent starts are fast.

Trust and Safety

AutoResearchUI is a local-only tool. The only service it sends data to is Ollama, which also runs on your machine.

The backend only watches files within the configured project root
Process start requires a Git repository to be present (safety gate against accidental execution)
WebSocket connections are origin-restricted to 127.0.0.1 and localhost by default
SQLite data stays on your machine in autoresearch.db
Rollback operations only run git checkout — they never delete branches or commits

License

Released under the MIT License. See LICENSE.
