lixinqi/Experience
Experience

A self-improving harness framework built as a PyTorch extension. Composable tensor ops + trainable experience + autograd = portable Modules that improve through training.

The result: portable, self-improving agents that share the same infrastructure but learn different skills through training. Like Claude Code slash commands, but standardized — typed tensors, autograd, shape contracts, trainable experience.

Example: LLM Coder Simulator

An LLM-powered coder composed entirely from reusable ops — no custom logic except a termination validator. Two chained experts collaborate: Expert 1 decides WHAT to do, Expert 2 knows HOW to do it.

# [1, 30] live terminal text
capture_op       = ft_tmux_capture_pane(workspace)
# [1, 30] → "text:hint"|"ctrl:hint"
decision_expert  = ft_expert(capture_op, decision_exp)
# [1, 30] → "echo hello"|"Enter"
cmd_expert       = ft_expert(decision_expert, cmd_exp)
# [1, 30] routes by prefix
switched_send    = ft_switch(decision_expert, [("text",…),("ctrl",…)])
# [1, 30]
validator        = ft_coder_validator(ft_sequential(switched_send, sleep, capture))
# [1] reduced
output           = ft_recurrent(validator)

Every op except ft_coder_validator is shared infrastructure reusable across all harness models.

Chain Experts, Don't Parse

Instead of one monolithic expert + custom parsers:

# BAD: custom parsing, not reusable, not trainable
decision = parse_action(expert_output)  # brittle regex
cmd = extract_command(expert_output)    # another custom parser

Chain two experts — each with its own trainable experience:

# GOOD: composable, trainable, extensible
decision_expert = ft_expert(capture_op, decision_experience, topk=2)   # WHAT
cmd_expert = ft_expert(decision_expert, cmd_experience, topk=2)         # HOW

Expert 1's output feeds directly as Expert 2's input. New action types don't require changing Expert 2 — just add experience entries to Expert 1.
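
The extensibility claim can be illustrated with a toy sketch — plain dict lookups standing in for ft_expert retrieval, all names hypothetical. Expert 1's output string is expert 2's input key, so a new action type only touches expert 1's table:

```python
# Toy stand-ins for two chained experts: each "expert" is a lookup
# table (its experience); no parsing happens between them.
decision_table = {
    "empty prompt": "text:enter a shell command",
    "pending command": "ctrl:press Enter to execute",
}
cmd_table = {
    "text:enter a shell command": "echo hello",
    "ctrl:press Enter to execute": "Enter",
}

def run_pipeline(observation: str) -> str:
    decision = decision_table[observation]  # WHAT (expert 1)
    return cmd_table[decision]              # HOW (expert 2)

print(run_pipeline("empty prompt"))  # echo hello
```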

Experience Tensors (Learnable Weights)

# Decision expert: terminal observation → action_type:hint
decision_experience = make_tensor([
    ["no other text after the prompt",   "command line is empty",    "text:enter a shell command"],
    ["an echo command after the prompt", "command line has content", "ctrl:press Enter to execute"],
], tmpdir)

# Cmd expert: hint → real command
cmd_experience = make_tensor([
    ["text:enter a shell command\ntask: list the directory", "need ls",     "ls"],
    ["text:enter a shell command\ntask: print hello",        "need echo",   "echo hello world"],
    ["ctrl:press Enter to execute",                          "press Enter", "Enter"],
], tmpdir)

These are [N, 3] QKV tensors — trainable via backward + StSGD. More experience = better generalization.

Portable Modules

A harness Module is defined by:

  1. Experience tensors — the learned skill (trainable via autograd)
  2. Validator — the termination condition (harness-specific)
  3. Pipeline structure — identical across all modules (shared ops)

Same structure + different experience = different behavior:

| Module   | Experience 1           | Experience 2         | Validator              |
| -------- | ---------------------- | -------------------- | ---------------------- |
| Coder    | terminal → action_type | hint → shell cmd     | prompt returns to idle |
| Debugger | error → strategy       | strategy → debug cmd | tests pass             |
| Deployer | status → action        | action → deploy cmd  | service healthy        |

Run the example:

python -m experience.future_tensor.test.test_llm_coder_simulator

Why a Compute Graph?

Programs do three things: execute sequentially, branch, and loop. We compact all three into typed tensor ops:

| Control flow | Op            | What it does                 |
| ------------ | ------------- | ---------------------------- |
| Sequential   | ft_sequential | Do A then B                  |
| Branch       | ft_switch     | If X then A else B           |
| Loop         | ft_recurrent  | Repeat until validator passes |
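
As a rough mental model (not the library's actual API), the three ops behave like lazy thunk combinators — nothing runs until the output node is pulled:

```python
# Hypothetical thunk-based sketch of the three control-flow ops.
# A "node" is a zero-argument callable that produces a value on pull.

def sequential(a, b):
    # Do A, then B; yield B's result.
    return lambda: (a(), b())[1]

def switch(selector, branches):
    # Pick a branch by the prefix of the selector's output ("tag:rest").
    def node():
        tag = selector().split(":", 1)[0]
        return dict(branches)[tag]()
    return node

def recurrent(body, validator, max_iters=10):
    # Re-run the body until the validator accepts its output.
    def node():
        for _ in range(max_iters):
            out = body()
            if validator(out):
                return out
        raise RuntimeError("validator never passed")
    return node
```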

This makes agent generation trivial for LLMs. Instead of writing correct imperative code (hard — shared state, error handling, async coordination), the LLM composes a graph from ~10 typed ops (easy — slot-filling with shape verification). Three properties make this uniquely LLM-friendly:

  1. Structure vs behavior separation. The LLM generates the graph (structure) once. Behavior is determined by experience tensors, which self-improve through training. The LLM doesn't need to get behavior right at generation time.
  2. Shape contracts = static verification. If shapes don't match, the composition is wrong — detectable before execution. LLMs make fewer structural errors than behavioral errors.
  3. Fault-tolerant generation. Even poor seed experience gets corrected by training. The graph just needs to be structurally correct.
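
Property 2 can be sketched as a pre-execution check over declared shapes (hypothetical helper and pipeline description, not the library's verifier):

```python
# Hypothetical pre-execution shape check: each op declares the shape it
# consumes and produces; composition fails fast if contracts mismatch.
def check_pipeline(ops):
    for upstream, downstream in zip(ops, ops[1:]):
        if upstream["out_shape"] != downstream["in_shape"]:
            raise ValueError(
                f"{upstream['name']} -> {downstream['name']}: "
                f"{upstream['out_shape']} != {downstream['in_shape']}"
            )

pipeline = [
    {"name": "capture",  "in_shape": None,    "out_shape": [1, 30]},
    {"name": "decision", "in_shape": [1, 30], "out_shape": [1, 30]},
    {"name": "cmd",      "in_shape": [1, 30], "out_shape": [1, 30]},
]
check_pipeline(pipeline)  # passes; a mismatch raises before any LLM call
```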

How It Works

Tensor Elements Are Files

Each tensor element is a text file on disk. Numeric coefficients flow through standard PyTorch autograd while symbolic content (code, translations, commands) lives in files.

{relative_to}/{tensor_uid}/
├── shape                    # JSON: [2, 3]
├── storage/
│   ├── 0/data               # Element at flat index 0
│   ├── 1/data               # Element at flat index 1
│   └── 1/1/data             # Multi-digit index 11
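
Assuming, from the layout above, that each decimal digit of the flat index becomes one directory level, the element path can be computed as (hypothetical helper):

```python
from pathlib import Path

def element_path(root: Path, tensor_uid: str, flat_index: int) -> Path:
    # Each decimal digit of the flat index is one directory level,
    # matching the layout above (index 11 -> storage/1/1/data).
    digits = list(str(flat_index))
    return root / tensor_uid / "storage" / Path(*digits) / "data"

print(element_path(Path("/tmp"), "t0", 11))  # /tmp/t0/storage/1/1/data
```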

ExperienceTensor

An [N, 3] tensor where each row is (query, key, value):

  • Query: semantic keywords for Jaccard similarity retrieval
  • Key: source domain content
  • Value: target domain content

Starts empty, populated during training. The backward pass computes diffs, the optimizer patches entries.
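
Retrieval over the query column can be sketched with plain Jaccard similarity on word sets — a simplification of whatever tokenization the library actually uses, with hypothetical helper names:

```python
def jaccard(a: str, b: str) -> float:
    # Jaccard similarity on whitespace-split word sets.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve(experience, query, topk=2):
    # experience: list of (query_keywords, key, value) rows, like [N, 3].
    scored = sorted(experience, key=lambda row: jaccard(row[0], query), reverse=True)
    return scored[:topk]

exp = [
    ("prompt empty no text", "command line is empty", "text:enter a shell command"),
    ("prompt has pending echo", "command line has content", "ctrl:press Enter"),
]
best = retrieve(exp, "the prompt is empty", topk=1)[0]
```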

FutureTensor (Lazy Async Ops)

A FutureTensor is a scalar torch.Tensor monkey-patched with ft_* attributes — a reference to a pending computation. Elements materialize on-demand via LLM calls.

Op Purpose
ft_expert Query + retrieval + LLM translation (chainable)
ft_switch Lazy control-flow branch selection
ft_sequential Sequential evaluation with early-return on error
ft_recurrent Generate-validate retry loop (reduces last dim)
ft_tmux_* Terminal session ops (create, send_text, send_ctrl, capture)
ft_slice / ft_unsqueeze Shape manipulation with autograd

Dual-Channel Gradients

| Channel  | Carries                      | Computed by                     |
| -------- | ---------------------------- | ------------------------------- |
| Numeric  | Float coefficients (bfloat16) | Standard autograd               |
| Symbolic | Unified diffs in files       | LLM compares actual vs expected |
The optimizer (StSGD) applies both: an SGD update for the numeric channel, and the patch CLI to apply diffs to the experience files for the symbolic channel.
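
A minimal sketch of one dual-channel step — a string replace stands in for the real unified-diff patching, and all names are hypothetical:

```python
class DualChannelSGD:
    """Toy dual-channel optimizer step: numeric SGD on a coefficient,
    plus a textual patch applied to a symbolic experience entry."""

    def __init__(self, lr=0.1):
        self.lr = lr

    def step(self, coeff, numeric_grad, entry, symbolic_diff):
        coeff = coeff - self.lr * numeric_grad  # numeric channel: SGD
        old, new = symbolic_diff                # symbolic channel:
        entry = entry.replace(old, new)         # stand-in for patch(1)
        return coeff, entry

opt = DualChannelSGD(lr=0.1)
coeff, entry = opt.step(1.0, 0.5, "need ls", ("ls", "ls -la"))
# coeff moved down the gradient; entry text was patched
```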

Second Derivative (Meta-Learning)

The LLM reflects on its own reflections:

backward_dispatch_start = torch.ones((), dtype=torch.bfloat16, requires_grad=True)
anchored = need_reflection(input_ft, backward_dispatch_start)
output = model(anchored)
loss = ft_mean(output)

loss.backward(create_graph=True)

records = []
with dispatch_policy(TracePolicy(records)):
    backward_dispatch_start.grad.backward()
# records: list of ReflectionRecord(fn, inputs, output, timestamp)

Design Principles

  • Static DAG, not imperative orchestration — compose a graph of lazy tensors, then pull from output
  • Chain experts, don't parse — two chained ft_expert ops replace custom parsers
  • No shared mutable state — coordination through tensor coordinates
  • Observe the world directly — read live state, no shadow copies
  • Self-improving — experience accumulates knowledge via backward + optimizer step
  • Experience starts empty — learned entirely at runtime, cold-start supported
  • LLM as compute kernel — replaces matrix multiplication with semantic reasoning

Roadmap

  1. Stage 0: Tmux-forced coding agent capture (immediate). Hook Claude Code (via MCP or shell wrapper) to route all bash commands through tmux sessions. Capture every terminal interaction as (observation, decision, command) triples. Write them directly into experience tensors for harness model cold start. No harness model needed — pure data harvesting from real coding sessions.
  2. Adapter mechanism for existing coding agents. Two directions:
    • Agent → Harness Module: wrap Claude Code / OpenCode / OpenClaw / Hermes as a Harness Module — their tool interfaces become ft_* ops in the compute graph, gaining autograd and self-improvement for free.
    • Harness Module → Agent skill: export a trained Harness Module as a coding agent tool/skill — any agent can invoke it as a composable capability without knowing the internals.
  3. Ground-truth terminal interactions sharing. Two mechanisms:
    • Capture streams hub: record live terminal sessions into standardized experience tensors via tmux capture. A central hub aggregates streams from multiple developers/agents into a shared experience pool.
    • Learn from existing interactions: train harness agents on recorded ground-truth sessions — enabling experience transfer across agents. One agent's successful terminal interaction becomes another agent's seed experience.
  4. Meta tasks learning. For each new code repo, bootstrap experience through self-supervised tasks that require no human labels:
    • Masked code reconstruction: mask a code region, train the agent to reconstruct it from surrounding context. Teaches code structure and local patterns.
    • Docstring ↔ code: given a docstring, generate the implementation (and vice versa). Teaches intent-to-code mapping.
    • Code coverage by tests: given a function, generate tests that maximize coverage. Teaches behavioral understanding.
    • Runtime stack prediction: given a call site, predict the runtime call stack. Teaches control flow and dependency reasoning.
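
The first meta task can be sketched as a label-free training-pair generator (hypothetical helper; real masking would be syntax-aware rather than character-span based):

```python
def masked_pairs(source: str, span: tuple[int, int]):
    """Produce one (masked_context, target) training pair by masking
    the character span [start, end) of a source file. No labels are
    needed: the target is the original text itself."""
    start, end = span
    masked = source[:start] + "<MASK>" + source[end:]
    return masked, source[start:end]

code = "def add(a, b):\n    return a + b\n"
ctx, target = masked_pairs(code, (19, 31))
# ctx has the body replaced by <MASK>; target is "return a + b"
```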

Suggested Study: Progressive Research

Four stages of increasing autonomy — each stage removes one human-in-the-loop dependency:

| Stage | Forward            | Experience Update                        | Graph Update                                             |
| ----- | ------------------ | ---------------------------------------- | -------------------------------------------------------- |
| 0     | Claude Code in tmux | captured terminal sessions → experience | manual graph                                             |
| 1     | 0th reflection     | coding agent directly                    | coding agent directly                                    |
| 2     | 0th reflection     | 1st reflection (autograd)                | coding agent guided by 2nd reflection trace              |
| 3     | 0th reflection     | 1st reflection (autograd)                | bootstrapped harness model guided by 2nd reflection trace |

Stage 0: Ground-truth capture for cold start. Force Claude Code (or any coding agent) to run all bash commands through tmux sessions instead of direct shell execution. Capture the full terminal output stream — every command typed, every response observed, every decision made. These captured sessions become ground-truth experience entries for harness model cold start: terminal observation → decision → command triples, written directly into experience tensors. No harness model runs yet — this stage purely harvests real-world coding interactions as training data.

Stage 1: Manual iteration. The harness runs forward (0th reflection). A coding agent (Claude Code, etc.) inspects results and directly edits experience tensors and the compute graph. This is the current working state — human-assisted improvement.

Stage 2: Self-improving experience, assisted graph evolution. Forward runs as before. The 1st derivative (backward pass) automatically updates experience via StSGD — no human needed for experience improvement. For graph structure changes, a coding agent reads the 2nd derivative trace (ReflectionRecord list) and decides which ops to add/remove/rewire.

Stage 3: Fully autonomous. Same as Stage 2, but the coding agent for graph updates is itself a trained harness model — a "meta-harness" whose experience is "how to improve compute graphs given 2nd derivative traces." The system bootstraps its own architecture search.

Current status: Stage 0 is ready (the tmux capture infrastructure works) and Stage 1 is being implemented — forward pass, experience tensors, compute graph composition, speculative multi-action dispatch, and 2nd derivative trace recording all work. Experience updates and graph updates still require a coding agent. Stages 2 and 3 are TODO.

Thesis

Base LLM weights carry two things: general capabilities (reasoning, reflection) and memories (facts, patterns, code idioms). Most of the parameters are memories.

This project externalizes memory into trainable experience tensors — retrievable, patchable, shareable, and composable. The base LLM only needs to be good at reasoning and reflection. Everything else lives in experience.

If Stage 3's self-referencing succeeds — a harness model improving its own compute graph via 2nd derivative traces — the system becomes an open-ended self-improver. Reasoning reflects on reasoning, experience accumulates without bound, and architecture evolves autonomously. That's the path.

Internals

Two LLM backends: raw_llm_api (OpenAI-compatible, lightweight) and coding_agent (Claude Agent SDK with file tools), dispatched concurrently via asyncio.gather. Autograd functions in symbolic_tensor/function/ implement StMoe (query → retrieval → LLM translate), attention, slicing (symlink views vs copies), stack, fork, merge, and edit-distance loss. Tensor utilities (tensor_util/) provide make_tensor, get_diff_tensor, patch_tensor (unified diffs, cold-start support), and Pythonic methods registered on torch.Tensor (st_pack, st_patch, st_get_diff, st_view_slicer[...], etc.).
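
The concurrent backend dispatch can be sketched with stub coroutines — the backend names follow the text, but their signatures here are hypothetical:

```python
import asyncio

async def raw_llm_api(prompt: str) -> str:
    # Stub for the lightweight OpenAI-compatible backend.
    await asyncio.sleep(0)  # stand-in for the network call
    return f"raw:{prompt}"

async def coding_agent(prompt: str) -> str:
    # Stub for the Claude Agent SDK backend with file tools.
    await asyncio.sleep(0)
    return f"agent:{prompt}"

async def dispatch(prompt: str):
    # Both backends are queried concurrently; results arrive together.
    return await asyncio.gather(raw_llm_api(prompt), coding_agent(prompt))

results = asyncio.run(dispatch("ls"))
print(results)  # ['raw:ls', 'agent:ls']
```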

Installation

pip install torch openai claude-agent-sdk Levenshtein

Requires Python 3.13+.

Architecture

experience/
├── symbolic_tensor/
│   ├── function/          # Autograd Functions: st_moe, st_attention, st_stack, slice_*, merge, fork, copy, loss
│   ├── tensor_util/       # Symbolic tensor primitives: make, slice, assign, diff, patch, dense/sparse
│   ├── module/            # nn.Module wrappers: StMoeModule, WithDenseView
│   ├── optimizer/         # StSGD: dual-channel (numeric + symbolic patch) optimizer
│   ├── data_loader/       # Batch data loading from files
│   └── test/              # Integration tests
├── future_tensor/
│   ├── future_tensor.py   # FutureTensor factory: lazy async scalar tensor with ft_* attributes
│   ├── status.py          # Status tagged union (confidence, scbf, kContextOverflow, etc.)
│   ├── function/          # Autograd ops: slice, unsqueeze, recurrent, expert, switch, sequential, tmux
│   └── backward_dispatch/ # 2nd-derivative framework: policies, dispatcher, GradFn wrappers
├── llm_client/            # LLM backends: raw API (OpenAI-compatible) and coding agent (Claude SDK)
├── sparse_util/           # Sparse coordinate operations (transpose, convert)
├── fs_util/               # File system utilities (directory packing, path enumeration, text merger)
├── test/                  # End-to-end tests and benchmarks
└── example/               # Training demos
