orb-evo

Evolutionary prompt optimization for Orb's pipeline.

Two optimization modes:

Editor evolution — evolves editor prompt strings (patch/rewrite instructions). Scored deterministically via Orb's own audit scanners.
Director evolution — evolves the system prompt or director preamble by running director → writer → LLM judge. Scored on writing quality against real conversation data mined from Orb's DB.

Built on DSPy plus a custom evolutionary loop. A strong API model handles mutation, prompt writing, analysis, and judging, while a local model runs the Orb pipeline as the optimization target.

How It Works

Editor Mode

1. Load baseline prompt from Orb's tool_defs.py
2. Generate synthetic eval examples with known flaws (banned phrases, repetition, etc.)
3. Evaluate baseline → get scores + traces
4. For each generation:
   a. Mutator (API model): analyze traces, propose mutation strategy
   b. Prompt Writer (API model): write new candidate from strategy
   c. Validate constraints (size, growth, tool references, injection safety)
   d. Evaluate candidate via Orb's actual editor pipeline (ReAct tool-calling loop)
   e. Score with deterministic audit
5. Return best candidate → save evolved prompt + diff + metrics
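
The generation loop above can be sketched roughly as follows. This is a minimal illustration, not orb-evo's actual code: the `propose`, `is_valid`, and `score` callables are hypothetical stand-ins for the mutator plus prompt writer, the constraint check, and the pipeline run plus audit, and the greedy "mutate the current best" selection rule is an assumption.

```python
from typing import Callable, List, Tuple


def evolve(
    baseline: str,
    propose: Callable[[str, float], str],  # hypothetical: mutator + prompt writer
    is_valid: Callable[[str, str], bool],  # hypothetical: constraint check (candidate, baseline)
    score: Callable[[str], float],         # hypothetical: pipeline run + deterministic audit
    iterations: int = 3,
    population: int = 3,
) -> Tuple[str, float, List[Tuple[int, str, float]]]:
    """Greedy evolutionary loop: each generation mutates the best prompt so far."""
    best_prompt, best_score = baseline, score(baseline)
    history = [(0, baseline, best_score)]  # (generation, prompt, score)
    for gen in range(1, iterations + 1):
        # Freeze the parent so every candidate in a generation starts from the same prompt.
        parent, parent_score = best_prompt, best_score
        for _ in range(population):
            candidate = propose(parent, parent_score)
            if not is_valid(candidate, baseline):
                continue  # rejected candidates are never evaluated
            candidate_score = score(candidate)
            history.append((gen, candidate, candidate_score))
            if candidate_score > best_score:
                best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score, history
```

If no candidate beats the baseline, the baseline itself is returned, which matches the "diff (if changed)" behaviour described under Output.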

Director Mode

1. Mine real conversations from Orb's DB (character cards, history, director/writer outputs)
2. Load baseline system prompt or director preamble from Orb
3. Evaluate baseline: director call → writer call → LLM judge scores output 0-10
4. For each generation:
   a. Mutator (API model): analyze judge feedback, propose mutation strategy
   b. Prompt Writer (API model): rewrite prompt following strategy
   c. Validate constraints (size, growth)
   d. Evaluate: run full director → writer → judge pipeline with candidate prompt
5. Return best candidate → save evolved prompt + diff + metrics
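
One way to picture the director → writer → judge evaluation is the sketch below. The three callables are hypothetical placeholders for the real model calls, and averaging the clamped judge scores across examples is an assumption about how per-example scores are aggregated.

```python
from statistics import mean
from typing import Any, Callable, Dict, List

Example = Dict[str, Any]


def evaluate_director_prompt(
    prompt: str,
    examples: List[Example],
    director: Callable[[str, Example], str],  # hypothetical: target model, director call
    writer: Callable[[str, Example], str],    # hypothetical: target model, writer call
    judge: Callable[[str, Example], float],   # hypothetical: API model, returns a 0-10 rating
) -> float:
    """Run every example through director -> writer -> judge and average the scores."""
    scores = []
    for example in examples:
        direction = director(prompt, example)
        reply = writer(direction, example)
        raw = float(judge(reply, example))
        scores.append(min(10.0, max(0.0, raw)))  # clamp to the documented 0-10 range
    return mean(scores) if scores else 0.0
```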

Setup

git clone <repo-url> && cd orb-evo
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

Requires a local Orb checkout. It is auto-discovered at ~/repos/Orb/; set ORB_REPO_PATH to point at a checkout in a different location.
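
The lookup might work along these lines. This is a sketch only: the function name `find_orb_repo` is hypothetical, and it assumes that ORB_REPO_PATH, when set, takes precedence over the default location.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_orb_repo(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Path:
    """Return the Orb checkout directory, or raise if it cannot be found."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    override = env.get("ORB_REPO_PATH")
    # Assumption: an explicit ORB_REPO_PATH wins over the ~/repos/Orb default.
    candidate = Path(override).expanduser() if override else home / "repos" / "Orb"
    if not candidate.is_dir():
        raise FileNotFoundError(
            f"Orb checkout not found at {candidate}; set ORB_REPO_PATH"
        )
    return candidate
```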

Usage

Mine Conversations (Director Mode Only)

# Extract real conversation data from Orb's DB
python -m orb_evo.cli mine --output data/conversations.jsonl

Editor Evolution

# Dry run
python -m orb_evo.cli evolve-editor --prompt editor_preamble --dry-run

# Full run
bash run.sh evolve-editor --prompt editor_preamble \
  --target-model "openai/gemma-4-31B-it-Q5_K_M.gguf" \
  --target-base http://localhost:8081/v1 \
  --mutator-model "zai/glm-5.1" \
  --mutator-base "https://api.z.ai/api/coding/paas/v4" \
  --iterations 3 --population 3 --dataset-size 5

Director Evolution

# Dry run
python -m orb_evo.cli evolve-director --evolve system_prompt --dry-run

# Evolve system prompt
bash run.sh evolve-director \
  --target-model "openai/gemma-4-31B-it-Q5_K_M.gguf" \
  --target-base http://localhost:8081/v1 \
  --mutator-model "zai/glm-5.1" \
  --mutator-base "https://api.z.ai/api/coding/paas/v4" \
  --evolve system_prompt \
  --dataset-size 5 --iterations 5

# Evolve director preamble instead
bash run.sh evolve-director \
  --target-model "openai/gemma-4-31B-it-Q5_K_M.gguf" \
  --target-base http://localhost:8081/v1 \
  --mutator-model "zai/glm-5.1" \
  --mutator-base "https://api.z.ai/api/coding/paas/v4" \
  --evolve director_preamble \
  --dataset-size 5 --iterations 5

Note: Use run.sh so that ZAI_API_KEY is loaded from ~/.bashrc. Processes started outside an interactive shell (for example background jobs) do not source ~/.bashrc, so they will not see the key otherwise.

CLI Commands

Command	Description
`mine`	Mine real conversations from Orb's DB into JSONL
`evolve-editor`	Evolve editor prompt strings (deterministic scoring)
`evolve-director`	Evolve system prompt or director preamble (LLM judge scoring)

Editor Options

Flag	Default	Description
`--prompt`	required	Prompt name to evolve (see Available Prompts)
`--target-model`	`openai/gemma-4-31b`	Local model being optimized
`--target-base`	`http://localhost:8081/v1`	Target model API base URL
`--mutator-model`	`openai/glm-5.1`	API model for mutation/analysis
`--mutator-base`	auto	API base URL for mutator
`--iterations`	`10`	Number of evolutionary generations
`--population`	`5`	Population size per generation
`--dataset-size`	`20`	Number of eval examples

Director Options

Flag	Default	Description
`--evolve`	`system_prompt`	What to evolve: `system_prompt` or `director_preamble`
`--target-model`	`openai/gemma-4-31B-it-Q5_K_M.gguf`	Local model being optimized
`--target-base`	`http://localhost:8081/v1`	Target model API base URL
`--mutator-model`	`zai/glm-5.1`	API model for mutation/analysis/judging
`--mutator-base`	auto	API base URL for mutator
`--iterations`	`5`	Number of evolutionary generations
`--population`	`3`	Population size per generation
`--dataset-size`	`5`	Number of eval examples
`--seed`	random	Random seed for example selection (omit for random)
`--max-history`	`0`	Max conversation messages per example (0 = all)
`--mined-data`	`data/conversations.jsonl`	Path to mined conversation data
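
To illustrate how `--dataset-size`, `--seed`, and `--max-history` could interact, here is a small sketch. The record layout (a `history` list per JSONL line) is a hypothetical schema, and keeping the most recent messages when truncating is an assumption.

```python
import json
import random
from typing import Any, Dict, Iterable, List, Optional


def load_examples(
    lines: Iterable[str],
    dataset_size: int = 5,
    seed: Optional[int] = None,
    max_history: int = 0,
) -> List[Dict[str, Any]]:
    """Pick a reproducible sample of mined conversations and trim their history."""
    records = [json.loads(line) for line in lines if line.strip()]
    rng = random.Random(seed)  # seed=None falls back to a nondeterministic seed
    chosen = rng.sample(records, min(dataset_size, len(records)))
    if max_history > 0:
        # Assumption: keep the most recent messages; 0 means keep everything.
        chosen = [
            {**rec, "history": rec.get("history", [])[-max_history:]}
            for rec in chosen
        ]
    return chosen
```

Passing an open file handle for `data/conversations.jsonl` as `lines` works, since iterating a text file yields one line at a time.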

Output

Results are saved to output/{model_slug}/{target}/{timestamp}/:

File	Contents
`best_prompt.txt`	Winning prompt
`results.json`	Full structured results with all candidates and scores
`diff.patch`	Unified diff from baseline (if changed)
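
Writing this layout could look like the sketch below. The slug rule, the timestamp format, and the `save_results` helper are all hypothetical; only the directory shape and the three file names come from the description above. The diff uses the standard library's `difflib.unified_diff`.

```python
import difflib
import json
import re
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Optional


def slugify(model: str) -> str:
    """Hypothetical slug rule: replace anything unsafe for a path segment with '-'."""
    return re.sub(r"[^A-Za-z0-9._-]+", "-", model).strip("-")


def save_results(
    root: str,
    model: str,
    target: str,
    baseline: str,
    best: str,
    results: Dict[str, Any],
    timestamp: Optional[str] = None,
) -> Path:
    """Write best_prompt.txt, results.json, and (if changed) diff.patch."""
    timestamp = timestamp or datetime.now().strftime("%Y%m%d-%H%M%S")  # assumed format
    out = Path(root) / slugify(model) / target / timestamp
    out.mkdir(parents=True, exist_ok=True)
    (out / "best_prompt.txt").write_text(best, encoding="utf-8")
    (out / "results.json").write_text(json.dumps(results, indent=2), encoding="utf-8")
    if best != baseline:
        diff = difflib.unified_diff(
            baseline.splitlines(keepends=True),
            best.splitlines(keepends=True),
            fromfile="baseline",
            tofile="best",
        )
        (out / "diff.patch").write_text("".join(diff), encoding="utf-8")
    return out
```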

Architecture

Model roles:

Target model (local) — runs the Orb pipeline with candidate prompts. This is the model being optimized for.
Mutator (API) — analyzes evaluation traces and proposes mutation strategies.
Prompt Writer (API) — takes a mutation strategy and writes a concrete new prompt.
Judge (API, director mode only) — rates writer output quality on a 0-10 scale.

Scoring:

Editor mode: deterministic composite of fix rate, text preservation, length compliance, no new issues. Uses Orb's own audit.
Director mode: LLM judge rates writer output quality across prose craft, character voice, engagement, consistency, creativity.
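
As an illustration of the editor-mode composite, the sketch below combines the four named components into one number. The equal weights, the 0-1 scale of each component, and the field names are assumptions; the actual weighting used by orb-evo is not stated here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EditorMetrics:
    fix_rate: float       # fraction of seeded flaws that were fixed
    preservation: float   # how much of the unflawed text survived
    length_ok: float      # 1.0 if the output length is within bounds, else 0.0
    no_new_issues: float  # 1.0 if the audit found no newly introduced issues


# Assumption: equal weights, purely for illustration.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "fix_rate": 0.25,
    "preservation": 0.25,
    "length_ok": 0.25,
    "no_new_issues": 0.25,
}


def composite_score(
    metrics: EditorMetrics, weights: Optional[Dict[str, float]] = None
) -> float:
    """Weighted average of the four components, normalised by the total weight."""
    w = DEFAULT_WEIGHTS if weights is None else weights
    total = sum(w.values())
    return sum(w[name] * getattr(metrics, name) for name in w) / total
```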

Constraints: Size cap, growth limit, tool-name reference check, injection safety regex.
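
A constraint check of this kind might be sketched as follows. The thresholds, the injection pattern, and the rule that a tool name present in the baseline must still appear in the candidate are all illustrative assumptions, and `apply_patch` in the usage note is a made-up tool name.

```python
import re
from typing import List, Sequence

# Hypothetical pattern; a real safety regex would likely cover more phrasings.
INJECTION_PATTERN = re.compile(
    r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
)


def check_constraints(
    candidate: str,
    baseline: str,
    tool_names: Sequence[str],
    max_chars: int = 8000,    # assumed size cap
    max_growth: float = 1.5,  # assumed growth limit relative to the baseline
) -> List[str]:
    """Return a list of violated constraints; an empty list means the candidate passes."""
    problems = []
    if len(candidate) > max_chars:
        problems.append("size cap exceeded")
    if baseline and len(candidate) > len(baseline) * max_growth:
        problems.append("growth limit exceeded")
    # A tool the baseline mentions must still be mentioned by the candidate.
    missing = [t for t in tool_names if t in baseline and t not in candidate]
    if missing:
        problems.append("missing tool references: " + ", ".join(missing))
    if INJECTION_PATTERN.search(candidate):
        problems.append("injection pattern matched")
    return problems
```

For example, `check_constraints("Fix it.", "Use apply_patch to fix.", ["apply_patch"])` reports the dropped tool reference, so that candidate would be discarded before any pipeline evaluation.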

Available Prompts (Editor Mode)

Name	Orb Constant
`editor_preamble`	`EDITOR_PREAMBLE`
`editor_patch`	`EDITOR_PATCH_INSTRUCTIONS`
`editor_rewrite`	`EDITOR_REWRITE_INSTRUCTIONS`
`editor_both`	`EDITOR_BOTH_INSTRUCTIONS`
`editor_structural`	`STRUCTURAL_REWRITE_INSTRUCTIONS`
`director_preamble`	`DIRECTOR_PREAMBLE`
`director_scene`	`_DIRECT_SCENE_DESCRIPTION`

Development

pip install -e ".[dev]"
python -m pytest tests/ -v

License

MIT
