Evolutionary prompt optimization for Orb's pipeline.
Two optimization modes:
- Editor evolution — evolves editor prompt strings (patch/rewrite instructions). Scored deterministically via Orb's own audit scanners.
- Director evolution — evolves the system prompt or director preamble by running director → writer → LLM judge. Scored on writing quality against real conversation data mined from Orb's DB.
Uses DSPy + a custom evolutionary loop. The strong API model handles mutation, prompt writing, analysis, and judging. A local model runs the Orb pipeline as the optimization target.

Editor evolution:

1. Load baseline prompt from Orb's tool_defs.py
2. Generate synthetic eval examples with known flaws (banned phrases, repetition, etc.)
3. Evaluate baseline → get scores + traces
4. For each generation:
   a. Mutator (API model): analyze traces, propose mutation strategy
   b. Prompt Writer (API model): write new candidate from strategy
   c. Validate constraints (size, growth, tool references, injection safety)
   d. Evaluate candidate via Orb's actual editor pipeline (ReAct tool-calling loop)
   e. Score with deterministic audit
5. Return best candidate → save evolved prompt + diff + metrics
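
The editor loop above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not orb-evo's actual code: the function `evolve` and the callables `evaluate`, `mutate`, `write_prompt`, and `validate` are hypothetical names standing in for the pipeline run, the two API-model calls, and the constraint check. The greedy "keep the best of each generation" selection is also an assumption.

```python
from typing import Callable, List, Tuple


def evolve(
    baseline: str,
    evaluate: Callable[[str], Tuple[float, List[str]]],  # prompt -> (score, traces)
    mutate: Callable[[str, List[str]], str],             # (prompt, traces) -> strategy
    write_prompt: Callable[[str, str], str],             # (prompt, strategy) -> candidate
    validate: Callable[[str, str], bool],                # (baseline, candidate) -> passes?
    iterations: int = 3,
    population: int = 3,
) -> Tuple[str, float]:
    """Hypothetical greedy loop: each generation mutates the current best prompt."""
    best_prompt = baseline
    best_score, best_traces = evaluate(baseline)
    for _ in range(iterations):
        gen_best = (best_prompt, best_score, best_traces)
        for _ in range(population):
            # Steps a and b: analyze traces, then write a concrete candidate.
            strategy = mutate(best_prompt, best_traces)
            candidate = write_prompt(best_prompt, strategy)
            # Step c: drop candidates that violate the constraints.
            if not validate(baseline, candidate):
                continue
            # Steps d and e: run the pipeline and score the result.
            score, traces = evaluate(candidate)
            if score > gen_best[1]:
                gen_best = (candidate, score, traces)
        best_prompt, best_score, best_traces = gen_best
    return best_prompt, best_score
```

Passing the model calls in as callables keeps the loop testable with cheap stubs before any real model is involved.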

Director evolution:

1. Mine real conversations from Orb's DB (character cards, history, director/writer outputs)
2. Load baseline system prompt or director preamble from Orb
3. Evaluate baseline: director call → writer call → LLM judge scores output 0-10
4. For each generation:
   a. Mutator (API model): analyze judge feedback, propose mutation strategy
   b. Prompt Writer (API model): rewrite prompt following strategy
   c. Validate constraints (size, growth)
   d. Evaluate: run full director → writer → judge pipeline with candidate prompt
5. Return best candidate → save evolved prompt + diff + metrics
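
The director evaluation step (director → writer → judge) might look roughly like the sketch below. The function name and the three callables are hypothetical stand-ins for the real model calls. Averaging the per-example judge scores and clamping them to the 0-10 range are assumptions; the source only states that the judge scores output from 0 to 10.

```python
from statistics import mean
from typing import Callable, Dict, List


def evaluate_director_prompt(
    prompt: str,
    examples: List[Dict],
    director: Callable[[str, Dict], str],  # (candidate prompt, example) -> direction
    writer: Callable[[str, Dict], str],    # (direction, example) -> reply text
    judge: Callable[[str, Dict], float],   # (reply, example) -> raw score
) -> float:
    """Run each mined example through director -> writer -> judge and average."""
    scores = []
    for example in examples:
        direction = director(prompt, example)
        reply = writer(direction, example)
        raw = judge(reply, example)
        # Clamp to the judge's 0-10 scale in case the model answers out of range.
        scores.append(min(10.0, max(0.0, float(raw))))
    return mean(scores) if scores else 0.0
```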

Installation:

    git clone <repo-url> && cd orb-evo
    python -m venv .venv && source .venv/bin/activate
    pip install -e ".[dev]"

Requires a local Orb checkout. Auto-discovered at ~/repos/Orb/ or set ORB_REPO_PATH.
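
As an illustration of the checkout discovery described above, a lookup could be written along these lines. This is a sketch, not the project's code: `find_orb_repo` is a hypothetical name, and giving ORB_REPO_PATH precedence over the default location is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_orb_repo(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the Orb checkout directory, preferring ORB_REPO_PATH if set."""
    env = os.environ if env is None else env
    override = env.get("ORB_REPO_PATH")
    # Assumption: the environment variable wins over the default ~/repos/Orb/.
    candidate = Path(override).expanduser() if override else Path.home() / "repos" / "Orb"
    if not candidate.is_dir():
        raise FileNotFoundError(
            f"Orb checkout not found at {candidate}; set ORB_REPO_PATH"
        )
    return candidate
```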

Usage:

    # Extract real conversation data from Orb's DB
    python -m orb_evo.cli mine --output data/conversations.jsonl

    # Dry run
    python -m orb_evo.cli evolve-editor --prompt editor_preamble --dry-run

    # Full run
    bash run.sh evolve-editor --prompt editor_preamble \
      --target-model "openai/gemma-4-31B-it-Q5_K_M.gguf" \
      --target-base http://localhost:8081/v1 \
      --mutator-model "zai/glm-5.1" \
      --mutator-base "https://api.z.ai/api/coding/paas/v4" \
      --iterations 3 --population 3 --dataset-size 5

    # Dry run
    python -m orb_evo.cli evolve-director --evolve system_prompt --dry-run

    # Evolve system prompt
    bash run.sh evolve-director \
      --target-model "openai/gemma-4-31B-it-Q5_K_M.gguf" \
      --target-base http://localhost:8081/v1 \
      --mutator-model "zai/glm-5.1" \
      --mutator-base "https://api.z.ai/api/coding/paas/v4" \
      --evolve system_prompt \
      --dataset-size 5 --iterations 5

    # Evolve director preamble instead
    bash run.sh evolve-director \
      --target-model "openai/gemma-4-31B-it-Q5_K_M.gguf" \
      --target-base http://localhost:8081/v1 \
      --mutator-model "zai/glm-5.1" \
      --mutator-base "https://api.z.ai/api/coding/paas/v4" \
      --evolve director_preamble \
      --dataset-size 5 --iterations 5

Note: Use run.sh to ensure ZAI_API_KEY is loaded from ~/.bashrc. Background processes don't inherit interactive shell env vars.

Commands:

| Command | Description |
|---|---|
| mine | Mine real conversations from Orb's DB into JSONL |
| evolve-editor | Evolve editor prompt strings (deterministic scoring) |
| evolve-director | Evolve system prompt or director preamble (LLM judge scoring) |

evolve-editor flags:

| Flag | Default | Description |
|---|---|---|
| --prompt | required | Prompt name to evolve (see Available Prompts) |
| --target-model | openai/gemma-4-31b | Local model being optimized |
| --target-base | http://localhost:8081/v1 | Target model API base URL |
| --mutator-model | openai/glm-5.1 | API model for mutation/analysis |
| --mutator-base | auto | API base URL for mutator |
| --iterations | 10 | Number of evolutionary generations |
| --population | 5 | Population size per generation |
| --dataset-size | 20 | Number of eval examples |

evolve-director flags:

| Flag | Default | Description |
|---|---|---|
| --evolve | system_prompt | What to evolve: system_prompt or director_preamble |
| --mined-data | data/conversations.jsonl | Path to mined conversation data |
| --target-model | openai/gemma-4-31B-it-Q5_K_M.gguf | Local model being optimized |
| --target-base | http://localhost:8081/v1 | Target model API base URL |
| --mutator-model | zai/glm-5.1 | API model for mutation/analysis/judging |
| --mutator-base | auto | API base URL for mutator |
| --iterations | 5 | Number of evolutionary generations |
| --population | 3 | Population size per generation |
| --dataset-size | 5 | Number of eval examples |
| --seed | random | Random seed for example selection (omit for random) |
| --max-history | 0 | Max conversation messages per example (0 = all) |
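
To make the interplay of --dataset-size, --seed, and --max-history concrete, here is a minimal sketch of how examples could be drawn from the mined JSONL. It is not the project's implementation: `select_examples` is a hypothetical name, the `history` field is an assumed record layout, and keeping the most recent messages when truncating is also an assumption.

```python
import json
import random
from typing import Dict, Iterable, List, Optional


def select_examples(
    jsonl_lines: Iterable[str],
    dataset_size: int,
    seed: Optional[int] = None,
    max_history: int = 0,
) -> List[Dict]:
    """Sample up to dataset_size records and optionally truncate their history."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    # seed=None gives a different sample each run; a fixed seed is reproducible.
    rng = random.Random(seed)
    chosen = rng.sample(records, min(dataset_size, len(records)))
    if max_history > 0:
        # Assumed field name "history"; keep only the last max_history messages.
        chosen = [
            {**record, "history": record.get("history", [])[-max_history:]}
            for record in chosen
        ]
    return chosen
```

With a fixed seed, two runs see the same examples, which makes scores comparable across runs.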

Results are saved to output/{model_slug}/{target}/{timestamp}/:

| File | Contents |
|---|---|
| best_prompt.txt | Winning prompt |
| results.json | Full structured results with all candidates and scores |
| diff.patch | Unified diff from baseline (if changed) |
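
A rough sketch of how such an output directory could be written, using only the standard library. The function name `save_run`, its parameters, and the timestamp format are assumptions for illustration; only the directory layout and the three file names come from the table above.

```python
import difflib
import json
from datetime import datetime
from pathlib import Path
from typing import Dict, Optional


def save_run(
    root: str,
    model_slug: str,
    target: str,
    baseline: str,
    best: str,
    results: Dict,
    timestamp: Optional[str] = None,
) -> Path:
    """Write best_prompt.txt, results.json, and (if changed) diff.patch."""
    # Assumed timestamp format; the real one is not documented here.
    stamp = timestamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    out_dir = Path(root) / model_slug / target / stamp
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "best_prompt.txt").write_text(best, encoding="utf-8")
    (out_dir / "results.json").write_text(json.dumps(results, indent=2), encoding="utf-8")
    if best != baseline:
        # Only emit a patch when the winning prompt differs from the baseline.
        diff = difflib.unified_diff(
            baseline.splitlines(keepends=True),
            best.splitlines(keepends=True),
            fromfile="baseline",
            tofile="best_prompt.txt",
        )
        (out_dir / "diff.patch").write_text("".join(diff), encoding="utf-8")
    return out_dir
```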
Model roles:
- Target model (local) — runs the Orb pipeline with candidate prompts. This is the model being optimized for.
- Mutator (API) — analyzes evaluation traces and proposes mutation strategies.
- Prompt Writer (API) — takes a mutation strategy and writes a concrete new prompt.
- Judge (API, director mode only) — rates writer output quality on a 0-10 scale.
Scoring:
- Editor mode: deterministic composite of fix rate, text preservation, length compliance, no new issues. Uses Orb's own audit.
- Director mode: LLM judge rates writer output quality across prose craft, character voice, engagement, consistency, creativity.
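
For the editor-mode composite, one plausible shape is a weighted average of the four components. The sketch below is illustrative only: how orb-evo actually combines the components is not stated here, and the weights, the `EditorMetrics` class, and the `composite_score` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EditorMetrics:
    """Per-candidate audit results, each normalized to the range 0.0-1.0."""
    fix_rate: float
    text_preservation: float
    length_compliance: float
    no_new_issues: float


# Hypothetical weights; the real weighting is not documented in this README.
WEIGHTS: Dict[str, float] = {
    "fix_rate": 4.0,
    "text_preservation": 3.0,
    "length_compliance": 2.0,
    "no_new_issues": 1.0,
}


def composite_score(metrics: EditorMetrics, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted average of the four components, normalized by the total weight."""
    total = sum(weights.values())
    return sum(w * getattr(metrics, name) for name, w in weights.items()) / total
```

Because every input comes from the audit rather than a model, the same candidate output always yields the same score.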

Constraints: Every candidate must pass a size cap and a growth limit. Editor candidates must also pass a tool-name reference check and an injection safety regex.
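
The four constraint checks could be combined along these lines. Everything specific in this sketch is an assumption made for illustration: the thresholds, the regex, the rule that tools named in the baseline must still be named in the candidate, and the function name `check_constraints`.

```python
import re
from typing import List, Sequence

# Hypothetical pattern; the project's actual injection regex is not shown here.
INJECTION_PATTERN = re.compile(
    r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
)


def check_constraints(
    baseline: str,
    candidate: str,
    required_tools: Sequence[str],
    max_chars: int = 8000,     # assumed size cap
    max_growth: float = 0.25,  # assumed growth limit relative to the baseline
) -> List[str]:
    """Return a list of violated constraints; an empty list means the candidate passes."""
    problems: List[str] = []
    if len(candidate) > max_chars:
        problems.append("size cap exceeded")
    if len(candidate) > len(baseline) * (1 + max_growth):
        problems.append("growth limit exceeded")
    # Assumption: a tool named in the baseline must still be named in the candidate.
    missing = [t for t in required_tools if t in baseline and t not in candidate]
    if missing:
        problems.append("missing tool references: " + ", ".join(missing))
    if INJECTION_PATTERN.search(candidate):
        problems.append("injection pattern matched")
    return problems
```

Returning the list of violations instead of a bare boolean makes it easy to log why a candidate was rejected.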

Available prompts:

| Name | Orb Constant |
|---|---|
| editor_preamble | EDITOR_PREAMBLE |
| editor_patch | EDITOR_PATCH_INSTRUCTIONS |
| editor_rewrite | EDITOR_REWRITE_INSTRUCTIONS |
| editor_both | EDITOR_BOTH_INSTRUCTIONS |
| editor_structural | STRUCTURAL_REWRITE_INSTRUCTIONS |
| director_preamble | DIRECTOR_PREAMBLE |
| director_scene | _DIRECT_SCENE_DESCRIPTION |

Running the tests:

    pip install -e ".[dev]"
    python -m pytest tests/ -v

License: MIT