This demo shows how Aegis stabilizes runtime behavior in an existing LangChain/DeepAgents workflow without changing the underlying model or redesigning the app.
- Same task and prompts
- Same workflow shape
- Aegis inserted as a runtime control layer
Aegis is used as a scope-first runtime SDK:
- client.auto().llm(...) for model-call stabilization
- client.auto().step(...) for loop/supervisor step stabilization (where applicable)
This repository does not use the legacy plan-first mental model.
from aegis import AegisClient, AegisConfig
import os
client = AegisClient(
api_key=os.environ["AEGIS_API_KEY"],
base_url=os.getenv("AEGIS_BASE_URL"),
config=AegisConfig(mode="balanced"),
)
result = client.auto().llm(...)

Returned AegisResult values are inspected in the demo outputs, including:
- actions
- trace
- metrics
- used_fallback
- explanation
- scope
- scope_data
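As a rough illustration of how the fields listed above might be inspected, here is a minimal sketch. It does not use the Aegis SDK: `ResultStub` is a hypothetical stand-in that borrows only the field names from the list, and the field types and sample values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ResultStub:
    """Hypothetical stand-in for AegisResult; the field types are assumed."""
    actions: list[str] = field(default_factory=list)
    trace: list[dict[str, Any]] = field(default_factory=list)
    metrics: dict[str, Any] = field(default_factory=dict)
    used_fallback: bool = False
    explanation: str = ""
    scope: str = ""
    scope_data: dict[str, Any] = field(default_factory=dict)


def summarize(result: ResultStub) -> str:
    """Build a one-line summary of the fields most useful at a glance."""
    actions = ", ".join(result.actions) or "none"
    fallback = "yes" if result.used_fallback else "no"
    return f"scope={result.scope} actions={actions} fallback={fallback}"


# Sample values are made up for illustration only.
sample = ResultStub(actions=["example_action"], scope="llm")
print(summarize(sample))
```

In a real run the object would come from the client call shown earlier, and its actual attribute types may differ from this stub.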
- OPENAI_API_KEY
- AEGIS_API_KEY
- AEGIS_BASE_URL (set explicitly if not using the default endpoint)
- DEMO_MODEL (default: openai:gpt-4o-mini)
OPENAI_API_KEY=...
AEGIS_API_KEY=...
AEGIS_BASE_URL=http://localhost:8080
DEMO_MODEL=openai:gpt-4o-mini

python -m benchmark_v3.run_baseline
python -m benchmark_v3.run_aegis
python -m runners.compare_benchmark_v3

python -m runners.run_aegis
python -m runners.run_working_aegis
python -m runners.run_v2_aegis
python -m runners.run_aegis_supervisor

Aegis runs emit:
- aegis_result*.json files (replacing old plan artifacts)
- optional aegis_debug_summary.txt
Inspect:
- actions → applied runtime controls
- used_fallback → whether fallback behavior was triggered
- trace / metrics → decision and execution context
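One way to review the emitted artifacts in bulk is to scan the output directory and pull out the fields named above. This is a minimal sketch that assumes each aegis_result*.json file holds a single JSON object whose top-level keys match those field names. The actual file layout is not documented here, and `summarize_artifacts` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path


def summarize_artifacts(directory: str) -> list[dict]:
    """Collect actions and used_fallback from each aegis_result*.json file."""
    rows = []
    for path in sorted(Path(directory).glob("aegis_result*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        rows.append({
            "file": path.name,
            # Assumed layout: top-level keys named after the result fields.
            "actions": data.get("actions", []),
            "used_fallback": bool(data.get("used_fallback", False)),
            "has_trace": "trace" in data,
            "has_metrics": "metrics" in data,
        })
    return rows
```

Calling `summarize_artifacts(".")` after a run returns one row per result file, which makes it easy to spot runs where fallback behavior was triggered.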
This demo shows how Aegis sits above an existing system to:
- reduce waste
- stabilize behavior
- improve consistency
…without changing the underlying model integration or rewriting your system.
This demo is designed to reflect a common real-world pattern:
Many AI workflows are built as multi-pass systems (planner → solver → validator → refine).
Even when the first answer is already correct, these systems often:
- re-run validation
- perform unnecessary refinement
- make extra model calls by default
In this benchmark:
- the baseline represents that standard multi-pass behavior
- Aegis supervises execution at runtime and stops early when safe
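The difference between the two runs can be sketched as a toy simulation that counts model calls. Everything here is hypothetical: `fake_model` replaces a real model call, and the `solver_is_enough` stop rule is an assumption made for illustration. How Aegis actually decides that stopping is safe is not described in this document.

```python
from typing import Callable

PASSES = ["planner", "solver", "validator", "refine"]


def run_baseline(call_model: Callable[[str], str]) -> int:
    """Run every pass unconditionally and return the number of model calls."""
    calls = 0
    for name in PASSES:
        call_model(name)
        calls += 1
    return calls


def run_supervised(
    call_model: Callable[[str], str],
    is_done: Callable[[str, str], bool],
) -> int:
    """Run passes in order, stopping as soon as is_done accepts an output."""
    calls = 0
    for name in PASSES:
        output = call_model(name)
        calls += 1
        if is_done(name, output):
            break
    return calls


def fake_model(pass_name: str) -> str:
    """Stand-in for a real model call."""
    return f"{pass_name}-output"


def solver_is_enough(pass_name: str, output: str) -> bool:
    """Assumed stop rule: treat the solver's answer as already acceptable."""
    return pass_name == "solver"


baseline = run_baseline(fake_model)
supervised = run_supervised(fake_model, solver_is_enough)
saved = (baseline - supervised) / baseline
print(f"baseline={baseline} supervised={supervised} saved={saved:.0%}")
# prints: baseline=4 supervised=2 saved=50%
```

The task and the model stand-in are identical in both runs; only the number of passes executed changes, which is the quantity the benchmark compares.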
Important:
- Aegis does not improve the model’s intelligence
- it does not change the task or inject answers
- it only reduces unnecessary execution
The goal is to measure efficiency, not raw capability.