Aegis LangChain DeepAgents Demo

This demo shows how Aegis stabilizes runtime behavior in an existing LangChain/DeepAgents workflow without changing the underlying model or redesigning the app.

Same task and prompts
Same workflow shape
Aegis inserted as a runtime control layer

What this demo proves

Aegis is used as a scope-first runtime SDK:

client.auto().llm(...) for model-call stabilization
client.auto().step(...) for loop/supervisor step stabilization (where applicable)

This repository does not use the legacy plan-first mental model.

Aegis SDK surface used

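import os

from aegis import AegisClient, AegisConfig

# AEGIS_API_KEY is required; AEGIS_BASE_URL is optional, and os.getenv
# returns None when it is unset.
client = AegisClient(
    api_key=os.environ["AEGIS_API_KEY"],
    base_url=os.getenv("AEGIS_BASE_URL"),
    config=AegisConfig(mode="balanced"),
)

# Scope-first call for model-call stabilization; arguments are elided here.
result = client.auto().llm(...)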

Returned AegisResult values are inspected in the demo outputs, including:

actions
trace
metrics
used_fallback
explanation
scope
scope_data

Environment

Required

OPENAI_API_KEY
AEGIS_API_KEY

Optional

AEGIS_BASE_URL (set explicitly if not using default endpoint)
DEMO_MODEL (default: openai:gpt-4o-mini)

Example `.env`

OPENAI_API_KEY=...
AEGIS_API_KEY=...
AEGIS_BASE_URL=http://localhost:8080
DEMO_MODEL=openai:gpt-4o-mini
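The required/optional split above can be checked before a run. The following is a minimal sketch, not code from this repository: `load_demo_settings` and the keys of the dictionary it returns are hypothetical names chosen for illustration. Only the variable names and the `DEMO_MODEL` default come from the lists above.

```python
import os

# Variables listed as "Required" above.
REQUIRED = ("OPENAI_API_KEY", "AEGIS_API_KEY")


def load_demo_settings(env=None):
    """Hypothetical helper: validate and collect the demo's environment settings."""
    env = os.environ if env is None else env
    # Treat unset and empty values the same way.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {
        "openai_api_key": env["OPENAI_API_KEY"],
        "aegis_api_key": env["AEGIS_API_KEY"],
        # Optional: None means "use the default endpoint".
        "aegis_base_url": env.get("AEGIS_BASE_URL") or None,
        # Optional: falls back to the documented default model.
        "demo_model": env.get("DEMO_MODEL") or "openai:gpt-4o-mini",
    }
```

Calling `load_demo_settings()` with no argument reads `os.environ`; passing a plain dictionary makes the check easy to exercise without touching the real environment.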

Run

python -m benchmark_v3.run_baseline
python -m benchmark_v3.run_aegis
python -m runners.compare_benchmark_v3

Other runnable variants

python -m runners.run_aegis
python -m runners.run_working_aegis
python -m runners.run_v2_aegis
python -m runners.run_aegis_supervisor

What to inspect in outputs

Aegis runs emit:

aegis_result*.json files (replacing old plan artifacts)
optional aegis_debug_summary.txt

Inspect:

actions → applied runtime controls
used_fallback → whether fallback behavior was triggered
trace / metrics → decision and execution context
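One way to scan these fields across several runs is sketched below. It assumes, without confirmation from this README, that each `aegis_result*.json` file holds a single JSON object whose top-level keys match the field names above and that `actions` is a list. The function names `summarize_result` and `summarize_directory` are hypothetical.

```python
import json
from pathlib import Path


def summarize_result(data):
    """Reduce one result object to the fields worth checking first.

    Assumes `data` is a dict with top-level keys named after the
    AegisResult fields; missing keys are tolerated.
    """
    actions = data.get("actions") or []
    return {
        "action_count": len(actions),
        "used_fallback": bool(data.get("used_fallback", False)),
        "has_trace": data.get("trace") is not None,
        "has_metrics": data.get("metrics") is not None,
    }


def summarize_directory(directory="."):
    """Summarize every aegis_result*.json file found in `directory`."""
    summaries = {}
    for path in sorted(Path(directory).glob("aegis_result*.json")):
        with path.open(encoding="utf-8") as handle:
            summaries[path.name] = summarize_result(json.load(handle))
    return summaries
```

If the real files nest these fields differently, only `summarize_result` needs to change.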

Key takeaway

This demo shows how Aegis sits above an existing system to:

reduce waste
stabilize behavior
improve consistency

…without changing the underlying model integration or rewriting your system.

About the Benchmark

This demo is designed to reflect a common real-world pattern:

Many AI workflows are built as multi-pass systems
(planner → solver → validator → refine)

Even when the first answer is already correct, these systems often:

re-run validation
perform unnecessary refinement
make extra model calls by default

In this benchmark:

the baseline represents that standard multi-pass behavior
Aegis supervises execution at runtime and stops early when safe

Important:

Aegis does not improve the model’s intelligence
it does not change the task or inject answers
it only reduces unnecessary execution

The goal is to measure efficiency, not raw capability.
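The difference being measured can be illustrated with a toy simulation. This sketch does not use Aegis and does not reflect how Aegis decides when stopping is safe. The rule "stop once the validator passes" is an assumed stand-in, and the stage functions, the `run_pipeline` name, and the two refinement rounds are all hypothetical.

```python
def run_pipeline(plan, solve, validate, refine, rounds=2, stop_early=False):
    """Run planner -> solver -> (validator -> refine) * rounds, counting stage calls.

    With stop_early=False every round runs, as in the multi-pass baseline.
    With stop_early=True the loop ends as soon as validation passes.
    """
    calls = 0
    outline = plan()
    calls += 1
    answer = solve(outline)
    calls += 1
    for _ in range(rounds):
        ok = validate(answer)
        calls += 1
        if ok and stop_early:
            break
        answer = refine(answer)
        calls += 1
    return answer, calls


# Stub stages in which the first answer is already correct.
def plan():
    return "outline"


def solve(outline):
    return "42"


def validate(answer):
    return answer == "42"


def refine(answer):
    return answer


baseline_answer, baseline_calls = run_pipeline(plan, solve, validate, refine)
early_answer, early_calls = run_pipeline(
    plan, solve, validate, refine, stop_early=True
)
print(baseline_answer == early_answer, baseline_calls, early_calls)
# prints: True 6 3
```

Both runs return the same answer; only the number of stage calls differs, which is the quantity the benchmark compares.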
