SCELabs/aegis-langchain-deepagents-demo

Aegis LangChain DeepAgents Demo

This demo shows how Aegis stabilizes runtime behavior in an existing LangChain/DeepAgents workflow without changing the underlying model or redesigning the app.

  • Same task and prompts
  • Same workflow shape
  • Aegis inserted as a runtime control layer

What this demo proves

Aegis is used as a scope-first runtime SDK:

  • client.auto().llm(...) for model-call stabilization
  • client.auto().step(...) for loop/supervisor step stabilization (where applicable)

This repository does not use the legacy plan-first mental model.
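
The two scopes above can be pictured with stand-in objects. This is a hypothetical stub, not the real SDK: only the call shape (`client.auto().llm(...)` / `client.auto().step(...)`) comes from this README, and the stub classes exist purely to show where each scope sits in a supervisor loop.

```python
# Hypothetical stub illustrating where the two Aegis scopes sit.
# Only the call shape (client.auto().llm / client.auto().step) is
# taken from this README; the real SDK behaves differently.
class _AutoScope:
    def llm(self, call, *args, **kwargs):
        # Model-call stabilization would wrap the model call here.
        return call(*args, **kwargs)

    def step(self, step_fn, *args, **kwargs):
        # Loop/supervisor step stabilization would wrap the step here.
        return step_fn(*args, **kwargs)

class _StubClient:
    def auto(self):
        return _AutoScope()

client = _StubClient()

def fake_model(prompt):
    # Stand-in for a LangChain model call.
    return f"answer to: {prompt}"

def supervisor_step(state):
    # Stand-in for one supervisor loop iteration.
    return state + 1

print(client.auto().llm(fake_model, "2+2?"))
print(client.auto().step(supervisor_step, 0))
```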


Aegis SDK surface used

import os

from aegis import AegisClient, AegisConfig

client = AegisClient(
    api_key=os.environ["AEGIS_API_KEY"],
    base_url=os.getenv("AEGIS_BASE_URL"),
    config=AegisConfig(mode="balanced"),
)

result = client.auto().llm(...)

Returned AegisResult values are inspected in the demo outputs, including:

  • actions
  • trace
  • metrics
  • used_fallback
  • explanation
  • scope
  • scope_data
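
These fields can be inspected programmatically. A minimal sketch, with a plain dict standing in for an `AegisResult` (the sample values are invented for illustration; the real object's shape may differ):

```python
# A plain dict stands in for an AegisResult so the field names from
# this README can be shown; all values here are invented examples.
sample_result = {
    "actions": ["throttle_retry"],            # hypothetical action name
    "trace": [{"step": 1, "decision": "allow"}],
    "metrics": {"model_calls": 1},
    "used_fallback": False,
    "explanation": "call completed within configured bounds",
    "scope": "llm",
    "scope_data": {},
}

def summarize(result):
    """Render the fields this demo inspects as one summary line."""
    return (
        f"scope={result['scope']} "
        f"actions={len(result['actions'])} "
        f"fallback={result['used_fallback']}"
    )

print(summarize(sample_result))
```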

Environment

Required

  • OPENAI_API_KEY
  • AEGIS_API_KEY

Optional

  • AEGIS_BASE_URL (set explicitly if not using default endpoint)
  • DEMO_MODEL (default: openai:gpt-4o-mini)

Example .env

OPENAI_API_KEY=...
AEGIS_API_KEY=...
AEGIS_BASE_URL=http://localhost:8080
DEMO_MODEL=openai:gpt-4o-mini
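
If you keep these values in a `.env` file, they must be loaded into the process environment before constructing the client. A minimal sketch using only the standard library (the `python-dotenv` package would work equally well and handles quoting):

```python
import os

def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines into os.environ.

    A minimal stand-in for python-dotenv: comments and blank lines
    are skipped, existing environment variables are not overwritten,
    and there is no quoting or escaping support.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```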

Run

python -m benchmark_v3.run_baseline
python -m benchmark_v3.run_aegis
python -m runners.compare_benchmark_v3

Other runnable variants

python -m runners.run_aegis
python -m runners.run_working_aegis
python -m runners.run_v2_aegis
python -m runners.run_aegis_supervisor

What to inspect in outputs

Aegis runs emit:

  • aegis_result*.json files (replacing old plan artifacts)
  • optional aegis_debug_summary.txt

Inspect:

  • actions → applied runtime controls
  • used_fallback → whether fallback behavior was triggered
  • trace / metrics → decision and execution context
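
A short sketch of scanning those output files, assuming each `aegis_result*.json` file is a single JSON object containing the fields listed above:

```python
import glob
import json

def report_fallbacks(pattern="aegis_result*.json"):
    """Scan Aegis output files and collect fallback/action info per run."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            result = json.load(fh)
        rows.append((path,
                     result.get("used_fallback", False),
                     result.get("actions", [])))
    return rows

for path, fallback, actions in report_fallbacks():
    print(f"{path}: used_fallback={fallback} actions={actions}")
```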

Key takeaway

This demo shows how Aegis sits above an existing system to:

  • reduce waste
  • stabilize behavior
  • improve consistency

…without changing the underlying model integration and without rewriting your system.

About the Benchmark

This demo is designed to reflect a common real-world pattern:

Many AI workflows are built as multi-pass systems
(planner → solver → validator → refine)

Even when the first answer is already correct, these systems often:

  • re-run validation
  • perform unnecessary refinement
  • make extra model calls by default

In this benchmark:

  • the baseline represents that standard multi-pass behavior
  • Aegis supervises execution at runtime and stops early when safe

Important:

  • Aegis does not improve the model’s intelligence
  • it does not change the task or inject answers
  • it only reduces unnecessary execution

The goal is to measure efficiency, not raw capability.
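
The call-count effect can be illustrated with a toy multi-pass loop. This is a hypothetical mock with no real models or Aegis calls: it only shows how many stage calls a fixed planner → solver → validator → refine pipeline makes when it always refines versus when a supervisor stops early once the answer validates.

```python
def run_pipeline(stop_early):
    """Toy planner -> solver -> validator -> refine loop.

    Each stage counts as one model call. The first answer is assumed
    correct, so every refine pass is pure waste; a supervisor that
    stops early when validation succeeds skips those passes.
    """
    calls = 1                        # planner
    answer = "42"                    # solver produces a correct answer
    calls += 1                       # solver
    for _ in range(2):               # up to two validate/refine rounds
        calls += 1                   # validator
        valid = True                 # first answer already validates
        if valid and stop_early:
            break                    # supervisor stops when safe
        calls += 1                   # refine pass (unnecessary here)
    return answer, calls

baseline_answer, baseline_calls = run_pipeline(stop_early=False)
supervised_answer, supervised_calls = run_pipeline(stop_early=True)
print(f"baseline calls={baseline_calls} supervised calls={supervised_calls}")
```

Both variants return the same answer; only the number of calls differs, which is the efficiency-not-capability distinction the benchmark measures.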

About

LangChain runtime optimization demo using Aegis.
