Minimal Agent Framework (MAF)

A small, auditable, replayable agent runtime. One loop. Typed tools. Full traces. Nothing else.

Why MAF?

Every agent framework wants to be everything. MAF wants to be understood.

One runtime loop — read the source in 15 minutes
Typed tools with input/output schema validation
JSONL traces on every run — replay, debug, audit
Any OpenAI-compatible model — OpenAI, Cerebras, Gemini, Groq, local models
Sandboxed by default — fs root boundaries, HTTP allowlists, tool allowlists

Quickstart

python3 -m venv .venv && source .venv/bin/activate
pip install -e .

Mock (no API key needed)

maf run --provider mock --input "Hello from MAF"

OpenAI

export OPENAI_API_KEY="sk-..."
maf run --provider openai --input "List files in the current directory"

Any OpenAI-compatible provider (Gemini, Groq, etc.)

maf run \
  --provider openai \
  --endpoint "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions" \
  --api-key "$GEMINI_API_KEY" \
  --model gemini-2.5-flash \
  --input "List files and summarize what you find"

Real-World Example: Autonomous KPI Analysis

This is how we actually use MAF — delegating a multi-step analytics task that calls external CLIs, processes the results, and writes a report:

maf run \
  --provider openai \
  --endpoint "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions" \
  --api-key "$GEMINI_API_KEY" \
  --model gemini-2.5-flash \
  --max-steps 12 \
  --input 'Gather site traffic data and write a KPI report.
    Step 1: Run shell command: datafast overview --period 30d --json
    Step 2: Run shell command: datafast top referrers --period 30d --json
    Step 3: Write file kpi-report.md with metrics summary and recommendations.
    Step 4: Return the report as final output.'

MAF executes each step autonomously — calling shell.exec, processing JSON output, writing the report via fs, and returning the result. Every step is traced in JSONL.

Related operator writeups:

[09:27:01] run_started    provider=gemini, model=gemini-2.5-flash, max_steps=12
[09:27:02] tool_called    shell.exec → datafast overview --period 30d --json
[09:27:03] tool_result    ✓ exit_code=0, 26 visitors, 66.67% bounce rate
[09:27:05] tool_called    shell.exec → datafast top referrers --period 30d --json
[09:27:06] tool_result    ✓ exit_code=0, Direct: 16, X: 7, Google: 3
[09:27:07] tool_called    shell.exec → datafast top pages --period 30d --json
[09:27:08] tool_result    ✓ exit_code=0, 10 pages returned
[09:27:20] tool_called    fs.write → kpi-report.md (2,006 bytes)
[09:27:25] run_finished   status=completed, 5 steps, ~24 seconds

Built-in Tools

Tool	Purpose	Safety
`shell.exec`	Run shell commands	`cwd` constrained to `fs_root_path`
`fs`	Read, write, list files	Paths must stay under `fs_root_path`
`http.fetch`	HTTP requests	Deny-by-default, URL allowlist required
`kv`	Key-value persistence	File-backed, scoped to run config

CLI

maf run      # Execute a run
maf trace    # Inspect persisted trace events
maf replay   # Replay a prior run from recorded traces
maf perf     # Token throughput metrics

Key flags for maf run:

--provider — mock, openai, cerebras
--model — model identifier
--endpoint — override API endpoint (for OpenAI-compatible providers)
--api-key — override API key directly
--max-steps / --max-run-seconds — budget controls
--stream-events — print structured events live

See docs/cli.md for full reference.

Python API

from maf import AgentRuntime, RuntimeConfig, OpenAIChatAdapter, build_power_tools

config = RuntimeConfig(
    provider="openai",
    model="gpt-4.1-mini",
    max_steps=10,
    max_run_seconds=60,
    trace_dir=".maf/runs",
    fs_root_path="./workspace",
)

adapter = OpenAIChatAdapter(model="gpt-4.1-mini")
tools = build_power_tools(config)
runtime = AgentRuntime(config=config, llm_adapter=adapter, tools=tools)

result = runtime.run("Read all files and create a summary")
print(result.final_output)

Traces & Replay

Every run produces artifacts in <trace_dir>/<run_id>/:

.maf/runs/4f720ddf/
├── trace.jsonl          # Every event: model calls, tool results, timing
├── metadata.json        # Run config, provider, model, budgets
├── state.initial.json   # Input state snapshot
└── state.final.json     # Output state snapshot

Replay any run deterministically:

maf replay --run-id 4f720ddf

Testing

python3 -m pytest tests/ -x     # All tests
python3 scripts/validate_golden_traces.py  # Golden trace validation

Docs

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
.beads		.beads
.wiggum		.wiggum
docs		docs
maf		maf
scripts		scripts
tests		tests
.env.example		.env.example
.gitignore		.gitignore
AGENTS.md		AGENTS.md
LICENSE		LICENSE
PRD.md		PRD.md
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Minimal Agent Framework (MAF)

Why MAF?

Quickstart

Mock (no API key needed)

OpenAI

Any OpenAI-compatible provider (Gemini, Groq, etc.)

Real-World Example: Autonomous KPI Analysis

Built-in Tools

CLI

Python API

Traces & Replay

Testing

Docs

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Minimal Agent Framework (MAF)

Why MAF?

Quickstart

Mock (no API key needed)

OpenAI

Any OpenAI-compatible provider (Gemini, Groq, etc.)

Real-World Example: Autonomous KPI Analysis

Built-in Tools

CLI

Python API

Traces & Replay

Testing

Docs

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages