# Perstack

<p align="center">
<a href="https://github.com/perstack-ai/perstack/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue" alt="License"></a>
<p align="center">
<a href="https://perstack.ai/docs/"><strong>Docs</strong></a> ·
<a href="https://perstack.ai/docs/getting-started/"><strong>Getting Started</strong></a> ·
<a href="https://perstack.ai/"><strong>Website</strong></a> ·
<a href="https://github.com/perstack-ai/demo-catalog"><strong>Demo Catalog</strong></a> ·
<a href="https://discord.gg/2xZzrxC9"><strong>Discord</strong></a> ·
<a href="https://x.com/FL4T_LiN3"><strong>X</strong></a>
</p>

If you want to build practical agentic apps like Claude Code or OpenClaw, a harness helps manage the complexity.
Perstack is **a containerized harness for agentic apps**.

- **Harness = Runtime + Config** — Instructions, agent topology, and tools are defined in TOML — not wired in code. The runtime executes what you declare in config.
- **Dev-to-prod in one container** — Same image, same sandbox, same behavior from local to production.
- **Full observability** — Trace every delegation, token, and reasoning step. Replay any run from checkpoints.

**Perstack draws clear boundaries** — between your app and the harness, between the harness and each agent — so you can keep building without fighting the mess.

## Getting started

Perstack keeps expert definition, orchestration, and application integration as separate concerns.

`create-expert` scaffolds experts, the harness handles orchestration, and deployment stays simple because Perstack runs on standard container and serverless infrastructure.

### Defining your first expert

To get started, use the built-in `create-expert` expert to scaffold your first agent:

```bash
# Use `create-expert` to scaffold a micro-agent team named `bash-gaming`
docker run --pull always --rm -it \
--env-file .env \
-v ./bash-gaming:/workspace \
perstack/perstack start create-expert \
--provider <provider> \
--model <model> \
"Form a team named bash-gaming. They build indie CLI games with both AI-facing non-interactive mode and human-facing TUI mode built on Ink + React. Their games must be runnable via npx at any time. Games are polished, well-tested with full playthroughs — TUI mode included."
```

`create-expert` is a built-in expert. It defines a team of single-purpose micro-agents — called "experts" in Perstack.

```
create-expert : Thin coordinator that delegates to the experts
├── @create-expert/plan : Expands the user's request into a comprehensive plan
├── @create-expert/write : Produces perstack.toml from plan
└── @create-expert/verify : Runs the expert with a test query and checks the completion
```

The full definition is available at [definitions/create-expert/perstack.toml](https://github.com/perstack-ai/perstack/blob/main/definitions/create-expert/perstack.toml).
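For a sense of the shape, a team definition is plain TOML: one section per expert, with delegation declared in config. A minimal sketch (the expert names and instructions here are illustrative, not the actual generated file):

```toml
[experts."bash-gaming"]
description = "Game development team lead"
instruction = "Coordinate the team to build a polished CLI indie game."
delegates = ["@bash-gaming/plan", "@bash-gaming/build", "@bash-gaming/verify"]

[experts."@bash-gaming/plan"]
description = "Designs the game and writes an implementation plan"
instruction = "Design engaging mechanics and produce a plan the build expert can follow."

[experts."@bash-gaming/build"]
description = "Implements the game in TypeScript"
instruction = "Implement the planned game with both a TUI mode and a non-interactive AI mode."

[experts."@bash-gaming/verify"]
description = "Tests the game and reports bugs"
instruction = "Play through the game end to end, find bugs, and verify fixes."
```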

While `create-expert` is running, the TUI shows real-time status — active delegation tree, token usage, reasoning streams, and per-agent progress:

```
2026/03/13 08:15:40.083, @bash-gaming/build, ⎇ bash-gaming
● Reasoning
....

───────────────────────────────────────────────────────────────────────────────
Query: Form a team named bash-gaming. They build indie CLI games with both…
1 running · 3 waiting · 4m 06s · fireworks
Tokens: In 2.9M (Cached 2.1M, Cache Hit 72.69%) · Out 46.2k
⏸ create-expert · accounts/fireworks/models/kimi-k2p5 · ○ 2.4% · Waiting for delegates
└ ⏸ @create-expert/verify · accounts/fireworks/models/kimi-k2p5 · ◔ 11.2% · Waiting for delegates
  └ ⏸ bash-gaming · accounts/fireworks/models/kimi-k2p5 · ◔ 6.4% · Waiting for delegates
    └ ⠇ @bash-gaming/build · accounts/fireworks/models/kimi-k2p5 · ◔ 3.2% · Streaming Reasoning...
```

### Running your expert

To run your experts on an actual task, use the `perstack start` command:

```bash
# Let `bash-gaming` build a Wizardry-like dungeon crawler
docker run --pull always --rm -it \
--env-file .env \
-v ./<result-dir>:/workspace \
-v ./bash-gaming/perstack.toml:/definitions/perstack.toml:ro \
perstack/perstack start bash-gaming \
--config /definitions/perstack.toml \
--provider <provider> \
--model <model> \
"Create a Wizardry-like dungeon crawler in a fixed 10-floor labyrinth with complex layouts, traps, fixed room encounters, and random battles. Include special-effect gear drops, leveling, and a skill tree for one playable character. Balance difficulty around build optimization. Death in the dungeon causes loss of one random equipped item."
```

An example built with these commands is available in [demo-catalog](https://github.com/perstack-ai/demo-catalog): the same experts and queries were run 5 times across 4 providers, and 4 of the 5 runs produced a working dungeon crawler. Full run logs are included in the repository.

### Viewing run history

`perstack log` provides a TUI for browsing past runs and their delegation trees. Every delegation — who called whom, what succeeded, what failed — is visible at a glance:

```bash
$ npx perstack log --job <jobId>
Runs (create-expert) Enter:Select b:Back q:Quit
> ⎇ create-expert Form a team named bash-gaming. They build indie CLI games with both AI-faci…
| \
| ✓ @create-expert/plan Create a team named bash-gaming. They build indie CLI games with bo…
| /
⎇ create-expert (resumed)
| \
| ✓ @create-expert/write Create perstack.toml at /workspace/plan.md. This is a new team cre…
| /
⎇ create-expert (resumed)
| \
| ⎇ @create-expert/verify Verify perstack.toml at /workspace/perstack.toml against plan at …
| | \
| | ✓ @create-expert/test-expert Build a word puzzle game 'lexicon'…
| | ⎇ bash-gaming Create a CLI word guessing game called 'cryptoword' published as @bash-ga…
| | | \
| | | ✓ @bash-gaming/plan Create a CLI word guessing game 'cryptoword' published as @bash-g…
| | | /
| | ⎇ bash-gaming (resumed)
| | | \
| | | ✓ @bash-gaming/build Implement the complete cryptoword game package at /home/perstack…
| | | /
| | ⎇ bash-gaming (resumed)
| | | \
| | | ✓ bash-gaming
| | | | \
| | | | ✓ @bash-gaming/game-engine Build the core game engine for "lexicon" - a word search…
| | | | ✓ @bash-gaming/tui-renderer Build the TUI renderer for "lexicon" word search puzzle…
| | | | ✓ @bash-gaming/ai-mode Build the AI mode (headless JSON protocol) for "lexicon" wor…
| | | | ✓ @bash-gaming/npm-dist Build the npm package structure for "lexicon" word search p…
| | | | /
| | | ⎇ bash-gaming (resumed)
| | | | \
| | | | ✓ @bash-gaming/testing Build a comprehensive test suite for the "lexicon" word sear…
| | | | /
| | | ⎇ bash-gaming (resumed)
| | | | \
| | | | ✓ @bash-gaming/evaluator Evaluate the "lexicon" word search puzzle game against all…
| | | | /
| | | ✓ bash-gaming (resumed)
| | | ✓ @bash-gaming/verify Verify the cryptoword package at /home/perstack/cryptoword/: 1…
| | | /
| | ✓ @create-expert/test-expert (resumed)
...
| | ✓ bash-gaming (resumed)
| | /
| ✓ @create-expert/verify (resumed)
| /
✓ create-expert (resumed)
```

`✓` succeeded, `✗` failed, `○` skipped. Use `perstack log --help` for filtering and JSON output options.

### Integrating with your app

Perstack separates the agent harness from the application layer. Your app stays a normal web or terminal app, with no LLM dependencies in the client.
```dockerfile
# …base image and earlier build steps elided…
RUN perstack install
ENTRYPOINT ["perstack", "run", "my-expert"]
```
```

The image is Ubuntu-based, multi-arch (`linux/amd64`, `linux/arm64`), and is ~74 MB. `perstack install` pre-resolves MCP servers and prewarms tool definitions for faster, reproducible startup. The runtime can also be imported directly as a TypeScript library ([`@perstack/runtime`](https://www.npmjs.com/package/@perstack/runtime)) for serverless environments. See the [deployment guide](https://perstack.ai/docs/operating-experts/deployment/) for details.

## Prerequisites

You can also specify custom `.env` file paths with `--env-path`:
perstack start my-expert "query" --env-path .env.production
```

## Philosophy

Three principles guide how Perstack approaches agentic app development:

- **Quality is a system property, not a model property**: Building agentic apps people actually use doesn't require an AI science degree—just a solid understanding of the problems you're solving.
- **Keep your app simple and reliable**: The harness is inevitably complex—Perstack absorbs that complexity so your agentic app doesn't have to.
- **Do big things with small models**: If a smaller model can do the job, there's no reason to use a bigger one.

### Micro-agents — a multi-agent orchestration design

Perstack builds multi-agent orchestration around *micro-agents*: purpose-specific agents, each with a single responsibility.

- **Simple**: A monolithic agent assembles its system prompt from hundreds of fragments. A multi-agent framework stacks abstraction layers and wires orchestration in code. A Perstack expert is one TOML section — instruction, delegates, done.
- **Reliable**: A plan agent that only plans, a build agent that only builds, a verify agent that only verifies — the pipeline structure itself prevents shortcuts and catches errors that a single generalist would miss.
- **Reusable**: Delegates are dependency management for agents — like npm packages or crates. Separate concerns through delegate chains, and compose purpose-built experts across different projects.
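As a minimal sketch (names illustrative), a reusable expert is one TOML section, and wiring it into another project is one `delegates` entry:

```toml
[experts."reviewer"]
description = "Reviews generated code for correctness and style"
instruction = "Review the code you are given and report concrete problems."

[experts."my-app"]
description = "Coordinator for my application"
instruction = "Build the requested feature, then have the result reviewed."
delegates = ["reviewer"]
```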

### Expert Stack — harness architecture

Perstack ships a five-layer stack that gives micro-agents everything they need to run.



<details>
<summary>Full feature matrix</summary>

| Deployment | [Deployment](https://perstack.ai/docs/operating-experts/deployment/) |
| CLI and API reference | [References](https://perstack.ai/docs/references/cli/) |

## Demo

[demo-catalog](https://github.com/perstack-ai/demo-catalog) runs the same experts and queries across multiple providers and models. Every run includes raw checkpoints and event logs — fully traceable, replayable, and ready for your own analysis. New demos and provider results are added continuously.

## Community

- [Discord](https://discord.gg/2xZzrxC9)
- [Author on X (@FL4T_LiN3)](https://x.com/FL4T_LiN3)

## Contributing
