Merged
2 changes: 2 additions & 0 deletions .gitignore
@@ -84,3 +84,5 @@ site

graphify-out/
service_account.json

eval_reports/
182 changes: 182 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,182 @@
# agentflow (core Python library) — Engineering Guide

This file documents the **core Python framework** only (`10xscale-agentflow`, the package that
lives in this folder). For the API/CLI, TS client, docs, or playground, see the CLAUDE.md in
their respective folders and the workspace-root `CLAUDE.md` for the monorepo overview.

- Package name (PyPI): `10xscale-agentflow`
- Version: `0.7.5.1` (single source of truth: `pyproject.toml`)
- Requires: Python >= 3.12
- Importable top-level package lives at `agentflow/agentflow/` (this folder is the repo root;
the importable package is the nested `agentflow/` directory).

## What this package is

A graph-based orchestration engine for multi-agent LLM systems. It is **LLM-agnostic**: you bring
the provider SDK (OpenAI / Google GenAI), and Agentflow provides the workflow engine, state,
persistence, tools, memory, evaluation, and event publishing. Inspired by LangGraph but simpler.

## Working principles for this codebase

- **Read before writing.** The public API is large and re-exported through many `__init__.py`
files. Confirm the real export path before referencing a symbol (see Import Map below).
- **Examples are the source of truth**, not the README. Most of `examples/` uses current import
paths, with a few broken files listed under Known Doc Drift; the README and several docstrings
still show pre-refactor paths.
- **Surgical edits.** This is `Development Status :: 5 - Production/Stable`. Don't refactor
module boundaries or rename exports without checking every `__init__.py` that re-exports them.
- **Keep coverage green.** `pytest` enforces `--cov-fail-under=70`. New code needs tests.
- **Optional deps are optional.** Provider SDKs, MCP, Postgres, Redis, Qdrant, Mem0, Kafka,
RabbitMQ, OTEL, and a2a are all extras. Guard their imports; core modules must never import an
optional dependency unconditionally.
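
The guard-imports rule can be sketched as follows. This is a minimal illustration, not code from the package: `optional_import` and `make_openai_client` are hypothetical helpers, and the only facts taken from this guide are the PyPI name `10xscale-agentflow` and the `openai` extra.

```python
import importlib
from types import ModuleType


def optional_import(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or fail with a hint naming the extra.

    Hypothetical helper for illustration; not part of agentflow.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f'Install it with: pip install "10xscale-agentflow[{extra}]"'
        ) from exc


def make_openai_client():
    # The import happens inside the function, so importing this module
    # never requires the OpenAI SDK to be installed.
    openai = optional_import("openai", extra="openai")
    return openai.OpenAI()
```

Keeping the import inside the function body means the failure surfaces only when the optional feature is actually used, and the error message tells the user which extra to install.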

## Package layout (real, current)

The importable package is `agentflow/agentflow/`. Top-level subpackages:

| Subpackage | What lives there |
|---|---|
| `core/` | The engine. `graph/` (StateGraph, Agent, ToolNode, CompiledGraph, Node, Edge), `state/` (AgentState, Message, content blocks, reducers, context managers), `llm/` (provider detection + client factory + `call_llm`), `skills/` (dynamic skill injection), `exceptions/` |
| `storage/` | `checkpointer/` (InMemory, Pg), `store/` (vector/long-term memory: Qdrant, Mem0, embeddings), `media/` (multimodal media processing, offload, resolvers, stores) |
| `runtime/` | `adapters/llm/` (OpenAI / OpenAI-Responses / Google GenAI response converters), `publisher/` (Console, Redis, Kafka, RabbitMQ, OTEL, Composite), `protocols/` (a2a, acp) |
| `prebuilt/` | `agent/` (React, RAG, PlanActReflect, SupervisorTeam, Swarm, StructuredOutput), `tools/` (calculator, fetch, files, handoff, memory, search) |
| `qa/` | `evaluation/` (criteria, datasets, evaluator, reporters, simulators) and `testing/` (TestAgent, mocks, quick tests) |
| `utils/` | constants (START/END/ResponseGranularity), `tool` decorator, `convert_messages`, callbacks, validators, id generators, background tasks, graceful shutdown |

## Import Map (verified) — this is the part that bites people

The package was restructured into `core/`, `storage/`, `runtime/`, `qa/`. **There are no
top-level `agentflow.graph`, `agentflow.state`, `agentflow.checkpointer`, `agentflow.skills`,
`agentflow.evaluation`, `agentflow.testing`, `agentflow.adapters`, or `agentflow.publisher`
shims.** Those paths raise `ModuleNotFoundError`. Use the canonical paths:

```python
# Graph engine
from agentflow.core.graph import Agent, StateGraph, ToolNode, CompiledGraph, Node, Edge, RetryConfig
# or the aggregate: from agentflow.core import StateGraph, Agent, ToolNode, AgentState, Message, ...

# State and messages
from agentflow.core.state import AgentState, Message, TextBlock, ToolResultBlock, add_messages

# LLM client/provider helpers
from agentflow.core.llm import call_llm, create_llm_client, detect_provider

# Skills
from agentflow.core.skills import SkillConfig, SkillMeta, SkillsRegistry

# Persistence
from agentflow.storage.checkpointer import InMemoryCheckpointer, PgCheckpointer, BaseCheckpointer
# Vector / long-term memory
from agentflow.storage.store import QdrantStore, Mem0Store, MemoryConfig, AgentMemoryConfig

# Publishers / converters
from agentflow.runtime.publisher import ConsolePublisher, RedisPublisher, KafkaPublisher, RabbitMQPublisher
from agentflow.runtime.adapters.llm import OpenAIConverter, GoogleGenAIConverter, OpenAIResponsesConverter

# Prebuilt
from agentflow.prebuilt.agent import ReactAgent, RAGAgent, SwarmAgent, SupervisorTeamAgent
from agentflow.prebuilt.tools import safe_calculator, fetch_url, create_handoff_tool, memory_tool

# QA
from agentflow.qa.evaluation import AgentEvaluator, EvalConfig, EvalCase, EvalSet
from agentflow.qa.testing import TestAgent, MockMCPClient, MockToolRegistry

# Utils
from agentflow.utils import tool, convert_messages, Command
from agentflow.utils.constants import START, END, ResponseGranularity
```

Note: the root `agentflow/__init__.py` is intentionally empty. Importing the package does not
eagerly pull in submodules; import the subpackage you need.

## Core concepts

**StateGraph -> CompiledGraph.** Build with `StateGraph()`, `add_node`, `add_edge`,
`add_conditional_edges`, `set_entry_point`; then `.compile(...)` returns a `CompiledGraph`.
`compile()` accepts: `checkpointer`, `store`, `media_store`, `interrupt_before`,
`interrupt_after`, `callback_manager`, `shutdown_timeout` (default 30.0).

**CompiledGraph execution API:** `invoke` / `ainvoke` (run), `stream` / `astream` (incremental),
`stop` / `astop` (interrupt), `override_node`, `attach_remote_tools`, `generate_graph`, `aclose`.
- Input shape: `{"messages": [Message...]}`.
- Config keys: `user_id`, `thread_id`, `run_id`, `recursion_limit` (default 25).
- `response_granularity`: `LOW` (messages only, default), `PARTIAL` (context+summary+messages),
`FULL` (full state).

**Agent class** (`agentflow.core.graph.Agent`) — the high-level node that wraps LLM calls,
message conversion, and tool integration. Key constructor params:
`model` (required), `output_type="text"`, `system_prompt`, `tool_node` (name or ToolNode),
`extra_messages`, `trim_context`, `tools_tags`, `reasoning_config`, `skills`, `memory`,
`retry_config` (default True), `fallback_models`, `multimodal_config`, `output_schema`.

**Model strings and providers.** `detect_provider(model)` infers the provider from a
`"provider/model"` prefix or the model name. **It only resolves to `"google"` or `"openai"`.**
Examples: `"gemini/gemini-2.5-flash"`, `"openai/gpt-4o"`, `"gpt-4o-mini"`. Vertex AI is selected
via `use_vertex_ai=True`. There is **no native Anthropic client** in the LLM factory despite
Anthropic/Claude appearing in marketing copy; Claude is reachable only via an OpenAI-compatible
endpoint or the custom-functions approach. Verify before promising native Claude support.
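
To make the resolution rule concrete, here is an illustrative re-implementation. It is **not** the code of `agentflow.core.llm.detect_provider`: the prefix table, the name heuristics, and the choice to raise on unknown input are all assumptions of this sketch. Only the three example model strings and the two possible results come from this guide.

```python
def sketch_detect_provider(model: str) -> str:
    """Illustrative only; the real logic lives in agentflow.core.llm."""
    # Assumed prefix table: only two providers are ever returned.
    prefix_map = {"gemini": "google", "google": "google", "openai": "openai"}
    name = model.lower()

    # Case 1: explicit "provider/model" prefix.
    if "/" in name:
        prefix = name.partition("/")[0]
        if prefix in prefix_map:
            return prefix_map[prefix]
        raise ValueError(f"no native client for provider prefix '{prefix}'")

    # Case 2: infer from the bare model name (assumed heuristics).
    if name.startswith("gemini"):
        return "google"
    if name.startswith("gpt"):
        return "openai"
    raise ValueError(f"cannot infer a provider from '{model}'")
```

In this sketch an `anthropic/...` model string raises, which mirrors the point above that there is no native Anthropic client. What the real function does with unknown input should be checked in the source before relying on it.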

**ToolNode.** `ToolNode(tools, client=None, pass_user_info_to_mcp=False)`. First positional arg
is `tools` (an iterable of callables). `client` is an MCP client (fastmcp/mcp). Tools run in
**parallel** when the LLM requests several at once. Define tools as plain functions; injectable
params (`tool_call_id`, `state`, `config`, plus InjectQ-provided deps) are filled automatically.
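
The two ideas in this paragraph, signature-based injection and parallel execution, can be sketched with the standard library alone. This is not `ToolNode`'s implementation: `run_tool`, `run_tool_calls`, and the shape of the `calls` dicts are hypothetical, and InjectQ-provided dependencies are left out.

```python
import asyncio
import inspect
from typing import Any, Callable


def get_weather(city: str, tool_call_id: str) -> str:
    """A plain-function tool; `tool_call_id` is filled by the runner, not the LLM."""
    return f"[{tool_call_id}] sunny in {city}"


def add(a: int, b: int) -> int:
    """A tool that declares no injectable parameters."""
    return a + b


async def run_tool(
    fn: Callable[..., Any], llm_args: dict[str, Any], injectables: dict[str, Any]
) -> Any:
    # Inject only the parameters the function actually declares.
    wanted = inspect.signature(fn).parameters
    injected = {name: value for name, value in injectables.items() if name in wanted}
    # Run the synchronous tool in a worker thread so several calls can overlap.
    return await asyncio.to_thread(fn, **llm_args, **injected)


async def run_tool_calls(
    calls: list[dict[str, Any]],
    tools: dict[str, Callable[..., Any]],
    shared: dict[str, Any],
) -> list[Any]:
    tasks = [
        run_tool(tools[call["name"]], call["args"], {**shared, "tool_call_id": call["id"]})
        for call in calls
    ]
    # gather() runs the calls concurrently and returns results in request order.
    return await asyncio.gather(*tasks)
```

Because injection is driven by the function signature, `add` never sees `state` or `config`, while `get_weather` receives its `tool_call_id` without the LLM having to supply it.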

**State and Message.** `AgentState` is a Pydantic model; subclass it for custom fields.
`Message.text_message(content, role="user")` is the text factory. `Message.tool_message(...)`,
`Message.image_message(...)` exist. There is **no `Message.from_text`** (README shows it; it is
wrong). Content is a list of typed blocks (TextBlock, ImageBlock, ToolCallBlock, ToolResultBlock,
ReasoningBlock, etc.). Reducers (`add_messages`, `replace_messages`, `append_items`) control how
state lists merge.
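
The reducer idea can be shown without the library. In this sketch `append_reducer`, `replace_reducer`, and `apply_update` are hypothetical stand-ins; the real `add_messages` and `replace_messages` may apply additional rules, so check `agentflow.core.state` for their exact behavior.

```python
from typing import Any, Callable

Reducer = Callable[[list[Any], list[Any]], list[Any]]


def append_reducer(current: list[Any], update: list[Any]) -> list[Any]:
    """Keep existing items and add the new ones at the end."""
    return [*current, *update]


def replace_reducer(current: list[Any], update: list[Any]) -> list[Any]:
    """Discard existing items and keep only the update."""
    return list(update)


def apply_update(
    state: dict[str, Any], update: dict[str, Any], reducers: dict[str, Reducer]
) -> dict[str, Any]:
    """Merge a node's partial update into the state, field by field."""
    merged = dict(state)
    for key, value in update.items():
        reducer = reducers.get(key)
        # Fields without a reducer are simply overwritten.
        merged[key] = reducer(state.get(key, []), value) if reducer else value
    return merged
```

The reducer attached to a field decides whether a node's output accumulates (typical for message history) or overwrites what was there.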

**Persistence.** `InMemoryCheckpointer` for dev/tests. `PgCheckpointer` (Postgres + Redis dual
layer) for production; requires the `[pg_checkpoint]` extra.

**Memory / store.** 3-layer model: working state -> checkpointer (hot/durable) -> vector store
(Qdrant/Mem0) for long-term. `MemoryConfig` / `AgentMemoryConfig` drive it; `memory_tool` and
`create_memory_preload_node` wire it into a graph.

**Skills.** `SkillConfig(skills_dir=...)` adds dynamic skill injection. Two modes: `on-demand`
(LLM calls `set_skill()` from a trigger table) and `session` (preload a fixed skill from a state
field via `preload_from`).

**Publishers.** Emit execution events to Console, Redis Pub/Sub, Kafka, RabbitMQ, or OTEL.
`CompositePublisher` fans out to several publishers at once. The OTEL publisher provides tracing
(`setup_tracing`).
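
The fan-out pattern can be sketched as below. This is not `CompositePublisher`'s code: the `publish` method name, the dict-shaped event, and the error-collecting behavior are assumptions made for illustration, and `ListPublisher` / `FanOutPublisher` are hypothetical classes.

```python
import asyncio
from typing import Any, Protocol


class Publisher(Protocol):
    async def publish(self, event: dict[str, Any]) -> None: ...


class ListPublisher:
    """In-memory sink, handy for asserting on emitted events in tests."""

    def __init__(self) -> None:
        self.events: list[dict[str, Any]] = []

    async def publish(self, event: dict[str, Any]) -> None:
        self.events.append(event)


class FanOutPublisher:
    """Send each event to every sink; one failing sink does not block the rest."""

    def __init__(self, publishers: list[Publisher]) -> None:
        self._publishers = list(publishers)

    async def publish(self, event: dict[str, Any]) -> list[Exception]:
        results = await asyncio.gather(
            *(p.publish(event) for p in self._publishers),
            return_exceptions=True,
        )
        # Return the failures so the caller can log them.
        return [r for r in results if isinstance(r, Exception)]
```

Collecting exceptions instead of raising keeps a broker outage from interrupting graph execution; whether the real publisher behaves this way should be verified in `agentflow.runtime.publisher`.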

**QA.** `agentflow.qa.evaluation` is a full eval framework (criteria incl. LLM-as-judge,
trajectory matching, rubric, safety, hallucination; datasets; console/JSON/HTML/JUnit reporters;
user simulators). `agentflow.qa.testing` provides `TestAgent`, `MockMCPClient`, `MockToolRegistry`,
`TestContext` for unit-testing graphs without live LLMs.

## Development workflow

This repo root is `agentflow/`; the importable package is `agentflow/agentflow/`. A `.venv` is
already present.

```bash
# from this folder (agentflow/)
.venv/bin/python -m pytest # full suite (enforces coverage >= 70%)
.venv/bin/python -m pytest tests/graph # one area
ruff check . && ruff format . # lint + format (line-length 100, py312)
# editable install with extras for local dev:
pip install -e ".[google-genai,openai,mcp,pg_checkpoint]"
```

- Tests live in `tests/` (mirrors package layout: `graph/`, `state/`, `storage/`, `store/`,
`checkpointer/`, `publisher/`, `prebuilt/`, `evaluation/`, `testing/`, plus `chaos/`,
`benchmarks/`, `integration/`). Markers: `asyncio`, `integration` (needs real DBs), `slow`.
- Lint config is in `pyproject.toml` `[tool.ruff]` (broad rule set; per-file ignores for a few
large modules). `mypy` and `bandit` are also configured there.
- `examples/` is organized by feature (react, rag, swarm, supervisor_team, memory, skills, mcp,
a2a_sdk, evaluation, testing, multimodal, structured_output, ...). Use these as canonical usage.

## Known doc drift (do not copy from these without checking)

- **README.md import paths are stale.** It imports `agentflow.graph`, `agentflow.state`,
`agentflow.checkpointer` — all removed. Real paths are `agentflow.core.*` / `agentflow.storage.*`.
- **`Message.from_text` does not exist** (README uses it). Use `Message.text_message`.
- **`ToolNode(functions=...)`** keyword is wrong (README MCP example). The param is `tools`.
- A few `examples/` files still use dead paths (`agentflow.state.message`, `agentflow.graph.tool_node`,
`agentflow.evaluation.*`). Treat those specific files as broken until fixed.
- README/docstrings imply native Anthropic support; the LLM factory only builds google/openai
clients. See Model strings above.

When you touch any of the above, prefer fixing the doc/example to match the code rather than the
reverse, unless the export path itself is the bug.
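
A small scanner can flag the dead import paths listed in this guide. The mapping below is taken from the Import Map; `find_stale_imports` itself is a hypothetical helper, and it suggests only the canonical package, because submodule names under the new layout are not guaranteed to match the old ones.

```python
import re

# Removed top-level paths mapped to their canonical packages (from the Import Map).
STALE_TO_CANONICAL = {
    "agentflow.graph": "agentflow.core.graph",
    "agentflow.state": "agentflow.core.state",
    "agentflow.skills": "agentflow.core.skills",
    "agentflow.checkpointer": "agentflow.storage.checkpointer",
    "agentflow.evaluation": "agentflow.qa.evaluation",
    "agentflow.testing": "agentflow.qa.testing",
    "agentflow.adapters": "agentflow.runtime.adapters",
    "agentflow.publisher": "agentflow.runtime.publisher",
}

_IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+(agentflow(?:\.\w+)+)")


def find_stale_imports(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, imported module, suggested canonical package)."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = _IMPORT_RE.match(line)
        if not match:
            continue
        module = match.group(1)
        for stale, canonical in STALE_TO_CANONICAL.items():
            # Match the stale package itself or any submodule beneath it.
            if module == stale or module.startswith(stale + "."):
                findings.append((lineno, module, canonical))
                break
    return findings
```

Run it over the README and `examples/` text to list lines that need attention; each hit still requires a manual check of the real export path.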