eager-tools

Cut agent wall-clock latency by overlapping tool execution with LLM streaming.

A production-grade reference implementation of eager tool calling — the pattern that dispatches each tool the moment its block finishes streaming, not after message_stop.

The problem in one graph


Classic parallel tool calling:
stream : [==================================]
tools  :                                     [===========]  ← idle during stream
total  : [===============================================]  ← stream + max(tool)

Eager tool calling:
stream : [==================================]
tool A :   [=========]        ← fires mid-stream
tool B :       [=========]    ← fires mid-stream, overlaps A
tool C :           [=========]← fires later in the stream
total  : [==================================]               ← max(stream, max(seal + tool))

Parallel tool calling overlaps tools with tools. Eager tool calling overlaps tools with generation itself.

Benchmark headline

Synthetic harness: make bench reproduces these numbers locally and deterministically. Across 16 workloads (3 to 15 tools), eager beats parallel by 1.20× to 1.50× (median ~1.28×). Parallel is the right baseline because modern frameworks (langchain.agents.create_agent, OpenAI Agents SDK, Vercel AI SDK) already execute the tool calls from one assistant message concurrently. Eager's win comes from overlapping tools with the stream itself, which parallel dispatch can't do. Full table and reproduction details: bench/results.md.

Workload	Sequential	Parallel	Eager	Speedup vs parallel
3-tool analytics	4.90s	3.50s	2.90s	1.21×
9-tool incident triage	17.61s	9.50s	6.50s	1.46×
15-tool ad campaign	30.42s	11.50s	8.80s	1.31×

These are lower bounds. The synthetic stream removes network jitter, tail latency, and provider-side variance — the things that make eager dispatch shine in production. Run make bench-live-anthropic (or -openai) to spot-check against a real provider.
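The "Speedup vs parallel" column is the parallel wall clock divided by the eager wall clock. It is a ratio, not a difference in seconds. A few lines of Python recompute it from the table. The ratio against sequential execution is added only for comparison. It comes from the same three rows, not from the full 16-workload run.

```python
# Wall-clock seconds copied from the benchmark table: (sequential, parallel, eager).
rows = {
    "3-tool analytics": (4.90, 3.50, 2.90),
    "9-tool incident triage": (17.61, 9.50, 6.50),
    "15-tool ad campaign": (30.42, 11.50, 8.80),
}


def speedup(baseline_s: float, eager_s: float) -> float:
    """Speedup = baseline wall clock / eager wall clock (higher is better)."""
    return baseline_s / eager_s


for name, (seq, par, eager) in rows.items():
    print(f"{name}: {speedup(par, eager):.2f}x vs parallel, "
          f"{speedup(seq, eager):.2f}x vs sequential")
```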

60-second quickstart

pip install eager-tools-core eager-tools-langgraph   # once published
# or, from source:
git clone https://github.com/cloudthinker-ai/eager-tools && cd eager-tools && make sync

import asyncio
from langchain.agents import create_agent
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from eager_tools_langgraph import eager_middleware

# Demo stand-in for a slow tool: waits delay seconds, then echoes its arguments.
class SlowTool:
    def __init__(self, name: str, delay: float = 2.0):
        self.name = name
        self.idempotent = True  # marks the tool as safe to dispatch eagerly
        self._delay = delay

    async def __call__(self, arguments):
        await asyncio.sleep(self._delay)
        return {"name": self.name, "args": arguments, "ok": True}

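# Regular LangChain tools: the model sees their names, signatures and docstrings.
# In this demo the slow work lives in the matching SlowTool entries below.
@tool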
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return ""

@tool
def get_stock_price(ticker: str) -> str:
    """Get the current stock price for a ticker symbol."""
    return ""

@tool
def get_news(topic: str) -> str:
    """Get recent news on a topic."""
    return ""

# Eager runtime tools handed to eager_middleware, keyed by tool name.
eager_tools = {
    "get_weather": SlowTool("get_weather"),
    "get_stock_price": SlowTool("get_stock_price"),
    "get_news": SlowTool("get_news"),
}

async def main():
    agent = create_agent(
        # ChatAnthropic reads ANTHROPIC_API_KEY from the environment.
        model=ChatAnthropic(model_name="claude-sonnet-4-5", timeout=60.0, stop=None),
        tools=[get_weather, get_stock_price, get_news],
        middleware=[eager_middleware(eager_tools)],
    )
    result = await agent.ainvoke({
        "messages": [HumanMessage(
            "Get the weather in NYC, the AAPL stock price, and recent AI news."
        )]
    })
    print(result["messages"][-1].content)

if __name__ == "__main__":
    asyncio.run(main())

One middleware line wires eager dispatch into any create_agent call — no changes to your tools or prompt. Works with OpenAI too: swap ChatAnthropic for ChatOpenAI. Runnable variants in examples/.

Why this exists

Modern agent APIs (Anthropic, OpenAI, Bedrock) let the model emit multiple tool_use blocks in one assistant message, and the client can then run them in parallel. That shrinks the tool phase from the sum of tool durations to the max. Good, but insufficient.

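The stream phase still happens first, and tools still wait for message_stop. A four-second model stream followed by 2.5s of parallel tool execution is 6.5 seconds of wall clock. Eager tool calling can bring that down to 4 seconds by running the tools during the stream instead of after it. That holds provided every tool block seals at least 2.5s before message_stop, so its tool finishes before the stream does.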
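A back-of-envelope timing model makes that arithmetic explicit. It rests on simplifying assumptions of our own: a single model turn, zero seal/dispatch overhead, and independent tools. ToolCall, seal_s and the *_wall_clock helpers are illustrative names, not part of eager-tools. The seal times in the usage lines are chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    seal_s: float      # when the tool's block finished streaming, seconds from stream start
    duration_s: float  # how long the tool runs once dispatched


def sequential_wall_clock(stream_s: float, calls: list[ToolCall]) -> float:
    # Tools wait for message_stop, then run one after another.
    return stream_s + sum(c.duration_s for c in calls)


def parallel_wall_clock(stream_s: float, calls: list[ToolCall]) -> float:
    # Tools wait for message_stop, then run concurrently: only the slowest one counts.
    return stream_s + max((c.duration_s for c in calls), default=0.0)


def eager_wall_clock(stream_s: float, calls: list[ToolCall]) -> float:
    # Each tool starts at its own seal time; the turn ends once the stream
    # and the last-finishing tool are both done.
    return max([stream_s] + [c.seal_s + c.duration_s for c in calls])


# A 4s stream and three 2.5s tools whose blocks seal early (illustrative seal times).
calls = [ToolCall(0.5, 2.5), ToolCall(1.0, 2.5), ToolCall(1.5, 2.5)]
print(parallel_wall_clock(4.0, calls))  # 6.5
print(eager_wall_clock(4.0, calls))     # 4.0
# Same tools, but the last block seals at 3.0s: eager now ends at 3.0 + 2.5.
late = calls[:2] + [ToolCall(3.0, 2.5)]
print(eager_wall_clock(4.0, late))      # 5.5
```

The last case shows the catch. A tool whose block seals late still runs past message_stop, so eager's advantage depends on how early the blocks seal, not only on how long the tools take.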

See METHOD.md for the full mechanism: the seal event, the tool_call_id invariant, the runtime contract, and the edge cases.

For the per-block mechanism (chunks → buffer → seal → dispatch), see docs/diagrams/seal-mechanism-flow.svg.
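The per-block flow (chunks → buffer → seal → dispatch) can be sketched in plain asyncio. This is a minimal illustration, not the eager-tools implementation, and it makes several assumptions:

- The event dicts are a simplified, hypothetical shape loosely modelled on Anthropic's streaming events (content_block_start, content_block_delta, content_block_stop, ping, message_stop).
- The stream carries only tool blocks.
- eager_dispatch, fake_stream and slow are illustrative names.

```python
import asyncio
import json
import time
from typing import Any, AsyncIterator, Awaitable, Callable

# Hypothetical, simplified event dicts loosely modelled on Anthropic's streaming
# events. A real adapter would read these from the provider SDK's stream.
Event = dict[str, Any]
ToolFn = Callable[[dict[str, Any]], Awaitable[Any]]


async def eager_dispatch(events: AsyncIterator[Event], tools: dict[str, ToolFn]) -> dict[str, Any]:
    """Start each tool as soon as its block seals; return results keyed by tool_call_id."""
    open_blocks: dict[int, dict[str, Any]] = {}  # stream index -> id, name, JSON chunks
    running: dict[str, asyncio.Task] = {}        # tool_call_id -> dispatched tool
    async for ev in events:
        if ev["type"] == "content_block_start":
            open_blocks[ev["index"]] = {"id": ev["id"], "name": ev["name"], "chunks": []}
        elif ev["type"] == "content_block_delta":
            # Buffer partial JSON: the arguments only parse once the block is complete.
            open_blocks[ev["index"]]["chunks"].append(ev["partial_json"])
        elif ev["type"] == "content_block_stop":
            # Seal: parse the finished arguments and dispatch now, while the
            # model keeps streaming whatever comes next.
            block = open_blocks.pop(ev["index"])
            args = json.loads("".join(block["chunks"]) or "{}")
            running[block["id"]] = asyncio.create_task(tools[block["name"]](args))
        elif ev["type"] == "message_stop":
            break
        # Anything else (e.g. "ping") is ignored.
    # Wait for the tools still in flight, keeping each result tied to its tool_call_id.
    ids = list(running)
    results = await asyncio.gather(*(running[i] for i in ids))
    return dict(zip(ids, results))


async def fake_stream(gap_s: float) -> AsyncIterator[Event]:
    """Two tool blocks, then more stream time (pings), with gap_s seconds per event."""
    script: list[Event] = [
        {"type": "content_block_start", "index": 0, "id": "call_a", "name": "slow"},
        {"type": "content_block_delta", "index": 0, "partial_json": '{"q": '},
        {"type": "content_block_delta", "index": 0, "partial_json": '"a"}'},
        {"type": "content_block_stop", "index": 0},
        {"type": "content_block_start", "index": 1, "id": "call_b", "name": "slow"},
        {"type": "content_block_delta", "index": 1, "partial_json": '{"q": "b"}'},
        {"type": "content_block_stop", "index": 1},
        *([{"type": "ping"}] * 4),
        {"type": "message_stop"},
    ]
    for ev in script:
        await asyncio.sleep(gap_s)
        yield ev


async def main() -> None:
    async def slow(args: dict[str, Any]) -> str:
        await asyncio.sleep(0.3)  # stand-in for a slow, idempotent tool
        return f"result for {args['q']}"

    start = time.perf_counter()
    results = await eager_dispatch(fake_stream(0.05), {"slow": slow})
    print(results, f"in {time.perf_counter() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

Run as written, both results should arrive after roughly 0.65s, because call_b seals at 0.35s and its tool takes 0.3s. If dispatch waited for message_stop, the same run would take about 0.9s (0.6s of stream plus 0.3s of tool).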

When NOT to use it

Fast tools (sub-50ms). Seal/dispatch overhead exceeds the latency saved.
Sequentially dependent tools. If tool B needs tool A's result, the model won't emit B until A returns — no pipeline opportunity.
Non-idempotent tools. Payments, destructive commands, outbound messages. Route these to the classic path via Tool.idempotent = False for blanket denial, or via a per-call gate callable for case-by-case decisions with parsed args visible (e.g. allow read_file but not under /etc/). See docs/hitl.md. The gate still gates the eager path; the underlying tool still runs at the framework's tool step for non-denied calls.
Non-streaming backends. If your gateway buffers the full response, eager dispatch is impossible.
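A per-call gate can be sketched as a plain function that sees the tool name and parsed arguments at seal time. The Gate shape, no_etc_reads, route and the "eager"/"classic" labels are assumptions for illustration; docs/hitl.md describes the actual gate contract. The sketch normalizes the path before checking it, so a path like /tmp/../etc/passwd cannot slip past a naive prefix test.

```python
import posixpath
from typing import Any, Callable, Optional

# Hypothetical gate shape: (tool name, parsed args) -> True to allow eager dispatch.
Gate = Callable[[str, dict[str, Any]], bool]


def no_etc_reads(name: str, args: dict[str, Any]) -> bool:
    """Allow eager read_file calls unless the path points under /etc/."""
    if name != "read_file":
        return True
    raw = str(args.get("path", ""))
    if not raw.startswith("/"):
        return False  # relative path: where it lands depends on the cwd, so defer it
    # Normalize "/tmp/../etc/x" to "/etc/x"; POSIX keeps a leading "//", so fold it too.
    path = "/" + posixpath.normpath(raw).lstrip("/")
    return not (path == "/etc" or path.startswith("/etc/"))


def route(idempotent: bool, gate: Optional[Gate], name: str, args: dict[str, Any]) -> str:
    """Decide at seal time whether a call runs eagerly or waits for the classic tool step."""
    if not idempotent:
        return "classic"  # blanket denial, the Tool.idempotent = False case
    if gate is not None and not gate(name, args):
        return "classic"  # case-by-case denial, decided with the parsed args in hand
    return "eager"


print(route(True, no_etc_reads, "read_file", {"path": "/home/me/notes.txt"}))  # eager
print(route(True, no_etc_reads, "read_file", {"path": "/tmp/../etc/passwd"}))  # classic
print(route(False, None, "send_payment", {"amount": 10}))                      # classic
```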

Long version with edge cases: docs/when-not-to-use.md.

Contributing

Adapter PRs welcome — LlamaIndex, AutoGen, Vercel AI SDK, any provider that exposes a streaming response with per-block identifiers. Start from packages/eager-tools-core/ as the contract reference. See NEXT.md §3 for the extraction pattern.
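The core of an adapter is turning a provider's stream into seal events. The real contract lives in packages/eager-tools-core/; what follows is only an illustration. It shows how sealing might work for a stream that, like OpenAI Chat Completions streaming, tags tool-call deltas with an index but sends no per-block stop event. The flattened delta dicts and the seal_events name are assumptions. The real SDK nests these fields under choices[0].delta.tool_calls and function.

```python
import json
from typing import Any, Iterable, Iterator, Optional

# Hypothetical, flattened deltas shaped like OpenAI Chat Completions tool-call
# chunks: {"index", "id" (first chunk only), "name" (first chunk only), "arguments"}.
Delta = dict[str, Any]


def seal_events(deltas: Iterable[Delta]) -> Iterator[tuple[str, str, dict[str, Any]]]:
    """Yield (tool_call_id, name, args) as soon as each tool-call block is complete.

    There is no per-block stop event here, so this sketch assumes blocks stream one
    after another: a block seals when the next index starts, or when the stream ends.
    """
    current: Optional[dict[str, Any]] = None
    for d in deltas:
        if current is None or d["index"] != current["index"]:
            if current is not None:
                # A new index started, so the previous block is sealed.
                yield current["id"], current["name"], json.loads(current["args"] or "{}")
            current = {"index": d["index"], "id": d.get("id"), "name": d.get("name"), "args": ""}
        current["args"] += d.get("arguments", "")
    if current is not None:
        # End of stream (finish_reason): the last block is sealed too.
        yield current["id"], current["name"], json.loads(current["args"] or "{}")


deltas = [
    {"index": 0, "id": "call_1", "name": "get_weather", "arguments": ""},
    {"index": 0, "arguments": '{"city": '},
    {"index": 0, "arguments": '"NYC"}'},
    {"index": 1, "id": "call_2", "name": "get_news", "arguments": '{"topic": "AI"}'},
]
for sealed in seal_events(deltas):
    print(sealed)
```

Because seal_events is a generator, the first call is yielded as soon as index 1 appears, before the rest of the stream has been read.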

Bug reports + design discussions happen in GitHub Discussions — issues are intentionally disabled to keep the signal-to-noise ratio high.

Acknowledgements

This pattern was extracted from production at CloudThinker, where it cuts median agent task latency by 50%. Internal codename: tool-call pipelining. External name: eager tool calling.

Read the full production story: Eager Tool Calling at CloudThinker.

License

MIT — see LICENSE.
