Balloon MCP Server

Balloon MCP is an MCP server for monitoring context fidelity in long AI sessions.

The server is designed around one central observation: long sessions do not only lose facts; they often lose the shape of the user's intent. Balloon turns that problem into a visible runtime surface with profiles, gap reports, drift-pressure summaries, retrieval anchors, corrective prompts, and replayable artifacts.

At a glance:

deterministic Balloon is the stable benchmark anchor
assist Balloon adds optional semantic refinement
staged Balloon adds early, mid, and deep external passes for longer-session discipline

Why Balloon Exists

AI sessions often fail in a frustratingly subtle way:

the latest answer sounds locally reasonable
but it quietly abandons earlier constraints, protected areas, or verification obligations
the user now has to manually drag the session back onto the right path

That is the problem Balloon is built to surface.

For a developer, this usually looks like:

a model proposes a broad refactor when you asked for a bounded change
a test requirement disappears halfway through a session
a protected file or architecture choice gets ignored because the local turn sounded plausible

Balloon is meant to act like a reasoning sidecar for that failure mode. It is not a code generator. It tries to make drift visible and apply small, targeted corrective pressure before the session loses the plot.

Status

This release is an early public alpha.

It is a working external approximation of the Balloon architecture. It does not claim:

hidden-state access to closed models
direct backend trickle into proprietary reasoning layers
inference-layer memory implantation

What It Does

Balloon MCP helps a host application:

build a structured session profile
audit the latest turn for drift and omissions
score the current drift pressure instead of only listing raw gaps
surface hidden requirements and questions behind the question
retrieve only the most relevant anchors
generate a low-volume, non-overriding proxy trickle
reinforce recurring context in a memory ledger
promote repeated drift into persistent focus that can change retrieval, trickle ordering, and release behavior
release similarity-matched corrections from memory and trickle into the next step
run a staged external prototype with early, mid, and deep Balloon passes

By design, the server returns analysis artifacts and corrective context. It does not patch your repo by itself.
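
To make "score the current drift pressure instead of only listing raw gaps" concrete, here is a minimal TypeScript sketch. The `Gap` shape, the severity weights, and `scoreDriftPressure` are all hypothetical and are not taken from Balloon's source; they only illustrate collapsing a gap list into one bounded number.

```typescript
// Hypothetical gap record; Balloon's real gap report shape may differ.
type Severity = "low" | "medium" | "high";

interface Gap {
  kind: "constraint" | "protected_area" | "verification";
  severity: Severity;
  description: string;
}

// Assumed weights, chosen only for illustration.
const SEVERITY_WEIGHT: Record<Severity, number> = { low: 1, medium: 2, high: 4 };

// Collapse a list of gaps into a single pressure value in [0, 1].
// `saturation` is the weighted total at which pressure is treated as maximal.
function scoreDriftPressure(gaps: Gap[], saturation = 8): number {
  const total = gaps.reduce((sum, gap) => sum + SEVERITY_WEIGHT[gap.severity], 0);
  return Math.min(1, total / saturation);
}

const exampleGaps: Gap[] = [
  { kind: "constraint", severity: "high", description: "proposed a rewrite despite a bounded-change request" },
  { kind: "verification", severity: "medium", description: "dropped the test requirement" },
];

const pressure = scoreDriftPressure(exampleGaps);
console.log(pressure); // 0.75
```

A single bounded value like this is easier to track across turns than a raw list, which is the property the pressure summaries rely on.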

Optional Hybrid Lane

Balloon now has an optional hybrid semantic lane in addition to the deterministic base.

That means:

deterministic Balloon stays the stable benchmark anchor
semantic CARA can be enabled in shadow mode or assist mode
developers can plug in their own model-backed adapter without changing the core server
shadow mode and assist mode are both in the current smoke path
assist mode still depends on the host allowing adapter process execution

See docs/SEMANTIC_CARA.md.

Staged External Prototype

Balloon now also includes a first staged external prototype.

That staged lane is still honest about the MCP boundary:

it is an external approximation, not hidden-state access
it runs early, mid, and deep Balloon stages in the open
it uses similarity-gated release to decide which memory/trickle corrections should stay visible in the next step
it gives us a fourth benchmark lane beyond baseline, deterministic, and assist

See docs/STAGED_EXTERNAL_BALLOON.md.
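
As a rough illustration of similarity-gated release, the sketch below keeps only the stored corrections whose wording overlaps enough with the upcoming step. It uses Jaccard word overlap as a stand-in similarity measure; the measure, the threshold, and every name here (`Correction`, `releaseCorrections`) are assumptions for illustration, not Balloon's actual implementation.

```typescript
// Hypothetical correction held in memory or produced as trickle.
interface Correction {
  id: string;
  text: string;
}

// Lowercased, de-duplicated word list.
function tokenize(text: string): string[] {
  const words: string[] = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  return words.filter((word, index) => words.indexOf(word) === index);
}

// Jaccard similarity: shared tokens divided by total distinct tokens.
function jaccard(a: string[], b: string[]): number {
  if (a.length === 0 && b.length === 0) return 0;
  const shared = a.filter((token) => b.indexOf(token) !== -1).length;
  return shared / (a.length + b.length - shared);
}

// Release only the corrections that are similar enough to the next step.
function releaseCorrections(nextStep: string, corrections: Correction[], threshold = 0.2): Correction[] {
  const stepTokens = tokenize(nextStep);
  return corrections.filter((c) => jaccard(stepTokens, tokenize(c.text)) >= threshold);
}

const released = releaseCorrections("refactor the auth module and run tests", [
  { id: "c1", text: "do not skip tests when touching the auth module" },
  { id: "c2", text: "keep the billing schema unchanged" },
]);
console.log(released.map((c) => c.id)); // [ 'c1' ]
```

The gate keeps the correction about tests and the auth module and holds back the unrelated billing note, so the next step sees less but more relevant corrective context.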

If you just want the shortest mental model:

baseline drifts
deterministic Balloon repairs the drift
assist Balloon improves the wording and bounded-next-step quality
staged Balloon adds re-check discipline before scope widens

Why It Feels Different

A good Balloon run is not "more context" for its own sake.

It should make one specific failure visible:

the latest answer looks locally plausible
but it has stopped honoring earlier constraints, protected areas, or verification obligations
Balloon surfaces that loss of intent and applies smaller corrective pressure instead of stuffing the whole session back into the next turn
recurring drift can now become persistent focus, so repeated architecture or verification failures get pulled earlier into the correction path

That makes the first useful experience easier to relate to:

you already know what the session should respect
Balloon shows what was dropped
Balloon gives the next model turn a bounded way to recover

Core Entry Point

The fastest way to understand the server is:

balloon_run_cycle

It runs the main Balloon loop:

profile update
hidden-requirement detection
CARA-style gap audit
drift-pressure scoring
persistent drift focus when the same failure pattern keeps recurring
targeted retrieval
proxy trickle generation
optional memory reinforcement
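
For orientation, this is roughly what a host sends to invoke the tool. The envelope follows the MCP `tools/call` request shape (JSON-RPC 2.0); the argument names `sessionId` and `latestTurn` are assumptions, so check the tool's published input schema before relying on them.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The argument names below are hypothetical placeholders, not the documented schema.
const request = buildToolCall(1, "balloon_run_cycle", {
  sessionId: "demo-session",
  latestTurn: "I'll rewrite the whole module to simplify it.",
});

console.log(JSON.stringify(request, null, 2));
```

Most hosts build this envelope for you; the sketch is only meant to show where the tool name and arguments sit in the request.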

Protocol Surface

Tools:

balloon_run_cycle
balloon_build_profile
balloon_audit_turn
balloon_detect_hidden_requirements
balloon_targeted_retrieval
balloon_generate_proxy_trickle
balloon_repair_next_turn
balloon_semantic_cara_preview
balloon_compare_repair_lanes
balloon_run_staged_cycle
balloon_compare_benchmark_lanes
balloon_score_benchmark_lanes
balloon_run_long_session_benchmark
balloon_score_long_session_benchmark
balloon_prepare_host_setup_packet
balloon_validate_host_setup
balloon_run_install_diagnostics
balloon_prepare_host_flow_packet
balloon_prepare_host_validation_suite
balloon_record_host_validation_result
balloon_summarize_host_validation_results
balloon_describe_slopcode_starter_suite
balloon_plan_slopcode_starter_benchmark
balloon_prepare_slopcode_problem
balloon_prepare_slopcode_live_run_packet
balloon_prepare_slopcode_live_run_finalize_packet
balloon_prepare_slopcode_live_run_batch
balloon_finalize_slopcode_live_run
balloon_finalize_slopcode_live_run_batch
balloon_record_slopcode_run_evidence
balloon_summarize_slopcode_run_evidence
balloon_summarize_slopcode_starter_suite
balloon_export_slopcode_starter_artifacts
balloon_review_session_drift
balloon_update_memory_ledger
balloon_explain_gap_report

Prompts:

balloon/repair-next-turn
balloon/review-session-drift

Resources:

balloon://sessions/{sessionId}/summary
balloon://sessions/{sessionId}/profile
balloon://sessions/{sessionId}/gaps
balloon://sessions/{sessionId}/pressure
balloon://sessions/{sessionId}/trickles
balloon://sessions/{sessionId}/memory
balloon://sessions/{sessionId}/releases
balloon://hosts/matrix
balloon://hosts/{host}
balloon://hosts/{host}/playbook
balloon://hosts/{host}/validation-suite
balloon://hosts/{host}/validation-evidence
balloon://benchmark/slopcode/starter-suite
balloon://benchmark/slopcode/starter-suite/runbook
balloon://benchmark/slopcode/live-run-playbook
balloon://benchmark/slopcode/live-run-batch
balloon://benchmark/slopcode/evidence
balloon://benchmark/slopcode/evidence/{problemName}
balloon://benchmark/slopcode/problems/{problemName}
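
Several of these resource URIs are templates with `{placeholder}` segments. A small helper like the one below can fill them in before reading a resource; `expandResourceUri` is a hypothetical convenience function, not part of the server.

```typescript
// Fill {placeholders} in a resource URI template, percent-encoding each value.
function expandResourceUri(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    const value = values[key];
    if (value === undefined) {
      throw new Error(`Missing value for {${key}}`);
    }
    return encodeURIComponent(value);
  });
}

const pressureUri = expandResourceUri("balloon://sessions/{sessionId}/pressure", {
  sessionId: "demo-session",
});
console.log(pressureUri); // balloon://sessions/demo-session/pressure
```

The resulting URI is what a host would pass as the `uri` parameter of an MCP `resources/read` request.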

Getting Started

read docs/INSTALL.md
run npm run verify:balloon:mcp
try the workflow in docs/DEMO_WORKFLOW.md

The recommended real host test right now is VS Code with .vscode/mcp.json.
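
As a rough sketch of what that file can contain, the snippet below builds a stdio server entry and prints it as JSON. The server key `balloon`, the `node` command, and the placeholder entry path are assumptions; docs/INSTALL.md is the authoritative source for the real values.

```typescript
// Assumed layout of a VS Code .vscode/mcp.json entry for a stdio server.
interface StdioServerConfig {
  type: "stdio";
  command: string;
  args: string[];
}

interface McpConfig {
  servers: Record<string, StdioServerConfig>;
}

// Placeholder path: replace with the entry point named in docs/INSTALL.md.
const SERVER_ENTRY = "<path-to-balloon-server-entry>";

const mcpConfig: McpConfig = {
  servers: {
    // "balloon" is a hypothetical server name chosen for this example.
    balloon: { type: "stdio", command: "node", args: [SERVER_ENTRY] },
  },
};

// Print the JSON to paste into .vscode/mcp.json.
console.log(JSON.stringify(mcpConfig, null, 2));
```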

First Demo

The recommended first demo is intentionally small:

earlier context says not to rewrite architecture and not to skip tests
a later assistant turn confidently proposes a rewrite anyway
Balloon produces a gap report, a drift-pressure summary, a proxy trickle, and a sharper next-turn repair path

If your MCP host is unreliable about prompt invocation, use balloon_repair_next_turn as the tool-level fallback. It returns the repair packet and a deterministic repaired reply, which makes demos and benchmarks more repeatable.

If you want the drift-review prompt without relying on prompt routing, use balloon_review_session_drift.

If you want to compare deterministic vs hybrid repair output directly, use balloon_compare_repair_lanes.

If you want the staged external approximation without depending on prompt routing, use balloon_run_staged_cycle.

If you want the benchmark-safe four-lane comparison, use balloon_compare_benchmark_lanes.

If you want checkpointed long-session comparison in one tool call, use balloon_run_long_session_benchmark.

If you want to inspect whether drift pressure is rising, falling, or staying stuck across a session, read balloon://sessions/{sessionId}/pressure.
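
A minimal sketch of that kind of inspection, assuming you have already extracted a series of numeric pressure readings from the resource. The payload format is not documented here, so the `number[]` input, the tolerance, and `classifyTrend` are all hypothetical.

```typescript
type Trend = "rising" | "falling" | "stuck";

// Compare the first and last readings; changes within the tolerance count as "stuck".
function classifyTrend(scores: number[], tolerance = 0.05): Trend {
  if (scores.length < 2) return "stuck";
  const delta = scores[scores.length - 1] - scores[0];
  if (delta > tolerance) return "rising";
  if (delta < -tolerance) return "falling";
  return "stuck";
}

console.log(classifyTrend([0.2, 0.5, 0.75])); // rising
console.log(classifyTrend([0.75, 0.4, 0.25])); // falling
console.log(classifyTrend([0.5, 0.52, 0.5])); // stuck
```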

If you want Balloon to generate or sanity-check a host config packet, use one of the host tools (balloon_prepare_host_setup_packet, balloon_validate_host_setup, balloon_run_install_diagnostics, balloon_prepare_host_flow_packet, balloon_prepare_host_validation_suite, balloon_record_host_validation_result, balloon_summarize_host_validation_results), or read the balloon://hosts/matrix resource.

If you want the first real SlopCodeBench starter-suite workflow, use balloon_describe_slopcode_starter_suite and balloon_prepare_slopcode_problem.

If you want Balloon to hand you the full true-live rerun packet for a host/problem pair, use balloon_prepare_slopcode_live_run_packet.

If you want the whole starter-suite rerun pass prepared in one shot, use balloon_prepare_slopcode_live_run_batch.

If you want to paste the final transcript once and have Balloon score it, record the evidence, and export the artifact bundle in one pass, use balloon_finalize_slopcode_live_run.

If you want to refresh a whole starter-suite pass together after several real runs, use balloon_finalize_slopcode_live_run_batch.

If you want repo-backed SlopCodeBench (SCBench) summary bundles, use balloon_export_slopcode_starter_artifacts. Those exports now include both pressure traces and live-vs-replay evidence coverage.

If you want to keep benchmark claims honest, record whether a run was truly live with balloon_record_slopcode_run_evidence, summarize it with balloon_summarize_slopcode_run_evidence, and inspect balloon://benchmark/slopcode/evidence.

If the demo feels good, the important part is not that Balloon produced more text. The important part is that it preserved the existing direction and pushed the next reply back toward the user's real constraints.

Documentation

Visual Assets

app/listing icon: docs/assets/balloon-mcp-icon.png
README banner: docs/assets/balloon-mcp-banner.png
staged explainer image: docs/assets/balloon-mcp-stages.png
simple mark: docs/assets/balloon-mcp-mark.png

Good Fit

Balloon MCP is most useful when:

a session has strong prior constraints that should continue to matter
a locally plausible answer may still be drifting away from earlier intent
visible correction artifacts are more valuable than invisible prompt stuffing

Not The Claim

This public alpha does not claim:

hidden-state access to closed models
direct backend trickle into proprietary reasoning layers
repo-wide architecture auditing as the main product identity

The current server is the external approximation of the Balloon architecture: CARA-style gap analysis, targeted retrieval, and proxy trickle for context fidelity over time.
