Balloon MCP is an MCP server for monitoring context fidelity in long AI sessions.
The server is designed around one central observation: long sessions do not just lose facts; they often lose the shape of the user's intent. Balloon turns that problem into a visible runtime surface with profiles, gap reports, drift-pressure summaries, retrieval anchors, corrective prompts, and replayable artifacts.
At a glance:
- deterministic Balloon is the stable benchmark anchor
- assist Balloon adds optional semantic refinement
- staged Balloon adds early, mid, and deep external passes for longer-session discipline
AI sessions often fail in a frustratingly subtle way:
- the latest answer sounds locally reasonable
- but it quietly abandons earlier constraints, protected areas, or verification obligations
- the user now has to manually drag the session back onto the right path
That is the problem Balloon is built to surface.
For a developer, this usually looks like:
- a model proposes a broad refactor when you asked for a bounded change
- a test requirement disappears halfway through a session
- a protected file or architecture choice gets ignored because the local turn sounded plausible
Balloon is meant to act as a reasoning sidecar for that failure mode. It does not try to be a magic code generator. It tries to make drift visible and to apply smaller corrective pressure before the session loses the plot.
This release is an early public alpha.
It is a working external approximation of the Balloon architecture. It does not claim:
- hidden-state access to closed models
- direct backend trickle into proprietary reasoning layers
- inference-layer memory implantation
Balloon MCP helps a host application:
- build a structured session profile
- audit the latest turn for drift and omissions
- score the current drift pressure instead of only listing raw gaps
- surface hidden requirements and questions behind the question
- retrieve only the most relevant anchors
- generate a low-volume, non-overriding proxy trickle
- reinforce recurring context in a memory ledger
- promote repeated drift into persistent focus that can change retrieval, trickle ordering, and release behavior
- release similarity-matched corrections from the memory ledger and the proxy trickle into the next step
- run a staged external prototype with early, mid, and deep Balloon passes
By design, the server returns analysis artifacts and corrective context. It does not patch your repo by itself.
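Scoring pressure rather than only listing gaps can be illustrated with a small TypeScript sketch. Everything in it is an assumption for illustration: the Gap shape, the severity weights, the recurrence bonus, and the saturation constant are hypothetical and are not Balloon's actual scoring rules or output schema.

```typescript
// Hypothetical gap shape for illustration; Balloon's real gap report schema may differ.
type Severity = "low" | "medium" | "high";

interface Gap {
  kind: string; // e.g. "constraint_dropped" or "verification_missing" (illustrative labels)
  severity: Severity;
  recurrences: number; // how many earlier turns showed the same gap
}

// Hypothetical weights, chosen only to make the example readable.
const SEVERITY_WEIGHT: Record<Severity, number> = { low: 1, medium: 2, high: 4 };

// Collapse a gap list into a single pressure value in [0, 1].
// Recurring gaps weigh more than one-off gaps, and the score saturates at 1
// so a long tail of minor gaps cannot grow without bound.
function driftPressure(gaps: Gap[], saturation = 10): number {
  const raw = gaps.reduce(
    (sum, g) => sum + SEVERITY_WEIGHT[g.severity] * (1 + 0.5 * g.recurrences),
    0,
  );
  return Math.min(1, raw / saturation);
}

const gaps: Gap[] = [
  { kind: "constraint_dropped", severity: "high", recurrences: 1 },
  { kind: "verification_missing", severity: "medium", recurrences: 0 },
];
console.log(driftPressure(gaps).toFixed(2)); // prints: 0.80 (raw score 6 + 2 = 8, divided by 10)
```

A single saturating number is easier to track across turns than a raw gap list, which is the point of a pressure summary.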
Balloon now has an optional hybrid semantic lane in addition to the deterministic base.
That means:
- deterministic Balloon stays the stable benchmark anchor
- semantic CARA can be enabled as a shadow or assist mode
- developers can plug in their own model-backed adapter without changing the core server
- shadow mode and assist mode are both covered by the current smoke-test path
- assist mode still depends on the host allowing adapter process execution
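The difference between shadow and assist mode can be sketched as a small lane runner. The SemanticAdapter interface and runRepairLane function are hypothetical names, not the server's adapter contract. The sketch only shows the idea: shadow mode records the semantic refinement without returning it, assist mode applies it, and the deterministic output remains the fallback when the adapter cannot run.

```typescript
type LaneMode = "deterministic" | "shadow" | "assist";

// Hypothetical adapter contract; a developer could back this with any model call.
interface SemanticAdapter {
  refine(deterministicRepair: string): string;
}

interface LaneResult {
  returned: string; // what the host actually receives
  shadowCandidate?: string; // semantic output kept only for side-by-side comparison
}

function runRepairLane(
  mode: LaneMode,
  deterministicRepair: string,
  adapter?: SemanticAdapter,
): LaneResult {
  // Deterministic mode, or no adapter configured: return the stable anchor unchanged.
  if (mode === "deterministic" || adapter === undefined) {
    return { returned: deterministicRepair };
  }
  let candidate: string;
  try {
    candidate = adapter.refine(deterministicRepair);
  } catch {
    // e.g. the host refused to launch the adapter process: fall back to the anchor.
    return { returned: deterministicRepair };
  }
  // Shadow: compute the refinement but keep returning the deterministic output.
  if (mode === "shadow") {
    return { returned: deterministicRepair, shadowCandidate: candidate };
  }
  // Assist: the refinement replaces the wording the host sees.
  return { returned: candidate };
}

const upperAdapter: SemanticAdapter = { refine: (s) => s.toUpperCase() };
console.log(runRepairLane("shadow", "keep the tests", upperAdapter).returned); // prints: keep the tests
console.log(runRepairLane("assist", "keep the tests", upperAdapter).returned); // prints: KEEP THE TESTS
```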
Balloon now also includes a first staged external prototype.
That staged lane is still honest about the MCP boundary:
- it is an external approximation, not hidden-state access
- it runs early, mid, and deep Balloon stages in the open
- it uses similarity-gated release to decide which memory/trickle corrections should stay visible in the next step
- it gives us a fourth benchmark lane beyond baseline, deterministic, and assist
See docs/STAGED_EXTERNAL_BALLOON.md.
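Similarity-gated release can be pictured as a filter over candidate corrections. This sketch assumes plain word-overlap (Jaccard) similarity and a fixed threshold; both are stand-ins chosen for readability, and the real server may use a different similarity signal and gating rule. The Correction type and gateRelease function are hypothetical.

```typescript
interface Correction {
  id: string;
  text: string;
}

// Lowercase word set; a deliberately crude stand-in for a real similarity signal.
function wordSet(text: string): Set<string> {
  return new Set<string>(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity: shared words divided by all distinct words.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((w) => {
    if (b.has(w)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Release only corrections that overlap the next step enough to be relevant;
// the rest stay in memory instead of crowding the next turn.
function gateRelease(corrections: Correction[], nextStep: string, threshold = 0.2): Correction[] {
  const next = wordSet(nextStep);
  return corrections.filter((c) => jaccard(wordSet(c.text), next) >= threshold);
}

const candidates: Correction[] = [
  { id: "tests", text: "Run the existing tests before merging." },
  { id: "arch", text: "Do not rewrite the storage architecture." },
];
const released = gateRelease(candidates, "Next I will rewrite the storage architecture for speed.");
console.log(released.map((c) => c.id)); // prints: [ 'arch' ]
```

Raising the threshold makes release stricter. The exact signal matters less than the principle that only corrections relevant to the next step stay visible.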
If you just want the shortest mental model:
- baseline drifts
- deterministic Balloon repairs the drift
- assist Balloon improves the wording and bounded-next-step quality
- staged Balloon adds re-check discipline before scope widens
A good Balloon run is not "more context" for its own sake.
It should make one specific failure visible:
- the latest answer looks locally plausible
- but it has stopped honoring earlier constraints, protected areas, or verification obligations
- Balloon surfaces that loss of intent and applies smaller corrective pressure instead of stuffing the whole session back into the next turn
- recurring drift can now become persistent focus, so repeated architecture or verification failures get pulled earlier into the correction path
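Promotion of repeated drift into persistent focus can be sketched as a counter with a threshold. DriftFocusTracker, the pattern labels, and the promotion threshold are hypothetical; the sketch only shows how focused patterns could be moved to the front of the correction order.

```typescript
// Hypothetical tracker: a drift pattern seen at least `promoteAfter` times
// becomes persistent focus and is ranked ahead of other corrections.
class DriftFocusTracker {
  private readonly counts = new Map<string, number>();
  private readonly promoteAfter: number;

  constructor(promoteAfter = 3) {
    this.promoteAfter = promoteAfter;
  }

  record(pattern: string): void {
    this.counts.set(pattern, (this.counts.get(pattern) ?? 0) + 1);
  }

  isFocused(pattern: string): boolean {
    return (this.counts.get(pattern) ?? 0) >= this.promoteAfter;
  }

  // Focused patterns come first; otherwise the original order is kept.
  prioritize(patterns: string[]): string[] {
    return patterns
      .map((p, i) => ({ p, i, focused: this.isFocused(p) }))
      .sort((a, b) => Number(b.focused) - Number(a.focused) || a.i - b.i)
      .map((x) => x.p);
  }
}

const tracker = new DriftFocusTracker(2);
tracker.record("architecture_rewrite");
tracker.record("skipped_tests");
tracker.record("architecture_rewrite");
console.log(tracker.prioritize(["skipped_tests", "architecture_rewrite"]));
// prints: [ 'architecture_rewrite', 'skipped_tests' ]
```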
That makes the first useful experience easier to relate to:
- you already know what the session should respect
- Balloon shows what was dropped
- Balloon gives the next model turn a bounded way to recover
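A bounded way to recover can be approximated as a budgeted proxy trickle: rank reminders by pressure, stop at a size budget, and phrase each one as a reminder rather than a new instruction. The Reminder type, buildTrickle, the wording template, and the character budget are all assumptions for illustration, not the server's trickle format.

```typescript
interface Reminder {
  text: string;
  pressure: number; // 0..1, higher means the dropped context matters more right now
}

// Build a low-volume, non-overriding trickle: highest-pressure reminders first,
// stopping once the character budget (newlines not counted) would be exceeded.
function buildTrickle(reminders: Reminder[], maxChars = 200): string {
  const ranked = reminders.slice().sort((a, b) => b.pressure - a.pressure);
  const lines: string[] = [];
  let used = 0;
  for (const r of ranked) {
    // Phrased as a reminder, not a command, so it does not override the user.
    const line = `- Reminder (earlier context): ${r.text}`;
    if (used + line.length > maxChars) break;
    lines.push(line);
    used += line.length;
  }
  return lines.join("\n");
}

const trickle = buildTrickle(
  [
    { text: "prefer small diffs", pressure: 0.2 },
    { text: "do not rewrite the architecture", pressure: 0.9 },
    { text: "keep the existing tests", pressure: 0.6 },
  ],
  120,
);
// prints the architecture and tests reminders; the third would exceed the 120-character budget
console.log(trickle);
```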
The fastest way to understand the server is the balloon_run_cycle tool.
It runs the main Balloon loop:
- profile update
- hidden-requirement detection
- CARA-style gap audit
- drift-pressure scoring
- persistent drift focus when the same failure pattern keeps recurring
- targeted retrieval
- proxy trickle generation
- optional memory reinforcement
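MCP hosts normally issue tool calls for you, but it helps to see what a balloon_run_cycle invocation looks like on the wire. The JSON-RPC envelope below follows the MCP tools/call method; the argument names (sessionId, latestTurn, reinforceMemory) are hypothetical placeholders, so check the tool's actual input schema in the server's tools/list response.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(id: number, toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: toolName, arguments: args } };
}

// Hypothetical argument names; read the real schema from tools/list.
const request = toolCall(1, "balloon_run_cycle", {
  sessionId: "demo-session",
  latestTurn: "I'll rewrite the storage layer and we can add tests later.",
  reinforceMemory: true,
});

console.log(JSON.stringify(request, null, 2));
```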
Tools:
- balloon_run_cycle
- balloon_build_profile
- balloon_audit_turn
- balloon_detect_hidden_requirements
- balloon_targeted_retrieval
- balloon_generate_proxy_trickle
- balloon_repair_next_turn
- balloon_semantic_cara_preview
- balloon_compare_repair_lanes
- balloon_run_staged_cycle
- balloon_compare_benchmark_lanes
- balloon_score_benchmark_lanes
- balloon_run_long_session_benchmark
- balloon_score_long_session_benchmark
- balloon_prepare_host_setup_packet
- balloon_validate_host_setup
- balloon_run_install_diagnostics
- balloon_prepare_host_flow_packet
- balloon_prepare_host_validation_suite
- balloon_record_host_validation_result
- balloon_summarize_host_validation_results
- balloon_describe_slopcode_starter_suite
- balloon_plan_slopcode_starter_benchmark
- balloon_prepare_slopcode_problem
- balloon_prepare_slopcode_live_run_packet
- balloon_prepare_slopcode_live_run_finalize_packet
- balloon_prepare_slopcode_live_run_batch
- balloon_finalize_slopcode_live_run
- balloon_finalize_slopcode_live_run_batch
- balloon_record_slopcode_run_evidence
- balloon_summarize_slopcode_run_evidence
- balloon_summarize_slopcode_starter_suite
- balloon_export_slopcode_starter_artifacts
- balloon_review_session_drift
- balloon_update_memory_ledger
- balloon_explain_gap_report
Prompts:
- balloon/repair-next-turn
- balloon/review-session-drift
Resources:
- balloon://sessions/{sessionId}/summary
- balloon://sessions/{sessionId}/profile
- balloon://sessions/{sessionId}/gaps
- balloon://sessions/{sessionId}/pressure
- balloon://sessions/{sessionId}/trickles
- balloon://sessions/{sessionId}/memory
- balloon://sessions/{sessionId}/releases
- balloon://hosts/matrix
- balloon://hosts/{host}
- balloon://hosts/{host}/playbook
- balloon://hosts/{host}/validation-suite
- balloon://hosts/{host}/validation-evidence
- balloon://benchmark/slopcode/starter-suite
- balloon://benchmark/slopcode/starter-suite/runbook
- balloon://benchmark/slopcode/live-run-playbook
- balloon://benchmark/slopcode/live-run-batch
- balloon://benchmark/slopcode/evidence
- balloon://benchmark/slopcode/evidence/{problemName}
- balloon://benchmark/slopcode/problems/{problemName}
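Session and host resources are addressed by URI templates. A minimal sketch of filling a template and building the matching MCP resources/read request follows; expandUri is a hypothetical helper, and percent-encoding the values is an assumption about how IDs should be escaped, not documented server behavior.

```typescript
// Fill a resource URI template such as "balloon://sessions/{sessionId}/pressure".
// Values are percent-encoded (an assumption) so IDs with spaces or slashes stay one segment.
function expandUri(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    const value = vars[key];
    if (value === undefined) throw new Error(`missing value for {${key}}`);
    return encodeURIComponent(value);
  });
}

// MCP "resources/read" request for the expanded URI (JSON-RPC 2.0).
function readResource(id: number, uri: string) {
  return { jsonrpc: "2.0" as const, id, method: "resources/read" as const, params: { uri } };
}

const pressureUri = expandUri("balloon://sessions/{sessionId}/pressure", { sessionId: "demo-session" });
console.log(pressureUri); // prints: balloon://sessions/demo-session/pressure
console.log(JSON.stringify(readResource(2, pressureUri)));
```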
To get started:
- read docs/INSTALL.md
- run npm run verify:balloon:mcp
- try the workflow in docs/DEMO_WORKFLOW.md
The recommended real host test right now is VS Code with .vscode/mcp.json.
The recommended first demo is intentionally small:
- earlier context says not to rewrite architecture and not to skip tests
- a later assistant turn confidently proposes a rewrite anyway
- Balloon produces a gap report, a drift-pressure summary, a proxy trickle, and a sharper next-turn repair path
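The first demo's core check can be caricatured as a keyword audit of the latest turn against earlier constraints. This is a toy: the Constraint shape, the regex patterns, and auditTurn are hypothetical, and a real CARA-style gap audit is not a list of regular expressions. It only shows the shape of the output, one finding per violated constraint with the offending evidence.

```typescript
// Hypothetical earlier constraints, each paired with phrases that would violate it.
interface Constraint {
  id: string;
  statement: string;
  violations: RegExp[];
}

interface GapFinding {
  constraintId: string;
  statement: string;
  evidence: string;
}

const constraints: Constraint[] = [
  { id: "no-rewrite", statement: "Do not rewrite the architecture.", violations: [/\brewrite\b/i, /\bfrom scratch\b/i] },
  { id: "keep-tests", statement: "Do not skip tests.", violations: [/\bskip (the )?tests\b/i, /\btests? later\b/i] },
];

// Toy keyword audit of the latest assistant turn against earlier constraints.
function auditTurn(turn: string, rules: Constraint[]): GapFinding[] {
  const findings: GapFinding[] = [];
  for (const rule of rules) {
    for (const pattern of rule.violations) {
      const hit = turn.match(pattern);
      if (hit) {
        findings.push({ constraintId: rule.id, statement: rule.statement, evidence: hit[0] });
        break; // one finding per constraint is enough
      }
    }
  }
  return findings;
}

const latest = "The cleanest fix is to rewrite the module layout; we can add tests later.";
for (const f of auditTurn(latest, constraints)) {
  console.log(`${f.constraintId}: "${f.evidence}" conflicts with "${f.statement}"`);
}
```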
If your MCP host is unreliable about prompt invocation, use balloon_repair_next_turn as the tool-level fallback. It returns the repair packet and a deterministic repaired reply, which makes demos and benchmarks more repeatable.
If you want the drift-review prompt without relying on prompt routing, use balloon_review_session_drift.
If you want to compare deterministic vs hybrid repair output directly, use balloon_compare_repair_lanes.
If you want the staged external approximation without depending on prompt routing, use balloon_run_staged_cycle.
If you want the benchmark-safe four-lane comparison, use balloon_compare_benchmark_lanes.
If you want checkpointed long-session comparison in one tool call, use balloon_run_long_session_benchmark.
If you want to inspect whether drift pressure is rising, falling, or staying stuck across a session, read balloon://sessions/{sessionId}/pressure.
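One way to label a pressure history as rising, falling, or stuck is to compare the mean of the most recent window with the window before it. The window size, tolerance, and pressureTrend function are assumptions for illustration, not the logic behind the pressure resource.

```typescript
type Trend = "rising" | "falling" | "stuck";

// Compare the mean of the most recent window of pressure samples (0..1)
// with the mean of the window just before it.
function pressureTrend(samples: number[], windowSize = 3, tolerance = 0.05): Trend {
  const w = Math.min(windowSize, Math.floor(samples.length / 2));
  if (w === 0) return "stuck"; // fewer than two samples carry no direction
  const mean = (xs: number[]): number => xs.reduce((s, x) => s + x, 0) / xs.length;
  const recent = mean(samples.slice(-w));
  const previous = mean(samples.slice(-2 * w, -w));
  const delta = recent - previous;
  if (delta > tolerance) return "rising";
  if (delta < -tolerance) return "falling";
  return "stuck"; // movement within tolerance: pressure is holding steady
}

console.log(pressureTrend([0.2, 0.3, 0.4, 0.6, 0.7, 0.8])); // prints: rising
console.log(pressureTrend([0.8, 0.7, 0.6, 0.3, 0.2, 0.2])); // prints: falling
console.log(pressureTrend([0.6, 0.62, 0.6, 0.61, 0.6, 0.62])); // prints: stuck
```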
If you want Balloon to generate or sanity-check a host config packet, use balloon_prepare_host_setup_packet, balloon_validate_host_setup, balloon_run_install_diagnostics, balloon_prepare_host_flow_packet, balloon_prepare_host_validation_suite, balloon_record_host_validation_result, balloon_summarize_host_validation_results, or balloon://hosts/matrix.
If you want the first real SlopCodeBench starter-suite workflow, use balloon_describe_slopcode_starter_suite and balloon_prepare_slopcode_problem.
If you want Balloon to hand you the full true-live rerun packet for a host/problem pair, use balloon_prepare_slopcode_live_run_packet.
If you want the whole starter-suite rerun pass prepared in one shot, use balloon_prepare_slopcode_live_run_batch.
If you want to paste the final transcript once and have Balloon score it, record the evidence, and export the artifact bundle in one pass, use balloon_finalize_slopcode_live_run.
If you want to refresh a whole starter-suite pass together after several real runs, use balloon_finalize_slopcode_live_run_batch.
If you want repo-backed SlopCodeBench (SCBench) summary bundles, use balloon_export_slopcode_starter_artifacts. Those exports now include both pressure traces and live-vs-replay evidence coverage.
If you want to keep benchmark claims honest, record whether a run was truly live with balloon_record_slopcode_run_evidence, summarize it with balloon_summarize_slopcode_run_evidence, and inspect balloon://benchmark/slopcode/evidence.
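Keeping live and replayed runs apart is mostly bookkeeping. The sketch below assumes a hypothetical evidence record with a live/replay flag and summarizes coverage, including which problems still lack a truly live run; the field names and summarizeEvidence are illustrative and are not the server's evidence schema.

```typescript
type RunKind = "live" | "replay";

// Hypothetical evidence record; the real evidence schema may differ.
interface RunEvidence {
  problem: string;
  host: string;
  kind: RunKind;
}

interface EvidenceSummary {
  live: number;
  replay: number;
  liveShare: number; // fraction of recorded runs that were truly live
  problemsWithoutLiveRun: string[];
}

function summarizeEvidence(runs: RunEvidence[]): EvidenceSummary {
  const liveRuns = runs.filter((r) => r.kind === "live");
  const liveProblems = new Set(liveRuns.map((r) => r.problem));
  const allProblems = Array.from(new Set(runs.map((r) => r.problem)));
  return {
    live: liveRuns.length,
    replay: runs.length - liveRuns.length,
    liveShare: runs.length === 0 ? 0 : liveRuns.length / runs.length,
    // Problems that only have replayed evidence should not back a "live" claim.
    problemsWithoutLiveRun: allProblems.filter((p) => !liveProblems.has(p)),
  };
}

const summary = summarizeEvidence([
  { problem: "problem_a", host: "vscode", kind: "live" },
  { problem: "problem_a", host: "cline", kind: "replay" },
  { problem: "problem_b", host: "vscode", kind: "replay" },
]);
console.log(summary.problemsWithoutLiveRun); // prints: [ 'problem_b' ]
```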
If the demo feels good, the important part is not that Balloon produced more text. The important part is that it preserved the existing direction and pushed the next reply back toward the user's real constraints.
Documentation:
- Installation
- Demo workflow
- Current readiness
- Semantic CARA
- Staged external Balloon
- Host compatibility
- Host validation
- Cline quickstart
- Roo Code quickstart
- Latency and correction tax
- Benchmark lanes
- Long-session benchmark
- SlopCodeBench starter suite
- Contributor starters
- MCP listings
- Architecture roadmap
- Contributing
- Security policy
- Support
Brand assets:
- app/listing icon: docs/assets/balloon-mcp-icon.png
- README banner: docs/assets/balloon-mcp-banner.png
- staged explainer image: docs/assets/balloon-mcp-stages.png
- simple mark: docs/assets/balloon-mcp-mark.png
Balloon MCP is most useful when:
- a session has strong prior constraints that should continue to matter
- a locally plausible answer may still be drifting away from earlier intent
- visible correction artifacts are more valuable than invisible prompt stuffing
This public alpha does not claim:
- hidden-state access to closed models
- direct backend trickle into proprietary reasoning layers
- repo-wide architecture auditing as the main product identity
The current server is the external approximation of the Balloon architecture: CARA-style gap analysis, targeted retrieval, and proxy trickle for context fidelity over time.
