This document describes the on-demand terminal UI integration suite used to validate real REPL rendering behavior in tmux.

Goals:

- Validate end-to-end UI behavior (startup banner, prompt, spinner, approvals, tool output).
- Catch regressions that unit tests cannot see (line redraw, status updates, terminal formatting).
- Preserve enough artifacts to make failures debuggable without rerunning immediately.
Integration test entrypoint: `tests/ui_tmux_regression.rs`

Harness utilities: `tests/ui_tmux/mod.rs`
Current scenarios:
- Baseline shell approval/rendering flow:
  - Start buddy inside an isolated tmux pane through asciinema.
  - Run one prompt that produces a deterministic `run_shell` tool call via a fake model server.
  - Approve the command.
  - Verify spinner/liveness lines, approval formatting, command output, and final assistant reply.
  - Exit cleanly and assert expected mock request count.
- Managed tmux pane + targeted shell flow:
  - Run scripted tool calls that create a managed pane (`tmux_create_pane`) and then run `run_shell` targeted to that pane.
  - Approve both operations.
  - Verify targeted approval and output rendering.
  - Assert expected mock request count and clean shutdown.
- Shared-shell guardrail flow:
  - Fake model calls `run_shell` with `set -e`.
  - Verify the command is blocked before execution with clear error text.
- Default-pane recovery flow:
  - Fake model sends `tmux_send_keys` with `exit` to kill the shared shell.
  - A follow-up `run_shell` request should trigger shared-pane recovery.
  - Verify the recovery notice and successful command execution.
- Missing-target suppression flow:
  - Fake model repeats the same missing `tmux_send_keys` target.
  - Verify repeated identical failures are suppressed with deterministic guidance.
The suite requires these commands in `PATH`:

- `tmux`
- `asciinema`

If either is missing, the ignored test fails with an actionable prerequisite message.
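The prerequisite check boils down to resolving each command via `PATH`. A minimal POSIX-sh sketch (the `check_prereqs` helper is illustrative, not a function in the harness):

```shell
# Illustrative preflight helper, not taken from the harness: prints the
# names of any commands that cannot be resolved via PATH.
check_prereqs() {
  missing=""
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
  done
  echo "$missing"
}

# A real caller would fail fast with an actionable message, e.g.:
# [ -z "$(check_prereqs tmux asciinema)" ] || echo "install the missing tools" >&2
```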
Each run writes under: `artifacts/ui-regression/<scenario>-<pid>-<timestamp>/`
Artifacts include:

- `session.cast`: full asciinema recording.
- `pipe.log`: continuous `tmux pipe-pane` output stream.
- `snapshots/*.txt`: checkpoint captures from `tmux capture-pane` (plain + ANSI).
- `report.json`: structured assertion report with `matched=true/false` and artifact paths.
Artifacts are intentionally preserved for both pass and fail runs.
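The report is roughly shaped like the sketch below; apart from `matched`, the field names are illustrative rather than the harness's actual schema:

```json
{
  "scenario": "baseline-shell-approval",
  "matched": true,
  "assertions": [
    { "expected_substring": "Approve", "matched": true }
  ],
  "artifacts": {
    "cast": "session.cast",
    "pipe_log": "pipe.log",
    "snapshots": "snapshots/"
  }
}
```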
Tmux cleanup behavior:
- Harness always kills its own detached session on teardown.
- Harness also kills the buddy-managed tmux session derived from the scenario session name to prevent session leaks across runs.
- Regression scenarios explicitly assert that the derived buddy-managed session does not exist after teardown.
Opt-in direct cargo command: `cargo test --test ui_tmux_regression -- --ignored --nocapture`

Makefile wrapper: `make test-ui-regression`

- The integration test starts a local scripted fake model HTTP server.
- The fake server returns:
- tool-call response on request #1,
- final assistant text on request #2.
- Responses include short delays to exercise spinner/liveness UI paths.
- The test writes and uses an isolated `buddy.toml` profile that targets the fake server.
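The scripted sequencing reduces to a dispatch on the request number. This toy sh sketch models only that counter logic; the real server speaks HTTP, and the response strings here are placeholders:

```shell
# Toy model of the fake server's script: request #1 yields a tool call,
# request #2 the final assistant text. Strings are placeholders, not the
# harness's actual payloads.
scripted_response() {
  # $1 = 1-based request number
  case "$1" in
    1) echo "tool_call:run_shell" ;;
    2) echo "final:assistant reply" ;;
    *) echo "error:unexpected request $1" ;;
  esac
}
```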
When adding scenarios:
- Keep each scenario deterministic and minimal.
- Add explicit expected substrings for each UI element being validated.
- Persist all relevant captures and update the `report.json` schema only additively.
- Keep tests `#[ignore]` unless intentionally moving them into default CI coverage.
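The expected-substring checks above can be as simple as this sh sketch; `assert_contains` is illustrative, not a harness API:

```shell
# Illustrative substring assertion against captured pane text; the
# harness's real matcher may differ.
assert_contains() {
  haystack=$1
  needle=$2
  case "$haystack" in
    *"$needle"*) return 0 ;;
    *) echo "missing expected substring: $needle" >&2; return 1 ;;
  esac
}
```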