Use response files for public agent output by wwind123 · Pull Request #62 · wwind123/coding-review-agent-loop

wwind123 · 2026-05-03T02:11:32Z

Summary

Add shared public response-file support under /tmp/coding-review-agent-loop/responses/.
Instruct Claude and Gemini to write the final public GitHub response to that file.
Prefer the response-file contents over stdout when present, while keeping existing stdout/marker parsing as fallback.
Add a Gemini regression test using PR Refresh architecture docs and memory map #61-style internal stdout chatter to verify only the response file is posted.
Document the response-file behavior.

Local mimic

I ran a local FakeRunner PR #61 mimic with Gemini stdout containing diagnostics/tool narration and a clean public response file. The posted comment was only:

The round 1 blocking issue has been fully addressed.

-- Google Gemini

<!-- AGENT_STATE: approved -->

No GitHub writes were made by the mimic.

Tests

/home/wwind123/tools/coding-review-agent-loop/.venv/bin/python -m pytest tests/test_agent_loop.py -q
python3 -m py_compile src/coding_review_agent_loop/agents/base.py src/coding_review_agent_loop/agents/claude.py src/coding_review_agent_loop/agents/gemini.py tests/test_agent_loop.py

-- OpenAI Codex

wwind123 · 2026-05-03T02:16:37Z

Now I have enough context to write a complete review.

PR #62 Review: Use response files for public agent output

Summary of change: Adds a shared mechanism in base.py for generating a UUID-keyed response file path under /tmp/coding-review-agent-loop/responses/. Claude and Gemini are instructed to write their public response there; the orchestrator reads the file and prefers it over stdout. Stdout parsing is kept as fallback. A new Gemini regression test verifies the happy path.

Correctness

public_response_path / read_public_response_file (base.py):

The UUID per-invocation guarantees no cross-run collision. mkdir(parents=True, exist_ok=True) is called eagerly at path-generation time (not file-write time), which is fine — it only creates the directory.
read_public_response_file catches FileNotFoundError and returns None (correct fallback); other I/O errors propagate loudly, which is the right call.
_safe_repo_slug handles / and : but not other potential shell-special characters in repo names (e.g. .. or whitespace). In practice GitHub repo names disallow most of these, so this is not a blocking issue.

Claude (claude.py):

Prompt is always appended last; read_public_response_file(response_path) or text correctly falls back to the parsed JSON stdout if the file is absent. Session ID is still sourced from stdout, which is correct — only the visible response text changes.

Gemini (gemini.py):

Prompt wrapping order: _with_public_response_marker_instruction(with_public_response_file_instruction(...)) — the file instruction is inside, the marker instruction outside. Both mechanisms are active, and the orchestrator prefers the file. The existing _parse_gemini_output / _strip_gemini_preamble fallback still operates on stdout when no file is written.

Test _maybe_write_public_response_file:

Regex r"Write the final public response.*?\n\n([^\n]+/responses/[^\n]+\.md)" correctly parses the injected instruction. The re.S flag lets .*? skip past "that should be posted to GitHub to this file:" to reach the \n\n separator before the path. Verified against with_public_response_file_instruction's f-string template.
If no regex match is found despite public_response_outputs being non-empty, outputs are not consumed and could bleed into subsequent calls. This is acceptable for test-helper code in a deterministic test suite, but worth noting.

No blocking issues found.

Same-PR follow-ups

No test_claude_review_loop_prefers_public_response_file_over_stdout test. The PR adds test_gemini_review_loop_prefers_public_response_file_over_stdout but the analogous Claude test is missing. Claude's code path is symmetric (same public_response_path + read_public_response_file or text), so a mirror test using claude_outputs + public_response_outputs would complete the coverage in tests/test_agent_loop.py.

-- Anthropic Claude

wwind123 · 2026-05-03T02:17:46Z

The PR looks solid and correctly addresses the issue of internal agent dialogue leaking into GitHub comments by providing a dedicated response-file mechanism.

Same-PR follow-ups

Missing Claude regression test. While a Gemini test was added (test_gemini_review_loop_prefers_public_response_file_over_stdout), an analogous test for Claude should be added to tests/test_agent_loop.py to ensure symmetric coverage, especially since Claude's output parsing (JSON-based) differs from Gemini's.

-- Google Gemini

wwind123 · 2026-05-03T02:18:50Z

Implemented and pushed the same-PR follow-up to PR #62.

Added test_claude_review_loop_prefers_public_response_file_over_stdout in tests/test_agent_loop.py, mirroring the Gemini regression while using Claude’s JSON output path and asserting the public response file content is what gets posted.

Verified:
python -m pytest tests/test_agent_loop.py -k 'public_response_file_over_stdout'
python -m pytest

Commit pushed:
6f5c67a Add Claude public response regression test

-- OpenAI Codex

wwind123 · 2026-05-03T02:19:41Z

The new test is well-constructed. A few checks on correctness:

claude_outputs is a valid JSON-encoded string with a result field containing internal chatter — correctly exercises Claude's --output-format json path via _parse_claude_output.
claude_call[-1] is the right index: Claude appends the prompt last (args.append(...)), distinct from Gemini which uses cmd[2] as the --prompt value.
The path assertion "/coding-review-agent-loop/responses/OWNER-REPO/claude/" matches the structure produced by _safe_repo_slug("OWNER/REPO") = "OWNER-REPO" plus agent name "claude".
read_public_response_file strips the text, and the asserted comment matches the provided public_response_outputs value exactly — no off-by-one whitespace issues.
run_pr_loop(...) == 0 verifies the loop correctly parses  from the file content and terminates cleanly.

The round 2 change directly and completely addresses the only same-PR follow-up from round 1. No new issues introduced.

-- Anthropic Claude

wwind123 · 2026-05-03T02:21:15Z

The PR #62 is well-implemented and effectively addresses the issue of internal agent dialogue leaking into GitHub comments. By introducing a dedicated "public response file" mechanism, it provides a robust way for agents to separate their internal reasoning and diagnostics from the final public output.

Key highlights of the PR:

Robust Mechanism: The use of UUID-keyed response files under /tmp/coding-review-agent-loop/responses/ ensures isolation between different runs and agents.
Graceful Fallback: The orchestrator maintains backward compatibility by falling back to stdout parsing if the response file is missing or empty.
Clear Instructions: Both Claude and Gemini are given explicit instructions in their prompts to use the response file for their final output.
Symmetric Test Coverage: Regression tests have been added for both Gemini and Claude to verify that the response file content is preferred over stdout chatter.
Belts and Suspenders: Gemini retains its existing marker-based filtering as an additional layer of protection.

I've verified the changes by running the newly added regression tests, and they pass as expected.

Same-PR follow-ups

None. The previously identified missing Claude test has been implemented and verified.

Future follow-ups

None.

-- Google Gemini

Use response files for public agent output

79938d3

Add Claude public response regression test

6f5c67a

wwind123 merged commit e2d76ec into main May 3, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use response files for public agent output#62

Use response files for public agent output#62
wwind123 merged 2 commits into
mainfrom
codex/strip-gemini-internal-dialogue

wwind123 commented May 3, 2026

Uh oh!

wwind123 commented May 3, 2026

Uh oh!

wwind123 commented May 3, 2026

Uh oh!

wwind123 commented May 3, 2026

Uh oh!

wwind123 commented May 3, 2026

Uh oh!

wwind123 commented May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

wwind123 commented May 3, 2026

Summary

Local mimic

Tests

Uh oh!

wwind123 commented May 3, 2026

PR #62 Review: Use response files for public agent output

Correctness

Same-PR follow-ups

Uh oh!

wwind123 commented May 3, 2026

Same-PR follow-ups

Uh oh!

wwind123 commented May 3, 2026

Uh oh!

wwind123 commented May 3, 2026

Uh oh!

wwind123 commented May 3, 2026

Same-PR follow-ups

Future follow-ups

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant