diff --git a/.codex/skills/babysit-pr/SKILL.md b/.codex/skills/babysit-pr/SKILL.md
deleted file mode 100644
index cec623ccafd..00000000000
--- a/.codex/skills/babysit-pr/SKILL.md
+++ /dev/null
@@ -1,227 +0,0 @@
----
-name: babysit-pr
-description: Babysit a GitHub pull request after creation by continuously polling review comments, CI checks/workflow runs, and mergeability state until the PR is merged/closed or user help is required. Diagnose failures, retry likely flaky failures up to 3 times, auto-fix/push branch-related issues when appropriate, and keep watching open PRs so fresh review feedback is surfaced promptly. Use when the user asks Codex to monitor a PR, watch CI, handle review comments, or keep an eye on failures and feedback on an open PR.
-resources:
-  - path: scripts/gh_pr_watch.py
-    kind: script
-    description: GitHub PR watcher that normalizes CI, review, mergeability, and retry state.
-  - path: references/heuristics.md
-    kind: reference
-    description: CI failure classification checklist for PR babysitting decisions.
-  - path: references/github-api-notes.md
-    kind: reference
-    description: GitHub API notes for PR, review, and Actions watcher behavior.
-commands:
-  - name: pr-snapshot
-    resource_path: scripts/gh_pr_watch.py
-    example_argv: ["python3", ".codex/skills/babysit-pr/scripts/gh_pr_watch.py", "--pr", "auto", "--once"]
-    purpose: Emit one JSON snapshot of PR review, CI, and mergeability state.
-  - name: pr-watch
-    resource_path: scripts/gh_pr_watch.py
-    example_argv: ["python3", ".codex/skills/babysit-pr/scripts/gh_pr_watch.py", "--pr", "auto", "--watch"]
-    purpose: Continuously emit JSONL snapshots while babysitting a PR.
-  - name: retry-failed-checks
-    resource_path: scripts/gh_pr_watch.py
-    example_argv: ["python3", ".codex/skills/babysit-pr/scripts/gh_pr_watch.py", "--pr", "auto", "--retry-failed-now"]
-    purpose: Rerun failed jobs for the current PR when watcher policy recommends it.
-workflow_defaults:
-  - name: pr_target
-    value: auto
-    description: Infer the PR from the current branch unless the user provides a number or URL.
-  - name: max_flaky_retries
-    value: "3"
-    description: Stop for user help after three unrelated/flaky rerun cycles per head SHA.
-  - name: poll_cadence
-    value: 1 minute while babysitting
-    description: Keep polling until the PR closes or a user-help blocker appears.
----
-
-# PR Babysitter
-
-## Objective
-Babysit a PR persistently until one of these terminal outcomes occurs:
-
-- The PR is merged or closed.
-- A situation requires user help (for example CI infrastructure issues, repeated flaky failures after retry budget is exhausted, permission problems, or ambiguity that cannot be resolved safely).
-- Optional handoff milestone: the PR is currently green + mergeable + review-clean. Treat this as a progress state, not a watcher stop, so late-arriving review comments are still surfaced promptly while the PR remains open.
-
-Do not stop merely because a single snapshot returns `idle` while checks are still pending.
-
-## Inputs
-Accept any of the following:
-
-- No PR argument: infer the PR from the current branch (`--pr auto`)
-- PR number
-- PR URL
-
-## Core Workflow
-
-1. When the user asks to "monitor"/"watch"/"babysit" a PR, start with the watcher's continuous mode (`--watch`) unless you are intentionally doing a one-shot diagnostic snapshot.
-2. Run the watcher script to snapshot PR/review/CI state (or consume each streamed snapshot from `--watch`).
-3. Inspect the `actions` list in the JSON response.
-4. If `diagnose_ci_failure` is present, inspect failed run logs and classify the failure.
-5. If the failure is likely caused by the current branch, patch code locally, commit, and push. Do not patch random flaky tests, CI infrastructure, dependency outages, runner issues, or other failures that are unrelated to the branch.
-6. If `process_review_comment` is present, inspect surfaced review items and decide whether to address them.
-7. If a review item is actionable and correct, patch code locally, commit, push, and then mark the associated review thread/comment as resolved once the fix is on GitHub.
-8. Do not post replies to human-authored review comments/threads unless the user explicitly confirms the exact response. If a human review item is non-actionable, already addressed, or not valid, surface the item and recommended response to the user instead of replying on GitHub.
-9. If the failure is likely flaky/unrelated and `retry_failed_checks` is present, rerun failed jobs with `--retry-failed-now`.
-10. If both actionable review feedback and `retry_failed_checks` are present, prioritize review feedback first; a new commit will retrigger CI, so avoid rerunning flaky checks on the old SHA unless you intentionally defer the review change.
-11. On every loop, look for newly surfaced review feedback before acting on CI failures or mergeability state, then verify mergeability / merge-conflict status (for example via `gh pr view`) alongside CI.
-12. After any push or rerun action, immediately return to step 1 and continue polling on the updated SHA/state.
-13. If you had been using `--watch` before pausing to patch/commit/push, relaunch `--watch` yourself in the same turn immediately after the push (do not wait for the user to re-invoke the skill).
-14. Repeat polling until `stop_pr_closed` appears or a user-help-required blocker is reached. A green + review-clean + mergeable PR is a progress milestone, not a reason to stop the watcher while the PR is still open.
-15. Maintain terminal/session ownership: while babysitting is active, keep consuming watcher output in the same turn; do not leave a detached `--watch` process running and then end the turn as if monitoring were complete.
-
-## Commands
-
-### One-shot snapshot
-
-```bash
-python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --once
-```
-
-### Continuous watch (JSONL)
-
-```bash
-python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --watch
-```
-
-### Trigger flaky retry cycle (only when watcher indicates)
-
-```bash
-python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --retry-failed-now
-```
-
-### Explicit PR target
-
-```bash
-python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr --once
-```
-
-## CI Failure Classification
-Use `gh` commands to inspect failed runs before deciding to rerun.
-
-- `gh run view --json jobs,name,workflowName,conclusion,status,url,headSha`
-- `gh api repos///actions/runs//jobs -X GET -f per_page=100`
-- `gh api repos///actions/jobs//logs > /tmp/codex-gh-job--logs.zip`
-- `gh run view --log-failed` as a fallback after the overall workflow run is complete
-
-`gh run view --log-failed` is workflow-run scoped and may not expose failed-job logs until the overall run finishes. For faster diagnosis, poll the run's jobs first and, as soon as a specific job has failed, fetch that job's logs directly from the Actions job logs endpoint. The watcher includes a `failed_jobs` list with each failed job's `job_id` and `logs_endpoint` when GitHub exposes one.
-
-Prefer treating failures as branch-related when failed-job logs point to changed code (compile/test/lint/typecheck/snapshots/static analysis in touched areas).
-
-Prefer treating failures as flaky/unrelated when logs show transient infra/external issues (timeouts, runner provisioning failures, registry/network outages, GitHub Actions infra errors).
-
-Do not attempt to fix flaky/unrelated failures by changing tests, build scripts, CI configuration, dependency pins, or infrastructure-adjacent code unless the logs clearly connect the failure to the PR branch. For flaky/unrelated failures, rerun only when the watcher recommends `retry_failed_checks`; otherwise wait or stop for user help.
-
-If classification is ambiguous, perform one manual diagnosis attempt before choosing rerun.
-
-Read `.codex/skills/babysit-pr/references/heuristics.md` for a concise checklist.
-
-## Review Comment Handling
-The watcher surfaces review items from:
-
-- PR issue comments
-- Inline review comments
-- Review submissions (COMMENT / APPROVED / CHANGES_REQUESTED)
-
-It intentionally surfaces Codex reviewer bot feedback (for example comments/reviews from `chatgpt-codex-connector[bot]`) in addition to human reviewer feedback. Most unrelated bot noise should still be ignored.
-For safety, the watcher only auto-surfaces trusted human review authors (for example repo OWNER/MEMBER/COLLABORATOR, plus the authenticated operator) and approved review bots such as Codex.
-On a fresh watcher state file, existing pending review feedback may be surfaced immediately (not only comments that arrive after monitoring starts). This is intentional so already-open review comments are not missed.
-
-When you agree with a comment and it is actionable:
-
-1. Patch code locally.
-2. Commit with `codex: address PR review feedback (#)`.
-3. Push to the PR head branch.
-4. After the push succeeds, mark the associated GitHub review thread/comment as resolved.
-5. Resume watching on the new SHA immediately (do not stop after reporting the push).
-6. If monitoring was running in `--watch` mode, restart `--watch` immediately after the push in the same turn; do not wait for the user to ask again.
-
-Do not post replies to human-authored GitHub review comments/threads automatically. If you disagree with a human comment, believe it is non-actionable/already addressed, or need to answer a question, report the item to the user with a suggested response and wait for explicit confirmation before posting anything on GitHub. If the user approves a response, prefix it with `[codex]` so it is clear the response is automated and not from the human user.
-If the watcher later surfaces your own approved reply because the authenticated operator is treated as a trusted review author, treat that self-authored item as already handled and do not reply again.
-If a code review comment/thread is already marked as resolved in GitHub, treat it as non-actionable and safely ignore it unless new unresolved follow-up feedback appears.
-
-## Git Safety Rules
-
-- Work only on the PR head branch.
-- Avoid destructive git commands.
-- Do not switch branches unless necessary to recover context.
-- Before editing, check for unrelated uncommitted changes. If present, stop and ask the user.
-- After each successful fix, commit and `git push`, then re-run the watcher.
-- If you interrupted a live `--watch` session to make the fix, restart `--watch` immediately after the push in the same turn.
-- Do not run multiple concurrent `--watch` processes for the same PR/state file; keep one watcher session active and reuse it until it stops or you intentionally restart it.
-- A push is not a terminal outcome; continue the monitoring loop unless a strict stop condition is met.
-
-Commit message defaults:
-
-- `codex: fix CI failure on PR #`
-- `codex: address PR review feedback (#)`
-
-## Monitoring Loop Pattern
-Use this loop in a live Codex session:
-
-1. Run `--once`.
-2. Read `actions`.
-3. First check whether the PR is now merged or otherwise closed; if so, report that terminal state and stop polling immediately.
-4. Check CI summary, new review items, and mergeability/conflict status.
-5. Diagnose CI failures and classify branch-related vs flaky/unrelated. If the overall run is still pending but `failed_jobs` already includes a failed job, fetch that job's logs and diagnose immediately instead of waiting for the whole workflow run to finish. Patch only when the failure is branch-related.
-6. For each surfaced review item from another author, patch/commit/push and then resolve it if it is actionable. If it is non-actionable, already addressed, or requires a written answer, surface it to the user with a suggested response instead of posting automatically. If a later snapshot surfaces your own approved reply, treat it as informational and continue without responding again.
-7. Process actionable review comments before flaky reruns when both are present; if a review fix requires a commit, push it and skip rerunning failed checks on the old SHA.
-8. Retry failed checks only when `retry_failed_checks` is present and you are not about to replace the current SHA with a review/CI fix commit. Do not make code changes for unrelated flakes or infrastructure failures just to get CI green.
-9. If you pushed a commit, resolved a review thread, or triggered a rerun, report the action briefly and continue polling (do not stop). If a human review comment needs a written GitHub response, stop and ask for confirmation before posting.
-10. After a review-fix push, proactively restart continuous monitoring (`--watch`) in the same turn unless a strict stop condition has already been reached.
-11. If everything is passing, mergeable, not blocked on required review approval, and there are no unaddressed review items, report that the PR is currently ready to merge but keep the watcher running so new review comments are surfaced quickly while the PR remains open.
-12. If blocked on a user-help-required issue (infra outage, exhausted flaky retries, unclear reviewer request, permissions), report the blocker and stop.
-13. Otherwise sleep according to the polling cadence below and repeat.
-
-When the user explicitly asks to monitor/watch/babysit a PR, prefer `--watch` so polling continues autonomously in one command. Use repeated `--once` snapshots only for debugging, local testing, or when the user explicitly asks for a one-shot check.
-Do not stop to ask the user whether to continue polling; continue autonomously until a strict stop condition is met or the user explicitly interrupts.
-Do not hand control back to the user after a review-fix push just because a new SHA was created; restarting the watcher and re-entering the poll loop is part of the same babysitting task.
-If a `--watch` process is still running and no strict stop condition has been reached, the babysitting task is still in progress; keep streaming/consuming watcher output instead of ending the turn.
-
-## Polling Cadence
-Keep review polling aggressive and continue monitoring even after CI turns green:
-
-- While CI is not green (pending/running/queued or failing): poll every 1 minute.
-- After CI turns green: keep polling at the base cadence while the PR remains open so newly posted review comments are surfaced promptly instead of waiting on a long green-state backoff.
-- Reset the cadence immediately whenever anything changes (new commit/SHA, check status changes, new review comments, mergeability changes, review decision changes).
-- If CI stops being green again (new commit, rerun, or regression): stay on the base polling cadence.
-- If any poll shows the PR is merged or otherwise closed: stop polling immediately and report the terminal state.
-
-## Stop Conditions (Strict)
-Stop only when one of the following is true:
-
-- PR merged or closed (stop as soon as a poll/snapshot confirms this).
-- User intervention is required and Codex cannot safely proceed alone.
-
-Keep polling when:
-
-- `actions` contains only `idle` but checks are still pending.
-- CI is still running/queued.
-- Review state is quiet but CI is not terminal.
-- CI is green but mergeability is unknown/pending.
-- CI is green and mergeable, but the PR is still open and you are waiting for possible new review comments or merge-conflict changes.
-- The PR is green but blocked on review approval (`REVIEW_REQUIRED` / similar); continue polling at the base cadence and surface any new review comments without asking for confirmation to keep watching.
-
-## Output Expectations
-Provide concise progress updates while monitoring and a final summary that includes:
-
-- During long unchanged monitoring periods, avoid emitting a full update on every poll; summarize only status changes plus occasional heartbeat updates.
-- Treat push confirmations, intermediate CI snapshots, ready-to-merge snapshots, and review-action updates as progress updates only; do not emit the final summary or end the babysitting session unless a strict stop condition is met.
-- A user request to "monitor" is not satisfied by a couple of sample polls; remain in the loop until a strict stop condition or an explicit user interruption.
-- A review-fix commit + push is not a completion event; immediately resume live monitoring (`--watch`) in the same turn and continue reporting progress updates.
-- When CI first transitions to all green for the current SHA, emit a one-time celebratory progress update (do not repeat it on every green poll). Preferred style: `🚀 CI is all green! 33/33 passed. Still on watch for review approval.`
-- Do not send the final summary while a watcher terminal is still running unless the watcher has emitted/confirmed a strict stop condition; otherwise continue with progress updates.
-
-- Final PR SHA
-- CI status summary
-- Mergeability / conflict status
-- Fixes pushed
-- Flaky retry cycles used
-- Remaining unresolved failures or review comments
-
-## References
-
-- Heuristics and decision tree: `.codex/skills/babysit-pr/references/heuristics.md`
-- GitHub CLI/API details used by the watcher: `.codex/skills/babysit-pr/references/github-api-notes.md`
diff --git a/.codex/skills/babysit-pr/agents/openai.yaml b/.codex/skills/babysit-pr/agents/openai.yaml
deleted file mode 100644
index c6946cf8c0e..00000000000
--- a/.codex/skills/babysit-pr/agents/openai.yaml
+++ /dev/null
@@ -1,4 +0,0 @@
-interface:
-  display_name: "PR Babysitter"
-  short_description: "Watch PR review comments, CI, and merge conflicts"
-  default_prompt: "Babysit the current PR: monitor reviewer comments, CI, and merge-conflict status (prefer the watcher’s --watch mode for live monitoring); surface new review feedback before acting on CI or mergeability work, fix valid issues, push updates, and rerun flaky failures up to 3 times. Do not post replies to human-authored review comments unless the user explicitly confirms the exact response. Do not patch unrelated flaky tests, CI infrastructure, dependency outages, runner issues, or other failures that are not caused by the branch. Keep exactly one watcher session active for the PR (do not leave duplicate --watch terminals running). If you pause monitoring to patch review/CI feedback, restart --watch yourself immediately after the push in the same turn. If a watcher is still running and no strict stop condition has been reached, the task is still in progress: keep consuming watcher output and sending progress updates instead of ending the turn. Do not treat a green + mergeable PR as a terminal stop while it is still open; continue polling autonomously after any push/rerun so newly posted review comments are surfaced until a strict terminal stop condition is reached or the user interrupts."
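Reviewer note: the ordering rules that the deleted SKILL.md spreads across "Core Workflow" and "Monitoring Loop Pattern" can be condensed into a small dispatcher. The following is only an illustrative sketch and was not part of the deleted files. It assumes each watcher snapshot is a JSON object with an `actions` list using the action names quoted in the skill text; the function names `next_step` and `steps_from_jsonl` and the returned step labels are hypothetical.

```python
import json

# Priority order as described in the skill text: a closed PR ends the loop,
# review feedback is handled before CI work, and flaky reruns come last.
# The step labels on the right are hypothetical names used for illustration.
PRIORITY = [
    ("stop_pr_closed", "stop"),
    ("process_review_comment", "handle_review"),
    ("diagnose_ci_failure", "diagnose_ci"),
    ("retry_failed_checks", "rerun_failed"),
]


def next_step(snapshot):
    """Pick the next babysitting step from one watcher snapshot (a dict)."""
    actions = snapshot.get("actions") or []
    for action, step in PRIORITY:
        if action in actions:
            return step
    # `idle` (or an empty list) is not a stop condition: keep polling.
    return "keep_polling"


def steps_from_jsonl(lines):
    """Yield one step per non-empty JSONL snapshot line."""
    for line in lines:
        line = line.strip()
        if line:
            yield next_step(json.loads(line))
```

In a live session the `lines` argument could be the stdout stream of the `--watch` command, which keeps the "consume watcher output in the same turn" rule and the priority rule in one place.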
diff --git a/.codex/skills/babysit-pr/references/github-api-notes.md b/.codex/skills/babysit-pr/references/github-api-notes.md
deleted file mode 100644
index 8c0a7c8a540..00000000000
--- a/.codex/skills/babysit-pr/references/github-api-notes.md
+++ /dev/null
@@ -1,82 +0,0 @@
-# GitHub CLI / API Notes For `babysit-pr`
-
-## Primary commands used
-
-### PR metadata
-
-- `gh pr view --json number,url,state,mergedAt,closedAt,headRefName,headRefOid,headRepository,headRepositoryOwner`
-
-Used to resolve PR number, URL, branch, head SHA, and closed/merged state.
-
-### PR checks summary
-
-- `gh pr checks --json name,state,bucket,link,workflow,event,startedAt,completedAt`
-
-Used to compute pending/failed/passed counts and whether the current CI round is terminal.
-
-### Workflow runs for head SHA
-
-- `gh api repos/{owner}/{repo}/actions/runs -X GET -f head_sha= -f per_page=100`
-
-Used to discover failed workflow runs and rerunnable run IDs.
-
-### Failed log inspection
-
-- `gh run view --json jobs,name,workflowName,conclusion,status,url,headSha`
-- `gh api repos/{owner}/{repo}/actions/runs/{run_id}/jobs -X GET -f per_page=100`
-- `gh api repos/{owner}/{repo}/actions/jobs/{job_id}/logs > /tmp/codex-gh-job-{job_id}-logs.zip`
-- `gh run view --log-failed`
-
-Used by Codex to classify branch-related vs flaky/unrelated failures. Prefer the direct job log endpoint as soon as a job has failed because `gh run view --log-failed` may not produce failed-job logs until the overall workflow run completes.
-
-### Retry failed jobs only
-
-- `gh run rerun --failed`
-
-Reruns only failed jobs (and dependencies) for a workflow run.
-
-## Review-related endpoints
-
-- Issue comments on PR:
-  - `gh api repos/{owner}/{repo}/issues//comments?per_page=100`
-- Inline PR review comments:
-  - `gh api repos/{owner}/{repo}/pulls//comments?per_page=100`
-- Review submissions:
-  - `gh api repos/{owner}/{repo}/pulls//reviews?per_page=100`
-
-## JSON fields consumed by the watcher
-
-### `gh pr view`
-
-- `number`
-- `url`
-- `state`
-- `mergedAt`
-- `closedAt`
-- `headRefName`
-- `headRefOid`
-
-### `gh pr checks`
-
-- `bucket` (`pass`, `fail`, `pending`, `skipping`)
-- `state`
-- `name`
-- `workflow`
-- `link`
-
-### Actions runs API (`workflow_runs[]`)
-
-- `id`
-- `name`
-- `status`
-- `conclusion`
-- `html_url`
-- `head_sha`
-
-### Actions run jobs API (`jobs[]`)
-
-- `id`
-- `name`
-- `status`
-- `conclusion`
-- `html_url`
diff --git a/.codex/skills/babysit-pr/references/heuristics.md b/.codex/skills/babysit-pr/references/heuristics.md
deleted file mode 100644
index ee44c4a1948..00000000000
--- a/.codex/skills/babysit-pr/references/heuristics.md
+++ /dev/null
@@ -1,66 +0,0 @@
-# CI / Review Heuristics
-
-## CI classification checklist
-
-Treat as **branch-related** when logs clearly indicate a regression caused by the PR branch:
-
-- Compile/typecheck/lint failures in files or modules touched by the branch
-- Deterministic unit/integration test failures in changed areas
-- Snapshot output changes caused by UI/text changes in the branch
-- Static analysis violations introduced by the latest push
-- Build script/config changes in the PR causing a deterministic failure
-
-Treat as **likely flaky or unrelated** when evidence points to transient or external issues:
-
-- DNS/network/registry timeout errors while fetching dependencies
-- Runner image provisioning or startup failures
-- GitHub Actions infrastructure/service outages
-- Cloud/service rate limits or transient API outages
-- Non-deterministic failures in unrelated integration tests with known flake patterns
-
-Do not patch likely flaky/unrelated failures. Use the retry budget for rerunnable failures, wait for pending jobs, or stop and report the blocker when the failure is persistent or infrastructure-owned.
-
-If uncertain, inspect failed logs once before choosing rerun.
-
-## Decision tree (fix vs rerun vs stop)
-
-1. If PR is merged/closed: stop.
-2. If there are failed checks:
-   - Diagnose first.
-   - If checks are still pending but an individual job has already failed: fetch that job's logs and diagnose now.
-   - If branch-related: fix locally, commit, push.
-   - If likely flaky/unrelated and all checks for the current SHA are terminal: rerun failed jobs.
-   - If likely flaky/unrelated and not safely rerunnable: stop and report the blocker; do not edit unrelated tests, build scripts, CI configuration, dependency pins, or infrastructure code.
-   - If checks are still pending and no failed job is available yet: wait.
-3. If flaky reruns for the same SHA reach the configured limit (default 3): stop and report persistent failure.
-4. Independently, process any new human review comments.
-
-## Review comment agreement criteria
-
-Address the comment when:
-
-- The comment is technically correct.
-- The change is actionable in the current branch.
-- The requested change does not conflict with the user’s intent or recent guidance.
-- The change can be made safely without unrelated refactors.
-
-Fix valid human review feedback in code when possible, but do not post a GitHub reply to a human-authored comment/thread unless the user explicitly confirms the exact response.
-
-Do not auto-fix when:
-
-- The comment is ambiguous and needs clarification.
-- The request conflicts with explicit user instructions.
-- The proposed change requires product/design decisions the user has not made.
-- The codebase is in a dirty/unrelated state that makes safe editing uncertain.
-- The comment only needs a written answer or disagreement response; propose the reply to the user instead of posting it automatically.
-
-## Stop-and-ask conditions
-
-Stop and ask the user instead of continuing automatically when:
-
-- The local worktree has unrelated uncommitted changes.
-- `gh` auth/permissions fail.
-- The PR branch cannot be pushed.
-- CI failures persist after the flaky retry budget.
-- Reviewer feedback requires a product decision or cross-team coordination.
-- A human review comment requires a written GitHub reply instead of a code change.
diff --git a/.codex/skills/babysit-pr/scripts/gh_pr_watch.py b/.codex/skills/babysit-pr/scripts/gh_pr_watch.py
deleted file mode 100755
index a250404824d..00000000000
--- a/.codex/skills/babysit-pr/scripts/gh_pr_watch.py
+++ /dev/null
@@ -1,873 +0,0 @@
-#!/usr/bin/env python3
-"""Watch GitHub PR CI and review activity for Codex PR babysitting workflows."""
-
-import argparse
-import json
-import os
-import re
-import subprocess
-import sys
-import tempfile
-import time
-from pathlib import Path
-from urllib.parse import urlparse
-
-FAILED_RUN_CONCLUSIONS = {
-    "failure",
-    "timed_out",
-    "cancelled",
-    "action_required",
-    "startup_failure",
-    "stale",
-}
-PENDING_CHECK_STATES = {
-    "QUEUED",
-    "IN_PROGRESS",
-    "PENDING",
-    "WAITING",
-    "REQUESTED",
-}
-REVIEW_BOT_LOGIN_KEYWORDS = {
-    "codex",
-}
-TRUSTED_AUTHOR_ASSOCIATIONS = {
-    "OWNER",
-    "MEMBER",
-    "COLLABORATOR",
-}
-MERGE_BLOCKING_REVIEW_DECISIONS = {
-    "REVIEW_REQUIRED",
-    "CHANGES_REQUESTED",
-}
-MERGE_CONFLICT_OR_BLOCKING_STATES = {
-    "BLOCKED",
-    "DIRTY",
-    "DRAFT",
-    "UNKNOWN",
-}
-
-
-class GhCommandError(RuntimeError):
-    pass
-
-
-def parse_args():
-    parser = argparse.ArgumentParser(
-        description=(
-            "Normalize PR/CI/review state for Codex PR babysitting and optionally "
-            "trigger flaky reruns."
-        )
-    )
-    parser.add_argument("--pr", default="auto", help="auto, PR number, or PR URL")
-    parser.add_argument("--repo", help="Optional OWNER/REPO override")
-    parser.add_argument("--poll-seconds", type=int, default=30, help="Watch poll interval")
-    parser.add_argument(
-        "--max-flaky-retries",
-        type=int,
-        default=3,
-        help="Max rerun cycles per head SHA before stop recommendation",
-    )
-    parser.add_argument("--state-file", help="Path to state JSON file")
-    parser.add_argument("--once", action="store_true", help="Emit one snapshot and exit")
-    parser.add_argument("--watch", action="store_true", help="Continuously emit JSONL snapshots")
-    parser.add_argument(
-        "--retry-failed-now",
-        action="store_true",
-        help="Rerun failed jobs for current failed workflow runs when policy allows",
-    )
-    parser.add_argument(
-        "--json",
-        action="store_true",
-        help="Emit machine-readable output (default behavior for --once and --retry-failed-now)",
-    )
-    args = parser.parse_args()
-
-    if args.poll_seconds <= 0:
-        parser.error("--poll-seconds must be > 0")
-    if args.max_flaky_retries < 0:
-        parser.error("--max-flaky-retries must be >= 0")
-    if args.watch and args.retry_failed_now:
-        parser.error("--watch cannot be combined with --retry-failed-now")
-    if not args.once and not args.watch and not args.retry_failed_now:
-        args.once = True
-    return args
-
-
-def _format_gh_error(cmd, err):
-    stdout = (err.stdout or "").strip()
-    stderr = (err.stderr or "").strip()
-    parts = [f"GitHub CLI command failed: {' '.join(cmd)}"]
-    if stdout:
-        parts.append(f"stdout: {stdout}")
-    if stderr:
-        parts.append(f"stderr: {stderr}")
-    return "\n".join(parts)
-
-
-def gh_text(args, repo=None):
-    cmd = ["gh"]
-    # `gh api` does not accept `-R/--repo` on all gh versions. The watcher's
-    # API calls use explicit endpoints (e.g. repos/{owner}/{repo}/...), so the
-    # repo flag is unnecessary there.
-    if repo and (not args or args[0] != "api"):
-        cmd.extend(["-R", repo])
-    cmd.extend(args)
-    try:
-        proc = subprocess.run(cmd, check=True, capture_output=True, text=True)
-    except FileNotFoundError as err:
-        raise GhCommandError("`gh` command not found") from err
-    except subprocess.CalledProcessError as err:
-        raise GhCommandError(_format_gh_error(cmd, err)) from err
-    return proc.stdout
-
-
-def gh_json(args, repo=None):
-    raw = gh_text(args, repo=repo).strip()
-    if not raw:
-        return None
-    try:
-        return json.loads(raw)
-    except json.JSONDecodeError as err:
-        raise GhCommandError(f"Failed to parse JSON from gh output for {' '.join(args)}") from err
-
-
-def parse_pr_spec(pr_spec):
-    if pr_spec == "auto":
-        return {"mode": "auto", "value": None}
-    if re.fullmatch(r"\d+", pr_spec):
-        return {"mode": "number", "value": pr_spec}
-    parsed = urlparse(pr_spec)
-    if parsed.scheme and parsed.netloc and "/pull/" in parsed.path:
-        return {"mode": "url", "value": pr_spec}
-    raise ValueError("--pr must be 'auto', a PR number, or a PR URL")
-
-
-def pr_view_fields():
-    return (
-        "number,url,state,mergedAt,closedAt,headRefName,headRefOid,"
-        "headRepository,headRepositoryOwner,mergeable,mergeStateStatus,reviewDecision"
-    )
-
-
-def checks_fields():
-    return "name,state,bucket,link,workflow,event,startedAt,completedAt"
-
-
-def resolve_pr(pr_spec, repo_override=None):
-    parsed = parse_pr_spec(pr_spec)
-    cmd = ["pr", "view"]
-    if parsed["value"] is not None:
-        cmd.append(parsed["value"])
-    cmd.extend(["--json", pr_view_fields()])
-    data = gh_json(cmd, repo=repo_override)
-    if not isinstance(data, dict):
-        raise GhCommandError("Unexpected PR payload from `gh pr view`")
-
-    pr_url = str(data.get("url") or "")
-    repo = (
-        repo_override
-        or extract_repo_from_pr_url(pr_url)
-        or extract_repo_from_pr_view(data)
-    )
-    if not repo:
-        raise GhCommandError("Unable to determine OWNER/REPO for the PR")
-
-    state = str(data.get("state") or "")
-    merged = bool(data.get("mergedAt"))
-    closed = bool(data.get("closedAt")) or state.upper() == "CLOSED"
-
-    return {
-        "number": int(data["number"]),
-        "url": pr_url,
-        "repo": repo,
-        "head_sha": str(data.get("headRefOid") or ""),
-        "head_branch": str(data.get("headRefName") or ""),
-        "state": state,
-        "merged": merged,
-        "closed": closed,
-        "mergeable": str(data.get("mergeable") or ""),
-        "merge_state_status": str(data.get("mergeStateStatus") or ""),
-        "review_decision": str(data.get("reviewDecision") or ""),
-    }
-
-
-def extract_repo_from_pr_view(data):
-    head_repo = data.get("headRepository")
-    head_owner = data.get("headRepositoryOwner")
-    owner = None
-    name = None
-    if isinstance(head_owner, dict):
-        owner = head_owner.get("login") or head_owner.get("name")
-    elif isinstance(head_owner, str):
-        owner = head_owner
-    if isinstance(head_repo, dict):
-        name = head_repo.get("name")
-        repo_owner = head_repo.get("owner")
-        if not owner and isinstance(repo_owner, dict):
-            owner = repo_owner.get("login") or repo_owner.get("name")
-    elif isinstance(head_repo, str):
-        name = head_repo
-    if owner and name:
-        return f"{owner}/{name}"
-    return None
-def extract_repo_from_pr_url(pr_url):
-    parsed = urlparse(pr_url)
-    parts = [p for p in parsed.path.split("/") if p]
-    if len(parts) >= 4 and parts[2] == "pull":
-        return f"{parts[0]}/{parts[1]}"
-    return None
-
-
-def load_state(path):
-    if path.exists():
-        try:
-            data = json.loads(path.read_text())
-        except json.JSONDecodeError as err:
-            raise RuntimeError(f"State file is not valid JSON: {path}") from err
-        if not isinstance(data, dict):
-            raise RuntimeError(f"State file must contain an object: {path}")
-        return data, False
-    return {
-        "pr": {},
-        "started_at": None,
-        "last_seen_head_sha": None,
-        "retries_by_sha": {},
-        "seen_issue_comment_ids": [],
-        "seen_review_comment_ids": [],
-        "seen_review_ids": [],
-        "last_snapshot_at": None,
-    }, True
-
-
-def save_state(path, state):
-    path.parent.mkdir(parents=True, exist_ok=True)
-    payload = json.dumps(state, indent=2, sort_keys=True) + "\n"
-    fd, tmp_name = tempfile.mkstemp(prefix=f"{path.name}.", suffix=".tmp", dir=path.parent)
-    tmp_path = Path(tmp_name)
-    try:
-        with os.fdopen(fd, "w", encoding="utf-8") as tmp_file:
-            tmp_file.write(payload)
-        os.replace(tmp_path, path)
-    except Exception:
-        try:
-            tmp_path.unlink(missing_ok=True)
-        except OSError:
-            pass
-        raise
-
-
-def default_state_file_for(pr):
-    repo_slug = pr["repo"].replace("/", "-")
-    return Path(f"/tmp/codex-babysit-pr-{repo_slug}-pr{pr['number']}.json")
-
-
-def get_pr_checks(pr_spec, repo):
-    parsed = parse_pr_spec(pr_spec)
-    cmd = ["pr", "checks"]
-    if parsed["value"] is not None:
-        cmd.append(parsed["value"])
-    cmd.extend(["--json", checks_fields()])
-    data = gh_json(cmd, repo=repo)
-    if data is None:
-        return []
-    if not isinstance(data, list):
-        raise GhCommandError("Unexpected payload from `gh pr checks`")
-    return data
-
-
-def is_pending_check(check):
-    bucket = str(check.get("bucket") or "").lower()
-    state = str(check.get("state") or "").upper()
-    return bucket == "pending" or state in PENDING_CHECK_STATES
-
-
-def summarize_checks(checks):
-    pending_count = 0
-    failed_count = 0
-    passed_count = 0
-    for check in checks:
-        bucket = str(check.get("bucket") or "").lower()
-        if is_pending_check(check):
-            pending_count += 1
-        if bucket == "fail":
-            failed_count += 1
-        if bucket == "pass":
-            passed_count += 1
-    return {
-        "pending_count": pending_count,
-        "failed_count": failed_count,
-        "passed_count": passed_count,
-        "all_terminal": pending_count == 0,
-    }
-
-
-def get_workflow_runs_for_sha(repo, head_sha):
-    endpoint = f"repos/{repo}/actions/runs"
-    data = gh_json(
-        ["api", endpoint, "-X", "GET", "-f", f"head_sha={head_sha}", "-f", "per_page=100"],
-        repo=repo,
-    )
-    if not isinstance(data, dict):
-        raise GhCommandError("Unexpected payload from actions runs API")
-    runs = data.get("workflow_runs") or []
-    if not isinstance(runs, list):
-        raise GhCommandError("Expected `workflow_runs` to be a list")
- return runs - - -def failed_runs_from_workflow_runs(runs, head_sha): - failed_runs = [] - for run in runs: - if not isinstance(run, dict): - continue - if str(run.get("head_sha") or "") != head_sha: - continue - conclusion = str(run.get("conclusion") or "") - if conclusion not in FAILED_RUN_CONCLUSIONS: - continue - failed_runs.append( - { - "run_id": run.get("id"), - "workflow_name": run.get("name") or run.get("display_title") or "", - "status": str(run.get("status") or ""), - "conclusion": conclusion, - "html_url": str(run.get("html_url") or ""), - } - ) - failed_runs.sort(key=lambda item: (str(item.get("workflow_name") or ""), str(item.get("run_id") or ""))) - return failed_runs - - -def get_jobs_for_run(repo, run_id): - endpoint = f"repos/{repo}/actions/runs/{run_id}/jobs" - data = gh_json(["api", endpoint, "-X", "GET", "-f", "per_page=100"], repo=repo) - if not isinstance(data, dict): - raise GhCommandError("Unexpected payload from actions run jobs API") - jobs = data.get("jobs") or [] - if not isinstance(jobs, list): - raise GhCommandError("Expected `jobs` to be a list") - return jobs - - -def failed_jobs_from_workflow_runs(repo, runs, head_sha): - failed_jobs = [] - for run in runs: - if not isinstance(run, dict): - continue - if str(run.get("head_sha") or "") != head_sha: - continue - run_id = run.get("id") - if run_id in (None, ""): - continue - run_status = str(run.get("status") or "") - run_conclusion = str(run.get("conclusion") or "") - if run_status.lower() == "completed" and run_conclusion not in FAILED_RUN_CONCLUSIONS: - continue - jobs = get_jobs_for_run(repo, run_id) - for job in jobs: - if not isinstance(job, dict): - continue - conclusion = str(job.get("conclusion") or "") - if conclusion not in FAILED_RUN_CONCLUSIONS: - continue - job_id = job.get("id") - logs_endpoint = None - if job_id not in (None, ""): - logs_endpoint = f"repos/{repo}/actions/jobs/{job_id}/logs" - failed_jobs.append( - { - "run_id": run_id, - "workflow_name": 
run.get("name") or run.get("display_title") or "", - "run_status": run_status, - "run_conclusion": run_conclusion, - "job_id": job_id, - "job_name": str(job.get("name") or ""), - "status": str(job.get("status") or ""), - "conclusion": conclusion, - "html_url": str(job.get("html_url") or ""), - "logs_endpoint": logs_endpoint, - } - ) - failed_jobs.sort( - key=lambda item: ( - str(item.get("workflow_name") or ""), - str(item.get("job_name") or ""), - str(item.get("job_id") or ""), - ) - ) - return failed_jobs - - -def get_authenticated_login(): - data = gh_json(["api", "user"]) - if not isinstance(data, dict) or not data.get("login"): - raise GhCommandError("Unable to determine authenticated GitHub login from `gh api user`") - return str(data["login"]) - - -def comment_endpoints(repo, pr_number): - return { - "issue_comment": f"repos/{repo}/issues/{pr_number}/comments", - "review_comment": f"repos/{repo}/pulls/{pr_number}/comments", - "review": f"repos/{repo}/pulls/{pr_number}/reviews", - } - - -def gh_api_list_paginated(endpoint, repo=None, per_page=100): - items = [] - page = 1 - while True: - sep = "&" if "?" in endpoint else "?" 
- page_endpoint = f"{endpoint}{sep}per_page={per_page}&page={page}" - payload = gh_json(["api", page_endpoint], repo=repo) - if payload is None: - break - if not isinstance(payload, list): - raise GhCommandError(f"Unexpected paginated payload from gh api {endpoint}") - items.extend(payload) - if len(payload) < per_page: - break - page += 1 - return items - - -def normalize_issue_comments(items): - out = [] - for item in items: - if not isinstance(item, dict): - continue - out.append( - { - "kind": "issue_comment", - "id": str(item.get("id") or ""), - "author": extract_login(item.get("user")), - "author_association": str(item.get("author_association") or ""), - "created_at": str(item.get("created_at") or ""), - "body": str(item.get("body") or ""), - "path": None, - "line": None, - "url": str(item.get("html_url") or ""), - } - ) - return out - - -def normalize_review_comments(items): - out = [] - for item in items: - if not isinstance(item, dict): - continue - line = item.get("line") - if line is None: - line = item.get("original_line") - out.append( - { - "kind": "review_comment", - "id": str(item.get("id") or ""), - "author": extract_login(item.get("user")), - "author_association": str(item.get("author_association") or ""), - "created_at": str(item.get("created_at") or ""), - "body": str(item.get("body") or ""), - "path": item.get("path"), - "line": line, - "url": str(item.get("html_url") or ""), - } - ) - return out - - -def normalize_reviews(items): - out = [] - for item in items: - if not isinstance(item, dict): - continue - out.append( - { - "kind": "review", - "id": str(item.get("id") or ""), - "author": extract_login(item.get("user")), - "author_association": str(item.get("author_association") or ""), - "created_at": str(item.get("submitted_at") or item.get("created_at") or ""), - "body": str(item.get("body") or ""), - "path": None, - "line": None, - "url": str(item.get("html_url") or ""), - } - ) - return out - - -def extract_login(user_obj): - if 
isinstance(user_obj, dict): - return str(user_obj.get("login") or "") - return "" - - -def is_bot_login(login): - return bool(login) and login.endswith("[bot]") - - -def is_actionable_review_bot_login(login): - if not is_bot_login(login): - return False - lower_login = login.lower() - return any(keyword in lower_login for keyword in REVIEW_BOT_LOGIN_KEYWORDS) - - -def is_trusted_human_review_author(item, authenticated_login): - author = str(item.get("author") or "") - if not author: - return False - if authenticated_login and author == authenticated_login: - return True - association = str(item.get("author_association") or "").upper() - return association in TRUSTED_AUTHOR_ASSOCIATIONS - - -def fetch_new_review_items(pr, state, fresh_state, authenticated_login=None): - repo = pr["repo"] - pr_number = pr["number"] - endpoints = comment_endpoints(repo, pr_number) - - issue_payload = gh_api_list_paginated(endpoints["issue_comment"], repo=repo) - review_comment_payload = gh_api_list_paginated(endpoints["review_comment"], repo=repo) - review_payload = gh_api_list_paginated(endpoints["review"], repo=repo) - - issue_items = normalize_issue_comments(issue_payload) - review_comment_items = normalize_review_comments(review_comment_payload) - review_items = normalize_reviews(review_payload) - all_items = issue_items + review_comment_items + review_items - - seen_issue = {str(x) for x in state.get("seen_issue_comment_ids") or []} - seen_review_comment = {str(x) for x in state.get("seen_review_comment_ids") or []} - seen_review = {str(x) for x in state.get("seen_review_ids") or []} - - # On a brand-new state file, surface existing review activity instead of - # silently treating it as seen. This avoids missing already-pending review - # feedback when monitoring starts after comments were posted. 
- - new_items = [] - for item in all_items: - item_id = item.get("id") - if not item_id: - continue - author = item.get("author") or "" - if not author: - continue - if is_bot_login(author): - if not is_actionable_review_bot_login(author): - continue - elif not is_trusted_human_review_author(item, authenticated_login): - continue - - kind = item["kind"] - if kind == "issue_comment" and item_id in seen_issue: - continue - if kind == "review_comment" and item_id in seen_review_comment: - continue - if kind == "review" and item_id in seen_review: - continue - - new_items.append(item) - if kind == "issue_comment": - seen_issue.add(item_id) - elif kind == "review_comment": - seen_review_comment.add(item_id) - elif kind == "review": - seen_review.add(item_id) - - new_items.sort(key=lambda item: (item.get("created_at") or "", item.get("kind") or "", item.get("id") or "")) - state["seen_issue_comment_ids"] = sorted(seen_issue) - state["seen_review_comment_ids"] = sorted(seen_review_comment) - state["seen_review_ids"] = sorted(seen_review) - return new_items - - -def current_retry_count(state, head_sha): - retries = state.get("retries_by_sha") or {} - value = retries.get(head_sha, 0) - try: - return int(value) - except (TypeError, ValueError): - return 0 - - -def set_retry_count(state, head_sha, count): - retries = state.get("retries_by_sha") - if not isinstance(retries, dict): - retries = {} - retries[head_sha] = int(count) - state["retries_by_sha"] = retries - - -def unique_actions(actions): - out = [] - seen = set() - for action in actions: - if action not in seen: - out.append(action) - seen.add(action) - return out - - -def has_active_failed_job(failed_jobs): - return any(str(job.get("run_status") or "").lower() != "completed" for job in failed_jobs) - - -def is_pr_ready_to_merge(pr, checks_summary, new_review_items): - if pr["closed"] or pr["merged"]: - return False - if not checks_summary["all_terminal"]: - return False - if checks_summary["failed_count"] > 0 or 
checks_summary["pending_count"] > 0: - return False - if new_review_items: - return False - if str(pr.get("mergeable") or "") != "MERGEABLE": - return False - if str(pr.get("merge_state_status") or "") in MERGE_CONFLICT_OR_BLOCKING_STATES: - return False - if str(pr.get("review_decision") or "") in MERGE_BLOCKING_REVIEW_DECISIONS: - return False - return True - - -def recommend_actions(pr, checks_summary, failed_runs, failed_jobs, new_review_items, retries_used, max_retries): - actions = [] - if pr["closed"] or pr["merged"]: - if new_review_items: - actions.append("process_review_comment") - actions.append("stop_pr_closed") - return unique_actions(actions) - - if is_pr_ready_to_merge(pr, checks_summary, new_review_items): - actions.append("ready_to_merge") - return unique_actions(actions) - - if new_review_items: - actions.append("process_review_comment") - - has_failed_pr_checks = checks_summary["failed_count"] > 0 or has_active_failed_job(failed_jobs) - if has_failed_pr_checks: - if checks_summary["all_terminal"] and retries_used >= max_retries: - actions.append("stop_exhausted_retries") - else: - actions.append("diagnose_ci_failure") - if checks_summary["all_terminal"] and failed_runs and retries_used < max_retries: - actions.append("retry_failed_checks") - - if not actions: - actions.append("idle") - return unique_actions(actions) - - -def collect_snapshot(args): - pr = resolve_pr(args.pr, repo_override=args.repo) - state_path = Path(args.state_file) if args.state_file else default_state_file_for(pr) - state, fresh_state = load_state(state_path) - - if not state.get("started_at"): - state["started_at"] = int(time.time()) - - authenticated_login = get_authenticated_login() - new_review_items = fetch_new_review_items( - pr, - state, - fresh_state=fresh_state, - authenticated_login=authenticated_login, - ) - # Surface review feedback before drilling into CI and mergeability details. 
- # That keeps the babysitter responsive to new comments even when other - # actions are also available. - # `gh pr checks -R ` requires an explicit PR/branch/url argument. - # After resolving `--pr auto`, reuse the concrete PR number. - checks = get_pr_checks(str(pr["number"]), repo=pr["repo"]) - checks_summary = summarize_checks(checks) - workflow_runs = get_workflow_runs_for_sha(pr["repo"], pr["head_sha"]) - failed_runs = failed_runs_from_workflow_runs(workflow_runs, pr["head_sha"]) - failed_jobs = failed_jobs_from_workflow_runs(pr["repo"], workflow_runs, pr["head_sha"]) - - retries_used = current_retry_count(state, pr["head_sha"]) - actions = recommend_actions( - pr, - checks_summary, - failed_runs, - failed_jobs, - new_review_items, - retries_used, - args.max_flaky_retries, - ) - - state["pr"] = {"repo": pr["repo"], "number": pr["number"]} - state["last_seen_head_sha"] = pr["head_sha"] - state["last_snapshot_at"] = int(time.time()) - save_state(state_path, state) - - snapshot = { - "pr": pr, - "checks": checks_summary, - "failed_runs": failed_runs, - "failed_jobs": failed_jobs, - "new_review_items": new_review_items, - "actions": actions, - "retry_state": { - "current_sha_retries_used": retries_used, - "max_flaky_retries": args.max_flaky_retries, - }, - } - return snapshot, state_path - - -def retry_failed_now(args): - snapshot, state_path = collect_snapshot(args) - pr = snapshot["pr"] - checks_summary = snapshot["checks"] - failed_runs = snapshot["failed_runs"] - retries_used = snapshot["retry_state"]["current_sha_retries_used"] - max_retries = snapshot["retry_state"]["max_flaky_retries"] - - result = { - "snapshot": snapshot, - "state_file": str(state_path), - "rerun_attempted": False, - "rerun_count": 0, - "rerun_run_ids": [], - "reason": None, - } - - if pr["closed"] or pr["merged"]: - result["reason"] = "pr_closed" - return result - if checks_summary["failed_count"] <= 0: - result["reason"] = "no_failed_pr_checks" - return result - if not failed_runs: - 
result["reason"] = "no_failed_runs" - return result - if not checks_summary["all_terminal"]: - result["reason"] = "checks_still_pending" - return result - if retries_used >= max_retries: - result["reason"] = "retry_budget_exhausted" - return result - - for run in failed_runs: - run_id = run.get("run_id") - if run_id in (None, ""): - continue - gh_text(["run", "rerun", str(run_id), "--failed"], repo=pr["repo"]) - result["rerun_run_ids"].append(run_id) - - if result["rerun_run_ids"]: - state, _ = load_state(state_path) - new_count = current_retry_count(state, pr["head_sha"]) + 1 - set_retry_count(state, pr["head_sha"], new_count) - state["last_snapshot_at"] = int(time.time()) - save_state(state_path, state) - result["rerun_attempted"] = True - result["rerun_count"] = len(result["rerun_run_ids"]) - result["reason"] = "rerun_triggered" - else: - result["reason"] = "failed_runs_missing_ids" - - return result - - -def print_json(obj): - sys.stdout.write(json.dumps(obj, sort_keys=True) + "\n") - sys.stdout.flush() - - -def print_event(event, payload): - print_json({"event": event, "payload": payload}) - - -def is_ci_green(snapshot): - checks = snapshot.get("checks") or {} - return ( - bool(checks.get("all_terminal")) - and int(checks.get("failed_count") or 0) == 0 - and int(checks.get("pending_count") or 0) == 0 - ) - - -def snapshot_change_key(snapshot): - pr = snapshot.get("pr") or {} - checks = snapshot.get("checks") or {} - review_items = snapshot.get("new_review_items") or [] - return ( - str(pr.get("head_sha") or ""), - str(pr.get("state") or ""), - str(pr.get("mergeable") or ""), - str(pr.get("merge_state_status") or ""), - str(pr.get("review_decision") or ""), - int(checks.get("passed_count") or 0), - int(checks.get("failed_count") or 0), - int(checks.get("pending_count") or 0), - tuple( - (str(item.get("kind") or ""), str(item.get("id") or "")) - for item in review_items - if isinstance(item, dict) - ), - tuple(snapshot.get("actions") or []), - ) - - -def 
run_watch(args): - poll_seconds = args.poll_seconds - last_change_key = None - while True: - snapshot, state_path = collect_snapshot(args) - print_event( - "snapshot", - { - "snapshot": snapshot, - "state_file": str(state_path), - "next_poll_seconds": poll_seconds, - }, - ) - actions = set(snapshot.get("actions") or []) - if ( - "stop_pr_closed" in actions - or "stop_exhausted_retries" in actions - ): - print_event("stop", {"actions": snapshot.get("actions"), "pr": snapshot.get("pr")}) - return 0 - - current_change_key = snapshot_change_key(snapshot) - changed = current_change_key != last_change_key - green = is_ci_green(snapshot) - pr = snapshot.get("pr") or {} - pr_open = not bool(pr.get("closed")) and not bool(pr.get("merged")) - - if not green or pr_open: - poll_seconds = args.poll_seconds - elif changed or last_change_key is None: - poll_seconds = args.poll_seconds - - last_change_key = current_change_key - time.sleep(poll_seconds) - - -def main(): - args = parse_args() - try: - if args.retry_failed_now: - print_json(retry_failed_now(args)) - return 0 - if args.watch: - return run_watch(args) - snapshot, state_path = collect_snapshot(args) - snapshot["state_file"] = str(state_path) - print_json(snapshot) - return 0 - except (GhCommandError, RuntimeError, ValueError) as err: - sys.stderr.write(f"gh_pr_watch.py error: {err}\n") - return 1 - except KeyboardInterrupt: - sys.stderr.write("gh_pr_watch.py interrupted\n") - return 130 - - -if __name__ == "__main__": - raise SystemExit(main()) diff --git a/.codex/skills/babysit-pr/scripts/test_gh_pr_watch.py b/.codex/skills/babysit-pr/scripts/test_gh_pr_watch.py deleted file mode 100644 index ebbeab119c5..00000000000 --- a/.codex/skills/babysit-pr/scripts/test_gh_pr_watch.py +++ /dev/null @@ -1,261 +0,0 @@ -import argparse -import importlib.util -from pathlib import Path - -import pytest - - -MODULE_PATH = Path(__file__).with_name("gh_pr_watch.py") -MODULE_SPEC = importlib.util.spec_from_file_location("gh_pr_watch", 
MODULE_PATH) -gh_pr_watch = importlib.util.module_from_spec(MODULE_SPEC) -assert MODULE_SPEC.loader is not None -MODULE_SPEC.loader.exec_module(gh_pr_watch) - - -def sample_pr(): - return { - "number": 123, - "url": "https://github.com/openai/codex/pull/123", - "repo": "openai/codex", - "head_sha": "abc123", - "head_branch": "feature", - "state": "OPEN", - "merged": False, - "closed": False, - "mergeable": "MERGEABLE", - "merge_state_status": "CLEAN", - "review_decision": "", - } - - -def sample_checks(**overrides): - checks = { - "pending_count": 0, - "failed_count": 0, - "passed_count": 12, - "all_terminal": True, - } - checks.update(overrides) - return checks - - -def test_collect_snapshot_fetches_review_items_before_ci(monkeypatch, tmp_path): - call_order = [] - pr = sample_pr() - - monkeypatch.setattr(gh_pr_watch, "resolve_pr", lambda *args, **kwargs: pr) - monkeypatch.setattr(gh_pr_watch, "load_state", lambda path: ({}, True)) - monkeypatch.setattr( - gh_pr_watch, - "get_authenticated_login", - lambda: call_order.append("auth") or "octocat", - ) - monkeypatch.setattr( - gh_pr_watch, - "fetch_new_review_items", - lambda *args, **kwargs: call_order.append("review") or [], - ) - monkeypatch.setattr( - gh_pr_watch, - "get_pr_checks", - lambda *args, **kwargs: call_order.append("checks") or [], - ) - monkeypatch.setattr( - gh_pr_watch, - "summarize_checks", - lambda checks: call_order.append("summarize") or sample_checks(), - ) - monkeypatch.setattr( - gh_pr_watch, - "get_workflow_runs_for_sha", - lambda *args, **kwargs: call_order.append("workflow") or [], - ) - monkeypatch.setattr( - gh_pr_watch, - "failed_runs_from_workflow_runs", - lambda *args, **kwargs: call_order.append("failed_runs") or [], - ) - monkeypatch.setattr( - gh_pr_watch, - "failed_jobs_from_workflow_runs", - lambda *args, **kwargs: call_order.append("failed_jobs") or [], - ) - monkeypatch.setattr( - gh_pr_watch, - "recommend_actions", - lambda *args, **kwargs: call_order.append("recommend") or 
["idle"], - ) - monkeypatch.setattr(gh_pr_watch, "save_state", lambda *args, **kwargs: None) - - args = argparse.Namespace( - pr="123", - repo=None, - state_file=str(tmp_path / "watcher-state.json"), - max_flaky_retries=3, - ) - - gh_pr_watch.collect_snapshot(args) - - assert call_order.index("review") < call_order.index("checks") - assert call_order.index("review") < call_order.index("workflow") - - -def test_recommend_actions_prioritizes_review_comments(): - actions = gh_pr_watch.recommend_actions( - sample_pr(), - sample_checks(failed_count=1), - [{"run_id": 99}], - [], - [{"kind": "review_comment", "id": "1"}], - 0, - 3, - ) - - assert actions == [ - "process_review_comment", - "diagnose_ci_failure", - "retry_failed_checks", - ] - - -def test_recommend_actions_ignores_stale_failed_jobs_from_completed_runs(): - actions = gh_pr_watch.recommend_actions( - sample_pr(), - sample_checks(), - [], - [ - { - "run_id": 99, - "run_status": "completed", - "run_conclusion": "failure", - "job_name": "unit tests", - "conclusion": "failure", - } - ], - [], - 3, - 3, - ) - - assert actions == ["ready_to_merge"] - - -def test_recommend_actions_diagnoses_failed_jobs_from_active_runs(): - actions = gh_pr_watch.recommend_actions( - sample_pr(), - sample_checks(all_terminal=False, pending_count=1), - [], - [ - { - "run_id": 99, - "run_status": "in_progress", - "run_conclusion": "", - "job_name": "unit tests", - "conclusion": "failure", - } - ], - [], - 0, - 3, - ) - - assert actions == ["diagnose_ci_failure"] - - -def test_run_watch_keeps_polling_open_ready_to_merge_pr(monkeypatch): - sleeps = [] - events = [] - snapshot = { - "pr": sample_pr(), - "checks": sample_checks(), - "failed_runs": [], - "failed_jobs": [], - "new_review_items": [], - "actions": ["ready_to_merge"], - "retry_state": { - "current_sha_retries_used": 0, - "max_flaky_retries": 3, - }, - } - - monkeypatch.setattr( - gh_pr_watch, - "collect_snapshot", - lambda args: (snapshot, 
Path("/tmp/codex-babysit-pr-state.json")), - ) - monkeypatch.setattr( - gh_pr_watch, - "print_event", - lambda event, payload: events.append((event, payload)), - ) - - class StopWatch(Exception): - pass - - def fake_sleep(seconds): - sleeps.append(seconds) - if len(sleeps) >= 2: - raise StopWatch - - monkeypatch.setattr(gh_pr_watch.time, "sleep", fake_sleep) - - with pytest.raises(StopWatch): - gh_pr_watch.run_watch(argparse.Namespace(poll_seconds=30)) - - assert sleeps == [30, 30] - assert [event for event, _ in events] == ["snapshot", "snapshot"] - - -def test_failed_jobs_include_direct_logs_endpoint(monkeypatch): - jobs_by_run = { - 99: [ - { - "id": 555, - "name": "unit tests", - "status": "completed", - "conclusion": "failure", - "html_url": "https://github.com/openai/codex/actions/runs/99/job/555", - }, - { - "id": 556, - "name": "lint", - "status": "completed", - "conclusion": "success", - }, - ] - } - - monkeypatch.setattr( - gh_pr_watch, - "get_jobs_for_run", - lambda repo, run_id: jobs_by_run[run_id], - ) - - failed_jobs = gh_pr_watch.failed_jobs_from_workflow_runs( - "openai/codex", - [ - { - "id": 99, - "name": "CI", - "status": "in_progress", - "conclusion": "", - "head_sha": "abc123", - } - ], - "abc123", - ) - - assert failed_jobs == [ - { - "run_id": 99, - "workflow_name": "CI", - "run_status": "in_progress", - "run_conclusion": "", - "job_id": 555, - "job_name": "unit tests", - "status": "completed", - "conclusion": "failure", - "html_url": "https://github.com/openai/codex/actions/runs/99/job/555", - "logs_endpoint": "repos/openai/codex/actions/jobs/555/logs", - } - ] diff --git a/.github/github.json b/.github/github.json index dfacb3bf366..fe5327b58d1 100644 --- a/.github/github.json +++ b/.github/github.json @@ -74,6 +74,41 @@ "deployLabels": [], "healthUrls": [], "relatedRepos": ["code-everywhere"], + "prWorkflow": { + "mergePolicy": { + "requireExplicitUserApproval": true, + "readyStateIsMergeIntent": false, + "fixTrainSettling": "Auto-review 
and related fixes can arrive after CI turns green; treat green and mergeable PRs as ready for a merge decision, not automatic merge intent." + }, + "watchSkill": "babysit-pr" + }, + "release": { + "intent": { + "kind": "package-version", + "file": "codex-cli/package.json", + "workflow": "Release Intent" + }, + "metadataFiles": [ + "codex-cli/package.json", + "CHANGELOG.md", + "docs/release-notes/RELEASE_NOTES.md" + ], + "batchingPolicy": "Prefer batching small dogfood fixes after a fix train settles instead of cutting one release per PR.", + "cutImmediatelyWhen": [ + "dogfood-blocking regression", + "data-loss or security fix", + "broken install/update/release path", + "explicit user release request" + ], + "deferWhen": [ + "docs-only", + "test-only", + "CI-only", + "cursor/metadata-only", + "ordinary cleanup", + "related fixes still actively landing" + ] + }, "githubSignals": { "postMerge": { "waitForActions": true, diff --git a/AGENTS.md b/AGENTS.md index 0253697f251..8896115d48e 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -2,7 +2,8 @@ Repo workflow metadata lives in `.github/github.json`; keep that file aligned with branch roles, validation gates, GitHub signal capabilities, -workflow names, docs routing, and local cleanup policy when those facts change. +workflow names, PR/release policy, docs routing, and local cleanup policy when +those facts change. Every Code is the product in this repository; `code` is the command users type. Use **Every Code** for the product name in prose, docs, UI copy, issue text, @@ -85,6 +86,9 @@ Examples: ## Upstream Import Workflow - `main` is the Every Code product branch and the GitHub default branch. +- PR readiness is not merge intent. Use the shared `babysit-pr`/GitHub skills + and `.github/github.json` PR workflow metadata when watching fix trains, + auto-review lag, CI, review comments, and merge readiness. - Use `just local-code-rebuild` to rebuild the current branch into the PATH-resolved binary. 
- After `./build-fast.sh`, run `just local-code-rebuild` again before release smoke checks; the fast build validates dev-fast artifacts, while the rebuild recipe owns the PATH-resolved release binary and embeds the package version. - During active local work, run diff --git a/docs/upstream-import-policy.md b/docs/upstream-import-policy.md index 90d9d1175fa..3af89d4695a 100644 --- a/docs/upstream-import-policy.md +++ b/docs/upstream-import-policy.md @@ -151,20 +151,23 @@ healthy. ## Release Cadence -Cut an Every Code release after every successful upstream import or local hotfix -that should be installed by dogfood users. The active release workflow file is -`.github/workflows/release.yml`, and GitHub displays it as `Release Intent`. -That name is intentional: the workflow runs after relevant `main` pushes, first -decides whether the committed package version represents a new release, and only -then publishes GitHub Release assets. - -Prepare release metadata locally with the Every Code harness before the publish -run: the local command bumps `codex-cli/package.json`, updates `CHANGELOG.md`, -and writes `docs/release-notes/RELEASE_NOTES.md`. The workflow uses the -committed `codex-cli/package.json` version as release intent. If that version -already has a tag, the workflow exits successfully as a no-op; if the tag does -not exist, it validates the metadata and publishes exactly that committed -version instead of generating fallback notes in CI. The full preflight, +Every Code releases are intentional dogfood distribution events, not an +automatic follow-up to every upstream import, local fix, or merged PR. Use the +release policy in `.github/github.json` to decide whether to cut immediately or +defer until nearby release-worthy fixes settle into one batch. + +The active release workflow file is `.github/workflows/release.yml`, and GitHub +displays it as `Release Intent`. 
That name is intentional: the workflow runs +after relevant `main` pushes, first decides whether the committed package +version represents a new release, and only then publishes GitHub Release assets. + +Prepare release metadata locally with the Every Code harness only when cutting a +release intentionally: the local command bumps `codex-cli/package.json`, updates +`CHANGELOG.md`, and writes `docs/release-notes/RELEASE_NOTES.md`. For this repo, +the committed `codex-cli/package.json` version is release intent. If that +version already has a tag, the workflow exits successfully as a no-op; if the +tag does not exist, it validates the metadata and publishes exactly that +committed version instead of generating fallback notes in CI. The full preflight, macOS/Linux release matrix, and Windows asset build run only on the publish pass after the metadata PR has merged.
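
The release-intent rule and the cut/defer policy described above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's actual implementation: the function names, the `v` tag prefix, and the `"batch"` fallback for reasons that appear in neither list are assumptions made for the example. Only the policy strings are taken from `.github/github.json`.

```python
# Hypothetical sketch of the release-intent check and batching policy.
# Policy strings mirror `.github/github.json`; everything else is assumed.

CUT_IMMEDIATELY_WHEN = {
    "dogfood-blocking regression",
    "data-loss or security fix",
    "broken install/update/release path",
    "explicit user release request",
}

DEFER_WHEN = {
    "docs-only",
    "test-only",
    "CI-only",
    "cursor/metadata-only",
    "ordinary cleanup",
    "related fixes still actively landing",
}


def release_intent(package_version, existing_tags, tag_prefix="v"):
    """Return "noop" if the committed version is already tagged, else "publish".

    The tag prefix is an assumption; the real workflow may name tags differently.
    """
    tag = f"{tag_prefix}{package_version}"
    return "noop" if tag in set(existing_tags) else "publish"


def batching_decision(reasons):
    """Classify a set of pending changes as "cut", "defer", or "batch".

    Any cut-immediately reason wins. If every reason is on the defer list,
    the release is deferred. The "batch" fallback is an assumption that
    stands in for waiting until the fix train settles.
    """
    reasons = set(reasons)
    if reasons & CUT_IMMEDIATELY_WHEN:
        return "cut"
    if reasons and reasons <= DEFER_WHEN:
        return "defer"
    return "batch"


if __name__ == "__main__":
    print(release_intent("1.2.3", ["v1.2.2", "v1.2.3"]))
    print(release_intent("1.2.4", ["v1.2.2", "v1.2.3"]))
    print(batching_decision(["docs-only", "data-loss or security fix"]))
    print(batching_decision(["docs-only", "test-only"]))
```

Keeping the two decisions separate matches the split in the policy: whether to bump the version at all is a human judgment guided by the batching lists, while the workflow itself only compares the committed version against existing tags.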