Fix cursor-cli E2E flakiness and deferred condensation bug by alishakawaguchi · Pull Request #654 · entireio/cli

alishakawaguchi · 2026-03-07T00:56:10Z

Summary

Two targeted fixes discovered during E2E triage:

Cursor CLI E2E: isolate config dir and fix wait pattern — Parallel cursor-cli E2E tests raced on the shared ~/.config/cursor/cli-config.json file, causing ENOENT errors. Each session now gets an isolated XDG_CONFIG_HOME with a pre-seeded cli-config.json. Also switches WaitFor from the / commands prompt pattern (always visible in the status bar, even during "Thinking") to the Add a follow-up text that only appears after the agent finishes.
Fix deferred condensation when transcript is empty at commit time — When Cursor commits mid-turn before flushing its transcript, condensation silently skipped creating the checkpoint. The checkpoint ID was not recorded, so finalizeAllTurnCheckpoints at stop time had nothing to retry. Now: (1) condenseAttempted flag ensures the checkpoint ID is always appended to TurnCheckpointIDs even on failure, and (2) finalizeAllTurnCheckpoints catches ErrCheckpointNotFound and falls back to CondenseSession with the now-available transcript.

Test plan

New tests: TestPostCommit_ActiveSession_RecordsTurnCheckpointIDs_OnCondensationFailure, TestFinalizeAllTurnCheckpoints_DeferredCondensation, TestFinalizeAllTurnCheckpoints_DeferredCondensation_TranscriptStillEmpty
mise run test:ci passes
Cursor CLI E2E tests pass without ENOENT races

🤖 Generated with Claude Code

…script Automates E2E failure triage with three new components: - scripts/download-e2e-artifacts.sh: reusable script to download CI artifacts - .claude/skills/e2e-triage/SKILL.md: 7-step triage skill (classify flaky vs real bug, create PRs or issues) - .github/workflows/e2e-triage.yml: workflow_run trigger that auto-runs Claude Opus on E2E failure Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 1aa72dcd8a2b

Post "Claude is triaging..." when triage starts and a structured summary with PR/issue links when it completes. The skill now writes triage-summary.json which the workflow parses with jq for the Slack message. Falls back to a warning if no summary is produced. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 8e5dcc6ef8ab

…ifications - Build Slack payload via jq (payload-file-path) instead of interpolating raw text into inline JSON, which broke on quotes/newlines in summaries - Add secrets.E2E_SLACK_WEBHOOK_URL guard to "Build Slack summary" and "Notify Slack - triage complete" steps (matching the "started" step) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 7c1914052967

Rewrite SKILL.md with dual-mode support (auto-detected via WORKFLOW_RUN_ID env var): local mode runs tests with mise and re-runs failures up to 3 times, CI mode triggers e2e-isolated.yml workflows for re-run verification. Classification now uses re-run results as the primary signal (all fail = real-bug, mixed results = flaky). Workflow changes: actions permission upgraded to write for gh workflow run, timeout increased to 60m for re-run polling, Claude prompt updated with CI mode hint and re-run instructions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 9f75c3effd9b

Local mode now presents findings interactively and applies fixes directly in the working tree instead of creating branches/PRs/issues: - Step 4a: findings report, proposed fixes, user approval gate, in-place fixes - Step 4b: unchanged CI behavior (batched PR for flaky, issues for real bugs) - Step 5: local mode gets simpler summary table, no triage-summary.json Entire-Checkpoint: 4e1d9cf59d52

Consistent test failures can be test infrastructure bugs (e2e/ code), not product bugs (cmd/entire/cli/). Update classification signals, fix lists, and action sections to distinguish the two. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 28c90fcc7266

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: ba6877944a6c

Replace duplicated artifact-reading steps in e2e-triage Step 1 with a reference to debug-e2e's Debugging Workflow (steps 2-5), keeping the collect list so classification inputs remain clear. Add Related Skills section to README. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: eb14496bde1e

Entire-Checkpoint: bb778fbab533

…d GitHub issues - Remove workflow_run trigger from e2e-triage.yml (now dispatch-only) - Remove issues permission and gh issue commands from CI mode - Replace real-bug GitHub issues with structured CI log reports - Add triage link to Slack failure notification in e2e.yml - Update skill docs and README to reflect new behavior Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 4725d29fd8ff

Remove CI mode (branch creation, PRs, triage-summary.json, CI re-runs via gh workflow run) from the e2e-triage skill while preserving local debugging of CI failures (downloading artifacts, analyzing them, running tests locally). Also removes the e2e-triage.yml workflow and the triage link from the E2E Slack failure notification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: c8ddf0fdb9df

Teach Step L1 to accept CI run references (latest, run ID, run URL) and use scripts/download-e2e-artifacts.sh to fetch artifacts, skipping local re-runs and jumping straight to shared analysis. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: fb0c122d807e

Skip re-downloading when the artifact directory already exists and is non-empty, printing a log message instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 5f75d4661043

…ge skill Pre-create ~/.config/cursor/ in Bootstrap() so the cursor CLI doesn't crash with ENOENT when writing cli-config.json after accepting workspace trust on CI. Follows the same pattern used by Claude, Gemini, and Droid agents. Update e2e-triage skill to require running real E2E tests after applying fixes, scoped by change type: agent-specific → that agent's full suite, shared infra → all affected agents, prompt-only → just the affected test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: fa1e4e3fb457

Split the monolithic e2e-triage skill into three focused commands (triage-ci, debug, implement) following the agent-integration plugin pattern. Triage-ci is report-only, implement is action-only, and the /e2e orchestrator runs both sequentially. - Create .claude/plugins/e2e/ with plugin.json and command wrappers - Create .claude/skills/e2e/ with orchestrator SKILL.md and 3 procedures - Delete old .claude/skills/e2e-triage/ and .claude/skills/debug-e2e/ - Update all /debug-e2e references to /e2e:debug in agent-integration Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 2b582392c6b6

cursor · 2026-03-07T00:56:14Z

PR Summary

Medium Risk
Touches git hook/session checkpointing logic and introduces a new retry path on turn finalization, which could affect when/if checkpoints are created; changes are localized and covered by new tests.

Overview
Improves Cursor CLI E2E stability by running each tmux session with its own XDG_CONFIG_HOME (pre-seeded cli-config.json) to avoid parallel config races, and by waiting for a post-completion UI marker (Add a follow-up) instead of the always-visible prompt pattern.

Fixes a deferred-condensation edge case in manual commit hooks: PostCommit now records TurnCheckpointIDs when condensation is attempted (even if it fails), and finalizeAllTurnCheckpoints now retries missing checkpoints by calling CondenseSession on ErrCheckpointNotFound.

Adds focused tests covering checkpoint ID recording on condensation failure and deferred checkpoint creation (including the “still empty transcript” no-crash path).

^{Written by Cursor Bugbot for commit 6bbb43e. Configure here.}

Copilot

Pull request overview

Adds tooling and documentation to improve debugging/triage of flaky E2E runs (especially Cursor), including a helper script for downloading CI artifacts and new .claude skill/plugin docs for a triage→fix workflow.

Changes:

Add scripts/download-e2e-artifacts.sh to fetch and normalize GitHub Actions E2E artifacts locally.
Update Cursor E2E agent bootstrap to pre-create the Cursor config directory to avoid runtime ENOENT failures.
Add/update E2E triage/debug/implement skill + plugin documentation and refresh E2E README guidance.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
scripts/download-e2e-artifacts.sh	New helper script to locate a run (latest/ID/URL), download artifacts, flatten wrapper dirs, and write `.run-info.json`.
e2e/agents/cursor_cli.go	Create Cursor config directory during bootstrap to reduce flaky failures.
e2e/README.md	Document local artifact download + new triage workflow references.
cmd/entire/cli/strategy/common_test.go	Minor formatting/alignment in tests.
.claude/skills/e2e/triage-ci.md	New CI triage procedure doc (download artifacts or rerun locally, classify flaky vs real-bug).
.claude/skills/e2e/implement.md	New procedure doc for applying fixes and verifying via scoped E2E runs.
.claude/skills/e2e/debug.md	Update/normalize debug procedure doc formatting.
.claude/skills/e2e/SKILL.md	Add orchestrator skill definition for the E2E triage→implement pipeline.
.claude/skills/agent-integration/*.md	Update references to use `/e2e:debug` instead of the old command name.
.claude/plugins/e2e/**	Add local plugin command wrappers + plugin metadata and README.

scripts/download-e2e-artifacts.sh

Copilot · 2026-03-07T01:00:07Z

scripts/download-e2e-artifacts.sh

+else
+  mkdir -p "$dest"
+  log "Downloading artifacts to $dest/ ..."
+  gh run download "$run_id" --dir "$dest" 2>&1 >&2 || die "Failed to download artifacts. They may have expired (retention: 7 days)."


The redirection 2>&1 >&2 doesn’t reliably enforce the contract that only the final absolute path is written to stdout; depending on evaluation order it can still leak output to stdout. Redirect the command’s stdout to stderr explicitly (and keep stderr on stderr) so callers can safely capture stdout.

Suggested change

gh run download "$run_id" --dir "$dest" 2>&1 >&2 || die "Failed to download artifacts. They may have expired (retention: 7 days)."

gh run download "$run_id" --dir "$dest" 1>&2 || die "Failed to download artifacts. They may have expired (retention: 7 days)."

scripts/download-e2e-artifacts.sh

Copilot · 2026-03-07T01:00:07Z

e2e/agents/cursor_cli.go

+	home, err := os.UserHomeDir()
+	if err != nil {
+		return fmt.Errorf("get home dir: %w", err)
+	}
+	dir := filepath.Join(home, ".config", "cursor")


This hard-codes the Linux-style config path ~/.config/cursor. To keep local E2E runs working across OSes (and respect XDG_CONFIG_HOME), prefer deriving the base config dir with os.UserConfigDir() and then appending cursor.

Suggested change

home, err := os.UserHomeDir()

if err != nil {

return fmt.Errorf("get home dir: %w", err)

}

dir := filepath.Join(home, ".config", "cursor")

cfgDir, err := os.UserConfigDir()

if err != nil {

return fmt.Errorf("get user config dir: %w", err)

}

dir := filepath.Join(cfgDir, "cursor")

e2e/README.md

Copilot · 2026-03-07T01:00:08Z

e2e/README.md


 - **`.github/workflows/e2e.yml`** — Runs full suite on push to main. Matrix: `[claude, opencode, gemini]`.
 - **`.github/workflows/e2e-isolated.yml`** — Manual dispatch for debugging a single test. Inputs: agent + test name filter.
+- **`.github/workflows/e2e-triage.yml`** — Auto-triggers on E2E failure via `workflow_run`. Runs Claude Code (Opus) to download artifacts, classify failures, and create PRs (flaky) or issues (real bugs).


The README lists .github/workflows/e2e-triage.yml, but there’s no such workflow file in .github/workflows/ in this branch. Either add the workflow in this PR or remove/adjust this bullet so the documentation doesn’t point at a non-existent file.

Suggested change

- **`.github/workflows/e2e-triage.yml`** — Auto-triggers on E2E failure via `workflow_run`. Runs Claude Code (Opus) to download artifacts, classify failures, and create PRs (flaky) or issues (real bugs).

scripts/download-e2e-artifacts.sh

Cursor's atomic config write (cli-config.json.tmp → cli-config.json) races when parallel tests trigger "Workspace Trust Required" simultaneously. Pre-seeding the file with {} in Bootstrap() avoids the temp-file rename path entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 47c9b5f45145

The previous pre-seed fix wasn't sufficient because cursor's write-after-trust-acceptance still uses atomic temp-file rename on the shared ~/.config/cursor/cli-config.json. With 43 parallel tests, multiple cursor processes race on the same .tmp file. Fix: give each tmux session its own XDG_CONFIG_HOME + HOME pointing to an isolated temp directory with a pre-seeded cli-config.json, following the same pattern Claude Code uses with CLAUDE_CONFIG_DIR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: df802d707cb9

- Add per-session XDG_CONFIG_HOME with pre-seeded cli-config.json to prevent ENOENT race on parallel tests (keep real HOME for auth) - Wait for "Add a follow-up" instead of PromptPattern() to avoid premature WaitFor settling during Thinking phase - Clean up temp config dir on session close Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 121daaca3e20

When Cursor commits mid-turn before flushing its transcript, condensation fails silently and the checkpoint ID is never recorded. This means the commit trailer points to nonexistent checkpoint metadata. Fix by separating intent from execution: record checkpoint IDs on condensation *attempt* (not just success), and fall back to deferred CondenseSession at stop time when UpdateCommitted returns ErrCheckpointNotFound. No behavior change for agents that flush transcripts before committing (Claude Code, Gemini, etc.). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: fb5a3e2fb741

Creates a local marketplace at .claude/plugins/ wrapping both the e2e and agent-integration plugins, and registers it via extraKnownMarketplaces in project settings so plugin subcommands (/e2e:debug, /agent-integration:research, etc.) are discoverable by Claude Code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 13d3f0b6de12

- Standardize on `entire-logs/entire.log` in debug.md diagnostic table - Update triage-ci command description to mention CI artifact support Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: d9972bc12f95

alishakawaguchi · 2026-03-09T23:24:59Z

@cursor review

cursor

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

claude · 2026-03-10T00:05:11Z

cmd/entire/cli/strategy/manual_commit_hooks.go

+			if errors.Is(updateErr, checkpoint.ErrCheckpointNotFound) {
+				// Checkpoint wasn't created at post-commit time (e.g., transcript was empty).
+				// Now that the full transcript is available, create it via CondenseSession.
+				_, condenseErr := s.CondenseSession(ctx, repo, cpID, state, nil)
+				if condenseErr != nil {
+					logging.Warn(logCtx, "finalize: deferred condensation failed",
+						slog.String("checkpoint_id", cpIDStr),
+						slog.String("session_id", state.SessionID),
+						slog.String("error", condenseErr.Error()),
+					)
+					errCount++
+				} else {
+					logging.Info(logCtx, "finalize: deferred checkpoint created",
+						slog.String("checkpoint_id", cpIDStr),
+						slog.String("session_id", state.SessionID),
+					)
+				}
+				continue


🟡 When deferred condensation succeeds via CondenseSession at line 2247, the result is discarded (_) and no session state is updated afterward — unlike the normal path through condenseAndUpdateState which updates BaseCommit, StepCount, CheckpointTranscriptStart, LastCheckpointID, clears FilesTouched, and schedules shadow branch deletion. Additionally, nil is passed for committedFiles, causing the 1:1 file filtering (lines 158-177 of manual_commit_condensation.go) to be skipped entirely, so the deferred checkpoint will have ALL session files in FilesTouched instead of just the files from the specific commit. These are minor metadata quality issues in a rare edge-case recovery path, but could theoretically cause duplicate condensation if a manual commit occurs before a new turn starts.

Extended reasoning...

What the bug is

The deferred condensation path in finalizeAllTurnCheckpoints (lines 2244-2261) has two related deficiencies:

Nil committedFiles: CondenseSession is called with nil for committedFiles (line 2247). The filtering logic at lines 158-177 of manual_commit_condensation.go checks len(committedFiles) > 0, so with nil the entire filter block is skipped. The resulting checkpoint will have ALL files from the session transcript in FilesTouched, not just the files from the specific commit linked via the Entire-Checkpoint trailer.

No state updates after success: When CondenseSession succeeds, only a log message is emitted. The CondenseResult is discarded (_). Compare with condenseAndUpdateState (lines 1117-1134) which updates: state.BaseCommit, state.StepCount, state.CheckpointTranscriptStart, state.LastCheckpointID, clears state.FilesTouched, and schedules shadow branch deletion.

The specific code path

This fires when: (1) Cursor commits mid-turn before flushing the transcript, (2) PostCommit condensation fails (empty transcript), (3) the checkpoint ID is recorded in TurnCheckpointIDs via the new condenseAttempted flag, (4) at HandleTurnEnd, finalizeAllTurnCheckpoints tries UpdateCommitted which returns ErrCheckpointNotFound, and (5) falls back to CondenseSession.

Step-by-step proof for the multi-commit scenario (bug_008)

PostCommit C1: Condensation fails (empty transcript). condenseAttempted=true, condensed=false. state.BaseCommit stays at B0. Shadow branch at B0 is preserved (uncondensedActiveOnBranch prevents deletion). TurnCheckpointIDs=[cp1].

PostCommit C2 (same turn): state.BaseCommit still B0, shadow at B0 resolves correctly. Condensation succeeds. condenseAndUpdateState updates state.BaseCommit=C2, state.CheckpointTranscriptStart=totalLines, state.FilesTouched=nil. Shadow at B0 added to shadowBranchesToDelete and deleted. TurnCheckpointIDs=[cp1, cp2].

HandleTurnEnd: finalizeAllTurnCheckpoints runs. For cp2, UpdateCommitted succeeds. For cp1, gets ErrCheckpointNotFound. Falls back to CondenseSession(ctx, repo, cp1, state, nil) — but state has been mutated: BaseCommit=C2, CheckpointTranscriptStart points past C2. getShadowBranchNameForCommit(C2, ...) looks for the wrong shadow branch. The checkpoint cp1 is created with data from the current (mutated) state, not the state at commit C1 time.

Addressing the refutation

One verifier refuted the state update issue, arguing: (a) the comment at lines 2277-2281 says "Do NOT update CheckpointTranscriptStart here", (b) shadow branches are not orphaned because PostCommit didnt update BaseCommit, (c) its best-effort by design. However, these arguments assume the normal deferred path where PostCommit condensation failed for ALL checkpoints in the turn. In the multi-commit scenario where a later condensation succeeds, condenseAndUpdateState DOES update BaseCommit and DOES delete the shadow branch. The comment is correct that updating CheckpointTranscriptStart would break carry-forward in the normal case, but the deferred path should at minimum handle the case where the CondenseResult is used (not discarded) so TotalTranscriptLines is available if needed. The "best-effort" framing is fair — creating any checkpoint is better than none — but the metadata imprecision (wrong FilesTouched, potentially wrong file state in multi-commit turns) is worth noting.

Impact

The practical impact is limited because: (1) this is a recovery path for an already-rare edge case (Cursor committing before transcript flush), (2) the core transcript data in the checkpoint is correct, (3) the FilesTouched imprecision is metadata quality rather than functional breakage, and (4) the multi-commit scenario (first condensation fails, later succeeds) is extremely narrow. The worst case is a checkpoint with a superset of files in FilesTouched and potentially stale CheckpointTranscriptStart that could cause sessionHasNewContent to trigger redundant condensation on a subsequent commit.

How to fix

For the nil committedFiles: resolve the committed files from the checkpoint IDs commit (via the Entire-Checkpoint trailer) using filesChangedInCommit. For the missing state updates: after successful deferred condensation, update at minimum CheckpointTranscriptStart from the CondenseResult.TotalTranscriptLines, and schedule the shadow branch for cleanup. For the multi-commit scenario, the proper fix would require storing per-checkpoint metadata (BaseCommit, committedFiles) alongside TurnCheckpointIDs at record time, which is a more significant design change.

claude · 2026-03-10T00:05:11Z

e2e/agents/cursor_cli.go

+		return "", fmt.Errorf("create temp config dir: %w", err)
+	}
+	cursorDir := filepath.Join(tmp, "cursor")
+	if err := os.MkdirAll(cursorDir, 0o755); err != nil {
+		return "", fmt.Errorf("mkdir cursor config: %w", err)
+	}
+	if err := os.WriteFile(filepath.Join(cursorDir, "cli-config.json"), []byte("{}"), 0o644); err != nil {
+		return "", fmt.Errorf("seed cli-config.json: %w", err)
+	}
+	return tmp, nil


🟡 Nit: cursorConfigDir() leaks the temp directory created by os.MkdirTemp if the subsequent os.MkdirAll (line 85) or os.WriteFile (line 88) fails. Adding os.RemoveAll(tmp) before the error returns on lines 86 and 89 would fix this.

Extended reasoning...

What the bug is

The cursorConfigDir() function at e2e/agents/cursor_cli.go:79-92 creates a temporary directory via os.MkdirTemp on line 80, then performs two subsequent operations: os.MkdirAll to create a cursor subdirectory (line 85) and os.WriteFile to seed a config file (line 88). If either of these operations fails, the function returns an error without cleaning up the tmp directory.

Why existing code doesn't prevent it

The caller in startInteractiveSession only receives the configDir path on success and cleans it up via os.RemoveAll(configDir) in two places: the error path after NewTmuxSession fails, and via s.OnClose() on success. But when cursorConfigDir() itself returns an error, the caller never receives the path, so no cleanup occurs.

Step-by-step proof

os.MkdirTemp("", "cursor-e2e-config-*") succeeds, creating e.g. /tmp/cursor-e2e-config-123456

os.MkdirAll(filepath.Join(tmp, "cursor"), 0o755) fails (e.g., filesystem full, permissions issue)

Function returns ("", error) — the caller gets no path to clean up

/tmp/cursor-e2e-config-123456 remains on disk with no reference to it

Impact

This is very minor in practice. This is E2E test infrastructure code, the failure scenario requires the filesystem to fail between two closely-spaced operations, the leaked directory is tiny, and the OS cleans /tmp periodically. However, it is new code introduced in this PR and the fix is trivial.

How to fix

Add os.RemoveAll(tmp) before each error return after the MkdirTemp call:

if err := os.MkdirAll(cursorDir, 0o755); err != nil { os.RemoveAll(tmp) return "", fmt.Errorf("mkdir cursor config: %w", err) } if err := os.WriteFile(filepath.Join(cursorDir, "cli-config.json"), []byte("{}"), 0o644); err != nil { os.RemoveAll(tmp) return "", fmt.Errorf("seed cli-config.json: %w", err) }

alishakawaguchi and others added 17 commits March 6, 2026 11:42

Add README to E2E triage skill

6bdf24d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: ba6877944a6c

For testing purposes only

39d1d8f

For testing purposes only

4f54947

Fix allowed tools

270d48d

Entire-Checkpoint: bb778fbab533

Cache downloaded CI artifacts to avoid redundant downloads

790a8f2

Skip re-downloading when the artifact directory already exists and is non-empty, printing a log message instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 5f75d4661043

alishakawaguchi self-assigned this Mar 7, 2026

Copilot AI review requested due to automatic review settings March 7, 2026 00:56

Copilot started reviewing on behalf of alishakawaguchi March 7, 2026 00:56 View session

Copilot AI reviewed Mar 7, 2026

View reviewed changes

cursor bot reviewed Mar 7, 2026

View reviewed changes

scripts/download-e2e-artifacts.sh Outdated Show resolved Hide resolved

alishakawaguchi and others added 7 commits March 6, 2026 17:09

Merge branch 'main' into alisha/e2e-triage-local-only

f20a222

revert e2e changes

e90e3f9

alishakawaguchi and others added 5 commits March 9, 2026 12:28

update triage skill

a62358f

remove everything but cursor cli E2E test fix

e5ba87c

remove everything but cursor cli E2E test fix

3368cbd

Merge branch 'main' into alisha/e2e-triage-local-only

6bbb43e

alishakawaguchi changed the title ~~WIP - Fix cursor E2E flaky tests~~ Fix cursor-cli E2E flakiness and deferred condensation bug Mar 9, 2026

cursor bot reviewed Mar 9, 2026

View reviewed changes

alishakawaguchi marked this pull request as ready for review March 9, 2026 23:31

alishakawaguchi requested a review from a team as a code owner March 9, 2026 23:31

claude bot reviewed Mar 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix cursor-cli E2E flakiness and deferred condensation bug#654

Fix cursor-cli E2E flakiness and deferred condensation bug#654
alishakawaguchi wants to merge 29 commits intomainfrom
alisha/e2e-triage-local-only

alishakawaguchi commented Mar 7, 2026 •

edited

Loading

Uh oh!

cursor bot commented Mar 7, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI Mar 7, 2026

Uh oh!

Uh oh!

Copilot AI Mar 7, 2026

Uh oh!

Uh oh!

Copilot AI Mar 7, 2026

Uh oh!

Uh oh!

alishakawaguchi commented Mar 9, 2026

Uh oh!

cursor bot left a comment

Uh oh!

claude bot Mar 10, 2026

Uh oh!

claude bot Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

	gh run download "$run_id" --dir "$dest" 2>&1 >&2 \|\| die "Failed to download artifacts. They may have expired (retention: 7 days)."
	gh run download "$run_id" --dir "$dest" 1>&2 \|\| die "Failed to download artifacts. They may have expired (retention: 7 days)."

Conversation

alishakawaguchi commented Mar 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

cursor bot commented Mar 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

alishakawaguchi commented Mar 9, 2026

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

claude bot Mar 10, 2026

Choose a reason for hiding this comment

What the bug is

The specific code path

Step-by-step proof for the multi-commit scenario (bug_008)

Addressing the refutation

Impact

How to fix

Uh oh!

claude bot Mar 10, 2026

Choose a reason for hiding this comment

What the bug is

Why existing code doesn't prevent it

Step-by-step proof

Impact

How to fix

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

alishakawaguchi commented Mar 7, 2026 •

edited

Loading

cursor bot commented Mar 7, 2026 •

edited

Loading