fix: prevent agent loop stall on write_to_file filesystem errors (#703) #727

Open
awschmeder wants to merge 2 commits into Zoo-Code-Org:main from awschmeder:fix/write-to-file-erofs-stall

Conversation

@awschmeder awschmeder commented Jun 26, 2026

Contributor

Related GitHub Issue

Closes: #703

Description

The bug (#703): When the model uses write_to_file to create a file whose parent directory cannot be created or written (e.g. a read-only mount or a path under a directory lacking write permission, producing EROFS/EACCES), the agent loop stalled permanently.

handlePartial called createDirectoriesForFile without a .catch() guard. During streaming (block.partial === true) the unguarded throw was caught by BaseTool.handle -> handleError, but the partial-block advancement gate in presentAssistantMessage was skipped (block.partial was still true and neither didRejectTool nor didAlreadyUseTool was set), so userMessageContentReady was never set and the task hung forever.
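
As a rough illustration of the gate condition described above, here is a simplified model. It is not the actual presentAssistantMessage code: the GateState shape and the canAdvance helper are hypothetical, and only the three flag names come from this description.

```typescript
// Simplified, hypothetical model of the partial-block advancement gate.
// The real presentAssistantMessage does considerably more than this.
interface GateState {
	blockPartial: boolean
	didRejectTool: boolean
	didAlreadyUseTool: boolean
}

// Returns true when the loop may advance, i.e. when userMessageContentReady
// would be set in the real code path.
function canAdvance(state: GateState): boolean {
	return !state.blockPartial || state.didRejectTool || state.didAlreadyUseTool
}

// A throw during streaming changes none of the flags, so the gate stays closed.
const duringStreamingError: GateState = {
	blockPartial: true,
	didRejectTool: false,
	didAlreadyUseTool: false,
}

// Once the non-partial block is processed, the gate opens.
const afterFinalBlock: GateState = { ...duringStreamingError, blockPartial: false }

console.log(canAdvance(duringStreamingError)) // false
console.log(canAdvance(afterFinalBlock)) // true
```

The model only captures the condition itself; it does not reproduce the surrounding control flow that kept the loop from recovering before the fix.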

The fix:

  • Removed the redundant createDirectoriesForFile call from handlePartial (execute() already creates parent directories before diffViewProvider.open()).
  • Moved createDirectoriesForFile in execute() inside the try block so EROFS/EACCES errors route through handleError with diffViewProvider.reset() cleanup and consecutive-mistake counting instead of escaping unhandled.
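
A minimal sketch of the second bullet, assuming stand-in implementations: only the names createDirectoriesForFile, handleError, execute, and diffViewProvider.open()/reset() come from this PR. The bodies, the events log, the "/readonly/" path convention, and the order of handleError versus reset() are illustrative assumptions.

```typescript
// Records which steps ran, so the control flow can be inspected.
const events: string[] = []

// Stand-in: fails with EROFS for paths under a pretend read-only mount.
async function createDirectoriesForFile(path: string): Promise<void> {
	if (path.startsWith("/readonly/")) {
		throw Object.assign(new Error(`EROFS: read-only file system, mkdir '${path}'`), { code: "EROFS" })
	}
	events.push("mkdir")
}

// Stand-in for the diff view; only open() and reset() are modeled.
const diffViewProvider = {
	async open(path: string): Promise<void> {
		events.push(`open:${path}`)
	},
	async reset(): Promise<void> {
		events.push("reset")
	},
}

// Stand-in for the tool's error reporter.
async function handleError(action: string, error: unknown): Promise<void> {
	const code = (error as { code?: string }).code ?? "unknown"
	events.push(`handleError:${action}:${code}`)
}

async function execute(path: string): Promise<void> {
	try {
		// Directory creation sits inside the try block, so EROFS/EACCES
		// reach the catch below instead of escaping unhandled.
		await createDirectoriesForFile(path)
		await diffViewProvider.open(path)
	} catch (error) {
		await handleError("writing file", error)
		await diffViewProvider.reset()
	}
}
```

Calling execute("/readonly/a/b.txt") in this model records a handleError entry followed by reset and never reaches open.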

Correct streaming UI behavior under a filesystem error. Because the failure now occurs while a partial write_to_file block is still streaming, the implementation is designed so the in-progress UI resolves cleanly rather than leaving artifacts. The following are properties of the implementation:

  • The streaming "Zoo wants to edit this file" element does not get stuck in a loading state. Task.finalizePartialToolAsk() (a non-blocking helper that sets the last partial tool ask to partial: false) dismisses it. It deliberately avoids task.ask(..., false), which would fall through to a blocking pWaitFor and leave a spinner in the chat UI that never terminates.
  • The error is reported exactly once. handlePartial finalizes the partial ask, resets the diff view, and swallows the streaming error (logged to the console) so only the authoritative non-partial execute() path surfaces it to the user. This is safe because the loop advancement gate does not depend on the streaming throw -- it advances when the non-partial block arrives.
  • The tool element is not duplicated across streaming deltas. A partialStreamFailed flag (cleared between invocations via resetPartialState()) short-circuits later deltas so they do not re-attempt open()/update() and spawn a new partial tool message each time.
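
The three properties above can be sketched with a small model. This is an assumption-laden illustration, not the WriteToFileTool source: the WriteToFileModel class, its openFails switch, and the calls log are hypothetical, while partialStreamFailed, resetPartialState(), handlePartial, finalizePartialToolAsk, and the diff view open/reset steps are the names used in this PR.

```typescript
// Hypothetical model of the streaming failure handling described above.
class WriteToFileModel {
	private partialStreamFailed = false
	private readonly openFails: boolean
	readonly calls: string[] = []

	constructor(openFails: boolean) {
		this.openFails = openFails
	}

	// Stand-in for diffViewProvider.open(); fails on demand.
	private async open(): Promise<void> {
		this.calls.push("open")
		if (this.openFails) {
			throw new Error("EROFS: read-only file system")
		}
	}

	async handlePartial(): Promise<void> {
		// Later deltas short-circuit once a streaming failure has been seen.
		if (this.partialStreamFailed) {
			return
		}
		try {
			await this.open()
		} catch (error) {
			this.partialStreamFailed = true
			console.error("write_to_file streaming error (swallowed):", error)
			this.calls.push("finalizePartialToolAsk") // dismisses the spinner
			this.calls.push("reset") // cleans up the diff view
			// Not rethrown: the non-partial execute() path reports the error.
		}
	}

	// Cleared between invocations so the next tool call starts fresh.
	resetPartialState(): void {
		this.partialStreamFailed = false
	}
}
```

In this model, three consecutive failing deltas produce a single open attempt, one finalize, and one reset; the error itself is left for execute() to surface.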

Test Procedure

Automated (run from src/):

cd src && npx vitest run core/tools/__tests__/writeToFileTool.spec.ts

31 tests pass, including new regression tests that assert:

  • EROFS in handlePartial does not call createDirectoriesForFile and does not stall.
  • EROFS/EACCES in execute() routes through handleError with diffViewProvider.reset() and does not proceed to open/save.
  • When handlePartial open()/update() throws, finalizePartialToolAsk() and diffViewProvider.reset() are called and the streaming error is swallowed (not surfaced via handleError).
  • A single filesystem failure is reported exactly once across the streaming and execute() phases.
  • Repeated streaming deltas after a failure do not spawn additional partial tool messages or re-attempt open().

npx tsc --noEmit passes cleanly.

Manual (Extension Development Host / F5):

  1. Ask the agent to use the write_to_file tool to write a file to an unwritable destination (no write permission, or a read-only filesystem).
  2. Before the fix: the agent stalls.
  3. After the fix: the error is surfaced as a tool result with an error message, and the agent continues the conversation.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (none required).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Documentation Updates

  • No documentation updates are required.

Summary by CodeRabbit

  • Bug Fixes
    • Improved file-writing behavior when errors occur during streaming or save operations.
    • Prevented repeated error prompts and duplicate retry attempts after a partial update fails.
    • Fixed a stuck loading/spinner state when an operation ends unexpectedly.
    • Ensured directory creation and file updates happen at the right stage so error handling is more consistent.

…oo-Code-Org#703)

- Remove unguarded createDirectoriesForFile call from handlePartial; the call
  was a redundant optimization (execute() already creates dirs before open())
  and its unguarded throw caused the partial-block advancement gate in
  presentAssistantMessage to be skipped, permanently stalling the agent loop
- Move createDirectoriesForFile in execute() inside the try block so EROFS/
  EACCES errors route through handleError with diffViewProvider.reset() cleanup
  and consecutive-mistake counting, rather than escaping unhandled
- Add regression tests covering both failure paths
…_file filesystem failure

When write_to_file hits a filesystem error (EROFS/EACCES) the streaming
phase left the "Zoo wants to edit this file" spinner running, surfaced the
same error twice (handlePartial + execute), and spawned a new partial tool
message on every subsequent streaming delta.

- Add Task.finalizePartialToolAsk() to finalize a partial tool ask without
  blocking on user input, dismissing the spinner.
- handlePartial swallows streaming filesystem errors (after finalizing the
  spinner and resetting the diff view) so only the authoritative execute()
  error is reported, eliminating the duplicate error bubble.
- Track partialStreamFailed so later streaming deltas short-circuit instead
  of re-attempting and spawning repeated partial tool messages.
- Add regression tests for spinner finalization, single-error reporting, and
  no repeated partial messages.

coderabbitai Bot commented Jun 26, 2026

Contributor

📝 Walkthrough

write_to_file now finalizes partial tool asks on filesystem errors, moves new-file directory creation into the main execute path, and stops partial streaming after a fatal failure. The tests were updated for cleanup, deduplicated error reporting, and EROFS behavior.

Changes

Partial write-to-file error recovery

| Layer / File(s) | Summary |
| --- | --- |
| Partial ask finalizer: src/core/task/Task.ts | Task adds finalizePartialToolAsk() to mark the latest partial tool ask as non-partial and persist it. |
| Execute cleanup and state: src/core/tools/WriteToFileTool.ts | WriteToFileTool adds partialStreamFailed, clears it between runs, moves new-file directory creation into execute(), and finalizes the partial ask before handling execute-time errors. |
| Partial streaming failures: src/core/tools/WriteToFileTool.ts | handlePartial() now stops after a prior filesystem failure, no longer creates directories there, and catches diff-view open/update errors to log, finalize the partial ask, and reset the diff view. |
| Regression tests: src/core/tools/__tests__/writeToFileTool.spec.ts | The test suite updates partial-mode directory creation expectations and adds coverage for swallowed streaming errors, cleanup on diff-view failures, deduplicated retries, and EROFS handling in both partial and execute paths. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • taltas
  • JamesRobert20
  • navedmerchant
  • hannesrudolph
  • edelauna

Poem

A bunny hopped by moonlit pine,
and sniffed the errors in the line.
Partial asks now drift and mend,
with no more stalling at the end.
🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main fix: preventing write_to_file stalls on filesystem errors. |
| Description check | ✅ Passed | The description includes the linked issue, fix summary, testing steps, and checklist, matching the template well. |
| Linked Issues check | ✅ Passed | The code addresses #703 by removing the partial-path directory creation bug and ensuring filesystem errors recover cleanly. |
| Out of Scope Changes check | ✅ Passed | The changes stay focused on the stall fix and related cleanup/tests, with no clear unrelated additions. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

codecov Bot commented Jun 26, 2026


Codecov Report

❌ Patch coverage is 88.88889% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/core/tools/WriteToFileTool.ts | 87.50% | 0 Missing and 2 partials ⚠️ |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/core/task/Task.ts`:
- Around line 1739-1745: Persist the finalized tool-ask state in
finalizePartialToolAsk by updating the underlying clineMessages entry, not just
the webview. After clearing the partial flag on the last ask/tool message, make
sure the task state is saved through the same persistence path used by
task.ask(..., false) in EditFileTool so the finalized message is written before
any reload/resume. Use the existing updateClineMessage flow only as the UI
update, and add the missing persistence step around finalizePartialToolAsk and
the clineMessages mutation.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: fe35d544-0a5c-4962-a19e-ead7a0d45383

📥 Commits

Reviewing files that changed from the base of the PR and between 34898d2 and 53568a5.

📒 Files selected for processing (3)
  • src/core/task/Task.ts
  • src/core/tools/WriteToFileTool.ts
  • src/core/tools/__tests__/writeToFileTool.spec.ts

Comment thread src/core/task/Task.ts
Comment on lines +1739 to +1745
async finalizePartialToolAsk(): Promise<void> {
	const lastMessage = this.clineMessages.at(-1)

	if (lastMessage && lastMessage.partial && lastMessage.type === "ask" && lastMessage.ask === "tool") {
		lastMessage.partial = false
		await this.updateClineMessage(lastMessage)
	}

Contributor

🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Persist the finalized tool-ask state.

Line 1744 only calls updateClineMessage(), which updates the webview but does not save this.clineMessages. The existing finalization path in src/core/tools/EditFileTool.ts:150-171 goes through task.ask(..., false), which persists the row first. If a swallowed partial-stream failure is followed by a reload/resume before another save happens, this message is still stored as partial: true and the spinner can come back.

Suggested fix
 async finalizePartialToolAsk(): Promise<void> {
 	const lastMessage = this.clineMessages.at(-1)

 	if (lastMessage && lastMessage.partial && lastMessage.type === "ask" && lastMessage.ask === "tool") {
 		lastMessage.partial = false
+		await this.saveClineMessages()
 		await this.updateClineMessage(lastMessage)
 	}
 }
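
To illustrate the reload concern raised in this comment, here is a small model under stated assumptions. The TaskModel class, its persisted string (a stand-in for on-disk state), the uiUpdates log, the persist switch, and reload() are all hypothetical; only finalizePartialToolAsk, saveClineMessages, updateClineMessage, and clineMessages are names taken from the review.

```typescript
interface Message {
	type: string
	ask?: string
	partial?: boolean
}

// Hypothetical model: in-memory messages plus a serialized "saved" copy.
class TaskModel {
	clineMessages: Message[] = [{ type: "ask", ask: "tool", partial: true }]
	private persisted: string = JSON.stringify(this.clineMessages)
	readonly uiUpdates: Message[] = []

	// Stand-in for writing the message list to storage.
	async saveClineMessages(): Promise<void> {
		this.persisted = JSON.stringify(this.clineMessages)
	}

	// Stand-in for the webview update; it does not touch the saved copy.
	async updateClineMessage(message: Message): Promise<void> {
		this.uiUpdates.push({ ...message })
	}

	// persist toggles the extra save step proposed in the suggested fix.
	async finalizePartialToolAsk(persist: boolean): Promise<void> {
		const lastMessage = this.clineMessages[this.clineMessages.length - 1]
		if (lastMessage && lastMessage.partial && lastMessage.type === "ask" && lastMessage.ask === "tool") {
			lastMessage.partial = false
			if (persist) {
				await this.saveClineMessages()
			}
			await this.updateClineMessage(lastMessage)
		}
	}

	// Simulates a reload/resume that reads only the saved copy.
	reload(): Message[] {
		return JSON.parse(this.persisted) as Message[]
	}
}
```

In this model, finalizing without the save step leaves the reloaded message at partial: true even though the UI update saw partial: false; with the save step, both agree.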

@awschmeder

Contributor Author
write-to-file-error-correct.mov

Development

Successfully merging this pull request may close these issues.

[BUG] Agent loop stalls permanently when write_to_file partial streaming hits a filesystem error (EROFS/EACCES)

1 participant