fix(update): wait for writer flush, honour backpressure, clean up tmp on failure#158

Open
Ricardo-M-L wants to merge 1 commit into
MiniMax-AI:mainfrom
Ricardo-M-L:contrib/fix-stream-reader-lock-leak

Conversation

@Ricardo-M-L

What does this PR do?

downloadFile() in src/update/self-update.ts had four concrete reliability issues, all triggered in the self-update path:

1. Premature resolve() racing the writer flush

The promise resolved as soon as the read loop hit done, but writer.end() only queues the close; bytes may still be sitting in the stream's internal buffer, not yet written to the file. The very next step is verifySha256(tmp, target.checksum), which reads the file back. On slower disks (HDD, network FS, CI runners with bursty I/O) this races the flush and surfaces as a spurious Checksum mismatch error in the middle of an update.

Fix: Wait for writer.on('finish') before resolving.
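
A minimal sketch of the flush wait, assuming a Node.js `Writable` such as the one returned by `fs.createWriteStream()`. The helper name `endAndWait` is hypothetical and does not come from this PR; it only illustrates resolving on `'finish'` instead of right after `end()`.

```typescript
import { Writable } from "node:stream";

// Hypothetical helper (not from the PR): end the writer and resolve only
// after 'finish', i.e. once every buffered chunk has been handed to the
// underlying file. Rejects if the writer emits 'error' first.
function endAndWait(writer: Writable): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    writer.once("finish", resolve);
    writer.once("error", reject);
    writer.end();
  });
}
```

A caller would `await endAndWait(writer)` before reading the file back for the checksum step.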

2. Missing backpressure on writer.write()

writer.write(value) was called without honouring its boolean return value. For large binaries the writer's internal buffer grows without bound until the entire download is in memory.

Fix: Await the drain event when write() returns false.
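
The backpressure rule can be sketched as follows, assuming a Node.js `Writable`. The helper name `writeWithBackpressure` is hypothetical. `events.once()` is used because its promise also rejects if the writer emits `'error'` while waiting, so a failed writer does not leave the loop hanging on `'drain'`.

```typescript
import { once } from "node:events";
import { Writable } from "node:stream";

// Hypothetical helper (not from the PR): write one chunk and, if write()
// returns false (buffer at or above highWaterMark), wait for 'drain'
// before letting the caller read the next chunk.
async function writeWithBackpressure(
  writer: Writable,
  chunk: Uint8Array,
): Promise<void> {
  if (!writer.write(chunk)) {
    await once(writer, "drain");
  }
}
```

Calling this once per chunk inside the read loop keeps the writer's buffered bytes bounded by roughly one chunk plus `highWaterMark`, instead of growing with the download size.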

3. Half-downloaded binary left in tmpdir() on failure

On any error (network reset, 5xx mid-stream, writer error), the partial file at dest was left in tmpdir() permanently.

Fix: unlinkSync(dest) (best-effort) in a catch before rethrowing.

4. Web Streams reader lock never released

The reader's lock was never released — neither on the happy path nor on errors. The Web Streams API contract requires paired acquire/release; without the release the underlying response body stays locked.

Fix: Move reader.releaseLock() into a finally.
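
A minimal sketch of the paired acquire/release, assuming the Web Streams `ReadableStream` that Node.js exposes from `node:stream/web`. The function name `readAll` and the `onChunk` callback are hypothetical and only illustrate where the `finally` sits.

```typescript
import { ReadableStream } from "node:stream/web";

// Hypothetical helper (not from the PR): drain a Web ReadableStream and
// always release the reader's lock, whether the loop completes or throws.
async function readAll(
  body: ReadableStream<Uint8Array>,
  onChunk: (chunk: Uint8Array) => void | Promise<void>,
): Promise<void> {
  const reader = body.getReader(); // acquires the lock
  try {
    for (;;) {
      const result = await reader.read();
      if (result.done) break;
      await onChunk(result.value);
    }
  } finally {
    reader.releaseLock(); // runs on both the success and the error path
  }
}
```

After `readAll()` settles, `body.locked` is `false` again, so the caller can still cancel or otherwise dispose of the stream.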

Test plan

  • npx tsc --noEmit passes
  • No behavior change on the success path beyond the (correct) flush wait
  • Manual: artificially slow down disk writes (fsync throttling) and verify the checksum step no longer reports mismatches

Why fix together?

All four are in the same 25-line function and share the same restructure (wrap the await new Promise() in try/catch/finally). Splitting would mean four separate refactors of the same block.

…p tmp

`downloadFile()` in `src/update/self-update.ts` had four concrete
reliability issues, all triggered in the self-update path:

1. `resolve()` was called as soon as the read loop hit `done`, but
   `writer.end()` only *queues* the close; bytes may still be sitting
   in the stream's internal buffer, not yet written to the file. The
   very next step, `verifySha256()`, reads back the same file; on
   slower disks (HDD, network FS, CI runners) this races the flush and
   surfaces as a spurious "Checksum mismatch" error mid-update.
   Replace the `resolve()` call with a `writer.on('finish')` handler
   so we only proceed once every buffered chunk has been written out.

2. `writer.write(value)` was called without honouring its boolean
   return value. For large binaries the writer's internal buffer
   grows without bound until the entire download is in memory. Wait
   for the `'drain'` event when `write()` returns `false`.

3. On any error (network reset, 5xx mid-stream, writer error), the
   half-written file at `dest` was left in `tmpdir()`. Unlink it in a
   catch block before rethrowing.

4. The Web Streams reader's lock was never released — neither on the
   happy path nor on errors. Move the `releaseLock()` into a `finally`
   so the underlying body stream is always freed.

No behavior change on the success path beyond the (correct) flush
wait. `tsc --noEmit` passes.
Contributor

@NianJiuZst NianJiuZst left a comment

Thanks for tackling the reliability issues in this path. I agree with waiting for finish and honoring backpressure, but I don't think the error path is fully cleaned up yet.

There is also still no test coverage for src/update/self-update.ts, so the new finish / drain / cleanup / releaseLock() behavior can regress without anything catching it. Please add at least one mocked stream test that exercises a mid-stream failure and verifies the temp file is cleaned up only after the writer is fully torn down.

Reviewed with GPT-5.5 (xhigh reasoning).

Comment thread src/update/self-update.ts
if (done) { writer.end(); break; }
// Honour backpressure so we don't grow the writer's internal buffer
// unboundedly on large binaries / slow disks.
if (!writer.write(value)) {

On the failure path we only unlink dest, but we never close or destroy writer unless the loop reaches the done branch. If reader.read() rejects, or if writer.write() returns false and the writer errors before drain, this block can exit with a live WriteStream and an in-flight pump(). Please mirror the shutdown pattern from src/files/download.ts: tear down the writer in finally, wait for finish/error, and only then remove the temporary file.
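
One way the requested ordering could look, as a sketch only. It is not the pattern from src/files/download.ts, which is not shown here, and the function name `downloadTo` is hypothetical. It assumes `fs.createWriteStream()` with its default auto-close behaviour, so `'close'` is emitted after `'finish'` on success and after `destroy()` on failure.

```typescript
import { once } from "node:events";
import { createWriteStream, unlinkSync } from "node:fs";
import { ReadableStream } from "node:stream/web";

// Hypothetical sketch: stream `body` to `dest`, and on any failure tear the
// writer down, wait for 'close', and only then remove the partial file.
async function downloadTo(
  body: ReadableStream<Uint8Array>,
  dest: string,
): Promise<void> {
  const writer = createWriteStream(dest);
  let writerError: Error | undefined;
  // Persistent listener so a writer error never goes unhandled.
  writer.on("error", (e) => { writerError = e; });
  // Registered up front so a 'close' that fires early is not missed.
  const closed = new Promise<void>((resolve) => {
    writer.once("close", () => resolve());
  });
  const reader = body.getReader();
  try {
    for (;;) {
      const result = await reader.read();
      if (result.done) break;
      if (writerError) throw writerError;
      // once() also rejects if the writer errors before 'drain'.
      if (!writer.write(result.value)) await once(writer, "drain");
    }
    writer.end();
    await closed; // file descriptor is closed at this point
    if (writerError) throw writerError;
  } catch (err) {
    writer.destroy(); // no-op if the writer is already destroyed
    await closed;
    try {
      unlinkSync(dest);
    } catch {
      // Best-effort: the file may never have been created.
    }
    throw err;
  } finally {
    reader.releaseLock();
  }
}
```

Waiting for `'close'` before `unlinkSync()` matters most on platforms where an open file cannot be removed, and it also guarantees no write is still in flight when the temporary file disappears.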
