Support resuming interrupted NAR transfers #225

@DerDennisOP

NAR transfers via NarRequest / NarPush currently can't resume after an interruption: a network glitch or reconnect mid-transfer forces a restart from byte 0.

Proposal

Additive proto changes (no breaking changes to existing messages):

  • NarStreamHeader { path_hash, total_bytes, content_hash_sha256 }: sent before chunks; receiver sizes storage and verifies the full content hash on completion.
  • NarRequestResume { path_hash, received_bytes }: pull side requests resume from offset.
  • NarPushResume { path_hash, received_bytes }: push side announces buffered prefix.
  • Receivers persist incoming chunks to a *.partial file keyed by path_hash, so the buffered prefix survives a process restart.
  • Senders stay stateless; seek to the requested offset on resume.
  • Bounded .partial lifetime: GC after a configurable TTL.
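
As a rough illustration of the resume flow described in the list above, here is a minimal Python sketch. It is not the planned implementation: only the message and field names (NarStreamHeader, path_hash, total_bytes, content_hash_sha256, received_bytes) come from this proposal, while the helper names, the Python types, the `.nar` suffix for completed files, and the 64 KiB chunk size are assumptions made for the example.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class NarStreamHeader:
    # Field names follow the proposal; the Python types are assumptions.
    path_hash: str
    total_bytes: int
    content_hash_sha256: str  # hex digest of the complete NAR


def partial_path(directory: str, path_hash: str) -> str:
    # Hypothetical naming scheme: one .partial file per path_hash.
    return os.path.join(directory, f"{path_hash}.partial")


def received_bytes(directory: str, path_hash: str) -> int:
    """Offset to report in NarRequestResume / NarPushResume (0 if nothing is buffered)."""
    try:
        return os.path.getsize(partial_path(directory, path_hash))
    except FileNotFoundError:
        return 0


def send_from(source: str, offset: int, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Stateless sender: seek to the requested offset and yield chunks from there."""
    with open(source, "rb") as f:
        f.seek(offset)
        while chunk := f.read(chunk_size):
            yield chunk


def receive(header: NarStreamHeader, chunks: Iterable[bytes], directory: str) -> Optional[str]:
    """Append chunks to the .partial file; return the final path once verified.

    Returns None if the stream ended early, leaving the .partial file in place
    so that a later call can resume from received_bytes().
    """
    part = partial_path(directory, header.path_hash)
    with open(part, "ab") as f:
        for chunk in chunks:
            f.write(chunk)
    size = os.path.getsize(part)
    if size < header.total_bytes:
        return None
    digest = hashlib.sha256()
    with open(part, "rb") as f:
        while block := f.read(1 << 20):
            digest.update(block)
    if size != header.total_bytes or digest.hexdigest() != header.content_hash_sha256:
        os.remove(part)  # corrupt prefix: discard and restart from byte 0
        raise ValueError("size or hash mismatch; .partial discarded")
    final = os.path.join(directory, f"{header.path_hash}.nar")
    os.replace(part, final)
    return final
```

After an interruption, the receiver would call `received_bytes(...)`, send that value in the resume message, and feed the chunks produced by `send_from(source, offset)` back into `receive(...)`. Because the sender only seeks, it needs no per-transfer state.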

Server and worker both gain support.
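
The bounded .partial lifetime from the list above could be enforced with a periodic sweep along these lines. This is only a sketch under assumptions: the function name `gc_partials`, the use of file modification time as the age signal, and the flat directory layout are not part of the proposal.

```python
import os
import time
from typing import List, Optional


def gc_partials(directory: str, ttl_seconds: float, now: Optional[float] = None) -> List[str]:
    """Delete *.partial files whose mtime is older than ttl_seconds.

    Returns the sorted names of the removed files. `now` is injectable so the
    sweep can be tested without waiting for real time to pass.
    """
    now = time.time() if now is None else now
    with os.scandir(directory) as it:
        entries = list(it)  # snapshot first, then delete
    removed = []
    for entry in entries:
        if not (entry.is_file() and entry.name.endswith(".partial")):
            continue
        if now - entry.stat().st_mtime > ttl_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Using the modification time means an actively resumed transfer keeps refreshing its own deadline, since every appended chunk updates the mtime.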

Why

  • Workers recovering from flaky networks don't redo large transfers.
  • Prerequisite for the upcoming gradient-proxy work: the proxy will need the same resume semantics for both its proxy↔server and proxy↔worker links.
  • General resilience improvement; no user-facing API change.

Out of scope

  • User-initiated pause/resume.
  • Multi-source / parallel downloads.

Also

  • Reconnection flow: when the network drops, the worker should not abort long-running builds; it should reconnect, re-authenticate, and continue where it left off.
  • Better upload stability: if any output upload fails, the worker should retry the entire upload
    sequence up to 3 times, re-requesting presigned URLs / NarPush.
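
The upload retry described above might look roughly like the following Python sketch. All names here (`upload_outputs_with_retry`, `request_targets`, `upload_one`) are hypothetical, the linear backoff is an addition not mentioned in this issue, and "up to 3 times" is read as three attempts in total, which is an assumption.

```python
import time
from typing import Callable, Dict, Sequence


def upload_outputs_with_retry(
    outputs: Sequence[str],
    request_targets: Callable[[Sequence[str]], Dict[str, str]],
    upload_one: Callable[[str, str], None],
    max_attempts: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run the whole upload sequence, restarting it from the top on any failure.

    Each attempt re-requests fresh targets (presigned URLs or a NarPush),
    because targets from a failed attempt may have expired. Returns the
    number of the attempt that succeeded.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            targets = request_targets(outputs)
            for output in outputs:
                upload_one(output, targets[output])
            return attempt
        except Exception as err:  # broad on purpose for this sketch
            last_error = err
            if attempt < max_attempts:
                sleep(backoff_seconds * attempt)
    raise RuntimeError(f"upload failed after {max_attempts} attempts") from last_error
```

Restarting the full sequence is simpler than tracking per-output state, at the cost of re-uploading outputs that already succeeded. With resumable NarPush, that cost would shrink to the unsent suffix.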

Design doc and implementation plan to follow as a separate spec.

Labels: enhancement (New feature or request), medium (Medium severity)
