CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Purpose

This repository contains a tool to fix ZIP files larger than 4GB that were created by a buggy writer: every offset in the directory structure was written to a 32-bit field instead of using the ZIP64 format. The file data itself is intact, but standard tools cannot extract it because the stored offsets wrap around at the 4GB (2^32 byte) boundary.

Running

python3 zip64_fix.py <input.zip> [output.zip]

If no output path is given, the result is written to <input>_fixed.zip.
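
A minimal sketch of how the default output name could be derived. It assumes that <input> means the input file name without its extension; the helper name default_output_path is hypothetical and not taken from zip64_fix.py.

```python
from pathlib import Path


def default_output_path(input_path: str) -> str:
    """Hypothetical helper: 'archive.zip' becomes 'archive_fixed.zip'.

    Assumption: the suffix '_fixed' is inserted before the extension.
    """
    p = Path(input_path)
    return str(p.with_name(p.stem + "_fixed" + p.suffix))


if __name__ == "__main__":
    print(default_output_path("archive.zip"))
```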

Architecture

zip64_fix.py fixes the broken ZIP in three phases:

  1. Locate central directory: Finds the End of Central Directory (EOCD) record at the end of the file, then reads the central directory at eocd_offset - cd_size. The central directory offset stored in the EOCD cannot be used because it is truncated to 32 bits.

  2. Calculate actual offsets: Walks entries in order, tracking when stored_offset + adjustment wraps below the previous entry's end, adding 4GB to the adjustment at each boundary crossing. This handles files where the first N entries have correct offsets (below 4GB) and subsequent entries are off by multiples of 4GB.

  3. Write fixed output: Writes new local file headers (with proper sizes and ZIP64 extra fields only when needed), copies compressed data from original, then writes a new central directory and end records. Uses ZIP64 extensions only for fields that actually exceed 32-bit limits.
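
The first two phases can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in zip64_fix.py: the function names, the tail/file_size parameters, and the per-entry span value (bytes occupied by the local header plus compressed data) are all hypothetical, and entries are assumed to be stored back to back in central-directory order.

```python
import struct

EOCD_SIG = b"PK\x05\x06"
WRAP = 1 << 32  # offsets stored in 32-bit fields wrap every 4 GiB


def locate_central_directory(tail: bytes, file_size: int) -> tuple[int, int]:
    """Return (cd_start, eocd_offset) given the last bytes of the archive.

    `tail` is assumed to hold the end of the file, including the whole EOCD.
    """
    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("EOCD record not found")
    eocd_offset = file_size - len(tail) + pos
    # The central directory size is a 4-byte field at offset 12 of the EOCD.
    (cd_size,) = struct.unpack_from("<I", tail, pos + 12)
    # The central directory ends where the EOCD begins, so the truncated
    # stored offset is not needed.
    return eocd_offset - cd_size, eocd_offset


def recover_offsets(entries: list[tuple[int, int]]) -> list[int]:
    """Map truncated offsets to real ones.

    `entries` holds (stored_offset, span) pairs in central-directory order;
    `span` (hypothetical) is the number of bytes the entry occupies.
    """
    adjustment = 0
    prev_end = 0
    actual = []
    for stored_offset, span in entries:
        candidate = stored_offset + adjustment
        # An offset that lands before the previous entry's end has wrapped.
        while candidate < prev_end:
            adjustment += WRAP
            candidate = stored_offset + adjustment
        actual.append(candidate)
        prev_end = candidate + span
    return actual


if __name__ == "__main__":
    # The third entry really starts at 5,000,000,000, stored modulo 2**32.
    demo = [(0, 3_000_000_000), (3_000_000_000, 2_000_000_000), (705_032_704, 100)]
    print(recover_offsets(demo))
```

The loop uses while rather than if so that a single entry larger than 4GB, which crosses more than one boundary, is still handled in this sketch.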

Key design decisions

  • Sizes and CRC32 are taken as-is from the original central directory (no decompression/recalculation).
  • The data descriptor flag (bit 3) is cleared in output headers since sizes are written directly.
  • ZIP64 extra fields are built conditionally: only fields that overflow 32 bits are included.
  • Local headers are rewritten (not copied) because the originals may have wrong extra field layouts.
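
As an illustration of the conditional ZIP64 handling and the cleared data descriptor flag, here is a small sketch for a central directory record. The helper names are hypothetical and the threshold is an assumption: a value is moved into the ZIP64 extra field when it is 0xFFFFFFFF or larger, since 0xFFFFFFFF in the 32-bit field is the marker that the real value lives in the extra field.

```python
import struct

LIMIT32 = 0xFFFFFFFF          # marker value for "see ZIP64 extra field"
ZIP64_EXTRA_ID = 0x0001       # header ID of the ZIP64 extended information field
FLAG_DATA_DESCRIPTOR = 1 << 3  # general purpose bit 3


def build_zip64_extra(uncompressed_size: int, compressed_size: int,
                      header_offset: int) -> bytes:
    """Build a ZIP64 extra field holding only the values that overflow.

    Values keep the fixed order: uncompressed size, compressed size,
    local header offset. Returns b"" when nothing overflows.
    """
    values = [v for v in (uncompressed_size, compressed_size, header_offset)
              if v >= LIMIT32]
    if not values:
        return b""
    payload = struct.pack("<%dQ" % len(values), *values)
    return struct.pack("<HH", ZIP64_EXTRA_ID, len(payload)) + payload


def clamp32(value: int) -> int:
    """Value to store in the 32-bit field: the marker if it overflows."""
    return LIMIT32 if value >= LIMIT32 else value


def clear_data_descriptor(flags: int) -> int:
    """Clear bit 3, since sizes are written directly into the header."""
    return flags & ~FLAG_DATA_DESCRIPTOR


if __name__ == "__main__":
    extra = build_zip64_extra(100, 100, 5_000_000_000)
    print(len(extra), hex(clamp32(5_000_000_000)), hex(clear_data_descriptor(0x0808)))
```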