
Maybe try another zip module #1241

Description

@jimmywarting

I've built a zero-dependency, streamable zip library using only web standards, meaning it works seamlessly across all environments: Node.js, Deno, Bun, and all browsers.

Since it doesn't depend on node:buffer, node:stream, or any Node-specific APIs, it's truly cross-environment friendly.
And if you ever want this to run on backends other than Node.js, this library would be my recommendation.

One key benefit is that it operates on the W3C standard File API, making random access to uncompressed entries in a zip file incredibly fast—as simple as:

blob = zipBlob.slice(start, end)
text = await blob.text()

This means ultra-fast performance because it leverages native browser APIs. 🚀
For compressed entries, it uses the standard DecompressionStream('deflate-raw'), since zip entries hold raw DEFLATE data rather than gzip-wrapped data.
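To make the two access paths concrete, here is a minimal sketch that uses only web-standard globals. It is not zip-go's code: the helper names `readStored`, `readDeflated`, and `buildDemo` are hypothetical, the container is a hand-built toy rather than a real zip file, and it assumes a runtime whose `CompressionStream` and `DecompressionStream` accept the `'deflate-raw'` format (recent browsers and recent Node.js releases).

```typescript
// Hypothetical helpers; in a real zip the offsets would come from the central directory.

/** Read a stored (uncompressed) byte range: slice() returns a Blob for the range, text() decodes it. */
async function readStored(container: Blob, start: number, end: number): Promise<string> {
  return container.slice(start, end).text();
}

/** Read a raw-DEFLATE byte range by piping the slice through DecompressionStream. */
async function readDeflated(container: Blob, start: number, end: number): Promise<string> {
  const inflated = container
    .slice(start, end)
    .stream()
    .pipeThrough(new DecompressionStream('deflate-raw'));
  return new Response(inflated).text();
}

/** Build a toy container: one stored segment followed by one deflated segment. */
async function buildDemo(): Promise<{ container: Blob; split: number }> {
  const stored = new Blob(['hello stored']);
  const deflatedStream = new Blob(['hello deflated'])
    .stream()
    .pipeThrough(new CompressionStream('deflate-raw'));
  const deflated = await new Response(deflatedStream).blob();
  // split marks where the stored segment ends and the deflated one begins.
  return { container: new Blob([stored, deflated]), split: stored.size };
}
```

With `const { container, split } = await buildDemo()`, the stored part is read with `readStored(container, 0, split)` and the compressed part with `readDeflated(container, split, container.size)`.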


Here's a preview of how it could integrate into your codebase:

Code for writing zip files before:

vscode-vsce/src/package.ts

Lines 1825 to 1858 in 2aeafb2

function writeVsix(files: IFile[], packagePath: string): Promise<void> {
	return fs.promises
		.unlink(packagePath)
		.catch(err => (err.code !== 'ENOENT' ? Promise.reject(err) : Promise.resolve(null)))
		.then(
			() =>
				new Promise((c, e) => {
					const zip = new yazl.ZipFile();
					const zipOptions: Partial<yazl.Options> = {};
					// reproducible zip files
					const sde = process.env.SOURCE_DATE_EPOCH;
					if (sde) {
						const epoch = parseInt(sde);
						zipOptions.mtime = new Date(epoch * 1000);
						files = files.sort((a, b) => a.path.localeCompare(b.path))
					}
					files.forEach(f =>
						isInMemoryFile(f)
							? zip.addBuffer(typeof f.contents === 'string' ? Buffer.from(f.contents, 'utf8') : f.contents, f.path, { ...zipOptions, mode: f.mode })
							: zip.addFile(f.localPath, f.path, { ...zipOptions, mode: f.mode })
					);
					zip.end();
					const zipStream = fs.createWriteStream(packagePath);
					zip.outputStream.pipe(zipStream);
					zip.outputStream.once('error', e);
					zipStream.once('error', e);
					zipStream.once('finish', () => c());
				})
		);
}

What it could look like when writing zip files with this library:

// import * as yazl from 'yazl';
import ZipWriter from 'zip-go/lib/write.js'

function writeVsix(files: IFile[], packagePath: string): Promise<void> {
	return fs.promises
		.unlink(packagePath)
		.catch(err => (err.code !== 'ENOENT' ? Promise.reject(err) : Promise.resolve(null)))
		.then(async () => {
			// reproducible zip files
			const sde = process.env.SOURCE_DATE_EPOCH;
			files = sde ? files.sort((a, b) => a.path.localeCompare(b.path)) : files;

			const fileStream = ReadableStream.from((async function* () {
				const lastModified = sde ? parseInt(sde) * 1000 : Date.now();

				for (let file of files) {
					if ('contents' in file) {
						yield new File([file.contents], file.path, { lastModified });
					} else {
						const blob = await fs.openAsBlob(file.localPath);
						yield new File([blob], file.path, { lastModified });
					}
				}
			})());

			await fs.promises.writeFile(
				packagePath,
				fileStream.pipeThrough(new ZipWriter())
			);
		})
}
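One detail worth pinning down in the snippet above is the timestamp conversion: `SOURCE_DATE_EPOCH` is expressed in seconds, while `File`'s `lastModified` is in milliseconds. Below is a small sketch of a stricter version of that step. `resolveLastModified` and `sortByPath` are hypothetical helper names, not part of zip-go or vsce, and rejecting malformed values (instead of silently producing `NaN`) is an assumption about the desired behavior.

```typescript
/**
 * Hypothetical helper: turn SOURCE_DATE_EPOCH (seconds) into a lastModified
 * value (milliseconds), falling back to the supplied clock when it is unset.
 */
function resolveLastModified(sde: string | undefined, now: () => number = Date.now): number {
  if (sde === undefined || sde === '') {
    return now();
  }
  // Accept plain decimal digits only, so values like "12abc" or "1e3" are rejected.
  if (!/^\d+$/.test(sde)) {
    throw new Error(`Invalid SOURCE_DATE_EPOCH: ${sde}`);
  }
  return Number(sde) * 1000;
}

/** Sort a copy by path so entry order is stable without mutating the caller's array. */
function sortByPath<T extends { path: string }>(files: readonly T[]): T[] {
  return [...files].sort((a, b) => a.path.localeCompare(b.path));
}
```

In the generator above this could replace the inline `parseInt` call, for example `const lastModified = resolveLastModified(process.env.SOURCE_DATE_EPOCH);`.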

Code for reading zip files before:

vscode-vsce/src/zip.ts

Lines 8 to 48 in 2aeafb2

async function bufferStream(stream: Readable): Promise<Buffer> {
	return await new Promise((c, e) => {
		const buffers: Buffer[] = [];
		stream.on('data', buffer => buffers.push(buffer));
		stream.once('error', e);
		stream.once('end', () => c(Buffer.concat(buffers)));
	});
}

export async function readZip(packagePath: string, filter: (name: string) => boolean): Promise<Map<string, Buffer>> {
	const zipfile = await new Promise<ZipFile>((c, e) =>
		open(packagePath, { lazyEntries: true }, (err, zipfile) => (err ? e(err) : c(zipfile!)))
	);
	return await new Promise((c, e) => {
		const result = new Map<string, Buffer>();
		zipfile.once('close', () => c(result));
		zipfile.readEntry();
		zipfile.on('entry', (entry: Entry) => {
			const name = entry.fileName.toLowerCase();
			if (filter(name)) {
				zipfile.openReadStream(entry, (err, stream) => {
					if (err) {
						zipfile.close();
						return e(err);
					}
					bufferStream(stream!).then(buffer => {
						result.set(name, buffer);
						zipfile.readEntry();
					});
				});
			} else {
				zipfile.readEntry();
			}
		});
	});
}

What it could look like when reading zip files with this library:

import { openAsBlob } from 'node:fs';
import zipReader from 'zip-go/lib/read.js';

export async function readZip(packagePath: string, filter: (name: string) => boolean): Promise<Map<string, Buffer>> {
	const zipFile = await openAsBlob(packagePath);
	const result = new Map<string, Buffer>();

	for await (const entry of zipReader(zipFile)) {
		const name = entry.name.toLowerCase();
		if (filter(name)) {
			const bytes = await entry.arrayBuffer();
			result.set(name, Buffer.from(bytes));
		}
	}

	return result;
}

This approach introduces minimal breaking changes to the codebase. However, I'd personally recommend this alternative:

const result = new Map<string, FileLike>();

for await (const entry of zipReader(zipFile)) {
	const name = entry.name.toLowerCase();
	if (filter(name)) {
		result.set(name, entry);
	}
}

// Later, you can use:
const entry = result.get(path)

// To get a true native File object (rather than a file-like object):
await entry.file()

// All of these methods exist on Response, Request, File, and Blob, making it very flexible on an entry as well:
await entry.text()
await entry.bytes()
await entry.arrayBuffer()
entry.stream().pipeTo(...)
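Because these reading methods are shared, consuming code can be written against a small structural type instead of a concrete class. A minimal sketch, where `TextSource` and `readJson` are hypothetical names and plain `Blob`, `File`, and `Response` objects stand in for a zip entry (it assumes a runtime with a global `File`, such as browsers or Node.js 20+):

```typescript
/** Hypothetical structural type: anything exposing a Promise-returning text(). */
interface TextSource {
  text(): Promise<string>;
}

/** Parse JSON from any source that can produce its contents as text. */
async function readJson(source: TextSource): Promise<unknown> {
  return JSON.parse(await source.text());
}

// Three different standard classes, one code path.
const payload = '{"name":"demo","version":"1.0.0"}';
const sources: TextSource[] = [
  new Blob([payload]),
  new File([payload], 'package.json'),
  new Response(payload),
];
```

Anything that satisfies `TextSource`, including an entry object exposing `text()` as shown above, could be passed to `readJson` without conversion.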

I'd also suggest this improvement:

- function writeVsix(files: IFile[], packagePath: string): Promise<void> {
+ function writeVsix(files: File[], packagePath: string): Promise<void> {

Instead of creating in-memory files with { contents: "..." }, you'd create actual File objects using new File([contents], path, { ... }), or call openAsBlob for files on disk, before calling writeVsix.
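A sketch of that conversion for in-memory entries, using only the standard `File` constructor. `InMemoryEntry` and `toFile` are hypothetical names, and the shape mirrors the `path` and `contents` fields used in the snippets above rather than vsce's actual `IFile` type. It assumes a runtime with a global `File` (browsers or Node.js 20+).

```typescript
/** Hypothetical shape mirroring the in-memory files above (string contents only, for brevity). */
interface InMemoryEntry {
  path: string;
  contents: string;
}

/** Wrap in-memory contents in a standard File; the entry path is passed as the File's name. */
function toFile(entry: InMemoryEntry, lastModified: number): File {
  return new File([entry.contents], entry.path, { lastModified });
}

// Example: a manifest entry with a fixed timestamp (milliseconds since the Unix epoch).
const manifest = toFile(
  { path: 'extension/package.json', contents: '{"name":"demo"}' },
  1700000000000,
);
```

The caller would then build a `File[]` up front and hand it to `writeVsix`, so the zip code itself never has to distinguish in-memory entries from files on disk.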


Interested in benchmarks?
Check out the performance comparisons here:
https://github.com/jimmywarting/zip-benchmark.js

And here's a browser benchmark as well:
https://jimmywarting.github.io/zip-benchmark.js/browser-benchmark.html

(Note: yauzl shows significant performance penalties when running in non-Node.js environments)

Another reason I think my zip library is a better fit is that it needs no network, disk, or other I/O permissions, which makes it very sandbox-friendly: you give the library the data it needs and it gives you results back. I don't think libraries should have any I/O access at all; only the application should have that privilege.
