
[kokoro-js] Bump @huggingface/transformers to ^4.2.0 #320

Open
shreyaskarnik wants to merge 1 commit into hexgrad:main from shreyaskarnik:bump-transformers-v4


@shreyaskarnik commented May 7, 2026

Summary

Bumps the declared @huggingface/transformers range in kokoro.js/package.json from ^3.5.1 to ^4.2.0.

Motivation

When a downstream app depends on both kokoro-js and @huggingface/transformers directly, capping kokoro-js's range at 3.x forces npm to install two copies of @huggingface/transformers whenever the app pins ^4.x. Each copy ships its own onnxruntime-node, and loading two ORT native modules into the same Node process segfaults: ORT is not designed to be initialized twice in one process, and the duplicate global state corrupts the runtime.

We hit this in production. Until the range is bumped, consumers have to work around it with this entry in their own package.json:

"overrides": {
  "kokoro-js": {
    "@huggingface/transformers": "$@huggingface/transformers"
  }
}

Bumping the declared range removes the need for any consumer to write that override.
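
To check whether an app is affected, one option is to count the distinct versions of @huggingface/transformers in the dependency tree. The sketch below assumes the input is the object printed by `npm ls @huggingface/transformers --json` and that it has the usual nested `dependencies` shape. `collectVersions` and `exampleTree` are hypothetical names, and the kokoro-js version in the example is a placeholder.

```javascript
// Collect every distinct version of `pkgName` found in an `npm ls --json` tree.
// Assumption: each node looks like { version, dependencies: { name: node } }.
function collectVersions(tree, pkgName, found = new Set()) {
  const deps = (tree && tree.dependencies) || {};
  for (const [name, node] of Object.entries(deps)) {
    if (name === pkgName && node.version) found.add(node.version);
    collectVersions(node, pkgName, found); // descend into nested installs
  }
  return found;
}

// Hypothetical tree: the app pins ^4.x while kokoro-js still declares ^3.5.1.
const exampleTree = {
  name: "my-app",
  dependencies: {
    "@huggingface/transformers": { version: "4.2.0" },
    "kokoro-js": {
      version: "0.0.0-example", // placeholder, not a real release
      dependencies: {
        "@huggingface/transformers": { version: "3.5.1" },
      },
    },
  },
};

const versions = [...collectVersions(exampleTree, "@huggingface/transformers")].sort();
console.log(
  versions.length > 1
    ? `duplicate copies: ${versions.join(", ")}`
    : "single copy"
);
```

More than one version in the result means two copies are installed, which is the condition that leads to two onnxruntime-node modules in one process.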

API surface verified

kokoro-js consumes { env, StyleTextToSpeech2Model, AutoTokenizer, Tensor, RawAudio } and writes through the internal env.backends.onnx.wasm.wasmPaths path used by the env shim. All exports and the internal path are present in v4.2.0.
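
A check like this can be scripted. The following is a minimal sketch, not the check that was actually run for this PR: `missingExports` and `hasPath` are hypothetical helpers, and the export names are the ones listed above. It tests for the parent object `env.backends.onnx.wasm`, on the assumption that `wasmPaths` itself may be unset until kokoro-js writes it.

```javascript
// Export names from the PR description; the helpers are illustrative only.
const REQUIRED_EXPORTS = ["env", "StyleTextToSpeech2Model", "AutoTokenizer", "Tensor", "RawAudio"];
// Parent of wasmPaths: it must exist for the assignment to succeed.
const REQUIRED_PATH = ["env", "backends", "onnx", "wasm"];

// Return the names from `names` that `mod` does not export.
function missingExports(mod, names = REQUIRED_EXPORTS) {
  return names.filter((name) => mod[name] === undefined);
}

// Return true if every key in `path` can be followed from `obj`.
function hasPath(obj, path) {
  let cur = obj;
  for (const key of path) {
    const walkable = cur !== null && (typeof cur === "object" || typeof cur === "function");
    if (!walkable || !(key in cur)) return false;
    cur = cur[key];
  }
  return true;
}

// Usage, with the package installed:
//   const transformers = await import("@huggingface/transformers");
//   console.log(missingExports(transformers));          // expect an empty array
//   console.log(hasPath(transformers, REQUIRED_PATH));  // expect true
```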

Verification

npm test: 276/276 passing on v4.2.0 (Node v20.20.2, darwin-arm64).

A separate fp32-CPU benchmark (cold-load, per-utterance latency / RTF, streaming TTFA, peak RSS, on onnx-community/Kokoro-82M-v1.0-ONNX, 1 warmup + 3 measured runs each):

| Metric | v3.5.1 | v4.2.0 | Δ |
| --- | --- | --- | --- |
| Cold load | 3475 ms | 3455 ms | ~flat |
| Short (37 ch), wall / RTF | 447 ms / 5.99x | 442 ms / 6.05x | ~flat |
| Medium (164 ch), wall / RTF | 1721 ms / 6.03x | 1731 ms / 5.99x | ~flat |
| Long (516 ch), wall / RTF | 4621 ms / 5.87x | 4609 ms / 5.89x | ~flat |
| Streaming TTFA | 464 ms | 480 ms | +3% (1 sample) |
| Peak RSS | 1832 MB | 1948 MB | +6% |
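
For readers reproducing the numbers, here is a sketch of how the derived columns can be computed. It assumes RTF here is a speed multiplier (seconds of audio produced per second of wall time), which is what the "x" suffix suggests. The function names and the sample count and sample rate in the RTF example are hypothetical and not taken from the benchmark.

```javascript
// Real-time factor as a speed multiplier: audio seconds per wall-clock second.
function realtimeFactor(numSamples, sampleRate, wallMs) {
  const audioSeconds = numSamples / sampleRate;
  return audioSeconds / (wallMs / 1000);
}

// Relative change from `before` to `after`, in percent.
function percentDelta(before, after) {
  return ((after - before) / before) * 100;
}

// Deltas for the two rows that are not flat, using the table's values.
console.log(`Peak RSS: +${percentDelta(1832, 1948).toFixed(1)}%`);
console.log(`Streaming TTFA: +${percentDelta(464, 480).toFixed(1)}%`);

// Hypothetical input: 48000 samples at 24000 Hz generated in 500 ms of wall time.
console.log(`RTF: ${realtimeFactor(48000, 24000, 500).toFixed(2)}x`);
```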

Compute is flat (kokoro's hot path is one ONNX inference call dispatched by onnxruntime-node, not by transformers.js itself; v4's headline 4x BERT-MHA win doesn't apply to StyleTextToSpeech2). RSS is ~6% higher, expected from v4's larger feature set (modular models, WebGPU-in-Node, ModelRegistry). Audio output length is byte-identical.

Test plan

  • cd kokoro.js && npm install && npm test — expect 276/276
  • Run the README's basic and streaming examples on Node — audio.save produces a valid wav

Bumps the declared transformers.js range from ^3.5.1 to ^4.2.0.

Motivation: when an app depends on both kokoro-js and @huggingface/transformers
directly, having the kokoro-js range cap at 3.x forces npm to install two
copies of @huggingface/transformers when the app pins ^4.x. Each copy ships
its own onnxruntime-node, and loading two ORT native modules into the same
Node process causes segfaults (ORT is not designed to be initialized twice in
one process; the duplicate global state corrupts the runtime). We observed
this in production and it disappeared once the two were unified via an
"overrides" workaround in package.json. Bumping the declared range removes
the need for downstream consumers to write that override.

API surface: kokoro-js consumes { env, StyleTextToSpeech2Model, AutoTokenizer,
Tensor, RawAudio } plus the env.backends.onnx.wasm.wasmPaths internal path
used by the env shim. All of these still work in v4.2.0.

Verification: npm test passes (276 / 276) on v4.2.0.
@shreyaskarnik (Author) commented:

cc @xenova
