chore(deps): bump tokenizers from 0.20.4 to 0.23.1 #97
Merged
Bumps [tokenizers](https://github.com/huggingface/tokenizers) from 0.20.4 to 0.23.1.
- [Release notes](https://github.com/huggingface/tokenizers/releases)
- [Changelog](https://github.com/huggingface/tokenizers/blob/main/RELEASE.md)
- [Commits](huggingface/tokenizers@v0.20.4...v0.23.1)

---
updated-dependencies:
- dependency-name: tokenizers
  dependency-version: 0.23.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
b6ed9fc to 54d0d45
tokenizers 0.21+ changed two APIs the source touched:

- Tokenizer::add_special_tokens now takes the tokens by value (impl IntoIterator<Item = AddedToken>) and returns Result<usize>. The TTS loader now consumes the local Vec and propagates the error via TtsError::Tokenizer instead of passing &Vec and discarding the return.
- WordLevelBuilder::vocab now requires ahash::AHashMap, not std HashMap. The decode-loop test fixture builds its tiny WordLevel tokenizer from a HuggingFace tokenizer JSON string via Tokenizer::from_str instead, which is behaviour-identical and adds no new dependency.

encode(input, add_special_tokens) and decode(ids, skip_special_tokens) signatures and defaults are unchanged across 0.20->0.23, so all encode/decode call sites keep their existing add-special / skip-special semantics.
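
The first change (tokens passed by value, result propagated) can be sketched without pulling in the crate. Everything below is a std-only stand-in, not the real tokenizers API: `MockTokenizer`, `TokenizerError`, the simplified `AddedToken`, the helper `register_special_tokens`, and the rule that an empty token is rejected are all hypothetical and exist only to mirror the signature shape described in the commit message. Only the `TtsError::Tokenizer` variant name comes from that message, and its `String` payload is an assumption.

```rust
use std::fmt;

// Simplified stand-in for tokenizers::AddedToken (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct AddedToken {
    pub content: String,
    pub special: bool,
}

// Stand-in for the tokenizer-side error type (hypothetical).
#[derive(Debug)]
pub struct TokenizerError(pub String);

impl fmt::Display for TokenizerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "tokenizer error: {}", self.0)
    }
}

// Loader-side error; the String payload is an assumption.
#[derive(Debug)]
pub enum TtsError {
    Tokenizer(String),
}

// Stand-in tokenizer that only tracks which special tokens it has seen.
#[derive(Default)]
pub struct MockTokenizer {
    specials: Vec<String>,
}

impl MockTokenizer {
    // Mirrors the shape described above: owned tokens in, Result<usize> out.
    // Rejecting empty tokens is invented here so the error path can be shown.
    pub fn add_special_tokens<I>(&mut self, tokens: I) -> Result<usize, TokenizerError>
    where
        I: IntoIterator<Item = AddedToken>,
    {
        let mut added = 0;
        for token in tokens {
            if token.content.is_empty() {
                return Err(TokenizerError("empty special token".to_string()));
            }
            if !self.specials.contains(&token.content) {
                self.specials.push(token.content);
                added += 1;
            }
        }
        Ok(added)
    }
}

// Loader-style call site: build a local Vec, hand it over by value,
// and surface failures instead of discarding the return value.
pub fn register_special_tokens(
    tokenizer: &mut MockTokenizer,
    names: &[&str],
) -> Result<usize, TtsError> {
    let tokens: Vec<AddedToken> = names
        .iter()
        .map(|name| AddedToken { content: name.to_string(), special: true })
        .collect();

    // Old style (per the commit message): pass &tokens and ignore the result.
    // New style: move the Vec in and map the error into the loader's type.
    tokenizer
        .add_special_tokens(tokens)
        .map_err(|e| TtsError::Tokenizer(e.to_string()))
}
```

Calling `register_special_tokens(&mut tok, &["<bos>", "<eos>"])` on a fresh `MockTokenizer` yields `Ok(2)`. Because the `Vec` is moved into the call, it cannot be reused afterwards, which is why the loader consumes its local `Vec` rather than borrowing it.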
Bumps tokenizers from 0.20.4 to 0.23.1.
Release notes
Sourced from tokenizers's releases.
... (truncated)
Commits
- 7f1623b Bump version to 0.23.1
- bbe43ad ci: release workflow fixes (node + python) (#2043)
- ab0c5d8 Fix node release (#2034)
- decd8e0 bindings/python: free-threaded Python (3.14t) support (#2041)
- 3992692 update for release (#2033)
- bcdd25b BPE cache: per-thread read-through cache to avoid RwLock atomics on hits (#2028)
- 618eb38 Bump follow-redirects in /tokenizers/examples/unstable_wasm/www (#2024)
- b6b1688 chore: bump doc-builder SHA for PR upload workflow (#2025)
- 19015d6 fix: use uvx --with cairosvg instead of uv pip install --system (#2021)
- efbcc68 Ci benchmarks (#2019)