Add method to return attn-mask for HF Tokenizer. by cptspacemanspiff · Pull Request #60 · mlc-ai/tokenizers-cpp

cptspacemanspiff · 2025-02-11T23:06:12Z

I have been running batched inference with models that use the HF tokenizer. The library is great, but it does not expose attention masks, which are useful when some of the inputs/outputs are padding.

This adds an additional tokenize method to the HF Tokenizer that returns both the token_ids and the attention masks.

It depends on my previous pull request, #57, which separates the HF C++ header declarations from the implementations.
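For readers unfamiliar with why a mask matters in batched inference, here is a minimal, self-contained sketch of the idea. It is not the API added in this PR: `PaddedBatch`, `PadBatch`, and the `pad_id` parameter are hypothetical names used only for illustration, and the sketch assumes right-padding to the longest sequence in the batch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical result type for illustration; not the struct used by this PR.
struct PaddedBatch {
  std::vector<std::vector<int32_t>> token_ids;       // padded to a common length
  std::vector<std::vector<int32_t>> attention_mask;  // 1 = real token, 0 = padding
};

// Right-pad every sequence to the longest length in the batch and record
// which positions hold real tokens. pad_id is an assumed padding token id.
PaddedBatch PadBatch(const std::vector<std::vector<int32_t>>& batch, int32_t pad_id) {
  std::size_t max_len = 0;
  for (const auto& seq : batch) max_len = std::max(max_len, seq.size());

  PaddedBatch out;
  out.token_ids.reserve(batch.size());
  out.attention_mask.reserve(batch.size());
  for (const auto& seq : batch) {
    std::vector<int32_t> ids(seq);
    ids.resize(max_len, pad_id);  // append pad_id until the row is max_len long

    std::vector<int32_t> mask(max_len, 0);
    for (std::size_t i = 0; i < seq.size(); ++i) mask[i] = 1;  // mark real tokens

    out.token_ids.push_back(std::move(ids));
    out.attention_mask.push_back(std::move(mask));
  }
  return out;
}
```

With this sketch, padding the batch `{{5, 6, 7}, {8}}` with `pad_id = 0` gives token rows `{5, 6, 7}` and `{8, 0, 0}` and mask rows `{1, 1, 1}` and `{1, 0, 0}`, so a model can ignore the two padded positions in the second row.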

cptspacemanspiff added 5 commits January 25, 2025 15:26

Add HFTokenizerHeader

2e4f353

Moved the hf tokenizer defs to the header.

434e6b2

Added factories to HFTokenizer, Tokenizer factories call them.

6be4671

Added additional api for attn with masks.

81b1ca7

fix bug.

48de0e8

cptspacemanspiff wants to merge 5 commits into mlc-ai:main from cptspacemanspiff:return-attn-mask-from-hftokenizer
