
Skip or speed up lexer preprocessing #115

@teoremma

Description

Currently, when trying to run constrained decoding with a new grammar, we get the following message:

Creating DFA mask store for LlamaTokenizerFast and custom, may take more than 10 minutes.

This holds true even for the smallest grammar examples.

Sadly, this makes experimenting with and debugging grammars quite cumbersome, because any modification results in a cache miss and triggers this expensive preprocessing.
Additionally, we are working in a setup where the grammar is modified on the fly between interactions with the LLM, so 10 minutes is prohibitively expensive.

Although the decoding performance might be affected, is there a way to skip this preprocessing?
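For reference, here is a minimal sketch of the kind of setup that hits this, assuming the Syncode entry point and its infer method, and that a Lark grammar string can be passed directly; the model name and grammar below are illustrative, not our actual ones:

from syncode import Syncode

# Illustrative tiny custom grammar (Lark syntax). Even a grammar this small
# triggers the full DFA mask store construction on first use.
custom_grammar = """
    start: "yes" | "no"
"""

# Any edit to the grammar string changes the cache key, so the expensive
# preprocessing reruns for the same tokenizer after every modification.
llm = Syncode(
    model="meta-llama/Llama-2-7b-chat-hf",  # a LlamaTokenizerFast-based model
    grammar=custom_grammar,
)

print(llm.infer("Is the sky blue? Answer yes or no: "))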
