Problem
The new LiteRT-LM backends support Gemma 4 chat, thinking output, and tool-call parsing, but strict structured output is not supported yet. Today the LiteRT-LM native and web backends report supportsGrammarConstraints => false, the engine skips template grammars for those backends, and LiteRT-LM generation rejects GenerationParams.grammar, grammarLazy, grammarTriggers, and grammarRoot.
This means responseFormat / JSON schema requests can work with the llama.cpp backend through grammar-constrained decoding, but the LiteRT-LM backend can only rely on prompt/template behavior and post-generation parsing.
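The gating described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the Dart code in the repository: `Backend`, `GenerationParams`, `validate_generation_params`, and `template_grammar_for` are hypothetical stand-ins for the identifiers named in this issue, and the error text is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Grammar-related fields named in this issue, in snake_case for the sketch.
GRAMMAR_FIELDS = ("grammar", "grammar_lazy", "grammar_triggers", "grammar_root")


@dataclass
class GenerationParams:
    # Hypothetical stand-in for llamadart's GenerationParams.
    grammar: Optional[str] = None
    grammar_lazy: bool = False
    grammar_triggers: tuple = ()
    grammar_root: Optional[str] = None


@dataclass
class Backend:
    # Hypothetical stand-in for a backend and its capability flag.
    name: str
    supports_grammar_constraints: bool


def template_grammar_for(backend: Backend, template_grammar: Optional[str]) -> Optional[str]:
    """Engine side: drop the template grammar when the backend cannot enforce it."""
    return template_grammar if backend.supports_grammar_constraints else None


def validate_generation_params(backend: Backend, params: GenerationParams) -> None:
    """Backend side: reject any grammar-related field that is set."""
    if backend.supports_grammar_constraints:
        return
    set_fields = [f for f in GRAMMAR_FIELDS if getattr(params, f)]
    if set_fields:
        raise ValueError(
            f"{backend.name} does not support grammar constraints: "
            + ", ".join(set_fields)
        )
```

With this shape, a request carrying a grammar fails early on LiteRT-LM instead of silently producing unconstrained output, which is the behavior this issue proposes to replace with real constraint support.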
Current evidence
- LiteRtLmBackend.supportsGrammarConstraints and LiteRtLmBackendWeb.supportsGrammarConstraints are false.
- LiteRtLmService._validateGenerationParams and the web mirror reject grammar-related params.
- The native FFI path currently only binds litert_lm_conversation_config_set_enable_constrained_decoding and sets it to false; there is no Dart API for passing JSON schema, Lark, or other constraint payloads.
- Upstream LiteRT-LM has constrained decoding concepts in ConversationConfig, including LLGuidance-style JSON Schema and Lark constraint support, so this looks like a missing bridge/backend feature rather than an impossible runtime capability.
Relevant upstream references:
Proposed direction
- Extend litert-lm-native to expose constrained decoding configuration through the C ABI.
- Add Dart FFI bindings for constraint provider/configuration.
- Map llamadart responseFormat / JSON schema requests to the LiteRT-LM constraint type when possible.
- Decide how to handle existing llama.cpp GBNF grammar inputs, since they may not translate 1:1 to LiteRT-LM constraints.
- Enable supportsGrammarConstraints only for LiteRT-LM runtimes/platforms where the constrained decoding path is actually available.
- Add unit tests and real-model E2E/smoke coverage that proves invalid structured output is prevented, not only parsed after generation.
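One possible shape for the request mapping, sketched in Python for illustration only. Everything here is an assumption: `Constraint`, `UnsupportedConstraintError`, and `constraint_from_request` are hypothetical names, the nested `json_schema.schema` layout of the request is assumed rather than taken from llamadart, and rejecting GBNF outright is just one of the options the GBNF bullet leaves open.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Constraint:
    # Hypothetical payload that would be handed across the FFI boundary.
    kind: str      # "json_schema" here; "lark" would be another candidate
    payload: str   # serialized constraint text


class UnsupportedConstraintError(ValueError):
    """Raised when a request cannot be expressed as a LiteRT-LM constraint."""


def constraint_from_request(
    response_format: Optional[dict] = None,
    gbnf_grammar: Optional[str] = None,
) -> Optional[Constraint]:
    # Assumption: GBNF is rejected rather than translated, since it may not
    # map 1:1 onto LiteRT-LM constraints.
    if gbnf_grammar is not None:
        raise UnsupportedConstraintError(
            "LiteRT-LM: GBNF grammars are not supported; "
            "use responseFormat with a JSON schema instead"
        )
    if response_format is None:
        return None  # unconstrained generation
    kind = response_format.get("type")
    if kind != "json_schema":
        raise UnsupportedConstraintError(
            f"LiteRT-LM: unsupported responseFormat type {kind!r}"
        )
    # Assumed request layout: {"type": "json_schema", "json_schema": {"schema": {...}}}
    schema = (response_format.get("json_schema") or {}).get("schema")
    if not isinstance(schema, dict):
        raise UnsupportedConstraintError(
            "LiteRT-LM: responseFormat json_schema is missing a schema object"
        )
    return Constraint("json_schema", json.dumps(schema, sort_keys=True))
```

Keeping the mapping as a pure function like this would make the routing and error messages unit-testable without loading a model.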
Acceptance criteria
- responseFormat: { type: json_schema, ... } works with LiteRT-LM native where supported.
- Web behavior is either implemented or explicitly documented as unsupported with a clear error.
- Unsupported grammar modes produce clear backend-specific errors.
- Tests cover native/web routing, error messages, and at least one real-model JSON-schema smoke.
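To make the "prevented, not only parsed after generation" distinction concrete, here is a toy Python decoder. It is not LiteRT-LM or LLGuidance: the vocabulary, the fixed scores, and the four-step token grammar are all made up for illustration. The only point is that a constraint masks candidate tokens before each one is chosen, whereas post-generation parsing can only detect a bad result.

```python
import json

# Toy "model": fixed scores that always prefer a prose token over JSON.
SCORES = {"Sure": 3.0, '{"ok":': 2.0, "true": 1.5, "}": 1.0, "<eos>": 0.5}

# Toy constraint: the set of tokens allowed at each step.
GRAMMAR = [{'{"ok":'}, {"true"}, {"}"}, {"<eos>"}]


def allowed_tokens(generated):
    """Tokens the toy grammar permits after the tokens generated so far."""
    return GRAMMAR[len(generated)]


def decode(allowed_at=None, max_steps=8):
    """Greedy decoding; when allowed_at is given, mask tokens before choosing."""
    out = []
    for _ in range(max_steps):
        scores = SCORES
        if allowed_at is not None:
            allowed = allowed_at(out)
            scores = {t: s for t, s in SCORES.items() if t in allowed}
        token = max(scores, key=scores.get)
        if token == "<eos>":
            break
        out.append(token)
    return "".join(out)


def is_valid_json(text):
    """Post-generation check: can only report failure, not prevent it."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False
```

A real-model smoke test could follow the same pattern: run the same prompt with and without the constraint and assert that the constrained run always parses against the schema, rather than asserting only that a parser accepts one lucky sample.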
Related
Follow-up from PR #167 LiteRT-LM backend support.