
Support strict structured output for LiteRT-LM backends #174

@leehack


Problem

The new LiteRT-LM backends support Gemma 4 chat, thinking output, and tool-call parsing, but strict structured output is not supported yet. Today the LiteRT-LM native and web backends report supportsGrammarConstraints => false, the engine skips template grammars for those backends, and LiteRT-LM generation rejects GenerationParams.grammar, grammarLazy, grammarTriggers, and grammarRoot.

As a result, responseFormat / JSON schema requests are enforced on the llama.cpp backend through grammar-constrained decoding, while the LiteRT-LM backends can only rely on prompt/template behavior and post-generation parsing, which cannot guarantee schema-valid output.

Current evidence

  • LiteRtLmBackend.supportsGrammarConstraints and LiteRtLmBackendWeb.supportsGrammarConstraints are false.
  • LiteRtLmService._validateGenerationParams and the web mirror reject grammar-related params.
  • The native FFI path currently only binds litert_lm_conversation_config_set_enable_constrained_decoding and sets it to false; there is no Dart API for passing JSON schema, Lark, or other constraint payloads.
  • Upstream LiteRT-LM has constrained decoding concepts in ConversationConfig, including LLGuidance-style JSON Schema and Lark constraint support, so this looks like a missing bridge/backend feature rather than an impossible runtime capability.
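
To make the current behavior concrete, here is a rough, language-neutral sketch in Python of the rejection path described in the bullets above. It is not the project's Dart code: the class and function names are hypothetical, and only the four parameter names and the capability flag come from this issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GenerationParams:
    # Field names mirror the ones listed in this issue; defaults are assumed.
    grammar: Optional[str] = None
    grammar_lazy: bool = False
    grammar_triggers: List[str] = field(default_factory=list)
    grammar_root: Optional[str] = None


class UnsupportedParamError(ValueError):
    """Hypothetical error type for params a backend cannot honor."""


def rejected_grammar_fields(params: GenerationParams) -> List[str]:
    """Return the names of grammar-related fields that the caller set."""
    rejected = []
    if params.grammar is not None:
        rejected.append("grammar")
    if params.grammar_lazy:
        rejected.append("grammarLazy")
    if params.grammar_triggers:
        rejected.append("grammarTriggers")
    if params.grammar_root is not None:
        rejected.append("grammarRoot")
    return rejected


def validate_generation_params(
    params: GenerationParams, supports_grammar_constraints: bool
) -> None:
    """Raise if grammar params are set on a backend without constraint support."""
    if supports_grammar_constraints:
        return
    rejected = rejected_grammar_fields(params)
    if rejected:
        raise UnsupportedParamError(
            "LiteRT-LM backend does not support: " + ", ".join(rejected)
        )
```

With the flag set to false, any request carrying a grammar fails up front instead of silently generating unconstrained text, which is the behavior this issue proposes to replace with a real constraint path.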

Relevant upstream references:

Proposed direction

  1. Extend litert-lm-native to expose constrained decoding configuration through the C ABI.
  2. Add Dart FFI bindings for constraint provider/configuration.
  3. Map llamadart responseFormat / JSON schema requests to the LiteRT-LM constraint type when possible.
  4. Decide how to handle existing llama.cpp GBNF grammar inputs, since they may not translate 1:1 to LiteRT-LM constraints.
  5. Enable supportsGrammarConstraints only for LiteRT-LM runtimes/platforms where the constrained decoding path is actually available.
  6. Add unit tests and real-model E2E/smoke coverage that proves invalid structured output is prevented, not only parsed after generation.
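
Steps 3 and 4 could look roughly like the following sketch, again in Python rather than Dart. Everything here is an assumption made for illustration: the Constraint type, the "json_schema" tag, the error type, and the OpenAI-style request shape (response_format["json_schema"]["schema"]). Rejecting GBNF input outright is only one possible answer to step 4, not a decision made in this issue.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Constraint:
    kind: str     # hypothetical tag, e.g. "json_schema" or "lark"
    payload: str  # serialized constraint handed to the native layer


class UnsupportedConstraintError(ValueError):
    """Hypothetical backend-specific error for constraints that cannot be mapped."""


def constraint_from_request(
    response_format: Optional[dict] = None,
    gbnf_grammar: Optional[str] = None,
    backend: str = "litert-lm",
) -> Optional[Constraint]:
    """Map a request to a backend constraint, or None if unconstrained."""
    if gbnf_grammar is not None:
        # One option for step 4: refuse rather than attempt a lossy translation.
        raise UnsupportedConstraintError(
            f"{backend}: GBNF grammars are not supported; "
            "use responseFormat with a JSON schema instead"
        )
    if response_format is None:
        return None
    kind = response_format.get("type")
    if kind == "text":
        return None
    if kind == "json_schema":
        schema = response_format.get("json_schema", {}).get("schema")
        if not isinstance(schema, dict):
            raise UnsupportedConstraintError(
                f"{backend}: json_schema response format is missing a schema object"
            )
        # sort_keys gives a stable payload, which helps caching and tests.
        return Constraint("json_schema", json.dumps(schema, sort_keys=True))
    raise UnsupportedConstraintError(
        f"{backend}: unsupported response format type {kind!r}"
    )
```

Keeping the mapping in one pure function means the native and web backends can share it and differ only in whether they accept the resulting constraint.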

Acceptance criteria

  • responseFormat: { type: json_schema, ... } works with LiteRT-LM native where supported.
  • Web behavior is either implemented or explicitly documented as unsupported with a clear error.
  • Unsupported grammar modes produce clear backend-specific errors.
  • Tests cover native/web routing, error messages, and at least one real-model JSON-schema smoke test.
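
For the smoke test, one way to check that output is constrained rather than repaired afterwards is to validate the raw model text directly, with no extraction or cleanup step. The helper below is a minimal Python sketch that covers only a small subset of JSON Schema (type, required, properties); the function names are hypothetical, and a real test would more likely use a full validator.

```python
import json

# Subset of JSON Schema type names mapped to Python types.
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
    "null": type(None),
}


def conforms(raw_output: str, schema: dict) -> bool:
    """True only if the raw text is valid JSON that matches the schema subset."""
    try:
        value = json.loads(raw_output)
    except json.JSONDecodeError:
        # Any prose around the JSON counts as a failure: nothing is stripped.
        return False
    return _check(value, schema)


def _check(value, schema: dict) -> bool:
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(value, bool) and expected in ("integer", "number"):
            return False
        # Simplification: a value like 1.0 is not accepted as "integer" here.
        if not isinstance(value, _TYPES[expected]):
            return False
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                return False
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value and not _check(value[key], sub_schema):
                return False
    return True
```

A real-model smoke test could then assert conforms(raw_text, schema) over several generations, so that any leading prose or missing required field fails the test.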

Related

Follow-up from PR #167 (LiteRT-LM backend support).


Labels

  • enhancement: New feature or request
  • litert-lm-parity: LiteRT-LM backend parity features
  • structured-output: JSON schema, grammar, and constrained output work
