Support strict structured output for LiteRT-LM backends #174

## Problem

The new LiteRT-LM backends support Gemma 4 chat, thinking output, and tool-call parsing, but strict structured output is not supported yet. Today the LiteRT-LM native and web backends report `supportsGrammarConstraints => false`, the engine skips template grammars for those backends, and LiteRT-LM generation rejects `GenerationParams.grammar`, `grammarLazy`, `grammarTriggers`, and `grammarRoot`.

This means `responseFormat` / JSON schema requests can be enforced on the llama.cpp backend through grammar-constrained decoding, but the LiteRT-LM backends can only rely on prompt/template behavior and post-generation parsing, which can detect invalid output but cannot prevent it.
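To illustrate why post-generation parsing is weaker than constrained decoding, here is a minimal sketch of a parse-after-the-fact check. The helper name `tryParseJsonObject` is hypothetical and not part of llamadart. It can only report that the output was invalid once generation has already finished.

```dart
import 'dart:convert';

/// Hypothetical helper: returns the decoded JSON object, or null when the
/// raw model output is not a valid JSON object.
Map<String, Object?>? tryParseJsonObject(String raw) {
  try {
    final decoded = jsonDecode(raw);
    // Arrays, strings, and numbers are valid JSON but not objects.
    return decoded is Map<String, Object?> ? decoded : null;
  } on FormatException {
    // Truncated or malformed output is only detected here, after decoding
    // has already spent the full generation budget.
    return null;
  }
}
```

With this approach a truncated response such as `{"name": "Ada"` is rejected only after the fact, whereas grammar-constrained decoding would not allow the invalid token sequence to be sampled at all.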

## Current evidence

- `LiteRtLmBackend.supportsGrammarConstraints` and `LiteRtLmBackendWeb.supportsGrammarConstraints` are false.
- `LiteRtLmService._validateGenerationParams` and the web mirror reject grammar-related params.
- The native FFI path currently only binds `litert_lm_conversation_config_set_enable_constrained_decoding` and sets it to `false`; there is no Dart API for passing JSON schema, Lark, or other constraint payloads.
- Upstream LiteRT-LM has constrained decoding concepts in `ConversationConfig`, including LLGuidance-style JSON Schema and Lark constraint support, so this looks like a missing bridge/backend feature rather than an impossible runtime capability.

Relevant upstream references:
- https://raw.githubusercontent.com/google-ai-edge/LiteRT-LM/main/runtime/conversation/conversation.h
- https://raw.githubusercontent.com/google-ai-edge/LiteRT-LM/main/runtime/components/constrained_decoding/llg_constraint_config.h

## Proposed direction

1. Extend `litert-lm-native` to expose constrained decoding configuration through the C ABI.
2. Add Dart FFI bindings for constraint provider/configuration.
3. Map llamadart `responseFormat` / JSON schema requests to the LiteRT-LM constraint type when possible.
4. Decide how to handle existing llama.cpp GBNF grammar inputs, since they may not translate 1:1 to LiteRT-LM constraints.
5. Enable `supportsGrammarConstraints` only for LiteRT-LM runtimes/platforms where the constrained decoding path is actually available.
6. Add unit tests and real-model E2E/smoke coverage that proves invalid structured output is prevented, not only parsed after generation.
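Steps 3 and 4 could be sketched roughly as follows. This is an illustration only: the constraint classes, the function names, and the assumed OpenAI-style `responseFormat` shape (`json_schema.schema`) are all hypothetical and do not reflect an existing llamadart or LiteRT-LM API. Rejecting GBNF outright is just one of the options step 4 leaves open. The sketch requires Dart 3 for `sealed` classes.

```dart
import 'dart:convert';

/// Hypothetical constraint payloads that a future Dart API might pass
/// across the FFI boundary. Not an existing llamadart type.
sealed class LiteRtLmConstraint {
  const LiteRtLmConstraint();
}

final class JsonSchemaConstraint extends LiteRtLmConstraint {
  const JsonSchemaConstraint(this.schemaJson);

  /// The JSON Schema serialized as a string.
  final String schemaJson;
}

final class LarkConstraint extends LiteRtLmConstraint {
  const LarkConstraint(this.grammar);

  /// A Lark grammar source string.
  final String grammar;
}

/// Maps a `responseFormat` map to a constraint, or returns null when no
/// structured output was requested.
///
/// Assumption: the schema lives under `json_schema.schema`.
LiteRtLmConstraint? constraintForResponseFormat(
  Map<String, Object?>? responseFormat,
) {
  if (responseFormat == null) return null;
  final type = responseFormat['type'];
  if (type != 'json_schema') {
    throw UnsupportedError(
      'LiteRT-LM backend: unsupported responseFormat type "$type"',
    );
  }
  final wrapper = responseFormat['json_schema'];
  final schema = wrapper is Map ? wrapper['schema'] : null;
  if (schema is! Map) {
    throw ArgumentError.value(
      responseFormat,
      'responseFormat',
      'json_schema.schema must be a JSON object',
    );
  }
  return JsonSchemaConstraint(jsonEncode(schema));
}

/// One possible answer to step 4: refuse GBNF input instead of attempting
/// a lossy translation to a LiteRT-LM constraint.
Never rejectGbnfGrammar() => throw UnsupportedError(
      'LiteRT-LM backend: GBNF grammars are not supported; '
      'use responseFormat with a JSON schema instead',
    );
```

Keeping the constraint as a sealed hierarchy lets the FFI layer switch exhaustively over the payload kinds, so adding a new constraint type later produces compile-time errors wherever it is not yet handled.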

## Acceptance criteria

- `responseFormat: { type: json_schema, ... }` works with LiteRT-LM native where supported.
- Web behavior is either implemented or explicitly documented as unsupported with a clear error.
- Unsupported grammar modes produce clear backend-specific errors.
- Tests cover native/web routing, error messages, and at least one real-model JSON-schema smoke test.
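The "clear backend-specific errors" criterion could look something like the sketch below. Both functions and their parameters are hypothetical stand-ins; the real `GenerationParams` field types may differ, and the message wording is only a suggestion.

```dart
/// Hypothetical helper: lists which grammar-related parameters a request
/// actually set. Field types are assumed, not taken from llamadart.
List<String> requestedGrammarFields({
  String? grammar,
  bool grammarLazy = false,
  List<String> grammarTriggers = const [],
  String? grammarRoot,
}) {
  return [
    if (grammar != null) 'grammar',
    if (grammarLazy) 'grammarLazy',
    if (grammarTriggers.isNotEmpty) 'grammarTriggers',
    if (grammarRoot != null) 'grammarRoot',
  ];
}

/// Throws an error that names the backend and the offending parameters
/// when the backend cannot enforce grammar constraints.
void ensureGrammarSupported({
  required String backendName,
  required bool supportsGrammarConstraints,
  required List<String> requestedFields,
}) {
  if (supportsGrammarConstraints || requestedFields.isEmpty) return;
  throw UnsupportedError(
    '$backendName does not support grammar-constrained decoding; '
    'unsupported parameters: ${requestedFields.join(', ')}',
  );
}
```

Naming the backend and the exact parameters in the message lets a caller distinguish "this runtime cannot constrain decoding" from a malformed request, which is what the routing and error-message tests would assert on.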

## Related

Follow-up to PR #167 (LiteRT-LM backend support).