Custom LLM Backend

Synthdocs ships with MistralBackend and OpenAIBackend, but you can implement your own backend to use any LLM provider.

The Backend Interface

Subclass the LLMBackend abstract base class from synthdocs.llm.base and implement generate():

from synthdocs.llm.base import LLMBackend
from pydantic import BaseModel

class MyBackend(LLMBackend):
    def __init__(self, model: str = "my-model", temperature: float = 0.2) -> None:
        super().__init__(model=model, temperature=temperature)
        # Initialize your client here

    def generate(
        self,
        prompt: str,
        response_model: type[BaseModel] | None = None,
        temperature: float | None = None,
    ) -> str | BaseModel:
        effective_temp = temperature if temperature is not None else self.temperature

        if response_model is not None:
            # Structured output mode: return a validated Pydantic model instance.
            # Use your provider's structured output feature, or parse JSON manually.
            # _call_llm_json_mode is a placeholder for your provider's JSON-mode call.
            raw_json = self._call_llm_json_mode(prompt, effective_temp)
            return response_model.model_validate_json(raw_json)

        # Text mode: return the raw string.
        # _call_llm_text_mode is a placeholder for your provider's plain-text call.
        return self._call_llm_text_mode(prompt, effective_temp)

Key Requirements

  1. Text mode (response_model=None): return a plain str
  2. Structured mode (response_model provided): return a validated Pydantic model instance, not raw JSON text
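
For example, the same generate() call returns a different type depending on whether a response model is supplied (Invoice is a hypothetical schema used purely for illustration):

from pydantic import BaseModel

class Invoice(BaseModel):  # hypothetical schema, illustration only
    vendor: str
    total: float

backend = MyBackend()

summary = backend.generate("Summarize this filing.")  # text mode -> str

invoice = backend.generate(
    "Extract the vendor and total from this invoice.",
    response_model=Invoice,
)  # structured mode -> validated Invoice instance
assert isinstance(invoice, Invoice)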

For structured output, you can either:

  • Use your provider's native structured output / JSON schema feature (preferred)
  • Request JSON and parse with response_model.model_validate_json(raw_json)
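
If your provider has no native structured output mode, a minimal sketch of the manual approach is below. It embeds the model's JSON schema in the prompt and retries once on validation failure; the helper name and retry loop are illustrative, not part of the LLMBackend contract:

from pydantic import BaseModel, ValidationError

def _generate_structured(
    self,
    prompt: str,
    response_model: type[BaseModel],
    temperature: float,
    max_attempts: int = 2,
) -> BaseModel:
    # Steer the model toward JSON that matches the schema.
    json_prompt = (
        f"{prompt}\n\nRespond with a single JSON object matching this schema:\n"
        f"{response_model.model_json_schema()}"
    )
    last_error: ValidationError | None = None
    for _ in range(max_attempts):
        raw = self._call_llm_text_mode(json_prompt, temperature)  # placeholder provider call
        try:
            # Parses and validates in one step; raises ValidationError on bad output
            return response_model.model_validate_json(raw)
        except ValidationError as exc:
            last_error = exc  # malformed or non-conforming JSON: retry
    raise last_error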

Using Your Backend

Pass your backend to the generation functions:

from pathlib import Path

from synthdocs import generate_document, generate_case_batch

# Single document
result = generate_document(case_input, backend=MyBackend())

# Batch generation
results = generate_case_batch(
    template=my_template,
    count=5,
    backend=MyBackend(model="my-large-model", temperature=0.3),
    output_dir=Path("./output"),
)

Evaluating the Judge with Your Backend

If you want to use your custom backend as the judge for fact-location evaluation, you have two options:

1. Run the Judge Benchmark (Recommended)

Sanity-check your backend against the bundled labeled dataset:

from synthdocs.eval.judge_benchmarks import (
    load_fact_locations_judge_benchmark,
    run_fact_locations_judge_benchmark,
)

items = load_fact_locations_judge_benchmark()  # bundled JSONL

result = run_fact_locations_judge_benchmark(
    backend=MyBackend(),
    items=items,
    tightness_threshold=3,
)

print(f"Entailment accuracy: {result['summary']['entailed_accuracy']:.3f}")
print(f"Span pass agreement: {result['summary']['pass_agreement']:.3f}")

This runs your backend through the same judge prompts used in evaluation and compares its responses against human-labeled expected outputs. Look for:

  • Entailment accuracy > 0.9: does your model correctly identify when facts are present?
  • Span pass agreement: does the judge's pass/fail verdict (entailed AND span minimality score >= threshold) align with the labels? This is a proxy for whether extracted spans are reasonably minimal and usable as citations. Warning: we are still testing this metric.
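
In pseudocode, the pass criterion compared against the human labels is just the following (a sketch of the rule described above, not synthdocs source):

def judge_passes(entailed: bool, tightness_score: int, tightness_threshold: int = 3) -> bool:
    # A span counts as a pass only when the fact is entailed AND
    # the span is tight enough to serve as a citation.
    return entailed and tightness_score >= tightness_threshold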

2. End-to-End Evaluation with Your Backend as Judge

Use your backend in the full evaluation pipeline:

from pathlib import Path

from synthdocs.eval.fact_location import (
    FactLocationJudgeConfig,
    run_fact_location_batch_eval,
)

judge_config = FactLocationJudgeConfig(
    backend=MyBackend(temperature=0.1),
    model="my-model",
    temperature=0.1,
    context_chars=120,
    enabled=True,
)

summary = run_fact_location_batch_eval(
    target=Path("output/"),
    run_id="my-run",
    judge_config=judge_config,
)

Note: The CLI (synthdocs eval fact-locations) currently only supports OpenAI and Mistral backends. For custom backends, use the Python API directly.

Reference: Existing Backends

For implementation examples, see:

  • src/synthdocs/llm/openai.py — uses OpenAI's responses.parse() for structured output
  • src/synthdocs/llm/mistral.py — uses response_format: json_schema for structured output