Skip to content

Latest commit

 

History

History
296 lines (224 loc) · 17.4 KB

File metadata and controls

296 lines (224 loc) · 17.4 KB

API Reference

This file documents the public API provided by the modelito package.

Package exports

The package exposes a small, stable set of helpers and a Client-first chat surface. The primary exports (also visible via from modelito import *) are:

  • __version__ — package version string.
  • count_tokens(text: str) -> int — estimate token count (uses tiktoken if available).
  • estimate_remote_timeout(model_name: Optional[str], input_tokens: int = 2048, concurrency: int = 1, with_source: bool = False) -> int | Tuple[int, Dict[str, Any]] — conservative timeout estimator. When with_source=True the function returns a (timeout_seconds, details_dict) tuple with diagnostic metadata.
  • estimate_remote_timeout_details(model_name: Optional[str], input_tokens: int = 2048, concurrency: int = 1) -> Tuple[int, Dict[str, Any]] — diagnostic timeout estimator returning both timeout and computation details.
  • OllamaConnector — small conversation history manager and prompt builder. Connectors now prefer typed Message/Response dataclasses and provide both sync (complete) and async (acomplete) surfaces in addition to the legacy send_sync helper.
  • Embedder — small embeddings-only runtime wrapper that mirrors the provider-selection behavior of Client for callers that only need vector embeddings.
  • Client — primary application entry point with chat(), chat_json(), chat_parsed(), stream(), and provider auto-selection support.
  • Provider, SyncProvider, AsyncProvider, StreamingProvider, EmbeddingProvider, ChatProvider, RawChatProvider — structural provider protocols for legacy, chat-first, and raw OpenAI-compatible code.
  • Message, Response, MessageInput, OpenAIMessageDict — message and response dataclasses / type helpers.
  • ProviderStatus, check_provider_ready(), format_provider_status() — readiness diagnostics helpers for local and hosted providers.
  • OpenAICompatibleHTTPProvider — shared HTTP base class for local OpenAI-compatible runtimes.
  • OllamaProvider — HTTP-aware provider that will call a local Ollama HTTP API when available (via the bundled ollama_service helpers). If the HTTP API is not reachable it will attempt the Ollama CLI as a best-effort fallback (using run_ollama_command) before exposing a safe deterministic summarize() fallback useful for tests.
  • OpenAIProvider — SDK-backed hosted OpenAI provider; can also target hosted OpenAI-compatible APIs via base_url.
  • OMLXProvider — thin preset for local oMLX runtimes, built on OpenAICompatibleHTTPProvider.
  • GeminiProvider, GrokProvider, ClaudeProvider — minimal provider shims with the legacy list_models() / summarize() surface.
  • EmbeddingProvider — structural protocol for provider implementations that expose embed(texts, **kwargs).
  • embed_texts(texts, dim=8) -> List[List[float]] and StubEmbeddingProvider — deterministic test-friendly embedding helpers.
  • normalize_models(raw) -> List[Dict[str, Any]] — normalize provider model listings into dictionaries with an id field.
  • normalize_metadata(raw) -> Dict[str, Any] — normalize provider metadata into a plain dictionary, wrapping scalar values when needed.
  • load_config(path: str) -> dict — JSON/YAML loader for small config files.
  • load_config_data(*paths) -> dict — merge multiple config files with later paths taking precedence; performs a deep merge of nested dicts and supports JSON/YAML parsing.
  • parse_host_port(host_url: str) -> Tuple[str, int] — parse host:port or URL into (host, port).
  • LLMProviderError — base exception used by connector/provider helpers.
  • Ollama helpers: server_is_up, endpoint_url, ensure_ollama_running, get_ollama_binary, install_ollama, start_ollama, stop_ollama, update_ollama, list_local_models, list_remote_models, download_model, delete_model, serve_model, change_ollama_config, run_ollama_command, etc.

Key classes and functions

count_tokens(text: str) -> int : Returns an estimated token count. If tiktoken is installed it uses a real encoding; otherwise a conservative heuristic is used.

estimate_remote_timeout(model_name: Optional[str], input_tokens: int = 2048, concurrency: int = 1, with_source: bool = False) -> int | Tuple[int, Dict[str, Any]] : Returns an integer number of seconds to use as a conservative request timeout for remote LLM calls. Reads a small catalog shipped in modelito/data and applies family/keyword multipliers when present. For a diagnostic breakdown use estimate_remote_timeout_details(...); a CLI wrapper is available at modelito.timeout_cli, and a small calibration harness lives at modelito.timeout_calibrate.

You can request diagnostic details about how a timeout was computed by calling the function with with_source=True. This returns a tuple (timeout_seconds, source_dict) where source_dict contains the matched catalog band, any model overrides, multipliers and other metadata useful for debugging and calibrating timeouts. Example:

from modelito import estimate_remote_timeout

timeout, info = estimate_remote_timeout("llama-2-70b", input_tokens=1000, concurrency=1, with_source=True)
print(timeout)
print(info)

OllamaConnector(provider, shared_history: bool = False, system_message_file: Optional[str] = None, max_history_messages: int = 20, max_history_tokens: Optional[int] = None) : Lightweight stateful connector that manages per-conversation histories and prepares messages lists suitable for provider .summarize() calls.

Important OllamaConnector methods

  • clear_history(conv_id: Optional[str] = None) -> None
  • set_system_message(text: Optional[str]) -> None
  • add_to_history(conv_id: Optional[str], role: str, content: str) -> None
  • get_history(conv_id: Optional[str]) -> List[Dict[str, str]]
  • build_prompt(conv_id: Optional[str], new_messages: Optional[List[Dict[str, str]]]=None, include_history: bool=True, max_prompt_tokens: Optional[int]=None) -> List[Dict[str,str]]
  • send_sync(conv_id: Optional[str], new_messages: List[Dict[str,str]], settings: Optional[dict]=None) -> str — convenience helper that builds the prompt, calls provider.summarize(messages, settings=settings) and updates local history (returns str).
  • complete(conv_id: Optional[str], new_messages: Optional[Iterable]=None, settings: Optional[dict]=None) -> Response — typed convenience wrapper returning a Response dataclass.
  • acomplete(conv_id: Optional[str], new_messages: Optional[Iterable]=None, settings: Optional[dict]=None) -> Response — asynchronous variant.

Provider shims

Provider adapters implement the small provider surfaces used by the connectors. Implementations may choose to support the sync summarize() surface, the async acomplete() surface, streaming, chat(), raw OpenAI-compatible passthrough, and/or embeddings. The core convenience methods are:

  • list_models() -> List[str] — best-effort model enumeration (may be an empty list in offline mode).
  • summarize(messages, settings: Optional[dict] = None) -> str — synchronous completion surface.
  • acomplete(messages, settings: Optional[dict] = None) -> str — asynchronous completion surface (optional).
  • stream(messages, settings: Optional[dict] = None) -> Iterable[str] — streaming generator (optional).
  • chat(messages, settings: Optional[dict] = None) -> Response — structured response surface with metadata when supported.
  • raw_complete(payload: dict[str, Any]) -> dict[str, Any] — OpenAI-compatible completion passthrough that preserves tool calls and arbitrary fields when supported.
  • raw_stream(payload: dict[str, Any]) -> Iterable[dict[str, Any]] — OpenAI-compatible streaming passthrough that yields parsed JSON chunks when supported.
  • embed(texts: Iterable[str], **kwargs) -> List[List[float]] — embeddings surface (optional).

Embeddings-only wrapper

Embedder(provider: str | EmbeddingProvider = "openai", model: Optional[str] = None, **kwargs) : Small embeddings-only runtime selector. It resolves a named embedder from modelito.provider_registry and exposes the narrow embed() surface.

Important Embedder methods and attributes:

  • embed(texts: Iterable[str], **kwargs) -> List[List[float]]
  • provider_name -> str
  • available_embedders() -> List[str]

Registry helpers:

  • from modelito.provider_registry import get_embedder, list_embedders

Example:

from modelito import Embedder

embedder = Embedder(provider="mock")
vectors = embedder.embed(["one", "two"])
print(vectors)
print(Embedder.available_embedders())

Streaming semantics

Providers may stream outputs at different granularities; modelito normalizes these into a simple incremental stream() generator that yields str pieces. Typical provider streaming shapes:

  • Token-level: SDKs may provide token deltas. Modelito yields these as short text fragments suitable for concatenation.
  • Chunk-level: Providers that emit logical chunks or JSON events are parsed and the textual payload is yielded as chunks.
  • Line-delimited / SSE: HTTP services (e.g., Ollama /api/generate) may send newline-delimited JSON/SSE frames; modelito reads and normalizes these to textual chunks.

The stream(messages, settings=None) generator returns an iterable of str fragments which, when concatenated, form the final response. Offline fallbacks emit a single full-text chunk.

Structured output helpers

Client.chat_json(messages, schema=None, settings=None, strict_schema=False) -> dict : Request structured JSON output and return a parsed dict; optionally apply key-presence schema checks and stricter runtime validation when strict_schema=True.

Client.chat_parsed(messages, schema, settings=None, strict_schema=True) -> Any : Request structured JSON output and return a parsed schema object when supported (dataclass or Pydantic-style model hooks).

Ollama helpers

The ollama_service module contains a number of small helpers to interact with the Ollama CLI and HTTP API. The most commonly used helpers are:

  • endpoint_url(host: str, port: int, path: str = "/api/generate") -> str
  • server_is_up(host: str, port: int) -> bool
  • ensure_ollama_running(host: str = "http://127.0.0.1", port: int = 11434, auto_start: bool = False, start_args: Optional[list] = None, timeout: float = 10.0) -> bool
  • get_ollama_binary() -> Optional[str]
  • list_local_models() -> List[str], list_remote_models() -> List[str], and list_remote_model_catalog(query: Optional[str] = None) -> List[RemoteModelCatalogEntry]
  • download_model(model_name: str) -> bool, download_model_progress(model_name: str) -> Iterable[ModelLifecycleState], and delete_model(model_name: str) -> bool
  • serve_model(model_name: Optional[str] = None, start_args: Optional[list] = None, timeout: float = 10.0) -> bool
  • ensure_model_available(model_name: str, allow_download: bool = False, timeout: float = 600.0) -> bool — convenience helper to ensure a model is present locally, optionally downloading it.
  • ensure_model_ready(model_name: str, host: str = "http://127.0.0.1", port: int = 11434, auto_start: bool = False, allow_download: bool = False, timeout: float = 120.0) -> bool — ensure a specific model is downloaded, warmed, and responsive.
  • ensure_model_ready_detailed(model_name: str, host: str = "http://127.0.0.1", port: int = 11434, auto_start: bool = False, allow_download: bool = False, timeout: float = 120.0) -> ReadinessResult — like ensure_model_ready() but returns a structured ReadinessResult object with success status, lifecycle phase, message, source, elapsed_seconds, and error details for cleaner UI integration.
  • get_model_lifecycle_state(model_name: str) -> Optional[ModelLifecycleState], list_model_lifecycle_states() -> Dict[str, ModelLifecycleState], and clear_model_lifecycle_state(model_name: str) -> bool — inspect or reset the in-memory per-model lifecycle tracker.
  • Async wrappers: async_preload_model, async_list_local_models, async_list_remote_models, async_download_model, async_delete_model, async_serve_model, async_ensure_model_available, async_ensure_model_ready, async_ensure_model_ready_detailed — simple asyncio-friendly wrappers that run the synchronous helpers in an executor.
  • change_ollama_config(config: dict, config_path: Optional[str] = None) -> bool

Additional helpers and CLI

The module exposes a few additional convenience helpers and CLI entrypoints useful for diagnostics and local workflows:

  • detect_install_method(platform_name: Optional[str] = None) -> str — pick the preferred install backend (brew, apt, choco, or script-based fallback) for the current platform.
  • pull_model(model_name: str, timeout: float = 600.0) -> bool — convenience wrapper for download_model.
  • preload_model(url: str, port: int, model: str, timeout: float = 120.0) -> None — warm a model via the HTTP API.
  • load_remote_timeout_catalog(path: Optional[Path] = None) -> dict — load the timeout catalog (falls back to the bundled catalog).
  • common_model_timeout(model_name: str) -> Optional[float] — returns a conservative timeout in seconds for a given model.

Platform-specific installer policies

The detect_install_method() and install_ollama() helpers implement a platform-aware preference order to ensure consistent behavior across environments:

  • macOS: brew (if available) → script-based fallback
  • Linux: apt (if available) → script-based fallback
  • Windows: choco (if available) → PowerShell-based fallback

This policy ensures the most commonly available package manager is preferred on each platform. To override and use a specific method, pass the method parameter explicitly to install_ollama(method="script") or detect_install_method() will return the best-effort choice for your platform.

Structured admin helpers

For higher-level tooling, ollama_service now exposes two small dataclasses:

  • RemoteModelCatalogEntry — structured remote catalog item with name, family, tag, installed, and raw fields.
  • ModelLifecycleState — in-memory state snapshot with phase, message, progress, error, and updated_at fields.

CLI usage

modelito exposes two small module-level CLIs useful during development:

  • python -m modelito doctor — diagnose provider readiness and report setup hints.
  • modelito-serve — optional OpenAI-compatible server (/v1/models, /v1/chat/completions, /v1/embeddings) requiring pip install "modelito[serve]".
  • python -m modelito.ollama_service — minimal Ollama lifecycle CLI (start, stop, install, inspect, pull, list-local, list-remote, version).
  • python -m modelito.timeout_cli — print estimated timeouts and diagnostic details for a model.
  • python -m modelito.timeout_calibrate — write calibration prompts and (optionally) exercise a local Ollama server to collect timing samples.

Examples

Use the OllamaConnector together with a provider shim for local tests:

from modelito import OllamaProvider, OllamaConnector
from modelito.messages import Message

provider = OllamaProvider()
conn = OllamaConnector(provider=provider)
resp = conn.send_sync(conv_id="example", new_messages=[Message(role="user", content="Summarize: Hello world")])
print(resp)

Notes

  • The package intentionally keeps provider shims minimal; they are primarily intended for tests and simple local workflows.
  • For production usage you should replace provider shims with real SDK-backed implementations that implement the same list_models() / summarize() surface.
  • Static model metadata in modelito.model_metadata is best-effort fallback data. Prefer provider-reported model information when available.
  • Unknown metadata fields are intentionally represented as None, and static metadata should not be treated as authoritative for safety-critical routing.

Advanced API Features

Unified Provider Abstraction:

  • All providers (OpenAI, Anthropic, Google, Ollama, etc.) accessed via a consistent interface.
  • Runtime provider/model switching: from modelito.provider_registry import get_provider, list_providers.
  • Runtime embedder selection: from modelito.provider_registry import get_embedder, list_embedders.

Local Model Management:

  • Auto-discovery and health checks for local models (Ollama, etc.): LocalModelManager.
  • Dynamic model selection without restart.

API Key Management:

  • Secure, user-friendly API key management: APIKeyManager.
  • Supports environment variable overrides and config files.
  • Validation and error reporting.

Streaming & Partial Results:

  • All streaming-capable providers expose a stream() method for incremental results.
  • See StreamingProvider protocol.

Error Handling & Diagnostics:

  • Standardized error messages and diagnostics: see modelito.errors.
  • Structured error objects for troubleshooting.

Model Capabilities Discovery:

  • Expose model metadata (context window, function/tool support, etc.): get_model_metadata().

Testing & Mocking:

  • Built-in mock mode for testing/CI/offline: MockProvider.

Performance & Caching:

  • Optional in-memory response caching: ResponseCache.
  • Batching utilities for embeddings and batchable operations: batch_iterable.

See the tests/ directory for usage examples and coverage for all features.