LLM Inference for Go


A single interface in Go to get inference from multiple LLM / AI providers using their official SDKs.

Features at a glance

  • Single normalized interface (ProviderSetAPI) for multiple providers. Currently supported: Anthropic Messages, OpenAI Chat Completions, and OpenAI Responses.

  • Normalized data model in spec/:

    • messages (user and assistant; system/developer instructions are provided via ModelParam.SystemPrompt),
    • text, images, and files (no audio/video content types yet),
    • tools (function, custom, built-in tools like web search),
    • reasoning / thinking content,
    • streaming events (text + thinking),
    • usage accounting,
    • output controls (structured output, verbosity/effort) and tool policies (where supported by provider APIs),
    • capabilities + normalization:
      • all feature support per SDK is described by spec.ModelCapabilities (spec/capability.go)
      • default SDK-wide capability profiles live in internal/*/capability.go
      • capabilities are available programmatically via ProviderSetAPI.GetProviderCapability
  • Streaming support:

    • Text streaming for all providers that support it.
    • Reasoning / thinking streaming where the provider exposes it (Anthropic, OpenAI Responses).
  • Client and Server Tools:

    • Client tools are supported via Function Calling.
    • Anthropic server-side web search.
    • OpenAI Responses web search tool.
    • OpenAI Chat Completions web search via web_search_options.
  • HTTP-level debugging:

    • Pluggable CompletionDebugger interface.
    • A built-in, ready-to-use implementation, debugclient.HTTPCompletionDebugger, which:
      • wraps SDK HTTP clients,
      • captures request/response metadata,
      • redacts secrets and sensitive content,
      • attaches a scrubbed debug blob to FetchCompletionResponse.DebugDetails.

Installation

# Go 1.25+
go get github.com/flexigpt/inference-go

Quickstart

Basic pattern:

  1. Create a ProviderSetAPI.
  2. Add one or more providers. Set their API keys.
  3. Send a FetchCompletionRequest.
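
The steps above can be sketched as follows. Only ProviderSetAPI, NewProviderSetAPI, AddProvider, AddProviderConfig, and FetchCompletionRequest appear in this README; the method signatures, the FetchCompletion method name, and the Origin field are assumptions, so consult the package documentation for the real API.

```go
package main

import (
	"context"
	"fmt"
	"log"

	inference "github.com/flexigpt/inference-go"
	"github.com/flexigpt/inference-go/spec"
)

func main() {
	ctx := context.Background()

	// 1. Create a ProviderSetAPI.
	ps, err := inference.NewProviderSetAPI()
	if err != nil {
		log.Fatal(err)
	}

	// 2. Add a provider and configure its API key (the config fields
	//    are described under "Provider configuration"; exact shape assumed).
	if err := ps.AddProvider(ctx, "openai", spec.AddProviderConfig{
		Origin: "https://api.openai.com", // field name assumed from the "origin" key
	}); err != nil {
		log.Fatal(err)
	}

	// 3. Send a FetchCompletionRequest (method name assumed).
	resp, err := ps.FetchCompletion(ctx, "openai", spec.FetchCompletionRequest{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", resp)
}
```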

Examples

Provider configuration

Providers are registered dynamically via ProviderSetAPI.AddProvider, using AddProviderConfig.

Fields:

  • sdkType (spec.ProviderSDKType)
    • providerSDKTypeAnthropicMessages
    • providerSDKTypeOpenAIChatCompletions
    • providerSDKTypeOpenAIResponses
  • origin (string, required)
    • Base URL to the provider (or your gateway / proxy). Example: https://api.openai.com
  • chatCompletionPathPrefix (string, optional)
    • Extra path prefix appended to origin before the SDK adds the endpoint path.
    • Useful when routing through gateways like https://my-gateway.example.com/openai/.
    • If you accidentally include the full endpoint path, the adapter will trim the suffix that the official SDK adds:
      • Anthropic: trims trailing v1/messages
      • OpenAI Chat Completions: trims trailing chat/completions
      • OpenAI Responses: trims trailing responses
  • apiKeyHeaderKey (string, optional)
    • If your gateway expects a non-standard API key header, set it here.
    • The adapters attach this header when it differs from the standard header:
      • Anthropic standard: x-api-key
      • OpenAI standard: Authorization
  • defaultHeaders (map[string]string, optional)
    • Extra headers appended to every request (e.g. gateway routing headers).
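
Putting these fields together, a gateway-routed configuration might look like the sketch below. The Go field and constant names are inferred from the keys documented above and may differ from the actual struct.

```go
package main

import "github.com/flexigpt/inference-go/spec"

func main() {
	// Hypothetical sketch: identifiers inferred from the documented keys.
	cfg := spec.AddProviderConfig{
		SDKType:                  spec.ProviderSDKTypeOpenAIChatCompletions,
		Origin:                   "https://my-gateway.example.com",
		ChatCompletionPathPrefix: "/openai",            // prefix added before the SDK's endpoint path
		APIKeyHeaderKey:          "X-Gateway-Api-Key",  // gateway expects a non-standard key header
		DefaultHeaders: map[string]string{
			"X-Route-Tag": "llm", // extra headers sent with every request
		},
	}
	_ = cfg // pass to ProviderSetAPI.AddProvider
}
```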

Supported providers

Anthropic Messages API

Feature support

Area Supported? Notes
Text input/output yes User and assistant messages mapped to text blocks.
Streaming text yes
Reasoning / thinking yes Thinking and redacted-thinking blocks are supported; redacted thinking is not streamed to the caller. When thinking is enabled, temperature is omitted.
Streaming thinking yes
Output formats yes Text (default) and jsonSchema via ModelParam.OutputParam.format.
Output verbosity / effort yes ModelParam.OutputParam.verbosity maps to Anthropic output_config.effort (low/medium/high/max).
Stop sequences yes ModelParam.StopSequences maps to stop_sequences.
Images (input) yes Inline base64 (imageData) or remote URLs (imageURL) mapped to Anthropic image blocks.
Files / documents (input) yes PDFs only, via base64 or URL. Plain-text base64 and other MIME types are currently ignored.
Audio/Video input/output no
Tools (function/custom) yes JSON Schema based.
Tool policy yes ToolPolicy supported (auto/any/tool/none) + disableParallel.
Tool output content types yes Tool results support text/image/pdf-document blocks (within Anthropic API constraints).
Web search yes Server web search tool use + web search tool-result blocks.
Citations partial URL citations only. Other stateful citations are not mapped.
Metadata / service tiers opaque Not exposed in normalized types; available in debug payload.
Stateful flows no Library focuses on stateless calls only.
Usage data yes Input/Output/Cached. Anthropic does not expose reasoning-token usage.
Refusal output partial No dedicated refusal content item; stop_reason=refusal is surfaced via normalized status.
Max Tokens yes max_tokens is required by the API; this SDK applies a default of 8192.
  • Behavior for conversational + interleaved reasoning message input
    • Input: No reasoning content in the incoming messages.
      • Action: Build the message list unchanged. If the last user message is a tool_result, force thinking disabled; otherwise, honor the requested thinking setting.
    • Input: All reasoning messages are signed.
      • Action: Build the message list unchanged. If the last user message is a tool_result and the previous assistant message begins with thinking content, force thinking enabled; otherwise, honor the requested thinking setting.
    • Input: Mix of reasoning messages where some include a valid signature thinking and others do not.
      • Action: Retain only the reasoning messages with a valid signature; drop the rest. Apply the above behaviors after this cleanup.

OpenAI Responses API

Feature support

Area Supported? Notes
Text input/output yes Input/output messages fully supported.
Streaming text yes
Reasoning / thinking yes Reasoning outputs are mapped. Reasoning inputs are accepted only as encrypted_content; others are dropped.
Streaming thinking yes
Output formats partial Text (default) and jsonSchema via ModelParam.OutputParam.format (mapped to params.Text.format).
Output verbosity yes
Stop sequences no The OpenAI Responses API doesn't support stop sequences (ignored if provided in ModelParam).
Images (input) yes imageData (base64) or imageURL, with detail low/high/auto, mapped to Responses input_image items.
Files / documents (input) yes fileData (base64) or fileURL mapped to Responses input_file items; works for PDFs and other file MIME types.
Audio/Video input/output no
Tools (function/custom) yes JSON Schema based. Note: custom tool definitions are currently emitted as function tools.
Tool policy yes ToolPolicy supported (auto/any/tool/none) + disableParallel.
Tool output content types yes Function/custom tool outputs can carry text/image/file content (data or URL).
Web search yes Calls are mapped when emitted; results typically surface as citations/annotations in text.
Citations yes URL citations mapped to spec.CitationKindURL.
Metadata / service tiers opaque Not exposed in normalized types; available in debug payload.
Stateful flows no Store is explicitly disabled (Store: false).
Usage data yes Input/Output/Cached/Reasoning.
  • Behavior for conversational + interleaved reasoning message input
    • Input: No reasoning messages.
      • Action: Build the message list unchanged. Honor the requested thinking setting.
    • Input: All reasoning messages are encrypted_content.
      • Action: Build the message list unchanged. Honor the requested thinking setting.
    • Input: Mixed reasoning messages: some are signature-based and some are encrypted_content.
      • Action: Keep only the encrypted_content reasoning; drop the signature-based reasoning.

OpenAI Chat Completions API

Feature support

Area Supported? Notes
Text input/output yes Only the first choice from the output is surfaced.
Streaming text yes
Reasoning / thinking yes Reasoning effort config only; no separate reasoning messages in API.
Streaming thinking no Not exposed by Chat Completions.
Output formats yes Text (default) and jsonSchema via ModelParam.OutputParam.format (mapped to response_format).
Output verbosity yes ModelParam.OutputParam.verbosity mapped to verbosity (max maps to high).
Stop sequences yes Supported up to 4 sequences (API limit); errors if more than 4 are provided.
Images (input) yes imageData (base64) and imageURL are both supported; base64 is sent as a data URL with detail low/high/auto.
Files / documents (input) yes fileData (base64) only, sent as a data URL; fileURL and stateful file IDs are not used by this adapter.
Audio/Video input/output no
Tools (function/custom) yes JSON Schema based. Note: custom tool definitions are currently emitted as function tools.
Tool policy yes ToolPolicy supported (auto/any/tool/none) + disableParallel (mapped to parallel_tool_calls=false).
Tool output content types partial Tool outputs are forwarded as tool messages with text only; image/file tool output items are ignored. (API limit)
Web search yes Not a tool-call in this API; configured via top-level web_search_options derived from a webSearch ToolChoice.
Citations yes URL citations mapped from annotations.
Metadata / service tiers opaque Not exposed in normalized types; available in debug payload.
Stateful flows no Library focuses on stateless calls only.
Usage data yes Input/Output/Cached/Reasoning.
System prompt role partial SystemPrompt is sent as developer for OpenAI o* / gpt-5* models; for other models it is sent as system.
  • Behavior for conversational + interleaved reasoning message input
    • Reasoning effort config is kept as is.
    • All reasoning input/output messages are dropped, as the API does not support them.

Model capabilities and normalization

  • This SDK validates and normalizes requests against a capability profile before calling the underlying provider SDK. Key points:

    • The capability schema is spec.ModelCapabilities in spec/capability.go.
    • Each provider adapter has an SDK-wide default capability profile (as a Go struct):
      • Anthropic Messages: internal/anthropicsdk/capability.go
      • OpenAI Chat Completions: internal/openaichatsdk/capability.go
      • OpenAI Responses: internal/openairesponsessdk/capability.go
    • You can access these defaults programmatically via:
      • ProviderSetAPI.GetProviderCapability(ctx, providerName)
  • Recommended: Per-model behavior

    • Real-world feature support varies by model.
    • To enforce per-model differences, pass a spec.ModelCapabilityResolver in FetchCompletionOptions.
      • The resolver can start from the provider’s SDK-wide defaults and override fields as needed.
  • See the runnable example in the repository for the intended end-to-end flow.
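
As a sketch of that flow: start from the provider's SDK-wide defaults and override per-model fields. The resolver signature, the capability field in the comment, and the FetchCompletionOptions field name are assumptions; see spec/capability.go for the real types.

```go
// Hypothetical sketch (assumes ps is an existing *inference.ProviderSetAPI).
resolver := func(ctx context.Context, model string) (spec.ModelCapabilities, error) {
	// Start from the provider's SDK-wide default capability profile...
	caps, err := ps.GetProviderCapability(ctx, "anthropic")
	if err != nil {
		return spec.ModelCapabilities{}, err
	}
	// ...then override per-model differences, e.g.:
	// if model == "claude-3-haiku-20240307" { caps.SomeFeature = false }
	return caps, nil
}
opts := spec.FetchCompletionOptions{ModelCapabilityResolver: resolver}
_ = opts // pass along with the FetchCompletionRequest
```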

HTTP debugging

The library exposes a pluggable CompletionDebugger interface:

type CompletionDebugger interface {
    HTTPClient(base *http.Client) *http.Client
    StartSpan(ctx context.Context, info *spec.CompletionSpanStart) (context.Context, spec.CompletionSpan)
}
  • Package debugclient includes a ready-to-use implementation, HTTPCompletionDebugger, which:

    • wraps the provider SDK’s *http.Client,
    • captures and scrubs:
      • URL, method, headers (with secret redaction),
      • query params,
      • request/response bodies (optional, scrubbed of LLM text and large base64),
      • curl command for reproduction,
    • attaches a structured HTTPDebugState to FetchCompletionResponse.DebugDetails.
    • You can then inspect resp.DebugDetails for a given call, or just rely on slog output.
  • Use it via WithDebugClientBuilder:

ps, _ := inference.NewProviderSetAPI(
    inference.WithDebugClientBuilder(func(p spec.ProviderParam) spec.CompletionDebugger {
        return debugclient.NewHTTPCompletionDebugger(&debugclient.DebugConfig{
            LogToSlog: false,
        })
    }),
)

Notes

  • Stateless focus. The design focuses on stateless request/response interactions:

    • no conversation IDs,
    • no file IDs.
  • Opaque / provider‑specific fields.

    • Many provider‑specific fields (error details, service tiers, cache metadata, full raw responses) are only available through the debug payload, not in the normalized spec types.
    • Some commonly needed parameters may be added to the normalized types over time, as the need arises.
  • Token counting - Normalized Usage reports what the provider exposes:

    • Anthropic: input vs. cached tokens, output tokens.
    • OpenAI: prompt vs. cached tokens, completion tokens, reasoning tokens where available.
  • Heuristic prompt filtering.

    • ModelParam.MaxPromptLength triggers sdkutil.FilterMessagesByTokenCount, which uses a simple heuristic token counter. It is approximate, not an exact tokenizer.
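
For intuition, a heuristic of this kind can be sketched as below. approxTokens is an illustration only, not sdkutil's actual implementation; it uses the common rough rule of about 4 characters per token for English text.

```go
package main

import "fmt"

// approxTokens estimates a token count from character length
// (roughly 4 characters per token), clamped to at least 1.
// Illustrative only: this is not the library's real counter.
func approxTokens(s string) int {
	n := len(s) / 4
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	msg := "Summarize the design of the normalized inference interface."
	fmt.Println("approx tokens:", approxTokens(msg))
}
```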

Development

  • Formatting follows gofumpt and golines via golangci-lint, which is also used for linting. All rules are in .golangci.yml.
  • Useful scripts are defined in taskfile.yml; requires Task.
  • Bug reports and PRs are welcome:
    • Keep the public API (package inference and spec) small and intentional.
    • Avoid leaking provider‑specific types through the public surface; put them under internal/.
    • Please run tests and linters before sending a PR.

License

Copyright (c) 2026 - Present - Pankaj Pipada

All source code in this repository, unless otherwise noted, is licensed under the MIT License. See LICENSE for details.
