LLM Inference for Go


A single interface in Go to get inference from multiple LLM / AI providers using their official SDKs.

Features at a glance

  • Single normalized interface (ProviderSetAPI) for multiple providers. Currently supported: Anthropic Messages, OpenAI Chat Completions, and OpenAI Responses.

  • Normalized data model in spec/:

    • messages (user and assistant; system/developer instructions are provided via ModelParam.SystemPrompt),
    • text, images, and files (no audio/video content types yet),
    • tools (function, custom, built-in tools like web search),
    • reasoning / thinking content,
    • streaming events (text + thinking),
    • usage accounting,
    • output controls (structured output, verbosity/effort) and tool policies (where supported by provider APIs),
    • capabilities + normalization:
      • all feature support per SDK is described by spec.ModelCapabilities (spec/capability.go)
      • default SDK-wide capability profiles live in internal/*/capability.go
      • capabilities are available programmatically via ProviderSetAPI.GetProviderCapability
  • Streaming support:

    • Text streaming for all providers that support it.
    • Reasoning / thinking streaming where the provider exposes it (Anthropic, OpenAI Responses).
  • Client and Server Tools:

    • Client tools are supported via Function Calling.
    • Anthropic server-side web search.
    • OpenAI Responses web search tool.
    • OpenAI Chat Completions web search via web_search_options.
  • HTTP-level debugging:

    • Pluggable CompletionDebugger interface.
    • A built-in, ready-to-use implementation, debugclient.HTTPCompletionDebugger, which:
      • wraps SDK HTTP clients,
      • captures request/response metadata,
      • redacts secrets and sensitive content,
      • attaches a scrubbed debug blob to FetchCompletionResponse.DebugDetails.

Installation

# Go 1.25+
go get github.com/flexigpt/inference-go

Quickstart

Basic pattern:

  1. Create a ProviderSetAPI.
  2. Add one or more providers. Set their API keys.
  3. Send a FetchCompletionRequest.
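
The steps above can be sketched as follows. Only ProviderSetAPI, NewProviderSetAPI, AddProvider, AddProviderConfig, and FetchCompletionRequest appear in this README; the method signatures, the FetchCompletion method name, and the Origin field are assumptions, so consult the package documentation for the real API.

```go
package main

import (
	"context"
	"fmt"
	"log"

	inference "github.com/flexigpt/inference-go"
	"github.com/flexigpt/inference-go/spec"
)

func main() {
	ctx := context.Background()

	// 1. Create a ProviderSetAPI.
	ps, err := inference.NewProviderSetAPI()
	if err != nil {
		log.Fatal(err)
	}

	// 2. Add a provider and configure its API key (the config fields
	//    are described under "Provider configuration"; exact shape assumed).
	if err := ps.AddProvider(ctx, "openai", spec.AddProviderConfig{
		Origin: "https://api.openai.com", // field name assumed from the "origin" key
	}); err != nil {
		log.Fatal(err)
	}

	// 3. Send a FetchCompletionRequest (method name assumed).
	resp, err := ps.FetchCompletion(ctx, "openai", spec.FetchCompletionRequest{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", resp)
}
```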

Examples

Provider configuration

Providers are registered dynamically via ProviderSetAPI.AddProvider, using AddProviderConfig.

Fields:

  • sdkType (spec.ProviderSDKType)
    • providerSDKTypeAnthropicMessages
    • providerSDKTypeOpenAIChatCompletions
    • providerSDKTypeOpenAIResponses
  • origin (string, required)
    • Base URL to the provider (or your gateway / proxy). Example: https://api.openai.com
  • chatCompletionPathPrefix (string, optional)
    • Extra path prefix appended to origin before the SDK adds the endpoint path.
    • Useful when routing through gateways like https://my-gateway.example.com/openai/.
    • If you accidentally include the full endpoint path, the adapter will trim the suffix that the official SDK adds:
      • Anthropic: trims trailing v1/messages
      • OpenAI Chat Completions: trims trailing chat/completions
      • OpenAI Responses: trims trailing responses
  • apiKeyHeaderKey (string, optional)
    • If your gateway expects a non-standard API key header, set it here.
    • The adapters attach this header when it differs from the standard header:
      • Anthropic standard: x-api-key
      • OpenAI standard: Authorization
  • defaultHeaders (map[string]string, optional)
    • Extra headers appended to every request (e.g. gateway routing headers).
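
Putting these fields together, a gateway-routed configuration might look like the sketch below. The Go field and constant names are inferred from the keys documented above and may differ from the actual struct.

```go
package main

import "github.com/flexigpt/inference-go/spec"

func main() {
	// Hypothetical sketch: identifiers inferred from the documented keys.
	cfg := spec.AddProviderConfig{
		SDKType:                  spec.ProviderSDKTypeOpenAIChatCompletions,
		Origin:                   "https://my-gateway.example.com",
		ChatCompletionPathPrefix: "/openai",            // prefix added before the SDK's endpoint path
		APIKeyHeaderKey:          "X-Gateway-Api-Key",  // gateway expects a non-standard key header
		DefaultHeaders: map[string]string{
			"X-Route-Tag": "llm", // extra headers sent with every request
		},
	}
	_ = cfg // pass to ProviderSetAPI.AddProvider
}
```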

Supported providers

Anthropic Messages API

Feature support

Area Supported? Notes
Text input/output yes User and assistant messages mapped to text blocks.
Streaming text yes
Reasoning / thinking yes Thinking and redacted-thinking blocks are supported; redacted thinking is not streamed to the caller. When thinking is enabled, temperature is omitted.
Streaming thinking yes
Output formats yes Text (default) and jsonSchema via ModelParam.OutputParam.format.
Output verbosity / effort yes ModelParam.OutputParam.verbosity maps to Anthropic output_config.effort (low/medium/high/max).
Stop sequences yes ModelParam.StopSequences maps to stop_sequences.
Images (input) yes Inline base64 (imageData) or remote URLs (imageURL) mapped to Anthropic image blocks.
Files / documents (input) yes PDFs only, via base64 or URL. Plain-text base64 and other MIME types are currently ignored.
Audio/Video input/output no
Tools (function/custom) yes JSON Schema based.
Tool policy yes ToolPolicy supported (auto/any/tool/none) + disableParallel.
Tool output content types yes Tool results support text/image/pdf-document blocks (within Anthropic API constraints).
Web search yes Server web search tool use + web search tool-result blocks.
Citations partial URL citations only. Other stateful citations are not mapped.
Metadata / service tiers opaque Not exposed in normalized types; available in debug payload.
Stateful flows no Library focuses on stateless calls only.
Usage data yes Input/Output/Cached. Anthropic does not expose reasoning-token usage.
Refusal output partial No dedicated refusal content item; stop_reason=refusal is surfaced via normalized status.
Max Tokens yes max_tokens is required by the API; this SDK applies a default of 8192.
  • Behavior for conversational + interleaved reasoning message input
    • Input: No reasoning content in the incoming messages.
      • Action: Build the message list unchanged. If the last user message is a tool_result, force thinking disabled; otherwise, honor the requested thinking setting.
    • Input: All reasoning messages are signed.
      • Action: Build the message list unchanged. If the last user message is a tool_result and the previous assistant message begins with thinking content, force thinking enabled; otherwise, honor the requested thinking setting.
    • Input: Mix of reasoning messages where some include a valid signature thinking and others do not.
      • Action: Retain only the reasoning messages with a valid signature; drop the rest. Apply the above behaviors after this cleanup.

OpenAI Responses API

Feature support

Area Supported? Notes
Text input/output yes Input/output messages fully supported.
Streaming text yes
Reasoning / thinking yes Reasoning outputs are mapped. Reasoning inputs are accepted only as encrypted_content; others are dropped.
Streaming thinking yes
Output formats partial Text (default) and jsonSchema via ModelParam.OutputParam.format (mapped to params.Text.format).
Output verbosity yes
Stop sequences no The OpenAI Responses API doesn't support stop sequences (ignored if provided in ModelParam).
Images (input) yes imageData (base64) or imageURL, with detail low/high/auto, mapped to Responses input_image items.
Files / documents (input) yes fileData (base64) or fileURL mapped to Responses input_file items; works for PDFs and other file MIME types.
Audio/Video input/output no
Tools (function/custom) yes JSON Schema based. Note: custom tool definitions are currently emitted as function tools.
Tool policy yes ToolPolicy supported (auto/any/tool/none) + disableParallel.
Tool output content types yes Function/custom tool outputs can carry text/image/file content (data or URL).
Web search yes Calls are mapped when emitted; results typically surface as citations/annotations in text.
Citations yes URL citations mapped to spec.CitationKindURL.
Metadata / service tiers opaque Not exposed in normalized types; available in debug payload.
Stateful flows no Store is explicitly disabled (Store: false).
Usage data yes Input/Output/Cached/Reasoning.
  • Behavior for conversational + interleaved reasoning message input
    • Input: No reasoning messages.
      • Action: Build the message list unchanged. Honor the requested thinking setting.
    • Input: All reasoning messages are encrypted_content.
      • Action: Build the message list unchanged. Honor the requested thinking setting.
    • Input: Mixed reasoning messages: some are signature-based and some are encrypted_content.
      • Action: Keep only the encrypted_content reasoning; drop the signature-based reasoning.

OpenAI Chat Completions API

Feature support

Area Supported? Notes
Text input/output yes Only the first choice from the output is surfaced.
Streaming text yes
Reasoning / thinking yes Reasoning effort config only; no separate reasoning messages in API.
Streaming thinking no Not exposed by Chat Completions.
Output formats yes Text (default) and jsonSchema via ModelParam.OutputParam.format (mapped to response_format).
Output verbosity yes ModelParam.OutputParam.verbosity mapped to verbosity (max maps to high).
Stop sequences yes Supported up to 4 sequences (API limit); errors if more than 4 are provided.
Images (input) yes imageData (base64) and imageURL are both supported; base64 is sent as a data URL with detail low/high/auto.
Files / documents (input) yes fileData (base64) only, sent as a data URL; fileURL and stateful file IDs are not used by this adapter.
Audio/Video input/output no
Tools (function/custom) yes JSON Schema based. Note: custom tool definitions are currently emitted as function tools.
Tool policy yes ToolPolicy supported (auto/any/tool/none) + disableParallel (mapped to parallel_tool_calls=false).
Tool output content types partial Tool outputs are forwarded as tool messages with text only; image/file tool output items are ignored. (API limit)
Web search yes Not a tool-call in this API; configured via top-level web_search_options derived from a webSearch ToolChoice.
Citations yes URL citations mapped from annotations.
Metadata / service tiers opaque Not exposed in normalized types; available in debug payload.
Stateful flows no Library focuses on stateless calls only.
Usage data yes Input/Output/Cached/Reasoning.
System prompt role partial SystemPrompt is sent as developer for OpenAI o* / gpt-5* models; for other models it is sent as system.
  • Behavior for conversational + interleaved reasoning message input
    • Reasoning effort config is kept as is.
    • All reasoning input/output messages are dropped, as the API does not support them.

Model capabilities and normalization

  • This SDK validates and normalizes requests against a capability profile before calling the underlying provider SDK. Key points:

    • The capability schema is spec.ModelCapabilities in spec/capability.go.
    • Each provider adapter has an SDK-wide default capability profile (as a Go struct):
      • Anthropic Messages: internal/anthropicsdk/capability.go
      • OpenAI Chat Completions: internal/openaichatsdk/capability.go
      • OpenAI Responses: internal/openairesponsessdk/capability.go
    • You can access these defaults programmatically via:
      • ProviderSetAPI.GetProviderCapability(ctx, providerName)
  • Recommended: Per-model behavior

    • Real-world feature support varies by model.
    • To enforce per-model differences, pass a spec.ModelCapabilityResolver in FetchCompletionOptions.
      • The resolver can start from the provider’s SDK-wide defaults and override fields as needed.
  • See the runnable example in the repository for the intended end-to-end flow.
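
As a sketch of that flow: start from the provider's SDK-wide defaults and override per-model fields. The resolver signature, the capability field in the comment, and the FetchCompletionOptions field name are assumptions; see spec/capability.go for the real types.

```go
// Hypothetical sketch (assumes ps is an existing *inference.ProviderSetAPI).
resolver := func(ctx context.Context, model string) (spec.ModelCapabilities, error) {
	// Start from the provider's SDK-wide default capability profile...
	caps, err := ps.GetProviderCapability(ctx, "anthropic")
	if err != nil {
		return spec.ModelCapabilities{}, err
	}
	// ...then override per-model differences, e.g.:
	// if model == "claude-3-haiku-20240307" { caps.SomeFeature = false }
	return caps, nil
}
opts := spec.FetchCompletionOptions{ModelCapabilityResolver: resolver}
_ = opts // pass along with the FetchCompletionRequest
```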

HTTP debugging

The library exposes a pluggable CompletionDebugger interface:

type CompletionDebugger interface {
    HTTPClient(base *http.Client) *http.Client
    StartSpan(ctx context.Context, info *spec.CompletionSpanStart) (context.Context, spec.CompletionSpan)
}
  • Package debugclient includes a ready-to-use implementation, HTTPCompletionDebugger, which:

    • wraps the provider SDK’s *http.Client,
    • captures and scrubs:
      • URL, method, headers (with secret redaction),
      • query params,
      • request/response bodies (optional, scrubbed of LLM text and large base64),
      • curl command for reproduction,
    • attaches a structured HTTPDebugState to FetchCompletionResponse.DebugDetails.
    • You can then inspect resp.DebugDetails for a given call, or just rely on slog output.
  • Use it via WithDebugClientBuilder:

ps, _ := inference.NewProviderSetAPI(
    inference.WithDebugClientBuilder(func(p spec.ProviderParam) spec.CompletionDebugger {
        return debugclient.NewHTTPCompletionDebugger(&debugclient.DebugConfig{
            LogToSlog: false,
        })
    }),
)

Notes

  • Stateless focus. The design focuses on stateless request/response interactions:

    • no conversation IDs,
    • no file IDs.
  • Opaque / provider‑specific fields.

    • Many provider‑specific fields (error details, service tiers, cache metadata, full raw responses) are only available through the debug payload, not in the normalized spec types.
    • Some commonly needed parameters may be added to the normalized types over time, as the need arises.
  • Token counting - Normalized Usage reports what the provider exposes:

    • Anthropic: input vs. cached tokens, output tokens.
    • OpenAI: prompt vs. cached tokens, completion tokens, reasoning tokens where available.
  • Heuristic prompt filtering.

    • ModelParam.MaxPromptLength triggers sdkutil.FilterMessagesByTokenCount, which uses a simple heuristic token counter. It is approximate, not an exact tokenizer.
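
For intuition, a heuristic of this kind can be sketched as below. approxTokens is an illustration only, not sdkutil's actual implementation; it uses the common rough rule of about 4 characters per token for English text.

```go
package main

import "fmt"

// approxTokens estimates a token count from character length
// (roughly 4 characters per token), clamped to at least 1.
// Illustrative only: this is not the library's real counter.
func approxTokens(s string) int {
	n := len(s) / 4
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	msg := "Summarize the design of the normalized inference interface."
	fmt.Println("approx tokens:", approxTokens(msg))
}
```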

Development

  • Formatting follows gofumpt and golines via golangci-lint, which is also used for linting. All rules are in .golangci.yml.
  • Useful scripts are defined in taskfile.yml; requires Task.
  • Bug reports and PRs are welcome:
    • Keep the public API (package inference and spec) small and intentional.
    • Avoid leaking provider‑specific types through the public surface; put them under internal/.
    • Please run tests and linters before sending a PR.

License

Copyright (c) 2026 - Present - Pankaj Pipada

All source code in this repository, unless otherwise noted, is licensed under the MIT License. See LICENSE for details.
