Skip to content

umutkeltek/switchback

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

296 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Switchback — a self-hosted AI gateway in Rust

Switchback

CI License: Elastic-2.0 Release Docker

One Rust binary for explainable AI provider routing.

Switchback is a self-hosted AI gateway for teams running LLM traffic across multiple providers and accounts. Point your existing OpenAI- or Anthropic-compatible clients at it and keep your app code unchanged.

It normalizes each request into a typed canonical IR, picks a provider + account using policy, health, cost, latency, quotas, and credential availability, then returns the response in the caller's original wire format. Every request emits an inspectable RouteDecision — selected target, rejected candidates with reasons, fallback order, scores, and a trace id.

# same client, different base URL — no code change
curl http://localhost:8765/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"mock/echo","messages":[{"role":"user","content":"hi"}]}'

Reach for Switchback when you need provider choice, credential control, budgets, fallback, and metadata-only observability without buying a hosted AI gateway or running a Python proxy.

Quickstart (60 seconds, no API keys)

# zero-setup mock config — serves immediately
cargo run -p sb-server -- serve --config config/quickstart.yaml      # or: docker run -p 8765:8765 ghcr.io/umutkeltek/switchback:latest

curl -s localhost:8765/health
curl -s localhost:8765/v1/chat/completions -H 'content-type: application/json' \
  -d '{"model":"mock/echo","messages":[{"role":"user","content":"hi"}]}'
open http://localhost:8765/        # the embedded dashboard

→ Full 5-minute walkthrough (tenant key, routing, fallback, cost cap, traces): QUICKSTART.md. Deploying for a team: OPERATIONS.md.

Native coding clients

Codex can use Switchback through the OpenAI Responses surface (/v1/responses), and Claude Code can use it through the Anthropic Messages surface (/v1/messages plus /v1/messages/count_tokens). The client auth files do not become the source of truth: provider credentials and account selection stay in Switchback config/vault/tenants. GET /v1/client-profiles reports the active Codex and Claude Code readiness, endpoint shape, visible models, and non-secret account sources for operators and LLMs.

switchback init --native-clients --config switchback.yaml
switchback serve --config switchback.yaml
open http://127.0.0.1:8765/

The native-client starter runs immediately on mock/echo, with explicit codex and claude-code profiles you can later repoint to real OpenAI/Anthropic provider accounts. Native token-source adapters are available as explicit account auth: kind: codex_oauth reads CODEX_ACCESS_TOKEN or ${HOME}/.codex/auth.json; kind: claude_code_oauth reads CLAUDE_CODE_OAUTH_TOKEN or claudeAiOauth.accessToken from ${HOME}/.claude/.credentials.json. These are direct token-source adapters, not first-party subscription relay. The planned relay provider kinds (codex_native_relay, claude_code_native_relay) fail closed until audited native wire fixtures and adapters exist.

Add a real provider — config, not code

An OpenAI-shaped provider is pure config; a non-bearer one is also config.

# switchback.yaml
providers:
  - id: openrouter
    type: openai_compatible
    base_url: "https://openrouter.ai/api/v1"
    accounts:
      - { id: main, auth: { kind: api_key, key_env: OPENROUTER_API_KEY } }
routes:
  - name: default
    match: { model: "*" }
    targets: ["openrouter/anthropic/claude-3.5-sonnet", "anthropic/claude-3.5-sonnet"]
switchback provider add openrouter --config switchback.yaml   # scaffolds the above
switchback route-preview --config switchback.yaml --model default --json   # see the decision before serving
switchback provider certify --config switchback.yaml          # is the provider actually live?

Presets exist for openai, openrouter, anthropic, gemini, deepseek, groq, mistral, together, fireworks, cerebras, xai, nvidia, ollama, vllm. Real recipes: PROVIDER_SETUP.md.

See the decision

route-preview (and every response's x-switchback-route header) returns the actual decision — not a black box:

{
  "request_id": "req_2a69eea4750c472c",
  "strategy": "ordered_fallback",
  "selected":  { "target_id": "openrouter/anthropic/claude-3.5-sonnet" },
  "fallbacks": [
    { "target_id": "anthropic/claude-3.5-sonnet" }
  ],
  "rejected": [
    { "target_id": "openai/gpt-4o",        "reason": "blended price over max_price ceiling" },
    { "target_id": "bedrock/claude-sonnet", "reason": "no healthy accounts" }
  ],
  "reason": ["route=default", "stream_required=true", "tools_required=false"],
  "unverified": false
}

Each candidate also carries a scores block (cost / latency / ttft / health / account_availability …) so the ordering is auditable.

What you get

  • Route across providers and accounts — OpenAI Chat & Responses, Anthropic Messages, Gemini/Vertex, AWS Bedrock; stream and non-stream; two-level fallback (accounts within a provider, then across providers).
  • Explain every decision — every request emits a RouteDecision; fallback is legal only before the first streamed byte.
  • Control credentials locally — mixed account auth (API key, OAuth refresh, native Codex/Claude Code OAuth tokens, service account, SigV4), fill-first/round-robin selection, per-account lockouts, an age-encrypted vault (key in the OS keychain), de-duplicated OAuth refresh.
  • Enforce budgets and quotas — global/per-provider spend caps; per-tenant budget_usd (→ 402) and max_concurrency (→ 429), rejected before dispatch.
  • Trace without storing prompts — metadata-only traces, durable session rollups, route replay diffs, and an append-only usage ledger; optional OpenTelemetry/Langfuse export.
  • Operate safely — provider certification, hot-reload with revision pinning, circuit breaker, admission control, and an embedded dashboard at /.

More — egress proxies, RTK tool-result compression, schema downleveling, the declarative /cp/v1 control plane, two-tier (Rust + Wasm) plugins, durable SQLite state — is in ARCHITECTURE.md.

Why Switchback?

You need Switchback's bias
A self-hosted team gateway One Rust binary; no hosted dependency, no Python runtime
Existing clients to keep working OpenAI/Anthropic-compatible ingress; no app rewrite
Provider/account control Local age-encrypted vault + per-account fallback
Debuggable routing Every request emits an inspectable RouteDecision
Confidence before cutover provider certify checks a provider is live first
Observability without prompt risk Metadata-only traces, sessions, replay diffs, Langfuse/OTel, and usage

Compatibility

Surface Status
OpenAI Chat Completions / Responses supported
Anthropic Messages (+ count_tokens) supported
Gemini / Vertex supported
AWS Bedrock (SigV4 + event-stream) supported
Streaming (SSE) supported, one code path
Tools / structured output supported, with provider-specific schema limits
Multimodal (images, audio) rejected at ingress — not silently downleveled

What it is — and isn't (yet)

Switchback is a single-binary team gateway, built so it can grow toward hosted scale without a rewrite. The hosted machinery is intentionally not built:

  • Single-host coordination, not a hosted cluster. The durable store is SQLite (WAL) — great for local and single-host/team use. Cross-process coordination on a shared SQLite file is possible only where filesystem locking is known and tested; it is not a hosted multi-node cluster backend (that's Postgres + Redis/etcd territory).
  • Usage is internal accounting, not billing. Accurate cost attribution + a billing_grade honesty flag — not provider-invoice reconciliation, pricing-version snapshots, or external audit export.
  • Tenancy is gateway-level isolation, not multi-tenant SaaS. API-key → tenant scoping, restrictions, quotas, and per-tenant views are real; there's no org/user hierarchy, row-level store filtering, per-tenant secrets, or SSO.

Install

# from source (stable Rust, pinned via rust-toolchain.toml)
git clone https://github.com/umutkeltek/switchback && cd switchback
cargo build --release          # binary at target/release/switchback

# or Docker (multi-arch image on every release)
docker run --rm -p 8765:8765 ghcr.io/umutkeltek/switchback:latest

Prebuilt binaries (linux/macOS/windows, x86_64 + aarch64, with checksums) are on the Releases page.

Security. On loopback (127.0.0.1) with no key set, the gateway runs open — fine locally. A non-loopback bind (0.0.0.0) refuses to start unless you set server.api_key/api_keys or explicitly opt into server.allow_open_admin: true. Once a key is set, every endpoint except / and /health requires it. See SECURITY.md.

Docs

Doc What
QUICKSTART.md 5-minute hands-on walkthrough
OPERATIONS.md Deploying for a team (Docker Compose, backups, auth)
ARCHITECTURE.md Crate-by-crate design + full capability reference
CLI.md The operator & agent-facing CLI contract
PROVIDER_SETUP.md Real provider recipes
SECURITY.md Security model + how to report a vulnerability
AGENTS.md Invariants, conventions, contribution recipes

Status

v0.1.0 is an early source-available release. The core is in place and tested: the data plane, the routing engine, multi-account credentials, metadata-only traces, hot-reload with revision pinning, and optional SQLite state for usage/traces/control metadata. APIs and config may change before v1.0 — data-plane compatibility is the priority. The intended scope is a single-binary team gateway, not a hosted multi-tenant SaaS or a billing platform. Full scope: ARCHITECTURE.md.

Why "Switchback"?

A switchback is a mountain road that keeps climbing by re-routing. That's the design goal: resilient routing that never loses control of where a request went, what it cost, and why.

Contributing

Read AGENTS.md (architecture + invariants) and CONTRIBUTING.md (workflow + the verification bar) before opening a PR. The short version: keep the crate graph acyclic, keep sb-core provider-agnostic, every request emits a RouteDecision, no secrets in logs, and claims need tool evidence (cargo build && cargo test, clippy clean, a curl smoke for request-path changes). Found a vulnerability? See SECURITY.md — report it privately.

License

Source-available under the Elastic License 2.0 (ELv2) — use, copy, modify, and self-host it; you may not offer it to third parties as a hosted/managed service or remove the licensing notices. ELv2 is source-available, not an OSI-approved open-source license.

About

One Rust binary for explainable AI provider routing: multi-provider fallback, encrypted credentials, budgets, quotas, and metadata-only traces — no client code changes.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors