A3S Gateway

The traffic layer for AI-native services

Single binary. ACL config. Hot reload. Built for LLM streaming, scale-to-zero, and safe model rollouts.

Why A3S Gateway • Quick Start • Features • Configuration • Architecture • Deployment • API Reference • Stability

Why A3S Gateway

AI services break the assumptions baked into Web-era gateways:

Assumption	Web services	AI services
Response size	Small, bounded	Unbounded (streaming tokens)
Latency	Milliseconds	Seconds (model inference)
Idle cost	Cheap	Expensive (GPU memory)
Deployment risk	Low	High (model quality regression)
Protocol	HTTP request/response	SSE, WebSocket, gRPC

nginx, Caddy, and Traefik were built for the left column. A3S Gateway is built for the right:

SSE/Streaming — chunked transfer without response buffering; first token reaches the client as soon as the model emits it
Scale-to-zero with request buffering — when a model is cold, incoming requests are held in memory and replayed as soon as the replica is ready, instead of being dropped or rejected with 503
Revision traffic splitting — send 5% of live traffic to a new model version; automatically roll back if error rate or p99 latency crosses a threshold
Traffic mirroring — shadow-test a new model with real requests before it handles a single production response
WebSocket multiplexing — named pub/sub channels over a single connection for real-time AI interactions

Everything else (routing, TLS, rate limiting, circuit breaker, Prometheus) is table-stakes infrastructure packaged so you don't need a second tool.

1069 tests | 80 source files | ~37,000 lines of Rust | Single statically-linked binary | MSRV 1.88

Quick Start

# Install
brew install a3s-lab/tap/a3s-gateway

# Or via cargo
cargo install a3s-gateway

# Start
a3s-gateway --config gateway.acl

# gateway.acl - proxy all traffic to an LLM service
entrypoints "web" {
  address = "0.0.0.0:8080"
}

routers "llm" {
  rule    = "PathPrefix(`/v1`)"
  service = "llm-backend"
  middlewares = ["rate-limit", "auth-jwt"]
}

services "llm-backend" {
  load_balancer {
    strategy        = "least-connections"
    request_timeout = "60s"
    servers         = [
      { url = "http://127.0.0.1:8001" },
      { url = "http://127.0.0.1:8002" }
    ]
    health_check { path = "/health" }
  }

  # Scale to zero — buffer requests during cold start
  scaling {
    min_replicas          = 0
    max_replicas          = 4
    container_concurrency = 10
    buffer_enabled        = true
    executor              = "box"
  }
}

middlewares "rate-limit" { type = "rate-limit"; rate = 60; burst = 10 }
middlewares "auth-jwt"   { type = "jwt"; value = env("JWT_SECRET") }
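
The `rate-limit` middleware is a token bucket configured by `rate` and `burst`. Below is a minimal sketch of that technique. It assumes `rate` is a refill rate in tokens per second and `burst` is the bucket capacity; this README does not state the gateway's actual time unit. `TokenBucket` is a hypothetical type, not part of the crate.

```rust
use std::time::{Duration, Instant};

/// Hypothetical token bucket; `rate` is assumed to be tokens per second.
struct TokenBucket {
    rate: f64,     // refill rate (tokens per second, assumed unit)
    burst: f64,    // capacity: the largest burst admitted back-to-back
    tokens: f64,   // tokens currently available
    last: Instant, // when the bucket was last refilled
}

impl TokenBucket {
    fn new(rate: f64, burst: f64, now: Instant) -> Self {
        // Start full so an initial burst of `burst` requests is admitted.
        Self { rate, burst, tokens: burst, last: now }
    }

    /// Returns true if the request may proceed, false if it should be rejected.
    fn try_acquire(&mut self, now: Instant) -> bool {
        // Refill for the time elapsed since the last call, capped at `burst`.
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate).min(self.burst);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let start = Instant::now();
    let mut bucket = TokenBucket::new(60.0, 10.0, start);
    // Eleven simultaneous requests: the first ten drain the burst, the eleventh is refused.
    let admitted = (0..11).filter(|_| bucket.try_acquire(start)).count();
    println!("admitted {admitted} of 11");
    // One second later the bucket has refilled (capped at `burst`).
    println!("after 1s: {}", bucket.try_acquire(start + Duration::from_secs(1)));
}
```

The refill is computed lazily on each call, so no background timer is needed. A distributed limiter such as `rate-limit-redis` would need to keep equivalent state in a shared store instead of in process memory.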

Programmatic

use a3s_gateway::{Gateway, config::GatewayConfig};
use std::sync::Arc;

#[tokio::main]
async fn main() -> a3s_gateway::Result<()> {
    let config = GatewayConfig::from_file("gateway.acl").await?;
    let gateway = Arc::new(Gateway::new(config)?);
    gateway.start().await?;
    gateway.wait_for_shutdown().await;
    Ok(())
}

Features

AI Workload Patterns

Feature	How it works
SSE / Streaming	Chunked transfer relay — zero response buffering, first token delivered immediately
Scale-to-zero	Knative formula: `desired = ⌈(in_flight + queue_depth) / (concurrency × utilization)⌉`
Cold-start buffering	Requests queue in memory during scale-up; replayed once backend is ready
Revision traffic splitting	Route N% to v1, M% to v2 with per-revision health tracking
Gradual rollout	Step-by-step traffic shift with automatic rollback on error rate or latency breach
Traffic mirroring	Fire-and-forget copy of live traffic to a shadow backend (no client impact)
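
The scale-to-zero formula above can be sketched as a pure function. The parameter names mirror the `scaling` block in the configuration examples. Clamping the result to `min_replicas`/`max_replicas` is an assumption about how the bounds are applied. This is an illustration, not the crate's autoscaler.

```rust
/// Knative-style desired replica count:
/// desired = ceil((in_flight + queue_depth) / (concurrency * utilization)),
/// then clamped to the configured replica bounds (the clamping is an assumption).
fn desired_replicas(
    in_flight: u32,             // requests currently being served
    queue_depth: u32,           // requests held in the cold-start buffer
    container_concurrency: u32, // `container_concurrency` from the `scaling` block
    target_utilization: f64,    // `target_utilization`, e.g. 0.7
    min_replicas: u32,
    max_replicas: u32,
) -> u32 {
    let demand = f64::from(in_flight + queue_depth);
    let per_replica = f64::from(container_concurrency) * target_utilization;
    let raw = if per_replica > 0.0 {
        (demand / per_replica).ceil() as u32
    } else {
        max_replicas // guard against a zero divisor by scaling to the maximum
    };
    raw.clamp(min_replicas, max_replicas)
}

fn main() {
    // Values from the full configuration example: concurrency 10, utilization 0.7, 0..8 replicas.
    for (in_flight, queued) in [(0, 0), (1, 0), (12, 3), (100, 0)] {
        let n = desired_replicas(in_flight, queued, 10, 0.7, 0, 8);
        println!("in_flight={in_flight} queued={queued} -> {n} replicas");
    }
}
```

With `min_replicas = 0`, a single queued request already yields a desired count of 1. That is what triggers a cold start while the buffer holds the request.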

Core Proxy

Dynamic routing: Traefik-style rule engine — Host(), PathPrefix(), Path(), Headers(), Method(), &&
Load balancing: Round-robin, weighted, least-connections, random
Health checks: Active HTTP probes + passive error-count eviction
Sticky sessions: Cookie-based backend affinity with TTL and LRU eviction
Failover: Automatic switch to secondary pool when primary has no healthy backends
Forwarded headers: Upstreams receive normalized X-Forwarded-For, X-Forwarded-Host, X-Forwarded-Proto, and X-Forwarded-Port
Upstream timeouts: Per-service request_timeout for plain HTTP traffic; a backend that exceeds it produces 504 Gateway Timeout
TLS termination: rustls (pure Rust, no OpenSSL) + ACME/Let's Encrypt (HTTP-01, DNS-01/Cloudflare, DNS-01/Route53)
Hot reload: File-watch config reload (inotify/kqueue) — unchanged HTTP/TCP entrypoints are not rebound
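
This is a minimal sketch of two of these behaviours: least-connections selection over backends currently marked healthy, and failover to a secondary pool. It assumes health state is already maintained by the active and passive checks. `Backend`, `least_connections`, and `pick` are hypothetical names, and the secondary-pool URL is illustrative.

```rust
/// Hypothetical view of a backend; health is assumed to come from the active/passive checks.
struct Backend {
    url: &'static str,
    healthy: bool,
    active_connections: u32,
}

/// Least-connections: the healthy backend with the fewest in-flight connections (first wins ties).
fn least_connections(pool: &[Backend]) -> Option<&Backend> {
    pool.iter()
        .filter(|b| b.healthy)
        .min_by_key(|b| b.active_connections)
}

/// Failover: consult the secondary pool only when the primary has no healthy backend.
fn pick<'a>(primary: &'a [Backend], secondary: &'a [Backend]) -> Option<&'a Backend> {
    least_connections(primary).or_else(|| least_connections(secondary))
}

fn main() {
    let primary = [
        Backend { url: "http://127.0.0.1:8001", healthy: true, active_connections: 7 },
        Backend { url: "http://127.0.0.1:8002", healthy: true, active_connections: 3 },
    ];
    let secondary = [Backend { url: "http://127.0.0.1:9001", healthy: true, active_connections: 0 }];
    // The idle secondary is ignored while the primary still has a healthy backend.
    println!("{:?}", pick(&primary, &secondary).map(|b| b.url));
}
```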

Protocols

Protocol	Capability
HTTP/1.1 & HTTP/2	Full reverse proxy, hop-by-hop header filtering, normalized `X-Forwarded-*` metadata
WebSocket	Upgrade detection, bidirectional relay, named-channel multiplexing
SSE / Streaming	Chunked transfer, zero buffering — optimized for LLM token streams
gRPC	HTTP/2 h2c forwarding with header translation
TCP	Raw byte relay, SNI-based routing (`HostSNI()`), IP filtering
UDP	Session-based datagram relay
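
Hop-by-hop header filtering means stripping headers that describe only the client-to-gateway connection before a request is forwarded. Below is a std-only sketch. It uses the hop-by-hop names that reverse proxies conventionally strip (the RFC 2616 §13.5.1 list plus `Proxy-Connection`) and also removes any header named in `Connection`, as RFC 9110 requires of intermediaries. The function name is hypothetical. A WebSocket relay must treat `Upgrade`/`Connection` specially instead of simply dropping them, and this sketch ignores that case.

```rust
/// Header names treated as hop-by-hop (compared case-insensitively).
const HOP_BY_HOP: &[&str] = &[
    "connection", "proxy-connection", "keep-alive", "proxy-authenticate",
    "proxy-authorization", "te", "trailer", "transfer-encoding", "upgrade",
];

/// Drop hop-by-hop headers, including any extra names listed in `Connection`.
fn strip_hop_by_hop(headers: &[(String, String)]) -> Vec<(String, String)> {
    // Names declared as connection options, e.g. "Connection: keep-alive, X-Foo".
    let listed: Vec<String> = headers
        .iter()
        .filter(|(name, _)| name.eq_ignore_ascii_case("connection"))
        .flat_map(|(_, value)| value.split(','))
        .map(|token| token.trim().to_ascii_lowercase())
        .filter(|token| !token.is_empty())
        .collect();

    headers
        .iter()
        .filter(|(name, _)| {
            let lower = name.to_ascii_lowercase();
            !HOP_BY_HOP.contains(&lower.as_str()) && !listed.contains(&lower)
        })
        .cloned()
        .collect()
}

fn main() {
    let incoming = vec![
        ("Host".to_string(), "api.example.com".to_string()),
        ("Connection".to_string(), "keep-alive, X-Session-Hint".to_string()),
        ("X-Session-Hint".to_string(), "abc".to_string()), // hypothetical connection-scoped header
        ("Authorization".to_string(), "Bearer t".to_string()),
    ];
    for (name, value) in strip_hop_by_hop(&incoming) {
        println!("{name}: {value}");
    }
}
```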

Middlewares (15 built-in)

Middleware	Config Keys	Purpose
`jwt`	`value`	JWT validation (HS256)
`api-key`	`header`, `keys`	API key enforcement
`basic-auth`	`username`, `password`	HTTP Basic Auth
`forward-auth`	`forward_auth_url`	Delegate auth to external IdP
`rate-limit`	`rate`, `burst`	Token bucket (in-process)
`rate-limit-redis`	`rate`, `burst`, `redis_url`	Distributed rate limiting
`cors`	`allowed_origins`, `allowed_methods`	CORS headers
`headers`	`request_headers`, `response_headers`	Header manipulation
`strip-prefix`	`prefixes`	Path prefix removal
`body-limit`	`max_body_bytes`	Request body cap (413 on exceed)
`retry`	`max_retries`, `retry_interval_ms`	Retry on upstream failure
`circuit-breaker`	`failure_threshold`, `cooldown_secs`, `success_threshold`	Closed/Open/HalfOpen state machine
`ip-allow`	`allowed_ips`	CIDR/IP allowlist
`compress`	—	brotli/gzip/deflate (br preferred)
`tcp-filter`	—	Connection limit + IP allowlist for TCP
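
The `circuit-breaker` row names a Closed/Open/HalfOpen state machine. The sketch below shows one common way to wire `failure_threshold`, `cooldown_secs`, and `success_threshold` (the last one appears in the full configuration example) into such a machine. The transition rules are assumptions for illustration, not a description of the gateway's internals. In particular, the sketch counts consecutive failures and lets every request through while HalfOpen.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BreakerState {
    Closed,   // requests flow; failures are counted
    Open,     // requests are rejected until the cooldown elapses
    HalfOpen, // trial requests decide whether to close or re-open
}

struct CircuitBreaker {
    failure_threshold: u32,
    success_threshold: u32,
    cooldown: Duration,
    state: BreakerState,
    failures: u32,  // consecutive failures while Closed
    successes: u32, // consecutive successes while HalfOpen
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, cooldown_secs: u64, success_threshold: u32) -> Self {
        Self {
            failure_threshold,
            success_threshold,
            cooldown: Duration::from_secs(cooldown_secs),
            state: BreakerState::Closed,
            failures: 0,
            successes: 0,
            opened_at: None,
        }
    }

    /// Decide whether to forward a request; an Open breaker moves to HalfOpen after the cooldown.
    fn allow(&mut self, now: Instant) -> bool {
        if self.state == BreakerState::Open {
            match self.opened_at {
                Some(t) if now.saturating_duration_since(t) >= self.cooldown => {
                    self.state = BreakerState::HalfOpen;
                    self.successes = 0;
                }
                _ => return false,
            }
        }
        true
    }

    /// Record the outcome of a forwarded request.
    fn record(&mut self, ok: bool, now: Instant) {
        match (self.state, ok) {
            (BreakerState::Closed, true) => self.failures = 0,
            (BreakerState::Closed, false) => {
                self.failures += 1;
                if self.failures >= self.failure_threshold {
                    self.trip(now);
                }
            }
            (BreakerState::HalfOpen, true) => {
                self.successes += 1;
                if self.successes >= self.success_threshold {
                    self.state = BreakerState::Closed;
                    self.failures = 0;
                }
            }
            (BreakerState::HalfOpen, false) => self.trip(now),
            (BreakerState::Open, _) => {} // late results from requests sent before tripping
        }
    }

    fn trip(&mut self, now: Instant) {
        self.state = BreakerState::Open;
        self.opened_at = Some(now);
        self.failures = 0;
    }
}

fn main() {
    let t0 = Instant::now();
    let mut cb = CircuitBreaker::new(5, 30, 2); // values from the configuration example
    for _ in 0..5 {
        if cb.allow(t0) {
            cb.record(false, t0);
        }
    }
    println!("{:?}", cb.state); // Open
    let allowed = cb.allow(t0 + Duration::from_secs(30));
    println!("allowed after cooldown: {allowed}, state {:?}", cb.state);
}
```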

Observability

All observability features are individually configurable — disable any of them to reduce per-request overhead in high-throughput scenarios.

observability {
  metrics_enabled     = true   # Prometheus metrics (default: true)
  access_log_enabled  = true   # Structured JSON access log (default: true)
  tracing_enabled     = false  # W3C Trace Context propagation (default: true)
}

Prometheus metrics: Per-router/service/backend request counts, latency histograms, error rates, autoscaler state
Structured access log: JSON entries — timestamp, client IP, method, path, status, duration, backend, router
Distributed tracing: W3C Trace Context and B3/Zipkin propagation; trace context is injected into upstream requests
Management: CLI-first operations plus an optional authenticated/mTLS Dashboard API on a dedicated listener
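
W3C Trace Context propagation centres on the `traceparent` header, whose format is `version-traceid-parentid-flags`. A proxy that participates in a trace keeps the incoming trace ID and substitutes its own span ID as the parent for the upstream hop. The sketch below covers only version `00` of the W3C format and does not show B3. The function names are hypothetical. Span-ID generation is left to the caller because it needs a random source.

```rust
#[derive(Debug)]
struct TraceParent {
    trace_id: String,  // 32 lowercase hex chars
    parent_id: String, // 16 lowercase hex chars
    flags: u8,         // bit 0 = sampled
}

fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
}

/// Parse a version-00 `traceparent` header value; None if it is malformed.
fn parse_traceparent(value: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = value.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    let (trace_id, parent_id, flags) = (parts[1], parts[2], parts[3]);
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero trace or parent IDs are invalid per the specification.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        flags: u8::from_str_radix(flags, 16).ok()?,
    })
}

/// Header for the upstream request: same trace, the gateway's own span as the new parent.
fn child_traceparent(incoming: &TraceParent, gateway_span_id: u64) -> String {
    format!("00-{}-{:016x}-{:02x}", incoming.trace_id, gateway_span_id, incoming.flags)
}

fn main() {
    let incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
    if let Some(tp) = parse_traceparent(incoming) {
        // A real span ID would be random; a fixed value keeps the example reproducible.
        println!("{}", child_traceparent(&tp, 0xb7ad6b7169203331));
    }
}
```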

Service Discovery

File provider: ACL with directory watching and hot reload
DNS provider: Hostname resolution with TTL-based caching
Docker provider: Auto-discover services from container labels (a3s.router.rule, a3s.service.port)
Health-based discovery: Auto-register backends via /.well-known/a3s-service.json
Kubernetes Ingress (kube feature): Watch networking.k8s.io/v1/Ingress resources
Kubernetes IngressRoute CRD (kube feature): Traefik-style advanced routing

Performance

Built for throughput. The proxy hot path uses direct hyper HTTP/1.1 connection pooling with streaming request body passthrough — no intermediate buffering for plain HTTP traffic.

Metric	Value	Conditions
Throughput	67,000 req/s	200 concurrent connections, Apple Silicon, loopback
Latency overhead	69 µs (p50)	Single connection, measures pure gateway overhead
Tail latency	5.6 ms (p99)	200 concurrent connections
Routing	90 ns	Match against 100-route table
Middleware pipeline	130 ns/middleware	Pre-compiled at startup, no per-request allocation
Config reload	3 ms	300-service configuration, hot reload

Benchmarked with oha (a Rust HTTP load generator) against a hyper backend on loopback. Criterion micro-benchmarks are included in benches/.

Configuration

All configuration uses ACL format (.acl files). Changes are picked up automatically when file watching is enabled — no restart required.

# Full example — LLM API gateway with safe rollout
entrypoints "web"       { address = "0.0.0.0:80" }
entrypoints "websecure" {
  address = "0.0.0.0:443"
  tls { cert_file = "/etc/certs/cert.pem"; key_file = "/etc/certs/key.pem" }
}

routers "llm-api" {
  rule        = "Host(`api.example.com`) && PathPrefix(`/v1`)"
  service     = "llm-service"
  entrypoints = ["websecure"]
  middlewares = ["auth-jwt", "rate-limit", "circuit-breaker"]
}

services "llm-service" {
  load_balancer {
    strategy        = "least-connections"
    request_timeout = "60s"
    servers         = [
      { url = "http://127.0.0.1:8001" },
      { url = "http://127.0.0.1:8002" }
    ]
    health_check { path = "/health"; interval = "10s" }
  }

  # Mirror 5% of traffic to a new model for shadow testing
  mirror { service = "llm-canary"; percentage = 5 }

  # Scale to zero when idle — buffer requests during cold start
  scaling {
    min_replicas          = 0
    max_replicas          = 8
    container_concurrency = 10
    target_utilization    = 0.7
    buffer_enabled        = true
    executor              = "box"
  }

  # Shift traffic from v1 to v2 over 10 steps, 1 minute apart
  # Auto-rollback if error rate > 5% or p99 > 5s
  rollout {
    from                 = "v1"
    to                   = "v2"
    step_percent         = 10
    step_interval_secs   = 60
    error_rate_threshold = 0.05
    latency_threshold_ms = 5000
  }
}

middlewares "auth-jwt"       { type = "jwt"; value = env("JWT_SECRET") }
middlewares "rate-limit"     { type = "rate-limit"; rate = 100; burst = 20 }
middlewares "circuit-breaker" {
  type              = "circuit-breaker"
  failure_threshold = 5
  cooldown_secs     = 30
  success_threshold = 2
}

providers {
  file { watch = true; directory = "/etc/gateway/conf.d/" }
}
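
The `rollout` block describes a stepwise traffic shift with automatic rollback. The controller below sketches that behaviour under several assumptions. It starts at the first step rather than 0%, it is ticked once per `step_interval_secs` with the new revision's error rate and p99 latency, and it compares those values strictly against the thresholds, matching the "error rate > 5% or p99 > 5s" comment. `Rollout` and `RolloutState` are hypothetical names, not the crate's types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RolloutState {
    InProgress { to_percent: u32 }, // share of traffic on the new revision (`to`)
    Completed,                      // 100% on `to`
    RolledBack,                     // 100% back on `from`
}

struct Rollout {
    step_percent: u32,
    error_rate_threshold: f64,
    latency_threshold_ms: u64,
    state: RolloutState,
}

impl Rollout {
    fn new(step_percent: u32, error_rate_threshold: f64, latency_threshold_ms: u64) -> Self {
        // Begin at the first step so the first tick has real traffic to judge.
        let first = step_percent.min(100);
        let state = if first == 100 {
            RolloutState::Completed
        } else {
            RolloutState::InProgress { to_percent: first }
        };
        Self { step_percent, error_rate_threshold, latency_threshold_ms, state }
    }

    /// Evaluate one step using metrics observed for the new revision since the last tick.
    fn tick(&mut self, error_rate: f64, p99_ms: u64) -> RolloutState {
        if let RolloutState::InProgress { to_percent } = self.state {
            self.state = if error_rate > self.error_rate_threshold || p99_ms > self.latency_threshold_ms {
                RolloutState::RolledBack
            } else {
                match (to_percent + self.step_percent).min(100) {
                    100 => RolloutState::Completed,
                    next => RolloutState::InProgress { to_percent: next },
                }
            };
        }
        self.state
    }

    /// Current (from %, to %) traffic split.
    fn weights(&self) -> (u32, u32) {
        match self.state {
            RolloutState::InProgress { to_percent } => (100 - to_percent, to_percent),
            RolloutState::Completed => (0, 100),
            RolloutState::RolledBack => (100, 0),
        }
    }
}

fn main() {
    // step_percent = 10, error_rate_threshold = 0.05, latency_threshold_ms = 5000
    let mut rollout = Rollout::new(10, 0.05, 5000);
    let observed = [(0.01, 900), (0.02, 1200), (0.09, 1100)]; // the third step breaches 5% errors
    for (error_rate, p99_ms) in observed {
        let state = rollout.tick(error_rate, p99_ms);
        println!("{state:?} split={:?}", rollout.weights());
    }
}
```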

Service Discovery Contract

Backends expose /.well-known/a3s-service.json (RFC 8615) for automatic registration:

{
  "name": "llm-service",
  "version": "2.1.0",
  "routes": [
    { "rule": "PathPrefix(`/v1`)", "middlewares": ["rate-limit"] }
  ],
  "health_path": "/health",
  "weight": 1
}

Architecture

                    ┌──────────────────────────────────────────────┐
                    │                 A3S Gateway                  │
                    │                                              │
  Client ───────────┤  Entrypoint (HTTP/HTTPS/TCP/UDP)             │
  (HTTP/WS/SSE/     │      │                                       │
   gRPC/TCP/UDP)    │      ▼                                       │
                    │  TLS Termination (rustls + ACME)             │
                    │      │                                       │
                    │      ▼                                       │
                    │  Router ──── Rule Matching                   │
                    │      │       (Host, Path, Headers, SNI)      │
                    │      ▼                                       │
                    │  Middleware Pipeline                         │
                    │  ┌──────┬────────┬──────────┬───────────┐    │
                    │  │Auth  │  Rate  │  Circuit │  Compress │    │
                    │  │JWT   │  Limit │  Breaker │  CORS     │    │
                    │  └──────┴────────┴──────────┴───────────┘    │
                    │      │                                       │
                    │      ▼                                       │
                    │  Service (LB + Health + Failover + Mirror)   │
                    │      │                                       │
                    │      ▼                                       │
                    │  Scaling (Knative autoscaler + buffer)       │
                    │      │                                       │
                    └──────┼───────────────────────────────────────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
         ┌────────┐  ┌────────┐  ┌──────────┐
         │HTTP    │  │gRPC    │  │TCP/UDP   │
         │Backend │  │Backend │  │Backend   │
         └────────┘  └────────┘  └──────────┘

Core Components

Component	Responsibility
`Gateway`	Lifecycle orchestrator — owns all subsystems
`Entrypoint`	Network listener (HTTP, HTTPS, TCP, UDP)
`Router`	Rule-based request matching; O(n) scan with priority ordering
`Middleware`	Composable request/response pipeline; pre-compiled per-router at startup
`Service`	Backend pool — load balancing, health, mirroring, failover
`Scaling`	Knative autoscaler — concurrency tracking, request buffer, revision router
`Provider`	Dynamic config sources — file, DNS, discovery, Kubernetes
`Proxy`	Protocol forwarder — HTTP, WebSocket, SSE, gRPC, TCP, UDP
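
The `Router` row describes an O(n) scan with priority ordering. Below is a std-only sketch of that design. Rules are parsed into matchers once and sorted by priority, and each request takes the first route whose matchers all succeed. It supports only `Host`, `Path`, `PathPrefix`, `Method`, and `&&`. It also assumes Traefik's default of using rule length as priority. The gateway's real parser and tie-breaking may differ, and all names here are hypothetical.

```rust
/// One matcher parsed from a rule term such as PathPrefix(`/v1`).
enum Matcher {
    Host(String),
    Path(String),
    PathPrefix(String),
    Method(String),
}

struct Route {
    name: String,
    matchers: Vec<Matcher>, // all must match (the `&&` operator)
    priority: usize,
}

/// Parse "Name(`arg`) && Name(`arg`)"; returns None for anything unsupported.
/// Splitting on "&&" is naive: an argument containing "&&" would break it.
fn parse_rule(rule: &str) -> Option<Vec<Matcher>> {
    rule.split("&&")
        .map(|term| {
            let term = term.trim();
            let open = term.find("(`")?;
            let arg = term[open + 2..].strip_suffix("`)")?.to_string();
            match &term[..open] {
                "Host" => Some(Matcher::Host(arg)),
                "Path" => Some(Matcher::Path(arg)),
                "PathPrefix" => Some(Matcher::PathPrefix(arg)),
                "Method" => Some(Matcher::Method(arg)),
                _ => None,
            }
        })
        .collect()
}

struct RouterTable {
    routes: Vec<Route>, // sorted by descending priority
}

impl RouterTable {
    fn new(rules: &[(&str, &str)]) -> Option<Self> {
        let mut routes = Vec::new();
        for &(name, rule) in rules {
            routes.push(Route { name: name.to_string(), matchers: parse_rule(rule)?, priority: rule.len() });
        }
        // Longest rule first (assumed default); sort_by is stable, so ties keep declaration order.
        routes.sort_by(|a, b| b.priority.cmp(&a.priority));
        Some(Self { routes })
    }

    /// O(n) scan: the first route whose matchers all succeed wins.
    fn match_request(&self, host: &str, method: &str, path: &str) -> Option<&str> {
        self.routes
            .iter()
            .find(|route| {
                route.matchers.iter().all(|m| match m {
                    Matcher::Host(h) => h.eq_ignore_ascii_case(host),
                    Matcher::Path(p) => p == path,
                    Matcher::PathPrefix(p) => path.starts_with(p.as_str()),
                    Matcher::Method(verb) => verb.eq_ignore_ascii_case(method),
                })
            })
            .map(|route| route.name.as_str())
    }
}

fn main() {
    let table = RouterTable::new(&[
        ("catch-all", "PathPrefix(`/`)"),
        ("llm-api", "Host(`api.example.com`) && PathPrefix(`/v1`)"),
    ])
    .expect("rules should parse");
    println!("{:?}", table.match_request("api.example.com", "POST", "/v1/chat/completions"));
    println!("{:?}", table.match_request("other.example.com", "GET", "/v1/models"));
}
```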

Gateway Lifecycle

Created → Starting → Running ⇄ Reloading → Stopping → Stopped

Hot reload (Reloading) replaces the router table and service registry atomically under a shared runtime snapshot. HTTP/TCP entrypoints keep their sockets when their listener configuration is unchanged. If HTTP/TCP listeners are added or moved, only the changed entrypoints are reconciled, and unchanged listeners stay active even if a new bind fails. UDP entrypoints are still restarted on reload.
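
The lifecycle line can be read as a transition table. The sketch below encodes it as a hypothetical `Lifecycle` enum. It is deliberately not named `GatewayState`, because this README does not list that type's real variants. It assumes shutdown can begin from either `Running` or `Reloading`, which the one-line diagram leaves ambiguous.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lifecycle {
    Created,
    Starting,
    Running,
    Reloading,
    Stopping,
    Stopped,
}

/// Allowed transitions from the lifecycle diagram (shutdown from Running/Reloading assumed).
fn can_transition(from: Lifecycle, to: Lifecycle) -> bool {
    use Lifecycle::*;
    matches!(
        (from, to),
        (Created, Starting)
            | (Starting, Running)
            | (Running, Reloading)
            | (Reloading, Running)
            | (Running, Stopping)
            | (Reloading, Stopping)
            | (Stopping, Stopped)
    )
}

/// Apply a transition, rejecting anything the diagram does not allow.
fn transition(current: Lifecycle, next: Lifecycle) -> Result<Lifecycle, String> {
    if can_transition(current, next) {
        Ok(next)
    } else {
        Err(format!("invalid transition {current:?} -> {next:?}"))
    }
}

fn main() {
    use Lifecycle::*;
    let mut state = Created;
    for next in [Starting, Running, Reloading, Running, Stopping, Stopped] {
        state = transition(state, next).expect("valid lifecycle");
        println!("{state:?}");
    }
    println!("{:?}", transition(Stopped, Running)); // Err("invalid transition Stopped -> Running")
}
```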

Deployment

Homebrew

brew install a3s-lab/tap/a3s-gateway

Helm (Kubernetes)

helm install gateway deploy/helm/a3s-gateway \
  --set-file config=my-gateway.acl \
  --set service.type=LoadBalancer

Docker

docker run -v $(pwd)/gateway.acl:/etc/a3s-gateway/gateway.acl \
  -p 8080:8080 ghcr.io/a3s-lab/gateway:latest

Cargo

cargo install a3s-gateway

API Reference

Rust API

Method	Description
`Gateway::new(config)`	Create from `GatewayConfig`
`start()`	Bind listeners and begin proxying
`shutdown()`	Graceful drain and stop
`reload(new_config)`	Atomic hot reload without downtime
`health()`	Current health snapshot
`metrics()`	Prometheus metrics collector
`state()`	`GatewayState` enum

Management

Management is CLI-first by default. The optional Dashboard API runs on a dedicated listener, so /api/gateway/* on traffic entrypoints remains ordinary user traffic that your own routers can handle. Every management request must come from an address listed in allowed_ips and, when auth_token_env is set, must also carry a bearer token. Remote management listeners can additionally require HTTPS and client certificates.

a3s-gateway validate --config gateway.acl
a3s-gateway config --config gateway.acl summary
a3s-gateway config --config gateway.acl entrypoints
a3s-gateway config --config gateway.acl routes
a3s-gateway config --config gateway.acl services
a3s-gateway config --config gateway.acl middlewares
a3s-gateway config --config gateway.acl providers
a3s-gateway config --config gateway.acl json
a3s-gateway management events --url http://127.0.0.1:9090/api/gateway
a3s-gateway management validate --url http://127.0.0.1:9090/api/gateway --file gateway.acl
a3s-gateway management reload --url http://127.0.0.1:9090/api/gateway --file gateway.acl

Use --ca-cert, --client-cert, and --client-key with management events when the management listener requires mTLS.

management {
  enabled        = true
  address        = "127.0.0.1:9090"
  path_prefix    = "/api/gateway"
  auth_token_env = "A3S_GATEWAY_ADMIN_TOKEN"
  allowed_ips    = ["127.0.0.1", "::1"]

  tls {
    cert_file           = "/etc/a3s/admin/server.crt"
    key_file            = "/etc/a3s/admin/server.key"
    client_ca_file      = "/etc/a3s/admin/client-ca.crt"
    require_client_cert = true
    min_version         = "1.3"
  }
}

Endpoint	Response
`GET /api/gateway/health`	Gateway health (JSON)
`GET /api/gateway/metrics`	Prometheus text format
`GET /api/gateway/config`	Active configuration (JSON)
`GET /api/gateway/routes`	Configured routes
`GET /api/gateway/services`	Services with backend health
`GET /api/gateway/backends`	All backends with connection counts
`GET /api/gateway/events`	Recent management security audit events
`GET /api/gateway/version`	Binary version
`POST /api/gateway/config/validate`	Validate an ACL payload without applying it
`POST /api/gateway/config/reload`	Transactionally reload from an ACL payload
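
Management requests need a matching `allowed_ips` entry and, when `auth_token_env` is set, a bearer token. The sketch below models only that access decision, under several assumptions. It uses exact IP matching (no CIDR ranges or IPv4-mapped IPv6 normalisation), reads the token from the variable named by `auth_token_env`, and uses a hypothetical `authorize` function. mTLS client-certificate checks happen at the TLS layer and are not shown.

```rust
use std::net::IpAddr;

/// Hypothetical management-API gate: the peer must be allowlisted and, if a token
/// is configured, the `Authorization` header must carry it as a bearer token.
fn authorize(
    peer: IpAddr,
    allowed_ips: &[IpAddr],
    expected_token: Option<&str>, // value of the env var named by `auth_token_env`, if set
    authorization: Option<&str>,  // raw `Authorization` header value
) -> bool {
    if !allowed_ips.contains(&peer) {
        return false;
    }
    match expected_token {
        None => true,
        Some(expected) => authorization
            .and_then(|value| value.strip_prefix("Bearer "))
            .map_or(false, |presented| constant_time_eq(presented.as_bytes(), expected.as_bytes())),
    }
}

/// Compare secrets without stopping at the first differing byte.
/// A length mismatch returns early, so only equal-length comparisons are constant-time.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn main() {
    let allowed: Vec<IpAddr> = vec!["127.0.0.1".parse().unwrap(), "::1".parse().unwrap()];
    // Token taken from the env var used in the `management` example above.
    let token = std::env::var("A3S_GATEWAY_ADMIN_TOKEN").ok();
    let peer: IpAddr = "127.0.0.1".parse().unwrap();
    let header = token.as_ref().map(|t| format!("Bearer {t}"));
    println!("{}", authorize(peer, &allowed, token.as_deref(), header.as_deref()));
}
```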

Development

cargo build -p a3s-gateway
cargo test -p a3s-gateway --all-features   # 1069 tests
cargo clippy -p a3s-gateway --all-features -- -D warnings
cargo bench --no-run --all-features        # compile benchmarks

Or use the justfile:

just ci              # fmt + lint + test (full gate)
just bench           # run criterion benchmarks
just release-check   # full pre-release validation

Project Structure

src/
├── lib.rs, main.rs          # Public API + CLI
├── gateway.rs               # Lifecycle orchestrator
├── dashboard.rs             # Optional dedicated management API
├── entrypoint.rs            # HTTP/HTTPS/TCP/UDP listeners + hot path
├── error.rs                 # GatewayError, Result
│
├── config/                  # ACL configuration model
│   └── acl.rs, entrypoint.rs, router.rs, service.rs, scaling.rs, middleware.rs
│
├── router/                  # Rule matching engine
│   ├── rule.rs              # Host/Path/Header/Method/SNI matchers
│   └── tcp.rs               # TCP SNI router
│
├── middleware/              # 15 built-in middleware types
│
├── service/                 # Backend pool management
│   └── load_balancer.rs, health_check.rs, passive_health.rs,
│       sticky.rs, mirror.rs, failover.rs
│
├── proxy/                   # Protocol forwarders
│   └── http_proxy.rs, websocket.rs, streaming.rs, grpc.rs,
│       tcp.rs, udp.rs, tls.rs, acme*.rs, ws_mux.rs
│
├── observability/
│   └── metrics.rs, access_log.rs, tracing.rs
│
├── scaling/                 # Knative-style autoscaler
│   └── executor.rs, autoscaler.rs, buffer.rs,
│       concurrency.rs, revision.rs, rollout.rs
│
└── provider/                # Config providers
    └── file_watcher.rs, dns.rs, docker.rs, discovery.rs,
        kubernetes.rs, kubernetes_crd.rs  (feature: kube)

benches/                     # Criterion benchmarks
├── routing.rs               # RouterTable::match_request
├── middleware_pipeline.rs   # Pipeline::process_request
└── acl_parse.rs             # GatewayConfig::from_acl

deploy/
└── helm/a3s-gateway/        # Helm chart

Stability

A3S Gateway follows Semantic Versioning. Starting with v1.0.0:

Stable: Public Rust API (Gateway, GatewayConfig, GatewayState, HealthStatus, GatewayError), ACL configuration format, Management API endpoints, CLI interface.
Unstable (may change in minor releases): #[doc(hidden)] modules (router, middleware), internal provider implementations, benchmark infrastructure.
MSRV: Rust 1.88. It may be raised in minor releases, but always trails the latest stable Rust by at least three releases.

See CHANGELOG.md for release history and RELEASING.md for the release process.

Community

Discord — questions, discussions, updates.

License

MIT
