
A3S-Lab/Gateway


A3S Gateway

The traffic layer for AI-native services

Single binary. ACL config. Hot reload. Built for LLM streaming, scale-to-zero, and safe model rollouts.

Why A3S Gateway · Quick Start · Features · Configuration · Architecture · Deployment · API Reference · Stability


Why A3S Gateway

AI services break the assumptions baked into Web-era gateways:

| Assumption | Web services | AI services |
|---|---|---|
| Response size | Small, bounded | Unbounded (streaming tokens) |
| Latency | Milliseconds | Seconds (model inference) |
| Idle cost | Cheap | Expensive (GPU memory) |
| Deployment risk | Low | High (model quality regression) |
| Protocol | HTTP request/response | SSE, WebSocket, gRPC |

nginx, Caddy, and Traefik were built for the Web services column. A3S Gateway is built for the AI services column:

  • SSE/Streaming — chunked transfer without response buffering; first token reaches the client as soon as the model emits it
  • Scale-to-zero with request buffering — when a model is cold, incoming requests are held in memory and replayed the moment the replica is ready, not dropped or returned 503
  • Revision traffic splitting — send 5% of live traffic to a new model version; automatically roll back if error rate or p99 latency crosses a threshold
  • Traffic mirroring — shadow-test a new model with real requests before it handles a single production response
  • WebSocket multiplexing — named pub/sub channels over a single connection for real-time AI interactions

Everything else (routing, TLS, rate limiting, circuit breaker, Prometheus) is table-stakes infrastructure packaged so you don't need a second tool.

1069 tests | 80 source files | ~37,000 lines of Rust | Single statically-linked binary | MSRV 1.88


Quick Start

# Install
brew install a3s-lab/tap/a3s-gateway

# Or via cargo
cargo install a3s-gateway

# Start
a3s-gateway --config gateway.acl

# gateway.acl - proxy all traffic to an LLM service
entrypoints "web" {
  address = "0.0.0.0:8080"
}

routers "llm" {
  rule    = "PathPrefix(`/v1`)"
  service = "llm-backend"
  middlewares = ["rate-limit", "auth-jwt"]
}

services "llm-backend" {
  load_balancer {
    strategy        = "least-connections"
    request_timeout = "60s"
    servers         = [
      { url = "http://127.0.0.1:8001" },
      { url = "http://127.0.0.1:8002" }
    ]
    health_check { path = "/health" }
  }

  # Scale to zero — buffer requests during cold start
  scaling {
    min_replicas          = 0
    max_replicas          = 4
    container_concurrency = 10
    buffer_enabled        = true
    executor              = "box"
  }
}

middlewares "rate-limit" { type = "rate-limit"; rate = 60; burst = 10 }
middlewares "auth-jwt"   { type = "jwt"; value = env("JWT_SECRET") }

Programmatic

use a3s_gateway::{Gateway, config::GatewayConfig};
use std::sync::Arc;

#[tokio::main]
async fn main() -> a3s_gateway::Result<()> {
    let config = GatewayConfig::from_file("gateway.acl").await?;
    let gateway = Arc::new(Gateway::new(config)?);
    gateway.start().await?;
    gateway.wait_for_shutdown().await;
    Ok(())
}

Features

AI Workload Patterns

| Feature | How it works |
|---|---|
| SSE / Streaming | Chunked transfer relay with zero response buffering; the first token is delivered immediately |
| Scale-to-zero | Knative formula: desired = ⌈(in_flight + queue_depth) / (concurrency × utilization)⌉ |
| Cold-start buffering | Requests queue in memory during scale-up and are replayed once the backend is ready |
| Revision traffic splitting | Route N% to v1, M% to v2 with per-revision health tracking |
| Gradual rollout | Step-by-step traffic shift with automatic rollback on an error-rate or latency breach |
| Traffic mirroring | Fire-and-forget copy of live traffic to a shadow backend (no client impact) |
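The scale-to-zero row uses the Knative sizing formula. The following is a minimal Rust sketch of that calculation, not the gateway's own code: it assumes the result is clamped to the min_replicas and max_replicas values from the scaling block, and desired_replicas is a hypothetical function name.

```rust
/// Hypothetical sketch of the Knative-style sizing rule:
/// desired = ceil((in_flight + queue_depth) / (concurrency * utilization)),
/// clamped to [min_replicas, max_replicas] (the clamp is an assumption).
fn desired_replicas(
    in_flight: u32,
    queue_depth: u32,
    container_concurrency: u32,
    target_utilization: f64,
    min_replicas: u32,
    max_replicas: u32,
) -> u32 {
    // Requests one replica should carry at the target utilization.
    let per_replica = container_concurrency as f64 * target_utilization;
    if per_replica <= 0.0 {
        // Degenerate config: avoid dividing by zero, fall back to the ceiling.
        return max_replicas;
    }
    // Queued (buffered) requests count as demand, so a cold service scales up.
    let demand = (in_flight + queue_depth) as f64;
    let raw = (demand / per_replica).ceil() as u32;
    raw.clamp(min_replicas, max_replicas)
}
```

With the full example's container_concurrency = 10 and target_utilization = 0.7, 15 in-flight plus 5 queued requests give ⌈20 / 7⌉ = 3 replicas, and an idle service drops back to min_replicas = 0.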

Core Proxy

  • Dynamic routing: Traefik-style rule engine with Host(), PathPrefix(), Path(), Headers(), and Method() matchers, combinable with &&
  • Load balancing: Round-robin, weighted, least-connections, random
  • Health checks: Active HTTP probes + passive error-count eviction
  • Sticky sessions: Cookie-based backend affinity with TTL and LRU eviction
  • Failover: Automatic switch to secondary pool when primary has no healthy backends
  • Forwarded headers: Upstreams receive normalized X-Forwarded-For, X-Forwarded-Host, X-Forwarded-Proto, and X-Forwarded-Port
  • Upstream timeouts: A per-service request_timeout (plain HTTP) returns 504 Gateway Timeout when a backend is too slow to respond
  • TLS termination: rustls (pure Rust, no OpenSSL) + ACME/Let's Encrypt (HTTP-01, DNS-01/Cloudflare, DNS-01/Route53)
  • Hot reload: File-watch config reload (inotify/kqueue) — unchanged HTTP/TCP entrypoints are not rebound
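Least-connections selection and failover to a secondary pool can be pictured with a short Rust sketch. It assumes each backend carries a health flag (fed by active probes or passive eviction) and an active-connection count; the Backend type and pick_backend function are hypothetical and not part of the a3s_gateway API.

```rust
/// Hypothetical backend record used only for this sketch.
#[derive(Debug, Clone)]
struct Backend {
    url: String,
    healthy: bool,
    active_connections: u32,
}

/// Least-connections choice among the healthy backends of one pool.
/// `min_by_key` returns the first minimum, so ties go to declaration order.
fn least_connections(pool: &[Backend]) -> Option<&Backend> {
    pool.iter()
        .filter(|b| b.healthy)
        .min_by_key(|b| b.active_connections)
}

/// Try the primary pool first; fail over to the secondary pool only when
/// the primary has no healthy backend at all.
fn pick_backend<'a>(primary: &'a [Backend], secondary: &'a [Backend]) -> Option<&'a Backend> {
    least_connections(primary).or_else(|| least_connections(secondary))
}
```

A production balancer would also need the connection counts to be updated atomically as requests start and finish; the sketch only shows the selection rule.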

Protocols

| Protocol | Capability |
|---|---|
| HTTP/1.1 & HTTP/2 | Full reverse proxy, hop-by-hop header filtering, normalized X-Forwarded-* metadata |
| WebSocket | Upgrade detection, bidirectional relay, named-channel multiplexing |
| SSE / Streaming | Chunked transfer with zero buffering, optimized for LLM token streams |
| gRPC | HTTP/2 h2c forwarding with header translation |
| TCP | Raw byte relay, SNI-based routing (HostSNI()), IP filtering |
| UDP | Session-based datagram relay |
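Hop-by-hop header filtering and X-Forwarded-For normalization can be sketched over a plain list of header pairs. The sketch strips the classic hop-by-hop set from RFC 2616 section 13.5.1 (carried forward in RFC 7230) plus any header named in Connection, then appends the client address to X-Forwarded-For. Representing headers as Vec<(String, String)> and the function name are assumptions for illustration; WebSocket upgrades, where a proxy must handle Upgrade and Connection explicitly, are out of scope here.

```rust
/// Classic hop-by-hop headers (RFC 2616 section 13.5.1): never forwarded.
const HOP_BY_HOP: &[&str] = &[
    "connection",
    "keep-alive",
    "proxy-authenticate",
    "proxy-authorization",
    "te",
    "trailer",
    "transfer-encoding",
    "upgrade",
];

/// Hypothetical helper: drop hop-by-hop headers (including any listed in
/// `Connection`) and append the client IP to X-Forwarded-For.
fn prepare_upstream_headers(headers: Vec<(String, String)>, client_ip: &str) -> Vec<(String, String)> {
    // Header names listed in Connection are hop-by-hop for this message too.
    let extra: Vec<String> = headers
        .iter()
        .filter(|(name, _)| name.eq_ignore_ascii_case("connection"))
        .flat_map(|(_, value)| value.split(','))
        .map(|token| token.trim().to_ascii_lowercase())
        .filter(|token| !token.is_empty())
        .collect();

    let mut forwarded_for: Option<String> = None;
    let mut out = Vec::new();
    for (name, value) in headers {
        let lower = name.to_ascii_lowercase();
        if HOP_BY_HOP.contains(&lower.as_str()) || extra.contains(&lower) {
            continue;
        }
        if lower == "x-forwarded-for" {
            // Merge repeated X-Forwarded-For headers into one list.
            forwarded_for = Some(match forwarded_for {
                Some(prev) => format!("{prev}, {value}"),
                None => value,
            });
            continue;
        }
        out.push((name, value));
    }
    // Append this hop's client address as the last entry.
    let xff = match forwarded_for {
        Some(prev) => format!("{prev}, {client_ip}"),
        None => client_ip.to_string(),
    };
    out.push(("X-Forwarded-For".to_string(), xff));
    out
}
```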

Middlewares (15 built-in)

| Middleware | Config keys | Purpose |
|---|---|---|
| jwt | value | JWT validation (HS256) |
| api-key | header, keys | API key enforcement |
| basic-auth | username, password | HTTP Basic Auth |
| forward-auth | forward_auth_url | Delegate auth to an external IdP |
| rate-limit | rate, burst | Token bucket (in-process) |
| rate-limit-redis | rate, burst, redis_url | Distributed rate limiting |
| cors | allowed_origins, allowed_methods | CORS headers |
| headers | request_headers, response_headers | Header manipulation |
| strip-prefix | prefixes | Path prefix removal |
| body-limit | max_body_bytes | Request body cap (413 when exceeded) |
| retry | max_retries, retry_interval_ms | Retry on upstream failure |
| circuit-breaker | failure_threshold, cooldown_secs, success_threshold | Closed/Open/HalfOpen state machine |
| ip-allow | allowed_ips | CIDR/IP allowlist |
| compress | | brotli/gzip/deflate (br preferred) |
| tcp-filter | | Connection limit + IP allowlist for TCP |
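The rate-limit row names a token bucket with rate and burst keys. A minimal Rust sketch of that algorithm follows. It assumes rate is tokens added per second and burst is the bucket capacity (the README does not state the time unit), and it takes the current time as a parameter so the behaviour is deterministic; TokenBucket is a hypothetical type.

```rust
/// Hypothetical token bucket: `burst` is the capacity and `rate` the refill
/// in tokens per second (the per-second unit is an assumption).
struct TokenBucket {
    rate: f64,
    burst: f64,
    tokens: f64,
    last_secs: f64,
}

impl TokenBucket {
    /// Start full, so an idle client may send `burst` requests back to back.
    fn new(rate: f64, burst: f64, now_secs: f64) -> Self {
        TokenBucket { rate, burst, tokens: burst, last_secs: now_secs }
    }

    /// Refill by elapsed time (capped at `burst`), then try to spend one token.
    /// Returns false when the request should be rejected.
    fn try_acquire(&mut self, now_secs: f64) -> bool {
        let elapsed = (now_secs - self.last_secs).max(0.0);
        // Never move the clock backwards, so skewed timestamps cannot over-refill.
        self.last_secs = self.last_secs.max(now_secs);
        self.tokens = (self.tokens + elapsed * self.rate).min(self.burst);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A rejected call is where the middleware would answer the client, typically with 429 Too Many Requests.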

Observability

All observability features are individually configurable — disable any of them to reduce per-request overhead in high-throughput scenarios.

observability {
  metrics_enabled     = true   # Prometheus metrics (default: true)
  access_log_enabled  = true   # Structured JSON access log (default: true)
  tracing_enabled     = false  # W3C Trace Context propagation (default: true)
}

  • Prometheus metrics: Per-router/service/backend request counts, latency histograms, error rates, autoscaler state
  • Structured access log: JSON entries — timestamp, client IP, method, path, status, duration, backend, router
  • Distributed tracing: W3C Trace Context and B3/Zipkin propagation; the gateway's span context is injected into upstream requests
  • Management: CLI-first operations plus an optional authenticated/mTLS Dashboard API on a dedicated listener
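W3C Trace Context propagation amounts to reading the incoming traceparent header and forwarding a new one that keeps the trace ID but names the gateway's own span as the parent. A minimal sketch for version 00 headers follows; it assumes the caller supplies the new 16-hex-digit span ID (a real implementation would generate it randomly), and child_traceparent is a hypothetical helper. B3 propagation is not covered.

```rust
/// True if `s` is exactly `len` lowercase hex digits.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// True for a valid trace-id or span-id: right length, lowercase hex, not all zeros.
fn is_valid_id(s: &str, len: usize) -> bool {
    is_lower_hex(s, len) && s.bytes().any(|b| b != b'0')
}

/// Build the outgoing `traceparent` for the upstream request: keep the
/// incoming trace-id and flags, replace parent-id with our span ID.
/// Returns None for a malformed version-00 header; the caller would then
/// start a fresh trace instead.
fn child_traceparent(incoming: &str, span_id: &str) -> Option<String> {
    let parts: Vec<&str> = incoming.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    let (trace_id, parent_id, flags) = (parts[1], parts[2], parts[3]);
    if !is_valid_id(trace_id, 32)
        || !is_valid_id(parent_id, 16)
        || !is_lower_hex(flags, 2)
        || !is_valid_id(span_id, 16)
    {
        return None;
    }
    Some(format!("00-{trace_id}-{span_id}-{flags}"))
}
```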

Service Discovery

  • File provider: ACL with directory watching and hot reload
  • DNS provider: Hostname resolution with TTL-based caching
  • Docker provider: Auto-discover services from container labels (a3s.router.rule, a3s.service.port)
  • Health-based discovery: Auto-register backends via /.well-known/a3s-service.json
  • Kubernetes Ingress (kube feature): Watch networking.k8s.io/v1/Ingress resources
  • Kubernetes IngressRoute CRD (kube feature): Traefik-style advanced routing
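TTL-based caching in the DNS provider can be modelled as a map from hostname to resolved addresses plus an expiry instant. This is a minimal sketch under the assumption that the caller performs the actual lookup and reports the record TTL; DnsCache is a hypothetical type, and time is passed in explicitly to keep it testable.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Hypothetical TTL cache: hostname -> (addresses, expiry instant).
struct DnsCache {
    entries: HashMap<String, (Vec<IpAddr>, Instant)>,
}

impl DnsCache {
    fn new() -> Self {
        DnsCache { entries: HashMap::new() }
    }

    /// Store a fresh lookup result that stays valid for `ttl`.
    fn insert(&mut self, host: &str, addrs: Vec<IpAddr>, ttl: Duration, now: Instant) {
        self.entries.insert(host.to_string(), (addrs, now + ttl));
    }

    /// Return cached addresses while fresh; None means "resolve again".
    fn get(&mut self, host: &str, now: Instant) -> Option<Vec<IpAddr>> {
        if let Some((addrs, expires)) = self.entries.get(host) {
            if now < *expires {
                return Some(addrs.clone());
            }
        }
        // Missing or expired: drop any stale entry so the next lookup refreshes it.
        self.entries.remove(host);
        None
    }
}
```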

Performance

Built for throughput. The proxy hot path uses direct hyper HTTP/1.1 connection pooling with streaming request body passthrough — no intermediate buffering for plain HTTP traffic.

| Metric | Value | Conditions |
|---|---|---|
| Throughput | 67,000 req/s | 200 concurrent connections, Apple Silicon, loopback |
| Latency overhead | 69 µs (p50) | Single connection; measures pure gateway overhead |
| Tail latency | 5.6 ms (p99) | 200 concurrent connections |
| Routing | 90 ns | Match against a 100-route table |
| Middleware pipeline | 130 ns per middleware | Pre-compiled at startup, no per-request allocation |
| Config reload | 3 ms | 300-service configuration, hot reload |

Benchmarked with oha (Rust HTTP load generator) against a hyper backend on loopback. Criterion micro-benchmarks included in benches/.


Configuration

All configuration uses ACL format (.acl files). Changes are picked up automatically when file watching is enabled — no restart required.

# Full example — LLM API gateway with safe rollout
entrypoints "web"       { address = "0.0.0.0:80" }
entrypoints "websecure" {
  address = "0.0.0.0:443"
  tls { cert_file = "/etc/certs/cert.pem"; key_file = "/etc/certs/key.pem" }
}

routers "llm-api" {
  rule        = "Host(`api.example.com`) && PathPrefix(`/v1`)"
  service     = "llm-service"
  entrypoints = ["websecure"]
  middlewares = ["auth-jwt", "rate-limit", "circuit-breaker"]
}

services "llm-service" {
  load_balancer {
    strategy        = "least-connections"
    request_timeout = "60s"
    servers         = [
      { url = "http://127.0.0.1:8001" },
      { url = "http://127.0.0.1:8002" }
    ]
    health_check { path = "/health"; interval = "10s" }
  }

  # Mirror 5% of traffic to a new model for shadow testing
  mirror { service = "llm-canary"; percentage = 5 }

  # Scale to zero when idle — buffer requests during cold start
  scaling {
    min_replicas          = 0
    max_replicas          = 8
    container_concurrency = 10
    target_utilization    = 0.7
    buffer_enabled        = true
    executor              = "box"
  }

  # Shift traffic from v1 to v2 over 10 steps, 1 minute apart
  # Auto-rollback if error rate > 5% or p99 > 5s
  rollout {
    from                 = "v1"
    to                   = "v2"
    step_percent         = 10
    step_interval_secs   = 60
    error_rate_threshold = 0.05
    latency_threshold_ms = 5000
  }
}

middlewares "auth-jwt"       { type = "jwt"; value = env("JWT_SECRET") }
middlewares "rate-limit"     { type = "rate-limit"; rate = 100; burst = 20 }
middlewares "circuit-breaker" {
  type              = "circuit-breaker"
  failure_threshold = 5
  cooldown_secs     = 30
  success_threshold = 2
}

providers {
  file { watch = true; directory = "/etc/gateway/conf.d/" }
}
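The rollout block can be read as a decision made once per step_interval_secs. The sketch below models one step as a pure function: it assumes the thresholds are checked against the new revision's error rate and p99 latency observed during the last interval, and treats them as strict upper bounds to match the "> 5%" and "> 5s" wording in the config comment. RolloutConfig, RolloutAction, and next_step are hypothetical names.

```rust
/// Hypothetical mirror of the relevant `rollout { ... }` fields.
struct RolloutConfig {
    step_percent: u32,
    error_rate_threshold: f64,
    latency_threshold_ms: u64,
}

#[derive(Debug, PartialEq)]
enum RolloutAction {
    /// Send this percentage of traffic to the new revision (`to`).
    Advance(u32),
    /// 100% reached: the new revision takes all traffic.
    Complete,
    /// A threshold was breached: send all traffic back to `from`.
    Rollback,
}

/// Decide the next step from the current share on the new revision and
/// the metrics observed for it during the last interval.
fn next_step(cfg: &RolloutConfig, current_percent: u32, error_rate: f64, p99_ms: u64) -> RolloutAction {
    if error_rate > cfg.error_rate_threshold || p99_ms > cfg.latency_threshold_ms {
        return RolloutAction::Rollback;
    }
    let next = (current_percent + cfg.step_percent).min(100);
    if next >= 100 {
        RolloutAction::Complete
    } else {
        RolloutAction::Advance(next)
    }
}
```

With step_percent = 10 and a healthy revision, the share goes 10, 20, ..., 90 and the tenth step completes the rollout, roughly ten minutes at step_interval_secs = 60.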

Service Discovery Contract

Backends expose /.well-known/a3s-service.json (RFC 8615) for automatic registration:

{
  "name": "llm-service",
  "version": "2.1.0",
  "routes": [
    { "rule": "PathPrefix(`/v1`)", "middlewares": ["rate-limit"] }
  ],
  "health_path": "/health",
  "weight": 1
}

Architecture

                    ┌──────────────────────────────────────────────┐
                    │                 A3S Gateway                  │
                    │                                              │
  Client ───────────┤  Entrypoint (HTTP/HTTPS/TCP/UDP)             │
  (HTTP/WS/SSE/     │      │                                       │
   gRPC/TCP/UDP)    │      ▼                                       │
                    │  TLS Termination (rustls + ACME)             │
                    │      │                                       │
                    │      ▼                                       │
                    │  Router ──── Rule Matching                   │
                    │      │       (Host, Path, Headers, SNI)      │
                    │      ▼                                       │
                    │  Middleware Pipeline                         │
                    │  ┌──────┬────────┬──────────┬───────────┐    │
                    │  │Auth  │  Rate  │  Circuit │  Compress │    │
                    │  │JWT   │  Limit │  Breaker │  CORS     │    │
                    │  └──────┴────────┴──────────┴───────────┘    │
                    │      │                                       │
                    │      ▼                                       │
                    │  Service (LB + Health + Failover + Mirror)   │
                    │      │                                       │
                    │      ▼                                       │
                    │  Scaling (Knative autoscaler + buffer)       │
                    │      │                                       │
                    └──────┼───────────────────────────────────────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
         ┌────────┐  ┌────────┐  ┌──────────┐
         │HTTP    │  │gRPC    │  │TCP/UDP   │
         │Backend │  │Backend │  │Backend   │
         └────────┘  └────────┘  └──────────┘

Core Components

| Component | Responsibility |
|---|---|
| Gateway | Lifecycle orchestrator; owns all subsystems |
| Entrypoint | Network listener (HTTP, HTTPS, TCP, UDP) |
| Router | Rule-based request matching; O(n) scan with priority ordering |
| Middleware | Composable request/response pipeline; pre-compiled per router at startup |
| Service | Backend pool: load balancing, health, mirroring, failover |
| Scaling | Knative autoscaler: concurrency tracking, request buffer, revision router |
| Provider | Dynamic config sources: file, DNS, Docker, discovery, Kubernetes |
| Proxy | Protocol forwarder: HTTP, WebSocket, SSE, gRPC, TCP, UDP |
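The Router row's "O(n) scan with priority ordering" can be illustrated with a simplified rule set. This sketch supports only Host() and PathPrefix() conditions joined by &&, sorts routes once by a numeric priority (highest first), and returns the first match; the Route type and its priority field are hypothetical simplifications, not the gateway's rule engine.

```rust
/// Hypothetical compiled route: every `Some` condition must hold (logical &&).
struct Route {
    name: &'static str,
    priority: i32,
    host: Option<&'static str>,
    path_prefix: Option<&'static str>,
}

/// Sort once at load time so the per-request scan can stop at the first hit.
fn compile(mut routes: Vec<Route>) -> Vec<Route> {
    // Stable sort: routes with equal priority keep their declaration order.
    routes.sort_by(|a, b| b.priority.cmp(&a.priority));
    routes
}

/// O(n) scan: the first (highest-priority) route whose conditions all hold.
fn match_route<'a>(routes: &'a [Route], host: &str, path: &str) -> Option<&'a str> {
    routes
        .iter()
        .find(|r| {
            r.host.map_or(true, |h| h.eq_ignore_ascii_case(host))
                && r.path_prefix.map_or(true, |p| path.starts_with(p))
        })
        .map(|r| r.name)
}
```

Sorting at load time keeps the per-request work to a single linear pass that can stop at the first matching route.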

Gateway Lifecycle

Created → Starting → Running ⇄ Reloading → Stopping → Stopped
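The lifecycle line can be encoded as a transition check. This sketch allows the arrows drawn above plus Running → Stopping, which is an assumption (the diagram draws the arrow into Stopping from Reloading). The State enum is a hypothetical stand-in, not the crate's GatewayState definition.

```rust
/// Hypothetical lifecycle states mirroring the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Created,
    Starting,
    Running,
    Reloading,
    Stopping,
    Stopped,
}

/// Allowed transitions. Running -> Stopping is an assumption; every other
/// pair comes straight from the lifecycle diagram.
fn can_transition(from: State, to: State) -> bool {
    matches!(
        (from, to),
        (State::Created, State::Starting)
            | (State::Starting, State::Running)
            | (State::Running, State::Reloading)
            | (State::Reloading, State::Running)
            | (State::Reloading, State::Stopping)
            | (State::Running, State::Stopping)
            | (State::Stopping, State::Stopped)
    )
}
```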

Hot reload (the Reloading state) atomically replaces the router table and service registry by swapping a shared runtime snapshot. HTTP/TCP entrypoints keep their sockets when their listener configuration is unchanged. When HTTP/TCP listeners are added or moved, only the changed entrypoints are reconciled, and if a new bind fails the unchanged listeners stay active. UDP entrypoints are not reconciled this way; they are still restarted explicitly on reload.
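A common way to get an atomic snapshot swap is to keep the live routing state behind a shared pointer: each request clones the current Arc once and keeps using that snapshot, while a reload builds a complete new snapshot and replaces the pointer in one step. The std-only sketch below shows that pattern; the Snapshot and Runtime types are hypothetical, and the source does not say which synchronization primitive the gateway uses.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical runtime snapshot: router table and service registry together.
struct Snapshot {
    version: u64,
    routes: HashMap<String, String>, // router name -> service name
}

struct Runtime {
    current: RwLock<Arc<Snapshot>>,
}

impl Runtime {
    fn new(initial: Snapshot) -> Self {
        Runtime { current: RwLock::new(Arc::new(initial)) }
    }

    /// Request path: the read lock is held only long enough to clone the Arc.
    fn snapshot(&self) -> Arc<Snapshot> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Reload path: the new snapshot is fully built before the single pointer
    /// swap, so readers see either the old state or the new one, never a mix.
    fn swap(&self, next: Snapshot) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

Requests already holding the old Arc finish against the old configuration, and the old snapshot is freed when the last of them drops it.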


Deployment

Homebrew

brew install a3s-lab/tap/a3s-gateway

Helm (Kubernetes)

helm install gateway deploy/helm/a3s-gateway \
  --set-file config=my-gateway.acl \
  --set service.type=LoadBalancer

Docker

docker run -v "$(pwd)/gateway.acl:/etc/a3s-gateway/gateway.acl" \
  -p 8080:8080 ghcr.io/a3s-lab/gateway:latest

Cargo

cargo install a3s-gateway

API Reference

Rust API

| Method | Description |
|---|---|
| Gateway::new(config) | Create from a GatewayConfig |
| start() | Bind listeners and begin proxying |
| shutdown() | Graceful drain and stop |
| reload(new_config) | Atomic hot reload without downtime |
| health() | Current health snapshot |
| metrics() | Prometheus metrics collector |
| state() | Current lifecycle state as a GatewayState enum |

Management

Management is CLI-first by default. The optional Dashboard API runs on a dedicated listener, so /api/gateway/* on traffic entrypoints remains ordinary user traffic that your own routers can handle. Every management request must come from an address listed in allowed_ips and, when auth_token_env is set, must also present a valid bearer token. Remote management listeners can additionally require HTTPS and client certificates.
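The two management checks can be sketched as a single predicate: the peer address must be in allowed_ips, and, when a token is configured, the Authorization header must carry it as a bearer token. This sketch assumes exact address matching (as in the example allowed_ips list) and compares tokens without exiting early on the first differing byte; authorize and constant_time_eq are hypothetical helpers, not the gateway's implementation.

```rust
use std::net::IpAddr;

/// Compare without short-circuiting on the first differing byte.
/// Note: unequal lengths still return early, so token length can leak.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Allow a management request only if the peer IP is allowlisted and, when
/// `expected_token` is Some (auth_token_env set), the bearer token matches.
fn authorize(
    peer: IpAddr,
    allowed_ips: &[IpAddr],
    expected_token: Option<&str>,
    authorization_header: Option<&str>,
) -> bool {
    if !allowed_ips.contains(&peer) {
        return false;
    }
    match expected_token {
        None => true,
        Some(expected) => authorization_header
            .and_then(|h| h.strip_prefix("Bearer "))
            .map_or(false, |token| constant_time_eq(token.as_bytes(), expected.as_bytes())),
    }
}
```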

a3s-gateway validate --config gateway.acl
a3s-gateway config --config gateway.acl summary
a3s-gateway config --config gateway.acl entrypoints
a3s-gateway config --config gateway.acl routes
a3s-gateway config --config gateway.acl services
a3s-gateway config --config gateway.acl middlewares
a3s-gateway config --config gateway.acl providers
a3s-gateway config --config gateway.acl json
a3s-gateway management events --url http://127.0.0.1:9090/api/gateway
a3s-gateway management validate --url http://127.0.0.1:9090/api/gateway --file gateway.acl
a3s-gateway management reload --url http://127.0.0.1:9090/api/gateway --file gateway.acl

Use --ca-cert, --client-cert, and --client-key with management events when the management listener requires mTLS.

management {
  enabled        = true
  address        = "127.0.0.1:9090"
  path_prefix    = "/api/gateway"
  auth_token_env = "A3S_GATEWAY_ADMIN_TOKEN"
  allowed_ips    = ["127.0.0.1", "::1"]

  tls {
    cert_file           = "/etc/a3s/admin/server.crt"
    key_file            = "/etc/a3s/admin/server.key"
    client_ca_file      = "/etc/a3s/admin/client-ca.crt"
    require_client_cert = true
    min_version         = "1.3"
  }
}

| Endpoint | Response |
|---|---|
| GET /api/gateway/health | Gateway health (JSON) |
| GET /api/gateway/metrics | Prometheus text format |
| GET /api/gateway/config | Active configuration (JSON) |
| GET /api/gateway/routes | Configured routes |
| GET /api/gateway/services | Services with backend health |
| GET /api/gateway/backends | All backends with connection counts |
| GET /api/gateway/events | Recent management security audit events |
| GET /api/gateway/version | Binary version |
| POST /api/gateway/config/validate | Validate an ACL payload without applying it |
| POST /api/gateway/config/reload | Transactionally reload from an ACL payload |

Development

cargo build -p a3s-gateway
cargo test -p a3s-gateway --all-features   # 1069 tests
cargo clippy -p a3s-gateway --all-features -- -D warnings
cargo bench --no-run --all-features        # compile benchmarks

Or use the justfile:

just ci              # fmt + lint + test (full gate)
just bench           # run criterion benchmarks
just release-check   # full pre-release validation

Project Structure

src/
├── lib.rs, main.rs          # Public API + CLI
├── gateway.rs               # Lifecycle orchestrator
├── dashboard.rs             # Optional dedicated management API
├── entrypoint.rs            # HTTP/HTTPS/TCP/UDP listeners + hot path
├── error.rs                 # GatewayError, Result
│
├── config/                  # ACL configuration model
│   └── acl.rs, entrypoint.rs, router.rs, service.rs, scaling.rs, middleware.rs
│
├── router/                  # Rule matching engine
│   ├── rule.rs              # Host/Path/Header/Method/SNI matchers
│   └── tcp.rs               # TCP SNI router
│
├── middleware/              # 15 built-in middleware types
│
├── service/                 # Backend pool management
│   └── load_balancer.rs, health_check.rs, passive_health.rs,
│       sticky.rs, mirror.rs, failover.rs
│
├── proxy/                   # Protocol forwarders
│   └── http_proxy.rs, websocket.rs, streaming.rs, grpc.rs,
│       tcp.rs, udp.rs, tls.rs, acme*.rs, ws_mux.rs
│
├── observability/
│   └── metrics.rs, access_log.rs, tracing.rs
│
├── scaling/                 # Knative-style autoscaler
│   └── executor.rs, autoscaler.rs, buffer.rs,
│       concurrency.rs, revision.rs, rollout.rs
│
└── provider/                # Config providers
    └── file_watcher.rs, dns.rs, docker.rs, discovery.rs,
        kubernetes.rs, kubernetes_crd.rs  (feature: kube)

benches/                     # Criterion benchmarks
├── routing.rs               # RouterTable::match_request
├── middleware_pipeline.rs   # Pipeline::process_request
└── acl_parse.rs             # GatewayConfig::from_acl

deploy/
└── helm/a3s-gateway/        # Helm chart

Stability

A3S Gateway follows Semantic Versioning. Starting with v1.0.0:

  • Stable: Public Rust API (Gateway, GatewayConfig, GatewayState, HealthStatus, GatewayError), ACL configuration format, Management API endpoints, CLI interface.
  • Unstable (may change in minor releases): #[doc(hidden)] modules (router, middleware), internal provider implementations, benchmark infrastructure.
  • MSRV: Rust 1.88. The MSRV may be raised in a minor release, but the new minimum will always trail the latest stable Rust by at least three releases.

See CHANGELOG.md for release history and RELEASING.md for the release process.


Community

Discord — questions, discussions, updates.

License

MIT

About

AI Native Traffic Layer
