
Architecture

Problem

When rolling out Codex CLI to a development team, the straightforward approach is giving each developer their own OpenAI API key. This provides basic per-key usage metrics through OpenAI's dashboard (spend and model usage), but it falls short in several areas:

  • No content visibility -- you can see how many tokens each person used, but not what prompts or responses were sent. No auditing capability.
  • No centralized logging -- all interaction data lives on OpenAI's side, inaccessible for internal compliance, analysis, or secret detection.
  • Fragmented key management -- N developers means N keys to track, rotate, and revoke individually across OpenAI's dashboard.
  • No secret detection foundation -- developers may inadvertently send credentials, API keys, or connection strings in their prompts, and you have no way to detect or prevent it.

Solution: HTTP Proxy

Codex CLI supports pointing to a custom endpoint via OPENAI_BASE_URL or base_url in config.toml. The proxy sits between Codex CLI and the OpenAI API:

Developer's machine          AWS (us-east-1)              OpenAI
┌──────────────┐     ┌──────────────────────┐     ┌─────────────┐
│  Codex CLI   │────>│  Lambda Function URL │────>│ OpenAI API  │
│              │     │  ┌────────────────┐  │     │             │
│ OPENAI_BASE  │     │  │ Lambda (Go)    │  │     │ api.openai  │
│ _URL = proxy │     │  │                │  │     │ .com        │
│              │     │  │ 1. Validate    │  │     │             │
│ OPENAI_API   │     │  │    token       │  │     │             │
│ _KEY = token │     │  │ 2. Forward to  │  │     │             │
│              │<────│  │    OpenAI      │  │     │             │
│              │     │  │ 3. Log to S3   │  │     │             │
│              │     │  └────────────────┘  │     │             │
└──────────────┘     │         │            │     └─────────────┘
                     │    ┌────┴────┐       │
                     │    │DynamoDB │       │
                     │    │(tokens) │       │
                     │    └─────────┘       │
                     │    ┌─────────┐       │
                     │    │   S3    │       │
                     │    │ (logs)  │       │
                     │    └─────────┘       │
                     │    ┌─────────┐       │
                     │    │Secrets  │       │
                     │    │Manager  │       │
                     │    └─────────┘       │
                     └──────────────────────┘
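On the developer's machine, the setup reduces to two environment variables. A minimal sketch (the Function URL and token values are placeholders; whether the base URL needs a trailing /v1 depends on how Codex CLI joins paths, so verify against your deployment):

```shell
# Point Codex CLI at the proxy instead of api.openai.com.
# Placeholder values -- substitute your real Function URL and assigned token.
export OPENAI_BASE_URL="https://example-id.lambda-url.us-east-1.on.aws/v1"
export OPENAI_API_KEY="proxy-token-assigned-by-admin"
```

The same override can live in config.toml via base_url for developers who prefer file-based configuration.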

Key Design Decisions

Lambda Function URL

Lambda Function URLs provide a native HTTPS endpoint directly on the Lambda function, with no intermediate service. They have no extra cost beyond Lambda itself, support CORS natively, and the request runs for up to the full Lambda timeout (120 seconds) -- enough for long OpenAI responses.

Go for the Lambda

  • Cold start: ~50-100ms (vs 300-500ms for Node.js)
  • Memory: Lower footprint for a proxy workload
  • Cost: Less compute time per invocation
  • Provisioned concurrency: 2 warm instances on the live alias, eliminating cold starts for the first two concurrent requests

DynamoDB for token registry

  • Pay-per-request billing (essentially free for ~20 users)
  • Simple key-value lookup: token -> (user_id, enabled)
  • Easy to enable/disable tokens without redeployment
  • Alternative: Secrets Manager (works for very small teams but less flexible)
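The lookup itself is trivial. A sketch of the validation step, with an in-memory map standing in for the DynamoDB GetItem call (the TokenRecord and TokenStore names are illustrative, not the actual implementation):

```go
package main

import "fmt"

// TokenRecord mirrors the DynamoDB item: token -> (user_id, enabled).
type TokenRecord struct {
	UserID  string
	Enabled bool
}

// TokenStore abstracts the registry. In production this would be a
// DynamoDB GetItem per request; here a map stands in for the sketch.
type TokenStore map[string]TokenRecord

// Validate returns the user ID for a known, enabled token.
func (s TokenStore) Validate(token string) (string, error) {
	rec, ok := s[token]
	if !ok {
		return "", fmt.Errorf("unknown token")
	}
	if !rec.Enabled {
		return "", fmt.Errorf("token disabled")
	}
	return rec.UserID, nil
}

func main() {
	store := TokenStore{
		"tok-alice": {UserID: "alice", Enabled: true},
		"tok-bob":   {UserID: "bob", Enabled: false}, // offboarded
	}
	user, err := store.Validate("tok-alice")
	fmt.Println(user, err) // alice <nil>
	_, err = store.Validate("tok-bob")
	fmt.Println(err) // token disabled
}
```

Flipping Enabled to false in the table is all offboarding requires -- no redeployment, no key rotation on OpenAI's side.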

S3 for logs

  • Cheapest storage for write-heavy, read-rarely workload
  • Partitioned by year=YYYY/month=MM/day=DD/token/request_id.json
  • 90-day lifecycle expiration (configurable in deploy.sh)
  • Queryable with Athena for ad-hoc analysis
  • Full request/response bodies stored (truncated at 100KB per body)

One token per person

  • Stable identifier for reporting and access control
  • Admin assigns tokens on onboarding, disables on offboarding
  • No automatic rotation (simplicity for ~20 users)
  • No authentication layer (no VPN, no OAuth) -- the token IS the identity
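Since the token is the identity, it should be generated with a CSPRNG rather than chosen by hand. A sketch of what the admin's onboarding step might use (the cxp- prefix and 256-bit length are illustrative choices, not specified by this design):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newToken generates a random 256-bit proxy token, hex-encoded with an
// illustrative "cxp-" prefix so tokens are recognizable in logs.
func newToken() (string, error) {
	buf := make([]byte, 32) // 256 bits of entropy
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return "cxp-" + hex.EncodeToString(buf), nil
}

func main() {
	tok, err := newToken()
	if err != nil {
		panic(err)
	}
	fmt.Println(tok) // e.g. cxp-<64 hex chars>
}
```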

Transparency

The proxy acts as a pure HTTP passthrough:

  • Request body is forwarded to OpenAI unchanged
  • Response body is returned to the client unchanged
  • All Codex CLI features (skills, MCPs, image uploads, streaming) work without modification
  • The only changes: Authorization header is swapped (proxy token -> real OpenAI key) and the request is logged

Resilience

Measure                    Value
Lambda timeout             120s
HTTP client timeout        120s
Reserved concurrency       25
Provisioned concurrency    2 (alias live)
Lambda memory              512 MB
S3 log retention           90 days
Log write                  Async (goroutine, 10s timeout)

The log write to S3 happens asynchronously in a goroutine so it doesn't add latency to the response. If the log write fails, the response still reaches the developer.