When rolling out Codex CLI to a development team, the straightforward approach is to give each developer their own OpenAI API key. This provides basic per-key usage metrics through OpenAI's dashboard (spend and model usage), but it falls short in several areas:
- No content visibility -- you can see how many tokens each person used, but not what prompts or responses were sent. No auditing capability.
- No centralized logging -- all interaction data lives on OpenAI's side, inaccessible for internal compliance, analysis, or secret detection.
- Fragmented key management -- N developers means N keys to track, rotate, and revoke individually across OpenAI's dashboard.
- No secret detection foundation -- developers may inadvertently send credentials, API keys, or connection strings in their prompts, and you have no way to detect or prevent it.
Codex CLI supports pointing to a custom endpoint via OPENAI_BASE_URL or base_url in config.toml. The proxy sits between Codex CLI and the OpenAI API:
Developer's machine AWS (us-east-1) OpenAI
┌──────────────┐ ┌──────────────────────┐ ┌─────────────┐
│ Codex CLI │────>│ Lambda Function URL │────>│ OpenAI API │
│ │ │ ┌────────────────┐ │ │ │
│ OPENAI_BASE │ │ │ Lambda (Go) │ │ │ api.openai │
│ _URL = proxy │ │ │ │ │ │ .com │
│ │ │ │ 1. Validate │ │ │ │
│ OPENAI_API │ │ │ token │ │ │ │
│ _KEY = token │ │ │ 2. Forward to │ │ │ │
│ │<────│ │ OpenAI │ │ │ │
│ │ │ │ 3. Log to S3 │ │ │ │
│ │ │ └────────────────┘ │ │ │
└──────────────┘ │ │ │ └─────────────┘
│ ┌────┴────┐ │
│ │DynamoDB │ │
│ │(tokens) │ │
│ └─────────┘ │
│ ┌─────────┐ │
│ │ S3 │ │
│ │ (logs) │ │
│ └─────────┘ │
│ ┌─────────┐ │
│ │Secrets │ │
│ │Manager │ │
│ └─────────┘ │
└──────────────────────┘
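Given the diagram above, pointing a developer's machine at the proxy is a two-variable change. The Function URL and token below are placeholders, and whether the base URL should carry a /v1 suffix depends on your Codex CLI version:

```
# Hypothetical values -- use the Function URL and token from your deployment
export OPENAI_BASE_URL="https://<function-url-id>.lambda-url.us-east-1.on.aws/v1"
export OPENAI_API_KEY="tok_alice_..."   # proxy token, not a real OpenAI key
```

The same endpoint can instead be set via base_url in config.toml if you prefer checked-in configuration over environment variables.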
Lambda Function URLs provide a native HTTPS endpoint directly on the Lambda function, with no intermediate service. They have no extra cost beyond Lambda itself, support CORS natively, and the request runs for up to the full Lambda timeout (120 seconds) -- enough for long OpenAI responses.
- Cold start: ~50-100ms (vs 300-500ms for Node.js)
- Memory: Lower footprint for a proxy workload
- Cost: Less compute time per invocation
- Provisioned concurrency: 2 warm instances via the `live` alias to eliminate cold starts for the first concurrent requests
- Pay-per-request billing (essentially free for ~20 users)
- Simple key-value lookup: token -> (user_id, enabled)
- Easy to enable/disable tokens without redeployment
- Alternative: Secrets Manager (works for very small teams but less flexible)
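The lookup the proxy performs is small enough to sketch in full. Here an in-memory map stands in for the DynamoDB GetItem call (the `TokenRecord` struct and `validate` function are illustrative names, not the actual implementation), but the accept/reject logic is the same:

```go
package main

import "fmt"

// TokenRecord mirrors the assumed DynamoDB item shape:
// partition key = token, attributes user_id and enabled.
type TokenRecord struct {
	UserID  string
	Enabled bool
}

// validate maps a proxy token to a user. Unknown and disabled tokens
// are both rejected, which is how offboarding works without redeployment:
// the admin flips enabled to false and the next request gets a 401.
func validate(tokens map[string]TokenRecord, token string) (string, bool) {
	rec, ok := tokens[token]
	if !ok || !rec.Enabled {
		return "", false
	}
	return rec.UserID, true
}

func main() {
	tokens := map[string]TokenRecord{
		"tok_alice": {UserID: "alice", Enabled: true},
		"tok_bob":   {UserID: "bob", Enabled: false}, // offboarded
	}
	for _, tok := range []string{"tok_alice", "tok_bob", "tok_eve"} {
		user, ok := validate(tokens, tok)
		fmt.Println(tok, user, ok)
	}
}
```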
- Cheapest storage for a write-heavy, rarely-read workload
- Partitioned by `year=YYYY/month=MM/day=DD/token/request_id.json`
- 90-day lifecycle expiration (configurable in deploy.sh)
- Queryable with Athena for ad-hoc analysis
- Full request/response bodies stored (truncated at 100KB per body)
- Stable identifier for reporting and access control
- Admin assigns tokens on onboarding, disables on offboarding
- No automatic rotation (simplicity for ~20 users)
- No authentication layer (no VPN, no OAuth) -- the token IS the identity
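Because the token is the identity and there is no second factor, it must be unguessable. A minimal way to mint one, assuming a `tok_` prefix convention (the prefix and length are choices made for this sketch, not mandated by the design):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newToken mints a random bearer token. 16 bytes from crypto/rand
// (32 hex characters) is ample entropy for a credential; the "tok_"
// prefix just makes proxy tokens recognizable in logs and configs.
func newToken() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return "tok_" + hex.EncodeToString(buf), nil
}

func main() {
	tok, err := newToken()
	if err != nil {
		panic(err)
	}
	fmt.Println(tok) // e.g. tok_9f86d081884c7d65...
}
```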
The proxy acts as a pure HTTP passthrough:
- Request body is forwarded to OpenAI unchanged
- Response body is returned to the client unchanged
- All Codex CLI features (skills, MCPs, image uploads, streaming) work without modification
- The only changes: the `Authorization` header is swapped (proxy token -> real OpenAI key) and the request is logged
| Measure | Value |
|---|---|
| Lambda timeout | 120s |
| HTTP client timeout | 120s |
| Reserved concurrency | 25 |
| Provisioned concurrency | 2 (alias live) |
| Lambda memory | 512 MB |
| S3 log retention | 90 days |
| Log write | Async (goroutine, 10s timeout) |
The log write to S3 happens asynchronously in a goroutine so it doesn't add latency to the response. If the log write fails, the response still reaches the developer.