When rolling out Codex CLI to a development team, the straightforward approach is to give each developer their own OpenAI API key. This provides basic per-key usage metrics through OpenAI's dashboard (spend and model usage), but it falls short in several areas:
- No content visibility -- you can see how many tokens each person used, but not what prompts or responses were sent. No auditing capability.
- No centralized logging -- all interaction data lives on OpenAI's side, inaccessible for internal compliance, analysis, or secret detection.
- Fragmented key management -- N developers means N keys to track, rotate, and revoke individually across OpenAI's dashboard.
- No secret detection foundation -- developers may inadvertently send credentials, API keys, or connection strings in their prompts, and you have no way to detect or prevent it.
Codex CLI supports pointing to a custom endpoint via OPENAI_BASE_URL or base_url in config.toml. The proxy sits between Codex CLI and the OpenAI API:
Developer's machine AWS (us-east-1) OpenAI
┌──────────────┐ ┌──────────────────────┐ ┌─────────────┐
│ Codex CLI │────>│ Lambda Function URL │────>│ OpenAI API │
│ │ │ ┌────────────────┐ │ │ │
│ OPENAI_BASE │ │ │ Lambda (Go) │ │ │ api.openai │
│ _URL = proxy │ │ │ │ │ │ .com │
│ │ │ │ 1. Validate │ │ │ │
│ OPENAI_API │ │ │ token │ │ │ │
│ _KEY = token │ │ │ 2. Forward to │ │ │ │
│ │<────│ │ OpenAI │ │ │ │
│ │ │ │ 3. Log to S3 │ │ │ │
│ │ │ └────────────────┘ │ │ │
└──────────────┘ │ │ │ └─────────────┘
│ ┌────┴────┐ │
│ │DynamoDB │ │
│ │(tokens) │ │
│ └─────────┘ │
│ ┌─────────┐ │
│ │ S3 │ │
│ │ (logs) │ │
│ └─────────┘ │
│ ┌─────────┐ │
│ │Secrets │ │
│ │Manager │ │
│ └─────────┘ │
└──────────────────────┘
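Given the diagram above, pointing a developer's machine at the proxy is a two-variable change. The Function URL and token below are placeholders, and whether the base URL should carry a /v1 suffix depends on your Codex CLI version:

```
# Hypothetical values -- use the Function URL and token from your deployment
export OPENAI_BASE_URL="https://<function-url-id>.lambda-url.us-east-1.on.aws/v1"
export OPENAI_API_KEY="tok_alice_..."   # proxy token, not a real OpenAI key
```

The same endpoint can instead be set via base_url in config.toml if you prefer checked-in configuration over environment variables.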
Lambda Function URLs provide a native HTTPS endpoint directly on the Lambda function, with no intermediate service. They have no extra cost beyond Lambda itself, support CORS natively, and the request runs for up to the full Lambda timeout (120 seconds) -- enough for long OpenAI responses.
- Cold start: ~50-100ms (vs 300-500ms for Node.js)
- Memory: Lower footprint for a proxy workload
- Cost: Less compute time per invocation
- Provisioned concurrency: 2 warm instances via the `live` alias to eliminate cold starts for the first concurrent requests
- Pay-per-request billing (essentially free for ~20 users)
- Simple key-value lookup: token -> (user_id, enabled)
- Easy to enable/disable tokens without redeployment
- Alternative: Secrets Manager (works for very small teams but less flexible)
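The lookup the proxy performs is small enough to sketch in full. Here an in-memory map stands in for the DynamoDB GetItem call (the `TokenRecord` struct and `validate` function are illustrative names, not the actual implementation), but the accept/reject logic is the same:

```go
package main

import "fmt"

// TokenRecord mirrors the assumed DynamoDB item shape:
// partition key = token, attributes user_id and enabled.
type TokenRecord struct {
	UserID  string
	Enabled bool
}

// validate maps a proxy token to a user. Unknown and disabled tokens
// are both rejected, which is how offboarding works without redeployment:
// the admin flips enabled to false and the next request gets a 401.
func validate(tokens map[string]TokenRecord, token string) (string, bool) {
	rec, ok := tokens[token]
	if !ok || !rec.Enabled {
		return "", false
	}
	return rec.UserID, true
}

func main() {
	tokens := map[string]TokenRecord{
		"tok_alice": {UserID: "alice", Enabled: true},
		"tok_bob":   {UserID: "bob", Enabled: false}, // offboarded
	}
	for _, tok := range []string{"tok_alice", "tok_bob", "tok_eve"} {
		user, ok := validate(tokens, tok)
		fmt.Println(tok, user, ok)
	}
}
```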
- Cheapest storage for a write-heavy, rarely-read workload
- Partitioned by `year=YYYY/month=MM/day=DD/token/request_id.json`
- 90-day lifecycle expiration (configurable in deploy.sh)
- Queryable with Athena for ad-hoc analysis
- Full request/response bodies stored (truncated at 100KB per body)
- Stable identifier for reporting and access control
- Admin assigns tokens on onboarding, disables on offboarding
- No automatic rotation (simplicity for ~20 users)
- No authentication layer (no VPN, no OAuth) -- the token IS the identity
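Because the token is the identity and there is no second factor, it must be unguessable. A minimal way to mint one, assuming a `tok_` prefix convention (the prefix and length are choices made for this sketch, not mandated by the design):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newToken mints a random bearer token. 16 bytes from crypto/rand
// (32 hex characters) is ample entropy for a credential; the "tok_"
// prefix just makes proxy tokens recognizable in logs and configs.
func newToken() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return "tok_" + hex.EncodeToString(buf), nil
}

func main() {
	tok, err := newToken()
	if err != nil {
		panic(err)
	}
	fmt.Println(tok) // e.g. tok_9f86d081884c7d65...
}
```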
The proxy acts as a pure HTTP passthrough:
- Request body is forwarded to OpenAI unchanged
- Response body is returned to the client unchanged
- All Codex CLI features (skills, MCPs, image uploads, streaming) work without modification
- The only changes: the `Authorization` header is swapped (proxy token -> real OpenAI key) and the request is logged
| Measure | Value |
|---|---|
| Lambda timeout | 120s |
| HTTP client timeout | 120s |
| Reserved concurrency | 25 |
| Provisioned concurrency | 2 (alias live) |
| Lambda memory | 512 MB |
| S3 log retention | 90 days |
| Log write | Async (goroutine, 10s timeout) |
The log write to S3 happens asynchronously in a goroutine so it doesn't add latency to the response. If the log write fails, the response still reaches the developer.