feat: native OpenAI Codex (ChatGPT Plus) OAuth provider #22

Open

nsyring wants to merge 28 commits into agent0ai:main from nsyring:feat/openai-codex-oauth

Conversation


@nsyring commented Apr 23, 2026

Summary

Adds a third LLM provider (openai-codex) that routes chat completions through the official OpenAI Codex OAuth device-code flow, so users with a ChatGPT Plus subscription can use it directly inside space-agent without a separate OpenAI Platform API key.

The provider is additive: the existing OpenRouter / OpenAI-compatible API path and the Hugging Face local path are byte-compatible with main. The Chat-Completions SSE parser is not modified. A dedicated LLM-client subclass (overlay) and transport function (admin) own the Responses-API SSE event stream, so OpenRouter regressions are structurally impossible from this change.

⚠️ Disclaimer

This provider uses your ChatGPT Plus subscription via the official OpenAI Codex OAuth flow — the same device-code flow the codex CLI and the Codex VS Code extension use. OpenAI's terms of service apply. Use at your own risk.

Screenshots

1. Logged-out state (overlay settings, ChatGPT tab)

2. Login-pending state (device code + verification URL)

3. Signed-in state (live model catalog from discovery)

(Attaching after PR opens — logged-out / login-pending / signed-in / successful Codex chat / DevTools headers proof)

Tested with

  • ChatGPT Plus subscription, gpt-5.4-mini and gpt-5.4, overlay chat + admin chat, fresh-login + refresh + long-session paths
  • Browser (Firefox 150) + packaged Electron desktop build (npm run desktop:dev) — both run the new provider end-to-end
  • 51 pure-function tests: request-shape conversion, SSE event mapping, token manager (always-fresh-read + single-flight), models-response parser, shipped catalog; run with node --test tests/openai_codex_*_test.mjs
  • Manual smoke: node space serve starts cleanly; curl against the three /api/openai_codex_* endpoints without a session returns HTTP 401 with the standard space-agent auth-gate body
  • Refresh flow: after access-token expiry (~1h) the next overlay chat transparently triggers /api/openai_codex_token_refresh, the rotated refresh_token lands in the encrypted config, and chat continues without user interaction

Why this touches the server

Space Agent prefers frontend implementations. This feature adds three authenticated server endpoints (openai_codex_auth_start, openai_codex_auth_poll, openai_codex_token_refresh) because OpenAI refresh tokens use single-use rotation: if two browser tabs refresh concurrently, one succeeds and the other returns invalid_grant, discarding the only valid refresh token and forcing a full re-login. A frontend-only implementation cannot provide the serialization needed to prevent that loss.

This falls under shared-data integrity per /server/AGENTS.md. The server-side layer is intentionally thin: it owns OAuth HTTP traffic and an in-process single-writer mutex (server/lib/openai_codex/refresh_lock.js), and never reads or writes tokens itself — token persistence stays on the frontend under userCrypto:-encrypted entries in ~/conf/onscreen-agent.yaml and ~/conf/admin-chat.yaml, via the same envelope pattern the existing api_key field already uses.

Known limitation: the mutex is in-process, so clustered deployments (WORKERS>1) can still race between workers. This is acceptable for the typical single-user single-browser-profile scenario; clustered-runtime hardening is noted as a follow-up in server/lib/openai_codex/AGENTS.md.
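
For orientation, a minimal sketch of the single-flight pattern described above, with illustrative names (the shipped refresh_lock.js may differ in detail):

```js
// Hedged sketch: an in-process single-writer coalescer keyed by the
// refresh-token string. Because the Map lives in one process, workers in a
// clustered deployment (WORKERS>1) do not share it; that is the limitation
// noted above.
const inFlight = new Map(); // refresh_token -> Promise<tokens>

export function withRefreshLock(refreshToken, performRefresh) {
  const existing = inFlight.get(refreshToken);
  if (existing) return existing; // a concurrent caller awaits the same upstream call

  const promise = performRefresh(refreshToken).finally(() => {
    inFlight.delete(refreshToken); // release so the next rotation can refresh again
  });
  inFlight.set(refreshToken, promise);
  return promise;
}
```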

Proxy Interaction (important for reviewers)

The Codex endpoint sits behind Cloudflare bot detection that rejects standard browser User-Agent strings, so direct browser fetches to chatgpt.com/backend-api/codex/* are blocked. Requests must route through the existing server/router/proxy.js, which forwards the explicit User-Agent: codex_cli_rs/... and originator: codex_cli_rs headers set in applyCodexHeaders(); browsers silently ignore JavaScript-set User-Agent values, so the proxy path is the only way those headers reach the upstream. Without the proxy this feature cannot function: it is a factual dependency of the implementation, not a preference.

The proxy is not a new backend endpoint; it is the existing space-agent outbound proxy shared with other cross-origin reads. The Cookie header is stripped by the proxy before the upstream forward, so space_session never leaks to chatgpt.com.

Architecture

Mirrored wiring across the two chat surfaces. The overlay uses the existing client-subclass pattern; the admin runtime is function-based so Codex ships as a new transport function alongside the existing API / Local dispatcher:

| Aspect | Overlay | Admin |
| --- | --- | --- |
| Transport | OnscreenAgentCodexLlmClient subclass | streamAdminAgentCodexCompletion function |
| SSE reader | readCodexStreamingResponse | readCodexStreamingResponse (duplicated) |
| Hook path | ext/js/_core/onscreen_agent/.../openai-codex.js | ext/js/_core/admin/views/agent/.../openai-codex.js |
| Tokens | ~/conf/onscreen-agent.yaml | ~/conf/admin-chat.yaml |
| Tab label | ChatGPT | ChatGPT |

The two chat surfaces keep their existing Chat-Completions SSE parser duplication (pre-existing before this PR). Consolidating that parser is tracked as a follow-up; this PR mirrors Codex into both surfaces analogously to keep the change surface narrow.

Request-shape transformation (chatToResponsesRequest) and SSE event mapping (mapCodexEventToChatFrames) live in pure, testable helpers under app/L0/_all/mod/_core/openai_codex/. Each is fully covered by unit tests, including regression guards for the bugs surfaced during live testing.

No new runtime dependencies

Implementation uses only Node built-ins (crypto, http, Buffer) and existing space-agent utilities (space.api, space.utils.userCrypto, space.utils.yaml, space.proxy). Zero new npm dependencies added. package.json and package-lock.json are unchanged.

No regression for the existing API path

OpenRouter and other OpenAI-compatible API-provider configurations are byte-compatible with main:

  • Chat-Completions SSE parser in both chat surfaces is untouched.
  • Existing request-body builder for the API tab is untouched.
  • Storage layer's api_key encoding/decoding is untouched (the new codex_tokens path runs alongside).
  • createOnscreenAgentLlmClient / streamAdminAgentCompletion gain one additional provider branch; existing branches are unchanged.
  • Settings UI adds one tab; the existing API and Local tabs render and save identically.

Browser support matrix unchanged: no new polyfills; the new code uses atob(), fetch(), and AbortController, all widely supported since ~2019. No Electron-specific hooks were added; the packaged desktop build works because it uses the same renderer code.

Known pitfalls handled

  • Cloudflare originator headers: load-bearing; documented with an anti-refactor warning in app/L0/_all/mod/_core/openai_codex/AGENTS.md.
  • Empty response.output on completion: text is accumulated live from response.output_text.delta; the final response.completed payload is only consulted for usage and finish_reason. Documented as the ignore-final-output contract.
  • Codex rejects several Chat-Completions fields (max_output_tokens, temperature, tools, etc.): all stripped in chatToResponsesRequest; list is maintained explicitly so future additions are a one-liner.
  • User/assistant role asymmetry: Codex rejects input_text under an assistant entry (must be output_text); convertContentParts(content, textType) takes the role-aware type, with a regression test that walks the full input array and rejects any misplaced input_text.
  • Cross-tab refresh race: ensureFreshCodexAccessToken re-reads persisted tokens on every call (no in-memory cache), and an in-module single-flight map coalesces concurrent refreshes within one tab on top of the server-side mutex.
  • client_version is a required query parameter on the Codex /models endpoint; omitting it returns HTTP 400 invalid_request_error. Baked into CODEX_MODELS_ENDPOINT.

Follow-ups (not in this PR)

  • Integration test harness against a mock Codex server — pure-function coverage is solid, live-call tests would be brittle and ToS-sensitive in CI.
  • stateSystem.js named-lock refresh mutex for clustered deployments (WORKERS>1).
  • "Import tokens from overlay" button in admin settings, for users who prefer not to OAuth twice.
  • Shared SSE-parser module between overlay and admin (part of the broader overlay/admin parser consolidation already present in main).
  • Distinguishing re-login-required errors from retryable errors in the UI beyond the existing error text surface.

Test plan

Reviewers without a ChatGPT Plus subscription can verify the non-subscription parts (see the "Testing This Locally" section in the module's AGENTS.md):

  • node --test tests/openai_codex_*_test.mjs — 51 tests pass
  • node space serve starts cleanly
  • curl -X POST http://127.0.0.1:3000/api/openai_codex_auth_start returns HTTP 401 with {"error":"Authentication required"} (auth gate works)
  • Overlay and admin settings dialogs render the new ChatGPT tab with the logged-out state (explainer + Sign-in button + scope note)

With a ChatGPT Plus subscription (only author verified):

  • Overlay: Sign in → verify code at auth.openai.com/codex/device → tokens persisted encrypted in ~/conf/onscreen-agent.yaml under codex_tokens
  • Overlay: send chat via gpt-5.4-mini, streamed response rendered correctly
  • Overlay: wait >1h, send another chat, silent refresh via /api/openai_codex_token_refresh
  • Overlay: sign out, confirm codex_tokens removed from config file
  • Admin chat: same flow against ~/conf/admin-chat.yaml
  • Live model-catalog discovery populates dropdown with account-entitled models; falls back to static 6-model catalog on network failure
  • Provider switch (Codex ↔ API ↔ Local) preserves settings across tabs

🤖 Generated with Claude Code

Syring, Nikolas added 28 commits April 23, 2026 13:16
Adds the headless `_core/openai_codex/` module with Codex endpoint
detection, Cloudflare-compatible header helpers, and the module
contract documenting the ChatGPT Plus OAuth transport.

- request.js defines CODEX_BASE_URL plus prefix-match detection so
  `/responses`, `/models`, and future sub-endpoints share one matcher
- applyCodexHeaders() sets the required User-Agent plus originator
  headers; the Cloudflare layer in front of the Codex endpoint rejects
  requests without these regardless of token validity
- extractChatGPTAccountId() tolerantly parses the OAuth JWT claim
  `https://api.openai.com/auth.chatgpt_account_id` using atob() for the
  browser request-mutation path; malformed tokens silently return "" so
  the header is just omitted
- AGENTS.md documents the Cloudflare header requirement, the Chat-
  Completions-to-Responses request-shape conversion rules, the SSE
  event mapping tables, the persisted token shape, and the OAuth URL
  constants
- root AGENTS.md index gains the new module path
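
A hedged sketch of that tolerant claim extraction (the claim path is inferred from the dotted name above; the shipped helper may differ):

```js
// Illustrative only. Assumes the account id lives under the
// "https://api.openai.com/auth" claim object; malformed tokens return "".
function extractChatGPTAccountId(accessToken) {
  try {
    const payloadB64 = accessToken
      .split(".")[1]
      .replace(/-/g, "+")
      .replace(/_/g, "/"); // base64url -> base64 so atob() accepts it
    const claims = JSON.parse(atob(payloadB64));
    return claims["https://api.openai.com/auth"]?.chatgpt_account_id ?? "";
  } catch {
    return ""; // silently omit the ChatGPT-Account-ID header
  }
}
```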
Pure stateless converter from OpenAI Chat-Completions bodies into
Codex Responses-API bodies, with 11 unit tests covering the shape
rules documented in the module AGENTS.md.

- the first system message lifts into the top-level `instructions`
  string and drops from `input`
- remaining user/assistant messages become `input[]` entries with
  `content: [{ type: "input_text", text }]`
- multimodal `text` parts stay as `input_text`, `image_url` parts
  become `input_image`
- `max_output_tokens`, `temperature`, and other Chat-Completions-only
  fields are stripped because Codex rejects them with HTTP 400
- `store: false` is always forced so Codex does not retain completions
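
Condensed into code, the rules above look roughly like this (a sketch with illustrative names, not the shipped converter; multimodal parts are omitted for brevity):

```js
// Hedged sketch of chatToResponsesRequest. Chat-Completions-only fields
// (max_output_tokens, temperature, ...) are "stripped" by never being copied.
function chatToResponsesRequest(chatBody) {
  const messages = [...(chatBody.messages ?? [])];
  let instructions;
  if (messages[0]?.role === "system") {
    instructions = messages.shift().content; // first system message lifts out of input
  }
  const input = messages.map((m) => ({
    role: m.role,
    content: [{
      // assistant turns use output_text (see the role-asymmetry fix later in this PR)
      type: m.role === "assistant" ? "output_text" : "input_text",
      text: m.content,
    }],
  }));
  return { model: chatBody.model, instructions, input, stream: true, store: false };
}
```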
Pure stateless mapper from Codex Responses-API SSE events into the
Chat-Completions-shaped delta frames the existing space-agent SSE
parsers already understand, with 15 unit tests covering single-event
mapping plus realistic multi-event sequences.

- `response.output_text.delta` and `response.refusal.delta` emit
  `{ choices: [{ delta: { content: delta }, index: 0 }] }` frames
- `response.completed` synthesizes a finish frame with mapped usage
  tokens plus a `[DONE]` marker so the existing Chat-Completions
  stream reader terminates cleanly; the Responses-API does not emit
  a native `[DONE]` line
- `response.incomplete` maps the Codex reason onto the closest Chat-
  Completions finish_reason (`max_output_tokens` -> `length`,
  `content_filter` passes through)
- `response.failed` and standalone `error` events throw with the
  upstream message so the transport layer surfaces the error
- all other events (content_part.added/done, output_item.added/done,
  reasoning, audio, tool-calls, code_interpreter, file_search,
  web_search, image_gen, mcp, queued, annotations, custom_tool_call
  input, future unknowns) are skipped silently to avoid unknown-event
  log noise
- text is accumulated live from delta events rather than from the
  final `response.completed.response.output` because the Codex
  endpoint has been observed returning an empty `output` array even
  when deltas streamed correctly
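
A simplified sketch of that mapping (illustrative; the real mapper handles many more event types):

```js
// Hedged sketch of mapCodexEventToChatFrames. Returned items feed the existing
// Chat-Completions SSE reader; "[DONE]" is synthesized because the Responses
// API never emits a native [DONE] line.
function mapCodexEventToChatFrames(event) {
  switch (event.type) {
    case "response.output_text.delta":
    case "response.refusal.delta":
      return [{ choices: [{ delta: { content: event.delta }, index: 0 }] }];
    case "response.completed":
      return [
        { choices: [{ delta: {}, finish_reason: "stop", index: 0 }],
          usage: event.response?.usage },
        "[DONE]",
      ];
    case "response.failed":
      throw new Error(event.response?.error?.message ?? "Codex stream failed");
    default:
      return []; // reasoning, tool, audio, unknown events: skipped silently
  }
}
```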
Adds three authenticated endpoints that own the OAuth device-code flow
and refresh-token rotation against `auth.openai.com`, backed by a new
`server/lib/openai_codex/` helper subsystem.

Endpoints:

- `POST /api/openai_codex_auth_start` -> returns
  `{ deviceAuthId, userCode, verificationUrl, interval, expiresIn }`
- `POST /api/openai_codex_auth_poll` -> returns `{ status: "pending" }`
  until the user authorizes, then `{ status: "complete", tokens }`
- `POST /api/openai_codex_token_refresh` -> returns a refreshed token
  payload; maps OAuth `invalid_grant` onto HTTP 401 so the frontend can
  prompt a fresh login when the refresh token was consumed elsewhere

Why this lives on the server:

OpenAI refresh tokens use single-use rotation. A frontend-only
implementation cannot serialize concurrent tab refreshes safely: both
calls post the same single-use token, one succeeds, one returns
`invalid_grant`, and the only valid refresh token is lost. Full
re-authentication becomes the only recovery. This is a shared-data
integrity concern under the rule in `/server/AGENTS.md`.

The server layer provides:

- `oauth_client.js` pure transport functions for the three OAuth calls
  with defensive JSON body parsing, JWT account-id extraction from the
  `access_token` (never `id_token`), and `502` mapping for upstream
  failures
- `refresh_lock.js` in-process single-writer coalescer keyed by the
  refresh-token string; documented limitation: in clustered runtime
  (WORKERS>1) different workers still race, which is acceptable for
  the single-user single-browser-profile scenario and documented in
  `server/lib/openai_codex/AGENTS.md`

Token persistence stays on the frontend under `userCrypto:`-prefixed
encryption; the server never reads or writes tokens from the app tree.
No revoke endpoint is exposed because logout clears the encrypted
config entry and access tokens expire within about an hour.

Docs:

- `server/lib/openai_codex/AGENTS.md` new contract doc
- `server/api/AGENTS.md` documents the new endpoint family and its
  backend-ownership rationale
- `server/AGENTS.md` index and structure updated
- root `AGENTS.md` index updated
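
The status mapping described above, sketched (assumed shapes, not the shipped oauth_client.js):

```js
// Hedged sketch: map OAuth error codes from auth.openai.com onto the HTTP
// statuses the endpoints return. Error-code names are assumptions.
function mapOAuthErrorToHttp(oauthErrorCode) {
  if (oauthErrorCode === "invalid_grant") return 401; // token consumed elsewhere: prompt re-login
  if (oauthErrorCode === "authorization_pending") return 200; // keep polling
  return 502; // any other upstream failure surfaces as a bad gateway
}
```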
Browser-side helper that wraps the three OAuth backend endpoints and
guarantees the locally persisted refresh token is the one actually
used at refresh time, avoiding single-use-rotation loss across tabs.

- `ensureFreshCodexAccessToken({ loadTokens, saveTokens, ... })`
  re-reads persisted tokens on every call rather than trusting an
  in-memory copy; another tab or process may have rotated the refresh
  token, and the single-use rotation rule means a stale in-memory
  refresh_token would fail with invalid_grant on next refresh
- refresh is triggered when the access token is within the default
  300s safety margin of expiry; the persisted `expires_at` timestamp
  is computed server-side from the OAuth `expires_in` response so both
  chat surfaces share one source of truth
- concurrent refresh calls that observe the same stale refresh_token
  are coalesced into one network request via an in-module map of
  in-flight promises, on top of the separate server-side mutex in
  `server/lib/openai_codex/refresh_lock.js`
- `saveTokens` failures are logged via console.warn but do not block
  the active request; the refreshed tokens are still returned so the
  LLM call can proceed
- thin wrappers `startCodexDeviceAuthorization` and
  `pollCodexDeviceAuthorization` expose the first two OAuth endpoints
  for the upcoming settings UI
- 11 unit tests cover not-expiring, refreshing, always-fresh read,
  concurrent coalesce, missing tokens, invalid_grant propagation, and
  save-failure resilience
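
A condensed sketch of the always-fresh-read plus single-flight behavior (illustrative names; the 300-second margin matches the description above, and the epoch-ms `expires_at` is an assumption):

```js
const REFRESH_MARGIN_MS = 300 * 1000; // default safety margin before expiry
const inFlightRefreshes = new Map(); // stale refresh_token -> Promise<tokens>

async function ensureFreshCodexAccessToken({ loadTokens, saveTokens, refresh }) {
  const tokens = await loadTokens(); // re-read persistence on every call
  if (!tokens) throw new Error("Not signed in to ChatGPT");
  if (tokens.expires_at - Date.now() > REFRESH_MARGIN_MS) return tokens.access_token;

  // Coalesce callers that observed the same stale refresh_token.
  let pending = inFlightRefreshes.get(tokens.refresh_token);
  if (!pending) {
    pending = refresh(tokens.refresh_token).finally(() =>
      inFlightRefreshes.delete(tokens.refresh_token));
    inFlightRefreshes.set(tokens.refresh_token, pending);
  }
  const rotated = await pending;
  try {
    await saveTokens(rotated);
  } catch (err) {
    console.warn("codex token save failed", err); // logged, never blocks the request
  }
  return rotated.access_token;
}
```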
Adds the shipped Codex model catalog used by the settings UI and the
stateful controller that drives the device-code login UX against the
three OAuth backend endpoints. Also extends the overlay config enum
with the `openai-codex` provider variant so upcoming UI wiring has a
third tab to bind to.

- `models.js` ships the 6 Codex models observable from ChatGPT Plus
  subscriptions and exports `CODEX_DEFAULT_MODEL_ID` (`gpt-5.4-mini`)
  as the recommended default: cheapest, fastest, and the most
  quota-friendly option. Live discovery via
  `GET /backend-api/codex/models?client_version=1.0.0` is left as
  follow-up rather than MVP because it would add a second network
  call to the login UX.
- `auth_flow.js` emits a small finite state machine
  (`STARTING` -> `PENDING` -> `COMPLETE` | `FAILED`) so the settings
  UI can display the verification URL and user code while polling
  runs, and aborts cleanly via `AbortSignal` when the user cancels.
  It uses the poll interval returned by the OAuth server but enforces
  a 3-second floor so we never hammer the endpoint if the server
  sends something unusable.
- `onscreen_agent/config.js` gains the `openai-codex` provider enum
  value plus new settings fields (`codexModel`, `codexTokens`), and
  `normalizeOnscreenAgentLlmProvider` now recognizes the third
  variant. UI and transport wiring land in the next two commits.
Integrates the `openai-codex` provider into the overlay chat surface:
a new LLM-client subclass with a Codex-aware SSE reader, the request-
mutation hook that swaps Chat-Completions shape for Responses-API
shape and adds the Cloudflare-required headers, and the settings UI
that drives the device-code login.

Transport:

- `OnscreenAgentCodexLlmClient` extends the shared base client and
  uses a dedicated `readCodexStreamingResponse` that feeds raw SSE
  event blocks through `mapCodexEventToChatFrames` so the upstream
  per-delta callback contract stays identical for all providers
- the OpenRouter-style Chat-Completions SSE reader is left untouched;
  Codex is an additive subclass rather than a core refactor
- `createOnscreenAgentLlmClient` now dispatches on three provider
  variants (API / CODEX / LOCAL)

Request hook (`ext/js/_core/onscreen_agent/api.js/prepareOnscreenAgentApiRequest/end/openai-codex.js`):

- detects Codex-provider settings and rewrites the prepared request
  in place: `requestUrl` becomes the Codex `/responses` endpoint,
  `requestBody` goes through `chatToResponsesRequest`, headers add the
  Cloudflare originator plus extracted `ChatGPT-Account-ID`
- ensures a fresh access token through the always-fresh-read
  `ensureFreshCodexAccessToken` helper, loading persisted tokens from
  the `userCrypto:`-encrypted `codex_tokens` entry in
  `~/conf/onscreen-agent.yaml` and saving refreshed tokens back into
  the same file so other tabs pick up the rotation

Settings UI:

- third segmented-control tab labeled `ChatGPT` next to the existing
  `API` and `Local` tabs
- three UI states: logged-out (explainer + Sign in button),
  login-pending (verification URL + user code + Cancel), and signed-
  in (account summary + model dropdown + Sign out)
- login flow runs through `runCodexDeviceAuthorizationFlow` with an
  AbortController so Cancel stops polling immediately
- model dropdown is populated from `CODEX_MODEL_CATALOG` and defaults
  to `gpt-5.4-mini`

Docs:

- onscreen_agent AGENTS.md now documents the three-tab contract and
  references the Codex flow + token-storage rule
Mirrors the overlay integration for the admin chat surface. The
admin runtime is function-based rather than class-based, so the
shape differs from the overlay wiring but the user-facing behavior
and transport contract are identical.

Transport:

- `streamAdminAgentCodexCompletion` is a new transport function
  added next to `streamAdminAgentApiCompletion`; it validates Codex-
  specific settings and dispatches through the shared Codex request
  hook so the outbound body and headers end up in Responses-API
  shape with the Cloudflare-required originator
- `readCodexStreamingResponse` owns SSE decoding through the shared
  `mapCodexEventToChatFrames` mapper; the Chat-Completions SSE reader
  is unchanged so OpenRouter and other OpenAI-compatible providers
  are untouched
- `streamAdminAgentCompletion` now dispatches on three provider
  variants (LOCAL / CODEX / API)
- the existing SSE reader is duplicated in admin and overlay scopes;
  that duplication is pre-existing and intentionally left alone for
  this PR to keep the change surface narrow

Request hook (`ext/js/_core/admin/views/agent/api.js/prepareAdminAgentApiRequest/end/openai-codex.js`):

- detects Codex-provider settings and rewrites the prepared request
  to the Codex `/responses` endpoint with shape conversion, fresh-
  token injection, and Cloudflare headers
- reads and writes persisted tokens in `~/conf/admin-chat.yaml`
  under `codex_tokens` as `userCrypto:`-encrypted ciphertext; admin
  and overlay keep separate token files so the two surfaces can
  technically use different ChatGPT accounts, and refresh-token
  rotation on one surface never races the other

Settings UI:

- third segmented-control tab labeled `ChatGPT` next to `API` and
  `Local`, with the same three UI states (logged-out / pending /
  signed-in) and abort-controlled login flow as the overlay
- adds a scope notice on the logged-out state pointing to the
  overlay settings so users know the login does not carry across
  surfaces, which mirrors the separate token-file decision above

Docs:

- admin AGENTS.md now documents the three-tab contract and the
  admin-scoped Codex login rules
Adds a public-facing section to the README describing the new
`ChatGPT` provider tab in the overlay and admin chat, with setup
steps, cross-surface scope notes, and troubleshooting for the three
failure modes most likely to confuse operators:

- Cloudflare `cf-mitigated: challenge` 403 when a downstream edit
  strips the required originator headers
- `invalid_grant` after a parallel Codex CLI or VS Code extension
  session rotated the same refresh token
- `response.completed.response.output` being empty despite streamed
  deltas (we accumulate from deltas by design)
- Chat-Completions-only body fields being rejected with HTTP 400

Also adds an explicit `Requirements` subsection naming ChatGPT Plus
as the verified plan, and a ToS disclaimer covering the use of the
official OpenAI Codex OAuth flow.
Two bugs blocking the device-code flow against the live endpoint:

1. The OpenAI device-code response uses `interval` as a string (e.g.
   `"5"`) and encodes expiry as an ISO-8601 `expires_at` timestamp
   rather than an `expires_in` seconds field. The previous
   `Number.isFinite(payload.interval)` check rejected the string and
   fell back to the 3-second minimum; `expires_in` was always missing
   so the flow quietly used the 900-second default. Parse both
   defensively and derive seconds from `expires_at` when `expires_in`
   is absent.

2. The poll endpoint returned `{ status: "pending" }` and
   `{ status: "complete", tokens }`. The router's HTTP-response-shape
   heuristic (`server/router/responses.js`) treats any top-level
   `status` key as an HTTP status code, so `Number("pending")` became
   `NaN` and `writeHead` crashed with `ERR_HTTP_INVALID_STATUS_CODE`.
   Rename the semantic field to `state` on both the server return and
   the frontend poll handler.
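
Roughly what that defensive parsing looks like (a sketch; field names follow the description above):

```js
// Hedged sketch: tolerate interval arriving as the string "5" and expiry
// arriving as an ISO-8601 expires_at instead of an expires_in seconds field.
function parsePollIntervalSeconds(payload) {
  const n = Number(payload.interval);
  return Number.isFinite(n) && n >= 3 ? n : 3; // keep the 3-second floor
}

function parseExpiresInSeconds(payload) {
  const direct = Number(payload.expires_in);
  if (Number.isFinite(direct)) return direct;
  const at = Date.parse(payload.expires_at); // ISO-8601 fallback path
  if (Number.isFinite(at)) return Math.max(0, Math.round((at - Date.now()) / 1000));
  return 900; // last-resort default, matching the previous behavior
}
```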
… for user

Codex's Responses API rejects `input_text` parts that sit under a
`role: "assistant"` input entry with HTTP 400 `invalid_value` on
`input[n].content[0]`. The two text-types are source-scoped:

- user-authored turns use `input_text`
- assistant-authored turns use `output_text`

`convertContentParts` now takes the target text-type from its caller,
and `extractInstructionsAndInput` derives that type per message from
the role. System messages still render as plain text for the
`instructions` field so the internal pass uses `input_text` there as
a neutral default.

The previously recorded test that asserted `input_text` for assistant
strings was itself the bug frozen into an expectation; it is updated
to assert the correct split. A separate regression test now walks the
full `input[]` and fails fast if any assistant entry ever emits
`input_text` again.
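
The regression guard could look like this (illustrative test; the import path and file layout are assumptions):

```js
import test from "node:test";
import assert from "node:assert/strict";
// Hypothetical import path, for illustration only:
import { chatToResponsesRequest } from "./chat_to_responses_request.js";

test("no assistant entry ever emits input_text", () => {
  const { input } = chatToResponsesRequest({
    model: "gpt-5.4-mini",
    messages: [
      { role: "user", content: "hi" },
      { role: "assistant", content: "hello" },
      { role: "user", content: "bye" },
    ],
  });
  for (const entry of input) {
    for (const part of entry.content) {
      if (entry.role === "assistant") {
        assert.notEqual(part.type, "input_text"); // must be output_text
      }
    }
  }
});
```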
The hook files previously reached directly into the file API to read
and write the encrypted `codex_tokens` entry of each surface's YAML
config. That bypassed the store, so `saveSettingsFromDialog` never
copied `codexTokens` into `this.settings` and `buildStoredConfigPayload` never encoded the tokens at all. The result: users stayed
signed-in for the duration of the dialog draft, but every reload
reverted to the signed-out state because the tokens were never
persisted through the canonical save path.

Clean the contract up so token persistence follows the same pattern
as `api_key`:

- New `app/L0/_all/mod/_core/openai_codex/token_envelope.js` owns the
  encrypt/decrypt envelope, lock-state preservation, and the
  SINGLE_USER_APP bypass, mirroring the existing `encodeStoredApiKey`
  / `decodeStoredApiKey` helpers rather than duplicating that logic in
  each surface's storage.
- `onscreen_agent/storage.js` and `admin/views/agent/storage.js` now
  persist `codex_model` and `codex_tokens` in their YAML payload, and
  load them back through the new envelope helper. Lock-state
  bookkeeping (`storedCodexTokensLocked`, `storedCodexTokensValue`)
  is preserved across saves so a session that cannot currently
  decrypt the ciphertext does not wipe it on save.
- Both stores now include `codexModel`, `codexTokens`, and the
  stored-lock bookkeeping fields in their `settings` and
  `settingsDraft` templates, and `saveSettingsFromDialog` copies those
  fields from draft to live settings. A new
  `applyRefreshedCodexTokens(tokens)` method accepts rotated tokens
  from the hook, updates live settings, keeps the dialog draft in
  sync while the dialog is open, and awaits `persistConfig` so the
  refreshed refresh_token is written back before the next request.
- The overlay and admin request hooks now read tokens from
  `settings.codexTokens` (already decrypted by storage.js during
  load) and, after a refresh, hand the new tokens to
  `Alpine.store("onscreenAgent" | "adminAgent").applyRefreshedCodexTokens(tokens)`. No hook ever reads or writes the YAML file
  directly anymore.
Adds runtime discovery of the Codex model catalog against
`https://chatgpt.com/backend-api/codex/models?client_version=1.0.0`.
The live list is account-scoped: a ChatGPT Plus account may expose
different models than the hardcoded 6-entry fallback, and new models
appear upstream faster than we can refresh a static list.

Split into two files so the parser is independently testable:

- `models_parser.js` is a pure `parseCodexModelsResponse(payload)` that
  filters entries with `supported_in_api === false` or
  `visibility` in {`hide`, `hidden`} (case-insensitive), keeps the
  `slug` as the outbound `id`, prefers `description` over `display_name`
  for the subtitle, and sorts by `(priority, slug)` ascending to match
  the reference Codex-rs ordering. No filter on model-id prefix so
  future model names (`gpt-5.5`, `gpt-6`, ...) flow through without a
  code change.
- `models_discovery.js` wraps the parser with a browser fetch helper
  that reuses `applyCodexHeaders()` for the Cloudflare originator plus
  bearer/account-id headers, tries a direct fetch first, falls back to
  the existing `/api/proxy` (space-agent outbound-proxy infrastructure,
  not a new backend endpoint) on CORS or network errors, and silently
  returns `[]` on any failure so the settings UI can fall back to the
  static `CODEX_MODEL_CATALOG`.

7 parser tests cover happy-path parsing, hidden-visibility filtering,
unsupported-in-api filtering, priority+slug sort ordering, missing
slug rejection, `display_name` fallback for descriptions, and
malformed-payload tolerance.
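
The filter-and-sort pipeline above, condensed into a sketch (the payload envelope key is an assumption):

```js
// Hedged sketch of parseCodexModelsResponse. Note there is deliberately no
// filter on model-id prefix, so future model names flow through unchanged.
function parseCodexModelsResponse(payload) {
  const entries = Array.isArray(payload?.models) ? payload.models : [];
  return entries
    .filter((m) => typeof m.slug === "string" && m.slug.length > 0)
    .filter((m) => m.supported_in_api !== false)
    .filter((m) => !["hide", "hidden"].includes(String(m.visibility ?? "").toLowerCase()))
    .sort((a, b) =>
      ((a.priority ?? 0) - (b.priority ?? 0)) || a.slug.localeCompare(b.slug))
    .map((m) => ({ id: m.slug, description: m.description || m.display_name || "" }));
}
```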
Both chat-surface stores now merge the discovered runtime catalog with
the static fallback and expose it through the existing
`codexModelCatalog` getter, so panel.html needs no conditional
branches. Runtime entries win on matching `id` so live descriptions
and ordering are preserved; static-only entries trail so the dropdown
always has content even when discovery fails.

- `refreshCodexModelCatalog({ force = false })` is the one entry point.
  It caches the runtime catalog in memory with a 10-minute TTL,
  coalesces concurrent callers through a shared in-flight promise,
  and returns `[]` with the TTL armed when no access token is
  available (so we do not re-probe for a token that will never
  arrive during the current session).
- The settings dialog triggers a non-forced refresh on open; a
  non-authenticated user transparently no-ops. A forced refresh
  happens on successful login so the dropdown reflects the live list
  before the user picks a model.
- New `isCodexSelectedModelInCatalog` getter + UI fallback option
  keeps the user's currently-configured model selectable and readable
  even when the live catalog no longer lists it (account downgrade,
  deprecation). A field-note warns the user that chats may fail and
  suggests switching.

Admin and overlay receive mirror changes: parallel state fields,
parallel merge helpers, parallel getter + dialog trigger. The admin
runtime still does not expose a runtime namespace, so the Alpine
store remains the contract between hook and store, just as the token-
rotation refactor established.
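
The merge rule in sketch form (illustrative names):

```js
// Runtime entries win on matching id and keep their live ordering; static-only
// entries trail so the dropdown is never empty when discovery fails.
function mergeCodexCatalogs(runtimeCatalog, staticCatalog) {
  const runtimeIds = new Set(runtimeCatalog.map((m) => m.id));
  return [
    ...runtimeCatalog,
    ...staticCatalog.filter((m) => !runtimeIds.has(m.id)),
  ];
}
```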
…proxy

The previous direct-first / proxy-second fallback was doomed: the
`chatgpt.com/backend-api/codex/models` endpoint sends no
`Access-Control-Allow-Origin` header, so every direct browser fetch
ends with a CORS block. The block surfaces differently depending on
the preflight outcome (opaque `Response` with `status: 0` on some
browsers, synthetic HTTP 400, or a generic `TypeError` on others),
and the fallback condition `err.message.includes("CORS")` was brittle
across those variants.

Route every discovery request through the space-agent outbound proxy
(`space.proxy.buildUrl(...)` -> `/api/proxy`) unconditionally. This is
the existing cross-origin-read infrastructure shared by other frontend
modules, does not require a new backend endpoint, and the proxied URL
is same-origin so CORS is never a factor. When the runtime does not
expose `space.proxy.buildUrl` (test environment, stripped runtime)
the helper now short-circuits to an empty array so callers still
receive the static catalog.

AGENTS.md updated to describe the single-path behavior and the reason
the direct-first attempt was removed.
The proxied model-discovery request landed on `/api/proxy` with
`credentials: "omit"`, so the browser did not attach the
`space_session` cookie. The space-agent authenticated-by-default
API routing then rejected the request with HTTP 401
`{"error": "Authentication required"}` before it could reach the
Codex endpoint, and the signed-in user kept seeing only the static
model catalog.

Switch to `credentials: "same-origin"` — the request URL is always
same-origin because `space.proxy.buildUrl(...)` rewrites to the
space-agent `/api/proxy` endpoint. The proxy explicitly strips the
`cookie` header before forwarding upstream (see
`UPSTREAM_REQUEST_HEADERS_TO_STRIP` in `server/router/proxy.js`), so
this change does not leak space-agent session state to `chatgpt.com`.
The Codex `/models` endpoint requires the `client_version` query
parameter; omitting it returns HTTP 400 `invalid_request_error` with
`loc: ('query', 'client_version'), msg: 'Field required'`. The
previous `CODEX_MODELS_ENDPOINT` constant resolved to the bare path
without the query, so every live discovery call was rejected and the
settings dropdown kept falling back to the static catalog.

The value itself is not account-scoped and only identifies the caller
surface; we advertise `0.0.0` to match the User-Agent prefix we
already send for Cloudflare compliance.
`summarizeOnscreenAgentLlmSelection` and
`summarizeAdminAgentLlmSelection` fell through to
`summarizeLlmConfig(apiEndpoint, model)` for any non-local provider,
which meant a signed-in Codex session displayed the unconfigured API-
tab default (`anthropic/claude-sonnet-4.6`) next to the composer and
throughout the thread view even though the active transport was
Codex with `gpt-5.4` (or whichever Codex model the user selected).

Add an explicit `CODEX` branch to both summaries that reads
`settings.codexModel`, normalized through `normalizeCodexModelId()`
so empty or missing values fall back to the shipped default
`gpt-5.4-mini` rather than an empty string. The API branch remains
unchanged so existing OpenRouter and other OpenAI-compatible setups
are unaffected.
Saving Codex settings previously flashed the status text
`API chat settings updated.` (overlay) or `API LLM settings updated.`
(admin), which is misleading because the active provider at save time
is Codex, not the OpenAI-compatible API tab. Both stores now branch
explicitly on `CODEX` and surface `ChatGPT settings updated.` so the
confirmation matches the tab the user just saved from.
The module cannot be fully exercised without an active ChatGPT Plus
subscription, so the AGENTS.md now spells out explicitly what a
reviewer can and cannot verify without credentials. Pure-function
tests, endpoint auth-gate registration, and module-hierarchy import
cleanliness are reviewable without a subscription; the OAuth flow,
live chat, refresh, and live catalog discovery are not.

This prevents reviewers from either (a) rejecting the PR because
they cannot reach end-to-end verification locally or (b) trying to
use personal credentials and leaving them in temp state afterwards.
Commit c637738 renamed the poll endpoint's return field from `status`
to `state` to avoid collision with the shared router's HTTP-status
heuristic in `server/router/responses.js`, but the two AGENTS.md
descriptions of the endpoint contract were not updated. This brings
server/api/AGENTS.md and server/lib/openai_codex/AGENTS.md in line
with the code and notes the reason for the unusual field name so
future readers do not try to "fix" it back to `status`.
`OnscreenAgentCodexLlmClient.validateSettings` checked
`settings.model`, but the Codex request hook reads the model slug
from `settings.codexModel` (with `settings.model` only as a last-
resort fallback when `codexModel` is empty). The check therefore
passed for any user whose API tab still held the default OpenRouter
model `anthropic/claude-sonnet-4.6` even when Codex was the active
provider and `codexModel` was blank, and the subsequent Codex request
would fail with an unrelated upstream error.

Switch the check to the admin-side pattern
(`!codexModel && !model`) so both fields must be empty before the
"choose a Codex model" error surfaces. The default in
`DEFAULT_ONSCREEN_AGENT_SETTINGS.codexModel` means this rarely fires
in practice, but the asymmetry between the overlay and admin clients
was a real drift.
The exported `isCodexEndpoint` matcher was never called — the two
extension hooks gate on `settings.provider === "openai-codex"`
directly, which is stricter (provider intent) than URL matching
(could false-positive if a user pointed the API tab at Codex
manually). Per the project policy against unused compat shims and
mirrored code paths, remove the helper and its AGENTS.md paragraph
and add a short explanation of why the gate is the settings field
rather than a URL matcher.
Both surface stores defined near-identical local helpers
(`parseCodexTokensDraft`/`serializeCodexTokensDraft` in the overlay,
the `parseAdminCodexTokensDraft`/`serializeAdminCodexTokensDraft`
pair in the admin store) to turn the encrypted YAML `codex_tokens`
payload into an in-memory object and back. Promote the two helpers
into `token_envelope.js` as `parseCodexTokens` and
`serializeCodexTokens`, extend `parseCodexTokens` to also accept a
plain object (for the pre-parsed in-memory case), and have both
stores import them under their previous local names so call sites
stay unchanged.

This removes four duplicate helper definitions without touching the
persistence contract, and replaces the dangling re-exports in
`token_envelope.js` with actual consumers, satisfying the project
rule against unused compat shims.
…anel

The admin settings panel has a field-note explaining that the Codex
sign-in is scope-local to the admin chat and that the overlay needs
its own sign-in; the overlay panel had no such reverse note, so a
user who signed in through the overlay first had no in-UI hint that
the admin chat would still require a separate login. Mirror the note
on the overlay panel with the symmetric wording so the scope
separation is visible from either starting point.
The poll loop previously awaited `pollIntervalSeconds * 1000` at the
top of every iteration, including the first. A user who entered the
code into the ChatGPT browser before the space-agent UI had painted
the pending panel still waited the full interval (default 5 s) for
the first server poll. Skip the wait on the first iteration and keep
it on every subsequent iteration so fast humans see completion near-
instantly without changing the poll cadence for the common case.
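
In sketch form (illustrative loop; abort handling during the sleep itself is elided):

```js
async function pollUntilComplete(pollOnce, pollIntervalSeconds, signal) {
  for (let attempt = 0; ; attempt++) {
    if (attempt > 0) {
      // wait only on iterations after the first, keeping the normal cadence
      await new Promise((resolve) => setTimeout(resolve, pollIntervalSeconds * 1000));
    }
    if (signal?.aborted) throw new Error("Login cancelled");
    const result = await pollOnce(); // { state: "pending" } | { state: "complete", tokens }
    if (result.state === "complete") return result.tokens;
  }
}
```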
Both storage layers built the YAML payload with a truthy-guarded
`codex_tokens` assignment: when the user signed out the key was
simply omitted from the new payload. Because `fileWrite` replaces
the file contents outright this was sufficient today, but it leaves
the guarantee implicit; a future change that starts merging
payloads, or an accidental in-place patch helper, could leave the
previous ciphertext in place after sign-out. Make the clearing
explicit by `delete`-ing the key in the sign-out branch so the
contract "empty in memory means empty on disk" is visible to future
maintainers.
The standard prepareOnscreenAgentCompletionRequest /
prepareAdminAgentApiRequest flow already routes its API URL through
space.proxy.buildUrl(...) for proxyable external endpoints. The Codex
hooks override `requestUrl` with the bare CODEX_RESPONSES_ENDPOINT and
therefore re-introduce the proxy bypass: every Codex chat call paid a
failed-direct-fetch roundtrip rescued by installFetchProxy(...)'s
fallback retry, and emitted a red `Access-Control-Allow-Origin` CORS
error in the DevTools console on the first call of every page load.

Mirror the pattern already used by models_discovery.js: route
CODEX_RESPONSES_ENDPOINT through space.proxy.buildUrl(...) explicitly
and set requestInit.credentials = "same-origin" so the proxy endpoint
receives the browser session cookie. The proxy strips the cookie
header before forwarding upstream, so this does not leak space_session
to chatgpt.com. The existing applyCodexHeaders() Cloudflare originator
plus User-Agent block continues to run with the proxied URL.

Document the proxy routing in openai_codex/AGENTS.md alongside the
existing Cloudflare-header anti-refactor warning so a future cleanup
pass does not silently revert the routing.
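
The routing change, sketched (space.proxy.buildUrl is the existing helper named above; the endpoint constant's value and everything else are illustrative):

```js
// Hedged sketch of the hook's URL rewrite. The proxied URL is same-origin, so
// "same-origin" credentials attach space_session for the auth gate; the proxy
// strips the cookie header before forwarding upstream.
const CODEX_RESPONSES_ENDPOINT = "https://chatgpt.com/backend-api/codex/responses";

function routeCodexRequestThroughProxy(requestInit) {
  const requestUrl = space.proxy.buildUrl(CODEX_RESPONSES_ENDPOINT);
  requestInit.credentials = "same-origin";
  return { requestUrl, requestInit };
}
```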
Copy link
Copy Markdown

@ThomasR101 left a comment

Approved. I'm building a Claude Code subscription provider layer too, similar to this; thanks for the initial planning of the implementation for adding providers.


nsyring commented Apr 28, 2026

Thanks for the approval — and very cool that the provider scaffolding is already useful for the Claude Code subscription layer.

Two things I'd like to add to main, if useful:

  1. Mistral provider. Cheap, generous rate limits, and I already
    wired it up for hermes-agent. One caveat: Mistral's tool
    calling differs from the OpenAI-compatible shape — <one sentence
    on what specifically: the schema / the prefix / how arguments are
    returned> — so the tool adapter needs a small Mistral branch.
    Want me to open a PR?

  2. Model-profile layer in model settings. Right now switching
    provider, model, system-prompt and tool config means touching
    several settings independently. A "profile" would bundle those
    into one named preset you can swap in a single click — useful
    when you A/B providers (e.g. Codex vs. Mistral vs. local) on
    the same task.
    Happy to design and PR this, or leave it to you if it overlaps
    with something on your roadmap. Let me know which you'd prefer.

Cheers,
Niko


yieldf commented May 3, 2026

@nsyring works great locally, but when I try to run it over Caddy (a reverse proxy) to serve it over DNS, it throws a 403 after authenticating.


nsyring commented May 4, 2026

> @nsyring works great locally, but when I try to run it over Caddy (a reverse proxy) to serve it over DNS, it throws a 403 after authenticating.

@yieldf can you please provide more information about your setup (docker-compose.yml or similar)? Space-agent has its own internal proxy.

I will check it and provide a fix, if possible.

@arashilmg mentioned this pull request May 8, 2026 (#63)
