Fix open security & reliability backlog (#38–#47)#52
Open
OnlyTerp wants to merge 10 commits into
Open
Conversation
…45) _resolve_api_key previously fell back to the ambient Cursor key (~/.codex-shim/cursor-api-key, then $CURSOR_API_KEY) for ANY model whose resolved api_key was empty. A model entry pointing at an arbitrary base_url with no api_key would therefore silently forward the user's Cursor key as a Bearer token to that host. Restrict the fallback to Cursor-target entries only — provider in {cursor, cursor-passthrough} or a base_url on a known Cursor host. Empty keys on any other provider now stay empty, so byok_model_has_credentials() returns False and the model is excluded from usable_byok_models() instead of leaking the key upstream. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
patch-app rewrote the trusted Codex Desktop app.asar bundle using `npx --yes asar`, which resolves and immediately executes whatever version the npm registry currently serves. A registry/maintainer compromise of `asar` (~2M weekly downloads) would gain code execution against the bundle with the user's privileges. Pin the packer to an exact, auditable spec (`@electron/asar@4.2.0`) via `npx --package <spec> asar`, so the resolved version is deterministic regardless of PATH/cache. The spec is overridable via CODEX_SHIM_ASAR_PACKAGE for mirrors or vetted upgrades, and the active spec is printed at patch time. Document the pin and the override in the README. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…39) _maybe_intercept_web_search called loop.run_until_complete(_perform_web_search) from inside the already-running request handler loop. CPython always raises RuntimeError("This event loop is already running.") there, and the except RuntimeError swallowed it, so web search silently returned "Web search unavailable in this context." on every call. Make _maybe_intercept_web_search a coroutine and await _perform_web_search directly; both non-streaming call sites (_post_openai_chat, _post_anthropic as_responses) now await it. The blocking urlopen in _perform_web_search is moved off the event loop via asyncio.to_thread so a slow search no longer stalls the handler. The run_until_complete/except-RuntimeError block is deleted. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The Auto Router cached routing decisions in a module-level dict whose only eviction was a bulk clear() at capacity — flip-flopping between 256 entries and 0 (thundering-herd of classifier calls) — and which was never invalidated when the active model changed, so a picker model switch left stale routing in place. The global dict was also shared across all ShimServer instances. Replace it with a bounded LRU RouterCache (OrderedDict). Each ShimServer owns its own cache (self.router_cache) and passes it to resolve_auto via a new cache= parameter; switch_model now clears it so reconfiguration invalidates stale decisions. A process-wide default RouterCache is retained for callers that don't supply one, keeping reset_cache()/resolve_auto() backward compatible. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
GET /picker served the page (with the picker token embedded in inline JS) with only Content-Type, so any local script/extension with loopback access could frame it or passively read the token and drive /api/switch. Add defense-in-depth response headers: a strict Content-Security-Policy (default-src 'none', inline script/style only, connect-src 'self', frame-ancestors 'none'), X-Frame-Options: DENY, X-Content-Type-Options: nosniff, Referrer-Policy: no-referrer, and Cache-Control: no-store. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
_restart_codex_app fell back to Popen(["Codex.exe"], shell=True) when LOCALAPPDATA was unset or the resolved path didn't exist. A bare name with shell=True does a PATH search (cmd.exe /C Codex.exe), so a malicious Codex.exe earlier on PATH would be executed by the user who started the shim when an authenticated picker user switches models with restart enabled. Remove the fallback: relaunch only via the absolute LOCALAPPDATA path; if it can't be resolved, log to stderr and skip the relaunch (Codex is still quit). Refactor the Windows/macOS branches into _restart_codex_windows / _restart_codex_macos helpers so each is directly unit-testable. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Every upstream call reused one ClientTimeout(total=None, sock_read=None) — aiohttp's "no timeout". A stalled non-streaming upstream (provider outage, network partition) could hold an aiohttp worker open until the OS eventually reset the TCP connection, so a few stuck requests could exhaust capacity and require a process restart. Split the timeout by request shape: streaming (SSE) keeps an unbounded sock_read but a bounded sock_connect; non-streaming gets a finite total wall-clock cap (default 300s, override via CODEX_SHIM_UPSTREAM_TIMEOUT). A _upstream_timeout(body) helper selects per request and is applied to all BYOK and ChatGPT passthrough call sites; compaction (always non-streaming) uses the sync timeout. Documented in the README. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
/api/models enumerated every configured model slug, display name, provider, and the active model to any loopback caller (local script, browser extension, or a DNS-rebinding page on a misconfigured bind host) with no auth, while the state-changing /api/switch already required the picker token. Guard api_models with the same _valid_picker_token check (403 without it), and update the picker page's loadModels() fetch to send the X-Codex-Shim-Picker-Token header so the UI keeps working. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
) _dump_debug_request wrote the full forwarded request body — the entire conversation (code context, tool output, history) — to .codex-shim/last_request.json on every non-streaming forward, unconditionally and with the default umask (often world-readable 0644) inside the project directory. Gate it behind CODEX_SHIM_DEBUG_DUMP (off by default). When enabled, create the directory mode 0700 and write the file via os.open(..., 0o600) so it is private from creation rather than relying on the umask. Documented in the README. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
CI ran only compileall + pytest with no static analysis, no coverage gate, and no test matrix entry for 3.13, and opencode_go.py's HTTP paths were entirely untested. - Add a lint job: blocking `ruff check` (after fixing the 2 real lint errors — an unused import and an undefined-name forward ref) plus a report-only mypy step (codebase isn't fully typed yet). - Run pytest with coverage and --cov-fail-under=70; add Python 3.13 to the matrix and classifiers. - Configure ruff/mypy/coverage in pyproject.toml and add ruff/mypy/pytest-cov to dev extras; ignore coverage/cache artifacts in .gitignore. - Add tests/test_opencode_go.py covering _request_json (success, HTTPError, URLError, non-JSON), model discovery + probes, and refresh orchestration with stubbed urlopen. opencode_go.py coverage 70% -> 97%. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
This was referenced Jun 22, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the full open backlog of 10 issues — security hardening, reliability fixes, and a CI quality-gate uplift. Each issue is addressed in its own focused commit with regression tests.
_resolve_api_keyno longer leaks the Cursor key to empty-key models — fallback scoped to Cursor-target entries onlypatch-apppins the ASAR packer (@electron/asar@4.2.0,npx --package) instead of unpinnednpx --yes asar_maybe_intercept_web_searchis async and awaits the search (removesrun_until_completedeadlock)/pickerserves CSP +X-Frame-Options: DENY+nosniff+Referrer-PolicyheadersPopen(["Codex.exe"], shell=True)PATH-hijack fallback in the Codex restart pathCODEX_SHIM_UPSTREAM_TIMEOUT); streaming keeps unbounded read/api/modelsnow requires the picker token (403 otherwise); picker JS sends it_dump_debug_requestis gated behindCODEX_SHIM_DEBUG_DUMPand written0600in a0700dirrufflint gate, report-only mypy, coverage (--cov-fail-under=70), Python 3.13 matrix; newtest_opencode_go.py(module 70%→97%)Notes for reviewers
test_settings_catalog.py,test_server.py,test_router.py, and the newtest_opencode_go.py.run_until_completedeadlock is fixed and tested via_post_openai_chat. A separate pre-existing quirk (nativeweb_search/apply_patchtools are flattened tofunctionbyresponses_to_chatbefore_build_tool_typesruns on the forwarded body) is intentionally left out of scope — it predates this issue and has a different root cause.ruff check(E/F/W) is blocking and passes;ruff format/import-sorting (18-file reflow) and mypy (16 pre-existing type errors) are kept report-only to avoid a permanently-red pipeline or a sweeping out-of-scope reformat. Line-length is set to 160 to match existing code.router.reset_cache()/resolve_auto()still work without an explicit cache; the process-wide defaultRouterCacheis retained.Test plan
python -m compileall codex_shim/ -qruff check codex_shim/ tests/— cleanpytest tests/ -q --cov=codex_shim --cov-fail-under=70— 222 passed, 70.32% totalGenerated with Devin