Conversation
Warning: Rate limit exceeded

⌛ How to resolve this issue?
After the wait time has elapsed, a review can be triggered using the `@coderabbitai review` command. We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?
CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration
- Configuration used: Organization UI
- Review profile: ASSERTIVE
- Plan: Pro
- Run ID:

📒 Files selected for processing (14)
@coderabbitai review
✅ Actions performed: review triggered.
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
```js
    }
    console.warn(
      `codex-multi-auth: treating unreadable supervisor lock as stale at ${lockPath}: ${
        error instanceof Error ? error.message : String(error)
      }`,
    );
    return true;
  }
}

async function withSupervisorStorageLock(runtime, fn, signal) {
  const lockPath = getSupervisorStorageLockPath(runtime);
  const lockDir = dirname(lockPath);
  const waitMs = parseNumberEnv(
    "CODEX_AUTH_CLI_SESSION_LOCK_WAIT_MS",
    DEFAULT_STORAGE_LOCK_WAIT_MS,
    0,
```
**`parseNumberEnv` ignores 4th `max` argument: quota thresholds unbounded in fallback path** (scripts/codex-supervisor.js, lines 621-637)

`parseNumberEnv` is defined as `function parseNumberEnv(name, fallback, min = 0)`, taking only 3 parameters. The `100` passed as the 4th argument here (and for `7d` below) is silently dropped.

This means that when the compiled dist isn't present and someone sets `CODEX_AUTH_PREEMPTIVE_QUOTA_5H_REMAINING_PCT=200` (or any value > 100), the threshold is returned as-is. Since `remaining5h` can't exceed 100, `remaining5h <= threshold5h` becomes `100 <= 200 = true`, so the supervisor would **always** trigger rotation on every poll cycle, even for healthy accounts. This is a real regression risk on misconfigured machines.

Fix: either add a `max` param to `parseNumberEnv` or clamp in-line:

```js
getPreemptiveQuotaRemainingPercent5h:
  configModule?.getPreemptiveQuotaRemainingPercent5h ??
  ((pluginConfig) =>
    Math.min(
      100,
      parseNumberEnv(
        "CODEX_AUTH_PREEMPTIVE_QUOTA_5H_REMAINING_PCT",
        pluginConfig?.preemptiveQuotaRemainingPercent5h ?? 5,
        0,
      ),
    )),
getPreemptiveQuotaRemainingPercent7d:
  configModule?.getPreemptiveQuotaRemainingPercent7d ??
  ((pluginConfig) =>
    Math.min(
      100,
      parseNumberEnv(
        "CODEX_AUTH_PREEMPTIVE_QUOTA_7D_REMAINING_PCT",
        pluginConfig?.preemptiveQuotaRemainingPercent7d ?? 5,
        0,
      ),
    )),
```

No existing test exercises env values > 100, so the bug is invisible to the current suite.
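For illustration, the other option (a `max` parameter) could look like the sketch below. This is hypothetical: the env-reading and fallback details inside the function are assumptions, not the actual implementation in `scripts/codex-supervisor.js`.

```javascript
// Hypothetical sketch only: parseNumberEnv extended with a clamping `max`
// parameter. The parsing and fallback details are assumptions, not the
// real implementation from scripts/codex-supervisor.js.
function parseNumberEnv(name, fallback, min = 0, max = Number.POSITIVE_INFINITY) {
  const raw = process.env[name];
  const parsed = raw === undefined || raw === "" ? NaN : Number(raw);
  const value = Number.isFinite(parsed) ? parsed : fallback;
  // Clamp into [min, max] so misconfigured values like "200" cannot
  // produce an always-true `remaining <= threshold` comparison.
  return Math.min(max, Math.max(min, value));
}
```

With this shape, call sites could keep their existing 4-argument form (`parseNumberEnv(name, fallback, 0, 100)`) and a setting of `200` would come back as `100`.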
```js
  const near5h =
    typeof remaining5h === "number" && remaining5h <= threshold5h;
  const near7d =
    typeof remaining7d === "number" && remaining7d <= threshold7d;
  const prewarm5h =
    typeof remaining5h === "number" && remaining5h <= prewarmThreshold5h;
  const prewarm7d =
    typeof remaining7d === "number" && remaining7d <= prewarmThreshold7d;

  if (!near5h && !near7d) {
    return {
      prewarm: prewarm5h || prewarm7d,
      rotate: false,
      reason: "none",
      waitMs: 0,
      remaining5h,
      remaining7d,
    };
  }

  return {
    prewarm: true,
    rotate: true,
    reason: "quota-near-exhaustion",
    waitMs: computeWaitMsFromSnapshot(snapshot),
    remaining5h,
    remaining7d,
  };
}

function evaluateQuotaSnapshot(snapshot, runtime, pluginConfig) {
  const pressure = computeQuotaPressure(snapshot, runtime, pluginConfig);
  return {
    rotate: pressure.rotate,
    reason: pressure.reason,
    waitMs: pressure.waitMs,
  };
}

function getSnapshotCacheKey(account) {
  if (!account) return "";
  const segments = [
    `${account.accountId ?? ""}`,
    `${account.email ?? ""}`,
    `${account.index ?? ""}`,
  ];
  return segments.some((segment) => segment.length > 0) ? segments.join("|") : "";
}

function getSnapshotCacheTtlMs() {
  return parseNumberEnv(
    "CODEX_AUTH_CLI_SESSION_SNAPSHOT_CACHE_TTL_MS",
    DEFAULT_SNAPSHOT_CACHE_TTL_MS,
    0,
  );
}

function clearProbeSnapshotCache(account) {
  const cacheKey = getSnapshotCacheKey(account);
  if (!cacheKey) return;
  snapshotProbeCache.delete(cacheKey);
}

function clearAllProbeSnapshotCache() {
  snapshotProbeCache.clear();
}

function readCachedProbeSnapshot(account) {
  const cacheKey = getSnapshotCacheKey(account);
  if (!cacheKey) return null;
  const entry = snapshotProbeCache.get(cacheKey);
  if (!entry?.snapshot || entry.expiresAt <= Date.now()) {
    if (entry && !entry.pending) {
      snapshotProbeCache.delete(cacheKey);
    }
    return null;
  }
  return entry.snapshot;
}

function rememberProbeSnapshot(account, snapshot) {
  const cacheKey = getSnapshotCacheKey(account);
  if (!cacheKey) return;
  const ttlMs = getSnapshotCacheTtlMs();
  if (ttlMs <= 0) {
    snapshotProbeCache.delete(cacheKey);
    return;
  }
  const current = snapshotProbeCache.get(cacheKey);
  snapshotProbeCache.set(cacheKey, {
    ...current,
    snapshot,
    expiresAt: Date.now() + ttlMs,
  });
}

async function probeAccountSnapshot(runtime, account, signal, timeoutMs, options = {}) {
  if (signal?.aborted) {
    throw createAbortError("Quota probe aborted");
  }
  if (!account?.accountId || !account?.access) {
    return null;
  }
  const cacheKey = getSnapshotCacheKey(account);
  let pendingResolver = null;
  let pendingRejecter = null;
  let pendingPromise = null;
  if (options.useCache !== false) {
    const cachedSnapshot = readCachedProbeSnapshot(account);
    if (cachedSnapshot) {
      return cachedSnapshot;
    }
    const pendingEntry = cacheKey ? snapshotProbeCache.get(cacheKey) : null;
    if (pendingEntry?.pending) {
      return abortablePromise(
        pendingEntry.pending,
        signal,
```
**Concurrency: TOCTOU window in stale-lock eviction on Windows** (scripts/codex-supervisor.js, lines 964-1081)

The `isSupervisorLockStale` → `readSupervisorLockPayload` → `safeUnlink` / `safeUnlinkOwnedSupervisorLock` sequence is not atomic. If two supervisor processes both see the lock as stale at the same instant:

1. process A reads the stale payload (owner = "old") and calls `safeUnlinkOwnedSupervisorLock("old")`, which succeeds
2. process A acquires the lock and writes a new payload (owner = "A")
3. process B then calls `safeUnlink(lockPath)` on a lock it no longer owns, removing process A's live lock

The `safeUnlinkOwnedSupervisorLock` check mitigates most cases, but the raw `safeUnlink(lockPath)` branch (for payloads lacking `ownerId`) has no owner check and is vulnerable to this race. On Windows this is more likely due to slower fs operations and wider stale windows.

Worst case: a live lock is deleted mid-operation, two processes hold the lock simultaneously, and account state is corrupted. While low-probability, consider using a write-verify cycle (read → compare owner → conditional unlink) or an atomic rename to replace stale locks.
Summary
Adds an experimental wrapper that can resume interactive Codex sessions after account rotation instead of forcing a cold restart.
What changed
- add `scripts/codex-supervisor.js` and wire it through the Codex wrapper
- `codexCliSessionSupervisor` behind an opt-in setting
- … outside the supervisor split

Validation
- `npm run lint`
- `npm run typecheck`
- `npm run test:session-supervisor:smoke`

Supersedes part of #139.
note: greptile review for oc-chatgpt-multi-auth. cite files like `lib/foo.ts:123`. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary
this pr introduces an experimental codex session supervisor: a 2251-line wrapper (`scripts/codex-supervisor.js`) that monitors running interactive codex sessions, detects quota pressure or rate-limit events, pre-warms a successor account, gracefully terminates the child, and relaunches with `codex resume <session-id>`, all without forcing a cold restart. the feature is gated behind `codexCliSessionSupervisor: false` by default and exposed as a toggle in the experimental settings ui.

key findings:

- `parseNumberEnv` 4th-arg bug: `createRuntimeConfigAccessors` calls `parseNumberEnv(..., 0, 100)` for both quota-percent fallback accessors, but `parseNumberEnv` only accepts 3 parameters, so the `max=100` cap is silently dropped. in the fallback path (no compiled dist), an env var value greater than 100 produces a threshold that always triggers rotation on every monitor poll cycle.
- `withSupervisorStorageLock` checks staleness and unlinks the lock file in two separate steps. on windows with concurrent supervisor processes, the `safeUnlink` branch (for legacy locks without `ownerId`) can delete a live lock acquired by a peer process between the stat and the unlink calls.
- `spawnRealCodex` passes `process.env` directly to the child, inheriting all tokens; intentional for a cli wrapper but undocumented. the `safeUnlink` retry loop correctly handles transient windows ntfs errors.
- the `parseNumberEnv` max-arg bug is undetectable by the current test suite.
- `lib/quota-probe.ts`, the session binding file scanner, the pre-warm selection pipeline, and the hotkey remapping in `settings-hub.ts` are all well-implemented.

Confidence Score: 3/5
- `parseNumberEnv` max-arg drop can cause runaway rotation when the fallback config path is active
- `createRuntimeConfigAccessors` (`parseNumberEnv` 4th arg), `withSupervisorStorageLock` (stale-lock TOCTOU)

Important Files Changed
- `parseNumberEnv` called with a 4th `max=100` arg it silently ignores (fallback quota thresholds unbounded), and a TOCTOU race in stale-lock eviction relevant on Windows.
- `runCodexSupervisorIfEnabled`. The `supervisorDidForward` flag correctly prevents double auto-sync. `forwardToRealCodexWithStartupSync` is called on each supervisor restart cycle, which is intentional but undocumented.
- `findPrimaryCodexCommand`, `hasTopLevelHelpOrVersionFlag`, and `splitCodexCommandArgs` helpers. Logic correctly handles `--`, flag-value pairs, and `--config=` inline forms. No issues found.
- `AbortSignal` support added to `fetchCodexQuotaSnapshot`. Signal is wired to the inner `AbortController` via event listener with proper cleanup in `finally`. The abort-before-loop-entry check is correct. No issues found.
- `getCodexCliSessionSupervisor` accessor using the existing `resolveBooleanSetting` pattern. Defaults to `false` (opt-in). No issues found.
- `parseNumberEnv` max-arg bug. Missing vitest coverage for the TOCTOU stale-lock race under concurrent processes.

Sequence Diagram
```mermaid
sequenceDiagram
    participant W as codex.js (wrapper)
    participant S as codex-supervisor.js
    participant L as Storage Lock (.supervisor.lock)
    participant A as AccountManager
    participant Q as quota-probe
    participant C as codex (child process)
    W->>S: runCodexSupervisorIfEnabled({ codexBin, rawArgs })
    S->>L: withSupervisorStorageLock → loadFromDisk
    L-->>A: AccountManager.loadFromDisk()
    A-->>S: manager + currentAccount
    S->>Q: ensureLaunchableAccount (probe batch)
    Q-->>S: ready account selected + persisted
    S->>C: spawnRealCodex(codexBin, initialArgs)
    activate C
    loop monitor loop (every pollMs)
        S->>S: waitForSessionBinding (scan .jsonl files)
        S->>L: withSupervisorStorageLock → loadCurrentState
        L-->>A: AccountManager.loadFromDisk()
        S->>Q: probeAccountSnapshot(currentAccount)
        Q-->>S: quota snapshot
        alt quota near threshold (prewarm)
            S->>Q: maybeStartPreparedResumeSelection (background)
            Q-->>S: pre-warmed successor account
        end
        alt quota exhausted / rate-limited (rotate)
            S->>C: requestChildRestart (SIGINT → SIGTERM → SIGKILL)
            deactivate C
            S->>L: markCurrentAccountForRestart
            S->>Q: commitPreparedSelection or ensureLaunchableAccount
            Q-->>S: next account committed
            S->>C: spawnRealCodex(codexBin, resume sessionId)
            activate C
        end
    end
    C-->>S: exit (code, signal)
    deactivate C
    S-->>W: supervisedExitCode
    W->>W: autoSyncManagerActiveSelectionIfEnabled (if needed)
```
Last reviewed commit: "Add experimental Cod..."