feat(codex): OpenAI Codex as an iii worker#254
Conversation
New Node worker that drives headless Codex turns over the iii bus, sibling to the claude-code worker with the same architecture. - codex::run / codex::start / codex::stop / codex::status / codex::sessions::list; codex::run shares the run::start_and_wait input contract so acp drives it via --brain-fn codex::run - raw Codex thread events (thread.started, item.*, turn.completed) mirrored verbatim onto codex::events; translated AgentEvent frames (message_complete, function_execution_start/end, turn_end, agent_end) on agent::events - thread resume via engine state scope codex_sessions (iii session_id to Codex thread id); per-turn cwd/model overrides honored - named fields for sandbox_mode, approval_policy, reasoning_effort, skip_git_repo_check, output_schema; raw ThreadOptions pass-through via options - request schemas published on every function (zod 4 toJSONSchema) - deploy: bundle with esbuild single-file build; codex_executable config plus PATH fallback - 55 vitest tests: full executeRun flow, register boundary, abort via AbortSignal, event mapping, config, state, executable resolution
- iii.worker.yaml install script uses pnpm, matching the declared packageManager and committed lockfile - smoke script exits non-zero when the run errors or expected frames are missing - test coverage for web_search item mapping
- codex_config payload field forwards per-turn config.toml overrides (MCP servers, model providers, profiles) as SDK config - images payload field attaches local_image inputs to the prompt - base_url worker config maps to the SDK baseUrl override - 58 tests
New threads start with the iii runtime block, the same engine-grounded discovery rules as the harness identity prompts retargeted to the iii CLI the agent reaches through its sandboxed shell: discover ids via engine::functions::list, fetch the contract with iii trigger <fn> --help before the first call, install from the public registry when nothing registered fits, never invent ids or fields from memory. Codex has no system-prompt option, so the block is prepended to the first turn of a new thread only; the thread persists it across resumes. A localhost engine is reachable under the default sandbox. iii_context: false disables per turn or in config.yaml. 61 tests.
Reading the Codex source (codex-rs/core: config/mod.rs, session/mod.rs) shows developer_instructions is a first-class config key injected as a developer-role message into every turn's context bundle. Switch the iii runtime block from first-message prepending to that channel: - applies on every turn including resumes, rebuilt as turn context rather than persisted history, so nothing duplicates - the user prompt stays untouched (no # Task framing) - composes with codex_config: a caller-supplied developer_instructions wins over the iii block Verified live: a turn with no iii mention in the prompt discovered and listed another worker's function ids via the engine catalog. 63 tests.
- codex::start marks the session 'error' best-effort when a background run throws, so a failed terminal save never leaves it stuck 'working' - mapUsage defaults absent cache/reasoning token fields to 0 instead of undefined From an /improve audit. 65 tests (2 new).
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughA new Rust-based ChangesCodex Worker — Rust Implementation
Sequence DiagramsequenceDiagram
actor Caller
participant functions as functions::register_all
participant run as codex::run
participant codex_exec as codex exec (child)
participant stream_turn as stream_turn
participant translate as translate::step
participant events as events::emit
participant state as state helpers
participant bus as iii engine
Caller->>functions: codex::run (RunRequest)
functions->>run: run(iii, cfg, req)
run->>state: save_session(Working)
state->>bus: state::set codex_sessions/session_id
run->>codex_exec: spawn codex exec --json [args]
run->>codex_exec: write prompt to stdin
run->>stream_turn: stream stdout JSONL
loop per JSONL line
stream_turn->>events: emit raw ThreadEvent
events->>bus: stream::set codex::events
stream_turn->>translate: step(state, ThreadEvent)
translate-->>stream_turn: Vec~Value~ frames
stream_turn->>events: emit each frame
events->>bus: stream::set agent::events
end
stream_turn-->>run: Outcome
run->>state: save_session(Done/Error)
run->>events: emit turn_end / agent_end
events->>bus: stream::set agent::events
run-->>Caller: result JSON {output, stop_reason, thread_id, usage}
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
skill-check — worker0 verified, 19 skipped (no docs/).
Four for four. Nicely done. |
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (4)
codex/tests/config.test.ts (1)
21-32: 💤 Low valueConsider cleaning up temporary directories.
The test creates temporary directories and files but doesn't remove them after the test completes. While the OS will eventually clean these up, explicit cleanup is better practice.
♻️ Suggested cleanup pattern
it('merges a partial file over defaults', async () => { const dir = await mkdtemp(join(tmpdir(), 'codex-config-')); + try { const path = join(dir, 'config.yaml'); await writeFile( path, ['engine_url: ws://10.0.0.1:49134', 'defaults:', ' sandbox_mode: read-only'].join('\n'), ); const cfg = await loadConfig(path); expect(cfg.engine_url).toBe('ws://10.0.0.1:49134'); expect(cfg.defaults.sandbox_mode).toBe('read-only'); expect(cfg.defaults.approval_policy).toBe('never'); + } finally { + await rm(dir, { recursive: true, force: true }); + } });You would need to import
rmfromnode:fs/promisesand apply the same pattern to the other tests that create temp directories.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@codex/tests/config.test.ts` around lines 21 - 32, This test creates a temporary directory via mkdtemp (variable dir) and writes config.yaml but never removes it; wrap the test body in a try/finally (or use afterEach) and in the finally call rm (imported from node:fs/promises) to delete the temp dir (await rm(dir, { recursive: true, force: true })) so the temporary files created by mkdtemp/join/writeFile are cleaned up even on test failures; apply the same pattern to other tests that create temp directories and reference mkdtemp/tmpdir/dir/path/loadConfig in your changes.codex/tests/executable.test.ts (1)
22-29: ⚡ Quick winClean up temporary directory and consider Windows compatibility.
The test creates a temporary directory (line 23) but never removes it, leading to resource accumulation. Additionally,
chmodSync(line 26) doesn't reliably set executable permissions on Windows, which could cause the test to fail or behave unexpectedly on that platform.♻️ Suggested improvements
+import { rmSync } from 'node:fs'; + describe('resolveCodexExecutable', () => { const originalPath = process.env.PATH; + const tempDirs: string[] = []; beforeEach(() => { process.env.PATH = ''; }); afterEach(() => { process.env.PATH = originalPath; + for (const dir of tempDirs) { + rmSync(dir, { recursive: true, force: true }); + } + tempDirs.length = 0; }); it('finds an executable codex on PATH', () => { const dir = mkdtempSync(join(tmpdir(), 'codex-exe-')); + tempDirs.push(dir); const bin = join(dir, 'codex'); writeFileSync(bin, '#!/bin/sh\n'); - chmodSync(bin, 0o755); + if (process.platform !== 'win32') { + chmodSync(bin, 0o755); + } process.env.PATH = dir; expect(resolveCodexExecutable('')).toBe(bin); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@codex/tests/executable.test.ts` around lines 22 - 29, The test creates a temp directory with mkdtempSync and a test binary via writeFileSync and chmodSync but never removes the directory and uses chmodSync which is unreliable on Windows; update the test around resolveCodexExecutable to create a platform-appropriate executable (on win32 write a .cmd/.bat shim like join(dir, 'codex.cmd') instead of relying on chmodSync, otherwise create the POSIX file and set mode), set process.env.PATH to include the temp dir, and ensure the temp directory is removed after the test (use try/finally or an afterEach that calls rmSync/fs.rmSync with force/recursive) so the temp files are always cleaned up.codex/tests/register.test.ts (1)
52-60: ⚡ Quick winConsider adding error message assertions for clearer test intent.
The tests correctly verify that invalid payloads are rejected, but don't check the specific error messages. While this works, adding message pattern checks (e.g.,
expect(...).rejects.toThrow(/sandbox_mode/)) would make failures easier to diagnose and prevent false positives if a different error is thrown.✨ Example enhancement
it('codex::run rejects invalid payloads', async () => { const fake = await registeredWorker(); await expect( fake.registered.get('codex::run')?.({ prompt: 'x', sandbox_mode: 'yolo' }), - ).rejects.toThrow(); + ).rejects.toThrow(/sandbox_mode/); await expect( fake.registered.get('codex::run')?.({ prompt: 'x', approval_policy: 'always' }), - ).rejects.toThrow(); + ).rejects.toThrow(/approval_policy/); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@codex/tests/register.test.ts` around lines 52 - 60, Update the test in register.test.ts to assert error messages so failures are clearer: when calling the handler obtained via registered.get('codex::run') (from registeredWorker()), replace the generic rejects.toThrow() checks with rejects.toThrow(/sandbox_mode/) for the payload containing sandbox_mode and rejects.toThrow(/approval_policy/) for the payload containing approval_policy so the test verifies the specific validation errors.codex/scripts/build-bundle.mjs (1)
28-45: ⚡ Quick winAdd error handling for SDK inlining failure.
The
inlinePackageJsonplugin silently proceeds if the SDK source doesn't match the expectedcreateRequire(...).("../package.json")pattern or ifnode_modules/iii-sdk/package.jsonis missing. If the replacement fails (e.g., SDK refactors its internal version-loading logic), the bundle will include the originalcreateRequirecall, which will fail at runtime when the bundle cannot resolve../package.jsonrelative to the bundle output path.Consider adding a check after the replacement to confirm at least one substitution occurred, and throw a build error if the pattern is not found.
🛡️ Proposed safeguard
const replaced = source.replace( /createRequire\(\s*import\.meta\.url\s*\)\s*\(\s*"\.\.\/package\.json"\s*\)/g, JSON.stringify({ version }), ); + if (replaced === source) { + throw new Error( + 'inlinePackageJson: expected createRequire pattern not found in iii-sdk/dist/index.mjs' + ); + } return { contents: replaced, loader: 'js' };🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@codex/scripts/build-bundle.mjs` around lines 28 - 45, The inlinePackageJson plugin currently replaces a createRequire(...) call in iii-sdk/dist/index.mjs but doesn't detect failures; update the setup/onLoad handler (inside inlinePackageJson) to: after reading args.path and node_modules/iii-sdk/package.json, attempt the replace as now, then check the replaced string to confirm at least one substitution occurred (e.g., by comparing lengths or using a regex test); if no replacement happened or the package.json read failed, throw an Error (or return an esbuild error) so the build fails loudly and surfaces the missing/changed pattern; ensure the error message references the target file (args.path) and the createRequire pattern and the node_modules/iii-sdk/package.json read so it's clear why inlining failed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@codex/package.json`:
- Around line 25-30: The codex package.json currently uses a caret range for the
iii-sdk dependency ("iii-sdk": "^0.19.2") which is inconsistent with the repo's
dominant pattern of exact pins; either change the dependency to an exact version
("iii-sdk": "0.19.2") to match other workers or add a short justification in
this package's README or package.json comments explaining why a caret range is
intentional (e.g., compatibility needs); update the codex/package.json entry for
iii-sdk or add the justification and ensure the change is committed so the
repo's dependency pinning policy remains consistent.
In `@codex/README.md`:
- Line 126: The docs for approval_policy are out of sync with the runtime:
update codex/README.md and codex/skills/SKILL.md to describe that
approval_policy is forwarded from requests/defaults by codex/src/run.ts (into
ThreadOptions) and that headless workflows typically set or recommend
approval_policy: "never" rather than declaring it as an enforced runtime
constraint; mention that other values can be passed and will be respected by
ThreadOptions and run.ts instead of implying sandbox-enforced behavior.
In `@codex/src/config.ts`:
- Around line 5-24: Make the top-level ConfigSchema and its nested defaults
object strict so unknown keys cause validation errors; update the creation of
ConfigSchema and the nested defaults z.object({...}) (referenced as ConfigSchema
and the defaults schema) to use Zod's strict mode (e.g., call .strict() on both
z.object(...) instances) so typos like defaults.approval_polciy or
raw_event_stream fail fast instead of being stripped.
In `@codex/src/events.ts`:
- Around line 10-17: The seqBySession map in makeEmitter causes an unbounded
memory leak by retaining every session_id; replace per-session tracking with a
process-global monotonically increasing counter (e.g., add a module-level let
globalSeq = 0) and use that counter when building item_id (use globalSeq++ with
PROCESS_EPOCH) or, if per-session ordering is required, ensure you delete the
session entry from seqBySession when a session finishes; update the emit
function (and item_id creation) to reference the chosen globalSeq (or add
cleanup logic to seqBySession) so session IDs are not retained indefinitely.
- Around line 18-25: The current catch around the iii.trigger({ function_id:
'stream::set', ... }) call swallows failures with console.warn which causes
silent partial success; change the catch to propagate the error (or surface it
into session state) so callers can react—for example, remove the silent swallow
and either rethrow the caught err after logging or call your session error
handler (e.g., setSessionError(session_id, err) or
sessionManager.markFailed(session_id, err)) so stream::set failures are not
ignored and will abort or mark the session appropriately.
In `@codex/src/map.ts`:
- Around line 151-156: The lastAssistant function can return a non-assistant
message when no assistant entry exists; change its final fallback to return the
assistant-empty sentinel by returning makeAssistantMessage([], '', null) instead
of messages[messages.length - 1] so that lastAssistant always yields an
assistant-typed message; update the return in lastAssistant (referencing
lastAssistant and makeAssistantMessage) accordingly.
In `@codex/src/run.ts`:
- Around line 157-191: The session is saved as "working" before creating the
Codex instance and starting/resuming the thread (new Codex, codex.resumeThread,
codex.startThread), so if those throw the DB record remains stuck and no live
handle is set; wrap the pre-stream setup (creation of Codex, obtaining thread
via resumeThread/startThread, creating AbortController and LiveRun handle, and
live.set(session_id, handle)) in a try/catch and on any error persist a terminal
state for the session via saveSession/record update (clearing "working" and
marking failure), then rethrow the error; alternatively move the persistent
"working" save until after the thread and live handle are successfully created.
Ensure you reference the symbols saveSession, Codex, codex.resumeThread,
codex.startThread, AbortController, LiveRun, live.set, and session_id when
making the change.
- Around line 138-157: executeRun currently allows overlapping runs for the same
session_id (loaded via loadSession) which races the shared SessionRecord and the
stored live handle; add a guard after loading the prior record to reject if that
session is already active (e.g., prior?.status === 'working' or a dedicated
in-record flag indicating a live run) and return an error/abort the new run
instead of continuing; update the code paths that set record.status and
saveSession (where record is created/updated) to only proceed when no live run
exists and ensure any change to record.live/status is performed atomically (or
via a compare-and-set/update) so codex::stop will always target the single
stored live handle.
In `@codex/src/state.ts`:
- Around line 13-19: Validate state responses in loadSession and listSessions by
parsing the returned object(s) against the runtime SessionRecord schema instead
of only checking for session_id: implement a type guard or schema validator
(e.g., validateSessionRecord) that confirms required fields like
codex_thread_id, status (allowed enum), numeric token counters, and other
invariants, then use it inside loadSession (return null if validation fails) and
inside listSessions (filter out or omit invalid entries before returning the
array); update the callers to expect null/filtered results accordingly and keep
the unique identifiers loadSession and listSessions to locate changes.
---
Nitpick comments:
In `@codex/scripts/build-bundle.mjs`:
- Around line 28-45: The inlinePackageJson plugin currently replaces a
createRequire(...) call in iii-sdk/dist/index.mjs but doesn't detect failures;
update the setup/onLoad handler (inside inlinePackageJson) to: after reading
args.path and node_modules/iii-sdk/package.json, attempt the replace as now,
then check the replaced string to confirm at least one substitution occurred
(e.g., by comparing lengths or using a regex test); if no replacement happened
or the package.json read failed, throw an Error (or return an esbuild error) so
the build fails loudly and surfaces the missing/changed pattern; ensure the
error message references the target file (args.path) and the createRequire
pattern and the node_modules/iii-sdk/package.json read so it's clear why
inlining failed.
In `@codex/tests/config.test.ts`:
- Around line 21-32: This test creates a temporary directory via mkdtemp
(variable dir) and writes config.yaml but never removes it; wrap the test body
in a try/finally (or use afterEach) and in the finally call rm (imported from
node:fs/promises) to delete the temp dir (await rm(dir, { recursive: true,
force: true })) so the temporary files created by mkdtemp/join/writeFile are
cleaned up even on test failures; apply the same pattern to other tests that
create temp directories and reference mkdtemp/tmpdir/dir/path/loadConfig in your
changes.
In `@codex/tests/executable.test.ts`:
- Around line 22-29: The test creates a temp directory with mkdtempSync and a
test binary via writeFileSync and chmodSync but never removes the directory and
uses chmodSync which is unreliable on Windows; update the test around
resolveCodexExecutable to create a platform-appropriate executable (on win32
write a .cmd/.bat shim like join(dir, 'codex.cmd') instead of relying on
chmodSync, otherwise create the POSIX file and set mode), set process.env.PATH
to include the temp dir, and ensure the temp directory is removed after the test
(use try/finally or an afterEach that calls rmSync/fs.rmSync with
force/recursive) so the temp files are always cleaned up.
In `@codex/tests/register.test.ts`:
- Around line 52-60: Update the test in register.test.ts to assert error
messages so failures are clearer: when calling the handler obtained via
registered.get('codex::run') (from registeredWorker()), replace the generic
rejects.toThrow() checks with rejects.toThrow(/sandbox_mode/) for the payload
containing sandbox_mode and rejects.toThrow(/approval_policy/) for the payload
containing approval_policy so the test verifies the specific validation errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 27115d0c-50b1-4736-b3d9-aad57dfa0f4e
⛔ Files ignored due to path filters (4)
codex/assets/cli-help.pngis excluded by!**/*.pngcodex/assets/cli-run.pngis excluded by!**/*.pngcodex/assets/console-traces.pngis excluded by!**/*.pngcodex/pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (31)
README.mdcodex/.gitignorecodex/README.mdcodex/biome.jsoncodex/config.yamlcodex/iii.worker.yamlcodex/package.jsoncodex/scripts/build-bundle.mjscodex/scripts/smoke/smoke-streams.mjscodex/skills/SKILL.mdcodex/src/config.tscodex/src/events.tscodex/src/executable.tscodex/src/iii-prompt.tscodex/src/index.tscodex/src/map.tscodex/src/run.tscodex/src/state.tscodex/src/types.tscodex/tests/_helpers/fake-codex.tscodex/tests/_helpers/fake-iii.tscodex/tests/config.test.tscodex/tests/events.test.tscodex/tests/executable.test.tscodex/tests/map.test.tscodex/tests/register.test.tscodex/tests/run-payload.test.tscodex/tests/run.test.tscodex/tests/state.test.tscodex/tsconfig.jsoncodex/vitest.config.ts
- reject a run when one is already live for the session (live.has guard): the in-process handle is what codex::stop targets, so a second run would clobber it and race the shared record - persist 'working' only after the thread + live handle exist, so a throw during Codex construction never leaves the record stuck in 'working' - README: approval_policy is forwarded per turn and any value is respected, not a fixed runtime constraint 66 tests (1 new).
The worker only ever shelled to `codex exec --json`; Rust does that natively via tokio::process, so the worker becomes a deploy: binary like the other Rust workers — no esbuild bundle, no Node runtime, and the executable PATH hack is gone (it just spawns codex). - spawn `codex exec --json`, write prompt to stdin, parse the JSONL event stream (serde mirror of exec_events.rs; lenient, unknown events/items fall through and are skipped) - raw events verbatim on codex::events, translated AgentEvent frames on agent::events (codex::shell / apply_patch / web_search / server::tool) - thread resume via engine state scope codex_sessions; per-turn cwd/model overrides; concurrency guard (one live run per session) and working-save-after-handle ordering carried over - named fields + codex_config (--config pass-through) + additional_directories + output_schema (temp file) + images; developer_instructions injects the iii runtime context, caller value wins - request schemas via schemars so `iii trigger codex::run --help` prints the parameter table - embedding codex-core was rejected: the crates are unpublished (0.0.0, not on crates.io), so a dependency would be a git URL pulling the whole OpenAI runtime; the binary is the stable contract 23 unit tests (args/config serialization, event mapping, prompt extraction, lenient parsing). Verified live against the engine: pong, command-exec mapped to codex::shell on both streams, thread resume.
There was a problem hiding this comment.
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@codex/src/codex/args.rs`:
- Around line 175-186: The sandbox_mode, approval_policy, and reasoning_effort
fields are currently using unwrap_or_else with clone, which will accept Some("")
(empty strings) and use them to override the defaults. For these safety-critical
options, empty strings should be treated as unset values. Modify the logic for
each of these three fields to check not only for None but also for empty string
values, falling back to the default from d only when the request value is either
None or an empty string. This ensures that empty overrides cannot suppress the
default safety controls.
In `@codex/src/codex/mod.rs`:
- Around line 84-90: The session busy-guard logic has a race condition where the
is_live check and the subsequent LIVE.insert operation are not atomic, allowing
concurrent requests to both pass the check before either completes the
insertion. Fix this by combining the check and insert into a single atomic
operation—use the LIVE data structure's methods (such as a conditional insert or
compare-and-swap operation) to check and insert in one operation while holding
appropriate locks/mutexes, ensuring that only one request succeeds. This fix
needs to be applied at the location around line 84-90 where the is_live check is
performed, and also at the additional location around line 166-171 where the
same pattern exists.
- Around line 99-100: The code is silently dropping errors from load_session and
save_session calls using .ok().flatten() pattern, which breaks resume and
persistence guarantees. Instead of swallowing these errors, properly handle them
by propagating or logging the errors appropriately. Specifically, when
load_session fails at the location where prior is assigned, and when
save_session fails at the other affected locations in the run path, ensure
errors are either propagated up the call stack or logged with details to
maintain clarity about persistence failures. Replace the .ok().flatten() pattern
with explicit error handling that distinguishes between "no prior session"
(which is valid) and "error loading prior session" (which should not be silently
ignored).
- Around line 149-152: The stderr output is configured as piped via
Stdio::piped() in the spawn() call but is never read or drained from the child
process. This can cause the pipe buffer to fill up and block the child process
if it writes too much to stderr. You need to read and drain the stderr output
from the child process after spawning it, either by spawning it in a separate
thread/task to consume the stderr stream, or by using a different stdio
configuration that doesn't require manual draining. Ensure that piped stderr is
properly handled to prevent blocking.
- Around line 338-339: The child process exit status is being discarded by using
let _ with the child.wait().await call, which means a non-zero exit code is
never checked or reflected in the outcome. Capture the result of
child.wait().await instead of ignoring it, then inspect the exit status from the
result. If the process failed (non-zero exit status), update the outcome
variable accordingly to reflect the failure, ensuring that failed process runs
are not reported as successful even if no explicit error event was observed
during streaming.
In `@codex/src/functions/mod.rs`:
- Around line 62-69: The tokio::spawn call at line 62 drops the JoinHandle
immediately, which means panics in the spawned task are unobserved and the
fallback mark_error logic intended as a backstop (referenced in the comment at
lines 63-64) never executes for panic cases. To fix this, capture the JoinHandle
returned by tokio::spawn instead of dropping it, and supervise it to detect task
panics. Ensure that mark_error is called as a fallback whenever the spawned task
panics, not only when it completes normally with is_error=true.
In `@codex/src/functions/types.rs`:
- Around line 69-77: The match expression handling the last message content in
the match on &last.content is currently returning Ok(String::new()) for the
catch-all case, which silently allows malformed payloads to proceed with an
empty prompt. Replace the underscore catch-all pattern case that returns
Ok(String::new()) with an Err variant that returns an appropriate error
indicating the message content is not in a supported format. This will ensure
that invalid or unsupported message content shapes are properly rejected rather
than silently processed with empty input.
In `@codex/src/main.rs`:
- Line 47: The Config::load call at line 47 in the main function currently
aborts startup on config parse/load errors using the `?` operator. Instead of
exiting, follow repository convention by catching the error from Config::load,
logging a warning message describing the failure, and falling back to a default
inert configuration so the worker remains registered and operational. Replace
the `?` operator with explicit error handling that matches the pattern used by
sibling worker binaries in the repository.
- Around line 19-53: The url field in the Cli struct has a default_value set in
the argument attribute, which causes cli.url to never be empty when checked in
the main function. Remove the default_value parameter from the #[arg] attribute
on the url field so that cli.url will be empty when the --url flag is not
provided by the user. This allows the empty-check logic (the if
cli.url.is_empty() condition) to correctly fall back to cfg.engine_url from the
config file, making the configuration-driven URL reachable in normal runs.
In `@codex/src/state.rs`:
- Around line 29-32: In the `load_session` function, the error case of
`serde_json::from_value::<SessionRecord>(v)` is currently converting decode
failures to `Ok(None)`, which masks state corruption or incompatibility issues
by treating them identically to missing sessions. Instead of silently ignoring
errors with `Err(_) => Ok(None)`, propagate the error by returning it so that
decode failures fail fast and properly surface the actual corruption or
incompatibility issue to the caller, rather than silently starting a fresh
conversation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a2b4aa3b-94fc-4350-ab2f-ac17706148c2
⛔ Files ignored due to path filters (1)
codex/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (23)
README.mdcodex/.gitignorecodex/Cargo.tomlcodex/README.mdcodex/build.rscodex/iii.worker.yamlcodex/src/codex/args.rscodex/src/codex/events_types.rscodex/src/codex/mod.rscodex/src/config.rscodex/src/events.rscodex/src/functions/mod.rscodex/src/functions/types.rscodex/src/iii_prompt.rscodex/src/lib.rscodex/src/main.rscodex/src/manifest.rscodex/src/map.rscodex/src/state.rscodex/src/wire.rscodex/tests/args.rscodex/tests/map.rscodex/tests/prompt_config.rs
✅ Files skipped from review due to trivial changes (2)
- codex/src/lib.rs
- codex/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- codex/.gitignore
- README.md
The per-event translation (JSONL event -> agent::events frames + turn state) lived inside the I/O stream loop, so the orchestration sequence had no unit coverage — only the per-item map helpers did. Lift it into a pure `translate::step(&mut TurnState, ThreadEvent) -> Vec<frame>`; the loop now reads lines, emits raw, calls step, emits the returned frames. No bus wrapper — the loop still uses the iii-sdk III handle directly. Adds tests/translate.rs (6): full command turn frame sequence + state, started-then-completed single-start, completed-without-start synthesis, reasoning->thinking, turn.failed error state, mcp server::tool id. 29 tests total. Verified live: refactored loop emits the identical agent::events / codex::events sequences.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
codex/src/codex/mod.rs (1)
160-163:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winPrompt write failures are silently ignored.
If
write_allorshutdownfails, the turn proceeds with no/partial input to Codex, likely causing a confusing downstream failure or hang rather than a clear error.Suggested fix
if let Some(mut stdin) = child.stdin.take() { - let _ = stdin.write_all(prompt.as_bytes()).await; - let _ = stdin.shutdown().await; + if let Err(e) = stdin.write_all(prompt.as_bytes()).await { + let _ = child.start_kill(); + LIVE.lock().await.remove(&session_id); + return json!({ "session_id": session_id, "is_error": true, "stop_reason": "error", "result": format!("failed to write prompt to codex stdin: {e}") }); + } + let _ = stdin.shutdown().await; // shutdown error is less critical }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@codex/src/codex/mod.rs` around lines 160 - 163, The code is silently ignoring errors from stdin.write_all() and stdin.shutdown() operations by using let _ =, which means if writing the prompt to the Codex process fails, the error is discarded and execution continues without input to the child process. Replace the error-ignoring pattern with proper error handling that logs or propagates the actual errors from both write_all and shutdown operations, ensuring that failures to write the prompt to stdin are surfaced clearly rather than causing confusing downstream failures.
🧹 Nitpick comments (1)
codex/tests/translate.rs (1)
100-109: ⚡ Quick winAdd a direct test for top-level
errorevents.You validate
turn.failedon Line 100+, buttranslate::stephas a separateThreadEvent::Errorbranch with the same error-state semantics. Add a sibling test to lock that path too (is_error=true,stop_reason="error",result_text, no frames).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@codex/tests/translate.rs` around lines 100 - 109, The test `turn_failed_sets_error_state_and_emits_no_frame` validates the turn.failed event path, but translate::step has a separate ThreadEvent::Error branch with identical error-state semantics that is not covered by a direct test. Add a new sibling test function in the same file that directly tests the top-level error event path by passing a JSON event with type "error" (instead of "turn.failed") to the run function, and verify it achieves the same assertions: frames.is_empty(), state.is_error is true, state.stop_reason equals "error", and state.result_text contains the error message from the event.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@codex/src/codex/mod.rs`:
- Around line 160-163: The code is silently ignoring errors from
stdin.write_all() and stdin.shutdown() operations by using let _ =, which means
if writing the prompt to the Codex process fails, the error is discarded and
execution continues without input to the child process. Replace the
error-ignoring pattern with proper error handling that logs or propagates the
actual errors from both write_all and shutdown operations, ensuring that
failures to write the prompt to stdin are surfaced clearly rather than causing
confusing downstream failures.
---
Nitpick comments:
In `@codex/tests/translate.rs`:
- Around line 100-109: The test
`turn_failed_sets_error_state_and_emits_no_frame` validates the turn.failed
event path, but translate::step has a separate ThreadEvent::Error branch with
identical error-state semantics that is not covered by a direct test. Add a new
sibling test function in the same file that directly tests the top-level error
event path by passing a JSON event with type "error" (instead of "turn.failed")
to the run function, and verify it achieves the same assertions:
frames.is_empty(), state.is_error is true, state.stop_reason equals "error", and
state.result_text contains the error message from the event.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6c5a19d4-1cc1-4337-83b2-4f09b1e223e1
📒 Files selected for processing (3)
codex/src/codex/mod.rscodex/src/codex/translate.rscodex/tests/translate.rs
- atomic reserve: check + insert the live slot under one lock (try_reserve), with release on every early-return path; closes the TOCTOU between the busy-check and the insert - drain child stderr in a background task so a chatty codex can't fill the pipe buffer and block; tail surfaces on the error path - inspect child exit status: a non-zero exit with no error event (a crash) is now reported as error, not success (aborted path exempt) - empty-string sandbox_mode / approval_policy / reasoning_effort treated as unset so a caller can't wipe the operator's safety defaults with empty - load_session failure is logged (not silently flattened); a decode failure in state propagates as corruption instead of masquerading as no-session - unsupported message content shape errors instead of running empty prompt - --url defaults to empty so config.yaml engine_url is actually reachable - codex::start supervises the spawned task and marks error on panic 31 tests (2 new). Verified live: pong, config-url fallback, refactored run.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
codex/src/state.rs (1)
59-66:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
list_sessionssilently masks state contract/corruption problems.This path converts non-array payloads to an empty list and drops invalid records with
filter_map(...ok()), which can hide schema drift or corrupted persisted entries. Prefer surfacing an error (or at minimum logging each dropped record).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@codex/src/state.rs` around lines 59 - 66, The list_sessions function silently masks state issues by returning an empty vector when the value is not an array (when as_array() returns None) and by dropping invalid records with filter_map(...ok()). Instead, return an appropriate error when the value is not an array to surface schema/state corruption issues, and replace the filter_map pattern with explicit handling that logs each failed deserialization attempt so that schema drift and corrupted entries are visible for debugging rather than silently hidden.
♻️ Duplicate comments (1)
codex/src/functions/types.rs (1)
71-75:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStill allows empty prompts from text-block arrays.
When
messages[*].contentis an array with no usabletextfields, this branch returnsOk(""), socodex::runcan still execute an empty turn instead of rejecting malformed input. Return an error when the joined text is empty.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@codex/src/functions/types.rs` around lines 71 - 75, The Value::Array branch currently allows empty prompts because it collects text from blocks and joins them without validating the result. After the .join("\n") operation in the Value::Array match arm, add a check to verify the resulting text is not empty; if it is empty, return an error instead of returning Ok("") to reject malformed input that has no usable text fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@codex/src/state.rs`:
- Around line 59-66: The list_sessions function silently masks state issues by
returning an empty vector when the value is not an array (when as_array()
returns None) and by dropping invalid records with filter_map(...ok()). Instead,
return an appropriate error when the value is not an array to surface
schema/state corruption issues, and replace the filter_map pattern with explicit
handling that logs each failed deserialization attempt so that schema drift and
corrupted entries are visible for debugging rather than silently hidden.
---
Duplicate comments:
In `@codex/src/functions/types.rs`:
- Around line 71-75: The Value::Array branch currently allows empty prompts
because it collects text from blocks and joins them without validating the
result. After the .join("\n") operation in the Value::Array match arm, add a
check to verify the resulting text is not empty; if it is empty, return an error
instead of returning Ok("") to reject malformed input that has no usable text
fields.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9abeed40-efc7-44fc-84d3-cec373022106
📒 Files selected for processing (8)
codex/src/codex/args.rscodex/src/codex/mod.rscodex/src/functions/mod.rscodex/src/functions/types.rscodex/src/main.rscodex/src/state.rscodex/tests/args.rscodex/tests/prompt_config.rs
🚧 Files skipped from review as they are similar to previous changes (4)
- codex/tests/prompt_config.rs
- codex/src/main.rs
- codex/src/codex/mod.rs
- codex/src/codex/args.rs
…ifest - add iii-permissions.yaml (new-worker SOP § 7): read-only status / sessions::list allow-listed; run/start/stop stay at needs_approval since they spawn a full agent with host fs+shell - trim iii.worker.yaml to the canonical binary-worker shape (drop the stale targets/runtime/scripts blocks; the release matrix is driven by release.yml, not the manifest)
Migrate off file-only config.yaml to the configuration-worker pattern (shell/coder/database): config.yaml is now the seed installed as initial_value; the live value is authoritative and hot-reloads. - src/configuration.rs: register_config / fetch_config / config-change trigger (configuration:updated) / reconcile / whole-snapshot hot-swap. No jail gate — every codex field is a runtime tuning knob. - config.rs: Config gains Serialize + JsonSchema + to_json/from_json/ json_schema; engine_url dropped from the schema (bootstrap, stays on --url). - main.rs: load seed -> register_config -> fetch -> ConfigCell -> bind trigger + reconcile before serving -> register_all(cell). - handlers read the live snapshot per call (ConfigCell). 33 tests. Verified live: schema registered, configuration::get returns the value, codex::run uses the snapshot, configuration::set hot-reloads.
- add codex to create-tag.yml options and release.yml tag patterns (SOP § 6 — without it a release tag triggers nothing) - config registration is now best-effort: register/fetch/trigger failures log + fall back to the seed instead of aborting, so registering codex::* never depends on the configuration worker being up (interface boot smoke runs against a bare engine) - deny the internal codex::on-config-change from agent calls Merged latest main. Verified: codex boots on a bare `workers: []` engine and registers all 6 functions; clippy + 33 tests green.
From an adversarial review (only finding above nit). A malformed JSONL line was dropped with no trace; add a tracing::debug on the parse-failure arm so a format drift is diagnosable instead of silent. 7 test groups, clippy clean.
New Node worker that drives headless Codex turns over the iii bus,
sibling to the claude-code worker with the same architecture.
Summary by CodeRabbit
codex::run,codex::start,codex::stop,codex::status, andcodex::sessions::list, including background execution and resumable sessions.