[codex] Skip unreadable events during pagination#3754

Draft
neubig wants to merge 1 commit into main from codex/tolerate-corrupt-event-files

@neubig (Member) commented Jun 16, 2026

Summary

  • skip unreadable persisted event payloads while paginating/searching/counting conversation events
  • log a warning with the conversation id and event index when an event file is missing, empty, non-UTF-8, or fails Pydantic validation
  • add regression coverage for a zero-byte event file between valid events
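
The skip-and-warn behaviour listed above could look roughly like the sketch below. It is not the code from this PR: `read_event_or_none`, the `Event` dataclass, and the logger name are hypothetical, and a plain dataclass with `json` stands in for the real Pydantic model so the example needs only the standard library.

```python
import json
import logging
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

logger = logging.getLogger("event_service_sketch")  # hypothetical logger name


@dataclass
class Event:
    # Stand-in for the real Pydantic event model; the fields are assumptions.
    id: str
    kind: str


def read_event_or_none(path: Path, conversation_id: str, index: int) -> Optional[Event]:
    """Return the event stored at `path`, or None if it cannot be read."""
    try:
        raw = path.read_bytes()  # missing file raises FileNotFoundError (an OSError)
        if not raw:
            raise ValueError("empty event file")  # zero-byte file
        # Non-UTF-8 bytes raise UnicodeDecodeError and malformed JSON raises
        # JSONDecodeError; both are ValueError subclasses.
        data = json.loads(raw.decode("utf-8"))
        # Missing or mistyped fields play the role of a validation failure here.
        return Event(id=str(data["id"]), kind=str(data["kind"]))
    except (OSError, ValueError, KeyError, TypeError) as exc:
        logger.warning(
            "Skipping unreadable event %d in conversation %s: %s",
            index,
            conversation_id,
            exc,
        )
        return None
```

A caller that paginates, searches, or counts events would then drop the `None` results instead of propagating an exception.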

Root Cause

A zero-byte persisted event file can remain in the on-disk EventLog after an interrupted write. The event search path reads by index for pagination performance, so one unreadable event currently raises and makes /events/search return 500. The event websocket uses the same search path for replay, so affected conversations repeatedly disconnect even though the backend itself is healthy.
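
To illustrate why index-based pagination is sensitive to a single bad file, here is a minimal sketch of a page loop that tolerates gaps. The function `search_events`, its parameters, and the `read_at` callback are assumptions made for illustration; the actual search path in `event_service.py` may handle limits and cursors differently.

```python
from typing import Callable, List, Optional, Tuple


def search_events(
    read_at: Callable[[int], Optional[dict]],
    total: int,
    start: int = 0,
    limit: int = 2,
) -> Tuple[List[dict], Optional[int]]:
    """Collect up to `limit` readable events starting at index `start`.

    `read_at(i)` is a hypothetical reader that returns None for an
    unreadable event instead of raising.
    """
    items: List[dict] = []
    index = start
    while index < total and len(items) < limit:
        event = read_at(index)
        index += 1
        if event is None:
            continue  # skip the gap; later events can still fill the page
        items.append(event)
    # The next index to resume from, or None when the log is exhausted.
    next_index = index if index < total else None
    return items, next_index


# A log of three events where index 1 represents a zero-byte file on disk.
log = [{"id": "a"}, None, {"id": "c"}]
page, next_index = search_events(lambda i: log[i], total=len(log), limit=2)
print(page, next_index)
```

With a reader that raises instead of returning `None`, the same loop would abort at index 1, which matches the 500 response and the repeated websocket disconnects described above. One consequence of this design is that a page can come back empty if every remaining event is unreadable.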

Validation

  • uv run pytest tests/agent_server/test_event_service.py -q
  • uv run pre-commit run --files openhands-agent-server/openhands/agent_server/event_service.py tests/agent_server/test_event_service.py

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image | Docs / Tags
java | amd64, arm64 | eclipse-temurin:17-jdk | Link
python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link
golang | amd64, arm64 | golang:1.21-bookworm | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:a163ce7-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-a163ce7-python \
  ghcr.io/openhands/agent-server:a163ce7-python

All tags pushed for this build

ghcr.io/openhands/agent-server:a163ce7-golang-amd64
ghcr.io/openhands/agent-server:a163ce740049a0cb3a85b27a5e93bd9908756352-golang-amd64
ghcr.io/openhands/agent-server:codex-tolerate-corrupt-event-files-golang-amd64
ghcr.io/openhands/agent-server:a163ce7-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:a163ce7-golang-arm64
ghcr.io/openhands/agent-server:a163ce740049a0cb3a85b27a5e93bd9908756352-golang-arm64
ghcr.io/openhands/agent-server:codex-tolerate-corrupt-event-files-golang-arm64
ghcr.io/openhands/agent-server:a163ce7-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:a163ce7-java-amd64
ghcr.io/openhands/agent-server:a163ce740049a0cb3a85b27a5e93bd9908756352-java-amd64
ghcr.io/openhands/agent-server:codex-tolerate-corrupt-event-files-java-amd64
ghcr.io/openhands/agent-server:a163ce7-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:a163ce7-java-arm64
ghcr.io/openhands/agent-server:a163ce740049a0cb3a85b27a5e93bd9908756352-java-arm64
ghcr.io/openhands/agent-server:codex-tolerate-corrupt-event-files-java-arm64
ghcr.io/openhands/agent-server:a163ce7-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:a163ce7-python-amd64
ghcr.io/openhands/agent-server:a163ce740049a0cb3a85b27a5e93bd9908756352-python-amd64
ghcr.io/openhands/agent-server:codex-tolerate-corrupt-event-files-python-amd64
ghcr.io/openhands/agent-server:a163ce7-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:a163ce7-python-arm64
ghcr.io/openhands/agent-server:a163ce740049a0cb3a85b27a5e93bd9908756352-python-arm64
ghcr.io/openhands/agent-server:codex-tolerate-corrupt-event-files-python-arm64
ghcr.io/openhands/agent-server:a163ce7-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:a163ce7-golang
ghcr.io/openhands/agent-server:a163ce740049a0cb3a85b27a5e93bd9908756352-golang
ghcr.io/openhands/agent-server:codex-tolerate-corrupt-event-files-golang
ghcr.io/openhands/agent-server:a163ce7-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:a163ce7-java
ghcr.io/openhands/agent-server:a163ce740049a0cb3a85b27a5e93bd9908756352-java
ghcr.io/openhands/agent-server:codex-tolerate-corrupt-event-files-java
ghcr.io/openhands/agent-server:a163ce7-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:a163ce7-python
ghcr.io/openhands/agent-server:a163ce740049a0cb3a85b27a5e93bd9908756352-python
ghcr.io/openhands/agent-server:codex-tolerate-corrupt-event-files-python
ghcr.io/openhands/agent-server:a163ce7-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., a163ce7-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., a163ce7-python-amd64) are also available if needed

Issue

Fixes #3761.

Co-authored-by: openhands <openhands@all-hands.dev>

@github-actions (Contributor)

Python API breakage checks — ✅ PASSED

Result: PASSED

@github-actions (Contributor)

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

@github-actions (Contributor)

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
openhands-agent-server/openhands/agent_server | | | |
event_service.py | 567 | 113 | 80% | 103–104, 134, 137–138, 142–143, 150, 154, 160, 170–174, 177–180, 251, 272–273, 347, 401, 421, 428, 452–453, 457, 465, 468, 478, 510, 521, 528, 534, 588, 590, 594–596, 600, 609–610, 612, 616, 622, 624, 671, 701, 704, 759, 780, 888, 1000–1003, 1007, 1016, 1018–1019, 1023, 1037–1042, 1044, 1071, 1076–1079, 1083–1086, 1094–1097, 1113, 1130–1133, 1179–1180, 1182–1189, 1191–1192, 1201–1202, 1204–1205, 1212–1213, 1215–1216, 1236, 1242, 1248, 1257–1258
TOTAL | 31387 | 13756 | 56% |

Development

Successfully merging this pull request may close these issues.

Track PR #3754: Skip unreadable events during pagination