Problem
Current behavior: Multiple workers may attempt to build the same Docker image tags concurrently in prepare_environment (base/env/instance), leading to races, inconsistent build states, and flaky failures. Docker's layer cache helps but does not prevent tag-level races or partially built images.
Why it matters: Parallel evaluation is a core feature. Without locking, two workers can:
- Both build the same env/instance image at once.
- Remove each other's intermediate artifacts or fail on tag conflicts.
- Produce non-deterministic failures that are hard to reproduce.
Proposal
Introduce cross-process locks around image creation per tag:
- Implement a filesystem-based lock (e.g., fcntl/portalocker) or a simple lockfile in settings.cache_dir keyed by image tag (e.g., image-locks/{sha256(tag)}.lock).
- Wrap each "build if not exists" block in prepare_environment with:
  - Acquire lock for tag.
  - Recheck image_exists(tag) (double-checked locking).
  - Build if still missing.
  - Release lock.
- Add a timeout for lock acquisition, with backoff between retries, so a worker never waits indefinitely on a stuck build.
- Optionally, add a "lock via Docker labels" enhancement, but prefer filesystem locks for portability.
- Log when waiting on a lock to aid debugging.
- Tests:
  - Unit: simulate two threads/processes trying to build the same tag; ensure only one build executes.
  - Integration (optional): spawn two processes building the same environment and confirm no race/failure.
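A minimal sketch of the steps above, assuming a POSIX host (fcntl is not available on Windows, where portalocker would be the substitute). The names image_lock and build_if_missing, their parameters, and the image_exists/build_image callables passed in are hypothetical placeholders rather than existing functions in this codebase; cache_dir stands in for settings.cache_dir.

```python
import fcntl
import hashlib
import logging
import os
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def image_lock(cache_dir, tag, timeout=600.0, initial_delay=0.1, max_delay=5.0):
    """Hold an exclusive cross-process lock for one image tag (POSIX only).

    Hypothetical helper: the lock file lives at
    <cache_dir>/image-locks/<sha256(tag)>.lock, as suggested in the proposal.
    """
    lock_dir = os.path.join(cache_dir, "image-locks")
    os.makedirs(lock_dir, exist_ok=True)
    digest = hashlib.sha256(tag.encode("utf-8")).hexdigest()
    path = os.path.join(lock_dir, f"{digest}.lock")

    deadline = time.monotonic() + timeout
    delay = initial_delay
    with open(path, "w") as handle:
        while True:
            try:
                # Non-blocking attempt so we can enforce the timeout ourselves.
                fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"timed out waiting for build lock on {tag}")
                logger.info("Waiting for build lock on %s", tag)
                time.sleep(delay)
                delay = min(delay * 2, max_delay)  # exponential backoff, capped
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def build_if_missing(cache_dir, tag, image_exists, build_image, timeout=600.0):
    """Build `tag` at most once across workers; return True if this call built it."""
    if image_exists(tag):  # fast path, no lock needed
        return False
    with image_lock(cache_dir, tag, timeout=timeout):
        if image_exists(tag):  # recheck: another worker may have built it meanwhile
            return False
        build_image(tag)
        return True
```

With something like this in place, each "build if not exists" block in prepare_environment could reduce to one build_if_missing call per base/env/instance tag. One reason to prefer flock over a plain lockfile: the kernel releases the lock when the holding process exits, so a crashed builder does not leave a stale lock behind, whereas a plain lockfile would need its own staleness check.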
Acceptance criteria
- Concurrent runs no longer race on the same image tags.
- If one process is building, others wait and skip build once the image exists.
- Locking is robust, time-bounded, and logged.
- Tests added and passing.