Add concurrency locks for image building #3

@SuperMuel

Description

Problem

Current behavior: Multiple workers may attempt to build the same Docker image tags concurrently in prepare_environment (base/env/instance), leading to races, inconsistent build states, and flaky failures. Docker's layer cache helps but does not prevent tag-level races or partially built images.

Why it matters: Parallel evaluation is a core feature. Without locking, two workers can:

  • Both build the same env/instance image at once.
  • Remove each other's intermediate artifacts or fail on tag conflicts.
  • Produce non-deterministic failures that are hard to reproduce.

Proposal

Introduce cross-process locks around image creation per tag:

  • Implement a filesystem-based lock (e.g., fcntl on Unix, or the cross-platform portalocker package) or a simple lockfile in settings.cache_dir, keyed by image tag (e.g., image-locks/{sha256(tag)}.lock).
  • Wrap each "build if not exists" block in prepare_environment with:
    1. Acquire lock for tag.
    2. Recheck image_exists(tag) (double-checked locking).
    3. Build if still missing.
    4. Release lock.
  • Add a timeout for lock acquisition, with backoff between retries, so that a stuck lock holder cannot block other workers indefinitely.
  • Optionally, add a "lock via Docker labels" enhancement, but prefer filesystem locks for portability.
  • Log when waiting on a lock to aid debugging.
  • Tests:
    • Unit: simulate two threads/processes trying to build the same tag; ensure only one build executes.
    • Integration (optional): spawn two processes building the same environment and confirm no race/failure.
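
The acquire / recheck / build / release steps above could look roughly like the following. This is a minimal sketch, not the project's implementation: it assumes a Unix host (fcntl is unavailable on Windows), and the names image_lock, build_if_missing, lock_path_for, and LockTimeout, as well as the timeout and backoff defaults, are hypothetical. image_exists and build_image are passed in as callables because their real signatures are not shown in this issue.

```python
import fcntl
import hashlib
import logging
import os
import time
from contextlib import contextmanager
from pathlib import Path

logger = logging.getLogger(__name__)


class LockTimeout(Exception):
    """Raised when the lock for a tag is not acquired before the deadline."""


def lock_path_for(cache_dir: Path, tag: str) -> Path:
    # Hash the tag so characters such as ':' and '/' never reach the filesystem.
    digest = hashlib.sha256(tag.encode("utf-8")).hexdigest()
    return cache_dir / "image-locks" / f"{digest}.lock"


@contextmanager
def image_lock(cache_dir, tag, timeout=600.0, initial_delay=0.05, max_delay=2.0):
    """Hold an exclusive, cross-process lock for `tag` (hypothetical helper)."""
    path = lock_path_for(Path(cache_dir), tag)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    deadline = time.monotonic() + timeout
    delay = initial_delay
    logged = False
    try:
        while True:
            try:
                # Non-blocking attempt, so the timeout can be enforced here.
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if not logged:
                    logger.info("Waiting for image lock on %s", tag)
                    logged = True
                if time.monotonic() >= deadline:
                    raise LockTimeout(f"no lock for {tag!r} after {timeout}s")
                time.sleep(delay)
                delay = min(delay * 2, max_delay)  # exponential backoff, capped
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def build_if_missing(cache_dir, tag, image_exists, build_image, **lock_kwargs):
    """Build `tag` at most once across processes; return True if this call built it."""
    if image_exists(tag):  # fast path: no lock needed
        return False
    with image_lock(cache_dir, tag, **lock_kwargs):
        if image_exists(tag):  # recheck: another worker may have built it
            return False
        build_image(tag)
        return True
```

One reason to prefer flock over a bare lockfile here is that the kernel drops the lock when the holding process exits, so a crashed worker does not leave a stale lock behind. The lock file itself is left in place on purpose; deleting it while other workers hold it open would reintroduce a race.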

Acceptance criteria

  • Concurrent runs no longer race on the same image tags.
  • If one process is building a tag, others wait and skip the build once the image exists.
  • Locking is robust, time-bounded, and logged.
  • Tests added and passing.
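
A check for the "others wait and skip" criterion might be sketched as below. Everything here is illustrative: a marker file stands in for the Docker image, the file names tag.lock, image.built, and build.log are made up, and the worker uses a plain blocking flock (Unix only) rather than whatever lock the final implementation adopts. Real tests would call prepare_environment instead.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Source of the worker run in each child process. Every "build" that actually
# executes appends one line to build.log, so the log length counts builds.
WORKER = textwrap.dedent(
    """
    import fcntl, sys, time
    from pathlib import Path

    workdir = Path(sys.argv[1])
    marker = workdir / "image.built"  # stand-in for image_exists(tag)
    with open(workdir / "tag.lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the holder is done
        if not marker.exists():  # recheck under the lock
            with open(workdir / "build.log", "a") as log:
                log.write("build\\n")
            time.sleep(0.5)  # simulate a slow build
            marker.touch()
        # leaving the with-block closes lock_file, which releases the lock
    """
)


def run_concurrent_builds(n_workers: int = 2) -> int:
    """Start n_workers processes for one tag; return how many builds ran."""
    with tempfile.TemporaryDirectory() as tmp:
        procs = [
            subprocess.Popen([sys.executable, "-c", WORKER, tmp])
            for _ in range(n_workers)
        ]
        exit_codes = [p.wait() for p in procs]
        assert all(code == 0 for code in exit_codes), exit_codes
        return len(Path(tmp, "build.log").read_text().splitlines())
```

With the lock in place, run_concurrent_builds() should return 1 regardless of the number of workers; removing the flock call from the worker is a quick way to confirm the test can actually fail.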
