Native library (FFI) download race condition in parallel test execution #32

@lambdalisue

Problem

When tests run with the --parallel flag in CI environments, they fail intermittently because of a race condition while downloading native libraries.

Root Cause

Multiple subprocess instances simultaneously attempt to download the same native library (e.g., SQLite3 via @db/sqlite), resulting in:

  • Error: Could not open library: file too short
  • Cause: One process tries to open an incomplete library file while another is still downloading it

Evidence

From CI logs:

Failed to load SQLite3 Dynamic Library
Caused by Error: Could not open library: Could not open library:
/home/runner/.cache/deno/plug/.../libsqlite3.so: file too short

Affected Libraries

Any library using FFI (Foreign Function Interface):

  • @db/sqlite
  • @db/duckdb
  • Other native libraries loaded via @denosaurs/plug

Why This Is a Probitas Concern

This is not specific to this repository's tests. Any user who runs probitas tests in parallel in a CI environment can hit it, so it is a general problem that probitas users will encounter.

Potential Solutions

1. Retry Logic with Exponential Backoff ⭐ Recommended

Automatically retry imports on library race condition errors.

Pros:

  • Transparent to users
  • Works automatically
  • Minimal performance impact

Cons:

  • Adds complexity to loader
  • May mask other issues

Implementation approach:

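// In the @probitas/scenario loader.
// isLibraryRaceConditionError and exponentialBackoff are helpers assumed to
// exist in the loader; they are not defined in this snippet.
// Caveat: a module whose evaluation threw stays cached in its failed state,
// so re-importing the same URL may rethrow the cached error.
async function retryImport(url: string, maxRetries = 3): Promise<unknown> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await import(url);
    } catch (err) {
      // Rethrow unrelated errors, or give up once the retries are exhausted.
      if (!isLibraryRaceConditionError(err) || attempt >= maxRetries) {
        throw err;
      }
      // Wait longer after each failed attempt before retrying.
      await exponentialBackoff(attempt);
    }
  }
}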
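
The retryImport snippet calls two helpers that it leaves undefined. A minimal sketch of what they might look like follows. It assumes the race can be recognized by the "file too short" text from the CI log appearing anywhere in the error's cause chain. The matching rule, the collectMessages and backoffDelayMs helpers, and the 100 ms base and 2000 ms cap are all illustrative assumptions, not probitas code.

```typescript
// Hypothetical helpers for the retry sketch. The function names match the
// snippet above; the matching rule and delay values are assumptions.

/** Collect the messages of an error and of every error in its cause chain. */
function collectMessages(err: unknown): string[] {
  const messages: string[] = [];
  const seen = new Set<unknown>(); // guards against cyclic cause chains
  let current: unknown = err;
  while (current instanceof Error && !seen.has(current)) {
    seen.add(current);
    messages.push(current.message);
    current = Reflect.get(current, "cause"); // undefined when there is no cause
  }
  return messages;
}

/** True when the failure looks like a partially downloaded native library. */
function isLibraryRaceConditionError(err: unknown): boolean {
  return collectMessages(err).some((m) => m.includes("file too short"));
}

/** Delay for a zero-based attempt: baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 2000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

/** Sleep for the computed delay before the next attempt. */
function exponentialBackoff(attempt: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
}
```

Matching on a message substring is brittle, since the wording depends on the platform's dynamic loader, so a real implementation would probably need one pattern per operating system.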

2. Serial Execution Option

Add a --jobs=1 flag (or an equivalent option) to disable parallel execution.

Pros:

  • Simple
  • Guaranteed to work

Cons:

  • Tests run slower
  • Users must opt in
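
To make the serial option concrete, here is a rough sketch of a job-limited runner in which a jobs value of 1 degenerates to serial execution. The runWithJobs function and the Task type are hypothetical names for illustration; they are not part of probitas, and the proposed flag does not exist yet.

```typescript
// Hypothetical worker pool; `jobs` mirrors the proposed --jobs option.
type Task<T> = () => Promise<T>;

async function runWithJobs<T>(tasks: Task<T>[], jobs: number): Promise<T[]> {
  if (!Number.isInteger(jobs) || jobs < 1) {
    throw new RangeError("jobs must be a positive integer");
  }
  const results: T[] = new Array(tasks.length);
  let next = 0;
  // Each worker claims the next unclaimed task until none remain.
  // JavaScript is single-threaded, so `next++` cannot be interleaved.
  const worker = async (): Promise<void> => {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  };
  const workerCount = Math.min(jobs, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as `tasks`, regardless of completion order
}
```

With jobs set to 1 only one task is in flight at a time, so only one process would ever trigger the library download.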

3. Pre-load Libraries

Document a workaround that lets users pre-cache native libraries before running tests.

Pros:

  • No code changes needed

Cons:

  • Users must manually configure
  • Not automatic
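
As one possible shape for such a documented workaround, the sketch below imports a list of modules one at a time before any parallel test process starts. preloadLibraries and Importer are hypothetical names. The sketch also assumes that importing a module is enough to trigger its native library download, which may not hold for every library (the SQLite workaround in this issue additionally opens a database).

```typescript
// Hypothetical preload helper: import each native-library module once,
// sequentially, so that no two downloads of the same file can overlap.
type Importer = (specifier: string) => Promise<unknown>;

async function preloadLibraries(
  specifiers: string[],
  importer: Importer = (specifier) => import(specifier),
): Promise<string[]> {
  const failed: string[] = [];
  for (const specifier of specifiers) {
    try {
      // Awaiting inside the loop keeps the imports strictly one at a time.
      await importer(specifier);
    } catch {
      // Record the failure and continue so every library gets one attempt.
      failed.push(specifier);
    }
  }
  return failed; // an empty array means every import succeeded
}
```

A CI step could call it with the specifiers a project uses, for example ["jsr:@db/sqlite@0.12"], and fail the step when the returned list is not empty.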

4. File Locking

Use file locks to coordinate library downloads.

Pros:

  • Prevents conflicts at OS level

Cons:

  • Complex cross-platform implementation
  • May not work with Deno's caching mechanism
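
A lock-file approach could look roughly like the sketch below. It depends on a single primitive, atomically creating a file only if it does not already exist, which is abstracted here as a hypothetical LockStore interface so the example runs without touching the file system. withFileLock, LockStore, and createMemoryStore are illustrative names, and the polling interval and timeout are arbitrary assumptions.

```typescript
// Hypothetical lock-file coordination. On Deno, tryCreate could be backed by
// Deno.open(path, { write: true, createNew: true }), which fails when the
// file already exists.
interface LockStore {
  tryCreate(path: string): Promise<boolean>; // false when the file exists
  remove(path: string): Promise<void>;
}

async function withFileLock<T>(
  store: LockStore,
  lockPath: string,
  action: () => Promise<T>,
  pollMs = 50,
  timeoutMs = 30000,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  // Poll until this caller manages to create the lock file.
  while (!(await store.tryCreate(lockPath))) {
    if (Date.now() >= deadline) {
      throw new Error(`Timed out waiting for lock: ${lockPath}`);
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  try {
    return await action();
  } finally {
    // Always release the lock, even when the action throws.
    await store.remove(lockPath);
  }
}

// In-memory store for demonstration only; it cannot coordinate processes.
function createMemoryStore(): LockStore {
  const held = new Set<string>();
  return {
    tryCreate: async (path) => {
      if (held.has(path)) return false;
      held.add(path);
      return true;
    },
    remove: async (path) => {
      held.delete(path);
    },
  };
}
```

A process that crashes while holding the lock leaves the file behind, so a real implementation would also need stale-lock detection. The lock would have to wrap the import that triggers the download, since the download itself happens inside the library loader.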

Workaround (Current)

For now, users can add this to their CI workflow:

- name: Pre-load native libraries
  run: |
    deno eval "import { Database } from 'jsr:@db/sqlite@0.12'; const db = new Database(':memory:'); db.close();"

Discussion

Should probitas handle this automatically, or is documenting the workaround sufficient?

For automatic handling, retry logic seems most appropriate as it:

  • Doesn't slow down the common case
  • Handles the error gracefully
  • Requires no user configuration

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)