Make rlm subcalls parallel with thread safety and semaphore by h4shk4t · Pull Request #136 · alexzhang13/rlm

h4shk4t · 2026-03-11T19:31:56Z

Summary

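Adds parallel execution support for rlm_query_batched, allowing child RLM subcalls to run concurrently instead of sequentially. When the parent model fans out multiple independent queries (e.g., answering 3 independent questions simultaneously), wall-clock time drops from roughly the sum of the child latencies to roughly the latency of the slowest child, provided max_concurrent_subcalls is at least the number of queries.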

Changes

rlm/core/rlm.py

New max_concurrent_subcalls parameter (default 1 = sequential, backward-compatible)
New event callbacks: on_subcall_start, on_subcall_complete, on_iteration_start, on_iteration_complete for live progress tracking (Optional, I added this for better debugging - can remove this if out of scope)
Thread-safe _subcall: added _state_lock to protect _cumulative_cost and error tracking when children run in parallel across threads
Child RLMs propagate concurrency settings and callbacks to their own children
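
The locking described for _subcall can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: CostTracker, record_child, and run_children are hypothetical names, and only _state_lock and _cumulative_cost are taken from the description above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


class CostTracker:
    """Hypothetical stand-in for the shared state held by a parent RLM."""

    def __init__(self) -> None:
        self._state_lock = threading.Lock()
        self._cumulative_cost = 0.0
        self._errors: List[str] = []

    def record_child(self, cost: float, error: Optional[str] = None) -> None:
        # The read-modify-write on shared state happens under the lock,
        # so concurrent children cannot lose each other's updates.
        with self._state_lock:
            self._cumulative_cost += cost
            if error is not None:
                self._errors.append(error)


def run_children(tracker: CostTracker, costs: List[float], max_workers: int = 4) -> float:
    """Report each child's cost from a worker thread and return the total."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(tracker.record_child, costs))
    return tracker._cumulative_cost
```

The lock is held only for the bookkeeping, not for the child call itself, so it should not serialize the subcalls.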

rlm/environments/local_repl.py

New _rlm_query_batched_parallel: dispatches subcalls via a short-lived ThreadPoolExecutor, gathers the futures with as_completed, and returns results in original prompt order
Two-layer concurrency control:
    Local: ThreadPoolExecutor(max_workers=max_concurrent_subcalls) bounds children per batch call
    Global: process-wide threading.Semaphore(16) bounds total concurrent children across all depths, preventing thread/memory explosion with deep recursion
Thread-safe _pending_llm_calls appends via _calls_lock
Fixed _temp_cwd(): restores to self.original_cwd (set at init) instead of os.getcwd(), which could capture another thread's temp dir that later gets deleted ([Errno 2])
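
A rough sketch of the dispatch pattern described above, assuming a subcall_fn that takes a prompt and returns a string. The function name query_batched_parallel, the module-level semaphore name, and the "Error: ..." placeholder for failed children are assumptions made for illustration; the actual _rlm_query_batched_parallel may differ.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List

# Process-wide cap on concurrently running children (16, per the description).
_GLOBAL_SUBCALL_SEMAPHORE = threading.Semaphore(16)


def query_batched_parallel(
    prompts: List[str],
    subcall_fn: Callable[[str], str],
    max_concurrent_subcalls: int = 1,
) -> List[str]:
    """Run subcall_fn over prompts and return results in prompt order."""
    if max_concurrent_subcalls <= 1 or len(prompts) <= 1:
        # Sequential fallback and single-prompt short-circuit: no pool, no semaphore.
        return [subcall_fn(p) for p in prompts]

    def guarded(prompt: str) -> str:
        # The semaphore bounds running work, not the number of threads.
        with _GLOBAL_SUBCALL_SEMAPHORE:
            return subcall_fn(prompt)

    results: List[str] = [""] * len(prompts)
    with ThreadPoolExecutor(max_workers=max_concurrent_subcalls) as pool:
        future_to_index = {
            pool.submit(guarded, prompt): i for i, prompt in enumerate(prompts)
        }
        # Futures finish in arbitrary order; the saved index restores prompt order.
        for future in as_completed(future_to_index):
            index = future_to_index[future]
            try:
                results[index] = future.result()
            except Exception as exc:
                # Assumed policy: a failed child does not sink the whole batch.
                results[index] = f"Error: {exc}"
    return results
```

A later commit in this PR removes the global semaphore, in which case guarded reduces to a direct call to subcall_fn and the pool's max_workers is the only bound.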

tests/test_rlm_query.py

TestRlmQueryBatchedParallel: 6 tests covering parallel speedup, result ordering, partial failure handling, pending call tracking, sequential fallback (max_concurrent=1), and single-prompt short-circuit
TestGlobalSemaphoreBounding: 2 tests verifying the global semaphore limits concurrent children and is shared across REPL instances
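
One way a semaphore bound like the one above could be checked is to record the peak number of simultaneously active children. The helper below is a hypothetical sketch, not one of the tests in tests/test_rlm_query.py; the sleep stands in for a child RLM call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def measure_peak_concurrency(num_tasks: int, max_workers: int, limit: int) -> int:
    """Return the highest number of children observed running at once."""
    semaphore = threading.Semaphore(limit)
    lock = threading.Lock()
    active = 0
    peak = 0

    def child(_: int) -> None:
        nonlocal active, peak
        with semaphore:
            with lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)  # simulated child call
            with lock:
                active -= 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(child, range(num_tasks)))
    return peak
```

With more workers than semaphore slots, the observed peak should never exceed the semaphore limit.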

Design decisions

Sequential by default: max_concurrent_subcalls=1 has zero overhead — no locks, no thread pool, no semaphore. Existing behavior is unchanged.
No pool-within-pool deadlock: each depth level creates its own short-lived ThreadPoolExecutor. The global semaphore bounds work, not threads — workers acquire a slot, call subcall_fn, release.
No os.chdir lock: I explored Lock (deadlocks same-thread re-entry in sequential mode) and RLock (deadlocks cross-thread in parallel mode). The root cause was os.getcwd() capturing stale paths, fixed by always restoring to a stable directory.
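
The _temp_cwd fix can be illustrated with a small context manager. This is a sketch under assumptions: ReplSketch is a hypothetical class, and creating a fresh temp directory per call is assumed for the example; only the idea of restoring to self.original_cwd (captured at init) instead of a value read from os.getcwd() on entry comes from the description.

```python
import contextlib
import os
import shutil
import tempfile


class ReplSketch:
    """Hypothetical stand-in for the local REPL environment."""

    def __init__(self) -> None:
        # Captured once, before any temp dir has been entered.
        self.original_cwd = os.getcwd()

    @contextlib.contextmanager
    def _temp_cwd(self):
        temp_dir = tempfile.mkdtemp()
        os.chdir(temp_dir)
        try:
            yield temp_dir
        finally:
            # Restore to the stable directory. Reading os.getcwd() on entry
            # could have captured another thread's temp dir, which may be
            # deleted by the time this block exits ([Errno 2]).
            os.chdir(self.original_cwd)
            shutil.rmtree(temp_dir, ignore_errors=True)
```

Note that os.chdir is process-wide, so threads still share one working directory; the change only guarantees that the restore target always exists.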

Usage

from rlm import RLM

# Sequential (default) — identical to previous behavior
rlm = RLM(
    backend="azure_openai",
    backend_kwargs={"model_name": "gpt-4o-mini"},
    max_depth=2,
)
result = rlm.completion("Answer these 3 questions using rlm_query_batched...")
# Children run one at a time: total ~30s for 3 subcalls @ ~10s each

# Parallel — just set max_concurrent_subcalls
rlm = RLM(
    backend="azure_openai",
    backend_kwargs={"model_name": "gpt-4o-mini"},
    max_depth=2,
    max_concurrent_subcalls=4,  # <-- this is the only change
)
result = rlm.completion("Answer these 3 questions using rlm_query_batched...")

alexzhang13 · 2026-03-18T05:44:50Z

I'll look at this when I get time, this is pretty important to get in IMO (assuming it's correct)

…wd race, default=1

- Remove global subcall semaphore and related API (set_global_max_subcalls, etc.) per maintainer feedback; concurrency bounded by ThreadPoolExecutor(max_workers) only
- Fix _temp_cwd race condition: restore to self.original_cwd instead of os.getcwd() which could point to a deleted temp dir under concurrent execution ([Errno 2])
- Change default max_concurrent_subcalls from 4 to 1 (sequential, backward-compatible)
- Scope max_concurrent_subcalls env_kwargs pass to local environment only
- Append pending_llm_calls in prompt order for deterministic metadata
- Remove duplicate semaphore-dependent tests, fix default assertions (4→1)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Made-with: Cursor

alexzhang13 · 2026-04-21T05:32:22Z

Fix linting issues first, otherwise lgtm!

Make rlm subcalls parallel with thread safety and semaphore

a485c0d

h4shk4t mentioned this pull request Mar 11, 2026

Making RLM subqueries run in parallel for faster execution #135

Closed

Ashutosh Srivastava and others added 3 commits March 17, 2026 22:51

Fix lint failure

bc52f44

Merge branch 'main' into parallel-rlm-subcalls

460595c

h4shk4t force-pushed the parallel-rlm-subcalls branch from 6b60ec9 to dbb701f Compare April 6, 2026 19:57

Ashutosh Srivastava and others added 3 commits April 20, 2026 22:45

Fix lint

7dc89bb

remove thread based batch RLM, move to subprocesses

7a58640

add extra tests

9a2ee81

h4shk4t wants to merge 7 commits into alexzhang13:main from h4shk4t:parallel-rlm-subcalls
