Thread-safe seeding and per-job seeds for drivers #298

@bennoschoenstein

Description of the Issue:

If a QUEENS driver contains stochastic code of its own, there is currently no clean way to make it reproducible, because QUEENS does not hand a seed to the driver.

Concretely, consider a Monte Carlo study with seed = 42 and three samples. MonteCarlo.pre_run() calls np.random.seed(42) and then draws the samples. The scheduler forwards (sample, job_id) to Driver.run(), but no seed. A driver that needs randomness (e.g. for pre-processing that generates one stochastic realization per job) has to invent its own seed derivation. This is error-prone and decouples the driver's randomness from the iterator's sampling randomness.

On top of that, np.random.seed(self.seed) in MonteCarlo.pre_run() and LatinHypercubeSampling.pre_run() mutates the global NumPy RNG. Any other code in the same process that touches np.random (Dask workers, drivers, user code, libraries) draws from the same global stream, so which numbers each consumer receives depends on the order of execution and is no longer reproducible under parallel execution.
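The effect can be reproduced outside of QUEENS with plain NumPy. The following is a minimal sketch, not QUEENS code: the two helper functions are hypothetical, and the "interfering" draws merely stand in for a driver or library that also uses np.random in the same process.

```python
import numpy as np


def draw_with_global_rng(seed, interfering_draws):
    """Seed the global RNG, let other code consume draws, then sample."""
    np.random.seed(seed)
    np.random.random(interfering_draws)  # stand-in for a driver or library
    return np.random.random(3)


def draw_with_local_rng(seed, interfering_draws):
    """Sample from a local Generator while other code uses the global RNG."""
    rng = np.random.default_rng(seed)
    np.random.random(interfering_draws)  # does not touch the local generator
    return rng.random(3)


global_clean = draw_with_global_rng(42, 0)
global_disturbed = draw_with_global_rng(42, 5)
local_clean = draw_with_local_rng(42, 0)
local_disturbed = draw_with_local_rng(42, 5)

# Global state: the samples change as soon as anything else draws first.
print(np.array_equal(global_clean, global_disturbed))  # False
# Local generator: the samples depend only on the seed.
print(np.array_equal(local_clean, local_disturbed))  # True
```

With the global RNG, the same seed yields different samples depending on how many draws other code made in between. The local generator is unaffected.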

Proposed Solution:

Adopt numpy.random.SeedSequence:

  • In the iterator, build a local generator instead of seeding globally:
    import numpy as np

    # Local generator: the global np.random state is left untouched.
    ss = np.random.SeedSequence(self.seed)
    rng = np.random.default_rng(ss)
    self.samples = parameters.draw_samples(num_samples, rng=rng)
  • Spawn one child SeedSequence per sample via ss.spawn(num_samples) and forward each child to the driver together with its sample, either as a new argument to Driver.run() / Scheduler.evaluate(), or attached to the Parameters object so it is available via sample_as_dict(sample) (e.g. sample["__seed__"]).
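The two steps above could look roughly like the following sketch. It is only an illustration of the idea: draw_samples_and_job_seeds and run_driver are hypothetical names, rng.random(...) stands in for parameters.draw_samples(num_samples, rng=rng), and the extra seed_sequence argument is one of the two forwarding options, not an existing QUEENS signature.

```python
import numpy as np


def draw_samples_and_job_seeds(seed, num_samples, num_params=2):
    """Iterator side: draw samples and spawn one child seed per job."""
    ss = np.random.SeedSequence(seed)
    rng = np.random.default_rng(ss)
    # Stand-in for parameters.draw_samples(num_samples, rng=rng).
    samples = rng.random((num_samples, num_params))
    # Children get distinct spawn keys, so their streams are independent
    # of each other and of the sampling stream above.
    job_seeds = ss.spawn(num_samples)
    return samples, job_seeds


def run_driver(sample, job_id, seed_sequence):
    """Driver side (hypothetical): per-job randomness from the child seed."""
    rng = np.random.default_rng(seed_sequence)
    noise = rng.normal(scale=0.1)  # e.g. one stochastic realization per job
    return float(sample.sum() + noise)


samples, job_seeds = draw_samples_and_job_seeds(seed=42, num_samples=3)
results = [
    run_driver(sample, job_id, child)
    for job_id, (sample, child) in enumerate(zip(samples, job_seeds))
]
print([child.spawn_key for child in job_seeds])  # [(0,), (1,), (2,)]
```

Each child is identified by its spawn key relative to the super-seed, so job i always receives the same stream for a given seed, regardless of what the other jobs do.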

With this change, a single user-provided super-seed deterministically drives both the sampling and the per-job driver randomness, independently of thread or worker assignment.
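The independence from worker assignment can be checked with a small experiment. This is a sketch under assumptions: a standard-library thread pool stands in for the QUEENS scheduler, and job and run_study are hypothetical helpers. The jobs are submitted in two different shuffled orders and the per-job results are compared.

```python
import random
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def job(job_id, seed_sequence):
    """Hypothetical driver call that only uses its own child seed."""
    rng = np.random.default_rng(seed_sequence)
    return job_id, rng.random(2)


def run_study(seed, num_jobs, order_seed):
    """Run all jobs on a thread pool in a shuffled submission order."""
    children = np.random.SeedSequence(seed).spawn(num_jobs)
    tasks = list(enumerate(children))
    random.Random(order_seed).shuffle(tasks)  # vary the scheduling order
    with ThreadPoolExecutor(max_workers=3) as pool:
        results = dict(pool.map(lambda task: job(*task), tasks))
    # Return the results sorted by job id for comparison.
    return [results[job_id] for job_id in range(num_jobs)]


run_a = run_study(seed=42, num_jobs=6, order_seed=0)
run_b = run_study(seed=42, num_jobs=6, order_seed=1)
same = all(np.array_equal(a, b) for a, b in zip(run_a, run_b))
print(same)  # True
```

Because every job builds its generator from its own child SeedSequence and never touches shared RNG state, the result of job i is the same in both runs.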

Action Items:

No response

Related Issues:

No response

Interested Parties:

@sbrandstaeter

Labels

  • status: open (No solution for this issue has been provided)
  • topic: driver (Issue/PR related to the driver)
  • type: discussion (Issue to allow for discussion on a certain topic)
  • type: software planning (Issue to discuss implementations and the introduction of new features)
