Thread-safe seeding and per-job seeds for drivers #298
Open
Labels
status: open, topic: driver, type: discussion, type: software planning
Description of the Issue:
If a QUEENS driver contains stochastic code of its own, there is currently no clean way to make it reproducible, because QUEENS does not hand a seed to the driver.
Concretely, consider a Monte Carlo study with seed = 42 and three samples. MonteCarlo.pre_run() calls np.random.seed(42) and then draws the samples. The scheduler forwards (sample, job_id) to Driver.run() - but no seed. A driver that needs randomness (e.g. for pre-processing that generates one stochastic realization per job) has to invent its own derivation. This is error-prone and decouples the driver's randomness from the iterator's sampling randomness.
On top of that, np.random.seed(self.seed) in MonteCarlo.pre_run() and LatinHypercubeSampling.pre_run() mutates the global NumPy RNG. Any other code in the same process that touches np.random (Dask workers, drivers, user code, libraries) draws from that same global stream, so under parallel execution the order in which draws are consumed, and therefore the values each caller receives, is no longer deterministic.
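The shared-state problem can be reproduced in a few lines. The following is a minimal sketch, not QUEENS code: the two helper functions are hypothetical, and the single extra np.random.random() call stands in for any unrelated code that happens to draw from the global stream between seeding and sampling.

```python
import numpy as np


def draw_with_global_rng(seed, interfere):
    """Seed the global NumPy RNG, then draw three values from it."""
    np.random.seed(seed)
    if interfere:
        # Stand-in for unrelated code (driver, library, worker thread)
        # that consumes one value from the shared global stream.
        np.random.random()
    return np.random.random(3)


def draw_with_local_rng(seed, interfere):
    """Draw three values from a Generator owned by the caller."""
    rng = np.random.default_rng(seed)
    if interfere:
        # The same interference no longer matters: it touches only the
        # global stream, not the local Generator.
        np.random.random()
    return rng.random(3)


global_clean = draw_with_global_rng(42, interfere=False)
global_disturbed = draw_with_global_rng(42, interfere=True)
local_clean = draw_with_local_rng(42, interfere=False)
local_disturbed = draw_with_local_rng(42, interfere=True)

print(np.array_equal(global_clean, global_disturbed))  # False
print(np.array_equal(local_clean, local_disturbed))    # True
```

With the global RNG, the samples depend on what else ran in between; with a local Generator they depend only on the seed.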
Proposed Solution:
Adopt numpy.random.SeedSequence:
Spawn a child SeedSequence per sample via ss.spawn(num_samples) and forward it to the driver - either as a new argument to Driver.run() / Scheduler.evaluate(), or attached to the Parameters object so it is available via sample_as_dict(sample) (e.g. sample["__seed__"]).

With this change, a single user-provided super-seed deterministically drives both the sampling and the per-job driver randomness, independently of thread or worker assignment.
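A rough sketch of the idea, under stated assumptions: make_samples_and_job_seeds and toy_driver_run are hypothetical stand-ins for the iterator's pre_run() and for Driver.run(), and the seed_seq argument is the proposed addition, not an existing QUEENS signature. Only numpy.random.SeedSequence, its spawn() method and numpy.random.default_rng are real NumPy API.

```python
import numpy as np


def make_samples_and_job_seeds(seed, num_samples, num_params):
    """Hypothetical stand-in for an iterator's pre_run()."""
    ss = np.random.SeedSequence(seed)
    # Local Generator for the sampling itself; the global RNG is untouched.
    rng = np.random.default_rng(ss)
    samples = rng.random((num_samples, num_params))
    # One statistically independent child SeedSequence per sample/job.
    job_seeds = ss.spawn(num_samples)
    return samples, job_seeds


def toy_driver_run(sample, job_id, seed_seq):
    """Hypothetical driver: adds job-specific noise to a toy model output."""
    rng = np.random.default_rng(seed_seq)
    return float(sample.sum() + rng.normal())


def run_study(seed, job_order):
    """Evaluate all jobs in the given order and return results by job_id."""
    samples, job_seeds = make_samples_and_job_seeds(seed, 3, 2)
    return {
        job_id: toy_driver_run(samples[job_id], job_id, job_seeds[job_id])
        for job_id in job_order
    }


# The same super-seed gives the same per-job results regardless of the
# order (or worker) in which the jobs are executed.
forward = run_study(42, job_order=[0, 1, 2])
backward = run_study(42, job_order=[2, 1, 0])
print(forward == backward)  # True
```

Because each job's Generator is built from its own child SeedSequence, the result for a given job_id depends only on the super-seed and the job index, not on scheduling.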
Action Items:
No response
Related Issues:
No response
Interested Parties:
@sbrandstaeter