Allow different number of cores for the worker and the simulation tool #307

@bennoschoenstein

Description of the Issue:

Currently, the num_procs setting on a QUEENS scheduler controls two things at once:

  1. How many CPUs the workload manager (e.g. SLURM) gives to a worker.
  2. How many MPI processes the simulation tool is started with (e.g. mpirun -n N 4C ...).

Because both come from the same value, a worker that gets 4 CPUs from the cluster is forced to run the simulation with exactly 4 ranks, and vice versa. There is no way to:

  • give a worker some headroom for Python pre- or post-processing while running the simulation with fewer ranks, or
  • run many short-lived jobs in small worker slots whose simulations still need several ranks.

This couples two things that are conceptually independent: how big the slot on the cluster is and how many cores the simulation actually uses.

Proposed Solution:

Let drivers carry their own optional num_procs value:

  • If the driver's num_procs is set, it is used for the simulation call.
  • If it is not set (default), the driver falls back to the scheduler's num_procs, which is exactly today's behaviour.

This is fully backward compatible (the default of None leaves today's behaviour unchanged) and adds one optional argument on the driver side. The scheduler keeps its current role of requesting the worker resources from the workload manager; the driver decides how many MPI ranks the simulation tool actually uses.
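The fallback rule could be sketched roughly as follows. This is only an illustration of the proposed behaviour, not QUEENS code: the helper `resolve_num_procs` and its parameter names are hypothetical, and the check for non-positive values is an extra assumption that the issue does not ask for.

```python
from typing import Optional


def resolve_num_procs(driver_num_procs: Optional[int], scheduler_num_procs: int) -> int:
    """Return the number of MPI ranks to use for the simulation call.

    Hypothetical helper for illustration; not part of the QUEENS API.
    """
    if driver_num_procs is None:
        # Driver value not set: fall back to the scheduler value (today's behaviour).
        return scheduler_num_procs
    if driver_num_procs < 1:
        # Assumption: non-positive rank counts are rejected early.
        raise ValueError("num_procs must be a positive integer")
    # Driver value set: it decides how many ranks the simulation tool gets.
    return driver_num_procs


# Scheduler requests 4 CPUs for the worker in both cases.
print(resolve_num_procs(None, 4))  # driver unset, simulation runs with 4 ranks
print(resolve_num_procs(2, 4))     # driver set, simulation runs with 2 ranks
```

Keeping the decision in one place like this would mean the jobscript rendering only ever sees a single, already resolved rank count.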

Out of scope (future work): Different core counts for each step of a single driver pipeline (e.g. meshing on 1 core, simulation on N cores, plotting on 1 core in the same job). That would require a deeper architectural change. This issue is a prerequisite but does not attempt to solve it.

Action Items:

  • Add optional num_procs argument to the driver base class.
  • Use it for the jobscript rendering when set; fall back to the scheduler value otherwise.
  • Add a small regression test with two drivers using different values on the same scheduler.
  • Add a one-paragraph note to the docs.
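The regression test from the action items might look something like the sketch below. It deliberately uses stand-in classes instead of the real QUEENS scheduler and driver, so `StubScheduler`, `StubDriver`, `render_command`, and the executable name `solver` are all hypothetical; only the `mpirun -n N` call shape is taken from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubScheduler:
    """Stand-in for a scheduler; num_procs sizes the worker slot."""

    num_procs: int


@dataclass
class StubDriver:
    """Stand-in for a driver carrying the proposed optional num_procs."""

    executable: str
    num_procs: Optional[int] = None

    def render_command(self, scheduler: StubScheduler) -> str:
        # Driver value wins when set; otherwise fall back to the scheduler value.
        ranks = self.num_procs if self.num_procs is not None else scheduler.num_procs
        return f"mpirun -n {ranks} {self.executable}"


def test_drivers_with_different_num_procs_share_scheduler():
    scheduler = StubScheduler(num_procs=4)
    default_driver = StubDriver(executable="solver")
    small_driver = StubDriver(executable="solver", num_procs=2)

    # Unset driver value reproduces today's behaviour.
    assert default_driver.render_command(scheduler) == "mpirun -n 4 solver"
    # Explicit driver value overrides the rank count only.
    assert small_driver.render_command(scheduler) == "mpirun -n 2 solver"
    # The worker slot requested by the scheduler is unchanged.
    assert scheduler.num_procs == 4


test_drivers_with_different_num_procs_share_scheduler()
```

A real test would presumably check the rendered jobscript of the actual driver classes instead of a command string from stubs.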

Related Issues:

No response

Interested Parties:

No response
