shared-memory domain crash #235

@mabarnes

I've just had several runs crash on pitagora with the following error written to standard out:

"
There are 15 shared-memory domains. This domain has 48 CPUs.

Error: The number of CPUs needs to be divisible by the number of shared memory domains. Aborting
"

Here's the parallelisation setup, in case it is helpful:

Total number of grid points:
nx*ny*nz*nmu*nvpa*nspec = 467140608 = 445.50 MiB = .44 GiB.
nkx*nky*nz*nmu*nvpa*nspec = 106216704 = 101.30 MiB = .10 GiB.
nkx*nky*nz = 184404 = .18 MiB = .00 GiB.

Number of points to be parallelised:
vmu-layout: 1152 (nmu*nvpa*nspec)
kxkyz-layout: 184404 (nkx*nky*nz*ntubes*nspec)

Number of points per processor:
vmu-layout: 1
kxkyz-layout: 161
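
As a sanity check on the numbers above, the per-processor counts are consistent with the points being split by ceiling division over 1152 processes. Both the process count (1152) and the ceiling-division rule are assumptions inferred from this output, not something taken from the code itself. A minimal sketch:

```python
def points_per_processor(n_points: int, n_procs: int) -> int:
    """Largest number of points any one processor holds, assuming the
    points are split as evenly as possible (ceiling division)."""
    return -(-n_points // n_procs)


# ASSUMPTION: 1152 processes, inferred from "vmu-layout: 1" point per processor.
N_PROCS = 1152

print(points_per_processor(1152, N_PROCS))    # vmu-layout
print(points_per_processor(184404, N_PROCS))  # kxkyz-layout
```

Under these assumptions the sketch gives 1 and 161, matching the reported values.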

I have not selected any special option for shared-memory, so I think it is the default. I don't understand how the number of shared-memory domains is determined, but it does not seem like ideal behaviour for runs to fail for this reason. As things stand, I have no idea how many cores I am allowed to request for a given simulation. Is there a way to overcome this?
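
To illustrate the condition in the error message, here is a small sketch of the divisibility check it describes. It assumes the run used 1152 processes in total (inferred from the vmu-layout having one point per processor, not stated in the error) and does not reproduce how the code actually counts shared-memory domains, which is the open question here.

```python
def remainder_over_domains(total_cpus: int, n_domains: int) -> int:
    """Remainder when the total CPU count is split over the domains;
    zero means the condition in the error message is satisfied."""
    return total_cpus % n_domains


def dividing_domain_counts(total_cpus: int, lo: int, hi: int) -> list:
    """Domain counts in [lo, hi] that divide total_cpus exactly."""
    return [d for d in range(lo, hi + 1) if total_cpus % d == 0]


TOTAL_CPUS = 1152  # ASSUMPTION: inferred from the layout output, not reported
N_DOMAINS = 15     # from the error message

print(remainder_over_domains(TOTAL_CPUS, N_DOMAINS))
print(dividing_domain_counts(TOTAL_CPUS, 8, 24))
```

With these values the remainder is 12, so the check fails. Domain counts between 8 and 24 that would divide 1152 exactly are 8, 9, 12, 16, 18 and 24. Note also that 15 domains of 48 CPUs each would give only 720 CPUs, so under this assumption the domains cannot all be the same size.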
