Improve partitioner unit tests by mike-hobson · Pull Request #254 · MetOffice/lfric_core

mike-hobson · 2026-01-29T08:13:55Z

PR Summary

Sci/Tech Reviewer: None required
Code Reviewer: @stevemullerworth

I noted three deficiencies with the partition unit tests:

1. There are no tests of uneven partitions. The single-panel partitioner unit tests use an 8x8 full-domain mesh. If this is partitioned over 3 ranks, it should result in partitions of size 3x8, 3x8 and 2x8 (i.e. uneven). This PR adds such tests to the biperiodic, planar, x-trench and y-trench mesh partitioner tests.
2. There is no test of the parallel cubedsphere partitioner. This was omitted because it is slow to run from the command line. Whilst that is true (it is slower than all the other infrastructure unit tests combined), this is the partitioner used for almost all global model runs, so I have added it (and we'll take the hit on runtime).
3. The current test suite runs the unit tests on one core, even though the infrastructure unit tests are parallel. On one core, the infrastructure unit tests take two and a half minutes to complete; giving them the correct number of cores drops this to 8 seconds. There is no fine-grained control of resources for the unit tests, so after discussions with SSD, we decided it was fine for all "technical tests" (integration and unit tests) to run on multiple cores. The Met Office appears to have its own overrides for this setting, but I have also changed the system default (as I can see no reason for anyone wanting to run a parallel test on a single core).
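
The uneven split described in the first item can be illustrated with a small sketch. It assumes the mesh is cut into strips along x and that any remainder columns go to the lowest-numbered ranks; this is one common convention that reproduces the 3x8, 3x8, 2x8 result, not necessarily the algorithm the LFRic partitioner uses. The function names are hypothetical.

```python
def split_sizes(n_cells: int, n_ranks: int) -> list[int]:
    """Split n_cells as evenly as possible over n_ranks.

    Assumed convention: the first (n_cells % n_ranks) ranks get one extra cell.
    """
    if n_ranks <= 0:
        raise ValueError("n_ranks must be positive")
    base, extra = divmod(n_cells, n_ranks)
    return [base + 1 if rank < extra else base for rank in range(n_ranks)]


def panel_partition_shapes(nx: int, ny: int, n_ranks: int) -> list[tuple[int, int]]:
    """Shapes (width, height) of strip partitions, assuming the split is along x."""
    return [(width, ny) for width in split_sizes(nx, n_ranks)]


if __name__ == "__main__":
    # An 8x8 panel over 3 ranks gives two 3x8 strips and one 2x8 strip.
    print(panel_partition_shapes(8, 8, 3))
```

With 4 ranks the same 8x8 mesh divides evenly into 2x8 strips, which is why even rank counts alone never exercise the remainder handling.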

Code Quality Checklist

I have performed a self-review of my own code
My code follows the project's style guidelines
Comments have been included that aid understanding and enhance the readability of the code
My changes generate no new warnings
All automated checks in the CI pipeline have completed successfully

Testing

I have tested this change locally, using the LFRic Core rose-stem suite
If required (e.g. API changes) I have also run the LFRic Apps test suite using this branch
If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes)
I have added tests to cover new functionality as appropriate (e.g. system tests, unit tests, etc.)
Any new tests have been assigned an appropriate amount of compute resource and have been allocated to an appropriate testing group (i.e. the developer tests are for jobs which use a small amount of compute resource and complete in a matter of minutes)

Test Suite Results - lfric_core - partition_test/run1

Suite Information

Item	Value
Suite Name	partition_test/run1
Suite User	mike.hobson
Workflow Start	2026-01-29T06:08:00
Groups Run	all

Dependency	Reference	Main Like
lfric_core	mike-hobson/lfric_core@add_uneven_partition_test	False
SimSys_Scripts	MetOffice/SimSys_Scripts@2025.12.1	True

Task Information

✅ succeeded tasks - 372

Security Considerations

I have reviewed my changes for potential security issues
Sensitive data is properly handled (if applicable)
Authentication and authorisation are properly implemented (if applicable)

Performance Impact

Performance of the code has been considered and, if applicable, suitable performance measurements have been conducted

AI Assistance and Attribution

Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise, Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the Simulation Systems AI policy (including attribution labels)

Documentation

Where appropriate I have updated documentation related to this change and confirmed that it builds correctly

PSyclone Approval

If you have edited any PSyclone-related code (e.g. PSyKAl-lite, Kernel interface, optimisation scripts, LFRic data structure code) then please contact the TCD Team

Code Review

All dependencies have been resolved
Related Issues have been properly linked and addressed
CLA compliance has been confirmed
Code quality standards have been met
Tests are adequate and have passed
Documentation is complete and accurate
Security considerations have been addressed
Performance impact is acceptable

james-bruten-mo

rose-stem all good

stevemullerworth

Testing the branch infrastructure unit tests on the command line makes sense, and code looks OK. On merging on main, I get a failure with run_lfric_xios_integration_tests_azspice_gnu_64bit: the job.out looks fine, but the job.err reports an "unexpected error"

MatthewHambley

Presumed build system code owner's review. This looks fine, bumping the number of processes for the mpiexec is not a problem. I've made a drive-by comment on something else for the actual reviewers to consider.

MatthewHambley · 2026-02-03T13:36:12Z

infrastructure/unit-test/mesh/partition_mod_test.pf

        num_cells_ghost = partition%get_num_cells_ghost()
        @assertEqual( 0, num_cells_ghost )

+      case (3)


I appreciate that you're following the existing pattern, but if each of these tests is so different, they may in fact be three different tests with different numbers of MPI processes, rather than one test run three times.

mike-hobson · 2026-02-06

I don't think these tests are really that different. They all call the same list of type-bound functions from the partition object, but those functions return different values in each case. It probably could be better "engineered", but this is a unit test, so I don't think it really has to (or even should) be that well "engineered" - a simple linear list of tests and expected results is probably the simplest to understand and maintain.
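
As a rough illustration of the "simple linear list of tests and expected results" idea, the sketch below runs the same check for every rank count and looks the expected value up in a table. It is written in Python rather than pFUnit, `owned_cells` is a hypothetical stand-in for the partition object's type-bound query functions, and the expected values assume the 8x8 mesh is split into strips as in the PR summary.

```python
# Hypothetical expected results, keyed by number of ranks, then indexed by rank.
EXPECTED_OWNED_CELLS = {
    1: [64],
    2: [32, 32],
    3: [24, 24, 16],  # strips of 3x8, 3x8 and 2x8
}


def owned_cells(nx: int, ny: int, n_ranks: int, rank: int) -> int:
    """Stand-in for a partition query (assumed strip decomposition along x)."""
    base, extra = divmod(nx, n_ranks)
    width = base + 1 if rank < extra else base
    return width * ny


def check_all_cases(nx: int = 8, ny: int = 8) -> list[tuple[int, int, int, int]]:
    """Run the same query for every (n_ranks, rank) case and collect mismatches."""
    failures = []
    for n_ranks, expected in EXPECTED_OWNED_CELLS.items():
        for rank, want in enumerate(expected):
            got = owned_cells(nx, ny, n_ranks, rank)
            if got != want:
                failures.append((n_ranks, rank, want, got))
    return failures


if __name__ == "__main__":
    print(check_all_cases())
```

The calls are identical in every case and only the expected numbers differ, which is the argument made above for keeping a single test body.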

mike-hobson · 2026-02-06T13:13:38Z

> Testing the branch infrastructure unit tests on the command line makes sense, and code looks OK. On merging on main, I get a failure with run_lfric_xios_integration_tests_azspice_gnu_64bit: the job.out looks fine, but the job.err reports an "unexpected error"

These tests all work by writing data to the same set of files - so occasionally there is contention within the filing system and we see the failure mentioned above. This PR allows those tests to run in parallel, so it is more likely that writes will happen at the same time, which exacerbates the issue.

The answer is to make all the tests run in their own directory, so there will be no write contention. The good news is that @EdHone has already done this work in #212, which is currently waiting for a code review.

The answer, here, is to just wait until #212 makes its way onto main, then merge the branch up to head. I'll pass the PR back into code review when that happens.
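
The per-test directory idea described above can be sketched as follows. This is a generic Python illustration and not how #212 implements it; the helper name and the output filename are hypothetical. Because `os.chdir` changes the directory for the whole process, the sketch suits tests that run as separate processes rather than as threads.

```python
import os
import tempfile
from pathlib import Path


def run_in_private_dir(test_fn, name: str):
    """Run test_fn with the working directory set to a fresh temporary directory.

    The original working directory is restored before the temporary one is removed.
    """
    original = Path.cwd()
    with tempfile.TemporaryDirectory(prefix=f"{name}_") as workdir:
        os.chdir(workdir)
        try:
            return test_fn()
        finally:
            os.chdir(original)


def write_output() -> Path:
    """Hypothetical test body that writes a fixed filename into the current directory."""
    out = Path("output.dat")
    out.write_text("data")
    return out.resolve()


if __name__ == "__main__":
    first = run_in_private_dir(write_output, "test_a")
    second = run_in_private_dir(write_output, "test_b")
    # Same filename, different directories, so the writes cannot collide.
    print(first.parent != second.parent)
```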

mike-hobson · 2026-02-13T15:39:51Z

PR #212 has been committed to main, so that should fix the intermittent failures due to the race condition in the lfric_xios integration tests once the "Update branch" button has been clicked. No other changes to this branch should be required. I am, therefore, passing this back to the reviewer for another test (and hopefully commit).

stevemullerworth

Tests previously failed intermittently, but issue resolved by a separate PR.

mike-hobson added 4 commits January 23, 2026 16:10

Add a unit test for partitioning that leads to unevenly sized partitions.

ceff784

Add further partitioning unit tests.

abc5637

Update the number of cores used for tech tests for met office.

a18b71a

Update a tech-tests_cpus that I somehow missed.

9eb35f6

mike-hobson requested review from MatthewHambley, james-bruten-mo, mo-rickywong and stevemullerworth as code owners January 29, 2026 08:13

github-actions bot assigned mike-hobson Jan 29, 2026

james-bruten-mo approved these changes Jan 29, 2026

stevemullerworth reviewed Jan 30, 2026

MatthewHambley approved these changes Feb 3, 2026

mike-hobson requested a review from stevemullerworth February 13, 2026 15:40

Merge remote-tracking branch 'upstream/main' into add_uneven_partition_test

ac0e274

stevemullerworth approved these changes Feb 17, 2026

stevemullerworth merged commit 6cc84d8 into MetOffice:main Feb 17, 2026
5 checks passed
