Improve partitioner unit tests#254

Merged
stevemullerworth merged 5 commits into MetOffice:main from
mike-hobson:add_uneven_partition_test
Feb 17, 2026

Conversation

@mike-hobson (Contributor) commented Jan 29, 2026

PR Summary

Sci/Tech Reviewer: None required
Code Reviewer: @stevemullerworth

I noted three deficiencies with the partition unit tests:

  1. There are no tests of uneven partitions. The single panel partitioner unit tests use an 8x8 full domain mesh. If this is partitioned over 3 ranks, it should result in partitions of size 3x8, 3x8 and 2x8 (i.e. uneven). Add such tests to the biperiodic, planar and x- plus y- trench mesh partitioner tests.
  2. There is no test of the parallel cubedsphere partitioner. This was omitted because it is slow to run from the command line. Whilst that is true (it is slower than all the other infrastructure unit tests combined), this is the partitioner used for almost all global model runs - so I have added it in (and we'll take the hit on runtime).
  3. The current test suite runs the unit tests on one core - even though the infrastructure unit tests are parallel. The infrastructure unit tests take two and a half minutes to complete. Giving them the correct number of cores to run on drops this to 8 seconds. There is no fine-grained control of resources for the unit tests, so after discussions with SSD, we decided it was fine for all "technical tests" (integration- and unit-tests) to run on multiple cores. The Met Office appears to have its own overrides for this setting, but I have also changed the system default (as I can see no reason for anyone wanting to run a parallel test on a single core)
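
The uneven split in item 1 can be checked with a few lines of arithmetic. The sketch below is Python for illustration only, not the LFRic partitioner: `split_columns` and `partition_shapes` are hypothetical names, and it assumes the remainder columns go to the lowest-numbered ranks, which is one common scheme that happens to reproduce the 3x8, 3x8 and 2x8 sizes quoted above.

```python
def split_columns(n_cols: int, n_ranks: int) -> list[int]:
    """Return the number of mesh columns given to each rank.

    Assumption: any remainder is handed out one column at a time to the
    lowest-numbered ranks. The real partitioner may order this differently.
    """
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    base, extra = divmod(n_cols, n_ranks)
    return [base + 1 if rank < extra else base for rank in range(n_ranks)]


def partition_shapes(n_cols: int, n_rows: int, n_ranks: int) -> list[tuple[int, int]]:
    """Shape (columns, rows) of each rank's strip of an n_cols x n_rows mesh."""
    return [(width, n_rows) for width in split_columns(n_cols, n_ranks)]


# 8x8 mesh over 3 ranks: two strips of 3x8 and one of 2x8.
print(partition_shapes(8, 8, 3))  # [(3, 8), (3, 8), (2, 8)]
```

Any rank count that does not divide 8 exercises the same uneven path; 3 is simply the smallest such count above 1.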

Code Quality Checklist

  • I have performed a self-review of my own code
  • My code follows the project's style guidelines
  • Comments have been included that aid understanding and enhance the readability of the code
  • My changes generate no new warnings
  • All automated checks in the CI pipeline have completed successfully

Testing

  • I have tested this change locally, using the LFRic Core rose-stem suite
  • If required (e.g. API changes) I have also run the LFRic Apps test suite using this branch
  • If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes)
  • I have added tests to cover new functionality as appropriate (e.g. system tests, unit tests, etc.)
  • Any new tests have been assigned an appropriate amount of compute resource and have been allocated to an appropriate testing group (i.e. the developer tests are for jobs which use a small amount of compute resource and complete in a matter of minutes)

Test Suite Results - lfric_core - partition_test/run1

Suite Information

Item            Value
Suite Name      partition_test/run1
Suite User      mike.hobson
Workflow Start  2026-01-29T06:08:00
Groups Run      all

Dependency      Reference                                          Main Like
lfric_core      mike-hobson/lfric_core@add_uneven_partition_test   False
SimSys_Scripts  MetOffice/SimSys_Scripts@2025.12.1                 True

Task Information

✅ succeeded tasks - 372

Security Considerations

  • I have reviewed my changes for potential security issues
  • Sensitive data is properly handled (if applicable)
  • Authentication and authorisation are properly implemented (if applicable)

Performance Impact

  • Performance of the code has been considered and, if applicable, suitable performance measurements have been conducted

AI Assistance and Attribution

  • Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise, Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the Simulation Systems AI policy (including attribution labels)

Documentation

  • Where appropriate I have updated documentation related to this change and confirmed that it builds correctly

PSyclone Approval

  • If you have edited any PSyclone-related code (e.g. PSyKAl-lite, Kernel interface, optimisation scripts, LFRic data structure code) then please contact the TCD Team

Code Review

  • All dependencies have been resolved
  • Related Issues have been properly linked and addressed
  • CLA compliance has been confirmed
  • Code quality standards have been met
  • Tests are adequate and have passed
  • Documentation is complete and accurate
  • Security considerations have been addressed
  • Performance impact is acceptable

@james-bruten-mo (Collaborator) left a comment

rose-stem all good

@stevemullerworth (Collaborator) left a comment

Testing the branch infrastructure unit tests on the command line makes sense, and code looks OK. On merging on main, I get a failure with run_lfric_xios_integration_tests_azspice_gnu_64bit: the job.out looks fine, but the job.err reports an "unexpected error"

@MatthewHambley (Collaborator) left a comment

Presumed build system code owner's review. This looks fine, bumping the number of processes for the mpiexec is not a problem. I've made a drive-by comment on something else for the actual reviewers to consider.

num_cells_ghost = partition%get_num_cells_ghost()
@assertEqual( 0, num_cells_ghost )

case (3)

Collaborator

I appreciate that you're following the existing pattern, but if each of these cases is so different, they may in fact be three different tests, each with a different number of MPI processes, rather than one test run three times.

Contributor Author

I don't think these tests are really that different. They all call the same list of type-bound functions from the partition object, but those functions return different values in each case. It probably could be better "engineered", but this is a unit test, so I don't think it really has to (or even should) be that well "engineered" - a simple linear list of tests and expected results is probably the simplest to understand and maintain.
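
To make the "simple linear list of tests and expected results" idea concrete, here is a minimal Python sketch. It is not the pFUnit test in this PR: the `Expected` rows, `cells_in_partition` and `run_cases` are hypothetical, and the expected cell counts are just the strip sizes for an 8x8 mesh (64 cells on one rank; 24, 24 and 16 cells on three ranks), assuming the remainder goes to the lowest ranks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expected:
    n_ranks: int   # total number of processes in the run
    rank: int      # rank whose partition is being checked
    n_cells: int   # expected number of cells in that rank's strip


# One flat row per (process count, rank); values are illustrative only.
CASES = [
    Expected(n_ranks=1, rank=0, n_cells=64),
    Expected(n_ranks=3, rank=0, n_cells=24),
    Expected(n_ranks=3, rank=1, n_cells=24),
    Expected(n_ranks=3, rank=2, n_cells=16),
]


def cells_in_partition(n_ranks: int, rank: int, n_cols: int = 8, n_rows: int = 8) -> int:
    """Stand-in for the code under test: cells in one rank's column strip."""
    base, extra = divmod(n_cols, n_ranks)
    width = base + (1 if rank < extra else 0)
    return width * n_rows


def run_cases() -> list[Expected]:
    """Return every row whose expected value does not match the computed one."""
    return [c for c in CASES if cells_in_partition(c.n_ranks, c.rank) != c.n_cells]


print(run_cases())  # [] when every row matches
```

The trade-off discussed in this thread is visible here: the same call is made for every row and only the expected numbers differ, which keeps the table easy to read but hides the fact that rows with different `n_ranks` need separate runs.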

@mike-hobson (Contributor, Author)

> Testing the branch infrastructure unit tests on the command line makes sense, and code looks OK. On merging on main, I get a failure with run_lfric_xios_integration_tests_azspice_gnu_64bit: the job.out looks fine, but the job.err reports an "unexpected error"

These tests all work by writing data to the same set of files - so occasionally there is contention within the filing system and we see the failure mentioned above. This PR allows those tests to run in parallel, so it is more likely that writes will happen at the same time, which exacerbates the issue.

The answer is to make all the tests run in their own directory - so there will be no write contention. The good news is that @EdHone has already done this work in #212 - which is currently waiting for a code review.

The answer, here, is to just wait until #212 makes its way onto main, then merge the branch up to head. I'll pass the PR back into code review when that happens.
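
The per-test directory idea can be sketched in general terms. This is an illustrative Python example only and does not describe how #212 implements it: `run_test` and `run_isolated` are hypothetical names, and each "test" here just writes and reads back a file called `output.txt`.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_test(name: str, workdir: Path) -> str:
    """Hypothetical test body: every test writes a file with the SAME name."""
    out = workdir / "output.txt"
    out.write_text(f"result from {name}\n")
    return out.read_text()


def run_isolated(names: list[str]) -> list[str]:
    """Run the tests concurrently, each inside its own temporary directory."""
    def one(name: str) -> str:
        # A fresh directory per test means identical file names never collide.
        with tempfile.TemporaryDirectory(prefix=f"{name}_") as tmp:
            return run_test(name, Path(tmp))

    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        return list(pool.map(one, names))


results = run_isolated(["test_a", "test_b", "test_c"])
print(results)
```

If all three tests shared one working directory instead, they would race on the single `output.txt`, which is the kind of contention described above and one that becomes more likely once the tests are allowed to run in parallel.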

@mike-hobson (Contributor, Author)

PR #212 has been committed to main, so that should fix the intermittent failures due to the race condition in the lfric_xios integration tests once the "Update branch" button has been clicked. No other changes to this branch should be required. I am, therefore, passing this back to the reviewer for another test (and hopefully commit).

@stevemullerworth (Collaborator) left a comment

Tests previously failed intermittently, but issue resolved by a separate PR.

@stevemullerworth stevemullerworth merged commit 6cc84d8 into MetOffice:main Feb 17, 2026
5 checks passed

4 participants