Remove ensemble loading heuristics#1888

Merged
jfrost-mo merged 7 commits into main from 1265_remove_ensemble_heuristics
Feb 9, 2026

@jfrost-mo
Member

@jfrost-mo jfrost-mo commented Jan 26, 2026

We now rely on ensemble data having a realization coordinate.
This allows for a significant simplification of the loading process.

It also provides a 2x speedup when loading ensemble data.

Fixes #1808
Fixes #1845

Contribution checklist

Aim to have all relevant checks ticked off before merging. See the developer's guide for more detail.

  • Documentation has been updated to reflect change.
  • New code has tests, and affected old tests have been updated.
  • All tests and CI checks pass.
  • Ensured the pull request title is descriptive.
  • Conda lock files have been updated if dependencies have changed.
  • Attributed any Generative AI, such as GitHub Copilot, used in this PR.
  • Marked the PR as ready to review.

@jfrost-mo jfrost-mo self-assigned this Jan 26, 2026
@jfrost-mo jfrost-mo added the cleanup Non-functional improvement label Jan 26, 2026
@jfrost-mo jfrost-mo marked this pull request as ready for review January 26, 2026 16:19
@github-actions
Contributor

github-actions bot commented Jan 26, 2026

Coverage

@jfrost-mo
Member Author

jfrost-mo commented Jan 26, 2026

#1872 implements a similar fix to 7cb7d26 as part of a larger set of changes. Note that it will probably conflict.

We now rely on ensemble data having a `realization` coordinate.
This allows for a significant simplification of the loading process.

Fixes #1265
I think this was mostly working coincidentally, as the operations were
also being done in-place. I do, however, expect that the
`iris.util.squeeze(cube)` in _lfric_time_coord_fix_callback was not
applied, which explains some issues we have been having.
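The pitfall described in this commit message can be illustrated without Iris. The following is a minimal sketch, not the CSET code: `FakeCube`, `squeeze`, and both callbacks are hypothetical stand-ins. The only assumption carried over from Iris is that `iris.util.squeeze` returns a new cube rather than modifying its argument, so a callback that ignores the return value has no effect.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FakeCube:
    """Hypothetical stand-in for a cube; it only tracks a shape."""

    shape: tuple


def squeeze(cube):
    """Return a NEW cube without length-1 dimensions; the input is untouched."""
    return replace(cube, shape=tuple(n for n in cube.shape if n != 1))


def callback_discarding_result(cube):
    """Buggy pattern: the squeezed copy is created and then thrown away."""
    squeeze(cube)
    return cube


def callback_keeping_result(cube):
    """Fixed pattern: the squeezed copy is handed back to the caller."""
    return squeeze(cube)


original = FakeCube(shape=(1, 3, 4))
unchanged = callback_discarding_result(original)
squeezed = callback_keeping_result(original)
```

Here `unchanged` still carries the length-1 dimension, while `squeezed` does not.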
This avoids making concatenating cubes difficult before we have done it.
@jfrost-mo jfrost-mo force-pushed the 1265_remove_ensemble_heuristics branch from d020b2e to 7cb7d26 Compare January 26, 2026 16:28
@SGallagherMet
Contributor

@mo-tomosevans could you comment on this? I know you've been testing ensemble data.
My gut feeling is that this will break our trial and operational data, as the 'realisation' number is not included in the cube metadata for the control member and so has to be inferred from the file name.
That being said, I'm not sure the code that has been removed would have worked anyway, as the operational and trial file naming conventions are different.

@mo-tomosevans

Hi

I think this PR could address the problem I've encountered in #1845. Someone else reported something similar in #1808.

You're right though @SGallagherMet - in the ensemble trial data I've been testing, the ensemble members are given as realization coordinates in the cubes for all but the control member (i.e. <path>/enuk_um_000/enukaa_pd*). In this case there is no realization coordinate by default.

@jfrost-mo
Member Author

jfrost-mo commented Jan 27, 2026

> the 'realisation' number is not included in the cube metadata for the control member

Am I right in thinking the control would be member 0? If so then it would have a realization coordinate added by the _realization_callback, which should then cause it to merge into a single cube with the other members (assuming I've picked the right coordinate type, etc.)

Is there any operational/trial ensemble data on disk that I could give this a test with?
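The idea in this comment, that a control member lacking a realization is treated as member 0 so it can be combined with the others, can be sketched in plain Python. This is an illustration under stated assumptions, not the actual `_realization_callback`: the function name `add_missing_realization`, the dictionary representation of cube metadata, and the sample values are all hypothetical.

```python
def add_missing_realization(metadata, default_member=0):
    """Return a copy of the metadata that is guaranteed a realization entry.

    Members that already carry a realization keep it; a member without one
    (assumed here to be the control) is given ``default_member``.
    """
    fixed = dict(metadata)
    fixed.setdefault("realization", default_member)
    return fixed


# Invented sample metadata: the first entry mimics a control member whose
# metadata has no realization.
members = [
    {"name": "air_temperature"},
    {"name": "air_temperature", "realization": 1},
    {"name": "air_temperature", "realization": 2},
]

fixed = [add_missing_realization(m) for m in members]
realizations = sorted(m["realization"] for m in fixed)
```

After the fix every member has a distinct realization, which is the precondition for combining them along that coordinate.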

Contributor

@JMEdwardsXtr JMEdwardsXtr left a comment

I have run CSET with data from both the UM and LFRic (two deterministic models in the same suite) and this modified code runs. I do not have ensemble data on which to test it and so cannot comment on how it works with ensembles themselves.

@mo-tomosevans

Sorry @jfrost-mo I missed your latest comment.

> Am I right in thinking the control would be member 0? If so then it would have a realization coordinate added by the _realization_callback, which should then cause it to merge into a single cube with the other members (assuming I've picked the right coordinate type, etc.)

Yes that is correct and yes _realization_callback does work in creating a realization coordinate. However, the second part of the function:

> which should then cause it to merge into a single cube with the other members

is where I'm finding difficulty. I haven't been able to get the realizations to merge when the input paths point to multiple ensemble members (e.g. /enuk_um*/enukaa_pd*).

I'm also getting a new problem with the _fix_spatial_coords_callback which suggests some metadata issues - not sure it belongs in this discussion yet.
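When members refuse to merge, a useful first step is finding which member's metadata differs from the rest. The following is a rough sketch in plain Python, assuming metadata can be represented as flat dictionaries; `find_odd_members` and the sample values are hypothetical and invented for illustration, and this is not how Iris itself reports merge failures.

```python
from collections import Counter


def find_odd_members(members, ignore=("realization",)):
    """Return realizations whose metadata differs from the most common set.

    Keys listed in ``ignore`` are expected to differ between members and
    are therefore left out of the comparison.
    """

    def signature(meta):
        return tuple(sorted((k, v) for k, v in meta.items() if k not in ignore))

    counts = Counter(signature(m) for m in members)
    most_common_signature, _ = counts.most_common(1)[0]
    return [m["realization"] for m in members if signature(m) != most_common_signature]


# Invented sample metadata: one member has a different grid label.
members = [
    {"realization": 0, "units": "K", "grid": "grid_a"},
    {"realization": 1, "units": "K", "grid": "grid_b"},
    {"realization": 2, "units": "K", "grid": "grid_a"},
    {"realization": 3, "units": "K", "grid": "grid_a"},
]

odd = find_odd_members(members)
```

In this made-up case only member 1 is reported, pointing at the data for that member rather than at the loading code.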

They are used unconditionally, so there was no benefit to loading them
within the function.
@jfrost-mo
Member Author

I've fixed the issue with _fix_spatial_coords_callback in c1ed446. The merging into a single cube appears to be a data specific issue (and it is member 2 that is failing to merge, not member 0), so I'm going to go ahead and merge this.

@jfrost-mo jfrost-mo merged commit 30be501 into main Feb 9, 2026
8 checks passed
@jfrost-mo jfrost-mo deleted the 1265_remove_ensemble_heuristics branch February 9, 2026 11:17

Labels

cleanup Non-functional improvement

Development

Successfully merging this pull request may close these issues.

Not producing member plots for ensemble trials
'Not producing single cube' error reading UM ENS data

4 participants