Skip to content

Fix scalar log_M handling in catch-at-age output#1321

Draft
awilnoaa wants to merge 12 commits into
mainfrom
dev-fix_nlengths
Draft

Fix scalar log_M handling in catch-at-age output#1321
awilnoaa wants to merge 12 commits into
mainfrom
dev-fix_nlengths

Conversation

@awilnoaa

@awilnoaa awilnoaa commented Mar 18, 2026

Copy link
Copy Markdown
Contributor

What is the feature?

  • Fixes handling of scalar log_M in catch-at-age model output.

How have you implemented the solution?

  • Updated catch-at-age calculations to use get_force_scalar() when consuming population natural mortality values.
  • Updated log_M and related mortality output dimensions to reflect the actual log_M input shape instead of always reporting full age-year dimensions.
  • Fixed JSON header quoting and type conversion issues needed to support the new scalar dimension reporting.

Does the PR impact any other area of the project, maybe another repo?

  • No expected impact outside of FIMS catch-at-age log_M handling and its JSON/model output.

Addresses Issue #1084


Instructions for code reviewer

👋Hello reviewer👋, thank you for taking the time to review this PR!

  • Please use this checklist during your review, checking off items that you have verified are complete but feel free to skip over items that are not relevant!
  • See the GitHub documentation for how to comment on a PR to indicate where you have questions or changes are needed before approving the PR.
  • Please use standard conventional messages for both commit messages and comments
  • PR reviews are a great way to learn so feel free to share your tips and tricks. However, when suggesting changes to the PR that are optional please include nit: (for nitpicking) as the comment type. For example, nit: I prefer using a data.frame() instead of a matrix because ...
  • Engage with the developer. Make it clear when the PR is approved by selecting the approved status, and potentially commenting on the PR with something like This PR is now ready to be merged.

Checklist

  • The PR requests the appropriate base branch (dev for features and main for hot fixes)
  • The code is well-designed
  • The code is designed well for both users and developers
  • Code coverage remains high- [ ] Comments are clear, useful, and explain why instead of what
  • Code is appropriately documented (doxygen and roxygen)

@awilnoaa awilnoaa added the status: in progress Designing or coding is actively occurring label Mar 18, 2026
@codecov

codecov Bot commented Mar 18, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 94.11765% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 82.98%. Comparing base (c57cf29) to head (133a9e0).

Files with missing lines Patch % Lines
inst/include/models/functors/catch_at_age.hpp 94.11% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1321      +/-   ##
==========================================
+ Coverage   82.97%   82.98%   +0.01%     
==========================================
  Files          54       54              
  Lines        2214     2222       +8     
  Branches      579      579              
==========================================
+ Hits         1837     1844       +7     
- Misses        279      280       +1     
  Partials       98       98              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@nathanvaughan-NOAA nathanvaughan-NOAA left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that we want the scalar log_M because it speeds things up, should we also change the default parameter setup to create a scalar parameter rather than a replicated n_years*n_ages vector? The setup is on lines 277:287 in create_default_parameters.R. That would also help test/confirm that the changes don't have any bugs.

Comment thread inst/include/interface/rcpp/rcpp_objects/rcpp_models.hpp Outdated
Comment thread inst/include/interface/rcpp/rcpp_objects/rcpp_models.hpp
Comment thread inst/include/interface/rcpp/rcpp_objects/rcpp_models.hpp Outdated
Comment thread inst/include/interface/rcpp/rcpp_objects/rcpp_models.hpp Outdated
@kellijohnson-NOAA

Copy link
Copy Markdown
Contributor

@nathanvaughan-NOAA we went back and forth a bit while Andrea and Adrianne were working on this to try and decide if M should be reported for the entire time x age series if only a single value is estimated and because selectivity we thought that it should match what is done for selectivity where only a single value is reported if it does not vary with time. I am willing to rethink this globally for all parameters but we cannot get that rethink done before the CIE Review. So, I propose that we have M be reported as a single value when only a single value is estimated.
@Andrea-Havron-NOAA I think that Nathan still found some changes that need to be made though. Do you have time today to edit this PR?

@Andrea-Havron-NOAA

Copy link
Copy Markdown
Collaborator

@nathanvaughan-NOAA we went back and forth a bit while Andrea and Adrianne were working on this to try and decide if M should be reported for the entire time x age series if only a single value is estimated and because selectivity we thought that it should match what is done for selectivity where only a single value is reported if it does not vary with time. I am willing to rethink this globally for all parameters but we cannot get that rethink done before the CIE Review. So, I propose that we have M be reported as a single value when only a single value is estimated. @Andrea-Havron-NOAA I think that Nathan still found some changes that need to be made though. Do you have time today to edit this PR?

@kellijohnson-NOAA, yes I can work on this today.

@nathanvaughan-NOAA

Copy link
Copy Markdown
Contributor

Sounds good to me @kellijohnson-NOAA , unless I made any mistakes I think you should be able to just copy in the code chunks in my comments @Andrea-Havron-NOAA and this will be good to go.

@Andrea-Havron-NOAA

Copy link
Copy Markdown
Collaborator

Sounds good to me @kellijohnson-NOAA , unless I made any mistakes I think you should be able to just copy in the code chunks in my comments @Andrea-Havron-NOAA and this will be good to go.

I copied the code over and just fixed some minor bugs. Should I go ahead and change the default log_M to be a scalar in the parameter tibble?

@kellijohnson-NOAA

Copy link
Copy Markdown
Contributor

Yes, that would be great because I think we should be using best practices as the default and there is no population out there that I know of that would have information on both age and time varying natural mortality.

@nathanvaughan-NOAA

Copy link
Copy Markdown
Contributor

Adding my comment on @Andrea-Havron-NOAA 's changes here too incase it doesn't show up in the resolved previous comment. This is in reference to the line 1037:1040 code.

@Andrea-Havron-NOAA I think you edits are still going to have an issue in the derived quantities dim info. Shouldn't fims::Vector{static_cast(population->log_M.size())} have the same dimensions as dim_names? At the moment the vector will always be length 1 because it's just the size on log_M but the dim names could be length 1 is log_M is a scalar or length 2 if it's length n_years*n_ages. If it doesn't matter thats fine but that's why I moved the definition insided the if/else call. I'll change my review to an approve now though so I don't hold you up in the race to the finish :)

@Andrea-Havron-NOAA

Copy link
Copy Markdown
Collaborator

Adding my comment on @Andrea-Havron-NOAA 's changes here too incase it doesn't show up in the resolved previous comment. This is in reference to the line 1037:1040 code.

@Andrea-Havron-NOAA I think you edits are still going to have an issue in the derived quantities dim info. Shouldn't fims::Vector{static_cast(population->log_M.size())} have the same dimensions as dim_names? At the moment the vector will always be length 1 because it's just the size on log_M but the dim names could be length 1 is log_M is a scalar or length 2 if it's length n_years*n_ages. If it doesn't matter thats fine but that's why I moved the definition insided the if/else call. I'll change my review to an approve now though so I don't hold you up in the race to the finish :)

I think I addressed this. I am still using log_M.size() to set the length of mortlaity_M when log_M.size() is not n_years x n_ages, so it should resize mortality_M to length 1. I added a dimension check in the rcpp_population interface to through an error if log_M is not either a scalar or n_years * n_ages. We will have to think about how to handle situations when log_M is just n_years or just n_ages, but the C++ code can't handle this yet anyway.

@Andrea-Havron-NOAA

Copy link
Copy Markdown
Collaborator

@kellijohnson-NOAA, I changed log_M to scalar in the default model and added some log_M dimension tests in the test-test-integration-fleet-log-obs-error-input file. I think we should rename this file to: test-parameter-input-dimensions, but I wanted you to review the tests first before making that change.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes catch-at-age log_M handling so scalar inputs are consumed correctly in-model and reflected correctly in JSON/model output dimensionality (addressing Issue #1084).

Changes:

  • Updated catch-at-age mortality calculations to use get_force_scalar() when reading log_M / writing or reporting M.
  • Made log_M and mortality_M output dimensionality depend on the actual log_M input shape (scalar vs n_years * n_ages).
  • Added/updated an integration test covering scalar vs full-dimension log_M output sizing and added a C++ size check for log_M.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/testthat/test-integration-fleet-log-obs-error-input.R Adds integration coverage ensuring scalar vs matrix-shaped log_M produces matching output lengths for log_M and mortality_M, plus a wrong-size error case.
inst/include/models/functors/catch_at_age.hpp Switches log_M/M accesses to get_force_scalar() in several computations and output filling.
inst/include/interface/rcpp/rcpp_objects/rcpp_population.hpp Adds validation that log_M is either scalar or n_years * n_ages before resizing/using it.
inst/include/interface/rcpp/rcpp_objects/rcpp_models.hpp Updates JSON dimensionality for log_M and allocates/dimensions mortality_M to match log_M shape.
R/create_default_parameters.R Changes the default Population log_M parameter to be scalar by default (instead of always n_years * n_ages).

Comment thread inst/include/interface/rcpp/rcpp_objects/rcpp_population.hpp
Comment on lines 203 to 205
for (size_t age = 0; age < population->n_ages; age++) {
for (size_t year = 0; year < population->n_years; year++) {
size_t i_age_year = age * population->n_years + year;

Copilot AI Mar 23, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Prepare() the age/year folding index uses age * n_years + year, but the main Evaluate() loop (and JSON dimensionality n_years, n_ages) uses i_age_year = year * n_ages + age. With a non-scalar log_M, this will permute log_M -> M (and thus mortality) across age/year. Please update the transformation loop to use the same folding convention as Evaluate() when reading log_M and writing M (and consider reordering loops to make this clearer).

Suggested change
for (size_t age = 0; age < population->n_ages; age++) {
for (size_t year = 0; year < population->n_years; year++) {
size_t i_age_year = age * population->n_years + year;
for (size_t year = 0; year < population->n_years; year++) {
for (size_t age = 0; age < population->n_ages; age++) {
size_t i_age_year = year * population->n_ages + age;

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is an artifact with how we implemented the transformation code in the past:

 // Transformation Section
            for (size_t age = 0; age < this->nages; age++) {
                this->weight_at_age[age] = growth->evaluate(ages[age]);
                for (size_t year = 0; year < this->nyears; year++) {
                    size_t i_age_year = age * this->nyears + year;
                    this->M[i_age_year] = fims_math::exp(this->log_M[i_age_year]);
                }
            }

Rather than track this->weight_at_age, the catch_at_age.hpp file just references growth->evaluate(year, ages[age]) directly.

@msupernaw, is there a reason for this change? Do we want to add a weight_at_age vector back in that we track? If not, we can just move forward with switching the order of how the transformation is called so it is consistent with the Evaluate() function

ss << " \"dimensions\": [" << population_interface->n_years.get()
<< ", " << population_interface->n_ages.get() << "]\n";
} else {
ss << " \"header\": [\"scalar\"],\n";

Copilot AI Mar 23, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scalar log_M JSON dimensionality branch introduces a new header label "scalar". Elsewhere in this file, non-indexed vectors use header "na" (e.g., Fleet log_q), which is also specially handled by dimensions_to_tibble() to avoid creating an extra dimension column. Consider using the existing "na" convention here (and for scalar mortality_M dim names) or update the R reshaping helpers to treat "scalar" the same way, to keep output schemas consistent.

Suggested change
ss << " \"header\": [\"scalar\"],\n";
ss << " \"header\": [\"na\"],\n";

Copilot uses AI. Check for mistakes.

Copilot AI commented Mar 23, 2026

Copy link
Copy Markdown
Contributor

@Andrea-Havron-NOAA I've opened a new pull request, #1346, to work on those changes. Once the pull request is ready, I'll request review from you.

@kellijohnson-NOAA kellijohnson-NOAA left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@awilnoaa @nathanvaughan-NOAA @Andrea-Havron-NOAA thank you for these changes. I am going to move this PR to draft status and update the issue to better reflect what we need. In reviewing this PR I realized that we have a few fundamental changes that we need to make to the codebase beyond just log_M. As we make everything time-varying and potentially age-varying (i.e., random walk at age for selectivity) we need to be consistent in how we do this. I think get_force_scalar() needs some work and we need to be consistent in how all values and their dimensions are reported in the json.
A fundamental question that I have is, do we always want to report things at their full scale, i.e., report every parameter for every time step, but just have a new label for estimation_type when it is essentially mirrored rather than freely estimated. For example, if we have a single M, what would be the estimation type if we report that parameter for every combination of age and year but it is only estimated once. Comparison between models where you estimate it on the annual time step versus a single value would be easier for users if the values are always reported the same. This was a design choice that we talked about a long time ago but it is not being implemented that way right now.
Second, what do we do for transformed parameters when we want their uncertainty on the untransformed scale. This is something @Andrea-Havron-NOAA is thinking a lot about right now.
Third, the dimensions of all inputs need to be checked, not just log_M, and in a systematic way.
So, I am going to make sure that the original issue reflects these needs. I am going to change this to a draft PR so we can pick it up again later, and I look forward to the future conversations that we will have trying to solve all of these problems.

@kellijohnson-NOAA kellijohnson-NOAA marked this pull request as draft March 24, 2026 16:02
@kellijohnson-NOAA kellijohnson-NOAA linked an issue Mar 24, 2026 that may be closed by this pull request
@kellijohnson-NOAA kellijohnson-NOAA linked an issue Mar 24, 2026 that may be closed by this pull request
6 tasks
@kellijohnson-NOAA kellijohnson-NOAA added status: needs discussion Dialogue is needed before a next step can be taken and removed status: in progress Designing or coding is actively occurring labels Mar 25, 2026
@kellijohnson-NOAA kellijohnson-NOAA deleted the branch main May 21, 2026 04:00
@github-project-automation github-project-automation Bot moved this from Backlog to Done in FIMS Code Club May 21, 2026
@kellijohnson-NOAA kellijohnson-NOAA changed the base branch from dev to main May 21, 2026 04:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status: needs discussion Dialogue is needed before a next step can be taken

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug:] input and output dimensions of parameters do not always match

6 participants