Fix scalar log_M handling in catch-at-age output#1321
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1321 +/- ##
==========================================
+ Coverage 82.97% 82.98% +0.01%
==========================================
Files 54 54
Lines 2214 2222 +8
Branches 579 579
==========================================
+ Hits 1837 1844 +7
- Misses 279 280 +1
Partials 98 98 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
nathanvaughan-NOAA
left a comment
There was a problem hiding this comment.
Assuming that we want the scalar log_M because it speeds things up, should we also change the default parameter setup to create a scalar parameter rather than a replicated n_years*n_ages vector? The setup is on lines 277:287 in create_default_parameters.R. That would also help test/confirm that the changes don't have any bugs.
|
@nathanvaughan-NOAA we went back and forth a bit while Andrea and Adrianne were working on this to try and decide if M should be reported for the entire time x age series if only a single value is estimated and because selectivity we thought that it should match what is done for selectivity where only a single value is reported if it does not vary with time. I am willing to rethink this globally for all parameters but we cannot get that rethink done before the CIE Review. So, I propose that we have M be reported as a single value when only a single value is estimated. |
@kellijohnson-NOAA, yes I can work on this today. |
|
Sounds good to me @kellijohnson-NOAA , unless I made any mistakes I think you should be able to just copy in the code chunks in my comments @Andrea-Havron-NOAA and this will be good to go. |
I copied the code over and just fixed some minor bugs. Should I go ahead and change the default log_M to be a scalar in the parameter tibble? |
|
Yes, that would be great because I think we should be using best practices as the default and there is no population out there that I know of that would have information on both age and time varying natural mortality. |
|
Adding my comment on @Andrea-Havron-NOAA 's changes here too incase it doesn't show up in the resolved previous comment. This is in reference to the line 1037:1040 code. @Andrea-Havron-NOAA I think you edits are still going to have an issue in the derived quantities dim info. Shouldn't fims::Vector{static_cast(population->log_M.size())} have the same dimensions as dim_names? At the moment the vector will always be length 1 because it's just the size on log_M but the dim names could be length 1 is log_M is a scalar or length 2 if it's length n_years*n_ages. If it doesn't matter thats fine but that's why I moved the definition insided the if/else call. I'll change my review to an approve now though so I don't hold you up in the race to the finish :) |
I think I addressed this. I am still using log_M.size() to set the length of mortlaity_M when log_M.size() is not n_years x n_ages, so it should resize mortality_M to length 1. I added a dimension check in the rcpp_population interface to through an error if log_M is not either a scalar or n_years * n_ages. We will have to think about how to handle situations when log_M is just n_years or just n_ages, but the C++ code can't handle this yet anyway. |
|
@kellijohnson-NOAA, I changed log_M to scalar in the default model and added some log_M dimension tests in the test-test-integration-fleet-log-obs-error-input file. I think we should rename this file to: test-parameter-input-dimensions, but I wanted you to review the tests first before making that change. |
There was a problem hiding this comment.
Pull request overview
Fixes catch-at-age log_M handling so scalar inputs are consumed correctly in-model and reflected correctly in JSON/model output dimensionality (addressing Issue #1084).
Changes:
- Updated catch-at-age mortality calculations to use
get_force_scalar()when readinglog_M/ writing or reportingM. - Made
log_Mandmortality_Moutput dimensionality depend on the actuallog_Minput shape (scalar vsn_years * n_ages). - Added/updated an integration test covering scalar vs full-dimension
log_Moutput sizing and added a C++ size check forlog_M.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
tests/testthat/test-integration-fleet-log-obs-error-input.R |
Adds integration coverage ensuring scalar vs matrix-shaped log_M produces matching output lengths for log_M and mortality_M, plus a wrong-size error case. |
inst/include/models/functors/catch_at_age.hpp |
Switches log_M/M accesses to get_force_scalar() in several computations and output filling. |
inst/include/interface/rcpp/rcpp_objects/rcpp_population.hpp |
Adds validation that log_M is either scalar or n_years * n_ages before resizing/using it. |
inst/include/interface/rcpp/rcpp_objects/rcpp_models.hpp |
Updates JSON dimensionality for log_M and allocates/dimensions mortality_M to match log_M shape. |
R/create_default_parameters.R |
Changes the default Population log_M parameter to be scalar by default (instead of always n_years * n_ages). |
| for (size_t age = 0; age < population->n_ages; age++) { | ||
| for (size_t year = 0; year < population->n_years; year++) { | ||
| size_t i_age_year = age * population->n_years + year; |
There was a problem hiding this comment.
In Prepare() the age/year folding index uses age * n_years + year, but the main Evaluate() loop (and JSON dimensionality n_years, n_ages) uses i_age_year = year * n_ages + age. With a non-scalar log_M, this will permute log_M -> M (and thus mortality) across age/year. Please update the transformation loop to use the same folding convention as Evaluate() when reading log_M and writing M (and consider reordering loops to make this clearer).
| for (size_t age = 0; age < population->n_ages; age++) { | |
| for (size_t year = 0; year < population->n_years; year++) { | |
| size_t i_age_year = age * population->n_years + year; | |
| for (size_t year = 0; year < population->n_years; year++) { | |
| for (size_t age = 0; age < population->n_ages; age++) { | |
| size_t i_age_year = year * population->n_ages + age; |
There was a problem hiding this comment.
It looks like this is an artifact with how we implemented the transformation code in the past:
// Transformation Section
for (size_t age = 0; age < this->nages; age++) {
this->weight_at_age[age] = growth->evaluate(ages[age]);
for (size_t year = 0; year < this->nyears; year++) {
size_t i_age_year = age * this->nyears + year;
this->M[i_age_year] = fims_math::exp(this->log_M[i_age_year]);
}
}
Rather than track this->weight_at_age, the catch_at_age.hpp file just references growth->evaluate(year, ages[age]) directly.
@msupernaw, is there a reason for this change? Do we want to add a weight_at_age vector back in that we track? If not, we can just move forward with switching the order of how the transformation is called so it is consistent with the Evaluate() function
| ss << " \"dimensions\": [" << population_interface->n_years.get() | ||
| << ", " << population_interface->n_ages.get() << "]\n"; | ||
| } else { | ||
| ss << " \"header\": [\"scalar\"],\n"; |
There was a problem hiding this comment.
The scalar log_M JSON dimensionality branch introduces a new header label "scalar". Elsewhere in this file, non-indexed vectors use header "na" (e.g., Fleet log_q), which is also specially handled by dimensions_to_tibble() to avoid creating an extra dimension column. Consider using the existing "na" convention here (and for scalar mortality_M dim names) or update the R reshaping helpers to treat "scalar" the same way, to keep output schemas consistent.
| ss << " \"header\": [\"scalar\"],\n"; | |
| ss << " \"header\": [\"na\"],\n"; |
|
@Andrea-Havron-NOAA I've opened a new pull request, #1346, to work on those changes. Once the pull request is ready, I'll request review from you. |
59d4150 to
91a52cb
Compare
kellijohnson-NOAA
left a comment
There was a problem hiding this comment.
@awilnoaa @nathanvaughan-NOAA @Andrea-Havron-NOAA thank you for these changes. I am going to move this PR to draft status and update the issue to better reflect what we need. In reviewing this PR I realized that we have a few fundamental changes that we need to make to the codebase beyond just log_M. As we make everything time-varying and potentially age-varying (i.e., random walk at age for selectivity) we need to be consistent in how we do this. I think get_force_scalar() needs some work and we need to be consistent in how all values and their dimensions are reported in the json.
A fundamental question that I have is, do we always want to report things at their full scale, i.e., report every parameter for every time step, but just have a new label for estimation_type when it is essentially mirrored rather than freely estimated. For example, if we have a single M, what would be the estimation type if we report that parameter for every combination of age and year but it is only estimated once. Comparison between models where you estimate it on the annual time step versus a single value would be easier for users if the values are always reported the same. This was a design choice that we talked about a long time ago but it is not being implemented that way right now.
Second, what do we do for transformed parameters when we want their uncertainty on the untransformed scale. This is something @Andrea-Havron-NOAA is thinking a lot about right now.
Third, the dimensions of all inputs need to be checked, not just log_M, and in a systematic way.
So, I am going to make sure that the original issue reflects these needs. I am going to change this to a draft PR so we can pick it up again later, and I look forward to the future conversations that we will have trying to solve all of these problems.
or log_M and population M
- add dimension check in rcpp_population - change log_M to be scalar in default model - add tests to check log_M dimensions
Co-authored-by: Andrea-Havron-NOAA <85530309+Andrea-Havron-NOAA@users.noreply.github.com> Agent-Logs-Url: https://github.com/NOAA-FIMS/FIMS/sessions/a7489e2b-9488-42b1-ba55-ff062157642f
91a52cb to
133a9e0
Compare
What is the feature?
log_Min catch-at-age model output.How have you implemented the solution?
get_force_scalar()when consuming population natural mortality values.log_Mand related mortality output dimensions to reflect the actuallog_Minput shape instead of always reporting full age-year dimensions.Does the PR impact any other area of the project, maybe another repo?
log_Mhandling and its JSON/model output.Addresses Issue #1084
Instructions for code reviewer
👋Hello reviewer👋, thank you for taking the time to review this PR!
nit:(for nitpicking) as the comment type. For example,nit:I prefer using adata.frame()instead of amatrixbecause ...This PR is now ready to be merged.Checklist