Merge dev-tidy into main#1509
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1509 +/- ##
==========================================
- Coverage 87.53% 87.52% -0.01%
==========================================
Files 97 99 +2
Lines 8749 8937 +188
Branches 516 523 +7
==========================================
+ Hits 7658 7822 +164
- Misses 1053 1078 +25
+ Partials 38 37 -1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Implements the three standard broom generics for FIMSFit objects: - tidy.FIMSFit: one row per parameter with broom column conventions (term, estimate, std.error, statistic, p.value); supports conf.int and filtering by estimation_type - glance.FIMSFit: one-row model summary including logLik, AIC, BIC, max_gradient, converged, and terminal_ssb - augment.FIMSFit: observed vs. expected pairs renamed to yardstick conventions (.truth, .pred, .weight) for direct use with yardstick metric functions Also adds get_fit_metrics() and get_fit_stream() as convenience wrappers for computing grouped yardstick metrics from a FIMSFit. Fixes a segfault in get_estimates(): obj$gr() was being called twice — once in FIMSFit() and again in reshape_tmb_estimates(). The gradient vector is now computed once at construction time, stored in a new FIMSFit@gradient slot, and passed through to avoid the second C++ call after clear() has freed memory.
change tidy functions to use .data$
50be962 to
b773901
Compare
|
@ericward-noaa at first glance all of this is GREAT! Thank you. I am wondering if we should rename "observed" and "predicted" to ".true" and ".estimated" so we do not have to rename anything? |
|
That's a good suggestion @kellijohnson-NOAA , I'd defer to you and others. 'observed' and 'expected' or 'observed' and 'predicted' are a little more intuitive for interpretation -- and ".true" and ".estimated" are what yardstick needs. There was another comment on a recent issue about periods confusing people, so I think that is the tradeoff |
|
I am the person that commented about confusion with the ".xxx" notation. I think my main concern with that is putting yet another different type of notation in front of people. Additionally, (and this may just be me thing), I associate the period notation with python, though I know that ggplot uses it for some of their functions (which I also don't love). However, this is BY NO MEANS a 🗻 that I will ⚰️ on. It's not even a hill that I would climb up 🤣. |
kellijohnson-NOAA
left a comment
There was a problem hiding this comment.
I have a few changes and comments but they are pretty minor. I think this PR is a great step forward. There are some places where we could make the FIMS codebase better in the future so less has to be done inside these functions but I think we should merge this in and then let early adopters play around with things before we make any additional changes to fit_fims() or other functions.
| if (!requireNamespace("yardstick", quietly = TRUE)) { | ||
| cli::cli_abort( | ||
| "Package {.pkg yardstick} is required. Install it with | ||
| {.code install.packages('yardstick')}." | ||
| ) | ||
| } |
There was a problem hiding this comment.
{yardstick} is in Imports section of the the DESCRIPTION file so this seems unnecessary.
| dplyr::select( | ||
| dplyr::all_of(meta_cols), | ||
| ".truth" = "observed", | ||
| ".pred" = "expected", | ||
| dplyr::any_of("uncertainty") | ||
| ) |> | ||
| dplyr::mutate( | ||
| .truth = as.numeric(.data$.truth), | ||
| .pred = as.numeric(.data$.pred) | ||
| ) |
There was a problem hiding this comment.
I think it would be cleaner to mutate first, then select.
| dplyr::select( | |
| dplyr::all_of(meta_cols), | |
| ".truth" = "observed", | |
| ".pred" = "expected", | |
| dplyr::any_of("uncertainty") | |
| ) |> | |
| dplyr::mutate( | |
| .truth = as.numeric(.data$.truth), | |
| .pred = as.numeric(.data$.pred) | |
| ) | |
| dplyr::mutate( | |
| .truth = as.numeric(.data$observed), | |
| .pred = as.numeric(.data$expected) | |
| ) |> | |
| dplyr::select( | |
| dplyr::all_of(c(meta_cols, ".truth", "expected")), | |
| dplyr::any_of("uncertainty") | |
| ) |
| dplyr::select(-"uncertainty") | ||
| } else if ("uncertainty" %in% names(out)) { | ||
| out <- dplyr::select(out, -"uncertainty") | ||
| } |
There was a problem hiding this comment.
Can we remove the removal of uncertainty within the if statement and remove the else statement entirely and then just use `dplyr::select(-dplyr::any_of("uncertainty")) with the return statement?
| optimised | ||
| optimized |
There was a problem hiding this comment.
| optimised | |
| optimized |
Neither of these are needed with the cspell routine.
| rlang | ||
| rlib | ||
| rmarkdown | ||
| rmse |
There was a problem hiding this comment.
| rmse |
cspell is case agnostic for stuff like this.
| #' @param x A `FIMSFit` object **or** an already-augmented tibble from | ||
| #' `augment.FIMSFit()`. | ||
| #' @param stream_label Character scalar. The value of the `label` column to | ||
| #' retain, e.g. `"landings_expected"`, `"index_expected"`, |
There was a problem hiding this comment.
| #' retain, e.g. `"landings_expected"`, `"index_expected"`, | |
| #' retain, e.g., `"landings_expected"`, `"index_expected"`, |
| #' `"agecomp_expected"`, or `"lengthcomp_expected"`. If `NULL` (default), | ||
| #' no filtering on label is done. | ||
| #' @param module_id Integer scalar. The `module_id` of the fleet or survey to | ||
| #' retain (e.g. `1` for the first fishing fleet, `2` for the first survey in |
There was a problem hiding this comment.
| #' retain (e.g. `1` for the first fishing fleet, `2` for the first survey in | |
| #' retain (e.g., `1` for the first fishing fleet, `2` for the first survey in |
| # calling obj$gr() again — which segfaults after clear() frees C++ memory. | ||
| # as.numeric() is required because TMB's obj$gr() returns a matrix (1×n) | ||
| # in some call patterns, which would fail the "numeric" slot type check. | ||
| gradient_vec <- if (length(opt) > 0) { |
There was a problem hiding this comment.
| gradient_vec <- if (length(opt) > 0) { | |
| gradient_vector <- if (length(opt) > 0) { |
| NA_real_ | ||
| rep(NA_real_, length(obj[["par"]])) | ||
| } | ||
| max_gradient <- if (length(opt) > 0) max(abs(gradient_vec)) else NA_real_ |
There was a problem hiding this comment.
| max_gradient <- if (length(opt) > 0) max(abs(gradient_vec)) else NA_real_ | |
| max_gradient <- if (length(opt) > 0) max(abs(gradient_vector)) else NA_real_ |
| obj = obj, | ||
| opt = opt, | ||
| max_gradient = max_gradient, | ||
| gradient = gradient_vec, |
There was a problem hiding this comment.
| gradient = gradient_vec, | |
| gradient = gradient_vector, |
What is the feature?
#1452
Add broom/yardstick compatibility via tidy(), glance(), and augment()
This PR includes
(term, estimate, std.error, statistic, p.value); supports conf.int
and filtering by estimation_type
max_gradient, converged, and terminal_ssb
conventions (.truth, .pred, .weight) for direct use with
yardstick metric functions
Also adds get_fit_metrics() and get_fit_stream() as convenience
wrappers for computing grouped yardstick metrics from a FIMSFit.
In get_estimates(): obj$gr() was being
called twice — once in FIMSFit() and again in reshape_tmb_estimates().
The gradient vector is now computed once, stored
in a new FIMSFit@gradient slot, and passed through to avoid the second
C++ call after clear() has freed memory.
Instructions for code reviewer
👋Hello reviewer👋, thank you for taking the time to review this PR!
nit:(for nitpicking) as the comment type. For example,nit:I prefer using adata.frame()instead of amatrixbecause ...This PR is now ready to be merged.Checklist