Conversation
…overlapping key values, enabling mismatches to surface through the full join comparison.
R/ard_compare.R
Outdated
| #' | ||
| #' ard_compare(ard_base, ard_modified)$stat | ||
| #' | ||
| ard_compare <- function(x, y, key_columns = NULL) { |
There was a problem hiding this comment.
All of our functions that begin with ard_*() create new ARDs. Let's name the function compare_ard().
There was a problem hiding this comment.
Renamed, and the ard_compare file is also moved to compare_ard() (as well as the test file)
R/ard_compare.R
Outdated
| #' | ||
| #' ard_compare(ard_base, ard_modified)$stat | ||
| #' | ||
| ard_compare <- function(x, y, key_columns = NULL) { |
There was a problem hiding this comment.
Let's make the default value keys = c(all_ard_groups(), all_ard_variables(), any_of(c("variable", "variable_level", "stat_name"))).
There was a problem hiding this comment.
This is done as a default value in the argument
R/ard_compare.R
Outdated
| #' | ||
| #' ard_compare(ard_base, ard_modified)$stat | ||
| #' | ||
| ard_compare <- function(x, y, key_columns = NULL) { |
There was a problem hiding this comment.
Let's add an argument of the columns to compare, compare = any_of(c("stat_label", "stat", "stat_fmt")).
(Is there anything else we should compare by default?)
There was a problem hiding this comment.
these 3 are added, think it's a good default
R/ard_compare.R
Outdated
| check_class(x, cls = "card") | ||
| check_class(y, cls = "card") | ||
|
|
||
| .validate_environment_metadata(x, y, call = get_cli_abort_call()) |
There was a problem hiding this comment.
Let's remove the env checking for now. It's quite complicated.
There was a problem hiding this comment.
All env checking is removed
R/ard_compare.R
Outdated
|
|
||
| .validate_environment_metadata(x, y, call = get_cli_abort_call()) | ||
|
|
||
| primary_x <- |
There was a problem hiding this comment.
Here we can evaluate the keys and compare columns with
keys <- .process_keys_arg(x, y, keys = {{ keys }})
compare <- .process_compare_arg(x, y, compare = {{ compare }})
# outside the function we define these functions
.process_keys_arg <- function(x, y, keys) {
keys <- intersect(cards_select({{ keys }}, data = x), cards_select({{ keys }}, data = y))
.check_not_empty(keys)
cli::cli_inform("The comparison {.arg keys} are {.emph {.val {keys}}}.")
keys
}
.process_compare_arg <- function(x, y, compare) {
# add checks and return evaluated compare vector...
}
.check_not_empty <- function(x, arg_name = rlang::caller_arg(x)) {
if (rlang::is_empty()) {
cli::cli_abort("The {.arg {arg_name}} argument cannot be empty.")
}
invisible(x)
}There was a problem hiding this comment.
These were added. Please just check the "add checks and return evaluated compare vector" part if that's fine.
R/ard_compare.R
Outdated
| fmt_column <- if ("fmt_fun" %in% names(x) || "fmt_fun" %in% names(y)) { | ||
| "fmt_fun" | ||
| } else if ("fmt_fn" %in% names(x) || "fmt_fn" %in% names(y)) { | ||
| "fmt_fn" | ||
| } else { | ||
| "fmt_fun" | ||
| } |
There was a problem hiding this comment.
Let's just use the columns provided in the compare argument to assess which comparisons to make. We can compare all columns in the same way.
R/ard_compare.R
Outdated
| y_selected <- .ensure_column(y_selected, column) | ||
| } | ||
|
|
||
| # .check_rows_not_in_x_y(x_selected, y_selected, key_columns) |
There was a problem hiding this comment.
Here we can initialize an empty list of results.
results <- rlang::rep_named(c("rows_in_x_not_y", "rows_in_y_not_x"), list(NULL))
results[["compare"]] <- rlang::rep_named(compare, list(NULL))In this example the "compare" element will also be a named list. The names are the columns that we compare.
We could then follow this up with calls to functions that will populate these parts of the list, e.g.
results[["rows_in_x_not_y"]] <- .compare_rows(x, y) # returns the results of the anti join of x and y on the key columns
results[["rows_in_y_not_x"]] <- .compare_rows(y, x) # same as above, but reversed
results[["compare"]] <- .compare_columns(x, y, compare) # loop through the columns we will compare and return a named list of data frames where each data frame contains the rows that are not equal between x and y. The data frame will have the key columns and the two columns compared (from x and y).
R/ard_compare.R
Outdated
|
|
||
| names(mismatch_list) <- names(comparison_targets) | ||
|
|
||
| mismatch_list |
There was a problem hiding this comment.
Lastly, the function will return the results object, and add a class onto this list.
After we get this settled, we will write a print method for class to make it nice.
There was a problem hiding this comment.
Added with structure(results, class = c("ard_comparison", class(results)))
move all helper functions to compare_are_helpers
| #' (based on key columns) | ||
| #' - `rows_in_y_not_x`: data frame of rows present in `y` but not in `x` | ||
| #' (based on key columns) | ||
| #' - `compare`: a named list where each element is a data frame containing |
There was a problem hiding this comment.
Do you mean rename the "compare" to "diff"? That's fine with me yes.
| #' | ||
| #' compare_ard(ard_base, ard_modified)$compare$stat | ||
| #' | ||
| compare_ard <- function(x, |
There was a problem hiding this comment.
I really like the construction and concept of the main function. The helper functions are a bit of a Dedalus but I will take a closer look during the week ;)
There was a problem hiding this comment.
Sure! And I learnt a new word here :)
| #' @examples | ||
| #' ard_base <- ard_summary(ADSL, variables = AGE) | ||
| #' ard_modified <- ard_summary(dplyr::mutate(ADSL, AGE = AGE + 1), variables = AGE) | ||
| #' | ||
| #' compare_ard(ard_base, ard_modified)$compare$stat |
There was a problem hiding this comment.
maybe I would make the more examples more complex (so to show groups etc)
There was a problem hiding this comment.
I'll add some more complex ones.
R/compare_ard_helpers.R
Outdated
| # perform inner join to compare only matching rows | ||
| comparison <- dplyr::inner_join( | ||
| x_selected, | ||
| y_selected, | ||
| by = keys, | ||
| suffix = c(".x", ".y") | ||
| ) |
There was a problem hiding this comment.
for some reason this does not drop the "card" class producing a contracted output (not showing some columns). Should we keep this behavior or should we drop the card class? maybe defining card_diff new class?
$stat
{cards} data frame: 6 x 4
variable stat_name
1 AGE mean
2 AGE median
3 AGE p25
4 AGE p75
5 AGE min
6 AGE max
ℹ 2 more variables: stat.x, stat.y
There was a problem hiding this comment.
Good point, I accepted the below suggestion
Melkiades
left a comment
There was a problem hiding this comment.
Amazing work!! I really like it, output is very intuitive and efficient in showing differences. I added a minor comment on the final compare output that is hiding columns and I would add more examples but for me the code is almost complete. I was wondering if we could need a +- diff like testthat snapshots as an optional diff output... Dont know, for now it is pretty easy to understand the output.
I still need to test this with more complex differences (e.g. I wonder if you solve the fmt before getting into the comparison or not - we should if not), but I wait to try the additional examples ;)
Co-authored-by: Davide Garolini <dgarolini@gmail.com> Signed-off-by: Malan <64360731+malanbos@users.noreply.github.com>
Co-authored-by: Davide Garolini <dgarolini@gmail.com> Signed-off-by: Malan <64360731+malanbos@users.noreply.github.com>
Co-authored-by: Davide Garolini <dgarolini@gmail.com> Signed-off-by: Malan <64360731+malanbos@users.noreply.github.com>
| matches <- mapply( | ||
| identical, |
There was a problem hiding this comment.
as discussed -> all.equals
What changes are proposed in this pull request?
New function to compare two ARDs: ard_compare
Fixes#437 , @malanbos
Comes with top-level ard_compare function, as well as a script with ard_compare_helpers.R, and check_environment.R to modularize it.
Pre-review Checklist (if item does not apply, mark is as complete)
usethis::pr_merge_main()devtools::test_coverage()Reviewer Checklist (if item does not apply, mark is as complete)
pkgdown::build_site(). Check the R console for errors, and review the rendered website.devtools::test_coverage()When the branch is ready to be merged:
NEWS.mdwith the changes from this pull request under the heading "# cards (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end update (seeNEWS.mdfor examples).Optional Reverse Dependency Checks:
Install
checkedwithpak::pak("Genentech/checked")orpak::pak("checked")