BenchmarkScore implementation by bblodfon · Pull Request #36 · mlr-org/mlr3benchmark

bblodfon · 2023-08-26T15:04:18Z

No description provided.

- 'task_id' and 'learner_id' have to be factors

bblodfon · 2023-08-26T15:05:26Z

@sebffischer still needs some discussion about documentation, etc. before reviewing

* remove independent parameter
* better doc
* no check for ntasks as this class is the container only

bblodfon · 2023-08-30T11:47:39Z

@berndbischl @sebffischer please review - it's just the new container class. I will do a new PR with the refactoring of BenchmarkAggr and the separate tests.

bblodfon · 2023-08-30T13:38:33Z

So I removed the independent parameter, since independence will be decided by the test you use, not by the container/class of the benchmarking results.

sebffischer

Thanks a lot already, looking good! :) Just let me know when you disagree on something

sebffischer · 2023-09-06T11:00:48Z

 #' for example the result of [mlr3::BenchmarkResult]`$aggregate` or via [as_benchmark_aggr],
 #' or by passing in a custom dataset of results. Custom datasets must include at the very least,
-#' a character column for learner ids, a character column for task ids, and numeric columns for
+#' a factor column for learner ids, a factor column for task ids, and numeric columns for


Was this incorrectly documented, or why was it changed?

sebffischer · 2023-09-18T14:17:20Z

+    #' @param task_id (`factor(1)`) \cr
    #' String specifying name of task id column.
-    #' @param learner_id (`character(1)`)\cr
+    #' @param learner_id (`factor(1)`)\cr


Why must this be a factor? I don't want to write `learner_id = factor("regr.lm")`, I want to write `learner_id = "regr.lm"`.
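
One way to reconcile the two positions, sketched in base R: keep the arguments as plain strings that name the columns, and coerce the columns themselves to factors. The helper `coerce_id_cols()` and the example data are hypothetical illustrations, not part of mlr3benchmark.

```r
# Hypothetical helper (not in mlr3benchmark): `task_id` and `learner_id` are
# column *names* given as strings; the columns they refer to are converted to
# factors if they are not factors already.
coerce_id_cols = function(df, task_id = "task_id", learner_id = "learner_id") {
  stopifnot(
    is.character(task_id), length(task_id) == 1L,
    is.character(learner_id), length(learner_id) == 1L
  )
  for (col in c(task_id, learner_id)) {
    if (!is.factor(df[[col]])) df[[col]] = factor(df[[col]])
  }
  df
}

# Made-up results table with character id columns
df = data.frame(
  tasks = c("A", "A", "B", "B"),
  learners = c("regr.lm", "regr.rpart", "regr.lm", "regr.rpart"),
  rmse = c(1.2, 1.5, 0.9, 1.1),
  stringsAsFactors = FALSE
)
out = coerce_id_cols(df, task_id = "tasks", learner_id = "learners")
```

With this shape the caller never has to build a factor by hand, while the stored id columns are still factors.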

sebffischer · 2023-09-18T14:20:02Z


    #' @description Subsets the data by given tasks or learners.
    #' Returns data as [data.table::data.table].
-    #' @param task (`character()`) \cr


I might even tend to remove this subset method, as it behaves more like a class method than an instance method (from an instance method you would expect it to modify the instance in-place, I guess; at least that would be in line with the rest of mlr3). What do you think?

sebffischer · 2023-09-18T14:22:15Z

+#' # equivalently
+#' as_benchmark_score(df, task_id = "tasks", learner_id = "learners", iteration = "iters")
+#'
+#' if (requireNamespaces(c("mlr3", "rpart"))) {


There is `require_namespaces()` in mlr3misc. I think I would use it and remove the `requireNamespaces` helper function from mlr3benchmark: they do the same thing, so there is no need to keep both.
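
For context, a vectorised namespace check of this kind can be sketched in base R as below. The name `has_namespaces()` is hypothetical; it is neither the mlr3benchmark helper nor the mlr3misc function, only an illustration of what such a guard around an example typically does.

```r
# Hypothetical stand-in: TRUE only if every package in `pkgs` can be loaded.
# requireNamespace() is base R; quietly = TRUE suppresses the failure message.
has_namespaces = function(pkgs) {
  all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))
}

# Guard optional example code the same way the roxygen example does
if (has_namespaces(c("stats", "utils"))) {
  message("all required packages are available")
}
```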

sebffischer · 2023-09-18T14:23:10Z

+#'
+#' @details This class is used as a container of benchmarking results where
+#' multiple learners (models) have been tested against multiple tasks (datasets)
+#' using a resampling scheme. The results stored are the per-resampling


Here it sounds as if the resampling scheme is the same for all task-learner combinations. Is this the case? Otherwise, please rephrase.

sebffischer · 2023-09-18T14:43:23Z

+    #' @field iterations `(numeric())` \cr Unique resampling iterations.
+    iterations = function() unique(private$.dt[[self$col_roles$iteration]]),
+    #' @field measures `(character())` \cr Unique measure names.
+    measures = function() setdiff(colnames(private$.dt), unlist(self$col_roles)),


We don't really support adding measures, right? So we can also compute this once and return it here.
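
A minimal sketch of that suggestion, assuming the R6 package is installed: the measure names are derived once in `initialize()` and the active binding only returns the stored value. The class `ScoreContainer` and its fields are hypothetical and use a plain `data.frame` to stay small; this is not the BenchmarkScore implementation from the PR.

```r
library(R6)

# Hypothetical container: measures are computed once at construction time
# because columns cannot be added afterwards.
ScoreContainer = R6Class("ScoreContainer",
  public = list(
    initialize = function(dt, col_roles) {
      private$.dt = dt
      private$.col_roles = col_roles
      # Every column that has no role is treated as a measure
      private$.measures = setdiff(colnames(dt), unlist(col_roles))
    }
  ),
  active = list(
    # Read-only active binding returning the cached value
    measures = function(rhs) {
      if (!missing(rhs)) stop("`measures` is read-only")
      private$.measures
    }
  ),
  # Private fields start out as NULL and are filled in initialize()
  private = list(.dt = NULL, .col_roles = NULL, .measures = NULL)
)

scores = data.frame(
  tasks = c("A", "A"), learners = c("l1", "l2"), iters = c(1, 1),
  rmse = c(1.2, 1.5), mae = c(0.8, 1.0)
)
obj = ScoreContainer$new(
  scores,
  col_roles = list(task_id = "tasks", learner_id = "learners", iteration = "iters")
)
```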

sebffischer · 2023-09-18T14:44:08Z

+
+  private = list(
+    .col_roles = character(0),
+    .dt = data.table()


Usually we initialize everything to NULL.

sebffischer · 2023-09-18T14:44:24Z

+  )
+)
+
+#' @title Coercions to BenchmarkScore


Suggested change

-#' @title Coercions to BenchmarkScore
+#' @title Conversion to BenchmarkScore

sebffischer · 2023-09-18T14:44:33Z

+
+#' @title Coercions to BenchmarkScore
+#'
+#' @description Coercion methods to [BenchmarkScore].


Suggested change

-#' @description Coercion methods to [BenchmarkScore].
+#' @description Conversion methods to [BenchmarkScore].

sebffischer · 2023-09-18T14:49:48Z

    loss$lower = loss[, meas] - se * stats::qnorm(1 - (1 - level) / 2)
    loss$upper = loss[, meas] + se * stats::qnorm(1 - (1 - level) / 2)
-    ggplot(data = loss, aes_string(x = object$col_roles$learner_id, y = meas)) +
+    ggplot(data = loss, aes(x = .data[[object$col_roles$learner_id]], y = .data[[meas]])) +


You need to set `.data` to NULL, otherwise some tools will complain that it does not exist.
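
A sketch of that pattern, assuming ggplot2 is installed. The function `plot_scores()` and the data are hypothetical. The local `.data = NULL` only gives static checkers a visible binding; when the aesthetics are evaluated, the data mask supplies its own `.data` pronoun, which takes precedence over the local variable.

```r
library(ggplot2)

# Hypothetical plotting helper: column names arrive as strings, as in the diff
plot_scores = function(df, x_col, y_col) {
  .data = NULL  # placate static checks; not used when the plot is built
  ggplot(df, aes(x = .data[[x_col]], y = .data[[y_col]])) +
    geom_point()
}

scores = data.frame(learners = c("l1", "l2", "l3"), rmse = c(1.2, 1.5, 0.9))
p = plot_scores(scores, x_col = "learners", y_col = "rmse")
```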

bblodfon added 10 commits August 24, 2023 10:31

disable logging during tests that use benchmark()

91ed2af

post-hoc Friedman-Nemenyi plots only make sense with 3 or more learners

4b06a08

fix warnings (ggplot2 updates)

720f85e

add Benavoli 2017 paper (for Bayesian model analysis)

7b0963f

correct doc

715f802

- 'task_id' and 'learner_id' have to be factors

fix bug in subset + add more tests

12b163a

better variable names in examples

8a74932

correct test (duplicated)

294b736

add BenchmarkScore class + tests

048f043

add lgr to Suggests

c583b8c

bblodfon mentioned this pull request Aug 26, 2023

Add three class types for benchmark results #2

Open

refactoring

a1d7eef

* remove independent parameter
* better doc
* no check for ntasks as this class is the container only

remove 'independent' parameter

6a2b1b9

bblodfon added 2 commits August 30, 2023 15:41

add check for full benchmark design

6afe2ab

better doc

b623515

sebffischer requested changes Sep 18, 2023

jemus42 mentioned this pull request Jan 19, 2024

fix: use %in% in BenchmarkAggr$subset() rather than == #38

Closed
