Calculate metrics across bootstrapped folds — bootstrap

Calculate metrics across bootstrapped folds of the data, optionally within groups defined by one or more grouping variables. Each bootstrap sample has the same number of rows as the data (or as the group, when grouping variables are supplied), and rows are sampled with replacement.
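
The resampling described above can be illustrated with a minimal base R sketch. It is not the package's implementation: bootstrap_sketch(), its arguments, and the toy data are hypothetical names introduced here, and it handles a single metric only.

```r
# Hypothetical helper for illustration only; not part of the package.
bootstrap_sketch <- function(data, metric, by = NULL, n_boots = 1000, seed = 10) {
  set.seed(seed)  # fixed seed so the resamples are reproducible
  # Split into groups; with no grouping, treat the whole data as one group.
  groups <- if (is.null(by)) list(all = data) else split(data, data[by], drop = TRUE)
  out <- lapply(names(groups), function(g) {
    d <- groups[[g]]
    values <- vapply(seq_len(n_boots), function(i) {
      # Draw as many rows as the group has, with replacement.
      idx <- sample.int(nrow(d), size = nrow(d), replace = TRUE)
      metric(d[idx, , drop = FALSE])
    }, numeric(1))
    data.frame(group = g, boot = seq_len(n_boots), value = values)
  })
  do.call(rbind, out)
}

# Toy data: two groups with clearly different means.
set.seed(1)
toy <- data.frame(
  g = rep(c("a", "b"), each = 50),
  x = c(rnorm(50, 0, 1), rnorm(50, 5, 1))
)
res <- bootstrap_sketch(toy, function(d) mean(d$x), by = "g", n_boots = 20)
```

Each group is resampled independently, so res holds 20 bootstrap values of the mean for group "a" and 20 for group "b".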

Usage

bootstrap_metrics(.data, ..., .by = NULL, .seed = 10, .n_boots = 1000)

summarise_bootstrap_metrics(.data, .by = NULL, .conf_level = 0.95)

summarize_bootstrap_metrics(.data, .by = NULL, .conf_level = 0.95)

Arguments

.data: A data frame or data frame extension (e.g. a tibble).
...: <data-masking> Name-value pairs of summary functions to compute across bootstrapped folds of the data. The name will be the name of the variable in the result.
.by: <tidy-select> Optionally, variables to group by. Each group is resampled independently.
.seed: Single value for the random seed, used for reproducible random sampling.
.n_boots: Number of bootstrapped samples to create (per group).
.conf_level: The confidence level to use for the confidence interval. Must be strictly between 0 and 1. Defaults to a 95 percent confidence interval.

Value

For bootstrap_metrics(): A tibble::tibble() whose columns are a combination of the summary expressions and grouping keys that you provide, plus a boot column indicating each bootstrap fold.

For summarise_bootstrap_metrics(): A tibble::tibble() with, for each bootstrapped metric, three columns suffixed _mid, _lower, and _upper, giving the mean and the lower and upper quantiles implied by .conf_level, respectively.
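
As a rough illustration of the three summary columns, the sketch below computes a mean and an equal-tailed percentile interval from a vector of bootstrap values. summarise_sketch() is a hypothetical helper, and the equal-tailed interval with R's default quantile type is an assumption; the package may use a different convention.

```r
# Hypothetical helper for illustration only; not part of the package.
summarise_sketch <- function(values, conf_level = 0.95) {
  # The confidence level must be strictly between 0 and 1.
  stopifnot(conf_level > 0, conf_level < 1)
  alpha <- 1 - conf_level
  c(
    mid   = mean(values),
    # Assumed equal-tailed interval: alpha / 2 in each tail.
    lower = unname(quantile(values, alpha / 2)),
    upper = unname(quantile(values, 1 - alpha / 2))
  )
}

s <- summarise_sketch(c(1, 2, 3, 4, 5), conf_level = 0.5)
# s has mid = 3, lower = 2, upper = 4
```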

Useful metrics

(Normalized) Root mean squared error: rmse(), nrmse()
Mean (absolute) percent error: mpe(), mape()
Accuracy: accuracy()
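
For orientation, the error metrics listed above are sketched below using common textbook definitions. These are assumptions rather than the package's code: the *_sketch names are hypothetical, the (observed, predicted) argument order is inferred from the example on this page, normalizing by the mean of the observations is only one of several conventions for NRMSE, and the package's functions may scale or sign the percent errors differently.

```r
# Hypothetical definitions for illustration only; the package's own
# rmse(), nrmse(), mpe() and mape() may differ in detail.
rmse_sketch  <- function(obs, pred) sqrt(mean((pred - obs)^2))
# Assumed normalization: divide by the mean of the observed values.
nrmse_sketch <- function(obs, pred) rmse_sketch(obs, pred) / mean(obs)
# Signed and absolute relative errors, returned as fractions (not x 100).
mpe_sketch   <- function(obs, pred) mean((pred - obs) / obs)
mape_sketch  <- function(obs, pred) mean(abs((pred - obs) / obs))

obs  <- c(10, 20, 40)
pred <- c(12, 18, 40)
rmse_sketch(obs, pred)   # sqrt(8 / 3), about 1.63
mape_sketch(obs, pred)   # 0.1
```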

Examples

# Simulated data: 1000 observations for each combination of two models
# and two prediction types.
set.seed(99)
df <- data.frame(
  observationid = rep(1:1000, 4),
  model = rep(c(rep("A", 1000), rep("B", 1000)), 2),
  patient_type = "general",
  prediction_type = c(rep("a priori", 2000), rep("a posteriori", 2000)),
  res = c(
    rnorm(1000, 2, 3),
    rnorm(1000, 0.1, 1),
    rnorm(1000, 1, 3),
    rnorm(1000, 0, 0.5)
  ),
  tdm = rnorm(4000, 10, 5)
)

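# Compute each metric on 100 bootstrap resamples within every
# model / patient type / prediction type group.
boots <- bootstrap_metrics(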
  df,
  rmse = rmse(tdm, tdm - res),
  nrmse = nrmse(tdm, tdm - res),
  accuracy = accuracy(tdm, tdm - res, 2.5, 0.2),
  .by = c(model, patient_type, prediction_type),
  .n_boots = 100
)

# Summarise the bootstrapped metrics per group, using the default
# .conf_level of 0.95.
summarise_bootstrap_metrics(
  boots,
  .by = c(model, patient_type, prediction_type)
)
#> # A tibble: 4 × 12
#>   model patient_type prediction_type rmse_mid rmse_lower rmse_upper nrmse_mid
#>   <chr> <chr>        <chr>              <dbl>      <dbl>      <dbl>     <dbl>
#> 1 A     general      a posteriori       3.21       3.07       3.37     0.326 
#> 2 A     general      a priori           3.61       3.46       3.76     0.362 
#> 3 B     general      a posteriori       0.490      0.469      0.506    0.0487
#> 4 B     general      a priori           0.986      0.945      1.04     0.0972
#> # ℹ 5 more variables: nrmse_lower <dbl>, nrmse_upper <dbl>, accuracy_mid <dbl>,
#> #   accuracy_lower <dbl>, accuracy_upper <dbl>