Use robust statistic in NVBench summary #342

@oleksandr-pavlyk

Currently, NVBench reports the mean and standard deviation to summarize the central value of, and the noise in, the timing measurements.

These statistics are sensitive to outliers, which tend to be caused by events outside the benchmarked kernels. Such outliers are the major source of variability in benchmark results. See #316 (comment)

This issue proposes replacing the conventional location and dispersion statistics with their robust analogs.
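
To illustrate the sensitivity described above, here is a minimal Python sketch using made-up timing samples (not NVBench output). One slow sample, standing in for interference from outside the kernel, shifts the mean and inflates the standard deviation, while the median and the interquartile range stay the same. The helper name `summarize` and the quantile method are assumptions chosen for illustration.

```python
import statistics

def summarize(samples):
    """Return (mean, stdev, median, iqr) for a list of timings."""
    # Quartiles via linear interpolation; the method choice is an assumption.
    q1, q2, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return (statistics.mean(samples), statistics.stdev(samples), q2, q3 - q1)

# Hypothetical timings in microseconds.
clean = [10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8]
# Same samples, but the slowest one is replaced by a large outlier.
noisy = clean[:-1] + [50.0]

clean_mean, clean_stdev, clean_median, clean_iqr = summarize(clean)
noisy_mean, noisy_stdev, noisy_median, noisy_iqr = summarize(noisy)

print("mean:  ", clean_mean, "->", noisy_mean)
print("stdev: ", clean_stdev, "->", noisy_stdev)
print("median:", clean_median, "->", noisy_median)
print("iqr:   ", clean_iqr, "->", noisy_iqr)
```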

  • The JSON file would report the following values for isolated (cold) GPU time measurements:
    • first quartile $Q_1$, with tag "nv/cold/time/gpu/q1"
    • median $Q_2$, with tag "nv/cold/time/gpu/median"
    • third quartile $Q_3$, with tag "nv/cold/time/gpu/q3"
    • interquartile range $Q_3 - Q_1$, with tag "nv/cold/time/gpu/ir/absolute"
    • relative interquartile range, $(Q_3 - Q_1)/Q_2$, with tag "nv/cold/time/gpu/ir/relative"
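
The quantities in the list above can be sketched in Python as follows. This is an illustration rather than NVBench's implementation: the function name `robust_summary` is hypothetical, and the quantile interpolation (`method="inclusive"` in `statistics.quantiles`) is an assumption, since the issue does not specify one.

```python
import statistics

def robust_summary(samples, prefix="nv/cold/time/gpu"):
    """Map the proposed summary tags to values computed from `samples`."""
    # Interpolation method is an assumption; NVBench may choose differently.
    q1, median, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        f"{prefix}/q1": q1,
        f"{prefix}/median": median,
        f"{prefix}/q3": q3,
        f"{prefix}/ir/absolute": iqr,
        f"{prefix}/ir/relative": iqr / median,
    }

# Hypothetical samples, chosen so the quartiles are easy to check by hand.
summary = robust_summary([1.0, 2.0, 3.0, 4.0, 5.0])
for tag, value in summary.items():
    print(tag, value)
```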

This issue also proposes hiding "nv/cold/time/gpu/mean" and "nv/cold/time/gpu/stdev/relative" and showing "nv/cold/time/gpu/median" and "nv/cold/time/gpu/ir/relative", respectively, in their place.

Case study details
./build2/bin/nvbench.example.cpp20.axes -b copy_sweep_grid_shape -a "BlockSize[pow2]=[8,8,8,8,8]" -a NumBlocks=64 --no-batch --warmup-runs 800 --stopping-criterion entropy --jsonbin percentiles3.json
(py314) opavlyk@NV-22T4X34:~/repos/nvbench$ jq '. | .benchmarks[] | .states[] | .summaries[] | select(.tag == "nv/cold/time/gpu/ir/relative") | .data[] | .value' percentiles3.json
"3.686392307281506e-05"
"7.168054580688598e-06"
"8.192062377929826e-06"
"9.21595096588148e-06"
"8.192062377929826e-06"
(py314) opavlyk@NV-22T4X34:~/repos/nvbench$ jq '. | .benchmarks[] | .states[] | .summaries[] | select(.tag == "nv/cold/time/gpu/stdev/relative") | .data[] | .value' percentiles3.json
"0.01159815695469047"
"0.004487982358438557"
"0.0051512341616596485"
"0.007530377673595805"
"0.0050527872175617035"

It makes sense to apply this change uniformly to all reported timings: batched GPU times, CPU times, cold GPU times, CPU-only times, and so on.
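
The jq queries in the case study can also be expressed in Python. The sketch below is a rough equivalent that assumes only the JSON layout implied by the jq filter (`benchmarks[].states[].summaries[]`, each with a `tag` and a `data[]` list of entries carrying a `value`). The helper name `summary_values` and the small inline document are hypothetical. The document is hand-written to mirror the fields the filter touches and is not real NVBench output.

```python
import json

def summary_values(doc, tag):
    """Collect every data[].value from summaries whose tag matches `tag`."""
    return [
        entry["value"]
        for bench in doc["benchmarks"]
        for state in bench["states"]
        for summ in state["summaries"]
        if summ.get("tag") == tag
        for entry in summ["data"]
    ]

# Hand-written stand-in for a results file; values are made up.
doc = json.loads("""
{
  "benchmarks": [
    {"states": [
      {"summaries": [
        {"tag": "nv/cold/time/gpu/ir/relative", "data": [{"value": "1.0e-05"}]},
        {"tag": "nv/cold/time/gpu/stdev/relative", "data": [{"value": "0.01"}]}
      ]},
      {"summaries": [
        {"tag": "nv/cold/time/gpu/ir/relative", "data": [{"value": "2.0e-05"}]}
      ]}
    ]}
  ]
}
""")

values = summary_values(doc, "nv/cold/time/gpu/ir/relative")
print(values)
```

Because the tag is a parameter, the same helper would apply to the other timing summaries if they come to report the robust statistics as well.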
