Presently, NVBench uses the mean and standard deviation to report the central tendency of, and the noise in, timing measurements.
These statistics are sensitive to outliers, which tend to be caused by changes outside of the benchmarked kernels. Such outliers are the major source of variability in benchmark results. See #316 (comment).
This issue is to replace uses of conventional location and dispersion statistics with their robust analogs.
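To illustrate the motivation, here is a minimal sketch (illustrative values, not actual NVBench output) of how a single outlier distorts the mean and standard deviation while leaving the median and interquartile range nearly untouched, using only the Python standard library:

```python
import statistics

# Illustrative timing samples (seconds): five stable kernel timings plus
# one outlier caused by activity outside the benchmarked kernel.
times = [1.00e-3, 1.01e-3, 0.99e-3, 1.00e-3, 1.02e-3, 5.00e-3]

mean = statistics.mean(times)     # pulled toward the outlier
stdev = statistics.stdev(times)   # inflated by the outlier
median = statistics.median(times) # barely affected
q1, _, q3 = statistics.quantiles(times, n=4, method="inclusive")
iqr = q3 - q1                     # barely affected

print(f"mean={mean:.3e}  stdev={stdev:.3e}")    # ~1.670e-03, ~1.631e-03
print(f"median={median:.3e}  iqr={iqr:.3e}")    # ~1.005e-03, ~1.750e-05
```

The relative stdev here is roughly 98%, while the relative IQR is under 2%, even though five of the six samples are within ±1.5% of each other.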
- The JSON file would report the following values for isolated GPU time measurements:
  - first quartile $Q_1$, with tag `"nv/cold/time/gpu/q1"`
  - median $Q_2$, with tag `"nv/cold/time/gpu/median"`
  - third quartile $Q_3$, with tag `"nv/cold/time/gpu/q3"`
  - interquartile range $Q_3 - Q_1$, with tag `"nv/cold/time/gpu/ir/absolute"`
  - relative interquartile range $(Q_3 - Q_1)/Q_2$, with tag `"nv/cold/time/gpu/ir/relative"`
This PR also proposes to make `"nv/cold/time/gpu/mean"` and `"nv/cold/time/gpu/stdev/relative"` hidden and replace them with `"nv/cold/time/gpu/median"` and `"nv/cold/time/gpu/ir/relative"`, respectively.
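The five proposed summaries can be sketched as follows (a hypothetical standalone helper, not the NVBench implementation; only the tag names come from this proposal):

```python
import statistics

def robust_summaries(samples):
    """Map each proposed tag to its robust statistic for a list of
    timing samples (hypothetical helper, not NVBench API)."""
    q1, q2, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return {
        "nv/cold/time/gpu/q1": q1,
        "nv/cold/time/gpu/median": q2,
        "nv/cold/time/gpu/q3": q3,
        "nv/cold/time/gpu/ir/absolute": q3 - q1,
        "nv/cold/time/gpu/ir/relative": (q3 - q1) / q2,
    }

# Example with illustrative timings (seconds):
summary = robust_summaries([1.00e-3, 1.02e-3, 0.98e-3, 1.01e-3, 0.99e-3])
```

The `method="inclusive"` choice treats the samples as the full population of measurements; NVBench is free to pick a different quantile convention.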
**Case study details**
```shell
./build2/bin/nvbench.example.cpp20.axes -b copy_sweep_grid_shape -a "BlockSize[pow2]=[8,8,8,8,8]" -a NumBlocks=64 --no-batch --warmup-runs 800 --stopping-criterion entropy --jsonbin percentiles3.json
```

```shell
$ jq '. | .benchmarks[] | .states[] | .summaries[] | select(.tag == "nv/cold/time/gpu/ir/relative") | .data[] | .value' percentiles3.json
"3.686392307281506e-05"
"7.168054580688598e-06"
"8.192062377929826e-06"
"9.21595096588148e-06"
"8.192062377929826e-06"
$ jq '. | .benchmarks[] | .states[] | .summaries[] | select(.tag == "nv/cold/time/gpu/stdev/relative") | .data[] | .value' percentiles3.json
"0.01159815695469047"
"0.004487982358438557"
"0.0051512341616596485"
"0.007530377673595805"
"0.0050527872175617035"
```
It makes sense to make this change uniformly for all reported timings, e.g. batched GPU times, cold GPU times, CPU-only times, etc.