drossetti commented Mar 3, 2020
- print estimated bw, useful for large buffer sizes
- add -d param
- add warmup extra iterations and -w param
That helps when comparing performance for large buffer sizes.
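For reference, the bandwidth estimate can be derived from the same timestamps used for latency. The following is only a sketch, not the code in this PR: the helper names `elapsed_us` and `estimate_bw_mbps` are hypothetical, and the unit (MB/s with 1 MB = 10^6 bytes) is an assumption.

```c
#include <stddef.h>
#include <time.h>

/* Elapsed time between two timestamps, in microseconds. */
static double elapsed_us(struct timespec beg, struct timespec end)
{
    return (end.tv_sec - beg.tv_sec) * 1e6 +
           (end.tv_nsec - beg.tv_nsec) / 1e3;
}

/*
 * Estimated bandwidth for `iters` copies of `size` bytes that took
 * `total_us` microseconds in total. Bytes per microsecond is numerically
 * equal to MB/s when 1 MB = 1e6 bytes (assumed unit, not from the PR).
 */
static double estimate_bw_mbps(size_t size, int iters, double total_us)
{
    if (total_us <= 0.0)
        return 0.0; /* avoid dividing by zero on a degenerate measurement */
    return (double)size * iters / total_us;
}
```

With large buffers the per-copy latency grows roughly linearly with size, so a throughput figure is easier to compare across sizes than raw microseconds.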
@pakmarkthub mind having a look?
| break; | ||
| case 'h': | ||
| printf("syntax: %s -s <buf size> -d <gpu dev id> -w <write iters> -r <read iters> -h[help] -c[do-cuMemcpy]\n", argv[0]); | ||
| printf("syntax: %s [-s <buf size>][-d <gpu dev id>][-w <write iters>][-r <read iters>][-h][-c][-w]\n" |
The last option should be [-W <# iterations>]. You forgot to capitalize the letter.
| printf("syntax: %s -s <buf size> -d <gpu dev id> -w <write iters> -r <read iters> -h[help] -c[do-cuMemcpy]\n", argv[0]); | ||
| printf("syntax: %s [-s <buf size>][-d <gpu dev id>][-w <write iters>][-r <read iters>][-h][-c][-w]\n" | ||
| "-c benchmark cuMemcpy\n" | ||
| "-w <# iterations> modify warmup (default %d)\n", |
Capitalize the letter here as well: the warmup option should be -W.
| // manually tuned... | ||
| int num_write_iters = 10000; | ||
| int num_read_iters = 100; | ||
| int small_size_iter_factor = 1000; |
I understand the intention and its usefulness for small sizes. However, it changes the number of iterations that users specify. Is there a better way to do this, or could you print an explanatory message? Currently, users need to read the code to know that small sizes and large sizes use different numbers of iterations.
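One possible shape for such a message, as a sketch only. It assumes the factor multiplies the requested count for sizes below some cutoff; the `SMALL_SIZE_THRESHOLD` value and the `effective_iters` helper are hypothetical and may not match how the PR applies `small_size_iter_factor`.

```c
#include <stddef.h>
#include <stdio.h>

#define SMALL_SIZE_THRESHOLD 1024 /* hypothetical cutoff, in bytes */

/*
 * Returns the iteration count actually used for a given buffer size and
 * tells the user whenever it differs from what was requested.
 */
static int effective_iters(size_t size, int requested, int factor)
{
    int iters = requested;
    if (size < SMALL_SIZE_THRESHOLD) {
        iters = requested * factor; /* assumed scaling rule */
        printf("size %zu: running %d iterations "
               "(%d requested x %d for small sizes)\n",
               size, iters, requested, factor);
    }
    return iters;
}
```

Printing the scaled count next to the requested one keeps the output self-explanatory without changing the benchmark's behavior.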
| bool do_cumemcpy = false; | ||
| struct timespec beg, end; | ||
| double lat_us; | ||
| double bw; |
Isn’t this redundant with copybw?
If you want to do a shmoo for bw, would it be better to rename the test? “copylat” doesn’t sound right anymore in that case.