I ran the compression from the AlexNet notebook example.
It produced two output files: AlexNet_MEM.pth and AlexNet_FLOP.pth.
I then loaded each model and compared their memory usage with the 'nvidia-smi' command.
Unfortunately, I don't see any memory improvement for either model.
I also tried to compare throughput by timing inference for each model.
Again, I don't see any improvement: all of them took almost the same time to run inference.
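For context, GPU timing comparisons are only meaningful if the models are warmed up first and the device is synchronized before the clock is read, because CUDA kernel launches are asynchronous. The sketch below shows one such harness; it is illustrative, not the notebook's method. The names `benchmark`, `run_inference`, and `sync` are hypothetical, and `sync` is assumed to be something like `torch.cuda.synchronize` when running on a GPU.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(run_inference: Callable[[], object],
              sync: Optional[Callable[[], None]] = None,
              warmup: int = 10,
              iters: int = 50) -> dict:
    """Time run_inference and return mean/median latency in seconds.

    Hypothetical helper. `sync` should block until queued device work has
    finished (e.g. torch.cuda.synchronize). Without it, asynchronous GPU
    launches can make each sample look artificially short.
    """
    # Warm-up runs absorb one-off costs such as lazy initialization or caching.
    for _ in range(warmup):
        run_inference()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_inference()
        if sync is not None:
            sync()  # wait for the device so the sample covers the full run
        samples.append(time.perf_counter() - start)

    return {
        "mean_s": statistics.mean(samples),
        "median_s": statistics.median(samples),
        "iters": iters,
    }
```

With PyTorch, this might be called as `benchmark(lambda: model(x), sync=torch.cuda.synchronize)` inside a `torch.no_grad()` block, once per loaded model, using the same input batch `x` for each.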
Could you advise how you compared the performance?