Hello,
Thank you for sharing this great implementation with the community.
I just wanted to open this issue to share my success running the OPT-175B model on a DGX station.

The model takes ~3 minutes to load and uses ~58% of the memory on each of the first 7 GPUs and 28% on the last one.
Please feel free to close this issue.