How to mitigate the CUDA out-of-memory error? #10
I found that even when I used 8 A100 cards with your given parameters (chunk size 16*64), the error still happened.
I switched to a Docker image with CUDA 11.3 and the code now runs normally; previously I used an image with CUDA 11.7. Sorry for bothering you.
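(For future readers: a mismatch between the container's CUDA runtime and the CUDA version the installed framework was built against can surface as spurious CUDA errors. A quick sanity check, assuming the project runs on PyTorch:)

```python
import torch

# Compare the CUDA version PyTorch was built against with what the
# container actually provides; a mismatch is a common source of errors.
print("PyTorch version:", torch.__version__)
print("CUDA runtime PyTorch was built against:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```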
But I still wonder how to reduce GPU memory usage, because I want to run it on other cards such as the V100 (32 GB).
Great to know that you have the code working on A100 GPUs on your end. To further reduce memory usage, you can try the following:
We have not tried any of the above on our end locally, so we haven't benchmarked the exact memory savings they would yield, but please feel free to give them a try and let us know how it goes (a generic sketch of two common memory savers follows below). Hope this helps your research!
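For reference, here is a minimal PyTorch sketch of two widely used memory-reduction techniques: activation (gradient) checkpointing and mixed-precision training. These are generic techniques, not necessarily the maintainers' exact suggestions, and the toy model, optimizer, and tensors are hypothetical placeholders rather than this project's actual code:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Hypothetical stand-in for the real model: a deep stack of layers
# whose activations dominate GPU memory.
model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(16)]).cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()

# The input must require grad so the checkpointed segments propagate
# gradients back to the parameters.
x = torch.randn(8, 1024, device="cuda", requires_grad=True)
target = torch.randn(8, 1024, device="cuda")

optimizer.zero_grad(set_to_none=True)
with torch.cuda.amp.autocast():
    # Activation checkpointing: keep only segment-boundary activations
    # in the forward pass and recompute the rest during backward,
    # trading extra compute for lower memory.
    out = checkpoint_sequential(model, 4, x)
    loss = nn.functional.mse_loss(out, target)

# Mixed precision keeps most activations in fp16, roughly halving
# activation memory; the scaler guards against fp16 gradient underflow.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```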
Thanks a lot! I will try your advice.
Hello! Since I have only 4 A100s available now, I reduced the chunk size from 16*64 to 256, but the out-of-memory error still appears. Do you have any idea how to fix it?
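(A generic diagnostic that may help narrow this down, again assuming PyTorch: check whether the failure is true memory exhaustion or allocator fragmentation. Note also that if the run uses data parallelism with a fixed global batch size, halving the GPU count from 8 to 4 doubles the per-GPU batch, which can offset a smaller chunk size. The max_split_size_mb value below is an illustrative guess, not a tuned setting:)

```python
import os

# Must be set before the first CUDA allocation; caps the allocator's
# block splitting to reduce fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

# ... run the failing forward/backward pass here, then inspect usage.
# A large gap between "reserved" and "allocated" suggests fragmentation
# rather than genuine exhaustion.
print(torch.cuda.memory_summary())  # per-device allocator statistics
print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")
```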