Mixed Precision Support #69
Conversation
Thank you so much for your great efforts! We are currently running experiments with this new version to verify whether it has any negative influence on performance. By the way, I'm wondering if it is possible to also support mixed precision for …
Great! Please let me know your performance results. It should be no problem adding those functions. If possible, it would be great if you could provide me with a minimal test script that uses these functions, similar to …
The inference of SPVNAS should be a pretty good example to test these functions: https://github.com/mit-han-lab/spvnas. Thanks!
The large-scale experiments of MinkowskiNet on NuScenes have just finished:
Great, I'm glad it works! 10% and 40% are about what I saw in my tests as well. I added support for insertion and devoxelization in half/double precision in my latest commits. It worked well on my SPVNAS inference test, but I didn't test the backward functions. Please let me know how it looks in your tests!
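For anyone wanting to verify the untested backward functions, the usual approach is `torch.autograd.gradcheck` in double precision, which compares analytic gradients against finite differences. The sketch below uses a toy differentiable op as a stand-in; the actual insertion/devoxelization kernels would be dropped in where `op` is.

```python
import torch

# Toy differentiable op standing in for an insertion/devoxelization
# kernel; NOT the actual torchsparse function, just a placeholder to
# show the gradcheck pattern.
def op(feats, weights):
    # weighted reduction over the last dimension, differentiable in both inputs
    return (feats * weights).sum(dim=-1)

# gradcheck needs double-precision inputs with requires_grad=True
# for the finite-difference comparison to be numerically stable.
feats = torch.randn(8, 4, dtype=torch.float64, requires_grad=True)
weights = torch.randn(8, 4, dtype=torch.float64, requires_grad=True)

ok = torch.autograd.gradcheck(op, (feats, weights))
print(ok)
```

Running the same check with the real half-precision kernels cast up to double would confirm that the new backward implementations match their analytic gradients.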
Thanks for the efforts! I will launch some large-scale experiments to test these functions as well. UPDATE: The results of SPVNAS are similar to those of MinkowskiNet.
The implementation looks great! I think it's ready to be merged.
Thanks @CCInc for the great effort on mixed precision support! I've also been through the changes and believe that this pull request is ready for merging.
Great, glad to hear it! I'd also be happy to help implement SPVNAS architecture search or any other tasks you need; feel free to let me know in an email.
I implemented basic half precision support compatible with `torch.cuda.amp.autocast`. I also annotated the C++ convolution code a bit. I experimented a lot with resizing tensors to have dimensions that are multiples of 8, but it seems like that won't change execution time significantly, so I left it out. With `size=400000, batch_size=4` on my 2080 Ti, I get the following results (I attached the nvprof benchmarks as well):

Default precision (nvprof):
Mixed precision, no optimization (nvprof):
Mixed precision, all `mm` ops in multiples of 8 (nvprof):

Looking at the nvprof results, barely any computation time is spent on `mm` ops anyway. But mixed precision will slightly speed up models and reduce their memory footprint to some degree. Let me know if you have any questions/suggestions. (Should address #17.)
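For context, a typical `torch.cuda.amp` training step looks like the sketch below. The dense model and data are placeholders, not the actual torchsparse/SPVNAS code; the point is the `autocast` context (which runs eligible ops like matmuls in float16 on GPU) paired with `GradScaler` (which scales the loss to avoid gradient underflow in half precision).

```python
import torch

# Hedged sketch of one mixed-precision forward/backward step with
# torch.cuda.amp. The small dense model is a stand-in for a sparse
# network, not the code from this PR.
use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# GradScaler is a no-op when disabled, so this also runs on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x = torch.randn(32, 64, device=device)
target = torch.randint(0, 10, (32,), device=device)

optimizer.zero_grad()
# Under autocast on GPU, matmuls run in float16 while reductions like
# cross_entropy stay in float32 for numerical stability.
with torch.cuda.amp.autocast(enabled=use_cuda):
    out = model(x)
    loss = torch.nn.functional.cross_entropy(out, target)

# Scale the loss before backward so small float16 gradients don't
# underflow; step/update unscale and adjust the scale factor.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```

On tensor-core GPUs, the float16 matmuls inside `autocast` are what the multiples-of-8 sizing above was targeting; as the nvprof numbers show, that matters less when `mm` ops are a small fraction of total runtime.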