
add patch to fix quantization failure in PyTorch 1.11.0 on POWER #18489

Merged

Conversation

Flamefire
Contributor

@Flamefire Flamefire commented Aug 8, 2023

(created using eb --new-pr)

PyTorch-1.10.0_fix-fp16-quantization-without-fbgemm.patch is missing which didn't apply even though pytorch/pytorch#84750 was merged. The merge is however only in PyTorch 2.0. The patch didn't apply because the code was reformatted in pytorch/pytorch@e60fd10

This PR adds an updated version of that patch, which applies to all PyTorch 1.11-1.13 versions released so far.
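For context, an easyconfig picks up such a fix via its patches list. A minimal sketch of what the relevant excerpt might look like (the filename and checksum are illustrative, not taken from this PR):

```python
# Hypothetical excerpt from a PyTorch easyconfig (Python syntax, .eb file).
# The patch filename and checksum below are placeholders for illustration.
patches = [
    'PyTorch-1.11.0_fix-fp16-quantization-without-fbgemm.patch',
    # ... other patches listed in the easyconfig ...
]
checksums = [
    # one SHA256 checksum per source/patch file, in order
    '<sha256-of-patch>',
]
```

EasyBuild applies each patch in order during the patch step and verifies the checksums beforehand, which is why a patch that no longer applies cleanly (as here, after the upstream reformatting) makes the build fail early.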

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusml22 - Linux RHEL 7.6, POWER, 8335-GTX (power9le), 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/Flamefire/ba7aa91a9382c7883a0f0d5502948dc8 for a full test report.

@Flamefire
Contributor Author

Flamefire commented Aug 9, 2023

Test report by @Flamefire
FAILED, then SUCCESS (see below)
Build succeeded for 0 out of 1 (1 easyconfigs in total)
taurusi8026 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/Flamefire/fa230960cd0bb0137bc803ace4d779dd for a full test report.

Only a single failure: distributed/test_distributed_spawn. We already exclude this test in other easyconfigs because it times out on Ampere GPUs, which matches what I see here, so I excluded it in this easyconfig as well. Since all other tests succeed, I won't rerun the build.
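Excluding a known-flaky test in a PyTorch easyconfig can be sketched as below; this assumes the `excluded_tests` parameter of EasyBuild's PyTorch easyblock, and the empty-string key meaning "all architectures" (a hedged illustration, not the exact diff from this PR):

```python
# Hypothetical excerpt: skip a test that is known to time out on
# Ampere GPUs. The '' key applies the exclusion on every architecture;
# an architecture name (e.g. 'POWER') would restrict it.
excluded_tests = {
    '': [
        'distributed/test_distributed_spawn',
    ],
}
```

This keeps the rest of the PyTorch test suite running while preventing one unreliable test from failing the whole build.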

@branfosj
Member

Test report by @branfosj
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0203u29a.bear.cluster - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 520.61.05, Python 3.6.8
See https://gist.github.com/branfosj/1f41e5f624b64e9593b2ff6d4c798103 for a full test report.

@casparvl
Contributor

Test report by @casparvl
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
gcn3.local.snellius.surf.nl - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 520.61.05, Python 3.6.8
See https://gist.github.com/casparvl/7326feb83763409121944330a8bee114 for a full test report.

@branfosj branfosj added this to the next release (4.8.1?) milestone Aug 12, 2023
@branfosj
Member

Going in, thanks @Flamefire!

@branfosj branfosj merged commit 92c0c4b into easybuilders:develop Aug 12, 2023
@Flamefire Flamefire deleted the 20230808140733_new_pr_PyTorch1110 branch August 13, 2023 18:28
@boegel boegel changed the title Fix quantization failure in PyTorch 1.11.0 on POWER add patch to fix quantization failure in PyTorch 1.11.0 on POWER Aug 15, 2023