
add patch to fix quantization failure in PyTorch 1.11.0 on POWER #18489

Merged

Conversation

Flamefire
Contributor

@Flamefire Flamefire commented Aug 8, 2023

(created using eb --new-pr)

PyTorch-1.10.0_fix-fp16-quantization-without-fbgemm.patch is missing which didn't apply even though pytorch/pytorch#84750 was merged. The merge is however only in PyTorch 2.0. The patch didn't apply because the code was reformatted in pytorch/pytorch@e60fd10

This PR adds an updated version of that patch, which applies to all PyTorch 1.11-1.13 versions released so far.
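For context, an easyconfig picks up such a fix via its patches list. A minimal sketch of what the relevant excerpt might look like (the filename and checksum are illustrative, not taken from this PR):

```python
# Hypothetical excerpt from a PyTorch easyconfig (Python syntax, .eb file).
# The patch filename and checksum below are placeholders for illustration.
patches = [
    'PyTorch-1.11.0_fix-fp16-quantization-without-fbgemm.patch',
    # ... other patches listed in the easyconfig ...
]
checksums = [
    # one SHA256 checksum per source/patch file, in order
    '<sha256-of-patch>',
]
```

EasyBuild applies each patch in order during the patch step and verifies the checksums beforehand, which is why a patch that no longer applies cleanly (as here, after the upstream reformatting) makes the build fail early.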

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusml22 - Linux RHEL 7.6, POWER, 8335-GTX (power9le), 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/Flamefire/ba7aa91a9382c7883a0f0d5502948dc8 for a full test report.

@Flamefire
Contributor Author

Flamefire commented Aug 9, 2023

Test report by @Flamefire
FAILED, then SUCCESS (see below)
Build succeeded for 0 out of 1 (1 easyconfigs in total)
taurusi8026 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/Flamefire/fa230960cd0bb0137bc803ace4d779dd for a full test report.

Only a single failure: distributed/test_distributed_spawn. We already exclude this test in other easyconfigs because it times out on Ampere GPUs, which matches what I see here, so I excluded it in this easyconfig as well. Since all other tests succeed, I won't rerun the build.
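Excluding a known-flaky test in a PyTorch easyconfig can be sketched as below; this assumes the `excluded_tests` parameter of EasyBuild's PyTorch easyblock, and the empty-string key meaning "all architectures" (a hedged illustration, not the exact diff from this PR):

```python
# Hypothetical excerpt: skip a test that is known to time out on
# Ampere GPUs. The '' key applies the exclusion on every architecture;
# an architecture name (e.g. 'POWER') would restrict it.
excluded_tests = {
    '': [
        'distributed/test_distributed_spawn',
    ],
}
```

This keeps the rest of the PyTorch test suite running while preventing one unreliable test from failing the whole build.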

@branfosj
Member

Test report by @branfosj
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0203u29a.bear.cluster - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 520.61.05, Python 3.6.8
See https://gist.github.com/branfosj/1f41e5f624b64e9593b2ff6d4c798103 for a full test report.

@casparvl
Contributor

Test report by @casparvl
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
gcn3.local.snellius.surf.nl - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 520.61.05, Python 3.6.8
See https://gist.github.com/casparvl/7326feb83763409121944330a8bee114 for a full test report.

@branfosj branfosj added this to the next release (4.8.1?) milestone Aug 12, 2023
@branfosj
Member

Going in, thanks @Flamefire!

@branfosj branfosj merged commit 92c0c4b into easybuilders:develop Aug 12, 2023
@Flamefire Flamefire deleted the 20230808140733_new_pr_PyTorch1110 branch August 13, 2023 18:28
@boegel boegel changed the title Fix quantization failure in PyTorch 1.11.0 on POWER add patch to fix quantization failure in PyTorch 1.11.0 on POWER Aug 15, 2023